PipelineScore
User profile

bench-rat

12 runs
First seen 2026-04-12
Avg 67.6
Best PipelineScore: 84.5 (MAINLINE)
Total tokens: 63.6K (across every task this user has run)
Avg latency: 1455 ms (per task, across all submissions)
Tasks run: 300 (12 submissions x ~25 tasks)
Rigs used: 11 (distinct hardware tags)

Category signature

Average score per category across all 12 runs.

Code: 68.8
Reason: 67.9
Write: 68.2
Tool Use: 68.7
RAG: 66.9
Speed: 63.3

Hardware mix

Rigs this user benchmarked on.

2x-rtx-4090: 2 (17%)
m3-ultra-256gb: 1 (8%)
a100-80gb: 1 (8%)
rtx-3090-24gb: 1 (8%)
rtx-4070-12gb: 1 (8%)
m3-max-64gb: 1 (8%)
rtx-4080-16gb: 1 (8%)
macbook-pro-m3-pro-18gb: 1 (8%)
ryzen-7950x-cpu-only: 1 (8%)
m2-pro-16gb: 1 (8%)
rtx-3060-12gb: 1 (8%)

Provider mix

Where they spend their tokens.

meta: 3 (25%)
alibaba: 2 (17%)
deepseek: 1 (8%)
microsoft: 1 (8%)
zhipu: 1 (8%)
cognitivecomputations: 1 (8%)
google: 1 (8%)
yi: 1 (8%)
huggingface: 1 (8%)

Models tried

Best score per model.

#   Model                      Best Score  Tier
1   DeepSeek Coder V2 236B     84.5        MAINLINE
2   Llama 3.1 70B Instruct     79.1        MAINLINE
3   Qwen 2.5 VL 72B            77.7        MAINLINE
4   Qwen 3 32B Instruct        77.0        MAINLINE
5   WizardLM 2 8x22B           74.5        FEEDER
6   GLM 4 9B Chat              70.0        FEEDER
7   Code Llama 34B Instruct    69.5        FEEDER
8   Llama 3.1 8B Instruct      66.7        FEEDER
9   Dolphin 2.9 Llama 3 8B     63.6        FEEDER
10  CodeGemma 7B               60.9        FEEDER
11  Yi 1.5 6B Chat             58.1        TAP
12  SmolLM 1.7B                29.0        DRIP

All submissions

Every run, ordered by score.

#   Model                      Score  Tier
1   DeepSeek Coder V2 236B     84.5   MAINLINE
2   Llama 3.1 70B Instruct     79.1   MAINLINE
3   Qwen 2.5 VL 72B            77.7   MAINLINE
4   Qwen 3 32B Instruct        77.0   MAINLINE
5   WizardLM 2 8x22B           74.5   FEEDER
6   GLM 4 9B Chat              70.0   FEEDER
7   Code Llama 34B Instruct    69.5   FEEDER
8   Llama 3.1 8B Instruct      66.7   FEEDER
9   Dolphin 2.9 Llama 3 8B     63.6   FEEDER
10  CodeGemma 7B               60.9   FEEDER
11  Yi 1.5 6B Chat             58.1   TAP
12  SmolLM 1.7B                29.0   DRIP