PipelineScore
User profile

local-llama-fan

15 runs | First seen 2026-04-10 | Avg 64.5
Best PipelineScore: 84.8 (MAINLINE)
Total tokens: 82.7K (across every task this user has run)
Avg latency: 718 ms (per task, across all submissions)
Tasks run: 375 (15 submissions x ~25 tasks)
Rigs used: 9 (distinct hardware tags)

Category signature

Average score per category across all 15 runs.

Code: 64.1
Reason: 65.3
Write: 63.7
Tool Use: 64.6
RAG: 65.4
Speed: 63.9

Hardware mix

Rigs this user benchmarked on.

rtx-3060-12gb: 4 (27%)
m2-pro-16gb: 2 (13%)
macbook-pro-m3-pro-18gb: 2 (13%)
m3-air-16gb: 2 (13%)
h100-80gb: 1 (7%)
h200-141gb: 1 (7%)
m3-max-128gb: 1 (7%)
dgx-h100: 1 (7%)
m3-pro-36gb: 1 (7%)
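
The percentages in the hardware mix appear to be each rig's run count divided by the 15 total runs, rounded to the nearest whole percent. The page does not state its rounding rule, so the sketch below is an assumption that happens to reproduce the listed figures; `rig_counts` and `shares` are illustrative names, not part of the site.

```python
# Run counts per hardware tag, copied from the list above.
rig_counts = {
    "rtx-3060-12gb": 4,
    "m2-pro-16gb": 2,
    "macbook-pro-m3-pro-18gb": 2,
    "m3-air-16gb": 2,
    "h100-80gb": 1,
    "h200-141gb": 1,
    "m3-max-128gb": 1,
    "dgx-h100": 1,
    "m3-pro-36gb": 1,
}


def shares(counts):
    """Return each entry's share of the total as a whole-number percentage.

    Assumption: shares are rounded to the nearest integer.
    """
    total = sum(counts.values())
    return {name: round(100 * n / total) for name, n in counts.items()}


rig_shares = shares(rig_counts)
for name, pct in rig_shares.items():
    print(f"{name}: {rig_counts[name]} ({pct}%)")
```

Because each share is rounded independently, the displayed percentages need not sum to exactly 100; with these counts they total 101.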

Provider mix

Where they spend their tokens.

alibaba: 4 (27%)
google: 2 (13%)
mistral: 2 (13%)
nous: 1 (7%)
deepseek: 1 (7%)
cognitivecomputations: 1 (7%)
yi: 1 (7%)
zyphra: 1 (7%)
microsoft: 1 (7%)
community: 1 (7%)

Models tried

Best score per model.

# | Model | Best Score | Tier
1 | Hermes 3 Llama 3.1 405B | 84.8 | MAINLINE
2 | DeepSeek R1 671B-A37B | 83.0 | MAINLINE
3 | Qwen 2.5 32B Instruct | 76.0 | MAINLINE
4 | Gemma 3 12B IT | 75.7 | MAINLINE
5 | Qwen 2.5 14B Instruct | 75.2 | MAINLINE
6 | Mixtral 8x7B Instruct | 69.2 | FEEDER
7 | Mistral Nemo 12B Instruct | 68.7 | FEEDER
8 | Dolphin 2.9 Llama 3 8B | 66.8 | FEEDER
9 | Yi 1.5 6B Chat | 61.1 | FEEDER
10 | Zamba 2 7B Instruct | 59.5 | TAP
11 | Qwen 2.5 3B Instruct | 58.8 | TAP
12 | Qwen 2.5 Coder 1.5B | 53.3 | TAP
13 | Phi 3 Mini 3.8B | 52.9 | TAP
14 | Gemma 2 2B IT | 51.2 | TAP
15 | TinyLlama 1.1B Chat | 30.8 | DRIP

All submissions

Every run, ordered by score.

# | Model | Score | Tier
1 | Hermes 3 Llama 3.1 405B | 84.8 | MAINLINE
2 | DeepSeek R1 671B-A37B | 83.0 | MAINLINE
3 | Qwen 2.5 32B Instruct | 76.0 | MAINLINE
4 | Gemma 3 12B IT | 75.7 | MAINLINE
5 | Qwen 2.5 14B Instruct | 75.2 | MAINLINE
6 | Mixtral 8x7B Instruct | 69.2 | FEEDER
7 | Mistral Nemo 12B Instruct | 68.7 | FEEDER
8 | Dolphin 2.9 Llama 3 8B | 66.8 | FEEDER
9 | Yi 1.5 6B Chat | 61.1 | FEEDER
10 | Zamba 2 7B Instruct | 59.5 | TAP
11 | Qwen 2.5 3B Instruct | 58.8 | TAP
12 | Qwen 2.5 Coder 1.5B | 53.3 | TAP
13 | Phi 3 Mini 3.8B | 52.9 | TAP
14 | Gemma 2 2B IT | 51.2 | TAP
15 | TinyLlama 1.1B Chat | 30.8 | DRIP
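
The profile's summary figures can be cross-checked against the per-run scores above. This is a minimal sketch, assuming the profile average is a plain arithmetic mean rounded to one decimal and that each run covers exactly 25 tasks (the page only says "~25"); the variable names are illustrative.

```python
# Scores for all 15 submissions, copied from the table above.
scores = [84.8, 83.0, 76.0, 75.7, 75.2, 69.2, 68.7, 66.8,
          61.1, 59.5, 58.8, 53.3, 52.9, 51.2, 30.8]

runs = len(scores)

# Assumption: the profile average is an unweighted mean, rounded to 1 decimal.
average = round(sum(scores) / runs, 1)

# The best score is simply the maximum over all runs.
best = max(scores)

# Assumption: exactly 25 tasks per run; the page states "~25".
approx_tasks = runs * 25

print(f"{runs} runs, avg {average}, best {best}, ~{approx_tasks} tasks")
```

Under these assumptions the computed values agree with the profile header: 15 runs, an average of 64.5, a best score of 84.8, and 375 tasks.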