PipelineScore
User profile

inference-monk

15 runs · First seen 2026-04-14 · Avg 68.2
Best PipelineScore: 84.7 (MAINLINE)
Total tokens: 78.8K (across every task this user has run)
Avg latency: 981 ms (per task, across all submissions)
Tasks run: 375 (15 submissions x ~25 tasks)
Rigs used: 11 (distinct hardware tags)

Category signature

Average score per category across all 15 runs.

Code: 69.3
Reason: 67.7
Write: 68.1
Tool Use: 66.6
RAG: 69.7
Speed: 67.4

Hardware mix

Rigs this user benchmarked on.

rtx-4070-12gb: 4 (27%)
m3-pro-36gb: 2 (13%)
cloud-api: 1 (7%)
m3-max-64gb: 1 (7%)
h200-141gb: 1 (7%)
m3-ultra-256gb: 1 (7%)
rtx-4080-16gb-offload: 1 (7%)
rtx-4080-16gb: 1 (7%)
rtx-3060-12gb: 1 (7%)
m2-air-16gb: 1 (7%)
macbook-pro-m3-pro-18gb: 1 (7%)
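
The percentages in this list look like each rig's run count divided by the 15 total runs, rounded to the nearest whole percent. The page does not state the formula, so the sketch below is an assumption; `rig_counts` and `share_percentages` are hypothetical names used only for illustration.

```python
# Run counts per hardware tag, copied from the list above.
rig_counts = {
    "rtx-4070-12gb": 4,
    "m3-pro-36gb": 2,
    "cloud-api": 1,
    "m3-max-64gb": 1,
    "h200-141gb": 1,
    "m3-ultra-256gb": 1,
    "rtx-4080-16gb-offload": 1,
    "rtx-4080-16gb": 1,
    "rtx-3060-12gb": 1,
    "m2-air-16gb": 1,
    "macbook-pro-m3-pro-18gb": 1,
}


def share_percentages(counts):
    """Return each tag's share of all runs as a whole-number percent.

    Assumption: the page rounds to the nearest integer.
    """
    total = sum(counts.values())
    return {tag: round(100 * n / total) for tag, n in counts.items()}


shares = share_percentages(rig_counts)
for tag, pct in shares.items():
    print(f"{tag}: {rig_counts[tag]} ({pct}%)")
```

Because each share is rounded independently, the displayed percentages need not sum to exactly 100.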

Provider mix

Where they spend their tokens.

deepseek: 2 (13%)
alibaba: 2 (13%)
nous: 2 (13%)
mistral: 2 (13%)
google: 2 (13%)
internlm: 1 (7%)
cognitivecomputations: 1 (7%)
community: 1 (7%)
cohere: 1 (7%)
bigcode: 1 (7%)

Models tried

Best score per model.

#   Model                         Best Score  Tier
1   DeepSeek Coder V2 236B        84.7        MAINLINE
2   DeepSeek R1 Distill Qwen 32B  84.0        MAINLINE
3   Qwen 3 235B-A22B MoE          83.0        MAINLINE
4   Hermes 3 Llama 3.1 70B        77.2        MAINLINE
5   Mistral Small 24B Instruct    73.2        FEEDER
6   Gemma 3 12B IT                73.2        FEEDER
7   Mistral Nemo 12B Instruct     71.0        FEEDER
8   InternLM 2.5 7B Chat          66.6        FEEDER
9   Dolphin 3.0 Llama 3.1 8B      65.2        FEEDER
10  Hermes 3 Llama 3.1 8B         64.1        FEEDER
11  LLaVA OneVision 7B            63.7        FEEDER
12  Aya 23 8B                     61.3        FEEDER
13  StarCoder2 7B                 56.7        TAP
14  Qwen 2.5 Coder 1.5B           55.0        TAP
15  Gemma 2 2B IT                 44.0        TAP

All submissions

Every run, ordered by score.

#   Model                         Score  Tier
1   DeepSeek Coder V2 236B        84.7   MAINLINE
2   DeepSeek R1 Distill Qwen 32B  84.0   MAINLINE
3   Qwen 3 235B-A22B MoE          83.0   MAINLINE
4   Hermes 3 Llama 3.1 70B        77.2   MAINLINE
5   Mistral Small 24B Instruct    73.2   FEEDER
6   Gemma 3 12B IT                73.2   FEEDER
7   Mistral Nemo 12B Instruct     71.0   FEEDER
8   InternLM 2.5 7B Chat          66.6   FEEDER
9   Dolphin 3.0 Llama 3.1 8B      65.2   FEEDER
10  Hermes 3 Llama 3.1 8B         64.1   FEEDER
11  LLaVA OneVision 7B            63.7   FEEDER
12  Aya 23 8B                     61.3   FEEDER
13  StarCoder2 7B                 56.7   TAP
14  Qwen 2.5 Coder 1.5B           55.0   TAP
15  Gemma 2 2B IT                 44.0   TAP
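
As a consistency check, the 15 submission scores above can be averaged and compared with the "Avg 68.2" figure at the top of the profile. This sketch assumes that figure is a plain arithmetic mean rounded to one decimal place; the page does not say how it is computed, and `scores` is a hypothetical name.

```python
from statistics import mean

# Scores copied from the "All submissions" list, in rank order.
scores = [
    84.7, 84.0, 83.0, 77.2, 73.2,
    73.2, 71.0, 66.6, 65.2, 64.1,
    63.7, 61.3, 56.7, 55.0, 44.0,
]

# Assumption: the profile average is the unweighted mean, rounded to 0.1.
average = round(mean(scores), 1)
print(f"{len(scores)} runs, average {average}")
```

Under that assumption the result agrees with the header value, which suggests each run is weighted equally regardless of model or rig.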