PipelineScore
User profile

gguf-pilgrim

12 runs | First seen 2026-04-16 | Avg 69.1

Best PipelineScore: 80.0 (MAINLINE)
Total tokens: 67.1K (across every task this user has run)
Avg latency: 1221 ms (per task, across all submissions)
Tasks run: 300 (12 submissions x ~25 tasks)
Rigs used: 9 (distinct hardware tags)

Category signature

Average score per category across all 12 runs.

Code: 68.3
Reason: 70.5
Write: 69.4
Tool Use: 69.2
RAG: 70.6
Speed: 66.5

Hardware mix

Rigs this user benchmarked on.

cloud-api: 2 (17%)
rtx-4080-16gb-offload: 2 (17%)
macbook-pro-m3-pro-18gb: 2 (17%)
m3-max-128gb: 1 (8%)
2x-rtx-4090: 1 (8%)
rtx-3090-24gb: 1 (8%)
rtx-4090-24gb: 1 (8%)
rtx-4080-16gb: 1 (8%)
m3-air-16gb: 1 (8%)
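
The percentages in the hardware mix can be reproduced from the run counts. The sketch below assumes each share is the rig's run count divided by the 12 total runs, rounded to the nearest whole percent. That rounding rule is an assumption, not something the page states, and the names `rig_runs` and `percent_shares` are hypothetical.

```python
# Run counts per rig, copied from the hardware mix list.
rig_runs = {
    "cloud-api": 2,
    "rtx-4080-16gb-offload": 2,
    "macbook-pro-m3-pro-18gb": 2,
    "m3-max-128gb": 1,
    "2x-rtx-4090": 1,
    "rtx-3090-24gb": 1,
    "rtx-4090-24gb": 1,
    "rtx-4080-16gb": 1,
    "m3-air-16gb": 1,
}


def percent_shares(counts):
    """Return each key's share of the total as a whole-number percentage.

    Assumption: shares are rounded independently to the nearest integer.
    """
    total = sum(counts.values())
    return {name: round(100 * n / total) for name, n in counts.items()}


shares = percent_shares(rig_runs)
for name, pct in shares.items():
    print(f"{name}: {rig_runs[name]} ({pct}%)")
```

Because each share is rounded on its own, the displayed percentages add up to 99 rather than 100. The same calculation applies to the provider mix, which has the same 2/2/2/1/1/1/1/1/1 distribution.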

Provider mix

Where they spend their tokens.

alibaba: 2 (17%)
google: 2 (17%)
mistral: 2 (17%)
deepseek: 1 (8%)
nous: 1 (8%)
cohere: 1 (8%)
meta: 1 (8%)
internlm: 1 (8%)
bigcode: 1 (8%)

Models tried

Best score per model.

#   Model                      Best Score  Tier
1   Qwen 2.5 Coder 32B         80.0        MAINLINE
2   DeepSeek V3 671B-A37B      79.6        MAINLINE
3   Hermes 3 Llama 3.1 70B     78.1        MAINLINE
4   Gemma 3 27B IT             76.9        MAINLINE
5   Devstral Small 24B         75.6        MAINLINE
6   Mixtral 8x22B Instruct     74.7        FEEDER
7   Gemma 2 27B IT             74.7        FEEDER
8   Command R                  67.9        FEEDER
9   Llama 3.1 8B Instruct      63.6        FEEDER
10  InternLM 2.5 7B Chat       62.7        FEEDER
11  Qwen 2.5 1.5B Instruct     48.1        TAP
12  StarCoder2 3B              47.0        TAP

All submissions

Every run, ordered by score.

#   Model                      Score  Tier
1   Qwen 2.5 Coder 32B         80.0   MAINLINE
2   DeepSeek V3 671B-A37B      79.6   MAINLINE
3   Hermes 3 Llama 3.1 70B     78.1   MAINLINE
4   Gemma 3 27B IT             76.9   MAINLINE
5   Devstral Small 24B         75.6   MAINLINE
6   Mixtral 8x22B Instruct     74.7   FEEDER
7   Gemma 2 27B IT             74.7   FEEDER
8   Command R                  67.9   FEEDER
9   Llama 3.1 8B Instruct      63.6   FEEDER
10  InternLM 2.5 7B Chat       62.7   FEEDER
11  Qwen 2.5 1.5B Instruct     48.1   TAP
12  StarCoder2 3B              47.0   TAP
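
As a quick consistency check, the profile's headline figures can be recomputed from the 12 submission scores. The sketch below assumes the "Avg" value is a plain arithmetic mean rounded to one decimal place and that every submission covers roughly 25 tasks. Both are assumptions based on the figures shown on the page, not a documented scoring method.

```python
from statistics import mean

# Scores of all 12 submissions, copied from the table of runs.
scores = [80.0, 79.6, 78.1, 76.9, 75.6, 74.7,
          74.7, 67.9, 63.6, 62.7, 48.1, 47.0]

# Assumed: headline average is the unweighted mean, rounded to one decimal.
avg_score = round(mean(scores), 1)

# Assumed: about 25 tasks per submission, as the "Tasks run" card suggests.
approx_tasks = len(scores) * 25

print(f"runs={len(scores)} avg={avg_score} best={max(scores)} tasks~{approx_tasks}")
```

Under these assumptions the recomputed average is 69.1 and the task count is 300, which agrees with the summary at the top of the profile.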