PipelineScore
User profile

mlx-mike

13 runs | First seen 2026-04-13 | Avg 70.9

Best PipelineScore: 77.8 (MAINLINE)
Total tokens: 70.9K (across every task this user has run)
Avg latency: 1029 ms (per task, across all submissions)
Tasks run: 325 (13 submissions x ~25 tasks)
Rigs used: 11 (distinct hardware tags)

Category signature

Average score per category across all 13 runs.

Code: 71.2
Reason: 71.1
Write: 70.6
Tool Use: 71.6
RAG: 70.9
Speed: 69.2

Hardware mix

Rigs this user benchmarked on.

m3-pro-36gb: 2 (15%)
rtx-4070-12gb: 2 (15%)
h200-141gb: 1 (8%)
b200-192gb: 1 (8%)
m3-max-128gb: 1 (8%)
rtx-3060-12gb: 1 (8%)
a100-40gb: 1 (8%)
m3-max-64gb: 1 (8%)
ryzen-7950x-rtx-3090: 1 (8%)
rtx-4090-24gb: 1 (8%)
ryzen-7950x-cpu-only: 1 (8%)

Provider mix

Where they spend their tokens.

alibaba: 3 (23%)
microsoft: 2 (15%)
mistral: 2 (15%)
yi: 2 (15%)
nous: 1 (8%)
google: 1 (8%)
upstage: 1 (8%)
meta: 1 (8%)
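
The percentages in the hardware and provider mixes appear to be each tag's share of the 13 runs, rounded to the nearest whole percent. The page does not state this rule, so the following is a minimal Python sketch under that assumption; the function name `mix_shares` is hypothetical, and the counts are copied from the provider list above.

```python
def mix_shares(counts):
    """Return each tag's share of all runs as a whole-number percentage."""
    total = sum(counts.values())
    # Assumption: shares are rounded to the nearest integer percent.
    return {tag: round(100 * n / total) for tag, n in counts.items()}


# Run counts per provider, as listed in the provider mix.
provider_counts = {
    "alibaba": 3, "microsoft": 2, "mistral": 2, "yi": 2,
    "nous": 1, "google": 1, "upstage": 1, "meta": 1,
}

shares = mix_shares(provider_counts)
```

Under this rule, 3 of 13 runs gives 23%, 2 of 13 gives 15%, and 1 of 13 gives 8%, which agrees with the figures listed.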

Models tried

Best score per model.

#   Model                      Best Score  Tier
1   WizardLM 2 8x22B           77.8        MAINLINE
2   Mixtral 8x22B Instruct     77.4        MAINLINE
3   Codestral 22B              76.9        MAINLINE
4   Qwen 3 14B Instruct        74.8        FEEDER
5   Yi 1.5 34B Chat            72.7        FEEDER
6   Hermes 3 Llama 3.1 8B      69.7        FEEDER
7   Qwen 3 8B Instruct         69.5        FEEDER
8   Gemma 2 27B IT             69.0        FEEDER
9   SOLAR 10.7B Instruct       66.7        FEEDER
10  Qwen 2.5 7B Instruct       66.3        FEEDER
11  Code Llama 34B Instruct    64.6        FEEDER
12  Phi 3 Small 7B             63.9        FEEDER

All submissions

Every run, ordered by score.

#   Model                      Score  Tier
1   WizardLM 2 8x22B           77.8   MAINLINE
2   Mixtral 8x22B Instruct     77.4   MAINLINE
3   Codestral 22B              76.9   MAINLINE
4   Qwen 3 14B Instruct        74.8   FEEDER
5   Yi 1.5 34B Chat            72.7   FEEDER
6   Yi 1.5 34B Chat            72.2   FEEDER
7   Hermes 3 Llama 3.1 8B      69.7   FEEDER
8   Qwen 3 8B Instruct         69.5   FEEDER
9   Gemma 2 27B IT             69.0   FEEDER
10  SOLAR 10.7B Instruct       66.7   FEEDER
11  Qwen 2.5 7B Instruct       66.3   FEEDER
12  Code Llama 34B Instruct    64.6   FEEDER
13  Phi 3 Small 7B             63.9   FEEDER
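
The "Models tried" ranking and the profile's average score can both be recomputed from this submission list. The sketch below assumes the best score is the per-model maximum and the average is a plain mean over all 13 runs rounded to one decimal; the page does not document either rule, and the names `submissions` and `best_per_model` are hypothetical.

```python
from statistics import mean

# (model, score) pairs copied from the submissions listed above.
submissions = [
    ("WizardLM 2 8x22B", 77.8),
    ("Mixtral 8x22B Instruct", 77.4),
    ("Codestral 22B", 76.9),
    ("Qwen 3 14B Instruct", 74.8),
    ("Yi 1.5 34B Chat", 72.7),
    ("Yi 1.5 34B Chat", 72.2),
    ("Hermes 3 Llama 3.1 8B", 69.7),
    ("Qwen 3 8B Instruct", 69.5),
    ("Gemma 2 27B IT", 69.0),
    ("SOLAR 10.7B Instruct", 66.7),
    ("Qwen 2.5 7B Instruct", 66.3),
    ("Code Llama 34B Instruct", 64.6),
    ("Phi 3 Small 7B", 63.9),
]


def best_per_model(runs):
    """Keep the highest score seen for each model."""
    best = {}
    for model, score in runs:
        if model not in best or score > best[model]:
            best[model] = score
    return best


best = best_per_model(submissions)

# Assumption: the profile average is an unweighted mean over all runs.
average = round(mean(score for _, score in submissions), 1)
```

Thirteen runs collapse to twelve models because Yi 1.5 34B Chat was submitted twice; only its higher score, 72.7, survives in the per-model view.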