User profile
midnight-bencher
6 runs · First seen 2026-04-12 · Avg score 65.8
Total tokens: 32.1K (across every task this user has run)
Avg latency: 1011 ms (per task, across all submissions)
Tasks run: 150 (6 submissions × ~25 tasks)
Rigs used: 6 (distinct hardware tags)
Category signature
Average score per category across all 6 runs.
| Category | Avg score |
|---|---|
| Code | 67.5 |
| Reason | 63.6 |
| Write | 66.3 |
| Tool Use | 66.6 |
| RAG | 67.1 |
| Speed | 63.3 |
Hardware mix
Rigs this user benchmarked on.
| Rig | Runs | Share |
|---|---|---|
| m3-ultra-256gb | 1 | 17% |
| dgx-h100 | 1 | 17% |
| m3-pro-36gb | 1 | 17% |
| rtx-3080-10gb | 1 | 17% |
| rtx-3060-12gb | 1 | 17% |
| m1-air-8gb | 1 | 17% |
Provider mix
Where they spend their tokens.
| Provider | Runs | Share |
|---|---|---|
| meta | 3 | 50% |
| alibaba | 1 | 17% |
| community | 1 | 17% |
| google | 1 | 17% |
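The percentages in the hardware and provider mixes look like each item's run count divided by the six total runs and rounded to a whole percent (1/6 ≈ 17%, 3/6 = 50%). A minimal Python sketch, assuming that interpretation; `run_shares` is a hypothetical helper, not part of the leaderboard itself:

```python
from collections import Counter

def run_shares(tags):
    """Hypothetical helper: map each tag to (run count, whole-number percent of all runs).

    Assumes every entry in `tags` stands for exactly one run.
    """
    counts = Counter(tags)
    total = sum(counts.values())
    # most_common() lists the largest counts first; ties keep first-seen order.
    return {tag: (n, round(100 * n / total)) for tag, n in counts.most_common()}

# One provider tag per run, matching the counts in the provider mix above.
providers = ["meta", "meta", "meta", "alibaba", "community", "google"]
print(run_shares(providers))

# Six distinct rigs, one run each, as in the hardware mix above.
rigs = ["m3-ultra-256gb", "dgx-h100", "m3-pro-36gb",
        "rtx-3080-10gb", "rtx-3060-12gb", "m1-air-8gb"]
print(run_shares(rigs))
```

Because the shares are rounded independently, they need not sum to exactly 100% (six rigs at 17% each total 102%).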
Models tried
Best score per model.
| # | Model | Best Score | Tier |
|---|---|---|---|
| 1 | Llama 4 70B Instruct | 81.0 | MAINLINE |
| 2 | Qwen 2.5 VL 72B | 80.0 | MAINLINE |
| 3 | LLaVA OneVision 7B | 65.2 | FEEDER |
| 4 | Gemma 2 9B IT | 65.0 | FEEDER |
| 5 | Llama 3.2 3B Instruct | 54.4 | TAP |
All submissions
Every run, ordered by score.
| # | Model | Score | Tier |
|---|---|---|---|
| 1 | Llama 4 70B Instruct | 81.0 | MAINLINE |
| 2 | Qwen 2.5 VL 72B | 80.0 | MAINLINE |
| 3 | LLaVA OneVision 7B | 65.2 | FEEDER |
| 4 | Gemma 2 9B IT | 65.0 | FEEDER |
| 5 | Llama 3.2 3B Instruct | 54.4 | TAP |
| 6 | Llama 3.2 3B Instruct | 49.4 | TAP |
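The headline average and the Models tried table can both be derived from the submissions above, assuming the average is a plain unweighted mean over runs and "best score" is each model's maximum. A minimal sketch under those assumptions; `average_score` and `best_per_model` are hypothetical names:

```python
from statistics import mean

# (model, score) pairs copied from the "All submissions" table.
submissions = [
    ("Llama 4 70B Instruct", 81.0),
    ("Qwen 2.5 VL 72B", 80.0),
    ("LLaVA OneVision 7B", 65.2),
    ("Gemma 2 9B IT", 65.0),
    ("Llama 3.2 3B Instruct", 54.4),
    ("Llama 3.2 3B Instruct", 49.4),
]

def average_score(runs):
    """Unweighted mean score across all runs, rounded to one decimal place."""
    return round(mean(score for _, score in runs), 1)

def best_per_model(runs):
    """Keep each model's highest score, then sort from best to worst."""
    best = {}
    for model, score in runs:
        best[model] = max(score, best.get(model, float("-inf")))
    return sorted(best.items(), key=lambda item: item[1], reverse=True)

print(average_score(submissions))
for rank, (model, score) in enumerate(best_per_model(submissions), start=1):
    print(rank, model, score)
```

With the six scores listed, the mean is 395.0 / 6 ≈ 65.83, which rounds to the 65.8 shown at the top of the page, and keeping only the higher of the two Llama 3.2 3B Instruct runs (54.4) reproduces the five-row Models tried ranking.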