local
gemma4:12b-it-qat_gpu
Released · Context 0K · gemma4-12b-it-qat_gpu
PipelineScore
80.9 · MAINLINE
Ranked #1 of 1 models · 100th percentile
RAG is the headline (100.0); throughput is the soft spot (0.0). Best-fit profile: Agentic.
Category breakdown
Score per category, normalized 0–100 against the v1 anchor.
Code: 87.5
Reason: 80.0
Tool Use: 87.5
RAG: 100.0
Speed: 0.0
Strengths
RAG: 100.0
Code: 87.5
Tool Use: 87.5
Sample tasks
A taste of what the test pack measures. Full prompts are private and rotated daily.
Code · Difficulty 1 · code-fib-1
Fibonacci function
Write a Python `fib(n)` returning the nth Fibonacci number, O(n).
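For orientation, here is a minimal sketch of an answer that would meet the O(n) requirement of code-fib-1. The full prompt is private, so the indexing convention (`fib(0) == 0`, `fib(1) == 1`) and the error on negative input are assumptions, not the graded reference solution.

```python
def fib(n: int) -> int:
    """Return the nth Fibonacci number, assuming fib(0) == 0 and fib(1) == 1."""
    if n < 0:
        raise ValueError("n must be non-negative")
    a, b = 0, 1
    # One pass of n iterations: O(n) time, O(1) extra space.
    for _ in range(n):
        a, b = b, a + b
    return a
```

An iterative loop avoids both the exponential cost of naive recursion and Python's recursion limit for large `n`.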
Reason · Difficulty 1 · reason-math-1
Train meeting time
Two trains, opposite directions, given speeds and start times — when do they meet?
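The arithmetic behind a task like reason-math-1 can be sketched as follows. This assumes the trains travel toward each other on the same line and that the initial distance between them is given; the card does not state either, and the function name and units are hypothetical.

```python
def meeting_time(distance_km: float, speed_a_kmh: float, speed_b_kmh: float,
                 start_a_h: float, start_b_h: float) -> float:
    """Clock time (in hours) at which two approaching trains meet."""
    if distance_km < 0 or speed_a_kmh <= 0 or speed_b_kmh <= 0:
        raise ValueError("distance must be non-negative and speeds positive")
    # Identify which train departs first and how long it runs alone.
    if start_a_h <= start_b_h:
        early_start, early_speed, late_start = start_a_h, speed_a_kmh, start_b_h
    else:
        early_start, early_speed, late_start = start_b_h, speed_b_kmh, start_a_h
    head_start_km = early_speed * (late_start - early_start)
    if head_start_km >= distance_km:
        # The first train covers the whole distance before the second departs.
        return early_start + distance_km / early_speed
    # After both are moving, the gap closes at the sum of the two speeds.
    remaining_km = distance_km - head_start_km
    return late_start + remaining_km / (speed_a_kmh + speed_b_kmh)
```

With illustrative values of 300 km, 60 km/h from 9:00, and 90 km/h from 10:00, the first train covers 60 km alone, and the remaining 240 km closes at 150 km/h in 1.6 hours, so the trains meet at 11.6 hours (11:36).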
RAG · Difficulty 2 · rag-extract-1
Extract metrics to JSON
From the context, extract net sales, operating margin, and free cash flow as a JSON object. Numbers only.
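One way a response to rag-extract-1 could be checked is sketched below. The key names (`net_sales`, `operating_margin`, `free_cash_flow`) and the strict "exactly these three numeric fields" rule are assumptions for illustration; the actual grader and expected schema are not published.

```python
import json
from numbers import Real

# Hypothetical key names; the real task may expect different ones.
REQUIRED_KEYS = ("net_sales", "operating_margin", "free_cash_flow")

def is_valid_extraction(raw: str) -> bool:
    """Return True if raw is a JSON object with exactly the required numeric fields."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return False
    if not isinstance(data, dict) or set(data) != set(REQUIRED_KEYS):
        return False
    # "Numbers only": reject strings such as "1,200" and booleans.
    return all(
        isinstance(data[key], Real) and not isinstance(data[key], bool)
        for key in REQUIRED_KEYS
    )
```

A check like this only validates the shape of the output; whether the extracted values match the context would need a separate comparison.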
Tool Use · Difficulty 2 · tool-schema-1
OpenAPI param selection
Given an OpenAPI schema with limit/offset/sort, fill JSON for 'next 50, recent first.'
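As an illustration of what tool-schema-1 asks for, the sketch below builds the parameters for "next 50, recent first" from the current page position. The schema itself is private, so the sort value `-created_at` and the helper name are hypothetical; only the `limit`, `offset`, and `sort` parameter names come from the task description.

```python
import json

def next_page_params(current_offset: int, current_limit: int,
                     page_size: int = 50,
                     sort_value: str = "-created_at") -> dict:
    """Build query parameters for the page that follows the current one."""
    return {
        "limit": page_size,
        # "Next" means skipping everything already returned.
        "offset": current_offset + current_limit,
        # Hypothetical descending-by-creation-time value; real schemas differ.
        "sort": sort_value,
    }

# Example: the caller has already seen items 0-49.
print(json.dumps(next_page_params(current_offset=0, current_limit=50)))
```

The point of the task is that "next" maps to `offset` and "recent first" maps to `sort`; a model that only sets `limit` to 50 would return the first page again.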
RAG · Difficulty 2 · rag-grounding-1
Refuses to fabricate
Context lacks the answer — does the model fabricate or correctly say it can't?