local

gemma4:12b-it-qat_gpu

Released · Context 0K · gemma4-12b-it-qat_gpu

Pipeline Score

80.9 · MAINLINE

Ranked #1 of 1 models · 100th percentile

RAG is the headline (100.0); throughput is the soft spot (0.0). Best-fit profile: Agentic.

Category breakdown

Score per category, normalized 0–100 against the v1 anchor.

Code · 87.5

Reason · 80.0

Tool Use · 87.5

RAG · 100.0

Speed · 0.0

A taste of what the test pack measures. Full prompts are private and rotated daily.

Code · Difficulty 1 · code-fib-1

Write a Python `fib(n)` returning the nth Fibonacci number, O(n).
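For orientation, one answer that would satisfy this prompt is sketched below. It is an illustration only, not the pack's reference solution, and it assumes the 0-indexed convention `fib(0) = 0`, `fib(1) = 1`, which the prompt does not state.

```python
def fib(n: int) -> int:
    """Return the nth Fibonacci number in O(n) time and O(1) extra space.

    Assumed convention (not given in the prompt): fib(0) == 0, fib(1) == 1.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    a, b = 0, 1  # a holds fib(i), b holds fib(i + 1)
    for _ in range(n):
        a, b = b, a + b  # advance one step along the sequence
    return a
```

The loop runs exactly `n` times, which is what gives the linear bound; a naive double recursion would not meet the O(n) requirement.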

Reason · Difficulty 1 · reason-math-1

Two trains, opposite directions, given speeds and start times — when do they meet?
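The arithmetic behind this kind of question can be sketched as follows. The teaser gives no numbers, so everything here is an assumption: the trains start `distance_km` apart on the same line and travel toward each other, and the function name and parameters are hypothetical.

```python
def meeting_time(distance_km: float, v1_kmh: float, v2_kmh: float,
                 start1_h: float = 0.0, start2_h: float = 0.0) -> float:
    """Clock time (in hours) at which two approaching trains meet.

    Assumptions (not from the test pack): constant speeds, trains start
    distance_km apart and move toward each other.
    """
    if v1_kmh <= 0 or v2_kmh <= 0:
        raise ValueError("speeds must be positive")
    # Identify the earlier starter and the distance it covers alone.
    if start1_h <= start2_h:
        early_start, early_speed = start1_h, v1_kmh
    else:
        early_start, early_speed = start2_h, v2_kmh
    late_start = max(start1_h, start2_h)
    head_start_km = early_speed * (late_start - early_start)
    remaining_km = distance_km - head_start_km
    if remaining_km < 0:
        raise ValueError("the earlier train arrives before the other departs")
    # Once both are moving, the gap closes at the sum of the speeds.
    return late_start + remaining_km / (v1_kmh + v2_kmh)
```

The key step is that the gap closes at `v1 + v2` once both trains are moving; any head start is subtracted from the distance first.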

RAG · Difficulty 2 · rag-extract-1

From the context, extract net sales, operating margin, and free cash flow as a JSON object. Numbers only.
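A response to this prompt could plausibly be checked along the lines below. This is a sketch, not the pack's actual grader: the key names `net_sales`, `operating_margin`, and `free_cash_flow` are hypothetical spellings of the three fields, and the check covers only the output shape ("a JSON object, numbers only"), not whether the values match the context.

```python
import json
from numbers import Real

# Hypothetical key names; the real pack may spell these differently.
REQUIRED_KEYS = ("net_sales", "operating_margin", "free_cash_flow")


def is_valid_extraction(raw: str) -> bool:
    """Return True if raw is a JSON object with exactly the required numeric fields."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return False  # not JSON at all, e.g. prose wrapped around the object
    if not isinstance(data, dict) or set(data) != set(REQUIRED_KEYS):
        return False  # wrong type, missing fields, or extra fields
    # bool is a subclass of int in Python, so exclude it explicitly.
    return all(
        isinstance(data[key], Real) and not isinstance(data[key], bool)
        for key in REQUIRED_KEYS
    )
```

A shape check like this rejects common failure modes such as numbers returned as strings with units attached or explanatory text surrounding the JSON.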

Tool Use · Difficulty 2 · tool-schema-1

Given an OpenAPI schema with limit/offset/sort, fill JSON for 'next 50, recent first.'
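The mapping this prompt asks for can be sketched as below. Only the parameter names `limit`, `offset`, and `sort` come from the prompt; the function name, the idea that the caller knows the current page position, and the sort value `created_at:desc` are assumptions, since the accepted sort syntax depends on the schema supplied in the real test.

```python
import json


def next_page_params(current_offset: int, current_limit: int,
                     page_size: int = 50,
                     sort_value: str = "created_at:desc") -> dict:
    """Build query parameters for 'next 50, recent first'.

    sort_value is a placeholder: the real value must come from the
    enum or pattern declared in the OpenAPI schema.
    """
    if current_offset < 0 or current_limit < 0 or page_size <= 0:
        raise ValueError("offset/limit must be non-negative and page_size positive")
    return {
        "limit": page_size,                       # "50"
        "offset": current_offset + current_limit,  # "next": skip the page already seen
        "sort": sort_value,                       # "recent first"
    }


if __name__ == "__main__":
    # Example: the caller has just viewed the first page of 50 items.
    print(json.dumps(next_page_params(current_offset=0, current_limit=50)))
```

The part models tend to miss is "next": the offset has to advance past the page already returned instead of staying at zero.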

RAG · Difficulty 2 · rag-grounding-1

Context lacks the answer — does the model fabricate or correctly say it can't?