PipelineScore
local

gemma4:12b-it-qat_gpu

Released · Context 0K · gemma4-12b-it-qat_gpu
PipelineScore
80.9 · MAINLINE

Ranked #1 of 1 models · 100th percentile

RAG is the headline (100.0); throughput is the soft spot (0.0). Best-fit profile: Agentic.

Category breakdown

Score per category, normalized 0–100 against the v1 anchor.

Code: 87.5
Reason: 80.0
Tool Use: 87.5
RAG: 100.0
Speed: 0.0

Strengths

RAG: 100.0
Code: 87.5
Tool Use: 87.5

Sample tasks

A taste of what the test pack measures. Full prompts are private and rotated daily.

Code · Difficulty 1 · code-fib-1

Fibonacci function

Write a Python `fib(n)` returning the nth Fibonacci number, O(n).
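
The full prompt is private, so the following is only a sketch of the kind of answer this task appears to ask for. It assumes zero-based indexing (`fib(0) == 0`, `fib(1) == 1`), which the task summary does not specify, and uses a simple iterative loop to meet the O(n) requirement.

```python
def fib(n: int) -> int:
    """Return the nth Fibonacci number in O(n) time.

    Assumption: indexing starts at fib(0) == 0 and fib(1) == 1.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    a, b = 0, 1
    # After k iterations, a holds fib(k) and b holds fib(k + 1).
    for _ in range(n):
        a, b = b, a + b
    return a
```

The loop keeps only two integers, so it avoids the exponential blow-up of the naive recursive definition.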

Reason · Difficulty 1 · reason-math-1

Train meeting time

Two trains, opposite directions, given speeds and start times — when do they meet?
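
The actual numbers in this task are not published, so the sketch below only illustrates the usual arithmetic. It assumes the trains start from two points a known distance apart and travel toward each other. The function name, parameter names, and units (kilometres, hours) are hypothetical.

```python
def meeting_time(distance_km: float,
                 speed_a: float, speed_b: float,
                 start_a_h: float, start_b_h: float) -> float:
    """Return the clock time (in hours) at which two approaching trains meet.

    Assumption: train A and train B start distance_km apart and move toward
    each other at constant speeds (km/h), departing at start_a_h and start_b_h.
    """
    # Identify which train departs first and gets a head start.
    if start_a_h <= start_b_h:
        first_speed, first_start, later_start = speed_a, start_a_h, start_b_h
    else:
        first_speed, first_start, later_start = speed_b, start_b_h, start_a_h

    head_start_km = first_speed * (later_start - first_start)
    remaining_km = distance_km - head_start_km
    if remaining_km <= 0:
        # The first train covers the whole distance before the other departs.
        return first_start + distance_km / first_speed

    # Once both are moving, the gap closes at the sum of the two speeds.
    return later_start + remaining_km / (speed_a + speed_b)
```

For illustrative inputs of 300 km, 60 km/h departing at 9.0 and 90 km/h departing at 10.0, the first train covers 60 km alone, the remaining 240 km closes at 150 km/h, and the trains meet 1.6 hours after the second departure.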

RAG · Difficulty 2 · rag-extract-1

Extract metrics to JSON

From the context, extract net sales, operating margin, and free cash flow as a JSON object. Numbers only.
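
How the benchmark grades this task is not documented here. As a rough sketch, a "numbers only" check might look like the following. The key names in `REQUIRED_KEYS` and the function `is_valid_extraction` are hypothetical, chosen only to mirror the three metrics named above.

```python
import json
from numbers import Real

# Hypothetical key names; the real task may expect different ones.
REQUIRED_KEYS = ("net_sales", "operating_margin", "free_cash_flow")


def is_valid_extraction(raw: str) -> bool:
    """Check that raw is a JSON object with exactly the required numeric fields."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return False
    if not isinstance(data, dict) or set(data) != set(REQUIRED_KEYS):
        return False
    # bool is a subclass of int in Python, so exclude it explicitly.
    return all(
        isinstance(value, Real) and not isinstance(value, bool)
        for value in data.values()
    )
```

Under this check, a value such as `"1,200"` or `"12%"` would fail because it is a string rather than a number.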

Tool Use · Difficulty 2 · tool-schema-1

OpenAPI param selection

Given an OpenAPI schema with limit/offset/sort, fill JSON for 'next 50, recent first.'
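
The schema itself is not shown, so the following is a sketch under stated assumptions. The parameter names `limit`, `offset`, and `sort` come from the task summary. The sort value `"-created_at"` (a leading minus for descending order) and the helper `next_page_params` are hypothetical, since the accepted sort syntax depends on the actual schema.

```python
import json


def next_page_params(prev_offset: int, prev_limit: int,
                     page_size: int = 50,
                     sort: str = "-created_at") -> dict:
    """Build query parameters for the page that follows the previous request.

    Assumption: offset counts items to skip, and the sort syntax is a field
    name with a leading minus for "most recent first".
    """
    return {
        "limit": page_size,
        # Skip everything returned so far to reach the next page.
        "offset": prev_offset + prev_limit,
        "sort": sort,
    }


params = next_page_params(prev_offset=0, prev_limit=50)
print(json.dumps(params))
```

The point of the task appears to be mapping the phrase "next 50, recent first" onto the right fields: "next" changes `offset`, "50" sets `limit`, and "recent first" selects a descending sort.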

RAG · Difficulty 2 · rag-grounding-1

Refuses to fabricate

Context lacks the answer — does the model fabricate or correctly say it can't?
