Google

Gemini 2.5 Pro

Released: 2026-01-28 · Context: 2,000K · Model ID: gemini-2-5-pro · Lab Verified

PipelineScore: 85.9 (MAINLINE)

Category breakdown

Score per category, normalized 0–100 against the v1 anchor.

Code: 84.4
Reason: 87.7
Write: 85.2
Tool Use: 84.1
RAG: 91.8
Speed: 82.9

Top categories: RAG 91.8 · Reason 87.7 · Write 85.2

A taste of what the test pack measures. Full prompts are private and rotated daily.

Code · Difficulty 1 · code-fib-1

Write a Python `fib(n)` returning the nth Fibonacci number, O(n).
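A minimal sketch of the kind of answer this item asks for, assuming the common 0-indexed convention `fib(0) == 0`, `fib(1) == 1` (the page does not say which convention the pack expects). It keeps only the last two values, so it runs in O(n) time and O(1) extra space.

```python
def fib(n: int) -> int:
    """Return the nth Fibonacci number, with fib(0) == 0 and fib(1) == 1, in O(n) time."""
    if n < 0:
        raise ValueError("n must be non-negative")
    a, b = 0, 1  # invariant: a == F(i), b == F(i + 1)
    for _ in range(n):
        a, b = b, a + b
    return a


print([fib(i) for i in range(10)])  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```

The iterative form avoids the exponential blow-up of naive recursion and the call-depth limit that a memoized recursive version would hit for large `n`.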

Reason · Difficulty 1 · reason-math-1

Two trains, opposite directions, given speeds and start times — when do they meet?
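The arithmetic behind this item can be sketched as follows, assuming the trains start at opposite ends of a single track of length `distance` and head toward each other at constant speeds (the actual prompt's numbers and framing are private). The later-departing train's start time is the first moment both are moving; after that, the remaining gap closes at the combined speed. The function name `meeting_time` is hypothetical.

```python
def meeting_time(distance: float, speed_a: float, start_a: float,
                 speed_b: float, start_b: float) -> float:
    """Time at which two trains approaching each other on one track meet.

    Train A starts at position 0 and train B at position `distance`.
    Times and speeds must share units (e.g. hours and km/h).
    """
    if speed_a <= 0 or speed_b <= 0 or distance < 0:
        raise ValueError("speeds must be positive and distance non-negative")
    t0 = max(start_a, start_b)  # first moment both trains are moving
    # Distance covered by the earlier train before the later one departs
    # (the later train's term is zero because t0 equals its start time).
    head_start = speed_a * (t0 - start_a) + speed_b * (t0 - start_b)
    gap = distance - head_start
    if gap <= 0:
        # The earlier train reached the far station before the other left,
        # so they meet at that station when the earlier train arrives.
        early_speed, early_start = (
            (speed_a, start_a) if start_a <= start_b else (speed_b, start_b)
        )
        return early_start + distance / early_speed
    # From t0 onward the gap shrinks at the combined closing speed.
    return t0 + gap / (speed_a + speed_b)


# 360 km apart; A leaves at t=0 at 60 km/h, B leaves at t=1 at 90 km/h.
print(meeting_time(360, 60, 0, 90, 1))  # 3.0
```

In the example, A covers 60 km before B departs, leaving 300 km to close at 150 km/h, which takes 2 hours after t=1.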

Write · Difficulty 1 · write-tagline-1

Write 5 one-line taglines for a benchmark tool. <=10 words each, no numbering.
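The page does not say how this item is graded. As a hypothetical illustration only, the format constraints (five lines, at most 10 words each, no numbering) could be checked mechanically as below; `check_taglines` is an invented name, words are assumed to be whitespace-separated tokens, and "numbering" is assumed to mean a leading `1.` or `1)` followed by a space.

```python
import re


def check_taglines(response: str, expected: int = 5, max_words: int = 10) -> list[str]:
    """Return the constraint violations found; an empty list means the format passes."""
    problems = []
    lines = [line.strip() for line in response.splitlines() if line.strip()]
    if len(lines) != expected:
        problems.append(f"expected {expected} taglines, got {len(lines)}")
    for i, line in enumerate(lines, start=1):
        # Leading "1. " or "2) " is treated as numbering.
        if re.match(r"\d+[.)]\s", line):
            problems.append(f"line {i} is numbered")
        words = len(line.split())
        if words > max_words:
            problems.append(f"line {i} has {words} words (max {max_words})")
    return problems


good = "\n".join([
    "Measure models, not marketing.",
    "Benchmarks you can rerun tomorrow.",
    "Scores that survive daily prompt rotation.",
    "One pipeline, six categories, no guesswork.",
    "Know which model fits before you ship.",
])
print(check_taglines(good))  # []
print(check_taglines("1. Fast\n2. Fair"))
# ['expected 5 taglines, got 2', 'line 1 is numbered', 'line 2 is numbered']
```

A check like this only covers the format rules; judging whether the taglines are any good would need a separate (likely human or model-based) rubric.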

Tool Use · Difficulty 2 · tool-schema-1

Given an OpenAPI schema with limit/offset/sort, fill JSON for 'next 50, recent first.'
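A sketch of the expected tool-call arguments, under assumptions the page does not confirm: the sort field is called `created_at`, the API marks descending order with a leading `-`, and the caller has already received the first 50 items. All three (and the helper name `next_page_params`) are hypothetical; a real answer must follow whatever the provided schema actually declares.

```python
import json


def next_page_params(consumed: int, page_size: int = 50,
                     sort_field: str = "created_at") -> dict:
    """Build limit/offset/sort arguments for the next page, newest items first."""
    return {
        "limit": page_size,        # "next 50"
        "offset": consumed,        # skip the items already returned
        "sort": f"-{sort_field}",  # hypothetical convention: "-" prefix means descending
    }


print(json.dumps(next_page_params(consumed=50)))
# {"limit": 50, "offset": 50, "sort": "-created_at"}
```

The point of the item is mapping the natural-language request onto the schema's own parameter names and value formats, so the sort string in particular would change if the schema uses, say, separate `sort_by` and `order` fields.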

RAG · Difficulty 2 · rag-grounding-1

The context does not contain the answer: does the model fabricate one, or correctly say the context doesn't contain it?
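How this item is scored is not described on the page. As a hypothetical illustration of the pass/fail distinction it tests, a crude keyword heuristic could label a response as an abstention or as an answer (treated here as a likely fabrication). The marker phrases and the name `grade_unanswerable` are invented for this sketch; a production grader would more plausibly use a rubric or a judge model.

```python
# Hypothetical phrases that signal the model declined to answer from the context.
ABSTAIN_MARKERS = (
    "not in the provided context",
    "context does not",
    "context doesn't",
    "cannot answer",
    "can't answer",
    "not enough information",
)


def grade_unanswerable(response: str) -> str:
    """Classify a response to a question whose answer is absent from the context."""
    # Lower-case and normalize curly apostrophes so "doesn\u2019t" matches "doesn't".
    text = response.lower().replace("\u2019", "'")
    if any(marker in text for marker in ABSTAIN_MARKERS):
        return "abstained"  # desired behaviour for this item
    return "answered"       # no abstention signal: treated as a likely fabrication


print(grade_unanswerable("The provided context doesn't say when it launched."))  # abstained
print(grade_unanswerable("It launched in March 2019."))  # answered
```

Keyword matching is brittle in both directions (hedged fabrications can slip through, unusual refusals can be missed), which is why it is shown here only to make the two outcomes concrete.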