PipelineScore
Google

Gemini 2.5 Pro

Released 2026-01-28 · Context 2,000K tokens · gemini-2-5-pro · Lab Verified
PipelineScore: 85.9 (MAINLINE)

Category breakdown

Score per category, normalized 0–100 against the v1 anchor.

| Category | Score |
|---|---|
| Code | 84.4 |
| Reason | 87.7 |
| Write | 85.2 |
| Tool Use | 84.1 |
| RAG | 91.8 |
| Speed | 82.9 |

Strengths

RAG: 91.8
Reason: 87.7
Write: 85.2
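The page does not say how Strengths are chosen, but the three listed are exactly the three highest category scores above. A minimal sketch, assuming that rule; the `top_strengths` helper is hypothetical, not part of PipelineScore:

```python
# Hypothetical: assumes "Strengths" lists the k highest-scoring categories.
CATEGORY_SCORES = {
    "Code": 84.4, "Reason": 87.7, "Write": 85.2,
    "Tool Use": 84.1, "RAG": 91.8, "Speed": 82.9,
}

def top_strengths(scores: dict[str, float], k: int = 3) -> list[tuple[str, float]]:
    """Return the k highest-scoring (category, score) pairs, best first."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:k]

print(top_strengths(CATEGORY_SCORES))
# → [('RAG', 91.8), ('Reason', 87.7), ('Write', 85.2)]
```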

Sample tasks

A taste of what the test pack measures. Full prompts are private and rotated daily.

Code · Difficulty 1 · code-fib-1

Fibonacci function

Write a Python function `fib(n)` that returns the nth Fibonacci number in O(n) time.
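One straightforward O(n) answer is an iterative loop that carries two consecutive values forward. This is a sketch, not the test pack's reference solution, and it assumes the common indexing convention fib(0) = 0, fib(1) = 1, which the task summary does not specify:

```python
def fib(n: int) -> int:
    """Return the nth Fibonacci number (fib(0) == 0, fib(1) == 1) using n loop steps."""
    if n < 0:
        raise ValueError("n must be non-negative")
    a, b = 0, 1  # invariant: a == fib(i), b == fib(i + 1)
    for _ in range(n):
        a, b = b, a + b
    return a

print([fib(i) for i in range(10)])  # → [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```

Unlike the naive double recursion, which takes exponential time, each step here reuses the previous two values, so the loop runs exactly n times.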

Reason · Difficulty 1 · reason-math-1

Train meeting time

Two trains travel toward each other from opposite directions, with given speeds and start times. When do they meet?
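The task's real numbers are private, so this worked sketch uses made-up values and assumes the trains leave stations a known distance D apart. Once both are moving, their combined distance equals the gap, v_a(t − t_a) + v_b(t − t_b) = D, which gives t = (D + v_a·t_a + v_b·t_b) / (v_a + v_b). The `meeting_time` name is hypothetical:

```python
def meeting_time(distance: float, speed_a: float, start_a: float,
                 speed_b: float, start_b: float) -> float:
    """Clock time (hours) at which two trains heading toward each other meet.

    Train A leaves its station at start_a; train B leaves the other station,
    `distance` away, at start_b. Speeds share the distance's unit per hour.
    """
    # With both trains moving (t >= both start times):
    # speed_a * (t - start_a) + speed_b * (t - start_b) == distance
    t = (distance + speed_a * start_a + speed_b * start_b) / (speed_a + speed_b)
    if t < max(start_a, start_b):
        raise ValueError("the earlier train covers the whole distance before the other departs")
    return t

# Made-up inputs: 300 km apart; A leaves 9:00 at 60 km/h, B leaves 10:00 at 90 km/h.
t = meeting_time(300, 60, 9.0, 90, 10.0)
print(f"{int(t)}:{round((t % 1) * 60):02d}")  # → 11:36
```

As a check, by 11.6 h train A has run 2.6 h × 60 = 156 km and train B 1.6 h × 90 = 144 km, which sum to 300 km.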

Write · Difficulty 1 · write-tagline-1

Five distinct taglines

Write five one-line taglines for a benchmark tool: at most 10 words each, no numbering.
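The stated constraints are mechanical enough to check in code. The sketch below assumes a hypothetical `check_taglines` helper and reads "distinct" as "no case-insensitive exact duplicates"; the real grader's rules are not published, and the sample taglines are made up:

```python
import re

# Hypothetical: matches leading "1.", "2)", "(3)", "4:" followed by whitespace.
NUMBERING = re.compile(r"^\(?\d+[.):]\)?\s")

def check_taglines(text: str, count: int = 5, max_words: int = 10) -> list[str]:
    """Return a list of rule violations; an empty list means the answer passes."""
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    problems = []
    if len(lines) != count:
        problems.append(f"expected {count} lines, got {len(lines)}")
    if len({line.lower() for line in lines}) != len(lines):
        problems.append("taglines are not distinct")
    for line in lines:
        if NUMBERING.match(line):
            problems.append(f"numbered: {line!r}")
        if len(line.split()) > max_words:
            problems.append(f"over {max_words} words: {line!r}")
    return problems

sample = """Measure what your pipeline really does
Benchmarks that match production, not press releases
One score, six categories, zero guesswork
Rotate prompts daily, trust results always
Know your model before your users do"""
print(check_taglines(sample))  # → []
```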

Tool Use · Difficulty 2 · tool-schema-1

OpenAPI param selection

Given an OpenAPI schema with `limit`, `offset`, and `sort` parameters, fill in the JSON arguments for the request "next 50, most recent first."
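The schema itself is private, so this sketch rests on hypothetical conventions: `offset` counts items already fetched, `sort` takes a field name with a leading `-` for descending order, and records carry a `created_at` timestamp. Under those assumptions, the expected arguments could be built like this (`next_page_params` is an illustrative name):

```python
import json

def next_page_params(already_fetched: int, page_size: int = 50,
                     sort_field: str = "created_at") -> dict:
    """Build limit/offset/sort arguments for the next page, newest first.

    Hypothetical convention: a leading "-" on `sort` means descending.
    """
    return {
        "limit": page_size,         # "next 50"
        "offset": already_fetched,  # skip the items the caller has already seen
        "sort": f"-{sort_field}",   # "recent first" = descending by timestamp
    }

print(json.dumps(next_page_params(already_fetched=50)))
# → {"limit": 50, "offset": 50, "sort": "-created_at"}
```

Keeping `offset` tied to the count already fetched avoids off-by-one-page errors when the first request starts at 0.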

RAG · Difficulty 2 · rag-grounding-1

Refuses to fabricate

The provided context does not contain the answer. Does the model fabricate one, or correctly say it cannot answer?
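How the pack grades this task is not published. As a rough illustration only, a keyword heuristic can flag answers that admit the context lacks the information. The patterns and the `declines_to_answer` name are hypothetical, and a real grader would more likely use a rubric or a judge model, since phrase matching misses paraphrased refusals:

```python
import re

# Hypothetical refusal markers, matched against a lowercased answer.
REFUSAL_PATTERNS = [
    r"\b(?:does not|doesn't|do not|don't)\s+(?:contain|mention|say|include|provide)\b",
    r"\bnot (?:in|mentioned in|stated in|found in) the (?:context|documents?|passages?)\b",
    r"\b(?:cannot|can't|unable to) (?:find|determine|answer)\b",
]

def declines_to_answer(answer: str) -> bool:
    """True if the answer appears to admit the context lacks the information."""
    text = answer.lower()
    return any(re.search(pattern, text) for pattern in REFUSAL_PATTERNS)

print(declines_to_answer("The provided context doesn't mention the launch date."))  # → True
print(declines_to_answer("The launch date was March 3, 2021."))  # → False
```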
