OpenAI
GPT-5.5
Released 2026-03-02 · Context 400K · gpt-5-5 · Lab Verified
Pipeline score: 88.7 · MAINLINE
Category breakdown
Score per category, normalized 0–100 against the v1 anchor.
Code: 91.0
Reason: 90.4
Write: 87.8
Tool Use: 88.6
RAG: 86.2
Speed: 87.5
Strengths
Code: 91.0
Reason: 90.4
Tool Use: 88.6
Sample tasks
A taste of what the test pack measures. Full prompts are private and rotated daily.
Code · Difficulty 1 · code-fib-1
Fibonacci function
Write a Python `fib(n)` returning the nth Fibonacci number, O(n).
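For reference, a minimal O(n) answer to this task might look like the sketch below. It assumes the common indexing convention fib(0) = 0, fib(1) = 1; the actual prompt and grading criteria are private.

```python
def fib(n: int) -> int:
    """Return the nth Fibonacci number (fib(0) == 0, fib(1) == 1) in O(n) time."""
    if n < 0:
        raise ValueError("n must be non-negative")
    a, b = 0, 1  # invariant: a == fib(i), b == fib(i + 1)
    for _ in range(n):
        a, b = b, a + b
    return a


print([fib(i) for i in range(10)])  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```

Keeping only the last two values gives linear time and constant memory, unlike the naive recursive version, which takes exponential time.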
Reason · Difficulty 1 · reason-math-1
Train meeting time
Two trains travel in opposite directions, with given speeds and start times. When do they meet?
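The train task reduces to a closing-speed calculation. The sketch below assumes the trains leave two stations a known distance apart and head toward each other; the real prompt's framing and numbers are private, and `meeting_time` is a hypothetical helper. Once both trains are moving, the remaining gap closes at the sum of their speeds.

```python
def meeting_time(distance: float, speed_a: float, start_a: float,
                 speed_b: float, start_b: float) -> float:
    """Time at which two trains heading toward each other meet.

    Hypothetical setup: train A leaves station A at `start_a`, train B leaves
    station B at `start_b`, and the stations are `distance` apart. Units are
    assumed consistent (e.g. km, km/h, hours).
    """
    if speed_a <= 0 or speed_b <= 0:
        raise ValueError("speeds must be positive")
    t0 = max(start_a, start_b)  # from t0 on, both trains are moving
    # Gap left at t0, after the earlier train's head start.
    gap = distance - speed_a * (t0 - start_a) - speed_b * (t0 - start_b)
    if gap < 0:
        raise ValueError("earlier train arrives before the other one departs")
    # With both trains moving, the gap closes at their combined speed.
    return t0 + gap / (speed_a + speed_b)


# Stations 360 km apart: A leaves at 8:00 (60 km/h), B leaves at 9:00 (90 km/h).
print(meeting_time(360, 60, 8.0, 90, 9.0))  # 11.0
```

In the example, A has a 60 km head start by 9:00. The remaining 300 km closes at 150 km/h in 2 hours, so the trains meet at 11:00.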
Write · Difficulty 1 · write-tagline-1
Five distinct taglines
Write five one-line taglines for a benchmark tool: at most 10 words each, no numbering.
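Most of this task's constraints are mechanical and can be checked in code. The sketch below assumes one tagline per non-empty line; `check_taglines` is a hypothetical helper, not the benchmark's grader. Its distinctness test only catches case-insensitive exact duplicates, which is far weaker than what a human reviewer would mean by "distinct".

```python
import re


def check_taglines(text: str, expected: int = 5, max_words: int = 10) -> list[str]:
    """Return constraint violations; an empty list means the output passes."""
    problems = []
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    if len(lines) != expected:
        problems.append(f"expected {expected} lines, got {len(lines)}")
    for i, line in enumerate(lines, start=1):
        # Flags "1. ", "2) ", "- " or "* " prefixes as numbering or bullets.
        if re.match(r"^(\d+[.)]|[-*])\s", line):
            problems.append(f"line {i} looks numbered or bulleted")
        if len(line.split()) > max_words:
            problems.append(f"line {i} has more than {max_words} words")
    # Crude distinctness: exact duplicates after lowercasing.
    if len({line.lower() for line in lines}) != len(lines):
        problems.append("taglines are not distinct")
    return problems


sample = (
    "Measure what matters.\n"
    "Benchmarks you can trust.\n"
    "Fast, fair, reproducible scores.\n"
    "See models clearly.\n"
    "Test before you ship."
)
print(check_taglines(sample))  # []
```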
Tool Use · Difficulty 2 · tool-schema-1
OpenAPI param selection
Given an OpenAPI schema with limit, offset, and sort parameters, fill in the JSON for the request 'next 50, recent first.'
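A correct answer to this task is a small JSON object. The sketch below assumes conventions the private test schema may not share: `offset` counts items already returned, `limit` caps the page size, and `sort` takes a field name where a leading `-` means descending. The helper `next_page_params` and the field `created_at` are hypothetical.

```python
import json


def next_page_params(already_fetched: int, count: int = 50,
                     sort_field: str = "created_at") -> dict:
    # Hypothetical schema conventions (the real test-pack schema is private):
    # - `offset` skips items already returned; `limit` caps the page size
    # - `sort` takes a field name, and a leading "-" means descending order
    if already_fetched < 0 or count <= 0:
        raise ValueError("already_fetched must be >= 0 and count > 0")
    return {"limit": count, "offset": already_fetched, "sort": f"-{sort_field}"}


# After reading the first 50 results, request the next 50, newest first.
print(json.dumps(next_page_params(50)))
# {"limit": 50, "offset": 50, "sort": "-created_at"}
```

The difficulty lies in the mapping rather than the arithmetic. "Next 50" sets both `limit` and `offset`, and "recent first" has to become a descending sort on whichever timestamp field the schema exposes.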
RAG · Difficulty 2 · rag-grounding-1
Refuses to fabricate
The context lacks the answer. Does the model fabricate one, or correctly say it can't answer?
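One crude way to score this task is to scan the answer for phrases that signal abstention. The sketch below is only a keyword heuristic, and the phrase list is hypothetical; how the benchmark actually grades grounding is not stated. It misses paraphrased refusals. It also passes an answer that says "not mentioned" and then guesses anyway.

```python
import re

# Hypothetical phrases signalling that the model declines to answer from context.
ABSTAIN_PATTERNS = [
    r"\bnot (?:mentioned|stated|provided|specified)\b",
    r"\b(?:does not|doesn't) (?:say|mention|contain|include)\b",
    r"\b(?:can't|cannot|unable to) (?:answer|determine|find)\b",
    r"\bno information\b",
]


def abstains(answer: str) -> bool:
    """Crude check: does the answer say the context lacks the information?"""
    text = answer.lower()
    return any(re.search(pattern, text) for pattern in ABSTAIN_PATTERNS)


print(abstains("The provided context does not mention the release year."))  # True
print(abstains("It was released in 2019."))  # False
```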