alibaba

Qwen 2.5 Coder 1.5B

Released 2024-11-12 · Context 131.072K · qwen-2-5-coder-1-5b

PipelineScore

53.5 TAP

An even spread: no standout, no liability (51 to 56 across all five categories). Best-fit profile: Local-first.

Category breakdown

Score per category, normalized 0–100 against the v1 anchor.

Code: 51.3
Reason: 53.5
Tool Use: 51.3
RAG: 56.1
Speed: 53.5

Every submission of Qwen 2.5 Coder 1.5B on the 0–100 scale. The spread is the point: where it runs changes what you get.

Best 55.0 on m2-air-16gb · lowest 48.4 on snapdragon-x-elite-32gb · spread 6.6 pts across 4 runs.
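The spread figure is simply best minus lowest. A minimal sketch of that arithmetic follows; `score_spread` is a hypothetical helper, and only the two scores stated above are used because the other two runs are not listed here.

```python
def score_spread(scores: list[float]) -> tuple[float, float, float]:
    """Return (best, lowest, spread) for a list of 0-100 scores."""
    if not scores:
        raise ValueError("need at least one score")
    best, lowest = max(scores), min(scores)
    # Round to one decimal to match the precision shown on the card.
    return best, lowest, round(best - lowest, 1)


# Only the best and lowest runs are known from the text above.
best, lowest, spread = score_spread([55.0, 48.4])
print(f"best {best} · lowest {lowest} · spread {spread} pts")
```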

A taste of what the test pack measures. Full prompts are private and rotated daily.

Code · Difficulty 1 · code-fib-1

Write a Python `fib(n)` returning the nth Fibonacci number, O(n).
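For reference, one answer that would likely satisfy this prompt is sketched below. The indexing convention (`fib(0) == 0`, `fib(1) == 1`) and the rejection of negative input are assumptions, since the prompt does not specify either.

```python
def fib(n: int) -> int:
    """Return the nth Fibonacci number, assuming fib(0) == 0 and fib(1) == 1."""
    if n < 0:
        raise ValueError("n must be non-negative")
    a, b = 0, 1  # a holds fib(i), b holds fib(i + 1)
    for _ in range(n):
        a, b = b, a + b  # advance one step; n iterations in total
    return a
```

The loop performs n additions and keeps only two values, so it runs in O(n) steps without recursion or a cache.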

Reason · Difficulty 1 · reason-math-1

Two trains, opposite directions, given speeds and start times — when do they meet?
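The underlying arithmetic can be sketched as follows. The setup is an assumption, since the full prompt is private: two trains start `distance` apart on the same line, head toward each other at constant speeds, and may depart at different times. Every name here is hypothetical.

```python
def meeting_time(distance: float,
                 speed_a: float, start_a: float,
                 speed_b: float, start_b: float) -> float:
    """Return the clock time at which two approaching trains meet.

    Solves speed_a * (t - start_a) + speed_b * (t - start_b) = distance for t.
    Units must be consistent, e.g. km, km/h, and hours.
    """
    if distance < 0 or speed_a <= 0 or speed_b <= 0:
        raise ValueError("distance must be >= 0 and speeds must be > 0")
    t = (distance + speed_a * start_a + speed_b * start_b) / (speed_a + speed_b)
    if t < max(start_a, start_b):
        # The formula assumes both trains are moving when they meet.
        raise ValueError("the earlier train covers the whole distance "
                         "before the other departs")
    return t
```

With illustrative values of 300 km between the trains, 60 km/h from hour 0, and 90 km/h from hour 1, the equation gives t = 390 / 150 = 2.6 hours: the first train has covered 156 km and the second 144 km.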

RAG · Difficulty 2 · rag-extract-1

From the context, extract net sales, operating margin, and free cash flow as a JSON object. Numbers only.
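A check along these lines could validate a model's answer to this kind of prompt. It is a sketch only: the key names are hypothetical, and the real grader is not public.

```python
import json

# Hypothetical key names; the actual prompt may expect different ones.
REQUIRED_KEYS = ("net_sales", "operating_margin", "free_cash_flow")


def parse_extraction(raw: str) -> dict:
    """Parse a model reply and require exactly three numeric fields."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on bad JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    if set(data) != set(REQUIRED_KEYS):
        raise ValueError(f"expected exactly these keys: {REQUIRED_KEYS}")
    for key in REQUIRED_KEYS:
        value = data[key]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise ValueError(f"{key} must be a number, got {value!r}")
    return data
```

The "numbers only" rule is what the type check enforces: a reply such as `"1.2B"` or `"18%"` is a string and would be rejected.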

Tool Use · Difficulty 2 · tool-schema-1

Given an OpenAPI schema with limit/offset/sort, fill JSON for 'next 50, recent first.'
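One plausible reading of "next 50, recent first" is sketched below. It assumes the previous page started at offset 0 with 50 items, and that the schema sorts newest-first with a `-created_at` value; both the field name and the `-` prefix are hypothetical, since the actual schema is not shown.

```python
import json


def next_page_params(prev_offset: int, prev_limit: int,
                     page_size: int = 50,
                     sort: str = "-created_at") -> dict:
    """Build limit/offset/sort parameters for the page after the previous one."""
    if prev_offset < 0 or prev_limit < 0 or page_size <= 0:
        raise ValueError("offsets and limits must be non-negative, page_size > 0")
    return {
        "limit": page_size,                  # "50"
        "offset": prev_offset + prev_limit,  # "next": skip what was already seen
        "sort": sort,                        # "recent first" (assumed convention)
    }


print(json.dumps(next_page_params(prev_offset=0, prev_limit=50)))
```

The part models tend to miss is "next": a non-zero offset is required, not just `limit: 50`.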

RAG · Difficulty 2 · rag-grounding-1

Context lacks the answer — does the model fabricate or correctly say it can't?