huggingface

SmolLM2 1.7B

Released 2024-11-04 · Context 8.192K · smollm2-1-7b

PipelineScore: 42.0 TAP

An even spread: no standout, no liability (39 to 46 across all five categories). Best-fit profile: Coding.

Category breakdown

Score per category, normalized 0–100 against the v1 anchor.

Code 45.5 · Reason 41.5 · Tool Use 39.3 · RAG 44.7 · Speed 41.7
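The summary claims can be checked directly against the five category scores. The snippet below is only an illustrative sketch: it recomputes the range and the ranking from the published numbers and makes no assumption about how PipelineScore itself is derived.

```python
# Category scores exactly as listed in the breakdown.
scores = {"Code": 45.5, "Reason": 41.5, "Tool Use": 39.3, "RAG": 44.7, "Speed": 41.7}

lowest = min(scores.values())
highest = max(scores.values())
# Category names ordered from strongest to weakest.
ranked = sorted(scores, key=scores.get, reverse=True)

print(f"range {lowest} to {highest}")
print("ranked:", ", ".join(ranked))
```

All five scores fall inside the quoted 39 to 46 band, and Code ranks first, which is consistent with the Coding best-fit profile.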

Every submission of SmolLM2 1.7B on the 0–100 scale. The spread is the point: where it runs changes what you get.

Best 43.0 on m2-air-16gb · lowest 40.6 on lab-macbook-pro-m3-pro-18gb · spread 2.4 pts across 4 runs.
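The spread figure appears to be plain arithmetic on the per-run scores. A minimal sketch, assuming spread means best score minus lowest score; the function name `score_spread` is hypothetical, and only the two extreme scores are used because the two middle runs are not listed here.

```python
def score_spread(scores):
    """Return (best, lowest, spread) for a list of run scores, spread rounded to 0.1."""
    best, lowest = max(scores), min(scores)
    return best, lowest, round(best - lowest, 1)

# Only the two extremes quoted above; the other two runs are not published on this page.
best, lowest, spread = score_spread([43.0, 40.6])
print(f"best {best} · lowest {lowest} · spread {spread} pts")
```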

A taste of what the test pack measures. Full prompts are private and rotated daily.

Code · Difficulty 1 · code-fib-1

Write a Python `fib(n)` returning the nth Fibonacci number, O(n).
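For orientation, one answer that would satisfy this prompt is sketched below. It is not the benchmark's reference solution, and it assumes 0-based indexing with `fib(0) == 0` and `fib(1) == 1`, which the prompt does not specify.

```python
def fib(n: int) -> int:
    """Return the nth Fibonacci number using n loop iterations and O(1) extra space."""
    if n < 0:
        raise ValueError("n must be non-negative")
    a, b = 0, 1  # fib(0), fib(1) under the assumed indexing
    for _ in range(n):
        a, b = b, a + b
    return a
```

The iterative form avoids the exponential blow-up of naive recursion, which is what the O(n) requirement is probing.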

Reason · Difficulty 1 · reason-math-1

Two trains, opposite directions, given speeds and start times — when do they meet?
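The arithmetic behind this kind of question can be sketched as follows. Everything here is an assumption for illustration: the trains head toward each other at constant speeds, the distance between their starting points is known, and the numbers in the example are hypothetical, not taken from the private prompt.

```python
def meeting_time(distance_km, speed_a, start_a, speed_b, start_b):
    """Hours after time 0 at which two trains heading toward each other meet.

    Assumes constant speeds in km/h, start times in hours, and a known
    distance between the two starting points.
    """
    if speed_a <= 0 or speed_b <= 0:
        raise ValueError("speeds must be positive")
    # They meet when speed_a*(t - start_a) + speed_b*(t - start_b) == distance_km.
    t = (distance_km + speed_a * start_a + speed_b * start_b) / (speed_a + speed_b)
    if t < max(start_a, start_b):
        raise ValueError("the earlier train covers the whole distance before the other departs")
    return t

# Hypothetical numbers: 300 km apart, A leaves at hour 0 at 60 km/h, B at hour 1 at 90 km/h.
print(meeting_time(300, 60, 0, 90, 1))  # 2.6
```

In the example, A covers 156 km in 2.6 hours and B covers 144 km in 1.6 hours, which together make up the 300 km.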

RAG · Difficulty 2 · rag-extract-1

From the context, extract net sales, operating margin, and free cash flow as a JSON object. Numbers only.
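A response to this prompt could be checked for shape along these lines. This is only a sketch: the key names and the function `is_valid_extraction` are hypothetical, and the actual scoring used by the test pack is not public.

```python
import json
from numbers import Real

# Hypothetical key names; the real prompt and its expected keys are private.
REQUIRED_KEYS = ("net_sales", "operating_margin", "free_cash_flow")

def is_valid_extraction(raw: str) -> bool:
    """True if raw parses as a JSON object whose required keys all hold plain numbers."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return False
    if not isinstance(data, dict):
        return False
    return all(
        isinstance(data.get(key), Real) and not isinstance(data.get(key), bool)
        for key in REQUIRED_KEYS
    )
```

A check like this rejects answers that wrap numbers in strings such as `"$1,200M"`, which is the failure the "Numbers only" instruction targets.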

Tool Use · Difficulty 2 · tool-schema-1

Given an OpenAPI schema with limit/offset/sort, fill JSON for 'next 50, recent first.'
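The expected output for a request like this might look like the sketch below. It assumes the previous page started at offset 0 and held 50 items; the helper `next_page_params` and the sort value `-created_at` are hypothetical stand-ins, since the real schema is not shown.

```python
import json

def next_page_params(previous_offset: int, page_size: int, sort: str) -> dict:
    """Build query parameters for the page that follows the one at previous_offset."""
    return {"limit": page_size, "offset": previous_offset + page_size, "sort": sort}

# "-created_at" is a placeholder for whatever the schema uses to mean newest first.
print(json.dumps(next_page_params(0, 50, "-created_at")))
```

The point being tested is whether the model maps "next 50" to both `limit` and `offset`, and "recent first" to the schema's sort parameter, rather than filling only one of them.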

RAG · Difficulty 2 · rag-grounding-1

Context lacks the answer — does the model fabricate or correctly say it can't?