← Leaderboard FREE · OPEN SOURCE · ~20 MIN

Run it. Get
your score.

Download the harness, point it at your agents, and run it. Your results go live on the leaderboard automatically — no account required upfront.

Download the harness

⬇ Detecting platform...

🍎 macOS · 🪟 Windows · 🐧 Linux · 🐍 Python script

No install needed — just run it At least 2 agents (any provider) ~20 min to run Results open in browser automatically

How it works

Configure your team

Edit team.yaml with your agent names, models, and endpoints. Mix local (Ollama) and cloud (OpenAI, Anthropic) freely.

team:
  name: "My Stack"
  owner_email: "[email protected]"
agents:
  - name: Researcher
    provider: ollama
    model: llama3:8b
    endpoint: http://localhost:11434

Run the harness

One command. The harness runs all 10 tasks across your agent team, scores them, and submits. First run asks for your name and email — that's it.

pip install -r requirements.txt
python3 harness.py -c team.yaml

Get your results URL

When the run finishes, the harness prints your score and a unique results link. Share it, compare to others, or add notes.

✅ Done! Pipeline Score: 84/100
📊 Your results: pipelinescore.ai/r/abc123

🔗

Your result is a shareable page

Every run gets its own URL — full scorecard, hardware info, all 10 task scores. Post it, share it, compare it.

pipelinescore.ai/r/your-run-id

Common questions

Do I need an account to run this?

No. The harness asks for your team name and email on first run and handles everything from there. No signup page, no dashboard, no password.

What does it cost?

The harness itself is free. Your only cost is inference — whatever you pay to run your models. A full run with local models costs nothing beyond electricity.

Can I run it on cloud-only agents?

Yes. Anthropic, OpenAI, and OpenRouter are all supported out of the box. Set provider and api_key_env in team.yaml — the harness does the rest.

How long does a run take?

~20 minutes for a mixed local/cloud stack. Faster with cloud-only agents (5–10 min). Slower with large local models (30–45 min).

What if my agents are on different machines?

Each agent in team.yaml has its own endpoint field. Point each agent at the right host — the harness handles routing to each one independently.

Run it. Getyour score.

Your result is a shareable page

Run it. Get
your score.