About AI Model Benchmarks
How It Works
This dashboard aggregates benchmark data for ~30 AI models from multiple authoritative sources:
- Artificial Analysis — Performance metrics (speed, latency, throughput)
- LMSys Arena — Human preference rankings and Elo scores
- Official Model Cards — Standardized benchmark scores published by providers
- Independent Evaluations — Third-party benchmarks like MMLU-Pro, GPQA, LiveCodeBench
Data Format
All model data is stored as individual Markdown files with YAML frontmatter in content/models/. This makes it easy to add, update, or remove models by simply editing text files. The dashboard reads this data at build time.
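At build time this amounts to a standard Nuxt Content v3 collection query. A minimal sketch, assuming the collection is named `models` (the actual collection name and fields come from content.config.ts):

```vue
<script setup lang="ts">
// Minimal sketch: fetch every document in the `models` collection.
// `queryCollection` and `useAsyncData` are Nuxt auto-imports; the
// collection name `models` is an assumption based on content/models/.
const { data: models } = await useAsyncData('models', () =>
  queryCollection('models').all()
)
</script>
```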
Scores Explained
- Intelligence Score — A composite metric (0-100) reflecting overall reasoning and knowledge capability across standard benchmarks (a weighting sketch follows this list).
- Coding Score — Derived from HumanEval, LiveCodeBench, and related coding benchmarks.
- MMLU — Massive Multitask Language Understanding: 57 subjects from STEM to the humanities.
- GPQA — Graduate-level, PhD-style science questions in physics, chemistry, and biology.
- MATH-500 — Competition-level mathematics problems across algebra, geometry, number theory, and more.
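The exact formula behind the composite scores is not documented here. Purely as a hypothetical illustration of how a 0-100 composite could be assembled from normalized benchmark scores (the weights and field names below are invented for the example):

```ts
// Hypothetical sketch only: the real weighting is not documented here.
// Assumes each benchmark score is already normalized to a 0-100 scale.
const WEIGHTS: Record<string, number> = {
  mmlu: 0.4, // invented weight
  gpqa: 0.3, // invented weight
  math500: 0.3, // invented weight
}

function compositeScore(scores: Record<string, number>): number {
  let total = 0
  let weightSum = 0
  for (const [name, weight] of Object.entries(WEIGHTS)) {
    const score = scores[name]
    if (score !== undefined) {
      // Skip benchmarks a model was not evaluated on and
      // renormalize over the weights that remain.
      total += score * weight
      weightSum += weight
    }
  }
  return weightSum > 0 ? total / weightSum : 0
}

// Example: compositeScore({ mmlu: 85, gpqa: 60, math500: 90 }) returns 79.
```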
Adding / Editing Models
To add a new model, create a new Markdown file in content/models/ following the frontmatter schema defined in content.config.ts. The model will automatically appear on the dashboard after the next rebuild.
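A hypothetical model file could look like this (every field name is illustrative; the authoritative shape is whatever content.config.ts declares):

```md
---
# content/models/example-model.md (field names are illustrative only)
name: Example Model
provider: Example Labs
intelligenceScore: 72
codingScore: 68
benchmarks:
  mmlu: 85.1
  gpqa: 61.3
  math500: 88.0
---

Optional Markdown body with free-form notes about the model.
```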
To update a model's benchmarks, edit its YAML frontmatter values and rebuild.
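For reference, the corresponding collection definition might be sketched as follows. The `defineContentConfig`/`defineCollection`/`z` API is Nuxt Content v3's; all field names and constraints below are assumptions, not the project's actual schema:

```ts
// content.config.ts: a sketch, not the project's actual schema.
import { defineContentConfig, defineCollection, z } from '@nuxt/content'

export default defineContentConfig({
  collections: {
    models: defineCollection({
      type: 'page',
      source: 'models/*.md',
      // Field names below are assumptions for illustration.
      schema: z.object({
        name: z.string(),
        provider: z.string(),
        intelligenceScore: z.number().min(0).max(100),
        codingScore: z.number().min(0).max(100),
        benchmarks: z.record(z.number()).optional(),
      }),
    }),
  },
})
```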
Technical Stack
- Nuxt 4 + Nuxt Content v3
- Vue 3 Composition API (script setup)
- TypeScript (strict mode)
- Tailwind CSS v4
- SQLite (via the native node:sqlite module in Node 22+)
- Static Site Generation (SSG)