About AI Model Benchmarks

How It Works

This dashboard aggregates benchmark data for ~30 AI models from multiple authoritative sources:

  • Artificial Analysis — Performance metrics (speed, latency, throughput)
  • LMSys Arena — Human preference rankings and Elo scores
  • Official Model Cards — Standardized benchmark scores published by providers
  • Independent Evaluations — Third-party benchmarks like MMLU-Pro, GPQA, LiveCodeBench

Data Format

All model data is stored as individual Markdown files with YAML frontmatter in content/models/. This makes it easy to add, update, or remove models by simply editing text files. The dashboard reads this data at build time.
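
For instance, a file such as content/models/example-model.md might look like the following (the field names here are illustrative; the authoritative schema is the one defined in content.config.ts):

```md
---
name: Example Model
provider: Example Labs
intelligenceScore: 72
codingScore: 68
mmlu: 85.1
gpqa: 54.0
math500: 77.3
---

Optional Markdown body with free-form notes about the model.
```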

Scores Explained

  • Intelligence Score — A composite metric (0-100) reflecting overall reasoning and knowledge capability across standard benchmarks (see the sketch after this list).
  • Coding Score — Derived from HumanEval, LiveCodeBench, and related coding benchmarks.
  • MMLU — Massive Multitask Language Understanding, covering 57 subjects from STEM to the humanities.
  • GPQA — Graduate-level, PhD-style science questions in physics, chemistry, and biology.
  • MATH-500 — Competition-level mathematics problems across algebra, geometry, number theory, and more.
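
The exact weighting behind the two composite scores is not spelled out here; conceptually they are normalized weighted averages over the underlying benchmarks. A minimal TypeScript sketch, with entirely hypothetical weights:

```ts
// All weights are hypothetical; the dashboard's real formula may differ.
interface BenchmarkScores {
  mmlu: number    // 0-100
  gpqa: number    // 0-100
  math500: number // 0-100
}

function intelligenceScore(s: BenchmarkScores): number {
  const weights = { mmlu: 0.4, gpqa: 0.3, math500: 0.3 } // sums to 1
  const composite =
    s.mmlu * weights.mmlu + s.gpqa * weights.gpqa + s.math500 * weights.math500
  return Math.round(composite * 10) / 10 // keep one decimal place
}

// e.g. intelligenceScore({ mmlu: 85.1, gpqa: 54.0, math500: 77.3 }) === 73.4
```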

Adding / Editing Models

To add a new model, create a new Markdown file in content/models/ that follows the frontmatter schema defined in content.config.ts. The model will appear on the dashboard automatically after a rebuild.
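
As a rough sketch of what that schema can look like with Nuxt Content v3's collection API (the field names and constraints below are assumptions mirroring the example file above, not the project's actual definition):

```ts
// content.config.ts (sketch)
import { defineContentConfig, defineCollection, z } from '@nuxt/content'

export default defineContentConfig({
  collections: {
    models: defineCollection({
      type: 'page',          // Markdown files with YAML frontmatter
      source: 'models/*.md', // everything under content/models/
      schema: z.object({
        name: z.string(),
        provider: z.string(),
        intelligenceScore: z.number().min(0).max(100),
        codingScore: z.number().min(0).max(100),
        mmlu: z.number().optional(),
        gpqa: z.number().optional(),
        math500: z.number().optional(),
      }),
    }),
  },
})
```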

To update a model's benchmarks, edit its YAML frontmatter values and rebuild.

Technical Stack

  • Nuxt 4 + Nuxt Content v3
  • Vue 3 Composition API (script setup)
  • TypeScript (strict mode)
  • Tailwind CSS v4
  • SQLite (via the built-in node:sqlite module in Node 22+)
  • Static Site Generation (SSG)
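
For reference, pages read the collection through Nuxt Content v3's query API, and under SSG the query runs at build time. A minimal sketch, assuming the collection is registered as models:

```vue
<script setup lang="ts">
// Executed at build time under SSG; the result is baked into the static page.
const { data: models } = await useAsyncData('models', () =>
  queryCollection('models').order('intelligenceScore', 'DESC').all()
)
</script>
```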