About AI Model Benchmarks

How It Works

This dashboard aggregates benchmark data for ~30 AI models from multiple authoritative sources:

  • Artificial Analysis — Performance metrics (speed, latency, throughput)
  • LMSys Arena — Human preference rankings and Elo scores
  • Official Model Cards — Standardized benchmark scores published by providers
  • Independent Evaluations — Third-party benchmarks like MMLU-Pro, GPQA, LiveCodeBench

Data Format

All model data is stored as individual Markdown files with YAML frontmatter in content/models/. This makes it easy to add, update, or remove models by simply editing text files. The dashboard reads this data at build time.
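
For instance, a file such as content/models/example-model.md might look like the following (the field names here are illustrative; the authoritative schema is the one defined in content.config.ts):

```md
---
name: Example Model
provider: Example Labs
intelligenceScore: 72
codingScore: 68
mmlu: 85.1
gpqa: 54.0
math500: 77.3
---

Optional Markdown body with free-form notes about the model.
```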

Scores Explained

  • Intelligence Score — A composite metric (0-100) reflecting overall reasoning and knowledge capability across standard benchmarks (see the sketch after this list).
  • Coding Score — Derived from HumanEval, LiveCodeBench, and related coding benchmarks.
  • MMLU — Massive Multitask Language Understanding, covering 57 subjects from STEM to the humanities.
  • GPQA — Graduate-level, PhD-style science questions in physics, chemistry, and biology.
  • MATH-500 — Competition-level mathematics problems across algebra, geometry, number theory, and more.
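
The exact weighting behind the two composite scores is not spelled out here; conceptually they are normalized weighted averages over the underlying benchmarks. A minimal TypeScript sketch, with entirely hypothetical weights:

```ts
// All weights are hypothetical; the dashboard's real formula may differ.
interface BenchmarkScores {
  mmlu: number    // 0-100
  gpqa: number    // 0-100
  math500: number // 0-100
}

function intelligenceScore(s: BenchmarkScores): number {
  const weights = { mmlu: 0.4, gpqa: 0.3, math500: 0.3 } // sums to 1
  const composite =
    s.mmlu * weights.mmlu + s.gpqa * weights.gpqa + s.math500 * weights.math500
  return Math.round(composite * 10) / 10 // keep one decimal place
}

// e.g. intelligenceScore({ mmlu: 85.1, gpqa: 54.0, math500: 77.3 }) === 73.4
```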

Adding / Editing Models

To add a new model, create a new Markdown file in content/models/ that follows the frontmatter schema defined in content.config.ts. The model will appear on the dashboard automatically after a rebuild.
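
As a rough sketch of what that schema can look like with Nuxt Content v3's collection API (the field names and constraints below are assumptions mirroring the example file above, not the project's actual definition):

```ts
// content.config.ts (sketch)
import { defineContentConfig, defineCollection, z } from '@nuxt/content'

export default defineContentConfig({
  collections: {
    models: defineCollection({
      type: 'page',          // Markdown files with YAML frontmatter
      source: 'models/*.md', // everything under content/models/
      schema: z.object({
        name: z.string(),
        provider: z.string(),
        intelligenceScore: z.number().min(0).max(100),
        codingScore: z.number().min(0).max(100),
        mmlu: z.number().optional(),
        gpqa: z.number().optional(),
        math500: z.number().optional(),
      }),
    }),
  },
})
```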

To update a model's benchmarks, edit its YAML frontmatter values and rebuild.

Technical Stack

  • Nuxt 4 + Nuxt Content v3
  • Vue 3 Composition API (script setup)
  • TypeScript (strict mode)
  • Tailwind CSS v4
  • SQLite (via the built-in node:sqlite module in Node 22+)
  • Static Site Generation (SSG)
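
For reference, pages read the collection through Nuxt Content v3's query API, and under SSG the query runs at build time. A minimal sketch, assuming the collection is registered as models:

```vue
<script setup lang="ts">
// Executed at build time under SSG; the result is baked into the static page.
const { data: models } = await useAsyncData('models', () =>
  queryCollection('models').order('intelligenceScore', 'DESC').all()
)
</script>
```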