About APEX-Agents

APEX was created by Mercor to assess frontier AI models on professional tasks that would typically take seasoned practitioners hours to complete. The benchmark contains 400 test cases – 100 per domain – spanning investment banking, management consulting, big law, and primary care medicine. Each case consists of a task prompt reflecting real-world professional workflows, a set of source documents (averaging around 26,000 tokens per case), and a detailed grading rubric. The tasks and rubrics were designed by domain experts with an average of over seven years of industry experience, and each case underwent a secondary expert review for quality control.

Evaluation is rubric-based: each rubric decomposes response quality into discrete, objective criteria assessed as pass or fail – analogous to unit tests for code. Each model is given every prompt eight times, and each response is graded by a judge LM against the expert-written criteria. The final score is the mean percentage of rubric criteria satisfied across those attempts. The benchmark is designed so that tasks represent authentic, high-value deliverables – such as financial models, contract analyses, clinical assessments, and strategic recommendations – rather than abstract reasoning exercises.
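
The scoring described above can be sketched in a few lines. This is a minimal illustration, not APEX's actual grading code: the function name, data layout, and toy verdicts are assumptions; it only shows the arithmetic of averaging pass/fail criteria over repeated attempts.

```python
from statistics import mean

def score_case(criterion_results: list[list[bool]]) -> float:
    """Score one test case from judge verdicts (illustrative sketch).

    criterion_results: one inner list per model attempt (APEX uses 8),
    each entry a pass/fail verdict for one rubric criterion.
    Returns the mean percentage of criteria satisfied across attempts.
    """
    per_attempt = [100 * sum(verdicts) / len(verdicts)
                   for verdicts in criterion_results]
    return mean(per_attempt)

# Toy example: 8 attempts, each passing 3 of 4 rubric criteria
verdicts = [[True, True, False, True]] * 8
print(score_case(verdicts))  # → 75.0
```

A case-level score like this would then be averaged over all cases in a domain to produce the domain score.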