CL-bench

CL-bench, created by the Tencent Hunyuan team and Fudan University’s NLP group, probes “context learning”: whether a model can absorb genuinely new knowledge presented in its context at inference time and then reason with it. The knowledge is deliberately chosen to be absent from pretraining, including fictional legal systems, novel financial instruments, invented game rules, and empirically derived laws, so the benchmark measures learning from context rather than recall of memorized facts.

Tasks span four categories: domain knowledge reasoning, rule system application, procedural task execution, and empirical discovery & simulation.

Methodology

We source results from the public CL-bench leaderboard.

CL-bench contains 1,899 expert-authored tasks paired with 31,607 verification rubrics (about 63 rubrics per context, or roughly 17 per task), spanning 4 categories and 18 sub-categories. Scoring is binary per task: a task counts as solved only if the response passes all of its rubrics. The headline metric is the solving rate, the percentage of tasks fully solved; our chart also shows the solving rate for each category.
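To make the all-or-nothing scoring concrete, here is a minimal sketch of how a solving rate could be computed from per-rubric verdicts. It is not the official CL-bench evaluation code. The record layout (`category`, `rubric_passes`), the function name `solving_rates`, and the sample data are assumptions made for illustration.

```python
from collections import defaultdict


def solving_rates(results):
    """Return (overall, per_category) solving rates as percentages.

    `results` is a hypothetical list of task records, each with a
    'category' string and a 'rubric_passes' list of booleans
    (one verdict per rubric). This layout is an assumption.
    """
    if not results:
        raise ValueError("no task results given")

    solved = defaultdict(int)
    total = defaultdict(int)
    for task in results:
        passes = task["rubric_passes"]
        # Binary scoring: solved only if every rubric passes.
        # A task with no rubrics is treated as unsolved (an assumption).
        is_solved = bool(passes) and all(passes)
        total[task["category"]] += 1
        solved[task["category"]] += int(is_solved)

    overall = 100.0 * sum(solved.values()) / sum(total.values())
    per_category = {c: 100.0 * solved[c] / total[c] for c in total}
    return overall, per_category


# Made-up sample data, not real benchmark results.
sample = [
    {"category": "rule system application", "rubric_passes": [True, True, True]},
    {"category": "rule system application", "rubric_passes": [True, False, True]},
    {"category": "domain knowledge reasoning", "rubric_passes": [True, True]},
    {"category": "procedural task execution", "rubric_passes": [False]},
]

overall, per_category = solving_rates(sample)
print(f"overall solving rate: {overall:.1f}%")
for category, rate in sorted(per_category.items()):
    print(f"{category}: {rate:.1f}%")
```

In this sample, the second task passes two of its three rubrics but still counts as unsolved, which is why a solving rate can sit well below the fraction of individual rubrics passed.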

For full details, see the CL-bench paper and code.
