CommonsenseQA 2.0 extends classic commonsense QA with more diverse relations and more carefully constructed distractors, making superficial pattern matching less effective. Questions target practical knowledge about objects, social situations, and cause and effect, emphasizing the difference between plausible and correct answers.
The benchmark is designed to better reflect real-world ambiguity and to reduce annotation artifacts that can inflate scores. As a result, high performance often indicates genuine conceptual understanding rather than exploitation of dataset biases.
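Evaluation on a benchmark like this typically reduces to exact-match accuracy over the gold answers. The sketch below is illustrative only: the item fields (`question`, `choices`, `answer`) and the toy examples are assumptions, not the benchmark's actual schema.

```python
# Minimal sketch of multiple-choice accuracy scoring. Field names and the
# example items are hypothetical, not the benchmark's real schema.

def accuracy(examples, predict):
    """Fraction of examples where predict(example) matches the gold answer."""
    correct = sum(1 for ex in examples if predict(ex) == ex["answer"])
    return correct / len(examples)

# Toy items in the spirit of the benchmark: every choice is plausible,
# but only one is correct.
examples = [
    {"question": "Can a pocket mirror start a fire in sunlight?",
     "choices": ["yes", "no"], "answer": "no"},
    {"question": "Is a frozen lake safe to walk on the day it freezes?",
     "choices": ["yes", "no"], "answer": "no"},
]

# A trivial baseline that always picks the first choice; a real system
# would rank the choices with a model instead.
baseline = lambda ex: ex["choices"][0]
print(accuracy(examples, baseline))  # 0.0 for these two items
```

Because the benchmark reduces annotation artifacts, such shortcut baselines should score near chance, which is exactly what makes high scores more meaningful.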
A harder, bias-reduced multiple-choice benchmark that probes everyday commonsense beyond lexical shortcuts.