HellaSwag

HellaSwag asks models to choose the most sensible continuation of a short narrative from a set of candidate endings. The incorrect endings are machine-generated and then filtered so that they remain confoundingly plausible. The dataset draws on instructional and video-description corpora, so success depends on physical commonsense and knowledge of everyday sequences of actions.

By construction, HellaSwag resists shallow heuristics that earlier models exploited. It is a staple probe of commonsense reasoning and robustness to adversarially curated distractors.
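To make the task format concrete, here is a minimal sketch of how a multiple-choice item of this kind might be scored. The field names (ctx, endings, label), the example item, and the word-overlap scorer are illustrative assumptions, not part of the benchmark description above; in practice the scorer would typically be a language model's likelihood for each ending. The toy scorer is exactly the kind of shallow heuristic the benchmark is designed to defeat, so it is only a stand-in.

```python
from typing import Callable, Dict, List, Sequence

Scorer = Callable[[str, str], float]


def pick_ending(context: str, endings: Sequence[str], score: Scorer) -> int:
    """Return the index of the ending the scorer rates highest (first one on ties)."""
    scores = [score(context, ending) for ending in endings]
    return max(range(len(endings)), key=scores.__getitem__)


def accuracy(items: List[Dict], score: Scorer) -> float:
    """Fraction of items where the top-scoring ending matches the gold label."""
    correct = sum(
        pick_ending(item["ctx"], item["endings"], score) == item["label"]
        for item in items
    )
    return correct / len(items)


def toy_score(context: str, ending: str) -> float:
    """Hypothetical stand-in scorer: counts ending words that also occur in the context."""
    def strip(word: str) -> str:
        return word.strip(".,!?").lower()

    context_words = {strip(w) for w in context.split()}
    return float(sum(strip(w) in context_words for w in ending.split()))


# Made-up item in the assumed format, not taken from the dataset.
items = [
    {
        "ctx": "She cracks an egg into the pan.",
        "endings": [
            "She flies the pan to the moon.",
            "She stirs the egg in the pan as it cooks.",
            "The dog paints a fence.",
            "He closes the laptop.",
        ],
        "label": 1,
    }
]

print(accuracy(items, toy_score))
```

Swapping toy_score for a model-based scorer leaves pick_ending and accuracy unchanged, which is why the scorer is passed in as a function.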
