About HellaSwag

HellaSwag asks a model to choose the most sensible continuation of a short narrative from a set of candidate endings. The incorrect endings (distractors) are generated and then filtered so that they remain plausible on the surface. The contexts are drawn from instructional text and video description corpora, so success depends on physical commonsense and knowledge of everyday sequences of actions.
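The selection task can be sketched in a few lines of Python. This is a minimal illustration, not the official evaluation code: the item layout (`ctx`, `endings`, `label`), the helper names, and the example item are all assumptions made up for this sketch. The scorer `toy_overlap_score` is a deliberately shallow word-overlap stand-in, used only so the example runs without a model. In practice it would be replaced by a model-derived score such as a language model's log-likelihood of each ending given the context.

```python
from typing import Callable, Sequence


def pick_ending(context: str, endings: Sequence[str],
                score_fn: Callable[[str, str], float]) -> int:
    """Return the index of the ending that score_fn rates highest."""
    scores = [score_fn(context, ending) for ending in endings]
    return max(range(len(endings)), key=scores.__getitem__)


def accuracy(items: Sequence[dict],
             score_fn: Callable[[str, str], float]) -> float:
    """Fraction of items where the top-scored ending is the labeled one."""
    correct = sum(
        pick_ending(item["ctx"], item["endings"], score_fn) == item["label"]
        for item in items
    )
    return correct / len(items)


def toy_overlap_score(context: str, ending: str) -> float:
    """Stand-in scorer (assumption): share of ending words seen in the context."""
    context_words = set(context.lower().split())
    ending_words = ending.lower().split()
    return sum(w in context_words for w in ending_words) / max(len(ending_words), 1)


# Invented example item, NOT taken from the dataset.
example = {
    "ctx": "A man pours batter into a hot pan.",
    "endings": [
        "He flips the pancake once the batter sets in the pan.",
        "He parks the car outside the garage.",
        "He paints the fence with a small brush.",
        "He ties his shoes and starts to run.",
    ],
    "label": 0,
}

predicted = pick_ending(example["ctx"], example["endings"], toy_overlap_score)
```

Keeping the scorer as a plain function argument means the same loop works for any scoring method. Word overlap succeeds on this invented item only because the distractors were written by hand to be easy; it is the kind of surface cue that filtered distractors are meant to defeat.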

HellaSwag is constructed to resist the shallow heuristics that earlier models exploited. It is a staple probe of commonsense reasoning and of robustness to adversarially curated distractors.