ARC AI2
A multiple-choice science benchmark assessing grade-school science knowledge and reasoning on questions from the AI2 ARC dataset.
About ARC AI2
The AI2 ARC benchmark contains elementary and middle-school science questions that require factual knowledge, commonsense, and multi-hop reasoning. Although the questions are short and easy for humans to understand, they are intentionally constructed so that shallow cues are insufficient: a model must combine knowledge with reasoning to select the correct option.
ARC AI2 is widely used to measure science understanding under constrained context. Reported results typically aggregate across the Easy and Challenge splits, with the Challenge split emphasizing non-trivial reasoning and resistance to annotation artifacts.
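To make the aggregation concrete, here is a minimal sketch of how accuracy over the two splits might be combined. The text above does not say whether reported numbers pool all questions (micro average) or average the per-split accuracies (macro average), so the sketch computes both. The function names, the record layout, and the toy predictions are illustrative assumptions, not part of the benchmark or any official evaluation harness.

```python
from collections import defaultdict


def split_accuracies(records):
    """Return accuracy per split.

    `records` is a list of (split, predicted_label, gold_label) tuples.
    This layout is an assumption made for illustration.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for split, predicted, gold in records:
        total[split] += 1
        correct[split] += int(predicted == gold)
    return {split: correct[split] / total[split] for split in total}


def aggregate(records):
    """Return (per-split accuracy, micro average, macro average)."""
    records = list(records)
    if not records:
        raise ValueError("no records to score")
    per_split = split_accuracies(records)
    # Micro: pool every question, so the larger split carries more weight.
    micro = sum(p == g for _, p, g in records) / len(records)
    # Macro: average the per-split accuracies, weighting each split equally.
    macro = sum(per_split.values()) / len(per_split)
    return per_split, micro, macro


# Toy predictions (made up): 3 of 4 Easy and 1 of 2 Challenge answers correct.
toy = [
    ("easy", "A", "A"),
    ("easy", "B", "B"),
    ("easy", "C", "D"),
    ("easy", "D", "D"),
    ("challenge", "A", "B"),
    ("challenge", "C", "C"),
]

per_split, micro, macro = aggregate(toy)
print(per_split, micro, macro)
```

The two averages diverge whenever the splits differ in size and difficulty, so it is worth checking which one a reported score uses before comparing results.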