SuperGLUE

SuperGLUE aggregates eight tasks (CB, RTE, BoolQ, MultiRC, WiC, WSC, COPA, and ReCoRD) and reports a single overall score that summarizes general natural language understanding (NLU) performance. The tasks are intended to limit exploitable annotation artifacts and to require nuanced reasoning, coreference resolution (as in WSC), and lexical semantics (as in WiC).
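The overall score can be illustrated with a minimal sketch. It assumes the aggregation described in the SuperGLUE paper: a task with two metrics (CB: accuracy and F1; MultiRC: F1a and exact match; ReCoRD: F1 and exact match) is first reduced to the mean of those metrics, and the overall score is the unweighted mean of the eight task scores. The function names, metric keys, and numbers below are hypothetical placeholders rather than real results, and diagnostic sets are not included.

```python
from statistics import mean

# The eight main SuperGLUE tasks (diagnostic sets are left out of this sketch).
SUPERGLUE_TASKS = ("BoolQ", "CB", "COPA", "MultiRC", "ReCoRD", "RTE", "WiC", "WSC")


def task_score(metrics: dict[str, float]) -> float:
    """Collapse one task's metrics (e.g. accuracy and F1) into their mean."""
    if not metrics:
        raise ValueError("each task needs at least one metric")
    return mean(metrics.values())


def superglue_score(results: dict[str, dict[str, float]]) -> float:
    """Unweighted mean of per-task scores over the eight main tasks."""
    missing = [t for t in SUPERGLUE_TASKS if t not in results]
    if missing:
        raise KeyError(f"missing tasks: {missing}")
    return mean(task_score(results[t]) for t in SUPERGLUE_TASKS)


# Hypothetical per-task metrics on a 0-100 scale (not real model results).
example_results = {
    "BoolQ": {"acc": 80.0},
    "CB": {"acc": 90.0, "f1": 86.0},        # task score 88.0
    "COPA": {"acc": 70.0},
    "MultiRC": {"f1a": 76.0, "em": 36.0},   # task score 56.0
    "ReCoRD": {"f1": 72.0, "em": 70.0},     # task score 71.0
    "RTE": {"acc": 75.0},
    "WiC": {"acc": 65.0},
    "WSC": {"acc": 63.0},
}

print(superglue_score(example_results))  # 71.0
```

Averaging multi-metric tasks first keeps every task at equal weight, so a task such as MultiRC does not count twice just because it reports two numbers.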

Model cards commonly cite SuperGLUE as a holistic indicator of progress in language understanding, as opposed to gains that come from single-task fine-tuning or from exploiting dataset-specific patterns.