TriviaQA combines questions with supporting web or Wikipedia evidence, requiring retrieval and reading comprehension to extract or generate precise answers. The dataset’s breadth and varied phrasing make it a strong probe of knowledge coverage and robustness to paraphrase.
Performance is typically reported with exact-match (EM) and token-level F1 scores. TriviaQA is widely used in model reports to evaluate open-domain QA systems and retrieval-augmented generation pipelines.
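The two metrics can be sketched as follows. This is a minimal illustration of SQuAD-style scoring (lowercasing, stripping punctuation and articles, then comparing normalized strings for EM and token overlap for F1); the official TriviaQA evaluation additionally takes the maximum score over all accepted answer aliases for a question, which this sketch omits.

```python
import re
import string
from collections import Counter

def normalize(text: str) -> str:
    """Lowercase, drop punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction: str, gold: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(gold))

def f1_score(prediction: str, gold: str) -> float:
    """Harmonic mean of token-level precision and recall."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    common = Counter(pred_tokens) & Counter(gold_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```

For example, `exact_match("The Eiffel Tower", "eiffel tower")` scores 1.0 after normalization, while a prediction with one extra token still earns partial F1 credit.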
An open-domain question answering benchmark with challenging trivia questions paired with evidence documents.