The Stanford Question Answering Dataset (SQuAD) is a reading comprehension dataset of 100,000+ questions posed by crowdworkers on a set of Wikipedia articles; the answer to each question is a segment of text from the corresponding reading passage. SQuAD contains 107,785 question-answer pairs on 536 articles.
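The extractive setup, where every answer is a verbatim segment of its passage, can be illustrated with a minimal Python sketch. It assumes the JSON layout of the publicly distributed SQuAD v1.1 files (`data`, `paragraphs`, `context`, `qas`, `answers`, with `answer_start` as a character offset into `context`). The sample record and the helper names `answer_span` and `check_spans` are hypothetical and for illustration only; the passage text is not taken from the dataset.

```python
import json

# Hypothetical record in the (assumed) SQuAD v1.1 JSON layout:
# data -> article -> paragraphs -> context + qas -> answers.
SAMPLE = json.loads("""
{
  "version": "1.1",
  "data": [
    {
      "title": "Example_Article",
      "paragraphs": [
        {
          "context": "SQuAD was released in 2016 by researchers at Stanford.",
          "qas": [
            {"id": "q1", "question": "When was SQuAD released?",
             "answers": [{"text": "2016", "answer_start": 22}]},
            {"id": "q2", "question": "Who released SQuAD?",
             "answers": [{"text": "researchers at Stanford", "answer_start": 30}]}
          ]
        }
      ]
    }
  ]
}
""")


def answer_span(context: str, answer: dict) -> str:
    """Return the passage segment that an answer's character offset points to."""
    start = answer["answer_start"]
    return context[start:start + len(answer["text"])]


def check_spans(dataset: dict) -> int:
    """Check that every answer is a verbatim span of its passage; return the QA count."""
    n_qas = 0
    for article in dataset["data"]:
        for paragraph in article["paragraphs"]:
            context = paragraph["context"]
            for qa in paragraph["qas"]:
                n_qas += 1
                for answer in qa["answers"]:
                    assert answer_span(context, answer) == answer["text"], qa["id"]
    return n_qas


if __name__ == "__main__":
    print("articles:", len(SAMPLE["data"]))
    print("question-answer pairs:", check_spans(SAMPLE))
```

On this sample the script reports 1 article and 2 question-answer pairs. Because the released files are split (train and development sets are distributed separately), running the same loop on one file would count only that split's pairs, not the full 107,785.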
Language(s)
English
Year
2016
Domain
General
Text types
Encyclopedia entries
Annotations
Question-answer pairs
Publication
Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, and Percy Liang. 2016. SQuAD: 100,000+ Questions for Machine Comprehension of Text. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 2383–2392, Austin, Texas. Association for Computational Linguistics.
NLP Topic
Number of units
107,785 question-answer pairs