A reading comprehension task on a dataset of 100,000+ questions posed by crowdworkers on a set of Wikipedia articles, where the answer to each question is a segment of text (a span) from the corresponding reading passage. Systems must select the answer from all possible spans in the passage, and must therefore cope with a fairly large number of candidates.
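The number of candidate spans grows quadratically with passage length, which is what makes exhaustive span selection non-trivial. A minimal sketch of the count (the function name `num_candidate_spans` is illustrative, not part of any SQuAD tooling):

```python
def num_candidate_spans(n_tokens: int) -> int:
    """Count contiguous spans in a passage of n_tokens tokens.

    Every (start, end) pair with start <= end is a candidate answer,
    giving n * (n + 1) / 2 spans in total.
    """
    return n_tokens * (n_tokens + 1) // 2

# A typical ~120-token SQuAD paragraph already yields thousands of candidates.
print(num_candidate_spans(120))  # → 7260
```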
Publication
Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, and Percy Liang. 2016. SQuAD: 100,000+ Questions for Machine Comprehension of Text. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 2383–2392, Austin, Texas. Association for Computational Linguistics.
Language
English
URL
Task
NLP topic
Abstract task
Dataset
Year
2016
Publication link
Ranking metric
F1
Task results
| System | F1 | Accuracy | MacroF1 | Pearson correlation | ICM |
|---|---|---|---|---|---|
| RoBERTa-large | 0.8724 | 0.8724 | 0.8724 | 0.8724 | 0.87 |
| XLM-RoBERTa-large | 0.8581 | 0.8581 | 0.8581 | 0.8581 | 0.86 |
| RoBERTa-base | 0.8427 | 0.8427 | 0.8427 | 0.8427 | 0.84 |
| ixa-ehu/ixambert-base-cased | 0.8187 | 0.8187 | 0.8187 | 0.8187 | 0.82 |
| BERT-base-multilingual-cased | 0.8059 | 0.8059 | 0.8059 | 0.8059 | 0.81 |
| XLM-RoBERTa-base | 0.7998 | 0.7998 | 0.7998 | 0.7998 | 0.80 |
| BERT-base-cased | 0.7968 | 0.7968 | 0.7968 | 0.7968 | 0.80 |
| DistilBERT-base-uncased | 0.7602 | 0.7602 | 0.7602 | 0.7602 | 0.76 |
| DistilBERT-base-multilingual-cased | 0.7467 | 0.7467 | 0.7467 | 0.7467 | 0.75 |
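F1, the ranking metric above, is computed at the token level between the predicted span and the gold answer. A minimal sketch of that computation (the official SQuAD evaluation script additionally lowercases and strips punctuation and articles before comparing, which is omitted here for brevity):

```python
from collections import Counter


def token_f1(prediction: str, gold: str) -> float:
    """Token-overlap F1 between a predicted span and a gold answer."""
    pred_tokens = prediction.split()
    gold_tokens = gold.split()
    # Multiset intersection: each shared token counts at most as often
    # as it appears in both strings.
    common = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


print(token_f1("in the city", "the city"))  # → 0.8
```

An exact match scores 1.0, a disjoint prediction 0.0, and partial overlaps fall in between, which is why F1 is preferred over exact match as the primary ranking metric.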