SQAC | Leaderboard

The Spanish Question Answering Corpus (SQAC) is an extractive QA dataset with no unanswerable questions. Its contexts are texts extracted from the Spanish Wikipedia, encyclopedic articles, newswire articles from Wikinews, and the Spanish section of the AnCora corpus, which mixes different newswire and literature sources. Annotators were commissioned to write 18,817 questions and mark their answer spans in 6,247 textual contexts. The guidelines were adapted from SQuAD v1.1 (Rajpurkar et al., 2016), and the annotators were all native Spanish speakers with university studies in various fields related to linguistics. Following the XQuAD (Artetxe, Ruder, and Yogatama, 2019) structure, no additional answers were collected.
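As a rough illustration of what "extractive" with annotated answer spans means, the sketch below builds one SQuAD v1.1-style record and checks that the answer text can be recovered from the context by character offset. The record is invented for illustration and is not taken from SQAC; the field names (`context`, `question`, `answers`, `text`, `answer_start`) follow the SQuAD v1.1 JSON layout and are an assumption about how SQAC is distributed, and `span_matches` is a hypothetical helper.

```python
# Hypothetical SQuAD v1.1-style record. The text is invented for
# illustration and does not come from SQAC.
record = {
    "context": "La catedral de Burgos se empezó a construir en 1221.",
    "question": "¿De qué ciudad es la catedral?",
    # One answer per question, stored as text plus a character offset.
    "answers": [{"text": "Burgos", "answer_start": 15}],
}


def span_matches(rec):
    """Return True if every answer text equals the context slice at its offset."""
    ctx = rec["context"]
    return all(
        ctx[a["answer_start"]: a["answer_start"] + len(a["text"])] == a["text"]
        for a in rec["answers"]
    )


if __name__ == "__main__":
    print(span_matches(record))
```

A check like this is a common sanity test for extractive QA data, since a wrong offset silently corrupts the training labels.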

Language(s)

Spanish

Dataset description link

http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/view/6405

Year

2022

Domain

General

News

Text types

Encyclopedia entries

News

Annotations

question-answer

Data access

Public

Data link

https://huggingface.co/datasets/PlanTL-GOB-ES/SQAC

Publication

Asier Gutiérrez Fandiño, Jordi Armengol-Estapé, Marc Pàmies, Joan Llop-Palao, Joaquín Silveira-Ocampo, Casimiro Pio Carrino, Carme Armentano-Oller, Carlos Rodriguez-Penagos, Aitor Gonzalez-Agirre, Marta Villegas (2022) Procesamiento del Lenguaje Natural, No. 68, March 2022, pp. 39-60.

Publication link

http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/view/6405

NLP Topic

question answering

Number of units

18817
