DIANN-2023-EN | Leaderboard

The corpus consists of abstracts of scientific articles from Elsevier journals in the biomedical domain, collected between 2017 and 2018. It is split into two partitions: a training partition and a private test partition. The training partition contains 500 texts, which correspond to the training and evaluation partitions made public for the DIANN competition at IberLEF 2018. The test partition contains 100 texts; because it is used to evaluate systems on the ODESIA Leaderboard, it will not be made public. All disabilities mentioned in the texts are annotated in the corpus.

Language(s)

English

Year

2023

Domain

Health

Text types

Abstracts of scientific articles

Format

json

NLP Topic

(named) entity recognition

Number of units

600

Type of units

Documents

Tokens

108412

Documents

600

Training set size

500

Test set size

100
