The EXIST 2023 English corpus is a collection of tweets labelled with information related to sexism: whether the tweet is sexist, the intention shown by the author of the tweet, and the type of sexism being expressed.
Language(s)
English
Dataset description link
Year
2023
Domain
Social
Text types
Tweets
Annotations
Binary label indicating whether a tweet expresses sexism, plus multiclass labels for the type of sexism and the intention of the author
Format
JSON
Data access
Registration
Data link
Publication
Plaza, L. et al. (2023). Overview of EXIST 2023 – Learning with Disagreement for Sexism Identification and Characterization. In: Arampatzis, A., et al. Experimental IR Meets Multilinguality, Multimodality, and Interaction. CLEF 2023. Lecture Notes in Computer Science, vol 14163. Springer, Cham. https://doi.org/10.1007/978-3-031-42448-9_23
Publication link
NLP Topic
Number of units
4152
Type of units
Tweets
Training set size
2870
Test set size
838
Development set size
444
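The three split sizes above add up to the reported number of units (2870 + 444 + 838 = 4152), which works out to roughly a 69% / 11% / 20% train/dev/test partition. The minimal Python sketch below performs that consistency check. It uses only the figures listed in this entry. The split names in the dictionary are illustrative labels and are not assumed to match any field or file names in the released data.

```python
# Split sizes as listed in this entry (English subset of EXIST 2023).
# The keys are illustrative labels, not names taken from the released files.
SPLITS = {"train": 2870, "dev": 444, "test": 838}
TOTAL_UNITS = 4152


def split_proportions(splits, total):
    """Return each split's share of the corpus, after checking the sizes add up."""
    if sum(splits.values()) != total:
        raise ValueError("split sizes do not sum to the reported total")
    return {name: size / total for name, size in splits.items()}


if __name__ == "__main__":
    for name, share in split_proportions(SPLITS, TOTAL_UNITS).items():
        print(f"{name}: {share:.1%}")
```

Run as a script, it prints `train: 69.1%`, `dev: 10.7%` and `test: 20.2%`.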