Generative Model Evaluation
EXIST 2024
The EXIST 2024 Memes dataset is designed to foster research on the automatic detection of sexism in visual content shared on social media. It forms the basis of Tasks 4, 5, and 6 of the EXIST 2024 challenge, which focus on memes as a medium for expressing, criticizing, or describing sexist behavior. The dataset comprises over 5,000 labeled memes, balanced between Spanish and English to enable cross-lingual and cross-cultural comparison. The data is split into two partitions: a training partition of 4,044 memes and a test partition of 1,053 memes, supporting robust model evaluation and comparison.
The tasks proposed for this dataset structurally mirror those originally defined for text in tweets, adapted to visual content:
- Task 4 addresses binary identification, requiring systems to determine whether a specific meme is sexist or not. This task lays the essential groundwork for subsequent classification, acting as a fundamental initial filter.
- Task 5 delves into semantic and pragmatic analysis by determining the underlying intention behind the creation of a sexist meme. Due to the predominantly direct or critical nature of memes, this task is limited to the categories “DIRECT,” when the meme explicitly expresses sexism, and “JUDGEMENTAL,” when the meme criticizes or denounces sexist behaviors. The category “REPORTED” is practically excluded due to its low prevalence in visual content like memes.
- Finally, task 6 tackles a complex multi-label classification challenge, assigning each meme one or more categories that specifically describe the type or types of sexism represented. The available categories are “IDEOLOGICAL-INEQUALITY,” when the meme delegitimizes feminist movements or denies gender inequality; “STEREOTYPING-DOMINANCE,” for memes that promote stereotypes or male supremacy; “OBJECTIFICATION,” when the meme reduces women to objects or inappropriately emphasizes their physical attributes; “SEXUAL-VIOLENCE,” if it includes suggestions or threats of a sexual nature; and “MISOGYNY-NON-SEXUAL-VIOLENCE,” when it expresses hatred towards women without explicit sexual implications.
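Since Task 6 is multi-label, a meme's gold annotation is naturally a subset of the five categories above. A minimal sketch (helper names are illustrative, not part of the official toolkit) that encodes such a subset as a binary indicator vector, the representation expected by most multi-label tooling:

```python
# Fixed label space for Task 6 (order defines the vector positions).
TASK6_LABELS = [
    "IDEOLOGICAL-INEQUALITY",
    "STEREOTYPING-DOMINANCE",
    "OBJECTIFICATION",
    "SEXUAL-VIOLENCE",
    "MISOGYNY-NON-SEXUAL-VIOLENCE",
]

def to_indicator(categories):
    """Map a set of Task 6 category names to a 0/1 vector over TASK6_LABELS."""
    return [1 if label in categories else 0 for label in TASK6_LABELS]

# Example: a meme annotated with objectification and sexual violence.
vec = to_indicator({"OBJECTIFICATION", "SEXUAL-VIOLENCE"})  # [0, 0, 1, 1, 0]
```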
Each instance in the EXIST 2024 Memes dataset comprises two main elements: an image and its associated text, extracted automatically via OCR, allowing models to apply multimodal methods to the content. Participating models are evaluated with the Information Contrast Measure (ICM), a metric suited to a soft-evaluation setting that accounts for the inherent subjectivity of annotations produced by multiple annotators with diverse sociodemographic characteristics.
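To make the metric concrete, here is a simplified hard-label sketch of ICM under strong assumptions: flat (non-hierarchical) labels, the information content of a label set taken as the sum of independent per-label ICs, and the commonly cited weights α1 = α2 = 2, β = 3. The official EXIST evaluation uses a soft variant (ICM-Soft) over annotator label distributions, which this sketch does not reproduce; all function names are illustrative.

```python
import math
from collections import Counter

def label_ic(gold_sets):
    """Per-label information content: -log2 of relative frequency in the gold data."""
    n = len(gold_sets)
    counts = Counter(label for s in gold_sets for label in s)
    return {label: -math.log2(c / n) for label, c in counts.items()}

def icm_hard(pred, gold, ic, a1=2.0, a2=2.0, b=3.0):
    """ICM(pred, gold) = a1*IC(pred) + a2*IC(gold) - b*IC(pred ∪ gold).

    IC of a label set is approximated as the sum of per-label ICs
    (independence assumption). Correct labels are rewarded by their IC;
    spurious or missing labels are penalized by theirs.
    """
    s = lambda labels: sum(ic[l] for l in labels)
    p, g = set(pred), set(gold)
    return a1 * s(p) + a2 * s(g) - b * s(p | g)
```

Under these assumptions, a perfect prediction scores +IC(gold), and rarer (more informative) labels weigh more than frequent ones, which is the behavior that makes ICM preferable to plain accuracy for skewed label distributions.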

Results on the Spanish partition:

Model | EXIST 2024: Sexism Identification (memes) | EXIST 2024: Source intention (memes) | EXIST 2024: Sexism Categorization (memes) |
---|---|---|---|
claude-3.5-sonnet | 0.523 | 0.290 | 0.269 |
gemini-2.0-flash | 0.523 | 0.370 | 0.311 |
gpt-4o | 0.541 | 0.351 | 0.202 |
Results on the English partition:

Model | EXIST 2024: Sexism Identification (memes) | EXIST 2024: Source intention (memes) | EXIST 2024: Sexism Categorization (memes) |
---|---|---|---|
claude-3.5-sonnet | 0.585 | 0.253 | 0.314 |
gemini-2.0-flash | 0.541 | 0.198 | 0.228 |
gpt-4o | 0.588 | 0.258 | 0.155 |
Task | Baseline ES | Best Result ES | Baseline EN | Best Result EN | EFF EN | EFF ES | GAP |
---|---|---|---|---|---|---|---|
EXIST 2024: Sexism Identification (memes) | 0.325 | 0.541 | 0.373 | 0.588 | 0.3429 | 0.32 | 0.0668 |
EXIST 2024: Source intention (memes) | 0.188 | 0.370 | 0.208 | 0.258 | 0.0631 | 0.2241 | -0.7183 |
EXIST 2024: Sexism Categorization (memes) | 0.081 | 0.311 | 0.067 | 0.314 | 0.2647 | 0.2503 | 0.0546 |
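The EFF and GAP columns are not defined in the text, but all nine values are consistent with a baseline-normalized gain, EFF = (best − baseline) / (1 − baseline), and a relative gap normalized by the stronger language, GAP = (EFF_EN − EFF_ES) / max(EFF_EN, EFF_ES). A minimal sketch (function names are illustrative):

```python
def eff(best, baseline):
    """Baseline-normalized gain: 0 = at baseline, 1 = perfect score."""
    return (best - baseline) / (1.0 - baseline)

def gap(eff_en, eff_es):
    """EN-ES difference, normalized by the stronger of the two languages."""
    return (eff_en - eff_es) / max(eff_en, eff_es)

# Task 4, Sexism Identification: reproduces the table row to 4 decimals.
eff_es_t4 = eff(0.541, 0.325)        # ≈ 0.3200
eff_en_t4 = eff(0.588, 0.373)        # ≈ 0.3429
gap_t4 = gap(eff_en_t4, eff_es_t4)   # ≈ 0.0668
```

The same formulas reproduce the Source intention row (EFF EN ≈ 0.0631, EFF ES ≈ 0.2241, GAP ≈ −0.7183) and the Sexism Categorization row (EFF EN ≈ 0.2647, EFF ES ≈ 0.2503, GAP ≈ 0.0546), which supports this reading of the columns.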