ODESIA Leaderboard v2: Results
ODESIA Core Tasks
Tasks | Spanish baseline | Best result Spanish | English baseline | Best result English | Gap |
---|---|---|---|---|---|
EXIST 2022: Sexism detection (ES) | 0.69 | 0.77 | 0.67 | 0.81 | 17% |
EXIST 2022: Sexism categorisation (ES) | 0.46 | 0.57 | 0.44 | 0.58 | 10% |
DIPROMATS 2023: Propaganda identification (ES) | 0.75 | 0.82 | 0.71 | 0.82 | 11% |
DIPROMATS 2023: Coarse propaganda characterization (ES) | 0.22 | 0.47 | 0.21 | 0.55 | 48% |
DIPROMATS 2023: Fine-grained propaganda characterization (ES) | 0.09 | 0.26 | 0.08 | 0.47 | 299% |
DIANN 2023: Disability detection (ES) | 0.75 | 0.84 | 0.67 | 0.79 | 1% |
EXIST-2023: Sexism identification (ES) | 0.47 | 0.64 | 0.44 | 0.64 | 10% |
EXIST-2023: Source Intention (ES) | 0.25 | 0.42 | 0.22 | 0.36 | -4% |
EXIST-2023: Sexism categorization (ES) | 0.22 | 0.40 | 0.21 | 0.40 | 12% |
SQAC-SQUAD 2024: Question answering (ES) | 0.13 | 0.46 | 0.12 | 0.46 | 19% |
# | System | Arithmetic mean | EXIST 2022: Sexism detection (ES) | EXIST 2022: Sexism categorisation (ES) | DIPROMATS 2023: Propaganda identification (ES) | DIPROMATS 2023: Coarse propaganda characterization (ES) | DIPROMATS 2023: Fine-grained propaganda characterization (ES) | DIANN 2023: Disability detection (ES) | EXIST-2023: Sexism identification (ES) | EXIST-2023: Source Intention (ES) | EXIST-2023: Sexism categorization (ES) | SQAC-SQUAD 2024: Question answering (ES) |
---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | distilbert-base-multilingual-cased | 0.459 | 0.72 | 0.47 | 0.75 | 0.34 | 0.09 | 0.78 | 0.57 | 0.36 | 0.29 | 0.22 |
2 | distillbert-base-spanish-uncased | 0.473 | 0.72 | 0.51 | 0.77 | 0.34 | 0.07 | 0.75 | 0.60 | 0.39 | 0.33 | 0.25 |
3 | xlm-roberta-base | 0.515 | 0.74 | 0.50 | 0.79 | 0.47 | 0.10 | 0.84 | 0.62 | 0.40 | 0.32 | 0.37 |
4 | ixambert-base-cased | 0.485 | 0.71 | 0.49 | 0.77 | 0.32 | 0.06 | 0.83 | 0.60 | 0.37 | 0.34 | 0.36 |
5 | bert-base-multilingual-cased | 0.488 | 0.72 | 0.47 | 0.78 | 0.35 | 0.10 | 0.84 | 0.60 | 0.37 | 0.33 | 0.32 |
6 | bert-base-spanish-wwm-cased | 0.524 | 0.72 | 0.54 | 0.79 | 0.44 | 0.14 | 0.81 | 0.63 | 0.39 | 0.37 | 0.41 |
7 | PlanTL-GOB-ES-roberta-base-bne | 0.521 | 0.74 | 0.56 | 0.81 | 0.42 | 0.12 | 0.75 | 0.63 | 0.40 | 0.37 | 0.41 |
8 | bertin-roberta-base-spanish | 0.493 | 0.73 | 0.49 | 0.76 | 0.36 | 0.08 | 0.75 | 0.62 | 0.39 | 0.33 | 0.42 |
9 | PlanTL-GOB-ES-roberta-large-bne | 0.552 | 0.75 | 0.57 | 0.82 | 0.44 | 0.24 | 0.82 | 0.64 | 0.40 | 0.38 | 0.46 |
10 | xlm-roberta-large | 0.564 | 0.77 | 0.56 | 0.82 | 0.47 | 0.26 | 0.84 | 0.64 | 0.42 | 0.40 | 0.46 |
# | System | Arithmetic mean | EXIST 2022: Sexism detection (EN) | EXIST 2022: Sexism categorisation (EN) | DIANN 2023: Disability detection (EN) | DIPROMATS 2023: Propaganda identification (EN) | DIPROMATS 2023: Coarse propaganda characterization (EN) | DIPROMATS 2023: Fine-grained propaganda characterization (EN) | EXIST-2023: Sexism categorization (EN) | EXIST-2023: Sexism identification (EN) | EXIST-2023: Source intention (EN) | SQAC-SQUAD 2024: Question answering (EN) |
---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | bert-base-multilingual-cased | 0.501 | 0.76 | 0.50 | 0.73 | 0.80 | 0.48 | 0.18 | 0.34 | 0.60 | 0.32 | 0.30 |
2 | distilbert-base-multilingual-cased | 0.472 | 0.74 | 0.53 | 0.68 | 0.77 | 0.45 | 0.16 | 0.30 | 0.58 | 0.31 | 0.20 |
3 | distilbert-base-uncased | 0.497 | 0.77 | 0.55 | 0.66 | 0.78 | 0.47 | 0.14 | 0.37 | 0.62 | 0.34 | 0.27 |
4 | bert-base-cased | 0.513 | 0.76 | 0.53 | 0.72 | 0.81 | 0.50 | 0.21 | 0.37 | 0.61 | 0.32 | 0.30 |
5 | ixambert-base-cased | 0.503 | 0.75 | 0.53 | 0.73 | 0.78 | 0.49 | 0.14 | 0.36 | 0.61 | 0.32 | 0.32 |
6 | xlm-roberta-base | 0.517 | 0.76 | 0.53 | 0.76 | 0.80 | 0.54 | 0.16 | 0.35 | 0.62 | 0.32 | 0.33 |
7 | roberta-base | 0.530 | 0.78 | 0.53 | 0.75 | 0.81 | 0.52 | 0.19 | 0.38 | 0.63 | 0.33 | 0.38 |
8 | xlm-roberta-large | 0.565 | 0.79 | 0.56 | 0.78 | 0.81 | 0.52 | 0.39 | 0.39 | 0.63 | 0.36 | 0.42 |
9 | roberta-large | 0.587 | 0.81 | 0.58 | 0.79 | 0.82 | 0.55 | 0.47 | 0.40 | 0.64 | 0.35 | 0.46 |
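The "Arithmetic mean" column in the leaderboards above appears to be the unweighted mean of the per-task scores shown in each row. A minimal Python sanity check, using two rows copied from the Spanish core-task table (the tolerance allows for the displayed scores being rounded to two decimals while the means were presumably computed from unrounded values):

```python
# Recompute the "Arithmetic mean" column from the displayed per-task scores.
# Rows copied verbatim from the Spanish core-task leaderboard above.
rows = {
    "distilbert-base-multilingual-cased": (
        0.459,
        [0.72, 0.47, 0.75, 0.34, 0.09, 0.78, 0.57, 0.36, 0.29, 0.22],
    ),
    "xlm-roberta-large": (
        0.564,
        [0.77, 0.56, 0.82, 0.47, 0.26, 0.84, 0.64, 0.42, 0.40, 0.46],
    ),
}

for name, (reported, scores) in rows.items():
    mean = sum(scores) / len(scores)
    # Allow a small tolerance: per-task scores are shown rounded to 2 decimals,
    # so the recomputed mean can drift slightly from the reported 3-decimal mean.
    assert abs(mean - reported) < 0.005, (name, mean, reported)
    print(f"{name}: recomputed mean {mean:.3f} vs reported {reported}")
```

The same check holds for the extended-task tables (e.g. ixambert-base-cased: (0.96 + 0.63 + 0.71 + 0.81) / 4 = 0.778, matching the reported value).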
ODESIA Extended Tasks
Tasks | Spanish baseline | Best result Spanish | English baseline | Best result English | Gap |
---|---|---|---|---|---|
MLDOC 2018: Document classification (ES) | 0.93 | 0.96 | 0.88 | 0.98 | 40% |
Multilingual Complex Named Entity Recognition 2022 (ES) | 0.52 | 0.71 | 0.55 | 0.75 | 5% |
SQAC-SQUAD 2016: Question answering (ES) | 0.53 | 0.77 | 0.52 | 0.88 | 25% |
Semantic Textual Similarity 2017 (ES) | 0.68 | 0.81 | 0.70 | 0.86 | 13% |
# | System | Arithmetic mean | MLDOC 2018: Document classification (ES) | Multilingual Complex Named Entity Recognition 2022 (ES) | SQAC-SQUAD 2016: Question answering (ES) | Semantic Textual Similarity 2017 (ES) |
---|---|---|---|---|---|---|
1 | ixambert-base-cased | 0.778 | 0.96 | 0.63 | 0.71 | 0.81 |
2 | bertin-roberta-base-spanish | 0.745 | 0.96 | 0.62 | 0.73 | 0.67 |
3 | distilbert-base-multilingual-cased | 0.698 | 0.94 | 0.61 | 0.55 | 0.69 |
4 | bert-base-multilingual-cased | 0.753 | 0.96 | 0.64 | 0.71 | 0.70 |
5 | xlm-roberta-base | 0.753 | 0.95 | 0.66 | 0.67 | 0.73 |
6 | distillbert-base-spanish-uncased | 0.710 | 0.96 | 0.61 | 0.53 | 0.74 |
7 | PlanTL-GOB-ES-roberta-base-bne | 0.773 | 0.96 | 0.64 | 0.74 | 0.75 |
8 | PlanTL-GOB-ES-roberta-large-bne | 0.780 | 0.96 | 0.63 | 0.77 | 0.76 |
9 | bert-base-spanish-wwm-cased | 0.773 | 0.96 | 0.63 | 0.71 | 0.79 |
10 | xlm-roberta-large | 0.810 | 0.96 | 0.71 | 0.77 | 0.80 |
# | System | Arithmetic mean | MLDOC 2018: Document classification (EN) | Multilingual Complex Named Entity Recognition 2022 (EN) | SQAC-SQUAD 2016: Question answering (EN) | Semantic Textual Similarity 2017 (EN) |
---|---|---|---|---|---|---|
1 | bert-base-multilingual-cased | 0.813 | 0.97 | 0.67 | 0.81 | 0.80 |
2 | ixambert-base-cased | 0.813 | 0.98 | 0.65 | 0.80 | 0.82 |
3 | distilbert-base-multilingual-cased | 0.778 | 0.97 | 0.63 | 0.75 | 0.76 |
4 | xlm-roberta-base | 0.818 | 0.98 | 0.69 | 0.80 | 0.80 |
5 | distilbert-base-uncased | 0.805 | 0.97 | 0.67 | 0.77 | 0.81 |
6 | bert-base-cased | 0.813 | 0.97 | 0.68 | 0.78 | 0.82 |
7 | roberta-base | 0.845 | 0.98 | 0.70 | 0.85 | 0.85 |
8 | roberta-large | 0.868 | 0.98 | 0.75 | 0.88 | 0.86 |
9 | xlm-roberta-large | 0.855 | 0.98 | 0.74 | 0.86 | 0.84 |