ODESIA v2 Leaderboard - Results

ODESIA Core Tasks


| Task | Spanish baseline | Best result in Spanish | English baseline | Best result in English | Gap |
|---|---|---|---|---|---|
| EXIST 2022: Sexism detection | 0.69 | 0.77 | 0.67 | 0.81 | 17% |
| EXIST 2022: Sexism categorisation | 0.46 | 0.57 | 0.44 | 0.58 | 10% |
| DIPROMATS 2023: Propaganda identification | 0.75 | 0.82 | 0.71 | 0.82 | 11% |
| DIPROMATS 2023: Coarse propaganda characterization | 0.22 | 0.47 | 0.21 | 0.55 | 48% |
| DIPROMATS 2023: Fine-grained propaganda characterization | 0.09 | 0.26 | 0.08 | 0.47 | 299% |
| DIANN 2023: Disability detection | 0.75 | 0.84 | 0.67 | 0.79 | 1% |
| EXIST-2023: Sexism identification | 0.47 | 0.64 | 0.44 | 0.64 | 10% |
| EXIST-2023: Source intention | 0.25 | 0.42 | 0.22 | 0.36 | -4% |
| EXIST-2023: Sexism categorization | 0.22 | 0.40 | 0.21 | 0.40 | 12% |
| SQAC-SQUAD 2024: Question answering | 0.13 | 0.46 | 0.12 | 0.46 | 19% |

| # | System | Arithmetic mean | EXIST 2022: Sexism detection (ES) | EXIST 2022: Sexism categorisation (ES) | DIPROMATS 2023: Propaganda identification (ES) | DIPROMATS 2023: Coarse propaganda characterization (ES) | DIPROMATS 2023: Fine-grained propaganda characterization (ES) | DIANN 2023: Disability detection (ES) | EXIST-2023: Sexism identification (ES) | EXIST-2023: Source intention (ES) | EXIST-2023: Sexism categorization (ES) | SQAC-SQUAD 2024: Question answering (ES) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | distilbert-base-multilingual-cased | 0.459 | 0.72 | 0.47 | 0.75 | 0.34 | 0.09 | 0.78 | 0.57 | 0.36 | 0.29 | 0.22 |
| 2 | distillbert-base-spanish-uncased | 0.473 | 0.72 | 0.51 | 0.77 | 0.34 | 0.07 | 0.75 | 0.60 | 0.39 | 0.33 | 0.25 |
| 3 | xlm-roberta-base | 0.515 | 0.74 | 0.50 | 0.79 | 0.47 | 0.10 | 0.84 | 0.62 | 0.40 | 0.32 | 0.37 |
| 4 | ixambert-base-cased | 0.485 | 0.71 | 0.49 | 0.77 | 0.32 | 0.06 | 0.83 | 0.60 | 0.37 | 0.34 | 0.36 |
| 5 | bert-base-multilingual-cased | 0.488 | 0.72 | 0.47 | 0.78 | 0.35 | 0.10 | 0.84 | 0.60 | 0.37 | 0.33 | 0.32 |
| 6 | bert-base-spanish-wwm-cased | 0.524 | 0.72 | 0.54 | 0.79 | 0.44 | 0.14 | 0.81 | 0.63 | 0.39 | 0.37 | 0.41 |
| 7 | PlanTL-GOB-ES-roberta-base-bne | 0.521 | 0.74 | 0.56 | 0.81 | 0.42 | 0.12 | 0.75 | 0.63 | 0.40 | 0.37 | 0.41 |
| 8 | bertin-roberta-base-spanish | 0.493 | 0.73 | 0.49 | 0.76 | 0.36 | 0.08 | 0.75 | 0.62 | 0.39 | 0.33 | 0.42 |
| 9 | PlanTL-GOB-ES-roberta-large-bne | 0.552 | 0.75 | 0.57 | 0.82 | 0.44 | 0.24 | 0.82 | 0.64 | 0.40 | 0.38 | 0.46 |
| 10 | xlm-roberta-large | 0.564 | 0.77 | 0.56 | 0.82 | 0.47 | 0.26 | 0.84 | 0.64 | 0.42 | 0.40 | 0.46 |

| # | System | Arithmetic mean | EXIST 2022: Sexism detection (EN) | EXIST 2022: Sexism categorisation (EN) | DIANN 2023: Disability detection (EN) | DIPROMATS 2023: Propaganda identification (EN) | DIPROMATS 2023: Coarse propaganda characterization (EN) | DIPROMATS 2023: Fine-grained propaganda characterization (EN) | EXIST-2023: Sexism categorization (EN) | EXIST-2023: Sexism identification (EN) | EXIST-2023: Source intention (EN) | SQAC-SQUAD 2024: Question answering (EN) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | bert-base-multilingual-cased | 0.501 | 0.76 | 0.50 | 0.73 | 0.80 | 0.48 | 0.18 | 0.34 | 0.60 | 0.32 | 0.30 |
| 2 | distilbert-base-multilingual-cased | 0.472 | 0.74 | 0.53 | 0.68 | 0.77 | 0.45 | 0.16 | 0.30 | 0.58 | 0.31 | 0.20 |
| 3 | distilbert-base-uncased | 0.497 | 0.77 | 0.55 | 0.66 | 0.78 | 0.47 | 0.14 | 0.37 | 0.62 | 0.34 | 0.27 |
| 4 | bert-base-cased | 0.513 | 0.76 | 0.53 | 0.72 | 0.81 | 0.50 | 0.21 | 0.37 | 0.61 | 0.32 | 0.30 |
| 5 | ixambert-base-cased | 0.503 | 0.75 | 0.53 | 0.73 | 0.78 | 0.49 | 0.14 | 0.36 | 0.61 | 0.32 | 0.32 |
| 6 | xlm-roberta-base | 0.517 | 0.76 | 0.53 | 0.76 | 0.80 | 0.54 | 0.16 | 0.35 | 0.62 | 0.32 | 0.33 |
| 7 | roberta-base | 0.530 | 0.78 | 0.53 | 0.75 | 0.81 | 0.52 | 0.19 | 0.38 | 0.63 | 0.33 | 0.38 |
| 8 | xlm-roberta-large | 0.565 | 0.79 | 0.56 | 0.78 | 0.81 | 0.52 | 0.39 | 0.39 | 0.63 | 0.36 | 0.42 |
| 9 | roberta-large | 0.587 | 0.81 | 0.58 | 0.79 | 0.82 | 0.55 | 0.47 | 0.40 | 0.64 | 0.35 | 0.46 |
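
The "Arithmetic mean" column in the per-system tables is the unweighted average of a system's scores across the ten tasks, rounded to three decimals. As a minimal sketch, this reproduces the value for xlm-roberta-large on the Spanish core tasks (the scores are copied from the table above, in column order):

```python
# Per-task scores for xlm-roberta-large on the Spanish core tasks,
# in the same order as the table columns.
scores = [0.77, 0.56, 0.82, 0.47, 0.26, 0.84, 0.64, 0.42, 0.40, 0.46]

# Unweighted arithmetic mean, rounded to three decimals as in the table.
mean = round(sum(scores) / len(scores), 3)
print(mean)  # 0.564
```

The same calculation applies to the five-task Extended tables below.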

ODESIA Extended Tasks


| Task | Spanish baseline | Best result in Spanish | English baseline | Best result in English | Gap |
|---|---|---|---|---|---|
| MLDOC 2018: Document classification | 0.93 | 0.96 | 0.88 | 0.98 | 40% |
| Multilingual Complex Named Entity Recognition 2022 | 0.52 | 0.71 | 0.55 | 0.75 | 5% |
| SQAC-SQUAD 2016: Question answering | 0.53 | 0.77 | 0.52 | 0.88 | 25% |
| Semantic Textual Similarity 2017 | 0.68 | 0.81 | 0.70 | 0.86 | 13% |
| DIANN 2018: Negation detection | 0.75 | 0.96 | 0.42 | 0.92 | 93% |

| # | System | Arithmetic mean | MLDOC 2018: Document classification (ES) | Multilingual Complex Named Entity Recognition 2022 (ES) | SQAC-SQUAD 2016: Question answering (ES) | Semantic Textual Similarity 2017 (ES) | DIANN 2018: Negation detection (ES) |
|---|---|---|---|---|---|---|---|
| 1 | xlm-roberta-base | 0.772 | 0.95 | 0.66 | 0.67 | 0.73 | 0.85 |
| 2 | xlm-roberta-large | 0.832 | 0.96 | 0.71 | 0.77 | 0.80 | 0.92 |
| 3 | bert-base-multilingual-cased | 0.750 | 0.96 | 0.64 | 0.71 | 0.70 | 0.74 |
| 4 | distilbert-base-multilingual-cased | 0.724 | 0.94 | 0.61 | 0.55 | 0.69 | 0.83 |
| 5 | PlanTL-GOB-ES-roberta-base-bne | 0.792 | 0.96 | 0.64 | 0.74 | 0.75 | 0.87 |
| 6 | PlanTL-GOB-ES-roberta-large-bne | 0.730 | 0.96 | 0.63 | 0.77 | 0.76 | 0.53 |
| 7 | bertin-roberta-base-spanish | 0.772 | 0.96 | 0.62 | 0.73 | 0.67 | 0.88 |
| 8 | bert-base-spanish-wwm-cased | 0.810 | 0.96 | 0.63 | 0.71 | 0.79 | 0.96 |
| 9 | distillbert-base-spanish-uncased | 0.724 | 0.96 | 0.61 | 0.53 | 0.74 | 0.78 |
| 10 | ixambert-base-cased | 0.768 | 0.96 | 0.63 | 0.71 | 0.81 | 0.73 |

| # | System | Arithmetic mean | MLDOC 2018: Document classification (EN) | Multilingual Complex Named Entity Recognition 2022 (EN) | SQAC-SQUAD 2016: Question answering (EN) | Semantic Textual Similarity 2017 (EN) | DIANN 2018: Negation detection (EN) |
|---|---|---|---|---|---|---|---|
| 1 | ixambert-base-cased | 0.804 | 0.98 | 0.65 | 0.80 | 0.82 | 0.77 |
| 2 | bert-base-cased | 0.784 | 0.97 | 0.68 | 0.78 | 0.82 | 0.67 |
| 3 | distilbert-base-uncased | 0.800 | 0.97 | 0.67 | 0.77 | 0.81 | 0.78 |
| 4 | roberta-large | 0.864 | 0.98 | 0.75 | 0.88 | 0.86 | 0.85 |
| 5 | roberta-base | 0.852 | 0.98 | 0.70 | 0.85 | 0.85 | 0.88 |
| 6 | distilbert-base-multilingual-cased | 0.774 | 0.97 | 0.63 | 0.75 | 0.76 | 0.76 |
| 7 | xlm-roberta-large | 0.868 | 0.98 | 0.74 | 0.86 | 0.84 | 0.92 |
| 8 | xlm-roberta-base | 0.808 | 0.98 | 0.69 | 0.80 | 0.80 | 0.77 |
| 9 | bert-base-multilingual-cased | 0.784 | 0.97 | 0.67 | 0.81 | 0.80 | 0.67 |