ODESIA Leaderboard v2 - Results
ODESIA Core Tasks
Task | Spanish baseline | Best result in Spanish | English baseline | Best result in English | Gap |
---|---|---|---|---|---|
EXIST 2022: Sexism detection (ES) | 0.69 | 0.77 | 0.67 | 0.81 | 17% |
EXIST 2022: Sexism categorisation (ES) | 0.46 | 0.57 | 0.44 | 0.58 | 10% |
DIPROMATS 2023: Propaganda identification (ES) | 0.75 | 0.82 | 0.71 | 0.82 | 11% |
DIPROMATS 2023: Coarse propaganda characterization (ES) | 0.22 | 0.47 | 0.21 | 0.55 | 48% |
DIPROMATS 2023: Fine-grained propaganda characterization (ES) | 0.09 | 0.26 | 0.08 | 0.47 | 299% |
DIANN 2023: Disability detection (ES) | 0.75 | 0.84 | 0.67 | 0.79 | 1% |
EXIST-2023: Sexism identification (ES) | 0.47 | 0.64 | 0.44 | 0.64 | 10% |
EXIST-2023: Source Intention (ES) | 0.25 | 0.42 | 0.22 | 0.36 | -4% |
EXIST-2023: Sexism categorization (ES) | 0.22 | 0.40 | 0.21 | 0.40 | 12% |
SQAC-SQUAD 2024: Question answering (ES) | 0.13 | 0.46 | 0.12 | 0.46 | 19% |
# | System | Arithmetic mean | EXIST 2022: Sexism detection (ES) | EXIST 2022: Sexism categorisation (ES) | DIPROMATS 2023: Propaganda identification (ES) | DIPROMATS 2023: Coarse propaganda characterization (ES) | DIPROMATS 2023: Fine-grained propaganda characterization (ES) | DIANN 2023: Disability detection (ES) | EXIST-2023: Sexism identification (ES) | EXIST-2023: Source Intention (ES) | EXIST-2023: Sexism categorization (ES) | SQAC-SQUAD 2024: Question answering (ES) |
---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | distilbert-base-multilingual-cased | 0.459 | 0.72 | 0.47 | 0.75 | 0.34 | 0.09 | 0.78 | 0.57 | 0.36 | 0.29 | 0.22 |
2 | distillbert-base-spanish-uncased | 0.473 | 0.72 | 0.51 | 0.77 | 0.34 | 0.07 | 0.75 | 0.60 | 0.39 | 0.33 | 0.25 |
3 | xlm-roberta-base | 0.515 | 0.74 | 0.50 | 0.79 | 0.47 | 0.10 | 0.84 | 0.62 | 0.40 | 0.32 | 0.37 |
4 | ixambert-base-cased | 0.485 | 0.71 | 0.49 | 0.77 | 0.32 | 0.06 | 0.83 | 0.60 | 0.37 | 0.34 | 0.36 |
5 | bert-base-multilingual-cased | 0.488 | 0.72 | 0.47 | 0.78 | 0.35 | 0.10 | 0.84 | 0.60 | 0.37 | 0.33 | 0.32 |
6 | bert-base-spanish-wwm-cased | 0.524 | 0.72 | 0.54 | 0.79 | 0.44 | 0.14 | 0.81 | 0.63 | 0.39 | 0.37 | 0.41 |
7 | PlanTL-GOB-ES-roberta-base-bne | 0.521 | 0.74 | 0.56 | 0.81 | 0.42 | 0.12 | 0.75 | 0.63 | 0.40 | 0.37 | 0.41 |
8 | bertin-roberta-base-spanish | 0.493 | 0.73 | 0.49 | 0.76 | 0.36 | 0.08 | 0.75 | 0.62 | 0.39 | 0.33 | 0.42 |
9 | PlanTL-GOB-ES-roberta-large-bne | 0.552 | 0.75 | 0.57 | 0.82 | 0.44 | 0.24 | 0.82 | 0.64 | 0.40 | 0.38 | 0.46 |
10 | xlm-roberta-large | 0.564 | 0.77 | 0.56 | 0.82 | 0.47 | 0.26 | 0.84 | 0.64 | 0.42 | 0.40 | 0.46 |
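The arithmetic mean column in the table above appears to be the unweighted average of the ten per-task scores in each row. The following is a minimal sketch that checks this for the first row; the function name `arithmetic_mean` and the rounding to three decimals are assumptions made for illustration, not part of the leaderboard's published tooling.

```python
# Per-task scores for distilbert-base-multilingual-cased, copied from row 1 above.
scores = [0.72, 0.47, 0.75, 0.34, 0.09, 0.78, 0.57, 0.36, 0.29, 0.22]


def arithmetic_mean(values):
    """Unweighted mean of the per-task scores, rounded to three decimals.

    The three-decimal rounding is an assumption chosen to match the table's format.
    """
    return round(sum(values) / len(values), 3)


mean_score = arithmetic_mean(scores)
print(mean_score)
```

Because every task carries the same weight, a low score on a hard task such as fine-grained propaganda characterization pulls the average down as much as a low score on any other task.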
# | System | Arithmetic mean | EXIST 2022: Sexism detection (EN) | EXIST 2022: Sexism categorisation (EN) | DIANN 2023: Disability detection (EN) | DIPROMATS 2023: Propaganda identification (EN) | DIPROMATS 2023: Coarse propaganda characterization (EN) | DIPROMATS 2023: Fine-grained propaganda characterization (EN) | EXIST-2023: Sexism identification (ES) | EXIST-2023: Source Intention (ES) | EXIST-2023: Sexism categorization (ES) | EXIST-2023: Sexism categorization (EN) | EXIST-2023: Sexism identification (EN) | EXIST-2023: Source intention (EN) | SQAC-SQUAD 2024: Question answering (EN) |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | bert-base-multilingual-cased | 0.485 | 0.76 | 0.50 | 0.73 | 0.80 | 0.48 | 0.18 | 0.60 | 0.37 | 0.33 | 0.34 | 0.60 | 0.32 | 0.30 |
2 | distilbert-base-multilingual-cased | 0.457 | 0.74 | 0.53 | 0.68 | 0.77 | 0.45 | 0.16 | 0.57 | 0.36 | 0.29 | 0.30 | 0.58 | 0.31 | 0.20 |
3 | distilbert-base-uncased | 0.382 | 0.77 | 0.55 | 0.66 | 0.78 | 0.47 | 0.14 | 0.37 | 0.62 | 0.34 | 0.27 | 0.00 | 0.00 | 0.00 |
4 | bert-base-cased | 0.395 | 0.76 | 0.53 | 0.72 | 0.81 | 0.50 | 0.21 | 0.37 | 0.61 | 0.32 | 0.30 | 0.00 | 0.00 | 0.00 |
5 | ixambert-base-cased | 0.488 | 0.75 | 0.53 | 0.73 | 0.78 | 0.49 | 0.14 | 0.60 | 0.37 | 0.34 | 0.36 | 0.61 | 0.32 | 0.32 |
6 | xlm-roberta-base | 0.501 | 0.76 | 0.53 | 0.76 | 0.80 | 0.54 | 0.16 | 0.62 | 0.40 | 0.32 | 0.35 | 0.62 | 0.32 | 0.33 |
7 | roberta-base | 0.408 | 0.78 | 0.53 | 0.75 | 0.81 | 0.52 | 0.19 | 0.38 | 0.63 | 0.33 | 0.38 | 0.00 | 0.00 | 0.00 |
8 | xlm-roberta-large | 0.547 | 0.79 | 0.56 | 0.78 | 0.81 | 0.52 | 0.39 | 0.64 | 0.42 | 0.40 | 0.39 | 0.63 | 0.36 | 0.42 |
9 | roberta-large | 0.452 | 0.81 | 0.58 | 0.79 | 0.82 | 0.55 | 0.47 | 0.40 | 0.64 | 0.35 | 0.46 | 0.00 | 0.00 | 0.00 |
10 | distillbert-base-spanish-uncased | 0.102 | 0.60 | 0.39 | 0.33 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
11 | PlanTL-GOB-ES-roberta-base-bne | 0.108 | 0.63 | 0.40 | 0.37 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
12 | bertin-roberta-base-spanish | 0.103 | 0.62 | 0.39 | 0.33 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
13 | bert-base-spanish-wwm-cased | 0.107 | 0.63 | 0.39 | 0.37 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
14 | PlanTL-GOB-ES-roberta-large-bne | 0.109 | 0.64 | 0.40 | 0.38 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
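When reading the averages in the table above, note that the 0.00 entries seem to be included in the mean, so rows with many zeros rank far lower than their non-zero scores alone would suggest. The sketch below recomputes the average for row 10 both ways. Treating 0.00 as "no result available" is an assumption made here for illustration (the source does not say what the zeros mean), and the helper names are hypothetical.

```python
# Scores for distillbert-base-spanish-uncased, copied from row 10 above (13 task columns).
row_scores = [0.60, 0.39, 0.33] + [0.00] * 10


def mean_over_all_columns(values):
    """Average over every column, counting 0.00 entries as real scores."""
    return round(sum(values) / len(values), 3)


def mean_over_nonzero_columns(values):
    """Average over non-zero entries only (assumes 0.00 means 'no result')."""
    nonzero = [v for v in values if v > 0]
    return round(sum(nonzero) / len(nonzero), 3) if nonzero else 0.0


all_columns_mean = mean_over_all_columns(row_scores)
nonzero_mean = mean_over_nonzero_columns(row_scores)
print(all_columns_mean, nonzero_mean)
```

Dividing by all 13 columns reproduces the value listed in the table for this row, which suggests that averages of rows with and without zeros are not directly comparable.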
ODESIA Extended Tasks
Task | Spanish baseline | Best result in Spanish | English baseline | Best result in English | Gap |
---|---|---|---|---|---|
MLDOC 2018: Document classification (ES) | 0.93 | 0.96 | 0.88 | 0.98 | 40% |
Multilingual Complex Named Entity Recognition 2022 (ES) | 0.52 | 0.71 | 0.55 | 0.75 | 5% |
SQAC-SQUAD 2016: Question answering (ES) | 0.53 | 0.77 | 0.52 | 0.88 | 25% |
Semantic Textual Similarity 2017 (ES) | 0.68 | 0.81 | 0.70 | 0.86 | 13% |
DIANN 2018: Negation detection (ES) | 0.75 | 0.96 | 0.42 | 0.92 | 93% |
# | System | Arithmetic mean | MLDOC 2018: Document classification (ES) | Multilingual Complex Named Entity Recognition 2022 (ES) | SQAC-SQUAD 2016: Question answering (ES) | Semantic Textual Similarity 2017 (ES) | DIANN 2018: Negation detection (ES) |
---|---|---|---|---|---|---|---|
1 | xlm-roberta-base | 0.772 | 0.95 | 0.66 | 0.67 | 0.73 | 0.85 |
2 | xlm-roberta-large | 0.832 | 0.96 | 0.71 | 0.77 | 0.80 | 0.92 |
3 | bert-base-multilingual-cased | 0.750 | 0.96 | 0.64 | 0.71 | 0.70 | 0.74 |
4 | distilbert-base-multilingual-cased | 0.724 | 0.94 | 0.61 | 0.55 | 0.69 | 0.83 |
5 | PlanTL-GOB-ES-roberta-base-bne | 0.792 | 0.96 | 0.64 | 0.74 | 0.75 | 0.87 |
6 | PlanTL-GOB-ES-roberta-large-bne | 0.730 | 0.96 | 0.63 | 0.77 | 0.76 | 0.53 |
7 | bertin-roberta-base-spanish | 0.772 | 0.96 | 0.62 | 0.73 | 0.67 | 0.88 |
8 | bert-base-spanish-wwm-cased | 0.810 | 0.96 | 0.63 | 0.71 | 0.79 | 0.96 |
9 | distillbert-base-spanish-uncased | 0.724 | 0.96 | 0.61 | 0.53 | 0.74 | 0.78 |
10 | ixambert-base-cased | 0.768 | 0.96 | 0.63 | 0.71 | 0.81 | 0.73 |
# | System | Arithmetic mean | MLDOC 2018: Document classification (EN) | Multilingual Complex Named Entity Recognition 2022 (EN) | SQAC-SQUAD 2016: Question answering (EN) | Semantic Textual Similarity 2017 (EN) | DIANN 2018: Negation detection (EN) |
---|---|---|---|---|---|---|---|
1 | ixambert-base-cased | 0.804 | 0.98 | 0.65 | 0.80 | 0.82 | 0.77 |
2 | bert-base-cased | 0.784 | 0.97 | 0.68 | 0.78 | 0.82 | 0.67 |
3 | distilbert-base-uncased | 0.800 | 0.97 | 0.67 | 0.77 | 0.81 | 0.78 |
4 | roberta-large | 0.864 | 0.98 | 0.75 | 0.88 | 0.86 | 0.85 |
5 | roberta-base | 0.852 | 0.98 | 0.70 | 0.85 | 0.85 | 0.88 |
6 | distilbert-base-multilingual-cased | 0.774 | 0.97 | 0.63 | 0.75 | 0.76 | 0.76 |
7 | xlm-roberta-large | 0.868 | 0.98 | 0.74 | 0.86 | 0.84 | 0.92 |
8 | xlm-roberta-base | 0.808 | 0.98 | 0.69 | 0.80 | 0.80 | 0.77 |
9 | bert-base-multilingual-cased | 0.784 | 0.97 | 0.67 | 0.81 | 0.80 | 0.67 |