ODESIA Leaderboard v2 - Results

ODESIA Core Tasks

Task	Spanish baseline	Best result in Spanish	English baseline	Best result in English	Gap
EXIST 2022: Sexism detection (ES)	0.69	0.77	0.67	0.81	17%
EXIST 2022: Sexism categorisation (ES)	0.46	0.57	0.44	0.58	10%
DIPROMATS 2023: Propaganda identification (ES)	0.75	0.82	0.71	0.82	11%
DIPROMATS 2023: Coarse propaganda characterization (ES)	0.22	0.47	0.21	0.55	48%
DIPROMATS 2023: Fine-grained propaganda characterization (ES)	0.09	0.26	0.08	0.47	299%
DIANN 2023: Disability detection (ES)	0.75	0.84	0.67	0.79	1%
EXIST-2023: Sexism identification (ES)	0.47	0.64	0.44	0.64	10%
EXIST-2023: Source Intention (ES)	0.25	0.42	0.22	0.36	-4%
EXIST-2023: Sexism categorization (ES)	0.22	0.40	0.21	0.40	12%
SQAC-SQUAD 2024: Question answering (ES)	0.13	0.46	0.12	0.46	19%

#	System	Arithmetic mean	EXIST 2022: Sexism detection (ES)	EXIST 2022: Sexism categorisation (ES)	DIPROMATS 2023: Propaganda identification (ES)	DIPROMATS 2023: Coarse propaganda characterization (ES)	DIPROMATS 2023: Fine-grained propaganda characterization (ES)	DIANN 2023: Disability detection (ES)	EXIST-2023: Sexism identification (ES)	EXIST-2023: Source Intention (ES)	EXIST-2023: Sexism categorization (ES)	SQAC-SQUAD 2024: Question answering (ES)
1	distilbert-base-multilingual-cased	0.459	0.72	0.47	0.75	0.34	0.09	0.78	0.57	0.36	0.29	0.22
2	distillbert-base-spanish-uncased	0.473	0.72	0.51	0.77	0.34	0.07	0.75	0.60	0.39	0.33	0.25
3	xlm-roberta-base	0.515	0.74	0.50	0.79	0.47	0.10	0.84	0.62	0.40	0.32	0.37
4	ixambert-base-cased	0.485	0.71	0.49	0.77	0.32	0.06	0.83	0.60	0.37	0.34	0.36
5	bert-base-multilingual-cased	0.488	0.72	0.47	0.78	0.35	0.10	0.84	0.60	0.37	0.33	0.32
6	bert-base-spanish-wwm-cased	0.524	0.72	0.54	0.79	0.44	0.14	0.81	0.63	0.39	0.37	0.41
7	PlanTL-GOB-ES-roberta-base-bne	0.521	0.74	0.56	0.81	0.42	0.12	0.75	0.63	0.40	0.37	0.41
8	bertin-roberta-base-spanish	0.493	0.73	0.49	0.76	0.36	0.08	0.75	0.62	0.39	0.33	0.42
9	PlanTL-GOB-ES-roberta-large-bne	0.552	0.75	0.57	0.82	0.44	0.24	0.82	0.64	0.40	0.38	0.46
10	xlm-roberta-large	0.564	0.77	0.56	0.82	0.47	0.26	0.84	0.64	0.42	0.40	0.46
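
The arithmetic mean column ("Media aritmética") appears to be the unweighted mean of the per-task scores in each row, rounded to three decimals. The sketch below checks that reading against two rows of the table above. The helper name `arithmetic_mean` is hypothetical and is not part of any ODESIA tooling. The check relies on the rounded values shown here, so it is an illustration under that assumption and not the leaderboard's own computation.

```python
from statistics import mean

# Per-task scores copied from two rows of the Spanish core-task table above.
ROWS = {
    "distilbert-base-multilingual-cased": [0.72, 0.47, 0.75, 0.34, 0.09,
                                           0.78, 0.57, 0.36, 0.29, 0.22],
    "xlm-roberta-large": [0.77, 0.56, 0.82, 0.47, 0.26,
                          0.84, 0.64, 0.42, 0.40, 0.46],
}


def arithmetic_mean(task_scores):
    """Unweighted mean of per-task scores, rounded to three decimals.

    Hypothetical helper: assumes every task counts equally and that a
    missing result is entered as 0.00 rather than skipped.
    """
    return round(mean(task_scores), 3)


if __name__ == "__main__":
    for system, scores in ROWS.items():
        print(system, arithmetic_mean(scores))
```

Under the same assumption, a system with no result on a task is scored 0.00 for that task, which pulls its mean down sharply. This is consistent with the low means of the rows that contain many 0.00 entries.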

#	System	Arithmetic mean	EXIST 2022: Sexism detection (EN)	EXIST 2022: Sexism categorisation (EN)	DIANN 2023: Disability detection (EN)	DIPROMATS 2023: Propaganda identification (EN)	DIPROMATS 2023: Coarse propaganda characterization (EN)	DIPROMATS 2023: Fine-grained propaganda characterization (EN)	EXIST-2023: Sexism identification (ES)	EXIST-2023: Source Intention (ES)	EXIST-2023: Sexism categorization (ES)	EXIST-2023: Sexism categorization (EN)	EXIST-2023: Sexism identification (EN)	EXIST-2023: Source intention (EN)	SQAC-SQUAD 2024: Question answering (EN)
1	bert-base-multilingual-cased	0.485	0.76	0.50	0.73	0.80	0.48	0.18	0.60	0.37	0.33	0.34	0.60	0.32	0.30
2	distilbert-base-multilingual-cased	0.457	0.74	0.53	0.68	0.77	0.45	0.16	0.57	0.36	0.29	0.30	0.58	0.31	0.20
3	distilbert-base-uncased	0.382	0.77	0.55	0.66	0.78	0.47	0.14	0.37	0.62	0.34	0.27	0.00	0.00	0.00
4	bert-base-cased	0.395	0.76	0.53	0.72	0.81	0.50	0.21	0.37	0.61	0.32	0.30	0.00	0.00	0.00
5	ixambert-base-cased	0.488	0.75	0.53	0.73	0.78	0.49	0.14	0.60	0.37	0.34	0.36	0.61	0.32	0.32
6	xlm-roberta-base	0.501	0.76	0.53	0.76	0.80	0.54	0.16	0.62	0.40	0.32	0.35	0.62	0.32	0.33
7	roberta-base	0.408	0.78	0.53	0.75	0.81	0.52	0.19	0.38	0.63	0.33	0.38	0.00	0.00	0.00
8	xlm-roberta-large	0.547	0.79	0.56	0.78	0.81	0.52	0.39	0.64	0.42	0.40	0.39	0.63	0.36	0.42
9	roberta-large	0.452	0.81	0.58	0.79	0.82	0.55	0.47	0.40	0.64	0.35	0.46	0.00	0.00	0.00
10	distillbert-base-spanish-uncased	0.102	0.60	0.39	0.33	0.00	0.00	0.00	0.00	0.00	0.00	0.00	0.00	0.00	0.00
11	PlanTL-GOB-ES-roberta-base-bne	0.108	0.63	0.40	0.37	0.00	0.00	0.00	0.00	0.00	0.00	0.00	0.00	0.00	0.00
12	bertin-roberta-base-spanish	0.103	0.62	0.39	0.33	0.00	0.00	0.00	0.00	0.00	0.00	0.00	0.00	0.00	0.00
13	bert-base-spanish-wwm-cased	0.107	0.63	0.39	0.37	0.00	0.00	0.00	0.00	0.00	0.00	0.00	0.00	0.00	0.00
14	PlanTL-GOB-ES-roberta-large-bne	0.109	0.64	0.40	0.38	0.00	0.00	0.00	0.00	0.00	0.00	0.00	0.00	0.00	0.00

ODESIA Extended Tasks

Task	Spanish baseline	Best result in Spanish	English baseline	Best result in English	Gap
MLDOC 2018: Document classification (ES)	0.93	0.96	0.88	0.98	40%
Multilingual Complex Named Entity Recognition 2022 (ES)	0.52	0.71	0.55	0.75	5%
SQAC-SQUAD 2016: Question answering (ES)	0.53	0.77	0.52	0.88	25%
Semantic Textual Similarity 2017 (ES)	0.68	0.81	0.70	0.86	13%
DIANN 2018: Negation detection (ES)	0.75	0.96	0.42	0.92	93%

#	System	Arithmetic mean	MLDOC 2018: Document classification (ES)	Multilingual Complex Named Entity Recognition 2022 (ES)	SQAC-SQUAD 2016: Question answering (ES)	Semantic Textual Similarity 2017 (ES)	DIANN 2018: Negation detection (ES)
1	xlm-roberta-base	0.772	0.95	0.66	0.67	0.73	0.85
2	xlm-roberta-large	0.832	0.96	0.71	0.77	0.80	0.92
3	bert-base-multilingual-cased	0.750	0.96	0.64	0.71	0.70	0.74
4	distilbert-base-multilingual-cased	0.724	0.94	0.61	0.55	0.69	0.83
5	PlanTL-GOB-ES-roberta-base-bne	0.792	0.96	0.64	0.74	0.75	0.87
6	PlanTL-GOB-ES-roberta-large-bne	0.730	0.96	0.63	0.77	0.76	0.53
7	bertin-roberta-base-spanish	0.772	0.96	0.62	0.73	0.67	0.88
8	bert-base-spanish-wwm-cased	0.810	0.96	0.63	0.71	0.79	0.96
9	distillbert-base-spanish-uncased	0.724	0.96	0.61	0.53	0.74	0.78
10	ixambert-base-cased	0.768	0.96	0.63	0.71	0.81	0.73

#	System	Arithmetic mean	MLDOC 2018: Document classification (EN)	Multilingual Complex Named Entity Recognition 2022 (EN)	SQAC-SQUAD 2016: Question answering (EN)	Semantic Textual Similarity 2017 (EN)	DIANN 2018: Negation detection (EN)
1	ixambert-base-cased	0.804	0.98	0.65	0.80	0.82	0.77
2	bert-base-cased	0.784	0.97	0.68	0.78	0.82	0.67
3	distilbert-base-uncased	0.800	0.97	0.67	0.77	0.81	0.78
4	roberta-large	0.864	0.98	0.75	0.88	0.86	0.85
5	roberta-base	0.852	0.98	0.70	0.85	0.85	0.88
6	distilbert-base-multilingual-cased	0.774	0.97	0.63	0.75	0.76	0.76
7	xlm-roberta-large	0.868	0.98	0.74	0.86	0.84	0.92
8	xlm-roberta-base	0.808	0.98	0.69	0.80	0.80	0.77
9	bert-base-multilingual-cased	0.784	0.97	0.67	0.81	0.80	0.67