ODESIA Leaderboard v2: Results
ODESIA Core Tasks
Tasks | Spanish baseline | Best result Spanish | English baseline | Best result English | Gap |
---|---|---|---|---|---|
EXIST 2022: Sexism detection (ES) | 0.69 | 0.77 | 0.67 | 0.81 | 17% |
EXIST 2022: Sexism categorisation (ES) | 0.46 | 0.57 | 0.44 | 0.58 | 10% |
DIPROMATS 2023: Propaganda identification (ES) | 0.75 | 0.82 | 0.71 | 0.82 | 11% |
DIPROMATS 2023: Coarse propaganda characterization (ES) | 0.22 | 0.47 | 0.21 | 0.55 | 48% |
DIPROMATS 2023: Fine-grained propaganda characterization (ES) | 0.09 | 0.26 | 0.08 | 0.47 | 299% |
DIANN 2023: Disability detection (ES) | 0.75 | 0.84 | 0.67 | 0.79 | 1% |
EXIST-2023: Sexism identification (ES) | 0.47 | 0.64 | 0.44 | 0.64 | 10% |
EXIST-2023: Source Intention (ES) | 0.25 | 0.42 | 0.22 | 0.36 | -4% |
EXIST-2023: Sexism categorization (ES) | 0.22 | 0.40 | 0.21 | 0.40 | 12% |
SQAC-SQUAD 2024: Question answering (ES) | 0.13 | 0.46 | 0.12 | 0.46 | 19% |
# | System | Arithmetic mean | EXIST 2022: Sexism detection (ES) | EXIST 2022: Sexism categorisation (ES) | DIPROMATS 2023: Propaganda identification (ES) | DIPROMATS 2023: Coarse propaganda characterization (ES) | DIPROMATS 2023: Fine-grained propaganda characterization (ES) | DIANN 2023: Disability detection (ES) | EXIST-2023: Sexism identification (ES) | EXIST-2023: Source Intention (ES) | EXIST-2023: Sexism categorization (ES) | SQAC-SQUAD 2024: Question answering (ES) |
---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | distilbert-base-multilingual-cased | 0.459 | 0.72 | 0.47 | 0.75 | 0.34 | 0.09 | 0.78 | 0.57 | 0.36 | 0.29 | 0.22 |
2 | distillbert-base-spanish-uncased | 0.473 | 0.72 | 0.51 | 0.77 | 0.34 | 0.07 | 0.75 | 0.60 | 0.39 | 0.33 | 0.25 |
3 | xlm-roberta-base | 0.515 | 0.74 | 0.50 | 0.79 | 0.47 | 0.10 | 0.84 | 0.62 | 0.40 | 0.32 | 0.37 |
4 | ixambert-base-cased | 0.485 | 0.71 | 0.49 | 0.77 | 0.32 | 0.06 | 0.83 | 0.60 | 0.37 | 0.34 | 0.36 |
5 | bert-base-multilingual-cased | 0.488 | 0.72 | 0.47 | 0.78 | 0.35 | 0.10 | 0.84 | 0.60 | 0.37 | 0.33 | 0.32 |
6 | bert-base-spanish-wwm-cased | 0.524 | 0.72 | 0.54 | 0.79 | 0.44 | 0.14 | 0.81 | 0.63 | 0.39 | 0.37 | 0.41 |
7 | PlanTL-GOB-ES-roberta-base-bne | 0.521 | 0.74 | 0.56 | 0.81 | 0.42 | 0.12 | 0.75 | 0.63 | 0.40 | 0.37 | 0.41 |
8 | bertin-roberta-base-spanish | 0.493 | 0.73 | 0.49 | 0.76 | 0.36 | 0.08 | 0.75 | 0.62 | 0.39 | 0.33 | 0.42 |
9 | PlanTL-GOB-ES-roberta-large-bne | 0.552 | 0.75 | 0.57 | 0.82 | 0.44 | 0.24 | 0.82 | 0.64 | 0.40 | 0.38 | 0.46 |
10 | xlm-roberta-large | 0.564 | 0.77 | 0.56 | 0.82 | 0.47 | 0.26 | 0.84 | 0.64 | 0.42 | 0.40 | 0.46 |
# | System | Arithmetic mean | EXIST 2022: Sexism detection (EN) | EXIST 2022: Sexism categorisation (EN) | DIANN 2023: Disability detection (EN) | DIPROMATS 2023: Propaganda identification (EN) | DIPROMATS 2023: Coarse propaganda characterization (EN) | DIPROMATS 2023: Fine-grained propaganda characterization (EN) | EXIST-2023: Sexism categorization (EN) | EXIST-2023: Sexism identification (EN) | EXIST-2023: Source intention (EN) | SQAC-SQUAD 2024: Question answering (EN) |
---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | bert-base-multilingual-cased | 0.501 | 0.76 | 0.50 | 0.73 | 0.80 | 0.48 | 0.18 | 0.34 | 0.60 | 0.32 | 0.30 |
2 | distilbert-base-multilingual-cased | 0.472 | 0.74 | 0.53 | 0.68 | 0.77 | 0.45 | 0.16 | 0.30 | 0.58 | 0.31 | 0.20 |
3 | distilbert-base-uncased | 0.497 | 0.77 | 0.55 | 0.66 | 0.78 | 0.47 | 0.14 | 0.37 | 0.62 | 0.34 | 0.27 |
4 | bert-base-cased | 0.513 | 0.76 | 0.53 | 0.72 | 0.81 | 0.50 | 0.21 | 0.37 | 0.61 | 0.32 | 0.30 |
5 | ixambert-base-cased | 0.503 | 0.75 | 0.53 | 0.73 | 0.78 | 0.49 | 0.14 | 0.36 | 0.61 | 0.32 | 0.32 |
6 | xlm-roberta-base | 0.517 | 0.76 | 0.53 | 0.76 | 0.80 | 0.54 | 0.16 | 0.35 | 0.62 | 0.32 | 0.33 |
7 | roberta-base | 0.530 | 0.78 | 0.53 | 0.75 | 0.81 | 0.52 | 0.19 | 0.38 | 0.63 | 0.33 | 0.38 |
8 | xlm-roberta-large | 0.565 | 0.79 | 0.56 | 0.78 | 0.81 | 0.52 | 0.39 | 0.39 | 0.63 | 0.36 | 0.42 |
9 | roberta-large | 0.587 | 0.81 | 0.58 | 0.79 | 0.82 | 0.55 | 0.47 | 0.40 | 0.64 | 0.35 | 0.46 |
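The "Arithmetic mean" column in the leaderboards above appears to be the unweighted mean of the per-task scores shown in each row. A minimal Python sanity check, using two rows copied from the Spanish core-task table (the tolerance allows for the displayed scores being rounded to two decimals while the means were presumably computed from unrounded values):

```python
# Recompute the "Arithmetic mean" column from the displayed per-task scores.
# Rows copied verbatim from the Spanish core-task leaderboard above.
rows = {
    "distilbert-base-multilingual-cased": (
        0.459,
        [0.72, 0.47, 0.75, 0.34, 0.09, 0.78, 0.57, 0.36, 0.29, 0.22],
    ),
    "xlm-roberta-large": (
        0.564,
        [0.77, 0.56, 0.82, 0.47, 0.26, 0.84, 0.64, 0.42, 0.40, 0.46],
    ),
}

for name, (reported, scores) in rows.items():
    mean = sum(scores) / len(scores)
    # Allow a small tolerance: per-task scores are shown rounded to 2 decimals,
    # so the recomputed mean can drift slightly from the reported 3-decimal mean.
    assert abs(mean - reported) < 0.005, (name, mean, reported)
    print(f"{name}: recomputed mean {mean:.3f} vs reported {reported}")
```

The same check holds for the extended-task tables (e.g. ixambert-base-cased: (0.96 + 0.63 + 0.71 + 0.81) / 4 = 0.778, matching the reported value).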
ODESIA Extended Tasks
Tasks | Spanish baseline | Best result Spanish | English baseline | Best result English | Gap |
---|---|---|---|---|---|
MLDOC 2018: Document classification (ES) | 0.93 | 0.96 | 0.88 | 0.98 | 40% |
Multilingual Complex Named Entity Recognition 2022 (ES) | 0.52 | 0.71 | 0.55 | 0.75 | 5% |
SQAC-SQUAD 2016: Question answering (ES) | 0.53 | 0.77 | 0.52 | 0.88 | 25% |
Semantic Textual Similarity 2017 (ES) | 0.68 | 0.81 | 0.70 | 0.86 | 13% |
# | System | Arithmetic mean | MLDOC 2018: Document classification (ES) | Multilingual Complex Named Entity Recognition 2022 (ES) | SQAC-SQUAD 2016: Question answering (ES) | Semantic Textual Similarity 2017 (ES) |
---|---|---|---|---|---|---|
1 | ixambert-base-cased | 0.778 | 0.96 | 0.63 | 0.71 | 0.81 |
2 | bertin-roberta-base-spanish | 0.745 | 0.96 | 0.62 | 0.73 | 0.67 |
3 | distilbert-base-multilingual-cased | 0.698 | 0.94 | 0.61 | 0.55 | 0.69 |
4 | bert-base-multilingual-cased | 0.753 | 0.96 | 0.64 | 0.71 | 0.70 |
5 | xlm-roberta-base | 0.753 | 0.95 | 0.66 | 0.67 | 0.73 |
6 | distillbert-base-spanish-uncased | 0.710 | 0.96 | 0.61 | 0.53 | 0.74 |
7 | PlanTL-GOB-ES-roberta-base-bne | 0.773 | 0.96 | 0.64 | 0.74 | 0.75 |
8 | PlanTL-GOB-ES-roberta-large-bne | 0.780 | 0.96 | 0.63 | 0.77 | 0.76 |
9 | bert-base-spanish-wwm-cased | 0.773 | 0.96 | 0.63 | 0.71 | 0.79 |
10 | xlm-roberta-large | 0.810 | 0.96 | 0.71 | 0.77 | 0.80 |
# | System | Arithmetic mean | MLDOC 2018: Document classification (EN) | Multilingual Complex Named Entity Recognition 2022 (EN) | SQAC-SQUAD 2016: Question answering (EN) | Semantic Textual Similarity 2017 (EN) |
---|---|---|---|---|---|---|
1 | bert-base-multilingual-cased | 0.813 | 0.97 | 0.67 | 0.81 | 0.80 |
2 | ixambert-base-cased | 0.813 | 0.98 | 0.65 | 0.80 | 0.82 |
3 | distilbert-base-multilingual-cased | 0.778 | 0.97 | 0.63 | 0.75 | 0.76 |
4 | xlm-roberta-base | 0.818 | 0.98 | 0.69 | 0.80 | 0.80 |
5 | distilbert-base-uncased | 0.805 | 0.97 | 0.67 | 0.77 | 0.81 |
6 | bert-base-cased | 0.813 | 0.97 | 0.68 | 0.78 | 0.82 |
7 | roberta-base | 0.845 | 0.98 | 0.70 | 0.85 | 0.85 |
8 | roberta-large | 0.868 | 0.98 | 0.75 | 0.88 | 0.86 |
9 | xlm-roberta-large | 0.855 | 0.98 | 0.74 | 0.86 | 0.84 |