Leaderboard ODESIA v1 - Results

Odesia Core Tasks

Tasks	Spanish baseline	Best result Spanish	English baseline	Best result English	Gap
EXIST 2022: Sexism detection	0.69	0.78	0.67	0.82	40%
EXIST 2022: Sexism categorisation	0.46	0.57	0.44	0.59	30%
DIPROMATS 2023: Propaganda identification	0.75	0.83	0.71	0.81	25%
DIPROMATS 2023: Coarse propaganda characterization	0.22	0.56	0.21	0.55	2%
DIPROMATS 2023: Fine-grained propaganda characterization	0.09	0.45	0.08	0.55	23%
DIANN 2023: Disability detection	0.75	0.79	0.67	0.80	71%

System	Arithmetic mean	EXIST 2022: Sexism detection	EXIST 2022: Sexism categorisation	DIPROMATS 2023: Propaganda identification	DIPROMATS 2023: Coarse propaganda characterization	DIPROMATS 2023: Fine-grained propaganda characterization	DIANN 2023: Disability detection
Qwen2.5-7B	0.6363	0.7816	0.5526	0.8282	0.5572	0.4528	0.6455
Mistral-7B-v03	0.6218	0.7701	0.5476	0.8104	0.5279	0.4338	0.6408
Llama-3.1-8B	0.6273	0.7663	0.5682	0.8169	0.5385	0.4419	0.6319
Ixa ehu ixambert base cased	0.5201	0.6743	0.4875	0.7666	0.3796	0.0543	0.7580
Bertin roberta base spanish	0.5288	0.7280	0.4941	0.7596	0.2532	0.2500	0.6877
Xlm roberta large	0.6528	0.7663	0.5593	0.8186	0.5343	0.4527	0.7855
Xlm roberta base	0.5880	0.7395	0.4997	0.7894	0.4504	0.2668	0.7819
PlanTL GOB ES roberta large bne	0.6152	0.7241	0.5668	0.8177	0.5173	0.3894	0.6757
PlanTL GOB ES roberta base bne	0.6013	0.7356	0.5554	0.8149	0.4906	0.2944	0.7169
Distilbert base multilingual cased	0.5421	0.7222	0.4669	0.7507	0.4036	0.2222	0.6868
Dccuchile bert base spanish wwm cased	0.5953	0.7146	0.5370	0.7916	0.4874	0.2931	0.7478
CenIA distillbert base spanish uncased	0.5423	0.7203	0.5118	0.7708	0.4198	0.1782	0.6531
Bert base multilingual cased	0.5687	0.7222	0.4693	0.7821	0.4231	0.2562	0.7592

System	Arithmetic mean	EXIST 2022: Sexism detection	EXIST 2022: Sexism categorisation	DIANN 2023: Disability detection	DIPROMATS 2023: Propaganda identification	DIPROMATS 2023: Coarse propaganda characterization	DIPROMATS 2023: Fine-grained propaganda characterization
Qwen2.5-7B	0.6517	0.8129	0.5490	0.6623	0.8119	0.5359	0.5382
Mistral-7B-v03	0.6544	0.8226	0.5297	0.6731	0.8085	0.5539	0.5384
Llama-3.1-8B	0.6479	0.8031	0.5870	0.6385	0.7845	0.5414	0.5331
Ixa ehu ixambert base cased	0.6091	0.7563	0.5300	0.7450	0.7796	0.4430	0.4004
Xlm roberta large	0.6506	0.7953	0.5422	0.7740	0.7931	0.4867	0.5123
Xlm roberta base	0.6056	0.7661	0.5345	0.7438	0.7791	0.4329	0.3773
Roberta large	0.6788	0.8187	0.5846	0.7982	0.7984	0.5204	0.5526
Roberta base	0.6294	0.7875	0.5258	0.7612	0.7799	0.4811	0.4406
Distilbert base uncased	0.5828	0.7739	0.5486	0.6966	0.7687	0.4054	0.3035
Distilbert base multilingual cased	0.5665	0.7388	0.4792	0.6950	0.7471	0.3794	0.3592
Bert base cased	0.6142	0.7641	0.5344	0.7364	0.7763	0.4468	0.4271
Bert base multilingual cased	0.5971	0.7563	0.5022	0.7384	0.7709	0.4266	0.3884

Odesia Extended Tasks

Tasks	Spanish baseline	Best result Spanish	English baseline	Best result English	Gap
MLDOC 2018: Document classification	0.93	0.96	0.88	0.98	66%
Multilingual Complex Named Entity Recognition 2022	0.52	0.68	0.55	0.70	-6%
SQAC-SQUAD 2016: Question answering	0.53	0.80	0.52	0.87	22%
Semantic Textual Similarity 2017	0.68	0.83	0.70	0.87	8%

System	Arithmetic mean	MLDOC 2018: Document classification	Multilingual Complex Named Entity Recognition 2022	SQAC-SQUAD 2016: Question answering	Semantic Textual Similarity 2017
Qwen2.5-7B	0.7213	0.9627	0.5490	0.8040	0.5694
Mistral-7B-v03	0.7663	0.9555	0.5767	0.7867	0.7463
Llama-3.1-8B	0.7497	0.9636	0.5457	0.7889	0.7006
Ixa ehu ixambert base cased	0.7764	0.9579	0.5926	0.7429	0.8120
Bertin roberta base spanish	0.7484	0.9605	0.5215	0.7298	0.7818
Xlm roberta large	0.8156	0.9641	0.6801	0.7895	0.8287
Xlm roberta base	0.7646	0.9534	0.6201	0.6988	0.7861
PlanTL GOB ES roberta large bne	0.7922	0.9567	0.6069	0.7818	0.8232
PlanTL GOB ES roberta base bne	0.7823	0.9570	0.6041	0.7584	0.8096
Distilbert base multilingual cased	0.7088	0.9425	0.5580	0.5566	0.7781
Dccuchile bert base spanish wwm cased	0.7661	0.9564	0.5472	0.7276	0.8330
CenIA distillbert base spanish uncased	0.7182	0.9553	0.5894	0.5329	0.7951
Bert base multilingual cased	0.7613	0.9562	0.5992	0.6976	0.7920
distilbert-base-multilingual-cased	0.1375	0.0000	0.0000	0.5500	0.0000

System	Arithmetic mean	MLDOC 2018: Document classification	Multilingual Complex Named Entity Recognition 2022	SQAC-SQUAD 2016: Question answering	Semantic Textual Similarity 2017
Qwen2.5-7B	0.7371	0.9817	0.5333	0.8240	0.6092
Mistral-7B-v03	0.7769	0.9822	0.5681	0.8107	0.7465
Llama-3.1-8B	0.7783	0.9809	0.5426	0.8197	0.7699
Ixa ehu ixambert base cased	0.8047	0.9756	0.6075	0.8187	0.8170
Xlm roberta large	0.8457	0.9789	0.7007	0.8581	0.8450
Xlm roberta base	0.7984	0.9761	0.6080	0.7998	0.8097
Roberta large	0.8556	0.9832	0.7012	0.8724	0.8656
Roberta base	0.8345	0.9802	0.6577	0.8427	0.8572
Distilbert base uncased	0.8063	0.9726	0.6563	0.7602	0.8360
Distilbert base multilingual cased	0.7681	0.9693	0.5693	0.7467	0.7872
Bert base cased	0.8036	0.9749	0.5993	0.7968	0.8434
Bert base multilingual cased	0.8035	0.9716	0.6252	0.8059	0.8112