Monolingual document classification on the Spanish dataset of the Multilingual Document Classification Corpus (MLDoc) (Schwenk and Li, 2018), a cross-lingual document classification dataset covering 8 languages. The corpus consists of 14,458 Reuters news articles classified into four categories: Corporate/Industrial, Economics, Government/Social and Markets. The task is to assign each document to one of the four classes.
Publication
Holger Schwenk and Xian Li. 2018. A Corpus for Multilingual Document Classification in Eight Languages. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), Miyazaki, Japan. European Language Resources Association (ELRA).
Language
Spanish
NLP topic
Abstract task
Year
2018
Publication link
Ranking metric
F1
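As an illustration of the ranking metric, the following minimal sketch computes a macro-averaged F1 over the four categories, assuming that the MacroF1 column in the results is the unweighted mean of per-class F1 scores. The function name `macro_f1`, the `LABELS` list, and the toy data are hypothetical and are not taken from the benchmark's evaluation code.

```python
from collections import Counter

# Category names as listed in the task description (label encoding is an assumption).
LABELS = ["Corporate/Industrial", "Economics", "Government/Social", "Markets"]


def macro_f1(gold, predicted, labels=LABELS):
    """Return the unweighted mean of per-class F1 scores."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1   # correct prediction for class g
        else:
            fp[p] += 1   # class p was predicted but is wrong
            fn[g] += 1   # class g was missed

    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # Convention assumed here: a class with no gold and no predicted
        # documents contributes an F1 of 0.
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(labels)


# Toy data for illustration only; these are not benchmark documents.
gold = ["Economics", "Markets", "Markets",
        "Corporate/Industrial", "Government/Social"]
predicted = ["Economics", "Markets", "Economics",
             "Corporate/Industrial", "Government/Social"]
score = macro_f1(gold, predicted)
print(f"Macro F1: {score:.4f}")
```

Because every class carries equal weight in the average, an error on a less frequent category lowers the score as much as an error on a frequent one.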
Task results
System | MacroF1 |
---|---|
Xlm roberta large | 0.9641 |
Llama-3.1-8B | 0.9636 |
Qwen2.5-7B | 0.9627 |
Bertin roberta base spanish | 0.9605 |
Ixa ehu ixambert base cased | 0.9579 |
PlanTL GOB ES roberta base bne | 0.9570 |
PlanTL GOB ES roberta large bne | 0.9567 |
Dccuchile bert base spanish wwm cased | 0.9564 |
Bert base multilingual cased | 0.9562 |
Mistral-7B-v03 | 0.9555 |