MLDOC 2018: Document classification

A monolingual document classification task on the English portion of the Multilingual Document Classification Corpus (MLDoc) (Schwenk and Li, 2018), a cross-lingual document classification dataset covering 8 languages. The corpus consists of 14,458 news articles from Reuters classified into four categories: Corporate/Industrial, Economics, Government/Social and Markets. The task is to assign each document to one of these four classes.

Publication
Holger Schwenk and Xian Li. 2018. A Corpus for Multilingual Document Classification in Eight Languages. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), Miyazaki, Japan. European Language Resources Association (ELRA).
Language
English
Abstract task
Year
2018
Ranking metric
F1

Task results

System                              Precision  Recall  F1
Roberta large                       0.9832     0.9832  0.9832  0.9832  0.98
Roberta base                        0.9802     0.9802  0.9802  0.9802  0.98
Xlm roberta large                   0.9789     0.9789  0.9789  0.9789  0.98
Xlm roberta base                    0.9761     0.9761  0.9761  0.9761  0.98
Ixa ehu ixambert base cased         0.9756     0.9756  0.9756  0.9756  0.98
Bert base cased                     0.9749     0.9749  0.9749  0.9749  0.97
Distilbert base uncased             0.9726     0.9726  0.9726  0.9726  0.97
Bert base multilingual cased        0.9716     0.9716  0.9716  0.9716  0.97
Distilbert base multilingual cased  0.9693     0.9693  0.9693  0.9693  0.97
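
Every row reports the same value for Precision, Recall and F1. This is what micro-averaging produces in a single-label, multi-class task, where each misclassified document counts once as a false positive (for the predicted class) and once as a false negative (for the gold class). The page does not state which averaging is used, so the following Python sketch is only an illustration under that assumption. The function name `micro_scores` and the toy labels are hypothetical and not part of the benchmark.

```python
from collections import Counter

# The four MLDoc categories named in the task description.
LABELS = ["Corporate/Industrial", "Economics", "Government/Social", "Markets"]


def micro_scores(gold, pred, labels=LABELS):
    """Return micro-averaged (precision, recall, F1) for single-label data.

    Assumption: micro-averaging; the leaderboard does not say which
    averaging it applies.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1   # correct prediction for class g
        else:
            fp[p] += 1   # predicted class p receives a false positive
            fn[g] += 1   # gold class g receives a false negative

    # Pool the counts over all classes before computing the ratios.
    tp_sum = sum(tp[label] for label in labels)
    fp_sum = sum(fp[label] for label in labels)
    fn_sum = sum(fn[label] for label in labels)

    precision = tp_sum / (tp_sum + fp_sum) if tp_sum + fp_sum else 0.0
    recall = tp_sum / (tp_sum + fn_sum) if tp_sum + fn_sum else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Toy data (invented for illustration): 8 documents, 2 of them misclassified.
gold = ["Markets", "Markets", "Economics", "Economics",
        "Government/Social", "Government/Social",
        "Corporate/Industrial", "Corporate/Industrial"]
pred = ["Markets", "Economics", "Economics", "Economics",
        "Government/Social", "Markets",
        "Corporate/Industrial", "Corporate/Industrial"]

precision, recall, f1 = micro_scores(gold, pred)
accuracy = sum(g == p for g, p in zip(gold, pred)) / len(gold)
print(precision, recall, f1, accuracy)
```

On the toy data all four numbers coincide, because the pooled false-positive and false-negative counts are both equal to the number of misclassified documents. Macro-averaging, by contrast, would generally give different values for the three metrics.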

If you have published a result better than those on the list, send a message to odesia-comunicacion@lsi.uned.es indicating the result and the DOI of the article, along with a copy of it if it is not published openly.