System: Lak NLP
Year: 2022
Results split: Test
Task: Classification of stereotypes
Official Metric for Ranking: ICM
Source Publication:
Hierarchical F: 0.86
Propensity F: 0.85
ICM: -0.42