ODESIA Challenge @ SEPLN 2024 (July 1 - September 14)

Registration open until July 30

HOW TO PARTICIPATE

System | Arithmetic mean | Per-task scores
Ixa ehu ixambert base cased | 0.4748 | 0.6743 0.4875 0.7666 0.3796 0.0543 0.6868 0.6117 0.3890 0.3412 0.3570
Bertin roberta base spanish | 0.4912 | 0.7280 0.4941 0.7596 0.2532 0.1782 0.6877 0.6465 0.4146 0.3331 0.4172
Xlm roberta large | 0.5873 | 0.7663 0.5593 0.8186 0.5343 0.4527 0.7855 0.6564 0.4414 0.3995 0.4589
Xlm roberta base | 0.5264 | 0.7395 0.4997 0.7894 0.4504 0.2668 0.7819 0.6236 0.4245 0.3195 0.3691
PlanTL GOB ES roberta large bne | 0.5626 | 0.7241 0.5668 0.8177 0.5173 0.3894 0.6757 0.6671 0.4237 0.3798 0.4640
PlanTL GOB ES roberta base bne | 0.5453 | 0.7356 0.5554 0.8149 0.4906 0.2944 0.7169 0.6531 0.4173 0.3688 0.4061
Distilbert base multilingual cased | 0.4728 | 0.7222 0.4669 0.7507 0.4036 0.2222 0.6868 0.5851 0.3823 0.2874 0.2207
Dccuchile bert base spanish wwm cased | 0.5408 | 0.7146 0.5370 0.7916 0.4874 0.2931 0.7478 0.6326 0.4182 0.3738 0.4118
CenIA distillbert base spanish uncased | 0.4864 | 0.7203 0.5118 0.7708 0.4198 0.1782 0.6531 0.6128 0.4160 0.3324 0.2484
Bert base multilingual cased | 0.5073 | 0.7222 0.4693 0.7821 0.4231 0.2562 0.7592 0.6136 0.3917 0.3326 0.3225
XLM-RoBERTa-large-v3 | 0.5462 | 0.7452 0.5540 0.8224 0.5425 0.4581 0.5967 0.5441 0.4384 0.3609 0.4000
XLM-RoBERTa-large-2 | 0.5320 | 0.7452 0.5540 0.8224 0.5425 0.4581 0.5967 0.5441 0.4384 0.3609 0.2581
XLM-RoBERTa-large | 0.4951 | 0.7452 0.5540 0.8224 0.5425 0.4581 0.5967 0.5441 0.3371 0.0925 0.2581

CHALLENGE RULES

This challenge aims to promote the development and evaluation of language models in Spanish using the evaluation platform and datasets provided by the ODESIA project (Espacio de Observación del Desarrollo del Español en la Inteligencia Artificial).

The challenge consists of solving 10 discriminative tasks in Spanish that are evaluated on private data. These tasks form the ODESIA-CORE section of the ODESIA Leaderboard, an application that provides an evaluation infrastructure for pretrained language models in English and Spanish and allows a direct comparison of model performance across the two languages. The leaderboard also has an ODESIA-EXTENDED section with 4 tasks that use pre-existing public evaluation data, but these are not part of the challenge. Although ODESIA provides bilingual data (Spanish and English) for all tasks, this challenge focuses only on the Spanish portion of ODESIA-CORE.

The team submitting the best system will receive a cash prize of 3,000 euros, donated by the company Llorente y Cuenca Madrid, SL (see details below).

The ODESIA-CORE benchmark consists of 10 discriminative tasks with public training datasets and private test datasets (not previously distributed by any means) created within the ODESIA initiative. The private nature of the test data guarantees the absence of contamination in the leaderboard results: no LLM should have seen the test set annotations in its pre-training phase. This is a summary of the tasks:

Name | Domain | Task | Abstract Task | Metric
DIANN 2023 | Biomedical | Disability detection | Sequence labeling | F1 Macro
DIPROMATS 2023 | Politics | Propaganda identification | Binary Classification | ICM-Norm
DIPROMATS 2023 | Politics | Propaganda characterization, coarse-grained | Multiclass Hierarchical Classification, Multilabel | ICM-Norm
DIPROMATS 2023 | Politics | Propaganda characterization, fine-grained | Multiclass Hierarchical Classification, Multilabel | ICM-Norm
EXIST 2022 | Social | Sexism detection | Binary Classification | Accuracy
EXIST 2022 | Social | Sexism categorization | Multiclass Classification | F1 Macro
EXIST 2023 | Social | Sexism detection | Binary Classification | Soft-ICM-Norm
EXIST 2023 | Social | Source intention categorization | Multiclass Hierarchical Classification | Soft-ICM-Norm
EXIST 2023 | Social | Sexism categorization | Multiclass Hierarchical Classification, Multilabel | Soft-ICM-Norm
SQUAD-SQAC 2024 | Scientific | Extractive Question-Answering | Sequence labeling | F1

The winning system will be the one that, at the end of the competition, obtains the best average score on the Spanish version of the ODESIA-CORE tasks.
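As a worked example of this ranking criterion, the sketch below averages the ten per-task scores reported for the Xlm roberta large baseline in the results table at the top of this page. The function and variable names are illustrative, and the snippet only checks the arithmetic; it is not the leaderboard's own code.

```python
# Per-task scores of the Xlm roberta large baseline, copied from the
# results table in the order in which they appear there.
xlm_roberta_large = [0.7663, 0.5593, 0.8186, 0.5343, 0.4527,
                     0.7855, 0.6564, 0.4414, 0.3995, 0.4589]


def average_score(task_scores):
    """Arithmetic mean over the 10 ODESIA-CORE tasks."""
    if len(task_scores) != 10:
        raise ValueError("expected one score per ODESIA-CORE task (10 in total)")
    return sum(task_scores) / len(task_scores)


print(round(average_score(xlm_roberta_large), 4))  # 0.5873
```

The rounded result matches the arithmetic mean listed for that baseline.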

All types of Natural Language Processing (NLP) systems that are applied uniformly to all tasks will be accepted. That is, each participation must be a single system that applies to all tasks, rather than a different approach for each task. A submission in which the solution for each task is constructed independently will not be accepted.

For illustrative purposes, systems with the following characteristics (non-exhaustive list) are acceptable:

  • The system is an encoder-type LLM (or an ensemble of LLMs), to which a fine-tuning process is applied for each of the challenge tasks, using the training data provided in the participants’ package or from other sources as deemed appropriate by the participating team.
  • The system uses one or more generative LLMs, combined with a uniform zero-shot, one-shot or few-shot prompting strategy.
  • The system uses one or more generative LLMs combined with a retrieval-augmented generation strategy on the training dataset or other external sources.
  • Any combination of the above methods, as long as it is applied uniformly to all datasets.

To ensure the originality of the solutions provided, the organizers may request the participating teams to supply the implementation code of their solution, along with all the materials that are necessary to reproduce their results. The code will be supplied as a link to a GitHub repository along with a Docker image to facilitate its execution.

Models or systems for which no form of verification or reproduction of results is provided, if required by the organization, will be reported as participants in the results table but will not be eligible for the challenge prize.

  • Teams will have to pre-register for the challenge before they can participate.
  • Each team will register a single account on the "ODESIA-Leaderboard" evaluation platform using the form provided for this purpose (link).
  • The organizers will provide a username and password on the “ODESIA-Leaderboard” platform upon validation of the registration data.

  • The results will be submitted through the ODESIA Leaderboard at https://leaderboard.odesia.uned.es/leaderboard/submit , where they will be automatically evaluated using the metrics corresponding to each task.
  • For each submission, teams should format their prediction files following the specifications described in the README files of each dataset (included in the download package).
  • In addition, the following fields must be completed on the prediction submission page:
    • Team Name: The login name of the team representative on the ODESIA platform, which is provided upon registration for the challenge.
    • Email: Contact email used for the challenge registration.
    • Affiliation: Institution the participants belong to (if applicable).
    • System name: To be formatted as “{team_name}-{submission_number}”, where team_name will be a permanent identifier for all submissions from the same participant, and where “submission_number” will be a number from 1 to 20 corresponding to each of the 20 submissions allowed per team during the contest.
    • Model URL: (optional) URL of the model used (e.g. in Hugging Face), if applicable.
    • System description: A 300 to 500-word description of the system used to generate the predictions.
    • GitHub URL: Optionally, teams can add the link to the source code used to generate the results.
    • Leaderboard version: "Challenge" must be selected.
    • Submission languages: Check only "Spanish".
    • ZIP File: The results of the system are sent as a compressed file with predictions formatted as specified above.
After the system output is submitted, the evaluation process can take up to two minutes. When it completes, the application lets participants check the results and choose whether to make them public on the leaderboard. Regardless of whether they are made public, each submission counts towards the maximum of twenty (20) allowed per team during the contest.
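The system-name convention and the ZIP packaging described above can be sketched as follows. This is a minimal illustration under stated assumptions: the helper names are hypothetical, and the layout and file names inside the prediction directory are not shown here because they are defined in the README files of each dataset.

```python
import zipfile
from pathlib import Path

MAX_SUBMISSIONS = 20  # per-team limit stated in the rules


def system_name(team_name: str, submission_number: int) -> str:
    """Build the "{team_name}-{submission_number}" identifier."""
    if not 1 <= submission_number <= MAX_SUBMISSIONS:
        raise ValueError(f"submission_number must be between 1 and {MAX_SUBMISSIONS}")
    return f"{team_name}-{submission_number}"


def build_submission_zip(prediction_dir: str, zip_path: str) -> list[str]:
    """Compress every file under prediction_dir, keeping relative paths.

    Hypothetical helper: the required files and their format come from the
    dataset READMEs, not from this sketch.
    """
    root = Path(prediction_dir)
    files = sorted(p for p in root.rglob("*") if p.is_file())
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in files:
            archive.write(path, arcname=path.relative_to(root).as_posix())
    return [p.relative_to(root).as_posix() for p in files]


print(system_name("myteam", 3))  # myteam-3
```

Rejecting numbers outside 1 to 20 locally avoids spending one of the limited submissions on a mislabelled upload.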

The ODESIA Leaderboard uses the PyEvALL evaluation library for classification tasks. PyEvALL can be installed with the pip package manager and used during the development phase to evaluate the DIPROMATS 2023, EXIST 2022 and EXIST 2023 tasks. The F1 metric implemented to evaluate the tasks of the original SQUAD/SQAC dataset has been adapted to evaluate the tasks of the SQUAD-SQAC 2024 dataset. Its original implementation, for use in the development phase, can be found in SQUAD METRIC. The DIANN 2023 sequence labeling task is evaluated with the Macro F1 metric implemented in the HuggingFace Evaluate library, which has also been adapted for the ODESIA Leaderboard.
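For intuition about the F1 score used in extractive question answering, the sketch below computes a token-overlap F1 between a predicted and a gold answer. It is a simplified illustration, not the adapted metric used by the ODESIA Leaderboard: it only lowercases and splits on whitespace, whereas the original SQuAD metric applies further answer normalization. The function name and the example strings are assumptions.

```python
from collections import Counter


def token_f1(prediction: str, gold: str) -> float:
    """Token-overlap F1 between a predicted answer and a gold answer."""
    pred_tokens = prediction.lower().split()
    gold_tokens = gold.lower().split()
    if not pred_tokens or not gold_tokens:
        # Both empty counts as a match; only one empty counts as a miss.
        return float(pred_tokens == gold_tokens)
    # Multiset intersection: each token counts at most as often as it
    # appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


# Precision 2/3 and recall 1 for this pair of answers.
score = token_f1("la célula madre", "célula madre")
```

A prediction that contains the gold span plus extra tokens keeps full recall but loses precision, which is why over-long answers are penalized.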

The only restrictions on participating teams are:

  • All team members must be of legal age.
  • No person may be a member of more than one team.

  • A single prize of 3,000 euros (donated by the company “Llorente y Cuenca Madrid, SL”) will be awarded to the team that presents the system with the best global average performance in the ODESIA-CORE tasks in Spanish.
  • To be eligible for the prize, the following conditions are established:
    • Teams must make their results public on the ODESIA Leaderboard before the end date of the contest.
    • The winning team must obtain an average score higher than that of the baseline models provided by the organization. Specifically, they must surpass the model that achieves the best average, which is XLM-Roberta-Large with a score of 0.5873.
    • There must be a minimum of five teams submitting results; if this number is not met, the organization reserves the right to defer the deadline of the challenge.
    • The winning team commits to present its solution at the Award ceremony (see section "Results Presentation and Prize Award Ceremony").
    • Employees of UNED, LLyC Madrid S.L., Red.es, SEDIA, and any other entity related to the ODESIA project may participate in the challenge but will not be eligible for the final cash prize.
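The per-team conditions above can be summarized in a small check. This is a sketch with hypothetical names, covering only the conditions that depend on a single team: the score threshold, public results, and affiliation. The minimum number of participating teams and the commitment to present at the ceremony are left out because the organizers handle them.

```python
from dataclasses import dataclass

# Best baseline average stated in the rules (XLM-Roberta-Large).
BEST_BASELINE_AVERAGE = 0.5873


@dataclass
class Team:
    average_score: float          # mean over the 10 ODESIA-CORE tasks in Spanish
    results_public: bool          # published on the leaderboard before the deadline
    affiliated_with_odesia: bool  # employee of an entity related to the project


def may_receive_prize(team: Team) -> bool:
    """True if the team meets the per-team prize conditions sketched here."""
    return (
        team.average_score > BEST_BASELINE_AVERAGE
        and team.results_public
        and not team.affiliated_with_odesia
    )


print(may_receive_prize(Team(0.6010, True, False)))  # True
print(may_receive_prize(Team(0.5873, True, False)))  # False: must exceed the baseline
```

Note the strict comparison: matching the baseline average is not enough, since the rules require surpassing it.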

  • Common sense rules of ethics and professional conduct must be respected. The organizers reserve the right to disqualify teams that have violated the rules.
  • No limits are imposed on the costs associated with the implementation of the solutions, but the organizers may request information on these costs.
  • The organizers reserve the right to update the rules in response to unforeseen circumstances, to better serve the competition's mission.
  • The organizers reserve all rights regarding the final decision.

  • The winning team, and a selection of other teams submitting innovative solutions, will be asked to submit a technical report in PDF format of at least 4 pages (excluding references) detailing their solution.
  • The report will include a discussion of the strategies adopted by the team in the development of their proposal and the evaluation results.
  • The report will include a breakdown of the costs of implementing the system and the use of datasets provided by the organization and by third parties, if applicable.
  • If there is sufficient material, the option of publishing the technical reports in a special issue or in a joint journal article will be considered.

  • The presentation of results and the official award ceremony will take place within the framework of the XL International Congress of the Spanish Society for Natural Language Processing (SEPLN), to be held in Valladolid on September 24-27, 2024 (session scheduled for September 25 at 5:30 pm CEST). Acceptance of the award implies mandatory attendance (in person or online) at the session.
  • All participants will receive certificates of participation at the end of the competition.

This challenge is organized in the framework of the ODESIA Project, a collaboration between the Spanish public university UNED and Red.es, a public business entity attached to the Ministry for Digital Transformation and Civil Service through the Secretary of State for Digitalization and Artificial Intelligence. The project is partially funded by the European Union (NextGenerationEU funds) through the "Plan de Recuperación, Transformación y Resiliencia", by the Ministry of Economic Affairs and Digital Transformation and by the UNED. It is part of the activities of the "Plan de Tecnologías del Lenguaje de la Secretaría de Estado de Inteligencia Artificial y Digitalización" of Spain.

  • Organizing Committee:
    • Alejandro Benito-Santos (co-chair, UNED)
    • Roser Morante (co-chair, UNED)
    • Julio Gonzalo (UNED)
    • Jorge Carrillo-de-Albornoz (UNED)
    • Laura Plaza (UNED)
    • Enrique Amigó (UNED)
    • Víctor Fresno (UNED)
    • Andrés Fernández (UNED)
    • Adrián Ghajari (UNED)
    • Guillermo Marco (UNED)
    • Eva Sánchez (UNED)
    • Miguel Lucas (LLyC)
  • Advisory Board:
    • TBA

For questions related to the challenge, please join our Discord server: #odesia-challenge-2024. You can also contact the challenge co-chairs, Alejandro Benito-Santos (al.benito@lsi.uned.es) and Roser Morante (r.morant@lsi.uned.es).

  • Registration opens: July 1, 2024
  • Registration closes: July 30, 2024*
  • Submissions deadline: September 14, 2024*
  • Official results announced: September 16-20, 2024
  • Award ceremony and presentation of results: September 25, 2024, 5:30 pm, at SEPLN 2024
*23:59 AoE (Anywhere on Earth)
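
Because the starred deadlines are given in AoE (UTC-12), it can help to convert them to UTC before planning a final submission. The sketch below does this with the standard library only; the helper name is illustrative.

```python
from datetime import datetime, timedelta, timezone

# Anywhere on Earth (AoE) corresponds to a fixed offset of UTC-12.
AOE = timezone(timedelta(hours=-12), "AoE")


def aoe_deadline_in_utc(year: int, month: int, day: int) -> datetime:
    """Return 23:59 AoE on the given date, expressed in UTC."""
    local = datetime(year, month, day, 23, 59, tzinfo=AOE)
    return local.astimezone(timezone.utc)


print(aoe_deadline_in_utc(2024, 7, 30).isoformat())  # 2024-07-31T11:59:00+00:00
print(aoe_deadline_in_utc(2024, 9, 14).isoformat())  # 2024-09-15T11:59:00+00:00
```

In Spain (CEST, UTC+2) both deadlines therefore fall at 13:59 on the following calendar day.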