📖 What is IberBench?

IberBench is a hub of datasets for languages across the Iberian Peninsula and Latin America, intended to serve as a benchmark for evaluating causal language models. This initiative aims to enrich the Natural Language Processing (NLP) community in the Iberian Peninsula and Latin America. The benchmark enables the evaluation of NLP models in multiple varieties of Spanish, including Latin American Spanish, as well as in other languages such as Catalan, Galician, Basque, and Portuguese, fostering assessments and developments that reflect the linguistic diversity of these regions.

We hope to drive multilingual research that considers the cultural and linguistic richness and complexity of the Spanish-speaking world, encouraging the creation of models that are truly representative of these realities.

📂 What are the data sources?

IberBench contains datasets from prominent workshops in the field, such as IberLEF@SEPLN and PAN@CLEF, with the aim of providing standardized and consistent evaluation within this context and enhancing the value of the data and of the models derived from this effort.

We strictly adhere to all established guidelines and regulations concerning the use and publication of this data. Specifically:

The collected datasets may be published on 🤗HuggingFace under open access, with appropriate credit given to the authors.
Under no circumstances will we claim ownership of the datasets.

In any publication or presentation resulting from work with this data, we recognize the importance of citing and crediting the organizing teams that created the datasets used in IberBench.

🙋 How can I join IberBench?

IberBench is overseen by a committee of specialists in NLP, language ethics, and gender discrimination, drawn from both academia and industry, who guide the development of the project and ensure its quality and relevance.

To become part of this committee, you can request to join the IberBench organization on 🤗HuggingFace. Your request will be reviewed by experts who are already members of the organization.

🤝 How can I contribute to IberBench?

The initial committee will first gather datasets from prominent workshops. Beyond these, you can contribute new datasets to the IberBench organization. The process is as follows:

1. Open a new discussion in the IberBench discussions space, linking to an existing dataset on the 🤗HuggingFace Hub and explaining why its inclusion is relevant.
2. Discuss the proposal with the committee, which will approve or reject the dataset.
3. If approved, your dataset will be added to the IberBench datasets and used to evaluate LLMs on the IberBench leaderboard.

IberBench will never claim ownership of the dataset; the original authors will receive full credit.

💬 Social networks

You can reach us at:

X: https://x.com/IberBench
🤗 Discussions: https://huggingface.co/spaces/iberbench/README/discussions

🫶 Acknowledgements

IberBench has been funded by the Valencian Institute for Business Competitiveness (IVACE).
