---
title: Leaderboard
emoji: π
colorFrom: blue
colorTo: blue
sdk: gradio
sdk_version: 4.19.2
app_file: app.py
pinned: false
license: unknown
---
# New data model
The new data model is built by taking the individual JSON files in `data/new_eval`, combining them into a single DataFrame in a simple format, and then writing one file per model from that combined DataFrame.
For new eval runs that have to be appended, we first determine the model associated with the JSON file produced by the eval harness, select the corresponding model file to append to, find the rows in the JSON file whose configuration (model name, language, task group and few-shot setting) is not yet present, and append them if there is at least one such row.
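A minimal sketch of this flow, assuming pandas and illustrative names for the columns (`model`, `language`, `task_group`, `few_shot`), the per-model file naming, the output paths, and the JSON layout; the actual scripts in this repository may differ:

```python
import json
from pathlib import Path

import pandas as pd

# Columns assumed to identify a unique evaluation configuration.
KEY_COLS = ["model", "language", "task_group", "few_shot"]


def combine_new_eval(new_eval_dir: str = "data/new_eval") -> pd.DataFrame:
    """Read every JSON file in data/new_eval and combine them into one DataFrame."""
    records = []
    for path in Path(new_eval_dir).glob("*.json"):
        with open(path) as f:
            records.extend(json.load(f))  # assumes each file holds a list of row dicts
    return pd.DataFrame(records)


def split_per_model(df: pd.DataFrame, out_dir: str = "data") -> None:
    """Write one file per model from the combined DataFrame."""
    for model, model_df in df.groupby("model"):
        model_df.to_json(Path(out_dir) / f"{model.replace('/', '_')}.json", orient="records")


def append_new_run(run_df: pd.DataFrame, out_dir: str = "data") -> None:
    """Append rows from a new eval-harness run to the matching per-model file,
    keeping only configurations (model, language, task group, few shot) not seen before."""
    model = run_df["model"].iloc[0]
    model_file = Path(out_dir) / f"{model.replace('/', '_')}.json"
    existing = pd.read_json(model_file, orient="records")
    merged = run_df.merge(existing[KEY_COLS], on=KEY_COLS, how="left", indicator=True)
    unique_rows = merged[merged["_merge"] == "left_only"].drop(columns="_merge")
    if len(unique_rows) > 0:  # append only if there is something new
        pd.concat([existing, unique_rows]).to_json(model_file, orient="records")
```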
# Introduction
This is the source code repository for the OpenGPT-X multilingual leaderboard.
The leaderboard aims to provide an overview of LLM performance across various languages.
The basic task set consists of MMLU, ARC, HellaSwag, GSM8K, TruthfulQA and Belebele.
To make the results comparable to the Open LLM Leaderboard (https://huggingface.co/open-llm-leaderboard), the former five tasks are based on our internal machine translations of the English base tasks; they are complemented by Belebele, Meta's high-quality multilingual benchmark.
# Usage
The hosted leaderboard can be found at https://huggingface.co/spaces/openGPT-X/leaderboard.
To extend its functionality, please create a PR.
# Adding new tasks
To add new evaluation tasks, proceed as follows:
1. Add task information to `TASK_INFO` in `src/data.py`. Each entry should map the task display name to the metric to be shown, together with a dict mapping two-letter language codes to the corresponding lm-eval-harness task selection string. See the existing task information for reference; a sketch of a possible entry follows this list.
2. Add evaluation results as detailed below.
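A hedged example of what such an entry might look like; the key names (`metric`, `languages`) and the task selection strings below are illustrative assumptions, so check the existing entries in `src/data.py` for the actual structure:

```python
# In src/data.py (illustrative entry; the real structure may differ).
TASK_INFO = {
    # task display name -> metric to show + language-code-to-task mapping
    "HellaSwag": {
        "metric": "acc_norm",          # metric displayed on the leaderboard
        "languages": {
            "de": "hellaswag_de",      # lm-eval-harness task selection string
            "fr": "hellaswag_fr",
        },
    },
}
```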
# Adding new models
It is possible to change the display name of a particular model.
Simply add an entry to `_MODEL_NAMES` in `src/data.py`.
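For example (the key format and display name below are assumptions; check the existing entries for the exact convention):

```python
# In src/data.py: map the model identifier used in the results to a display name.
_MODEL_NAMES = {
    "meta-llama/Llama-2-7b-hf": "Llama 2 7B",  # illustrative entry
}
```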
# Adding evaluation results
Copy the `.json` output generated by the lm-eval-harness into `data`.
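For instance, from Python (the result file name below is a placeholder):

```python
import shutil

# Copy an lm-eval-harness output file (placeholder path) into the data directory.
shutil.copy("results/my_model_results.json", "data/")
```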