FAQ

#1
by alielfilali01 - opened
Open Arabic LLM Leaderboard org

Please feel free to ask all your questions here

The leaderboard is a little slow to update.
I submitted a model and it still doesn't show up in the pending evaluations (nothing else changes or moves either).

Open Arabic LLM Leaderboard org

@MohamedRashad
Some heavy models are currently being evaluated in parallel, and that is what is blocking the leaderboard. We expect to see more models under Finished (more than 14) by tomorrow.
I checked, and all the models in the requests dataset appear under the PENDING toggle in the "Submit here" tab, so apologies, but I don't understand what you meant.

I found the model I submitted now 😅

Everything is working great ^^

I know this might seem obvious to many users here, but some (myself included) still think the current leaderboard is the final evaluation.

Please make it clear to users that the ranking is not final and that the evaluation is still ongoing.

Also, could you provide an estimated timeline for when the evaluation will be complete?

Open Arabic LLM Leaderboard org

Dear @soufianechami, leaderboards by nature are never in a final state: new models are submitted every day and evaluated in turn. To stay up to date, you will have to check the leaderboard every once in a while.

I'm curious whether there will be a section for embedding models?

Hugging Face has a leaderboard for embedding models (https://huggingface.co/spaces/mteb/leaderboard), but the scores and rankings are all based on English, Chinese, French and Polish.

It's hard to know which of the models may work well for Arabic, e.g. for building the retrieval part of a RAG system.

Open Arabic LLM Leaderboard org

@rahimnathwani you can find Arabic under STS -> Other

Hi, thanks for compiling this resource!

Could you provide the exact lighteval command / config used for the evaluations? For example, in ./examples/tasks/OALL.txt from the official lighteval repo, (almost) all tasks are evaluated 5-shot with |5|1; however, on the leaderboard everything is 0-shot.

Open Arabic LLM Leaderboard org

Hello,
Yes, you only need to change all of them to |0|0.
That is our setting.
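For anyone who would rather not edit the task file by hand, here is a minimal sketch of that change. It assumes each task line ends with a `|<few-shot count>|<flag>` suffix such as `|5|1`, as quoted above. The function name `to_zero_shot` and the sample task names are hypothetical and used only for illustration.

```python
import re

# Matches a trailing "|<digits>|<digits>" suffix, e.g. "|5|1".
# Assumption: every task line in the file ends with such a suffix.
_SUFFIX = re.compile(r"\|\d+\|\d+\s*$")


def to_zero_shot(lines):
    """Return the task lines with their trailing suffix replaced by |0|0."""
    result = []
    for line in lines:
        stripped = line.strip()
        # Keep blank lines and comment lines unchanged.
        if not stripped or stripped.startswith("#"):
            result.append(stripped)
            continue
        result.append(_SUFFIX.sub("|0|0", stripped))
    return result


# Hypothetical task lines, for illustration only.
sample = [
    "# example tasks",
    "community|example_task_a|5|1",
    "community|example_task_b:subset|5|1",
]

for converted in to_zero_shot(sample):
    print(converted)
```

Lines that do not end with such a suffix pass through unchanged, so it is worth comparing the output with the input before running an evaluation.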

Also, for AlGhafa benchmark, which dataset is used?

There are multiple datasets here: https://gitlab.com/tiiuae/alghafa/-/tree/main?ref_type=heads

Also, on the OALL/Datasets I can find:
https://huggingface.co/datasets/OALL/AlGhafa-Arabic-LLM-Benchmark-Translated

And:
https://huggingface.co/datasets/OALL/AlGhafa-Arabic-LLM-Benchmark-Native

So which one is used? And how is the final metric calculated over the benchmark datasets?

Hi @alielfilali01 ,
I submitted the fine-tuned adapter airev-ai/Amal-70b-v2.3.2 (base model: airev-ai/Amal-70b-v2) with bfloat16 precision a couple of hours ago, but the requests card shows its status as failed. Both the adapter and the base model are public, and I have added a model card and attached a valid license. I am unsure why the submission failed. Any assistance in this matter would be greatly appreciated.

Thank you.

Open Arabic LLM Leaderboard org

Hi @ManojShack
Thanks for submitting your model to the leaderboard.
Regarding your concern: when we attempted to evaluate your models, we ran into errors indicating that they are missing config.json. Please make sure the config is included and submit again.
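A quick check along these lines before resubmitting could be sketched as follows. It is only an illustration and not the leaderboard's actual validation. The helper name `missing_files` and the sample file listing are hypothetical, and the only required file assumed here is the config.json mentioned above.

```python
def missing_files(repo_files, required=("config.json",)):
    """Return the required file names that are absent from repo_files."""
    present = set(repo_files)
    return [name for name in required if name not in present]


# Hypothetical listing of an adapter-only repository, for illustration.
# A real listing could be obtained with huggingface_hub.list_repo_files(repo_id).
adapter_only = [
    "README.md",
    "adapter_config.json",
    "adapter_model.safetensors",
]

problems = missing_files(adapter_only)
if problems:
    print("Missing before submission:", ", ".join(problems))
else:
    print("All required files are present.")
```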
