open-llm-leaderboard/open_llm_leaderboard · It seems that the results of some recent evaluation tasks have not been uploaded

Nov 21, 2023

Hi, @clefourrier
It seems that some recent evaluation tasks are marked as completed, but their results have not been uploaded.
I noticed that the latest update to https://huggingface.co/datasets/open-llm-leaderboard/results/tree/main was 15 hours ago, while the request statuses in https://huggingface.co/datasets/open-llm-leaderboard/requests/tree/main are still being updated.

If it’s not too much trouble, could you take a look?
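For readers who want to check this themselves, here is a minimal Python sketch of the comparison described above: find models whose request is marked FINISHED but that have no file in the results dataset. The function names are hypothetical, and the results layout (`<org>/<model>/<file>.json`) is an assumption, not something confirmed in this thread. `list_repo_files` is the `huggingface_hub` helper for listing files in a repo; it is imported lazily so the comparison logic can be run without network access.

```python
from typing import Dict, Iterable, List


def models_missing_results(
    request_status: Dict[str, str], result_files: Iterable[str]
) -> List[str]:
    """Return models whose request is FINISHED but that have no results file."""
    # Assumption: results are stored as '<org>/<model>/<file>.json'.
    models_with_results = set()
    for path in result_files:
        parts = path.split("/")
        if len(parts) >= 3 and parts[-1].endswith(".json"):
            models_with_results.add("/".join(parts[:2]))
    return sorted(
        model
        for model, status in request_status.items()
        if status == "FINISHED" and model not in models_with_results
    )


def list_result_files() -> List[str]:
    """Fetch the real file listing (needs network and huggingface_hub)."""
    from huggingface_hub import list_repo_files

    return list_repo_files("open-llm-leaderboard/results", repo_type="dataset")
```

With a hand-built status mapping, `models_missing_results({"org/model": "FINISHED"}, list_result_files())` would list the models affected by the upload problem, assuming the layout above holds.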

clefourrier

Open LLM Leaderboard org Nov 21, 2023

edited Nov 21, 2023

Hi :)
Of course! Do you have a specific model you want me to take a look at?
Also cc: @SaylorTwift

Azure99

Nov 21, 2023

@clefourrier
Thank you for your reply. The model I want to check is Azure99/blossom-v3-mistral-7b.
To be clear, almost all tasks are marked as completed, but their results have not been uploaded. This may be an unexpected error.

clefourrier

Open LLM Leaderboard org Nov 21, 2023

Hi!
Thank you very much! Yes, it would seem from the logs that we have a problem when pushing the results! We'll try to fix it asap!

clefourrier

Open LLM Leaderboard org Nov 21, 2023

(But I can confirm your model was evaluated properly)

Azure99

Nov 21, 2023

This is really great news, thank you again for your work.

clefourrier

Open LLM Leaderboard org Nov 21, 2023

I pushed the results of all currently evaluated models manually, while we're working on a fix

Weyaxi

Nov 21, 2023

Hi @clefourrier, the same thing is happening here (FINISHED in requests but no result):

https://huggingface.co/datasets/open-llm-leaderboard/requests/blob/main/Weyaxi/test-help-steer-filtered-orig_eval_request_False_float16_Original.json

Azure99

Nov 22, 2023

Hi @clefourrier, sorry to bother you again.
Could you manually push the evaluation results of the models once more? Many models have been evaluated in the past day, particularly Microsoft's Orca-2, and I think many people would be interested in that.

clefourrier

Open LLM Leaderboard org Nov 22, 2023

Hi! I just did :)
Are you sure Microsoft's model has been evaluated, though? I'm not seeing it in the results; I suspect it has not finished running yet.

Azure99

Nov 22, 2023

@clefourrier Thank you for your prompt response.
I apologize, it seems I made an error – the Orca model is still under evaluation. Looking forward to the leaderboard being corrected appropriately.

clefourrier

Open LLM Leaderboard org Nov 22, 2023

(Tagging @SaylorTwift re: results not uploaded, btw)

SaylorTwift

Open LLM Leaderboard org Nov 22, 2023

Hi! The issue has been fixed, sorry for the confusion. You should see results being uploaded again. Don't hesitate to re-open if you have any issues :)

SaylorTwift changed discussion status to closed Nov 22, 2023

Azure99

Nov 28, 2023

@clefourrier @SaylorTwift It seems that the same problem has occurred again. The model has been evaluated, but the results have not been uploaded.
https://huggingface.co/datasets/open-llm-leaderboard/results/commits/main

Azure99 changed discussion status to open Nov 28, 2023

clefourrier

Open LLM Leaderboard org Nov 28, 2023

Hi!
Thank you for reporting! Just did a manual upload of the results.

Azure99 changed discussion status to closed Nov 28, 2023

Azure99

Dec 12, 2023

The problem is back again. It seems that a batch of tasks has failed recently. Can you take another look?
Azure99/blossom-v3_1-yi-34b
@clefourrier @SaylorTwift
Thank you again.

Azure99 changed discussion status to open Dec 12, 2023

clefourrier

Open LLM Leaderboard org Dec 12, 2023

Hi @Azure99 , can you point us to the request file please?

Azure99

Dec 12, 2023

Hi @clefourrier, here it is.
/Azure99/blossom-v3_1-yi-34b_eval_request_False_bfloat16_Original.json
https://huggingface.co/datasets/open-llm-leaderboard/requests/commit/32cdbd36e7352b1bbed8fcc55dc5ea8b7b81382b
By the way, the recent evaluations of several 34B models have all failed, but evaluations of models with sizes like 7B are normal.

clefourrier

Open LLM Leaderboard org Dec 12, 2023

Hi! There was a connectivity issue when trying to read your model - I added it back to pending.

clefourrier changed discussion status to closed Dec 12, 2023

Azure99

Jan 4

Hi @clefourrier, just like before, another task unexpectedly failed. Could you help with restarting it? Thank you very much.
https://huggingface.co/datasets/open-llm-leaderboard/requests/commit/2a2c89c16442b63af92aed7a68eea23d3073c7ee

Azure99 changed discussion status to open Jan 4

clefourrier

Open LLM Leaderboard org Jan 4

edited Jan 4

Hi @Azure99, the model failed to load. Can you upload it in safetensors format, as required on the Submit page, please? :)
Once it's uploaded, I'll relaunch it.
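As a rough illustration of the safetensors request, the Python sketch below checks a repo's file list for safetensors weights and shows one way to re-save a model in that format. The function names are hypothetical. `save_pretrained(..., safe_serialization=True)` is the `transformers` option for writing safetensors files. The conversion loads the full model into memory, which may not be practical for a 34B model on a small machine, so treat it as a sketch rather than the leaderboard's prescribed procedure.

```python
from typing import Iterable


def has_safetensors_weights(filenames: Iterable[str]) -> bool:
    """True if any file in the listing is a safetensors weight file."""
    return any(name.endswith(".safetensors") for name in filenames)


def convert_to_safetensors(source: str, output_dir: str) -> None:
    """Re-save a causal LM with safetensors weights (needs transformers)."""
    # Imported lazily so the check above works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model = AutoModelForCausalLM.from_pretrained(source)
    model.save_pretrained(output_dir, safe_serialization=True)
    AutoTokenizer.from_pretrained(source).save_pretrained(output_dir)
```

The resulting folder can then be uploaded to the model repo in place of the `.bin` weights.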

clefourrier

Open LLM Leaderboard org Jan 5

(Two points as a side note, to simplify our work next time: please point to the request file directly, not to the commit of the request file, and please open a new discussion instead of reopening old ones, so that we can sort them by order of priority more easily. That would really simplify things for us :) )
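To make "point to the request file directly" concrete, here is a small Python sketch that builds a direct request file URL. The naming pattern `<model>_eval_request_<private>_<precision>_<weight_type>.json` is inferred from the two request files linked earlier in this thread, so it is an assumption and may not hold for every submission. The function name and default values are hypothetical.

```python
REQUESTS_BASE = (
    "https://huggingface.co/datasets/open-llm-leaderboard/requests/blob/main"
)


def request_file_url(
    model_id: str,
    precision: str = "float16",
    weight_type: str = "Original",
    private: bool = False,
) -> str:
    """Build the URL of a request file from the submission parameters."""
    # Assumed pattern, inferred from file names quoted in this discussion.
    filename = f"{model_id}_eval_request_{private}_{precision}_{weight_type}.json"
    return f"{REQUESTS_BASE}/{filename}"


print(request_file_url("Azure99/blossom-v3_1-yi-34b", precision="bfloat16"))
```

A link of this form goes straight to the file, unlike a `/commit/<hash>` link, which shows a diff and requires an extra click to reach the file.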

clefourrier

Open LLM Leaderboard org Jan 11

Closing for inactivity, feel free to reopen if needed

clefourrier changed discussion status to closed Jan 11