There seems to be a problem with the Mixtral finetune evaluations
I have seen that some Mixtral finetunes have failed the evaluation.
This also includes our finetunes. Does anyone know why this could be?
Currently, almost all successfully evaluated Mixtral models have `sliding_window` set, or had it set at the time of evaluation, even though that setting is actually incorrect for Mixtral. Could the failures be related to that?
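For anyone who wants to check whether their finetune's config has `sliding_window` set before resubmitting, here is a minimal sketch. The helper name and the locally written config file are illustrative only, not part of the leaderboard tooling; `sliding_window` is a real key in Mixtral `config.json` files, and the reference Mixtral config sets it to `null`.

```python
import json

def has_sliding_window(config_path: str) -> bool:
    """Return True if the config sets sliding_window to a non-null value.

    The reference Mixtral config uses "sliding_window": null (full
    attention), so a non-null value may indicate a copied-over
    Mistral-7B setting.
    """
    with open(config_path) as f:
        cfg = json.load(f)
    return cfg.get("sliding_window") is not None

# Illustrative example: write a minimal Mixtral-style config locally.
with open("config.json", "w") as f:
    json.dump({"model_type": "mixtral", "sliding_window": 4096}, f)

print(has_sliding_window("config.json"))  # → True (flag is set)
```

To run this against a model on the Hub, you could download its `config.json` with `huggingface_hub.hf_hub_download` and pass the returned path to the helper.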
Regards,
David
@DavidGF I was wondering why Mixtrals weren't showing up on the leaderboard. Out of curiosity can you provide links to some that failed?
Hey Phil337,
Of course! Just to name a few MoE models:
https://huggingface.co/datasets/open-llm-leaderboard/requests/commit/c2e645c3e96d1c424e688314f9864c76090243ca
https://huggingface.co/datasets/open-llm-leaderboard/requests/commit/1fd2ef8cb1af226bbe607e86543bf53a7048cd43
https://huggingface.co/datasets/open-llm-leaderboard/requests/blob/main/Undi95/Mixtral-4x7B-DPO-RPChat_eval_request_False_bfloat16_Original.json
https://huggingface.co/datasets/open-llm-leaderboard/requests/blob/main/Undi95/Llamix2-MLewd-4x13B_eval_request_False_float16_Original.json
https://huggingface.co/datasets/open-llm-leaderboard/requests/blob/main/Undi95/Project-8x7B_eval_request_False_bfloat16_Original.json
https://huggingface.co/datasets/open-llm-leaderboard/requests/blob/main/VAGOsolutions/SauerkrautLM-Mixtral-8x7B_eval_request_False_bfloat16_Original.json
https://huggingface.co/datasets/open-llm-leaderboard/requests/blob/main/VAGOsolutions/SauerkrautLM-Mixtral-8x7B-Instruct_eval_request_False_bfloat16_Original.json
https://huggingface.co/datasets/open-llm-leaderboard/requests/commit/883bfa183acca66468b9d97d7f25b405575b8282#d2h-655087
https://huggingface.co/datasets/open-llm-leaderboard/requests/commit/62b9fb4ea3c8fb9e85aaf8fb7838bff5644461d0
https://huggingface.co/datasets/open-llm-leaderboard/requests/commit/26f6b1c43bece72a802aa5ed0499588d4879f1ae
@DavidGF Thanks! That list includes a couple I was waiting on, especially Dolphin.
Hi!
Thanks for the detailed report!
We changed the cluster on which the leaderboard backend is running, and it appears to have trouble connecting to the Hub. We are investigating and will fix it as soon as possible.
Hi! Some models failed while we migrated the leaderboard backend. I will requeue all of those models. Thanks for the notice :)
I'm sorry to reopen this topic, but there still seems to be a problem, especially with the instruct models:
https://huggingface.co/datasets/open-llm-leaderboard/requests/blob/main/VAGOsolutions/SauerkrautLM-Mixtral-8x7B-Instruct_eval_request_False_bfloat16_Original.json
https://huggingface.co/datasets/open-llm-leaderboard/requests/commit/8c76a8d6dcd32bda6c3ac06e238ab2b40b7c0c3f
https://huggingface.co/datasets/open-llm-leaderboard/requests/commit/c0bc6933b4ee2432b3d5169d0278fcd94471a42d
https://huggingface.co/datasets/open-llm-leaderboard/requests/blob/main/YeungNLP/firefly-mixtral-8x7b-v1_eval_request_False_float16_Original.json
https://huggingface.co/datasets/open-llm-leaderboard/requests/commit/66a0a6d0f525cd643f88e99a191c933dc40a5e91
https://huggingface.co/datasets/open-llm-leaderboard/requests/blob/main/dfurman/Mixtral-8x7B-peft-v0.1_eval_request_False_float16_Adapter.json
https://huggingface.co/datasets/open-llm-leaderboard/requests/blob/main/ehartford/dolphin-2.5-mixtral-8x7b_eval_request_False_bfloat16_Original.json
https://huggingface.co/datasets/open-llm-leaderboard/requests/blob/main/orangetin/OpenHermes-Mixtral-8x7B_eval_request_False_bfloat16_Original.json
https://huggingface.co/datasets/open-llm-leaderboard/requests/blob/main/perlthoughts/Mistral-7B-Instruct-v0.2-2x7B-MoE_eval_request_False_float16_Original.json
https://huggingface.co/datasets/open-llm-leaderboard/requests/blob/main/xDAN2099/xDAN-L1-moe-4x7b_eval_request_False_bfloat16_Original.json
Hi,
FYI, the new cluster is having serious connectivity problems. We are putting all evals on hold until it's fixed, and we'll relaunch all FAILED evals from the past two days.
OpenPipe/mistral-ft-optimized-1218
cognitivecomputations/dolphin-2.6-mixtral-8x7b
These models have issues getting into the evaluation queue as well.
Thanks for fixing!