Certain models perhaps clogging up the leaderboard? Check logs?
It appears that some of the same models have been stuck in the running state for months; could they be clogging up the leaderboard?
On the Hugging Face Open LLM Leaderboard, GPTQ models mis-submitted as float32/float16 clogged the queue months ago.
I don't know exactly what is going on here, but could the same thing be happening? Would you please look into it?
Could a time-out of 1-3 months be put in place, so that no model runs for too long? Incompatible or misconfigured models (or models tripped up by the leaderboard's own framework setup, not necessarily the model's fault) would be moved to the FAILED category, and other models in the queue would get to run instead of being held up.
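A minimal sketch of what such a sweep could look like, assuming the request JSONs in the requests dataset carry `status` and `submitted_time` fields (as in the Open LLM Leaderboard request format); the timeout value and the re-upload step are placeholders:

```python
import json
from datetime import datetime, timedelta, timezone

from huggingface_hub import HfApi, hf_hub_download

REQUESTS_REPO = "hallucinations-leaderboard/requests"
TIMEOUT = timedelta(days=30)  # anywhere in the suggested 1-3 month range

api = HfApi()
now = datetime.now(timezone.utc)

for path in api.list_repo_files(REQUESTS_REPO, repo_type="dataset"):
    if not path.endswith(".json"):
        continue
    with open(hf_hub_download(REQUESTS_REPO, path, repo_type="dataset")) as f:
        request = json.load(f)
    if request.get("status") != "RUNNING":
        continue
    # submitted_time is assumed to be ISO 8601, e.g. "2023-09-09T10:52:17Z"
    submitted = datetime.fromisoformat(request["submitted_time"].replace("Z", "+00:00"))
    if submitted.tzinfo is None:
        submitted = submitted.replace(tzinfo=timezone.utc)
    if now - submitted > TIMEOUT:
        print(f"{path}: RUNNING since {submitted:%Y-%m-%d}, would be marked FAILED")
        # the leaderboard backend would set request["status"] = "FAILED"
        # and re-upload the file here
```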
One concern is large models (60B+) being run in float32 precision: could they be run in bfloat16, or, if they are natively float16, in float16?
Also, the meta-llama repos are gated; could that be a cause of delay? (Llama-2 is months old by now.)
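For reference, a hedged sketch of loading one of these checkpoints in a lower precision, with a token for the gated meta-llama repos; this is just a plain `transformers` call (recent versions), not necessarily how the leaderboard harness loads models:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-70b-hf"
hf_token = "hf_..."  # placeholder: a token for an account that accepted the gated license

tokenizer = AutoTokenizer.from_pretrained(model_id, token=hf_token)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half the memory of a float32 load for these weights
    device_map="auto",           # needs `accelerate`; shards the model across available GPUs
    token=hf_token,
)
```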
see:
https://huggingface.co/datasets/hallucinations-leaderboard/requests/blob/main/upstage/SOLAR-0-70b-16bit_eval_request_False_float32_Original.json
https://huggingface.co/datasets/hallucinations-leaderboard/requests/blob/main/bigscience/bloom_eval_request_False_float32_Original.json
https://huggingface.co/datasets/hallucinations-leaderboard/requests/blob/main/augtoma/qCammel-70-x_eval_request_False_float32_Original.json
https://huggingface.co/datasets/hallucinations-leaderboard/requests/blob/main/h2oai/h2ogpt-4096-llama2-70b-chat_eval_request_False_float32_Original.json
https://huggingface.co/datasets/hallucinations-leaderboard/requests/blob/main/meta-llama/Llama-2-70b-chat-hf_eval_request_False_bfloat16_Original.json
https://huggingface.co/datasets/hallucinations-leaderboard/requests/blob/main/meta-llama/Llama-2-70b-hf_eval_request_False_bfloat16_Original.json
https://huggingface.co/datasets/hallucinations-leaderboard/requests/blob/main/TheBloke/Falcon-180B-Chat-GPTQ_eval_request_False_float32_Original.json
https://huggingface.co/datasets/hallucinations-leaderboard/requests/blob/main/TheBloke/Wizard-Vicuna-30B-Uncensored-GPTQ_eval_request_False_float32_Original.json
https://huggingface.co/datasets/hallucinations-leaderboard/requests/blob/main/stabilityai/StableBeluga2_eval_request_False_float32_Original.json
Please look into the logs, especially for the long-running models, to glean any insight into the situation and the next step forward.
Would you also cancel the currently running float32 runs to reduce compute load, especially since they have apparently been stuck for a long time (weeks to months)? That would save compute time and memory.
The following models' native precision is lower (a quick way to verify this from each repo's config.json is sketched after the list):
GPTQ:
- TheBloke/Wizard-Vicuna-30B-Uncensored-GPTQ
- TheBloke/Falcon-180B-Chat-GPTQ
- compressed-llm/vicuna-13b-v1.3-gptq
bfloat16:
- ai4bharat/Airavata
- bigscience/bloom
- google/gemma-7b-it
- meta-llama/Meta-Llama-3-8B
- stanford-oval/Llama-2-7b-WikiChat
float16:
- augtoma/qCammel-70-x
- h2oai/h2ogpt-4096-llama2-70b-chat
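A small sketch of that config check, assuming the repos are public (gated ones like meta-llama and google/gemma would additionally need a `token=` passed to `hf_hub_download`):

```python
import json

from huggingface_hub import hf_hub_download

# Read the declared dtype straight from each repo's config.json.
repos = [
    "bigscience/bloom",
    "augtoma/qCammel-70-x",
    "h2oai/h2ogpt-4096-llama2-70b-chat",
    "TheBloke/Wizard-Vicuna-30B-Uncensored-GPTQ",
]
for repo in repos:
    with open(hf_hub_download(repo, "config.json")) as f:
        config = json.load(f)
    dtype = config.get("torch_dtype", "not declared")
    # newer GPTQ repos declare a quantization_config block in config.json;
    # older ones ship a separate quantize_config.json instead
    quantized = "quantization_config" in config
    print(f"{repo}: torch_dtype={dtype}, quantized={quantized}")
```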
The others are native float32 models, but bfloat16 is probably the closest fit with minimal loss of precision (perhaps on the order of a 0.2-1.8% difference).
Perhaps, next time, the models could be submitted at a more suitable precision, and only fall back to float32 if that fails...
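To put a very rough number on the raw bfloat16 rounding error (per-weight rounding only, not the downstream effect on benchmark scores), a tiny sketch:

```python
import torch

# Cast float32 values to bfloat16 (7 explicit mantissa bits) and back,
# then measure the mean relative rounding error.
w = torch.randn(1_000_000, dtype=torch.float32)
w_bf16 = w.to(torch.bfloat16).to(torch.float32)
rel_err = ((w - w_bf16).abs() / w.abs().clamp_min(1e-12)).mean().item()
print(f"mean relative cast error: {rel_err:.4%}")  # roughly 0.2% on average
```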