maderix/llama-65b-4bit

maderix committed on Mar 14, 2023

Commit b97fdb6 (parent: 6515d51)

Update README.md

Files changed (1)

README.md +2 -1

README.md CHANGED

@@ -14,4 +14,5 @@ Installation instructions as mentioned in above repo:
 5. Run python cuda_setup.py install in venv
 6. You can either convert the llama models yourself with the instructions from GPTQ-for-llama repo
 7. or directly use these weights by individually downloading them following these instructions (https://huggingface.co/docs/huggingface_hub/guides/download)
-8. Profit!
+8. Profit!
+9. Best results are obtained by putting a repetition_penalty(~1/0.85),temperature=0.7 in model.generate() for most LLaMA models
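The added item 9 recommends a repetition penalty of about 1/0.85 (roughly 1.18) together with temperature=0.7. To make the effect of those two numbers concrete, here is a minimal pure-Python sketch of one sampling step. It assumes a CTRL-style repetition penalty (positive logits of already-seen tokens are divided by the penalty, negative ones multiplied by it), which is my understanding of how Hugging Face transformers applies `repetition_penalty`. The helper names (`apply_repetition_penalty`, `softmax_with_temperature`, `sample_next_token`, `GENERATE_KWARGS`) are hypothetical and illustrative only; this is not the repo's code and not the library's implementation.

```python
import math
import random
from typing import List, Optional, Sequence


def apply_repetition_penalty(
    logits: Sequence[float], seen_ids: Sequence[int], penalty: float
) -> List[float]:
    """Assumed CTRL-style penalty: for each token id already in the sequence,
    divide a positive logit by `penalty` and multiply a non-positive one by it,
    so any penalty > 1 makes repeating that token less likely."""
    out = list(logits)
    for tok in set(seen_ids):
        out[tok] = out[tok] / penalty if out[tok] > 0 else out[tok] * penalty
    return out


def softmax_with_temperature(logits: Sequence[float], temperature: float) -> List[float]:
    """Scale logits by 1/temperature, then normalise; temperature < 1 sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_next_token(
    logits: Sequence[float],
    seen_ids: Sequence[int],
    repetition_penalty: float = 1 / 0.85,  # value suggested in item 9
    temperature: float = 0.7,  # value suggested in item 9
    rng: Optional[random.Random] = None,
) -> int:
    """Penalise repeats, apply temperature, then draw one token id at random."""
    rng = rng or random.Random()
    penalised = apply_repetition_penalty(logits, seen_ids, repetition_penalty)
    probs = softmax_with_temperature(penalised, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Hypothetical keyword arguments one might pass to a transformers model.generate() call;
# temperature only has an effect when sampling is enabled (do_sample=True).
GENERATE_KWARGS = {"do_sample": True, "repetition_penalty": 1 / 0.85, "temperature": 0.7}
```

For example, with logits `[2.0, 1.0, -1.0]` and tokens 0 and 2 already generated, `apply_repetition_penalty` lowers token 0's logit to about 1.7 and pushes token 2's to about -1.18, while token 1 is untouched. Dividing by 1/0.85 is the same as multiplying by 0.85, which is presumably why the README expresses the penalty that way.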