Update README.md
README.md
@@ -27,7 +27,13 @@ For a deeper dive into the methods and results, check out our [blog post](https:
You can easily load and test our model in Transformers. Just follow the code below.

Start by installing the Transformers version with the correct configuration to load BitNet models:

```bash
pip install git+https://github.com/huggingface/transformers.git@refs/pull/33410/head
```

And then load the model:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("HF1BitLLM/Llama3-8B-1.58-100B-tokens", device_map="cuda", torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")
```
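Once the model and tokenizer are loaded, a quick generation is a simple way to check that everything works. This sketch uses the standard Transformers `generate` API; the prompt and `max_new_tokens` value are illustrative, and it assumes a CUDA GPU plus access to the checkpoints above:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "HF1BitLLM/Llama3-8B-1.58-100B-tokens",
    device_map="cuda",
    torch_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

# Tokenize an example prompt, move it to the model's device, and generate.
inputs = tokenizer("What is 1.58-bit quantization?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```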
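For intuition on why a dedicated Transformers branch is needed: a 1.58-bit model has ternary weights in {-1, 0, +1}, which can be stored at 2 bits each, i.e. four weights per byte. The sketch below is a hypothetical illustration of that packing idea; `pack_ternary` and `unpack_ternary` are made-up helper names, not the actual storage code used by Transformers or this checkpoint:

```python
def pack_ternary(weights):
    """Pack ternary weights {-1, 0, +1} into bytes, four weights per byte.

    Each weight is shifted to a 2-bit code {0, 1, 2} and placed in its
    own 2-bit slot of the output byte.
    """
    out = []
    for i in range(0, len(weights), 4):
        byte = 0
        for j, w in enumerate(weights[i:i + 4]):
            byte |= (w + 1) << (2 * j)  # -1,0,+1 -> 0,1,2 in slot j
        out.append(byte)
    return bytes(out)


def unpack_ternary(data, n):
    """Invert pack_ternary, recovering the first n ternary weights."""
    weights = []
    for byte in data:
        for j in range(4):
            weights.append(((byte >> (2 * j)) & 0b11) - 1)
    return weights[:n]


w = [-1, 0, 1, 1, 0, 0, -1, 1]
packed = pack_ternary(w)          # 8 ternary weights fit in 2 bytes
assert unpack_ternary(packed, len(w)) == w
```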