Update README.md
README.md
CHANGED
@@ -59,9 +59,20 @@ Users (both direct and downstream) should be made aware of the risks, biases and
 
 ## How to Get Started with the Model
 
-
-[More Information Needed]
+You can easily load and test our model in Transformers. Just follow the code below:
+
+```python
+model = AutoModelForCausalLM.from_pretrained("HF1BitLLM/Llama3-8B-1.58-100B-tokens", device_map="cuda", torch_dtype=torch.bfloat16)
+tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")
+
+input_text = "Daniel went back to the garden. Mary travelled to the kitchen. Sandra journeyed to the kitchen. Sandra went to the hallway. John went to the bedroom. Mary went back to the garden. Where is Mary?\nAnswer:"
+
+input_ids = tokenizer.encode(input_text, return_tensors="pt").cuda()
+output = model.generate(input_ids, max_new_tokens=10, do_sample=False)
+generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
+print(generated_text)
+```
 
 
 ## Training Details
 
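The committed snippet assumes `torch` and the `transformers` auto classes are already in scope; the imports in the sketch below are our addition, not part of the commit, and loading this checkpoint may also require a `transformers` build with 1.58-bit (BitNet) support, which the commit does not pin. Everything else mirrors the new README section:

```python
# Imports are added here for completeness; the committed snippet omits them.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the 1.58-bit Llama 3 8B checkpoint on GPU in bfloat16.
model = AutoModelForCausalLM.from_pretrained(
    "HF1BitLLM/Llama3-8B-1.58-100B-tokens",
    device_map="cuda",
    torch_dtype=torch.bfloat16,
)
# The checkpoint reuses the Llama 3 8B Instruct tokenizer.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

input_text = (
    "Daniel went back to the garden. Mary travelled to the kitchen. "
    "Sandra journeyed to the kitchen. Sandra went to the hallway. "
    "John went to the bedroom. Mary went back to the garden. "
    "Where is Mary?\nAnswer:"
)

input_ids = tokenizer.encode(input_text, return_tensors="pt").cuda()
# Greedy decoding; max_new_tokens bounds only the generated answer,
# so a prompt longer than 10 tokens still produces output.
output = model.generate(input_ids, max_new_tokens=10, do_sample=False)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```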