alexmarques committed
Commit 8e4a37a
1 Parent(s): fa37030

Update README.md

Files changed (1): README.md +0 -46
README.md CHANGED
@@ -39,8 +39,6 @@ GPTQ used a 1% damping factor and 256 sequences of 8,192 random tokens.
 
 ## Deployment
 
-### Use with vLLM
-
 This model can be deployed efficiently using the [vLLM](https://docs.vllm.ai/en/latest/) backend, as shown in the example below.
 
 ```python
@@ -71,50 +69,6 @@ print(generated_text)
 
 vLLM also supports OpenAI-compatible serving. See the [documentation](https://docs.vllm.ai/en/latest/) for more details.
 
-### Use with transformers
-
-The following example shows how the model can be deployed with Transformers using the `generate()` function.
-
-
-```python
-from transformers import AutoTokenizer, AutoModelForCausalLM
-
-model_id = "neuralmagic/Meta-Llama-3.1-8B-Instruct-quantized.w8a8"
-
-tokenizer = AutoTokenizer.from_pretrained(model_id)
-model = AutoModelForCausalLM.from_pretrained(
-    model_id,
-    torch_dtype="auto",
-    device_map="auto",
-)
-
-messages = [
-    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
-    {"role": "user", "content": "Who are you?"},
-]
-
-input_ids = tokenizer.apply_chat_template(
-    messages,
-    add_generation_prompt=True,
-    return_tensors="pt"
-).to(model.device)
-
-terminators = [
-    tokenizer.eos_token_id,
-    tokenizer.convert_tokens_to_ids("<|eot_id|>")
-]
-
-outputs = model.generate(
-    input_ids,
-    max_new_tokens=256,
-    eos_token_id=terminators,
-    do_sample=True,
-    temperature=0.6,
-    top_p=0.9,
-)
-response = outputs[0][input_ids.shape[-1]:]
-print(tokenizer.decode(response, skip_special_tokens=True))
-```
 
 ## Creation
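The body of the vLLM snippet is elided by the diff view above. As a minimal sketch of what such a deployment looks like, assuming the model ID from the removed transformers example; the prompt and sampling settings are illustrative, not the README's exact values:

```python
from vllm import LLM, SamplingParams
from transformers import AutoTokenizer

model_id = "neuralmagic/Meta-Llama-3.1-8B-Instruct-quantized.w8a8"

# Build the chat prompt using the model's own chat template.
tokenizer = AutoTokenizer.from_pretrained(model_id)
messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
    {"role": "user", "content": "Who are you?"},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Sampling settings mirror the removed transformers example; they are
# illustrative assumptions, not values taken from the elided snippet.
sampling_params = SamplingParams(temperature=0.6, top_p=0.9, max_tokens=256)

llm = LLM(model=model_id)
outputs = llm.generate([prompt], sampling_params)
generated_text = outputs[0].outputs[0].text
print(generated_text)
```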
 
 
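For the OpenAI-compatible serving path mentioned in the diff, a minimal sketch using vLLM's default server address; the prompt is illustrative:

```python
from openai import OpenAI

# Assumes the server was started with, e.g.:
#   vllm serve neuralmagic/Meta-Llama-3.1-8B-Instruct-quantized.w8a8
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="neuralmagic/Meta-Llama-3.1-8B-Instruct-quantized.w8a8",
    messages=[
        {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
        {"role": "user", "content": "Who are you?"},
    ],
)
print(response.choices[0].message.content)
```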