joanllop committed
Commit 9c4f7c3
1 Parent(s): e312cd4

Update README.md

Files changed (1)
  1. README.md +128 -1
README.md CHANGED
@@ -133,8 +133,135 @@ The accelerated partition is composed of 1,120 nodes with the following specific
 ---
 
 ## How to use
 
- <span style="color:red">TODO</span>
+ This section offers examples of how to perform inference using various methods.
 
+ ### Inference
+ You'll find different techniques for running inference below, including Hugging Face's Text Generation Pipeline, single- and multi-GPU setups, and vLLM for scalable and efficient generation.
+
+ #### Inference with Hugging Face's Text Generation Pipeline
+ The Hugging Face Text Generation Pipeline provides a straightforward way to run inference with the Salamandra-2b model. First install the required dependencies:
+
+ ```bash
+ pip install transformers torch accelerate sentencepiece protobuf
+ ```
+ <details>
+ <summary>Show code</summary>
+
+ ```python
+ from transformers import pipeline, set_seed
+
+ model_id = "projecte-aina/salamandra-2b"
+
+ # Sample prompts
+ prompts = [
+     ""
+ ]
+
+ # Create the text-generation pipeline
+ generator = pipeline("text-generation", model_id, device_map="auto")
+ generation_args = {
+     "temperature": 0.1,
+     "top_p": 0.95,
+     "max_new_tokens": 25,
+     "repetition_penalty": 1.2,
+     "do_sample": True
+ }
+
+ # Fix the seed for reproducibility
+ set_seed(1)
+ # Generate text for each prompt
+ outputs = generator(prompts, **generation_args)
+ # Print outputs
+ for output in outputs:
+     print(output[0]["generated_text"])
+ ```
+ </details>
+
+ #### Inference with single / multi GPU
+ This section provides a simple example of running inference with Hugging Face's AutoModelForCausalLM class; `device_map="auto"` places the model on a single GPU or shards it across several, depending on what is available.
+
+ ```bash
+ pip install transformers torch accelerate sentencepiece protobuf
+ ```
+
+ <details>
+ <summary>Show code</summary>
+
+ ```python
+ from transformers import AutoTokenizer, AutoModelForCausalLM
+ import torch
+
+ model_id = "projecte-aina/salamandra-2b"
+
+ # Input text
+ text = "El mercat del barri és"
+
+ # Load the tokenizer
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ # Load the model, placing it automatically on the available GPU(s)
+ model = AutoModelForCausalLM.from_pretrained(
+     model_id,
+     device_map="auto",
+     torch_dtype=torch.bfloat16
+ )
+
+ generation_args = {
+     "temperature": 0.1,
+     "top_p": 0.95,
+     "max_new_tokens": 25,
+     "repetition_penalty": 1.2,
+     "do_sample": True
+ }
+
+ # Tokenize the input and move it to the model's device
+ inputs = tokenizer(text, return_tensors="pt").to(model.device)
+ # Generate text
+ output = model.generate(
+     input_ids=inputs["input_ids"],
+     attention_mask=inputs["attention_mask"],
+     **generation_args
+ )
+ # Print the output
+ print(tokenizer.decode(output[0], skip_special_tokens=True))
+ ```
+
+ </details>
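+
+ If you would rather pin the model to one specific GPU instead of relying on automatic placement, a minimal single-GPU variant of the example above (assuming a single CUDA device is available) is:
+
+ ```python
+ import torch
+ from transformers import AutoTokenizer, AutoModelForCausalLM
+
+ model_id = "projecte-aina/salamandra-2b"
+
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ # Load the model in bfloat16 and move it explicitly to the first GPU
+ model = AutoModelForCausalLM.from_pretrained(
+     model_id,
+     torch_dtype=torch.bfloat16
+ ).to("cuda:0")
+
+ # Tokenize the prompt and move the tensors to the same device as the model
+ inputs = tokenizer("El mercat del barri és", return_tensors="pt").to("cuda:0")
+ output = model.generate(
+     **inputs,
+     max_new_tokens=25,
+     do_sample=True,
+     temperature=0.1,
+     top_p=0.95,
+     repetition_penalty=1.2
+ )
+ print(tokenizer.decode(output[0], skip_special_tokens=True))
+ ```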
+
+ #### Inference with vLLM
+ vLLM is an efficient inference library that enables faster and more scalable text generation.
+
+ ```bash
+ pip install vllm
+ ```
+
+ <details>
+ <summary>Show code</summary>
+
+ ```python
+ from vllm import LLM, SamplingParams
+
+ model_id = "projecte-aina/salamandra-2b"
+
+ # Sample prompts
+ prompts = [
+     "",
+ ]
+ # Create a sampling params object
+ sampling_params = SamplingParams(
+     temperature=0.1,
+     top_p=0.95,
+     seed=1,
+     max_tokens=25,
+     repetition_penalty=1.2
+ )
+
+ # Create an LLM
+ llm = LLM(model=model_id)
+ # Generate texts from the prompts
+ outputs = llm.generate(prompts, sampling_params)
+ # Print the outputs
+ for output in outputs:
+     prompt = output.prompt
+     generated_text = output.outputs[0].text
+     print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
+ ```
+
+ </details>
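+
+ Beyond offline generation, vLLM can also expose the model through an OpenAI-compatible HTTP server (for example with `python -m vllm.entrypoints.openai.api_server --model projecte-aina/salamandra-2b`). The sketch below assumes such a server is already running locally on the default port 8000 and queries it with the `openai` client:
+
+ ```python
+ from openai import OpenAI
+
+ # Point the client at the local vLLM server; the API key is unused by default
+ client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
+
+ completion = client.completions.create(
+     model="projecte-aina/salamandra-2b",
+     prompt="El mercat del barri és",
+     max_tokens=25,
+     temperature=0.1,
+ )
+ print(completion.choices[0].text)
+ ```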
 
 ---