loubnabnl (HF staff) committed
Commit 6696e96
1 Parent(s): 12c3b95

Update README.md

Files changed (1):
  1. README.md +7 -58
README.md CHANGED
@@ -3,8 +3,6 @@ library_name: transformers
  license: apache-2.0
  language:
  - en
- datasets:
- - HuggingFaceTB/smollm-corpus
  ---


@@ -14,13 +12,6 @@ datasets:
  <img src="https://huggingface.co/datasets/HuggingFaceTB/images/resolve/main/banner_smol.png" alt="SmolLM" width="1100" height="600">
  </center>

- ## Table of Contents
-
- 1. [Model Summary](##model-summary)
- 2. [Limitations](##limitations)
- 3. [Training](##training)
- 4. [License](##license)
- 5. [Citation](##citation)

  ## Model Summary

@@ -36,65 +27,23 @@ This is the SmolLM-135M-Instruct.
  pip install transformers
  ```

- #### Running the model on CPU/GPU/multi GPU
- * _Using full precision_
  ```python
  # pip install git+https://github.com/huggingface/transformers.git # TODO: merge PR to main
  from transformers import AutoModelForCausalLM, AutoTokenizer
- checkpoint = "HuggingFaceTB/SmolLM-135M"
+ checkpoint = "HuggingFaceTB/SmolLM-135M-Instruct"
+
  device = "cuda" # for GPU usage or "cpu" for CPU usage
  tokenizer = AutoTokenizer.from_pretrained(checkpoint)
  # for multiple GPUs install accelerate and do `model = AutoModelForCausalLM.from_pretrained(checkpoint, device_map="auto")`
  model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)
- inputs = tokenizer.encode("def print_hello_world():", return_tensors="pt").to(device)
- outputs = model.generate(inputs)
- print(tokenizer.decode(outputs[0]))
- ```
- ```bash
- >>> print(f"Memory footprint: {model.get_memory_footprint() / 1e6:.2f} MB")
- Memory footprint: 12624.81 MB
- ```
- * _Using `torch.bfloat16`_
- ```python
- # pip install accelerate
- import torch
- from transformers import AutoTokenizer, AutoModelForCausalLM
- checkpoint = "HuggingFaceTB/SmolLM-135M"
- tokenizer = AutoTokenizer.from_pretrained(checkpoint)
- # for fp16 use `torch_dtype=torch.float16` instead
- model = AutoModelForCausalLM.from_pretrained(checkpoint, device_map="auto", torch_dtype=torch.bfloat16)
- inputs = tokenizer.encode("def print_hello_world():", return_tensors="pt").to("cuda")
- outputs = model.generate(inputs)
- print(tokenizer.decode(outputs[0]))
- ```
- ```bash
- >>> print(f"Memory footprint: {model.get_memory_footprint() / 1e6:.2f} MB")
- Memory footprint: 269.03 MB
- ```
-
- #### Quantized Versions through `bitsandbytes`
- * _Using 8-bit precision (int8)_

- ```python
- # pip install bitsandbytes accelerate
- from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
- # to use 4bit use `load_in_4bit=True` instead
- quantization_config = BitsAndBytesConfig(load_in_8bit=True)
- checkpoint = "HuggingFaceTB/SmolLM-135M"
- tokenizer = AutoTokenizer.from_pretrained(checkpoint)
- model = AutoModelForCausalLM.from_pretrained(checkpoint, quantization_config=quantization_config)
- inputs = tokenizer.encode("def print_hello_world():", return_tensors="pt").to("cuda")
- outputs = model.generate(inputs)
+ messages = [{"role": "user", "content": "List the steps to bake a chocolate cake from scratch."}]
+ input_text=tokenizer.apply_chat_template(messages, tokenize=False)
+ print(input_text)
+ inputs = tokenizer.encode(input_text, return_tensors="pt").to("cuda")
+ outputs = model.generate(inputs, max_new_tokens=100, temperature=0.6, top_p=0.92, do_sample=True)
  print(tokenizer.decode(outputs[0]))
  ```
- ```bash
- >>> print(f"Memory footprint: {model.get_memory_footprint() / 1e6:.2f} MB")
- # load_in_8bit
- Memory footprint: 162.87 MB
- # load_in_4bit
- >>> print(f"Memory footprint: {model.get_memory_footprint() / 1e6:.2f} MB")
- Memory footprint: 109.78 MB
- ```

  # Limitations
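For readers who want to try the updated snippet outside the diff, here is a minimal, self-contained sketch of the same chat-template flow. The `add_generation_prompt=True` flag, the CPU fallback, and `skip_special_tokens=True` are assumptions added for convenience, not part of the committed README.

```python
# Sketch of the usage pattern introduced by this commit (not part of the diff itself).
# Assumptions: a recent transformers release with SmolLM support; CUDA is optional.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "HuggingFaceTB/SmolLM-135M-Instruct"
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)

messages = [{"role": "user", "content": "List the steps to bake a chocolate cake from scratch."}]
# Render the chat template to text; add_generation_prompt=True appends the assistant
# turn marker so generation starts a reply instead of continuing the user message.
input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer.encode(input_text, return_tensors="pt").to(device)

outputs = model.generate(inputs, max_new_tokens=100, temperature=0.6, top_p=0.92, do_sample=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Whether `add_generation_prompt=True` is appropriate depends on the model's chat template; the committed snippet prints `input_text` so the rendered prompt can be inspected before generation.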