Update README.md
README.md
CHANGED
@@ -60,11 +60,15 @@ datasets:
license: apache-2.0
---

Flan-UL2 is an encoder-decoder model based on the `T5` architecture. It uses the same configuration as the [`UL2` model](https://huggingface.co/google/ul2) released earlier last year. It was fine-tuned using the "Flan" prompt tuning and dataset collection.

According to the original [blog]() here are the notable improvements:
- The original UL2 model was only trained with a receptive field of 512, which made it non-ideal for N-shot prompting where N is large.
- The Flan-UL2 checkpoint uses a receptive field of 2048, which makes it more usable for few-shot in-context learning.
- The original UL2 model also had mode switch tokens that were rather mandatory to get good performance. However, they were a little cumbersome, as they often required changes during inference or finetuning. In this update/change, we continue training UL2 20B for an additional 100k steps (with a small batch) to forget "mode tokens" before applying Flan instruction tuning. This Flan-UL2 checkpoint does not require mode tokens anymore.

@@ -86,10 +90,13 @@ The reported results are the following :

# Using the model

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
import torch

model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-ul2", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("google/flan-ul2")

input_string = "Answer the following question by reasoning step by step. The cafeteria had 23 apples. If they used 20 for lunch, and bought 6 more, how many apples do they have?"
```

@@ -99,7 +106,24 @@ outputs = model.generate(inputs, max_length=200)

```python
print(tokenizer.decode(outputs[0]))
# <pad> They have 23 - 20 = 3 apples left. They have 3 + 6 = 9 apples. Therefore, the answer is 9.</s>
```

@@ -193,7 +217,7 @@ In total, the model was trained for 2.65 million steps.

## Contribution

This model was contributed by [Younes Belkada](https://huggingface.co/ybelkada) & [Arthur Zucker](https://huggingface.co/ArthurZ).

## Examples

license: apache-2.0
---

# Model card for Flan-UL2

![model image](https://raw.githubusercontent.com/google-research/google-research/master/ul2/figs/ul2.png)

Flan-UL2 is an encoder-decoder model based on the `T5` architecture. It uses the same configuration as the [`UL2` model](https://huggingface.co/google/ul2) released earlier last year. It was fine-tuned using the "Flan" prompt tuning and dataset collection.

According to the original [blog](https://www.yitay.net/blog/flan-ul2-20b), here are the notable improvements:
- The original UL2 model was only trained with a receptive field of 512, which made it non-ideal for N-shot prompting where N is large.
- The Flan-UL2 checkpoint uses a receptive field of 2048, which makes it more usable for few-shot in-context learning.
- The original UL2 model also had mode switch tokens that were rather mandatory to get good performance. However, they were a little cumbersome, as they often required changes during inference or finetuning. In this update/change, we continue training UL2 20B for an additional 100k steps (with a small batch) to forget "mode tokens" before applying Flan instruction tuning. This Flan-UL2 checkpoint does not require mode tokens anymore, so prompts can be passed to it directly (see the sketch below).
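
To make the last two points concrete, below is a minimal few-shot sketch that is not part of the original card: several demonstrations are packed into one prompt, which the 2048-token receptive field can accommodate, and no UL2 mode tokens are prepended. The review texts and labels are invented for illustration, and the loading lines mirror the snippets in the "Using the model" section below.

```python
# Minimal few-shot sketch (illustrative only; not from the original model card).
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
import torch

model = AutoModelForSeq2SeqLM.from_pretrained(
    "google/flan-ul2", torch_dtype=torch.bfloat16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("google/flan-ul2")

# Several demonstrations fit comfortably within the 2048-token receptive field,
# and no mode tokens are needed with this checkpoint.
few_shot_prompt = (
    "Classify the sentiment of each review as positive or negative.\n\n"
    "Review: The food was cold and the service was slow.\nSentiment: negative\n\n"
    "Review: A delightful evening, we will definitely come back.\nSentiment: positive\n\n"
    "Review: Fantastic desserts and a friendly waiter.\nSentiment:"
)

inputs = tokenizer(few_shot_prompt, return_tensors="pt").input_ids.to("cuda")
outputs = model.generate(inputs, max_length=10)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```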

# Using the model

For more efficient memory usage, we advise you to load the model in 8-bit using the `load_in_8bit` flag as follows:

```python
# pip install accelerate transformers bitsandbytes
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
import torch

model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-ul2", device_map="auto", load_in_8bit=True)
tokenizer = AutoTokenizer.from_pretrained("google/flan-ul2")

input_string = "Answer the following question by reasoning step by step. The cafeteria had 23 apples. If they used 20 for lunch, and bought 6 more, how many apples do they have?"

inputs = tokenizer(input_string, return_tensors="pt").input_ids.to("cuda")
outputs = model.generate(inputs, max_length=200)

print(tokenizer.decode(outputs[0]))
# <pad> They have 23 - 20 = 3 apples left. They have 3 + 6 = 9 apples. Therefore, the answer is 9.</s>
```
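
As an optional, rough check of the savings from `load_in_8bit` (this snippet is not in the original card), you can print the model's memory footprint; `get_memory_footprint()` is a standard `transformers` `PreTrainedModel` helper, and the figures in the comments are only ballpark expectations:

```python
# Rough sanity check of the 8-bit memory savings (illustrative; numbers are approximate).
print(f"Memory footprint: {model.get_memory_footprint() / 1e9:.1f} GB")
# Expect on the order of ~20 GB for the 8-bit weights of a ~20B-parameter model,
# versus roughly twice that when the weights are loaded in bfloat16.
```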

Otherwise, you can load and run the model in `bfloat16` as follows:

```python
# pip install accelerate transformers
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
import torch

model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-ul2", torch_dtype=torch.bfloat16, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("google/flan-ul2")

input_string = "Answer the following question by reasoning step by step. The cafeteria had 23 apples. If they used 20 for lunch, and bought 6 more, how many apples do they have?"

inputs = tokenizer(input_string, return_tensors="pt").input_ids.to("cuda")
outputs = model.generate(inputs, max_length=200)

print(tokenizer.decode(outputs[0]))
# <pad> They have 23 - 20 = 3 apples left. They have 3 + 6 = 9 apples. Therefore, the answer is 9.</s>
```

## Contribution

This model was originally contributed by [Yi Tay](https://www.yitay.net/?author=636616684c5e64780328eece), and added to the Hugging Face ecosystem by [Younes Belkada](https://huggingface.co/ybelkada) & [Arthur Zucker](https://huggingface.co/ArthurZ).

## Examples