minghaowu committed on
Commit
01642ee
1 Parent(s): 9116e3a

Update README.md (#1)


- Update README.md (d07bce9314981218de14c5dd679f573ed7698304)

Files changed (1)
  1. README.md +126 -30
README.md CHANGED
@@ -1,53 +1,149 @@
  ---
- license: apache-2.0
  tags:
  - generated_from_trainer
  model-index:
- - name: flan-t5-base-distil-v4
  results: []
  ---

  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
  should probably proofread and complete it, then remove this comment. -->

- # flan-t5-base-distil-v4
-
- This model is a fine-tuned version of [google/flan-t5-base](https://huggingface.co/google/flan-t5-base) on an unknown dataset.
-
- ## Model description
-
- More information needed
-
- ## Intended uses & limitations
-
- More information needed
-
- ## Training and evaluation data
-
- More information needed
-
- ## Training procedure
-
- ### Training hyperparameters

  The following hyperparameters were used during training:
  - learning_rate: 0.0005
- - train_batch_size: 8
- - eval_batch_size: 8
  - seed: 42
- - gradient_accumulation_steps: 64
  - total_train_batch_size: 512
  - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  - lr_scheduler_type: linear
  - num_epochs: 5

- ### Training results

- ### Framework versions

- - Transformers 4.27.4
- - Pytorch 1.13.1+cu117
- - Datasets 2.2.0
- - Tokenizers 0.13.2

  ---
+ license: cc-by-nc-4.0
  tags:
  - generated_from_trainer
+ - instruction fine-tuning
  model-index:
+ - name: flan-t5-small-distil-v2
  results: []
+ language:
+ - en
+ pipeline_tag: text2text-generation
+ widget:
+ - text: >-
+     how can I become more healthy?
+   example_title: example
  ---

  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
  should probably proofread and complete it, then remove this comment. -->

+ <p align="center" width="100%">
+ <a><img src="https://raw.githubusercontent.com/mbzuai-nlp/lamini/main/images/LaMnin.png" alt="Title" style="width: 100%; min-width: 300px; display: block; margin: auto;"></a>
+ </p>
+
+ # LaMini-FLAN-T5-77M
+
+ [![Model License](https://img.shields.io/badge/Model%20License-CC%20By%20NC%204.0-red.svg)]()
+
+ This model is one of our LaMini model series presented in the paper "[LaMini: A Diverse Herd of Distilled Models from Large-Scale Instructions](https://github.com/mbzuai-nlp/lamini)". It is a fine-tuned version of [google/flan-t5-small](https://huggingface.co/google/flan-t5-small) on the [LaMini dataset](https://huggingface.co/datasets/MBZUAI/LaMini-instruction), which contains 2.58M samples for instruction fine-tuning. For more information about our dataset, please refer to our [project repository](https://github.com/mbzuai-nlp/lamini/).
+ You can view the other models in the LaMini series below. Note that not all of them perform equally well; models marked with ✩ have the best overall performance for their size/architecture. More details can be found in our paper.
+
+ <table>
+ <thead>
+ <tr>
+ <th>Base model</th>
+ <th colspan="4">LaMini series (#parameters)</th>
+ </tr>
+ </thead>
+ <tbody>
+ <tr>
+ <td>T5</td>
+ <td><a href="https://huggingface.co/MBZUAI/lamini-t5-61m" target="_blank" rel="noopener noreferrer">LaMini-T5-61M</a></td>
+ <td><a href="https://huggingface.co/MBZUAI/lamini-t5-223m" target="_blank" rel="noopener noreferrer">LaMini-T5-223M</a></td>
+ <td><a href="https://huggingface.co/MBZUAI/lamini-t5-738m" target="_blank" rel="noopener noreferrer">LaMini-T5-738M</a></td>
+ <td></td>
+ </tr>
+ <tr>
+ <td>Flan-T5</td>
+ <td><a href="https://huggingface.co/MBZUAI/lamini-flan-t5-77m" target="_blank" rel="noopener noreferrer">LaMini-Flan-T5-77M</a>✩</td>
+ <td><a href="https://huggingface.co/MBZUAI/lamini-flan-t5-248m" target="_blank" rel="noopener noreferrer">LaMini-Flan-T5-248M</a>✩</td>
+ <td><a href="https://huggingface.co/MBZUAI/lamini-flan-t5-783m" target="_blank" rel="noopener noreferrer">LaMini-Flan-T5-783M</a>✩</td>
+ <td></td>
+ </tr>
+ <tr>
+ <td>Cerebras-GPT</td>
+ <td><a href="https://huggingface.co/MBZUAI/lamini-cerebras-111m" target="_blank" rel="noopener noreferrer">LaMini-Cerebras-111M</a></td>
+ <td><a href="https://huggingface.co/MBZUAI/lamini-cerebras-256m" target="_blank" rel="noopener noreferrer">LaMini-Cerebras-256M</a></td>
+ <td><a href="https://huggingface.co/MBZUAI/lamini-cerebras-590m" target="_blank" rel="noopener noreferrer">LaMini-Cerebras-590M</a></td>
+ <td><a href="https://huggingface.co/MBZUAI/lamini-cerebras-1.3b" target="_blank" rel="noopener noreferrer">LaMini-Cerebras-1.3B</a></td>
+ </tr>
+ <tr>
+ <td>GPT-2</td>
+ <td><a href="https://huggingface.co/MBZUAI/lamini-gpt-124m" target="_blank" rel="noopener noreferrer">LaMini-GPT-124M</a>✩</td>
+ <td><a href="https://huggingface.co/MBZUAI/lamini-gpt-774m" target="_blank" rel="noopener noreferrer">LaMini-GPT-774M</a>✩</td>
+ <td><a href="https://huggingface.co/MBZUAI/lamini-gpt-1.5b" target="_blank" rel="noopener noreferrer">LaMini-GPT-1.5B</a>✩</td>
+ <td></td>
+ </tr>
+ <tr>
+ <td>GPT-Neo</td>
+ <td><a href="https://huggingface.co/MBZUAI/lamini-neo-125m" target="_blank" rel="noopener noreferrer">LaMini-Neo-125M</a></td>
+ <td><a href="https://huggingface.co/MBZUAI/lamini-neo-1.3b" target="_blank" rel="noopener noreferrer">LaMini-Neo-1.3B</a></td>
+ <td></td>
+ <td></td>
+ </tr>
+ <tr>
+ <td>GPT-J</td>
+ <td colspan="4">coming soon</td>
+ </tr>
+ <tr>
+ <td>LLaMA</td>
+ <td colspan="4">coming soon</td>
+ </tr>
+ </tbody>
+ </table>
+
+
+ ## Use
+
+ ### Intended use
+ We recommend using the model to respond to human instructions written in natural language.
+
+ We now show how to load and use our model with the HuggingFace `pipeline()` API.
+
+ ```python
+ # pip install -q transformers
+ from transformers import pipeline
+
+ # This model's checkpoint on the Hugging Face Hub (see the table above).
+ checkpoint = "MBZUAI/lamini-flan-t5-77m"
+
+ # device=0 runs on the first GPU; omit the argument to run on CPU.
+ generator = pipeline('text2text-generation', model=checkpoint, device=0)
+
+ input_prompt = 'Please let me know your thoughts on the given place and why you think it deserves to be visited: \n"Barcelona, Spain"'
+ generated_text = generator(input_prompt, max_length=512, do_sample=True)[0]['generated_text']
+
+ print("Response:", generated_text)
+ ```
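
For reference, the model can also be loaded without `pipeline()` by instantiating the tokenizer and model directly; the following is a minimal sketch assuming the same checkpoint as above.

```python
# Minimal sketch: load the tokenizer and seq2seq model directly and generate a response.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

checkpoint = "MBZUAI/lamini-flan-t5-77m"  # this model's checkpoint (see the table above)

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

inputs = tokenizer("how can I become more healthy?", return_tensors="pt")
outputs = model.generate(**inputs, max_length=512, do_sample=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```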
+
+ ## Training Procedure
+
+ <p align="center" width="100%">
+ <a><img src="https://raw.githubusercontent.com/mbzuai-nlp/lamini/main/images/lamini-pipeline.drawio.png" alt="Title" style="width: 100%; min-width: 250px; display: block; margin: auto;"></a>
+ </p>
+
+ We initialize with [google/flan-t5-small](https://huggingface.co/google/flan-t5-small) and fine-tune it on our [LaMini dataset](https://huggingface.co/datasets/MBZUAI/LaMini-instruction). Its total number of parameters is 77M.
+
+ ### Training Hyperparameters

  The following hyperparameters were used during training:
  - learning_rate: 0.0005
+ - train_batch_size: 128
+ - eval_batch_size: 64
  - seed: 42
+ - gradient_accumulation_steps: 4
  - total_train_batch_size: 512
  - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  - lr_scheduler_type: linear
  - num_epochs: 5
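
For illustration, here is a minimal fine-tuning sketch with the `transformers` `Seq2SeqTrainer` that mirrors the hyperparameters above. The dataset column names (`instruction`, `response`) and the preprocessing are assumptions made for the sketch, not the exact training script used for this model.

```python
# Minimal sketch: instruction fine-tuning flan-t5-small on the LaMini dataset
# with the hyperparameters listed above. Column names and preprocessing are assumed.
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForSeq2SeqLM, DataCollatorForSeq2Seq,
                          Seq2SeqTrainer, Seq2SeqTrainingArguments)

base = "google/flan-t5-small"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForSeq2SeqLM.from_pretrained(base)

dataset = load_dataset("MBZUAI/LaMini-instruction", split="train")

def preprocess(batch):
    # Tokenize instructions as inputs and responses as labels (assumed column names).
    model_inputs = tokenizer(batch["instruction"], max_length=512, truncation=True)
    labels = tokenizer(text_target=batch["response"], max_length=512, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized = dataset.map(preprocess, batched=True, remove_columns=dataset.column_names)

args = Seq2SeqTrainingArguments(
    output_dir="lamini-flan-t5-77m",
    learning_rate=5e-4,
    per_device_train_batch_size=128,
    per_device_eval_batch_size=64,
    gradient_accumulation_steps=4,   # 128 * 4 = 512 effective train batch size
    num_train_epochs=5,
    lr_scheduler_type="linear",
    seed=42,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```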

+ ## Evaluation
+ We conducted two sets of evaluations: automatic evaluation on downstream NLP tasks and human evaluation on user-oriented instructions. For more detail, please refer to our [paper]().
+
+ ## Limitations

+ More information needed


+ # Citation

+ ```bibtex
+ @misc{lamini,
+   title={LaMini: A Diverse Herd of Distilled Models from Large-Scale Instructions},
+   author={},
+   year={2023},
+   publisher={GitHub},
+   journal={GitHub repository},
+ }
+ ```