TinyPixel committed
Commit 63af509
1 Parent(s): f1a6051

Upload folder using huggingface_hub
README.md CHANGED
@@ -1,6 +1,204 @@
 ---
 library_name: peft
+base_model: TinyPixel/Llama-2-7B-bf16-sharded
 ---
+
+# Model Card for Model ID
+
+<!-- Provide a quick summary of what the model is/does. -->
+
+
+
+## Model Details
+
+### Model Description
+
+<!-- Provide a longer summary of what this model is. -->
+
+
+
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+
+### Model Sources [optional]
+
+<!-- Provide the basic links for the model. -->
+
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+
+## Uses
+
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+
+### Direct Use
+
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+
+[More Information Needed]
+
+### Downstream Use [optional]
+
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+
+[More Information Needed]
+
+### Out-of-Scope Use
+
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+
+[More Information Needed]
+
+## Bias, Risks, and Limitations
+
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+
+[More Information Needed]
+
+### Recommendations
+
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+
+## How to Get Started with the Model
+
+Use the code below to get started with the model.
+
+[More Information Needed]
+
+## Training Details
+
+### Training Data
+
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+
+[More Information Needed]
+
+### Training Procedure
+
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+
+#### Preprocessing [optional]
+
+[More Information Needed]
+
+
+#### Training Hyperparameters
+
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+
+#### Speeds, Sizes, Times [optional]
+
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+
+[More Information Needed]
+
+## Evaluation
+
+<!-- This section describes the evaluation protocols and provides the results. -->
+
+### Testing Data, Factors & Metrics
+
+#### Testing Data
+
+<!-- This should link to a Dataset Card if possible. -->
+
+[More Information Needed]
+
+#### Factors
+
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+
+[More Information Needed]
+
+#### Metrics
+
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+
+[More Information Needed]
+
+### Results
+
+[More Information Needed]
+
+#### Summary
+
+
+
+## Model Examination [optional]
+
+<!-- Relevant interpretability work for the model goes here -->
+
+[More Information Needed]
+
+## Environmental Impact
+
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+
+## Technical Specifications [optional]
+
+### Model Architecture and Objective
+
+[More Information Needed]
+
+### Compute Infrastructure
+
+[More Information Needed]
+
+#### Hardware
+
+[More Information Needed]
+
+#### Software
+
+[More Information Needed]
+
+## Citation [optional]
+
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+
+**BibTeX:**
+
+[More Information Needed]
+
+**APA:**
+
+[More Information Needed]
+
+## Glossary [optional]
+
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+
+[More Information Needed]
+
+## More Information [optional]
+
+[More Information Needed]
+
+## Model Card Authors [optional]
+
+[More Information Needed]
+
+## Model Card Contact
+
+[More Information Needed]
+
+
 ## Training procedure
 
 
@@ -16,6 +214,13 @@ The following `bitsandbytes` quantization config was used during training:
 - bnb_4bit_use_double_quant: False
 - bnb_4bit_compute_dtype: float16
 
+### Framework versions
+
+
+- PEFT 0.6.2
+## Training procedure
+
+
 The following `bitsandbytes` quantization config was used during training:
 - quant_method: bitsandbytes
 - load_in_8bit: False
@@ -27,8 +232,8 @@ The following `bitsandbytes` quantization config was used during training:
 - bnb_4bit_quant_type: nf4
 - bnb_4bit_use_double_quant: False
 - bnb_4bit_compute_dtype: float16
+
 ### Framework versions
 
-- PEFT 0.5.0
 
-- PEFT 0.5.0
+- PEFT 0.6.2
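
The README's "How to Get Started" section is still a stub. A minimal loading sketch consistent with the configs in this commit (4-bit NF4 bitsandbytes quantization with float16 compute, LoRA adapter on `TinyPixel/Llama-2-7B-bf16-sharded`) could look like the following; `load_in_4bit=True` and the adapter repo id are assumptions, since the README shows only the 4-bit sub-options and this page does not name the repo.

```python
# Hypothetical loading sketch, not part of the commit. load_in_4bit=True and
# the adapter repo id ("TinyPixel/<this-adapter-repo>") are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                    # assumed; README lists only the 4-bit sub-options
    bnb_4bit_quant_type="nf4",            # from the README's quantization config
    bnb_4bit_use_double_quant=False,
    bnb_4bit_compute_dtype=torch.float16,
)

base = AutoModelForCausalLM.from_pretrained(
    "TinyPixel/Llama-2-7B-bf16-sharded",  # base_model from the new YAML front matter
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("TinyPixel/Llama-2-7B-bf16-sharded")

# Attach the LoRA adapter uploaded in this commit.
model = PeftModel.from_pretrained(base, "TinyPixel/<this-adapter-repo>")
```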
adapter_config.json CHANGED
@@ -1,6 +1,7 @@
 {
+  "alpha_pattern": {},
   "auto_mapping": null,
-  "base_model_name_or_path": "stabilityai/stablelm-3b-4e1t",
+  "base_model_name_or_path": "TinyPixel/Llama-2-7B-bf16-sharded",
   "bias": "none",
   "fan_in_fan_out": false,
   "inference_mode": true,
@@ -12,15 +13,16 @@
   "modules_to_save": null,
   "peft_type": "LORA",
   "r": 16,
+  "rank_pattern": {},
   "revision": null,
   "target_modules": [
-    "q_proj",
-    "k_proj",
-    "o_proj",
     "v_proj",
+    "up_proj",
     "gate_proj",
+    "k_proj",
     "down_proj",
-    "up_proj"
+    "o_proj",
+    "q_proj"
   ],
   "task_type": "CAUSAL_LM"
 }
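
Note that `target_modules` is only reordered here (the set of seven Llama projection matrices is unchanged); the substantive changes are the new base model and the `alpha_pattern`/`rank_pattern` keys that PEFT 0.6.x writes. For reference, a `LoraConfig` sketch matching the new file; `lora_alpha` and `lora_dropout` fall outside the visible hunks, so those values are placeholders:

```python
# Sketch of a LoraConfig matching the new adapter_config.json. lora_alpha and
# lora_dropout are hidden by the diff; the values below are placeholders.
from peft import LoraConfig

lora_config = LoraConfig(
    r=16,                  # "r": 16
    lora_alpha=32,         # placeholder, not shown in the diff
    lora_dropout=0.05,     # placeholder, not shown in the diff
    bias="none",           # "bias": "none"
    task_type="CAUSAL_LM", # "task_type": "CAUSAL_LM"
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
)
```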
adapter_model.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:bae2073e36bb32cfbba8de769e77f64c03b07ec350b5827681ed7636c12884c1
-size 100299853
+oid sha256:ec81727ac8e416e42f13349e7f02f59b25eb46e08e46be176ef861b7a2d36fa7
+size 160069389
adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:f05d63146dc686c004ba1a411b7fb76eeddc7f51e55f4c5067ae92ad6ba5f78b
+size 159967880
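
The adapter roughly doubles in size (100,299,853 → 160,069,389 bytes for the .bin, plus a new ~160 MB safetensors copy), which is what the base-model swap from a 3B StableLM to Llama-2-7B predicts. A back-of-the-envelope check, assuming standard Llama-2-7B shapes (32 layers, hidden size 4096, MLP size 11008) and fp32 adapter weights:

```python
# Size sanity check for the new adapter, assuming standard Llama-2-7B shapes.
layers, hidden, mlp, r = 32, 4096, 11008, 16

def lora_params(d_in: int, d_out: int) -> int:
    # LoRA adds A (r x d_in) and B (d_out x r) for each adapted matrix.
    return r * (d_in + d_out)

per_layer = (
    4 * lora_params(hidden, hidden)  # q_proj, k_proj, v_proj, o_proj
    + 2 * lora_params(hidden, mlp)   # gate_proj, up_proj
    + lora_params(mlp, hidden)       # down_proj
)
total = layers * per_layer           # 39,976,960 trainable parameters
print(total * 4)                     # ~159.9 MB at 4 bytes/param, matching the file size
```

The 320,193,565-byte optimizer.pt below is consistent with the same count: two fp32 Adam moment buffers per parameter come to roughly 2 × 4 bytes × 40M ≈ 320 MB.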
optimizer.pt CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:ecbc943d06bb9e8c8a39e4f638b03e0faf3ed5c5c2026eadf7f0c97048905bf6
-size 200654493
+oid sha256:fb44253fa7192bcdc50a06304e61657a3e62589481f8ce4fb5b59323c7239704
+size 320193565
rng_state.pth CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:d0ecb1da131f28ebbdaed7969d4c7cdc38820701e74de5e10734a014050ce8ad
+oid sha256:42cb62d981319b46cbdce26b9ec7f200027409aa4ae35c2f3b90e39a846b8669
 size 14575
scheduler.pt CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:d95b72119680d337f83be2a4d08a065f8278279b4937edf5107b5f0d8275c4a5
+oid sha256:559bb3d641dd3013085e1ee1df433744bf219aefc5dc1f66c7d7e4d6236d0768
 size 627
special_tokens_map.json CHANGED
@@ -1,6 +1,24 @@
 {
-  "bos_token": "<|endoftext|>",
-  "eos_token": "<|endoftext|>",
-  "pad_token": "<|endoftext|>",
-  "unk_token": "<|endoftext|>"
+  "bos_token": {
+    "content": "<s>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  },
+  "eos_token": {
+    "content": "</s>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": "</s>",
+  "unk_token": {
+    "content": "<unk>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  }
 }
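
The new map reuses the EOS token `</s>` as `pad_token`, a common workaround because the Llama tokenizer ships without a dedicated padding token. In code, the usual pattern (a sketch, not taken from this repo's training script) is:

```python
# Typical way a Llama tokenizer ends up with pad_token == "</s>".
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("TinyPixel/Llama-2-7B-bf16-sharded")
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # "</s>", as in the map above
```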
tokenizer.json CHANGED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json CHANGED
@@ -1,212 +1,37 @@
 {
-  "add_prefix_space": false,
   "added_tokens_decoder": {
     "0": {
-      "content": "<|endoftext|>",
+      "content": "<unk>",
       "lstrip": false,
-      "normalized": false,
+      "normalized": true,
       "rstrip": false,
       "single_word": false,
       "special": true
     },
     "1": {
-      "content": "<|padding|>",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false,
-      "special": true
-    },
-    "50254": {
-      "content": " ",
+      "content": "<s>",
       "lstrip": false,
       "normalized": true,
       "rstrip": false,
       "single_word": false,
-      "special": false
-    },
-    "50255": {
-      "content": " ",
-      "lstrip": false,
-      "normalized": true,
-      "rstrip": false,
-      "single_word": false,
-      "special": false
-    },
-    "50256": {
-      "content": " ",
-      "lstrip": false,
-      "normalized": true,
-      "rstrip": false,
-      "single_word": false,
-      "special": false
-    },
-    "50257": {
-      "content": " ",
-      "lstrip": false,
-      "normalized": true,
-      "rstrip": false,
-      "single_word": false,
-      "special": false
-    },
-    "50258": {
-      "content": " ",
-      "lstrip": false,
-      "normalized": true,
-      "rstrip": false,
-      "single_word": false,
-      "special": false
-    },
-    "50259": {
-      "content": " ",
-      "lstrip": false,
-      "normalized": true,
-      "rstrip": false,
-      "single_word": false,
-      "special": false
-    },
-    "50260": {
-      "content": " ",
-      "lstrip": false,
-      "normalized": true,
-      "rstrip": false,
-      "single_word": false,
-      "special": false
-    },
-    "50261": {
-      "content": " ",
-      "lstrip": false,
-      "normalized": true,
-      "rstrip": false,
-      "single_word": false,
-      "special": false
-    },
-    "50262": {
-      "content": " ",
-      "lstrip": false,
-      "normalized": true,
-      "rstrip": false,
-      "single_word": false,
-      "special": false
-    },
-    "50263": {
-      "content": " ",
-      "lstrip": false,
-      "normalized": true,
-      "rstrip": false,
-      "single_word": false,
-      "special": false
-    },
-    "50264": {
-      "content": " ",
-      "lstrip": false,
-      "normalized": true,
-      "rstrip": false,
-      "single_word": false,
-      "special": false
-    },
-    "50265": {
-      "content": " ",
-      "lstrip": false,
-      "normalized": true,
-      "rstrip": false,
-      "single_word": false,
-      "special": false
-    },
-    "50266": {
-      "content": " ",
-      "lstrip": false,
-      "normalized": true,
-      "rstrip": false,
-      "single_word": false,
-      "special": false
-    },
-    "50267": {
-      "content": " ",
-      "lstrip": false,
-      "normalized": true,
-      "rstrip": false,
-      "single_word": false,
-      "special": false
-    },
-    "50268": {
-      "content": " ",
-      "lstrip": false,
-      "normalized": true,
-      "rstrip": false,
-      "single_word": false,
-      "special": false
-    },
-    "50269": {
-      "content": " ",
-      "lstrip": false,
-      "normalized": true,
-      "rstrip": false,
-      "single_word": false,
-      "special": false
-    },
-    "50270": {
-      "content": " ",
-      "lstrip": false,
-      "normalized": true,
-      "rstrip": false,
-      "single_word": false,
-      "special": false
-    },
-    "50271": {
-      "content": " ",
-      "lstrip": false,
-      "normalized": true,
-      "rstrip": false,
-      "single_word": false,
-      "special": false
-    },
-    "50272": {
-      "content": " ",
-      "lstrip": false,
-      "normalized": true,
-      "rstrip": false,
-      "single_word": false,
-      "special": false
-    },
-    "50273": {
-      "content": " ",
-      "lstrip": false,
-      "normalized": true,
-      "rstrip": false,
-      "single_word": false,
-      "special": false
-    },
-    "50274": {
-      "content": " ",
-      "lstrip": false,
-      "normalized": true,
-      "rstrip": false,
-      "single_word": false,
-      "special": false
-    },
-    "50275": {
-      "content": " ",
-      "lstrip": false,
-      "normalized": true,
-      "rstrip": false,
-      "single_word": false,
-      "special": false
+      "special": true
     },
-    "50276": {
-      "content": " ",
+    "2": {
+      "content": "</s>",
       "lstrip": false,
       "normalized": true,
       "rstrip": false,
       "single_word": false,
-      "special": false
+      "special": true
     }
   },
-  "bos_token": "<|endoftext|>",
-  "clean_up_tokenization_spaces": true,
-  "eos_token": "<|endoftext|>",
+  "bos_token": "<s>",
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "</s>",
   "model_max_length": 1024,
-  "pad_token": "<|endoftext|>",
-  "tokenizer_class": "GPTNeoXTokenizer",
-  "unk_token": "<|endoftext|>"
+  "pad_token": "</s>",
+  "sp_model_kwargs": {},
+  "tokenizer_class": "LlamaTokenizer",
+  "unk_token": "<unk>",
+  "use_default_system_prompt": false
 }
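
This completes the tokenizer half of the base-model swap: the GPT-NeoX tokenizer, with its block of multi-space added tokens (ids 50254-50276), is replaced by a plain three-special-token `LlamaTokenizer`. A quick sanity check against the new config (the repo id is a placeholder for this adapter repo):

```python
# Sanity-check sketch for the uploaded tokenizer config; repo id is a placeholder.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("TinyPixel/<this-adapter-repo>")
assert tokenizer.unk_token == "<unk>"  # id 0 in added_tokens_decoder
assert tokenizer.bos_token == "<s>"    # id 1
assert tokenizer.eos_token == "</s>"   # id 2
print(tokenizer.model_max_length)      # 1024 per this config
```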
trainer_state.json CHANGED
@@ -1,313 +1,271 @@
 {
   "best_metric": null,
   "best_model_checkpoint": null,
-  "epoch": 2.955223880597015,
+  "epoch": 2.90280777537797,
   "eval_steps": 500,
-  "global_step": 99,
+  "global_step": 84,
   "is_hyper_param_search": false,
   "is_local_process_zero": true,
   "is_world_process_zero": true,
   "log_history": [
     {
-      "epoch": 0.06,
-      "learning_rate": 4e-05,
-      "loss": 2.0227,
+      "epoch": 0.07,
+      "learning_rate": 4.4444444444444447e-05,
+      "loss": 1.9603,
       "step": 2
     },
     {
-      "epoch": 0.12,
-      "learning_rate": 8e-05,
-      "loss": 2.1777,
+      "epoch": 0.14,
+      "learning_rate": 8.888888888888889e-05,
+      "loss": 1.9408,
       "step": 4
     },
     {
-      "epoch": 0.18,
-      "learning_rate": 0.00012,
-      "loss": 1.5886,
+      "epoch": 0.21,
+      "learning_rate": 0.00013333333333333334,
+      "loss": 1.9536,
       "step": 6
     },
     {
-      "epoch": 0.24,
-      "learning_rate": 0.00016,
-      "loss": 1.7764,
+      "epoch": 0.28,
+      "learning_rate": 0.00017777777777777779,
+      "loss": 1.9137,
       "step": 8
     },
     {
-      "epoch": 0.3,
-      "learning_rate": 0.0002,
-      "loss": 1.8589,
+      "epoch": 0.35,
+      "learning_rate": 0.00019733333333333335,
+      "loss": 1.8533,
       "step": 10
     },
     {
-      "epoch": 0.36,
-      "learning_rate": 0.0001955056179775281,
-      "loss": 1.8044,
+      "epoch": 0.41,
+      "learning_rate": 0.000192,
+      "loss": 1.9176,
       "step": 12
     },
     {
-      "epoch": 0.42,
-      "learning_rate": 0.00019101123595505618,
-      "loss": 1.3535,
+      "epoch": 0.48,
+      "learning_rate": 0.0001866666666666667,
+      "loss": 1.7541,
       "step": 14
     },
     {
-      "epoch": 0.48,
-      "learning_rate": 0.00018651685393258427,
-      "loss": 1.5432,
+      "epoch": 0.55,
+      "learning_rate": 0.00018133333333333334,
+      "loss": 1.749,
       "step": 16
     },
     {
-      "epoch": 0.54,
-      "learning_rate": 0.00018202247191011236,
-      "loss": 1.4256,
+      "epoch": 0.62,
+      "learning_rate": 0.00017600000000000002,
+      "loss": 1.7884,
       "step": 18
     },
     {
-      "epoch": 0.6,
-      "learning_rate": 0.00017752808988764045,
-      "loss": 1.3982,
+      "epoch": 0.69,
+      "learning_rate": 0.00017066666666666668,
+      "loss": 1.7902,
       "step": 20
     },
     {
-      "epoch": 0.66,
-      "learning_rate": 0.00017303370786516853,
-      "loss": 1.2797,
+      "epoch": 0.76,
+      "learning_rate": 0.00016533333333333333,
+      "loss": 1.7846,
       "step": 22
     },
     {
-      "epoch": 0.72,
-      "learning_rate": 0.00016853932584269662,
-      "loss": 1.5707,
+      "epoch": 0.83,
+      "learning_rate": 0.00016,
+      "loss": 1.8141,
       "step": 24
     },
     {
-      "epoch": 0.78,
-      "learning_rate": 0.00016404494382022474,
-      "loss": 1.5817,
+      "epoch": 0.9,
+      "learning_rate": 0.00015466666666666667,
+      "loss": 1.7497,
       "step": 26
     },
     {
-      "epoch": 0.84,
-      "learning_rate": 0.0001595505617977528,
-      "loss": 1.4543,
+      "epoch": 0.97,
+      "learning_rate": 0.00014933333333333335,
+      "loss": 1.6854,
       "step": 28
     },
     {
-      "epoch": 0.9,
-      "learning_rate": 0.0001550561797752809,
-      "loss": 1.7071,
+      "epoch": 1.04,
+      "learning_rate": 0.000144,
+      "loss": 1.7584,
       "step": 30
     },
     {
-      "epoch": 0.96,
-      "learning_rate": 0.000150561797752809,
-      "loss": 1.8541,
+      "epoch": 1.11,
+      "learning_rate": 0.00013866666666666669,
+      "loss": 1.7144,
       "step": 32
     },
     {
-      "epoch": 1.01,
-      "learning_rate": 0.0001460674157303371,
-      "loss": 2.0083,
+      "epoch": 1.17,
+      "learning_rate": 0.00013333333333333334,
+      "loss": 1.8151,
       "step": 34
     },
     {
-      "epoch": 1.07,
-      "learning_rate": 0.00014157303370786517,
-      "loss": 1.5323,
+      "epoch": 1.24,
+      "learning_rate": 0.00012800000000000002,
+      "loss": 1.7134,
       "step": 36
     },
     {
-      "epoch": 1.13,
-      "learning_rate": 0.00013707865168539326,
-      "loss": 1.6096,
+      "epoch": 1.31,
+      "learning_rate": 0.00012266666666666668,
+      "loss": 1.6403,
       "step": 38
     },
     {
-      "epoch": 1.19,
-      "learning_rate": 0.00013258426966292135,
-      "loss": 1.7131,
+      "epoch": 1.38,
+      "learning_rate": 0.00011733333333333334,
+      "loss": 1.7344,
       "step": 40
     },
     {
-      "epoch": 1.25,
-      "learning_rate": 0.00012808988764044944,
-      "loss": 1.5119,
+      "epoch": 1.45,
+      "learning_rate": 0.00011200000000000001,
+      "loss": 1.5999,
       "step": 42
     },
     {
-      "epoch": 1.31,
-      "learning_rate": 0.00012359550561797752,
-      "loss": 1.1917,
+      "epoch": 1.52,
+      "learning_rate": 0.00010666666666666667,
+      "loss": 1.7983,
       "step": 44
     },
     {
-      "epoch": 1.37,
-      "learning_rate": 0.00011910112359550563,
-      "loss": 1.3534,
+      "epoch": 1.59,
+      "learning_rate": 0.00010133333333333335,
+      "loss": 1.6534,
       "step": 46
     },
     {
-      "epoch": 1.43,
-      "learning_rate": 0.0001146067415730337,
-      "loss": 1.2448,
+      "epoch": 1.66,
+      "learning_rate": 9.6e-05,
+      "loss": 1.6992,
       "step": 48
     },
     {
-      "epoch": 1.49,
-      "learning_rate": 0.0001101123595505618,
-      "loss": 1.4007,
+      "epoch": 1.73,
+      "learning_rate": 9.066666666666667e-05,
+      "loss": 1.6868,
       "step": 50
     },
     {
-      "epoch": 1.55,
-      "learning_rate": 0.00010561797752808989,
-      "loss": 1.4253,
+      "epoch": 1.8,
+      "learning_rate": 8.533333333333334e-05,
+      "loss": 1.6562,
       "step": 52
     },
     {
-      "epoch": 1.61,
-      "learning_rate": 0.00010112359550561799,
-      "loss": 1.7311,
+      "epoch": 1.87,
+      "learning_rate": 8e-05,
+      "loss": 1.796,
       "step": 54
     },
     {
-      "epoch": 1.67,
-      "learning_rate": 9.662921348314608e-05,
-      "loss": 1.0368,
+      "epoch": 1.94,
+      "learning_rate": 7.466666666666667e-05,
+      "loss": 1.7124,
       "step": 56
     },
     {
-      "epoch": 1.73,
-      "learning_rate": 9.213483146067416e-05,
-      "loss": 1.3187,
+      "epoch": 2.0,
+      "learning_rate": 6.933333333333334e-05,
+      "loss": 1.6639,
       "step": 58
     },
     {
-      "epoch": 1.79,
-      "learning_rate": 8.764044943820225e-05,
-      "loss": 1.2071,
+      "epoch": 2.07,
+      "learning_rate": 6.400000000000001e-05,
+      "loss": 1.7065,
       "step": 60
     },
     {
-      "epoch": 1.85,
-      "learning_rate": 8.314606741573034e-05,
-      "loss": 1.764,
+      "epoch": 2.14,
+      "learning_rate": 5.866666666666667e-05,
+      "loss": 1.7584,
       "step": 62
     },
     {
-      "epoch": 1.91,
-      "learning_rate": 7.865168539325843e-05,
-      "loss": 1.4301,
+      "epoch": 2.21,
+      "learning_rate": 5.333333333333333e-05,
+      "loss": 1.4923,
       "step": 64
     },
     {
-      "epoch": 1.97,
-      "learning_rate": 7.415730337078653e-05,
-      "loss": 1.586,
+      "epoch": 2.28,
+      "learning_rate": 4.8e-05,
+      "loss": 1.6243,
       "step": 66
     },
     {
-      "epoch": 2.03,
-      "learning_rate": 6.966292134831462e-05,
-      "loss": 1.5166,
+      "epoch": 2.35,
+      "learning_rate": 4.266666666666667e-05,
+      "loss": 1.6932,
       "step": 68
     },
     {
-      "epoch": 2.09,
-      "learning_rate": 6.51685393258427e-05,
-      "loss": 1.4832,
+      "epoch": 2.42,
+      "learning_rate": 3.733333333333334e-05,
+      "loss": 1.519,
       "step": 70
     },
     {
-      "epoch": 2.15,
-      "learning_rate": 6.067415730337079e-05,
-      "loss": 1.3071,
+      "epoch": 2.49,
+      "learning_rate": 3.2000000000000005e-05,
+      "loss": 1.6789,
       "step": 72
     },
     {
-      "epoch": 2.21,
-      "learning_rate": 5.6179775280898885e-05,
-      "loss": 1.3457,
+      "epoch": 2.56,
+      "learning_rate": 2.6666666666666667e-05,
+      "loss": 1.6725,
       "step": 74
     },
     {
-      "epoch": 2.27,
-      "learning_rate": 5.168539325842697e-05,
-      "loss": 1.1732,
+      "epoch": 2.63,
+      "learning_rate": 2.1333333333333335e-05,
+      "loss": 1.6314,
       "step": 76
     },
     {
-      "epoch": 2.33,
-      "learning_rate": 4.719101123595506e-05,
-      "loss": 1.2885,
+      "epoch": 2.7,
+      "learning_rate": 1.6000000000000003e-05,
+      "loss": 1.6345,
       "step": 78
     },
     {
-      "epoch": 2.39,
-      "learning_rate": 4.269662921348315e-05,
-      "loss": 1.0507,
+      "epoch": 2.76,
+      "learning_rate": 1.0666666666666667e-05,
+      "loss": 1.6166,
       "step": 80
     },
     {
-      "epoch": 2.45,
-      "learning_rate": 3.8202247191011236e-05,
-      "loss": 1.4467,
+      "epoch": 2.83,
+      "learning_rate": 5.333333333333334e-06,
+      "loss": 1.7619,
       "step": 82
     },
     {
-      "epoch": 2.51,
-      "learning_rate": 3.370786516853933e-05,
-      "loss": 1.5236,
+      "epoch": 2.9,
+      "learning_rate": 0.0,
+      "loss": 1.6937,
       "step": 84
-    },
-    {
-      "epoch": 2.57,
-      "learning_rate": 2.9213483146067417e-05,
-      "loss": 1.1199,
-      "step": 86
-    },
-    {
-      "epoch": 2.63,
-      "learning_rate": 2.4719101123595505e-05,
-      "loss": 1.4098,
-      "step": 88
-    },
-    {
-      "epoch": 2.69,
-      "learning_rate": 2.0224719101123596e-05,
-      "loss": 1.2576,
-      "step": 90
-    },
-    {
-      "epoch": 2.75,
-      "learning_rate": 1.5730337078651687e-05,
-      "loss": 1.2179,
-      "step": 92
-    },
-    {
-      "epoch": 2.81,
-      "learning_rate": 1.1235955056179776e-05,
-      "loss": 1.6813,
-      "step": 94
-    },
-    {
-      "epoch": 2.87,
-      "learning_rate": 6.741573033707865e-06,
-      "loss": 1.3637,
-      "step": 96
-    },
-    {
-      "epoch": 2.93,
-      "learning_rate": 2.247191011235955e-06,
-      "loss": 1.3635,
-      "step": 98
     }
   ],
   "logging_steps": 2,
-  "max_steps": 99,
+  "max_steps": 84,
   "num_train_epochs": 3,
   "save_steps": 500,
-  "total_flos": 2.619520731788083e+16,
+  "total_flos": 5.489048895986074e+16,
   "trial_name": null,
   "trial_params": null
 }
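
The new run's logged learning rates are consistent with a 2e-4 peak, roughly 9 warmup steps, and linear decay to zero at max_steps=84 (e.g. step 10: 2e-4 × 74/75 ≈ 1.9733e-4, exactly the logged value); these hyperparameters are inferred from the log, not read out of training_args.bin. The roughly doubled total_flos (2.62e16 → 5.49e16) likewise fits the move to a 7B base. A sketch reproducing the schedule under those assumptions:

```python
# Reproduces the logged LR schedule under inferred hyperparameters:
# peak lr 2e-4, 9 warmup steps, linear decay to 0 at max_steps=84.
def lr_at(step: int, peak: float = 2e-4, warmup: int = 9, max_steps: int = 84) -> float:
    if step < warmup:
        return peak * step / warmup
    return peak * (max_steps - step) / (max_steps - warmup)

print(lr_at(2))   # 4.444e-05  -> matches the step-2 log entry
print(lr_at(10))  # 1.9733e-04 -> matches step 10
print(lr_at(84))  # 0.0        -> matches the final entry
```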
training_args.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:667639befa7411dfffdec199ef1e6179390d304e4531677c185ca673c703bc09
-size 4027
+oid sha256:cb593eff79eb8b030f1ca704ef5454d10e923dfa13dd87fc9f5f719c09a971e5
+size 4091