learn3r committed on
Commit
a2b5bb6
1 Parent(s): f764e74

End of training

Files changed (5)
  1. README.md +3 -1
  2. all_results.json +13 -0
  3. eval_results.json +8 -0
  4. train_results.json +8 -0
  5. trainer_state.json +4262 -0
README.md CHANGED
@@ -3,6 +3,8 @@ license: apache-2.0
  base_model: google/long-t5-tglobal-xl
  tags:
  - generated_from_trainer
+ datasets:
+ - learn3r/summ_screen_memsum_oracle
  model-index:
  - name: longt5_xl_sfd_memsum_40
  results: []
@@ -13,7 +15,7 @@ should probably proofread and complete it, then remove this comment. -->

  # longt5_xl_sfd_memsum_40

- This model is a fine-tuned version of [google/long-t5-tglobal-xl](https://huggingface.co/google/long-t5-tglobal-xl) on an unknown dataset.
+ This model is a fine-tuned version of [google/long-t5-tglobal-xl](https://huggingface.co/google/long-t5-tglobal-xl) on the learn3r/summ_screen_memsum_oracle dataset.
  It achieves the following results on the evaluation set:
  - Loss: 5.2679

all_results.json ADDED
@@ -0,0 +1,13 @@
+ {
+     "epoch": 38.96,
+     "eval_loss": 5.267875671386719,
+     "eval_runtime": 14.0885,
+     "eval_samples": 338,
+     "eval_samples_per_second": 23.991,
+     "eval_steps_per_second": 3.052,
+     "train_loss": 0.32971865670822026,
+     "train_runtime": 25027.7089,
+     "train_samples": 3673,
+     "train_samples_per_second": 5.87,
+     "train_steps_per_second": 0.045
+ }
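
The throughput fields in all_results.json can be cross-checked against the raw counts and runtimes in the same file. The sketch below is only a sanity check, not the Trainer's own code. It rests on two labeled assumptions: that the run was configured for 40 epochs (inferred from the `_40` suffix in the model name, not stated in the file), and that the step count is the `global_step` of 1120 recorded in trainer_state.json. The helper name `derived_throughput` is made up for this illustration.

```python
# Values copied from all_results.json in this commit.
results = {
    "train_runtime": 25027.7089,
    "train_samples": 3673,
    "train_samples_per_second": 5.87,
    "train_steps_per_second": 0.045,
    "eval_runtime": 14.0885,
    "eval_samples": 338,
    "eval_samples_per_second": 23.991,
}

# ASSUMPTION: 40 configured epochs, inferred from the "_40" model-name suffix.
NUM_EPOCHS = 40
# From trainer_state.json in this commit ("global_step": 1120).
GLOBAL_STEP = 1120


def derived_throughput(r, num_epochs, global_step):
    """Recompute the per-second rates from counts and runtimes (hypothetical helper)."""
    return {
        "train_samples_per_second": round(
            r["train_samples"] * num_epochs / r["train_runtime"], 3
        ),
        "train_steps_per_second": round(global_step / r["train_runtime"], 3),
        "eval_samples_per_second": round(r["eval_samples"] / r["eval_runtime"], 3),
    }


derived = derived_throughput(results, NUM_EPOCHS, GLOBAL_STEP)
for key, value in derived.items():
    print(f"{key}: reported={results[key]} recomputed={value}")
```

Under these assumptions the recomputed rates agree with the reported ones to three decimal places, which suggests the training rate is based on the configured epoch count rather than the logged 38.96.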
eval_results.json ADDED
@@ -0,0 +1,8 @@
+ {
+     "epoch": 38.96,
+     "eval_loss": 5.267875671386719,
+     "eval_runtime": 14.0885,
+     "eval_samples": 338,
+     "eval_samples_per_second": 23.991,
+     "eval_steps_per_second": 3.052
+ }
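
The final `eval_loss` reported here (5.2679) is well above the `best_metric` recorded in trainer_state.json (2.2837, at checkpoint-57). The sketch below shows one way to locate the best evaluation record in a `log_history`-style list and measure that gap. The records are copied from the first four evaluations logged in this commit, plus one training record to show the filtering. The helper name `best_eval` is an assumption made for this illustration, not part of any library.

```python
# Records copied from trainer_state.json's log_history in this commit.
# Training records carry "loss"; evaluation records carry "eval_loss".
log_history = [
    {"step": 28, "eval_loss": 2.314685106277466},
    {"step": 56, "loss": 2.1298},
    {"step": 57, "eval_loss": 2.2837178707122803},
    {"step": 86, "eval_loss": 2.333530902862549},
    {"step": 115, "eval_loss": 2.4995365142822266},
]

# Final evaluation loss from eval_results.json in this commit.
FINAL_EVAL_LOSS = 5.267875671386719


def best_eval(records):
    """Return the evaluation record with the lowest eval_loss (hypothetical helper)."""
    evals = [r for r in records if "eval_loss" in r]
    return min(evals, key=lambda r: r["eval_loss"])


best = best_eval(log_history)
gap = FINAL_EVAL_LOSS - best["eval_loss"]
print(f"best eval_loss {best['eval_loss']:.4f} at step {best['step']}")
print(f"final eval_loss is higher by {gap:.4f}")
```

On the records shown, the minimum falls at step 57, matching `best_model_checkpoint`. The training loss keeps falling afterwards while the evaluation loss rises, a pattern usually read as overfitting.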
train_results.json ADDED
@@ -0,0 +1,8 @@
+ {
+     "epoch": 38.96,
+     "train_loss": 0.32971865670822026,
+     "train_runtime": 25027.7089,
+     "train_samples": 3673,
+     "train_samples_per_second": 5.87,
+     "train_steps_per_second": 0.045
+ }
trainer_state.json ADDED
@@ -0,0 +1,4262 @@
1
+ {
2
+ "best_metric": 2.2837178707122803,
3
+ "best_model_checkpoint": "/exports/eddie/scratch/s1970716/models/longt5_xl_sfd_memsum_40/checkpoint-57",
4
+ "epoch": 38.95652173913044,
5
+ "eval_steps": 500,
6
+ "global_step": 1120,
7
+ "is_hyper_param_search": false,
8
+ "is_local_process_zero": true,
9
+ "is_world_process_zero": true,
10
+ "log_history": [
11
+ {
12
+ "epoch": 0.07,
13
+ "grad_norm": 1.5414308309555054,
14
+ "learning_rate": 0.001,
15
+ "loss": 3.1508,
16
+ "step": 2
17
+ },
18
+ {
19
+ "epoch": 0.14,
20
+ "grad_norm": 1.1113317012786865,
21
+ "learning_rate": 0.001,
22
+ "loss": 3.2124,
23
+ "step": 4
24
+ },
25
+ {
26
+ "epoch": 0.21,
27
+ "grad_norm": 0.7978044152259827,
28
+ "learning_rate": 0.001,
29
+ "loss": 2.9466,
30
+ "step": 6
31
+ },
32
+ {
33
+ "epoch": 0.28,
34
+ "grad_norm": 0.6611266136169434,
35
+ "learning_rate": 0.001,
36
+ "loss": 2.8914,
37
+ "step": 8
38
+ },
39
+ {
40
+ "epoch": 0.35,
41
+ "grad_norm": 0.6737671494483948,
42
+ "learning_rate": 0.001,
43
+ "loss": 2.7507,
44
+ "step": 10
45
+ },
46
+ {
47
+ "epoch": 0.42,
48
+ "grad_norm": 0.41579362750053406,
49
+ "learning_rate": 0.001,
50
+ "loss": 2.6653,
51
+ "step": 12
52
+ },
53
+ {
54
+ "epoch": 0.49,
55
+ "grad_norm": 0.37621861696243286,
56
+ "learning_rate": 0.001,
57
+ "loss": 2.579,
58
+ "step": 14
59
+ },
60
+ {
61
+ "epoch": 0.56,
62
+ "grad_norm": 0.581123411655426,
63
+ "learning_rate": 0.001,
64
+ "loss": 2.631,
65
+ "step": 16
66
+ },
67
+ {
68
+ "epoch": 0.63,
69
+ "grad_norm": 0.6653873920440674,
70
+ "learning_rate": 0.001,
71
+ "loss": 2.6978,
72
+ "step": 18
73
+ },
74
+ {
75
+ "epoch": 0.7,
76
+ "grad_norm": 0.45471808314323425,
77
+ "learning_rate": 0.001,
78
+ "loss": 2.6541,
79
+ "step": 20
80
+ },
81
+ {
82
+ "epoch": 0.77,
83
+ "grad_norm": 0.38716837763786316,
84
+ "learning_rate": 0.001,
85
+ "loss": 2.5325,
86
+ "step": 22
87
+ },
88
+ {
89
+ "epoch": 0.83,
90
+ "grad_norm": 0.40614280104637146,
91
+ "learning_rate": 0.001,
92
+ "loss": 2.5584,
93
+ "step": 24
94
+ },
95
+ {
96
+ "epoch": 0.9,
97
+ "grad_norm": 0.3157476484775543,
98
+ "learning_rate": 0.001,
99
+ "loss": 2.5054,
100
+ "step": 26
101
+ },
102
+ {
103
+ "epoch": 0.97,
104
+ "grad_norm": 0.3763532042503357,
105
+ "learning_rate": 0.001,
106
+ "loss": 2.5238,
107
+ "step": 28
108
+ },
109
+ {
110
+ "epoch": 0.97,
111
+ "eval_loss": 2.314685106277466,
112
+ "eval_runtime": 14.2455,
113
+ "eval_samples_per_second": 23.727,
114
+ "eval_steps_per_second": 3.018,
115
+ "step": 28
116
+ },
117
+ {
118
+ "epoch": 1.04,
119
+ "grad_norm": 0.3202365040779114,
120
+ "learning_rate": 0.001,
121
+ "loss": 2.3018,
122
+ "step": 30
123
+ },
124
+ {
125
+ "epoch": 1.11,
126
+ "grad_norm": 0.4395117461681366,
127
+ "learning_rate": 0.001,
128
+ "loss": 2.1284,
129
+ "step": 32
130
+ },
131
+ {
132
+ "epoch": 1.18,
133
+ "grad_norm": 0.47938525676727295,
134
+ "learning_rate": 0.001,
135
+ "loss": 2.1782,
136
+ "step": 34
137
+ },
138
+ {
139
+ "epoch": 1.25,
140
+ "grad_norm": 1.0489932298660278,
141
+ "learning_rate": 0.001,
142
+ "loss": 2.1666,
143
+ "step": 36
144
+ },
145
+ {
146
+ "epoch": 1.32,
147
+ "grad_norm": 0.34978240728378296,
148
+ "learning_rate": 0.001,
149
+ "loss": 2.1598,
150
+ "step": 38
151
+ },
152
+ {
153
+ "epoch": 1.39,
154
+ "grad_norm": 0.28448227047920227,
155
+ "learning_rate": 0.001,
156
+ "loss": 2.023,
157
+ "step": 40
158
+ },
159
+ {
160
+ "epoch": 1.46,
161
+ "grad_norm": 0.3449482023715973,
162
+ "learning_rate": 0.001,
163
+ "loss": 2.0545,
164
+ "step": 42
165
+ },
166
+ {
167
+ "epoch": 1.53,
168
+ "grad_norm": 0.6303105354309082,
169
+ "learning_rate": 0.001,
170
+ "loss": 2.1264,
171
+ "step": 44
172
+ },
173
+ {
174
+ "epoch": 1.6,
175
+ "grad_norm": 0.3750413656234741,
176
+ "learning_rate": 0.001,
177
+ "loss": 2.1176,
178
+ "step": 46
179
+ },
180
+ {
181
+ "epoch": 1.67,
182
+ "grad_norm": 0.44775936007499695,
183
+ "learning_rate": 0.001,
184
+ "loss": 2.1624,
185
+ "step": 48
186
+ },
187
+ {
188
+ "epoch": 1.74,
189
+ "grad_norm": 0.34849441051483154,
190
+ "learning_rate": 0.001,
191
+ "loss": 2.0809,
192
+ "step": 50
193
+ },
194
+ {
195
+ "epoch": 1.81,
196
+ "grad_norm": 0.28750795125961304,
197
+ "learning_rate": 0.001,
198
+ "loss": 2.0913,
199
+ "step": 52
200
+ },
201
+ {
202
+ "epoch": 1.88,
203
+ "grad_norm": 0.2710261940956116,
204
+ "learning_rate": 0.001,
205
+ "loss": 2.0855,
206
+ "step": 54
207
+ },
208
+ {
209
+ "epoch": 1.95,
210
+ "grad_norm": 0.28888803720474243,
211
+ "learning_rate": 0.001,
212
+ "loss": 2.1298,
213
+ "step": 56
214
+ },
215
+ {
216
+ "epoch": 1.98,
217
+ "eval_loss": 2.2837178707122803,
218
+ "eval_runtime": 14.2329,
219
+ "eval_samples_per_second": 23.748,
220
+ "eval_steps_per_second": 3.021,
221
+ "step": 57
222
+ },
223
+ {
224
+ "epoch": 2.02,
225
+ "grad_norm": 0.2953917682170868,
226
+ "learning_rate": 0.001,
227
+ "loss": 1.9808,
228
+ "step": 58
229
+ },
230
+ {
231
+ "epoch": 2.09,
232
+ "grad_norm": 2.0295698642730713,
233
+ "learning_rate": 0.001,
234
+ "loss": 1.7208,
235
+ "step": 60
236
+ },
237
+ {
238
+ "epoch": 2.16,
239
+ "grad_norm": 0.4198295772075653,
240
+ "learning_rate": 0.001,
241
+ "loss": 1.7108,
242
+ "step": 62
243
+ },
244
+ {
245
+ "epoch": 2.23,
246
+ "grad_norm": 0.3806304335594177,
247
+ "learning_rate": 0.001,
248
+ "loss": 1.677,
249
+ "step": 64
250
+ },
251
+ {
252
+ "epoch": 2.3,
253
+ "grad_norm": 0.36914414167404175,
254
+ "learning_rate": 0.001,
255
+ "loss": 1.6717,
256
+ "step": 66
257
+ },
258
+ {
259
+ "epoch": 2.37,
260
+ "grad_norm": 0.34834641218185425,
261
+ "learning_rate": 0.001,
262
+ "loss": 1.6516,
263
+ "step": 68
264
+ },
265
+ {
266
+ "epoch": 2.43,
267
+ "grad_norm": 0.3333311080932617,
268
+ "learning_rate": 0.001,
269
+ "loss": 1.6677,
270
+ "step": 70
271
+ },
272
+ {
273
+ "epoch": 2.5,
274
+ "grad_norm": 0.41852736473083496,
275
+ "learning_rate": 0.001,
276
+ "loss": 1.6135,
277
+ "step": 72
278
+ },
279
+ {
280
+ "epoch": 2.57,
281
+ "grad_norm": 0.40690872073173523,
282
+ "learning_rate": 0.001,
283
+ "loss": 1.6447,
284
+ "step": 74
285
+ },
286
+ {
287
+ "epoch": 2.64,
288
+ "grad_norm": 0.4816026985645294,
289
+ "learning_rate": 0.001,
290
+ "loss": 1.7096,
291
+ "step": 76
292
+ },
293
+ {
294
+ "epoch": 2.71,
295
+ "grad_norm": 0.8278970122337341,
296
+ "learning_rate": 0.001,
297
+ "loss": 1.6721,
298
+ "step": 78
299
+ },
300
+ {
301
+ "epoch": 2.78,
302
+ "grad_norm": 0.27179405093193054,
303
+ "learning_rate": 0.001,
304
+ "loss": 1.6351,
305
+ "step": 80
306
+ },
307
+ {
308
+ "epoch": 2.85,
309
+ "grad_norm": 0.2935534417629242,
310
+ "learning_rate": 0.001,
311
+ "loss": 1.6402,
312
+ "step": 82
313
+ },
314
+ {
315
+ "epoch": 2.92,
316
+ "grad_norm": 0.3557533919811249,
317
+ "learning_rate": 0.001,
318
+ "loss": 1.6755,
319
+ "step": 84
320
+ },
321
+ {
322
+ "epoch": 2.99,
323
+ "grad_norm": 0.3663090169429779,
324
+ "learning_rate": 0.001,
325
+ "loss": 1.7525,
326
+ "step": 86
327
+ },
328
+ {
329
+ "epoch": 2.99,
330
+ "eval_loss": 2.333530902862549,
331
+ "eval_runtime": 14.2215,
332
+ "eval_samples_per_second": 23.767,
333
+ "eval_steps_per_second": 3.024,
334
+ "step": 86
335
+ },
336
+ {
337
+ "epoch": 3.06,
338
+ "grad_norm": 0.37869980931282043,
339
+ "learning_rate": 0.001,
340
+ "loss": 1.3103,
341
+ "step": 88
342
+ },
343
+ {
344
+ "epoch": 3.13,
345
+ "grad_norm": 0.32068952918052673,
346
+ "learning_rate": 0.001,
347
+ "loss": 1.2763,
348
+ "step": 90
349
+ },
350
+ {
351
+ "epoch": 3.2,
352
+ "grad_norm": 0.33284562826156616,
353
+ "learning_rate": 0.001,
354
+ "loss": 1.2235,
355
+ "step": 92
356
+ },
357
+ {
358
+ "epoch": 3.27,
359
+ "grad_norm": 0.4027257561683655,
360
+ "learning_rate": 0.001,
361
+ "loss": 1.2606,
362
+ "step": 94
363
+ },
364
+ {
365
+ "epoch": 3.34,
366
+ "grad_norm": 0.8390576839447021,
367
+ "learning_rate": 0.001,
368
+ "loss": 1.2426,
369
+ "step": 96
370
+ },
371
+ {
372
+ "epoch": 3.41,
373
+ "grad_norm": 0.3938799798488617,
374
+ "learning_rate": 0.001,
375
+ "loss": 1.2499,
376
+ "step": 98
377
+ },
378
+ {
379
+ "epoch": 3.48,
380
+ "grad_norm": 0.39702850580215454,
381
+ "learning_rate": 0.001,
382
+ "loss": 1.3165,
383
+ "step": 100
384
+ },
385
+ {
386
+ "epoch": 3.55,
387
+ "grad_norm": 0.4196929633617401,
388
+ "learning_rate": 0.001,
389
+ "loss": 1.2884,
390
+ "step": 102
391
+ },
392
+ {
393
+ "epoch": 3.62,
394
+ "grad_norm": 0.4435763657093048,
395
+ "learning_rate": 0.001,
396
+ "loss": 1.3407,
397
+ "step": 104
398
+ },
399
+ {
400
+ "epoch": 3.69,
401
+ "grad_norm": 0.3989652395248413,
402
+ "learning_rate": 0.001,
403
+ "loss": 1.3119,
404
+ "step": 106
405
+ },
406
+ {
407
+ "epoch": 3.76,
408
+ "grad_norm": 0.35742542147636414,
409
+ "learning_rate": 0.001,
410
+ "loss": 1.2882,
411
+ "step": 108
412
+ },
413
+ {
414
+ "epoch": 3.83,
415
+ "grad_norm": 0.3512905538082123,
416
+ "learning_rate": 0.001,
417
+ "loss": 1.276,
418
+ "step": 110
419
+ },
420
+ {
421
+ "epoch": 3.9,
422
+ "grad_norm": 0.35384494066238403,
423
+ "learning_rate": 0.001,
424
+ "loss": 1.31,
425
+ "step": 112
426
+ },
427
+ {
428
+ "epoch": 3.97,
429
+ "grad_norm": 0.38055282831192017,
430
+ "learning_rate": 0.001,
431
+ "loss": 1.2954,
432
+ "step": 114
433
+ },
434
+ {
435
+ "epoch": 4.0,
436
+ "eval_loss": 2.4995365142822266,
437
+ "eval_runtime": 14.2261,
438
+ "eval_samples_per_second": 23.759,
439
+ "eval_steps_per_second": 3.023,
440
+ "step": 115
441
+ },
442
+ {
443
+ "epoch": 4.03,
444
+ "grad_norm": 0.33426040410995483,
445
+ "learning_rate": 0.001,
446
+ "loss": 1.0702,
447
+ "step": 116
448
+ },
449
+ {
450
+ "epoch": 4.1,
451
+ "grad_norm": 0.3915540277957916,
452
+ "learning_rate": 0.001,
453
+ "loss": 0.93,
454
+ "step": 118
455
+ },
456
+ {
457
+ "epoch": 4.17,
458
+ "grad_norm": 0.3885715901851654,
459
+ "learning_rate": 0.001,
460
+ "loss": 0.8923,
461
+ "step": 120
462
+ },
463
+ {
464
+ "epoch": 4.24,
465
+ "grad_norm": 0.3450148105621338,
466
+ "learning_rate": 0.001,
467
+ "loss": 0.9367,
468
+ "step": 122
469
+ },
470
+ {
471
+ "epoch": 4.31,
472
+ "grad_norm": 0.34307780861854553,
473
+ "learning_rate": 0.001,
474
+ "loss": 0.9471,
475
+ "step": 124
476
+ },
477
+ {
478
+ "epoch": 4.38,
479
+ "grad_norm": 0.3358236253261566,
480
+ "learning_rate": 0.001,
481
+ "loss": 0.9597,
482
+ "step": 126
483
+ },
484
+ {
485
+ "epoch": 4.45,
486
+ "grad_norm": 0.32632964849472046,
487
+ "learning_rate": 0.001,
488
+ "loss": 0.9678,
489
+ "step": 128
490
+ },
491
+ {
492
+ "epoch": 4.52,
493
+ "grad_norm": 0.3462707996368408,
494
+ "learning_rate": 0.001,
495
+ "loss": 0.975,
496
+ "step": 130
497
+ },
498
+ {
499
+ "epoch": 4.59,
500
+ "grad_norm": 0.4402617812156677,
501
+ "learning_rate": 0.001,
502
+ "loss": 0.9908,
503
+ "step": 132
504
+ },
505
+ {
506
+ "epoch": 4.66,
507
+ "grad_norm": 0.4284449815750122,
508
+ "learning_rate": 0.001,
509
+ "loss": 0.9721,
510
+ "step": 134
511
+ },
512
+ {
513
+ "epoch": 4.73,
514
+ "grad_norm": 0.3820173144340515,
515
+ "learning_rate": 0.001,
516
+ "loss": 0.983,
517
+ "step": 136
518
+ },
519
+ {
520
+ "epoch": 4.8,
521
+ "grad_norm": 0.5288845300674438,
522
+ "learning_rate": 0.001,
523
+ "loss": 0.9376,
524
+ "step": 138
525
+ },
526
+ {
527
+ "epoch": 4.87,
528
+ "grad_norm": 0.489346981048584,
529
+ "learning_rate": 0.001,
530
+ "loss": 0.9801,
531
+ "step": 140
532
+ },
533
+ {
534
+ "epoch": 4.94,
535
+ "grad_norm": 0.4548284709453583,
536
+ "learning_rate": 0.001,
537
+ "loss": 1.0518,
538
+ "step": 142
539
+ },
540
+ {
541
+ "epoch": 4.97,
542
+ "eval_loss": 2.832628011703491,
543
+ "eval_runtime": 14.24,
544
+ "eval_samples_per_second": 23.736,
545
+ "eval_steps_per_second": 3.02,
546
+ "step": 143
547
+ },
548
+ {
549
+ "epoch": 5.01,
550
+ "grad_norm": 0.4093753397464752,
551
+ "learning_rate": 0.001,
552
+ "loss": 0.9381,
553
+ "step": 144
554
+ },
555
+ {
556
+ "epoch": 5.08,
557
+ "grad_norm": 0.5986877679824829,
558
+ "learning_rate": 0.001,
559
+ "loss": 0.6649,
560
+ "step": 146
561
+ },
562
+ {
563
+ "epoch": 5.15,
564
+ "grad_norm": 0.4789908230304718,
565
+ "learning_rate": 0.001,
566
+ "loss": 0.6925,
567
+ "step": 148
568
+ },
569
+ {
570
+ "epoch": 5.22,
571
+ "grad_norm": 0.4392305612564087,
572
+ "learning_rate": 0.001,
573
+ "loss": 0.6901,
574
+ "step": 150
575
+ },
576
+ {
577
+ "epoch": 5.29,
578
+ "grad_norm": 0.49618610739707947,
579
+ "learning_rate": 0.001,
580
+ "loss": 0.7173,
581
+ "step": 152
582
+ },
583
+ {
584
+ "epoch": 5.36,
585
+ "grad_norm": 0.513729453086853,
586
+ "learning_rate": 0.001,
587
+ "loss": 0.6951,
588
+ "step": 154
589
+ },
590
+ {
591
+ "epoch": 5.43,
592
+ "grad_norm": 0.4622480571269989,
593
+ "learning_rate": 0.001,
594
+ "loss": 0.6757,
595
+ "step": 156
596
+ },
597
+ {
598
+ "epoch": 5.5,
599
+ "grad_norm": 0.4192960560321808,
600
+ "learning_rate": 0.001,
601
+ "loss": 0.7125,
602
+ "step": 158
603
+ },
604
+ {
605
+ "epoch": 5.57,
606
+ "grad_norm": 0.3887766897678375,
607
+ "learning_rate": 0.001,
608
+ "loss": 0.6905,
609
+ "step": 160
610
+ },
611
+ {
612
+ "epoch": 5.63,
613
+ "grad_norm": 0.3633786141872406,
614
+ "learning_rate": 0.001,
615
+ "loss": 0.73,
616
+ "step": 162
617
+ },
618
+ {
619
+ "epoch": 5.7,
620
+ "grad_norm": 0.4010884761810303,
621
+ "learning_rate": 0.001,
622
+ "loss": 0.7473,
623
+ "step": 164
624
+ },
625
+ {
626
+ "epoch": 5.77,
627
+ "grad_norm": 0.4541136920452118,
628
+ "learning_rate": 0.001,
629
+ "loss": 0.7659,
630
+ "step": 166
631
+ },
632
+ {
633
+ "epoch": 5.84,
634
+ "grad_norm": 0.4243777394294739,
635
+ "learning_rate": 0.001,
636
+ "loss": 0.7742,
637
+ "step": 168
638
+ },
639
+ {
640
+ "epoch": 5.91,
641
+ "grad_norm": 0.40389642119407654,
642
+ "learning_rate": 0.001,
643
+ "loss": 0.7312,
644
+ "step": 170
645
+ },
646
+ {
647
+ "epoch": 5.98,
648
+ "grad_norm": 0.39572134613990784,
649
+ "learning_rate": 0.001,
650
+ "loss": 0.7083,
651
+ "step": 172
652
+ },
653
+ {
654
+ "epoch": 5.98,
655
+ "eval_loss": 2.9095335006713867,
656
+ "eval_runtime": 14.2201,
657
+ "eval_samples_per_second": 23.769,
658
+ "eval_steps_per_second": 3.024,
659
+ "step": 172
660
+ },
661
+ {
662
+ "epoch": 6.05,
663
+ "grad_norm": 0.34924399852752686,
664
+ "learning_rate": 0.001,
665
+ "loss": 0.5277,
666
+ "step": 174
667
+ },
668
+ {
669
+ "epoch": 6.12,
670
+ "grad_norm": 0.42162197828292847,
671
+ "learning_rate": 0.001,
672
+ "loss": 0.5141,
673
+ "step": 176
674
+ },
675
+ {
676
+ "epoch": 6.19,
677
+ "grad_norm": 0.43131959438323975,
678
+ "learning_rate": 0.001,
679
+ "loss": 0.5295,
680
+ "step": 178
681
+ },
682
+ {
683
+ "epoch": 6.26,
684
+ "grad_norm": 0.3414030075073242,
685
+ "learning_rate": 0.001,
686
+ "loss": 0.5293,
687
+ "step": 180
688
+ },
689
+ {
690
+ "epoch": 6.33,
691
+ "grad_norm": 0.2947409451007843,
692
+ "learning_rate": 0.001,
693
+ "loss": 0.5094,
694
+ "step": 182
695
+ },
696
+ {
697
+ "epoch": 6.4,
698
+ "grad_norm": 0.3493882119655609,
699
+ "learning_rate": 0.001,
700
+ "loss": 0.5103,
701
+ "step": 184
702
+ },
703
+ {
704
+ "epoch": 6.47,
705
+ "grad_norm": 0.32564929127693176,
706
+ "learning_rate": 0.001,
707
+ "loss": 0.4955,
708
+ "step": 186
709
+ },
710
+ {
711
+ "epoch": 6.54,
712
+ "grad_norm": 0.3580136299133301,
713
+ "learning_rate": 0.001,
714
+ "loss": 0.4923,
715
+ "step": 188
716
+ },
717
+ {
718
+ "epoch": 6.61,
719
+ "grad_norm": 0.40878280997276306,
720
+ "learning_rate": 0.001,
721
+ "loss": 0.5312,
722
+ "step": 190
723
+ },
724
+ {
725
+ "epoch": 6.68,
726
+ "grad_norm": 0.42687222361564636,
727
+ "learning_rate": 0.001,
728
+ "loss": 0.5977,
729
+ "step": 192
730
+ },
731
+ {
732
+ "epoch": 6.75,
733
+ "grad_norm": 0.3438556492328644,
734
+ "learning_rate": 0.001,
735
+ "loss": 0.5811,
736
+ "step": 194
737
+ },
738
+ {
739
+ "epoch": 6.82,
740
+ "grad_norm": 0.3497261703014374,
741
+ "learning_rate": 0.001,
742
+ "loss": 0.542,
743
+ "step": 196
744
+ },
745
+ {
746
+ "epoch": 6.89,
747
+ "grad_norm": 0.5538045167922974,
748
+ "learning_rate": 0.001,
749
+ "loss": 0.5392,
750
+ "step": 198
751
+ },
752
+ {
753
+ "epoch": 6.96,
754
+ "grad_norm": 0.40168508887290955,
755
+ "learning_rate": 0.001,
756
+ "loss": 0.5124,
757
+ "step": 200
758
+ },
759
+ {
760
+ "epoch": 6.99,
761
+ "eval_loss": 3.4108099937438965,
762
+ "eval_runtime": 14.2162,
763
+ "eval_samples_per_second": 23.776,
764
+ "eval_steps_per_second": 3.025,
765
+ "step": 201
766
+ },
767
+ {
768
+ "epoch": 7.03,
769
+ "grad_norm": 0.45531949400901794,
770
+ "learning_rate": 0.001,
771
+ "loss": 0.4621,
772
+ "step": 202
773
+ },
774
+ {
775
+ "epoch": 7.1,
776
+ "grad_norm": 0.420217901468277,
777
+ "learning_rate": 0.001,
778
+ "loss": 0.3526,
779
+ "step": 204
780
+ },
781
+ {
782
+ "epoch": 7.17,
783
+ "grad_norm": 0.39295145869255066,
784
+ "learning_rate": 0.001,
785
+ "loss": 0.3522,
786
+ "step": 206
787
+ },
788
+ {
789
+ "epoch": 7.23,
790
+ "grad_norm": 0.42002934217453003,
791
+ "learning_rate": 0.001,
792
+ "loss": 0.37,
793
+ "step": 208
794
+ },
795
+ {
796
+ "epoch": 7.3,
797
+ "grad_norm": 0.3919942378997803,
798
+ "learning_rate": 0.001,
799
+ "loss": 0.4123,
800
+ "step": 210
801
+ },
802
+ {
803
+ "epoch": 7.37,
804
+ "grad_norm": 0.34964632987976074,
805
+ "learning_rate": 0.001,
806
+ "loss": 0.4155,
807
+ "step": 212
808
+ },
809
+ {
810
+ "epoch": 7.44,
811
+ "grad_norm": 0.3752292990684509,
812
+ "learning_rate": 0.001,
813
+ "loss": 0.392,
814
+ "step": 214
815
+ },
816
+ {
817
+ "epoch": 7.51,
818
+ "grad_norm": 0.31345558166503906,
819
+ "learning_rate": 0.001,
820
+ "loss": 0.3973,
821
+ "step": 216
822
+ },
823
+ {
824
+ "epoch": 7.58,
825
+ "grad_norm": 0.32503190636634827,
826
+ "learning_rate": 0.001,
827
+ "loss": 0.3788,
828
+ "step": 218
829
+ },
830
+ {
831
+ "epoch": 7.65,
832
+ "grad_norm": 0.40439680218696594,
833
+ "learning_rate": 0.001,
834
+ "loss": 0.3997,
835
+ "step": 220
836
+ },
837
+ {
838
+ "epoch": 7.72,
839
+ "grad_norm": 0.5034092664718628,
840
+ "learning_rate": 0.001,
841
+ "loss": 0.3934,
842
+ "step": 222
843
+ },
844
+ {
845
+ "epoch": 7.79,
846
+ "grad_norm": 0.4480283260345459,
847
+ "learning_rate": 0.001,
848
+ "loss": 0.4095,
849
+ "step": 224
850
+ },
851
+ {
852
+ "epoch": 7.86,
853
+ "grad_norm": 0.44010230898857117,
854
+ "learning_rate": 0.001,
855
+ "loss": 0.3963,
856
+ "step": 226
857
+ },
858
+ {
859
+ "epoch": 7.93,
860
+ "grad_norm": 0.4241428077220917,
861
+ "learning_rate": 0.001,
862
+ "loss": 0.4081,
863
+ "step": 228
864
+ },
865
+ {
866
+ "epoch": 8.0,
867
+ "grad_norm": 1.4331179857254028,
868
+ "learning_rate": 0.001,
869
+ "loss": 0.4503,
870
+ "step": 230
871
+ },
872
+ {
873
+ "epoch": 8.0,
874
+ "eval_loss": 3.445927858352661,
875
+ "eval_runtime": 14.2138,
876
+ "eval_samples_per_second": 23.78,
877
+ "eval_steps_per_second": 3.025,
878
+ "step": 230
879
+ },
880
+ {
881
+ "epoch": 8.07,
882
+ "grad_norm": 0.26638248562812805,
883
+ "learning_rate": 0.001,
884
+ "loss": 0.2491,
885
+ "step": 232
886
+ },
887
+ {
888
+ "epoch": 8.14,
889
+ "grad_norm": 0.30395275354385376,
890
+ "learning_rate": 0.001,
891
+ "loss": 0.2501,
892
+ "step": 234
893
+ },
894
+ {
895
+ "epoch": 8.21,
896
+ "grad_norm": 0.36178329586982727,
897
+ "learning_rate": 0.001,
898
+ "loss": 0.2818,
899
+ "step": 236
900
+ },
901
+ {
902
+ "epoch": 8.28,
903
+ "grad_norm": 0.33713603019714355,
904
+ "learning_rate": 0.001,
905
+ "loss": 0.3005,
906
+ "step": 238
907
+ },
908
+ {
909
+ "epoch": 8.35,
910
+ "grad_norm": 0.2999133765697479,
911
+ "learning_rate": 0.001,
912
+ "loss": 0.2863,
913
+ "step": 240
914
+ },
915
+ {
916
+ "epoch": 8.42,
917
+ "grad_norm": 0.4105747938156128,
918
+ "learning_rate": 0.001,
919
+ "loss": 0.2946,
920
+ "step": 242
921
+ },
922
+ {
923
+ "epoch": 8.49,
924
+ "grad_norm": 0.3367477357387543,
925
+ "learning_rate": 0.001,
926
+ "loss": 0.3052,
927
+ "step": 244
928
+ },
929
+ {
930
+ "epoch": 8.56,
931
+ "grad_norm": 0.3373625874519348,
932
+ "learning_rate": 0.001,
933
+ "loss": 0.3166,
934
+ "step": 246
935
+ },
936
+ {
937
+ "epoch": 8.63,
938
+ "grad_norm": 0.2816782295703888,
939
+ "learning_rate": 0.001,
940
+ "loss": 0.2819,
941
+ "step": 248
942
+ },
943
+ {
944
+ "epoch": 8.7,
945
+ "grad_norm": 0.30990245938301086,
946
+ "learning_rate": 0.001,
947
+ "loss": 0.3023,
948
+ "step": 250
949
+ },
950
+ {
951
+ "epoch": 8.77,
952
+ "grad_norm": 0.317903608083725,
953
+ "learning_rate": 0.001,
954
+ "loss": 0.3128,
955
+ "step": 252
956
+ },
957
+ {
958
+ "epoch": 8.83,
959
+ "grad_norm": 0.3024023473262787,
960
+ "learning_rate": 0.001,
961
+ "loss": 0.3167,
962
+ "step": 254
963
+ },
964
+ {
965
+ "epoch": 8.9,
966
+ "grad_norm": 0.3257545828819275,
967
+ "learning_rate": 0.001,
968
+ "loss": 0.3121,
969
+ "step": 256
970
+ },
971
+ {
972
+ "epoch": 8.97,
973
+ "grad_norm": 0.34182751178741455,
974
+ "learning_rate": 0.001,
975
+ "loss": 0.3145,
976
+ "step": 258
977
+ },
978
+ {
979
+ "epoch": 8.97,
980
+ "eval_loss": 3.52162766456604,
981
+ "eval_runtime": 14.1996,
982
+ "eval_samples_per_second": 23.804,
983
+ "eval_steps_per_second": 3.028,
984
+ "step": 258
985
+ },
986
+ {
987
+ "epoch": 9.04,
988
+ "grad_norm": 0.2942521274089813,
989
+ "learning_rate": 0.001,
990
+ "loss": 0.2462,
991
+ "step": 260
992
+ },
993
+ {
994
+ "epoch": 9.11,
995
+ "grad_norm": 0.2955419719219208,
996
+ "learning_rate": 0.001,
997
+ "loss": 0.205,
998
+ "step": 262
999
+ },
1000
+ {
1001
+ "epoch": 9.18,
1002
+ "grad_norm": 0.29875773191452026,
1003
+ "learning_rate": 0.001,
1004
+ "loss": 0.2185,
1005
+ "step": 264
1006
+ },
1007
+ {
1008
+ "epoch": 9.25,
1009
+ "grad_norm": 0.9682367444038391,
1010
+ "learning_rate": 0.001,
1011
+ "loss": 0.5828,
1012
+ "step": 266
1013
+ },
1014
+ {
1015
+ "epoch": 9.32,
1016
+ "grad_norm": 0.32209742069244385,
1017
+ "learning_rate": 0.001,
1018
+ "loss": 0.2255,
1019
+ "step": 268
1020
+ },
1021
+ {
1022
+ "epoch": 9.39,
1023
+ "grad_norm": 0.2810392379760742,
1024
+ "learning_rate": 0.001,
1025
+ "loss": 0.2157,
1026
+ "step": 270
1027
+ },
1028
+ {
1029
+ "epoch": 9.46,
1030
+ "grad_norm": 0.26448503136634827,
1031
+ "learning_rate": 0.001,
1032
+ "loss": 0.2136,
1033
+ "step": 272
1034
+ },
1035
+ {
1036
+ "epoch": 9.53,
1037
+ "grad_norm": 0.2602413296699524,
1038
+ "learning_rate": 0.001,
1039
+ "loss": 0.2264,
1040
+ "step": 274
1041
+ },
1042
+ {
1043
+ "epoch": 9.6,
1044
+ "grad_norm": 0.30358317494392395,
1045
+ "learning_rate": 0.001,
1046
+ "loss": 0.2349,
1047
+ "step": 276
1048
+ },
1049
+ {
1050
+ "epoch": 9.67,
1051
+ "grad_norm": 0.2755320966243744,
1052
+ "learning_rate": 0.001,
1053
+ "loss": 0.2346,
1054
+ "step": 278
1055
+ },
1056
+ {
1057
+ "epoch": 9.74,
1058
+ "grad_norm": 0.257522314786911,
1059
+ "learning_rate": 0.001,
1060
+ "loss": 0.2312,
1061
+ "step": 280
1062
+ },
1063
+ {
1064
+ "epoch": 9.81,
1065
+ "grad_norm": 0.3185376524925232,
1066
+ "learning_rate": 0.001,
1067
+ "loss": 0.2519,
1068
+ "step": 282
1069
+ },
1070
+ {
1071
+ "epoch": 9.88,
1072
+ "grad_norm": 0.352522611618042,
1073
+ "learning_rate": 0.001,
1074
+ "loss": 0.2678,
1075
+ "step": 284
1076
+ },
1077
+ {
1078
+ "epoch": 9.95,
1079
+ "grad_norm": 0.3061072826385498,
1080
+ "learning_rate": 0.001,
1081
+ "loss": 0.2573,
1082
+ "step": 286
1083
+ },
1084
+ {
1085
+ "epoch": 9.98,
1086
+ "eval_loss": 4.012668609619141,
1087
+ "eval_runtime": 14.2083,
1088
+ "eval_samples_per_second": 23.789,
1089
+ "eval_steps_per_second": 3.026,
1090
+ "step": 287
1091
+ },
1092
+ {
1093
+ "epoch": 10.02,
1094
+ "grad_norm": 0.34469902515411377,
1095
+ "learning_rate": 0.001,
1096
+ "loss": 0.2394,
1097
+ "step": 288
1098
+ },
1099
+ {
1100
+ "epoch": 10.09,
1101
+ "grad_norm": 0.26758918166160583,
1102
+ "learning_rate": 0.001,
1103
+ "loss": 0.1749,
1104
+ "step": 290
1105
+ },
1106
+ {
1107
+ "epoch": 10.16,
1108
+ "grad_norm": 0.24024738371372223,
1109
+ "learning_rate": 0.001,
1110
+ "loss": 0.1687,
1111
+ "step": 292
1112
+ },
1113
+ {
1114
+ "epoch": 10.23,
1115
+ "grad_norm": 0.2440188080072403,
1116
+ "learning_rate": 0.001,
1117
+ "loss": 0.1634,
1118
+ "step": 294
1119
+ },
1120
+ {
1121
+ "epoch": 10.3,
1122
+ "grad_norm": 0.2564130425453186,
1123
+ "learning_rate": 0.001,
1124
+ "loss": 0.1759,
1125
+ "step": 296
1126
+ },
1127
+ {
1128
+ "epoch": 10.37,
1129
+ "grad_norm": 0.27317214012145996,
1130
+ "learning_rate": 0.001,
1131
+ "loss": 0.19,
1132
+ "step": 298
1133
+ },
1134
+ {
1135
+ "epoch": 10.43,
1136
+ "grad_norm": 0.3196158707141876,
1137
+ "learning_rate": 0.001,
1138
+ "loss": 0.1903,
1139
+ "step": 300
1140
+ },
1141
+ {
1142
+ "epoch": 10.5,
1143
+ "grad_norm": 0.3399882912635803,
1144
+ "learning_rate": 0.001,
1145
+ "loss": 0.2016,
1146
+ "step": 302
1147
+ },
1148
+ {
1149
+ "epoch": 10.57,
1150
+ "grad_norm": 0.28635892271995544,
1151
+ "learning_rate": 0.001,
1152
+ "loss": 0.189,
1153
+ "step": 304
1154
+ },
1155
+ {
1156
+ "epoch": 10.64,
1157
+ "grad_norm": 0.2602458596229553,
1158
+ "learning_rate": 0.001,
1159
+ "loss": 0.1792,
1160
+ "step": 306
1161
+ },
1162
+ {
1163
+ "epoch": 10.71,
1164
+ "grad_norm": 0.29866456985473633,
1165
+ "learning_rate": 0.001,
1166
+ "loss": 0.1974,
1167
+ "step": 308
1168
+ },
1169
+ {
1170
+ "epoch": 10.78,
1171
+ "grad_norm": 0.2757103443145752,
1172
+ "learning_rate": 0.001,
1173
+ "loss": 0.2025,
1174
+ "step": 310
1175
+ },
1176
+ {
1177
+ "epoch": 10.85,
1178
+ "grad_norm": 0.2643924355506897,
1179
+ "learning_rate": 0.001,
1180
+ "loss": 0.2106,
1181
+ "step": 312
1182
+ },
1183
+ {
1184
+ "epoch": 10.92,
1185
+ "grad_norm": 0.27770230174064636,
1186
+ "learning_rate": 0.001,
1187
+ "loss": 0.1999,
1188
+ "step": 314
1189
+ },
1190
+ {
1191
+ "epoch": 10.99,
1192
+ "grad_norm": 0.2809506356716156,
1193
+ "learning_rate": 0.001,
1194
+ "loss": 0.213,
1195
+ "step": 316
1196
+ },
1197
+ {
1198
+ "epoch": 10.99,
1199
+ "eval_loss": 3.9713540077209473,
1200
+ "eval_runtime": 14.2091,
1201
+ "eval_samples_per_second": 23.788,
1202
+ "eval_steps_per_second": 3.026,
1203
+ "step": 316
1204
+ },
1205
+ {
1206
+ "epoch": 11.06,
1207
+ "grad_norm": 0.19931960105895996,
1208
+ "learning_rate": 0.001,
1209
+ "loss": 0.1467,
1210
+ "step": 318
1211
+ },
1212
+ {
1213
+ "epoch": 11.13,
1214
+ "grad_norm": 0.20454713702201843,
1215
+ "learning_rate": 0.001,
1216
+ "loss": 0.1357,
1217
+ "step": 320
1218
+ },
1219
+ {
1220
+ "epoch": 11.2,
1221
+ "grad_norm": 0.22833150625228882,
1222
+ "learning_rate": 0.001,
1223
+ "loss": 0.1357,
1224
+ "step": 322
1225
+ },
1226
+ {
1227
+ "epoch": 11.27,
1228
+ "grad_norm": 0.45052453875541687,
1229
+ "learning_rate": 0.001,
1230
+ "loss": 0.1401,
1231
+ "step": 324
1232
+ },
1233
+ {
1234
+ "epoch": 11.34,
1235
+ "grad_norm": 0.2621672451496124,
1236
+ "learning_rate": 0.001,
1237
+ "loss": 0.1402,
1238
+ "step": 326
1239
+ },
1240
+ {
1241
+ "epoch": 11.41,
1242
+ "grad_norm": 0.2570265829563141,
1243
+ "learning_rate": 0.001,
1244
+ "loss": 0.1743,
1245
+ "step": 328
1246
+ },
1247
+ {
1248
+ "epoch": 11.48,
1249
+ "grad_norm": 0.2333604246377945,
1250
+ "learning_rate": 0.001,
1251
+ "loss": 0.1572,
1252
+ "step": 330
1253
+ },
1254
+ {
1255
+ "epoch": 11.55,
1256
+ "grad_norm": 0.2325439304113388,
1257
+ "learning_rate": 0.001,
1258
+ "loss": 0.1505,
1259
+ "step": 332
1260
+ },
1261
+ {
1262
+ "epoch": 11.62,
1263
+ "grad_norm": 0.2524455487728119,
1264
+ "learning_rate": 0.001,
1265
+ "loss": 0.1574,
1266
+ "step": 334
1267
+ },
1268
+ {
1269
+ "epoch": 11.69,
1270
+ "grad_norm": 0.2821509838104248,
1271
+ "learning_rate": 0.001,
1272
+ "loss": 0.162,
1273
+ "step": 336
1274
+ },
1275
+ {
1276
+ "epoch": 11.76,
1277
+ "grad_norm": 0.2627921402454376,
1278
+ "learning_rate": 0.001,
1279
+ "loss": 0.1823,
1280
+ "step": 338
1281
+ },
1282
+ {
1283
+ "epoch": 11.83,
1284
+ "grad_norm": 0.20858174562454224,
1285
+ "learning_rate": 0.001,
1286
+ "loss": 0.1581,
1287
+ "step": 340
1288
+ },
1289
+ {
1290
+ "epoch": 11.9,
1291
+ "grad_norm": 0.24444954097270966,
1292
+ "learning_rate": 0.001,
1293
+ "loss": 0.1528,
1294
+ "step": 342
1295
+ },
1296
+ {
1297
+ "epoch": 11.97,
1298
+ "grad_norm": 0.23985397815704346,
1299
+ "learning_rate": 0.001,
1300
+ "loss": 0.1682,
1301
+ "step": 344
1302
+ },
1303
+ {
1304
+ "epoch": 12.0,
1305
+ "eval_loss": 3.842698097229004,
1306
+ "eval_runtime": 14.2204,
1307
+ "eval_samples_per_second": 23.769,
1308
+ "eval_steps_per_second": 3.024,
1309
+ "step": 345
1310
+ },
1311
+ {
1312
+ "epoch": 12.03,
1313
+ "grad_norm": 0.20599780976772308,
1314
+ "learning_rate": 0.001,
1315
+ "loss": 0.1472,
1316
+ "step": 346
1317
+ },
1318
+ {
1319
+ "epoch": 12.1,
1320
+ "grad_norm": 0.20421436429023743,
1321
+ "learning_rate": 0.001,
1322
+ "loss": 0.1136,
1323
+ "step": 348
1324
+ },
1325
+ {
1326
+ "epoch": 12.17,
1327
+ "grad_norm": 0.21933647990226746,
1328
+ "learning_rate": 0.001,
1329
+ "loss": 0.1137,
1330
+ "step": 350
1331
+ },
1332
+ {
1333
+ "epoch": 12.24,
1334
+ "grad_norm": 0.2488287091255188,
1335
+ "learning_rate": 0.001,
1336
+ "loss": 0.1223,
1337
+ "step": 352
1338
+ },
1339
+ {
1340
+ "epoch": 12.31,
1341
+ "grad_norm": 0.21850407123565674,
1342
+ "learning_rate": 0.001,
1343
+ "loss": 0.1266,
1344
+ "step": 354
1345
+ },
1346
+ {
1347
+ "epoch": 12.38,
1348
+ "grad_norm": 0.219721257686615,
1349
+ "learning_rate": 0.001,
1350
+ "loss": 0.1266,
1351
+ "step": 356
1352
+ },
1353
+ {
1354
+ "epoch": 12.45,
1355
+ "grad_norm": 0.44860273599624634,
1356
+ "learning_rate": 0.001,
1357
+ "loss": 0.1271,
1358
+ "step": 358
1359
+ },
1360
+ {
1361
+ "epoch": 12.52,
1362
+ "grad_norm": 0.20995499193668365,
1363
+ "learning_rate": 0.001,
1364
+ "loss": 0.1201,
1365
+ "step": 360
1366
+ },
1367
+ {
1368
+ "epoch": 12.59,
1369
+ "grad_norm": 0.2180211991071701,
1370
+ "learning_rate": 0.001,
1371
+ "loss": 0.1325,
1372
+ "step": 362
1373
+ },
1374
+ {
1375
+ "epoch": 12.66,
1376
+ "grad_norm": 0.23486308753490448,
1377
+ "learning_rate": 0.001,
1378
+ "loss": 0.1316,
1379
+ "step": 364
1380
+ },
1381
+ {
1382
+ "epoch": 12.73,
1383
+ "grad_norm": 0.26361599564552307,
1384
+ "learning_rate": 0.001,
1385
+ "loss": 0.1417,
1386
+ "step": 366
1387
+ },
1388
+ {
1389
+ "epoch": 12.8,
1390
+ "grad_norm": 0.27585896849632263,
1391
+ "learning_rate": 0.001,
1392
+ "loss": 0.1514,
1393
+ "step": 368
1394
+ },
1395
+ {
1396
+ "epoch": 12.87,
1397
+ "grad_norm": 0.23249991238117218,
1398
+ "learning_rate": 0.001,
1399
+ "loss": 0.1387,
1400
+ "step": 370
1401
+ },
1402
+ {
1403
+ "epoch": 12.94,
1404
+ "grad_norm": 0.2378188967704773,
1405
+ "learning_rate": 0.001,
1406
+ "loss": 0.1396,
1407
+ "step": 372
1408
+ },
1409
+ {
1410
+ "epoch": 12.97,
1411
+ "eval_loss": 4.202455520629883,
1412
+ "eval_runtime": 14.2176,
1413
+ "eval_samples_per_second": 23.773,
1414
+ "eval_steps_per_second": 3.024,
1415
+ "step": 373
1416
+ },
1417
+ {
1418
+ "epoch": 13.01,
1419
+ "grad_norm": 0.25618231296539307,
1420
+ "learning_rate": 0.001,
1421
+ "loss": 0.1403,
1422
+ "step": 374
1423
+ },
1424
+ {
1425
+ "epoch": 13.08,
1426
+ "grad_norm": 0.19359038770198822,
1427
+ "learning_rate": 0.001,
1428
+ "loss": 0.1048,
1429
+ "step": 376
1430
+ },
1431
+ {
1432
+ "epoch": 13.15,
1433
+ "grad_norm": 0.20456379652023315,
1434
+ "learning_rate": 0.001,
1435
+ "loss": 0.102,
1436
+ "step": 378
1437
+ },
1438
+ {
1439
+ "epoch": 13.22,
1440
+ "grad_norm": 0.20489871501922607,
1441
+ "learning_rate": 0.001,
1442
+ "loss": 0.1039,
1443
+ "step": 380
1444
+ },
1445
+ {
1446
+ "epoch": 13.29,
1447
+ "grad_norm": 0.20058828592300415,
1448
+ "learning_rate": 0.001,
1449
+ "loss": 0.1066,
1450
+ "step": 382
1451
+ },
1452
+ {
1453
+ "epoch": 13.36,
1454
+ "grad_norm": 0.20939727127552032,
1455
+ "learning_rate": 0.001,
1456
+ "loss": 0.1017,
1457
+ "step": 384
1458
+ },
1459
+ {
1460
+ "epoch": 13.43,
1461
+ "grad_norm": 0.20905180275440216,
1462
+ "learning_rate": 0.001,
1463
+ "loss": 0.1074,
1464
+ "step": 386
1465
+ },
1466
+ {
1467
+ "epoch": 13.5,
1468
+ "grad_norm": 0.2082565426826477,
1469
+ "learning_rate": 0.001,
1470
+ "loss": 0.1122,
1471
+ "step": 388
1472
+ },
1473
+ {
1474
+ "epoch": 13.57,
1475
+ "grad_norm": 0.20671486854553223,
1476
+ "learning_rate": 0.001,
1477
+ "loss": 0.1137,
1478
+ "step": 390
1479
+ },
1480
+ {
1481
+ "epoch": 13.63,
1482
+ "grad_norm": 0.20714151859283447,
1483
+ "learning_rate": 0.001,
1484
+ "loss": 0.1085,
1485
+ "step": 392
1486
+ },
1487
+ {
1488
+ "epoch": 13.7,
1489
+ "grad_norm": 0.21117734909057617,
1490
+ "learning_rate": 0.001,
1491
+ "loss": 0.1225,
1492
+ "step": 394
1493
+ },
1494
+ {
1495
+ "epoch": 13.77,
1496
+ "grad_norm": 0.20374175906181335,
1497
+ "learning_rate": 0.001,
1498
+ "loss": 0.1135,
1499
+ "step": 396
1500
+ },
1501
+ {
1502
+ "epoch": 13.84,
1503
+ "grad_norm": 0.2223602831363678,
1504
+ "learning_rate": 0.001,
1505
+ "loss": 0.1188,
1506
+ "step": 398
1507
+ },
1508
+ {
1509
+ "epoch": 13.91,
1510
+ "grad_norm": 0.2405446618795395,
1511
+ "learning_rate": 0.001,
1512
+ "loss": 0.1329,
1513
+ "step": 400
1514
+ },
1515
+ {
1516
+ "epoch": 13.98,
1517
+ "grad_norm": 0.25538983941078186,
1518
+ "learning_rate": 0.001,
1519
+ "loss": 0.1363,
1520
+ "step": 402
1521
+ },
1522
+ {
1523
+ "epoch": 13.98,
1524
+ "eval_loss": 4.401235103607178,
1525
+ "eval_runtime": 14.2163,
1526
+ "eval_samples_per_second": 23.776,
1527
+ "eval_steps_per_second": 3.025,
1528
+ "step": 402
1529
+ },
1530
+ {
1531
+ "epoch": 14.05,
1532
+ "grad_norm": 0.20007029175758362,
1533
+ "learning_rate": 0.001,
1534
+ "loss": 0.0995,
1535
+ "step": 404
1536
+ },
1537
+ {
1538
+ "epoch": 14.12,
1539
+ "grad_norm": 0.17897407710552216,
1540
+ "learning_rate": 0.001,
1541
+ "loss": 0.0927,
1542
+ "step": 406
1543
+ },
1544
+ {
1545
+ "epoch": 14.19,
1546
+ "grad_norm": 0.18746937811374664,
1547
+ "learning_rate": 0.001,
1548
+ "loss": 0.0898,
1549
+ "step": 408
1550
+ },
1551
+ {
1552
+ "epoch": 14.26,
1553
+ "grad_norm": 0.1855156570672989,
1554
+ "learning_rate": 0.001,
1555
+ "loss": 0.0913,
1556
+ "step": 410
1557
+ },
1558
+ {
1559
+ "epoch": 14.33,
1560
+ "grad_norm": 0.1712156981229782,
1561
+ "learning_rate": 0.001,
1562
+ "loss": 0.0891,
1563
+ "step": 412
1564
+ },
1565
+ {
1566
+ "epoch": 14.4,
1567
+ "grad_norm": 0.193171888589859,
1568
+ "learning_rate": 0.001,
1569
+ "loss": 0.0871,
1570
+ "step": 414
1571
+ },
1572
+ {
1573
+ "epoch": 14.47,
1574
+ "grad_norm": 0.19764076173305511,
1575
+ "learning_rate": 0.001,
1576
+ "loss": 0.0892,
1577
+ "step": 416
1578
+ },
1579
+ {
1580
+ "epoch": 14.54,
1581
+ "grad_norm": 0.20630787312984467,
1582
+ "learning_rate": 0.001,
1583
+ "loss": 0.0984,
1584
+ "step": 418
1585
+ },
1586
+ {
1587
+ "epoch": 14.61,
1588
+ "grad_norm": 0.18784281611442566,
1589
+ "learning_rate": 0.001,
1590
+ "loss": 0.1002,
1591
+ "step": 420
1592
+ },
1593
+ {
1594
+ "epoch": 14.68,
1595
+ "grad_norm": 0.2134091705083847,
1596
+ "learning_rate": 0.001,
1597
+ "loss": 0.0994,
1598
+ "step": 422
1599
+ },
1600
+ {
1601
+ "epoch": 14.75,
1602
+ "grad_norm": 0.20559023320674896,
1603
+ "learning_rate": 0.001,
1604
+ "loss": 0.1036,
1605
+ "step": 424
1606
+ },
1607
+ {
1608
+ "epoch": 14.82,
1609
+ "grad_norm": 0.22063368558883667,
1610
+ "learning_rate": 0.001,
1611
+ "loss": 0.1008,
1612
+ "step": 426
1613
+ },
1614
+ {
1615
+ "epoch": 14.89,
1616
+ "grad_norm": 0.21584555506706238,
1617
+ "learning_rate": 0.001,
1618
+ "loss": 0.1137,
1619
+ "step": 428
1620
+ },
1621
+ {
1622
+ "epoch": 14.96,
1623
+ "grad_norm": 0.20185333490371704,
1624
+ "learning_rate": 0.001,
1625
+ "loss": 0.1148,
1626
+ "step": 430
1627
+ },
1628
+ {
1629
+ "epoch": 14.99,
1630
+ "eval_loss": 4.717379093170166,
1631
+ "eval_runtime": 14.2112,
1632
+ "eval_samples_per_second": 23.784,
1633
+ "eval_steps_per_second": 3.026,
1634
+ "step": 431
1635
+ },
1636
+ {
1637
+ "epoch": 15.03,
1638
+ "grad_norm": 0.22225050628185272,
1639
+ "learning_rate": 0.001,
1640
+ "loss": 0.0942,
1641
+ "step": 432
1642
+ },
1643
+ {
1644
+ "epoch": 15.1,
1645
+ "grad_norm": 0.2037811130285263,
1646
+ "learning_rate": 0.001,
1647
+ "loss": 0.0776,
1648
+ "step": 434
1649
+ },
1650
+ {
1651
+ "epoch": 15.17,
1652
+ "grad_norm": 0.1820273995399475,
1653
+ "learning_rate": 0.001,
1654
+ "loss": 0.0849,
1655
+ "step": 436
1656
+ },
1657
+ {
1658
+ "epoch": 15.23,
1659
+ "grad_norm": 0.18018700182437897,
1660
+ "learning_rate": 0.001,
1661
+ "loss": 0.0744,
1662
+ "step": 438
1663
+ },
1664
+ {
1665
+ "epoch": 15.3,
1666
+ "grad_norm": 0.21271073818206787,
1667
+ "learning_rate": 0.001,
1668
+ "loss": 0.0836,
1669
+ "step": 440
1670
+ },
1671
+ {
1672
+ "epoch": 15.37,
1673
+ "grad_norm": 0.19622232019901276,
1674
+ "learning_rate": 0.001,
1675
+ "loss": 0.0841,
1676
+ "step": 442
1677
+ },
1678
+ {
1679
+ "epoch": 15.44,
1680
+ "grad_norm": 0.19101227819919586,
1681
+ "learning_rate": 0.001,
1682
+ "loss": 0.085,
1683
+ "step": 444
1684
+ },
1685
+ {
1686
+ "epoch": 15.51,
1687
+ "grad_norm": 0.18909965455532074,
1688
+ "learning_rate": 0.001,
1689
+ "loss": 0.093,
1690
+ "step": 446
1691
+ },
1692
+ {
1693
+ "epoch": 15.58,
1694
+ "grad_norm": 0.2003660649061203,
1695
+ "learning_rate": 0.001,
1696
+ "loss": 0.0877,
1697
+ "step": 448
1698
+ },
1699
+ {
1700
+ "epoch": 15.65,
1701
+ "grad_norm": 0.2101544439792633,
1702
+ "learning_rate": 0.001,
1703
+ "loss": 0.0908,
1704
+ "step": 450
1705
+ },
1706
+ {
1707
+ "epoch": 15.72,
1708
+ "grad_norm": 0.20344135165214539,
1709
+ "learning_rate": 0.001,
1710
+ "loss": 0.0943,
1711
+ "step": 452
1712
+ },
1713
+ {
1714
+ "epoch": 15.79,
1715
+ "grad_norm": 0.28948044776916504,
1716
+ "learning_rate": 0.001,
1717
+ "loss": 0.1094,
1718
+ "step": 454
1719
+ },
1720
+ {
1721
+ "epoch": 15.86,
1722
+ "grad_norm": 0.29758819937705994,
1723
+ "learning_rate": 0.001,
1724
+ "loss": 0.1247,
1725
+ "step": 456
1726
+ },
1727
+ {
1728
+ "epoch": 15.93,
1729
+ "grad_norm": 0.24994027614593506,
1730
+ "learning_rate": 0.001,
1731
+ "loss": 0.1106,
1732
+ "step": 458
1733
+ },
1734
+ {
1735
+ "epoch": 16.0,
1736
+ "grad_norm": 0.2185268998146057,
1737
+ "learning_rate": 0.001,
1738
+ "loss": 0.0907,
1739
+ "step": 460
1740
+ },
1741
+ {
1742
+ "epoch": 16.0,
1743
+ "eval_loss": 4.497994899749756,
1744
+ "eval_runtime": 14.2114,
1745
+ "eval_samples_per_second": 23.784,
1746
+ "eval_steps_per_second": 3.026,
1747
+ "step": 460
1748
+ },
1749
+ {
1750
+ "epoch": 16.07,
1751
+ "grad_norm": 0.1781373769044876,
1752
+ "learning_rate": 0.001,
1753
+ "loss": 0.0698,
1754
+ "step": 462
1755
+ },
1756
+ {
1757
+ "epoch": 16.14,
1758
+ "grad_norm": 0.20170117914676666,
1759
+ "learning_rate": 0.001,
1760
+ "loss": 0.0763,
1761
+ "step": 464
1762
+ },
1763
+ {
1764
+ "epoch": 16.21,
1765
+ "grad_norm": 0.20545266568660736,
1766
+ "learning_rate": 0.001,
1767
+ "loss": 0.0858,
1768
+ "step": 466
1769
+ },
1770
+ {
1771
+ "epoch": 16.28,
1772
+ "grad_norm": 0.18228977918624878,
1773
+ "learning_rate": 0.001,
1774
+ "loss": 0.0802,
1775
+ "step": 468
1776
+ },
1777
+ {
1778
+ "epoch": 16.35,
1779
+ "grad_norm": 0.17187969386577606,
1780
+ "learning_rate": 0.001,
1781
+ "loss": 0.0803,
1782
+ "step": 470
1783
+ },
1784
+ {
1785
+ "epoch": 16.42,
1786
+ "grad_norm": 0.17641624808311462,
1787
+ "learning_rate": 0.001,
1788
+ "loss": 0.0796,
1789
+ "step": 472
1790
+ },
1791
+ {
1792
+ "epoch": 16.49,
1793
+ "grad_norm": 0.19069235026836395,
1794
+ "learning_rate": 0.001,
1795
+ "loss": 0.0798,
1796
+ "step": 474
1797
+ },
1798
+ {
1799
+ "epoch": 16.56,
1800
+ "grad_norm": 0.1968899667263031,
1801
+ "learning_rate": 0.001,
1802
+ "loss": 0.081,
1803
+ "step": 476
1804
+ },
1805
+ {
1806
+ "epoch": 16.63,
1807
+ "grad_norm": 0.196567103266716,
1808
+ "learning_rate": 0.001,
1809
+ "loss": 0.0891,
1810
+ "step": 478
1811
+ },
1812
+ {
1813
+ "epoch": 16.7,
1814
+ "grad_norm": 0.18010011315345764,
1815
+ "learning_rate": 0.001,
1816
+ "loss": 0.0795,
1817
+ "step": 480
1818
+ },
1819
+ {
1820
+ "epoch": 16.77,
1821
+ "grad_norm": 0.31678447127342224,
1822
+ "learning_rate": 0.001,
1823
+ "loss": 0.0819,
1824
+ "step": 482
1825
+ },
1826
+ {
1827
+ "epoch": 16.83,
1828
+ "grad_norm": 0.20116592943668365,
1829
+ "learning_rate": 0.001,
1830
+ "loss": 0.0873,
1831
+ "step": 484
1832
+ },
1833
+ {
1834
+ "epoch": 16.9,
1835
+ "grad_norm": 0.19425255060195923,
1836
+ "learning_rate": 0.001,
1837
+ "loss": 0.0898,
1838
+ "step": 486
1839
+ },
1840
+ {
1841
+ "epoch": 16.97,
1842
+ "grad_norm": 0.21240267157554626,
1843
+ "learning_rate": 0.001,
1844
+ "loss": 0.0942,
1845
+ "step": 488
1846
+ },
1847
+ {
1848
+ "epoch": 16.97,
1849
+ "eval_loss": 4.70243501663208,
1850
+ "eval_runtime": 14.2157,
1851
+ "eval_samples_per_second": 23.777,
1852
+ "eval_steps_per_second": 3.025,
1853
+ "step": 488
1854
+ },
1855
+ {
1856
+ "epoch": 17.04,
1857
+ "grad_norm": 0.16432495415210724,
1858
+ "learning_rate": 0.001,
1859
+ "loss": 0.0762,
1860
+ "step": 490
1861
+ },
1862
+ {
1863
+ "epoch": 17.11,
1864
+ "grad_norm": 0.19645223021507263,
1865
+ "learning_rate": 0.001,
1866
+ "loss": 0.0638,
1867
+ "step": 492
1868
+ },
1869
+ {
1870
+ "epoch": 17.18,
1871
+ "grad_norm": 0.1714513599872589,
1872
+ "learning_rate": 0.001,
1873
+ "loss": 0.0669,
1874
+ "step": 494
1875
+ },
1876
+ {
1877
+ "epoch": 17.25,
1878
+ "grad_norm": 0.16747209429740906,
1879
+ "learning_rate": 0.001,
1880
+ "loss": 0.064,
1881
+ "step": 496
1882
+ },
1883
+ {
1884
+ "epoch": 17.32,
1885
+ "grad_norm": 0.17507390677928925,
1886
+ "learning_rate": 0.001,
1887
+ "loss": 0.0675,
1888
+ "step": 498
1889
+ },
1890
+ {
1891
+ "epoch": 17.39,
1892
+ "grad_norm": 0.1752365380525589,
1893
+ "learning_rate": 0.001,
1894
+ "loss": 0.0716,
1895
+ "step": 500
1896
+ },
1897
+ {
1898
+ "epoch": 17.46,
1899
+ "grad_norm": 0.1760423481464386,
1900
+ "learning_rate": 0.001,
1901
+ "loss": 0.0736,
1902
+ "step": 502
1903
+ },
1904
+ {
1905
+ "epoch": 17.53,
1906
+ "grad_norm": 0.17969931662082672,
1907
+ "learning_rate": 0.001,
1908
+ "loss": 0.0725,
1909
+ "step": 504
1910
+ },
1911
+ {
1912
+ "epoch": 17.6,
1913
+ "grad_norm": 0.20572717487812042,
1914
+ "learning_rate": 0.001,
1915
+ "loss": 0.0751,
1916
+ "step": 506
1917
+ },
1918
+ {
1919
+ "epoch": 17.67,
1920
+ "grad_norm": 0.20842702686786652,
1921
+ "learning_rate": 0.001,
1922
+ "loss": 0.0796,
1923
+ "step": 508
1924
+ },
1925
+ {
1926
+ "epoch": 17.74,
1927
+ "grad_norm": 0.18587857484817505,
1928
+ "learning_rate": 0.001,
1929
+ "loss": 0.0757,
1930
+ "step": 510
1931
+ },
1932
+ {
1933
+ "epoch": 17.81,
1934
+ "grad_norm": 0.17455922067165375,
1935
+ "learning_rate": 0.001,
1936
+ "loss": 0.0792,
1937
+ "step": 512
1938
+ },
1939
+ {
1940
+ "epoch": 17.88,
1941
+ "grad_norm": 0.181496262550354,
1942
+ "learning_rate": 0.001,
1943
+ "loss": 0.0711,
1944
+ "step": 514
1945
+ },
1946
+ {
1947
+ "epoch": 17.95,
1948
+ "grad_norm": 0.1918957382440567,
1949
+ "learning_rate": 0.001,
1950
+ "loss": 0.0765,
1951
+ "step": 516
1952
+ },
1953
+ {
1954
+ "epoch": 17.98,
1955
+ "eval_loss": 4.348242282867432,
1956
+ "eval_runtime": 14.2133,
1957
+ "eval_samples_per_second": 23.78,
1958
+ "eval_steps_per_second": 3.025,
1959
+ "step": 517
1960
+ },
1961
+ {
1962
+ "epoch": 18.02,
1963
+ "grad_norm": 0.23215439915657043,
1964
+ "learning_rate": 0.001,
1965
+ "loss": 0.0854,
1966
+ "step": 518
1967
+ },
1968
+ {
1969
+ "epoch": 18.09,
1970
+ "grad_norm": 0.16362574696540833,
1971
+ "learning_rate": 0.001,
1972
+ "loss": 0.0636,
1973
+ "step": 520
1974
+ },
1975
+ {
1976
+ "epoch": 18.16,
1977
+ "grad_norm": 0.18604953587055206,
1978
+ "learning_rate": 0.001,
1979
+ "loss": 0.0632,
1980
+ "step": 522
1981
+ },
1982
+ {
1983
+ "epoch": 18.23,
1984
+ "grad_norm": 0.20401595532894135,
1985
+ "learning_rate": 0.001,
1986
+ "loss": 0.077,
1987
+ "step": 524
1988
+ },
1989
+ {
1990
+ "epoch": 18.3,
1991
+ "grad_norm": 0.1713324785232544,
1992
+ "learning_rate": 0.001,
1993
+ "loss": 0.0743,
1994
+ "step": 526
1995
+ },
1996
+ {
1997
+ "epoch": 18.37,
1998
+ "grad_norm": 0.1794094294309616,
1999
+ "learning_rate": 0.001,
2000
+ "loss": 0.0693,
2001
+ "step": 528
2002
+ },
2003
+ {
2004
+ "epoch": 18.43,
2005
+ "grad_norm": 0.1892070323228836,
2006
+ "learning_rate": 0.001,
2007
+ "loss": 0.0663,
2008
+ "step": 530
2009
+ },
2010
+ {
2011
+ "epoch": 18.5,
2012
+ "grad_norm": 0.1843532919883728,
2013
+ "learning_rate": 0.001,
2014
+ "loss": 0.0721,
2015
+ "step": 532
2016
+ },
2017
+ {
2018
+ "epoch": 18.57,
2019
+ "grad_norm": 0.17645902931690216,
2020
+ "learning_rate": 0.001,
2021
+ "loss": 0.0746,
2022
+ "step": 534
2023
+ },
2024
+ {
2025
+ "epoch": 18.64,
2026
+ "grad_norm": 0.16752366721630096,
2027
+ "learning_rate": 0.001,
2028
+ "loss": 0.0712,
2029
+ "step": 536
2030
+ },
2031
+ {
2032
+ "epoch": 18.71,
2033
+ "grad_norm": 0.16078142821788788,
2034
+ "learning_rate": 0.001,
2035
+ "loss": 0.0729,
2036
+ "step": 538
2037
+ },
2038
+ {
2039
+ "epoch": 18.78,
2040
+ "grad_norm": 0.1775059700012207,
2041
+ "learning_rate": 0.001,
2042
+ "loss": 0.0745,
2043
+ "step": 540
2044
+ },
2045
+ {
2046
+ "epoch": 18.85,
2047
+ "grad_norm": 0.18710005283355713,
2048
+ "learning_rate": 0.001,
2049
+ "loss": 0.0743,
2050
+ "step": 542
2051
+ },
2052
+ {
2053
+ "epoch": 18.92,
2054
+ "grad_norm": 0.20217396318912506,
2055
+ "learning_rate": 0.001,
2056
+ "loss": 0.0761,
2057
+ "step": 544
2058
+ },
2059
+ {
2060
+ "epoch": 18.99,
2061
+ "grad_norm": 0.19417209923267365,
2062
+ "learning_rate": 0.001,
2063
+ "loss": 0.0799,
2064
+ "step": 546
2065
+ },
2066
+ {
2067
+ "epoch": 18.99,
2068
+ "eval_loss": 4.538631916046143,
2069
+ "eval_runtime": 14.2104,
2070
+ "eval_samples_per_second": 23.785,
2071
+ "eval_steps_per_second": 3.026,
2072
+ "step": 546
2073
+ },
2074
+ {
2075
+ "epoch": 19.06,
2076
+ "grad_norm": 0.16049686074256897,
2077
+ "learning_rate": 0.001,
2078
+ "loss": 0.0633,
2079
+ "step": 548
2080
+ },
2081
+ {
2082
+ "epoch": 19.13,
2083
+ "grad_norm": 0.15315844118595123,
2084
+ "learning_rate": 0.001,
2085
+ "loss": 0.0614,
2086
+ "step": 550
2087
+ },
2088
+ {
2089
+ "epoch": 19.2,
2090
+ "grad_norm": 0.16441303491592407,
2091
+ "learning_rate": 0.001,
2092
+ "loss": 0.0658,
2093
+ "step": 552
2094
+ },
2095
+ {
2096
+ "epoch": 19.27,
2097
+ "grad_norm": 0.16347071528434753,
2098
+ "learning_rate": 0.001,
2099
+ "loss": 0.0551,
2100
+ "step": 554
2101
+ },
2102
+ {
2103
+ "epoch": 19.34,
2104
+ "grad_norm": 0.16774040460586548,
2105
+ "learning_rate": 0.001,
2106
+ "loss": 0.0698,
2107
+ "step": 556
2108
+ },
2109
+ {
2110
+ "epoch": 19.41,
2111
+ "grad_norm": 0.15695886313915253,
2112
+ "learning_rate": 0.001,
2113
+ "loss": 0.063,
2114
+ "step": 558
2115
+ },
2116
+ {
2117
+ "epoch": 19.48,
2118
+ "grad_norm": 0.1639881581068039,
2119
+ "learning_rate": 0.001,
2120
+ "loss": 0.0649,
2121
+ "step": 560
2122
+ },
2123
+ {
2124
+ "epoch": 19.55,
2125
+ "grad_norm": 0.1653515100479126,
2126
+ "learning_rate": 0.001,
2127
+ "loss": 0.0592,
2128
+ "step": 562
2129
+ },
2130
+ {
2131
+ "epoch": 19.62,
2132
+ "grad_norm": 0.1697942465543747,
2133
+ "learning_rate": 0.001,
2134
+ "loss": 0.0619,
2135
+ "step": 564
2136
+ },
2137
+ {
2138
+ "epoch": 19.69,
2139
+ "grad_norm": 0.18345074355602264,
2140
+ "learning_rate": 0.001,
2141
+ "loss": 0.0674,
2142
+ "step": 566
2143
+ },
2144
+ {
2145
+ "epoch": 19.76,
2146
+ "grad_norm": 0.17060339450836182,
2147
+ "learning_rate": 0.001,
2148
+ "loss": 0.0603,
2149
+ "step": 568
2150
+ },
2151
+ {
2152
+ "epoch": 19.83,
2153
+ "grad_norm": 0.18731752038002014,
2154
+ "learning_rate": 0.001,
2155
+ "loss": 0.0668,
2156
+ "step": 570
2157
+ },
2158
+ {
2159
+ "epoch": 19.9,
2160
+ "grad_norm": 0.1727019101381302,
2161
+ "learning_rate": 0.001,
2162
+ "loss": 0.0717,
2163
+ "step": 572
2164
+ },
2165
+ {
2166
+ "epoch": 19.97,
2167
+ "grad_norm": 0.18209876120090485,
2168
+ "learning_rate": 0.001,
2169
+ "loss": 0.073,
2170
+ "step": 574
2171
+ },
2172
+ {
2173
+ "epoch": 20.0,
2174
+ "eval_loss": 4.588906764984131,
2175
+ "eval_runtime": 14.2111,
2176
+ "eval_samples_per_second": 23.784,
2177
+ "eval_steps_per_second": 3.026,
2178
+ "step": 575
2179
+ },
2180
+ {
2181
+ "epoch": 20.03,
2182
+ "grad_norm": 0.15836042165756226,
2183
+ "learning_rate": 0.001,
2184
+ "loss": 0.0655,
2185
+ "step": 576
2186
+ },
2187
+ {
2188
+ "epoch": 20.1,
2189
+ "grad_norm": 0.1645493507385254,
2190
+ "learning_rate": 0.001,
2191
+ "loss": 0.0609,
2192
+ "step": 578
2193
+ },
2194
+ {
2195
+ "epoch": 20.17,
2196
+ "grad_norm": 0.14691436290740967,
2197
+ "learning_rate": 0.001,
2198
+ "loss": 0.0533,
2199
+ "step": 580
2200
+ },
2201
+ {
2202
+ "epoch": 20.24,
2203
+ "grad_norm": 0.15472760796546936,
2204
+ "learning_rate": 0.001,
2205
+ "loss": 0.0596,
2206
+ "step": 582
2207
+ },
2208
+ {
2209
+ "epoch": 20.31,
2210
+ "grad_norm": 0.14597873389720917,
2211
+ "learning_rate": 0.001,
2212
+ "loss": 0.0548,
2213
+ "step": 584
2214
+ },
2215
+ {
2216
+ "epoch": 20.38,
2217
+ "grad_norm": 0.14974112808704376,
2218
+ "learning_rate": 0.001,
2219
+ "loss": 0.057,
2220
+ "step": 586
2221
+ },
2222
+ {
2223
+ "epoch": 20.45,
2224
+ "grad_norm": 0.16359882056713104,
2225
+ "learning_rate": 0.001,
2226
+ "loss": 0.0597,
2227
+ "step": 588
2228
+ },
2229
+ {
2230
+ "epoch": 20.52,
2231
+ "grad_norm": 0.15623408555984497,
2232
+ "learning_rate": 0.001,
2233
+ "loss": 0.0581,
2234
+ "step": 590
2235
+ },
2236
+ {
2237
+ "epoch": 20.59,
2238
+ "grad_norm": 0.16072715818881989,
2239
+ "learning_rate": 0.001,
2240
+ "loss": 0.0577,
2241
+ "step": 592
2242
+ },
2243
+ {
2244
+ "epoch": 20.66,
2245
+ "grad_norm": 0.17041383683681488,
2246
+ "learning_rate": 0.001,
2247
+ "loss": 0.0601,
2248
+ "step": 594
2249
+ },
2250
+ {
2251
+ "epoch": 20.73,
2252
+ "grad_norm": 0.182576522231102,
2253
+ "learning_rate": 0.001,
2254
+ "loss": 0.0614,
2255
+ "step": 596
2256
+ },
2257
+ {
2258
+ "epoch": 20.8,
2259
+ "grad_norm": 0.17546887695789337,
2260
+ "learning_rate": 0.001,
2261
+ "loss": 0.0658,
2262
+ "step": 598
2263
+ },
2264
+ {
2265
+ "epoch": 20.87,
2266
+ "grad_norm": 0.19615040719509125,
2267
+ "learning_rate": 0.001,
2268
+ "loss": 0.061,
2269
+ "step": 600
2270
+ },
2271
+ {
2272
+ "epoch": 20.94,
2273
+ "grad_norm": 0.2895524799823761,
2274
+ "learning_rate": 0.001,
2275
+ "loss": 0.0825,
2276
+ "step": 602
2277
+ },
2278
+ {
2279
+ "epoch": 20.97,
2280
+ "eval_loss": 4.681673049926758,
2281
+ "eval_runtime": 14.2047,
2282
+ "eval_samples_per_second": 23.795,
2283
+ "eval_steps_per_second": 3.027,
2284
+ "step": 603
2285
+ },
2286
+ {
2287
+ "epoch": 21.01,
2288
+ "grad_norm": 0.3042687177658081,
2289
+ "learning_rate": 0.001,
2290
+ "loss": 0.102,
2291
+ "step": 604
2292
+ },
2293
+ {
2294
+ "epoch": 21.08,
2295
+ "grad_norm": 0.14991188049316406,
2296
+ "learning_rate": 0.001,
2297
+ "loss": 0.0585,
2298
+ "step": 606
2299
+ },
2300
+ {
2301
+ "epoch": 21.15,
2302
+ "grad_norm": 0.13689693808555603,
2303
+ "learning_rate": 0.001,
2304
+ "loss": 0.0546,
2305
+ "step": 608
2306
+ },
2307
+ {
2308
+ "epoch": 21.22,
2309
+ "grad_norm": 0.14480513334274292,
2310
+ "learning_rate": 0.001,
2311
+ "loss": 0.0533,
2312
+ "step": 610
2313
+ },
2314
+ {
2315
+ "epoch": 21.29,
2316
+ "grad_norm": 0.148710235953331,
2317
+ "learning_rate": 0.001,
2318
+ "loss": 0.0543,
2319
+ "step": 612
2320
+ },
2321
+ {
2322
+ "epoch": 21.36,
2323
+ "grad_norm": 0.14531894028186798,
2324
+ "learning_rate": 0.001,
2325
+ "loss": 0.0528,
2326
+ "step": 614
2327
+ },
2328
+ {
2329
+ "epoch": 21.43,
2330
+ "grad_norm": 0.14758220314979553,
2331
+ "learning_rate": 0.001,
2332
+ "loss": 0.0536,
2333
+ "step": 616
2334
+ },
2335
+ {
2336
+ "epoch": 21.5,
2337
+ "grad_norm": 0.1422356367111206,
2338
+ "learning_rate": 0.001,
2339
+ "loss": 0.0537,
2340
+ "step": 618
2341
+ },
2342
+ {
2343
+ "epoch": 21.57,
2344
+ "grad_norm": 0.16927896440029144,
2345
+ "learning_rate": 0.001,
2346
+ "loss": 0.0551,
2347
+ "step": 620
2348
+ },
2349
+ {
2350
+ "epoch": 21.63,
2351
+ "grad_norm": 0.18368251621723175,
2352
+ "learning_rate": 0.001,
2353
+ "loss": 0.0589,
2354
+ "step": 622
2355
+ },
2356
+ {
2357
+ "epoch": 21.7,
2358
+ "grad_norm": 0.16162264347076416,
2359
+ "learning_rate": 0.001,
2360
+ "loss": 0.0639,
2361
+ "step": 624
2362
+ },
2363
+ {
2364
+ "epoch": 21.77,
2365
+ "grad_norm": 0.1610013097524643,
2366
+ "learning_rate": 0.001,
2367
+ "loss": 0.0604,
2368
+ "step": 626
2369
+ },
2370
+ {
2371
+ "epoch": 21.84,
2372
+ "grad_norm": 0.15656866133213043,
2373
+ "learning_rate": 0.001,
2374
+ "loss": 0.0603,
2375
+ "step": 628
2376
+ },
2377
+ {
2378
+ "epoch": 21.91,
2379
+ "grad_norm": 0.15664397180080414,
2380
+ "learning_rate": 0.001,
2381
+ "loss": 0.0589,
2382
+ "step": 630
2383
+ },
2384
+ {
2385
+ "epoch": 21.98,
2386
+ "grad_norm": 0.1484573483467102,
2387
+ "learning_rate": 0.001,
2388
+ "loss": 0.0616,
2389
+ "step": 632
2390
+ },
2391
+ {
2392
+ "epoch": 21.98,
2393
+ "eval_loss": 5.026252746582031,
2394
+ "eval_runtime": 14.2151,
2395
+ "eval_samples_per_second": 23.778,
2396
+ "eval_steps_per_second": 3.025,
2397
+ "step": 632
2398
+ },
2399
+ {
2400
+ "epoch": 22.05,
2401
+ "grad_norm": 0.16003428399562836,
2402
+ "learning_rate": 0.001,
2403
+ "loss": 0.0579,
2404
+ "step": 634
2405
+ },
2406
+ {
2407
+ "epoch": 22.12,
2408
+ "grad_norm": 0.14746786653995514,
2409
+ "learning_rate": 0.001,
2410
+ "loss": 0.0503,
2411
+ "step": 636
2412
+ },
2413
+ {
2414
+ "epoch": 22.19,
2415
+ "grad_norm": 0.1616966724395752,
2416
+ "learning_rate": 0.001,
2417
+ "loss": 0.0518,
2418
+ "step": 638
2419
+ },
2420
+ {
2421
+ "epoch": 22.26,
2422
+ "grad_norm": 0.14375324547290802,
2423
+ "learning_rate": 0.001,
2424
+ "loss": 0.0501,
2425
+ "step": 640
2426
+ },
2427
+ {
2428
+ "epoch": 22.33,
2429
+ "grad_norm": 0.14273105561733246,
2430
+ "learning_rate": 0.001,
2431
+ "loss": 0.0543,
2432
+ "step": 642
2433
+ },
2434
+ {
2435
+ "epoch": 22.4,
2436
+ "grad_norm": 0.14281895756721497,
2437
+ "learning_rate": 0.001,
2438
+ "loss": 0.0572,
2439
+ "step": 644
2440
+ },
2441
+ {
2442
+ "epoch": 22.47,
2443
+ "grad_norm": 0.1508861482143402,
2444
+ "learning_rate": 0.001,
2445
+ "loss": 0.0538,
2446
+ "step": 646
2447
+ },
2448
+ {
2449
+ "epoch": 22.54,
2450
+ "grad_norm": 0.1677195429801941,
2451
+ "learning_rate": 0.001,
2452
+ "loss": 0.052,
2453
+ "step": 648
2454
+ },
2455
+ {
2456
+ "epoch": 22.61,
2457
+ "grad_norm": 0.16616381704807281,
2458
+ "learning_rate": 0.001,
2459
+ "loss": 0.0569,
2460
+ "step": 650
2461
+ },
2462
+ {
2463
+ "epoch": 22.68,
2464
+ "grad_norm": 0.159612774848938,
2465
+ "learning_rate": 0.001,
2466
+ "loss": 0.0585,
2467
+ "step": 652
2468
+ },
2469
+ {
2470
+ "epoch": 22.75,
2471
+ "grad_norm": 0.14692574739456177,
2472
+ "learning_rate": 0.001,
2473
+ "loss": 0.0559,
2474
+ "step": 654
2475
+ },
2476
+ {
2477
+ "epoch": 22.82,
2478
+ "grad_norm": 0.14404280483722687,
2479
+ "learning_rate": 0.001,
2480
+ "loss": 0.0614,
2481
+ "step": 656
2482
+ },
2483
+ {
2484
+ "epoch": 22.89,
2485
+ "grad_norm": 0.16935443878173828,
2486
+ "learning_rate": 0.001,
2487
+ "loss": 0.0531,
2488
+ "step": 658
2489
+ },
2490
+ {
2491
+ "epoch": 22.96,
2492
+ "grad_norm": 0.16444338858127594,
2493
+ "learning_rate": 0.001,
2494
+ "loss": 0.0677,
2495
+ "step": 660
2496
+ },
2497
+ {
2498
+ "epoch": 22.99,
2499
+ "eval_loss": 4.580361843109131,
2500
+ "eval_runtime": 14.2193,
2501
+ "eval_samples_per_second": 23.77,
2502
+ "eval_steps_per_second": 3.024,
2503
+ "step": 661
2504
+ },
2505
+ {
2506
+ "epoch": 23.03,
2507
+ "grad_norm": 0.1523355394601822,
2508
+ "learning_rate": 0.001,
2509
+ "loss": 0.056,
2510
+ "step": 662
2511
+ },
2512
+ {
2513
+ "epoch": 23.1,
2514
+ "grad_norm": 0.13197599351406097,
2515
+ "learning_rate": 0.001,
2516
+ "loss": 0.0505,
2517
+ "step": 664
2518
+ },
2519
+ {
2520
+ "epoch": 23.17,
2521
+ "grad_norm": 0.1436968594789505,
2522
+ "learning_rate": 0.001,
2523
+ "loss": 0.0459,
2524
+ "step": 666
2525
+ },
2526
+ {
2527
+ "epoch": 23.23,
2528
+ "grad_norm": 0.15447324514389038,
2529
+ "learning_rate": 0.001,
2530
+ "loss": 0.0484,
2531
+ "step": 668
2532
+ },
2533
+ {
2534
+ "epoch": 23.3,
2535
+ "grad_norm": 0.1536988466978073,
2536
+ "learning_rate": 0.001,
2537
+ "loss": 0.0523,
2538
+ "step": 670
2539
+ },
2540
+ {
2541
+ "epoch": 23.37,
2542
+ "grad_norm": 0.14964550733566284,
2543
+ "learning_rate": 0.001,
2544
+ "loss": 0.0523,
2545
+ "step": 672
2546
+ },
2547
+ {
2548
+ "epoch": 23.44,
2549
+ "grad_norm": 0.14874936640262604,
2550
+ "learning_rate": 0.001,
2551
+ "loss": 0.053,
2552
+ "step": 674
2553
+ },
2554
+ {
2555
+ "epoch": 23.51,
2556
+ "grad_norm": 0.14731401205062866,
2557
+ "learning_rate": 0.001,
2558
+ "loss": 0.0534,
2559
+ "step": 676
2560
+ },
2561
+ {
2562
+ "epoch": 23.58,
2563
+ "grad_norm": 0.15508398413658142,
2564
+ "learning_rate": 0.001,
2565
+ "loss": 0.0538,
2566
+ "step": 678
2567
+ },
2568
+ {
2569
+ "epoch": 23.65,
2570
+ "grad_norm": 0.16380004584789276,
2571
+ "learning_rate": 0.001,
2572
+ "loss": 0.0539,
2573
+ "step": 680
2574
+ },
2575
+ {
2576
+ "epoch": 23.72,
2577
+ "grad_norm": 0.38461077213287354,
2578
+ "learning_rate": 0.001,
2579
+ "loss": 0.0609,
2580
+ "step": 682
2581
+ },
2582
+ {
2583
+ "epoch": 23.79,
2584
+ "grad_norm": 0.1609809696674347,
2585
+ "learning_rate": 0.001,
2586
+ "loss": 0.0553,
2587
+ "step": 684
2588
+ },
2589
+ {
2590
+ "epoch": 23.86,
2591
+ "grad_norm": 0.1482686847448349,
2592
+ "learning_rate": 0.001,
2593
+ "loss": 0.0576,
2594
+ "step": 686
2595
+ },
2596
+ {
2597
+ "epoch": 23.93,
2598
+ "grad_norm": 0.16069763898849487,
2599
+ "learning_rate": 0.001,
2600
+ "loss": 0.0569,
2601
+ "step": 688
2602
+ },
2603
+ {
2604
+ "epoch": 24.0,
2605
+ "grad_norm": 0.16612602770328522,
2606
+ "learning_rate": 0.001,
2607
+ "loss": 0.0571,
2608
+ "step": 690
2609
+ },
2610
+ {
2611
+ "epoch": 24.0,
2612
+ "eval_loss": 4.839931488037109,
2613
+ "eval_runtime": 14.2054,
2614
+ "eval_samples_per_second": 23.794,
2615
+ "eval_steps_per_second": 3.027,
2616
+ "step": 690
2617
+ },
2618
+ {
2619
+ "epoch": 24.07,
2620
+ "grad_norm": 0.14117293059825897,
2621
+ "learning_rate": 0.001,
2622
+ "loss": 0.0473,
2623
+ "step": 692
2624
+ },
2625
+ {
2626
+ "epoch": 24.14,
2627
+ "grad_norm": 0.13660480082035065,
2628
+ "learning_rate": 0.001,
2629
+ "loss": 0.0462,
2630
+ "step": 694
2631
+ },
2632
+ {
2633
+ "epoch": 24.21,
2634
+ "grad_norm": 0.13649675250053406,
2635
+ "learning_rate": 0.001,
2636
+ "loss": 0.0436,
2637
+ "step": 696
2638
+ },
2639
+ {
2640
+ "epoch": 24.28,
2641
+ "grad_norm": 0.13140219449996948,
2642
+ "learning_rate": 0.001,
2643
+ "loss": 0.0496,
2644
+ "step": 698
2645
+ },
2646
+ {
2647
+ "epoch": 24.35,
2648
+ "grad_norm": 0.13666340708732605,
2649
+ "learning_rate": 0.001,
2650
+ "loss": 0.0453,
2651
+ "step": 700
2652
+ },
2653
+ {
2654
+ "epoch": 24.42,
2655
+ "grad_norm": 0.14786505699157715,
2656
+ "learning_rate": 0.001,
2657
+ "loss": 0.0458,
2658
+ "step": 702
2659
+ },
2660
+ {
2661
+ "epoch": 24.49,
2662
+ "grad_norm": 0.14177779853343964,
2663
+ "learning_rate": 0.001,
2664
+ "loss": 0.0454,
2665
+ "step": 704
2666
+ },
2667
+ {
2668
+ "epoch": 24.56,
2669
+ "grad_norm": 0.15088017284870148,
2670
+ "learning_rate": 0.001,
2671
+ "loss": 0.0486,
2672
+ "step": 706
2673
+ },
2674
+ {
2675
+ "epoch": 24.63,
2676
+ "grad_norm": 0.1343182623386383,
2677
+ "learning_rate": 0.001,
2678
+ "loss": 0.0466,
2679
+ "step": 708
2680
+ },
2681
+ {
2682
+ "epoch": 24.7,
2683
+ "grad_norm": 0.14765577018260956,
2684
+ "learning_rate": 0.001,
2685
+ "loss": 0.051,
2686
+ "step": 710
2687
+ },
2688
+ {
2689
+ "epoch": 24.77,
2690
+ "grad_norm": 0.1491304486989975,
2691
+ "learning_rate": 0.001,
2692
+ "loss": 0.0497,
2693
+ "step": 712
2694
+ },
2695
+ {
2696
+ "epoch": 24.83,
2697
+ "grad_norm": 0.1482744961977005,
2698
+ "learning_rate": 0.001,
2699
+ "loss": 0.0525,
2700
+ "step": 714
2701
+ },
2702
+ {
2703
+ "epoch": 24.9,
2704
+ "grad_norm": 0.15700866281986237,
2705
+ "learning_rate": 0.001,
2706
+ "loss": 0.0497,
2707
+ "step": 716
2708
+ },
2709
+ {
2710
+ "epoch": 24.97,
2711
+ "grad_norm": 0.15650177001953125,
2712
+ "learning_rate": 0.001,
2713
+ "loss": 0.0525,
2714
+ "step": 718
2715
+ },
2716
+ {
2717
+ "epoch": 24.97,
2718
+ "eval_loss": 4.934982776641846,
2719
+ "eval_runtime": 14.2072,
2720
+ "eval_samples_per_second": 23.791,
2721
+ "eval_steps_per_second": 3.027,
2722
+ "step": 718
2723
+ },
2724
+ {
2725
+ "epoch": 25.04,
2726
+ "grad_norm": 0.15254813432693481,
2727
+ "learning_rate": 0.001,
2728
+ "loss": 0.0501,
2729
+ "step": 720
2730
+ },
2731
+ {
2732
+ "epoch": 25.11,
2733
+ "grad_norm": 0.1353287696838379,
2734
+ "learning_rate": 0.001,
2735
+ "loss": 0.0471,
2736
+ "step": 722
2737
+ },
2738
+ {
2739
+ "epoch": 25.18,
2740
+ "grad_norm": 0.155950665473938,
2741
+ "learning_rate": 0.001,
2742
+ "loss": 0.0478,
2743
+ "step": 724
2744
+ },
2745
+ {
2746
+ "epoch": 25.25,
2747
+ "grad_norm": 0.14119938015937805,
2748
+ "learning_rate": 0.001,
2749
+ "loss": 0.0484,
2750
+ "step": 726
2751
+ },
2752
+ {
2753
+ "epoch": 25.32,
2754
+ "grad_norm": 0.14679110050201416,
2755
+ "learning_rate": 0.001,
2756
+ "loss": 0.0432,
2757
+ "step": 728
2758
+ },
2759
+ {
2760
+ "epoch": 25.39,
2761
+ "grad_norm": 0.13392671942710876,
2762
+ "learning_rate": 0.001,
2763
+ "loss": 0.0486,
2764
+ "step": 730
2765
+ },
2766
+ {
2767
+ "epoch": 25.46,
2768
+ "grad_norm": 0.13733159005641937,
2769
+ "learning_rate": 0.001,
2770
+ "loss": 0.0451,
2771
+ "step": 732
2772
+ },
2773
+ {
2774
+ "epoch": 25.53,
2775
+ "grad_norm": 0.13778570294380188,
2776
+ "learning_rate": 0.001,
2777
+ "loss": 0.0452,
2778
+ "step": 734
2779
+ },
2780
+ {
2781
+ "epoch": 25.6,
2782
+ "grad_norm": 0.15224093198776245,
2783
+ "learning_rate": 0.001,
2784
+ "loss": 0.0469,
2785
+ "step": 736
2786
+ },
2787
+ {
2788
+ "epoch": 25.67,
2789
+ "grad_norm": 0.15857195854187012,
2790
+ "learning_rate": 0.001,
2791
+ "loss": 0.0489,
2792
+ "step": 738
2793
+ },
2794
+ {
2795
+ "epoch": 25.74,
2796
+ "grad_norm": 0.15234865248203278,
2797
+ "learning_rate": 0.001,
2798
+ "loss": 0.0536,
2799
+ "step": 740
2800
+ },
2801
+ {
2802
+ "epoch": 25.81,
2803
+ "grad_norm": 0.17310313880443573,
2804
+ "learning_rate": 0.001,
2805
+ "loss": 0.0518,
2806
+ "step": 742
2807
+ },
2808
+ {
2809
+ "epoch": 25.88,
2810
+ "grad_norm": 0.23332546651363373,
2811
+ "learning_rate": 0.001,
2812
+ "loss": 0.0601,
2813
+ "step": 744
2814
+ },
2815
+ {
2816
+ "epoch": 25.95,
2817
+ "grad_norm": 0.22406938672065735,
2818
+ "learning_rate": 0.001,
2819
+ "loss": 0.081,
2820
+ "step": 746
2821
+ },
2822
+ {
2823
+ "epoch": 25.98,
2824
+ "eval_loss": 4.690272331237793,
2825
+ "eval_runtime": 14.2141,
2826
+ "eval_samples_per_second": 23.779,
2827
+ "eval_steps_per_second": 3.025,
2828
+ "step": 747
2829
+ },
2830
+ {
2831
+ "epoch": 26.02,
2832
+ "grad_norm": 0.14596611261367798,
2833
+ "learning_rate": 0.001,
2834
+ "loss": 0.0551,
2835
+ "step": 748
2836
+ },
2837
+ {
2838
+ "epoch": 26.09,
2839
+ "grad_norm": 0.11857102066278458,
2840
+ "learning_rate": 0.001,
2841
+ "loss": 0.0398,
2842
+ "step": 750
2843
+ },
2844
+ {
2845
+ "epoch": 26.16,
2846
+ "grad_norm": 0.13823005557060242,
2847
+ "learning_rate": 0.001,
2848
+ "loss": 0.0424,
2849
+ "step": 752
2850
+ },
2851
+ {
2852
+ "epoch": 26.23,
2853
+ "grad_norm": 0.12586718797683716,
2854
+ "learning_rate": 0.001,
2855
+ "loss": 0.0422,
2856
+ "step": 754
2857
+ },
2858
+ {
2859
+ "epoch": 26.3,
2860
+ "grad_norm": 0.12402280420064926,
2861
+ "learning_rate": 0.001,
2862
+ "loss": 0.04,
2863
+ "step": 756
2864
+ },
2865
+ {
2866
+ "epoch": 26.37,
2867
+ "grad_norm": 0.1546175628900528,
2868
+ "learning_rate": 0.001,
2869
+ "loss": 0.0434,
2870
+ "step": 758
2871
+ },
2872
+ {
2873
+ "epoch": 26.43,
2874
+ "grad_norm": 0.15831632912158966,
2875
+ "learning_rate": 0.001,
2876
+ "loss": 0.041,
2877
+ "step": 760
2878
+ },
2879
+ {
2880
+ "epoch": 26.5,
2881
+ "grad_norm": 0.1380414068698883,
2882
+ "learning_rate": 0.001,
2883
+ "loss": 0.0411,
2884
+ "step": 762
2885
+ },
2886
+ {
2887
+ "epoch": 26.57,
2888
+ "grad_norm": 0.13040709495544434,
2889
+ "learning_rate": 0.001,
2890
+ "loss": 0.0412,
2891
+ "step": 764
2892
+ },
2893
+ {
2894
+ "epoch": 26.64,
2895
+ "grad_norm": 0.14285586774349213,
2896
+ "learning_rate": 0.001,
2897
+ "loss": 0.0426,
2898
+ "step": 766
2899
+ },
2900
+ {
2901
+ "epoch": 26.71,
2902
+ "grad_norm": 0.14807988703250885,
2903
+ "learning_rate": 0.001,
2904
+ "loss": 0.045,
2905
+ "step": 768
2906
+ },
2907
+ {
2908
+ "epoch": 26.78,
2909
+ "grad_norm": 0.14204862713813782,
2910
+ "learning_rate": 0.001,
2911
+ "loss": 0.0517,
2912
+ "step": 770
2913
+ },
2914
+ {
2915
+ "epoch": 26.85,
2916
+ "grad_norm": 0.13837137818336487,
2917
+ "learning_rate": 0.001,
2918
+ "loss": 0.0452,
2919
+ "step": 772
2920
+ },
2921
+ {
2922
+ "epoch": 26.92,
2923
+ "grad_norm": 0.14084497094154358,
2924
+ "learning_rate": 0.001,
2925
+ "loss": 0.0475,
2926
+ "step": 774
2927
+ },
2928
+ {
2929
+ "epoch": 26.99,
2930
+ "grad_norm": 0.13428126275539398,
2931
+ "learning_rate": 0.001,
2932
+ "loss": 0.0505,
2933
+ "step": 776
2934
+ },
2935
+ {
2936
+ "epoch": 26.99,
2937
+ "eval_loss": 5.000490665435791,
2938
+ "eval_runtime": 14.2197,
2939
+ "eval_samples_per_second": 23.77,
2940
+ "eval_steps_per_second": 3.024,
2941
+ "step": 776
2942
+ },
2943
+ {
2944
+ "epoch": 27.06,
2945
+ "grad_norm": 0.1252642720937729,
2946
+ "learning_rate": 0.001,
2947
+ "loss": 0.0372,
2948
+ "step": 778
2949
+ },
2950
+ {
2951
+ "epoch": 27.13,
2952
+ "grad_norm": 0.1609179526567459,
2953
+ "learning_rate": 0.001,
2954
+ "loss": 0.0422,
2955
+ "step": 780
2956
+ },
2957
+ {
2958
+ "epoch": 27.2,
2959
+ "grad_norm": 0.15185533463954926,
2960
+ "learning_rate": 0.001,
2961
+ "loss": 0.0427,
2962
+ "step": 782
2963
+ },
2964
+ {
2965
+ "epoch": 27.27,
2966
+ "grad_norm": 0.1398567408323288,
2967
+ "learning_rate": 0.001,
2968
+ "loss": 0.0452,
2969
+ "step": 784
2970
+ },
2971
+ {
2972
+ "epoch": 27.34,
2973
+ "grad_norm": 0.13516364991664886,
2974
+ "learning_rate": 0.001,
2975
+ "loss": 0.0431,
2976
+ "step": 786
2977
+ },
2978
+ {
2979
+ "epoch": 27.41,
2980
+ "grad_norm": 0.1388152688741684,
2981
+ "learning_rate": 0.001,
2982
+ "loss": 0.0424,
2983
+ "step": 788
2984
+ },
2985
+ {
2986
+ "epoch": 27.48,
2987
+ "grad_norm": 0.14043325185775757,
2988
+ "learning_rate": 0.001,
2989
+ "loss": 0.0428,
2990
+ "step": 790
2991
+ },
2992
+ {
2993
+ "epoch": 27.55,
2994
+ "grad_norm": 0.14878694713115692,
2995
+ "learning_rate": 0.001,
2996
+ "loss": 0.0473,
2997
+ "step": 792
2998
+ },
2999
+ {
3000
+ "epoch": 27.62,
3001
+ "grad_norm": 0.15647174417972565,
3002
+ "learning_rate": 0.001,
3003
+ "loss": 0.0448,
3004
+ "step": 794
3005
+ },
3006
+ {
3007
+ "epoch": 27.69,
3008
+ "grad_norm": 0.1453557014465332,
3009
+ "learning_rate": 0.001,
3010
+ "loss": 0.0441,
3011
+ "step": 796
3012
+ },
3013
+ {
3014
+ "epoch": 27.76,
3015
+ "grad_norm": 0.13821879029273987,
3016
+ "learning_rate": 0.001,
3017
+ "loss": 0.0473,
3018
+ "step": 798
3019
+ },
3020
+ {
3021
+ "epoch": 27.83,
3022
+ "grad_norm": 0.1780141144990921,
3023
+ "learning_rate": 0.001,
3024
+ "loss": 0.048,
3025
+ "step": 800
3026
+ },
3027
+ {
3028
+ "epoch": 27.9,
3029
+ "grad_norm": 0.16689875721931458,
3030
+ "learning_rate": 0.001,
3031
+ "loss": 0.0568,
3032
+ "step": 802
3033
+ },
3034
+ {
3035
+ "epoch": 27.97,
3036
+ "grad_norm": 0.16479206085205078,
3037
+ "learning_rate": 0.001,
3038
+ "loss": 0.0576,
3039
+ "step": 804
3040
+ },
3041
+ {
3042
+ "epoch": 28.0,
3043
+ "eval_loss": 5.019590377807617,
3044
+ "eval_runtime": 14.2181,
3045
+ "eval_samples_per_second": 23.772,
3046
+ "eval_steps_per_second": 3.024,
3047
+ "step": 805
3048
+ },
3049
+ {
3050
+ "epoch": 28.03,
3051
+ "grad_norm": 0.17976313829421997,
3052
+ "learning_rate": 0.001,
3053
+ "loss": 0.0586,
3054
+ "step": 806
3055
+ },
3056
+ {
3057
+ "epoch": 28.1,
3058
+ "grad_norm": 0.15373440086841583,
3059
+ "learning_rate": 0.001,
3060
+ "loss": 0.0479,
3061
+ "step": 808
3062
+ },
3063
+ {
3064
+ "epoch": 28.17,
3065
+ "grad_norm": 0.13314908742904663,
3066
+ "learning_rate": 0.001,
3067
+ "loss": 0.044,
3068
+ "step": 810
3069
+ },
3070
+ {
3071
+ "epoch": 28.24,
3072
+ "grad_norm": 0.12571971118450165,
3073
+ "learning_rate": 0.001,
3074
+ "loss": 0.0382,
3075
+ "step": 812
3076
+ },
3077
+ {
3078
+ "epoch": 28.31,
3079
+ "grad_norm": 0.1362116038799286,
3080
+ "learning_rate": 0.001,
3081
+ "loss": 0.0412,
3082
+ "step": 814
3083
+ },
3084
+ {
3085
+ "epoch": 28.38,
3086
+ "grad_norm": 0.1244189664721489,
3087
+ "learning_rate": 0.001,
3088
+ "loss": 0.0422,
3089
+ "step": 816
3090
+ },
3091
+ {
3092
+ "epoch": 28.45,
3093
+ "grad_norm": 0.1321047842502594,
3094
+ "learning_rate": 0.001,
3095
+ "loss": 0.0419,
3096
+ "step": 818
3097
+ },
3098
+ {
3099
+ "epoch": 28.52,
3100
+ "grad_norm": 0.14995931088924408,
3101
+ "learning_rate": 0.001,
3102
+ "loss": 0.0453,
3103
+ "step": 820
3104
+ },
3105
+ {
3106
+ "epoch": 28.59,
3107
+ "grad_norm": 0.1319851577281952,
3108
+ "learning_rate": 0.001,
3109
+ "loss": 0.0428,
3110
+ "step": 822
3111
+ },
3112
+ {
3113
+ "epoch": 28.66,
3114
+ "grad_norm": 0.14392071962356567,
3115
+ "learning_rate": 0.001,
3116
+ "loss": 0.0417,
3117
+ "step": 824
3118
+ },
3119
+ {
3120
+ "epoch": 28.73,
3121
+ "grad_norm": 0.13848307728767395,
3122
+ "learning_rate": 0.001,
3123
+ "loss": 0.0445,
3124
+ "step": 826
3125
+ },
3126
+ {
3127
+ "epoch": 28.8,
3128
+ "grad_norm": 0.1495485007762909,
3129
+ "learning_rate": 0.001,
3130
+ "loss": 0.0451,
3131
+ "step": 828
3132
+ },
3133
+ {
3134
+ "epoch": 28.87,
3135
+ "grad_norm": 0.13798055052757263,
3136
+ "learning_rate": 0.001,
3137
+ "loss": 0.0456,
3138
+ "step": 830
3139
+ },
3140
+ {
3141
+ "epoch": 28.94,
3142
+ "grad_norm": 0.1452612578868866,
3143
+ "learning_rate": 0.001,
3144
+ "loss": 0.0448,
3145
+ "step": 832
3146
+ },
3147
+ {
3148
+ "epoch": 28.97,
3149
+ "eval_loss": 5.109987735748291,
3150
+ "eval_runtime": 14.2104,
3151
+ "eval_samples_per_second": 23.785,
3152
+ "eval_steps_per_second": 3.026,
3153
+ "step": 833
3154
+ },
3155
+ {
3156
+ "epoch": 29.01,
3157
+ "grad_norm": 0.17332157492637634,
3158
+ "learning_rate": 0.001,
3159
+ "loss": 0.0438,
3160
+ "step": 834
3161
+ },
3162
+ {
3163
+ "epoch": 29.08,
3164
+ "grad_norm": 0.12759405374526978,
3165
+ "learning_rate": 0.001,
3166
+ "loss": 0.0391,
3167
+ "step": 836
3168
+ },
3169
+ {
3170
+ "epoch": 29.15,
3171
+ "grad_norm": 0.13717012107372284,
3172
+ "learning_rate": 0.001,
3173
+ "loss": 0.0392,
3174
+ "step": 838
3175
+ },
3176
+ {
3177
+ "epoch": 29.22,
3178
+ "grad_norm": 0.12544453144073486,
3179
+ "learning_rate": 0.001,
3180
+ "loss": 0.0373,
3181
+ "step": 840
3182
+ },
3183
+ {
3184
+ "epoch": 29.29,
3185
+ "grad_norm": 0.12997865676879883,
3186
+ "learning_rate": 0.001,
3187
+ "loss": 0.0403,
3188
+ "step": 842
3189
+ },
3190
+ {
3191
+ "epoch": 29.36,
3192
+ "grad_norm": 0.1252465397119522,
3193
+ "learning_rate": 0.001,
3194
+ "loss": 0.0355,
3195
+ "step": 844
3196
+ },
3197
+ {
3198
+ "epoch": 29.43,
3199
+ "grad_norm": 0.13186001777648926,
3200
+ "learning_rate": 0.001,
3201
+ "loss": 0.046,
3202
+ "step": 846
3203
+ },
3204
+ {
3205
+ "epoch": 29.5,
3206
+ "grad_norm": 0.1309279203414917,
3207
+ "learning_rate": 0.001,
3208
+ "loss": 0.0423,
3209
+ "step": 848
3210
+ },
3211
+ {
3212
+ "epoch": 29.57,
3213
+ "grad_norm": 0.13001705706119537,
3214
+ "learning_rate": 0.001,
3215
+ "loss": 0.0401,
3216
+ "step": 850
3217
+ },
3218
+ {
3219
+ "epoch": 29.63,
3220
+ "grad_norm": 0.13039255142211914,
3221
+ "learning_rate": 0.001,
3222
+ "loss": 0.0396,
3223
+ "step": 852
3224
+ },
3225
+ {
3226
+ "epoch": 29.7,
3227
+ "grad_norm": 0.13635265827178955,
3228
+ "learning_rate": 0.001,
3229
+ "loss": 0.0426,
3230
+ "step": 854
3231
+ },
3232
+ {
3233
+ "epoch": 29.77,
3234
+ "grad_norm": 0.16096022725105286,
3235
+ "learning_rate": 0.001,
3236
+ "loss": 0.0444,
3237
+ "step": 856
3238
+ },
3239
+ {
3240
+ "epoch": 29.84,
3241
+ "grad_norm": 0.12575644254684448,
3242
+ "learning_rate": 0.001,
3243
+ "loss": 0.0427,
3244
+ "step": 858
3245
+ },
3246
+ {
3247
+ "epoch": 29.91,
3248
+ "grad_norm": 0.13646484911441803,
3249
+ "learning_rate": 0.001,
3250
+ "loss": 0.0442,
3251
+ "step": 860
3252
+ },
3253
+ {
3254
+ "epoch": 29.98,
3255
+ "grad_norm": 0.13415445387363434,
3256
+ "learning_rate": 0.001,
3257
+ "loss": 0.0457,
3258
+ "step": 862
3259
+ },
3260
+ {
3261
+ "epoch": 29.98,
3262
+ "eval_loss": 5.0008440017700195,
3263
+ "eval_runtime": 14.2056,
3264
+ "eval_samples_per_second": 23.793,
3265
+ "eval_steps_per_second": 3.027,
3266
+ "step": 862
3267
+ },
3268
+ {
3269
+ "epoch": 30.05,
3270
+ "grad_norm": 0.12284772843122482,
3271
+ "learning_rate": 0.001,
3272
+ "loss": 0.0331,
3273
+ "step": 864
3274
+ },
3275
+ {
3276
+ "epoch": 30.12,
3277
+ "grad_norm": 0.12438986450433731,
3278
+ "learning_rate": 0.001,
3279
+ "loss": 0.0372,
3280
+ "step": 866
3281
+ },
3282
+ {
3283
+ "epoch": 30.19,
3284
+ "grad_norm": 0.12020161747932434,
3285
+ "learning_rate": 0.001,
3286
+ "loss": 0.0366,
3287
+ "step": 868
3288
+ },
3289
+ {
3290
+ "epoch": 30.26,
3291
+ "grad_norm": 0.12868288159370422,
3292
+ "learning_rate": 0.001,
3293
+ "loss": 0.0384,
3294
+ "step": 870
3295
+ },
3296
+ {
3297
+ "epoch": 30.33,
3298
+ "grad_norm": 0.1276613026857376,
3299
+ "learning_rate": 0.001,
3300
+ "loss": 0.0415,
3301
+ "step": 872
3302
+ },
3303
+ {
3304
+ "epoch": 30.4,
3305
+ "grad_norm": 0.13801202178001404,
3306
+ "learning_rate": 0.001,
3307
+ "loss": 0.0403,
3308
+ "step": 874
3309
+ },
3310
+ {
3311
+ "epoch": 30.47,
3312
+ "grad_norm": 0.1174420416355133,
3313
+ "learning_rate": 0.001,
3314
+ "loss": 0.0389,
3315
+ "step": 876
3316
+ },
3317
+ {
3318
+ "epoch": 30.54,
3319
+ "grad_norm": 0.11008256673812866,
3320
+ "learning_rate": 0.001,
3321
+ "loss": 0.0342,
3322
+ "step": 878
3323
+ },
3324
+ {
3325
+ "epoch": 30.61,
3326
+ "grad_norm": 0.12188894301652908,
3327
+ "learning_rate": 0.001,
3328
+ "loss": 0.0382,
3329
+ "step": 880
3330
+ },
3331
+ {
3332
+ "epoch": 30.68,
3333
+ "grad_norm": 0.12297231703996658,
3334
+ "learning_rate": 0.001,
3335
+ "loss": 0.0382,
3336
+ "step": 882
3337
+ },
3338
+ {
3339
+ "epoch": 30.75,
3340
+ "grad_norm": 0.12277786433696747,
3341
+ "learning_rate": 0.001,
3342
+ "loss": 0.0381,
3343
+ "step": 884
3344
+ },
3345
+ {
3346
+ "epoch": 30.82,
3347
+ "grad_norm": 0.13909707963466644,
3348
+ "learning_rate": 0.001,
3349
+ "loss": 0.0385,
3350
+ "step": 886
3351
+ },
3352
+ {
3353
+ "epoch": 30.89,
3354
+ "grad_norm": 0.13425001502037048,
3355
+ "learning_rate": 0.001,
3356
+ "loss": 0.0424,
3357
+ "step": 888
3358
+ },
3359
+ {
3360
+ "epoch": 30.96,
3361
+ "grad_norm": 0.14270208775997162,
3362
+ "learning_rate": 0.001,
3363
+ "loss": 0.0442,
3364
+ "step": 890
3365
+ },
3366
+ {
3367
+ "epoch": 30.99,
3368
+ "eval_loss": 5.50925874710083,
3369
+ "eval_runtime": 14.2033,
3370
+ "eval_samples_per_second": 23.797,
3371
+ "eval_steps_per_second": 3.027,
3372
+ "step": 891
3373
+ },
3374
+ {
3375
+ "epoch": 31.03,
3376
+ "grad_norm": 0.14988122880458832,
3377
+ "learning_rate": 0.001,
3378
+ "loss": 0.0391,
3379
+ "step": 892
3380
+ },
3381
+ {
3382
+ "epoch": 31.1,
3383
+ "grad_norm": 0.12136206030845642,
3384
+ "learning_rate": 0.001,
3385
+ "loss": 0.0368,
3386
+ "step": 894
3387
+ },
3388
+ {
3389
+ "epoch": 31.17,
3390
+ "grad_norm": 0.1474607139825821,
3391
+ "learning_rate": 0.001,
3392
+ "loss": 0.0366,
3393
+ "step": 896
3394
+ },
3395
+ {
3396
+ "epoch": 31.23,
3397
+ "grad_norm": 0.1664348542690277,
3398
+ "learning_rate": 0.001,
3399
+ "loss": 0.0364,
3400
+ "step": 898
3401
+ },
3402
+ {
3403
+ "epoch": 31.3,
3404
+ "grad_norm": 0.12352883070707321,
3405
+ "learning_rate": 0.001,
3406
+ "loss": 0.0368,
3407
+ "step": 900
3408
+ },
3409
+ {
3410
+ "epoch": 31.37,
3411
+ "grad_norm": 0.1317415088415146,
3412
+ "learning_rate": 0.001,
3413
+ "loss": 0.0344,
3414
+ "step": 902
3415
+ },
3416
+ {
3417
+ "epoch": 31.44,
3418
+ "grad_norm": 0.12790362536907196,
3419
+ "learning_rate": 0.001,
3420
+ "loss": 0.034,
3421
+ "step": 904
3422
+ },
3423
+ {
3424
+ "epoch": 31.51,
3425
+ "grad_norm": 0.13639821112155914,
3426
+ "learning_rate": 0.001,
3427
+ "loss": 0.0385,
3428
+ "step": 906
3429
+ },
3430
+ {
3431
+ "epoch": 31.58,
3432
+ "grad_norm": 0.14075782895088196,
3433
+ "learning_rate": 0.001,
3434
+ "loss": 0.0414,
3435
+ "step": 908
3436
+ },
3437
+ {
3438
+ "epoch": 31.65,
3439
+ "grad_norm": 0.1388426125049591,
3440
+ "learning_rate": 0.001,
3441
+ "loss": 0.0423,
3442
+ "step": 910
3443
+ },
3444
+ {
3445
+ "epoch": 31.72,
3446
+ "grad_norm": 0.1304369866847992,
3447
+ "learning_rate": 0.001,
3448
+ "loss": 0.0418,
3449
+ "step": 912
3450
+ },
3451
+ {
3452
+ "epoch": 31.79,
3453
+ "grad_norm": 0.13760504126548767,
3454
+ "learning_rate": 0.001,
3455
+ "loss": 0.0438,
3456
+ "step": 914
3457
+ },
3458
+ {
3459
+ "epoch": 31.86,
3460
+ "grad_norm": 0.13388106226921082,
3461
+ "learning_rate": 0.001,
3462
+ "loss": 0.0442,
3463
+ "step": 916
3464
+ },
3465
+ {
3466
+ "epoch": 31.93,
3467
+ "grad_norm": 0.13234779238700867,
3468
+ "learning_rate": 0.001,
3469
+ "loss": 0.0403,
3470
+ "step": 918
3471
+ },
3472
+ {
3473
+ "epoch": 32.0,
3474
+ "grad_norm": 0.12893211841583252,
3475
+ "learning_rate": 0.001,
3476
+ "loss": 0.0391,
3477
+ "step": 920
3478
+ },
3479
+ {
3480
+ "epoch": 32.0,
3481
+ "eval_loss": 5.429612159729004,
3482
+ "eval_runtime": 14.2128,
3483
+ "eval_samples_per_second": 23.781,
3484
+ "eval_steps_per_second": 3.025,
3485
+ "step": 920
3486
+ },
3487
+ {
3488
+ "epoch": 32.07,
3489
+ "grad_norm": 0.12455160915851593,
3490
+ "learning_rate": 0.001,
3491
+ "loss": 0.0343,
3492
+ "step": 922
3493
+ },
3494
+ {
3495
+ "epoch": 32.14,
3496
+ "grad_norm": 0.13458769023418427,
3497
+ "learning_rate": 0.001,
3498
+ "loss": 0.0358,
3499
+ "step": 924
3500
+ },
3501
+ {
3502
+ "epoch": 32.21,
3503
+ "grad_norm": 0.11662837117910385,
3504
+ "learning_rate": 0.001,
3505
+ "loss": 0.0359,
3506
+ "step": 926
3507
+ },
3508
+ {
3509
+ "epoch": 32.28,
3510
+ "grad_norm": 0.1250811070203781,
3511
+ "learning_rate": 0.001,
3512
+ "loss": 0.0366,
3513
+ "step": 928
3514
+ },
3515
+ {
3516
+ "epoch": 32.35,
3517
+ "grad_norm": 0.1264219582080841,
3518
+ "learning_rate": 0.001,
3519
+ "loss": 0.0374,
3520
+ "step": 930
3521
+ },
3522
+ {
3523
+ "epoch": 32.42,
3524
+ "grad_norm": 0.12272246927022934,
3525
+ "learning_rate": 0.001,
3526
+ "loss": 0.0378,
3527
+ "step": 932
3528
+ },
3529
+ {
3530
+ "epoch": 32.49,
3531
+ "grad_norm": 0.12346846610307693,
3532
+ "learning_rate": 0.001,
3533
+ "loss": 0.0387,
3534
+ "step": 934
3535
+ },
3536
+ {
3537
+ "epoch": 32.56,
3538
+ "grad_norm": 0.12550891935825348,
3539
+ "learning_rate": 0.001,
3540
+ "loss": 0.0355,
3541
+ "step": 936
3542
+ },
3543
+ {
3544
+ "epoch": 32.63,
3545
+ "grad_norm": 0.13145896792411804,
3546
+ "learning_rate": 0.001,
3547
+ "loss": 0.0371,
3548
+ "step": 938
3549
+ },
3550
+ {
3551
+ "epoch": 32.7,
3552
+ "grad_norm": 0.12549975514411926,
3553
+ "learning_rate": 0.001,
3554
+ "loss": 0.038,
3555
+ "step": 940
3556
+ },
3557
+ {
3558
+ "epoch": 32.77,
3559
+ "grad_norm": 0.13180097937583923,
3560
+ "learning_rate": 0.001,
3561
+ "loss": 0.0361,
3562
+ "step": 942
3563
+ },
3564
+ {
3565
+ "epoch": 32.83,
3566
+ "grad_norm": 0.1240687221288681,
3567
+ "learning_rate": 0.001,
3568
+ "loss": 0.0364,
3569
+ "step": 944
3570
+ },
3571
+ {
3572
+ "epoch": 32.9,
3573
+ "grad_norm": 0.14110788702964783,
3574
+ "learning_rate": 0.001,
3575
+ "loss": 0.0421,
3576
+ "step": 946
3577
+ },
3578
+ {
3579
+ "epoch": 32.97,
3580
+ "grad_norm": 0.14410488307476044,
3581
+ "learning_rate": 0.001,
3582
+ "loss": 0.0392,
3583
+ "step": 948
3584
+ },
3585
+ {
3586
+ "epoch": 32.97,
3587
+ "eval_loss": 5.23565673828125,
3588
+ "eval_runtime": 14.2111,
3589
+ "eval_samples_per_second": 23.784,
3590
+ "eval_steps_per_second": 3.026,
3591
+ "step": 948
3592
+ },
3593
+ {
3594
+ "epoch": 33.04,
3595
+ "grad_norm": 0.12806980311870575,
3596
+ "learning_rate": 0.001,
3597
+ "loss": 0.041,
3598
+ "step": 950
3599
+ },
3600
+ {
3601
+ "epoch": 33.11,
3602
+ "grad_norm": 0.12071588635444641,
3603
+ "learning_rate": 0.001,
3604
+ "loss": 0.0401,
3605
+ "step": 952
3606
+ },
3607
+ {
3608
+ "epoch": 33.18,
3609
+ "grad_norm": 0.11720431596040726,
3610
+ "learning_rate": 0.001,
3611
+ "loss": 0.035,
3612
+ "step": 954
3613
+ },
3614
+ {
3615
+ "epoch": 33.25,
3616
+ "grad_norm": 0.12089748680591583,
3617
+ "learning_rate": 0.001,
3618
+ "loss": 0.0389,
3619
+ "step": 956
3620
+ },
3621
+ {
3622
+ "epoch": 33.32,
3623
+ "grad_norm": 0.1227785050868988,
3624
+ "learning_rate": 0.001,
3625
+ "loss": 0.0389,
3626
+ "step": 958
3627
+ },
3628
+ {
3629
+ "epoch": 33.39,
3630
+ "grad_norm": 0.12859608232975006,
3631
+ "learning_rate": 0.001,
3632
+ "loss": 0.0342,
3633
+ "step": 960
3634
+ },
3635
+ {
3636
+ "epoch": 33.46,
3637
+ "grad_norm": 0.2180158495903015,
3638
+ "learning_rate": 0.001,
3639
+ "loss": 0.041,
3640
+ "step": 962
3641
+ },
3642
+ {
3643
+ "epoch": 33.53,
3644
+ "grad_norm": 0.2960141897201538,
3645
+ "learning_rate": 0.001,
3646
+ "loss": 0.0989,
3647
+ "step": 964
3648
+ },
3649
+ {
3650
+ "epoch": 33.6,
3651
+ "grad_norm": 0.15140074491500854,
3652
+ "learning_rate": 0.001,
3653
+ "loss": 0.0503,
3654
+ "step": 966
3655
+ },
3656
+ {
3657
+ "epoch": 33.67,
3658
+ "grad_norm": 0.13876180350780487,
3659
+ "learning_rate": 0.001,
3660
+ "loss": 0.0418,
3661
+ "step": 968
3662
+ },
3663
+ {
3664
+ "epoch": 33.74,
3665
+ "grad_norm": 0.1290924996137619,
3666
+ "learning_rate": 0.001,
3667
+ "loss": 0.0413,
3668
+ "step": 970
3669
+ },
3670
+ {
3671
+ "epoch": 33.81,
3672
+ "grad_norm": 0.13152842223644257,
3673
+ "learning_rate": 0.001,
3674
+ "loss": 0.0398,
3675
+ "step": 972
3676
+ },
3677
+ {
3678
+ "epoch": 33.88,
3679
+ "grad_norm": 0.1308111995458603,
3680
+ "learning_rate": 0.001,
3681
+ "loss": 0.0384,
3682
+ "step": 974
3683
+ },
3684
+ {
3685
+ "epoch": 33.95,
3686
+ "grad_norm": 0.12206979840993881,
3687
+ "learning_rate": 0.001,
3688
+ "loss": 0.0376,
3689
+ "step": 976
3690
+ },
3691
+ {
3692
+ "epoch": 33.98,
3693
+ "eval_loss": 5.226629734039307,
3694
+ "eval_runtime": 14.2113,
3695
+ "eval_samples_per_second": 23.784,
3696
+ "eval_steps_per_second": 3.026,
3697
+ "step": 977
3698
+ },
3699
+ {
3700
+ "epoch": 34.02,
3701
+ "grad_norm": 0.15338914096355438,
3702
+ "learning_rate": 0.001,
3703
+ "loss": 0.036,
3704
+ "step": 978
3705
+ },
3706
+ {
3707
+ "epoch": 34.09,
3708
+ "grad_norm": 0.12300385534763336,
3709
+ "learning_rate": 0.001,
3710
+ "loss": 0.0357,
3711
+ "step": 980
3712
+ },
3713
+ {
3714
+ "epoch": 34.16,
3715
+ "grad_norm": 0.10930658131837845,
3716
+ "learning_rate": 0.001,
3717
+ "loss": 0.03,
3718
+ "step": 982
3719
+ },
3720
+ {
3721
+ "epoch": 34.23,
3722
+ "grad_norm": 0.12387796491384506,
3723
+ "learning_rate": 0.001,
3724
+ "loss": 0.0393,
3725
+ "step": 984
3726
+ },
3727
+ {
3728
+ "epoch": 34.3,
3729
+ "grad_norm": 0.1442791372537613,
3730
+ "learning_rate": 0.001,
3731
+ "loss": 0.0327,
3732
+ "step": 986
3733
+ },
3734
+ {
3735
+ "epoch": 34.37,
3736
+ "grad_norm": 0.12000349164009094,
3737
+ "learning_rate": 0.001,
3738
+ "loss": 0.0356,
3739
+ "step": 988
3740
+ },
3741
+ {
3742
+ "epoch": 34.43,
3743
+ "grad_norm": 0.12804967164993286,
3744
+ "learning_rate": 0.001,
3745
+ "loss": 0.0415,
3746
+ "step": 990
3747
+ },
3748
+ {
3749
+ "epoch": 34.5,
3750
+ "grad_norm": 0.12344588339328766,
3751
+ "learning_rate": 0.001,
3752
+ "loss": 0.0348,
3753
+ "step": 992
3754
+ },
3755
+ {
3756
+ "epoch": 34.57,
3757
+ "grad_norm": 0.12841206789016724,
3758
+ "learning_rate": 0.001,
3759
+ "loss": 0.036,
3760
+ "step": 994
3761
+ },
3762
+ {
3763
+ "epoch": 34.64,
3764
+ "grad_norm": 0.11577708274126053,
3765
+ "learning_rate": 0.001,
3766
+ "loss": 0.0349,
3767
+ "step": 996
3768
+ },
3769
+ {
3770
+ "epoch": 34.71,
3771
+ "grad_norm": 0.12154724448919296,
3772
+ "learning_rate": 0.001,
3773
+ "loss": 0.0384,
3774
+ "step": 998
3775
+ },
3776
+ {
3777
+ "epoch": 34.78,
3778
+ "grad_norm": 0.12926706671714783,
3779
+ "learning_rate": 0.001,
3780
+ "loss": 0.0376,
3781
+ "step": 1000
3782
+ },
3783
+ {
3784
+ "epoch": 34.85,
3785
+ "grad_norm": 0.11031629890203476,
3786
+ "learning_rate": 0.001,
3787
+ "loss": 0.0341,
3788
+ "step": 1002
3789
+ },
3790
+ {
3791
+ "epoch": 34.92,
3792
+ "grad_norm": 0.13376009464263916,
3793
+ "learning_rate": 0.001,
3794
+ "loss": 0.0352,
3795
+ "step": 1004
3796
+ },
3797
+ {
3798
+ "epoch": 34.99,
3799
+ "grad_norm": 0.1240614578127861,
3800
+ "learning_rate": 0.001,
3801
+ "loss": 0.0381,
3802
+ "step": 1006
3803
+ },
3804
+ {
3805
+ "epoch": 34.99,
3806
+ "eval_loss": 5.263000965118408,
3807
+ "eval_runtime": 14.2061,
3808
+ "eval_samples_per_second": 23.793,
3809
+ "eval_steps_per_second": 3.027,
3810
+ "step": 1006
3811
+ },
+ {
+ "epoch": 35.06,
+ "grad_norm": 0.12797795236110687,
+ "learning_rate": 0.001,
+ "loss": 0.0339,
+ "step": 1008
+ },
+ {
+ "epoch": 35.13,
+ "grad_norm": 0.13305290043354034,
+ "learning_rate": 0.001,
+ "loss": 0.0333,
+ "step": 1010
+ },
+ {
+ "epoch": 35.2,
+ "grad_norm": 0.12507498264312744,
+ "learning_rate": 0.001,
+ "loss": 0.0316,
+ "step": 1012
+ },
+ {
+ "epoch": 35.27,
+ "grad_norm": 0.1163344755768776,
+ "learning_rate": 0.001,
+ "loss": 0.0323,
+ "step": 1014
+ },
+ {
+ "epoch": 35.34,
+ "grad_norm": 0.11232735961675644,
+ "learning_rate": 0.001,
+ "loss": 0.035,
+ "step": 1016
+ },
+ {
+ "epoch": 35.41,
+ "grad_norm": 0.11937879770994186,
+ "learning_rate": 0.001,
+ "loss": 0.0343,
+ "step": 1018
+ },
+ {
+ "epoch": 35.48,
+ "grad_norm": 0.1227354183793068,
+ "learning_rate": 0.001,
+ "loss": 0.0316,
+ "step": 1020
+ },
+ {
+ "epoch": 35.55,
+ "grad_norm": 0.1201673150062561,
+ "learning_rate": 0.001,
+ "loss": 0.0345,
+ "step": 1022
+ },
+ {
+ "epoch": 35.62,
+ "grad_norm": 0.11625051498413086,
+ "learning_rate": 0.001,
+ "loss": 0.0305,
+ "step": 1024
+ },
+ {
+ "epoch": 35.69,
+ "grad_norm": 0.1144721657037735,
+ "learning_rate": 0.001,
+ "loss": 0.03,
+ "step": 1026
+ },
+ {
+ "epoch": 35.76,
+ "grad_norm": 0.12341687083244324,
+ "learning_rate": 0.001,
+ "loss": 0.0374,
+ "step": 1028
+ },
+ {
+ "epoch": 35.83,
+ "grad_norm": 0.12577305734157562,
+ "learning_rate": 0.001,
+ "loss": 0.0347,
+ "step": 1030
+ },
+ {
+ "epoch": 35.9,
+ "grad_norm": 0.11321111768484116,
+ "learning_rate": 0.001,
+ "loss": 0.0336,
+ "step": 1032
+ },
+ {
+ "epoch": 35.97,
+ "grad_norm": 0.11875444650650024,
+ "learning_rate": 0.001,
+ "loss": 0.0339,
+ "step": 1034
+ },
+ {
+ "epoch": 36.0,
+ "eval_loss": 5.353184700012207,
+ "eval_runtime": 14.1981,
+ "eval_samples_per_second": 23.806,
+ "eval_steps_per_second": 3.029,
+ "step": 1035
+ },
+ {
+ "epoch": 36.03,
+ "grad_norm": 0.122185118496418,
+ "learning_rate": 0.001,
+ "loss": 0.0285,
+ "step": 1036
+ },
+ {
+ "epoch": 36.1,
+ "grad_norm": 0.1075131967663765,
+ "learning_rate": 0.001,
+ "loss": 0.0312,
+ "step": 1038
+ },
+ {
+ "epoch": 36.17,
+ "grad_norm": 0.12375488132238388,
+ "learning_rate": 0.001,
+ "loss": 0.0291,
+ "step": 1040
+ },
+ {
+ "epoch": 36.24,
+ "grad_norm": 0.12728258967399597,
+ "learning_rate": 0.001,
+ "loss": 0.0333,
+ "step": 1042
+ },
+ {
+ "epoch": 36.31,
+ "grad_norm": 0.1169193834066391,
+ "learning_rate": 0.001,
+ "loss": 0.0309,
+ "step": 1044
+ },
+ {
+ "epoch": 36.38,
+ "grad_norm": 0.11048964411020279,
+ "learning_rate": 0.001,
+ "loss": 0.0299,
+ "step": 1046
+ },
+ {
+ "epoch": 36.45,
+ "grad_norm": 0.11923568695783615,
+ "learning_rate": 0.001,
+ "loss": 0.0301,
+ "step": 1048
+ },
+ {
+ "epoch": 36.52,
+ "grad_norm": 0.1160750761628151,
+ "learning_rate": 0.001,
+ "loss": 0.0287,
+ "step": 1050
+ },
+ {
+ "epoch": 36.59,
+ "grad_norm": 0.12535539269447327,
+ "learning_rate": 0.001,
+ "loss": 0.0335,
+ "step": 1052
+ },
+ {
+ "epoch": 36.66,
+ "grad_norm": 0.13712206482887268,
+ "learning_rate": 0.001,
+ "loss": 0.0358,
+ "step": 1054
+ },
+ {
+ "epoch": 36.73,
+ "grad_norm": 0.12085587531328201,
+ "learning_rate": 0.001,
+ "loss": 0.0362,
+ "step": 1056
+ },
+ {
+ "epoch": 36.8,
+ "grad_norm": 0.1010272428393364,
+ "learning_rate": 0.001,
+ "loss": 0.0324,
+ "step": 1058
+ },
+ {
+ "epoch": 36.87,
+ "grad_norm": 0.11820247769355774,
+ "learning_rate": 0.001,
+ "loss": 0.0315,
+ "step": 1060
+ },
+ {
+ "epoch": 36.94,
+ "grad_norm": 0.1259121298789978,
+ "learning_rate": 0.001,
+ "loss": 0.0377,
+ "step": 1062
+ },
+ {
+ "epoch": 36.97,
+ "eval_loss": 5.44431734085083,
+ "eval_runtime": 14.2065,
+ "eval_samples_per_second": 23.792,
+ "eval_steps_per_second": 3.027,
+ "step": 1063
+ },
+ {
+ "epoch": 37.01,
+ "grad_norm": 0.12725719809532166,
+ "learning_rate": 0.001,
+ "loss": 0.0277,
+ "step": 1064
+ },
+ {
+ "epoch": 37.08,
+ "grad_norm": 0.11410412192344666,
+ "learning_rate": 0.001,
+ "loss": 0.0282,
+ "step": 1066
+ },
+ {
+ "epoch": 37.15,
+ "grad_norm": 0.11205872148275375,
+ "learning_rate": 0.001,
+ "loss": 0.0302,
+ "step": 1068
+ },
+ {
+ "epoch": 37.22,
+ "grad_norm": 0.09593572467565536,
+ "learning_rate": 0.001,
+ "loss": 0.0263,
+ "step": 1070
+ },
+ {
+ "epoch": 37.29,
+ "grad_norm": 0.12462112307548523,
+ "learning_rate": 0.001,
+ "loss": 0.0328,
+ "step": 1072
+ },
+ {
+ "epoch": 37.36,
+ "grad_norm": 0.11873652786016464,
+ "learning_rate": 0.001,
+ "loss": 0.0328,
+ "step": 1074
+ },
+ {
+ "epoch": 37.43,
+ "grad_norm": 0.1105048656463623,
+ "learning_rate": 0.001,
+ "loss": 0.0298,
+ "step": 1076
+ },
+ {
+ "epoch": 37.5,
+ "grad_norm": 0.10692736506462097,
+ "learning_rate": 0.001,
+ "loss": 0.0328,
+ "step": 1078
+ },
+ {
+ "epoch": 37.57,
+ "grad_norm": 0.12755931913852692,
+ "learning_rate": 0.001,
+ "loss": 0.0321,
+ "step": 1080
+ },
+ {
+ "epoch": 37.63,
+ "grad_norm": 0.13307899236679077,
+ "learning_rate": 0.001,
+ "loss": 0.0329,
+ "step": 1082
+ },
+ {
+ "epoch": 37.7,
+ "grad_norm": 0.12125247716903687,
+ "learning_rate": 0.001,
+ "loss": 0.034,
+ "step": 1084
+ },
+ {
+ "epoch": 37.77,
+ "grad_norm": 0.12050389498472214,
+ "learning_rate": 0.001,
+ "loss": 0.031,
+ "step": 1086
+ },
+ {
+ "epoch": 37.84,
+ "grad_norm": 0.11437363177537918,
+ "learning_rate": 0.001,
+ "loss": 0.0308,
+ "step": 1088
+ },
+ {
+ "epoch": 37.91,
+ "grad_norm": 0.13162565231323242,
+ "learning_rate": 0.001,
+ "loss": 0.037,
+ "step": 1090
+ },
+ {
+ "epoch": 37.98,
+ "grad_norm": 0.13537472486495972,
+ "learning_rate": 0.001,
+ "loss": 0.0336,
+ "step": 1092
+ },
+ {
+ "epoch": 37.98,
+ "eval_loss": 5.080876350402832,
+ "eval_runtime": 14.2123,
+ "eval_samples_per_second": 23.782,
+ "eval_steps_per_second": 3.026,
+ "step": 1092
+ },
+ {
+ "epoch": 38.05,
+ "grad_norm": 0.10324922204017639,
+ "learning_rate": 0.001,
+ "loss": 0.0311,
+ "step": 1094
+ },
+ {
+ "epoch": 38.12,
+ "grad_norm": 0.10196162760257721,
+ "learning_rate": 0.001,
+ "loss": 0.0274,
+ "step": 1096
+ },
+ {
+ "epoch": 38.19,
+ "grad_norm": 0.10867214947938919,
+ "learning_rate": 0.001,
+ "loss": 0.0281,
+ "step": 1098
+ },
+ {
+ "epoch": 38.26,
+ "grad_norm": 0.10762563347816467,
+ "learning_rate": 0.001,
+ "loss": 0.0301,
+ "step": 1100
+ },
+ {
+ "epoch": 38.33,
+ "grad_norm": 0.10636897385120392,
+ "learning_rate": 0.001,
+ "loss": 0.0305,
+ "step": 1102
+ },
+ {
+ "epoch": 38.4,
+ "grad_norm": 0.10994244366884232,
+ "learning_rate": 0.001,
+ "loss": 0.0328,
+ "step": 1104
+ },
+ {
+ "epoch": 38.47,
+ "grad_norm": 0.11419042944908142,
+ "learning_rate": 0.001,
+ "loss": 0.0328,
+ "step": 1106
+ },
+ {
+ "epoch": 38.54,
+ "grad_norm": 0.10719581693410873,
+ "learning_rate": 0.001,
+ "loss": 0.0359,
+ "step": 1108
+ },
+ {
+ "epoch": 38.61,
+ "grad_norm": 0.13429002463817596,
+ "learning_rate": 0.001,
+ "loss": 0.0302,
+ "step": 1110
+ },
+ {
+ "epoch": 38.68,
+ "grad_norm": 0.11462274938821793,
+ "learning_rate": 0.001,
+ "loss": 0.0311,
+ "step": 1112
+ },
+ {
+ "epoch": 38.75,
+ "grad_norm": 0.12587036192417145,
+ "learning_rate": 0.001,
+ "loss": 0.0303,
+ "step": 1114
+ },
+ {
+ "epoch": 38.82,
+ "grad_norm": 0.11628645658493042,
+ "learning_rate": 0.001,
+ "loss": 0.0294,
+ "step": 1116
+ },
+ {
+ "epoch": 38.89,
+ "grad_norm": 0.1132420152425766,
+ "learning_rate": 0.001,
+ "loss": 0.0344,
+ "step": 1118
+ },
+ {
+ "epoch": 38.96,
+ "grad_norm": 0.12212924659252167,
+ "learning_rate": 0.001,
+ "loss": 0.0316,
+ "step": 1120
+ },
+ {
+ "epoch": 38.96,
+ "eval_loss": 5.267875671386719,
+ "eval_runtime": 14.2092,
+ "eval_samples_per_second": 23.787,
+ "eval_steps_per_second": 3.026,
+ "step": 1120
+ },
+ {
+ "epoch": 38.96,
+ "step": 1120,
+ "total_flos": 1.0970359673435505e+18,
+ "train_loss": 0.32971865670822026,
+ "train_runtime": 25027.7089,
+ "train_samples_per_second": 5.87,
+ "train_steps_per_second": 0.045
+ }
+ ],
+ "logging_steps": 2,
+ "max_steps": 1120,
+ "num_input_tokens_seen": 0,
+ "num_train_epochs": 40,
+ "save_steps": 500,
+ "total_flos": 1.0970359673435505e+18,
+ "train_batch_size": 8,
+ "trial_name": null,
+ "trial_params": null
+ }
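
The log above mixes two kinds of records: training records (with a `loss` key) and evaluation records (with an `eval_loss` key). In this tail of the run the lowest evaluation loss is at step 1092 (5.0809), not at the final step 1120 (5.2679), while training loss stays near 0.03. The sketch below shows one way to separate the two record types and pick the evaluation record with the lowest loss. It assumes the records sit under a top-level `log_history` key, as in a Transformers `trainer_state.json`; the helper names and the file path in `load_log_history` are illustrative, not part of this commit.

```python
import json
from pathlib import Path


def load_log_history(path):
    """Read the log records from a trainer_state.json file (path is an assumption)."""
    state = json.loads(Path(path).read_text())
    return state["log_history"]


def split_log_history(log_history):
    """Separate training-step records from evaluation records."""
    train = [entry for entry in log_history if "loss" in entry]
    evals = [entry for entry in log_history if "eval_loss" in entry]
    return train, evals


def best_eval(log_history):
    """Return the evaluation record with the lowest eval_loss."""
    _, evals = split_log_history(log_history)
    return min(evals, key=lambda entry: entry["eval_loss"])


# Records copied from the log above: one training record, the five
# evaluation records, and the final summary record.
sample_history = [
    {"epoch": 34.99, "loss": 0.0381, "step": 1006},
    {"epoch": 34.99, "eval_loss": 5.263000965118408, "step": 1006},
    {"epoch": 36.0, "eval_loss": 5.353184700012207, "step": 1035},
    {"epoch": 36.97, "eval_loss": 5.44431734085083, "step": 1063},
    {"epoch": 37.98, "eval_loss": 5.080876350402832, "step": 1092},
    {"epoch": 38.96, "eval_loss": 5.267875671386719, "step": 1120},
    {"epoch": 38.96, "step": 1120, "train_loss": 0.32971865670822026},
]

train_records, eval_records = split_log_history(sample_history)
best = best_eval(sample_history)
print(best["step"], best["eval_loss"])
```

The final summary record carries `train_loss` rather than `loss`, so it lands in neither list. To run this on the real file, something like `best_eval(load_log_history("trainer_state.json"))` should work, provided the file has the layout assumed here.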