pearsonkyle committed
Commit
a6f4731
1 Parent(s): 3c61fd9
Files changed (8)
  1. .gitattributes +1 -0
  2. README.md +110 -0
  3. config.json +35 -0
  4. optimizer.pt +3 -0
  5. pytorch_model.bin +3 -0
  6. scheduler.pt +3 -0
  7. trainer_state.json +2026 -0
  8. training_args.bin +3 -0
.gitattributes CHANGED
@@ -6,3 +6,4 @@
 *.tar.gz filter=lfs diff=lfs merge=lfs -text
 *.ot filter=lfs diff=lfs merge=lfs -text
 *.onnx filter=lfs diff=lfs merge=lfs -text
+*.pt filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,110 @@
# Exo-Machina
A deep language model, GPT-2, is trained on scientific manuscripts from NASA's Astrophysics Data System (ADS) pertaining to extrasolar planets and the references therein. This pilot study uses the abstract of each article as training data in order to explore correlations in the scientific literature from a language perspective. A language model is a mathematical representation of an algorithm used to generate sequences of text in the same way a human would form sentences. Each word or letter in a sentence is encoded to a numerical value (e.g. using word2vec) and appended to a list, forming sequences that represent up to a paragraph's worth of text. The sequences are fed into the [GPT-2](https://openai.com/blog/better-language-models/) 117M model and fine-tuned for 500,000 steps. After training, the language model is used to generate new text from scratch and from user input.

- ### [Browse some samples](https://pearsonkyle.github.io/Exo-Machina/)

- ### [Train a model on Google Colab]()

### Get started fast:

```python
from transformers import pipeline

exo = pipeline('text-generation', model='gpt2-exomachina/checkpoint-320000', tokenizer='gpt2', config={'max_length': 1600})
machina = lambda text: exo(text)[0]['generated_text']

print(machina("Transiting exoplanets are"))
```
## Training Samples
~40,000 abstracts from NASA's Astrophysics Data System (ADS) and arXiv.

![](Figures/exoplanet_keywords.png)

A few generated samples are below:

- *We can remotely sense an atmosphere by observing its reflected, transmitted, or emitted light in varying geometries. This light will contain information on the planetary conditions including* `temperature, pressure, composition, and cloud optical thickness. One such property that is important is...`
- *The reflectance of Earth's vegetation suggests* `that large, deciduous forest fires are composed of mostly dry, unprocessed material that is distributed in a nearly patchy fashion. The distributions of these fires are correlated with temperature, and also with vegetation...`
- *Directly imaged exoplanets probe* `key aspects of planet formation and evolution theory, as well as atmospheric and interior physics. These insights have led to numerous direct imaging instruments for exoplanets, many using polarimetry. However, current instruments take`
## Instructions

- ### Set up a SQL database to store training samples
A PostgreSQL database is set up on Amazon RDS in order to provide online access to the same data for multiple computers. Follow the instructions below to set up your own database using the Free Tier of services on AWS:

1. Sign in or register: https://aws.amazon.com/
2. Search the services for RDS and create a new database

Add your credentials to a new file called `settings.json` as follows:
```
{
  "database": {
    "dialect": "postgresql",
    "username": "readonly",
    "password": "readonly",
    "endpoint": "exomachina.c4luhvcn1k1s.us-east-2.rds.amazonaws.com",
    "port": 5432,
    "dbname": "exomachina"
  }
}
```
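The fields above map directly onto a standard `dialect://user:password@host:port/dbname` connection URL. A minimal sketch of building that URL from `settings.json` (the `db_url` helper is illustrative, not part of the repository):

```python
import json

def db_url(settings_file="settings.json"):
    """Build a database connection URL from the settings file above."""
    with open(settings_file) as f:
        db = json.load(f)["database"]
    # e.g. postgresql://readonly:readonly@<endpoint>:5432/exomachina
    return "{dialect}://{username}:{password}@{endpoint}:{port}/{dbname}".format(**db)
```

The resulting string can be passed directly to `sqlalchemy.create_engine` when querying the database.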
## Scraping NASA ADS

https://ads.readthedocs.io/en/latest/

Scrape ADS and save entries into a SQL database:

`python ads_query.py -s settings.json -q exoplanet`

```
usage: ads_query.py [-h] [-q QUERY] [-s SETTINGS] [-k KEY]

optional arguments:
  -h, --help            show this help message and exit
  -q QUERY, --query QUERY
                        Initial search criteria
  -s SETTINGS, --settings SETTINGS
                        Settings file
  -k KEY, --key KEY     Settings key
```
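For reference, an argument parser consistent with the usage text above can be sketched as follows. This is a reconstruction from the help output, not the script's actual source, and the defaults shown are illustrative:

```python
import argparse

def parse_args(argv=None):
    # Mirrors the usage text printed by `python ads_query.py -h`
    parser = argparse.ArgumentParser(prog="ads_query.py")
    parser.add_argument("-q", "--query", help="Initial search criteria", default="exoplanet")
    parser.add_argument("-s", "--settings", help="Settings file", default="settings.json")
    parser.add_argument("-k", "--key", help="Settings key", default="database")
    return parser.parse_args(argv)
```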
Letting the scrape run for ~2 hours found articles from these publications (counts in descending order):
```
5364 - The Astrophysical Journal
3365 - Astronomy and Astrophysics
2704 - Monthly Notices of the Royal Astronomical Society
1355 - The Astronomical Journal
 617 - arXiv e-prints
 498 - Icarus
 388 - Publications of the Astronomical Society of the Pacific
 324 - The Astrophysical Journal Supplement Series
 245 - Nature
 187 - Journal of Geophysical Research
 167 - Science
 145 - Astronomische Nachrichten
 129 - Planetary and Space Science
 114 - Space Science Reviews
 109 - Geophysical Research Letters
```

The number of manuscripts for each year:
![](Figures/exoplanet_histogram.png)

## Pre-processing
Extract abstracts from the database and create a new file where each line is a new training sample. Try a new tokenizer.
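A minimal sketch of that export step, assuming the abstracts have already been fetched from the database as strings (the `clean` helper and the choice to flatten newlines are illustrative, not the repository's actual pre-processing code):

```python
import re

def clean(abstract):
    """Collapse internal whitespace so each abstract fits on one line."""
    return re.sub(r"\s+", " ", abstract).strip()

def write_samples(abstracts, path="abstracts.txt"):
    # One training sample per line, the layout expected by most LM fine-tuning scripts
    with open(path, "w", encoding="utf-8") as f:
        for text in abstracts:
            f.write(clean(text) + "\n")
```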
## Things to improve

## Export the models to an iOS application

References
- https://huggingface.co/roberta-base
- GPT-2 generative text
- https://huggingface.co/docs
- https://huggingface.co/transformers/training.html
- https://huggingface.co/transformers/notebooks.html
- https://colab.research.google.com/drive/1vsCh85T_Od7RBwXfvh1iysV-vTxmWXQO#scrollTo=ljknzOlNoyrv
- http://jalammar.github.io/illustrated-gpt2/
config.json ADDED
@@ -0,0 +1,35 @@
{
  "_name_or_path": "./gpt2-exomachina/checkpoint-109000",
  "activation_function": "gelu_new",
  "architectures": [
    "GPT2LMHeadModel"
  ],
  "attn_pdrop": 0.1,
  "bos_token_id": 50256,
  "embd_pdrop": 0.1,
  "eos_token_id": 50256,
  "gradient_checkpointing": false,
  "initializer_range": 0.02,
  "layer_norm_epsilon": 1e-05,
  "model_type": "gpt2",
  "n_ctx": 1024,
  "n_embd": 768,
  "n_head": 12,
  "n_inner": null,
  "n_layer": 12,
  "n_positions": 1024,
  "pad_token_id": 50256,
  "resid_pdrop": 0.1,
  "summary_activation": null,
  "summary_first_dropout": 0.1,
  "summary_proj_to_labels": true,
  "summary_type": "cls_index",
  "summary_use_proj": true,
  "task_specific_params": {
    "text-generation": {
      "do_sample": true,
      "max_length": 50
    }
  },
  "vocab_size": 50257
}
optimizer.pt ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a221bc4b1968eda415bba8c105db6a2b8b579edc18d1e3592730d101e8d80126
size 995610991
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:221f869bcca2d877a6bdf0f9b7bf14aac02003800ac264f8b39100f9db2d1f1c
size 510407951
scheduler.pt ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:4538d9d56412255e685cbe4b709f49c4c2ce0cf165b70b84bab6f912af923e07
size 623
trainer_state.json ADDED
@@ -0,0 +1,2026 @@
{
  "best_metric": null,
  "best_model_checkpoint": null,
  "epoch": 63.36296576508417,
  "global_step": 335000,
  "is_hyper_param_search": false,
  "is_local_process_zero": true,
  "is_world_process_zero": true,
  "log_history": [
    {
      "epoch": 0.18914318138831096,
      "learning_rate": 4.990542840930585e-05,
      "loss": 2.2949150390625,
      "step": 1000
    },
    {
      "epoch": 0.3782863627766219,
      "learning_rate": 4.981085681861169e-05,
      "loss": 2.31815234375,
      "step": 2000
    },
    {
      "epoch": 0.5674295441649329,
      "learning_rate": 4.9716285227917534e-05,
      "loss": 2.32872216796875,
      "step": 3000
    },
    {
      "epoch": 0.7565727255532438,
      "learning_rate": 4.962171363722338e-05,
      "loss": 2.33826904296875,
      "step": 4000
    },
    {
      "epoch": 0.9457159069415547,
      "learning_rate": 4.9527142046529224e-05,
      "loss": 2.340685546875,
      "step": 5000
    },
    {
      "epoch": 1.1348590883298657,
      "learning_rate": 4.943257045583507e-05,
      "loss": 2.28626953125,
      "step": 6000
    },
    {
      "epoch": 1.3240022697181766,
      "learning_rate": 4.9337998865140915e-05,
      "loss": 2.27450390625,
      "step": 7000
    },
    {
      "epoch": 1.5131454511064875,
      "learning_rate": 4.9243427274446756e-05,
      "loss": 2.287498046875,
      "step": 8000
    },
    {
      "epoch": 1.7022886324947986,
      "learning_rate": 4.9148855683752605e-05,
      "loss": 2.30305078125,
      "step": 9000
    },
    {
      "epoch": 1.8914318138831097,
      "learning_rate": 4.905428409305845e-05,
      "loss": 2.30912890625,
      "step": 10000
    },
    {
      "epoch": 2.0805749952714203,
      "learning_rate": 4.895971250236429e-05,
      "loss": 2.272595703125,
      "step": 11000
    },
    {
      "epoch": 2.2697181766597314,
      "learning_rate": 4.886514091167014e-05,
      "loss": 2.238087890625,
      "step": 12000
    },
    {
      "epoch": 2.4588613580480425,
      "learning_rate": 4.877056932097598e-05,
      "loss": 2.253140625,
      "step": 13000
    },
    {
      "epoch": 2.648004539436353,
      "learning_rate": 4.867599773028183e-05,
      "loss": 2.26483203125,
      "step": 14000
    },
    {
      "epoch": 2.8371477208246643,
      "learning_rate": 4.858142613958767e-05,
      "loss": 2.2735078125,
      "step": 15000
    },
    {
      "epoch": 3.0262909022129754,
      "learning_rate": 4.848685454889351e-05,
      "loss": 2.27125,
      "step": 16000
    },
    {
      "epoch": 3.215434083601286,
      "learning_rate": 4.839228295819936e-05,
      "loss": 2.20089453125,
      "step": 17000
    },
    {
      "epoch": 3.404577264989597,
      "learning_rate": 4.829771136750521e-05,
      "loss": 2.2166953125,
      "step": 18000
    },
    {
      "epoch": 3.593720446377908,
      "learning_rate": 4.820313977681105e-05,
      "loss": 2.2287421875,
      "step": 19000
    },
    {
      "epoch": 3.782863627766219,
      "learning_rate": 4.810856818611689e-05,
      "loss": 2.242296875,
      "step": 20000
    },
    {
      "epoch": 3.97200680915453,
      "learning_rate": 4.801399659542274e-05,
      "loss": 2.25162109375,
      "step": 21000
    },
    {
      "epoch": 4.161149990542841,
      "learning_rate": 4.791942500472858e-05,
      "loss": 2.17917578125,
      "step": 22000
    },
    {
      "epoch": 4.350293171931152,
      "learning_rate": 4.782485341403443e-05,
      "loss": 2.1837734375,
      "step": 23000
    },
    {
      "epoch": 4.539436353319463,
      "learning_rate": 4.773028182334027e-05,
      "loss": 2.1981953125,
      "step": 24000
    },
    {
      "epoch": 4.728579534707774,
      "learning_rate": 4.763571023264611e-05,
      "loss": 2.2091328125,
      "step": 25000
    },
    {
      "epoch": 4.917722716096085,
      "learning_rate": 4.754113864195196e-05,
      "loss": 2.22294140625,
      "step": 26000
    },
    {
      "epoch": 5.106865897484396,
      "learning_rate": 4.744656705125781e-05,
      "loss": 2.17415625,
      "step": 27000
    },
    {
      "epoch": 5.296009078872706,
      "learning_rate": 4.7351995460563645e-05,
      "loss": 2.1536875,
      "step": 28000
    },
    {
      "epoch": 5.485152260261017,
      "learning_rate": 4.725742386986949e-05,
      "loss": 2.17046484375,
      "step": 29000
    },
    {
      "epoch": 5.6742954416493285,
      "learning_rate": 4.716285227917534e-05,
      "loss": 2.17616796875,
      "step": 30000
    },
    {
      "epoch": 5.86343862303764,
      "learning_rate": 4.706828068848118e-05,
      "loss": 2.18503125,
      "step": 31000
    },
    {
      "epoch": 6.052581804425951,
      "learning_rate": 4.6973709097787025e-05,
      "loss": 2.172359375,
      "step": 32000
    },
    {
      "epoch": 6.241724985814262,
      "learning_rate": 4.687913750709287e-05,
      "loss": 2.118359375,
      "step": 33000
    },
    {
      "epoch": 6.430868167202572,
      "learning_rate": 4.6784565916398715e-05,
      "loss": 2.133546875,
      "step": 34000
    },
    {
      "epoch": 6.620011348590883,
      "learning_rate": 4.6689994325704564e-05,
      "loss": 2.1423671875,
      "step": 35000
    },
    {
      "epoch": 6.809154529979194,
      "learning_rate": 4.6595422735010405e-05,
      "loss": 2.1594921875,
      "step": 36000
    },
    {
      "epoch": 6.998297711367505,
      "learning_rate": 4.650085114431625e-05,
      "loss": 2.17365625,
      "step": 37000
    },
    {
      "epoch": 7.187440892755816,
      "learning_rate": 4.6406279553622095e-05,
      "loss": 2.0884140625,
      "step": 38000
    },
    {
      "epoch": 7.376584074144127,
      "learning_rate": 4.631170796292794e-05,
      "loss": 2.1023359375,
      "step": 39000
    },
    {
      "epoch": 7.565727255532438,
      "learning_rate": 4.6217136372233786e-05,
      "loss": 2.1240625,
      "step": 40000
    },
    {
      "epoch": 7.754870436920749,
      "learning_rate": 4.612256478153963e-05,
      "loss": 2.126953125,
      "step": 41000
    },
    {
      "epoch": 7.94401361830906,
      "learning_rate": 4.602799319084547e-05,
      "loss": 2.1407265625,
      "step": 42000
    },
    {
      "epoch": 8.133156799697371,
      "learning_rate": 4.593342160015132e-05,
      "loss": 2.080640625,
      "step": 43000
    },
    {
      "epoch": 8.322299981085681,
      "learning_rate": 4.5838850009457166e-05,
      "loss": 2.07196875,
      "step": 44000
    },
    {
      "epoch": 8.511443162473993,
      "learning_rate": 4.5744278418763e-05,
      "loss": 2.0906640625,
      "step": 45000
    },
    {
      "epoch": 8.700586343862303,
      "learning_rate": 4.564970682806885e-05,
      "loss": 2.10053125,
      "step": 46000
    },
    {
      "epoch": 8.889729525250615,
      "learning_rate": 4.55551352373747e-05,
      "loss": 2.112328125,
      "step": 47000
    },
    {
      "epoch": 9.078872706638926,
      "learning_rate": 4.546056364668054e-05,
      "loss": 2.08459375,
      "step": 48000
    },
    {
      "epoch": 9.268015888027236,
      "learning_rate": 4.536599205598638e-05,
      "loss": 2.039546875,
      "step": 49000
    },
    {
      "epoch": 9.457159069415548,
      "learning_rate": 4.527142046529223e-05,
      "loss": 2.06175,
      "step": 50000
    },
    {
      "epoch": 9.646302250803858,
      "learning_rate": 4.517684887459807e-05,
      "loss": 2.076625,
      "step": 51000
    },
    {
      "epoch": 9.83544543219217,
      "learning_rate": 4.508227728390392e-05,
      "loss": 2.088609375,
      "step": 52000
    },
    {
      "epoch": 10.02458861358048,
      "learning_rate": 4.498770569320976e-05,
      "loss": 2.07996875,
      "step": 53000
    },
    {
      "epoch": 10.213731794968792,
      "learning_rate": 4.48931341025156e-05,
      "loss": 2.0183828125,
      "step": 54000
    },
    {
      "epoch": 10.402874976357102,
      "learning_rate": 4.479856251182145e-05,
      "loss": 2.0310546875,
      "step": 55000
    },
    {
      "epoch": 10.592018157745413,
      "learning_rate": 4.47039909211273e-05,
      "loss": 2.0436796875,
      "step": 56000
    },
    {
      "epoch": 10.781161339133725,
      "learning_rate": 4.460941933043314e-05,
      "loss": 2.0622109375,
      "step": 57000
    },
    {
      "epoch": 10.970304520522035,
      "learning_rate": 4.4514847739738984e-05,
      "loss": 2.06621875,
      "step": 58000
    },
    {
      "epoch": 11.159447701910347,
      "learning_rate": 4.442027614904483e-05,
      "loss": 2.0049921875,
      "step": 59000
    },
    {
      "epoch": 11.348590883298657,
      "learning_rate": 4.4325704558350674e-05,
      "loss": 2.00809375,
      "step": 60000
    },
    {
      "epoch": 11.537734064686967,
      "learning_rate": 4.423113296765652e-05,
      "loss": 2.017078125,
      "step": 61000
    },
    {
      "epoch": 11.72687724607528,
      "learning_rate": 4.4136561376962364e-05,
      "loss": 2.03634375,
      "step": 62000
    },
    {
      "epoch": 11.91602042746359,
      "learning_rate": 4.4041989786268206e-05,
      "loss": 2.04240625,
      "step": 63000
    },
    {
      "epoch": 12.105163608851901,
      "learning_rate": 4.3947418195574054e-05,
      "loss": 1.99821875,
      "step": 64000
    },
    {
      "epoch": 12.294306790240212,
      "learning_rate": 4.3852846604879896e-05,
      "loss": 1.9760625,
      "step": 65000
    },
    {
      "epoch": 12.483449971628524,
      "learning_rate": 4.375827501418574e-05,
      "loss": 1.995484375,
      "step": 66000
    },
    {
      "epoch": 12.672593153016834,
      "learning_rate": 4.3663703423491586e-05,
      "loss": 2.00321875,
      "step": 67000
    },
    {
      "epoch": 12.861736334405144,
      "learning_rate": 4.356913183279743e-05,
      "loss": 2.021078125,
      "step": 68000
    },
    {
      "epoch": 13.050879515793456,
      "learning_rate": 4.3474560242103276e-05,
      "loss": 2.004171875,
      "step": 69000
    },
    {
      "epoch": 13.240022697181766,
      "learning_rate": 4.337998865140912e-05,
      "loss": 1.9488125,
      "step": 70000
    },
    {
      "epoch": 13.429165878570078,
      "learning_rate": 4.328541706071496e-05,
      "loss": 1.967921875,
      "step": 71000
    },
    {
      "epoch": 13.618309059958388,
      "learning_rate": 4.319084547002081e-05,
      "loss": 1.984046875,
      "step": 72000
    },
    {
      "epoch": 13.807452241346699,
      "learning_rate": 4.3096273879326657e-05,
      "loss": 1.998828125,
      "step": 73000
    },
    {
      "epoch": 13.99659542273501,
      "learning_rate": 4.30017022886325e-05,
      "loss": 2.0024375,
      "step": 74000
    },
    {
      "epoch": 14.18573860412332,
      "learning_rate": 4.290713069793834e-05,
      "loss": 1.927890625,
      "step": 75000
    },
    {
      "epoch": 14.374881785511633,
      "learning_rate": 4.281255910724419e-05,
      "loss": 1.94315625,
      "step": 76000
    },
    {
      "epoch": 14.564024966899943,
      "learning_rate": 4.271798751655003e-05,
      "loss": 1.95571875,
      "step": 77000
    },
    {
      "epoch": 14.753168148288253,
      "learning_rate": 4.262341592585587e-05,
      "loss": 1.96771875,
      "step": 78000
    },
    {
      "epoch": 14.942311329676565,
      "learning_rate": 4.252884433516172e-05,
      "loss": 1.981109375,
      "step": 79000
    },
    {
      "epoch": 15.131454511064875,
      "learning_rate": 4.243427274446756e-05,
      "loss": 1.929875,
      "step": 80000
    },
    {
      "epoch": 15.320597692453187,
      "learning_rate": 4.233970115377341e-05,
      "loss": 1.91628125,
      "step": 81000
    },
    {
      "epoch": 15.509740873841498,
      "learning_rate": 4.224512956307925e-05,
      "loss": 1.938640625,
      "step": 82000
    },
    {
      "epoch": 15.69888405522981,
      "learning_rate": 4.2150557972385094e-05,
      "loss": 1.9478125,
      "step": 83000
    },
    {
      "epoch": 15.88802723661812,
      "learning_rate": 4.205598638169094e-05,
      "loss": 1.95146875,
      "step": 84000
    },
    {
      "epoch": 16.077170418006432,
      "learning_rate": 4.196141479099679e-05,
      "loss": 1.928953125,
      "step": 85000
    },
    {
      "epoch": 16.266313599394742,
      "learning_rate": 4.186684320030263e-05,
      "loss": 1.895140625,
      "step": 86000
    },
    {
      "epoch": 16.455456780783052,
      "learning_rate": 4.1772271609608474e-05,
      "loss": 1.909171875,
      "step": 87000
    },
    {
      "epoch": 16.644599962171363,
      "learning_rate": 4.1677700018914316e-05,
      "loss": 1.92403125,
      "step": 88000
    },
    {
      "epoch": 16.833743143559676,
      "learning_rate": 4.1583128428220164e-05,
      "loss": 1.935015625,
      "step": 89000
    },
    {
      "epoch": 17.022886324947986,
      "learning_rate": 4.148855683752601e-05,
      "loss": 1.92928125,
      "step": 90000
    },
    {
      "epoch": 17.212029506336297,
      "learning_rate": 4.139398524683185e-05,
      "loss": 1.86678125,
      "step": 91000
    },
    {
      "epoch": 17.401172687724607,
      "learning_rate": 4.1299413656137696e-05,
      "loss": 1.8859375,
      "step": 92000
    },
    {
      "epoch": 17.590315869112917,
      "learning_rate": 4.1204842065443545e-05,
      "loss": 1.90540625,
      "step": 93000
    },
    {
      "epoch": 17.77945905050123,
      "learning_rate": 4.1110270474749386e-05,
      "loss": 1.911765625,
      "step": 94000
    },
    {
      "epoch": 17.96860223188954,
      "learning_rate": 4.101569888405523e-05,
      "loss": 1.924671875,
      "step": 95000
    },
    {
      "epoch": 18.15774541327785,
      "learning_rate": 4.092112729336108e-05,
      "loss": 1.85646875,
      "step": 96000
    },
    {
      "epoch": 18.34688859466616,
      "learning_rate": 4.082655570266692e-05,
      "loss": 1.862640625,
      "step": 97000
    },
    {
      "epoch": 18.53603177605447,
      "learning_rate": 4.073198411197277e-05,
      "loss": 1.880921875,
      "step": 98000
    },
    {
      "epoch": 18.725174957442785,
      "learning_rate": 4.063741252127861e-05,
      "loss": 1.8903125,
      "step": 99000
    },
    {
      "epoch": 18.914318138831096,
      "learning_rate": 4.054284093058445e-05,
      "loss": 1.904015625,
      "step": 100000
    },
    {
      "epoch": 19.103461320219406,
      "learning_rate": 4.04482693398903e-05,
      "loss": 1.8575625,
      "step": 101000
    },
    {
      "epoch": 19.292604501607716,
      "learning_rate": 4.035369774919615e-05,
      "loss": 1.837125,
      "step": 102000
    },
    {
      "epoch": 19.481747682996026,
      "learning_rate": 4.025912615850199e-05,
      "loss": 1.859359375,
      "step": 103000
    },
    {
      "epoch": 19.67089086438434,
      "learning_rate": 4.016455456780783e-05,
      "loss": 1.867921875,
      "step": 104000
    },
    {
      "epoch": 19.86003404577265,
      "learning_rate": 4.006998297711368e-05,
      "loss": 1.88125,
      "step": 105000
    },
    {
      "epoch": 20.04917722716096,
      "learning_rate": 3.997541138641952e-05,
      "loss": 1.866546875,
      "step": 106000
    },
    {
      "epoch": 20.23832040854927,
      "learning_rate": 3.988083979572537e-05,
      "loss": 1.811046875,
      "step": 107000
    },
    {
      "epoch": 20.427463589937584,
      "learning_rate": 3.978626820503121e-05,
      "loss": 1.8318125,
      "step": 108000
    },
    {
      "epoch": 20.616606771325895,
      "learning_rate": 3.969169661433705e-05,
      "loss": 1.83990625,
      "step": 109000
    },
    {
      "epoch": 20.805749952714205,
      "learning_rate": 3.95971250236429e-05,
      "loss": 1.91721875,
      "step": 110000
    },
    {
      "epoch": 20.994893134102515,
      "learning_rate": 3.950255343294875e-05,
      "loss": 1.923984375,
      "step": 111000
    },
    {
      "epoch": 21.184036315490825,
      "learning_rate": 3.9407981842254585e-05,
      "loss": 1.846875,
      "step": 112000
    },
    {
      "epoch": 21.37317949687914,
      "learning_rate": 3.931341025156043e-05,
      "loss": 1.860984375,
      "step": 113000
    },
    {
      "epoch": 21.56232267826745,
      "learning_rate": 3.9218838660866275e-05,
      "loss": 1.8764375,
      "step": 114000
    },
    {
      "epoch": 21.75146585965576,
      "learning_rate": 3.912426707017212e-05,
      "loss": 1.88171875,
      "step": 115000
    },
    {
      "epoch": 21.94060904104407,
      "learning_rate": 3.9029695479477965e-05,
      "loss": 1.89409375,
      "step": 116000
    },
    {
      "epoch": 22.12975222243238,
      "learning_rate": 3.8935123888783807e-05,
      "loss": 1.847390625,
      "step": 117000
    },
    {
      "epoch": 22.318895403820694,
      "learning_rate": 3.8840552298089655e-05,
      "loss": 1.83696875,
      "step": 118000
    },
    {
      "epoch": 22.508038585209004,
      "learning_rate": 3.8745980707395504e-05,
      "loss": 1.852484375,
      "step": 119000
    },
    {
      "epoch": 22.697181766597314,
      "learning_rate": 3.8651409116701345e-05,
      "loss": 1.86578125,
      "step": 120000
    },
    {
      "epoch": 22.886324947985624,
      "learning_rate": 3.855683752600719e-05,
      "loss": 1.877734375,
      "step": 121000
    },
    {
      "epoch": 23.075468129373935,
      "learning_rate": 3.8462265935313035e-05,
      "loss": 1.8471875,
      "step": 122000
    },
    {
      "epoch": 23.26461131076225,
      "learning_rate": 3.836769434461888e-05,
      "loss": 1.816078125,
      "step": 123000
    },
    {
      "epoch": 23.45375449215056,
      "learning_rate": 3.8273122753924726e-05,
      "loss": 1.833703125,
      "step": 124000
    },
    {
      "epoch": 23.64289767353887,
      "learning_rate": 3.817855116323057e-05,
      "loss": 1.840984375,
      "step": 125000
    },
    {
      "epoch": 23.83204085492718,
      "learning_rate": 3.808397957253641e-05,
      "loss": 1.8568125,
      "step": 126000
    },
    {
      "epoch": 24.02118403631549,
      "learning_rate": 3.798940798184226e-05,
      "loss": 1.855609375,
      "step": 127000
    },
    {
      "epoch": 24.210327217703803,
      "learning_rate": 3.7894836391148106e-05,
      "loss": 1.7931875,
      "step": 128000
    },
    {
      "epoch": 24.399470399092113,
      "learning_rate": 3.780026480045394e-05,
      "loss": 1.805828125,
      "step": 129000
    },
    {
      "epoch": 24.588613580480423,
      "learning_rate": 3.770569320975979e-05,
      "loss": 1.825140625,
      "step": 130000
    },
    {
      "epoch": 24.777756761868734,
      "learning_rate": 3.761112161906564e-05,
      "loss": 1.83425,
      "step": 131000
    },
    {
      "epoch": 24.966899943257047,
      "learning_rate": 3.751655002837148e-05,
      "loss": 1.8449375,
      "step": 132000
    },
    {
      "epoch": 25.156043124645358,
      "learning_rate": 3.742197843767732e-05,
      "loss": 1.78215625,
      "step": 133000
    },
    {
      "epoch": 25.345186306033668,
      "learning_rate": 3.732740684698317e-05,
      "loss": 1.79159375,
      "step": 134000
    },
    {
      "epoch": 25.534329487421978,
      "learning_rate": 3.723283525628901e-05,
      "loss": 1.80753125,
      "step": 135000
    },
    {
      "epoch": 25.723472668810288,
      "learning_rate": 3.713826366559486e-05,
      "loss": 1.81053125,
      "step": 136000
    },
    {
      "epoch": 25.912615850198602,
      "learning_rate": 3.70436920749007e-05,
      "loss": 1.8265,
      "step": 137000
    },
    {
      "epoch": 26.101759031586912,
      "learning_rate": 3.694912048420654e-05,
      "loss": 1.79653125,
      "step": 138000
    },
    {
      "epoch": 26.290902212975222,
      "learning_rate": 3.685454889351239e-05,
      "loss": 1.76934375,
      "step": 139000
    },
    {
      "epoch": 26.480045394363533,
      "learning_rate": 3.6759977302818233e-05,
      "loss": 1.78228125,
      "step": 140000
    },
    {
      "epoch": 26.669188575751843,
      "learning_rate": 3.666540571212408e-05,
      "loss": 1.79384375,
      "step": 141000
    },
    {
      "epoch": 26.858331757140157,
      "learning_rate": 3.6570834121429924e-05,
      "loss": 1.8095625,
      "step": 142000
    },
    {
      "epoch": 27.047474938528467,
      "learning_rate": 3.6476262530735765e-05,
      "loss": 1.7923125,
      "step": 143000
    },
    {
      "epoch": 27.236618119916777,
      "learning_rate": 3.6381690940041614e-05,
      "loss": 1.7471875,
      "step": 144000
    },
    {
      "epoch": 27.425761301305087,
      "learning_rate": 3.628711934934746e-05,
      "loss": 1.76446875,
      "step": 145000
    },
    {
      "epoch": 27.614904482693397,
      "learning_rate": 3.61925477586533e-05,
      "loss": 1.77759375,
      "step": 146000
    },
    {
      "epoch": 27.80404766408171,
      "learning_rate": 3.6097976167959146e-05,
      "loss": 1.78840625,
      "step": 147000
    },
    {
      "epoch": 27.99319084547002,
      "learning_rate": 3.6003404577264994e-05,
      "loss": 1.799125,
      "step": 148000
    },
    {
      "epoch": 28.18233402685833,
      "learning_rate": 3.5908832986570836e-05,
      "loss": 1.73171875,
      "step": 149000
    },
    {
      "epoch": 28.37147720824664,
      "learning_rate": 3.581426139587668e-05,
      "loss": 1.73996875,
      "step": 150000
    },
    {
      "epoch": 28.560620389634952,
      "learning_rate": 3.5719689805182526e-05,
      "loss": 1.75671875,
      "step": 151000
    },
    {
      "epoch": 28.749763571023266,
      "learning_rate": 3.562511821448837e-05,
      "loss": 1.7723125,
      "step": 152000
    },
    {
      "epoch": 28.938906752411576,
      "learning_rate": 3.5530546623794216e-05,
+ "loss": 1.77990625,
926
+ "step": 153000
927
+ },
928
+ {
929
+ "epoch": 29.128049933799886,
930
+ "learning_rate": 3.543597503310006e-05,
931
+ "loss": 1.73996875,
932
+ "step": 154000
933
+ },
934
+ {
935
+ "epoch": 29.317193115188196,
936
+ "learning_rate": 3.53414034424059e-05,
937
+ "loss": 1.726625,
938
+ "step": 155000
939
+ },
940
+ {
941
+ "epoch": 29.50633629657651,
942
+ "learning_rate": 3.524683185171175e-05,
943
+ "loss": 1.74075,
944
+ "step": 156000
945
+ },
946
+ {
947
+ "epoch": 29.69547947796482,
948
+ "learning_rate": 3.5152260261017597e-05,
949
+ "loss": 1.753,
950
+ "step": 157000
951
+ },
952
+ {
953
+ "epoch": 29.88462265935313,
954
+ "learning_rate": 3.505768867032344e-05,
955
+ "loss": 1.7575,
956
+ "step": 158000
957
+ },
958
+ {
959
+ "epoch": 30.07376584074144,
960
+ "learning_rate": 3.496311707962928e-05,
961
+ "loss": 1.73771875,
962
+ "step": 159000
963
+ },
964
+ {
965
+ "epoch": 30.26290902212975,
966
+ "learning_rate": 3.486854548893513e-05,
967
+ "loss": 1.70875,
968
+ "step": 160000
969
+ },
970
+ {
971
+ "epoch": 30.452052203518065,
972
+ "learning_rate": 3.477397389824097e-05,
973
+ "loss": 1.72415625,
974
+ "step": 161000
975
+ },
976
+ {
977
+ "epoch": 30.641195384906375,
978
+ "learning_rate": 3.467940230754682e-05,
979
+ "loss": 1.7310625,
980
+ "step": 162000
981
+ },
982
+ {
983
+ "epoch": 30.830338566294685,
984
+ "learning_rate": 3.4584830716852654e-05,
985
+ "loss": 1.74175,
986
+ "step": 163000
987
+ },
988
+ {
989
+ "epoch": 31.019481747682995,
990
+ "learning_rate": 3.44902591261585e-05,
991
+ "loss": 1.74609375,
992
+ "step": 164000
993
+ },
994
+ {
995
+ "epoch": 31.208624929071306,
996
+ "learning_rate": 3.439568753546435e-05,
997
+ "loss": 1.684875,
998
+ "step": 165000
999
+ },
1000
+ {
1001
+ "epoch": 31.39776811045962,
1002
+ "learning_rate": 3.430111594477019e-05,
1003
+ "loss": 1.69925,
1004
+ "step": 166000
1005
+ },
1006
+ {
1007
+ "epoch": 31.58691129184793,
1008
+ "learning_rate": 3.4206544354076034e-05,
1009
+ "loss": 1.7169375,
1010
+ "step": 167000
1011
+ },
1012
+ {
1013
+ "epoch": 31.77605447323624,
1014
+ "learning_rate": 3.411197276338188e-05,
1015
+ "loss": 1.7275,
1016
+ "step": 168000
1017
+ },
1018
+ {
1019
+ "epoch": 31.96519765462455,
1020
+ "learning_rate": 3.4017401172687724e-05,
1021
+ "loss": 1.74096875,
1022
+ "step": 169000
1023
+ },
1024
+ {
1025
+ "epoch": 32.154340836012864,
1026
+ "learning_rate": 3.392282958199357e-05,
1027
+ "loss": 1.68128125,
1028
+ "step": 170000
1029
+ },
1030
+ {
1031
+ "epoch": 32.343484017401174,
1032
+ "learning_rate": 3.3828257991299414e-05,
1033
+ "loss": 1.6851875,
1034
+ "step": 171000
1035
+ },
1036
+ {
1037
+ "epoch": 32.532627198789484,
1038
+ "learning_rate": 3.3733686400605256e-05,
1039
+ "loss": 1.69896875,
1040
+ "step": 172000
1041
+ },
1042
+ {
1043
+ "epoch": 32.721770380177794,
1044
+ "learning_rate": 3.3639114809911104e-05,
1045
+ "loss": 1.7080625,
1046
+ "step": 173000
1047
+ },
1048
+ {
1049
+ "epoch": 32.910913561566105,
1050
+ "learning_rate": 3.354454321921695e-05,
1051
+ "loss": 1.7200625,
1052
+ "step": 174000
1053
+ },
1054
+ {
1055
+ "epoch": 33.100056742954415,
1056
+ "learning_rate": 3.3449971628522795e-05,
1057
+ "loss": 1.69040625,
1058
+ "step": 175000
1059
+ },
1060
+ {
1061
+ "epoch": 33.289199924342725,
1062
+ "learning_rate": 3.3355400037828636e-05,
1063
+ "loss": 1.666625,
1064
+ "step": 176000
1065
+ },
1066
+ {
1067
+ "epoch": 33.478343105731035,
1068
+ "learning_rate": 3.3260828447134485e-05,
1069
+ "loss": 1.6798125,
1070
+ "step": 177000
1071
+ },
1072
+ {
1073
+ "epoch": 33.66748628711935,
1074
+ "learning_rate": 3.3166256856440326e-05,
1075
+ "loss": 1.69575,
1076
+ "step": 178000
1077
+ },
1078
+ {
1079
+ "epoch": 33.85662946850766,
1080
+ "learning_rate": 3.3071685265746175e-05,
1081
+ "loss": 1.7008125,
1082
+ "step": 179000
1083
+ },
1084
+ {
1085
+ "epoch": 34.04577264989597,
1086
+ "learning_rate": 3.297711367505202e-05,
1087
+ "loss": 1.69340625,
1088
+ "step": 180000
1089
+ },
1090
+ {
1091
+ "epoch": 34.23491583128428,
1092
+ "learning_rate": 3.288254208435786e-05,
1093
+ "loss": 1.6509375,
1094
+ "step": 181000
1095
+ },
1096
+ {
1097
+ "epoch": 34.42405901267259,
1098
+ "learning_rate": 3.278797049366371e-05,
1099
+ "loss": 1.66703125,
1100
+ "step": 182000
1101
+ },
1102
+ {
1103
+ "epoch": 34.613202194060904,
1104
+ "learning_rate": 3.2693398902969555e-05,
1105
+ "loss": 1.67615625,
1106
+ "step": 183000
1107
+ },
1108
+ {
1109
+ "epoch": 34.802345375449214,
1110
+ "learning_rate": 3.259882731227539e-05,
1111
+ "loss": 1.6876875,
1112
+ "step": 184000
1113
+ },
1114
+ {
1115
+ "epoch": 34.991488556837524,
1116
+ "learning_rate": 3.250425572158124e-05,
1117
+ "loss": 1.695,
1118
+ "step": 185000
1119
+ },
1120
+ {
1121
+ "epoch": 35.180631738225834,
1122
+ "learning_rate": 3.240968413088709e-05,
1123
+ "loss": 1.6354375,
1124
+ "step": 186000
1125
+ },
1126
+ {
1127
+ "epoch": 35.369774919614144,
1128
+ "learning_rate": 3.231511254019293e-05,
1129
+ "loss": 1.64653125,
1130
+ "step": 187000
1131
+ },
1132
+ {
1133
+ "epoch": 35.55891810100246,
1134
+ "learning_rate": 3.222054094949877e-05,
1135
+ "loss": 1.66346875,
1136
+ "step": 188000
1137
+ },
1138
+ {
1139
+ "epoch": 35.74806128239077,
1140
+ "learning_rate": 3.212596935880461e-05,
1141
+ "loss": 1.66990625,
1142
+ "step": 189000
1143
+ },
1144
+ {
1145
+ "epoch": 35.93720446377908,
1146
+ "learning_rate": 3.203139776811046e-05,
1147
+ "loss": 1.67878125,
1148
+ "step": 190000
1149
+ },
1150
+ {
1151
+ "epoch": 36.12634764516739,
1152
+ "learning_rate": 3.193682617741631e-05,
1153
+ "loss": 1.63953125,
1154
+ "step": 191000
1155
+ },
1156
+ {
1157
+ "epoch": 36.3154908265557,
1158
+ "learning_rate": 3.184225458672215e-05,
1159
+ "loss": 1.62646875,
1160
+ "step": 192000
1161
+ },
1162
+ {
1163
+ "epoch": 36.50463400794401,
1164
+ "learning_rate": 3.174768299602799e-05,
1165
+ "loss": 1.6480625,
1166
+ "step": 193000
1167
+ },
1168
+ {
1169
+ "epoch": 36.69377718933232,
1170
+ "learning_rate": 3.165311140533384e-05,
1171
+ "loss": 1.65371875,
1172
+ "step": 194000
1173
+ },
1174
+ {
1175
+ "epoch": 36.88292037072063,
1176
+ "learning_rate": 3.155853981463968e-05,
1177
+ "loss": 1.66778125,
1178
+ "step": 195000
1179
+ },
1180
+ {
1181
+ "epoch": 37.07206355210894,
1182
+ "learning_rate": 3.146396822394553e-05,
1183
+ "loss": 1.64275,
1184
+ "step": 196000
1185
+ },
1186
+ {
1187
+ "epoch": 37.26120673349726,
1188
+ "learning_rate": 3.136939663325137e-05,
1189
+ "loss": 1.61565625,
1190
+ "step": 197000
1191
+ },
1192
+ {
1193
+ "epoch": 37.45034991488557,
1194
+ "learning_rate": 3.1274825042557215e-05,
1195
+ "loss": 1.631875,
1196
+ "step": 198000
1197
+ },
1198
+ {
1199
+ "epoch": 37.63949309627388,
1200
+ "learning_rate": 3.118025345186306e-05,
1201
+ "loss": 1.64078125,
1202
+ "step": 199000
1203
+ },
1204
+ {
1205
+ "epoch": 37.82863627766219,
1206
+ "learning_rate": 3.108568186116891e-05,
1207
+ "loss": 1.6473125,
1208
+ "step": 200000
1209
+ },
1210
+ {
1211
+ "epoch": 38.0177794590505,
1212
+ "learning_rate": 3.0991110270474747e-05,
1213
+ "loss": 1.65090625,
1214
+ "step": 201000
1215
+ },
1216
+ {
1217
+ "epoch": 38.20692264043881,
1218
+ "learning_rate": 3.0896538679780595e-05,
1219
+ "loss": 1.59846875,
1220
+ "step": 202000
1221
+ },
1222
+ {
1223
+ "epoch": 38.39606582182712,
1224
+ "learning_rate": 3.0801967089086443e-05,
1225
+ "loss": 1.6114375,
1226
+ "step": 203000
1227
+ },
1228
+ {
1229
+ "epoch": 38.58520900321543,
1230
+ "learning_rate": 3.0707395498392285e-05,
1231
+ "loss": 1.6266875,
1232
+ "step": 204000
1233
+ },
1234
+ {
1235
+ "epoch": 38.77435218460374,
1236
+ "learning_rate": 3.061282390769813e-05,
1237
+ "loss": 1.6340625,
1238
+ "step": 205000
1239
+ },
1240
+ {
1241
+ "epoch": 38.96349536599205,
1242
+ "learning_rate": 3.0518252317003975e-05,
1243
+ "loss": 1.64478125,
1244
+ "step": 206000
1245
+ },
1246
+ {
1247
+ "epoch": 39.15263854738037,
1248
+ "learning_rate": 3.0423680726309817e-05,
1249
+ "loss": 1.59403125,
1250
+ "step": 207000
1251
+ },
1252
+ {
1253
+ "epoch": 39.34178172876868,
1254
+ "learning_rate": 3.0329109135615662e-05,
1255
+ "loss": 1.5961875,
1256
+ "step": 208000
1257
+ },
1258
+ {
1259
+ "epoch": 39.53092491015699,
1260
+ "learning_rate": 3.023453754492151e-05,
1261
+ "loss": 1.612375,
1262
+ "step": 209000
1263
+ },
1264
+ {
1265
+ "epoch": 39.7200680915453,
1266
+ "learning_rate": 3.013996595422735e-05,
1267
+ "loss": 1.62103125,
1268
+ "step": 210000
1269
+ },
1270
+ {
1271
+ "epoch": 39.90921127293361,
1272
+ "learning_rate": 3.0045394363533197e-05,
1273
+ "loss": 1.62684375,
1274
+ "step": 211000
1275
+ },
1276
+ {
1277
+ "epoch": 40.09835445432192,
1278
+ "learning_rate": 2.995082277283904e-05,
1279
+ "loss": 1.60059375,
1280
+ "step": 212000
1281
+ },
1282
+ {
1283
+ "epoch": 40.28749763571023,
1284
+ "learning_rate": 2.9856251182144884e-05,
1285
+ "loss": 1.58271875,
1286
+ "step": 213000
1287
+ },
1288
+ {
1289
+ "epoch": 40.47664081709854,
1290
+ "learning_rate": 2.976167959145073e-05,
1291
+ "loss": 1.59703125,
1292
+ "step": 214000
1293
+ },
1294
+ {
1295
+ "epoch": 40.66578399848685,
1296
+ "learning_rate": 2.966710800075657e-05,
1297
+ "loss": 1.604875,
1298
+ "step": 215000
1299
+ },
1300
+ {
1301
+ "epoch": 40.85492717987517,
1302
+ "learning_rate": 2.957253641006242e-05,
1303
+ "loss": 1.6115625,
1304
+ "step": 216000
1305
+ },
1306
+ {
1307
+ "epoch": 41.04407036126348,
1308
+ "learning_rate": 2.9477964819368265e-05,
1309
+ "loss": 1.60434375,
1310
+ "step": 217000
1311
+ },
1312
+ {
1313
+ "epoch": 41.23321354265179,
1314
+ "learning_rate": 2.9383393228674106e-05,
1315
+ "loss": 1.566625,
1316
+ "step": 218000
1317
+ },
1318
+ {
1319
+ "epoch": 41.4223567240401,
1320
+ "learning_rate": 2.928882163797995e-05,
1321
+ "loss": 1.58084375,
1322
+ "step": 219000
1323
+ },
1324
+ {
1325
+ "epoch": 41.61149990542841,
1326
+ "learning_rate": 2.91942500472858e-05,
1327
+ "loss": 1.590625,
1328
+ "step": 220000
1329
+ },
1330
+ {
1331
+ "epoch": 41.80064308681672,
1332
+ "learning_rate": 2.9099678456591638e-05,
1333
+ "loss": 1.60096875,
1334
+ "step": 221000
1335
+ },
1336
+ {
1337
+ "epoch": 41.98978626820503,
1338
+ "learning_rate": 2.9005106865897487e-05,
1339
+ "loss": 1.6100625,
1340
+ "step": 222000
1341
+ },
1342
+ {
1343
+ "epoch": 42.17892944959334,
1344
+ "learning_rate": 2.8910535275203332e-05,
1345
+ "loss": 1.552625,
1346
+ "step": 223000
1347
+ },
1348
+ {
1349
+ "epoch": 42.36807263098165,
1350
+ "learning_rate": 2.8815963684509173e-05,
1351
+ "loss": 1.56346875,
1352
+ "step": 224000
1353
+ },
1354
+ {
1355
+ "epoch": 42.55721581236996,
1356
+ "learning_rate": 2.872139209381502e-05,
1357
+ "loss": 1.57903125,
1358
+ "step": 225000
1359
+ },
1360
+ {
1361
+ "epoch": 42.74635899375828,
1362
+ "learning_rate": 2.8626820503120867e-05,
1363
+ "loss": 1.5851875,
1364
+ "step": 226000
1365
+ },
1366
+ {
1367
+ "epoch": 42.93550217514659,
1368
+ "learning_rate": 2.8532248912426705e-05,
1369
+ "loss": 1.59359375,
1370
+ "step": 227000
1371
+ },
1372
+ {
1373
+ "epoch": 43.1246453565349,
1374
+ "learning_rate": 2.8437677321732554e-05,
1375
+ "loss": 1.55915625,
1376
+ "step": 228000
1377
+ },
1378
+ {
1379
+ "epoch": 43.31378853792321,
1380
+ "learning_rate": 2.83431057310384e-05,
1381
+ "loss": 1.5494375,
1382
+ "step": 229000
1383
+ },
1384
+ {
1385
+ "epoch": 43.50293171931152,
1386
+ "learning_rate": 2.824853414034424e-05,
1387
+ "loss": 1.55978125,
1388
+ "step": 230000
1389
+ },
1390
+ {
1391
+ "epoch": 43.69207490069983,
1392
+ "learning_rate": 2.8153962549650086e-05,
1393
+ "loss": 1.57346875,
1394
+ "step": 231000
1395
+ },
1396
+ {
1397
+ "epoch": 43.88121808208814,
1398
+ "learning_rate": 2.8059390958955934e-05,
1399
+ "loss": 1.58403125,
1400
+ "step": 232000
1401
+ },
1402
+ {
1403
+ "epoch": 44.07036126347645,
1404
+ "learning_rate": 2.7964819368261776e-05,
1405
+ "loss": 1.568125,
1406
+ "step": 233000
1407
+ },
1408
+ {
1409
+ "epoch": 44.25950444486476,
1410
+ "learning_rate": 2.787024777756762e-05,
1411
+ "loss": 1.535125,
1412
+ "step": 234000
1413
+ },
1414
+ {
1415
+ "epoch": 44.44864762625307,
1416
+ "learning_rate": 2.7775676186873466e-05,
1417
+ "loss": 1.5455,
1418
+ "step": 235000
1419
+ },
1420
+ {
1421
+ "epoch": 44.63779080764139,
1422
+ "learning_rate": 2.7681104596179308e-05,
1423
+ "loss": 1.55828125,
1424
+ "step": 236000
1425
+ },
1426
+ {
1427
+ "epoch": 44.8269339890297,
1428
+ "learning_rate": 2.7586533005485156e-05,
1429
+ "loss": 1.56821875,
1430
+ "step": 237000
1431
+ },
1432
+ {
1433
+ "epoch": 45.01607717041801,
1434
+ "learning_rate": 2.7491961414790994e-05,
1435
+ "loss": 1.573875,
1436
+ "step": 238000
1437
+ },
1438
+ {
1439
+ "epoch": 45.20522035180632,
1440
+ "learning_rate": 2.7397389824096843e-05,
1441
+ "loss": 1.52146875,
1442
+ "step": 239000
1443
+ },
1444
+ {
1445
+ "epoch": 45.39436353319463,
1446
+ "learning_rate": 2.7302818233402688e-05,
1447
+ "loss": 1.53628125,
1448
+ "step": 240000
1449
+ },
1450
+ {
1451
+ "epoch": 45.58350671458294,
1452
+ "learning_rate": 2.720824664270853e-05,
1453
+ "loss": 1.549875,
1454
+ "step": 241000
1455
+ },
1456
+ {
1457
+ "epoch": 45.77264989597125,
1458
+ "learning_rate": 2.7113675052014375e-05,
1459
+ "loss": 1.55240625,
1460
+ "step": 242000
1461
+ },
1462
+ {
1463
+ "epoch": 45.96179307735956,
1464
+ "learning_rate": 2.7019103461320223e-05,
1465
+ "loss": 1.56171875,
1466
+ "step": 243000
1467
+ },
1468
+ {
1469
+ "epoch": 46.15093625874787,
1470
+ "learning_rate": 2.692453187062606e-05,
1471
+ "loss": 1.52228125,
1472
+ "step": 244000
1473
+ },
1474
+ {
1475
+ "epoch": 46.340079440136186,
1476
+ "learning_rate": 2.682996027993191e-05,
1477
+ "loss": 1.5250625,
1478
+ "step": 245000
1479
+ },
1480
+ {
1481
+ "epoch": 46.5292226215245,
1482
+ "learning_rate": 2.6735388689237755e-05,
1483
+ "loss": 1.5320625,
1484
+ "step": 246000
1485
+ },
1486
+ {
1487
+ "epoch": 46.71836580291281,
1488
+ "learning_rate": 2.6640817098543597e-05,
1489
+ "loss": 1.5440625,
1490
+ "step": 247000
1491
+ },
1492
+ {
1493
+ "epoch": 46.90750898430112,
1494
+ "learning_rate": 2.6546245507849442e-05,
1495
+ "loss": 1.5495,
1496
+ "step": 248000
1497
+ },
1498
+ {
1499
+ "epoch": 47.09665216568943,
1500
+ "learning_rate": 2.645167391715529e-05,
1501
+ "loss": 1.5265625,
1502
+ "step": 249000
1503
+ },
1504
+ {
1505
+ "epoch": 47.28579534707774,
1506
+ "learning_rate": 2.6357102326461132e-05,
1507
+ "loss": 1.50359375,
1508
+ "step": 250000
1509
+ },
1510
+ {
1511
+ "epoch": 47.47493852846605,
1512
+ "learning_rate": 2.6262530735766977e-05,
1513
+ "loss": 1.5239375,
1514
+ "step": 251000
1515
+ },
1516
+ {
1517
+ "epoch": 47.66408170985436,
1518
+ "learning_rate": 2.6167959145072822e-05,
1519
+ "loss": 1.5295625,
1520
+ "step": 252000
1521
+ },
1522
+ {
1523
+ "epoch": 47.85322489124267,
1524
+ "learning_rate": 2.6073387554378664e-05,
1525
+ "loss": 1.53828125,
1526
+ "step": 253000
1527
+ },
1528
+ {
1529
+ "epoch": 48.04236807263098,
1530
+ "learning_rate": 2.597881596368451e-05,
1531
+ "loss": 1.53,
1532
+ "step": 254000
1533
+ },
1534
+ {
1535
+ "epoch": 48.231511254019296,
1536
+ "learning_rate": 2.5884244372990358e-05,
1537
+ "loss": 1.496,
1538
+ "step": 255000
1539
+ },
1540
+ {
1541
+ "epoch": 48.420654435407606,
1542
+ "learning_rate": 2.57896727822962e-05,
1543
+ "loss": 1.503375,
1544
+ "step": 256000
1545
+ },
1546
+ {
1547
+ "epoch": 48.609797616795916,
1548
+ "learning_rate": 2.5695101191602044e-05,
1549
+ "loss": 1.521125,
1550
+ "step": 257000
1551
+ },
1552
+ {
1553
+ "epoch": 48.798940798184226,
1554
+ "learning_rate": 2.560052960090789e-05,
1555
+ "loss": 1.5279375,
1556
+ "step": 258000
1557
+ },
1558
+ {
1559
+ "epoch": 48.98808397957254,
1560
+ "learning_rate": 2.550595801021373e-05,
1561
+ "loss": 1.5339375,
1562
+ "step": 259000
1563
+ },
1564
+ {
1565
+ "epoch": 49.17722716096085,
1566
+ "learning_rate": 2.541138641951958e-05,
1567
+ "loss": 1.4859375,
1568
+ "step": 260000
1569
+ },
1570
+ {
1571
+ "epoch": 49.36637034234916,
1572
+ "learning_rate": 2.5316814828825425e-05,
1573
+ "loss": 1.4956875,
1574
+ "step": 261000
1575
+ },
1576
+ {
1577
+ "epoch": 49.55551352373747,
1578
+ "learning_rate": 2.5222243238131266e-05,
1579
+ "loss": 1.50375,
1580
+ "step": 262000
1581
+ },
1582
+ {
1583
+ "epoch": 49.74465670512578,
1584
+ "learning_rate": 2.512767164743711e-05,
1585
+ "loss": 1.51525,
1586
+ "step": 263000
1587
+ },
1588
+ {
1589
+ "epoch": 49.933799886514095,
1590
+ "learning_rate": 2.5033100056742953e-05,
1591
+ "loss": 1.52084375,
1592
+ "step": 264000
1593
+ },
1594
+ {
1595
+ "epoch": 50.122943067902405,
1596
+ "learning_rate": 2.4938528466048798e-05,
1597
+ "loss": 1.49025,
1598
+ "step": 265000
1599
+ },
1600
+ {
1601
+ "epoch": 50.312086249290715,
1602
+ "learning_rate": 2.4843956875354647e-05,
1603
+ "loss": 1.48290625,
1604
+ "step": 266000
1605
+ },
1606
+ {
1607
+ "epoch": 50.501229430679025,
1608
+ "learning_rate": 2.474938528466049e-05,
1609
+ "loss": 1.49121875,
1610
+ "step": 267000
1611
+ },
1612
+ {
1613
+ "epoch": 50.690372612067335,
1614
+ "learning_rate": 2.4654813693966334e-05,
1615
+ "loss": 1.502875,
1616
+ "step": 268000
1617
+ },
1618
+ {
1619
+ "epoch": 50.879515793455646,
1620
+ "learning_rate": 2.4560242103272175e-05,
1621
+ "loss": 1.51284375,
1622
+ "step": 269000
1623
+ },
1624
+ {
1625
+ "epoch": 51.068658974843956,
1626
+ "learning_rate": 2.4465670512578024e-05,
1627
+ "loss": 1.4948125,
1628
+ "step": 270000
1629
+ },
1630
+ {
1631
+ "epoch": 51.257802156232266,
1632
+ "learning_rate": 2.4371098921883865e-05,
1633
+ "loss": 1.4699375,
1634
+ "step": 271000
1635
+ },
1636
+ {
1637
+ "epoch": 51.446945337620576,
1638
+ "learning_rate": 2.427652733118971e-05,
1639
+ "loss": 1.48165625,
1640
+ "step": 272000
1641
+ },
1642
+ {
1643
+ "epoch": 51.63608851900889,
1644
+ "learning_rate": 2.4181955740495556e-05,
1645
+ "loss": 1.491875,
1646
+ "step": 273000
1647
+ },
1648
+ {
1649
+ "epoch": 51.825231700397204,
1650
+ "learning_rate": 2.40873841498014e-05,
1651
+ "loss": 1.498875,
1652
+ "step": 274000
1653
+ },
1654
+ {
1655
+ "epoch": 52.014374881785514,
1656
+ "learning_rate": 2.3992812559107246e-05,
1657
+ "loss": 1.50025,
1658
+ "step": 275000
1659
+ },
1660
+ {
1661
+ "epoch": 52.203518063173824,
1662
+ "learning_rate": 2.389824096841309e-05,
1663
+ "loss": 1.45865625,
1664
+ "step": 276000
1665
+ },
1666
+ {
1667
+ "epoch": 52.392661244562134,
1668
+ "learning_rate": 2.3803669377718936e-05,
1669
+ "loss": 1.4689375,
1670
+ "step": 277000
1671
+ },
1672
+ {
1673
+ "epoch": 52.581804425950445,
1674
+ "learning_rate": 2.3709097787024778e-05,
1675
+ "loss": 1.4756875,
1676
+ "step": 278000
1677
+ },
1678
+ {
1679
+ "epoch": 52.770947607338755,
1680
+ "learning_rate": 2.3614526196330626e-05,
1681
+ "loss": 1.48815625,
1682
+ "step": 279000
1683
+ },
1684
+ {
1685
+ "epoch": 52.960090788727065,
1686
+ "learning_rate": 2.3519954605636468e-05,
1687
+ "loss": 1.4954375,
1688
+ "step": 280000
1689
+ },
1690
+ {
1691
+ "epoch": 53.149233970115375,
1692
+ "learning_rate": 2.3425383014942313e-05,
1693
+ "loss": 1.4575625,
1694
+ "step": 281000
1695
+ },
1696
+ {
1697
+ "epoch": 53.338377151503686,
1698
+ "learning_rate": 2.3330811424248155e-05,
1699
+ "loss": 1.4586875,
1700
+ "step": 282000
1701
+ },
1702
+ {
1703
+ "epoch": 53.527520332891996,
1704
+ "learning_rate": 2.3236239833554003e-05,
1705
+ "loss": 1.46946875,
1706
+ "step": 283000
1707
+ },
1708
+ {
1709
+ "epoch": 53.71666351428031,
1710
+ "learning_rate": 2.3141668242859845e-05,
1711
+ "loss": 1.47496875,
1712
+ "step": 284000
1713
+ },
1714
+ {
1715
+ "epoch": 53.90580669566862,
1716
+ "learning_rate": 2.304709665216569e-05,
1717
+ "loss": 1.48246875,
1718
+ "step": 285000
1719
+ },
1720
+ {
1721
+ "epoch": 54.09494987705693,
1722
+ "learning_rate": 2.2952525061471535e-05,
1723
+ "loss": 1.458375,
1724
+ "step": 286000
1725
+ },
1726
+ {
1727
+ "epoch": 54.284093058445244,
1728
+ "learning_rate": 2.285795347077738e-05,
1729
+ "loss": 1.44609375,
1730
+ "step": 287000
1731
+ },
1732
+ {
1733
+ "epoch": 54.473236239833554,
1734
+ "learning_rate": 2.2763381880083222e-05,
1735
+ "loss": 1.45584375,
1736
+ "step": 288000
1737
+ },
1738
+ {
1739
+ "epoch": 54.662379421221864,
1740
+ "learning_rate": 2.266881028938907e-05,
1741
+ "loss": 1.46325,
1742
+ "step": 289000
1743
+ },
1744
+ {
1745
+ "epoch": 54.851522602610174,
1746
+ "learning_rate": 2.2574238698694912e-05,
1747
+ "loss": 1.475625,
1748
+ "step": 290000
1749
+ },
1750
+ {
1751
+ "epoch": 55.040665783998485,
1752
+ "learning_rate": 2.2479667108000757e-05,
1753
+ "loss": 1.46834375,
1754
+ "step": 291000
1755
+ },
1756
+ {
1757
+ "epoch": 55.229808965386795,
1758
+ "learning_rate": 2.2385095517306602e-05,
1759
+ "loss": 1.43265625,
1760
+ "step": 292000
1761
+ },
1762
+ {
1763
+ "epoch": 55.41895214677511,
1764
+ "learning_rate": 2.2290523926612447e-05,
1765
+ "loss": 1.4436875,
1766
+ "step": 293000
1767
+ },
1768
+ {
1769
+ "epoch": 55.60809532816342,
1770
+ "learning_rate": 2.2195952335918292e-05,
1771
+ "loss": 1.455,
1772
+ "step": 294000
1773
+ },
1774
+ {
1775
+ "epoch": 55.79723850955173,
1776
+ "learning_rate": 2.2101380745224134e-05,
1777
+ "loss": 1.46475,
1778
+ "step": 295000
1779
+ },
1780
+ {
1781
+ "epoch": 55.98638169094004,
1782
+ "learning_rate": 2.2006809154529982e-05,
1783
+ "loss": 1.46825,
1784
+ "step": 296000
1785
+ },
1786
+ {
1787
+ "epoch": 56.17552487232835,
1788
+ "learning_rate": 2.1912237563835824e-05,
1789
+ "loss": 1.4260625,
1790
+ "step": 297000
1791
+ },
1792
+ {
1793
+ "epoch": 56.36466805371666,
1794
+ "learning_rate": 2.181766597314167e-05,
1795
+ "loss": 1.43375,
1796
+ "step": 298000
1797
+ },
1798
+ {
1799
+ "epoch": 56.55381123510497,
1800
+ "learning_rate": 2.1723094382447514e-05,
1801
+ "loss": 1.444375,
1802
+ "step": 299000
1803
+ },
1804
+ {
1805
+ "epoch": 56.74295441649328,
1806
+ "learning_rate": 2.162852279175336e-05,
1807
+ "loss": 1.4503125,
1808
+ "step": 300000
1809
+ },
1810
+ {
1811
+ "epoch": 56.932097597881594,
1812
+ "learning_rate": 2.15339512010592e-05,
1813
+ "loss": 1.4611875,
1814
+ "step": 301000
1815
+ },
1816
+ {
1817
+ "epoch": 57.121240779269904,
1818
+ "learning_rate": 2.143937961036505e-05,
1819
+ "loss": 1.4291875,
1820
+ "step": 302000
1821
+ },
1822
+ {
1823
+ "epoch": 57.31038396065822,
1824
+ "learning_rate": 2.134480801967089e-05,
1825
+ "loss": 1.422,
1826
+ "step": 303000
1827
+ },
1828
+ {
1829
+ "epoch": 57.49952714204653,
1830
+ "learning_rate": 2.1250236428976736e-05,
1831
+ "loss": 1.434625,
1832
+ "step": 304000
1833
+ },
1834
+ {
1835
+ "epoch": 57.68867032343484,
1836
+ "learning_rate": 2.115566483828258e-05,
1837
+ "loss": 1.437625,
1838
+ "step": 305000
1839
+ },
1840
+ {
1841
+ "epoch": 57.87781350482315,
1842
+ "learning_rate": 2.1061093247588427e-05,
1843
+ "loss": 1.4495625,
1844
+ "step": 306000
1845
+ },
1846
+ {
1847
+ "epoch": 58.06695668621146,
1848
+ "learning_rate": 2.0966521656894268e-05,
1849
+ "loss": 1.440125,
1850
+ "step": 307000
1851
+ },
1852
+ {
1853
+ "epoch": 58.25609986759977,
1854
+ "learning_rate": 2.0871950066200113e-05,
1855
+ "loss": 1.413375,
1856
+ "step": 308000
1857
+ },
1858
+ {
1859
+ "epoch": 58.44524304898808,
1860
+ "learning_rate": 2.077737847550596e-05,
1861
+ "loss": 1.4220625,
1862
+ "step": 309000
1863
+ },
1864
+ {
1865
+ "epoch": 58.63438623037639,
1866
+ "learning_rate": 2.0682806884811804e-05,
1867
+ "loss": 1.4315,
1868
+ "step": 310000
1869
+ },
1870
+ {
1871
+ "epoch": 58.8235294117647,
1872
+ "learning_rate": 2.058823529411765e-05,
1873
+ "loss": 1.4355625,
1874
+ "step": 311000
1875
+ },
1876
+ {
1877
+ "epoch": 59.01267259315302,
1878
+ "learning_rate": 2.0493663703423494e-05,
1879
+ "loss": 1.440125,
1880
+ "step": 312000
1881
+ },
1882
+ {
1883
+ "epoch": 59.20181577454133,
1884
+ "learning_rate": 2.039909211272934e-05,
1885
+ "loss": 1.4,
1886
+ "step": 313000
1887
+ },
1888
+ {
1889
+ "epoch": 59.39095895592964,
1890
+ "learning_rate": 2.030452052203518e-05,
1891
+ "loss": 1.4098125,
1892
+ "step": 314000
1893
+ },
1894
+ {
1895
+ "epoch": 59.58010213731795,
1896
+ "learning_rate": 2.0209948931341026e-05,
1897
+ "loss": 1.4241875,
1898
+ "step": 315000
1899
+ },
1900
+ {
1901
+ "epoch": 59.76924531870626,
1902
+ "learning_rate": 2.011537734064687e-05,
1903
+ "loss": 1.43,
1904
+ "step": 316000
1905
+ },
1906
+ {
1907
+ "epoch": 59.95838850009457,
1908
+ "learning_rate": 2.0020805749952716e-05,
1909
+ "loss": 1.43575,
1910
+ "step": 317000
1911
+ },
1912
+ {
1913
+ "epoch": 60.14753168148288,
1914
+ "learning_rate": 1.9926234159258557e-05,
1915
+ "loss": 1.402625,
1916
+ "step": 318000
1917
+ },
1918
+ {
1919
+ "epoch": 60.33667486287119,
1920
+ "learning_rate": 1.9831662568564406e-05,
1921
+ "loss": 1.4025,
1922
+ "step": 319000
1923
+ },
1924
+ {
1925
+ "epoch": 60.5258180442595,
1926
+ "learning_rate": 1.9737090977870248e-05,
1927
+ "loss": 1.413,
1928
+ "step": 320000
1929
+ },
1930
+ {
1931
+ "epoch": 60.71496122564781,
1932
+ "learning_rate": 1.9642519387176093e-05,
1933
+ "loss": 1.4200625,
1934
+ "step": 321000
1935
+ },
1936
+ {
1937
+ "epoch": 60.90410440703613,
1938
+ "learning_rate": 1.9547947796481938e-05,
1939
+ "loss": 1.4270625,
1940
+ "step": 322000
1941
+ },
1942
+ {
1943
+ "epoch": 61.09324758842444,
1944
+ "learning_rate": 1.9453376205787783e-05,
1945
+ "loss": 1.4055625,
1946
+ "step": 323000
1947
+ },
1948
+ {
1949
+ "epoch": 61.28239076981275,
1950
+ "learning_rate": 1.9358804615093625e-05,
1951
+ "loss": 1.3949375,
1952
+ "step": 324000
1953
+ },
1954
+ {
1955
+ "epoch": 61.47153395120106,
1956
+ "learning_rate": 1.9264233024399473e-05,
1957
+ "loss": 1.4008125,
1958
+ "step": 325000
1959
+ },
1960
+ {
1961
+ "epoch": 61.66067713258937,
1962
+ "learning_rate": 1.9169661433705315e-05,
1963
+ "loss": 1.4103125,
1964
+ "step": 326000
1965
+ },
1966
+ {
1967
+ "epoch": 61.84982031397768,
1968
+ "learning_rate": 1.907508984301116e-05,
1969
+ "loss": 1.41725,
1970
+ "step": 327000
1971
+ },
1972
+ {
1973
+ "epoch": 62.03896349536599,
1974
+ "learning_rate": 1.8980518252317005e-05,
1975
+ "loss": 1.4091875,
1976
+ "step": 328000
1977
+ },
1978
+ {
1979
+ "epoch": 62.2281066767543,
1980
+ "learning_rate": 1.888594666162285e-05,
1981
+ "loss": 1.382125,
1982
+ "step": 329000
1983
+ },
1984
+ {
1985
+ "epoch": 62.41724985814261,
1986
+ "learning_rate": 1.8791375070928692e-05,
1987
+ "loss": 1.39175,
1988
+ "step": 330000
1989
+ },
1990
+ {
1991
+ "epoch": 62.60639303953093,
1992
+ "learning_rate": 1.8696803480234537e-05,
1993
+ "loss": 1.3970625,
1994
+ "step": 331000
1995
+ },
1996
+ {
1997
+ "epoch": 62.79553622091924,
1998
+ "learning_rate": 1.8602231889540382e-05,
1999
+ "loss": 1.409375,
2000
+ "step": 332000
2001
+ },
2002
+ {
2003
+ "epoch": 62.98467940230755,
2004
+ "learning_rate": 1.8507660298846227e-05,
2005
+ "loss": 1.4141875,
2006
+ "step": 333000
2007
+ },
2008
+ {
2009
+ "epoch": 63.17382258369586,
2010
+ "learning_rate": 1.8413088708152072e-05,
2011
+ "loss": 1.371875,
2012
+ "step": 334000
2013
+ },
2014
+ {
2015
+ "epoch": 63.36296576508417,
2016
+ "learning_rate": 1.8318517117457917e-05,
2017
+ "loss": 1.382125,
2018
+ "step": 335000
2019
+ }
2020
+ ],
2021
+ "max_steps": 528700,
2022
+ "num_train_epochs": 100,
2023
+ "total_flos": 512235918148829184,
2024
+ "trial_name": null,
2025
+ "trial_params": null
2026
+ }
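The `log_history` records above share a fixed schema (`epoch`, `learning_rate`, `loss`, `step`). A minimal sketch of parsing them, using two records copied verbatim from this file; the linear learning-rate-decay check (5e-5 decaying to 0 over the file's 528,700 `max_steps`) is an inference from the logged values, not something stated in the file:

```python
import json

# Two log_history records copied from trainer_state.json above
# (schema: epoch, learning_rate, loss, step).
sample = json.loads("""[
  {"epoch": 23.83204085492718, "learning_rate": 3.808397957253641e-05, "loss": 1.8568125, "step": 126000},
  {"epoch": 63.36296576508417, "learning_rate": 1.8318517117457917e-05, "loss": 1.382125, "step": 335000}
]""")

MAX_STEPS = 528700  # "max_steps" from the end of the file

for record in sample:
    # The logged schedule appears consistent with linear decay
    # from a base rate of 5e-5 down to 0 over max_steps.
    expected_lr = 5e-5 * (1 - record["step"] / MAX_STEPS)
    assert abs(expected_lr - record["learning_rate"]) < 1e-12
    print(record["step"], record["loss"], record["learning_rate"])
```

The epochs also line up with 5,287 optimizer steps per epoch (528,700 steps / 100 `num_train_epochs`), so loss falls from roughly 1.86 to 1.38 between epochs 24 and 63.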
training_args.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:09c846693b7bcfdc192dbdc4274628996b880cacf845de80497f19136f83c34a
3
+ size 1775