README.md
CHANGED
@@ -82,11 +82,11 @@ A fine-tuned version of [google/long-t5-tglobal-base](https://huggingface.co/goo
 
 - between different checkpoints, about 20 epochs in total
 - all training was done at 16384 token input / 1024 max output
-- early checkpoints of this model were trained on a "smaller" subsection of the dataset as it was filtered for summaries of 1024
+- early checkpoints of this model were trained on a "smaller" subsection of the dataset as it was filtered for summaries of **1024 characters**. This was subsequently caught and adjusted to **1024 tokens**, and then trained further for at least five epochs.
 
 ## Intended uses & limitations
 
-- At time of writing, the model is not _fully converged_ despite training for 20+ epochs. This checkpoint is serviceable enough (see examples).
+- At the time of writing, the model is not _fully converged_ despite training for 20+ epochs. This checkpoint is serviceable enough (see examples).
 - I plan to update this page with newer checkpoints and post some metrics over time.
 - Compare performance to [LED-base](https://huggingface.co/pszemraj/led-base-book-summary) trained on the same dataset.
 
@@ -98,7 +98,7 @@ A fine-tuned version of [google/long-t5-tglobal-base](https://huggingface.co/goo
 
 ### Training hyperparameters
 
-The following hyperparameters were used during the final training round
+The following hyperparameters were used during the **final** training round\*:
 - learning_rate: 0.0004
 - train_batch_size: 2
 - eval_batch_size: 1
@@ -111,6 +111,8 @@ The following hyperparameters were used during the final training round:
 - lr_scheduler_warmup_ratio: 0.02
 - num_epochs: 2
 
+\*_Prior training sessions used roughly similar parameters; multiple sessions were required, as this takes eons to train._
+
 ### Training results
 
 
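The hyperparameter list above reads like a card auto-generated by the Hugging Face `Trainer`. Purely as an illustrative sketch (not the author's actual training script), the values shown in these hunks would map onto `Seq2SeqTrainingArguments` roughly as follows; the output directory is a placeholder, and fields not listed in the diff (optimizer, scheduler type, seed, gradient accumulation, etc.) are left out rather than guessed.

```python
# Illustrative sketch only: the hyperparameters listed in the diff above,
# expressed as Hugging Face Seq2SeqTrainingArguments. Anything not shown in
# the diff is deliberately omitted; output_dir is a placeholder.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./long-t5-book-summary",  # placeholder path
    learning_rate=4e-4,                   # learning_rate: 0.0004
    per_device_train_batch_size=2,        # train_batch_size: 2
    per_device_eval_batch_size=1,         # eval_batch_size: 1
    warmup_ratio=0.02,                    # lr_scheduler_warmup_ratio: 0.02
    num_train_epochs=2,                   # num_epochs: 2 (final round only)
)
```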
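Since the card states that all training used 16384-token inputs with a 1024-token output ceiling, a minimal usage sketch along those lines might look like the following. The checkpoint id is assumed from the model described here (a long-t5-tglobal-base book-summary fine-tune under the same namespace as the linked LED-base model), and the decoding settings are illustrative rather than recommendations from the card.

```python
# Minimal usage sketch (assumed checkpoint id and illustrative decoding settings).
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

checkpoint = "pszemraj/long-t5-tglobal-base-16384-book-summary"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

long_text = "..."  # the document or chapter to summarize
inputs = tokenizer(
    long_text,
    truncation=True,
    max_length=16384,        # matches the 16384-token training input length
    return_tensors="pt",
)
summary_ids = model.generate(
    **inputs,
    max_length=1024,         # matches the 1024-token max output used in training
    num_beams=4,             # illustrative decoding choice, not from the card
    no_repeat_ngram_size=3,  # illustrative
)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```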