A fine-tuned version of [google/long-t5-tglobal-base](https://huggingface.co/google/long-t5-tglobal-base) on the `kmfoda/booksum` dataset:

- 30+ epochs of fine-tuning from the base model on V100/A100 GPUs
- all training used a 16384-token input length and a 1024-token maximum output (see the usage sketch below)

Read the paper by Guo et al. here: [LongT5: Efficient Text-To-Text Transformer for Long Sequences](https://arxiv.org/pdf/2112.07916.pdf)
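To make those limits concrete, here is a minimal usage sketch with the `transformers` summarization pipeline. It is not this card's official snippet: the base checkpoint id below is only a stand-in (substitute this fine-tuned repo's id), and the beam-search values are illustrative rather than the card's recommended settings.

```python
# Illustrative sketch only -- not this card's official usage example.
# The base checkpoint id is a stand-in; swap in this fine-tuned repo's id.
from transformers import pipeline

summarizer = pipeline(
    "summarization",
    model="google/long-t5-tglobal-base",  # placeholder for this repo's checkpoint
)

long_document = "..."  # up to ~16384 LongT5 tokens of input text

result = summarizer(
    long_document,
    max_length=1024,         # matches the 1024-token output cap used in training
    num_beams=4,             # illustrative beam-search settings, not from the card
    no_repeat_ngram_size=3,
    truncation=True,         # inputs beyond the 16384-token limit are truncated
)
print(result[0]["summary_text"])
```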
## Training and evaluation data

The `kmfoda/booksum` dataset on HuggingFace - read [the original paper here](https://arxiv.org/abs/2105.08209). Summaries longer than 1024 LongT5 tokens were filtered out to prevent the model from learning to generate "partial" summaries.

_NOTE: early checkpoints of this model were trained on a "smaller" subsection of the dataset, as it was filtered for summaries of **1024 characters**. This was subsequently caught, the filter was adjusted to **1024 tokens**, and the model was then trained further for 10+ epochs._
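The filtering step above can be sketched with `datasets` as follows. This is not the author's preprocessing script, and the `summary_text` column name is an assumption about the `kmfoda/booksum` schema.

```python
# Hedged sketch of the token-length filter; not the author's preprocessing code.
# The "summary_text" column name is an assumption about the kmfoda/booksum schema.
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/long-t5-tglobal-base")
booksum = load_dataset("kmfoda/booksum", split="train")

def summary_fits(example):
    # Count LongT5 tokens, not characters (see the note above about the
    # earlier 1024-character filter that was later corrected to tokens).
    n_tokens = len(tokenizer(example["summary_text"]).input_ids)
    return n_tokens <= 1024

booksum = booksum.filter(summary_fits)
print(f"{len(booksum)} examples kept with summaries of 1024 tokens or fewer")
```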
## Training procedure

### Training hyperparameters

The following hyperparameters were used during the **most recent** training round\* (see the configuration sketch after the list):

- learning_rate: 0.0006
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- num_epochs: 2

\*_Prior training sessions used roughly similar parameters; multiple sessions were required as this takes aeons to train._
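For reference, the values above map onto Hugging Face `Seq2SeqTrainingArguments` roughly as sketched below. This is not the author's training script; settings not listed in this excerpt (optimizer, LR scheduler, gradient accumulation, and so on) are left at their defaults rather than guessed, and the output directory is a placeholder.

```python
# Hedged sketch: the listed hyperparameters expressed as Seq2SeqTrainingArguments.
# Not the author's training script; unlisted settings are left at defaults.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="long-t5-booksum-ft",  # hypothetical output path
    learning_rate=6e-4,               # 0.0006, as listed
    per_device_train_batch_size=1,    # train_batch_size: 1
    per_device_eval_batch_size=1,     # eval_batch_size: 1
    seed=42,
    num_train_epochs=2,
)
```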
### Training results