long-t5-tglobal-base-16384-booksci-summary: v1
An experiment investigating transfer learning capabilities by fine-tuning models on different datasets, starting from the booksum checkpoint.
Model Details
This model is a fine-tuned version of pszemraj/long-t5-tglobal-base-16384-book-summary on the pszemraj/scientific_lay_summarisation-elife-norm dataset for two epochs.
Usage
It's recommended to use this model with beam search decoding. If interested, you can also use the textsum util repo to have most of this abstracted out for you:

```bash
pip install -U textsum
```

```python
from textsum.summarize import Summarizer

model_name = "pszemraj/long-t5-tglobal-base-16384-booksci-summary-v1"
summarizer = Summarizer(model_name)  # GPU auto-detected
text = "put the text you don't want to read here"
summary = summarizer.summarize_string(text)
print(summary)
```
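The model accepts inputs of up to 16,384 tokens; for longer documents the text has to be split into windows and summarized piecewise (textsum handles this for you with token-aware batching). A minimal character-based sketch of the idea, purely for illustration (the `chunk_text` helper and its sizes are hypothetical, not part of textsum's API):

```python
def chunk_text(text: str, max_chars: int = 4096, overlap: int = 256) -> list[str]:
    """Split text into overlapping character windows.

    Illustrative only: a real pipeline (e.g. textsum) splits on *tokens*
    so each window fits the model's 16,384-token context.
    """
    if max_chars <= overlap:
        raise ValueError("max_chars must exceed overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + max_chars])
        start += max_chars - overlap  # step forward, keeping some overlap
    return chunks
```

Each chunk would then be summarized independently and the partial summaries concatenated (or summarized again).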
Intended uses & limitations
- This is an initial experiment
- Domain generalization abilities at time of writing are unknown
Training procedure
Note: this model was trained at a lower learning rate and not to full convergence, with the intention of retaining some of the properties learned from the initial fine-tuning on booksum.
Results
It achieves the following results on the evaluation set:
- Loss: 2.3994
- ROUGE-1: 34.2428
- ROUGE-2: 4.3644
- ROUGE-L: 12.5332
- ROUGE-Lsum: 30.6965
- Gen Len: 294.0249
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 3e-05
- train_batch_size: 4
- eval_batch_size: 2
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 16
- total_train_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 2.0
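The reported total_train_batch_size of 64 follows directly from the per-device batch size and gradient accumulation. A quick sanity check (num_devices=1 is an assumption here, since 4 × 16 already accounts for the reported total despite the multi-GPU setting):

```python
# Values copied from the hyperparameters above.
per_device_train_batch_size = 4
gradient_accumulation_steps = 16
num_devices = 1  # assumed; 4 * 16 alone yields the reported total of 64

total_train_batch_size = (
    per_device_train_batch_size * gradient_accumulation_steps * num_devices
)
print(total_train_batch_size)  # 64
```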
Training results
| Training Loss | Epoch | Step | Validation Loss | ROUGE-1 | ROUGE-2 | ROUGE-L | ROUGE-Lsum | Gen Len |
|---|---|---|---|---|---|---|---|---|
| 2.7492 | 0.99 | 67 | 2.4272 | 34.6436 | 4.4536 | 12.4985 | 30.916 | 300.7635 |
| 2.6689 | 1.97 | 134 | 2.3994 | 34.2428 | 4.3644 | 12.5332 | 30.6965 | 294.0249 |
Evaluation results
Verified results on the kmfoda/booksum test set:
- ROUGE-1: 36.798
- ROUGE-2: 6.100
- ROUGE-L: 16.504
- ROUGE-Lsum: 33.613
- loss: 2.393
- gen_len: 279.916