long-t5-tglobal-base-sci-simplify: elife subset

Exploring how well long-document models trained on "lay summaries" of scientific papers generalize.

A lay summary is a summary of a research paper or scientific study that is written in plain language, without the use of technical jargon, and is designed to be easily understood by non-experts.

Model description

This model is a fine-tuned version of google/long-t5-tglobal-base on the pszemraj/scientific_lay_summarisation-elife-norm dataset.

  • The variant trained on the PLOS subset can be found here

Usage

Beam search decoding is recommended for this model. The textsum utility package abstracts most of the setup. Install it:

pip install -U textsum

Then, in Python:

from textsum.summarize import Summarizer

model_name = "pszemraj/long-t5-tglobal-base-sci-simplify-elife"
summarizer = Summarizer(model_name)  # GPU auto-detected
text = "put the text you don't want to read here"
summary = summarizer.summarize_string(text)
print(summary)
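
For readers who prefer to call transformers directly, the beam search recommendation can be sketched roughly as below. The decoding values (num_beams=4, max_length=512, no_repeat_ngram_size=3) and the helper names beam_search_kwargs and summarize are illustrative assumptions, not settings documented for this model.

```python
from typing import Any, Dict

MODEL_NAME = "pszemraj/long-t5-tglobal-base-sci-simplify-elife"


def beam_search_kwargs(num_beams: int = 4, max_length: int = 512) -> Dict[str, Any]:
    """Generation settings for beam search (values are assumptions; tune as needed)."""
    if num_beams < 2:
        raise ValueError("beam search needs num_beams >= 2")
    return {
        "num_beams": num_beams,       # number of hypotheses kept at each step
        "max_length": max_length,     # cap on generated tokens
        "no_repeat_ngram_size": 3,    # discourage repeated trigrams
        "early_stopping": True,       # stop once enough finished beams exist
        "do_sample": False,           # deterministic decoding, no sampling
    }


def summarize(text: str, **overrides: Any) -> str:
    """Summarize `text` with the transformers summarization pipeline."""
    # Imported lazily so the helper above can be used without transformers installed.
    from transformers import pipeline

    summarizer = pipeline("summarization", model=MODEL_NAME)
    kwargs = {**beam_search_kwargs(), **overrides}
    result = summarizer(text, **kwargs)
    return result[0]["summary_text"]
```

Calling summarize("put the text you don't want to read here") downloads the model on first use; pass overrides such as num_beams=8 to trade speed for search breadth.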

Intended uses & limitations

  • The model's ability to generalize beyond the dataset domain (PubMed/bioscience-type papers) still needs to be evaluated.

Training and evaluation data

The eLife subset of the lay summaries dataset; see pszemraj/scientific_lay_summarisation-elife-norm for details.

Training procedure

Eval results

The model achieves the following results on the evaluation set:

  • Loss: 1.9990
  • Rouge1: 38.5587
  • Rouge2: 9.7336
  • Rougel: 21.1974
  • Rougelsum: 35.9333
  • Gen Len: 392.7095

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0004
  • train_batch_size: 4
  • eval_batch_size: 2
  • seed: 42
  • distributed_type: multi-GPU
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 64
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.01
  • num_epochs: 3.0
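
As a quick sanity check on the list above, the total train batch size is conventionally the per-device batch size times the gradient accumulation steps times the number of devices. The sketch below assumes that convention; the function name and the num_devices value are assumptions, since the card reports multi-GPU but does not list a device count.

```python
def effective_batch_size(per_device_batch: int, grad_accum_steps: int, num_devices: int = 1) -> int:
    """Number of samples contributing to one optimizer step."""
    return per_device_batch * grad_accum_steps * num_devices


# train_batch_size and gradient_accumulation_steps come from the list above.
# num_devices=1 is an assumption: it is the value under which this formula
# reproduces the reported total_train_batch_size of 64.
total = effective_batch_size(per_device_batch=4, grad_accum_steps=16, num_devices=1)
print(total)  # 64
```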

Training results

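| Training Loss | Epoch | Step | Validation Loss | Rouge1  | Rouge2 | Rougel  | Rougelsum | Gen Len  |
|---------------|-------|------|-----------------|---------|--------|---------|-----------|----------|
| 2.2995        | 1.47  | 100  | 2.0175          | 35.2501 | 8.2121 | 20.4587 | 32.4494   | 439.7552 |
| 2.2171        | 2.94  | 200  | 1.9990          | 38.5587 | 9.7336 | 21.1974 | 35.9333   | 392.7095 |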
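
The epoch and step columns also allow a rough estimate of how many optimizer steps make up one epoch, and from that how many training samples were seen per epoch. This is only a back-of-the-envelope sketch: the epoch values are rounded to two decimals, the helper name is hypothetical, and the batch size of 64 is taken from the hyperparameters reported in this card.

```python
def steps_per_epoch(step: int, epoch: float) -> float:
    """Estimate optimizer steps per epoch from one (step, epoch) log row."""
    return step / epoch


# (step, epoch) pairs from the two logged evaluation rows.
estimates = [steps_per_epoch(100, 1.47), steps_per_epoch(200, 2.94)]
avg_steps = sum(estimates) / len(estimates)

# 64 is the reported total_train_batch_size.
approx_samples_per_epoch = avg_steps * 64

print(round(avg_steps))                 # 68
print(round(approx_samples_per_epoch))  # 4354
```

Both rows give the same ratio, so the estimate is at least internally consistent; it should not be read as an exact dataset size.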

Model size: 248M params (Safetensors, tensor type F32)