
karnold-walmer-base-biopapers

Karnold-Walmer is a text2text model based on google/long-t5-tglobal-base, fine-tuned to generate the 'keywords' column of pszemraj/scientific_lay_summarisation-plos-norm.

Karnold-Walmer extracts relevant keywords from the input text, which makes it useful for keyword identification and text classification. It was fine-tuned on, and supports, text inputs of up to 16,384 tokens.

It achieves the following results on the evaluation set:

  • Loss: 0.8844
  • Rouge1: 46.7593
  • Rouge2: 28.3538
  • Rougel: 42.2921
  • Rougelsum: 42.2774
  • Gen Len: 78.1706
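
To give a feel for what a Rouge1 figure measures on keyword output, here is a minimal sketch of unigram-overlap F1. It is a simplification (lowercasing and whitespace splitting only, no stemming) and is not the evaluation code behind the numbers above. The example strings are made up, and the 0-100 scaling is an assumption based on how the scores above appear to be reported.

```python
from collections import Counter


def rouge1_f1(prediction: str, reference: str) -> float:
    """Simplified ROUGE-1 F1: unigram overlap between two strings."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        return 0.0
    # Clipped overlap: each token counts at most as often as it occurs in both.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


# Hypothetical keyword strings, for illustration only.
score = rouge1_f1("gene expression cancer", "gene expression biology")
print(round(100 * score, 2))  # scaled to 0-100; prints 66.67
```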

Intended Uses & Limitations

Karnold-Walmer is intended to be used for keyword extraction and text classification in various domains, such as scientific literature, biomedical research articles, and more. By analyzing the content of an input text, the model generates a list of relevant keywords that describe the topic of the article.

Note, however, that Karnold-Walmer is trained specifically to generate text similar to the "keywords" column and is not designed for summarization. For accurate keyword extraction and text classification, use the model within the limits of its training data and intended purpose (see what happens when you try the out-of-domain API examples).
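
A minimal usage sketch with the `transformers` library follows. It assumes the model is hosted under the Hub id `pszemraj/karnold-walmer-base-biopapers`. The helper names `split_keywords` and `extract_keywords`, the comma separator, and the `max_new_tokens` value are illustrative assumptions, not part of the model's documented interface, so inspect real outputs before relying on the splitting logic.

```python
MODEL_ID = "pszemraj/karnold-walmer-base-biopapers"  # assumed Hub id
MAX_INPUT_TOKENS = 16384  # input limit stated in this card


def split_keywords(generated: str, sep: str = ",") -> list[str]:
    """Split a generated keyword string into a de-duplicated list.

    ASSUMPTION: keywords are separated by `sep`; adjust after inspecting outputs.
    """
    seen, keywords = set(), []
    for part in generated.split(sep):
        kw = part.strip()
        if kw and kw.lower() not in seen:
            seen.add(kw.lower())
            keywords.append(kw)
    return keywords


def extract_keywords(text: str, max_new_tokens: int = 128) -> list[str]:
    """Generate keywords for `text` (hypothetical helper, downloads the model)."""
    # Imported lazily so split_keywords works without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_ID)
    # Truncate inputs that exceed the supported context length.
    inputs = tokenizer(
        text, truncation=True, max_length=MAX_INPUT_TOKENS, return_tensors="pt"
    )
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    decoded = tokenizer.decode(output_ids[0], skip_special_tokens=True)
    return split_keywords(decoded)
```

Calling `extract_keywords(article_text)` on a full-text biomedical article should return a list of keyword strings. Inputs far from that domain are outside what the model was trained for.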

Training and Evaluation Data

Karnold-Walmer was trained on the PLOS dataset, which contains full biomedical research articles paired with expert-written lay summaries and keyword lists. The model was tuned to decode the "keywords" column in the dataset, focusing on keyword extraction and text classification tasks.

Wordcloud

[Image: wordcloud-kw]

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0004
  • train_batch_size: 4
  • eval_batch_size: 2
  • seed: 42
  • distributed_type: multi-GPU
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.01
  • num_epochs: 2.0
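
As a rough illustration of how the learning rate, warmup ratio, and cosine scheduler listed above interact, the sketch below computes the learning rate at a given optimizer step. The total step count is an assumption: it is estimated from the training log, where step 100 corresponds to epoch 0.15 (about 667 steps per epoch, about 1333 for 2 epochs). The function is a hand-written approximation of a linear-warmup, cosine-decay schedule, not the training code itself.

```python
import math

BASE_LR = 4e-4        # learning_rate from the list above
WARMUP_RATIO = 0.01   # lr_scheduler_warmup_ratio from the list above
TOTAL_STEPS = 1333    # ASSUMPTION: ~667 steps/epoch x 2 epochs, estimated


def cosine_lr(step: int, total_steps: int = TOTAL_STEPS) -> float:
    """Learning rate at `step` with linear warmup followed by cosine decay."""
    warmup_steps = math.ceil(WARMUP_RATIO * total_steps)
    if step < warmup_steps:
        # Linear ramp from 0 up to BASE_LR during warmup.
        return BASE_LR * step / max(1, warmup_steps)
    # Cosine decay from BASE_LR down to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


for step in (0, 14, 700, 1333):
    print(step, f"{cosine_lr(step):.6f}")
```

Under these assumptions warmup lasts 14 steps, so the rate peaks at 0.0004 almost immediately and then decays smoothly toward zero by the end of training.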

Training results

| Training Loss | Epoch | Step | Validation Loss | Rouge1 | Rouge2 | Rougel | Rougelsum | Gen Len |
|---|---|---|---|---|---|---|---|---|
| 2.0471 | 0.15 | 100 | 1.6138 | 12.4374 | 4.1861 | 11.1863 | 11.1833 | 324.6971 |
| 1.5654 | 0.3 | 200 | 1.3447 | 23.9982 | 11.1431 | 21.4173 | 21.4413 | 176.0294 |
| 1.3467 | 0.45 | 300 | 1.2038 | 33.8084 | 18.1588 | 30.4748 | 30.4142 | 107.7735 |
| 1.4398 | 0.6 | 400 | 1.1054 | 37.772 | 20.8967 | 33.859 | 33.8324 | 102.9029 |
| 1.306 | 0.75 | 500 | 1.0478 | 39.2642 | 22.0388 | 35.6578 | 35.5773 | 91.1235 |
| 1.1677 | 0.9 | 600 | 0.9994 | 40.5149 | 22.8507 | 36.3888 | 36.3499 | 103.9118 |
| 1.078 | 1.05 | 700 | 0.9627 | 42.301 | 24.2523 | 38.0739 | 38.0532 | 88.4941 |
| 1.0942 | 1.2 | 800 | 0.9443 | 44.5907 | 26.2046 | 39.7461 | 39.6763 | 88.7559 |
| 1.0209 | 1.35 | 900 | 0.9108 | 45.357 | 26.861 | 40.6411 | 40.706 | 90.1206 |
| 1.1161 | 1.5 | 1000 | 0.9026 | 47.1362 | 28.6605 | 42.6406 | 42.6108 | 79.2412 |
| 1.1224 | 1.65 | 1100 | 0.8907 | 47.31 | 28.4395 | 42.6658 | 42.6509 | 78.4265 |
| 0.9857 | 1.8 | 1200 | 0.8862 | 46.7061 | 28.1586 | 42.3181 | 42.3105 | 80.5059 |
| 1.0011 | 1.95 | 1300 | 0.8844 | 46.7593 | 28.3538 | 42.2921 | 42.2774 | 78.1706 |
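
The training log can be checked programmatically. The short sketch below copies the step, validation loss, and Rouge1 columns from the results above and finds the best checkpoint by each criterion. The variable names are illustrative.

```python
# (step, validation_loss, rouge1) copied from the training results above
rows = [
    (100, 1.6138, 12.4374),
    (200, 1.3447, 23.9982),
    (300, 1.2038, 33.8084),
    (400, 1.1054, 37.772),
    (500, 1.0478, 39.2642),
    (600, 0.9994, 40.5149),
    (700, 0.9627, 42.301),
    (800, 0.9443, 44.5907),
    (900, 0.9108, 45.357),
    (1000, 0.9026, 47.1362),
    (1100, 0.8907, 47.31),
    (1200, 0.8862, 46.7061),
    (1300, 0.8844, 46.7593),
]

best_by_loss = min(rows, key=lambda r: r[1])    # lowest validation loss
best_by_rouge1 = max(rows, key=lambda r: r[2])  # highest Rouge1

print("lowest validation loss:", best_by_loss)
print("highest Rouge1:", best_by_rouge1)
```

Validation loss falls at every logged step and is lowest at step 1300, which matches the reported evaluation results, while Rouge1 peaks slightly earlier, at step 1100.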

Model size: 248M params (Safetensors, tensor type F32)