---
tags:
- generated_from_trainer
- distilbart
model-index:
- name: distilbart-finetuned-summarization
  results: []
license: apache-2.0
datasets:
- cnn_dailymail
- xsum
- samsum
- ccdv/pubmed-summarization
language:
- en
metrics:
- rouge
---

# distilbart-finetuned-summarization

This model is a further fine-tuned version of [distilbart-cnn-12-6](https://huggingface.co/sshleifer/distilbart-cnn-12-6) on the combination of four summarization datasets:
- [cnn_dailymail](https://huggingface.co/datasets/cnn_dailymail)
- [samsum](https://huggingface.co/datasets/samsum)
- [xsum](https://huggingface.co/datasets/xsum)
- [ccdv/pubmed-summarization](https://huggingface.co/datasets/ccdv/pubmed-summarization)

Please check out the official model page and paper:
- [sshleifer/distilbart-cnn-12-6](https://huggingface.co/sshleifer/distilbart-cnn-12-6)
- [Pre-trained Summarization Distillation](https://arxiv.org/abs/2010.13002)

## Training and evaluation data

One can reproduce the dataset using the following code:

```python
from datasets import DatasetDict, concatenate_datasets, load_dataset

# Load the four datasets and align their column names to ("document", "summary")
xsum_dataset = load_dataset("xsum")
pubmed_dataset = load_dataset("ccdv/pubmed-summarization").rename_column("article", "document").rename_column("abstract", "summary")
cnn_dataset = load_dataset("cnn_dailymail", "3.0.0").rename_column("article", "document").rename_column("highlights", "summary")
samsum_dataset = load_dataset("samsum").rename_column("dialogue", "document")

# Concatenate the matching splits of all four datasets
summary_train = concatenate_datasets([xsum_dataset["train"], pubmed_dataset["train"], cnn_dataset["train"], samsum_dataset["train"]])
summary_validation = concatenate_datasets([xsum_dataset["validation"], pubmed_dataset["validation"], cnn_dataset["validation"], samsum_dataset["validation"]])
summary_test = concatenate_datasets([xsum_dataset["test"], pubmed_dataset["test"], cnn_dataset["test"], samsum_dataset["test"]])

raw_datasets = DatasetDict(
    {
        "train": summary_train,
        "validation": summary_validation,
        "test": summary_test,
    }
)
```
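Tokenization is not shown above. As a rough sketch only (the maximum lengths and the `remove_columns` choice here are assumptions for illustration, not values taken from the training notebook), a standard seq2seq preprocessing step over the combined `document`/`summary` columns could look like this:

```python
from transformers import AutoTokenizer

# Tokenizer of the base checkpoint; the fine-tuned model uses the same vocabulary
tokenizer = AutoTokenizer.from_pretrained("sshleifer/distilbart-cnn-12-6")

# Illustrative limits, not the exact values used for this model
max_input_length = 1024
max_target_length = 128

def preprocess_function(examples):
    model_inputs = tokenizer(
        examples["document"], max_length=max_input_length, truncation=True
    )
    labels = tokenizer(
        text_target=examples["summary"], max_length=max_target_length, truncation=True
    )
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

# Drop the raw text columns so only model inputs remain
tokenized_datasets = raw_datasets.map(
    preprocess_function,
    batched=True,
    remove_columns=raw_datasets["train"].column_names,
)
```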

## Inference example

```python
from transformers import pipeline

pipe = pipeline("text2text-generation", model="lxyuan/distilbart-finetuned-summarization")

text = """The tower is 324 metres (1,063 ft) tall, about the same height as
an 81-storey building, and the tallest structure in Paris. Its base is square,
measuring 125 metres (410 ft) on each side. During its construction, the
Eiffel Tower surpassed the Washington Monument to become the tallest man-made
structure in the world, a title it held for 41 years until the Chrysler Building
in New York City was finished in 1930. It was the first structure to reach a
height of 300 metres. Due to the addition of a broadcasting aerial at the top
of the tower in 1957, it is now taller than the Chrysler Building by 5.2 metres
(17 ft). Excluding transmitters, the Eiffel Tower is the second tallest
free-standing structure in France after the Millau Viaduct.
"""

print(pipe(text)[0]["generated_text"])

# Output:
# The Eiffel Tower is the tallest man-made structure in the world .
# The tower is 324 metres tall, about the same height as an 81-storey building .
# Due to the addition of a broadcasting aerial in 1957, it is now taller than
# the Chrysler Building by 5.2 metres .
```
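
Since this is a standard seq2seq summarization checkpoint, it should also work with the dedicated `summarization` pipeline, which returns a `summary_text` field instead of `generated_text` (the generation lengths below are illustrative, not tuned values):

```python
from transformers import pipeline

summarizer = pipeline("summarization", model="lxyuan/distilbart-finetuned-summarization")

# max_length / min_length are illustrative generation limits
print(summarizer(text, max_length=130, min_length=30)[0]["summary_text"])
```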

## Training procedure

Notebook link: [here](https://github.com/LxYuan0420/nlp/blob/main/notebooks/distilbart-finetune-summarisation.ipynb)

### Training hyperparameters

The following hyperparameters were used during training:
- evaluation_strategy: epoch
- save_strategy: epoch
- logging_strategy: epoch
- learning_rate: 2e-05
- per_device_train_batch_size: 2
- per_device_eval_batch_size: 2
- gradient_accumulation_steps: 64
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- weight_decay: 0.01
- save_total_limit: 2
- num_train_epochs: 10
- predict_with_generate: True
- fp16: True
- push_to_hub: True
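
Roughly, the hyperparameters above correspond to a `Seq2SeqTrainingArguments` configuration like the following (a minimal sketch; the `output_dir` name and any argument not listed above are assumptions):

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="distilbart-finetuned-summarization",  # assumed name, for illustration
    evaluation_strategy="epoch",
    save_strategy="epoch",
    logging_strategy="epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=64,  # effective batch size of 128 with per-device batch size 2
    weight_decay=0.01,
    save_total_limit=2,
    num_train_epochs=10,
    predict_with_generate=True,
    fp16=True,
    push_to_hub=True,
)
```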

### Training results
_Training is still in progress._

| Epoch | Training Loss | Validation Loss | Rouge1 | Rouge2 | RougeL | RougeLsum | Gen Len |
|-------|---------------|-----------------|--------|--------|--------|-----------|---------|
| 0     | 1.779700      | 1.719054        | 40.0039| 17.9071| 27.8825| 34.8886   | 88.8936 |
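
The ROUGE scores above come from generation-based evaluation (`predict_with_generate=True`). A sketch of the kind of `compute_metrics` function typically used for this setup follows; the exact implementation in the notebook may differ:

```python
import numpy as np
import evaluate
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("sshleifer/distilbart-cnn-12-6")
rouge = evaluate.load("rouge")

def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    decoded_preds = tokenizer.batch_decode(predictions, skip_special_tokens=True)
    # Labels use -100 for ignored positions; swap them for the pad token before decoding
    labels = np.where(labels != -100, labels, tokenizer.pad_token_id)
    decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)
    result = rouge.compute(predictions=decoded_preds, references=decoded_labels, use_stemmer=True)
    # Scale scores to percentages and report the average generated length
    result = {key: value * 100 for key, value in result.items()}
    prediction_lens = [np.count_nonzero(pred != tokenizer.pad_token_id) for pred in predictions]
    result["gen_len"] = np.mean(prediction_lens)
    return {k: round(v, 4) for k, v in result.items()}
```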

### Framework versions

- Transformers 4.30.2
- Pytorch 2.0.1+cu117
- Datasets 2.13.1
- Tokenizers 0.13.3