lxyuan committed on
Commit
1e7a190
2 Parent(s): 8ea9e5f f61de99

Merge branch 'main' of https://huggingface.co/lxyuan/distilbart-finetuned-summarization into main

Files changed (2)
  1. README.md +123 -0
  2. model.safetensors +3 -0
README.md ADDED
@@ -0,0 +1,123 @@
+ ---
+ tags:
+ - generated_from_trainer
+ - distilbart
+ model-index:
+ - name: distilbart-finetuned-summarization
+   results: []
+ license: apache-2.0
+ datasets:
+ - cnn_dailymail
+ - xsum
+ - samsum
+ - ccdv/pubmed-summarization
+ language:
+ - en
+ metrics:
+ - rouge
+ ---
+
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->
+
+ # distilbart-finetuned-summarization
+
+ This model is a further fine-tuned version of [distilbart-cnn-12-6](https://huggingface.co/sshleifer/distilbart-cnn-12-6), trained on the combination of four summarization datasets:
+ - [cnn_dailymail](https://huggingface.co/datasets/cnn_dailymail)
+ - [samsum](https://huggingface.co/datasets/samsum)
+ - [xsum](https://huggingface.co/datasets/xsum)
+ - [ccdv/pubmed-summarization](https://huggingface.co/datasets/ccdv/pubmed-summarization)
+
+ Please check out the official model page and paper:
+ - [sshleifer/distilbart-cnn-12-6](https://huggingface.co/sshleifer/distilbart-cnn-12-6)
+ - [Pre-trained Summarization Distillation](https://arxiv.org/abs/2010.13002)
+
+ ## Training and evaluation data
+
+ The combined dataset can be reproduced with the following code:
+
+ ```python
+ from datasets import DatasetDict, concatenate_datasets, load_dataset
+
+ # Load the four datasets, renaming columns so that every dataset
+ # shares the same "document"/"summary" schema.
+ xsum_dataset = load_dataset("xsum")
+ pubmed_dataset = load_dataset("ccdv/pubmed-summarization").rename_column("article", "document").rename_column("abstract", "summary")
+ cnn_dataset = load_dataset("cnn_dailymail", "3.0.0").rename_column("article", "document").rename_column("highlights", "summary")
+ samsum_dataset = load_dataset("samsum").rename_column("dialogue", "document")
+
+ # Concatenate the corresponding splits of all four datasets.
+ summary_train = concatenate_datasets([xsum_dataset["train"], pubmed_dataset["train"], cnn_dataset["train"], samsum_dataset["train"]])
+ summary_validation = concatenate_datasets([xsum_dataset["validation"], pubmed_dataset["validation"], cnn_dataset["validation"], samsum_dataset["validation"]])
+ summary_test = concatenate_datasets([xsum_dataset["test"], pubmed_dataset["test"], cnn_dataset["test"], samsum_dataset["test"]])
+
+ raw_datasets = DatasetDict()
+ raw_datasets["train"] = summary_train
+ raw_datasets["validation"] = summary_validation
+ raw_datasets["test"] = summary_test
+ ```
+
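+ Before training, the combined splits need to be tokenized into model inputs. Below is a minimal preprocessing sketch assuming the base model's tokenizer; the truncation lengths are illustrative, not taken from the actual run (see the linked notebook in the training section for the exact setup):
+
+ ```python
+ from transformers import AutoTokenizer
+
+ tokenizer = AutoTokenizer.from_pretrained("sshleifer/distilbart-cnn-12-6")
+
+ # Illustrative limits: BART-family encoders accept up to 1024 positions.
+ max_input_length = 1024
+ max_target_length = 128
+
+ def preprocess_function(examples):
+     model_inputs = tokenizer(
+         examples["document"], max_length=max_input_length, truncation=True
+     )
+     labels = tokenizer(
+         text_target=examples["summary"], max_length=max_target_length, truncation=True
+     )
+     model_inputs["labels"] = labels["input_ids"]
+     return model_inputs
+
+ tokenized_datasets = raw_datasets.map(preprocess_function, batched=True)
+ ```
+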
+ ## Inference example
+
+ ```python
+ from transformers import pipeline
+
+ pipe = pipeline("text2text-generation", model="lxyuan/distilbart-finetuned-summarization")
+
+ text = """The tower is 324 metres (1,063 ft) tall, about the same height as
+ an 81-storey building, and the tallest structure in Paris. Its base is square,
+ measuring 125 metres (410 ft) on each side. During its construction, the
+ Eiffel Tower surpassed the Washington Monument to become the tallest man-made
+ structure in the world, a title it held for 41 years until the Chrysler Building
+ in New York City was finished in 1930. It was the first structure to reach a
+ height of 300 metres. Due to the addition of a broadcasting aerial at the top
+ of the tower in 1957, it is now taller than the Chrysler Building by 5.2 metres
+ (17 ft). Excluding transmitters, the Eiffel Tower is the second tallest
+ free-standing structure in France after the Millau Viaduct.
+ """
+
+ pipe(text)
+
+ # Output:
+ # "The Eiffel Tower is the tallest man-made structure in the world .
+ # The tower is 324 metres tall, about the same height as an 81-storey building .
+ # Due to the addition of a broadcasting aerial in 1957, it is now taller than
+ # the Chrysler Building by 5.2 metres ."
+ ```
+
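+ For finer control over the summary (length bounds, beam search), the model can also be loaded directly. A sketch reusing `text` from above; the generation parameter values are illustrative assumptions, not settings confirmed for this model:
+
+ ```python
+ from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
+
+ tokenizer = AutoTokenizer.from_pretrained("lxyuan/distilbart-finetuned-summarization")
+ model = AutoModelForSeq2SeqLM.from_pretrained("lxyuan/distilbart-finetuned-summarization")
+
+ inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=1024)
+ summary_ids = model.generate(
+     **inputs,
+     num_beams=4,    # illustrative beam size
+     min_length=56,  # illustrative length bounds
+     max_length=142,
+ )
+ print(tokenizer.batch_decode(summary_ids, skip_special_tokens=True)[0])
+ ```
+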
+ ## Training procedure
+
+ Notebook link: [here](https://github.com/LxYuan0420/nlp/blob/main/notebooks/distilbart-finetune-summarisation.ipynb)
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training (a `Seq2SeqTrainingArguments` sketch follows the list):
+ - evaluation_strategy="epoch"
+ - save_strategy="epoch"
+ - logging_strategy="epoch"
+ - learning_rate=2e-5
+ - per_device_train_batch_size=2
+ - per_device_eval_batch_size=2
+ - gradient_accumulation_steps=64
+ - total_train_batch_size: 128
+ - optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
+ - lr_scheduler_type: linear
+ - weight_decay=0.01
+ - save_total_limit=2
+ - num_train_epochs=10
+ - predict_with_generate=True
+ - fp16=True
+ - push_to_hub=True
+
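+ These settings map onto the `Seq2SeqTrainer` API roughly as follows. This is a sketch, not the exact training script: `output_dir` is an assumed name, and `model`, `tokenizer`, and `tokenized_datasets` are assumed to come from the earlier snippets (the linked notebook has the authoritative setup):
+
+ ```python
+ from transformers import (
+     DataCollatorForSeq2Seq,
+     Seq2SeqTrainer,
+     Seq2SeqTrainingArguments,
+ )
+
+ args = Seq2SeqTrainingArguments(
+     output_dir="distilbart-finetuned-summarization",  # assumed name
+     evaluation_strategy="epoch",
+     save_strategy="epoch",
+     logging_strategy="epoch",
+     learning_rate=2e-5,
+     per_device_train_batch_size=2,
+     per_device_eval_batch_size=2,
+     gradient_accumulation_steps=64,  # 2 * 64 = total train batch size 128
+     weight_decay=0.01,
+     save_total_limit=2,
+     num_train_epochs=10,
+     predict_with_generate=True,
+     fp16=True,
+     push_to_hub=True,
+ )
+
+ trainer = Seq2SeqTrainer(
+     model=model,
+     args=args,
+     train_dataset=tokenized_datasets["train"],
+     eval_dataset=tokenized_datasets["validation"],
+     tokenizer=tokenizer,
+     data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
+ )
+ # trainer.train()
+ ```
+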
+ ### Training results
+
+ _Training is still in progress._
+
+ | Epoch | Training Loss | Validation Loss | Rouge1  | Rouge2  | RougeL  | RougeLsum | Gen Len |
+ |-------|---------------|-----------------|---------|---------|---------|-----------|---------|
+ | 0     | 1.779700      | 1.719054        | 40.0039 | 17.9071 | 27.8825 | 34.8886   | 88.8936 |
+
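+ ROUGE scores like those above are typically computed during evaluation with a `compute_metrics` function passed to the trainer. A sketch along the lines of the standard Hugging Face summarization example, assuming the `evaluate` library and the `tokenizer` from the earlier snippets:
+
+ ```python
+ import evaluate
+ import numpy as np
+
+ rouge = evaluate.load("rouge")
+
+ def compute_metrics(eval_pred):
+     predictions, labels = eval_pred
+     decoded_preds = tokenizer.batch_decode(predictions, skip_special_tokens=True)
+     # Labels are padded with -100 for the loss; restore pad tokens before decoding.
+     labels = np.where(labels != -100, labels, tokenizer.pad_token_id)
+     decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)
+     result = rouge.compute(
+         predictions=decoded_preds, references=decoded_labels, use_stemmer=True
+     )
+     return {k: round(v * 100, 4) for k, v in result.items()}
+ ```
+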
+ ### Framework versions
+
+ - Transformers 4.30.2
+ - Pytorch 2.0.1+cu117
+ - Datasets 2.13.1
+ - Tokenizers 0.13.3
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8df57a92dddf33720eb147d0329d3a7dcdcce282dc0c8fe33e89e2be3e4a858e
+ size 1222284056