lxyuan committed
Commit cb7a542
1 Parent(s): aa42b2a

Create README.md

Files changed (1)
  1. README.md +117 -0

README.md ADDED
---
tags:
- generated_from_trainer
- distilbart
model-index:
- name: distilbart-finetuned-summarization
  results: []
license: apache-2.0
datasets:
- cnn_dailymail
- xsum
- samsum
- ccdv/pubmed-summarization
language:
- en
metrics:
- rouge
---

# distilbart-finetuned-summarization

This model is a fine-tuned version of [sshleifer/distilbart-cnn-12-6](https://huggingface.co/sshleifer/distilbart-cnn-12-6) on a combination of four summarization datasets:
- [cnn_dailymail](https://huggingface.co/datasets/cnn_dailymail)
- [samsum](https://huggingface.co/datasets/samsum)
- [xsum](https://huggingface.co/datasets/xsum)
- [ccdv/pubmed-summarization](https://huggingface.co/datasets/ccdv/pubmed-summarization)

## Training and evaluation data

The combined dataset can be reproduced with the following code:

```python
from datasets import DatasetDict, concatenate_datasets, load_dataset

# Load the four datasets and standardise the source/target columns
# to ("document", "summary") so the splits can be concatenated.
xsum_dataset = load_dataset("xsum")
pubmed_dataset = load_dataset("ccdv/pubmed-summarization").rename_column("article", "document").rename_column("abstract", "summary")
cnn_dataset = load_dataset("cnn_dailymail", "3.0.0").rename_column("article", "document").rename_column("highlights", "summary")
samsum_dataset = load_dataset("samsum").rename_column("dialogue", "document")

# Concatenate the corresponding splits of all four datasets.
summary_train = concatenate_datasets([xsum_dataset["train"], pubmed_dataset["train"], cnn_dataset["train"], samsum_dataset["train"]])
summary_validation = concatenate_datasets([xsum_dataset["validation"], pubmed_dataset["validation"], cnn_dataset["validation"], samsum_dataset["validation"]])
summary_test = concatenate_datasets([xsum_dataset["test"], pubmed_dataset["test"], cnn_dataset["test"], samsum_dataset["test"]])

raw_datasets = DatasetDict()
raw_datasets["train"] = summary_train
raw_datasets["validation"] = summary_validation
raw_datasets["test"] = summary_test
```
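
Note that `concatenate_datasets` requires every dataset to share the same features, and these four do not line up exactly (for example, xsum, cnn_dailymail, and samsum carry an `id` column that ccdv/pubmed-summarization lacks). A minimal sketch of aligning the schemas first, assuming only the `document` and `summary` columns are kept:

```python
# Drop every column except "document" and "summary" so all four
# datasets share identical features before concatenation.
def keep_document_summary(dataset_dict):
    extra = [c for c in dataset_dict["train"].column_names if c not in ("document", "summary")]
    return dataset_dict.remove_columns(extra)

xsum_dataset = keep_document_summary(xsum_dataset)
cnn_dataset = keep_document_summary(cnn_dataset)
samsum_dataset = keep_document_summary(samsum_dataset)
```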

## Inference example

```python
from transformers import pipeline

pipe = pipeline("summarization", model="lxyuan/distilbart-finetuned-summarization")

text = """The tower is 324 metres (1,063 ft) tall, about the same height as
an 81-storey building, and the tallest structure in Paris. Its base is square,
measuring 125 metres (410 ft) on each side. During its construction, the
Eiffel Tower surpassed the Washington Monument to become the tallest man-made
structure in the world, a title it held for 41 years until the Chrysler Building
in New York City was finished in 1930. It was the first structure to reach a
height of 300 metres. Due to the addition of a broadcasting aerial at the top
of the tower in 1957, it is now taller than the Chrysler Building by 5.2 metres
(17 ft). Excluding transmitters, the Eiffel Tower is the second tallest
free-standing structure in France after the Millau Viaduct.
"""

pipe(text)
>>> """The Eiffel Tower is the tallest man-made structure in the world .
The tower is 324 metres tall, about the same height as an 81-storey building .
Due to the addition of a broadcasting aerial in 1957, it is now taller than
the Chrysler Building by 5.2 metres .
"""
```
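
The pipeline returns a list of dictionaries; with the `summarization` task the generated text sits under the `summary_text` key, so the string shown above corresponds to `pipe(text)[0]["summary_text"]`.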

## Training procedure

Notebook link: [here](https://github.com/LxYuan0420/nlp/blob/main/notebooks/distilbart-finetune-summarisation.ipynb)

### Training hyperparameters

The following hyperparameters were used during training (a sketch of the matching `Seq2SeqTrainingArguments` follows the list):
- evaluation_strategy: epoch
- save_strategy: epoch
- logging_strategy: epoch
- learning_rate: 2e-05
- per_device_train_batch_size: 2
- per_device_eval_batch_size: 2
- gradient_accumulation_steps: 64
- total_train_batch_size: 128 (2 per device × 64 accumulation steps)
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- weight_decay: 0.01
- save_total_limit: 2
- num_train_epochs: 10
- predict_with_generate: True
- fp16: True
- push_to_hub: True
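
For reference, a minimal sketch of how these settings map onto `Seq2SeqTrainingArguments` (argument names follow Transformers 4.30; `output_dir` is a placeholder):

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="distilbart-finetuned-summarization",  # placeholder output path
    evaluation_strategy="epoch",
    save_strategy="epoch",
    logging_strategy="epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=64,  # effective batch size: 2 * 64 = 128 on one GPU
    weight_decay=0.01,
    save_total_limit=2,
    num_train_epochs=10,
    predict_with_generate=True,
    fp16=True,
    push_to_hub=True,
    # The AdamW optimizer (betas=(0.9, 0.999), eps=1e-8) and the linear
    # LR schedule listed above are the Trainer defaults.
)
```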

### Training results

_Training is still in progress._

| Epoch | Training Loss | Validation Loss | Rouge1  | Rouge2  | RougeL  | RougeLsum | Gen Len |
|-------|---------------|-----------------|---------|---------|---------|-----------|---------|
| 0     | 1.779700      | 1.719054        | 40.0039 | 17.9071 | 27.8825 | 34.8886   | 88.8936 |
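
The ROUGE values above are on a 0–100 scale. A hedged sketch of computing comparable scores with the `evaluate` library (the example strings are illustrative only):

```python
import evaluate

# Load the ROUGE metric and score a toy prediction/reference pair.
rouge = evaluate.load("rouge")
scores = rouge.compute(
    predictions=["The Eiffel Tower is the tallest structure in Paris."],
    references=["The Eiffel Tower is the tallest man-made structure in Paris."],
)
# evaluate returns fractions in [0, 1]; scale to match the table.
print({k: round(v * 100, 4) for k, v in scores.items()})
```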

### Framework versions

- Transformers 4.30.2
- Pytorch 2.0.1+cu117
- Datasets 2.13.1
- Tokenizers 0.13.3