	---
	base_model: google/pegasus-x-base
	tags:
	- generated_from_trainer
	datasets:
	- ccdv/arxiv-summarization
	model-index:
	- name: Paper-Summarization-ArXiv
	  results:
	  - task:
	      name: Summarization
	      type: summarization
	    dataset:
	      name: ccdv/arxiv-summarization
	      type: ccdv/arxiv-summarization
	      config: section
	      split: test
	      args: section
	    metrics:
	    - name: ROUGE-1
	      type: rouge
	      value: 43.2305
	    - name: ROUGE-2
	      type: rouge
	      value: 16.6571
	    - name: ROUGE-L
	      type: rouge
	      value: 24.4315
	    - name: ROUGE-LSum
	      type: rouge
	      value: 33.9399
	license: bigscience-openrail-m
	language:
	- en
	metrics:
	- rouge
	library_name: transformers
	pipeline_tag: summarization
	---


	# Paper-Summarization-ArXiv

	This model is a fine-tuned version of [google/pegasus-x-base](https://huggingface.co/google/pegasus-x-base) on the arxiv-summarization dataset.


	Base Model: [Pegasus-x-base (State-of-the-art for Long Context Summarization)](https://huggingface.co/google/pegasus-x-base)

	Finetuning Dataset:
	- We used the full ArXiv dataset (Cohan et al., NAACL-HLT 2018) [[PDF]](https://arxiv.org/abs/1804.05685)
	- The full dataset contains 200,000+ documents

	GPU: 1 x RTX A6000

	Training time: about 120 hours for 5 epochs

	Test time: about 8 hours on the test set


	## Intended uses & limitations

	- Research Paper Summarization
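
A minimal usage sketch, assuming the checkpoint loads through the standard `transformers` seq2seq classes. `MODEL_ID` is a placeholder (the repository path is not stated in this card) and `summarize` is a hypothetical helper, not part of any library. The default decoding arguments mirror the setting in the baseline comparison below whose scores match the model-index metadata.

```python
# Sketch only: MODEL_ID is a placeholder, not a real repository path.
MODEL_ID = "<user>/Paper-Summarization-ArXiv"

# Decoding arguments taken from the setting reported with R-1 = 43.2305.
# That run also passed top_k/top_p; they are omitted here because they
# only apply when do_sample=True.
DEFAULT_GENERATION_KWARGS = {
    "num_beams": 2,
    "length_penalty": 1.0,
    "max_length": 128 * 4,
    "min_length": 150,
    "no_repeat_ngram_size": 3,
}


def summarize(text, model_id=MODEL_ID, device="cpu", **overrides):
    """Summarize one document; heavy imports are deferred to call time."""
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id).to(device)
    model.eval()

    # Truncate to the tokenizer's configured maximum input length.
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    generation_kwargs = {**DEFAULT_GENERATION_KWARGS, **overrides}
    output_ids = model.generate(
        input_ids=inputs["input_ids"].to(device),
        attention_mask=inputs["attention_mask"].to(device),
        **generation_kwargs,
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `summarize(paper_text, model_id="...", device="cuda")` with the real repository path should return a single summary string; keyword overrides such as `num_beams=4` replace the defaults.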



	## Comparison to Baseline

	- PEGASUS-X-base zero-shot performance:
	  - R-1 | R-2 | R-L | R-LSUM : 6.2269 | 0.7894 | 4.6905 | 5.4591

	- This model:
	  - R-1 | R-2 | R-L | R-LSUM : 43.2305 | 16.6571 | 24.4315 | 33.9399 with:
	```python
	# top_k/top_p are sampling-only arguments; they have no effect unless do_sample=True
	model.generate(input_ids=inputs["input_ids"].to(device),
	               attention_mask=inputs["attention_mask"].to(device),
	               length_penalty=1, num_beams=2, max_length=128 * 4, min_length=150,
	               no_repeat_ngram_size=3, top_k=25, top_p=0.95)
	```
	  - R-1 | R-2 | R-L | R-LSUM : 40.8486 | 16.3717 | 25.2937 | 33.6923 (see the PEGASUS-X [paper](https://arxiv.org/pdf/2208.04347.pdf)) with:
	```python
	# num_beams=1 without sampling is greedy decoding
	model.generate(input_ids=inputs["input_ids"].to(device),
	               attention_mask=inputs["attention_mask"].to(device),
	               length_penalty=1, num_beams=1, max_length=128 * 2, top_p=1)
	```
	  - R-1 | R-2 | R-L | R-LSUM : 38.1317 | 15.0357 | 23.0286 | 30.9938 (diverse beam search decoding) with:
	```python
	model.generate(input_ids=inputs["input_ids"].to(device),
	               attention_mask=inputs["attention_mask"].to(device),
	               num_beam_groups=5, diversity_penalty=1.0, num_beams=5,
	               min_length=150, max_length=128 * 4)
	```
	  - R-1 | R-2 | R-L | R-LSUM : 43.3017 | 16.6023 | 24.1867 | 33.7019 with:
	```python
	# temperature/top_k/top_p are sampling-only arguments; no effect unless do_sample=True
	model.generate(input_ids=inputs["input_ids"].to(device),
	               attention_mask=inputs["attention_mask"].to(device),
	               length_penalty=1.2, num_beams=4, max_length=128 * 4, min_length=150,
	               no_repeat_ngram_size=3, temperature=0.9, top_k=50, top_p=0.92)
	```
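
For readers unfamiliar with the metric columns, here is a simplified sketch of what ROUGE-N and ROUGE-L measure: the F1 of clipped n-gram overlap and of the longest common subsequence. It uses lowercased whitespace tokens and no stemming, so it will not reproduce the scores above, which presumably came from a standard ROUGE implementation. The function names are illustrative, and the results are on a 0 to 1 scale rather than the 0 to 100 scale used in this card. ROUGE-LSum, which applies the LCS sentence by sentence in common implementations, is not covered.

```python
from collections import Counter


def _ngrams(tokens, n):
    """Count all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def _f1(matches, n_pred, n_ref):
    """Harmonic mean of precision (matches / n_pred) and recall (matches / n_ref)."""
    if matches == 0 or n_pred == 0 or n_ref == 0:
        return 0.0
    precision = matches / n_pred
    recall = matches / n_ref
    return 2 * precision * recall / (precision + recall)


def rouge_n(prediction, reference, n=1):
    """Simplified ROUGE-N F1 on lowercased whitespace tokens."""
    pred = _ngrams(prediction.lower().split(), n)
    ref = _ngrams(reference.lower().split(), n)
    overlap = sum((pred & ref).values())  # clipped n-gram matches
    return _f1(overlap, sum(pred.values()), sum(ref.values()))


def _lcs_length(a, b):
    """Length of the longest common subsequence, keeping one DP row at a time."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(prediction, reference):
    """Simplified ROUGE-L F1 based on the longest common subsequence."""
    pred = prediction.lower().split()
    ref = reference.lower().split()
    return _f1(_lcs_length(pred, ref), len(pred), len(ref))
```

For example, with the prediction `"the model summarizes long papers"` and the reference `"the model summarizes papers"`, four of five predicted unigrams match all four reference unigrams, giving precision 0.8, recall 1.0, and a ROUGE-1 F1 of 8/9.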



	## Training procedure

	Training used the Hugging Face ecosystem, including the `datasets` library and the `Trainer` API.


	### Training hyperparameters

	The following hyperparameters were used during training:

	```yaml
	learning_rate: 1e-05
	train_batch_size: 1
	eval_batch_size: 1
	seed: 42
	gradient_accumulation_steps: 64
	total_train_batch_size: 64
	optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	lr_scheduler_type: linear
	lr_scheduler_warmup_steps: 1586
	num_epochs: 5
	```
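
The hyperparameters are consistent with the step counts in the training results, which the short sketch below checks: the effective batch size is the per-device batch size times the accumulation steps, and the warmup covers one tenth of all optimizer steps. `linear_lr` is an illustrative helper, not a library function; it assumes linear warmup to the peak rate followed by linear decay to zero at the final step, which is the usual shape of a `linear` schedule with warmup.

```python
LEARNING_RATE = 1e-5
TRAIN_BATCH_SIZE = 1
GRAD_ACCUM_STEPS = 64
WARMUP_STEPS = 1586
NUM_EPOCHS = 5
STEPS_PER_EPOCH = 3172  # step count after epoch 1 in the training results

effective_batch_size = TRAIN_BATCH_SIZE * GRAD_ACCUM_STEPS
total_steps = STEPS_PER_EPOCH * NUM_EPOCHS
examples_per_epoch = STEPS_PER_EPOCH * effective_batch_size


def linear_lr(step, peak=LEARNING_RATE, warmup=WARMUP_STEPS, total=total_steps):
    """Assumed schedule shape: linear warmup to `peak`, then linear decay to zero."""
    if step < warmup:
        return peak * step / max(1, warmup)
    return peak * max(0.0, (total - step) / max(1, total - warmup))


print(effective_batch_size)        # 64
print(total_steps)                 # 15860
print(examples_per_epoch)          # 203008
print(WARMUP_STEPS / total_steps)  # 0.1
```

The roughly 203,000 examples seen per epoch agree with the 200,000+ documents in the dataset, and under the assumed schedule the learning rate peaks at step 1586 before decaying over the remaining steps.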


	### Training results

	| Training Loss | Epoch | Step  | Validation Loss |
	|:-------------:|:-----:|:-----:|:---------------:|
	| 2.6153        | 1.0   | 3172  | 2.1045          |
	| 2.202         | 2.0   | 6344  | 2.0511          |
	| 2.1547        | 3.0   | 9516  | 2.0282          |
	| 2.132         | 4.0   | 12688 | 2.0164          |
	| 2.1222        | 5.0   | 15860 | 2.0127          |





	### Framework versions

	- Transformers 4.32.1
	- PyTorch 2.0.1
	- Datasets 2.12.0
	- Tokenizers 0.13.2