---
license: mit
datasets:
- big_patent
language:
- en
metrics:
- rouge
tags:
- summarization
- summarizer
- text summarization
- abstractive summarization
---

[![Generic badge](https://img.shields.io/badge/STATUS-WIP-yellow.svg)](https://shields.io/)

>  ⚠️ ATTENTION
> 
> NOTE THAT FOR THE MODEL TO WORK AS INTENDED, YOU NEED TO PREPEND THE `summarize:` PREFIX TO THE INPUT TEXT

# Table of Contents

1. [Model Details](#model-details)
2. [Uses](#uses)
3. [Training Details](#training-details)
4. [Evaluation](#evaluation)
5. [How To Get Started With the Model](#how-to-get-started-with-the-model)
6. [Citation](#citation)
7. [Authors](#authors)

# Model Details

This model, `KipperDev/t5_summarizer_model`, is a T5 model fine-tuned specifically for document summarization. Built on the T5 architecture, known for its flexibility and efficiency across a wide range of NLP tasks, it leverages T5's text-to-text approach to generate concise, coherent, and informative summaries from long documents.

# Uses

This model is intended for use in summarizing long-form documents into concise, informative abstracts. It's particularly useful for professionals and researchers who need to quickly grasp the essence of detailed reports, research papers, or articles without reading the entire text. 

# Training Details

## Training Data

The model was trained using the [Big Patent Dataset](https://huggingface.co/datasets/big_patent), comprising 1.3 million US patent documents and their corresponding human-written summaries. This dataset was chosen for its rich language and complex structure, representative of the challenging nature of document summarization tasks. Training involved multiple subsets of the dataset to ensure broad coverage and robust model performance across varied document types.
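As a minimal sketch, a subset of the dataset can be loaded with the 🤗 `datasets` library; the `"a"` configuration below (CPC section A, Human Necessities) is just one of the dataset's subsets, chosen here for illustration.

```python
from datasets import load_dataset

# Load one CPC-code subset of BIG-PATENT.
# Each example pairs a full patent description with a human-written abstract.
dataset = load_dataset("big_patent", "a", split="train")

example = dataset[0]
print(example["abstract"][:200])     # reference summary
print(example["description"][:200])  # document to summarize
```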

## Training Procedure

Training was conducted over three rounds. The initial round used a learning rate of 2e-5, a batch size of 8, and 4 epochs; subsequent rounds adjusted these parameters to further refine performance. A linear learning-rate decay schedule was applied to improve learning efficiency over time, as sketched in the configuration below.
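For reference, these first-round settings could be expressed with the Hugging Face `Seq2SeqTrainingArguments` API roughly as follows. This is an illustrative sketch rather than the exact training script, and the output directory name is a placeholder.

```python
from transformers import Seq2SeqTrainingArguments

# Approximate first-round settings reported above:
# learning rate 2e-5, batch size 8, 4 epochs, linear LR decay.
training_args = Seq2SeqTrainingArguments(
    output_dir="t5_summarizer_checkpoints",  # placeholder path
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=4,
    lr_scheduler_type="linear",    # linear decay schedule
    predict_with_generate=True,    # generate summaries during evaluation
)
```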

# Evaluation

Model performance was evaluated using the ROUGE metric, highlighting its capability to generate summaries closely aligned with human-written abstracts.

| **Metric**                              | **Value** |
|-----------------------------------------|-----------|
| Evaluation Loss (Eval Loss)             | 1.9984    |
| Rouge-1                                 | 0.503     |
| Rouge-2                                 | 0.286     |
| Rouge-L                                 | 0.3813    |
| Rouge-Lsum                              | 0.3813    |
| Average Generation Length (Gen Len)     | 151.918   |
| Runtime (seconds)                       | 714.4344  |
| Samples per Second                      | 2.679     |
| Steps per Second                        | 0.336     |
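
For reference, ROUGE scores of this kind can be computed with the 🤗 `evaluate` library. The snippet below is a minimal sketch with placeholder strings; in the real evaluation, `predictions` are model-generated summaries and `references` the human-written abstracts.

```python
import evaluate

rouge = evaluate.load("rouge")

# Placeholder data for illustration only.
predictions = ["the system summarizes a patent document into a short abstract"]
references = ["a method for summarizing patent documents into short abstracts"]

results = rouge.compute(predictions=predictions, references=references)
print(results)  # {'rouge1': ..., 'rouge2': ..., 'rougeL': ..., 'rougeLsum': ...}
```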


# How to Get Started with the Model

Use the code below to get started with the model.

<details>
<summary> Click to expand </summary>

```python
from transformers import T5Tokenizer, T5ForConditionalGeneration

# Load the fine-tuned tokenizer and model from the Hugging Face Hub
tokenizer = T5Tokenizer.from_pretrained("KipperDev/t5_summarizer_model")
model = T5ForConditionalGeneration.from_pretrained("KipperDev/t5_summarizer_model")

# Example usage: the 'summarize:' prefix is required for the model
# to work as intended (see the attention note above)
prefix = "summarize: "
input_text = "Your input text here."
input_ids = tokenizer.encode(prefix + input_text, return_tensors="pt")

# Note: without an explicit max_length, generate() uses the model's default,
# which may truncate longer summaries
summary_ids = model.generate(input_ids)
summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)

print(summary)
```

</details>

# Citation

**BibTeX:**

```bibtex
@article{kipper_t5_summarizer,
 // SOON
}
```

# Authors

This model card was written by [Fernanda Kipper](https://www.fernandakipper.com/).
