Commit 6eaf2a5 by dtruong46me (1 parent: 97e4014)
Delete README.md

README.md DELETED
@@ -1,109 +0,0 @@
# Problem Description

This project aims to build a system that automatically **summarizes short dialogue text**. The challenge is to extract concise yet informative summaries from conversational exchanges so that users can **quickly grasp the content of a dialogue**.

Dialogue summaries are valuable in applications such as:
- Streamlining information retrieval in customer service interactions
- Condensing meeting discussions for efficient review
- Providing concise overviews of chat conversations on social media platforms

Generating these summaries automatically saves users time and effort while improving comprehension.

![](assets/image2.png)

<p align="center"><i>Source: Google Research</i></p>

**Input:** Dialogue text

Example:
```
Matt: Do you want to go for date?
Agnes: Wow! You caught me out with this question Matt.
...
Agnes: See you on saturday.
Matt: Yes, looking forward to it.
Agnes: Me too.
```

**Output:** Summarized dialogue

Example:
```
Matt invites Agnes for a date to get to know each other better. They'll go to the Georgian restaurant in Kazimierz on Saturday at 6 pm, and he'll pick her up on the way to the place.
```

# Dataset

We use the `DialogSum` dataset, available on 🤗 **Hugging Face** (https://huggingface.co/datasets/knkarthick/dialogsum) and described in the accompanying **paper** (https://arxiv.org/pdf/2105.06762.pdf). The dataset pairs real-life dialogue scenarios with manually written summaries and dialogue topics.

`DialogSum` is a large-scale dialogue summarization dataset of **13,460** dialogues (plus 100 holdout dialogues for topic generation), each with a manually labeled summary and topic.

Here's a sample of the `DialogSum` dataset structure:

|id|dialogue|summary|topic|
|-|-|-|-|
|train_3|#Person1#: Why didn't you tell me you had a girlfriend? #Person2#: Sorry, I thought you knew. ... #Person1#: Oh, you men! You are all the same.|#Person1#'s angry because #Person2# didn't tell #Person1# that #Person2# had a girlfriend and would marry her.|have a girl friend|
|train_16|#Person1#: Tell me something about your Valentine's Day. ...#Person2#: Yeah, that is what the holiday is for, isn't it?|#Person2# tells #Person1# their Valentine's Day. #Person1# feels it's romantic.|Valentine's Day|
|...|...|...|...|
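
The sample rows show that each dialogue is a single string in which speaker tags such as `#Person1#:` mark the start of a turn. As a minimal sketch, assuming that tag format holds across the dataset, a dialogue can be split into (speaker, utterance) pairs using only the standard library. The helper name `split_turns` is illustrative and not part of the project code.

```python
import re

# Matches a speaker tag such as "#Person1#:" (format taken from the sample rows).
SPEAKER_TAG = re.compile(r"(#Person\d+#):\s*")


def split_turns(dialogue: str) -> list[tuple[str, str]]:
    """Split a DialogSum-style dialogue string into (speaker, utterance) pairs."""
    # re.split with one capture group yields [prefix, tag1, text1, tag2, text2, ...].
    parts = SPEAKER_TAG.split(dialogue)
    turns = []
    for speaker, text in zip(parts[1::2], parts[2::2]):
        turns.append((speaker, text.strip()))
    return turns


sample = (
    "#Person1#: Why didn't you tell me you had a girlfriend? "
    "#Person2#: Sorry, I thought you knew."
)
turns = split_turns(sample)
for speaker, text in turns:
    print(f"{speaker} -> {text}")
```

Splitting on the tag rather than on line breaks keeps the sketch independent of how turns are separated in the raw string.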

**Distribution of dataset**

|Dialogue|Summary|Dialogue + Summary|
|:-:|:-:|:-:|
|![](assets/hist_dialogue.png)|![](assets/hist_summary.png)|![](assets/hist_dialogue+summary.png)|
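
The histograms are provided as images only, and the README does not state which length unit they use. As a rough sketch, assuming length is measured in whitespace-separated words (an assumption, not something the README confirms), a comparable distribution could be computed as follows. The function name `length_histogram` and the bin width are illustrative.

```python
from collections import Counter


def length_histogram(texts, bin_width=5):
    """Count texts per length bin, where length is the number of whitespace-separated words."""
    bins = Counter()
    for text in texts:
        n_words = len(text.split())
        # Key each bin by its lower edge, e.g. 16 words with bin_width=5 falls in bin 15.
        bins[(n_words // bin_width) * bin_width] += 1
    return dict(sorted(bins.items()))


# The two sample summaries from the dataset table.
summaries = [
    "#Person1#'s angry because #Person2# didn't tell #Person1# that #Person2# had a girlfriend and would marry her.",
    "#Person2# tells #Person1# their Valentine's Day. #Person1# feels it's romantic.",
]
hist = length_histogram(summaries, bin_width=5)
print(hist)
```

A model tokenizer would give different counts than word splitting, so the bins here are only indicative of relative lengths.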

# Method

### Pre-trained Language Models:

This project explores two pre-trained language models well suited to dialogue summarization:

- **FLAN-T5:** An instruction-tuned variant of the T5 encoder-decoder model. It follows natural-language task instructions, which makes it a good fit for summarizing the nuances of conversations.
- **BART:** A denoising sequence-to-sequence model with strong text generation capabilities, which makes it adept at producing informative and well-structured summaries.

### Fine-tuning Techniques:

To tailor these models specifically to dialogue summarization, we will investigate several fine-tuning approaches:

- Instruction Fine-tuning
- Parameter Efficient Fine Tuning (PEFT)
  + Low-Rank Adaptation **(LoRA)**
  + Quantized Low-Rank Adaptation **(QLoRA)**
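
Two of the approaches listed above can be illustrated without any model code. The sketch below is a hedged illustration and not the project's implementation. It shows (1) wrapping a dialogue in a natural-language instruction, as instruction fine-tuning typically does, and (2) the parameter arithmetic behind LoRA, which freezes a weight matrix of shape `d_out x d_in` and trains only two low-rank factors of shapes `d_out x r` and `r x d_in`. The prompt template, the function names, and the 768 x 768 layer size are all assumptions chosen for illustration.

```python
def build_prompt(dialogue: str) -> str:
    """Wrap a dialogue in a summarization instruction (hypothetical template)."""
    return f"Summarize the following conversation.\n\n{dialogue}\n\nSummary:"


def lora_trainable_params(d_out: int, d_in: int, rank: int) -> int:
    """Parameters in the low-rank factors B (d_out x rank) and A (rank x d_in)."""
    return rank * (d_out + d_in)


prompt = build_prompt("#Person1#: Tell me something about your Valentine's Day.")
print(prompt)

# Hypothetical 768 x 768 weight matrix adapted with rank 8.
full = 768 * 768
lora = lora_trainable_params(768, 768, rank=8)
print(f"full: {full}, LoRA: {lora}, ratio: {lora / full:.2%}")
```

QLoRA follows the same low-rank scheme but additionally stores the frozen base weights in a quantized format to reduce memory use.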

# Installation

```
# Clone the repository (the leading "!" runs a shell command from a notebook cell)
!git clone "https://github.com/dtruong46me/dialogue-text-summarization.git"
```

# Contributions

**Supervisor:** Prof. Le Thanh Huong

**Student Group:**

|No.|Name|Student ID|Email|
|:-:|-|:-:|-|
|1|Phan Dinh Truong (Leader)|20214937|[email protected]|
|2|Nguyen Tung Luong|20214913|[email protected]|
|3|Vu Tuan Minh|20210597|[email protected]|
|4|Hoang Tu Quyen|20214929|[email protected]|

# [Bonus] How to run Streamlit on Kaggle

```
# Install Streamlit quietly
!pip install -q streamlit
```

```
# Print this machine's public IPv4 address (localtunnel usually asks for it as the tunnel password)
!wget -q -O - ipv4.icanhazip.com
```

```
# Install the localtunnel CLI globally
!npm install -g localtunnel -q
```

```
# Start the app in the background, then expose Streamlit's default port 8501 through localtunnel
!streamlit run "/kaggle/working/dialogue-text-summarization/streamlit_app.py" & npx localtunnel --port 8501
```