dtruong46me committed on
Commit
6eaf2a5
1 Parent(s): 97e4014

Delete README.md

Files changed (1)
  1. README.md +0 -109
README.md DELETED
@@ -1,109 +0,0 @@
1
- # Problem Description
2
-
3
- This project aims to develop a system that automatically **summarizes short dialogue text**. It addresses the challenge of extracting concise yet informative summaries from conversational exchanges, so that users can **quickly grasp the key information in a dialogue**.
4
-
5
- Summarizing these conversations can be valuable for various applications, such as:
6
- - Streamlining information retrieval in customer service interactions
7
- - Condensing meeting discussions for efficient review
8
- - Providing concise overviews of chat conversations on social media platforms
9
-
10
- This project tackles the task of automatically generating concise summaries, saving users time and effort while improving comprehension.
11
-
12
- ![](assets/image2.png)
13
-
14
- <p align="center"><i>Source: Google Research</i></p>
15
-
16
- **Input:** Dialogue text
17
-
18
- Example:
19
- ```
20
- Matt: Do you want to go for date?
21
- Agnes: Wow! You caught me out with this question Matt.
22
- ...
23
- Agnes: See you on saturday.
24
- Matt: Yes, looking forward to it.
25
- Agnes: Me too.
26
- ```
27
-
28
- **Output:** Summarized dialogue
29
-
30
- Example:
31
- ```
32
- Matt invites Agnes for a date to get to know each other better. They'll go to the Georgian restaurant in Kazimierz on Saturday at 6 pm, and he'll pick her up on the way to the place.
33
- ```
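
The input/output pair above can be framed as an instruction-style prompt, which is how dialogues are commonly presented to instruction-tuned models. The following is a minimal sketch only: the template wording and the `build_prompt` helper are assumptions made for illustration and are not taken from the project's code.

```python
# Hypothetical instruction template; the wording is an assumption,
# not the prompt used by the project.
PROMPT_TEMPLATE = (
    "Summarize the following conversation.\n\n"
    "{dialogue}\n\n"
    "Summary:"
)


def build_prompt(dialogue: str) -> str:
    """Wrap a raw dialogue in the instruction template."""
    return PROMPT_TEMPLATE.format(dialogue=dialogue.strip())


# Two turns copied from the example dialogue above.
dialogue = "Matt: Yes, looking forward to it.\nAgnes: Me too."
prompt = build_prompt(dialogue)
print(prompt)
```

The model's generated continuation after `Summary:` would then be read as the output summary.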
34
-
35
- # Dataset
36
-
37
- We use the `DialogSum` dataset, available on 🤗 **Hugging Face** (https://huggingface.co/datasets/knkarthick/dialogsum) and described in the accompanying **paper** (https://arxiv.org/pdf/2105.06762.pdf). The dataset comprises real-life dialogue scenarios paired with manually written summaries and dialogue topics.
38
-
39
- `DialogSum` is a large-scale dialogue summarization dataset consisting of **13,460** dialogues (plus 100 holdout dialogues for topic generation), each with a manually labeled summary and topic.
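
One way to fetch the dataset is through the Hugging Face `datasets` library. This is a sketch under assumptions: it presumes `datasets` is installed and network access is available, and `load_dialogsum` is a hypothetical helper name. The dataset id is the one in the URL above.

```python
# Dataset id taken from the Hugging Face URL given above.
DATASET_ID = "knkarthick/dialogsum"


def load_dialogsum():
    """Download DialogSum from the Hugging Face Hub (needs network access)."""
    # Imported lazily so this snippet can be loaded even when the
    # `datasets` package is not installed.
    from datasets import load_dataset

    return load_dataset(DATASET_ID)
```

Calling `load_dialogsum()` should return a dictionary-like object keyed by split name. Printing it lists the available splits and their sizes, which is a quick way to check the counts quoted above.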
40
-
41
- Here's a sample of the `DialogSum` dataset structure:
42
-
43
-
44
- |id|dialogue|summary|topic|
45
- |-|-|-|-|
46
- |train_3|#Person1#: Why didn't you tell me you had a girlfriend? #Person2#: Sorry, I thought you knew. ... #Person1#: Oh, you men! You are all the same.|#Person1#'s angry because #Person2# didn't tell #Person1# that #Person2# had a girlfriend and would marry her.|have a girl friend|
47
- |train_16|#Person1#: Tell me something about your Valentine's Day. ...#Person2#: Yeah, that is what the holiday is for, isn't it?|#Person2# tells #Person1# their Valentine's Day. #Person1# feels it's romantic.|Valentine's Day|
48
- |...|...|...|...|
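
The `dialogue` field stores all turns in one string, with each turn introduced by a speaker tag such as `#Person1#:`. A minimal sketch of splitting such a string into turns follows. The `split_turns` helper is hypothetical, and the tag pattern is inferred only from the sample rows above.

```python
import re
from typing import List, Tuple

# Matches speaker tags such as "#Person1#:" as they appear in the sample rows.
SPEAKER_TAG = re.compile(r"(#Person\d+#):\s*")


def split_turns(dialogue: str) -> List[Tuple[str, str]]:
    """Split a DialogSum-style dialogue string into (speaker, utterance) pairs."""
    # With one capture group, re.split yields
    # [text_before_first_tag, tag1, text1, tag2, text2, ...].
    parts = SPEAKER_TAG.split(dialogue)
    speakers = parts[1::2]
    utterances = [text.strip() for text in parts[2::2]]
    return list(zip(speakers, utterances))


# First two turns of the train_3 sample shown in the table.
sample = (
    "#Person1#: Why didn't you tell me you had a girlfriend? "
    "#Person2#: Sorry, I thought you knew."
)
for speaker, utterance in split_turns(sample):
    print(f"{speaker} -> {utterance}")
```

Splitting on the tag rather than on newlines keeps the helper independent of how turns are separated in the raw text.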
49
-
50
- **Distribution of dataset**
51
-
52
- |Dialogue|Summary|Dialogue + Summary|
53
- |:-:|:-:|:-:|
54
- |![](assets/hist_dialogue.png)|![](assets/hist_summary.png)|![](assets/hist_dialogue+summary.png)|
55
-
56
- # Method
57
-
58
- ### Pre-trained Language Models:
59
-
60
- This project explores two pre-trained sequence-to-sequence language models that are well suited to dialogue summarization:
61
-
62
- - **FLAN-T5:** An instruction-tuned variant of T5. Because it is already trained to follow natural-language task instructions, it is a natural fit for instruction-style summarization prompts.
63
- - **BART:** A denoising sequence-to-sequence model pre-trained to reconstruct corrupted text, which makes it strong at text generation tasks such as abstractive summarization.
64
-
65
- ### Fine-tuning Techniques:
66
-
67
- To tailor these models specifically for dialogue summarization, we will investigate several fine-tuning approaches:
68
-
69
- - Instruction Fine-tuning
70
- - Parameter-Efficient Fine-Tuning (PEFT)
71
- + Low-Rank Adaptation **(LoRA)**
72
- + Quantized Low-Rank Adaptation **(QLoRA)**
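
To see why LoRA is parameter-efficient: it freezes a pre-trained weight matrix of shape `d_out x d_in` and trains only two low-rank factors, `B` (`d_out x r`) and `A` (`r x d_in`), whose product is added to the frozen weight. QLoRA uses the same adapters on top of a base model stored in quantized form. The arithmetic below is a sketch. The layer size and rank are hypothetical values chosen for illustration, not the project's settings.

```python
def lora_trainable_params(d_out: int, d_in: int, rank: int) -> int:
    """Parameters in the two low-rank factors B (d_out x rank) and A (rank x d_in)."""
    return rank * (d_out + d_in)


# Hypothetical sizes chosen for illustration only; not taken from the project.
d_out, d_in, rank = 768, 768, 8

full = d_out * d_in  # parameters updated when fine-tuning the whole matrix
lora = lora_trainable_params(d_out, d_in, rank)

print(f"full weight matrix: {full:,} parameters")
print(f"LoRA factors:       {lora:,} parameters ({100 * lora / full:.2f}% of full)")
```

For these assumed sizes, the adapter trains 48 times fewer parameters than updating the full matrix. The saving grows as the matrix gets larger or the rank gets smaller.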
73
-
74
- # Installation
75
-
76
- ```
77
- !git clone "https://github.com/dtruong46me/dialogue-text-summarization.git"
78
- ```
79
-
80
- # Contributions
81
-
82
- **Supervisor:** Prof. Le Thanh Huong
83
-
84
- **Student Group:**
85
-
86
- |No.|Name|Student ID|Email|
87
- |:-:|-|:-:|-|
88
- |1|Phan Dinh Truong (Leader)|20214937|[email protected]|
89
- |2|Nguyen Tung Luong|20214913|[email protected]|
90
- |3|Vu Tuan Minh|20210597|[email protected]|
91
- |4|Hoang Tu Quyen|20214929|[email protected]|
92
-
93
- # [Bonus] How to run Streamlit on Kaggle
94
-
95
- ```
96
- !pip install -q streamlit
97
- ```
98
-
99
- ```
100
- !wget -q -O - ipv4.icanhazip.com
101
- ```
102
-
103
- ```
104
- !npm install -g localtunnel -q
105
- ```
106
-
107
- ```
108
- !streamlit run "/kaggle/working/dialogue-text-summarization/streamlit_app.py" & npx localtunnel --port 8501
109
- ```