proper commit
- .gitignore +8 -0
- Dockerfile +14 -0
- README.md +77 -0
- TranscriptApi/__init__.py +21 -0
- TranscriptApi/common/__init__.py +0 -0
- TranscriptApi/common/utils.py +212 -0
- TranscriptApi/main/__init__.py +0 -0
- TranscriptApi/main/routes.py +13 -0
- TranscriptApi/models.py +22 -0
- TranscriptApi/resources/__init__.py +0 -0
- TranscriptApi/resources/routes.py +62 -0
- TranscriptApi/static/app.js +118 -0
- TranscriptApi/static/images/background-dark.svg +14 -0
- TranscriptApi/static/images/background-light.svg +14 -0
- TranscriptApi/static/styles.css +125 -0
- TranscriptApi/templates/home.html +78 -0
- app.py +18 -0
- instance/site.db +0 -0
.gitignore
ADDED
@@ -0,0 +1,8 @@
|
1 |
+
#python cache
|
2 |
+
**/__pycache__/
|
3 |
+
|
4 |
+
|
5 |
+
#my files
|
6 |
+
|
7 |
+
trial.py
|
8 |
+
test/
|
Dockerfile
ADDED
@@ -0,0 +1,14 @@
|
1 |
+
# read the doc: https://huggingface.co/docs/hub/spaces-sdks-docker
|
2 |
+
# you will also find guides on how best to write your Dockerfile
|
3 |
+
|
4 |
+
FROM python:3.9
|
5 |
+
|
6 |
+
WORKDIR /code
|
7 |
+
|
8 |
+
COPY ./requirements.txt /code/requirements.txt
|
9 |
+
|
10 |
+
RUN pip install --no-cache-dir --upgrade -r /code/requirements.txt
|
11 |
+
|
12 |
+
COPY . .
|
13 |
+
|
14 |
+
CMD ["gunicorn","-b","0.0.0.0:7860", "app:app","--timeout","950"]
|
README.md
CHANGED
@@ -8,3 +8,80 @@ pinned: false
|
|
8 |
---
|
9 |
|
10 |
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
|
11 |
+
|
12 |
+
|
13 |
+
|
14 |
+
# TranscriptApi
|
15 |
+
|
16 |
+
TranscriptApi is a backend service written in Flask that provides a RESTful API for summarizing YouTube videos or uploaded files using deep learning models. It allows users to extract and summarize the textual content from video or audio files, enabling easy access to key information.
|
17 |
+
|
18 |
+
## Table of Contents
|
19 |
+
- [Features](#features)
|
20 |
+
- [Installation](#installation)
|
21 |
+
- [Usage](#usage)
|
22 |
+
|
23 |
+
## Features
|
24 |
+
|
25 |
+
- Extract and summarize textual content from YouTube videos or uploaded files.
|
26 |
+
- Uses Whisper for transcription and a fine-tuned T5-small checkpoint for summarization.
|
27 |
+
- Provides a RESTful API for easy integration with other applications.
|
28 |
+
- Supports customization and configuration options to meet specific requirements.
|
29 |
+
|
30 |
+
## Installation
|
31 |
+
|
32 |
+
1. Clone the repository:
|
33 |
+
|
34 |
+
```
|
35 |
+
git clone https://github.com/th3bossc/TranscriptApi.git
|
36 |
+
```
|
37 |
+
|
38 |
+
2. Navigate to the project directory:
|
39 |
+
|
40 |
+
```
|
41 |
+
cd TranscriptApi
|
42 |
+
```
|
43 |
+
|
44 |
+
3. Install the required dependencies using pip:
|
45 |
+
|
46 |
+
```
|
47 |
+
pip install -r requirements.txt
|
48 |
+
```
|
49 |
+
|
50 |
+
4. Set up the necessary configuration variables, such as API keys, in the `.env` file.
|
51 |
+
|
52 |
+
5. Run the Flask development server:
|
53 |
+
|
54 |
+
```
|
55 |
+
python app.py
|
56 |
+
```
|
57 |
+
|
58 |
+
The server should now be running locally at `http://localhost:5000`.
|
59 |
+
|
60 |
+
## Usage
|
61 |
+
|
62 |
+
To use the TranscriptApi service, send requests to the API endpoints. Each example below is shown twice: first with cURL, then with Python's `requests`:
|
63 |
+
|
64 |
+
```bash
|
65 |
+
|
66 |
+
# summarizing video
|
67 |
+
curl -X GET http://localhost:5000/video_api/your-video-id
|
68 |
+
requests.get("http://localhost:5000/video_api/your-video-id")
|
69 |
+
|
70 |
+
# summarizing pdf file
|
71 |
+
curl -X POST -F "file=@yourfile.pdf" http://localhost:5000/file_api/pdf
|
72 |
+
requests.post("http://localhost:5000/file_api/pdf", files = {'file' : open('yourfile.pdf', 'rb')})
|
73 |
+
|
74 |
+
# summarizing text file
|
75 |
+
curl -X POST -F "file=@yourfile.txt" http://localhost:5000/file_api/txt
|
76 |
+
requests.post("http://localhost:5000/file_api/txt", files = {'file' : open('yourfile.txt', 'rb')})
|
77 |
+
|
78 |
+
# summarizing raw text data
|
79 |
+
curl -X POST -H "Content-Type: application/json" -d '{"text" : "your-text-data"}' http://localhost:5000/file_api/direct_text
|
80 |
+
requests.post("http://localhost:5000/file_api/direct_text", headers = {'Content-Type' : 'application/json'}, json = {'text' : 'your-text-data'})
|
81 |
+
|
82 |
+
```
|
83 |
+
|
84 |
+
Replace `your-video-id` with the actual YouTube video ID you want to summarize.
|
85 |
+
Replace `yourfile` with the actual file path of the file you want to summarize.
|
86 |
+
Replace `your-text-data` with the actual text string you want to summarize.
|
87 |
+
|
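The README examples above can also be expressed with only the Python standard library. The following is a minimal sketch, not part of the commit: the helper names (`build_video_request`, `build_text_request`, `send`) and the `BASE_URL` constant are hypothetical, and the base URL assumes the local development server described in the README.

```python
import json
import urllib.request

# Assumption: the local development server from the README.
BASE_URL = "http://localhost:5000"


def build_video_request(video_id: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) the GET request for a video summary."""
    return urllib.request.Request(f"{base_url}/video_api/{video_id}", method="GET")


def build_text_request(text: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) the JSON POST request for raw text."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/file_api/direct_text",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send a prepared request and decode the JSON reply (needs a running server)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Building the request separately from sending it makes the URL and headers easy to inspect before a server is available; `send(build_video_request("your-video-id"))` would perform the actual call.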
TranscriptApi/__init__.py
ADDED
@@ -0,0 +1,21 @@
|
1 |
+
from flask import Flask
|
2 |
+
from flask_sqlalchemy import SQLAlchemy
|
3 |
+
from flask_cors import CORS
|
4 |
+
db = SQLAlchemy()
|
5 |
+
|
6 |
+
SQLALCHEMY_DATABASE_URI = 'sqlite:///site.db'
|
7 |
+
|
8 |
+
def create_app():
|
9 |
+
app = Flask(__name__)
|
10 |
+
CORS(app)
|
11 |
+
app.config['SQLALCHEMY_DATABASE_URI'] = 'sqlite:///site.db'
|
12 |
+
app.config['UPLOAD_FOLDER'] = 'TranscriptApi/common/files/'
|
13 |
+
db.init_app(app)
|
14 |
+
|
15 |
+
from TranscriptApi.resources.routes import resources
|
16 |
+
app.register_blueprint(resources)
|
17 |
+
|
18 |
+
from TranscriptApi.main.routes import main
|
19 |
+
app.register_blueprint(main)
|
20 |
+
|
21 |
+
return app
|
TranscriptApi/common/__init__.py
ADDED
File without changes
|
TranscriptApi/common/utils.py
ADDED
@@ -0,0 +1,212 @@
|
1 |
+
import openai
|
2 |
+
import os
|
3 |
+
import librosa
|
4 |
+
import soundfile as sf
|
5 |
+
from pytube import YouTube
|
6 |
+
import urllib.parse as urlparse
|
7 |
+
from moviepy.editor import VideoFileClip
|
8 |
+
import shutil
|
9 |
+
import whisper
|
10 |
+
import torch
|
11 |
+
from transformers import pipeline
|
12 |
+
from tqdm.auto import tqdm
|
13 |
+
from PyPDF2 import PdfReader
|
14 |
+
|
15 |
+
|
16 |
+
device = 'cuda' if torch.cuda.is_available() else 'cpu'
|
17 |
+
# device = 'cpu'
|
18 |
+
|
19 |
+
|
20 |
+
checkpoint = 'Th3BossC/SummarizationModel_t5-small_opeai_tldr'
|
21 |
+
|
22 |
+
|
23 |
+
|
24 |
+
|
25 |
+
|
26 |
+
|
27 |
+
############### video queries ###############
|
28 |
+
def title(video_id):
|
29 |
+
return YouTube('https://www.youtube.com/watch?v=' + video_id).title
|
30 |
+
|
31 |
+
def get_video_id(video_url):
|
32 |
+
url_data = urlparse.urlparse(video_url)
|
33 |
+
query = urlparse.parse_qs(url_data.query)
|
34 |
+
video = query["v"][0]
|
35 |
+
return video
|
36 |
+
|
37 |
+
def get_video(video_url, location, filename = 'audio'):
|
38 |
+
if not os.path.exists(location):
|
39 |
+
os.makedirs(location)
|
40 |
+
video_filename = location + filename + '.mp4'
|
41 |
+
audio_filename = location + filename + '.mp3'
|
42 |
+
print('[INFO] downloading video...')
|
43 |
+
video = YouTube(video_url).streams.filter(file_extension = 'mp4').first().download(filename = video_filename)
|
44 |
+
print('[INFO] download complete')
|
45 |
+
video = VideoFileClip(video_filename)
|
46 |
+
print('[INFO] extracting audio from video...')
|
47 |
+
video.audio.write_audiofile(audio_filename)
|
48 |
+
#os.remove(video_filename)
|
49 |
+
|
50 |
+
return audio_filename
|
51 |
+
|
52 |
+
############################################################
|
53 |
+
|
54 |
+
|
55 |
+
############### Audio ###############
|
56 |
+
def chunk_audio(filename, segment_length, output_dir):
|
57 |
+
if not os.path.isdir(output_dir):
|
58 |
+
os.mkdir(output_dir)
|
59 |
+
audio, sr = librosa.load(filename, sr = 44100)
|
60 |
+
duration = librosa.get_duration(y = audio, sr = sr)
|
61 |
+
num_segments = int(duration / segment_length) + 1
|
62 |
+
print(f'[INFO] Chunking {num_segments} chunks...')
|
63 |
+
|
64 |
+
audio_files = []
|
65 |
+
|
66 |
+
for i in range(num_segments):
|
67 |
+
start = i*segment_length*sr
|
68 |
+
end = (i+1)*segment_length*sr
|
69 |
+
segment = audio[start:end]
|
70 |
+
sf.write(os.path.join(output_dir, f"segment_{i}.mp3"), segment, sr)
|
71 |
+
audio_files.append(os.path.join(output_dir, f'segment_{i}.mp3'))
|
72 |
+
|
73 |
+
print(audio_files)
|
74 |
+
#os.remove(filename)
|
75 |
+
return audio_files
|
76 |
+
|
77 |
+
def transcribe_audio(audio_files, output_file = None, model = whisper.load_model('base', device = device)):
|
78 |
+
print('[INFO] converting audio to text...')
|
79 |
+
transcripts = []
|
80 |
+
model.to(device)
|
81 |
+
for audio_file in audio_files:
|
82 |
+
response = model.transcribe(audio_file)
|
83 |
+
transcripts.append(response['text'])
|
84 |
+
|
85 |
+
if output_file is not None:
|
86 |
+
with open(output_file, 'w') as f:
|
87 |
+
for transcript in transcripts:
|
88 |
+
f.write(transcript + '\n')
|
89 |
+
|
90 |
+
return transcripts
|
91 |
+
|
92 |
+
############################################################
|
93 |
+
|
94 |
+
|
95 |
+
############################################################
|
96 |
+
|
97 |
+
############### Compile all functions ###############
|
98 |
+
def summarize_youtube_video(video_url, outputs_dir):
|
99 |
+
print(f'[INFO] running on {device}')
|
100 |
+
raw_audio_dir = f'{outputs_dir}/raw_audio/'
|
101 |
+
chunks_dir = f'{outputs_dir}/chunks/'
|
102 |
+
transcripts_file = f'{outputs_dir}/transcripts.txt'
|
103 |
+
summary_file = f'{outputs_dir}/summary.txt'
|
104 |
+
segment_length = 60*10
|
105 |
+
|
106 |
+
if os.path.exists(outputs_dir):
|
107 |
+
shutil.rmtree(outputs_dir)
|
108 |
+
os.mkdir(outputs_dir)
|
109 |
+
|
110 |
+
audio_filename = get_video(video_url, raw_audio_dir)
|
111 |
+
chunked_audio_files = chunk_audio(audio_filename, segment_length, chunks_dir)
|
112 |
+
transcriptions = transcribe_audio(chunked_audio_files, transcripts_file)
|
113 |
+
|
114 |
+
|
115 |
+
# splitting transcription into sentences
|
116 |
+
sentences = []
|
117 |
+
for transcript in transcriptions:
|
118 |
+
sentences += transcript.split('.')
|
119 |
+
|
120 |
+
sentences_len = [len(sentence) for sentence in sentences]
|
121 |
+
sentence_mean_length = sum(sentences_len) // len(sentences_len)
|
122 |
+
|
123 |
+
num_sentences_per_step = int(1600 / (sentence_mean_length))
|
124 |
+
num_steps = (len(sentences) // num_sentences_per_step) + (len(sentences) % num_sentences_per_step != 0)
|
125 |
+
|
126 |
+
print(f"""
|
127 |
+
[INFO] sentences_len : {len(sentences_len)}
|
128 |
+
[INFO] sentence_mean_length : {sentence_mean_length},
|
129 |
+
[INFO] num_sentences_per_step : {num_sentences_per_step},
|
130 |
+
[INFO] num_steps : {num_steps}
|
131 |
+
""")
|
132 |
+
|
133 |
+
summarizer = pipeline('summarization', model = checkpoint, tokenizer = checkpoint, max_length = 200, truncation = True, device = 0 if device == 'cuda' else -1)
|
134 |
+
|
135 |
+
summaries = []
|
136 |
+
|
137 |
+
for i in tqdm(range(num_steps)):
|
138 |
+
chunk = ' '.join(sentences[num_sentences_per_step*i : num_sentences_per_step*(i+1)])
|
139 |
+
summary = summarizer(chunk, do_sample = False)[0]['summary_text']
|
140 |
+
summaries.append(summary)
|
141 |
+
|
142 |
+
complete_summary = ' '.join(summaries)
|
143 |
+
with open(summary_file, 'w') as f:
|
144 |
+
f.write(complete_summary)
|
145 |
+
return complete_summary
|
146 |
+
############################################################
|
147 |
+
|
148 |
+
|
149 |
+
|
150 |
+
############ File Summarize ############
|
151 |
+
|
152 |
+
def extract_text_pdf(file_location = 'TranscriptApi/static/files/temp.pdf'):
|
153 |
+
reader = PdfReader(file_location)
|
154 |
+
text = ""
|
155 |
+
for page in reader.pages:
|
156 |
+
text += page.extract_text()
|
157 |
+
return text
|
158 |
+
|
159 |
+
def extract_text_txt(file_location = 'TranscriptApi/static/files/temp.txt'):
|
160 |
+
with open(file_location, "r") as f:
|
161 |
+
text = f.read()
|
162 |
+
return text
|
163 |
+
|
164 |
+
|
165 |
+
|
166 |
+
|
167 |
+
def summarize_string(text : str):
|
168 |
+
sentences = text.split('.')
|
169 |
+
|
170 |
+
summarizer = pipeline('summarization', model = checkpoint, tokenizer = checkpoint, max_length = 200, truncation = True, device = 0 if device == 'cuda' else -1)
|
171 |
+
|
172 |
+
sentences_len = [len(sentence) for sentence in sentences]
|
173 |
+
sentence_mean_length = sum(sentences_len) // len(sentences_len)
|
174 |
+
|
175 |
+
num_sentences_per_step = int(1600 / (sentence_mean_length))
|
176 |
+
num_steps = (len(sentences) // num_sentences_per_step) + (len(sentences) % num_sentences_per_step != 0)
|
177 |
+
|
178 |
+
print(f"""
|
179 |
+
[INFO] sentences_len : {len(sentences_len)}
|
180 |
+
[INFO] sentence_mean_length : {sentence_mean_length},
|
181 |
+
[INFO] num_sentences_per_step : {num_sentences_per_step},
|
182 |
+
[INFO] num_steps : {num_steps}
|
183 |
+
""")
|
184 |
+
|
185 |
+
|
186 |
+
summaries = []
|
187 |
+
for i in tqdm(range(num_steps)):
|
188 |
+
chunk = ' '.join(sentences[num_sentences_per_step*i : num_sentences_per_step*(i+1)])
|
189 |
+
summary = summarizer(chunk, do_sample = False)[0]['summary_text']
|
190 |
+
summaries.append(summary)
|
191 |
+
|
192 |
+
complete_summary = ' '.join(summaries)
|
193 |
+
return complete_summary
|
194 |
+
|
195 |
+
|
196 |
+
################################################
|
197 |
+
|
198 |
+
|
199 |
+
def summarize_file(file_location, file_extension, working_dir = "TranscriptApi/static/files"):
|
200 |
+
# _, file_extension = os.path.splitext(file_location)
|
201 |
+
text = ""
|
202 |
+
if file_extension == '.pdf':
|
203 |
+
text = extract_text_pdf(file_location)
|
204 |
+
elif file_extension == '.txt':
|
205 |
+
text = extract_text_txt(file_location)
|
206 |
+
else:
|
207 |
+
return "[ERROR]"
|
208 |
+
|
209 |
+
if os.path.exists(working_dir):
|
210 |
+
shutil.rmtree(working_dir)
|
211 |
+
os.mkdir(working_dir)
|
212 |
+
return summarize_string(text)
|
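Both `summarize_youtube_video` and `summarize_string` group sentences into steps of roughly 1600 characters before calling the summarizer. The sketch below isolates that grouping as a pure function so it can be checked without loading a model. It is an illustration under assumptions, not the commit's code: the name `chunk_sentences` and the `target_chars` parameter are hypothetical, and unlike the original it drops empty fragments, re-adds the periods, and guards against a zero mean length or a zero step size.

```python
from typing import List


def chunk_sentences(text: str, target_chars: int = 1600) -> List[str]:
    """Group sentences into chunks of roughly `target_chars` characters."""
    # Split on '.', as the original code does, and drop empty fragments.
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    if not sentences:
        return []

    # Mean sentence length in characters; never below 1 to avoid dividing by zero.
    mean_length = max(1, sum(len(s) for s in sentences) // len(sentences))
    # At least one sentence per step, so the range below always advances.
    per_step = max(1, target_chars // mean_length)

    return [
        ". ".join(sentences[i : i + per_step]) + "."
        for i in range(0, len(sentences), per_step)
    ]
```

Each returned chunk could then be passed to the summarizer in turn, which mirrors the loop over `num_steps` in the original functions.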
TranscriptApi/main/__init__.py
ADDED
File without changes
|
TranscriptApi/main/routes.py
ADDED
@@ -0,0 +1,13 @@
|
1 |
+
from flask import Blueprint, render_template, url_for
|
2 |
+
from TranscriptApi.resources.routes import api
|
3 |
+
main = Blueprint('main', __name__)
|
4 |
+
|
5 |
+
@main.route('/')
|
6 |
+
@main.route('/home')
|
7 |
+
def home():
|
8 |
+
return render_template('home.html')
|
9 |
+
|
10 |
+
|
11 |
+
@main.route('/online')
|
12 |
+
def online():
|
13 |
+
return {"online" : "yes"}, 200
|
TranscriptApi/models.py
ADDED
@@ -0,0 +1,22 @@
|
1 |
+
from TranscriptApi import db
|
2 |
+
from datetime import datetime
|
3 |
+
|
4 |
+
class VideoSummary(db.Model):
|
5 |
+
id = db.Column(db.Integer, primary_key = True)
|
6 |
+
date = db.Column(db.DateTime(), nullable = False, default = datetime.utcnow)
|
7 |
+
video_id = db.Column(db.String(11), unique = True, nullable = False)
|
8 |
+
title = db.Column(db.String(100), nullable = False)
|
9 |
+
summary = db.Column(db.Text(), nullable = False)
|
10 |
+
|
11 |
+
def __repr__(self):
|
12 |
+
return f'VideoSummary({self.id}, {self.video_id}, {self.title})'
|
13 |
+
|
14 |
+
|
15 |
+
class FileSummary(db.Model):
|
16 |
+
id = db.Column(db.Integer, primary_key = True)
|
17 |
+
date = db.Column(db.DateTime(), nullable = False, default = datetime.utcnow)
|
18 |
+
title = db.Column(db.String(100), nullable = False)
|
19 |
+
summary = db.Column(db.Text(), nullable = False)
|
20 |
+
|
21 |
+
def __repr__(self):
|
22 |
+
return f"FileSummary({self.id}, {self.title})"
|
TranscriptApi/resources/__init__.py
ADDED
File without changes
|
TranscriptApi/resources/routes.py
ADDED
@@ -0,0 +1,62 @@
|
1 |
+
from flask import Blueprint, request, current_app
|
2 |
+
from flask_restful import Api, Resource
|
3 |
+
from TranscriptApi.common.utils import title, summarize_youtube_video, summarize_file, summarize_string
|
4 |
+
from TranscriptApi.models import VideoSummary, FileSummary
|
5 |
+
from TranscriptApi import db
|
6 |
+
import os
|
7 |
+
|
8 |
+
resources = Blueprint('resources', __name__)
|
9 |
+
api = Api(resources)
|
10 |
+
|
11 |
+
class VideoTranscript(Resource):
|
12 |
+
def get(self, video_id):
|
13 |
+
print(request)
|
14 |
+
summaryExist = VideoSummary.query.filter_by(video_id = video_id).first()
|
15 |
+
if summaryExist is not None:
|
16 |
+
return {'title' : summaryExist.title, 'summary' : summaryExist.summary}, 200
|
17 |
+
|
18 |
+
|
19 |
+
try:
|
20 |
+
video_title = title(video_id)
|
21 |
+
except Exception:
|
22 |
+
return {'error' : 'Video ID not valid'}, 400
|
23 |
+
try:
|
24 |
+
summary = summarize_youtube_video('https://www.youtube.com/watch?v=' + video_id, 'TranscriptApi/common/audio')
|
25 |
+
newVideo = VideoSummary(title = video_title, video_id = video_id, summary = summary)
|
26 |
+
db.session.add(newVideo)
|
27 |
+
db.session.commit()
|
28 |
+
return {'title' : video_title, 'summary' : summary}, 200
|
29 |
+
except Exception as e:
|
30 |
+
return {'error' : 'Failed to summarize video'}, 500
|
31 |
+
|
32 |
+
|
33 |
+
api.add_resource(VideoTranscript, '/video_api/<string:video_id>')
|
34 |
+
|
35 |
+
|
36 |
+
class FileTranscript(Resource):
|
37 |
+
|
38 |
+
def post(self, type):
|
39 |
+
|
40 |
+
|
41 |
+
if type == 'pdf' or type == 'txt':
|
42 |
+
print(request.files)
|
43 |
+
file = request.files['file']
|
44 |
+
file_location = os.path.join(current_app.config.get('UPLOAD_FOLDER'), file.filename)
|
45 |
+
file.save(file_location)
|
46 |
+
summary = summarize_file(file_location = file_location, file_extension = '.' + type)
|
47 |
+
file_name = file.filename
|
48 |
+
elif type == 'direct_text':
|
49 |
+
summary = summarize_string(request.json['text'])
|
50 |
+
file_name = "Entered Text"
|
51 |
+
if summary == "[ERROR]":
|
52 |
+
return {'error' : 'We are experiencing some issues...'}, 500
|
53 |
+
else:
|
54 |
+
newSummary = FileSummary(title = file_name, summary = summary)
|
55 |
+
db.session.add(newSummary)
|
56 |
+
db.session.commit()
|
57 |
+
return {'title' : file_name, 'summary' : summary}, 200
|
58 |
+
print(file)
|
59 |
+
|
60 |
+
|
61 |
+
|
62 |
+
api.add_resource(FileTranscript, '/file_api/<string:type>')
|
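`FileTranscript.post` accepts `pdf`, `txt`, and `direct_text`, but any other value for `type` leaves `summary` unassigned. One way to make the failure explicit is to validate the request before doing any work. The sketch below is a hypothetical helper, not part of the commit: `check_request`, `FILE_EXTENSIONS`, and the error messages are assumptions chosen for illustration.

```python
from typing import Dict, Optional, Tuple

# Route segment -> extension expected by summarize_file in utils.py.
FILE_EXTENSIONS: Dict[str, str] = {"pdf": ".pdf", "txt": ".txt"}


def check_request(kind: str, has_file: bool, has_text: bool) -> Optional[Tuple[dict, int]]:
    """Return an (error body, status) pair, or None when the request is usable."""
    if kind in FILE_EXTENSIONS:
        if not has_file:
            return {"error": "Missing 'file' upload"}, 400
        return None
    if kind == "direct_text":
        if not has_text:
            return {"error": "Missing 'text' field"}, 400
        return None
    # Anything else is not a route this API knows how to handle.
    return {"error": f"Unsupported type: {kind}"}, 400
```

Inside the resource, the result could be returned directly when it is not `None`, since Flask-RESTful accepts a `(body, status)` tuple.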
TranscriptApi/static/app.js
ADDED
@@ -0,0 +1,118 @@
|
1 |
+
function youtube_video_id(url){
|
2 |
+
var regExp = /^.*((youtu.be\/)|(v\/)|(\/u\/\w\/)|(embed\/)|(watch\?))\??v?=?([^#&?]*).*/;
|
3 |
+
var match = url.match(regExp);
|
4 |
+
return (match&&match[7].length==11)? match[7] : false;
|
5 |
+
}
|
6 |
+
|
7 |
+
|
8 |
+
// Theme implementation
|
9 |
+
|
10 |
+
const theme = localStorage.getItem('theme');
|
11 |
+
const navbar_bg = localStorage.getItem('navbar-bg');
|
12 |
+
const navbar_color = localStorage.getItem('navbar-color');
|
13 |
+
const button_content = localStorage.getItem('button-content');
|
14 |
+
|
15 |
+
|
16 |
+
const themeButton = document.getElementById('theme');
|
17 |
+
const body = document.body;
|
18 |
+
const nav = document.getElementById('navbar');
|
19 |
+
|
20 |
+
body.classList.add(theme || 'light');
|
21 |
+
nav.classList.add(navbar_bg || 'bg-light');
|
22 |
+
nav.classList.add(navbar_color || 'navbar-light')
|
23 |
+
themeButton.innerHTML = button_content || '<i class="bi bi-moon-fill"></i> Toggle Theme';
|
24 |
+
|
25 |
+
|
26 |
+
themeButton.onclick = () => {
|
27 |
+
if (body.classList.contains('light')) {
|
28 |
+
body.classList.replace('light', 'dark');
|
29 |
+
nav.classList.replace('bg-light', 'bg-dark');
|
30 |
+
nav.classList.replace('navbar-light', 'navbar-dark');
|
31 |
+
themeButton.innerHTML = '<i class="bi bi-brightness-high-fill"></i> Toggle Theme'
|
32 |
+
|
33 |
+
localStorage.setItem('theme', 'dark');
|
34 |
+
localStorage.setItem('navbar-bg', 'bg-dark');
|
35 |
+
localStorage.setItem('navbar-color', 'navbar-dark');
|
36 |
+
localStorage.setItem('button-content', themeButton.innerHTML);
|
37 |
+
}
|
38 |
+
else {
|
39 |
+
body.classList.replace('dark', 'light');
|
40 |
+
nav.classList.replace('bg-dark', 'bg-light');
|
41 |
+
nav.classList.replace('navbar-dark', 'navbar-light');
|
42 |
+
themeButton.innerHTML = '<i class="bi bi-moon-fill"></i> Toggle Theme';
|
43 |
+
|
44 |
+
localStorage.setItem('theme', 'light');
|
45 |
+
localStorage.setItem('navbar-bg', 'bg-light');
|
46 |
+
localStorage.setItem('navbar-color', 'navbar-light');
|
47 |
+
localStorage.setItem('button-content', themeButton.innerHTML);
|
48 |
+
}
|
49 |
+
}
|
50 |
+
|
51 |
+
// darkButton.onclick = () => {
|
52 |
+
// body.classList.replace('light', 'dark');
|
53 |
+
// nav.classList.replace('bg-light', 'bg-dark');
|
54 |
+
// nav.classList.replace('navbar-light', 'navbar-dark');
|
55 |
+
// darkButton.classList.add('active');
|
56 |
+
// darkButton.classList.add('disabled');
|
57 |
+
// lightButton.classList.remove('active');
|
58 |
+
// lightButton.classList.remove('disabled');
|
59 |
+
// };
|
60 |
+
|
61 |
+
// lightButton.onclick = () => {
|
62 |
+
// body.classList.replace('dark', 'light');
|
63 |
+
// nav.classList.replace('bg-dark', 'bg-light');
|
64 |
+
// nav.classList.replace('navbar-dark', 'navbar-light');
|
65 |
+
// lightButton.classList.add('active');
|
66 |
+
// lightButton.classList.add('disabled');
|
67 |
+
// darkButton.classList.remove('active');
|
68 |
+
// darkButton.classList.remove('disabled');
|
69 |
+
// };
|
70 |
+
|
71 |
+
|
72 |
+
const main_content = document.getElementById('main-content');
|
73 |
+
const video_title = document.getElementById('video-title');
|
74 |
+
const video_summary = document.getElementById('video-summary');
|
75 |
+
|
76 |
+
const button = document.getElementById('submit-btn');
|
77 |
+
const form = document.getElementById('url-form');
|
78 |
+
const url = document.getElementById('url')
|
79 |
+
|
80 |
+
async function getApiData(video_id) {
|
81 |
+
const response = await fetch('/video_api/' + video_id);
|
82 |
+
const jsonData = await response.json();
|
83 |
+
|
84 |
+
console.log(jsonData);
|
85 |
+
video_title.innerHTML = jsonData['title'];
|
86 |
+
return video_summary.innerHTML = jsonData['summary'];
|
87 |
+
|
88 |
+
}
|
89 |
+
|
90 |
+
|
91 |
+
|
92 |
+
form.addEventListener('submit', (e) => {
|
93 |
+
e.preventDefault();
|
94 |
+
const video_url = url.value;
|
95 |
+
if (video_url == "")
|
96 |
+
return;
|
97 |
+
const video_id = youtube_video_id(video_url);
|
98 |
+
video_title.innerHTML = 'Summarizing...';
|
99 |
+
console.log(main_content.classList);
|
100 |
+
//main_content.classList.remove('visually-hidden');
|
101 |
+
main_content.style.clipPath = 'circle(200% at 50% 50%)';
|
102 |
+
video_summary.innerHTML = '<div class="progress" role="progressbar" aria-label="Animated striped example" aria-valuenow="75" aria-valuemin="0" aria-valuemax="100"> \
|
103 |
+
<div class="progress-bar progress-bar-striped progress-bar-animated" style="width: 100%"></div> \
|
104 |
+
</div>';
|
105 |
+
if (video_id == false) {
|
106 |
+
video_title.innerHTML = '[Error]';
|
107 |
+
video_summary.innerHTML = 'Invalid video URL';
|
108 |
+
return;
|
109 |
+
}
|
110 |
+
try {
|
111 |
+
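getApiData(video_id).catch(() => { video_title.innerHTML = '[Error]'; video_summary.innerHTML = 'Error Video not found'; });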
|
112 |
+
}
|
113 |
+
catch {
|
114 |
+
video_title.innerHTML = '[Error]'
|
115 |
+
video_summary.innerHTML = 'Error Video not found';
|
116 |
+
}
|
117 |
+
});
|
118 |
+
|
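The front end extracts the video ID with `youtube_video_id`, while `get_video_id` in `utils.py` does the same on the server. A server-side counterpart that handles the common URL shapes could look like the sketch below. This is an assumption-laden illustration, not the commit's code: the name `extract_video_id` and the list of accepted hosts are hypothetical, and only `watch`, `embed`, and `youtu.be` links are covered.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Assumption: the hosts worth accepting; extend as needed.
YOUTUBE_HOSTS = ("youtube.com", "www.youtube.com", "m.youtube.com")


def extract_video_id(url: str) -> Optional[str]:
    """Return the 11-character video ID from a YouTube URL, or None."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()

    if host == "youtu.be":
        # Short links carry the ID as the first path segment.
        candidate = parsed.path.lstrip("/").split("/")[0]
    elif host in YOUTUBE_HOSTS and parsed.path == "/watch":
        # Standard links carry the ID in the `v` query parameter.
        candidate = parse_qs(parsed.query).get("v", [""])[0]
    elif host in YOUTUBE_HOSTS and parsed.path.startswith("/embed/"):
        # Embed links carry the ID right after /embed/.
        candidate = parsed.path.split("/")[2]
    else:
        candidate = ""

    # Same length check as the JavaScript helper.
    return candidate if len(candidate) == 11 else None
```

Returning `None` instead of raising keeps the caller's error handling simple: the route can answer with a 400 whenever no ID is found.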
TranscriptApi/static/images/background-dark.svg
ADDED
TranscriptApi/static/images/background-light.svg
ADDED
TranscriptApi/static/styles.css
ADDED
@@ -0,0 +1,125 @@
|
1 |
+
.dark {
|
2 |
+
/* --bg : #353941; */
|
3 |
+
--heading-bg : #26282B;
|
4 |
+
--button-bg : #5F85DB;
|
5 |
+
--button-hover-bg : #90B8F8;
|
6 |
+
--text-color : white;
|
7 |
+
--rev-text-color : black;
|
8 |
+
|
9 |
+
--bg : url('images/background-dark.svg');
|
10 |
+
}
|
11 |
+
|
12 |
+
|
13 |
+
.light {
|
14 |
+
/* --bg : #448EF6; */
|
15 |
+
--heading-bg : #75C2F6;
|
16 |
+
--button-bg : #65DAF7;
|
17 |
+
--button-hover-bg : #FFE981;
|
18 |
+
--text-color : black;
|
19 |
+
--rev-text-color : white;
|
20 |
+
|
21 |
+
--bg : url('images/background-light.svg');
|
22 |
+
}
|
23 |
+
|
24 |
+
nav {
|
25 |
+
transition: all 200ms ease-in-out;
|
26 |
+
transition-delay : 0ms;
|
27 |
+
}
|
28 |
+
|
29 |
+
body {
|
30 |
+
background : var(--bg);
|
31 |
+
background-size: cover;
|
32 |
+
transition: background 200ms ease-in-out, color 1000ms ease-in-out;
|
33 |
+
/* overflow: hidden; */
|
34 |
+
}
|
35 |
+
|
36 |
+
.grid {
|
37 |
+
display: flex;
|
38 |
+
flex-direction: column;
|
39 |
+
flex-wrap: wrap;
|
40 |
+
/* gap: 1rem; */
|
41 |
+
grid-template-columns: minmax(240px, 1fr);
|
42 |
+
grid-template-rows: 240px;
|
43 |
+
margin : 10px;
|
44 |
+
padding : 20px;
|
45 |
+
}
|
46 |
+
|
47 |
+
|
48 |
+
|
49 |
+
|
50 |
+
.heading {
|
51 |
+
color : var(--text-color);
|
52 |
+
margin : minmax(10px, 100px);
|
53 |
+
padding: 50px;
|
54 |
+
text-align: center;
|
55 |
+
align-self: center;
|
56 |
+
font-family: 'Open Sans', sans-serif;
|
57 |
+
font-style: italic;
|
58 |
+
font-weight: 800;
|
59 |
+
background-color: var(--heading-bg);
|
60 |
+
border-radius: 8px;
|
61 |
+
filter: drop-shadow(.3rem .3rem 4px black);
|
62 |
+
transition: all 200ms ease-in-out;
|
63 |
+
transition-delay : 200ms;
|
64 |
+
}
|
65 |
+
|
66 |
+
.url-submit-form {
|
67 |
+
padding : 50px;
|
68 |
+
display: flex;
|
69 |
+
flex-direction: column;
|
70 |
+
align-items: center;
|
71 |
+
justify-content: center;
|
72 |
+
}
|
73 |
+
|
74 |
+
input[type = 'text'] {
|
75 |
+
text-align : center;
|
76 |
+
border: none;
|
77 |
+
}
|
78 |
+
|
79 |
+
|
80 |
+
input[type = 'text']::placeholder {
|
81 |
+
color: var(--text-color);
|
82 |
+
opacity: 0.4;
|
83 |
+
}
|
84 |
+
|
85 |
+
.btn-primary {
|
86 |
+
background-color : var(--button-bg) !important;
|
87 |
+
border-color : var(--button-bg) !important;
|
88 |
+
color : var(--text-color) !important;
|
89 |
+
}
|
90 |
+
|
91 |
+
.btn-primary:hover {
|
92 |
+
background-color: var(--button-hover-bg) !important;
|
93 |
+
border-color : var(--button-hover-bg) !important;
|
94 |
+
color : black !important;
|
95 |
+
|
96 |
+
}
|
97 |
+
|
98 |
+
|
99 |
+
.text {
|
100 |
+
/* grid-column : span 1 / auto; */
|
101 |
+
color : var(--text-color);
|
102 |
+
padding : 30px;
|
103 |
+
border: 2px solid var(--rev-text-color);
|
104 |
+
border-radius: 8px;
|
105 |
+
backdrop-filter: blur(10px);
|
106 |
+
clip-path: circle(0% at 50% 0%);
|
107 |
+
transition : all 200ms ease-in-out, clip-path 500ms ease-in-out;
|
108 |
+
transition-delay : 400ms;
|
109 |
+
}
|
110 |
+
|
111 |
+
|
112 |
+
.title {
|
113 |
+
font-family :'Lucida Sans', 'Lucida Sans Regular', 'Lucida Grande', 'Lucida Sans Unicode', Geneva, Verdana, sans-serif;
|
114 |
+
font-weight : bold;
|
115 |
+
font-size: large;
|
116 |
+
text-align: center;
|
117 |
+
}
|
118 |
+
|
119 |
+
.content {
|
120 |
+
font-family: 'Lucida Sans', 'Lucida Sans Regular', 'Lucida Grande', 'Lucida Sans Unicode', Geneva, Verdana, sans-serif;
|
121 |
+
margin: 5px;
|
122 |
+
padding : 10px;
|
123 |
+
text-align: center;
|
124 |
+
}
|
125 |
+
|
TranscriptApi/templates/home.html
ADDED
@@ -0,0 +1,78 @@
|
1 |
+
<!DOCTYPE html>
|
2 |
+
<html lang="en">
|
3 |
+
<head>
|
4 |
+
<meta charset="UTF-8">
|
5 |
+
<meta http-equiv="X-UA-Compatible" content="IE=edge">
|
6 |
+
<meta name="viewport" content="width=device-width, initial-scale=1.0">
|
7 |
+
<title>Document</title>
|
8 |
+
|
9 |
+
<link href="https://cdn.jsdelivr.net/npm/[email protected]/dist/css/bootstrap.min.css" rel="stylesheet" integrity="sha384-KK94CHFLLe+nY2dmCWGMq91rCGa5gtU4mk92HdvYe+M/SXH301p5ILy+dN9+nJOZ" crossorigin="anonymous">
|
10 |
+
<link href = "{{url_for('static', filename = 'styles.css')}}" rel = "stylesheet">
|
11 |
+
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/animate.css/4.1.1/animate.min.css">
|
12 |
+
<link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/[email protected]/font/bootstrap-icons.css">
|
13 |
+
|
14 |
+
<link rel="preconnect" href="https://fonts.googleapis.com">
|
15 |
+
<link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
|
16 |
+
<link href="https://fonts.googleapis.com/css2?family=Open+Sans:ital,wght@1,800&display=swap" rel="stylesheet">
|
17 |
+
|
18 |
+
<script defer src="https://cdn.jsdelivr.net/npm/[email protected]/dist/js/bootstrap.bundle.min.js" integrity="sha384-ENjdO4Dr2bkBIFxQpeoTz1HIcje39Wm4jDKdf19U8gI4ddQ3GYNS7NTKfAdVQSZe" crossorigin="anonymous"></script>
|
19 |
+
<script defer src = "{{url_for('static', filename = 'app.js')}}"></script>
|
20 |
+
|
21 |
+
<nav class="navbar navbar-expand-lg sticky-top" id = "navbar">
|
22 |
+
<div class="container-fluid">
|
23 |
+
<a class="navbar-brand" href="#">Video summarizer</a>
|
24 |
+
<button class="navbar-toggler" type="button" data-bs-toggle="collapse" data-bs-target="#navbarNavAltMarkup" aria-controls="navbarNavAltMarkup" aria-expanded="false" aria-label="Toggle navigation">
|
25 |
+
<span class="navbar-toggler-icon"></span>
|
26 |
+
</button>
|
27 |
+
<div class="collapse navbar-collapse" id="navbarNavAltMarkup">
|
28 |
+
<div class="navbar-nav">
|
29 |
+
<button class="nav-link" aria-current="page" href="#" onclick = "location.reload();">Home</button>
|
30 |
+
<a class="nav-link" href="#" id = 'theme' style = 'transition: all 200ms ease-in-out;'>
|
31 |
+
</a>
|
32 |
+
</div>
|
33 |
+
</div>
|
34 |
+
</div>
|
35 |
+
</nav>
|
36 |
+
</head>
|
37 |
+
<body class = ''>
|
38 |
+
<section class = 'grid'>
|
39 |
+
<h1 class = 'animate__animated animate__slideInDown heading'>
|
40 |
+
YouTube Video Summarizer
|
41 |
+
</h1>
|
42 |
+
|
43 |
+
<div class = 'url-submit-form animate__animated animate__slideInUp'>
|
44 |
+
<form class="input-group mb-3" id = "url-form">
|
45 |
+
<input type="text" class="form-control hid" id = 'url' style = "background-color: var(--heading-bg); color : var(--text-color); transition : all 200ms ease; transition-delay : 300ms;" placeholder="Enter URL here">
|
46 |
+
</form>
|
47 |
+
<button class = "btn btn-primary hid" id = 'submit-btn' type = 'submit' form = "url-form">
|
48 |
+
Summarize
|
49 |
+
</button>
|
50 |
+
</div>
|
51 |
+
|
52 |
+
|
53 |
+
<div class = 'text' id = 'main-content'>
|
54 |
+
<div class = 'title'>
|
55 |
+
<strong id = 'video-title'>
|
56 |
+
Text
|
57 |
+
</strong>
|
58 |
+
<hr>
|
59 |
+
</div>
|
60 |
+
<div class = 'content' id = 'video-summary'>
|
61 |
+
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
|
62 |
+
</div>
|
63 |
+
</div>
|
64 |
+
</section>
|
65 |
+
|
66 |
+
<div class="toast" role="alert" aria-live="assertive" aria-atomic="true">
|
67 |
+
<div class="toast-header">
|
68 |
+
<img src="..." class="rounded me-2" alt="...">
|
69 |
+
<strong class="me-auto">Bootstrap</strong>
|
70 |
+
<small>11 mins ago</small>
|
71 |
+
<button type="button" class="btn-close" data-bs-dismiss="toast" aria-label="Close"></button>
|
72 |
+
</div>
|
73 |
+
<div class="toast-body">
|
74 |
+
Hello, world! This is a toast message.
|
75 |
+
</div>
|
76 |
+
</div>
|
77 |
+
</body>
|
78 |
+
</html>
|
app.py
ADDED
@@ -0,0 +1,18 @@
|
1 |
+
from TranscriptApi import create_app
|
2 |
+
from threading import Thread
|
3 |
+
|
4 |
+
app = create_app()
|
5 |
+
|
6 |
+
|
7 |
+
if __name__ == '__main__':
|
8 |
+
app.run(debug=True,host="0.0.0.0",port=5000)
|
9 |
+
|
10 |
+
|
11 |
+
# def run():
|
12 |
+
# app.run(host = "0.0.0.0", port = 8080)
|
13 |
+
|
14 |
+
# def keep_alive():
|
15 |
+
# t = Thread(target = run)
|
16 |
+
# t.start()
|
17 |
+
|
18 |
+
# keep_alive()
|
instance/site.db
ADDED
File without changes
|