Fix options.md
docs/options.md CHANGED (+15 -8)
@@ -1,5 +1,7 @@
 # Options
-To transcribe or translate an audio file, you can either copy an URL from a website (all [websites](https://github.com/yt-dlp/yt-dlp/blob/master/supportedsites.md)
+To transcribe or translate an audio file, you can either copy an URL from a website (all [websites](https://github.com/yt-dlp/yt-dlp/blob/master/supportedsites.md)
+supported by YT-DLP will work, including YouTube). Otherwise, upload an audio file (choose "All Files (*.*)"
+in the file selector to select any file type, including video files) or use the microphone.
 
 For longer audio files (>10 minutes), it is recommended that you select Silero VAD (Voice Activity Detector) in the VAD option.
 
@@ -18,12 +20,14 @@ Select the model that Whisper will use to transcribe the audio:
 
 Select the language, or leave it empty for Whisper to automatically detect it.
 
-Note that if the selected language and the language in the audio differs, Whisper may start to translate the audio to the selected
+Note that if the selected language and the language in the audio differs, Whisper may start to translate the audio to the selected
+language. For instance, if the audio is in English but you select Japaneese, the model may translate the audio to Japanese.
 
 ## Inputs
 The options "URL (YouTube, etc.)", "Upload Audio" or "Micriphone Input" allows you to send an audio input to the model.
 
-Note that the UI will only process the first valid input - i.e. if you enter both an URL and upload an audio, it will only process
+Note that the UI will only process the first valid input - i.e. if you enter both an URL and upload an audio, it will only process
+the URL.
 
 ## Task
 Select the task - either "transcribe" to transcribe the audio to text, or "translate" to translate it to English.
@@ -32,14 +36,17 @@ Select the task - either "transcribe" to transcribe the audio to text, or "trans
 * none
 * Run whisper on the entire audio input
 * silero-vad
-* Use Silero VAD to detect sections that contain speech, and run whisper on independently on each section. Whisper is also run
+* Use Silero VAD to detect sections that contain speech, and run whisper on independently on each section. Whisper is also run
+on the gaps between each speech section.
 * silero-vad-skip-gaps
-* As above, but sections that doesn't contain speech according to Silero will be skipped. This will be slightly faster, but
+* As above, but sections that doesn't contain speech according to Silero will be skipped. This will be slightly faster, but
+may cause dialogue to be skipped.
 * periodic-vad
-* Create sections of speech every 'VAD - Max Merge Size' seconds. This is very fast and simple, but will potentially break
+* Create sections of speech every 'VAD - Max Merge Size' seconds. This is very fast and simple, but will potentially break
+a sentence or word in two.
 
 ## VAD - Merge Window
-If set, any adjacent speech sections that are at most this number of seconds apart will be automatically merged.
+If set, any adjacent speech sections that are at most this number of seconds apart will be automatically merged.
 
 ## VAD - Max Merge Size (s)
-Disables merging of adjacent speech sections if they are this number of seconds long.
+Disables merging of adjacent speech sections if they are this number of seconds long.
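
The "VAD - Merge Window" and "VAD - Max Merge Size (s)" options described in the diff above can be read as a simple interval-merging rule. The following is a minimal sketch of one possible interpretation, not the project's actual code: the function name `merge_sections`, its parameters, and the `(start, end)` tuple representation are all hypothetical. It also assumes that "Max Merge Size" means a merge is skipped when the combined section would be longer than that many seconds, which the documentation text does not state explicitly.

```python
from typing import List, Tuple

# A speech section as (start, end), both in seconds (assumed representation).
Section = Tuple[float, float]


def merge_sections(sections: List[Section],
                   merge_window: float,
                   max_merge_size: float) -> List[Section]:
    """Merge adjacent speech sections (hypothetical helper for illustration)."""
    merged: List[Section] = []
    for start, end in sorted(sections):
        if merged:
            prev_start, prev_end = merged[-1]
            gap = start - prev_end
            new_end = max(prev_end, end)
            # Assumption: merge only when the gap is at most merge_window
            # seconds and the combined section is at most max_merge_size
            # seconds long.
            if gap <= merge_window and new_end - prev_start <= max_merge_size:
                merged[-1] = (prev_start, new_end)
                continue
        merged.append((start, end))
    return merged


sections = [(0.0, 4.0), (5.0, 9.0), (20.0, 25.0)]
print(merge_sections(sections, merge_window=2.0, max_merge_size=30.0))
# prints [(0.0, 9.0), (20.0, 25.0)]
```

In this sketch the first two sections are 1 second apart and are merged, while the third starts 11 seconds after the merged section ends and is left alone. Lowering `max_merge_size` below 9 would keep the first two sections separate as well.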