The translation model is now compatible with the "Word Timestamps - Highlight Words" feature.
- app.py +32 -0
- docs/translateModel.md +14 -1
- src/utils.py +25 -9
app.py
CHANGED
@@ -716,6 +716,38 @@ class WhisperTranscriber:
                     segments_progress_listener.on_progress(idx+1, len(segments), desc=f"Process segments: {idx}/{len(segments)}")

             translationModel.release_vram()
+
+            if highlight_words and segments[0]["words"] is not None:
+                for idx, segment in enumerate(segments):
+                    text = segment["text"]
+                    words = segment["words"]
+                    total_duration = words[-1]['end'] - words[0]['start']  # Calculate the total duration of the entire sentence
+                    total_text_length = len(text)
+
+                    # Allocate lengths to each word
+                    duration_ratio_lengths = []
+                    total_allocated = 0
+                    text_idx = 0  # Track the position in the translated string
+                    for word in words:
+                        # Calculate the duration of each word as a proportion of the total time
+                        word_duration = word['end'] - word['start']
+                        duration_ratio = word_duration / total_duration
+                        duration_ratio_length = int(duration_ratio * total_text_length)
+                        duration_ratio_lengths.append(duration_ratio_length)
+                        total_allocated += duration_ratio_length
+
+                    # Distribute remaining characters to avoid 0-duration_ratio_length issues
+                    remaining_chars = total_text_length - total_allocated
+                    for idx in range(remaining_chars):
+                        duration_ratio_lengths[idx % len(words)] += 1  # Distribute the remaining chars evenly
+
+                    # Generate translated words based on the calculated lengths
+                    text_idx = 0
+                    for idx, word in enumerate(words):
+                        text_part = text[text_idx:text_idx + duration_ratio_lengths[idx]]
+                        word["word"], word["word_original"] = text_part, word["word"]
+                        text_idx += duration_ratio_lengths[idx]
+
         perf_end_time = time.perf_counter()
         # Call the finished callback
         if segments_progress_listener is not None:
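Note: the block added above maps a translated sentence back onto the original word timestamps. Each word receives a slice of the translated text proportional to how long it was spoken, and the source word is preserved under `word_original`. A minimal standalone sketch of that allocation on toy data (the helper name and the example words are illustrative, not part of this commit):

import math  # not required; the allocation below uses only int() truncation

def allocate_translation(words, translated_text):
    # Total spoken time covered by the segment's words
    total_duration = words[-1]['end'] - words[0]['start']
    total_length = len(translated_text)

    # Give each word a character budget proportional to its duration
    lengths = [int(((w['end'] - w['start']) / total_duration) * total_length) for w in words]

    # Hand out the characters lost to int() truncation, round-robin,
    # so no word ends up with an empty slice
    remaining = total_length - sum(lengths)
    for i in range(remaining):
        lengths[i % len(words)] += 1

    # Slice the translated text into per-word parts
    parts, pos = [], 0
    for length in lengths:
        parts.append(translated_text[pos:pos + length])
        pos += length
    return parts

words = [{'word': 'Guten', 'start': 0.0, 'end': 0.4},
         {'word': 'Morgen', 'start': 0.4, 'end': 1.2}]
print(allocate_translation(words, "Good morning"))
# ['Good', ' morning'] — 0.4 s vs 0.8 s of speech maps to 4 vs 8 characters

Rounding down with int() can under-allocate a few characters in total, which is why the leftovers are distributed evenly afterwards, mirroring the "remaining_chars" loop in the diff.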
docs/translateModel.md
CHANGED
@@ -5,7 +5,9 @@ The `translate` task in `Whisper` only supports translating other languages `int
 
 The larger the parameters of the Translation model, the better its translation capability is expected. However, this also requires higher computational resources and slower running speed.
 
-Currently, when the `Highlight Words timestamps` option is enabled in the Whisper `Word Timestamps options`, it cannot be used simultaneously with the Translation Model. This is because Highlight Words splits the source text, and after translation, it becomes a non-word-level string.
+The translation model is now compatible with the `Word Timestamps - Highlight Words` feature.
+
+~~Currently, when the `Highlight Words timestamps` option is enabled in the Whisper `Word Timestamps options`, it cannot be used simultaneously with the Translation Model. This is because Highlight Words splits the source text, and after translation, it becomes a non-word-level string.~~
 
 
 # Translation Model
@@ -153,6 +155,17 @@ Automatic speech recognition (ASR)
 | [facebook/seamless-m4t-large](https://huggingface.co/facebook/seamless-m4t-large) | 2.3B | 11.4 GB | float32 | N/A |
 | [facebook/seamless-m4t-v2-large](https://huggingface.co/facebook/seamless-m4t-v2-large) | 2.3B | 11.4 GB (safetensors:9.24 GB) | float32 | ≈9.2 GB |
 
+## Llama
+
+Meta developed and released the Meta Llama 3 family of large language models (LLMs). This program modifies them through prompts to function as translation models.
+
+| Name | Parameters | Size | type/quantize | Required VRAM |
+|------|------------|------|---------------|---------------|
+| [avans06/Meta-Llama-3.2-8B-Instruct-ct2-int8_float16](https://huggingface.co/avans06/Meta-Llama-3.2-8B-Instruct-ct2-int8_float16) | 8B | 8.04 GB | int8_float16 | ≈ 7.9 GB |
+| [avans06/Meta-Llama-3.1-8B-Instruct-ct2-int8_float16](https://huggingface.co/avans06/Meta-Llama-3.1-8B-Instruct-ct2-int8_float16) | 8B | 8.04 GB | int8_float16 | ≈ 7.9 GB |
+| [avans06/Meta-Llama-3-8B-Instruct-ct2-int8_float16](https://huggingface.co/avans06/Meta-Llama-3-8B-Instruct-ct2-int8_float16) | 8B | 8.04 GB | int8_float16 | ≈ 7.9 GB |
+| [jncraton/Llama-3.2-3B-Instruct-ct2-int8](https://huggingface.co/jncraton/Llama-3.2-3B-Instruct-ct2-int8) | 3B | 3.22 GB | int8 | ≈ 3.3 GB |
+
 
 # Options
 
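Note: the Llama entries in the added table are general-purpose instruct models driven by a translation prompt rather than dedicated MT models. A rough sketch of that idea using CTranslate2 and a Hugging Face tokenizer; the model path, prompt wording, and decoding options below are assumptions for illustration, not the exact ones this program uses:

import ctranslate2
from transformers import AutoTokenizer

# Assumed local directory holding one of the CTranslate2 conversions listed above;
# the tokenizer is loaded from the original Meta repo (or from the converted repo
# if it ships the tokenizer files).
model_dir = "Meta-Llama-3-8B-Instruct-ct2-int8_float16"
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")
generator = ctranslate2.Generator(model_dir, device="cuda")

def translate(text: str, target_language: str = "English") -> str:
    # Wrap the request in the chat template so the instruct model treats it as a
    # normal instruction-following turn.
    messages = [
        {"role": "system",
         "content": f"You are a translator. Translate the user's text into {target_language} "
                    "and output only the translation."},
        {"role": "user", "content": text},
    ]
    prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    tokens = tokenizer.convert_ids_to_tokens(tokenizer.encode(prompt, add_special_tokens=False))

    results = generator.generate_batch(
        [tokens],
        max_length=256,
        sampling_topk=1,                 # greedy decoding keeps subtitle output stable
        include_prompt_in_result=False,  # return only the generated continuation
    )
    return tokenizer.decode(results[0].sequences_ids[0], skip_special_tokens=True).strip()

print(translate("Guten Morgen, wie geht es dir?"))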
src/utils.py
CHANGED
@@ -155,7 +155,7 @@ def __subtitle_preprocessor_iterator(transcript: Iterator[dict], maxLineWidth: i
         subtitle_start = segment['start']
         subtitle_end = segment['end']
         text = segment['text'].strip()
-        original_text = segment['original'].strip() if 'original' in segment else None
+        text_original = segment['original'].strip() if 'original' in segment else None
 
         if len(words) == 0:
             # Prepend the longest speaker ID if available
@@ -167,8 +167,8 @@ def __subtitle_preprocessor_iterator(transcript: Iterator[dict], maxLineWidth: i
                 'end' : subtitle_end,
                 'text' : process_text(text, maxLineWidth)
             }
-            if original_text is not None and len(original_text) > 0:
-                result.update({'original': process_text(original_text, maxLineWidth)})
+            if text_original is not None and len(text_original) > 0:
+                result.update({'original': process_text(text_original, maxLineWidth)})
             yield result
 
             # We are done
@@ -181,12 +181,14 @@ def __subtitle_preprocessor_iterator(transcript: Iterator[dict], maxLineWidth: i
                 'end' : subtitle_start,
                 'word' : f"({segment_longest_speaker})"
             })
+
+        text_words = [text] if not highlight_words and text_original is not None and len(text_original) > 0 else [ this_word["word"] for this_word in words ]
 
-        text_words = [text] if not highlight_words and original_text is not None and len(original_text) > 0 else [ this_word["word"] for this_word in words ]
         subtitle_text = __join_words(text_words, maxLineWidth)
 
         # Iterate over the words in the segment
         if highlight_words:
+            text_words_original = [ this_word["word_original"] for this_word in words if "word_original" in this_word ] if text_original is not None and len(text_original) > 0 else None
             last = subtitle_start
 
             for idx, this_word in enumerate(words):
@@ -195,14 +197,17 @@ def __subtitle_preprocessor_iterator(transcript: Iterator[dict], maxLineWidth: i
 
                 if last != start:
                     # Display the text up to this point
-                    yield {
+                    result = {
                         'start': last,
                         'end' : start,
                         'text' : subtitle_text
                     }
+                    if text_original is not None and len(text_original) > 0:
+                        result.update({'original': process_text(text_original, maxLineWidth)})
+                    yield result
 
                 # Display the text with the current word highlighted
-                yield {
+                result = {
                     'start': start,
                     'end' : end,
                     'text' : __join_words(
@@ -212,15 +217,26 @@ def __subtitle_preprocessor_iterator(transcript: Iterator[dict], maxLineWidth: i
                         ]
                     , maxLineWidth)
                 }
+                if text_words_original is not None and len(text_words_original) > 0:
+                    result.update({'original': __join_words(
+                        [
+                            re.sub(r"^(\s*)(.*)$", r"\1<u>\2</u>", word_original) if subidx == idx else word_original
+                            for subidx, word_original in enumerate(text_words_original)
+                        ]
+                    , maxLineWidth)})
+                yield result
                 last = end
 
             if last != subtitle_end:
                 # Display the last part of the text
-                yield {
+                result = {
                     'start': last,
                     'end' : subtitle_end,
                     'text' : subtitle_text
                 }
+                if text_original is not None and len(text_original) > 0:
+                    result.update({'original': process_text(text_original, maxLineWidth)})
+                yield result
 
         # Just return the subtitle text
         else:
@@ -229,8 +245,8 @@ def __subtitle_preprocessor_iterator(transcript: Iterator[dict], maxLineWidth: i
                 'end' : subtitle_end,
                 'text' : subtitle_text
            }
-            if original_text is not None and len(original_text) > 0:
-                result.update({'original': process_text(original_text, maxLineWidth)})
+            if text_original is not None and len(text_original) > 0:
+                result.update({'original': process_text(text_original, maxLineWidth)})
             yield result
 
 def __join_words(words: Iterator[str], maxLineWidth: int = None):
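Note: with the changes above, the highlight-words path yields one subtitle entry per word carrying both the translated line (`text`) and the source line (`original`), each with the current word underlined. A small self-contained illustration on hypothetical word data; the `underline` helper below only mimics the `re.sub`/`__join_words` combination from the diff and is not code from this commit:

import re

# Hypothetical word-level data after the app.py step above: 'word' holds the
# translated slice, 'word_original' the source word.
words = [
    {'word': 'Good',     'word_original': 'Guten',   'start': 0.0, 'end': 0.4},
    {'word': ' morning', 'word_original': ' Morgen', 'start': 0.4, 'end': 1.2},
]

def underline(tokens, idx):
    # Underline the idx-th token, keeping its leading whitespace outside the tag
    # (same regex as in the diff), then join the tokens into one line.
    return "".join(
        re.sub(r"^(\s*)(.*)$", r"\1<u>\2</u>", tok) if i == idx else tok
        for i, tok in enumerate(tokens)
    )

for idx, word in enumerate(words):
    entry = {
        'start': word['start'],
        'end': word['end'],
        'text': underline([w['word'] for w in words], idx),
        'original': underline([w['word_original'] for w in words], idx),
    }
    print(entry)

# {'start': 0.0, 'end': 0.4, 'text': '<u>Good</u> morning', 'original': '<u>Guten</u> Morgen'}
# {'start': 0.4, 'end': 1.2, 'text': 'Good <u>morning</u>', 'original': 'Guten <u>Morgen</u>'}

In the generated subtitles the `<u>…</u>` markup underlines the word currently being spoken, now in both the translated and the original line.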