sanchit-gandhi (HF staff) committed
Commit: 55e79ed
Parent: 960623b

Update README.md

Files changed (1):
  1. README.md (+5 -4)
README.md CHANGED
@@ -157,8 +157,8 @@ This code snippet shows how to evaluate Whisper medium.en on [LibriSpeech test-c
 The Whisper model is intrinsically designed to work on audio samples of up to 30s in duration. However, by using a chunking
 algorithm, it can be used to transcribe audio samples of up to arbitrary length. This is possible through Transformers
 [`pipeline`](https://huggingface.co/docs/transformers/main_classes/pipelines#transformers.AutomaticSpeechRecognitionPipeline)
-method. Chunking is enabled by setting `chunk_length_s=30` when instantiating the pipeline. It can also be extended to
-predict utterance level timestamps by passing `return_timestamps=True`:
+method. Chunking is enabled by setting `chunk_length_s=30` when instantiating the pipeline. With chunking enabled, the pipeline
+can be run with batched inference. It can also be extended to predict sequence level timestamps by passing `return_timestamps=True`:
 
 ```python
 >>> import torch
@@ -177,15 +177,16 @@ predict utterance level timestamps by passing `return_timestamps=True`:
 >>> ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
 >>> sample = ds[0]["audio"]
 
->>> prediction = pipe(sample.copy())["text"]
+>>> prediction = pipe(sample.copy(), batch_size=8)["text"]
 " Mr. Quilter is the apostle of the middle classes, and we are glad to welcome his gospel."
 
 >>> # we can also return timestamps for the predictions
->>> prediction = pipe(sample, return_timestamps=True)["chunks"]
+>>> prediction = pipe(sample.copy(), batch_size=8, return_timestamps=True)["chunks"]
 [{'text': ' Mr. Quilter is the apostle of the middle classes and we are glad to welcome his gospel.',
  'timestamp': (0.0, 5.44)}]
 ```
 
+Refer to the blog post [ASR Chunking](https://huggingface.co/blog/asr-chunking) for more details on the chunking algorithm.
 ## Fine-Tuning
 
 The pre-trained Whisper model demonstrates a strong ability to generalise to different datasets and domains. However,
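
For context, the two hunks omit the pipeline construction that sits between them in the README (lines 165–176 of the file). Below is a minimal runnable sketch of the snippet as it reads after this commit; the pipeline construction, the device handling, and the model id `openai/whisper-medium.en` are assumptions inferred from the surrounding README ("Whisper medium.en" in the hunk context), not lines shown in this diff:

```python
>>> import torch
>>> from transformers import pipeline
>>> from datasets import load_dataset

>>> # assumed device selection (not shown in this diff)
>>> device = "cuda:0" if torch.cuda.is_available() else "cpu"

>>> # chunk_length_s=30 enables the chunking algorithm for long-form audio
>>> pipe = pipeline(
...     "automatic-speech-recognition",
...     model="openai/whisper-medium.en",  # assumed model id
...     chunk_length_s=30,
...     device=device,
... )

>>> ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
>>> sample = ds[0]["audio"]

>>> # with chunking enabled, the 30s chunks can be run through the model with batched inference
>>> prediction = pipe(sample.copy(), batch_size=8)["text"]
" Mr. Quilter is the apostle of the middle classes, and we are glad to welcome his gospel."

>>> # return_timestamps=True additionally returns sequence-level timestamps
>>> prediction = pipe(sample.copy(), batch_size=8, return_timestamps=True)["chunks"]
[{'text': ' Mr. Quilter is the apostle of the middle classes and we are glad to welcome his gospel.',
  'timestamp': (0.0, 5.44)}]
```

Passing `sample.copy()` rather than `sample` matters here: the pipeline pops fields from the input audio dict, so the second call needs a fresh copy of the original.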