ylacombe (HF staff) committed c0ab532 (parent: 0f75c5c)

Update README.md

Files changed (1): README.md (+44, -27)

README.md, as updated by this commit:

You can perform all the above tasks from one single model - `SeamlessM4TModel`, but each task also has its own dedicated sub-model.
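
As a quick map from task to class, these are the dedicated sub-models; the S2ST, S2TT and T2TT ones are demonstrated later in this card, and `SeamlessM4TForTextToSpeech` (not shown below) follows the same pattern:

```python
# Dedicated task-specific classes from transformers:
from transformers import (
    SeamlessM4TForSpeechToSpeech,  # speech-to-speech translation (S2ST)
    SeamlessM4TForSpeechToText,    # speech-to-text translation (S2TT)
    SeamlessM4TForTextToSpeech,    # text-to-speech translation (T2ST)
    SeamlessM4TForTextToText,      # text-to-text translation (T2TT)
)
```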

## 🤗 Usage

First, load the processor and a checkpoint of the model:

```python
from transformers import AutoProcessor, SeamlessM4TModel

processor = AutoProcessor.from_pretrained("ylacombe/hf-seamless-m4t-medium")
model = SeamlessM4TModel.from_pretrained("ylacombe/hf-seamless-m4t-medium")
```
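
Optionally, move the model to a GPU. This is standard PyTorch usage rather than anything SeamlessM4T-specific; remember to move the processor outputs to the same device before calling `generate`:

```python
import torch

# Standard PyTorch device placement.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)
```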

You can seamlessly use this model on text or on audio to generate either translated text or translated audio.
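
For instance, the unified model can return text instead of speech from the same entry point. A minimal sketch, using the `generate_speech` flag of [`SeamlessM4TModel.generate`]:

```python
# Setting generate_speech=False skips speech synthesis and returns text tokens.
inputs = processor(text="Hello, my dog is cute", src_lang="eng", return_tensors="pt")
output_tokens = model.generate(**inputs, tgt_lang="fra", generate_speech=False)
translated_text = processor.decode(output_tokens[0].tolist()[0], skip_special_tokens=True)
```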
 
You can easily generate translated speech with [`SeamlessM4TModel.generate`]. Here is an example showing how to generate speech from English to Russian.

```python
inputs = processor(text="Hello, my dog is cute", src_lang="eng", return_tensors="pt")

audio_array = model.generate(**inputs, tgt_lang="rus")
audio_array = audio_array[0].cpu().numpy().squeeze()
```
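
If you also want the translated text alongside the generated speech, [`SeamlessM4TModel.generate`] accepts a `return_intermediate_token_ids` flag. A minimal sketch, assuming the returned object exposes `waveform` and `sequences` fields:

```python
# return_intermediate_token_ids=True returns the waveform together with the
# intermediate text tokens (output field names assumed here).
outputs = model.generate(**inputs, tgt_lang="rus", return_intermediate_token_ids=True)
audio_array = outputs.waveform[0].cpu().numpy().squeeze()
translated_text = processor.decode(outputs.sequences.tolist()[0], skip_special_tokens=True)
```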

You can also translate directly from a speech waveform. Here is an example from Arabic to English:

```python
from datasets import load_dataset

dataset = load_dataset("arabic_speech_corpus", split="test[0:1]")

audio_sample = dataset["audio"][0]["array"]

inputs = processor(audios=audio_sample, return_tensors="pt")

audio_array = model.generate(**inputs, tgt_lang="eng")
audio_array = audio_array[0].cpu().numpy().squeeze()
```
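
Note that the feature extractor expects 16kHz input audio; if your samples use a different rate, you can resample them with 🤗 Datasets before extracting the array:

```python
from datasets import Audio

# Cast the audio column so samples are decoded at 16kHz.
dataset = dataset.cast_column("audio", Audio(sampling_rate=16_000))
audio_sample = dataset["audio"][0]["array"]
```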

You can listen to the speech samples either in an ipynb notebook:

```python
from IPython.display import Audio

sampling_rate = model.config.sampling_rate
Audio(audio_array, rate=sampling_rate)
```

Or save them as a `.wav` file using a third-party library, e.g. `scipy`:

```python
import scipy.io.wavfile

sampling_rate = model.config.sampling_rate
scipy.io.wavfile.write("seamless_m4t_out.wav", rate=sampling_rate, data=audio_array)
```

#### Tips

`SeamlessM4TModel` is the top-level model for generating both speech and text, but you can also use dedicated models that perform the task without additional components, thus reducing the memory footprint. For example, you can replace the previous snippet with the model dedicated to the S2ST task:

```python
from transformers import SeamlessM4TForSpeechToSpeech
model = SeamlessM4TForSpeechToSpeech.from_pretrained("ylacombe/hf-seamless-m4t-medium")
```
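
The dedicated model is called exactly like the unified one; a minimal usage sketch, reusing the audio `inputs` prepared above:

```python
# Same generate() call pattern as SeamlessM4TModel.
audio_array = model.generate(**inputs, tgt_lang="eng")
audio_array = audio_array[0].cpu().numpy().squeeze()
```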

Similarly, you can generate translated text from text or audio files, this time using the dedicated models.

```python
from transformers import SeamlessM4TForSpeechToText
model = SeamlessM4TForSpeechToText.from_pretrained("ylacombe/hf-seamless-m4t-medium")
audio_sample = dataset["audio"][0]["array"]

inputs = processor(audios=audio_sample, return_tensors="pt")

output_tokens = model.generate(**inputs, tgt_lang="fra")
translated_text = processor.decode(output_tokens.tolist()[0], skip_special_tokens=True)
```
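
For batched inputs, you can decode every generated sequence at once with the processor's standard `batch_decode` method:

```python
# Decode all sequences in the batch in one call.
translated_texts = processor.batch_decode(output_tokens, skip_special_tokens=True)
```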

And from text:

```python
from transformers import SeamlessM4TForTextToText
model = SeamlessM4TForTextToText.from_pretrained("ylacombe/hf-seamless-m4t-medium")
inputs = processor(text="Hello, my dog is cute", src_lang="eng", return_tensors="pt")

output_tokens = model.generate(**inputs, tgt_lang="fra")
translated_text = processor.decode(output_tokens.tolist()[0], skip_special_tokens=True)
```

#### Tips