Tensor size error when generating audio incrementally
#16 by severos
I am generating audio incrementally in 2-second steps (a rough sketch of the loop follows the list):
1. Generate 2 seconds from the text condition "80s pop synth guitars and heavy drums"
2. Write the result to a file
3. Read the previous file
4. Generate 2 seconds from the same text condition "80s pop synth guitars and heavy drums" plus the previous audio file
5. Repeat from step 2
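Here is roughly what the loop looks like (a minimal sketch, not my exact script; the `facebook/musicgen-small` checkpoint and the `chunk.wav` file name are placeholders):

```python
import scipy.io.wavfile
from transformers import AutoProcessor, MusicgenForConditionalGeneration

processor = AutoProcessor.from_pretrained("facebook/musicgen-small")
model = MusicgenForConditionalGeneration.from_pretrained("facebook/musicgen-small")
sampling_rate = model.config.audio_encoder.sampling_rate

# Step 1: generate ~2 seconds from the text condition alone.
inputs = processor(
    text=["80s pop synth guitars and heavy drums"],
    padding=True,
    return_tensors="pt",
)
audio_values = model.generate(**inputs, max_new_tokens=128)

for _ in range(3):
    # Step 2: write the result to a file.
    scipy.io.wavfile.write(
        "chunk.wav", rate=sampling_rate, data=audio_values[0, 0].numpy()
    )
    # Step 3: read the previous file back.
    _, previous_audio = scipy.io.wavfile.read("chunk.wav")
    # Step 4: generate again from the same text condition plus the previous audio.
    inputs = processor(
        audio=previous_audio,
        sampling_rate=sampling_rate,
        text=["80s pop synth guitars and heavy drums"],
        padding=True,
        return_tensors="pt",
    )
    audio_values = model.generate(**inputs, max_new_tokens=128)
```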
So far so good, everything works.
However, when I change the text condition in step 4 (e.g. to "80s pop track with bassy drums and synth" or "90s rock song with loud guitars and heavy drums"), a runtime error is thrown (the exact call follows the traceback):
```
Traceback (most recent call last):
  File "main.py", line 32, in <module>
    audio_values = model.generate(**inputs, max_new_tokens=128)
  File "C:\Python38\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "C:\Python38\lib\site-packages\transformers\models\musicgen\modeling_musicgen.py", line 2279, in generate
    input_ids, model_kwargs = self._prepare_decoder_input_ids_for_generation(
  File "C:\Python38\lib\site-packages\transformers\models\musicgen\modeling_musicgen.py", line 1993, in _prepare_decoder_input_ids_for_generation
    decoder_input_ids = torch.cat([decoder_input_ids_start, decoder_input_ids], dim=-1)
RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 8 but got size 4 for tensor number 1 in the list.
```
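Concretely, the only difference from the sketch above is the `text` argument of the step 4 call (again a sketch, with one of the example conditions and the same placeholder names):

```python
# Step 4, but with a changed text condition; this is the call that fails for me.
inputs = processor(
    audio=previous_audio,
    sampling_rate=sampling_rate,
    text=["80s pop track with bassy drums and synth"],
    padding=True,
    return_tensors="pt",
)
audio_values = model.generate(**inputs, max_new_tokens=128)  # raises the RuntimeError above
```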
I expected this to work without issue, since the model should treat each iteration from step 3 as a separate prompt. Am I misunderstanding something here?