audio encoder...
#3, opened by Imran1
Can I add audio encoder into this architecture?
Presumably, it would then be able to understand audio and also speak.
We think this is feasible. Janus has demonstrated that the biggest bottleneck in multitasking lies in the encoding phase, rather than conflicts arising from using the same transformer. We have validated the feasibility of separating the encoding process and then using the same transformer to handle text-only understanding, multimodal understanding, and visual generation. The same applies to audio.
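To make the idea concrete, here is a rough sketch of what "separate encoders, one shared transformer" could look like once an audio path is added. It is a toy in plain Python, not Janus code: `ModalityEncoder`, `build_sequence`, `shared_backbone`, `D_MODEL`, and the feature sizes are all hypothetical names and values chosen for illustration, and the backbone is only a placeholder for a real transformer.

```python
import random
from typing import List, Sequence, Tuple

Vector = List[float]
D_MODEL = 8  # hypothetical shared embedding width used by the backbone


class ModalityEncoder:
    """Stand-in for one modality-specific encoder plus its adaptor.

    Each modality (text, image, audio) gets its own instance with its own
    weights; only the output width (D_MODEL) is shared.
    """

    def __init__(self, feature_dim: int, seed: int) -> None:
        rng = random.Random(seed)
        self.feature_dim = feature_dim
        # Random linear projection: feature_dim -> D_MODEL (illustrative only).
        self.weights = [
            [rng.uniform(-0.1, 0.1) for _ in range(feature_dim)]
            for _ in range(D_MODEL)
        ]

    def __call__(self, frames: Sequence[Vector]) -> List[Vector]:
        tokens = []
        for frame in frames:
            if len(frame) != self.feature_dim:
                raise ValueError("unexpected feature size for this encoder")
            tokens.append(
                [sum(w * v for w, v in zip(row, frame)) for row in self.weights]
            )
        return tokens


def build_sequence(
    parts: Sequence[Tuple[ModalityEncoder, Sequence[Vector]]]
) -> List[Vector]:
    """Encode each part with its own encoder, then concatenate the tokens."""
    sequence: List[Vector] = []
    for encoder, frames in parts:
        sequence.extend(encoder(frames))
    return sequence


def shared_backbone(tokens: Sequence[Vector]) -> List[Vector]:
    """Placeholder for the shared transformer.

    It adds the sequence mean to every token, a crude stand-in for the
    cross-token mixing that attention would perform.
    """
    if not tokens:
        raise ValueError("empty sequence")
    n = len(tokens)
    mean = [sum(col) / n for col in zip(*tokens)]
    return [[t + m for t, m in zip(token, mean)] for token in tokens]


if __name__ == "__main__":
    text_encoder = ModalityEncoder(feature_dim=4, seed=0)
    audio_encoder = ModalityEncoder(feature_dim=5, seed=1)  # the new path
    text_frames = [[1.0] * 4 for _ in range(3)]
    audio_frames = [[0.5] * 5 for _ in range(2)]
    sequence = build_sequence(
        [(text_encoder, text_frames), (audio_encoder, audio_frames)]
    )
    output = shared_backbone(sequence)
    print(len(output), len(output[0]))
```

The point of the sketch is that adding audio only means adding another encoder that maps into the same embedding width. The backbone itself does not change. Producing speech would additionally need an audio output head or decoder, which is not shown here.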