audio encoder...

by Imran1 - opened 13 days ago

Discussion

Imran1

13 days ago

Can I add an audio encoder to this architecture?
Presumably the model would then be able to understand audio and also speak.

CharlesCXK

DeepSeek org 13 days ago

We think this is feasible. Janus has demonstrated that the biggest bottleneck in multitasking lies in the encoding phase, rather than conflicts arising from using the same transformer. We have validated the feasibility of separating the encoding process and then using the same transformer to handle text-only understanding, multimodal understanding, and visual generation. The same applies to audio.
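
To make the idea concrete, here is a rough sketch of decoupled encoding feeding one shared backbone. It is not Janus code: the names `ENCODERS`, `encode`, `shared_backbone`, and `forward`, the feature widths, and the random linear projections are all hypothetical stand-ins, and the backbone is just a mean-pool placeholder rather than a real transformer. It only shows where an audio encoder would plug in next to the existing ones.

```python
import random

D_MODEL = 8  # shared embedding width (hypothetical value)


def make_projection(in_dim, out_dim, seed):
    """Random linear map standing in for a trained modality encoder."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]


def project(weights, features):
    """Map each feature vector into the shared embedding space."""
    return [[sum(w * x for w, x in zip(row, vec)) for row in weights] for vec in features]


# One independent encoder per modality; the input widths are assumptions.
ENCODERS = {
    "text": make_projection(4, D_MODEL, seed=0),
    "image": make_projection(6, D_MODEL, seed=1),
    "audio": make_projection(5, D_MODEL, seed=2),  # the newly added modality
}


def encode(modality, features):
    """Encode raw features with the encoder that belongs to this modality."""
    return project(ENCODERS[modality], features)


def shared_backbone(sequence):
    """Placeholder for the single shared transformer: mean-pools all tokens."""
    n = len(sequence)
    return [sum(tok[i] for tok in sequence) / n for i in range(D_MODEL)]


def forward(segments):
    """segments: list of (modality, features) pairs in input order."""
    sequence = []
    for modality, features in segments:
        sequence.extend(encode(modality, features))  # encoding stays separate
    return sequence, shared_backbone(sequence)       # processing is shared


# Three text tokens followed by two audio frames share one sequence.
tokens, pooled = forward([
    ("text", [[1.0] * 4] * 3),
    ("audio", [[0.5] * 5] * 2),
])
print(len(tokens), len(pooled))  # 5 8
```

In this sketch, adding audio only means registering one more entry in `ENCODERS`; the backbone is unchanged because every encoder emits vectors of the same width.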
