F5-TTS

Running

F5-TTS / model /backbones /README.md

Upload folder using huggingface_hub

dd217c7 verified about 2 months ago

701 Bytes

	## Backbones quick introduction


	### unett.py
	- flat unet transformer
	- structure same as in e2-tts & voicebox paper except using rotary pos emb
	- update: allow possible abs pos emb & convnextv2 blocks for embedded text before concat

	### dit.py
	- adaln-zero dit
	- embedded timestep as condition
	- concatted noised_input + masked_cond + embedded_text, linear proj in
	- possible abs pos emb & convnextv2 blocks for embedded text before concat
	- possible long skip connection (first layer to last layer)

	### mmdit.py
	- sd3 structure
	- timestep as condition
	- left stream: text embedded and applied a abs pos emb
	- right stream: masked_cond & noised_input concatted and with same conv pos emb as unett