Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration Rate Paper • 2410.07167 • Published Oct 9 • 37
RodinHD: High-Fidelity 3D Avatar Generation with Diffusion Models Paper • 2407.06938 • Published Jul 9 • 21
Glyph-ByT5: A Customized Text Encoder for Accurate Visual Text Rendering Paper • 2403.09622 • Published Mar 14 • 16