LongVU: Spatiotemporal Adaptive Compression for Long Video-Language Understanding Paper • 2410.17434 • Published about 1 month ago • 24
Patch n' Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution Paper • 2307.06304 • Published Jul 12, 2023 • 27
VampNet: Music Generation via Masked Acoustic Token Modeling Paper • 2307.04686 • Published Jul 10, 2023 • 20