🚀 We are thrilled to introduce TextImage Data Augmentation, developed in collaboration with Albumentations AI! ✨ This multimodal technique modifies document images and text simultaneously, enhancing Vision Language Models (VLMs) for high-text datasets.
The Document AI team (@Molbap, @rwightman, @danaaubakirova) at Hugging Face is developing a new multimodal data augmentation pipeline utilising both visual and textual aspects of document images.