🔥 Level up your model training w/ GaLore + Transformers for SOTA results on consumer-grade hardware!
⬇️ 82.5% smaller optimizer-state memory footprint, without performance degradation, by projecting the weight gradients into a low-rank subspace (see the sketch after the install command).
👩🏿‍💻 Install via `pip install "transformers>=4.39.0" galore-torch`. #ProudlyGpuPoor
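To make the low-rank idea above concrete, here is a minimal, illustrative PyTorch sketch of the core trick, not the library's actual implementation: the gradient of a weight matrix is projected into a low-rank subspace, the optimizer state lives there, and the resulting update is projected back to full rank. The rank and matrix sizes below are arbitrary placeholders.

```python
import torch

# Illustrative sketch only: project a weight gradient into a low-rank subspace,
# keep the optimizer state there, then project the update back to full rank.
rank = 8                               # hypothetical projection rank
grad = torch.randn(1024, 1024)         # stand-in for a full weight-gradient matrix

# Projection matrix built from the gradient's top singular directions
U, _, _ = torch.linalg.svd(grad, full_matrices=False)
P = U[:, :rank]                        # shape (1024, rank)

low_rank_grad = P.T @ grad             # optimizer moments are stored at this (rank, 1024) size
full_rank_update = P @ low_rank_grad   # projected back before being applied to the weights
```

The memory savings come from storing the optimizer moments for the small `(rank, 1024)` matrix instead of the full `(1024, 1024)` one.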
The integration of GaLore into the training of large language models (LLMs) marks a significant advance in memory-efficient deep learning and in the democratization of AI research. By enabling the training of billion-parameter models on consumer-grade hardware, shrinking the optimizer-state memory footprint, and leveraging gradient projection techniques, GaLore opens new horizons for researchers and practitioners with limited access to high-end computational resources.
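As a starting point, here is a hedged sketch of what a GaLore-backed fine-tuning run can look like with the `Trainer` API, assuming `transformers>=4.39.0` and `galore-torch` are installed; the model id, dataset, target-module patterns, and hyperparameters are placeholders to adapt to your own setup.

```python
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # placeholder; any causal LM works
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_id)

# Tiny placeholder dataset, tokenized for causal language modeling
dataset = load_dataset("imdb", split="train[:1%]")
dataset = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True,
    remove_columns=dataset.column_names,
)

args = TrainingArguments(
    output_dir="./galore-demo",
    per_device_train_batch_size=2,
    max_steps=100,
    optim="galore_adamw",                  # GaLore-backed AdamW (requires galore-torch)
    optim_target_modules=["attn", "mlp"],  # name patterns of the linear layers to train with GaLore
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False),
)
trainer.train()
```

Only the layers whose names match `optim_target_modules` get the low-rank optimizer state; adjust the patterns to the layer names of the model you actually fine-tune.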
Under the hood there are many other improvements, thanks to extensive maintenance activity, community contributions by super active + knowledgeable volunteers ✨, and the official sponsorship by Hugging Face that makes all this possible 🤗 ❤️
We would greatly appreciate any further community contributions, be it help with refactoring, exterminating flaky tests, writing docstrings, tutorials, or new features. Don't be shy, just contact us and we'll see where this leads: https://github.com/TimDettmers/bitsandbytes/discussions