Looks like @Meta thinks we forgot they created PyTorch, so now they've open-sourced Lingua, a powerful and flexible library for training and running inference on large language models.
Things that stand out:
- Architecture: Pure PyTorch nn.Module implementation for easy customization.
- Checkpointing: Uses the new PyTorch distributed saving method (.distcp format) for flexible model reloading across different GPU configurations.
- Configuration: Utilizes data classes and YAML files for intuitive setup and modification.
- Profiling: Integrates with xFormers' profiler for automatic MFU and HFU calculation, plus memory profiling.
- Slurm Integration: Includes stool.py for seamless job launching on Slurm clusters.
Some results from @Meta to show off:
- 1B parameter models trained on 60B tokens achieve strong performance across various NLP tasks.
- 7B parameter Mamba model (trained on 200B tokens) shows competitive results with Llama 7B on benchmarks like ARC, MMLU, and BBH.
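For anyone unsure what the MFU and HFU numbers in the profiling bullet mean, here is a rough back-of-the-envelope sketch. It is not how the xFormers profiler computes them. It assumes the common approximation of about 6 FLOPs per parameter per token for training a dense transformer (8 with full activation recomputation), and the throughput figure is made up for illustration. The peak value is the published dense BF16 peak of an A100.

```python
# Rough MFU / HFU estimate. Illustrative only: the function names and the
# throughput number are assumptions, not part of Lingua or xFormers.


def train_flops_per_token(n_params: float, full_recompute: bool = False) -> float:
    """Approximate training FLOPs per token for a dense transformer.

    Assumes ~2*N FLOPs for the forward pass and ~4*N for the backward pass.
    With full activation recomputation the forward pass runs twice, so the
    hardware executes ~8*N FLOPs per token.
    """
    factor = 8.0 if full_recompute else 6.0
    return factor * n_params


def mfu(n_params: float, tokens_per_sec: float, peak_flops_per_sec: float) -> float:
    """Model FLOPs utilization: useful model FLOPs over peak hardware FLOPs."""
    return train_flops_per_token(n_params) * tokens_per_sec / peak_flops_per_sec


def hfu(n_params: float, tokens_per_sec: float, peak_flops_per_sec: float,
        full_recompute: bool) -> float:
    """Hardware FLOPs utilization: also counts recomputed forward passes."""
    executed = train_flops_per_token(n_params, full_recompute)
    return executed * tokens_per_sec / peak_flops_per_sec


# Hypothetical setup: a 1B-parameter model on one A100 (312 TFLOP/s dense BF16).
N_PARAMS = 1e9
TOKENS_PER_SEC = 20_000  # made-up per-GPU throughput
PEAK_FLOPS = 312e12

print(f"MFU: {mfu(N_PARAMS, TOKENS_PER_SEC, PEAK_FLOPS):.1%}")
print(f"HFU: {hfu(N_PARAMS, TOKENS_PER_SEC, PEAK_FLOPS, full_recompute=True):.1%}")
```

Without recomputation the two numbers coincide, and HFU only pulls ahead of MFU when the hardware does extra work that does not count as model FLOPs.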
If you're working on LLM research or looking to experiment with cutting-edge language model architectures, Lingua is definitely worth exploring.
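To give a feel for the "data classes plus YAML" configuration style mentioned above, here is a minimal sketch using only the standard library. The class names, fields, and the from_dict helper are hypothetical and are not Lingua's actual config code. The plain dict stands in for whatever a YAML loader would return after parsing a config file.

```python
from dataclasses import dataclass, field, fields, is_dataclass


@dataclass
class ModelArgs:  # hypothetical model settings
    dim: int = 512
    n_layers: int = 8


@dataclass
class TrainArgs:  # hypothetical top-level training config
    steps: int = 1000
    lr: float = 3e-4
    model: ModelArgs = field(default_factory=ModelArgs)


def from_dict(cls, values: dict):
    """Build a dataclass from a nested dict, keeping defaults for missing keys."""
    known = {f.name: f for f in fields(cls)}
    unknown = set(values) - set(known)
    if unknown:
        # Fail loudly on typos instead of silently ignoring them.
        raise KeyError(f"unknown config keys for {cls.__name__}: {sorted(unknown)}")
    kwargs = {}
    for name, value in values.items():
        field_type = known[name].type
        # Recurse into nested dataclasses such as TrainArgs.model.
        if is_dataclass(field_type) and isinstance(value, dict):
            value = from_dict(field_type, value)
        kwargs[name] = value
    return cls(**kwargs)


# Stand-in for the parsed contents of a YAML file such as:
#   lr: 0.001
#   model:
#     n_layers: 16
overrides = {"lr": 1e-3, "model": {"n_layers": 16}}
cfg = from_dict(TrainArgs, overrides)
print(cfg)
```

The appeal of this pattern is that defaults live in typed Python code, while each experiment only has to spell out the handful of values it changes.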