# LoNAS Model Card: lonas-bloomz-7b-math

A super-network fine-tuned from BLOOMZ-7B on unified math reasoning datasets using LoNAS.
## Model Details

### Information
- Model name: lonas-bloomz-7b-math
- Base model: BLOOMZ-7B (bigscience/bloomz-7b1)
- Domain: Math
- Subnetwork version: Super-network
- NNCF Configuration: nncf_lonas_bloomz_7b.json
### Adapter Configuration
- LoRA rank: 32
- LoRA alpha: 64
- LoRA target modules: query_key_value, dense_h_to_4h, dense_4h_to_h
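These settings map onto a standard PEFT `LoraConfig`. The sketch below is illustrative only: in the LoNAS super-network the adapter ranks are elastic (managed through the NNCF configuration above), so this captures just the maximal configuration.

```python
# Illustrative sketch of the adapter settings above, using Hugging Face PEFT.
# Assumption: LoNAS manages elastic ranks via NNCF, so this reflects only the
# maximal (rank-32) configuration of the super-network.
from peft import LoraConfig

lora_config = LoraConfig(
    r=32,           # LoRA rank
    lora_alpha=64,  # LoRA scaling factor
    target_modules=["query_key_value", "dense_h_to_4h", "dense_4h_to_h"],
    task_type="CAUSAL_LM",
)
```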
### Training Hyperparameters
- Batch size: 16
- Learning rate: 3e-4
- Epochs: 8
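For reference, these hyperparameters correspond to a plain `transformers.TrainingArguments` setup; the hedged sketch below is not the repository's training script, which may differ in optimizer, scheduler, or gradient-accumulation details.

```python
# Hedged sketch of the training hyperparameters above. The LoNAS training
# script may differ (e.g., effective batch size via gradient accumulation).
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="lonas-bloomz-7b-math",  # hypothetical output path
    per_device_train_batch_size=16,
    learning_rate=3e-4,
    num_train_epochs=8,
)
```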
## Training Data

Unified math reasoning dataset: math_10k.json (collected from the training sets of GSM8K, MAWPS, and AQuA).
## Evaluation Data

Test sets of GSM8K, AQuA, MAWPS, and SVAMP (see Evaluation Results below).
## How to use

Refer to the evaluation instructions at https://github.com/IntelLabs/Hardware-Aware-Automated-Machine-Learning/tree/main/LoNAS#evaluation:

```bash
CUDA_VISIBLE_DEVICES=${DEVICES} python run_math.py \
    --dataset_path None \
    --model_name_or_path bigscience/bloomz-7b1 \
    --lora \
    --lora_weights lonas-bloomz-7b-math \
    --nncf_config nncf_config/unified_math/nncf_lonas_bloomz_7b.json \
    --do_test \
    --output_dir lonas-bloomz-7b-math/results
```
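If you only want to run the LoRA weights as a regular adapter, without the NNCF super-network activation that `run_math.py` performs, a minimal Transformers + PEFT loading sketch might look like the following. This is an assumption for illustration, not the repository's evaluation path, and the example prompt is hypothetical.

```python
# Minimal sketch: load the adapter as a plain PEFT LoRA module, without the
# NNCF super-network activation performed by run_math.py.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloomz-7b1", torch_dtype=torch.float16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("bigscience/bloomz-7b1")

# Attach the LoNAS-trained LoRA weights as a standard adapter.
model = PeftModel.from_pretrained(base, "lonas-bloomz-7b-math")
model.eval()

# Hypothetical math prompt for a quick smoke test.
prompt = "Question: A bag has 3 red and 5 blue marbles. How many marbles in total?\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```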
## Evaluation Results

Results (accuracy, %) of the heuristic sub-network discovered from the super-network, compared with the LoRA baseline:
| Method | Total Params. | TFLOPs | GSM8K | AQuA | MAWPS | SVAMP | Average |
|---|---|---|---|---|---|---|---|
| LoRA | 7.1B | 1.8 | 17.4 | 21.3 | 70.2 | 41.0 | 37.5 |
| LoNAS | 6.1B | 1.5 | 18.6 | 22.0 | 76.5 | 31.8 | 37.2 |
## Model Sources
- Repository: https://github.com/IntelLabs/Hardware-Aware-Automated-Machine-Learning/tree/main/LoNAS
- Paper: LoNAS: Elastic Low-Rank Adapters for Efficient Large Language Models
## Citation

```bibtex
@article{munoz2024lonas,
  title   = {LoNAS: Elastic Low-Rank Adapters for Efficient Large Language Models},
  author  = {J. Pablo Munoz and Jinjie Yuan and Yi Zheng and Nilesh Jain},
  journal = {},
  year    = {2024}
}
```
## License
Apache-2.0