"How expensive is it actually to teach a #LanguageModel German through #finetuning 💰💰💰? We get asked this quite often.
There is no one-size-fits-all answer, because, among other factors:
⏹ every fine-tuning run is different,
⏹ the hardware used can be a major cost driver,
⏹ the amount and type of training data can lengthen the run,
⏹ and the skills being trained can make the fine-tuning harder.
However, we have broken down the costs incurred for our latest fine-tune (VAGOsolutions/SauerkrautLM-Qwen-32b):
Base model: Qwen/Qwen1.5-32B
Fine-tuning goal: teach the model German
Training dataset size: 160,000 SFT samples / 110,000 DPO pairs
Training duration: 72.5 hours (2 epochs SFT / 1 epoch DPO)
GPU: 2x A100 SXM
New model: VAGOsolutions/SauerkrautLM-Qwen-32b
Total cost: 312 euros 💶
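As a quick sanity check on that figure, here is the implied hourly rate worked out in a small snippet; the per-hour price is only inferred from the numbers above and will vary by cloud provider:

```python
# Back-of-the-envelope check of the numbers above. The implied hourly rate is
# derived from our total cost, not a quoted provider price.
total_cost_eur = 312
train_hours = 72.5
num_gpus = 2

node_rate = total_cost_eur / train_hours   # ≈ 4.30 EUR per hour for the 2x A100 node
gpu_rate = node_rate / num_gpus            # ≈ 2.15 EUR per A100 GPU-hour
print(f"{node_rate:.2f} EUR/node-hour, {gpu_rate:.2f} EUR/GPU-hour")
```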
These are quite reasonable training costs considering the model now speaks passable German (previously very broken). Depending on the use case and requirements, this can be a real alternative to costly continued pre-training for adding a new language.
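For anyone who wants a feel for what such a two-stage run looks like in code, here is a minimal sketch using Hugging Face TRL. The dataset files, hyperparameters, and output paths are illustrative placeholders, not our actual SauerkrautLM training configuration, and keyword names can differ slightly between TRL versions:

```python
# Illustrative two-stage recipe (SFT followed by DPO) with Hugging Face TRL.
# Dataset files and hyperparameters are placeholders, not our actual setup.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTConfig, SFTTrainer, DPOConfig, DPOTrainer

base_model = "Qwen/Qwen1.5-32B"

# Stage 1: supervised fine-tuning (we ran 2 epochs on ~160k samples).
sft_data = load_dataset("json", data_files="german_sft.jsonl", split="train")  # hypothetical file
sft_trainer = SFTTrainer(
    model=base_model,
    train_dataset=sft_data,
    args=SFTConfig(
        output_dir="sauerkraut-sft",
        num_train_epochs=2,
        per_device_train_batch_size=1,
        gradient_accumulation_steps=16,
        bf16=True,
    ),
)
sft_trainer.train()
sft_trainer.save_model("sauerkraut-sft")

# Stage 2: preference tuning with DPO (we ran 1 epoch on ~110k pairs).
# The dataset needs "prompt", "chosen" and "rejected" columns.
dpo_data = load_dataset("json", data_files="german_dpo.jsonl", split="train")  # hypothetical file
tokenizer = AutoTokenizer.from_pretrained("sauerkraut-sft")
model = AutoModelForCausalLM.from_pretrained("sauerkraut-sft", torch_dtype=torch.bfloat16)
dpo_trainer = DPOTrainer(
    model=model,
    train_dataset=dpo_data,
    processing_class=tokenizer,  # older TRL versions use tokenizer= instead
    args=DPOConfig(
        output_dir="sauerkraut-dpo",
        num_train_epochs=1,
        per_device_train_batch_size=1,
        gradient_accumulation_steps=16,
        bf16=True,
        beta=0.1,
    ),
)
dpo_trainer.train()
```

Note that training a 32B model on two A100s generally also requires memory-saving measures (parameter-efficient adapters such as LoRA, or sharded optimizer states) on top of this sketch; those details are left out here for readability.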