---
license: apache-2.0
datasets:
- hiyouga/glaive-function-calling-v2-sharegpt
language:
- en
library_name: transformers
tags:
- llama-factory
- unsloth
base_model: h2oai/h2o-danube2-1.8b-base
---

# h2o-danube2 with ChatML template

This is a [BAdam](https://arxiv.org/abs/2404.02827 "BAdam: A Memory Efficient Full Parameter Optimization Method for Large Language Models") fine-tune of the danube2 base model. It uses the ChatML template and was trained on the [glaive-function-calling-v2](https://huggingface.co/datasets/glaiveai/glaive-function-calling-v2) dataset from [GlaiveAI](https://huggingface.co/glaiveai), converted to the [ShareGPT format](https://huggingface.co/datasets/hiyouga/glaive-function-calling-v2-sharegpt) by [hiyouga](https://huggingface.co/hiyouga) of [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory) fame.

## Template

### ChatML

```jinja2
<|im_start|>system
{{system}}
{{json_format_tools}}
<|im_end|>
<|im_start|>user
{{instruction}}<|im_end|>
<|im_start|>assistant
{{tool_call}}
<|im_end|>
<|im_start|>tool
{{response}}
<|im_end|>
```

### LLaMA-Factory

```python
_register_template(
    name="hermes_chatml",
    format_user=StringFormatter(slots=["<|im_start|>user\n{{content}}<|im_end|>\n<|im_start|>assistant\n"]),
    format_assistant=StringFormatter(slots=["{{content}}<|im_end|>\n"]),
    format_system=StringFormatter(slots=["<|im_start|>system\n{{content}}<|im_end|>\n"]),
    format_function=FunctionFormatter(slots=["\n{\"name\":\"{{name}}\", \"arguments\":{{arguments}}}\n<|im_end|>\n"]),
    format_observation=StringFormatter(slots=["<|im_start|>tool\n\n{{content}}\n<|im_end|>\n<|im_start|>assistant\n"]),
    format_tools=ToolFormatter(tool_format="chatml"),
    stop_words=["<|im_end|>"],
)
```

## BAdam config

```yaml
### model
model_name_or_path: danube2-base-chatml

### method
stage: sft
do_train: true
finetuning_type: full
use_badam: true
badam_switch_mode: ascending
badam_switch_interval: 50
badam_verbose: 1
badam_start_block: 5
seed: 404

### dataset
dataset: glaive_toolcall_100k
template: hermes_chatml
cutoff_len: 8192
overwrite_cache: false
preprocessing_num_workers: 12

### output
output_dir: glaive-tool-chatml-badam
logging_steps: 5
save_steps: 1
save_strategy: epoch
plot_loss: true
overwrite_output_dir: false

### train
per_device_train_batch_size: 2
gradient_accumulation_steps: 8
learning_rate: 0.000005
num_train_epochs: 1
lr_scheduler_type: cosine
warmup_ratio: 0.01
pure_bf16: true
flash_attn: fa2

### eval
val_size: 0.01
per_device_eval_batch_size: 1
eval_strategy: steps
eval_steps: 1000
```

### BAdam Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 0.3914        | 0.1607 | 1000 | 0.2984          |
| 0.3256        | 0.3214 | 2000 | 0.2819          |
| 0.4131        | 0.4821 | 3000 | 0.2765          |
| 0.3922        | 0.6428 | 4000 | 0.2736          |
| 0.3528        | 0.8036 | 5000 | 0.2724          |
| 0.3477        | 0.9643 | 6000 | 0.2724          |
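
## Usage

A minimal inference sketch using the ChatML layout described above. This is an illustration rather than an official snippet: `model_path` is a placeholder for this repository's id or a local checkout, the tool definition in the system prompt is made up for the example, and it assumes the tokenizer contains the `<|im_start|>`/`<|im_end|>` special tokens and that `transformers` plus `accelerate` are installed.

```python
# Minimal sketch, not an official example: model_path and the tool definition
# below are placeholders for illustration only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "path/to/this-model"  # placeholder: repo id or local checkout

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path, torch_dtype=torch.bfloat16, device_map="auto"
)

# The system turn carries the plain-text instructions plus the JSON tool
# definitions, matching the {{system}} / {{json_format_tools}} slots above.
system = (
    "You are a helpful assistant with access to the following functions. "
    "Use them if required:\n"
    '{"name": "get_weather", "description": "Get the current weather", '
    '"parameters": {"type": "object", "properties": {"location": {"type": "string"}}}}'
)
user = "What is the weather like in Berlin?"
prompt = (
    f"<|im_start|>system\n{system}\n<|im_end|>\n"
    f"<|im_start|>user\n{user}<|im_end|>\n"
    f"<|im_start|>assistant\n"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(
    **inputs,
    max_new_tokens=256,
    # stop on the ChatML end-of-turn token (assumes it is in the vocabulary)
    eos_token_id=tokenizer.convert_tokens_to_ids("<|im_end|>"),
)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=False))
```

When the model decides to call a function, the assistant turn contains the JSON object shown in the template; the tool's reply then goes back in a `<|im_start|>tool` turn before generating the next assistant message.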