---
# For reference on model card metadata, see the spec: https://github.com/huggingface/hub-docs/blob/main/modelcard.md?plain=1
# Doc / guide: https://huggingface.co/docs/hub/model-cards
{{ card_data }}
---

# Model Card for mamba-2.8b-slimpj-OpenOrca_1ep

This is a finetune of mamba-2.8b-slimpj for instruction following using the OpenOrca dataset.

## Model Details

### Model Description

This is a finetune of the mamba reference model mamba-2.8b-slimpj from the paper https://arxiv.org/abs/2312.00752. It has been fine-tuned for instruction following on the OpenOrca dataset for 1 epoch.

- **Model type:** Mamba State Space Model (mamba_ssm)
- **Finetuned from model:** https://huggingface.co/state-spaces/mamba-2.8b-slimpj

## Uses

This model is intended to evaluate fine-tuning results on mamba models.

## Usage

### Prompt structure

The prompt structure used in fine-tuning is the alpaca format: `"### Human:\n%question%\n\n### AI response:\n%response%"`

## Training Details

### Training Data

https://huggingface.co/datasets/Open-Orca/OpenOrca

### Training Procedure

Trained using text-generation-webui with code from the mamba_ssm pull request.
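The alpaca-style prompt structure described above can be reproduced with a small helper. This is a minimal sketch, not part of the training code; the function name `format_prompt` is illustrative:

```python
def format_prompt(question: str, response: str = "") -> str:
    """Build a prompt in the alpaca-style format used for fine-tuning.

    For inference, leave the response empty so the model continues
    generating after the "### AI response:" header.
    """
    return f"### Human:\n{question}\n\n### AI response:\n{response}"


# Example: an inference-time prompt with no response filled in.
prompt = format_prompt("What is a state space model?")
```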
#### Training Hyperparameters

- **Training regime:** Trained in bfloat16 with the following parameters:

```
{
    "trained_model_name": "mamba-2.8b-slimpj-OpenOrc_1ep",
    "save_steps": 500000.0,
    "micro_batch_size": 4,
    "batch_size": 128,
    "epochs": 1.0,
    "learning_rate": "3e-4",
    "lr_scheduler_type": "linear",
    "cutoff_len": 256,
    "dataset": "OpenOrca",
    "eval_dataset": "None",
    "format": "openorca-format",
    "warmup_steps": 100.0,
    "optimizer": "paged_adamw_8bit",
    "hard_cut_string": "\\n\\n\\n",
    "add_eos_token": false,
    "min_chars": 0.0
}
```

Reported train_loss was 0.6762700151924311.

### Results

#### lm-evaluation-harness results for the final model

mamba_ssm (pretrained=mamba-2.8b-slimpj-OpenOrca), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: auto (32)

|    Tasks     |Version|Filter|n-shot|  Metric  | Value |   |Stderr|
|--------------|------:|------|-----:|----------|------:|---|-----:|
|arc_challenge |      1|none  |     0|acc       | 0.2594|±  |0.0128|
|              |       |none  |     0|acc_norm  | 0.2935|±  |0.0133|
|arc_easy      |      1|none  |     0|acc       | 0.4390|±  |0.0102|
|              |       |none  |     0|acc_norm  | 0.4032|±  |0.0101|
|boolq         |      2|none  |     0|acc       | 0.5801|±  |0.0086|
|lambada_openai|      1|none  |     0|perplexity|27.8582|±  |1.1183|
|              |       |none  |     0|acc       | 0.3683|±  |0.0067|
|openbookqa    |      1|none  |     0|acc       | 0.2500|±  |0.0194|
|              |       |none  |     0|acc_norm  | 0.3700|±  |0.0216|
|piqa          |      1|none  |     0|acc       | 0.6817|±  |0.0109|
|              |       |none  |     0|acc_norm  | 0.6839|±  |0.0108|
|winogrande    |      1|none  |     0|acc       | 0.5770|±  |0.0139|

#### lm-evaluation-harness results after half an epoch

mamba_ssm (pretrained=mamba-2.8b-slimpj-OpenOrca_1ep-checkpoints/checkpoint-500000), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: auto (32)

|    Tasks     |Version|Filter|n-shot|  Metric  | Value |   |Stderr|
|--------------|------:|------|-----:|----------|------:|---|-----:|
|arc_challenge |      1|none  |     0|acc       | 0.2602|±  |0.0128|
|              |       |none  |     0|acc_norm  | 0.2833|±  |0.0132|
|arc_easy      |      1|none  |     0|acc       | 0.4533|±  |0.0102|
|              |       |none  |     0|acc_norm  | 0.4125|±  |0.0101|
|boolq         |      2|none  |     0|acc       | 0.4095|±  |0.0086|
|lambada_openai|      1|none  |     0|perplexity|30.4832|±  |1.2403|
|              |       |none  |     0|acc       | 0.3551|±  |0.0067|
|openbookqa    |      1|none  |     0|acc       | 0.2420|±  |0.0192|
|              |       |none  |     0|acc_norm  | 0.3640|±  |0.0215|
|piqa          |      1|none  |     0|acc       | 0.6812|±  |0.0109|
|              |       |none  |     0|acc_norm  | 0.6730|±  |0.0109|
|winogrande    |      1|none  |     0|acc       | 0.5588|±  |0.0140|

#### Reference lm-evaluation-harness results for the base model mamba-2.8b-slimpj without fine-tuning

mamba_ssm (pretrained=mamba-2.8b-slimpj), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: auto (32)

|    Tasks     |Version|Filter|n-shot|  Metric  | Value |   |Stderr|
|--------------|------:|------|-----:|----------|------:|---|-----:|
|arc_challenge |      1|none  |     0|acc       | 0.3882|±  |0.0142|
|              |       |none  |     0|acc_norm  | 0.4155|±  |0.0144|
|arc_easy      |      1|none  |     0|acc       | 0.7264|±  |0.0091|
|              |       |none  |     0|acc_norm  | 0.6814|±  |0.0096|
|boolq         |      2|none  |     0|acc       | 0.7107|±  |0.0079|
|lambada_openai|      1|none  |     0|perplexity| 5.8770|±  |0.1881|
|              |       |none  |     0|acc       | 0.6427|±  |0.0067|
|openbookqa    |      1|none  |     0|acc       | 0.2860|±  |0.0202|
|              |       |none  |     0|acc_norm  | 0.3980|±  |0.0219|
|piqa          |      1|none  |     0|acc       | 0.7709|±  |0.0098|
|              |       |none  |     0|acc_norm  | 0.7813|±  |0.0096|
|winogrande    |      1|none  |     0|acc       | 0.6614|±  |0.0133|

#### Summary

The fine-tuned model's measured perplexity and accuracy are worse than the base model's, but it is known that this can be an effect of fine-tuning. Perplexity and accuracy improved in the second half of training, so the initial worsening was likely caused by forcing a prompt structure onto the base model, which was trained only on unstructured text. Answer quality as perceived by users has yet to be evaluated.

## Environmental Impact

- **Hardware Type:** RTX 3090
- **Hours used:** 118