---
library_name: transformers
license: apache-2.0
base_model: ibm-granite/granite-3.0-1b-a400m-base
tags:
- axolotl
- generated_from_trainer
model-index:
- name: sexy-moe-girl_400MA_1BT-ckpts
  results: []
---

[Built with Axolotl](https://github.com/axolotl-ai-cloud/axolotl)
<details><summary>See axolotl config</summary>

axolotl version: `0.4.1`

```yaml
# Weights and Biases logging config
wandb_project: sexy-moe-girl_400MA_1BT-2
# wandb_entity:
# wandb_watch: all
wandb_name: v1
# wandb_log_model:

# Model architecture config
base_model: ibm-granite/granite-3.0-1b-a400m-base
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer

# Hugging Face saving config
hub_model_id: allura-org/sexy-moe-girl_400MA_1BT-ckpts
hub_strategy: every_save
push_dataset_to_hub:
hf_use_auth_token: true

# Model checkpointing config
output_dir: out
resume_from_checkpoint:
save_steps:
saves_per_epoch: 5
save_safetensors: true
save_total_limit: 5

# Mixed precision training config
bf16: true
fp16: false
tf32: false

# Model loading config
load_in_8bit: false
load_in_4bit: false
strict: false

# Sequence config
sequence_len: 8192
s2_attention: false
sample_packing: true # true # false
eval_sample_packing: false # true
pad_to_sequence_len: false #true # false
train_on_inputs: true
group_by_length: false

# Unfrozen parameters for FFT
unfrozen_parameters:

# Dataset config
chat_template: chatml
datasets:
  - path: Fizzarolli/special-sauce
    type: sharegpt
    chat_template: chatml
#val_set_size: 0.05
# evaluation_strategy: steps
# eval_steps:
#evals_per_epoch: 5
# test_datasets:
dataset_prepared_path: last_run_prepared
shuffle_merged_datasets: true

# Training hyperparameters
num_epochs: 2
gradient_accumulation_steps: 4
micro_batch_size: 8
warmup_steps: 150
optimizer: schedule_free_adamw
lr_scheduler: constant_with_warmup
learning_rate: 0.00002
weight_decay: 0.1
max_grad_norm: 1.0
logging_steps: 1

# # Model optimization / unsloth ---- INSTALL UNSLOTH
gradient_checkpointing: unsloth
# # unsloth_cross_entropy_loss: true
# unsloth_lora_mlp: true
# unsloth_lora_qkv: true
# unsloth_lora_o: true
#plugins:
#  - axolotl.integrations.liger.LigerPlugin
#liger_rope: true
#liger_rms_norm: true
#liger_swiglu: true
#liger_fused_linear_cross_entropy: true
xformers_attention: false
flash_attention: true
sdp_attention: false

# Loss monitoring config
early_stopping_patience: false
loss_watchdog_threshold: 100.0
loss_watchdog_patience: 3

# Debug config
debug: true
seed: 1001 # 42

special_tokens:
  eos_token: "<|im_end|>"
  bos_token: "<|endoftext|>"
tokens: # these are delimiters
  - "<|im_start|>"

# Don't mess with this, it's here for accelerate and torchrun
local_rank:
```

</details>
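For reference, the effective (total) train batch size implied by the hyperparameters in the config above is 32 sequences per optimizer step. The quick check below assumes a single training process (`world_size = 1`, an assumption not stated in the config); with data parallelism the effective batch scales with the number of processes.

```python
# Sanity-check of the effective batch size implied by the Axolotl config above.
# Assumption: world_size = 1 (single GPU / single process).
micro_batch_size = 8               # per-device batch size from the config
gradient_accumulation_steps = 4    # one optimizer step every 4 micro-batches
world_size = 1                     # assumed number of training processes

effective_batch_size = micro_batch_size * gradient_accumulation_steps * world_size
print(effective_batch_size)  # -> 32, matching total_train_batch_size below
```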

# sexy-moe-girl_400MA_1BT-ckpts

This model is a fine-tuned version of [ibm-granite/granite-3.0-1b-a400m-base](https://huggingface.co/ibm-granite/granite-3.0-1b-a400m-base) on the `Fizzarolli/special-sauce` dataset (see the Axolotl config above).

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 1001
- gradient_accumulation_steps: 4
- total_train_batch_size: 32
- optimizer: schedule_free_adamw (logged by the trainer as Adam with betas=(0.9,0.999) and epsilon=1e-08)
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_steps: 150
- num_epochs: 2

### Training results

### Framework versions

- Transformers 4.45.2
- Pytorch 2.4.1+cu124
- Datasets 3.0.1
- Tokenizers 0.20.1
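For completeness, a minimal inference sketch with the framework versions above. It is not an official usage snippet for this model: it assumes the checkpoint's tokenizer ships the ChatML chat template and the special tokens declared in the Axolotl config (`<|im_start|>` as a delimiter, `<|im_end|>` as EOS), and it uses the `hub_model_id` from the config as the repository name.

```python
# Minimal inference sketch; assumptions noted in the comments.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "allura-org/sexy-moe-girl_400MA_1BT-ckpts"  # hub_model_id from the config

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,  # training ran in bf16
    device_map="auto",
)

messages = [{"role": "user", "content": "Hi! Introduce yourself."}]

# Renders the ChatML turns and leaves an open "<|im_start|>assistant" block
# when add_generation_prompt=True (assumes the template is saved with the tokenizer).
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(
    input_ids,
    max_new_tokens=256,
    eos_token_id=tokenizer.eos_token_id,  # <|im_end|>, per special_tokens in the config
)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```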