---
library_name: transformers
license: apache-2.0
base_model: ibm-granite/granite-3.0-1b-a400m-base
tags:
- axolotl
- generated_from_trainer
model-index:
- name: sexy-moe-girl_400MA_1BT-ckpts
results: []
---
[Built with Axolotl](https://github.com/axolotl-ai-cloud/axolotl)

**See axolotl config**

axolotl version: `0.4.1`
```yaml
# Weights and Biases logging config
wandb_project: sexy-moe-girl_400MA_1BT-2
# wandb_entity:
# wandb_watch: all
wandb_name: v1
# wandb_log_model:
# Model architecture config
base_model: ibm-granite/granite-3.0-1b-a400m-base
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer
# Hugging Face saving config
hub_model_id: allura-org/sexy-moe-girl_400MA_1BT-ckpts
hub_strategy: every_save
push_dataset_to_hub:
hf_use_auth_token: true
# Model checkpointing config
output_dir: out
resume_from_checkpoint:
save_steps:
saves_per_epoch: 5
save_safetensors: true
save_total_limit: 5
# Mixed precision training config
bf16: true
fp16: false
tf32: false
# Model loading config
load_in_8bit: false
load_in_4bit: false
strict: false
# Sequence config
sequence_len: 8192
s2_attention: false
sample_packing: true
eval_sample_packing: false
pad_to_sequence_len: false
train_on_inputs: true
group_by_length: false
# Unfrozen parameters for FFT
unfrozen_parameters:
# Dataset config
chat_template: chatml
datasets:
- path: Fizzarolli/special-sauce
type: sharegpt
chat_template: chatml
#val_set_size: 0.05
# evaluation_strategy: steps
# eval_steps:
#evals_per_epoch: 5
# test_datasets:
dataset_prepared_path: last_run_prepared
shuffle_merged_datasets: true
# Training hyperparameters
num_epochs: 2
gradient_accumulation_steps: 4
micro_batch_size: 8
warmup_steps: 150
optimizer: schedule_free_adamw
lr_scheduler: constant_with_warmup
learning_rate: 0.00002
weight_decay: 0.1
max_grad_norm: 1.0
logging_steps: 1
# Model optimization / unsloth (requires the unsloth integration to be installed)
gradient_checkpointing: unsloth
# unsloth_cross_entropy_loss: true
# unsloth_lora_mlp: true
# unsloth_lora_qkv: true
# unsloth_lora_o: true
#plugins:
# - axolotl.integrations.liger.LigerPlugin
#liger_rope: true
#liger_rms_norm: true
#liger_swiglu: true
#liger_fused_linear_cross_entropy: true
xformers_attention: false
flash_attention: true
sdp_attention: false
# Loss monitoring config
early_stopping_patience: false
loss_watchdog_threshold: 100.0
loss_watchdog_patience: 3
# Debug config
debug: true
seed: 1001 # 42
special_tokens:
eos_token: "<|im_end|>"
bos_token: "<|endoftext|>"
tokens: # these are delimiters
- "<|im_start|>"
# Don't mess with this, it's here for accelerate and torchrun
local_rank:
```
# sexy-moe-girl_400MA_1BT-ckpts
This model is a fine-tuned version of [ibm-granite/granite-3.0-1b-a400m-base](https://huggingface.co/ibm-granite/granite-3.0-1b-a400m-base) on the Fizzarolli/special-sauce dataset (see the axolotl config above).
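As a minimal usage sketch (not part of the original card; the repo ID comes from `hub_model_id` in the config above, and the generation settings are illustrative assumptions):

```python
# Minimal inference sketch; generation settings are illustrative, not from the card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "allura-org/sexy-moe-girl_400MA_1BT-ckpts"  # hub_model_id from the config above
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype=torch.bfloat16, device_map="auto")

messages = [{"role": "user", "content": "Hello! Who are you?"}]
# chat_template: chatml in the config, so this should render <|im_start|>/<|im_end|> delimiters.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=128, do_sample=True, temperature=0.8)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```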
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
Per the axolotl config above, training used the Fizzarolli/special-sauce dataset in ShareGPT format with the ChatML chat template. No held-out evaluation split was configured (`val_set_size` is commented out).
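A hedged sketch for inspecting the training data; the split and column/key names below are assumptions about a typical ShareGPT-format dataset, not confirmed by this card:

```python
# Hypothetical inspection of the training data; field names are assumed ShareGPT conventions.
from datasets import load_dataset

ds = load_dataset("Fizzarolli/special-sauce", split="train")  # split name assumed
print(ds)  # shows the actual column names and row count

sample = ds[0]
# ShareGPT-style rows usually look like {"conversations": [{"from": "human", "value": "..."}, ...]}
for turn in sample.get("conversations", []):
    print(turn.get("from"), ":", str(turn.get("value"))[:80])
```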
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 1001
- gradient_accumulation_steps: 4
- total_train_batch_size: 32
- optimizer: schedule-free AdamW (`schedule_free_adamw`) with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_steps: 150
- num_epochs: 2
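
The total train batch size of 32 follows from micro_batch_size 8 × gradient_accumulation_steps 4 on a single device. A hedged sketch of the equivalent standalone optimizer construction, assuming the `schedulefree` package backs `schedule_free_adamw` (the Linear module below is a stand-in, not the actual model):

```python
# Hedged sketch of the schedule-free AdamW setup implied by the config above.
# Assumes the `schedulefree` package (github.com/facebookresearch/schedule_free);
# the Linear module is a stand-in for the actual Granite MoE model.
import torch.nn as nn
import schedulefree

model = nn.Linear(8, 8)  # stand-in module
optimizer = schedulefree.AdamWScheduleFree(
    model.parameters(),
    lr=2e-5,              # learning_rate
    betas=(0.9, 0.999),
    eps=1e-8,
    weight_decay=0.1,     # weight_decay
    warmup_steps=150,     # warmup is handled inside the schedule-free optimizer
)
optimizer.train()  # schedule-free optimizers must be switched between train() and eval() modes
```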
### Training results
### Framework versions
- Transformers 4.45.2
- Pytorch 2.4.1+cu124
- Datasets 3.0.1
- Tokenizers 0.20.1