Finetuned from state-spaces/mamba-2.8b on the s3nh/alpaca-dolly-instruction-only-polish instruction dataset.

Running inference with this model requires the `mamba_ssm` package:

`pip install mamba_ssm`

A more detailed explanation will follow soon.
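A minimal inference sketch, assuming a CUDA device with bfloat16 support. The repository id below is a placeholder for this model, the Alpaca-style prompt mirrors the `type: alpaca` dataset setting in the training config, and the sampling parameters are illustrative only:

```python
# Minimal inference sketch. Assumptions (not from this card): CUDA device with
# bfloat16 support; the placeholder repo id stands in for this model's Hub id.
import torch
from transformers import AutoTokenizer
from mamba_ssm.models.mixer_seq_simple import MambaLMHeadModel

repo_id = "<this-model-repo-id>"  # placeholder: replace with this model's Hub id

# The training config uses the EleutherAI/gpt-neox-20b tokenizer.
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
model = MambaLMHeadModel.from_pretrained(repo_id, device="cuda", dtype=torch.bfloat16)

# Alpaca-style prompt, matching the `type: alpaca` dataset format used for finetuning.
prompt = "### Instruction:\nOpisz krótko, czym jest architektura Mamba.\n\n### Response:\n"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to("cuda")

# max_length counts the prompt tokens; sampling parameters are illustrative.
out = model.generate(
    input_ids=input_ids,
    max_length=256,
    temperature=0.7,
    top_k=50,
    top_p=0.9,
)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```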
Axolotl config
```yaml
base_model: state-spaces/mamba-2.8b
model_type: MambaLMHeadModel
tokenizer_type: AutoTokenizer
tokenizer_config: EleutherAI/gpt-neox-20b

load_in_8bit: false
load_in_4bit: false
strict: false

datasets:
  - path: s3nh/alpaca-dolly-instruction-only-polish
    type: alpaca
dataset_prepared_path:
val_set_size: 0.0
output_dir: ./mamba

sequence_len: 1024
sample_packing: false
pad_to_sequence_len: false

wandb_project:
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:

gradient_accumulation_steps: 4
micro_batch_size: 1
num_epochs: 2
optimizer: paged_adamw_8bit
lr_scheduler: cosine
learning_rate: 5e-5

train_on_inputs: false
group_by_length: true

bf16: true
fp16: false
tf32: true

save_strategy: steps
gradient_checkpointing: false
early_stopping_patience:
resume_from_checkpoint: true
local_rank:
logging_steps: 100
xformers_attention:
flash_attention:

warmup_steps: 10
evals_per_epoch: 2
eval_table_size:
eval_table_max_new_tokens: 128
saves_per_epoch:
save_steps: 3000
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
tokens:
save_safetensors: false
```
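To reproduce the finetune, the config above can be saved to a YAML file and launched with the standard Axolotl CLI. This assumes Axolotl and accelerate are already installed; the file name `mamba.yml` is arbitrary:

```bash
accelerate launch -m axolotl.cli.train mamba.yml
```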