license: gemma | |
datasets: | |
- anthracite-org/stheno-filtered-v1.1 | |
base_model: google/gemma-2-2b-it | |
![](https://lh7-rt.googleusercontent.com/docsz/AD_4nXeiuCm7c8lEwEJuRey9kiVZsRn2W-b4pWlu3-X534V3YmVuVc2ZL-NXg2RkzSOOS2JXGHutDuyyNAUtdJI65jGTo8jT9Y99tMi4H4MqL44Uc5QKG77B0d6-JfIkZHFaUA71-RtjyYZWVIhqsNZcx8-OMaA?key=xt3VSDoCbmTY7o-cwwOFwQ) | |
# QuantFactory/Gemma-2-2B-Stheno-Filtered-GGUF | |
This is quantized version of [SaisExperiments/Gemma-2-2B-Stheno-Filtered](https://huggingface.co/SaisExperiments/Gemma-2-2B-Stheno-Filtered) created using llama.cpp | |
# Original Model Card | |
![image/png](https://cdn-uploads.huggingface.co/production/uploads/660e67afe23148df7ca321a5/F1TQkG-VUmlTFL-xtk3wW.png) | |
I don't have anything else so you get a cursed cat image | |
# Basic info | |
This is [anthracite-org/stheno-filtered-v1.1](https://huggingface.co/datasets/anthracite-org/stheno-filtered-v1.1) over [unsloth/gemma-2-2b-it](https://huggingface.co/unsloth/gemma-2-2b-it) | |
It saw 76.6M tokens | |
This time it took 14 hours and i'm pretty sure i've been training with the wrong prompt template X-X | |
# Training config: | |
``` | |
cutoff_len: 1024 | |
dataset: stheno-3.4 | |
dataset_dir: data | |
ddp_timeout: 180000000 | |
do_train: true | |
finetuning_type: lora | |
flash_attn: auto | |
fp16: true | |
gradient_accumulation_steps: 8 | |
include_num_input_tokens_seen: true | |
learning_rate: 5.0e-05 | |
logging_steps: 5 | |
lora_alpha: 64 | |
lora_dropout: 0 | |
lora_rank: 64 | |
lora_target: all | |
lr_scheduler_type: cosine | |
max_grad_norm: 1.0 | |
max_samples: 100000 | |
model_name_or_path: unsloth/gemma-2-2b-it | |
num_train_epochs: 3.0 | |
optim: adamw_8bit | |
output_dir: saves/Gemma-2-2B-Chat/lora/stheno | |
packing: false | |
per_device_train_batch_size: 2 | |
plot_loss: true | |
preprocessing_num_workers: 16 | |
quantization_bit: 4 | |
quantization_method: bitsandbytes | |
report_to: none | |
save_steps: 100 | |
stage: sft | |
template: gemma | |
use_unsloth: true | |
warmup_steps: 0 | |
``` | |