Edit model card

ruGPT-3.5-13B / Saiga2

LoRA адаптер для ruGPT3.5-13B обученный на коллекции датасетов Saiga.

Конфигурация: https://github.com/EvilFreelancer/impruver/blob/main/configs/ruGPT35_13B_lora.yml

Адаптер обучался на 1x RTX 4090, для этого потребовалось примерно 18.2Gb VRAM и заняло 16h 58m.

output_dir: ./models/ruGPT35_13B_lora
train_path: ./train.ruGPT35_13B.jsonl
val_path: ./val.ruGPT35_13B.jsonl

datasets:
  - name: IlyaGusev/ru_turbo_alpaca
    converter: impruver.instruction_to_messages
  - name: IlyaGusev/ru_turbo_alpaca_evol_instruct
    converter: impruver.instruction_to_messages
  - name: IlyaGusev/ru_turbo_saiga
    converter: impruver.dialog_to_messages
  - name: IlyaGusev/ru_sharegpt_cleaned
    converter: impruver.dialog_to_messages
  - name: IlyaGusev/oasst1_ru_main_branch
    converter: impruver.dialog_to_messages
  - name: lksy/ru_instruct_gpt4
    converter: impruver.converters.instruction_to_messages

model:
  class: transformers.AutoModelForCausalLM
  name: ai-forever/ruGPT-3.5-13B
  load_in_4bit: true
  load_in_8bit: false
  dtype: bf16

lora:
  r: 16
  lora_alpha: 16
  lora_dropout: 0.05
  bias: none
  target_modules: [ c_attn ]
  task_type: CAUSAL_LM

tokenizer:
  class: transformers.AutoTokenizer
  name: ai-forever/ruGPT-3.5-13B
  max_tokens_count: 1024

trainer:
  eval_strategy: steps
  save_strategy: steps
  eval_steps: 100
  save_steps: 100
  per_device_train_batch_size: 1
  per_device_eval_batch_size: 1
  gradient_accumulation_steps: 128
  logging_steps: 1
  learning_rate: 0.0002
  num_train_epochs: 2
  lr_scheduler_type: cosine
  warmup_steps: 16
  optim: adamw_8bit
  metric_for_best_model: eval_loss
  load_best_model_at_end: true
  save_total_limit: 2
  seed: 42
  remove_unused_columns: false
  max_grad_norm: 1.0
  weight_decay: 0.08
  torch_compile: false
Downloads last month
197
Inference Examples
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social visibility and check back later, or deploy to Inference Endpoints (dedicated) instead.

Model tree for evilfreelancer/ruGPT3.5-13B-lora-saiga2

Adapter
(3)
this model

Datasets used to train evilfreelancer/ruGPT3.5-13B-lora-saiga2