
Training procedure

The following bitsandbytes quantization config was used during training:

  • load_in_8bit: False
  • load_in_4bit: True
  • llm_int8_threshold: 6.0
  • llm_int8_skip_modules: None
  • llm_int8_enable_fp32_cpu_offload: False
  • llm_int8_has_fp16_weight: False
  • bnb_4bit_quant_type: nf4
  • bnb_4bit_use_double_quant: True
  • bnb_4bit_compute_dtype: bfloat16
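
The values listed above correspond to keyword arguments of `BitsAndBytesConfig` in transformers. The following is a minimal sketch of how they could be rebuilt as a config object, assuming transformers, torch and bitsandbytes are installed. The names `QUANT_CONFIG` and `build_bnb_config` are illustrative and do not come from the original training script.

```python
# The quantization settings from the list above, kept as plain Python values.
# The compute dtype is stored as a string and resolved to a torch dtype later.
QUANT_CONFIG = {
    "load_in_8bit": False,
    "load_in_4bit": True,
    "llm_int8_threshold": 6.0,
    "llm_int8_skip_modules": None,
    "llm_int8_enable_fp32_cpu_offload": False,
    "llm_int8_has_fp16_weight": False,
    "bnb_4bit_quant_type": "nf4",
    "bnb_4bit_use_double_quant": True,
    "bnb_4bit_compute_dtype": "bfloat16",
}


def build_bnb_config():
    """Build a transformers BitsAndBytesConfig from QUANT_CONFIG (hypothetical helper)."""
    # Imported lazily so the settings above can be inspected without a GPU stack.
    import torch
    from transformers import BitsAndBytesConfig

    kwargs = dict(QUANT_CONFIG)
    # Convert the dtype name ("bfloat16") into the torch dtype object.
    kwargs["bnb_4bit_compute_dtype"] = getattr(torch, kwargs["bnb_4bit_compute_dtype"])
    return BitsAndBytesConfig(**kwargs)
```

The resulting object can be passed as `quantization_config=build_bnb_config()` to a `from_pretrained` call in place of the bare `load_in_4bit=True` flag.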

Framework versions

  • PEFT 0.4.0

How to use

!pip install transformers peft accelerate bitsandbytes trl safetensors

from huggingface_hub import notebook_login
notebook_login()

import torch
from peft import AutoPeftModelForCausalLM, PeftConfig
from transformers import AutoTokenizer

peft_model_id = "akdeniz27/llama-2-7b-hf-qlora-dolly15k-turkish"
# Read the adapter config to find the name of the base model
config = PeftConfig.from_pretrained(peft_model_id)
# Load the base model in 4-bit with the adapter applied
model = AutoPeftModelForCausalLM.from_pretrained(
    peft_model_id,
    low_cpu_mem_usage=True,
    torch_dtype=torch.float16,
    load_in_4bit=True,
)
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)

prompt = "..."

input_ids = tokenizer(prompt, return_tensors="pt", truncation=True).input_ids.cuda()

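outputs = model.generate(input_ids=input_ids, max_new_tokens=100, do_sample=True, top_p=0.9, temperature=0.9)
# Decode the generated token ids back into text
print(tokenizer.decode(outputs[0], skip_special_tokens=True))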
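
For a decoder-only model, `generate` returns the prompt tokens followed by the newly generated ones, so decoding `outputs[0]` directly also echoes the prompt. A minimal sketch of one way to keep only the continuation is shown here; `new_token_ids` is a hypothetical helper name, not part of transformers or peft.

```python
def new_token_ids(sequence, prompt_length):
    """Return only the tokens generated after the prompt.

    `sequence` is one row of the `generate` output (prompt + continuation).
    `prompt_length` is the number of prompt tokens, e.g. input_ids.shape[1].
    Works on plain lists as well as 1-D tensors, since it only slices.
    """
    if prompt_length < 0:
        raise ValueError("prompt_length must be non-negative")
    return sequence[prompt_length:]


# Usage with the objects from the snippet above (assumed to be in scope):
# continuation = new_token_ids(outputs[0], input_ids.shape[1])
# print(tokenizer.decode(continuation, skip_special_tokens=True))
```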