LLaMA 33B finetuned on wikitext_document_level with a combination of both linear and NTK-aware RoPE scaling.

Trained with alpha=4, scale=2. Definitely works for sequence lengths up to and including 4096. Might work for much longer, but I don't have the VRAM to test properly. ¯\_(ツ)_/¯
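This repo doesn't ship the scaling patch itself, so here's a rough, untested sketch of how the two schemes combine under the usual base/alpha/scale formulation. The helper name `combined_rope_inv_freq` and the attribute path in the trailing comment are assumptions for illustration, not part of this model:

```python
import torch

def combined_rope_inv_freq(dim: int, base: float = 10000.0,
                           alpha: float = 4.0, scale: float = 2.0) -> torch.Tensor:
    """Rotary inverse frequencies with NTK-aware base adjustment (alpha)
    and linear position interpolation (scale) applied together."""
    # NTK-aware scaling: stretch the rotary base so the high-frequency
    # components degrade less at long context.
    ntk_base = base * alpha ** (dim / (dim - 2))
    inv_freq = 1.0 / (ntk_base ** (torch.arange(0, dim, 2).float() / dim))
    # Linear interpolation: dividing the frequencies by `scale` is
    # equivalent to dividing the position ids by `scale`.
    return inv_freq / scale

# Assumed usage: overwrite each layer's rotary buffer before inference
# (the exact attribute path depends on your transformers version, and
# `head_dim` is a placeholder for the model's rotary dimension):
# for layer in model.model.layers:
#     layer.self_attn.rotary_emb.inv_freq = combined_rope_inv_freq(head_dim)
```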
## Training procedure
The following `bitsandbytes` quantization config was used during training (an equivalent `BitsAndBytesConfig` sketch follows the list):
- load_in_8bit: False
- load_in_4bit: True
- llm_int8_threshold: 6.0
- llm_int8_skip_modules: None
- llm_int8_enable_fp32_cpu_offload: False
- llm_int8_has_fp16_weight: False
- bnb_4bit_quant_type: nf4
- bnb_4bit_use_double_quant: True
- bnb_4bit_compute_dtype: bfloat16
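For reference, the list above maps onto roughly this `BitsAndBytesConfig`; the `llm_int8_*` entries are just library defaults and don't apply in 4-bit mode:

```python
import torch
from transformers import BitsAndBytesConfig

# 4-bit NF4 quantization with double quantization and bf16 compute,
# matching the values listed above.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
```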
### Framework versions
- PEFT 0.4.0.dev0
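A minimal, untested loading sketch, reusing `bnb_config` from above. Note that the RoPE patch described at the top of the card still has to be applied separately if you want the extended context:

```python
from peft import PeftModel
from transformers import LlamaForCausalLM, LlamaTokenizer

# Load the 4-bit base model, then attach this QLoRA adapter on top.
base = LlamaForCausalLM.from_pretrained(
    "huggyllama/llama-30b",
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = LlamaTokenizer.from_pretrained("huggyllama/llama-30b")
model = PeftModel.from_pretrained(base, "chargoddard/llama33b-s2a4-qlora")
```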
Base model: huggyllama/llama-30b