---
inference: false
language:
- en
pipeline_tag: text-generation
tags:
- Yi
- llama
- llama-2
license: other
license_name: yi-license
license_link: LICENSE
datasets:
- jondurbin/airoboros-2.2.1
- lemonilia/LimaRP
---
# airoboros-2.2.1-limarpv3-y34b-exl2

ExLlamaV2 quant of [Doctor-Shotgun/airoboros-2.2.1-limarpv3-y34b](https://huggingface.co/Doctor-Shotgun/airoboros-2.2.1-limarpv3-y34b)

Branches:
- main: measurement.json calculated with 2048-token calibration rows on PIPPA
- 4.65bpw-h6: 4.65 decoder bits per weight, 6 head bits
  - ideal for 24 GB GPUs at 8k context (on my 24 GB Windows setup with Flash Attention 2, peak VRAM usage during inference with exllamav2_hf was around 23.4 GB, with 0.9 GB already in use at baseline)
- 6.0bpw-h6: 6 decoder bits per weight, 6 head bits
  - ideal for large (>24 GB) VRAM setups
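
As a rough illustration of why the two quant branches target different VRAM sizes, the sketch below estimates the size of the decoder weights from the bits-per-weight figure and shows one way to fetch a single branch. It assumes roughly 34 billion parameters (taken from the "34b" in the model name, not an exact count) and ignores the head layer, the KV cache, and runtime overhead, so real usage will be higher. `estimate_weight_gb` and `fetch_branch` are hypothetical helper names; `snapshot_download` is the `huggingface_hub` function for downloading a repo at a given revision.

```python
REPO_ID = "Doctor-Shotgun/airoboros-2.2.1-limarpv3-y34b-exl2"

# Assumption: ~34e9 parameters, inferred from the "34b" in the model name.
APPROX_PARAMS = 34e9


def estimate_weight_gb(bits_per_weight, n_params=APPROX_PARAMS):
    """Approximate decoder weight size in GB (1 GB = 1e9 bytes).

    Ignores the head layer, KV cache, and activation/runtime overhead.
    """
    return n_params * bits_per_weight / 8 / 1e9


def fetch_branch(branch):
    """Download one branch of the repo and return the local path.

    Requires the huggingface_hub package; imported lazily so the
    estimate above can be used without it installed.
    """
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=REPO_ID, revision=branch)


if __name__ == "__main__":
    for branch, bpw in [("4.65bpw-h6", 4.65), ("6.0bpw-h6", 6.0)]:
        print(f"{branch}: ~{estimate_weight_gb(bpw):.1f} GB of decoder weights")
```

Under these assumptions the 4.65bpw weights come to a little under 20 GB, which leaves a few GB on a 24 GB card for context, while the 6.0bpw weights alone exceed 24 GB. To download a branch, call for example `fetch_branch("4.65bpw-h6")`.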