hf-scripts / auto-exl2-upload /INSTRUCTIONS.txt

	For NVIDIA cards, install the CUDA toolkit:

	NVIDIA Maxwell or newer (latest CUDA toolkit)
	https://developer.nvidia.com/cuda-downloads?target_os=Windows&target_arch=x86_64

	NVIDIA Kepler or newer (CUDA 11.8 archive)
	https://developer.nvidia.com/cuda-11-8-0-download-archive?target_os=Windows&target_arch=x86_64


	I haven't done much testing, but on Windows you may need Visual Studio with the "Desktop development with C++" workload installed. I've gotten cl.exe errors on a previous install.
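
	A quick way to see whether the build tools mentioned above are visible from your terminal is to look them up on PATH. This is only a sketch using the Python standard library; the function names are made up for illustration and are not part of the setup scripts. Note that cl.exe is often only on PATH inside a Visual Studio developer command prompt, so "not found" does not prove it is missing.

```python
import shutil


def find_tool(name):
    """Return the full path of an executable on PATH, or None if it is not found."""
    return shutil.which(name)


def report_build_tools(tools=("nvcc", "cl")):
    """Map each tool name to its resolved path (None means not on PATH)."""
    return {tool: find_tool(tool) for tool in tools}


if __name__ == "__main__":
    # nvcc ships with the CUDA toolkit; cl is the Visual Studio C++ compiler.
    for tool, path in report_build_tools().items():
        print(f"{tool}: {path or 'not found on PATH'}")
```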


	This may work with AMD cards, but only on Linux. I can't guarantee it, since I don't have an AMD card to test with. You may need to install ROCm before starting: https://rocm.docs.amd.com/projects/install-on-linux/en/latest/tutorial/quick-start.html



	First, set up your environment by running either windows.bat or linux.sh. If something fails during setup, delete every file and folder except windows.bat, linux.sh, and exl2-quant.py, then try again.
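
	The cleanup after a failed setup could be scripted roughly like this. It is a sketch, not something shipped with the scripts: the function name and the dry_run switch are hypothetical, and the keep list simply mirrors the three files named above (add INSTRUCTIONS.txt to it if you want to keep this file too). It defaults to a dry run so nothing is deleted until you ask for it.

```python
import shutil
from pathlib import Path

# Files the instructions say to keep; everything else in the folder is removed.
KEEP = {"windows.bat", "linux.sh", "exl2-quant.py"}


def clean_failed_setup(folder, keep=KEEP, dry_run=True):
    """Delete every entry in `folder` whose name is not in `keep`.

    Returns the sorted names that were (or, with dry_run=True, would be) removed.
    """
    removed = []
    for entry in sorted(Path(folder).iterdir()):
        if entry.name in keep:
            continue
        removed.append(entry.name)
        if dry_run:
            continue
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)  # remove a directory and its contents
        else:
            entry.unlink()  # remove a regular file or a symlink
    return removed


if __name__ == "__main__":
    # Dry run: only lists what would be deleted from the current folder.
    print(clean_failed_setup("."))
```

	Run it from the script folder, check the printed list, and only then call clean_failed_setup(".", dry_run=False).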

	Once setup is complete, you'll have a file called start-quant. Use it to run the quant script.

	Make sure you have free storage equal to 3x the model's size. To estimate this, take the number of parameters in billions and multiply by two to get the approximate model size in GB, then multiply that by 3 to get the recommended storage. You may be able to get away with 2.5x the size.
	Make sure you also have plenty of RAM; how much you need depends on the model.
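
	The storage rule of thumb above works out like this. The helper names are made up for illustration, and the free-space check uses the standard library's shutil.disk_usage on whatever path you pass in. Treat the result as a rough estimate, nothing more.

```python
import shutil


def model_size_gb(params_billion):
    """Approximate model size in GB: billions of parameters times two."""
    return params_billion * 2


def recommended_storage_gb(params_billion, factor=3.0):
    """Recommended free storage in GB: model size times 3 (2.5 may be enough)."""
    return model_size_gb(params_billion) * factor


def has_enough_space(params_billion, path=".", factor=3.0):
    """Compare free space on the drive holding `path` with the recommendation."""
    free_gb = shutil.disk_usage(path).free / 1e9
    return free_gb >= recommended_storage_gb(params_billion, factor)


if __name__ == "__main__":
    # Example: a 7B model is about 14 GB, so plan for about 42 GB.
    print(model_size_gb(7), recommended_storage_gb(7), has_enough_space(7))
```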

	If you close the terminal or it crashes, check the last BPW it was on and enter only the remaining quants you wanted. It should pick up where it left off. Don't enter a BPW that has already finished, as it will start from the beginning. You may also press Ctrl+C to pause at any time during the quant process.
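
	To work out what to re-enter after a crash, you can slice your original list at the BPW that was in progress. This sketch assumes the quants run in the order you entered them and that the interrupted BPW is entered again so it can resume; both are my reading of the note above, and the function is hypothetical, not part of exl2-quant.py.

```python
def remaining_bpw(requested, last_started):
    """Return the BPW values still to enter after an interruption.

    `requested` is the original list in the order it was entered and
    `last_started` is the BPW that was running when the terminal closed.
    Finished values (those before `last_started`) are left out.
    """
    if last_started not in requested:
        raise ValueError(f"{last_started} was not in the requested list")
    return requested[requested.index(last_started):]


if __name__ == "__main__":
    # 8.0 and 6.0 finished, 4.0 was interrupted, 3.0 never started.
    print(remaining_bpw([8.0, 6.0, 4.0, 3.0], 4.0))
```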

	Things may break in the future, because setup downloads the latest version of every dependency, and those may change names or behavior. If something breaks, please open a discussion at https://huggingface.co/Anthonyg5005/hf-scripts/discussions


	Credit to turboderp for creating exllamav2 and the exl2 quantization method.
	https://github.com/turboderp

	Credit to oobabooga for the original download and safetensors scripts.
	https://github.com/oobabooga

	Credit to Lucain Pouget for maintaining huggingface-hub.
	https://github.com/Wauplin

	Only tested with CUDA 12.1 on Windows 11. Linux was half-tested through WSL2: quantization did start, but I don't have enough RAM to test it fully.