metadata
language:
  - en

Information

This is an Exl2-quantized version of MN-12B-Starcannon-v3.

Please refer to the original creator for more information.

Calibration dataset: Exl2 default

Branches:

  • main: Measurement files
  • 4bpw: 4 bits per weight
  • 5bpw: 5 bits per weight
  • 6bpw: 6 bits per weight

Notes

  • 6bpw is recommended for the best ratio of quality to VRAM usage (assuming you have enough VRAM).
  • Quants above 6bpw will not be created because they offer no improvement over 6bpw. If you really want them, ask someone else or make them yourself.
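
To judge whether a branch fits in VRAM, a rough weights-only estimate is parameters × bits per weight ÷ 8. The sketch below assumes about 12 billion parameters (inferred from the "12B" in the model name, not an exact count) and ignores the context cache and runtime overhead, so real usage will be higher.

```python
def estimate_weight_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


# Assumption: roughly 12e9 parameters, inferred from the "12B" in the model name.
ASSUMED_PARAMS = 12e9

if __name__ == "__main__":
    for bpw in (4, 5, 6):
        size = estimate_weight_gb(ASSUMED_PARAMS, bpw)
        print(f"{bpw}bpw: ~{size:.1f} GB of weights")
```

Under that assumption the weights alone come to roughly 6 GB, 7.5 GB, and 9 GB for the 4bpw, 5bpw, and 6bpw branches.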

Download

With async-hf-downloader, a lightweight, asynchronous Hugging Face downloader created by me:

./async-hf-downloader royallab/MN-12B-Starcannon-v3-exl2 -r 6bpw -p MN-12B-Starcannon-v3-exl2-6bpw

With the Hugging Face Hub CLI (pip install huggingface_hub):

huggingface-cli download royallab/MN-12B-Starcannon-v3-exl2 --revision 6bpw --local-dir MN-12B-Starcannon-v3-exl2-6bpw
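
The same download can be scripted from Python with snapshot_download from huggingface_hub. This is a minimal sketch: the helper names local_dir_for and download_branch are hypothetical and only mirror the directory naming used in the commands above.

```python
def local_dir_for(repo_id: str, branch: str) -> str:
    """Build a folder name like the ones used above: <repo name>-<branch>."""
    repo_name = repo_id.split("/")[-1]
    return f"{repo_name}-{branch}"


def download_branch(repo_id: str, branch: str) -> str:
    """Download one branch (revision) of the repo and return the local path."""
    # Imported lazily so the helper above works without the package installed.
    from huggingface_hub import snapshot_download  # pip install huggingface_hub

    return snapshot_download(
        repo_id=repo_id,
        revision=branch,  # branch name, e.g. "6bpw"
        local_dir=local_dir_for(repo_id, branch),
    )


if __name__ == "__main__":
    path = download_branch("royallab/MN-12B-Starcannon-v3-exl2", "6bpw")
    print(f"Model downloaded to {path}")
```

Swap "6bpw" for "4bpw" or "5bpw" to fetch a smaller quant.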

Run in TabbyAPI

TabbyAPI is a pure exllamav2 FastAPI server developed by us. You can find TabbyAPI's source code here: https://github.com/theroyallab/TabbyAPI

  1. Inside TabbyAPI's config.yml, set model_name to MN-12B-Starcannon-v3-exl2-6bpw

    1. Alternatively, pass the argument --model_name MN-12B-Starcannon-v3-exl2-6bpw on startup, or use the /v1/model/load endpoint
  2. Launch TabbyAPI inside your Python env by running ./start.bat or ./start.sh
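
For the /v1/model/load route mentioned above, a request could be built roughly as follows. Treat this as a sketch under assumptions rather than TabbyAPI's documented API: the base URL http://127.0.0.1:5000, the JSON field "name", and the x-admin-key header are all assumptions to check against TabbyAPI's own documentation, and build_load_request is a hypothetical helper.

```python
import json
import urllib.request


def build_load_request(base_url: str, model_name: str, admin_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request asking the server to load a model."""
    # Assumption: the endpoint accepts a JSON body with a "name" field.
    body = json.dumps({"name": model_name}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/model/load",
        data=body,
        headers={
            "Content-Type": "application/json",
            # Assumption: admin routes are authorized with this header.
            "x-admin-key": admin_key,
        },
        method="POST",
    )


if __name__ == "__main__":
    request = build_load_request(
        "http://127.0.0.1:5000",  # assumed host and port
        "MN-12B-Starcannon-v3-exl2-6bpw",
        "your-admin-key",  # placeholder, not a real key
    )
    print(request.get_method(), request.full_url)
```

Passing the returned object to urllib.request.urlopen would send it to a running server.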

Donate?

All my infrastructure and cloud expenses are paid out of pocket. If you'd like to donate, you can do so here: https://ko-fi.com/kingbri

You should not feel obligated to donate, but if you do, I'd appreciate it.