Panchovix/Venus-103b-v1.1-exl2-5bpw

5 bpw (bits per weight) quantization of Venus-103b-v1.1, for use with exllamav2.

The calibration dataset was a cleaned PIPPA dataset (https://huggingface.co/datasets/royallab/PIPPA-cleaned), the same one used on the original model card.

You can use the measurement.json from there to produce quants at other sizes.
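As a rough sketch of how a saved measurement.json can be reused, the snippet below only assembles a command line for exllamav2's convert.py script and prints it; it does not run the conversion. The flags shown (-i, -o, -cf, -m, -b) are written from memory of the exllamav2 conversion script and should be checked against the version you have installed. All paths and the 4.25 bpw target are hypothetical placeholders, not values from this repository.

```python
import shlex


def build_convert_command(model_dir, work_dir, out_dir, measurement, bpw):
    """Assemble (but do not run) an exllamav2 convert.py invocation.

    Assumed flags, to be verified against your exllamav2 checkout:
      -i   directory with the unquantized (FP16) model
      -o   scratch/working directory for the conversion job
      -cf  directory where the finished quantized model is written
      -m   existing measurement.json, so the measurement pass is skipped
      -b   target average bits per weight
    """
    return [
        "python", "convert.py",
        "-i", model_dir,
        "-o", work_dir,
        "-cf", out_dir,
        "-m", measurement,
        "-b", str(bpw),
    ]


# Hypothetical paths and target size, for illustration only.
cmd = build_convert_command(
    model_dir="models/Venus-103b-v1.1",
    work_dir="work/venus-4.25bpw",
    out_dir="models/Venus-103b-v1.1-exl2-4.25bpw",
    measurement="measurement.json",
    bpw=4.25,
)
print(shlex.join(cmd))
```

Reusing the measurement file matters because the measurement pass is the slow part of quantizing a model this large; with it in hand, each additional size only needs the quantization pass.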

Original model card

Venus 103b - version 1.1

Model Details

A result of interleaving layers of Sao10K/Euryale-1.3-L2-70B, migtissera/SynthIA-70B-v1.5, and Xwin-LM/Xwin-LM-70B-V0.1 using mergekit.
The resulting model has 120 layers and 103 billion parameters.
See mergekit-config.yml for details on the merge method used.
See the exl2-* branches for exllamav2 quantizations. The 5.65 bpw quant should fit in 80 GB of VRAM, and the 3.35 bpw quant should fit in 48 GB of VRAM.
Inspired by Goliath-120b
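The VRAM figures above can be sanity-checked with simple arithmetic: weight size is roughly parameters × bits per weight ÷ 8. The sketch below assumes the 103 billion parameter count stated above and a uniform average bit rate, and it counts weights only; the KV cache, activations, and framework overhead need additional memory, so treat the results as lower bounds rather than exact requirements.

```python
def weight_size_gb(n_params: float, bpw: float) -> float:
    """Approximate size of the quantized weights in GB (10^9 bytes)."""
    return n_params * bpw / 8 / 1e9


N_PARAMS = 103e9  # parameter count stated in the model details

# 3.35 and 5.65 bpw are the sizes mentioned above; 5.0 bpw is this repository's quant.
for bpw in (3.35, 5.0, 5.65):
    size = weight_size_gb(N_PARAMS, bpw)
    print(f"{bpw:.2f} bpw -> about {size:.1f} GB of weights")
```

By this estimate the 3.35 bpw and 5.65 bpw weights come in under 48 GB and 80 GB respectively, leaving a few GB of headroom for context in each case.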

Warning: This model will produce NSFW content!

Results

Seems to be more "talkative" than Venus-103b-v1.0 (i.e., characters speak more often in roleplays)
Sometimes struggles to pay attention to small details in the scenes
Prose seems pretty creative and more logical than Venus-120b-v1.0
