This is a quantized version of Core42's Jais-30b-chat-v3 model, packaged as a bitsandbytes 4-bit (NF4) checkpoint.
If you are using text-generation-webui, select the Transformers loader with the following settings:
- Compute dtype: bfloat16
- Quantization type: nf4
- Load in 4-bit: True
- Use double quantization: True
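In Python, the same four settings map onto a `BitsAndBytesConfig` when loading the model through transformers. The snippet below is a minimal sketch: `device_map="auto"` and `max_new_tokens=50` are illustrative choices, and passing the quantization config explicitly is optional for this repo, since the pre-quantized checkpoint already stores one.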
```python
import warnings
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

warnings.filterwarnings("ignore")
model_name = "jwnder/core42_jais-30b-chat-v3-bnb-4bit"

# The same four settings as the text-generation-webui list above; optional
# here, since the checkpoint already stores its quantization config.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name, quantization_config=bnb_config, device_map="auto", trust_remote_code=True
)
inputs = tokenizer("Testing LLM!", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```
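For interactive use, transformers' `TextStreamer` can print tokens to stdout as they are generated instead of waiting for the full sequence. This is a minimal sketch reusing the `model` and `tokenizer` loaded above; the prompt text and `max_new_tokens` value are arbitrary examples.

```python
from transformers import TextStreamer

# Streams decoded tokens as they are generated, skipping the echoed prompt.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

inputs = tokenizer("Write a short greeting.", return_tensors="pt").to(model.device)
model.generate(**inputs, streamer=streamer, max_new_tokens=64)  # value is illustrative
```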
Note: the serverless Inference API does not support model repos that contain custom code, so this model must be loaded locally with `trust_remote_code=True`, as shown above.