VinaLlama2-14B Beta

GGUF Here: VinaLlama2-14B-GGUF

Top Features:

  • Context Length: 32,768 tokens.
  • VERY GOOD at reasoning, mathematics and creative writing.
  • Works with LangChain agents out of the box.
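
As a rough illustration of the 32,768-token context length, the check below verifies that a prompt plus the requested generation budget fits in the window. The helper name `fits_context` is hypothetical, and the assumption that prompt and generated tokens share a single window is the usual behavior of causal language models rather than something this card states.

```python
CONTEXT_LENGTH = 32_768  # context length stated in the feature list


def fits_context(prompt_tokens: int, max_new_tokens: int,
                 context_length: int = CONTEXT_LENGTH) -> bool:
    """Return True if prompt plus generated tokens fit in the context window."""
    if prompt_tokens < 0 or max_new_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return prompt_tokens + max_new_tokens <= context_length


# Hypothetical numbers: a 31,000-token prompt leaves room for 1,024 new
# tokens, while a 32,000-token prompt does not.
print(fits_context(31_000, 1_024))
print(fits_context(32_000, 1_024))
```

In practice, `prompt_tokens` could be taken as `len(tokenizer(text).input_ids)` for the templated prompt.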

Known Issues

  • Still struggles somewhat with Vietnamese facts (Hoang Sa & Truong Sa, historical questions).
  • May hallucinate when reasoning.
  • Can't do Vi-En/En-Vi translation (yet)!

Quick use:

VRAM Requirement: ~20GB
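
The ~20GB figure is the author's. As a rough way to see how numeric precision affects memory, the sketch below estimates the size of the weights alone from the 14.2B parameter count reported for this model. It ignores activations, the KV cache, and framework overhead, so the numbers are not expected to match the stated requirement; the helper name and the list of precisions are illustrative assumptions.

```python
def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Approximate memory for the weights alone, in decimal gigabytes."""
    return num_params * bytes_per_param / 1e9


NUM_PARAMS = 14.2e9  # parameter count reported for this model

# Bytes per parameter for a few common storage precisions (illustrative).
for label, bytes_per_param in [("bf16", 2.0), ("int8", 1.0), ("4-bit", 0.5)]:
    print(f"{label}: ~{weight_memory_gb(NUM_PARAMS, bytes_per_param):.1f} GB")
```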

pip install transformers accelerate

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
device = "cuda"  # device for the input tensors; device_map="auto" below handles model placement

model = AutoModelForCausalLM.from_pretrained(
    "vilm/VinaLlama2-14B",
    torch_dtype='auto',
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("vilm/VinaLlama2-14B")

prompt = "Một cộng một bằng mấy?"
messages = [
    {"role": "system", "content": "Bạn là trợ lí AI hữu ích."},
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(device)

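generated_ids = model.generate(
    model_inputs.input_ids,
    attention_mask=model_inputs.attention_mask,  # pass the mask explicitly
    max_new_tokens=1024,
    eos_token_id=tokenizer.eos_token_id,
    do_sample=True,  # sampling must be on for temperature to have an effect
    temperature=0.25,
)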
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

# Skip special tokens so end-of-sequence markers are not printed
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
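
The list comprehension near the end of the snippet relies on `generate` returning the prompt tokens followed by the newly generated tokens, which is how decoder-only models in transformers behave. The toy example below shows the same slicing on made-up token IDs, with no model involved; `strip_prompt` is a hypothetical helper name.

```python
def strip_prompt(input_ids, output_ids):
    """Drop the leading prompt tokens from each generated sequence."""
    return [out[len(inp):] for inp, out in zip(input_ids, output_ids)]


# Toy token IDs (made up for illustration): the output repeats the prompt
# and then appends the completion.
prompt_batch = [[101, 7, 8, 9]]
generated_batch = [[101, 7, 8, 9, 42, 43, 2]]

print(strip_prompt(prompt_batch, generated_batch))  # [[42, 43, 2]]
```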

Model size: 14.2B params · Tensor type: BF16 · Format: Safetensors