Uploaded model
- Developed by: Ashed00
- License: apache-2.0
- Finetuned from model: unsloth/Llama-3.2-1B-Instruct-bnb-4bit

This Llama model was trained 2x faster with Unsloth and Hugging Face's TRL library.
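For reference, a typical Unsloth + TRL fine-tuning setup for a checkpoint like this looks like the sketch below. This is not the author's exact training script: the LoRA settings and hyperparameters are the common Unsloth notebook defaults, and the `train.json` data file is a placeholder for a dataset whose `"text"` column holds examples rendered with the Question/Context/Answer template shown in the inference code further down.

```python
import torch
from datasets import load_dataset
from transformers import TrainingArguments
from trl import SFTTrainer
from unsloth import FastLanguageModel

max_seq_length = 2048

# Load the 4-bit base checkpoint this model was finetuned from.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Llama-3.2-1B-Instruct-bnb-4bit",
    max_seq_length = max_seq_length,
    dtype = None,        # auto-detect
    load_in_4bit = True,
)

# Attach LoRA adapters (rank/targets here are the usual Unsloth defaults,
# not confirmed values from the author's run).
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    lora_alpha = 16,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
)

# Placeholder dataset: swap in your own Q/A/context data here.
dataset = load_dataset("json", data_files = "train.json", split = "train")

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        learning_rate = 2e-4,
        num_train_epochs = 1,
        fp16 = not torch.cuda.is_bf16_supported(),
        bf16 = torch.cuda.is_bf16_supported(),
        logging_steps = 10,
        output_dir = "outputs",
    ),
)
trainer.train()
```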
Inference Code
```python
import torch
from unsloth import FastLanguageModel

prompt = """Below is given a Question and context to solve the question. Provide the answer to the question from the context.
### Question:
{}
### Context:
{}
### Answer:
{}"""

# max_seq_length / dtype / load_in_4bit were left undefined in the original
# snippet; these are the usual values (load_in_4bit matches the bnb-4bit base).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "Ashed00/Hindi_tuned_Llama-3.2-1B",
    max_seq_length = 2048,
    dtype = None,        # auto-detect (bfloat16 on Ampere+, else float16)
    load_in_4bit = True,
)
FastLanguageModel.for_inference(model)  # enable native 2x faster inference

inputs = tokenizer(
    [
        prompt.format(
            # Question (Hindi or English both work):
            "Who stopped revolt of Ballarat?",
            # Context (Hindi). English gloss: "It was crushed by British troops,
            # but the discontent prompted the colonial authorities to reform the
            # administration (notably lowering the hated mining licence fee) and
            # to extend the franchise."
            "इसे ब्रिटिश सैनिकों द्वारा कुचल दिया गया था, लेकिन असंतोष ने औपनिवेशिक अधिकारियों को प्रशासन में सुधार करने (विशेष रूप से घृणित खनन लाइसेंस शुल्क को कम करना) और मताधिकार का विस्तार करने के लिए प्रेरित किया।",
            # Leave the answer slot empty; the model fills it in.
            "",
        )
    ],
    return_tensors = "pt",
).to("cuda")

outputs = model.generate(**inputs, max_new_tokens = 500,
                         use_cache = True, temperature = 1.5, min_p = 0.1)

# The decoded output contains the full prompt, so slice off everything
# before the "### Answer:" marker.
answer = tokenizer.batch_decode(outputs)[0]
answer = answer.split("### Answer:")[-1]
print("Answer of the question is:", answer)
```
Metrics
(to be calculated)
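Until official numbers are published, the standard way to score an extractive QA model like this is SQuAD-style exact match and token-level F1. Below is a minimal sketch of those two metrics; it assumes you already have predicted and gold answer strings (none ship with this card yet), and the example strings at the bottom are placeholders.

```python
from collections import Counter

def f1_score(prediction: str, reference: str) -> float:
    """Token-level F1 between a predicted and a gold answer string."""
    pred_tokens = prediction.split()
    ref_tokens = reference.split()
    common = Counter(pred_tokens) & Counter(ref_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the answers match exactly after trimming whitespace, else 0.0."""
    return float(prediction.strip() == reference.strip())

# Placeholder usage: partial token overlap gives F1 < 1, EM = 0.
print(f1_score("ब्रिटिश सैनिकों", "ब्रिटिश सैनिकों द्वारा"))
print(exact_match("ब्रिटिश सैनिकों", "ब्रिटिश सैनिकों द्वारा"))
```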
Model tree for Ashed00/Hindi_tuned_Llama-3.2-1B
- Base model: meta-llama/Llama-3.2-1B-Instruct
- Quantized as: unsloth/Llama-3.2-1B-Instruct-bnb-4bit (the checkpoint this model was finetuned from)