
llama-2-7b-ov

Description

This is the Llama-2-7b model converted to the OpenVINO™ IR (Intermediate Representation) format, so it can be run efficiently with the OpenVINO runtime via Optimum Intel.
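
The exact conversion procedure is not documented here; below is a minimal sketch of one way to produce such an IR with Optimum Intel, assuming access to the gated meta-llama/Llama-2-7b-hf checkpoint on the Hugging Face Hub (the source model ID is an assumption):

from optimum.intel.openvino import OVModelForCausalLM

# export=True converts the PyTorch weights to OpenVINO IR on the fly
# (assumes the gated meta-llama checkpoint is accessible).
ov_model = OVModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf", export=True
)

# Persist the IR files (openvino_model.xml / .bin) for later reuse.
ov_model.save_pretrained("llama-2-7b-ov")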

Running Model Inference with Optimum Intel

  1. Install the packages required to use the Optimum Intel integration with the OpenVINO backend:
pip install optimum[openvino]
  2. Run model inference:
from transformers import AutoTokenizer
from optimum.intel.openvino import OVModelForCausalLM

model_name = "OjasPatil/intel-llama2-7b-ov"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# Loads the OpenVINO IR model and compiles it for the default device (CPU).
base_model = OVModelForCausalLM.from_pretrained(model_name)

message = "What is Intel OpenVINO?"
# Wrap the message in Llama-2's [INST] ... [/INST] instruction tags.
prompt = f"[INST] {message} [/INST]"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = base_model.generate(**inputs, max_new_tokens=50)
# Decode the full sequence, then strip the echoed prompt to keep only the reply.
response = tokenizer.decode(outputs[0], skip_special_tokens=True).replace(prompt + " ", "")
print(response)
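
The string replace above depends on the decoded text echoing the prompt exactly. A more robust sketch, relying only on the fact that generate() for a causal LM returns the prompt tokens followed by the newly generated tokens, is to slice the output by position before decoding (variable names here are illustrative):

# Alternative: drop the prompt tokens by position instead of string matching.
prompt_length = inputs["input_ids"].shape[1]
new_tokens = outputs[0][prompt_length:]
response = tokenizer.decode(new_tokens, skip_special_tokens=True).strip()
print(response)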