Having problems generating text

#183
by RishabBairi12

This is my code, which I am running on Kaggle:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# load the tokenizer and the model in half precision, then save a copy to the working directory
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")
model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B", torch_dtype=torch.float16)
model.save_pretrained("/kaggle/working/")

I am trying to run this inference code:

inp = tokenizer("Who was first man on moon", return_tensors="pt")
print(inp)
output = model.generate(inp, top_p=0.95, top_k=60)

# decode the output
print("Output >>> " + tokenizer.decode(output[0], skip_special_tokens=True))

This returns the following error:

KeyError Traceback (most recent call last)
File /opt/conda/lib/python3.10/site-packages/transformers/tokenization_utils_base.py:267, in BatchEncoding.__getattr__(self, item)
266 try:
--> 267 return self.data[item]
268 except KeyError:

KeyError: 'shape'

During handling of the above exception, another exception occurred:

Please help me; I am new to using LLMs. I also have some additional questions: is the model saved to disk, and how do I access it through Kaggle? What are the RAM requirements to run a 7-billion-parameter model on Kaggle, and can the free GPU option run it for inference?

Try changing your generate call to this and try again:

output = model.generate(**inp, top_p=0.95, top_k=60)

The tokenizer returns a BatchEncoding with two parts, the input IDs and the attention mask. When you pass inp positionally, generate treats the whole BatchEncoding as the input tensor and tries to read its shape attribute, which is what raises KeyError: 'shape'. The ** unpacking passes input_ids and attention_mask as keyword arguments instead.
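For reference, here is a minimal end-to-end sketch of the corrected call. The model name is the one from your post; max_new_tokens, do_sample=True, and device_map="auto" (which needs the accelerate package installed) are my additions, since top_p and top_k only take effect when sampling is enabled:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B",
    torch_dtype=torch.float16,
    device_map="auto",  # assumes accelerate is installed; places the model on the GPU if one is available
)

# inp is a BatchEncoding holding input_ids and attention_mask
inp = tokenizer("Who was first man on moon", return_tensors="pt").to(model.device)

output = model.generate(
    **inp,               # unpack input_ids and attention_mask as keyword arguments
    max_new_tokens=50,   # cap the response length (my addition)
    do_sample=True,      # required for top_p/top_k to influence decoding
    top_p=0.95,
    top_k=60,
)

print("Output >>> " + tokenizer.decode(output[0], skip_special_tokens=True))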
