shambhurajp committed on
Commit
afe6b99
1 Parent(s): f738178

Created README file


Created a simple Llama 2 fine-tuned chat model trained on the mlabonne/guanaco-llama2-1k dataset. This model loads easily on Colab, as it requires only 10-13 GB of GPU RAM for inference.

Below is the code for inference.
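The snippet below references model_name, bnb_config, and device_map without defining them. Here is a minimal setup sketch; the repo id is a placeholder and the 4-bit quantization values are assumptions based on common QLoRA recipes, which is what keeps inference within the stated 10-13 GB of GPU RAM:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, pipeline

# Repo id of the fine-tuned model (hypothetical; substitute the actual one)
model_name = "shambhurajp/llama-2-7b-chat-finetune"

# 4-bit quantization config (assumed values, typical for QLoRA-style inference)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

# Place the whole model on the first GPU
device_map = {"": 0}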

# Load base model
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map=device_map
)
model.config.use_cache = False  # caching was disabled for training; True is faster for pure inference
model.config.pretraining_tp = 1

# Load LLaMA tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"  # Fix weird overflow issue with fp16 training

# Run text generation pipeline with the fine-tuned model
prompt = "How is beer manufactured?"
pipe = pipeline(task="text-generation", model=model, tokenizer=tokenizer, max_length=200)
result = pipe(f"<s>[INST] {prompt} [/INST]")  # Llama 2 chat format: a plain <s> token, not <'s'>
print(result[0]['generated_text'])
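The README metadata in the diff below tags the repo with library_name: adapter-transformers, so the uploaded weights may be a LoRA adapter rather than a merged checkpoint. If so, a minimal sketch of attaching the adapter to a base model with peft would look like this (the base checkpoint id is an assumption):

from peft import PeftModel

# Load the base checkpoint in 4-bit (assumed base model for guanaco-llama2-1k fine-tunes)
base = AutoModelForCausalLM.from_pretrained(
    "NousResearch/Llama-2-7b-chat-hf",
    quantization_config=bnb_config,
    device_map=device_map,
)
# Attach the fine-tuned LoRA adapter from this repo
model = PeftModel.from_pretrained(base, model_name)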

Files changed (1)
  1. README.md +10 -0
README.md ADDED
@@ -0,0 +1,10 @@
+ ---
+ license: mit
+ datasets:
+ - mlabonne/guanaco-llama2-1k
+ language:
+ - en
+ metrics:
+ - accuracy
+ library_name: adapter-transformers
+ ---