---
license: mit
---
|
How to use with vLLM:
|
```python
from vllm import LLM, SamplingParams

inputs = [
    "Who is the president of US?",
    "Can you speak Indonesian?",
]

# Initialize the LLM with the AWQ-quantized model
llm = LLM(
    model="jester6136/Phi-3.5-mini-instruct-awq",
    quantization="awq",
    gpu_memory_utilization=0.9,
    max_model_len=2000,
    max_num_seqs=32,
)

sparams = SamplingParams(
    temperature=0.0,
    max_tokens=2000,
    top_p=0.95,
    top_k=40,
    repetition_penalty=1.05,
)

# Wrap each prompt in the Phi-3.5 chat format
chat_template = "<|user|>\n{input} <|end|>\n<|assistant|>"
prompts = [chat_template.format(input=prompt) for prompt in inputs]

outputs = llm.generate(prompts, sparams)

# Print out the model responses
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt}\nResponse: {generated_text}\n\n")
```
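
If you prefer not to hard-code the prompt format, you can build prompts from the tokenizer's own chat template via transformers. A minimal sketch, assuming transformers is installed and this repo ships a chat template in its tokenizer config:

```python
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

model_id = "jester6136/Phi-3.5-mini-instruct-awq"

# Build prompts with the tokenizer's chat template instead of a
# hand-written format string (assumes the repo includes a chat template)
tokenizer = AutoTokenizer.from_pretrained(model_id)
inputs = ["Who is the president of US?", "Can you speak Indonesian?"]
prompts = [
    tokenizer.apply_chat_template(
        [{"role": "user", "content": text}],
        tokenize=False,
        add_generation_prompt=True,
    )
    for text in inputs
]

llm = LLM(model=model_id, quantization="awq", max_model_len=2000)
sparams = SamplingParams(temperature=0.0, max_tokens=2000, top_p=0.95)

for output in llm.generate(prompts, sparams):
    print(f"Prompt: {output.prompt}\nResponse: {output.outputs[0].text}\n")
```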
|