Huge Performance Disparity: MPT-30B Instruct (Local Inference) vs. MPT-30B Chat (HF Demo)

#8
by ukumar - opened

I'm seeing a large quality gap when running the MPT-30B Instruct model for inference locally: the MPT-30B Chat model served through Hugging Face's demo endpoint performs significantly better. There's no demo available for the Instruct model to compare against directly, but the difference still seems larger than expected.
This is my code:
......
import torch
import transformers
from transformers import AutoTokenizer, pipeline

model = transformers.AutoModelForCausalLM.from_pretrained(
    "mosaicml/mpt-30b-instruct",   # model id is the first positional argument and must be a string
    device_map="auto",
    cache_dir=local_path,
    torch_dtype=torch.float32,
    local_files_only=True,
    trust_remote_code=True,
)

tokenizer = AutoTokenizer.from_pretrained(
    'EleutherAI/gpt-neox-20b',
    cache_dir=local_path,
    local_files_only=True,
)

pipe = pipeline(
    task,                          # task is defined earlier (e.g. "text-generation")
    model=model,
    tokenizer=tokenizer,
    # device='cuda:0',             # not needed: the model is already dispatched via device_map
    device_map="auto",
    max_length=512,                # caps prompt + generated tokens combined
    do_sample=True,
    top_p=0.9,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
    temperature=0.8,
)
Then I use LangChain for text generation, roughly as sketched below.
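This is a simplified sketch of the LangChain wiring (it assumes a langchain version where HuggingFacePipeline lives under langchain.llms; the prompt template and variable names here are placeholders, not my real ones):

from langchain import LLMChain, PromptTemplate
from langchain.llms import HuggingFacePipeline

# Wrap the transformers pipeline so LangChain can call it as an LLM.
llm = HuggingFacePipeline(pipeline=pipe)

# Placeholder template; my real prompt is more elaborate.
prompt = PromptTemplate(input_variables=["instruction"], template="{instruction}")
chain = LLMChain(llm=llm, prompt=prompt)

print(chain.run(instruction="Summarize what MPT-30B is in two sentences."))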
Also, I couldn't find the top_p and temperature values in the config files.
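In case I'm looking in the wrong place, this is roughly how I checked for generation defaults (assuming the standard transformers attributes; these sampling values may simply not be set for this repo):

# Defaults actually used by model.generate() for the loaded model.
print(model.generation_config)

# Base model config, in case sampling values were stored there instead.
cfg = model.config.to_dict()
print({k: cfg.get(k) for k in ("temperature", "top_p", "top_k")})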
