Share your recommended configurations for speed.
#76 by archonlith
I'm just getting started running some generation scenarios, but I'd rather not spend all day searching for the parameters that best balance speed and accuracy. What works for you, configuration-wise?
I have been testing with:
import torch
import transformers
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig

model_id = "tiiuae/falcon-7b-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Load in bfloat16 up front; passing torch_dtype to the pipeline has no
# effect once the model object is already instantiated.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto",
)

gen_cfg = GenerationConfig.from_model_config(model.config)
gen_cfg.max_new_tokens = 2000
gen_cfg.max_time = 90.0  # stop generation after 90 seconds

pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    trust_remote_code=True,
    generation_config=gen_cfg,
)
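
For what it's worth, this is roughly how I've been calling it, following the sampling settings from the Falcon model card; the prompt is just a placeholder:

# Hypothetical prompt; do_sample/top_k values mirror the Falcon model card example.
outputs = pipeline(
    "Write a short poem about the sea.",
    do_sample=True,
    top_k=10,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
)
print(outputs[0]["generated_text"])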