Generate parameters
In this app, we can set top_k, top_p, temperature, and max_new_tokens manually.
But I wonder what other parameters are being set on the service side, such as repetition_penalty, random_seed, do_sample, use_cache, precision ...
This is a great suggestion! We will add repetition_penalty, random_seed, and do_sample soon. We'll probably keep use_cache=True and precision=FP16 for now, since they usually won't affect the output much. Please let us know if there are other parameters you're interested in!
@juewang As a follow-up to this question, I'm struggling to replicate the results from this app when I prompt the model locally. Is there anything you are doing differently from the standard HF generation behind your API? Below is the code I use for generation; the parameters in the app GUI are the same.
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig

# load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("togethercomputer/GPT-JT-6B-v1")
model = AutoModelForCausalLM.from_pretrained("togethercomputer/GPT-JT-6B-v1", device_map="auto")

prompt = "The benefits of vacation at the Baltic Sea"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

gen_config = GenerationConfig(
    max_new_tokens=128,
    do_sample=True,
    temperature=0.8,
    top_k=1000,
    top_p=1.0,
)

outputs = model.generate(
    **inputs,
    generation_config=gen_config,
)

text = tokenizer.batch_decode(outputs, skip_special_tokens=True)
print(text)
['The benefits of vacation at the Baltic Sea\n\nI’ve been taking a lot of vacation lately, and I’m happy to say that it’s paying off for me.\n\nLet me explain.\n\nVacation is usually taken for a break from work. Whether from a back injury or a holiday to unwind, it’s a “vacation from work”.
Whereas for the same prompt, the quality of the output in the app is much better.
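One possible contributor, though this is just my guess rather than anything confirmed about the service: with do_sample=True and no fixed seed, every local run samples a different continuation, so a single generation isn't directly comparable. Pinning the seed at least makes local runs repeatable:

from transformers import set_seed

set_seed(42)  # arbitrary value; fixes the Python/NumPy/PyTorch RNGs so sampling is repeatable
outputs = model.generate(**inputs, generation_config=gen_config)  # model, inputs, gen_config as above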
I also used to get good results with zero-shot instructions last year, but now instructions mostly result in gibberish. Has anything changed with respect to that?