Generate parameters
In this app, we can set top_k, top_p, temperature, and max_new_tokens manually.
But I wonder what other parameters are being set on the service side, such as repetition_penalty, random_seed, do_sample, use_cache, precision ...
This is a great suggestion! We will add repetition_penalty, random_seed, and do_sample soon. We'll probably keep use_cache=True and precision=FP16 for now, since they usually won't affect the output much. Please let us know if there are other parameters you're interested in!
@juewang As a follow-up to this question, I'm struggling to replicate the results from this app when I prompt the model locally. Is there anything you are doing differently from the standard HF generation behind your API? Below is the code I use for generation; the parameters in the app GUI are the same.
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig

# load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("togethercomputer/GPT-JT-6B-v1")
model = AutoModelForCausalLM.from_pretrained("togethercomputer/GPT-JT-6B-v1", device_map="auto")

prompt = "The benefits of vacation at the Baltic Sea"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

gen_config = GenerationConfig(
    max_new_tokens=128,
    do_sample=True,
    temperature=0.8,
    top_k=1000,
    top_p=1.0,
)

outputs = model.generate(
    **inputs,
    generation_config=gen_config,
)

text = tokenizer.batch_decode(outputs, skip_special_tokens=True)
print(text)
['The benefits of vacation at the Baltic Sea\n\nI’ve been taking a lot of vacation lately, and I’m happy to say that it’s paying off for me.\n\nLet me explain.\n\nVacation is usually taken for a break from work. Whether from a back injury or a holiday to unwind, it’s a “vacation from work”.
Whereas for the same prompt, the quality of the output in the app is much better.
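One possible contributor, though this is just my guess rather than anything confirmed about the service: with do_sample=True and no fixed seed, every local run samples a different continuation, so a single generation isn't directly comparable. Pinning the seed at least makes local runs repeatable:

from transformers import set_seed

set_seed(42)  # arbitrary value; fixes the Python/NumPy/PyTorch RNGs so sampling is repeatable
outputs = model.generate(**inputs, generation_config=gen_config)  # model, inputs, gen_config as above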
I also used to get good results with zero-shot instructions last year, but now instructions mostly result in gibberish. Has anything changed with respect to that?