Results are extremely poor
I managed to get text-generation-webui to load it.
This took quite a bit of effort, but with 8-bit quantization it fit into my 4090. However, even for a simple request for a hello world 'C' program, I kept getting code-fragment garbage. After 4 tries I once got:
in input.split('\n')]
print("Hello World!")
But mostly I get things like:
:{4}}| # | Name - Type Size Offset Value") \ . format('', '-' * 4)
Sometimes I wonder if it isn't leaking some Salesforce application code. I spent 4 years working there and am familiar with it, especially when I see embedded SQL in a hello world program. I also see text being displayed only to be overwritten in the output window. I doubt that is a bug in the GUI; more likely your model is generating control characters or escape sequences.
With another small model like manticore-13b, I can ask for a Python program to handle HTTP POST requests and return a base64 image, and it gives me a working program.
Having said this, I did manage to get something by modifying the example in your README file for my HTTP example. The code was perhaps ?90? percent as good as what manticore gave me, so there must be something good inside, and I want to properly evaluate this as an alternative.
The GUI has all kinds of options like temperature, top_p, top_k, and many others. It is unclear what the standalone program is using. Perhaps I need to study the config files?
A shout out to Subho Catterjee at Salesforce. He was my boss for 4 years when I was there working on the Postgres project.
Let me know if I can help. I still have some stock. :-)
> However, even for a simple request for a hello world 'C' program, I kept getting code-fragment garbage.
To hazard a guess, you probably need to make more changes to this text-generation-webui setup you are using. From the example on the model card, pay particular attention to what is happening with the tokenizer, as this differs from typical LLMs.
encoding = tokenizer("def print_hello_world():", return_tensors="pt").to(device)
encoding['decoder_input_ids'] = encoding['input_ids'].clone()
outputs = model.generate(**encoding, max_length=15)
So input_ids and decoder_input_ids both need to be set to your prompt; otherwise only one half of the model gets the prompt, and you get garbage output with high frequency.
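For reference, here is a minimal end-to-end sketch of what that looks like outside the webui. The checkpoint name, dtype, and the instruction text are my assumptions for illustration; the decoder_input_ids trick is the part that comes from the model card.

import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

checkpoint = "Salesforce/instructcodet5p-16b"  # assumed checkpoint; use whichever one you loaded
device = "cuda"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(
    checkpoint,
    torch_dtype=torch.float16,   # assumption: fp16 so it fits on a single 4090
    low_cpu_mem_usage=True,
    trust_remote_code=True,      # the checkpoint ships custom modeling code
).to(device)

prompt = "Write a hello world program in C."
encoding = tokenizer(prompt, return_tensors="pt").to(device)
# Encoder-decoder model: the decoder must also be fed the prompt,
# otherwise only half of the model is conditioned on it.
encoding["decoder_input_ids"] = encoding["input_ids"].clone()

outputs = model.generate(**encoding, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))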
> The GUI has all kinds of options like temperature, top_p, top_k, and many others. It is unclear what the standalone program is using. Perhaps I need to study the config files?
You should read the documentation for model.generate.
(Note that most of the parameters are in the GenerationConfig specification and can be passed directly into generate; there is no need to create and pass in a GenerationConfig.)
By default, and as in the example on the model card, it will run a greedy search, choosing the token with the highest probability each time. But you can add do_sample=True and then set values for temperature, top_p, etc. in model.generate as you desire.
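For example, something like this (the specific values are illustrative, not recommendations):

# Greedy decoding, as in the model card: picks the single most likely token each step.
outputs = model.generate(**encoding, max_new_tokens=256)

# Sampling: enable do_sample and pass the knobs the webui exposes straight to generate.
outputs = model.generate(
    **encoding,
    do_sample=True,
    temperature=0.7,   # illustrative value
    top_p=0.95,        # illustrative value
    top_k=40,          # illustrative value
    max_new_tokens=256,
)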
Ah right, I'd also like to add that I have no idea what the "correct" (as used during training) prompt format is for this instruction-tuned version of the model, as the README sample doesn't give a hint. I looked at the Code Alpaca data and am going with prompt = "### Instruction:\n{}\n\n### Response: " for now, but let me know if you know the truth of the matter.
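In code, that guess looks like the following (the template itself is my assumption based on Code Alpaca, not something the README confirms):

PROMPT_TEMPLATE = "### Instruction:\n{}\n\n### Response: "  # assumed format, not confirmed
prompt = PROMPT_TEMPLATE.format("Write a Python program that handles HTTP POST requests and returns a base64 image.")
encoding = tokenizer(prompt, return_tensors="pt").to(device)
encoding["decoder_input_ids"] = encoding["input_ids"].clone()
outputs = model.generate(**encoding, do_sample=True, temperature=0.2, max_new_tokens=512)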
Has anyone figured out how to get this running properly in text-generation-webui?
I am also getting those very weird results and poor performance.
Thank you!