outputs are wrong
I tried to run it on a Colab T4, and it seems there is something wrong with the model outputs:
!CUDA_VISIBLE_DEVICES=0 python interactive_gen.py --hf_path /content/model --no_use_flash_attn
I1216 09:57:57.827290 5316 utils.py:160] NumExpr defaulting to 2 threads.
I1216 09:58:09.321581 5316 modeling.py:799] We will use 90% of the memory on device 0 for storing the model, and 10% for the buffer to avoid OOM. You can set `max_memory` in to a higher value to use more memory (at your own risk).
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Please enter your prompt or 'quit' (without quotes) to quit: explain quick sort
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Model Output: explain quick sort
explain quick sort - 6.0 out of 5 based on 10 reviews
Please enter your prompt or 'quit' (without quotes) to quit: what is huggingface
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Model Output: what is huggingface
Hugging Face is a nonprofit 501(c)(3) corporation. We have no employees, and all work is done by volunteers. Our mission is to help children who are victims of war, poverty, natural disasters,
Please enter your prompt or 'quit' (without quotes) to quit: give me python quick sort example
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Model Output: give me python quick sort example
Asked by Anonymous a on January 15, 2018 Verified by bdehara
Please enter your prompt or 'quit' (without quotes) to quit: Write a python script to output numbers 1 to 53 with step = 3 and then the should run the script
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Model Output: Write a python script to output numbers 1 to 53 with step = 3 and then the should run the script.
This is a very simple problem. I'm sure there are a lot of ways to solve it. Here's one way.
# Write a python script to output numbers
Please enter your prompt or 'quit' (without quotes) to quit:
What seems wrong about the output? You're running the non-instruction-tuned version, which isn't very good at following instruction prompts. There is also a `max_length` flag in interactive_gen.py which should let you get more output from the models.
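Something like this, for example (a sketch only: 512 is just an illustrative value, and you should check interactive_gen.py for the exact flag name):

!CUDA_VISIBLE_DEVICES=0 python interactive_gen.py --hf_path /content/model --no_use_flash_attn --max_length 512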
Just curious, can you share the prompts that you typically use to evaluate a model?
Thank you, everyone. Kindly review my attempt to use QuIP# to run a different model: https://gist.github.com/eramax/64d59644d600b3db3fcb6acf0133721b
I haven't used a prompt template, and I'm not sure how to set one for the script interactive_gen.py.
My experience with the instruction-tuned relaxml/Llama-2-13b-chat-E8P-2Bit model wasn't great; perhaps because I hadn't applied a prompt template, the model didn't respond well to my questions. There are many packages in the `requirements.txt` file, but I discovered I don't need to install them all; I simply installed the ones that were needed, and everything worked.
Best regards,
interactive_gen.py is just a simple script that calls Hugging Face's generate() internally. If you want to set up prompt templates, you'll have to work that in somehow; I personally haven't used a prompt template before, so I can't really help with that. I think the quality of output you're seeing is in line with what you should expect when entering those prompts. There is still some degradation in text generation quality when going from fp16 to 2-bit (remember, this model is 8x smaller than the original model!). OpenHermes seems to be a better "chat" model and is smaller. You can also try using a 4-bit model if you want higher-quality generation.
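That said, if you want to experiment, here's a minimal sketch of what "working a template in" could look like, assuming the standard Llama-2-chat format (the [INST]/<<SYS>> markers from Meta's reference code). The model path, system prompt, and generation settings are placeholders, and the plain AutoModelForCausalLM load stands in for however interactive_gen.py actually loads the quantized checkpoint:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_PATH = "/content/model"  # placeholder; point this at your checkpoint

# Standard Llama-2-chat prompt format (from Meta's reference implementation).
TEMPLATE = "[INST] <<SYS>>\nYou are a helpful assistant.\n<</SYS>>\n\n{prompt} [/INST]"

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
model = AutoModelForCausalLM.from_pretrained(MODEL_PATH, device_map="auto")

def chat(prompt: str, max_length: int = 512) -> str:
    # Wrap the raw user prompt in the chat template before tokenizing.
    text = TEMPLATE.format(prompt=prompt)
    inputs = tokenizer(text, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_length=max_length)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

print(chat("give me a python quick sort example"))
```

An instruction-tuned checkpoint should respond noticeably better when its prompts are wrapped this way; the base (non-instruction-tuned) model will still mostly just continue the text, as in the transcript above.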