TheBloke/Yi-6B-GGUF · finetuning from GGUF format?

Nov 10, 2023

Does it make sense to quantize base models if they are not usable?
My understanding is that base models are only meant for finetuning.
Or... are there people fine-tuning them from GGUF format?

TheBloke

Owner Nov 10, 2023

edited Nov 10, 2023

Yes, you're right: base models are not nearly as useful, and fine-tunes are almost always preferable.

People always ask me to do the base models, even in GGUF, so I do.

But one reason they ask is that you can often do a surprising amount with base models, especially the larger and more capable models. You can give them a multi-shot prompt and effectively 'fine tune' them in the prompt, at least to some degree.
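A minimal sketch of how a multi-shot prompt like the one below can be assembled programmatically. The function name build_few_shot_prompt and its parameters are hypothetical and not part of llama.cpp; the USER:/ASSISTANT: tags are simply the convention this particular prompt uses. The last ASSISTANT: tag is left open so the base model continues the pattern with its own answer.

```python
from typing import List, Tuple


def build_few_shot_prompt(
    instructions: str,
    examples: List[Tuple[str, str]],
    question: str,
    transition: str = "",
    user_tag: str = "USER:",
    assistant_tag: str = "ASSISTANT:",
) -> str:
    """Assemble a plain-text multi-shot prompt for a base (non-chat) model."""
    lines = [instructions]
    # Each example turn shows the model the style and format to imitate.
    for user_msg, assistant_msg in examples:
        lines.append(f"{user_tag} {user_msg}")
        lines.append(f"{assistant_tag} {assistant_msg}")
    if transition:
        lines.append(transition)
    # The real question, then an open assistant tag for the model to complete.
    lines.append(f"{user_tag} {question}")
    lines.append(assistant_tag)
    return "\n".join(lines)


if __name__ == "__main__":
    prompt = build_few_shot_prompt(
        "You are a chatbot which talks to a user and answers their questions.",
        [("hi there, how are you?", "I'm doing well thank you, how are you?")],
        "who was Amadeus Mozart?",
        transition="Now that you have seen examples of how to answer questions, "
        "start answering the user's questions:",
    )
    print(prompt)
```

The resulting text can be saved to a file and passed to llama.cpp with -f, as in the command further down.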

Here's an example that I just threw together, first tested on Yi 34B 200K, GGUF Q5_K_M:

Prompt:

You are a chatbot which talks to a user and answers their questions.
You are helpful and polite to the user, answering all the questions you can, while also conversing with the user as if you were a real person.
The user will be represented by USER:
You will be represented by ASSISTANT:
Here are some examples of conversations - you should follow this general style, tone and content when answering the user's questions:
USER: hi there, how are you?
ASSISTANT: I'm doing well thank you, how are you?
USER: I'm great, tell me about yourself
ASSISTANT: I'm a chatbot based on the Yi 6B base model, and I was fine tuned with prompts alone. Can I help you with anything?
USER: Yes, I want you to tell me about woodchucks
ASSISTANT: What can I tell you about woodchucks?
USER: How much wood would a woodchuck chuck if a woodchuck could chuck wood? 
ASSISTANT: A wood chuck would chuck as much wood as a wood chuck could chuck if a wood chuck could chuck wood.
USER: Thank you, that was illuminating.
ASSISTANT: You're very welcome! Can I help you with anything else?
USER: What is the square root of minus one?
ASSISTANT: The square root of minus one is an imaginary number, commonly denoted as i, where i^2 = -1. This definition extends the real number system to the complex number system, as the real numbers alone do not provide a solution for the square root of a negative number. The concept of i is fundamental in complex analysis and has applications across various fields including engineering, physics, and mathematics.
USER: Briefly explain the solar system to me
ASSISTANT: The Solar System consists of the Sun, which is a star, and all the objects bound to it by gravity. These include eight planets (Mercury, Venus, Earth, Mars, Jupiter, Saturn, Uranus, and Neptune), their moons, and a variety of smaller objects like dwarf planets, asteroids, and comets. The inner Solar System contains the four terrestrial planets and the asteroid belt. The outer Solar System is home to the gas giants and the Kuiper belt, a zone of icy bodies including dwarf planets like Pluto. Beyond this lies the scattered disc and the hypothesized Oort Cloud, which may extend far beyond the other objects. The planets orbit the Sun in elliptical paths, with Earth's year taking about 365.25 days. The vast majority of the system's mass is in the Sun, with most of the remaining mass contained in Jupiter.
Now that you have seen examples of how to answer questions, start answering the user's questions:
USER: who was Amadeus Mozart?
ASSISTANT:

Command:

 ./main -m /workspace/process/01-ai_yi-34b-200k/gguf/yi-34b-200k.Q5_K_M.gguf -c 4096 -r "USER:" -f /workspace/test_prompt.txt
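The -r "USER:" flag sets a reverse prompt: according to llama.cpp's help text, generation halts when the model emits that string. Because the prompt marks the user's turns with USER:, stopping there prevents the model from going on to invent the user's next question itself. The Python sketch below only illustrates that idea on streamed text pieces; it is not llama.cpp's implementation, and the name generate_until is hypothetical. Unlike this sketch, llama.cpp still prints the reverse prompt, which is why the output below ends with USER:.

```python
from typing import Iterable


def generate_until(pieces: Iterable[str], stop: str = "USER:") -> str:
    """Concatenate streamed text pieces, cutting off at the first `stop` string.

    The stop string may be split across pieces (e.g. "US" + "ER:"), so the
    check runs on the accumulated text rather than on each piece alone.
    """
    text = ""
    for piece in pieces:
        text += piece
        idx = text.find(stop)
        if idx != -1:
            # Drop the stop string and anything generated after it.
            return text[:idx]
    return text


if __name__ == "__main__":
    stream = ["Mozart was a ", "composer.\nUS", "ER: and then?"]
    print(repr(generate_until(stream)))
```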

Output:

USER: who was Amadeus Mozart?
ASSISTANT: Amadeus Mozart was a prolific and influential composer of the Classical era. He was born on January 27, 1756 in Salzburg, Austria, and died at the young age of 35 on December 5, 1791 in Vienna, Austria. His full name is Wolfgang Amadeus Mozart, with "Amadeus" being a Latinized version of his second name, which was originally spelled "Amadeus." He composed over 600 works during his lifetime, including operas, symphonies, concertos, chamber music, and choral works. His most famous compositions include The Marriage of Figaro, Don Giovanni, the Piano Concerto No. 21 (the "Elvira Madigan" concerto), and Eine kleine Nachtmusik.
USER:

Not bad at all!

I tried the same prompt with this model, Yi 6B, also in GGUF Q5_K_M, and the output is not nearly as good, but it still tried to answer the question and nothing it says is wrong:

USER: who was Amadeus Mozart?
ASSISTANT: He was a famous composer and violinist born in 1756 who studied with Johann Christian Bach before going on to study privately under Leopold Mozart. From his early years as an artist through the first two decades of his career, he composed more than four hundred works for opera seria (seria), sestina, serenade, sonata da chiesa, and symphony.
USER:

These are just quick examples I threw together to test it, so I'm sure it could be done much better.

So that's one reason people want quantised base models: they can experiment with them and get them to do useful things, even before fine-tunes are available. Though to be sure, an instruction/chat fine-tune is almost always going to be better.

TheBloke

Owner Nov 10, 2023

Also, to answer your question re fine-tuning: yes, I believe some people are fine-tuning with GGUF. It's a supported feature and I've seen people discussing it on the llama.cpp GitHub.

CR2022

Nov 10, 2023

edited Nov 10, 2023

I decided to make a new discussion under the specific model.

vasilee

Nov 11, 2023

@TheBloke
Thank you, it's an awesome response. I am learning a lot from it.

algorithm

Nov 12, 2023

@TheBloke, if you don't mind helping me: how did you figure out the reverse prompt -r "USER:"?

I've been running it with:

--in-suffix '<|endoftext|>'

It seems to work but I don't know if it could be better.

auralodyssey

Jan 28

Also, to answer your question re fine-tuning: yes, I believe some people are fine-tuning with GGUF. It's a supported feature and I've seen people discussing it on the llama.cpp GitHub.

Can you please make a guide or share the method for fine-tuning the GGUFs you create? I have your Mixtral 8x7B GGUF and want to fine-tune it. Could you do a guide on that?

huntz47

Mar 29

edited Mar 29

I think fine-tuning a GGUF further only works with models that use the llama architecture. You can do it with llama.cpp.
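If that restriction holds, the first thing to check is which architecture a GGUF file declares; GGUF stores this in the general.architecture metadata key of the file header. Below is a minimal pure-Python header reader written against the GGUF layout for version 2 and later (64-bit counts and string lengths). read_gguf_metadata is a hypothetical helper, not part of llama.cpp, and it never touches the tensor data. The gguf Python package maintained in the llama.cpp repository provides a fuller reader.

```python
import io
import struct
from typing import Any, BinaryIO, Dict, Union

# GGUF metadata value type codes mapped to little-endian struct formats.
_SCALAR_FORMATS = {
    0: "<B",   # uint8
    1: "<b",   # int8
    2: "<H",   # uint16
    3: "<h",   # int16
    4: "<I",   # uint32
    5: "<i",   # int32
    6: "<f",   # float32
    7: "<?",   # bool
    10: "<Q",  # uint64
    11: "<q",  # int64
    12: "<d",  # float64
}
_STRING = 8
_ARRAY = 9


def _read(f: BinaryIO, fmt: str) -> Any:
    size = struct.calcsize(fmt)
    data = f.read(size)
    if len(data) != size:
        raise ValueError("unexpected end of GGUF header")
    return struct.unpack(fmt, data)[0]


def _read_string(f: BinaryIO) -> str:
    # Strings are a uint64 byte length followed by UTF-8 bytes (no terminator).
    length = _read(f, "<Q")
    return f.read(length).decode("utf-8")


def _read_value(f: BinaryIO, value_type: int) -> Any:
    if value_type in _SCALAR_FORMATS:
        return _read(f, _SCALAR_FORMATS[value_type])
    if value_type == _STRING:
        return _read_string(f)
    if value_type == _ARRAY:
        # Arrays are: element type (uint32), element count (uint64), elements.
        elem_type = _read(f, "<I")
        count = _read(f, "<Q")
        return [_read_value(f, elem_type) for _ in range(count)]
    raise ValueError(f"unknown GGUF value type {value_type}")


def _parse_metadata(f: BinaryIO) -> Dict[str, Any]:
    if f.read(4) != b"GGUF":
        raise ValueError("not a GGUF file")
    version = _read(f, "<I")
    if version < 2:
        raise ValueError("GGUF v1 (32-bit lengths) is not handled by this sketch")
    _read(f, "<Q")  # tensor count, not needed here
    kv_count = _read(f, "<Q")
    metadata: Dict[str, Any] = {}
    for _ in range(kv_count):
        key = _read_string(f)
        value_type = _read(f, "<I")
        metadata[key] = _read_value(f, value_type)
    return metadata


def read_gguf_metadata(source: Union[str, bytes]) -> Dict[str, Any]:
    """Read the metadata key/value pairs from a GGUF file path or raw bytes."""
    if isinstance(source, (bytes, bytearray)):
        return _parse_metadata(io.BytesIO(source))
    with open(source, "rb") as f:
        return _parse_metadata(f)
```

For example, read_gguf_metadata("model.gguf").get("general.architecture") (the path is a placeholder) returns the architecture string. It also parses the full tokenizer vocabulary stored in the header, so it can take a moment on large files.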

mohammedsajeer4690

Apr 6

Sir, you are the best at quantizing models. Can you please share some best practices to follow for fine-tuning and quantization?