Wrong solution for 1+1=
LLMs can't do math reliably without external assistance.
I don't know what you expect here.
Hi,
Thanks for your answer. I also tried a few other examples. For instance: asking it to generate a prompt based on my needs, and asking it to answer some questions based on context I provide. Neither of them gives useful results.
You're asking a base model to solve problems.
What you want is the Instruct variant. Base isn't suitable for this.
So the base model is for further fine-tuning, while the instruct model is for solving problems?
Thanks for sharing!
The base model is a raw LLM; it ONLY does text completion.
Instruct has been tuned to respond to you instead.
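In code, the difference looks roughly like this (just a sketch with transformers; the model id is a placeholder, swap in whichever checkpoint you're actually using):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder name -- use the checkpoint you actually have.
model_id = "your-org/your-model-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Instruct/chat variant: wrap the question in the chat template,
# so the model treats it as a request to answer.
messages = [{"role": "user", "content": "What is 1+1?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
output = model.generate(inputs, max_new_tokens=20)
print(tokenizer.decode(output[0], skip_special_tokens=True))

# Base variant: there is no "answering", only continuation of the raw text.
inputs = tokenizer("1+1=", return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=5)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```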
Understood, thanks!
"1+1=3" doesn't necessarily mean that the model was wrong. The '1+1=3' can mean many different things, such as irony, a metaphor for synergy, and it can even be the start of an equation like '1+1=3-1' which also is correct.
The issue is NOT that the model is incapable of such a simple operation! It's because it doesn't understand what you actually want from it. If you want to ensure the model knows what you mean, you either have to fine-tune it or give an example by prefacing the equation, for example, '5+3=8 1+1=', and now it's obvious that the expected answer is the sum of 1 and 1.
Several examples:
Input: '5+3=8 1+1=', the model outputs '2'
Input: 'Sum of: 1+1=', the model outputs '2'
Input: 'Sum of: 62+16=', the model outputs '78'
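That prefacing trick is just few-shot prompting; a rough sketch of what it looks like in code (the model name is a placeholder, any base/non-instruct checkpoint behaves the same way):

```python
from transformers import pipeline

# Placeholder checkpoint -- substitute the base model you are testing.
generator = pipeline("text-generation", model="your-org/your-base-model")

# No context: the model may continue '1+1=' in all sorts of ways.
print(generator("1+1=", max_new_tokens=5)[0]["generated_text"])

# One worked example up front makes the expected pattern obvious.
print(generator("5+3=8 1+1=", max_new_tokens=5)[0]["generated_text"])
```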
You can certainly nudge an LLM in the right direction, but they are fundamentally incapable of arithmetic or real logical operations without external help.
Don't mistake getting simple calculations right for the ability to do maths.
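The usual workaround for that "external help" is to let the model produce the expression and have real code do the arithmetic. A toy sketch of the idea (nothing model-specific, just a safe calculator for simple expressions the model might hand back):

```python
import ast
import operator

# Toy "calculator tool": evaluate a basic arithmetic expression with Python
# instead of trusting the model's own arithmetic.
OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}

def evaluate(expr: str) -> float:
    def walk(node):
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        raise ValueError("unsupported expression")
    return walk(ast.parse(expr, mode="eval").body)

print(evaluate("62+16"))  # 78, computed by Python, not by the model
```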
"1+1=3" doesn't necessarily mean that the model was wrong. The '1+1=3' can mean many different things, such as irony, a metaphor for synergy, and it can even be the start of an equation like '1+1=3-1' which also is correct.
The issue is NOT that the model is incapable of such a simple operation! It's because it doesn't understand what you actually want from it. If you want to ensure the model knows what you mean, you either have to fine-tune it or give an example by prefacing the equation, for example, '5+3=8 1+1=', and now it's obvious that the expected answer is the sum of 1 and 1.
Several examples:
Input: '5+3=8 1+1=', the model outputs '2'
Input: 'Sum of: 1+1=', the model outputs '2'
Input: 'Sum of: 62+16=', the model outputs '78'
Hi Satoszi,
Thanks for sharing! I hadn't thought about it from that side. I think that's quite interesting. It's like the LLM has many "capabilities" for answering this question, but without fine-tuning it doesn't know which one it should give.
Of course you are right that LLMs are not good at arithmetic, and they are not built for that. We can never trust LLM output on arithmetic problems (or in other domains either 🙂). It'll give an approximation that looks legit, but for simple operations like sums of small numbers, that approximation should usually be correct.
"1+1=3" doesn't necessarily mean that the model was wrong. The '1+1=3' can mean many different things, such as irony, a metaphor for synergy, and it can even be the start of an equation like '1+1=3-1' which also is correct.
The issue is NOT that the model is incapable of such a simple operation! It's because it doesn't understand what you actually want from it. If you want to ensure the model knows what you mean, you either have to fine-tune it or give an example by prefacing the equation, for example, '5+3=8 1+1=', and now it's obvious that the expected answer is the sum of 1 and 1.
Several examples:
Input: '5+3=8 1+1=', the model outputs '2'
Input: 'Sum of: 1+1=', the model outputs '2'
Input: 'Sum of: 62+16=', the model outputs '78'Hi Satoszi,
Thx for your sharing! I haven't though from that side. I think that's quite interesting. It's like LLM has many "capabilities" to answer this question but without FT it doesn't know which one it should give.
Yeah, vanilla LLMs without reinforcement or other fancy fine-tuning methods are pretty "stupid" 😁