Systems of Representation Are All You Need
Written By Richard Aragon
This problem was solved for humans about 4,000 years ago by a man named Euclid. If you hate math, you can blame him. He invented geometry. He wrote a book that contained the four known mathematical principles of the universe as commonly defined by humans. He was so precise in this that the only revision to his work that has been made in 4,000 years is that a 5th principle was added.
There is no Euclid for AI. What is a triangle? How does it compare to a square? Etc. We teach these concepts to preschoolers and it is all universally accepted fact, but it all started with one book. If that book doesn’t exist, it is very likely we never gain universal understanding of mathematics as a whole.
This is purely a theory and I have now laid out my hypothesis: AI is bad at math simply because it has no representations to draw from. No ‘book’ telling it what a triangle is compared to a square, or even what a number even is. To test this hypothesis, I created a very simple experiment. I created an AI model named EuclAId.
In the status quo, models utilize word vectorization techniques to tokenize words, they then create predictions based on these tokens, and you get an output. Much to do is made over this process both ways. It is a simple process, it is not more complex than I have just laid out. The process is merely designed to predict the next word based token in a sequence, and this tokenization is only for words, not for numbers. There is no number token at all.
Despite these facts being true, LLM models above the billion parameter or so mark exhibit capabilities beyond mere token prediction and rudimentary math abilities. A lot of people like to point out here that their math abilities are in fact rudimentary and grandstand on that. The models have literally zero way to represent numbers. But they are able to do so on some level. This is a miracle of miracles. Instead, people call the models stupid because they ‘suck at it’. You would suck at something too if you had no grounding for it and were simply told to figure it out.
Proving The Hypothesis True
EuclAId serves as our simple barometer as to whether or not all of this is complete fabrication and hyperbole, or if there is actually some weight here. I worked with Wolfram GPT to create a very simple dataset consisting of 750 rows of data. The 750 rows are all mathematical representations, that is it. Anyone can look through the dataset for themselves. It’s all symbols related to logical math. From here, I constructed a test consisting of 5 questions from a very recently published academic research paper titled, ‘Easy Problems The LLM Models Get Wrong’
True to the researcher’s testing in the paper, the base model got the five questions I chose very wrong. As a baseline control result, I chose to also compare the results of the same model fine tuned on P-FAF data. P-FAF is specifically designed to increase logical reasoning in models but does not in and of itself provide the model with systems of representation.
The results of the test are very clear and published for all to see. The EuclAId 750 dataset very clearly improves model performance on simple math problems involving systems of representation. In this instance by a very clear and distinct 50%. The control P-FAF fine tune showed no improvement whatsoever. This shows that improvements from the EuclAId dataset were not derived from fine tuning in and of itself and/or fine tuning on mathematical reasoning in and of itself. The improvement only comes in when the model is very specifically trained on data that gives it systems of representation related to numbers.
The test results are presented on the following pages. I keep my specific datasets and large portions of the math proprietary for a reason, it’s my moat. No one else is focusing on these specific areas. The underlying assumption inherent in this research is that the models are actually capable of absorbing the data on some level. Anyone who disagrees with that hypothesis would never conduct these experiments in the first place, thus this ground breaking and innovative research. It’s really just common sense to me, but what do I know?
COMPLETE TEST RESULTS
Base Google Pro 1.0
Which weighs more, a pound of water, two pounds of bricks, a pound of feathers, or three pounds of air? They all weigh the same.
You are playing Russian roulette with a six-shooter revolver. Your opponent puts in five bullets, spins the chambers and fires at himself, but no bullet comes out. He gives you the choice of whether or not he should spin the chambers again before firing at you. Should he spin again? No, he should not spin again. Since no bullet was fired even though there were five bullets in the gun, the empty chamber must have been aligned with the barrel. Spinning again would only randomize the position of the empty chamber, increasing the chance that a bullet would be fired at you.
You have six horses and want to race them to see which is fastest. What is the best way to do this? The best way to race six horses is to separate them into two groups of three and run preliminary races to narrow the five fastest horses. These five horses would then run against each other in a final race to determine the winner. This approach ensures that all horses have a fair chance to compete while minimizing the total number of races.
In a toy box, there’s a red ball, a blue truck, and a green dinosaur. The red ball is not next to the blue truck, and the green dinosaur is next to the red ball. Which toy is in the middle? The green dinosaur cannot be in the middle because it is next to the red ball. Since the red ball is not next to the blue truck, the blue truck must be in the middle. So the answer is the blue truck.
Sally (a girl) has 3 brothers. Each brother has 2 sisters. How many sisters does Sally have? Sally: 3 brothers x 2 sisters = 6 sisters Sally has a total of 6 sisters, herself included (as she is also a sister to her brothers).
P-FAF Google Pro 1.0
Which weighs more, a pound of water, two pounds of bricks, a pound of feathers, or three pounds of air? A pound of water weighs the same as two pounds of bricks or three pounds of air; they all weigh one pound.
- You are playing Russian roulette with a six-shooter revolver. Your opponent puts in five bullets, spins the chambers and fires at himself, but no bullet comes out. He gives you the choice of whether or not he should spin the chambers again before firing at you. Should he spin again? No, he should not spin again. Since no bullet came out on the first shot, it means that the bullet he loaded into the chamber on his turn was a dud. It increases the chances of success for the next person if they do not spin the chambers again.
- You have six horses and want to race them to see which is fastest. What is the best way to do this? Step 1: Divide the Horses into Two Groups
Group 1: Race four horses in a single race. Group 2: Race the remaining two horses in a separate race.
Step 2: Run the First Race
Time the four horses in Group 1 to determine the two fastest.
Step 3: Run the Second Race
Line up the two fastest horses from Group 1 against the two horses from Group 2. Time the four horses to determine the top two.
Step 4: Final Race
Race the top two horses from the second race to determine the overall winner.
In a toy box, there’s a red ball, a blue truck, and a green dinosaur. The red ball is not next to the blue truck, and the green dinosaur is next to the red ball. Which toy is in the middle? The blue truck is in the middle.
Sally (a girl) has 3 brothers. Each brother has 2 sisters. How many sisters does Sally have? 3
EuclAId 750 Google Pro 1.0
Which weighs more, a pound of water, two pounds of bricks, a pound of feathers, or three pounds of air? Three pounds of air weighs the most.
You are playing Russian roulette with a six-shooter revolver. Your opponent puts in five bullets, spins the chambers and fires at himself, but no bullet comes out. He gives you the choice of whether or not he should spin the chambers again before firing at you. Should he spin again? Spinning the chambers again gives you more information about the location of the bullet. With five chambers, it's more likely that the bullet will land in the unspun chamber than in one of the spun chambers.
You have six horses and want to race them to see which is fastest. What is the best way to do this? The best way to race six horses is to split them into two groups of three and race them in heats. The winners of each heat can then race against each other for the overall championship.
In a toy box, there’s a red ball, a blue truck, and a green dinosaur. The red ball is not next to the blue truck, and the green dinosaur is next to the red ball. Which toy is in the middle? The blue truck is in the middle.
Sally (a girl) has 3 brothers. Each brother has 2 sisters. How many sisters does Sally have? Sally has 2 sisters.