Large language models are not good mathematicians, at least for now
Which number is bigger, 9.11 or 9.9?
Most pupils in Grade 3 or above can answer this question easily, yet eight out of 12 popular large language models (LLMs), the well-known ChatGPT-4o included, give the wrong answer. Some professional websites have revealed a possible computing process behind the error: having found that 9 equals 9, the LLMs went on to compare 11 with 9, instead of comparing 0.11 with 0.9 as mathematical rules require.
This might run contrary to the perception of a public that now uses LLMs for almost everything, but it fits the nature of LLMs as reflected in their name: they are models that take in huge amounts of linguistic material, process it grammatically, and output text accordingly.
Some advanced versions can output pictures or videos, but in essence they are still language models; the only difference is that the languages they use are visual, verbal, or textual.
For LLMs, mathematics remains a difficult challenge, and seen in that light their belief that 9.11 is bigger than 9.9 becomes easier to understand. Quite a few LLMs may treat digits like words, so in the case of 9.11 vs 9.9 they simply compare 11 with 9 without regard for the decimal point in front of them. Many LLMs can hardly handle logic as complicated as that involving a decimal point.
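The reported mistake is easy to reproduce in a few lines of Python. The sketch below is only an illustration of the arithmetic, not a description of how any LLM works internally; the function names compare_like_words and compare_as_decimals are hypothetical and chosen for this example.

```python
from decimal import Decimal


def compare_like_words(a: str, b: str) -> str:
    """Flawed comparison: split each number at the decimal point and
    compare the pieces as if they were separate whole numbers."""
    a_int, a_frac = a.split(".")
    b_int, b_frac = b.split(".")
    if int(a_int) != int(b_int):
        return a if int(a_int) > int(b_int) else b
    # The mistake: this compares 11 with 9, not 0.11 with 0.9.
    return a if int(a_frac) > int(b_frac) else b


def compare_as_decimals(a: str, b: str) -> str:
    """Correct comparison: treat each string as one decimal value."""
    return a if Decimal(a) > Decimal(b) else b


print(compare_like_words("9.11", "9.9"))   # prints 9.11 (wrong answer)
print(compare_as_decimals("9.11", "9.9"))  # prints 9.9 (right answer)
```

The first function gets the answer wrong for exactly the reason described above: once the whole-number parts match, it ranks 11 above 9 and ignores the place value that the decimal point gives each digit.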
Mathematics is the foundation of all higher-level logic, and failure in basic mathematical computation is a bottleneck for LLMs. This is essentially why the texts they compose always follow similar styles that strictly use “because”, “so” or “first, second, third”: only with these words clearly stated can LLMs understand the logic.
The situation is improving. OpenAI, the developer of ChatGPT, has reportedly been working on a new approach to its AI, code-named Strawberry, while more AI developers have been improving the logic of their models in similar ways. There will always be a way forward, and AI users just need some patience.