
Google DeepMind Utilizes Large Language Model to Solve Previously Unsolvable Mathematics Problem


In a groundbreaking development, Google DeepMind has successfully employed a large language model to crack a previously unsolved problem in pure mathematics. According to a paper published in Nature on December 14, 2023, this is the first time a large language model has been used to discover a solution to a longstanding scientific puzzle, producing verifiable, genuinely new information. “It’s not in the training data — it wasn’t even known,” asserts Pushmeet Kohli, Vice President of Research at Google DeepMind.

Typically, large language models are associated with fabricating information (AKA hallucinating) rather than contributing new factual knowledge. However, Google DeepMind’s latest innovation, dubbed FunSearch, challenges this perception. It demonstrates the potential of large language models in making significant discoveries, provided they are guided appropriately and the majority of their outputs are discarded.

Named FunSearch for its ability to search mathematical functions (and not necessarily for being amusing), this tool continues DeepMind’s series of achievements in fundamental mathematics and computer science using artificial intelligence. Preceding FunSearch, the AI system AlphaTensor identified a method to expedite a calculation fundamental to various codes, surpassing a 50-year record. Following that, AlphaDev discovered more efficient ways to execute key algorithms used trillions of times daily.

However, AlphaTensor and AlphaDev did not rely on large language models. Developed on the foundation of DeepMind’s game-playing AI AlphaZero, they approached mathematical problems as puzzles similar to those in games like Go or chess. The limitation, as pointed out by Bernardino Romera-Paredes, a researcher at the company involved with both AlphaTensor and FunSearch, is their specialization. “AlphaTensor is great at matrix multiplication, but basically nothing else,” he explains.

FunSearch adopts a different strategy. It integrates a large language model known as Codey, a variant of Google’s PaLM 2 fine-tuned for computer code, with additional systems that eliminate incorrect or illogical responses and incorporate viable solutions.

Alhussein Fawzi, a Research Scientist at Google DeepMind, candidly admits the underlying mechanisms of their latest achievement remain somewhat mysterious. “To be very honest with you, we have hypotheses, but we don’t know exactly why this works,” he states. Initially, there was uncertainty regarding the project’s feasibility: “In the beginning of the project, we didn’t know whether this would work at all.”

The team began their endeavor by outlining the mathematical problem they intended to tackle using Python, a widely used programming language. However, they intentionally omitted specific parts of the program that would dictate the method of solving the problem, paving the way for FunSearch to play its role. FunSearch employs Codey to fill in these gaps, effectively prompting it to propose coding solutions for the problem at hand.
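The article describes this setup only at a high level; as a hedged illustration (not DeepMind’s actual code), a FunSearch-style skeleton might look like the sketch below, where `priority`, `solve`, and `is_feasible` are hypothetical names and the body of `priority` is the deliberately omitted part the model is asked to write:

```python
# Hypothetical sketch of a FunSearch-style problem skeleton.
# The fixed scaffolding defines the problem and how candidates are used;
# the body of `priority` is the gap the language model fills in.

def priority(element: tuple) -> float:
    """Scores a candidate element; the LLM proposes this body."""
    # Placeholder heuristic -- in FunSearch the model rewrites this part.
    return 0.0

def is_feasible(partial: list) -> bool:
    """Problem-specific constraint check (illustrative stand-in)."""
    return len(partial) <= 10

def solve(candidates: list) -> list:
    """Fixed scaffolding: greedily build a solution using `priority`."""
    chosen = []
    for c in sorted(candidates, key=priority, reverse=True):
        if is_feasible(chosen + [c]):
            chosen.append(c)
    return chosen
```

The key design point is the separation of concerns: the human-written scaffolding stays fixed and trustworthy, while only the small heuristic function evolves.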

This process involves a secondary algorithm that evaluates and ranks the solutions generated by Codey. Even if the initial suggestions are not entirely correct, the most promising ones are preserved and fed back into Codey for further refinement. As Kohli explains, “Many will be nonsensical, some will be sensible, and a few will be truly inspired. You take those truly inspired ones and you say, ‘Okay, take these ones and repeat.'”
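The loop described above — generate candidates, score and rank them, keep the promising ones, and feed them back — can be sketched in simplified form. Here `llm_propose` and `evaluate` are purely illustrative stand-ins for the code model and the scoring algorithm:

```python
import random

def evaluate(program: str) -> float:
    """Stand-in scorer; the real evaluator runs and grades the program."""
    return random.random()

def llm_propose(prompt_programs: list) -> str:
    """Stand-in for a call to the code model (Codey in the paper)."""
    return random.choice(prompt_programs) + "  # mutated"

def funsearch_loop(seed: str, iterations: int, pool_size: int = 5) -> str:
    """Generate candidates, rank them, keep the best, and repeat."""
    pool = [seed]
    for _ in range(iterations):
        pool.append(llm_propose(pool))
        # Rank by score and retain only the most promising programs,
        # which become the prompt for the next round.
        pool.sort(key=evaluate, reverse=True)
        pool = pool[:pool_size]
    return pool[0]
```

In the real system this loop ran millions of times, with most outputs discarded, exactly as Kohli describes.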

Following several million iterations and numerous cycles of this procedure over a few days, FunSearch succeeded in devising a code that solved the cap set problem, an esoteric yet significant challenge in mathematics. This problem entails determining the maximum size of a specific set, analogous to placing dots on graph paper in such a way that no three dots align in a straight line.
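The “no three dots in a line” condition has a clean algebraic form: in the vector space over the integers mod 3, three distinct points are collinear exactly when their componentwise sum is zero mod 3. A minimal checker for whether a candidate set of points is a valid cap set (an illustrative verifier, not FunSearch’s discovered program) looks like this:

```python
from itertools import combinations

def is_cap_set(points: list) -> bool:
    """Check that no three distinct points in F_3^n lie on a line.

    In F_3^n, three distinct points are collinear exactly when their
    componentwise sum is congruent to zero modulo 3.
    """
    for a, b, c in combinations(points, 3):
        if all((x + y + z) % 3 == 0 for x, y, z in zip(a, b, c)):
            return False  # found three collinear points
    return True
```

For example, `[(0, 0), (1, 1), (2, 2)]` fails the check because the three points sum to zero in every coordinate, while `[(0, 0), (0, 1), (1, 0), (1, 1)]` passes.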

Though the cap set problem is highly specialized, it holds substantial importance in the field of mathematics, where even consensus on an approach to solve it is elusive, let alone an agreed-upon solution. The problem also has connections to matrix multiplication, the very computation that AlphaTensor enhanced. Terence Tao, a highly esteemed mathematician at the University of California, Los Angeles and a Fields Medal recipient, has expressed his fascination with the cap set problem, labeling it “perhaps my favorite open question” in a 2007 blog post.

Tao is equally enthusiastic about FunSearch itself. “This is a promising paradigm,” he remarks, highlighting the novel approach of leveraging the capabilities of large language models.

FunSearch possesses a significant advantage over its predecessor, AlphaTensor, in its theoretical capacity to solve a broad spectrum of problems. This is due to its unique method of generating code, which acts as a recipe for deriving solutions, rather than producing the solutions directly. This approach allows for versatile application, where different codes are tailored to solve various problems. According to Fawzi, the clarity of these ‘recipes’ is often greater than that of the complex mathematical solutions they lead to.

The team put FunSearch’s adaptability to the test by applying it to another complex mathematical challenge: the bin packing problem. This problem, which focuses on the optimization of packing items into the fewest possible bins, has significant implications in several areas within computer science, including data center management and e-commerce. Impressively, FunSearch managed to devise a solution method that outperforms those created by humans.

Tao notes the ongoing efforts within the mathematical community to effectively integrate large language models into research methodologies, aiming to maximize their potential while addressing their limitations. He acknowledges the work with FunSearch as a demonstration of one viable path forward in this endeavor. “Mathematicians are still trying to figure out the best way to incorporate large language models into our research workflow in ways that harness their power while mitigating their drawbacks,” he says. “This certainly indicates one possible way forward.”
