What does it take to get AI to work like a scientist?

Leveraging existing knowledge

To truly make groundbreaking discoveries, King argues, the way the machines represent knowledge has to be more sophisticated than simply pushing around algebraic expressions until they find one that fits. There needs to be a way to represent more of the abstract, almost philosophical formulations of knowledge and understanding—they have to handle laws in both their mathematical and non-mathematical forms.

As a step in that direction, researchers at IBM have created a new AI scientist with a novel feature: incorporating prior knowledge. Human scientists often start with well-established basic principles and deduce more intricate or specific relationships from there; they don’t solely rely on new data.

The IBM program, named AI Descartes, merges data-driven discovery with a knowledge of theory for the first time. “This is what real scientists do,” said Cristina Cornelio, a research scientist now at Samsung AI who led the effort. Like many previous machine scientists, AI Descartes looks at new data and compiles a list of potential underlying formulas. Unlike previous software, however, it doesn’t stop there: it then considers relevant prior knowledge, checking how well the suggested formulas fit into the bigger picture.

AI Descartes is essentially a three-step system that helps the software make the most sense of a set of data, given some theoretical information. Its first step is similar to what previous machine scientists do: it looks at noisy data and searches for a formula that fits without being overly complicated. For example, one of the classic equations it re-discovered was Kepler’s third law, which relates the time a planet takes to orbit the Sun to its distance from the Sun. Descartes’ handlers fed the system the masses of the Sun and each planet, each planet’s distance to the Sun, and the number of days each takes to complete one revolution. The system used a version of symbolic regression to construct possible formulas from component terms and searched for one that could predict the orbital period of any planet based on mass and distance. Usually, this procedure results in a few possible formulas with varying levels of complexity (with simpler ones being less accurate).
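To make the first step concrete, here is a deliberately tiny stand-in for symbolic regression. It is not the method AI Descartes uses: it only tries formulas of the form T = c * d**p for a handful of exponents and ignores mass entirely. The function names and the candidate list are hypothetical, and the planetary figures are approximate textbook values rather than the dataset the researchers used.

```python
import math

# Approximate semi-major axis (AU) and orbital period (days).
# Textbook values, assumed here for illustration only.
PLANETS = {
    "Mercury": (0.387, 87.97),
    "Venus": (0.723, 224.70),
    "Earth": (1.000, 365.26),
    "Mars": (1.524, 686.98),
    "Jupiter": (5.203, 4332.59),
    "Saturn": (9.537, 10759.22),
}


def fit_power_law(exponent, data):
    """Fit T = c * d**exponent and return (c, RMS error in log space)."""
    # With the exponent fixed, the least-squares log(c) is just the mean
    # of log(T) - exponent * log(d) over all data points.
    offsets = [math.log(t) - exponent * math.log(d) for d, t in data]
    log_c = sum(offsets) / len(offsets)
    rms = math.sqrt(sum((o - log_c) ** 2 for o in offsets) / len(offsets))
    return math.exp(log_c), rms


def rank_candidates(exponents, data):
    """Return (error, exponent, c) tuples, best-fitting candidate first."""
    results = []
    for p in exponents:
        c, err = fit_power_law(p, data)
        results.append((err, p, c))
    return sorted(results)


ranked = rank_candidates([0.5, 1.0, 1.5, 2.0], list(PLANETS.values()))
best_err, best_p, best_c = ranked[0]
print(f"best candidate: T = {best_c:.2f} * d**{best_p}")
```

On this data the exponent 1.5 wins by a wide margin, which is the distance dependence in Kepler’s third law. A real symbolic regression system searches a far larger space of expressions and weighs accuracy against complexity, which is why it typically returns several candidates rather than one.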

For the second step, AI Descartes turns to the known background theory to check whether any of the candidate formulas makes scientific sense, which can help break the tie. To do this, it uses a “logical reasoning module” that works as a theorem prover, verifying logical connections without the need for actual data. It starts with fundamental rules and concepts, expressed as a set of equations entered by human researchers. For the case of Kepler’s law, these included expressions for gravitational and centrifugal forces, as well as basic premises such as the requirement that mass is always positive. Then, the reasoning module tries to expand its background knowledge one logical step at a time, using the fundamental rules to generate more and more formulas that are still valid.

If one of the first step’s candidate formulas pops up in that list, it immediately becomes the favorite, since it is provable from the background theory.
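The kind of derivation the reasoning module searches for can be illustrated by hand. The article names gravitational and centrifugal forces as background axioms but does not list them exactly, so what follows is the standard textbook route rather than the system’s actual axiom set. The symbols are chosen here for illustration: m_1 and m_2 are the two masses, d is their separation, d_2 is the distance from body 2 to the common center of mass, ω is the angular velocity, and p is the orbital period.

```latex
\begin{align*}
F_g &= \frac{G m_1 m_2}{d^2}, \qquad
F_c = m_2 d_2 \omega^2, \qquad
d_2 = \frac{m_1}{m_1 + m_2}\, d \\
F_g = F_c \;&\Rightarrow\; \omega^2 = \frac{G (m_1 + m_2)}{d^3} \\
p = \frac{2\pi}{\omega} \;&\Rightarrow\; p = 2\pi \sqrt{\frac{d^3}{G (m_1 + m_2)}}
\end{align*}
```

When m_2 is much smaller than m_1, the sum m_1 + m_2 is nearly constant from planet to planet, so the period depends almost entirely on the distance, as d to the power 3/2.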

Imperfect matches

Of course, it’s more likely that the theorem prover won’t generate an exact match for a candidate formula. If the formula were easily derived from the background theory alone, one might question the necessity of the data in the first place. In the Kepler example, none of the three formulas that AI Descartes identified in the first step could be derived from existing knowledge alone.

But the ways in which the candidate formulas fall short can be enlightening. This is the crucial third step: determining which candidate formula is closest to the possibilities suggested by the background theory. To do this, AI Descartes uses three separate ways of describing the distance between the candidate data-driven formulas and those derivable from background theory, something that can be done even without an explicit ‘correct’ formula. “That’s the magic of the theorem prover,” Cornelio says.

These definitions of distance vary, but each one tries to derive the candidate formula from the background theory under a slightly different set of assumptions. The distances help tease out why a formula might be underivable from the background theory and thus suggest future courses of action. The first checks that the data itself is consistent with the theory; the second examines whether the formula overfits the noisy data; and the third checks whether the candidate formula has a sensible dependence on each of the variables (for example, the masses and distances of the planets in the Solar System).

By looking at all three error measures, AI Descartes picked the least offensive version of Kepler’s law. All three candidate formulas did reasonably well on the first and second tests, but the third revealed that none of them had a theory-approved dependence on mass, and only one had the appropriate dependence on the distance between the planets and the Sun. So, the AI concluded that the distance-dependent formula is a good approximation for the range of masses of bodies in the Solar System.

To do better, the team turned to a dataset that included pairs of stars orbiting each other. Then, the AI learned the proper dependence on mass and fully re-discovered Kepler’s law.
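A quick numerical check shows why Solar System data alone cannot pin down the mass dependence while binary stars can. The sketch below compares the textbook two-body period with a distance-only formula that ignores the orbiting body’s mass. That distance-only formula is a stand-in chosen for illustration, not the candidate AI Descartes actually produced, and the function names and rounded physical constants are likewise assumptions.

```python
import math

G = 6.674e-11          # gravitational constant, m^3 kg^-1 s^-2
M_SUN = 1.989e30       # kg, rounded
M_JUPITER = 1.898e27   # kg, rounded
AU = 1.496e11          # metres, rounded
SECONDS_PER_DAY = 86400


def period_two_body(d, m1, m2):
    """Textbook period for two bodies separated by d (in seconds)."""
    return 2 * math.pi * math.sqrt(d ** 3 / (G * (m1 + m2)))


def period_distance_only(d, m1):
    """Illustrative stand-in that ignores the orbiting body's mass."""
    return 2 * math.pi * math.sqrt(d ** 3 / (G * m1))


def relative_gap(d, m1, m2):
    """Fractional error of the distance-only formula against the two-body one."""
    exact = period_two_body(d, m1, m2)
    return abs(period_distance_only(d, m1) - exact) / exact


jupiter_days = period_two_body(5.203 * AU, M_SUN, M_JUPITER) / SECONDS_PER_DAY
gap_jupiter = relative_gap(5.203 * AU, M_SUN, M_JUPITER)
gap_binary = relative_gap(1.0 * AU, M_SUN, M_SUN)

print(f"Jupiter period: {jupiter_days:.0f} days")
print(f"error ignoring Jupiter's mass: {gap_jupiter:.2%}")
print(f"error for an equal-mass binary: {gap_binary:.2%}")
```

Even for Jupiter, the heaviest planet, ignoring its mass shifts the predicted period by roughly 0.05 percent, which is easily lost in noisy data. For two stars of equal mass the same shortcut is off by about 41 percent, so a dataset of binary stars makes the missing mass term impossible to overlook.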

If the program fails to find a formula that at least partially fits both data and theory, it can recommend follow-up experiments to produce additional data that would help it distinguish between candidate formulas.
