December 7, 2009 weblog
Eureqa, the robot scientist
When Eureqa first appeared in April this year, it was fed data on a double pendulum, and in just a few hours it inferred Newton's second law of motion and the law of conservation of momentum. Given other data, it might find laws that have so far eluded scientists.
Eureqa is a successor to robots that work out how to repair themselves, which were developed at the Computational Synthesis Lab at Cornell University by Dr Hod Lipson. The same algorithms that were used in the robots have been adapted for the analysis of any kind of data. These algorithms may help scientists find complicated equations and laws.
The program begins by examining the data for numbers that appear to be connected, and then suggests equations that fit the connections. Most of the proposed equations fail, but some are less wrong than others. These are selected and modified, then repeatedly re-tested against the data and tweaked until a workable equation is identified.
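The propose, test and refine loop described above can be sketched in a few lines of Python. This is only an illustration of the general idea, not Eureqa's actual code: the operator set, the error measure (mean absolute error), the mutation rule, the population sizes, and every name in it (OPS, random_expr, evaluate, error, mutate, evolve) are assumptions made for the example.

```python
import math
import random

# Hypothetical building blocks; the article does not list Eureqa's operators.
OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
}

def random_expr(rng, depth=2):
    """Build a random expression tree over the variable x and small constants."""
    if depth == 0 or rng.random() < 0.3:
        return "x" if rng.random() < 0.5 else float(rng.randint(-3, 3))
    op = rng.choice(sorted(OPS))
    return (op, random_expr(rng, depth - 1), random_expr(rng, depth - 1))

def evaluate(expr, x):
    """Compute the value of an expression tree at the point x."""
    if isinstance(expr, tuple):
        op, left, right = expr
        return OPS[op](evaluate(left, x), evaluate(right, x))
    return x if expr == "x" else expr

def error(expr, data):
    """Mean absolute error against the data; lower means 'less wrong'."""
    total = sum(abs(evaluate(expr, x) - y) for x, y in data) / len(data)
    return total if math.isfinite(total) else math.inf

def mutate(expr, rng):
    """Tweak a candidate by replacing one randomly chosen subtree."""
    if not isinstance(expr, tuple) or rng.random() < 0.3:
        return random_expr(rng, depth=2)
    op, left, right = expr
    if rng.random() < 0.5:
        return (op, mutate(left, rng), right)
    return (op, left, mutate(right, rng))

def evolve(data, generations=30, pop_size=60, seed=0):
    """Keep the least-wrong equations, modify them, and re-test, repeatedly."""
    rng = random.Random(seed)
    population = [random_expr(rng) for _ in range(pop_size)]
    history = []  # best error seen in each generation
    for _ in range(generations):
        population.sort(key=lambda e: error(e, data))
        history.append(error(population[0], data))
        survivors = population[: pop_size // 4]  # selection
        children = [mutate(rng.choice(survivors), rng)
                    for _ in range(pop_size - len(survivors))]
        population = survivors + children
    best = min(population, key=lambda e: error(e, data))
    return best, history

# Toy data generated from y = x*x + x (made up for this example).
data = [(float(x), float(x * x + x)) for x in range(-5, 6)]
best, history = evolve(data)
print(best, error(best, data))
```

Because the survivors are carried over unchanged into the next generation, the best error can never get worse from one generation to the next. The sketch is not guaranteed to recover the exact generating equation, only to return the least-wrong candidate it found.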
In some cases there is not enough data to enable Eureqa to find equations, but in these cases the latest version of the program may identify the gaps in the data and even recommend experiments to supply the missing data.
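The article does not say how Eureqa chooses which experiments to recommend. One plausible approach, shown here purely as an assumption, is to look for the input at which the surviving candidate equations disagree most, since a measurement there would rule out the most candidates. The function name suggest_experiment and the two toy models below are hypothetical.

```python
def suggest_experiment(models, candidate_inputs):
    """Return the candidate input where the models' predictions differ the most."""
    def spread(x):
        predictions = [model(x) for model in models]
        return max(predictions) - min(predictions)
    return max(candidate_inputs, key=spread)

# Two made-up candidate equations that agree at x = 0 and x = 2.
models = [lambda x: x * x, lambda x: 2 * x]
print(suggest_experiment(models, [0, 1, 2, 3, 4]))
```

Measuring at an input where both equations predict the same value would not help tell them apart, which is why the sketch favors the point of greatest disagreement.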
Eureqa was able to calculate in hours equations that Newton took years to find, and Lipson hopes it can do the same for data such as the interactions between proteins, genomes and cell signals, which are so complicated that describing the interactions mathematically has so far been impossible. While Lipson envisaged the program as having application mainly in biological fields, it will analyze any data that can be presented in a spreadsheet.
Dr John Wikswo of Vanderbilt University, who is using Eureqa to study the effects of cocaine on white blood cells, said that biology is far too complicated for humans to fully understand, but the Eureqa project may find solutions. Teamed with other gadgets developed by Lipson, Eureqa can adjust valves controlling the nutrients and toxins being fed to cells, and make changes faster than any human. Dr Wikswo said the program derives not only the equations, but also the experiments needed to arrive at them.
Dr Wikswo explained that scientists usually work by keeping everything constant except one variable, but that works best for linear systems and not so well for biological systems, which are more complex, and which can only be understood fully by changing many variables. Understanding which variables to change and what the results mean can be incredibly complicated, but Eureqa should be able to help.
Eureqa was released in response to an overwhelming number of requests from scientists asking Lipson to analyze their data for them. The program is available for free download now, but is still being refined by Lipson and his colleague Michael Schmidt. One problem is its tendency to return equations that fit the data but involve variables whose meaning is not understood. The equations work and make accurate predictions, so they appear to capture something real, but no one can yet explain how they work. Lipson likens the situation to trying to explain the laws of energy conservation to mathematicians from medieval times, who did not have the vocabulary needed to understand the mathematics.
One example of this is the use of Eureqa by University of Texas Southwestern's Dr Gurol Suel to analyze data on cell division and growth. Eureqa developed equations, and although Dr Suel is not sure what they mean, he said the results are still useful: they can serve as a starting point for further work and help in developing new hypotheses about the cells.
The next step is to devise algorithms to explain what Eureqa is finding, possibly by relating the unknown concepts to those with which we are familiar. Meanwhile, the program is freely available for download at Cornell University's website.