Insights about phase transitions in matter — like when ice melts to become liquid water — may apply to sprawling datasets, too. (image: Taylor Cole/ Unsplash)

In the field of computational complexity, a problem is deemed “easy” if it can be solved by an algorithm in a reasonable number of steps. A problem is “hard” if its solution requires an astronomical number of steps that grow exponentially with its size — often because it requires a brute-force search. These definitions are particularly apt in the era of big data, where problems related to finding patterns in huge amounts of noisy data may be easy, hard, or impossible.

Many of these problems are believed to undergo a kind of transition, from easy to hard, analogous to phase transitions in matter, like when a solid — an orderly arrangement of particles — melts to become liquid — which is more disorganized. It’s a powerful comparison, says SFI Professor Cris Moore, who works at the crossroads of physics, math, and computer science. 

“The noise in a dataset is heat, in this analogy,” he says. “In some sense, what an algorithm is trying to do, when it’s looking for a signal, is a lot like crystallization.” It’s looking for clear patterns against a messy backdrop. “When the data are too hot and noisy, the algorithm can’t settle down or find the true pattern.” 

By using knowledge of phase transitions in physical systems, researchers can gain new insights into more efficient ways to answer questions about patterns and structures in sprawling datasets. Moore recently organized a working group, held July 17–21 at SFI, that brought together experts from computer science, physics, and mathematics to explore connections between theoretical computer science and spin-glass theory, which is a framework for understanding phase transitions in complex materials. 

Interest in the overlap between the two areas has grown in the last two decades, says Moore, as researchers have identified two kinds of phase transition in data. “If there’s only a small amount of noise, the signal comes through easily, and fast algorithms can find it,” he says. “But as the noise increases, the problem jumps from easy to hard, like window glass never finding its crystalline state. And at an even higher level of noise, the problem becomes impossible — the signal gets lost in the noise.”

Making those connections, says Moore, is at the heart of the working group. “This meeting is about an ongoing effort to build the bridge between the physics — which is very convincing, but not mathematically rigorous — and rigorous techniques in mathematics and
statistics,” he says.  

Read more about the working group "Connecting Physics, Geometry, and Algebraic Hardness"

Support from the National Science Foundation Grant Award 1838251 BIGDATA: F: Collaborative Research: Mining for Patterns in Graphs and High-Dimensional Data: Achieving the Limits.