New data-mining strategy that offers unprecedented pattern search speed could glean new insights from massive datasets

December 28, 2016, King Abdullah University of Science and Technology
A new data mining strategy that offers unprecedented pattern search speed could lead to new insights from massive data. Credit: © Mopic / Alamy Stock Photo DTFTEM

Searching for recurring patterns in network systems has become a fundamental part of research and discovery in fields as diverse as biology and social media. KAUST researchers have developed a pattern or graph-mining framework that promises to significantly speed up searches on massive network data sets.

"A graph is a data structure that models complex relationships among objects," explained Panagiotis Kalnis, leader of the research team from the KAUST Extreme Computing Research Center. "Graphs are widely used in many modern applications, including social networks, biological networks like protein-to-protein interactions, and communication networks like the internet."

In these applications, one of the most important operations is the process of finding recurring graphs that reveal how objects tend to connect to each other. The process, which is called frequent subgraph mining (FSM), is an essential building block of many knowledge extraction techniques in social studies, bioinformatics and image processing, as well as in security and fraud detection. However, graphs may contain hundreds of millions of objects and billions of relationships, which means that extracting recurring patterns places huge demands on time and computing resources.

"In essence, if we can provide a better algorithm, all the applications that depend on FSM will be able to perform deeper analysis on larger data in less time," Kalnis noted.

Kalnis and his colleagues developed a system called ScaleMine that offers a ten-fold acceleration compared with existing methods.

"FSM involves a vast number of graph operations, each of which is computationally expensive, so the only practical way to support FSM in large graphs is by massively parallel computation," he said.

In parallel computing, the is divided into and each is run simultaneously on its own processor. If the tasks are too large, the entire search is held up by waiting for the slowest task to complete; if the tasks are too small, the extra communication needed to coordinate the parallelization becomes a significant additional computational load.

Kalnis' team overcame this limitation by performing the search in two steps: a first approximation step to determine the search space and the optimal division of tasks and a second computational step in which large tasks are split dynamically into the optimal number of subtasks. This resulted in search speeds up to ten times faster than previously possible.

"Hopefully this performance improvement will enable deeper and more accurate analysis of large graph data and the extraction of new knowledge," Kalnis said.

Explore further: Graphics processors accelerate pattern discovery

Related Stories

Graphics processors accelerate pattern discovery

August 26, 2015

Repeating patterns in complex biological networks can now be found hundreds of times faster using an algorithm that exploits the parallel computing capacity of modern graphics adapters. The A*STAR-led breakthrough opens the ...

Quick drawing of complex relationships

October 19, 2015

Quality criteria for a readable graphic representation of complex relationships are high. For example, the node points have to be located at sufficient distances in order to be identifiable. At the same time, the graph drawing ...

Worldwide quantum web may be possible with help from graphs

June 8, 2016

(Phys.org)—One of the most ambitious endeavors in quantum physics right now is to build a large-scale quantum network that could one day span the entire globe. In a new study, physicists have shown that describing quantum ...

Recommended for you

Cryptocurrency rivals snap at Bitcoin's heels

January 14, 2018

Bitcoin may be the most famous cryptocurrency but, despite a dizzying rise, it's not the most lucrative one and far from alone in a universe that counts 1,400 rivals, and counting.

Top takeaways from Consumers Electronics Show

January 13, 2018

The 2018 Consumer Electronics Show, which concluded Friday in Las Vegas, drew some 4,000 exhibitors from dozens of countries and more than 170,000 attendees, showcased some of the latest from the technology world.

Finnish firm detects new Intel security flaw

January 12, 2018

A new security flaw has been found in Intel hardware which could enable hackers to access corporate laptops remotely, Finnish cybersecurity specialist F-Secure said on Friday.

0 comments

Please sign in to add a comment. Registration is free, and takes less than a minute. Read more

Click here to reset your password.
Sign in to get notified via email when new comments are made.