Novel high-performance hybrid system for semantic factoring of graph databases

Sep 20, 2011
Massive amounts of data can be analyzed using a novel hybrid system for semantic factoring in graph databases. Each line represents a frequent relationship type between entities in the dataset.

Imagine trying to analyze all of the English entries in Wikipedia. Now imagine you've got 20 times as much information. That's the challenge scientists face when working with gigabyte data sets. Scientists at Pacific Northwest National Laboratory, Sandia National Laboratories and Cray, Inc. developed an application to take on such massive data analysis challenges. Their novel high-performance computing application uses semantic factoring to organize data, bringing out hidden connections and threads.

The team then used their applications to analyze the massive datasets for the Billion Triple Challenge, an international competition focused on demonstrating capability and innovation for dealing with very large semantic graph databases, known as SGDs.

Why it matters? Science. Security. In both areas, people must turn massive data sets into knowledge that can be used to save lives.

As SGD technology grows to address components from extremely large data stores, it is becoming increasingly important to be able to use high-performance for analysis, interpretation, and visualization, especially as it pertains to the innate structure. However, the ability to understand the semantic structure of a vast SGD still needs both a coherent methodology and the platform to exercise the necessary methods.

The team took advantage of the Cray XMT architecture, which allowed all 624 gigabytes of input data to be held in RAM. They were then able to scalably perform a variety of novel tasks for descriptive analysis of the inherent semantics in the dataset provided by the Billion Triple Challenge, including identifying the ontological structure, the sensitivity of connectivity within the relationships, and the interaction among different contributions to the dataset.

The semantic database system research team is developing a prototype that can be adapted to a variety of application domains and datasets, including working with the bio2rdf.org and future billion-triple-challenge datasets in prototype testing and evaluation.

Explore further: Computer scientist publishes new algorithm cluster to data mine health records

More information: Joslyn C, R Adolf, S al-Saffar, J Feo, E Goodman, D Haglin, G Mackey, and D Mizell. 2010. "High Performance Semantic Factoring of Giga-Scale Semantic Graph Databases." Semantic Web Challenge Billion Triple Challenge 2010. cass-mt.pnl.gov/btc2010/pnnl_btc.pdf

add to favorites email to friend print save as pdf

Related Stories

Customizing supercomputers from the ground up

May 27, 2010

(PhysOrg.com) -- Computer scientist Adolfy Hoisie has joined the Department of Energy's Pacific Northwest National Laboratory to lead PNNL's high performance computing activities. In one such activity, Hoisie will direct ...

Enter the semantic grid

Jan 31, 2006

Working under the IST programme-funded OntoGrid project, they are catalysing the evolution of the Grid from a distributed network of computers in which the meaning of information is implicit and hidden into ...

Recommended for you

The brain as a model for future supercomputers

May 14, 2013

(Phys.org) —The brain's repute took a big hit in 1997 when an IBM supercomputer defeated world chess champion Gary Kasparov in a match reported around the world. But in the second round, the brain is back.

User comments : 0

More news stories

Morocco to harness the wind in energy hunt

Morocco is ploughing ahead with a programme to boost wind energy production, particularly in the southern Tarfaya region, where Africa's largest wind farm is set to open in 2014.

US psychiatry gets makeover in new manual

The latest makeover to a massive psychiatric tome honored by some, reviled by others and even called the "Bible" of mental disorders is being released Saturday with a host of new changes.

New case of SARS-like virus in Saudi: ministry

A new case of the deadly coronavirus has been detected in Saudi Arabia where 15 people have already died after contracting it, the health ministry announced on Saturday on its Internet website.

Galaxy's Ring of Fire

Johnny Cash may have preferred this galaxy's burning ring of fire to the one he sang about falling into in his popular song. The "starburst ring" seen at center in red and yellow hues is not the product of ...