Novel high-performance hybrid system for semantic factoring of graph databases

Sep 20, 2011
Massive amounts of data can be analyzed using a novel hybrid system for semantic factoring in graph databases. Each line represents a frequent relationship type between entities in the dataset.

Imagine trying to analyze all of the English entries in Wikipedia. Now imagine you've got 20 times as much information. That's the challenge scientists face when working with gigabyte data sets. Scientists at Pacific Northwest National Laboratory, Sandia National Laboratories and Cray, Inc. developed an application to take on such massive data analysis challenges. Their novel high-performance computing application uses semantic factoring to organize data, bringing out hidden connections and threads.

The team then used their applications to analyze the massive datasets for the Billion Triple Challenge, an international competition focused on demonstrating capability and innovation for dealing with very large semantic graph databases, known as SGDs.

Why it matters? Science. Security. In both areas, people must turn massive data sets into knowledge that can be used to save lives.

As SGD technology grows to address components from extremely large data stores, it is becoming increasingly important to be able to use high-performance for analysis, interpretation, and visualization, especially as it pertains to the innate structure. However, the ability to understand the semantic structure of a vast SGD still needs both a coherent methodology and the platform to exercise the necessary methods.

The team took advantage of the Cray XMT architecture, which allowed all 624 gigabytes of input data to be held in RAM. They were then able to scalably perform a variety of novel tasks for descriptive analysis of the inherent semantics in the dataset provided by the Billion Triple Challenge, including identifying the ontological structure, the sensitivity of connectivity within the relationships, and the interaction among different contributions to the dataset.

The semantic database system research team is developing a prototype that can be adapted to a variety of application domains and datasets, including working with the bio2rdf.org and future billion-triple-challenge datasets in prototype testing and evaluation.

Explore further: Forging a photo is easy, but how do you spot a fake?

More information: Joslyn C, R Adolf, S al-Saffar, J Feo, E Goodman, D Haglin, G Mackey, and D Mizell. 2010. "High Performance Semantic Factoring of Giga-Scale Semantic Graph Databases." Semantic Web Challenge Billion Triple Challenge 2010. cass-mt.pnl.gov/btc2010/pnnl_btc.pdf

add to favorites email to friend print save as pdf

Related Stories

Customizing supercomputers from the ground up

May 27, 2010

(PhysOrg.com) -- Computer scientist Adolfy Hoisie has joined the Department of Energy's Pacific Northwest National Laboratory to lead PNNL's high performance computing activities. In one such activity, Hoisie will direct ...

Enter the semantic grid

Jan 31, 2006

Working under the IST programme-funded OntoGrid project, they are catalysing the evolution of the Grid from a distributed network of computers in which the meaning of information is implicit and hidden into ...

Recommended for you

Forging a photo is easy, but how do you spot a fake?

Nov 21, 2014

Faking photographs is not a new phenomenon. The Cottingley Fairies seemed convincing to some in 1917, just as the images recently broadcast on Russian television, purporting to be satellite images showin ...

Algorithm, not live committee, performs author ranking

Nov 21, 2014

Thousands of authors' works enter the public domain each year, but only a small number of them end up being widely available. So how to choose the ones taking center-stage? And how well can a machine-learning ...

Professor proposes alternative to 'Turing Test'

Nov 19, 2014

(Phys.org) —A Georgia Tech professor is offering an alternative to the celebrated "Turing Test" to determine whether a machine or computer program exhibits human-level intelligence. The Turing Test - originally ...

Image descriptions from computers show gains

Nov 18, 2014

"Man in black shirt is playing guitar." "Man in blue wetsuit is surfing on wave." "Black and white dog jumps over bar." The picture captions were not written by humans but through software capable of accurately ...

Converting data into knowledge

Nov 17, 2014

When a movie-streaming service recommends a new film you might like, sometimes that recommendation becomes a new favorite; other times, the computer's suggestion really misses the mark. Yisong Yue, assistant ...

User comments : 0

Please sign in to add a comment. Registration is free, and takes less than a minute. Read more

Click here to reset your password.
Sign in to get notified via email when new comments are made.