DBToaster breaks up data jams in server farms

Sep 21, 2012 by Lionel Pousaz
In gigantic server farms around the world, billions of database entries are queried every second. EPFL researchers have developed a system that drastically improves the circulation of this flow of information. The economic and environmental benefits are considerable. Credit: EPFL

Databases have revolutionized the business world. Every bottle of shampoo you buy, every purchase you make, is one more data point sent to your bank's and your supermarket's servers. This enormous quantity of detailed information allows merchants to optimize their inventories and displays, and bankers to optimize the flow of money. Huge server farms are deployed to keep up with this breakneck pace of information storage and transfer. Researchers in EPFL's DATA Laboratory have developed DBToaster, a system that speeds up these operations by a factor of 100 to 10,000. The latest version has just been made available on www.dbtoaster.org.

"Ten years ago, set up one of the world's largest databases," explains EPFL professor Christoph Koch, DBToaster's creator. "Today, your average supermarket has a bigger system." This growth has accelerated so dramatically that optimizing databases has become an environmental issue. In the U.S., electricity use by data centers is growing exponentially and currently represents 2% of total electricity consumption.

Avoiding data jams by accelerating the flow of data

In a classic database, data are processed in a series of successive steps. For example, say a bank wants a list of all its clients who live in Zurich and have a balance of at least 5,000 francs. The user queries the database by selecting certain criteria. This request is translated into a series of operators. Because every banking transaction results in a separate database entry, the amount of information that must be sorted is phenomenal: the first operator has to search through billions of entries. The resulting data set is then sorted by the second operator, and so on, until the list is reduced to the desired clients.
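To make the operator chain concrete, here is a toy sketch in Python. It is a hypothetical illustration, not how any real database engine is written: the `Transaction` record, the operator names, and the idea that a balance is the sum of a client's transaction amounts are all assumptions made for the example. Each operator consumes the full output of the previous one, so every step produces an intermediate result that must be held somewhere.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transaction:
    # Hypothetical record: one database entry per banking transaction.
    client: str
    city: str
    amount: int  # francs; positive for deposits, negative for withdrawals


def select_city(rows, city):
    # Operator 1: scan every entry and keep those for the given city.
    return [r for r in rows if r.city == city]


def sum_by_client(rows):
    # Operator 2: aggregate the filtered entries into one balance per client.
    balances = {}
    for r in rows:
        balances[r.client] = balances.get(r.client, 0) + r.amount
    return balances


def at_least(balances, minimum):
    # Operator 3: keep only clients whose balance reaches the threshold.
    return sorted(c for c, b in balances.items() if b >= minimum)


def run_plan(rows):
    step1 = select_city(rows, "Zurich")  # intermediate result 1 (a full list)
    step2 = sum_by_client(step1)         # intermediate result 2 (a dict)
    return at_least(step2, 5000)
```

With billions of entries, `step1` alone can outgrow RAM, which is the "data jam" described next.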

The data are so vast that the server's RAM is often not large enough to temporarily store intermediate results, causing a data jam. The server must then write intermediate results to the hard disk before passing them on to the next operator. This slows things down considerably, because accessing the hard disk is 10,000 times slower than accessing RAM. It also consumes much more electricity.

The EPFL scientists were able to get their system to compile successive operators into one single operator. This extremely complex operation makes it possible to avoid storing huge intermediate results. In doing so, DBToaster efficiently prevents data jams.
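The general idea of fusing operators can be sketched in Python as well. This is only a toy illustration of the principle, not DBToaster's compiler or the code it generates; the `Transaction` record and the `fused_plan` name are hypothetical. The filter and the aggregation run in the same loop, so the filtered entries are never collected into an intermediate list; only the small per-client totals are kept.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transaction:
    # Hypothetical record: one database entry per banking transaction.
    client: str
    city: str
    amount: int  # francs


def fused_plan(rows, city="Zurich", minimum=5000):
    # One pass over the data: filtering and summing happen together,
    # so no list of filtered entries is ever materialized.
    balances = {}
    for r in rows:
        if r.city == city:
            balances[r.client] = balances.get(r.client, 0) + r.amount
    # Final threshold check runs over per-client totals, not raw entries.
    return sorted(c for c, b in balances.items() if b >= minimum)
```

Because `rows` is only iterated once, it could just as well be a generator reading entries from storage one at a time.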

Keeping queries in memory so you don't have to reinvent the wheel

DBToaster has a second innovation, as well. The researchers took into account the fact that queries are often repetitive. "In general, the same operator is used many times within brief periods of time," explains Koch. Rather than having to recalculate everything each time, the system keeps the preceding result in memory and merges it with new entries. "The big innovation with DBToaster is its ability to generate efficient code that manages to figure out how previous queries should be changed in order to be updated." In this way, only recently entered data has to be queried, rather than billions of entries.
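This strategy of keeping a previous result and updating it as new data arrive is known in database research as incremental view maintenance. The Python sketch below illustrates the idea for the bank example; it is a hypothetical simplification (the `BalanceView` class and its methods are invented for illustration) and far simpler than the update code DBToaster generates. Each new transaction changes only its own client's stored total, so answering the query again never rescans the transaction history.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transaction:
    # Hypothetical record: one database entry per banking transaction.
    client: str
    city: str
    amount: int  # francs


class BalanceView:
    """Hypothetical materialized view: balances of clients in one city."""

    def __init__(self, city="Zurich", minimum=5000):
        self.city = city
        self.minimum = minimum
        self.balances = {}  # the stored previous result: client -> balance

    def on_insert(self, txn):
        # Update rule for a SUM: a new entry only changes its own client's
        # total, so the cost is constant per transaction.
        if txn.city == self.city:
            self.balances[txn.client] = self.balances.get(txn.client, 0) + txn.amount

    def result(self):
        # Reads the maintained per-client totals, not the raw transactions.
        return sorted(c for c, b in self.balances.items() if b >= self.minimum)
```

For instance, after inserting Anna's 4,000-franc deposit she is not yet in the result; a further 1,500-franc deposit updates her stored total to 5,500 and she appears, without any earlier entry being read again.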

DBToaster is available online free of charge. Financial institutions, in particular, are enthusiastic about the system. According to Koch, DBToaster "enables analytical processing in real time, which financial institutions need to perform automated trading or to enforce regulatory compliance – for instance to detect patterns of money laundering in their streams of financial transactions." But the benefits go further than this. As data processing consumes escalating amounts of power, DBToaster is a solution that can be easily deployed on existing servers to reduce their energy consumption and mitigate their impact on the environment.
