March 12, 2013

Cloud computing: For database-driven applications, new software could reduce hardware requirements by 95 percent

by Larry Hardesty, Massachusetts Institute of Technology

Making cloud computing more efficient — Credit: CHRISTINE DANILOFF/MIT

For many companies, moving their web-application servers to the cloud is an attractive option, since cloud-computing services can offer economies of scale, extensive technical support and easy accommodation of demand fluctuations.

But for applications that depend heavily on database queries, cloud hosting can pose as many problems as it solves. Cloud services often partition their servers into "virtual machines," each of which gets so many operations per second on a server's central processing unit, so much space in memory, and the like. That makes cloud servers easier to manage, but for database-intensive applications, it can result in the allocation of about 20 times as much hardware as should be necessary. And the cost of that overprovisioning gets passed on to customers.
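The "20 times as much hardware" figure and the headline's 95 percent are the same claim stated two ways. A minimal sketch of the arithmetic follows; the function name is illustrative and does not come from the researchers' software.

```python
def hardware_savings(overprovision_factor: float) -> float:
    """Fraction of allocated hardware that is unnecessary.

    If a workload is given `overprovision_factor` times the hardware it
    needs, only 1/factor of the allocation is actually required.
    """
    if overprovision_factor < 1:
        raise ValueError("factor must be at least 1")
    return 1.0 - 1.0 / overprovision_factor


savings = hardware_savings(20)
print(f"{savings:.0%} of the allocated hardware is surplus")
```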

MIT researchers are developing a new system called DBSeer that should help solve this problem and others, such as the pricing of cloud services and the diagnosis of application slowdowns. At the recent Biennial Conference on Innovative Data Systems Research, the researchers laid out their vision for DBSeer. And in June, at the annual meeting of the Association for Computing Machinery's Special Interest Group on Management of Data (SIGMOD), they will unveil the algorithms at the heart of DBSeer, which use machine-learning techniques to build accurate models of performance and resource demands of database-driven applications.

DBSeer's advantages aren't restricted to cloud computing, either. Teradata, a major database company, has already assigned several of its engineers the task of importing the MIT researchers' new algorithm—which has been released under an open-source license—into its own software.

Virtual limitations

Barzan Mozafari, a postdoc in the lab of professor of electrical engineering and computer science Samuel Madden and lead author on both new papers, explains that, with virtual machines, server resources must be allocated according to an application's peak demand. "You're not going to hit your peak load all the time," Mozafari says. "So that means that these resources are going to be underutilized most of the time."
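To see why sizing each virtual machine for its own peak wastes capacity, consider the rough sketch below. The three applications and their load figures are invented for illustration. It compares the capacity needed when every application is provisioned for its own peak against the capacity needed if the same applications shared one pool.

```python
# Hypothetical load per time interval, in arbitrary resource units.
loads = {
    "app_a": [2, 8, 3, 1],
    "app_b": [7, 2, 1, 3],
    "app_c": [1, 3, 9, 2],
}


def combined_load(loads):
    """Total demand across all applications in each interval."""
    return [sum(step) for step in zip(*loads.values())]


def isolated_capacity(loads):
    """Each VM is sized for its own peak, so capacity is the sum of peaks."""
    return sum(max(series) for series in loads.values())


def shared_capacity(loads):
    """A shared pool only has to cover the peak of the combined demand."""
    return max(combined_load(loads))


def mean_utilization(loads, capacity):
    """Average fraction of the given capacity that is actually in use."""
    total = combined_load(loads)
    return sum(total) / (len(total) * capacity)


isolated = isolated_capacity(loads)
shared = shared_capacity(loads)
print(isolated, shared)
print(round(mean_utilization(loads, isolated), 2),
      round(mean_utilization(loads, shared), 2))
```

Because the invented peaks fall in different intervals, the shared pool needs roughly half the capacity of the isolated setup. Real savings depend entirely on how an actual workload's peaks line up.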

Moreover, Mozafari says, virtual machines are, by design, isolated from each other: They can't share resources, even when they're running on the same physical server. With databases, that can mean wasteful duplication of a great deal of data.

And even the provisioning for peak demand is largely guesswork. "It's very counterintuitive," Mozafari says, "but you might take on certain types of extra load that might help your overall performance." Increased demand means that a database server will store more of its frequently used data in its high-speed memory, which can help it process requests more quickly.

On the other hand, a slight increase in demand could cause the system to slow down precipitously—if, for instance, too many requests require modification of the same pieces of data, which need to be updated on multiple servers. "It's extremely nonlinear," Mozafari says.

Mozafari, Madden, postdoc Alekh Jindal, and Carlo Curino, a former member of Madden's group who's now at Microsoft, use two different techniques in the SIGMOD paper to predict how a database-driven application will respond to increased load. Mozafari describes the first as a "black box" approach: DBSeer simply monitors fluctuations both in the number and type of user requests and in system performance, and uses machine-learning techniques to correlate the two. This approach is good at predicting the consequences of fluctuations that don't fall too far outside the range of the training data.
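The article does not say which machine-learning techniques DBSeer uses. As a stand-in, the sketch below fits an ordinary least-squares model that maps the mix of requests in an interval to observed CPU use. The feature choice (reads and writes per second), the training numbers, and the function names are all assumptions made for illustration.

```python
def fit_least_squares(X, y):
    """Fit y ~ X.w by solving the normal equations (X^T X) w = X^T y."""
    n = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(n)]
        + [sum(row[i] * target for row, target in zip(X, y))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting
    # (assumes the features are not linearly dependent).
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]


def predict(weights, features):
    """Predicted resource use for one interval's request mix."""
    return sum(w * x for w, x in zip(weights, features))


# Hypothetical monitoring log: [reads/s, writes/s, 1 (intercept)] -> CPU %.
X = [[100, 10, 1], [200, 20, 1], [150, 40, 1], [300, 10, 1], [250, 50, 1]]
y = [20, 35, 40, 40, 55]

weights = fit_least_squares(X, y)
estimate = predict(weights, [220, 30, 1])
print(round(estimate, 2))
```

A model like this only interpolates: the query point here sits inside the range of the training log, which is the regime where the black-box approach is described as reliable.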

Gray areas

Often, however, database managers—or prospective cloud-computing customers—will be interested in the consequences of a fourfold, tenfold, or even hundredfold increase in demand. For those types of predictions, Mozafari explains, DBSeer uses a "gray box" model, which takes into account the idiosyncrasies of particular database systems.

For instance, Mozafari explains, updating data stored on a hard drive is time-consuming, so most database servers will try to postpone that operation as long as they can, instead storing data modifications in the much faster—but volatile—main memory. At some point, however, the server has to commit its pending modifications to disk, and the criteria for making that decision can vary from one database system to another.
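The deferred-write behavior described above can be pictured with a toy buffer. This is only a sketch: the class, its names, and the "flush when N pages are dirty" rule are invented here, whereas real database systems each use their own, more involved criteria.

```python
class WriteBackBuffer:
    """Toy model of postponed disk writes (hypothetical flush rule)."""

    def __init__(self, flush_threshold):
        self.flush_threshold = flush_threshold
        self.dirty = {}       # page id -> latest pending value, held in memory
        self.disk = {}        # page id -> value committed to disk
        self.flush_count = 0  # number of (slow) disk flushes performed

    def write(self, page, value):
        # Repeated writes to the same page coalesce into one pending entry.
        self.dirty[page] = value
        if len(self.dirty) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Commit all pending modifications to disk in one batch.
        self.disk.update(self.dirty)
        self.dirty.clear()
        self.flush_count += 1


buf = WriteBackBuffer(flush_threshold=3)
for page, value in [(1, "a"), (1, "b"), (2, "c"), (1, "d"), (3, "e"), (4, "f")]:
    buf.write(page, value)

print(buf.flush_count, buf.disk, buf.dirty)
```

Six logical writes result in a single flush here, because three of them hit the same page. How often flushes happen, and how much each one costs, depends on both the workload and the system's flush rule, which is the kind of system-specific detail a gray-box model is meant to capture.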

The version of DBSeer presented at SIGMOD includes a gray-box model of MySQL, one of the most widely used database systems. The researchers are currently building a new model for another popular system, PostgreSQL. Although adapting the model isn't a negligible undertaking, models tailored to just a handful of systems would cover the large majority of database-driven Web applications.

The researchers tested their prediction algorithm against both a set of benchmark data, called TPC-C, that's commonly used in database research and real-world data on modifications to the Wikipedia database. On average, the model was about 80 percent accurate in predicting CPU use and 99 percent accurate in predicting the bandwidth consumed by disk operations.

"We're really fascinated and thrilled that someone is doing this work," says Doug Brown, a database software architect at Teradata. "We've already taken the code and are prototyping right now." Initially, Brown says, Teradata will use the MIT researchers' prediction algorithm to determine customers' resource requirements. "The really big question for our customers is, 'How are we going to scale?'" Brown says.

Brown hopes, however, that the algorithm will ultimately help allocate server resources on the fly, as database requests come in. If servers can assess the demands imposed by individual requests and budget accordingly, they can ensure that transaction times stay within the bounds set by customers' service agreements. For instance, "if you have two big, big resource consumers, you can calculate ahead of time that we're only going to run two of these in parallel," Brown says. "There's all kinds of games you can play in workload management."
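Brown's example of calculating ahead of time how many heavy requests to run in parallel can be sketched as a simple capacity check. The resource names, numbers, and function below are hypothetical and are not Teradata's or DBSeer's actual interface. The sketch assumes that a model has already predicted each request's demand and that demands add up linearly, which the article notes is not always true.

```python
def max_parallel(predicted_demand, capacity):
    """Largest number of identical requests that fit within every resource.

    Resources with zero predicted demand impose no limit and are skipped;
    at least one resource must have positive demand.
    """
    limits = [
        int(capacity[resource] // demand)
        for resource, demand in predicted_demand.items()
        if demand > 0
    ]
    if not limits:
        raise ValueError("no resource with positive predicted demand")
    return min(limits)


# Hypothetical per-request prediction and server budget.
heavy_query = {"cpu_pct": 40, "disk_mbps": 30}
server = {"cpu_pct": 100, "disk_mbps": 80}

slots = max_parallel(heavy_query, server)
print(slots)
```

With these made-up figures both CPU and disk bandwidth cap concurrency at two, matching the "only going to run two of these in parallel" scenario in the quote.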

More information: Paper (PDF): Performance and Resource Modeling in Highly-Concurrent OLTP Workloads

Provided by Massachusetts Institute of Technology

This story is republished courtesy of MIT News (web.mit.edu/newsoffice/), a popular site that covers news about MIT research, innovation and teaching.

Citation: Cloud computing: For database-driven applications, new software could reduce hardware requirements by 95 percent (2013, March 12) retrieved 23 April 2024 from https://phys.org/news/2013-03-cloud-database-driven-applications-software-hardware.html

