Researchers craft program to stop cloud computer problems before they start

Sep 10, 2012

(Phys.org)—Researchers from North Carolina State University have developed a new software tool to prevent performance disruptions in cloud computing systems by automatically identifying and responding to potential anomalies before they can develop into problems.

Cloud computing enables users to create multiple "virtual machines" that operate independently, even though they all run on one large shared system. However, this approach can cause performance issues when a software bug, or other problem, in one virtual machine disrupts the entire cloud.

Now researchers have designed software that looks at the amount of memory being used, CPU usage and other system-level data in a cloud computing infrastructure to define the wide range of behaviors that can be considered "normal." (CPU usage is the amount of computing power being used at any given time.) The program defines normal behavior for every virtual machine in the cloud, and can then look for deviations and predict anomalies that could affect the system's ability to provide service to users.
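The paper's abstract says UBL uses Self-Organizing Maps (SOMs) to learn system behavior without labels. The sketch below is only an illustration of that general idea, not UBL's code. It trains a tiny SOM on unlabeled, normalized [CPU, memory] readings from one virtual machine and then scores new readings by their distance to the nearest neuron, known as quantization error. The grid size, decay schedule, the 1.5x threshold rule and the quantization-error score are all assumptions made for this example.

```python
import math
import random

def best_matching_unit(weights, x):
    # Neuron whose weight vector is closest (squared Euclidean) to x.
    return min(weights, key=lambda n: sum((w - v) ** 2 for w, v in zip(weights[n], x)))

def train_som(samples, rows=4, cols=4, epochs=50, lr0=0.5, radius0=2.0, seed=0):
    """Fit a small self-organizing map to unlabeled metric vectors."""
    rng = random.Random(seed)
    # Start every neuron at a copy of a randomly chosen training sample.
    weights = {(r, c): list(rng.choice(samples)) for r in range(rows) for c in range(cols)}
    total, step = epochs * len(samples), 0
    for _ in range(epochs):
        for x in rng.sample(samples, len(samples)):  # shuffled pass over the data
            frac = step / total
            lr = lr0 * (1 - frac)                     # learning rate decays linearly
            radius = max(radius0 * (1 - frac), 0.5)   # neighbourhood shrinks as well
            bmu = best_matching_unit(weights, x)
            for node, w in weights.items():
                grid_d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2 * radius ** 2))  # Gaussian neighbourhood weight
                for i in range(len(w)):
                    w[i] += lr * h * (x[i] - w[i])  # pull the neuron toward the sample
            step += 1
    return weights

def deviation_score(weights, x):
    # Quantization error: distance from x to its best-matching neuron.
    return math.dist(weights[best_matching_unit(weights, x)], x)

# Hypothetical "normal" history for one VM: normalized [cpu, memory] in a narrow band.
rng = random.Random(1)
normal = [[rng.uniform(0.2, 0.4), rng.uniform(0.3, 0.5)] for _ in range(100)]
som = train_som(normal)
# Assumed rule: a reading scoring 1.5x worse than the worst normal sample is a deviation.
threshold = 1.5 * max(deviation_score(som, s) for s in normal)

print(deviation_score(som, [0.3, 0.4]) > threshold)   # a typical reading
print(deviation_score(som, [0.95, 0.9]) > threshold)  # runaway CPU and memory -> True
```

No labeled examples of failures are needed: the map only ever sees normal behavior, so any reading that lands far from every learned neuron is flagged, including kinds of anomaly that were never observed before.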

One advantage of this approach is that it does not require users to provide so-called "training data" about what constitutes abnormal behavior, which is important because training data are often difficult to obtain in production systems. Moreover, the approach can predict anomalies that have never been seen before.

If the program spots a virtual machine that is deviating from its normal behavior, it runs a "black box" diagnostic that can determine which metrics, such as CPU usage, may be affected, without exposing user data. This metric data can then be used to trigger the appropriate prevention system, which will address the deviation and keep it from becoming a problem.
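The article does not describe how the black-box diagnosis works internally. As a rough illustration, this sketch ranks a virtual machine's metrics by how far the latest reading sits from that machine's own history, measured in standard deviations, and maps the worst metric to a prevention step. The z-score ranking is a simple stand-in technique, not the researchers' method, and the metric names, sample values and the `PREVENTION_ACTIONS` table are hypothetical. Like the system described, it touches only aggregate system-level numbers.

```python
import statistics

# Hypothetical mapping from the most-deviating metric to a prevention action.
PREVENTION_ACTIONS = {
    "cpu": "raise the VM's CPU cap",
    "memory": "increase the VM's memory allocation",
    "network": "migrate the VM to a less loaded host",
}

def rank_faulty_metrics(history, current):
    """Rank metrics by how many standard deviations `current` is from `history`.

    history: dict of metric name -> list of past normal readings
    current: dict of metric name -> latest reading
    """
    scores = {}
    for metric, values in history.items():
        mean = statistics.fmean(values)
        spread = statistics.pstdev(values) or 1e-9  # avoid dividing by zero on flat metrics
        scores[metric] = abs(current[metric] - mean) / spread
    return sorted(scores, key=scores.get, reverse=True)

def choose_action(history, current):
    # Pick the single most suspicious metric and look up its remedy.
    top = rank_faulty_metrics(history, current)[0]
    return top, PREVENTION_ACTIONS[top]

history = {
    "cpu": [22, 25, 24, 23, 26, 24],             # percent
    "memory": [510, 515, 505, 512, 508, 511],    # MB
    "network": [40, 42, 39, 41, 43, 40],         # KB/s
}
current = {"cpu": 24, "memory": 790, "network": 41}
print(choose_action(history, current))  # ('memory', "increase the VM's memory allocation")
```

Because every action in the table only adjusts resources or placement, a wrong choice can be undone, which is the property the researchers rely on when they describe false alarms as cheap.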

"If we can identify the initial deviation and launch an automatic response, we can not only prevent a major disturbance, but actually prevent the user from even experiencing any change in system performance," says Dr. Helen Gu, an assistant professor of computer science at NC State and co-author of a paper describing the research. "Also, it's important to note that this program does not access any user's individual information. We're looking only at system-level behavior."

The program is also lightweight, meaning it uses few of the cloud's resources to operate. It collects the initial data and defines normal behavior much faster than existing approaches. Once it is up and running, it uses less than 1 percent of the CPU load and 16 megabytes of memory.

In benchmark testing, the program identified up to 98 percent of anomalies, which is much higher than the rate found in existing approaches. "It also had a 1.7 percent rate of false positives, meaning it triggered very few false alarms," Gu says. "And because the false alarms resulted in automatic responses, which are easily reversible, the cost of the false alarms is negligible."
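The article does not say exactly how the two percentages were computed. Under the standard definitions, the detection (true-positive) rate is the share of real anomalies that were flagged, and the false-positive rate is the share of normal periods wrongly flagged. The counts below are hypothetical, chosen only so that the arithmetic reproduces rates of 98 and 1.7 percent.

```python
def detection_rates(tp, fn, fp, tn):
    """Return (true-positive rate, false-positive rate) from confusion counts."""
    tpr = tp / (tp + fn)  # fraction of actual anomalies that were caught
    fpr = fp / (fp + tn)  # fraction of normal samples that raised an alarm
    return tpr, fpr

# Hypothetical counts: 50 anomalous samples and 1,000 normal samples.
print(detection_rates(tp=49, fn=1, fp=17, tn=983))  # (0.98, 0.017)
```

Reporting both numbers matters because a detector can trivially raise its detection rate by alarming more often, which also drives up its false-positive rate.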

Gu says her team's next step is to incorporate more detailed "white box" diagnostic tools into the software, so they can identify the software bugs causing any anomalies and correct them.


More information: "UBL: Unsupervised Behavior Learning for Predicting Performance Anomalies in Virtualized Cloud Systems" Daniel J. Dean, Hiep Nguyen and Xiaohui Gu, Presented: Sept. 20 at the 9th Annual ACM International Conference on Autonomic Computing in San Jose, Calif.

Abstract
Infrastructure-as-a-Service (IaaS) clouds are prone to performance anomalies due to their complex nature. Although previous work has shown the effectiveness of using statistical learning to detect performance anomalies, existing schemes often assume labeled training data, which requires significant human effort and can only handle previously known anomalies. We present an Unsupervised Behavior Learning (UBL) system for IaaS cloud computing infrastructures. UBL leverages Self-Organizing Maps to capture emergent system behaviors and predict unknown anomalies. For scalability, UBL uses residual resources in the cloud infrastructure for behavior learning and anomaly prediction with little add-on cost. We have implemented a prototype of the UBL system on top of the Xen platform and conducted extensive experiments using a range of distributed systems. Our results show that UBL can predict performance anomalies with high accuracy and achieve sufficient lead time for automatic anomaly prevention. UBL supports large-scale infrastructure-wide behavior learning with negligible overhead.
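"Lead time" here means how far in advance a problem is flagged, which is what gives an automatic response room to act. UBL's own predictor is SOM-based; the sketch below swaps in a much simpler technique, a least-squares trend line extrapolated forward, purely to show how lead time can be estimated. The function name, the `horizon` cap and the memory figures are hypothetical.

```python
def steps_until_threshold(readings, limit, horizon=60):
    """Estimate how many sampling intervals remain before a metric reaches `limit`.

    Fits a least-squares line to recent readings and extrapolates it forward.
    Returns None if there is too little data or the trend does not reach the
    limit within `horizon` intervals.
    """
    n = len(readings)
    if n < 2:
        return None
    x_mean = (n - 1) / 2
    y_mean = sum(readings) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    slope = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(readings)) / sxx
    intercept = y_mean - slope * x_mean
    for k in range(1, horizon + 1):
        if intercept + slope * (n - 1 + k) >= limit:  # predicted value k steps ahead
            return k
    return None

# Hypothetical memory usage (MB) climbing by 20 MB per interval toward a 700 MB limit.
print(steps_until_threshold([500, 520, 540, 560, 580], limit=700))  # 6
```

If readings were taken every 5 seconds (an assumed interval), six steps would leave about 30 seconds to apply a fix before the limit is hit.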
