September 13, 2016

New programming language delivers fourfold speedups on problems common in the age of big data

by Larry Hardesty, Massachusetts Institute of Technology

In today's computer chips, memory management is based on what computer scientists call the principle of locality: If a program needs a chunk of data stored at some memory location, it probably needs the neighboring chunks as well.

But that assumption breaks down in the age of big data, now that computer programs more frequently act on just a few data items scattered arbitrarily across huge data sets. Since fetching data from main memory is the major performance bottleneck in today's chips, having to fetch it more frequently can dramatically slow program execution.

This week, at the International Conference on Parallel Architectures and Compilation Techniques, researchers from MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL) are presenting a new programming language, called Milk, that lets application developers manage memory more efficiently in programs that deal with scattered data points in large data sets.

In tests on several common algorithms, programs written in the new language were four times as fast as those written in existing languages. But the researchers believe that further work will yield even larger gains.

The reason that today's big data sets pose problems for existing memory management techniques, explains Saman Amarasinghe, a professor of electrical engineering and computer science, is not so much that they are large as that they are what computer scientists call "sparse." That is, with big data, the scale of the solution does not necessarily increase proportionally with the scale of the problem.

"In social settings, we used to look at smaller problems," Amarasinghe says. "If you look at the people in this [CSAIL] building, we're all connected. But if you look at the planet scale, I don't scale my number of friends. The planet has billions of people, but I still have only hundreds of friends. Suddenly you have a very sparse problem."

Similarly, Amarasinghe says, an online bookseller with, say, 1,000 customers might like to provide its visitors with a list of its 20 most popular books. It doesn't follow, however, that an online bookseller with a million customers would want to provide its visitors with a list of its 20,000 most popular books.

Thinking locally

Today's computer chips are not optimized for sparse data—in fact, the reverse is true. Because fetching data from the chip's main memory bank is slow, every core, or processor, in a modern chip has its own "cache," a relatively small, local, high-speed memory bank. Rather than fetching a single data item at a time from main memory, a core will fetch an entire block of data. And that block is selected according to the principle of locality.

It's easy to see how the principle of locality works with, say, image processing. If the purpose of a program is to apply a visual filter to an image, and it works on one block of the image at a time, then when a core requests a block, it should receive all the adjacent blocks its cache can hold, so that it can grind away on block after block without fetching any more data.

But that approach doesn't work if the algorithm is interested in only 20 books out of the 2 million in an online retailer's database. If it requests the data associated with one book, it's likely that the data associated with the 100 adjacent books will be irrelevant.
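To make the contrast concrete, here is a minimal sketch that counts how many cache-sized blocks a program must fetch for adjacent items versus scattered ones. The 64-byte block and 8-byte item sizes, and the function names, are illustrative assumptions rather than figures from the researchers' paper.

```python
import random

BLOCK_BYTES = 64  # assumed size of one block fetched from main memory
ITEM_BYTES = 8    # assumed size of one data item


def blocks_touched(indices):
    """Return how many distinct blocks hold the items at the given indices."""
    items_per_block = BLOCK_BYTES // ITEM_BYTES
    return len({i // items_per_block for i in indices})


def scattered_indices(count, total_items, seed=0):
    """Pick `count` distinct item indices at random from a large data set."""
    rng = random.Random(seed)
    return rng.sample(range(total_items), count)


if __name__ == "__main__":
    adjacent = range(20)                           # 20 neighboring items
    scattered = scattered_indices(20, 2_000_000)   # 20 items out of 2 million
    print("blocks fetched, adjacent items: ", blocks_touched(adjacent))
    print("blocks fetched, scattered items:", blocks_touched(scattered))
```

With these assumed sizes, 20 adjacent items fit in three blocks, while 20 scattered items will almost always need a separate block each, with most of every fetched block going unused.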

Going to main memory for a single data item at a time is woefully inefficient. "It's as if, every time you want a spoonful of cereal, you open the fridge, open the milk carton, pour a spoonful of milk, close the carton, and put it back in the fridge," says Vladimir Kiriansky, a PhD student in electrical engineering and computer science and first author on the new paper. He's joined by Amarasinghe and Yunming Zhang, also a PhD student in electrical engineering and computer science.

Batch processing

Milk simply adds a few commands to OpenMP, an extension of languages such as C and Fortran that makes it easier to write code for multicore processors. With Milk, a programmer inserts a couple of additional lines of code around any instruction that iterates through a large data collection looking for a comparatively small number of items. Milk's compiler (the program that converts high-level code into low-level instructions) then figures out how to manage memory accordingly.

With a Milk program, when a core discovers that it needs a piece of data, it doesn't request it—and a cacheful of adjacent data—from main memory. Instead, it adds the data item's address to a list of locally stored addresses. When the list is long enough, all the chip's cores pool their lists, group together those addresses that are near each other, and redistribute them to the cores. That way, each core requests only data items that it knows it needs and that can be retrieved efficiently.
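The pooling step described above can be sketched roughly as follows. This is a single-threaded illustration of the idea, not Milk's actual implementation or syntax: the region size, the round-robin redistribution, and the names `pool_and_group` and `region_switches` are all assumptions made for the example.

```python
from collections import defaultdict

REGION_ITEMS = 1024  # assumption: indices within one region count as "near each other"


def pool_and_group(core_lists):
    """Pool every core's deferred indices, group them by region, and hand
    whole regions back out to the cores round-robin."""
    buckets = defaultdict(list)
    for pending in core_lists:
        for index in pending:
            buckets[index // REGION_ITEMS].append(index)

    redistributed = [[] for _ in core_lists]
    for slot, region in enumerate(sorted(buckets)):
        redistributed[slot % len(core_lists)].extend(buckets[region])
    return redistributed


def region_switches(indices):
    """Count how often consecutive accesses fall in different regions."""
    regions = [i // REGION_ITEMS for i in indices]
    return sum(1 for a, b in zip(regions, regions[1:]) if a != b)


if __name__ == "__main__":
    # Each inner list stands for the indices one core has deferred so far.
    core_lists = [[0, 5000, 1, 9000], [5001, 2, 9001, 5002]]
    regrouped = pool_and_group(core_lists)
    print("regrouped lists:", regrouped)
    print("region switches before:", [region_switches(c) for c in core_lists])
    print("region switches after: ", [region_switches(c) for c in regrouped])
```

In this toy input each core originally jumps between regions on every access; after pooling, each core works through one region at a time, which is the property that lets nearby items be retrieved together.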

That's the high-level description, but the details get more complicated. In fact, most modern computer chips have several different levels of caches, each one larger but also slightly less efficient than the last. The Milk compiler has to keep track of not only a list of memory addresses but also the data stored at those addresses, and it regularly shuffles both around between cache levels. It also has to decide which addresses should be retained because they might be accessed again, and which to discard. Improving the algorithm that choreographs this intricate data ballet is where the researchers see hope for further performance gains.

"Many important applications today are data-intensive, but unfortunately, the growing gap in performance between memory and CPU means they do not fully utilize current hardware," says Matei Zaharia, an assistant professor of computer science at Stanford University. "Milk helps to address this gap by optimizing memory access in common programming constructs. The work combines detailed knowledge about the design of memory controllers with knowledge about compilers to implement good optimizations for current hardware."

Provided by Massachusetts Institute of Technology

This story is republished courtesy of MIT News (web.mit.edu/newsoffice/), a popular site that covers news about MIT research, innovation and teaching.

Citation: New programming language delivers fourfold speedups on problems common in the age of big data (2016, September 13) retrieved 5 May 2024 from https://phys.org/news/2016-09-language-fourfold-speedups-problems-common.html

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without the written permission. The content is provided for information purposes only.
