# System for performing 'tensor algebra' offers 100-fold speedups over previous software packages

##### October 31, 2017 by Larry Hardesty, Massachusetts Institute of Technology

We live in the age of big data, but most of that data is "sparse." Imagine, for instance, a massive table that mapped all of Amazon's customers against all of its products, with a "1" for each product a given customer bought and a "0" otherwise. The table would be mostly zeroes.

With sparse data, analytic algorithms end up doing a lot of addition and multiplication by zero, which is wasted computation. Programmers get around this by writing custom code to avoid zero entries, but that code is complex, and it generally applies only to a narrow range of problems.
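The space waste is easy to see in miniature. Here is an illustrative sketch in Python, using a tiny made-up purchase table and a simple "dict of keys" layout (chosen for clarity; it is not the layout Taco uses):

```python
# Hypothetical purchase table: 4 customers x 5 products, mostly zeros.
dense = [
    [0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0],
]

# Sparse "dict of keys" form: store only the nonzero coordinates.
sparse = {(i, j): v
          for i, row in enumerate(dense)
          for j, v in enumerate(row)
          if v != 0}

print(sparse)                                 # {(0, 1): 1, (2, 0): 1, (2, 3): 1}
print(len(sparse), "nonzeros out of", 4 * 5)  # 3 nonzeros out of 20
```

At Amazon's scale the ratio is far more extreme, which is why sparse formats are unavoidable, and why naive loops over every entry waste nearly all their work.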

At the Association for Computing Machinery's Conference on Systems, Programming, Languages and Applications: Software for Humanity (SPLASH), researchers from MIT, the French Alternative Energies and Atomic Energy Commission, and Adobe Research recently presented a new system that automatically produces code optimized for sparse data.

That code offers a 100-fold speedup over existing, non-optimized software packages. And its performance is comparable to that of meticulously hand-optimized code for specific sparse-data operations, while requiring far less work on the programmer's part.

The system is called Taco, for tensor algebra compiler. In computer-science parlance, a data structure like the Amazon table is called a "matrix," and a tensor is just a higher-dimensional analogue of a matrix. If that Amazon table also mapped customers and products against the customers' product ratings on the Amazon site and the words used in their product reviews, the result would be a four-dimensional tensor.

"Sparse representations have been there for more than 60 years," says Saman Amarasinghe, an MIT professor of electrical engineering and computer science (EECS) and senior author on the new paper. "But nobody knew how to generate code for them automatically. People figured out a few very specific operations—sparse matrix-vector multiply, sparse matrix-vector multiply plus a vector, sparse matrix-matrix multiply, sparse matrix-matrix-matrix multiply. The biggest contribution we make is the ability to generate code for any tensor-algebra expression when the matrices are sparse."

Joining Amarasinghe on the paper are first author Fredrik Kjolstad, an MIT graduate student in EECS; Stephen Chou, also a graduate student in EECS; David Lugato of the French Alternative Energies and Atomic Energy Commission; and Shoaib Kamil of Adobe Research.

## Custom kernels

In recent years, the mathematical manipulation of tensors—tensor algebra—has become crucial to not only big-data analysis but machine learning, too. And it's been a staple of scientific research since Einstein's time.

Traditionally, to handle tensor algebra, mathematics software has decomposed tensor operations into their constituent parts. So, for instance, if a computation required two tensors to be multiplied and then added to a third, the software would run its standard tensor multiplication routine on the first two tensors, store the result, and then run its standard tensor addition routine.

In the age of big data, however, this approach is too time-consuming. For efficient operation on massive data sets, Kjolstad explains, every sequence of tensor operations requires its own "kernel," or computational template.

"If you do it in one kernel, you can do it all at once, and you can make it go faster, instead of having to put the output in memory and then read it back in so that you can add it to something else," Kjolstad says. "You can just do it in the same loop."

Computer science researchers have developed kernels for some of the tensor operations most common in machine learning and big-data analytics, such as those enumerated by Amarasinghe. But the number of possible kernels is infinite: The kernel for adding together three tensors, for instance, is different from the kernel for adding together four, and the kernel for adding three three-dimensional tensors is different from the kernel for adding three four-dimensional tensors.

Many tensor operations involve multiplying an entry from one tensor with one from another. If either entry is zero, so is their product, and programs for manipulating large, sparse matrices can waste a huge amount of time adding and multiplying zeroes.

Hand-optimized code for sparse tensors identifies zero entries and streamlines operations involving them—either carrying forward the nonzero entries in additions or omitting multiplications entirely. This makes tensor manipulations much faster, but it requires the programmer to do a lot more work.
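Both streamlining strategies can be illustrated in a few lines of Python, using sparse vectors stored as index-to-value dicts (again an illustrative layout, not Taco's actual format):

```python
x = {0: 2.0, 5: 1.5}   # sparse vectors: only the nonzeros are stored
y = {5: 4.0, 7: 3.0}

# Multiplication: a product is nonzero only where BOTH inputs are nonzero,
# so iterate over the intersection of the index sets and skip everything else.
dot = sum(x[i] * y[i] for i in x.keys() & y.keys())

# Addition: a sum is nonzero where EITHER input is nonzero, so carry forward
# the union of the nonzeros instead of touching every position.
total = {i: x.get(i, 0.0) + y.get(i, 0.0) for i in sorted(x.keys() | y.keys())}

print(dot)     # 6.0
print(total)   # {0: 2.0, 5: 5.5, 7: 3.0}
```

Writing this by hand for one dot product is easy; writing and debugging it for an arbitrary compound tensor expression is the part Taco automates.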

The code for multiplying two matrices—a simple type of tensor, with only two dimensions, like a table—might, for instance, take 12 lines if the matrix is full (meaning that none of the entries can be omitted). But if the matrix is sparse, the same operation can require 100 lines of code or more, to track omissions and elisions.
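The gap in code size comes from the index bookkeeping. Below, a sketch in Python: the dense kernel is the classic triple loop, while the second kernel takes its left operand in compressed sparse row (CSR) form, a standard format built from row pointers, column indices, and values. This simplified sparse-times-dense version is already longer; a sparse-times-sparse kernel producing a sparse output is longer still:

```python
# Dense kernel: the classic triple loop -- roughly a dozen lines in C.
def matmul_dense(A, B):
    n, m, p = len(A), len(B), len(B[0])
    C = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for k in range(m):
            for j in range(p):
                C[i][j] += A[i][k] * B[k][j]
    return C

# Sparse kernel: A given in CSR form (row pointers, column ids, values),
# so the middle loop visits only the stored entries of each row.
def matmul_csr_dense(pos, crd, vals, B):
    p = len(B[0])
    C = [[0.0] * p for _ in range(len(pos) - 1)]
    for i in range(len(pos) - 1):
        for nz in range(pos[i], pos[i + 1]):   # only nonzeros of row i
            k, v = crd[nz], vals[nz]
            for j in range(p):
                C[i][j] += v * B[k][j]
    return C

A = [[0.0, 2.0], [0.0, 0.0]]
B = [[1.0, 1.0], [3.0, 4.0]]
print(matmul_dense(A, B))                          # [[6.0, 8.0], [0.0, 0.0]]
print(matmul_csr_dense([0, 1, 1], [1], [2.0], B))  # same result, zeros skipped
```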

## Enter Taco

Taco adds all that extra code automatically. The programmer simply specifies the size of a tensor, whether it's full or sparse, and the location of the file from which it should import its values. For any given operation on two tensors, Taco builds a hierarchical map that indicates, first, which paired entries from both tensors are nonzero and, then, which entries from each tensor are paired with zeroes. All pairs of zeroes it simply discards.
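The kind of merged iteration this implies can be sketched as a two-pointer walk over sorted coordinate lists. This is illustrative Python, not Taco's generated C code: each coordinate is classified as "nonzero in both operands" or "nonzero in only one," and coordinates where both operands are zero never appear at all:

```python
# Two sparse vectors as sorted (coordinate, value) lists.
a = [(1, 2.0), (4, 3.0), (7, 1.0)]
b = [(4, 5.0), (6, 2.0)]

both, only_one = [], []
i = j = 0
while i < len(a) and j < len(b):
    if a[i][0] == b[j][0]:
        both.append((a[i][0], a[i][1], b[j][1])); i += 1; j += 1
    elif a[i][0] < b[j][0]:
        only_one.append(a[i]); i += 1
    else:
        only_one.append(b[j]); j += 1
only_one += a[i:] + b[j:]

print(both)      # [(4, 3.0, 5.0)] -> the only coordinate worth multiplying
print(only_one)  # [(1, 2.0), (6, 2.0), (7, 1.0)]
```

For a multiplication only the `both` list matters; for an addition the `only_one` entries are carried through unchanged.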

Taco also uses an efficient indexing scheme to store only the nonzero values of sparse tensors. With zero entries included, a publicly released tensor from Amazon, which maps customer ID numbers against purchases and descriptive terms culled from reviews, takes up 107 exabytes of data, or roughly 10 times the estimated storage capacity of all of Google's servers. But using the Taco compression scheme, it takes up only 13 gigabytes—small enough to fit on a smartphone.
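The scale of such savings is easy to reproduce with back-of-the-envelope arithmetic. The sketch below uses invented dimensions and a coordinate-list layout rather than the Amazon tensor's real statistics or Taco's actual format, but it shows why storing only nonzeros turns exabytes into gigabytes:

```python
# Rough storage comparison for a sparse 3-D tensor with 8-byte values,
# stored either densely or as (i, j, k, value) coordinates.
dims = (5_000_000, 5_000_000, 100_000)   # illustrative sizes, not Amazon's
nnz = 1_500_000_000                      # illustrative nonzero count

dense_bytes = 8
for d in dims:
    dense_bytes *= d                     # one value per possible entry

coo_bytes = nnz * (3 * 4 + 8)            # three 4-byte indices + one value

print(f"dense: {dense_bytes / 1e18:.0f} EB, coordinates: {coo_bytes / 1e9:.0f} GB")
```

The nine-orders-of-magnitude ratio, not the particular numbers, is the point: almost all of the dense tensor is zeros that never needed to be stored.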

"Many research groups over the last two decades have attempted to solve the compiler-optimization and code-generation problem for sparse-matrix computations but made little progress," says Saday Sadayappan, a professor of computer science and engineering at Ohio State University, who was not involved in the research. "The recent developments from Fred and Saman represent a fundamental breakthrough on this long-standing open problem."

"Their compiler now enables application developers to specify very complex sparse matrix or computations in a very easy and convenient high-level notation, from which the compiler automatically generates very efficient code," he continues. "For several sparse computations, the generated code from the compiler has been shown to be comparable or better than painstakingly developed manual implementations. This has the potential to be a real game-changer. It is one of the most exciting advances in recent times in the area of compiler optimization."

More information: Fredrik Kjolstad et al. The tensor algebra compiler, Proceedings of the ACM on Programming Languages (2017). DOI: 10.1145/3133901


##### zeb
Nov 01, 2017
How amusing as a Mathematician to find out that linear algebra is computer science. As to the technical content I once took some academic code that summed the outer product of a stress tensor with itself. So the academics calculated 81 values and I calculated 9 (symmetry reduces the 9 components of the stress tensor to 6 then it was plane stress so only 3 different non-zero values and 3 x 3 = 9 versus 9 x 9 = 81).

So this academic code will only work if they know where the zeros are going to be and will stop mathematicians like me from doing interesting little bits of tensor algebra to distract them from the deadly boring computer science.
##### nrauhauser
Nov 03, 2017
What dolt decided the well defined computer industry term 'kernel' should be overloaded in this context?

Seriously, mathematicians love to invent new terms, this is just bizarre.
##### antialias_physorg
Nov 03, 2017
> What dolt decided the well defined computer industry term 'kernel' should be overloaded in this context?

They are using the word kernel as it has been used in data processing (particularly image processing) since...forever. They aren't using a new meaning, here.
##### Da Schneib
Nov 03, 2017
I wonder if this is a compiler for GPGPU usage or ordinary CPU usage.

Update: I examined the paper and this is a standard CPU compiler, but the authors believe the techniques will work for GPGPU usage.