The elusive capacity of data networks

May 15, 2012 by Larry Hardesty
The principle behind network coding is often explained by reference to a so-called butterfly network. When messages A and B reach the same node, they're scrambled together, and their combination (A+B) is passed to the next node. Further downstream, one node uses A to recover B from A+B, while another uses B to recover A from A+B. Graphic: Christine Daniloff
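
A minimal sketch of that butterfly example in Python, treating the scrambling operation as bitwise XOR (one common concrete choice; the messages and names here are invented for illustration):

```python
def scramble(a: bytes, b: bytes) -> bytes:
    """Combine two equal-length messages with bitwise XOR (the "+" above)."""
    return bytes(x ^ y for x, y in zip(a, b))

A = b"HELLO"
B = b"WORLD"

combined = scramble(A, B)           # the bottleneck node forwards A + B

assert scramble(combined, A) == B   # a node that knows A recovers B
assert scramble(combined, B) == A   # a node that knows B recovers A
```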

In its early years, information theory — which grew out of a landmark 1948 paper by MIT alumnus and future professor Claude Shannon — was dominated by research on error-correcting codes: How do you encode information so as to guarantee its faithful transmission, even in the presence of the corrupting influences engineers call "noise"?
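
To make that question concrete, here is the simplest error-correcting code, sketched in Python: a triple-repetition code decoded by majority vote, which corrects any single flipped bit per triple. It is illustrative only; practical codes are far more sophisticated.

```python
def encode(bits):
    """Repeat each bit three times."""
    return [b for bit in bits for b in (bit, bit, bit)]

def decode(coded):
    """Majority vote over each triple."""
    return [int(sum(coded[i:i + 3]) >= 2) for i in range(0, len(coded), 3)]

msg = [1, 0, 1, 1]
sent = encode(msg)
sent[4] ^= 1                 # noise flips one transmitted bit
assert decode(sent) == msg   # the code corrects the error
```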

Recently, one of the most intriguing developments in information theory has been a different kind of coding, called network coding, in which the question is how to encode information in order to maximize the capacity of a network as a whole. For information theorists, it was natural to ask how these two types of coding might be combined: If you want to both minimize error and maximize capacity, which kind of coding do you apply where, and when do you do the decoding?

What makes that question particularly hard to answer is that no one knows how to calculate the data capacity of a network as a whole — or even whether it can be calculated. Nonetheless, in the first half of a two-part paper, which was published recently in IEEE Transactions on Information Theory, MIT's Muriel Médard, California Institute of Technology's Michelle Effros and the late Ralf Koetter of the Technical University of Munich show that in a wired network, network coding and error-correcting coding can be handled separately, without reduction in the network's capacity. In the forthcoming second half of the paper, the same researchers demonstrate some bounds on the capacities of wireless networks, which could help guide future research in both industry and academia.

A typical data network consists of an array of nodes — which could be routers on the Internet, wireless base stations or even processing units on a single chip — each of which can directly communicate with a handful of its neighbors. When a packet of data arrives at a node, the node inspects its addressing information and decides which of several pathways to send it along.
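
Sketched in code, conventional store-and-forward routing looks something like the following; the addresses, prefixes and neighbor names are hypothetical.

```python
# Each node keeps a table from destination prefix to next hop and
# forwards packets unmodified.
routing_table = {
    "10.0.1": "neighbor_A",   # packets for 10.0.1.* go out via A
    "10.0.2": "neighbor_B",   # packets for 10.0.2.* go out via B
}

def forward(packet: dict) -> str:
    """Pick the next hop from the address; the payload is never altered."""
    prefix = ".".join(packet["dst"].split(".")[:3])
    return routing_table[prefix]

assert forward({"dst": "10.0.2.17", "payload": b"data"}) == "neighbor_B"
```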

Calculated confusion

With network coding, on the other hand, a node scrambles together the packets it receives and sends the hybrid packets down multiple paths; at each subsequent node they're scrambled again in different ways. Counterintuitively, this can significantly increase the capacity of the network as a whole: Hybrid packets arrive at their destination along multiple paths. If one of those paths is congested, or if one of its links fails outright, the packets arriving via the other paths will probably contain enough information that the recipient can piece together the original message.
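
The sketch below illustrates that idea with linear network coding over GF(2), assuming the common convention (not spelled out in the article) that each hybrid packet carries its coefficient vector as a header. A receiver can decode from any linearly independent set of hybrids, whichever paths delivered them; the fixed coefficients here stand in for the random ones a real node would draw.

```python
def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def combine(coeffs, packets):
    """One hybrid packet: the GF(2) linear combination named by coeffs."""
    out = bytes(len(packets[0]))
    for c, p in zip(coeffs, packets):
        if c:
            out = xor(out, p)
    return coeffs, out

def decode(received, n):
    """Gaussian elimination over GF(2) recovers the n original packets."""
    rows = [(list(c), p) for c, p in received]
    for col in range(n):
        pivot = next(r for r in range(col, len(rows)) if rows[r][0][col])
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(len(rows)):
            if r != col and rows[r][0][col]:
                rows[r] = ([a ^ b for a, b in zip(rows[r][0], rows[col][0])],
                           xor(rows[r][1], rows[col][1]))
    return [rows[i][1] for i in range(n)]

P = [b"pkt-one!", b"pkt-two!", b"pkt-333!"]          # original packets
# Hybrids arriving along three different paths; any linearly independent
# trio is enough, so a congested or failed path can be survived as long
# as independence is preserved.
hybrids = [combine(c, P) for c in ([1, 1, 0], [0, 1, 1], [1, 1, 1])]
assert decode(hybrids, 3) == P
```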

But each link between nodes could be noisy, so the information in the packets also needs to be encoded to correct for errors. "Suppose that I'm a node in a network, and I see a communication coming in, and it is corrupted by noise," says Médard, a professor of electrical engineering and computer science. "I could try to remove the noise, but by doing that, I'm in effect making a decision right now that maybe would have been better taken by someone downstream from me who might have had more observations of the same source."

On the other hand, Médard says, if a node simply forwards the data it receives without performing any error correction, it could end up squandering bandwidth. "If the node takes all the signal it has and does not whittle down his representation, then it might be using a lot of energy to transmit noise," she says. "The question is, how much of the noise do I remove, and how much do I leave in?"

In their first paper, Médard and her colleagues analyze the case in which the noise in a given link is unrelated to the signals traveling over other links, as is true of most wired networks. In that case, the researchers show, the problems of error correction and network coding can be separated without limiting the capacity of the network as a whole.

Noisy neighbors

In the second paper, the researchers tackle the case in which the noise on a given link is related to the signals on other links, as is true of most wireless networks, since the transmissions of neighboring base stations can interfere with each other. This complicates things enormously: Indeed, Médard points out, information theorists still don't know how to quantify the capacity of a simple three-node wireless network, in which two nodes relay messages to each other via a third node.

Nonetheless, Médard and her colleagues show how to calculate upper and lower bounds on the capacity of a given wireless network. While the gap between the bounds can be very large in practice, knowing the bounds could still help network operators evaluate the benefits of further research on network coding. If the observed bit rate on a real-world network is below the lower bound, the operator knows the minimum improvement that the ideal code would provide; if the observed rate is above the lower bound but below the upper, then the operator knows the maximum improvement that the ideal code might provide. If even the maximum improvement would afford only a small savings in operational expenses, the operator may decide that further research on improved coding isn't worth the money.
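
As a rough sketch of that decision logic (the rates, bounds and units below are invented for illustration, not taken from the paper):

```python
def coding_gain_range(observed, lower, upper):
    """(min, max) rate improvement an ideal code could deliver, in Mb/s."""
    if observed < lower:
        # Below the lower bound: an ideal code is guaranteed to close at
        # least the gap to the lower bound, and at most the gap to the upper.
        return lower - observed, upper - observed
    # Between the bounds: no guaranteed gain, but up to this much is possible.
    return 0.0, upper - observed

lo, hi = coding_gain_range(observed=40.0, lower=55.0, upper=90.0)
print(f"an ideal code gains at least {lo} and at most {hi} Mb/s")
# -> an ideal code gains at least 15.0 and at most 50.0 Mb/s
```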

"The separation theorem they proved is of fundamental interest," says Raymond Yeung, a professor of information engineering and co-director of the Institute of at the Chinese University of Hong Kong. "While the result itself is not surprising, it is somewhat unexpected that they were able to prove the result in such a general setting."

Yeung cautions, however, that while the researchers have "decomposed a very difficult problem into two," one of those problems "remains very difficult. … The bound is in terms of the solution to another problem which is difficult to solve," he says. "It is not clear how tight this bound is; that needs further research."


User comments

stealthc, May 16, 2012
This reminds me of Tor. Although I doubt that Tor employs scrambling two packets sent to two different routers by mixing and then re-assembly at some other point in the network, I think it'd be a great scheme to make it impossible for people to intercept data in transit if done right.
antialias_physorg, May 16, 2012
Although I doubt that Tor employs scrambling two packets sent to two different routers by mixing and then re-assembly at some other point in the network.

This is how the internet already works. When you send packets downstream (e.g. when you hit "submit" after typing in a post on physorg), it's quite likely that not all the created packets take the same route (or even arrive in chronological order) at the physorg server. Some might even get lost along the way, and the server will ask for them to be resent (or more likely: the server will not send a 'packet received' to the sender, and the sender will try to send the packet again on its own after some time).

If you want to eavesdrop on someone, it is then best to set up shop either at the ISP of the sender or the receiver (or just plug some malware into the sender's or receiver's computer, if you can arrange it, and have them send the stuff CC directly to you).
