January 24, 2019

Information theory holds surprises for machine learning

New SFI research challenges a popular conception of how machine learning algorithms "think" about certain tasks.

The conception goes something like this: because of their ability to discard useless information, a class of machine learning algorithms called deep neural networks can learn general concepts from raw data, such as how to identify cats in general after encountering tens of thousands of images of different cats in different situations. This seemingly human ability is said to arise as a byproduct of the networks' layered architecture. Early layers encode the "cat" label along with all of the raw information needed for prediction. Subsequent layers then compress the information, as if through a bottleneck. Irrelevant data, like the color of the cat's coat or the saucer of milk beside it, is forgotten, leaving only general features behind. Information theory provides bounds on how well each layer can balance the competing demands of compression and prediction.
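For readers who want the balance between compression and prediction in symbols, the information bottleneck is commonly written as the objective below. The notation is an assumption introduced here and does not appear in the article: X is the raw input, Y the label, T the representation held by a given layer, I(·;·) mutual information, and β a non-negative weight on compression.

```latex
% Standard information bottleneck objective (notation assumed, not taken from the article).
% T is computed from X alone, so Y -> X -> T forms a Markov chain.
\max_{p(t \mid x)} \; I(T;Y) \;-\; \beta \, I(X;T), \qquad \beta \ge 0
```

Here I(T;Y) rewards prediction and I(X;T) penalizes retained input information, so a larger β pushes toward a more compressed representation.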

"A lot of times when you have a neural network and it learns to map faces to names, or pictures to numerical digits, or amazing things like French text to English text, it has a lot of intermediate hidden layers that information flows through," says Artemy Kolchinsky, an SFI Postdoctoral Fellow and the study's lead author. "So there's this long-standing idea that as raw inputs get transformed to these intermediate representations, the system is trading prediction for compression, and building higher-level concepts through this information bottleneck."

However, Kolchinsky and his collaborators Brendan Tracey (SFI, MIT) and Steven Van Kuyk (University of Wellington) uncovered a surprising weakness when they applied this explanation to common classification problems, where each input has exactly one correct output (for example, each picture is either of a cat or of a dog). In such cases, they found that classifiers with many layers generally do not trade away prediction for improved compression. They also found that there are many "trivial" representations of the inputs that are, from the point of view of information theory, optimal in their balance between prediction and compression.

"We found that this information bottleneck measure doesn't see compression in the same way you or I would. Given the choice, it is just as happy to lump 'martini glasses' in with 'Labradors', as it is to lump them in with 'champagne flutes,'" Tracey explains. "This means we should keep searching for compression measures that better match our notions of compression."

While the idea of compressing inputs may still play a useful role in machine learning, this research suggests it is not sufficient for evaluating the internal representations used by different machine learning algorithms.
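The lumping example Tracey describes can be checked on a toy problem. The sketch below is illustrative only and is not the study's code: the six-image dataset, the class names, and the two groupings are hypothetical, and every image is assumed equally likely. It computes the two quantities the information bottleneck cares about, I(X;T) for compression and I(T;Y) for prediction, for an intuitive grouping and an arbitrary one.

```python
import math
from collections import Counter


def entropy(samples):
    """Shannon entropy (bits) of the empirical distribution of `samples`."""
    n = len(samples)
    return -sum((c / n) * math.log2(c / n) for c in Counter(samples).values())


def mutual_information(a, b):
    """I(A;B) = H(A) + H(B) - H(A,B) for paired, equally likely samples."""
    return entropy(a) + entropy(b) - entropy(list(zip(a, b)))


# Hypothetical toy dataset: six equally likely images, two per class.
# Each image has exactly one correct label (a deterministic task).
images = ["img0", "img1", "img2", "img3", "img4", "img5"]
labels = ["martini", "martini", "flute", "flute", "labrador", "labrador"]

# Two ways of compressing three classes into two groups.
intuitive = {"martini": "glassware", "flute": "glassware", "labrador": "dog"}
arbitrary = {"martini": "group_a", "labrador": "group_a", "flute": "group_b"}


def ib_point(grouping):
    """Return (I(X;T), I(T;Y)) for the representation T = grouping[label]."""
    t = [grouping[y] for y in labels]
    return mutual_information(images, t), mutual_information(t, labels)


point_intuitive = ib_point(intuitive)
point_arbitrary = ib_point(arbitrary)

print("intuitive grouping:", point_intuitive)
print("arbitrary grouping:", point_arbitrary)
```

Under these assumptions both groupings land on the same point, roughly 0.918 bits for each quantity, so the measure alone cannot tell the sensible grouping from the arbitrary one.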

At the same time, Kolchinsky says that the concept of a trade-off between compression and prediction will still hold for less deterministic tasks, like predicting the weather from a noisy dataset. "We're not saying that information bottleneck is useless for supervised [machine] learning," Kolchinsky stresses. "What we're showing here is that it behaves counter-intuitively on many common machine learning problems, and that's something people in the machine learning community should be aware of."

More information: Caveats for information bottleneck in deterministic scenarios. export.arxiv.org/abs/1808.07593

Provided by Santa Fe Institute

Citation: Information theory holds surprises for machine learning (2019, January 24) retrieved 6 August 2024 from https://phys.org/news/2019-01-theory-machine.html

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without the written permission. The content is provided for information purposes only.
