December 3, 2018

Interpretability and performance: Can the same model achieve both?

by Amit Dhurandhar, IBM

Interpretability and performance are usually at odds with each other, since many of the best-performing models (such as deep neural networks) are black boxes. In our work, Improving Simple Models with Confidence Profiles, we try to bridge this gap by proposing a method that transfers information from a high-performing neural network to another model that the domain expert or the application may demand. For example, in computational biology and economics, sparse linear models are often preferred, while in complex instrumented domains such as semiconductor manufacturing, engineers might prefer decision trees. Such simpler, interpretable models can build trust with the expert and provide useful insight, leading to the discovery of novel and previously unknown facts. Our goal is depicted pictorially below for the specific case in which we are trying to improve the performance of a decision tree.

The assumption is that our network is a high-performing teacher, and that we can use some of its information to teach the simple, interpretable, but generally low-performing student model. Weighting samples by their difficulty helps the simple model focus during training on the easier samples that it can model successfully, and thus achieve better overall performance. Our setup differs from boosting: in that approach, examples that are difficult for a previous 'weak' learner are highlighted in subsequent training to create diversity. Here, difficulty is measured with respect to an accurate complex model, which means that the labels of the difficult examples are close to random. Moreover, if a complex model cannot resolve these examples, there is little hope for a simple model of fixed complexity. Hence, in our setup it is important to highlight the easy examples that the simple model can resolve.

To do this, we assign weights to samples according to how difficult the network finds them to classify, and we measure this by introducing probes. Each probe takes its input from one of the hidden layers and consists of a single fully connected layer followed by a softmax layer the size of the network output. The probe at layer i serves as a classifier that uses only the prefix of the network up to layer i. The assumption is that easy instances will be classified correctly with high confidence even by first-layer probes, so for each instance we collect the confidence levels p_i from all probes. We then use all the p_i to compute a weight for the instance that reflects its difficulty (easy instances receive higher weights), e.g. as the area under the curve (AUC) of the p_i's.
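The forward pass of a single probe can be sketched as follows. This is a minimal illustration in plain Python, not the authors' implementation: the function names, the toy activations, and the hand-picked probe parameters are all assumptions made for the example, and in practice the probe's weights would be learned on the training set while the network itself stays fixed.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def probe_forward(hidden, weights, bias):
    """One fully connected layer followed by a softmax.

    hidden  : activations of one hidden layer for a single instance
    weights : one row of weights per output class (hypothetical values)
    bias    : one bias per output class (hypothetical values)
    """
    logits = [
        sum(w * h for w, h in zip(row, hidden)) + b
        for row, b in zip(weights, bias)
    ]
    return softmax(logits)


def true_label_confidence(hidden, weights, bias, label):
    """Confidence the probe assigns to the instance's true class."""
    return probe_forward(hidden, weights, bias)[label]


# Toy two-class example with made-up activations and probe parameters.
hidden = [1.0, 0.0]
weights = [[2.0, 0.0], [0.0, 2.0]]
bias = [0.0, 0.0]
probs = probe_forward(hidden, weights, bias)
```

Calling `true_label_confidence` once per probe for the same instance yields the sequence of confidences p_i described above.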

We can then use these weights to retrain the simple model on the weighted dataset. We call this pipeline of probing, obtaining confidence weights, and retraining ProfWeight.
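To illustrate the retraining step, here is a small sketch in which a depth-1 decision tree (a stump) on a single feature stands in for the simple model. The stump, the toy data, and the weight values are assumptions chosen for the example; they are not taken from the paper. The point is only that per-sample weights change which split the simple model prefers.

```python
def fit_weighted_stump(xs, ys, weights):
    """Fit a depth-1 decision tree on one feature by minimizing weighted error.

    xs      : feature values
    ys      : labels in {0, 1}
    weights : per-sample weights (larger means the sample counts more)
    Returns (threshold, left_label, right_label); the stump predicts
    left_label when x <= threshold and right_label otherwise.
    """
    best = None
    for threshold in sorted(set(xs)):
        for left_label in (0, 1):
            right_label = 1 - left_label
            # Sum the weights of all misclassified samples for this split.
            err = sum(
                w
                for x, y, w in zip(xs, ys, weights)
                if (left_label if x <= threshold else right_label) != y
            )
            if best is None or err < best[0]:
                best = (err, threshold, left_label, right_label)
    return best[1:]


def predict(stump, x):
    """Apply a fitted stump to one feature value."""
    threshold, left_label, right_label = stump
    return left_label if x <= threshold else right_label


xs = [1, 2, 3, 4]
ys = [0, 1, 0, 1]

# Uniform weights: no single split fits all four points.
uniform_stump = fit_weighted_stump(xs, ys, [1.0, 1.0, 1.0, 1.0])

# Down-weight x = 2, treated here as a hard example (hypothetical weight).
weighted_stump = fit_weighted_stump(xs, ys, [1.0, 0.1, 1.0, 1.0])
```

With the hard example down-weighted, the stump moves its threshold so that the three remaining points are all classified correctly, which is the kind of shift toward easy examples that the weighting is meant to produce.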

We present two alternatives for computing the weights of the examples in the dataset. In the AUC approach mentioned above, we first note the validation accuracy of the simple model when trained on the original training set. We then select the probes whose accuracy is at least α (> 0) greater than that of the simple model. Each example is weighted by the average confidence score for its true label, computed from the individual soft predictions of these probes.
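The AUC weighting described above can be sketched in a few lines. The function name, argument layout, and the numbers in the example are assumptions for illustration only; the confidences are taken as already computed by the probes.

```python
def auc_weights(probe_confidences, probe_accuracies, simple_accuracy, alpha):
    """Average true-label confidence over the probes that beat the simple model.

    probe_confidences[k][j] : confidence of probe k for the true label of example j
    probe_accuracies[k]     : validation accuracy of probe k
    simple_accuracy         : validation accuracy of the simple model
    alpha                   : required margin (> 0) over the simple model
    """
    selected = [
        k
        for k, acc in enumerate(probe_accuracies)
        if acc >= simple_accuracy + alpha
    ]
    if not selected:
        raise ValueError("no probe exceeds the simple model by the margin alpha")
    n_examples = len(probe_confidences[0])
    return [
        sum(probe_confidences[k][j] for k in selected) / len(selected)
        for j in range(n_examples)
    ]


# Three probes, two examples (made-up numbers).
confidences = [
    [0.5, 0.5],  # probe 0: too weak, will be filtered out
    [0.9, 0.3],  # probe 1
    [1.0, 0.5],  # probe 2
]
accuracies = [0.60, 0.80, 0.90]
example_weights = auc_weights(confidences, accuracies, simple_accuracy=0.70, alpha=0.05)
```

In this toy case only probes 1 and 2 are selected, so the first example (high confidence at both probes) receives a much larger weight than the second.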

A second alternative involves optimization using a neural network. Here we learn optimal weights for the training set by optimizing the following objective:

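$$S^{*} = \min_{w} \min_{\beta} \; \mathbb{E}\left[\lambda\left(S_{w,\beta}(x),\, y\right)\right] \quad \text{subject to} \quad \mathbb{E}[w] = 1$$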

where w are the weights to be found for each instance, β denotes the parameter space of the simple model S, and λ is its loss function. We need to constrain the weights, since otherwise the trivial solution of all the weights going to zero will be optimal for the above objective. We show in the paper that our constraint of E[w]=1 has a connection to finding the optimal importance sampling.
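A short worked note may help show why the constraint is needed. It assumes, which the text above does not state explicitly, that the weights act as non-negative multipliers on a non-negative per-example loss; the notation w(x) and S_β is introduced here only for illustration.

```latex
% Assumed form of the weighted objective (illustrative, not taken from the paper):
\min_{w \ge 0} \; \min_{\beta} \; \mathbb{E}\big[\, w(x)\, \lambda(S_{\beta}(x), y) \,\big]

% Without a constraint, and with \lambda \ge 0, choosing w \equiv 0 gives
\mathbb{E}\big[\, 0 \cdot \lambda(S_{\beta}(x), y) \,\big] = 0 \quad \text{for every } \beta,
% so the minimum is reached without learning anything about \beta.

% With \mathbb{E}[w] = 1, w \equiv 0 is infeasible, and w \equiv 1 recovers the usual risk:
\mathbb{E}\big[\, 1 \cdot \lambda(S_{\beta}(x), y) \,\big] = \mathbb{E}\big[\lambda(S_{\beta}(x), y)\big].
```

Under this reading, the constraint makes the weights relative: raising the weight of some examples must be paid for by lowering the weight of others.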

More generally, ProfWeight can be used to transfer to even simpler but opaque models such as smaller neural networks, which may be useful in domains with severe memory and power constraints. Such constraints arise when deploying models on edge devices in IoT systems, on mobile devices, or on unmanned aerial vehicles.

We tested our method on two domains: the public image dataset CIFAR-10 and a proprietary manufacturing dataset. On the first dataset, our simple models were smaller neural networks that would comply with strict memory and power constraints, and we saw a 3-4 percent improvement. On the second dataset, our simple model was a decision tree, and we improved it significantly, by ~13 percent, which led to actionable results for the engineer. Below we depict ProfWeight in comparison with the other methods on this dataset. We observe that it outperforms the other methods by a considerable margin.

In the future, we would like to find necessary and sufficient conditions under which transfer by our strategy improves simple models. We would also like to develop more sophisticated methods for information transfer than those we have developed so far.

We will present this work in a paper titled "Improving Simple Models with Confidence Profiles" at the 2018 Conference on Neural Information Processing Systems, on Wednesday, December 5, during the evening poster session from 5:00 – 7:00 pm in Room 210 & 230 AB (#90).

Provided by IBM

This story is republished courtesy of IBM Research.

Citation: Interpretability and performance: Can the same model achieve both? (2018, December 3) retrieved 7 July 2024 from https://phys.org/news/2018-12-interpretability-and-performance-can-the.html