While scientists don’t fully understand how machine-learning algorithms have succeeded at “intelligent” tasks like image and speech recognition, they do know that in order to generalize, an algorithm has to remember the important information while forgetting the useless. This idea, often referred to as an “Information Bottleneck,” has generated a flurry of research since it was first proposed in 2000.
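In its classic discrete form, the bottleneck trade-off can be written down directly: compress an input X into a representation T while preserving information about a target Y, with a knob beta setting the exchange rate between the two. The sketch below is illustrative only (the variable names and distribution layout are my own, not taken from the paper):

```python
import numpy as np

def mutual_information(p_joint):
    """Mutual information in bits from a joint probability table."""
    p_a = p_joint.sum(axis=1, keepdims=True)
    p_b = p_joint.sum(axis=0, keepdims=True)
    nz = p_joint > 0  # skip zero cells, which contribute nothing
    return float((p_joint[nz] * np.log2(p_joint[nz] / (p_a @ p_b)[nz])).sum())

def ib_objective(p_xy, p_t_given_x, beta):
    """Information Bottleneck cost I(X;T) - beta * I(T;Y).

    p_xy:        joint table over inputs X (rows) and targets Y (cols)
    p_t_given_x: stochastic encoder, rows indexed by X, cols by T
    """
    p_x = p_xy.sum(axis=1)
    p_xt = p_x[:, None] * p_t_given_x   # joint p(x, t)
    p_ty = p_t_given_x.T @ p_xy         # joint p(t, y), since T depends on Y only via X
    return mutual_information(p_xt) - beta * mutual_information(p_ty)
```

A deterministic copy encoder pays the full compression cost but keeps all predictive information; a constant encoder forgets everything, driving both terms to zero.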
Only very recently, however, has this idea been applied to the rapidly developing field of deep learning, i.e., machine learning that uses so-called artificial neural networks. What would happen if neural networks were explicitly trained to discard useless information, and how to tell them to do so, is the subject of new research by SFI Postdoctoral Fellows Artemy Kolchinsky, Brendan Tracey, and Professor David Wolpert.
“It may be that deep learning networks succeed because of what they learn to ignore, not just what they learn to predict,” Kolchinsky says. “So we ask: what happens if we explicitly encourage a network to forget irrelevant information?”
In their most recent paper, published on the arXiv preprint server, the scientists present a method for training a machine learning algorithm to identify objects using minimal information. The method overcomes a key obstacle, estimating how much information the algorithm actually stores, by using a novel estimator that Kolchinsky and Tracey published this past July in the journal Entropy.
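To see what such an estimator has to grapple with, consider measuring the entropy of a mixture of Gaussians, the kind of distribution that arises inside a noisy neural network. The sketch below is a deliberately naive Monte Carlo estimate, not the Kolchinsky-Tracey estimator itself (which replaces sampling with cheaper pairwise-distance bounds between mixture components); all names here are my own:

```python
import numpy as np

rng = np.random.default_rng(0)

def mc_mixture_entropy(means, n_samples=100_000):
    """Naive Monte Carlo estimate of the entropy (in nats) of an
    equal-weight Gaussian mixture with unit-variance components.

    means: array of shape (k, d), one row per component center.
    """
    k, d = means.shape
    # Sample: pick a component uniformly, then draw from its Gaussian.
    comp = rng.integers(k, size=n_samples)
    x = means[comp] + rng.standard_normal((n_samples, d))
    # Evaluate the full mixture density at every sample.
    sq_dist = ((x[:, None, :] - means[None, :, :]) ** 2).sum(-1)
    dens = np.exp(-0.5 * sq_dist).sum(1) / (k * (2 * np.pi) ** (d / 2))
    # Entropy is the average of -log density under the mixture.
    return float(-np.log(dens).mean())
```

For a single standard Gaussian this converges to the known value 0.5 * ln(2 * pi * e), about 1.419 nats; the sampling cost of doing this inside every training step is exactly what a closed-form estimator avoids.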
“The motivation for this paper is to make predictions using data from a bandwidth-limited environment,” says Tracey. “Say you’re a satellite in space, or a remote weather station in Antarctica. You can’t send back all of the data you collect, so which pieces of data are the right data to transmit?”
More generally, the method could be used to push networks to learn more abstract and more generalizable concepts, potentially leading to better performance on new data — from recognizing pedestrians near self-driving vehicles, to reporting a five-day weather forecast from Mars.
Read the paper, "Nonlinear information bottleneck," on the arXiv preprint server.