Luís Bettencourt, Steven Brumby, Garrett Kenyon, Will Landecker, Melanie Mitchell, Michael Thomure

Paper #: 13-02-007

Hierarchical networks are known to achieve high classification accuracy on difficult machine-learning tasks. For many applications, a clear explanation of why the data was classified a certain way is just as important as the classification itself. However, the complexity of hierarchical networks makes them ill-suited for existing explanation methods. We propose a new method, contribution propagation, that gives per-instance explanations of a trained network’s classifications. We give theoretical foundations for the proposed method, and evaluate its correctness empirically. Finally, we use the resulting explanations to reveal unexpected behavior of networks that achieve high accuracy on visual object-recognition tasks using well-known data sets.