Direct-Manipulation Visualization of Deep Networks

Daniel Smilkov [email protected] Google, Inc., 5 Cambridge Center, Cambridge MA, 02142

Shan Carter [email protected] Google, Inc., 1600 Amphitheatre Parkway, Mountain View, CA 94043

D. Sculley [email protected] Google, Inc., 5 Cambridge Center, Cambridge MA, 02142

Fernanda B. Viégas [email protected] Google, Inc., 5 Cambridge Center, Cambridge MA, 02142

Martin Wattenberg [email protected] Google, Inc., 5 Cambridge Center, Cambridge MA, 02142

Abstract

The recent successes of deep learning have led to a wave of interest from non-experts. Gaining an understanding of this technology, however, is difficult. While the theory is important, it is also helpful for novices to develop an intuitive feel for the effect of different hyperparameters and structural variations. We describe TensorFlow Playground, an interactive visualization that allows users to experiment via direct manipulation rather than coding, enabling them to quickly build an intuition about neural nets.

1. Introduction

Deep learning systems are currently attracting a huge amount of interest, as they see continued success in practical applications. Students who want to understand this new technology encounter two primary challenges.

First, the theoretical foundations of the field are not always easy for a typical software engineer or computer science student, since they require a solid mathematical intuition. It is not trivial to translate the equations defining a deep network into a mental model of the underlying geometric transformations.

Even more challenging are aspects of deep learning where theory does not provide crisp, clean explanations. Critical choices experts make in building a real-world system (the number of units and layers, the activation function, regularization techniques, etc.) are currently guided by intuition and experience as much as theory. Acquiring this intuition is a lengthy process, since it typically requires coding and training many different working systems.

One possible shortcut is to use interactive visualization to help novices build mathematical and practical intuition. Recently, several impressive systems have appeared that do exactly this. Olah's elegant interactive online essays (Olah) let a viewer watch the training of a simple classifier, providing multiple perspectives on how a network learns a transformation of space. Karpathy created a Javascript library (Karpathy) and provided a series of dynamic views of networks training, again in a browser. Others have found beautiful ways to visualize the features learned by image classification nets (Zhou et al., 2014; Zeiler & Fergus, 2014).

Taking inspiration from the success of these examples, we created the TensorFlow Playground. As with the work of Olah and Karpathy, the Playground is an in-browser visualization of a running neural network. However, it is specifically designed for experimentation by direct manipulation, and it also visualizes the derived "features" found by every unit in the network simultaneously. The system provides a variety of affordances for rapidly and incrementally changing hyperparameters and immediately seeing the effects of those changes, as well as for sharing experiments with others.

Figure 1. TensorFlow Playground. This network is, roughly speaking, classifying data based on distance to the origin. Curves show weight parameters, with thickness denoting absolute magnitude and color indicating sign. The heatmaps for each unit show how the classification function (large heatmap at right) is built from input features, then near-linear combinations of these features, and finally more complex features. At upper right is a graph showing loss over time. At left are the available input features; x1 and x2 are highlighted, while the other mathematical combinations are faded to indicate that they are not being used by the network.

2. TensorFlow Playground: Visualization

The structure of the Playground visualization is a standard network diagram. The visualization shows a network that is designed to solve either classification or regression problems based on two abstract real-valued features, x1 and x2, which vary between -1 and 1. Input units, representing these features and various mathematical combinations of them, are at the left. Units in hidden layers are shown as small boxes, with connections between units drawn as curves whose color and width indicate weight values. Finally, on the right, a visualization of the output of the network is shown: a square with a heatmap showing the output values (for each possible combination of x1 and x2) of the single unit that makes up the final layer of the network. When the user presses the "play" button, the network begins to train.

There is a new twist in this visualization, however. Inside the box that represents each unit is a heatmap showing the unit's response to all values of (x1, x2) in a square centered at the origin. As seen in Figure 1, this provides a quick geometric view of how the network builds complex features from simpler ones. For example, in the figure the input features are simply x1 and x2, which are themselves represented by the same type of heatmap. In the next layer, we see units that correspond to various linear combinations, leading to a final layer with more complicated non-linear classifiers. Moving the mouse over any of these units projects a larger version of its heatmap onto the final unit, where it can be overlaid with input and test data.

The activation heatmaps help users build a mental model of the mathematics underlying deep networks. For many configurations of the network, after training there is an obvious visual progression in complexity across the network. In these configurations, viewers can see how the first layer of units (which, modulo the activation function, act as linear classifiers) combine to recognize clearly nonlinear regions. The heatmaps also help viewers understand the different effects of various activation functions; for example, there is a clear visual difference between the effects of ReLU and tanh. Just as instructive, however, are suboptimal combinations of architecture and hyperparameters. Often when there are redundant units (Figure 3), it is easy to see that units in intermediate layers have actually learned the classifier perfectly well and that many other units have little effect on the final outcome. In cases where learning is simply unsuccessful, the viewer will often see weights going to zero, with no progression of complexity in the activation heatmaps (Figure 4).
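To make the geometry of these activation heatmaps concrete, here is a minimal sketch, in TypeScript (matching the in-browser setting of the tool), of how the heatmap for a single first-layer unit could be computed by sampling its response on a grid over the input square. All names here (Unit, response, heatmap) are our own illustration, not the Playground's actual source code.

```typescript
// Minimal sketch (not the Playground's actual source): compute the
// activation heatmap of a single first-layer unit by evaluating it on
// an n-by-n grid over the input square [-1, 1] x [-1, 1].

type Activation = (x: number) => number;
const tanh: Activation = Math.tanh;
const relu: Activation = (x) => Math.max(0, x);

// A first-layer unit is, modulo the activation function, a linear
// classifier: a weighted sum of x1 and x2 plus a bias.
interface Unit {
  w1: number;
  w2: number;
  bias: number;
  act: Activation;
}

function response(u: Unit, x1: number, x2: number): number {
  return u.act(u.w1 * x1 + u.w2 * x2 + u.bias);
}

// Each grid cell becomes one pixel of the unit's heatmap; in the tool,
// color encodes the sign of the response and intensity its magnitude.
function heatmap(u: Unit, n: number): number[][] {
  const grid: number[][] = [];
  for (let i = 0; i < n; i++) {
    const x2 = -1 + (2 * i) / (n - 1);
    const row: number[] = [];
    for (let j = 0; j < n; j++) {
      const x1 = -1 + (2 * j) / (n - 1);
      row.push(response(u, x1, x2));
    }
    grid.push(row);
  }
  return grid;
}

// A tanh unit's heatmap is a smooth gradient perpendicular to its
// weight vector (w1, w2); swapping in relu makes it one-sided.
const unit: Unit = { w1: 1.5, w2: -0.5, bias: 0.2, act: tanh };
console.log(heatmap(unit, 5));
```

Units in deeper layers replace the weighted sum of x1 and x2 with a weighted sum of the previous layer's responses, which is what produces the visual progression in complexity described above.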
3. Affordances for Education and Experimentation

The real strength of this visualization is its interactivity, which is especially helpful for gaining an intuition for the practical aspects of training a deep network. The Playground lets users make the following choices of network structure and hyperparameters (gathered into a single configuration sketch after the list):

• Problem type: regression or classification

• Training data: a choice of four synthetic data sets, from well-separated clusters to interleaved "swiss roll" spirals

• Number of layers

• Number of units in each layer

• Activation function

• Learning rate

• Batch size

• Regularization: L1, L2, or none

• Input features: in addition to the two real-valued features x1 and x2, the Playground allows users to add some simple algebraic combinations, such as x1x2 and x1²

• Noise level for input data
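The list above maps naturally onto a single state object. The sketch below is our own hypothetical rendering of that configuration space; the field names and dataset identifiers are illustrative, not the tool's actual ones.

```typescript
// Hypothetical sketch of the experiment state implied by the list
// above; the Playground's actual source organizes this differently.

type Problem = "classification" | "regression";
type Dataset = "two-clusters" | "xor" | "circle" | "spiral";
type ActivationName = "relu" | "tanh" | "sigmoid" | "linear";
type Regularization = "none" | "L1" | "L2";
type InputFeature = "x1" | "x2" | "x1^2" | "x2^2" | "x1*x2" | "sin(x1)" | "sin(x2)";

interface PlaygroundState {
  problem: Problem;
  dataset: Dataset;
  hiddenLayerSizes: number[];   // number of layers and units per layer
  activation: ActivationName;
  learningRate: number;
  batchSize: number;
  regularization: Regularization;
  regularizationRate: number;
  inputFeatures: InputFeature[];
  noise: number;                // noise level added to the input data
}

// One concrete experiment: a two-hidden-layer tanh network on the
// spiral data set, with a non-linear feature combination enabled.
const state: PlaygroundState = {
  problem: "classification",
  dataset: "spiral",
  hiddenLayerSizes: [8, 4],
  activation: "tanh",
  learningRate: 0.03,
  batchSize: 10,
  regularization: "none",
  regularizationRate: 0,
  inputFeatures: ["x1", "x2", "x1*x2"],
  noise: 0,
};
```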

These particular variations were chosen based on experience teaching software engineers how to use neural networks in their applications, and are meant to highlight key decisions that are made in real life. They are also meant to be easily combined to support particular lessons. For instance, allowing users to add algebraic combinations of the two primary features makes it easy to show how a linear classifier can do "non-linear" tasks when given non-linear feature combinations.

The user interface is designed to make these choices as easy to modify as possible. The standard definition of direct manipulation is that changes should be "rapid, incremental and reversible" (Shneiderman, 1993). Allowing fast, smooth changes to variables helps build intuition for their effects. Reversibility encourages experimentation: indeed, we chose as our tagline for the visualization, "You can't break it. We promise."

Additional aspects of the visualization make it well-suited to education. We have found that the smooth animation engages users. It also lends itself to a good "spectator experience" (Reeves et al., 2005), drawing students in during presentations. We have seen onlookers laugh and even gasp as they watch a network try and fail to classify the spiral data set, for example. Although animation has not always been found to be helpful in educational contexts, simulations are one case where there is good evidence that it is beneficial (Betrancourt, 2005).

One particularly important feature is the ability to seamlessly bookmark (Viegas et al., 2007) a particular configuration of hyperparameters and structure. As the user plays with the tool, the URL in the browser dynamically updates to reflect the tool's current state. If the user (or a teacher preparing a lesson plan) finds a configuration they would like to share with others, they need only copy the URL.

We have found this bookmarking capability invaluable in the teaching process. For example, it has allowed us to put together tutorials in which students can move, step by step, through a series of lessons that focus on particular aspects of neural networks. Using the visualization in these "living lessons" makes it straightforward to create a dynamic, interactive educational experience.
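A minimal sketch of how such URL-based bookmarking can be implemented, assuming the PlaygroundState object from the previous sketch: the state is serialized into the URL fragment on every change and restored from it on load. This is our own illustration of the general technique, not the Playground's actual serialization format, and it assumes a browser environment.

```typescript
// Sketch: bookmarkable state via the URL fragment. Our own minimal
// illustration, not the Playground's actual scheme; reuses the
// hypothetical PlaygroundState type from the previous sketch.

function saveStateToUrl(state: PlaygroundState): void {
  // Encode each field as key=value, joined with '&', e.g.
  // #learningRate=0.03&activation=tanh&hiddenLayerSizes=8,4
  const parts = Object.entries(state).map(
    ([key, value]) => `${key}=${encodeURIComponent(String(value))}`
  );
  // Updating location.hash changes the address bar without a reload,
  // so the current URL is always a shareable bookmark.
  window.location.hash = parts.join("&");
}

function loadStateFromUrl(defaults: PlaygroundState): PlaygroundState {
  const state = { ...defaults };
  const hash = window.location.hash.slice(1); // drop the leading '#'
  for (const part of hash.split("&")) {
    const [key, value] = part.split("=");
    if (key in state && value !== undefined) {
      const decoded = decodeURIComponent(value);
      const current = (state as any)[key];
      if (typeof current === "number") {
        (state as any)[key] = Number(decoded);
      } else if (Array.isArray(current)) {
        const items = decoded.split(",");
        // Restore numeric arrays (e.g. hiddenLayerSizes) as numbers.
        (state as any)[key] =
          items.every((s) => s !== "" && !isNaN(Number(s)))
            ? items.map(Number)
            : items;
      } else {
        (state as any)[key] = decoded;
      }
    }
  }
  return state;
}
```

Because the fragment is part of the URL itself, copying the address bar captures the entire experiment, and opening that URL elsewhere restores the exact configuration.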

4. Conclusion and Future Work

The TensorFlow Playground illustrates a direct-manipulation approach to understanding neural nets. Given the importance of intuition and experimentation to the field of deep learning, the visualization is designed to make it easy to get a hands-on feel for how these systems work without any coding. Not only does this extend the reach of the tool to people who aren't programmers, it provides a much faster route, even for coders, to try many variations quickly. By playing with the visualization, users have a chance to build a mental model of the mathematics behind deep learning, as well as to develop a natural feeling for how these networks respond to tweaks in architecture and hyperparameters.

In addition to internal success with the tool, we have seen a strong positive reaction since it was open-sourced. Besides general positive comments, we have seen interesting, playful interactions. On one Reddit thread, for example, people competed to find a way to classify the spiral data, posting screenshots of their successful configurations. This suggests that the tool is instigating a vibrant social reaction to the visualization.

Since the launch of TensorFlow Playground, we have seen many suggestions for extensions. Affordances for many other structural variations and hyperparameters could be added; for instance, a common request is for an option to see the effect of dropout. Architectures such as convolutional nets and LSTMs could also be illuminated through direct-manipulation techniques. Our hope is that, as an open-source project, the Playground will be extended to accommodate many such ideas. More broadly, the ideas of visualization, direct manipulation, and shareability that we have used may prove useful in explaining other aspects of deep learning besides network structure and hyperparameters.

A further question is whether this same direct-manipulation environment can be extended to help researchers as well as students. While there are obvious technical obstacles (breaking new ground often requires large data sets and computational resources beyond what a browser offers), it may be possible to create minimal "research playgrounds" that yield insights and allow rapid experimentation.

Figure 2. A complex configuration of TensorFlow Playground, in which a user is attempting to find hyperparameters that will allow the classification of spiral data. Many possible feature combinations have been activated.

References

Betrancourt, Mireille. The animation and interactivity principles in multimedia learning. The Cambridge Handbook of Multimedia Learning, pp. 287–296, 2005.

Karpathy, Andrej. ConvNetJS: Deep learning in your browser. URL http://cs.stanford.edu/people/karpathy/convnetjs.

Olah, Christopher. colah's blog. URL http://colah.github.io/.

Reeves, Stuart, Benford, Steve, O'Malley, Claire, and Fraser, Mike. Designing the spectator experience. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 741–750. ACM, 2005.

Shneiderman, Ben. Direct manipulation: a step beyond programming languages. Sparks of Innovation in Human-Computer Interaction, 17, 1993.

Viegas, Fernanda B., Wattenberg, Martin, van Ham, Frank, Kriss, Jesse, and McKeon, Matt. Many Eyes: a site for visualization at internet scale. IEEE Transactions on Visualization and Computer Graphics, 13(6):1121–1128, 2007.

Zeiler, Matthew D. and Fergus, Rob. Visualizing and understanding convolutional networks. In Computer Vision - ECCV 2014, pp. 818–833. Springer, 2014.

Zhou, Bolei, Khosla, Aditya, Lapedriza, Àgata, Oliva, Aude, and Torralba, Antonio. Object detectors emerge in deep scene CNNs. CoRR, abs/1412.6856, 2014. URL http://arxiv.org/abs/1412.6856.

Figure 3. A network architecture with redundant layers and units. Several units in the first hidden layer have already essentially learned to classify the data, as seen by inspecting the in-network activation visualizations.

Figure 4. This network has completely failed to classify the data, even after many epochs. The high-contrast activation visualizations and thick weight connections hint at a systemic problem. This diagram was the result of setting the learning rate to its maximum value.