Lagrangian Decomposition for Neural Network Verification

Rudy Bunel∗, Alessandro De Palma∗, Alban Desmaison
University of Oxford
{rudy, adepalma}@robots.ox.ac.uk

Krishnamurthy (Dj) Dvijotham, Pushmeet Kohli
DeepMind

Philip H.S. Torr, M. Pawan Kumar
University of Oxford

Abstract

A fundamental component of neural network verification is the computation of bounds on the values their outputs can take. Previous methods have either used off-the-shelf solvers, discarding the problem structure, or relaxed the problem even further, making the bounds unnecessarily loose. We propose a novel approach based on Lagrangian Decomposition. Our formulation admits an efficient supergradient ascent algorithm, as well as an improved proximal algorithm. Both algorithms offer three advantages: (i) they yield bounds that are provably at least as tight as previous dual algorithms relying on Lagrangian relaxations; (ii) they are based on operations analogous to the forward/backward passes of neural network layers and are therefore easily parallelizable, amenable to GPU implementation and able to take advantage of the convolutional structure of problems; and (iii) they allow for anytime stopping while still providing valid bounds. Empirically, we show that we obtain bounds comparable with off-the-shelf solvers in a fraction of their running time, and obtain tighter bounds in the same time as previous dual algorithms. This results in an overall speed-up when employing the bounds for formal verification. Code for our algorithms is available at https://github.com/oval-group/decomposition-plnn-bounds.

∗ These authors contributed equally to this work. Proceedings of the 36th Conference on Uncertainty in Artificial Intelligence (UAI), PMLR volume 124, 2020.

arXiv:2002.10410v3 [cs.LG] 17 Jun 2020

1 INTRODUCTION

As deep learning powered systems become more and more common, the lack of robustness of neural networks and their reputation for being "Black Boxes" is increasingly worrisome. In order to deploy them in critical scenarios where safety and robustness would be a prerequisite, we need to invent techniques that can prove formal guarantees for neural network behaviour. A particularly desirable property is resistance to adversarial examples (Goodfellow et al., 2015, Szegedy et al., 2014): perturbations maliciously crafted with the intent of fooling even extremely well performing models. After several defenses were proposed and subsequently broken (Athalye et al., 2018, Uesato et al., 2018), some progress has been made in being able to formally verify whether there exist any adversarial examples in the neighbourhood of a data point (Tjeng et al., 2019, Wong and Kolter, 2018).

Verification algorithms fall into three categories: unsound (some false properties may be incorrectly proven true), incomplete (some true properties may fail to be proven), and complete (all properties are correctly verified as either true or false). A critical component of the verification systems developed so far is the computation of lower and upper bounds on the output of neural networks when their inputs are constrained to lie in a bounded set. In incomplete verification, by deriving bounds on the changes of the prediction vector under restricted perturbations, it is possible to identify safe regions of the input space. These results allow the rigorous comparison of adversarial defenses and prevent making overconfident statements about their efficacy (Wong and Kolter, 2018). In complete verification, bounds are also used as essential subroutines of complete verifiers (Bunel et al., 2018). Finally, bounds might also be used as a training signal to guide the network towards greater robustness and more verifiability (Gowal et al., 2018, Mirman et al., 2018, Wong and Kolter, 2018).

Most previous algorithms for computing bounds are either computationally expensive (Ehlers, 2017) or sacrifice a lot of tightness in order to scale (Gowal et al., 2018, Mirman et al., 2018, Wong and Kolter, 2018). In this work, we design novel customised relaxations and their corresponding solvers for obtaining bounds on neural networks. Our approach offers the following advantages:
• While previous approaches to neural network bounds (Dvijotham et al., 2018) are based on Lagrangian relaxations, we derive a new family of optimization problems for neural network bounds through Lagrangian Decomposition, which in general yields duals at least as strong as those obtained through Lagrangian relaxation (Guignard and Kim, 1987). We in fact prove that, in the context of ReLU networks, for any dual solution from the approach by Dvijotham et al. (2018), the bounds output by our dual are at least as tight. We demonstrate empirically that this derivation computes tighter bounds in the same time when using supergradient methods. We further improve on the performance by devising a proximal solver for the problem, which decomposes the task into a series of strongly convex subproblems. For each, we use an iterative method for which we derive optimal step sizes.

• Both the supergradient and the proximal method operate through linear operations similar to those used during network forward/backward passes. As a consequence, we can leverage the convolutional structure when necessary, while standard solvers are often restricted to treating it as a general linear operation. Moreover, both methods are easily parallelizable: when computing bounds on the activations at layer k, we need to solve two problems for each hidden unit of the network (for the upper and lower bounds). These can all be solved in parallel. In complete verification, we need to compute bounds for several different problem domains at once: we solve these problems in parallel as well. Our GPU implementation thus allows us to solve several hundreds of linear programs at once on a single GPU, a level of parallelism that would be hard to match on CPU-based systems.

• Most standard linear programming based relaxations (Ehlers, 2017) will only return a valid bound if the problem is solved to optimality. Others, like the dual simplex method employed by off-the-shelf solvers (Gurobi Optimization, 2020), have a very high cost per iteration and will not yield tight bounds without incurring significant computational costs. Both methods described in this paper are anytime (terminating them before convergence still provides a valid bound) and can be interrupted at very small granularity. This is useful in the context of a subroutine for complete verifiers, as it enables the user to choose an appropriate speed versus accuracy trade-off. It also offers great versatility as an incomplete verification method.

2 RELATED WORKS

Bound computations are mainly used for formal verification methods. Some methods are complete (Cheng et al., 2017, Ehlers, 2017, Katz et al., 2017, Tjeng et al., 2019, Xiang et al., 2017), always returning a verdict for each problem instance. Others are incomplete, based on relaxations of the verification problem. They trade speed for completeness: while they cannot verify properties for all problem instances, they scale significantly better. Two main types of bounds have been proposed: on the one hand, some approaches (Ehlers, 2017, Salman et al., 2019) rely on off-the-shelf solvers to solve accurate relaxations such as PLANET (Ehlers, 2017), which is the best known linear-sized approximation of the problem. On the other hand, as PLANET and other more complex relaxations do not have closed form solutions, some researchers have also proposed easier to solve, looser formulations (Gowal et al., 2018, Mirman et al., 2018, Weng et al., 2018, Wong and Kolter, 2018). Explicitly or implicitly, these are all equivalent to propagating a convex domain through the network to overapproximate the set of reachable values.

Our approach consists in tackling a relaxation equivalent to the PLANET one (although generalised beyond ReLU), by designing a custom solver that achieves faster performance without sacrificing tightness. Some potentially tighter convex relaxations exist but involve a quadratic number of variables, such as the semi-definite programming method of Raghunathan et al. (2018). Better relaxations obtained from relaxing strong Mixed Integer Programming formulations (Anderson et al., 2019) have a quadratic number of variables or a potentially exponential number of constraints. We do not address them here.

A closely related approach to ours is the work of Dvijotham et al. (2018). Both their method and ours are anytime and operate on similar duals. While their dual is based on the Lagrangian relaxation of the non-convex problem, ours is based on the Lagrangian Decomposition of the nonlinear activation's convex relaxation. Thanks to the properties of Lagrangian Decomposition (Guignard and Kim, 1987), we can show that our dual problem provides better bounds when evaluated on the same dual variables. The relationship between the two duals is studied in detail in section 4.2. Moreover, in terms of the optimization strategy, in addition to using a supergradient method like Dvijotham et al. (2018), we present a proximal method, for which we can derive optimal step sizes. We show that these modifications enable us to compute tighter bounds using the same amount of compute time.

3 PRELIMINARIES

Throughout this paper, we will use bold lower case letters (like z) to represent vectors and upper case letters (like W) to represent matrices. Brackets are used to indicate the i-th coordinate of a vector (z[i]), and integer ranges (e.g., [1, n − 1]). We will study the computation of the lower bound problem based on a feedforward neural network, with element-wise activation function σ(·).
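To make the setting concrete, the following is a minimal, illustrative sketch (the function names are our own, not from the paper's released code) of the cheapest kind of bound propagation mentioned above: pushing elementwise intervals through the affine and ReLU layers of a feedforward network. Bounds of this form are loose, which is precisely what the tighter duals developed below improve upon.

```python
import numpy as np

def interval_affine(l, u, W, b):
    """Propagate elementwise bounds l <= z <= u through z' = W z + b."""
    W_pos, W_neg = np.maximum(W, 0.0), np.minimum(W, 0.0)
    lower = W_pos @ l + W_neg @ u + b
    upper = W_pos @ u + W_neg @ l + b
    return lower, upper

def interval_bounds(layers, l0, u0):
    """Interval bounds on every pre-activation of a feedforward ReLU network.

    `layers` is a list of (W, b) pairs; a ReLU is applied between layers.
    Returns a list of (lower, upper) pairs, one per pre-activation vector.
    """
    bounds = []
    l, u = l0, u0
    for i, (W, b) in enumerate(layers):
        l, u = interval_affine(l, u, W, b)   # pre-activation bounds
        bounds.append((l, u))
        if i < len(layers) - 1:              # ReLU between hidden layers
            l, u = np.maximum(l, 0.0), np.maximum(u, 0.0)
    return bounds
```

For a tiny 2-2-1 network over the box [−1, 1]², the returned output interval is guaranteed to contain every reachable output, but is generally much wider than necessary.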
The network inputs are restricted to a convex domain C, over which we assume that we can easily optimise linear functions. This is the same assumption that was made by Dvijotham et al. (2018). We focus on lower bounds; the computation for an upper bound is analogous. Formally, we wish to solve the following problem:

$$\begin{aligned}
\min_{z, \hat{z}} \quad & \hat{z}_n && (1a)\\
\text{s.t.} \quad & z_0 \in C, && (1b)\\
& \hat{z}_{k+1} = W_{k+1} z_k + b_{k+1} \quad k \in [1, n-1], && (1c)\\
& z_k = \sigma(\hat{z}_k) \quad k \in [1, n-1]. && (1d)
\end{aligned}$$

The outputs of the k-th layer of the network before and after the application of the activation function are denoted by ẑk and zk respectively. Constraints (1c) implement the linear layers (fully connected or convolutional), while constraints (1d) implement the non-linear activation function. Constraints (1b) define the region over which the bounds are being computed. While our method can be extended to more complex networks (such as ResNets), we focus on problem (1) for the sake of clarity.

The difficulty of problem (1) comes from the non-linearity (1d). Dvijotham et al. (2018) tackle it by relaxing (1c) and (1d) via Lagrangian multipliers, yielding the following dual:

$$\begin{aligned}
\max_{\mu, \lambda}\ d(\mu, \lambda), \quad \text{where:} \qquad
d(\mu, \lambda) = \min_{z, \hat{z}}\ & W_n z_{n-1} + b_n + \sum_{k=1}^{n-1} \mu_k^T \left(\hat{z}_k - W_k z_{k-1} - b_k\right) + \sum_{k=1}^{n-1} \lambda_k^T \left(z_k - \sigma(\hat{z}_k)\right) && (2)\\
\text{s.t.}\ & l_k \le \hat{z}_k \le u_k \quad k \in [1, n-1],\\
& \sigma(l_k) \le z_k \le \sigma(u_k) \quad k \in [1, n-1],\\
& z_0 \in C.
\end{aligned}$$

If σ is a ReLU, this relaxation is equivalent (Dvijotham et al., 2018) to the PLANET relaxation (Ehlers, 2017). The dual requires upper (uk) and lower (lk) bounds on the value that ẑk can take, for k ∈ [0..n − 1]. We call these intermediate bounds: we detail how to compute them in appendix B. The dual algorithm by Dvijotham et al. (2018) solves (2) via supergradient ascent.

4 LAGRANGIAN DECOMPOSITION

We will now describe a novel approach to solve problem (1), and relate it to the dual algorithm by Dvijotham et al. (2018). We will focus on computing bounds for an output of the last layer of the neural network.

4.1 PROBLEM DERIVATION

Our approach is based on Lagrangian decomposition, also known as variable splitting (Guignard and Kim, 1987). Due to the compositional structure of neural networks, most constraints involve only a limited number of variables. As a result, we can split the problem into meaningful, easy to solve subproblems. We then impose constraints that the solutions of the subproblems should agree.

We start from the original non-convex primal problem (1), and substitute the nonlinear activation equality (1d) with a constraint corresponding to its convex hull. This leads to the following convex program:

$$\begin{aligned}
\min_{z, \hat{z}} \quad & \hat{z}_n && (3a)\\
\text{s.t.} \quad & z_0 \in C, && (3b)\\
& \hat{z}_{k+1} = W_{k+1} z_k + b_{k+1} \quad k \in [1, n-1], && (3c)\\
& \text{cvx\_hull}_\sigma(\hat{z}_k, z_k, l_k, u_k) \quad k \in [1, n-1]. && (3d)
\end{aligned}$$

In the following, we will use ReLU activation functions as an example, and employ the PLANET relaxation as its convex hull, which is the tightest linearly sized relaxation published thus far. By linearly sized, we mean that the numbers of variables and constraints needed to describe the relaxation only grow linearly with the number of units in the network. We stress that the derivation can be extended to other non-linearities. For example, appendix A describes the case of the sigmoid activation function. For ReLUs, this convex hull takes the following form (Ehlers, 2017):

$$\text{cvx\_hull}_\sigma \equiv \begin{cases}
\text{if } l_k[i] \le 0,\ u_k[i] \ge 0: & z_k[i] \ge 0, \quad z_k[i] \ge \hat{z}_k[i], \quad z_k[i] \le \dfrac{u_k[i]\left(\hat{z}_k[i] - l_k[i]\right)}{u_k[i] - l_k[i]}, & (4a)\\[4pt]
\text{if } u_k[i] \le 0: & z_k[i] = 0, & (4b)\\[2pt]
\text{if } l_k[i] \ge 0: & z_k[i] = \hat{z}_k[i]. & (4c)
\end{cases}$$

Constraints (4a), (4b), and (4c) correspond to the relaxation of the ReLU (1d) in different cases (respectively: ambiguous state, always blocking, or always passing).

To obtain a decomposition, we divide the constraints into subsets. Each subset will correspond to the pair of an activation layer and the linear layer coming after it. The only exception is the first linear layer, which is combined with the restriction of the input domain to C. This choice is motivated by the fact that, for piecewise-linear activation functions, we will be able to easily perform linear optimisation over the resulting subdomains. For a different activation function, it might be required to have a different decomposition. Formally, we introduce the following notation for subsets of constraints:

$$P_0(z_0, \hat{z}_1) \equiv \begin{cases} z_0 \in C, \\ \hat{z}_1 = W_1 z_0 + b_1, \end{cases} \qquad (5)$$

$$P_k(\hat{z}_k, \hat{z}_{k+1}) \equiv \begin{cases} l_k \le \hat{z}_k \le u_k, \\ \text{cvx\_hull}_\sigma(\hat{z}_k, z_k, l_k, u_k), \\ \hat{z}_{k+1} = W_{k+1} z_k + b_{k+1}. \end{cases} \qquad (6)$$

The set Pk is defined without an explicit dependency on zk because zk only appears internally to the subset of constraints and is not involved in the objective function.
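To illustrate equation (4), the sketch below (hypothetical helper name, not from the paper's code) computes, coordinate-wise, the range of post-activation values z permitted by the PLANET hull for a fixed pre-activation value ẑ, distinguishing the ambiguous, blocking and passing cases:

```python
import numpy as np

def planet_relu_bounds(z_hat, l, u):
    """For a fixed pre-activation value z_hat with intermediate bounds l, u,
    return (z_min, z_max): the interval of z values allowed by eq. (4)."""
    z_hat = np.asarray(z_hat, dtype=float)
    z_min, z_max = np.empty_like(z_hat), np.empty_like(z_hat)
    ambiguous = (l < 0) & (u > 0)
    blocking = u <= 0
    passing = l >= 0
    # (4b) always blocking: z = 0.
    z_min[blocking] = z_max[blocking] = 0.0
    # (4c) always passing: z = z_hat.
    z_min[passing] = z_max[passing] = z_hat[passing]
    # (4a) ambiguous: relu(z_hat) <= z <= u (z_hat - l) / (u - l).
    z_min[ambiguous] = np.maximum(z_hat[ambiguous], 0.0)
    z_max[ambiguous] = (u[ambiguous] * (z_hat[ambiguous] - l[ambiguous])
                       / (u[ambiguous] - l[ambiguous]))
    return z_min, z_max
```

For an ambiguous unit with l = −1, u = 1 at ẑ = 0, for instance, the hull allows z anywhere between relu(0) = 0 and the upper chord value 0.5; for blocking and passing units the "interval" collapses to a single point.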

Using this grouping of the constraints, we can concisely write problem (3) as:

$$\min_{z, \hat{z}}\ \hat{z}_n \quad \text{s.t.} \quad P_0(z_0, \hat{z}_1), \quad P_k(\hat{z}_k, \hat{z}_{k+1})\ \ k \in [1, n-1]. \qquad (7)$$

To obtain a Lagrangian decomposition, we duplicate the variables so that each subset of constraints has its own copy of the variables it is involved in. Formally, we rewrite problem (7) as follows:

$$\begin{aligned}
\min_{z, \hat{z}} \quad & \hat{z}_{A,n} && (8a)\\
\text{s.t.} \quad & P_0(z_0, \hat{z}_{A,1}), && (8b)\\
& P_k(\hat{z}_{B,k}, \hat{z}_{A,k+1}) \quad k \in [1, n-1], && (8c)\\
& \hat{z}_{A,k} = \hat{z}_{B,k} \quad k \in [1, n-1]. && (8d)
\end{aligned}$$

The additional equality constraints (8d) impose agreement between the various copies of the variables. We introduce the dual variables ρ and derive the Lagrangian dual:

$$\begin{aligned}
\max_{\rho}\ q(\rho), \quad \text{where:} \qquad
q(\rho) = \min_{z, \hat{z}}\ & \hat{z}_{A,n} + \sum_{k=1}^{n-1} \rho_k^T \left(\hat{z}_{B,k} - \hat{z}_{A,k}\right) && (9)\\
\text{s.t.}\ & P_0(z_0, \hat{z}_{A,1}), \quad P_k(\hat{z}_{B,k}, \hat{z}_{A,k+1})\ \ k \in [1, n-1].
\end{aligned}$$

Any value of ρ provides a valid lower bound by virtue of weak duality. While we will maximise over the choice of dual variables in order to obtain as tight a bound as possible, we will be able to interrupt the optimisation process at any point and obtain a valid bound by evaluating q. We stress that, at convergence, problems (9) and (2) will yield the same bounds, as they are both reformulations of the same convex problem. This does not imply that the two derivations yield solvers with the same efficiency. In fact, we will next prove that, for ReLU-based networks, problem (9) yields bounds at least as tight as a corresponding dual solution from problem (2).

4.2 DUALS COMPARISON

We now compare our dual problem (9) with problem (2) by Dvijotham et al. (2018). From a high level perspective, our decomposition considers larger subsets of constraints and hence results in a smaller number of variables to optimize. We formally prove that, for ReLU-based neural networks, our formulation dominates theirs, producing tighter bounds based on the same dual variables.

Theorem 1. Let us assume σ(ẑk) = max(0, ẑk). For any solution μ, λ of dual (2) by Dvijotham et al. (2018) yielding bound d(μ, λ), it holds that q(μ) ≥ d(μ, λ).

Proof. See appendix E.

Our dual is also related to the one that Wong and Kolter (2018) operate in. We show in appendix F how our problem can be initialised to a set of dual variables so as to generate a bound matching the one provided by their algorithm. We use this approach as our initialisation for incomplete verification (§ 6.3).

5 SOLVERS FOR LAGRANGIAN DECOMPOSITION

Now that we have motivated the use of dual (9) over problem (2), it remains to show how to solve it efficiently in practice. We present two methods: one based on supergradient ascent, the other on proximal maximisation. A summary, including pseudo-code, can be found in appendix D.

5.1 SUPERGRADIENT METHOD

As Dvijotham et al. (2018) use supergradient methods on their dual, we start by applying them to problem (9) as well. At a given point ρ, obtaining the supergradient requires us to know the values ẑ*A and ẑ*B for which the inner minimisation is achieved. Based on these identified values, we can then compute the supergradient ∇ρ q = ẑ*B − ẑ*A. The updates to ρ are then the usual supergradient ascent updates:

$$\rho^{t+1} = \rho^t + \alpha^t \nabla_\rho q(\rho^t), \qquad (10)$$

where αᵗ corresponds to a step size schedule that needs to be provided. It is also possible to use any variant of gradient ascent, such as Adam (Kingma and Ba, 2015).

It remains to show how to perform the inner minimization over the primal variables. By design, each of the variables is only involved in one subset of constraints. As a result, the computation completely decomposes over the subproblems, each corresponding to one of the subsets of constraints. We therefore simply need to optimise linear functions over one subset of constraints at a time.

5.1.1 Inner minimization: P0 subproblems

To minimize over z0, ẑA,1, the variables constrained by P0, we need to solve:

$$[z_0^*, \hat{z}_{A,1}^*] = \operatorname*{argmin}_{z_0, \hat{z}_{A,1}}\ -\rho_1^T \hat{z}_{A,1} \quad \text{s.t.} \quad z_0 \in C,\ \hat{z}_{A,1} = W_1 z_0 + b_1. \qquad (11)$$

Rewriting the problem as a linear function of z0 only, problem (11) is simply equivalent to minimising −ρ1ᵀW1 z0 over C. We assumed that the optimisation of a linear function over C is efficient. We now give examples of C where problem (11) can be solved efficiently.

Bounded Input Domain. If C is defined by a set of lower bounds l0 and upper bounds u0 (as in the case of ℓ∞ adversarial examples), optimisation simply amounts to choosing either the lower or the upper bound depending on the sign of the linear function. Let us denote the indicator function for condition c by 1c. The optimal solution is:

$$x_0 = \mathbb{1}_{\rho_1^T W_1 < 0} \cdot l_0 + \mathbb{1}_{\rho_1^T W_1 \ge 0} \cdot u_0, \qquad \hat{x}_{A,1} = W_1 x_0 + b_1. \qquad (12)$$

ℓ2 Balls. If C is defined by an ℓ2 ball of radius ε around a point x̄ (‖x0 − x̄‖2 ≤ ε), optimisation amounts to choosing the point on the boundary of the ball such that the vector from the center to it is opposed to the cost function. Formally, the optimal solution is given by:

$$x_0 = \bar{x} + \left(\epsilon / \|\rho_1\|_2\right) \rho_1, \qquad \hat{x}_{A,1} = W_1 x_0 + b_1. \qquad (13)$$

5.1.2 Inner minimization: Pk subproblems

For the variables constrained by subproblem Pk(ẑB,k, ẑA,k+1), we need to solve:

$$\begin{aligned}
[\hat{z}_{B,k}^*, \hat{z}_{A,k+1}^*] = \operatorname*{argmin}_{\hat{z}_{B,k}, \hat{z}_{A,k+1}}\ & \rho_k^T \hat{z}_{B,k} - \rho_{k+1}^T \hat{z}_{A,k+1}\\
\text{s.t.}\ & l_k \le \hat{z}_{B,k} \le u_k, && (14)\\
& \text{cvx\_hull}_\sigma(\hat{z}_{B,k}, z_k, l_k, u_k),\\
& \hat{z}_{A,k+1} = W_{k+1} z_k + b_{k+1}.
\end{aligned}$$

In the case where the activation function σ is the ReLU and cvx_hull is given by equation (4), we can find a closed form solution. Using the last equality of the problem, we can rewrite the objective function as ρkᵀẑB,k − ρk+1ᵀWk+1 zk. If the ReLU is ambiguous, the shape of the convex hull is represented in Figure 1.

Figure 1: Feasible domain of the convex hull for an ambiguous ReLU. Red circles indicate the vertices of the feasible region.

If the sign of ρk+1ᵀWk+1 is negative, optimizing subproblem (14) corresponds to having zk at its lower bound max(ẑB,k, 0). If on the contrary the sign is positive, zk must be equal to its upper bound (uk / (uk − lk))(ẑB,k − lk). We can therefore rewrite subproblem (14) in the case of ambiguous ReLUs as:

$$\operatorname*{argmin}_{\hat{z}_{B,k} \in [l_k, u_k]}\ \left(\rho_k - \left[\rho_{k+1}^T W_{k+1}\right]_{+} \frac{u_k}{u_k - l_k}\right)^T \hat{z}_{B,k} \;-\; \left[\rho_{k+1}^T W_{k+1}\right]_{-} \max\left(\hat{z}_{B,k}, 0\right), \qquad (15)$$

where [A]− corresponds to the negative values of A ([A]− = min(A, 0)) and [A]+ corresponds to the positive values of A ([A]+ = max(A, 0)). The objective function and the constraints all decompose completely over the coordinates, so all problems can be solved independently. For each dimension, the problem is the minimization of a piecewise linear function, which means that the optimal point will be a vertex of the feasible region of Figure 1. The possible vertices are (lk[i], 0), (0, 0), and (uk[i], uk[i]). In order to find the minimum, we can therefore evaluate the objective function at these three points and keep the one with the smallest value.

If for a ReLU we either have lk[i] ≥ 0 or uk[i] ≤ 0, the cvx_hull constraint is a simple linear equality constraint. For those coordinates, the problem is analogous to the one solved by equation (12), with the linear function being minimised over the ẑB,k box bounds being ρkᵀẑB,k in the case of blocking ReLUs, or (ρkᵀ − ρk+1ᵀWk+1) ẑB,k in the case of passing ReLUs.

We have described how to solve problem (14) by simply evaluating linear functions at given points. The matrix operations involved correspond to standard operations of the neural network's layers. Computing ẑA,k+1 = Wk+1 zk + bk+1 is exactly the forward pass of the network, and computing ρk+1ᵀWk+1 is analogous to the backpropagation of gradients. We can therefore take advantage of existing deep learning frameworks to gain access to efficient implementations. When dealing with convolutional layers, we can employ specialised implementations rather than building the equivalent Wk matrix, which would contain a lot of redundancy.

We described here the solving process in the context of ReLU activation functions, but this can be generalised to non piecewise-linear activation functions. For example, appendix A describes the solution for sigmoid activations.

5.2 PROXIMAL METHOD

We now detail the use of proximal methods on problem (9) as an alternative to supergradient ascent.

5.2.1 Augmented Lagrangian

Applying proximal maximization to the dual function q results in the Augmented Lagrangian Method, also known as the method of multipliers. The derivation of the update steps is detailed by Bertsekas and Tsitsiklis (1989). For our problem, this corresponds to alternating between updates to the dual variables ρ and updates to the primal variables ẑ, which are given by the following equations (superscript t indicates the value at the t-th iteration):

$$\rho_k^{t+1} = \rho_k^t + \frac{\hat{z}_{B,k}^t - \hat{z}_{A,k}^t}{\eta_k}, \qquad (16)$$

$$\begin{aligned}
z^t, \hat{z}^t = \operatorname*{argmin}_{z, \hat{z}}\ \mathcal{L}\left(\hat{z}, \rho^t\right) := \operatorname*{argmin}_{z, \hat{z}}\ & \hat{z}_{A,n} + \sum_{k=1..n-1} \rho_k^T \left(\hat{z}_{B,k} - \hat{z}_{A,k}\right) + \sum_{k=1..n-1} \frac{1}{2\eta_k} \left\|\hat{z}_{B,k} - \hat{z}_{A,k}\right\|^2 && (17)\\
\text{s.t.}\ & P_0(z_0, \hat{z}_{A,1}), \quad P_k(\hat{z}_{B,k}, \hat{z}_{A,k+1})\ \ k \in [1, n-1].
\end{aligned}$$

The term L(ẑ, ρ) is the Augmented Lagrangian of problem (8). The additional quadratic term in (17), compared to the objective of q(ρ), arises from the proximal terms on ρ. It has the advantage of making the problem strongly convex, and hence easier to optimize. Later on, we will show that this allows us to derive optimal step-sizes in closed form. The weight ηk is a hyperparameter of the algorithm. A high value will make the problem more strongly convex and therefore quicker to solve, but it will also limit the ability of the algorithm to perform large updates.

While obtaining the new values of ρ is trivial using equation (16), problem (17) does not have a closed-form solution. We show how to solve it efficiently nonetheless.
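The closed form for ambiguous ReLUs in section 5.1.2 can be illustrated in isolation. The sketch below (our own simplification to a single coordinate pair (ẑ, z)) minimises a linear objective a·ẑ + b·z over the PLANET hull of one ambiguous ReLU by evaluating the three vertices (l, 0), (0, 0) and (u, u) of the feasible triangle of Figure 1:

```python
def argmin_ambiguous_relu(a, b, l, u):
    """Minimise a*z_hat + b*z over the PLANET hull of an ambiguous ReLU
    (l < 0 < u) by evaluating the three vertices of the feasible triangle."""
    assert l < 0 < u
    vertices = [(l, 0.0), (0.0, 0.0), (u, u)]
    return min(vertices, key=lambda v: a * v[0] + b * v[1])
```

Since the objective is linear and the feasible region is a 2-D polytope, the minimum is always attained at one of the three vertices, so the inner minimisation reduces to three function evaluations per coordinate.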
5.2.2 Frank-Wolfe Algorithm

Problem (17) can be optimised using the conditional gradient method, also known as the Frank-Wolfe algorithm (Frank and Wolfe, 1956). The advantage it provides is that there is no need to perform a projection step to remain in the feasible domain. Indeed, the different iterates remain in the feasible domain by construction, as convex combinations of points in the feasible domain. At each time step, we replace the objective by its linear approximation and optimize this linear function over the feasible domain to get an update direction, named the conditional gradient. We then take a step in this direction. As the Augmented Lagrangian is smooth over the primal variables, there is no need to take the Frank-Wolfe step for all the network layers at once. We can in fact do it in a block-coordinate fashion, where a block is a network layer, with the goal of speeding up convergence; we refer the reader to appendix C for further details.

Obtaining the conditional gradient requires minimising a linearization of L(ẑ, ρ) over the primal variables, restricted to the feasible domain. This computation corresponds exactly to the one we perform for the inner minimisation of problem (9) over z and ẑ in order to compute the supergradient (cf. §5.1.1, §5.1.2). To make this equivalence clearer, we point out that the linear coefficients of the primal variables maintain the same form (with the difference that the dual variables are represented by their closed-form update for the following iteration), as ∇ẑB,k L(ẑᵗ, ρᵗ) = ρkᵗ⁺¹ and ∇ẑA,k+1 L(ẑᵗ, ρᵗ) = −ρk+1ᵗ⁺¹. The equivalence of conditional gradient and supergradient is not particular to our problem: a more general description can be found in the work of Bach (2015).

The use of a proximal method allows us to employ an optimal step size, whose calculation is detailed in appendix C. This would not be possible with supergradient methods, as we would have no guarantee of improvement; we therefore have no need to choose the step-size. In practice, we still have to choose the strength of the proximal terms ηk. Finally, inspired by previous work on accelerating proximal methods (Lin et al., 2017, Salzo and Villa, 2012), we also apply momentum on the dual updates to accelerate convergence; for details see appendix C.

6 EXPERIMENTS

6.1 IMPLEMENTATION DETAILS

One of the benefits of our algorithms (and of the supergradient baseline by Dvijotham et al. (2018)) is that they are easily parallelizable. As explained in § 5.1.1 and § 5.1.2, the crucial part of both our solvers works by applying linear algebra operations that are equivalent to the ones employed during the forward and backward passes of neural networks. To compute the upper and lower bounds for all the neurons of a layer, there is no dependency between the different problems, so we are free to solve them all simultaneously in a batched mode. In complete verification (§6.4), where we require bounds relative to multiple domains (defined by the set of input and intermediate bounds for the given network activations), we can further batch over the domains, providing a further stream of parallelism. In practice, we take the building blocks provided by deep learning frameworks and apply them to batches of solutions (one per problem), in the same way that normal training applies them to a batch of samples. This makes it possible for us to leverage the engineering efforts made to enable fast training and evaluation of neural networks, and to easily take advantage of GPU acceleration. The implementation used in our experiments is based on PyTorch (Paszke et al., 2017).

6.2 EXPERIMENTAL SETTING

In order to evaluate the performance of the proposed methods, we first perform a comparison of the quality and speed of the bounds obtained by different methods in the context of incomplete verification (§6.3). We then assess the effect of the various method-specific speed/accuracy trade-offs within a complete verification procedure (§6.4). For both sets of experiments, we report results using networks trained with different methodologies on the CIFAR-10 dataset. We compare the following algorithms:

• WK uses the method of Wong and Kolter (2018). This is equivalent to a specific choice of ρ.

• DSG+ corresponds to supergradient ascent on dual (2), the method by Dvijotham et al. (2018). We use the Adam (Kingma and Ba, 2015) update rules and decrease the step size linearly, similarly to the experiments of Dvijotham et al. (2018). We experimented with other step size schedules, like constant step sizes or 1/t schedules, which all performed worse.

• Dec-DSG+ is a direct application of Theorem 1: it obtains a dual solution for problem (2) via DSG+ and sets ρ = μ in problem (9) to obtain the final bounds.

• Supergradient is the first of the two solvers presented (§5.1). It corresponds to supergradient ascent over problem (9). We use Adam updates as in DSG+.

• Proximal is the solver presented in §5.2, performing proximal maximisation on problem (9). We limit the total number of inner iterations for each problem to 2 (a single iteration is a full pass over the network layers).

• Gurobi is our gold standard method, employing the commercial black-box solver Gurobi. All the problems are solved to optimality, providing us with the best result that our solvers could achieve in terms of accuracy.

• Gurobi-TL is the time-limited version of the above, which stops at the first dual simplex iteration for which the total elapsed time exceeds that required by 400 iterations of the proximal method.

We omit Interval Bound Propagation (Gowal et al., 2018, Mirman et al., 2018) from the comparisons as on average it performed significantly worse than WK on the networks we used; experimental results are provided in appendix G.1. For all of the methods above, as done in previous work for neural network verification (Bunel et al., 2020, Lu and Kumar, 2020), we compute intermediate bounds (see appendix B) by taking the layer-wise best bounds output by Interval Propagation and WK. This holds true for both the incomplete and the complete verification experiments. Gurobi was run on 4 CPUs for § 6.3 and on 1 CPU for § 6.4 in order to match the setting by Bunel et al. (2020), whereas all the other methods were run on a single GPU. While this may seem to lead to an unfair comparison, the flexibility and parallelism of the methods are a big part of their advantage over off-the-shelf solvers.

6.3 INCOMPLETE VERIFICATION

We investigate the effectiveness of the various methods for incomplete verification on images of the CIFAR-10 test set. For each image, we compute an upper bound on the robustness margin: the difference between the ground truth logit and all the other logits, under an allowed perturbation εverif in infinity norm of the inputs. If for any class the upper bound on the robustness margin is negative, then we are certain that the network is vulnerable to that adversarial perturbation. We measure the time to compute last layer bounds, and report the optimality gap compared to the best achieved solution.

As network architecture, we employ the small model used by Wong and Kolter (2018), whose structure is given in appendix G.1.1. We train the network against perturbations of size up to εtrain = 8/255, and test for adversarial vulnerability at εverif = 12/255. This is done using adversarial training (Madry et al., 2018), based on an attacker using 50 steps of projected gradient descent to obtain the samples. Additional experiments for a network trained using standard stochastic gradient descent and cross entropy, with no robustness objective, can be found in appendix G.1. For both supergradient methods (our Supergradient, and DSG+), we decrease the step size linearly from α = 10⁻² to α = 10⁻⁴, while for Proximal we employ momentum coefficient μ = 0.3 and, for all layers, linearly increase the weight of the proximal terms from η = 10¹ to η = 5 × 10² (see appendix C).

Figure 2 presents results as a distribution of runtime and optimality gap. WK by Wong and Kolter (2018) performs a single pass over the network per optimization problem, which allows it to be extremely fast, but this comes at the cost of generating looser bounds. At the complete opposite end of the spectrum, Gurobi is extremely slow but provides the best achievable bounds. Dual iterative methods (DSG+, Supergradient, Proximal) allow the user to choose the trade-off between tightness and speed. In order to perform a fair comparison, we fixed the number of iterations for the various methods so that each of them would take the same average time (see Figure 3). This was done by tuning the iteration ratios on a subset of the images for the results in Figures 2, 3. Note that the Lagrangian Decomposition has a higher cost per iteration due to the more expensive computations related to the more complex primal feasible set. The cost of the proximal method is even larger due to the primal-dual updates and the optimal step size computation. For DSG+, Supergradient
Figure 2: Distribution of runtime and gap to optimality on a network adversarially trained with the method by Madry et al. (2018), on CIFAR-10 images. In both cases, lower is better. The width at a given value represents the proportion of problems for which this is the result. Gurobi always returns the optimal solution, so it does not appear in the optimality gap plot, but it always has the highest runtime.

[Figure 2 plot: violin distributions of timing (in s) and gap to best bound for Gurobi, Gurobi-TL, WK, DSG+, Dec-DSG+, Supergradient and Proximal; the iterative methods are shown at two budgets, 520/1040 steps (DSG+, Dec-DSG+), 370/740 steps (Supergradient) and 200/400 steps (Proximal).]

[Figure 3 plot: pointwise scatter of timing (in s) and gap to best bound for DSG+ (1040 steps) vs. Supergradient (740 steps), and for Supergradient (740 steps) vs. Proximal (400 steps).]

Figure 3: Pointwise comparison for a subset of the methods on the data presented in Figure 2. Each datapoint corresponds to a CIFAR image; darker colour shades mean higher point density. The dotted line corresponds to equality; in both graphs, lower is better along both axes.

and Proximal, the improved quality of the bounds as compared to the non-iterative WK shows that there are benefits in actually solving the best relaxation rather than simply evaluating looser bounds. Time-limiting Gurobi at the first iteration which exceeded the cost of 400 Proximal iterations significantly worsens the produced bounds without a comparable cut in runtimes. This is due to the high cost per iteration of the dual simplex algorithm.

By looking at the distributions and at the point-wise comparisons in Figure 3, we can see that Supergradient yields consistently better bounds than DSG+. As both methods employ the Adam update rule (and the same hyper-parameters, which are optimal for both), we can conclude that operating on the Lagrangian Decomposition dual (9) produces better speed-accuracy trade-offs compared to the dual (2) by Dvijotham et al. (2018). This is in line with the expectations from Theorem 1. Furthermore, while a direct application of the Theorem (Dec-DSG+) does improve on the DSG+ bounds, operating on the Decomposition dual (9) is empirically more effective (see pointwise comparisons in appendix G.1). Finally, on average, the proximal algorithm yields better bounds than those returned by Supergradient, further improving on the DSG+ baseline. In particular, we stress that the support of the optimality gap distribution is larger for Proximal, with a heavier tail towards better bounds.

6.4 COMPLETE VERIFICATION

We present results for complete verification. In this setting, our goal is to verify whether a network is robust to ℓ∞ norm perturbations of radius ε_verif. In order to do so, we search for a counter-example (an input point for which the output of the network is not the correct class) by minimizing the difference between the ground truth logit and a randomly chosen logit of images of the CIFAR-10 test set. If the minimum is positive, we have not succeeded in finding a counter-example, and the network is robust. In contrast to the previous section, we now seek to solve a nonconvex problem like (1) (where another layer representing the aforementioned difference has been added at the end) directly, rather than an approximation.

In order to perform the minimization exactly, we employ BaBSR, a Branch and Bound algorithm by Bunel et al. (2020). In short, Branch and Bound divides the verification domain into a number of smaller problems, for which it repeatedly computes upper and lower bounds. At every iteration, BaBSR picks the sub-problem with the lowest lower bound and creates two new sub-problems by fixing the most "impactful" ReLU to be passing or blocking. The impact of a ReLU is determined by estimating the change in the sub-problem's output lower bound caused by making the ReLU non-ambiguous. Sub-problems which cannot contain the global lower bound

[Figure 4 plots: percentage of properties verified vs. computation time (s) for the base, wide and deep models, comparing MIPplanet, Gurobi BaBSR, DSG+ BaBSR, Supergradient BaBSR and Proximal BaBSR.]

Figure 4: Cactus plots for the base, wide and deep models. For each, we compare the bounding methods and complete verification algorithms by plotting the percentage of solved properties as a function of runtime.

Table 1: For base, wide and deep models, we compare average solving time, average number of solved sub-problems and the percentage of timed out properties. The best performing iterative method is highlighted in bold.

                      |            Base                 |            Wide                 |            Deep
Method                | time(s)   sub-problems %Timeout | time(s)   sub-problems %Timeout | time(s)   sub-problems %Timeout
Gurobi BaBSR          | 1588.651  1342.777     10.56    | 2912.246  630.864      50.66    | 3007.237  299.913      54.00
MIPplanet             | 2035.565  --           36.37    | 3118.593  --           80.00    | 2997.115  --           73.60
DSG+ BaBSR            | 426.554   5312.674     6.54     | 1048.594  4518.116     20.00    | 509.484   1992.345     9.60
Supergradient BaBSR   | 377.104   5079.682     5.76     | 920.351   4069.894     18.00    | 390.367   1783.577     7.2
Proximal BaBSR        | 368.876   4569.414     5.82     | 891.163   3343.601     18.00    | 384.984   1655.519     6.8

are progressively discarded.

The computational bottleneck of Branch and Bound is the lower bounds computation, for which we will employ a subset of the methods in section 6.2. Specifically, we want to compare the performance of DSG+, Supergradient and Proximal with Gurobi (as employed in Bunel et al. (2020)). For this purpose, we run the various Branch and Bound implementations on the dataset employed by Lu and Kumar (2020) to test their novel Branch and Bound splitting strategy. The dataset picks a non-correct class and a perturbation radius ε_verif for a subset of the CIFAR-10 test images, and runs Branch and Bound on three different convolutional network architectures of varying size: a "base" network, a "wide" network, and a "deep" network. Further details on the dataset and architectures are provided in appendix G.2.

We compare the Branch and Bound implementations with MIPplanet to provide an additional verification baseline. MIPplanet computes the global lower bound by using Gurobi to solve the Mixed-Integer Linear problem arising from the Planet relaxation (Bunel et al., 2020, Ehlers, 2017). As explained in section 6.1, due to the highly parallelizable nature of the dual iterative algorithms, we are able to compute lower bounds for multiple sub-problems at once for DSG+, Supergradient and Proximal, whereas the domains are solved sequentially for Gurobi. The number of simultaneously solved sub-problems is 300 for the base network, and 200 for the wide and deep networks. Intermediate bound computations (see §6.2) are performed in parallel as well, with smaller batch sizes to account for the larger width of intermediate layers (over which we batch as well). For both supergradient methods (our Supergradient, and DSG+), we decrease the step size linearly from α = 10⁻³ to α = 10⁻⁴, while for Proximal, we do not employ momentum and keep the weight of the proximal terms fixed to η = 10² for all layers throughout the iterations. As in the previous section, the number of iterations for the bounding algorithms is tuned so that they employ roughly the same time: we use 100 iterations for Proximal, 160 for Supergradient, and 260 for DSG+. For all three, the dual variables are initialized from the dual solution of the lower bound computation of the parent node in the Branch and Bound tree. We time-limit the experiments at one hour.

Consistently with the incomplete verification results in the last section, Figure 4 and Table 1 show that Supergradient outperforms DSG+, confirming the benefits of our Lagrangian Decomposition approach. Better bounds decrease the number of sub-problems that Branch and Bound needs to solve, reducing the overall verification time. Furthermore, Proximal provides an additional increase in performance over DSG+, which is visible especially over the properties that are easier to verify. The gap between competing bounding methods increases with the size of the employed network, both in terms of layer width and network depth: at least an additional 2% of the properties times out when using DSG+ on the larger networks. A more detailed experimental analysis of the base model data is presented in appendix G.2.

7 DISCUSSION

We have presented a novel dual approach to compute bounds over the activations of neural networks based on Lagrangian Decomposition. It provides significant benefits compared to off-the-shelf solvers and improves on both looser relaxations and on a previous method based on Lagrangian relaxation. As future work, it remains to investigate whether better complete verification results can be obtained by combining our supergradient and proximal methods. Furthermore, it would be interesting to see if similar derivations could be used to make tighter but more expensive relaxations computationally feasible. Improving the quality of bounds may allow the training of robust networks to avoid over-regularisation.

Acknowledgments

ADP was supported by the EPSRC Centre for Doctoral Training in Autonomous Intelligent Machines and Systems, grant EP/L015987/1. RB wishes to thank Leonard Berrada for helpful discussions on the convex hull of sigmoid activation functions.

References

Anderson, R., Huchette, J., Tjandraatmadja, C., and Vielma, J. P. (2019). Strong mixed-integer programming formulations for trained neural networks. Conference on Integer Programming and Combinatorial Optimization.

Athalye, A., Carlini, N., and Wagner, D. A. (2018). Obfuscated gradients give a false sense of security: circumventing defenses to adversarial examples. International Conference on Machine Learning.

Bach, F. (2015). Duality between subgradient and conditional gradient methods. SIAM Journal on Optimization.

Bertsekas, D. P. and Tsitsiklis, J. N. (1989). Parallel and distributed computation: numerical methods. Prentice Hall, Englewood Cliffs, NJ.

Bunel, R., Lu, J., Turkaslan, I., Torr, P. H., Kohli, P., and Kumar, M. P. (2020). Branch and bound for piecewise linear neural network verification. Journal of Machine Learning Research.

Bunel, R., Turkaslan, I., Torr, P. H., Kohli, P., and Kumar, M. P. (2018). A unified view of piecewise linear neural network verification. Neural Information Processing Systems.

Cheng, C.-H., Nührenberg, G., and Ruess, H. (2017). Maximum resilience of artificial neural networks. Automated Technology for Verification and Analysis.

Dvijotham, K., Stanforth, R., Gowal, S., Mann, T., and Kohli, P. (2018). A dual approach to scalable verification of deep networks. Uncertainty in Artificial Intelligence.

Ehlers, R. (2017). Formal verification of piece-wise linear feed-forward neural networks. Automated Technology for Verification and Analysis.

Frank, M. and Wolfe, P. (1956). An algorithm for quadratic programming. Naval Research Logistics Quarterly.

Goodfellow, I. J., Shlens, J., and Szegedy, C. (2015). Explaining and harnessing adversarial examples. International Conference on Learning Representations.

Gowal, S., Dvijotham, K., Stanforth, R., Bunel, R., Qin, C., Uesato, J., Mann, T., and Kohli, P. (2018). On the effectiveness of interval bound propagation for training verifiably robust models. Workshop on Security in Machine Learning, NeurIPS.

Guignard, M. and Kim, S. (1987). Lagrangean decomposition: A model yielding stronger Lagrangean bounds. Mathematical Programming.

Gurobi Optimization, LLC (2020). Gurobi optimizer reference manual.

Katz, G., Barrett, C., Dill, D., Julian, K., and Kochenderfer, M. (2017). Reluplex: An efficient SMT solver for verifying deep neural networks. International Conference on Computer-Aided Verification.

Kingma, D. P. and Ba, J. (2015). Adam: A method for stochastic optimization. International Conference on Learning Representations.

Lin, H., Mairal, J., and Harchaoui, Z. (2017). Catalyst acceleration for first-order convex optimization: From theory to practice. Journal of Machine Learning Research.

Lu, J. and Kumar, M. P. (2020). Neural network branching for neural network verification. International Conference on Learning Representations.

Madry, A., Makelov, A., Schmidt, L., Tsipras, D., and Vladu, A. (2018). Towards deep learning models resistant to adversarial attacks. International Conference on Learning Representations.

Mirman, M., Gehr, T., and Vechev, M. (2018). Differentiable abstract interpretation for provably robust neural networks. International Conference on Machine Learning.

Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., and Lerer, A. (2017). Automatic differentiation in PyTorch. NIPS Autodiff Workshop.

Raghunathan, A., Steinhardt, J., and Liang, P. S. (2018). Semidefinite relaxations for certifying robustness to adversarial examples. Neural Information Processing Systems.

Salman, H., Yang, G., Zhang, H., Hsieh, C.-J., and Zhang, P. (2019). A convex relaxation barrier to tight robustness verification of neural networks. Neural Information Processing Systems.

Salzo, S. and Villa, S. (2012). Inexact and accelerated proximal point algorithms. Journal of Convex Analysis, 19:1167–1192.

Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., and Fergus, R. (2014). Intriguing properties of neural networks. International Conference on Learning Representations.

Tjeng, V., Xiao, K., and Tedrake, R. (2019). Evaluating robustness of neural networks with mixed integer programming. International Conference on Learning Representations.

Uesato, J., O'Donoghue, B., Oord, A. v. d., and Kohli, P. (2018). Adversarial risk and the dangers of evaluating against weak attacks. International Conference on Machine Learning.

Weng, T.-W., Zhang, H., Chen, H., Song, Z., Hsieh, C.-J., Boning, D., Dhillon, I. S., and Daniel, L. (2018). Towards fast computation of certified robustness for ReLU networks. International Conference on Machine Learning.

Wong, E. and Kolter, Z. (2018). Provable defenses against adversarial examples via the convex outer adversarial polytope. International Conference on Machine Learning.

Xiang, W., Tran, H.-D., and Johnson, T. T. (2017). Output reachable set estimation and verification for multi-layer neural networks. IEEE Transactions on Neural Networks and Learning Systems.


(a) The upper bound is in Case 1, while the lower bound is in the special case of Case 2 with only one piece. (b) The upper bound is in the special case of Case 2 with only one piece, while the lower bound is in Case 1. (c) Both upper and lower bounds are in Case 2, with the two pieces visible.
Figure 5: Convex hull of the sigmoid activation function for different input bound configurations.

A Sigmoid Activation function

This section describes the computations highlighted in the paper in the context where the activation function σ(x) is the sigmoid function:

σ(x) = 1 / (1 + e⁻ˣ)    (18)

A similar methodology to the one described in this section could be used to adapt the method to other activation functions, such as the hyperbolic tangent, but we do not discuss this further. We start with a reminder of some properties of the sigmoid activation function. It takes values between 0 and 1, with σ(0) = 0.5. We can easily compute its derivatives:

σ′(x) = σ(x) (1 − σ(x))
σ″(x) = σ(x) (1 − σ(x)) (1 − 2σ(x))    (19)

If we limit the domain of study to the negative inputs ((−∞, 0]), then the function x ↦ σ(x) is convex, as can be seen by looking at the sign of the second derivative over that domain. Similarly, if we limit the domain of study to the positive inputs ([0, ∞)), the function is concave.

A.1 Convex hull computation

In the context of ReLUs, the convex hull of the activation function is given by equation (4), as introduced by Ehlers (2017). We will now derive it for sigmoid functions. Upper and lower bounds are dealt with in the same way, so our description focuses on how to obtain the concave upper bound, which limits the activation function's convex hull from above. The computation to derive the convex lower bound is analogous. Depending on the range of inputs over which the convex hull is taken, the form of the concave upper bound changes. We distinguish two cases.
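Before the case analysis, the derivative identities in (19) can be sanity-checked numerically against central finite differences (a standalone sketch with our own helper names, not part of the paper's method):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def num_deriv(f, x, h=1e-5):
    # central finite-difference approximation of f'(x)
    return (f(x + h) - f(x - h)) / (2.0 * h)

for x in [-3.0, -1.0, 0.0, 0.5, 2.0]:
    s = sigmoid(x)
    # first identity of (19): sigma' = sigma (1 - sigma)
    assert abs(num_deriv(sigmoid, x) - s * (1.0 - s)) < 1e-8
    # second identity of (19): sigma'' = sigma (1 - sigma) (1 - 2 sigma)
    first = lambda t: sigmoid(t) * (1.0 - sigmoid(t))
    assert abs(num_deriv(first, x) - s * (1.0 - s) * (1.0 - 2.0 * s)) < 1e-8
```

The sign of σ″, positive for x < 0 and negative for x > 0, is what gives the convex/concave split used below.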

Case 1: σ′(u_k) ≥ (σ(u_k) − σ(l_k)) / (u_k − l_k). We will prove that in this case, the upper bound is the line passing through the points (l_k, σ(l_k)) and (u_k, σ(u_k)). Its equation is given by:

φ_{u_k,l_k}(x) = ((σ(u_k) − σ(l_k)) / (u_k − l_k)) (x − l_k) + σ(l_k)    (20)

Consider the function d(x) = φ_{u_k,l_k}(x) − σ(x). To show that φ_{u_k,l_k} is a valid upper bound, we need to prove that ∀x ∈ [l_k, u_k], d(x) ≥ 0. We know that d(l_k) = 0 and d(u_k) = 0, and that d is a continuous function. Its derivative is given by:

d′(x) = (σ(u_k) − σ(l_k)) / (u_k − l_k) − σ(x) (1 − σ(x)).    (21)

To find the roots of d′, we can solve d′(x) = 0 for the value of σ(x) and then use the logit function to recover the value of x. In that case, this is only a second order polynomial, so it can admit at most two roots. We know that lim_{x→∞} d′(x) ≥ 0, and our hypothesis tells us that d′(u_k) ≤ 0. This means that at least one of the roots necessarily lies beyond u_k, and therefore the derivative of d changes sign at most once on the [l_k, u_k] interval. If it never changes sign, d is monotonic; given that the values taken at both extreme points are the same, d being monotonic would imply that d is constant, which is impossible. We therefore conclude that the derivative changes sign exactly once on the interval, and that d is therefore unimodal. As we know that d′(u_k) ≤ 0, this indicates that d is first increasing and then decreasing. As both extreme points have a value of zero, this means that ∀x ∈ [l_k, u_k], d(x) ≥ 0.

From this result, we deduce that φ_{u_k,l_k} is a valid upper bound of σ. As a linear function, it is concave by definition. Given that it constitutes the line between two points on the curve, all of its points necessarily belong to the convex hull. Therefore, it is not possible for a concave upper bound of the activation function to have lower values. This means that φ_{u_k,l_k} defines the upper bounding part of the convex hull of the activation function.
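A small numerical illustration of Case 1 (a hedged sketch; the interval, helper names and tolerance are ours, not the paper's):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def chord_upper_bound(x, lk, uk):
    # Case 1 bound (20): the chord through (l_k, sigma(l_k)) and (u_k, sigma(u_k))
    slope = (sigmoid(uk) - sigmoid(lk)) / (uk - lk)
    return slope * (x - lk) + sigmoid(lk)
```

For instance, on [−6, −1] (where σ is convex, so the Case 1 condition σ′(u_k) ≥ (σ(u_k) − σ(l_k))/(u_k − l_k) holds) the chord dominates σ over the whole interval and coincides with it at both endpoints.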

Case 2: σ′(u_k) ≤ (σ(u_k) − σ(l_k)) / (u_k − l_k). In this case, we will have to decompose the upper bound into two parts, defined as follows:

φ_{u_k,l_k}(x) = { ((σ(t_k) − σ(l_k)) / (t_k − l_k)) (x − l_k) + σ(l_k)   if x ∈ [l_k, t_k]    (22a)
                 { σ(x)                                                  if x ∈ [t_k, u_k],   (22b)

where t_k is defined as the point such that σ′(t_k) = (σ(t_k) − σ(l_k)) / (t_k − l_k) and t_k > 0. The value of t_k can be computed by solving the equation σ(t_k) (1 − σ(t_k)) = (σ(t_k) − σ(l_k)) / (t_k − l_k), which can be done using the Newton-Raphson method or a binary search. Note that this needs to be done only when defining the problem, and not at every iteration of the solver. In addition, the value of t_k depends only on l_k, so it is possible to build a table of results at the desired accuracy beforehand and cache them.
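A minimal sketch of the binary-search option (function names, bracket choices and the assumption l_k < 0 are ours; the paper's implementation may differ):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def find_tangent_point(lk, hi=50.0, iters=200):
    # Bisection for t_k > 0 with sigma'(t_k) = (sigma(t_k) - sigma(l_k)) / (t_k - l_k):
    # the chord from (l_k, sigma(l_k)) becomes tangent to the curve at t_k.
    # Assumes l_k < 0, so the tangency gap changes sign on (0, hi).
    assert lk < 0.0
    sl = sigmoid(lk)

    def gap(t):
        s = sigmoid(t)
        # positive while the curve's slope still exceeds the chord's slope
        return s * (1.0 - s) - (s - sl) / (t - lk)

    lo = 1e-9
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if gap(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Since t_k depends only on l_k, calls to such a routine can be tabulated and cached exactly as described above.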

Evaluating both pieces of the function of equation (22) at t_k shows that φ_{u_k,l_k} is continuous. Both pieces are concave (for x ≥ t_k ≥ 0, σ is concave) and they share a supergradient (the linear function of slope (σ(t_k) − σ(l_k)) / (t_k − l_k)) at t_k, so φ_{u_k,l_k} is a concave function. The proof we did for Case 1 can be duplicated to show that the linear component is the best concave upper bound that can be achieved over the interval [l_k, t_k]. On the interval [t_k, u_k], φ_{u_k,l_k} is equal to the activation function, so it is also an upper bound which cannot be improved upon. Therefore, φ_{u_k,l_k} is the upper bounding part of the convex hull of the activation function.

Note that a special case of this happens when l_k ≥ t_k, in which case φ_{u_k,l_k} consists of only equation (22b). All cases are illustrated in Figure 5. Case 1 is shown in Figure 5a, where the upper bound contains only the linear piece. Case 2 with both segments is visible in Figure 5c, with the cutoff point t_k highlighted by a green dot, and the special case with l_k ≥ t_k is shown in Figure 5b.
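Putting the pieces together, equation (22) and its special case can be sketched as follows (a hedged illustration with our own names; t_k is taken as an input, precomputed once, e.g. by the binary search described above):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def phi_upper(x, lk, tk):
    # Piecewise concave upper bound (22). When l_k >= t_k, only the
    # sigmoid piece (22b) remains; otherwise the chord piece (22a)
    # applies on [l_k, t_k] and the curve itself beyond t_k.
    if lk >= tk or x > tk:
        return sigmoid(x)                                   # piece (22b)
    slope = (sigmoid(tk) - sigmoid(lk)) / (tk - lk)
    return slope * (x - lk) + sigmoid(lk)                   # piece (22a)
```

With the true tangent point t_k, the two pieces join continuously at t_k and the whole function dominates σ on [l_k, u_k].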

A.2 Solving the Pk subproblems over sigmoid activation

As a reminder, the problem that needs to be solved is the following (where c_{k+1} = −ρ_{k+1}, c_k = ρ_k):

[x̂_B,k, x̂_A,k+1] = argmin_{ẑB,k, ẑA,k+1}  c_k^T ẑ_B,k + c_{k+1}^T ẑ_A,k+1
                    s.t.  l_k ≤ ẑ_B,k ≤ u_k    (23)
                          cvx_hull_σ(ẑ_B,k, z_k, l_k, u_k)
                          ẑ_A,k+1 = W_{k+1} z_k + b_{k+1},

where cvx_hull_σ is defined either by equation (20) or (22). In this case, we are still able to compute a closed form solution.

We will denote by φ_{u_k,l_k} and ψ_{u_k,l_k} the upper and lower bound functions defining the convex hull of the sigmoid function, which can be obtained as described in the previous subsection. If we use the last equality constraint of Problem (23) to replace the ẑ_A,k+1 term in the objective function, we obtain c_k^T ẑ_B,k + c_{k+1}^T W_{k+1} z_k. Depending on the sign of c_{k+1}^T W_{k+1}, z_k will take the value of either φ_{u_k,l_k} or ψ_{u_k,l_k}, resulting in the following problem:

min_{ẑB,k}  c_k^T ẑ_B,k + [c_{k+1}^T W_{k+1}]₋ φ_{u_k,l_k}(ẑ_B,k) + [c_{k+1}^T W_{k+1}]₊ ψ_{u_k,l_k}(ẑ_B,k)
s.t.  l_k ≤ ẑ_B,k ≤ u_k.    (24)

To solve this problem, several observations can be made. First of all, at this point the optimisation decomposes completely over the components of ẑ_B,k, so all problems can be solved independently from each other. The second observation is that we can decompose the minimisation over the whole range [l_k, u_k] into a set of minimisations over the separate pieces, and then return the minimum corresponding to the piece producing the lowest value.

The minimisation over a piece can take two forms. Both bounds can be linear (such as between the green dotted lines in Figure 5c), in which case the problem is easy to solve by looking at the signs of the coefficients of the objective function. The other option is that one of the bounds is equal to the activation function (such as in Figures 5a or 5b, or in the outer sections of Figure 5c), leaving us with a problem of the form:

min_{ẑB,k}  c_lin ẑ_B,k + c_σ σ(ẑ_B,k)
s.t.  l ≤ ẑ_B,k ≤ u,    (25)

where l, u, c_lin and c_σ depend on which part of the problem we are trying to solve. This is a convex problem, so the minimum is reached either at an extremum of the considered domain (l or u), or at a point where the derivative of the objective function cancels. This corresponds to the roots of c_lin + c_σ σ(ẑ_B,k)(1 − σ(ẑ_B,k)). Provided that 1 + 4 c_lin / c_σ ≥ 0, the possible roots will be given by σ⁻¹((1 ± √(1 + 4 c_lin / c_σ)) / 2),

with σ⁻¹ being the logit function, the inverse of the sigmoid function: σ⁻¹(x) = log(x / (1 − x)). To solve problem (25), we evaluate its objective function at the extreme points (l and u) and at those roots if they lie in the feasible domain, and return the point achieving the minimum value. With this method, we can solve the P_k subproblems even when the activation function is a sigmoid.

B Intermediate bounds

Intermediate bounds are obtained by solving a relaxation of (1) over subsets of the network. Instead of defining the objective function on the activations of layer n, we define it over ẑ_k, iteratively for k ∈ [1..n−1]. The main drawback of computing intermediate bounds with the same relaxation would be the computational cost of this approach: we need to solve two LPs for each of the neurons in the network, and even if (differently from off-the-shelf LP solvers) our method allows for easy parallelization within each layer, the layers need to be tackled sequentially. Therefore, depending on the computational budget, the employed relaxation for intermediate bounds might be looser than (3) or (2): for instance, the one by Wong and Kolter (2018).

C Implementation details for Proximal method

We provide additional details for the implementation of the proximal method. We start with the optimal step size computation, define our block-coordinate updates for the primal variables, and finally provide some insight on acceleration.

C.1 Optimal Step Size Computation

Having obtained the conditional gradients x̂_k ∀k ∈ [0..n−1] for our problem, we need to decide a step size. Computing the optimal step size amounts to solving a one dimensional quadratic problem:

γ* = argmin_{γ∈[0,1]} L(γ x̂ + (1 − γ) ẑ, ρ).    (26)

Computing the gradient with respect to γ and setting it to 0 returns the optimal step size. Denoting the gradient of the Augmented Lagrangian (17) with respect to (ẑ_B,k − ẑ_A,k) by ∇_(ẑB,k − ẑA,k) L(ẑ, ρ) = ρ_k + (ẑ_B,k − ẑ_A,k)/η_k = g_k, this gives us the following formula:

γ* = clamp_[0,1] ( − [ (x̂_A,n − ẑ_A,n) + Σ_k g_k^T (x̂_B,k − ẑ_B,k − x̂_A,k + ẑ_A,k) ] / [ Σ_k (1/η_k) ‖x̂_B,k − ẑ_B,k − x̂_A,k + ẑ_A,k‖² ] )    (27)

C.2 Gauss-Seidel style updates

At iteration t, the Frank-Wolfe step takes the following form:

ẑ^{t+1} = γ* x̂ + (1 − γ*) ẑ^t    (28)

where the update is performed on the variables associated to all the network neurons at once. As the Augmented Lagrangian (17) is smooth in the primal variables, we can take a conditional gradient step after each x̂_k computation, resulting in Gauss-Seidel style updates (as opposed to the Jacobi style updates of (28)) (Bertsekas and Tsitsiklis, 1989). As the values of the primals at following layers are inter-dependent through the gradient of the Augmented Lagrangian, these block-coordinate updates result in faster convergence. The updates become:

ẑ_k^{t+1} = γ*_k x̂_k + (1 − γ*_k) ẑ_k^t    (29)

where the optimal step size is given by:

γ*_k = clamp_[0,1] ( − [ g_k^T (x̂_B,k − ẑ_B,k) + g_{k+1}^T (ẑ_A,k+1 − x̂_A,k+1) ] / [ (1/η_k) ‖x̂_B,k − ẑ_B,k‖² + (1/η_{k+1}) ‖ẑ_A,k+1 − x̂_A,k+1‖² ] )    (30)

with the special cases x̂_B,0 = ẑ_B,0 = 0 and:

γ*_{n−1} = clamp_[0,1] ( − [ g_{n−1}^T (x̂_B,n−1 − ẑ_B,n−1) + (x̂_A,n − ẑ_A,n) ] / [ (1/η_{n−1}) ‖x̂_B,n−1 − ẑ_B,n−1‖² ] )    (31)

C.3 Momentum

The supergradient methods that we implemented both for our dual (9) and for (2) by Dvijotham et al. (2018) rely on Adam (Kingma and Ba, 2015) to speed up convergence. Inspired by its presence in Adam, and by the work on accelerating proximal methods Lin et al.
(2017), Salzo and Villa (2012), we apply momentum on the proximal updates. By closely looking at equation (16), we can draw a similarity between the dual update in the proximal algorithm and supergradient descent. The difference is that the former operation takes place after a closed-form minimization of the linear inner problem in ẑ_A, ẑ_B, whereas the latter comes after some steps of an optimization algorithm that solves the quadratic form of the Augmented Lagrangian (17). We denote the (approximate) argmin of the Lagrangian at the t-th iteration of the proximal method (method of multipliers) by ẑ^{t,†}. Thanks to the aforementioned similarity, we can keep an exponential average of the gradient-like terms with parameter µ ∈ [0, 1] and adopt momentum for the dual update, yielding:

π_k^{t+1} = µ π_k^t + (ẑ_B,k^{t,†} − ẑ_A,k^{t,†}) / η_k    (32)
ρ_k^{t+1} = ρ_k^t + π_k^{t+1}

As, empirically, a decaying step size has a positive effect on Adam, we adopt the same strategy for the proximal algorithm as well, resulting in an increasing η_k. The Augmented Lagrangian then becomes:

z^t, ẑ^t = argmin_{z,ẑ} L(ẑ, ρ)  s.t.  P_0(z_0, ẑ_A,1); P_k(ẑ_B,k, ẑ_A,k+1)  k ∈ [1..n−1]

where L(ẑ, ρ) = ẑ_A,n + Σ_{k=1..n−1} ρ_k^T (ẑ_B,k − ẑ_A,k) + Σ_{k=1..n−1} (1/(2η_k)) ‖ẑ_B,k − ẑ_A,k‖² − Σ_{k=1..n−1} (η_k/2) ‖µ π_k‖².    (33)
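The dual update with momentum is mechanically simple; a sketch with our own variable names (not the paper's code), where zB/zA stand for the approximate inner minimisers ẑ^{t,†} at the current outer iteration, stored as one array per layer:

```python
import numpy as np

def dual_update_with_momentum(rho, pi, zB, zA, eta, mu=0.3):
    # Momentum dual update (32): pi keeps an exponential average of the
    # gradient-like terms (zB_k - zA_k) / eta_k, and rho takes an ascent step.
    new_rho, new_pi = [], []
    for rho_k, pi_k, zB_k, zA_k, eta_k in zip(rho, pi, zB, zA, eta):
        pi_k = mu * pi_k + (zB_k - zA_k) / eta_k
        new_pi.append(pi_k)
        new_rho.append(rho_k + pi_k)
    return new_rho, new_pi
```

Setting mu = 0 recovers the plain dual update of (16).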

We point out that the Augmented Lagrangian's gradient ∇_(ẑB,k − ẑA,k) L(ẑ, ρ) remains unchanged with respect to the momentum-less solver; hence, in practice, the main change is the dual update formula (32).

D Algorithm summary

We provide high level summaries of our solvers: supergradient (Algorithm 1) and proximal (Algorithm 2). For both methods, we decompose problem (3) into several subsets of constraints, duplicate the variables to make the subproblems independent, and enforce their consistency by introducing dual variables ρ, which we then optimize. This results in problem (9), which we solve using one of the two algorithms. Our initial dual solution is given by the algorithm of Wong and Kolter (2018). The details can be found in Appendix F.

Algorithm 1 Supergradient method

1: function SUBG_COMPUTE_BOUNDS({W_k, b_k, l_k, u_k}_{k=1..n})
2:   Initialise dual variables ρ using the algorithm of Wong and Kolter (2018)
3:   for nb_iterations do
4:     ẑ* ← inner minimization using §5.1.1 and §5.1.2
5:     Compute the supergradient using ∇_ρ q(ρ^t) = ẑ*_B − ẑ*_A
6:     ρ^{t+1} ← Adam's update rule (Kingma and Ba, 2015)
7:   end for
8:   return q(ρ)
9: end function

The structure of our supergradient method is relatively simple. We iteratively minimize over the primal variables (line 4) in order to compute the supergradient, which is then used to update the dual variables through the Adam update (line 6).
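The loop structure of Algorithm 1 can be sketched as follows (not the paper's code: the network-specific inner minimisation of lines 4-5 is abstracted into a `supergradient` callback, and a toy concave quadratic stands in for the dual; the linearly decaying step size mirrors the schedule used in the experiments):

```python
import numpy as np

def adam_supergradient_ascent(supergradient, rho0, steps=2000,
                              alpha0=1e-2, alpha1=1e-4,
                              b1=0.9, b2=0.999, eps=1e-8):
    # Supergradient ascent on the dual with Adam updates (lines 3-7 of Alg. 1).
    rho = rho0.astype(float).copy()
    m = np.zeros_like(rho)
    v = np.zeros_like(rho)
    for t in range(1, steps + 1):
        g = supergradient(rho)                  # lines 4-5: inner min + supergradient
        m = b1 * m + (1 - b1) * g               # biased first moment
        v = b2 * v + (1 - b2) * g ** 2          # biased second moment
        mhat = m / (1 - b1 ** t)
        vhat = v / (1 - b2 ** t)
        alpha = alpha0 + (alpha1 - alpha0) * t / steps   # linear step size decay
        rho += alpha * mhat / (np.sqrt(vhat) + eps)      # ascent: maximise q
    return rho
```

For example, with the stand-in dual q(ρ) = −‖ρ − target‖², whose supergradient is −2(ρ − target), the iterate converges towards `target`.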

Algorithm 2 Proximal method

1: function PROX_COMPUTE_BOUNDS({W_k, b_k, l_k, u_k}_{k=1..n})
2:   Initialise dual variables ρ using the algorithm of Wong and Kolter (2018)
3:   Initialise primal variables ẑ at the conditional gradient of ρ
4:   for nb_outer_iterations do
5:     Update dual variables using equation (16) or (32)
6:     for nb_inner_iterations do
7:       for k ∈ [0..n−1] do
8:         Compute layer conditional gradient x̂_k using §5.1.1 and §5.1.2
9:         Compute layer optimal step size γ*_k using (27)
10:        Update layer primal variables: ẑ_k = γ*_k x̂_k + (1 − γ*_k) ẑ_k
11:       end for
12:     end for
13:   end for
14:   return q(ρ)
15: end function

The proximal method instead alternates updates to the dual variables (applying the updates from (16); line 5) and to the primal variables (updates of (17); lines 6 to 12). The inner minimization problem is solved only approximately, using the Frank-Wolfe algorithm (line 10), applied in a block-coordinate fashion over the network layers (line 7). We have closed form solutions for both the conditional gradient (line 8) and the optimal step size (line 9).
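The closed-form step size of (27) is a one-line quadratic minimisation once the per-layer quantities are available; a sketch with our own variable names (xB/xA are the conditional gradients, zB/zA the current primals as lists of per-layer arrays, and xA_n/zA_n the scalar last-layer terms):

```python
import numpy as np

def clamp01(x):
    return float(min(max(x, 0.0), 1.0))

def optimal_step_size(xB, xA, zB, zA, rho, eta, xA_n, zA_n):
    # Optimal Frank-Wolfe step size (27): the Lagrangian restricted to the
    # segment gamma * xhat + (1 - gamma) * zhat is a 1-D convex quadratic,
    # minimised in closed form and clamped to [0, 1].
    num = xA_n - zA_n                                  # linear term from zhat_{A,n}
    den = 0.0
    for k in range(len(xB)):
        g_k = rho[k] + (zB[k] - zA[k]) / eta[k]        # gradient of (17) w.r.t. (zB_k - zA_k)
        d_k = xB[k] - zB[k] - xA[k] + zA[k]
        num += float(g_k @ d_k)
        den += float(d_k @ d_k) / eta[k]
    return clamp01(-num / den) if den > 0.0 else 0.0
```

Because the restricted objective is convex in γ, clamping the unconstrained minimiser −num/den to [0, 1] yields the constrained optimum.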

E Comparison between the duals

We will show the relation between the decomposition done by Dvijotham et al. (2018) and the one proposed in this paper. We will show that, in the context of ReLU activation functions, for any solution of their dual providing a bound, our dual provides a bound at least as tight.

Theorem. Let us assume σ(ˆzk) = max(0, ˆzk). For any solution µ, λ of dual (2) by Dvijotham et al. (2018) yielding bound d(µ, λ), it holds that q(µ) ≥ d(µ, λ).

Proof. We start by reproducing the formulation that Dvijotham et al. (2018) use, which we slightly modify in order to have a notation more comparable to ours¹. With our convention, performing the decomposition to obtain Problem (6) in (Dvijotham et al., 2018) would give:

min_{z,ẑ}  W_n z_{n−1} + b_n + Σ_{k=1}^{n−1} µ_k^T (ẑ_k − W_k z_{k−1} − b_k) + Σ_{k=1}^{n−1} λ_k^T (z_k − σ(ẑ_k))
such that  l_k ≤ ẑ_k ≤ u_k          for k ∈ [1, n−1]    (34)
           σ(l_k) ≤ z_k ≤ σ(u_k)    for k ∈ [1, n−1]
           z_0 ∈ C

¹ Our activation function σ is denoted h in their paper. The equivalent of x in their paper is z in ours, while the equivalent of their z is ẑ. Also note that their paper uses the computation of upper bounds as examples while ours uses the computation of lower bounds.

Decomposing it, in the same way that Dvijotham et al. (2018) do in order to obtain their equation (7), we obtain

b_n − Σ_{k=1}^{n−1} µ_k^T b_k + Σ_{k=1}^{n−1} min_{ẑk ∈ [lk, uk]} ( µ_k^T ẑ_k − λ_k^T σ(ẑ_k) )
+ Σ_{k=1}^{n−1} min_{zk ∈ [σ(lk), σ(uk)]} ( −µ_{k+1}^T W_{k+1} z_k + λ_k^T z_k ) + min_{z0 ∈ C} −µ_1^T W_1 z_0    (35)

(with the convention that µ_n = −I). As a reminder, the formulation of our dual (9) is:

q(ρ) = min_{z,ẑ}  ẑ_A,n + Σ_{k=1..n−1} ρ_k^T (ẑ_B,k − ẑ_A,k)
       s.t.  P_0(z_0, ẑ_A,1); P_k(ẑ_B,k, ẑ_A,k+1)  for k ∈ [1..n−1]    (36)

which can be decomposed as:

q(ρ) = Σ_{k=1}^{n−1} min_{(ẑB,k, ẑA,k+1) ∈ Pk(·,·)} ( ρ_k^T ẑ_B,k − ρ_{k+1}^T ẑ_A,k+1 ) + min_{(z0, ẑA,1) ∈ P0(·,·)} −ρ_1^T ẑ_A,1,    (37)

with the convention that ρ_n = −I. We will show that when we choose the dual variables such that

$$\rho_k = \mu_k, \tag{38}$$

we obtain a tighter bound than the one given by (35).

We start by showing that the term optimised over $\mathcal{P}_0$ is equivalent to some of the terms in (35):

$$\min_{(z_0,\, \hat{z}_{A,1}) \in \mathcal{P}_0} -\rho_1^T \hat{z}_{A,1} = \min_{(z_0,\, \hat{z}_{A,1}) \in \mathcal{P}_0} -\mu_1^T \hat{z}_{A,1} = -\mu_1^T b_1 + \min_{z_0 \in \mathcal{C}} -\mu_1^T W_1 z_0 \tag{39}$$

The first equality is given by the substitution (38), while the second is obtained by replacing $\hat{z}_{A,1}$ with its expression in $\mathcal{P}_0$.

We now derive a lower bound on the term optimised over $\mathcal{P}_k$. We start by plugging in the values using formula (38), and replace $\hat{z}_{A,k+1}$ using the constraint:

$$\min_{(\hat{z}_{B,k},\, \hat{z}_{A,k+1}) \in \mathcal{P}_k} \left( \rho_k^T \hat{z}_{B,k} - \rho_{k+1}^T \hat{z}_{A,k+1} \right) = -\mu_{k+1}^T b_{k+1} + \min_{\substack{\hat{z}_{B,k} \in [l_k, u_k] \\ \mathrm{cvx\_hull}(\hat{z}_{B,k},\, z_k)}} \left( \mu_k^T \hat{z}_{B,k} - \mu_{k+1}^T W_{k+1} z_k \right) \tag{40}$$

Focusing on the second term, which contains the minimization over the convex hull, we derive a lower bound. It is important, at this stage, to recall that, as seen in Section 5.1.2, the minimum of the second term can be attained either at one of the three vertices of the triangle in Figure 1 (ambiguous ReLU), on the $\hat{z}_{B,k} = z_k$ line (passing ReLU), or at the $(\hat{z}_{B,k} = 0,\, z_k = 0)$ triangle vertex (blocking ReLU). We write $(\hat{z}_{B,k}, z_k) \in \mathrm{ReLU\_sol}(\hat{z}_{B,k}, z_k, l_k, u_k)$.

We can add a term $\lambda_k^T \left(\sigma(\hat{z}_k) - \sigma(\hat{z}_k)\right) = 0$ and obtain:

$$\begin{aligned}
&\min_{\substack{\hat{z}_{B,k} \in [l_k, u_k] \\ \mathrm{cvx\_hull}(\hat{z}_{B,k},\, z_k)}} \left( \mu_k^T \hat{z}_{B,k} - \mu_{k+1}^T W_{k+1} z_k \right) \\
&\quad= \min_{\substack{\hat{z}_{B,k} \in [l_k, u_k] \\ \mathrm{ReLU\_sol}(\hat{z}_{B,k},\, z_k,\, l_k,\, u_k)}} \left( \mu_k^T \hat{z}_{B,k} - \mu_{k+1}^T W_{k+1} z_k \right) \\
&\quad= \min_{\substack{\hat{z}_{B,k} \in [l_k, u_k] \\ \mathrm{ReLU\_sol}(\hat{z}_{B,k},\, z_k,\, l_k,\, u_k)}} \left( \mu_k^T \hat{z}_{B,k} - \mu_{k+1}^T W_{k+1} z_k + \lambda_k^T \left(\sigma(\hat{z}_{B,k}) - \sigma(\hat{z}_{B,k})\right) \right) \\
&\quad\geq \min_{\substack{\hat{z}_{B,k} \in [l_k, u_k] \\ \mathrm{ReLU\_sol}(\hat{z}_{B,k},\, z_k,\, l_k,\, u_k)}} \left( \mu_k^T \hat{z}_{B,k} - \lambda_k^T \sigma(\hat{z}_{B,k}) \right) + \min_{\substack{\hat{z}_{B,k} \in [l_k, u_k] \\ \mathrm{ReLU\_sol}(\hat{z}_{B,k},\, z_k,\, l_k,\, u_k)}} \left( -\mu_{k+1}^T W_{k+1} z_k + \lambda_k^T \sigma(\hat{z}_{B,k}) \right) \\
&\quad\geq \min_{\hat{z}_{B,k} \in [l_k, u_k]} \left( \mu_k^T \hat{z}_{B,k} - \lambda_k^T \sigma(\hat{z}_{B,k}) \right) + \min_{z_k \in [\sigma(l_k), \sigma(u_k)]} \left( -\mu_{k+1}^T W_{k+1} z_k + \lambda_k^T z_k \right)
\end{aligned} \tag{41}$$

The equality between the second and third lines comes from the fact that we are adding a term equal to zero. The inequality between the third and fourth lines is due to the fact that a sum of minima is lower than the minimum of the sum. For the final line, the first term comes from projecting $z_k$ out of the feasible domain and taking the convex hull of the resulting domain. The second term requires a closer look.
Plugging in the ReLU formula:

$$\begin{aligned}
&\min_{\substack{\hat{z}_{B,k} \in [l_k, u_k] \\ \mathrm{ReLU\_sol}(\hat{z}_{B,k},\, z_k,\, l_k,\, u_k)}} \left( -\mu_{k+1}^T W_{k+1} z_k + \lambda_k^T \max\{0, \hat{z}_{B,k}\} \right) \\
&\quad= \min_{\substack{\hat{z}_{B,k} \in [\sigma(l_k), \sigma(u_k)] \\ \mathrm{ReLU\_sol}(\hat{z}_{B,k},\, z_k,\, l_k,\, u_k)}} \left( -\mu_{k+1}^T W_{k+1} z_k + \lambda_k^T \hat{z}_{B,k} \right) \\
&\quad\geq \min_{z_k \in [\sigma(l_k), \sigma(u_k)]} \left( -\mu_{k+1}^T W_{k+1} z_k + \lambda_k^T z_k \right),
\end{aligned}$$

as (keeping in mind the shape of $\mathrm{ReLU\_sol}$, and for the purposes of this specific problem) excluding the negative part of the $\hat{z}_{B,k}$ domain does not alter the minimal value. The final line then follows by observing that forcing $\hat{z}_{B,k} = z_k$ is a convex relaxation of the positive part of $\mathrm{ReLU\_sol}$.

Summing up the lower bounds for all the terms in (37), as given by equations (39) and (41), we obtain the formulation of Problem (35). We conclude that the bound obtained by Dvijotham et al. (2018) is necessarily no larger than the bound derived using our dual. Given that we are computing lower bounds, this means that their bound is looser. ∎

F Initialisation of the dual

We now show that the solution generated by the method of Wong and Kolter (2018) can be used to initialise our dual (9), even though the two duals were derived differently: theirs arises from the Lagrangian dual of the Planet (Ehlers, 2017) relaxation, while ours comes from Lagrangian decomposition. We will show that a solution extracted from theirs leads to the exact same bound they generate.

As a reminder, we reproduce results proven by Wong and Kolter (2018). In order to obtain the bound, the problem they solve is:

$$\min_{\hat{z}_n} \; c^T \hat{z}_n \quad \text{such that} \quad \hat{z}_n \in \tilde{Z}(x), \tag{42}$$

When $c$ is simply an identity matrix (so that we optimise the bounds of each neuron), we recover problem (3). $\tilde{Z}(x)$ corresponds to the constraints (3b) to (3c). Based on Theorem 1 and equations (8) to (10) of Wong and Kolter (2018), the dual problem they derive has the objective function²

$$-\sum_{i=1}^{n-1} \nu_i^T b_i - x^T \hat{\nu}_0 - \epsilon \|\hat{\nu}_0\|_1 + \sum_{i=1}^{n-1} \sum_{j \in \mathcal{I}_i} l_{i,j} \left[\nu_{i,j}\right]_+ \tag{43}$$

with a solution given by

$$\nu_n = -c, \qquad \hat{\nu}_i = -W_{i+1}^T D_{i+2} W_{i+2}^T \cdots D_n W_n^T c, \tag{44}$$

$$\nu_i = D_i \hat{\nu}_i,$$

where $D_i$ is a diagonal matrix with entries

$$(D_i)_{jj} = \begin{cases} 0 & j \in \mathcal{I}_i^- \\ 1 & j \in \mathcal{I}_i^+ \\ \dfrac{u_{i,j}}{u_{i,j} - l_{i,j}} & j \in \mathcal{I}_i \end{cases} \tag{45}$$

and $\mathcal{I}_i^-$, $\mathcal{I}_i^+$, $\mathcal{I}_i$ correspond respectively to blocked ReLUs (with always-negative inputs), passing ReLUs (with always-positive inputs) and ambiguous ReLUs.
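In code, the diagonal entries of $D_i$ in (45) are a simple elementwise function of the preactivation bounds. The sketch below is our illustration (the function name is hypothetical, not from the paper's code):

```python
import numpy as np

def wk_diag(l, u):
    """Diagonal entries of D_i in (45) from preactivation bounds l, u.

    Blocked ReLUs (u <= 0) get 0, passing ReLUs (l >= 0) get 1,
    ambiguous ReLUs (l < 0 < u) get u / (u - l).
    """
    d = np.zeros_like(l, dtype=float)     # blocked ReLUs default to 0
    d[l >= 0] = 1.0                       # passing ReLUs
    amb = (l < 0) & (u > 0)               # ambiguous ReLUs
    d[amb] = u[amb] / (u[amb] - l[amb])
    return d
```

For example, `wk_diag(np.array([-1., 1., -2.]), np.array([1., 2., -1.]))` gives `[0.5, 1.0, 0.0]`: one ambiguous, one passing and one blocked ReLU.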

²We adjust the indexing slightly because Wong and Kolter denote the input $z_1$, while we refer to it as $z_0$.

We will now prove that, by taking as our solution of (9)

$$\rho_k = \nu_k, \tag{46}$$

we obtain exactly the same bound. The problem we are solving is (9), so, given a choice of $\rho$, the bound we generate is:

$$\min_{z,\hat{z}} \; \hat{z}_{A,n} + \sum_{k=1}^{n-1} \rho_k^T \left(\hat{z}_{B,k} - \hat{z}_{A,k}\right) \tag{47}$$

$$\text{s.t.} \quad \mathcal{P}_0(z_0, \hat{z}_{A,1}); \; \mathcal{P}_k(\hat{z}_{B,k}, \hat{z}_{A,k+1}) \text{ for } k \in [1..n-1],$$

which can be decomposed into several subproblems:

$$\sum_{k=1}^{n-1} \min_{(\hat{z}_{B,k},\, \hat{z}_{A,k+1}) \in \mathcal{P}_k} \left( \rho_k^T \hat{z}_{B,k} - \rho_{k+1}^T \hat{z}_{A,k+1} \right) + \min_{(z_0,\, \hat{z}_{A,1}) \in \mathcal{P}_0} -\rho_1^T \hat{z}_{A,1}, \tag{48}$$

where we take the convention, consistent with (46), that $\rho_n = \nu_n = -c = -I$.

Let us start by evaluating the problem over $\mathcal{P}_0$, having replaced $\rho_1$ with $\nu_1$ in accordance with (46):

$$\min_{z_0,\, \hat{z}_{A,1}} \; -\nu_1^T \hat{z}_{A,1} \quad \text{such that} \quad \hat{z}_{A,1} = W_1 z_0 + b_1, \tag{49}$$

$$z_0 \in \mathcal{C},$$

where, in the context of Wong and Kolter (2018), $\mathcal{C}$ is defined by the bounds $l_0 = x - \epsilon$ and $u_0 = x + \epsilon$. We already explained how to solve this subproblem in the context of our Frank-Wolfe optimisation (see Equation (12)). Plugging the solution into the objective function gives:

$$\begin{aligned}
\min_{(z_0,\, \hat{z}_{A,1}) \in \mathcal{P}_0} -\rho_1^T \hat{z}_{A,1} &= -\nu_1^T W_1 \left( \mathbb{1}_{\nu_1^T W_1 > 0} \cdot u_0 + \mathbb{1}_{\nu_1^T W_1 \leq 0} \cdot l_0 \right) - \nu_1^T b_1 \\
&= -\hat{\nu}_0^T \left( \mathbb{1}_{\hat{\nu}_0 > 0} \cdot u_0 + \mathbb{1}_{\hat{\nu}_0 \leq 0} \cdot l_0 \right) - \nu_1^T b_1 \\
&= -\hat{\nu}_0^T x - \epsilon\, \hat{\nu}_0^T \mathbb{1}_{\hat{\nu}_0 > 0} + \epsilon\, \hat{\nu}_0^T \mathbb{1}_{\hat{\nu}_0 \leq 0} - \nu_1^T b_1 \\
&= -\hat{\nu}_0^T x - \epsilon \|\hat{\nu}_0\|_1 - \nu_1^T b_1
\end{aligned} \tag{50}$$
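The box-constrained linear minimisation used in (50) (and in Equation (12)) picks, per coordinate, whichever bound minimises the linear term. A minimal sketch, with a hypothetical function name of our choosing:

```python
import numpy as np

def min_linear_over_box(c, l, u):
    """Closed-form value of min_{l <= z <= u} c^T z.

    Each coordinate independently picks l where c > 0 and u where c <= 0.
    """
    return float(np.where(c > 0, c * l, c * u).sum())
```

With $c = -\hat{\nu}_0$, $l = x - \epsilon$ and $u = x + \epsilon$, this evaluates to $-\hat{\nu}_0^T x - \epsilon \|\hat{\nu}_0\|_1$, the first two terms of (50).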

We now evaluate the value of the problem over $\mathcal{P}_k$. Once again, we replace $\rho$ with the appropriate values of $\nu$ in accordance with (46):

$$\begin{aligned}
\min_{\hat{z}_{B,k},\, \hat{z}_{A,k+1}} \; & \nu_k^T \hat{z}_{B,k} - \nu_{k+1}^T \hat{z}_{A,k+1} \\
\text{s.t.} \quad & l_k \leq \hat{z}_{B,k} \leq u_k, \\
& \mathrm{cvx\_hull}_\sigma(\hat{z}_{B,k}, z_k, l_k, u_k),
\end{aligned} \tag{51}$$

$$\hat{z}_{A,k+1} = W_{k+1} z_k + b_{k+1}.$$

We can rephrase the problem so that it decomposes completely over the ReLUs, by merging the last equality constraint into the objective function:

$$\begin{aligned}
\min_{\hat{z}_{B,k},\, z_k} \; & \nu_k^T \hat{z}_{B,k} - \nu_{k+1}^T W_{k+1} z_k - \nu_{k+1}^T b_{k+1} \\
\text{s.t.} \quad & l_k \leq \hat{z}_{B,k} \leq u_k,
\end{aligned} \tag{52}$$

$$\mathrm{cvx\_hull}_\sigma(\hat{z}_{B,k}, z_k, l_k, u_k).$$

Note that $\nu_{k+1}^T W_{k+1} = \hat{\nu}_k^T$. We now omit the term $-\nu_{k+1}^T b_{k+1}$, which no longer depends on the optimisation variables and will therefore be passed down directly to the value of the bound. For the rest of the problem, we distinguish the different cases of ReLU: $\mathcal{I}^+$, $\mathcal{I}^-$ and $\mathcal{I}$.

In the case of a ReLU $j \in \mathcal{I}^+$, we have $\mathrm{cvx\_hull}_\sigma(\dots) \equiv \hat{z}_{B,k}[j] = z_k[j]$, so the replacement of $z_k[j]$ makes the objective-function term for $j$ become $(\nu_k[j] - \hat{\nu}_k[j])\, \hat{z}_{B,k}[j]$. Given that $j \in \mathcal{I}^+$, the corresponding element of $D_k$ is 1, so $\nu_k[j] = \hat{\nu}_k[j]$, which means that the term in the objective function is zero.

In the case of a ReLU $j \in \mathcal{I}^-$, we have $\mathrm{cvx\_hull}_\sigma(\dots) \equiv z_k[j] = 0$, so the replacement of $z_k[j]$ makes the objective-function term for $j$ become $\nu_k[j]\, \hat{z}_{B,k}[j]$. Given that $j \in \mathcal{I}^-$, the corresponding element of $D_k$ is 0, so $\nu_k[j] = 0$. This means that the term in the objective function is zero.

The case of a ReLU $j \in \mathcal{I}$ is more complex. We apply the same strategy of picking the appropriate inequalities in $\mathrm{cvx\_hull}$ as we did to obtain Equation (15), which leads to:

$$\begin{aligned}
\min_{\hat{z}_{B,k}} \; & \left( \nu_k[j] - \frac{u_k[j]}{u_k[j] - l_k[j]} \left[\hat{\nu}_k[j]\right]_+ \right) \hat{z}_{B,k}[j] \;-\; \left[\hat{\nu}_k[j]\right]_- \max(\hat{z}_{B,k}[j], 0) \;+\; \frac{u_k[j]}{u_k[j] - l_k[j]} \left[\hat{\nu}_k[j]\right]_+ l_k[j] \\
\text{s.t.} \quad & l_k \leq \hat{z}_{B,k} \leq u_k
\end{aligned} \tag{53}$$

Once again, we can recognize that the coefficient in front of $\hat{z}_{B,k}[j]$ (the first term of the objective function) is equal to 0 by construction of the values of $\nu$ and $\hat{\nu}$. The coefficient in front of $\max(\hat{z}_{B,k}[j], 0)$ in the second term is necessarily non-negative. Therefore, the minimum value of this problem is the constant third term (reachable at $\hat{z}_{B,k}[j] = 0$, which is feasible because $j \in \mathcal{I}$). The value of this constant term is:

$$\frac{u_k[j]}{u_k[j] - l_k[j]} \left[\hat{\nu}_k[j]\right]_+ l_k[j] = \left[\nu_k[j]\right]_+ l_k[j] \tag{54}$$

Regrouping the different cases and the constant term of problem (52) that we ignored, we find that the value of the term corresponding to the $\mathcal{P}_k$ subproblem is:

$$-\nu_{k+1}^T b_{k+1} + \sum_{j \in \mathcal{I}} \left[\nu_k[j]\right]_+ l_k[j]. \tag{55}$$

Plugging all the optimisation results into Equation (48), we obtain:

$$\begin{aligned}
&\sum_{k=1}^{n-1} \min_{\mathcal{P}_k} \left( \rho_k^T \hat{z}_{B,k} - \rho_{k+1}^T \hat{z}_{A,k+1} \right) + \min_{\mathcal{P}_0} -\rho_1^T \hat{z}_{A,1} \\
&\quad= \sum_{k=1}^{n-1} \left( -\nu_{k+1}^T b_{k+1} + \sum_{j \in \mathcal{I}} \left[\nu_k[j]\right]_+ l_k[j] \right) - \hat{\nu}_0^T x - \epsilon \|\hat{\nu}_0\|_1 - \nu_1^T b_1 \\
&\quad= -\sum_{k=1}^{n-1} \nu_k^T b_k - \hat{\nu}_0^T x - \epsilon \|\hat{\nu}_0\|_1 + \sum_{k=1}^{n-1} \sum_{j \in \mathcal{I}} \left[\nu_k[j]\right]_+ l_k[j],
\end{aligned} \tag{56}$$

which is exactly the value given by Wong and Kolter (2018), as in Equation (43).

G Supplementary experiments

We now complement the results presented in the main paper and provide additional information on the experimental setting. We start from incomplete verification (§G.1) and then move on to complete verification (§G.2).

G.1 Incomplete Verification

Figure 6 presents additional pointwise comparisons for the distributional results shown in Figure 2. The comparison between Dec-DSG+ and DSG+ shows that the inequality in Theorem 1 is strict for most of the obtained dual solutions, and that operating directly on problem (9) is preferable in the vast majority of cases. For completeness, we also include a comparison between WK and Interval Propagation (IP), showing that the latter yields significantly worse bounds on most images, which is why we omitted it in the main body of the paper.
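Interval Propagation (IP) simply pushes elementwise lower and upper bounds through each layer, splitting the weights into their positive and negative parts. A minimal sketch of the idea (our illustration with hypothetical names, not the experimental code):

```python
import numpy as np

def interval_affine(W, b, l, u):
    """Propagate elementwise bounds [l, u] through the affine map z -> W z + b."""
    Wp, Wm = np.clip(W, 0, None), np.clip(W, None, 0)  # positive / negative parts
    return Wp @ l + Wm @ u + b, Wp @ u + Wm @ l + b

def interval_relu(l, u):
    """Propagate elementwise bounds through a ReLU."""
    return np.maximum(l, 0), np.maximum(u, 0)
```

For $W = [1, -1]$, $b = 0$ and inputs in $[0, 1]^2$, `interval_affine` returns bounds $(-1, 1)$. Such bounds ignore all cross-neuron correlations, which is why IP is generally much looser than the dual methods compared above.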

[Figure: scatter plots of timing (in s) and gap to best bound, comparing DSG+ 1040 steps vs Dec-DSG+ 1040 steps, Dec-DSG+ 1040 steps vs Supergradient 740 steps, and IP vs WK.]

Figure 6: Additional pointwise comparison for a subset of the methods in Figure 2, along with the omitted Interval Propagation (IP). Each datapoint corresponds to a CIFAR image; darker colour shades mean higher point density. The dotted line corresponds to equality and, in both graphs, lower is better along both axes.

[Figure: violin plots of time (in s) and gap to best bound for Gurobi, Gurobi-TL, WK, DSG+ 520 steps, Dec-DSG+ 520 steps, Supergradient 370 steps, Proximal 200 steps, DSG+ 1040 steps, Dec-DSG+ 1040 steps, Supergradient 740 steps and Proximal 400 steps.]

Figure 7: Comparison of the distribution of runtime and gap to optimality on a normally trained network. In both cases, lower is better. The width at a given value represents the proportion of problems for which this is the result. Note that Gurobi always returns the optimal solution, so it does not appear in the optimality-gap plot, but it always has the highest runtime.

Figures 7, 8 and 9 show experiments (see Section 6.3) for a network trained using standard stochastic gradient descent and cross-entropy, with no robustness objective. We employ ε_verif = 5/255. Most observations made in the main paper for the adversarially trained network (Figures 2 and 3) still hold true: in particular, the advantage of the Lagrangian-Decomposition-based dual over the dual by Dvijotham et al. (2018) is even more marked than in Figure 3. In fact, for a subset of the properties, DSG+ returns rather loose bounds even after 1040 iterations. The proximal algorithm returns better bounds than the supergradient method in this case as well, even though the larger support of the optimality-gap distribution and the larger runtime might reduce its advantages.

[Figure: scatter plots of timing (in s) and gap to best bound, comparing DSG+ 1040 steps vs Supergradient 740 steps, and Supergradient 740 steps vs Proximal 400 steps.]

Figure 8: Pointwise comparison for a subset of the methods in Figure 7. Each datapoint corresponds to a CIFAR image; darker colour shades mean higher point density. The dotted line corresponds to equality and, in both graphs, lower is better along both axes.

[Figure: scatter plots of timing (in s) and gap to best bound, comparing DSG+ 1040 steps vs Dec-DSG+ 1040 steps, Dec-DSG+ 1040 steps vs Supergradient 740 steps, and IP vs WK.]

Figure 9: Additional pointwise comparison for a subset of the methods in Figure 7, along with the omitted Interval Propagation (IP). Each datapoint corresponds to a CIFAR image; darker colour shades mean higher point density. The dotted line corresponds to equality and, in both graphs, lower is better along both axes.

G.1.1 Network Architecture

We now outline the network architecture for both the adversarially trained and the normally trained network experiments.

Conv2D(channels=16, kernel_size=4, stride=2)
Conv2D(channels=32, kernel_size=4, stride=2)
Linear(channels=100)
Linear(channels=10)

Table 2: Network architecture for incomplete verification, from Wong and Kolter (2018). Each layer but the last is followed by a ReLU activation function.

G.2 Complete Verification

As done by Lu and Kumar (2020), we provide a more in-depth analysis of the base-model results presented in Figure 10. Depending on the time tᵢ that the Gurobi baseline employed to verify them, the properties have been divided into an "easy" (tᵢ < 800), a "medium" and a "hard" (tᵢ > 2400) set. The relative performance of the iterative dual methods remains unchanged, with the initial gap between our two methods increasing with the difficulty of the problem.

Table 3: For easy, medium and hard verification properties on the base model, we compare average solving time, average number of solved sub-problems (on the properties that did not time out) and the percentage of timed-out properties.

                      |            Easy                |           Medium               |            Hard
Method                | time(s)   sub-probs  %Timeout  | time(s)   sub-probs  %Timeout  | time(s)   sub-probs   %Timeout
GUROBI-BABSR          | 545.720   580.428    0.0       | 1370.395  1408.745   0.0       | 3127.995  2562.872    41.31
MIPPLANET             | 1499.375  --         16.48     | 2240.980  --         42.94     | 2250.624  --          46.24
DSG+ BABSR            | 112.100   2030.362   0.85      | 177.805   4274.811   0.52      | 1222.632  12444.449   23.71
SUPERGRADIENT BABSR   | 85.004    1662.000   0.64      | 136.853   3772.679   0.26      | 1133.264  12821.481   21.36
PROXIMAL BABSR        | 73.606    1377.666   0.42      | 129.537   3269.535   0.26      | 1126.856  12035.994   21.83

[Figure: three cactus plots (base model: easy, medium and hard properties) showing the percentage of properties verified as a function of computation time (s) for MIPplanet, Gurobi BaBSR, Proximal BaBSR, Supergradient BaBSR and DSG+ BaBSR.]

Figure 10: Cactus plots for the base model, with properties of varying difficulty. For each, we compare the bounding methods and complete verification algorithms by plotting the percentage of solved properties as a function of runtime.

G.2.1 Network Architectures

Finally, we describe the properties and network architectures for the dataset by Lu and Kumar (2020).

Network Name   No. of Properties   Network Architecture

BASE Model     Easy: 467           Conv2d(3, 8, 4, stride=2, padding=1)
               Medium: 773         Conv2d(8, 16, 4, stride=2, padding=1)
               Hard: 426           linear layer of 100 hidden units
                                   linear layer of 10 hidden units
                                   (Total ReLU activation units: 3172)

WIDE           300                 Conv2d(3, 16, 4, stride=2, padding=1)
                                   Conv2d(16, 32, 4, stride=2, padding=1)
                                   linear layer of 100 hidden units
                                   linear layer of 10 hidden units
                                   (Total ReLU activation units: 6244)

DEEP           250                 Conv2d(3, 8, 4, stride=2, padding=1)
                                   Conv2d(8, 8, 3, stride=1, padding=1)
                                   Conv2d(8, 8, 3, stride=1, padding=1)
                                   Conv2d(8, 8, 4, stride=2, padding=1)
                                   linear layer of 100 hidden units
                                   linear layer of 10 hidden units
                                   (Total ReLU activation units: 6756)

Table 4: For each complete verification experiment, the network architecture used and the number of verification properties tested, from the dataset by Lu and Kumar (2020). Each layer but the last is followed by a ReLU activation function.
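As a sanity check on the ReLU counts in Table 4, the totals can be recovered from the standard convolution output-size formula, counting one ReLU per conv-output activation plus the 100 hidden linear units (the helper names below are ours, for illustration):

```python
def conv_out(size, kernel, stride, padding):
    """Spatial output size of a convolution on a square input."""
    return (size + 2 * padding - kernel) // stride + 1

def count_relus(convs, in_size=32, linear_hidden=100):
    """ReLU count for a (conv stack + one hidden linear layer) CIFAR network.

    `convs` lists (out_channels, kernel, stride, padding) per conv layer.
    """
    h, total = in_size, 0
    for out_ch, k, s, p in convs:
        h = conv_out(h, k, s, p)
        total += out_ch * h * h     # one ReLU per conv-output activation
    return total + linear_hidden    # plus the hidden linear layer

base = [(8, 4, 2, 1), (16, 4, 2, 1)]                              # -> 3172
wide = [(16, 4, 2, 1), (32, 4, 2, 1)]                             # -> 6244
deep = [(8, 4, 2, 1), (8, 3, 1, 1), (8, 3, 1, 1), (8, 4, 2, 1)]   # -> 6756
```

`count_relus(base)` returns 3172, matching the table; the wide and deep stacks give 6244 and 6756 respectively.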