Model Learning and Model Checking of SSH Implementations
Paul Fiterău-Broştean∗, Toon Lenaerts, Erik Poll, Joeri de Ruiter, Frits Vaandrager, Patrick Verleg
Radboud University Nijmegen

∗Supported by NWO project 612.001.216, Active Learning of Security Protocols (ALSEP).

ABSTRACT
We apply model learning on three SSH implementations to infer state machine models, and then use model checking to verify that these models satisfy basic security properties and conform to the RFCs. Our analysis showed that all tested SSH server models satisfy the stated security properties, but uncovered several violations of the standard.

CCS CONCEPTS
• Networks → Protocol correctness; • Theory of computation → Active learning; • Software and its engineering → Model checking; • Security and privacy → Logic and verification; Network security;

KEYWORDS
Model learning, Model checking, SSH protocol

ACM Reference format:
Paul Fiterău-Broştean, Toon Lenaerts, Erik Poll, Joeri de Ruiter, Frits Vaandrager, and Patrick Verleg. 2017. Model Learning and Model Checking of SSH Implementations. In Proceedings of International SPIN Symposium on Model Checking of Software, Santa Barbara, CA, USA, July 2017 (SPIN'17), 10 pages.
DOI: 10.1145/3092282.3092289

SPIN'17, Santa Barbara, CA, USA. © 2017 ACM. 978-1-4503-5077-8/17/07...$15.00

1 INTRODUCTION
SSH is a security protocol that is widely used to interact securely with remote machines. The Transport layer of SSH has been subjected to security analysis [31], incl. analyses that revealed cryptographic shortcomings [5, 7, 20].

Whereas these analyses consider the abstract cryptographic protocol, this paper looks at actual implementations of SSH, and investigates flaws in the program logic of these implementations, rather than cryptographic flaws. Such logical flaws have occurred in implementations of other security protocols, notably TLS, with Apple's 'goto fail' bug and the FREAK attack [8]. For this we use model learning (a.k.a. active automata learning) [6, 21, 28] to infer state machines of three SSH implementations, which we then analyze by model checking for conformance to both functional and security properties.

The properties we verify for the inferred state machines are based on the RFCs that specify SSH [32–35]. These properties are formalized in LTL and verified using NuSMV [13]. We use a model checker since the models are too complex for manual inspection (they are trivial for NuSMV). Moreover, by formalizing the properties we can better assess and overcome vagueness or under-specification in the RFC standards.

This paper is born out of two recent theses [19, 29], and is to our knowledge the first combined application of model learning and model checking in verifying SSH implementations, or more generally, implementations of any network security protocol.

Related work. Chen et al. [12] use the MOPS software model checking tool to detect security vulnerabilities in the OpenSSH C implementation due to violation of folk rules for the construction of secure programs such as "Do not open a file in writing mode to stdout or stderr". Udrea et al. [27] also investigated SSH implementations for logical flaws. They used a static analysis tool to check two C implementations of SSH against an extensive set of rules. These rules not only express properties of the SSH protocol logic, but also of message formats and support for earlier versions and various options. Our analysis only considers the protocol logic. However, their rules were tied to routines in the code, so had to be slightly adapted to fit the different implementations. In contrast, our properties are defined at an abstract level so do not need such tailoring. Moreover, our black box approach means we can analyze any implementation of SSH, not just open source C implementations.

Formal models of SSH in the form of state machines have been used before, namely for a manual code review of OpenSSH [23], formal program verification of a Java implementation of SSH [22], and for model based testing of SSH implementations [9]. All this research only considered the SSH Transport layer, and not the other SSH protocol layers.

Model learning has previously been used to infer state machines of EMV bank cards [3], electronic passports [4], hand-held readers for online banking [11], and implementations of TCP [14] and TLS [25]. Some of these studies relied on manual analysis of learned models [3, 4, 25], but some also used model checkers [11, 14].

Instead of using active learning as we do, it is also possible to use passive learning to obtain protocol state machines [30]. Here network traffic is observed, and not actively generated. This can then provide a probabilistic characterization of normal network traffic, but it cannot uncover implementation flaws that occur in strange message flows, which is our goal.
2 MODEL LEARNING

2.1 Mealy machines
A Mealy machine is a tuple $M = (I, O, Q, q^0, \delta, \lambda)$, where $I$ is a finite set of inputs, $O$ is a finite set of outputs, $Q$ is a finite set of states, $q^0 \in Q$ is the initial state, $\delta : Q \times I \to Q$ is a transition function, and $\lambda : Q \times I \to O$ is an output function. Output function $\lambda$ is extended to sequences of inputs by defining, for all $q \in Q$, $i \in I$ and $\sigma \in I^*$, $\lambda(q, \epsilon) = \epsilon$ and $\lambda(q, i\sigma) = \lambda(q, i)\,\lambda(\delta(q, i), \sigma)$. The behavior of Mealy machine $M$ is defined by function $A_M : I^* \to O^*$ with $A_M(\sigma) = \lambda(q^0, \sigma)$, for $\sigma \in I^*$. Mealy machines $M$ and $N$ are equivalent, denoted $M \approx N$, iff $A_M = A_N$. Sequence $\sigma \in I^*$ distinguishes $M$ and $N$ if and only if $A_M(\sigma) \neq A_N(\sigma)$.

2.2 MAT Framework
The most efficient algorithms for model learning (see [17] for a recent overview) all follow the pattern of a minimally adequate teacher (MAT) as proposed by Angluin [6]. Here learning is viewed as a game in which a learner has to infer an unknown automaton by asking queries to a teacher. The teacher knows the automaton, which in our setting is a Mealy machine $M$, also called the System Under Learning (SUL). Initially, the learner only knows the input alphabet $I$ and output alphabet $O$ of $M$. The task of the learner is to learn $M$ via two types of queries:

• With a membership query, the learner asks what the response is to an input sequence $\sigma \in I^*$. The teacher answers with the output sequence $A_M(\sigma)$.
• With an equivalence query, the learner asks whether a hypothesized Mealy machine $H$ is correct, that is, whether $H \approx M$. The teacher answers yes if this is the case. Otherwise it answers no and supplies a counterexample, which is a sequence $\sigma \in I^*$ that triggers different output sequences in the two Mealy machines, that is, $A_H(\sigma) \neq A_M(\sigma)$.

The MAT framework can be used to learn black box models of software. If the behavior of a software system, or System Under Learning (SUL), can be described by some unknown Mealy machine $M$, then a membership query can be implemented by sending inputs to the SUL and observing resulting outputs.

... inputs to abstract inputs and concrete outputs to abstract outputs. For a thorough discussion of mappers, we refer to [2].

3 THE SECURE SHELL PROTOCOL
The Secure Shell Protocol (or SSH) is a protocol used for secure remote login and other secure network services over an insecure network. It runs as an application layer protocol on top of TCP, which provides reliable data transfer, but does not provide any form of connection security. The initial version of SSH was superseded by a second version (SSHv2), after the former was found to contain design flaws which could not be fixed without losing backwards compatibility [15]. This work focuses on SSHv2.

SSHv2 follows a client-server paradigm. The protocol consists of three layers (Figure 1):

(1) The transport layer protocol (RFC 4253 [35]) forms the basis for any communication between a client and a server. It provides confidentiality, integrity and server authentication as well as optional compression.
(2) The user authentication protocol (RFC 4252 [32]) is used to authenticate the client to the server.
(3) The connection protocol (RFC 4254 [33]) allows the encrypted channel to be multiplexed in different channels. These channels enable a user to run multiple applications, such as terminal emulation or file transfer, over a single SSH connection.

Figure 1: SSH protocol layers

Each layer has its own specific messages. The SSH protocol is interesting in that outer layers do not encapsulate inner layers, and different layers can interact. For this reason, we opt to analyze SSH as a whole, instead of analyzing its constituent layers independently. Below we discuss each layer, outlining the relevant messages which are later used in learning, and characterizing the so-called happy flow that a normal protocol run follows.

At a high level, a typical SSH protocol run uses the three constituent protocols in the order given above: after the client establishes a TCP connection with the server, (1) the two sides use the