Model Learning and Model Checking of SSH Implementations

Model Learning and Model Checking of SSH Implementations Paul Fiterau-Bros¸tean˘ ∗ Toon Lenaerts Erik Poll Radboud University Nijmegen Radboud University Nijmegen Radboud University Nijmegen [email protected] [email protected] [email protected] Joeri de Ruiter Frits Vaandrager Patrick Verleg Radboud University Nijmegen Radboud University Nijmegen Radboud University Nijmegen [email protected] [email protected] [email protected] ABSTRACT state machines of three SSH implementations, which we then an- We apply model learning on three SSH implementations to infer alyze by model checking for conformance to both functional and state machine models, and then use model checking to verify that security properties. these models satisfy basic security properties and conform to the e properties we verify for the inferred state machines are RFCs. Our analysis showed that all tested SSH server models satisfy based on the RFCs that specify SSH [32–35]. ese properties the stated security properties. However, our analysis uncovered are formalized in LTL and veried using NuSMV [13]. We use several violations of the standard. a model checker since the models are too complex for manual inspection (they are trivial for NuSMV). Moreover, by formalizing CCS CONCEPTS the properties we can beer assess and overcome vagueness or under-specication in the RFC standards. •Networks →Protocol correctness; •eory of computation is paper is born out of two recent theses [19, 29], and is to →Active learning; •So ware and its engineering →Model our knowledge the rst combined application of model learning checking; •Security and privacy →Logic and verication; Net- and model checking in verifying SSH implementations, or more work security; generally, implementations of any network security protocol. KEYWORDS Related work. Chen et al.[12] use the MOPS soware model Model learning, model checking, SSH protocol checking tool to detect security vulnerabilities in the OpenSSH C ACM Reference format: implementation due to violation of folk rules for the construction Paul Fiterau-Bro˘ s¸tean, Toon Lenaerts, Erik Poll, Joeri de Ruiter, Frits Vaan- of secure programs such as “Do not open a le in writing mode drager, and Patrick Verleg. 2017. Model Learning and Model Checking of to stdout or stderr”. Udrea et al.[27] also investigated SSH imple- SSH Implementations. In Proceedings of SPIN Symposium, California, USA, mentations for logical aws. ey used a static analysis tool to July 2017 (SPIN’17), 10 pages. check two C implementations of SSH against an extensive set of DOI: 10.1145/nnnnnnn.nnnnnnn rules. ese rules not only express properties of the SSH protocol logic, but also of message formats and support for earlier versions 1 INTRODUCTION and various options. Our analysis only considers the protocol logic. However, their rules were tied to routines in the code, so had to be SSH is a security protocol that is widely used to interact securely slightly adapted to t the dierent implementations. In contrast, with remote machines. e Transport layer of SSH has been sub- our properties are dened at an abstract level so do not need such jected to security analysis [31], incl. analyses that revealed crypto- tailoring. Moreover, our black box approach means we can analyze graphic shortcomings [5, 7, 20]. any implementation of SSH, not just open source C implementa- Whereas these analyses consider the abstract cryptographic pro- tions. tocol, this paper looks at actual implementations of SSH, and in- Formal models of SSH in the form of state machines have been vestigates aws in the program logic of these implementations, used before, namely for a manual code review of OpenSSH [23], rather than cryptographic aws. Such logical aws have occurred formal program verication of a Java implementation of SSH [22], in implementations of other security protocols, notably TLS, with and for model based testing of SSH implementations [9]. All this Apple’s ’goto fail’ bug and the FREAK aack [8]. For this we use research only considered the SSH Transport layer, and not the other model learning (a.k.a. active automata learning) [6, 21, 28] to infer SSH protocol layers. ∗Supported by NWO project 612.001.216, Active Learning of Security Protocols Model learning has previously been used to infer state machines (ALSEP). of EMV bank cards [3], electronic passports [4], hand-held readers for online banking [11], and implementations of TCP [14] and Permission to make digital or hard copies of part or all of this work for personal or TLS [25]. Some of these studies relied on manual analysis of learned classroom use is granted without fee provided that copies are not made or distributed for prot or commercial advantage and that copies bear this notice and the full citation models [3, 4, 25], but some also used model checkers [11, 14]. on the rst page. Copyrights for third-party components of this work must be honored. Instead of using active learning as we do, it is also possible to For all other uses, contact the owner/author(s). use passive learning to obtain protocol state machines [30]. Here SPIN’17, California, USA © 2017 Copyright held by the owner/author(s). 978-x-xxxx-xxxx-x/YY/MM...$15.00 network trac is observed, and not actively generated. is can DOI: 10.1145/nnnnnnn.nnnnnnn then also provide a probabilistic characterization of normal network trac, but it cannot uncover implementation aws that occur in strange message ows, which is our goal. 2 MODEL LEARNING 2.1 Mealy machines Figure 1: SSH protocol layers A Mealy machine is a tuple M = ¹I;O;Q;q0; δ; λº, where I is a nite set of inputs, O is a nite set of outputs, Q is a nite set of states, inputs to abstract inputs and concrete outputs to abstract outputs. q0 2 Q is the initial state, δ : Q × I ! Q is a transition function, For a thorough discussion of mappers, we refer to [2]. and λ : Q × I ! O is an output function. Output function λ is extended to sequences of inputs by dening, for all q 2 Q, i 2 I and 3 THE SECURE SHELL PROTOCOL ∗ σ 2 I , λ¹q; ϵº = ϵ and λ¹q;iσº = λ¹q;iºλ¹δ¹q;iº; σº. e behavior e Secure Shell Protocol (or SSH) is a protocol used for secure ∗ ∗ of Mealy machine M is dened by function AM : I ! O with remote login and other secure network services over an insecure 0 ∗ AM¹σº = λ¹q ; σº, for σ 2 I . Mealy machines M and N are network. It runs as an application layer protocol on top of TCP, ∗ equivalent, denoted M ≈ N, i AM = AN. Sequence σ 2 I which provides reliable data transfer, but does not provide any form distinguishes M and N if and only if AM¹σº , AN¹σº. of connection security. e initial version of SSH was superseded by a second version (SSHv2), aer the former was found to contain 2.2 MAT Framework design aws which could not be xed without losing backwards e most ecient algorithms for model learning all follow the compatibility [15]. is work focuses on SSHv2. paern of a minimally adequate teacher (MAT) as proposed by SSHv2 follows a client-server paradigm. e protocol consists Angluin [6]. Here learning is viewed as a game in which a learner of three layers (Figure 1): has to infer an unknown automaton by asking queries to a teacher. (1) e transport layer protocol (RFC 4253 [35]) forms the basis e teacher knows the automaton, which in our seing is a Mealy for any communication between a client and a server. It machine M, also called the System Under Learning (sul). Initially, provides condentiality, integrity and server authentica- the learner only knows the input alphabet I and output alphabet tion as well as optional compression. O of M. e task of the learner is to learn M via two types of (2) e user authentication protocol (RFC 4252 [32]) is used to queries: authenticate the client to the server. (3) e connection protocol (RFC 4254 [33]) allows the en- • With a membership query, the learner asks what the re- crypted channel to be multiplexed in dierent channels. sponse is to an input sequence σ 2 I ∗. e teacher answers ese channels enable a user to run multiple applications, with the output sequence in A ¹σº. M such as terminal emulation or le transfer, over a single • With an equivalence query, the learner asks whether a SSH connection. hypothesized Mealy machine H is correct, that is, whether H ≈ M. e teacher answers yes if this is the case. Other- Each layer has its own specic messages. e SSH protocol is wise it answers no and supplies a counterexample, which is interesting in that outer layers do not encapsulate inner layers. is a sequence σ 2 I ∗ that triggers a dierent output sequence means that dierent layers can interact. Hence, it makes sense to for both Mealy machines, that is, AH¹σº , AM¹σº. analyze SSH as a whole, instead of analyzing its constituent layers independently. Below we discuss each layer, outlining the relevant e MAT framework can be used to learn black box models of messages which are later used in learning, and characterising the soware. If the behavior of a soware system, or System Under so-called happy ow that a normal protocol run follows. Learning (sul), can be described by some unknown Mealy machine At a high level, a typical SSH protocol run uses the three con- M, then a membership query can be implemented by sending stituent protocols in the order given above: aer the client estab- inputs to the sul and observing resulting outputs.

Load more