
arXiv:2012.06340v2 [cs.CR] 26 Feb 2021

Control Flow Obfuscation for FJ using Continuation Passing
Extended Version

Kenny Zhuo Ming Lu
School of Information Technology, Nanyang Polytechnic
[email protected]
Information Systems Technology and Design, Singapore University of Technology and Design
[email protected]

Conference'17, July 2017, Washington, DC, USA

Abstract
Control flow obfuscation deters software reverse engineering attempts by altering the program's control flow transfer. The alteration should not affect the software's run-time behaviour. In this paper, we propose a control flow obfuscation approach for FJ with exception handling. The approach is based on a source to source transformation using continuation passing style (CPS). We argue that the proposed CPS transformation causes malicious attacks using context insensitive static analysis and context sensitive analysis with fixed call string to lose precision.

CCS Concepts: • Security and privacy → Software and application security; • Software and its engineering → Semantics

Keywords: Control flow obfuscation, program transformation, continuation passing style

ACM Reference Format:
Kenny Zhuo Ming Lu. 2021. Control Flow Obfuscation for FJ using Continuation Passing: Extended Version. In Proceedings of ACM Conference (Conference'17). ACM, New York, NY, USA, 13 pages.

1 Introduction
Java applications are ubiquitous thanks to the wide adoption of android devices. Since Java byte-codes are close to their source codes, it is easy to decompile Java byte-codes back to source codes with tools. For example, javap shipped with JVM [15] can be used to decompile Java class files back to Java source. This makes the Man-At-The-End attack one of the major security threats to Java applications. Code obfuscation is one of the effective mechanisms to deter malicious attack through decompilation. There are many obfuscation techniques operating on the level of byte-codes [5, 17, 20]. In the domain of source code obfuscation, we find solutions such as [9] applying layout obfuscation. We put our interest in control flow obfuscation techniques, which include control flow flattening [3, 11, 21] and continuation passing [12]. Note that the difference between bytecode obfuscation and source code obfuscation is insignificant, because of the strong correlation between the Java bytecodes and source codes. In this paper, we propose an extension to the continuation passing approach to obfuscate FJ with exception handling.

We assume the attackers gain access to the byte-codes to which layout obfuscation has been applied. The attackers decompile the byte-codes into source codes and attempt to extract secret information by running control flow analysis on the decompiled code. Our goal here is to cause the control flow analysis to become imprecise or more costly in computation.

2 Motivating Example
Example 1. To motivate the main idea, let's consider the following Java code snippet

 1  class FibGen {
 2    int f1, f2, lpos;
 3    FibGen() {
 4      f1 = 0; f2 = 1; lpos = 1;
 5    }
 6    int get(int x) {
 7      int i = lpos;                    // (1)
 8      int r = -1;
 9      try {                            // (2)
10        if (x < i) {                   // (3)
11          throw new Exception();       // (4)
12        } else {
13          while (i < x) {              // (5)
14            int t = f1 + f2;           // (6)
15            f1 = f2; f2 = t;
16            i++;

17          }
18          lpos = i;                    // (7)
19          r = f2;
20        }
21      } catch (Exception e) {          // (8)
22        System.out.println("the input should be greater than "
23          + i + ".");
24      }
25      return r;                        // (9)
26    }
27  }

[Figure 1. CFG of get. Nodes 1 to 9 are the code blocks of get; branch edges are labelled t and f.]

In the above we define a Fibonacci number generator in class FibGen. In the method get, we compute the Fibonacci number given the position as the input. Note that the generator maintains a state, in which we record the last two computed Fibonacci numbers, namely, f1 and f2, and the last computed position lpos. In method get lines 10 and 11, we raise an exception if the given input is smaller than i, which has been initialized to lpos. Towards the end of the method, we catch the exception and print out the error message. The number comment on the right of each statement indicates the code block to which the statement belongs. In Figure 1, we represent the function get's control flow as a graph. Each circle denotes a code block from the source program. ✷

Inspired by the approach [12], our main idea is to translate control flow constructs, such as sequence, if-else and loop, into CPS combinators. In the context of FJ with exception handling, we translate try-catch statements into CPS combinators as well.

In Figure 2 we find the obfuscated code snippet of the get method in CPS style. The obfuscated code is in a variant of FJ, named FJλ, which is FJ with higher order functions, nested function declaration and mutable variables in function closures. void => void denotes a function type whose values accept no argument and return no result. Exception => void denotes a function type that accepts an exception and returns no result. type NmCont = void => void defines a type alias. (void n) -> {return i < x}; defines an anonymous function whose input is of type void and whose body returns a boolean value. For brevity, we omit the type annotations of the formal arguments where there is no confusion. The return key word is omitted when there is only one statement in the function body. We omit curly brackets in curried expressions, e.g. x -> raise -> k -> { ... } is the same as x -> { raise -> { k -> { ... } } }, where x, raise and k are formal arguments for the lambda abstractions. For convenience, we treat method declaration and lambda declaration as interchangeable. For instance, the lambda declaration

  int => int => int f = x -> y -> {x + y}

is equivalent to the following method declaration

  int f(int x, int y) { return x + y; }

In the last section, we mentioned that layout obfuscation such as identifier renaming should have been applied to the obfuscated code; however in this paper we keep all the identifiers in the obfuscated code unchanged for the ease of reasoning. For the sake of assessing the obfuscation potency, we "flatten" the nested function calls into sequences of assignment statements. For example, let x and y be variables of type int, let f be a function of type int => int => int and g be a function of type int => int; instead of int r = f(x)(g(y));, we write:

  int => int f_x = f(x);
  int g_y = g(y);
  int r = f_x(g_y);

As we observe in Figure 2, all the building blocks are continuation functions with type CpsFunc. The simple code blocks (1), (6), (7) and (8) from the original source code, which contain no control flow branching statements, are translated into nested CPS functions get1, get6, get7 and get8. Block (4), containing a throw statement, is translated into get4, which applies the continuation raise of type ExCont to the exception object. Block (9) has a return statement, which is translated into a function in which we assign the variable being returned, r, to the res variable and call the normal continuation k of type NmCont. Block (2) is a try-catch statement which is encoded as a call to the trycatch combinator in line 24. Similarly block (3), the if-else statement, is encoded as a call to the ifelse combinator and block (5), the while loop, is encoded as a call to the loop combinator.

In Figure 3, we present the definitions of the CPS combinators used in the obfuscation. Combinator loop accepts a condition test cond, a continuation executor visitor to be executed when the condition is satisfied, and a continuation executor exit to be activated when the condition is not satisfied. Combinator seq takes two continuation executors and executes them in sequence. Combinator trycatch takes a continuation executor tr and an exception handling continuation hdl. It executes tr by replacing the current exception continuation with ex_hdl. Combinator ifelse accepts a condition test cond, a continuation for the then-branch th

to be executed when the condition is satisfied, and a continuation executor for the else-branch el to be activated when the condition is not satisfied.

 1  type ExCont = Exception => void;
 2  type NmCont = void => void;
 3  type CpsFunc = ExCont => NmCont => void;
 4  int get(int x) {
 5    int i, t, r, res; Exception ex;
 6    int => ExCont => (int => void) => void get_cps =
 7      x -> raise -> k -> {
 8        void => bool cond5 = n -> {i < x};
 9        void => bool cond3 = n -> {x < i};
10        CpsFunc get5 = loop(cond5, get6, get7);
11        CpsFunc get3 = ifelse(cond3, get4, get5);
12        CpsFunc get1_2 = seq(get2, get9);
13        CpsFunc pseq = seq(get1, get1_2);
14        NmCont => void pseq_raise = pseq(raise);
15        NmCont nk_res = n -> k(res);
16        return pseq_raise(nk_res);
17      }
18    CpsFunc get1 = (ExCont raise) -> (NmCont k) -> {
19      i = this.lpos; r = -1; return k();
20    }
21
22    Exception => CpsFunc hdl =
23      e -> {ex = e; return get8;}
24    CpsFunc get2 = trycatch(get3, hdl);
25
26    CpsFunc get4 = raise -> k
27      -> raise(new Exception());
28    CpsFunc get6 = raise -> k
29      -> { t = this.f1 + this.f2; this.f1 = this.f2;
30           this.f2 = t; i = i + 1; return k(); }
31    CpsFunc get7 = raise -> k
32      -> { this.lpos = i; r = this.f2; return k(); }
33    CpsFunc get8 = raise -> k
34      -> { System.out.println("..."); return k(); }
35    CpsFunc get9 = raise -> k
36      -> { res = r; return k(); }
37    NmCont id_bind = i -> { res = i; return; };
38    CpsFunc get_x = get_cps(x);
39    NmCont => void get_x_raise = get_x(id_raise);
40    void ign = get_x_raise(id_bind);
41    return res;
42  }
43  void id_raise(Exception e) { return; }

Figure 2. get in CPS (flatten) (Line 1-43)

50  CpsFunc loop(void => Boolean cond,
51               CpsFunc visitor, CpsFunc exit) {
52    return raise -> k -> {
53      if (cond()) {
54        NmCont => void visitor_raise = visitor(raise);
55        NmCont nloop = n -> {
56          CpsFunc ploop = loop(cond, visitor, exit);
57          NmCont => void ploop_raise = ploop(raise);
58          return ploop_raise(k);
59        };
60        return visitor_raise(nloop);
61      } else {
62        NmCont => void exit_raise = exit(raise);
63        return exit_raise(k);
64      }
65    }
66  }
67  CpsFunc seq(CpsFunc first, CpsFunc second) {
68    return raise -> k -> {
69      NmCont => void first_raise = first(raise);
70      NmCont n_second = n -> {
71        NmCont => void second_raise = second(raise);
72        return second_raise(k);
73      };
74      return first_raise(n_second);
75    }
76  }
77  CpsFunc trycatch(CpsFunc tr, Exception => CpsFunc hdl) {
78    return raise -> k -> {
79      ExCont ex_hdl = ex -> {
80        CpsFunc hdl_ex = hdl(ex);
81        NmCont => void hdl_ex_raise = hdl_ex(raise);
82        return hdl_ex_raise(k);
83      }
84      NmCont => void tr_hdl = tr(ex_hdl);
85      return tr_hdl(k);
86    }
87  }
88  CpsFunc ifelse(void => Boolean cond,
89                 CpsFunc th, CpsFunc el) {
90    return raise -> k -> {
91      if (cond()) {
92        NmCont => void th_raise = th(raise);
93        return th_raise(k);
94      } else {
95        NmCont => void el_raise = el(raise);
96        return el_raise(k);
97      }
98    }
99  }

Figure 3. CPS Combinators (flatten) (Line 50-99)

To assess the potency of the obfuscation technique, let's put on the hat of the attackers and apply some static analysis to the obfuscated source code. The goal of the attack is to reconstruct the control flow graph from the obfuscated source. We apply an inter-procedural data flow analysis to the obfuscated code. For each variable or formal argument in the code, the analysis tries to approximate the set of possible lambda expressions which the variable/argument may capture during the execution. From the approximation we re-create the (global) control flow graph as presented in Figure 4. We give names to anonymous functions as λ_l, where l refers to the line number appearing in Figures 2 and 3. In

case that there is more than one anonymous function introduced in line l, we use λ_l to denote the first one, λ_l' to denote the second one and λ_l'' to denote the third one.

[Figure 4. Reconstructed CFG of get. Nodes are get and the functions λ_l of Figures 2 and 3; branch edges are labelled t and f.]

Compared to the CFG of the original source, the reconstructed CFG of the obfuscated code is far more complex. For instance, there exists more than one loop in the obfuscated CFG in Figure 4, namely,
1. λ_52', λ_28', λ_55, λ_52',
2. λ_68', λ_78', λ_90', λ_26', λ_79, λ_33', λ_70, λ_68',
whereas there is clearly only one loop in the original CFG in Figure 1. The loss of precision is due to the fact that the attack which we simulate uses a context insensitive data flow analysis, which is known to be incomplete in the presence of multiple calls to the same function. For example, in Figure 2, lines 12 and 13, we call the combinator seq twice with different actual arguments. The analysis ignores the context and unions the two sets of actual arguments. These approximations are propagated along to the rest of the analysis. A similar observation is applicable to context sensitive analysis with a fixed size call string, in the presence of multiple calls to recursive combinators such as loop. Attackers may choose to use a context sensitive analysis; however, to achieve a better approximation, the analysis will be much more costly and often not practical.

For the ease of establishing the correctness result, we use an extension of the Single Static Assignment form for FJ with exception handling (SSAFJ-EH) as the source language of the translation. The construction of SSAFJ-EH can be extended from the work found in the literature [1], which is not the focus of this paper, hence we omit the details.

The contributions of this paper include,
• We formalize the single static assignment form of FJ with Exception Handling.
• We develop a control flow obfuscation algorithm by translating SSAFJ-EH to FJλ using continuation passing.
• We show that CPS based control flow obfuscation is effective against static analysis, in particular context insensitive control flow analysis.

The rest of the paper is organized as follows. In Section 3, we formalize SSAFJ-EH's syntax and semantics. In Section 4, we define the syntax of FJλ as well as its semantics, and we formalize the source-to-source translation from SSAFJ-EH to FJλ. In Section 5, we discuss in detail the potency assessment of our obfuscation technique against static analyses. We discuss related works in Section 6 and conclude in Section 7.

3 Single Static Assignment Form for FJ with Exception Handling
3.1 Syntax of SSAFJ-EH
We extend the syntax of SSAFJ [1] with exception handling,

  (ClassDecl)   cd  ::= class C {\overline{fd}; \overline{md}}
  (FieldDecl)   fd  ::= t f
  (MethodDecl)  md  ::= t m (t x) {\overline{vd}; \overline{b}}
  (VarDecl)     vd  ::= t x
  (Block)       b   ::= l : {s}
  (Statement)   s   ::= \overline{a} | return e | throw e | x = e.m(e)
                      | try {\overline{b}} join {\overline{φ}} catch (t x) {\overline{b}} join {\overline{φ}}
                      | join {\overline{φ}} while e {\overline{b}}
                      | if e {\overline{b}} else {\overline{b}} join {\overline{φ}}
  (Assignment)  a   ::= x = e | e.f = e
  (Phi)         φ   ::= x = phi(\overline{l : x})
  (Label)       l   ::= L0 | L1 | L2 | ...
  (Expression)  e   ::= v | x | e.f | new t() | this | e op e
  (Operator)    op  ::= + | − | > | < | == | ...
  (Type)        t   ::= int | bool | void | C
  (Value)       v   ::= c | loc | null
  (MemLoc)      loc ::= loc(0) | loc(1) | ...

class C {\overline{fd}; \overline{md}} defines a class. C denotes a class name. \overline{fd} denotes a sequence of field declarations, fd1; ...; fdn. Likewise \overline{md} denotes a sequence of method declarations. For simplicity, we do not consider class inheritance, class constructors and method modifiers. Implicitly we assume each class comes with a default constructor and all field declarations are public and non-static. t m (t x) {\overline{vd}; \overline{b}} defines a method declaration. For simplicity, we restrict the language to single argument methods. m denotes a method name. x, y and z denote variables. \overline{b} defines a sequence of blocks. Each block is associated with a label l. Labels are unique within the method body. Reference to labels is restricted to the method's local scope. Each block consists of a sequence of assignment statements or a control flow statement. The last block in a method must contain a return statement. Note that all control flow statements potentially alter the default top-down execution order. The SSA form ensures that the definition of a variable through assignment must dominate all the uses of this variable. Unlike the work [12], which uses a low-level SSA structure with goto statements, the SSA form introduced in this paper is in a high-level structured form. That is, only certain control flow statements, such as if-else,

try-catch and while, may carry one or more φ clauses. There is no goto statement. A φ assignment x = phi(\overline{l : x}) selects the right labeled argument l_i : x_i to assign to the left hand side variable, based on the label of the preceding statement. For an if-else statement, the φ assignment is inserted right after the then- and else-branches, which merges the possibly different sets of values from the branches into a new set of variables. In the while loop, the φ assignment is located before the loop condition. In a try-catch statement, we find two sets of φ assignments. The φ assignments located after the try block and catch block have a functionality similar to the one in the if-else statement. The other set is located between the try block and the catch block. It is to merge the different sets of values that arise in various parts of the try block due to an exception being raised. We will discuss more in detail in the semantics of SSAFJ-EH. t denotes a type. A type t can be types such as int, void or a class type C. A value v is either a constant, a memory location or null. The formal details will be elaborated in the upcoming subsection. Syntax of assignments and expressions is standard.

For instance, the corresponding SSA form of the method get from the class FibGen in Example 1 is in Figure 5.

  int get(int x) {
    int i_1, i_2, i_5, i_6, t_6, r_1, r_2, r_7;
    L1: i_1 = this.lpos;
        r_1 = -1;
    L2: try {
    L3:   if (x < i_1) {
    L4:     throw new Exception();
          } else {
    L5:     join {i_5 = phi(L3:i_1, L6:i_6)} while (i_5 < x) {
    L6:       t_6 = this.f1 + this.f2;
              this.f1 = this.f2;
              this.f2 = t_6;
              i_6 = i_5 + 1;
            }
    L7:     this.lpos = i_5;
            r_7 = this.f2;
          } join {i_3 = phi(L4:i_1, L7:i_5)}
        } join {i_2 = phi(L4:i_1)}
        catch (Exception e) {
    L8:   System.out.println("the input..." + i_2 + ".");
        } join {r_2 = phi(L3:r_7, L8:r_1)};
    L9: return r_2;
  }

Figure 5. Method get in Single Static Assignment Form

  (Global Decl Env)   GEnv  ⊆ (Class × FieldDecl) ∪ ((Class × MethodName) × MethodDecl)
  (Local Decl Env)    LEnv  ⊆ (Variable × Value)
  (Memory Store)      Store ⊆ (MemLoc × Object)
  (Object)            obj  ::= obj(C, ρ)
  (Exception)         ex   ::= exception(v, LEnv, Store, l)
  (Object Field Map)  ρ     ⊆ (FieldName × Value)

  MD_ssa⟦·⟧ :: MethodDecl → Value → Value → GEnv → Store → (Value, Store)_ex
  MD_ssa⟦t m (t' x) {\overline{vd}; \overline{b}}⟧ v_o v_x genv st =
    let lenv' = VD_ssa⟦\overline{vd}⟧ {(this, v_o), (x, v_x)}
    in case \overline{B}_ssa⟦\overline{b}⟧ L0 genv lenv' st of
      exception(v, lenv'', st', l') → exception(v, lenv'', st', L0)
      (v, lenv'', st', l')          → (v, st')

  \overline{B}_ssa⟦·⟧ :: [Block] → Label → GEnv → LEnv → Store → (Value, LEnv, Store, Label)_ex
  \overline{B}_ssa⟦b⟧ l genv lenv st = B_ssa⟦b⟧ l genv lenv st
  \overline{B}_ssa⟦b; \overline{b}⟧ l genv lenv st =
    case B_ssa⟦b⟧ l genv lenv st of
      exception(v, lenv', st', l') → exception(v, lenv', st', l)
      (v, lenv', st', l')          → \overline{B}_ssa⟦\overline{b}⟧ l' genv lenv' st'

  B_ssa⟦·⟧ :: Block → Label → GEnv → LEnv → Store → (Value, LEnv, Store, Label)_ex

  B_ssa⟦l : {if (e) {\overline{b1}} else {\overline{b2}} join {\overline{φ}}}⟧ l_p genv lenv st =
    case E_ssa⟦e⟧ genv lenv st of
      (true, st')  → case \overline{B}_ssa⟦\overline{b1}⟧ l genv lenv st' of
          exception(v, lenv', st'', l') → exception(v, lenv', st'', l')
          (v, lenv', st'', l')          → (v, F_ssa⟦\overline{φ}⟧ l' lenv', st'', l)
      (false, st') → case \overline{B}_ssa⟦\overline{b2}⟧ l genv lenv st' of
          exception(v, lenv', st'', l') → exception(v, lenv', st'', l')
          (v, lenv', st'', l')          → (v, F_ssa⟦\overline{φ}⟧ l' lenv', st'', l)

  B_ssa⟦l : {return e}⟧ l_p genv lenv st =
    case E_ssa⟦e⟧ genv lenv st of
      (v, st') → (v, lenv, st', l)

  B_ssa⟦l : {throw e}⟧ l_p genv lenv st =
    case E_ssa⟦e⟧ genv lenv st of
      (v, st') → exception(v, lenv, st', l)

  B_ssa⟦l : {try {\overline{b}} join {\overline{φ_r}} catch (t x) {\overline{b'}} join {\overline{φ_k}}}⟧ l_p genv lenv st =
    case \overline{B}_ssa⟦\overline{b}⟧ l genv lenv st of
      (v, lenv', st', l') → (v, F_ssa⟦\overline{φ_k}⟧ l' lenv', st', l')
      exception(v, lenv', st', l') →
        let lenv'' = F_ssa⟦\overline{φ_r}⟧ l' lenv' + (x, v)
        in case \overline{B}_ssa⟦\overline{b'}⟧ l' genv lenv'' st' of
          (v', lenv''', st'', l'')          → (v', F_ssa⟦\overline{φ_k}⟧ l'' lenv''', st'', l'')
          exception(v', lenv''', st'', l'') → exception(v', lenv''', st'', l'')

  B_ssa⟦l : {join {\overline{φ}} while (e) {\overline{b}}}⟧ l_p genv lenv st =
    let lenv' = F_ssa⟦\overline{φ}⟧ l_p lenv
    in case E_ssa⟦e⟧ genv lenv' st of
      (false, st') → (null, lenv', st', l)
      (true, st')  → case \overline{B}_ssa⟦\overline{b}⟧ l genv lenv' st' of
          exception(v, lenv'', st'', l') → exception(v, lenv'', st'', l')
          (v, lenv'', st'', l')          → B_ssa⟦l : {join {\overline{φ}} while (e) {\overline{b}}}⟧ l' genv lenv'' st''

  B_ssa⟦l : {x = e1.m(e2)}⟧ l_p genv lenv st =
    case E_ssa⟦e1⟧ genv lenv st of
      (loc(n), st') → case st'(loc(n)) of
        obj(t, ρ) → case genv(t, m) of
          md → case E_ssa⟦e2⟧ genv lenv st' of
            (v'', st'') → case MD_ssa⟦md⟧ loc(n) v'' genv st'' of
              (v''', st''')               → (null, lenv + (x, v'''), st''', l)
              exception(v''', _, st''', _) → exception(v''', lenv, st''', l)

  B_ssa⟦l : {\overline{a}}⟧ l_p genv lenv st =
    case \overline{A}_ssa⟦\overline{a}⟧ genv lenv st of
      (lenv', st') → (null, lenv', st', l)

Figure 6. Denotational Semantics of SSAFJ-EH (Part 1)

3.2 Semantics of SSAFJ-EH
We report a call-by-value semantics of SSAFJ-EH in Figures 6 and 7. We adopt the standard denotational semantics notation found in [16]. GEnv denotes a constant global environment which maps class names to field declarations and class

names and method names to method declarations. We assume that the given program is free of type errors and there is no null pointer reference error. LEnv denotes a local variable environment which maps variables to values. Store defines a memory environment that maps memory locations to objects.

As a convention, we write m(a) to refer to the object b associated with the key a in a mapping m, i.e. (a, b) ∈ m, given that all keys in m are unique. We use m + (a, b) to denote an "update if exists, insert otherwise" operation, i.e. m + (a, b) = {(x, y) ∈ m | a ≠ x} ∪ {(a, b)}.

In this paper, we are only interested in the obfuscation of methods, hence we omit the semantics for class declaration and field declaration. MD_ssa⟦·⟧ defines the semantics of a method as a function expecting a reference to the current object, a value as the actual argument, a global environment and a memory store, and returns a pair of value and memory store as result. Given a domain D, we write D_ex to denote D ∪ Exception. VD_ssa⟦·⟧ takes a list of variable declarations and a local declaration environment as inputs, then registers each variable in the declaration environment. Note that we use Haskell's style of let-binding to introduce temporary variables and case expressions for pattern matching. For brevity we omit data constructors in the patterns when there is no ambiguity.

We adopt Haskell's style list syntax. [] denotes an empty list. x : xs denotes a non-empty list where x refers to the head and xs refers to the tail. We assume there exists an implicit conversion from a sequence b1; b2; ...; bn to a list b1 : b2 : ... : bn : [].

B_ssa⟦·⟧ evaluates a block with respect to the context, i.e. the label of the preceding block, the local environment and the memory store. As the output, it returns a tuple of four items, namely, the value of the evaluation, the updated local environment, the updated memory store and the label of the exiting block if no exception occurred; otherwise an exception is returned. \overline{B}_ssa⟦·⟧ evaluates a sequence of blocks by applying B_ssa⟦·⟧ to each block in order, and propagates the resulting environments if there is no exception; otherwise the exception is propagated.

We highlight a few interesting cases of B_ssa⟦·⟧. In case of an if-else statement, we evaluate either the then-branch b1 or the else-branch b2 depending on the result of the condition expression e. Given the label of the exiting block, either from b1 or b2, we apply F_ssa⟦φ⟧ to update the local environment in the result. In case of a try-catch statement, we first evaluate the try block. If the evaluation is successful, we compute the result by updating the local environment with F_ssa⟦φ_k⟧. If some exception arises from the evaluation of the try block, we generate a local environment with F_ssa⟦φ_r⟧ depending on the location from which the exception is raised. Next we evaluate the catch block under this local environment. Finally we update the output local environment with F_ssa⟦φ_k⟧. In case of a method invocation, we evaluate the object expression into a memory location, with which we look up the memory store to retrieve the actual object and its type. From the global environment, we retrieve the method declaration based on the method name m. We call MD_ssa⟦md⟧ with the actual arguments to compute the result of the right hand side. Finally, we return a tuple consisting of a null value, an updated local environment with the updated binding of the left hand side x, as well as the updated memory store. The remaining cases are trivial.

F_ssa⟦·⟧ walks through the list of φ assignments. For each φ of shape x = phi(l1 : x1, ..., li : xi, ..., ln : xn), it searches for the label matching the incoming label li. The value of xi will be assigned to the variable x.

The definitions of \overline{A}_ssa⟦·⟧, A_ssa⟦·⟧ and E_ssa⟦·⟧ are straightforward and we omit the details.

  VD_ssa⟦·⟧ :: [VarDecl] → LEnv → LEnv
  VD_ssa⟦[]⟧ lenv = lenv
  VD_ssa⟦(t x); \overline{vd}⟧ lenv = VD_ssa⟦\overline{vd}⟧ (lenv + (x, null))

  F_ssa⟦·⟧ :: [Phi] → Label → LEnv → LEnv
  F_ssa⟦[]⟧ l lenv = lenv
  F_ssa⟦x = phi(l1 : x1, ..., li : xi, ..., ln : xn); \overline{φ}⟧ li lenv =
    F_ssa⟦\overline{φ}⟧ li lenv + (x, xi)

  \overline{A}_ssa⟦·⟧ :: [Assignment] → GEnv → LEnv → Store → (LEnv, Store)
  \overline{A}_ssa⟦[]⟧ genv lenv st = (lenv, st)
  \overline{A}_ssa⟦a; \overline{a}⟧ genv lenv st =
    case A_ssa⟦a⟧ genv lenv st of
      (lenv', st') → \overline{A}_ssa⟦\overline{a}⟧ genv lenv' st'

  A_ssa⟦·⟧ :: Assignment → GEnv → LEnv → Store → (LEnv, Store)
  A_ssa⟦x = e⟧ genv lenv st =
    case E_ssa⟦e⟧ genv lenv st of
      (v', st') → (lenv + (x, v'), st')
  A_ssa⟦e.f = e'⟧ genv lenv st =
    case E_ssa⟦e⟧ genv lenv st of
      (loc(n), st') → case st'(loc(n)) of
        obj(t, ρ) → case E_ssa⟦e'⟧ genv lenv st' of
          (v, st'') → (lenv, st'' + (loc(n), obj(t, ρ + (f, v))))

  E_ssa⟦·⟧ :: Expression → GEnv → LEnv → Store → (Value, Store)
  E_ssa⟦v⟧ genv lenv st = (v, st)
  E_ssa⟦x⟧ genv lenv st = (lenv(x), st)
  E_ssa⟦this⟧ genv lenv st = (lenv(this), st)
  E_ssa⟦e.f⟧ genv lenv st =
    case E_ssa⟦e⟧ genv lenv st of
      (loc(n), st') → case st'(loc(n)) of
        obj(t, ρ) → (ρ(f), st')
  E_ssa⟦new t()⟧ genv lenv st =
    let n = maxloc(st)
        ρ = {(f, null) | f ∈ genv(t)}
    in (loc(n + 1), st + (loc(n + 1), obj(t, ρ)))
  E_ssa⟦e1 op e2⟧ genv lenv st =
    case E_ssa⟦e1⟧ genv lenv st of
      (v1, st1) → case E_ssa⟦e2⟧ genv lenv st1 of
        (v2, st2) → (apply(op, v1, v2), st2)

Figure 7. Denotational Semantics of SSAFJ-EH (Part 2)

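4 SSAFJ-EH to FJ$_\lambda$ Translation

4.1 Syntax of FJ$_\lambda$

We consider the valid syntax of our target language FJ$_\lambda$.

\[
\begin{array}{llcl}
\text{(CLASSDECL)} & CD & ::= & \mathbf{class}\ C\ \{\overline{FD};\ \overline{MD}\} \\
\text{(FIELDDECL)} & FD & ::= & T\ F \\
\text{(METHODDECL)} & MD & ::= & T\ M\ (T\ X)\ \{\overline{VD};\ \overline{S}\} \\
\text{(VARDECL)} & VD & ::= & T\ X \mid FT\ X = \lambda \\
\text{(STATEMENT)} & S & ::= & A \mid \mathbf{return}\ E \mid \mathbf{if}\ (E)\ \{\overline{S}\}\ \mathbf{else}\ \{\overline{S}\} \\
\text{(ASSIGNMENT)} & A & ::= & X = E \mid E.F = E \\
\text{(EXPRESSION)} & E & ::= & V \mid X \mid \mathbf{this} \mid X(\overline{E}) \mid E.M(E) \mid E.F \mid E\ op\ E \mid \mathbf{new}\ T(\overline{E}) \\
\text{(BASIC TYPE)} & T & ::= & int \mid bool \mid void \mid C \\
\text{(FUNCTION TYPE)} & FT & ::= & T \mid FT \Rightarrow FT \\
\text{(VALUE)} & V & ::= & c \mid \lambda \mid LOC \mid \mathbf{null} \\
\text{(MEMLOC)} & LOC & ::= & loc(0) \mid loc(1) \mid \dots \\
\text{(LAMBDA)} & \lambda & ::= & (\overline{FT\ X}) \rightarrow \{\overline{S}\}
\end{array}
\]

FJ$_\lambda$ is an extension of FJ with support for anonymous functions, i.e. lambda abstractions. FJ$_\lambda$ differs from SSAFJ-EH as follows. Labels, while loops, try-catch and throw statements are excluded. FJ$_\lambda$ supports a limited form of higher order functions. $FT\ X = \lambda$ defines a local constant variable whose value is initialized to a lambda abstraction $\lambda$. The lambda abstraction $(\overline{FT\ X}) \rightarrow \{\overline{S}\}$ denotes an anonymous function that expects zero or more parameters. Lambda functions do not introduce local variables within their own scopes.$^1$ Lambda functions are nested in a top-level method. Note that $X$ and $Y$ denote variables. Variables declared in a method are accessible within its nested functions. $X(\overline{E})$ denotes a function application where $X$ is a variable bound to a lambda abstraction. $E.M(E)$ denotes a method application. The value $V$ in the target language includes constants, lambda abstractions and memory locations.

$^1$The examples given in Section 2 Figures 2 and 3 seem to be violating this restriction. The violation is due to the "flattening" effect, which can be undone.

4.2 Semantics of FJ$_\lambda$

In Figure 8 we describe the denotational semantics of FJ$_\lambda$. We use the upper case symbols GENV, LENV and STORE to capture the run-time bindings. They are similar to their counterparts found in SSAFJ-EH.

\[
\begin{array}{ll}
\text{(Global Decl Env)} & \mathrm{GENV} \subseteq (\mathrm{CLASS} \times \mathrm{FIELDDECL}) \cup ((\mathrm{CLASS} \times \mathrm{METHODNAME}) \times \mathrm{METHODDECL}) \\
\text{(Local Decl Env)} & \mathrm{LENV} \subseteq (\mathrm{VARIABLE} \times \mathrm{VALUE}) \\
\text{(Memory Store)} & \mathrm{STORE} \subseteq (\mathrm{MEMLOC} \times \mathrm{OBJECT}) \\
\text{(OBJECT)} & obj ::= obj(T, \rho)
\end{array}
\]

\[
\begin{array}{l}
\mathsf{MD}_{fj\lambda}\llbracket\cdot\rrbracket :: \mathrm{METHODDECL} \to \mathrm{VALUE} \to \mathrm{VALUE} \to \mathrm{GENV} \to \mathrm{STORE} \to (\mathrm{VALUE}, \mathrm{STORE}) \\
\mathsf{MD}_{fj\lambda}\llbracket T\ M\ (T\ X)\ \{\overline{VD};\ \overline{S}\} \rrbracket\ V_0\ V_x\ genv\ st = \\
\quad \mathbf{let}\ lenv = \mathsf{VD}_{fj\lambda}\llbracket \overline{VD} \rrbracket\ \{(\mathbf{this}, V_0), (X, V_x)\} \\
\quad \mathbf{in}\ \mathbf{case}\ \mathsf{S}_{fj\lambda}\llbracket \overline{S} \rrbracket\ genv\ lenv\ st\ \mathbf{of}\ (V, \_, st') \to (V, st') \\[6pt]
\mathsf{VD}_{fj\lambda}\llbracket\cdot\rrbracket :: [\mathrm{VARDECL}] \to \mathrm{LENV} \to \mathrm{LENV} \\
\mathsf{VD}_{fj\lambda}\llbracket FT\ X = \lambda;\ \overline{VD} \rrbracket\ lenv = \mathsf{VD}_{fj\lambda}\llbracket \overline{VD} \rrbracket\ (lenv + (X, \lambda)) \\[6pt]
\mathsf{S}_{fj\lambda}\llbracket\cdot\rrbracket :: \mathrm{STATEMENT} \to \mathrm{GENV} \to \mathrm{LENV} \to \mathrm{STORE} \to (\mathrm{VALUE}, \mathrm{LENV}, \mathrm{STORE}) \\
\mathsf{S}_{fj\lambda}\llbracket [\,] \rrbracket\ genv\ lenv\ st = (\mathbf{null}, lenv, st) \\
\mathsf{S}_{fj\lambda}\llbracket A;\ \overline{S} \rrbracket\ genv\ lenv\ st = \mathbf{let}\ (V, lenv', st') = \mathsf{A}_{fj\lambda}\llbracket A \rrbracket\ genv\ lenv\ st\ \mathbf{in}\ \mathsf{S}_{fj\lambda}\llbracket \overline{S} \rrbracket\ genv\ lenv'\ st' \\
\mathsf{S}_{fj\lambda}\llbracket \mathbf{return}\ E \rrbracket\ genv\ lenv\ st = \mathbf{case}\ \mathsf{E}_{fj\lambda}\llbracket E \rrbracket\ genv\ lenv\ st\ \mathbf{of}\ (V, st') \to (V, lenv, st') \\
\mathsf{S}_{fj\lambda}\llbracket \mathbf{if}\ (E)\ \{\overline{S_1}\}\ \mathbf{else}\ \{\overline{S_2}\} \rrbracket\ genv\ lenv\ st = \mathbf{case}\ \mathsf{E}_{fj\lambda}\llbracket E \rrbracket\ genv\ lenv\ st\ \mathbf{of} \\
\quad (true, st') \to \mathsf{S}_{fj\lambda}\llbracket \overline{S_1} \rrbracket\ genv\ lenv\ st' \\
\quad (false, st') \to \mathsf{S}_{fj\lambda}\llbracket \overline{S_2} \rrbracket\ genv\ lenv\ st' \\[6pt]
\mathsf{A}_{fj\lambda}\llbracket\cdot\rrbracket :: \mathrm{ASSIGNMENT} \to \mathrm{GENV} \to \mathrm{LENV} \to \mathrm{STORE} \to (\mathrm{VALUE}, \mathrm{LENV}, \mathrm{STORE}) \\[6pt]
\mathsf{E}_{fj\lambda}\llbracket\cdot\rrbracket :: \mathrm{EXPRESSION} \to \mathrm{GENV} \to \mathrm{LENV} \to \mathrm{STORE} \to (\mathrm{VALUE}, \mathrm{STORE}) \\
\mathsf{E}_{fj\lambda}\llbracket X(\overline{E}) \rrbracket\ genv\ lenv\ st = \mathbf{case}\ lenv(X)\ \mathbf{of} \\
\quad ((\overline{FT\ X}) \to \{\overline{S}\}) \to \mathbf{case}\ \mathsf{S}_{fj\lambda}\llbracket \overline{S} \rrbracket\ genv\ lenv'\ st_n\ \mathbf{of}\ (V, \_, st') \to (V, st') \\
\quad \mathbf{where}\ (V_1, st_1) = \mathsf{E}_{fj\lambda}\llbracket E_1 \rrbracket\ genv\ lenv\ st \\
\quad\qquad\ \ \dots \\
\quad\qquad\ \ (V_n, st_n) = \mathsf{E}_{fj\lambda}\llbracket E_n \rrbracket\ genv\ lenv\ st_{n-1} \\
\quad\qquad\ \ lenv' = lenv + (X_1, V_1) + \dots + (X_n, V_n) \\
\mathsf{E}_{fj\lambda}\llbracket E_1.M(E_2) \rrbracket\ genv\ lenv\ st = \mathbf{case}\ \mathsf{E}_{fj\lambda}\llbracket E_1 \rrbracket\ genv\ lenv\ st\ \mathbf{of} \\
\quad (loc(n), st_1) \to \mathbf{case}\ st_1(loc(n))\ \mathbf{of} \\
\quad\quad obj(T, \rho) \to \mathbf{case}\ genv(T, M)\ \mathbf{of} \\
\quad\quad\quad MD \to \mathbf{case}\ \mathsf{E}_{fj\lambda}\llbracket E_2 \rrbracket\ genv\ lenv\ st_1\ \mathbf{of} \\
\quad\quad\quad\quad (V_2, st_2) \to \mathsf{MD}_{fj\lambda}\llbracket MD \rrbracket\ loc(n)\ V_2\ genv\ st_2
\end{array}
\]

Figure 8. Denotational Semantics of FJ$_\lambda$

$\mathsf{MD}_{fj\lambda}\llbracket\cdot\rrbracket$ is similar to $\mathsf{MD}_{ssa}\llbracket\cdot\rrbracket$ except that it does not keep track of labels and exceptions. $\mathsf{VD}_{fj\lambda}\llbracket\cdot\rrbracket$ is nearly identical to $\mathsf{VD}_{ssa}\llbracket\cdot\rrbracket$ except that it handles an extra case of local function declaration.

$\mathsf{S}_{fj\lambda}\llbracket\cdot\rrbracket$ is a simplified version of $\mathsf{B}_{ssa}\llbracket\cdot\rrbracket$ without the need of keeping track of the labels and the exceptions.

$\mathsf{A}_{fj\lambda}\llbracket\cdot\rrbracket$ is nearly identical to $\mathsf{A}_{ssa}\llbracket\cdot\rrbracket$, hence its definitions are omitted.

$\mathsf{E}_{fj\lambda}\llbracket\cdot\rrbracket$ differs from $\mathsf{E}_{ssa}\llbracket\cdot\rrbracket$ in the case of function/method application. There are two different scenarios. (I) In the case of $\mathsf{E}_{fj\lambda}\llbracket X(\overline{E}) \rrbracket$, $lenv(X)$ yields a lambda abstraction. We evaluate all the actual parameters $E_1$ to $E_n$ into $V_1$ to $V_n$ with the memory store being updated and propagated. We create an extended local environment by binding the $X_i$s to the $V_i$s. Finally we proceed with the evaluation of the body of the lambda abstraction under the new environment and memory store. (II) In the case of method application $\mathsf{E}_{fj\lambda}\llbracket E_1.M(E_2) \rrbracket$, we evaluate $E_1$ into a memory location $loc(n)$ with an updated memory store $st_1$. By looking up $st_1(loc(n))$ we obtain the object and its class $T$, from which we retrieve the definition of the method associated with the name $M$. We then evaluate $E_2$ and apply the resulting value to the method.

4.3 SSAFJ-EH to FJ$_\lambda$ Translation using CPS

We describe the SSAFJ-EH to FJ$_\lambda$ translation using CPS in Figures 9 and 10. Specifically, we use command-based continuation passing style.

There are mainly two types of continuations, the exception continuation Exception => void and the normal continuation void => void. Each function in CPS form expects the first argument as the exception continuation and the second one as the normal continuation, except for the top level method.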
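To make the two continuation types concrete, the following is a minimal sketch in plain Java rather than FJ$_\lambda$. The names `ExCont`, `NmCont`, `Block`, `div` and `run` are hypothetical and introduced only for illustration; the block is uncurried for brevity, whereas the lambdas in the translation are curried. It only illustrates the calling convention (exception continuation first, normal continuation second) and is not the translation scheme itself.

```java
import java.util.function.Consumer;

final class CpsSketch {
    // Hypothetical aliases for "Exception => void" and "void => void".
    interface ExCont extends Consumer<Exception> {}
    interface NmCont extends Runnable {}

    // A CPS block: exception continuation first, normal continuation second.
    // Uncurried here for brevity (an assumption of this sketch).
    interface Block { void apply(ExCont raise, NmCont k); }

    static int res; // stands in for the top-level variable res

    // Integer division in CPS: nothing is returned and nothing is thrown;
    // control leaves the block only through raise or k.
    static Block div(int a, int b) {
        return (raise, k) -> {
            if (b == 0) {
                raise.accept(new ArithmeticException("division by zero"));
            } else {
                res = a / b;
                k.run();
            }
        };
    }

    // A direct-style wrapper around the CPS block, in the spirit of a
    // top level method that supplies the initial continuations.
    static String run(int a, int b) {
        StringBuilder out = new StringBuilder();
        div(a, b).apply(e -> out.append("raised: ").append(e.getMessage()),
                        () -> out.append("result: ").append(res));
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(run(6, 3));
        System.out.println(run(1, 0));
    }
}
```

The wrapper `run` is the only place where continuations are created from scratch, which mirrors the remark that the top level method is the exception to the two-continuation convention.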

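\[
\begin{array}{l}
\mathsf{CMD}_{cps}\llbracket\cdot\rrbracket :: \mathrm{MethodDecl} \to \mathrm{METHODDECL} \\
\mathsf{CMD}_{cps}\llbracket t'\ m\ (t\ x)\ \{\overline{vd};\ \overline{b}\} \rrbracket = \\
\quad \mathbf{let}\ \overline{VD} = \mathsf{CVD}_{cps}\llbracket \overline{vd} \rrbracket \\
\quad\quad (\overline{VD'}, E) = \mathsf{CB}_{cps}\llbracket [x/\mathit{input}]\overline{b} \rrbracket\ [\,]\ [\,] \\
\quad\quad T = t;\ T' = t';\ X = x;\ M = m \\
\quad\quad D = T \Rightarrow (\mathit{Exception} \Rightarrow \mathit{void}) \Rightarrow (T' \Rightarrow \mathit{void}) \Rightarrow \mathit{void}\ M_{cps} = \\
\quad\quad\quad (T\ \mathit{in}) \to (\mathit{Exception} \Rightarrow \mathit{void}\ \mathit{raise}) \to (T' \Rightarrow \mathit{void}\ k) \to \\
\quad\quad\quad \{\mathit{input} = \mathit{in};\ \mathbf{return}\ E(\mathit{raise})(() \to k(\mathit{res}))\} \\
\quad \mathbf{in}\ T'\ M\ (T\ X)\ \{\overline{VD} +\!+\ \overline{VD'} +\!+\ [D];\ T\ \mathit{input};\ T'\ \mathit{res};\ \mathit{Exception}\ \mathit{ex}; \\
\quad\quad M_{cps}(X)(\mathit{id\_raise})(r \to \{\mathit{res} = r;\ \mathbf{return};\});\ \mathbf{return}\ \mathit{res};\} \\[6pt]
\mathsf{CVD}_{cps}\llbracket\cdot\rrbracket :: [\mathrm{VarDecl}] \to [\mathrm{VARDECL}] \\
\mathsf{CB}_{cps}\llbracket\cdot\rrbracket :: [\mathrm{Block}] \to [\mathrm{Phi}] \to [\mathrm{Phi}] \to ([\mathrm{VARDECL}], \mathrm{EXPRESSION}) \\[6pt]
\mathsf{CB}_{cps}\llbracket l : \{\mathbf{if}\ (e)\ \{\overline{b'}\}\ \mathbf{else}\ \{\overline{b''}\}\ \mathbf{join}\ \{\overline{\phi}\}\} \rrbracket\ \overline{\phi_k}\ \overline{\phi_r} = \\
\quad \mathbf{let}\ (\overline{D'}, E') = \mathsf{CB}_{cps}\llbracket \overline{b'} \rrbracket\ \overline{\phi}\ \overline{\phi_r} \\
\quad\quad (\overline{D''}, E'') = \mathsf{CB}_{cps}\llbracket \overline{b''} \rrbracket\ \overline{\phi}\ \overline{\phi_r} \\
\quad\quad E = \mathsf{CE}_{cps}\llbracket e \rrbracket \\
\quad\quad (\overline{D'''}, E''') = \mathsf{CK}_{cps}\llbracket \overline{\phi_k} \rrbracket\ l \\
\quad \mathbf{in}\ (\overline{D'} +\!+\ \overline{D''} +\!+\ \overline{D'''},\ \mathit{seq}(\mathit{ifelse}(() \to E, E', E''), E''')) \\[6pt]
\mathsf{CB}_{cps}\llbracket l : \{\mathbf{if}\ (e)\ \{\overline{b'}\}\ \mathbf{else}\ \{\overline{b''}\}\ \mathbf{join}\ \{\overline{\phi}\}\};\ \overline{b} \rrbracket\ \overline{\phi_k}\ \overline{\phi_r} = \\
\quad \mathbf{let}\ (\overline{D'}, E') = \mathsf{CB}_{cps}\llbracket \overline{b'} \rrbracket\ \overline{\phi}\ \overline{\phi_r} \\
\quad\quad (\overline{D''}, E'') = \mathsf{CB}_{cps}\llbracket \overline{b''} \rrbracket\ \overline{\phi}\ \overline{\phi_r} \\
\quad\quad E = \mathsf{CE}_{cps}\llbracket e \rrbracket \\
\quad\quad (\overline{D'''}, E''') = \mathsf{CB}_{cps}\llbracket \overline{b} \rrbracket\ \overline{\phi_k}\ \overline{\phi_r} \\
\quad \mathbf{in}\ (\overline{D'} +\!+\ \overline{D''} +\!+\ \overline{D'''},\ \mathit{seq}(\mathit{ifelse}(() \to E, E', E''), E''')) \\[6pt]
\mathsf{CB}_{cps}\llbracket l : \{\mathbf{join}\ \{\overline{\phi}\}\ \mathbf{while}\ (e)\ \{\overline{b'}\}\} \rrbracket\ \overline{\phi_k}\ \overline{\phi_r} = \\
\quad \mathbf{let}\ (\overline{D}, E) = \mathsf{CK}_{cps}\llbracket \overline{\phi} \rrbracket\ \mathit{minLabel}(\overline{\phi}) \\
\quad\quad E' = \mathsf{CE}_{cps}\llbracket e \rrbracket \\
\quad\quad (\overline{D''}, E'') = \mathsf{CB}_{cps}\llbracket \overline{b'} \rrbracket\ \overline{\phi}\ \overline{\phi_r} \\
\quad\quad (\overline{D'''}, E''') = \mathsf{CK}_{cps}\llbracket \overline{\phi_k} \rrbracket\ l \\
\quad \mathbf{in}\ (\overline{D} +\!+\ \overline{D''} +\!+\ \overline{D'''},\ \mathit{seq}(E, \mathit{loop}(() \to E', E'', E'''))) \\[6pt]
\mathsf{CB}_{cps}\llbracket l : \{\mathbf{join}\ \{\overline{\phi}\}\ \mathbf{while}\ (e)\ \{\overline{b'}\}\};\ \overline{b} \rrbracket\ \overline{\phi_k}\ \overline{\phi_r} = \\
\quad \mathbf{let}\ (\overline{D}, E) = \mathsf{CK}_{cps}\llbracket \overline{\phi} \rrbracket\ \mathit{minLabel}(\overline{\phi}) \\
\quad\quad E' = \mathsf{CE}_{cps}\llbracket e \rrbracket \\
\quad\quad (\overline{D''}, E'') = \mathsf{CB}_{cps}\llbracket \overline{b'} \rrbracket\ \overline{\phi}\ \overline{\phi_r} \\
\quad\quad (\overline{D'''}, E''') = \mathsf{CB}_{cps}\llbracket \overline{b} \rrbracket\ \overline{\phi_k}\ \overline{\phi_r} \\
\quad \mathbf{in}\ (\overline{D} +\!+\ \overline{D''} +\!+\ \overline{D'''},\ \mathit{seq}(E, \mathit{loop}(() \to E', E'', E'''))) \\[6pt]
\mathsf{CB}_{cps}\llbracket l : \{\mathbf{throw}\ e\} \rrbracket\ \overline{\phi_k}\ \overline{\phi_r} = \\
\quad \mathbf{let}\ (\overline{X}, \overline{E}) = \mathsf{CF}_{cps}\llbracket \overline{\phi_r} \rrbracket\ l \\
\quad\quad D = (\mathit{Exception} \Rightarrow \mathit{void}) \Rightarrow (\mathit{void} \Rightarrow \mathit{void}) \Rightarrow \mathit{void}\ m_l = \\
\quad\quad\quad (\mathit{Exception} \Rightarrow \mathit{void}\ \mathit{raise}) \to (\mathit{void} \Rightarrow \mathit{void}\ k) \to \{\overline{X = E};\ \mathbf{return}\ \mathit{raise}(\mathsf{CE}_{cps}\llbracket e \rrbracket);\} \\
\quad \mathbf{in}\ ([D], m_l) \\[6pt]
\mathsf{CB}_{cps}\llbracket l : \{\mathbf{return}\ e\} \rrbracket\ \overline{\phi_k}\ \overline{\phi_r} = \\
\quad \mathbf{let}\ E = \mathsf{CE}_{cps}\llbracket e \rrbracket \\
\quad\quad D = (\mathit{Exception} \Rightarrow \mathit{void}) \Rightarrow (\mathit{void} \Rightarrow \mathit{void}) \Rightarrow \mathit{void}\ m_l = \\
\quad\quad\quad (\mathit{Exception} \Rightarrow \mathit{void}\ \mathit{raise}) \to (\mathit{void} \Rightarrow \mathit{void}\ k) \to \{\mathit{res} = E;\ \mathbf{return}\ k();\} \\
\quad \mathbf{in}\ ([D], m_l) \\[6pt]
\mathsf{CB}_{cps}\llbracket l : \{\mathbf{try}\ \{\overline{b}\}\ \mathbf{join}\ \{\overline{\phi'_r}\}\ \mathbf{catch}\ (t\ x)\ \{\overline{b'}\}\ \mathbf{join}\ \{\overline{\phi'_k}\}\} \rrbracket\ \overline{\phi_k}\ \overline{\phi_r} = \\
\quad \mathbf{let}\ (\overline{D}, E) = \mathsf{CB}_{cps}\llbracket \overline{b} \rrbracket\ \overline{\phi'_k}\ \overline{\phi'_r} \\
\quad\quad (\overline{D'}, E') = \mathsf{CB}_{cps}\llbracket \overline{b'} \rrbracket\ \overline{\phi'_k}\ \overline{\phi_r} \\
\quad\quad E'' = (\mathit{Exception}\ x) \to \{\mathit{ex} = x;\ \mathbf{return}\ E';\} \\
\quad\quad (\overline{D'''}, E''') = \mathsf{CK}_{cps}\llbracket \overline{\phi_k} \rrbracket\ l \\
\quad \mathbf{in}\ (\overline{D} +\!+\ \overline{D'} +\!+\ \overline{D'''},\ \mathit{seq}(\mathit{trycatch}(E, E''), E''')) \\[6pt]
\mathsf{CB}_{cps}\llbracket l : \{\mathbf{try}\ \{\overline{b}\}\ \mathbf{join}\ \{\overline{\phi'_r}\}\ \mathbf{catch}\ (t\ x)\ \{\overline{b'}\}\ \mathbf{join}\ \{\overline{\phi'_k}\}\};\ \overline{b''} \rrbracket\ \overline{\phi_k}\ \overline{\phi_r} = \\
\quad \mathbf{let}\ (\overline{D}, E) = \mathsf{CB}_{cps}\llbracket \overline{b} \rrbracket\ \overline{\phi'_k}\ \overline{\phi'_r} \\
\quad\quad (\overline{D'}, E') = \mathsf{CB}_{cps}\llbracket \overline{b'} \rrbracket\ \overline{\phi'_k}\ \overline{\phi_r} \\
\quad\quad E'' = (\mathit{Exception}\ x) \to \{\mathit{ex} = x;\ \mathbf{return}\ E';\} \\
\quad\quad (\overline{D'''}, E''') = \mathsf{CB}_{cps}\llbracket \overline{b''} \rrbracket\ \overline{\phi_k}\ \overline{\phi_r} \\
\quad \mathbf{in}\ (\overline{D} +\!+\ \overline{D'} +\!+\ \overline{D'''},\ \mathit{seq}(\mathit{trycatch}(E, E''), E'''))
\end{array}
\]

Figure 9. SSAFJ-EH to FJ$_\lambda$ Translation (Part 1)

\[
\begin{array}{l}
\mathsf{CB}_{cps}\llbracket l : \{a\} \rrbracket\ \overline{\phi_k}\ \overline{\phi_r} = \\
\quad \mathbf{let}\ A = \mathsf{CA}_{cps}\llbracket a \rrbracket \\
\quad\quad D = (\mathit{Exception} \Rightarrow \mathit{void}) \Rightarrow (\mathit{void} \Rightarrow \mathit{void}) \Rightarrow \mathit{void}\ m_l = \\
\quad\quad\quad (\mathit{Exception} \Rightarrow \mathit{void}\ \mathit{raise}) \to (\mathit{void} \Rightarrow \mathit{void}\ k) \to \{A;\ \mathbf{return}\ k();\} \\
\quad\quad (\overline{D'}, E') = \mathsf{CK}_{cps}\llbracket \overline{\phi_k} \rrbracket\ l \\
\quad \mathbf{in}\ ([D] +\!+\ \overline{D'},\ \mathit{seq}(m_l, E')) \\[6pt]
\mathsf{CB}_{cps}\llbracket l : \{a\};\ \overline{b} \rrbracket\ \overline{\phi_k}\ \overline{\phi_r} = \\
\quad \mathbf{let}\ A = \mathsf{CA}_{cps}\llbracket a \rrbracket \\
\quad\quad D = (\mathit{Exception} \Rightarrow \mathit{void}) \Rightarrow (\mathit{void} \Rightarrow \mathit{void}) \Rightarrow \mathit{void}\ m_l = \\
\quad\quad\quad (\mathit{Exception} \Rightarrow \mathit{void}\ \mathit{raise}) \to (\mathit{void} \Rightarrow \mathit{void}\ k) \to \{A;\ \mathbf{return}\ k();\} \\
\quad\quad (\overline{D'}, E') = \mathsf{CB}_{cps}\llbracket \overline{b} \rrbracket\ \overline{\phi_k}\ \overline{\phi_r} \\
\quad \mathbf{in}\ ([D] +\!+\ \overline{D'},\ \mathit{seq}(m_l, E')) \\[6pt]
\mathsf{CB}_{cps}\llbracket l : \{x = e_1.m(e_2)\} \rrbracket\ \overline{\phi_k}\ \overline{\phi_r} = \\
\quad \mathbf{let}\ E_1 = \mathsf{CE}_{cps}\llbracket e_1 \rrbracket;\ E_2 = \mathsf{CE}_{cps}\llbracket e_2 \rrbracket \\
\quad\quad D = (\mathit{Exception} \Rightarrow \mathit{void}) \Rightarrow (\mathit{void} \Rightarrow \mathit{void}) \Rightarrow \mathit{void}\ m_l = \\
\quad\quad\quad (\mathit{Exception} \Rightarrow \mathit{void}\ \mathit{raise}) \to (\mathit{void} \Rightarrow \mathit{void}\ k) \to \\
\quad\quad\quad \{E_1.m_{cps}(E_2)(\mathit{raise})((T\ v) \to \{x = v;\ \mathbf{return}\ k();\})\} \\
\quad\quad (\overline{D'}, E') = \mathsf{CK}_{cps}\llbracket \overline{\phi_k} \rrbracket\ l \\
\quad \mathbf{in}\ ([D] +\!+\ \overline{D'},\ \mathit{seq}(m_l, E')) \\[6pt]
\mathsf{CB}_{cps}\llbracket l : \{x = e_1.m(e_2)\};\ \overline{b} \rrbracket\ \overline{\phi_k}\ \overline{\phi_r} = \\
\quad \mathbf{let}\ E_1 = \mathsf{CE}_{cps}\llbracket e_1 \rrbracket;\ E_2 = \mathsf{CE}_{cps}\llbracket e_2 \rrbracket \\
\quad\quad D = (\mathit{Exception} \Rightarrow \mathit{void}) \Rightarrow (\mathit{void} \Rightarrow \mathit{void}) \Rightarrow \mathit{void}\ m_l = \\
\quad\quad\quad (\mathit{Exception} \Rightarrow \mathit{void}\ \mathit{raise}) \to (\mathit{void} \Rightarrow \mathit{void}\ k) \to \\
\quad\quad\quad \{E_1.m_{cps}(E_2)(\mathit{raise})((T\ v) \to \{x = v;\ \mathbf{return}\ k();\})\} \\
\quad\quad (\overline{D'}, E') = \mathsf{CB}_{cps}\llbracket \overline{b} \rrbracket\ \overline{\phi_k}\ \overline{\phi_r} \\
\quad \mathbf{in}\ ([D] +\!+\ \overline{D'},\ \mathit{seq}(m_l, E')) \\[6pt]
\mathsf{CK}_{cps}\llbracket\cdot\rrbracket :: [\mathrm{Phi}] \to \mathit{Label} \to ([\mathrm{VARDECL}], \mathrm{EXPRESSION}) \\
\mathsf{CK}_{cps}\llbracket \overline{\phi} \rrbracket\ l = \\
\quad \mathbf{let}\ (\overline{X}, \overline{E}) = \mathsf{CF}_{cps}\llbracket \overline{\phi} \rrbracket\ l \\
\quad\quad D = (\mathit{Exception} \Rightarrow \mathit{void}) \Rightarrow (\mathit{void} \Rightarrow \mathit{void}) \Rightarrow \mathit{void}\ mk_l = \\
\quad\quad\quad (\mathit{Exception} \Rightarrow \mathit{void}\ \mathit{raise}) \to (\mathit{void} \Rightarrow \mathit{void}\ k) \to \{\overline{X = E};\ \mathbf{return}\ k();\} \\
\quad \mathbf{in}\ ([D], mk_l) \\[6pt]
\mathsf{CA}_{cps}\llbracket\cdot\rrbracket :: [\mathrm{Assignment}] \to [\mathrm{ASSIGNMENT}] \\
\mathsf{CE}_{cps}\llbracket\cdot\rrbracket :: \mathrm{Expression} \to \mathrm{EXPRESSION} \\[6pt]
\mathsf{CF}_{cps}\llbracket\cdot\rrbracket :: [\mathrm{Phi}] \to \mathit{Label} \to [(\mathrm{VARIABLE}, \mathrm{EXPRESSION})] \\
\mathsf{CF}_{cps}\llbracket [\,] \rrbracket\ \_ = [\,] \\
\mathsf{CF}_{cps}\llbracket x = \mathit{phi}(\dots, l_i : x_i, \dots);\ \overline{\phi} \rrbracket\ l \mid l == l_i\ = (x, x_i) : \mathsf{CF}_{cps}\llbracket \overline{\phi} \rrbracket\ l
\end{array}
\]

Figure 10. SSAFJ-EH to FJ$_\lambda$ Translation (Part 2)

The function $\mathsf{CMD}_{cps}\llbracket\cdot\rrbracket$ converts a method from SSAFJ-EH to FJ$_\lambda$ using CPS. Given an input method in SSAFJ-EH of type t => t', the conversion synthesizes the output (or translated) method of type T => (Exception => void) => (T' => void) => void, by letting $T = t$ and $T' = t'$. The first argument is the input, the second argument is an exception continuation, and the third argument is the normal continuation. The conversion consists of the following steps. Firstly we apply the helper function $\mathsf{CVD}_{cps}\llbracket\cdot\rrbracket$ to translate the declarations. Since $\mathsf{CVD}_{cps}\llbracket\cdot\rrbracket$ is an identity function, we omit its details. As the second step, we apply the helper function $\mathsf{CB}_{cps}\llbracket\cdot\rrbracket$, which translates the list of blocks from the source method. The result of the translation is a pair consisting of a list of local lambda declarations and a main expression. The main expression $E$ is then applied to the exception continuation $\mathit{raise}$ and the normal continuation $() \to k(\mathit{res})$. At last we synthesize the public interfacing method $M$ which wraps around the CPS counterpart $M_{cps}$.$^2$

Most of the translation tasks are computed in the helper function $\mathsf{CB}_{cps}\llbracket\cdot\rrbracket$. The function expects a list of blocks, a list of $\phi$ assignments from the subsequent block in the normal continuation and a list of $\phi$ assignments from the subsequent block in the exception continuation. $\mathsf{CB}_{cps}\llbracket\cdot\rrbracket$ translates the blocks structurally.

• In case of a singleton list containing an if-else block, we apply a helper function $\mathsf{CE}_{cps}\llbracket\cdot\rrbracket$ to translate the conditional expression. Then we apply $\mathsf{CB}_{cps}\llbracket\cdot\rrbracket$ recursively to the blocks from the then-branch and the else-branch by using the $\phi$ assignments, $\overline{\phi}$, from the if-else statement's join clause. To "connect" the translated if-else back to the subsequent block in the normal continuation, we apply another helper function $\mathsf{CK}_{cps}\llbracket\cdot\rrbracket$ to construct a continuation that resolves $\overline{\phi_k}$ with respect to $l$. The main expression is constructed structurally from the derived expressions from the various sub-steps with the $\mathit{seq}$ and $\mathit{ifelse}$ combinators, whose definitions can be found in Figure 3.
• In case of a non-singleton list of which the head is an if-else block, we perform a trick similar to the previous case, except that we do not construct a continuation with $\mathsf{CK}_{cps}\llbracket\cdot\rrbracket$ to resolve $\overline{\phi_k}$. Instead we apply $\mathsf{CB}_{cps}\llbracket\cdot\rrbracket$ to $\overline{b}$ recursively.
• In case of a singleton list containing a while block, we first need to apply $\mathsf{CK}_{cps}\llbracket\cdot\rrbracket$ to resolve the $\overline{\phi}$ with respect to the label of the block from which we enter the while loop. For convenience, we assume that there exists a partial order among labels, i.e. $L_i \prec L_j$ implies that $L_i$ must be on the path leading from $L_0$ to $L_j$, where $L_0$ is the method's entry label. We assume that there are only two labels in the $\phi$ assignments in all while blocks, i.e. the first label is the entry label to the while block, and the second label is the loop-back label, and $\mathit{minLabel}(\overline{\phi})$ returns the entry label. Such a restrictive form does not limit the expressiveness of the language. We assume that there exists a pre-processing step that converts any program into this form. After resolving $\overline{\phi}$ with respect to the entry label to the while block, we apply $\mathsf{CB}_{cps}\llbracket\cdot\rrbracket$ to $\overline{b'}$ recursively to translate the while body. Lastly we apply $\mathsf{CK}_{cps}\llbracket\cdot\rrbracket$ to construct a continuation that resolves $\overline{\phi_k}$ with respect to $l$. We build the main expression using the $\mathit{seq}$ and $\mathit{loop}$ combinators, whose definitions can be found in Figure 3.
• In case of a singleton list containing a try-catch block, we apply $\mathsf{CB}_{cps}\llbracket\cdot\rrbracket$ recursively to the block in the try clause $\overline{b}$ with $\overline{\phi'_k}$ as $\phi$ assignments from the normal continuation and $\overline{\phi'_r}$ from the exception continuation. The catch clause block $\overline{b'}$ is translated with $\overline{\phi'_k}$ as $\phi$ assignments from the normal continuation and $\overline{\phi_r}$ from the exception continuation. In order to bind the exception into the variable $x$, we define a wrapper lambda expression which expects the exception as input and assigns it to $\mathit{ex}$. (Recall that $\mathit{ex}$ is defined in the top level method.) Lastly, we construct a connecting continuation by resolving $\overline{\phi_k}$ with the current label $l$.
• In case of a singleton list containing a throw block, we first resolve the $\overline{\phi_r}$ from the exception handler with respect to the current label $l$ by calling $\mathsf{CF}_{cps}\llbracket\cdot\rrbracket$. Taking the result from the $\phi$ resolution, we define a continuation function $m_l$ in which we bind the results, and call the exception continuation $\mathit{raise}()$ with the translation of $e$.
• In case of a singleton list containing a return block, we define a continuation function $m_l$ in which we assign the translation of $e$ to $\mathit{res}$. (Recall that $\mathit{res}$ is defined in the top level method.) Then we call the continuation $k$.
• In case of a list with a method invocation as the head, we translate the sub-expressions $e_1$ and $e_2$ into $E_1$ and $E_2$. We define a continuation function $m_l$ in which we invoke $E_1.m_{cps}(E_2)$ with $\mathit{raise}$ as the exception continuation; the normal continuation is a lambda expression that captures the result of the method invocation into an argument $v$. In the body of the lambda expression we assign $v$ to $x$ before invoking the continuation $k$. Note that we treat $M_{cps}$ the same as $m_{cps}$, and the call of $m_{cps}$ could be a recursive call or a call to another method sharing the same closure context in the same scope.

The rest of the $\mathsf{CB}_{cps}\llbracket\cdot\rrbracket$ cases are trivial.

The helper function $\mathsf{CF}_{cps}\llbracket\cdot\rrbracket$ takes a list of $\phi$ assignments and a label, and returns a list of variable-expression pairs. For each $\phi$ assignment, it picks the $x_i$ associated with the matching label $l$ as the second component of the resulting pair.

The helper function $\mathsf{CK}_{cps}\llbracket\cdot\rrbracket$ synthesizes a continuation function that connects the block with label $l$ with the block in which $\overline{\phi}$ is defined, by making use of $\mathsf{CF}_{cps}\llbracket\cdot\rrbracket$.

Helper functions $\mathsf{CA}_{cps}\llbracket\cdot\rrbracket$ and $\mathsf{CE}_{cps}\llbracket\cdot\rrbracket$ are identity functions, whose definitions are omitted.

In Figure 11, we find the full CPS translation of the get method in Figure 5. The result should be identical to the one in Figure 2, except that we do not apply "flattening" to nested and curried function calls, and we insert extra connection blocks that arise from the $\phi$ resolutions.

Definition 1 (Consistent Global Environments). Let $genv \in \mathrm{GEnv}$ and $genv' \in \mathrm{GENV}$. Then we say $genv \vdash genv'$ iff $\forall (C, m) \in dom(genv) : genv'(C, m) = \mathsf{CMD}_{cps}\llbracket genv(C, m) \rrbracket$.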

$^2$The continuation $r \to \{\mathit{res} = r;\ \mathbf{return};\}$ could have been simplified to $r \to \{\mathbf{return};\}$. However, we keep the former for consistency.

Lemma 4.1 (SSAFJ-EH to FJ$_\lambda$ Translation Consistency). Let $m$ be a SSAFJ-EH method of a class $C$, $o$ be (a reference to) an object of class $C$, and $v$ be a value such that $o.m(v)$ is well-typed and terminating. Let $M = \mathsf{CMD}_{cps}\llbracket m \rrbracket$ [...] $v\ genv$ [...] $\mathsf{MD}_{fj\lambda}\llbracket M \rrbracket\ o\ v\ genv$.

    int get(int x) {
      int i_1, i_2, i_5, i_6, t_6, r_1, r_2, r_7;
      int input, res; Exception ex;

      int => ExCont => (int => void) => void get_cps =
        x -> raise -> k -> {
          input = x;
          return seq(get1, seq(trycatch(seq(ifelse(n -> {input < i_1},
                     get4, seq(getk3b, loop(n -> {i_5 [...] {ex = e; return seq(get8, get8k);}),
                     get9))(raise)(n -> k(res))
        }
      ExCont => NmCont => void get1 =
        (ExCont raise) -> (NmCont k) -> {
          i_1 = this.lpos; r_1 = -1; return k();
        }
      ExCont => NmCont => void getk3a = raise -> k
        -> {r_2 = r_7; return k();}
      ExCont => NmCont => void get4 = raise -> k
        -> {i_2 = i_1; raise(new Exception());}
      ExCont => NmCont => void getk3b = raise -> k
        -> {i_5 = i_1; return k();}
      ExCont => NmCont => void get6 = raise -> k
        -> { t_6 = this.f1 + this.f2; this.f1 = this.f2;
             this.f2 = t_6; i_6 = i_5 + 1; return k(); }
      ExCont => NmCont => void getk6 = raise -> k
        -> {i_5 = i_6; return k();}
      ExCont => NmCont => void get7 = raise -> k
        -> { this.lpos = i_5; r_7 = this.f2; return k(); }
      ExCont => NmCont => void get7k = raise -> k
        -> { i_2 = i_5; return k(); }
      ExCont => NmCont => void get8 = raise -> k
        -> { System.out.println("..."); return k(); }
      ExCont => NmCont => void get8k = raise -> k
        -> { r_2 = r_1; return k(); }
      ExCont => NmCont => void get9 = raise -> k
        -> { res = r_2; return k(); }
      get_cps(x)(id_raise)(i -> {res = i; return;});
      return res;
    }

Figure 11. SSA to CPS Translation of fib

5 Obfuscation Potency Analysis

We analyze the potency of the CPS-based control flow obfuscation.

Let us apply inter-procedural control flow analysis to the obfuscated code in Figures 2 and 3. Recall that the goal of the control flow analysis is to approximate the set of possible lambda abstractions that a program variable may capture during run-time. From that result, as an attacker, we can create a global control flow graph with all the lambdas and methods involved.

Let $\Lambda$ denote the set of all possible lambda values in the obfuscated program in FJ$_\lambda$. We have the following lattice mapping variables to sets of lambda functions.

\[ \text{(STATE)} \quad \sigma \subseteq (\mathrm{VARIABLE} \times 2^{\Lambda}) \]

5.1 Context Insensitive Control Flow Analysis

We define the flow function $\llbracket\cdot\rrbracket(\cdot) :: \mathrm{STATEMENT} \to \mathrm{STATE} \to \mathrm{STATE}$. The flow function takes a statement and a state and returns an updated state. Given a statement $S$, we write $\llbracket S \rrbracket$ to denote $\llbracket S \rrbracket(\sigma_S)$ by making $\sigma_S$ an implicit argument where $\sigma_S = \mathit{join}(S)$.

Let $S$ be a statement and $\mathit{pred}(S)$ denote the set of preceding statements of $S$. We define the join function $\mathit{join}(S)$ as

\[ \mathit{join}(S) = \bigcup_{P \in \mathit{pred}(S)} \llbracket P \rrbracket \]

The definition of the flow function is given as follows.

\[
\begin{array}{rcl}
\llbracket \mathbf{return}\ E \rrbracket(\sigma_S) &=& \sigma_S \\
\llbracket \mathbf{if}\ E\ \{S\}\ \mathbf{else}\ \{S'\} \rrbracket(\sigma_S) &=& \sigma_S \\
\llbracket E_1.F = E_2 \rrbracket(\sigma_S) &=& \sigma_S \\
\llbracket X = c \rrbracket(\sigma_S) &=& \sigma_S - X \cup [X \mapsto \emptyset] \\
\llbracket X = E.F \rrbracket(\sigma_S) &=& \sigma_S - X \cup [X \mapsto \emptyset] \\
\llbracket X = E\ op\ E' \rrbracket(\sigma_S) &=& \sigma_S - X \cup [X \mapsto \emptyset] \\
\llbracket X = \mathbf{new}\ T(\overline{E}) \rrbracket(\sigma_S) &=& \sigma_S - X \cup [X \mapsto \emptyset] \\
\llbracket X = \lambda \rrbracket(\sigma_S) &=& \sigma_S - X \cup [X \mapsto \{\lambda\}] \\
\llbracket X = Y \rrbracket(\sigma_S) &=& \sigma_S - X \cup [X \mapsto \sigma_S(Y)] \\
\llbracket X = E(E_1, \dots, E_n) \rrbracket(\sigma_S) &=& \sigma_S - X \cup [X \mapsto \mathit{returned}] \\
&& \text{where } \mathit{returned} = \bigcup_{\lambda \in \sigma_S(E)} \llbracket \mathit{returnStmt}(\lambda) \rrbracket \\
\llbracket X = E_1.M(E_2) \rrbracket(\sigma_S) &=& \sigma_S - X \cup [X \mapsto \mathit{returned}] \\
&& \text{where } T = \mathit{typeof}(E_1),\ \mathit{returned} = \bigcup_{\lambda \in \mathrm{GENV}(T, M)} \llbracket \mathit{returnStmt}(\lambda) \rrbracket
\end{array}
\]

The first three cases handle the return statement, the if statement and the field update. They do not contribute any changes to the abstract state. In the cases of constant assignment, field assignment, binary operation and object creation, we map $X$ to the empty set. In case of lambda assignment, we set $X$ to be a singleton set. In case of variable aliasing assignment, we set $X$'s mapping to the same as the rhs. In case of lambda function invocation, we update the mapping of the variable $X$ with a union of all returnable states from all the possible bindings of the variable $E$ which is bound to some lambda expressions. The case of method call is similar to the lambda function invocation except that we look up the lambda expression from the global environment.

We overload the flow function for a lambda function declaration, whose output abstract state will serve as the predecessor of the first statement in the function body.

\[ \llbracket \lambda \rrbracket = \bigcup_{S \in \mathit{caller}(\lambda)} \bot[a_1 \mapsto \mathit{eval}(\llbracket S \rrbracket, S_1), \dots, a_n \mapsto \mathit{eval}(\llbracket S \rrbracket, S_n)] \]

where $\{a_1, \dots, a_n\} = \mathit{formalArgs}(\lambda)$. Given a statement $S$ that calls $\lambda$, $S_i$ denotes the actual argument at the $i$th position. The helper function $\mathit{caller}(\cdot) :: \Lambda \to [\mathrm{STATEMENT}]$ returns the set of statements in which the function $\lambda$ is invoked. Let $\bar{\sigma}$ denote all the abstract states collected from all the statements of the target program.

\[ \mathit{caller}(\lambda) = \{ S \mid S \in \mathrm{STATEMENT} \wedge X \in dom(\bar{\sigma}) \wedge \lambda \in \sigma_S(X) \wedge X(\overline{E}) \in \mathit{rhs}(S) \text{ for some } \overline{E} \} \]

The helper function $\mathit{eval}(\cdot, \cdot) :: \mathrm{STATE} \to \mathrm{EXPRESSION} \to 2^{\Lambda}$ takes an abstract state and an expression and returns the set of lambda functions which the expression might evaluate to.

\[
\begin{array}{rcl}
\mathit{eval}(\sigma, c) &=& \emptyset \\
\mathit{eval}(\sigma, \lambda) &=& \{\lambda\} \\
\mathit{eval}(\sigma, X) &=& \sigma(X) \\
\mathit{eval}(\sigma, E\ op\ E') &=& \emptyset \\
\mathit{eval}(\sigma, E.F) &=& \emptyset \\
\mathit{eval}(\sigma, \mathbf{new}\ T(\overline{E})) &=& \emptyset
\end{array}
\]

We apply the above analysis to our running example in Figures 2 and 3 until the abstract state reaches the fixed point. We observe the following results.

    var            func                              | var            func                              | var            func
    raise7         $\lambda_{37}$                    | k7             $\lambda_{37}$                    | get2           $\lambda_{78}$
    get3           $\lambda_{90}$                    | get5           $\lambda_{52}$                    | get_1_2        $\lambda_{68}$
    pseq           $\lambda_{68}$                    | pseq_raise     $\lambda'_{68}$                   | raise18        $\lambda_{43}$
    k18            $\lambda_{70}$                    | hdl23          $\lambda_{23}$                    | raise26        $\lambda_{79}$
    k26            $\lambda_{70}$                    | raise28        $\lambda_{79}$                    | k28            $\lambda_{55}$
    raise31        $\lambda_{79}$                    | k31            $\lambda_{70}$                    | raise33        $\lambda_{43}$
    k33            $\lambda_{70}$                    | raise35        $\lambda_{43}$                    | k35            $\lambda_{15}$
    cond50         $\lambda_{8}$                     | visitor        $\lambda_{28}$                    | exit           $\lambda_{31}$
    raise52        $\lambda_{79}$                    | k52            $\lambda_{70}$                    | visitor_raise  $\lambda'_{28}$
    ploop          $\lambda'_{50}$                   | ploop_raise    $\lambda_{52}$                    | exit_raise     $\lambda'_{31}$
    raise68        $\lambda_{43}$                    | first          $\lambda_{18}, \lambda_{78}$      | second         $\lambda_{35}, \lambda_{68}$
    k68            $\lambda_{15}$                    | first_raise    $\lambda'_{18}, \lambda'_{78}$    | second_raise   $\lambda'_{35}, \lambda'_{68}$
    raise78        $\lambda_{43}$                    | k78            $\lambda_{70}$                    | hdl_ex         $\lambda_{33}$
    tr_hdl         $\lambda'_{90}$                   | hdl_ex_raise   $\lambda'_{33}$                   | cond88         $\lambda_{9}$
    th             $\lambda_{26}$                    | el             $\lambda_{52}$                    | raise90        $\lambda_{79}$
    k90            $\lambda_{70}$                    | th_raise       $\lambda'_{26}$                   | el_raise       $\lambda'_{52}$
    id_bind        $\lambda_{37}$                    | get_x          $\lambda'_{7}$                    | get_x_raise    $\lambda''_{7}$
    tr             $\lambda_{90}$                    | hdl77          $\lambda_{23}$                    | get_cps        $\lambda_{7}$
    cond3          $\lambda_{9}$                     | cond5          $\lambda_{8}$                     | get1           $\lambda_{18}$
    get4           $\lambda_{26}$                    | get6           $\lambda_{28}$                    | get7           $\lambda_{31}$
    get8           $\lambda_{33}$                    | get9           $\lambda_{35}$                    | id_raise       $\lambda_{43}$
    n_loop         $\lambda_{55}$                    | n_second       $\lambda_{70}$                    | loop           $\lambda_{50}$
    seq            $\lambda_{67}$                    | trycatch       $\lambda_{77}$                    | ex_hdl         $\lambda_{79}$
    ifelse         $\lambda_{88}$                    | n_k_res        $\lambda_{15}$                    |

For clarity and brevity, we adopt the following naming convention. We add line numbers to make common variables unique, e.g. raise7 denotes the raise from line 7.

As we can observe from the above, most of the variables are given a unique lambda term to which they can be bound, except for first, second, first_raise and second_raise. This is caused by the fact that the function seq is invoked in two different locations.

Through the analysis result, we can approximate a caller-callee relation between lambda abstractions. We reconstruct a global CFG of the obfuscated get by combining the call graphs and the local control flow graphs. The resulting CFG is presented in Figure 4.

As we discussed in the earlier section, the loss of precision is caused by the incompleteness of the context insensitive control flow analysis.

5.2 Context Sensitive Control Flow Analysis

A smarter attacker may attempt to uncover the CFG with better precision using context sensitive analysis. In context sensitive analysis, we extend the abstract state with a context.

\[ \text{(STATE)} \quad \sigma \subseteq (\mathrm{Variable} \times 2^{\Lambda}) \cup \{\mathit{unreachable}\} \]

In this lattice, $\mathit{unreachable}$ is the new $\bot$.

We redefine the flow function $\llbracket\cdot\rrbracket(\cdot)(\cdot) :: \mathrm{STATEMENT} \to \mathrm{CONTEXT} \to \mathrm{STATE} \to \mathrm{STATE}$. The flow function takes a statement, a context and a state and returns an updated state. Given a statement $S$ and a context $c$, we write $\llbracket S \rrbracket(c)$ to denote $\llbracket S \rrbracket(c)(\sigma_S)$ by making $\sigma_S$ an implicit argument where $\sigma_S = \mathit{join}(c, S)$.

Recall from our running example that the imprecision of the context insensitive analysis is caused by the two calls of seq in lines 12 and 13 in Figure 2. If we define the context to be the last call sites of the function, i.e. program locations, we would achieve a better precision:

    var     context  func             | var     context  func
    first   12       $\lambda_{78}$   | second  12       $\lambda_{35}$
    first   13       $\lambda_{18}$   | second  13       $\lambda_{68}$

This is also known as context sensitive analysis with call strings. In the above case we use a call string of size 1. However, in the presence of multiple loops in the source code, the obfuscated code will contain multiple calls to the loop combinator, which contains a recursion. Choosing the size of the call string is a non-trivial task. A similar observation applies to other context sensitive analyses, such as the functional approach, which considers the abstract state at the call site to be the context. The worst case complexity of these context sensitive analyses makes them less practical to apply in reverse engineering attacks without using heuristics [14].

5.3 Complexity of Sub-graph Isomorphism

Regardless of the precision of the static analysis result, it is computationally expensive to match the original control
flow graph with the approximated control flow graph in general. Let the original CFG be $G$ and the approximated CFG be $H$. We want to check whether $G$ is sub-graph isomorphic to $H$, which is NP-complete [6]. For instance, Ullmann's algorithm [19] is known to be exponential. Some improvements with heuristic algorithms exist. There is no known algorithm solving this problem in polynomial time. This check only returns yes or no. Finding all possible isomorphic sub-graphs leads to the sub-graph matching problem, which is also NP-complete.

Note that a linear algorithm exists for the special case in which one of the input graphs is fixed and the other is a planar graph. Unfortunately, a CFG generated from Java is in general not guaranteed to be a planar graph [18].

6 Related Works

The CPS-based control flow obfuscation is rooted in the connection between SSA forms in imperative programming languages and lambda terms in functional programming languages [2, 4, 10]. Our translation scheme is an extension of Lu's work [12] and is inspired by Kelsey's work [10]. In contrast with Lu's work, we are targeting FJ instead of a C style language. As an improvement to Lu's work, our translation scheme supports exception handling, recursive calls and calls to methods within the same scope with continuations. In contrast to Kelsey's work, our translation is targeted at an imperative language extended with higher order functions instead of Scheme. Giacobazzi et al. proposed a method to construct general obfuscators using partial evaluation with distorted interpreters [8]. Their work provides a uniform reasoning of how attacks using abstract interpretation can be foiled by a particular obfuscation method (by constructing a specific distorted interpreter). Ancona and Corradi [1] formalized SSA form for FJ. They applied SSAFJ to improve the type analysis of object oriented languages such as Java.

We note that it is possible that attackers are aware of this obfuscation technique and try to reverse-engineer the obfuscation via CPS, for instance, by applying the technique found in [7]. However, it is unclear to us how much additional information the attackers can recover by converting the obfuscated code in CPS back to direct style. For instance, if we ignore the treatment of exceptions, one could translate the loop combinator in CPS back to direct style as follows,

    void loop(void => Boolean cond
            , void => void visitor
            , void => void exit) {
      if (cond()) {
        visitor();
        loop(cond, visitor, exit);
      } else {
        exit();
      }
    }

Similar treatment can be applied to other obfuscated code in CPS style. As observed, such a translation does not improve the precision of the static analysis.

7 Conclusion

We extend and develop CPS-based control flow obfuscation for FJ with exception handling. We formalize the strategy as a source to source translation scheme. We show that the control flow obfuscation technique is effective against attacks using static control flow analysis, in particular context insensitive analysis. We are in the process of implementing the reported technique. The progress and some examples can be found in our development repository [13].

References

[1] Davide Ancona and Andrea Corradi. 2016. A formal account of SSA in Java-like languages. In Proceedings of the 18th Workshop on Formal Techniques for Java-like Programs, FTfJP@ECOOP 2016, Rome, Italy, July 17-22, 2016, Vladimir Klebanov (Ed.). ACM, 2. https://doi.org/10.1145/2955811.2955813
[2] Andrew W. Appel. 1998. SSA is Functional Programming. SIGPLAN Not. 33, 4 (April 1998), 17–20. https://doi.org/10.1145/278283.278285
[3] Jan Cappaert and Bart Preneel. 2010. A General Model for Hiding Control Flow. In Proceedings of the Tenth Annual ACM Workshop on Digital Rights Management (DRM '10). ACM, New York, NY, USA, 35–42. https://doi.org/10.1145/1866870.1866877
[4] Manuel M. T. Chakravarty, Gabriele Keller, and Patryk Zadarnowski. 2003. A Functional Perspective on SSA Optimisation Algorithms. In COCV '03: Compiler Optimization Meets Compiler Verification (Electronic Notes in Theoretical Computer Science), Jens Knoop and Wolf Zimmermann (Eds.), Vol. 82. Elsevier Science, 347–361. Issue 2. https://doi.org/10.1016/S1571-0661(05)82596-4
[5] Jien-Tsai Chan and Wuu Yang. 2004. Advanced obfuscation techniques for Java bytecode. Journal of Systems and Software 71, 1 (2004), 1–10. https://doi.org/10.1016/S0164-1212(02)00066-3
[6] Stephen A. Cook. 1971. The complexity of theorem-proving procedures. In STOC. ACM, 151–158.
[7] Olivier Danvy. 1994. Back to direct style. Science of Computer Programming 22, 3 (1994), 183–195. https://doi.org/10.1016/0167-6423(94)00003-4
[8] Roberto Giacobazzi, Neil D. Jones, and Isabella Mastroeni. 2012. Obfuscation by Partial Evaluation of Distorted Interpreters. In Proceedings of the ACM SIGPLAN 2012 Workshop on Partial Evaluation and Program Manipulation (PEPM '12). ACM, New York, NY, USA, 63–72. https://doi.org/10.1145/2103746.2103761
[9] Guardsquare. 2020. ProGuard: Open Source Optimizer for Java and Kotlin. https://www.guardsquare.com/en/products/proguard
[10] Richard A. Kelsey. 1995. A Correspondence Between Continuation Passing Style and Static Single Assignment Form. In Papers from the 1995 ACM SIGPLAN Workshop on Intermediate Representations (IR '95). ACM, New York, NY, USA, 13–22. https://doi.org/10.1145/202529.202532
[11] Tímea László and Ákos Kiss. 2007. Obfuscating C++ Programs via Control Flow Flattening. In 10th Symposium on Programming Languages and Software Tools (SPLST 2007). 15–20.
[12] Kenny Zhuo Ming Lu. 2019. Control flow obfuscation via CPS transformation. In Proceedings of the 2019 ACM SIGPLAN Workshop on Partial Evaluation and Program Manipulation, PEPM@POPL 2019, Cascais, Portugal, January 14-15, 2019, Manuel V. Hermenegildo and Atsushi Igarashi (Eds.). ACM, 54–60. https://doi.org/10.1145/3294032.3294083
[13] Kenny Zhuo Ming Lu. 2020. Control flow obfuscation for Java code. http://github.com/luzhuomi/obsidian/.

[14] Anders Møller and Michael I. Schwartzbach. 2018. Static Program Analysis. Department of Computer Science, Aarhus University, http://cs.au.dk/~amoeller/spa/.
[15] Oracle. 2020. Oracle Java Technologies. https://www.oracle.com/java/technologies/
[16] K. Petersson. 1990. Syntax and Semantics of Programming Languages.
[17] Davide Pizzolotto and Mariano Ceccato. 2019. Obfuscating Java Programs by Translating Selected Portions of Bytecode to Native Libraries. CoRR abs/1901.04942 (2019). https://doi.org/10.1109/SCAM.2018.00012 arXiv:1901.04942
[18] Raphael. 2011. On Planarity of Control Flow Graphs. http://lmazy.verrech.net/2011/10/on-planarity-of-control-flow-graphs/
[19] J. R. Ullmann. 1976. An algorithm for subgraph isomorphism. Journal of the ACM 23, 1 (1976), 31–42.
[20] Balachandran Vivek, Sufatrio, Tan Darell, and Thing Vrizlynn. 2016. Control flow obfuscation for Android applications. Computers & Security 61 (05 2016). https://doi.org/10.1016/j.cose.2016.05.003
[21] Chenxi Wang, Jonathan Hill, John Knight, and Jack Davidson. 2000. Software Tamper Resistance: Obstructing Static Analysis of Programs. Technical Report. Charlottesville, VA, USA.

A Appendix

A.1 Pre-processing step that fixes a while block with multiple entry labels

The only possible case that violates the restrictive form is the use of try-catch with a while loop in the handler.

    try {
      ...
      Li : throw new Exception();
      ...
      Lj : throw new Exception();
    } join (...) catch (Exception e) {
      Ll: join (x = phi(Li:xi, Lj:xj, Lm:xm)) while (e) {
        Lm: ...
      }
    }

The above can be converted into the restrictive form by inserting an empty assignment block in front of the while block.

    try {
      ...
      Li : throw new Exception();
      ...
      Lj : throw new Exception();
    } join (...) catch (Exception e) {
      Lk: { };
      Ll: join (x = phi(Lk:xk, Lm:xm)) while (e) {
        Lm: ...
      }
    }