GitFlow: Flow Revision Management for Software-Defined Networks∗

Abhishek Dwaraki†, Srini Seetharaman‡, Sriram Natarajan?, and Tilman Wolf† †Department of Electrical and Computer Engineering, University of Massachusetts, Amherst, MA USA ‡Infinera Corporation, Sunnyvale CA USA ?Deutsche Telekom, Silicon Valley Innovation Center, Mountain View, CA USA

ABSTRACT The most popular way to realize SDN has been to adopt Our work addresses the problem of revision control for devices that support a standardized application pro- flow state management in SDN-enabled networks, so gramming interface (API), for example OpenFlow [7], that the underlying data plane might be able to provide for programming network behavior in the data plane better state protection, provenance, ease of programma- through external software. This network programmabil- bility, and support for multiple applications. Inspired ity brings with it several challenges: migrating to new by the revision control tools in the software develop- architectures, interoperating with legacy networks, co- ment world, we propose an abstraction and a system ordinating control plane state in a centralized fashion, called GitFlow, which provides flow state revisioning supporting different authors to the state, and restricting in the SDN context. The core idea of GitFlow is to anomalous or malicious behaviors. run a repository server to maintain the authoritative While the aspect of multiple co-existing authors edit- copy of the flow configuration state and track additional ing flow state of a shared resource has been previously meta data for each evolving snapshot of the flow state. investigated [6, 8, 13, 17] from the perspective of con- When multiple applications make incremental commits flict detection or resource isolation, the issue of tracking to state, our system also automates conflict resolution all changes to the flow state has not yet been exam- by new flow state with committed flow state. ined sufficiently. In software development, this process of managing changes to shared data is referred to as We envision that our revision control abstraction will 1 provide safety in the data plane and better programma- revision control . We see a need for similar revision- bility in the control plane. ing and state change management rapidly manifesting in software-defined networks. Traditionally, revision control in the networking world General Terms has been limited to managing CLI configurations of data Design, Management plane devices. The state of the flows and the control plane, however, is inaccessible. In the context of an Categories and Subject Descriptors SDN-enabled network, we have access to much more state than was previously possible. Based on existing .2.6 [Computer-Communication Networks]: software revision control techniques, we propose a sys- Internetworking—Routers; C.2.3 [Computer- tem called GitFlow that provides flow state revisioning Communication Networks]: Network Operations— in the SDN context. GitFlow underlies all flow-level Network Management state management that a switch undertakes, and pro- vides safety at the data plane and better programming 1. INTRODUCTION discipline in the control plane. Software-Defined Networking (SDN) offers an oppor- Additionally, our framework can be easily extended tunity to both the control and data planes of today’s for several other purposes: tracking control plane evo- networks to become programmable and easily evolvable. lution (similar to past works on AS path evolution [2]); providing network level accountability (similar in moti- ∗ This work is based upon work supported by the Na- vation to past works on AIP [1] or packet provenance tional Science Foundation under Grant No. 1421448. [14]); preventing flow space conflicts across different au- Permission to make digital or hard copies of all or part of this work for personal or thors (without being as restrictive as the isolation efforts classroom use is granted without fee provided that copies are not made or distributed of FlowVisor [16] or overlap prevention of Frenetic [8]); for profit or commercial advantage and that copies bear this notice and the full cita- and extracting network state in a form conducive for tion on the first page. Copyrights for components of this work owned by others than automated troubleshooting (beyond methods made pos- ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or re- publish, to post on servers or to redistribute to lists, requires prior specific permission sible by header space analysis [5] and NDB [9]). This and/or a fee. Request permissions from [email protected]. 1 SOSR2015 June 17-18 2015, Santa Clara, CA, USA The popular revision control software provides au- Copyright is held by the owner/author(s). Publication rights licensed to ACM. thor tracking, versioning and timestamping of changes, ACM 978-1-4503-3451-8/15/06..$15.00 immutability of past alterations, automated conflict res- http://dx.doi.org/10.1145/2774993.2775064. olution, and annotations. Table 1: State management requirements Characteristics State Requirements Decoupled network control Consistency Dynamicity in application Mutability Co-existing applications/authors Safety Network troubleshooting Provenance

per. In the subsequent sections, we describe our state management system and illustrate its use through ex- amples.

Figure 1: Programmability model in SDNs. 3. GITFLOW In this section, we discuss our proposal for state man- makes the GitFlow abstraction a vital part of any SDN agement to improve network programmability. programming workflow. In this paper, we investigate flow rule revisioning, so 3.1 State Management Abstraction that SDN-enabled networks can provide a higher level of The existence of multiple authors attempting to oper- provenance, security, ease of programmability and sup- ate on a shared resource (network state) dictates the fol- port for multiple applications. The remainder of this lowing requirements for concurrent access and account- paper is organized as follows. Section 2 describes the ability: model and design goals flow management in software- defined networks. In Section 3, we present our proposal • Author or application tracking: The system should inspired by the git version control system. In Section 4, possess the ability to exactly identify and report we illustrate applications of our GitFlow approach to the author that was responsible for the change of describe it better. Section 5 concludes the paper. network state. • Versioning and change tracking: Transformation 2. BACKGROUND and evolution of network state can be tracked, pro- Software-Defined Networking allows external network viding valuable information that can be used for applications to directly access the state of the data plane planning and provenance reasons. through well-defined APIs. The data plane device type • State safety: Preventing the alteration of network (virtual or physical), flow table types (e.g., VLAN ta- state that has been committed by a different au- ble, MAC table) and size, and supported feature set thor brings in safety that ensures deterministic (match fields and supported actions) vary depending on system behavior. It is, however, important to al- the deployment scenario (e.g., edge-gateways [11], wide- low the same author to alter actions for previously area networks [15]). Typically, the packet processing established match rule. functionality is achieved via a match-action pairs that • Conflict resolution and rebasing: The mutability span a multi-table pipeline. As the flow table evolves, requirement in statement management brings with through route updates, gratuitous ARP replies or new it a need to resolve conflicts or overlaps in flow tenants addition, multiple tables and their related at- space to a feasible extent. Rebasing is the process tributes (e.g., meters attached to each flows) may be of merging changes to an extent allowable by a updated or modified by different applications/authors. pre-specified policy, based on the level of conflict. All these characteristics have implications on the net- • Annotations: The applications or authors can also work state management. include comments or other metadata to be associ- A typical model of SDN solution is shown in Figure 1. ated with the change. One or more applications, either directly through the API or through an abstracted intent layer, edit the con- In the software development world, the above features figuration state in the data plane, e.g., the flow table. are commonly available in a revision control software, Correspondingly, the data plane exports to the external such as git [3]. By making an analogy between source applications certain operational state, e.g., statistics, code and network state, we envision that adopting a relevant for the programming logic. We adopt this lay- layer of revision control for managing network state be- ered, decoupled model as our SDN programming model. comes vital to the programmability of the network. Such decoupling of the control and data planes in the context of SDN can be challenging when it comes to 3.2 GitFlow Architecture managing network state. Thus, in this paper we only We architect the abstraction described above in a focus on the flow rule configuration state. Table 1 corre- manner similar to that used by git by adopting an “au- lates each of SDN’s programmability characteristics to a thoritative” GitFlow server that keeps up-to-date state certain state management requirement. The safety and that has been committed and programmed onto the flow mutability requirements have been least investigated in tables. In reality, this server and the authoritative copy the past, and we adopt it as the main focus of this pa- of the state it protects can be placed in two locations: • Co-located on the data plane devices: each data plane device can host a separate GitFlow server Table 2: Configuration state model and give it direct access to the flow table. With Type Config data Example this approach, the state and its management are Timestamp {time} 1425884076.342 distributed. Each controller and its applications Author {app uuid} ...1681e6b88ec1 will act as GitFlow clients that conduct transac- Version {change uuid} ...a79d110c98fb tions with each of these servers. If an operator Table {table id} 5 were to make flow state edits on the device, they Flow rule {priority, match, priority: 32768, will use a client to achieve that. action} match: tp dst=22, • Located external to the device: the GitFlow server action: drop is hosted external to the device, in a centralized Timeout {idle, hard} idle: 10 location. All readers and writers to the flow state hard: 0 will use a GitFlow client to access it. The flow Metadata {OF config input} vxlan remote ip= state in the server is always kept in-sync with the 1.1.1.5 flow table in the data plane devices. master To avoid the overhead on switch CPU and to keep state management simpler to reconstruct on a network- M wide scale, we prefer the latter approach of the GitFlow dpid 1 dpid 4 repository being external to the devices in a central- dpid 2 dpid 3 ized server2. The implementation of this server becomes even simpler if one were to host it at a controller in- v1 stance. For the rest of the paper, we work with the ar- chitecture choice of the GitFlow server being co-located with a controller instance. v2 In this approach, all readers (often switch) and writers (often controller), similar to git’s distributed approach, v3 have their own local object database and staging in- app branch dexes. Our architecture requires the writers to use a flowmod monitor module (running in the background) to monitor and analyze flow-modification messages re- ceived by the SDN/OpenFlow agent, as well as to ver- sion network state information that passes through it.

The changes are then pushed to the GitFlow server. data path branch cloned from master The pushed changes are serialized to disk regularly. Fig- periodic flow table snapshot ure 3 provides an architectural overview of this process. event-driven snapshot v4 When a change is made and committed to the set of past app branch rebased app and dp branches merged commit changes, we create a new “snapshot”, which is a copy of the flow state that is versioned and uniquely identified. When the system initializes, the server sets up the re- Figure 2: GitFlow Branching Model vision control framework by creating an empty master branch as part of the GitFlow server. Any data plane Revision control can be performed at two levels of device that connects to the controller triggers the fork granularity: ‘per-flow’ and ‘flow-table’. Based on the of a branch from the master’s initial state. This new application and operational environment, it is under- branch is then tagged with the datapath-id associated standable that an user might pick a different granular- with the corresponding device. The branching behavior ity to perform state management at. The controller can is illustrated in Figure 2. The device then clones this choose to make available a snapshot at either granular- branch locally. The staging index and repository local ity across its other instances, or the other instances can to the device are used to store flow modifications occur- choose to directly “pull” from the GitFlow repository. ing as a result of programming changes effected directly This ensures that all the instances are consistent with on it. The local changesets are, then, pushed to the Git- the underlying network state. Flow server on the controller to keep the network state Although our paper and the Table 2 primarily deal in sync. It is foreseeable that there can be policies to with managing flow state, there are other network states regulate how frequently state changes are pushed up- worth preserving, especially the operational state per- streeam. An instance of the flowmod monitor running taining to liveness of switch resources. For instance, on the switch handles these operations. there are OpenFlow flow configurations, such as failover The flow information and metadata that is important group actions, that specify actions conditional on the to track changes across versions is represented as a facile liveness of ports (logical and physical). During trou- data table in Table 2. bleshooting or provenance inspection, such actions may 2The centralized server can be sufficiently replicated be incorrectly interpreted if the operational state of the and prevented from being a weak link in the SDN pro- ports is not correlated with the configuration state. To gramming. address this need, we add an additional agent for track- APP 1 - Optimizer Algorithm 1 Conflict Resolution and Rebasing Optimize

Local Repo 1: procedure resolve rebase(dpid, flow mods)

Staging Object 2: for all flow mod ∈ flow mods do index database 3: S ← current snapshot of flow space 4: compare flow mod to S Pull opt branch Commit and push 5: if disjoint(flow mod,S) then Git Server CONTROLLER 6: accept flow mod M dpid-1 dpid-2 7: else 8: compute overlap(flow mod, S) Controller Local Repo 9: X ← non-overlapping space Flow SDN/OF Mod Local Agent 10: Y ← overlapping space Monitor Repo 11: Z ⊂ S, match(Z) = match(Y ) Staging Object index database 12: accept X 13: if (Y,Z) is exact overlap then

FLOW_MOD 14: action(Y ) ← action(Z) DPID-2 PUSH 15: else if Y ⊃ Z then DPID-1 16: apply superset merge policy

FLOW_MOD PUSH 17: else 18: apply subset merge policy SDN/ Local Repo Local Repo SDN/ FT 1 OF OF FT 1 19: end if Agent Object Object Agent database database 20: end if FT 2 FT 2 FLOW_MOD COMMIT COMMIT FLOW_MOD Flow Flow 21: commit flow mod FT 3 Mod Staging Staging Mod FT 3 22: end for Monitor STAGE index index STAGE Monitor 23: end procedure SWITCH 1 SWITCH 2

Figure 3: GitFlow Architecture second time to determine the set relation between the two and the requisite action to be performed. ing this operational state and archiving it to a central- The approach will need further extension to make the ized store alongside the GitFlow repository. rebasing author-aware, i.e., only allowing merges based on the author privileges or based on whether changes are 3.3 Conflict Resolution and Rebasing overwriting the same author’s past commits. A git- like approach offers user authentication and privilege During flow state programming, it is common to ex- tracking to specify what source code can be modified perience conflict with the flow space of an existing rule. by which user. We reserve all author-aware rebasing for In the past, SDN implementations based on OpenFlow future work. relied on external mechanisms or implementations, such as FortNOX [13], FlowVisor [17] or that used in [10], to perform conflict resolution or provide flow space isola- 4. GITFLOW USE CASES tion. Recently, the OpenFlow specification [12] included To illustrate the value of GitFlow, we discuss the fol- an option to mark a flag called OFPFF_CHECK_OVERLAP lowing scenarios where versioning the flow table can be on a FLOW_MOD operation to force the data plane to ver- useful for network operations and explain the practical ify if the current FLOW_MOD overlaps with an existing rule value of our abstraction: on the match set; in the event of an overlap, the switch must refuse the addition and respond with an Open- • Understanding the evolution of network state Flow error message. This overlap check, however, is too maintained in the flow table. restrictive and insufficient when frequent flow state pro- • Tracing and identifying security issues or miscon- gramming from multiple authors (applications) becomes figurations in the network state. common. • Adopting state extraction for network trou- Revision control and rebasing (locally replaying and bleshooting. merging changes), as used by git, can be an important tool for programmers. In the GitFlow architecture, we 4.1 Tracking Flow Table Evolution include the logic proposed in Algorithm 1 to attempt As discussed above, flow tables evolve over time local conflict resolution to a feasible extent. The algo- through additions, deletions, and modifications. Cur- rithm takes as input some merging policies, to specify rent switch implementations and controller applications what the action should be when GitFlow detects a con- have limited capabilities to track these changes. Appli- flict and is unable to automatically resolve. Our ap- cations are required to periodically query the network proach relies on building a set of flow-spaces, overlap- state maintained in the switch and parse that informa- ping and non-overlapping, for the new flow modification tion to comprehend the updates made to each flow. This and the existing rule that it potentially overlaps with. approach is complex and might require state manage- Non-overlapping spaces are accepted right away. The ment in application logic. GitFlow can provide basic overlapping space is checked against the current rule a snapshotting and inspection features to efficiently track the evolution of the flow table state. Here we discuss with bisect can help operators backtrack the life of some simple versioning features to highlights its value: a packet in the data path (e.g., set of tables which processed the packet in the multi-table pipeline). • gitflow commit: The commit feature takes a snap- • gitflow reset: While debugging a flow related secu- shot of the changes made to the flow table from the rity issue, an application (or operator) can make previous committed state. When the controller use of reset feature to revert the current flow table programs flow modification messages (e.g, either a to a specific secure state. For example, a compro- single or bundle of flows), GitFlow considers this mised controller entity can introduce a flow up- as a new commit. When flows are programmed in date to re-direct data traffic to the master con- the flow table, the committed changes are merged troller (via send to controller action); if the op- to the master copy and recorded as a version of erator identifies such a flow update as a security the network state. A commit highlights valuable violation, the reset command can help reset the information on changes made to the flow table and flow table state to the last known working state eases an application or administrator to track the and discard all changes made to flow table since evolution of the flow table. the previous committed version. • gitflow diff : With diff, GitFlow can compare and highlight the updates between commits or between 4.3 Revision Control in Troubleshooting the current master state and any previously snap- Past work on network troubleshooting [4,5,9,18] high- shotted version. This helps in comprehending the light how flow space can prove handy to identify is- modifications made to individual flows, specific ta- sues. GitFlow allows extracting different snapshots of bles and the updated actions. the flow space, thereby empowering troubleshooting. • gitflow tag: Tagging is a critical component in This section attempts to address some important ques- GitFlow. Flow modifications (commits) can be tions posed by Heller et al. [4] in their analysis for trou- annotated with tags to carry metadata informa- bleshooting SDNs and how GitFlow can assist in these tion about the updates. For example, a security scenarios. update from a firewall application can add anno- tations specific to the application and provide de- 1. How can we integrate program semantics scriptions about the updates. into network troubleshooting tools? The pa- • gitflow grep: grep searches for the specified input per discusses network equivalents to gdb, valgrind, pattern (e.g., all entries matching on port 80, or gprof, but interestingly, does not talk about re- tagged commits) either within a single commit or vision management. Network specific versions of across commits. these tools help identify errant network states, but revision control can ensure that these situations 4.2 Investigating Security Issues do not occur again. Troubleshooting information Multiple interfaces (sets of controllers, side-channel can be stored as version metadata that can assist access via SSH logins, malicious applications with flow- in avoiding faulty network configurations from oc- level access) with write access to the flow table can in- curing again. troduce security issues in the data plane. A security vio- 2. How can we integrate troubleshooting in- lation or malicious update of any manner can impact the formation into network control programs? packet forwarding functionality, thereby bringing down The presence of multiple versions of network state the network. For example, network operators can login actually help troubleshooting tools revert to a to the switch and introduce simple flow updates that working version in the case of misconfigurations, can undermine the network policies maintained by the failures or such. The use-cases presented in this applications. Similarly, multiple controllers in EQUAL section reinforce this point. roles can introduce conflicting flow behavior that can 3. What abstractions are useful for trou- violate the security policies programmed in the switch. bleshooting? Exposing and abstracting the net- Existing security enforcement kernels are built as an in- work state is in itself an interesting proposition. termediate layer between the controller and the switches Previously, the absence of accountability was a to detect such violations, however, they lack the feature major driver in not exposing network state to ap- to track the local changes made in the data plane (e.g., plications running on the controller. With revision via SSH access). We exhibit some GitFlow features that control, we have an extra layer of backup to make can complement such security offerings: sure the exposed network state is not misused.

• gitflow blame: The blame feature provides infor- As networks grow to be more autonomous and self- mation about the changes and the entity/user that healing in nature, automated troubleshooting tools be- modified the interested flow entries. Additional come a focal point of operations. Flow-level revision annotations such as timestamps, impacted ten- control can potentially aid these tools exercise a more ant information, updates (revisions) to individual intricate level of inspection than just investigate changes flows are provided to detail the interested changes. to the flow tables. • gitflow bisect: bisect identifies the changes that These illustrative use-cases represent the necessity for introduced a bug or violation between two versions versioning across different disciplines such as security of the flow table. In addition, the options provided and troubleshooting. We believe that GitFlow makes a strong case towards filling an existing gap in network troubleshoot networks. In Proceedings of the state management for software-defined networks. Second ACM SIGCOMM Workshop on Hot Topics in Software Defined Networking (Hong Kong, 5. CONCLUDING REMARKS China, 2013), HotSDN ’13, ACM, pp. 37–42. Based on the premise that SDN can also benefit from [5] Kazemian, P., Varghese, G., and McKeown, source code development and troubleshooting tools in N. Header Space Analysis: Static Checking for the software world, we proposed GitFlow as an impor- Networks. NSDI (2012), 113–126. tant abstraction for flow state management. We show [6] Khurshid, A., Zou, X., Zhou, W., Caesar, several examples where adopting revision control in the M., and Godfrey, P. B. VeriFlow: Verifying network programming workflow allows us to achieve Network-Wide Invariants in Real Time. NSDI higher level of safety, provenance, ease of programma- (2013), 15–27. bility, and support for multiple applications. [7] McKeown, N., Anderson, T., Balakrishnan, While the use cases discussed above are mostly in- H., Parulkar, G., Peterson, L., Rexford, J., vestigating single switch or single table scenarios, we Shenker, S., and Turner, J. OpenFlow: envision and expect our revision control system to pro- Enabling innovation in campus networks. vide substantial support in network-wide and multi- SIGCOMM Comput. Commun. Rev. 38, 2 (Mar. table state management. Our revision control frame- 2008), 69–74. work can be used to correlate state across multiple [8] N. Foster, et al. Frenetic: A network switches to identify related disruptive changes, or across . In Proceedings of the 16th multiple flow tables to identify the exact sequence of ACM SIGPLAN International Conference on rules matched for a packet. Functional Programming (2011). As part of our future work, we will implement GitFlow [9] N. Handigol, et al. Where is the debugger for in a realistic SDN deployment setting. The usefulness my software-defined network? In Proceedings of of its features will be evaluated against the presented the first HotSDN workshop (New York, New York, use cases and overhead implications investigated. USA, Aug. 2012), pp. 55–60. GitFlow, as an abstraction, is easily extensible and al- [10] Natarajan, S., Huang, X., and Wolf, T. lows for adding intelligence on how the local copy of the Efficient conflict detection in flow-based state is mutated. Our framework can be reused for other virtualized networks. In Computing, Networking purposes, including batch transactions, state rollbacks, and Communications (ICNC), 2012 International and clustering consistency. We can also extend GitFlow Conference on (2012), IEEE, pp. 690–696. to address network state optimizations, such as conserv- [11] Natarajan, S., Ramaiah, A., and Mathen, M. ing flow table space. As the flow table state evolves over A Software defined Cloud-Gateway automation time, it is critical to maintain the optimal state in the system using OpenFlow. CLOUDNET (2013), forwarding agent as well as handle the limitation of flow 219–226. table size (i.e., TCAM limitation). Efficient aggregation [12] Openflow specification. schemes address this problem by reducing the number https://www.opennetworking.org/ of flow entries, but still maintaining the same forward- sdn-resources/technical-. ing behavior. GitFlow can facilitate such optimizations [13] Porras, P., Shin, S., Yegneswaran, V., and with efficient resolution features. Fong, M. A security enforcement kernel for OpenFlow networks. In Proceedings of the first 6. REFERENCES HotSDN Workshop (2012). [1] Andersen, D. G., Balakrishnan, H., [14] Ramachandran, A., Bhandankar, K., Tariq, Feamster, N., Koponen, T., Moon, D., and M. B., and Feamster, N. Packets with Shenker, S. Accountable internet protocol provenance. In Proceedings of the ACM (AIP). In Proceedings of the ACM SIGCOMM SIGCOMM Poster (2008). 2008 Conference on Data Communication [15] S. Jain, et al. B4: Experience with a (Seattle, WA, USA, 2008), SIGCOMM ’08, ACM, globally-deployed software defined wan. pp. 339–350. SIGCOMM Comput. Commun. Rev. 43, 4 (Aug. [2] Dhamdhere, A., and Dovrolis, C. Twelve 2013), 3–14. years in the evolution of the internet ecosystem. [16] Sherwood, R. Can the production network be IEEE/ACM Trans. Netw. 19, 5 (Oct. 2011), the testbed. In USENIX OSDI (2010). 1420–1433. [17] Sherwood, R., Gibb, G., and Yap, K. K. [3] Git-A Distributed Version Control System. Flowvisor: A network virtualization layer. Open http://git-scm.com. Networking Foundation (2009). [4] Heller, B., Scott, C., McKeown, N., [18] Wundsam, A., Levin, D., Seetharaman, S., Shenker, S., Wundsam, A., Zeng, H., and Feldmann, A. OFRewind: enabling record Whitlock, S., Jeyakumar, V., Handigol, N., and replay troubleshooting for networks. In McCauley, J., Zarifis, K., and Kazemian, P. Proceedings of the 2011 USENIX ATC (June Leveraging SDN layering to systematically 2011).