
Entity-based Message Bus for SaaS Integration

Yu-Jen John Sun

Supervisor: Prof. Boualem Benatallah

School of Computer Science and Engineering
University of New South Wales
Sydney, Australia

A thesis submitted in fulfillment of the requirements for the degree of Master of Science

September 2015

Acknowledgments

First of all, I would like to thank my supervisor Prof. Boualem Benatallah for his guidance and encouragement during my research. Without his support this work would not have been possible. I thank my co-authors, Dr. Moshe Chai Barukh and Dr. Amin Beheshti, for their enjoyable collaborations. I would like to thank Moshe especially for his help in reviewing and editing our publications. In addition, I would like to thank my fellow research colleagues for their kind encouragement, and Denis Weerasiri for giving me a lot of advice throughout the years. My final thanks go to my family for supporting me in pursuing an advanced degree. I am especially grateful to my sister for her encouragement during my study at UNSW.

Abstract

The rising popularity of SaaS allows individuals and enterprises to leverage various services (e.g. Dropbox, Github, GDrive and Yammer) for everyday processes. Consequently, an enormous number of Application Programming Interfaces (APIs) has emerged to meet the demand for cloud services, allowing third-party developers to integrate these services into their processes. However, the explosion of APIs and their heterogeneous interfaces make the discovery and integration of Web services a complex technical issue. Moreover, these disparate services do not in general communicate with each other; rather, they are used in an ad-hoc manner with little or no customizable process support. This inevitably leads to "shadow processes", often only informally managed by e-mail or the like.

We propose a framework to simplify the integration of disparate services and effectively build customized processes. We propose a platform for managing API-related knowledge, together with a declarative language and model for composing APIs. The implementation of the proposed framework includes a Knowledge Graph for APIs, called APIBase, and an agile services integration platform, called CaseWalls. We provide a knowledge-based event-bus for unified interactions between disparate services, while allowing process participants to interact and collaborate on relevant cases.

Publications

Sun, Y.J., Barukh, M.C., Benatallah, B., Beheshti, S.M.R.: Scalable SaaS-based Process Customization with Case Walls. 13th International Conference on Service-Oriented Computing (ICSOC 2015).

Contents

1 Introduction
  1.1 Background
  1.2 Motivation and Problems
  1.3 Contributions
    1.3.1 Knowledge Graph for APIs
    1.3.2 Declarative Language for Composing Integrated Process over APIs
    1.3.3 Event-based Process Management Platform
  1.4 Thesis Organization

2 State of the Art
  2.1 Introduction
  2.2 Interactions with an API
    2.2.1 API Design Methodology
    2.2.2 API Documentation
    2.2.3 API Testing
    2.2.4 API Programming Knowledge Re-Use
  2.3 Interactions between different APIs
    2.3.1 SOA & Microservices
    2.3.2 Process Automation
    2.3.3 Social BPM

3 APIBase
  3.1 Introduction
  3.2 API Knowledge Graph
  3.3 Architecture and Implementation
    3.3.1 Architecture Overview
    3.3.2 Graph Database Service
    3.3.3 APIBase Service
    3.3.4 Example
  3.4 Evaluation
    3.4.1 Experiment Setup
    3.4.2 Experiment Session
    3.4.3 Questionnaire
    3.4.4 Participant Groups
    3.4.5 Results
  3.5 Related Work
    3.5.1 API Documentation
    3.5.2 API Management
    3.5.3 Web-Services Repositories
  3.6 Conclusion

4 Declarative Language for Composing Integrated Process over APIs
  4.1 Introduction
  4.2 Knowledge-Reuse-driven and Declarative Case Definition Language
    4.2.1 Knowledge-Reuse Language
    4.2.2 Declarative Case Definition Language
    4.2.3 Declarative Case Manipulation Language
    4.2.4 Illustrative Example
  4.3 Implementation
    4.3.1 Architecture
    4.3.2 Event Bus
    4.3.3 Case Orchestration Rules
  4.4 Evaluation
  4.5 Related Work
  4.6 Conclusion

5 Conclusion and Future Work
  5.1 Concluding Remarks
  5.2 Future Directions

Bibliography

Chapter 1

Introduction

This chapter is organized as follows. In Section 1.1, we introduce the basic background. In Section 1.2, we outline the problem that we are addressing and discuss the motivation. In Section 1.3, we summarize our contributions. Section 1.4 outlines the organization of this thesis.

1.1 Background

Traditional structured process-based systems increasingly prove too rigid amidst today's fast-paced and knowledge-intensive environments. A large portion of processes, commonly described as "unstructured" or "semi-structured" processes, cannot be pre-planned and are likely to depend upon the interpretation of human workers during process execution. On the other hand, there has been a plethora of tools and services (e.g., Web/mobile apps) to support workers with specific everyday tasks and enhanced collaboration. Software-as-a-Service (SaaS) is at the forefront of this technology. For instance, tools (henceforth referred to as "services") such as: (i) Dropbox, to store and share files online; (ii) Pivotal Tracker, to manage tasks and projects; and (iii) Google Drive, to edit and collaborate on spreadsheets. Workers often need to access, analyze, as well as integrate data from various such cloud data services.

At the same time, most services expose APIs (Application Programming Interfaces). APIs serve as the glue of online services and their interactions, bearing far-reaching ramifications. Social media already depend heavily on APIs, as do cloud services and open data sets. The spread of the Internet to ordinary devices (i.e. the Internet of Things) will be facilitated through APIs. Much of the information we receive about the world will therefore be API-regulated. More specifically, the need to integrate user productivity services (i.e., SaaS applications, CRM tools, document and task management tools, together with social media frameworks) is vital: there are numerous pressing use-cases in both the enterprise and the consumer arena. However, while advances in APIs, Service-Oriented Architectures (SOA) and Business Process Management (BPM) enable tremendous automation opportunities, new productivity challenges have also emerged. Most organizations still do not have the knowledge, skills, or expertise at hand to craft successful SaaS-enabled process customization strategies that take full advantage of these automation opportunities. Such integration is still mostly done through manual development and, even when leveraging existing APIs, requires considerable technical/programming skills.

1.2 Motivation and Problems

To understand the challenges related to building integrated applications using APIs, we examine the following case-study:

Code Review & Development Cycle.

Version Control Systems (VCS) are very common in software engineering: they help avoid collisions and improve traceability. While it is important to find where a bug was introduced and revert it, peer review also helps discover such bugs earlier. Github is one of the most popular online hosts for open-source code repositories. Likewise, Pivotal Tracker (PT) offers a good story-tracking system that helps a team keep track of its progress. To integrate these tools, Github provides built-in support: commit messages containing specific keywords track and update the state of a story on PT. A typical workflow implementing such an integrated process over the Github and Pivotal Tracker APIs might look as follows:

1. Project Manager (PM) creates a Story and assigns it to an Engineer
2. Engineer starts working on the Story
3. Engineer completes the programming task and pushes onto Github
4. Engineer finishes and delivers the Story
5. PM accepts/rejects the delivery

Effectively, the PT integration provided by Github parses commit messages looking for syntax of the form "#(story number)", such as: [Starts #12345, #23456] ... [Finishes #12345] ... [Delivers #12345]. If any such messages are detected, the corresponding action is performed in PT. For example, if the engineer's commit message contains [Finishes #12345], then when Github receives this commit, it automatically marks the story as finished in PT. This simplifies the workflow by eliminating otherwise manual work within PT.
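For instance, with this integration enabled, a single commit such as the one below (the message text is a hypothetical example, using the syntax just described) both records the code change and marks story #12345 as finished in PT once pushed to Github:

    git commit -m "[Finishes #12345] Validate story state before delivery"
    git push origin master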

Such a typical integration method, however, suffers from the following limitations:

Fixed workflow: While this inbuilt integration might work nicely to eliminate the manual creation, starting, finishing and delivery of a PT "story", it provides little flexibility to adapt to different development cycles. For example, the notion of "continuous integration" (CI) is prominent in software engineering today. CI calls for "continuous" testing whenever new changes are made, and in some configurations is even responsible for building and deploying the changes. Therefore, if a particular development team decides to adopt CI, this significantly changes the semantics of the deliver action of a story in PT. That is, at Step 4 we may want to introduce additional steps, such as testing and deployment, before closing the change. Unfortunately, however, the current Github/PT integration does not apply to such development environments [110].

Shadow Processes: To achieve the above result, it is thus likely the development team would integrate other non-related tools, or even conduct the "code review" process manually. In fact, after a code commit on a feature branch, the review should be initialized; after the review is completed, the code should be merged into the main repository (or master branch). In Github this is referred to as a Pull Request (PR). With the current tools provided by Github, PRs have to be created manually, either through the website interface or the command-line tool. While the big cycle (accepting a task, coding, submitting PRs) is still visible, the small cycles, in which a developer fixes code for a failed test and tests again, are not modeled by the Github workflow; they show up only as a series of commits, which reduces the transparency of the whole process. To model this behavior and integrate the Github service, one has to resort to building a home-brewed solution [2], which inevitably results in non-traceable, i.e. "shadow", processes.

1.3 Contributions

To address the problems outlined above, in this thesis we propose: (i) a Knowledge Graph for APIs, which stores information about APIs in a structured way; (ii) a declarative language for building integrated processes over APIs: a simple, declarative yet powerful language that allows developers to search existing APIs and compose them into customized process definitions; and (iii) an Event-based Process Management Platform, which provides an event/activity wall to inform case-workers about task progress.

1.3.1 Knowledge Graph for APIs

Central to the proposed framework is the concept of the API Knowledge Graph (AKG), in which common, low-level services-integration logic can be abstracted, organized, incrementally shared and thereby re-used by developers. The knowledge captured is organized according to various dimensions, including: APIs, Resources, Events and Tasks. By identifying entities (i.e. types/attributes, relationships for each dimension, and their specializations), a novel foundation is introduced to accumulate currently dispersed API and case knowledge in a structured framework. This offers a unified representation, manipulation and reuse of case knowledge, thereby empowering simplified API-enabled process customization.

1.3.2 Declarative Language for Composing Integrated Process over APIs

Empowered by the AKG, we propose a simple, declarative yet powerful language that allows developers to search existing APIs and compose them into customized process definitions. The proposed language enables professional process developers to incrementally create modular collections of tasks: reusable and customizable process fragments (referred to as a "case"). For example: create an issue on a project management service; upload a file into a document management service; send an email when a co-worker uploads a new version of a file; post videos and photos to social media services; etc. The task-search component uses a "context" to describe the task "intent" and "objective" (e.g., upload a file, create an issue). Thus, using the business scenario to query the AKG can return tasks that are appropriate for the given context.

1.3.3 Event-based Process Management Platform

We proposed and implemented an Event-based Process Management Platform on top of the AKG. This platform provides an event/activity wall to inform case-workers about task progress, together with a simple and declarative language that enables such participants to uniformly and collectively react, interact and collaborate on relevant cases.

1.4 Thesis Organization

This thesis is organized as follows. In Chapter 2, we present the state of the art in APIs, Web services and process management. In Chapter 3, we present the design and implementation of the API Knowledge Graph. In Chapter 4, we present the design and implementation of the declarative language for composing integrated processes over APIs. Finally, in Chapter 5, we provide concluding remarks and discuss possible future work.

Chapter 2

State of the Art

2.1 Introduction

APIs are at the heart of every major information technology trend, from mobile devices to cloud and crowd computing, and from the Internet-of-Things (IoT) and big-data to Web 2.0 and social networks. They all rely on Web-based interfaces to empower connectivity over distributed components, thereby enabling the delivery of innovative and disruptive solutions to every industry in the global market. Long gone are the days when developers had to hand-code each specific service, often a highly tedious, manual and time-consuming task. Instead, APIs are driving the rapid creation of software. Today, with a few lines of code, you can tap into some remarkable resources, whether a payment network like MasterCard, a mapping service like ESRI, or the machine-learning engine that powers IBM's Watson [101].

APIs are created by businesses to empower businesses. This has also been the motivating theme behind some of the major worldwide hackathons, such as TechCrunch's Disrupt Hackathon, as well as the joint NASA-IBM Space App Hackathon. They all had one endeavour in common: people weren't creating just simple applications, they were stringing together multiple APIs, pulling data, communicating with the cloud, sending SMS messages, displaying data on a map and taking credit-card payments. All of this is made possible by APIs that, with a few lines of code, enable programmers to tap into a world of services.

At the same time, it is not always simple to make APIs work, and sometimes developers have to bend the tool to their will [101]. Moreover, maintaining user retention becomes a major challenge, since 95% of users abandon an app within 30 days, and around 50% within 24 hours [46]. In the mobile industry alone, start-ups often face some 1.5 million competing apps. Competing for success will thus strongly involve studying, and gaining mastery of, the art of API programming.

Accordingly, this chapter embarks on a mission to dissect, unravel and demystify the rather complex, multi-faceted and in some ways orthogonally-dimensional field of API programming. This will mean different things to different stakeholders of the app chain. For instance, business owners will need to think carefully about methodological API design; the applicability, accuracy and accessibility of their data; and how this ties back to their business value. Amazon.com is a prime example of enduring success, arguably unmatched in any industry [68]. In particular, the fundamental lesson to learn is "creating APIs that expose business value to other developers who may create the remainder of the solution" [68]. On the other side of the spectrum, other businesses and developers will focus primarily on reusing existing APIs to create value. Tools such as IFTTT [56] have been at the forefront of enabling this, as has StamPlay [99], which does the same for backend development via a very visual interface for selecting and configuring the right modules. Nonetheless, these tools are merely the tip of the iceberg; ideally, a lot more sophistication and flexibility is required at a fundamental level in order to fully succeed in this area. Simultaneously, the role GUI and frontend developers play also bears significant importance to an app's success.

We identify three fundamental dimensions in API programming: (a) the interaction with an API itself; (b) the interaction between different APIs; and finally (c) the interaction between APIs and frontend users/apps. Accordingly, we have organized this chapter in this manner, as illustrated in Fig. 2.1, and further elucidated with examples in Fig. 2.2. Moreover, we identify a set of facets that are common to each of the above-mentioned dimensions. The aim has been to compare and contrast, to identify strengths and weaknesses, and to provide a succinct technological landscape. Finally, we employ our analysis and extensive research to provide a vision of future trends and directions.

Figure 2.1: API Programming: Overview of the Technological Landscape

Figure 2.2: API Programming: Technology Examples

2.2 Interactions with an API

2.2.1 API Design Methodology.

MediaType. Most Web-service APIs expose Representational State Transfer (RESTful) interfaces via JSON messaging. However, since JSON was not originally designed with the concept of type (i.e. it is merely a string representation of arbitrary objects), it is difficult to determine the media type of JSON responses, and similarly the payload of a JSON request. The HTTP protocol provides the Content-Type header, which can be used as a hint of the returned data type; however, it is overly simple (just a string) and relies on conventions. Therefore, various media-type description languages have been proposed to tackle this problem. JSON-API (http://jsonapi.org/format/), Hypertext Application Language (HAL) [61], Collection+JSON [6] and Siren [100] are a few of the media types proposed specifically for RESTful services. These media-type descriptions all build on the fact that RESTful services present resources using URLs: by including a URL to a resource together with a relation specifier, it becomes possible to easily deduce the related methods. In other words, these conventions augment JSON objects with types and meta-data to enhance their readability, flexibility and discoverability; they describe the types of the response objects, the related resources and, in the case of HAL, the documentation of the resource. At the same time, these media types promote the use of embedded objects, which provide an efficient caching capability for applications by embedding related objects directly in the responses.

Target Protocols. SOAP was first specified in 1997 as an RPC protocol using XML as its data format. It was designed to allow other systems to interact with Microsoft COM components. The W3C later took over maintenance of the specification and introduced version 1.2, which then became a foundation of Web Services. However, due to its overly complex design, it was quickly overtaken by other protocols. REST, on the other hand, was first proposed as a software architecture that utilizes the semantics of HTTP requests, offered as a simple architectural principle. While SOAP (and later WSDL and UDDI) are very well defined, RESTful APIs provide a less formal yet simpler interface that has become hugely popular among web developers.

In addition to traditional client-to-server API requests, protocols like WebSocket [52] and PubSubHubbub (PuSH) [40] have been proposed to provide server-initiated communication. WebSocket was introduced in the HTML5 specification to provide a socket-like interface for sending and receiving messages to/from the server in the browser. PuSH, on the other hand, extends the feed-based protocols (Atom, RSS) by specifying a protocol between client and server to establish webhooks, or callbacks, over HTTP. However, PuSH requires the client to implement a web server and therefore cannot be used when the client is a browser.

XMPP emerged as a message broker to manage tasks and messages between components in a distributed computing environment, but was then used as an API to various instant-messaging services such as Facebook Chat. XMPP was designed to use TCP connections but can also be applied to web applications using techniques like Bidirectional-streams Over Synchronous HTTP (BOSH, http://xmpp.org/extensions/xep-0124.html). Effectively, BOSH uses a keep-alive, long-polling HTTP connection as a TCP tunnel for the XMPP protocol. Similar techniques were also used to polyfill implementations of WebSocket before it was implemented in most modern browsers.

User-Identity Administration. The OAuth [79] specification defines a protocol in which users can safely log into a service without disclosing their credentials to third-party applications. OAuth uses tokens to control the use of a user's identity; by controlling the access rights associated with each token, users can restrict the access granted to the token or even revoke it. OpenID [87] is an open protocol for authentication that lets users carry their identity from one service to another. It reuses the standard OAuth 2.0 handshake to obtain an access token and uses the token to access the profile of the owner. Similarly, Facebook provides a proprietary counterpart to these open protocols, Facebook Connect [73], which uses a similar process to log users into third-party websites using Facebook identities.

Data formats. XML had long been the standard go-to data format before JSON. However, as Web applications became increasingly prominent, developers working with JavaScript found JSON simpler to use because of its native support in JavaScript. While JSON isn't as flexible as XML, it nonetheless carries great benefit: the data exchanged between browser and server is typically very simple, compared to the large transfer overhead of XML. An important setback with JSON, however, was the lack of support for defining a schema. Unlike XML, which has XSD, JSON did not have a schema language, and data exchange depended solely on a manual agreement between server and client. The lack of a schema also makes it very hard to perform integration tests and automated validation. In 2009, the Internet Engineering Task Force (IETF) made a first attempt and drafted a JSON-Schema specification [94] (as of 2015, at draft version 4). With JSON-Schema, the validation of data exchanged between client and server can be done more systematically. Later on, much as Protobuf relates to XML, binary formats like BSON (mainly used by MongoDB [11]) and MsgPack [44] were proposed, aiming for more efficient data transfer and faster serialization/deserialization. These binary data formats are mostly used in data-intensive backend-to-backend communication, because processing binary data in web browsers is very inefficient.
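To make the media-type conventions discussed above concrete, the sketch below shows a hypothetical issue resource rendered in HAL; the resource and field names are invented for illustration. The reserved _links object lists related resources and their URLs (making follow-up operations discoverable), while _embedded inlines a related object so the client can avoid an extra round-trip:

    {
      "title": "Login fails on Safari",
      "state": "open",
      "_links": {
        "self": { "href": "/issues/42" },
        "comments": { "href": "/issues/42/comments" }
      },
      "_embedded": {
        "author": {
          "login": "jdoe",
          "_links": { "self": { "href": "/users/jdoe" } }
        }
      }
    }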

2.2.2 API Documentation.

With the increasing proliferation and utilization of APIs, an effective documentation methodology is imperative in order to sustain this growth. For example, understanding the precise details of what an API offers, how it is offered, and its constraints and restrictions is important both for service providers to reach the market faster and for consumers to get productive quickly. Accordingly, standards have been proposed by various groups, including enterprises and the open-source community, to provide such documentation tooling. RAML [49], API Blueprint [22], Swagger [89], WSDL [32] and WADL [50] are the most widely used document formats. They are designed to provide a documentation specification that allows developers to quickly browse through the operations, as well as the possible requests and responses available. In addition, they enable developers to test the APIs online without needing to write code. All except WSDL target a resource view of RESTful APIs, grouping the endpoints by the resources indicated in the URL. With these machine-readable documents, development automation becomes possible: for example, server-side code generators (e.g. API mock-ups), client library generators and automated API testing.

The MuleSoft API Designer is an open-sourced RAML editor providing syntax highlighting, as well as an API Console that parses the RAML and creates a UI for developers to test API requests on the fly. APIary.io [7] provides a similar documentation tool, with API Blueprint backing the UI. Mashery [67] provides the I/O Docs documentation tool using its own syntax, which resembles Swagger documentation. In addition to the interactive documentation, I/O Docs also implements the OAuth interface, allowing users to perform access-controlled API requests without manually entering access tokens.
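As an illustration of such machine-readable documentation, the fragment below describes a single endpoint in Swagger 2.0's JSON format; the API title and path are hypothetical. A documentation UI or code generator can derive an interactive console, mock server or client stub from exactly this kind of description:

    {
      "swagger": "2.0",
      "info": { "title": "Issue Tracker API", "version": "1.0" },
      "paths": {
        "/issues/{issueId}": {
          "get": {
            "description": "Retrieve a single issue",
            "parameters": [
              { "name": "issueId", "in": "path", "required": true, "type": "string" }
            ],
            "responses": {
              "200": { "description": "The requested issue" }
            }
          }
        }
      }
    }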

2.2.3 API Testing.

API development, like software engineering in general, requires a well-defined testing scheme. This ensures products are more reliable and robust against regressions. QualityLogic [57], SoapUI [95] and RunScope [92] provide services to ensure the quality of APIs, but in different forms. QualityLogic provides analysis of API development, from specification review to creating automated tests. The API experts at QualityLogic inspect test conditions and provide a testing framework using open-sourced tools like SoapUI and JMeter, covering the specification as well as regression tests during development. SoapUI is open-sourced software that provides a wide range of tools to test APIs. It focuses on four aspects of API testing: functional testing, service simulation, security testing and load testing. In addition to standard behavioral tests, SoapUI also provides tools to create mock services, by parsing a WSDL or recording RESTful API requests/responses, to assist UI development and testing. RunScope provides API testing as well as monitoring. Aside from standard behavioral tests, it is exposed as a Web service itself. This allows developers to incorporate RunScope into their development cycle together with other tools like continuous integration (CI), or with data analysis services like Keen IO. In addition, RunScope, being an online service itself, provides the ability to run tests from different locations, which allows developers to test the responsiveness or timing of their APIs from various places.
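Outside of these dedicated tools, the essence of a behavioral API test can be sketched with nothing more than curl: issue a request and assert on the status code. The endpoint below is fictitious:

    # Report a failure unless the endpoint answers 200
    status=$(curl -s -o /dev/null -w '%{http_code}' https://api.example.com/issues/42)
    test "$status" = "200" || echo "unexpected status: $status"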

2.2.4 API Programming Knowledge Re-Use.

The reuse of code undoubtedly carries great potential for enhancing the productivity of producing new software [42], as does the larger body of work targeting the reuse of "programming knowledge" at various levels of abstraction. Traditionally, approaches targeted fine-grained source-code reuse. For instance, there are code search engines such as Google Code Search [47], Krugle [64], [35] and byteMyCode [29]. Exemplar [70] improves on these by providing more coarse-grained discovery, making use of textual descriptions of applications. More interestingly, the observation that APIs provide reusable software components led to advanced approaches that orient reuse around API calls, and the data flow among API calls, to detect relevant software in large application archives; APIs are viewed as strong indicators of the functionality the code provides. Portfolio [69] offers a unified approach, selecting components and synthesizing them into software projects by analyzing API calls; the results of natural-language queries are ranked according to these criteria. Other work [88] adopts semantics, considering method signatures or classes, test cases, contracts and security constraints. The derived source code is changed by multiple transformations in order to meet the requirements of the user. The strength of this work is that the user can specify in a very detailed way what she is searching for. At the same time, however, the specification requires development knowledge and a clear understanding of the future solution; otherwise signatures and class names cannot be used as search criteria.

Modern approaches to provisioning reuse are fostered by the "social Web". While this was initiated by technical blogs with reusable code snippets, it was enhanced by a more targeted exchange with the inception of Q&A sites such as Stack Exchange [37] (e.g. Stack Overflow [83]) and Experts Exchange [36]. These sites introduce a sort of gamification, in contrast to traditional forums, with the ability for contributors to earn reputation points and badges [84].

Ultimately, it is incumbent that pertinent reuse transcend mere source code to include various facets of programming-related knowledge. API repositories such as ProgrammableWeb provide a convenient yet high-level overview of thousands of APIs that can be discovered and potentially combined to build composite applications. ServiceBase [13, 14] extended this idea with the proposition of a knowledge-base encompassing even low-level programming knowledge about services, aiming to mask the heterogeneity of various APIs. In turn, this sought to enhance the productivity of developers with a common API to service offerings, supported by an incremental acquisition and collective reuse of programming knowledge; in particular, knowledge such as mappings of raw service messages to common data-structures, which could be loaded once and reused many times.

2.3 Interactions between different APIs

2.3.1 SOA & Microservices.

The microservices architectural style has been gaining increasing appeal, with a number of key organizations demonstrating its value; these include Amazon, Netflix, The Guardian, RealEstate.com.au and CompareTheMarket.com amongst others. Arguably, the essence of microservices is not new, and is very much based on the notions SOA had to offer. However, a summary comparison suggests two main differences. Firstly, microservices are stylistically more refined: components are specifically organized around "business capabilities". Secondly, the style aims at breaking away from the commercialization and perceived baggage (e.g. WS-*) of conventional SOA, offering a lightweight (e.g. RabbitMQ [107] or ZeroMQ [53]), or even ESB-free, technological approach. The ESB (i.e. service-bus) approach, by contrast, is often designed to handle sophisticated facilities for message routing, choreography, transformation and the application of business rules, which most of the microservices community argues is overkill.

Traditionally, applications are built in a monolithic manner: all logic for handling requests runs in a single process. Standard SOA assists developers in reusing certain functional components, albeit most applications still remain monolithic. This is often a natural approach, as applications are easier to develop this way (e.g. IDEs support one application at a time; testing and deploying are also easier). However, deploying a change to one part of the application implies rebuilding and deploying the entire monolith. To alleviate this, while microservices nonetheless utilize service technology to organize components, they take a different approach to how components are split. Instead of splitting in terms of technology (e.g. UI teams, backend teams, database teams), the style advocates very small components (e.g. 10-100 LOC) [90] that are managed by cross-functional teams with the full range of skills (e.g. UI, database, project management, etc.). This means each component is very easy to manage, and can be deployed, evolved and thereby scaled independently. However, microservices exhibit several drawbacks. In particular, applications become somewhat more complex, with many more moving parts. A high level of automation is necessary, be it home-grown code, a PaaS-like technology such as Netflix Asgard [98], or even an off-the-shelf PaaS such as Pivotal Cloud Foundry [86]. Moreover, one has to deal properly with potentially complex distributed data management issues.

Nevertheless, despite these shortcomings, the microservices approach is still gaining significant traction; it is predicted that at least 10% of small organizations will experiment with this model [46]. However, at this early stage it should only be employed with the correct precautions. For a potentially large and complex application, and with an adequate cross-functional team, a strategic microservice decomposition could serve very productively.

2.3.2 Process Automation.

At present, there are multi-faceted dimensions for characterizing processes. For instance, there are structured, semi-structured and unstructured processes. This form of classification could formally be referred to as "process paradigms", representing the typical structure of work-activities that a process can handle. The move from structured to unstructured is vitally important, as business and user requirements need to cope with ongoing change, and thereby strive for agility. We investigate this in later sections, whereas in this section we place primary emphasis on the "process representation" dimension, entailing the model, language or interface that a process-designer may be offered in order to specify their desired process. A third dimension is the "process implementation technology" (e.g. workflow engines, rule engines, or a native program-coded approach). However, this is only relevant insofar as it supports the higher-level requirements of potential process systems, and is thereby not explicitly discussed here.

As far as modelling languages are concerned: the Activity-centric approach represents the flow of control from activity to activity based on a specified sequence. Process-tasks are therefore ordered based on the dependencies among them. Inherently, this approach is well suited for structured processes. A variety of languages exist, such as WS-BPEL [78] (or its predecessors XLANG [103] and BPML [27]), XPDL [109], YAWL [105] and BPMN [80]. Although some of these languages may differ in syntax and expressivity, they fundamentally encompass: a component model (e.g. Web service); an orchestration model (i.e. ordering of control-flow); a data-access model (i.e. both application-specific data and control-flow data); and exception-handling as well as transactional syntax. BPM engines designed to execute such processes may adopt different methods to implement the above, but essentially function the same way.

The Rules-centric approach, unlike activity hierarchies, is inherently less structured and less amenable to imposing an order on the flow. Pure rule-engines, such as JBoss Drools Expert [59], Demaq [23, 24], RuleCore [91] and XChange [28], are well suited to capture processes that have few constraints among activities and where a few rules can specify the process. However, they are only suitable for small-scale processes with a relatively low number of rules (to remain feasible to understand and maintain). More recently, synergies between the "Business Rules Community" [1] and BPM, together with the OMG, have produced standards such as the Semantics of Business Vocabulary and Business Rules (SBVR) [81]. While this approach assists in expressing business knowledge in a controlled natural language via the proposed vocabulary, it does not directly address the formal integration between the business-rules vocabulary and process modeling diagrams. In a similar line of work, the recently coined term Event-Driven Business Process Management (EDBPM) has been proposed as a way of merging BPM with complex event processing (CEP), via "event-driven" rules. JBoss jBPM [58] is one such example.

Finally, the Artifact-centric approach has activities that are defined in the context of process-related artifacts, which become available based on data-events on the artifact. The intention here is to consider artifacts as first-class citizens, and thus an abstraction to focus the process on the core informational entities that are most significant (usually from an accountability perspective). The main features of this model are distinct lifecycle-state management (e.g. Gelee [10], and more so IBM's Business Artifacts [76], which aims to combine an information model with a lifecycle). In some cases, rules may also evaluate and constrain the operations on the artifact [21, 33].

More importantly, the type of model/language employed to define processes has ramifications on the extent of rigidity or flexibility available within the defined process. However, most current process support systems tend to swing to one extreme or the other, with limited support for the correct balance of agility.

2.3.3 Social BPM.

In contrast to structured processes, unstructured processes are becoming increasingly predominant, as they ideally allow a great deal of flexibility unmet by current structured process support systems. They are also characteristically knowledge-centric, and highly dependent on the interpretation, expertise and judgment of the humans doing the work in order to attain successful completion; in some cases that knowledge may only become available at some point during execution [82]. Primary support for unstructured processes is usually via Web-based SaaS (Software-as-a-Service) tools, especially those endowed with Web 2.0 community functionality to support the required level of collaboration. A plethora of such tools exists, in a variety of forms, such as: basic communication tools (e.g. email and instant messaging); social collaboration tools (e.g. Salesforce Chatter [93]); enterprise wikis (e.g. Atlassian Confluence [9], SocialText [96]); as well as a variety of task management (Asana [8], Producteev [97] and Tracky [104]), project management (MS Project [66], LiquidPlanner [65]) and document management tools (TaskNavigator [54], ArtifactBuddy [48]). However, most of these are out-of-the-box solutions, which results in hidden or "shadow" processes.

In recent times, Social BPM has gained a lot of attention. Focusing on the rise and leverage of social networking, some BPM vendors are offering extensions for process models to represent tasks that can be exposed in social networks, or other special types of social tasks. For example, IBM Blueworks [55] and Bonita [26] aim at increasing participant engagement in applications such as voting, ranking, etc. [74]. Similarly, [43] offers a model-driven approach for generating code for tasks in popular social networks such as Facebook and LinkedIn; this work also offers an extended business process management engine to support such tasks. Nevertheless, current efforts remain focused on supporting traditional workflow applications in the social-network environment [39], whereas more widespread support is needed, such that a process support system encompasses the spectrum of both structured and unstructured work.

Chapter 3

APIBase

3.1 Introduction

Services are core to, and perform an essential role in, employment in most parts of the world. According to the OECD Forum, services constitute two-thirds of world GDP; in Australia the figure is 82.5%. With the soaring growth of widely accessible Internet, businesses are forced to compete globally to serve customers all over the world, requiring them to provide services online in order to compete and survive. The European Union recently released the Digital Agenda for Europe, aiming to help Europe's economy get the most out of digital technologies and online services. Elsewhere, the US, Canada, the UK, China and India have also set up various digital-economy initiatives to prepare for this digital shift of the economy.

Accompanying this structural change in economies, new strategies and innovations must provide industries with tools to create a competitive edge and build more value into their services. While the Internet is transforming into a global marketplace and business platform, enabling services to be delivered to any location, we also see a new phenomenon where organizations are competitively compelled to implement Application Programming Interfaces (APIs) to allow third-party 'apps' to integrate with, and add new uses to, the original service. APIs are now the glue of online services and their interactions. While APIs are rather fundamental to the Web, they hugely influence our daily use of the Internet; social media and the cloud are two very notable classes of services that rely heavily on APIs, and everyone uses one or another. The importance of API development and management has also resulted in corporations acquiring related API development and management platforms: very recently Facebook bought Parse, Intel bought Mashery, and Mulesoft bought ProgrammableWeb, for example [102]. It is estimated that there are already 30,000 APIs, and that the market will grow five to ten times over the next five years [102], with APIs coming from various areas including cloud resources, platforms and applications, and the rest from APIs in appliances, mobile devices, sensors, vehicles and consumer electronic devices.

There are, however, significant gaps and risks in this online, service-enabled endeavour. API-based IT systems are often very complex; they take a lot of time to build, and more time to maintain, all while having to worry about various potential security issues. Even though online services are multiplying, most organizations still do not have the knowledge, skills, or understanding to craft a successful strategy to take full advantage of changing markets and keep up with the proliferation of online opportunities. Meanwhile, the complexity of exploiting APIs is increasing dramatically as development becomes ever more distributed across multiple heterogeneous, autonomous, and evolving networked services. To address the above challenges, we propose five main contributions:

• Knowledge Graph for APIs: a knowledge graph in which common, low-level service-integration logic can be abstracted, organized, incrementally shared and thereby re-used by developers.

• A domain-specific model for resources consumed and produced by APIs. This data model has been realized in a JSON-enabled graph database.

• APIBase: an implementation of the Knowledge Graph supporting both text and graph-based queries.

• An API for APIBase: a REST interface to access the Knowledge Graph.

• A user study to evaluate the effectiveness of the proposed techniques and tools.

3.2 API Knowledge Graph

To describe the representation and reuse of case knowledge, we begin by briefly describing the notion of a knowledge graph. A knowledge graph (KG) typically consists of a set of concepts organized into a taxonomy, instances for each concept, and relationships among the concepts. For example, the Google Knowledge Graph (http://www.google.com/insidesearch/features/search/knowledge.html) is a graph of popular concepts and instances on the Web, such as places, people, actors, politicians, and many more. In our case, the Knowledge Graph for case management systems contains concepts and instances specifically for case management systems. We represent this Knowledge Graph using a graph data model that includes entities (i.e. APIs, Resources, Events, Tasks) and the relationships among them.

We formally define a Knowledge Graph for case management systems as a tuple G = (V, E, Av, Ae) where V is the vertex set whose elements are the vertices (i.e. nodes or entities) of the graph; E is the edge set whose elements are the edges (or relationships) between vertices of the graph; Av is a set of attributes that can be associated with any vertex in V; and Ae is a set of attributes that can be associated with any edge in E.

In the Knowledge Graph, an entity is represented as a data object that exists separately and has a unique identity. Entities are structured and are instances of entity types, i.e. APIs, Resources, Events and Tasks. Figure 3.1 illustrates a canonical graph of possible relationships among entities for case management systems. The figure depicts a motivating scenario in which a Knowledge Graph connects Tasks, Events and APIs for a development team working with Pivotal Tracker and Github. As illustrated, Pivotal consists of a few Operations that manipulate a Story, and a StoryFinished Event that is produced when a Story is finished. In addition, the automated Task (1) triggers the FinishStory operation when a PullRequest event from Github is received, providing a simplified work environment for the developer. In Pivotal Tracker, after a story is created it has to be started manually; to simplify these steps, Task (2) can be used to shortcut this procedure.

Figure 3.1: A canonical graph of possible relationships between entities in the Knowledge Graph for case management systems.

In the following, we define the specific types of entities that are important in case management systems.

API, Application Programming Interface, is a set of routines and protocols for building software applications. The API is the building block for software applications and specifies how software components should interact. With today's ability to include storage, location, payment, social-networking and more services into one platform via the use of specific APIs, these interfaces play an increasingly important role in facilitating the construction of complex systems. ProgrammableWeb (www.programmableweb.com) is an example of a repository of reusable APIs. API nodes represent services offered by providers such as Github, Facebook, or Google. They store the necessary descriptions about the providers and the authentication information; for example, Github and GDrive use OAuth 2 while Twitter uses OAuth 1. Details of authorization techniques can be found in [77].

Resource is an entity that can be used as an input or output of an API. In this context, resources may have various granularities, from a large dataset to an entity representing issues in Github, stories in Pivotal Tracker, or even PDF files in Dropbox. In order to represent resources, we utilize JSON-Schema draft v4 (http://json-schema.org).

Event is an entity that records an activity in a system. Events can be collected from APIs. An event E is represented by a set of attributes {T, R, τ, D}, where T is the type of the event (e.g. Github provides 25 different event types, such as Push, Issue and Fork; see https://developer.github.com/v3/activity/events/types/), R is the actor (i.e. person or device) executing or initiating the activity, τ is the timestamp of the event, and D is a set of data elements recorded with the event (e.g. the task associated with the event). We assume that each distinct event does not have a temporal duration. Events can be defined at various abstraction levels, as a hierarchy. Lower-level events may be concrete (e.g. events defined at the API level); higher-level events may capture knowledge required for coarse-grained analysis of patterns relevant to a collection of resources (e.g. related stories in Pivotal Tracker).

Task is an entity that represents a set of operations that need to be done to achieve a specific goal. In our model, tasks may encapsulate anywhere from a single API endpoint to a combination of several API endpoints. Moreover, we propose a novel feature in which tasks may be defined using a "context", which describes the "intent" and "objective". This is particularly useful to a human process-designer. In this manner, operation endpoints may be mixed-and-matched between different APIs, to provide a more comprehensive Task that correctly and more precisely targets a particular use-case. For example, the task CreateIssue may conform to different intents such as reporting bugs, creating pull requests or requesting features. Such intents provide the descriptions of tasks and can later be indexed and searched when defining a service integration. From the functionality point of view, there are two main types of tasks: Automated and Manual. Automated tasks are assigned to an Event that may trigger the task, while manual tasks can only be triggered by an actor invoking them explicitly. Tasks may also have sub-tasks; for example, a CodeCommit Task could be defined as a set of sub-tasks containing the GithubPush and GithubPullRequest APIs. Complex tasks are often broken down into smaller tasks to reduce complexity. Utilizing Resource and Task nodes, we can describe a possible service integration by stating: if a Task TA produces Resource R and Task TB consumes Resource R, then TA and TB can be invoked in sequence. For example, the Task CreateIssue produces an Issue and the Task EditIssue consumes an Issue; therefore we can state that EditIssue can be invoked after CreateIssue on the same resource.

Folders. To organize the graph into a structured, easy-to-browse form, we introduce the concept of Folders. There are two types: CategoryFolder and EntityFolder. Folders, as in a file system, are used to group various related nodes together to form a concept for easier browsing, while CategoryFolders are designed to structure and categorize APIs. Folders can be nested, creating a tree-like structure.
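Returning to the event model defined above, a concrete Github Push event could be captured as the attribute set {T, R, τ, D} shown below; the JSON field names and values are purely illustrative (τ is written as "timestamp"):

    {
      "T": "Push",
      "R": "jdoe",
      "timestamp": "2015-06-01T10:22:31Z",
      "D": {
        "repository": "acme/webapp",
        "commitMessage": "[Finishes #12345] fix failing tests"
      }
    }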

Relationship. A relationship R = (E1, E2) is an edge in the Knowledge Graph indicating a connection between the entities E1 and E2. As illustrated in Figure 3.1, we have seven types of relationships: "API has OPERATION", "TASK use OPERATION", "TASK use EVENT", "OPERATION consumes RESOURCE", "OPERATION produces RESOURCE", "TASK compose TASK" and "OPERATION trigger EVENT". In addition, there is "EVENT compose EVENT", indicating that an event is complex, i.e. composed of other events.
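One way to picture a fragment of this graph, using the scenario of Figure 3.1 and the relationship types just listed, is the following node/edge serialization; the JSON layout is illustrative only and is not APIBase's storage format:

    {
      "nodes": [
        { "id": "n1", "type": "API",       "name": "Pivotal Tracker" },
        { "id": "n2", "type": "Operation", "name": "FinishStory" },
        { "id": "n3", "type": "Resource",  "name": "Story" },
        { "id": "n4", "type": "Event",     "name": "PullRequest" },
        { "id": "n5", "type": "Task",      "name": "AutoFinishStory" }
      ],
      "edges": [
        { "from": "n1", "rel": "has",      "to": "n2" },
        { "from": "n2", "rel": "consumes", "to": "n3" },
        { "from": "n5", "rel": "use",      "to": "n2" },
        { "from": "n5", "rel": "use",      "to": "n4" }
      ]
    }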

3.3 Architecture and Implementation

3.3.1 Architecture Overview

Figure 3.2 illustrates the architecture of APIBase. APIBase is layered upon the GraphDB abstraction layer as shown; this layer masks and abstracts the actual implementation of the typed graph database, and APIBase communicates with the GraphDB through a REST interface. Moreover, APIBase is itself exposed through a REST API, which is further demonstrated in Section 3.3.3.

Figure 3.2: Architecture of APIBase

For the underlying graph database implementation there exist many options, including Neo4j, OrientDB, ArangoDB and others; with their differing features, these graph databases excel in different directions. The Knowledge Graph we are building requires a flexible typed system that supports path searching: we need to check the data structure when users add new entities, and to limit the possible relationships between entities of certain types. There are many ways to implement such mechanisms on top of the graph databases mentioned above. However, we decided to introduce another abstraction layer (GraphDB) so that the underlying graph database can be swapped without changing the logic of APIBase itself.

3.3.2 Graph Database Service

The Graph Database provides an abstraction layer over the actual database implementation. As mentioned above, there is a large number of graph database implementations, all offering different features. Our particular requirements are a flexible typed system and the ability to trace paths from one typed entity to another. In this work we do not address efficiency directly, as it depends largely on the underlying graph database implementation; with the proposed abstraction layer, the underlying database can be switched to a more efficient implementation, which keeps efficiency from becoming a primary concern here.

Figure 3.3: Typing system in the GraphDB

Some graph databases offer native support for typed nodes but, to the best of our knowledge, none of them supports a strict schema that can validate the data structure. We therefore build a typing system based on the graph characteristics (illustrated in Figure 3.3), which introduces two kinds of nodes: Type and Entity nodes. Type nodes hold a JSON-Schema (http://json-schema.org/), which defines properties and their types. An example of such a JSON-Schema is shown below:

{
  "name": "API",
  "author": "John Sun",
  "email": "[email protected]",
  "schema": {
    "properties": {
      "name":        { "type": "string" },
      "type":        { "enum": ["REST", "SOAP"] },
      "provider":    { "type": "string" },
      "description": { "type": "string" },
      "content":     { "type": "$rel: CONTAINS -> (Operation | Folder | ResourceType | Task)" },
      "resources":   { "type": "$rel: HAS_RESOURCE -> (Resource)" }
    },
    "required": ["name", "type", "provider"]
  }
}

Listing 3.1: JSON-Schema of APIs

The above code snippet illustrates the schema for the API type, describing the properties of the type. Some properties are marked as required, meaning they are mandatory attributes for nodes of this type. Notice also the special notation:

"type": "$rel: CONTAINS -> ( Operation | Folder | ResourceType | Task)"

This notation is added to our JSON-Schema in order to specify relationships in the graph. This particular example shows that the content attribute of the API carries the relationship CONTAINS towards entities of type Operation, Folder, ResourceType or Task. The syntax is as follows:

expr         := '$rel:' <relationship> '->' [ '(' <types> ')' ]
types        := <type> [ '|' <types> ]
relationship := string
type         := string

Listing 3.2: Extended syntax for representing relationships in JSON-Schema

By using this syntax, we can define and validate the relationships between entities of different types. Entities, i.e. instances of the types, are added to the graph with an IS_TYPE relationship, and the GraphDB layer validates the schema of each entity and enforces the permitted relationships.
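As an illustration of the validation step, the following Go sketch checks an entity's JSON payload against the JSON-Schema held by its Type node, using the third-party gojsonschema library. This is a minimal sketch of the idea only: the actual GraphDB layer additionally strips the $rel-typed properties and enforces them as graph edges.

package main

import (
	"fmt"
	"log"

	"github.com/xeipuuv/gojsonschema"
)

func main() {
	// Schema stored on the "API" Type node (abbreviated from Listing 3.1).
	schema := `{
		"properties": {
			"name":     {"type": "string"},
			"type":     {"enum": ["REST", "SOAP"]},
			"provider": {"type": "string"}
		},
		"required": ["name", "type", "provider"]
	}`
	// Candidate Entity payload submitted by a user.
	entity := `{"name": "Github", "type": "REST", "provider": "Github Inc."}`

	result, err := gojsonschema.Validate(
		gojsonschema.NewStringLoader(schema),
		gojsonschema.NewStringLoader(entity))
	if err != nil {
		log.Fatal(err)
	}
	if !result.Valid() {
		for _, e := range result.Errors() {
			fmt.Println("schema violation:", e)
		}
		return
	}
	fmt.Println("entity is valid; safe to create with an IS_TYPE edge")
}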

Figure 3.4: Relationships between types in the APIBase

We have used Neo4j6 as the backend for the GraphDB and implemented the typing system described above. While Neo4j could be replaced with another graph database, its powerful and optimized Cypher Query Language makes it easier to implement the GraphDB layer and support various complex queries, such as finding a path from one Task to another.
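As a sketch of the kind of query involved, the following Go snippet posts a Cypher statement to Neo4j's transactional REST endpoint to find a path between two named entities. The host, node label and the exact Cypher are illustrative assumptions, not the literal queries used by our GraphDB layer.

package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"log"
	"net/http"
)

func main() {
	// Cypher: find a short path between two entities by name.
	query := map[string]interface{}{
		"statements": []map[string]interface{}{{
			"statement": `MATCH p = shortestPath(
			    (a:Entity {name: {from}})-[*..8]-(b:Entity {name: {to}}))
			  RETURN p`,
			"parameters": map[string]string{"from": "CreateIssue", "to": "CloseIssue"},
		}},
	}
	body, _ := json.Marshal(query)

	// Neo4j's transactional HTTP endpoint (Neo4j 2.x).
	resp, err := http.Post("http://localhost:7474/db/data/transaction/commit",
		"application/json", bytes.NewReader(body))
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	var result map[string]interface{}
	json.NewDecoder(resp.Body).Decode(&result)
	fmt.Println(result["results"])
}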

3.3.3 APIBase Service

The knowledge graph plays a crucial part in the system: it has to be robust and has to support complex graph queries. As described in the previous section, we implemented the GraphDB with Neo4j, supporting a RESTful interface. Neo4j comes with a label system for its entities, which lets users mark entities with labels, but it does not enforce any schema beyond uniqueness constraints; we therefore utilized JSON-Schema in the Knowledge Graph for validating the JSON data. By using Neo4j we can employ its powerful graph query language, the Cypher

6http://www.neo4j.org

Query Language, to query complex relationships. For example, we can easily find a path from one Task to another; with this capability, we can provide information about the interoperability between Tasks. The Knowledge Graph provides a REST API that allows service providers or knowledge workers to enrich the KG itself. Figure 3.4 shows the relationships between each type and the associated properties. To create an API, one can use the following command line:

curl -H 'Content-Type: application/json' -XPOST http://<host>/api -d '{
  "name": "Github",
  "type": "REST",
  "version": "1.0",
  "provider": "Github Inc.",
  "description": "Github APIs"
}'

Listing 3.3: Adding a new API

This command sends a POST request to the "/api" endpoint which, in RESTful terminology, creates a new resource indicated by the endpoint. As a result, an API entity will be created. If the request is valid and successfully executed, the UUID of the created entity is returned. Using this UUID, we can retrieve the created entity by issuing the following GET request:

curl -XGET http://<host>/api/<uuid>

We also provide a universal retrieval method:

curl -XGET http://<host>/<uuid>

Since UUIDs are unique across the graph, this endpoint lets us fetch any entity regardless of its type. Additionally, this endpoint also returns related entities. For example, if an API has multiple operations associated with its content attribute, the operations will also be fetched and returned in the response. After an API is created, we can also add an Operation to the API:

curl -XPOST http://<host>/api/<uuid>/content -d '{
  "_type": "Operation",
  "name": "GetGist",
  "path": "https://api.github.com/gist/{gistId}",
  "verb": "GET",
  "description": "get a gist"
}'

Listing 3.4: Adding a new Operation

Referring to Figure 3.4, we can see that this Operation is created and associated with the content attribute of the API. Later on, we can retrieve it either via the specified path '/api/<uuid>/content' or via the universal retrieval method described above. A complete example of an API is shown in Section 3.3.4. Adding a Parameter can be done in a similar manner:

curl -XPOST http://<host>/operation/<operation-uuid>/parameters -d '{
  "_type": "Parameter",
  "name": "gistId",
  "paramType": "path",
  "required": true
}'

Listing 3.5: Adding a new Parameter

By adding this Parameter, we now have the inputs of the Operation specified; this information is sufficient for an APIBase client to invoke the Operation. ResourceTypes define arbitrary data structures in the APIBase, be it a Youtube Video, a Gist in Github or even a snippet of plain text. ResourceTypes can be used to define these types and, more importantly, be associated with Parameters so that we have more details on what to expect of a parameter. ResourceTypes are defined similarly to a Type in the GraphDB, but without the relationships. We can add a new ResourceType using the following command:

curl -XPOST http://<host>/resourcetype/ -d '{
  "name": "Gist",
  "schema": "{
    \"properties\": {
      \"gistId\": {
        \"type\": \"string\"
      }
    },
    \"required\": [\"gistId\"]
  }"
}'

Listing 3.6: Adding a ResourceType

This command defines a ResourceType specifying that a Gist resource should have a gistId attribute of type string. As can be seen, the schema attribute holds a JSON-Schema, which is used to validate resources created from this type. Moreover, once this ResourceType is created, we can associate it with a Parameter. The following command shows how:

curl -XPUT http://<host>/parameter/<uuid>/resourcetype -d '{
  "_type": "ResourceType",
  "targetId": <resourcetype-uuid>,
  "_mapping": {
    "gistId": "$.gistId"
  }
}'

Listing 3.7: Relating a Parameter to a ResourceType

Note the _mapping attribute, which is used to map a ResourceType onto a Parameter. Since Parameters are usually strings, and occasionally JSON objects or arrays, we provide a JSONPath8 mechanism for the mapping. In addition to JSONPath, we provide string interpolation as in CoffeeScript9.

8http://goessner.net/articles/JsonPath/ 9http://coffeescript.org/
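To illustrate the mapping mechanism, the following Go sketch applies a _mapping such as {"gistId": "$.gistId"} to a resource instance. For brevity it only handles single-level "$.field" paths; the actual mechanism supports full JSONPath expressions as well as string interpolation.

package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// lookup resolves a simplified JSONPath of the form "$.field" against a
// decoded JSON document.
func lookup(doc map[string]interface{}, path string) (interface{}, bool) {
	field := strings.TrimPrefix(path, "$.")
	v, ok := doc[field]
	return v, ok
}

func main() {
	// A Gist resource instance (validated earlier against its ResourceType).
	var gist map[string]interface{}
	json.Unmarshal([]byte(`{"gistId": "abc123", "description": "demo"}`), &gist)

	// The _mapping stored on the Parameter -> ResourceType relationship.
	mapping := map[string]string{"gistId": "$.gistId"}

	// Extract the parameter values the Operation needs.
	params := map[string]interface{}{}
	for param, path := range mapping {
		if v, ok := lookup(gist, path); ok {
			params[param] = v
		}
	}
	fmt.Println(params) // map[gistId:abc123]
}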

Figure 3.5: An example of Github API represented in APIBase

Figure 3.5 shows an example of the Github API with two of its operations related to Issues. The CreateIssue operation has two input parameters, title and body, and one output parameter. CloseIssue has one input parameter, issue No., and one output parameter. In the figure, the output of CreateIssue is associated with a resource type called Issue, and the input parameter of CloseIssue, issue No., is associated with the same resource type. Therefore, by inspecting the graph, we can find out that these two operations can be used in succession. While in the real world closing an issue directly after creating it isn't particularly helpful, the same relationship can apply to multiple operations (and, with the mapping, to operations in other APIs).

3.3.4 Example

To further elaborate the use of APIBase, we here present an example with the Github API. Specifically, we demonstrate the use of APIBase with the Issue-related operations of Github. This section is structured as follows:

• Add a Github API
• Add a CreateIssue operation to the Github API
• Add Parameters of the CreateIssue operation
• Create an Issue ResourceType
• Associate the output Parameter with the Issue ResourceType

Creating the Github API

First of all, we created the API node:

curl -H 'Content-Type: application/json' \
  -XPOST http://<host>/api -d '{
  "name": "Github",
  "type": "REST",
  "version": "v3",
  "provider": "Github Inc.",
  "description": "github apis"
}'

This returns the UUID of the API entity we created: ee1db224-3331-4b8a-bc11-8839b4e5d6b4. Then, we proceed to add a CreateIssue operation. (To simplify similar requests, we only show the URL endpoint, HTTP verb and payload in the rest of this section):

POST http://<host>/api/<uuid>/content
{
  "_type": "Operation",
  "verb": "POST",
  "name": "CreateIssue",
  "description": "create an Issue",
  "path": "https://api.github.com/repos/{owner}/{repo}/issues"
}

Adding Parameters to the CreateIssue operation

As shown above, there are two placeholders in the path attribute, which indicate that a replacement is required when actually invoking the operation; they will be referred to when we create the Parameters:

POST http://<host>/operation/<uuid>/parameters
{
  "_type": "Parameter",
  "type": "path",
  "required": true,
  "name": "owner"
}

To model the payload, we add the Parameters in Table 3.1. As shown in the table, we created seven parameters according to the Github documentation10, plus an output parameter for the operation's result. To retrieve this information, we can use the retrieval method:

GET http://<host>/<uuid>

Then we will get the following response:

{
  "id": "2ae0d4ff-8efa-45dd-a831-d712559f06e1",
  "_created": 1417354099,
  "verb": "POST",
  "description": "create an issue",
  "name": "CreateIssue",
  "path": "https://api.github.com/repos/{owner}/{repo}/issues",
  "_id": 25925,
  "_type": "Operation",
  "resources": [],
  "parameters": [
    "556269cc-d6b0-4fee-9011-9f80c51132e4",
    ...
  ],
  "_relations": [
    {
      "type": "CONTAINS",
      "direction": "in",
      "target": {
        "id": "...",
        "_created": 1417356581,
        "name": "Github",
        "type": "REST",
        "version": "v3",
        "provider": "Github Inc.",
        "description": "github apis"
      }
    },
    {
      "type": "CONTAINS",
      "direction": "out",
      "target": {
        "id": "02a2cf75-7bf4-4149-90fd-80b7efbff8e7",
        "_created": 1417356581,
        "name": "repo",
        "required": true,
        "paramType": "path",
        "_id": 25932,
        "_type": "Parameter"
      }
    },
    {
      "type": "CONTAINS",
      "direction": "out",
      "target": {
        "id": "1e522ace-f5a1-47ff-ad49-680c02c22b70",
        "_created": 1417356577,
        "name": "owner",
        "required": true,
        "paramType": "path",
        "_id": 25931,
        "_type": "Parameter"
      }
    },
    ...
  ]
}

10https://developer.github.com/v3/issues/#create-an-issue

The response shows the basic information of the operation, including the verb, path and description we just added. Parameters are shown as a list of UUIDs which refer to their individual entities; one can retrieve each Parameter individually using this same endpoint. However, to provide easier access, we also list all the related entities, in this case the Github API and the Parameter entities, in the _relations attribute.

Table 3.1: Parameters for the CreateIssue operation

Name       Type    Required  Description
owner      path    true      owner of the repository
repo       path    true      name of the repository
title      body    true      title of the issue
body       body    false     content of the issue
assignee   body    false     assignee of the issue
milestone  body    false     milestone which this issue belongs to
labels     body    false     labels for the issue
output     output  false     the output parameter of the operation

Associating a ResourceType with a Parameter

Once we have the API, Operation and Parameters, there is essentially enough information to make an HTTP request to Github. However, APIBase allows the use of ResourceTypes to annotate each Parameter with further details. In this case, we will associate the output Parameter with an Issue ResourceType. First of all, we create the ResourceType by giving a JSON-Schema:

POST http://<host>/resourcetype
{
  "name": "Issue",
  "schema": "{
    \"type\": \"object\",
    \"properties\": {
      \"id\": {
        \"type\": \"integer\"
      },
      \"url\": {
        \"type\": \"string\"
      },
      ...
      \"title\": {
        \"type\": \"string\"
      },
      \"body\": {
        \"type\": \"string\"
      },
      \"user\": {
      },
      \"labels\": {
        \"type\": \"array\",
        \"items\": {
          \"type\": \"object\",
          \"properties\": {
            \"url\": {
              \"type\": \"string\"
            },
            \"name\": {
              \"type\": \"string\"
            },
            \"color\": {
              \"type\": \"string\"
            }
          }
        }
      },
      \"assignee\": {
        ...
      },
      \"milestone\": {
        ...
      }
    },
    \"required\": [
      \"id\",
      \"url\",
      \"title\",
      ...
    ]
  }"
}

This creates the Issue ResourceType. Then, we associate the output Parameter of the CreateIssue operation with this resource type, ending up with the graph shown in Figure 3.5:

PUT http://<host>/parameter/<uuid>/resourceType
{
  "targetId": <resourcetype-uuid>
}

In the context of API exploration and education, it is also possible to add Resources to an API to provide additional details. For example, we can add a YoutubeVideo for the Github API that explains the API:

POST http://<host>/api/<uuid>/resources
{
  "resourcetype": "YoutubeVideo",
  "link": "http://youtu.be/..."
}

Or a resource as simple as a Text:

POST http://<host>/api/<uuid>/resources
{
  "resourcetype": "Text",
  "link": "remarks about Github API..."
}

The Resource model can thus be used to attach additional resources to APIs and Operations. Consequently, users can find related resources easily, without having to search for them across the internet.
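To round off the example, the following Go sketch shows how a hypothetical APIBase client could invoke CreateIssue purely from the stored metadata: the verb, the path template with its placeholders substituted by path parameters, and the body parameters serialized as the JSON payload. The token and parameter values are placeholders.

package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"log"
	"net/http"
	"strings"
)

func main() {
	// Metadata as stored in APIBase for the CreateIssue operation.
	verb := "POST"
	path := "https://api.github.com/repos/{owner}/{repo}/issues"

	// Path parameters (owner, repo) substitute the {placeholders}.
	pathParams := map[string]string{"owner": "octocat", "repo": "demo"}
	for name, value := range pathParams {
		path = strings.Replace(path, "{"+name+"}", value, 1)
	}

	// Body parameters (title, body, ...) form the JSON payload.
	payload, _ := json.Marshal(map[string]string{
		"title": "Found a bug",
		"body":  "Details of the bug...",
	})

	req, err := http.NewRequest(verb, path, bytes.NewReader(payload))
	if err != nil {
		log.Fatal(err)
	}
	req.Header.Set("Authorization", "token <oauth-token>")
	req.Header.Set("Content-Type", "application/json")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status)
}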

3.4 Evaluation

We have conducted a user study to evaluate the proposed approach. In this study, we aimed to evaluate the validity of the following hypotheses [34, 108] in a controlled environment:

1. APIBase is usable with little training for archiving and searching for APIs and resources.

2. The features offered by APIBase are effective and comprehensive.

The experiment was conducted in a controlled environment to study the model by evaluating the implementation. This type of synthetic environment experiment in software engineering [111] has the potential drawback of being too abstract and removed from real practice because of the small time window. However, in contrast to long development cycles in software engineering, our case targets only a small number of APIs and Operations, which

can be done in minutes or within an hour.

3.4.1 Experiment Setup

The participants were selected on the assumption that they had prior API programming experience or a sufficient understanding of REST APIs. Most of the participants were students from the School of Computer Science and Engineering at the University of New South Wales, Sydney, Australia.

3.4.2 Experiment Session

We first invited a small group of participants to join a short workshop lasting approximately 30 minutes. This served to help elucidate the APIBase documentation, as well as the RESTful interface. We then released the APIBase API and asked the participants to choose an API and add its information into APIBase. During the presentations, we explained the data models and the reasons behind the design, as well as a full example of the Github API added to APIBase, which is also included in the documentation. The experiment consisted of the following three stages:

Understanding APIBase

In this stage, participants are required to thoroughly read and reflect on the written documentation of APIBase11. Upon reading, participants are asked to write one paragraph that summarizes their understanding of the main contributions of the work. As part of this reading exercise, they are also required to familiarize themselves with the Swagger documentation12. This

11https://goo.gl/eWLbyk 12http://johnsd.cse.unsw.edu.au:3000/app/swagger/

would be useful for the subsequent stages.

Adding APIs

Adding an API to APIBase. In essence, APIBase is a Knowledge Graph that stores and organizes knowledge about APIs: every piece of information is stored as a Node in the graph, which can be connected to other nodes via Relationships. In this stage, participants were required to add at most one new API into APIBase. With reference to the written documentation they read earlier, they would notice this needs to be done in stages: (i) adding basic API meta-information (e.g. name, endpoint); (ii) adding an Operation or set of Operations; (iii) adding Parameters to Operations; and finally (iv) structuring entities using the concept of Folders.

Enriching APIs with Resources

An auxiliary feature of the knowledge constituted within APIBase is the notion of ResourceTypes and Resources. In general, Resources are instances of some ResourceType. One interesting aspect of Resources is that they are versatile entities in APIBase: they can, for instance, be used to provide extra supporting information about APIs, such as links, documents and YouTube videos. For example, this could be a link to a tutorial video about the API. Having a way to associate such information with APIs is one of the unique advantages that motivated the design and development of APIBase. In this part, participants were asked to enrich existing APIs (either the one they added earlier, or some other API already existing in the base). By enrichment, we mean adding Resources (in many cases, the ResourceType

does not need to be created, as it already exists, e.g. the YouTube ResourceType). For special Resources based on a type that does not exist, participants would first need to create and load a ResourceType, and then proceed to create the Resource instance. Participants were required to add at most one Resource.

3.4.3 Questionnaire

After their attempts to add information, the participants were asked to fill in a simple form to evaluate various aspects of APIBase, including the ease of using its API and the readability of the documents. Questions were split into four main areas: (a) Background; (b) Generic Concepts; (c) Functionality; (d) Documentation and Improvements. The questions are mostly 5-point rating questions. The Background questions relate to familiarity with REST APIs and the Swagger documentation framework. Since the model in APIBase is similar to the Swagger framework, participants with prior experience in Swagger might have a better understanding of APIBase. The Generic Concepts questions aim to determine whether the participants understand the concept of APIBase as a database for APIs/services, and the relationships between ResourceTypes, Resources and APIs. The Functionality questions were formulated to capture the experience whilst using APIBase, specifically how easy it is to add entities. For instance, we sought to find out the difficulty of adding an API, Operation, Parameter, Folder, ResourceType and Resource. The Documentation and Improvements questions were given to determine whether the documents are sufficient and clear for the participants to complete the task and gain an understanding of the graph nature of APIBase.

3.4.4 Participant Groups

We conducted the experiment with 23 participants in total. The participants all have a computer science background but different levels of understanding of APIs in terms of concepts and practice. Based on their answers in the Background section, we split them into two groups.

• API Practitioners (6 participants): Participants who rated themselves 3 or more in all background questions. This indicated that they were familiar with the concept of APIs and the tools, and had experience using them.

• Generalists (17 participants): All other participants. Since the experiment was conducted in a Computer Science and Engineering course, most students already had background knowledge of Web services, although their relatively low ratings indicated they may not have been confident in this particular area.

3.4.5 Results

Evaluation of Hypothesis H1

Hypothesis H1 assumes that users can add APIs and resources into APIBase with little training. Figure 3.6 shows that participants, regardless of their experience using APIs, found APIBase easy to comprehend and the API interface easy to use. As can be seen in the figure, API experts had little or no trouble adding APIs, Operations and Parameters. Regarding resources, on the other hand, both groups rated them more difficult to understand and use. Figure 3.6 shows their ratings on the difficulty of using the features provided.

Figure 3.6: Rated difficulty of using the features

Discussion, H1. As expected, the Practitioners were able to learn to use the platform more quickly, as the API/Operation/Parameter relationships were clearer to them. Furthermore, since APIBase is provided as-a-service and currently supports only REST APIs, their previous experience with RESTful APIs allowed them to pick up the concepts more easily. The Resource model is a new concept for most of the participants, so the higher difficulty rating for adding resources is also expected; overall, however, most of the participants were able to add resource types and resources without help from the developer.

Evaluation of Hypothesis H2

Hypothesis H2 assumes that the features offered in APIBase are comprehensive. In general, participants found the Swagger documentation of APIBase comprehensive and easy to use (avg. rating 3.3), and the documentation of APIBase slightly harder to understand due to the amount of information and the limited time (avg. rating 2.7). As shown in Figure 3.7, the Practitioners found the concept of APIBase and the Resource model easy to understand, and the Swagger documentation slightly less so.

Figure 3.7: Rated difficulty of understanding the concepts

The Generalists, on the other hand, found the Swagger documentation easier to understand than the written documentation.

Discussion, H2. The results indicate that Practitioners had less difficulty understanding the concepts of APIBase, including the Resource model. While the Generalists found the Swagger documentation easier to use when it comes to interacting with APIBase, it is less informative about the concepts behind APIBase and is mainly useful for testing purposes. It is possible that the Generalists focused more on trying to use the REST API and the Practitioners on understanding the concepts, which may have resulted in the apparent differences. However, the overall rating is higher than 2.5, which we consider an indication that the concept of APIBase is easy to understand.

3.5 Related Work

With the rapid growth of the SaaS service market, the lack of standards within the most popular interface style, REST, makes it significantly difficult to effectively sift through the documentation and resources of different APIs. Websites and frameworks were thus proposed in an attempt to standardize different aspects of this problem. These platforms address two main problems: API Documentation and API Management.

3.5.1 API Documentation

From various communities, different frameworks for documenting APIs have been proposed, with Swagger13, RAML14 and API Blueprint15 being the most popular ones. Although presented in different languages (JSON, YAML, Markdown), they all focus on REST APIs with the concept of resources (as in REST). These documentation frameworks feature machine-readable documents and thus allow various forms of automation to improve productivity when developing with APIs. Tools to automatically generate tests and SDKs are also possible because of the machine-readable formats, which greatly simplifies the development cycle whether developing an API or using one. On the other hand, websites like ProgrammableWeb16 and SDKs.io17 target API cataloging and resource management. They provide multi-level categories and search interfaces to give developers looking for specific types of APIs easier access to documents. SDKs.io also generates SDKs for different programming languages to access the REST interfaces, based on existing documentation frameworks. These websites provide access to different forms of documentation. However, the documentation they provide is mostly a redirection to the original website of the API, and so adds little additional value.

13http://swagger.io 14http://raml.org 15https://apiblueprint.org 16www.programmableweb.com 17www.sdks.io

3.5.2 API Management

API Management platforms provide support for managing the use of APIs. In addition to providing access to documents, they also provide services to access the APIs indirectly, in order to monitor or control usage. Websites like apigee18 and Mashape19 are examples of API Management platforms. Similar to an Enterprise Service Bus (ESB), they act as hubs to different APIs and allow service vendors to control traffic and rate limits, as well as to charge users for usage. In order to monitor and control usage, these platforms usually come with one of the documentation systems mentioned above or, in some cases, one they built themselves. While these platforms provide some form of documentation, it is usually unstructured and not publicly accessible through an API.

3.5.3 Web-Services Repositories.

Due to the revolution of Web-based technologies, business environments are changing rapidly. In this context, Web services are considered a key technology for realizing e-business. SOAP and REST are two widely accepted representation approaches for services [85, 5, 45]. Service repositories help in building systems that can operate in a broader collaborative system. This emphasizes the need to store services in repositories so that they can be utilized in the course of application development. Global repositories, such as the UDDI standard, were proposed early on to register and locate web service applications. UDDI has not been as widely adopted as its designers had hoped, as the emphasis has shifted to simply relying on Web-based search engines to locate services. An example of this is RESTful services, where the goal

18https://apigee.com 19https://www.mashape.com

is to use best practices for creating maintainable, extensible and scalable web services. In our approach, we provide a knowledge graph repository to support and simplify service execution and integration during development.

3.6 Conclusion

In this chapter, we discussed the difficulties of learning and utilizing APIs caused by the lack of a platform to gather relevant resources and the lack of standard API descriptions. These technology gaps make designing API-based systems overly complex and the results often hard to maintain. Even though online services are multiplying, the knowledge and skills required to take full advantage of APIs are still lacking in most organizations. To address these challenges, we proposed the Knowledge Graph for APIs, a knowledge graph where service-integration-related logic can be abstracted, organized and reused by developers. We also proposed the Resource model to abstract the interactions between APIs, describing relationships between different operations in the APIs. Moreover, we presented an implementation of the knowledge graph, APIBase, supporting both text and graph-based queries, exposed as a REST service. Finally, we conducted a user study to evaluate the service; the results indicated that the service is easy to understand and use as an API repository.

Chapter 4

Declarative Language for Composing Integrated Process over APIs

4.1 Introduction

Web APIs play a vital role in the implementation and integration of business processes. The capacity to automate interactions between APIs offers clear value in reducing the cost and time needed to develop and integrate applications, in turn enabling a greater degree of business flexibility [30]. Process automation techniques have evolved over various waves. Firstly, there was an integration challenge that led to a large body of research and development in areas such as data integration [112], software components integration, enterprise information integration (EII) and enterprise application integration (EAI). Thereafter, the focus shifted to service integration and composition [5, 51, 62, 71], which focused on making standalone “software components” work together [15]. Finally, the “processes” wave sought to formalize process phases end-to-end, from process definition to enactment, as well as diagnostics, encompassing the monitoring, tracking, analysis and prediction of business processes [31, 106].

As mentioned in Chapter 3, although API-based integration techniques are quite effective for building composite applications, there are crucial gaps in the SaaS-enabled endeavor: the large number of available services do not communicate with each other, but are rather employed in an ad-hoc manner. Moreover, the reuse of such ready-made services implies conforming to a fixed set of embedded features, allowing little room for customization. Alternatively, even if a collection of such services is used for different portions of tasks, this inevitably leads to “shadow processes”, where synchronization between such services is handled in an ad-hoc manner (e.g., transferring a file from one service to another is often accomplished in a number of non-traceable steps via manual tasks, such as email or the like). Empowered by the API Knowledge Graph, we provide a novel process customization and deployment platform, called Case Walls, which enables:

• Professional process developers to incrementally create modular collections of tasks - reusable and customizable process fragments (referred to as a “case”), e.g. create an issue on a project management service; upload a file into a document management service; send an email when a coworker uploads a new version of a file; post videos and photos to social media services, etc.

• A simple, declarative yet powerful language that allows case-workers to search existing tasks and compose them into customized definitions. Composite tasks are abstracted as reusable cases in the AKG for further reuse. The task search component uses a “context” to describe the task “intent” and “objective” (e.g., upload a file, create an issue). Thus, using the business scenario to query the AKG can return tasks that are appropriate for the given context.

• An event/activity “wall” to inform case-workers about task progress, together with a simple and declarative language enabling participants to uniformly and collectively react, interact and collaborate on relevant cases.

• A unified, knowledge-based event-bus for case orchestration supporting the above. The knowledge required at runtime to orchestrate cases (i.e., detecting events, executing tasks, invoking APIs) is automatically extracted from the defined case definitions and expressed as event-action case orchestration rules. (We thus reuse a rule-engine as our execution environment.)

4.2 Knowledge-Reuse-driven and Declarative Case Definition Language

The notion of a Case conceptualizes a lightweight process (a set of service interactions). A case is thus defined to consist of Tasks, People and Events. Tasks indicate the services and the features needed in the interaction. People describes who has access to the Case, especially the owner, who has the privilege to edit the case. Cases can contain both automated tasks and manual tasks, for when human discretion and thereby intervention is required. Tasks can also monitor Events (or patterns thereof), which may serve as notifications to participants (e.g. to perform some manual task). Moreover, Cases themselves are represented as nodes in the AKG and can be curated and reused in a modular manner. While the platform is exposed via a RESTful interface, we further propose a higher-level, command-line Case Search, Definition and Interaction Language.

4.2.1 Knowledge-Reuse Language

Selecting the required tasks for a Case may not be trivial with an extensively populated knowledge graph. We thus propose an effective search component (utilizing an index of the tasks' objectives and an iterative keyword search approach [63]). The closest matching tasks are recommended to the user based on the objective tags described in Section 3.2; the final decision of whether a Task (or sub-task) should be included, however, rests with the Case designer's discretion.

expression ::= "/" <op> <keywords>
op ::= "task"      #matches tasks against all possible related keywords
     | "resource"  #tasks that are related to the resources matching the keywords
     | "input"     #tasks that consume the resources matching the keywords
     | "output"    #tasks that produce the resources matching the keywords
     | "API"       #tasks that are directly related to the API specified
     | "event"     #tasks that are monitoring the events specified
     | "case"      #directly matches cases against all possible related keywords
keywords ::= {<keyword>}   #set of keywords to perform the search

Figure 4.1: Search Language Syntax

expression ::= <new_case> | <extend>
new_case ::= "CREATE CASE" <name> [<own>] [<shared>] [<using_service>]
             [<monitor_events>] [<include_tasks>]
extend ::= "EXTEND CASE" <name> [<own>] [<shared>] [<using_service>]
           [<monitor_events>] [<include_tasks>]
own ::= "OWNED BY" <user>
shared ::= "SHARED WITH" <user> [{"," <user>}]
using_service ::= "USING SERVICE" <service> [{"," <service>}]
monitor_events ::= "MONITOR EVENTS" <event> [{"," <event>}]
include_tasks ::= "INCLUDE TASKS" <task> [{"," <task>}]

Figure 4.2: Case Definition Language

4.2.2 Declarative Case Definition Language

Cases can be defined (or extended) using the language defined in Figure 4.2. The syntax contains governance policies over both people and services. For people, there are constructs such as OWNED BY (permission for editing/updating) and SHARED WITH (permission for interacting). For services, the USING SERVICE construct indicates authorization information, as the owner has to specify which authorizations to share between the users of the Case. The MONITOR EVENTS construct details which events are to be monitored and notified to the participating users, while INCLUDE TASKS configures the tasks related to the interaction. Figure 4.3 illustrates an example of a GitHub Code Review process with

Pivotal Tracker. Ordinarily, using Github alone, the Lead Developer may review code by requiring the Engineers to submit their code in the form of Pull Requests1 and then reviewing it on Github. However, if the project manager wishes to monitor the review progress on Pivotal Tracker, they have to manually create review tasks (stories) for the lead engineer every time a review is needed. Using Case Walls, we can simplify this process by defining a case and linking automated tasks to auto-create stories when a pull request is received. Events can also be monitored, notifications posted to interested participants, and thereby manual tasks (e.g. upon completing the review, merge the code into upstream and close the review process) can also be accomplished.

4.2.3 Declarative Case Manipulation Language

Finally, we propose an interaction language, defined in Figure 4.4, to interact with the Case during execution. Interactions may be with both manual tasks (awaiting human intervention) and automated tasks (as in tapping into some 3rd-party service).

1https://help.github.com/articles/using-pull-requests/

CREATE CASE "CodeReview"
OWNED BY "Project Manager"
SHARED WITH "Project Manager", "Lead Developer"
USING SERVICE "GitHub", "Pivotal Tracker"
MONITOR EVENTS
    PullRequest,      #pull request has been 'received'
    PRMerged,         #pull request has been 'merged'
    ReviewFinished    #review has been completed
INCLUDE ACTION
    MergePR,          #merge a pull-request on GitHub
    CreateStory,      #create a story on pivotal tracker
    FinishStory,      #mark a story finished on pivotal tracker
    StartStory,       #start a story on pivotal tracker
    DeliverStory,     #deliver a story on pivotal tracker
    AcceptStory       #mark a story as accepted on pivotal tracker
INCLUDE TASKS
    CreateReviewOnPR, #invokes 'CreateStory' to create a new story to do review
    DeliverOnPR,      #invokes 'DeliverStory' when a pull-request is received
    MergeOnFinish     #invokes 'MergePR' when a review story is done

Figure 4.3: CodeReview Case Definition Example

expression ::= <op> <params>
op ::= <op_task> | <op_resource>
op_task ::= "#" <task>
op_resource ::= "@" <resource>
params ::= [{ <name> "=" <value> }]

Figure 4.4: Case Manipulation Language

Figure 4.5 illustrates Case Walls for the 3 participants in the Code Review process defined earlier. On the left, we see a set of notifications; these inform the participants what actions to take (if any). Actions might be interacting with some external software service (e.g. [P] Pivotal Tracker and [G] GitHub), or performing some manual task (i.e. [MT]). Behind the scenes, moreover, automated tasks (i.e. [AT]) are also being performed, as defined in the Case. It is thus apparent that without Case Walls all interactions would be done manually, with little or no flexibility. Not only do Case Walls help automate certain tasks, they also automate the notification process, making it simpler for participants to identify what needs to be done.

[Figure: three Case Walls, for the Project Manager, Engineer "1" and the Lead Developer, each showing a stream of event notifications (e.g. "PullRequest", "PRMerged", "ReviewFinished", "StoryCreated") interleaved with Pivotal Tracker [P] and GitHub [G] interactions, manual tasks [MT] such as #task "AcceptStory", and automated tasks [AT] such as #task "DeliverOnPR" and #task "CreateReviewOnPR".]

Figure 4.5: Case Walls with Illustration of Interactive Behavior

Subsequently, the interaction language can be used to call upon manual tasks in a simple manner. As for implementation, the semantics of the language are translated into rule-based expressions for the purpose of execution (refer to Section 4.3.3).

4.2.4 Illustrative Example

To illustrate the use of the languages described above, we employ the scenario described in Section 1.2 as an example, and show how we can use the Case Definition Language (CDL) and Case Manipulation Language (CML) to model the process. We assume that the task and event information is already present in the AKG. This section is structured to show the usage as follows:

1. Create a Case for the Project Manager

2. Manipulating Stories

3. Create a Case for the Engineer

Create a Case for Project Manager

To create a Case for the Project Manager, we first analyze the use case of a Manager. The Manager needs to (a) create/assign/accept/reject Stories and (b) monitor the progress of each Story. Therefore, we create a Case as follows:

CREATE CASE "Case_PM"
OWNED BY "Project Manager"
USING SERVICE "Pivotal Tracker"
MONITOR EVENTS "StoryCreated", "StoryFinished", "StoryDelivered",
               "StoryComments"
INCLUDE ACTION "CreateStory", "AssignStory", "AcceptStory", "RejectStory"
INCLUDE TASKS "CreateAndAssignStory"

As shown, the above statement creates a Case which the Project Manager will use to interact with Pivotal Tracker. This Case monitors the events related to Stories being updated. For instance, the StoryDelivered event will be shown when a story is delivered, and the manager will then have to inspect the delivered work to make a further decision.

Manipulating Stories

After the Case is created, the manager can perform story-related actions. For example, if the manager wants to initiate a new development cycle, she can run the CreateAndAssignStory task to create a story and assign it to an engineer:

#CreateAndAssignStory title="NewFeature" description="please work on this new feature" assignee="EngineerA"

With this statement, the manager can create a new story and assign it to the engineer without actually visiting the Pivotal Tracker website. Once the story is created, she will also receive an event stating that a new story has been created, since the StoryCreated event is monitored. While the CreateAndAssignStory task creates and assigns the story in one step, the manager can also use CreateStory and AssignStory separately, since they are included in the Case. This way, the Case can be used in a more flexible manner.

Create a Case for the Engineer

Managing the services used by the Engineer is a little more complex, since it requires interaction between Pivotal Tracker and Github. However, modeling the interaction with a Case is much simpler. We first identify the

interactions that the Engineer needs: (a) monitor new stories being created; (b) start working on a story; (c) finish and deliver the story after the feature is implemented. With the needs identified, we can now create a Case:

CREATE CASE "Case_Engineer"
OWNED BY "EngineerA"
USING SERVICE "Github", "Pivotal Tracker"
MONITOR EVENTS "StoryCreated", "StoryFinished", "StoryDelivered",
               "StoryComments", "CommitPushed"
INCLUDE ACTION "StartStory", "FinishStory", "DeliverStory"
INCLUDE TASKS "FinishOnPR"

This Case indicates that Case_Engineer will monitor events about Stories as well as Commits being pushed on Github. These events will help EngineerA monitor her own progress and new tasks assigned to her. Upon receiving a Story assigned to her, she can run the StartStory action to report that she is starting work on the story. When she finishes the implementation, she can simply push her commits to Github and send a pull request. The FinishOnPR Task here is an Automated Task which is triggered when a pull request is made and automatically invokes the FinishStory action. Employing this Case, the Engineer no longer has to go to Pivotal Tracker and Github separately to manually Finish or Deliver the stories.

4.3 Implementation

4.3.1 Architecture

Figure 4.6 illustrates the system architecture and the interaction of the main components of the Case Walls platform. At the heart of the system is the API Knowledge Graph (AKG) for Case-processes. It maintains the ontological

relationships between key entities and facilitates task/case-based processes. Manipulation as well as effective searching of the case-based KG are conducted via the respective components (as shown). The Event Management system collects raw event data from different services and processes them using the patterns in the KG. This feeds into the Rule Engine, which performs pattern matching and infers which actions to perform by calling upon the Task Execution Engine. Overall, Case Walls has been implemented and exposed via a RESTful interface, together with an event-notification system. Event notifications can either be pushed or polled, and effectively formulate the Case-based Activity Walls. Process participants can then take actions to interact with and manipulate tasks. While at the time of writing the interface provided is programmatic only, future plans are to implement a GUI for the case walls; as is, we expect the platform to be extensible, enabling 3rd-party, higher-level and customizable applications.

[Figure: process participants and 3rd-party higher-level apps interact with the Case Walls service through the declarative case language, event notifications and a RESTful interface; internally, the searching/manipulation component, Rule Engine and Task Execution Engine operate over the Knowledge Graph (KG) and the Event Management System, which collects events from, and invokes operations on, the web-services cloud.]

Figure 4.6: System Architecture of Case Walls


4.3.2 Event Bus

The Event Bus (EB) is implemented to aggregate and process the events from different services. We use Fluentd2 for aggregating and dispatching the events received from various services, as well as Norikra3 and Esper EPL4 for processing and generating high-level events. For instance, we are able to define an event with only the attributes we need (thus masking the different payloads and/or structures of the raw events). Higher-level events also make it possible to define an event based on a series of targeted events rather than just a single event. Likewise, we utilize MongoDB5 and ElasticSearch6 for archiving and indexing event data. For event collection, we implement an event-collecting application that connects to different services. Collected events are sent to Fluentd, which dispatches them to the appropriate components for further processing and indexing. The Event Bus also implements an event publishing interface using the ZMQ7 pub/sub model. By utilizing ZMQ we provide an event publishing channel for clients to subscribe to; once subscribed, clients start receiving events processed by the Event Bus. ZMQ allows us to create channels using a socket-like8 interface. Instead of establishing a standalone message broker like MQTT9 or RabbitMQ10, ZMQ allows the EB to build the messaging system in a more flexible manner.

2http://www.fluentd.org/ 3http://norikra.github.io 4http://esper.codehaus.org 5http://www.mongodb.org 6http://www.elasticsearch.org 7http://zeromq.org 8http://man7.org/linux/man-pages/man2/socket.2.html 9http://mqtt.org 10https://www.rabbitmq.com
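A minimal subscriber, sketched below in Go with the pebbe/zmq4 bindings, connects to the publishing channel and receives every event the EB publishes. The endpoint address and the assumption that events arrive as JSON strings are illustrative.

package main

import (
	"fmt"
	"log"

	zmq "github.com/pebbe/zmq4"
)

func main() {
	sub, err := zmq.NewSocket(zmq.SUB)
	if err != nil {
		log.Fatal(err)
	}
	defer sub.Close()

	// Connect to the Event Bus's publishing channel.
	if err := sub.Connect("tcp://eventbus.example.org:5556"); err != nil {
		log.Fatal(err)
	}
	// Empty filter: receive every event published on the channel.
	sub.SetSubscribe("")

	for {
		msg, err := sub.Recv(0)
		if err != nil {
			log.Fatal(err)
		}
		fmt.Println("event received:", msg)
	}
}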

Event Collector

The Event Collector is written in Go11, utilizing the robust concurrency model Go provides. The Event Collector implements several components: an OAuth interface, an HTTP server for handling pushed notifications, and a background worker for polling events from different services. Two kinds of event-collecting mechanisms are implemented:

1. Pushing. Services like SendGrid, Twilio and Pivotal Tracker let users register a callback URL, often called a webhook, to which the service sends a request when events arrive. There is also a protocol called PubSubHubBub [41], proposed by Google, that attempts to standardize this event delivery method. Despite the differing implementations across the services we have studied, we found that this method can simply be described as HTTP requests carrying events. We have therefore implemented an HTTP server that is able to receive events sent from the services and process them. Compared to traditional polling or long-polling, this method is far more efficient and requires less effort to maintain the connection.

2. Polling. Some services use the legacy event polling model, requiring the client to constantly check whether there are new events. A more recent variation is long polling, sometimes called comet12, designed to reduce response time and connection overhead by establishing a keep-alive HTTP request listening for events. Services providing this kind of mechanism include Twitter, Twitch.tv and Plurk. To support this, we have also implemented a multithreaded event collector that constantly polls for new events.

11https://golang.org 12http://www.webreference.com/programming/javascript/rg28/index.html

As depicted in Figure 4.7, events are gathered either through the background workers or received directly from a service by the HTTP server. The HTTP server acts as an event receiver that assigns a unique URL to each webhook service for receiving callbacks. The background workers use either normal polling or long polling to constantly look for new events. Received events are then transformed into (t, D, s, u) tuples, in which t is the timestamp, D is the raw payload received from the service, s refers to the service and, finally, u is the user relevant to this event.

Figure 4.7: Event Collector collecting events from services
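The webhook path can be sketched as follows in Go: a unique callback URL receives the service's HTTP request and wraps it into a (t, D, s, u) tuple. Routing, authentication and the forwarding to Fluentd are simplified here for illustration.

package main

import (
	"encoding/json"
	"io/ioutil"
	"log"
	"net/http"
	"time"
)

// EventTuple is the (t, D, s, u) tuple described above.
type EventTuple struct {
	T int64           `json:"t"` // timestamp
	D json.RawMessage `json:"D"` // raw payload received from the service
	S string          `json:"s"` // originating service
	U string          `json:"u"` // user relevant to this event
}

func main() {
	// e.g. /hooks/github/john receives Github webhooks for user "john".
	http.HandleFunc("/hooks/github/john", func(w http.ResponseWriter, r *http.Request) {
		body, err := ioutil.ReadAll(r.Body)
		if err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		ev := EventTuple{T: time.Now().Unix(), D: body, S: "github", U: "john"}
		// In the real collector the tuple is forwarded to Fluentd here.
		log.Printf("forwarding event: %+v", ev)
		w.WriteHeader(http.StatusNoContent)
	})
	log.Fatal(http.ListenAndServe(":8080", nil))
}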

Event Patterns

Services usually provide events to notify the user that something has happened, or to inform the user whether they have successfully performed, or failed to perform, some action. However, every service has its own event data structures. For example, Github has 35 different types of events as of the writing of this thesis13, each with its own data structure. Google Drive, on

13https://developer.github.com/v3/activity/events/types/

the other hand, has only one common structure for its events14, which can be interpreted as many kinds of change events depending on the actual change to files. Therefore, to provide a reliable event bus, we use Norikra to process the event data coming from different sources and to generate events based on the patterns defined in the AKG. Norikra uses Esper EPL to process and generate events. Using a SQL-like language, it can process and extract data from incoming events. EPL is a powerful and extensible language which supports extracting information not only from single events but also from a series of events in a window. For instance, to process a Github Push Event, filtering out the owner, the name of the repository and the hash of the commit, we can use the following EPL query:

select data_github.owner as owner,
       data_github.repository as repo,
       data_github.commit as commit
from raw
where service='github' and type='push'

This statement indicates that when an event is received, Norikra checks whether the event was sent from Github and is a push event. If such an event is detected, Norikra generates another event, extracting the owner, repository and commit attributes from the original event. As mentioned above, EPL also supports detecting complex events. Instead of processing events individually, we can define an event pattern that detects a pattern in a series of events. For instance, the following example captures the event sequence where a merge commit follows a normal commit, indicating that some features are being merged into the repository.

select b.data_github.commits[0].sha as commit, 'merge' as type
from pattern [
  every a=raw(service='github' and type='push')
  -> b=raw(service='github' and type='push' and
           b.data_github.commits[0].parents[0] = a.data_github.commits[0].sha)]

14https://developers.google.com/drive/v2/reference/changes

4.3.3 Case Orchestration Rules

As mentioned earlier, Case Walls is exposed as a RESTful service, which we have implemented using Node.js, built upon the Knowledge Graph and Event Management System. Layered further upon this, we provide the higher-level, declarative case manipulation language (refer to Section 4.2). To implement this language, the semantics are translated into rule-based expressions, denoted R_type : (Events → Actions). This effectively means rules are generated from case definitions. Once deployed, the event-bus detects relevant event patterns and, working in conjunction with the rules engine, provisions the execution of cases. As mentioned, there are two main types of tasks, manual and automated; correspondingly, there are two types of rules. Automated-task rules, denoted R_automated, consist of service-related events (e.g. PullRequest) and task-actions (e.g.

DeliverOnPR). Manual-task rules, denoted R_manual, may additionally consist of special internal events and actions. While not always utilized, they are at the disposal of the developer in order to gain better control over manual tasks. In particular, they may prove useful for managing UI components15 associated with manual tasks. For example, if an event e_i triggers some manual task to be performed, the rule Rule_manual : (e_i → a_x) may be defined, where a_x can prepare or perform some pre-processing on some UI component. Likewise, another rule Rule_manual : (e_i . e_j → a_y) could denote that the manual task has been completed, where e_j is some UI event signalling that the task was completed, and a_y could then be some post-processing action. To better demonstrate case orchestration rules in the case of automated tasks, Figure 4.8 illustrates the list of rules generated

15Even with some 3rd-party tools, developers may tap into the tool (e.g. via Webhooks as in the case of Github). Alternatively, we may also refer here to custom UI components built by the developer to reflect certain manual tasks.

to realize the process described in the example Code Review case defined earlier (refer to Section 4.2). On the left-hand side we recall the events, actions and tasks defined as part of this case. Here, we have one rule defined for each task, resulting in the three automated-type rules described, in order to provision the given case.

Case: CodeReview

Events:              Actions:           Tasks:
e: PullRequest       a: MergePR         t: CreateReviewOnPR
e: PRMerged          a: CreateStory     t: DeliverOnPR
e: ReviewFinished    a: FinishStory     t: MergeOnFinish
                     a: StartStory
                     a: DeliverStory
                     a: AcceptStory

//task "DeliverOnPR"
Rule_automated : (PullRequest -> DeliverStory[S1])
...
//task "CreateReviewOnPR"
Rule_automated : (PullRequest -> CreateStory[S2])
...
//task "MergeOnFinish"
Rule_automated : (ReviewFinished -> MergePR)
...

Figure 4.8: Example of Generated Case Orchestration Rules

As depicted in Figure 4.9, when a Case is created, the system creates a Case entity in the Knowledge Graph and registers an event listener in the EB for the specified Events. When an event originating from a 3rd-party service is received from the EB, a lookup is performed over the rules specified in this Case to find a match. If there is one, the action in the rule is triggered: the Task Execution Engine looks up the required information in the KG and performs the API call. The performed Task could in turn trigger another event being sent from the service; if another rule is set to capture this event, the same process continues until no further rule is triggered. The Task Execution Engine implements a Service Bus [12]. When a Task is sent from the Rule Engine to execute, it looks up the required parameters and

configurations in the Knowledge Graph and, with the authorized tokens, prepares and sends the request to the actual API endpoint.

Figure 4.9: Interaction between the Case Walls system, Knowledge Graph and Event Bus
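The overall rule cycle can be sketched as follows in Go: rules generated from a case definition map events to actions, and each event delivered by the Event Bus fires the matching rules, which in turn call upon the Task Execution Engine. All names are illustrative; the actual implementation reuses a rule engine rather than this hand-rolled matcher.

package main

import "fmt"

// Rule is an event-action case orchestration rule, R : (Event -> Action).
type Rule struct {
	Event  string // e.g. "PullRequest"
	Action string // e.g. "DeliverStory"
}

// Engine holds the rules of a deployed Case and a callback standing in
// for the Task Execution Engine.
type Engine struct {
	rules   []Rule
	execute func(action string)
}

// OnEvent looks up the rules for an incoming event and triggers each match.
func (e *Engine) OnEvent(event string) {
	for _, r := range e.rules {
		if r.Event == event {
			e.execute(r.Action)
		}
	}
}

func main() {
	// Rules generated from the CodeReview case (cf. Figure 4.8).
	engine := &Engine{
		rules: []Rule{
			{"PullRequest", "DeliverStory"},
			{"PullRequest", "CreateStory"},
			{"ReviewFinished", "MergePR"},
		},
		execute: func(a string) { fmt.Println("invoking action:", a) },
	}
	engine.OnEvent("PullRequest") // fires DeliverStory and CreateStory
}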

4.4 Evaluation

Evaluation Objectives. To evaluate overall effectiveness, we assessed the following hypotheses: CaseWalls is capable of (a) improving the productivity to model, reuse and execute customized service-oriented processes; and (b) increasing the efficiency of application maintainability for agile service integration.

Experimental Setup. To assess validity, the experiment was conducted by implementing a real-life use-case scenario. Analysis was then conducted via comparison to other approaches (incl. Javascript, Java, BPEL, Yahoo! Pipes). We divided our scenario into 2 phases, where the latter phase added onto the former, thus assessing ease of maintainability. Overall productivity was then measured as: (a) time taken to complete the task; (b) total number of lines of code (LOC) excluding whitespace; and (c) number of extra dependencies needed.

Use-Case Scenario (Code Review & Development Cycle). Version Control Systems (VCS) are very common in software engineering: they help avoid collisions and improve traceability. While it is important to find where a bug was introduced and revert it, peer review also helps to surface such bugs earlier. Github is one of the most popular online repositories for open-source code. Likewise, Pivotal Tracker (PT) offers a good story-tracking system to help a team keep track of its progress. Phase 1 of this scenario involves integrating these two tools in the basic workflow described below:

1/ Project Manager PM creates a Story and assigns it to an Engineer

2/ Engineer starts working on the Story

3/ Engineer completes the programming task and pushes the code onto Github

4/ Engineer finishes and delivers the Story

5/ PM accepts/rejects the delivery

Effectively, Github + PT integration may be implemented by parsing commit messages for syntax in the form of “#(number)”, such as: [Starts #12345, #23456] ... [Finishes #12345] ... [Delivers #12345]. If any such messages are detected, the corresponding action is performed in PT. For example, if an engineer's commit message contains [Finishes #12345], then when Github receives this commit, the corresponding story is automatically finished in PT. This helps simplify the workflow by eliminating the otherwise manual work done within PT. While this basic integration provides an initial improvement by eliminating the manual creation, start, finish and delivery of a PT “story”, Phase 2 involves adapting it to “continuous integration (CI)”. The notion of CI, as prominent in software engineering today, calls for “continuous” testing whenever new changes are made. This significantly alters the semantics of the deliver action: at Step 4, we may want to introduce additional (and iterative) stories (cf. Steps 2-4) for testing and deployment before closing the change.
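To illustrate, the sketch below shows one plausible way such commit-message markers could be parsed; the regular expression, class and method names are our own assumptions for exposition, not PT's or Github's actual implementation.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    /** Hypothetical parser for PT-style markers such as "[Finishes #12345]". */
    public class CommitMessageParser {

        // Matches an action keyword followed by one or more story IDs,
        // e.g. "[Starts #12345, #23456]" or "[Delivers #12345]".
        private static final Pattern MARKER =
                Pattern.compile("\\[(Starts|Finishes|Delivers)((?:\\s*#\\d+,?)+)\\]");

        record StoryAction(String action, String storyId) {}

        public static List<StoryAction> parse(String commitMessage) {
            List<StoryAction> actions = new ArrayList<>();
            Matcher m = MARKER.matcher(commitMessage);
            while (m.find()) {
                // Split the ID list and strip the '#' decorations.
                for (String id : m.group(2).trim().split("\\s*,\\s*")) {
                    actions.add(new StoryAction(m.group(1), id.replace("#", "")));
                }
            }
            return actions;
        }

        public static void main(String[] args) {
            // Prints Starts 12345, Starts 23456 and Finishes 12345 (one per line).
            parse("[Starts #12345, #23456] fix auth bug [Finishes #12345]")
                    .forEach(a -> System.out.println(a.action() + " " + a.storyId()));
        }
    }

Each recovered StoryAction would then be mapped to the corresponding PT API call (e.g. finishing story 12345).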

Experimental Results. We set out to prove (or disprove) our hypotheses; the results are illustrated in Figure 4.10. At the outset, BPEL was excluded, as realtime events are not available without writing a custom extension to the engine. The same applied to Yahoo! Pipes, as it could not receive real-time events via webhooks16. Hypothesis H(a) was evaluated as the time to complete both tasks (excluding setup time) and LOC. Using CaseWalls, this took only 30mins, compared to an average of 245mins using the other approaches (a decrease of ∼ 88%); LOC was 53 compared to an average of 376 (a decrease of ∼ 86%). For CaseWalls, setup included the time to create all the Operations/Events/Tasks in the KG; whereas for Javascript/Java this meant downloading and installing the requisite SDKs (where applicable). CaseWalls also resulted in less overall setup time. H(b) was then measured as the cost of implementing Phase 2 (including any setup time).

16 A possible solution could use feeds (RSS/Atom) for receiving the events, and a web-query framework such as YQL to make requests. However, doing so is less interactive, less efficient, and also requires writing fairly complex Javascript.

CaseWalls resulted in only 25mins to implement, compared to an average of 220mins (a decrease of ∼ 89%), with 30 and 93 LOC respectively (a decrease of ∼ 67%).

[Figure 4.10 plots, for Phase 1 and Phase 2, the time to complete (min) and the lines of code (LOC), each split into setup and implementation, for Case Walls, Javascript and Java.]

Figure 4.10: Evaluation Results for the Code Review & Development Cycle use-case

Overall, it was clear that both time and LOC are significantly reduced when using CaseWalls. We thus validate both our hypotheses, with very promising results. Moreover, our approach did not require any additional libraries, whereas the others required on average at least 2-3. CaseWalls also provided increased transparency, as well as agile participant control, compared to the other solutions, which were rather rigid. In light of these results, this evaluation study successfully demonstrates the anticipated benefit of our proposed approach.

4.5 Related Work

While advances in APIs, SOA and BPM have enabled tremendous automation opportunities, new productivity challenges have also emerged. To address these challenges, we proposed a framework to simplify the integration of disparate services and effectively build customized processes. In this section, we briefly discuss the related work in three main areas: Web-Services Repositories, Process Management, and Case-Management.

Process Management. Business processes are central to the operation of both public and private organizations. The business world is becoming increasingly dynamic, as various technologies such as the Internet have made dynamic processes more prevalent. In this context, processes are less well-defined and more ad-hoc: in the real world, true end-to-end processes are rarely thoroughly well-defined and are often presented as a mixture of structured/unstructured and knowledge-intensive processes. Software-as-a-Service and other Web technologies have been introduced to provide good support for social interaction, transparency and productivity, and to facilitate the modeling of the control flow between activities: modern processes have a flexible underlying process definition where the control flow between activities cannot be modeled in advance but simply emerges at run time.

Case-Management. Case Management, also known as case handling, is a common approach to support knowledge-intensive processes [60, 4, 3]. When a customer initiates a request for some service, the set of interactions among people (e.g. the customer and other relevant participants) and artifacts, from initiation to completion, is known as the ‘case’. While well conceptualized, its actual implementation still remains vague and depends on its context of use [25]. Production Case Management (PCM) [75, 72] is a paradigm to organize and structure the daily work within organizations, defining execution alternatives at design-time. A PCM framework was proposed [72] to model the business process. The concept of fragments was introduced as smaller units of process logic, providing flexibility for different cases with data dependencies and state modeling. Derived from BPMN, it provides a more user-friendly framework compared to scenario-based modeling with Petri nets [38]. ProcessBase [16] introduced the concept of “Hybrid-Process” together with a “domain-specific model”, addressing the technological gap in process support, and provided an implementation of general use for building Hybrid-Processes.

4.6 Conclusion

In this chapter, we proposed a declarative process customization and deployment platform, namely Case Walls. Case Walls combines various process paradigms, from structured to ad-hoc style processes, particularly supported by the Knowledge Graph. We do this whilst also superimposing the prescribed requirements that Case-Management has offered. Our work builds upon previous work including ProcessBase [16], and is also inspired by efforts in general knowledge graphs such as linked data. Case Walls utilizes the underlying model, albeit focusing on a unique knowledge-driven graph of key entities and relationships. As Freebase did for encyclopedic information, the knowledge-base in Case Walls is targeted at provisioning agile service integration capabilities. We have thus proposed a novel API Knowledge Graph (AKG) to facilitate the organization, integration, querying and reuse of case-management knowledge. Moreover, the ability for case-workers to “re-use” process knowledge is a significant advance over most existing process platforms.

Empowered by this knowledge graph, we also provided a novel case customization and deployment platform, which we have implemented and exposed as a RESTful service. To further increase user efficiency, we introduced a simple, declarative yet powerful language to query and analyze the knowledge graph. Unlike existing and previous works, Case Walls also focuses on the transparency aspect of mid-process knowledge. The concept of “walls” thus acts as an activity wall akin to social status updates, albeit with updates sourced as relevant events from case tasks. Participants are then empowered to track the case execution, and react/interact accordingly. Experimental results are promising, particularly along the dimension of increased user-efficiency. In future, we plan to enhance and extend the language's power and expressivity, as well as implement a novel graphical user-interface that mimics social-networking platforms, whereby case-based process functionality can effectively be combined with everyday tasks. We are therefore very optimistic that this work provides the foundation for future growth into a new breed of enhanced process-support.

Chapter 5

Conclusion and Future Work

In this chapter, we summarize the contributions of this dissertation and discuss some future research directions to build on this work.

5.1 Concluding Remarks

The business world is becoming increasingly dynamic, as various technologies such as social media and Web 2.0 have made dynamic processes more prevalent. For example, email communication about a process, instant messaging to get a response to a process-related question, allowing business users to generate processes, and allowing front-line workers to update process knowledge (using new technologies such as process wikis) [18] make the use of complex, dynamic and often knowledge-intensive activities an inevitable task [19]. Such ad-hoc processes have a flexible underlying process definition where the control flow between activities cannot be modeled in advance but simply emerges at run time [20]. In such cases, the process execution path can change in a dynamic and ad-hoc manner due to changing business requirements, dynamic customer needs, and people's growing skills. Examples of this are the processes in the areas of government, law enforcement, financial services, and telecommunications [17].

In this dissertation, we focused on the problem of building integrated applications using Application Programming Interfaces (APIs). We have discussed the difficulties of learning and utilizing APIs caused by the lack of a platform (to gather relevant resources) and of standard API descriptions. We have also discussed the challenges in building composite applications with API-based integration techniques. To address these challenges, we have proposed a novel process customization and deployment platform to assist knowledge workers. Below, we summarize the most significant contributions of this dissertation:

• We have proposed to use a Knowledge Graph where service-interaction-related logic can be shared and re-used by developers. The knowledge graph is a novel foundation introduced to accumulate the currently dispersed API and case knowledge in a structured framework.

• We have proposed a Resource model to abstract the interactions among APIs in the Knowledge Graph.

• We have presented an implementation of the knowledge graph, APIBase, supporting both text and graph-based queries, exposed as a REST service.

• Inspired by previous work on ProcessBase [16], we proposed a declarative language for composing integrated processes over APIs, enabling professional process developers to incrementally create modular collections of tasks.

• Supported by a knowledge-based event-bus for cases, we presented an event/activity “wall” to inform case-workers about task progress, together with a simple and declarative language that enables such participants to uniformly and collectively react, interact and collaborate on relevant cases.

5.2 Future Directions

In this dissertation, we have investigated the problem of building integrated applications using Application Programming Interfaces (APIs). We believe that this is an important research area, which will attract a lot of attention in the research community. In the following, we summarize the future work directions for this dissertation:

• We plan to extend the use of the Knowledge Graph by investigating the indexing of APIs' features. By describing APIs and their Operations using features, developers can search for relevant resources more efficiently. For example, suppose a developer is building an application that requires file-uploading and cloud-storage functionality. If searching by features is enabled, then instead of searching for keywords like “Dropbox” or “Google Drive”, she can search for the “file uploading” and “cloud storage” features and find all the APIs matching the search criteria (a sketch of such a lookup is given after this list).

• We plan to incorporate Natural Language Processing (NLP) techniques to enhance the experience of searching for APIs. While the features of APIs are typically strictly defined and have domain-specific keywords that make them easy to find for domain experts, the APIs can be difficult to discover for developers with less experience in the domain, or for students trying to learn APIs. With the assistance of NLP techniques, we could build a friendlier environment for educational purposes.
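As a concrete illustration of the first direction, the sketch below shows what such a feature-based lookup could look like; since feature indexing is future work, the in-memory index, its contents and all names are hypothetical.

    import java.util.List;
    import java.util.Map;

    /** Hypothetical feature index over the Knowledge Graph (future work). */
    public class FeatureIndex {

        // Maps a feature tag to the APIs assumed to expose it; in the envisaged
        // system this mapping would be materialized from the Knowledge Graph.
        private final Map<String, List<String>> index = Map.of(
                "file uploading", List.of("Dropbox", "Google Drive"),
                "cloud storage", List.of("Dropbox", "Google Drive", "Box"));

        /** Returns the APIs offering all of the requested features. */
        public List<String> findApis(List<String> features) {
            return index.getOrDefault(features.get(0), List.of()).stream()
                    .filter(api -> features.stream()
                            .allMatch(f -> index.getOrDefault(f, List.of()).contains(api)))
                    .toList();
        }

        public static void main(String[] args) {
            // Prints [Dropbox, Google Drive]: both features are required.
            System.out.println(new FeatureIndex()
                    .findApis(List.of("file uploading", "cloud storage")));
        }
    }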

Bibliography

[1] Business rules community, http://www.brcommunity.com

[2] Codebrag http://codebrag.com/index.html

[3] Van der Aalst, W.M., Weske, M., Grünbauer, D.: Case handling: a new paradigm for business process support. Data & Knowledge Engineering 53(2), 129–162 (2005)

[4] Swenson, K.D., et al.: Taming the Unpredictable: Real World Adaptive Case Management: Case Studies and Practical Guidance. Future Strategies Inc (2011)

[5] Alonso, G., Casati, F., Kuno, H., Machiraju, V.: Web Services: Concepts, Architectures and Applications. Springer Publishing Company, Incorporated (2010)

[6] Amundsen, M.: Collection+ json (2013)

[7] Apiary: Apiary, https://apiary.io

[8] Asana: Human task management, https://asana.com/guide/

[9] Atlassian: Confluence, https://confluence.atlassian.com

[10] Báez, M., Parra, C., Casati, F., Marchese, M., Daniel, F., di Meo, K., Zobele, S., Menapace, C., Valeri, B.: Gelee: Cooperative lifecycle management for (composite) artifacts. In: Service-Oriented Computing, pp. 645–646. Springer (2009)

[11] Banker, K.: MongoDB in action. Manning Publications Co. (2011)

[12] Barukh, M., Benatallah, B.: ServiceBase: A Programming Knowledge-Base for Service Oriented Development. Database Systems for Advanced Applications pp. 123–138 (2013), http://link.springer.com/chapter/10.1007/978-3-642-37450-0_9

[13] Barukh, M.C., Benatallah, B.: Servicebase: A programming knowledge-base for service oriented development. In: Database Systems for Advanced Applications. pp. 123–138. Springer (2013)

[14] Barukh, M.C., Benatallah, B.: A toolkit for simplified web-services programming. In: Web Information Systems Engineering–WISE 2013, pp. 515–518. Springer (2013)

[15] Barukh, M.C., Benatallah, B.: Processbase: A hybrid process management platform. In: Service-Oriented Computing, pp. 16–31. Springer (2014)

[16] Barukh, M.C., Benatallah, B.: ProcessBase: A Hybrid Process Management Platform. In: Franch, X., Ghose, A.K., Lewis, G.A., Bhiri, S. (eds.) Service-Oriented Computing - 12th International Conference, ICSOC 2014, Paris, France, November 3-6, 2014. Proceedings. Lecture Notes in Computer Science, vol. 8831, pp. 16–31. Springer (2014), http://dx.doi.org/10.1007/978-3-662-45391-9_2

[17] Beheshti, S.M.R.: Organizing, Querying, and Analyzing Ad-hoc Processes' Data. Ph.D. thesis, University of New South Wales, Sydney (2012)

[18] Beheshti, S.M.R., Benatallah, B., Motahari-Nezhad, H.: Scalable graph-based olap analytics over process execution data. Distributed and Parallel Databases pp. 1–45 (2015), http://dx.doi.org/10.1007/s10619-014-7171-9

[19] Beheshti, S., Benatallah, B., Nezhad, H.R.M.: Enabling the analysis of cross-cutting aspects in ad-hoc processes. In: Advanced Information Systems Engineering - 25th International Conference, CAiSE 2013, Valencia, Spain, June 17-21, 2013. Proceedings. pp. 51–67 (2013), http://dx.doi.org/10.1007/978-3-642-38709-8_4

[20] Beheshti, S., Benatallah, B., Nezhad, H.R.M., Sakr, S.: A query language for analyzing business processes execution. In: Business Process Management - 9th International Conference, BPM 2011, Clermont-Ferrand, France, August 30 - September 2, 2011. Proceedings. pp. 281–297 (2011), http://dx.doi.org/10.1007/978-3-642-23059-2_22

[21] Bhattacharya, K., Caswell, N.S., Kumaran, S., Nigam, A., Wu, F.Y.: Artifact-centered operational modeling: Lessons from customer engagements. IBM Systems Journal 46(4), 703–721 (2007)

[22] Blueprint, A.: Api blueprint, http://apiblueprint.org/

[23] Böhm, A., Kanne, C.C., Moerkotte, G.: Demaq: A foundation for declarative xml message processing. arXiv preprint cs/0612114 (2006)

[24] Böhm, A., Marth, E., Kanne, C.C.: The demaq system: declarative development of distributed applications. In: Proceedings of the 2008 ACM SIGMOD international conference on Management of data. pp. 1311–1314. ACM (2008)

[25] Böhringer, M.: Emergent case management for ad-hoc processes: A solution based on microblogging and activity streams. In: Business Process Management Workshops. pp. 384–395. Springer (2011)

[26] BonitaSoft: Social bpm: Bonitasoft, http://documentation.bonitasoft.com/5x/social-bpm

[27] BPML, B.: Business process modeling language 1.0. Business Process Management Initiative (2001)

[28] Bry, F., Eckert, M., Patrânjan, P.L.: Xchange: A reactive, rule-based language for the web (2006)

[29] byteMyCode: bytemycode, http://www.bytemycode.com/

[30] Cantara, M.: User survey analysis: SOA, web services and web 2.0 user adoption trends and recommendations for software vendors, North America and Europe, 2005-2006 (January 2007), http://www.gartner.com

[31] Castellanos, M., Alves de Medeiros, K., Mendling, J., Weber, B., Weijters, A.: Business process intelligence. Handbook of Research on Business Process Modeling pp. 456–480 (2009)

[32] Christensen, E., Curbera, F., Meredith, G., Weerawarana, S., et al.: Web services description language (wsdl) 1.1 (2001)

[33] Cohn, D., Hull, R.: Business artifacts: A data-centric approach to modeling business operations and processes. IEEE Data Eng. Bull. 32(3), 3–9 (2009)

[34] Cypher, A., Dontcheva, M., Lau, T., Nichols, J.: No code required: giving users tools to transform the web. Morgan Kaufmann (2010)

[35] Duck, B.: Open hub, http://code.openhub.net/

[36] Experts Exchange: Experts Exchange - the network for technology professionals, http://www.experts-exchange.com/

[37] Stack Exchange: Stack Exchange, http://cs.stackexchange.com

[38] Fahland, D., Prüfer, R.: Data and abstraction for scenario-based modeling with Petri nets. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 7347 LNCS, 168–187 (2012)

[39] Fischer, L.: Social BPM. Future Strategies Inc. (May 2011)

[40] Fitzpatrick, B., Slatkin, B., Atkins, M.: Pubsubhubbub core 0.3 (2011)

[41] Fitzpatrick, B., Slatkin, B., Atkins, M.: Pubsubhubbub core 0.3– working draft. Project Hosting on Google Code (2010)

[42] Frakes, W.B., Kang, K.: Software reuse research: Status and future. IEEE transactions on Software Engineering (7), 529–536 (2005)

[43] Fraternali, P., Brambilla, M., Vaca, C.: A model-driven approach to social BPM applications. Social BPM. Future Strategies Inc. (May 2011)

[44] Furuhashi, S.: Messagepack (2014)

[45] Geambasu, R., Cheung, C., Moshchuk, A., Gribble, S.D., Levy, H.M.: Organizing and sharing distributed personal web-service data. In: Proceedings of the 17th international conference on World Wide Web. pp. 755–764. ACM (2008)

[46] Giza, M.: SOA trends: From microservices to appdev, what to expect in 2015, http://searchsoa.techtarget.com/feature/SOA-trends-From-microservices-to-appdev-what-to-expect-in-2015

[47] Google: Google code, http://code.google.com

[48] Greenberg, S., Voida, S., Stehr, N., Tee, K.: Artifacts as instant messaging buddies. In: System Sciences (HICSS), 2010 43rd Hawaii International Conference on. pp. 1–10. IEEE (2010)

[49] Group, R.: Raml - restful api modeling language, http://raml.org

[50] Hadley, M.J.: Web application description language (wadl) (2006)

[51] Halevy, A.Y., Ashish, N., Bitton, D., Carey, M., Draper, D., Pollock, J., Rosenthal, A., Sikka, V.: Enterprise information integration: successes, challenges and controversies. In: Proceedings of the 2005 ACM SIGMOD international conference on Management of data. pp. 778–787. ACM (2005)

[52] Hamalainen, H.: Html5: Websockets

[53] Hintjens, P.: ZeroMQ: Messaging for Many Applications. O'Reilly Media, Inc. (2013)

[54] Holz, H., Rostanin, O., Dengel, A., Suzuki, T., Maeda, K., Kanasaki, K.: Task-based process know-how reuse and proactive information delivery in tasknavigator. In: Proceedings of the 15th ACM international conference on Information and knowledge management. pp. 522–531. ACM (2006)

[55] IBM: Ibm blueworks, http://www.blueworkslive.com

[56] IFTTT: Ifttt - put the internet to work for you, https://ifttt.com

[57] QualityLogic Inc.: Software quality services and test tools from QualityLogic (2001), https://www.qualitylogic.com/

[58] JBoss: jbpm (process/workflow), http://www.jboss.org/jbpm

[59] JBossDrools: Drools expert user guide, version 5.5, http://docs.jboss.org/drools/release/5.5.0.Final/drools-expert-docs/html_single/index.html

[60] Kaan, K., Reijers, H.A., van der Molen, P.: Introducing Case Management: Opening Workflow Management's Black Box. In: Dustdar, S., Fiadeiro, J.L., Sheth, A. (eds.) BPM 2006, LNCS 4102, pp. 358–367 (2006)

[61] Kelly, M.: Json hypertext application language (2013)

[62] Kim, D.J., Agrawal, M., Jayaraman, B., Rao, H.R.: A comparison of b2b e-service solutions. Communications of the ACM 46(12), 317–324 (2003)

[63] Klemisch, K., Weber, I., Benatallah, B.: Context-aware UI Component Reuse. In: CAiSE'13: 25th International Conference on Advanced Information Systems Engineering. pp. 68–83 (2013)

[64] Krugle: Aragon Consulting Group, http://www.krugle.com/

[65] LiquidPlanner: Liquidplanner - online project management software, http://www.liquidplanner.com/

[66] Marmel, E.: Microsoft Project 2007 Bible, vol. 767. John Wiley & Sons (2011)

[67] Mashery: Mashery: API management and API strategy services, https://mashery.com

[68] McLarty, M.: A business perspective on APIs, http://www.infoq.com/articles/web-apis-business-perspective

[69] McMillan, C.: Searching, selecting, and synthesizing source code. In: Proceedings of the 33rd International Conference on Software Engineering. pp. 1124–1125. ACM (2011)

[70] McMillan, C., Grechanik, M., Poshyvanyk, D., Fu, C., Xie, Q.: Exemplar: A source code search engine for finding highly relevant applications. Software Engineering, IEEE Transactions on 38(5), 1069–1087 (2012)

[71] Medjahed, B., Benatallah, B., Bouguettaya, A., Ngu, A.H., Elmagarmid, A.K.: Business-to-business interactions: issues and enabling technologies. The VLDB Journal—The International Journal on Very Large Data Bases 12(1), 59–85 (2003)

[72] Meyer, A., Herzberg, N., Weske, M., Puhlmann, F.: Implementation Framework for Production Case Management: Modeling and Execution. http://www.frpu.de/pdf/edoc2014.pdf

[73] Morin, D.: Announcing facebook connect. online]. Facebook, May 9 (2008)

[74] Motahari-Nezhad, H.R., Swenson, K.D.: Adaptive case management: Overview and research challenges. In: Business Informatics (CBI), 2013 IEEE 15th Conference on. pp. 264–269. IEEE (2013)

[75] Motahari-Nezhad, H.R., Swenson, K.D.: Adaptive Case Management: Overview and Research Challenges. 2013 IEEE 15th Conference on Business Informatics pp. 264–269 (2013), http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=6642886

[76] Nigam, A., Caswell, N.S.: Business artifacts: An approach to opera- tional specification. IBM Systems Journal 42(3), 428–445 (2003)

[77] Nolan, D., Lang, D.T.: Authentication for web services via oauth. In: XML and Web Technologies for Data Sciences with R, pp. 441–461. Springer (2014)

[78] OASIS: Web services business process execution language version 2.0, http://docs.oasis-open.org/wsbpel/2.0/wsbpel-v2.0.pdf

[79] OAuth: The oauth authorization framework, http://oauth.net/

[80] ObjectManagementGroup(OMG): Business process model and notation, http://www.bpmn.org/

[81] ObjectManagementGroup(OMG): Semantics of business vocabulary and business rules (sbvr), http://www.omg.org/spec/SBVR/1.2

[82] Olding, E.: Social bpm: Getting to doing. Technical Report (2011)

[83] Stack Overflow: Stack Overflow, http://stackoverflow.com

[84] Stack Overflow: What is reputation? How do I earn it and lose it?, https://stackoverflow.com/help/whats-reputation

[85] Pautasso, C., Zimmermann, O., Leymann, F.: Restful web services vs. big web services: making the right architectural decision. In: Proceedings of the 17th international conference on World Wide Web. pp. 805–814. ACM (2008)

[86] Pivotal-Cloud-Foundry: http://pivotal.io/platform-as-a-service/pivotal-cloud-foundry

[87] Recordon, D., Fitzpatrick, B.: Openid authentication 1.1. Finalized OpenID Specification, May (2006)

[88] Reiss, S.P.: Semantics-based code search. In: Software Engineering, 2009. ICSE 2009. IEEE 31st International Conference on. pp. 243–253. IEEE (2009)

[89] Reverb-Technologies-Inc.: Swagger - a simple, open standard for describing REST APIs with JSON, https://helloreverb.com/developers/swagger

[90] Richardson, C.: Microservices: Decomposing applications for deployability and scalability, www.infoq.com/articles/microservices-intro

[91] RuleCore: Complex event processing server, http://www.rulecore.com/

[92] Runscope: Api testing and monitoring, https://www.runscope.com/

[93] Salesforce: Chatter, https://www.salesforce.com/au/chatter/overview/

[94] Schema, J.: Json schema and hyper-schema (2013)

[95] SoapUI: http://www.soapui.org/

[96] SocialText-Inc.: Socialtext, http://www.socialtext.com

[97] Software, J.: Producteev - task management for teams, https://www.producteev.com/

[98] Sondow, J.: Asgard: web-based cloud management and deployment. The Netflix Tech Blog (2012)

[99] StamPlay: Connect. Automate. Invent., https://stamplay.com

[100] Swiber, K.: Siren: a hypermedia specification for representing entities (2013), https://github.com/kevinswiber/siren

[101] TechCrunch: APIs fuel the software that's eating the world, http://techcrunch.com/2015/05/06/apis-fuel-the-software-thats-eating-the-world/

[102] TechCrunch: “Facebook and the sudden wake up about the API economy”, http://techcrunch.com

[103] Thatte, S.: Xlang: Web services for business process design. Microsoft Corporation 2001 (2001)

[104] Tracky: Tracky - a social collaboration tool, https://tracky.com/

[105] Van Der Aalst, W.M., Ter Hofstede, A.H.: Yawl: yet another workflow language. Information systems 30(4), 245–275 (2005)

[106] Van Der Aalst, W.M., Ter Hofstede, A.H., Weske, M.: Business process management: A survey. In: Business process management, pp. 1–12. Springer (2003)

[107] Videla, A., Williams, J.J.: RabbitMQ in action. Manning (2012)

[108] Weber, I., Benatallah, B., Paik, H.Y.: Form-based Web Service Composition for Domain Experts. V(212), 30 (2010), http://arxiv.org/abs/1005.3014

[109] WorkflowManagementCoalition(WfMC): XML process definition language (XPDL), http://www.xpdl.org/

[110] Zabel, I.: Level up your development workflow with GitHub & Pivotal Tracker (2012), http://pivotallabs.com/level-up-your-development-workflow-with-github-pivotal-tracker/

[111] Zelkowitz, M.V., Wallace, D.R.: Experimental models for validating technology. Computer 31(5), 23–31 (1998)

[112] Ziegler, P., Dittrich, K.R.: Three decades of data integration—all problems solved? In: Building the Information Society, pp. 3–12. Springer (2004)
