Discovering and Using Functions via Semantic Querying

Lander Noterman

Supervisor: Prof. dr. ir. Ruben Verborgh Counsellors: Ben De Meester, Anastasia Dimou

Master's dissertation submitted in order to obtain the academic degree of Master of Science in Computer Science Engineering

Department of Electronics and Information Systems Chair: Prof. dr. ir. Koen De Bosschere Faculty of Engineering and Architecture Academic year 2017-2018

Discovering and Using Functions via Semantic Querying

Lander Noterman

Supervisor: Prof. dr. ir. Ruben Verborgh Counsellors: Ben De Meester, Anastasia Dimou

Master's dissertation submitted in order to obtain the academic degree of Master of Science in Computer Science Engineering

Department of Electronics and Information Systems Chair: Prof. dr. ir. Koen De Bosschere Faculty of Engineering and Architecture Academic year 2017-2018 Preface

The author(s) gives (give) permission to make this master dissertation available for consultation and to copy parts of this master dissertation for personal use. In the case of any other use, the copyright terms have to be respected, in particular with regard to the obligation to state expressly the source when quoting results from this master dissertation. June 1, 2018

Word of thanks

Firstly, I would like to thank Ben De Meester and Anastasia Dimou for acting as counsellors for this work. Ben’s supervision especially was tremendously helpful during the process of writing this thesis. His knowledge about the subject helped me learn a lot about the and its technologies, and his feedback and advice was immensely valuable for the successful completion of this work. I would also like to thank Prof. dr. ir. Ruben Verborgh for being the supervisor for this dissertation. His lessons introduced me to the Semantic Web and its powerful capabilities. Finally, I would like to thank my friends, my family, and especially my girlfriend for supporting me through the course of my studies and during the making of this dissertation. Their continued support helped me to stay focussed and successfully complete even the more challenging parts of the past five years.

i Discovering and Using Functions via Semantic Querying

by Lander Noterman

Supervisor: Prof. dr. ir. Ruben Verborgh Counsellors: Ben De Meester, Anastasia Dimou

Master’s dissertation submitted in order to obtain the academic degree of Master of Science in Computer Science Engineering

Department of Electronics and Information Systems Chair: Prof. dr. ir. Koen De Bosschere Faculty of Engineering and Architecture Academic year 2017-2018

Abstract

On today’s web, functions are available in the form of code snippets, packages and Web APIs. How- ever, existing solutions for searching functions on the web lack the ability to search these functions by type signature, hence keyword search is used instead, which is imprecise. Package managers partially automate the process of acquiring functions, however, they do not automate the invocation of them. This work focusses on the problem of querying and automated usage of functions. Using Linked Data and Semantic Web technologies, we created the FunctionHub: a system for searching for functions on the web and invoking them using a uniform interface. In this system, functions and implementations are semantically described using RDF, hence they can be queried using SPARQL. Our evaluation demonstrates four improvements that result from linking semantic descriptions of functions to those of implementations: (i) more precise search abilities, like searching by type signa- ture, (ii) automated invocation of implementations, (iii) linking descriptions of functions and imple- mentations enables abstracted function processing, this system can be used for at-runtime discovery and invocation of functions and inference of knowledge from RDF data, (iv) redundancy of imple- mentations avoids a single point of failure. Finally, abstracted function processing brings us closer to a future where intelligent agents can not only understand the data on the Semantic Web, but act upon it using these functions. Keywords — Semantic Web, Linked Data, automated function processing, semantic querying

ii Discovering and Using Functions via Semantic Querying Lander Noterman

Supervisor: Prof. dr. ir. Ruben Verborgh Counsellors: Ben De Meester, Anastasia Dimou

Abstract—On today’s web, functions are available in the processable: available functions do not have a se- form of code snippets, packages and Web APIs. However, mantic description, hence, they cannot be discovered existing solutions for searching functions on the web lack and invoked automatically. (iv) Functions are stored precise methods of finding a desired function. Acquiring and provided from a centralized location, this might functions is partially automated by package managers, however, they do not automate the invocation of them. cause problems, as there is a single point of failure. This work focusses on the problem of querying and While solving these problems would improve automated usage of functions. Using Linked Data and searching for and using functions on the web for Semantic Web technologies, we created the FunctionHub: developers, machines could benefit even more from a system for searching for functions on the web and these improvements. On the Semantic Web, in- invoking them using a uniform interface. In this system, telligent agents are envisioned to be able to use functions and implementations are semantically described using RDF, hence they can be queried using SPARQL. the web to accomplish tasks [1]. Such tasks can Our evaluation demonstrates several improvements that be accomplished using functions, hence this works result from linking semantic descriptions of functions aims to enable machines to use them. to those of implementations, like more precise search abilities, abstracted function processing and redundancy of implementations. Finally, abstracted function processing II.RELATED WORK brings us closer to a future where intelligent agents can Apart from the main technologies of the Semantic not only understand the data on the Semantic Web, but act upon it using these functions. Web, some specific technologies are used in this work. Additionally, code search engines and pack- age managers offer functionality that is sought to be I.INTRODUCTION improved with this work. With the advent of the , reusing code in software projects became much easier. Developers can download libraries, find code A. Describing and instantiating NPM modules in GitHub1 repositories, download packages from 2 For finding and using NPM modules, this NPM etc. However, this approach cannot yet be work makes use of Linked Software Dependen- fully automated: finding code requires searching for cies (LS(D)). LS(D) presents two technologies: the it on the web and using libraries or packages from Object-Oriented Components ontology and Compo- package managers requires reading documentation. nents.js [2]. These technologies are used to describe We identified four main problems with the current and instantiate NPM modules respectively. situation of searching for and using functions on the web: (i) Searching for functions is mostly keyword- based, which is an imprecise way of finding func- B. Describing Web APIs tions. (ii) Using functions is environment specific: Hydra is an ontology to semantically describe functions should be implemented in the correct pro- REST web services. It describes the operations that gramming language. (iii) Functions are not machine can be executed using the web service in a semantic 1http://www.github.com way, hence automating the usage of the service 2http://www.npmjs.com becomes possible [3].

iii C. Describing abstract functions 1) Describing functions with Linked Data en- A number of ontologies enable describing func- ables search capabilities beyond keyword tions, however, these descriptions are closely related search. to their implementation (e.g., Object-Oriented Com- 2) Linking abstract function descriptions with ponents describes NPM modules, Hydra describes specific implementations enables the use of REST Web APIs). The Function Ontology (FnO) a uniform interface to invoke functions. is an ontology to describe abstract functions, the 3) Using Linked Data to describe functions, in- descriptions do not contain details regarding the telligent agents can make use of them to implementation, hence they can describe functions enable automated processing of information. in any implementation [4]. 4) Following the Linked Data principles enables distribution of storage and responsibility. The utility of these first three hypotheses can D. Code search engines be demonstrated using some specific use cases. A Code search engines allow users to search number of use cases are formulated to clarify the through source code. In this work, our solution hypotheses and enable evaluation of this work. is compared to searchcode.com and GitHub code search. Both of these tools offer a web interface for 1) A search engine for functions in which we searching through source code. cannot only search by keywords, but also by type signature. Additionally, this search engine not only enables finding code, but also E. Package managers web services implementing this functionality. Package managers are used to find and download 2) A package manager that allows us to find functionality in the form of software packages avail- functions using type signature and is not lim- able on the web. Examples of package managers are ited to one programming language. NPM, Maven and pip. Searching for these packages 3) A program that can easily switch the imple- can be done through a web interface and/or through mentation that is used to perform a function a CLI tool, however, only using keywords. More- at runtime, without knowing this function over, they do not offer functionality to automate the beforehand. More specifically, this application invocation of these packages. can switch between an on-device implemen- tation and a cloud implementation depending on which is the fastest in the current circum- III.HYPOTHESESANDUSECASES stances. We identified problems with the current situation 4) A program that infers knowledge from Linked of searching for and using functions on the web. Data documents. In this program, new knowl- Following hypothesis comprise the solution to these edge is derived from existing knowledge using problems: functions that are available on the web.

from the Function Ontology (FnO)

Function ... function parameterMapping ParameterMapping Mapping implementation Implementation

Position Property ParameterMapping ParameterMapping

JavaScript Java WebApi Implementation Implementation

Legend

subclass class JavaScript NpmPackage JavaClass JsonApi relation Function

Fig. 1: A visual representation of the FunctionHub ontology.

iv IV. APPROACH app. The server has a connection to an RDF To solve the problems and implement the use , which holds the function and implemen- cases discussed before, we designed the Function- tation descriptions. Using a server instead of letting Hub. The FunctionHub is used to find and use clients communicate with the RDF database di- functions through the use of semantic descriptions. rectly allows us to optimize the process of querying Functions (abstract entities describing what opera- database (e.g., by caching responses). It also avoids tion is performed on some type of data) and im- implementing functionality (like querying) in multi- plementations (a concrete realisation of a function, ple programming languages, as well as allowing us i.e. how a function is performed) are described to change underlying technologies without impact- semantically using Linked Data using the Function- ing the client libraries. The client libraries provides Hub ontology. Having these descriptions, SPARQL functionality to simplify a client’s interaction with can be used to query them to find functions and the server. The web app provides a search engine implementations matching certain specifications like for the available functions and implementations and type signature. Making these descriptions available a way to add them to the system. The clients and through JSON-LD allows applications to use these the web app can query the server for functions and descriptions to instantiate and execute the functions implementations using a query object. This query that are described. object specifies the constraints that a function needs to adhere to to be considered as a possible response to the query. For example, the query object might V. ONTOLOGY specify to return all functions with type signature We have previously discussed existing ontologies (float, float -> float). After the client that allow us to describe both functions (FnO) and or web app specifies this query, it is sent to the certain types of implementations (Object-Oriented server, which converts it into a SPARQL query. Components ontology, Hydra). These ontologies are This SPARQL query is used to query the RDF used to create the FunctionHub ontology (Figure 1), database for matching functions. From the results which enables linking the abstract descriptions of that this RDF database provides, the server gen- FnO to specific implementation descriptions like erates a JSON-LD document containing the de- those of the Object-Oriented Components ontology scriptions of matching functions and corresponding and Hydra. This is done through a Mapping class, implementations, together with the Mapping nodes which specifies how the parameters of the abstract connecting the two. This document is returned to function map onto the the parameters of the specific the requesting client. The web app requests this implementation. document to provide search results for users using the FunctionHub search engine. On the other hand, a VI.ARCHITECTURE client application would receive this document to be The FunctionHub system consists of three main able to instantiate the function described in it. The parts: the server, the client libraries, and the web client library facilitates this process by providing

if software impl.

Implementations

10. send implementation

1. build query 9. request implementation multiple times RDF Database

2. send query 3. request function 7. return descriptions Client 4. query by SPARQL Client Function Hub Server 8. instantiate implementation Library 6. return function + Function Descriptions 11. return instatiated function implementation descriptions

12. invoke implementation SPARQL endpoint 5. return partial function + implementation descriptions

Fig. 2: A schematic representation of the general architecture of the FunctionHub system.

v functions to, in the case of a code implementation, // Create a query for finding a "hello world" , function. → retrieve the implementation from the remote loca- Query q= new Query("hello world"); tion specified in the description and instantiate it // Retrieve the first matching function from the , server. (by e.g., compiling and loading the implementation). → If the implementation is a web service, the client Function fnc= fnServer.query(q)[0]; library acts as a proxy that accesses the web service // Get the first available mapping and instantiate , its implementation. → when the function is invoked. In this way, both code FuncInst inst= ImplHandler.instantiate(fnc, , fnc.mappings[0]); implementations and web services can be executed → using the same uniform interface. A visual represen- // Execute the function and print the result. tation of this architecture is provided in Figure 2. System.out.println(inst.executeFunction()); /* OUTPUT: Hello world! VII.EVALUATION */ Using a prototype implementation of the sys- Listing 1: Simplified code example demonstrating abstracted function tem described above, the derived hypotheses are processing. evaluated. This evaluation is objective-based: we proposed objectives in the form of hypotheses and now demonstrate if and how these objectives were Listing 1. As can be seen, no information about achieved [5]. Evaluation consists of implementing the implementation is specified. The implementation the proposed use cases using the FunctionHub producing the output could be of any type supported system for the first three hypotheses. The fourth by the Java library (currently: Java method or Web hypothesis is evaluated by demonstrating that the API), since the query does not specify an imple- distributed nature of this system avoids the single mentation type. point of failure problem. For the second hypothesis, we proposed a use case for a program that can dynamically switch im- A. Improved function search plementations based on the current circumstances. The first hypothesis states that we can make use Using the FunctionHub system, such a program of Linked Data to improve searching for functions was implemented as an experiment to determine on the web by enabling searching by type signature. if this could result in speed gains compared to Using the facilities provided by the FunctionHub, programs that do not dynamically switch imple- we created a search engine that improves upon mentations. The experiment consists of comparing existing search engines searchcode.com and GitHub the average execution times of a function that is Code Search by enabling searching on type signa- invoked multiple times, this function is implemented tures as well as keywords. In the same manner, im- as a web service as well as as a Java method. provements over package managers NPM, pip and The Java method is slower by a factor of 10. Maven are made. A proof-of-concept of a package This scenario is realistic for functions that require manager was created that provides a CLI interface powerful cloud infrastructure to perform optimally, for finding functions that adhere to the specified e.g. accurate image recognition or OCR (Optical type signature and contain specified keywords in Character Recognition). In ideal circumstances, the their name or description. Additionally, this package web service would thus offer the fastest execution manager provides functionality to download imple- times. 
However, circumstances are not always ideal: mentations to disk in order to be used in software the cloud infrastructure could be under high load, projects. or the user’s network could be slow or unreliable. In our experiment, this situation is simulated by an increasing execution time for the web service B. Abstracted function processing implementation, while the execution time of the The FunctionHub system enables abstracted func- on-device Java implementation remains constant. tion processing, i.e. functions from the web can be The execution times necessary for these situations invoked without having knowledge of the specifics are recorded, as well as the execution times of of the implementation. This is demonstrated in our program, which dynamically switches between

vi these implementations depending on the execution web, and multiple implementations can be assigned speed of the web service. Figure 3 visualizes the to one function. Because of this, redundancy in implementations can be achieved: when applica- 14 tions depend on a FunctionHub function, multi-

12 ple implementations can be used to execute this function. This can be useful in the case that an 10 implementation becomes unavailable, e.g. when a

8 web service goes down or when a package breaks due to changes. In this case, another implementation 6 that is assigned to the same function can be used 4 Execution(seconds)time instead, hence there is no disruption for the devel-

2 oper. Existing systems lack this functionality, hence a breaking implementation can disrupt the workflow 0 0 1 2 3 4 5 6 7 8 9 10 11 12 of other developers that use this implementation in Web Service latency (seconds) their projects. An example of such a case is when a

Java Method Web Service Hybrid developer unpublished some popular packages from NPM, resulting in a large amount of breaking pack- Fig. 3: Comparison of execution times between a Java implementa- ages [6], [7]. Implementation redundancy avoids tion, an increasingly slow web service, and a hybrid approach that uses the fastest implementation. such problems. results of this experiment. As can be seen, The web service is considerably faster than the Java VIII.CONCLUSION implementation if no slowdown is experienced by the web service. However, once the web service In this dissertation, we provided a solution for experiences a slowdown greater than about 5 sec- improving the current situation of discovering and onds, the Java implementation becomes the fastest. using functions on the web using Semantic Web At this point, our dynamically switching program technologies. Creating the FunctionHub server and switches its implementation from the web service client libraries, we reached the goal of abstracted to the Java method. On the figure, this is evident and generalized execution of functions. The Func- by the “Hybrid” line following the implementation tionHub achieved these results through the use with the lowest execution time. This approach thus of Linked Data and existing Semantic Web tech- results in the lowest execution time. nologies, such as SPARQL and JSON-LD. We demonstrated that abstracted function processing is C. Inferring knowledge from Linked Data made possible by using Semantic Web technologies. The third hypothesis can be accepted by cre- This system has potential to offer advantages to ating a program that allows inferring knowledge developers and end users, as well as the Semantic from Linked Data using the FunctionHub. This Web in general. Developers can make use of the program accepts a file containing several RDF system for reusing existing implementations. End nodes of the same type. If these nodes contain users might see speed improvements by dynamically missing values, the program is able to query the switching implementations depending on the current FunctionHub server for functions that are able to environment. Finally, abstracted function processing infer these missing values from the available val- brings us closer to a future where intelligent agents ues. A screencast demonstrating this functionality can not only understand the data on the Semantic can be found at http://users.ugent.be/~lnoterma/ Web, but act upon it using these functions. discovering-using-functions/inferencer.mp4.

D. Distribution and redundancy of implementations IX.FUTURE WORK In the FunctionHub system, implementations are not saved on the FunctionHub server itself. Im- A number of improvements can be made to make plementations can thus be from anywhere on the the current system more reliable.

vii a) Validation: The descriptions and imple- [10] M. Lefrançois and A. Zimmermann, “Supporting Arbitrary mentations are not checked for correctness when Custom Datatypes in RDF and SPARQL,” in The Semantic Web. Latest Advances and New Domains, ser. Lecture these are added to the system. Incorrect descrip- Notes in Computer Science. Springer, Cham, pp. 371–386. tions and implementations prevent the system from [Online]. Available: https://link.springer.com/chapter/10.1007/ working correctly, hence, validation should be done 978-3-319-34129-3_23 on these to make the system more reliable. A tech- nology that might be used to guarantee correctness of the descriptions is SHACL, a Semantic Web technology for validating graphs [8]. b) Distributed descriptions: Descriptions are accumulated in a single server, this might cause problems related to scalability and results in a single point of failure. In the future, these descriptions, like the implementations, might be distributed on the web. Technologies for federated queries, like DARQ [9], can be used to accomplish this task. c) Complex data types: The current system provides support for primitive datatypes like int, float and string. An improved system might support arbitrarily complex datatypes as inputs and outputs for the functions. This would enable more flexibility in the type of functions and implemen- tations that can be added and used in the system. Existing technology that could be used for this purpose is presented in [10].

REFERENCES [1] T. Berners-Lee, J. Hendler, and O. Lassila, “The semantic web,” vol. 284, no. 5, pp. 28–37. [2] R. Taelman, J. Van Herwegen, S. Capadisli, and R. Verborgh, “Reproducible software experiments through semantic configurations.” [Online]. Available: https://linkedsoftwaredependencies.org/articles/reproducibility/ [3] M. Lanthaler and C. Guetl, “Hydra: A Vocabulary for Hypermedia-Driven Web APIs,” vol. 996. [4] B. De Meester, A. Dimou, R. Verborgh, and E. Mannens, “An ontology to semantically declare and describe functions,” in International Semantic Web Conference. Springer, pp. 46–49. [Online]. Available: http://link.springer.com/chapter/10. 1007/978-3-319-47602-5_10 [5] D. Stufflebeam, “Evaluation Models,” vol. 2001, no. 89, pp. 7–98. [Online]. Available: https://onlinelibrary.wiley.com/doi/ abs/10.1002/ev.3 [6] Azer Koçulu. I’ve Just Liberated My Modules. [Online]. Avail- able: http://azer.bike/journal/i-ve-just-liberated-my-modules [7] Isaac Z. Schlueter. Kik, left-pad, and npm. [On- line]. Available: https://blog.npmjs.org/post/141577284765/ kik-left-pad-and-npm [8] Holger Knublauch and Dimitris Kontokostas, “Shapes Constraint Language (SHACL).” [Online]. Available: https://www.w3.org/TR/shacl/ [9] B. Quilitz and U. Leser, “Querying Distributed RDF Data Sources with SPARQL,” in The Semantic Web: Research and Applications, ser. Lecture Notes in Computer Science. Springer, Berlin, Heidelberg, pp. 524–538. [Online]. Available: https://link.springer.com/chapter/10.1007/ 978-3-540-68234-9_39

viii Contents

1 Introduction 1 1.1 Problem Statement ...... 2 1.2 Contributions ...... 2 1.2.1 The FunctionHub ontology ...... 3 1.2.2 FunctionHub server ...... 3 1.2.3 Client Libraries ...... 3 1.3 Outline ...... 3

2 Related works 4 2.1 Semantic Web ...... 4 2.2 Semantic Web and Web APIs ...... 6 2.3 Software projects and the Semantic Web ...... 7 2.3.1 Data transformation in RDF ...... 7 2.4 Code Sharing and Reuse ...... 8 2.5 Semantic code search ...... 9 2.6 Discussion ...... 10

3 Research Questions and Hypotheses 11

4 Approach 13 4.1 Using Semantic Web technologies ...... 13 4.1.1 A common vocabulary ...... 13 4.1.2 Highly descriptive queries ...... 13 4.1.3 Return values ...... 14 4.2 Building upon existing ontologies ...... 14 4.2.1 FnO ...... 14 4.2.2 JSON-LD ...... 14 4.2.3 Linked Software Dependencies ...... 15 4.3 Research Goals ...... 15

5 Use Cases 16 5.1 Function search engine ...... 16 5.2 Package Manager ...... 16 5.3 Provide a context-dependent solution ...... 17 5.4 Interactive Linked Data processing ...... 17

6 FunctionHub ontology 19 6.1 Comparison to existing technologies ...... 19

ix 6.2 Classes ...... 20 6.2.1 Function ...... 20 6.2.2 Implementation ...... 21 6.2.3 Mapping ...... 21 6.2.4 ParameterMapping ...... 22 6.2.5 Summary ...... 22 6.3 Example ...... 22

7 Architecture 25 7.1 General overview ...... 25 7.2 FunctionHub server ...... 26 7.2.1 Web API entrypoint ...... 27 7.2.2 FunctionHub processor ...... 27 7.2.3 SPARQL processor ...... 27 7.2.4 RDF database ...... 27 7.3 Client Libraries ...... 27 7.3.1 Client application ...... 27 7.3.2 Client library ...... 28

8 Implementation 29 8.1 Server ...... 29 8.1.1 Technologies ...... 29 8.1.2 Handling incoming queries ...... 30 8.1.3 Querying the RDF database ...... 31 8.1.4 Providing JSON-LD function descriptions ...... 31 8.1.5 Providing JSON-LD mappings and implementations ...... 31 8.1.6 Combining results and returning to client ...... 32 8.1.7 Additional functionality ...... 32 8.2 Client Libraries ...... 32 8.2.1 JavaScript/Node.js Library ...... 33 8.2.2 Java Library ...... 34 8.3 Web App ...... 35 8.3.1 Technologies ...... 35 8.3.2 Search Engine ...... 35 8.3.3 Adding functions and implementations ...... 36

9 Evaluation 39 9.1 Improving search with Linked Data ...... 39 9.1.1 Search Engine ...... 39 9.1.2 Package Manager ...... 41 9.1.3 Result ...... 42 9.2 Abstracted Function Processing ...... 43 9.2.1 Code example ...... 43 9.2.2 Context-dependent implementations ...... 43 9.2.3 Result ...... 46 9.3 Solving Linked Data problems ...... 46

x 9.3.1 Queries ...... 46 9.3.2 Linked Data inferencer ...... 47 9.3.3 Result ...... 48 9.4 Distribution of storage and responsibility ...... 48 9.4.1 Implementation redundancy ...... 48 9.4.2 Result ...... 51 9.5 Discussion ...... 51

10 Conclusion 52 10.1 Conclusion ...... 52 10.2 Future Work ...... 53 10.2.1 Versions ...... 53 10.2.2 Validation ...... 53 10.2.3 Complex data types ...... 53 10.2.4 Distributed descriptions ...... 54 10.2.5 Security ...... 54 10.2.6 Non-exact matches ...... 54

Bibliography 55

A FunctionHub Ontology Reference 58 A.1 Function, Parameter and Output ...... 58 A.2 Problem ...... 58 A.3 Implementation ...... 58 A.3.1 JavaClass ...... 58 A.3.2 JavaScriptFunction ...... 58 A.3.3 NpmPackage ...... 58 A.3.4 JsonApi ...... 59 A.4 Mapping ...... 59 A.4.1 JavaClassMapping ...... 59 A.4.2 JavaScriptFunctionMapping ...... 59 A.4.3 NpmPackageMapping ...... 59 A.4.4 JsonApiMapping ...... 59 A.5 ParameterMapping ...... 59 A.5.1 PositionParameterMapping ...... 60 A.5.2 PropertyParameterMapping ...... 60

xi Acronyms

API Application Programming Interface. 2, 13

CLI Command-Line Interface. 1, 16

DOAP Description of a Project. 7, 19

FnO Function Ontology. 7, 8, 10, 14, 19, 20, 30, 58

LS(D) Linked Software Dependencies. 7

N3 . 4, 6

OCR Optical Character Recognition. 17

OWL . 5

RDF Resource Description Framework. 4, 5

RDFS RDF Schema. 5

REST Representational state transfer. 6, 7

RIF . 6

SHACL Shapes Constraint Language. 53

SOAP Simple Object Access Protocol. 6

SPARQL SPARQL Protocol And RDF Query Language. 5

Turtle Terse RDF Triple Language. 4

URI Uniform Resource Identifier. 4

VCS Version Control System. 8

WADL Web Application Description Language. 6

WSDL Web Service Description Language. 6

xii Chapter 1

Introduction

There exists an enormous amount of open source code on the web, this can be in the form of libraries, GitHub projects, code snippets etc. This makes it possible for developers to re-use existing code in their projects instead of writing the code themselves. However, finding the right code snippet or library is not always an easy task: the code must be in the correct programming language, be compatible with the version the developer is using (e.g., code for Java 8 does not always work in Java 7), use the correct data types etc. Even if all these variables are accounted for, a code snippet usually still has to be changed to follow the same style and/or as the rest of the project. In the case of a library, a developer has to install it, read the documentation and figure out how to best incorporate it into his project. An existing solution to this problem is a package manager. A package manager is able to down- load and install a library, usually stored in a central repository of packages, and provide an easy way to access it from code (usually by importing the package). While these package managers enable automatically retrieving and installing of these packages, they provide no structured information about how to use them. This is left for the developer to figure out from the description and the documentation, hence, automatically invoking the functions inside these packages is not possible. Another problem is finding the correct package in the first place: this is usually done by searching for keywords related to the desired functionality on the package manager’s website, Command-Line Interface (CLI) tool or on a general search engine. This should generally work well if the desired functionality is well defined, but can be cumbersome if related keywords do not provide the desired results. While the aforementioned problems are applicable to humans, they are even more applicable to machines. With the Semantic Web, Tim Berners-Lee envisioned a web that can be used by hu- mans and by machines, so called intelligent agents [1]. To allow intelligent agents to find and use functions, these need to be defined semantically and additionally be queryable by using semantic descriptions. For the remainder of this introduction, we will state the problems more formally, discuss the contributions this work makes to the state of the art, and give an outline for the remainder of this thesis.

1 1.1 Problem Statement

We identified a number of specific problems with the current situation of searching for and using functions available on the web:

Problem 1: searching for functions is mostly keyword-based In current code search engines, we are usually limited to searching by keywords. The ability to provide more context, like type signatures, could greatly improve the search results.

Problem 2: using functions is environment specific When searching for a function to use, it must be written in the correct programming language and be compatible with the used programming environment.

Problem 3: individual functions are not machine-processable While semantic information might be available about a software project, this is usually not the case for the individual functions in the project.

Problem 4: storing and providing functions from a centralized location Depending on func- tionality provided by a centralized system causes scalability issues and creates a single point of failure.

To provide a solution to these problems, this dissertation discusses the idea and implementation of the FunctionHub: a system that can find functions by providing it with a semantic description of the desired function, instead of keywords. This enables both humans and automated processes to find a suitable function and understand how to use it to complete their goal. Such a system could provide multiple implementations of the same function, allowing a user or machine to choose the most appropriate implementation for their use case. An implementation could be in any programming language or a Web Application Programming Interface (API) could be used to execute the function. This system should be open, so users can add their own functions and implementations to the FunctionHub, allowing others to discover and use these functions in their projects. Following the principles of the Semantic Web, function implementations should distributed on the , while the semantic descriptions of them are available in a FunctionHub server to provide an entry-point. This allows efficient querying of the functions while mitigating the problem of storing the implementations. By providing the descriptions in a standardized Linked Data format, copies can be made and provided by any entity that deems it necessary. This could help to prevent problems that arise when data is centralized in one location, e.g., low availability of the endpoint because of high load, or the problem of creating a single point of failure. The chaos caused by a single developer removing his modules from NPM1 in 2016 is a prime example of such a single point of failure [2].

1.2 Contributions

This section explores the different building blocks of the system: the FunctionHub ontology, the server and the client libraries. For each of these, we discuss how it relates to and possibly improves upon the state of the art.

1Node Package Manager (http://www.npmjs.com)

2 1.2.1 The FunctionHub ontology

To solve the problems we discussed in section 1.1, we need a new ontology for creating the descrip- tions that the FunctionHub system makes use of. While existing ontologies allow us to describe functions and web services, a new ontology was created to describe software implementations and to describe the relationships between the functions and their respective implementations.

1.2.2 FunctionHub server

The central part of the system is a server application, which has a connection to an RDF database. This application serves as an entrypoint to the function and implementation descriptions.

1.2.3 Client Libraries

While the server provides an endpoint for querying function and implementation descriptions, client libraries were created to facilitate the use of these descriptions in an effort to create working ap- plications. In the scope of this dissertation, two client libraries are provided that facilitate the interaction with the FunctionHub server. As a proof of concept, each of these provides a subset of the proposed features.

1.3 Outline

This section provides an overview of this dissertation. Chapter 2 provides a study of currently avail- able work relating to the problems discussed in this introduction. Some background and specific goals for this thesis are provided in chapter 4. In chapter 5, several use cases are introduced that both clarify the goal of the system and are used for evaluation. Chapter 6 discusses the ontology that was created for this thesis. Before discussing the implementation of our solution in chapter 8, chapter 7 depicts the architecture that was used. Chapter 9 provides an evaluation of our work and discusses opportunities for improvement. Finally, the dissertation is concluded in chapter 10, which reflects on the work that was done and discusses opportunities for future work.

3 Chapter 2

Related works

Because the FunctionHub is based on Linked Data and is designed to be used in the context of the Semantic Web, this chapter firstly provides a concise introduction to the Semantic Web and its technologies. Afterwards, we will explore the current state of the art of existing technologies that link the Semantic Web with Web APIs, software, and data transformation, as they aim to solve similar problems. Furthermore, because sharing and reuse of functions is the main goal of this dissertation, existing systems of code sharing and reuse are explored. Lastly, a discussion is provided where these existing technologies are related to our work.

2.1 Semantic Web

The Semantic Web is a form of Web content that can be interpreted by machines [1]. It is built upon the Linked Data principles. The most important aspect of Linked Data is that entities can be identified using Uniform Re- source Identifier (URI)s. By using URIs — more specifically HTTP URIs — entities can be derefer- enced. When dereferencing a URI, its result should be a document that describes the entity [3]. It is important that these documents follow standards, which are the core technologies of the Semantic Web. These are discussed here. Resource Description Framework (RDF) is the main component of Linked Data, it is a framework for representing information on the web [4]. The main idea is to define data as ‘triples’. As the name suggests, triples are composed of three components: a subject, a predicate and an object. Subjects and objects are ‘things’ that exist in the world, these are called resources, while properties denote a relationship between two resources. A predicate is thus a property. This means that a triple is a composition of two resources (the subject and the object) connected by a property (the predicate). By combining multiple triples, an RDF Graph is created [5]. Figure 2.1 depicts a simple example of an RDF graph. This graph contains 5 triples, hence there are 5 arrows (representing predicates) connecting the subjects and objects. The graph contains some information about Tim Berners-Lee and John Doe, and connects the two using a :knows predicate. RDF data is historically represented in XML [6], but other, more convenient formats to present RDF have been designed. These include Terse RDF Triple Language (), N-Triples and JSON- LD [7, 8, 9, 10]. They are short-hand formats to express RDF data in plain-text. There also exists supersets of RDF, like Notation3 (N3). Contrary to the XML representation, these formats tend to be less verbose and more readable for humans, as the XML format can grow rather quickly, even for

4 https://www.w3.org /People/Berners­Lee/ Tim Berners­Lee

foaf:name foaf:homepage

http://example.com /Tim_Berners_Lee

foaf:knows

foaf:name John Doe http://example.com foaf:givenName /John_Doe Johnathan

Figure 2.1: An example RDF graph. small data graphs. Of course, the Semantic Web needs more than a framework for storing data. There is a need to structure the data in such a way that computers can try to understand it. A technology that allows this for RDF is RDF Schema (RDFS). It is a vocabulary that is used for modelling in RDF data. It contains resources like Resource, Class and Literal, and properties like range, domain, type and subClassOf. Using this vocabulary allows to define groups and model relationships between types of data and define valid ranges and domains for properties. Web Ontology Language (OWL) is very similar to RDFS, but it is more expressive and allows for more restrictions on the data. Like with RDFS, classes can be defined and the relationships between these classes. However, OWL also allows to add restrictions to the data and adds even more logical relations between it. With these extra semantics, new knowledge can be inferred from the given data. An example of an OWL property is SymmetricProperty1. By assigning this property to e.g., the predicate isSiblingOf, a reasoner can infer that if A is a sibling of B, then B must also be a sibling of A. Most RDF data is stored in triplestores (also known as RDF ). To query these databases, the Semantic Web stack uses SPARQL Protocol And RDF Query Language (SPARQL). It is a query language, not unlike SQL, that is used to extract information from a triplestore. Syntactically, it somewhat resembles SQL because it has clauses like SELECT, WHERE and ORDER BY. However, SPARQL allows to describe the relationships between the data that is queried instead of describing which tables should be joined on what key as in SQL . To this end, a SPARQL query’s WHERE clause has a lot in common with the RDF representation formats. Listing 2.1 provides an example of a SPARQL query: when executed on a dataset containing the information of people, described using the foaf ontology, this query results in all the objects associated with the predicate foaf:name, which are the names of the people in the graph. Variables are denoted with a question mark before the variable name. In this example, there are two variables: person and name. These variables are used in a triple pattern inside the WHERE clause. From all

1https://www.w3.org/TR/owl-ref/#SymmetricProperty-def

5 1 PREFIX foaf:

2 SELECT ?name

3 WHERE{

4 ?person foaf:name ?name.

5 }

Listing 2.1: A simple SPARQL query example.

the triples matching this triple pattern, the value of the name variable is returned, as specified in the SELECT clause [11]. Rule Interchange Format (RIF) is used to combine rules from different sources on the web. The Semantic Web is distributed, which means that data is not stored in one location, it also means that anyone, anywhere is allowed to create a vocabulary that fits his needs. Because of this, it is possible that information about the same subject can be found in multiple places on the web, represented using different vocabularies. These two datasets might have common knowledge, but some knowledge can be in one dataset and not in the other. With RIF, rules can be written that define how one vocabulary’s structure is related to the other. For example, a rule can be written that defines that an actor playing a role in a movie in the IMBd dataset means that this actor is starring in that movie in the DBPedia dataset 2.

2.2 Semantic Web and Web APIs

In this section, some previous efforts that aim to describe web services and/or functions in a struc- tured way are discussed. We will first discuss WSDL and WADL, which describe web services syntac- tically. Afterwards, OWL-S, RESTdesc and Hydra are discussed, which are technologies to describe web services semantically, using the Semantic Web stack.

Web Service Description Language (WSDL) WSDL is a language (in XML) for describing Simple Object Access Protocol (SOAP) web services syntactically [12]. A WSDL document describes the available functions and their parameter types, return type etc. However, it does not have the ability to add semantic meaning to these descriptions, hence it can’t be used to reason about e.g., the results of a function call.

Web Application Description Language (WADL) WADL is similar to WSDL, but it is more fo- cused on Representational state transfer (REST)-based services instead of SOAP [13]. Functionality- wise, WSDL and WADL are equivalent.

OWL-S OWL-S is an ontology designed to describe Semantic Web Services3. Contrary to both WADL and WASDL, however, OWL-S is designed to describe these web services semantically in- stead of syntactically. It aims to provide an ontology that allows automatic Web service discovery, invocation, composition and interoperation [15]. Because OWL-S is designed to describe Semantic Web Services, it cannot be used to model an existing (e.g., REST) API.

2https://www.w3.org/TR/2013/NOTE-rif-primer-20130205/#A_Simple_Example_in_RIF 3A Semantic Web Service is a web service which includes markup to allow for automatic discovery, execution, composi- tion, and interoperation [14].

6 RESTdesc RESTdesc is a description format for hypermedia APIs using the REST architectural style. RESTdesc does not introduce an additional vocabulary, but instead favors the reuse of ex- isting vocabularies to represent the API and it’s hypermedia controls. RESTdesc descriptions are expressed in N3, a superset of RDF [16, 17].

Hydra Similarly to RESTdesc, Hydra is a vocabulary to describe and add hypermedia controls to a REST Web API. It is designed to combine the REST architectural style (which offers hypermedia controls) with the Linked Data principles. This enables state transitions to be modelled in a machine- readable format, allowing a higher degree of decoupling between the client and the server [18]. Unlike RESTdesc, Hydra does use its own vocabulary to represent the Web API.

2.3 Software projects and the Semantic Web

This section will highlight some works that use Linked Data to somehow simplify or streamline processes involving developing or analyzing software.

Description of a Project (DOAP) The DOAP ontology is an ontology that aims to provide a way to describe software projects semantically [19]. Its schema allows to define a project’s name, the programming language it was written in, supported operating systems and other information about a software project.

Linked Software Dependencies (LS(D)) LS(D) introduces two technologies: the Object-Oriented Components ontology and Components.js. These aim to provide researchers the tools to create reproducible software experiments by creating semantic configurations. Components.js is a de- pendency injection framework that uses configurations made with the Object-Oriented Compo- nents ontology to instantiate dependent components. By using a semantic configuration instead of hard-coded component instantiations, the software becomes highly modular, re-usable and repro- ducible [20].

Function Ontology (FnO) FnO is an ontology that is used to define functions semantically on the abstract level, i.e., independent of the technology used to implement it. It allows specifying general information about functions, such as their name and description, as well as describing their expected inputs and returned output [21].

CodeOntology CodeOntology provides a parser to analyze Java source code and represent it using RDF triples. This allows for automatically creating semantic descriptions of code and performing expressive queries over source code. By matching documentation comments to DBpedia entities, it becomes possible to search for methods performing a certain function. [22].

2.3.1 Data transformation in RDF

Most computing tasks involve somehow transforming some input data into output data. A number of works have introduced the concept of including data transformations in RDF. Since data transfor- mations are essentially functions, and thus relate to the goal of this thesis, some of these works are discussed here.

7 R2RML-F While R2RML is a standard for specifying mappings between a relational database and RDF, R2RML-F is an extension that allows to include transform functions in the R2RML map- pings so that no preprocessing is needed on the database. The transform functions are written in ECMAScript and the functions are discovered and executed at runtime during the mapping pro- cess [23].

VOLT VOLT mitigates the need to store redundant triples in RDF databases by acting as a proxy between the client and a SPARQL server. It computes, at query-time, derived triples from existing triples stored in the database. To accomplish this, VOLT uses instructions that are encoded in RDF using the VOLT vocabulary. A plugin system is available to perform more advanced derivations that might be difficult to encode in VOLT instructions. As derived data is calculated at runtime, VOLT can provide the provenance information to understand the origin of the data [24].

Declarative Data Transformations for Linked Data Generation In this paper by De Meester et al., FnO is used to describe data transformation functions. Because FnO is not implementation- specific, this approach allows the data transformations to be reusable across implementations. The decoupling of the mapping operation and the data transformation presents an additional advan- tage to developers, as these two systems can consequently be improved independent from each other [25].

2.4 Code Sharing and Reuse

The main idea for the FunctionHub system is to be a new form of a code sharing system. There are many existing code sharing solutions, the most important of which will be summarized as to explore the wide field of code sharing solutions. One of the most popular methods to share code is to host the code in an online repository. Usually, these repositories’ primary goal is to be an online Version Control System (VCS), that developers can use to keep track of changes and the history of a software project. One of the most well known examples of such a platform is GitHub4. Being one of the largest sources of code on the web, GitHub seems like the ideal platform for developers to find reusable code. GitHub has a search engine, which can be used to search through all public repositories on the platform. It is possible to search for wanted functions by using regular expressions or searching for function names, but one can not assume the found functions can be used independent of the whole project or if these actually perform the correct operations. Moreover, these functions often lack documentation and are usually only meant to be used in the project they are contained in. GitHub, and other VCS hosting platforms are thus not good candidates for finding reliable, reusable implementations. Another popular method to share code is through so-called packages. While some systems might choose to use other terminology, the basic idea of a package is to be a self-contained library that can easily be included in a software project and provides functionality that is reusable across projects, it is an implementation of component-based software engineering [26]. Some examples of platforms that provide packages are NPM5, pip6, Maven7 and Gradle8. These are called package managers

4http://www.github.com 5http://www.npm.com 6https://pypi.org/project/pip/ 7http://maven.apache.org 8http://gradle.org

8 and they assist software developers with finding and installing packages and their dependencies. These systems are a better fit for the problem at hand: their packages offer functionality that is meant to be used by others in their own projects and, because of this, the functions are usually well documented and the licensing allows reuse of the code. However, finding a suitable package for a specific use case can still be a challenge, e.g., searching for ‘euclidean distance’ on NPM presents the user with 18 packages. Most of them seem to offer the desired functionality, but one can only be certain after reading the documentation of the packages. Having found a package, it can be installed and used after learning how to use the package from the documentation. An alternative to these methods is searching for implementations using a general search engine like Google. More often than not, this method presents the user with several options, ranging from StackOverflow pages to NPM packages, GitHub repositories, and other sources of code. While these can provide the developer with a correct implementation, the code is usually not usable without some adjustments, reading comments on the posted code or even installing a library or package using a package manager. Lastly, specialized websites like searchcode.com9 allow users to search for code using simple keywords and filters. It indexes several well-known code hosting platforms. While websites like this allow developers to find code more easily than using a general search engine, it does not offer more advanced information about the code that would make it possible to reason about it, nor do they guarantee that the code can be used freely (due to licensing issues or dependencies).

2.5 Semantic code search

Searching for code based on a semantic description is not a new idea. Previous efforts can be found that provide a solution to this problem, however not in the context of the Semantic Web. Nevertheless, the insights these efforts provide can still be useful, which is why they are discussed here.

Semantic Code Browsing In a paper by García-Contreras et al. [27], a system for generating semantic descriptions from code and querying these descriptions for relevant methods is presented. While it does not use Semantic Web technologies to achieve this goal, it aims to solve a similar problem. It presents functions in an abstract way and allows querying with partial descriptions of the desired function. Since it analyses functions to generate descriptions, it can only be used for implementations of which the source code is available and written in a programming language for which an analyser is written.

Semantics-based code search Reiss [28] proposes a solution that searches for relevant methods based on user input. The solution involves transforming the code that is searched into an abstract syntax tree, which is then used to find the code the user is looking for. An important aspect of Reiss’ solution is that it not only searches for code, but it also transforms and possibly combines multiple code sections to conform to the specifications provided by the user. As the system cannot guarantee the correctness of this approach, developers are required to provide tests that are used to verify the correctness of the resulting code.

9http://www.searchcode.com

9 Hoogle Hoogle is a search engine for functions in the Haskell standard library. It allows searching by type signature and function name, using a web interface or a CLI. Hoogle accomplishes this by indexing the functions in the standard library and storing the results, containing the descriptions the search engine uses to provide the results. By creating Hoogle, an improved search experience was made for Haskell users searching for standard library functions. However, it is written solely for Haskell, so other programming languages can’t take advantage of its functionality [29].

2.6 Discussion

With OWL-S, RESTdesc and Hydra, the field of connecting the Semantic Web with Web APIs is well- explored. While OWL-S defines a new type of Web API, RESTdesc and Hydra are able to describe existing REST APIs. REST APIs have a uniform interface, a limited amount of “commands”, including GET, POST, PUT and DELETE. This property makes them highly suited to be semantically described, since the commands themselves carry some semantic meaning [30]. Software projects and code don’t have this uniform interface: each programming language has a different compiler or interpreter and the details of invoking the code can differ from project to project. A semantic description of a software project thus must contain more details if we hope to be able to invoke the code based on this description. The Object-Oriented Components ontology aims to be an ontology that allows such advanced functionality. The usefulness of the ontology has been proven by the creation of Components.js, which uses descriptions of NPM packages to instan- tiate software projects dynamically. However, as the name implies, Components.js only supports JavaScript projects. While Web APIs and software projects are fundamentally different in nature, their use to a devel- oper might be the same: integrating the efforts of other developers into their own work. Essentially, Web APIs and software have the same function: they both receive input and respond with output. By sharing APIs and code on the web, developers can save time by reusing existing implementations of functionality needed in their projects. Currently, package managers are an important solution to allow developers to share and reuse functions. While this system works well for complex functionality, it seems excessive for a sim- ple function like the calculation of the euclidean distance. The time spent looking for a suitable package, reading the documentation, installing the package and/or adding the dependency and eventually calling the package’s implementation from your own code could arguably be spent more efficiently by implementing the desired functionality yourself. Moreover, package managers usually only support one programming language or programming environment, so trying to accomplish the same in another environment requires searching for a suitable package manager (if any), learning a new command line interface and possibly other tools. This observation is closely related to the problem described by De Meester et al. in [31]. As a solution, the authors envision a system for discovering and using functions through content nego- tiation, allowing both Web APIs and software implementations to be seamlessly integrated into one system. The solution presented required the existence of an ontology that could capture an abstract description of a function. FnO is an ontology that was created to tackle these kinds of problems. It allows us to capture the functionality of both Web APIs and code in a uniform manner. While the work in [31] explores the concept of a hub that collects function descriptions and connects them with implementations, no implementation of this concept was provided.

Chapter 3

Research Questions and Hypotheses

In the previous chapters, we identified problems and subsequently explored existing solutions related to these problems. From section 2.6, it is clear that no complete solution to the problems discussed in section 1.1 is available. This chapter specifies the research questions that follow from these problems, and derives hypotheses from these questions. In section 1.1, a number of problems were identified. The main goal is summarized in the question: "How can we improve the current situation for discovering and (re-)using functions for both humans and machines?". To answer this question, a number of more detailed questions should be answered. These research questions are derived from the identified problems:

Research Question 1 How can we make the searching experience for functions more precise?

Research Question 2 How can we provide functions in such a way that the used environment is irrelevant?

Research Question 3 How can we provide functions to machines in order to enable automatic processing of data?

Research Question 4 How can we create a system in which functions are distributed, so as to avoid the problems of a centralized system?

To answer these research questions, we investigated related works in chapter 2, and discussed their shortcomings regarding the problems from section 1.1 in section 2.6. Taking this knowledge into account, we derived the following hypotheses:

Hypothesis 1: Describing functions with Linked Data enables search capabilities beyond keyword search

As discussed in section 2.4 and section 2.5, searching for function implementations by specific filters and constraints is an interesting use case. We discussed existing methods of searching for code, and found that some are able to search through code on the web, while others are designed to search through local code. The first category of solutions was found to be not powerful enough, lacking features like defining requirements for function parameters and return types, while the second requires too much processing power to be able to search through a very large amount of code. We believe describing functions with Linked Data enables a better search experience for both humans and machines.

Hypothesis 2: Linking abstract function descriptions with specific implementations enables the use of a uniform interface to invoke functions

In section 2.2 and section 2.3, we discussed works that relate functions, in the form of Web APIs and software implementations respectively, to the principles of the Semantic Web and Linked Data. However, none of these works offer the ability to connect an abstract function, defined in, e.g., FnO, to a description of an implementation, described in, e.g., the Object-Oriented Components ontology. Making this connection would allow us to make use of an implementation by referring to it via an abstract function description. As an example, this would allow such a system to use both Web APIs and software implementations interchangeably, using the same function description.

Hypothesis 3: Using Linked Data to describe functions, intelligent agents can make use of them to enable automated processing of information

In addition to the previous hypothesis, describing functions in Linked Data enables connecting functions with Linked Data problems. We believe that making this connection can enable useful functionality related to automated processing of data, analogous to the functionality described in subsection 2.3.1, with the additional advantage of being more abstract, which allows for more flexible use cases. For example, we can connect the problem of "calculating a distance from two coordinates", described in RDF, to a distance function in an RDF graph. This would allow intelligent agents to automatically discover the correct functions and implementations to solve this problem.

Hypothesis 4: Following the Linked Data principles enables distribution of storage and responsibility

According to Tim Berners-Lee, one of the important "rules" of Linked Data is that it should link to relevant data elsewhere on the web [32]. Ideally, the data should be "open" as well, which means it is accessible by anyone and provided using open standards. By using open standards, we enable interoperability with other Linked Data and Semantic Web applications. Additionally, by following the Open Data principles, we allow anyone to use, modify, copy and/or republish the function and implementation descriptions, largely mitigating the issues of scalability and of creating a single point of failure.

In the following chapters, a solution is proposed with the aim of accepting these hypotheses. Subsequently, chapter 9 evaluates the solution by accepting or rejecting the derived hypotheses.

Chapter 4

Approach

With the FunctionHub, we aim to provide an implementation of the system described in [31], i.e., a system that both humans and machines can use to discover and use functions. Using the system, one can quickly find an appropriate implementation of the desired function, with the desired input parameters and output, by describing the wanted properties in a query. This chapter describes the considerations that are taken into account while implementing this system.

First, however, we define some terminology, as words like "function" and "implementation" are used as separate concepts in this thesis, while in most contexts these words overlap. In the context of this dissertation, a function is an abstract entity that transforms one or multiple inputs with a certain data type into one output with a certain data type. To be able to use a function, there has to exist an implementation of this function. An implementation can be in the form of a program written in a programming language, or a Web API. An implementation is thus a certain realisation of the abstract entity that is a function. A function can be realised in multiple implementations.

In the following sections, the use of Semantic Web technologies will be discussed, as well as the most important existing technologies that are reused in this thesis. Finally, some specific goals regarding the functionality of the system are presented.

4.1 Using Semantic Web technologies

This section describes some important advantages that result from using Semantic Web technology in this context.

4.1.1 A common vocabulary

By building on Semantic Web technologies and existing ontologies, the system becomes part of a growing ecosystem of tools and applications that use a common "vocabulary" for exchanging data on the web. As a result, the data this system produces can easily be shared with others using Semantic Web tools, potentially enabling interoperability with many other technologies.

4.1.2 Highly descriptive queries

The Semantic Web allows us to support highly descriptive queries without explicitly programming every aspect of the query. This is achieved using SPARQL, which is the standard query language for

the Semantic Web. Consequently, the FunctionHub supports SPARQL as a query language. This allows a query to be arbitrarily complex, as long as the query is valid SPARQL and the result of this query is a function. To accommodate users that are not Linked Data experts and have needs for which simpler queries are sufficient, we also provide a more traditional JSON API. Such a JSON query can subsequently be converted to a valid SPARQL query, after which an RDF database with support for SPARQL takes care of reasoning and filtering results based on the query. However, it is important to note that a JSON-based query method presents limitations, since it can only accept properties that are known by the system and can thus be converted to SPARQL.

4.1.3 Return values

Without using the Semantic Web, after specifying what function the developer wants to use, he/she still needs to refer to the documentation on how to correctly use the implementation. Discovering and using functions at runtime would be error-prone as the developer does not know beforehand how the return value should be interpreted. By using semantic function descriptions, this problem can be avoided by specifying the way of interpreting the return value in the descriptions. For example: if the FunctionHub contains a function for calculating the distance between two coordinates, it might be unclear whether the returned distance should be interpreted as meters, kilometres, miles, etc. By providing a textual description that specifies the return value's unit, and associating this description with the return value, such ambiguity is removed. For intelligent agents, functions can be linked with Linked Data "problems" in the semantic description. For the distance example specified above, an associated Linked Data problem could be transforming two geo:lat_long objects to a dbpedia:distance object1.

4.2 Building upon existing ontologies

Building upon existing technologies is a valuable technique for creating complex systems. By using them, we can focus on the goals of this thesis instead of being concerned with other requirements. The most important existing technologies that are reused are discussed below.

4.2.1 FnO

An important element for this thesis is the semantic description of a function. Using this description, the available functions can be queried and filtered. For creating these descriptions, we make use of FnO, an existing ontology for describing functions. In doing so, we build on existing work and need not concern ourselves with designing an ontology for describing functions, and we can focus on the main goal of this thesis, which is discovering and using these functions in applications.

4.2.2 JSON-LD

We chose to use JSON-LD as a serialization format for the FunctionHub’s Web endpoints because it is a standard format that can be interpreted by a large number of Semantic Web applications. Using JSON-LD allows the FunctionHub to use DESCRIBE queries instead of complex SELECT queries to query the underlying RDF database in many cases, which makes it more generic and easier to

1Prefixes used throughout this thesis conform to the results of http://prefix.cc

extend in the future (see also chapter 8). Additionally, JSON-LD is based on JSON and is thus easily processed in existing languages, especially JavaScript (and TypeScript).

4.2.3 Linked Software Dependencies

Linked Software Dependencies was previously discussed in section 2.3. It introduces two technologies that are used in this work: the Object-Oriented Components ontology and the Components.js dependency injection framework. These technologies are used in the FunctionHub to describe and instantiate NPM package implementations, respectively.

4.3 Research Goals

In order to accept the hypotheses derived in chapter 3, the completed system should have the following functionality:

• Search for functions and implementations, to support hypothesis 1.

• Automatically instantiate and invoke implementations with a uniform interface. The type of implementation should not influence how it is invoked. This functionality is important for hypothesis 2.

• Hypothesis 3 requires a method to search by Linked Data problem, returning functions and implementations that can solve the requested problem.

• Support multiple programming languages and multiple implementation types (e.g., software package, code snippet, and web service), to illustrate the system's distributed nature and its ability to support a wide variety of implementations.

More specifically, for this prototype implementation, the system should support both Java and JavaScript as implementation languages. These specific languages were chosen for two reasons. The first is that Java and JavaScript are statically and dynamically typed languages, respectively. This difference is an important one when describing implementations in these languages semantically and when executing them dynamically: statically typed languages offer more information through their type signatures, while dynamically typed languages are easier to execute dynamically, as they don't immediately fail to execute when there is a type mismatch. Supporting both types of languages shows that our system is not limited to statically or dynamically typed languages. The other reason is that these languages are well known in the development community (author included), which facilitates understanding of this thesis, especially when code examples are involved.

Chapter 5

Use Cases

In this chapter, we present some possible use cases for the proposed system. These use cases will be used as a basis for evaluation, thus they must encompass the research goals of this dissertation.

5.1 Function search engine

As we have determined in the previous chapters, searching for functions on the web could be improved. Searching by keywords is an imprecise way of finding functions that meet certain requirements, such as a particular input or output type. Current solutions are only applicable to a certain type of implementation (e.g., Hoogle only works for Haskell), or do not offer the desired searching capabilities (e.g., GitHub search and searchcode.com). Using the FunctionHub, this situation could be improved. The search engine use case could be implemented by providing an interface for querying a database for functions and implementations that match the given constraints. The system could provide a SPARQL query interface to accomplish this; however, while powerful, SPARQL requires knowledge of the concepts of Linked Data, the structure of the data and the vocabularies that are in use. A more accessible implementation could be based on a web interface. A website might provide a form to build a query, which helps users that are unfamiliar with the internally used technologies. After submitting the form, the system could provide the user with all functions and implementations that match the constraints the user has specified in the form. Communication between the web interface and the FunctionHub can be done using a standardized interface, which allows other developers to integrate this search engine in their projects.
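As a rough illustration of such a standardized interface, the sketch below shows how a client could submit a query over HTTP. It assumes a hypothetical server location and the JSON query shape that is introduced in chapter 8; the request-promise package is used as HTTP client, and all names are illustrative rather than the definitive interface.

// Minimal sketch: search for functions that take two floats and return a float.
const request = require('request-promise');

async function searchFunctions() {
  const query = {
    expects: [{ type: 'float' }, { type: 'float' }],
    returns: { type: 'float' },
    keywords: ['distance']
  };
  // POST the query object to the (hypothetical) /query endpoint of the server.
  const results = await request({
    method: 'POST',
    uri: 'http://example.com/fnhub/query',
    body: query,
    json: true
  });
  console.log(results); // JSON-LD document describing matching functions and implementations
}

searchFunctions();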

5.2 Package Manager

A possible implementation of a function is a software package. These packages are traditionally obtained from package managers, as discussed in section 2.4. Currently, packages available in a package manager's repository can be found using keyword searches through the platform's website or, for some package managers, through its CLI. Using the FunctionHub's semantic descriptions of functions and implementations, a more precise way of finding these packages could be created, like searching by input and output type. By including software packages in the internal graph of the FunctionHub, it could function as a package manager: the user can search for a package that fits their needs, download this package locally through the system, and consequently use it in their current project. To implement this use case, a CLI can be provided, which would function in a similar

way to existing package managers like NPM. An additional advantage of using the FunctionHub as a package manager is the language-agnostic nature of the system: a user could use the FunctionHub as a package manager for multiple programming languages and environments.

5.3 Provide a context-dependent solution

Until recently, all home PCs could be assumed to have similar amounts of computational power. This meant that application developers could estimate what functionality was achievable on the customer's hardware and what functionality might better be performed on a more powerful machine like a computing cluster. Today, the amount of resources available on a customer's device varies wildly. Some users prefer to perform their daily tasks on highly portable devices like smartphones, while others prefer the power of a desktop PC. This means that developers that aim to provide an application for both of these types of devices, and all devices in between these extremes, can face a dilemma when implementing functionality that requires a significant amount of resources. An example of such a task is Optical Character Recognition (OCR): this task can be performed on desktop PCs, but on smartphones, this task might better be offloaded to a cloud service.

The FunctionHub could provide a system that mitigates this dilemma: when a cloud implementation and a software implementation are available in the FunctionHub's graph, the FunctionHub could intelligently decide the most suitable implementation to use based on the available resources on a user's device or on the state of the external service. In the example described before, the system would decide to use a cloud-based implementation on a smartphone, but a software implementation on a desktop PC.
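As a sketch of how a client might make such a decision, assuming it has already received both a software implementation and a Web API implementation of the same function from the FunctionHub (the property names and thresholds below are purely illustrative):

// Hypothetical decision logic: prefer a local software implementation on
// resourceful machines, fall back to the Web API implementation otherwise.
const os = require('os');

function chooseImplementation(implementations) {
  const totalMemoryGb = os.totalmem() / (1024 ** 3);
  const cpuCount = os.cpus().length;
  const resourceful = totalMemoryGb >= 8 && cpuCount >= 4;

  const local = implementations.find(impl => impl.type === 'software'); // illustrative field
  const remote = implementations.find(impl => impl.type === 'webapi');

  // Use the local implementation when the device can handle it,
  // otherwise offload the work to the remote service.
  return (resourceful && local) ? local : (remote || local);
}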

5.4 Interactive Linked Data processing

Linked Data is a promising method for storing data, because of the semantic meaning the ontologies add to the data. Via inferencing1, new relationships can be discovered in existing data by using the semantic rules embedded in ontologies. Ontologies are praised for their ability to model complex relationships between objects; however, relationships between literal values are harder to express using ontological constructs. For example, from (Flipper isA Dolphin) and (Dolphin subClassOf Mammal), we can derive a new relationship (Flipper isA Mammal). This is a relationship that can easily be described using ontologies. But (Flipper BMI 15.7) can not easily be inferred from (Flipper length 1.8m) and (Flipper weight 51kg), because the relationship between the literals weight, length and BMI is not well suited to be described using ontological constructs. To describe relationships like this, we would have to make use of an ontology that can model the formula BMI = weight / length². While possible, we can imagine that describing more complex formulas would result in very complex ontological descriptions.

Functions are better suited for this kind of functionality: they allow us to define relationships between literals of arbitrary complexity in a more compact way than ontologies. Additionally, function implementations can be executed directly in interpreted languages or, after compilation, in compiled languages. This means that no computation time has to be spent on interpreting the description of the formula and converting it to a transformation that a program can execute. Thus, like ontologies,

1https://www.w3.org/standards/semanticweb/inference

functions can be used for inference and creating new knowledge from relationships between literals, much in the same way as ontologies are used to infer new knowledge from relationships between other types of objects. Consequently, the FunctionHub could be used as a system for extracting new knowledge from existing knowledge. By including Linked Data problems in the function graph, the correct implementations for these functions can be found and executed to derive new information. In the BMI example, the FunctionHub would contain a function that calculates BMI, along with one or more implementations of this function. An application can then request a function that allows the calculation of the property "BMI", given a number of known properties, including "length" and "weight". The FunctionHub could then respond with all known implementations of the BMI function, along with the required parameters for this implementation, allowing a client to derive the new information.
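As a concrete illustration of the BMI example, a corresponding implementation could be as small as the single-file JavaScript function below, in the style supported by the client library described in subsection 8.2.1 (the parameter order follows the hypothetical function description and is illustrative):

// Implementation of the BMI function: BMI = weight / length^2.
// Assumed parameter order: weight in kilograms, length (height) in metres.
module.exports = function (weightInKg, lengthInM) {
  return weightInKg / (lengthInM * lengthInM);
};

// e.g. for Flipper: 51 / (1.8 * 1.8) ≈ 15.7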

Chapter 6

FunctionHub ontology

To create the FunctionHub system, we need to connect functions with implementations. In chapter 2, we discussed, among others, the ontologies Hydra, DOAP, and the Object-Oriented Components ontology. These ontologies allow us to describe an implementation: with Hydra, web services can be described, DOAP can be used to describe a software project, and with the Object-Oriented Components ontology, NPM packages are described. Unlike these ontologies, FnO allows us to describe functions. Functions are not associated with a specific implementation, and thus a function description alone cannot be used to execute a function. By combining these two types of ontologies, we aim to provide the facilities to execute a function. This chapter describes the created FunctionHub ontology.

6.1 Comparison to existing technologies

As the introduction of this chapter explains, the FunctionHub ontology can be seen as a combination of two types of ontologies: ontologies describing functions, and ontologies describing implementations. Building the FunctionHub ontology, we integrated some of these existing ontologies, since they already provide a subset of the features necessary for the FunctionHub. The FunctionHub ontology can be seen as the "glue" that combines several ontologies. Table 6.1 compares the FunctionHub ontology with Hydra, the Object-Oriented Components ontology, DOAP, and FnO. The table lists the main function of each of these ontologies and the features that can be gained from combining them.

Table 6.1: Comparing existing ontologies (Hydra, Object-Oriented Components ontology, DOAP, and FnO) to the FunctionHub ontology

                                                                    Hydra  OOCO  DOAP  FnO  FH
Describe functions                                                                      ✓    ✓ a
Describe environment-specific implementations                        ✓      ✓     ✓          ✓ b
Enables instantiating/executing implementations                      ✓      ✓                ✓
Enables instantiating/executing multiple types of implementations                            ✓
Connects functions and their implementations                                                 ✓

a This functionality is achieved by using FnO.
b The FunctionHub ontology makes use of Hydra, OOCO and DOAP to describe implementations. Each of these enables describing a specific implementation type. Other environment-specific implementation ontologies could be added for other implementation types.

Figure 6.1: A schematic representation of the FunctionHub ontology.

6.2 Classes

The FunctionHub ontology defines several classes related to the implementation type. These are related to each other through subclassing, and allow the FunctionHub to determine the type of implementation. The type of an implementation node thus determines the method of instantiating the implementation. How this is done exactly is explained in section 8.2 as part of the implementation chapter of this thesis. Figure 6.1 visualizes the relationship between these classes. Two new main classes are defined by the FunctionHub ontology: Implementation and Mapping. Another important class is Function, which is reused from FnO to represent a function in the FunctionHub ontology. Because this class is fundamental to the functioning of the FunctionHub, it is briefly discussed here, as are the classes defined by the newly created ontology1.

6.2.1 Function

In FnO, the Function class defines an abstract function. Predicates like expects and returns are used to describe the expected parameters and return values. FnO also defines predicates for the name of the function and a description of what the function does. These can be used to perform a keyword search on the functions. A schematic overview of the classes in FnO can be found in Figure 6.2. The FunctionHub ontology does not make use of the Execution and Algorithm classes of FnO. However, the Problem class is used and extended somewhat to provide support for mapping Linked Data problems onto functions. Such problems were first discussed in this work in subsection 4.1.3. The class from FnO is extended by adding an input and an output property. The input predicate's range is the list of datatypes of the input for this problem, while the output predicate's range is the datatype of the output of this problem. It should be noted that the order of the datatypes in the input list should correspond to the order of Parameters in a Function's expects property.

1Ontology available in Turtle format at http://users.ugent.be/~lnoterma/discovering-using-functions/ontology.ttl.

Figure 6.2: A schematic overview of the classes in FnO [21].

6.2.2 Implementation

An instance of the Implementation class represents, as the name implies, an implementation. In the current version of the ontology, Implementation has three subclasses: JavaScriptImplementation, JavaImplementation, and WebAPI. As the ontology was created for the purpose of creating the FunctionHub system, only those implementation types that are supported by the system were added. These three classes are again subclassed, to define the implementation type further, as multiple types of Java, JavaScript or WebAPI implementations might exist. For example, in the prototype presented in this thesis, the only JavaImplementation that can be instantiated is a Java class. In the future, however, Java packages could also be supported. A Java package is an implementation in Java, so the JavaImplementation class is applicable, but it cannot be of the type JavaClass, since a Java package should be instantiated in a different way than a Java class. The properties that can be used on these classes depend on the exact implementation type. For a full reference, please refer to Appendix A. This reference contains the range and domain for each property. These ranges and domains are important, as the FunctionHub system uses them to find and instantiate functions.

6.2.3 Mapping

Each implementation type has a corresponding Mapping type. This mapping defines how to interpret the Function description in relation to the implementation description. For example, when a function has two inputs, the mapping defines which function input corresponds to which implementation parameter. The Mapping class is thus the "glue" connecting a function description and an implementation description. Each Mapping instance is connected to exactly one Function and one Implementation definition through the predicates function and implementation. How the inputs and output of the function relate to the implementation is defined in a ParameterMapping instance.

6.2.4 ParameterMapping

The ParameterMapping class defines how a Function's parameters are mapped onto an implementation. Several ParameterMapping subclasses exist, pertaining to different methods of mapping function parameters to implementation parameters. In Java and JavaScript, the parameters' positions are used to differentiate between them when executing the function or method, while in a JSON API, the JSON property keys serve this purpose.
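To illustrate how a client library might apply such a position-based mapping, the sketch below orders call arguments according to PositionParameterMapping entries. The property names are illustrative and do not reflect the exact internal data structures of the libraries described in chapter 8.

// Order the arguments of a call according to PositionParameterMapping entries:
// each entry links a function parameter (by URI) to a position in the
// implementation's parameter list.
function orderArguments(parameterMappings, argumentsByParameter) {
  const ordered = [];
  for (const mapping of parameterMappings) {
    // mapping.implementationParameterPosition is assumed to be 1-based.
    ordered[mapping.implementationParameterPosition - 1] =
      argumentsByParameter[mapping.functionParameter];
  }
  return ordered;
}

// Example, following the population density mapping of Listing 6.1:
// orderArguments(
//   [{ functionParameter: ':populationDensityParam1', implementationParameterPosition: 1 },
//    { functionParameter: ':populationDensityParam2', implementationParameterPosition: 2 }],
//   { ':populationDensityParam1': 11000000, ':populationDensityParam2': 30500 }
// ) // => [11000000, 30500]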

6.2.5 Summary

In summary, the class Function allows us to describe the semantics of an abstract function, such as its input and output types and a textual description. This description can be used to search and filter functions. The Implementation class is used to describe an implementation, and allows instantiation of this implementation. The Mapping class connects the two previous classes, enabling the instantiation of an abstract function through an associated implementation.

6.3 Example

The example in Listing 6.1 (continued in Listing 6.2) describes a function that returns the population density of an area. It calculates this using two parameters: the total population of the area and the total area in square kilometres. The output is a value in people per square kilometre. Two implementations are defined for this function: a JavaScript function, and a Java class. Mappings are defined for both of these. Lastly, this function is associated with a Linked Data problem with inputs dbpedia:populationTotal and dbpedia:PopulatedPlace/area, and as output dbpedia:populationDensity.

@prefix doap:  <http://usefulinc.com/ns/doap#> .
@prefix dc:    <http://purl.org/dc/elements/1.1/> .
@prefix xsd:   <http://www.w3.org/2001/XMLSchema#> .
@prefix fno:   <https://w3id.org/function/ontology#> .
@prefix fnhub: <...> .
@prefix :      <http://example.com/functions/> .

:populationDensity
    a fno:Function ;
    dc:description "This function returns the population density of an area."^^xsd:string ;
    fno:expects ( :populationDensityParam1 :populationDensityParam2 ) ;
    fno:name "Calculating the population density"^^xsd:string ;
    fno:returns :populationDensityOutput ;
    fno:solves :calculatingPopulationDensity .

:populationDensityParam1
    a fno:Parameter ;
    fno:predicate [
        dc:description "The total population (absolute value)."^^xsd:string ;
        fno:type xsd:float
    ] ;
    fno:required true .

:populationDensityParam2
    a fno:Parameter ;
    fno:predicate [
        dc:description "The total area (in square kilometers)."^^xsd:string ;
        fno:type xsd:float
    ] ;
    fno:required true .

:populationDensityOutput
    a fno:Output ;
    fno:predicate [
        dc:description "The population density (in amount per square kilometer)."^^xsd:string ;
        fno:type xsd:float
    ] .

:implementationPopulationJava
    a fnhub:JavaClass ;
    fnhub:class-name "PopulationCalculator" ;
    doap:download-page "http://example.com/implementations/java/PopulationCalculator.java" .

:implementationPopulationDensityJavaScript
    a fnhub:JavaScriptFunction ;
    doap:download-page "http://example.com/implementations/js/calculatePopulationDensity.js" .

Listing 6.1: Example of a complete FunctionHub description of a function and two implementations, in Turtle.

:mappingPopulationDensityPopulationJava
    a fnhub:JavaClassMapping ;
    fnhub:function :populationDensity ;
    fnhub:implementation :implementationPopulationJava ;
    fnhub:parameterMapping [
        a fnhub:PositionParameterMapping ;
        fnhub:functionParameter :populationDensityParam1 ;
        fnhub:implementationParameterPosition 1
    ], [
        a fnhub:PositionParameterMapping ;
        fnhub:functionParameter :populationDensityParam2 ;
        fnhub:implementationParameterPosition 2
    ] ;
    fnhub:method-name "calculatePopulationDensity" .

:mappingPopulationDensityPopulationDensityJavaScript
    a fnhub:JavaScriptFunctionMapping ;
    fnhub:function :populationDensity ;
    fnhub:implementation :implementationPopulationDensityJavaScript ;
    fnhub:parameterMapping [
        a fnhub:PositionParameterMapping ;
        fnhub:functionParameter :populationDensityParam1 ;
        fnhub:implementationParameterPosition 1
    ], [
        a fnhub:PositionParameterMapping ;
        fnhub:functionParameter :populationDensityParam2 ;
        fnhub:implementationParameterPosition 2
    ] .

:calculatingPopulationDensity
    a fno:Problem ;
    fnhub:input ( <http://dbpedia.org/ontology/populationTotal>
                  <http://dbpedia.org/ontology/PopulatedPlace/area> ) ;
    fnhub:output <http://dbpedia.org/ontology/populationDensity> .

Listing 6.2: Continuation of Listing 6.1.

Chapter 7

Architecture

This chapter discusses the architecture of the FunctionHub implementation. First, a general overview is given which explains how the various components of the implementation interoperate. Subsequently, these components are discussed individually.

7.1 General overview

Figure 7.1 provides a high-level overview of the FunctionHub's general architecture. Various components interoperate to provide the functionality that was determined in chapter 4. The client libraries are not required to follow a specific architecture, since server and client are loosely coupled through a Web API; hence a generalized description is given below. The general use of the FunctionHub system, when used in combination with a client library, is as follows:

1. A client application needs a function. The application determines its needs and builds a query based on these specifications. For example, a query might specify that the application is looking for functions which return a float value and accept two integer values as input. Additionally, keywords might be included to further narrow down the available options.

2. The query is passed on to the client library.

3. The client library makes a request with this query to the remote FunctionHub server.

Figure 7.1: High level overview of the architecture of the FunctionHub.

4. After receiving a request, the FunctionHub server constructs a SPARQL query from the query which was built by the client application. Subsequently, the SPARQL query is sent to an RDF database with a SPARQL endpoint to receive the results of the query.

5. The RDF database responds to the SPARQL query with descriptions of functions and associated implementations. A combination of SELECT and DESCRIBE queries is performed to receive all necessary information.

6. After combining the results of the SPARQL queries into a JSON-LD document, the result is sent back to the client library as a response to the request made in step 3.

7. The descriptions of matched functions and associated implementations are returned to the client application.

8. The client application chooses to instantiate one of the received implementations.

9. In case the application chose a software implementation, the client library extracts the necessary information for instantiation and requests the implementation code from the web. On the other hand, if the application chose a Web API implementation, the client library creates a proxy function which will access the Web API once the function is invoked.

10. The client library receives the implementation code and performs the necessary tasks to create a function that can be invoked by the application.

11. The created function is returned to the client application.

12. The instantiated function is invoked by the application.

From the above, the following software components of the system can be derived: FunctionHub server, RDF database, client library, implementations, and the client application. In the following sections, architectural considerations regarding these components are discussed.

7.2 FunctionHub server

Figure 7.2: Architecture of the FunctionHub server.

The most important component of the FunctionHub system is the server. Using a server instead of letting clients communicate with the RDF database directly allows us to optimize the process of querying the database (e.g., by caching responses). It also avoids having to implement functionality like querying in multiple programming languages, and it allows us to change underlying technologies without impacting the client libraries. The server has three main components: the entrypoint for the Web API, the FunctionHub processor, and the SPARQL processor. Function and implementation descriptions are stored in an RDF database. The server thus follows a layered architecture: each

layer is responsible for a specific role in the server application. These layers are discussed in the following sections. A schematic overview of this architecture can be found in Figure 7.2.

7.2.1 Web API entrypoint

Clients connect to the FunctionHub server through an HTTP API. No specific API style (e.g. REST, SOAP) was used in this prototype. To provide this point of entry, an HTTP server is listening at all times. The HTTP server receives queries from clients in the form of HTTP requests, from which the actual query object is extracted. This query object is then processed further by the FunctionHub processor. Apart from accepting requests, this component is also responsible for packaging the results from the FunctionHub processor into an HTTP response and sending it to the client.
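A minimal sketch of such an entrypoint is shown below, assuming the Express-based setup that chapter 8 describes; the processor object and port are illustrative stand-ins, not the actual server code.

const express = require('express');
const bodyParser = require('body-parser');

// Illustrative stand-in for the FunctionHub processor layer.
const functionHubProcessor = {
  process: async (queryObject) => ({ received: queryObject })
};

const app = express();
app.use(bodyParser.json()); // extract the JSON query object from the request body

// Accept a query, hand it to the processor, and package the result as an HTTP response.
app.post('/query', async (req, res) => {
  const result = await functionHubProcessor.process(req.body);
  res.json(result);
});

app.listen(3000);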

7.2.2 FunctionHub processor

The FunctionHub processor is the component that is responsible for processing a user's query and providing the resulting function and implementation descriptions in return. This component interfaces with the SPARQL processor for retrieval of data.

7.2.3 SPARQL processor

To interface with the RDF database, a SPARQL processor component is included. This component is responsible for handling communication with the RDF database. It receives queries from the FunctionHub processor, which are processed and sent to the database. The results of these queries, retrieved as JSON SPARQL query results1 or JSON-LD, depending on the type of query (see chapter 8), are subsequently returned to the FunctionHub processor for further processing.

7.2.4 RDF database

A fundamental component in the FunctionHub system is an RDF database with SPARQL query support, i.e. a SPARQL endpoint. This database stores all function and implementation descriptions. SPARQL queries are used to retrieve these in the FunctionHub server. The RDF database cannot be accessed directly by the client; all communication flows through the FunctionHub server.

7.3 Client Libraries

This section discusses the general flow of information in the client libraries, without going into detail about how these operations are performed in the libraries because, as mentioned earlier, these libraries are not necessarily built using the same architecture. Discussion of the implementation of these libraries can be found in section 8.2.

7.3.1 Client application

The client application is a program that is using the FunctionHub client libraries. For example, a package manager that makes use of the FunctionHub system, or an IDE using the FunctionHub for providing implementation suggestions. This client builds a query according to its needs and passes it on to the library. Communicating with the library, the client application receives objects representing function and implementation descriptions, as well as the implementations themselves.

1https://www.w3.org/TR/sparql11-results-json/

Figure 7.3: Architecture of the FunctionHub client libraries.

7.3.2 Client library

The client library itself is responsible for communication with the Web. This includes querying the FunctionHub server and retrieving or invoking available implementations. When the client application sends a query to the library, the library forwards it to the FunctionHub server. After receiving the response, the library processes it and returns the results to the client application. These results contain functions adhering to the desired specifications, and for each function the implementations that implement this function.

Additional information about the implementations is provided, e.g. whether an implementation is a software implementation or a web service, which the client application can use to decide which implementation to use. In most instances, however, the exact implementation the client will use is irrelevant, as long as it provides the application with the desired result. Subsequently, the client library requests this implementation from the web. After processing by the library, to e.g. instantiate the implementation, the client application is able to use the implementation.

Chapter 8

Implementation

This chapter covers the implementation of the FunctionHub system. More specifically, the first section covers the implementation of the server, while the second discusses the client side of the system and its two implementations: the Java and Node.js client libraries. Furthermore, the implementation of the Web app is discussed. This app can be seen as a special case of a client application; however, since it is the entrypoint for users to add new functions and implementations to the system, it is an integral part of the system. The code repository for the implementations discussed in this chapter can be found at https://git.datasciencelab.ugent.be/fno/function-hub.

8.1 Server

8.1.1 Technologies

In this subsection, the technologies used to build the FunctionHub server are introduced. Furthermore, we briefly explain why these technologies were chosen to build the system. The code for the server is written in TypeScript, a superset of JavaScript created by Microsoft1. TypeScript adds, among other things, strong typing support to the language. Strong typing helps to ensure consistency in the program's state, and helps developers to spot some programming errors at compile time thanks to the TypeScript compiler. These properties are desirable for building the FunctionHub server, since they help to build reliable software, which is especially important for a server application. After the TypeScript code is transcompiled to JavaScript, it is executed by Node.js. Node.js is a JavaScript runtime built on the V8 JavaScript engine2. It is lightweight and event-driven, and it has a large ecosystem of packages available through NPM, including many Semantic Web-related ones. These properties make it a good choice for the FunctionHub server. The Web API is built using Express, a framework for building web applications in Node.js. It was chosen because it is lightweight, easy to set up, and provides the necessary features for our use case. For storing and retrieving RDF data, GraphDB was chosen for its ease of use and good performance [33].

1https://www.typescriptlang.org/
2https://nodejs.org/en/

var query = {
  expects: [{ type: 'float' }, { type: 'float' }]
};

Listing 8.1: An example query object.

query.expects.forEach((param, i) => {
  result += `?func fno:expects/rdf:rest*/rdf:first ?param${i} .
    ?param${i} fno:predicate ?pred${i} .
    ?pred${i} fno:type xsd:${param.type} .
  `;
});

Listing 8.2: Converting a query’s expects parameter to a partial SPARQL query in the FunctionHub server.

8.1.2 Handling incoming queries

Incoming queries are in the form of HTTP POST requests to the /query endpoint. Queries sent to this endpoint take the form of a JSON object that contains the requirements a function must meet. An example of such a query can be found in Listing 8.1. Additionally, keywords might be included that filter the results based on partial strings occurring in the function's description and/or name. After receiving this HTTP POST request and extracting the query object from it, the query object is passed on to the processor, which converts it to a SPARQL query. An example of this conversion process is given in Listing 8.2. In this code fragment, the expects property of the query object is converted into a part of the SPARQL query. This partial query is appended to result, which is the SPARQL query that will eventually be sent to the RDF database. As an example, the query object in Listing 8.1 is converted by the processor to the SPARQL query in Listing 8.3. The SPARQL query selects some basic information about the function: its name, description, and, most importantly, its URI. In the WHERE clause, we specify ?func to be a fno:Function defined by FnO and its fno:expects property to be a parameter with predicate type xsd:float. Additionally, two FILTER statements are added to specify that the two parameters are distinct. For example, this prevents functions with only one float parameter from being matched with a query requesting functions

SELECT DISTINCT ?func WHERE {
  ?func rdf:type fno:Function .
  ?func fno:expects/rdf:rest*/rdf:first ?param0 .
  ?param0 fno:predicate ?pred0 .
  ?pred0 fno:type xsd:float .
  ?func fno:expects/rdf:rest*/rdf:first ?param1 .
  ?param1 fno:predicate ?pred1 .
  ?pred1 fno:type xsd:float .
  FILTER(?param0 != ?param1)
  FILTER(?param1 != ?param0)
}

Listing 8.3: Example query after conversion.

with two float parameters. The order of the parameters is irrelevant at this stage, since the query is only used to find and filter available functions. The order only gains importance during instantiation on the client side.

8.1.3 Querying the RDF database

Once a query has been created as explained in the previous section, it is used to query the RDF database. To accomplish this, the query is sent to the SPARQL processor. The SPARQL processor contains the functionality to communicate with the RDF database through its SPARQL endpoint. Before sending the query to the database, the SPARQL processor prepends a set of default prefixes, after which it sends the completed query to the RDF database. Through the HTTP headers, a JSON response is requested. If the SPARQL query results in one or more available functions, this result is returned as a list of function URIs, as this was requested in the query.
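As a sketch of what this communication could look like, the snippet below sends a SELECT query to a SPARQL endpoint over HTTP, prepending a set of default prefixes and requesting JSON results via the Accept header. The endpoint URL, the HTTP client (request-promise) and the prefix list are illustrative assumptions, not the exact server code.

const request = require('request-promise');

// Default prefixes prepended to every query (illustrative subset).
const DEFAULT_PREFIXES = `
  PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
  PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
  PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
  PREFIX fno: <https://w3id.org/function/ontology#>
`;

// Send a SELECT query to the SPARQL endpoint and ask for SPARQL JSON results.
async function runSelectQuery(query) {
  const response = await request({
    method: 'POST',
    uri: 'http://localhost:7200/repositories/functionhub', // illustrative GraphDB repository
    form: { query: DEFAULT_PREFIXES + query },
    headers: { Accept: 'application/sparql-results+json' }
  });
  return JSON.parse(response);
}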

8.1.4 Providing JSON-LD function descriptions

The previous sections explained how a SELECT query is built from a query object and subsequently used to receive function URIs from the RDF database. However, in response to a query, clients expect a JSON-LD object that contains function and implementation descriptions. These descriptions can easily be obtained from the RDF database using DESCRIBE queries. DESCRIBE queries can be used to receive a full description of a specified URI. Additionally, GraphDB, the RDF database used in our implementation, supports returning the results of a DESCRIBE query in JSON-LD. This functionality is helpful for the implementation of the FunctionHub server, as it avoids the need to implement a mechanism for converting the SPARQL results into JSON-LD in the server. To provide JSON-LD descriptions of the functions returned by the SPARQL query from Listing 8.3, each function URI is thus requested through a DESCRIBE query to the RDF database. For each function, this results in a JSON-LD description of all triples that are directly reachable from the function URI. This includes the function's name and description. Other information, such as the function's expected parameters and return value, is included as URIs. To provide a full description to the client, such as the one in Listing 6.1, an additional DESCRIBE query is done for each of these URIs. These steps result in a full description of the requested functions.
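For instance, the follow-up request for a single URI could look like the sketch below, which asks the endpoint to return the DESCRIBE result as JSON-LD through the Accept header (endpoint URL and helper name are illustrative):

const request = require('request-promise');

// Request a JSON-LD description of all triples directly reachable from a URI.
async function describeNode(uri) {
  const response = await request({
    method: 'POST',
    uri: 'http://localhost:7200/repositories/functionhub', // illustrative endpoint
    form: { query: `DESCRIBE <${uri}>` },
    headers: { Accept: 'application/ld+json' }
  });
  return JSON.parse(response);
}

// e.g. describeNode('http://example.com/functions/populationDensity')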

8.1.5 Providing JSON-LD mappings and implementations

The steps from the previous sections result in a full description of the functions matching the initial query sent to the server. However, additional descriptions are necessary to enable the process of instantiating implementations that are able to execute the requested function. For this purpose, an additional SPARQL SELECT query is done to receive the mappings that apply to this function. An example of such a query for the function with URI http://example.com/functions/populationDensity is shown in Listing 8.4. By matching the function URI to the fnhub:function predicate of instances of the Mapping type, both the Mapping instance itself and the associated Implementation instance can be retrieved. Retrieving the full descriptions from these URIs is done in a way analogous to how a full description was retrieved from a function URI, as explained in the previous section.

SELECT DISTINCT ?implementation ?mapping
WHERE {
  ?implementation rdf:type/rdfs:subClassOf* fnhub:Implementation .
  ?mapping rdf:type fnhub:Mapping .
  ?mapping fnhub:function <http://example.com/functions/populationDensity> .
  ?mapping fnhub:implementation ?implementation .
}

Listing 8.4: A SPARQL query requesting the Mapping and Implementation instances that apply to the function with URI http://example.com/functions/populationDensity

8.1.6 Combining results and returning to client

After the previous steps have been completed, the set of distinct JSON-LD descriptions is combined into one JSON-LD document. As the descriptions are returned in their expanded form by the SPARQL processor, this can be done by simply concatenating the different descriptions. Finally, the complete JSON-LD document is compacted through JSON-LD's compact function and sent, through Express' Response object, to the client that performed the initial HTTP POST request.
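A sketch of this last step is given below, using the promise-based API of recent versions of the jsonld library; the function name and context argument are illustrative.

const jsonld = require('jsonld');

// Concatenate the expanded JSON-LD descriptions and compact the result against
// a context before returning it to the client.
async function combineDescriptions(expandedDescriptions, context) {
  const combined = [].concat(...expandedDescriptions);
  return jsonld.compact(combined, context);
}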

8.1.7 Additional functionality

Apart from querying functions using the /query route, other API routes exist that offer additional functionality. Using the / route, a SPARQL query can be used instead of the simplified JSON queries that are accepted by /query to query the FunctionHub server. These queries should return function URIs in the func variable (like in Listing 8.3), so that they can be further processed to generate the JSON-LD document. The /node route provides a JSON-LD description of a provided URI. This is comparable to a SPARQL DESCRIBE query for that URI. This functionality is useful when additional information is required that is not available in the description received from /query. An /implementation route is available for retrieving implementation descriptions and the associated mappings for such an implementation. Finally, /jsonld is a route for adding descriptions to the RDF database. It accepts a JSON-LD input and adds this document to the RDF database graph. This is used for creating new function and implementation descriptions.

8.2 Client Libraries

To make use of the previously described FunctionHub server, two client libraries were created. The first is a library for Node.js and thus written in JavaScript. The second was written in Java. These libraries facilitate the use of the FunctionHub server by providing functionality for querying the server and methods for instantiating the implementations that can be found through this process. In this section, the most important features of these libraries are discussed.

8.2.1 JavaScript/Node.js Library

The Node.js library supports querying the FunctionHub using a JavaScript object. This JavaScript object is equivalent to the JSON query object explained in subsection 8.1.2. To perform HTTP requests to the FunctionHub server, the library request-promise was used, which is available on NPM3. This library provides simplified access to Node's built-in http module. To process the JSON-LD documents resulting from a query, the jsonld library4 is used. Apart from querying the FunctionHub server, this library supports instantiating and executing two types of implementations: JavaScript functions (as snippets) and NPM packages. The following paragraphs discuss how this is accomplished.

JavaScript Functions In this context, a JavaScript function is a JavaScript module that directly exports a function, i.e. the implementation is a single file containing a function in the form of module.exports = function (...) {...}. As can be seen in Appendix A, a JavaScriptFunction implementation has a doap:download-page property, which is the URL where this file can be found. While presenting limitations, e.g., no dependencies on other files or other external modules can be present, this method proved adequate for the purposes of this thesis. To instantiate such a function, the JavaScript file containing the function has to be downloaded. This is done by performing an HTTP GET request to the URL indicated by doap:download-page. After downloading, the file is optionally saved to disk to avoid having to re-download the file when the function is instantiated multiple times. With the file available locally, the function inside it can be instantiated using Node's vm.runInThisContext function5, which evaluates the data inside the JavaScript file and thus returns the desired instantiated JavaScript function. The client application that requested this function can subsequently make use of it as it would use any other JavaScript function.
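One possible way to perform this download-and-evaluate step is sketched below. It wraps the downloaded file in a function so that its exports can be captured; this is a simplification under the assumption that the file assigns its function to module.exports, and it is not the library's exact code.

const request = require('request-promise');
const vm = require('vm');

// Download the single-file implementation indicated by doap:download-page and
// evaluate it, yielding a callable JavaScript function.
async function instantiateJavaScriptFunction(downloadPage) {
  const code = await request(downloadPage); // HTTP GET of the implementation file
  const sandboxModule = { exports: {} };
  // Evaluate the file in the current context; the wrapper exposes module/exports
  // so the file can assign its function to them.
  vm.runInThisContext(`(function (module, exports) { ${code} })`)(
    sandboxModule, sandboxModule.exports
  );
  return sandboxModule.exports;
}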

NPM package Apart from simple JavaScript functions, NPM packages are also supported. This support is achieved through Components.js, a dependency injection framework for NPM modules. This is made possible by describing this type of implementation using the Object-Oriented Components ontology. By describing a module in this ontology, the Components.js library is able to instantiate it. A prerequisite for this is that the described NPM package is available locally. Since, in our use case, a client usually does not know the required package in advance, this prerequisite would impair the main goal of this client library, i.e. enabling at-runtime discovery and invocation of functions. To solve this problem, the library installs the NPM package programmatically using the npm.install function, before instantiating it through Components.js. While this does not require the package itself to be available on the user's device, it does require NPM to be installed. Afterwards, the instantiated component is returned to the client. Listing 8.5 provides a simple example of how this library can be used.

3https://www.npmjs.com/package/request-promise
4https://www.npmjs.com/package/jsonld
5https://nodejs.org/api/vm.html#vm_vm_runinthiscontext_code_options

// Import the library and specify the FunctionHub server location
const fnHub = require('./functionHub-library')('http://example.com/fnhub');

// Construct a query that will provide us with a function that indents a string
const indentQuery = {
  expects: [{ type: 'string' }, { type: 'integer' }],
  returns: { type: 'string' },
  keywords: ['indent']
};

// Query the FunctionHub server with the constructed query
const queryResult = await fnHub.doQuery(indentQuery);

// Use the first function that has been returned
const func = queryResult[0];

// Use the first implementation of this function
const indentImplementation = (await fnHub.getImplementationsFromFunction(func))[0];

// Execute this implementation with the parameters 'Hello' and 8
console.log(indentImplementation('Hello', 8));
// Output: "        Hello"

Listing 8.5: An example of using the JavaScript library.

8.2.2 Java Library

The Java library supports the same basic functionality as the JavaScript library: querying functions and instantiating supported implementations. However, the types of implementations the Java library supports do differ from those found in the JavaScript library. The supported implementation types are Java classes and JSON Web APIs.

Querying In the Java library, a Query class is defined. Objects of this class represent queries like the one in Listing 8.1. This class thus allows creating a query to be executed by the FunctionHub server. After creating the query (using the new keyword and the Query class' constructor), it can be passed to the library's query function. The library converts this Java object to a JSON object using the Gson6 library and sends it to the server. Just like in the JavaScript library, a JSON-LD document is returned as the result of this query. Since Java is a statically typed language, obtaining a usable Java object requires extracting values from the received JSON-LD document and creating objects from these values. For example, from the JSON-LD document, the function's name and description are extracted as String objects, while the function's fno:expects values are converted to Parameter objects. In this way, Function, Mapping and Implementation objects are created when receiving a JSON-LD document as the return value of a query request. The properties these classes contain are very similar to those defined for the FunctionHub ontology's classes. The object obtained from this conversion process is a Function object that holds references to available Mapping objects.

Instantiating Java classes Once a Mapping is obtained, the accompanying implementation can be instantiated. This is implemented by the ImplementationHandler class. The method

6https://github.com/google/gson

instantiateFunctionImplementation accepts a Function and a Mapping object as arguments and returns a FunctionInstance object. For the JavaClass implementation type, the Java class file is first downloaded and compiled. Afterwards, the class is loaded using Java's URLClassLoader. Through reflection, the method containing the requested functionality is obtained by using the class and method name which were extracted from the received JSON-LD document. The obtained Method and Class are subsequently used to create a new LocalFunctionInstance object, which is returned to the caller. The LocalFunctionInstance class implements the FunctionInstance interface, which ensures a uniform API for all supported implementation types.

Instantiating Web API implementations Since a Web API implementation makes use of an online service to execute a function instead of relying on locally available code, instantiating a Web API may seem like a meaningless process. However, as explained earlier, the goal of this library is to provide uniform access to different types of implementations. Hence, an object conforming to the FunctionInstance interface should be available, even for Web APIs. This object is of the WebFunctionInstance class.

Executing the instantiated implementation The FunctionInstance objects that result from instantiating an implementation can be used to execute the implementation. For this purpose, the executeFunction method is defined on these objects. This method is implemented differently in LocalFunctionInstance and WebFunctionInstance. In LocalFunctionInstance, executeFunction invokes the method that was obtained from instantiating the implementation, while in WebFunctionInstance, it performs an HTTP request to the web service as defined by its Hydra description (see Appendix A for JsonApi and JsonApiMapping). From the result, the correct value is extracted and returned to the caller of the executeFunction method. An example of how the Java library is used can be found in Listing 9.1.

8.3 Web App

Apart from the server and client libraries, a Web App was created. This Web App serves two purposes: it acts as a search engine for the functions and implementations available in the FunctionHub, and it allows users to add functions and implementations to the system.

8.3.1 Technologies

The Web App is built using Angular7, a popular framework for creating single page web applications. Bootstrap8 was used to create the UI. Using these technologies allows us to create interactive Web applications quickly and efficiently. For working with JSON-LD documents, the jsonld library is used once again.

8.3.2 Search Engine

Previous sections discussed querying the FunctionHub for functions and implementations using a query object. This same approach is used to create the search engine. Creating a search engine for the FunctionHub thus involves building a Web interface for generating such a query object and submitting it to the FunctionHub server. Additionally, an interface is built for displaying the obtained results.

Two interfaces are available for querying the FunctionHub: a simple search, which only performs a keyword search over the functions’ names and descriptions, and an advanced search, which includes functionality like searching by parameter type and description or by return values. The first of these interfaces is similar to general search engines, while the second is comparable to a search engine created specifically to search through code, like GitHub Search.

Building the search interfaces is fairly straightforward: HTML components like text inputs and select drop-down lists are used to obtain the values for a query in a graphical interface (see Figure 9.1). These values are internally represented as a query object through the process of data binding9. Using this technique, a query object like the one in Listing 8.1 is generated. When the user initiates the search, this query is sent to the server using Angular’s HTTP facilities and the results are obtained.

The results are displayed as a list of functions. Each row contains a function’s name and a list of implementation types available for this function. Clicking on a row reveals more information about the function, like its description, inputs and outputs, and more information about the associated implementations. Screenshots of this functionality are shown in Figure 8.1.

7https://angular.io/ 8https://getbootstrap.com/

8.3.3 Adding functions and implementations

Adding functions and implementations can also be done through the Web App. To this end, a form similar to the search form is presented to a user who wants to add a function to the FunctionHub. The user is then required to fill in all the details concerning this function (see Figure 8.2). Optionally, an implementation can be added to this function. Again, data binding is used to create a JavaScript object containing all the user input. Upon submitting, a JSON-LD document is generated from this input that describes the function, and the optionally added implementation, using the FunctionHub ontology. The document is akin to the description in Listing 6.1, although represented in JSON-LD instead of Turtle. Subsequently, this document is sent to the FunctionHub server, which results in the description being added to the database.

9https://angular.io/guide/template-syntax#ngmodel---two-way-binding-to-form-elements-with-ngmodel

(a) List of available functions that are the result of the query specified in Figure 9.1.

(b) Details view of the “Indent string” function.

Figure 8.1: Function search results.

Figure 8.2: An example of adding a function and implementation to the FunctionHub using the web interface. In this case the left-pad function is added, including a JavaScript implementation.

Chapter 9

Evaluation

This chapter discusses the evaluation of the FunctionHub system. In chapter 3, research questions and hypotheses were derived from the statements in section 1.1. Use cases were derived from these hypotheses and presented in chapter 5. This evaluation is objective-based: testing the hypotheses involves evaluating FunctionHub’s ability to support these use cases and the performance of the system [34]. Table 9.1 provides an overview of the hypotheses and use cases evaluated in this section.

Table 9.1: Overview of which use cases relate to which hypotheses. Hypothesis 4 is not included since it does not have an associated use case.

             Hypothesis 1   Hypothesis 2   Hypothesis 3   Method of evaluation
Use Case 1        ✓                                       Comparison with other code search engines
Use Case 2        ✓                                       Comparison with other package managers
Use Case 3                        ✓                       Providing code examples and runtime experiment
Use Case 4                                       ✓        Providing a proof-of-concept implementation

9.1 Improving search with Linked Data

The first hypothesis derived in chapter 3 states that describing functions with Linked Data enables more precise search capabilities. To test this hypothesis and demonstrate that this functionality can be supported by the FunctionHub, the following sections discuss the use cases from section 5.1 and section 5.2, which describe a search engine for functions and implementations, and a package manager respectively.

9.1.1 Search Engine

Using the FunctionHub system, we created a search engine that does not search the code itself, but rather the functions of which the code is an implementation. This approach allows for filtering options at the function level, which is not possible in other web-based code search tools.

Table 9.2: Feature table comparing the FunctionHub search engine to GitHub and searchcode

                                                        GitHub   SearchCode.com   FH Search
Built upon Linked Data                                     ✗            ✗              ✓
Find open-source code                                      ✓            ✓              ✓
Filter by programming language                             ✓            ✓              ✓
Search inside of implementations                           ✓            ✓              ✗
Find web services                                          ✗            ✗              ✓
Filter functions by input/output types & description       ✗            ✗              ✓
Find multiple implementations of a function                ✗            ✗              ✓
Find functions from multiple locations                     ✗            ✓              ✓
Filter by source                                           ✗            ✓              ✗
Filter by repository                                       ✓            ✓              ✗

Table 9.2 compares the features of our implementation to two other web-based code search tools: GitHub search and searchcode.com. This table is not exhaustive: it includes the most important features of the FunctionHub search engine and some notable drawbacks. The table was built by visiting the tools’ query interfaces and inspecting the features they offer1. The tools included in this table were discussed earlier in section 2.4.

Comparing these tools to the FunctionHub search engine, the biggest difference lies in the core mechanism of the search: GitHub and searchcode both search through the source code itself, while our search engine searches the function descriptions. This difference in approach affects the features these tools can provide. FunctionHub search has the ability to search by input and/or output data type and description, because these are explicitly described using the FunctionHub ontology (see chapter 6). GitHub Search and searchcode do not offer this functionality in their filtering options. One might use regular expressions or other advanced keyword search techniques to mimic the FunctionHub functionality, but this process requires some trial-and-error and is not generalizable to multiple implementation types. Additionally, some programming languages, like JavaScript, do not specify data types in function signatures, which makes filtering by type impossible in traditional code search tools.

Another advantage of using function descriptions is the ability to search for implementations that are not in the form of code. Functionality can be provided by web services, an implementation type supported by the FunctionHub. Using FunctionHub search, these services’ associated function descriptions can be searched, allowing users to find these services as an implementation. Because one function can be implemented by a multitude of implementations, FunctionHub search can find multiple implementations that meet the desired properties, even if only one function meets these properties. The distributed nature of the FunctionHub allows searching for implementations from anywhere on the web. Searchcode includes a number of popular code repositories as sources for its search, while GitHub only offers searching through code on its own platform. Searchcode does offer some functionality that the FunctionHub does not support at this moment; filtering by source and by repository are examples of such functionality.

FunctionHub search thus offers some valuable functionality that searchcode.com and GitHub Code Search do not. To demonstrate this, we attempted a query to find all functions with a string as input and a string as output, once using these existing search tools and once using FunctionHub search.

1These interfaces can be found on https://searchcode.com and https://github.com/search/advanced for searchcode.com and GitHub Code Search respectively.

Figure 9.1: Searching for all functions with string input and string output.

We observed that searchcode.com and GitHub Code Search offer no filters to support this query, while the FunctionHub does (see Figure 9.1). Hence, the FunctionHub-based search engine offers functionality that existing tools do not. This supports our hypothesis that using Linked Data can improve the experience of searching for functions on the web.
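For reference, the query behind Figure 9.1 could be expressed with the Java client library roughly as follows. This is a hedged sketch: the order and representation of the constructor arguments after the keywords (expected input types, return type, problem, implementation type) are assumptions based on the comments in Listing 9.1, and passing null for the implementation type to mean "any" is likewise assumed.

// Find all functions that expect one string parameter and return a string.
Query query = new Query(
        null,                          // no keyword filter
        new String[]{"xsd:string"},    // expected input datatypes
        "xsd:string",                  // expected output datatype
        null,                          // no problem filter
        null);                         // any implementation type (assumed)
Function[] results = fnServer.query(query);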

9.1.2 Package Manager

As was discussed in section 2.6, a package manager is an existing solution to the problem of reusing code. Much like the FunctionHub, its aim is to provide a uniform interface for using code from the web. Because of this overlap in goals, it is possible to provide similar functionality in the FunctionHub.

A proof-of-concept command-line interface implementation of a package manager using the FunctionHub, fnHubCLI, was created. The implementation provides the most important features of a package manager: searching for implementations in the central repository, and installing these implementations locally. To test hypothesis 1, a feature was added that demonstrates the system’s ability to improve upon current package manager systems. This feature is the ability to search for implementations with type and programming language constraints (e.g., a user can search for JavaScript implementations having two floating-point numbers as input parameters and one string as output parameter) in addition to keyword search. This is an improvement upon current package manager systems, which are usually limited to searching by keywords only. fnHubCLI was implemented in Node.js using the Node.js library described in subsection 8.2.1. All types of software implementations that are included in the FunctionHub ontology are supported.

In Table 9.3, we compare fnHubCLI to the existing package managers Maven and NPM. Once again, this table was built by listing the most important features of our implementation and some notable

missing features. This package manager was designed as a method to test hypothesis 1, thus the emphasis was on finding packages using this CLI. As the table demonstrates, fnHubCLI offers more advanced search capabilities than the other package managers. Additionally, it can find web services instead of only software implementations. These can obviously not be downloaded to disk, but it might be interesting for a developer to be aware of the fact that these services exist. As the emphasis for this use case is on search, no support for “tasks” (analogues for, e.g., npm start, npm build, mvn run, …) was added. Another feature most other package managers offer is the ability to manage package versions, meaning the developer can specify which version of a package to use when installing it. fnHubCLI does not offer this, because the FunctionHub ontology does not support describing version information at this point.

Despite these limitations, improvements over other package managers can be observed in terms of search capabilities: both NPM and pip are package managers that offer search in their CLI, however, this search only supports searching the packages’ names and descriptions and, in the case of NPM, the package maintainers’ names2. This, again, supports our hypothesis.

As an example, we again search for functions and implementations with string input and string output, this time using the prototype package manager. To do so, we use the command fnHubCLI -find -e string -r string. This command specifies that we want to search for functions which expect a string as input and return a string as output. The result of this command is as follows:

Found 2 functions:

Name: Indent string
Description: Indent each line in a string.
Implementation URIs: http://example.com/indentStringImplementationNPM

Name: left-pad
Description: String left pad: add spaces to the left of a string until the length of the string is the specified amount.
Implementation URIs: http://example.com/leftpadImplementationNPM, http://example.com/leftpadImplementationJavaScript

The package manager has thus found two functions that meet the specified conditions and displays some basic information about these functions. Additionally, the URIs of the associated implementations are provided.

9.1.3 Result

By creating the FunctionHub search engine and fnHubCLI, and subsequently comparing these implementations to existing work, we demonstrated the advantages of using Linked Data in this context. Both use cases benefit from the improved search capabilities that the system makes possible. Therefore, hypothesis 1 can be accepted.

2See https://pip.pypa.io/en/stable/reference/pip_search/ and https://docs.npmjs.com/cli/search for the relevant documentation

Table 9.3: Feature table comparing the fnHubCLI to NPM and Maven

                                                 NPM    Maven   fnHubCLI
Built upon Linked Data                            ✗       ✗        ✓
Search for packages using keywords                ✓       ✓        ✓
Search for packages by programming language       ✗       ✗        ✓
Search for packages by type signature             ✗       ✗        ✓
Search for web services                           ✗       ✗        ✓
Download software packages to disk                ✓       ✓        ✓
Tasks                                             ✓       ✓        ✗
Versioning support                                ✓       ✓        ✗

9.2 Abstracted Function Processing

Hypothesis 2 states that linking abstract function descriptions with specific implementations enables possibilities for abstracted function processing. To evaluate this hypothesis, we provide a small code example that demonstrates how to use the FunctionHub as a tool to use two different implementations of the same function interchangeably, as well as a prototype implementation of the use case from section 5.3, which allows us to evaluate the advantages of such a system.

9.2.1 Code example

Listing 9.1 illustrates the FunctionHub’s ability to abstract the execution of a function away from its implementation. In this example, one query is executed which results in one function with two implementations: the first is a Java method, the second is a web service. As both implementations are of the same function, the output should be the same for the same inputs. Both implementations are instantiated as FunctionInstance objects, which are both executed by calling the executeFunction method on this object. Thus, a developer does not need to differentiate or take into account the type of implementation in their code. The abstraction allows uniform access to any implementation supported by the used client library.

9.2.2 Context-dependent implementations

By abstracting functions from their implementations, the implementations become loosely coupled to their respective functions. This means that, with some effort, the execution of a function can be abstracted away from its specific implementations. The FunctionHub client libraries support such abstraction, which allows users to query the FunctionHub for functions and to execute a function using any of the supported implementations interchangeably. For example, in a Java program, the same function could be executed by an implementation as a Java method, as well as by a Web API.

This functionality could be useful in situations where the most appropriate implementation depends on the current circumstances, as was discussed in section 5.3. To evaluate this hypothesis, an experiment was conducted in which the performance of dynamically switching the implementation depending on the circumstances was compared to the performance of a traditional program where the implementation remains constant.

To set a baseline, the average runtime of executing two ‘Hello World’ implementations 10 times is measured. The Hello World function is quite simple: it simply returns the string “Hello World!”.
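The baseline measurement itself amounts to a simple timing loop, sketched below under the assumption that functionInstance was obtained through the ImplementationHandler as in Listing 9.1.

// Execute the instantiated 'Hello World' implementation ten times and report the average runtime.
int runs = 10;
long totalNanos = 0;
for (int i = 0; i < runs; i++) {
    long start = System.nanoTime();
    functionInstance.executeFunction();
    totalNanos += System.nanoTime() - start;
}
System.out.println("Average runtime: " + (totalNanos / runs) / 1_000_000.0 + " ms");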

// Create a query for finding a local function.
// The keyword "hello world" is added to the query,
// no filters for inputs, output or problem are added,
// hence the "null" values in the other fields.
Query query = new Query(new String[]{"hello world"}, null, null, null, ImplementationType.LOCAL);

// Querying the FunctionHub and receiving an array of functions,
// retrieve the first result and assign it to the "function" variable.
Function function = fnServer.query(query)[0];

// Get the first implementation associated with this function
// (a Java method in this case, because of the "LOCAL" query),
// and instantiate it through the ImplementationHandler.
FunctionInstance javaMethod = ImplementationHandler.instantiateFunctionImplementation(
        function, function.implementationMappings[0]);

// Create a query for finding a Web API function.
// Other values stay the same.
query = new Query(new String[]{"hello world"}, null, null, null, ImplementationType.WEB_API);

// Querying the FunctionHub and receiving an array of functions,
// retrieve the first result and assign it to the "function" variable.
function = fnServer.query(query)[0];

// Get the first implementation associated with this function
// (a web service, because of the "WEB_API" query),
// and instantiate it through the ImplementationHandler.
FunctionInstance webService = ImplementationHandler.instantiateFunctionImplementation(
        function, function.implementationMappings[0]);

// Execute both functions and print the result.
System.out.println("The result of the Java method is: " + javaMethod.executeFunction());
System.out.println("The result of the web service is: " + webService.executeFunction());

/*
OUTPUT:
The result of the Java method is: Hello world!
The result of the web service is: Hello world!
*/

Listing 9.1: Example code in Java.

Table 9.4: Results of executing the "Hello World" function as Java method and as Web service implementations

Average runtime (in msecs)                 Java Method   Web Service
including instantiation (no caching)            130.09         23.39
including instantiation (with caching)            4.82         23.36
excluding instantiation                            0.02         23.31

The tested implementations are of the type Java method and Web service. Both are acquired and instantiated using the FunctionHub’s Java library. The Web service is implemented in Node.js in combination with Express. While, during testing, this web server, the FunctionHub server, and the client run on the same machine, the web server artificially adds 20 milliseconds to the execution time to simulate a remote web service. Three measurements were conducted for each implementation: the runtime including instantiation, once with caching of local implementations and once with caching disabled, and the runtime excluding instantiation. Without caching, the Java class is retrieved, compiled and loaded each time the function is instantiated. With caching, the compiled Java class is saved on the user’s device, hence only the loading step happens in this case. The runtime excluding instantiation is relevant when a function is instantiated once, but invoked several times. Table 9.4 shows the results of these measurements.

As can be expected, without caching, invoking a (fast) remote Web service is faster than downloading, compiling, loading and invoking a Java method. However, when the Java method has been used once, and has thus been cached, it becomes significantly faster to use the locally saved implementation, even when loading the class before each invocation of the function. When an instantiated function is reused during the lifespan of an application, the time necessary to execute the ‘Hello World’ Java implementation is negligible. This table also demonstrates that, since no actual processing is necessary to ‘instantiate’ a Web service, this process has a negligible effect on the runtime.

With this baseline set, we can test a more realistic scenario. For this evaluation, we simulated a function that is implemented efficiently in the Web API, but is fairly slow as a Java method. We could imagine this function to be a complicated task, like image recognition or speech synthesis, which benefits greatly from running on a highly optimized cloud infrastructure and is thus more difficult to implement efficiently on consumer hardware. A disadvantage of using web services is that their response times may vary greatly, depending on the load at that point in time, which might cancel out the more efficient implementation. Additionally, a user’s connection to the web service might be rather poor, which could make networking delays a cause of slow execution speed. An implementation on the user’s hardware does not suffer from these disadvantages, but might be slower when conditions are optimal for the web service. Clearly, the optimal (fastest) implementation depends on the current circumstances.

To represent this scenario, we simulated a web service of which the execution time varies, while the execution time of an on-device implementation remains constant. Figure 9.2 illustrates the results of this experiment, in which a function is implemented as both a Java method and a Web API. The implementations in this experiment have a runtime of 5 seconds and 0.5 seconds for the Java method and Web service, respectively. As can be seen in the graph, the execution time of the Web service varies from almost zero to about 12 seconds. The Java method implementation, meanwhile, remains constant around 5 seconds.


Figure 9.2: Comparing dynamically switching implementations to static implementations. Execution time (seconds) is plotted against Web Service latency (seconds) for the Java Method, Web Service and Hybrid approaches.

Using the FunctionHub library, we can easily switch implementations when it is noticed that using the local implementation would be faster than using the Web service. For this experiment, this is the point at which the Web service’s execution time exceeds 5 seconds. A client that automatically switches at this point is depicted in the graph as the "Hybrid" approach. It is apparent that this approach minimizes the overall execution time, as it closely follows the execution time of the fastest implementation at each given point.
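The switching logic of the "Hybrid" client can be sketched as follows. The FunctionInstance usage follows Listing 9.1, but the class itself and its timing heuristic are assumptions made for illustration; they are not part of the FunctionHub library.

public class HybridFunctionInstance {

    private final FunctionInstance local;
    private final FunctionInstance remote;
    private final long localRuntimeMillis;          // measured once, assumed stable
    private long lastRemoteRuntimeMillis = 0;       // last observed Web service runtime

    public HybridFunctionInstance(FunctionInstance local, FunctionInstance remote,
                                  long localRuntimeMillis) {
        this.local = local;
        this.remote = remote;
        this.localRuntimeMillis = localRuntimeMillis;
    }

    public Object executeFunction() throws Exception {
        // Keep using the Web service while its observed runtime beats the local runtime.
        if (lastRemoteRuntimeMillis <= localRuntimeMillis) {
            long start = System.currentTimeMillis();
            Object result = remote.executeFunction();
            lastRemoteRuntimeMillis = System.currentTimeMillis() - start;
            return result;
        }
        // The Web service is currently slower than the local implementation;
        // fall back to the local one (re-probing the Web service periodically is omitted).
        return local.executeFunction();
    }
}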

9.2.3 Result

This section demonstrates that function abstraction can be used to the user’s advantage, and that these advantages can be realized using the FunctionHub system. These observations allow us to accept hypothesis 2.

9.3 Solving Linked Data problems

Hypothesis 3 concerns the added value of connecting functions (and, by extension, their implementations) to Linked Data problems, as was discussed in section 5.4. To test this hypothesis, we first demonstrate some examples of queries that relate to this hypothesis and how they can be used. Afterwards, we discuss the use of these types of queries in a prototype implementation of the use case discussed in section 5.4.

9.3.1 Queries

To solve Linked Data problems with functions, problems have to be linked to functions solving them. The FunctionHub ontology supports this, which allows creating queries that request functions able to solve these particular problems. An example of such a query is included in Listing 9.2. This query requests functions that are able to convert two inputs, a population density and an area, into an

{
    "solves": {
        "input": ["dbpedia:populationDensity", "dbpedia:PopulatedPlace/area"],
        "output": "dbpedia:populationTotal"
    }
}

Listing 9.2: Example query (as JSON)


Figure 9.3: Screenshot of the FunctionHub inferencer proof-of-concept implementation. Missing data is inferred from a Turtle file containing dbpedia:Country nodes.

output: the total population of that area. This might be solvable by multiple functions, and each of these functions might have multiple implementations. In the next section, we discuss a prototype application that can generate such queries and use the information resulting from sending the query to the FunctionHub to provide users with a solution to the problem.

9.3.2 Linked Data inferencer

To evaluate this use case, a prototype Linked Data inferencer was created that uses the FunctionHub for finding functions that can transform data. The application receives Linked Data input which might have missing data. If the missing data can be derived from the other triples, the application requests a function that is able to do so from the FunctionHub. This way, the FunctionHub is used to generate knowledge from existing knowledge, like the use case in section 5.4 describes. The prototype offers a graphical interface to display the results and request additional user input if necessary. For example, the user might need to decide which function or implementation to use if multiple results were found. Screenshots of this prototype can be seen in Figure 9.3. Additionally, a screencast is available at

http://users.ugent.be/~lnoterma/discovering-using-functions/inferencer.mp4. Each row corresponds to a

node in an RDF document. In this case, each node is of the type dbpedia:Country. They have the properties populationDensity, area, and populationTotal. Some values, however, are missing. Through the use of the functions and implementations in the FunctionHub system, the missing data is inferred by querying for and executing the totalPopulation and populationDensity functions available in the FunctionHub. These functions were found by querying for functions which “solve” the required problems. In this case, two problems needed to be solved: acquiring dbpedia:populationDensity from the other available properties, and acquiring dbpedia:populationTotal from the other available properties.

9.3.3 Result

From the results obtained in this section, we can conclude that the FunctionHub is able to assist in the automation of Linked Data processing. Therefore, hypothesis 3 can be accepted.

9.4 Distribution of storage and responsibility

The fourth and last hypothesis states that following the Linked Data principles enables distribution of storage and responsibility. For the FunctionHub, this concept is mostly reflected in the fact that implementations are not stored on the FunctionHub server itself, but somewhere else on the web. In combination with functions possibly having multiple implementations, this means that a function can still be found and executed even when an implementation becomes unavailable (e.g., a web service goes down or code is made private on GitHub). Additionally, by using technology standards (e.g., SPARQL, JSON-LD), we ensure that the system is open and can be integrated with other systems.

9.4.1 Implementation redundancy

As noted in section 1.1, the way the NPM package manager works can cause problems when highly-used packages get unpublished from the platform. When packages are no longer available through the package manager, other packages that depend on them will fail to operate correctly. Additionally, when development for a package stops, external factors (like changes to a runtime or programming language, or to other packages) could cause the package to no longer function correctly. For highly popular packages, the effects of a package being unpublished or breaking can cause a snowball effect in which many other developers are affected. While NPM has taken countermeasures to prevent such catastrophic failure in the future [35], it remains a problem with no real solution.

By utilizing the FunctionHub system, such problems could be mitigated somewhat, as developers can specify the function their project depends on instead of the implementation (i.e., the package). By doing this, developers do not rely on one implementation working correctly, but rather on any of the implementations linked to that function working correctly. Applying such a system in NPM could have prevented the left-pad incident, provided the left-pad function had multiple implementations: when one developer unpublished their package, other developers’ left-pad implementations could have fulfilled the role of the missing one.

Listing 9.3 shows the description of the left-pad function using FnO. No specific implementation is described in this listing. Listings 9.4 and 9.5 show descriptions of implementations of the left-pad function as an NPM package and as a JavaScript function, respectively, as well as the mappings connecting the function and implementation descriptions.

:leftpad
    a fno:Function ;
    dc:description "String left pad: add spaces to the left of a string until the length of the string is the specified amount."^^xsd:string ;
    fno:expects ( :leftpadParam1 :leftpadParam2 ) ;
    fno:name "left-pad"^^xsd:string ;
    fno:returns :leftpadOutput .

:leftpadParam1
    a fno:Parameter ;
    fno:predicate [
        dc:description "The string to be padded."^^xsd:string ;
        fno:type xsd:string
    ] ;
    fno:required true .

:leftpadParam2
    a fno:Parameter ;
    fno:predicate [
        dc:description "The amount of characters the output should have."^^xsd:string ;
        fno:type xsd:integer
    ] ;
    fno:required true .

:leftpadOutput
    a fno:Parameter ;
    fno:predicate [
        dc:description "The output of the left pad operation."^^xsd:string ;
        fno:type xsd:string
    ] ;
    fno:required true .

Listing 9.3: The left-pad function described using FnO

:leftpadImplementationNPM
    a oo:Module ;
    a fnhub:NpmPackage ;
    doap:name "left-pad" ;
    oo:component :leftpadImplementationNPMLeftpadFunction .

:leftpadImplementationNPMLeftpadFunction
    a oo:ComponentInstance ;
    rdfs:comment "This component is the function that pads the string to the left."^^xsd:string .

:mappingLeftpadLeftpadImplementationNPM
    a fnhub:NpmPackageMapping ;
    fnhub:function :leftpad ;
    fnhub:implementation :leftpadImplementationNPM ;
    fnhub:parameterMapping [
        fnhub:functionParameter :leftpadParam1 ;
        fnhub:implementationParameterPosition 1
    ], [
        fnhub:functionParameter :leftpadParam2 ;
        fnhub:implementationParameterPosition 2
    ] .

Listing 9.4: The NPM left-pad package described using the FunctionHub and Object-Oriented Components ontology, mapping included

:leftpadImplementationJavaScript
    a fnhub:JavaScriptFunction ;
    doap:download-page "http://localhost:4000/implementations/js/leftpad.js" .

:mappingLeftpadLeftpadImplementationJavaScript
    a fnhub:JavaScriptFunctionMapping ;
    fnhub:function :leftpad ;
    fnhub:implementation :leftpadImplementationJavaScript ;
    fnhub:parameterMapping [
        fnhub:functionParameter :leftpadParam1 ;
        fnhub:implementationParameterPosition 1
    ], [
        fnhub:functionParameter :leftpadParam2 ;
        fnhub:implementationParameterPosition 2
    ] .

Listing 9.5: A JavaScript left-pad implementation described using the FunctionHub ontology, mapping included

Using the function description as a dependency, instead of one of these two specific implementations, allows a package manager to make use of either implementation to provide the desired functionality. This, in turn, creates redundancy in functionality, which mitigates the described problem. An analogous problem occurs when projects depend on a web service for certain functionality: when the web service becomes unavailable, the project might not function correctly. By depending on the function the web service provides instead of on the web service itself, the FunctionHub can provide one or several alternatives for the web service that might be unavailable.
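A minimal sketch of how a client could exploit this redundancy is given below; it follows the Java library API from Listing 9.1 and simply tries each mapping of a function until one implementation can be instantiated and executed.

// Try every implementation linked to the function until one succeeds.
public static Object executeWithFallback(Function function) throws Exception {
    Exception lastError = new Exception("No implementations available for this function");
    for (Mapping mapping : function.implementationMappings) {
        try {
            FunctionInstance instance =
                    ImplementationHandler.instantiateFunctionImplementation(function, mapping);
            return instance.executeFunction();
        } catch (Exception e) {
            lastError = e;  // this implementation is unavailable or broken; try the next one
        }
    }
    throw lastError;
}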

9.4.2 Result

In this section, we demonstrated the advantages of using distributed implementations. We determined that these advantages can alleviate the issues which led to hypothesis 4. Therefore, we accept hypothesis 4.

9.5 Discussion

Using the FunctionHub ontology, we created a system that is able to provide functionality that helps to improve a number of different use cases. For a function search engine, the FunctionHub system provides advanced filtering options across programming languages and implementations, and the ability to find multiple implementations of the same function. For a package manager, the system provides similar improvements, which allows developers to find a suitable software package more precisely than existing solutions.

Also for developers, the FunctionHub can be used as a platform for at-runtime discovery and execution of implementations for desired functionality. We showed that by using a different implementation depending on the circumstances, improvements in execution speed can be achieved in certain situations. Switching implementations can be done with little effort if the FunctionHub is used to provide the desired functionality. Lastly, redundancy in implementations for functions is possible, which makes the system robust and avoids a single point of failure for implementations.

In this chapter, we have concluded that all four hypotheses can be accepted. The FunctionHub system is thus a possible solution for the problems stated in section 1.1.

Chapter 10

Conclusion

In the following sections, we form a conclusion about the research that was conducted in this dissertation. Furthermore, we discuss possible further improvements to the FunctionHub ontology and application.

10.1 Conclusion

In this dissertation, we provided a solution for improving the current situation of discovering and using functions on the web using Semantic Web technologies. The FunctionHub ontology allows us to connect abstract functions to specific implementations, which is essential for reaching the goal of abstracted and generalized execution of functions in Semantic Web environments. By using the ontology to create the FunctionHub server and client libraries, this goal has been reached.

The evaluation demonstrated numerous improvements over current solutions for searching for and managing external implementations. It showcased a new method of invoking different types of implementations using a uniform interface. This method can be used to achieve speed gains in applications and enables inferencing new knowledge from Linked Data. The FunctionHub achieved these results through the use of Linked Data and existing Semantic Web technologies, such as SPARQL and JSON-LD. This underlines the openness and distributed nature of the FunctionHub, which allows others to make use of and improve upon the system in its current form.

With this thesis and through a proof-of-concept application, we demonstrated that abstracted function processing is made possible by using Semantic Web technologies. This system has the potential to offer advantages to developers and end users, as well as the Semantic Web in general. Developers can make use of the system for reusing existing implementations, without worrying about the details of these implementations. End users might see speed improvements through dynamically switching implementations depending on the current environment. Finally, abstracted function processing brings us closer to a future where intelligent agents can not only understand the data on the Semantic Web, but act upon it using these functions.

52 10.2 Future Work

Although the FunctionHub reached the goal of providing a solution to the identified problems, further research and development is necessary before it can offer a stable and user-friendly experience. Some of the current problems and limitations are identified in this section.

10.2.1 Versions

Currently, the FunctionHub ontology lacks the concept of ‘versions’. When using the FunctionHub for, e.g., a package manager, this concept should ideally be added. Without a versioning system, it is not possible to create reproducible programs, since implementations can get updated to, e.g., fix bugs or add features, and the updated implementations might be incompatible with the existing code. In [20], research has been done to create reproducible software in the context of NPM packages. A similar system could be added to the FunctionHub to support versions and thus create reproducible software. In this context, adding versioning is mainly an engineering problem, rather than a research problem.

10.2.2 Validation

In this thesis, we built a prototype implementation that assumes that the functions and implementations that are added to it are correct. However, we cannot generally assume that, e.g., the implementations are always free of bugs or do what their description implies. Further research should be done on how to validate the information that users add to the system. An example would be to allow users to report functions and implementations that do not work correctly or have other problems, in order to maintain a repository of high-quality implementations.

Because functions posted to the FunctionHub have a semantic description, machines can reason about them. Thus, it might be possible to ‘test’ an implementation for validity before adding it to the repository. For example: if two implementations are posted for the same function, we would expect the outputs of both implementations to be the same for the same inputs. The system could perform an equality check on these outputs and, if they do not match, one of the two implementations might be wrong or contain a bug. To ensure that the created descriptions are valid, we could make use of the Shapes Constraint Language (SHACL), a language designed to validate RDF graphs against a set of constraints [36]. Further work is necessary to research the feasibility and effectiveness of such systems in the FunctionHub.
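The output-equality check mentioned above could be sketched as follows; the FunctionInstance type follows the Java library of section 8.2, but passing arguments to executeFunction and the exact signatures are assumptions.

// Compare two implementations of the same function on a set of sample inputs.
public static boolean outputsMatch(FunctionInstance first, FunctionInstance second,
                                   Object[][] sampleInputs) throws Exception {
    for (Object[] input : sampleInputs) {
        Object a = first.executeFunction(input);
        Object b = second.executeFunction(input);
        if (!java.util.Objects.equals(a, b)) {
            return false;  // at least one of the implementations is likely wrong or buggy
        }
    }
    return true;
}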

10.2.3 Complex data types

In the scope of this thesis, the FunctionHub’s functions are limited to input and output types that are primitive. These are the most basic types in programming languages, e.g., float, int, string. This limitation prevents us from supporting functions that make use of more complex data structures in their inputs and outputs. These could be standard structures like arrays or trees, but also custom ones, like C structs or Java objects. Research has been done on methods to use custom datatypes in RDF and SPARQL, e.g., [37]. However, further research is necessary to determine how to integrate these datatypes into the FunctionHub’s ontology and process them in the client libraries.

53 10.2.4 Distributed descriptions

In the current system, while implementations themselves are distributed, the function and implementation descriptions are stored in a single database and accessible through a single entrypoint. To ensure better availability and prevent a single point of failure, distribution of these descriptions is desirable. Distributing control over the system over multiple entities additionally ensures that it remains open and allows these entities to extend and improve it. Being distributed on the web is one of the core principles of Linked Data. However, a SPARQL endpoint is not distributed, as it only provides access to the graphs stored on the SPARQL endpoint’s server. Previous research has been done to provide solutions that allow queries to be executed over multiple data sources distributed on the web. Examples of such research are [38] and [39]: [38] presents DARQ, a query engine for federated SPARQL queries, and [39] presents an approach to execute SPARQL queries over the Web of Linked Data. Incorporating such systems into the FunctionHub would mean that not only the implementations themselves are distributed, but also their descriptions and the descriptions of functions. This would eliminate the single point of failure and encourage more openness.

10.2.5 Security

For the prototype designed in this work, security was not taken into account. However, downloading and executing unknown code from the web can be extremely harmful. To address this issue, at least two important measures should be taken. Firstly, a system should be in place that ensures that the implementations added to the FunctionHub do not contain malware. Performing a malware scan on the implementations before making them available through the FunctionHub could be a solution. However, as no malware scanner is infallible, a system of reporting malicious behaviour might also be implemented, which could disable the implementation until it is found to be safe to use. Secondly, we should make sure the implementation that is downloaded has not been tampered with. This could be implemented using digital signatures, which could be added to implementation descriptions. By adding these signatures, clients can verify that the downloaded implementation has not been tampered with on the server or during transport between the server and client.

10.2.6 Non-exact matches

The current search system expects exact queries. Functions that match only part of the description are not returned. A more powerful system might support non-exact matches in order to provide results even if there are typing errors in keywords, or if the datatypes requested as inputs or output are not exactly equal to the datatypes in the available functions. For example: a function accepting floating point numbers as inputs would most likely function exactly the same when integers are used as inputs. In the current system, floats and ints are treated as totally distinct types.
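As a small illustration, a relaxed datatype match could be implemented along these lines; the single widening rule shown here (integers accepted where floats are expected) is purely illustrative.

// Return true when a value of the requested datatype can be passed to a parameter
// declared with the given datatype.
static boolean typeCompatible(String requestedType, String declaredType) {
    if (requestedType.equals(declaredType)) {
        return true;
    }
    // A function declared with xsd:float inputs can usually also handle xsd:integer values.
    return "xsd:integer".equals(requestedType) && "xsd:float".equals(declaredType);
}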

Bibliography

[1] T. Berners-Lee, J. Hendler, and O. Lassila, “The semantic web,” vol. 284, no. 5, pp. 28–37.

[2] Azer Koçulu. I’ve Just Liberated My Modules. [Online]. Available: http://azer.bike/journal/i-ve-just-liberated-my-modules

[3] Danny Ayers, Max Völkel, Leo Sauermann, and Richard Cyganiak, “Cool URIs for the Semantic Web.” [Online]. Available: https://www.w3.org/TR/cooluris/

[4] G. Schreiber and Y. Raimond, “RDF 1.1 Primer.” [Online]. Available: https://www.w3.org/TR/rdf11-primer/

[5] RDF 1.1 Concepts and Abstract Syntax. [Online]. Available: https://www.w3.org/TR/rdf11-concepts/

[6] Fabien Gandon and Guus Schreiber, “RDF 1.1 XML Syntax.” [Online]. Available: https://www.w3.org/TR/rdf-syntax-grammar/

[7] David Beckett, Tim Berners-Lee, Eric Prud’hommeaux, and Gavin Carothers, “RDF 1.1 Turtle.” [Online]. Available: https://www.w3.org/TR/turtle/

[8] Tim Berners-Lee and Dan Connolly, “Notation3 (N3): A readable RDF syntax.” [Online]. Available: https://www.w3.org/TeamSubmission/n3/

[9] David Beckett, “RDF 1.1 N-Triples.” [Online]. Available: https://www.w3.org/TR/n-triples/

[10] Manu Sporny, Dave Longley, Gregg Kellogg, Markus Lanthaler, and Niklas Lindström, “JSON-LD 1.0.” [Online]. Available: https://www.w3.org/TR/json-ld

[11] Lee Feigenbaum. SPARQL By Example. [Online]. Available: https://www.w3.org/2009/Talks/0615-qbe/

[12] Roberto Chinnici, Jean-Jacques Moreau, Arthur Ryman, and Sanjiva Weerawarana, “Web Services Description Language (WSDL) Version 2.0 Part 1: Core Language.” [Online]. Available: https://www.w3.org/TR/wsdl20/

[13] Marc Hadley, “Web Application Description Language.” [Online]. Available: https://www.w3.org/Submission/wadl/

[14] S. A. McIlraith, T. C. Son, and H. Zeng, “Semantic Web Services,” vol. 16, no. 2, pp. 46–53. [Online]. Available: http://dx.doi.org/10.1109/5254.920599

[15] OWL-S: Semantic Markup for Web Services. [Online]. Available: https://www.w3.org/Submission/OWL-S/

[16] R. Verborgh, T. Steiner, D. Van Deursen, S. Coppens, J. G. Vallés, and R. Van de Walle, “Functional Descriptions As the Bridge Between Hypermedia APIs and the Semantic Web,” in Proceedings of the Third International Workshop on RESTful Design, ser. WS-REST ’12. ACM, pp. 33–40. [Online]. Available: http://doi.acm.org/10.1145/2307819.2307828

[17] R. Verborgh, T. Steiner, D. Van Deursen, J. De Roo, R. Van de Walle, and J. Gabarró, “Capturing the functionality of Web services with functional descriptions.”

[18] M. Lanthaler and C. Guetl, “Hydra: A Vocabulary for Hypermedia-Driven Web APIs,” vol. 996.

[19] R. Gardler. Project Catalogues and Project Descriptors using DOAP. [Online]. Available: http://oss-watch.ac.uk/resources/doap

[20] R. Taelman, J. Van Herwegen, S. Capadisli, and R. Verborgh, “Reproducible software experiments through semantic configurations.” [Online]. Available: https://linkedsoftwaredependencies.org/articles/reproducibility/

[21] B. De Meester, A. Dimou, R. Verborgh, and E. Mannens, “An ontology to semantically declare and describe functions,” in International Semantic Web Conference. Springer, pp. 46–49. [Online]. Available: http://link.springer.com/chapter/10.1007/978-3-319-47602-5_10

[22] M. Atzeni and M. Atzori, “CodeOntology: RDF-ization of Source Code,” in International Semantic Web Conference. [Online]. Available: https://iswc2017.ai.wu.ac.at/wp-content/uploads/2017/10/271.pdf

[23] C. Debruyne and D. O’Sullivan, “R2RML-F: Towards Sharing and Executing Domain Logic in R2RML Mappings,” in LDOW@WWW. [Online]. Available: https://pdfs.semanticscholar.org/b4f5/6a6c75bfb97b751acd90ca840fc2ae892e70.pdf

[24] B. Regalia, K. Janowicz, and S. Gao, “VOLT: A provenance-producing, transparent SPARQL proxy for the on-demand computation of linked data and its application to spatiotemporally dependent data,” in International Semantic Web Conference. Springer, pp. 523–538. [Online]. Available: http://link.springer.com/chapter/10.1007/978-3-319-34129-3_32

[25] B. De Meester, W. Maroy, A. Dimou, R. Verborgh, and E. Mannens, “Declarative Data Transformations for Linked Data Generation: The Case of DBpedia,” in The Semantic Web, ser. Lecture Notes in Computer Science. Springer, Cham, pp. 33–48. [Online]. Available: https://link.springer.com/chapter/10.1007/978-3-319-58451-5_3

[26] M. de Jonge, “Package-based software development,” in 2003 Proceedings 29th Euromicro Conference, pp. 76–85.

[27] I. Garcia-Contreras, J. F. Morales, and M. V. Hermenegildo, “Semantic code browsing,” vol. 16, pp. 721–737, wOS:000386589800014.

[28] S. P. Reiss, “Semantics-based Code Search,” in Proceedings of the 31st International Conference on Software Engineering, ser. ICSE ’09. IEEE Computer Society, pp. 243–253. [Online]. Available: http://dx.doi.org/10.1109/ICSE.2009.5070525

[29] N. Mitchell, “Hoogle overview,” vol. 12, pp. 27–35.

[30] R. Fielding and J. Reschke. Hypertext Transfer Protocol (HTTP/1.1): Semantics and Content. [Online]. Available: https://tools.ietf.org/html/rfc7231#section-4

[31] B. De Meester, A. Dimou, R. Verborgh, E. Mannens, and R. Van de Walle, “Discovering and using functions via content negotiation,” in Proceedings of the 15th International Semantic Web Conference: Posters and Demos. CEUR-WS, pp. 1–4. [Online]. Available: http://hdl.handle.net/1854/LU-8133280

[32] Tim Berners-Lee. Linked Data - Design Issues. [Online]. Available: https://www.w3.org/DesignIssues/LinkedData.html

[33] G. Atemezing, “Benchmarking Commercial RDF Stores with Publications Office Dataset,” p. 15.

[34] D. Stufflebeam, “Evaluation Models,” vol. 2001, no. 89, pp. 7–98. [Online]. Available: https://onlinelibrary.wiley.com/doi/abs/10.1002/ev.3

[35] Isaac Z. Schlueter. Kik, left-pad, and npm. [Online]. Available: https://blog.npmjs.org/post/141577284765/kik-left-pad-and-npm

[36] Holger Knublauch and Dimitris Kontokostas, “Shapes Constraint Language (SHACL).” [Online]. Available: https://www.w3.org/TR/shacl/

[37] M. Lefrançois and A. Zimmermann, “Supporting Arbitrary Custom Datatypes in RDF and SPARQL,” in The Semantic Web. Latest Advances and New Domains, ser. Lecture Notes in Computer Science. Springer, Cham, pp. 371–386. [Online]. Available: https://link.springer.com/chapter/10.1007/978-3-319-34129-3_23

[38] B. Quilitz and U. Leser, “Querying Distributed RDF Data Sources with SPARQL,” in The Semantic Web: Research and Applications, ser. Lecture Notes in Computer Science. Springer, Berlin, Heidelberg, pp. 524–538. [Online]. Available: https://link.springer.com/chapter/10.1007/978-3-540-68234-9_39

[39] O. Hartig, C. Bizer, and J.-C. Freytag, “Executing SPARQL Queries over the Web of Linked Data,” in The Semantic Web - ISWC 2009, ser. Lecture Notes in Computer Science. Springer, Berlin, Heidelberg, pp. 293–309. [Online]. Available: https://link.springer.com/chapter/10.1007/978-3-642-04930-9_19

Appendix A

FunctionHub Ontology Reference

A.1 Function, Parameter and Output

Function, Parameter and Output are classes from the ontology FnO. More information about this ontology and its classes can be found in [21] and on https://fno.io/.

A.2 Problem

Problem is a class from FnO; however, additions have been made to this class for the purpose of this thesis. These additions can be found in the following table:

Predicate      Range           Description
fnhub:input    rdf:List        List of this problem’s input datatypes
fnhub:output   rdfs:Datatype   This problem’s output datatype

A.3 Implementation

A.3.1 JavaClass

Predicate            Range        Description
doap:download-page   xsd:string   The URL at which the Java class can be downloaded
fnhub:class-name     xsd:string   Java class name

A.3.2 JavaScriptFunction

Property             Range        Description
doap:download-page   xsd:string   The URL at which the JavaScript function can be downloaded (file contents: exports = function(...){...})

A.3.3 NpmPackage

The NpmPackage class is equivalent to the Module class of the Object-Oriented Components ontology. For the available predicates for this class, see http://componentsjs.readthedocs.io/en/latest/configuration/modules/.

A.3.4 JsonApi

The JsonApi class is equivalent to the ApiDocumentation class of the Hydra ontology. For the available predicates for this class, see https://www.hydra-cg.com/spec/latest/core/.

A.4 Mapping

The following table contains the properties that are common to all Mapping classes. Properties that are specific to a subclass of Mapping are specified in this section’s subsections.

Property                 Range                    Description
fnhub:function           fno:Function             The function that is connected to an implementation in this mapping
fnhub:implementation     fnhub:Implementation     The implementation that is connected to a function in this mapping
fnhub:parameterMapping   fnhub:ParameterMapping   The mapping of the Function’s expects Parameters to an implementation’s parameters

A.4.1 JavaClassMapping

Property            Range        Description
fnhub:method-name   xsd:string   The name of the method in the Java class that executes the function

A.4.2 JavaScriptFunctionMapping

No additional properties.

A.4.3 NpmPackageMapping

No additional properties.

A.4.4 JsonApiMapping

Property              Range                    Description
fnhub:operation       hydra:Operation          The Hydra Operation that executes the function
fnhub:outputMapping   fnhub:ParameterMapping   The mapping of the Function’s returns Output to the JSON property containing the output

A.5 ParameterMapping

The following table contains the properties that are common to all ParameterMapping classes. Properties that are specific to a subclass of ParameterMapping are specified in this section’s subsections.

Property                  Range           Description
fnhub:functionParameter   fno:Parameter   The function parameter this ParameterMapping refers to

A.5.1 PositionParameterMapping

Property                                Range     Description
fnhub:implementationParameterPosition   xsd:int   The position of the implementation parameter that this ParameterMapping relates to

A.5.2 PropertyParameterMapping

Property                       Range        Description
fnhub:implementationProperty   xsd:string   The property of a JSON object this ParameterMapping relates to
