Linköping University | Department of Computer and Information Science Bachelor thesis, 16 ECTS | Computer Engineering 2019 | LIU-IDA/LITH-EX-G--19/001--SE

Detecting Cycles in GraphQL Schemas

Jonas Lind
Kieron Soames

Supervisor: Patrick Lambrix
Examiner: Olaf Hartig

Linköpings universitet, SE-581 83 Linköping, +46 13 28 10 00, www.liu.se

Copyright

The publishers will keep this document online on the Internet – or its possible replacement – for a period of 25 years starting from the date of publication barring exceptional circumstances. The online availability of the document implies permanent permission for anyone to read, to download, or to print out single copies for his/her own use and to use it unchanged for non-commercial research and educational purposes. Subsequent transfers of copyright cannot revoke this permission. All other uses of the document are conditional upon the consent of the copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security and accessibility. According to intellectual property law the author has the right to be mentioned when his/her work is accessed as described above and to be protected against infringement. For additional information about the Linköping University Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its www home page: http://www.ep.liu.se/.

© Jonas Lind, Kieron Soames

Abstract

GraphQL is a database handling API created by Facebook that provides an effective alternative to REST-style architectures. GraphQL gives a client the ability to specify exactly what data it wishes to receive. A problem with GraphQL is that the freedom of creating customized requests allows data to be included several times in the response, growing the response's size exponentially. The thesis contributes to the field of GraphQL analysis by studying the prevalence of simple cycles in GraphQL schemas. We have implemented a locally run tool and a web tool, using Tarjan's and Johnson's algorithms, that parses the schemas, creates a directed graph and enumerates all simple cycles in the graph. A collection of schemas was analysed with the tool to collect empirical data. It was found that 39.73 % of the total 2094 schemas contained at least one simple cycle, with the average number of cycles per schema being 4. The runtime was found to be 11 milliseconds on average, most of which was spent parsing the schemas. It was found that 44 of the considered schemas could not be enumerated due to containing a staggering amount of simple cycles. It can be concluded that it is possible to test schemas for cyclicity and to enumerate all simple cycles in a given schema efficiently.

Acknowledgments

We would like to thank Olaf Hartig for the support and the opportunity to take part in your research; we hope you find the results as interesting as we did. Another huge thank you goes out to Yun Wan Kim from Toronto for providing the collection of GraphQL schemas. We would also like to thank all friends and family that have supported us in our studies and in the writing of this thesis. Finally, thank you to our opponents Daniel Dzabic and Jacob Mårtensson for the insightful critique, and for your patience.

Contents

Abstract iii

Acknowledgments iv

Contents v

List of Figures vii

List of Tables viii

1 Introduction 1
1.1 Background ...... 2
1.2 Motivation ...... 2
1.3 Research questions ...... 3
1.4 Delimitations ...... 3

2 Theory 4
2.1 GraphQL ...... 4
2.2 Graph Theory ...... 7
2.3 Algorithms ...... 8

3 Method 18
3.1 Issues to be addressed ...... 18
3.2 Comparing the algorithms ...... 18
3.3 Creating the cycle detecting tool ...... 19
3.4 Web application ...... 21
3.5 Testing the implementation ...... 23
3.6 Empirical evaluation ...... 23

4 Results 24
4.1 Adjustments of the schema converter ...... 24
4.2 Runtime ...... 24
4.3 Distribution of Types ...... 25
4.4 Cycles and SCCs in schemas ...... 25

5 Discussion 33
5.1 Implementation ...... 33
5.2 Cycles in schemas ...... 34
5.3 Evaluation of the Method ...... 35

6 Conclusion 36
6.1 The work in a wider context ...... 36
6.2 Future work ...... 36

Bibliography 37

A Appendix 38
A.1 Git repositories ...... 38

List of Figures

2.1 An example of some of the more important GraphQL schema types. ...... 5
2.2 An example of a GraphQL query and the expected result. ...... 7
2.3 The graph illustrated is an example of a directed multigraph. The edges are illustrated as arrows and are unidirectional, leading from a vertex to another. ...... 7
2.4 An example of a directed multigraph that contains multiple SCC's. ...... 8
2.5 An example SCC that contains 2 cycles (1 → 2 → 3 → 1 and 2 → 4 → 5 → 2). ...... 13
2.6 An example illustrating the operation of the Weinblatt algorithm. ...... 16

3.1 An example of a GraphQL schema. This schema is the schema for the graph shown in Figure 2.4. ...... 20
3.2 An illustration of how the data is structured when converted to a graph. ...... 21

4.1 Average and median runtime of the cycle detection tool for the collection of schemas. Note that none of the edge case schemas are represented in this graph. ...... 25
4.2 Runtime for each schema (without the edge case schemas), where the time is divided into the three phases of the cycle detection tool. ...... 26
4.3 The average distribution of vertex types in graphs. ...... 26
4.4 Proportion of schemas that contained cycles versus those that did not. ...... 27
4.5 The distribution of cycles in non-edge case schemas. Only the schemas that contain cycles are included. ...... 28
4.6 a) The distribution of cycles compared to the number of edges in schemas. b) The distribution of cycles compared to the number of vertices in schemas. ...... 29
4.7 The distribution of cycles compared to the ratio between edges and vertices in schemas. ...... 30
4.8 The amount of cycles in schemas compared to the size of the largest SCC in each schema. ...... 30
4.9 Screenshot of the web application before the user has entered a GraphQL schema to be evaluated. ...... 31
4.10 Screenshot of the web application after the user has entered a schema. The application evaluated the schema and produced a directed graph, and all simple cycles are presented to the user in the leftmost text-area. ...... 32

A.1 An illustration of the general layout of the git repositories, branches, forks and how they are connected for the web application as well as for the locally run tool. ...... 39

List of Tables

4.1 Average number of vertices and edges, average connectivity, and the average size of the largest SCC in the evaluated schemas, clustered by the number of cycles contained in the schemas. ...... 28

1 Introduction

Data transfers in a client-to-server relationship have been around since the first websites started to emerge in the 1990s. However, the way clients and servers communicate has been a topic of discussion and research for a long time. In the early 2000s the Representational State Transfer (REST) architectural style, coined by Fielding [4], started to become the most prevalent solution to how developers allowed clients to access websites, and it remains the most common solution to the current day. REST-style architectures gave website developers powerful tools that allow users to request, send or modify data on the server by sending GET, POST, PUT or DELETE requests over predefined paths. Using a REST-style architecture allows a client visiting a website to be provided with the information the client needs, when the client needs it. The client can also interact with the website in more ways than just retrieving information. For example, a client could enter a username and a password into a form to create a profile on a website that persists the client's information.

However, as web services started to increase in scope and with the rise of social media web applications, the amount of data sent to and from modern web servers increased drastically. The increase in data traffic exposed the limitations of using a REST-style architecture, since it uses predefined static paths from the client to the server. In a REST-style architecture, a datapoint is a node from which data can be retrieved, e.g. www.website.com/datapoint-name, where datapoint-name is one of the many datapoints that a website could have. A major limitation of REST-style architectures lies in the concepts of over-fetching and under-fetching. Over-fetching occurs when the client requests data from a datapoint that, in addition to the new data the client needs, provides data that is already known to the client. The already known data is then sent unnecessarily, forcing the client to gather a lot of redundant data. Under-fetching occurs when no single datapoint allows the client to retrieve all wanted data in one request, forcing the client to query several more datapoints in order to access all the data. A way to circumvent the problem of over/under-fetching is to create more granular and specific datapoints, but at some point the additional datapoints become unwieldy and costly for the developers of the web application to construct and maintain.

Facebook realised the limitations of using a REST-style architecture for large-scale web applications and began work on their own database handling API. The API was named


GraphQL and was meant to address the shortcomings of REST-style architectures. Every implementation of GraphQL contains a schema that describes how all data in the database is related. GraphQL also allows the client more control over what data it requests, by working as a layer on top of the original database implementation. GraphQL collects all data from the underlying database, and then sends only the data that the client requested over a single HTTPS request [6]. However, the added freedom of formulating requests, or queries as they are called in GraphQL, potentially leads to troublesome behaviour, as described by Hartig and Pérez [5]. A client can request data that, in the schema, is related to other data in a cyclic fashion, allowing the response from the server to include individual data more than once; it is therefore possible for the response to grow exponentially while the request grows linearly.

There exist countermeasures, but these work more as damage control than as attempts to eliminate the root cause. Damage mitigation can be time constrained, where GraphQL aborts the creation of a response if it would take longer than a certain number of seconds. GraphQL can also abort a response if the response is more than a set number of levels deep, where depth is measured in the number of references in the schema that have to be followed [6]. However, an aborted response is always resources spent in vain, especially if a server needs to spend several seconds on a discarded request.
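To illustrate the growth described above, the following sketch (with entirely hypothetical data and a branching factor of 2) simulates how a response grows when a query follows a cyclic relationship one level deeper at a time:

```python
# A sketch with hypothetical data: how a cyclic relationship in a schema
# lets a linearly growing query produce an exponentially growing response.
# Assume a type Human whose "friends" field refers back to Human, and that
# every human has two friends.

def resolve(depth, branching=2):
    """Build the nested response for a friends-query nested `depth` levels."""
    if depth == 0:
        return {"name": "someone"}
    return {"name": "someone",
            "friends": [resolve(depth - 1, branching) for _ in range(branching)]}

def size(response):
    """Count the objects contained in a response."""
    return 1 + sum(size(friend) for friend in response.get("friends", []))

# The query grows by one nesting level at a time (linear), but the
# response roughly doubles with every added level (exponential).
for depth in range(1, 6):
    print(depth, size(resolve(depth)))
```

With branching factor 2 the response contains 2^(depth+1) − 1 objects, which is why depth limits alone only cap, rather than remove, the blow-up.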

1.1 Background

Since its public release in 2016, GraphQL has provided web application and website developers with a flexible client-to-server data transfer API that can easily be implemented on top of existing deployments. The main feature of GraphQL is the client's ability to dynamically specify what data it wants the server to provide, ensuring that requests do not over- or under-fetch [6]. The server can also return several pieces of data, which need not be directly related to each other, in a single response, reducing the number of requests the client needs to make and further saving computational and processing resources. Another benefit GraphQL provides is the possibility of both requesting data and modifying other data in the same request, thereby reducing the total number of requests considerably.

1.2 Motivation

GraphQL provides significant improvements on how a client requests information from a server, but the possibility of creating queries that prompt exponentially sized responses is a major issue being studied. Currently, however, there exist no major empirical investigations of how widespread the existence of cycles in GraphQL schemas is, or of what schemas typically look like. Alongside not knowing how widespread the problem of cyclic relationships in GraphQL schemas is [5], there exist, at the time of writing, no tools that developers can use to enumerate cycles in a GraphQL schema. Since no such tools exist, developers intending to use GraphQL to handle the communication with their database need to be very observant of what types of queries could create exponential responses. Developers also need to keep in mind which relationships in their GraphQL schemas allow cycles to be formed, by keeping track of the database manually. Manually searching for and eliminating cycles, or implementing safety checks, is a valid way of working on small-scale projects. But on large web applications with massive amounts of data, it is harder to get a good overview of the project, making it troublesome to pinpoint where the problem of cyclic relationships may appear.


1.3 Research questions

One of the aims of this thesis is to implement a tool that makes it possible for developers to analyse their GraphQL schemas in order to find cyclic relationships. The tool will be implemented both as a program run locally on a machine and as a web application allowing schema developers to test their schemas for cyclicity. In the process of implementing this tool, we also intend to provide researchers with empirical data on how widespread the problem of cyclic relationships is. The aims outlined above can be summarized into these three questions:

1. Can a GraphQL schema efficiently be tested for cyclicity?

2. How can all cycles in a GraphQL schema be identified efficiently?

3. What are the characteristics of existing GraphQL schemas in terms of cyclicity and cy- cles?

1.4 Delimitations

Since GraphQL is a large framework, some limitations have to be set on the thesis in order to answer all three research questions.

1. Since the thesis does not have a primary focus on web design, the website will first and foremost be functional, serving as the practical implementation of the thesis, which will later be available as a tool for developers.

2. Because the focus is on detecting cycles in a GraphQL schema, the tool developed will assume that the syntax of the provided schema is correct and will not evaluate any syntax or other errors beforehand.

3. Since the tool is meant to be implemented in a web application, scripting languages such as JavaScript or Python will be used.

2 Theory

In this chapter we discuss what GraphQL is and what its uses are, as well as some of the theoretical concepts we need to take into consideration when developing our cycle detection tool. In the closing sections of the chapter we introduce several algorithms, one or more of which will be implemented in our tool in subsequent chapters.

2.1 GraphQL

GraphQL started as Facebook's internal API for data requesting, manipulation and communication in a client-to-server relationship. GraphQL was developed to work with pre-existing databases such as SQL databases. It started to be used in production in 2012, with a release to the general public in 2015 as a technical preview. GraphQL finally became available as open source in 2016 after it exited the technical preview phase [6].

2.1.1 GraphQL's purpose

The objective of GraphQL is to present an alternative to REST-style architectures for transmitting data in a client-to-server relationship. The main point to improve on is the way a client can request data from the server. A REST-style interface allows a client to send static requests through predefined paths on the server, whereby the server answers with all the data related to the requested path. An example of this would be a client requesting information about a user: the client sends a GET request over a predefined path on the server, and the server, if possible, responds with a static response containing all data related to the user. GraphQL improves on the cases where the client already has some of the data it needs and wishes to request only the data it does not yet have access to. This case is solved by allowing the client to customise what data it wishes to receive. GraphQL allows a client to select and retrieve data from multiple data structures in a single request by doing a so-called sub-selection [6]. The possibility to specify what data a client wishes to receive, together with the ability to construct sub-selections, allows the client to avoid over- or under-fetching of data. But by giving the client the ability to customize exactly what it wishes to receive, there exist edge cases where the client can create server responses that grow exponentially in size by exploiting cyclic relationships in the schema.


2.1.2 GraphQL Schema

Every implementation of GraphQL contains a file called a GraphQL schema, or schema for short. A schema contains a representation of how the data objects that can be accessed via the GraphQL API are related to each other. These relations are built upon the properties of the objects that refer to other objects; when objects are related, it is said that they have a relationship. The schema also contains information on how a server that implements the schema resolves requests. A schema can contain several objects that have relationships.

interface Character {
  id: ID!
  name: String!
  age: Int!
}

type Human implements Character {
  id: ID!
  name: String!
  age: Int!
  height(unit: METER): Float
  ownsCar: [Car]
  work: Occupations
}

type Pet implements Character {
  id: ID!
  name: String!
  age: Int!
  breed: String!
}

type Car {
  id: ID!
  colour: String!
  yearModel: Int!
}

union Friends = Human | Pet

type Query {
  getCharacter(id: ID!): Character
  getHuman(id: ID!): Character
}

enum Occupations {
  Doctor
  Programmer
  Pensioner
}

Figure 2.1: An example of some of the more important GraphQL schema types.


2.1.2.1 Types in schema

GraphQL schemas contain several different structures called types. Types are the main building blocks of a schema and they define how the data is structured. All types can have several attributes called fields. Fields can hold standard values such as integers or strings, or a unique ID, and can contain either a single value or a list of values. Fields can also be references to other types in the schema, which is the most important aspect of the schema in this thesis. The most important types in GraphQL schemas are:

• Object Types
Object types are used by the GraphQL implementation to organise which fields belong together, thereby allowing clients to request more complex data structures than individual fields. Object types can contain fields that take arguments; the passing of arguments allows the client to, for example, query the length of the data in a field and specify whether the return value should be in feet or meters [6]. The data in such a field could represent the distance between cities. What queries are will be explained further below. A practical example of an object type is the "Car" type shown in Figure 2.1. The type "Car" has the fields "id", "colour" and "yearModel".

• Interface Types
An interface is a type that functions as an umbrella term for multiple object types that share most attributes and could in most circumstances be considered the same type. An example of this is the interface "Character" in Figure 2.1, which is implemented by both the type "Pet" and the type "Human". A field could therefore reference both "Pet" and "Human" as "Character", since there might be no need to differentiate between them as they share some fields. If only attributes from one of the types are needed, it is possible to specify that particular type like any other type. All attributes in the interface need to be included in all types that implement the interface; in Figure 2.1 this includes "id", "name" and "age". Using an interface instead of a type does not limit another type's, or another interface's, ability to reference either the types that implement the interface or the interface itself. It is, however, not possible for an interface to implement another interface as of the time of writing.

• Union Types
A union allows fields to reference more than one type simultaneously, much like how interfaces work. However, a union does not contain fields of its own and cannot be implemented by object types. Unions are more commonly used to allow a field to reference multiple object types. An example of this is the union "Friends" in Figure 2.1, which combines the two types "Human" and "Pet".

• Queries and Mutations
GraphQL queries are sent in a syntax resembling JSON (JavaScript Object Notation), but the results are returned in actual JSON. A query is structured in layers, with each layer having a root object that encapsulates the layer. The root object in the example query, see Figure 2.2, is "getHuman(id: "1")", which contains all other information requested. In the same example, the "ownsCar" field is an example of the sub-selection mentioned in Section 2.1.1. A mutation operates in much the same way as a query but with the additional feature of modifying the data stored in a database. This makes it possible to modify the data stored in the database as well as return the modified data, indicating what was changed [6].

• Custom defined Types
These include types such as scalars. There already are examples of this type, such as Integer, Float and Boolean, and it is possible to define custom additional types to


{ { " data " : { getHuman(id: "1") { "getHuman" : { name "name" : "John Smith" ownsCar { "ownsCar" : { colour "colour" : "Blue" yearModel "yearModel" : 2018 } } } } } } }

Figure 2.2: An example of a GraphQL query and the expected result.

be used when the pre-defined types are unsuitable. A special kind of scalar is the enum; enums restrict the possible values of a field to one of a finite number of pre-defined values. A practical use for enums is, for example, "Occupations" defined in Figure 2.1 and referenced in the type "Human". "Occupations" is used to specify which of the, in this example, three occupations a "Human" occupies, be it "Doctor", "Programmer" or "Pensioner".
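The relationships described by a schema such as the one in Figure 2.1 can be turned into a directed graph. The following is a hypothetical sketch, not the converter built in this thesis: it recognises only a simplified `type X { field: Y }` grammar, while a real parser must also handle interfaces, unions, arguments and comments.

```python
import re

# Hypothetical sketch: extract type-to-type references from a schema string
# with regular expressions, producing a directed graph as an adjacency list.
SCHEMA = """
type Human { name: String! ownsCar: [Car] }
type Car { owner: Human }
"""

SCALARS = {"Int", "Float", "String", "Boolean", "ID"}

def schema_to_graph(schema):
    graph = {}
    for name, body in re.findall(r"type\s+(\w+)\s*\{([^}]*)\}", schema):
        # A field like `ownsCar: [Car]` references the type Car; scalar
        # fields cannot take part in cycles and are skipped.
        graph[name] = [t for t in re.findall(r":\s*\[?(\w+)", body)
                       if t not in SCALARS]
    return graph

print(schema_to_graph(SCHEMA))  # {'Human': ['Car'], 'Car': ['Human']}
```

The resulting adjacency list is exactly the structure the graph algorithms of the next sections operate on.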

2.2 Graph Theory

In order to find out if a GraphQL schema contains cyclic relationships, and to enumerate all such relationships, one must first create a mathematical structure that can represent the schema and the relationships described in it. In graph theory there exists a type of graph called a directed graph. A directed graph is composed of vertices (singular: vertex) and edges. If an edge leads from one vertex to another, then those vertices are said to be related, and if an edge leads from a vertex back to itself, the edge is called a loop. A vertex contains a loop if an edge originates and terminates in that same vertex; see vertex H in Figure 2.3. In a directed graph all edges are unidirectional: the relationships only go one way. If a directed graph contains vertices that have multiple edges that all lead to the same vertices, the graph is called a directed multigraph, according to Asratian et al. [1].


Figure 2.3: The graph illustrated is an example of a directed multigraph. The edges are illus- trated as arrows and are unidirectional, leading from a vertex to another.

2.2.1 Cycles

Before explaining what cycles are, one first needs to define the term path. A path in a directed graph can be defined as a sequence of vertices, connected via edges, that are distinct from one another. Contained in a directed graph, a cycle is a path that starts from


the same vertex in which it ends. Formally, a path (v1, e1, v2, e2, ..., en−1, vn) with n ≥ 1 is a cycle if v1 = vn, as explained by Asratian et al. [1]. A simple cycle is defined as a cycle where no vertex except the start and end vertex may be present more than once. The simple cycles in Figure 2.3 are:

• A → B → C → D → A
• F → G → F
• E → F → G → E
• H → H

Note that there are also cycles that do not satisfy the requirement of simple cycles. Such a cycle could be formed by E → F → G → F → G → E. But since the requirement of simple cycles is that no vertex except the start/end vertex occurs more than once in the cycle, such cycles will not be considered when studying simple cycles. Every non-simple cycle may be broken down into simple cycles.
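The cycles above can also be checked programmatically. The sketch below assumes an edge set reconstructed from the cycles just listed (the actual figure may contain further edges) and enumerates simple cycles with a brute-force depth-first search:

```python
# A sketch of brute-force simple-cycle enumeration on a graph shaped like
# Figure 2.3. The edge set is an assumption consistent with the cycles
# listed above; vertex I is included with no outgoing edges.
GRAPH = {
    "A": ["B"], "B": ["C"], "C": ["D"], "D": ["A"],
    "E": ["F"], "F": ["G"], "G": ["F", "E"],
    "H": ["H"], "I": [],
}

def simple_cycles(graph):
    """Enumerate simple cycles with a DFS from every vertex. Restricting
    the search to vertices greater than the start vertex reports each
    cycle exactly once (from its smallest vertex) instead of once per
    rotation."""
    cycles = []
    def dfs(start, v, path):
        for w in graph.get(v, []):
            if w == start:
                cycles.append(path + [start])   # closed a simple cycle
            elif w not in path and w > start:
                dfs(start, w, path + [w])
    for v in sorted(graph):
        dfs(v, v, [v])
    return cycles

for cycle in simple_cycles(GRAPH):
    print(" -> ".join(cycle))
```

Brute force like this is exponential in the worst case, which is precisely why the dedicated algorithms of Section 2.3 matter.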

2.2.2 Strongly connected components

A strongly connected component (SCC) can be defined as a subgraph of a graph G in which every vertex can reach all other vertices of the same SCC. The directed multigraph illustrated in Figure 2.4 can be divided into these SCC's:

• { A, B, C }
• { D, E }
• { F }

Even if D is reachable from C, there is no path that leads back to C from vertex D. Therefore, C and D are, by definition, not part of the same SCC. Furthermore, since F does not have any edges leading to other vertices, it is considered its own SCC, according to Tarjan [10]. The reason for studying SCC's when searching for simple cycles is that any SCC with more than one vertex (or with a self-loop) must contain at least one simple cycle. Moreover, since the definition of an SCC demands that all vertices are reachable from an arbitrary vertex inside the SCC, there cannot exist cycles that span from one SCC to another. Resources spent searching for such cycles are therefore spent in vain.


Figure 2.4: An example of a directed multigraph that contains multiple SCC’s.
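The division into SCC's can be computed with Tarjan's algorithm, described in Section 2.3.3. As a preview, here is a minimal sketch run on a graph shaped like Figure 2.4; the exact edge set is an assumption consistent with the SCC's listed above.

```python
# A sketch of Tarjan's SCC algorithm [10]. The edge set below is assumed:
# it yields the SCC's {A, B, C}, {D, E} and {F} of Figure 2.4.
GRAPH = {
    "A": ["B"], "B": ["C"], "C": ["A", "D"],
    "D": ["E"], "E": ["D", "F"], "F": [],
}

def tarjan_scc(graph):
    """Return the strongly connected components of `graph` as a list of sets."""
    index, low, on_stack = {}, {}, set()
    stack, sccs, counter = [], [], [0]

    def strongconnect(v):
        index[v] = low[v] = counter[0]       # discovery index and low-link
        counter[0] += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph[v]:
            if w not in index:               # tree edge: recurse
                strongconnect(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:              # back edge into the current SCC
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:               # v is the root of an SCC
            scc = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                scc.add(w)
                if w == v:
                    break
            sccs.append(scc)

    for v in graph:
        if v not in index:
            strongconnect(v)
    return sccs
```

Each vertex and edge is visited once, which gives the linear O(V + E) bound that makes the SCC preprocessing step cheap.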

2.3 Algorithms

Finding and enumerating all cycles requires adequate algorithms for that purpose. When choosing which algorithms to employ in the implementation of the cycle detection tool, the strengths and weaknesses of a number of candidate algorithms needed to be evaluated. These candidates, which could be used to answer the first and second research questions, included Tiernan's [11], Tarjan's [10], Johnson's [2] and Weinblatt's [12] algorithms.

8 2.3. Algorithms

2.3.1 Tiernan's algorithm

Tiernan's algorithm [11], published in 1970, was intended to find all "elementary circuits", or simple cycles as they are called in this paper, in a directed graph. Tiernan's algorithm has a time complexity of O(V · C^V) and a space bound of V + E, where V is the number of vertices, E is the number of edges and C is a constant, according to Mateti and Deo [9]. The algorithm uses three arrays: first, a one-dimensional array P (1 × C, where C is the number of cycles found) that is filled with all paths that are found to be cycles; second, a two-dimensional array H where blocked vertices are stored when finding cycles; vertices in H are unavailable during the search because they have already been explored; third, a two-dimensional array G that represents the graph, where each row is a vertex and each column in that row represents an edge to a neighbouring vertex. The pseudo-code implementation of Tiernan's algorithm can be found in Algorithm 1. Tiernan's algorithm is initialized by choosing a start-vertex that is subsequently indexed as 1 and inserted into the stack representing the current path P (lines 1 to 5 in Algorithm 1). The algorithm proceeds by extending P using a depth-first search down the vertices that are reachable via an outgoing edge from the start-vertex. This step is what Tiernan called a path extension [11], and can be seen at lines 7 to 14 in Algorithm 1. An extension of P can only be performed if the candidate vertex for extension meets these conditions:

1. It is not currently included in P.

2. The vertex’s index must be higher than the starting-vertex’s in P.

3. It is not blocked, i.e. not included in the array H.

The algorithm will extend outward from the start-vertex to its neighbours, then their neighbours, and so on, as long as the conditions outlined above are met. When no more extensions are possible, a circuit confirmation takes place, as described by Tiernan [11]; this can be seen at lines 17 to 22. In this paper, however, circuit confirmation is called cycle confirmation. A cycle confirmation consists of checking whether the last vertex in P has an edge back to the start-vertex, as seen at line 18. If such an edge exists, a cycle has been found. The algorithm then enters the vertex closure step, as seen at line 24. After the initial cycle is found, and to minimize unnecessary searching, the algorithm tries to find cycles that mostly consist of the same path as the currently found cycle. This is done by backtracking from the last vertex, blocking the vertex in the process, and then trying to extend the path via another vertex (lines 28 to 30). If no outgoing edge is found, the backtracking continues until one is found or the algorithm has returned to the start-vertex. The vertex closure step operates in accordance with these conditions:

1. Block the last vertex (n) for the next-to-the-last vertex (n - 1) in P, i.e. add (n) in P to the array H for the vertex (n - 1) in P.

2. Remove the last vertex in P from H as well as any vertices blocked by the last vertex.

3. Backtrack to the next-to-the-last vertex (n - 1) by removing the last vertex (n) from P.

Once the vertex closure step is completed, the algorithm moves on to the advance initial vertex step (line 34); it ensures that each iteration considers one vertex less than the previous one. In addition, the blocked array H is completely cleared in order to achieve a clean slate for the next iteration of the path extension step and the steps that follow. Disregarding the vertices below the current index in consecutive iterations is made possible by incrementing the index of the start-vertex by one each iteration (line 37). After enough iterations the algorithm arrives at the point where only one vertex remains; at this point the algorithm has served its purpose and terminates.


Algorithm 1 Tiernan's algorithm [11]
1: [Initialize]
2: P ← 0 ▷ The array P is emptied.
3: H ← 0 ▷ The array H is emptied.
4: k ← 1 ▷ k is the size of the path stack.
5: P[1] ← 1 ▷ The start-vertex is indexed as 1.
6:
7: [Path Extension]
8: for each edge E in P[k] do ▷ P[k] is the top of the path stack; E -> indicates the vertex that E "leads" to.
9: (1) E -> index > P[1] ▷ The following three conditions need to be true to continue.
10: (2) E -> index not in P
11: (3) E -> index not in H
12: if (1), (2) and (3) are true then
13: Push E -> index onto P
14: k ← k + 1
15: [Go to Path Extension]
16: end if
17: [Circuit Confirmation]
18: if E -> index == P[1] then
19: Print and/or save P. ▷ A cycle has been found.
20: else
21: Continue with the next edge.
22: end if
23: end for
24: [Vertex Closure]
25: if size of P == 1 then
26: Go to [Advance Initial Vertex]
27: else
28: H[P[k]] ← 0 ▷ Clear the row representing the current vertex in the blocked set.
29: H[P[k − 1]] ← Pop top of P
30: k ← k − 1
31: Go to [Path Extension]
32: end if
33:
34: [Advance Initial Vertex]
35: if P[1] == The last vertex then Go to [Terminate]
36: else
37: P[1] ← P[1] + 1 ▷ Place the vertex with index P[1] + 1 on the bottom of the stack, remove P[1].
38: k ← 1
39: H ← 0 ▷ The blocked-list H is cleared.
40: Go to [Path Extension] ▷ Start again from the new start-vertex.
41: end if
42: [Terminate]


A running example of Tiernan's algorithm can be given by studying Figure 2.5 and Algorithm 1. The algorithm starts by taking the vertex with the lowest index, V1, and pushing it on P. The algorithm then expands and places V2 and then V3 on the stack P. When the algorithm tries to expand from V3 and finds that V1 is already on the stack, it stops expanding and starts its cycle confirmation behaviour. It then finds the cycle ⟨V1 → V2 → V3⟩. The algorithm will then backtrack to V3 and V1 will be added to the blocked map; there it will not find any other edges to expand to and will therefore backtrack to V2, and V3 will be blocked. From V2 there is an edge that has not been explored, so P is extended to include V4. From there the algorithm extends to V5 and then back to V2. Since V2 is not the vertex at the bottom of the path stack P, no cycle is found. The algorithm backtracks and eventually ends up at the vertex with the lowest index. The lowest index is then incremented and the algorithm starts over from the new lowest-index vertex.
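To make the control flow of Algorithm 1 concrete, the goto-based steps can be sketched as a recursive function in JavaScript (the language used for the implementation in Chapter 3). The sketch below is illustrative only: it assumes a 0-based adjacency-list representation of the graph, and the name tiernanCycles is not taken from the thesis code.

```javascript
// A recursive sketch of Algorithm 1: the gotos of the original are replaced
// by recursion, and the graph is a 0-based adjacency list (the graph of
// Figure 2.5 with V1..V5 renamed 0..4).
function tiernanCycles(adj) {
  const cycles = [];
  for (let s = 0; s < adj.length; s++) {    // [Advance Initial Vertex]
    const P = [s];                          // the path stack
    const H = adj.map(() => new Set());     // blocked sets, one row per vertex
    const extend = () => {                  // [Path Extension]
      const v = P[P.length - 1];
      for (const w of adj[v]) {
        if (w === s) {                      // [Circuit Confirmation]
          cycles.push([...P]);
        } else if (w > s && !P.includes(w) && !H[v].has(w)) {
          P.push(w);
          extend();
        }
      }
      // [Vertex Closure]: pop v, clear its row, block it at its predecessor.
      P.pop();
      H[v].clear();
      if (P.length > 0) H[P[P.length - 1]].add(v);
    };
    extend();
  }
  return cycles;
}

const cycles = tiernanCycles([[1], [2, 3], [0], [4], [1]]);
// cycles: [[0, 1, 2], [1, 3, 4]]
```

Run on the graph of Figure 2.5, the sketch reports the two cycles of that graph, matching the walk-through above.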

2.3.2 Johnson's algorithm

Johnson's algorithm [2] was published in 1975 with the purpose of enumerating all simple cycles in a directed graph. It has a time complexity of O((V + E) · C) and a space bound of V + E, according to Mateti and Deo [9]. Johnson's algorithm requires that the graph has been divided into SCCs and that the vertices composing the SCCs have been assigned unique indexes. In Johnson's paper [2], Tarjan's algorithm is put forward as a contender for that role. How Tarjan's algorithm creates SCCs is explained in Section 2.3.3. Johnson's algorithm uses a boolean array called blocked, a set called the blocked set, a stack called the path, and an integer called the start vertex.

Johnson's algorithm takes a graph divided into SCCs as input. The reason for the division into SCCs is that cycles cannot exist across the border between two SCCs; searching for such cycles is therefore wasted effort. The algorithm works by deploying a depth-first search (DFS) that starts from the vertex with the lowest index in the current SCC. That index is saved as the start vertex. Johnson's algorithm recursively traverses the vertices to find and enumerate all cycles. Each vertex that the depth-first search traverses is pushed onto the path stack, as seen at line 24 in Algorithm 2, and its index is blocked in the blocked array (line 25). Blocking vertices is done to avoid vertices appearing multiple times in the path stack, since the criterion of a simple cycle is that only the start/end vertex appears more than once. A cycle is found when the DFS reaches the current start-vertex again (line 27 in Algorithm 2). If a cycle is found, all vertices forming the cycle are copied from the path stack and saved to a result list; each result list contains one cycle (line 28). If the DFS cannot continue, it starts to backtrack. During backtracking it removes vertices from the top of the stack until it reaches an unexplored edge.
When a vertex V gets popped from the stack, the index of V is pushed to the blocked set at the index of the vertex currently on top of the stack, as seen at lines 37 and 38. The purpose of blocked and the blocked set is to prevent the DFS from traversing vertices already evaluated (from the current start-vertex). When the depth-first search is concluded, the start vertex integer is incremented and the DFS starts anew from the vertex with that index. This is done at the top of each recursive transition (lines 5 to 10). The general workflow is as follows:

1. Push the vertex V onto the stack.

2. Add V to the blocked array.

3. For each edge of V, check if the edge returns to the start vertex.

• If true, then a cycle has been found and all vertices in the stack are added to a result array, and saved together with all other result arrays. • If false, expand to the neighbouring vertex, and continue from step 1.


Algorithm 2 Johnson's algorithm [2]
1: input: an SCC
2: output: lists, each containing a cycle
3: [Initialize]
4:
5: StartVertex ← 0
6: for each vertex V in SCC do
7:     Blocked(V) ← False
8:     Clear BlockedSet(V)
9:     Cycle(V) ▷ Function Cycle defined below.
10: end for
11:
12: Function Unblock(U) ▷ U is the index of the vertex to unblock.
13: Blocked(U) ← False
14: for W in BlockedSet(U) do
15:     Delete W from BlockedSet(U)
16:     if Blocked(W) then
17:         Unblock(W)
18:     end if
19: end for
20: Function End
21:
22: Function Cycle(V) ▷ V is an index representing a vertex.
23: F ← False
24: Stack.push(V)
25: Blocked(V) ← True
26: for E in Graph(V).edges do
27:     if E leads to StartVertex then
28:         [Output the cycle composed of the stack followed by StartVertex]
29:         F ← True
30:     else if not Blocked(W) then ▷ W is the index of the vertex that edge E leads to.
31:         F ← Cycle(W)
32:     end if
33:     if F then
34:         Unblock(V)
35:     else
36:         for each edge E in Graph(V).edges do
37:             if V is not in BlockedSet(W) then
38:                 BlockedSet(W).push(V)
39:             end if
40:         end for
41:         Stack.pop()
42:         Return F ▷ Returns from Cycle() whether a cycle has been found.
43:     end if
44: end for
45: Function End
46: [Terminate]



Figure 2.5: An example SCC that contains two cycles (1 → 2 → 3 → 1 and 2 → 4 → 5 → 2).

4. Depending on whether a cycle was found after returning from the recursive call, two different results arise.

• If a cycle was found, then the vertex v is removed from the blocked array. When this happens, all vertices that v blocked (as recorded in the blocked set) are also unblocked. • If no cycle was found, then the vertex v is added to the blocked set of the vertex w that was backtracked to, so that v stays blocked until w is unblocked.

5. The last step is to pop the current vertex index from the stack and return to step 1. If no vertices are left on the path stack, the algorithm advances the start vertex and starts anew from step 1.
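The workflow above can be sketched as a compact recursive function. The sketch assumes the input is a single SCC given as a 0-based adjacency list (the graph of Figure 2.5 with V1..V5 renamed 0..4); the name johnsonCycles and the variable names are illustrative, not taken from the thesis implementation.

```javascript
function johnsonCycles(adj) {
  const n = adj.length;
  const blocked = new Array(n).fill(false);
  const blockedSet = Array.from({ length: n }, () => new Set());
  const stack = [];   // the path stack
  const cycles = [];
  let start = 0;      // the current start-vertex

  function unblock(u) {
    blocked[u] = false;
    for (const w of blockedSet[u]) {
      blockedSet[u].delete(w);
      if (blocked[w]) unblock(w);  // cascading unblock via the blocked set
    }
  }

  function circuit(v) {
    let found = false;
    stack.push(v);                 // step 1: push onto the path stack
    blocked[v] = true;             // step 2: mark as blocked
    for (const w of adj[v]) {
      if (w < start) continue;     // cycles through lower vertices are done
      if (w === start) {           // step 3: the edge closes a cycle
        cycles.push([...stack]);
        found = true;
      } else if (!blocked[w] && circuit(w)) {
        found = true;
      }
    }
    if (found) unblock(v);         // step 4a: unblock on success
    else for (const w of adj[v]) {
      if (w >= start) blockedSet[w].add(v);  // step 4b: stay blocked until w unblocks
    }
    stack.pop();                   // step 5: backtrack
    return found;
  }

  for (start = 0; start < n; start++) {
    blocked.fill(false);
    blockedSet.forEach((s) => s.clear());
    circuit(start);
  }
  return cycles;
}

// The SCC of Figure 2.5: 0->1, 1->2, 1->3, 2->0, 3->4, 4->1
const cycles = johnsonCycles([[1], [2, 3], [0], [4], [1]]);
// cycles: [[0, 1, 2], [1, 3, 4]]
```

Note how the cascading unblock mirrors step 4: when a vertex is unblocked, everything waiting on it in the blocked set is unblocked as well.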

An example of how Johnson's algorithm works can be given by studying Figure 2.5. The algorithm starts from V1: V1 is put on the path stack and in the blocked array. The algorithm continues to V2, blocking it and putting it on the path stack. At this point the algorithm could expand to either V3 or V4; in this example V4 is arbitrarily chosen. V4 and V5 both get put on the path stack and blocked in the blocked array. When the algorithm tries to expand from V5 to V2, it finds that V2 is blocked. Since there is no way to further expand, the DFS backtracks. V5 is added to the blocked set at the index of V2, so that when V2 gets unblocked, V5 is unblocked as well. The algorithm continues to backtrack, and V4 is added to the blocked set at the index of V5. The algorithm now finds an unexplored edge leading to V3. V3 is put on the stack and blocked. From V3 an edge is found that leads back to the start vertex, the vertex at the bottom of the stack, i.e. V1. The vertices in the found cycle ⟨V1 → V2 → V3 → V1⟩ are stored in a result array, and the algorithm backtracks. Since a cycle was found, it unblocks V3 and V2; when V2 gets unblocked, V4 and V5 are unblocked as well. The algorithm then reaches V1 again, and since no more edges can be explored, it starts over with a new start-vertex, this time V2. The algorithm continues with the behaviour described above until all vertices have been considered as start-vertex. It is worth noting that the algorithm does not extend the DFS to a vertex with an index lower than that of the start-vertex, since all potential cycles through that vertex have already been evaluated.

2.3.3 Tarjan's algorithm

Tarjan's algorithm [10], created by Robert Tarjan, was developed in 1971 to find SCCs. The algorithm has a time complexity of O(V + E) and a space bound of V + E, where V is the number of vertices and E the number of edges, according to Mateti and Deo [9]. The time complexity is this low because every vertex is visited only once. Tarjan's algorithm uses a stack P, on which the algorithm stores vertices, and employs depth-first search to traverse the graph. The algorithm starts by taking an arbitrary vertex V in the graph. From V it performs a depth-first search, saving each vertex passed on top of the stack. Each vertex passed also gets assigned a unique incrementing id and a lowlink value, as seen at lines 14 to 16 in Algorithm 3. The lowlink is initially set to the same value as the id, in such a way


Algorithm 3 Tarjan's algorithm [10]
1: input: graph G = (V, E)
2: output: set of SCCs
3: [Initialize]
4:
5: index ← 0
6: P ← empty stack
7: for V in G do ▷ V = vertex, G = graph.
8:     if V.index is undefined then
9:         StronglyConnect(V)
10:     end if
11: end for
12:
13: Function StronglyConnect(V)
14: V.index, V.lowlink ← index
15: index ← index + 1
16: P.push(V)
17: V.onStack ← True
18: for each edge E in V do ▷ W is the vertex that E leads to.
19:     if E leads to a vertex W with an unassigned index then
20:         StronglyConnect(W)
21:         V.lowlink ← min(V.lowlink, W.lowlink)
22:     else if W.onStack then
23:         V.lowlink ← min(V.lowlink, W.index)
24:     end if
25: end for
26: if V.lowlink == V.index then
27:     [Start a new strongly connected component]
28:     do
29:         W ← P.pop()
30:         W.onStack ← False
31:         [Add W to the strongly connected component]
32:     while W ≠ V
33:     [Output the current strongly connected component]
34: end if
35: End Function
36: [Terminate]

that the start vertex V gets an id and lowlink value of 0. The next vertex explored gets id and lowlink value 1, and vertex Vn gets id and lowlink equal to n. The algorithm continues to traverse the graph and put vertices on the stack until one of two cases occurs.

1. A vertex already on the stack is encountered.

2. There are no vertices that the DFS can expand to.

If a vertex already on the stack is encountered, the algorithm compares the lowlink value of the vertex at the top of the stack with the index of the vertex that the algorithm tried to expand to (lines 22 and 23 in Algorithm 3). The vertex on top of the stack is then assigned the lower of the two compared values as its lowlink. After this, the algorithm continues to traverse the graph until there are no edges that the DFS can expand through. If no vertices can be expanded to, the algorithm backtracks from the last vertex Vn and pops it from the stack. The next-to-last vertex, Vn−1, becomes the new vertex on the top of the stack (line 21). Vn−1 will also change its lowlink value to the lowest lowlink value of Vn−1 and Vn. The backtracking continues until the algorithm either:

1. Encounters an unexplored edge, as indicated by the neighbouring vertex not having assigned index and lowlink values (line 19 is true).

2. Finds a vertex that has the same lowlink and index value (line 26 is true).

If the algorithm finds an unexplored edge, it continues to explore down that path and eventually backtracks again. During backtracking, if the algorithm finds a vertex that has the same index and lowlink value, that vertex is said to be the start vertex of the current SCC. When the start vertex is found, the algorithm pops vertices from the stack P until it has removed the start vertex of the current SCC, storing them together in an output structure, such as an array or list, one for each SCC. The algorithm continues the behaviour described above until there are no vertices left on the stack. Tarjan's algorithm then takes a new unvisited vertex in the graph and starts anew from there. If there are no unvisited vertices left, the algorithm stops, since all vertices have been visited and stored in SCCs.

An example of how the algorithm operates can be seen in Figure 2.4. Tarjan's algorithm starts with an arbitrarily chosen vertex, for example A. A gets assigned lowlink and index 0, and the algorithm expands to B, which gets lowlink and index 1. It expands again and gives C lowlink and index 2. The algorithm could go two ways here; in this example it continues to D, which gets assigned 3, and then to E, which gets assigned 4. When the algorithm encounters D again, it compares the lowlink of E with the index of D. Since index(D) < lowlink(E), lowlink(E) ← index(D). No more edges can be explored, so the algorithm starts backtracking; when it reaches D it finds that lowlink(D) equals index(D), and therefore D starts a new SCC. The SCC is saved and the algorithm continues. The edge from C to A rediscovers A, and since lowlink(C) > index(A), lowlink(C) ← index(A).
Backtracking from C to B performs the comparison lowlink(C) < lowlink(B), so lowlink(B) ← lowlink(C), and the algorithm backtracks again. Since lowlink(A) equals index(A), A starts a new SCC and is saved together with the vertices remaining on the stack. Since there are no edges left to explore, the algorithm takes a new vertex, F, to start from. Since F cannot expand anywhere, it is considered an SCC of its own, as a vertex can always reach itself.
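Algorithm 3 can be rendered quite directly in JavaScript. The sketch below assumes a 0-based adjacency list for the graph of Figure 2.4 (whose schema is shown in Figure 3.1), with A..F renamed 0..5; function and variable names are illustrative, not taken from the thesis code.

```javascript
function tarjanSCCs(adj) {
  const n = adj.length;
  const index = new Array(n).fill(undefined);  // id assigned in visit order
  const lowlink = new Array(n).fill(0);
  const onStack = new Array(n).fill(false);
  const stack = [];                            // the stack P
  const sccs = [];
  let counter = 0;

  function strongConnect(v) {
    index[v] = lowlink[v] = counter++;         // id and lowlink start equal
    stack.push(v);
    onStack[v] = true;
    for (const w of adj[v]) {
      if (index[w] === undefined) {            // unexplored edge: recurse
        strongConnect(w);
        lowlink[v] = Math.min(lowlink[v], lowlink[w]);
      } else if (onStack[w]) {                 // w is already on the stack
        lowlink[v] = Math.min(lowlink[v], index[w]);
      }
    }
    if (lowlink[v] === index[v]) {             // v is the root of a new SCC
      const scc = [];
      let w;
      do {
        w = stack.pop();
        onStack[w] = false;
        scc.push(w);
      } while (w !== v);
      sccs.push(scc);
    }
  }

  for (let v = 0; v < n; v++) {
    if (index[v] === undefined) strongConnect(v);
  }
  return sccs;
}

// A->B, B->C, C->A, D->E, E->D, F->D, with A..F renamed 0..5:
const sccs = tarjanSCCs([[1], [2], [0], [4], [3], [3]]);
// sccs: [[2, 1, 0], [4, 3], [5]]
```

On this graph the sketch finds {A, B, C}, {D, E} and the single-vertex component {F}, in agreement with the walk-through above.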

2.3.4 Weinblatt's algorithm

The algorithm proposed by Weinblatt [12] in 1972 has a time complexity of O(V(C)V) and a space bound of V · C, where V is the number of vertices and C is a constant, according to Mateti and Deo [9]. Weinblatt took a different initial approach than the other algorithms discussed in this chapter. The difference primarily consists of removing, first, any vertices on which no edges terminate, as well as any edges originating from them; and second, any vertices that have no edges originating from them, as well as all edges terminating on those vertices; i.e., any vertices that cannot be part of any cycle. This removal step is repeated until no vertices or edges remain that cannot be part of a cycle.

The next step is to choose a vertex as a start-vertex. The vertex is chosen arbitrarily, i.e., it is not important which vertex is chosen. Once a start-vertex is chosen, its neighbouring vertices are explored. Exploring consists of adding each vertex encountered while following each edge to a list structure termed the TT (Trial Thread). If a vertex is found twice and the vertex is still on the TT, then a cycle has been found; this cycle consists of the vertices from the last time the vertex was found to the current vertex. However, if a vertex is found twice and the vertex is not on the TT anymore, then


the algorithm employs a recursive procedure with the aim of finding cycles that include the once-again-found vertex and a vertex still on the TT. As it turns out, if such a cycle is found, it is always an already found cycle or a "terminal subpath" as described by Weinblatt [12]. A "terminal subpath" is a cycle that is a decomposition of another cycle; e.g., a cycle could be A = ⟨a → b → a → c → d → e → a⟩. In this example the subpaths of A would be ⟨a → b → a⟩ and ⟨a → c → d → e → a⟩.

After a cycle is found, the algorithm advances to the next step, namely tracking the TT back to the vertex called a "branch point", or branching vertex. A branching vertex is a vertex with more than one edge to be explored, which could therefore lead to unexplored vertices. When backing up to the branching vertex, the algorithm operates by extending from a chosen start vertex and backtracking when a dead end is found. Once every vertex that can be reached from the start vertex has been explored, a new start vertex is chosen, if possible, and the process starts anew. The algorithm terminates when no extensions are possible and no vertices are available to act as a start vertex. The pseudo-code of this algorithm can be found in Algorithm 4.

Figure 2.6: An example illustrating the operation of Weinblatt's algorithm (vertices A to F).

An example of how Weinblatt's algorithm works in practice can be seen in Figure 2.6. The algorithm starts by removing any vertices on which no edges terminate, in this case B, and then the edges originating from that vertex, in our example the edge from B to E. In the next step the algorithm removes any vertices from which no edges originate, in this case A, and then the edge from A to D. After the [Initialize] step, the starting vertex is chosen (via the [Select] step); in this example C is chosen. C is added to the TT and the state of C is set to 1.
The only edge from C is subsequently followed back to C (the [Extend] step). The edge has not previously been explored, so its state is 0; the state is now set to 2 to mark that the edge has been explored. The algorithm then continues to the [Examine] step, where it is concluded that C has already been found, and the TT is therefore saved to the list of cyclic paths. The algorithm then jumps to the [Backup] step, where the last edge is removed from the TT. The next step is again the [Extend] step, where the algorithm concludes that there are no more edges to follow; it therefore goes back to [Backup] after removing C from the TT (setting the state of C to 2) and then concludes that the TT is empty. This leads back to the [Select] step, where the whole process starts anew. In this fashion the algorithm will find, extend and then examine the TT to find the cycle ⟨D → E → F⟩ as well as the already found loop in C.


Algorithm 4 Weinblatt's algorithm [12]
1: [Eliminate]
2: (1) If there are any vertices on which no edges terminate, eliminate all such vertices as well as any edges originating on them, and repeat step (1); if none, then continue.
3: (2) If there are any vertices on which no edges originate, eliminate all such vertices as well as any edges terminating on them, and repeat step (2); if none, then continue to [Initialize].
4: [Initialize]
5: S(v) ← 0 ▷ v = 1, 2, ..., V, where V is the last vertex in the set of states S.
6: S(e) ← 0 ▷ e = 1, 2, ..., E, where E is the last edge in the set of states S.
7: TT ← void
8: [Select] (a starting vertex)
9: Find a vertex v such that S(v) = 0; if none then stop.
10: TT ← v
11: S(v) ← 1
12: [Extend] (the TT)
13: Find an edge e which originates on End(TT) and such that S(e) = 0; if none then set S(End(TT)) ← 2, remove End(TT) from the TT, and go to [Backup].
14: TT ← TT_e ▷ The TT's path with the edge e appended.
15: S(e) ← 2
16: [Examine] (the terminal vertex of e)
17: v ← T(e)
18: if S(v) = 0 then
19:     TT ← TT_v ▷ The TT's path with the vertex v appended.
20:     S(v) ← 1
21:     Go to [Extend]
22: else if S(v) = 1 then
23:     Add v_Tail(TT,v)_v to CP (the list of cycles) ▷ The vertex v, with the subpath returned by Tail(TT,v) appended, then v again.
24:     Go to [Backup]
25: else if S(v) = 2 then
26:     Recur ← 0, call subroutine Concat(v), and go to [Backup]
27: end if
28: [Backup]
29: Remove the last edge from the TT.
30: If TT = void then go to [Select], else go to [Extend].
31: Subroutine Concat(P)
32: Establish an empty local list of examined cycle-tails.
33: v ← End(P)
34: Go through the steps below to [Continue] for each CP which is on the list of cycles and which contains v.
35: If Tail(CP,v) = void or in CT then go to [Continue] ▷ CT is the list of examined cycle-tails.
36: Add the cycle-tail to the list of examined cycle-tails.
37: If the cycle-tail has any vertices on P, then go to [Continue].
38: if S(End(CP)) = 2 then
39:     Recur ← 1, call Concat(P_Tail(CP,v)), and then go to [Continue]
40: end if
41: if S(End(CP)) = 1 then
42:     C ← End(CP)_Tail(TT, End(CP))_P_Tail(CP, End(P)) ▷ The end-vertex of CP, with the TT's path beyond the end of CP, the current vertex being extended to, P, and the path beyond the extending vertex appended to it.
43: end if
44: If Recur = 0, then add C to the list of examined cyclic paths, and go to [Continue].
45: If C is not among the cyclic paths added to the list of cyclic paths since the last external call to Concat (i.e., the last call from [Examine]), then add C to the list of cyclic paths.
46: [Continue] ▷ Return to where Concat was called from.

3 Method

In this chapter the pros and cons of the algorithms introduced in Chapter 2 are discussed, and it is determined which algorithm or algorithms are going to be implemented. Later in the chapter it is explained which steps are taken in order to implement and test the cycle detection tool, as well as the methods used to collect the data discussed in Chapter 4. The approach taken in this chapter is outlined in Section 3.1, to answer the research questions and satisfy the main goals of the thesis.

3.1 Issues to be addressed

1. Evaluate and implement algorithms that can check for the existence of at least one cycle in a GraphQL schema.

2. Evaluate and implement algorithms that can enumerate all cycles in a GraphQL schema.

3. Create a web application where users can upload their schemas for live evaluation.

4. Write a script to test the implementation, as well as to allow the collection of empirical data from a collection of GraphQL schemas.

Since the thesis does not have a primary focus on web design, the presentation of the website will not be a priority, but it must be functional to use and serve as a practical implementation of the thesis.

3.2 Comparing the algorithms

When choosing the most fitting algorithm for the tasks of the thesis, we had two criteria for preferring one algorithm over another. First, the time complexity, and to some extent the space complexity, of the algorithms is of interest. Second, the ease of implementation is important, i.e. how much documentation exists and how well the authors of the different algorithms explained them in their respective papers.

The major difference between the algorithms is between those that find SCCs and those that find all simple cycles. Among the algorithms presented in the theory chapter, Tiernan's, Johnson's and Weinblatt's algorithms fall into the category of algorithms that enumerate all simple cycles in directed graphs. Tarjan's algorithm, on the other hand, finds all SCCs.

Tiernan's algorithm [11] was the first developed among the studied algorithms. However, documentation from previous implementations of this algorithm was not easily found.

Weinblatt's algorithm [12] differs from the others in that it starts off by removing any edges that cannot be part of any cycle. By removing these edges, Weinblatt claims that the processor and time resources used are lower compared to Tiernan's algorithm: by removing any vertices that cannot be part of a cycle, there are fewer vertices to consider. There could therefore exist cases where Weinblatt's algorithm has a shorter runtime than Tiernan's. However, this difference is only noticeable with an extremely large number of vertices, and since the expected maximum number of vertices that our implementation will consider lies around 1,000 to 10,000, this approach will most likely not give any noticeable improvement. Nonetheless, a benefit of removing any vertices that cannot be part of a cycle is that it ensures that the TT can never be filled with such vertices.

Tarjan's algorithm [10] only finds SCCs, and since an SCC consists of at least one cycle, it could be used to find a single cycle quickly without having to run another algorithm as well. The reason for studying Tarjan's algorithm was that it could also potentially speed up the execution time of the accompanying algorithm by dividing the graph into SCCs. This is the case since cycles, by definition, cannot exist between two SCCs, and the number of paths to consider is therefore considerably smaller.
In addition, implementing Tarjan's algorithm allows us to study the SCCs found. Johnson's algorithm [2] has the purpose of enumerating all simple cycles in SCCs. This means it has to be accompanied by Tarjan's algorithm, or any other SCC-producing algorithm, as discussed in Johnson's paper [2]. The main benefit of using two algorithms is that they can focus on different aspects of the detection of cycles. Johnson's algorithm does not need to take into account whether a vertex is part of an SCC, since that is already taken care of by Tarjan's algorithm. Another beneficial aspect of the algorithm is the use of a set of blocked vertices, which ensures that no vertex is evaluated more than necessary. Summarised, the candidates are, in order of time complexity from best to worst: Tarjan's and Johnson's, Weinblatt's, and Tiernan's. We therefore, after taking the aspects outlined in the introductory part of this section into account, came to the conclusion that Johnson's and Tarjan's algorithms taken together satisfy the requirements of this thesis to the largest extent.

3.3 Creating the cycle detecting tool

Once the algorithms were chosen, the next step was to implement the main part of the tool. The tool was written in JavaScript (JS), since a requirement of the thesis was to implement the tool in a web application; the only viable language for implementing the algorithms that could also run the code on a website was JS. The tool's workflow can be summarised in four points:

1. Converting the given GraphQL schema to a JavaScript object.
2. Converting the object to a graph that the algorithms can operate on.
3. Running Tarjan's algorithm to create the SCCs.
4. Running Johnson's algorithm to enumerate all simple cycles.

3.3.1 GraphQL schema to JavaScript Object

While searching for a solution to the first point of the tool's workflow, a tool was found that converts GraphQL schemas to JSON [7]. To be able to use this tool and make the appropriate changes to it, we had to fork the GitHub repository. The necessary changes, such as adding support for interfaces and unions as well as multi-line comments, were made. Two branches were created: "master", for the locally run version, and "web", for the version used in the web application as illustrated in Figure A.1. The tool reads the schema from a file and uses a lexer to collect all the characters, or tokens, it needs to be able to parse the tokens and create JS objects. After these steps are completed, it calls the stringify() function native to JS with the object as a parameter to convert it to JSON. This last step, however, was not needed for our purposes, since we needed the JS object itself to create the graph; it was therefore discarded in the implementation. To get a better understanding of what this step accomplishes on the way from schema to enumerated simple cycles, see the example schema in Figure 3.1.

type A {
  id: ID!
  refB: B
}

type B {
  id: ID!
  refC: C
}

type C {
  id: ID!
  refA: A
}

type D {
  id: ID!
  refE: E
}

type E {
  id: ID!
  refD: D
}

type F {
  id: ID!
  refD: D
}

Figure 3.1: An example of a GraphQL schema. This is the schema for the graph shown in Figure 2.4.

3.3.2 JavaScript Object to Graph

The conversion starts by adding to the graph all the vertices that represent the interfaces, unions and object types. The interfaces are added first, followed by unions and types; the function is called once for each of the three kinds, as can be seen at line 15 in Algorithm 5. The reason for this order is that the interfaces need to exist in the graph before the object types or unions are inserted. This is necessary because, when adding an object type that implements an interface, the interface should be assigned a reference to the object type that implements it. The process of creating references from a given interface to an object type can be seen at lines 20 to 25 in the same algorithm.

The next step involves connecting all the vertices based on the references that exist between them. This is done by adding references to the reference list in each vertex. The reference list consists of pairs of values, where the first value is the name of the field and the second value is a reference to the vertex that the field references, as can be seen in Figure 3.2. These second values are, to begin with, only the names of the vertices they refer to, saved as strings, as seen at lines 23, 31 and 40 in Algorithm 5. After the corresponding vertices are substituted into the reference list, each entry holds a direct reference to a vertex, which is done at line 52. The reference list can now be said to be the list of that vertex's edges, each saved with a label and the corresponding object. Figure 3.2 is a representation of how each vertex is stored in the graph, as well as some of what the vertices contain, such as the labels and references; it is not a complete list of the information stored in the vertices. In the same figure, the "label" is the name of the field and the "reference" is the actual type, interface or union it refers to; "ref" is the index of each entry in the reference list.

[graph]
    [vertex]
        [vertexID]
        ...
        [referenceList]
            [ref]
                [label]
                [reference]

Figure 3.2: An illustration of how the data is structured when converted to a graph.
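The structure of Figure 3.2, together with the connecting pass of Algorithm 5 that turns the name strings into direct vertex references, can be sketched as follows. The vertex ids and field labels are hypothetical examples, not data from the thesis implementation.

```javascript
// Hypothetical two-vertex graph in the shape of Figure 3.2: each vertex has an
// id and a referenceList of { label, reference } pairs, where reference is
// initially just the name of the target vertex, saved as a string.
const graph = {
  A: { id: "A", referenceList: [{ label: "refB", reference: "B" }] },
  B: { id: "B", referenceList: [{ label: "refA", reference: "A" }] },
};

// Second pass (cf. ConnectVertices in Algorithm 5): replace the name strings
// with direct references to the vertex objects stored in the graph.
function connectVertices(graph) {
  for (const vertex of Object.values(graph)) {
    for (const ref of vertex.referenceList) {
      ref.reference = graph[ref.reference];
    }
  }
}

connectVertices(graph);
// graph.A.referenceList[0].reference is now the vertex object for B itself.
```

After the pass, following an edge is a plain property access rather than a lookup by name, which is what lets the cycle-finding algorithms walk the graph directly.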

3.3.3 Detection and Enumeration of Cycles

To be able to answer research questions one and two, it was decided to divide the process into two parts: the first part consists of the detection of a single cycle, and the second of the enumeration of all cycles, as described in Section 3.1, points 1 and 2. When trying to find only a single cycle, there are two occasions on which it is possible to detect that one exists in the graph. Both occasions arise in the first algorithm, namely Tarjan's. The first occurs when an SCC is found, since an SCC contains at least one cycle by definition. The other occurs when, while producing the SCCs, a vertex previously encountered is found again, i.e. when a vertex is found that already exists on the stack. If either of these cases occurs, then at least one cycle exists in the graph, as discussed further in Section 2.3.3. If the user's aim is to find all cycles and loops in their schema, then Johnson's algorithm takes the produced SCCs and enumerates all cycles and loops, as described in Section 2.3.2. Between running Tarjan's and Johnson's algorithms on the graph, some of the edges of each vertex needed to be pruned. This process consisted of removing the references to the JavaScript objects, i.e. the edges, of those vertices that are not part of an SCC larger than one vertex. This was done to minimise the unnecessary exploration those edges would cause when running Johnson's algorithm.
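The pruning step between the two algorithms can be sketched as follows, assuming an adjacency-list graph and SCCs given as arrays of vertex indices (as an SCC-producing algorithm such as Tarjan's would return them). The names are illustrative, and note one caveat: a vertex with a self-loop forms a one-vertex SCC that does contain a cycle, so a complete implementation would need to treat loops separately.

```javascript
function pruneEdges(adj, sccs) {
  // Map each vertex to its SCC, keeping only components with more than one vertex.
  const sccOf = new Array(adj.length).fill(-1);
  sccs.forEach((scc, i) => {
    if (scc.length > 1) for (const v of scc) sccOf[v] = i;
  });
  // Keep only edges whose endpoints lie in the same multi-vertex SCC.
  return adj.map((out, v) =>
    out.filter((w) => sccOf[v] !== -1 && sccOf[v] === sccOf[w])
  );
}

// Vertices 0, 1, 2 form an SCC; 3 and 4 are not part of any cycle.
const pruned = pruneEdges([[1], [2], [0, 3], [4], []], [[0, 1, 2], [3], [4]]);
// pruned: [[1], [2], [0], [], []] — the edges 2->3 and 3->4 are removed.
```

Every edge that survives the pruning can still lie on a cycle, so Johnson's algorithm never wastes work exploring paths that leave a multi-vertex SCC.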

3.4 Web application

The JavaScript (JS) portion of the implementation enabling the web application's functionality was developed in parallel with the main implementation. This was necessary because the main implementation and the GraphQL parsing tool discussed in Section 3.3.1 were both written in Node.js and were intended to be run directly on the user's machine. The problem is caused by the way inclusion of external files is handled in Node.js versus the way it is most commonly done in web applications. Therefore, some minor changes had to be made in order to adapt the JS solution to the web application. A benefit of moving the tool online was access to libraries such as Cytoscape.js, allowing a visual representation of the graph to be rendered for the user in real time. However, the main goal of the web application was that it should be simple to use and not require any plug-ins to be downloaded. The script is run client-side, drastically easing the load on the server hosting the web application.


Algorithm 5 Graph conversion module
 1: Graph ← empty list of vertices               ▷ Graph starts empty.
 2: ExcludeList ← {"Subscription", "Query", "Mutation"}
 3: for each scalar S and enum E in JSON-Object do
 4:     [Add S and E to ExcludeList]             ▷ Scalar values cannot form cycles and are not considered.
 5: end for
 6:
 7: addToGraph(interfaces)
 8: addToGraph(unions)
 9: addToGraph(types)
10: connectVertices()
11: [Return Graph]
12:
13: Function addToGraph(target)
14: for each target T in JSON-Object do
15:     [Create Vertex]
16:     Vertex.id ← T                            ▷ Vertex gets its id from the type name in the schema.
17:     Vertex.referenceList ← empty list        ▷ Edges leading out from Vertex.
18:     for each interface I that T implements do
19:         [Create tmpRef]
20:         tmpRef.label ← "#implements"
21:         tmpRef.reference ← T                 ▷ Name of the current vertex.
22:         Graph[I].referenceList.push(tmpRef)
23:     end for
24:     if target == Union then
25:         for each union member U in T do
26:             if U.type is not in ExcludeList then   ▷ The member is a reference to a type.
27:                 [Create ref]
28:                 ref.label ← "#unionRef"
29:                 ref.reference ← U.type
30:                 referenceList.push(ref)
31:             end if
32:         end for
33:     else
34:         for each field F in T do
35:             if F.type is not in ExcludeList then   ▷ The field is a reference to a type.
36:                 [Create ref]
37:                 ref.label ← F.name
38:                 ref.reference ← F.type
39:                 referenceList.push(ref)
40:             end if
41:         end for
42:     end if
43:     Graph[T] ← Vertex
44: end for
45: End Function addToGraph
46:
47: Function connectVertices()
48: for each vertex V in Graph do
49:     for each ref R in V.referenceList do
50:         R.reference ← [Reference to the vertex stored at Graph[R.reference]]
51:     end for
52: end for
53: End Function connectVertices
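The two-pass structure of Algorithm 5 (first create vertices with name-based references, then resolve the names to vertex objects) can be sketched as follows. This is a simplified illustration assuming a reduced JSON form of the schema; the real converter also handles interfaces, unions, scalars and enums.

```javascript
// Minimal sketch of the graph conversion, assuming a simplified schema
// shape: { types: { TypeName: { fields: { fieldName: typeName } } } }.
function buildGraph(json) {
  const exclude = new Set(['Subscription', 'Query', 'Mutation']);
  const graph = {};

  // Pass 1 (addToGraph): one vertex per type, references stored by name.
  for (const [name, type] of Object.entries(json.types)) {
    const vertex = { id: name, referenceList: [] };
    for (const [fieldName, fieldType] of Object.entries(type.fields || {})) {
      // Skip excluded roots and scalars (anything not declared as a type).
      if (!exclude.has(fieldType) && json.types[fieldType]) {
        vertex.referenceList.push({ label: fieldName, reference: fieldType });
      }
    }
    graph[name] = vertex;
  }

  // Pass 2 (connectVertices): replace name strings with vertex objects,
  // turning the name-indexed map into a linked directed multigraph.
  for (const vertex of Object.values(graph)) {
    for (const ref of vertex.referenceList) {
      ref.reference = graph[ref.reference];
    }
  }
  return graph;
}
```

The second pass is what makes the order of type declarations irrelevant: a field may reference a type that is declared later in the schema, and the name is only resolved once every vertex exists.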


3.5 Testing the implementation

Testing of the implementation was done in two parts: one for parsing schemas and another for enumerating cycles in a given directed multigraph. During the development of the tool, example schemas were created that, when converted into a graph, would contain all permutations of edge cases that could cause trouble for the implementation. The test cases mainly focus on SCCs that include bottlenecks (one or more vertices connecting two otherwise separated SCCs) to test whether Johnson's algorithm would correctly block and unblock each vertex. Another aspect tested was making sure that Tarjan's algorithm would find and save all SCCs, so that Johnson's algorithm could enumerate all cycles in those SCCs. The tests were created so that a vertex could have multiple edges leading to a single vertex, to make sure that searching through directed multigraphs would not be a problem. The graphs were created with a high degree of interconnectivity in order to further increase the likelihood of finding abnormal behaviour. Loops were also tested. The schemas were then scrambled to ensure that the order of the types would not affect whether the graph could be created and searched correctly. The cycles the implementation found were then compared with the correct result to ensure the implementation could handle all cases. The collection of real-world GraphQL schemas served as the main test for the application, especially for the parsing and graph creation parts of the implementation. If a schema was parsed incorrectly, there would be problems with the creation of the graph, preventing the cycle detection step from operating correctly. Where this happened, it became clear what needed to be added or modified in the parser or lexer.
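One way to realise the order-independence check described above is to compare the enumerated cycle sets in a canonical form, since scrambling the schema may change the order in which cycles are reported. The helpers below are a hypothetical sketch, not code from the thesis test suite.

```javascript
// Rotate a cycle so its lexicographically smallest vertex comes first,
// giving a canonical string form that is independent of enumeration order.
function canonicalise(cycle) {
  const i = cycle.indexOf([...cycle].sort()[0]);
  return cycle.slice(i).concat(cycle.slice(0, i)).join('->');
}

// True iff two cycle enumerations contain exactly the same simple cycles,
// regardless of the order of cycles or the rotation of each cycle.
function sameCycleSets(cyclesA, cyclesB) {
  const a = cyclesA.map(canonicalise).sort();
  const b = cyclesB.map(canonicalise).sort();
  return a.length === b.length && a.every((c, i) => c === b[i]);
}
```

With such a comparison, the result of running the tool on a scrambled schema can be asserted equal to the result on the original schema.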

3.6 Empirical evaluation

The empirical evaluation was performed on a collection of 2094 schemas provided by Yun Wan Kim in his master thesis project [8]. Aside from the enumeration of simple cycles, the data of most interest were the execution times for parsing, graph creation, and enumerating all simple cycles. Data was also collected regarding SCCs, vertices and edges, as well as the number of interfaces, unions and types. With this data it was possible to record the largest SCCs, the median and average cycle length, and the distribution of cycles versus the edge-vertex ratio. The data was appended during runtime to a CSV (comma-separated values) file, with a comma separating each value and a newline separating the schemas. The impact of saving this data during runtime was expected to be insignificant, since the writing to file was done after a schema had been examined. The space bound was expected to increase by at most a couple of hundred bytes for the variables holding the data. Storing this additional data does not increase the time complexity of the implementation.

4 Results

This chapter presents the data collected by running our cycle detection tool on the collection of GraphQL schemas provided by Kim [8], as well as the runtime of each step of the tool, as mentioned in Section 3.6. Among the 2094 schemas tested, 44 were found to contain several orders of magnitude more simple cycles than the others. These 44 schemas will be referred to as edge case schemas and are presented separately, since including them together with the rest would skew the statistics. They were identified as any schema containing so many simple cycles that complete enumeration was not feasible.

4.1 Adjustments of the schema converter

Some minor adjustments had to be made to the converting component of the implementation in order to correctly parse the given schemas and to avoid faulty parsing of future schemas. These were small syntactical changes, mostly in the lexer belonging to the schema parser component. The lexer did not differentiate between "[field!]" and "[field]!", both of which are valid. There were also problems with correctly lexing lists inside of lists, e.g. [[field]], and therefore the list brackets "[]" as well as "!" were removed, since they are not needed for the rest of the implementation. Lists are not needed because a field referencing another type references that type in the same way regardless of whether it is a list, and "!" only indicates that the field is mandatory, which is not relevant for the graph representation of the schemas. Other changes were added functionality for parsing union types and for correctly discarding comments that span multiple lines.
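The normalisation described above can be expressed as a single stripping step: for the purpose of the graph, [Field!], [Field]!, [[Field]] and Field! all reference the same type. The helper name is hypothetical; in the implementation the fix lives in the lexer.

```javascript
// Strip GraphQL list brackets and non-null markers from a type reference,
// leaving only the base type name used as a vertex id in the graph.
function baseTypeName(typeRef) {
  return typeRef.replace(/[\[\]!\s]/g, '');
}
```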

4.2 Runtime

The runtime of the cycle detection tool was measured with respect to the three steps the implementation needs to enumerate all cycles: parsing the schema, creating the graph, and enumerating all cycles, as seen in Figure 4.1. The difference between the sum of the measured times of each step and the total measured time is reported as overhead. This overhead includes the time for reading and writing to disk as well as rounding errors between


the sum of the averages/medians of the steps and the average/median total measured time.

Figure 4.1: Average and median runtime in milliseconds of the cycle detection tool for the collection of schemas. Note that none of the edge case schemas are represented in this graph.

The majority of schemas, 1657 of the total 2094, had a runtime below 10 milliseconds, as can be seen in Figure 4.2. These 1657 schemas have an average of 23.72 vertices and 36.87 edges each. The average largest SCC size was measured to be 3.20. Most of the remaining schemas needed between 11 and 100 milliseconds, with an average of 40.65 vertices and 36.87 edges. The schemas with a runtime above 100 milliseconds have an average of 1238.82 vertices and 1896.29 edges. The schema with the longest runtime took 10.02 seconds, with 162 vertices, 505 edges and 256,348 simple cycles.

4.3 Distribution of Types

When gathering the data from the collection of schemas, we were interested in the distribution of the types of the vertices, as mentioned in Section 3.6. As shown in Figure 4.3, the overwhelming majority of vertices in all schemas are of the object type: of the total of 57,399 vertices, they accounted for 54,429, or 94.83%. Unions and interfaces accounted for 704 (1.23%) and 2266 (3.95%), respectively.

4.4 Cycles and SCCs in schemas

Due to the edge case schemas mentioned in the introduction of this chapter, we divide this section into two subsections: one presenting the results for the regular cases and one for the edge cases. Among the 2094 tested schemas, 832, or 39.73%, contained at least one cycle, see Figure 4.4. Among these, 44, or 2.10% of the total number of schemas, are considered edge case schemas.

4.4.1 Regular cases

Since there are more schemas without cycles than with, we decided not to include the schemas without cycles in the following figures, to increase readability. Another reason was to make it easier to identify the most influential factors behind increased cycle density.


Figure 4.2: Runtime in milliseconds (logarithmic scale) for each schema (without the edge case schemas), where the time is divided into the three phases of the cycle detection tool (parsing, graph creation and cycle enumeration).

Figure 4.3: The distribution of vertex types over all graphs: 54,429 object types (94.82%), 2266 interfaces (3.95%) and 704 unions (1.23%).


Figure 4.4: Proportion of schemas that contained cycles (832, or 39.73%) versus those that did not (1262, or 60.27%).

Among the 788 non-edge case schemas with cycles, a total of 46,567 cycles were found, an average of about 59 cycles per schema. The median number of cycles among the non-edge case schemas containing cycles is four. The huge difference between the median and the average number of cycles comes from the fact that most schemas have few cycles, while a small number of schemas have many. One can therefore conclude that most cycles are found in only a few schemas, as can be seen in Table 4.1 and in Figure 4.5. The first span of schemas in Table 4.1 comprises those containing exactly one cycle each. This span can be seen as clusters at the bottom of the graphs in both Figure 4.6a and 4.6b, as well as in the edge/vertex ratio in Figure 4.7. These clusters lie between 5 and 100 on the x-axis in the first two figures (edges in the first, vertices in the second) and between 0.5 and 2 on the x-axis (edge/vertex ratio) in the last figure. These cycles are also visible at the far right of Figure 4.5, which illustrates the distribution of cycles over the schemas. As shown in Table 4.1, a majority of schemas with cycles contain 2 to 10 cycles. These schemas are easily spotted in the figures mentioned above, as they appear within the bottom quarter of each figure. The following three spans are layered within one quarter each, starting with 11-100, followed by 101-1000, and so on. It is worth noting that the scales are logarithmic, so the charts' content would be more spread out if presented linearly. We notice that the dot diagrams in Figure 4.6a and 4.6b are very much alike, but the dots in Figure 4.6b are more compact, whereas those in 4.6a are more spread out and shifted to the right. The shift might indicate a slightly stronger correlation between a graph's number of edges and a higher density of cycles than between its number of vertices and cycle density.
As shown in Figure 4.7, growth in vertex connectivity appears to correlate with exponential growth of cycle density in a given schema. The result of comparing the SCC size to the number of cycles in a given schema is presented in Figure 4.8. There seems to be an exponential connection between the size of the SCCs and the number of cycles in the studied graph: as the SCC size grows linearly, the number of cycles grows exponentially. In the tests, none of the regular case graphs contained an SCC larger than 100 vertices.


Cycles      Avg. Vertices  Avg. Edges  Avg. Edge/vert.  Largest SCC size  Schemas
0           20.79          23.36       0.50             1                 1262
1           11.78          13.26       0.98             2.09              245
2 - 10      19.79          28.57       1.37             3.85              395
11 - 100    48.76          106.44      2.20             11.45             75
101 - 1000  77.22          212.13      2.82             34.44             32
1001+       134.69         408.12      3.74             33.73             41
edge cases  232            764.36      4.05             162.73            44

Table 4.1: Average vertices, edges, average connectivity and the average size of the largest SCC in evaluated schemas, clustered by the number of cycles contained in the schemas.

Figure 4.5: The distribution of cycles (logarithmic scale) in non-edge case schemas. Only the schemas that contain cycles are included.

4.4.2 Edge cases

In this subsection the schemas that are considered edge cases are discussed. It is worth noting that the evaluation of these schemas was halted after 40 seconds of execution, since the program would otherwise be terminated shortly thereafter for using too much RAM. The reason for the high RAM usage is that the program stores the cycles, and metadata about them, in RAM during execution. Even letting the tool use up to 8 GB of RAM forces the OS to terminate the application. The cycles are stored during runtime because we want to be able to present all of them to the user after the enumeration has completed. To circumvent the memory problem, we added functionality to only count, and not save, the cycles. In addition, the ability to save the cycles to a file, or to write them directly to the terminal, was added. However, both of the latter methods add some overhead, as writing to disk or to the terminal is much slower than saving the data in main memory.
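The count-only workaround described above amounts to letting the enumerator report each cycle through a callback, so the caller chooses between counting, collecting, or streaming. The sketch below is illustrative; the function names are not from the thesis code.

```javascript
// Counting sink: O(1) memory regardless of how many cycles are reported.
function makeCountingSink() {
  let count = 0;
  return {
    onCycle: () => { count++; },
    result: () => count,
  };
}

// Collecting sink: keeps every cycle in RAM, which is what exhausted
// memory on the edge case schemas.
function makeCollectingSink() {
  const cycles = [];
  return {
    onCycle: (cycle) => cycles.push(cycle),
    result: () => cycles,
  };
}
```

A file- or terminal-writing sink would have the same interface, trading the memory bound for slower per-cycle I/O, matching the overhead observed in the evaluation.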


Figure 4.6: a) The distribution of cycles compared to the number of edges in schemas. b) The distribution of cycles compared to the number of vertices in schemas. Both axes are logarithmic.


Figure 4.7: The distribution of cycles compared to the ratio between edges and vertices in schemas.

Figure 4.8: The number of cycles in schemas compared to the size of the largest SCC in each schema.


Figure 4.9: Screenshot of the web application before the user has entered a GraphQL schema to be evaluated.

After letting the implementation evaluate an edge case schema for roughly 8 hours, only counting the cycles, it had still not completed the enumeration. It did, however, find close to 200,000,000 simple cycles. In addition, data was recorded in order to determine the most common factors that allow a graph to contain such an extreme number of cycles that it causes problems for the application. The data collected for the edge cases can be seen in Table 4.1. It seems to indicate that the largest SCC size and the interconnectivity of the graph are the two factors most responsible for creating the edge case schemas.

4.4.3 Web application

Since the cycle detection tool was implemented in JavaScript, converting it from a desktop implementation to a web application was easy. However, some changes had to be made, since Node.js and client-side JavaScript applications include files in different ways. The performance of the web tool was not as good as that of the Node.js implementation, since it has to be run through a web browser; nevertheless, the web application was able to enumerate all cycles in all non-edge case schemas. It was therefore concluded that the web application works as intended. The layout of the web application can be seen in Figure 4.9. When a user enters a GraphQL schema in the input text box and clicks submit, the schema is evaluated and the page extends to show a text box containing all simple cycles detected, as well as a graphic representation of the schema, as can be seen in Figure 4.10.


Figure 4.10: Screenshot of the web application after the user has entered a schema. The application evaluated the schema and produced a directed graph, and all simple cycles are presented to the user in the leftmost text area.

5 Discussion

In this chapter we discuss our findings and offer criticism of some of the methods used to derive the results.

5.1 Implementation

The implementation of the cycle enumerating tool is deemed to be successful, as corroborated by the data presented in the previous chapter. The tool managed to parse all 2094 schemas and to enumerate all simple cycles in 2060 of them. A note of caution is due here: even though all schema permutations we could think of were covered by our test cases, there might still exist cases where the implementation provides a faulty result. Therefore, the results should be evaluated cautiously, and the subject of enumerating cycles should be studied further. Studying the results, we found that the average runtime for most schemas was quite low. In the non-edge cases, the runtime was at worst slightly above 10 seconds; only six of the non-edge case schemas had a runtime exceeding one second, and only two took more than two seconds. Since the implementation does not need to be run continuously, we concluded that the runtime was sufficiently low to answer research question 1 ("Can a GraphQL schema efficiently be tested for cyclicity?") in the affirmative. According to our tests it is viable to use Tarjan's algorithm on a graph to find an SCC of size greater than one, or a loop, and thereby prove the existence of cycles in the graph. Showing the existence of cycles in a schema can be done efficiently, since the runtime of Tarjan's algorithm is only in O(Vertices + Edges). With respect to the second research question ("How can all cycles in a GraphQL schema be identified efficiently?"), we found that Tarjan's and Johnson's algorithms can be combined into a tool that enumerates all simple cycles in a given GraphQL schema. We did not expect that cycle density could be so high that the cycles in the edge case schemas could not be enumerated.
In practice, there should be a maximum number of cycles enumerated, after which the tool simply states that there are at least as many cycles as the maximum value and then quits. When those edge cases are not taken into consideration, the tool managed to enumerate all cycles in the collection of schemas.
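The suggested safeguard is not implemented in the thesis tool; a sketch of what such a cap could look like is shown below, with illustrative names.

```javascript
// Counter that tells the enumerator to stop once a configurable maximum
// is reached, and reports "at least N cycles" instead of an exact count.
function makeCappedCounter(maxCycles) {
  let count = 0;
  return {
    // Returns false when enumeration should stop.
    onCycle: () => { count++; return count < maxCycles; },
    summary: () =>
      count >= maxCycles ? `at least ${maxCycles} cycles` : `${count} cycles`,
  };
}
```

Johnson's algorithm reports each cycle as it is found, so aborting after the callback returns false bounds both the runtime and the memory use on pathological schemas.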


5.2 Cycles in schemas

In order to answer research question 3 ("What are the characteristics of existing GraphQL schemas in terms of cyclicity and cycles?") we need to examine the results to determine the prevalence of cycles in GraphQL schemas, as well as the circumstances that allow high cycle densities to form. As the study shows, cycles were found in 39.73% of the considered schemas. The assumption was made that GraphQL is especially suited for social media applications and would therefore involve many "users" relating to other "users", or equivalent scenarios, causing each schema to contain a few cycles, but not many more. A possible explanation for the number of schemas with cycles being lower than originally assumed is that efforts are already being taken to avoid cyclic relationships when developing GraphQL schemas. This idea is further supported by the existence of very large schemas (more than 5000 vertices) that do not contain any cycles at all. Studying the data in Table 4.1 and the charts in the results chapter, we get results indicating that no single factor is responsible for creating high cycle densities. However, the results indicate that the edge-to-vertex ratio, as well as the size of the largest SCC, have the strongest correlation with the number of cycles found (as seen in Figure 4.7 and Figure 4.8). This is further supported by the fact that almost all edge case graphs include at least one SCC containing more than 150 vertices, and their average edge/vertex ratio is four. A few schemas that were successfully enumerated have at least one very large SCC, but none of these SCCs is larger than 100 vertices, as can be seen in Figure 4.8. Figure 4.6b shows that there exist non-edge case schemas containing more vertices than the average among the edge cases; the number of vertices can thus be assumed not to be directly linked to the number of cycles in a graph.
The absolute worst-case scenario would be if the vertices and edges form a clique, i.e. every vertex inside an SCC reaches every other vertex directly. Such high interconnectivity in huge graphs would result in a staggering number of simple cycles. In such a case it might be impossible for the tool to work through the graph, no matter how much time is given for the task. Johnson's algorithm has a time complexity of O(Cycles · (Vertices + Edges)) [9]. Therefore, the runtime of Johnson's algorithm grows very large as the number of cycles in the graph grows. This is not unique to Johnson's algorithm: the runtime of all studied algorithms would grow huge on the studied edge case graphs, except for Tarjan's algorithm, whose runtime grows linearly, independently of the number of simple cycles in the graph. Taking examples from the collection of edge case schemas, we can try to estimate the number of simple cycles in an edge case schema. Since the study shows that the average edge/vertex ratio is slightly above 4, as can be seen in Table 4.1, we assume that every vertex has 4 outgoing edges. Under this assumption, two vertices connected to each other with four outgoing edges each would yield 16 simple cycles in the worst case. The best case would be if 6 of the edges were loops, in which case there would be 7 simple cycles. If another vertex with 4 outgoing edges were added and the thought experiment repeated, there would be a potentially exponential increase in cycles: the worst case would be 4³ = 64 cycles, and the best case (4 − 1) · 3 + 1 = 10. The cycle growth increases further with a fourth and a fifth vertex, and so on. Repeating this thought experiment 160 times (roughly the average size of the largest SCC in the edge case schemas) gives a worst case of 4¹⁶⁰ ≈ 2.14 · 10⁹⁶ cycles, and a best case of (4 − 1) · 160 + 1 = 481 cycles.
It can therefore be assumed that the number of cycles in the edge case graphs can be far beyond what is feasible to calculate with our tool: even if the tool managed to enumerate 1,000,000,000 cycles per second, it would take close to 2.14 · 10⁸⁷ seconds, or 6.77 · 10⁷⁹ years, to completely enumerate the graph in the worst-case scenario.
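The thought experiment above can be made concrete with a few lines of code, under the same simplifying assumptions stated in the text (4 outgoing edges per vertex, worst case 4^n cycles, best case 3n + 1). The function names are illustrative; BigInt is used because 4^160 vastly exceeds the double-precision range.

```javascript
// Worst case of the thought experiment: n vertices, 4 outgoing edges each,
// giving 4^n simple cycles.
function worstCaseCycles(n) { return 4n ** BigInt(n); }

// Best case of the thought experiment: (4 - 1) * n + 1 = 3n + 1 cycles.
function bestCaseCycles(n) { return 3 * n + 1; }

// Whole seconds needed at a given enumeration rate (cycles per second).
function secondsToEnumerate(cycles, perSecond = 1_000_000_000n) {
  return cycles / perSecond;
}
```

Evaluating `worstCaseCycles(160)` reproduces the ≈ 2.14 · 10⁹⁶ figure, and dividing by 10⁹ cycles per second gives the ≈ 2.14 · 10⁸⁷ seconds quoted above.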


5.3 Evaluation of the Method

The main critique we would direct towards ourselves concerns how we tested the implementation. More automated tests could have been written that created graphs and compared outputs, in order to ensure that there are no cases where the cycle detection tool fails. This would, however, have been a project in and of itself, since the creation of such test cases would require the cycles in the test graphs to already be known. Therefore, we opted for handmade test cases, where we tried to fit all permutations we thought could be problematic into a couple of test schemas. The cycle detection tool could also have been written in some other language, such as PHP or even C++, in order to further increase performance. However, the implementation was done in JavaScript since the converter used was already implemented in JavaScript, and it would be easy to adapt the cycle enumeration tool to a web application. With different time constraints, it would probably have been a good idea to write the implementation in C++ or Java [3]. It could be argued that algorithms with a better space bound than Tarjan's and Johnson's should have been used, since the edge cases could not run to completion because the test machine ran out of available working memory. But as mentioned in the results chapter, a decreased space bound would not have been of much help, since it is mostly the storing of enumerated cycles that is the memory-limiting factor. Just counting the cycles, or saving them to disk, circumvented this, but had a negative impact on the runtime of the application.

6 Conclusion

In conclusion, we found that it was possible to answer all the research questions. In doing so, we tried to shed some light on how prevalent cycles are in GraphQL schemas. In this thesis we managed to pinpoint some of the reasons why cycles appear, and to describe the characteristics of the schemas that have cycles.

6.1 The work in a wider context

The tool and data presented in this thesis will hopefully be of help in the further development of GraphQL, and assist researchers and developers in their work with developing and understanding the features and limitations of GraphQL. By helping to make GraphQL more efficient to use, the work could also, in a small way, help lower electricity costs and by extension ease the pressure on the environment somewhat. An ethical aspect of the report is that it could, by extension, help large corporations or other actors store more information about citizens/consumers/users in a more efficient manner; something that is already done to some extent, but could become easier and more cost-effective.

6.2 Future work

Even after more than 2000 tests against real GraphQL schemas, we cannot fully guarantee that there exist no cases where the tool would produce a faulty result. Additional testing would be required, and that could potentially be done in the future if the tool were to be provided to developers writing their own schemas.

Bibliography

[1] A. Asratian, A. Björn, and B.O. Turesson. Diskret Matematik.
[2] Donald B. Johnson. "Finding All the Elementary Circuits of a Directed Graph". In: SIAM Journal on Computing 4.1 (Mar. 1975), pp. 77–84.
[3] G. Miller, F. D. Hsu, X. Lan, and D. Baird, eds. A Comparative Study of Algorithms for Computing Strongly Connected Components. Vol. 15. IEEE, 2017.
[4] Roy Thomas Fielding. Architectural Styles and the Design of Network-based Software Architectures. Doctoral dissertation. 2000.
[5] Olaf Hartig and Jorge Pérez. "Semantics and Complexity of GraphQL". In: Proceedings of the 2018 World Wide Web Conference. WWW '18. Lyon, France: International World Wide Web Conferences Steering Committee, 2018, pp. 1155–1164. ISBN: 978-1-4503-5639-8. DOI: 10.1145/3178876.3186014. URL: https://doi.org/10.1145/3178876.3186014.
[6] Facebook Inc. Introduction to GraphQL. 2018. URL: https://graphql.org/learn/ (visited on 07/23/2018).
[7] J. Jellesma. graphql-to-json-converter. URL: https://github.com/jarnojellesma/graphql-to-json-converter.
[8] Yun Wan Kim. "Exploratory Analysis of GraphQL API Repositories and Schema". MEng Project Report. MA thesis. University of Toronto, 2018.
[9] P. Mateti and N. Deo. "On Algorithms for Enumerating All Circuits of a Graph". In: SIAM Journal on Computing 5.1 (1976), pp. 90–99. DOI: 10.1137/0205007. URL: https://doi.org/10.1137/0205007.
[10] Robert Tarjan. "Depth First Search and Linear Graph Algorithms". In: SIAM Journal on Computing 1.2 (1972).
[11] James C. Tiernan. "An Efficient Search Algorithm to Find the Elementary Circuits of a Graph". In: Commun. ACM 13.12 (Dec. 1970), pp. 722–726. ISSN: 0001-0782. DOI: 10.1145/362814.362819. URL: http://doi.acm.org/10.1145/362814.362819.
[12] Herbert Weinblatt. "A New Search Algorithm for Finding the Simple Cycles of a Finite Directed Graph". In: J. ACM 19.1 (Jan. 1972), pp. 43–56. ISSN: 0004-5411. DOI: 10.1145/321679.321684. URL: http://doi.acm.org/10.1145/321679.321684.

A Appendix

A.1 Git repositories

To be able to work on the implementation independently, as well as to make it possible for others to access our work, we used Git to create multiple repositories for the different components of the tool. To address the first point of the list outlined above, we made use of an external repository; it was therefore necessary to create a fork of it. A fork is an extension of a repository in which changes can be made without affecting the original. If the original author approves of the changes made to the code, he or she can merge them back into the original repository. During the development of the application we decided that two versions of the application were appropriate: one that runs locally, used for collecting data and testing the implementation thoroughly, and one for the web application, to be able to run the code online. Since the two versions shared parts of the same code, we had to split two of the repositories into two branches each. A branch is a separate saved version of the repository where any changes made can be discarded or merged back into the master branch. With branches it is possible to work on a project in parallel without fear that a commit might change the original code. Why we had to divide the project in this way is explained further in Section 3.3.1.


Figure A.1: An illustration of the general layout of the git repositories (graphql-schema-cycles, graphql-schema-cycles-webapp, graphql-to-json-converter and cytoscape.js), their branches, forks and submodule links, for the web application as well as for the locally run tool.
