
Interactive Web Reporting Dashboard for Enterprise Businesses. Integrating Google’s Web tools for a one-stop reporting interface

Apostolos Gouvalas

SID: 3306160002

SCHOOL OF SCIENCE & TECHNOLOGY A thesis submitted for the degree of Master of Science (MSc) in Mobile and Web Computing

DECEMBER 2017 THESSALONIKI – GREECE

Interactive Reporting Dashboard for Enterprise Businesses. Integrating Google’s Web tools for a one-stop reporting interface


Supervisor: Dr. Christos Berberidis


Abstract

The aim of this dissertation was to develop, using cutting-edge technologies, a visual analytics dashboard for enterprise businesses, utilizing the Analytics and Real Time Reporting APIs from Google APIs. Moreover, we had to present the data from all web properties of a user’s account in order to give them a better overview across all their tracking properties, while maintaining a user-friendly environment.

Apostolos Gouvalas 18 Dec 2017

Acknowledgements

I would first like to thank my dissertation supervisor, Dr. Christos Berberidis of the School of Science & Technology at the International Hellenic University, for providing guidance when needed. I would also like to thank the collaborating company, iTrust, for providing the thesis subject, and especially their lead developer, Mpampis Sykovaridis, for guiding me regarding the technologies I utilized in the development process and for giving me constructive feedback when I presented the final product to them.

Next, I would like to thank my parents for supporting me both financially and emotionally in my decision to pursue this master’s degree and throughout my life in general. Last but not least, I would not have achieved any of this without my girlfriend Chrysa, who supported and encouraged me from the very beginning and throughout the year to overcome every obstacle.

Contents

ABSTRACT
ACKNOWLEDGEMENTS
CONTENTS

1 INTRODUCTION
  1.1 THE PROBLEM
  1.2 SOLUTION
  1.3 OVERVIEW OF NEXT CHAPTERS

2 LITERATURE REVIEW

3 UTILIZED TECHNOLOGIES
  3.1 BACK-END
    3.1.1 Node.js
    3.1.2 npm & Yarn
    3.1.3 Express.js
    3.1.4 Passport.js
    3.1.5 Mongoose.js
    3.1.6 Socket.io
    3.1.7 Googleapis
    3.1.8 Request
    3.1.9 Express-session
    3.1.10 Lodash
    3.1.11 Morgan
    3.1.12 Concurrently
  3.2 FRONT-END
    3.2.1 React
    3.2.2 React Router
    3.2.3 Redux
    3.2.4 React-sparklines
    3.2.5 Redux-form
    3.2.6 Moment.js
    3.2.7 React-widgets
    3.2.8 MaterializeCSS
    3.2.9 Font Awesome
    3.2.10 Particle.js – React Component

4 IMPLEMENTATION
  4.1 ARCHITECTURE HIGH-LEVEL OVERVIEW
  4.2 ITRALYTICS BACK-END
    4.2.1 Authentication Related Route Handlers
    4.2.2 Google Analytics Related Route Handlers
      4.2.2.1 requireLogin Middleware
      4.2.2.2 validateAccessToken Middleware
      4.2.2.3 refreshAccessToken Middleware
      4.2.2.4 GET /api/analytics/accountSummary
      4.2.2.5 GET /api/analytics/reportsBatchGet
    4.2.3 Socket.io & Google Analytics Real Time Reporting API
  4.3 ITRALYTICS FRONT-END
    4.3.1 Initial create-react-app setup
    4.3.2 Single Page Application
    4.3.3 Connection to our API & displays
      4.3.3.1 Log-in
      4.3.3.2 Fetching User’s Google Analytics Web Properties
      4.3.3.3 Initialize ChartsList Component
      4.3.3.4 Display Analytics for All User’s Web Properties
      4.3.3.5 Get Real-time Analytics data

5 FUTURE WORK

6 CONCLUSIONS

BIBLIOGRAPHY

APPENDICES
  Appendix 1: List of Web Analytics tools [54]
  Appendix 2: Back-end & front-end technologies list
  Appendix 3: iTrAlytics Back-end
  Appendix 4: iTrAlytics Front-end Screens

List of Figures

Figure 1: Google Analytics – Hierarchy of accounts
Figure 2: 2016 Stack Overflow developer survey
Figure 3: 2017 Stack Overflow developer survey
Figure 4: Application technologies overview
Figure 5: Enable Google APIs and get credentials
Figure 6: Google OAuth Credentials
Figure 7: Running two servers on development process
Figure 8: Forward requests to our Express server
Figure 9: iTrAlytics React components overview
Figure 10: How Redux works
Figure 11: Access to store with React-Redux Provider component
Figure 12: Traditional features of web analytics tools analyzed by Ivan Bekavac and Daniela Garbin Praničević
Figure 13: Passport.js Google strategy. Retrieve or Save a user to the database
Figure 14: Route handler for starting the OAuth process
Figure 15: Route handler after user give permission to access his/her Google data
Figure 16: Route handler for logging out users
Figure 17: Route handler for fetching users web properties information
Figure 18: Route handler for fetching Google Analytics data
Figure 19: iTrAlytics
Figure 20: iTrAlytics Google OAuth
Figure 21: iTrAlytics after log-in
Figure 22: iTrAlytics dashboard initial view
Figure 23: iTrAlytics dashboard /Week analytics
Figure 24: iTrAlytics dashboard Real-time analytics

List of Tables

Table 1: List of technologies used in back-end & front-end

1 Introduction

There are more than 3.78 billion internet users [1] [2] as of this writing, which obliges businesses of every kind, regardless of their size, type of operations or place of origin, to maintain an internet presence if they want to grow in our ever-evolving digital era. Putting your business on the web not only lets new customers get to know it, but also lets existing customers keep up with your updates and offers, which helps build a loyal customer base. It does not matter if your main business operations are not online: merely having a corporate website with some valuable information about your services, offers and location is enough to engage new customers and help your business grow. If, on the other hand, your main business operations are online, it goes without saying that a strong online presence is essential.

However, since there are so many internet users from all around the world, it falls to the business owner to identify critical information about these users. One example is where the users come from, which may call for multilingual support on the website or, for e-shop owners, support for multiple currencies. Other information relates to the browser or device the users use to access the corporate website, which can dictate that the website supports specific browsers or adapts to different device widths and heights.

The need for such information about users led to the development of web analytics tools, such as Google Analytics. According to the Digital Analytics Association, “Web Analytics is the measurement, collection, analysis and reporting of Internet data for the purposes of understanding and optimizing Web usage” [3].

Although there are nowadays numerous web analytics tools (see Figure 12 in Appendix 1) with different functionalities, measures, capabilities, requirements and, of course, pricing, we will make use of the data we can get from the Google Analytics API and develop our own web analytics dashboard to present them. Google Analytics is the major platform for collecting and viewing web analytics data and is used by all kinds of businesses.

1.1 The Problem

Google offers two variations of Google Analytics: Google Analytics, which is free, and Google Analytics 360, which is more suitable for enterprises and comes with more capabilities and functionality than the free version, but also with a monthly fee. The free version is widely used by small to medium companies, large corporations and individual site owners alike, not only because it is a free product backed by a giant tech firm, but also because it is a powerful tool with many capabilities and insights about your users.

However, many newcomers and less tech-savvy people may find the Google Analytics platform somewhat overwhelming. There is too much information in the default Google Analytics dashboard, and this has led to a demand for tools that simplify the presentation of the information [4].

Another need that the free Google Analytics does not cover arises when we have more than one website and would like an overview of all our websites in one dashboard. Google Analytics offers insights for one specific website at a time; an aggregated overview of all your websites is offered only in the Google Analytics 360 suite, which lets you set up a so-called “Roll Up account” [5].

1.2 Solution

To tackle the two aforementioned problems, we developed our own web application using modern web development technologies and the Google APIs. Thus, iTrAlytics was born. The approach we took in iTrAlytics was to query the user’s Google Analytics account for all their declared websites. We then query each retrieved website for the data we want and present it to the user in simple line charts, also known as sparklines [6]. With this approach, the user gets a quick overview of key metrics about their online visitors across all their websites.

Furthermore, following a similar methodology, we created a second dashboard which presents real-time data about the active users on any of the user’s websites, categorized by the device they use to access the website: desktop, tablet or mobile.

1.3 Overview of Next Chapters

In the following chapter, we review the Google Analytics account structure and how its presentation of data leads to our problem regarding the view of multiple websites, and we briefly review existing visual analytics dashboard software. Next, we present the technologies we used for our web application. In the fourth chapter we look at the implementation, that is, the architecture we used for our development process and the core concepts and features of the app. Finally, we propose some additions to the application as future work and conclude the dissertation with a summary of all chapters.

2 Literature Review

To better understand the needs of our application, we reviewed both our web analytics provider, which in our case is Google Analytics (free edition), and existing software used by enterprise companies for visualizing their web analytics data. To begin with, we mentioned that one of the problems we will try to solve is how to present the data from Google Analytics when a user has many properties in their account. A Google Analytics account is structured as shown in the following figure:

Figure 1: Google Analytics – Hierarchy of accounts

First, a user must have an account in order to get access to Analytics. Then, on this account, the user can add properties. A property in Google Analytics is a website, a device or a mobile application for which the user wishes to collect analytics data. How users organize the relationship between their account and its properties is up to them: a user can create one account with only one property or one account with many properties. Finally, views are the user’s access point to reports. The user defines specific views for a specific property [7].
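The hierarchy can be illustrated with nested data. This is only a sketch: the field names (webProperties, profiles) are assumed to follow the account summaries of the Google Analytics Management API, where views are called profiles, while the sample values and the collectViews helper are invented for illustration.

```javascript
// Hypothetical sample data: one account with two properties, each with its own views.
const accountSummaries = [
  {
    id: '1000',
    name: 'Example Company',
    webProperties: [
      {
        id: 'UA-1000-1',
        name: 'Corporate site',
        profiles: [{ id: '2001', name: 'All Web Site Data' }]
      },
      {
        id: 'UA-1000-2',
        name: 'Online shop',
        profiles: [
          { id: '2002', name: 'All Web Site Data' },
          { id: '2003', name: 'Mobile traffic only' }
        ]
      }
    ]
  }
];

// Walk account -> property -> view and return one flat list of views.
function collectViews(accounts) {
  const views = [];
  for (const account of accounts) {
    for (const property of account.webProperties || []) {
      for (const view of property.profiles || []) {
        views.push({
          accountName: account.name,
          propertyId: property.id,
          propertyName: property.name,
          viewId: view.id,
          viewName: view.name
        });
      }
    }
  }
  return views;
}

const allViews = collectViews(accountSummaries);
console.log(allViews.map((v) => `${v.propertyName} / ${v.viewName} (${v.viewId})`));
```

A flat list of this kind is what makes it possible to request data for every view in turn, instead of being limited to the single view that is currently selected.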

The problem we will try to solve arises from the fact that a user who has more than one property in their account can look at only one “view” at a time, which, as we said, is tied to one property. Our implementation changes the view part. In iTrAlytics, our view is composed of the views of all our declared properties, giving us a broader overview across all our sites.

The idea of developing our own web application for presenting web analytics data in a dashboard is not new. In fact, there is a plethora of software on the market that tries to offer enterprise businesses an easy way to review their data in attractive dashboards. While we cannot do an exhaustive review of all such software, we will mention some of the most used and best-funded ones.

Before we begin our review, it is worth mentioning that our application cannot compete with such software, which is built by big development teams and receives large sums of funding from corporate businesses. By developing our own application, we are not trying to make a market-ready product; rather, we focus on solving the aforementioned problems and on using a modern technology stack for our development process.

According to a Tracxn¹ 2016 report about BI and analytics, over $1060M was invested in visualization tools, of which $726M went to visual analytics tools [8]. As we can see, the visualization of analytics data is a multi-million-dollar industry, and companies interested in using such tools are willing to make big investments. Some of the popular tools in this category are Klipfolio, Cyfe and Domo; the last of these, according to the same report, received the most funds, reaching $458.6M from various investors [8]. The software we mentioned is used by companies of various sizes and industries.
They all have some things in common, including the ease of connection with various data sources, ranging from Google Analytics and Marketo to databases and local files. After the data connection, they provide customizable dashboards to display valuable information to the end-user.

Specifically, Klipfolio is a cloud-hosted real-time dashboard that is accessible via web and mobile. It offers hundreds of data service connectors that are built for ease of use by its clients. After a data connection, Klipfolio automatically retrieves the data and provides a customizable dashboard to the end-user, which supports critical decision making by letting them monitor their data [9, 10].

Cyfe provides a cloud-based service enabling its users to monitor and share business data from a single location in real time. It offers similar ease of data connectivity and thus allows its clients to monitor everything from sales and web analytics to marketing campaigns and custom business data [11, 12].

Finally, Domo is developed with a similar purpose in mind: to help businesses grow and to support them in valuable and critical decision making by monitoring their data. Domo differs from the previous software in that it offers even more options for data connectivity out of the box, with over 450 connectors. Furthermore, as we already saw, it is a platform which received a huge amount of funding from companies such as eBay, Google and Facebook [13, 14].

While we did not go into much detail about this software, as it is not needed for our purpose, we got the main idea: each platform offers its clients an easy way to connect their data and then structures the data for visual feedback. This whole process is critical for businesses, since in our digital world the amount of data we produce changes every second and it is easy to lose valuable information.

¹ Tracxn provides information for startups in venture capital, private equity and corporate development

3 Utilized Technologies

Behind the scenes, iTrAlytics relies on a number of JavaScript technologies in order to offer its end-users the right data about the visitors of their websites via Google Analytics and, at the same time, a great user experience. Several factors led us to choose JavaScript as the main programming language of the iTrAlytics web application. According to the Stack Overflow 2016 and 2017 developer surveys, JavaScript is the most popular programming language among developers [15] [16].

Figure 2: 2016 Stack Overflow developer survey

Figure 3: 2017 Stack Overflow developer survey

Moreover, JavaScript underwent a major update in June 2015, when the 6th edition of the ECMAScript standard was published. This version gave the language new and powerful features, like destructuring, classes, arrow functions and more [17] [18]. Furthermore, Node.js, which will be discussed a bit later, made it possible for us to run JavaScript on the server. In addition, as of 2017, front-end JavaScript frameworks like Angular, React and Vue.js, just to name a few, are getting more and more attention from developers and companies looking at modern web development processes [19].

All the aforementioned reasons led us to pick JavaScript as the main development language. With a Node.js back-end and a front-end rendered with React, we ended up using JavaScript across our whole application. Having the same language for the whole app made the process a lot easier in terms of debugging, development and maintainability.

Next, we describe every technology and JavaScript package we used in the development of iTrAlytics, separated into back-end and front-end. The utility library Lodash, which was used in both, is described once, in the back-end section. For a full list of the technologies used, with URLs to the official sites, see Table 1: List of technologies used in back-end & front-end in Appendix 2.

3.1 Back-end

3.1.1 Node.js

Node.js is the base of our back-end environment and is the runtime which allows us to run JavaScript outside of the browser. Its feature set allows us to manipulate files, create and remove folders, query databases directly and even create web servers. In general, it gives JavaScript the capabilities of more “popular” server-side languages, like PHP or Python.

According to the official website, nodejs.org, Node.js is “a JavaScript runtime built on Chrome’s V8 JavaScript engine. Node.js uses an event-driven, non-blocking I/O model that makes it lightweight and efficient.” [20]. To better understand this definition, we explain it part by part.

The first part of the definition refers to the fact that JavaScript code, whether it runs on Node.js or in the Chrome browser, is executed by the same V8 JavaScript engine. V8 is an open-source JavaScript engine written in C++ that takes JavaScript code and compiles it to machine code, making our applications really fast [21] [22].

The second part of the definition conveys much more information, so we will break it into smaller parts. To begin with, I/O refers to the communication between a Node.js application and anything outside it, like a read or write request to a database, a file manipulation on our file system, or an HTTP request to a separate service such as the Google Analytics API for fetching user activity on our website. Non-blocking I/O means that, for instance, while one user is requesting a URL from Google, other users can be requesting to read a file from the database. They can request all sorts of things without preventing anyone else from getting work done. Finally, the event-driven part refers to event-driven programming, in which events, such as a key press, a mouse click, a message from a function or a response from an API, determine the flow of the program. Behind the scenes Node.js uses an event loop to handle events, but its details are out of the scope of this dissertation [21] [22].
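Both ideas can be illustrated with Node.js core modules only. The following is a minimal sketch rather than code from iTrAlytics; the event name apiResponse and its payload are invented for the example.

```javascript
const fs = require('fs');
const { EventEmitter } = require('events');

const log = [];

// Non-blocking I/O: readdir returns immediately and its callback runs later,
// once the operating system has produced the result.
fs.readdir(process.cwd(), (err, entries) => {
  log.push(err ? 'readdir failed' : `readdir finished (${entries.length} entries)`);
  console.log(log.join('\n'));
});
log.push('readdir requested');
log.push('server is free to handle other work');

// Event-driven: a listener is registered up front and runs when the event is emitted.
const bus = new EventEmitter();
bus.on('apiResponse', (payload) => {
  log.push(`got response: ${payload.activeUsers} active users`);
});
bus.emit('apiResponse', { activeUsers: 3 });
```

When the script runs, the two lines pushed right after the readdir call are logged before the line written by its callback, which shows that the program kept going while the file system request was still pending.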

3.1.2 npm & Yarn

Node.js makes it possible to run JavaScript code outside of the browser, and this led to the development of new Node-based tools, also known as packages, which we developers use to help us in the web development process. Tools such as gulp, with which we can automate time-consuming tasks like compiling Sass code to CSS, minifying JavaScript and many more, are accessible to us because of Node.js.

However, in order to use these tools, we need another tool which facilitates installing them, updating them to newer versions and keeping track of the packages currently installed in our projects. This tool is npm, which stands for node package manager; it is a package manager for JavaScript and the default package manager of Node.js. According to the official documentation, “npm is a way to reuse code from other developers, and also a way to share your code with them, and it makes it easy to manage the different versions of code.” [23] [24]. At the time of writing, the npm registry holds 475,000 packages of free and reusable code [24].

Yarn is a newer package manager for JavaScript made by engineers at Facebook in collaboration with Exponent, Google and Tilde [25]. Yarn was developed after engineers at these companies tried to solve the problems with consistency, security and performance that they faced when using npm in large codebases and teams. We used Yarn to install all our packages for the development of iTrAlytics.

3.1.3 Express.js

The official website describes Express.js as “a minimal and flexible Node.js web application framework that provides a robust set of features for web and mobile applications” [26].

The minimal aspect of Express is what makes it appealing to a lot of developers. Minimal does not mean that it is not useful or that it is lacking features, only that Express offers the thinnest possible layer between us and the server [27].

Express is also flexible and lets us use its functionality as needed, since we can replace whatever does not fit our needs. Other frameworks take the opposite approach, imposing strict rules that we have to follow and making it difficult to remove or even alter unwanted functionality [27].

Finally, the web application framework part refers to the functionality Express offers, which includes, among other things, simple route handling, session management and easy templating with Mustache, EJS, etc., in order to build single-page, multi-page and hybrid web applications [27].

Express makes use of special functions called middleware, which have access to the request (req) and response (res) objects as well as the next middleware function (usually denoted by a variable named next) in the application’s request-response cycle [28]. We can also write our own middleware or install packages that offer middleware for specific purposes, like the Passport package that we will see next.
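The (req, res, next) pattern can be sketched with plain functions. The runChain helper below is a hypothetical stand-in for the way Express walks through its middleware, written only for illustration; the middleware names and the sample request are also invented.

```javascript
// Hypothetical stand-in for Express's dispatch: run the middleware in order,
// each one deciding whether to call next() or to end the cycle itself.
function runChain(middlewares, req, res) {
  let index = 0;
  function next(err) {
    if (err) {
      res.statusCode = 500;
      res.body = { error: err.message };
      return;
    }
    const middleware = middlewares[index++];
    if (middleware) middleware(req, res, next);
  }
  next();
}

// Middleware 1: record the request, then pass control on.
function logger(req, res, next) {
  req.log = `${req.method} ${req.url}`;
  next();
}

// Middleware 2: stop the cycle early when no user is attached to the request.
function requireUser(req, res, next) {
  if (!req.user) {
    res.statusCode = 401;
    res.body = { error: 'You must log in' };
    return;
  }
  next();
}

// Final handler: only reached when every earlier middleware called next().
function dashboard(req, res) {
  res.statusCode = 200;
  res.body = { greeting: `Hello ${req.user.name}` };
}

const chain = [logger, requireUser, dashboard];

const anonymous = { statusCode: null, body: null };
runChain(chain, { method: 'GET', url: '/api/data' }, anonymous);

const loggedIn = { statusCode: null, body: null };
runChain(chain, { method: 'GET', url: '/api/data', user: { name: 'Alice' } }, loggedIn);

console.log(anonymous.statusCode, loggedIn.statusCode);
```

In a real Express app the same three functions could be registered with app.get('/api/data', logger, requireUser, dashboard), with the response sent through res.status(...).send(...) instead of being stored on a plain object.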

3.1.4 Passport.js

Passport does one thing and does it well: it authenticates requests. Passport is an authentication middleware that has general helpers for handling authentication in Express apps. Because an app can use various methods of authentication, Passport provides the mechanisms for specific methods, like username/password, Google OAuth, Facebook OAuth, etc., as individual packages known as strategies [29]. In iTrAlytics we use the passport-google-oauth20 strategy for handling OAuth 2.0 authentication with Google.

3.1.5 Mongoose.js

MongoDB is quite often the database of choice with Node.js due to their shared use of JavaScript and the popularity of JSON as a data format for web APIs. Mongoose is an object data modeling (ODM) library which offers built-in ways to define, maintain and validate data structures and models and to use them to interact with the database [30].

3.1.6 Socket.io

Socket.io is a powerful JavaScript framework for real-time, bidirectional, event-based communication. It provides a server library and a client library for making real-time updates between a browser client and a web server [31, 32]. We use Socket.io to automatically make real-time queries to the Google Analytics Real Time Reporting API.

3.1.7 Googleapis

Googleapis, or google-api-nodejs-client, is the official Node.js client library for accessing Google APIs. This package also offers authorization and authentication with OAuth 2.0, API keys and JWT [33]. We use this package to make calls to the Google Analytics Real Time Reporting API.

3.1.8 Request

This package simplifies the way we make HTTP requests by providing easy-to-use and easy-to-remember methods for the common HTTP verbs, like put, get, post, etc. [34]. We use the request package to make requests to Google APIs.

3.1.9 Express-session

Express-session is a middleware for handling sessions in Express apps. We use it in combination with the connect-mongo package, which stores the sessions in MongoDB [35, 36].

3.1.10 Lodash

Lodash is a popular JavaScript utility library which offers helpful functions for working with arrays, numbers, objects, etc. [37]. We use Lodash both in the back-end and the front-end, mainly when iterating through arrays.

3.1.11 Morgan

Morgan is an Express middleware that logs HTTP requests. We use it in the development environment of our application to know when we get a request from the front-end client or when we make a request to the Google APIs [38].

3.1.12 Concurrently

The Concurrently package lets us run commands from our package.json file, and we use it in the development environment to run the back-end Express server and the front-end server, which is part of create-react-app (described shortly), simultaneously [39].

3.2 Front-end

3.2.1 React

React is not a full-featured framework like Angular or Ember; rather, it is a library for building composable user interfaces. React encourages the creation of highly reusable UI components, such as comment boxes, pop-up modals, sortable tables, etc. Each component has its own functionality, and we can nest components to build complex UIs. Eventually, we end up with a web page composed of multiple components. This makes our code more readable and maintainable, as each React component encloses the HTML and JavaScript functionality relevant to it. Apart from that, React handles data that changes over time well. When a component is first initialized, its render method is called, which generates the view. When our data changes, the render method is called again, so the specific component that handles this data re-renders its view [40].

However, getting started with React development can be time-consuming and somewhat troublesome, as for the moment we need some extra configuration to build React applications. Namely, we need a transpiler, like Babel, which translates our modern JavaScript code into code that also works on older browsers. Furthermore, we need a bundler, like webpack, which bundles our modular code into one minified package so as to optimize load times [41].

To speed up our development process, we can use create-react-app, a package backed by the developers of React at Facebook that lets us create React applications without caring about the build configuration, which create-react-app takes care of. In addition, it gives us other useful functionality, like meaningful error messages, a test-ready environment and more [42].

3.2.2 React Router

React Router is a collection of navigational components that helps us declare the routing of our application. Routing takes place as our application is rendering, so we declare our routes in a composable way throughout our app, like any other component [43].

3.2.3 Redux

Redux is a “predictable state container for JavaScript apps” [44]. A state container is a collection of all the data that describes the app. With Redux, we have a centralized state object which holds all our application data. Redux is a standalone, lightweight library which can be used with Ember, Angular, React, jQuery or vanilla JavaScript.
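The idea of a centralized state object can be sketched in a few lines. The createStore function below is a simplified stand-in written for illustration, not the library’s actual source, although it exposes the same three core methods as a Redux store (getState, dispatch and subscribe). The action types and the state shape are assumptions made for the example.

```javascript
// Simplified stand-in for a Redux-style store (not the library's real source).
function createStore(reducer, initialState) {
  let state = initialState;
  const listeners = [];
  return {
    getState: () => state,
    subscribe: (listener) => {
      listeners.push(listener);
    },
    dispatch: (action) => {
      // The reducer computes the next state; listeners are then notified.
      state = reducer(state, action);
      listeners.forEach((listener) => listener());
      return action;
    }
  };
}

// A reducer is a pure function: (previous state, action) -> next state.
function reducer(state, action) {
  switch (action.type) {
    case 'SET_USER':
      return { ...state, user: action.payload };
    case 'SET_PROPERTIES':
      return { ...state, properties: action.payload };
    default:
      return state;
  }
}

const store = createStore(reducer, { user: null, properties: [] });

let notifications = 0;
store.subscribe(() => {
  notifications += 1;
});

store.dispatch({ type: 'SET_USER', payload: { name: 'Alice' } });
store.dispatch({ type: 'SET_PROPERTIES', payload: ['UA-1000-1', 'UA-1000-2'] });

console.log(store.getState(), notifications);
```

Because every change goes through dispatch and the reducer returns a new object instead of mutating the old one, any part of the app can read the current state from one place and be notified when it changes.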

3.2.4 React-sparklines

This package is a React component for making beautiful and expressive sparkline charts [6] [45].

3.2.5 Redux-form

Redux-form does a lot of work behind the scenes for us so that form data flows into our Redux application state. In practice, this means we can do things like instant validation, or enabling/disabling and hiding/showing components when a form input changes or the form is submitted. Moreover, redux-form keeps track of the value of each field, whether a field is focused, and even whether the user has interacted with a field [46].

3.2.6 Moment.js

Moment.js is a JavaScript library that makes working with Date objects a lot easier. It provides a very useful set of functions, like date localization, different display formats, date validators and more [47].

3.2.7 React-widgets

React-widgets offers a set of form input React components that are extensible and easy to use [48]. In our application, we use the react-widgets datepicker and dropdown components.

3.2.8 MaterializeCSS

MaterializeCSS is a CSS framework based on Material Design, and we used it throughout our app for styling and for handling layout changes [49].

3.2.9 Font Awesome

Font Awesome is an iconic font and CSS toolkit. By using it, we can display scalable vector icons in our application and customize them with the power of CSS [50].

3.2.10 Particle.js – React Component

Finally, for the spectacular particle interaction on the landing page of the app, we used the Particle.js – React Component, which is a wrapper for the original Particle.js library made by Vincent Garreau. This library lets us create customizable particle interactions [51, 52].

4 Implementation

Now that we have an overview of the technologies we used in iTrAlytics, we present how we fit them together to produce the final application. At this point it is worth mentioning that we took the extra step of deploying our application to the Heroku platform. Thus, we went through all the steps of the development lifecycle of a modern web application: we set up a local development environment and took the necessary steps to deploy our application to a production environment like Heroku.

Although we will not go into the details of how Heroku works, it is enough to know that it is a cloud Platform as a Service, meaning that we do not have to worry about the infrastructure and can focus on our app. Heroku offers a free plan, which we used to deploy iTrAlytics. Furthermore, Heroku provides a command line interface for interacting with the platform and makes it very easy to deploy our application by pushing our code with Git.

4.1 Architecture High-Level overview In the following figure (Figure 4) we observe an overview of how our application works. When a user enters the iTrAlytics URL into their browser, we are going to respond to them by sending an HTML document with a bundle JavaScript file that contains our React app application. While a user interacts with the front-end React app, we want to show them some information. This information is handled from the back-end of our application. The React app will communicate with our own Express API, which will handle all the communication with Google APIs and the Mongo database. The React app will never directly communicate with either the Google APIs or the Mongo database, where we store our user information and Google Analytics profile information. For this purpose, an API has been set up to deal with this. The communication between our front-end and the Ex- press API is done via HTTP request which return some amount of data in JSON format.


Figure 4: Application technologies overview

Next, we are going to see how things work in the back-end and afterwards we will move on to the front-end.

4.2 iTrAlytics Back-end

Our back-end is powered by Node.js and Express. We run a Node server on a specific port of our system (e.g. http://localhost:5000 in our development environment), which can receive requests and send back responses. Express then looks at each request and, if we have set up a Route Handler that matches it, Express handles the request and sends a response back to whoever made it.

4.2.1 Authentication Related Route Handlers

The first Route Handler we wrote was for signing in users with Google OAuth.

    app.get(
      '/auth/google',
      passport.authenticate('google', {
        scope: [
          'https://www.googleapis.com/auth/userinfo.profile',
          'https://www.googleapis.com/auth/userinfo.email',
          'https://www.googleapis.com/auth/analytics',
          'https://www.googleapis.com/auth/analytics.edit'
        ],
        accessType: 'offline',
        approvalPrompt: 'force'
      })
    );

The above sample code will handle any request made to our server with this specific route: http://localhost:5000/auth/google. We mentioned in the previous chapter that we are using the Passport.js library to handle authentication, alongside the google strategy of Passport.js, which instructs Passport.js how to handle Google OAuth. In order to be able to use Google OAuth, we first need to get a client ID and a client secret from the Google OAuth service.
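To make the options passed to passport.authenticate() more concrete, the following sketch builds by hand the kind of Google consent-screen URL that the browser is redirected to. It is an illustration under stated assumptions, not the code Passport.js runs: the client ID and the callback URL are placeholders, and prompt=consent is used here as the current equivalent of the older approvalPrompt: 'force' option.

```javascript
// Sketch only: builds a Google OAuth 2.0 authorization URL by hand.
// CLIENT_ID and REDIRECT_URI are hypothetical placeholders, not project values.
const CLIENT_ID = 'YOUR_CLIENT_ID.apps.googleusercontent.com';
const REDIRECT_URI = 'http://localhost:5000/auth/google/callback';

// The same scopes that are requested in the /auth/google route handler.
const SCOPES = [
  'https://www.googleapis.com/auth/userinfo.profile',
  'https://www.googleapis.com/auth/userinfo.email',
  'https://www.googleapis.com/auth/analytics',
  'https://www.googleapis.com/auth/analytics.edit'
];

function buildGoogleAuthUrl(clientId, redirectUri, scopes) {
  const url = new URL('https://accounts.google.com/o/oauth2/v2/auth');
  url.searchParams.set('client_id', clientId);
  url.searchParams.set('redirect_uri', redirectUri);
  url.searchParams.set('response_type', 'code');
  // Scopes are sent as one space-delimited string.
  url.searchParams.set('scope', scopes.join(' '));
  // 'offline' asks Google to also issue a Refresh token.
  url.searchParams.set('access_type', 'offline');
  // Always show the consent screen (assumed equivalent of approvalPrompt: 'force').
  url.searchParams.set('prompt', 'consent');
  return url.toString();
}

const authUrl = buildGoogleAuthUrl(CLIENT_ID, REDIRECT_URI, SCOPES);
console.log(authUrl);
```

The access_type parameter is what later allows the back-end to refresh an expired Access token without asking the user to sign in again.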

Figure 5: Enable Google APIs and get credentials

By visiting the https://console.developers.google.com page, we can enable the Google APIs that our app will make use of and also get the essential API credentials for the OAuth process. In Figure 5, we first (see No.1) ensure that we have selected the appropriate project, or we create a new one if we do not already have one. Then (see No.2) we need to enable some APIs. iTrAlytics makes use of the (see No.3) Analytics API, Google Analytics Reporting API and Google+ API (the latter is for the OAuth process). Finally, we visit the Credentials link (see No.4).


Figure 6: Google OAuth Credentials

From the Credentials page (see Figure 6) we first need the Client ID and the Client secret (see #1). We use those in our google strategy for Passport.js (see Appendix 3, Figure 13). Then we declare the origin URL (see #2 of Figure 6). This is the URL where the request is coming from, and it is the root URL of our back-end Express app. Finally, we declare the authorized redirect URIs (see #3 of Figure 6). Those are the URIs to which users are redirected after they grant our app permission to access their Google data. Exactly which data our application can access is declared in the scope² array of the “/auth/google” route handler above. This is a custom route which we handle in our Express app with a Route Handler (see Appendix 3, Figure 15).

At this point, we have access to four variables that come back to us after the OAuth process with Google, of which we are interested in only three. We get an Access Token value, a Refresh Token value and the profile variable, which contains the Google ID of the user, their email and display name. We store this information in our Mongo database, both for future retrieval of user information and to authenticate user requests to Google APIs.

Our MongoDB instance is in the cloud, managed by mLab. mLab is a cloud database service (or Database-as-a-Service) which hosts MongoDB databases [53]. It offers a free plan with some storage limitations, which is enough for our storage needs.

2 Scope URLs are predefined by Google; a full list of them, together with the access each one grants, is available at: https://developers.google.com/identity/protocols/googlescopes

We use Mongoose.js to connect to our MongoDB, which is a straightforward process. We only need to provide the URL of our MongoDB, which is given to us by the mLab service.

    const keys = require('./config/keys');
    const mongoose = require('mongoose');

    // Connecting to MongoDB using mongoose to our application
    // Use native promises
    mongoose.Promise = global.Promise;
    mongoose
      .connect(keys.mongo.URI, { useMongoClient: true })
      .then(() => console.log('Connection to MongoDB successful.'))
      .catch(err => console.error('Could not connect to the database: ', err));

We store our sensitive data, such as the Google related data discussed above and our mLab MongoDB URL, in a separate file called “keys.js”, which we require whenever we need any of that data. Inside our keys.js file we conditionally load a separate file that actually contains the keys we need, depending on which environment we are in: development or production.

    /**
     * File    : keys.js
     * Project : iTrAlytics
     * Author  : Apostolos Gouvalas
     */

    // keys.js - figure out what set of credentials to return
    if (process.env.NODE_ENV === 'production') {
      // we are in production - return the prod set of keys
      module.exports = require('./prod');
    } else {
      // we are in development - return the dev keys
      module.exports = require('./dev');
    }
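The prod.js and dev.js files themselves are not shown here. A common arrangement, sketched below purely as an assumption, is for the production file to read its values from environment variables (which platforms such as Heroku let us configure), so that no secret is committed to Git. The environment variable names are hypothetical; only the keys.google.clientID, keys.google.clientSecret and keys.mongo.URI property paths come from the code in this chapter.

```javascript
// Hypothetical sketch of what a prod.js could look like.
// The environment variable names below are assumptions made for illustration.
function buildProdKeys(env) {
  const required = ['GOOGLE_CLIENT_ID', 'GOOGLE_CLIENT_SECRET', 'MONGO_URI'];
  const missing = required.filter(name => !env[name]);
  if (missing.length > 0) {
    // Fail early instead of starting the server with incomplete credentials.
    throw new Error('Missing environment variables: ' + missing.join(', '));
  }
  return {
    google: {
      clientID: env.GOOGLE_CLIENT_ID,
      clientSecret: env.GOOGLE_CLIENT_SECRET
    },
    mongo: {
      URI: env.MONGO_URI
    }
  };
}

// In a real prod.js we would export: module.exports = buildProdKeys(process.env);
// Demonstration with made-up values:
const demoKeys = buildProdKeys({
  GOOGLE_CLIENT_ID: 'demo-client-id',
  GOOGLE_CLIENT_SECRET: 'demo-client-secret',
  MONGO_URI: 'mongodb://user:password@host:12345/itralytics-demo'
});
console.log(demoKeys.mongo.URI);
```

Because both files would export an object of the same shape, the rest of the application can require keys.js without knowing which environment it runs in.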

The actual MongoDB URL in our file has the following form:

The obfuscated part is the username and password of the user we have set up in the mLab service, who is able to connect to our MongoDB called “itralytics-dev”. We have also set up two more route handlers regarding authentication. The first one is /api/currentUser:

    /**
     * GET /api/currentUser
     * req: incoming Request
     * res: outgoing Response
     */
    app.get('/api/currentUser', (req, res) => {
      // allow only a subset of User model to be accessible by a user
      let forUser;
      if (req.user) {
        forUser = _.pick(req.user, ['email', 'name']);
      } else {
        forUser = false;
      }
      res.send(forUser || req.user);
    });
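The whitelisting step in the handler above can be illustrated without Express. The sketch below uses a small plain-JavaScript stand-in for Lodash's pick() and a made-up user record (all field values are hypothetical) to show that the tokens never reach the response.

```javascript
// Plain-JavaScript stand-in for Lodash's _.pick(object, keys), for illustration only.
function pick(source, keys) {
  const result = {};
  keys.forEach(key => {
    if (key in source) {
      result[key] = source[key];
    }
  });
  return result;
}

// Hypothetical user record; the values are made up.
const storedUser = {
  googleId: '1234567890',
  email: 'jane@example.com',
  name: 'Jane Doe',
  googleAccessToken: 'secret-access-token',
  googleRefreshToken: 'secret-refresh-token'
};

// Only the whitelisted properties are copied into the response object.
const forUser = pick(storedUser, ['email', 'name']);
console.log(forUser); // { email: 'jane@example.com', name: 'Jane Doe' }
```

Whitelisting the fields to expose, rather than deleting the sensitive ones, means that a field added to the User model later stays private by default.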

The /api/currentUser route handler returns a subset of the User model, containing only the email and the name of the user, as long as it finds a logged-in user. Our User model holds some sensitive information, and thus we use the pick() function of the Lodash library, which lets us take a subset of an object's properties. The second route handler is used for the logout process, which is handled by /api/logout:

    /**
     * GET /api/logout
     */
    app.get('/api/logout', (req, res) => {
      req.logout();
      req.session = null;
      res.redirect('/');
    });

This route handler logs out a user, destroys their session and redirects them to our root path “/”. Now that we have covered our authentication related route handlers in some detail, we are going to analyze our Google Analytics related route handlers. This part of the project is particularly interesting, as these route handlers act as the intermediaries between our front-end and the Google APIs. It is here that we perform the logic needed to return the correct data to our front-end.

4.2.2 Google Analytics Related Route Handlers

To make requests to Google APIs we have some prerequisites. First, we need to confirm that whoever made the request is an authenticated user, and only then do we check their Access token (the one we got from Google during the OAuth process). Access tokens are sent alongside our requests to Google APIs so that Google can authenticate us, and they are only valid for 3600 seconds (one hour). Thus, we had to add some logic to check whether the Access token is still valid and to refresh it otherwise. For making our requests to Google APIs we used the request library with promises.

Since we want to run the above logic on our requests, we rely on an Express feature called middleware. Middleware functions are pieces of code that we or others write. We can run this code in our route handlers to perform checks, fetch data or apply whatever logic we need. Finally, a middleware can decide whether the request will continue or needs to be stopped.

4.2.2.1 requireLogin Middleware

In this middleware we check whether the user is logged in. If the user is not logged in, we return an error message alongside a 401 status code, indicating an Unauthorized request. If, however, the user is indeed logged in, we call the next() function, which runs the next middleware, if we have chained one, or continues to our actual route handling code.

    // route middleware to make sure a user is logged in
    module.exports = (req, res, next) => {
      console.log('*** requireLogin is Running...');
      if (!req.user || !req.isAuthenticated()) {
        return res.status(401).send({ error: 'You must log in!' });
      }

      // there is a logged in user
      next();
    };
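Because a middleware is just a function of (req, res, next), its behaviour can be checked without starting a server. The sketch below repeats the requireLogin logic (without the logging) and calls it with minimal stand-ins for the Express request and response objects; the fakeResponse helper is hypothetical and only implements the two methods the middleware uses.

```javascript
// Same decision logic as the requireLogin middleware above.
const requireLogin = (req, res, next) => {
  if (!req.user || !req.isAuthenticated()) {
    return res.status(401).send({ error: 'You must log in!' });
  }
  next();
};

// Hypothetical, minimal stand-in for an Express response object.
function fakeResponse() {
  const res = { statusCode: 200, body: undefined };
  res.status = code => { res.statusCode = code; return res; };
  res.send = body => { res.body = body; return res; };
  return res;
}

// Case 1: anonymous request, so the chain must stop with a 401.
const anonRes = fakeResponse();
let anonNextCalled = false;
requireLogin({ isAuthenticated: () => false }, anonRes, () => { anonNextCalled = true; });

// Case 2: logged-in request, so next() must be called.
const userRes = fakeResponse();
let userNextCalled = false;
requireLogin(
  { user: { name: 'Jane Doe' }, isAuthenticated: () => true },
  userRes,
  () => { userNextCalled = true; }
);

console.log(anonRes.statusCode, anonNextCalled); // 401 false
console.log(userRes.statusCode, userNextCalled); // 200 true
```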

4.2.2.2 validateAccessToken Middleware

This middleware fetches the Access token, which we store in our User model during the OAuth process, and makes a request to https://www.googleapis.com/oauth2/v3/tokeninfo?access_token=OurAccessTokenHere. The request returns an object with some information if the Access token is still valid, or an error if it is not. In the first case, where the user still has a valid Access token, we call next() in order to continue to our route handling code. Otherwise, we call our refreshAccessToken middleware, which tries to refresh the Access token.

    const rp = require('request-promise-native');
    const refreshToken = require('./refreshAccessToken');

    /**
     * get Token Info
     * GET https://www.googleapis.com/oauth2/v3/tokeninfo?access_token=${access_token}
     */
    module.exports = async (req, res, next) => {
      console.log('*** validateAccessToken is Running...');
      if (!req.user || !req.isAuthenticated()) {
        return res.status(401).send({ error: 'You must log in!' });
      }

      try {
        // the request rejects if Google does not accept the token
        const tokenInfo = await rp(
          `https://www.googleapis.com/oauth2/v3/tokeninfo?access_token=${req.user.googleAccessToken}`
        );

        console.log('/!\\ Your AccessToken is still valid.');
        next();
      } catch (error) {
        console.log('/!\\ Your AccessToken has expired, we will try to refresh it.');
        // call refresh token
        refreshToken(req, res, next);
      }
    };
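Since Access tokens live for 3600 seconds, an alternative to asking Google on every request would be to remember when the token was issued and compare timestamps locally. This is not what iTrAlytics does; it is a sketch of a different technique, and the idea of storing an issue time on the User model is a hypothetical addition.

```javascript
// Sketch of a local expiry check (an alternative technique, not the thesis implementation).
const TOKEN_LIFETIME_SECONDS = 3600; // lifetime of a Google Access token
const SAFETY_MARGIN_SECONDS = 60;    // treat the token as expired slightly early

// issuedAtMs: when the token was obtained, in milliseconds since the epoch
// (a hypothetical field we would have to store next to the token).
function isAccessTokenExpired(issuedAtMs, nowMs = Date.now()) {
  const usableForMs = (TOKEN_LIFETIME_SECONDS - SAFETY_MARGIN_SECONDS) * 1000;
  return nowMs >= issuedAtMs + usableForMs;
}

const issuedAt = Date.UTC(2017, 11, 18, 10, 0, 0);
const tenMinutesLater = issuedAt + 10 * 60 * 1000;
const oneHourLater = issuedAt + 60 * 60 * 1000;

console.log(isAccessTokenExpired(issuedAt, tenMinutesLater)); // false
console.log(isAccessTokenExpired(issuedAt, oneHourLater));    // true
```

A local check saves one HTTP round-trip per request, but it cannot detect a token that the user has revoked early, which the tokeninfo request does detect.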

4.2.2.3 refreshAccessToken Middleware

This is our last middleware. It is called only if the user's Access token is invalid, and it tries to refresh it. The logic here is a bit more complicated, as we need to make a POST request to Google that includes our Google Client ID, our Client secret and the user's Refresh token (which we receive the first time a user authenticates with our app through OAuth and which we store in our database). If we successfully get a response from Google, we also have to update the user's data in the database, so that the newly refreshed Access token is stored. Only if that succeeds do we continue to the rest of our route handler.

    const axios = require('axios');
    const mongoose = require('mongoose');
    const keys = require('../config/keys');

    mongoose.Promise = global.Promise;
    const User = mongoose.model('users');

    /**
     * Refresh Token
     * POST url: https://www.googleapis.com/oauth2/v4/token?client_id=${c_id}&client_secret=${c_scrt}&refresh_token=${r_token}&grant_type=refresh_token
     * @param req
     * @param res
     * @param next
     * @returns {Promise<void>}
     */
    module.exports = async (req, res, next) => {
      console.log('*** refreshAccessToken is Running...');
      try {
        let c_id = keys.google.clientID;
        let c_scrt = keys.google.clientSecret;
        let r_token = req.user.googleRefreshToken;

        const tokenRefresh = await axios({
          method: 'post',
          url: `https://www.googleapis.com/oauth2/v4/token?client_id=${c_id}&client_secret=${c_scrt}&refresh_token=${r_token}&grant_type=refresh_token`
        });

        if (tokenRefresh.data.access_token) {
          // Token successfully refreshed
          console.log('/!\\ Your AccessToken has been refreshed. We will try to save it in the db now.');

          // store the new Access token on the user document
          const user = await User.findOneAndUpdate(
            { googleId: req.user.googleId },
            {
              $set: {
                googleAccessToken: tokenRefresh.data.access_token,
                googleParams: tokenRefresh.data
              }
            },
            { new: true }
          );

          // fail to update the user
          if (!user) {
            throw new Error('Error while trying to save the new AccessToken to the db.');
          }

          // re-establish the login session with the updated user
          req.login(user, function(err) {
            if (err) return next(err);
            next();
          });
        } // End if (tokenRefresh.data.access_token)...
      } catch (err) {
        console.log('/!\\ An error occurred while trying to refresh the token:');
        // axios exposes the HTTP status on err.response (when a response was received)
        console.log('Status Code: ', err.response ? err.response.status : undefined);
        console.log('Message: ', err.message);
        return next(err);
      }
    };
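The refresh request above passes its parameters in the query string. OAuth 2.0 token endpoints also accept them as a form-encoded request body, which keeps the Client secret out of URLs that might end up in logs. The sketch below only builds such a body; the credential values are placeholders, and actually sending the request (for example with axios) is left out.

```javascript
// Sketch: build an application/x-www-form-urlencoded body for a token refresh.
// All credential values used below are placeholders.
function buildRefreshBody(clientId, clientSecret, refreshToken) {
  const params = new URLSearchParams();
  params.set('client_id', clientId);
  params.set('client_secret', clientSecret);
  params.set('refresh_token', refreshToken);
  params.set('grant_type', 'refresh_token');
  // toString() percent-encodes reserved characters in the values.
  return params.toString();
}

const refreshBody = buildRefreshBody('my-client-id', 'my-client-secret', '1/abc def');
console.log(refreshBody);
```

The resulting string would be sent as the POST body with the Content-Type header set to application/x-www-form-urlencoded.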

4.2.2.4 GET /api/analytics/accountSummary

This is the first of our Google Analytics related route handlers. It makes a request to the Google Analytics Management API, so that we can retrieve all the web properties declared in the user's Google Analytics account. Since this information is fairly static, and we use each web property's profile id in our next route handler, we store it in the session for faster retrieval.

    /**
     * GET /api/analytics/accountSummary
     */
    app.get(
      '/api/analytics/accountSummary',
      requireLogin,
      validateAccessToken,
      async (req, res) => {
        // do we have a cached version?
        if (req.session['gaProfiles']) {
          console.log('Google Profile from cache');
          console.log(JSON.stringify(req.session['gaProfiles'], null, 2));
          res.status(200).send(JSON.parse(req.session['gaProfiles']));
          return;
        }
        try {
          const analyticsAccounts = await rp({
            url: 'https://www.googleapis.com/analytics/v3/management/accounts',
            auth: {
              bearer: req.user.googleAccessToken
            },
            json: true
          });
          // get the account id
          let accountId = analyticsAccounts.items[0].id;
          let profiles = [];
          const analyticsWebProperties = await rp({
            url: `https://www.googleapis.com/analytics/v3/management/accounts/${accountId}/webproperties`,
            auth: {
              bearer: req.user.googleAccessToken
            },
            json: true
          });
          if (analyticsWebProperties.items.length < 1) {
            throw new Error('Your Account does not have any Web Properties. Try add some or try with another account.');
          }

          analyticsWebProperties.items.forEach(function(webProp) {
            if (webProp.defaultProfileId) {
              profiles.push({
                id: 'ga:' + webProp.defaultProfileId,
                name: webProp.name,
                site: webProp.websiteUrl
              });
            }
          });

          req.session['gaProfiles'] = JSON.stringify(profiles);

          res.status(200).send(profiles);
        } catch (error) {
          console.error('/!\\ Error while trying to get Analytics Profiles:');
          console.log('Status Code: ', error.statusCode);
          console.log('Message: ', error.message);
          res.status(400).send(error);
        }
      }
    );
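The mapping from web properties to profile entries in the handler above can be isolated as a pure function, which makes it easy to check with sample data. The sample objects below are made up; only the name, websiteUrl and defaultProfileId fields that the handler reads are assumed.

```javascript
// Sketch: the web property -> profile mapping of the handler as a pure function.
function toProfiles(webProperties) {
  return webProperties
    // skip web properties that have no default profile (view)
    .filter(webProp => webProp.defaultProfileId)
    .map(webProp => ({
      id: 'ga:' + webProp.defaultProfileId,
      name: webProp.name,
      site: webProp.websiteUrl
    }));
}

// Made-up sample data in the shape the handler expects.
const sampleWebProperties = [
  { name: 'Main site', websiteUrl: 'https://example.com', defaultProfileId: '111' },
  { name: 'Unconfigured', websiteUrl: 'https://example.org' }
];

const sampleProfiles = toProfiles(sampleWebProperties);
console.log(sampleProfiles);
// [ { id: 'ga:111', name: 'Main site', site: 'https://example.com' } ]
```

Note that the handler reads only the first account returned by the Management API (items[0]), so a user with several Google Analytics accounts would see the web properties of the first one only.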

4.2.2.5 GET /api/analytics/reportsBatchGet

In this route handler we make the actual request to Google for the Google Analytics Report data that we want to present in the front-end. The route handler checks whether query parameters were passed along to specify the web property from which we want to fetch the data, the date range (start/end date), the metrics (pageviews or unique pageviews) and the dimensions (year, month, week or day). For every parameter except the web property profile id, we check whether it was passed as a query argument to our route handler and, if it was not, we give it a default value. A missing profile id results in a 400 response. If the request succeeds, we respond by sending our Analytics data alongside a 200 status code to indicate that everything went as intended.

    /**
     * GET /api/analytics/reportsBatchGet
     */
    app.get(
      '/api/analytics/reportsBatchGet',
      requireLogin,
      validateAccessToken,
      async (req, res) => {
        try {
          if (!req.query.viewId) {
            console.error('/!\\ Error while trying to get Analytics Reports Data: NO profile ID provided');
            return res.status(400).send();
          }
          // set the params from the query OR to the defaults
          let viewId = req.query.viewId;

          let startDate =
            (req.query.startDate === 'undefined' || !req.query.startDate)
              ? '2017-07-01'
              : req.query.startDate;
          let endDate =
            (req.query.endDate === 'undefined' || !req.query.endDate)
              ? '2017-10-29'
              : req.query.endDate;

          let metricsExpression =
            (req.query.metricsExpression === 'undefined' || !req.query.metricsExpression)
              ? 'ga:pageviews'
              : req.query.metricsExpression;

          let dimensionsName =
            (req.query.dimensionsName === 'undefined' || !req.query.dimensionsName)
              ? 'ga:month'
              : req.query.dimensionsName;

          const analyticsReportsBatchGet = await rp({
            method: 'POST',
            url: 'https://analyticsreporting.googleapis.com/v4/reports:batchGet',
            auth: {
              bearer: req.user.googleAccessToken
            },
            body: {
              reportRequests: [
                {
                  viewId: viewId,
                  dateRanges: [
                    {
                      startDate: startDate,
                      endDate: endDate
                    }
                  ],
                  metrics: [
                    {
                      expression: metricsExpression
                    }
                  ],
                  dimensions: [
                    {
                      name: dimensionsName
                    }
                  ]
                }
              ]
            },
            json: true
          });

          let analyticsData = analyticsReportsBatchGet.reports[0].data;

          res.status(200).send(analyticsData);
        } catch (error) {
          console.error('/!\\ Error while trying to get Analytics Reports Data:');
          console.log('Status Code: ', error.statusCode);
          console.log('Message: ', error.message);
          res.status(400).send();
        }
      }
    );
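The defaulting logic of this handler repeats the same check four times. As a sketch of one possible tidy-up (not the code used in iTrAlytics), the check can be moved into a small helper and the request body built by a pure function. The helper and function names are hypothetical; the default values are the ones used in the handler above.

```javascript
// Defaults taken from the route handler above.
const REPORT_DEFAULTS = {
  startDate: '2017-07-01',
  endDate: '2017-10-29',
  metricsExpression: 'ga:pageviews',
  dimensionsName: 'ga:month'
};

// Query-string values are strings, so a missing value can arrive as the text 'undefined'.
function valueOrDefault(value, fallback) {
  return value === 'undefined' || !value ? fallback : value;
}

// Builds the body of a reports:batchGet request from the incoming query object.
function buildReportRequestBody(query) {
  return {
    reportRequests: [
      {
        viewId: query.viewId,
        dateRanges: [
          {
            startDate: valueOrDefault(query.startDate, REPORT_DEFAULTS.startDate),
            endDate: valueOrDefault(query.endDate, REPORT_DEFAULTS.endDate)
          }
        ],
        metrics: [
          { expression: valueOrDefault(query.metricsExpression, REPORT_DEFAULTS.metricsExpression) }
        ],
        dimensions: [
          { name: valueOrDefault(query.dimensionsName, REPORT_DEFAULTS.dimensionsName) }
        ]
      }
    ]
  };
}

// Example: only the view id and the metric are supplied by the caller.
const sampleRequestBody = buildReportRequestBody({
  viewId: 'ga:111',
  startDate: 'undefined',
  metricsExpression: 'ga:uniquePageviews'
});
console.log(JSON.stringify(sampleRequestBody, null, 2));
```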

4.2.3 Socket.io & Google Analytics Real Time Reporting API

In section 4.2.1 we mentioned that an API has to be enabled from the Google APIs web page before we can use it in our application. The Google Analytics Real Time Reporting API is somewhat different, as it is available as a limited beta and we first had to sign up to access it. Here we discuss how we used the Google Analytics Real Time Reporting API in combination with the socket.io library in order to deliver real-time analytics to our users. This time we took a different approach from the route handling we saw previously, so that we could experiment with the different possibilities we had.

We made use of “rooms” with socket.io, like the ones found in a chat application, but instead of exchanging communication messages we send the real-time analytics data to the user. A room is an arbitrary channel that we define and which sockets can join and leave. Since we are on the back-end of our application, we will see how we handle things here, and in the next chapter we will see the front-end of the socket.io setup.

First, we configured our Express server to allow incoming web socket connections. This means that our server is able to accept connections, and we set up the client to make the connection. This way, we can establish a persistent connection between the client (front-end) and our server.

    /**
     * setup the Socket.io server
     */
    const io = require('socket.io').listen(server); // see footnote 3

Now we are ready to emit or listen for events. We, then, listen for clients connecting to our server.

    /**
     * Listen when a client connects to our server
     */
    io.use(function (socket, next) {
      // wrap and use the express session middleware
      sessionMiddleware(socket.request, {}, next);
    })
    .on('connection', socket => {
      console.log(` <~~> Server: New user connected to Server.`);

Here is the first tricky part. We start the whole “fetch and send the real-time data” process automatically when a user enters a specific part of our front-end application, and we do not use the route handler approach as before. We therefore need some other way to get the user data, so that we can authenticate with Google and make the requests to the Real Time Reporting API. For that reason, we wrap the session middleware (as we can see in the above code snippet) around our socket.io connection. With that in place, we can get the logged-in user id and use it to fetch the user data from the database. As we will see in the next chapter, our front-end emits an event when a user enters a specific part of the application. Thus, in our back-end we listen for that event:

    // listen for 'getReal' events
    socket.on('getReal', async function(data) {
      console.log(`<~~ Server: Received a 'getReal' event.`);

3 Where server is our Express app server.

Once we receive such an event, we get the user id from the session and make a request to our database to fetch the whole user data:

    // get the connected user id
    let clientUserId = socket.request.session.passport.user;
    // convert string Id to ObjectId
    let clientUserObjId = mongoose.Types.ObjectId(clientUserId);
    user = await findAndReturnClientUser(clientUserObjId);

Alongside the “getReal” event, the front-end also sends us a room name. We use that to establish a room-specific connection for our data exchange.

    // join the 'Get Real' room, to establish a room-specific channel layered over our socket connection.
    socket.join(data.room);
    console.log(`~~> Server joined the room '${data.room}'`);

As soon as our server-side socket joins this room, we start making requests to the Real Time Reporting API every fifteen seconds⁴ and send the data to our front-end.

    /** call API and emit every 15 sec */
    setInterval(
      () => getRealTimeDataApiAndEmit(socket),
      15000
    );

Finally, we listen for another event from the client side, which denotes that the user left the Real-time part of our application and thus we should leave the room and stop making requests to Google's API.

    // listen for 'UnReal' events
    socket.on('UnReal', data => {
      console.log(`~~> Server: Client left the room, so end the subscription to that room.`);
      // end the client subscription to that room
      socket.leave(data.room);
    });
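In the snippets shown, the 'UnReal' handler leaves the room, but the timer created by setInterval is not visibly stopped. One way to make sure the API calls end when the user leaves is to keep the timer handle and clear it, as sketched below. Node's EventEmitter stands in for the socket.io socket, and fetchAndEmit is a hypothetical placeholder for getRealTimeDataApiAndEmit.

```javascript
const EventEmitter = require('events');

const POLL_INTERVAL_MS = 15000; // same 15 second interval as above

// Wires the 'getReal' / 'UnReal' events to a timer that is cleaned up on leave.
function attachRealTimeHandlers(socket, fetchAndEmit) {
  let timer = null;

  const stopPolling = () => {
    if (timer !== null) {
      clearInterval(timer);
      timer = null;
    }
  };

  socket.on('getReal', data => {
    socket.join(data.room);
    stopPolling(); // avoid stacking timers if the event arrives twice
    timer = setInterval(() => fetchAndEmit(socket), POLL_INTERVAL_MS);
  });

  socket.on('UnReal', data => {
    stopPolling();
    socket.leave(data.room);
  });

  // also stop when the connection itself goes away
  socket.on('disconnect', stopPolling);

  return { isPolling: () => timer !== null };
}

// Demonstration with a fake socket (an EventEmitter plus join/leave stubs).
const fakeSocket = new EventEmitter();
fakeSocket.rooms = new Set();
fakeSocket.join = room => fakeSocket.rooms.add(room);
fakeSocket.leave = room => fakeSocket.rooms.delete(room);

const poller = attachRealTimeHandlers(fakeSocket, () => {});
fakeSocket.emit('getReal', { room: 'realtime' });
const pollingWhileInRoom = poller.isPolling();
fakeSocket.emit('UnReal', { room: 'realtime' });
const pollingAfterLeaving = poller.isPolling();

console.log(pollingWhileInRoom, pollingAfterLeaving); // true false
```

Keeping one timer per socket also helps with the quota concern mentioned in the footnote, since abandoned timers would otherwise keep issuing requests.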

Having analyzed the back-end application structure we are now ready to see how things work in the front-end with the create-react-app.

4 Due to quota limitations of the Real Time Reporting API, and because we make a lot of requests, we have to set a time interval in order not to exceed the limits.

4.3 iTrAlytics Front-end

We mentioned in section 3.2.1 that we used create-react-app in order to use React in the front-end part of the application. Doing so speeds up the development process considerably. Running a React application requires a certain amount of pre-configuration: configuring webpack loaders to bundle our application files, adding webpack plugins for extra functionality, and writing Babel configuration so that our modern JavaScript (ES6) is translated to JavaScript that most browsers can understand. Create-react-app comes preconfigured with all of the above, and since it is made by the original React developers we can be confident that the configuration is sound.

4.3.1 Initial create-react-app setup

First of all, create-react-app provides a simple CLI, which we run like this:

    > create-react-app client

The above command creates a folder named “client” with a predefined scaffolding for our application and installs all initial dependencies. When the installation is finished, we can navigate to this folder and run:

    > yarn start

This starts a webpack development server running at: http://localhost:3000

By visiting the above URL, we have access to the front-end of our application running React. The server at port 3000 only runs locally during the development stage; we do not use it in production. Create-react-app provides a build script, which we run before deploying our application to production (as we did with Heroku), that bundles our application into an optimized version. This optimized version consists of an index.html file and a static folder which contains the CSS, JavaScript and media that we wrote and used in our application. In other words, we only need to run this server in the development environment.

However, since we already have a server running at port 5000 for the back-end part of the application, we needed a way to run the front-end server at the same time. While this can be done easily by starting each server from its respective folder, we found this time-consuming, as we had to restart our servers often during development. Instead, we used a module called “concurrently”, which lets us chain commands in our package.json file, so that a single command starts both servers.

To get a better understanding of the architecture of our application at this point, we will have a look at the diagram in Figure 7.


Figure 7: Running two servers on development process.

From the image above, we see an overview of our application architecture. From top to bottom, we have our browser, when someone visits our application, and two servers. From left to right, we have our React server (the one run by webpack) and our Node.js/Express server. As we have seen, the Express server pulls information from MongoDB and sends us JSON data in response to the various requests we make. The React server takes a number of different components (Header.js, App.js etc.) and serves a bundle of JavaScript, so we can view our application. In other words, we have one server which serves the front-end application assets and another which serves all the data of the application. This setup may seem confusing or unnecessary right now, but in production we only have our Express server and nothing else. As we already mentioned, create-react-app has a build script which leaves us only with the JavaScript, CSS and media files that we need to run our React app.

The last piece of configuration we need, in order for our front-end React application to communicate with our Express back-end server, is a proxy. Create-react-app already ships with proxy functionality, and thus we only need to configure it in our package.json file⁵, like this:

    "proxy": {
      "/auth/google": {
        "target": "http://localhost:5000"
      },
      "/api/*": {
        "target": "http://localhost:5000"
      }
    },

Basically, here we tell our React development server to forward any request to the /auth/google or /api/* paths to our Express server. Figure 8 shows how our requests are handled by the proxy.

Figure 8: Forward requests to our Express server.
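The forwarding decision can be pictured as a simple lookup from request path to target server. The sketch below is a deliberate simplification written for illustration, not the matching algorithm that the development server actually uses; the function name is hypothetical, while the rules are the ones from our package.json.

```javascript
// The proxy rules from package.json, as a plain object.
const PROXY_RULES = {
  '/auth/google': { target: 'http://localhost:5000' },
  '/api/*': { target: 'http://localhost:5000' }
};

// Simplified matcher (assumption): a trailing '*' means "any path with this prefix",
// and a rule without '*' matches the path itself and anything below it.
function resolveProxyTarget(path, rules) {
  for (const pattern of Object.keys(rules)) {
    const prefix = pattern.endsWith('*') ? pattern.slice(0, -1) : pattern;
    if (path === pattern || path.startsWith(prefix)) {
      return rules[pattern].target;
    }
  }
  // no rule matched: the development server answers the request itself
  return null;
}

console.log(resolveProxyTarget('/api/currentUser', PROXY_RULES)); // http://localhost:5000
console.log(resolveProxyTarget('/auth/google', PROXY_RULES));     // http://localhost:5000
console.log(resolveProxyTarget('/dash', PROXY_RULES));            // null
```

In the third case no rule applies, so the request stays with the React development server and is resolved by the client-side routing described in the next section.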

With that in place we are ready to start developing our front-end application.

5 This package.json file resides in our client folder and is the configuration file of our create-react-app.

4.3.2 Single Page Application

We developed our React application as a Single Page Application (SPA). This approach differs from the traditional multi-page approach used, for example, in the development of a website. The main difference in an SPA lies in navigation: when we navigate through our application, we do not load an entirely new page. Our pages, also known as views in the context of SPAs, are loaded inline within the same initial page. This means that our views are composed of different parts, or components, and each component is in charge of a specific view in our application. When a user navigates from the menu to a specific page of our app, we simply load the appropriate component for the view that the user requested.

However, when we use a web application we have certain expectations regarding navigation. When we click on a menu link, we expect not only to see a change in the content of the page and in the URL displayed in the location bar of our browser, but also to be able to use the browser's back and forward buttons. In multi-page applications such functionality works out of the box, without us having to do anything. In SPAs, however, because we do not navigate to a new page, we have to provide this functionality ourselves. To do so, we need to add routing to our application. Routing is the mapping of URLs to views inside our single-page app.

Essentially, we have only one index.html with a plain HTML5 document, some CSS links and a single div inside it, like:

iTralytics - Get to know your users!

Then, create-react-app adds a bundle file containing all our JavaScript files. With React, we write JavaScript code which renders the HTML code we want to display in our application. We separate each view of our application into components which contain the logic and HTML for that specific view. In the end we have something like the following diagram in Figure 9.

Figure 9: iTrAlytics React components overview.

React offers a collection of components called React Router for handling routing in React applications, and we used it to map certain components to specific routes.

    import { BrowserRouter, Route } from 'react-router-dom';
    ...
    render() {
      return (

      );
    }

With the above code, we declare that we always want to show the Header and Footer, regardless of the path in the location bar. Then we say that we want to show the Landing component only when we are at the root URL of the application. Lastly, we specify two paths, /dash and /dash/realtime, which both load the Dashboard component. The Dashboard in turn conditionally renders the appropriate view, either the ReportsParams and ChartsList components or the RealTimeDisplay component, as well as the SideBar component.

Finally, in the application menus inside the Header and SideBar components, we make use of another React Router component, the Link component. The Link component is what makes it possible to navigate between the different routes we specified above. A Link is similar to an HTML anchor tag, but it does not cause the whole page to reload, which is the desired result. A sample of such a link is the dashboard link we show to logged-in users:

Dashboard
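The route-to-component mapping described in this section can be summarised as a plain function from the current path to the list of components that end up on screen. This is a simplified stand-in for what React Router does, written without JSX so that it runs in plain Node. The component names are taken from the text; the assumption that /dash shows the report views and /dash/realtime shows the real-time view is ours.

```javascript
// Simplified stand-in for the routing rules described above (not React Router itself).
function componentsFor(pathname) {
  const rendered = ['Header']; // always shown

  if (pathname === '/') {
    rendered.push('Landing'); // only at the exact root URL
  }
  if (pathname === '/dash') {
    // assumed: the reports view of the Dashboard
    rendered.push('Dashboard', 'SideBar', 'ReportsParams', 'ChartsList');
  }
  if (pathname === '/dash/realtime') {
    // assumed: the real-time view of the Dashboard
    rendered.push('Dashboard', 'SideBar', 'RealTimeDisplay');
  }

  rendered.push('Footer'); // always shown
  return rendered;
}

console.log(componentsFor('/'));
// [ 'Header', 'Landing', 'Footer' ]
console.log(componentsFor('/dash/realtime'));
// [ 'Header', 'Dashboard', 'SideBar', 'RealTimeDisplay', 'Footer' ]
```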