Transforming Options Market Data with the Dataflow SDK
Salvatore Sferrazza & Sebastian Just, FIS ([email protected] and [email protected])


Preface

When FIS (formerly SunGard) released the whitepaper Scaling to Build the Consolidated Audit Trail: A Financial Services Application of Cloud Bigtable in Q2 2015, a key module of the experiments was utilizing MapReduce for persisting data to Google Cloud Bigtable via the HBase API. At that time, our team had been looking forward to working with Google Cloud Dataflow as a MapReduce alternative, but Google Cloud Dataflow was not yet generally available (GA). In the months since the Bigtable whitepaper was released, the Google Cloud Dataflow managed service has been promoted to GA, and this whitepaper details our experiences applying the Google Cloud Dataflow SDK to the problem of options market data transformation.

Introduction

Much of the software development in the capital markets (and enterprise systems in general) revolves around the transformation and enrichment of data arriving from another system. Traditionally, developers have considered the activities around Extracting, Transforming and Loading (ETL) data a particularly unglamorous dimension of building software products. However, these activities encapsulate functions that are core to every tier of computing. Oftentimes, the activity involves legacy systems integration, and perhaps this is the genesis of ETL’s unpopular reputation. Nonetheless, when data-driven enterprises are tasked with harvesting insights from massive data sets, it is quite likely that ETL, in one form or another, is lurking nearby.

The Google Cloud Dataflow service and SDK represent powerful tools for myriad ETL duties. In this paper, we introduce some of the main concepts behind building and running applications that use Dataflow, then get “hands on” with a job to transform and ingest options market symbol data before storing the transformations within a Google BigQuery data set.

The specific transformations we’ll perform in this use case include the following:

• Ingest options market ticks and extract the contract’s expiration date, strike price and reference asset from the symbol associated with each tick (this parsing step is sketched below)
• Load the transformed data into a Google BigQuery schema that supports queries on these individual contract properties. For example, finding all the in-the-money (ITM) contracts traded between 2:00 and 3:00 PM requires comparing the strike price of a contract with the spot price of the underlying asset. Hence, having the strike price encoded within the options symbol string is an obstacle to any analytics that require arithmetic operations involving the strike price.
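To see what that extraction entails before any Dataflow machinery is introduced, the following plain-Java sketch parses a 21-character OSI/OCC-style option symbol (a six-character, space-padded root; a YYMMDD expiration; a C/P flag; and an eight-digit strike expressed in thousandths of a dollar). The class name and the assumption that ticks carry symbols in exactly this format are ours for illustration; the tick layout used later in the paper may differ.

import java.math.BigDecimal;

/**
 * Illustrative parser for a 21-character OSI/OCC-style option symbol,
 * e.g. "AAPL  151218C00200000" -> underlying AAPL, expiring 2015-12-18,
 * a call, with a strike price of 200.000.
 */
public final class OsiSymbol {
    public final String underlying;  // reference asset, e.g. "AAPL"
    public final String expiration;  // YYMMDD, as encoded in the symbol
    public final boolean isCall;     // 'C' = call, 'P' = put
    public final BigDecimal strike;  // strike price in dollars

    private OsiSymbol(String underlying, String expiration,
                      boolean isCall, BigDecimal strike) {
        this.underlying = underlying;
        this.expiration = expiration;
        this.isCall = isCall;
        this.strike = strike;
    }

    public static OsiSymbol parse(String symbol) {
        String underlying = symbol.substring(0, 6).trim(); // root is space-padded
        String expiration = symbol.substring(6, 12);       // YYMMDD
        boolean isCall = symbol.charAt(12) == 'C';
        // The final eight digits carry the strike in thousandths of a dollar,
        // precisely the encoding that blocks arithmetic on the raw symbol string.
        BigDecimal strike = new BigDecimal(symbol.substring(13, 21)).movePointLeft(3);
        return new OsiSymbol(underlying, expiration, isCall, strike);
    }
}

Once the strike is a BigDecimal rather than a substring, comparisons against the underlying asset’s spot price (the ITM query above) become straightforward.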
Objectives

• Introduce Google Cloud Dataflow’s programming and execution model
• Apply Dataflow to transform market data (specifically options prices) and persist the transformations into a BigQuery table for analysis

Prerequisites

While Dataflow’s SDK support is currently limited to Java, Google has a Dataflow SDK for Python currently in Early Access Program (EAP). As such, all examples herein are in Java, using Maven for builds and dependency management. We hope to publish a supplementary implementation of this same project using the Python SDK during public beta.

For the hands-on portion of the paper, the following conditions were required:

• Enabling the Dataflow API for a Google Developers Console project
• Enabling the BigQuery API for a Google Developers Console project
• Enabling billing on your Google Cloud Platform account
• Installing the Google Cloud SDK on your client operating system
• Installing Maven for managing the Java project
• For the Dataflow libraries specifically, adding the following dependency to your project’s pom.xml:

<dependency>
  <groupId>com.google.cloud.dataflow</groupId>
  <artifactId>google-cloud-dataflow-java-sdk-all</artifactId>
  <version>RELEASE</version>
</dependency>

Additionally, the source code may be cloned from the Git repository located at https://github.com/SunGard-Labs/dataflow-whitepaper. Although the source code repository is self-contained, parameters may need to be adjusted to reflect the specific Google Cloud project parameters for your environment.

Programming Dataflow

State maintenance and scalability

Because a full description of the role that application state plays in determining scalability is beyond the scope of this paper, please refer to this explanation, which effectively breaks down the trade-offs inherent to various techniques around application state maintenance.

Suffice it to say, state management is a critical factor contributing to a system’s scalability. A common solution for applications requiring near-linear scalability amid large workloads is parallelization. However, development for parallel processing can be tricky compared to more traditional serial workloads. As you’ll see, Google’s Cloud Dataflow SDK removes much of this friction with clear and concise API semantics.

Pipelines, ParDos and DoFns

The bedrock of any Dataflow job is the pipeline, which is responsible for the coordination of an individual Dataflow job. The actual logic of the job stages is performed by ParDo (for Parallel Do) subclasses that implement their logic within the template of a DoFn (for Do Function) implementation callback. A ParDo’s principal responsibility is the transformation of one (input) PCollection to another (output) PCollection.

At runtime, the pipeline and its component ParDo functions are analyzed by the Dataflow optimizer and factored into its execution plan, possibly merging steps along the way and sequencing them according to the semantics implied by the pipeline class and its component ParDo operations. There is no explicit relationship defining map, reduce or shuffle operations. Rather, the optimizer sequences operations based on the semantics of the input and output PCollections, similar to how a SQL query optimizer builds a query plan based on the selected columns, available indices, query predicates and table schema.

PCollections

Another central concept in the Dataflow model is the PCollection (for Parallel Collection), which represents the primary group of elements being processed by the Dataflow job at any stage in the pipeline. With the typical MapReduce programming model (e.g., Hadoop), key-value pairs are emitted from collections of elements. In contrast, a PCollection simply represents the collection of active elements within the pipeline and is available as input to ParDos and DoFns. In our experience, we have found that this both flattens the learning curve and simplifies the programming model overall compared to the currently available alternatives.

This design offers great flexibility, as the programmer is freed from having to implement a particular Map or Reduce interface imposed by the SDK. In particular, we find this enables greater focus on the business relevance of the associated inputs and outputs, as well as the specific nature of the elements that are processed. For example, the PCollection of one job stage may have a direct lineage from the PCollection of the previous stage, but it is not required to per se. This affords a more reactive/functional approach to programming — the output of one processing stage is simply the input of the subsequent stage, not dissimilar to an idealized callback-driven control flow.

It is also important to remember that each PCollection can be treated as a first-class citizen. A PCollection can be used multiple times in a single pipeline, effectively creating a new logical branch of pipeline execution each time.
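As a minimal sketch of these pieces working together (a Pipeline, ParDos wrapping DoFns, and one PCollection reused to create two logical branches), consider the following job. The bucket paths and DoFn bodies are illustrative assumptions rather than the paper’s hands-on code; the fixed-width slicing mirrors the earlier OSI symbol sketch.

import com.google.cloud.dataflow.sdk.Pipeline;
import com.google.cloud.dataflow.sdk.io.TextIO;
import com.google.cloud.dataflow.sdk.options.PipelineOptions;
import com.google.cloud.dataflow.sdk.options.PipelineOptionsFactory;
import com.google.cloud.dataflow.sdk.transforms.DoFn;
import com.google.cloud.dataflow.sdk.transforms.ParDo;
import com.google.cloud.dataflow.sdk.values.PCollection;

import java.math.BigDecimal;

public class BranchingSymbolPipeline {

    /** Emits the strike price (in dollars) encoded in an OSI-style symbol. */
    static class ExtractStrike extends DoFn<String, String> {
        @Override
        public void processElement(ProcessContext c) {
            BigDecimal strike =
                new BigDecimal(c.element().substring(13, 21)).movePointLeft(3);
            c.output(strike.toPlainString());
        }
    }

    /** Emits the six-character, space-padded root, i.e. the reference asset. */
    static class ExtractUnderlying extends DoFn<String, String> {
        @Override
        public void processElement(ProcessContext c) {
            c.output(c.element().substring(0, 6).trim());
        }
    }

    public static void main(String[] args) {
        PipelineOptions options =
            PipelineOptionsFactory.fromArgs(args).withValidation().create();
        Pipeline p = Pipeline.create(options);

        // The input operation bootstraps the initial PCollection.
        PCollection<String> symbols =
            p.apply(TextIO.Read.from("gs://<storage-bucket-on-gcs>/symbols*.txt"));

        // The same PCollection feeds two ParDos, creating two logical branches.
        PCollection<String> strikes = symbols.apply(ParDo.of(new ExtractStrike()));
        PCollection<String> roots = symbols.apply(ParDo.of(new ExtractUnderlying()));

        strikes.apply(TextIO.Write.to("gs://<storage-bucket-on-gcs>/out/strikes"));
        roots.apply(TextIO.Write.to("gs://<storage-bucket-on-gcs>/out/roots"));

        p.run();
    }
}

With no --runner argument the job executes locally on the DirectPipelineRunner; passing --runner=DataflowPipelineRunner together with --project and --stagingLocation submits the identical code to the managed service, where the optimizer described above produces the execution plan.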
Job inputs

Although Dataflow allows for flexibility with regards to data sources, there is one thing that all jobs share: an input operation such as a file on Google Cloud Storage, the contents of a BigQuery table or query, a message arriving on a Cloud Pub/Sub topic, or even your own customized I/O source — which always bootstraps the initial PCollection for the pipeline. During the lifecycle of a pipeline, many different (side) inputs may be employed as needed to transform, filter and/or process the currently active PCollection within the pipeline.

As of this writing, the I/O APIs supported by the Dataflow SDK include the following:

• TextIO (text file access via URI, including Google Cloud Storage)
• BigQueryIO (Google Cloud Platform BigQuery)
• AvroIO (Apache Avro)
• PubsubIO (Google Cloud Platform Pub/Sub)
• BigTableIO (Google Cloud Bigtable - currently in beta)

Native support for input sources

The stock Dataflow SDK ships with several classes that support input and output operations with resources external to Dataflow’s execution context. For example, initializing a PCollection from the contents of a file living within Google Cloud Storage requires importing com.google.cloud.dataflow.sdk.io.TextIO and, via the PipelineOptions interface, configuring it with the URI of your storage bucket, such as gs://<storage-bucket-on-gcs>/DATA1.txt (a sketch of this arrangement follows below).

By inheriting functionality from more foundational classes, such as TextIO, and overriding the relevant API implementations, the amount of code required to implement a custom CsvIO or XmlIO is reduced. In some cases, you may ultimately inherit from classes that these I/O classes themselves are derived from, or you may inherit directly from an existing I/O class.

The native TextIO Reader method currently returns a PCollection corresponding to each line of text in the specified file(s). But what if the …
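As a concrete sketch of the TextIO-plus-PipelineOptions arrangement described above: the GcsReadOptions interface and its --inputFile flag are our own illustration, not SDK classes, while TextIO.Read yields one String element per line of the referenced file(s).

import com.google.cloud.dataflow.sdk.Pipeline;
import com.google.cloud.dataflow.sdk.io.TextIO;
import com.google.cloud.dataflow.sdk.options.Default;
import com.google.cloud.dataflow.sdk.options.Description;
import com.google.cloud.dataflow.sdk.options.PipelineOptions;
import com.google.cloud.dataflow.sdk.options.PipelineOptionsFactory;
import com.google.cloud.dataflow.sdk.values.PCollection;

public class GcsTextRead {

    /** Custom options exposing the input URI as a --inputFile flag. */
    public interface GcsReadOptions extends PipelineOptions {
        @Description("URI of the input file, e.g. a gs:// path on Cloud Storage")
        @Default.String("gs://<storage-bucket-on-gcs>/DATA1.txt")
        String getInputFile();
        void setInputFile(String value);
    }

    public static void main(String[] args) {
        PipelineOptionsFactory.register(GcsReadOptions.class);
        GcsReadOptions options = PipelineOptionsFactory.fromArgs(args)
            .withValidation().as(GcsReadOptions.class);

        Pipeline p = Pipeline.create(options);

        // One element per line of text in the file(s) named by --inputFile.
        PCollection<String> lines = p.apply(TextIO.Read.from(options.getInputFile()));

        p.run();
    }
}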
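Along the same lines, one low-effort way to realize the custom CsvIO mentioned above is a composite PTransform that delegates reading to TextIO and splits each line in a ParDo, rather than subclassing the lower-level source machinery. Everything here (CsvIO, its Read transform, and the naive comma split that ignores quoting) is an illustrative sketch under those assumptions, not stock SDK functionality.

import com.google.cloud.dataflow.sdk.coders.ListCoder;
import com.google.cloud.dataflow.sdk.coders.StringUtf8Coder;
import com.google.cloud.dataflow.sdk.io.TextIO;
import com.google.cloud.dataflow.sdk.transforms.DoFn;
import com.google.cloud.dataflow.sdk.transforms.PTransform;
import com.google.cloud.dataflow.sdk.transforms.ParDo;
import com.google.cloud.dataflow.sdk.values.PBegin;
import com.google.cloud.dataflow.sdk.values.PCollection;

import java.util.Arrays;
import java.util.List;

/** Sketch of a CSV reader composed from TextIO plus a line-splitting ParDo. */
public class CsvIO {

    public static Read from(String filepattern) {
        return new Read(filepattern);
    }

    public static class Read extends PTransform<PBegin, PCollection<List<String>>> {
        private final String filepattern;

        private Read(String filepattern) {
            this.filepattern = filepattern;
        }

        /** Splits a line on commas; a production CSV parser must honor quoting. */
        static class ParseLine extends DoFn<String, List<String>> {
            @Override
            public void processElement(ProcessContext c) {
                // -1 keeps trailing empty fields rather than silently dropping them
                c.output(Arrays.asList(c.element().split(",", -1)));
            }
        }

        @Override
        public PCollection<List<String>> apply(PBegin input) {
            return input
                .apply(TextIO.Read.from(filepattern))
                .apply(ParDo.of(new ParseLine()))
                .setCoder(ListCoder.of(StringUtf8Coder.of()));
        }
    }
}

A pipeline would then bootstrap its initial PCollection of parsed rows with p.apply(CsvIO.from("gs://<storage-bucket-on-gcs>/DATA1.csv")).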