Rethinking the Architecture of the Web

A Dissertation Presented by Liang Zhang to The College of Computer and Information Science in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Computer Science

Northeastern University
Boston, Massachusetts
July 2016

To my family

Contents

List of Figures
List of Tables
Acknowledgments
Abstract of the Dissertation

1 Introduction
  1.1 Contributions
  1.2 Outline

2 Background
  2.1 Web browsers
    2.1.1 Dynamic web pages
    2.1.2 Browser plug-ins
    2.1.3 HTML5
    2.1.4 Mobile web browsers
  2.2 JavaScript
    2.2.1 The web and beyond
  2.3 Web servers
    2.3.1 Web application servers
    2.3.2 Privacy
  2.4 Cloud services
    2.4.1 Content distribution networks (CDNs)

3 Maygh: Rethinking web content distribution
  3.1 Motivation
  3.2 Maygh potential
  3.3 Maygh design
    3.3.1 Web browser building blocks
    3.3.2 Model, interaction, and protocol
    3.3.3 Maygh client
    3.3.4 Maygh coordinator
    3.3.5 Multiple coordinators
  3.4 Security, privacy, and impact on users
    3.4.1 Security
    3.4.2 Privacy
    3.4.3 Impact on users
    3.4.4 Mobile users
  3.5 Implementation
  3.6 Evaluation
    3.6.1 Client-side microbenchmarks
    3.6.2 Coordinator scalability
    3.6.3 Trace-based simulation
    3.6.4 Small-scale deployment
  3.7 Summary

4 Priv.io: Rethinking web applications
  4.1 Motivation
  4.2 Overview
    4.2.1 Cost study
  4.3 Design
    4.3.1 Assumptions
    4.3.2 Attribute-based encryption
    4.3.3 Priv.io building blocks
    4.3.4 Priv.io operations
  4.4 Third-party applications
    4.4.1 Application API
    4.4.2 Managing applications
    4.4.3 Security and privacy
    4.4.4 Limitations
    4.4.5 Demonstration applications
  4.5 Discussion
  4.6 Evaluation
    4.6.1 Microbenchmarks
    4.6.2 User-perceived performance
    4.6.3 Small-scale deployment
  4.7 Summary

5 Picocenter: Rethinking computation
  5.1 Motivation
  5.2 Long-lived mostly-idle applications
  5.3 Picocenter Architecture
    5.3.1 Interface to cloud tenants
    5.3.2 Challenges
    5.3.3 Architecture overview
    5.3.4 Hub
    5.3.5 Worker
  5.4 Quickly Swapping in Processes
    5.4.1 Swapping out applications
    5.4.2 Swapping in applications
  5.5 Implementation and Discussion
    5.5.1 Implementation
    5.5.2 Swapping implementation
    5.5.3 Deployment issues
  5.6 Evaluation
    5.6.1 Evaluation setup
    5.6.2 Microbenchmarks
    5.6.3 Swapping performance
    5.6.4 ActiveSet
    5.6.5 Cost
  5.7 Summary

6 Related Work
  6.1 Content distribution
    6.1.1 Optimizing CDNs
    6.1.2 Non-CDN approaches
  6.2 Privacy
    6.2.1 Enhancing web browsers
    6.2.2 Web services
    6.2.3 Decentralized network systems
  6.3 Computation
    6.3.1 Hardware virtualization
    6.3.2 Software environments
    6.3.3 Pre-paging and migration
    6.3.4 Checkpoint and restore
    6.3.5 Code offloading

7 Conclusion
  7.1 Summary
  7.2 Extensions

A Supporting ActiveSet in CRIU

Bibliography

List of Figures

3.1 Overview of how content is delivered in Maygh, implemented in JavaScript (JS). A client requesting content is connected to another client storing the content with the coordinator’s help. The content is transferred directly between clients.

3.2 Maygh messages sent when fetching an object in Maygh between two clients (peer-ids pid1 and pid2). pid1 requests a peer storing content-hash obj-hash1, and is given pid2. The two clients then connect directly (with the coordinator’s assistance, using STUN if needed) to transfer the object.

3.3 Overview of how multiple coordinators work together on Maygh. The mapping from objects to lists of peers is distributed using consistent hashing, and the coordinator each peer is attached to is also stored in this list. The maximum number of lookups a coordinator must do to satisfy a request is two: one to determine a peer storing the requested item, and another to contact the coordinator that peer is attached to.

3.4 Average response time versus transaction rate for a single coordinator. The coordinator can support 454 transactions per second with under 15ms latency. ..
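
The two-lookup bound described in the caption of Figure 3.3 can be illustrated with a minimal Python sketch. It is not the dissertation's implementation (Maygh itself is written in JavaScript). The class HashRing, the functions register and locate, the coordinator names, the use of SHA-1, and the virtual-node count are all assumptions made for illustration; only the identifiers pid2 and obj-hash1 come from the caption of Figure 3.2.

```python
import bisect
import hashlib


def ring_point(key: str) -> int:
    """Map a string to a position on the hash ring (SHA-1 is an assumed choice)."""
    return int.from_bytes(hashlib.sha1(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Hypothetical consistent-hash ring assigning content hashes to coordinators."""

    def __init__(self, coordinators, vnodes=64):
        # Each coordinator is placed at several points to even out the load.
        self._points = sorted(
            (ring_point(f"{name}#{i}"), name)
            for name in coordinators
            for i in range(vnodes)
        )
        self._keys = [point for point, _ in self._points]

    def owner(self, content_hash: str) -> str:
        # The first ring point clockwise from the key owns it (wrapping around).
        idx = bisect.bisect_right(self._keys, ring_point(content_hash))
        return self._points[idx % len(self._points)][1]


def register(ring, directories, content_hash, peer_id, home):
    """Record that peer_id (attached to coordinator `home`) stores content_hash."""
    owner = ring.owner(content_hash)
    directories[owner].setdefault(content_hash, []).append((peer_id, home))


def locate(ring, directories, content_hash):
    """Return (peer_id, hops), where hops lists the coordinators consulted."""
    owner = ring.owner(content_hash)
    hops = [owner]  # lookup 1: the coordinator holding the object-to-peers list
    entries = directories[owner].get(content_hash, [])
    if not entries:
        return None, hops
    peer_id, home = entries[0]
    hops.append(home)  # lookup 2: the coordinator the chosen peer is attached to
    return peer_id, hops


coordinators = ["coord-a", "coord-b", "coord-c"]  # assumed names
ring = HashRing(coordinators)
directories = {name: {} for name in coordinators}
register(ring, directories, "obj-hash1", "pid2", home="coord-b")
peer, hops = locate(ring, directories, "obj-hash1")
```

Because the home coordinator of each peer is stored alongside the peer in the list, a request never needs more than the two hops recorded in hops, however many coordinators are on the ring.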