Forest for the Trees for Adversarial Code Stylometry

Total Page:16

File Type:pdf, Size:1020Kb

Forest for the Trees for Adversarial Code Stylometry StyleCounsel: Seeing the (Random) Forest for the Trees in Adversarial Code Stylometry by Christopher McKnight A thesis presented to the University of Waterloo in fulfillment of the thesis requirement for the degree of Master of Mathematics in Computer Science Waterloo, Ontario, Canada, 2018 c Christopher McKnight 2018 I hereby declare that I am the sole author of this thesis. This is a true copy of the thesis, including any required final revisions, as accepted by my examiners. I understand that my thesis may be made electronically available to the public. ii Abstract Authorship attribution has piqued the interest of scholars for centuries, but had historically remained a matter of subjective opinion, based upon examination of handwriting and the physical document. Midway through the 20th Century, a technique known as stylometry was developed, in which the content of a document is analyzed to extract the author’s grammar use, preferred vocabulary, and other elements of compositional style. In parallel to this, programmers, and particularly those involved in education, were writing and testing systems designed to automate the analysis of good coding style and best practice, in order to assist with grading assignments. In the aftermath of the Morris Worm incident in 1988, researchers began to consider whether this automated analysis of program style could be combined with stylometry techniques and applied to source code, to identify the author of a program. The results of recent experiments have suggested this code stylometry can successfully iden- tify the author of short programs from among hundreds of candidates with up to 98% precision. This potential ability to discern the programmer of a sample of code from a large group of possible authors could have concerning consequences for the open-source community at large, particularly those contributors that may wish to remain anonymous. Recent international events have suggested the developers of certain anti-censorship and anti-surveillance tools are being targeted by their governments and forced to delete their repositories or face prosecution. In light of this threat to the freedom and privacy of individual programmers around the world, and due to a dearth of published research into practical code stylometry at scale and its feasibility, we carried out a number of investigations looking into the difficulties of applying this technique in the real world, and how one might effect a robust defence against it. To this end, we devised a system to aid programmers in obfuscating their inherent style and imitating another, overt, author’s style in order to protect their anonymity from this forensic technique. Our system utilizes the implicit rules encoded in the decision points of a random forest ensemble in order to derive a set of recommendations to present to the user detailing how to achieve this obfuscation and mimicry attack. In order to best test this system, and simultaneously assess the difficulties of performing practical stylometry at scale, we also gathered a large corpus of real open-source software and devised our own feature set including both novel attributes and those inspired or borrowed from other sources. Our results indicate that attempting a mass analysis of publicly available source code is fraught with difficulties in ensuring the integrity of the data. Furthermore, we found ours and most other published feature sets do not sufficiently capture an author’s style independently of the content to be very effective at scale, although its accuracy is significantly greater than a random guess. Evaluations of our tool indicate it can successfully extract a set of changes that would iii result in a misclassification as another user if implemented. More importantly, this extraction was independent of the specifics of the feature set, and therefore would still work even with a more accurate model of style. We ran a limited user study to assess the usability of the tool, and found overall it was beneficial to our participants, and could be even more beneficial if the valuable feedback we received were implemented in future work. iv Acknowledgements This work benefitted from the use of the CrySP RIPPLE Facility at the University of Water- loo. I would like to thank my supervisor Ian Goldberg for his guidance, encouragement, and res- olute attention to detail. Our weekly meetings were a great help in nudging me toward the finish line, while the many opportunities for personal development offered throughout the duration of my programme were priceless. Truly my eyes have been opened to a world beyond that with which I was familiar, and I shall never look back. To the members of my committee, Mike Godfrey and Yaoliang Yu, I am very grateful for your time, expertise and valuable feedback. Being able to use the private study rooms at Waterloo Public Library while writing this thesis was priceless. Finally, I would like to thank Radio X and all its DJs for keeping my sanity during long hours of coding and writing, particularly Johnny Vaughan and his “4til7 Thang” (even Little Si). v Dedication For my loving and supportive wife Katya, whose tireless drive and work ethic was a constant inspiration, and our son James, whose companionship on many contemplative early morning walks helped get me though even the darkest hours of this arduous journey. vi Table of Contents List of Tablesx List of Figures xi 1 Introduction1 2 Motivation4 3 Background8 3.1 The Federalist Papers................................8 3.2 The Morris/Internet Worm............................. 10 4 Literature Review 12 4.1 Authorship Attribution of Natural Language.................... 12 4.1.1 Early Work................................. 12 4.1.2 Computer-Assisted Studies......................... 14 4.1.3 Internet-Scale Authorship Attribution................... 20 4.2 Plagiarism Detection................................ 27 4.2.1 Intrinsic Plagiarism Detection....................... 28 4.2.2 Authorship Verification........................... 29 4.3 Authorship Attribution of Software......................... 29 4.3.1 Source Code Attribution.......................... 30 vii 4.3.2 Executable Code Attribution........................ 39 4.4 Defences Against Authorship Attribution..................... 42 5 Implementation and Methodology 51 5.1 Contributions.................................... 51 5.2 Background..................................... 53 5.2.1 Choice of Programming Language..................... 53 5.2.2 Eclipse IDE................................. 54 5.2.3 Weka Machine Learning.......................... 55 5.2.4 Random Forest............................... 55 5.2.5 GitHub................................... 58 5.3 Obtaining Data................................... 58 5.3.1 The GitHub Data API........................... 59 5.3.2 Rate Limiting................................ 60 5.3.3 Ensuring Sole and True Authorship.................... 63 5.3.4 Removing Duplicate and Common Files.................. 64 5.3.5 Summary of Data Collection........................ 65 5.4 Feature Extraction.................................. 66 5.4.1 Node Frequencies.............................. 68 5.4.2 Node Attributes............................... 71 5.4.3 Identifiers.................................. 71 5.4.4 Comments................................. 73 5.4.5 Other.................................... 74 5.5 Training and Making Predictions.......................... 75 5.6 Making Recommendations............................. 75 5.6.1 Requirements................................ 76 5.6.2 Parsing the Random Forest......................... 79 5.6.3 Analyzing the Split Points......................... 81 viii 5.6.4 Presenting to the User........................... 92 5.7 Using the Plugin.................................. 93 5.8 Pilot User Study................................... 98 5.8.1 Study Details................................ 98 6 Results 100 6.1 Conducting Source Code Authorship Attribution at the Internet Scale...... 100 6.2 Extracting a Class of Feature Vectors That Can Systematically Effect a Classifi- cation as Any Given Target............................. 107 6.3 Pilot User Study................................... 113 6.3.1 Results................................... 113 6.3.2 Experiences with Manual Task....................... 114 6.3.3 Experiences with Assisted Task...................... 115 6.3.4 Summary of User Study.......................... 117 7 Conclusions 119 7.1 Future Work..................................... 122 7.2 Final Remarks.................................... 124 References 127 APPENDICES 141 A User Study Questionnaire 142 ix List of Tables 5.1 Repositories per author............................... 66 5.2 Node class hierarchy counts............................ 70 5.3 Node attributes................................... 71 6.1 Investigation into repositories that completely failed................ 106 x List of Figures 5.1 Node class hierarchy................................ 69 5.2 Plugin menu..................................... 94 5.3 Resource selection dialog for training....................... 95 5.4 Resource selection dialog for evaluation...................... 96 5.5 Individual file output................................ 97 5.6 Aggregate output.................................. 97 5.7 Overall recommendations view........................... 97 5.8 Sample recommendations—statements....................... 98 5.9 Sample recommendations—unary expressions................... 98 6.1 Identifying the author of each file.........................
Recommended publications
  • Uila Supported Apps
    Uila Supported Applications and Protocols updated Oct 2020 Application/Protocol Name Full Description 01net.com 01net website, a French high-tech news site. 050 plus is a Japanese embedded smartphone application dedicated to 050 plus audio-conferencing. 0zz0.com 0zz0 is an online solution to store, send and share files 10050.net China Railcom group web portal. This protocol plug-in classifies the http traffic to the host 10086.cn. It also 10086.cn classifies the ssl traffic to the Common Name 10086.cn. 104.com Web site dedicated to job research. 1111.com.tw Website dedicated to job research in Taiwan. 114la.com Chinese web portal operated by YLMF Computer Technology Co. Chinese cloud storing system of the 115 website. It is operated by YLMF 115.com Computer Technology Co. 118114.cn Chinese booking and reservation portal. 11st.co.kr Korean shopping website 11st. It is operated by SK Planet Co. 1337x.org Bittorrent tracker search engine 139mail 139mail is a chinese webmail powered by China Mobile. 15min.lt Lithuanian news portal Chinese web portal 163. It is operated by NetEase, a company which 163.com pioneered the development of Internet in China. 17173.com Website distributing Chinese games. 17u.com Chinese online travel booking website. 20 minutes is a free, daily newspaper available in France, Spain and 20minutes Switzerland. This plugin classifies websites. 24h.com.vn Vietnamese news portal 24ora.com Aruban news portal 24sata.hr Croatian news portal 24SevenOffice 24SevenOffice is a web-based Enterprise resource planning (ERP) systems. 24ur.com Slovenian news portal 2ch.net Japanese adult videos web site 2Shared 2shared is an online space for sharing and storage.
    [Show full text]
  • Watson Daniel.Pdf (5.294Mb)
    Source Code Stylometry and Authorship Attribution for Open Source by Daniel Watson A thesis presented to the University of Waterloo in fulfillment of the thesis requirement for the degree of Master of Mathematics in Computer Science Waterloo, Ontario, Canada, 2019 c Daniel Watson 2019 Author's Declaration I hereby declare that I am the sole author of this thesis. This is a true copy of the thesis, including any required final revisions, as accepted by my examiners. I understand that my thesis may be made electronically available to the public. ii Abstract Public software repositories such as GitHub make transparent the development history of an open source software system. Source code commits, discussions about new features and bugs, and code reviews are stored and carefully attributed to the appropriate developers. However, sometimes governments may seek to analyze these repositories, to identify citi- zens who contribute to projects they disapprove of, such as those involving cryptography or social media. While developers who seek anonymity may contribute under assumed identi- ties, their body of public work may be characteristic enough to betray who they really are. The ability to contribute anonymously to public bodies of knowledge is extremely impor- tant to the future of technological and intellectual freedoms. Just as in security hacking, the only way to protect vulnerable individuals is by demonstrating the means and strength of available attacks so that those concerned may know of the need and develop the means to protect themselves. In this work, we present a method to de-anonymize source code contributors based on the authors' intrinsic programming style.
    [Show full text]
  • Shedding Light on Mobile App Store Censorship
    Shedding Light on Mobile App Store Censorship Vasilis Ververis Marios Isaakidis Humboldt University, Berlin, Germany University College London, London, UK [email protected] [email protected] Valentin Weber Benjamin Fabian Centre for Technology and Global Affairs University of Telecommunications Leipzig (HfTL) University of Oxford, Oxford, UK Humboldt University, Berlin, Germany [email protected] [email protected] ABSTRACT KEYWORDS This paper studies the availability of apps and app stores across app stores, censorship, country availability, mobile applications, countries. Our research finds that users in specific countries do China, Russia not have access to popular app stores due to local laws, financial reasons, or because countries are on a sanctions list that prohibit ACM Reference Format: Vasilis Ververis, Marios Isaakidis, Valentin Weber, and Benjamin Fabian. foreign businesses to operate within its jurisdiction. Furthermore, 2019. Shedding Light on Mobile App Store Censorship. In 27th Conference this paper presents a novel methodology for querying the public on User Modeling, Adaptation and Personalization Adjunct (UMAP’19 Ad- search engines and APIs of major app stores (Google Play Store, junct), June 9–12, 2019, Larnaca, Cyprus. ACM, New York, NY, USA, 6 pages. Apple App Store, Tencent MyApp Store) that is cross-verified by https://doi.org/10.1145/3314183.3324965 network measurements. This allows us to investigate which apps are available in which country. We primarily focused on the avail- ability of VPN apps in Russia and China. Our results show that 1 INTRODUCTION despite both countries having restrictive VPN laws, there are still The widespread adoption of smartphones over the past decade saw many VPN apps available in Russia and only a handful in China.
    [Show full text]
  • Internet Infrastructure Review Vol.27
    Internet Infrastructure Vol.27 Review May 2015 Infrastructure Security Increasingly Malicious PUAs Messaging Technology Anti-Spam Measure Technology and DMARC Trends Web Traffic Report Report on Access Log Analysis Results for Streaming Delivery of the 2014 Summer Koshien Inte r ne t In f r ast r uc t ure Review Vol.27 May 2015 Executive Summary ———————————————————3 1. Infrastructure Security ———————————————4 Table of Contents Table 1.1 Introduction —————————————————————— 4 1.2 Incident Summary ——————————————————— 4 1.3 Incident Survey ——————————————————— 11 1.3.1 DDoS Attacks —————————————————————— 11 1.3.2 Malware Activities ———————————————————— 13 1.3.3 SQL Injection Attacks —————————————————— 16 1.3.4 Website Alterations ——————————————————— 17 1.4 Focused Research —————————————————— 18 1.4.1 Increasingly Malicious PUAs —————————————— 18 1.4.2 ID Management Technology: From a Convenience and Security Perspective ————— 22 1.4.3 Evaluating the IOCs of Malware That Reprograms HDD Firmware —————————————————————— 25 1.5 Conclusion —————————————————————— 27 2. Messaging Technology —————————————— 28 2.1 Introduction ————————————————————— 28 2.2 Spam Trends ————————————————————— 28 2.2.1 Spam Ratios Decline Further in FY2014 ————————— 28 2.2.2 Higher Risks Despite Lower Volumes —————————— 29 2.3 Trends in Email Technologies ——————————— 29 2.3.1 The DMARC RFC ————————————————————— 29 2.3.2 Problems with DMARC and Reporting —————————— 30 2.3.3 Use of DMARC by Email Recipients ——————————— 30 2.3.4 Domain Reputation ——————————————————— 31 2.3.5
    [Show full text]
  • Threat Modeling and Circumvention of Internet Censorship by David Fifield
    Threat modeling and circumvention of Internet censorship By David Fifield A dissertation submitted in partial satisfaction of the requirements for the degree of Doctor of Philosophy in Computer Science in the Graduate Division of the University of California, Berkeley Committee in charge: Professor J.D. Tygar, Chair Professor Deirdre Mulligan Professor Vern Paxson Fall 2017 1 Abstract Threat modeling and circumvention of Internet censorship by David Fifield Doctor of Philosophy in Computer Science University of California, Berkeley Professor J.D. Tygar, Chair Research on Internet censorship is hampered by poor models of censor behavior. Censor models guide the development of circumvention systems, so it is important to get them right. A censor model should be understood not just as a set of capabilities|such as the ability to monitor network traffic—but as a set of priorities constrained by resource limitations. My research addresses the twin themes of modeling and circumvention. With a grounding in empirical research, I build up an abstract model of the circumvention problem and examine how to adapt it to concrete censorship challenges. I describe the results of experiments on censors that probe their strengths and weaknesses; specifically, on the subject of active probing to discover proxy servers, and on delays in their reaction to changes in circumvention. I present two circumvention designs: domain fronting, which derives its resistance to blocking from the censor's reluctance to block other useful services; and Snowflake, based on quickly changing peer-to-peer proxy servers. I hope to change the perception that the circumvention problem is a cat-and-mouse game that affords only incremental and temporary advancements.
    [Show full text]
  • Style Counsel: Seeing the (Random) Forest for the Trees in Adversarial Code Stylometry∗
    Style Counsel: Seeing the (Random) Forest for the Trees in Adversarial Code Stylometry∗ Christopher McKnight Ian Goldberg Magnet Forensics University of Waterloo [email protected] [email protected] ABSTRACT worm based on an examination of the reverse-engineered code [17], The results of recent experiments have suggested that code stylom- casting style analysis as a forensic technique. etry can successfully identify the author of short programs from This technique, however, may be used to chill speech for soft- among hundreds of candidates with up to 98% precision. This poten- ware developers. There are several cases of developers being treated tial ability to discern the programmer of a code sample from a large as individuals of suspicion, intimidated by authorities and/or co- group of possible authors could have concerning consequences for erced into removing their software from the Internet. In the US, the open-source community at large, particularly those contrib- Nadim Kobeissi, the Canadian creator of Cryptocat (an online se- utors that may wish to remain anonymous. Recent international cure messaging application) was stopped, searched, and questioned events have suggested the developers of certain anti-censorship by Department of Homeland Security officials on four separate oc- and anti-surveillance tools are being targeted by their governments casions in 2012 about Cryptocat and the algorithms it employs [16]. and forced to delete their repositories or face prosecution. In November 2014, Chinese developer Xu Dong was arrested, pri- In light of this threat to the freedom and privacy of individual marily for political tweets, but also because he allegedly “committed programmers around the world, we devised a tool, Style Counsel, to crimes of developing software to help Chinese Internet users scale aid programmers in obfuscating their inherent style and imitating the Great Fire Wall of China” [4] in relation to proxy software he another, overt, author’s style in order to protect their anonymity wrote.
    [Show full text]
  • Digital Authoritarianism and the Global Threat to Free Speech Hearing
    DIGITAL AUTHORITARIANISM AND THE GLOBAL THREAT TO FREE SPEECH HEARING BEFORE THE CONGRESSIONAL-EXECUTIVE COMMISSION ON CHINA ONE HUNDRED FIFTEENTH CONGRESS SECOND SESSION APRIL 26, 2018 Printed for the use of the Congressional-Executive Commission on China ( Available at www.cecc.gov or www.govinfo.gov U.S. GOVERNMENT PUBLISHING OFFICE 30–233 PDF WASHINGTON : 2018 VerDate Nov 24 2008 12:25 Dec 16, 2018 Jkt 081003 PO 00000 Frm 00001 Fmt 5011 Sfmt 5011 C:\USERS\DSHERMAN1\DESKTOP\VONITA TEST.TXT DAVID CONGRESSIONAL-EXECUTIVE COMMISSION ON CHINA LEGISLATIVE BRANCH COMMISSIONERS Senate House MARCO RUBIO, Florida, Chairman CHRIS SMITH, New Jersey, Cochairman TOM COTTON, Arkansas ROBERT PITTENGER, North Carolina STEVE DAINES, Montana RANDY HULTGREN, Illinois JAMES LANKFORD, Oklahoma MARCY KAPTUR, Ohio TODD YOUNG, Indiana TIM WALZ, Minnesota DIANNE FEINSTEIN, California TED LIEU, California JEFF MERKLEY, Oregon GARY PETERS, Michigan ANGUS KING, Maine EXECUTIVE BRANCH COMMISSIONERS Not yet appointed ELYSE B. ANDERSON, Staff Director PAUL B. PROTIC, Deputy Staff Director (ii) VerDate Nov 24 2008 12:25 Dec 16, 2018 Jkt 081003 PO 00000 Frm 00002 Fmt 0486 Sfmt 0486 C:\USERS\DSHERMAN1\DESKTOP\VONITA TEST.TXT DAVID C O N T E N T S STATEMENTS Page Opening Statement of Hon. Marco Rubio, a U.S. Senator from Florida; Chair- man, Congressional-Executive Commission on China ...................................... 1 Statement of Hon. Christopher Smith, a U.S. Representative from New Jer- sey; Cochairman, Congressional-Executive Commission on China .................. 4 Cook, Sarah, Senior Research Analyst for East Asia and Editor, China Media Bulletin, Freedom House ..................................................................................... 6 Hamilton, Clive, Professor of Public Ethics, Charles Sturt University (Aus- tralia) and author, ‘‘Silent Invasion: China’s Influence in Australia’’ ............
    [Show full text]
  • July 20, 2020 the Honorable David N. Cicilline Chairman
    July 20, 2020 The Honorable David N. Cicilline Chairman, Subcommittee on Antitrust, Commercial and Administrative Law U.S. House of Representatives Subject: House Judiciary Committee’s July 27 hearing of Apple’s CEO Tim Cook Chairman Cicilline, GreatFire is a China-based, anti-censorship organization that has been working since 2011 to bring transparency to online censorship in China and to help Chinese citizens to freely access information. We would like to draw to your attention Apple’s current policy of censorship of its App Store, which constitutes a serious abuse of its dominant position in the digital marketplace as well as a violation of human rights. On July 27, the Subcommittee on Antitrust, Commercial and Administrative Law of the U.S House of Representatives Judiciary Committee will question Apple Inc. CEO Tim Cook, along with the CEOs of Amazon, Google and Facebook, as part of the Committee’s ongoing investigation into competition in the digital marketplace. The “Online Platforms and Market Power, Part 6: Examining the Dominance of Amazon, Facebook, Google and Apple” hearing will conclude an investigation which began last year and has already covered Apple’s anti-competitive practices and their impact, most notably on a “Free and Diverse Press”. We believe that one crucial consequence of Apple’s dominant position in the digital market has not been covered by the investigation: Apple’s opaque and arbitrary management of its China App Store. In China, currently Apple’s biggest market worldwide, Apple directly collaborates with the Chinese authorities to censor apps that the government does not want its population to use.
    [Show full text]
  • Cloudtransport: Using Cloud Storage for Censorship-Resistant Networking
    CloudTransport: Using Cloud Storage for Censorship-Resistant Networking Chad Brubaker1,2, Amir Houmansadr2, and Vitaly Shmatikov2 1 Google 2 The University of Texas at Austin Abstract. Censorship circumvention systems such as Tor are highly vulnerable to network-level filtering. Because the traffic generated by these systems is disjoint from normal network traffic, it is easy to recog- nize and block, and once the censors identify network servers (e.g., Tor bridges) assisting in circumvention, they can locate all of their users. CloudTransport is a new censorship-resistant communication system that hides users’ network traffic by tunneling it through a cloud storage ser- vice such as Amazon S3. The goal of CloudTransport is to increase the censors’ economic and social costs by forcing them to use more expen- sive forms of network filtering, such as large-scale traffic analysis, or else risk disrupting normal cloud-based services and thus causing collateral damage even to the users who are not engaging in circumvention. Cloud- Transport’s novel passive-rendezvous protocol ensures that there are no direct connections between a CloudTransport client and a CloudTrans- port bridge. Therefore, even if the censors identify a CloudTransport connection or the IP address of a CloudTransport bridge, this does not help them block the bridge or identify other connections. CloudTransport can be used as a standalone service, a gateway to an anonymity network like Tor, or a pluggable transport for Tor. It does not require any modifications to the existing cloud storage, is compatible with multiple cloud providers, and hides the user’s Internet destinations even if the provider is compromised.
    [Show full text]
  • A Privacy Threat for Internet Users in Internet-Censoring Countries
    A Privacy Threat for Internet Users in Internet-censoring Countries Feno Heriniaina R. College of Computer Science, Chongqing University, Chongqing, China Keywords: Censorship, Human Computer Interaction, Privacy, Virtual Private Networks. Abstract: Online surveillance has been increasingly used by different governments to control the spread of information on the Internet. The magnitude of this activity differs widely and is based primarily on the areas that are deemed, by the state, to be critical. Aside from the use of keywords and the complete domain name filtering technologies, Internet censorship can sometimes even use the total blocking of IP addresses to censor content. Despite the advances, in terms of technology used for Internet censorship, there are also different types of circumvention tools that are available to the general public. In this paper, we report the results of our investigation on how migrants who previously had access to the open Internet behave toward Internet censorship when subjected to it. Four hundred and thirty-two (432) international students took part in the study that lasted two years. We identified the most common circumvention tools that are utilized by the foreign students in China. We investigated the usability of these tools and monitored the way in which they are used. We identified a behaviour-based privacy threat that puts the users of circumvention tools at risk while they live in an Internet-censoring country. We also recommend the use of a user-oriented filtering method, which should be considered as part of the censoring system, as it enhances the performance of the screening process and recognizes the real needs of its users.
    [Show full text]
  • Code Hunt: Experience with Coding Contests at Scale
    Code Hunt: Experience with Coding Contests at Scale Judith Bishop R. Nigel Horspool Tao Xie Nikolai Tillmann, Microsoft Research University of Victoria University of Illinois at Jonathan de Halleux Redmond, WA, USA Victoria, BC, Canada Urbana-Champaign Microsoft Research [email protected] [email protected] IL, USA Redmond, WA, USA [email protected] nikolait, [email protected] Abstract—Mastering a complex skill like programming takes Code Hunt adds another dimension – that of puzzles. It is with many hours. In order to encourage students to put in these hours, these ideas in mind that we conceived Code Hunt, a game for we built Code Hunt, a game that enables players to program coding against the computer by solving a sequence of puzzles against the computer with clues provided as unit tests. The game of increasing complexity. Code Hunt is unique among coding has become very popular and we are now running worldwide systems and among games in that it combines the elements of contests where students have a fixed amount of time to solve a set of puzzles. This paper describes Code Hunt and the contest both to produce just what we need to get students to put in those experience it offers. We then show some early results that hours of practice to hone their programming skills. Along the demonstrate how Code Hunt can accurately discriminate between way, they also learn to understand testing, since the game is good and bad coders. The challenges of creating and selecting based on unit tests. Code Hunt has been used by over 50,000 puzzles for contests are covered.
    [Show full text]
  • Curriculum Vitae
    Curriculum Vitae Jakub Pachocki [email protected] Work experience 2016 { present Harvard School of Engineering and Applied Sciences Postdoctoral Fellow Education 2013 { 2016 Carnegie Mellon University PhD in Computer Science Thesis: Graphs and Beyond: Faster Algorithms for High Dimen- sional Convex Optimization Advisor: Gary Miller 2010 { 2013 University of Warsaw Bachelor's Degree in Computer Science Publications author names in alphabetical order • M. Cohen, Y. T. Lee, G. Miller, J. Pachocki and A. Sidford. Geometric Median in Nearly Linear Time. 48th Annual Symposium on the Theory of Computing (STOC 2016). • A. Ene, G. Miller, J. Pachocki and A. Sidford. Routing under Balance. 48th Annual Symposium on the Theory of Computing (STOC 2016). • M. Cygan, F. Fomin, A. Golovnev, A. Kulikov, I. Mihajlin, J. Pachocki and A. Soca la. Tight bounds for graph homomorphism and subgraph isomorphism. 27th Annual Symposium on Discrete Algorithms (SODA 2016). • M. Cohen, C. Musco and J. Pachocki. Online Row Sampling. 19th In- ternational Workshop on Approximation Algorithms for Combinatorial Optimization Problems (APPROX 2016). • M. Mitzenmacher, J. Pachocki, R. Peng, C. E. Tsourakakis and S. C. Xu. Scalable Large Near-Clique Detection in Large-Scale Networks via Sampling. 21th International Conference on Knowledge Discovery and Data Mining (KDD 2015). • M. Cohen, G. Miller, R. Kyng, J. Pachocki,p R. Peng, A. Rao and S. C. Xu. Solving SDD Systems in Nearly O(m log n) Time. 46th Annual Symposium on the Theory of Computing (STOC 2014). • M. Cygan, J. Pachocki, A. Socala. The Hardness of Subgraph Isomor- phism. arXiv preprint, 2015. • M. Cohen, G. Miller, J.
    [Show full text]