Topic Modeling of Public Repositories at Scale Using Names in Source Code

Vadim Markovtsev ([email protected])    Eiso Kant ([email protected])
source{d}, Madrid, Spain
February 2017
arXiv:1704.00135v2 [cs.PL] 20 May 2017

Abstract—Programming languages themselves have a limited number of reserved keywords and character-based tokens that define the language specification. However, programmers make rich use of natural language within their code through comments, text literals and naming entities. The programmer-defined names that can be found in source code are a rich source of information for building a high-level understanding of a project. The goal of this paper is to apply topic modeling to the names used in over 13.6 million repositories and to interpret the inferred topics. One of the problems in such a study is the occurrence of duplicate repositories not officially marked as forks (obscure forks). We show how to address it using the same identifiers which are extracted for topic modeling. We open with a discussion of naming in source code; we then elaborate on our approach to removing exact and fuzzy duplicate repositories using Locality Sensitive Hashing on the bag-of-words model, and then discuss our work on topic modeling; finally, we present the results of our data analysis together with open access to the source code, tools and datasets.

Index Terms—programming, open source, source code, software repositories, git, GitHub, topic modeling, ARTM, locality sensitive hashing, MinHash, open dataset, data.world.

I. INTRODUCTION

There are more than 18 million non-empty public repositories on GitHub which are not marked as forks, which makes GitHub the largest version control repository hosting service. It has become difficult to explore such a large number of projects and nearly impossible to classify them. One of the main sources of information about public repositories is their code.

To gain a deeper understanding of software development it is important to understand the trends among open-source projects. Bleeding-edge technologies are often used first in open source projects and later employed in proprietary solutions once they become stable enough¹. An exploratory analysis of open-source projects can help to detect such trends and provide valuable insight for industry and academia.

¹ Notable examples include the Linux OS kernel, the PostgreSQL database engine, the Apache Spark cluster-computing framework and the Docker containers.

Since GitHub appeared, the open-source movement has gained significant momentum. Historically, developers would manually register their open-source projects in software digests. As the number of projects grew dramatically, those lists became very hard to update; as a result they became more fragmented and started specializing exclusively in narrow technological ecosystems. The next attempt to classify open source projects was based on manually submitted lists of keywords. While this approach works [1], it requires careful keyword engineering to appear comprehensive and was therefore not widely adopted by end users in practice. GitHub introduced repository tags in January 2017, which are a variant of manual keyword submission.

The present paper describes how to conduct fully automated topic extraction from millions of public repositories. Our approach scales linearly with the overall source code size and has a substantial performance reserve to support future growth. We propose building a bag-of-words model on the names occurring in source code and applying proven Natural Language Processing algorithms to it. In particular, we describe how the "Weighted MinHash" algorithm [2] helps to filter fuzzy duplicates and how an Additive Regularized Topic Model (ARTM) [3] can be efficiently trained. The result of the topic modeling is a nearly complete classification of open source projects, reflecting the drastic variety of open source projects along multiple features. The dataset we work with consists of approx. 18 million public repositories retrieved from GitHub in October 2016.
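To make the deduplication criterion concrete, the sketch below computes the weighted Jaccard similarity between two bag-of-words models built from names. This is the quantity that the Weighted MinHash scheme [2] approximates with fixed-size sketches, so that candidate pairs can be found at scale via Locality Sensitive Hashing instead of exhaustive pairwise comparison. The example bags and the 0.9 threshold are illustrative assumptions, not values taken from this paper.

```python
from collections import Counter


def weighted_jaccard(bag_a: Counter, bag_b: Counter) -> float:
    """Weighted Jaccard similarity between two bags of names.

    Weighted MinHash [2] approximates this value with compact sketches,
    which is what makes fuzzy-duplicate filtering feasible for millions
    of repositories.
    """
    keys = bag_a.keys() | bag_b.keys()
    inter = sum(min(bag_a[k], bag_b[k]) for k in keys)
    union = sum(max(bag_a[k], bag_b[k]) for k in keys)
    return inter / union if union else 0.0


# Hypothetical bags of names extracted from two repositories.
repo_a = Counter({"foo": 12, "bar": 7, "size": 3})
repo_b = Counter({"foo": 11, "bar": 7, "size": 2, "baz": 1})

sim = weighted_jaccard(repo_a, repo_b)
print(f"weighted Jaccard similarity: {sim:.2f}")
# Repositories above a chosen threshold (e.g. 0.9, an assumed value) would be
# treated as obscure forks and collapsed before topic modeling.
```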
The rest of the paper is organised as follows: Section II reviews prior work on the subject. Section III elaborates on how we turn software repositories into bags-of-words. Section IV describes our approach to the efficient filtering of fuzzy repository clones. Section V covers the building of the ARTM model with 256 manually labeled topics. Section VI presents the achieved topic modeling results. Section VII lists the open datasets we were able to prepare. Finally, Section VIII presents our conclusions and suggests directions for future work.

II. RELATED WORK

A. Academia

An open source community study which presented statistics about manually picked topics was published in 2005 by J. Xu et al. [4].

Blincoe et al. [5] studied GitHub ecosystems using reference coupling over the GHTorrent dataset [6], which contained 2.4 million projects. The present research employs an alternative topic modeling method on the source code of 13.6 million projects. Instead of using the GHTorrent dataset, we have prepared open datasets covering almost all public repositories on GitHub in order to have a more comprehensive overview.

M. Lungu [7] conducted an in-depth study of software ecosystems in 2009, the year GitHub appeared. The examples in that work used samples of approx. 10 repositories, and the proposed discovery methods did not include Natural Language Processing.

The problem of the correct analysis of forks on GitHub has been discussed by Kalliamvakou et al. [8], along with other valuable concerns.

Topic modeling of source code has been applied to a variety of problems reviewed in [9]: improvement of software maintenance [10], [11], defect explanation [12], concept analysis [13], [14], software evolution analysis [15], [16], finding similarities and clones [17], clustering source code and discovering its internal structure [18], [19], [20], and summarization [21], [22], [23]. In the aforementioned works, the scope of the research was focused on individual projects.

The usage of topic modeling in [24] focused on improving software maintenance and was evaluated on 4 software projects. Concepts were extracted using a corpus of 24 projects in [25]. Giriprasad Sridhara et al. [26], Yang and Tan [27], and Howard et al. [28] considered comments and/or names to find semantically similar terms; Haiduc and Marcus [29] researched common domain terms appearing in source code. The approach presented in this paper reveals similar and domain terms as well, but leverages a significantly larger dataset of 13.6 million repositories.

Bajracharya and Lopes [30] trained a topic model on a year-long usage log of Koders, one of the major commercial code search engines. The topic categories suggested by Bajracharya and Lopes share little similarity with the categories described in this paper since the input domain is very different.

B. Industry

To our knowledge, there are few companies which maintain a complete mirror of GitHub repositories. source{d} [31] is focused on doing machine learning on top of the collected source code. Software Heritage [32] strives to become the web.archive.org for open source software. SourceGraph [33] processes source code references, internal and external, and has created a complete reference graph for projects written in Golang.

libraries.io [34] is not GitHub centric but rather processes the dependencies and metadata of open source packages fetched from a number of repositories. It analyses the dependency graph at the level of projects, while SourceGraph analyses it at the level of functions.

III. TURNING REPOSITORIES INTO BAGS-OF-WORDS

For the purpose of our analysis we choose to use the latest version of the master branch of each repository and treat each repository as a single document. An improvement for further research would be to use the entire history of each repository, including unique code found in each branch.

A. Preliminary Processing

Our first goal is to process each repository to identify which files contain source code and which files are redundant for our purpose. GitHub maintains an open-source machine-learning-based library named linguist [35] that identifies the programming language used within a file based on its extension and contents. We modified it to also identify vendor code and automatically generated files. The first step in our pre-processing is to run linguist over each repository's master branch. From 11.94 million repositories we end up with 402.6 million source files in which we have high confidence that the code was written by a developer of that project. Identifying the programming language used within each file is important for the next step, name extraction, as it determines the programming language parser.

B. Extracting Names

Source code highlighting is a typical task for professional text editors and IDEs, and several open source libraries have been created to tackle it. Each works by having a grammar file, written per programming language, which contains the highlighting rules. Pygments [36] is a high-quality community-driven package for Python which supports more than 400 programming languages and markups. According to Pygments, all source code tokens are classified into the following categories: comments, escapes, indentations and generic symbols, reserved keywords, literals, operators, punctuation and names.

Linguist and Pygments have different sets of supported languages. Linguist stores its list at master/lib/linguist/languages.yml and the corresponding Pygments list is stored as pygments.lexers.LEXERS. Each has nearly 400 items and the intersection is approximately 200 programming languages (Linguist's "programming" item type). The languages common to Linguist and Pygments which were chosen are listed in Appendix A. In this research we apply Pygments to the 402.6 million source files to extract all tokens which belong to the type Token.Name.
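As an illustration of this step, the following sketch uses the public Pygments API (pygments.lex, get_lexer_by_name and the Token.Name token type) to pull name tokens out of a single file. The helper extract_names and the direct mapping from a Linguist language label to a Pygments lexer alias are assumptions made for the example, not part of the paper's pipeline.

```python
from pygments import lex
from pygments.lexers import get_lexer_by_name
from pygments.token import Token
from pygments.util import ClassNotFound


def extract_names(source: str, language: str):
    """Yield every token of type Token.Name (and its subtypes) from a file.

    `language` is assumed to be the Linguist-detected language mapped to a
    Pygments lexer alias (e.g. "python", "javascript").
    """
    try:
        lexer = get_lexer_by_name(language.lower())
    except ClassNotFound:
        return  # language outside the Linguist/Pygments intersection
    for token_type, value in lex(source, lexer):
        # Token types form a hierarchy, so this also matches
        # Token.Name.Function, Token.Name.Class, etc.
        if token_type in Token.Name:
            yield value


names = list(extract_names("class FooBarBaz:\n    wd_size = 1\n", "python"))
# expected: ['FooBarBaz', 'wd_size']
```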
C. Processing Names

The next step is to process the names according to naming conventions. As an example, class FooBarBaz adds three words to the bag: foo, bar and baz, while int wdSize should add two: wdsize and size. Fig. 1 is the full listing of the function, written in Python 3.4+, which splits identifiers.
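The paper's Fig. 1 listing is not reproduced in this excerpt; the following is a minimal sketch of the splitting heuristic implied by the two examples above: identifiers are split on case and digit boundaries, lowercased, and fragments shorter than a minimum length are merged with the following part. The function name split_identifier and the three-character minimum are assumptions chosen to match the examples.

```python
import re

# Split on lower-to-upper case boundaries, runs of capitals, and digits.
NAME_PART_RE = re.compile(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|[0-9]+")


def split_identifier(name: str, min_len: int = 3) -> list:
    """Split an identifier into lowercase words for the bag-of-words model."""
    parts = [p.lower() for p in NAME_PART_RE.findall(name)]
    words, carry = [], ""
    for part in parts:
        if len(part) >= min_len:
            if carry:                      # e.g. "wd" + "size" -> "wdsize"
                words.append(carry + part)
                carry = ""
            words.append(part)
        else:
            carry += part
    if carry:                              # identifier made only of short fragments
        words.append(carry)
    return words


assert split_identifier("FooBarBaz") == ["foo", "bar", "baz"]
assert split_identifier("wdSize") == ["wdsize", "size"]
```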