© 2014 IJIRT | Volume 1 Issue 6 | ISSN: 2349-6002

COMPILERS-STUDY FROM ZERO

Vishal Sharma, Priyanka Yadav, Priti Yadav
Computer Science Department, Dronacharya College of Engineering / Maharishi Dayanand University, India

Abstract- This paper gives a basic understanding of compilers. The beginning of the paper explains the term compiler, followed by how compilers work, why they are needed, and the knowledge required to build them. Further, we explain the different types of compilers and their applications. Going through this paper, one will end up with a good understanding of compilers and their future prospects.

Index Terms- Analyser, Compilers, FORTRAN, LISP, UNIVAC.

I. INTRODUCTION

Compilers are fundamental to modern computing. They act as translators, transforming human-oriented programming languages into computer-oriented machine languages. A compiler allows programmers to ignore the machine-dependent details of programming; programs and programming skills thus become machine-independent. Compilers also aid in detecting programming errors (which are all too common), and compiler techniques help to improve computer security. For example, the Java bytecode verifier helps to guarantee that Java security rules are satisfied.

II. HOW DOES A COMPILER WORK?

A compiler can be viewed in two parts:

1. Source Code Analyser: takes source code as input, as a sequence of characters, and interprets it as a structure of symbols (variables, values, operators, etc.).

2. Object Code Generator: takes the structural analysis from (1) and produces runnable code as output.

[Figure: Source Code → Code Analyser → Code Generator → Object Code]

The Code Analyser typically has three stages (a minimal sketch of the first stage follows this list):

• Lexical Analysis: accepts the source code as a sequence of characters and outputs it as a sequence of tokens.
• Syntax Analyser: interprets the program tokens as a structured program.
• Semantic Analyser: checks that variables are instantiated before use, etc.
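To make the first stage concrete, here is a minimal lexical analyser in Python. The token categories and the toy assignment syntax are our own illustration, not something defined in the paper:

```python
import re

# Toy token specification for simple assignments (illustrative only).
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),           # integer literals
    ("IDENT",  r"[A-Za-z_]\w*"),  # variable names
    ("OP",     r"[+\-*/=]"),      # operators
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("SKIP",   r"\s+"),           # whitespace is discarded
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(source):
    """Lexical analysis: turn a character sequence into a token sequence."""
    tokens = []
    pos = 0
    while pos < len(source):
        m = MASTER.match(source, pos)
        if not m:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        if m.lastgroup != "SKIP":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens

print(tokenize("area = width * 2"))
# [('IDENT', 'area'), ('OP', '='), ('IDENT', 'width'), ('OP', '*'), ('NUMBER', '2')]
```

The Syntax Analyser would then consume this token stream according to a grammar, and the Semantic Analyser would check, for example, that width is defined before it is used.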
What is the Challenge?

Many variations:
• many programming languages (e.g., FORTRAN, C++, Java)
• many programming paradigms (e.g., object-oriented, functional, logic)
• many computer architectures (e.g., MIPS, SPARC, Intel, Alpha)
• many operating systems (e.g., Linux, Solaris, Windows)

Qualities of a compiler (in order of importance):
1. the compiler itself must be bug-free
2. it must generate correct machine code
3. the generated machine code must run fast
4. the compiler itself must run fast (compilation time must be proportional to program size)
5. the compiler must be portable (i.e., modular, supporting separate compilation)
6. it must print good diagnostics and error messages
7. the generated code must work well with existing debuggers
8. it must have consistent and predictable optimization

Building a compiler requires knowledge of:
• programming languages (parameter passing, variable scoping, memory allocation, etc.)
• theory (automata, context-free languages, etc.)
• algorithms and data structures (hash tables, graph algorithms, dynamic programming, etc.)
• computer architecture (assembly programming)
• software engineering

A compiler can be viewed as a program that accepts source code (such as a Java program) and generates machine code for some computer architecture. Suppose that you want to build compilers for n programming languages (e.g., FORTRAN, C, C++, Java, BASIC, etc.) and you want these compilers to run on m different architectures (e.g., MIPS, SPARC, Intel, Alpha, etc.). If you do that naively, you need to write n*m compilers, one for each language-architecture combination. If instead each compiler is split into a front end that translates the source language into a machine-independent intermediate representation (IR) and a back end that translates the IR into machine code, only n front ends and m back ends are needed.

A typical real-world compiler therefore has multiple phases. This increases the compiler's portability and simplifies retargeting. The front end consists of the following phases:
1. scanning: a scanner groups input characters into tokens;
2. parsing: a parser recognizes sequences of tokens according to some grammar and generates Abstract Syntax Trees (ASTs);
3. semantic analysis: performs type checking (i.e., checking whether the variables, functions, etc. in the source program are used consistently with their definitions and with the language semantics) and translates ASTs into IRs;
4. optimization: optimizes the IRs.

The back end consists of the following phases:
1. instruction selection: maps IRs into assembly code;
2. code optimization: optimizes the assembly code using control-flow and data-flow analyses, register allocation, etc.;
3. code emission: generates machine code from assembly code.

The generated machine code is written to an object file. This file is not executable, since it may refer to external symbols (such as system calls). The operating system provides the following utilities to execute the code (a toy model of the linking step is sketched after these two descriptions):

linking: A linker takes several object files and libraries as input and produces one executable object file. It retrieves from the input files (and puts together in the executable object file) the code of all the referenced functions/procedures, and it resolves all external references to real addresses. The libraries include the operating system libraries, the language-specific libraries and, maybe, user-created libraries.

loading: A loader loads an executable object file into memory, initializes the registers, heap, data, etc., and starts the execution of the program.
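As an illustration of symbol resolution, here is a toy linker in Python. Object files are modeled as dictionaries holding a code list and a symbol table; the instruction tuples and the CALL mnemonic are invented for this sketch and do not correspond to any real object-file format:

```python
def link(object_files):
    """Toy linker: concatenate code sections and resolve symbol references."""
    code, symbols = [], {}
    # Pass 1: lay out each object file's code and record where its symbols land.
    for obj in object_files:
        base = len(code)
        for name, offset in obj["symbols"].items():
            symbols[name] = base + offset
        code.extend(obj["code"])
    # Pass 2: patch external references ("CALL name") to real addresses.
    executable = []
    for instr in code:
        op, *args = instr
        if op == "CALL":
            target = args[0]
            if target not in symbols:
                raise RuntimeError(f"unresolved external reference: {target}")
            instr = ("CALL", symbols[target])
        executable.append(instr)
    return executable

main_obj = {"symbols": {"main": 0}, "code": [("CALL", "print_sum"), ("HALT",)]}
lib_obj  = {"symbols": {"print_sum": 0}, "code": [("ADD",), ("PRINT",), ("RET",)]}
print(link([main_obj, lib_obj]))
# [('CALL', 2), ('HALT',), ('ADD',), ('PRINT',), ('RET',)]
```

A real linker additionally relocates data references and pulls in only the library code that is actually referenced; the loader then maps the result into memory and jumps to the entry point.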
In computer science, bootstrapping is the process of writing a compiler (or assembler) in the target programming language which it is intended to compile. Applying this technique leads to a self-hosting compiler. Many compilers for many programming languages are bootstrapped, including compilers for BASIC, ALGOL, C, Pascal, PL/I, Factor, Haskell, Modula-2, Oberon, OCaml, Common Lisp, Scheme, Java, Python, Scala, Nimrod and more.

NELIAC

The Navy Electronics Laboratory International ALGOL Compiler, or NELIAC, was a dialect and compiler implementation of the ALGOL 58 programming language developed by the Naval Electronics Laboratory in 1958. NELIAC was the brainchild of Harry Huskey, then Chairman of the ACM and a well-known computer scientist, and was supported by Maury Halstead, the head of the computational centre at NEL. The earliest version was implemented on the prototype USQ-17 computer (called the Countess) at the laboratory. It was the world's first self-compiling compiler: the compiler was first coded in simplified form in assembly language (the bootstrap), then re-written in its own language and compiled by this bootstrap compiler, and finally re-compiled by itself, making the bootstrap obsolete.

Lisp

The first self-hosting compiler (excluding assemblers) was written for Lisp by Tim Hart and Mike Levin at MIT in 1962.[4] They wrote a Lisp compiler in Lisp, testing it inside an existing Lisp interpreter. Once they had improved the compiler to the point where it could compile its own source code, it was self-hosting:

"The compiler as it exists on the standard compiler tape is a machine language program that was obtained by having the S-expression definition of the compiler work on itself through the interpreter." (AI Memo 39)

This technique is only possible when an interpreter already exists for the very same language that is to be compiled. It borrows directly from the notion of running a program on itself as input, which is also used in various proofs in theoretical computer science, such as the proof that the halting problem is undecidable.

XPL

XPL is a dialect of the PL/I programming language, developed in 1967 and used for the development of compilers for computer languages. It was designed and implemented by a team with William McKeeman, James J. Horning, and David B. Wortman at Stanford University and the University of California, Santa Cruz. It was first announced at the 1968 Fall Joint Computer Conference in San Francisco. It is the name of both the programming language and the compiler system.

XCOM compiles from XPL source code, but since XCOM itself is written in XPL, it can compile itself: it is a self-compiling compiler, not reliant on anyone else's compilers. Several famous languages have self-compiling compilers, including Burroughs B5000 Algol, PL/I, C, LISP, and Java. Creating such compilers is a chicken-and-egg conundrum: the language is first implemented by a temporary compiler written in some other language, or even by an interpreter (often an interpreter for an intermediate code, as BCPL can do with intcode or O-code). XCOM began as an Algol program running on Burroughs machines, translating XPL source code into System/360 machine code. Someone manually turned its Algol source code into XPL source code. That XPL version of XCOM was then compiled on Burroughs, creating a self-compiling XCOM for System/360 machines. The Algol version was then thrown away, and all further improvements happened in the XPL version only. This is called bootstrapping the compiler. Retargeting the compiler for a new machine architecture is a similar exercise, except that only the code generation modules need to be changed.

XCOM is a one-pass compiler (though with an emitted-code fix-up process for forward branches, loops, and other defined situations). It emits machine code for each statement as each grammar rule within the statement is recognized, rather than waiting until it has parsed the entire procedure or program; there are no parse trees or other required intermediate program forms. A sketch of this one-pass style of emission follows.
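To illustrate the one-pass style in miniature, here is a recursive-descent compiler for arithmetic expressions in Python. Like XCOM, it emits an instruction the moment each grammar rule is recognized and never builds a parse tree; the stack-machine instruction names and the tiny grammar are invented for this sketch and are not taken from XPL:

```python
import re

# Grammar: expr   -> term (('+'|'-') term)*
#          term   -> factor (('*'|'/') factor)*
#          factor -> NUMBER | '(' expr ')'

def compile_expr(source):
    tokens = re.findall(r"\d+|[()+\-*/]", source)
    pos = 0
    code = []  # instructions are appended immediately; no AST is ever built

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def factor():
        if peek() == "(":
            take()                              # consume '('
            expr()
            assert take() == ")", "expected ')'"
        else:
            code.append(("PUSH", int(take())))  # emit as soon as the number is seen

    def term():
        factor()
        while peek() in ("*", "/"):
            op = take()
            factor()
            code.append(("MUL",) if op == "*" else ("DIV",))

    def expr():
        term()
        while peek() in ("+", "-"):
            op = take()
            term()
            code.append(("ADD",) if op == "+" else ("SUB",))

    expr()
    return code

print(compile_expr("2 * (3 + 4)"))
# [('PUSH', 2), ('PUSH', 3), ('PUSH', 4), ('ADD',), ('MUL',)]
```

Because code is emitted rule by rule, only constructs whose targets are not yet known, such as forward branches, need the kind of fix-up process the text mentions.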