
© 2014 IJIRT | Volume 1 Issue 6 | ISSN : 2349-6002

COMPILERS-STUDY FROM ZERO

Vishal Sharma, Priyanka Yadav, Priti Yadav
Computer Science Department, Dronacharya College of Engineering / Maharishi Dayanand University, India

Abstract- This paper gives an understanding of compilers. The beginning of the paper explains the term, followed by its working, the need for compilers, and the knowledge required in building them. Further, we explain the different types of compilers and the applications of compilers. Thus, going through this paper, one will end up with a good understanding of compilers and their future aspects.

Index Terms- Analyser, Compilers, LISP, UNIVAC.

I. INTRODUCTION
Compilers are fundamental to modern computing. They act as translators, transforming human-oriented programming languages into computer-oriented machine languages. A compiler allows the programmer to ignore the machine-dependent details of programming. Compilers allow programs and programming skills to be machine-independent. They also aid in detecting programming errors (which are all too common). Compiler techniques also help to improve computer security; for example, the Bytecode Verifier helps to guarantee that Java security rules are satisfied.

II. HOW DOES A COMPILER WORK?
A compiler can be viewed in two parts:
1. Analyser: takes source code as a sequence of characters as input and interprets it as a structure of symbols (variables, values, operators, etc.).
2. Generator: takes the structural analysis from (1) and produces runnable code as output.

[Figure: Source Code → Code Analyser → Code Generator → Object Code]
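To make the two-part view concrete, here is a minimal, hypothetical sketch of such a pipeline for arithmetic expressions. The token and instruction formats are invented for this illustration and are not taken from any compiler discussed in this paper:

```python
import re

# Toy "compiler" for expressions such as "2 + 3 * x", illustrating the
# two-part view: an analyser (lexer + parser) and a code generator.

TOKEN_RE = re.compile(r"\s*(?:(\d+)|([A-Za-z_]\w*)|(.))")

def tokenize(src):
    """Lexical analysis: turn a character sequence into tokens."""
    tokens = []
    for num, name, op in TOKEN_RE.findall(src):
        if num:
            tokens.append(("NUM", int(num)))
        elif name:
            tokens.append(("VAR", name))
        else:
            tokens.append(("OP", op))
    return tokens

def parse(tokens):
    """Syntax analysis: build a tree; '*' binds tighter than '+'."""
    pos = 0
    def peek():
        return tokens[pos] if pos < len(tokens) else (None, None)
    def atom():
        nonlocal pos
        kind, value = tokens[pos]
        pos += 1
        return (kind, value)
    def term():
        nonlocal pos
        node = atom()
        while peek() == ("OP", "*"):
            pos += 1
            node = ("MUL", node, atom())
        return node
    def expr():
        nonlocal pos
        node = term()
        while peek() == ("OP", "+"):
            pos += 1
            node = ("ADD", node, term())
        return node
    return expr()

def generate(node):
    """Code generation: emit instructions for a simple stack machine."""
    if node[0] in ("NUM", "VAR"):
        return [("PUSH", node[1])]
    op, left, right = node
    return generate(left) + generate(right) + [(op,)]

code = generate(parse(tokenize("2 + 3 * x")))
# code is [('PUSH', 2), ('PUSH', 3), ('PUSH', 'x'), ('MUL',), ('ADD',)]
```

A real analyser would also build a symbol table and report errors, and the generator here targets an imaginary stack machine rather than real machine code.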

The Code Analyser typically has three stages:
• Lexical Analysis: accepts the source code as a sequence of characters and outputs it as a sequence of tokens.
• Syntax Analyser: interprets the program tokens as a structured program.
• Semantic Analyser: checks that variables are instantiated before use, etc.

What is the challenge? Many variations:

IJIRT 100560 INTERNATIONAL JOURNAL OF INNOVATIVE RESEARCH IN TECHNOLOGY 226

• many programming languages (e.g., FORTRAN, C++, Java)
• many programming paradigms (e.g., object-oriented, functional, logic)
• many computer architectures (e.g., MIPS, SPARC, Intel, Alpha)
• many operating systems (e.g., Linux, Solaris, Windows)

Qualities of a compiler (in order of importance):
1. the compiler itself must be bug-free;
2. it must generate correct machine code;
3. the generated machine code must run fast;
4. the compiler itself must run fast (compilation time must be proportional to program size);
5. the compiler must be portable (i.e., modular, supporting separate compilation);
6. it must print good diagnostics and error messages;
7. the generated code must work well with existing debuggers;
8. it must have consistent and predictable optimization.

Building a compiler requires knowledge of:
• programming languages (parameter passing, variable scoping, memory allocation, etc.);
• theory (automata, context-free languages, etc.);
• algorithms and data structures (hash tables, graph algorithms, dynamic programming, etc.);
• computer architecture (assembly programming);
• software engineering.

A compiler can be viewed as a program that accepts source code (such as a Java program) and generates machine code for some computer architecture. Suppose that you want to build compilers for n programming languages (e.g., FORTRAN, C, C++, Java, BASIC, etc.) and you want these compilers to run on m different architectures (e.g., MIPS, SPARC, Intel, Alpha, etc.). If you do that naively, you need to write n*m compilers, one for each language-architecture combination.

A typical real-world compiler usually has multiple phases. This increases the compiler's portability and simplifies retargeting. The front end consists of the following phases:
1. scanning: a scanner groups input characters into tokens;
2. parsing: a parser recognizes sequences of tokens according to some grammar and generates Abstract Syntax Trees (ASTs);
3. semantic analysis: performs type checking (i.e., checking whether the variables, functions, etc. in the source program are used consistently with their definitions and with the language semantics) and translates ASTs into IRs;
4. optimization: optimizes IRs.

The back end consists of the following phases:
1. instruction selection: maps IRs into assembly code;
2. code optimization: optimizes the assembly code using control-flow and data-flow analyses, register allocation, etc.;
3. code emission: generates machine code from assembly code.

The generated machine code is written into an object file. This file is not executable since it may refer to external symbols (such as system calls). The operating system provides the following utilities to execute the code:

linking: a linker takes several object files and libraries as input and produces one executable object file. It retrieves from the input files (and puts together in the executable object file) the code of all the referenced functions/procedures, and it resolves all external references to real addresses. The libraries include the operating system libraries, the language-specific libraries, and, maybe, user-created libraries.

loading: a loader loads an executable object file into memory, initializes the registers, heap, data, etc., and starts the execution of the program.
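As a concrete illustration of the optimization phase, the sketch below performs constant folding and propagation on a tiny three-address IR. The instruction format is invented for this example; it is a toy pass, not the IR of any compiler discussed here:

```python
# Constant folding on a tiny three-address IR: each instruction is
# (dest, op, arg1, arg2), where the args are integer constants or
# variable names. When both arguments are known constants, the
# operation is evaluated at compile time and the result propagated.

def fold_constants(ir):
    env = {}   # variables whose values are known at compile time
    out = []
    for dest, op, a, b in ir:
        a = env.get(a, a)           # propagate known constants
        b = env.get(b, b)
        if isinstance(a, int) and isinstance(b, int):
            value = {"add": a + b, "mul": a * b}[op]
            env[dest] = value       # folded away: no instruction emitted
        else:
            out.append((dest, op, a, b))
            env.pop(dest, None)     # dest is no longer a known constant
    return out

ir = [
    ("t1", "add", 2, 3),        # t1 = 5, folded away
    ("t2", "mul", "t1", 4),     # t2 = 20, folded away
    ("t3", "add", "t2", "x"),   # x unknown: t3 = 20 + x survives
]
optimized = fold_constants(ir)
# optimized is [('t3', 'add', 20, 'x')]
```

Real IR optimizers run many such passes (dead-code elimination, common-subexpression elimination, and so on) over a far richer instruction set, but the shape is the same: walk the IR, track facts, rewrite instructions.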



In computer science, bootstrapping is the process of writing a compiler (or assembler) in the target programming language which it is intended to compile. Applying this technique leads to a self-hosting compiler. Many compilers for many programming languages are bootstrapped, including compilers for BASIC, ALGOL, C, Pascal, PL/I, Factor, Haskell, Modula-2, Oberon, OCaml, Common Lisp, Scheme, Java, Python, Scala, Nimrod and more.

NELIAC

The Navy Electronics Laboratory International ALGOL Compiler, or NELIAC, was a dialect and compiler implementation of the ALGOL 58 programming language developed by the Naval Electronics Laboratory in 1958. NELIAC was the brainchild of Harry Huskey, then Chairman of the ACM and a well-known computer scientist, and was supported by Maury Halstead, the head of the computational centre at NEL. The earliest version was implemented on the prototype USQ-17 computer (called the Countess) at the laboratory. It was the world's first self-compiling compiler: the compiler was first coded in simplified form in assembly language (the bootstrap), then re-written in its own language, compiled by this bootstrap compiler, and re-compiled by itself, making the bootstrap obsolete.

Lisp


The first self-hosting compiler (excluding assemblers) was written for Lisp by Tim Hart and Mike Levin at MIT in 1962.[4] They wrote a Lisp compiler in Lisp, testing it inside an existing Lisp interpreter. Once they had improved the compiler to the point where it could compile its own source code, it was self-hosting: "The compiler as it exists on the standard compiler tape is a machine language program that was obtained by having the S-expression definition of the compiler work on itself through the interpreter." (AI Memo 39)
This technique is only possible when an interpreter already exists for the very same language that is to be compiled. It borrows directly from the notion of running a program on itself as input, which is also used in various proofs in theoretical computer science, such as the proof that the halting problem is undecidable.

XPL

XPL is a dialect of the PL/I programming language, developed in 1967, used for the development of compilers for computer languages. It was designed and implemented by a team with William McKeeman, James J. Horning, and David B. Wortman at Stanford University and the University of California, Santa Cruz. It was first announced at the 1968 Fall Joint Computer Conference in San Francisco. It is the name of both the programming language and the LALR parser generator system (or TWS: translator writing system) based on the language. XPL featured a relatively simple bottom-up compiler system dubbed MSP (mixed strategy precedence) by its authors. It was bootstrapped through Burroughs Algol onto the IBM System/360 computer. Subsequent implementations of XPL featured an SLR(1) parser.

XCOM

The XPL compiler, called XCOM, is a one-pass compiler using a table-driven parser and simple code generation techniques. Versions of XCOM exist for different machine architectures, using different hand-written code generation modules for those targets. The original target was IBM System/360.
XCOM compiles from XPL source code, but since XCOM itself is written in XPL it can compile itself – it is a self-compiling compiler, not reliant on anyone else's compilers. Several famous languages have self-compiling compilers, including Burroughs B5000 Algol, PL/I, C, LISP, and Java. Creating such compilers is a chicken-and-egg conundrum: the language is first implemented by a temporary compiler written in some other language, or even by an interpreter (often an interpreter for an intermediate code, as BCPL can do with intcode or O-code).
XCOM began as an Algol program running on Burroughs machines, translating XPL source code into System/360 machine code. Someone manually turned its Algol source code into XPL source code. That XPL version of XCOM was then compiled on Burroughs, creating a self-compiling XCOM for System/360 machines. The Algol version was then thrown away, and all further improvements happened in the XPL version only. This is called bootstrapping the compiler. Retargeting the compiler for a new machine architecture is a similar exercise, except that only the code generation modules need to be changed.
XCOM is a one-pass compiler (but with an emitted code fix-up process for forward branches, loops and other defined situations). It emits machine code for each statement as each grammar rule within a statement is recognized, rather than waiting until it has parsed the entire procedure or entire program. There are no parse trees or other required intermediate program forms, and no loop-wide or procedure-wide optimizations. XCOM does, however, perform peephole optimization. The code generation response to each grammar rule is attached to that rule. This immediate approach can result in inefficient code and inefficient use of machine registers. Such inefficiencies are offset by the efficiency of implementation, namely the use of dynamic strings: in processing text during compilation, substring operations are frequently performed. These are as fast as an assignment to an integer; the actual substring is not moved. In short, XCOM is quick, easy to teach in a short course, fits into modest-sized memories, and is easy to change for different languages or different target machines.

Who developed the first language compiler and when?
Rear Admiral Dr. Grace Murray Hopper, in 1949. The compiler, written in assembly language,


converted symbolic mathematical code into machine code. By 1949, programs contained mnemonics that were transformed into binary code instructions executable by the computer. Admiral Hopper and her team extended this improvement on binary code with the development of her first compiler, the A-0. The A-0 series of compilers translated symbolic mathematical code into machine code, and allowed the specification of call numbers assigned to the collected programming routines stored on magnetic tape. One could then simply specify the call numbers of the desired routines and the computer would "find them on the tape, bring them over and do the additions. This was the first compiler," she declared.

III. DIFFERENT TYPES OF COMPILERS

One-pass compiler
A one-pass compiler (also known as a "narrow compiler") is a compiler that passes through the source code of each compilation unit only once. For this reason it is very fast. The downside is that it can mostly be used only for simple programs.

Threaded code compiler
This type of compiler replaces given strings in the source with given binary code. Many FORTH implementations use threaded code, and some use the term "threading" for almost any technique used to implement Forth's virtual machine.

Incremental compiler
This type of compiler was used in Lisp systems. It continued to be developed for imperative and interactive programming.

Stage compiler
This is used in Prolog machines, for instance. The process involves compiling to assembly language.

Just-in-time compiler
The program starts off in bytecode, which is then compiled into machine code just in time before execution starts.

Retargetable compiler
This type of compiler is used to produce code for various CPUs. However, it is worth noting that the object code in this case is sometimes of lower quality. This is because this type of compiler is quite generic, as opposed to a compiler that was designed specifically for a particular processor.

Parallelizing compiler
This kind of compiler is used in systems composed of what is known as a parallel architecture. The serial input program is converted in such a way that it can be processed efficiently by the system.

There are several types of compilers, each having particular characteristics to suit varying needs and systems. An overview can be found in the Cambridge encyclopaedia volumes and other sites, which describe their structure, techniques, possible errors and uses.

IV. APPLICATIONS OF COMPILER TECHNOLOGY

1: Implementation of High-Level Programming Languages
• Abstraction: All modern languages support abstraction. Data-flow analysis permits optimizations that significantly reduce the execution-time cost of abstractions.
• Inheritance: The increasing use of smaller, but more numerous, methods has made interprocedural analysis important. Optimizations have also improved virtual method dispatch.
• Array bounds checking in Java and Ada: Optimizations have been produced that eliminate many checks.
• Garbage collection in Java: Improved algorithms.
• Dynamic compilation in Java: Optimizations to predict/determine parts of the program that will be heavily executed and thus should be the first/only parts dynamically compiled into native code.

2: Optimization for Computer Architectures
Parallelism
Major research efforts have led to improvements in:
• Automatic parallelization: Examine serial programs to determine and expose potential parallelism.
• Compilation of explicitly parallel languages.
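The bounds-checking point above can be made concrete with a toy sketch: if a compiler can prove that a loop index always stays inside the array, the per-access check can be dropped. The mini-analysis below is hypothetical and assumes the loop bounds are known constants, which real compilers establish through range analysis:

```python
# Sketch of bounds-check elimination: for a loop accessing a[i] with
# i in range(loop_start, loop_end), the runtime check "0 <= i < array_len"
# is redundant exactly when every possible i is provably in bounds.

def needs_bounds_check(loop_start, loop_end, array_len):
    """Return True if some access a[i], for i in [loop_start, loop_end),
    could fall outside [0, array_len); False means the check is removable."""
    if loop_start >= loop_end:      # empty loop: no accesses at all
        return False
    return loop_start < 0 or loop_end > array_len

# The optimizer keeps the check only when it cannot prove safety:
safe = needs_bounds_check(0, 10, 10)     # False: check eliminated
unsafe = needs_bounds_check(0, 11, 10)   # True: check must stay
```

Production compilers apply the same idea symbolically (e.g., proving `i < a.length` from the loop condition) rather than with concrete constants.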


Memory Hierarchies
All machines have a limited number of registers, which can be accessed much faster than central memory. All but the simplest compilers devote effort to using this scarce resource effectively. Modern processors have several levels of caches, and advanced compilers produce code designed to utilize the caches well.

3: Design of New Computer Architectures
RISC (Reduced Instruction Set Computer)
RISC computers have comparatively simple instructions; complicated operations require several RISC instructions. A CISC (Complex Instruction Set Computer) contains both complex and simple instructions. A sequence of CISC instructions would correspond to a larger sequence of RISC instructions. Advanced optimizations are able to find commonality in this larger sequence and lower the total number of instructions. The CISC Intel processor line 8086/80286/80386/... had a major change with the 686 (a.k.a. Pentium Pro). In this processor, the CISC instructions were decomposed into RISC instructions by the processor itself. Currently, code for x86 processors normally achieves highest performance when the (optimizing) compiler emits primarily simple instructions.

Specialized Architectures
A great variety has emerged. Compilers are produced before the processors are fabricated. Indeed, compilation plus simulated execution of the generated machine code is used to evaluate proposed designs.

4: Program Translations
Binary Translation
This means translating from one machine language to another. Companies changing processors sometimes use binary translation to execute legacy code on new machines. Apple did this when converting from Motorola CISC processors to the PowerPC. An alternative is to have the new processor execute programs in both the new and old instruction sets: Intel had the Itanium processor also execute x86 code (Apple, however, did not produce their own processors). With the recent dominance of x86 processors, binary translators from x86 have been developed so that other microprocessors can be used to execute x86 software.

Hardware Synthesis
In the old days, integrated circuits were designed by hand. For example, the NYU Ultracomputer research group in the 1980s designed a VLSI chip for rapid interprocessor coordination. The design software essentially let you paint: you painted blue lines where you wanted metal, green for polysilicon, etc. Where certain colors crossed, a transistor appeared. Current microprocessors are much too complicated to permit such a low-level approach. Instead, designers write in a high-level description language which is compiled down to the specific layout.

Database Query Interpreters
The optimization of database queries and transactions is quite a serious subject.

Compiled Simulation

5: Software Productivity Tools
Dataflow techniques developed for optimizing code are also useful for finding errors. Here correctness is not an absolute requirement, a good thing since finding all errors is undecidable.

Type Checking
Techniques developed to check for type correctness can be extended to find other errors, such as use of an uninitialized variable.

Bounds Checking
As mentioned above, optimizations have been developed to eliminate unnecessary bounds checking for languages like Ada and Java that perform the checks automatically. Similar techniques can help find potential buffer overflow errors, which can be a serious security threat.

Memory-Management Tools
Languages with garbage collection (e.g., Java) cannot have memory leaks (failure to free no-longer-accessible memory). Compilation techniques can help to find such leaks in languages like C that do not have garbage collection.

V. Study of Different Compilers and Their Properties:

Common Lisp Compiler
Common Lisp (CL) is a dialect of the Lisp programming language, published in ANSI standard document "ANSI INCITS 226-1994 (R2004), (formerly X3.226-1994 (R1999))".[1] From the ANSI Common Lisp standard, the Common Lisp HyperSpec has been derived[2] for use with web browsers. Common Lisp was developed to standardize the divergent variants of Lisp (though mainly the MacLisp variants) which predated it; thus it is not an implementation but rather a language specification. Several implementations of the Common Lisp standard are


available, including free and open source software and proprietary products. Common Lisp is a general-purpose, multi-paradigm programming language. It supports a combination of procedural, functional, and object-oriented programming paradigms. As a dynamic programming language, it facilitates evolutionary and incremental software development, with iterative compilation into efficient run-time programs.

| Compiler | Author | Target | Windows | Unix-like | Other OSs | License type | IDE? |
|---|---|---|---|---|---|---|---|
| Allegro Common Lisp | Franz, Inc. | Native code | Yes | Yes | Yes | Proprietary | Yes |
| Armed Bear Common Lisp | | JVM | Yes | Yes | Yes | GPL | Yes |
| CLISP | | Bytecode | Yes | Yes | Yes | GPL | No |
| Clozure CL | Clozure Associates | Native code | Yes | Yes | No | LGPL | Yes |
| CMU Common Lisp | | Native code, Bytecode | No | Yes | No | Public Domain | Yes |
| Corman Common Lisp | | Native code | Yes | No | No | Proprietary | Yes |
| Embeddable Common Lisp | | Bytecode, C | Yes | Yes | Yes | LGPL | Yes |
| GNU Common Lisp | | C | Yes | Yes | No | GPL | No |
| LispWorks | LispWorks Ltd | Native code | Yes | Yes | No | Proprietary | Yes |
| Open Genera | Symbolics | Ivory | No | Yes | No | Proprietary | Yes |
| Scieneer Common Lisp | Scieneer Pty Ltd | Native code | No | Yes | No | Proprietary | No |
| Steel Bank Common Lisp | | Native code | Yes | Yes | Yes | Public Domain | Yes |

D COMPILERS

The D programming language is an object-oriented, imperative, multi-paradigm system programming language created by Walter Bright of Digital Mars. Though it originated as a re-engineering of C++, D is a distinct language, having redesigned some core C++ features while also taking inspiration from other languages, notably Java, Python, Ruby, C#, and Eiffel. D's design goals attempt to combine the performance of compiled languages with the safety and expressive power of modern dynamic languages. Idiomatic D code is commonly as fast as equivalent C++ code, while being shorter and memory-safe.

| Compiler | Author | Windows | Unix-like OSs | Other OSs | License type | IDE? |
|---|---|---|---|---|---|---|
| Digital Mars D (DMD) | Digital Mars and others | Yes | 32-bit Linux, Mac OS X, FreeBSD | No | GPL and Artistic | No |
| D Compiler for .Net | ? | Yes | Yes | ? | ? | ? |
| GDC | GCC | Yes | Yes | No | GPL | No |
| LDC | LLVM | Yes | Yes | No | Open Source | No |

DiBOL COMPILERS

DiBOL, or Digital's Business Oriented Language, is a general-purpose, procedural language which is well-suited for Management Information Systems (MIS) software development. It has a syntax similar to FORTRAN and BASIC, along with BCD arithmetic. It shares the COBOL program structure of data and procedure divisions.

| Compiler | Author | Windows | Unix-like | Other OSs | License type | IDE? |
|---|---|---|---|---|---|---|
| Synergy DBL[2][3][4] | Synergex | Yes | Yes | Yes | Proprietary | Yes |

EIFFEL

Eiffel is an ISO-standardized, object-oriented programming language designed by Bertrand Meyer (an object-orientation proponent and author of Object-Oriented Software Construction) and Eiffel Software. The design of the language is closely connected with the Eiffel programming method. Both are based on a set of principles, including design by contract, command-query separation, the uniform-access principle, the single-choice principle, the open-closed principle, and option-operand separation. Many concepts initially introduced by Eiffel later found their way into Java, C#, and other languages. New language design ideas, particularly through the Ecma/ISO standardization process, continue to be incorporated into the Eiffel language.

| Compiler | Author | Windows | Unix-like | Other OSs | License type | IDE? |
|---|---|---|---|---|---|---|
| EiffelStudio | Eiffel Software / Community developed (sourceforge) | Yes | Yes | Yes | Dual GPL / Proprietary | Yes |

Java compilers

| Compiler | Author | Windows | Unix-like | Other OSs | License type | IDE? |
|---|---|---|---|---|---|---|
| GNU Java | GNU Project | No | Yes | No | GPL | No |
| Javac | Sun Microsystems (owned by Oracle) | Yes | Yes | Yes | GPL | No |
| S.N Java Compiler | SN Ink. (owned by S.N) | Yes | No | No | Free | No |
| ECJ (Eclipse Compiler for Java) | Eclipse project | Yes | Yes | Yes | EPL | Yes |


Pascal compilers

| Compiler | Author | Windows | Unix-like | Other OSs | License type | IDE? |
|---|---|---|---|---|---|---|
| Amsterdam Compiler Kit | Andrew Tanenbaum and Ceriel Jacobs | No | Yes | Yes | BSD | No |
| Embarcadero Delphi | Embarcadero (CodeGear) | Yes | No | ? | Proprietary | Yes |
| Delphi Prism | RemObjects | Yes | Yes | Yes | Proprietary | Yes |
| FrameworkPascal | Framework Computers, Inc. | Yes | No | Yes (MS-DOS) | Proprietary | Yes |
| Free Pascal | Florian Paul Klämpfl | Yes | Yes | Yes (OS/2, FreeBSD, Solaris, Haiku, etc.) | GPL | Yes (FPIDE & Lazarus) |
| Irie Pascal | Irie Tools Limited | Yes | Yes | No | Proprietary | Yes |
| GNU Pascal | GNU Project | Yes | Yes | Yes | GPL | No |
| Kylix | (CodeGear) | No | Yes (Linux) | No | Proprietary | Yes |
| Turbo Pascal for Windows | Borland (CodeGear) | Yes (3.x) | No | No | Proprietary | Yes |
| Microsoft Pascal | Microsoft | No | No | Yes (MS-DOS) | Proprietary | Yes |
| Neuron Pascal Compiler | Salah IBN AMAR | Yes | Yes | Yes | GPL | Yes |
| HP Pascal | Hewlett-Packard | No | No | Yes (OpenVMS) | Proprietary | Unknown |
| Turbo Pascal | CodeGear (Borland) | No | No | Yes | Freeware | Yes |
| Vector Pascal | Glasgow University | Yes | Yes | No | Open Source | No |
| Virtual Pascal | Vitaly Miryanov and Allan Mertner | Yes | Yes (Linux) | Yes (OS/2) | Freeware | Yes |
| WDSibyl | Wolfgang Draxler and Speed-Soft | Yes | No | Yes (OS/2) | GPL | Yes |

VI. CONCLUSION
Thus, we get an introduction to compilers and their working. The modern world is changing very fast, and this change is also taking place in the development of compilers. From the very first compiler for UNIVAC and FORTRAN to the various latest compilers for each and every computer language, there has been significant growth. The compiler field must develop the technologies that enable more of the progress the field has experienced over the past 50 years.


Computer science educators must attract some of the brightest students to the compiler field by showing them its deep intellectual foundations and highlighting the broad applicability of compiler technology to many areas of computer science. Today, compilers are fast, accurate, and precise in their working. With upcoming technology, compiler growth will continue, and we can see a bright future for compilers.

REFERENCES
[1] Compiler Research: The Next 50 Years, by Mary Hall, David Padua, and Keshav Pingali
[2] Compilers, Spring term, Mick O'Donnell: [email protected], Alfonso Ortega: [email protected]
[3] http://www.cs.yale.edu/homes/tap/Files/hopper-story.html
[4] http://en.wikipedia.org/wiki/D_(programming_language)
[5] http://en.wikipedia.org/wiki/Common_Lisp
[6] http://en.wikipedia.org/wiki/History_of_compiler_construction
[7] http://en.wikipedia.org/wiki/List_of_compilers#Source-to-source_compilers
