Bootstrapping is a term used in computer science Computer science is the study of the theoretical foundations of information and computation, and of practical techniques for their implementation and application in computer systems. It is frequently described as the systematic study of algorithmic processes that create, describe and transform information. According to Peter J. Denning, the to describe the techniques involved in writing a compiler A compiler is a computer program that transforms source code written in a computer language (the source language) into another computer language (the target language, often having a binary form known as object code). The most common reason for wanting to transform source code is to create an executable program (or assembler Assembly languages are a family of low-level languages for programming computers, microprocessors, microcontrollers, and other integrated circuits. They implement a symbolic representation of the numeric machine codes and other constants needed to program a particular CPU architecture. This representation is usually defined by the hardware) in the target programming language A programming language is an artificial language designed to express computations that can be performed by a machine, particularly a computer. Programming languages can be used to create programs that control the behavior of a machine, to express algorithms precisely, or as a mode of human communication which it is intended to compile. This technique is also called self-hosting The term self-hosting was coined to refer to the use of a computer program as part of the toolchain or operating system that produces new versions of that same program—for example, a compiler that can compile its own source code. Self-hosting software is commonplace on personal computers and larger systems. Other programs that are typically self-.
A large proportion of programming languages are bootstrapped, including BASIC In computer programming, BASIC is a family of high-level programming languages. The original BASIC was designed in 1964 by John George Kemeny and Thomas Eugene Kurtz at Dartmouth in New Hampshire, USA to provide computer access to non-science students. At the time, nearly all use of computers required writing custom software, which was something, C C is a general-purpose computer programming language developed in 1972 by Dennis Ritchie at the Bell Telephone Laboratories for use with the Unix operating system, Pascal Pascal is an influential imperative and procedural programming language, designed in 1968/9 and published in 1970 by Niklaus Wirth as a small and efficient language intended to encourage good programming practices using structured programming and data structuring, Factor Factor is a dynamically typed concatenative programming language whose design and implementation is led by Slava Pestov. Factor's main influences are Joy, Forth, Lisp and Self. Current versions of Factor exist as Continuous Builds for supported platforms while version 1.0 is under development, Haskell Haskell is a standardized, general-purpose purely functional programming language, with non-strict semantics and strong static typing. It is named after logician Haskell Curry, Modula-2 Modula-2 is a computer programming language invented by Niklaus Wirth at ETH, around 1978, as a successor to his intermediate language Modula. Modula-2 was implemented in 1980 for the Lilith computer, which was commercialized in 1982 by startup company DISER as MC1 and MC2. DISER sold 120 units worldwide. The Modula-2 language was understood by, Oberon Oberon is a programming language created in 1986 by Professor Niklaus Wirth and his associates at ETH Zurich in Switzerland. It was developed as part of the implementation of the Oberon operating system. The original intention was to use Modula-2 as the implementation language but it lacked the required safe type-extension facilities. Also, it was, OCaml Objective Caml, or OCaml is the main implementation of the Caml programming language, created by Xavier Leroy, Jérôme Vouillon, Damien Doligez, Didier Rémy and others in 1996. OCaml is an open source project managed and principally maintained by INRIA, Common Lisp Common Lisp, commonly abbreviated CL, is a dialect of the Lisp programming language, published in ANSI standard document ANSI INCITS 226-1994 , (formerly X3.226-1994 (R1999)). Developed to standardize the divergent variants of Lisp (though mainly the MacLisp variants) which predated it, it is not an implementation but rather a language, Scheme Scheme is a multi-paradigm programming language. It is one of the two main dialects of Lisp and supports a number of programming paradigms but is best known for its support of functional programming. It was developed by Guy L. Steele and Gerald Jay Sussman in the 1970s. Scheme was introduced to the academic world via a series of papers now, Clojure Clojure is a dialect of the Lisp programming language. It is a general-purpose language supporting interactive development that encourages a functional programming style which enables simplified multithreaded programming. Clojure runs on the Java Virtual Machine. Clojure honors the code-as-data philosophy and has a sophisticated Lisp macro system, and more.
Contents |
Advantages
Bootstrapping a compiler has the following advantages:[1]
- it is a non-trivial test of the language being compiled.
- compiler developers only need to know the language being compiled.
- improvements to the compiler's back-end improve not only general purpose programs but also the compiler itself.
- it is a comprehensive consistency check as it should be able to reproduce its own object code.
The chicken and egg problem
If one needs a compiler for language X to obtain a compiler for language X (which is written in language X), how did the first compiler get written? Possible methods to solving this chicken and egg Chickens hatch from eggs, but eggs are laid by chickens, making it difficult to say which originally gave rise to the other. To ancient philosophers, the question about the first chicken or egg also evoked the questions of how life and the universe in general began problem include:
- Implementing an interpreter In computer science, an interpreter is a computer program which reads source code written in a high-level programming language, transforms the code to machine code, and executes the machine code. Using an interpreter, a single source file can produce equal results even in vastly different systems . Using a compiler, a single source file can or compiler A compiler is a computer program that transforms source code written in a computer language (the source language) into another computer language (the target language, often having a binary form known as object code). The most common reason for wanting to transform source code is to create an executable program for language X in language Y. Niklaus Wirth Niklaus Emil Wirth is a Swiss computer scientist, best known for designing several programming languages, including Pascal, and for pioneering several classic topics in software engineering. In 1984 he won the Turing Award for developing a sequence of innovative computer languages reported that he wrote the first Pascal Pascal is an influential imperative and procedural programming language, designed in 1968/9 and published in 1970 by Niklaus Wirth as a small and efficient language intended to encourage good programming practices using structured programming and data structuring compiler in Fortran Fortran is a general-purpose, procedural, imperative programming language that is especially suited to numeric computation and scientific computing. Originally developed by IBM in the 1950s for scientific and engineering applications, Fortran came to dominate this area of programming early on and has been in continual use for over half a century.
- Another interpreter or compiler for X has already been written in another language Y; this is how Scheme Scheme is a multi-paradigm programming language. It is one of the two main dialects of Lisp and supports a number of programming paradigms but is best known for its support of functional programming. It was developed by Guy L. Steele and Gerald Jay Sussman in the 1970s. Scheme was introduced to the academic world via a series of papers now is often bootstrapped.
- Earlier versions of the compiler were written in a subset of X for which there existed some other compiler; this is how some supersets of Java Java is a programming language originally developed by James Gosling at Sun Microsystems and released in 1995 as a core component of Sun Microsystems' Java platform. The language derives much of its syntax from C and C++ but has a simpler object model and fewer low-level facilities. Java applications are typically compiled to bytecode that can run, Haskell Haskell is a standardized, general-purpose purely functional programming language, with non-strict semantics and strong static typing. It is named after logician Haskell Curry, and the Free Pascal Free Pascal is a free, portable and open source compiler for Pascal and Object Pascal languages. It supports a number of dialects, including the two most popular Borland dialects—Turbo Pascal and Delphi—and some Mac Pascal constructs compiler are bootstrapped.
- The compiler for X is cross compiled A cross compiler is a compiler capable of creating executable code for a platform other than the one on which the compiler is run. Cross compiler tools are used to generate executables for embedded system or multiple platforms. It is used to compile for a platform upon which it is not feasible to do the compiling, like microcontrollers that don't from another architecture where there exists a compiler for X; this is how compilers for C C is a general-purpose computer programming language developed in 1972 by Dennis Ritchie at the Bell Telephone Laboratories for use with the Unix operating system are usually ported to other platforms.
- Writing the compiler in X; then hand-compiling it from source (most likely in a non-optimized way) and running that on the code to get an optimized compiler. Donald Knuth Donald Ervin Knuth (born January 10, 1938) is a renowned computer scientist and Professor Emeritus of the Art of Computer Programming at Stanford University used this for his WEB literate programming Literate programming is an approach to programming introduced by Donald Knuth as an alternative to the structured programming paradigm of the 1970s system.
Methods for distributing compilers in source code include providing a portable bytecode Bytecode is a term which has been used to denote various forms of instruction sets designed for efficient execution by a software interpreter as well as being suitable for further compilation into machine code. Since instructions are processed by software, they may be arbitrarily complex, but are nonetheless often akin to traditional hardware version of the compiler, so as to bootstrap the process of compiling the compiler with itself.
Such methods are also one way of detecting or avoiding (or both) the potential problem pointed out in Reflections on Trusting Trust.
History
Main article: History of compiler writing In computing, a compiler is a computer program that transforms source code written in a computer language (the source language) into another computer language (the target language, often having a binary form known as object code). The most common reason for wanting to transform source code is to create an executable programThe first language to provide such a bootstrap was NELIAC. The first commercial language to do so was PL/I PL/I is an imperative computer programming language designed for scientific, engineering, and business applications. It has been used by various academic, commercial and industrial users since it was introduced in the early 1960s, and is still actively used today.
The first self-hosting The term self-hosting was coined to refer to the use of a computer program as part of the toolchain or operating system that produces new versions of that same program—for example, a compiler that can compile its own source code. Self-hosting software is commonplace on personal computers and larger systems. Other programs that are typically self- compiler (excluding assemblers) was written for Lisp Lisp is a family of computer programming languages with a long history and a distinctive, fully parenthesized syntax. Originally specified in 1958, Lisp is the second-oldest high-level programming language in widespread use today; only Fortran is older. Like Fortran, Lisp has changed a great deal since its early days, and a number of dialects have by Hart and Levin at MIT in 1962. They wrote a Lisp compiler in Lisp, testing it inside an existing Lisp interpreter. Once they had improved the compiler to the point where it could compile its own source code, it was self-hosting.[2]
- The compiler as it exists on the standard compiler tape is a machine language program that was obtained by having the S-expression The term S-expression or sexp refers to a convention for representing semi-structured data in human-readable textual form. Symbolic expressions are mostly made of symbols and lists. S-expressions are probably best known for their use in the Lisp family of programming languages. Other uses of S-expressions are in Lisp-derived languages such as definition of the compiler work on itself through the interpreter. (AI Memo 39)[2]
This technique is only possible when an interpreter already exists for the very same language that is to be compiled. It borrows directly from the notion of running a program on itself as input, which is also used in various proofs in theoretical computer science Theoretical computer science is the collection of topics of computer science that focuses on the more abstract, logical and mathematical aspects of computing, such as the theory of computation, analysis of algorithms, and semantics of programming languages. Although not itself a single topic, its practitioners form a distinct subgroup within, such as the proof that the halting problem In computability theory, the halting problem is a decision problem which can be stated as follows: given a description of a program and a finite input, decide whether the program finishes running or will run forever, given that input is undecidable.
See also
- Self-hosting The term self-hosting was coined to refer to the use of a computer program as part of the toolchain or operating system that produces new versions of that same program—for example, a compiler that can compile its own source code. Self-hosting software is commonplace on personal computers and larger systems. Other programs that are typically self-
- Self-interpreter A self-interpreter, or metainterpreter, is a programming language interpreter written in the language it interprets. An example would be a BASIC interpreter written in BASIC. Conceptually, self-interpreters are closely related to self-hosting compilers
- T-diagram
References
- ^ Compilers and Compiler Generators: An Introduction With C++. Patrick D. Terry 1997. International Thomson Computer Press. ISBN 1850322988
- ^ a b Tim Hart and Mike Levin. "AI Memo 39-The new compiler". ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-039.pdf. Retrieved 2008-05-23.
Categories: Compilers | Compiler theory This category is for computer science articles related to compiler theory. At least for the moment, this means that it is the appropriate category for any article related to compilers in general, such as Relocation table, even if they are not particularly "theoretical" in nature. The category Compilers is the appropriate category for
480px x 640px | 63.50kB
[source page]
was designed using mucs pcb tools and nicely demonstrates that it s not only compilers which have bootstrapping problems A photo of a populated mainboard can be found here Last but not least the TREE board photos front close up and
unknown
Mon, 23 Jan 2006 12:19:26 GM
What if the two . compilers. were once (way back in their history) initially compiled with a common . compiler. ? This is quite common with . bootstrapping. - create an initial . compiler. using an off-the-shelf one. Posted by: davidg at January 23, ...

