Aarne Ranta, Implementing Programming Languages. An Introduction to Compilers and Interpreters, with an appendix coauthored by Markus Forsberg, College Publications, London, 2012. ISBN 978-1-84890-064-6
Publisher's web page
adlibris.se SEK 163
cremona.se SEK 135
College Publications, Volume 16 in Computing series
Aarne Ranta, http://www.cse.chalmers.se/~aarne/
Markus Forsberg, http://spraakbanken.gu.se/personal/markus/
University of Gothenburg, Sweden
This book is meant as a text book for Computer Science students, years 2 and later.
It is also usable as a self-study book, and to some extent, as a manual to the BNFC tool (BNF Converter).
The book is based on material created for the course Programming Language Technology at Chalmers University of Technology and University of Gothenburg, Sweden. The course has been given by the author yearly since 2007. The book also covers some material for the continuation course Compiler Construction, which the author taught in 2002 to 2006.
Implementing a programming language means bridging the gap from the programmer's high-level thinking to the machine's zeros and ones. If this is done in an efficient and reliable way, programmers can concentrate on the actual problems they have to solve, rather than on the details of machines. But understanding the whole chain from languages to machines is still an essential part of the training of any serious programmer. It will result in a more competent programmer, who will moreover be able to develop new languages. A new language is often the best way to solve a problem, and less difficult than it may sound.
This book follows a theory-based practical approach, where theoretical models serve as blueprint for actual coding. The reader is guided to build compilers and interpreters in a well-understood and scalable way. The solutions are moreover portable to different implementation languages. Much of the actual code is automatically generated from a grammar of the language, by using the BNF Converter tool. The rest can be written in Haskell or Java, for which the book gives detailed guidance, but with some adaptation also in C, C++, C#, or OCaml, which are supported by the BNF Converter.
The main focus of the book is on standard imperative and functional languages: a subset of C++ and a subset of Haskell are the source languages, and Java Virtual Machine is the main target. Simple Intel x86 native code compilation is shown to completes the chain from language to machine. The last chapter leaves the standard paths and explores the space of language design ranging from minimal Turing-complete languages to human-computer interaction in natural language.
Table of Contents
Here are preliminary versions of some assignments; full versions coming soon. In particular, the test suites and the code templates will be extended.
1. Parser for C++
2. Type Checker for CPP
3. Interpreter for CPP
4. Code Generator for CPP
5. Interpreter for Fun
6. A Special-Purpose Language
Chapter 1, Compilation Phases
Chapter 2, Grammars
Chapter 3, Lexing and parsing
Chapter 4, Type checking
Chapter 5, Interpreters
Chapter 6, Code generation
Chapter 7, Functional programming languages
Chapter 8, The language design space
Electronic self-study exercises
A set of exercises from a related course, and solutions to some of them.
CPP grammar (Chapter 2)
Query language (Chapter 8); also as a web demo
Precompiled Java tools (Cup and JLex)
Paul Carter, A Tutorial on PC assembly.
Paul Graham, essays
Mark Jones, Typing Haskell in Haskell
Xavier Leroy, Compiling Functional Languages
Simon Peyton Jones and David Lester, Implementing Functional Languages: a Tutorial,
Eric Raymond, The Art of Unix Programming,
R. Stallman, Using and Porting the GNU Compiler Collection
S. Thompson, Type Theory and Functional Programming
p. 10 (and also later): it is stated that Python is an untyped language. By this we mean that Python has no compile-time type checking. But it does have a run-time notion of types, known as dynamic typing.
p. 27: too many classes in the Java example have the name
EAdd. Should be
p. 73: in
if (typ2 = typ) should be
if (typ2 == typ)
p. 92: in the last rule for
ifeq L, the code pointer should become P+1 when v != 0
p. 170: the
lin rules for
TAny generate a bogus condition. The proper rules are:
TAll kind = parenth ("\\p -> and [p x | x <-" ++ kind ++ "]") ; TAny kind = parenth ("\\p -> or [p x | x <-" ++ kind ++ "]") ;
Please send your errata to Aarne Ranta!