Introduction to metacompilation in FORTH

published: 24 March 2021 / updated 25 March 2021

Definition of a metacompiler: a compiler that builds compilers for languages or systems other than its own.

The metacompiler

Of all the high-level languages, only FORTH is powerful and handy enough to have the concept of metacompilation. Every FORTH system, except perhaps the very earliest versions of the language, is itself designed and written in FORTH. The tool used to write a FORTH system is its metacompiler, and it has become customary to supply users with the sources of the metacompiler as well as the metacompiled sources of the language itself. This is an absolutely unique feature for a programming language.

Many software repositories host versions of FORTH written in C. Should these creations be considered metacompilation?

Unlike FORTH, C is heavy and verbose. Where FORTH can metacompile code in small memory spaces, C requires a development environment with substantial resources.

For users, having the metacompiler and the language sources means that their programming tool comes with an extremely powerful SYSGEN, which lets them redefine the very tool they use to suit their own needs. The prospects are breathtaking: the concept is a delightful illustration of the self-referential creative intelligence explored, for example, by Douglas Hofstadter in his book "Gödel, Escher, Bach: An Eternal Golden Braid".

The FORTH compiler

An executable FORTH program is obtained by compiling a source code, here called the application.

FORTH does not produce a standalone executable the way a C compiler does.

Instead, FORTH simply extends itself: the final application consists of the FORTH kernel plus the compiled application code.

The FORTH metacompiler

The FORTH metacompiler is an application whose particularity is that it processes source code written in FORTH and generates a new FORTH kernel.

Conceptually, generating a new FORTH kernel is fairly simple:

  1. we first compile the metacompiler, here META;
  2. the metacompiler processes the contents of the kernel source file;
  3. the kernel code compiled by META is placed in a buffer zone, here TARGET;
  4. at the end of compilation, the contents of TARGET can be saved locally or transferred to the target system.
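Steps 3 and 4 can be sketched on the host side in standard Forth (Forth-2012, for example gforth). This is a minimal sketch, not the code of META: HERE-T and ALLOT-T reuse the names that appear in the article's ,-T definition, while TARGET-SIZE, DP-T, >HOST, C,-T and SAVE-TARGET, the 8 KB size, and the choice of storing target addresses as offsets from the start of the buffer are all hypothetical.

```forth
\ Hypothetical host-side model of the TARGET buffer (steps 3 and 4).
8192 CONSTANT TARGET-SIZE            \ assumed size of the image
CREATE TARGET  TARGET-SIZE ALLOT     \ image of the future kernel
VARIABLE DP-T  0 DP-T !              \ target dictionary pointer (offset)

: HERE-T   ( -- taddr )  DP-T @ ;
: >HOST    ( taddr -- addr )  TARGET + ;   \ target offset -> host address
: ALLOT-T  ( n -- )                        \ reserve n bytes in TARGET
  DP-T @ +  DUP TARGET-SIZE U> ABORT" TARGET overflow"  DP-T ! ;
: C,-T     ( c -- )  HERE-T  1 ALLOT-T  >HOST C! ;   \ lay down one byte

\ Step 4: write the used part of TARGET to a binary file.
: SAVE-TARGET  ( c-addr u -- )
  W/O BIN CREATE-FILE THROW  >R
  TARGET HERE-T R@ WRITE-FILE THROW
  R> CLOSE-FILE THROW ;

\ Usage (two arbitrary bytes, then save):
\   $12 C,-T  $34 C,-T  S" kernel.bin" SAVE-TARGET
```

Because the host only ever touches TARGET through these words, the image can later be copied byte for byte to the target system.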

If the target system uses a different processor from the one running the metacompiler, you must first compile an assembler for that target processor. For example, to generate executable code for ARDUINO, you will first need to compile an assembler for the Atmega processor.

The kernel

STOP, you might say! Why do I need an assembler if I am compiling FORTH code?

In fact, FORTH cannot be written entirely in FORTH. A number of words, the so-called primitives, can only be written in assembler. Here, for example, is the definition of the word +:

CODE +  (S n1 n2 -- sum )
  BX POP        \ n2 into BX
  AX POP        \ n1 into AX
  BX AX ADD     \ AX := AX + BX
  1PUSH         \ push AX, then continue with NEXT
END-CODE

Extract from kernel.txt for the 8086 processor.

A FORTH kernel can be built with less than 20% of its code produced by the assembler. Once the minimal set of FORTH primitives has been assembled, the rest of the kernel can be written in FORTH:

: (UD.)  (S ud -- a l )
  <# #S #>
;
 
: UD.  (S ud -- )
  (UD.) TYPE SPACE
;
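The (S ... ) stack comments above are a FORTH 83 convention. In a Forth-2012 system such as gforth, where (S is not defined, the same two words can be tried interactively with ordinary ( ... ) comments; this sketch assumes BASE is decimal and simply redefines UD. on the host.

```forth
: (UD.)  ( ud -- c-addr u )   \ convert an unsigned double to a string
  <# #S #> ;
: UD.  ( ud -- )              \ print it, followed by a space
  (UD.) TYPE SPACE ;

12345. UD.                    \ prints "12345 "
```

The trailing dot in 12345. makes it a double-cell literal, which is what <# #S #> expects.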

For comparison, the Atmega assembly code for ud. looks like this:

UDDOT_L:
    .db     NFA|3,"ud."
    rcall   LESSNUM
    rcall   NUMS
    rcall   NUMGREATER
    call    TYPE
    jmp     SPACE_

The main advantage of going through a metacompiler is that it reduces the amount of assembled code needed to generate a new FORTH kernel.

If we change the target processor, we only need to integrate another assembler and rewrite the primitive words. Labels can be used to simplify writing assembler code:

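LABEL DOCREATE  (S -- )
  W INC         \ W holds the code field address:
  W INC         \ add 2 to reach the parameter field
  W PUSH        \ push the parameter field address
  NEXT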
and, further on, DOCREATE is used like this:

: VARIABLE  (S -- ) 
  CREATE 
    0 , 
;USES DOCREATE , 

WARNING: the word , used here is the metacompiler's own version of ,. These mechanisms will be covered when we discuss the detailed operation of the metacompiler.
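On the host side, the same behaviour needs no assembler label: in standard Forth, CREATE already gives the new word a run-time action that pushes its data-field address, which, judging from the W INC / W INC / W PUSH sequence, is the job DOCREATE performs in the target. The following host-side analogy is only an illustration; HOST-VARIABLE and COUNTER are hypothetical names, and the , here is the host's own, not the metacompiler's.

```forth
: HOST-VARIABLE  ( "name" -- )   \ hypothetical host-side VARIABLE
  CREATE  0 , ;                  \ host , : one cell, initialised to 0

HOST-VARIABLE COUNTER
42 COUNTER !
COUNTER @ .                      \ prints 42
```

Executing COUNTER returns the same address as ' COUNTER >BODY, i.e. the cell reserved by 0 ,.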

The definition of DOCREATE above ends with NEXT. Here is the definition of NEXT in the kernel source code of FORTH 83 for MS-DOS:

H: NEXT  ( --- ) 
  META ASSEMBLER >NEXT #) JMP 
; 

This definition extends the metacompiler itself instead of generating FORTH code in TARGET.

The word ASSEMBLER appears in this definition; it selects a vocabulary. We will come back to vocabularies in detail in a dedicated article.

Vocabularies

All the difficulty in understanding a metacompiler lies in the ambiguity of words. This ambiguity cannot be escaped, since the created system uses the language of the creating system. We speak here of the host system and the target system. The ambiguity is increased further by the fact that a good metacompiler is "homologous": it compiles "like" an ordinary compiler, i.e. the target is written in natural FORTH as if the code were a simple extension of the host. Yet the target, a complete and autonomous system, obviously cannot be a subset of the host: it necessarily resides outside of it.

To resolve this ambiguity, FORTH relies on its concept of VOCABULARIES. The same word can have a different meaning depending on the vocabulary it belongs to. It is the CONTEXT, that is, the list of vocabularies active at a given moment, that lets the host system determine which of the homonyms is meant by the word being interpreted. Managing vocabularies is a key step in metacompilation, one that conditions the whole process.
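The idea can be sketched with the Forth-2012 search-order words (WORDLIST, SET-CURRENT, GET-ORDER, SET-ORDER), which play the role of vocabularies in modern systems such as gforth. This is an illustration under assumptions, not the metacompiler's code: DP-T, META-WL, PUSH-ORDER and POP-ORDER are hypothetical names. Two homonyms called HERE coexist; which one runs depends only on the search order in force when the word is interpreted.

```forth
VARIABLE DP-T  0 DP-T !            \ hypothetical target dictionary pointer
WORDLIST CONSTANT META-WL          \ stands in for a META vocabulary

META-WL SET-CURRENT                \ next definitions go into META-WL
: HERE  ( -- taddr )  DP-T @ ;     \ the "target" meaning of HERE
FORTH-WORDLIST SET-CURRENT         \ back to the usual compilation list

: PUSH-ORDER ( wid -- )  >R GET-ORDER R> SWAP 1+ SET-ORDER ;  \ search wid first
: POP-ORDER  ( -- )      GET-ORDER NIP 1- SET-ORDER ;         \ drop first wid

100 DP-T !
HERE .                  \ host HERE: an address in the host dictionary
META-WL PUSH-ORDER
HERE .                  \ META-WL is searched first: prints 100
POP-ORDER
```

In FORTH 83 the same switch is made by vocabulary words such as ASSEMBLER, which change CONTEXT; the sketch only mimics that with word lists.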

Here is how the word , can be defined in the metacompiler:

\ compile a 16-bit value into the target
: ,-T  (S n -- )               
  HERE-T !-T 2 ALLOT-T 
; 
H: , 
   ,-T 
H; 
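To see ,-T at work outside the metacompiler, here is a runnable Forth-2012 sketch of the words it relies on, assuming a little-endian 16-bit target such as the 8086 and a 1 KB TARGET buffer. HERE-T, !-T, ALLOT-T and ,-T take their names from the definition above, but their bodies here are guesses; TARGET, DP-T, C!-T, C@-T and @-T are hypothetical helpers. H: and H; are FORTH 83 metacompiler words, so the sketch uses an ordinary colon definition instead.

```forth
CREATE TARGET  1024 ALLOT            \ assumed image buffer
VARIABLE DP-T  0 DP-T !              \ target pointer, as an offset

: HERE-T   ( -- taddr )  DP-T @ ;
: ALLOT-T  ( n -- )  DP-T +! ;
: C!-T     ( c taddr -- )  TARGET + C! ;
: C@-T     ( taddr -- c )  TARGET + C@ ;

: !-T  ( n taddr -- )                \ store 16 bits, low byte first
  OVER $FF AND OVER C!-T
  SWAP 8 RSHIFT $FF AND SWAP 1+ C!-T ;
: @-T  ( taddr -- u )                \ read a 16-bit value back
  DUP C@-T SWAP 1+ C@-T 8 LSHIFT OR ;

: ,-T  ( n -- )  HERE-T !-T 2 ALLOT-T ;   \ same body as above

$1234 ,-T  $ABCD ,-T
0 @-T HEX . DECIMAL                  \ prints 1234
```

In the metacompiler, H: presumably places the homonym , in the host-side vocabulary, so that a , met inside a target definition such as VARIABLE ends up calling ,-T rather than the host's ,.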

This article will evolve as work progresses on the metacompiler that generates code for ARDUINO boards, and for other boards if contributors actively collaborate on this project.