Fennel - The Book

Table of Contents

1 Introduction

This book is intended to be a complete reimplementation of Fennel, along with associated tooling that reuses the compiler infrastructure. It was developed by adapting the original Fennel compiler. I could not have written any of this without basing it on the work others have put into creating and maintaining the Fennel language.

1.1 How to read this book

This book is also a program! It is written using the literate programming style, which means that the source code you read in this book is also the source code of the program the book is about.

When a code block is not part of the compiler's source code, it will have a comment at the beginning saying so. It will look like this:

;; Example - not a part of the source code!

The compiler itself is a single Fennel file.

Note also that the source code blocks in this book are arranged for the purpose of reading, and do not necessarily appear in the same order in the source code itself. For instance, much of the preamble of the compiler is left until the end, since it's largely miscellaneous boilerplate that doesn't contribute to an understanding of the code.

1.2 What is Fennel?

Fennel is a language that compiles to Lua. It's part of the Lisp language family, which includes languages like Common Lisp, Clojure, Racket, and Scheme.

The Fennel compiler is a Lua program - the original compiler was written by hand in Lua, and the compiler in this book compiles to Lua. When you run a Fennel program, it is first compiled into Lua code and then run by the Lua interpreter. Running Fennel from the command line once you have it installed looks like this:

# Example - not a part of the source code!
fennel --eval "(+ 1 1)"
2

You can also compile code to Lua ahead of time and then run it with a Lua interpreter, e.g.:

# Example - not a part of the source code!
echo '(print (.. "hello " "world!"))' |
  fennel --compile -   |         # compile fennel from stdin
  tee compiled.lua     |         # copy the output to compiled.lua
  xargs -0 printf 'compiled: %s' # print the output
printf "output:   $(lua compiled.lua)"
compiled: return print(("hello " .. "world!"))
output:   hello world!

2 Iterators and Streams

A construct we'll use frequently throughout the book is the stream. Streams are functions which output data sequentially. If you've ever used an iterator like pairs or ipairs in Lua, you've already used one - the function that is returned by those iterators is a stream.

Just as there are two kinds of iterators in Lua, there are likewise two kinds of streams - stateful and stateless.

2.1 Stateful Streams

Stateful streams capture the state of the stream within a closure (or a coroutine). They ignore any arguments, and return the next element of the stream each time they are called. Here's a function which creates a stateful stream that returns each byte in a string successively:

(fn string-bytes [str]
  (var index 1)
  #(let [r (str:byte index)]
     (set index (+ index 1))
     r))

(each [byte (string-bytes "abc")]
  (print (.. byte " " (string.char byte))))

Stateful streams are less flexible - because the state is stored within the closure, there's no way to set the state from outside the stream. One of the most important implications of stateful streams is that you can only traverse the stream once.

Some streams are inherently stateful. For instance, the data returned by (io.read) is only returned once, and calling it again will result in new data being returned. Here's how we could turn that into a stateful stream:

(local chunk-size (^ 2 13)) ;; 8kb
(fn file-chunks [filename]
  (local f (if filename (io.open filename :r)
               io.stdin))
  #(let [chunk (f:read chunk-size)]
     (when (not chunk) (f:close))
     chunk))
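
For example, a file could be consumed one chunk at a time like this (the filename here is purely illustrative):

;; Example - not a part of the source code!
(var total-bytes 0)
(each [chunk (file-chunks "example.fnl")]
  (set total-bytes (+ total-bytes (length chunk))))
(print total-bytes)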

2.2 Stateless Streams

Stateless streams have their state passed in using two arguments: the invariant state and the variant state. Each time the stream is called, it must be passed the two state values as arguments. The stream should then return two values: the new variant state and the streamed value.

To illustrate this more concretely, let's take a look at the iterator ipairs, which returns a stateless stream. In ipairs' case, the invariant state is the table containing the elements, and the variant state is the current index of the stream.

;; Example - not a part of the source code!
(local fennelview #((require :fennelview) $ {:one-line true}))

(let [letters [:a :b :c]
      ;; the invariant state is the array, and the variant state is
      ;; the last index (starting at 0)
      (stream arr initial-i) (ipairs letters)]

  ;; Manually iterate the stream
  (var i initial-i)
  (while i
    (let [(new-i value) (stream arr i)]
      (set i new-i)
      (print (fennelview {: arr : i : value}))))

  ;; Manually iterate the stream once with a different table and index
  (let [arr [:d :e :f]
        initial-i 1
        (i value) (stream arr initial-i)]
    (print (fennelview {: arr :i i : value}))))
{:arr ["a" "b" "c"] :i 1 :value "a"}
{:arr ["a" "b" "c"] :i 2 :value "b"}
{:arr ["a" "b" "c"] :i 3 :value "c"}
{:arr ["a" "b" "c"]}
{:arr ["d" "e" "f"] :i 2 :value "e"}

2.3 Intro to Streams: stateful-string-stream

As both an example and a helper function for later use, let's define a function that creates a stateful stream from a string:

(fn stateful-string-stream [str]
  (var index 1)
  #(let [r (str:byte index)]
     (set index (+ index 1))
     r))

We return an anonymous function which closes over str and index, maintaining the state in the function itself. Each time it is called, it will return the next byte in the string.

One handy feature of this design is that these streams are also iterators! For instance, using stateful-string-stream we can print the bytes of a string with the following code:

;; EXAMPLE - not a part of the source code!
(local {:streams {: stateful-string-stream}} (require :fennel-the-book))
(each [byte (stateful-string-stream "abc")]
  (io.stdout:write (.. byte " ")))
97 98 99 nil

2.4 Buffering stateful streams with create-cursor

The tokenizer we will implement in the next chapter will use the stateful-string-stream we just defined to stream the bytes of the code it's digesting. However, the stream we've defined can be awkward to use - without keeping track of things manually, we can't check bytes ahead of the stream's current position without advancing the stream. This gets quite messy to deal with, since we can't simply pass the stream to a function when that function might also need access to bytes we've already looked at.

To remedy this, we'll define a simple abstraction over a stateful stream called a cursor. The cursor is a table with two main functions that provide access to the values in the stream - take and peek.

cursor.take is itself a stateful stream - if you create a cursor that wraps a stateful stream and iterate over cursor.take, you will get exactly the same values as if you iterated over the original stream.

cursor.peek, on the other hand, lets you look at the upcoming values of cursor.take. It does so by calling the original stream and then storing the value it returns in a buffer. When cursor.take is called, it returns any values in the buffer before returning values from the original stream.

(fn create-cursor [stream]
  ;; We track the current position and the end of the buffer. The
  ;; indices of the buffer items will always be between these two
  ;; numbers. Since we don't move the buffer elements back to the
  ;; beginning at any point, these indices will increase
  ;; monotonically.
  (var position 0)
  (var buffer-end 0)
  (let [;; This is the buffer to store values that were retrieved
        ;; ahead of the cursor position
        buffer []
        buffer-length #(- buffer-end position)
        buffer-get #(. buffer (+ position $))
        buffer-set #(tset buffer (+ position $1) $2)
        buffer-push
        #(let [new-buffer-end (+ buffer-end 1)]
           (tset buffer new-buffer-end $)
           (set buffer-end new-buffer-end))]

    (fn buffer-load-and-get [i]
      (if (= i (+ 1 (buffer-length)))
          (let [new-item (stream)]
            (buffer-push new-item)
            new-item)
          (> i (buffer-length))
          (let [new-item (stream)]
            (buffer-push new-item)
            (buffer-load-and-get i))
          (let [old-item (buffer-get i)]
            old-item)))

    ;; Tail recursive peek lets us peek ahead multiple values without
    ;; allocating a table each time
    (fn peek [a b]
      (let [(i n) (match (values a b)
                    (an-i an-n) (values an-i an-n)
                    (an-n nil) (values 1 an-n)
                    (nil nil) (values 1 1))]
        (when (< n 1) (error "cannot peek at less than one value"))
        (if
         ;; Base case - return the remaining item
         (or (not n) (= n 1))
         (buffer-load-and-get i)
         ;; Otherwise, return the item at i and recursively iterate
         ;; until we've returned all the requested values
         (values (buffer-load-and-get i)
                 (peek (+ i 1) (- n 1))))))

    (fn take [n]
      (let [n (if (= n nil) 1 n)]
        (when (> n 0)
          (values
           (if (> (buffer-length) 0)
               (let [item (buffer-get 1)]
                 (buffer-set 1 nil)
                 (set position (+ position 1))
                 item)
               (do (set position (+ position 1))
                   (set buffer-end (+ buffer-end 1))
                   (stream)))
           (take (- n 1))))))

    {: peek : take}))

To demonstrate how this can be useful, let's try out our cursor with some sample code:

;; Example - not a part of the source code!
(global unpack (or unpack table.unpack))
(let [{: print-table} (require :org-table-helpers)
      {:streams {: stateful-string-stream
                 : create-cursor}} (require :fennel-the-book)
      {: insert : concat} table
      stream (stateful-string-stream "abcdef")
      cursor (create-cursor stream)
      rows []]

  ;; Advance the stream of bytes by iterating over cursor.take
  (each [byte cursor.take]
    ;; Check the next byte after the cursor, then the next two bytes
    (let [peek-1-byte (cursor.peek)
          peek-2-bytes [(cursor.peek 2)]]
      (insert rows [[byte] [peek-1-byte] peek-2-bytes])))

  ;; Add an additional column of decoded characters for each column of bytes
  (each [i row (ipairs rows)]
    (local new-row [])
    (each [j bytes (ipairs row)]
      (each [_ byte (ipairs bytes)] (insert new-row byte))
      (when (and (= j 3) (< (length bytes) 2)) (insert new-row ""))
      (when (> (length bytes) 0)
        (insert new-row (string.char (unpack bytes)))))
    (tset rows i new-row))

  (print-table
   rows {:column-headers ["Current" "" "Peek 1" "" "Peek 2"]
         :column-groups [:/ :> :< :> :< "" :>]
         }))
| Current |   | Peek 1 |   | Peek 2 |     |    |
|---------+---+--------+---+--------+-----+----|
| 97      | a | 98     | b | 98     | 99  | bc |
| 98      | b | 99     | c | 99     | 100 | cd |
| 99      | c | 100    | d | 100    | 101 | de |
| 100     | d | 101    | e | 101    | 102 | ef |
| 101     | e | 102    | f | 102    |     | f  |
| 102     | f |        |   |        |     |    |

As you can see, the (cursor.peek) expression does not affect the subsequent (cursor.peek 2) expression - the values only advance when cursor.take is called in the iterator.

For convenience in testing later functions, we'll also define create-string-cursor which creates a cursor that buffers the bytes of a string:

(fn create-string-cursor [s] (-> s stateful-string-stream create-cursor))

3 Tokenizing: Bytes and Pieces

The first step towards compiling code is tokenizing. Tokenizing is the process of taking the source format of the language - in our case, a UTF-8 string - and turning it into tokens. Tokens are the individual instances of the basic elements of a language's grammar. Tokens are not nested - for instance, we don't have a list token type, but rather opener and closer token types to indicate when a list begins and ends.

Each token is tagged with a token type. There is a finite, small set of token types, as follows:

  • String literals - e.g., "example"
  • Number literals - e.g., 3.456e-7 or 0xabc123
  • Symbols - e.g., example
  • Keyword strings - e.g., :example
  • Openers - (, [, or {
  • Closers - ), ], or }
  • Prefix characters - ', `, ,, and #
  • Whitespace and comments

Whitespace tokens are mostly ignored by the parser, and comment tokens are completely ignored, but we tokenize them anyway so that the tokenizer can be re-used by other tooling, like a formatter for Fennel code.

Since the number of token types is fixed and small, it's convenient to use integers instead of strings to represent the token types. To do so, we use a table that stores a mapping of string names to their corresponding number values and predicate functions. The predicate functions let us check the type readably without first converting the number to a string.

First, we'll create a macro that will let us use bare symbols without them resolving to variables:

(global unpack (or unpack table.unpack))
(macro enum [...]
  (let [cases [...]
        stringed-cases []]

    (each [i case (ipairs cases)]
      (tset stringed-cases i (tostring case)))

    `(let [this-enum# [,(unpack stringed-cases)]]
       (each [k# v# (ipairs this-enum#)]
         ;; this-enum.CASE will return the int
         (tset this-enum# v# k#)
         ;; this-enum.case? will check equality with the int
         (tset this-enum# (.. v# :?) #(= $ k#)))
       this-enum#)))
(local token-types
       (enum string number symbol keyword-string
             opener closer prefix
             whitespace comment))
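
A quick sketch (not part of the compiler source) of what the resulting table gives us - integer values, reverse lookups by integer, and predicate functions:

;; Example - not a part of the source code!
(print token-types.symbol)            ;; -> 3
(print (. token-types 3))             ;; -> symbol
(print ((. token-types :symbol?) 3))  ;; -> true
(print ((. token-types :symbol?)
        token-types.string))          ;; -> false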

3.1 Readers

Our tokenizer will take a stateful stream of bytes and create a cursor over it.

Our readers are not single functions, but rather tables containing a few functions which collect tokens in different ways. At the core of each reader is a readn function, which takes a peek function - we'll be using cursor.peek as defined above - that lets the reader examine bytes relative to the cursor's current position without consuming them.

  • readn - takes a peek function, and uses it to check how many bytes it should read. Returns 0 if the reader cannot read a token at the current cursor position, or n where n is the number of bytes to read.
  • read-bytes - takes a cursor, which it advances over the next token, returning the bytes of that token.
  • read-string - takes a cursor which it advances over the next token, returning the token as a string.

We can use a function which takes a readn function as an argument to generate a reader with all the functions just described:

(fn create-reader [readn]
  (fn read-bytes [cursor]
    (let [n (readn cursor.peek)] (cursor.take n)))
  (fn read-string [cursor]
    (-> cursor read-bytes string.char))
  {: readn : read-bytes : read-string})

This allows us to construct readers as follows:

;; Example - not a part of the source code!

(create-reader
 (fn [peek]
   ;; - peek is a cursor's peek function

   ;; This is a normal fennel function body. Its return value should be
   ;; 0 if the reader cannot currently take a token, or a number of
   ;; bytes to take for the next token.
   ))

3.1.1 Whitespace reader

The whitespace reader takes or skips all the whitespace bytes at the beginning of its cursor argument's stream. Whitespace is defined as any of the following bytes:

  • 9 (^I, tab)
  • 10 (^J, line feed)
  • 11 (^K, vertical tab)
  • 12 (^L, form feed)
  • 13 (^M, carriage return)
  • 32 (space)
(fn whitespace? [b]
  (and b (or (= b 32)
             (and (>= b 9) (<= b 13)))))

(fn read-whitespace [peek n]
  (let [n (or n 1)]
    (if (whitespace? (peek n 1))
        (read-whitespace peek (+ n 1))
        (- n 1))))

(local whitespace-reader (create-reader (fn [peek] (read-whitespace peek))))
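
Here's a small sketch (not part of the source) of the whitespace reader in action, using create-string-cursor from the previous chapter:

;; Example - not a part of the source code!
(let [cursor (create-string-cursor "  \t(+ 1 2)")]
  ;; readn only peeks, reporting three whitespace bytes without
  ;; advancing the cursor
  (print (whitespace-reader.readn cursor.peek))            ;; -> 3
  ;; read-string takes those bytes off the cursor
  (print (length (whitespace-reader.read-string cursor)))  ;; -> 3
  ;; the cursor is now positioned at the open paren
  (print (cursor.peek)))                                   ;; -> 40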

3.1.2 Comment reader

The comment reader is also quite simple. Since Fennel has only line-based comments, we simply check that the initial byte is a semicolon (value 59) and then get all the bytes until the next newline (value 10).

(fn read-comment [peek n]
  (let [n (or n 1)
        peeked (peek n 1)]
    (if (and (= n 1) (not= peeked 59)) 0
        (or (not peeked)
            (and (not= n 1) (= peeked 10))) (- n 1)
        (read-comment peek (+ n 1)))))

(local comment-reader (create-reader (fn [peek] (read-comment peek))))

3.1.3 Symbol reader

The symbol reader is relatively simple. A symbol character is defined as any character except the following:

  • Special characters with charcodes 32 and under (includes whitespace)
  • Delimiters
  • Single and double quotes
  • Commas
  • Semicolons
  • DEL control character

Additionally, symbols cannot begin with a digit (0-9), since that would be ambiguous with numeric literals. (A leading period or hyphen can still begin a number like .5 or -1; those cases are resolved by running the number reader before the symbol reader when the readers are composed later.)

To track delimiters, we will use a delims table. Opening delimiters have the corresponding closer as their value. Closing delimiters simply have true.

(local delims {40 41    ;; (
               41 true  ;; )
               91 93    ;; [
               93 true  ;; ]
               123 125  ;; {
               125 true ;; }
               })

(fn delim? [b] (not (not (. delims b))))

Now we can define a function that detects symbol characters based on the above definition:

(fn symbol-char? [b]
  (and b
       (> b 32)
       (not (. delims b))
       (not= b 34)  ;; "
       (not= b 39)  ;; '
       (not= b 44)  ;; ,
       (not= b 59)  ;; ;
       (not= b 127) ;; DEL
       ))

And another function to check for digits, which cannot begin a symbol:

(fn digit-char? [b] (and (> b 47) (< b 58)))
(fn disallowed-symbol-starter? [b]
  (or (not (symbol-char? b))
      (digit-char? b)
      ))

Now that we have that function, we can create a symbol reader easily:

(fn read-symbol [peek n]
  (let [n (or n 1)
        char (peek n 1)]
    (if (and (= n 1) (disallowed-symbol-starter? char)) 0
        ;; a colon followed by symbol chars is a keyword string
        (and (= n 1) (= char 58) (symbol-char? (peek (+ n 1) 1))) 0
        (symbol-char? char) (read-symbol peek (+ n 1))
        (- n 1))))

(local symbol-reader (create-reader (fn [peek] (read-symbol peek))))

3.1.4 Keyword string reader

Keyword strings are strings created by prefixing a symbol with the : character. Because of this, we can re-use the symbol-char? predicate we've just defined to collect the rest of the token after the initial : character.

(fn read-keyword-string [peek n]
  (let [n (or n 1)
        char (peek n 1)]
    (if (and (= n 1) (not= char 58)) 0
        (and (= n 2) (not (symbol-char? char))) 0
        (and (> n 2) (not (symbol-char? char))) (- n 1)
        (read-keyword-string peek (+ n 1)))))

(local keyword-string-reader (create-reader (fn [peek] (read-keyword-string peek))))
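
The two readers dovetail: each returns 0 exactly where the other succeeds. A small sketch (not part of the source), using create-string-cursor:

;; Example - not a part of the source code!
(let [peek-sym (. (create-string-cursor "print") :peek)
      peek-kw (. (create-string-cursor ":print") :peek)]
  (print (symbol-reader.readn peek-sym))          ;; -> 5
  (print (keyword-string-reader.readn peek-sym))  ;; -> 0
  (print (symbol-reader.readn peek-kw))           ;; -> 0
  (print (keyword-string-reader.readn peek-kw)))  ;; -> 6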

3.1.5 String reader

Strings in Fennel are delimited with double quotes, which can be escaped within the string using backslashes. Due to this escaping, the string reader is the first to require an explicit state machine within the reader itself. The possible states of this machine are as follows:

  • start: takes the opening quote (returning 0 to decline the token if the first byte is not a quote), then transitions to base.
  • base: take string bytes normally, looking for the next double-quote character (byte 34), and adds them to the string. Transitions to backslash if it sees a backslash character (byte 92).
  • backslash: takes and adds the next byte to the string, regardless of what byte it is, then transition back to base.
  • done: close the collection loop and, if collecting, return the collected bytes.
(local string-reader-states (enum start base backslash done))
(fn read-string [peek n state]
  (let [n (or n 1)
        s string-reader-states
        state (or state s.start)]
    (if (= n 0) 0 ; n has been explicitly set to 0
        (= state s.done) (- n 1)
        (let [char (peek n 1)
              (new-state override-n)
              (match (values state char)
                (_ nil) (error "unterminated string")
                (s.start 34) s.base
                (s.start _) (values s.start 0)
                ((s.start ?b) ? (not ?b)) (values s.start 0)
                (s.base 92) s.backslash
                (s.base 34) s.done
                (s.base _) s.base
                (s.backslash _) s.base)
              new-n (or override-n (+ n 1))]
          (read-string peek new-n new-state)))))

(local string-reader (create-reader (fn [peek] (read-string peek))))
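
A short sketch (not part of the source) showing that an escaped quote does not terminate the string token:

;; Example - not a part of the source code!
(let [cursor (create-string-cursor "\"a\\\"b\" tail")]
  ;; the whole string token is consumed, escaped quote included
  (print (string-reader.read-string cursor))  ;; -> "a\"b"
  ;; the cursor stops at the space before "tail"
  (print (cursor.peek)))                      ;; -> 32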

3.1.6 Number Reader

The number reader is the most complicated reader, and includes a rather involved state machine to keep track of its progress. At each step, the machine chooses a new state based on the current state and the current byte. The possible states of this machine are as follows (all transitions other than those explicitly listed will result in an error):

  • start: the reader begins in this state, and chooses which state to transition to based on the first character. Transitions to negate, dec-point, leading-0, or digit.
  • negate: the reader has found a leading hyphen. Transitions to dec-point, leading-0, or digit.
  • dec-point: the reader has found a decimal point. Transitions to exp or dec-digit.
  • hex-dec-point: the reader has found a decimal point in a hex number. Transitions to hex-dec-digit.
  • leading-0: the reader has found a leading zero. Transitions to dec-point, digit, exp, or base-hex.
  • base-hex: the reader has found a hex indicator character. Transitions to hex-dec-point or hex-digit. May not end the number and will cause an error if it is the last character.
  • digit: the reader has found a digit before the decimal point. Transitions to dec-point, digit, or exp.
  • dec-digit: the reader has found a digit after the decimal point. Acts identically to digit except that another decimal point will produce an error.
  • hex-digit: the reader has found a digit in a hex number. Transitions to hex-dec-point or hex-digit.
  • hex-dec-digit: the reader has found a digit after the decimal point in a hex number. Acts identically to hex-digit except that another decimal point will produce an error.
  • exp: the e or E character has been found in a non-hex number, indicating that the number should be multiplied by 10 raised to the given power. Transitions to exp-negate or exp-digit. May not end the number and will cause an error if it is the last character.
  • exp-negate: a hyphen has been found immediately following an exponent indicator. Transitions to exp-digit. May not end the number and will cause an error if it is the last character.
  • exp-digit: a digit in the tens-exponent portion of the number has been found. Transitions to exp-digit.
(fn hex-letter-digit-char? [b] (or (and (> b 64) (< b 71))
                                   (and (> b 96) (< b 103))))
(fn hex-digit-char? [b] (or (digit-char? b) (hex-letter-digit-char? b)))
(fn exponent-char? [b] (or (= b 69) (= b 101)))
(fn hex-indicator-char? [b] (or (= b 88) (= b 120)))
(fn number-char? [b]
  (or (digit-char? b)
      (= b 46) ; . (decimal point)
      (= b 95) ; _
      ))

(fn err-unexpected-char [b message]
  (error (.. "malformed number: unexpected char \"" (string.char b) "\" " message)))

(local number-reader-states
  (enum start negate dec-point hex-dec-point
        leading-0 base-hex digit dec-digit
        hex-digit hex-dec-digit
        exp exp-negate exp-digit end))

(fn err-unhandled-state-transition [state b]
  (error (.. "unhandled state transition in number parser!\tstate: " (. number-reader-states state)
           "\tbyte: " (or b "<nil>") "\tchar: " (if b (string.char b) "<nil>"))))

(fn err-invalid-number-character [state b]
  (error (.. "invalid char in number: " (string.char b) "\tchar value: " b)))

;; takes a state and byte (which can potentially be nil) and returns a
;; new state. returning the end state will end the collection loop,
;; ignoring the final byte that the state machine was called with
(fn number-reader-state-machine [state byte]
  (let [s number-reader-states]
    (match (values state byte)

      ;; --- start ---
      (s.start 45) s.negate
      (s.start 46) s.dec-point
      (s.start 48) s.leading-0
      ((s.start b) ? (digit-char? b)) s.digit

      ((s.start b) ? (exponent-char? b))
      (error "malformed number: unexpected leading exponent char")

      ((s.start b) ? (hex-indicator-char? b))
      (error "malformed number: unexpected leading hex indicator char")

      ;; --- negate ---
      (s.negate 46) s.dec-point
      (s.negate 48) s.leading-0
      ((s.negate b) ? (digit-char? b)) s.digit
      (s.negate b) (err-unexpected-char b "following negation char")

      ;; --- dec-point ---
      ((s.dec-point b) ? (exponent-char? b)) s.exp
      ((s.dec-point b) ? (digit-char? b)) s.dec-digit
      (s.dec-point b) (err-unexpected-char b "following decimal point")

      ;; --- hex-dec-point
      ((s.hex-dec-point b) ? (hex-digit-char? b)) s.hex-dec-digit
      (s.hex-dec-point b) (err-unexpected-char b "following decimal point")

      ;; --- leading-0 ---
      (s.leading-0 45) (error "unexpected hyphen following leading zero")
      (s.leading-0 46) s.dec-point
      ((s.leading-0 b) ? (digit-char? b)) s.digit
      ((s.leading-0 b) ? (exponent-char? b)) s.exp
      ((s.leading-0 b) ? (hex-indicator-char? b)) s.base-hex

      ;; --- base-hex ---
      (s.base-hex 46) s.hex-dec-point
      ((s.base-hex b) ? (hex-digit-char? b)) s.hex-digit
      (s.base-hex b) (err-unexpected-char b "following hex indicator char")

      ((s.base-hex ?b) ? (not ?b))
      (error "unexpected end of number following hex indicator char")

      ;; --- digit ---
      (s.digit 45) (error "unexpected hyphen following digit")
      (s.digit 46) s.dec-point
      ((s.digit b) ? (digit-char? b)) s.digit
      ((s.digit b) ? (exponent-char? b)) s.exp

      ((s.digit b) ? (hex-letter-digit-char? b))
      (error "unexpected hex digit in non-hex number")

      ((s.digit b) ? (hex-indicator-char? b))
      (error "unexpected hex indicator char following digit")

      ;; --- dec-digit ---
      (s.dec-digit 46) (error "unexpected second decimal point")
      ((s.dec-digit b) ? (digit-char? b)) s.dec-digit

      ;; reuse s.digit state for all other cases
      (s.dec-digit ?b) (number-reader-state-machine s.digit ?b)

      ;; --- hex-digit ---
      (s.hex-digit 45) (error "unexpected hyphen following digit")
      (s.hex-digit 46) s.hex-dec-point
      ((s.hex-digit b) ? (hex-digit-char? b)) s.hex-digit

      ((s.hex-digit b) ? (hex-indicator-char? b))
      (error "unexpected hex indicator char following digit")

      ;; --- hex-dec-digit ---
      (s.hex-dec-digit 46) (error "unexpected second decimal point")
      ((s.hex-dec-digit b) ? (digit-char? b)) s.hex-dec-digit

      ;; reuse s.hex-digit state for all other cases
      (s.hex-dec-digit ?b) (number-reader-state-machine s.hex-digit ?b)

      ;; --- exp ---
      (s.exp 45) s.exp-negate
      ((s.exp b) ? (digit-char? b)) s.exp-digit
      (s.exp b) (err-unexpected-char b "following exponent char")

      ((s.exp ?b) ? (not ?b))
      (error "unexpected end of number following exponent char")

      ;; --- exp-negate ---
      ((s.exp-negate b) ? (digit-char? b)) s.exp-digit
      (s.exp-negate b) (err-unexpected-char b "following exponent hyphen char")

      ((s.exp-negate ?b) ? (not ?b))
      (error "unexpected end of number following exponent hyphen char")


      ;; --- exp-digit ---
      ((s.exp-digit b) ? (digit-char? b)) s.exp-digit

      ;; whitespace, delimiters, and end-of-input after an exponent
      ;; digit fall through to the generic end-of-number case below
      ((s.exp-digit b) ? (not (or (whitespace? b) (delim? b))))
      (err-unexpected-char b "following exponent digit char")

      ;; ignore underscores
      (ss 95) ss

      ((_ ?b) ? (or (not ?b) (whitespace? ?b) (delim? ?b))) s.end

      ;; catch all other states
      _ (err-invalid-number-character state byte))))

(fn check-for-number [peek]
  (let [b (peek)]
    (or (digit-char? b) ;; leading digits always indicate a number
        (let [b2 (peek 2 1)]
          (or (and (or (= b 45) (= b 46)) (digit-char? b2)) ;; e.g. -1 or .1
              (let [b3 (peek 3 1)]
                (and (= b 45) (= b2 46) (digit-char? b3)))))))) ;; e.g. -.1

(fn read-number [peek n state]
  (let [n (or n 1)]
    (if (and (= n 1) (not (check-for-number peek))) 0
        (let [s number-reader-states
              state (or state s.start)
              char (peek n 1)
              new-state (number-reader-state-machine state char)]
          (if (= new-state s.end) (- n 1)
              (read-number peek (+ n 1) new-state))))))

(local number-reader (create-reader (fn [peek] (read-number peek))))
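
A few sample inputs (not part of the source) show how readn behaves - it reports the length of the number token without consuming anything, and returns 0 when the input doesn't start a number:

;; Example - not a part of the source code!
(fn number-readn [str]
  (number-reader.readn (. (create-string-cursor str) :peek)))

(print (number-readn "123)"))      ;; -> 3
(print (number-readn "0xff "))     ;; -> 4
(print (number-readn "-1.5e-3 "))  ;; -> 7
(print (number-readn "abc"))       ;; -> 0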

3.1.7 Other Readers

(local prefixes {96 :quote 44 :unquote 39 :quote 35 :hashfn})
(local prefix-reader
  (create-reader (fn [peek]
                   (if (and (. prefixes (peek))
                            (let [next-b (peek 2 1)]
                              (not (or (whitespace? next-b)
                                       (= (type (. delims next-b)) :boolean))))) 1
                       0))))

(local opener-reader
  (create-reader (fn [peek]
                   (if (= (type (. delims (peek))) :number) 1
                       0))))

(local closer-reader
  (create-reader (fn [peek]
                   (if (= (. delims (peek)) true) 1
                       0))))

3.2 Building the Tokenizer

Our tokenizer will take a stream of bytes and, using the readers already defined, output a stream of tokens.

To compose different readers together, we will tag each of them with a token type. Then, when reading, we will run the readn of each reader sequentially until we find one that returns a value greater than 0. We will then return the tag that reader was given along with the return value of readn. For convenience, we'll also provide similarly tagged versions of read-bytes and read-string.

First, we need a function that will split a series of arguments into two tables, one for the odd arguments and one for the even arguments. This will allow us to iterate through both tables and match the arguments up into pairs. We can implement this recursively:

(local split-values-alternating
  (do
    ;; We use a do block to keep split-values-alternating-recursively
    ;; from being visible outside of split-values-alternating.
    (fn split-values-alternating-recursively [odds evens i odd even ...]
      (tset odds i odd)
      (tset evens i even)
      (if (> (select :# ...) 0)
          ;; If there are arguments left, recurse.
          (split-values-alternating-recursively odds evens (+ i 1) ...)

          ;; If there are no arguments left, return the two tables of
          ;; odd and even arguments.
          (values odds evens)))

    ;; This function allows us to set default arguments for
    ;; split-values-alternating-recursively to use for the first
    ;; iteration.
    (fn [...] (split-values-alternating-recursively [] [] 1 ...))))
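
For instance, splitting an argument list of alternating tags and values looks like this (a sketch, not part of the source):

;; Example - not a part of the source code!
(let [(odds evens) (split-values-alternating :a 1 :b 2 :c 3)]
  (print (table.concat odds " "))    ;; -> a b c
  (print (table.concat evens " ")))  ;; -> 1 2 3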

Now that we have that function, we can compose any number of tagged readers into a single reader that tries each one in turn:

(fn compose-tagged-readers [...]
  (let [(tags readers) (split-values-alternating ...)
        readn-tagged-inner
        (fn readn-tagged-inner [peek i]
          (let [tag (. tags i) reader (. readers i)]
            (if (= reader nil) (values nil 0)
                (let [n (reader.readn peek)]
                  (if (> n 0) (values tag n)
                      (readn-tagged-inner peek (+ i 1)))))))
        readn-tagged (fn [peek] (readn-tagged-inner peek 1))
        read-bytes-tagged
        (fn [cursor]
          (let [(tag n) (readn-tagged cursor.peek)]
            (values tag (cursor.take n))))
        read-string-tagged-inner
        (fn [tag ...]
          (values tag (string.char ...)))
        read-string-tagged
        (fn [cursor]
          (read-string-tagged-inner (read-bytes-tagged cursor)))
        readn (fn [peek] (select 2 (readn-tagged peek)))
        {: read-bytes : read-string} (create-reader readn)]
    {: readn
     : read-bytes
     : read-string
     : readn-tagged
     : read-bytes-tagged
     : read-string-tagged}))
(local fennel-tagged-reader
  (let [tts token-types]
    (compose-tagged-readers tts.string string-reader
                            tts.number number-reader
                            tts.opener opener-reader
                            tts.closer closer-reader
                            tts.whitespace whitespace-reader
                            tts.comment comment-reader
                            tts.prefix prefix-reader
                            tts.keyword-string keyword-string-reader
                            tts.symbol symbol-reader)))

(fn take-token [cursor]
  (when (cursor.peek)
    (let [n (fennel-tagged-reader.readn cursor.peek)]
      (if (> n 0) (fennel-tagged-reader.read-bytes-tagged cursor)
          (let [(b1 b2 b3) (cursor.peek 3)]
            (error (.. "unrecognized byte sequence [" b1 " " b2 " " b3 "] "
                       "\"" (string.char b1 b2 b3) "\"")))))))

(fn chunk-stream->byte-stream [chunk-stream]
  (var chunk "")
  (var index 1)
  (var done false)
  (fn [...]
    (if done nil

        (<= index (length chunk))
        (let [byte (chunk:byte index)]
          (set index (+ index 1))
          byte)

        (do (set chunk (chunk-stream ...))
            (if (or (not chunk) (= chunk ""))
                (set done true)

                (do (set index 2)
                    (chunk:byte 1)))))))

(fn byte-stream->token-stream [bytes-stream]
  (let [cursor (create-cursor bytes-stream)]
    #(take-token cursor)))
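
Putting the pieces together, a string can now be turned into a stream of tagged tokens. A quick sketch (not part of the source):

;; Example - not a part of the source code!
(let [token-stream (-> "(+ 1 2.5) ; add"
                       stateful-string-stream
                       byte-stream->token-stream)]
  (each [token-type token-stream]
    (print (. token-types token-type))))
;; prints, one per line:
;; opener symbol whitespace number whitespace number closer whitespace comment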

4 Parsing

(fn map-stream [f stream] (fn [...] (f (stream ...))))
(local box-tokens
  (partial map-stream
           (fn [token-type first ...]
             (values token-type (when first [first ...])))))

(global _ENV _ENV)
(global _G _G)
(global setfenv setfenv)
(global loadstring loadstring)
(fn load-code [code environment filename]
  (var environment environment)
  (set environment (or environment _ENV _G))
  (let [filename (or filename :anonymous)]
    (if (and setfenv loadstring)
        (let [f (assert (loadstring code filename))]
          (setfenv f environment)
          f)
        (assert (load code filename :t environment)))))

(fn canonicalize [str]
  (let [formatted (str:gsub "[\1-\31]" #(.. "\\" ($1:byte)))
        load-fn (load-code (: "return %s" :format formatted) nil)]
    (load-fn)))

;; a "form" has the following shape:
;; [form-type ... values]
(local form-types (enum symbol string number sequence table list))
(fn escape-string-for-output [str]
  (str:gsub "[\1-\31]" #(.. "\\" ($:byte))))

(fn first-values [first] first)
(fn rest-values [first ...] ...)
(fn map-values [fun item ...]
  (when (not= item nil)
    (values (fun item) (map-values fun ...))))

(fn concat-strings-with-spaces [first second ...]
  (if (and (not first) (not second)) (values)
      (and first (not second)) first
      (concat-strings-with-spaces (.. first " " second ) ...)))

(local form->string
  (do
    (var form->string nil)
    (fn complex-form->string [form opener closer]
      (.. opener
          (or (concat-strings-with-spaces
               (map-values form->string (unpack form))) "")
          closer))
    (set form->string
         (fn form->string [form]
           (match form.type
             form-types.symbol (. form 1)
             form-types.number (tostring (. form 1))

             form-types.string
             (.. "\"" (escape-string-for-output (. form 1)) "\"")

             form-types.list (complex-form->string form "(" ")")
             form-types.table (complex-form->string form "{" "}")
             form-types.sequence
             (complex-form->string form "[" "]"))))
    form->string))

(local form-methods
  {:push (fn [form child-form]
           (set form.length (+ form.length 1))
           (tset form form.length child-form)
           child-form)})

(local FORM-MT {:__index form-methods
                :__tostring form->string
                :__fennelview form->string})

(fn create-form [form-type ...]
  (let [form [...]]
    (tset form :type form-type)
    (tset form :length (select :# ...))
    (setmetatable form FORM-MT)
    form))

(fn string-form [str]
  (create-form form-types.string (canonicalize str)))
(fn string-form-from-bytes [...] (string-form (string.char ...)))
(fn string-form-from-keyword-string-bytes [colon ...]
  (create-form form-types.string (string.char ...)))
(fn number-form-from-bytes [...]
  ;; keep only gsub's first return value - its second return value (the
  ;; match count) would otherwise be passed to tonumber as a base argument
  (let [(cleaned) (string.gsub (string.char ...) "_" "")]
    (create-form form-types.number (tonumber cleaned))))
(fn symbol-form [str] (create-form form-types.symbol str))
(fn symbol-form-from-bytes [...] (symbol-form (string.char ...)))
(fn sequence-form [...] (create-form form-types.sequence ...))
(fn table-form [...] (create-form form-types.table ...))
(fn list-form [...] (create-form form-types.list ...))

(local stack-methods
  {:push (fn [stack empty-form]
           (set stack.length (+ stack.length 1))
           (tset stack stack.length empty-form)
           empty-form)
   :pop (fn [stack]
          (when (= stack.length 0)
            (error "cannot pop stack with length 0"))
          (let [form (. stack stack.length)]
            (tset stack stack.length nil)
            (set stack.length (- stack.length 1))
            form))
   :peek (fn [stack]
           (when (not= stack.length 0)
             (. stack stack.length)))})

(local STACK-MT {:__index stack-methods})

(fn create-stack []
  (local stack [])
  (tset stack :length 0)
  (setmetatable stack STACK-MT)
  stack)

(fn open-form-with-stack [stack bytes]
  (stack:push
   (match bytes
     [40] (list-form)
     [91] (sequence-form)
     [123] (table-form))))

(fn open-prefix-form-with-stack [stack bytes]
  (stack:push
   (match bytes
     [35] (list-form (symbol-form :hashfn))
     ;; the ' prefix produces a quote form just like the ` prefix
     [39] (list-form (symbol-form :quote))
     [44] (list-form (symbol-form :unquote))
     [96] (list-form (symbol-form :quote)))))

(fn close-form-with-stack [stack bytes]
  (when (= stack.length 0)
    (error (.. "unexpected closing delimiter "
               (string.char (unpack bytes)))))
  (let [form (stack:pop)
        expected-closer (match form.type
                          form-types.list ")"
                          form-types.sequence "]"
                          form-types.table "}")
        closer (string.char (unpack bytes))]
    (when (not= expected-closer closer)
        (error (.. "unexpected closing delimiter " closer
                   ", expected " expected-closer)))
    form))

(local parser-states (enum expecting-form
                           expecting-whitespace
                           expecting-prefixed-form))

(fn token-stream->form-stream [token-stream]
  (let [boxed-token-stream (box-tokens token-stream)
        tts token-types
        fts form-types
        stack (create-stack)
        state-stack (create-stack)
        prefixes-at []
        open-form (partial open-form-with-stack stack)
        open-prefix-form (partial open-prefix-form-with-stack stack)
        close-form (partial close-form-with-stack stack)]
    (var needs-whitespace nil)
    (fn take-form []
      (var should-return nil)
      (var return-value nil)
      (fn dispatch [form]
        (if (= stack.length 0)
            (do (set should-return true)
                (set return-value form))
            (let [parent-form (stack:peek)
                  pushed-form (parent-form:push form)]
              (if (. prefixes-at stack.length)
                  (do (tset prefixes-at stack.length nil)
                      (let [further-form (stack:pop)]
                        (dispatch further-form)))
                  pushed-form))))
      (let [(token-type bytes) (boxed-token-stream)]
        (when (and needs-whitespace
                 (not= token-type tts.whitespace)
                 (not= token-type tts.comment)
                 (not= token-type tts.closer))
            (error (.. "expected whitespace, got "
                       (. token-types token-type))))
        (match token-type
          tts.symbol
          (dispatch (symbol-form-from-bytes (unpack bytes)))
          tts.string
          (dispatch (string-form-from-bytes (unpack bytes)))
          tts.keyword-string
          (dispatch (string-form-from-keyword-string-bytes
                     (unpack bytes)))
          tts.number
          (dispatch (number-form-from-bytes (unpack bytes)))
          tts.whitespace nil
          tts.comment nil
          tts.opener (open-form bytes)
          tts.prefix (do (tset prefixes-at (+ stack.length 1) true)
                         (open-prefix-form bytes))
          tts.closer (let [form (close-form bytes)] (dispatch form))
          nil (do (set should-return true) (set return-value nil)))
        (set needs-whitespace (and (not= token-type tts.opener)
                                   (not= token-type tts.prefix)
                                   (not= token-type tts.whitespace)
                                   (not= token-type tts.comment)))
        (if should-return return-value (take-form))))
    take-form))

(fn string->form-stream [str]
  (-> str
      stateful-string-stream
      byte-stream->token-stream
      token-stream->form-stream))
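
A quick sketch (not part of the source) that runs a string all the way through the pipeline and prints the resulting forms back out:

;; Example - not a part of the source code!
(let [form-stream (string->form-stream "(print (+ 1 2)) [:a :b]")]
  (each [form form-stream]
    (print (form->string form))))
;; -> (print (+ 1 2))
;; -> ["a" "b"]

Note that keyword strings come back out as ordinary string literals, since both are parsed into the same string form type.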

5 CLI

(local parse-args
  (do
    (fn starts-with [str start] (= start (str:sub 1 (length start))))
    (fn end-of-flags? [a] (= a :--))
    (fn flag? [f] (and f (not (end-of-flags? f))
                       (> (length f) 1) (starts-with f :-)))
    (fn long-flag? [a] (and (flag? a) (starts-with a :--)))
    (fn long-flag-name [a] (string.sub a 3))
    (fn short-flag? [a] (and (flag? a) (not (long-flag? a))))
    (fn short-flag-flags [a]
      (let [ret []]
        (var ret-i 1)
        (each [c (-> a (string.sub 2) (string.gmatch "."))]
          (tset ret ret-i c)
          (set ret-i (+ ret-i 1)))
        ret))
    (fn flag-and-value [f]
      (when (flag? f) (string.match f "^-%-?([^=]+)=(.*)$")))
    (fn parse-args-1 [opts opts-i flags? a b ...]
      (if (not a) opts
          ;; long flags
          (and flags? a (long-flag? a))
          (let [(f v) (flag-and-value a)]
            (if f ;; long flag with interior value
                (do (tset opts f (if (= v nil) "" v))
                    (parse-args-1 opts opts-i flags? b ...))
                ;; long flag with value
                (and b (not (flag? b)))
                (do (tset opts (long-flag-name a) b)
                    (parse-args-1 opts opts-i flags? ...))
                ;; long flag without value
                (do (tset opts (long-flag-name a) "")
                    (parse-args-1 opts opts-i flags? b ...))))
          ;; short flags
          (and flags? a (short-flag? a))
          (let [(f v) (flag-and-value a)]
            (if f ;; short flag with interior value
                (do (each [ff (string.gmatch f ".")]
                      (tset opts ff ""))
                    (tset opts f (if (= v nil) "" v))
                    (parse-args-1 opts opts-i flags? b ...))
                (let [flags (short-flag-flags a)]
                  (each [_ f (ipairs flags)] (tset opts f ""))
                  (if (and b (not (flag? b)))
                      (do (tset opts (. flags (length flags)) b)
                          (parse-args-1 opts opts-i flags? ...))
                      (parse-args-1 opts opts-i flags? b ...)))))
          ;; no more flags
          (and flags? a (end-of-flags? a))
          (parse-args-1 opts opts-i false b ...)
          ;; positional arg
          (do (tset opts opts-i a)
              (parse-args-1 opts (+ opts-i 1) flags? b ...))))
    (fn parse-args [...]
      (parse-args-1 {} 1 true ...))))
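
The match clauses in the CLI below accept both a value attached to the flag and a positional path, so it helps to see which table shapes parse-args actually produces. A sketch (not part of the source):

;; Example - not a part of the source code!
;; a value attached directly to a flag
(let [opts (parse-args "--tokenize" "in.fnl" "--chunk-size=1024")]
  (print opts.tokenize (. opts :chunk-size)))  ;; -> in.fnl   1024
;; a flag followed by another flag, with the path as a positional argument
(let [opts (parse-args "--tokenize" "--chunk-size=1024" "in.fnl")]
  (print (= opts.tokenize "") (. opts 1)))     ;; -> true   in.fnl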

(fn print-token [t]
  (-> [(. token-types (. t 1)) (string.char (select 2 (unpack t)))]
      ((require :fennelview))
      print))

(fn file-stream [path chunk-size]
  (let [cs (or chunk-size 8000)
        f (if (= path :-) io.stdin (assert (io.open path)))]
    (values #(f:read cs) #(f:close))))

(fn tokenize-file [path chunk-size]
  (let [(fs close-file) (file-stream path chunk-size)
        token-stream (-> fs
                         chunk-stream->byte-stream
                         byte-stream->token-stream)]
    (each [t #(let [t [(token-stream)]] (when (. t 1) t))]
      (print-token t))
    (close-file)))

(fn formize-file [path chunk-size]
  (let [(fs close-file) (file-stream path chunk-size)
        form-stream (-> fs
                        chunk-stream->byte-stream
                        byte-stream->token-stream
                        token-stream->form-stream)]
    (each [f form-stream]
      (print (form->string f)))
    (close-file)))

(fn help [] (print "
Fennel - The Book

    --help, -h                  display this message
    --tokenize PATH, -t PATH    tokenize a file
    --formize PATH, -f PATH     parse and re-print a file
"))

(when (not= 0 (select :# ...))
  (let [options (parse-args ...)]
    (match options
      ;; bind the flag under a different name so the pattern doesn't
      ;; shadow the help function itself
      {:help help-flag} (help)
      {: h} (help)

      {:tokenize path :chunk-size ?chunk-size}
      (tokenize-file path (tonumber ?chunk-size))
      {:t path :chunk-size ?chunk-size}
      (tokenize-file path (tonumber ?chunk-size))
      {:tokenize "" 1 path :chunk-size ?chunk-size}
      (tokenize-file path (tonumber ?chunk-size))
      {:t "" 1 path :chunk-size ?chunk-size}
      (tokenize-file path (tonumber ?chunk-size))

      {:formize path :chunk-size ?chunk-size}
      (formize-file path (tonumber ?chunk-size))
      {:f path :chunk-size ?chunk-size}
      (formize-file path (tonumber ?chunk-size))
      {:formize "" 1 path :chunk-size ?chunk-size}
      (formize-file path (tonumber ?chunk-size))
      {:f "" 1 path :chunk-size ?chunk-size}
      (formize-file path (tonumber ?chunk-size)))))

6 Misc.

6.1 Hashbang

To allow the file to be run as an executable on Linux, we add a hashbang to the first line. As noted above, the tokenizer treats this line as a comment if it is the very first thing in the file.

#!/usr/bin/env fennel

6.2 Utils


7 Book tooling

This section contains Fennel tooling used to create this book.

7.1 JS for HTML output

(js.global.console:log :hello-world)
(js.global:alert "hello")
(js.global.console:log js.global)
nil

7.2 Org table helper

;; Exported to org-table-helpers.fnl

(local fennelview (require :fennelview))

(fn fast-push [t v]
  (set t.__count (+ (or t.__count 0) 1))
  (tset t t.__count v))

(fn fast-length [t] (or t.__count (length t)))

(fn make-table [rows options]
  (let [{: column-headers : column-groups} (or options {})
        column-headers-row
        (and column-headers (= :table (type column-headers))
             column-headers)
        column-widths []
        processed-rows []
        hlines-after {}
        chunks []]

    (var table-cell-width 0)

    (when column-headers-row (table.insert rows 1 column-headers-row))
    (when column-groups (table.insert rows 2 column-groups))

    ;; collect table widths and convert cells to strings
    (each [row-i row (ipairs rows)]
      (local processed-cells [])
      (each [cell-i cell (ipairs row)]
        (let [val (if (= :string (type cell)) cell
                      (fennelview cell {:one-line true}))
              val-width (length val)]

          ;; update column width if it's smaller than the current cell
          (when (> val-width (or (. column-widths cell-i) 0))
            (tset column-widths cell-i val-width))

          ;; update table cell width
          (when (> cell-i table-cell-width) (set table-cell-width cell-i))
          (fast-push processed-cells val)))
      (fast-push processed-rows processed-cells))

    (local table-cell-height (fast-length processed-rows))

    ;; print the cells to the chunks table
    (each [row-i row (ipairs processed-rows)]
      (fast-push chunks "|") ;; left border
      (for [cell-i 1 table-cell-width]
        (let [cell (or (. row cell-i) "")
              cell-width (length cell)]
          (fast-push chunks " ")
          (fast-push chunks cell)
          (local right-cell-padding
                 (math.max 0 (- (. column-widths cell-i) cell-width)))
          (local right-padding (+ 1 right-cell-padding))
          (fast-push chunks (string.rep " " right-padding))
          (fast-push chunks "|") ;; right border
          ))
      (when (not= row-i table-cell-height)
        (fast-push chunks "\n"))
      (when (and column-headers (= row-i 1))
        (fast-push chunks "|")
        (each [column-i width (ipairs column-widths)]
          (fast-push chunks (string.rep "-" (+ width 2)))
          (fast-push chunks (if (= column-i table-cell-width) "|" "+")))
        (fast-push chunks "\n")))

    (table.concat chunks)))

{: make-table :print-table (fn [...] (print (make-table ...)))}

7.3 fennel-the-book.el

;; Bootstrap quelpa
(if (require 'quelpa nil t)
    (quelpa-self-upgrade)
  (with-temp-buffer
    (url-insert-file-contents
     "https://framagit.org/steckerhalter/quelpa/raw/master/bootstrap.el")
    (eval-buffer)))

(setq quelpa-stable-p t)

;; Install quelpa-use-package, which will install use-package as well
(quelpa
 '(quelpa-use-package
   :fetcher git
   :url "https://framagit.org/steckerhalter/quelpa-use-package.git"
   :stable nil))
(require 'quelpa-use-package)
(package-initialize)

(use-package htmlize :quelpa)
(use-package lua-mode :quelpa)
(use-package fennel-mode :quelpa (:stable nil) :after lua-mode)
(use-package org
  :quelpa
  :after (fennel-mode htmlize)
  :config
  (defun ftb/tangle ()
    (org-babel-tangle-file "fennel-the-book.org"))
  (defun ftb/export-html ()
    (setq org-confirm-babel-evaluate nil
          org-html-htmlize-output-type 'css
          org-export-allow-bind-keywords t)
    ;; one of these should work, depending on the org version
    (with-demoted-errors (org-babel-do-load-languages 'org-babel-load-languages '((sh . t))))
    (with-demoted-errors (org-babel-do-load-languages 'org-babel-load-languages '((shell . t))))
    (with-current-buffer (find-file "fennel-the-book.org")
      (font-lock-flush)
      (font-lock-fontify-buffer)
      (org-html-export-to-html nil))))

7.4 Makefile

The makefile is committed to the repo so that make tangle can be run.

.PHONY: tangle
tangle: fennel-the-book.org
        emacs -Q --batch -L "$(shell pwd)" -l 'fennel-the-book' --eval '(ftb/tangle)'

.PHONY: site
site: site/index.html site/fennel-the-book.css site/theme.css

site/index.html: fennel-the-book.html
        mkdir -p site && cp fennel-the-book.html site/index.html

site/fennel-the-book.css: fennel-the-book.css
        mkdir -p site && cp fennel-the-book.css site/fennel-the-book.css

site/theme.css: theme.css
        mkdir -p site && cp theme.css site/theme.css

fennel-the-book.html: fennel-the-book.org
        emacs -Q --batch -L "$(shell pwd)" -l 'fennel-the-book' --eval '(ftb/export-html)'

fennel-the-book.css: fennel-the-book.org
        make tangle

theme.css: fennel-the-book.org
        make tangle

fennel-the-book.fnl: fennel-the-book.org
        make tangle

test.fnl: fennel-the-book.org
        make tangle

.PHONY: test
test: fennel-the-book.fnl test.fnl
        fennel test.fnl -v

7.5 Gitignore

For the .gitignore, we start by ignoring everything, then whitelist the specific files we wish to commit:

*
!fennel-the-book.org
!makefile
!fennel-the-book.el
!netlify.toml
!lib
!lib/*
!.gitignore

8 Output

(global unpack (or unpack table.unpack))
;; (macro → [...] `(-> ,...))

<<enum>>

<<stateful-string-stream>>

<<create-cursor>>

<<create-string-cursor>>

<<token-types>>

<<readers>>

<<tokenizer>>

<<parser>>

<<cli>>

{:streams {: stateful-string-stream : create-cursor : create-string-cursor : map-stream}
 :readers {: whitespace-reader : comment-reader : symbol-reader : keyword-string-reader : number-reader : string-reader}
 : chunk-stream->byte-stream
 : compose-tagged-readers
 : token-types
 : byte-stream->token-stream
 : form-types
 : token-stream->form-stream
 : string->form-stream
 : form->string
 }

9 Test output

<<tests-header>>

<<tests-stateful-string-stream>>
<<tests-create-string-cursor>>
<<tests-compose-tagged-readers>>
<<tests-comment-reader>>
<<tests-symbol-reader>>
<<tests-keyword-string-reader>>
<<tests-string-reader>>
<<tests-number-reader>>

<<tests-footer>>

Author: Benaiah Mischenko

Created: 2021-06-17 Thu 19:46
