Writing Lisp

2019-12-02

This post is an introduction to programming in Clojure, a Lisp dialect, including how to code and how to think about coding.


Recap

Earlier, we explored what Lisp is, some of its core elements, its origins, and its impacts on the computer science community. You also got a little sneak peek of what Lisp code looks like and how to write some. Hopefully you took advantage of the interactive code samples and played around with them to see what would happen. Now we will delve further into writing Lisp code and how that may differ from languages you've experienced in the past (not only in the syntax of course, but also in the paradigm and common idioms).

So let's begin!

Note: This tutorial focuses on Clojure, which has some particular nuances separate from other Lisps, but many of the principles remain the same.

Hello World

Any language tutorial worth its salt begins with a Hello World example, a tradition dating back to The C Programming Language. So we're going to begin with just that.

(println "Hello, world!")

You saw something very similar in the last post, but here we'll dissect it more thoroughly, piece by piece.

We begin with (, the opening parenthesis. This denotes the start of a list literal. The end is denoted by a matching ) All of the terms between the parenthesis – including any nested lists, which we don't have here but will later – are elements in the list we've denoted.

Inspecting the list, we see that it begins with the name (or symbol in Clojure) println. These names are similar to variable names in other languages, but we shy away from making anything actually variable in Lisp; in other words, most things are immutable and don't change their value. The next, and last, element in the list is just the string "Hello, world!". That's our entire list.

It's pretty boring on its own. The magic happens when we evaluate it.

Lisp works by evaluating forms, which are essentially just values. Some refer to forms as "valid code". Forms include strings, integers, lists, and so on. When evaluating lists, a recursive strategy is used – evaluate each of its elements, then evaluate it as a whole.

The first element to evaluate is the symbol println. Symbols are evaluated by resolving them, or finding the value to which they are associated (just as variables are associated with values). When the Clojure runtime performs a lookup on println, it discovers a function in the Clojure standard library. Excellent!

The second element is very easy to evaluate. Many things simply evaluate to themselves, completely unchanged. Strings are one of them, so we simply get back "Hello, world!" post-evaluation.

Great! We have a function and we have a string. Now what?

We evaluate the list as a whole of course! This is done by taking the first element (well, the result of evaluating it anyway) and applying it as a function to the remaining elements (again, their evaluated forms). We apply the function we retrieved earlier from println to "Hello, world!". A whole bunch of magic (that I could write pages upon pages upon pages about) happens inside the println function that displays the input text and then returns nil.

Whod've thought that something as elementary as Hello World could be so complicated? There are a lot of moving parts to it, so reread the section if you have to of course. The evaluation strategy described above proves indispensible to the majority of Lisp's functionality.

Definitions

Running computations is great, but what good are they if you can't store them? Well we'll get into that now!

To associate a name (a symbol) to some value, you would write code such as the following

(def foo 1)
(def bar (+ 1 2 3))
(println foo bar)

def simply defines a new symbol-to-value association. The value can then be referenced later by using the bound name.

If you want to define a function, you use defn ("define function"). For example

(defn times-2
  [n]
  (* n 2))
(times-2 5)

defn takes a symbol for naming the new function, a vector of symbols for the function's parameters, and finally the function body. If the body is composed of multiple forms, the result of evaluating the last one is the function's return value. Vectors are similar to lists, but have some core differences. For one, when they are evaluated, they evaluate to themselves. They are not used to represent function invocations. Additionally, at a lower level, vectors are more like arrays and provide performant random element access while lists are linked lists and do not provide this performance. Here, the times-2 definition uses a vector containing just the symbol a which, during invocations of times-2, will be bound to whatever argument value was passed in (such as 5 above).

(defn constant
  []
  (println "hi!")
  "bye!")
(constant)

Here's another example. In this case, the function takes no parameters (empty vector) and has two forms in the body. After invoking println, the function always yields the string "bye!".

The third type of definition is more temporary, using let expressions. With these, we bind names to values, but only temporarily. Outside the scope of the expression, that binding is gone.

(let [string "hello there!"]
  (println (count string)))
;;string

Above, we temporarily bind string to a string and print out its length. But once we're finished with the let, the binding is gone and an error occurrs when trying to access it. (In the context of the above interactive snippet, there is a warning and string evaluates to nil because of how the Clojure code is compiled and run in your browser, but the point is that the binding is truly gone). Also, just like with functions, the last form in a let is used for the final evaluated result.

One final thing. Notice that all of the code above used lists! Lists are fundamental to Lisp code – everything uses them!

Control Flow

Now that we've covered some basics, let's get into control flow, which occurs with simple if-else branching as well as looping.

If Expressions

The first type of control flow we'll discuss is the if expression, which takes a test form, then form, and optional else form. The test form is evaluated. If it is determined to be truthy (more on this later), then the then form is evaluated. Otherwise, if an else form was provided, that gets evaluated. If there was no else form, the if expression evaluates to nil. Below are some examples.

;; Test case is true, so then form is evaluated
(if true
  (println "First is true"))
;; Test case is true, so then form is evaluated
(if true
  (println "Second is true")
  (println "Second is false"))
;; Test case is false, so else form is evaluated
(if false
  (println "Third is true")
  (println "Third is false"))
;; Test case is false and no else form is provided, so the result is nil
(if false
  (println "Fourth is true!"))

Notice that I've been talking about evaluation and saying expression rather than statement. That's because these if expressions actually evaluate to values -- whatever value was yielded by the selected branch. The term statement relates to imperative programming, which involves lots of state changing and mutability. We're trying our best to stick to functional programming, which means our code avoids state changes and side-effects, instead being used almost solely for the value it evaluates to. This goes along with sticking to pure functions, as I mentioned in the last post. if expressions can be used for side-effects (as done in the above examples with println) but they are expressions first and foremost. You can compare them to the ternary operator in some C-like languages (i.e. test ? then : else).

A form is truthy if it is not false and it is not nil. Everything else is truthy, no matter what.

Looping

The other main form of control flow is looping. Whether you're writing a loop in Lisp or some other language, it's the same basic concept, but you may be surprised by how you write the loops. Whereas you may be used to code like

List<Integer> ints = List.of(0, 1, 2, 3, 4);
List<Integer> reversed = new ArrayList<>();
int i = ints.size() - 1;

while (i >= 0) {
    reversed.add(ints.get(i));
    i--;
}

This is a very imperative approach and uses that dreaded mutability that we're working very hard to avoid! Just look at the hideous add and the reassignment too! How can we avoid such unsightly code?

The answer is recursion!

(let [ints [0 1 2 3 4]]
  (loop [i (dec (count ints))
         reversed []]
    (if (< i 0)
      reversed
      (recur (dec i) (conj reversed (get ints i))))))

This is a bit of a complicated example but we'll go through it step by step of course. We start with a let that simply sets up ints, a vector of the numbers zero through four.

Inside of this is a loop, which sets up a recursive loop. loop first takes a vector containing symbols and initial values to bind them to. We bind i to the last index of ints and reversed to an empty vector.

Next is the body of the loop. We check if i is less than zero. If so, we're finished and can return reversed. Otherwise, we need to continue looping until we're complete. We use recur for this. recur returns to the nearest recursion point (such as loop or a function definition), rebinding the names to new values. The first new value is pretty simple - just one subtracted from i. The second value, which will later be rebound to reversed, is a new vector created by appending the value at index i in ints to the current value of reversed. get retrieves the value, conj creates a new vector by appending.

Once these values are computed, we recurse and the loop reruns the same code with the new bindings. When i is zero, recur is invoked one last time with -1 and the complete reversed vector. In the next iteration of the loop, the recursive base case is satisfied and the reversed vector is returned.

Notice that nowhere did we actually mutate anything – no variables were reassigned and no vectors were changed. We derived new values and bound them to the symbol names in each recursive invocation.

We can also write the above code as a function, and change the style to be even more functional.

(defn reversed
  [xs rev]
  (if xs
    (recur (next xs) (cons (first xs) rev))
    rev))
(reversed [0 1 2 3 4] [])

We create a function reversed that takes xs (a list, vector, etc. to reverse) and rev (the reversed version). If xs is truthy (i.e. not nil and not false), then we return the value of a recursive call. In the recursive invocation, xs is bound to a new sequence containing every value except the first, or nil if none are left (via next). For rev, we use cons to create a new sequence with ``xs``'s first value prepended to rev. Notice that the final result is a list, not a vector, even though we passed in a vector. cons always yields lists (many Clojure functions do, in fact, even when passed in a vector).

This code also follows a more functional approach by avoiding indices and just working on the heads (the fronts) of our sequences, adding and dropping elements with each recursive call. This makes it easier to reason about our code, especially because this approach focuses on what we want rather than more nitty gritty details like indices.

Recursion is often a more elegant way to express a number of functions or solutions to problems, so try to use it whenever possible.

Higher-Order Functions

The final topic of discussion is higher-order functions and all of Clojure's related facilities for dealing with them.

First, we must highlight that Lisps treat functions as values, just like numbers or lists. The easiest way to highlight this is with anonymous (unnamed) functions, which are essentially function literals. They take the form (fn [args...] body). See the following example.

((fn [xs] (next (butlast xs))) [0 1 2 3 4])

The outermost list contains two elements, an anonymous function and a vector. This anonymous function returns a list containing every element of its argument except the first and last. When the above list is evaluated, the anonymous function is applied to the vector, producing (1 2 3). We could easily rewrite the above code as follows:

(defn ends-dropped
 [xs]
 (next (butlast xs)))
(ends-dropped [0 1 2 3 4])

They are roughly equivalent, except that the first never defines a named function, whereas the second does. Another way to show Lisp's support for functions as values is to evaluate a function.

(fn [])

When we evaluate the anonymous function above (which receives nothing and yields nil), the environment tells us that it's value of the function type. Neat! Now that you understand how functions are values, we can delve into higher-order functions, which receive functions as arguments or return them. We covered apply in the last post; now let's discuss map.

map takes a function and a sequence and returns a new sequence where each element has been replaced with the result of applying the function to the element. To convert a list of values to their string representations (with the str function), we use the following

(map str [0 "hello" [1 2 3]])

For each element in the given vector, str was applied to it and a new list was constructed out of these string representations.

Some other useful higher-order functions include comp and partial.

comp composes all functions passed to it, feeding the output of one to the input of the next. Take for example

(def increment-str (comp str inc))
(increment-str 5)

By using comp, we create a function that increments a number and returns the string representation of the new number. When using comp, every function except the last supplied (which is the first to be invoked) must support taking a single argument, since they will just be passed the output of the previous function.

partial will partially apply a function to some arguments, creating a new function in the process. When this new function is applied to some arguments, the net result is applying the original function to the partially applied arguments and the new arguments. If you wanted to add a prefix string to some other strings, you could do the following

(map (partial str "prefix_") ["hello" "world" "yes" "no"])

str is partially applied to the string "prefix_" to create a new function that map uses. When we map over the elements of the vector, str is "completely" applied to "prefix_" and the given element.

The overall benefit of higher order functions is that they provide another level of abstraction, which means you will write less code and the code that you do write will be clearer and more elegant. That is the essence of Lisp, after all!

Allow me to demonstrate this abstraction with map. In some other language, if you wanted to create a new list of values based on some initial list, you might write something like the following

List<Integer> ints = List.of(0, 1, 2, 3, 4);
List<Integer> incrementedInts = new ArrayList<>();

for (int n : ints) {
    incrementedInts.add(n + 1);
}

Now look at the same code in Clojure

(map inc [0 1 2 3 4])

Much shorter! And much clearer too! You don't need the boilerplate of creating two variables and a for loop and invoking add and all that extra junk. Now you might argue that having two variables isn't bad, but there are many cases where the mapped sequence is just an intermediary value that you use to derive another value, and declaring an extra variable might not really be necessary or even helpful. map, like many other higher-order functions, abstracts over some process. Here, it's the process of building up a new sequence; you needn't concern yourself with iterating over the initial sequence, just with how to derive new values from old ones (the mapping function).

Conclusion

In this post, you have learned some of the basics of Clojure and what facilities it provides you with. Furthermore, you have begun to think in a functional style, which is probably unfamiliar to you if you're coming from a language like Java or C. Did you like it? Me too! Functional programming can really simplify a lot of code and make it clearer. Higher-order functions make this most explicit, as they reduce a lot of boilerplate, but recursive style is great for this too! In the next post, we will discuss Lisp's metaprogramming facilities – in other words, you will write code that writes code.

Here is an excellent talk drawing parallels between the simplicity of Clojure and of hand tools for woodworking. I highly recommend watching it!