
Getting Work Done in Clojure: The Building Blocks

 5 years ago
source link: https://www.tuicool.com/articles/hit/biqQJf2

A Joyful Introduction to Clojure, Part II

<< Part I

[Header photo]
Photo credit: kryshen.net/photos/

Now that we have our development environment set up, we can actually get to work and build something in Clojure.

In this article, I’ll be assuming that you’ve already gone through Part I of this guide, so you should already have the joyful_clojure example repository on your local machine. If you haven’t gone through Part I yet, now would be a good time.

If you forked and cloned the repository, make sure you get the most recent version by doing the following:

cd /path/to/joyful_clojure
git pull upstream master

If you downloaded the repository as a zip file, then you will need to download it again to get the most recent updates.

Let’s open the second sample project in Atom:

cd /path/to/joyful_clojure/02_clojure_building_blocks
atom .

Next, fire up a Clojure REPL in your terminal:

lein repl

Finally, use the “Proto Repl: Remote Nrepl Connection” command in Atom to connect to your running REPL on port 8081:

[Screenshot: the “Proto Repl: Remote Nrepl Connection” dialog in Atom]

Now that we’re up and running, let’s dive in and learn how to get work done in Clojure.

Clojure’s Built-in data structures

To be able to get anything done in a new programming language, we have to start with the building blocks. Clojure leans heavily toward a functional programming paradigm, so the primary building blocks of Clojure programs are functions and data.

We’ve already written some functions, so let’s focus on the other building block: data.

Primitive Data Types

Clojure has all of the primitive data types that you would expect: numbers, strings, characters, booleans, regular expressions, and so on.

nil ;; Clojure's equivalent of null
123 ;; An integer
1.23 ;; A double (floating point number)
true ;; A boolean
\H ;; A character
"Hello" ;; A string
#"^Hello.*" ;; A regular expression

If you are familiar with high-level languages like Javascript, Python, or Ruby, you might be surprised that there is a distinction between a “character” and a string of length 1. Clojure runs on the Java Virtual Machine, so it inherits the semantics of strings from Java, where a string is implemented as an array of characters behind the scenes. Roughly speaking, the difference between \H and "H" is the same as the difference between 1 and [1]: A string is a sequence of characters, and the string "H" just happens to be a sequence of length 1.
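As a quick sketch of that distinction (these expressions are my own illustration, not part of the sample project), you can poke at characters and strings in the REPL:

```clojure
;; A string can be treated as a sequence of characters.
(first "Hello")     ; => \H, the first character of the string
(seq "H")           ; => (\H), a sequence containing one character

;; A character and a one-character string are different values.
(= \H "H")          ; => false
(= "H" (str \H))    ; => true, str converts the character to a string
```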

Immutable Data Structures: Clojure’s Secret Sauce

Clojure has a rich library of standard data structures, and one of Clojure’s signature strengths is that all of them are immutable.

First, let’s look at the vector. This is a sequential data structure that is analogous to a “list” or “array” data structure in most other languages. It is represented by square brackets.

;; A vector of integers
[1 2 3]

You can look up elements by index in a vector using the get function:

[Screen recording: looking up vector elements by index with get]
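A minimal sketch of what this looks like in the REPL, assuming a sample vector named my-vector (the values are made up for illustration):

```clojure
(def my-vector [10 20 30])   ; hypothetical sample data

(get my-vector 0)   ; => 10, indices start at 0
(get my-vector 2)   ; => 30
(get my-vector 5)   ; => nil, get returns nil for a missing index
```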

You can also append an element to a vector with the conj function, short for “conjoin”:

[Screen recording: appending an element to a vector with conj]

Vectors are immutable, like all Clojure data structures. Functions like conj that manipulate vectors always return a new vector, rather than mutating the old one:

[Screen recording: conj returning a new vector while my-vector stays the same]

Notice that my-vector is unchanged, because conj returned a new vector.
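Here is a small sketch of that behavior. The name my-vector comes from the text above; longer-vector and the values are assumptions for illustration:

```clojure
(def my-vector [1 2 3])

;; conj returns a brand new vector with the extra element at the end.
(def longer-vector (conj my-vector 4))

longer-vector   ; => [1 2 3 4]
my-vector       ; => [1 2 3], the original is untouched
```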

A couple of other useful functions: first returns the first element of a vector, while rest returns everything except the first element:

[Screen recording: calling first and rest on a vector]

Technically, rest doesn’t return another vector; it returns a sequence (a “seq”), which is represented in the console by parentheses instead of square brackets. We’ll come back to sequences, including lazy ones, later.
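To make this concrete, here is a short sketch with made-up values showing first and rest, and confirming that rest hands back a sequence rather than a vector:

```clojure
(first [1 2 3])            ; => 1
(rest [1 2 3])             ; => (2 3), printed with parentheses

(vector? (rest [1 2 3]))   ; => false
(seq? (rest [1 2 3]))      ; => true

;; Both functions are safe to call on an empty vector.
(first [])                 ; => nil
(rest [])                  ; => ()
```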

Next, the map. This is a key-value structure similar to a Python dictionary, Javascript object, or Ruby hash. It is represented by curly braces.

;; A map representing a person
{"first-name" "Daniel"
 "last-name" "King"}

Something to notice: All of the data structures simply use spaces to separate the elements. Commas are treated as whitespace by the Clojure compiler, and no : is required between the keys and values in a map.

In the map above, I am using strings for the keys, but it’s possible to use any kind of data as a key:

;; A nested map representing a collection of people by id
{1 {"id" 1
    "first-name" "Daniel"
    "last-name" "King"}
 2 {"id" 2
    "first-name" "Jane"
    "last-name" "Smith"}}

Notice that the outer map uses integers as keys, and the inner maps use strings.

Clojure actually has another primitive data type called a keyword, which is similar to a string but used for a different purpose. Keywords are used to represent names that are important within the context of the program, whereas strings are usually used to represent plain text that does not refer to anything else in the program. For performance reasons, maps usually use keywords as keys instead of strings:

;; A map that uses keywords as keys
{:first-name "Daniel"
 :last-name "King"}

Like vectors, values can be looked up by key in a map using the get function:

[Screen recording: looking up a value in a map with get]
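A minimal sketch of map lookups, assuming a sample map bound to the name person (my own choice of name and data):

```clojure
(def person {:first-name "Daniel" :last-name "King"})   ; hypothetical sample data

(get person :first-name)   ; => "Daniel"
(get person :age)          ; => nil, the key is not present
(get person :age 0)        ; => 0, an optional third argument supplies a default
```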

New key-value pairs can be added to a map using the assoc function, short for “associate”:

[Screen recording: adding a key-value pair to a map with assoc]

Key-value pairs can be removed from a map with the dissoc function:

[Screen recording: removing a key-value pair from a map with dissoc]
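Both assoc and dissoc can be sketched like this, again with a made-up person map. As with vectors, each call returns a new map and leaves the original alone:

```clojure
(def person {:first-name "Daniel" :last-name "King"})   ; hypothetical sample data

(assoc person :age 31)       ; => {:first-name "Daniel", :last-name "King", :age 31}
(dissoc person :last-name)   ; => {:first-name "Daniel"}

person                       ; => {:first-name "Daniel", :last-name "King"}
```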

Sometimes it can be awkward to express more complex computations on the values of a map in terms of get and assoc. For example, suppose I wanted to increment the age on a person:

[Screen recording: incrementing a person’s age with get and assoc]

If you’re like me, you probably find that expression more difficult to read than it should be.

A much cleaner way to accomplish this is to use the update function, which allows you to pass in a function that takes in the old value and returns the new one:

[Screen recording: incrementing a person’s age with update and an anonymous function]

Notice the fn operator, which creates an anonymous function.

We can make this even cleaner using Clojure’s built-in inc function, which simply increments a number by 1:

[Screen recording: incrementing a person’s age with update and inc]
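The three approaches from this section can be sketched side by side. The person map is assumed sample data; all three expressions produce the same result:

```clojure
(def person {:first-name "Daniel" :age 31})   ; hypothetical sample data

;; 1. get + assoc: works, but the intent is buried in the nesting.
(assoc person :age (+ (get person :age) 1))   ; => {:first-name "Daniel", :age 32}

;; 2. update with an anonymous function created by fn.
(update person :age (fn [age] (+ age 1)))     ; => {:first-name "Daniel", :age 32}

;; 3. update with the built-in inc function.
(update person :age inc)                      ; => {:first-name "Daniel", :age 32}
```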

Nested maps are extremely common in Clojure, so the map functions get, assoc, and update also have versions that operate on nested maps: get-in, assoc-in, and update-in.

For example, suppose I had a map of people by id, and I wanted to increment the age of one specific person:

[Screen recording: incrementing a nested age with update-in]

Notice that update-in takes a vector of keys instead of a single key as its second argument, representing the path to the specific nested value you want to update. get-in and assoc-in work similarly. Try them out in your REPL!
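If you want a starting point, here is a minimal sketch of all three nested functions. The name people-by-id and its contents are assumptions for illustration:

```clojure
(def people-by-id   ; hypothetical sample data
  {1 {:id 1 :first-name "Daniel" :age 31}
   2 {:id 2 :first-name "Jane" :age 28}})

;; The vector [2 :first-name] is the path: key 2 in the outer map,
;; then :first-name in the inner map.
(get-in people-by-id [2 :first-name])            ; => "Jane"

;; Returns a new nested map where person 1 has :age 32.
(update-in people-by-id [1 :age] inc)

;; Returns a new nested map where person 2 also has a :last-name.
(assoc-in people-by-id [2 :last-name] "Smith")
```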

Clojure also has a set data type, which represents a collection of unique elements. They are represented by curly braces with a preceding # glyph:

;; A set of integers
#{1 2 3}

Sets are unordered, so if you iterate through a set, you are not guaranteed to get the elements in the same order as you put them in. The advantage of sets is that you can check the presence of an element in O(1) time using the contains? function:

;; Returns true in O(1) time
(contains? #{1 2 3} 3)

If you wanted to do the same thing for a vector, you would need to iterate through the elements of the vector to find the element you’re looking for, which takes O(n) time.

Warning: You will be disappointed with the results if you try to use the contains? function to check the presence of an element in a vector. contains? works on both maps and vectors, but it checks for the presence of a certain key, not a certain value. In a vector, the keys are indices, so the contains? function checks for the presence of a certain index:

[Screen recording: contains? checking indices, not values, on a vector]
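A short sketch of the trap, using made-up values. The last expression uses the core function some as one possible way to do the linear scan on a vector; that choice is mine, not something the sample project prescribes:

```clojure
;; On a set, contains? checks for the element itself.
(contains? #{10 20 30} 30)   ; => true

;; On a vector, contains? checks for an index.
(contains? [10 20 30] 30)    ; => false, there is no index 30
(contains? [10 20 30] 2)     ; => true, index 2 exists

;; Scanning the vector for a value takes O(n) time.
(some (fn [x] (= x 30)) [10 20 30])   ; => true
```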

The last core data structure in Clojure is the list, which represents sequential data stored in a linked list. A list is represented by elements inside parentheses:

;; A list of integers
'(1 2 3)

Notice the single quote ' before the first parenthesis. This quote means “don’t try to evaluate this as code”. If we didn’t put the quote there, the Clojure compiler would try to interpret this list as a function call to a function named “1”. Obviously there is no such function, so an error would occur.
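As a small sketch of quoting (the name numbers is my own), the quoted list is plain data that you can inspect like any other collection:

```clojure
;; Quoting keeps the list as data instead of evaluating it as a call.
(def numbers '(1 2 3))

(first numbers)               ; => 1

;; The list function builds the same list from evaluated arguments.
(= numbers (list 1 2 3))      ; => true

;; The ' character is shorthand for the quote operator.
(= numbers (quote (1 2 3)))   ; => true
```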

Usually, vectors are preferred over lists for representing sequential data. The main use case of lists is to represent unevaluated code when you are metaprogramming: Writing code that generates or manipulates other code. We’ll touch on the concept of metaprogramming again later, as it is a particular strength of languages in the Lisp family, including Clojure.

In summary: Clojure offers four main collection types: vectors for sequential data, maps for key-value pairs, sets for collections of unique elements, and lists for unevaluated code. Probably 70% or more of Clojure programming involves manipulating these four core data structures (especially vectors and maps), and the Clojure standard library has a very rich set of generic functions prebuilt for this purpose. Clojurists refer to this as “data-oriented programming”.

Control Flow in Clojure: Avoiding Classic Mistakes

To get any non-trivial work done in a programming language, you have to know how to use if statements, loops, and so on. Clojure handles these concepts in a different way from most other languages, so I want to help you avoid falling into some of the common traps.

At this point, let’s switch into the main namespace where the sample code is located:

[Screen recording: switching the REPL into the main namespace]

Local Variables (…but they don’t vary)

Previously, we discussed that the def operator is used to bind names to values in Clojure:

(def 3-squared (* 3 3))

In most languages, the same syntax is used to create global variables, and function-local variables:

// Javascript
// Global variable
const theNumber = 3;
function add7(num) {
  // Local variable
  const answer = num + 7;
  return answer;
}

However, if you try to do the same thing in Clojure, you will be disappointed with the results. The def operator binds a name to a value at the namespace scope, so if you use it inside a function, the value of that name that you set inside the function will be visible to all of the other functions in that namespace. This is probably not what you want, and it is exceedingly rare to do so in practice.
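Here is a sketch of the problem, using a deliberately bad function that I have named add-7-with-def for illustration:

```clojure
;; Anti-pattern: def inside a function binds a namespace-level name.
(defn add-7-with-def
  [num]
  (def answer (+ num 7))   ; creates or overwrites answer for the whole namespace
  answer)

(add-7-with-def 1)   ; => 8

;; The "local" name has leaked out of the function.
answer               ; => 8
```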

To create function-local bindings, use the let operator:

// Javascript
function add7(num) {
  const answer = num + 7;
  return answer;
}

;; Clojure
(defn add-7
  [num]
  (let [answer (+ num 7)]
    answer))

Notice that the return value of the function is placed inside the parentheses of the let block, because the local bindings are only visible inside that block. If you try to use a local binding outside of the let block where it was defined, you will get an error like Unable to resolve symbol: answer in this context:

(defn add-7
  [num]
  (let [answer (+ num 7)]) ;; WRONG, the let block ends here
  answer)

You can declare as many local bindings as you want, and each one can depend on previously-declared bindings:

(defn hypotenuse
  "Calculates the hypotenuse of a right triangle"
  [side1 side2]
  (let [a-squared (* side1 side1)
        b-squared (* side2 side2)
        c-squared (+ a-squared b-squared)]
    (Math/sqrt c-squared)))

Conditional logic

I mean, you can’t have a programming language without if, right?

Clojure has an if expression, as you would expect. It works roughly the same as the ternary operator ? : in other languages:

;; Determines if a number is even
(defn even?
  [num]
  (if (= (mod num 2) 0)
    true
    false))

Notice the general format: (if condition branch-1 branch-2). The entire expression evaluates to branch-1 if the condition is true, and branch-2 if not.

Hold on, though, because there are a few gotchas that can trip you up if you’re coming from other languages.

Gotcha #1: Truthiness

Many languages have a notion of “truthiness”, where they treat certain values as being “true” or “false” when used in a logical expression. Annoyingly, languages disagree as to which values are “truthy”. For example, Python and Javascript agree that 0 is falsey and all other numbers are truthy, but Python treats an empty list [] as falsey, whereas Javascript treats it as truthy.

Clojure takes a (subjectively) more principled view on this matter: false and nil are falsey, and anything else is truthy. This can cause some unexpected behavior if you are used to relying on the quirks of other languages:

(if number-of-users
  (println "We have users")
  (println "We have no users"))

The example above would always print “We have users”, even if number-of-users is 0. The correct way would be to explicitly check if number-of-users is non-zero:

(if (> number-of-users 0)
  (println "We have users")
  (println "We have no users"))
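You can check the truthiness rule for yourself with a tiny helper. The function truthy? below is hypothetical, written only for this sketch:

```clojure
;; Hypothetical helper: reports how if treats a value.
(defn truthy?
  [value]
  (if value true false))

(truthy? 0)       ; => true, unlike Python and Javascript
(truthy? "")      ; => true
(truthy? [])      ; => true
(truthy? nil)     ; => false
(truthy? false)   ; => false
```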

Gotcha #2: Doing multiple things on a branch

One limitation of Clojure’s if expression is that you can only put one expression on the if-branch and the else-branch. This usually isn’t a problem, but on rare occasions, you want to be able to do multiple things on one of the branches.

Suppose you wanted to be able to print out 2 different messages on the else-branch. Naively, I might try something like this:

;; This is wrong
(if (> number-of-users 0)
  (println "We have users")
  (println "We have no users")
  (println "We should probably go find some."))

Unfortunately, this results in an error, because the if expression is only expecting 3 arguments: The condition, the if-branch, and the else-branch.

To remedy this, we can use the do operator, which combines multiple expressions into one:

;; This is correct
(if (> number-of-users 0)
  (println "We have users")
  (do
    (println "We have no users")
    (println "We should probably go find some.")))

This is pretty rare in practice, but still useful to be aware of.

Gotcha #3: More than 2 branches

In many languages, you can do something like this if you need more than 2 branches in your logic:

// Javascript
function isTeenager(age) {
  if (age < 13) {
    return false;
  } else if (age > 19) {
    return false;
  } else {
    return true;
  }
}

In Clojure, trying to do the same thing leads to many levels of nesting:

(defn is-teenager
  [age]
  (if (< age 13)
    false
    (if (> age 19)
      false
      true)))

This works, but it’s not very easy to read. Clojure provides a convenient operator called cond which allows us to express any number of logical branches without going overboard on nesting:

(defn is-teenager
  [age]
  (cond
    (< age 13) false
    (> age 19) false
    :else true))

Using cond is completely equivalent to nested if expressions, but it’s convenient nonetheless.

Loops and iteration

Now for the real meat. This is where Clojure diverges the most from other languages.

Suppose that we have a list of users, and we want to get the ids of all the users who are 21 or older. We would most likely represent the list as a vector of maps:

(def users
  [{:id 1 :age 31
    :first-name "Daniel" :last-name "King"}
   {:id 2 :age 16
    :first-name "Angel" :last-name "Herrera"}
   {:id 3 :age 31
    :first-name "Jane" :last-name "Smith"}
   {:id 4 :age 20
    :first-name "Ruth" :last-name "Langley"}])

We define a function get-eligible-user-ids, which would ideally return the vector [1 3] if we pass in these users.

In most languages, we would probably use a for loop that pushes each matching id onto a mutable results array, something like this:

// Javascript
function getEligibleUserIds(users) {
  const results = [];
  for (const user of users) {
    if (user.age >= 21) {
      results.push(user.id);
    }
  }
  return results;
}

Clojure has an operator called doseq (short for do-sequence) which does roughly the same thing as a for-in or for-of loop in most languages (side note: Clojure also has a for operator, but it does something else). To get started, we could try just printing out the id of each user:

[Screen recording: using doseq to print the id of each user]
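A minimal sketch of that first step, using a shortened, made-up version of the users vector:

```clojure
(def users   ; shortened sample data for this sketch
  [{:id 1 :age 31}
   {:id 2 :age 16}])

;; Binds each element of users to the name user, one at a time,
;; and runs the body for its side effect.
(doseq [user users]
  (println (get user :id)))
;; prints 1, then 2, each on its own line; doseq itself returns nil
```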

So far so good. As a next step, we can try to emulate the Javascript solution using a combination of let, doseq, if, and conj (spoiler alert: this approach looks reasonable, but it is wrong):

;; WARNING: This is wrong
(defn get-eligible-user-ids
  [users]
  (let [results []]
    (doseq [user users]
      (if (>= (get user :age) 21)
        (conj results (get user :id))))
    results))

But then, when I try the function, I get an empty vector [] instead of the result I wanted, [1 3]:

[Screen recording: calling get-eligible-user-ids and getting back an empty vector]

The problem is that the results vector is immutable! When we try to add an id into results using conj, we are actually creating a new vector, leaving results unmodified. Therefore, when we return results at the end, it is still empty.

Most languages rely on mutability to accomplish basic tasks. Clojure’s focus on immutability is one of its best qualities, but it requires that we rethink our approach. Instead of using imperative loops to accomplish this task, we are going to use recursion.

Clojure provides a structure called loop/recur which allows us to perform a recursive computation. To print out the ids of every user in our list, we would do something like this:

(loop [remaining users]
  (if (empty? remaining)
    nil
    (let [user (first remaining)]
      (println (get user :id))
      (recur (rest remaining)))))

There’s a lot going on here, so let’s break it down.

  1. The loop operator defines a local name remaining and initializes this name to the value users, exactly like the let form would do.
  2. We check to see if there are any users left in remaining. If not, we are done, and we can return nil.
  3. Otherwise, we take the first user from remaining and bind it to the name user.
  4. We print out the id of the user.
  5. We call the recur operator to jump back to the beginning of the loop. The next time through the loop, the local names established at the beginning are bound to the values that we pass into recur. Here, the current value of (rest remaining) is bound to the name remaining on the next iteration.
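The same steps apply when the loop needs to carry a result along with it. As a warm-up sketch, here is a hypothetical function sum-all (my own example, not from the sample project) that adds up a vector of numbers with two loop bindings:

```clojure
;; Hypothetical example: sums a collection of numbers with loop/recur.
(defn sum-all
  [numbers]
  (loop [remaining numbers   ; what is left to process
         total 0]            ; the running result
    (if (empty? remaining)
      total                  ; nothing left, so the running total is the answer
      (recur (rest remaining)
             (+ total (first remaining))))))

(sum-all [1 2 3 4])   ; => 10
(sum-all [])          ; => 0
```

Each call to recur supplies one value per loop binding, in order, so the accumulator is "updated" by rebinding it rather than by mutation.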

Using this structure, we can write a correct version of the get-eligible-user-ids function:

(defn get-eligible-user-ids-1
  [users]
  (loop [remaining users
         results []]
    (if (empty? remaining)
      results
      (let [user (first remaining)
            next-remaining (rest remaining)
            next-results (if (>= (get user :age) 21)
                           (conj results (get user :id))
                           results)]
        (recur next-remaining next-results)))))

Thankfully, this gives the correct results (try it in your REPL to be sure).

At this point, you might be thinking that this is a lot of work to get a simple result.

You are correct! loop/recur is there when you need it, but it’s rare to need it in practice. The Right Way to do this particular task in Clojure is to use map and filter.

In Javascript, you can accomplish this task by doing the following:

function getEligibleUserIds2(users) {
  return users
    .filter((user) => user.age >= 21)
    .map((user) => user.id);
}

It looks much the same in Clojure:

(defn get-eligible-user-ids-2
  [users]
  (map (fn [user] (get user :id))
    (filter (fn [user] (>= (get user :age) 21)) users)))

Notice that the map and filter functions take the sequence to be operated on as the last argument, like (filter predicate sequence). As a result, the Clojure version needs to be read inside-out, like a math expression, as opposed to the Javascript version, which can be read top-to-bottom.

If this bothers you (as it probably does), then hang on: By the time we’re done, it’s going to be much more readable.

One interesting thing about Clojure’s implementation of map and filter is that they return lazy sequences, rather than concrete vectors. This means that even if you chain multiple calls to map and filter, none of the computation is actually done until you try to look at the elements in the resulting sequence. Therefore, chaining many calls to map and filter can be more efficient in Clojure than it is in Javascript, where each intermediate map or filter immediately transforms the entire array, resulting in a lot of extra arrays being created and then thrown away.
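One way to see the laziness in action is to chain map and filter over an infinite sequence. This is a sketch of my own: calling range with no arguments produces an endless sequence of integers starting at 0, and take asks for only the first few results:

```clojure
;; map returns a lazy sequence, not a vector.
(instance? clojure.lang.LazySeq (map inc [1 2 3]))   ; => true

;; Only as much of the infinite input is processed as needed
;; to produce three results.
(take 3 (map inc (filter even? (range))))            ; => (1 3 5)
```

If map and filter were eager, the second expression would never finish.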

On the other hand, there are times when you don’t want a lazy sequence, you want a concrete vector. In these cases, you have a couple of options.

Option 1: You can use reduce to manually construct an output vector:

(defn get-eligible-user-ids-3
  [users]
  (reduce conj []
          (map (fn [user] (get user :id))
               (filter (fn [user] (>= (get user :age) 21)) users))))

Option 2: You can use the (into collection1 collection2) function, which iterates through collection2 and appends each element to collection1:

(defn get-eligible-user-ids-4
  [users]
  (into []
        (map (fn [user] (get user :id))
             (filter (fn [user] (>= (get user :age) 21)) users))))

This will iterate through the lazy sequence produced by the chained map and filter calls, and append each element to the (initially) empty vector [].
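A few small examples of into on its own, with made-up values, may help make its behavior clearer:

```clojure
;; Elements of the second collection are added to the first.
(into [1 2] '(3 4))                       ; => [1 2 3 4]

;; The result has the type of the first collection.
(into #{} [1 1 2])                        ; => #{1 2}

;; Turning a lazy sequence into a concrete vector.
(vector? (map inc [1 2 3]))               ; => false
(vector? (into [] (map inc [1 2 3])))     ; => true
```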

Option 2 is more idiomatic, both because it is simpler, and because into has some internal optimization which makes it perform better than reduce in most cases.

This works, but it’s getting hard to read again, due to all the nesting. Luckily, it is possible to write this in a top-to-bottom fashion using the magic of threading macros.

Prettifying the Code With Threading Macros

Earlier, we very briefly touched on the idea of metaprogramming, or writing code that generates or manipulates other code. Clojure primarily implements this idea with macros, which are functions that transform unevaluated code.

We’re not going to be writing any macros yet, but I want to explore a couple of the macros that are provided in the Clojure standard library, since they are so commonly used. In particular, I want to look at the threading macros, which allow us to write code in a more readable, top-to-bottom style.

First, let’s look at the thread-last macro, represented by the ->> operator (yes, it’s an arrow with two arrowheads).

Try evaluating the following in your REPL:

[Screen recording: evaluating a ->> expression in the REPL]
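For instance, an expression along these lines would do (the exact values are my own choice for this sketch):

```clojure
(->> [1 2 3]
     (map inc))   ; => (2 3 4)

;; The same computation written as an ordinary nested call.
(map inc [1 2 3]) ; => (2 3 4)
```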

What the heck is going on here? Well, remember that the word “REPL” stands for “read-eval-print loop”, because when you enter some text into the REPL, the Clojure compiler first calls read-string to transform the text into unevaluated code, then calls eval on the unevaluated code to run it on the JVM and produce a result, then calls print to print the result out in the terminal.

[Diagram: the read, eval, and print steps of the REPL]

There is actually a fourth step in this process, called macro expansion, where any macros present in the code are expanded. Since macros are functions that transform unevaluated code, the macro expansion step occurs between the read and eval steps, while the code is still unevaluated:

[Diagram: the REPL steps with macro expansion between read and eval]

I guess the acronym RMEPL never really caught on.

We can isolate the macro expansion step by using the macroexpand operator, which will help us understand what’s really going on with the ->> macro (notice that I put a ' before the ->> expression inside of macroexpand):

[Screen recording: calling macroexpand on a quoted ->> expression]

Notice that the ->> macro takes the first expression and inserts it as the last argument to the next expression.
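A minimal sketch of the expansion, using a small made-up expression:

```clojure
;; The quote stops the ->> expression from being evaluated,
;; so macroexpand receives it as data.
(macroexpand '(->> [1 2 3] (map inc)))
;; => (map inc [1 2 3])
```

The result is itself unevaluated code: a list whose first element is the symbol map.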

You can also call the ->> macro with more than 2 arguments:

[Screen recording: the ->> macro with more than 2 arguments]

Notice that the first expression is inserted as the last argument to the next expression, then that is inserted as the last argument to the following expression.
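Sketched with a made-up three-step expression, the expansion and the evaluated result look like this:

```clojure
(macroexpand '(->> [1 2 3]
                   (map inc)
                   (filter even?)))
;; => (filter even? (map inc [1 2 3]))

(->> [1 2 3]
     (map inc)
     (filter even?))
;; => (2 4)
```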

Here’s the key takeaway: We can use the ->> macro to rewrite nested function calls in a sequential, top-to-bottom way.

This is exactly what we need to make the get-eligible-user-ids function more readable.

Here’s what the get-eligible-user-ids function would look like if we re-write it with the thread-last macro:

(defn get-eligible-user-ids-5
  [users]
  (->> users
       (filter (fn [user] (>= (get user :age) 21)))
       (map (fn [user] (get user :id)))
       (into [])))

Really pause and make sure you understand what’s going on here. I’ll put in some commas to make it a bit clearer (remember, the Clojure compiler ignores commas):

(defn get-eligible-user-ids-5
  [users]
  (->> users
       (filter (fn [user] (>= (get user :age) 21)) ,,,,)
       (map (fn [user] (get user :id)) ,,,,)
       (into [] ,,,,)))

The ->> macro takes the first expression, and inserts it as the last argument in the second expression (where the commas are), which is then inserted as the last argument of the third expression, and so on.

The result is that the body of the function is completely equivalent to get-eligible-user-ids-4, where we nested the map and filter calls:

[Screen recording: comparing get-eligible-user-ids-5 with get-eligible-user-ids-4]

The thread-last ->> macro is extremely common in Clojure code, so it’s really really important to get comfortable with it.

There is also a thread-first macro, represented by the -> operator, which inserts each expression as the first argument of the next expression:

[Screen recording: the -> macro in the REPL]
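A short sketch of thread-first with made-up data. The symbol person inside the quoted expression is only a placeholder; macroexpand never evaluates it:

```clojure
;; Each result becomes the FIRST argument of the next expression.
(-> {:first-name "Daniel"}
    (assoc :last-name "King")
    (assoc :age 31))
;; => {:first-name "Daniel", :last-name "King", :age 31}

(macroexpand '(-> person (assoc :age 31)))
;; => (assoc person :age 31)
```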

This macro is useful when you want to make multiple updates to a data structure in one operation, as in this add-new-user function:

(def users-table
  {:users-list ["123" "456"]
   :users-by-id {"123" {:id "123"
                        :first-name "Daniel"}
                 "456" {:id "456"
                        :first-name "Jane"}}})
(defn add-new-user
  [users-table user]
  (let [id (get user :id)]
    (-> users-table
        (assoc-in [:users-by-id id] user)
        (update :users-list conj id))))
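To see the effect of add-new-user, here is a self-contained sketch that repeats the function with a smaller table and then adds one user. The new user’s id and name are made up for illustration:

```clojure
(def users-table   ; trimmed-down sample data for this sketch
  {:users-list ["123"]
   :users-by-id {"123" {:id "123" :first-name "Daniel"}}})

(defn add-new-user
  [users-table user]
  (let [id (get user :id)]
    (-> users-table
        (assoc-in [:users-by-id id] user)    ; store the user under its id
        (update :users-list conj id))))      ; append the id to the list

(def updated-table
  (add-new-user users-table {:id "789" :first-name "Ruth"}))

(get updated-table :users-list)                        ; => ["123" "789"]
(get-in updated-table [:users-by-id "789" :first-name]) ; => "Ruth"
(get users-table :users-list)                          ; => ["123"], unchanged
```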

Putting it all together

Okay…that’s enough for now.

Before we move on, it’s important to practice what you’ve learned.

There are a few exercises in the main namespace in the 02_clojure_building_blocks project. Try applying what you’ve learned to create those functions.

The solutions are in the solutions namespace. You know the drill…don’t look at the solutions before you’ve given the exercises a try!

However, do look at the solutions once you’re finished. I included several solutions to some of the questions, to demonstrate some common Clojure idioms that we didn’t have time to discuss in this article.

Some hints to get you started:

The Clojure cheatsheet is a super-useful resource. I recommend having it open at all times (but don’t worry if you don’t understand everything in there yet).

Clojure has a built-in library called clojure.string, which contains a number of useful functions for working with strings. This library is imported at the top of the main namespace with the following invocation:

(ns main
  (:require [clojure.string :as string]))

This allows you to call functions in this library by prefixing them with string/:

(string/upper-case "hello")
=> "HELLO"

In the next article, we’ll move beyond the basic features of the Clojure language and learn how to actually build a useful application with the Clojure ecosystem.
