
My understanding of reducers after EuroClojure

12 Jun 2012

At EuroClojure Rich Hickey gave a second, unscheduled talk about the new reducers library. The talk clarified a few points for me that I didn't initially get from the explanatory blog posts on clojure.com/blog. For the Amsterdam Clojure Meetup I wrote down these notes:

Reducers vs the seq/lazy-seq API

This is mostly explained in the first blog post: Reducers - A Library and Model for Collection Processing.
The reducers are partly about de-complecting representation, order and mechanism. For example: the map reducer is only a recipe to apply a function to each item in the collection. Compare the following:

[2 3 4] === 
(vec (map inc [1 2 3])) === 
(into [] (r/map inc [1 2 3])) === 
(reduce conj [] (r/map inc [1 2 3]))

(here map is clojure.core/map and r/map is the new clojure.core.reducers/map)
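To try this at a REPL (the alias r is an assumption, matching the snippets above):

(require '[clojure.core.reducers :as r])

;; a recipe (a reducible object), not a realized sequence:
(r/map inc [1 2 3])

;; building the result is a separate, explicit step:
(into [] (r/map inc [1 2 3]))
;; => [2 3 4]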

A reducer is only the recipe step of transforming a collection. Building a collection or a result is an explicit, separate reduce step. The construction of the result and the description of what to do with the input collection are kept separate here. When you replace r/map with a composition of various reducers, the actual construction of the result is still only the final step, as the sketch below shows. Compare this to the intermediate lazy-seq results that are produced when composing the old seq functions.
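A rough sketch, close to an example from the first blog post: each curried reducer is just a recipe, and comp stacks recipes without producing intermediate collections.

(def red (comp (r/filter even?) (r/map inc)))

;; only this final, explicit step builds a result:
(reduce + (red [1 1 1 2]))
;; => 6
(into [] (red [1 1 1 2]))
;; => [2 2 2]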

Reducers and fold for parallelism

This is mostly explained in the second blog post: Anatomy of a Reducer.
The split between building results and the reducer recipes also opens up some nice possibilities for parallelism, through the fork/join framework.

The example Rich gave was (not verbatim this, but close enough):

(def v (vec (range 100000000000)))

(reduce + (map inc v))    ;; vs.
(reduce + (r/map inc v))  ;; vs.
(r/fold + (r/map inc v))

I think the timings were: reduce+r/map was about 2x as fast as reduce+map, and r/fold+r/map about 4x as fast as reduce+map.

Fold is simply the name for the new parallel reduce function; it is not meant to refer to functions with the same name in other languages.

In the example the fold function is not passed a combiner, so the plus function is used as the combiner as well. The seed for the reduce at the leaves is the combiner function called with zero arguments.
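In code (r/fold's three-argument arity takes the combiner explicitly):

;; these two calls are equivalent; + serves as both reducing function
;; and combiner:
(r/fold + (r/map inc v))
(r/fold + + (r/map inc v))

;; the seed for each leaf reduce is the combiner called with no arguments:
(+)
;; => 0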

Fold will try to do the computation in parallel using fork/join, when both the input collection (which is asked to apply the reducer to itself) and the reducer support this. The check for support is done through protocols. On the data-structure side, PersistentVector extends CollFold; on the reducer side, r/map is defined as a folder, while r/take-while is defined as a plain reducer (it does not support fold, because partitioning does not make sense for that computation). When parallel fold is not supported, fold will simply do a sequential reduce. See the implementation for details: reducers.clj
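A small illustration of the fallback behaviour (a sketch; v as defined above, pos? just an arbitrary predicate):

;; r/map produces a folder, so this can run in parallel via fork/join:
(r/fold + (r/map inc v))

;; r/take-while produces a plain reducer, so fold cannot partition the
;; work and falls back to a sequential reduce:
(r/fold + (r/take-while pos? (r/map inc v)))

;; the data structure matters too: the result of range does not extend
;; CollFold, so folding it is also just a sequential reduce:
(r/fold + (r/map inc (range 1000)))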

A common approach to parallel processing is map+reduce: the work is divided into partitions, map is applied to each partition, and the intermediate results are combined with reduce. In fold the approach is reduce+combine: the work on the smallest partition is done with a reduce rather than with a map. Compare the two approaches by trying to express filter in map+reduce versus reduce+combine (see the sketch below). It appeals to functional programming sensibilities that reduce is a better fit for most operations than map.
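For instance, filter falls out naturally as a reduce: conditionally accumulate each item. A minimal sketch:

;; filter expressed as a reduce: keep an item by conj-ing it onto the
;; accumulated result, drop it by returning the accumulator unchanged
(reduce (fn [acc x]
          (if (even? x)
            (conj acc x)
            acc))
        []
        (range 10))
;; => [0 2 4 6 8]

Expressing the same with map alone is awkward: each item would have to be mapped to some wrapper value, with the results flattened or cleaned up afterwards.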

Fold works nicely with Clojure data structures. Many data structures in Clojure are trees, which can easily be partitioned. As Rich explained: explicit parallel data structures (as found in other languages) make no sense, because being parallel is a property of an algorithm, not of the data.

The ultimate benefit of the reducers library is this: in the past you could make your programs faster by simply waiting a while for faster hardware (Moore's law). Fold and reduce+combine bring back that promise for data processing with more cores, rather than faster CPUs.

