## Briefly Haxl

I’ll be digesting the Haxl paper here with the intent of understanding the mechanisms better and sharing an abbreviated form of the information with you. Check the first section below for the fast lane. Read beyond that for a brief look at some of the mechanisms used to implement this idea.

## TL;DR Version

The shortest summary of the ideas that I can give is:

• A Fetch monad is defined that automatically extracts concurrency by using Applicative and Monad
• Concurrency is implicitly enabled by defining a clever Applicative
• Dependent computations are made to block implicitly by leveraging Monad
• Requests are interpreted using a Free monad construction
• Users of the API are required to implement their own Request type
• The concurrency strategy is left up to users for maximum flexibility
• Requests are memoized for correctness - two identical Requests in flight should always return the same result
• It behaves well in the face of exceptions
• It is written to be used in a strictly read-only context
• It is expected to also work in a strictly write-only context

The final interface for Fetch looks like this:

```haskell
data Request a  -- application-specific type, defined by the user

-- what the Fetch API implements
data FetchStatus a
  = NotFetched
  | FetchSuccess a

data BlockedRequest =
  forall a . BlockedRequest (Request a)
                            (IORef (FetchStatus a))

newtype DataCache =
  DataCache (forall a . HashMap (Request a) a)

newtype Fetch a = Fetch {
  unFetch :: IORef DataCache -> IO (Result a) }

dataFetch :: Request a -> Fetch a
runFetch  :: Fetch a -> IO a

data Result a
  = Done a
  | Blocked (Seq BlockedRequest) (Fetch a)
  | Throw SomeException
```

```haskell
instance Monad Fetch where
  return a = Fetch $ \ref -> return (Done a)

  Fetch m >>= k = Fetch $ \ref -> do
    r <- m ref
    case r of
      Done a       -> unFetch (k a) ref
      Blocked br c -> return (Blocked br (c >>= k))
      Throw e      -> return (Throw e)
```

```haskell
instance Applicative Fetch where
  pure = return

  Fetch f <*> Fetch x = Fetch $ \ref -> do
    f' <- f ref
    x' <- x ref
    case (f', x') of
      (Done g,        Done y)        -> return (Done (g y))
      (Done g,        Blocked br c)  -> return (Blocked br (g <$> c))
      (Done g,        Throw e)       -> return (Throw e)
      (Blocked br c,  Done y)        -> return (Blocked br (c <*> return y))
      (Blocked br1 c, Blocked br2 d) -> return (Blocked (br1 <> br2) (c <*> d))
      (Blocked br c,  Throw e)       -> return (Blocked br (c <*> throw e))
      (Throw e,       _)             -> return (Throw e)
```
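To make the payoff concrete, here is a hypothetical usage sketch that assumes the definitions above. `UserId`, `GetName`, `GetAge`, and `GetBestFriend` are made-up names, standing in for a concrete GADT that an application would supply in place of the abstract `Request` type. Combining two independent fetches with `<*>` collects both requests into a single `Blocked` batch, while `>>=` must wait on its first argument:

```haskell
type UserId = Int

-- Hypothetical application-specific requests (not from the paper):
data Request a where
  GetName       :: UserId -> Request String
  GetAge        :: UserId -> Request Int
  GetBestFriend :: UserId -> Request UserId

-- Applicative style: both requests end up in one Blocked batch,
-- so they can be fetched concurrently in a single round.
nameAndAge :: UserId -> Fetch (String, Int)
nameAndAge uid = (,) <$> dataFetch (GetName uid) <*> dataFetch (GetAge uid)

-- Monadic style: the second fetch depends on the first result,
-- so (>>=) correctly forces two rounds.
bestFriendName :: UserId -> Fetch String
bestFriendName uid =
  dataFetch (GetBestFriend uid) >>= \fid -> dataFetch (GetName fid)
```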

Read below for more details on how this all works.

## The Fundamentals: Monad, Functor, Applicative, and a Caveat

I won’t go into detail here as to what these abstractions mean, or how they are used in general. I reproduce their interfaces below to make it more apparent how elegant the Fetch monad instance is.

```haskell
class Functor f where
  fmap :: (a -> b) -> f a -> f b

class Functor f => Applicative f where
  pure  :: a -> f a
  (<*>) :: f (a -> b) -> f a -> f b

class Monad m where
  return :: a -> m a
  (>>=)  :: m a -> (a -> m b) -> m b

ap :: (Monad m) => m (a -> b) -> m a -> m b
ap mf mx = do f <- mf; x <- mx; return (f x)
```

Because of a historical quirk, Applicative is not a superclass of Monad in Haskell. This is something that’s going to be resolved soon, likely in time for GHC 7.10. A consequence of this is that programs leveraging Haxl that use do-notation heavily may not exploit as much concurrency as they would in applicative style.
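Concretely, do-notation desugars to `(>>=)`, which the Fetch Monad instance must treat as a data dependency even when none exists. A sketch, with `fetchA` and `fetchB` standing in for hypothetical fetches of independent data:

```haskell
-- Sequential: desugars to  fetchA >>= \a -> fetchB >>= \b -> return (a, b)
-- fetchB is not even reached until fetchA's round has completed.
pairM :: Fetch (A, B)
pairM = do
  a <- fetchA
  b <- fetchB
  return (a, b)

-- Concurrent: Fetch's (<*>) merges both Blocked request sets
-- into a single round.
pairA :: Fetch (A, B)
pairA = (,) <$> fetchA <*> fetchB
```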

An important note, derived from the above, is that the canonical, law-abiding implementation of Applicative in terms of Monad is:

```haskell
instance Applicative m where
  pure  = return
  (<*>) = ap
```

As we saw above for Fetch, this was not how Applicative was defined. The reasoning behind this is that more concurrency can be extracted by leveraging read-only semantics while still preserving correctness. A user won’t care whether their requests are performed sequentially or concurrently as long as they get back the data they expected. The only time a problem comes up is when multiple requests are in flight concurrently for the same resource. We’ll see below how caching was used to resolve this caveat.

## More than an Optimization: Caching for Correctness

The point of introducing caching in the Fetch core is to avoid returning inconsistent results for duplicate requests. The trick is to memoize the only entry point into the API, namely, dataFetch.

Three cases are considered:

1. The request is not in the cache.
2. The request is in the cache and it was successful.
3. The request is in the cache and it has not been fetched.

Let’s take a look at the internal caching API:

```haskell
-- HashMap used rather than Map because of type constraints:
-- Ord not reasonable for all types - Eq and Hashable are
newtype DataCache =
  DataCache (forall a . HashMap (Request a) a)

lookup :: Request a -> DataCache -> Maybe a
insert :: Request a -> a -> DataCache -> DataCache
```

In order to satisfy the three cases above, a request is cached as soon as it is issued, rather than when it completes. This way, duplicate requests are served from the cache rather than being issued again over the network.
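Putting the pieces together, dataFetch can be sketched along the following lines, covering the three cases. This assumes (as the paper's full version does) that the cache stores the `IORef (FetchStatus a)` box for each request rather than a bare value, so that a duplicate in-flight request can block on the same box; `lookup`, `insert`, `singleton`, and `empty` are the cache and Data.Sequence operations:

```haskell
dataFetch :: Request a -> Fetch a
dataFetch request = Fetch $ \ref -> do
  cache <- readIORef ref
  case lookup request cache of
    -- Case 1: not in the cache. Insert a NotFetched box immediately
    -- (caching on issue, not on completion) and block on it.
    Nothing -> do
      box <- newIORef NotFetched
      writeIORef ref (insert request box cache)
      let br = BlockedRequest request box
      return (Blocked (singleton br) (cont box))
    Just box -> do
      status <- readIORef box
      case status of
        -- Case 2: fetched successfully; serve the duplicate from the cache.
        FetchSuccess result -> return (Done result)
        -- Case 3: issued but not yet fetched; block on the same box
        -- without issuing a second request.
        NotFetched -> return (Blocked empty (cont box))
  where
    cont box = Fetch $ \_ -> do
      FetchSuccess a <- readIORef box
      return (Done a)
```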

This API never discards information - there is no delete. This works in practice because, once a run of requests has completed, the cache becomes unreachable and its memory is reclaimed by the GC.
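That property falls out of how runFetch drives a computation: it allocates a fresh cache per run and loops until Done, handing each batch of blocked requests to a user-supplied fetcher. A sketch, where `emptyDataCache` and `fetch :: [BlockedRequest] -> IO ()` are assumed names for the empty cache and the user's batch-issuing function (whose concurrency strategy the framework deliberately leaves open):

```haskell
runFetch :: Fetch a -> IO a
runFetch h = do
  ref <- newIORef emptyDataCache   -- fresh cache; unreachable after the run
  let loop (Fetch m) = do
        r <- m ref
        case r of
          Done a          -> return a
          Blocked br cont -> do
            fetch (toList br)      -- user-supplied: issue one round, fill the boxes
            loop cont
          Throw e         -> throwIO e
  loop h
```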

There are additional benefits to this caching framework:

• Replayability: when a Fetch operation has completed, the cache contains a complete log of all requests that were made. Replaying that log yields the same results, which is useful for profiling and fault diagnosis.
• Metadata: values that are not remote fetches (timestamps, MD5 hashes, …) can also be stored in the cache.
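A replay harness is then a small variation on the run loop. `replayFetch` is a hypothetical helper, not part of the paper's API: seed a run with a saved cache, and every request is answered without touching the network.

```haskell
-- Hypothetical: replay a completed Fetch against its saved cache.
replayFetch :: DataCache -> Fetch a -> IO a
replayFetch cache (Fetch m) = do
  ref <- newIORef cache
  r <- m ref
  case r of
    Done a      -> return a
    Blocked _ _ -> error "replay: request missing from the saved cache"
    Throw e     -> throwIO e
```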

Fetch is most similar to async in languages like C#, Clojure, Scala, F#, and OCaml. The key difference (and the reason for the title of this paper, “There is no Fork”) is that this framework performs concurrent operations implicitly by virtue of Applicative and Monad. The others must spawn concurrent operations explicitly, often passing a callback to be run later to avoid blocking.

## Closing Notes

It was fun reading this paper. It’s enlightening just how far the combination of a sound abstraction and the particulars of a problem can be taken to craft an elegant solution. It almost reads like a functional pearl!

If you want to learn even more, I recommend checking out the GitHub repo. The examples show how you can leverage Haxl in your own systems.