Briefly Haxl

I’ll be digesting the Haxl paper here with the intent of understanding the mechanisms better and sharing an abbreviated form of the information with you. Check the first section below for the fast lane. Read beyond that for a brief look at some of the mechanisms used to implement this idea.

TL;DR Version

The shortest summary of the ideas that I can give is:

The final interface for Fetch looks like this:

data Request a -- application-specific type (a GADT in practice)

-- what the Fetch API implements
data FetchStatus a
  = NotFetched
  | FetchSuccess a

data BlockedRequest =
  forall a . BlockedRequest (Request a)
                            (IORef (FetchStatus a))

newtype DataCache =
  DataCache (forall a . HashMap (Request a) a)

newtype Fetch a = Fetch {
  unFetch :: IORef DataCache -> IO (Result a) }

dataFetch :: Request a -> Fetch a
runFetch :: Fetch a -> IO a

data Result a
  = Done a
  | Blocked (Seq BlockedRequest) (Fetch a)
  | Throw SomeException

instance Monad Fetch where
  return a = Fetch $ \ref -> return (Done a)

  Fetch m >>= k = Fetch $ \ref -> do
    r <- m ref
    case r of
      Done a       -> unFetch (k a) ref
      Blocked br c -> return (Blocked br (c >>= k))
      Throw e      -> return (Throw e)

instance Applicative Fetch where
  pure = return

  Fetch f <*> Fetch x = Fetch $ \ref -> do
    f' <- f ref
    x' <- x ref
    case (f', x') of
      (Done g,        Done y)        -> return (Done (g y))
      (Done g,        Blocked br c)  -> return (Blocked br (g <$> c))
      (Done g,        Throw e)       -> return (Throw e)
      (Blocked br c,  Done y)        -> return (Blocked br (c <*> return y))
      (Blocked br1 c, Blocked br2 d) -> return (Blocked (br1 <> br2) (c <*> d))
      (Blocked br c,  Throw e)       -> return (Blocked br (c <*> throw e))
      (Throw e,       _)             -> return (Throw e)

Read below for more details on how this all works.
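
One piece the summary above leaves implicit is the driver, runFetch. A minimal sketch based on the paper: it runs the computation in rounds, performing each batch of blocked requests before resuming. Here `fetch` and `emptyDataCache` are assumed helpers (the data-source layer performs a batch and fills in each request's IORef; the paper's actual plumbing is richer):

```haskell
import Data.IORef (newIORef)
import Data.Foldable (toList)
import Control.Exception (throwIO)

-- Assumed: provided by the data-source layer. Performs a whole batch
-- of requests (ideally concurrently) and writes the results into each
-- BlockedRequest's IORef.
fetch :: [BlockedRequest] -> IO ()
fetch = undefined

runFetch :: Fetch a -> IO a
runFetch f = do
  ref <- newIORef emptyDataCache  -- assumed: an empty DataCache
  let loop (Fetch m) = do
        r <- m ref
        case r of
          Done a          -> return a
          Throw e         -> throwIO e
          Blocked br cont -> do
            fetch (toList br)  -- one round: perform the whole batch
            loop cont          -- then resume the continuation
  loop f
```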

The Fundamentals: Monad, Functor, Applicative, and a Caveat

I won’t go into detail here as to what these abstractions mean, or how they are used in general. I reproduce their interfaces below to make it more apparent how elegant the Fetch monad instance is.

class Functor f where
    fmap :: (a -> b) -> f a -> f b

class Functor f => Applicative f where
    pure  :: a -> f a
    (<*>) :: f (a -> b) -> f a -> f b

class Monad f where
    return :: a -> f a
    (>>=)  :: f a -> (a -> f b) -> f b

ap :: (Monad m) => m (a -> b) -> m a -> m b
ap mf mx = do f <- mf; x <- mx; return (f x)

More on: Monad, Applicatives

Because of a historical quirk, Applicative is not a superclass of Monad in Haskell. This is due to be resolved soon, likely in time for GHC 7.10. A consequence of this is that programs leveraging Haxl that use do-notation heavily may not exploit as much concurrency as if they used the applicative style.

An important note, derived from the above, is that the canonical Applicative instance for any monad is:

instance Applicative m where
  pure  = return
  (<*>) = ap

As we saw above for Fetch, this was not how Applicative was defined. The reasoning behind this is that more concurrency can be extracted by leveraging read-only semantics while still preserving correctness. A user won’t care whether their requests are performed sequentially or concurrently as long as they get back the data they expected. The only time a problem comes up is when multiple requests are in flight concurrently for the same resource. We’ll see below how caching was used to resolve this caveat.
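
To make the difference concrete, here is a sketch using two hypothetical request constructors (GetUser, GetPost, and the User and Post types are illustrative, not from the paper):

```haskell
-- Written monadically, the second dataFetch is not issued until the
-- first has completed, so the two requests take two rounds:
sequentialPage :: Fetch (User, Post)
sequentialPage = do
  u <- dataFetch (GetUser 1)
  p <- dataFetch (GetPost 7)
  return (u, p)

-- Written applicatively, <*> combines both BlockedRequests into one
-- batch, so the data-source layer can perform them in a single round:
concurrentPage :: Fetch (User, Post)
concurrentPage = (,) <$> dataFetch (GetUser 1) <*> dataFetch (GetPost 7)
```

Both versions return the same pair; only the number of rounds differs, which is exactly the read-only-semantics argument above.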

More than an Optimization: Caching for Correctness

The point of introducing caching in the Fetch core was to avoid returning inconsistent state for duplicate requests. The trick is to memoise the only entry point into this API, namely, dataFetch.

Three cases are considered:

  1. The request is not in the cache.
  2. The request is in the cache and it was successful.
  3. The request is in the cache and it has not been fetched.

Let’s take a look at the internal caching API:

-- HashMap rather than Map because of type constraints:
-- requiring Ord of every request type isn't reasonable, but Eq and Hashable are
newtype DataCache =
  DataCache (forall a . HashMap (Request a) a)

lookup :: Request a -> DataCache -> Maybe a
insert :: Request a -> a -> DataCache -> DataCache

In order to satisfy the three cases above, a request is cached as soon as it is issued, rather than upon completion. This way, duplicate requests can be served from the cache, rather than being issued again over the network.
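
A sketch of dataFetch covering the three cases. Note that to represent case 3 the cache must map each request to an `IORef (FetchStatus a)` rather than a bare result, as in the paper's refined version, so `lookup` here returns the result box:

```haskell
dataFetch :: Request a -> Fetch a
dataFetch req = Fetch $ \ref -> do
  cache <- readIORef ref
  case lookup req cache of
    -- Case 1: not in the cache. Record it as NotFetched immediately,
    -- so duplicates hit the cache, and block on its result box.
    Nothing -> do
      box <- newIORef NotFetched
      writeIORef ref (insert req box cache)
      return (Blocked (singleton (BlockedRequest req box))
                      (continueFetch box))
    Just box -> do
      status <- readIORef box
      case status of
        -- Case 2: fetched successfully; serve straight from the cache.
        FetchSuccess a -> return (Done a)
        -- Case 3: already issued but not yet fetched; block without
        -- adding a duplicate request to the batch.
        NotFetched -> return (Blocked empty (continueFetch box))

-- Resumes once the data-source layer has filled in the box.
continueFetch :: IORef (FetchStatus a) -> Fetch a
continueFetch box = Fetch $ \_ -> do
  FetchSuccess a <- readIORef box
  return (Done a)
```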

This API never discards information - there is no delete. This works in practice because once a Fetch computation completes, its cache becomes unreachable and is reclaimed by the GC.

The cache has a benefit beyond deduplication: within a single Fetch computation, every occurrence of the same request sees the same answer, giving the computation a consistent view of the data.

Fetch is most similar to async in languages like C#, Clojure, Scala, F#, and OCaml. The key difference (and the reason for the paper's title, "There is no Fork") is that this framework performs concurrent operations implicitly, by virtue of Applicative and Monad. The others must explicitly spawn a concurrent operation, often passing a callback to be performed to avoid blocking.

Closing Notes

It was fun reading this paper. It’s enlightening just how far the combination of a sound abstraction and the particulars of a problem can be taken to craft an elegant solution. It almost reads like a functional pearl!

If you want to learn even more, I recommend checking out the GitHub repo. The examples show how you can leverage Haxl in your systems.

Thanks for reading!