❤ Queer Types ❤ - Briefly Haxl

Briefly Haxl

I’ll be digesting the Haxl paper here with the intent of understanding the mechanisms better and sharing an abbreviated form of the information with you. Check the first section below for the fast lane. Read beyond that for a brief look at some of the mechanisms used to implement this idea.

TL;DR Version

The shortest summary of the ideas that I can give are:

A Fetch monad is defined that automatically extracts concurrency by using Applicative and Monad
Concurrency is implicitly enabled by defining a clever Applicative
Dependent computations are made to block implicitly by leveraging Monad
Requests are interpreted using a Free monad construction
Users of the API are required to implement their own Request type
- The concurrency strategy is left up to users for maximum flexibility
Requests are memoized for correctness - two identical Requests in flight should always return the same result
It behaves well in the face of exceptions
It is written to be used in a strictly read-only context
- It is expected to also be able to work in a strictly write-only context

The final interface for Fetch looks like this:

data Request a = undefined -- application-specific type

-- what the Fetch API implements
data FetchStatus a
  = NotFetched
  | FetchSuccess a

data BlockedRequest =
  forall a . BlockedRequest (Request a)
                            (IORef (FetchStatus a))

newtype DataCache =
  DataCache (forall a . HashMap (Request a) a)

newtype Fetch a = Fetch {
  unFetch :: IORef DataCache -> IO (Result a) }

dataFetch :: Request a -> Fetch a
runFetch :: Fetch a -> IO a

data Result a
  = Done a
  | Blocked (Seq BlockedRequest) (Fetch a)
  | Throw SomeException

instance Monad Fetch
  return a = Fetch $ \ref -> return (Done a)

  Fetch m >>= k = Fetch $ \ref -> do
    r <- m ref
    case r of
      Done a       -> unFetch (k a) ref
      Blocked br c -> return (Blocked br (c >>= k))
      Throw e      -> return (Throw e)

instance Applicative Fetch where
  pure = return

  Fetch f <*> Fetch x = Fetch $ \ref -> do
    f' <- f ref
    x' <- x ref
    case (f', x') of
      (Done g,        Done y)        -> return (Done (g y))
      (Done g,        Blocked br c)  -> return (Blocked br (g <$> c))
      (Done g,        Throw e)       -> return (Throw e)
      (Blocked br c,  Done y)        -> return (Blocked br (c <*> return y))
      (Blocked br1 c, Blocked br2 d) -> return (Blocked (br1 <> br2) (c <*> d))
      (Blocked br c,  Throw e)       -> return (Blocked br (c <*> throw e))
      (Throw e,       _)             -> return (Throw e)

Read below for more details on how this all works.

The Fundamentals: Monad, Functor, Applicative, and a Caveat

I won’t go into detail here as to what these abstractions mean, or how they are used in general. I reproduce their interfaces below to make it more apparent how elegant the Fetch monad instance is.

class Functor f where
    fmap :: (a -> b) -> f a -> f b

class Functor f => Applicative f where
    pure  :: a -> f a
    (<*>) :: f (a -> b) -> f a -> f b

class Monad f where
    return :: a -> f a
    (>>=)  :: f a -> f (a -> f b) -> f b

ap :: (Monad m) => m (a -> b) -> m a -> m b
ap mf mx = do f <- mf; x <- mx; return (f x)

More than an Optimization: Caching for Correctness

The point of introducing caching in the Fetch core was to avoid returning inconsistent state for duplicate requests. The trick is to memoise the only entry point into this API, namely, dataFetch.

Three cases are considered:

The request is not in the cache.
The request is in the cache and it was successful.
The request is in the cache and it has not been fetched.

Let’s take a look at the internal caching API:

-- HashMap used rather than Map because of type constraints:
-- Ord not reasonable for all types - Eq and Hashable are
newtype DataCache =
  DataCache (forall a . HashMap (Request a) a)

lookup :: Request a -> DataCache -> Maybe a
insert :: Request a -> a -> DataCache -> DataCache

In order to satisfy the three cases above, a request is cached as soon as it is issued, rather than upon completing. This way, duplicate requests can be served from the cache, rather than being issued across the network/etc..

This API never discards information - there is no delete. This works in practice, because once a series of requests are completed, the cache memory is collected by the GC.

There are additional benefits to this caching framework:

Replayability: When a Fetch operation has completed, the cache contains an executable log of all requests that were made. This can be leveraged idempotently for profiling or fault diagnosis.
Metadata: data can be stored in the cache (timestamps, md5s, …) that are not remote fetches.

Fetch is most similar to async in languages like C#, Clojure, Scala, F#, and OCaml. The key difference (and the reason for the title of this paper, “There is no Fork”) is that this framework performs concurrent operations implicitly by virtue of Applicative and Monad. The others must spawn a concurrent operation, often passing a callback to be performed to avoid blocking.

Closing Notes

It was fun reading this paper. It’s enlightening just how far the combination of a sound abstraction and the particulars of a problem can be taken to craft an elegant solution. It almost reads like a functional pearl!

If you want to learn even more, I recommend checking out the Github Repo. The examples show how you can leverage Haxl in your systems.

Thanks for reading!

Briefly Haxl

TL;DR Version

The Fundamentals: Monad, Functor, Applicative, and a Caveat

More than an Optimization: Caching for Correctness

A Note on Related Work

Closing Notes