I’ll be digesting the Haxl paper here with the intent of understanding the mechanisms better and sharing an abbreviated form of the information with you. Check the first section below for the fast lane. Read beyond that for a brief look at some of the mechanisms used to implement this idea.
The shortest summary of the ideas that I can give is:

- A `Fetch` monad is defined that automatically extracts concurrency by using its `Applicative` and `Monad` instances.
- Under the hood, `Fetch` is a variation on the `Free` monad construction over an application-specific `Request` type.
The final interface for `Fetch` looks like this:
```haskell
{-# LANGUAGE ExistentialQuantification #-}
{-# LANGUAGE RankNTypes #-}

import Control.Exception (SomeException)
import Data.HashMap.Lazy (HashMap)
import Data.IORef (IORef)
import Data.Sequence (Seq)

data Request a  -- application-specific type, typically a GADT of requests

-- what the Fetch API implements
data FetchStatus a
  = NotFetched
  | FetchSuccess a

-- a request paired with a mutable slot for its eventual result
data BlockedRequest =
  forall a . BlockedRequest (Request a)
                            (IORef (FetchStatus a))

newtype DataCache =
  DataCache (forall a . HashMap (Request a) a)

newtype Fetch a = Fetch {
  unFetch :: IORef DataCache -> IO (Result a) }

dataFetch :: Request a -> Fetch a
runFetch  :: Fetch a -> IO a

data Result a
  = Done a                                  -- the computation finished
  | Blocked (Seq BlockedRequest) (Fetch a)  -- blocked on requests, with a continuation
  | Throw SomeException                     -- the computation failed

instance Monad Fetch where
  return a = Fetch $ \_ -> return (Done a)
  Fetch m >>= k = Fetch $ \ref -> do
    r <- m ref
    case r of
      Done a       -> unFetch (k a) ref
      Blocked br c -> return (Blocked br (c >>= k))
      Throw e      -> return (Throw e)

instance Applicative Fetch where
  pure = return
  Fetch f <*> Fetch x = Fetch $ \ref -> do
    f' <- f ref
    x' <- x ref
    case (f', x') of
      (Done g,        Done y)        -> return (Done (g y))
      (Done g,        Blocked br c)  -> return (Blocked br (g <$> c))
      (Done g,        Throw e)       -> return (Throw e)
      (Blocked br c,  Done y)        -> return (Blocked br (c <*> return y))
      -- both sides blocked: batch the two request sets into one round
      (Blocked br1 c, Blocked br2 d) -> return (Blocked (br1 <> br2) (c <*> d))
      -- throw here is the Fetch-level throw :: Exception e => e -> Fetch a
      (Blocked br c,  Throw e)       -> return (Blocked br (c <*> throw e))
      (Throw e,       _)             -> return (Throw e)
```
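To see the consequence of this `Applicative` instance concretely, here is a stripped-down, runnable model of the same idea (assumed simplifications: plain `Int` keys instead of the `Request` type, no `IO` actions in the core, no cache, no exceptions). Composing two fetches applicatively batches both requests into a single round, while composing them monadically forces two rounds:

```haskell
-- A minimal, pure model of the Fetch machinery (hypothetical names).
type Key = Int  -- stand-in for the application-specific Request type

data Result a = Done a | Blocked [Key] (Fetch a)

newtype Fetch a = Fetch { unFetch :: Result a }

instance Functor Fetch where
  fmap g (Fetch r) = Fetch $ case r of
    Done a       -> Done (g a)
    Blocked br c -> Blocked br (fmap g c)

instance Applicative Fetch where
  pure = Fetch . Done
  Fetch f <*> Fetch x = Fetch $ case (f, x) of
    (Done g,        Done y)        -> Done (g y)
    (Done g,        Blocked br c)  -> Blocked br (g <$> c)
    (Blocked br c,  Done y)        -> Blocked br (c <*> pure y)
    -- both sides blocked: their requests are batched into one round
    (Blocked br1 c, Blocked br2 d) -> Blocked (br1 ++ br2) (c <*> d)

instance Monad Fetch where
  return = pure
  Fetch r >>= k = Fetch $ case r of
    Done a       -> unFetch (k a)
    -- the right-hand side is unknown until `a` arrives, so no batching here
    Blocked br c -> Blocked br (c >>= k)

-- a request that blocks once, then yields a canned result
fetch :: Key -> Fetch String
fetch k = Fetch (Blocked [k] (pure ("data:" ++ show k)))

-- count the rounds and the requests batched in each round
rounds :: Fetch a -> (Int, [[Key]])
rounds = go 0 []
  where
    go n acc (Fetch r) = case r of
      Done _       -> (n, reverse acc)
      Blocked br c -> go (n + 1) (br : acc) c

main :: IO ()
main = do
  print (rounds ((,) <$> fetch 1 <*> fetch 2))  -- (1,[[1,2]]): one round
  print (rounds (fetch 1 >>= \_ -> fetch 2))    -- (2,[[1],[2]]): two rounds
```

The only case that introduces concurrency is `Blocked`/`Blocked`: the two request lists are concatenated, so everything reachable without knowing an earlier result lands in the same round.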
Read below for more details on how this all works.
I won’t go into detail here as to what these abstractions mean, or how they are used in general. I reproduce their interfaces below to make it more apparent how elegant the `Fetch` monad instance is.
```haskell
class Functor f where
  fmap :: (a -> b) -> f a -> f b

class Functor f => Applicative f where
  pure  :: a -> f a
  (<*>) :: f (a -> b) -> f a -> f b

class Monad f where
  return :: a -> f a
  (>>=)  :: f a -> (a -> f b) -> f b

ap :: (Monad m) => m (a -> b) -> m a -> m b
ap mf mx = do f <- mf; x <- mx; return (f x)
```
More on: Monads, Applicatives
Because of a historical quirk, `Applicative` is not a superclass of `Monad` in Haskell. This is something that’s going to be resolved soon, likely in time for GHC 7.10. A consequence of this is that programs leveraging Haxl that use do-notation heavily may not exploit as much concurrency as if they used applicative style.
An important note, derived from the above, is that for any monad, the lawful `Applicative` implementation must agree with the `Monad` one: `(<*>)` should behave exactly like `ap`.
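In other words, the definition alluded to here is `pure = return` together with `(<*>) = ap`. A quick runnable spot-check of that law on two ordinary monads, `Maybe` and lists:

```haskell
import Control.Monad (ap)

main :: IO ()
main = do
  -- for a lawful Monad, (<*>) must be indistinguishable from ap
  print ((Just (+ 1) <*> Just 2) == ap (Just (+ 1)) (Just 2))           -- True
  print (([(+ 1), (* 2)] <*> [10, 20]) == ap [(+ 1), (* 2)] [10, 20])   -- True
```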
As we saw above, this is not how `Fetch`’s `Applicative` was defined. The reasoning behind this is that more concurrency can be extracted by leveraging read-only semantics while still preserving correctness: a user won’t care whether their requests are performed sequentially or concurrently, as long as they get back the data they expected. The only time a problem comes up is when multiple requests are in flight concurrently for the same resource. We’ll see below how caching resolves this caveat.
The point of introducing caching in the `Fetch` core is to avoid returning inconsistent state for duplicate requests. The trick is to memoise the only entry point into this API, namely `dataFetch`.
Three cases are considered when `dataFetch` is called with a request:

- The request has not been seen before: it is inserted into the cache as `NotFetched` and recorded as a `BlockedRequest`, to be performed in the next round of fetching.
- The request is in the cache but still `NotFetched`: it is already in flight, so the computation blocks on the existing entry instead of issuing a duplicate.
- The request is in the cache with a `FetchSuccess` result: the cached result is returned immediately.
Let’s take a look at the internal caching API:
```haskell
-- HashMap is used rather than Map because of type constraints:
-- Ord is not reasonable for all request types, while Eq and Hashable are
newtype DataCache =
  DataCache (forall a . HashMap (Request a) a)

lookup :: Request a -> DataCache -> Maybe a
insert :: Request a -> a -> DataCache -> DataCache
```
In order to satisfy the three cases above, a request is cached as soon as it is issued, rather than upon completion. This way, duplicate requests can be served from the cache rather than being issued again across the network (or whatever the backing data source is).
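As a concrete illustration of cache-on-issue, here is a runnable sketch with assumed simplifications: `Int` keys instead of the `Request` type, an ordinary `Map` of `IORef FetchStatus` slots in place of the rank-2 `DataCache`, and a canned data source. A duplicate request finds the `NotFetched` entry its twin installed and blocks on the same slot, so only one copy is ever issued:

```haskell
import Data.IORef
import qualified Data.Map as Map

type Key = Int

data FetchStatus = NotFetched | FetchSuccess String

-- cache entries point at mutable result slots, so an entry can be
-- installed when a request is issued, before its result exists
type Cache = Map.Map Key (IORef FetchStatus)

data Result a = Done a | Blocked [(Key, IORef FetchStatus)] (Fetch a)

newtype Fetch a = Fetch { unFetch :: IORef Cache -> IO (Result a) }

instance Functor Fetch where
  fmap g (Fetch m) = Fetch $ \ref -> do
    r <- m ref
    pure $ case r of
      Done a       -> Done (g a)
      Blocked br c -> Blocked br (fmap g c)

instance Applicative Fetch where
  pure a = Fetch $ \_ -> pure (Done a)
  Fetch f <*> Fetch x = Fetch $ \ref -> do
    f' <- f ref
    x' <- x ref
    pure $ case (f', x') of
      (Done g,       Done y)       -> Done (g y)
      (Done g,       Blocked br c) -> Blocked br (g <$> c)
      (Blocked br c, Done y)       -> Blocked br (c <*> pure y)
      (Blocked b1 c, Blocked b2 d) -> Blocked (b1 ++ b2) (c <*> d)

instance Monad Fetch where
  Fetch m >>= k = Fetch $ \ref -> do
    r <- m ref
    case r of
      Done a       -> unFetch (k a) ref
      Blocked br c -> pure (Blocked br (c >>= k))

-- the three cases: unseen, in flight, and completed
dataFetch :: Key -> Fetch String
dataFetch key = Fetch $ \ref -> do
  cache <- readIORef ref
  case Map.lookup key cache of
    Nothing -> do                         -- unseen: cache on issue, then block
      box <- newIORef NotFetched
      writeIORef ref (Map.insert key box cache)
      pure (Blocked [(key, box)] (await box))
    Just box -> do
      status <- readIORef box
      pure $ case status of
        FetchSuccess a -> Done a                  -- completed: serve from cache
        NotFetched     -> Blocked [] (await box)  -- in flight: no duplicate issued
  where
    await box = Fetch $ \_ -> do
      FetchSuccess a <- readIORef box
      pure (Done a)

-- run rounds against a canned data source, counting issued requests
runFetch :: Fetch a -> IO (Int, a)
runFetch f = do
  ref <- newIORef Map.empty
  go 0 ref f
  where
    go n ref (Fetch m) = do
      r <- m ref
      case r of
        Done a       -> pure (n, a)
        Blocked br c -> do
          mapM_ (\(k, box) -> writeIORef box (FetchSuccess ("data:" ++ show k)))
                br
          go (n + length br) ref c

main :: IO ()
main = do
  (issued, out) <- runFetch
    ((,,) <$> dataFetch 1 <*> dataFetch 2 <*> dataFetch 1)
  print issued  -- 2: the second request for key 1 was deduplicated
  print out     -- ("data:1","data:2","data:1")
```

Note that the slot is written into the cache before any fetching happens; that ordering is exactly what makes the deduplication work.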
This API never discards information - there is no `delete`. This works in practice because, once a series of requests has completed, the cache memory is collected by the GC.
There are additional benefits to this caching framework:

- Once a `Fetch` operation has completed, the cache contains an executable log of all the requests that were made. This log can be replayed idempotently, which is useful for profiling and fault diagnosis.

`Fetch` is most similar to `async` in languages like C#, Clojure, Scala, F#, and OCaml. The key difference (and the reason for the title of this paper, “There is no Fork”) is that this framework performs concurrent operations implicitly, by virtue of `Applicative` and `Monad`. The others must explicitly spawn a concurrent operation, often passing a callback to be performed to avoid blocking.
It was fun reading this paper. It’s enlightening just how far the combination of a sound abstraction and the particulars of a problem can be taken to craft an elegant solution. It almost reads like a functional pearl!
If you want to learn even more, I recommend checking out the GitHub repo. The examples show how you can leverage Haxl in your systems.
Thanks for reading!