This will be a series of posts covering the development of a Domain Specific Language for search. I’ll dive into the design, testing, and implementation of parsing and evaluation. This will include both pure evaluation and compilation to express queries for an external data store (*SQL, Elastic Search, etc.).
In part 1, I’ll cover why someone might want to use a DSL, as well as the design and implementation of the DSL core.
Why would you want to implement a DSL?
Focusing on search, you might want to expose search functionality to users. This might be a public-facing, HTTP API that allows users to look up other user details and posts, like Twitter’s Search API. You might be part of a team at a large Cloud Services provider, putting together a search engine for users to review logs regarding infrastrcuture. This might an internal feature, an enhancement to existing developer tools to facilitate handling customer support requests.
Search is a wide space!
The main reasons to pursue a DSL as a solution are:
Essentially, DSL approaches to solving problems bring the capabilities of language design theory to your problem space.
It’s notable that command line tools like grep, sed, awk, and find are themselves DSLs. A surprising number of problems can be approached with language design techniques in mind.
While there are many benefits to such an approach, it is not without costs. The primary costs are knowledge and time - knowing enough about language design, parsing, and the domain to bring together the pieces, and having enough time to make it all work.
This series of posts aims to address the knowledge part of the cost, by demonstrating what this process looks like from start to finish.
Let’s get into it!
We’re going to start with a simple search language and build on it in later posts. Let’s first specify a few parts of the language, enough to perform some interesting searches:
This allows for expressions like:
confidential == false count > 10 and ip == "10.0.8.0" date >= "2010-10-12" and date <= "2010-11-12"
Future extensions to the language could happen along any of the above axes: more operations, more types, more type constraints, more logical combinators.
For a convenient implementation, the limiting factor is how much the host programming language’s type system can accommodate. I’ll be using Haskell in this series, but any language that allows for algebraic data types should be enough to reproduce this work. It’s entirely possible to develop a DSL solution in other languages, but there’ll be more testing to do.
Let’s implement the language core.
We’d like to support all of
<=. Let’s add those in:
Next, we’ll add support for types and values:
Notice that we’re maintaining both type-level and value-level information, and maintain a distinction between the two levels.
Next up, support for variables:
This is a fairly straightforward approach. We don’t have to worry about bindings or scope. We leave the declaration of allowed variables to the search engine provider, to be exposed via documentation.
Next up: type constraints!
data Supports = Equality | Ordering deriving (Eq, Show) supports :: Type -> [Supports] supports IntType = [Equality, Ordering] supports StringType = [Equality, Ordering] supports BoolType = [Equality] needs :: Op -> [Supports] needs Lt = [Ordering] needs Gt = [Ordering] needs Eq = [Equality] needs Gte = [Ordering, Equality] needs Lte = [Ordering, Equality] -- > IntType `can` Lt -- True can :: Type -> Op -> Bool can t op = null (needs op \\ supports t)
Next, the logical combinator
and. I’ll also introduce how binary operations are represented at this stage:
This little recursive data type captures the essence of our language. Let’s break it down:
And Expr Expr:
andcombines two expressions
BinOpL Op Var Literal: a binary
Varon the Left and a
BinOpR Op Literal Var: a binary
Varon the Right and a
With this construction, we’ve ruled out some cases:
1 < 2
date == date,
ip > date
Here’s the language as a whole, for convenient reading:
data Op = Lt -- < | Gt -- > | Eq -- == | Gte -- >= | Lte -- <= deriving Eq data Type = IntType | StringType | BoolType deriving Eq data Literal = IntLit Int | StringLit Text | BoolLit Bool deriving Eq data Supports = Equality | Ordering deriving (Eq, Show) supports :: Type -> [Supports] supports IntType = [Equality, Ordering] supports StringType = [Equality, Ordering] supports BoolType = [Equality] needs :: Op -> [Supports] needs Lt = [Ordering] needs Gt = [Ordering] needs Eq = [Equality] needs Gte = [Ordering, Equality] needs Lte = [Ordering, Equality] -- > IntType `can` Lt -- True can :: Type -> Op -> Bool can t op = null (needs op \\ supports t) data Expr = And Expr Expr | BinOpL Op Var Literal | BinOpR Op Literal Var deriving (Show, Eq)
That’s all I’m going to cover for now - the core of the language. In the next post, I’ll build a parser for this language and write some tests for that parser.
If you’d like to get a head start, the source for the search DSL is availaible in this GitLab repo.
Thanks for reading!
If you enjoy my works: