Jim Newkirk and I have been doing xUnit.net for 7 years now (and for Jim, NUnit for many years before that). You could say that open source is part of our blood, and when we left Microsoft, we made sure that open source would continue to be part of our daily efforts at Tier 3.
Fast forward 15 months: Tier 3 has been acquired (and is now the CenturyLink Cloud Development Center), and our first major open source effort Iron Foundry has been accepted into the Cloud Foundry Incubator project. Lots of great developers are working to ensure that you can write .NET code against a Platform-as-a-Service stack that doesn't lock you into a specific vendor.
Today we are proud to announce our second major open source effort: ElasticLINQ.
What is ElasticLINQ?
One of the major challenges when writing distributed software is how to distribute the data. When I started here 15 months ago, we had 4 data centers, and plans to expand into several more over the coming year. The data was being stored primarily in Microsoft SQL Server. As our data center footprint grew, it was becoming clear that centralized data storage was not going to scale with us. Having islands of data means that your application (and your users) can end up spending a lot of time waiting for data requests to go halfway around the world; and if there are any network glitches along the way, you might even fail to get the data entirely.
Almost right away we started evaluating alternatives that would let us keep all the data locally. We decided to use Couchbase as our primary data store, based on its extremely strong Cross Data-Center Replication (XDCR) capabilities. Many object data storage systems end up paired with an index engine for comprehensive searching capabilities. Couchbase provides an indexing integration solution with Elasticsearch, a horizontally scalable wrapper around Lucene.
The Elasticsearch query syntax is based on JSON; Elasticsearch documents are also stored as JSON. Our developers, steeped in the worlds of .NET and SQL Server, were much more comfortable using the Language Integrated Query (LINQ) functions introduced in .NET 3.5.
ElasticLINQ bridges these two worlds by letting us query Elasticsearch using LINQ, and have the results projected into CLR types. We enlisted the expertise of Damien Guard (of Attack Pattern), who worked on both the LINQ to SQL and Entity Framework teams, to do the initial version of ElasticLINQ for us.
How do I use ElasticLINQ?
As with most open source libraries for .NET, we have provided ElasticLINQ binaries via NuGet. Just install the package ElasticLINQ into your project. Version 1.0 supports Desktop .NET 4.5.
To get started, you will need an Elasticsearch server (or server cluster). It can be installed on both Windows and Linux; installation instructions are in the Elasticsearch documentation. The Elasticsearch API is exposed over HTTP and JSON, so you will need HTTP access to the server (on port 9200, with a default installation).
Elasticsearch segregates data into what's called an index (think of this as a database in a traditional data world). When searching Elasticsearch, you can narrow your search to one or more indices, or you can search all the indices in the server cluster.
Connection and Context
To start, you will need to create the connection object which instructs ElasticLINQ how to connect to the server:
var connection = new ElasticConnection(new Uri("http://myserver:9200"));
There are four additional optional constructor arguments: username and password for servers that require authentication, a timeout value to tell Elasticsearch how long you're willing to wait for an answer, and an index parameter to tell Elasticsearch which index to search (to search multiple indices, pass a comma-separated string like "index1,index2").
Once you have the connection object, you'll need to create the context object; this object is where your queries will start.
var context = new ElasticContext(connection);
This constructor, too, has optional arguments: a mapping object which I'll talk more about in a future blog post, a retryPolicy object which allows you to specify retry logic for failed searches, and a log object that lets you integrate ElasticLINQ's logging into your existing logging solution.
Querying with the Context
The context object's Query<T> method starts a query against a particular document type in Elasticsearch, and returns an IQueryable<T>. From here, you can use many of the traditional LINQ operations which will be translated into an Elasticsearch request.
As of version 1.0, the following LINQ operations are supported: Where, Skip, Take, Select, OrderBy and OrderByDescending, ThenBy and ThenByDescending, First and FirstOrDefault, and Single and SingleOrDefault.
As with any LINQ implementation, there are limitations on the types of queries that are supported, since your C# code must be translated into an appropriate query at runtime. The Where operation supports most common expressions, including: ==, !=, <, <=, >, >=, ||, and &&; as well as Equals and Contains.
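To make the list of supported operators concrete, here is a minimal sketch of a query built only from them. It runs against an in-memory IQueryable<Customer> so that it works without a server; Customer, QuerySketch, and the sample data are hypothetical names made up for illustration, and with ElasticLINQ the source would presumably be context.Query<Customer>() instead of a list.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical document type, used only for illustration.
public class Customer
{
    public string Name { get; set; } = "";
    public int Age { get; set; }
    public string City { get; set; } = "";
}

public static class QuerySketch
{
    // Uses only operators and expressions from the supported list:
    // Where (with >=, != and &&), OrderBy, ThenByDescending, Skip, Take, Select.
    public static IQueryable<string> SecondPageOfAdults(IQueryable<Customer> customers)
    {
        return customers
            .Where(c => c.Age >= 21 && c.City != "Seattle")
            .OrderBy(c => c.Name)
            .ThenByDescending(c => c.Age)
            .Skip(2)   // skip the first page of two results
            .Take(2)   // take the second page
            .Select(c => c.Name);
    }

    // Made-up sample data standing in for documents in an index.
    public static List<Customer> SampleCustomers()
    {
        return new List<Customer>
        {
            new Customer { Name = "Ann", Age = 30, City = "Portland" },
            new Customer { Name = "Bob", Age = 19, City = "Portland" },
            new Customer { Name = "Cat", Age = 45, City = "Seattle" },
            new Customer { Name = "Dan", Age = 25, City = "Dallas" },
            new Customer { Name = "Eve", Age = 52, City = "Boise" },
            new Customer { Name = "Fay", Age = 33, City = "Austin" },
            new Customer { Name = "Gus", Age = 40, City = "Denver" },
        };
    }

    public static void Main()
    {
        // In-memory stand-in for an Elasticsearch-backed source.
        var customers = SampleCustomers().AsQueryable();

        // Prints "Eve", then "Fay".
        foreach (var name in SecondPageOfAdults(customers))
        {
            Console.WriteLine(name);
        }
    }
}
```

Because the method accepts any IQueryable<Customer>, the same operator chain can be pointed at a different source without changing the query itself.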
Full-text searching
Elasticsearch offers a full text search feature called query_string. It's intended to take user input (which means searches are case-insensitive), and lets the user specify things like wildcards and join operators like AND and OR. To issue a query_string search, use the QueryString extension method on your IQueryable<T>:
query.QueryString("this AND that OR the other thing")
For more information on full-text searching with query_string, see the Elasticsearch documentation.
Custom queries with ElasticMethods
In addition to the built-in LINQ operations, ElasticLINQ also supports four custom operations implemented as static methods on the ElasticMethods class: Regexp, Prefix, ContainsAny, and ContainsAll. The first two are fairly self-explanatory; the latter two require a bit of background.
Since LINQ (and relational databases) really don't support the notion of columns which are themselves collections, there is no existing syntax that adequately expresses the idea of matching collections against collections. The Contains method will do the one-to-many mapping, and is supported in both directions with ElasticLINQ as you'll see below. When you have a collection field, and you want to see if it matches any (or all) of the values in a list, you can use ElasticMethods.ContainsAny or ElasticMethods.ContainsAll to express that style of query.
Here are the four types of collection-related queries:
var names = new[] { "Brad", "Jim", "Damien" };

// See if a field matches one of many values
query.Where(x => names.Contains(x.Name))

// See if a collection field contains a single value
query.Where(x => x.Aliases.Contains("Brad"))

// See if a collection field contains any value from a list
query.Where(x => ElasticMethods.ContainsAny(x.Aliases, names))

// See if a collection field contains all values from a list
query.Where(x => ElasticMethods.ContainsAll(x.Aliases, names))
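If the difference between ContainsAny and ContainsAll is easier to see against plain collections, the following sketch expresses the same two ideas with LINQ to Objects. MatchesAny and MatchesAll are hypothetical helper names chosen for illustration; this is not how ElasticLINQ implements ElasticMethods, which are translated into an Elasticsearch request rather than evaluated in memory.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CollectionMatchSketch
{
    // True when the collection field holds at least one of the candidate values.
    public static bool MatchesAny<T>(IEnumerable<T> field, IEnumerable<T> candidates)
    {
        return candidates.Any(c => field.Contains(c));
    }

    // True when the collection field holds every one of the candidate values.
    public static bool MatchesAll<T>(IEnumerable<T> field, IEnumerable<T> candidates)
    {
        return candidates.All(c => field.Contains(c));
    }

    public static void Main()
    {
        var names = new[] { "Brad", "Jim", "Damien" };

        // Made-up alias lists standing in for a document's collection field.
        var someMatch = new[] { "Brad", "bradw" };
        var fullMatch = new[] { "Damien", "Jim", "Brad", "DG" };

        Console.WriteLine(MatchesAny(someMatch, names)); // True: "Brad" is present
        Console.WriteLine(MatchesAll(someMatch, names)); // False: "Jim" and "Damien" are missing
        Console.WriteLine(MatchesAll(fullMatch, names)); // True: all three names are present
    }
}
```

Extra values in the field, such as "DG" above, do not affect either check; only the candidate list decides what has to be found.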
Custom queries and projections with ElasticFields
When querying (with Where and Query) and projecting (with Select), you have access to two pieces of metadata from Elasticsearch: the document ID, and the search score. These fields are available as static properties named ElasticFields.Id and ElasticFields.Score, respectively.
A note about scoring: in Elasticsearch (and Lucene), a search is performed in two stages: a query, which scores the results of the search, and a filter, which further filters the result of the query. The default query and filter return all documents, so you can query without a filter, filter without a query, or provide neither (and simply get all the documents in Elasticsearch).
When you use Where in ElasticLINQ, that translates into a filter. We chose this by default because it is significantly faster than a query, and the results of filter are cacheable by Elasticsearch. If you don't intend to use scoring, then filter is the way to go. If you want to issue a query so you can get scoring (for example, because you're doing full-text style searching), you can use the Query extension method that we've added to IQueryable; it has the same syntax and usage as Where, except that it results in a query for your search. You can mix both Query and Where in the same request, but be aware that the two-stage system in Elasticsearch means that all Query criteria will be searched before any Where criteria are.
What's next?
This is v1.0 software, so we have a lot left that we can do. We've just recently started using this in our production code, and we are constantly finding new things we want to support. We expect you will come up with things we never dreamed of, too.
We are excited for the community to start using and contributing to ElasticLINQ. The GitHub site is a work in progress. Soon we will get documentation posted to the Wiki pages on the site, and get a real home page set up. We are eagerly awaiting the first community-contributed bugs, Wiki edits, and pull requests.
We hope you love using ElasticLINQ as much as we do!
Third-party libraries
The best part about open source is sharing. :) We would like to thank the following open source projects which were critical to the development of ElasticLINQ.
- Json.NET is used to serialize the query as well as deserialize the results. All Json.NET serialization-related attributes should be natively supported as a result.
- xUnit.net and NSubstitute were used for unit testing.