RDF Result sets - Epimorphics

In some recent semantic web applications, where we’ve been creating user interfaces over REST style interfaces over RDF data sets, we found a common pattern emerging – ResultSets. The approach we took has been documented but it’s buried in other details so I’d like to pull out the essential pattern in this post.

Situation

The situation is that your UI (or other client) wants to find all resources that match some criteria and get a description of them. Typically the client wants to see those resources ordered (e.g. by relevance to some original query, or by name).

This is not just a SPARQL SELECT. SELECT allows you to find the matching resources and to sort them, but it can only extract a fixed set of values from each resource. A key value of RDF is its ability to handle schema-less information and not require resource descriptions to be of uniform shape. If we only pull back descriptions via SELECT we lose that.

Nor is this a simple subgraph of the RDF dataset (e.g. as you would get from a DESCRIBE), since then you lose the information on which are the top-level matching resources and how they are ordered.

Specifying the query

Abstractly we specify the query using the template:

query(select, var, description)

Here select is a SPARQL SELECT query which extracts the resources we want, possibly ordering them; var is the name of the variable in that query which is bound to the retrieved resources; and description is either the single keyword “DESCRIBE” (meaning that each resource should be returned via a SPARQL DESCRIBE operation) or a SPARQL ConstructTemplate which refers to other variables in the select.

In fact there’s a lot of separate machinery for how to build up the query as a series of query refinement operations, but that’s not relevant here.
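As an illustration only, the query(select, var, description) template could be held in a small value type. This is a minimal Python sketch, not the API we actually built: the class name ResultSetQuery, the helper uses_describe and the example query are all hypothetical, and the sketch assumes var is given without its leading "?".

```python
from dataclasses import dataclass

DESCRIBE = "DESCRIBE"  # keyword meaning "describe each resource with SPARQL DESCRIBE"


@dataclass(frozen=True)
class ResultSetQuery:
    """Hypothetical container for the query(select, var, description) template."""

    select: str       # SPARQL SELECT that finds (and optionally orders) the resources
    var: str          # name of the SELECT variable bound to each matching resource
    description: str  # either "DESCRIBE" or a SPARQL ConstructTemplate

    def __post_init__(self):
        # Assumption: var is written without "?", so we look for "?var" in the query.
        if f"?{self.var}" not in self.select:
            raise ValueError(f"variable ?{self.var} does not appear in the select query")

    def uses_describe(self) -> bool:
        """True when each resource should be fetched with a DESCRIBE operation."""
        return self.description.strip().upper() == DESCRIBE


query = ResultSetQuery(
    select="SELECT ?item WHERE { ?item a <https://example.com/Thing> } ORDER BY ?item",
    var="item",
    description=DESCRIBE,
)
print(query.uses_describe())  # True
```

The early check on var is a design choice for the sketch: it catches a mismatched variable name before any query is sent to the server.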

Returning the results

To return results we provide two abstractions: a ResultSet and a ResultSetWindow.

A ResultSet:

is identified by a URI
has RDF metadata to describe that URI (the dataset operated on, the query run, when it ran etc)
can be used to open a ResultSetWindow
openWindow(ResultSet, {start, {end}})

A ResultSetWindow:

is also identified by a URI and identifies:
o an ordered list of resources
o an RDF graph containing at least the descriptions of the resources within the window
o a flag to indicate if the window reaches to the end of the ResultSet
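The two abstractions can be sketched roughly as follows. This is an in-memory Python toy under stated assumptions, not our Java API: the names open_window, resources and complete are made up for illustration, the window end is assumed to be exclusive and clipped to the number of results, and the RDF graph of descriptions is left out.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ResultSetWindow:
    uri: str
    start: int
    end: int
    resources: List[str]  # ordered resource URIs inside the window
    complete: bool        # True when the window reaches the end of the result set
    # The RDF graph holding the resource descriptions is omitted from this sketch.


class ResultSet:
    """Hypothetical in-memory result set; a real server may evaluate lazily."""

    def __init__(self, uri: str, resources: List[str]):
        self.uri = uri
        self._resources = resources

    def open_window(self, start: int = 0, end: Optional[int] = None) -> ResultSetWindow:
        total = len(self._resources)
        # Assumption: "end" is exclusive and is clipped to the number of results.
        stop = total if end is None else min(end, total)
        begin = min(start, stop)
        suffix = f"{start}" if end is None else f"{start}-{end}"
        return ResultSetWindow(
            uri=f"{self.uri}/{suffix}",
            start=begin,
            end=stop,
            resources=self._resources[begin:stop],
            complete=stop >= total,
        )


rs = ResultSet(
    "https://example.com/dataset/resultsetNNN",
    [f"https://example.com/dataset/r{i}" for i in range(15)],
)
window = rs.open_window(0, 20)
print(window.start, window.end, window.complete)  # 0 15 True
```

Asking for a window of 0-20 over 15 results gives a window that ends at 15 and is flagged as complete, which is how a client knows there is nothing more to page through.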

Having a first class representation of the whole result set allows us to pass it around, annotate it, share it, without having to copy the actual results. It is up to the server to decide how eager/lazy to be on evaluation and what caching (if any) to do.

Having a window allows us to probe and page through inconveniently large result sets. If a client opens a window over the whole set, or pages through in sequential order, then the server can still stream results. If the client opens windows out of order, however, the server has to be prepared to reissue the query with a LIMIT/OFFSET or rewind the results.
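For the out-of-order case, reissuing the query amounts to adding a LIMIT and OFFSET to the original select. A minimal Python sketch, assuming the window end is exclusive and the incoming query has no LIMIT or OFFSET of its own (the function name windowed_select is hypothetical):

```python
def windowed_select(select: str, start: int, end: int) -> str:
    """Return the SELECT restricted to rows [start, end) using LIMIT/OFFSET.

    Assumptions: "end" is exclusive, and the incoming query has no LIMIT or
    OFFSET clause already. Stable paging also needs an ORDER BY in the query.
    """
    if start < 0 or end < start:
        raise ValueError("expected 0 <= start <= end")
    return f"{select.rstrip()}\nLIMIT {end - start}\nOFFSET {start}"


base = "SELECT ?item WHERE { ?item a <https://example.com/Thing> } ORDER BY ?item"
print(windowed_select(base, 20, 40))
```

Without an ORDER BY in the select, the rows a store returns for a given OFFSET are not guaranteed to be the same from one request to the next, so windows could overlap or skip results.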

Packaging up as a RESTful API

So far we’ve been talking abstractly. As well as offering a Java API for this query interface, we want to use it in a RESTful web service setting.

The query endpoint is simple, supporting GET (or POST for large queries):

https://example.com/dataset?query={qstring}&var={name}&description={descstring}
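The query string and description need URL-encoding before they go into that template. A small Python sketch using only the standard library; the function name query_url and the example query are illustrative:

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def query_url(endpoint: str, select: str, var: str, description: str) -> str:
    """Build the GET URL for the query endpoint, percent-encoding each parameter."""
    params = {"query": select, "var": var, "description": description}
    return f"{endpoint}?{urlencode(params)}"


select = "SELECT ?item WHERE { ?item ?p ?o } ORDER BY ?item"
url = query_url("https://example.com/dataset", select, "item", "DESCRIBE")
print(url)

# Round trip: decoding the query string gives back the original parameters.
decoded = parse_qs(urlsplit(url).query)
print(decoded["var"], decoded["description"])
```

For queries too long for a URL, the same three parameters would go in a POST body instead.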

The returned document representation is RDF (in RDF/XML, Turtle or a JSON encoding) describing the result set:

<https://example.com/dataset/resultsetNNN> a rs:ResultSet;
     dc:date "2009-12-15T09:32:42Z"^^xsd:dateTime;
     rs:query qstring;
     rs:var name;
     rs:description descstring;
     rs:firstWindow <https://example.com/dataset/resultsetNNN/0-20>;
     ... optional statistics or other metadata ... .

If the client requests HTML then it gets a rendering of this, which includes clickable links for browsing to the first window of results.

The client can then open a window onto the ResultSet by appending a window description to the returned result set URI:

https://example.com/dataset/resultsetNNN/{start{-end}}

or can follow its nose down the rs:firstWindow reference.
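Building and unpicking window URIs of this shape is simple string handling. A Python sketch, where the helper names window_uri and parse_window_uri are made up for illustration and a missing end is taken to mean "to the end of the result set":

```python
from typing import Optional, Tuple


def window_uri(resultset_uri: str, start: int, end: Optional[int] = None) -> str:
    """Append a {start{-end}} window description to a result set URI."""
    suffix = str(start) if end is None else f"{start}-{end}"
    return f"{resultset_uri.rstrip('/')}/{suffix}"


def parse_window_uri(uri: str) -> Tuple[str, int, Optional[int]]:
    """Split a window URI back into (result set URI, start, end or None)."""
    resultset_uri, _, suffix = uri.rpartition("/")
    start_text, sep, end_text = suffix.partition("-")
    return resultset_uri, int(start_text), (int(end_text) if sep else None)


uri = window_uri("https://example.com/dataset/resultsetNNN", 0, 20)
print(uri)                    # https://example.com/dataset/resultsetNNN/0-20
print(parse_window_uri(uri))  # ('https://example.com/dataset/resultsetNNN', 0, 20)
```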

A GET on this window URI requesting a JSON encoding is easy. You get a wrapper something like:

{
    "id" : "https://example.com/dataset/resultsetNNN/0-20",
    "resultset" : "https://example.com/dataset/resultsetNNN",
    "windowStart" : 0,
    "windowEnd" : 15,
    "complete"  : true,
    "results" : [
         {
             "id" : "https://example.com/dataset/someresourceURI",
             "https://example.com/somepropertyURI" : "some property value",
             ...
         }
         ...
    ]
}

So the array of results provides the required ordering and top-level resource list, and the RDF descriptions are rendered inline as JSON structures.
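On the client side, unpacking such a wrapper might look like the Python sketch below. The sample document is invented for illustration (two resources with differently shaped descriptions), and the helper next_window_start assumes that the next window simply starts at windowEnd.

```python
import json
from typing import Optional

# Hypothetical window document following the wrapper shape shown above.
document = """
{
    "id": "https://example.com/dataset/resultsetNNN/0-20",
    "resultset": "https://example.com/dataset/resultsetNNN",
    "windowStart": 0,
    "windowEnd": 2,
    "complete": true,
    "results": [
        {"id": "https://example.com/dataset/a",
         "https://example.com/name": "Alpha"},
        {"id": "https://example.com/dataset/b",
         "https://example.com/name": "Beta",
         "https://example.com/note": "extra property"}
    ]
}
"""


def next_window_start(window: dict) -> Optional[int]:
    """Start index for the following window, or None if this one is the last.

    Assumption: the next window begins where this one ended.
    """
    return None if window["complete"] else window["windowEnd"]


window = json.loads(document)

# The array order is the result ordering, so a plain list preserves it.
ordered_ids = [result["id"] for result in window["results"]]

# Each description keeps whatever properties that resource happens to have.
descriptions = {
    result["id"]: {key: value for key, value in result.items() if key != "id"}
    for result in window["results"]
}

print(ordered_ids)
print(next_window_start(window))  # None
```

Note that the two descriptions do not have the same properties, which is exactly the non-uniform shape a plain SELECT could not return.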

If you request an RDF encoding then you get a graph back which contains the metadata trail, allowing the client to unpick the result list:

<https://example.com/dataset/resultsetNNN> a rs:ResultSet;
    rs:window <https://example.com/dataset/resultsetNNN/0-20> .

<https://example.com/dataset/resultsetNNN/0-20> a rs:ResultSetWindow;
    rs:windowStart 0;
    rs:windowEnd 15;
    rs:complete true;
    rs:results (
        <https://example.com/dataset/someresourceURI> ... ) .

<https://example.com/dataset/someresourceURI> ... .

So there you go. Some quite minor wrappers round existing technology but it’s a pattern that worked for us.
