An RDF to JSON converter on Google App Engine

As part of our work on the SDX project we felt one requirement was to enable typical web developers to exploit RDF linked data in a simpler, more developer-friendly form. In particular, it would be nice to be able to access an RDF model of a resource as JSON, to simplify using it from Ajax clients.

There are, in fact, quite a lot of proposed RDF-in-JSON formats out there. Some, such as the Talis format, are quite widely used, but that one is aimed at faithful transport of the whole RDF model rather than at a developer-friendly rendering.
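To illustrate the difference (this example is mine, not from the formats' specs, and the property URI is made up): a transport-oriented format like Talis's RDF/JSON preserves the full triple structure, so a single title statement comes out something like:

```json
{
  "http://example.org/book/1": {
    "http://purl.org/dc/terms/title": [
      { "type": "literal", "value": "Example Book" }
    ]
  }
}
```

A developer-friendly rendering, by contrast, would aim for something closer to `{ "title": "Example Book" }`, at the cost of needing extra context (such as an ontology mapping URIs to short names) to round-trip the full model.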

As it happens, at the same time we started looking at this, Jeni Tennison was thinking along similar lines, motivated by the same use case of developer access to eGov linked data sets. Following a discussion on the linked data mailing list, Jeni, Mark Birbeck (author of the best-matching existing JSON format, RDFj) and I have started up a project to agree a definition for such a translation.

The design choices and discussions can be found on the associated Google Code wiki: https://code.google.com/p/linked-data-api/wiki/Overview

Implementation

While the design choices are not finalized, we felt it would be useful to have an implementation of an RDF/JSON encode/decode library and a working web endpoint. That way we, and others, can experiment with what a JSON rendering of sample data looks like and use that to drive the design choices. The translator is implemented in Java, built on top of Jena, and supports:

  • encoding an ordered list of RDF resources from a single graph, with or without any resources they reference
  • encoding an entire RDF graph
  • encoding a DataSet (a default graph plus a set of named graphs)
  • customizing the coding by supplying an ontology
  • decoding back to an RDF graph (plus ordered list of root resources) or to a DataSet

Encoding can optionally be done relative to a base URI, in which case resource references that are extensions of that base will be encoded as relative URIs. The initial set of design choices the code implements is described at: https://code.google.com/p/linked-data-api/wiki/DI_expimental_translator#Design_choices
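The relative-URI behaviour is the same as the JDK's own URI resolution machinery, so it can be sketched with `java.net.URI` (the base and resource URIs here are invented for illustration; the translator's actual API is not shown):

```java
import java.net.URI;

public class RelativeUriDemo {
    public static void main(String[] args) {
        // Hypothetical base URI supplied to the encoder
        URI base = URI.create("http://example.org/data/");

        // A resource reference that extends the base encodes as a relative URI
        URI inside = URI.create("http://example.org/data/cambridge");
        System.out.println(base.relativize(inside));   // cambridge

        // A reference outside the base stays absolute
        URI outside = URI.create("http://other.org/thing");
        System.out.println(base.relativize(outside));  // http://other.org/thing
    }
}
```

The benefit is purely cosmetic but real: the JSON a developer sees contains short keys like `cambridge` rather than full URIs everywhere.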

The implementation is available from: https://code.google.com/p/linked-data-api/downloads/list.

Web endpoint

To make the implementation available as a web endpoint we wanted to experiment with using Google App Engine for hosting. Since I’m new to GAE I’ll record what was needed in case others are interested.

Step 0 is to get a Google App Engine account. Be warned that this requires validation via text to a mobile phone and any given phone number can only be used for a single GAE account. If, like me, you want to have both a personal and a work account you may get burned by this.

As an Eclipse user, step 1 was to download the GAE plugin for Eclipse. Use the update site: https://dl.google.com/eclipse/plugin/3.5

This lets you create a new type of project, a Web Application Project, in which you then develop the servlet and associated HTML files in the normal way. With the GAE plugin you can run the application locally to test whether the code meets the GAE restrictions.

Jena 2.6.2, and the included ARQ, had some minor infelicities that caused them to barf under GAE. Fixes for these have been checked in, but it does mean that if you want to run under GAE you should rebuild Jena and ARQ from the public sources until the next release is done.

The second problem is that you can't write to files in GAE, which affects logging. Jena uses log4j, and the GAE plugin automatically creates a log4j.properties file defining a console appender. Since GAE automatically logs the stdout/stderr output this works fine, except that you need to tell Jena to use that appender. A simple brute-force way to do that is to set the root logger, so that the head of log4j.properties will look like:

# Configure the console as our one appender
log4j.appender.A1=org.apache.log4j.ConsoleAppender
log4j.appender.A1.layout=org.apache.log4j.PatternLayout
log4j.appender.A1.layout.ConversionPattern=%d{HH:mm:ss,SSS} %-5p [%c] - %m%n

# Manually added line to force use of the Console appender
log4j.rootLogger=ERROR, A1

where the last line is the only one I needed to add. Choice of logging level depends on your requirements, obviously.

With these small fixes the webapp worked locally just fine, and uploading to GAE is very easy. The Eclipse plugin gives you a button for doing this. Just edit the WEB-INF/appengine-web.xml file to give the application the name of the application you created at step 0, then click the upload button and supply your username and password. The webapp comes live very quickly.

This results in the simple web application at: [Bad link]

The servlet supporting this also implements a GET interface to enable it to act as a proxy onto public linked data. So, for example, the following will fetch the DBPedia description of Cambridge, convert it to JSON and return a wrapped array of resources with the first entry corresponding to https://dbpedia.org/resource/Cambridge and the remainder encoding all referenced resources that are included in the same RDF graph:

[Bad link]
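One practical detail when calling such a proxy (this sketch is mine, and says nothing about the endpoint's actual query parameter names, which aren't recoverable from the broken link above): the target resource URI has to be percent-encoded before it can be passed as a query parameter value. With the JDK that is just:

```java
import java.net.URLEncoder;

public class EncodeDemo {
    public static void main(String[] args) throws Exception {
        // The DBPedia resource from the example above
        String resource = "https://dbpedia.org/resource/Cambridge";

        // Percent-encode it for use as a query parameter value
        String encoded = URLEncoder.encode(resource, "UTF-8");
        System.out.println(encoded);
        // https%3A%2F%2Fdbpedia.org%2Fresource%2FCambridge
    }
}
```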

Quite a painless and cheap way of putting up public web apps.

The primary limitation (other than paying if you get very heavy usage!) is that GAE is limited to 30s response time. So in proxy mode if the site you are proxying takes too long to respond, or tries to deliver too much data, the endpoint will time out.
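One mitigation (my suggestion, not something the original endpoint necessarily does) is to set explicit timeouts on the proxy's outbound fetch, so it fails fast with a useful error rather than being killed by the platform at the 30s deadline. With plain `HttpURLConnection`:

```java
import java.net.HttpURLConnection;
import java.net.URL;

public class TimeoutDemo {
    public static void main(String[] args) throws Exception {
        // openConnection() does not touch the network yet
        HttpURLConnection conn = (HttpURLConnection)
                new URL("http://dbpedia.org/resource/Cambridge").openConnection();

        // Keep the outbound fetch well under GAE's 30s request deadline
        conn.setConnectTimeout(5000);   // ms to establish the connection
        conn.setReadTimeout(20000);     // ms to wait for data

        System.out.println(conn.getReadTimeout()); // 20000
    }
}
```

A read timeout still won't help if the remote site streams an enormous graph slowly but steadily, so capping the amount of data read is a separate concern.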


DAVE REYNOLDS

CTO