Ontologies for government data
For data to be useful it has to be structured in a way that is meaningful and fit for purpose. The key to this is vocabularies or ontologies which describe what data is about and the properties that can be used to characterize and link it.
This gives the data a strong but flexible foundation that can adapt as the data grows and the needs of the organisation evolve.
Having standard ontologies that can be reused helps both data publishers and data users. They make it possible to compare, connect and integrate data from across multiple sources safely.
In the early stages of publishing government linked data it became apparent that certain types of data were coming up again and again:
- A lot of government data (indeed a lot of data in general) takes the form of tables or cubes - collections of measurements or statistics organised by a set of dimensions (such as when they were taken and what area they apply to).
- We very often need to refer to organisations - not just to describe how the organisations are structured but to be able to link data to the organisations that collect, license and publish it.
Yet in both cases there were no good standard ontologies available and this was holding back the process of opening up government data to full 5-star quality.
Sponsored by The National Archives, we set about solving these issues.
Client: Initially sponsored by The National Archives
Our Role: Develop ontologies, carry through to international standards
To develop ontologies for representing data cubes (statistics, measurements, expenditures) and for representing organisations, then build them into international standards.
Data Cube Vocabulary
People often think of linked data as a way to connect descriptions of things into a graph of relationships, for example to link a school to the organization that oversees it. The Data Cube Vocabulary was developed to deal with data that is naturally presented as tables and charts. This is useful for a huge range of data, from official statistics, through payments and expenditures, to sensor measurements and environmental quality assessments.
At the heart of the vocabulary is the concept of a dataset, which is a collection of observations. The observations are organized along a set of dimensions (e.g. time, geographic region) and each observation has one or more associated measurements (e.g. population or air quality classification). To reliably interpret the measurements you may need other information, such as the units of measurement or the measurement process used. These annotations are termed attributes.
All the information defining the structure of the dataset is given in a data structure definition, which means that RDF Data Cubes are self-describing. From an observation you can find the enclosing dataset and the corresponding definition of its structure.
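The self-describing structure can be illustrated with a minimal Python sketch that holds a cube as plain subject-predicate-object triples. The `qb:` and SDMX property names come from the published vocabularies. The `http://example.org/population/` namespace, the component names and the figures are assumptions made up for illustration, and a real system would normally use an RDF library rather than a set of tuples.

```python
# Namespaces of the published vocabularies.
QB = "http://purl.org/linked-data/cube#"
SDMX_DIM = "http://purl.org/linked-data/sdmx/2009/dimension#"
SDMX_MEASURE = "http://purl.org/linked-data/sdmx/2009/measure#"
SDMX_ATTR = "http://purl.org/linked-data/sdmx/2009/attribute#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
# Hypothetical namespace and figures, for illustration only.
EX = "http://example.org/population/"

TRIPLES = {
    # Data structure definition: which properties are dimensions,
    # measures and attributes.
    (EX + "dsd", RDF_TYPE, QB + "DataStructureDefinition"),
    (EX + "dsd", QB + "component", EX + "comp-area"),
    (EX + "dsd", QB + "component", EX + "comp-period"),
    (EX + "dsd", QB + "component", EX + "comp-value"),
    (EX + "dsd", QB + "component", EX + "comp-unit"),
    (EX + "comp-area", QB + "dimension", SDMX_DIM + "refArea"),
    (EX + "comp-period", QB + "dimension", SDMX_DIM + "refPeriod"),
    (EX + "comp-value", QB + "measure", SDMX_MEASURE + "obsValue"),
    (EX + "comp-unit", QB + "attribute", SDMX_ATTR + "unitMeasure"),
    # The dataset points at its structure definition.
    (EX + "dataset", RDF_TYPE, QB + "DataSet"),
    (EX + "dataset", QB + "structure", EX + "dsd"),
    # One observation: population of an area in a given year.
    (EX + "obs1", RDF_TYPE, QB + "Observation"),
    (EX + "obs1", QB + "dataSet", EX + "dataset"),
    (EX + "obs1", SDMX_DIM + "refArea", EX + "area/north"),
    (EX + "obs1", SDMX_DIM + "refPeriod", "2010"),
    (EX + "obs1", SDMX_MEASURE + "obsValue", 1200),
    (EX + "obs1", SDMX_ATTR + "unitMeasure", EX + "unit/persons"),
}


def objects(subject, predicate):
    """Return all objects of triples matching (subject, predicate, ?)."""
    return sorted(o for s, p, o in TRIPLES if s == subject and p == predicate)


def component_roles(observation):
    """Walk observation -> dataset -> structure definition and return a
    mapping from each component property to its role."""
    roles = {}
    for dataset in objects(observation, QB + "dataSet"):
        for dsd in objects(dataset, QB + "structure"):
            for component in objects(dsd, QB + "component"):
                for role in ("dimension", "measure", "attribute"):
                    for prop in objects(component, QB + role):
                        roles[prop] = role
    return roles


def interpret(observation):
    """Group the observation's values by the role its structure assigns."""
    roles = component_roles(observation)
    result = {"dimension": {}, "measure": {}, "attribute": {}}
    for s, p, o in TRIPLES:
        if s == observation and p in roles:
            result[roles[p]][p] = o
    return result


if __name__ == "__main__":
    for role, values in interpret(EX + "obs1").items():
        print(role, values)
```

Starting from nothing but the observation's URI, `interpret` recovers which of its values are dimensions, measures and attributes, which is the sense in which the cube describes itself.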
The vocabulary was first created in 2010 under the initiative and sponsorship of John Sheridan (The National Archives) and developed by Dave Reynolds (Epimorphics), Richard Cyganiak (DERI) and Jeni Tennison (now ODI) with advice and support from Arofan Gregory. It builds upon related standards, particularly the SDMX information model used for statistical data.
Dave led the vocabulary through the W3C Recommendation process and it became a formal Recommendation in 2014.
It has proved hugely successful, with a wide variety of implementations: Data Cube has been used to publish statistical, environmental and observational measures, and there are tools for data conversion, data validation and visualization. At Epimorphics we used it routinely across many projects.
One common place where governments and local authorities publish tabular data is the area of finance.
Working with the Local eGovernment Standards Body (LeGSB now iStandUK) we developed a Payments Ontology for local government expenditure data. This builds upon the Data Cube vocabulary and customizes it for representing payments.
One of the challenges in modelling payments information is that every publisher has a different way of organizing their expenditure analysis – how they group and classify the different expenditures and relate them to both budget and services.
A great feature of linked data, and of the open world modelling technique it is based on, is that such questions can be left somewhat open. The ontology allows for an open-ended set of analysis schemes, each self-describing and organized using the SKOS approach to knowledge organization.
This gives local authorities the flexibility to publish their analysis in the form that is meaningful to them, while also mapping it to shared analysis schemes to allow benchmarking and cross-organisation comparisons.
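As a rough illustration of mapping a local analysis scheme onto a shared one, the Python sketch below rolls payments up through SKOS mapping links. `skos:exactMatch` and `skos:broadMatch` are real SKOS properties. The category URIs, the amounts and the function name are hypothetical, and the sketch does not use the Payments Ontology's own terms.

```python
from collections import defaultdict

SKOS = "http://www.w3.org/2004/02/skos/core#"
# Hypothetical concept schemes: one local to a council, one shared.
LOCAL = "http://example.org/council-a/category/"
SHARED = "http://example.org/shared/category/"

# Mapping links from local categories to shared ones.
MAPPINGS = {
    (LOCAL + "pothole-repairs", SKOS + "broadMatch", SHARED + "highways"),
    (LOCAL + "street-lighting", SKOS + "broadMatch", SHARED + "highways"),
    (LOCAL + "libraries", SKOS + "exactMatch", SHARED + "libraries"),
}

# Payments classified only by the council's own scheme
# (amounts in whole pounds, invented for illustration).
PAYMENTS = [
    (LOCAL + "pothole-repairs", 1200),
    (LOCAL + "street-lighting", 800),
    (LOCAL + "libraries", 500),
    (LOCAL + "civic-events", 300),  # deliberately has no shared mapping
]

MAPPING_PROPERTIES = (SKOS + "exactMatch", SKOS + "broadMatch")


def totals_by_shared_category(payments, mappings):
    """Sum payments per shared category; unmapped payments are kept
    under the key None rather than dropped."""
    lookup = {s: o for s, p, o in mappings if p in MAPPING_PROPERTIES}
    totals = defaultdict(int)
    for category, amount in payments:
        totals[lookup.get(category)] += amount
    return dict(totals)


if __name__ == "__main__":
    for category, total in totals_by_shared_category(PAYMENTS, MAPPINGS).items():
        print(category, total)
```

The local categories stay exactly as the council defined them. Comparison across organisations only needs the mapping links, and a category without a mapping remains visible instead of being forced into the shared scheme.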
Organization Ontology: ORG
Much information, both public sector and within enterprises, relates to organisations and organisational units. Yet there didn’t seem to be a satisfactory common vocabulary for presenting such information in linked data. Sponsored by The National Archives we developed a solution to this in the form of the Organization Ontology (ORG for short).
In any such undertaking we first have to be clear on the requirements. Our analysis identified the need for a very lightweight, highly reusable ontology that did not try to model particular organisational structures. We also wanted to make sure the ontology connected to and reused other commonly used linked data vocabularies such as FOAF. Since a survey failed to turn up any fully satisfactory candidates, we developed a new ontology based on the lessons from the survey and community feedback.
The end result is a simple core that can represent organisations, organisational units and people’s roles within organisations, while being very extensible. It is openly usable and has been exploited by a number of groups including data.gov.uk, which extended the ontology to cope with the rich mysteries of the UK government structures and used it to publish organogram information for all government departments.
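The core can be sketched in a few lines of Python using plain triples. The `org:` classes and properties used here (`org:Organization`, `org:OrganizationalUnit`, `org:unitOf`, `org:subOrganizationOf`, `org:Membership`, `org:member`, `org:organization`, `org:role`) are from the published ontology. The department, unit, person and role URIs are hypothetical, and the helper functions are only one assumed way of querying such data.

```python
ORG = "http://www.w3.org/ns/org#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
# Hypothetical namespace for an invented department.
EX = "http://example.org/gov/"

TRIPLES = {
    (EX + "dept", RDF_TYPE, ORG + "Organization"),
    (EX + "dept/finance", RDF_TYPE, ORG + "OrganizationalUnit"),
    (EX + "dept/finance", ORG + "unitOf", EX + "dept"),
    (EX + "dept/finance/payroll", RDF_TYPE, ORG + "OrganizationalUnit"),
    (EX + "dept/finance/payroll", ORG + "subOrganizationOf", EX + "dept/finance"),
    # A membership ties a person, an organisation and a role together.
    (EX + "membership/1", RDF_TYPE, ORG + "Membership"),
    (EX + "membership/1", ORG + "member", EX + "person/alice"),
    (EX + "membership/1", ORG + "organization", EX + "dept/finance/payroll"),
    (EX + "membership/1", ORG + "role", EX + "role/analyst"),
}

PARENT_PROPERTIES = (ORG + "unitOf", ORG + "subOrganizationOf")


def value(subject, predicate):
    """Return one object for (subject, predicate, ?), or None if absent."""
    matches = sorted(o for s, p, o in TRIPLES if s == subject and p == predicate)
    return matches[0] if matches else None


def chain_to_top(unit):
    """List the unit followed by each enclosing organisation."""
    chain = [unit]
    while True:
        parent = None
        for prop in PARENT_PROPERTIES:
            parent = parent or value(chain[-1], prop)
        if parent is None or parent in chain:  # stop at the top or on a cycle
            return chain
        chain.append(parent)


def roles_of(person):
    """Return (role, organisation) pairs for every membership of the person."""
    memberships = sorted(
        s for s, p, o in TRIPLES if p == ORG + "member" and o == person
    )
    return [
        (value(m, ORG + "role"), value(m, ORG + "organization"))
        for m in memberships
    ]


if __name__ == "__main__":
    print(chain_to_top(EX + "dept/finance/payroll"))
    print(roles_of(EX + "person/alice"))
```

Nothing in the core fixes how many levels a hierarchy has or which roles exist, which is what leaves room for extensions such as those needed for organograms.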
As with the Data Cube Vocabulary, this proved to be a sufficiently common need that W3C sought to standardize it. We successfully took ORG through the W3C process and it became a formal W3C Recommendation in 2014.
- Reusable ontologies help data publishers, data users and service providers – providing common patterns that enable data to be compared, connected and interpreted.
- We have developed a number of such ontologies and have taken two key ones through to becoming internationally used standards (W3C Recommendations):
  - The Data Cube Vocabulary, which is useful for any sort of tabular data from government statistics, through financial data, to sensor measurements.
  - The Organization Ontology, which enables information on organisations and their structure to be published and linked to.
- The processes that we used for these highly successful ontologies have stood us in good stead when developing ontologies for other customers.