Data Cube ontology (W3C)

People often think of linked data as a way to connect descriptions of things into a graph of relationships, for example linking a school to the organization that oversees it. The Data Cube Vocabulary was developed to handle data that is naturally presented as tables and charts. This covers a huge range of data, from official statistics, through payments and expenditures, to sensor measurements and environmental quality assessments.

At the heart of the vocabulary is the concept of a dataset, which is a collection of observations. The observations are organized along a set of dimensions (e.g. time, geographic region), and each observation has one or more associated measurements, termed measures (e.g. population or an air quality classification). To interpret the measurements reliably you may need other information, such as the units of measurement or the measurement process used; these annotations are termed attributes.
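To make the dataset/observation/dimension/measure/attribute split concrete, here is a minimal, dependency-free Python sketch that writes one observation as plain (subject, predicate, object) triples. The `qb:` terms (`qb:Observation`, `qb:dataSet`) come from the W3C Data Cube Vocabulary namespace; everything under `http://example.org/` (the dataset, `refPeriod`, `refArea`, `population`, `unitMeasure`) and the population figure are hypothetical, chosen only for illustration, and the helper `observation_triples` is likewise a made-up name.

```python
# Minimal sketch: one Data Cube observation expressed as RDF triples.
# qb: terms are from the W3C Data Cube Vocabulary; everything under EX
# (dataset, dimension, measure and attribute properties) is hypothetical.
QB = "http://purl.org/linked-data/cube#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/"


def observation_triples(obs, dataset, dimensions, measures, attributes):
    """Return (subject, predicate, object) triples for a single qb:Observation.

    dimensions, measures and attributes map property IRIs to values; in RDF
    all three are simply properties attached to the observation resource.
    """
    triples = [
        (obs, RDF_TYPE, QB + "Observation"),
        (obs, QB + "dataSet", dataset),  # link back to the enclosing dataset
    ]
    for group in (dimensions, measures, attributes):
        for prop, value in group.items():
            triples.append((obs, prop, value))
    return triples


triples = observation_triples(
    obs=EX + "obs/2011-W1",
    dataset=EX + "dataset/population",
    dimensions={EX + "refPeriod": "2011", EX + "refArea": EX + "area/W1"},
    measures={EX + "population": 12345},  # illustrative value only
    attributes={EX + "unitMeasure": EX + "unit/persons"},
)
for t in triples:
    print(t)
```

Note that nothing in these observation triples says which property plays which role; in Data Cube that information lives in the dataset's structure definition.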

All the information defining the structure of the dataset is given in a data structure definition, which means that RDF Data Cubes are self-describing. From an observation you can find the enclosing dataset and the corresponding definition of its structure.
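The self-describing navigation can be sketched in Python over a tiny in-memory graph: follow `qb:dataSet` from the observation to its dataset, `qb:structure` from the dataset to its data structure definition, then `qb:component` to each component specification, which names its property via `qb:dimension`, `qb:measure` or `qb:attribute`. Those `qb:` properties are from the W3C vocabulary; the example resources and the helpers `objects` and `describe_structure` are hypothetical, and a set of tuples stands in for a real triple store purely to keep the sketch self-contained.

```python
# Minimal sketch: following links from an observation to the data structure
# definition (DSD) that describes it. qb: terms are from the W3C Data Cube
# Vocabulary; all EX resources are hypothetical.
QB = "http://purl.org/linked-data/cube#"
EX = "http://example.org/"

# A tiny in-memory graph as a set of (subject, predicate, object) triples.
GRAPH = {
    (EX + "obs1", QB + "dataSet", EX + "dataset/population"),
    (EX + "dataset/population", QB + "structure", EX + "dsd/population"),
    (EX + "dsd/population", QB + "component", EX + "dsd/population/c1"),
    (EX + "dsd/population", QB + "component", EX + "dsd/population/c2"),
    (EX + "dsd/population", QB + "component", EX + "dsd/population/c3"),
    (EX + "dsd/population", QB + "component", EX + "dsd/population/c4"),
    (EX + "dsd/population/c1", QB + "dimension", EX + "refPeriod"),
    (EX + "dsd/population/c2", QB + "dimension", EX + "refArea"),
    (EX + "dsd/population/c3", QB + "measure", EX + "population"),
    (EX + "dsd/population/c4", QB + "attribute", EX + "unitMeasure"),
}


def objects(subject, predicate, graph=GRAPH):
    """All objects o such that (subject, predicate, o) is in the graph, sorted."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)


def describe_structure(obs, graph=GRAPH):
    """Starting from an observation, map each component role to its properties."""
    dataset = objects(obs, QB + "dataSet", graph)[0]      # observation -> dataset
    dsd = objects(dataset, QB + "structure", graph)[0]    # dataset -> DSD
    roles = {"dimension": [], "measure": [], "attribute": []}
    for component in objects(dsd, QB + "component", graph):
        for role in roles:  # qb:dimension, qb:measure, qb:attribute
            roles[role].extend(objects(component, QB + role, graph))
    return roles


print(describe_structure(EX + "obs1"))
```

In practice the same walk would more likely be a SPARQL query against a triple store; the hand-rolled lookup here only shows which links make the cube self-describing.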

The vocabulary was first created in 2010 under the initiative and sponsorship of John Sheridan (The National Archives) and developed by Dave Reynolds (Epimorphics), Richard Cyganiak (DERI) and Jeni Tennison (now ODI), with advice and support from Arofan Gregory. It builds upon related standards, particularly the SDMX information model used for statistical data.

Dave led the vocabulary through the W3C Recommendation process and it became a formal Recommendation in 2014.

It has proved hugely successful, and there has been a wide variety of Data Cube implementations, ranging from publications of statistical, environmental and observational measures to tools for data conversion, data validation and visualization. At Epimorphics we used it routinely across many projects.