Introducing our reference data management platform service

We offer a number of services to support the UK public sector through GCloud13. This post highlights our cloud software, the Reference Data Management (Registry) Platform, and the associated Reference Data Publishing support service.

Our reference data management platform service enables organisations to manage a version-controlled hierarchy of reference terms, delegating authority to different groups in an open way that helps build trust. It is used for publishing registers of connective reference data with persistent URIs that support key data infrastructure, internal and external services, data integration and standards activities.

Service Description

A key to good data governance is the ability to develop, manage and use shared, system independent reference data that enables data to be reused and integrated across applications and between organisations. 

The Epimorphics Reference Data Management Platform provides a complete solution to this challenge:

  • Management of controlled sets of reference data. Each item of reference data is given a persistent, resolvable identifier (an HTTP URI) that meets the UK Government mandated standard for such identifiers.
  • An open data model that supports anything from flat collections of simple reference terms to taxonomies and thesauri, including the ability to publish cross-links between terms in different collections. The data model is based on W3C Semantic Web standards, but the platform provides for both update and access through simple CSV or JSON(-LD) formats, so the reference data can be published and used from systems with no support for Semantic Web standards.
  • Customisable lifecycle management of the reference data, including version management and history.
  • Role-based management of the reference data, including the ability to delegate management hierarchically and even to delegate sections of the reference data namespace to other systems.
  • Integrated search and browsing tools that make it easy to locate sets of terms, or individual terms, for reuse.
  • A full API to enable automated access to, and update of, the reference data from (suitably authorised) external systems.
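
To make the CSV and JSON(-LD) point concrete, here is a minimal sketch of how a flat code list held as CSV could be expressed as linked data, with each term given an HTTP URI. The base URI, the column names and the choice of SKOS properties are assumptions made for illustration; they are not taken from the platform's actual data model or import format.

```python
import csv
import io
import json

# Hypothetical register URI; a real deployment would use its own namespace.
BASE_URI = "http://example.org/def/colours"

# A tiny flat code list. The column names are assumptions for this sketch.
CSV_TEXT = """notation,label
R,Red
G,Green
"""


def csv_to_jsonld(csv_text, base_uri):
    """Turn each CSV row into a JSON-LD node identified by an HTTP URI."""
    rows = csv.DictReader(io.StringIO(csv_text))
    graph = [
        {
            "@id": f"{base_uri}/{row['notation']}",  # URI minted from the code
            "@type": "skos:Concept",
            "skos:notation": row["notation"],
            "skos:prefLabel": {"@value": row["label"], "@language": "en"},
        }
        for row in rows
    ]
    return {
        "@context": {"skos": "http://www.w3.org/2004/02/skos/core#"},
        "@graph": graph,
    }


doc = csv_to_jsonld(CSV_TEXT, BASE_URI)
print(json.dumps(doc, indent=2))
```

A consuming system with no Semantic Web tooling can treat the result as ordinary JSON, while linked-data-aware systems can interpret the same document through its context.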

Our Reference Data Manager platform builds upon the open source registry software that we developed. The platform and software are trusted and used internationally by organisations needing to manage their reference data. The platform provides a range of reference management tools and services, such as controlled, authoritative lists of identifiers published as URIs. This supports good data governance, data standards, data collaboration and data use.

Flow diagram, from left to right: box 1 (codelist / register change), arrow to box 2 (submission and access control), to box 3 (data storage and manage account), to box 4 (API and data access), to box 5 (Registry API). Underneath boxes 2 to 5 is the Monitoring box. A dotted line runs from boxes 4 and 5 to data users and applications (end user, data scientist and data manager).

The Epimorphics Reference Data Management Platform (Registry) has, at its heart, an open source implementation of the Linked Data Registry specification (developed under the auspices of the UK Government Linked Data Working Group). 

This provides for the creation and management of reference data through a well-defined API. 
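
As a rough picture of what managed reference data with a lifecycle and version history involves, the sketch below models a single register entry in memory. The status names and allowed transitions are illustrative assumptions only; they are not the Linked Data Registry specification's definitive list, and the class and method names are hypothetical.

```python
from dataclasses import dataclass, field

# Illustrative lifecycle: which status may follow which.
# This table is an assumption for the sketch, not the platform's configuration.
ALLOWED = {
    "submitted": {"stable", "invalid"},
    "stable": {"superseded", "retired"},
    "superseded": set(),
    "retired": set(),
    "invalid": set(),
}


@dataclass
class RegisterItem:
    uri: str
    label: str
    status: str = "submitted"
    history: list = field(default_factory=list)  # (version, label, status)

    def _snapshot(self):
        # Record the state being replaced, numbered from version 1.
        self.history.append((len(self.history) + 1, self.label, self.status))

    def update_label(self, label):
        self._snapshot()
        self.label = label

    def set_status(self, status):
        if status not in ALLOWED[self.status]:
            raise ValueError(f"{self.status} -> {status} is not allowed")
        self._snapshot()
        self.status = status


item = RegisterItem("http://example.org/def/colours/R", "Red")
item.set_status("stable")
item.update_label("Crimson")
print(item.status, item.label, item.history)
```

Every change keeps the previous state, so earlier versions of a term remain available after it is edited or retired.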

We wrap this core component with a customisable user interface that can be easily adapted to an alternative look and feel, or modified to present simplified interfaces for particular applications. Organisations use the Reference Data Management Platform to improve their enterprise data management by providing open, robust, web-accessible reference data. This has commonly formed part of improving information architecture and data governance, providing interoperable data for code lists, vocabularies and other data standards through persistent, web-accessible URIs. Often this includes the need for version control and for access by other systems using RESTful APIs for integration.
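
For a sense of what integration over a RESTful API might look like from a client system, the sketch below prepares an HTTP GET for a term's URI and asks for a particular format through the Accept header. The term URI is hypothetical, and the use of content negotiation is an assumption made for illustration rather than a documented feature of the platform's API. No request is actually sent.

```python
import urllib.request

# Standard media types for the two formats mentioned in this post.
MEDIA_TYPES = {
    "csv": "text/csv",
    "jsonld": "application/ld+json",
}


def build_term_request(term_uri, fmt):
    """Prepare (but do not send) a GET request for a term in a given format."""
    return urllib.request.Request(
        term_uri,
        headers={"Accept": MEDIA_TYPES[fmt]},
        method="GET",
    )


# Hypothetical persistent URI for a single reference term.
req = build_term_request("http://example.org/def/colours/R", "jsonld")
print(req.get_method(), req.full_url, req.get_header("Accept"))
```

Because the identifier is itself a web address, the client needs nothing beyond the URI and a preferred format to look a term up.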

Overview of architecture diagram. Dark blue boxes. Bottom to top: RDF store and Text index, Registry Logic (with customisable lifecycle note), API (authentication and Authorisation) and User roles database, User interface (with Customisable templates - look and feel and operations note), and Reverse proxy front end (with namespace delegation note)

For the backend storage of the reference data we use the open source Apache Jena TDB triple store and Apache Lucene text index. In many deployments the platform includes a reverse proxy front end that allows parts of the reference data namespace to be delegated to other systems.

This front end is driven by the registry logic itself so that delegating part of the namespace is as easy as creating a registration record specifying the target of the delegation. 
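
The idea of delegation records driving the front end can be sketched as a simple lookup: each record maps a namespace prefix to a target system, and an incoming path is forwarded to the target of the longest matching prefix. The record format, paths and target URLs below are hypothetical, and the real registry's record structure and proxy configuration may differ.

```python
# Hypothetical delegation records: namespace prefix -> target base URL.
DELEGATIONS = {
    "/def/geology": "https://geo.example.net/vocab",
    "/def/geology/rock": "https://rocks.example.net/terms",
}


def resolve(path, delegations):
    """Return the forwarded URL for a delegated path, or None if it is local."""
    best = None
    for prefix in delegations:
        # Match whole path segments only, so /def/geologyx is not delegated.
        if path == prefix or path.startswith(prefix + "/"):
            if best is None or len(prefix) > len(best):
                best = prefix
    if best is None:
        return None
    return delegations[best] + path[len(best):]


print(resolve("/def/geology/era/jurassic", DELEGATIONS))
print(resolve("/def/geology/rock/granite", DELEGATIONS))
print(resolve("/def/colours/R", DELEGATIONS))
```

Adding a delegation in this model means adding one record; no separate proxy rule has to be edited by hand.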

Example UK public sector GCloud users

This software platform powers the UK Environment registry and the Food Standards Agency Codelist repository, and has been used internationally for controlled management of vocabularies and reference data.

Screenshot of the Food Standards Agency data.food.gov.uk/codes landing page.

Through GCloud13 we offer the platform as a fully managed cloud-hosted service with a separate instance created for each customer. 

Other users

External users of the registry software include the Met Office, World Meteorological Organisation, BRGM (the French Geological Survey), US National Weather Service and CSIRO (Australia’s national science agency). For a number of these users we have supported adoption through training or through funded development of the open source code.

More information

For more information, contact us or see our GCloud Service Offerings or our Reference Data Manager page.