Bathing Water Quality: Structure of the Published Linked Data

This document describes the structure of the UK Bathing Water Quality linked data that we produced for the Environment Agency.

As part of our work with data.gov.uk and the UK Location Programme we have been working to pilot the publication of both current and historic bathing water quality information as linked data.  This data is available through SPARQL endpoints at data.gov.uk and TSO's OpenUPLabs and through data dumps (samples, sites, compliance_history).

The data covers the period up to the end of the 2010 season.

The Domain

The UK has a number of areas, typically beaches, that are designated as bathing waters where people routinely enter the water.  The Environment Agency monitors and reports on the quality of the water at these bathing waters.

For each bathing water there is a sampling point near which the water is sampled roughly once a week during the bathing season.  These samples are analysed and the water given a compliance classification of excellent, good or poor.

Data Structure

The data can be thought of as structured in 3 groups:

  • There is basic reference data describing the bathing waters and sampling points
  • There is a data set giving the rating for each bathing water for each year it has been monitored
  • There is a data set giving the detailed weekly sampling results for each bathing water

This data is represented in RDF using the following namespaces and prefixes:
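As a rough sketch of how the prefixed names used throughout this document expand to full URIs, the snippet below fills in only the well-known namespace URIs of the standard vocabularies. The project-specific prefixes (bw, bwq, loc-sp, ossr, ref) are deliberately not assumed, and the reading of geo as the W3C WGS84 vocabulary is itself an assumption. The names STANDARD_PREFIXES and expand are illustrative only.

```python
# Well-known namespace URIs for the standard vocabularies named in this document.
# Project-specific prefixes (bw, bwq, loc-sp, ossr, ref) are intentionally absent.
STANDARD_PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "dcterms": "http://purl.org/dc/terms/",
    "void": "http://rdfs.org/ns/void#",
    "qb": "http://purl.org/linked-data/cube#",
    # Assumption: geo:lat / geo:long refer to the W3C WGS84 vocabulary.
    "geo": "http://www.w3.org/2003/01/geo/wgs84_pos#",
}


def expand(curie, prefixes=STANDARD_PREFIXES):
    """Expand a prefixed name such as 'skos:prefLabel' to a full URI."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise KeyError(f"unknown prefix in {curie!r}")
    return prefixes[prefix] + local
```

A prefix that is not in the table raises KeyError rather than guessing a namespace.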

General Reference Data

Bathing Waters

Each bathing water is identified by two URIs of the form:

  1. http://environment.data.gov.uk/id/bathing-water/{EU bathing water id}
  2. http://environment.data.gov.uk/id/bathing-water/{name}

The first of these has a last segment based on an EU bathing water identifier.  The second has a last segment based on the name of the bathing water.
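A minimal sketch of building both URIs for one bathing water from the two templates above. The function name and the example values are hypothetical, and the rule for turning a bathing water name into a URI segment is not specified here, so the caller is assumed to supply a ready-made segment.

```python
from urllib.parse import quote

BATHING_WATER_BASE = "http://environment.data.gov.uk/id/bathing-water/"


def bathing_water_uris(eu_id, name_segment):
    """Return the (id-based, name-based) URI pair for one bathing water.

    Both arguments are percent-encoded so they are safe as a single path segment.
    """
    return (
        BATHING_WATER_BASE + quote(eu_id, safe=""),
        BATHING_WATER_BASE + quote(name_segment, safe=""),
    )


# Hypothetical identifier and name segment, for illustration only.
by_id, by_name = bathing_water_uris("ukx0000-00000", "example-beach")
```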

Each bathing water has the following properties:

rdf:type
  • the value of this property is bw:BathingWater
skos:prefLabel
  • the preferred name of the bathing water
rdfs:label
  • a name for the bathing water
skos:notation
  • the EU bathing water ID for the bathing water
  • the value of this property has a datatype of bw:eubwid
bw:eubwidNotation
  • the EU bathing water ID for the bathing water
  • this is a subproperty of skos:notation
  • the value of this property has a datatype of bw:eubwid
loc-sp:samplingPoint
  • the sampling point for this bathing water
ref:uriSet
  • the URI set this bathing water is a member of.
owl:sameAs
  • relates one URI for the bathing water to the other

Sampling Points

Each sampling point is identified by a URI of the form:

and has the following properties:

skos:prefLabel
  • the preferred human readable text label for representing the sampling point
rdfs:label
  • a human readable text label that may be used to represent the sampling point
skos:notation
  • a code identifying the sampling point
  • the value of this property has a datatype of loc-sp:samplePointCode
loc-sp:samplePointNotation
  • a code identifying the sampling point
  • this property is a subproperty of skos:notation
  • the value of this property has a datatype of loc-sp:samplePointCode
bw:bathingWater
  • the bathing water associated with this sampling point
geo:lat, geo:long
  • the latitude and longitude of the sampling point
  • the values of these properties have an xsd:decimal datatype
ossr:easting, ossr:northing
  • Ordnance Survey easting and northing coordinates of the sampling point
  • the values of these properties have an xsd:decimal datatype
ref:uriSet
  • the URI set that this sampling point is a member of.
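As an illustration of the sampling point properties listed above, the sketch below holds one sampling point in a small record type and keeps the coordinate values as Decimal, mirroring the xsd:decimal datatype. The class name, field names and example values are hypothetical.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class SamplingPoint:
    code: str          # skos:notation / loc-sp:samplePointNotation
    label: str         # skos:prefLabel
    lat: Decimal       # geo:lat
    long: Decimal      # geo:long
    easting: Decimal   # ossr:easting
    northing: Decimal  # ossr:northing


def sampling_point_from_literals(code, label, lat, long, easting, northing):
    """Build a SamplingPoint from the lexical (string) forms of the literals."""
    return SamplingPoint(
        code, label, Decimal(lat), Decimal(long), Decimal(easting), Decimal(northing)
    )


# Hypothetical values, for illustration only.
sp = sampling_point_from_literals("00000", "Example Beach", "50.1", "-5.2", "150000", "30000")
```

Parsing from strings rather than floats avoids introducing binary rounding error into the coordinates.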

Compliance Classification

Compliance classifications are modelled as RDF resources and described using the SKOS vocabulary.  They have URIs of the form:

They have the following properties:

rdf:type
  • skos:Concept and bwq:Compliance
rdfs:label
  • text that may be used to represent the concept
rdfs:isDefinedBy
  • identifies a resource defining the concept
skos:prefLabel
  • preferred text label for the concept
  • Welsh and English language variants are present
skos:inScheme
  • the SKOS concept scheme containing the concept; the value is bwq:compliance
skos:definition
  • a definition of the concept
skos:topConceptOf
  • indicates that this concept is a top level concept
  • value is bwq:compliance
skos:notation
bwq:complianceCodeNotation
dcterms:source
  • a resource from which the compliance codes were derived

The following compliance codes are defined:

bwq:G
  • excellent
bwq:I
  • good
bwq:F
  • poor
bwq:C
  • the bathing water was closed during the bathing season
bwq:N
  • not classified
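The code list above can be sketched as a simple lookup table. The dictionary and function names are illustrative only; the codes and their meanings are taken from the list above.

```python
# Compliance codes and their meanings, as listed above.
COMPLIANCE_CODES = {
    "G": "excellent",
    "I": "good",
    "F": "poor",
    "C": "closed during the bathing season",
    "N": "not classified",
}


def describe_compliance(curie):
    """Map a prefixed name such as 'bwq:G' to its meaning."""
    prefix, _, code = curie.partition(":")
    if prefix != "bwq" or code not in COMPLIANCE_CODES:
        raise ValueError(f"not a known compliance code: {curie!r}")
    return COMPLIANCE_CODES[code]
```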

 

Data Sets

There are two datasets: the annual compliance dataset and the samples dataset.  Each of these is modelled as an n-dimensional matrix using the data cube vocabulary.
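To make the matrix idea concrete, here is a small sketch that treats a two-dimensional cube as a dictionary keyed by its dimension values and derives slices by grouping on one dimension. The sampling point ids and cell values are invented for illustration, and slice_by is a hypothetical helper, not part of the published data.

```python
from collections import defaultdict

# Hypothetical cells: (year, sampling point id) -> compliance code.
cube = {
    (2009, "p1"): "G",
    (2009, "p2"): "I",
    (2010, "p1"): "G",
    (2010, "p2"): "F",
}


def slice_by(cells, axis):
    """Group cells by the value of one dimension (0 = year, 1 = sampling point).

    Each group holds every cell for which that dimension value is constant.
    """
    slices = defaultdict(dict)
    for key, value in cells.items():
        slices[key[axis]][key] = value
    return dict(slices)


by_year = slice_by(cube, 0)
by_point = slice_by(cube, 1)
```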

Annual Compliance Assessment Dataset

The annual compliance assessment dataset has two dimensions, the year and the sampling point.  The data set is identified by the URI:

The dataset resource has the following properties:

rdf:type
  • bwq:ComplianceDataset
  • void:Dataset
rdfs:label
  • text that may be used to identify the dataset
dcterms:description
  • a description of the dataset
dcterms:modified
  • the date and time the dataset was last modified
  • the value of this property has datatype xsd:dateTime
dcterms:license
  • the license under which the data has been made available
dcterms:source
  • a reference to the source from which the RDF data was obtained
void:vocabulary
  • reference to an RDFS or OWL vocabulary used in the dataset
  • there are multiple statements with this property
void:uriRegexPattern
  • a regular expression which describes the form of URIs identifying resources in the dataset
void:dataDump
  • a reference to a representation of all the data in the dataset
qb:structure
  • a reference to a resource that describes the structure of the dataset
  • see the data cube vocabulary documentation for information on the properties of a dataset description.
qb:slice
  • a reference to a subset of, or slice through, the dataset

 

Slices

The annual compliance assessment dataset has two kinds of slices or subsets of the dataset:

  • all the observations for a specific sampling point: http://environment.data.gov.uk/data/bathing-water-quality/compliance/slice/point/{id}
  • all the observations for a specific year: http://environment.data.gov.uk/data/bathing-water-quality/compliance/slice/year/{year}
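The two slice URI templates can be filled in as sketched below. The function names and the example sampling point id are hypothetical.

```python
COMPLIANCE_SLICE_BASE = (
    "http://environment.data.gov.uk/data/bathing-water-quality/compliance/slice/"
)


def point_slice_uri(point_id):
    """URI of the slice holding all observations for one sampling point."""
    return f"{COMPLIANCE_SLICE_BASE}point/{point_id}"


def year_slice_uri(year):
    """URI of the slice holding all observations for one year."""
    return f"{COMPLIANCE_SLICE_BASE}year/{int(year)}"
```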

Each slice is a resource with the following properties:

rdf:type
  • identifies the type of slice, one of
    • bwq:ComplianceBySamplingPointSlice
    • bwq:ComplianceByYearSlice
rdfs:label
  • a text label identifying the slice
qb:sliceStructure
qb:observation
  • references to observations in the slice
bwq:samplingPoint or bwq:sampleYear
  • specifies the dimension value that is constant for all observations in this slice

Annual Compliance Assessment Observations

The value of each cell in the dataset matrix is the compliance code for a given sampling point and year.  Following the data cube vocabulary model, each cell in the matrix is a resource identified by a URL of the form:

  • http://environment.data.gov.uk/data/bathing-water-quality/compliance/point/{sampling point id}/year/{year}
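Going the other way, a client may want to recover the two dimension values from an observation URL. The sketch below does this with a regular expression derived from the template above; it assumes the year is written as four digits, and the function name and example id are hypothetical.

```python
import re

# Pattern derived from the URL template above; a four-digit year is an assumption.
_OBSERVATION_URL = re.compile(
    r"^http://environment\.data\.gov\.uk/data/bathing-water-quality/compliance"
    r"/point/(?P<point>[^/]+)/year/(?P<year>\d{4})$"
)


def parse_compliance_observation_url(url):
    """Return (sampling point id, year) for an observation URL, or None if it does not match."""
    match = _OBSERVATION_URL.match(url)
    if match is None:
        return None
    return match.group("point"), int(match.group("year"))
```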

Each such observation has the following properties:

rdf:type
  • qb:Observation and bwq:ComplianceAssessment
rdfs:label
  • text that may be used to identify the assessment
bwq:samplingPoint
  • the sampling point where the observation data was obtained
dcterms:source
  • A reference to a row in a spreadsheet from which the RDF data was obtained
bwq:bathingWater
  • the bathing water assessed
bwq:sampleYear
  • the year of the assessment
bwq:complianceClassification
  • the compliance classification, an instance of bwq:Compliance
qb:dataSet
bwq:inYearDetail
  • a reference to the data on which the compliance assessment is based
  • the value is a reference to a slice of the samples dataset described below

In-Season Sample Assessment Dataset

The in-season sample assessment dataset has three dimensions:

  • the year in which the sample was taken
  • the week in which the sample was taken
  • the sampling point at which the sample was taken.

Each observation has 6 measures:

  • total coliform count
  • faecal coliform count
  • faecal streptococci count
  • enterovirus count
  • salmonella present
  • sample classification

Each observation can also have the following attributes:

  • the time the sample was taken
  • whether there was an abnormal weather exception
  • a total coliform count qualifier
  • a faecal coliform count qualifier
  • a faecal streptococci count qualifier
  • an enterovirus count qualifier

The data set has the URI:

The dataset resource has the following properties:

rdf:type
  • bwq:SampleDataset
  • void:Dataset
rdfs:label
  • text that may be used to identify the dataset
dcterms:description
  • a description of the dataset
dcterms:modified
  • the date and time the dataset was last modified
  • the value of this property has datatype xsd:dateTime
dcterms:license
  • the license under which the data has been made available
dcterms:source
  • a reference to the source from which the RDF data was obtained
void:vocabulary
  • reference to an RDFS or OWL vocabulary used in the dataset
  • there are multiple statements with this property
void:uriRegexPattern
  • a regular expression which describes the form of URIs identifying resources in the dataset
void:dataDump
  • a reference to a representation of all the data in the dataset
qb:structure
  • a reference to a resource that describes the structure of the dataset
  • see the data cube vocabulary documentation for information on the properties of a dataset description.
qb:slice
  • a reference to a subset of, or slice through, the dataset

Slices

The in-season sample assessment dataset has the following kinds of slices through the data:

  • samples for a given sampling point
  • samples for a given week
  • samples for a given year
  • samples for a given year and sampling point
  • latest samples for each sampling point
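As a sketch of what the "latest samples" slice contains, the function below picks one observation per sampling point. It assumes that "latest" means the greatest (year, week) pair, which is not stated here; the dictionary keys and sample values are hypothetical.

```python
def latest_sample_slice(observations):
    """Pick, for each sampling point, the observation with the greatest (year, week).

    Each observation is a dict with at least 'point', 'year' and 'week' keys.
    """
    latest = {}
    for obs in observations:
        current = latest.get(obs["point"])
        if current is None or (obs["year"], obs["week"]) > (current["year"], current["week"]):
            latest[obs["point"]] = obs
    return latest


# Hypothetical observations, for illustration only.
observations = [
    {"point": "p1", "year": 2010, "week": 20, "classification": "G"},
    {"point": "p1", "year": 2010, "week": 21, "classification": "I"},
    {"point": "p2", "year": 2009, "week": 38, "classification": "F"},
]
latest = latest_sample_slice(observations)
```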

Each slice is a resource with the following properties:

rdf:type
  • qb:Slice
  • identifies the type of slice, one of:
    • bwq:BySamplingPointYearSlice
    • bwq:BySamplingPointSlice
    • bwq:ByWeekSlice
    • bwq:LatestSampleSlice
    • bwq:ByYearSlice
rdfs:label
  • a text label identifying the slice
qb:sliceStructure
qb:observation
  • references to observations in the slice
bwq:samplingPoint or bwq:sampleYear or bwq:
  • specifies the dimension value that is constant for all observations in this slice

In-Season Sample Assessment Observation

In the in-season sample assessment data set, each cell in the dimensional matrix has the 6 measures named above and associated attributes.  Following the data cube vocabulary model, each cell in the matrix is a resource identified by a URL of the form:

The record date is the date at which the information was published.  If a sample is reanalysed and different results published, a new observation with a different record date will be created.
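Because a reanalysed sample yields an additional observation rather than replacing the earlier one, a consumer that only wants the most recently published result has to choose between them. The sketch below keeps, for each cell, the observation with the newest record date; treating the newest record as the one to use is an assumption, and the dictionary keys and values are hypothetical.

```python
from datetime import date


def latest_by_record_date(observations):
    """Keep, for each (point, year, week) cell, the observation with the newest record date."""
    latest = {}
    for obs in observations:
        key = (obs["point"], obs["year"], obs["week"])
        if key not in latest or obs["record_date"] > latest[key]["record_date"]:
            latest[key] = obs
    return list(latest.values())


# Hypothetical data: the week 20 sample was reanalysed and republished.
published = [
    {"point": "p1", "year": 2010, "week": 20, "record_date": date(2010, 5, 20), "total_coliform": 100},
    {"point": "p1", "year": 2010, "week": 20, "record_date": date(2010, 6, 1), "total_coliform": 120},
    {"point": "p1", "year": 2010, "week": 21, "record_date": date(2010, 5, 27), "total_coliform": 80},
]
current = latest_by_record_date(published)
```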

Each observation has the following properties:

rdf:type
  • qb:Observation and bwq:SampleAssessment
rdfs:label
  • text that may be used to identify the assessment
bwq:samplingPoint
  • the sampling point where the observation data was obtained
dcterms:source
  • a reference to a row in a spreadsheet from which the RDF data was obtained
bwq:bathingWater
  • the bathing water assessed
bwq:sampleDateTime
  • the time the sample was taken
  • the value is a reference to a resource, not a typed literal
bwq:sampleYear
  • the year of the assessment
  • the value is a reference to a resource, not a literal
bwq:sampleWeek
  • the week in which the sample was taken
  • the value is a reference to a resource
bwq:faecalColiformCount
  • the number of colonies of faecal coliform per 100ml water sample
  • the value has a datatype of xsd:integer
bwq:faecalColiformQualifier
  • >, < or actual qualifier for the faecal coliform count
  • the value of this property is an instance of bwq:CountQualifier
bwq:totalColiformCount
  • the total number of colonies of coliform per 100ml water sample
  • the value has a datatype of xsd:integer
bwq:totalColiformQualifier
  • >, < or actual qualifier for the total coliform count
  • the value of this property is an instance of bwq:CountQualifier
bwq:faecalStreptococciCount
  • the number of colonies of faecal streptococci per 100ml water sample
  • the value has a datatype of xsd:integer
bwq:faecalStreptococciQualifier
  • >, < or actual qualifier for the faecal streptococci count
  • the value of this property is an instance of bwq:CountQualifier
bwq:entrovirusCount
  • the number of colonies of enterovirus per 100ml water sample
  • the value has a datatype of xsd:integer
bwq:entrovirusQualifier
  • >, < or actual qualifier for the enterovirus count
  • the value of this property is an instance of bwq:CountQualifier
bwq:salmonellaPresent
  • an indicator of the presence of salmonella in a water sample
  • the value of this property is an instance of bwq:Presence
bwq:complianceClassification
  • the compliance classification of the sample
  • the value of this property is an instance of bwq:Compliance
bwq:abnormalWeatherException
  • whether the results were affected by abnormal weather conditions, floods or other disasters
  • the value has a datatype of xsd:boolean
qb:dataSet
rdfs:comment
  • a comment on the observation
dcterms:created
  • the date this resource was created
  • the value has a datatype of xsd:dateTime

Count Qualifiers

The observation properties bwq:faecalColiformQualifier, bwq:totalColiformQualifier, bwq:faecalStreptococciQualifier and bwq:entrovirusQualifier specify how their corresponding count properties should be interpreted.  The values of these properties are instances of bwq:CountQualifier and are defined as SKOS concepts.  The following instances of bwq:CountQualifier are defined:

bwq:moreThan
  • indicates that the actual count value is more than the value given; that it exceeds a detection/measurement upper bound
bwq:lessThan
  • indicates that the actual count value is less than the value given; that it is less than a detection/measurement lower bound
bwq:actual
  • indicates that the actual count value is as given
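A minimal sketch of how a consumer might combine a count with its qualifier. The function names are hypothetical, and the threshold passed to definitely_exceeds is the caller's own non-negative value; no regulatory threshold is implied.

```python
QUALIFIER_SYMBOLS = {"bwq:moreThan": ">", "bwq:lessThan": "<", "bwq:actual": ""}


def format_count(count, qualifier):
    """Render a count with its qualifier, e.g. '>2000', '<10' or '150'."""
    return f"{QUALIFIER_SYMBOLS[qualifier]}{count}"


def definitely_exceeds(count, qualifier, threshold):
    """True only when the true count is certain to be greater than threshold.

    threshold is assumed to be non-negative.
    """
    if qualifier == "bwq:actual":
        return count > threshold
    if qualifier == "bwq:moreThan":
        # The true value is above count, so count >= threshold is enough.
        return count >= threshold
    if qualifier == "bwq:lessThan":
        # Only an upper bound is known, so exceedance can never be certain.
        return False
    raise KeyError(qualifier)
```

Treating a qualified count as a bound rather than as an exact number avoids reading a detection limit as a measurement.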

Presence

The presence or absence of a substance may be indicated by an instance of bwq:Presence, which has a richer set of values than just true or false.  Instances of bwq:Presence are SKOS concepts.  The following instances of bwq:Presence are defined:

bwq:present
  • an assessment has detected the presence of some characteristic
bwq:not-present
  • an assessment has not detected the presence of some characteristic
bwq:not-accessed
  • an assessment of the presence of some characteristic has not been made
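One way to carry the three values into application code is a three-state mapping, sketched below, in which None stands for "no assessment made" so that it is never confused with a negative result. The names PRESENCE and presence_to_bool are illustrative only.

```python
# Three-state mapping: True, False, or None when no assessment was made.
PRESENCE = {
    "bwq:present": True,
    "bwq:not-present": False,
    "bwq:not-accessed": None,
}


def presence_to_bool(curie):
    """Map a bwq:Presence instance to True, False or None (not assessed)."""
    if curie not in PRESENCE:
        raise ValueError(f"not a known bwq:Presence instance: {curie!r}")
    return PRESENCE[curie]
```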