I was at the Bristol Vocamp recently (a fun event) and there was a lot of discussion around issues of validating RDF and checking conformance with lightweight RDF vocabularies. There was some talk about constraint expression languages but I suggested that OWL, and especially OWL 2, already do quite a lot of that. Couple that to closed-world validation tools in the style of Eyeball and we already have quite a lot of what is needed. In the end I did a brief summary of what’s new in OWL 2 and how it might affect people who focus mostly on RDF/RDFS. This went down well enough I promised to write up the talk. Here’s a first stab at that.
Warning: I’m not an expert on OWL 2, though I have done some work with implementing the OWL 2 RL profile. If you want the real story then go look at the specs 🙂
OWL issues for RDF vocabularies
OWL is a compromise. It’s a compromise between the people who want to write down complex ontologies (and so want maximum expressivity) and those who want tractable, even efficient, reasoners (and so need some constraints on the language). It’s also a compromise between the Description Logic community, who have great reasoning technology but need “neat” constraints on the language to be able to employ it, and the RDF community, who are used to “scruffy” freedom. The compromise means that there are multiple flavours of OWL. There is the Description Logic flavour (DL) that constrains how you use the OWL vocabulary, but allows for complete reasoning support, and the unconstrained RDF semantics (Full).
Early in the lifetime of OWL some owners of RDF vocabularies did try to use OWL to express additional constraints on their vocabularies. For example Dan and Libby added some OWL constructs to foaf. However, people found they tended to end up in OWL Full instead of OWL DL and/or couldn’t express what they wanted. So what were the limitations on using OWL for RDF vocabularies? The main ones were:
- Strict separation of ObjectProperties (properties that point to other resources) and DatatypeProperties (properties that point to literal values). For example, Dublin Core allows dc:creator to denote the name of an author as a string or point to a resource such as a FOAF description. That’s not allowed within OWL DL.
- Strict separation of meta-levels. In some RDF vocabularies you want to annotate your classes and properties with other information such as hints to a UI or data generator. In OWL DL you can have annotations but they have no semantics, which means that you aren’t allowed to add things like range axioms to your annotation properties. Whereas within Full you could say “this annotation should be an integer”.
- Some key limits on expressivity. In particular you can’t define a key. In foaf you can state things like your IM Chat ID or the SHA1 hash of your mailbox, which ought to be enough to uniquely identify you. In OWL there is the notion of an InverseFunctionalProperty which looks just like it should allow you to say that, for example, foaf:aimChatID is a unique key. Except that within OWL DL it is only applicable to ObjectProperties; you can’t use it on literal-valued properties.
- Complex to understand. One cost of the compromise is that people found the whole hierarchy of OWL DL, OWL Lite, OWL Full and its relationship to RDFS confusing. The number of non-specialists prepared to read the model theories might have been a bit limited too. Which in turn means there’s a pretty high barrier to implementation.
OWL 2 is a major set of extensions and, mostly, improvements to OWL which solve at least some of these problems and introduce additional features that are useful to people from the RDF side of the house.
What’s new in OWL 2
First let’s be clear – the complexity problem hasn’t been solved! This is a language with four different syntaxes (not counting the different RDF syntax versions) specified across 13 different documents. So don’t expect anything I say here to be complete or definitive. Though if some part is actually wrong then please tell me about it.
However, buried in this complexity is some stuff I think is useful, for the types of applications I work on, which I’ll pick out.
OWL had an abstract syntax to help with writing the specs but all OWL ontologies were expressed via RDF. In OWL2 life is more complicated. There is still the moral equivalent of the abstract syntax, now called the functional syntax. There is still an RDF syntax, and the good news is that it is fully backward compatible. However, there is also another textual syntax called the Manchester syntax and an XML-but-not-RDF/XML syntax called OWL/XML. Since OWL can be used for writing down RDF facts as well as all this ontology stuff, that means the world now has at least two new RDF syntax forms. Though in practice I doubt if they’ll get much tool support or uptake for pure RDF usage.
There are lots of new axioms you can express in OWL 2, all motivated by some applications, but several of them are especially useful in an RDF setting …
Qualified Cardinality Restrictions. In OWL you could say that a Person has four limbs, that the value of limb is either of type Arm or type Leg, and even that a Person has some limbs that are Arms and some limbs that are Legs. However, you couldn’t say that they have precisely two limbs which are type Leg and two limbs which are type Arm. In OWL 2 you can. You can combine cardinality restrictions with local range types. This is a big deal for things like medical ontologies. The QCR axioms were at one point in the OWL (1) drafts but got left out on complexity grounds. This was seen in some quarters as a bit of a mistake, and getting them back in again was quite a high priority for the OWL 2 group.
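To make the difference concrete, here is a minimal Python sketch of what a qualified count looks at: the values of a property that also have a given type. The triples, the plain-string names and both helper functions are made up for illustration. It counts under a closed world and unique name assumption, so it is a data check rather than an OWL reasoner.

```python
# Hypothetical toy data: triples as (subject, predicate, object) tuples.
triples = {
    ("alice", "limb", "leg1"), ("alice", "limb", "leg2"),
    ("alice", "limb", "arm1"), ("alice", "limb", "arm2"),
    ("leg1", "type", "Leg"), ("leg2", "type", "Leg"),
    ("arm1", "type", "Arm"), ("arm2", "type", "Arm"),
}

def qualified_count(triples, subject, prop, cls):
    """Count distinct values of `prop` on `subject` that are typed as `cls`."""
    values = {o for (s, p, o) in triples if s == subject and p == prop}
    return sum(1 for v in values if (v, "type", cls) in triples)

def has_exactly(triples, subject, prop, n, cls):
    """Closed-world reading of 'exactly n `prop` values of type `cls`'."""
    return qualified_count(triples, subject, prop, cls) == n

print(has_exactly(triples, "alice", "limb", 2, "Leg"))
print(has_exactly(triples, "alice", "limb", 2, "Arm"))
```

An unqualified cardinality of four would be satisfied by four Legs just as well; it is the per-type count that the qualified form adds.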
Keys. These solve the problem of using literal-valued properties to identify resources. With OWL2’s owl:hasKey you can give a list of properties, both object- and literal-valued properties, that together identify resources of a given type.
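As a rough illustration of the effect of a key, the Python sketch below finds pairs of individuals that share a value for every key property, which is the situation in which a key lets you conclude they are the same resource. The data, the ex: identifiers and the `same_by_key` helper are invented for this example, and the sketch assumes every individual listed already belongs to the class the key is declared on.

```python
from itertools import combinations

# Hypothetical individuals of one class, each mapping property -> set of values.
people = {
    "ex:p1": {"foaf:aimChatID": {"chat42"}, "foaf:name": {"A. Example"}},
    "ex:p2": {"foaf:aimChatID": {"chat42"}, "foaf:name": {"Alex Example"}},
    "ex:p3": {"foaf:aimChatID": {"chat99"}, "foaf:name": {"B. Sample"}},
}

def same_by_key(individuals, key_props):
    """Return pairs that share at least one value for every key property."""
    pairs = []
    for a, b in combinations(sorted(individuals), 2):
        pa, pb = individuals[a], individuals[b]
        if all(pa.get(k, set()) & pb.get(k, set()) for k in key_props):
            pairs.append((a, b))
    return pairs

print(same_by_key(people, ["foaf:aimChatID"]))
```

Listing several properties in `key_props` gives a compound key: a pair only matches when it agrees on all of them.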
Chain properties. This is an interesting extension which allows OWL 2 to derive uncle from the combination of parent and brother. You can say that a chain of properties, when composed together, implies another property.
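The uncle example can be sketched in a few lines of Python. This is only an illustration of the idea for a chain of length two; the triples and the `apply_chain` helper are hypothetical, and real chains can be longer.

```python
# Hypothetical toy data: (subject, predicate, object) tuples.
triples = {
    ("tom", "parent", "ann"),    # tom has parent ann
    ("ann", "brother", "bob"),   # ann has brother bob
    ("tom", "parent", "carl"),   # carl has no brother listed
}

def apply_chain(triples, chain, implied):
    """For x p1 y and y p2 z, infer x implied z (two-step chains only)."""
    p1, p2 = chain
    inferred = set()
    for (x, p, y) in triples:
        if p != p1:
            continue
        for (y2, q, z) in triples:
            if q == p2 and y2 == y:
                inferred.add((x, implied, z))
    return inferred

print(apply_chain(triples, ("parent", "brother"), "uncle"))
```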
Property axioms. OWL 1 had several property types and property axioms – Functional, Symmetric, inverseOf etc. In OWL 2 there are more – Reflexive (so that x R x is always true), Irreflexive (so that x R x is never true), Asymmetric (so that you can’t have both x R y and y R x).
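A small Python sketch of what the Irreflexive and Asymmetric declarations rule out, written as checks over a set of triples. The data and function names are invented for illustration; a reasoner would report these cases as inconsistencies rather than return lists.

```python
# Hypothetical toy data: (subject, predicate, object) tuples.
triples = {
    ("a", "partOf", "b"),
    ("b", "partOf", "a"),   # breaks asymmetry together with the line above
    ("c", "partOf", "c"),   # breaks irreflexivity (and asymmetry)
    ("d", "partOf", "e"),
}

def irreflexive_violations(triples, prop):
    """Triples of the form x prop x."""
    return {(s, p, o) for (s, p, o) in triples if p == prop and s == o}

def asymmetric_violations(triples, prop):
    """Triples x prop y for which y prop x is also present."""
    return {(s, p, o) for (s, p, o) in triples
            if p == prop and (o, p, s) in triples}

print(sorted(irreflexive_violations(triples, "partOf")))
print(sorted(asymmetric_violations(triples, "partOf")))
```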
Negative assertions. This is a strange one from an RDF point of view. In OWL 2 you can assert that a certain fact does not hold – that triples like (foo prop val) or (foo rdf:type Bar) are not true. From an RDF point of view this is a big step. It implies that RDF toolkits ought to implement a full three valued logic – yes, no, don’t know – so you can distinguish between triple not present and triple asserted to not be true. From an OWL point of view this is just syntactic sugar for something that was already expressible in OWL 1 if you knew the tricks.
Syntactic sugar for disjointness. There is also some syntactic sugar to make it easier to say a set of classes are mutually disjoint or that they are both disjoint and together cover some specific larger set.
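A minimal Python sketch of the yes / no / don’t know distinction, assuming a toolkit kept negated triples in a separate set alongside the asserted ones. That storage layout, the `Truth` enum and `truth_of` are assumptions made for illustration, not how any particular RDF toolkit works, and the lookup does no inference.

```python
from enum import Enum

class Truth(Enum):
    TRUE = "yes"
    FALSE = "no"
    UNKNOWN = "don't know"

def truth_of(triple, asserted, negated):
    """Three-valued lookup: asserted, asserted-not-to-hold, or simply absent."""
    if triple in asserted and triple in negated:
        raise ValueError(f"inconsistent: {triple} is both asserted and negated")
    if triple in asserted:
        return Truth.TRUE
    if triple in negated:
        return Truth.FALSE
    return Truth.UNKNOWN

# Hypothetical data.
asserted = {("ex:foo", "ex:prop", "ex:val")}
negated = {("ex:foo", "rdf:type", "ex:Bar")}

print(truth_of(("ex:foo", "rdf:type", "ex:Bar"), asserted, negated))
print(truth_of(("ex:foo", "rdf:type", "ex:Baz"), asserted, negated))
```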
Punning and annotations
So there’s a few interesting new things that you can state with OWL2. What about all the issues of having to cleanly separate object- and literal- valued properties, to separate meta-levels, to be careful with annotations?
Here there is good news and bad news.
There is a notion in OWL 2 of punning. Roughly this means that the same identifier can be used as if it denoted, for example, both an individual and a class. However, the way this works is that it is as if there were really two different entities; entities that are not really connected but just happen to have the same looking name. So things you say about the individual-nature of some X don’t affect the class-nature of that X, and vice versa. In terms of the OWL semantics this punning works. In terms of web architecture and using URIs to denote things it makes some people uncomfortable.
This punning means that you can do things like treat :Eagle as separately denoting both a class of Birds and an individual of the class :Species. This can be quite useful for some sorts of modelling. It also means that you can say more about annotations. In particular there are new constructs for declaring the domain/range of annotation properties.
The bad news is that this punning doesn’t extend as far as allowing you to mix Object- and Datatype- properties. So one of the biggest limitations on OWL (1) DL for working with RDF vocabularies is still there. I think in part this is due to problems with having backward compatible syntax, certainly the early proposals for allowing some property type punning did involve a non-monotonic RDF syntax which would have been an absolute nightmare. So if this is the cost of having a backward compatible and stable syntax then it seems to me like a price worth paying. Shame though.
This bit is potentially pretty significant for RDF validators.
With OWL 1, and indeed RDFS, when you wanted to validate some literal values you could always use XSD constraints “out of band”. That is, use XSD to define some new datatype like integers between 1 and 42 and then declare that new datatype as the range of your property. The tie up between the URI you use for that datatype in your RDFS declaration and the file where you specified the datatype restrictions was a matter of convention rather than part of the specs. Nevertheless at least Jena supported it, so long as you explicitly loaded in the schema definitions you wanted.
OWL 2 has added a whole lot of machinery which, at least to me, seems to move the XSD data restriction machinery wholesale into OWL. Specifically each of the OWL datatypes now has a set of facets, which correspond to the XSD notion of facets, and allow you to constrain the allowed values to a subset of the datatype’s value space. So you can now define the range of a property as being only integers between 1 and 3 without having to step outside OWL at all. You can also create unions, complements and intersections of such dataranges – so you can have a value which is either between 1 and 3 or greater than 13 but not 42.
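The “between 1 and 3 or greater than 13 but not 42” datarange can be mimicked with a few predicate combinators. This is a Python sketch of the idea only; the function names are invented here and are not OWL or XSD vocabulary, though `min_inclusive` and `max_inclusive` are named after the corresponding facets.

```python
def int_range(min_inclusive=None, max_inclusive=None):
    """Integers restricted by optional minInclusive / maxInclusive facets."""
    def test(v):
        if not isinstance(v, int) or isinstance(v, bool):
            return False
        if min_inclusive is not None and v < min_inclusive:
            return False
        if max_inclusive is not None and v > max_inclusive:
            return False
        return True
    return test

def one_of(*values):
    allowed = set(values)
    return lambda v: v in allowed

def union(*ranges):
    return lambda v: any(r(v) for r in ranges)

def intersection(*ranges):
    return lambda v: all(r(v) for r in ranges)

def complement(r):
    return lambda v: not r(v)

# (1..3 or > 13) and not 42
allowed = intersection(
    union(int_range(1, 3), int_range(min_inclusive=14)),
    complement(one_of(42)),
)

print([v for v in (0, 2, 13, 14, 42, 43) if allowed(v)])
```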
[You might think these datatype combinators weren’t needed. OWL already lets you define unions of classes after all. However, the OWL 2 punning also doesn’t extend as far as allowing punning between classes and datatypes.]
The final innovation in OWL 2 that I want to mention is the notion of Profiles.
With all the new axiom types in OWL 2, implementations are even more complex than for OWL 1 (and of a higher computational complexity class), whereas for some usages people would forgo some expressivity to get inference engines that are easier to implement or faster to run. OWL 2 profiles do this for you. There are three defined profiles of OWL 2 which each conform to the same semantics but limit what parts of the syntax you can use in return for better performance. These are EL, QL and RL.
EL is particularly good for cases where you have very big ontologies (lots of classes and properties) but they are not too complex.
QL is particularly suited to cases where you have lots of instance data (large “ABox” in the jargon). It allows the instance data to be stored in an RDBMS and accessed via a query rewrite that doesn’t require recursive queries.
RL is suited to implementation by rule based reasoners, including databases with deduction rule support such as datalog engines.
From an RDF point of view it is RL that is particularly interesting because it supports the RDF-based semantics.
OK, I’ve skipped over something up to now. It is still the case that there is a two-way split in OWL 2. There is the direct semantics, Description Logic friendly, an extension of OWL (1) DL. There is also an RDF-based semantics, much like in OWL 1. The correspondence between the two is a little less direct than in OWL 1 but it’s there. The OWL 2 RDF-based semantics is then an upward-compatible extension of RDF and RDFS and means you can take an RDFS vocabulary and add a few useful bits from OWL 2 and know the exact semantics of what you’ve got. The question is, will that require a full blown theorem prover to reason with? What OWL RL does is provide a profile of OWL which can be implemented via simple entailment rules at the RDF level and so gives you an RDFS-compatible fragment of OWL 2 which is likely to be pretty widely supported.
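To give a flavour of what “simple entailment rules at the RDF level” means, here is a minimal forward-chaining sketch in Python. It applies just two rules, subclass transitivity and type inheritance through subclasses, until nothing new is derived. These two are modelled on the subclass rules in the RL rule set; everything else (the data, the `closure` function, the naive loop) is an assumption for illustration and nowhere near a complete RL implementation.

```python
TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"

def closure(triples):
    """Apply the two subclass rules repeatedly until a fixpoint is reached."""
    facts = set(triples)
    while True:
        new = set()
        for (c1, p, c2) in facts:
            if p != SUBCLASS:
                continue
            for (x, q, c) in facts:
                if c != c1:
                    continue
                if q == TYPE:        # x type c1, c1 subClassOf c2 => x type c2
                    new.add((x, TYPE, c2))
                elif q == SUBCLASS:  # x subClassOf c1, c1 subClassOf c2 => x subClassOf c2
                    new.add((x, SUBCLASS, c2))
        if new <= facts:
            return facts
        facts |= new

# Hypothetical data.
data = {
    ("ex:Eagle", SUBCLASS, "ex:Bird"),
    ("ex:Bird", SUBCLASS, "ex:Animal"),
    ("ex:sam", TYPE, "ex:Eagle"),
}

print(("ex:sam", TYPE, "ex:Animal") in closure(data))
```

Each rule only matches and adds triples, which is why this style of reasoning fits rule engines and datalog-like systems.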
At least that’s the marketing story and it’s largely true but the story on OWL RL implementation and conformance is a bit more complicated than it seems. Subject for a future post!
I should admit that, as far as I can tell, the RL profile doesn’t include DatarangeRestriction support for making use of all those datatype facets and only supports intersection of Dataranges. But I’m sure that can be worked round eventually.
So going back to where we started. The addition of keys plus datatype facets makes a noticeable difference to what OWL 2 can do for RDF vocabularies. The big limitation that remains is the need for strict separation of Datatype- and Object- properties for OWL 2 DL. However, with OWL 2 RL and the RDF semantics you can lift that restriction (just as existing OWL (1) reasoners like Jena’s have always been OWL Full reasoners and can mix freely with unconstrained RDFS).
When it comes to validation there is still the issue that the open world and the lack of a Unique Name Assumption make it hard to make use of cardinality constraints. This is the thing that most often trips up people new to OWL. They expect that if they say p has cardinality 1 on some class Foo and their data has a Foo with a missing p then the OWL validators will complain. They won’t. The declaration just means that semantically Foo does in fact have a p value, you just don’t know what it is yet. However, there is no problem at all with creating tools which make a closed world and unique name assumption for the purposes of data validation. They aren’t violating the OWL semantics, so long as they don’t purport to be doing OWL consistency checking; they are doing a different job but a useful one. We’ve had Jena’s Eyeball for a long time now and since it is openly extensible there’s nothing to stop someone adding some additional checkers to, for example, implement the OWL DatarangeRestrictions.
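The kind of closed-world check described above can be sketched in a few lines of Python. This is a hypothetical checker written for illustration, not Eyeball’s API. It treats the data as complete and distinct names as distinct things, and reports every instance of a class whose number of values for a property differs from the declared cardinality.

```python
TYPE = "rdf:type"

def cardinality_violations(triples, cls, prop, n):
    """Closed-world, unique-name check that each `cls` has exactly n `prop` values.

    Returns {instance: actual_count} for the instances that fail.
    """
    instances = {s for (s, p, o) in triples if p == TYPE and o == cls}
    problems = {}
    for inst in instances:
        values = {o for (s, p, o) in triples if s == inst and p == prop}
        if len(values) != n:
            problems[inst] = len(values)
    return problems

# Hypothetical data: ex:f2 is a Foo with no ex:p value at all.
data = {
    ("ex:f1", TYPE, "ex:Foo"), ("ex:f1", "ex:p", "1"),
    ("ex:f2", TYPE, "ex:Foo"),
    ("ex:f3", TYPE, "ex:Foo"), ("ex:f3", "ex:p", "1"), ("ex:f3", "ex:p", "2"),
}

print(cardinality_violations(data, "ex:Foo", "ex:p", 1))
```

An OWL reasoner would see ex:f2 as having a p value that is not yet known, and, without unique names, might conclude the two values on ex:f3 denote the same thing; the checker flags both because it is answering a different question.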
[This entry is a cross-post of an earlier personal blog post.]