Tuesday September 8, 2009

Safety Nets for OWLs

I've been programming for several decades, but I'm relatively new to ontology development in general and OWL in particular. So, I'm certainly not an expert on the range of work in this area. However, I think I see some areas where programming best practices and tools could provide useful "safety nets" for OWL-based ontology developers.

Substantial development projects (whether in programming or ontology development) quickly pass beyond the point where a developer can remember every detail, let alone comprehend the implications of a given change. Like metaprogramming in dynamic programming languages (eg, Perl, Python, Ruby), the use of reasoners and inference in OWL provides great expressive power. However, a small-looking change can have far-reaching results.

If multiple developers are involved, things get even more treacherous. Details may be lost or mis-communicated. Changes may interact in unforeseen ways. Finally, if an external client is involved, another level of coordination is needed.

Best Practices

So, programmers have developed assorted best practices (and supporting tools) that provide at least partial solutions for these concerns. Some of these practices are nearly universal; others are used mostly in the agile software development and dynamic programming language communities:

Design Patterns

Most modern programmers and many ontologists are already familiar with design patterns, but others may not be, so here is a summary:

Each pattern describes a problem that occurs over and over again in our environment, and then describes the core of the solution to that problem, in such a way that you can use this solution a million times over, without ever doing it the same way twice.

-- Christopher Alexander, "A Pattern Language"

If a problem matches a particular design pattern, using the pattern can reduce both effort and risk. Design patterns also aid system documentation, because the developer can simply mention the pattern(s) being used. Of course, a pattern may be used inappropriately or incorrectly, but documenting the name of the pattern may even help here, by allowing readers to check the implementation against the intent.

Semantic Web for the Working Ontologist (Allemang and Hendler; Morgan Kaufmann) is the first book I found that met my needs as an aspiring ontologist. It uses (admittedly simple) design patterns as a tool for teaching modeling, ontology creation, etc. Semantic Web Programming (Hebeler, et al; Wiley), another fine book, also has a chapter on "Semantic Web Patterns and Best Practices".

Ontology Design Patterns is a Semantic Web portal dedicated to ontology design patterns (ODPs). The portal, which was started under the NeOn project, collects, evaluates, organizes, and publishes ODPs. In a related effort, research is being done to evaluate the effectiveness of ODP use. For example, I recently heard Eva Blomqvist speak at K-CAP 2009 about Experiments on pattern-based ontology design. As one might suspect, the results are encouraging: "... ontology quality is improved, coverage of the task increases, usability is improved, and common modeling mistakes can be avoided".

Revision Control

There are dozens of revision control systems, with varying architectures and feature sets. What they have in common is the ability to "snapshot" the state of a project, so that it can be inspected and/or recovered at some future time. Typically, each snapshot includes a "commit" message, giving the developer's thoughts on significant changes that the snapshot contains.

Git is a representative example of the state of the art in this area. As a distributed revision control system, it supports the use of multiple "repositories". Developers can create new repositories at will, modify them, then (as appropriate) merge them back together. GitHub, a popular hosting site, currently hosts more than 100,000 repositories.

Clearly, some of these capabilities could benefit ontologists, if the implementation details could be worked out. My own partly-baked idea (PBI) would be to give each OWL-based ontology editor (eg, Protégé) the capability to save (and import) text-based snapshots of the asserted ontology. Ideally, these should be:

  • canonical - serialized in a consistent manner, to allow comparisons

  • high-level - ignoring implementation details (eg, RDF) where possible

  • human-readable - commented and pretty-printed for ease of comprehension

  • standardized - to allow exchange between different ontology editors

Some of these criteria (eg, standardization) can be deferred for the moment, in the interest of getting something in place for experimentation, etc. In the longer term, however, all of these attributes (and probably more) would be nice to have.
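To make the "canonical" criterion concrete, here is a minimal sketch in Python. It assumes the asserted axioms can be exported as simple (subject, predicate, object) triples of strings; the function names and the sample axioms are hypothetical, and no existing ontology editor is claimed to work this way. The point is only that a sorted, de-duplicated rendering gives the same text regardless of the order in which axioms were entered, so an ordinary text diff shows exactly what changed.

```python
import difflib


def canonical_snapshot(axioms):
    """Render asserted axioms as sorted, de-duplicated text lines."""
    lines = sorted({f"{s} {p} {o} ." for s, p, o in axioms})
    return "\n".join(lines) + "\n"


def snapshot_changes(old, new):
    """Return only the added (+) and removed (-) lines between two snapshots."""
    diff = difflib.unified_diff(
        old.splitlines(), new.splitlines(),
        fromfile="before", tofile="after", lineterm="",
    )
    return [
        line for line in diff
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    ]


# Hypothetical asserted axioms, as they might be exported by an editor.
v1 = [
    (":Owl", "rdfs:subClassOf", ":Bird"),
    (":Bird", "rdfs:subClassOf", ":Animal"),
]
v1_reordered = list(reversed(v1))
v2 = v1 + [(":Owl", "rdfs:subClassOf", ":Predator")]

# The same axioms in a different order produce an identical snapshot.
same_text = canonical_snapshot(v1) == canonical_snapshot(v1_reordered)

# A real change shows up as a one-line diff.
changes = snapshot_changes(canonical_snapshot(v1), canonical_snapshot(v2))
```

Snapshots like these could be committed to any revision control system, with the commit message describing the modeling decision behind the change.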

Testing and Continuous Integration

There are various, overlapping types of software testing. However, all of them use the computer to check the software's behavior and report unexpected results. Here is a partial catalog:

  • Unit Testing tests small portions (ie, units) of the source code in isolation.

  • Integration Testing tests whether the units work together properly.

  • Acceptance Testing tests whether the source code, as a whole, performs adequately.

  • Regression Testing tests whether the system has regressed (ie, changed) to previous, undesirable behavior.

By creating and using sets of tests, developers can ensure that no tested behavior will fail without notice. This is far from a complete guarantee of perfection, but it is also far from useless. Suites of regression tests provide a valuable "safety net" for development (eg, debugging, feature addition, refactoring). Unit tests, as discussed below, can be usefully integrated into the design and implementation process.

Many development efforts have found testing so useful that they choose to have it performed on a continuous basis. This practice (and more) is embodied in continuous integration (CI). Each time a developer commits a change, the CI suite runs a battery of tests. If an error is detected, developers are notified immediately.

In an ontology-development environment, a CI system might run a series of queries against the ontology whenever a reasoner reported a clean result. This would alert the ontologist to new or changed axioms which modify the expected behavior of the ontology.
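A rough sketch of such a check, in Python, might look like the following. Everything here is an assumption made for illustration: the ontology is reduced to a dictionary of asserted subclass links, the "query" is simply "what are the superclasses of X?", and a small transitive-closure function stands in for a real reasoner. A real CI job would presumably call an actual OWL reasoner and run real queries instead.

```python
def superclasses(subclass_of, cls):
    """Return every class reachable from cls via asserted subclass links."""
    seen = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for parent in subclass_of.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def run_regression(subclass_of, expectations):
    """Compare current answers with recorded ones; return the discrepancies."""
    failures = []
    for cls, expected in sorted(expectations.items()):
        actual = superclasses(subclass_of, cls)
        if actual != expected:
            failures.append((cls, sorted(expected), sorted(actual)))
    return failures


# Hypothetical ontology and the answers recorded when it was last reviewed.
ontology = {"Owl": {"Bird"}, "Bird": {"Animal"}}
expectations = {"Owl": {"Bird", "Animal"}, "Bird": {"Animal"}}
baseline_failures = run_regression(ontology, expectations)

# A later commit adds one axiom; the answers for two classes change.
changed = {"Owl": {"Bird"}, "Bird": {"Animal"}, "Animal": {"LivingThing"}}
changed_failures = run_regression(changed, expectations)
```

A discrepancy is not necessarily an error. It is a prompt for the ontologist to confirm that the new answers are intended and, if so, to update the recorded expectations.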

Test- and Behavior-Driven Development

In Test-Driven Development (TDD), tests are always written first. The newly-added tests, which check for desired behavior, are then run. These tests should fail, confirming that they check for an unimplemented behavior. The developer then modifies the software until all the tests pass. Finally, the developer checks for "code smells" (things that work, but are not cleanly implemented).
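The cycle can be shown in miniature with Python's standard unittest module. This is only a sketch: the "software" is a hypothetical dictionary of subclass links plus a small lookup function, chosen to stay close to the ontology theme. The test is written first and fails; the missing piece is then added and the same test passes.

```python
import io
import unittest

# Hypothetical asserted subclass links; the Bird -> Animal link is missing.
ontology = {"Owl": {"Bird"}}


def is_subclass(cls, ancestor):
    """True if ancestor can be reached from cls through subclass links."""
    stack, seen = [cls], set()
    while stack:
        current = stack.pop()
        if current == ancestor:
            return True
        if current not in seen:
            seen.add(current)
            stack.extend(ontology.get(current, ()))
    return False


class OwlIsAnAnimal(unittest.TestCase):
    def test_owl_is_an_animal(self):
        self.assertTrue(is_subclass("Owl", "Animal"))


def run_tests():
    """Run the test case quietly and report whether everything passed."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(OwlIsAnAnimal)
    result = unittest.TextTestRunner(stream=io.StringIO()).run(suite)
    return result.wasSuccessful()


red = run_tests()                # the desired behavior is not there yet
ontology["Bird"] = {"Animal"}    # make the smallest change that supplies it
green = run_tests()              # the same test now passes
```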

Behavior-Driven Development (BDD) extends TDD, bringing the "client" into the process. In BDD, the developer and client develop and agree on a desired set of behaviors, encoded in a constrained natural language. The developer then creates the needed code, testing it against the agreed-upon specification.

When the tests pass, the developer shows the results to the client. This may result in acceptance or a further refinement of the specification. In any case, the client is part of the process and the agreed-upon criteria are mechanically guaranteed to be met.

It strikes me that the relationship between a client and a programmer is quite analogous to that between a domain expert and an ontologist. If something like BDD could be used in ontology development, any number of mis-communications might be detected and eliminated.
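As a thought experiment, here is a small Python sketch of what a BDD-style check for an ontology might look like. The sentence forms ("Every X is a Y", "No X is a Y"), the function names, and the sample ontology are all invented for illustration. Note in particular that "No X is a Y" is checked here only as "Y is not an inferred superclass of X", which is much weaker than OWL disjointness.

```python
import re

# The only two sentence forms this sketch understands (hypothetical).
STATEMENT = re.compile(r"^(Every|No) (\w+) is an? (\w+)$")


def ancestors(subclass_of, cls):
    """Return every class reachable from cls via asserted subclass links."""
    seen, stack = set(), [cls]
    while stack:
        for parent in subclass_of.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def check_spec(spec, subclass_of):
    """Return a (sentence, passed) pair for each sentence in the spec."""
    results = []
    for sentence in spec.strip().splitlines():
        sentence = sentence.strip()
        match = STATEMENT.match(sentence)
        if match is None:
            raise ValueError(f"unrecognized sentence: {sentence!r}")
        quantifier, cls, target = match.groups()
        holds = target in ancestors(subclass_of, cls)
        passed = holds if quantifier == "Every" else not holds
        results.append((sentence, passed))
    return results


# Sentences a domain expert and an ontologist might agree on.
spec = """
Every Owl is a Bird
Every Owl is an Animal
No Owl is a Mammal
"""

ontology = {"Owl": {"Bird"}, "Bird": {"Animal"}, "Bat": {"Mammal"}}
report = check_spec(spec, ontology)
```

The domain expert never has to read OWL. The agreed sentences are the specification, and the report says which of them the current ontology satisfies.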

Getting There...

The programming community has already pioneered the practices described above. In fact, much of the infrastructure they have developed might be applicable to OWL-based ontology development. So, I don't believe that there are any particularly difficult technical issues.

The real difficulties will lie in changing people's attitudes and behavior. Getting ontologists (let alone domain experts) to accept these practices may be an uphill battle. However, the evidence from the software development arena is pretty compelling, so I think the attempt is worthy of consideration.

Posted in Computers, Ruby, Semantic Web, Technology at Tue, 08 Sep, 16:41 Pacific


I completely agree that this is the way to go, and at the moment too little communication is going on between software developers and ontology engineers. In some cases, things are even re-invented for ontology engineering when they are more or less readily available for reuse from software engineering practices, tools, and methods.

To complement what you write above, I can give some additional hints to related work in the ontology engineering community:

With respect to unit tests for ontologies, there has been some work done on this; see, for example, the 2006 paper by Vrandecic and Gangemi about ontology unit tests. There is also a Protégé plug-in supporting the management of unit tests. Unfortunately, I never tried it.

Another line of work that is ongoing, which I only very briefly mentioned in my talk at K-CAP, is the development of methodologies similar to the TDD you mention. The method suite we are developing is called XD (eXtreme Design... not to be confused with extreme programming...) and is based on modeling using a divide-and-conquer paradigm, where patterns are used to solve small partial problems, and then the solution is composed by integrating and extending these parts. Each small problem is associated with a unit test, and all accumulated tests are run after each change is made. Hopefully there will be a paper about XD in the Workshop on Ontology Patterns (WOP) at ISWC in Washington later this year.

Let's keep up the discussion, and try not to re-invent so many wheels in ontology engineering!

You preface this by referring to OWL-based ontologies, but a lot of this applies to models in RDFS or SKOS as well. Version control in particular is quite an issue for controlled vocabularies, such as those maintained in SKOS.

At least some ontology editing tools, in particular those based on Eclipse like TopBraid Composer, already integrate with SVN and CVS through the Eclipse connection. Even for non-Eclipse-based environments, this should be possible using a plug-in mechanism. TopQuadrant has provided some elementary 'diff' functions aimed directly at OWL models; Protégé has some things like this as well.

In my contact with people doing vocabulary management (which is not quite the same thing as ontology development, but there are a lot of procedural things that are similar), there is a related issue that I call version governance that they are quite interested in. Its relationship to version control is subtle; suffice it to say that there are some different teamwork-based requirements for vocabulary management in contrast to code management.

As far as testing goes, my colleague Ralph Hodgson has done a lot of this sort of work with NASA. I have encouraged him to blog about it - it really has to do with measures that can be used as part of a QA process for ontology development in very large projects (dozens of geographically separated modelers working on hundreds of ontology modules).
