February 27, 2015
Here at Thetus, we use the OWL Web Ontology Language to create our semantic knowledge models. Engineers working with OWL typically edit their models with tools like Protégé or TopBraid Composer, but sometimes you want or need to create a model by hand. Semantic Modelers at Thetus often edit ontologies by hand because our knowledge modeling engine, Publisher, uses our own in-house OWL serialization syntax, known as Thetus Markup Language (TML). TML is an XML syntax that was created as a friendlier alternative to RDF/XML.
Maintaining a proprietary serialization syntax and the toolchain to support it is a lot of work, so we’ve begun to consider alternatives. As the Semantic Web has matured, many serialization syntaxes for RDF and OWL have been proposed, and choosing one can be a little overwhelming. For handcrafted ontologies, many Semantic Web veterans recommend Turtle, but they don’t often explain why it’s the best choice. Working with our own syntax has given modelers at Thetus strong opinions about what we want in a new one, and since we weren’t very familiar with many of the available syntaxes, we decided to research them and draw our own conclusions.
The modeling team at Thetus reviewed about a dozen syntaxes for this effort: OWL Functional Syntax, OWL/XML, RDF/XML, Manchester Syntax, Turtle, N-Triples, JSON-LD, Notation 3 (N3), TriG, TriX, and N-Quads. With the help of various syntax conversion tools, we converted some ontologies we’ve worked with into each of these syntaxes so we could assess their readability. We also spent a lot of time reviewing the documentation for each syntax and scouring the web for expert opinions on them. Finally, we developed a set of evaluation criteria, weighted them, and ranked the syntaxes against them.
Our criteria for editing models by hand fell into two main categories. First, the syntax should be easy for humans to read and write. During the review, four human-friendly qualities stood out:
- The syntax provides an alternative to using absolute IRIs, such as namespace prefixes or relative IRIs.
- The syntax doesn’t force the author to write flat triples, offering features such as inline or nested blank nodes, collections, or lists.
- The syntax provides a shorthand for literals and common predicates.
- The syntax is not XML, as many people do not enjoy typing numerous angle brackets.
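To make the first and third qualities concrete, here is a minimal Python sketch (not a Turtle parser) that expands a few kinds of Turtle shorthand back into the fully written-out form that N-Triples requires: prefixed names become absolute IRIs, the keyword `a` becomes `rdf:type`, and bare numbers and booleans become typed literals. The helper names `expand_term` and `expand_triple` and the `ex:` namespace are hypothetical, and the numeric patterns are simplified from the Turtle grammar.

```python
import re

# Standard W3C namespace IRIs behind Turtle's built-in shorthand.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XSD = "http://www.w3.org/2001/XMLSchema#"

# Bare numeric literal forms, simplified from the Turtle grammar.
INTEGER = re.compile(r"[+-]?\d+")
DECIMAL = re.compile(r"[+-]?\d*\.\d+")
DOUBLE = re.compile(r"[+-]?(\d+\.\d*|\.\d+|\d+)[eE][+-]?\d+")


def expand_term(term: str, prefixes: dict[str, str]) -> str:
    """Expand one Turtle term into its N-Triples spelling.

    Handles booleans, bare numbers and prefixed names; any other term
    (a full <IRI>, a quoted string, etc.) is returned unchanged.
    """
    if term in ("true", "false"):
        return f'"{term}"^^<{XSD}boolean>'
    # Bare numbers map to xsd:integer, xsd:decimal or xsd:double.
    if INTEGER.fullmatch(term):
        return f'"{term}"^^<{XSD}integer>'
    if DECIMAL.fullmatch(term):
        return f'"{term}"^^<{XSD}decimal>'
    if DOUBLE.fullmatch(term):
        return f'"{term}"^^<{XSD}double>'
    prefix, sep, local = term.partition(":")
    if sep and prefix in prefixes:
        # Prefixed name: splice the namespace IRI onto the local part.
        return f"<{prefixes[prefix]}{local}>"
    return term


def expand_triple(s: str, p: str, o: str, prefixes: dict[str, str]) -> str:
    """Write one shorthand triple as a single flat N-Triples line."""
    # In predicate position, Turtle's keyword `a` abbreviates rdf:type.
    pred = f"<{RDF}type>" if p == "a" else expand_term(p, prefixes)
    return f"{expand_term(s, prefixes)} {pred} {expand_term(o, prefixes)} ."
```

Called as `expand_triple("ex:Vehicle", "a", "owl:Class", {"ex": "http://example.org/", "owl": "http://www.w3.org/2002/07/owl#"})`, it returns `<http://example.org/Vehicle> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Class> .`, spelling out three full IRIs for what Turtle writes as `ex:Vehicle a owl:Class .`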
Second, the syntax should have good support from existing Semantic Web tools, such as Apache Jena, the OWL API and various RDF and OWL conversion tools, because you don’t want to invest a lot of effort into building a model only to find that existing tools can’t read it. Since we were focused on hand-editing models, we didn’t take into consideration whether Protégé or TopBraid Composer supported the syntaxes.
Given the human-friendly qualities above, it became easy to rule out several syntaxes immediately. We learned that four of the syntaxes are closely related: Turtle, N3, N-Triples and TriG, which we began to refer to collectively as the Turtle family. Of the Turtle family, N-Triples lacks the first three qualities, so it was easy to rule out.
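The gap between flat triples and Turtle’s grouping is easy to see in code. The sketch below uses a hypothetical helper, `to_predicate_lists`, that takes flat (subject, predicate, object) triples, the shape N-Triples makes you write one statement per line, and renders them with Turtle’s predicate-list (`;`) and object-list (`,`) abbreviations. It assumes every term is already a valid Turtle token and omits prefix declarations, escaping, and nested blank nodes.

```python
def to_predicate_lists(triples: list[tuple[str, str, str]]) -> str:
    """Render flat (subject, predicate, object) triples as Turtle statements.

    Triples sharing a subject become one statement: predicates are joined
    with ';' and objects sharing a predicate are joined with ','.
    """
    # dicts preserve insertion order (Python 3.7+), so output follows input order.
    grouped: dict[str, dict[str, list[str]]] = {}
    for s, p, o in triples:
        grouped.setdefault(s, {}).setdefault(p, []).append(o)

    statements = []
    for subject, preds in grouped.items():
        parts = [f"{p} {', '.join(objs)}" for p, objs in preds.items()]
        statements.append(f"{subject} " + " ;\n    ".join(parts) + " .")
    return "\n".join(statements)
```

For four triples about a hypothetical `ex:Car` class (a type, a superclass, and two labels), it emits a single statement, `ex:Car a owl:Class ; rdfs:subClassOf ex:Vehicle ; rdfs:label "car"@en, "automobile"@en .` (shown here on one line), where N-Triples would need four lines, each repeating the subject’s full IRI.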
Of the other syntaxes, N-Quads also lacks the first three qualities. OWL/XML and TriX fall short on the second quality because they lack nesting features. All three of the normative syntaxes (OWL Functional, OWL/XML, and RDF/XML) lack shorthand features. OWL/XML, RDF/XML, and TriX were further ruled out because they are all XML formats. And if avoiding XML is important, JSON-LD also starts to look less attractive, because it requires typing lots of curly braces and is somewhat verbose.
That leaves three of the four Turtle family syntaxes (Turtle, N3, and TriG) plus Manchester Syntax. In our weighted ranking, these four tied for first place on ease of use: each offered good support for all four human-friendly qualities, with only minor differences in how they covered the third (shorthand for literals and common predicates).
We used our second category of criteria, tool support, to break the tie, and that is how Turtle ended up leading the pack. Apart from RDF/XML, which is the only syntax that OWL 2 tools are required to support, Turtle has the broadest tool support of any serialization syntax. Additionally, if you are a SPARQL user, familiarity with Turtle pays off, because SPARQL’s WHERE clause shares much of Turtle’s syntax for writing triples.
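The SPARQL overlap can be shown with a small sketch. Assuming a Turtle document whose only directives are `@prefix` lines, one per line, the hypothetical helper `turtle_to_ask` below rewrites those lines as SPARQL `PREFIX` declarations and pastes the remaining Turtle triples unchanged into an `ASK WHERE { ... }` block. This works for simple triples because SPARQL triple patterns accept the same prefixed names, `a`, and `;`/`,` abbreviations as Turtle; it is not a general converter (for instance, `@base` is ignored and the prefix-name pattern is simplified).

```python
import re

# Matches a Turtle prefix directive on its own line, e.g.
#   @prefix ex: <http://example.org/> .
PREFIX_LINE = re.compile(r"@prefix\s+([\w-]*):\s*<([^>]*)>\s*\.\s*$")


def turtle_to_ask(turtle: str) -> str:
    """Reuse the triples of a small Turtle document as a SPARQL ASK pattern.

    Assumes one @prefix directive per line and no @base or other directives.
    """
    prefixes, body = [], []
    for line in turtle.strip().splitlines():
        stripped = line.strip()
        m = PREFIX_LINE.match(stripped)
        if m:
            # Turtle "@prefix ex: <iri> ." becomes SPARQL "PREFIX ex: <iri>".
            prefixes.append(f"PREFIX {m.group(1)}: <{m.group(2)}>")
        elif stripped:
            # Triple lines are copied verbatim into the graph pattern.
            body.append("  " + stripped)
    return "\n".join(prefixes + ["ASK WHERE {"] + body + ["}"])
```

Given the two prefix lines for `ex:` and `owl:` followed by `ex:Car a owl:Class .`, the triple line comes through untouched inside the `ASK WHERE` block; only the prefix declarations change form.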
So now you know why Thetus thinks Turtle is the best syntax for writing ontologies by hand! What do you think?
~ Marijane White, Principal Engineer