Topic Maps and the Semantic Web

People are continually asking me to summarize the relationship between Topic Maps and the Semantic Web, and since there is nothing really succinct to point to, I thought I would write up some of my thoughts here. Don’t expect an exhaustive or scientific account; this is just my view from 30,000 feet, but some people may find it interesting. I plan to follow up later with more details and further observations.

The relationship between TM and RDF (the core Semantic Web technology) has interested me for several years. I was instrumental in getting Topic Maps approved as an ISO standard back in 1999, the same year the RDF specification became a W3C recommendation. I published my first thoughts on the relationship between the two in June 2000 (Topic Maps and RDF: A first cut) and I have contributed to the debate off and on since then.

My first cut was refined two years later as Ten Theses on Topic Maps and RDF, a piece that was partly informed by the work of Lars Marius Garshol in his ground-breaking paper Topic Maps, RDF, DAML, OIL – A comparison. Later I chaired the W3C RDF/Topic Maps Interoperability Task Force (RDFTM), which was charged with “providing guidelines for users who want to combine usage of the W3C’s RDF/OWL family of specifications and the ISO’s family of Topic Maps standards.” That committee produced two useful publications – a Survey of RDF/Topic Maps Interoperability Proposals and draft Guidelines for RDF/Topic Maps Interoperability – before being disbanded in late 2006 due to a reorganization of the W3C Working Group to which it belonged.

My take has always been that despite superficial similarities, RDF/OWL and Topic Maps are optimized for radically different purposes; they complement each other and we need both. Our efforts should focus on identifying synergies and enabling interoperability; and, to the extent that we discuss which of them is “best,” the discussion should be framed in terms of suitability for some (particular) purpose.

Background

First, some background. RDF and Topic Maps evolved in parallel during the 1990s within two communities that were scarcely aware of each other’s existence:

RDF (Resource Description Framework)

Evolved out of MCF and PICS with input from the Dublin Core community. Became a W3C recommendation in 1999. Its original purpose (as the name implies) was for assigning descriptive metadata to web pages and web sites (“resources”). Over time the concept of “resource” was generalized to mean “anything with identity” and RDF became a model for making assertions about arbitrary subjects.

Topic Maps

Originated as an application of HyTime for representing back-of-book indexes for the purpose of merging and other forms of processing. Evolved into a general, subject-centric model for capturing the “aboutness” of information resources in the form of networks of interconnected “topics,” “associations” and “occurrences”. Approved as an ISO standard in 1999.

The two specifications hit the streets within a few weeks of each other and quickly became hot topics within the XML community. At Extreme Markup 2000 in Montréal there was an entertaining free-for-all in which Eric Miller (representing RDF) and Eric Freese (representing Topic Maps) appeared on the podium dressed as boxers to slug it out on behalf of their respective specs. (In fact, they pretty much agreed on everything, and produced a useful initial list of correspondences between Topic Maps and RDF.)

In his closing keynote, Michael Sperberg-McQueen wondered if we really needed both RDF and Topic Maps and playfully suggested that representatives of the two communities be locked in a room and not let out until they had agreed on a single specification. (It’s a good job no-one took him seriously, or we would probably still be there!)

Similarities

So what are the similarities? I won’t go into detail, just list them as I did in a lightning talk at the W3C Technical Plenary in 2005:

  • Both “extend” XML into the realm of semantics

  • Both allow assertions to be made about subjects in the outside world

  • Both are very concerned with identity

  • Both define abstract, associative (graph-based) models

  • Both have XML-based interchange syntaxes (and simpler, text-based syntaxes)

  • Both allow some measure of inferencing or reasoning

  • Both have constraint languages and query languages

A lot more could be said on all of this, but right now I’m more interested in bringing out the differences.

Differences

In the lightning talk I confined myself to four fairly general areas in which I think RDF and Topic Maps differ significantly:

  • They have different roots (TM in traditional finding aids, such as indexes, thesauri and the like; RDF in document metadata and predicate logic).

  • They have different levels of semantics (RDF is more low level; TM has higher level semantics built directly into the model).

  • When you dig into the details you discover significantly different models (viz., identity, scope, association roles, non-binary relationships, variant names, etc.).

  • Perhaps most importantly, they have significantly different goals (RDF and OWL are positioned as enablers for large-scale data integration and/or an “artificially intelligent” web for software agents; Topic Maps is all about findability and knowledge federation for humans).

I admit that these characterizations are somewhat nebulous, but that doesn’t mean they are not real: it’s just hard to pinpoint the exact nature of the differences. Since the lightning talk I’ve tried using metaphors and aphorisms to convey a feel for the differences, as in:

  • RDF/OWL is for machines; Topic Maps is for humans.

  • RDF/OWL is optimized for inferencing; Topic Maps is optimized for findability.

  • The great strength of RDF/OWL is that it is based on formal logic; the great strength of Topic Maps is that it is not based on formal logic.

  • RDF/OWL is to mathematics as Topic Maps is to natural language.

  • RDF/OWL is to Aristotle as Topic Maps is to Wittgenstein.

(That last one is for the philosophically inclined; I’ll come back to it later.)

These statements are not meant to be taken literally, but my experience is that they can be helpful. If you want a more detailed or scientific account of the differences – in particular, the differences between the two models and how to map between them – I recommend starting with the RDFTM documents mentioned above. These also contain substantial bibliographies.
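
For readers who think in code, one of the model differences listed above (association roles and non-binary relationships) can be sketched roughly as follows. This is only an illustration in Python, not the mapping defined in the RDFTM guidelines: the class names Triple and Association, the function association_to_triples, and all the ex: identifiers are hypothetical names invented for the example. It shows a Topic Maps-style association with three role players and a scope, and one common way of flattening it into binary subject-predicate-object statements by introducing an intermediate node.

```python
from dataclasses import dataclass
from typing import NamedTuple


class Triple(NamedTuple):
    """An RDF-style statement: always exactly one subject, predicate and object."""
    subject: str
    predicate: str
    obj: str


@dataclass(frozen=True)
class Association:
    """A Topic Maps-style association: any number of role players, plus a scope."""
    type: str
    roles: tuple                      # (role type, player) pairs
    scope: frozenset = frozenset()    # contexts in which the assertion is held valid


def association_to_triples(assoc: Association, node_id: str) -> list:
    """Flatten an n-ary association into binary triples via an intermediate node.

    Assumption: this is the generic "n-ary relation" pattern, chosen for
    illustration only. Scope is dropped here; carrying it over would need
    further statements about the intermediate node.
    """
    triples = [Triple(node_id, "rdf:type", assoc.type)]
    for role_type, player in assoc.roles:
        triples.append(Triple(node_id, role_type, player))
    return triples


# Hypothetical example data: one three-way relationship, valid in one context.
employment = Association(
    type="ex:employment",
    roles=(
        ("ex:employee", "ex:alice"),
        ("ex:employer", "ex:acme"),
        ("ex:position", "ex:editor"),
    ),
    scope=frozenset({"ex:year-2005"}),
)

triples = association_to_triples(employment, "ex:employment-1")
for t in triples:
    print(t.subject, t.predicate, t.obj)
```

In this sketch the single association becomes four triples, and the scope has no direct counterpart on the triple side. Details of this kind are what a real mapping between the two models has to settle.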

When to use what?

Some folks are not particularly concerned with the differences; they just want to know when to use what. Here’s my take – once again, rather high level, but hopefully still useful – based on the following general premises:

  1. RDF is more low-level; oriented towards machines

  2. Topic Maps is more high-level; oriented towards humans

  3. OWL is oriented towards artificial intelligence

On this basis I offer the following rules of thumb (my 2 cents, as they say; feel free to differ):

  • Do you only want to encode document metadata?

    — RDF is ideal and you won’t need OWL.

  • Do you want to achieve subject-based classification of content?

    — Topic Maps provides the better combination of flexibility and human-friendliness.

  • Do you want to combine metadata and subject-based classification?

    — Go straight for Topic Maps, because it also supports metadata.

  • Do you want to encode knowledge for humans to use?

    — Topic Maps is the better solution for handling context and “fuzziness”.

  • Do you want to develop agent-based applications?

    — You are probably better served by RDF (with OWL), but if you already have Topic Maps, you’re halfway there.

Whatever you choose, you can sleep soundly in the knowledge that data can be moved from RDF to Topic Maps (and vice versa), thanks to the RDFTM interoperability work.