Topic Maps and the Semantic Web

People are continually asking me to summarize the relationship between Topic Maps and the Semantic Web, and since there is nothing really succinct to point to, I thought I would write up some of my thoughts here. Don’t expect an exhaustive or scientific account; this is just my view from 30,000 feet, but some people may find it interesting. I plan to follow up later with more details and further observations.

The relationship between TM and RDF (the core Semantic Web technology) has interested me for several years. I was instrumental in getting Topic Maps approved as an ISO standard back in 1999, the same year the RDF specification became a W3C recommendation. I published my first thoughts on the relationship between the two in June 2000 (Topic Maps and RDF: A first cut) and I have contributed to the debate off and on since then.

My first cut was refined two years later as Ten Theses on Topic Maps and RDF, a piece that was partly informed by the work of Lars Marius Garshol in his ground-breaking paper Topic Maps, RDF, DAML, OIL – A comparison. Later I chaired the W3C RDF/Topic Maps Interoperability Task Force (RDFTM), which was charged with “providing guidelines for users who want to combine usage of the W3C’s RDF/OWL family of specifications and the ISO’s family of Topic Maps standards.” That committee produced two useful publications – a Survey of RDF/Topic Maps Interoperability Proposals and draft Guidelines for RDF/Topic Maps Interoperability – before being disbanded in late 2006 due to a reorganization of the W3C Working Group to which it belonged.

My take has always been that despite superficial similarities, RDF/OWL and Topic Maps are optimized for radically different purposes; they complement each other and we need both. Our efforts should focus on identifying synergies and enabling interoperability; and, to the extent that we discuss which of them is “best,” the discussion should be framed in terms of suitability for some (particular) purpose.

Background

First, some background. RDF and Topic Maps evolved in parallel during the 1990s within two communities that were scarcely aware of each other’s existence:

RDF (Resource Description Framework)

Evolved out of MCF and PICS with input from the Dublin Core community. Became a W3C recommendation in 1999. Its original purpose (as the name implies) was for assigning descriptive metadata to web pages and web sites (“resources”). Over time the concept of “resource” was generalized to mean “anything with identity” and RDF became a model for making assertions about arbitrary subjects.

Topic Maps

Originated as an application of HyTime for representing back-of-book indexes for the purpose of merging and other forms of processing. Evolved into a general, subject-centric model for capturing the “aboutness” of information resources in the form of networks of interconnected “topics,” “associations” and “occurrences”. Approved as an ISO standard in 1999.
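To make the merging idea a little more concrete, here is a minimal Python sketch. The Topic class, its field names, and the example.org identifiers are all assumptions invented for this illustration; they are not taken from the ISO standard or from any Topic Maps engine. The only principle borrowed from the model is that topics sharing a subject identifier are taken to represent the same subject and can therefore be merged.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    """Hypothetical, heavily simplified topic (not the ISO data model)."""
    subject_identifiers: set[str] = field(default_factory=set)
    names: set[str] = field(default_factory=set)
    # Each occurrence is a (type, resource URL) pair.
    occurrences: set[tuple[str, str]] = field(default_factory=set)


def merge_topics(topics: list[Topic]) -> list[Topic]:
    """Merge topics that share at least one subject identifier."""
    merged: list[Topic] = []
    for topic in topics:
        # Work on a copy so the input topics are left untouched.
        combined = Topic(
            set(topic.subject_identifiers),
            set(topic.names),
            set(topic.occurrences),
        )
        remaining: list[Topic] = []
        for existing in merged:
            if existing.subject_identifiers & combined.subject_identifiers:
                # Same subject: pool identifiers, names and occurrences.
                combined.subject_identifiers |= existing.subject_identifiers
                combined.names |= existing.names
                combined.occurrences |= existing.occurrences
            else:
                remaining.append(existing)
        remaining.append(combined)
        merged = remaining
    return merged
```

Given two index entries that both carry the (made-up) identifier http://example.org/id/puccini, one named "Puccini" and the other "Giacomo Puccini", merge_topics returns a single topic holding both names and the occurrences of both.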

The two specifications hit the streets within a few weeks of each other and quickly became hot topics within the XML community. At Extreme Markup 2000 in Montréal there was an entertaining free-for-all in which Eric Miller (representing RDF) and Eric Freese (representing Topic Maps) appeared on the podium dressed as boxers to slug it out on behalf of their respective specs. (In fact, they pretty much agreed on everything, and produced a useful initial list of correspondences between Topic Maps and RDF.)

In his closing keynote, Michael Sperberg-McQueen wondered if we really needed both RDF and Topic Maps and playfully suggested that representatives of the two communities be locked in a room and not let out until they had agreed on a single specification. (It’s a good job no-one took him seriously, or we would probably still be there!)

Similarities

So what are the similarities? I won’t go into detail, just list them as I did in a lightning talk at the W3C Technical Plenary in 2005:

  • Both “extend” XML into the realm of semantics

  • Both allow assertions to be made about subjects in the outside world

  • Both are very concerned with identity

  • Both define abstract, associative (graph-based) models

  • Both have XML-based interchange syntaxes (and simpler, text-based syntaxes)

  • Both allow some measure of inferencing or reasoning

  • Both have constraint languages and query languages

A lot more could be said on all of this, but right now I’m more interested in bringing out the differences.

Differences

In the lightning talk I confined myself to four fairly general areas in which I think RDF and Topic Maps differ significantly:

  • They have different roots (TM in traditional finding aids, such as indexes, thesauri and the like; RDF in document metadata and predicate logic).

  • They have different levels of semantics (RDF is more low level; TM has higher level semantics built directly into the model).

  • When you dig into the details you discover significantly different models (viz., identity, scope, association roles, non-binary relationships, variant names, etc.).

  • Perhaps most importantly, they have significantly different goals (RDF and OWL are positioned as enablers for large-scale data integration and/or an “artificially intelligent” web for software agents; Topic Maps is all about findability and knowledge federation for humans).

I admit that these characterizations are somewhat nebulous, but that doesn’t mean they are not real: it’s just hard to pinpoint the exact nature of the differences. Since the lightning talk I’ve tried using metaphor and aphorisms to convey a feel for the differences, as in:

  • RDF/OWL is for machines; Topic Maps is for humans.

  • RDF/OWL is optimized for inferencing; Topic Maps is optimized for findability.

  • The great strength of RDF/OWL is that it is based on formal logic; the great strength of Topic Maps is that it is not based on formal logic.

  • RDF/OWL is to mathematics as Topic Maps is to natural language.

  • RDF/OWL is to Aristotle as Topic Maps is to Wittgenstein.

(That last one is for the philosophically inclined; I’ll come back to it later.)

These statements are not meant to be taken literally, but my experience is that they can be helpful. If you want a more detailed or scientific account of the differences – in particular, the differences between the two models and how to map between them – I recommend starting with the RDFTM documents mentioned above. These also contain substantial bibliographies.
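To give a feel for just one of the model differences listed earlier (association roles and non-binary relationships), here is a small Python sketch. Everything in it is a hypothetical illustration: the Association class, the to_triples helper, the node label and the ex: names are invented here, and this is not the mapping proposed in the RDFTM documents. It shows a three-party relationship held as a single association with named roles, and one possible way of flattening it into binary subject-predicate-object statements by introducing an extra node that stands for the relationship itself.

```python
from dataclasses import dataclass

# A binary statement: (subject, predicate, object).
Triple = tuple[str, str, str]


@dataclass(frozen=True)
class Association:
    """Hypothetical Topic Maps-style association: a type plus role players."""
    type: str
    roles: tuple[tuple[str, str], ...]  # (role type, player) pairs


def to_triples(assoc: Association, node: str) -> list[Triple]:
    """Flatten an n-ary association into binary triples.

    `node` is a caller-supplied label for the extra node that represents
    the relationship as a whole (an assumption of this sketch).
    """
    triples: list[Triple] = [(node, "rdf:type", assoc.type)]
    for role, player in assoc.roles:
        triples.append((node, role, player))
    return triples


employment = Association(
    type="ex:Employment",
    roles=(
        ("ex:employee", "ex:alice"),
        ("ex:employer", "ex:acme"),
        ("ex:position", "ex:editor"),
    ),
)

for triple in to_triples(employment, "_:a1"):
    print(triple)
```

The association keeps all three players and their roles together as one assertion, whereas the flattened form needs four separate statements that only hang together through the shared extra node.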

When to use what?

Some folks are not particularly concerned with the differences; they just want to know when to use what. Here’s my take – once again, rather high level, but hopefully still useful – based on the following general premisses:

  1. RDF is more low-level; oriented towards machines

  2. Topic Maps is more high-level; oriented towards humans

  3. OWL is oriented towards artificial intelligence

On this basis I offer the following rules of thumb (my 2 cents, as they say; feel free to differ):

  • Do you only want to encode document metadata?

    — RDF is ideal and you won’t need OWL.

  • Do you want to achieve subject-based classification of content?

    — Topic Maps provides the better combination of flexibility and human-friendliness.

  • Do you want to combine metadata and subject-based classification?

    — Go straight for Topic Maps, because it also supports metadata.

  • Do you want to encode knowledge for humans to use?

    — Topic Maps is the better solution for handling context and “fuzziness”.

  • Do you want to develop agent-based applications?

    — You are probably better served by RDF (with OWL), but if you already have Topic Maps, you’re half way there.

Whatever you choose, you can sleep soundly in the knowledge that data can be moved from RDF to Topic Maps (and vice versa), thanks to the RDFTM interoperability work.

8 thoughts on “Topic Maps and the Semantic Web”

  1. Pingback: danbri’s foaf stories » Beautiful plumage: Topic Maps Not Dead Yet

  2. Thank you, Steve; nice summary.

    Next challenge: and how do (or: how should) RDF and Topic Maps relate to colloquial XML? If one has (for example) an encyclopedia of a given subject area, or a library of journal articles, or both, then what information should be in colloquial XML, what information in Topic Maps, what information in RDF? (Sounds like a trick question, I know, but it’s not. I have an idea of the answer, but am not sure of it.)

    • Michael, I’m sorry I haven’t risen to your challenge yet. I fully intend to (and in the not-too-distant future).

      In the meantime I’m a bit intrigued by your use of the word “colloquial”. Can you explain what it means as applied to XML? Google gives me 352 hits, so the phrase clearly has some currency, but “colloquial” as used here surely doesn’t have quite its normal dictionary sense (as applied to a word or phrase) of “used in ordinary or familiar conversation; not formal or literary”.

  3. Pingback: Messages in a bottle » Blog Archive » RDF and Wittgenstein

    • (To Michael, trackback 2, below) Yes, exactly, it was the later Wittgenstein I had in mind — in particular the discussion (in Untersuchungen) of the category “game” and the notion of family resemblances.

      The contrast between the Aristotelian view of categories (defined in terms of necessary and sufficient conditions) and the view that category membership is based on resemblance to prototypical members seems somehow to be reflected in the contrast between RDF/OWL and Topic Maps (as also between generative and cognitive linguistics).

      It’s just a thought, though. I haven’t really thought it through and I’m not yet sure it will lead to any useful insights, but I hope to return to the topic one day.

  4. Very clearly put, Steve. My experiences with both RDF and Topic Maps fully support your analysis. Without going into details, when hand-crafting an RDF schema for the Finnish Museum of Photography (at that time there were no robust RDF modeling tools) I learnt that creating an ontology for human users was very laborious. RDF is like a box of small Lego blocks for building a house (say, a place where you can store your toy soldiers :)), whereas Topic Maps is like a well-thought-out set of wall pieces, doors, windows and roof components.
    (http://www.fortex.fi/references.html – check the RDF schema story, a link to Schema ref doc is there as well…)

    I have tested TMs in ontology modeling a few times. A larger project in which we successfully implemented TMs for a telecom information portal system convinced me that they are a better choice if your task is to codify knowledge structures that need to be communicated to human users. It was so much simpler to design and to explain to biz people. For larger information ecosystems the subject identity management concept is somewhat imperative. And it is all built into Topic Maps.
    (http://www.topicmaps.com/tm2008/hanninen.pdf )

    The small number of TM engines and the relatively narrow base of TM developers are the only downsides, but I believe the situation will improve when the OKS goes Open Source.

  5. Hi there

    As you’re writing in 2008 (although reporting a 2005 discussion, fair enough), I’m surprised you don’t mention W3C SKOS. It’s an RDF-based vocabulary for (roughly) thesaurus-like content. If you’re doing subject-based classification, SKOS deserves some serious attention. We have a lot of thesauri exposed in it now, as well as the Library of Congress subject headings (http://id.loc.gov/authorities/), and experiments from the Dewey and UDC classifications. I suspect mapping between SKOS/RDF and Topic Maps might go more smoothly than from general unconstrained RDF; I’m not sure about vice versa, i.e. whether many Topic Maps might be expressible in terms of SKOS structures…
