The first post in this series gave some background on the Semantic Web as traditionally conceived. This post gives an overview of three of the problems we're facing in making that vision a reality. I ran longer on these three than I expected, so I'll delve into the fourth problem, trust, in the next post.
As for the title, let me explain myself a bit. I like the Semantic Web vision; it has poetry. We'll likely keep seeing implementations that come incrementally closer to it. What I have a hard time swallowing are the claims, usually from software vendors, that they are delivering it today. There are just too many tough problems between us and the goal to believe that any one software vendor has solved them all.
If the vision of the Semantic Web is a distributed, worldwide library of facts that your computer can use to answer all sorts of questions for you, what makes it so hard to build? Let's look at three of the major problems.
First, creating an ontology is hard. An ontology is an explicit, computer-readable declaration of what exists in the world and how all of those things are related. If it is to include all of the myriad things that people care about, building it is a monstrously complex task.
The task of classifying… all the ideas that seek expression is the most stupendous of logical tasks. Anybody but the most accomplished logician must break down in it utterly; and even for the strongest man, it is the severest possible tax on the logical equipment and faculty.
- Charles Sanders Peirce
One way to tame this beast is to settle for an ontology that covers only the most popular subjects (e.g. food, travel, popular entertainment and consumer merchandise). We'll be able to ask our computers about mass-market things, but anything more unusual (Burmese culture, the history of organized crime, vacuum repair) would be out of the system's depth. It looks like Headup, among others, is tackling the problem from this angle.
The second problem: marking up semantic content is hard. Beyond the very simple cases, creating documents that effectively tell computers interesting facts is a job for experts; it's nowhere near as easy as HTML/CSS. OWL, the language the W3C has recommended for this work, is terrifically complex. A person needs to breathe first-order logic to use it in any interesting way. The general public is outclassed on this one.
Some less rigorous and less arduous ways to mark up content are showing up (e.g. Microformats). These provide a simpler syntax for marking up very common items like places and people. Some companies are also marking up major storehouses of information (like IMDB) by hand in order to provide the core information for the mass-market audience. In either case, the long tail of human knowledge is left out of the picture.
Even if we had a good model of what exists in the world and gobs of beautifully marked-up documents, we'd still face our third problem: reasoning. It is by no means simple to get a computer to draw logical conclusions in the wild. Getting a reasoner to the point where you can ask it a question and have it come back with an answer sometime before the heat death of the universe is not a simple task. Some questions simply have no answer, and those are the nice ones, because the computer can at least tell you so. Other questions are unanswerable in such a way that the computer can never know they are unanswerable; it may keep searching forever. Those are a bit nastier. All sorts of people are working on their doctorates on small subsets of this problem.
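To make the reasoning problem a little more concrete, here is a minimal sketch in Python, assuming facts are stored as simple (subject, predicate, object) triples. The predicate names subClassOf and type, the dog/animal facts, and the infer function are all hypothetical illustrations; this is not OWL and not any real reasoner's API. It repeatedly applies two subclass rules until nothing new turns up (a fixpoint).

```python
# A toy forward-chaining reasoner over (subject, predicate, object) triples.
# Hypothetical vocabulary and facts, for illustration only; the two rules
# loosely mimic subclass reasoning and are nowhere near full OWL.

Triple = tuple[str, str, str]

SUBCLASS = "subClassOf"
TYPE = "type"


def infer(facts: set[Triple]) -> set[Triple]:
    """Apply both rules repeatedly until no new triple appears."""
    known = set(facts)  # copy so the caller's set is not modified
    while True:
        new: set[Triple] = set()
        for (a, p1, b) in known:
            for (c, p2, d) in known:
                if b != c:
                    continue
                # Rule 1: A subClassOf B and B subClassOf D => A subClassOf D
                if p1 == SUBCLASS and p2 == SUBCLASS:
                    new.add((a, SUBCLASS, d))
                # Rule 2: x type B and B subClassOf D => x type D
                if p1 == TYPE and p2 == SUBCLASS:
                    new.add((a, TYPE, d))
        if new <= known:  # nothing new was derived: fixpoint reached
            return known
        known |= new


facts = {
    ("Dachshund", SUBCLASS, "Dog"),
    ("Dog", SUBCLASS, "Mammal"),
    ("Mammal", SUBCLASS, "Animal"),
    ("Rex", TYPE, "Dachshund"),
}
closure = infer(facts)
print(("Rex", TYPE, "Animal") in closure)  # True
```

With only these two rules the loop is guaranteed to stop, because it never invents new terms and so can only ever produce finitely many triples. Richer logics, such as full first-order logic, give up that guarantee, and that is where the nasty never-finishing cases come from.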
That’s three of the barriers – in the next post I’ll tackle the trust issue.