Why it’s hard to sell me on the Semantic Web – Part 1

A good friend of mine works as a social media editor.  We periodically get together for long lunches where the freewheeling conversation hits all the topics of note in the current communication scene.  I was surprised today when he brought up the question of the Semantic Web.  After a half-decade stint in the business of semantic technologies, I’ve basically written off the Semantic Web.  After ten years of failed promise, I’m always a bit surprised to hear another rumor of its pending existence.

In short – the Semantic Web promises to turn all of the text found on the web into machine-readable facts, and to provide programs that can use those facts to answer questions for you.  So, for example, a restaurant website may say “We’re located at 518 Chestnut Street, have a wide variety of sandwiches, and are open on Saturday.”  The website may give a full menu, driving directions, a list of daily specials, etc.  To a computer this looks like just a bunch of text – blah blah blah blah.  A semantically marked-up document would put a formal representation of this information in place along with the text.  Very loosely speaking, it would look something like this:

<Organization type="Restaurant" name="Bob’s Restaurant" id="1"/><isLocatedAt/><Address text="518 Chestnut Street"/>

<Organization id="1"/><sellsGoods/><Food type="sandwiches"/>

<Organization id="1"/><isOpen/><recurringDay value="7"/>

Once beautiful documents like this are in place, you can ask your computer a question like “Where can I get a sandwich on Saturday?” and the computer would come back with Bob’s Restaurant.  You could even give your computer quite complex tasks and have it come back with good answers –  “I have to pick up toothpaste, a watermelon, and a large camelhair coat, meet with the mayor, my fiancee, and my lawyer, and I want to get a good sandwich around lunchtime.  Please plan out a course of travel and schedule that takes into account expected traffic and the hours of the shops I have to visit. Also, let me know if I’m passing any place that’s having a going-out-of-business sale.”  The computer would hit tens of websites, communicate with other agents, and put together the schedule and information for you.
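
To make the simple sandwich question concrete, here is a minimal Python sketch.  It is only an illustration, not how any real Semantic Web tool works: it stores the restaurant facts from the markup above as subject/predicate/object triples and answers the question by intersecting two lookups.  The identifiers (FACTS, org:1, where_can_i_get) are names made up for this example, and it assumes that day 7 means Saturday.

```python
# Toy illustration only: the restaurant facts as
# (subject, predicate, object) triples. All names here are invented
# for this sketch and do not come from any real Semantic Web vocabulary.
FACTS = [
    ("org:1", "type", "Restaurant"),
    ("org:1", "name", "Bob's Restaurant"),
    ("org:1", "isLocatedAt", "518 Chestnut Street"),
    ("org:1", "sellsGoods", "sandwiches"),
    ("org:1", "isOpen", 7),  # recurring day 7, assumed here to mean Saturday
]


def subjects_with(facts, predicate, obj):
    """Return the set of subjects that have the given predicate/object pair."""
    return {s for (s, p, o) in facts if p == predicate and o == obj}


def lookup(facts, subject, predicate):
    """Return every object recorded for this subject and predicate."""
    return [o for (s, p, o) in facts if s == subject and p == predicate]


def where_can_i_get(facts, food, day):
    """Answer 'where can I get <food> on <day>?' by intersecting two fact sets."""
    sellers = subjects_with(facts, "sellsGoods", food)
    open_that_day = subjects_with(facts, "isOpen", day)
    return [
        (lookup(facts, org, "name")[0], lookup(facts, org, "isLocatedAt")[0])
        for org in sorted(sellers & open_that_day)
    ]


print(where_can_i_get(FACTS, "sandwiches", 7))
```

The lookup only works because every fact uses exactly the predicate names the query expects.  The errand-planning request is a different order of problem: it would need facts like these from tens of independent sites, with traffic and scheduling layered on top.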

That’s the dream.  No less a figure than Tim Berners-Lee, the father of the web, has been championing this for years.  The seminal article on the topic was published in 2001.

There are a few major roadblocks.  Teaching computers about common sense is hard – that’s the ontology problem.  Creating those beautiful documents above is hard – that’s the markup problem.  Teaching computers to reason through all those facts is hard – that’s the reasoning problem.  The one I’d like to really focus on, though, is the trust problem.  I’ll post on that one in the coming days.

9 thoughts on “Why it’s hard to sell me on the Semantic Web – Part 1”

  1. Hi Eliezer,

    Maybe it wasn’t just a coincidence that your friend brought up the Semantic Web the other day!

    Could be that he attended the Jerusalem Web Professionals meeting last week, where representatives of the Israeli startup Headup presented their Semantic Web add-on.

    Check out an excellent blog post written by Debi Zylbermann on the evening.

    Looking forward to reading your future posts.


  2. Joel –

    Thanks for the background. I had seen this event coming up, but it had slipped off my radar. I even tried to give Headup a shot – it’s not Linux friendly, though, so I’ll have to bounce over to Windows to test it.

    My initial sense is that Headup is using a very limited set of information sources and has added a semantic layer to them by hand. It looks like they’ve focused on the high value areas (hotels, music, etc.), gambling on an 80/20 principle that most people really only care about that simple stuff anyway. It appears to be pretty far away from the larger vision of the Semantic Web.

  3. Looking forward to the post on the trust problem, Eliezer.

    We at Thomson Reuters (which bought ClearForest in ’07) have been working on automating the markup problem with the free Calais Web service and open API at OpenCalais.com. We’re up to 9,000 developers in the Calais community and starting to see some interesting applications.

    In our new release (4.0), we have also forged a connection to the Linked Data Cloud and introduced a new global metadata transport layer.

    The Calais / ClearForest team

  4. Pingback: Sowing Light » Why it’s hard to sell me on the Semantic Web - Part 2

  5. Pingback: Sowing Light » Why it’s hard to sell me on the Semantic Web - Part 3

  6. Hi Eliezer,

    I counted to ten before writing this response but I’m still hopping mad.

    Is it too much to expect you’d have the decency to try something out before passing judgment on it?

    It’s the least I’d expect from someone who presents himself as a teacher, “technologist and spritual seeker” (the typo, dear teacher, is in the original BTW).

    As a matter of fact Headup’s framework for accepting sources is expandable indefinitely. If it has an API or an RSS we can crunch it and use it as a source.

    We’re in the process of simplifying adding sources to a point that users will be able to add sources to Headup by themselves.

    According to Tim Berners-Lee the “larger vision of the Semantic Web” is this:

    “I have a dream for the Web [in which computers] become capable of analyzing all the data on the Web – the content, links, and transactions between people and computers”

    I don’t claim Headup is there yet but in my humble opinion we’re definitely well on our way, and a damn sight closer than any other player in the field.

    The simple truth is that, despite having nearly a decade to prove itself, the bottom-up approach to the Semantic Web has failed dismally. Defined by a bunch of academics, it relies on content publishers’ compliance with cumbersome standards that are costly to adopt and have yet to prove their financial viability.

    Headup and other top-down Semantic Web applications are far more likely to succeed exactly because they are pushed by enterprise, require nothing from publishers and present financially sound models (Amazon’s API for example…).

    “I tweet @headup”

  7. Mike –

    First off, thanks for the spelling correction.

    Regarding the tone of what you wrote, I’m not sure I understand what happened in between this comment that you posted, and your comment above. I don’t think I whistled a significantly different tune in my two mentions of your company.

    To cover my bases, I booted into Windows to try out your software. It seems to work alright for software at this stage of development. It adds a quick way to do the obvious searches off of a single node (term or phrase). There’s promise in the way it integrates some of the intelligence from the social networks.

    However it doesn’t do, nor claim to do, any sort of reasoning. I would wager that under the hood it doesn’t understand relationships on any more sophisticated level than ‘related’. It casts a wide automatic search net, but at heart isn’t doing anything more than matching text in what it finds. This makes your claim to being a ‘semantic web’ tool ring hollow to these ears.

    Regarding your other claims about the difficulty of the bottom-up approach, etc., I think we’re substantially in agreement. The later posts in this series spell out my position, and I think you even gave me a ‘word up’ in the comment linked above.

    I’m interested to see where Headup heads to. Opening up the API is intriguing. Of course, whether you provide value to the market will determine your success, not words on my blog.

