Wolfram Alpha

On the knowledge management front, it looks like Wolfram is about to take a major step forward.  I don’t make these predictions often, but based on Wolfram’s track record and what it sounds like they’re set to deliver, this could be the next step in knowledge search, research, and computation.  Meaning: it looks like it’s going to make the Google/Wikipedia method look a tad dusty.  Wolfram Alpha is set to go live sometime in May. There’s a blog that gives us a bit of a tease.

Dr. Stephen Wolfram demonstrated Wolfram Alpha yesterday at Harvard.  The video (embedded below) gives us tons of mouth-watering scenarios and footage of Dr. Wolfram typing, but almost no views of the product performing.  It sounds like it’s working.  It sounds like he’s demonstrating something quite amazing.

Hat tip to Joel Katz for the heads up.

Stay tuned…

Why it’s hard to sell me on the Semantic Web – Part 3

This is the third in a series.  Part 1 covered the basics of the Semantic Web vision. Part 2 gave a brief overview of 3 problems in the way – all of them of a technical nature.  This post looks at a problem that is not just technical – Trust.

When it comes to computer agents answering questions for me, trust is an essential problem, not a technical one.  Whenever I ask a question and get an answer, I’m outsourcing trust.  I’m believing in the answer and in the source of that answer.  If I’m asking a computer, I’m trusting the computer and the results that it will return.

What’s Good to Eat Around Here?
If I’m asking a simple question like “where’s the nearest stop for the 17 bus”, there’s not much room for mistrust, but if the question is any more complex, trust becomes a serious issue.  Let’s say I’m asking the question – “Where is the nearest place I can get a good sandwich at a decent price?”  Of course there are issues of ontology, markup, and reasoning involved here (What qualifies as a sandwich?  Am I talking about food or construction supplies?  How does one determine ‘decent price’?  How does one define ‘nearest’?).  But let’s look at the one word that begs the trust question – good.

Nowadays, to find out if a restaurant has a good sandwich, I can hit a whole bunch of websites looking for reviews.  For each piece of information I see, I make a judgment about whether to trust that piece of information.  I’ll use all sorts of subtle and not-so-subtle clues to decide to trust or not.  I look at what site it’s on, what else the person there has posted, how they express themselves, whether it’s balanced, whether it uses criteria I value – ultimately, there’s an element of intuition to it.  When I ask my computer the question and the computer comes back with an answer, the decisions of trust are left to the computer.

Is there a Doctor Nearby?
The word “good” begs the trust question directly, but the question comes up even in less opinion-oriented questions.   The computer’s entire concept of reality is taught to it by people.  Who do you trust to teach your computer about what exists?  To teach it what is consequential and what is not, what is worthy of mention and what is not, what is part of reality and what is not?    

Let’s keep it simple.  If I own a restaurant that serves wraps, and I know that most of the world searches for “sandwiches”, not “wraps”, I’ll publish an ontology that says “A wrap is a sandwich (a really valuable sandwich)”. My competitor down the street, a standard deli, will publish an ontology that says “Wraps aren’t sandwiches, people looking for sandwiches don’t want wraps, and wraps aren’t worth anything.”  Which one does the computer trust?  Similar questions will come up in all domains – politics, economics, news, medicine, nutrition, etc.
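To make that concrete, here is a rough sketch of the two competing claims. I’ve written it with Python’s rdflib purely as an illustration (the toolkit, class names, and URIs are my own inventions, not anything the Semantic Web stack prescribes). Both claims are perfectly well-formed ontology statements, and nothing in the machinery tells the computer which publisher to believe.

from rdflib import Graph, Namespace
from rdflib.namespace import OWL, RDFS

EX = Namespace("http://example.org/food#")

# The wrap shop's published ontology: "a wrap is a (really valuable) sandwich".
wrap_shop = Graph()
wrap_shop.add((EX.Wrap, RDFS.subClassOf, EX.Sandwich))

# The deli's published ontology: "wraps are no kind of sandwich at all".
deli = Graph()
deli.add((EX.Wrap, OWL.disjointWith, EX.Sandwich))

# A crawler that merges the two published sources now holds both claims at
# once, with no principled way to decide between them.
merged = wrap_shop + deli
for triple in merged:
    print(triple)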

If businesses know that I am searching through semantic agents, they’ll do everything they can to optimize their business to be discovered by semantic agents.  This includes, of course, declaring themselves a good fit in as many ways as they possibly can. With computer agents returning information, we can expect this to be standard practice for any business looking to attract customers.

As soon as we farm off our question answering to an outside agent, we can’t avoid this problem.  The definitions of everything will still be up for great debate – only we will have abdicated our right to answer the question and entrusted it to our computers. 

Who do you Trust?
There may be a first glimmer of a solution to this question in the social network. The social network provides an explicit declaration of who I trust.  The computer can tell me “You can believe this review, because someone you trust (or someone they trust) posted it.”
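Here is a minimal sketch of that idea in plain Python, with the names and the two-hop cutoff invented for illustration: treat the social graph as a who-knows-whom dictionary and accept a review only if its author sits within a couple of hops of me.

from collections import deque

# A made-up who-knows-whom graph; "me" is the person asking the question.
knows = {
    "me":   ["dina", "joel"],
    "dina": ["avi"],
    "joel": ["sara"],
    "avi":  [],
    "sara": ["stranger"],
}

def trusted(author, start="me", max_hops=2):
    """Return True if `author` is reachable from `start` within `max_hops`."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        person, hops = queue.popleft()
        if person == author:
            return True
        if hops == max_hops:
            continue
        for friend in knows.get(person, []):
            if friend not in seen:
                seen.add(friend)
                queue.append((friend, hops + 1))
    return False

print(trusted("sara"))      # True: a friend of a friend posted the review
print(trusted("stranger"))  # False: three hops out, beyond the trust horizon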

The current networks are far too limited to cover the broad range of issues that will come up.  I may be interested in something that none of my friends know anything about.  To broaden the footprint of trust, we may see the formation of societies of mutual trust.  They will collectively form a vision of reality and self-police to ensure the lack of misleading information.  There would have to be many of these, as my conception of reality may not jibe with yours.  The same question will have different answers depending on the differing underlying assumptions and networks of trust.

In Summary
So that’s a capsule of my thoughts on the Semantic Web.  We’re making slow progress on each of these questions, but the questions are big and the progress is incremental.  The “Semantic Web” is growing organically – don’t buy it when the next start-up tells you they are delivering it to your door.  

Why it’s hard to sell me on the Semantic Web – Part 2

The first post in this series gave a background to the Semantic Web, as traditionally conceived.  This post gives an overview of three of the problems we’re facing in making that vision a reality.  I went longer on these three than I had thought I would.  I’ll delve into the fourth problem – trust – in the next post.

As for the title – let me clarify myself a bit.  I like the Semantic Web vision – it has poetry.  We’ll likely continue to see incrementally closer and closer implementations of it.  What I have a hard time swallowing are the claims (usually from software vendors) that they are delivering it today.  There are just too many tough problems between us and the goal to imagine that it’s all been solved by one software vendor.

If the vision of the Semantic Web is creating a distributed world-wide library of facts that your computer can use to answer all sorts of questions for you – what makes it so hard?   Let’s take a look at three of the major problems.

First, creating an ontology is hard. An ontology is an explicit, computer-readable declaration of what exists in the world and how all of those things are related.  If it’s to really include all of the myriad things that people care about, it’s a monstrously complex task.

The task of classifying… all the ideas that seek expression is the most stupendous of logical tasks.  Anybody but the most accomplished logician must break down in it utterly; and even for the strongest man, it is the severest possible tax on the logical equipment and faculty. – Charles Sanders Peirce

One way to tame this beast is to settle for an ontology that only covers the most popular items (e.g. food, travel, popular entertainment and consumer merchandise).  We’ll be able to ask our computers about mass-market things, but anything more unusual (Burmese culture, history of organized crime, vacuum repair) would be outside of the system’s depth.  It looks like Headup, among others, is tackling the problem from this angle.

The second problem – marking up semantic content is hard.  Beyond the very simple cases, creating documents that effectively tell computers interesting facts is a job for experts; it’s not nearly as easy as HTML/CSS.  OWL, the language that the W3C has recommended for doing this work, is terrifically complex.  A person needs to breathe first-order logic in order to use it in any interesting way.  The general public is outclassed on this one.
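To give a feel for why this is expert territory, here is a hedged sketch (built with Python’s rdflib, with invented class and property names) of what even the innocuous claim “a sandwich is a food with at least one bread ingredient” turns into: a blank-node owl:Restriction.

from rdflib import BNode, Graph, Namespace
from rdflib.namespace import OWL, RDF, RDFS

EX = Namespace("http://example.org/food#")
g = Graph()
g.bind("owl", OWL)
g.bind("ex", EX)

# The restriction itself lives on an anonymous (blank) node.
restriction = BNode()
g.add((restriction, RDF.type, OWL.Restriction))
g.add((restriction, OWL.onProperty, EX.hasIngredient))
g.add((restriction, OWL.someValuesFrom, EX.Bread))

# A Sandwich is a Food, and every Sandwich satisfies the restriction above.
g.add((EX.Sandwich, RDFS.subClassOf, EX.Food))
g.add((EX.Sandwich, RDFS.subClassOf, restriction))

print(g.serialize(format="turtle"))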

Some less rigorous and less arduous ways to mark up content are showing up (e.g. Microformats).  These provide a simpler syntax for marking up very common items like places and people.   Some companies are also marking up major storehouses of information (like IMDB) by hand in order to provide the core information for the mass-market audience.  In either case, the long tail of human knowledge is left out of the picture.

Even if we were to have a good model of what exists in the world and gobs of documents all marked up beautifully, we’d still have our third problem – the reasoning problem.  It is by no means simple to get a computer to do acts of logic in the wild.  Getting these reasoners rolling to the point where you can ask them a question and have them come back with an answer sometime before the heat death of the universe is not a simple task.  Some questions are simply not answerable, but those are the nice ones.  There are some questions that are unanswerable in such a way that the computer will never know they are unanswerable – those are a bit nastier.  There are all sorts of people working on their doctorates on just small subsets of this problem.
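As a toy illustration, and nothing more, here is a naive forward-chaining loop in plain Python. Before a computer can answer “show me every sandwich”, it has to infer types through the subclass hierarchy; this works fine for five facts, but richer logics make the same saturation blow up or never terminate.

# Three asserted facts, in the spirit of the sandwich examples above.
facts = {
    ("Wrap", "subClassOf", "Sandwich"),
    ("Sandwich", "subClassOf", "Food"),
    ("bobs_special", "type", "Wrap"),
}

def saturate(triples):
    """Apply two RDFS-style rules until no new facts appear."""
    triples = set(triples)
    while True:
        new = set()
        for (a, p1, b) in triples:
            for (c, p2, d) in triples:
                if p1 == "subClassOf" and p2 == "subClassOf" and b == c:
                    new.add((a, "subClassOf", d))   # subclass transitivity
                if p1 == "type" and p2 == "subClassOf" and b == c:
                    new.add((a, "type", d))         # type propagation
        if new <= triples:
            return triples
        triples |= new

inferred = saturate(facts)
# True, but only after inference; the fact was never stated directly.
print(("bobs_special", "type", "Sandwich") in inferred)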

That’s three of the barriers – in the next post I’ll tackle the trust issue.

Why it’s hard to sell me on the Semantic Web – Part 1

A good friend of mine works as a social media editor.  We periodically get together for long lunches where the free-wheeling conversation hits all the topics of note in the current communication scene.  I was surprised today when he brought up the question of the Semantic Web.  After a half-decade stint in the business of semantic technologies, I’ve basically written off the Semantic Web.  After ten years of failed promise, I’m always a bit surprised to hear another rumor of its pending existence.

In short – the Semantic Web promises to turn all of the text found on the web into machine readable facts, and to provide programs that can use those facts to answer questions for you.  So, for example, a restaurant website may say “We’re located at 518 Chestnut Street, have a wide variety of sandwiches, and are open on Saturday.”  The website may give a full menu, driving directions, a list of daily specials, etc.  To a computer this looks like just a bunch of text – blah blah blah blah.  A semantically marked up document would put a formal representation of this information in place along with the text.  Very loosely speaking, it would look something like this:

<Organization type="Restaurant" name="Bob's Restaurant" id="1"/><isLocatedAt/><Address text="518 Chestnut Street"/>

<Organization id="1"/><sellsGoods/><Food type="sandwiches"/>

<Organization id="1"/><isOpen/><recurringDay="7"/>

Once beautiful documents like this are in place, you can ask your computer a question like “Where can I get a sandwich on Saturday”, and the computer would come back with my restaurant.  You could even give your computer quite complex tasks and have it come back with good answers –  “I have to pick up toothpaste, a watermelon, and a large camelhair coat, meet with the mayor, my fiancee, and my lawyer, and I want to get a good sandwich around lunchtime.  Please plan out a course of travel and schedule that takes into account expected traffic and the hours of the shops I have to visit. Also, let me know if I’m passing any place that’s having a going-out-of-business sale.”  The computer would hit tens of websites, communicate with other agents, and put together the schedule and information for you.
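For a sense of what those documents buy you, here is a hedged sketch that uses Python’s rdflib and a SPARQL query as stand-ins for whatever the actual agent would do. The URIs and property names loosely mirror the markup above and are invented for this example.

from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF

EX = Namespace("http://example.org/terms#")
g = Graph()

# Facts the restaurant's marked-up page would contribute to the graph.
bobs = EX.BobsRestaurant
g.add((bobs, RDF.type, EX.Restaurant))
g.add((bobs, EX.isLocatedAt, Literal("518 Chestnut Street")))
g.add((bobs, EX.sellsGoods, EX.Sandwiches))
g.add((bobs, EX.isOpenOn, Literal("Saturday")))

# "Where can I get a sandwich on Saturday?"
results = g.query(
    """
    SELECT ?place ?address WHERE {
        ?place a ex:Restaurant ;
               ex:sellsGoods ex:Sandwiches ;
               ex:isOpenOn "Saturday" ;
               ex:isLocatedAt ?address .
    }
    """,
    initNs={"ex": EX},
)
for place, address in results:
    print(place, address)  # the restaurant's URI and its street address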

That’s the dream.  No less a figure than Tim Berners-Lee, the father of the web, has been championing this for years.  The seminal article on the topic was published in 2001.

There are a few major roadblocks.  Teaching computers about common sense is hard – that’s the ontology problem.  Creating those beautiful documents above is hard – that’s the markup problem.  Teaching computers to reason through all those facts is hard – that’s the reasoning problem.  The one I’d like to really focus on, though, is the trust problem.  I’ll post on that one in the coming days.

A Social Question

I’m thinking about social networks, and an apple falls on my head. The geeks were on this train eight years ago, but no one noticed. And you know what? They’re still on the same train, and no one notices. FOAF, or ‘Friend of a Friend’, is a formal, computer-understandable way of declaring who knows whom. Put a bunch of FOAF documents together, and bing, you’ve got a social network. FOAF started a good long time ago, as part of the grassroots technical effort behind the fabled Semantic Web, but just like the Semantic Web, it never hit its growth spurt.
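For the curious, here is roughly what one of those documents amounts to, sketched with Python’s rdflib (my own tooling choice, with invented names): two people, one “knows” link, serialized into the RDF/XML a crawler would actually have to find and parse.

from rdflib import Graph, Literal, URIRef
from rdflib.namespace import FOAF, RDF

g = Graph()
g.bind("foaf", FOAF)

# Two (invented) people, identified by URIs, and one "knows" link.
alice = URIRef("http://example.org/people/alice#me")
bob = URIRef("http://example.org/people/bob#me")

g.add((alice, RDF.type, FOAF.Person))
g.add((alice, FOAF.name, Literal("Alice Example")))
g.add((alice, FOAF.knows, bob))

g.add((bob, RDF.type, FOAF.Person))
g.add((bob, FOAF.name, Literal("Bob Example")))

# Print the RDF/XML document a crawler would find and parse.
print(g.serialize(format="xml"))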

I googled FOAF – 6 and a half million results. Not bad.

Facebook? 36 million. “Social Network”? 15 million. OpenSocial? nearly 11 million.

And FOAF had a six year head start. Eight in the case of OpenSocial.

 

What factors make FOAF just a footnote? A few things –

1) It’s hard – FOAF is all about geeks, from beginning to end. Writing a FOAF document is hard, getting it online is hard, and doing anything with it is hard. The potential market of the technology is limited by its form. The technology was never put within reach of the masses.

2) It’s boring – Who cares if one computer coder is friends with another? The declaration of this knowledge is computationally interesting, but it doesn’t do anything. There’s no sizzle to sell. It creates a social graph, but there’s no socializing happening.

3) It’s artificial – In FOAF, the social connections aren’t created organically, they have to be constructed. If sending an email created a FOAF connection, that would be organic. As it stands now, someone has to go out and document reality. If you want to document reality, it’s much better if the reality forges its own documentation.

 

Let’s look at Facebook, on the other hand. It loses points on the technical purity and openness scale; it’s a big mean closed network. But it gets the three points above spot on.

1) It’s stupid easy to use – there’s barely any barrier to entry at all. Point and click, instant gratification, AJAX love.

2) It’s exciting – on Facebook you can see pictures of people who you might want to date. There has never been a more powerful engine for technical adoption. Period.

3) It’s organic – I create connections on Facebook by going about my daily business – talking to people, showing off, looking for love, complimenting others, planning a party, building a cause. It’s all sorts of organic; it’s useful.

 

Easy, exciting, and organic. Can we do the same thing for other, otherwise-doomed Semantic Web technologies? How do you make OWL easy, exciting, and organic? I’d love to hear your insights.