This morning I woke up to the happy discovery that Semantic UMW got a shout-out in the New Media Consortium’s 2009 Horizon Report. The Horizon Report gives an overview of emerging tools, practices, and ideas that could be significant for education. This year’s report puts semantic-aware applications on the four-to-five-year horizon, and kindly included this project in a list of examples. Many thanks for the mention, and many thanks, too, for all the hard work and careful consideration that I know goes into producing this report. You all are great!
I’d like to expand, too, on one of the points made in the report:
“There are currently two theoretical approaches to developing the semantic capacity of the web. One, the bottom-up approach, is problematic in that it assumes metadata will be added to each piece of content to include information about its context; tagging at the concept level, if you will.”
To a large degree this is true: providing precise semantic metadata is, I think, more than can be reasonably expected of most people (see, for example, Hapax Tag-omena and Visual Design and Information Design). But I think there’s also room for wider possibilities in the bottom-up approach since, after all, most of what’s going on here is based on the bottom-up approach.
First, since the Report mentions efforts at using linking data (see also Links–They’re Not Just For Breakfast and Google Anymore!), it’s worth saying that grabbing the link data is also a bottom-up approach. It’s based on the idea of scraping out data that authors are already automatically including, and exploring what possibilities there are for doing nifty things with it.
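To make the link-scraping idea concrete, here is a minimal sketch in Python. It is not the code behind Semantic UMW; the `linksTo` predicate name and the example URLs are made up for illustration. It just shows that the links an author has already written can be read as simple subject-predicate-object statements.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag, resolved against the post's URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links so every target is a full URL.
                self.links.append(urljoin(self.base_url, value))


def links_as_triples(post_url, html):
    """Turn the links in a post into (subject, predicate, object) triples.

    "linksTo" is a hypothetical predicate name used only for this sketch.
    """
    collector = LinkCollector(post_url)
    collector.feed(html)
    return [(post_url, "linksTo", target) for target in collector.links]


if __name__ == "__main__":
    sample = '<p>See <a href="http://dbpedia.org/">DBpedia</a> and <a href="/about">about</a>.</p>'
    for triple in links_as_triples("http://example.org/blog/post-1", sample):
        print(triple)
```

The author never had to think about metadata here. The statements come entirely from links that were already in the post.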
And that leads into some bigger-picture bottom-up approaches, and what we can and cannot reasonably expect of authors. There was a time when, I think, many people would have thought it outlandish to imagine creating an encyclopedia simply from the knowledge that anyone wants to contribute. Wikipedia obviously shows us that that was wrong. People do want to share information and make it work — when they see clearly the utility and power of doing so. Importantly, the info-boxes on many Wikipedia pages turned out to be a very helpful tool for communicating knowledge. They help authors express knowledge by providing an easy-to-use structure within which to communicate.
And the presence of that structure is the important bit for making data available on the semantic web. DBpedia has made use of that structure in Wikipedia, scraping what people are already contributing into structured, semantic web data. That is a bottom-up approach that gleans an enormous amount of semantic data, simply based on the idea that people will contribute knowledge.
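As a rough illustration of why that structure matters, here is a toy sketch in Python that reads the `| key = value` lines of an info-box and turns them into triples. This is not how DBpedia's actual extraction works, which is far more thorough; the function name and the simplified wikitext sample are assumptions made for the example.

```python
import re

# Matches a simple "| key = value" line from info-box wikitext.
FIELD = re.compile(r"^\s*\|\s*([^=|]+?)\s*=\s*(.+?)\s*$")
# Matches [[Target]] or [[Target|Label]] and keeps the visible text.
WIKI_LINK = re.compile(r"\[\[(?:[^|\]]*\|)?([^\]]+)\]\]")


def infobox_to_triples(subject, wikitext):
    """Return (subject, field, value) triples for each simple info-box field.

    A toy parser: it ignores nested templates and multi-line values.
    """
    triples = []
    for line in wikitext.splitlines():
        match = FIELD.match(line)
        if match:
            key, value = match.groups()
            value = WIKI_LINK.sub(r"\1", value)
            triples.append((subject, key, value))
    return triples


if __name__ == "__main__":
    # Simplified sample written for this sketch, not copied from Wikipedia.
    sample = """{{Infobox settlement
| name = Paris
| country = [[France]]
| timezone = [[Central European Time|CET]]
}}"""
    for triple in infobox_to_triples("Paris", sample):
        print(triple)
```

The person filling in the info-box is only filling in a form. The structure of the form is what lets someone else read the result as data.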
What’s still in development in the bigger picture are the tools to make it easy for people to add semantic data. The Report also mentions some plugins for WordPress that are designed to do exactly that. Indeed, precise semantics are more than we should ask of the average blogger. But the tools available are moving toward augmenting what can be expected of them. To take my favorite tagging example, I don’t think it’s reasonable to ask a writer to disambiguate all the things that “Paris” could refer to. But when we have the tools that offer semantically precise suggestions, we’ve augmented the precision of that author’s categorization. That’s bottom-up data they are already offering, augmented.
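Here is a small Python sketch of what such a suggestion tool might do with the “Paris” tag. Everything in it is an assumption for illustration: the hand-written candidate table, the hint words, and the crude word-overlap scoring stand in for whatever a real plugin would look up from a service. The point is only that the author types a plain tag and the tool offers precise identifiers to pick from.

```python
import re

# Hypothetical candidate table for this sketch. A real tool would query a
# lookup service instead of using a hand-written dictionary.
CANDIDATES = {
    "paris": [
        {"uri": "http://dbpedia.org/resource/Paris",
         "hints": {"france", "seine", "louvre"}},
        {"uri": "http://dbpedia.org/resource/Paris,_Texas",
         "hints": {"texas", "county"}},
        {"uri": "http://dbpedia.org/resource/Paris_Hilton",
         "hints": {"hilton", "celebrity"}},
    ]
}


def suggest(tag, post_text):
    """Rank candidate URIs for a tag by how many hint words the post contains."""
    words = set(re.findall(r"[a-z]+", post_text.lower()))
    candidates = CANDIDATES.get(tag.lower(), [])
    # sorted() is stable, so ties keep the order of the candidate table.
    ranked = sorted(candidates, key=lambda c: len(c["hints"] & words), reverse=True)
    return [c["uri"] for c in ranked]


if __name__ == "__main__":
    post = "We drove across Texas and stopped for lunch in Lamar County."
    for uri in suggest("Paris", post):
        print(uri)
```

The writer still only has to choose from a short list, which is a much smaller ask than working out the disambiguation alone.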
Same with the data that people already love to add to their content. Geo-data in Flickr. Reviews of movies and songs. Maps of their surroundings. Their own interests and social networks. Their favorite songs. What’s on their bookshelves. What’s on their wishlists. People are already adding this wealth of metadata to the web, and other people are already making it semantically accessible and tinkering with what can be done with it. (The Linked Data effort is central here. Here’s a map of the data that exists ready-to-play-with.)
Importantly, all this linked data, like the data I’m playing with in Semantic UMW, doesn’t ask any author to think, “How do I make metadata about my blog semantic?” I don’t want them to think that. I want them to concentrate on their creativity. I do want to glean as much as I can from it, and encourage some good practices (like extensive linking) that will make more info available. People like to share, and the tools for augmenting the power of their sharing, by helping to add precise semantics, are growing in strength and scope.
And so I think that there is more potential in the bottom-up approach than meets the eye, because it doesn’t hinge on the authors thinking through semantic data. It hinges on what other folks can do to augment their thinking through nifty plugins, and on what other folks can do to derive nice data from current, real-world practices.
There’s still much work to do in these directions, but over the past 18 months huge strides have been made. Are full-on applications using these approaches still five years out? Maybe. But not if I can do anything about it.