Semantic UMW

January, 2009:

An Aside To Other Things

A post on my more technical blog recently caught the eye of the good folks at Talis, and I had a great time talking with them about it.  The things going on here aren’t quite directly connected to this project, but some of the core principles about open data, educational possibilities, and linked data show up in the great conversation we had. (And I hope to make this a start toward some of the ideas we talked about.)

So, if you are interested in how library data, linked data, and more might come together in the future, give a listen to the podcast here.

I’m hoping to merge what’s going on here with the idea that we talked about there.  So much to do, so much to do!

Semantic UMW in the NMC 2009 Horizon Report

This morning I woke up to the happy discovery that Semantic UMW got a shout-out in the New Media Consortium’s 2009 Horizon Report.  The Horizon Report gives an overview of emerging tools, practices, and ideas that could be significant for education.  This year’s report puts semantic-aware applications in the four-to-five-year horizon, and kindly includes this project in a list of examples.  Many thanks for the mention, and many thanks, too, for all the hard work and careful consideration that I know goes into producing this report.  You all are great!

I’d like to expand, too, on one of the points made in the report:

“There are currently two theoretical approaches to developing the semantic capacity of the web. One, the bottom-up approach, is problematic in that it assumes metadata will be added to each piece of content to include information about its context; tagging at the concept level, if you will.”

To a large degree this is true: providing precise semantic metadata is, I think, more than can be reasonably expected of most people (see, for example, Hapax Tag-omena and Visual Design and Information Design).  But I think there’s also room for wider possibilities in the bottom-up approach since, after all, most of what’s going on here is based on the bottom-up approach.

First, since the Report mentions efforts at using linking data (see also Links–They’re Not Just For Breakfast and Google Anymore!), it’s worth saying that grabbing the link data is also a bottom-up approach.  It’s based on the idea of scraping out data that authors are already automatically including, and exploring what possibilities there are for doing nifty things with it.
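To make the link-scraping idea a bit more concrete, here is a minimal Python sketch. It is not the actual Semantic UMW scraper; the class name, the function name, and the "linksTo" label are all made up for illustration. It only shows that the links an author has already written can be pulled out as simple subject/predicate/object statements.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag in a chunk of HTML."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the post's own URL.
                self.links.append(urljoin(self.base_url, value))


def links_as_triples(post_url, post_html):
    """Return (post, 'linksTo', target) tuples for each http(s) link.

    'linksTo' is a placeholder predicate, not a term from a real vocabulary.
    """
    collector = LinkCollector(post_url)
    collector.feed(post_html)
    return [
        (post_url, "linksTo", target)
        for target in collector.links
        if urlparse(target).scheme in ("http", "https")
    ]
```

The author does nothing extra here: the data comes entirely from links that were going to be written anyway.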

And that leads into some bigger-picture bottom-up approaches, and what we can and cannot reasonably expect of authors.  There was a time when, I think, many people would have thought it outlandish to imagine creating an encyclopedia simply from the knowledge that anyone wants to contribute.  Wikipedia obviously shows us that that was wrong.  People do want to share information and make it work — when they see clearly the utility and power of doing so.  Importantly, the info-boxes on many Wikipedia pages turned out to be a very helpful tool for communicating knowledge.  They help authors express knowledge by providing an easy-to-use structure within which to communicate.

And the presence of that structure is the important bit for making data available on the semantic web.  DBpedia has made use of that structure in Wikipedia, scraping what people are already contributing into structured semantic web data.  That is a bottom-up approach that gleans an enormous amount of semantic data, simply based on the idea that people will contribute knowledge.
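As a rough illustration of why that structure matters, here is a toy Python sketch that turns "| key = value" lines from an info-box into simple statements. This is not how DBpedia actually does its extraction, which is far more involved; the sample info-box text and the function name are invented for the example.

```python
# A made-up info-box in the general "| key = value" style; the field
# names are illustrative, not copied from any real Wikipedia template.
SAMPLE_INFOBOX = """{{Infobox book
| name = Frankenstein
| author = Mary Shelley
| pub_date = 1818
}}"""


def infobox_to_triples(subject, wikitext):
    """Turn each '| key = value' line into a (subject, key, value) tuple."""
    triples = []
    for line in wikitext.splitlines():
        line = line.strip()
        if not line.startswith("|") or "=" not in line:
            continue  # skip the template's opening and closing lines
        key, _, value = line[1:].partition("=")
        key, value = key.strip(), value.strip()
        if key and value:
            triples.append((subject, key, value))
    return triples
```

The point is only that the author filled in a form-like box, and the regularity of that box is what makes the data easy to harvest.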

What’s still in development in the bigger picture are the tools to make it easy for people to add semantic data.  The Report also mentions some plugins for WordPress that are designed to do exactly that.  Indeed, precise semantics are more than we should ask of the average blogger.  But the tools available are moving toward augmenting what can be expected of them.  To take my favorite tagging example, I don’t think it’s reasonable to ask a writer to disambiguate all the things that “Paris” could refer to.  But when we have the tools that offer semantically precise suggestions, we’ve augmented the precision of that author’s categorization.  That’s bottom-up data they are already offering, augmented.
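Here is a small Python sketch of what "offering semantically precise suggestions" could look like for the "Paris" tag. It is only an assumption about how such a tool might behave, not a description of any existing WordPress plugin. The candidate table, the example.org identifiers, and the function names are all hypothetical.

```python
# Hypothetical lookup table: a real tool would query a service rather
# than use a hard-coded dictionary, and would return real identifiers.
CANDIDATES = {
    "paris": [
        ("Paris (city in France)", "http://example.org/id/paris-france"),
        ("Paris (city in Texas)", "http://example.org/id/paris-texas"),
        ("Paris (figure in Greek myth)", "http://example.org/id/paris-myth"),
    ],
}


def suggest(tag):
    """Return (label, uri) candidates for a plain-text tag, if any."""
    return CANDIDATES.get(tag.strip().lower(), [])


def annotate(tag, choice):
    """Pair the author's original tag with the meaning they picked."""
    label, uri = suggest(tag)[choice]
    return {"tag": tag, "label": label, "uri": uri}
```

The author still just types "Paris"; the only extra step is one click on a suggestion, and the tag comes out with a precise identifier attached.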

Same with the data that people already love to add to their content.  Geo-data in Flickr.  Reviews of movies and songs.  Maps of their surroundings.  Their own interests and social networks. Their favorite songs.  What’s on their bookshelves. What’s on their wishlists. People are already adding this wealth of metadata to the web, and other people are already making it semantically accessible and tinkering with what can be done with it.  (The Linked Data effort is central here.  Here’s a map of the data that exists ready-to-play-with.)

Importantly, all this linked data, like the data I’m playing with in Semantic UMW, doesn’t ask any author to think, “How do I make metadata about my blog semantic?”  I don’t want them to think that; I want them to concentrate on their creativity.  I do want to glean as much as I can from it, and encourage some good practices (like extensive linking) that will make more info available. People like to share, and the tools for augmenting the power of their sharing (by helping to add precise semantics) are growing in strength and scope.

And so I think that there is more potential in the bottom-up approach than meets the eye because it doesn’t hinge on the authors thinking through semantic data; it hinges on what other folks can do to augment their thinking through nifty plugins and on what other folks can do to derive nice data from current, real-world practices.

There’s still much work to do in these directions, but over the past 18 months huge strides have been made.  Are full-on applications using these approaches still five years out?  Maybe.  But not if I can do anything about it.

Goals for 2009

As always, there’s a lot on my plate, but, in addition to smaller tweaks to handle odd characters and improve the scrapers, I’m hoping to make these some of the top priorities for 2009.


Our student aide Serena Epstein has been doing some wonderful work to style up the Exhibits, including a nifty background image and much cleaner layout. There are a few odd tweaks we’re working on, but we should get those beaten into submission soon.

Moving to Talis Platform

For both performance and stability, I’m (still) planning to move the datastore from its current home into the Talis Platform. They’ve helped me get set up with an account — now I just need to learn some things and finally block out the time to make the move.

Play with Linked Open Data

There are a fair number of links that can play nicely with the Linked Open Data cloud. One example I’m particularly interested in is links to the Internet Movie Database. Those links will give access to open data about the movies, and I’d like to turn that into some Exhibits for movie buffs, for example by having an Exhibit of movies and blog posts, with faceted browsing by the stars of the movies.
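A rough Python sketch of the movie idea, under some loud assumptions: that movie links look like imdb.com/title/tt… URLs, and that some other source supplies the stars for each title. The function names are made up, and the title IDs and actor names in any sample data would be placeholders, not real lookups.

```python
import re
from collections import defaultdict

# Assumes movie links contain "imdb.com/title/tt<digits>".
IMDB_TITLE = re.compile(r"imdb\.com/title/(tt\d+)")


def imdb_id(url):
    """Return the title ID embedded in a movie link, or None."""
    match = IMDB_TITLE.search(url)
    return match.group(1) if match else None


def posts_by_star(post_links, stars_by_title):
    """Build a 'star' facet: map each star to the posts linking to their movies.

    post_links     -- iterable of (post_url, linked_url) pairs
    stars_by_title -- dict of title ID -> list of star names (assumed to
                      come from some open data source; hypothetical here)
    """
    facet = defaultdict(set)
    for post_url, link in post_links:
        title_id = imdb_id(link)
        if title_id is None:
            continue  # not a movie link
        for star in stars_by_title.get(title_id, []):
            facet[star].add(post_url)
    return {star: sorted(posts) for star, posts in facet.items()}
```

The resulting dictionary is the sort of thing a faceted browser could use: pick a star, get the blog posts that link to that star's movies.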

Integrate Course Information

This is actually the long-term biggie. It’s (I hope) great to have these Exhibits for browsing the blog activity. But the ultimate goal is to be able to find blogs based on course information: departments, topics studied, textbooks used, etc. One example question I’d like to address is, “I’m teaching Frankenstein in my history of science class. Who else is teaching it, in what department(s), what textbooks are they using, what perspectives are they bringing to it, where are they blogging about it, and what else is on the syllabus?” Okay, that’s more than one question. But you get the idea. I’ve put up a new version of an information structure to help make this happen on my more technical blog (danger! thar be heavy geekery!). The next step is to find a happy way to build an interface for gathering all this info (and making it worthwhile for people to contribute it!).
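To give a feel for the kind of question this is meant to answer, here is a tiny Python sketch. It is not the information structure posted on the other blog; the Course class, its fields, and the function name are hypothetical stand-ins chosen just to show the shape of the query.

```python
from dataclasses import dataclass, field


@dataclass
class Course:
    """Hypothetical record of course information (not the real structure)."""
    title: str
    department: str
    instructor: str
    texts: list = field(default_factory=list)      # works on the syllabus
    blog_urls: list = field(default_factory=list)  # where the class blogs


def who_else_teaches(text, courses, me):
    """Find courses by other instructors that include the given text."""
    return [c for c in courses if text in c.texts and c.instructor != me]
```

Each course that comes back carries its department, the rest of its syllabus, and its blog URLs, which covers most of the Frankenstein question in one lookup.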