Semantic UMW

September, 2008:

Episode IV: A New Search

It is a dark time for searching. We have no choice but to sift through pages of false hits, in desperate hope of finding our goal. Search results are willy-nilly. Googlification is everywhere.

Yet hope remains, for those who remain true —

Sorry…mixing my mythic battles. But searching really is weak: it tends toward spitting out all the possibilities and letting you sort them out. In many cases, that’s good and exactly what is needed. If I want to search for mentions of, say, “Athens”, a post might or might not be tagged with “Athens”, so searching by tag is of dubious utility. Better to do a full-text search, get lots of results, and go from there.

But notice that this is the lowest common denominator of searching. It assumes that you know nothing about where to start, and so it gives you as many starting places as possible. But in plenty of cases you’re smarter than that. You know more about what you are searching for. You might vaguely recall part of the title of a post or of a blog. You might know that it starts with a particular word, but be unsure of the rest. You might remember just the first few letters of a blog’s subdomain. (How do you respond to “Go to my blog at…”, especially in a spoken conversation? Ever remembered only the first few letters of a cryptic subdomain? I sure have!)

Our googlified method of searching doesn’t work well with the wacky notion that we might be coming to the table with a good idea of what we are looking for, but are just fuzzy on a few of the details. Much of the time, I suspect, we come to the table with knowledge that would make our searches much better. But there’s simply no place or mechanism for including that knowledge.

And so, two new options from Semantic UMW: search by “Starts With” and search by “Contains”. There’s a simple input box (yes, yes, just like Google). After you type the first few letters, it brings up a few lists of things that might be what you are looking for. Results are arranged in simple columns: blog titles that start with or contain your search term, post titles, blogger display names, and, for “Starts With”, blog subdomains.
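To make the difference between the two options concrete, here is a minimal sketch of “Starts With” versus “Contains” matching, returning one column per field. It assumes a hypothetical `BlogRecord` shape and a three-letter minimum before any results appear; neither is taken from the actual Semantic UMW code.

```python
from dataclasses import dataclass

@dataclass
class BlogRecord:
    # Hypothetical record shape; the field names are illustrative only.
    blog_title: str
    post_title: str
    display_name: str
    subdomain: str

def search(records, term, mode="starts_with", min_chars=3):
    """Return {column name: sorted matching values} for one search term.

    mode="starts_with" matches a prefix; mode="contains" matches a substring
    anywhere. Matching is case-insensitive. Subdomains are only searched in
    "starts_with" mode. Terms shorter than min_chars (an assumed threshold)
    return no columns at all.
    """
    if mode not in ("starts_with", "contains"):
        raise ValueError(f"unknown mode: {mode!r}")
    term = term.casefold()
    if len(term) < min_chars:
        return {}
    columns = ["blog_title", "post_title", "display_name"]
    if mode == "starts_with":
        columns.append("subdomain")

    def matches(value):
        value = value.casefold()
        return value.startswith(term) if mode == "starts_with" else term in value

    results = {}
    for column in columns:
        # De-duplicate: many posts share one blog title or display name.
        values = {getattr(record, column) for record in records}
        results[column] = sorted(v for v in values if matches(v))
    return results
```

“Starts With” is the narrower query, which is why it is also the one that makes sense for half-remembered subdomains.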

The Force is strong with the Semantic Web!

Exploring UMW’s Online Presence, part 2: How it works

The feeds that the blogs produce are the starting point for all the data going into the datastore for the Exhibits. This has advantages and disadvantages. The advantage is that I’ve punted the problem of interoperability with new web tools. We edupunks are constantly discovering and incorporating new tools for expressing UMW’s intellectual products, and trying to interoperate directly with each new thing that comes along would be impossible. But I can fairly safely count on being able to get a feed out of almost any new tool. The disadvantage is the relatively weak expressiveness of feeds: a good amount of information gets obscured in them. I decided to just work around that as best I can.

I start with the information from umwblogs.org, just because that’s where most of the action is. UMWBlogs produces a feed of all the new posts, and I’m looking at that feed only to find the newest posts. Then, using a feed-reading tool called SimplePie, I find the feed of each blog those posts belong to. That lets me move through the newest posts and discover all the other posts listed in each blog’s feed (though right now there are hiccups with some feeds).
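The discovery step can be sketched as follows. The original uses SimplePie in PHP; this is a Python stand-in using only the standard-library XML parser. It assumes the aggregate feed is RSS 2.0, that each blog lives on its own subdomain, and that each blog publishes its feed at `/feed/` (the usual WordPress convention). The function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit, urlunsplit

def blog_feed_urls(aggregate_rss: str) -> list[str]:
    """From the XML of an aggregate RSS 2.0 feed, return the feed URL of each
    blog with a post in it, in first-seen order, without duplicates.

    Assumes one blog per host and a feed at /feed/ on that host.
    """
    root = ET.fromstring(aggregate_rss)
    feeds = []
    for link in root.iterfind("./channel/item/link"):
        permalink = (link.text or "").strip()
        if not permalink:
            continue
        parts = urlsplit(permalink)
        # Keep scheme and host, replace the post path with the feed path.
        feed = urlunsplit((parts.scheme, parts.netloc, "/feed/", "", ""))
        if feed not in feeds:
            feeds.append(feed)
    return feeds
```

Each returned URL would then be fetched and read in turn to reach the older posts that the aggregate feed no longer lists.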

Then I look through each feed to gather as much information as I can: publication and modification times, links, authors, tags, images, and embedded media, among other things. That all gets dumped into a big ol’ datastore using a tool called ARC.
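A datastore like ARC holds RDF, which is a set of (subject, predicate, object) triples. Here is a minimal Python sketch of turning one RSS 2.0 `<item>` into such triples, with the post’s permalink as the subject. The predicate names (`dc:title`, `tag`, `media`, and so on) are shorthand invented for this sketch, not the vocabulary the real datastore uses, and only fields that RSS 2.0 items commonly carry are handled.

```python
import xml.etree.ElementTree as ET

# Namespace used by <dc:creator> in many blog feeds.
DC = "{http://purl.org/dc/elements/1.1/}"

def item_triples(item: ET.Element) -> list[tuple[str, str, str]]:
    """Turn one RSS 2.0 <item> into (subject, predicate, object) triples.

    The subject is the post's permalink; predicate names are illustrative.
    Missing or empty fields are skipped.
    """
    subject = item.findtext("link", "").strip()
    triples = []

    def add(predicate, value):
        value = (value or "").strip()
        if value:
            triples.append((subject, predicate, value))

    add("dc:title", item.findtext("title"))
    add("dc:date", item.findtext("pubDate"))
    add("dc:creator", item.findtext(DC + "creator"))
    for category in item.iterfind("category"):
        add("tag", category.text)          # one triple per tag
    for enclosure in item.iterfind("enclosure"):
        add("media", enclosure.get("url"))  # embedded or attached media
    return triples
```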

The Exhibits that make the actual display of information are produced with some relatively straightforward work with Exhibit from MIT’s SIMILE project.
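Exhibit reads its data from JSON with a top-level `"items"` list, where each item is an object with a `"label"` and any other properties. A minimal sketch of getting from triples to that shape is to group by subject. The function name, the `"Post"` type, and the choice of `dc:title` as the label are assumptions for illustration; `"id"` is set to the permalink so that two posts with the same title do not collide.

```python
import json
from collections import defaultdict

def triples_to_exhibit_json(triples, label_predicate="dc:title", item_type="Post"):
    """Group (subject, predicate, object) triples by subject into Exhibit's
    {"items": [...]} JSON shape.

    Single values stay scalars; repeated predicates (such as several tags)
    become lists. The prefix before ":" is dropped from property names.
    """
    props = defaultdict(lambda: defaultdict(list))
    for subject, predicate, obj in triples:
        props[subject][predicate].append(obj)

    items = []
    for subject, values in props.items():
        # Fall back to the permalink when a post has no title.
        label = values.get(label_predicate, [subject])[0]
        item = {"id": subject, "label": label, "type": item_type, "url": subject}
        for predicate, objs in values.items():
            if predicate == label_predicate:
                continue
            key = predicate.split(":")[-1]  # "dc:creator" -> "creator"
            item[key] = objs[0] if len(objs) == 1 else objs
        items.append(item)
    return json.dumps({"items": items}, indent=2)
```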

And ta-daa! That’s all there is.

Well, there’s a little more to it, technically speaking. Posts about that will be coming along at my other blog.

Exploring UMW’s Online Presence, part 1: What it looks like

I’ve built the displays using the ever-more outstanding Exhibit web app from SIMILE at MIT. Right now, I’m still using the default style, which looks a little like this example from the Bloggers Exhibit.

The red underline signals that when you click it, additional info will come up, like this:

This expands out the information available and where you could go.

But as you browse through things, there is another feature that works in tandem with the popups: facets. Facets provide versatile ways of filtering data to home in on exactly what you are interested in at the moment. Depending on the kind of information you are looking at, I’ve offered up some facets to let you focus on a particular blog, a particular blogger, or a particular post. Here’s an example:

There’s a blogger cloud (a list of blogger names, highlighted by how often they appear in the page’s data), a Post title facet, and a Blog title facet. Click on one, and it filters the results on the page to just that one. Once you start filtering with the facets, you can also expand back out by using the checkboxes. This setting will show data about both posts. Notice that the other facets have also trimmed down to only the values relevant to the checked posts.
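The checkbox behavior described above can be sketched in a few lines. This is not Exhibit’s code, just a minimal Python model of the usual faceting rule, assuming each item is a flat dict of facet values: boxes checked within one facet are OR’ed together, different facets are AND’ed, and each facet’s counts are computed from the items that pass every *other* facet. That last rule is what makes the other facets trim down while the facet you are clicking in keeps all its options.

```python
from collections import Counter

def facet_view(items, selections):
    """Apply facet selections and recompute each facet's value counts.

    items: list of dicts mapping facet name -> a single value.
    selections: dict mapping facet name -> set of checked values; an empty
    or missing set means that facet is not filtering anything.
    Returns (visible_items, {facet name: Counter of values}).
    """
    def passes(item, skip=None):
        for facet, checked in selections.items():
            if facet == skip or not checked:
                continue
            if item.get(facet) not in checked:  # OR within a facet
                return False                    # AND across facets
        return True

    visible = [item for item in items if passes(item)]
    facet_names = sorted({name for item in items for name in item})
    counts = {
        # Each facet ignores its own selection when counting its values.
        name: Counter(item[name] for item in items if name in item and passes(item, skip=name))
        for name in facet_names
    }
    return visible, counts
```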

The possibilities for views on the info and for the facets are pretty broad, and I’ll be working with lots of people on the information architecture as we go along.

What’s next to build?

As I’m tinkering with finer details in the first few exhibits semantifying UMW (see the previous post), I’m toying around with the next exhibits to build. Here are a few that I have in mind:

Weekly Tag Timeline
A timeline of the tags used in the past week, with links to the posts.
All Blogs by Blog Name
An exhibit of all the blogs by title.
Tag Timeline and Exhibit by Blog
For any blog, the complete timeline of tag use.
Top Linkers
An exhibit of the top 10 blogs for links.
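As a rough idea of what the Top Linkers data could come from, here is a minimal Python sketch that ranks blogs by the number of outbound links in their posts. Counting *outbound* links is an assumption (“top linkers” could also mean most-linked-to), and the `(blog_title, links)` input shape and function name are hypothetical.

```python
from collections import Counter

def top_linkers(posts, n=10):
    """Rank blogs by how many outbound links their posts contain.

    posts: iterable of (blog_title, list_of_link_urls) pairs, one per post.
    Returns up to n (blog_title, link_count) pairs, highest count first.
    """
    counts = Counter()
    for blog, links in posts:
        counts[blog] += len(links)
    return counts.most_common(n)
```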

Any other ideas?

And…for any of the exhibits, there’s the added question of what data to include in the bubbles that pop up. There are limits on the total amount of data for any exhibit, but when the focus is on a particular blog or post I think we can squeeze in a lot. What makes sense?

Building a Semantic University of Mary Washington

This is a blog about solutions to problems, problems that most people don’t yet know are problems.

What kinds of problems? There’s a lot of wonderful information out there, but it really is still hard to find what you need. Google, schmoogle. Try asking it to find pages written _by_ a person, not pages _about_ a person. Google’s indexing is fantastic, but it doesn’t make that distinction. (Yahoo! is making much better progress here, BTW.)

We need something more. And there are a lot of people working on it. semantic.umwblogs.org is about my efforts to solve similar problems in University of Mary Washington’s little corner of the internet.

What’s Semantic?

The semantic web is a collection of tools, techniques, and mind-sets that look at information on the internet in a new way. Much of the web today is a web of documents. You write a blog post; that’s a document. You write a web page; that’s a document. You put up a syllabus; that’s a document.

The new way that the semantic web looks at it all is as a web of data in addition to — and through — the web of documents. Your blog post has an author, a date it was updated, a blog it is part of, associated tags, things it links to; that’s data. Your web page is about a topic, has contributors, maybe re-presents a text; that’s data. Your syllabus is for a particular course, during a particular semester, taught by a particular person, has particular members; that’s data.
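Here is a small illustration of why treating a post as data helps with the “by a person” versus “about a person” problem. Once the author is a field rather than just words on the page, “written by” becomes a simple lookup, while a plain text search can only find mentions. The `Post` fields mirror the examples in the paragraph above; the class, the function names, and the sample people are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Post:
    """A blog post viewed as data; the fields follow the examples above."""
    title: str
    author: str
    blog: str
    updated: str  # ISO 8601 date, e.g. "2008-09-01"
    tags: list[str] = field(default_factory=list)
    links: list[str] = field(default_factory=list)
    body: str = ""

def written_by(posts, person):
    """Titles of posts whose author *is* person (needs the author as data)."""
    return [p.title for p in posts if p.author == person]

def mentioning(posts, person):
    """Titles of posts whose text merely mentions person (what a text index finds)."""
    return [p.title for p in posts if person in p.body]
```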

When documents and data work well together, we have more powerful ways of finding, sorting, filtering, slicing and dicing everything that’s out there. That’s what’s happening here.

I’m building some things to look at the content produced by UMW as a web of data in addition to a web of documents. This blog is about what I’m producing and why. I hope to offer some new ways to find your way through all the wonderful things that Mary Washington’s staff, faculty, and especially students are producing.

The first step is to work with umwblogs.org, our blogging environment maintained by the hippy, radical, edupunk Jim Groom. Eventually, this project will broaden to encompass the other blogging platforms our students, staff, and faculty are using. It will also capture what’s going on in the rest of UMW’s online life (Omeka installations, Drupal installations, pick-your-new-app installations) and mesh it with administrative information about courses and who teaches them.

I’ll be blogging about new updates and developments here. Those of you who know me will be delighted and relieved to hear that in this space I’ll be avoiding the geeky details. Folks who want the geeky details will find them at my other blog, Re-Mediation Roomy-Nation.

To start it off, here are a few of the views on the data that are in the prototype stage. I’m hoping that the basic data presentation is there (and that my hosting server doesn’t crap out). If you know my wardrobe, you know that visual display is not my strong point, and so they’ll be prettified with help from others as we go along. Hope you enjoy, leave comments, and link!

Image Gallery
A gallery of images posted or linked to by the UMW online community
Bloggers
An alphabetical list of the bloggers and their posts by display name. Includes links to more detailed display about the individual posts.

Much more coming soon….
