Semantic UMW

Back up! (mostly)

I’ll have to consider the database meltdown as something of a blessing in disguise. First, while digging around to find out what happened, I discovered some previously unknown bugs. With those now fixed, I’ll be scraping in data much more reliably.

Second, in working to reconstruct as much data as I could, I built some new code to back-fill a lot more data, so now we’ve got a lot more info than ever before!

Third, because we’ve got so much more data, I’m finding that I need to learn and discover more efficient techniques for searching through the database, which is bringing much improved performance as I rework the older code with the new techniques.

Last, Danny Ayers (wikipedia) at Talis (wikipedia), whose work and writing I’ve always much admired and which have guided this project in many ways, popped right in to nudge me to sign up for an account with Talis. I’m on my way! (And notice, that’s another virtue of linking!)

Brief setback

So last night the database completely crashed. 🙁 I think I’ll be able to reconstruct at least the vast majority of information so that this experiment will be back up and running soon. But it looks like I’ll be bumping up the timetable for moving the code to a more reliable server.

I’ll also be exploring the Talis Platform to see if that will be a good solution.

Links: They’re Not Just For Breakfast and Google Anymore!

I mentioned in my previous post (now also here as a page) that working with links in posts is a big interest of mine. I’d like to give a quick update on the Link Friends Exhibit and expand on why links are so important and useful.

I’ve tweaked the pager for the link friends so that URLs with the most posts linking to them show up first. Unfortunately, the home page of UMWBlogs is excluded from the list because the “Hello World” post of new blogs links there. That makes the size of the result set simply too large to deal with, at least right now. Thanks to Eighteenth Century Audio, Librivox is at the top of the list with 70 — er, now 71 — posts that link to it.
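The ordering described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code behind the Link Friends Exhibit: the sample data, the `EXCLUDED` set, and the function name `rank_link_targets` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical sample data: (post URL, URL that the post links to).
LINKS = [
    ("http://a.example/post-1", "http://librivox.org/"),
    ("http://b.example/post-2", "http://librivox.org/"),
    ("http://b.example/post-2", "http://umwblogs.org/"),
    ("http://c.example/post-3", "http://umwblogs.org/"),
    ("http://c.example/post-3", "http://www.zotero.org/"),
]

# Targets left out of the ranking (assumed here to be just the UMWBlogs home page).
EXCLUDED = {"http://umwblogs.org/"}


def rank_link_targets(links, excluded=EXCLUDED):
    """Return (target, post_count) pairs, most-linked target first."""
    posts_by_target = defaultdict(set)
    for post, target in links:
        if target not in excluded:
            posts_by_target[target].add(post)  # a set counts each post once
    counts = [(target, len(posts)) for target, posts in posts_by_target.items()]
    # Descending by count; ties are broken alphabetically so the order is stable.
    return sorted(counts, key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    for target, count in rank_link_targets(LINKS):
        print(count, target)
```

Dropping the excluded target before counting, rather than after, keeps the oversized result set from ever being built.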

In the Exhibits for individual blogs and posts, the list of links will now also direct you to an Exhibit of all the posts, blogs, and bloggers that share that link. Visit this blog’s Exhibit and click on something in the ‘All Links’ column to see it in action.

The majority of the posts that link to the same place are pairs created by auto-aggregation. Many course blogs are aggregating posts from the various members of the course, and so the same content — and links — appears in two different places. That makes a great number of pairs that link to the same place, which turns out to be a bit misleading since it’s really two instances of the same text, just from different contexts.

I’d like someday for things to get more interesting, with the Exhibit revealing completely different posts that happen to link to the same place. The technical mechanism will already do that; the rest comes down to encouraging people to get into the habit of taking the time to include links to relevant sites when they can. What’s a relevant site? Blog home pages and particular posts are good candidates when your post is responding to someone else. Admittedly, this seems like a bit of overkill, since trackbacks are meant to handle similar cross-referencing. Alas, because I’m scraping all the data from feeds, the trackbacks don’t show up.

Other good candidates are sites that are being discussed in class, that serve as a reference, or that are a useful tool for the class. Jeff McClurken’s post noting delicious (wikipedia link) as a useful tool is a good example of this. Mentioning Amazon.com (wikipedia) in your post? Make it a link. Using Zotero (wikipedia)? Make it a link. Omeka? Make it a link. Etc., etc., etc. . . .

That’s more than good practice for the readers of your blog, making it easy for them to check out a site that they might not yet be familiar with. It’s much, much more. Many people are familiar with how Google (wikipedia) uses links. Through a mysterious algorithm that only they, and possibly Gandalf, know, the search results are ranked by the links to each site. This is the “Google-juice” that useless sites use to get more traffic. They create a bunch of links to their site, hoping that’ll boost it to the top of the Google results page. It’s also why Wikipedia articles show up so frequently in Google searches: lots of people have linked to a relevant article.

But what I’m talking about is more, and more useful in some ways, than that. I’m talking about exactly the reverse of what Google does. Google uses the link to get information about the target of the link. I’m using the link to get information about the source of the link: your post. That’s a huge difference. That way, your link to stuff relevant to your post becomes data about your post. (That’s part of the idea of a Web of Data along with a Web of Documents that I mention in the About page.)

What difference does it make? Teaching and learning is all about discovering unexpected, maybe even serendipitous, connections. Two completely different people, studying completely different things, might very well be writing about the same site, tool, or topic. Including a link makes it easier to discover the unexpected commonalities across very different contexts.

But wait, as they say: there’s more. One especially useful variety of link is a link to a relevant Wikipedia article. My previous post mentioned that linking to a Wikipedia article serves as a tool for disambiguation, distinguishing between Paris, France; Paris, Texas; Paris Hilton; and Paris the Trojan hero. There’s more to it. A LOT more.

Through the extraordinary service DBpedia (wikipedia article here), I will eventually be able to offer guides to finding similar posts even if they do not link to precisely the same Wikipedia article. DBpedia has been doing basically the same thing with Wikipedia that I’m doing with online content from UMW. Indeed, they are very much the inspiration for this project. They’re scraping data out of Wikipedia pages and making it available on the Semantic Web as Linked Data.

As I always used to encourage my students to ask, “So what?” Almost all Wikipedia articles are in several different categories. DBpedia easily exposes those categories, which means it will be possible to find posts that link to Wikipedia articles in the same categories. DBpedia also plays nicely with Geonames.org (wikipedia link), a huge body of geographical data. That will make it possible to find posts and/or blogs discussing things in the same geographical region. So if one post links to the Wikipedia article for Paris, France, another links to the article for France, and another links to the article for the Eiffel Tower, it should be possible to pull all those together into a list of posts about stuff in France.
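A minimal sketch of the category idea, assuming the article-to-category lookup has already been fetched from somewhere like DBpedia. Nothing here talks to DBpedia: the category names are made up for illustration, and the post names and the function `posts_by_category` are hypothetical.

```python
from collections import defaultdict

# Hypothetical lookup: Wikipedia article -> categories. In practice this would
# come from a service such as DBpedia; the category names here are invented.
ARTICLE_CATEGORIES = {
    "Paris": {"Capitals in Europe", "France"},
    "France": {"Countries in Europe", "France"},
    "Eiffel_Tower": {"Towers", "France"},
    "Zotero": {"Reference management software"},
}

# Hypothetical posts and the Wikipedia articles each one links to.
POST_LINKS = {
    "post-a": ["Paris"],
    "post-b": ["France"],
    "post-c": ["Eiffel_Tower"],
    "post-d": ["Zotero"],
}


def posts_by_category(post_links, article_categories):
    """Group posts under every category of every article they link to."""
    grouped = defaultdict(set)
    for post, articles in post_links.items():
        for article in articles:
            # Articles missing from the lookup simply contribute no categories.
            for category in article_categories.get(article, ()):
                grouped[category].add(post)
    return grouped


if __name__ == "__main__":
    groups = posts_by_category(POST_LINKS, ARTICLE_CATEGORIES)
    print(sorted(groups["France"]))
```

The three posts link to three different articles, yet they land in the same group because the articles share a category.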

Did I mention that there’s more? DBpedia plays nicely with many other data sets. There’s YAGO, which offers standardized terms and relationships between them. There’s also a newer initiative from Zitgist called UMBEL. These projects are aiming toward subtle and precise categorization of material, which will make it easier to discover people with shared interests and thoughts.

We’ve moved into the future directions and possibilities for these technologies, and there’s still plenty of work to do to stitch it all together. That’s fine and as it should be. But the important thing we can all do now is to get into the habit of linking heavily. It’s a simple, easy-to-do technique that will pay off bigger and bigger dividends as time and technology progress.

Questions I’d Like To Help Answer / Scenarios

I said in my first post in this blog that I’m working on solutions to problems that most people don’t yet realize are problems. And I’ve hit on some possibilities sporadically as I’ve talked about new developments. It might be time to bring some of that thinking together into a few scenarios for which I think these approaches will be useful. Here’s a few of the things I have in mind.

“Just go to my blog at….”

Having an online presence that corresponds with a real-life presence is interesting. There are plenty of anecdotal stories of people making connections in and out of the online environment. We need to foster interaction between the two.

I want people to be able to talk about their blogs and say “Just go to…” without having to find a cocktail napkin or matchbook to write it on. We should be able to find things later with only a hazy memory of how to get there, not the full URL. That’s what the “Contains” and “Starts With” searches aim at. You have just a few bits of where you want to get to, and those searches help you get to it from limited information.

So you know a blog is at something-or-other-“marching”. Or the blog has “marching” somewhere in the title. Or the post has “marching” in the title. Or the person you are talking to has a display name with “marching” in it. Type “marching” into those searches and check out what comes up from there.

“Who’s got good pics?”

Start with the Image Gallery. It’ll show the images that’ve been included in online content from UMW (at least what I’ve been able to incorporate so far). Each image will guide you to the post and/or the blog that it comes from.

“What’s the history of this blog?”

A lot of wonderful material gets buried in the reverse-chronological structure of blogs. Once something slips into the ‘archives’ (which only means that it’s not recent), it’s really hard to get back to it. But if you know the blog you want, do a “Contains” or “Starts With” search to dig up the blog, then go to the Exhibit for that blog and you’ll get a better overview. (Improvements on that mechanism are coming soon.)

“What interests do I have in common with others?”

This is a big-big-biggie for me. I have a suspicion that links are more reliable indicators of interests than tags, and so the “Link Friends” Exhibit is working toward helping people find others who have linked to the same places.

In my happy world, people will get into the habit of including lots of links, especially to Wikipedia. That’ll semi-tacitly provide information about you and your post, and make it easier to use these techniques to find possible common interests. For example, say you are writing a response to something in “Frankenstein, or the Modern Prometheus” for your chemistry class (yes, thanks to Leanna Giancarlo, that could happen!). And someone else is writing about the same work for a literature class. If you both include a link to the Wikipedia article on it, that’ll provide a great way to identify common interests and topics.

“Why not use tags?” you ask. Follow me to a more nuanced example. Instead of ‘Frankenstein’, you are writing about ‘Paris’. “‘Paris’, France?” No, the other ‘Paris’. “‘Paris’, Texas?” No no, the other ‘Paris’. “‘Paris’ Hilton?” No no no, the other ‘Paris’. “‘Paris’ the Trojan prince?” Yes! That’s the one!

The tag ‘Paris’ does nothing to disambiguate those possibilities, even though it is likely to be natural enough within the context of that blog. But connecting with others requires bridging contexts. In addition to the tag, a link to the relevant Wikipedia article will help to disambiguate and classify the blog post.

“What has so-and-so person written?”

Browse through the “Bloggers” Exhibit to find the display name you are looking for. That’ll guide you toward lots more info about what they’ve written and where.

These questions seem to come naturally, but there is no handy mechanism for answering them. Providing that mechanism is at the core of what this blog and project are all about.

Got other questions you think would be useful? Let me know!

Episode IV: A New Search

It is a dark time for searching. We have no choice but to sift through pages of false hits, in desperate hope of finding our goal. Search results are willy-nilly. Googlification is everywhere.

Yet hope remains, for those who remain true —

Sorry…mixing my mythic battles. But searching really is weak: it tends toward spitting out all the possibilities and letting you sort it out. In many cases, that’s good and exactly what is needed. If I want to search for mentions of, say, “Athens”, a post might or might not be tagged with “Athens”, so that’s of dubious utility. Better to do a full-on search, get lots of results, and go from there.

But notice that that’s the lowest common denominator of what you do. It’s great for the assumption that you know nothing about where to start, and so it gives you as many starting places as possible. But in plenty of cases you’re smarter than that. You know more about what you are searching for. You might have a vague recollection of just a part of the title of a post or of a blog. You might know that it starts with a particular word, but are unsure of the rest. You might remember just the first few letters of a blog’s subdomain (How do you respond to a “Go to my blog at…”, especially when it’s a spoken conversation? Ever remembered only the first few letters of a cryptic subdomain? I sure have!).

Our googlified method of searching doesn’t work well with the wacky notion that we might be coming to the table with a good idea of what we are looking for, but are just fuzzy on a few of the details. Much of the time, I suspect, we come to the table with knowledge that would make our searches much better. But there’s simply no place or mechanism for including that knowledge.

And so, two new options from Semantic UMW. Search by “Starts With” and search by “Contains”. There’s a simple input box (yes, yes, just like Google). After the first few letters they bring up a few lists that might be what you are looking for. Results are arranged in simple columns for things like blog titles that start with or contain that search, post titles, blogger display names, and in the case of “Starts With”, blog subdomain.
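The two searches can be sketched as simple case-insensitive string matches. This is a hedged illustration only: the sample records, the field names, and the functions `starts_with` and `contains` are hypothetical, and the real searches run against the project’s datastore rather than a Python list.

```python
# Hypothetical blog records; the real data lives in the project's datastore.
BLOGS = [
    {"subdomain": "marchingon", "title": "Marching On"},
    {"subdomain": "histnotes", "title": "Notes from the Marching Band"},
    {"subdomain": "chem101", "title": "Chemistry 101"},
]


def starts_with(records, field, text):
    """Records whose field begins with text, ignoring case."""
    needle = text.casefold()
    return [r for r in records if r[field].casefold().startswith(needle)]


def contains(records, field, text):
    """Records whose field has text anywhere in it, ignoring case."""
    needle = text.casefold()
    return [r for r in records if needle in r[field].casefold()]


if __name__ == "__main__":
    # One "column" of results per field, as in the search pages described above.
    print([r["subdomain"] for r in starts_with(BLOGS, "subdomain", "march")])
    print([r["title"] for r in contains(BLOGS, "title", "marching")])
```

Running the same few letters against each field separately is what produces the side-by-side columns of blog titles, post titles, display names, and subdomains.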

The Force is strong with the Semantic Web!

Exploring UMW’s Online Presence, part 2: How it works

The starting point for all the data going into the datastore for the Exhibits is the feeds that the blogs produce. This has advantages and disadvantages. The advantage is that I’ve punted the problem of interoperability with new web tools. We edupunks are constantly discovering and incorporating new tools for expressing UMW’s intellectual products. To try to directly interoperate with each new thing that comes along would be impossible. But I can fairly safely count on being able to get a feed out of most any new thing that comes along. The disadvantage is the relatively weak expressiveness of feeds. There’s a good amount of information that gets obscured in the feeds. I decided to just work around that as best I can.

The starting point is the information from umwblogs.org, just because that’s where the majority of the action is. UMWBlogs produces a feed of all the new posts. I’m looking at that feed just to find the newest posts. Then, using a tool for feed reading called Simple Pie, I’m finding the feeds for the blogs those posts belong to. That lets me move through all the newest posts and discover all the other posts listed within that blog’s feed (though right now there are hiccups for some feeds).

Then I’m looking through the information in the feed to gather as much information as I can. I’m grabbing information about publication and modification time, links, authors, tags, images, and embedded media, among some other things. That all gets dumped into a big ol’ datastore using a tool called ARC.
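The gathering step can be illustrated with a small sketch. It is not the project’s code: it uses Python’s standard library in place of Simple Pie, it stops before anything is stored (so ARC is not involved), and the sample feed, the dictionary keys, and the function `extract_posts` are assumptions made for the example.

```python
import re
import xml.etree.ElementTree as ET

# A tiny made-up RSS 2.0 feed standing in for a real blog feed.
SAMPLE_FEED = """<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Example Blog</title>
    <item>
      <title>Hello Links</title>
      <link>http://example.org/hello-links</link>
      <pubDate>Mon, 06 Oct 2008 12:00:00 +0000</pubDate>
      <dc:creator>A. Blogger</dc:creator>
      <category>links</category>
      <category>semantic web</category>
      <description>&lt;p&gt;See &lt;a href="http://librivox.org/"&gt;Librivox&lt;/a&gt;.&lt;/p&gt;</description>
    </item>
  </channel>
</rss>"""

NS = {"dc": "http://purl.org/dc/elements/1.1/"}
# Naive pattern for double-quoted href values; fine for a sketch, not for messy HTML.
HREF = re.compile(r'href="([^"]+)"')


def extract_posts(feed_xml):
    """Pull title, URL, date, author, tags, and outbound links from each item."""
    root = ET.fromstring(feed_xml)
    posts = []
    for item in root.iter("item"):
        body = item.findtext("description", default="")
        posts.append({
            "title": item.findtext("title", default=""),
            "url": item.findtext("link", default=""),
            "published": item.findtext("pubDate", default=""),
            "author": item.findtext("dc:creator", default="", namespaces=NS),
            "tags": [c.text for c in item.findall("category")],
            "links": HREF.findall(body),
        })
    return posts


if __name__ == "__main__":
    for post in extract_posts(SAMPLE_FEED):
        print(post)
```

The sketch also shows the limitation mentioned above: whatever the feed leaves out, such as trackbacks, never reaches the extracted record.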

The Exhibits that make the actual display of information are produced with some relatively straightforward work with Exhibit from MIT’s SIMILE project.

And ta-daa! That’s all there is.

Well, there’s a little more to it, technically speaking. Posts about that will be coming along at my other blog.

Exploring UMW’s Online Presence, part 1: What it looks like

I’ve built the displays using the ever-more outstanding Exhibit (see also) web app from SIMILE at MIT. Right now, I’m still using the default style, which looks a little like this example from the Bloggers Exhibit.

The red underline signals that when you click it, additional info will come up, like this:

This expands out the information available and where you could go.

But as you browse through things, there is another aspect that works in tandem with the popups: facets. Facets provide versatile ways of filtering data to home in on exactly what you are interested in at the moment. Depending on the kind of information you are looking at, I’ve offered up some facets to let you focus on a particular blog, or a particular blogger, or a particular post. Here’s an example:

There’s a blogger cloud (a list of blogger names, highlighted by how often they appear in the page’s data), Post title, and Blog title. Click on one, and it filters the results on the page to just that one. When you start filtering using the facets, you can also expand back out by using the checkboxes. This setting will show data about both posts. Notice that the other facets are also trimmed down to only the ones relevant to the checked posts.

The possibilities for views on the info and for the facets are pretty broad. I’ll be working with lots of people on the information architecture as we go along.

What’s next to build?

As I’m tinkering with finer details in the first few exhibits semantifying UMW (see previous post), I’m toying around with the next exhibits to build. Here’s a few that I have in mind:

Weekly Tag Timeline
    A timeline of the tags used in the past week, with links to the posts.
All Blogs by Blog Name
    An exhibit of all the blogs by title.
Tag Timeline and Exhibit by Blog
    For any blog, the complete timeline of tag use.
Top Linkers
    An exhibit of the top 10 blogs for links.

Any other ideas?

And…for any of the exhibits, there’s the added question of what data to include in the bubbles that pop up. There are limits on the total amount of data for any exhibit, but when the focus is on a particular blog or post I think we can squeeze in a lot. What makes sense?

Building a Semantic University of Mary Washington

This is a blog about solutions to problems, problems that most people don’t yet know are problems.

What kinds of problems? There’s a lot of wonderful information out there, but it really is still hard to find what you need. Google, schmoogle. Try asking it to find pages written _by_ a person, not pages _about_ a person. Google’s indexing is fantastic, but it doesn’t make that distinction. (Yahoo! is making much better progress here, BTW.)

We need something more. And there are a lot of people working on it. semantic.umwblogs.org is about my efforts to solve similar problems in University of Mary Washington’s little corner of the internet.

What’s Semantic?

The semantic web is a collection of tools, techniques, and mind-sets that looks at information on the internet in a new way. Much of the web today is a web of documents. You write a blog post; that’s a document. You write a web page; that’s a document. You put up a syllabus; that’s a document.

The new way that the semantic web looks at it all is as a web of data in addition to — and through — the web of documents. Your blog post has an author, a date it was updated, a blog it is part of, associated tags, things it links to; that’s data. Your web page is about a topic, has contributors, maybe re-presents a text; that’s data. Your syllabus is for a particular course, during a particular semester, taught by a particular person, has particular members; that’s data.

When documents and data work well together, we have more powerful ways of finding, sorting, filtering, slicing and dicing everything that’s out there. That’s what’s happening here.
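To make the “that’s data” idea concrete, here is a minimal sketch that writes facts about two posts as subject, property, value statements and filters them. It is not RDF proper and not how the project’s datastore works; the URLs, the property names such as `linksTo`, and the function `match` are hypothetical.

```python
# Hypothetical posts, described as (subject, property, value) statements.
POST_A = "http://example.org/hello-links"
POST_B = "http://example.org/audio-books"

TRIPLES = [
    (POST_A, "author", "A. Blogger"),
    (POST_A, "updated", "2008-10-06"),
    (POST_A, "tag", "links"),
    (POST_A, "linksTo", "http://librivox.org/"),
    (POST_B, "author", "B. Blogger"),
    (POST_B, "linksTo", "http://librivox.org/"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the statements that agree with every part that is not None."""
    return [
        t for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    ]


if __name__ == "__main__":
    # Pages written BY a person, rather than pages about that person.
    print(match(TRIPLES, predicate="author", obj="A. Blogger"))
    # Every post that links to the same place.
    print([s for s, _, _ in match(TRIPLES, predicate="linksTo",
                                  obj="http://librivox.org/")])
```

Because authorship is its own property here, asking for posts written by someone is a different question from searching for their name in the text.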

I’m building some things to look at the content produced by UMW as a web of data in addition to a web of documents. This blog is about what I’m producing and why. I hope to offer some new ways to find your way through all the wonderful things that Mary Washington’s staff, faculty, and especially students are producing.

The first step is to work with umwblogs.org, our blogging environment maintained by the hippy, radical, edupunk Jim Groom. Eventually, this project will broaden, encompassing the other blogging platforms our students, staff, and faculty are using. It will also capture what’s going on in the rest of UMW’s online life, like Omeka installations, Drupal installations, and pick-your-new-app installations, and mesh it with administrative information about courses and who teaches them.

I’ll be blogging about new updates and developments here. Those of you who know me will be delighted and relieved to hear that in this space I’ll be avoiding the geeky details. For the folks who want the geeky details, they’ll be coming along at my other blog, Re-Mediation Roomy-Nation.

To start it off, here’s a few of the views on the data that are in the prototype stage. I’m hoping that the basic data presentation is there (and that my hosting server doesn’t crap out). If you know my wardrobe, you know that visual display is not my strong point, and so they’ll be prettified with help from others as we go along. Hope you enjoy, leave comments, and link!

Image Gallery
    A gallery of images posted or linked to by the UMW online community.
Bloggers
    An alphabetical list of the bloggers and their posts by display name. Includes links to more detailed display about the individual posts.

Much more coming soon….