Semantic UMW

“Where’s the Links?”: Pedagogy and the Blue Underline

Yesterday I posted about how some students have retitled their blogs to reflect the course they are in this semester rather than last. That got me thinking about continuity of intellectual life and development from semester to semester — or lack thereof — and the blogging practices that could reflect and/or facilitate it.

Here’s a place where, perhaps, we should think about student blogging and the practices for it as something a little different from general blogging practices. Or maybe not different, but with some particular focus and emphasis. The particular thing I have in mind is linking, something I’ve also written about in “Links–They’re Not Just For Breakfast and Google Anymore!”. (Like how I went meta on that one?)

Experienced bloggers do a lot of linking to other bloggers and sites. But often enough we also link back to ourselves. That’s not a vanity thing; it reflects the fact that our blogs are a part of our own intellectual development. We work through ideas there, and so in a blog’s history you can see the development of our thoughts. It’s quite natural, then, for us to link back to older posts. That linking is just the manifestation of our reflection on previous ideas.

That, I think, is the practice and philosophy that should be emphasized as good student blogging practices. This struck me particularly with the example of a blog that contains material from last semester’s Historical Methods class and this semester’s The Politics and Culture of the 1960s. Perhaps it’s only because there are few posts so far, but that is a place where I would hope for many many many back-links. It seems like the intellectual connections between the two classes should be there — that that is part of the organization of the entire curriculum — and so that should be manifested in the actual links. Pedagogically, pushing students to back-link to previous posts is just a way of saying, “Hey…let’s make the connections between different elements of the curriculum. That’s what it’s designed for.”

About a year ago, I did a poster session at ELI with Steve Greenlaw about a similar idea he and Gardner Campbell cooked up to encourage their advisees to talk about the connections they see between their different courses during a semester. Instead of this taking place in a separate web app during one semester, I’m talking about the connections being manifested within a space they are already using, UMWBlogs, and across many semesters.

This is a real place where encouraging a good blogging practice is also encouraging a good pedagogy and a goal of higher education.

Looks like the next Exhibit I build for Semantic UMW should be something like a “Links History” timeline for each blog.
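A “Links History” would first need to tell back-links apart from links out to the rest of the web. Here is a minimal Python sketch of that step, assuming each post is available as a date plus the list of URLs it links to. The function names and example URLs are hypothetical, and treating “same host as the blog” as a back-link is a simplification that would miss a student linking to an older post on a different blog of theirs.

```python
from urllib.parse import urlparse

def classify_links(blog_url, links):
    """Split a post's links into back-links (same host as the blog) and external links."""
    blog_host = urlparse(blog_url).netloc.lower()
    back, external = [], []
    for link in links:
        host = urlparse(link).netloc.lower()
        # Relative links (no host) point back into the same blog.
        if host in ("", blog_host):
            back.append(link)
        else:
            external.append(link)
    return back, external

def links_history(blog_url, posts):
    """Return (date, back-link count, external-link count) for each post, oldest first.

    posts: iterable of (ISO date string, [url, ...]) pairs (a hypothetical shape).
    """
    history = []
    for date, links in sorted(posts, key=lambda post: post[0]):
        back, external = classify_links(blog_url, links)
        history.append((date, len(back), len(external)))
    return history

posts = [
    ("2009-02-10", ["http://blog.example.org/2008/10/primary-sources/",
                    "http://en.wikipedia.org/wiki/1968"]),
    ("2009-01-20", ["/2008/09/first-post/"]),
]
print(links_history("http://blog.example.org/", posts))
```

Plotting those back-link counts over time, one blog at a time, would be one way to turn this into an actual timeline.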

Course Blogs and Students Palimpsesting Themselves

Much like a good, five-years-overdue cleanup of the attic will uncover some gems among the detritus, I discovered something really interesting while cleaning up some of the data used behind Semantic UMW.

Due to some earlier code that didn’t clean up title information very well (funny characters, random spaces, etc.), many posts and blogs ended up with two titles, an old messy one and a new cleaned-up one. Today I started working with some new tools to help me clean out the messy ones. The first step was to dig out all the blogs and posts that have two different titles in the database.

A quick query turned up a little surprise, two names for the same blog:

Introduction to Historical Methods
AMST202: The Politics and Culture of the 1960s

Hmm. A quick look at the archives, and what I know about the likely suspects in the History Department who would involve blogging in their courses, made it pretty clear that this is a case in which a student kept the same blog, but retitled it from the fall to spring semesters to reflect the new course.
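The post doesn’t show the query itself. As an illustration only, here is a Python sketch of the same idea, assuming the store can hand back (blog URL, title) pairs; the rows below are made up. Collapsing whitespace before comparing lets the old-messy/new-clean pairs fall away, so what remains are genuine retitlings like the one above.

```python
from collections import defaultdict

# Hypothetical (blog URL, title) rows, standing in for what the data store returns.
rows = [
    ("http://blog-a.example.org/", "Introduction to Historical Methods"),
    ("http://blog-a.example.org/", "AMST202: The Politics and Culture of the 1960s"),
    ("http://blog-b.example.org/", "Charlie's Blog for"),
    ("http://blog-b.example.org/", "Charlie's  Blog for "),  # messy duplicate: extra spaces
]

def blogs_with_multiple_titles(rows):
    """Map each blog URL to its distinct titles, keeping only blogs that have two or more."""
    titles = defaultdict(set)
    for url, title in rows:
        # Collapse runs of whitespace so messy/cleaned pairs that differ only in
        # spacing count as one title, leaving genuine retitlings behind.
        titles[url].add(" ".join(title.split()))
    return {url: sorted(names) for url, names in titles.items() if len(names) > 1}

for url, names in blogs_with_multiple_titles(rows).items():
    print(url, "->", " | ".join(names))
```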

The first thing I wondered was whether the theme had changed, too. Unfortunately, the blog wasn’t listed in the Wayback Machine, so I couldn’t tell. But there are still some really interesting things to think about as a student’s blogging record grows to encompass their entire university history.

First, this will drive librarians, who are concerned with the provenance of a work, absolutely batty. Same with cultural historians and anyone else who is interested in knowing the context within which a work was produced. That is, most of the humanities. Those archived posts from last semester, which were written within the context of “Introduction to Historical Methods”, are now being presented as having been written within the context of “AMST202: The Politics and Culture of the 1960s”. It’s a funny kind of palimpsesting at work: the context within which a student’s work was created has been rewritten, with no good way to recapture it.

It seems like this also loses a powerful tool of self-reflection for the student. Imagine looking back as a graduating senior at posts that were written during your first semester. How many opportunities for long-term reflection might be lost with the loss of that context? Might a student even remember the course they were blogging in four years earlier? It also obscures a nice chance to contrast semesters and courses when the record gets blurred simply by changing the title. For example, I’m really intrigued by this pair:

Gene’s Labor Rights Blog
Gene’s Little Piece of Computer Heaven

In some of the cases I encountered, I suspect that there might also be a really interesting progression in the student’s thought at work. Does renaming the blog suggest that they have recontextualized their educational experience for themselves? The above instance is a simple example — the recontextualization is just the move from one course to another. But what if the retitling is actually a reflection of coming to a richer appreciation or understanding? Could that be at work in this pair?

More than just sleeping
Dreams are the source of Imagination.

Or this one?

I Just Can’t Call it US History in Film
I feel like a one-legged man at a fancy dance.

I don’t think that we should be asking students or faculty to do the annotations to keep a record of how the context of their posts does or does not shift. But I also think that it is important to capture this information somehow, in a way that makes it easy to reveal again. I’m going to guess that the individual posts written under the titles “Gene’s Labor Rights Blog” and “Gene’s Little Piece of Computer Heaven” will make it clear that the contexts are very different. If someone stumbles upon an old post with the new blog title, I expect confusion. Some way to expose the fact that, once upon a time, the blog had a different title and context will be essential, both to help the reader out and to guide the student toward thinking of their blogging and educational experiences as part of a continuous progression, not distinct jumps from class to class (even if the surface appearance of the blog makes it look that way).

One tool for addressing this issue of a student’s entire blogging history, which encompasses the individual classes, might be the idea of every student having their own domain, with subdomains, categories, or sub-blogs for each new course context (see Jim Groom’s post on the idea). I can easily foresee long-term management issues there, too, but it might be a way to start thinking about things.

These sorts of issues are why this project is here–both to discover them and to try to capture the material along with the bigger contexts in which the material is produced. Right now, I’m not able to add the needed metadata, but it is a goal of the longer-term project I write about in “Thoughts Toward a Giant EduGraph” (warning, this link goes to my much more technical blog, heavy geekery lies in wait for anyone who follows it!). Eventually, I’d like to see this kind of data being collected with little or no work needed by the blogger. It should all be readily exposed or derived from other known pieces of information, possibly just through the use of tags or machine tags, possibly with a quick and easy interface or plugin to update the blog’s context each semester. That’s still a bit far off, but it’s coming!

In the meantime, here are a few more of my favorite retitlings:

molly moo’s blog
yellow bubbles

Or, best of all…

Musings
Musings in my Pain, Pleasure, and Everything in Between
Forgotten Fairytale
What now?

An Aside To Other Things

A post on my more technical blog recently caught the eye of the good folks at Talis, and I had a great time talking with them about it.  The things going on here aren’t quite directly connected to this project, but some of the core principles about open data, educational possibilities, and linked data show up in the great conversation we had. (And I hope to make this a start toward some of the ideas we talked about.)

So, if you are interested in how library data, linked data, and more might come together in the future, give a listen to the podcast here.

I’m hoping to merge what’s going on here with the idea that we talked about there.  So much to do, so much to do!

Semantic UMW in the NMC 2009 Horizon Report

This morning I woke up to the happy discovery that Semantic UMW got a shout-out in the New Media Consortium’s 2009 Horizon Report. The Horizon Report gives an overview of emerging tools, practices, and ideas that could be significant for education. This year’s report puts semantic-aware applications on the four-to-five-year horizon, and kindly included this project in a list of examples. Many thanks for the mention, and many thanks, too, for all the hard work and careful consideration that I know goes into producing this report. You all are great!

I’d like to expand, too, on one of the points made in the report:

“There are currently two theoretical approaches to developing the semantic capacity of the web. One, the bottom-up approach, is problematic in that it assumes metadata will be added to each piece of content to include information about its context; tagging at the concept level, if you will.”

To a large degree this is true: providing precise semantic metadata is, I think, more than can be reasonably expected of most people (see, for example, Hapax Tag-omena and Visual Design and Information Design). But I think there’s also room for wider possibilities in the bottom-up approach since, after all, most of what’s going on here is based on it.

First, since the Report mentions efforts at using linking data (see also Links–They’re Not Just For Breakfast and Google Anymore!), it’s worth saying that grabbing the link data is also a bottom-up approach.  It’s based on the idea of scraping out data that authors are already automatically including, and exploring what possibilities there are for doing nifty things with it.
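As a rough illustration of how little the author has to do for link data to exist, here is a Python sketch that pulls every link out of a post’s HTML with the standard library’s html.parser. The sample HTML is made up, and a real scraper would also need to deal with feeds, relative URLs, and malformed markup.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # html.parser passes tag names lowercased and attrs as (name, value) pairs.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def extract_links(post_html):
    collector = LinkCollector()
    collector.feed(post_html)
    collector.close()
    return collector.links

sample = ('<p>See <a href="http://dbpedia.org/">DBpedia</a> and '
          '<a href="/2008/10/links/">my earlier post</a>.</p>')
print(extract_links(sample))
```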

And that leads into some bigger-picture bottom-up approaches, and what we can and cannot reasonably expect of authors.  There was a time when, I think, many people would have thought it outlandish to imagine creating an encyclopedia simply from the knowledge that anyone wants to contribute.  Wikipedia obviously shows us that that was wrong.  People do want to share information and make it work — when they see clearly the utility and power of doing so.  Importantly, the info-boxes on many Wikipedia pages turned out to be a very helpful tool for communicating knowledge.  They help authors express knowledge by providing an easy-to-use structure within which to communicate.

And the presence of that structure is the important bit for making data available on the semantic web. DBpedia has used that structure in Wikipedia to scrape what people are already contributing into structured, semantic web data. It’s a bottom-up approach that gleans an enormous amount of semantic data, simply based on the idea that people will contribute knowledge.
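To make the info-box idea concrete, here is a toy Python sketch that turns “| key = value” lines from an info-box into (subject, property, value) triples. It is a deliberate simplification, not DBpedia’s method: DBpedia’s own extraction maps info-box fields onto ontology properties and copes with nested templates, links, and multi-line values, none of which this handles.

```python
def infobox_to_triples(subject, wikitext):
    """Turn '| key = value' lines of an info-box into (subject, key, value) triples."""
    triples = []
    for line in wikitext.splitlines():
        line = line.strip()
        if not line.startswith("|") or "=" not in line:
            continue
        key, value = line[1:].split("=", 1)
        key, value = key.strip(), value.strip()
        if key and value:  # skip fields the author left blank
            triples.append((subject, key, value))
    return triples

infobox = """{{Infobox university
| name  = University of Mary Washington
| city  = Fredericksburg
| state = Virginia
| motto =
}}"""

for triple in infobox_to_triples("University_of_Mary_Washington", infobox):
    print(triple)
```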

What’s still in development in the bigger picture are the tools to make it easy for people to add semantic data. The Report also mentions some plugins for WordPress that are designed to do exactly that. Indeed, precise semantics are more than we should ask of the average blogger. But the tools available are moving toward augmenting what can be expected of them. To take my favorite tagging example, I don’t think it’s reasonable to ask a writer to disambiguate all the things that “Paris” could refer to. But when we have tools that offer semantically precise suggestions, we’ve augmented the precision of that author’s categorization. That’s bottom-up data they are already offering, augmented.

Same with the data that people already love to add to their content.  Geo-data in Flickr.  Reviews of movies and songs.  Maps of their surroundings.  Their own interests and social networks. Their favorite songs.  What’s on their bookshelves. What’s on their wishlists. People are already adding this wealth of metadata to the web, and other people are already making it semantically accessible and tinkering with what can be done with it.  (The Linked Data effort is central here.  Here’s a map of the data that exists ready-to-play-with.)

Importantly, all this linked data, like the data I’m playing with in Semantic UMW, doesn’t ask any author to think, “How do I make metadata about my blog semantic?” I don’t want them to think that; I want them to concentrate on their creativity. I do want to glean as much as I can from it, and to encourage some good practices (like extensive linking) that will make more info available. People like to share, and the tools for augmenting the power of their sharing, by helping to add precise semantics, are growing in strength and scope.

And so I think that there is more potential in the bottom-up approach than meets the eye because it doesn’t hinge on the authors thinking through semantic data; it hinges on what other folks can do to augment their thinking through nifty plugins and on what other folks can do to derive nice data from current, real-world practices.

There’s still much work to do in these directions, but over the past 18 months huge strides have been made.  Are full-on applications using these approaches still five years out?  Maybe.  But not if I can do anything about it.

Goals for 2009

As always, there’s a lot on my plate, but, in addition to smaller tweaks to handle odd characters and improve the scrapers, I’m hoping to make these some of the top priorities for 2009.


Our student aide Serena Epstein has been doing some wonderful work to style up the Exhibits, including a nifty background image and much cleaner layout. There are a few odd tweaks we’re working on, but we should get those beaten into submission soon.

Moving to Talis Platform

For both performance and stability, I’m (still) planning to move the datastore from its current home into the Talis Platform. They’ve helped me get set up with an account — now I just need to learn some things and finally block out the time to make the move.

Play with Linked Open Data

There is a fair number of links that can play nicely with the Linked Open Data cloud. One example I’m particularly interested in is links to the Internet Movie Database. Those links will give access to open data about the movies, and I’d like to turn that into some Exhibits for movie buffs, for example an Exhibit of movies and blog posts, with faceted browsing by the stars of the movies.
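Finding those links is the easy part. Here is a minimal Python sketch, assuming IMDb title pages follow the familiar imdb.com/title/tt… URL pattern (with “tt” plus digits as the movie’s ID) and that posts come as (post URL, links) pairs; the example posts and ID are hypothetical. Grouping posts by movie ID gives the skeleton of such an Exhibit; pulling in the stars for faceted browsing would be a separate lookup against open movie data.

```python
import re
from collections import defaultdict

# Assumed URL shape: .../imdb.com/title/tt<digits>/..., where "tt<digits>" is the title ID.
IMDB_TITLE = re.compile(r"imdb\.com/title/(tt\d+)")

def posts_by_movie(posts):
    """Map each IMDb title ID to the sorted list of posts that link to it.

    posts: iterable of (post URL, [link URL, ...]) pairs.
    """
    index = defaultdict(set)
    for post_url, links in posts:
        for link in links:
            match = IMDB_TITLE.search(link)
            if match:
                # A set means a post linking to the same movie twice is listed once.
                index[match.group(1)].add(post_url)
    return {movie_id: sorted(urls) for movie_id, urls in index.items()}

posts = [
    ("http://blog-a.example.org/2009/01/film-night/",
     ["http://www.imdb.com/title/tt0084787/", "http://www.imdb.com/title/tt0084787/trivia"]),
    ("http://blog-b.example.org/2009/02/review/",
     ["http://www.imdb.com/title/tt0084787/", "http://en.wikipedia.org/wiki/Film"]),
]
print(posts_by_movie(posts))
```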

Integrate Course Information

This is actually the long-term biggie. It’s (I hope) great to have these Exhibits for browsing the blog activity. But the ultimate goal is to be able to find blogs based on course information: departments, topics studied, textbooks used, etc. One example question I’d like to address is, “I’m teaching Frankenstein in my history of science class. Who else is teaching it, in what department(s), what textbooks are they using, what perspectives are they bringing to it, where are they blogging about it, and what else is on the syllabus?” Okay, that’s more than one question. But you get the idea. I’ve put up a new version of an information structure to help make this happen on my more technical blog (danger! thar be heavy geekery!). The next step is to find a happy way to build an interface for gathering all this info (and making it worthwhile for people to contribute it!).

Visual design and information design

There’s a funny tension at work throughout the web between the needs of visual design and the needs of data design. Sometimes it results in some funny things when you do what I’m doing: taking the data out of its visual context to mix and match it in new ways. Funny, and very interesting.

Case in point: I saw that one blog has the title “Charlie’s Blog for”. My first thought was that it was a typo, or a blog so unfinished that the author hadn’t even figured out what, exactly, the blog was for. Whatever the case, as I parsed both the grammar and the data, it was clear there was no way this would work as a title. It seemed to do the opposite job of a title, completely and deliberately avoiding giving information about the blog.

Ahhh… but then I visited “Charlie’s Blog for” (Exhibit here). Here’s a close-up of what I found:

See what Charlie did? He wanted to break up the visual design of the title. So, in the visual design, the full title is obviously “Charlie’s Blog for When America Came Marching Home 2008”, his blog for Jeff McClurken’s course by that title (Exhibit here). To squeeze things into that visual design (supplied entirely by the WP theme), he made the tagline of his blog “When America Came Marching Home 2008”: a good, sneaky, respectable trick.

It’s a neat example of how people will disregard the information structure (the UI clearly says, “Give a title in one field, give a tag-line in another field”) if it gets the visual effect they are looking for. I’m not quite prepared to say that’s a bad thing; it’s just something that I need to be aware of. And, especially as we encounter more and more documents and data that have been republished via feed syndication, it’s something that we will all need to be aware of as part of our information literacy. Something that looks like it must be an error, like this odd title, might make perfect sense when put back in its original visual context, even though the data is odd.
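For anyone consuming the data, one rough repair is to read the title and tagline together. Here is a Python sketch of that, built on a hypothetical (and very incomplete) list of words that real titles rarely end on; a heuristic like this will surely misfire on some blogs.

```python
# Hypothetical, deliberately short list of words a real title rarely ends on.
DANGLING_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}

def display_title(title, tagline):
    """Rejoin a blog title with its tagline when the title looks cut off mid-phrase."""
    words = title.split()
    if tagline and words and words[-1].lower() in DANGLING_WORDS:
        return f"{title.strip()} {tagline.strip()}"
    return title.strip()

print(display_title("Charlie's Blog for", "When America Came Marching Home 2008"))
print(display_title("Gene's Labor Rights Blog", "Notes for class"))
```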

I can’t leave this idea without also wondering whether this signals a need for more consideration of how students (and everyone else, for that matter) think about how they are represented to the world via their blogs. Not “represented” as in self-representation (“This is who I am”), but with regard to the techniques and technologies of representation: things like making the data collected by me, Google, Yahoo!, and anyone else accurate when removed from a document’s context.

The key issue is whether we are at a point where information fluency calls for understanding that any online document is also data, and as such calls for attention to both the ‘realization’ of the document (its visual form) and the ‘realization’ of the data (how it will be syndicated and re-formed by others).

In short, “All your document are belong to us.”

Hapax Tag-omena

I’ve thought for a long time that tags are overrated as a tool for organizing content. Paradoxically, that’s because of how useful they are. The paradox is that, because they are useful for so many things, they are used for many different purposes. The result is that within the context of a single, particular purpose they are great. Using tags like “biol12108sec03” for a course to create an aggregation of blog posts is a perfect example (some people call these “functional tags”). But when all those different purposes are all generically called “tagging”, there’s a huge info-muddle.

That came to the fore once again as I’ve been working on adding in some slicing and dicing around tags to the menu of Exhibits I’m working on here. The first muddle is tags vs. categories. This is one that I’ve coped with for a long time, and is unavoidable given both the technologies I’m using and the practices of bloggers. It looks useful to distinguish between a tag and a category, and many people have explained this to me many times. However, if you look at the data reported by feeds, the distinction is lost.

Ultimately, though, that’s a good thing because of the idiosyncratic ways in which people decide what’s a tag and what’s a category. One person’s tag is another’s category. So the distinction may be meaningful within a blog, but not across blogs.
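Here is what that flattening looks like from the consumer’s side. As far as I know, a standard WordPress RSS 2.0 feed writes both categories and tags as plain category elements on each item, so a parser (here Python’s xml.etree.ElementTree, run on a made-up feed item) just gets a flat list of labels with nothing saying which was which.

```python
import xml.etree.ElementTree as ET

# A made-up feed item: "Assignments" might have been a category and
# "frankenstein" a tag in the blog's admin screens, but the feed doesn't say.
FEED = """<rss version="2.0"><channel>
<item>
  <title>Week 3 reading</title>
  <category><![CDATA[Assignments]]></category>
  <category><![CDATA[frankenstein]]></category>
</item>
</channel></rss>"""

def item_labels(rss_text):
    """Return {item title: [label, ...]} for every <item> in an RSS 2.0 document."""
    root = ET.fromstring(rss_text)
    labels = {}
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        labels[title] = [category.text for category in item.findall("category")]
    return labels

print(item_labels(FEED))
```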

But what I really want to look at is something different, a phenomenon that seems to defy either “tag” or “category” labels. This is what I’m calling a “hapax tag-omenon”, by analogy to a “hapax legomenon”, a word which appears only once in a text or in an author’s works. A hapax tagomenon is a tag which appears only once across all the blogs I’m scraping data from.
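Finding them is a simple counting job. Here is a minimal Python sketch, assuming posts arrive as (post URL, list of tags) pairs; the sample posts are invented. Tags are compared as exact strings, so “Bake” and “bake” would count as different tags unless you normalize them first.

```python
from collections import Counter

def hapax_tagomena(posts):
    """Return {tag: post URL} for tags that appear on exactly one post.

    posts: iterable of (post URL, [tag, ...]) pairs.
    """
    counts = Counter()
    seen_on = {}
    for post_url, tags in posts:
        # Count each tag once per post, so repeating a tag within one post
        # doesn't make it look like it was used twice.
        for tag in set(tags):
            counts[tag] += 1
            seen_on[tag] = post_url
    return {tag: seen_on[tag] for tag, n in counts.items() if n == 1}

posts = [
    ("post-1", ["TGGCH", "history"]),
    ("post-2", ["history", "bake"]),
    ("post-3", ["Other Stuff :)", "history", "history"]),
]
print(hapax_tagomena(posts))
```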

In exploring the tags used in UMW blogs, I discovered over 1,000 hapax tagomena. Granted, some gaps in the data I’m collecting have resulted in some false positives, and I think earlier versions of the code labeled a few things that aren’t tags as tags. But overall, I think it gives a good starting picture. Here are a few of my favorites (the links go to the posts):

The complete list, generated on the fly, is here.

Some, like “TGGCH”, I’m guessing have a meaning to the blogger, and are likely a useful way for him or her to find a post again later. Others, like “Other Stuff :)”, look like the beginnings of a categorization system; maybe it just hasn’t been filled out yet. Still others, like “tweet” and “bake”, seem almost like free association, or perhaps even a kind of teaser (a tagline?).

One idea that pops up here is that something like the inverse of a tag cloud might be interesting. Instead of showing the most-used tags, expose the hapax tagomena. After all, to literary scholars the hapax legomena are usually quite interesting. Maybe the same is true here?
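Here is a sketch, in Python, of what that inversion could mean for sizing: font sizes scale linearly so the rarest tag is drawn largest and the most common smallest. The pixel range and the linear scale are arbitrary choices for illustration; ordinary tag clouds often use a logarithmic scale instead.

```python
def inverse_cloud_sizes(counts, min_px=10, max_px=32):
    """Map tag -> font size in pixels so the rarest tags are drawn largest.

    counts: {tag: number of posts using it}.
    """
    if not counts:
        return {}
    lo, hi = min(counts.values()), max(counts.values())
    sizes = {}
    for tag, n in counts.items():
        # 1.0 for the rarest tag, 0.0 for the most common; all equal -> all large.
        scale = 1.0 if hi == lo else (hi - n) / (hi - lo)
        sizes[tag] = round(min_px + scale * (max_px - min_px))
    return sizes

print(inverse_cloud_sizes({"history": 40, "film": 21, "bake": 1}))
```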

Don’t get me wrong — this is all, as I said above, extremely useful. But not useful, I think, for organizing the content, at least across different blogs.

If nothing else, I hope that this will spark some reflection on tagging/categorization habits, and maybe about how we talk about information fluency.

10,000 Posts!

The mission of semantifying the University of Mary Washington is coming along nicely — the store of information now knows about more than 10,000 posts from over 1,500 different blogs!

There’s also a lot more information that isn’t yet being exposed in the various exhibits and galleries. It’s all a big pile of facts about the blogs and bloggers, just waiting to be put together in a new and interesting way. How big is the pile of facts? First, let me get at what I mean by a ‘fact’.

Here, a fact is what you have whenever you put a relation between two different things. So “This post was created on October 30” is a fact — it relates the post to the date it was created. “This post was created by Patrick” is another fact, relating this post and the person who created it.

So when I say “a big pile of facts”, I mean that the pile contains approximately 140,000 distinct facts of that sort, and adding between 600 and 900 new ones each week!
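In linked-data terms each of these facts is roughly an RDF triple: a subject, a relation, and the thing it’s related to. Here is a toy Python sketch of a store that counts distinct facts, using hypothetical identifiers like post:123; a real triple store would use URIs and answer queries rather than keep a Python set.

```python
from collections import namedtuple

# One fact: a subject, a relation, and the object it is related to.
Fact = namedtuple("Fact", ["subject", "relation", "obj"])

class FactStore:
    """A toy store of distinct facts; a set means each fact is counted once."""

    def __init__(self):
        self.facts = set()

    def add(self, subject, relation, obj):
        self.facts.add(Fact(subject, relation, obj))

    def about(self, subject):
        """All facts whose subject matches, in a stable sorted order."""
        return sorted(f for f in self.facts if f.subject == subject)

    def __len__(self):
        return len(self.facts)

store = FactStore()
store.add("post:123", "createdOn", "2008-10-30")
store.add("post:123", "createdBy", "Patrick")
store.add("post:123", "createdBy", "Patrick")  # same fact again: not a new one
print(len(store))  # 2
```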

Top 10 Lists

Always looking for new and interesting ways to see what’s going on in the world of UMW blogging, I’ve made a few quick Top Ten lists:

People with the most audio in their blogs.
People with the most images in their blogs.
People with the most video in their blogs.
People with the most links in their blogs.
People with the most distinct tags/categories in their blogs.

I haven’t done a thing to make them visually pretty; it’s just the data. If you think data is pretty, you’ll like it; if you need the visual appeal, well, that’s coming along soon.
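Each list boils down to summing a per-post count for each person and taking the top ten. Here is a minimal Python sketch, assuming posts have already been reduced to dictionaries with an author and some counts (the field names are hypothetical). Distinct tags need a set union instead of a sum, so they get their own function.

```python
from collections import Counter

def top_ten(posts, field, n=10):
    """Rank authors by the total of one per-post count, e.g. 'images' or 'links'."""
    totals = Counter()
    for post in posts:
        totals[post["author"]] += post.get(field, 0)
    return totals.most_common(n)

def top_ten_distinct_tags(posts, n=10):
    """Rank authors by how many different tags/categories they have used."""
    tags = {}
    for post in posts:
        tags.setdefault(post["author"], set()).update(post.get("tags", []))
    return Counter({author: len(t) for author, t in tags.items()}).most_common(n)

posts = [
    {"author": "ann", "images": 3, "links": 5, "tags": ["film", "1960s"]},
    {"author": "ben", "images": 1, "links": 9, "tags": ["film"]},
    {"author": "ann", "images": 2, "links": 0, "tags": ["film"]},
]
print(top_ten(posts, "images"))
print(top_ten(posts, "links"))
print(top_ten_distinct_tags(posts))
```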

A Timeline of Posts

Just put up a new prototype of an Exhibit/Timeline: Two Week Timeline. This gets a list of all the posts from the past two weeks and puts them into a timeline. I was surprised to find that this one interests me a little more than the others, because the timeline representation might end up revealing some patterns to the blogging activity. Even counting outliers and anomalies like the recent fall break and the pile-up from LabLogs (not sure, but there might be a bug here I need to chase!) — it looks like prominent blogging times show up.

This might start us toward some quantitative study of blogging behavior here at UMW.
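As one possible starting point, here is a Python sketch that keeps the posts from the last two weeks and counts them by hour of day. It assumes timestamps are already parsed into naive datetime objects in a single local time zone; mixing time zones would blur exactly the patterns you’re looking for.

```python
from collections import Counter
from datetime import datetime, timedelta

def recent_posts(posts, now, days=14):
    """Keep (timestamp, post URL) pairs from the last `days` days, oldest first."""
    cutoff = now - timedelta(days=days)
    return sorted(post for post in posts if cutoff <= post[0] <= now)

def posts_per_hour(posts):
    """Count posts by hour of day (0-23), a crude way to spot busy blogging times."""
    return Counter(timestamp.hour for timestamp, _ in posts)

now = datetime(2008, 10, 30, 12, 0)
posts = [
    (datetime(2008, 10, 29, 23, 15), "post-a"),
    (datetime(2008, 10, 28, 23, 50), "post-b"),
    (datetime(2008, 10, 1, 9, 0), "post-c"),  # older than two weeks: dropped
]
recent = recent_posts(posts, now)
print(recent)
print(posts_per_hour(recent))
```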

In related news, the Exhibit for individual blogs now also contains a complete timeline. Use one of the searches to look up your blog and follow the link to its Exhibit, then click the “Timeline” tab at the top of the page.