Semantic UMW

November, 2008:

Visual design and information design

There’s a funny tension at work throughout the web between the needs of visual design and the needs of data design. Sometimes it produces some funny things when you do what I’m doing: taking the data out of its visual context to mix and match it in new ways. Funny, and very interesting.

Case in point. I saw that one blog has the title “Charlie’s Blog for”. My first thought was that it was a typo, or a blog so unfinished that the author hadn’t even figured out what, exactly, the blog was for. Whatever the case, as I parsed both the grammar and the data, I concluded that there was no way this would work as a title. It seemed to do the opposite of a title’s job, completely and deliberately avoiding giving any information about the blog.

Ahhh… but then I visited “Charlie’s Blog for” (Exhibit here). Here’s a close-up of what I found:

See what Charlie did? He wanted to break up the visual design of the title. So, in the visual design, the full title is obviously “Charlie’s Blog for When America Came Marching Home 2008”, his blog for Jeff McClurken’s course by that title (Exhibit here). To squeeze things into that visual design (supplied entirely by the WP theme), he made the tagline of his blog “When America Came Marching Home 2008”: a good, sneaky, respectable trick.

It’s a neat example of how people will disregard the information structure (the UI clearly says: “Give a title in one field, give a tagline in another field”) if doing so gets the visual effect they are looking for. I’m not quite prepared to say that’s a bad thing. It’s just something that I need to be aware of. And, especially as we encounter more and more documents and data that have been republished via feed syndication, it is something that we will all need to be aware of as part of our information literacy. Something that looks like it must be an error, like this odd title, might make perfect sense when put back in its original visual context, even though the data is odd.
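To make the split concrete, here is a minimal sketch of what a feed reader sees. It assumes the blog’s feed is RSS 2.0 and that the tagline ends up in the channel’s description element; the feed fragment is a hypothetical reconstruction for illustration, not the real feed.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed fragment modeled on the example in the text;
# it is NOT copied from the actual blog's feed.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Charlie's Blog for</title>
    <description>When America Came Marching Home 2008</description>
  </channel>
</rss>"""


def channel_labels(feed_xml):
    """Return (title, tagline) exactly as the channel element reports them."""
    channel = ET.fromstring(feed_xml).find("channel")
    title = (channel.findtext("title") or "").strip()
    tagline = (channel.findtext("description") or "").strip()
    return title, tagline


title, tagline = channel_labels(SAMPLE_FEED)
print(title)                 # the title field alone, as an aggregator would list it
print(f"{title} {tagline}")  # the two fields run together, as the theme displays them
```

An aggregator that reads only the title field gets the truncated phrase; only a reader that knows to join the two fields recovers what the page shows.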

I can’t leave this idea without also wondering whether this signals at least some more consideration of how students (and everyone else, for that matter) think about how they are represented to the world via their blogs. Not “represented” as in self-representation (“This is who I am”), but in terms of the techniques and technologies of representation: things like keeping the data collected by me, Google, Yahoo!, and anyone else accurate when it is removed from a document’s context.

The key issue is whether we are at a point where information fluency calls for understanding that any online document is also data, and as such calls for attention to both the ‘realization’ of the document (its visual form) and the ‘realization’ of the data (how it will be syndicated and re-formed by others).

In short, “All your document are belong to us.”

Hapax Tag-omena

I’ve thought for a long time that tags are overrated as a tool for organizing content. Paradoxically, that’s because of how useful they are. Because they are useful for so many things, they are used for many different purposes. Within the context of a single, particular purpose they are great. Using a tag like “biol12108sec03” for a course to create an aggregation of blog posts is a perfect example (some people call these “functional tags”). But when all those different purposes are generically called “tagging”, there’s a huge info-muddle.

That came to the fore once again as I’ve been working on adding in some slicing and dicing around tags to the menu of Exhibits I’m working on here. The first muddle is tags vs. categories. This is one that I’ve coped with for a long time, and is unavoidable given both the technologies I’m using and the practices of bloggers. It looks useful to distinguish between a tag and a category, and many people have explained this to me many times. However, if you look at the data reported by feeds, the distinction is lost.

Ultimately, though, that’s a good thing because of the idiosyncratic ways in which people decide what’s a tag and what’s a category. One person’s tag is another’s category. So the distinction may be meaningful within a blog, but not across blogs.
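Here is a small sketch of how that distinction disappears. It assumes an RSS 2.0 feed in which a post’s tags and categories both arrive as plain category elements; the sample item and the label “Coursework” are hypothetical, and the feeds I’m actually scraping may differ in detail.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed item. On the blog itself, "Coursework" might be a
# category and "biol12108sec03" a tag; the feed does not say which is which.
SAMPLE_ITEM = """<item>
  <title>A sample post</title>
  <category>Coursework</category>
  <category>biol12108sec03</category>
</item>"""


def item_labels(item_xml):
    """Collect every category element of an item into one flat list of labels."""
    item = ET.fromstring(item_xml)
    return [c.text.strip() for c in item.findall("category") if c.text]


labels = item_labels(SAMPLE_ITEM)
print(labels)
```

Nothing in the resulting list records which label the blogger thought of as a tag and which as a category.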

But what I really want to look at is something different, a phenomenon that seems to defy either “tag” or “category” labels. This is what I’m calling a “hapax tagomenon”, by analogy to a “hapax legomenon”, a word which appears only once in a text or in an author’s works. A hapax tagomenon is a tag which appears only once in the blogs I’m scraping data from.
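As a rough sketch of that definition (not the code behind the Exhibits), finding hapax tagomena is just counting: tally every tag across every post and keep the ones with a count of one. The function name, the example.org URLs, and the sample data are all hypothetical, and this version treats tags as case-sensitive and does no cleanup.

```python
from collections import Counter


def hapax_tagomena(posts):
    """Return {tag: post_url} for tags used exactly once across all posts.

    `posts` is an iterable of (post_url, list_of_tags) pairs.
    """
    counts = Counter()
    first_seen = {}
    for url, tags in posts:
        for tag in tags:
            counts[tag] += 1
            # Remember where each tag first appeared so we can link to the post.
            first_seen.setdefault(tag, url)
    return {tag: first_seen[tag] for tag, n in counts.items() if n == 1}


# Hypothetical sample data for illustration only.
posts = [
    ("https://example.org/a", ["biol12108sec03", "TGGCH"]),
    ("https://example.org/b", ["biol12108sec03", "tweet"]),
    ("https://example.org/c", ["bake", "biol12108sec03"]),
]
hapaxes = hapax_tagomena(posts)
print(sorted(hapaxes))
```

The course tag is used three times and drops out; the three one-off tags remain, each paired with the post it came from.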

In exploring the tags used in UMW blogs, I discovered over 1000 hapax tagomena. Granted, some gaps in the data I’m collecting have resulted in some false positives, and I think earlier versions of the code labeled a few things that aren’t tags as tags. But overall, I think it gives a good starting picture. Here are a few of my favorites (the links go to the post):

The complete list, generated on the fly, is here.

Some, like “TGGCH”, I’m guessing have a meaning to the blogger, and might likely be a useful way for him or her to find a post again later. Others, like “Other Stuff :)” look like the beginnings of a categorization system, but maybe it just hasn’t been filled out yet. Still others, like “tweet” and “bake” seem almost like free-association, or perhaps even a kind of teaser (a tagline?).

One idea that pops up here is that something like the inverse of a tag cloud might be interesting. Instead of showing off the most-used tags, expose the hapax tagomena. After all, to literary scholars the hapax legomena are usually quite interesting. Maybe the same is true here?
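One way such an inverse cloud might work, purely as a sketch: scale font sizes so that the rarest tags come out largest. The function name, the size range, the counts, and the tag “history” are all assumptions made up for this example.

```python
def inverse_cloud(tag_counts, min_size=10, max_size=32):
    """Map each tag to a font size, giving the rarest tags the largest size."""
    if not tag_counts:
        return {}
    lo, hi = min(tag_counts.values()), max(tag_counts.values())
    span = hi - lo
    sizes = {}
    for tag, n in tag_counts.items():
        # fraction is 1.0 for the rarest tags and 0.0 for the most-used tag;
        # if every tag has the same count, all get the largest size.
        fraction = 1.0 if span == 0 else (hi - n) / span
        sizes[tag] = round(min_size + fraction * (max_size - min_size))
    return sizes


# Hypothetical usage counts for illustration only.
counts = {"biol12108sec03": 12, "history": 6, "tweet": 1, "bake": 1}
sizes = inverse_cloud(counts)
print(sizes)
```

A linear scale is the simplest choice; with over a thousand hapax tagomena all tied at the largest size, a real display would probably need sampling or paging as well.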

Don’t get me wrong — this is all, as I said above, extremely useful. But not useful, I think, for organizing the content, at least across different blogs.

If nothing else, I hope that this will spark some reflection on tagging/categorization habits, and maybe about how we talk about information fluency.