Semantic UMW

Hapax Tag-omena

I’ve thought for a long time that tags are overrated as a tool for organizing content. Paradoxically, that’s because of how useful they are: because they are useful for so many things, they get used for many different purposes. Within the context of a single, particular purpose they are great. Using a tag like “biol12108sec03” to create an aggregation of blog posts for a course is a perfect example (some people call these “functional tags”). But when all those different purposes are generically called “tagging”, there’s a huge info-muddle.

That came to the fore once again as I’ve been working on adding in some slicing and dicing around tags to the menu of Exhibits I’m working on here. The first muddle is tags vs. categories. This is one that I’ve coped with for a long time, and is unavoidable given both the technologies I’m using and the practices of bloggers. It looks useful to distinguish between a tag and a category, and many people have explained this to me many times. However, if you look at the data reported by feeds, the distinction is lost.
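To make that loss concrete, here is a minimal sketch in Python. It assumes a WordPress-style RSS 2.0 feed in which tags and categories both arrive as plain `<category>` elements; the sample item and the `labels_from_item` helper are made up for illustration and are not the code behind the Exhibits.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed item: one label began life as a category and two as tags,
# but (under the assumption above) the feed reports all three the same way.
SAMPLE_ITEM = """
<item>
  <title>Example post</title>
  <category>Coursework</category>
  <category>biol12108sec03</category>
  <category>bake</category>
</item>
"""

def labels_from_item(item_xml):
    """Return every <category> label in a feed item, in document order."""
    item = ET.fromstring(item_xml)
    return [c.text.strip() for c in item.findall("category") if c.text]

labels = labels_from_item(SAMPLE_ITEM)
print(labels)
```

Nothing in the parsed result says which labels the blogger thought of as tags and which as categories, so anything built on the feed has to treat them as one pool.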

Ultimately, though, that’s a good thing because of the idiosyncratic ways in which people decide what’s a tag and what’s a category. One person’s tag is another’s category. So the distinction may be meaningful within a blog, but not across blogs.

But what I really want to look at is something different, a phenomenon that seems to defy either “tag” or “category” labels. This is what I’m calling a “hapax tag-omenon”, by analogy to a “hapax legomenon”, a word which appears only once in a text or in an author’s works. A hapax tagomenon is a tag which appears only once across the blogs I’m scraping data from.
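The definition translates almost directly into code. The following is a rough sketch, not the scraper’s actual implementation: the `hapax_tagomena` function, the `(url, tags)` input shape and the example.org URLs are all assumptions made for illustration.

```python
from collections import Counter

def hapax_tagomena(posts):
    """Map each tag used on exactly one post to that post's URL.

    `posts` is an iterable of (url, tags) pairs (a hypothetical shape).
    """
    counts = Counter()
    where = {}
    for url, tags in posts:
        # set() so a tag repeated within one post still counts once
        for tag in set(tags):
            counts[tag] += 1
            where.setdefault(tag, url)
    return {tag: where[tag] for tag, n in counts.items() if n == 1}

# Made-up posts; the tag strings are borrowed from this post.
posts = [
    ("https://example.org/a/1", ["biol12108sec03", "TGGCH"]),
    ("https://example.org/b/2", ["biol12108sec03", "bake"]),
    ("https://example.org/c/3", ["biol12108sec03", "tweet"]),
]

hapaxes = hapax_tagomena(posts)
for tag, url in sorted(hapaxes.items()):
    print(tag, url)
```

This sketch compares tags as exact strings. Whether “Bake” and “bake” should count as the same tag is a separate decision, and folding case or whitespace would change which tags come out as hapaxes.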

In exploring the tags used in UMW blogs, I discovered over 1000 hapax tagomena. Granted, some gaps in the data I’m collecting have resulted in some false positives, and I think earlier versions of the code labeled a few things that aren’t tags as tags. But overall, I think it gives a good starting picture. Here are a few of my favorites (the links go to the post):

The complete list, generated on the fly, is here.

Some, like “TGGCH”, I’m guessing have a meaning to the blogger and might be a useful way for him or her to find a post again later. Others, like “Other Stuff :)”, look like the beginnings of a categorization system that just hasn’t been filled out yet. Still others, like “tweet” and “bake”, seem almost like free association, or perhaps even a kind of teaser (a tagline?).

One idea that pops up here is that something like the inverse of a tag-cloud might be interesting. Instead of showing the most-used tags, expose the hapax tagomena. After all, to literary scholars the hapax legomena are usually quite interesting. Maybe the same is true here?
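One way such an inverse cloud could be sized is sketched below, assuming a plain dictionary of tag counts. The `inverse_cloud_sizes` name, the pixel range and the counts are all invented for illustration; the only idea carried over from a normal tag-cloud is linear scaling, flipped so that the rarest tags come out largest.

```python
def inverse_cloud_sizes(counts, min_px=10, max_px=32):
    """Assign font sizes so the least-used tags are the biggest."""
    if not counts:
        return {}
    lo, hi = min(counts.values()), max(counts.values())
    span = hi - lo
    sizes = {}
    for tag, n in counts.items():
        # rarity is 1.0 for the least-used tags and 0.0 for the most-used;
        # if every tag has the same count, treat them all as maximally rare
        rarity = 1.0 if span == 0 else (hi - n) / span
        sizes[tag] = round(min_px + rarity * (max_px - min_px))
    return sizes

# Made-up counts: one heavily used functional tag and three hapaxes.
counts = {"biol12108sec03": 40, "bake": 1, "tweet": 1, "TGGCH": 1}
sizes = inverse_cloud_sizes(counts)
print(sizes)
```

With more than 1000 hapax tagomena, every one of them would land at the maximum size, so a real display would probably need sampling or paging rather than a single cloud.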

Don’t get me wrong — this is all, as I said above, extremely useful. But not useful, I think, for organizing the content, at least across different blogs.

If nothing else, I hope that this will spark some reflection on tagging/categorization habits, and maybe about how we talk about information fluency.

2 Comments

  1. […] metadata is, I think, more than can be reasonably expected of most people (see, for example, Hapax Tag-omena and Visual Design and Information Design). But I think there’s also room for wider […]

  2. […] I came across today. I was browsing through the list of hapax tagomena in UMWBlogs (post about them here). I happened to come across these three tags that appear exactly once in […]
