24 ways

to impress your friends

Marking Up a Tag Cloud by Mark Norman Francis

Everyone’s doing it.

The problem is, everyone’s doing it wrong.

Harsh words, you might think. But the crimes against decent markup are legion in this area. You see, I’m something of a markup and semantics junkie. So I’m going to analyse some of the more well-known tag clouds on the internet, explain what’s wrong, and then show you one way to do it better.

del.icio.us

I think the first ever tag cloud I saw was on del.icio.us. Here’s how they mark it up.

    <div class="alphacloud">
        <a href="/tag/.net" class="lb s2">.net</a>
        <a href="/tag/advertising" class=" s3">advertising</a>
        <a href="/tag/ajax" class=" s5">ajax</a>
        ...
    </div>

Source: /code/marking-up-a-tag-cloud/delicious.txt

Unfortunately, that is one of the worst examples of tag cloud markup I have ever seen. The page states that a tag cloud is a list of tags where size reflects popularity. However, despite describing it in this way to the human readers, the page’s author hasn’t described it that way in the markup. It isn’t a list of tags, just a bunch of anchors in a <div>. This is also inaccessible because a screenreader will not pause between adjacent links, and in some configurations will not announce the individual links at all; instead, all of the tags will be read as one link containing a whole bunch of words. Markup crime number one.

Flickr

Ah, Flickr. The darling photo sharing site of the internet, and the biggest blind spot in every standardista’s vision. Forgive it for having atrocious markup and sometimes confusing UI because it’s just so much damn fun to use. Let’s see what they do.

    <p id="TagCloud">
        &nbsp;<a href="/photos/tags/06/" style="font-size: 14px;">06</a>&nbsp;
        &nbsp;<a href="/photos/tags/africa/" style="font-size: 12px;">africa</a>&nbsp;
        &nbsp;<a href="/photos/tags/amsterdam/" style="font-size: 14px;">amsterdam</a>&nbsp;
        ...
    </p>

Source: /code/marking-up-a-tag-cloud/flickr.txt

Again we have a simple collection of anchors like del.icio.us, only this time in a paragraph. But rather than using a class to represent the size of the tag they use an inline style. An inline style using a pixel-based font size. That’s so far away from the goal of separating style from content, they might as well use a <font> tag. You could theoretically parse that to extract the information, but you would have to do more work to guess what the pixel sizes represent. Markup crime number two (and extra jail time for using non-breaking spaces purely for visual spacing purposes).

Technorati

Ah, now. Here, you’d expect something decent. After all, the Overlord of microformats and King of Semantics Tantek Çelik works there. Surely we’ll see something decent here?

    <ol class="heatmap">
        <li><em><em><em><em><a href="/tag/Britney+Spears">Britney Spears</a></em></em></em></em></li>
        <li><em><em><em><em><em><em><em><em><em><a href="/tag/Bush">Bush</a></em></em></em></em></em></em></em></em></em></li>
        <li><em><em><em><em><em><em><em><em><em><em><em><em><em><a href="/tag/Christmas">Christmas</a></em></em></em></em></em></em></em></em></em></em></em></em></em></li>
        ...
        <li><em><em><em><em><em><em><a href="/tag/SEO">SEO</a></em></em></em></em></em></em></li>
        <li><em><em><em><em><em><em><em><em><em><em><em><em><em><em><em><a href="/tag/Shopping">Shopping</a></em></em></em></em></em></em></em></em></em></em></em></em></em></em></em></li>
        ...
    </ol>

Source: /code/marking-up-a-tag-cloud/technorati.txt

Unfortunately it turns out not to be that decent, and stop calling me Shirley. It’s not exactly terrible code. It does recognise that a tag cloud is a list of links. And, since they’re in alphabetical order, that it’s an ordered list of links. That’s nice. However … fifteen nested <em> tags? FIFTEEN? That’s emphasis for you. Yes, it is parse-able, but it’s also something of a strange way of looking at emphasis. The HTML spec states that <em> is emphasis, and <strong> is for stronger emphasis. Nesting <em> tags seems counter to the idea that different tags are used for different levels of emphasis. Plus, if you had a screen reader that stressed the voice for emphasis, what would it do? Shout at you? Markup crime number three.

So what should it be?

As del.icio.us tells us, a tag cloud is a list of tags where the size that they are rendered at contains extra information. However, by hiding the extra context purely within the CSS or the HTML tags used, you are denying that context to some users. The basic assumption being made is that all users will be able to see the difference between font sizes, and this is demonstrably false.

A better way to code a tag cloud is to put the context of the cloud within the content, not the markup or CSS alone. As an example, I’m going to take some of my favourite Flickr tags and put them into a cloud which communicates the relative frequency of each tag.

To start with, a tag cloud in its most basic form is just a list of links. I am going to present them in alphabetical order, so I’ll use an ordered list. Into each list item I add the number of photos I have with that particular tag. The tag itself is linked to the page on Flickr which contains those photos. So we end up with this first example.

To display this as a traditional tag cloud, we need to alter it in a few ways: the items should be displayed next to each other rather than one per line, the context text should be hidden from visual display, and each tag should be a link.

Displaying the items next to each other simply means setting the display of the list elements to inline. The context can be hidden by wrapping it in a <span> and then using the off-left method to hide it. And the link just means adding an anchor (with rel="tag" for some extra microformats bonus points). So, now we have a simple collection of links in our second example.
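As a rough sketch of this second stage, the markup and CSS might look something like the following. The class name tag-cloud, the tags, the photo counts, the wording of the hidden text and the example-user URLs are all placeholders invented for illustration, and the off-left rule shown is just one common variant of that technique.

```html
<style>
  /* Drop the list numbering and run the items together on one line. */
  ol.tag-cloud {
    list-style: none;
    margin: 0;
    padding: 0;
  }
  ol.tag-cloud li {
    display: inline;
  }
  /* Off-left: the text is moved out of view but stays in the document,
     so a screen reader can still read it. */
  ol.tag-cloud li span {
    position: absolute;
    left: -9999px;
  }
</style>

<!-- Hypothetical tags and counts, in alphabetical order. -->
<ol class="tag-cloud">
  <li><span>12 photos are tagged with</span>
    <a href="http://www.flickr.com/photos/example-user/tags/amsterdam/" rel="tag">amsterdam</a></li>
  <li><span>46 photos are tagged with</span>
    <a href="http://www.flickr.com/photos/example-user/tags/london/" rel="tag">london</a></li>
  <li><span>3 photos are tagged with</span>
    <a href="http://www.flickr.com/photos/example-user/tags/sunset/" rel="tag">sunset</a></li>
</ol>
```

With styles switched off, this degrades to a plain numbered list in which each item reads as a full phrase, which is the point of keeping the context in the content.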

The last stage is to add the sizes. Since we already have context in our content, the size is purely for visual rendering, so we can just use classes to define the different sizes. For my example, I’ll use a range of class names from not-popular through ultra-popular, in order of smallest to largest, and then use CSS to define different font sizes. If you preferred, you could always use less verbose class names such as size1 through size6. Anyway, adding some classes and CSS gives us our final example, a semantic and more accessible tag cloud.
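A minimal sketch of that last stage could look like this. Only the class names not-popular and ultra-popular come from the description above; the four in between, the choice to put the class on the list item, the font sizes, the tag-cloud class name, the tags, the counts and the example-user URLs are all assumptions made for illustration.

```html
<style>
  /* Base cloud styling: inline items, context text hidden off-left. */
  ol.tag-cloud { list-style: none; margin: 0; padding: 0; }
  ol.tag-cloud li { display: inline; }
  ol.tag-cloud li span { position: absolute; left: -9999px; }

  /* One font size per class, smallest to largest. The values are arbitrary. */
  ol.tag-cloud li.not-popular      { font-size: 1em; }
  ol.tag-cloud li.not-very-popular { font-size: 1.3em; }
  ol.tag-cloud li.somewhat-popular { font-size: 1.6em; }
  ol.tag-cloud li.popular          { font-size: 1.9em; }
  ol.tag-cloud li.very-popular     { font-size: 2.2em; }
  ol.tag-cloud li.ultra-popular    { font-size: 2.5em; }
</style>

<!-- Hypothetical tags and counts; the class only drives visual size. -->
<ol class="tag-cloud">
  <li class="somewhat-popular"><span>12 photos are tagged with</span>
    <a href="http://www.flickr.com/photos/example-user/tags/amsterdam/" rel="tag">amsterdam</a></li>
  <li class="ultra-popular"><span>46 photos are tagged with</span>
    <a href="http://www.flickr.com/photos/example-user/tags/london/" rel="tag">london</a></li>
  <li class="not-popular"><span>3 photos are tagged with</span>
    <a href="http://www.flickr.com/photos/example-user/tags/sunset/" rel="tag">sunset</a></li>
</ol>
```

Because the count is already in the text of each item, the class carries no information of its own, so renaming the classes or adding more steps only changes the presentation.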

About the author

Mark Norman Francis is a Lead Web Developer for Yahoo!, where amongst other things he runs code quality reviews. He is based in London, England, is obsessed with semantics and hopes one day to start blogging properly at marknormanfrancis.com.

Your comments

  1. § Arthus Erea:

    I love the idea of using this semantic markup and I probably will use much of this tutorial in my next project. However, the only problem is that the font sizes aren’t as scalable as they could be. Tags are only 1 of 6 sizes, so it cannot scale easily. For instance, there may be 1 tag with 5,000 photos under it, and then another tag with 1 photo in it. Using the semantic method of class names, this might only render a small difference between the very differently weighted tags. I am trying to think of a way of doing both a scalable and semantically valid tag list.

  2. § brothercake:

    A couple of thoughts – the information you’re providing in the additional text is very verbose; a reader would have to listen to “... photos are tagged with …” over and over again. You could just as well make it a single number in brackets after the tag: “austion (344)”

    But, why make different information available to screenreaders than is available to sighted users. If it’s interesting information, why not make it available to all users? But if it’s to provide equivalence, mightn’t it be better to provide the same information as the class, like “ultra popular”?

  3. § Kevin:

    Great example. Thanks for the descriptions and semantics. I’ve always struggled with deciding how to most accurately add semantics to my web pages. Now if only those double spaces from Flickr appeared in the final example…

  4. § Jens Nedal:

    Don’t they have people at any of the social bookmarking sites that actually understand CSS? Horrible code truly.
    All this tagging display business screams for a list with classes, just like displayed here. Thanks for putting up a good example!

  5. § wrtlprnft:

    Another idea for the numbers: What’s wrong with putting them into title attributes? That would make the numbers accessible to visual renderers as well, but not clutter the display.

  6. § Ed Eliot:

    wrtlprnft – Screenreaders are inconsistent in reading the value of the title attribute. With default configuration I’m pretty sure some don’t. I think Brothercake has a point about the verbosity of the text. I think I’d go for text equivalent to the class name values in brackets after the tags.

  7. § Mark Norman Francis:

    “Tags are only 1 of 6 sizes, so it cannot scale easily.” Arthus – that’s true in my example. Many tag clouds only have certain pre-set levels. Technorati’s is something of an exception, which is why you can find quite so many nested EMs. However, there’s nothing to stop people defining more steps and therefore more classes.

    “the information you’re providing in the additional text is very verbose” Brothercake – that’s very true, and for a cloud with a lot of links that would be a valid concern.

    “But, why make different information available to screenreaders than is available to sighted users.” Brothercake – I do believe that information should be available, and not just using CSS. My point here was not to show the ultimate tag cloud; rather, just that existing tag clouds could be made better and more semantic without altering how they are displayed to the user with CSS. Personally, I’m not that big a fan of tag clouds. But I am a fan of semantic markup. :)

  8. § Sebastian Redl:

    Great as this semantic markup is, I can’t help it: I hate tag clouds. I find the mixture of text sizes within a single block unappealing, hard to read and confusing.

  9. § Ben Ward:

    Interesting stuff, certainly; there is without a doubt some utterly grotesque mark-up out there.

    A couple of points:

    Firstly, your aside about using rel=“tag” on links isn’t right. rel-tag has a more specific purpose than just any link to a tag page:

    By adding rel=“tag” to a hyperlink, a page indicates that the destination of that hyperlink is an author-designated “tag” (or keyword/subject) for the current page [emphasis mine]

    Therefore the list of tags following a blog post should be marked up with rel=“tag”, as they link to the page that aggregates all content for that tag. But a tag cloud is a summary of those tags; you’re not ‘tagging’ the current page, so those links shouldn’t have rel=“tag”. They’re just regular hyperlinks.

    As for tag cloud mark-up, I agree completely with the analysis of existing mark-up. Technorati tried to be semantic but lost the plot a bit when they reached such deep nesting. That said, I’m not sure they’re completely wrong for using EM.

    I think there are two distinct lines of thought for this. The first is that HTML does not have sufficient means to describe this cloud representation of tags and that therefore classes should be used on top of generic mark-up. The second is that HTML does not have sufficient means so we should use the closest matching mark-up we can find.

    The nearest-match for tag clouds is EM and STRONG. In whatever context, the use of text size in a tag cloud is to emphasise one tag over another. The problem is that that only provides three levels of emphasis (none, EM, STRONG). For the project I’m working on at the moment I’m taking the ‘nearest match mark-up’ approach and have contrived a fourth level with EM nested in STRONG.

    Is nearest-match the way to go? I’m not sure. There are times when using nearest-match mark-up risks devaluing the semantics of the elements used (not dissimilar to TABLE being devalued for all its misuse in layout; could browser makers have done something amazingly cool with data tables had they only been used properly?). My feeling is that in the case of EM and STRONG in a tag cloud the devaluation is nil and so for me — in a situation where four size levels is sufficient in my cloud — I’d rather take some HTML semantics.

  10. § Scott Reynen:

    I like Technorati’s use of HTML tags rather than class names with essentially the same meaning, but I agree it’s a bit verbose to be using all those ems. So I’m mixing ems and strongs in my own tag cloud markup, e.g.:

    http://typewriting.org/tag/
    http://typewriting.org/link/tag/

    Edit: Looks like we lost the tags - Scott shoot me an email and I'll put them back in. Drew.

  11. § Bryce:

    It had never dawned on me that tag clouds needed to be anything more than just links. You make a perfect point with the accessibility issue. Brilliant approach!

  12. § Mike Stenhouse:

    I mark mine up with an ol and bracketed counts – it was the best way I could think of to get all the information in one place. I did try ‘a’, ‘a em’, ‘a strong’ and ‘a strong em’ but I figure that the spoken-out-loud distinction is just too subtle and obtuse to be useful. I’m open to persuasion on that one though.

    Incidentally, it’s worth remembering that a ‘tag cloud’ is actually a weighted list of tags. Weighted lists are a great example of information design: managing to convey lots of information in a very compact way. They are useful in contexts outside of tags too…

  13. § Rob Ellis:

    I absolutely love this post, it’s great.
    I too feel that there should be a little more control in the size of the text, perhaps a JavaScript solution that pulls the number of posts from an inner span and dynamically assigns a weight based on some thresholds, min size, max size etc.
    The offset text is great for SEO.

    Again well done, perhaps when I have a bit more time I will put some code together.

    Rob Ellis

  14. § John Allsopp:

    I did a bit of work on this issue for a possible microformat a couple of months back.

    http://microformatique.com/?page_id=34

    It’s a reasonably complex but I think interesting issue. It demonstrates how superficial a lot of HTML based development is – little if any real attention paid to the underlying semantics (with a couple of honorable exceptions) as noted.

    john

  15. § Patrick H. Lauke:

    if they weren’t so hit and miss to radically restyle, tables could be used as well for this…or definition lists, at a stretch…hmmm

  16. § /T:

    Another really serious accessibility issue with your tag cloud is that flickr collapses multiple words into one. thereflectionofthehairofchristianheilmann is 42 letters and will not fit onto one line of a standard braille display with 40 modules.

  17. § Drew McLellan:

    /T: That’s a good point about words going over 40 letters. However, it’s more of an issue with long tags than with tag clouds specifically.

  18. § Andy Hawkes:

    I do like the idea of the semantic markup used, but I can’t help but think that the ordering of the list items should be by tag weight rather than alphabetical.

    The entire concept of your method is to make a tag cloud work for users who cannot see the font size variance whilst applying a solid semantic structure, yet you have semantically ordered the data by a totally different measure for those users than you have for those who see the tag cloud as intended.

    I get that the content provides the context in terms of the “46 images tagged with” text, but it’s the use of the ordered list that is creating that dull nagging feeling in the back of my head (or maybe it’s down to the fact that we had our office Christmas party last night…) as the ordering is clearly different for the two audiences.

    Other than that, it’s a very simple approach.

  19. § Chris Messina:

    When John Allsopp was working on his tagcloud effort, I started documenting some of the syntaxes you had done (though didn’t get very far). I did, however, come up with my own proposals.

    Essentially it occurred to me that we’re working with two constraints (among others): (1) to be semantically meaningful in the source code such that it’s machine parseable but not human-offensive and (2) to reflect the relative tag weight in the visual, “graph” display (after all, a tag “cloud” is really just a visualization of data, like a pie chart).

    In order to do this without being verbose in markup, while maintaining semantics, and while also allowing for default renderings to make sense in graph sense (think about a tagcloud on a cell phone browser — i.e. without styles applied), I realized that all this nonsense that we’re really looking at an ordered list and that presentation is not necessary is completely false. Indeed it is the rendering of the data that makes tagclouds useful — as a summary of the relationships between multiple items in a set. Additionally, to presume that tagclouds only refer to popularity (as opposed to prevalence or frequency) is also a terrible mistake: piecharts don’t only refer to how much apple strudel one can eat!

    Anyway, I chose a path that would be both semantically accurate, somewhat condensed and universally accessible. It can be improved, surely, but I think the combination of em, strong, big and small tags can be rather flexible at expressing this graphed data.

    In any case, I’d love to see folks revisit their proposals given the constraints I’ve discussed above.

  20. § Nicolas Hoizey:

    I did also think a lot about these accessible tag clouds a while ago, in French for those who can read it: http://www.gasteroprod.com/comment-faire-un-tag-cloud-nuage-de-tags-ou-d-etiquettes-accessible.html

    Like Ben Ward, I also chose to use EM and STRONG elements, but without nesting them. I feel it to be as improper as nesting EM with another EM.

    As noted by Andy Hawkes, I am really not satisfied with an alphabetically ordered list, especially when it is presented as an accessibility improvement, because it is popularity that should be used as the ordering criterion. Having not yet figured out a simple JavaScript that could take a popularity-ordered list and produce an alphabetically ordered one only for “visual” user agents, I still offer both lists on my page dedicated to tags: http://www.gasteroprod.com/tags/

    I think there is a need (and I would be really happy to contribute) for a microformat for tagclouds, starting with John Allsopp’s work (also available here http://microformats.org/wiki/tagcloud-brainstorming) and all your comments!

  21. § ampz:

    I think it is better to use the title attribute on the “a” tag instead of the off-left technique. Your code will look better, and screen readers will not be the only ones able to get at the information that is currently in the off-left span.

24 ways: day 9