24 ways to impress your friends

By rights the internet should be full of poltergeists, poor rootless things looking for their real homes. Many events on the internet are not properly associated with their correct timeframe. I don’t mean a server set to the wrong time, though that happens too. Much of the content published on the internet is separated from any proper reference to its publication time. What does publication even mean? Let me tell you a story…

“It is 2019 and this is Kathy Clees reporting on the story of the moment, the shock purchase of Microsoft by Apple Inc. An Internet Explorer security scare story from 2008 was responsible, yes from 11 years ago, accidentally promoted by an analyst who neglected to check the date of their sources.”

If you think this is fanciful nonsense, then cast your mind back to September 2008 and this story, covered in Wired and The Times (UK), about a huge United Airlines stock tumble. A Florida newspaper had an automated popular stories section. A random reader looking at a story about United’s 2002 bankruptcy proceedings caused that story to be picked up by Google on a later visit to the South Florida Sun Sentinel’s news home page.

The story was undated, Google’s news engine apparently gave it a 2008 date, an analyst picked it up and pushed it to Bloomberg, and within minutes the United stock was tumbling. The share price dropped from $12 to $3, then recovered to $11 over the day: a net fall of about eight percent over a misconfigured date.

Completing this out-of-order Christmas Carol, let’s look at current practice and how dates are managed; we might even get to clank some chains. Publication date used to be inseparable from publication: the two things were stamped on the same piece of paper. How can we determine when things have been published now?

Determining publication dates

Time as defined by http://www.w3.org/TR/NOTE-datetime is a profile of ISO 8601 that mandates a four-digit year value. This is pretty well defined. We can get very accurate timings, down to milliseconds, and Ruby and other languages can even handle calendar reform. So accuracy is not the issue.

One problem is that there are many dates which could be interpreted as the publication date. Publication can mean any of: date written or created; date placed on server; last modified date; or the current date from the web server. Created and modified have parallels with file systems, but the large number of database-driven websites means that this no longer holds much meaning, as there are no longer any files.

Checking the web server’s response to a HEAD request may not help either: it might give the creation time for the HTML file you are viewing, or it might give the last modified time for a file from disk. It is too unreliable and lacking in context to be of real value. So if the web server will not help, then how can we get the right timeframe for our content?
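
To make the point concrete, here is a minimal Python sketch that parses a Last-Modified style header value. The header string is invented for illustration and the function name is hypothetical. The parsing itself is easy; what the sketch cannot tell you is whether that moment refers to writing, uploading or a template rebuild.

```python
from datetime import datetime
from email.utils import parsedate_to_datetime

# Hypothetical header value, as it might appear in a HEAD response.
SAMPLE_LAST_MODIFIED = "Thu, 18 Dec 2008 18:52:05 GMT"


def parse_last_modified(header_value: str) -> datetime:
    """Parse an HTTP date header into a timezone-aware datetime."""
    return parsedate_to_datetime(header_value)


last_modified = parse_last_modified(SAMPLE_LAST_MODIFIED)
# The value is precise, but nothing here says which event it records.
print(last_modified.isoformat())
```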

We are left with URLs and the actual page content.

Looking at Flickr, this picture (by Douglas County History Research Center) has four date values which can be associated with it. It was taken around 1900, scanned in 1992, placed on Flickr on July 29th, 2008, and replaced later that day. Which dates should be represented here?

This is a hard question to answer, but currently the date of upload to Flickr is the best represented, in terms of the date URL, /photos/douglascountyhistory/archives/date-posted/2008/07/29/, plus some Dublin Core RDF for the year. Flickr uses 2008 as the value for this image. Not accurate, but a reasonable compromise for the millions of other images on their site.

Flickr represents location much better than it represents time. For the most part this is fine, but once you go back in time to the 1800s then the maps of the world start to change a lot and you need to reference both time and place.

The Google timeline search offers another interesting window on the world, showing results organised by decade for any search term. Being able to jump to a specific occurrence of a term makes it easier to get primary results rather than later reporting.

The 1918 “Spanish flu” results jump out in this timeline.

Timeline search result from Google

Any major news event will have multiple analysis articles after the event, so finding the original reporting of Hurricane Katrina is harder now. Many publishers are putting older content online, e.g. Harpers or Nature or The Times. Often these use good date based URLs; sometimes they use unhelpful database references. If this content is available for free, then how much better would it be to provide good metadata on date of publication?

Date based URLs

A quick word on date based URLs: they can be brilliant at capturing first published date. However, they can be hard to interpret. Is /03/04 a date in March or April? What about 08/03/04? Obviously 2008/03/04 is easier to understand; it is probably March 4th. Including a proper timestamp in the page content avoids this kind of guesswork.
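
The guesswork is easy to demonstrate. This Python sketch reads 08/03/04 under three common conventions; the format labels and function name are my own, purely for illustration. All three readings are valid dates, so software has no way to pick the right one without more context.

```python
from datetime import datetime

AMBIGUOUS = "08/03/04"

# Three common conventions for a two-digit, slash-separated date.
CANDIDATE_FORMATS = {
    "year/month/day": "%y/%m/%d",
    "day/month/year": "%d/%m/%y",
    "month/day/year": "%m/%d/%y",
}


def interpretations(text: str) -> dict[str, str]:
    """Return every convention under which the text is a valid date."""
    results = {}
    for label, fmt in CANDIDATE_FORMATS.items():
        try:
            results[label] = datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue  # not a valid date under this convention
    return results


readings = interpretations(AMBIGUOUS)
for label, iso_date in readings.items():
    print(f"{label}: {iso_date}")
```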

Many sites represent the date as a plain text string; a few hook an HTML class of date around it, a very few provide an actual timestamp. Associating the date with the individual content makes it harder to get the date wrong.

Movable Type and TypePad are notable exceptions; they embed Dublin Core RDF to represent each posting, e.g. dc:date="2008-12-18T02:57:28-08:00". WordPress doesn’t support date markup out of the box, though there is a patch and a howto for hAtom available.

In terms of newspapers, the BBC use <meta name="OriginalPublicationDate" content="2008/12/18 18:52:05" /> along with opaque URLs such as http://news.bbc.co.uk/1/hi/technology/7787335.stm.
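
A meta element like this is at least machine readable. As a rough sketch, assuming the tag keeps the name and date layout shown above, it can be pulled out with Python’s standard library. The class and function names are hypothetical, and the result is a naive datetime because the value carries no time zone.

```python
from datetime import datetime
from html.parser import HTMLParser


class PublicationDateParser(HTMLParser):
    """Collect the content of <meta name="OriginalPublicationDate">."""

    def __init__(self):
        super().__init__()
        self.raw_date = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if attributes.get("name") == "OriginalPublicationDate":
            self.raw_date = attributes.get("content")


def extract_publication_date(html: str):
    """Return the publication datetime, or None if the tag is absent."""
    parser = PublicationDateParser()
    parser.feed(html)
    if parser.raw_date is None:
        return None
    # Layout assumed from the single example in the article.
    return datetime.strptime(parser.raw_date, "%Y/%m/%d %H:%M:%S")


SAMPLE = (
    '<head><meta name="OriginalPublicationDate" '
    'content="2008/12/18 18:52:05" /></head>'
)
published = extract_publication_date(SAMPLE)
print(published.isoformat())
```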

The Guardian use nice clear URLs http://www.guardian.co.uk/business/2008/dec/18/car-industry-recession but have no marked up date on the page.

The New York Times are similar to the Guardian with nice URLs, http://www.nytimes.com/2008/12/19/business/19markets.html, but again no timestamps. All of these papers have all the data available, but it is not marked up in a useful manner.
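
When the date lives only in the URL, a consumer has to scrape it back out. The following Python sketch handles the two URL shapes quoted above; the regular expression and names are my own assumptions, not anything these sites publish. It returns nothing for an opaque URL like the BBC’s.

```python
import re
from datetime import date

MONTH_NAMES = [
    "jan", "feb", "mar", "apr", "may", "jun",
    "jul", "aug", "sep", "oct", "nov", "dec",
]

# Matches /2008/dec/18/ as well as /2008/12/19/ inside a URL path.
DATE_IN_PATH = re.compile(r"/(\d{4})/([a-z]{3}|\d{2})/(\d{2})/")


def date_from_url(url: str):
    """Return the date embedded in the URL path, or None if there is none."""
    match = DATE_IN_PATH.search(url)
    if match is None:
        return None
    year, month, day = match.groups()
    if month.isdigit():
        month_number = int(month)
    elif month in MONTH_NAMES:
        month_number = MONTH_NAMES.index(month) + 1
    else:
        return None
    try:
        return date(int(year), month_number, int(day))
    except ValueError:
        return None  # e.g. month 13 or day 45


guardian = date_from_url(
    "http://www.guardian.co.uk/business/2008/dec/18/car-industry-recession"
)
nytimes = date_from_url(
    "http://www.nytimes.com/2008/12/19/business/19markets.html"
)
bbc = date_from_url("http://news.bbc.co.uk/1/hi/technology/7787335.stm")
print(guardian, nytimes, bbc)
```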

Syndication formats

Syndication formats are better at supporting dates. RSS uses RFC 822 for dates, just like email, so dates such as Wed, 17 Dec 2008 12:52:40 GMT are valid, with all the white space issues that entails.

The Atom syndication format uses the much clearer http://tools.ietf.org/html/rfc3339 with timestamps of the form 1996-12-19T16:39:57-08:00. Both syndication formats encourage the use of last modified. This is understandable, but a pity as published date is a very useful value. The Atom syndication format supports “published” and mandates “updated” as timestamps, see the Atom RFC 4287 for more detail.
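
The two date styles convert into each other without much effort. This is a small Python sketch using only the standard library, with the two example timestamps from the paragraphs above; the variable names are mine.

```python
from datetime import datetime
from email.utils import format_datetime, parsedate_to_datetime

# RSS style (RFC 822) to Atom style (RFC 3339).
rss_date = "Wed, 17 Dec 2008 12:52:40 GMT"
parsed_rss = parsedate_to_datetime(rss_date)
atom_style = parsed_rss.isoformat()
print(atom_style)

# Atom style (RFC 3339) back to RSS style (RFC 822).
atom_date = "1996-12-19T16:39:57-08:00"
parsed_atom = datetime.fromisoformat(atom_date)
rss_style = format_datetime(parsed_atom)
print(rss_style)
```

Both parsed values keep their UTC offset, so nothing is lost in either direction.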

Marking up dates

However, the aim of this short article is to encourage you to use microformats or RDF to encode dates. A good example of this is Twitter, which uses hAtom for each individual entry. http://twitter.com/zzgavin/status/1065835819 contains the following markup, which represents a human and a machine readable version of the time of that tweet.

<span class="published" title="2008-12-18T22:01:27+00:00">about 3 hours ago</span>
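
Reading that value back is simple for any consumer. Here is a hedged Python sketch that collects every element carrying the published class and parses its title attribute; the class name PublishedParser is hypothetical, and the sketch ignores the rest of hAtom entirely.

```python
from datetime import datetime
from html.parser import HTMLParser


class PublishedParser(HTMLParser):
    """Collect timestamps from elements with class "published"."""

    def __init__(self):
        super().__init__()
        self.timestamps = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        classes = (attributes.get("class") or "").split()
        title = attributes.get("title")
        if "published" not in classes or not title:
            return
        try:
            self.timestamps.append(datetime.fromisoformat(title))
        except ValueError:
            pass  # skip titles that are not machine readable


MARKUP = (
    '<span class="published" title="2008-12-18T22:01:27+00:00">'
    "about 3 hours ago</span>"
)
parser = PublishedParser()
parser.feed(MARKUP)
print(parser.timestamps[0].isoformat())
```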

The spec for datetime is still a draft at the minute and there is ongoing conversation around the right format and semantics for representing date and time in microformats; see the datetime design pattern for details.

The hAtom example page shows the minimal changes required to implement hAtom on well formed blog post content and for other less well behaved content. You have the information already in your content publication systems. This is not some additional onerous content entry task, simply some template formatting.
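
That template formatting can be sketched in a few lines of Python. The function name, the default abbr element and the insistence on a time zone are my own assumptions; a real publishing system would do the equivalent in its own template language.

```python
from datetime import datetime, timezone
from html import escape


def published_markup(moment: datetime, label: str, tag: str = "abbr") -> str:
    """Wrap a human readable label in markup carrying the exact timestamp."""
    if moment.tzinfo is None:
        # Assumption: refuse naive times rather than guess a time zone.
        raise ValueError("publication time needs a time zone")
    machine = moment.isoformat(timespec="seconds")
    return (
        f'<{tag} class="published" title="{escape(machine, quote=True)}">'
        f"{escape(label)}</{tag}>"
    )


stored = datetime(2008, 12, 18, 22, 1, 27, tzinfo=timezone.utc)
markup = published_markup(stored, "18 December 2008")
print(markup)
```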

I started to see this as a serious issue after reading Stewart Brand’s Clock of the Long Now about five years ago. Brand’s book explores the issues of short term thinking that permeate our society; thinking beyond the end of the financial year is a stretch for many people. The Long Now has a world view of a 10,000 year timeframe; see http://longnow.org/ for much more information. Freebase, from Long Now Board member Danny Hillis, supports dates quite well; see the entry for A Christmas Carol.

In conclusion

I feel we should be making it easier for people searching for our content in the future. We’ve moved through tagging content and on to geo-tagging content. Now it is time to get the timestamps right on our content. “How do I know when something happened, and how can I find other things that happened at the same time?” is a fair question. This should be something I can satisfy simply and easily. There is a range of tools available to us in either hAtom or RDF to specify time accurately alongside the content, so what is stopping you?

Thinking of the long term, it is hard for us to know now what will be of relevance for future generations, so we should aim to raise the floor for publishing tools so that all content has the right timeframe associated with it. We are moving from publishing words and pictures on the internet to being able to associate publication with an individual via XFN and OpenID. We can associate place quite well too; the last piece of useful metadata is timeframe.


Comments


  • AP http://www.moebiuscreative.com

    Great post. I often have a hard time finding a date stamp on a lot of blog posts, which is a really strange phenomenon to me. They should be a) present, and b) obvious.

    Thanks!


  • John Faulds http://www.tyssendesign.com.au

    Interesting article. I’ve just updated my WP site based on the hAtom example. For anyone else wanting to do the same, and using the Microformats recommended date/time format, you’ll want to do this:

    <abbr class="published" title="<?php the_time('Y-m-d\TH:i:sO') ?>">…


  • Douglas Greenshields http://bedroomation.com/

    An important subject, and one the use of microformats like hAtom will only barely scratch the surface of (though scratch we must!). It’s sobering to realise that next to everything we emanate to the web will be available, before very long, to everyone forever. If you’re really considering the future, and your extremely privileged position being alive at the time of the first few microseconds of the web, every form of “content” (I’m trying to use that word in the best sense possible) should really have some kind of timestamp against it, even if it’s of a more appropriate level of specificity (for example, when’s the last time you read a book that gave its publication date to the nearest second?).

    Search engines need to pull their weight too – it must be possible to search the web along the time axis. I regularly search technical issues and find I turn up mostly pages prior to 2004, which are mostly useless – true, I can often tell the age of the page by its lack of adherence to any kind of less-is-more principle – and it’s interesting to note that we will tend to mark the coming decades using web design idioms – but there’s a difference between the sum of human knowledge now and the sum of human knowledge forever to infinity and beyond, and search on the web needs to start explicitly recognising that!


  • James Aylett http://tartarus.org/james/

    There’s a good reason for Atom’s having atom:published as optional but atom:updated as required (which is what I assume Gavin means by the somewhat opaque phrase “encourage[s] the use of last modified”): the date an entry was updated is more useful to sort on than the date it was published.

    (From what I can remember, and a quick look through the archive and wiki, dates in Atom were a complex debate that went on for months. At one point there were proposals on the table for up to five different dates associated with an entry; in the end it was decided to keep Atom slim, and allow extensions to carry the weight of further requirements.)

    While we’re here, Dublin Core has exactly the right term to cover the date of the subject of an article (eg: ‘some time in 1891’ according to the Flickr page for the photo above). Coverage is “the spatial or temporal topic of the resource”, and can be expressed as a named period, date or date range. For machine readability, ISO 8601:1988(E) is a good choice (you could use one of its profiles, such as RFC 3339, or W3CDTF as mentioned at the start of the article; 8601:1988(E) has the advantage of supporting start-end date pairs and durations as well as single instants in time).


  • Ben http://www.idhsolutions.com.au

    Thank you for the wake-up call. This is something I have often neglected in my themes (and URLs for that matter), mainly for cosmetic reasons and laziness in my theme building.

    However, this has changed as of today. University emphasised how important RDF is becoming more and more each day, and since graduating I have neglected it. Turns out I should’ve paid more attention.

    Thanks again for reminding us that the content on the Internet lasts forever. And it appears that no one is archiving it correctly.

    What a crying shame.


  • Andi Farr http://www.semibad.com

    Wow, great article. Like several other commenters, I have been extremely slack in properly integrating date and time into my web content. I’ve also spent frustrating hours trying to ascertain the age of web documents on more than one occasion, so I should know better. Occasionally you do run into documents which are either extremely important or completely irrelevant, depending on their age. With no way of telling how old they are, it’s sometimes impossible to tell which it is!

    Anyway, this has been added as item 0 on my site revamp to-do list. Much obliged for the extremely insightful look at different ways of handling this crucial information, and for all the further reading that you’ve linked to.

    Take care, Andi


  • Roman Bercot http://romanbercot.wordpress.com

    I’m a bit disappointed to learn that there’s not a standalone microformat for date, but that the date pattern has to be part of a larger microformat. I think it would be useful to be able to arbitrarily mark things up with a machine-readable date.

    For instance, if I were writing a story that mentioned September 11th, it would seem handy to be able to put a span around it with a machine-readable date of 20010911. This is neither the publication nor modified date, but it is relevant to the content of the article.

    If anybody has ideas on this, I’d love to hear them.


  • Per Wiklander

    As some of the previous commenters have said, it is frustrating to look for information, mostly technical documentation in my case, that might be relevant or not at all, depending on the date it was published. A clear time stamp would help here (preferably at the top of the page).

    What would be even better is if the content’s creator would take the time to actually mark the content as outdated when new information has become available. Not seldom I read a long article on one technical subject or another that actually has a recent date, only to see in comment #312 that it is not correct or no longer relevant.

    I’m almost thinking of creating something like oldpages.com (I made that URL up now) which would let people submit pages with a short comment describing why the content at that URL is outdated or actually dangerous. I guess a Firefox plugin could then use this information to display a warning message on the outdated page when visited. If the service actually tried to contact the content author it would be even better.


  • Chris http://installcms.com/

    Great article, and indeed a microformat should exist for it, but I doubt it will be used much.

    In my opinion, site owners intentionally omit the date, as to make the obsolete content more palatable to visitors referred by Google. Let’s face it, the web has a lot more old content than new.


  • Shaun http://www.douglascountyhistory.org

    Thanks for using our photo as an example! I just wanted to clarify (and this just adds to your argument, I think.) The photo was collected by the Douglas County History Research Center in 1992. It was scanned 08/30/2002. I’m not sure when it was first put online. Sometime in early 2003 I suspect. The Flickr project is a fairly new one for us.


  • Kris http://naturaldiabetescure.info

    I like the idea of geo-tagging. It’s another great addition, making the internet of more value to the user.


  • blackdog

    Great article. Luckily enough (looking on the bright side), my production is so small up to now that it will be very easy to be more compliant.
    I would offer one correction: this is not the last piece to add, except chronologically speaking; it should have been the first one. But it’s probably implied in the sentence :)


  • Kris http://spacemakercoffeemaker.com

    Very good point about dates. But this goes for everything now. There is lots of good information on the net but the 5% of junk can be dangerous if it gets passed along as fact!


  • Douglas http://dogtagsfordogs.org

    Hmm. I think if you could update content in a way as to make it expire at a given date, that could prevent the spreading of certain information that may become inaccurate over time. Not sure how to go about creating such code though, and changing it manually sounds like quite a job.



About the author

Gavin Bell

Gavin Bell designs web applications and social software for the Nature Publishing Group. Large scale web applications covering identity, on-demand media and social software have been the main focus of his work. Since the early 90s he has worked in academia, advertising, publishing and developed multimedia software.

He is the author of a forthcoming book entitled Building Social Web Applications for O’Reilly Media Inc. He lives in London with his wife and two sons. He keeps track of the world on take one onion; you can keep track of him on Twitter and gavinbell.com, where he generally avoids the third person.

Photo: James Duncan Davidson
