24 ways to impress your friends

With the hype around HTML5 and CSS3 exceeding levels not seen since 2005’s Ajax era, it’s worth noting that the excitement comes with good reason: the two specifications render many years of feature hacks redundant by replacing them with native features. For fun, consider how many CSS2-based rounded corners hacks you’ve probably glossed over, looking for a magic solution. These days, with CSS3, the magic is border-radius (and perhaps some vendor prefixes) followed by a coffee break.

CSS3’s border-radius, box-shadow, text-shadow and gradients, and HTML5’s <canvas>, <audio> and <video> are some of the most anticipated features we’ll see put to creative (ab)use as adoption of the ‘new shiny’ grows. Developers jumping on the cutting edge are using subsets of these features to little detriment, in most cases. The more popular CSS features are design flourishes that can degrade nicely, but the current audio and video implementations in particular suffer from a number of annoyances.

The new shiny: how we got here

Sound involves one of the five senses, a key part of daily life for most – and yet it has been strangely absent from HTML and much of the web by default. From a simplistic perspective, it seems odd that HTML did not include support for the full multimedia experience earlier, despite the CD-ROM-based craze of the early 1990s. In truth, standards like HTML can take much longer to bake, but eventually deliver the promise of a lowered barrier to entry, consistent implementations and shiny new features now possible ‘for free’ just about everywhere.

<img> was introduced early and naturally to HTML, despite having some opponents at the time. Perhaps <audio> and <video> were avoided, given the added technical complexity of decoding various multi-frame formats, plus the hardware and bandwidth limitations of the era. Perhaps there were quarrels about choosing a standard format or – more simply – maybe these elements just weren’t considered to be applicable to the HTML-based web at the time. In any event, browser plugins from programs like RealPlayer and QuickTime eventually helped to fill the in-page audio/video gap, handling <object> and <embed> markup which pointed to .wav, .avi, .rm or .mov files. Suffice it to say, the experience was inconsistent at best and, on the standards side of the fence right now, so is HTML5 in terms of audio and video.

<audio>: the theory

As far as HTML goes, the code for <audio> is simple and logical. Just as with <img>, a src attribute specifies the file to load. Pretty straightforward – sounds easy, right?

<audio src="mysong.ogg" controls>
	<!-- alternate content for unsupported case -->
	Download <a href="mysong.ogg">mysong.ogg</a>
</audio>

Ah, if only it were that simple. The first problem is that the OGG audio format, while ‘free’, is not supported by some browsers. MP3 fares no better: despite being a de facto standard used in all kinds of desktop software (and hardware), it is not supported by every browser either. In fact, as of November 2010, no single audio format is commonly supported across all major HTML5-enabled browsers.

What you end up writing, then, is something like this:

<audio controls>
	<source src="mysong.mp3" />
	<source src="mysong.ogg" />
	<!-- alternate content for unsupported case, maybe Flash, etc. -->
	Download <a href="mysong.ogg">mysong.ogg</a> or <a href="mysong.mp3">mysong.mp3</a>
</audio>

Keep in mind, this provides a ‘first class’ experience only for browsers that support HTML5 audio. For browsers that do not, you may want to fall back to another inline player (object/embed, or a JavaScript plus Flash API) rather than a plain download link. You can imagine the added code complexity involved in giving older browsers a ‘first class’ experience, too.
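When the markup is generated rather than hand-written, each <source> can also carry a type attribute, which should let a browser skip a format it cannot play without fetching the file first. Here is a minimal sketch of that idea; buildAudioMarkup and escapeHtml are hypothetical helper names, and the MIME strings in the usage comment are common choices rather than anything this article prescribes.

```javascript
// Sketch only: buildAudioMarkup and escapeHtml are hypothetical helpers.
// Escape text so it is safe inside HTML attribute values and element content.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// sources: array of { src: "file URL", type: "MIME type" } in order of preference.
function buildAudioMarkup(sources) {
  var sourceTags = sources.map(function (s) {
    return '\t<source src="' + escapeHtml(s.src) +
      '" type="' + escapeHtml(s.type) + '" />';
  });
  // Plain download links remain as the content for unsupported browsers.
  var links = sources.map(function (s) {
    return '<a href="' + escapeHtml(s.src) + '">' + escapeHtml(s.src) + "</a>";
  });
  return ["<audio controls>"]
    .concat(sourceTags)
    .concat(["\tDownload " + links.join(" or "), "</audio>"])
    .join("\n");
}

// Example call (MIME strings are assumptions):
// buildAudioMarkup([
//   { src: "mysong.mp3", type: "audio/mpeg" },
//   { src: "mysong.ogg", type: 'audio/ogg; codecs="vorbis"' }
// ]);
```

The fallback links are kept inside the element, so browsers without <audio> support still get something useful.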

<audio>: the caveats

With <img>, you typically don’t have to worry about format support – it just works – and that’s part of what makes a standard wonderful. JPEG, PNG, BMP, GIF, even TIFF images all render just fine if for no better reason, perhaps, than being implemented during the ‘wild west’ days of the web. The situation with <audio> today reflects a very different – read: business-aware – environment in 2010. (Further subtext: There’s a lot of [potential] money involved.) Regrettably, this is a collision of free and commercial interests, where the casualty is ultimately the user. Second up in the casualty list is you, the developer, who has to write additional code around this fragmented support.

The HTML5 audio API as implemented in JavaScript has one of the most un-computer-like responses I’ve ever seen, and inspired the title of this post. Calling new Audio().canPlayType('audio/mp3'), which queries the system for format support according to a MIME type, is supposed to return one of “probably”, “maybe”, or “no”. Sometimes, you’ll just get a null or empty string, which is also fun. A “maybe” response does not guarantee that a format will be supported; sometimes audio/mp3 gives “maybe,” but then audio/mpeg; codecs="mp3" will give a more-solid “probably” response. This can vary by browser or platform, too, depending on native support – and finally, the user may also be able to install codecs, extending support to include other formats. (Are you excited yet?)
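To make those fuzzy answers easier to work with, one option is to rank them and ask about several MIME strings for the same format, keeping the strongest reply. The following is a minimal sketch, not how any particular library does it; rankAnswer and bestSupport are hypothetical names, and anything other than “probably” or “maybe” (including “no”, null and the empty string) is treated as unsupported.

```javascript
// Sketch only: rankAnswer and bestSupport are hypothetical helper names.
// `audio` is anything with a canPlayType(type) method, e.g. new Audio() in a browser.
var SUPPORT_RANK = { probably: 2, maybe: 1 };

// Map a raw canPlayType() answer to a number; "", "no", null and anything
// unexpected all count as "not supported".
function rankAnswer(answer) {
  return Object.prototype.hasOwnProperty.call(SUPPORT_RANK, answer)
    ? SUPPORT_RANK[answer]
    : 0;
}

// Try several MIME strings for the same format and keep the strongest answer.
function bestSupport(audio, mimeVariants) {
  var best = { rank: 0, answer: "", type: null };
  mimeVariants.forEach(function (type) {
    var answer = audio.canPlayType(type);
    var rank = rankAnswer(answer);
    if (rank > best.rank) {
      best = { rank: rank, answer: answer, type: type };
    }
  });
  return best;
}

// Browser usage (not run here):
// bestSupport(new Audio(), ['audio/mp3', 'audio/mpeg; codecs="mp3"']);
```

A rank of 0 means none of the variants got a positive answer; whether a “maybe” is good enough to attempt playback is a judgment call left to the caller.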

Damn you, warring formats!

New market and business opportunities go hand-in-hand with technology developments. What we have here is certainly not failure to communicate; rather, we have competing parties shouting loudly in public in attempts to influence mindshare towards a de facto standard for audio and video. Unfortunately, the current situation means that at least two formats are effectively required to serve the majority of users correctly.

As it currently stands, we have the free and open source software camp of OGG Vorbis/WebM and its proponents (notably, Mozilla, Google and Opera in terms of browser makers), up against the non-free, proprietary and ‘closed’ camp of MP3 and MPEG4/HE-AAC/H.264 – which is where you’ll find commitments from Apple and Microsoft, among others. Apple is likely in with H.264 for the long haul, given its use of the format for its iTunes music store and video offerings.

It is generally held that H.264 is a technically superior format in terms of file size versus quality, but it involves intellectual property and, in many use cases, requires licensing fees. To be fair, there is a business model with H.264 and much has been invested in its development, but this approach is not often the kind that wins over the web. On that front, OGG/WebM may eventually win for being a ‘free’ format that does not involve a licensing scheme.

Closed software and tools ideologically clash with the open nature of the web, which exists largely thanks to free and open technology. For philosophical and business reasons, support for audio and video is fragmented across browsers adopting HTML5 features. It does not help that a large amount of audio and video currently exists in non-free MP3 and MPEG-4 formats. Adoption of <audio> and <video> may be slowed, since they are more complex than <img> and may feel ‘broken’ to developers when edge cases are encountered. Furthermore, the HTML5 spec does not mandate a single required format. The end result is that, as a developer, you must currently provide at least two formats (MP3 and OGG, for example) to serve most existing HTML5-based user agents.

Transitioning to <audio>

A small circular "360-degree" player UI for audio with a play/pause button showing a progress ring, and a label

There will be some growing pains as developers start to pick up the new HTML5 shiny, while balancing the needs of current and older agents that don’t support either <audio> or the preferred format you may choose (for example, MP3). In either case, Flash or other plugins can be used, as has traditionally been done within HTML4 documents, to embed and play the relevant audio.

A screenshot of a Muxtape.com-styled UI with title, time, progress bars and a VU meter, part of SoundManager 2. The SoundManager 2 page player demo in action.

Ideally, HTML5 audio should be used whenever possible with Flash as the backup option. A few JavaScript/Flash-based audio player projects exist which balance the two; in attempting to tackle this problem, I develop and maintain SoundManager 2, a JavaScript sound API which transparently uses HTML5 Audio() and, if needed, Flash for playing audio files. The internals can get somewhat ugly, but the transition between HTML4 and HTML5 is going to be just that – and even with HTML5, you will need some form of format fall-back in addition to graceful degradation.

It may be safest to fall back to MP3/MP4 formats for inline playback at this time, given wide support via Flash, some HTML5-based browsers and mobile devices. Considering the amount of MP3/MP4 media currently available, it is wiser to try these before falling through to a traditional file download process.
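The fall-through order described above (HTML5 first, then Flash, then a plain download) can be sketched as a small decision function. This is an illustration under stated assumptions, not SoundManager 2’s actual logic: choosePlayback and fileExtension are hypothetical names, the MIME table and the list of Flash-playable formats are assumptions, and Flash detection is left to the caller.

```javascript
// Sketch only: all names are hypothetical, and the MIME table and Flash
// format list below are assumptions to adjust for a real project.
var MIME_BY_EXTENSION = {
  mp3: "audio/mpeg",
  ogg: "audio/ogg",
  mp4: "audio/mp4"
};
var FLASH_FORMATS = ["mp3", "mp4"];

// Return the lower-cased file extension of a URL, ignoring query and hash.
function fileExtension(url) {
  var clean = String(url).split(/[?#]/)[0];
  var dot = clean.lastIndexOf(".");
  return dot === -1 ? "" : clean.slice(dot + 1).toLowerCase();
}

// env.audio: object with canPlayType(), or null when HTML5 audio is missing.
// env.hasFlash: boolean from whatever plugin detection the page already uses.
// Returns "html5", "flash" or "download".
function choosePlayback(url, env) {
  var ext = fileExtension(url);
  var known = Object.prototype.hasOwnProperty.call(MIME_BY_EXTENSION, ext);
  if (env.audio && known) {
    var answer = env.audio.canPlayType(MIME_BY_EXTENSION[ext]);
    if (answer === "probably" || answer === "maybe") {
      return "html5";
    }
  }
  if (env.hasFlash && FLASH_FORMATS.indexOf(ext) !== -1) {
    return "flash";
  }
  return "download";
}
```

With this ordering, an OGG-only browser with Flash installed would play an MP3 through Flash, while the same browser without Flash would be offered the file as a download.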

Early findings

Here is a brief list of behavioural notes, annoyances, bugs, quirks and general weirdness I have found while playing with HTML5-based audio at time of writing (November 2010):

Apple iPad/iPhone (iOS 4, iPad 3.2+)

  • Only one sound can be played at a time. If a second sound starts, the first is stopped.
  • No auto-play allowed. Sounds follow the pop-up window security model and can only be started from within a user event handler such as onclick/touch, and so on. Otherwise, playback attempts silently fail.
  • Once started, a sequence of sounds can be created or played via the ‘finish’ event of the previous sound (the native HTML5 event is named ended); for example, advancing through a playlist without interaction after the first track starts.
  • iPad, iOS 3.2: Occasional ‘infinite loop’ bug seen where audio does not complete and stop at a sound’s logical end – instead, it plays again from the beginning. Might be specific to example file format (HE-AAC) encoded from iTunes.
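The playlist behaviour noted above can be sketched as follows: the first sound is started from a user event handler, and each later sound is started from the previous sound’s ended event. This is a minimal sketch under those assumptions; createPlaylist is a hypothetical name, and the audio objects come from a caller-supplied factory so that nothing here depends on a specific browser.

```javascript
// Sketch only: createPlaylist is a hypothetical helper.
// createAudio(url) must return an object with addEventListener() and play(),
// e.g. function (url) { return new Audio(url); } in a browser.
function createPlaylist(urls, createAudio) {
  var index = -1;     // -1 means playback has not been started yet
  var current = null; // the sound currently playing, if any

  function playNext() {
    index += 1;
    if (index >= urls.length) {
      current = null; // reached the end of the list
      return;
    }
    current = createAudio(urls[index]);
    // Chain the next track from the native "ended" event of this one.
    current.addEventListener("ended", playNext, false);
    current.play();
  }

  return {
    // Call this from within a user event handler (click, touch and so on).
    start: function () {
      if (index === -1) {
        playNext();
      }
    },
    currentIndex: function () {
      return index;
    }
  };
}

// Browser usage (not run here; "playButton" is a hypothetical element):
// var list = createPlaylist(["one.mp3", "two.mp3"], function (url) {
//   return new Audio(url);
// });
// playButton.onclick = list.start;
```

Only one sound is alive at a time in this sketch, which also fits the one-sound-at-a-time limit listed above.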

Apple Safari, OS X Snow Leopard 10.6.5

  • Critical bug: Safari 4 and 5 intermittently fail to load or play HTML5 audio on Snow Leopard due to bug(s) in QuickTime X and/or other underlying frameworks. Known Apple ‘radar’ bug: bugs.webkit.org #32159 (see also: test case). Amusing side note: Safari on Windows is fine.

Apple Safari, Windows

  • Food for thought: if you download “Safari” alone on Windows, you will not get HTML5 audio/video support (tested in WinXP). You need to download “Safari + QuickTime” to get HTML5 audio/video support within Safari. (As far as I’m aware, Chrome, Firefox and Opera either include decoders or use system libraries accordingly. Presumably IE 9 will use OS-level APIs.)

General Quirks

  • Seeking and loading, ‘progress’ events, and calculating bytes loaded versus bytes total should not be expected to be linear, as users can arbitrarily seek within a sound. It appears that some support for HTTP ranges exists, which adds a bit of logic to UI code. Browsers seem to vary slightly in their current implementations of these features.
  • The onload event of a sound may be of little relevance, if non-linear loading is involved (see above note re: seeking).
  • Interestingly (perhaps I missed it), the current spec does not seem to specify a panning or left/right channel mix option.
  • The preload attribute values may vary slightly between browsers at this time.
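Because loading can be non-linear, a progress UI is arguably better driven by the set of buffered time ranges than by a single ‘bytes loaded’ figure. Here is a minimal sketch, assuming an object shaped like the spec’s TimeRanges interface (a length property plus start(i) and end(i) methods, as exposed by a media element’s buffered attribute); bufferedSeconds and bufferedFraction are hypothetical names, and browser support for buffered may vary as noted above.

```javascript
// Sketch only: bufferedSeconds and bufferedFraction are hypothetical helpers.
// `ranges` is a TimeRanges-like object: { length, start(i), end(i) }.

// Total number of seconds covered by all buffered ranges.
function bufferedSeconds(ranges) {
  var total = 0;
  for (var i = 0; i < ranges.length; i++) {
    total += ranges.end(i) - ranges.start(i);
  }
  return total;
}

// Fraction (0 to 1) of the sound that is buffered; 0 when the duration is
// unknown (NaN), unbounded (Infinity) or not positive.
function bufferedFraction(ranges, duration) {
  if (!isFinite(duration) || !(duration > 0)) {
    return 0;
  }
  return Math.min(1, bufferedSeconds(ranges) / duration);
}

// Browser usage (not run here):
// bufferedFraction(audio.buffered, audio.duration);
```

After a seek, the ranges may contain gaps, so a UI could also draw each range separately instead of a single bar.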

Upcoming shiny: HTML5 Audio Data API

A screenshot of a circular audio UI design showing waveforms being drawn from audio data. With access to audio data, you can incorporate waveform and spectrum elements that make your designs react to music.

The HTML5 audio spec does a good job covering the basics of playback, but did not initially get into manipulation or generation of audio on-the-fly, something Flash has had for a number of years now. What if JavaScript could create, monitor and change audio dynamically, like a sort of audio <canvas> element? With that kind of capability, many dynamic audio processing features become feasible and, when combined with other media, can make for some impressive demos.
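To give a feel for what ‘creating audio’ in JavaScript means at the data level, here is a minimal sketch that builds a buffer of raw samples for a sine tone and measures its peak level, the sort of number a VU meter might display. It deliberately calls no browser audio API: how such a buffer is handed to, or read from, the browser is specific to each audio data API and is not shown. generateSine and peakLevel are hypothetical names.

```javascript
// Sketch only: generateSine and peakLevel are hypothetical helpers that work
// on plain sample data and do not touch any browser audio API.

// Build `seconds` worth of a sine tone as samples in the range -1 to 1.
function generateSine(frequency, sampleRate, seconds) {
  var length = Math.round(sampleRate * seconds);
  var samples = new Float32Array(length);
  var step = (2 * Math.PI * frequency) / sampleRate; // phase advance per sample
  for (var i = 0; i < length; i++) {
    samples[i] = Math.sin(step * i);
  }
  return samples;
}

// Largest absolute sample value in the buffer (0 for an empty buffer).
function peakLevel(samples) {
  var peak = 0;
  for (var i = 0; i < samples.length; i++) {
    var magnitude = Math.abs(samples[i]);
    if (magnitude > peak) {
      peak = magnitude;
    }
  }
  return peak;
}

// Example: one second of a 440 Hz tone at 44.1 kHz.
// var tone = generateSine(440, 44100, 1);
// var level = peakLevel(tone);
```

The same kind of loop, run over samples read from a playing sound rather than generated ones, is the basis for waveform and level displays.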

What started as a small idea among a small group of audio and programming enthusiasts grew to inspire a W3C audio incubator group, and went on to establish the Mozilla Audio Data API. Contributors wrote a patch for Firefox which was reviewed and revised, and is now slated to be in the public release of Firefox 4. Some background and demos are also detailed in an article from the BBC R&D blog.

There are plenty of live demos to see, which give an impression of the new creative ideas this API enables. Many concepts are not new in themselves, but it is exciting to see this sort of thing happening within the native browser context.

Mozilla is not alone in this effort; the WebKit folks are also working on a JavaScriptAudioNode interface, which implements similar audio buffering and sample elements.

The future?

It is my hope that we’ll see a common format emerge in terms of support across the major browsers for both audio and video; otherwise, support will continue to be fragmented and mildly frustrating to develop for, and that can impede growth of the feature. It’s a big call, but if <img> had lacked a common format back in the wild west era, I doubt the web would have grown to where it is today.

Complaints and nitpicks aside, HTML5 brings excellent progress on the browser multimedia front, and the first signs of native support are a welcome improvement given all audio and video previously relied on plugins. There is good reason to be excited. While there is room for more, support could certainly be much worse – and as tends to happen with specifications, the implementations targeting them should improve over time.

Note: Thanks to Nate Koechley, who suggested the Audio().canPlayType() response be part of the article title.

Comments

  • David Calhoun http://davidbcalhoun.com

    Great article! Nice to see some attention being paid to the Audio tag (Video seems to have gotten all the attention these days).

    From some feature tests I’ve run, it looks like Audio is supported by Android – I wonder what sort of quirks await us there?

    Do you have any test pages up for folks who would be interested? You can be the next PPK.. for audio ;)

  • barryvan http://www.barryvan.com.au

    It’s interesting to note that reading through the entries on 24ways so far, Firefox seems to really be leading innovation. Yes, Jaegermonkey might not be as fast as v8, but things like the audio data API, -moz-calc(), -moz-any(), support for ES5 Harmony, etc. are, in my opinion at least, more important. :)

  • Tab Atkins http://www.xanthir.com

    Note: if a browser returns “no” for .canPlayType(), it’s an old implementation. The spec currently requires the empty string for “I know I can’t play this” (because the empty string is == false, so testing for the value is easy).

    The other two values, “maybe” and “probably”, are indeed pretty weird, but they’re a direct result of the weirdness surrounding audio formats and containers. To put it simply, it’s nearly impossible to tell if you can actually play a given file without just passing it to your decoder and seeing if it throws an error.

    So, at best browsers can offer a “probably” for when they’re pretty sure a particular file is playable. “maybe” is for the even more uncertain cases, like when you specify the type as a container format without giving the codec. The browser may support decoding the container, but without more knowledge it can’t give you an answer as to whether the file itself is playable.

    So, yeah, it’s kinda weird, but it’s a good compromise with reality. Just remember that the “I know I can’t play it” case can just be treated like it returned false, as long as you’re not using ===. Something easy like “if( (new Audio()).canPlayType(type) ) {…}” will work fine.

  • TheFella http://thefellagoesarctic.com

    Thanks for this article! I might redesign my music site over the holidays and this will be invaluable info!

  • Patrick H. Lauke http://www.splintered.co.uk

    “With img, you typically don’t have to worry about format support – it just works – and that’s part of what makes a standard wonderful. JPEG, PNG, BMP, GIF, even TIFF images all render just fine”

    note though that HTML never actually defined any formats that browsers should support. which image formats browsers actually support can also still vary (thinking PNG24 and IE for instance). and who can forget the patent war around GIF?

    yes, NOW the situation is fairly stable, but it’s not always been this way. same thing with audio/video formats/codecs

  • Nicolas Chevallier http://www.nicolas-chevallier.fr/

    Thanks for sharing your experience with the audio tag. While the video tag has been widely tested, this is the first I’ve read about audio… I’ll keep it in mind for future development.

  • Walt Ribeiro http://ForOrchestra.com

    As a musician, using the audio tag and the HTML5 vs. Flash setup was a drag.

    I agree, native audio support is the future – but I’ve decided that it’s too soon to worry about right now until everyone and every browser ‘catches up’.

  • Scott Schiller http://www.schillmania.com/

    Thanks, all, for the comments.

    David: I believe HTML5 audio is in the “Gingerbread” (Android 2.3) release. Re: Test pages, it might not be a bad idea to put together a table of common canPlayType() sort of calls.

    Tab: Good point re: canPlayType() and “no” responses; in testing I saw a mix of “no” and “”. The latter is smart, given it makes for a nice truthy/falsy test – even if the API is still goofy, it reflects the complicated situation of mixing containers, support and formats. In SoundManager 2, I look for a “probably” response as an indicator of support for a given format; I also specify codecs in the test and try a few variants to coax a best-case response out of the given browser. Despite best efforts, some browsers won’t say “probably” at this time.

    Of course, if developers simply drop <audio> elements and use multiple <source> tags, most of this becomes moot as the browser handles it all – but the dynamic, JavaScript application land is what I’m most interested in, and is where things get pretty funky. It’ll be interesting to see how things evolve.

    Patrick: You are correct that HTML does not mandate a standard image format, but it was a different (i.e., research-oriented) environment back when browsers were implementing image support vs. today’s business/commercial web world, of course.

    GIFs still work in browsers despite some copyright assertions, and we still have the IE 6 PNG problem, but that browser is finally – wait for it – fading, itself. ;) (zing! / groan / boo, hiss, etc.)

    I forgot to mention, I’m on the twitters over at @schill and try to share interesting and nifty things I find about JS, audio, CSS and so on.

  • Bart Lewis http://www.bartlewis.me

    Love the title of this post! While I understand we can’t really expect a boolean response (true/false), the return values of “probably”, “maybe”, and “no” are downright laughable. Truly a glass is half empty API. ;)

  • Remy Sharp http://twitter.com/rem

    One fairly significant bug I’ve found that’s apparent in most browsers:

    - audio (and video) HTTP requests aren’t sent with the referrer header.

    So if someone decides to start hotlinking your content – which can be rather large with audio and video – there’s no way to control your bandwidth use.

    I’m not against sharing content, but I’ve been caught out where a massive web site started linking to my videos and took my server down.

  • dxo

    The reason audio and video tags were not implemented in early html/etc (in my guess) is because it wasn’t realistic to even consider under the dial-up bandwidth of the time. Audio and video files take a lot of bandwidth, of course.

    I remember when downloading a 5meg file took anywhere from 15 to 45 minutes, depending on network conditions.

    YouTube would not even function on dial-up. It was the dark ages, for sure. ;)

About the author

Scott Schiller

Scott Schiller has been enjoying building web things since 1995. He also has a fondness for Creedence and the occasional White Russian.

Scott’s personal site is probably best known for its DHTML Arkanoid remake (2002), Snowstorm and holiday Christmas light-smashing distractions, and other random JavaScript + CSS-based experiments.

In his spare time, Scott tinkers with side projects like SoundManager 2, a JavaScript sound API used by some nifty sites to drive audio features. By day, he works for Yahoo! building shiny things at Flickr.
