The Web Is Your CMS

Christian Heilmann

It is amazing what you can do these days with the services offered on the web. Flickr stores terabytes of photos for us and converts them automatically to all kinds of sizes, finds people in them and even allows us to edit them online. YouTube does almost the same complete job with videos, LinkedIn allows us to maintain our CV, Delicious our bookmarks and so on.

We don’t have to do these tasks ourselves any more, as all of these systems also come with ways to use the data in the form of Application Programming Interfaces, or APIs for short. APIs give us raw data when we send requests telling the system what we want to get back.

The problem is that every API has a different idea of what is a simple way of accessing this data and in which format to give it back.

Making it easier to access APIs

What we need is a way to abstract the pains of different data formats and authentication formats away from the developer — and this is the purpose of the Yahoo Query Language, or YQL for short.

Libraries like jQuery and YUI make it easy and reliable to use JavaScript in browsers (yes, even IE6) and YQL allows us to access web services and even the data embedded in web documents in a simple fashion – SQL style.

Select * from the web and filter it the way I want

YQL is a web service that takes a few inputs itself:

A query that tells it what to get, update or access
An output format – XML, JSON, JSON-P or JSON-P-X
A callback function (if you defined JSON-P or JSON-P-X)

You can try it out yourself – check out this link to get back Flickr photos for the search term ‘santa’ in XML format. The YQL query for this is:

select * from flickr.photos.search where text="santa"
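
The three inputs listed above end up as parameters of a request URL. The following is a minimal sketch of that mapping, assuming the public endpoint and the `q`, `format` and `callback` parameters that appear in the other listings of this article. The helper name `buildYqlUrl` is made up for illustration; the code only builds a string and makes no request.

```javascript
// Sketch: build a YQL request URL from a query, an output format and an
// optional callback. buildYqlUrl and YQL_ROOT are hypothetical names.
var YQL_ROOT = 'http://query.yahooapis.com/v1/public/yql';

function buildYqlUrl(query, format, callback) {
  // encodeURIComponent turns spaces, quotes and "=" into %-escapes.
  var url = YQL_ROOT + '?q=' + encodeURIComponent(query);
  if (format) {
    url += '&format=' + encodeURIComponent(format);
  }
  // A callback name is only needed for the JSON-P style outputs.
  if (callback) {
    url += '&callback=' + encodeURIComponent(callback);
  }
  return url;
}

var santaUrl = buildYqlUrl(
  'select * from flickr.photos.search where text="santa"',
  'xml'
);
console.log(santaUrl);
```

Keeping the encoding in one place means the YQL statement itself can stay readable in the source.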

The easiest way to take your first steps with YQL is to look at the console. There you get sample queries and access to all the data sources available to you, and you can easily put together complex queries. In this article, however, let’s use PHP to put together a web page that pulls in Flickr photos, blog posts, videos from YouTube and the latest bookmarks from Delicious.

Check out the demo and get the source code on GitHub.

<?php
  /* YouTube RSS */
  $query = 'select description from rss(5) where url="http://gdata.youtube.com/feeds/base/users/chrisheilmann/uploads?alt=rss&v=2&orderby=published&client=ytapi-youtube-profile";';
  /* Flickr search by user id */
  $query .= 'select farm,id,owner,secret,server,title from flickr.photos.search where user_id="11414938@N00";';
  /* Delicious RSS */
  $query .= 'select title,link from rss where url="http://feeds.delicious.com/v2/rss/codepo8?count=10";';
  /* Blog RSS */
  $query .= 'select title,link from rss where url="http://feeds.feedburner.com/wait-till-i/gwZf"';
  /* The YQL web service root with JSON as the output */
  $root = 'http://query.yahooapis.com/v1/public/yql?format=json&env=store%3A%2F%2Fdatatables.org%2Falltableswithkeys';
  /* Assemble the query */
  $query = "select * from query.multi where queries='".$query."'";
  $url = $root . '&q=' . urlencode($query);
  /* Do the curl call (access the data just like a browser would) */
  $ch = curl_init(); 
  curl_setopt($ch, CURLOPT_URL, $url); 
  curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); 
  curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
  curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, false);
  $output = curl_exec($ch); 
  curl_close($ch);
  $data = json_decode($output);
  $results = $data->query->results->results;
  /* YouTube output */
  $youtube = '<ul id="youtube">';
  foreach($results[0]->item as $r){
	$cleanHTML = undoYouTubeMarkupCrimes($r->description);
	$youtube .= '<li>'.$cleanHTML.'</li>';
  }
  $youtube .= '</ul>';
  /* Flickr output */
  $flickr = '<ul id="flickr">';
  foreach($results[1]->photo as $r){
	$flickr .= '<li>'.
			   '<a href="http://www.flickr.com/photos/codepo8/'.$r->id.'/">'.
			   '<img src="http://farm' .$r->farm . '.static.flickr.com/'.
			   $r->server . '/' . $r->id . '_' . $r->secret . 
			   '_s.jpg" alt="'.htmlspecialchars($r->title).'"></a></li>';
  }
  $flickr .= '</ul>';
  /* Delicious output */
  $delicious = '<ul id="delicious">';
  foreach($results[2]->item as $r){
	$delicious .= '<li><a href="'.htmlspecialchars($r->link).'">'.htmlspecialchars($r->title).'</a></li>';
  }
  $delicious .= '</ul>';
  /* Blog output */
  $blog = '<ul id="blog">';
  foreach($results[3]->item as $r){
	$blog .= '<li><a href="'.htmlspecialchars($r->link).'">'.htmlspecialchars($r->title).'</a></li>';
  }
  $blog .= '</ul>';
  function undoYouTubeMarkupCrimes($str){
	$cleaner = preg_replace('/555px/','100%',$str);
	$cleaner = preg_replace('/width="[^"]+"/','',$cleaner);
	$cleaner = preg_replace('/<tbody>/','<colgroup><col width="20%"><col width="50%"><col width="30%"></colgroup><tbody>',$cleaner);
	return $cleaner;
  }
?>

What we are doing here is creating a few different YQL statements and queuing them together with the query.multi table. Each of these can be run inside YQL itself. Check out the YouTube, Flickr, Delicious and Blog examples in the console if you don’t believe me. The benefit of using this table is that we don’t make an individual request for each query but get all the data in one single request, which should perform much better, as the YQL server farm is better connected to the web than our own servers are.
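
To make the queuing step concrete, here is a small JavaScript sketch of the same string assembly the PHP performs. The helper name `buildMultiQuery` is hypothetical, and rejecting single quotes is a simplification chosen for this sketch, not something YQL prescribes.

```javascript
// Sketch: combine several YQL statements into one query.multi statement,
// mirroring the string concatenation in the PHP listing.
function buildMultiQuery(queries) {
  queries.forEach(function (q) {
    // The combined list is wrapped in single quotes, so a single quote
    // inside a statement would end that string early.
    if (q.indexOf("'") !== -1) {
      throw new Error('Statement must not contain a single quote: ' + q);
    }
  });
  return "select * from query.multi where queries='" +
         queries.join(';') + "'";
}

var multi = buildMultiQuery([
  'select title,link from rss where url="http://feeds.delicious.com/v2/rss/codepo8?count=10"',
  'select title,link from rss where url="http://feeds.feedburner.com/wait-till-i/gwZf"'
]);
console.log(multi);
```

The order of the statements matters: the results come back in the same order, which is why the PHP reads them by index.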

We point the query to the YQL web service end point and get the resulting data using cURL. All that we need to do then is to convert the returned data to HTML lists that can be printed out inside an HTML template.
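
The conversion to HTML lists can be sketched in JavaScript as well. This version assumes items shaped like the result of the `select title,link from rss` statements, that is, objects with a `title` and a `link`. The helper names `escapeHtml` and `toLinkList` and the sample data are invented for illustration.

```javascript
// Sketch: turn an array of {title, link} items into an HTML list,
// escaping the values because they come from third-party feeds.
function escapeHtml(str) {
  return String(str)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

function toLinkList(id, items) {
  var html = '<ul id="' + escapeHtml(id) + '">';
  items.forEach(function (item) {
    html += '<li><a href="' + escapeHtml(item.link) + '">' +
            escapeHtml(item.title) + '</a></li>';
  });
  return html + '</ul>';
}

// Made-up sample data, not real feed output.
var sampleItems = [
  { title: 'Fish & chips', link: 'http://example.com/?a=1&b=2' }
];
console.log(toLinkList('blog', sampleItems));
```

Escaping at the point of output keeps a stray `<` or `"` in a feed title from breaking the page markup.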

Mixing, matching and using HTML as a data source

This was a simple example of what YQL can do for you. Where it gets really powerful, however, is in mixing and matching different APIs. YQL is also a good tool to get information from HTML documents. By using the html table you can load the content of an HTML document (which gets fixed automatically by HTML Tidy) and use XPath to filter down results to what you need. Take the following example, which takes headlines from the news.bbc.co.uk homepage and runs the results through Yahoo’s Term Extractor API to give you a list of currently hot topics.

select * from search.termextract where context in (
  select content from html where url="http://news.bbc.co.uk" and xpath="//table[@width=800]//a"
)

Try it out in the console or see the results here. In English, this means:

Go to http://news.bbc.co.uk and get me the HTML
Run it through HTML Tidy to clean it up.
Get me only the links inside the table with an attribute of width and the value 800
Get only the content of the link and, for each of the links, take the content and send it as context to the Yahoo Term Extractor API
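
The steps above read from the inside out: the inner statement fetches the link texts, the outer one sends them to the term extractor. As a rough sketch, the nested statement can be assembled from a page URL and an XPath expression like this. `termExtractQuery` is a hypothetical helper, and refusing double quotes is a shortcut taken for this sketch only.

```javascript
// Sketch: assemble the nested statement from a page URL and an XPath.
// No escaping is done; double quotes are rejected because they would
// end the quoted values early.
function termExtractQuery(pageUrl, xpath) {
  if (/"/.test(pageUrl) || /"/.test(xpath)) {
    throw new Error('Double quotes are not supported in this sketch');
  }
  // Inner statement: load the page and keep only the matching nodes.
  var inner = 'select content from html where url="' + pageUrl +
              '" and xpath="' + xpath + '"';
  // Outer statement: feed each piece of content to the term extractor.
  return 'select * from search.termextract where context in (' +
         inner + ')';
}

var bbcQuery = termExtractQuery(
  'http://news.bbc.co.uk',
  '//table[@width=800]//a'
);
console.log(bbcQuery);
```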

If we choose JSON-P as the output format we can use the outcome directly in JavaScript (see this demo or see its source):

<ul id="hottopics"></ul>
<script type="text/javascript">
function hottopics(o){
  var res = o.query.results.Result,
	  all = res.length,
	  topics = {},
	  out = [],
	  html = '',
	  i=0;
  /* create hash from topics to prevent repetition */	 
  for(i=0;i<all;i++){
	topics[res[i]] = res[i];
  };
  for(i in topics){
	out.push(i);
  };
  html = '<li>' + out.join('</li><li>') + '</li>';
  document.getElementById('hottopics').innerHTML = html;
};
</script>
<script type="text/javascript" src="http://query.yahooapis.com/v1/public/yql?q=select%20content%20from%20search.termextract%20where%20context%20in%20(select%20content%20from%20html%20where%20url%3D%22http%3A%2F%2Fnews.bbc.co.uk%22%20and%20xpath%3D%22%2F%2Ftable%5B%40width%3D800%5D%2F%2Fa%22)&format=json&callback=hottopics"></script>
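
The de-duplication step of the callback can also be written as a standalone function, which makes it easier to try out without a live request. This is a sketch only: `uniqueTopics` is a made-up name, and the handling of a single string instead of an array is an assumption about how the response might look when only one term comes back.

```javascript
// Sketch: remove repeated terms while keeping their original order.
function uniqueTopics(result) {
  // Assumption: a lone result may arrive as a string, not an array.
  var list = Array.isArray(result) ? result : [result];
  var seen = {};
  var out = [];
  list.forEach(function (topic) {
    // hasOwnProperty avoids clashes with keys inherited from
    // Object.prototype, such as "constructor".
    if (!Object.prototype.hasOwnProperty.call(seen, topic)) {
      seen[topic] = true;
      out.push(topic);
    }
  });
  return out;
}

// Made-up sample terms, not real API output.
var sampleTopics = uniqueTopics(['snow', 'election', 'snow', 'constructor']);
console.log(sampleTopics.join(', '));
```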

Using JSON, we can also use PHP, which means the demo works for everybody – not only those with JavaScript enabled (see this demo or see its source):

<ul id="hottopics"><li>
<?php
$url = 'http://query.yahooapis.com/v1/public/yql?q=select%20content'.
	   '%20from%20search.termextract%20where%20context%20in'.
	   '%20(select%20content%20from%20html%20where%20url%3D%22'.
	   'http%3A%2F%2Fnews.bbc.co.uk%22%20and%20xpath%3D%22%2F%2F'.
	   'table%5B%40width%3D800%5D%2F%2Fa%22)&format=json';
$ch = curl_init(); 
curl_setopt($ch, CURLOPT_URL, $url); 
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); 
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, false);
$output = curl_exec($ch); 
curl_close($ch);
$data = json_decode($output);
$topics = array_unique($data->query->results->Result);
echo join('</li><li>',$topics);
?>
</li></ul>

Summary

This article could only scratch the surface of YQL. You have not only read access to the web, but you can also write to web services. For example, you can update Twitter, post to your WordPress blog or shorten a URL with bit.ly. Using Open Tables you can add any web service to the YQL interface, and you can even run server-side JavaScript, which is useful, for example, to return Flickr photos as HTML or to get the HTML content from a document that needs POST data.

The web of data is already here, and using YQL you don’t have to be a web services expert to use it and be part of it.

This article is available in German at webkrauts.de

Christian Heilmann grew up in Germany and, after a year working for the Red Cross, spent a year as a radio producer. From 1997 onwards he worked for several agencies in Munich as a web developer. In 2000 he moved to the States to work for Etoys and, after the .com crash, he moved to the UK where he led the web development department at Agilisys. In April 2006 he joined Yahoo! UK as a web developer and moved on to be the Lead Developer Evangelist for the Yahoo Developer Network. In December 2010 he moved on to Mozilla as Principal Developer Evangelist for HTML5 and the Open Web. He publishes an almost daily blog at http://wait-till-i.com and runs an article repository at http://icant.co.uk. He also authored Beginning JavaScript with DOM Scripting and Ajax: From Novice to Professional.
