Parse Twitter search atom feed using PHP and jQuery

Tuesday, May 19, 2009
by Bari

Many of us have Twitter accounts, and we all know Twitter has an excellent search feature. Thousands of tweeple are tweeting about all sorts of topics at any moment, and Twitter lets us search those tweets from outside the site through the Search API’s search method. This method returns results in two formats: JSON and Atom. In this post I will show how to parse the Atom format of the search results using PHP and how to display them using jQuery.

Parsing the Atom feed

Let’s take a look at the source of a search feed:

<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns:google="http://base.google.com/ns/1.0" xml:lang="en-US" xmlns:openSearch="http://a9.com/-/spec/opensearch/1.1/" xmlns="http://www.w3.org/2005/Atom" xmlns:twitter="http://api.twitter.com/">
  <id>tag:search.twitter.com,2005:search/twitter</id>
  <link type="text/html" rel="alternate" href="http://search.twitter.com/search?q=twitter"/>
  <link type="application/atom+xml" rel="self" href="http://search.twitter.com/search.atom?q=twitter&lang=en"/>
  <title>twitter - Twitter Search</title>
  <link type="application/opensearchdescription+xml" rel="search" href="http://search.twitter.com/opensearch.xml"/>
  <link type="application/atom+xml" rel="refresh" href="http://search.twitter.com/search.atom?lang=en&q=twitter&since_id=1855677271"/>
  <twitter:warning>adjusted since_id, it was older than allowedsince_id removed for pagination.</twitter:warning>
  <updated>2009-05-20T03:54:29Z</updated>
  <openSearch:itemsPerPage>15</openSearch:itemsPerPage>
  <openSearch:language>en</openSearch:language>
  <link type="application/atom+xml" rel="next" href="http://search.twitter.com/search.atom?lang=en&max_id=1855677271&page=2&q=twitter"/>
  <entry>
    <id>tag:search.twitter.com,2005:1855677271</id>
    <published>2009-05-20T03:54:29Z</published>
    <link type="text/html" rel="alternate" href="http://twitter.com/nihongotako/statuses/1855677271"/>
    <title>Respond to tgis twitter with a challenge u want to see accomplished in the next video!
www.youtube.com/fitzner123</title>
    <content type="html">Respond to tgis <b>twitter</b> with a challenge u want to see accomplished in the next video!

<a href="http://www.youtube.com/fitzner123">www.youtube.com/fitzner123</a></content>
    <updated>2009-05-20T03:54:29Z</updated>
    <link type="image/png" rel="image" href="http://s3.amazonaws.com/twitter_production/profile_images/217800128/Photo_13_normal.jpg"/>
    <twitter:source><a href="http://twitterfon.net/">TwitterFon</a></twitter:source>
    <twitter:lang>en</twitter:lang>
    <author>
      <name>nihongotako (Fitzner123)</name>
      <uri>http://twitter.com/nihongotako</uri>
    </author>
  </entry>

...................................
other results are truncated
..................................

</feed>

As we can see, it’s an XML file. Every search result sits inside an entry tag, and we have to parse the contents of the tags inside each entry. I used PHP’s cURL functions to fetch the Atom feed (the search results for “twitter”) and simplexml_load_string to parse it. Here is the file that does the work: search.php

<?php
//URL encode the query string
$q = urlencode("twitter");

//request URL
$request = "http://search.twitter.com/search.atom?q=$q&lang=en";

$curl= curl_init();

curl_setopt ($curl, CURLOPT_RETURNTRANSFER, 1);

curl_setopt ($curl, CURLOPT_URL,$request);

$response = curl_exec ($curl);

curl_close($curl);

//remove "twitter:" from the $response string
$response = str_replace("twitter:", "", $response);

//convert response XML into an object
$xml = simplexml_load_string($response);

//wrapping the whole output with <results></results>
echo "<results>";

//loop through all the entry(s) in the feed
for($i=0;$i<count($xml->entry);$i++)
{

	//get the id from entry
	$id = $xml->entry[$i]->id;

	//explode the $id by ":"
	$id_parts = explode(":",$id);

	//the last part is the tweet id
	$tweet_id = array_pop($id_parts);

	//get the account link
	$account_link = $xml->entry[$i]->author->uri;

	//get the image link
	$image_link = $xml->entry[$i]->link[1]->attributes()->href;

	//get name from entry and trim the last ")"
	$name = trim($xml->entry[$i]->author->name, ")");

	//explode $name by the rest "(" inside it
	$name_parts = explode("(", $name);

	//get the real name of user from the last part
	$real_name = trim(array_pop($name_parts));

	//the rest part is the screen name
	$screen_name = trim(array_pop($name_parts));

	//get the published time, replace T and Z with " " and trim the last " "
	$published_time = trim(str_replace(array("T","Z")," ",$xml->entry[$i]->published));

	//get the status link
	$status_link = $xml->entry[$i]->link[0]->attributes()->href;

	//get the tweet
	$tweet = $xml->entry[$i]->content;

	//remove <b> and </b> from the tweet. If you want to show bold keyword then you can comment this line
	$tweet = str_replace(array("<b>", "</b>"), "", $tweet);

	//get the source link
	$source = $xml->entry[$i]->source;

	//the result div that holds the information
	echo '<div class="result" id="'. $tweet_id .'">
			<div class="profile_image"><a href="'. $account_link .'"><img src="'. $image_link .'"></a></div>
			<div class="status">
				<div class="content">
					<strong><a href="'. $account_link .'">'.$screen_name.'</a></strong> '. $tweet .'
				</div>
				<div class="time">
					'. $real_name .' at <a href="'. $status_link .'">'. $published_time .'</a> via '. $source .'
				</div>
			</div>
		</div>';
}

echo "</results>";

?>

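When I first parsed the XML feed into an object using simplexml_load_string, I found it was not loading the contents of the twitter:source and twitter:lang tags. These tags carry the twitter: namespace prefix, and SimpleXML does not expose prefixed elements through plain property access.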

<twitter:source><a href="http://twitter.com/">web</a></twitter:source>
<twitter:lang>en</twitter:lang>

So, I removed twitter: from these tag names :

//remove "twitter:" from the $response string
$response = str_replace("twitter:", "", $response);

That means these tags became:

<source><a href="http://twitter.com/">web</a></source>
<lang>en</lang>

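Now simplexml_load_string parses them nicely. Keep in mind that this blunt replacement removes every occurrence of "twitter:" in the response, including any inside tweet text.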

Here is the extra processing I did on some of the data parsed from each entry.

This is the id inside every entry:

<id>tag:search.twitter.com,2005:1846263500</id>

We just need 1846263500 from it.

//get the id from entry
$id = $xml->entry[$i]->id;

//explode the $id by ":"
$id_parts = explode(":",$id);

//the last part is the tweet id
$tweet_id = array_pop($id_parts);
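
If you ever need the same extraction on the client side, it can be sketched in JavaScript as follows. This is only an illustration: tweetIdFromTag is a hypothetical helper that is not part of search.php, and it assumes the id always ends with ":" followed by the tweet id, as in the sample above. The id is kept as a string rather than converted to a number.

```javascript
// Hypothetical helper (not in the original files): returns the part of an
// Atom entry id that follows the last ":", which is the tweet id.
function tweetIdFromTag(tagId) {
	const lastColon = tagId.lastIndexOf(":");
	// If there is no ":" at all, hand the input back unchanged.
	return lastColon === -1 ? tagId : tagId.slice(lastColon + 1);
}

console.log(tweetIdFromTag("tag:search.twitter.com,2005:1846263500")); // prints 1846263500
```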

The content of the name tag is processed to separate the real name from the screen name:

<name>bauerbauerbauer (Danny Bauer)</name>

in this part:

//get name from entry and trim the last ")"
$name = trim($xml->entry[$i]->author->name, ")");

//explode $name by the rest "(" inside it
$name_parts = explode("(", $name);

//get the real name of user from the last part
$real_name = trim(array_pop($name_parts));

//the rest part is the screen name
$screen_name = trim(array_pop($name_parts));
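
Note that the PHP above assumes the real name contains no parentheses of its own. For comparison, here is a JavaScript sketch of the same split that tolerates them. splitAuthorName is a hypothetical helper, not part of the original files, and it assumes the screen name contains no whitespace.

```javascript
// Hypothetical helper: splits "screenname (Real Name)" into its two parts.
// Assumes the screen name has no whitespace in it.
function splitAuthorName(name) {
	const trimmed = name.trim();
	// Screen name, then everything between the first "(" after it and the final ")".
	const match = /^(\S+)\s*\((.*)\)\s*$/.exec(trimmed);
	if (!match) {
		// No parenthesised part: treat the whole string as the screen name.
		return { screenName: trimmed, realName: "" };
	}
	return { screenName: match[1], realName: match[2].trim() };
}

console.log(splitAuthorName("bauerbauerbauer (Danny Bauer)"));
```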

The published tag contains the time at which the tweet was published:

<published>2009-05-19T11:32:11Z</published>

It is processed here to remove the T and Z from it:

//get the published time, replace T and Z with " " and trim the last " "
$published_time = trim(str_replace(array("T","Z")," ",$xml->entry[$i]->published));
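
The timestamp is in ISO 8601 format, so it can also be parsed as a real date instead of being edited as a string. A minimal JavaScript sketch of that alternative follows; formatPublished is a hypothetical name, and the output stays in UTC, just like the PHP version.

```javascript
// Hypothetical helper: turns "2009-05-19T11:32:11Z" into "2009-05-19 11:32:11" (UTC).
function formatPublished(isoTimestamp) {
	const date = new Date(isoTimestamp);
	if (Number.isNaN(date.getTime())) {
		// Leave input that cannot be parsed untouched.
		return isoTimestamp;
	}
	// toISOString() always yields UTC in the form YYYY-MM-DDTHH:mm:ss.sssZ,
	// so the first 19 characters are the date and time without milliseconds.
	return date.toISOString().slice(0, 19).replace("T", " ");
}

console.log(formatPublished("2009-05-19T11:32:11Z")); // prints 2009-05-19 11:32:11
```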

I will describe the output part later.

The display page index.php

This is the simple page to display the search results – index.php

<html>
<head>
	<title>Twitter Search</title>
	<link href="style.css" rel="stylesheet" type="text/css">
	<script type="text/javascript" src="http://code.jquery.com/jquery-latest.pack.js"></script>
	<script src="search.js" type="text/javascript"></script>
</head>
<body>
	<div id="main">
		<h2>Search Results for "Twitter"</h2>
		<div id="update-alert"></div>
		<div id="twitter-update"></div>
	</div>
</body>
</html>

This page acts as the client side of the AJAX exchange. It sends a request to search.php and displays the response in the #twitter-update div.

The AJAX part using jQuery – search.js

We saw that search.php handles the search Atom feed; it acts as the server-side page. This is the JavaScript file that sends the request to search.php and processes the response – search.js:

function loadSearch()
{
	//show message when the feed is loading
	$("#update-alert").text("Loading new results...");

	//ajax request to get the results from search.php
	$.ajax({
		url : "search.php",
		success : function(results){

			//counter to count new results
			var counter = 0;

			//find each 'div' with class 'result' from the response and loop through them
			$(results).find("div.result").each(function(i){

				//get the id of the div
				var div_id = $(this).attr("id");

				//check if any div with the same id already exists
				if( $("#twitter-update div#" + div_id).length == 0 )
				{
					//if doesn't exist then prepend this div
					$("#twitter-update").prepend(this);

					//increase the counter
					counter++;
				}
			});

			//show the number of new results
			if(counter == 0)
				$("#update-alert").text("no new result");
			else if(counter == 1)
				$("#update-alert").text("1 new result");
			else
				$("#update-alert").text(counter + " new results");
		}
	});

}

$(document).ready(function(){
	loadSearch();
	setInterval(loadSearch, 30000);
});
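
The "run once, then repeat" pattern inside $(document).ready() can also be pulled into a small function that takes a function reference instead of a code string. A minimal sketch, assuming a hypothetical helper named startPolling that is not part of search.js; its third parameter exists only so a different scheduler can be swapped in for testing.

```javascript
// Hypothetical helper: runs task immediately, then repeats it every intervalMs
// milliseconds. Returns whatever the scheduler returns (a timer handle for
// setInterval), so the polling can later be stopped with clearInterval.
function startPolling(task, intervalMs, schedule = setInterval) {
	task();
	return schedule(task, intervalMs);
}

// In search.js this could be used inside the ready handler as:
// startPolling(loadSearch, 30000);
```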

loadSearch() is the main thing here. It does all the work related to AJAX and display. It is called once when the page loads and then again every 30 seconds. Now let’s look at the response that search.php sends:

//wrapping the whole output with <results></results>
echo "<results>";

//loop through all the entry(s) in the feed
for($i=0;$i<count($xml->entry);$i++)
{
.....................................................
other code is truncated
.....................................................

	//the result div that holds the information
	echo '<div class="result" id="'. $tweet_id .'">
			<div class="profile_image"><a href="'. $account_link .'"><img src="'. $image_link .'"></a></div>
			<div class="status">
				<div class="content">
					<strong><a href="'. $account_link .'">'.$screen_name.'</a></strong> '. $tweet .'
				</div>
				<div class="time">
					'. $real_name .' at <a href="'. $status_link .'">'. $published_time .'</a> via '. $source .'
				</div>
			</div>
		</div>';
}

echo "</results>";

When I first wrote this part, the output was not wrapped in any tag like results. There were only the .result divs, and I simply populated the #twitter-update div with the response. But for many keywords the Twitter search Atom feed does not change much within 30 seconds. The Search API’s search method returns 15 results per request (you can load more by adding extra parameters to the request URL), and for a slow feed a refresh does not bring 15 new results every 30 seconds; it may bring 2-3 new results and 12-13 old ones that are already in the #twitter-update div. So I had to find a way to handle these duplicate entries. I added $tweet_id as the id of each .result div, so every .result div got a unique id.

echo '<div class="result" id="'. $tweet_id .'">

I also wrapped all the .result divs in a results tag:

echo "<results>";
.........................
15 .result divs each having unique id
.........................
echo "</results>";

Now the whole response has a single root element and can be handled with jQuery much like an XML document. Here is the part that does the trick:

//find each 'div' with class 'result'  from the response and loop through them
$(results).find("div.result").each(function(i){

	//get the id of the div
	var div_id = $(this).attr("id");

	//check if any div with the same id already exists
	if( $("#twitter-update div#" + div_id).length == 0 )
	{
		//if doesn't exist then prepend this div
		$("#twitter-update").prepend(this);

		//increase the counter
		counter++;
	}
});

As you can see, this part finds every .result div in the response (results) and loops through them. For each one it reads the id of the div and checks whether the #twitter-update div already contains a div with that id. If it does not, it prepends the new .result div to #twitter-update. Unfortunately, this part won’t work in IE. IE won’t let you parse the response XML! The good news is that all the other browsers handle this part without trouble.
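
The duplicate check boils down to a set-membership test on the tweet ids. Stripped of the DOM calls, the idea can be sketched like this; findNewIds is a hypothetical helper used only for illustration and does not appear in search.js.

```javascript
// Hypothetical helper: given the ids already on the page and the ids in a
// fresh response, returns the ids that still need to be added, in response order.
function findNewIds(existingIds, incomingIds) {
	const seen = new Set(existingIds);
	const fresh = [];
	for (const id of incomingIds) {
		if (!seen.has(id)) {
			// Remember the id so a repeat inside the same response is skipped too.
			seen.add(id);
			fresh.push(id);
		}
	}
	return fresh;
}

console.log(findNewIds(["1855677271"], ["1855677300", "1855677271"]));
```

A Set keeps each lookup cheap, whereas the jQuery version runs one selector query per result. With 15 results per refresh either approach is fast enough.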

The #update-alert div shows “Loading new results…” while a request is in progress and then a message based on the number of new search results (counter).

//show message when the feed is loading
$("#update-alert").text("Loading new results...");
.........................................................
.........................................................

//show the number of new results
if(counter == 0)
	$("#update-alert").text("no new result");
else if(counter == 1)
	$("#update-alert").text("1 new result");
else
	$("#update-alert").text(counter + " new results");
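
The three branches can also be collected in one small function, which keeps the message text in a single place. A sketch, where updateMessage is a hypothetical name and the strings are the ones used above:

```javascript
// Hypothetical helper: returns the status text for a given number of new results.
function updateMessage(counter) {
	if (counter === 0) {
		return "no new result";
	}
	if (counter === 1) {
		return "1 new result";
	}
	return counter + " new results";
}

console.log(updateMessage(3)); // prints 3 new results
```

Inside the success callback the result would then be shown with $("#update-alert").text(updateMessage(counter)).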

CSS Stylesheet – style.css

I added a simple stylesheet to beautify the search results in index.php. You can view the file here.

Download the source files

Well, that’s all. I hope you enjoyed it. I will be grateful if anyone comes up with a working solution for the IE problem. I am waiting for your comments.