Parse Twitter search atom feed using PHP and jQuery

2009 May 19
by Bari

Many of us have Twitter accounts, and we all know Twitter has an excellent search feature. Thousands of people are tweeting about all kinds of topics at any given moment, and the Search API’s search method lets us search those tweets from outside Twitter. This method returns the results in 2 formats – JSON and Atom. In this post I will show how to parse the Atom format of the search results using PHP and how to display them using jQuery.

Parsing the Atom feed

Let’s take a look at the source of a search feed:

<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns:google="http://base.google.com/ns/1.0" xml:lang="en-US" xmlns:openSearch="http://a9.com/-/spec/opensearch/1.1/" xmlns="http://www.w3.org/2005/Atom" xmlns:twitter="http://api.twitter.com/">
  <id>tag:search.twitter.com,2005:search/twitter</id>
  <link type="text/html" rel="alternate" href="http://search.twitter.com/search?q=twitter"/>
  <link type="application/atom+xml" rel="self" href="http://search.twitter.com/search.atom?q=twitter&lang=en"/>
  <title>twitter - Twitter Search</title>
  <link type="application/opensearchdescription+xml" rel="search" href="http://search.twitter.com/opensearch.xml"/>
  <link type="application/atom+xml" rel="refresh" href="http://search.twitter.com/search.atom?lang=en&q=twitter&since_id=1855677271"/>
  <twitter:warning>adjusted since_id, it was older than allowedsince_id removed for pagination.</twitter:warning>
  <updated>2009-05-20T03:54:29Z</updated>
  <openSearch:itemsPerPage>15</openSearch:itemsPerPage>
  <openSearch:language>en</openSearch:language>
  <link type="application/atom+xml" rel="next" href="http://search.twitter.com/search.atom?lang=en&max_id=1855677271&page=2&q=twitter"/>
  <entry>
    <id>tag:search.twitter.com,2005:1855677271</id>
    <published>2009-05-20T03:54:29Z</published>
    <link type="text/html" rel="alternate" href="http://twitter.com/nihongotako/statuses/1855677271"/>
    <title>Respond to tgis twitter with a challenge u want to see accomplished in the next video!
www.youtube.com/fitzner123</title>
    <content type="html">Respond to tgis <b>twitter</b> with a challenge u want to see accomplished in the next video!

<a href="http://www.youtube.com/fitzner123">www.youtube.com/fitzner123</a></content>
    <updated>2009-05-20T03:54:29Z</updated>
    <link type="image/png" rel="image" href="http://s3.amazonaws.com/twitter_production/profile_images/217800128/Photo_13_normal.jpg"/>
    <twitter:source><a href="http://twitterfon.net/">TwitterFon</a></twitter:source>
    <twitter:lang>en</twitter:lang>
    <author>
      <name>nihongotako (Fitzner123)</name>
      <uri>http://twitter.com/nihongotako</uri>
    </author>
  </entry>

...................................
other results are truncated
..................................

</feed>

As we can see, it’s an XML file. Every search result is inside an entry tag, so we have to parse the contents of the tags inside each entry. I used PHP’s curl functions to fetch the Atom feed (the search results for “twitter”) and simplexml_load_string to parse it. Here is the file that does the work: search.php

<?php
//URL encode the query string
$q = urlencode("twitter");

//request URL
$request = "http://search.twitter.com/search.atom?q=$q&lang=en";

$curl= curl_init();

curl_setopt ($curl, CURLOPT_RETURNTRANSFER, 1);

curl_setopt ($curl, CURLOPT_URL,$request);

$response = curl_exec ($curl);

curl_close($curl);

//remove "twitter:" from the $response string
$response = str_replace("twitter:", "", $response);

//convert response XML into an object
$xml = simplexml_load_string($response);

//wrapping the whole output with <results></results>
echo "<results>";

//loop through all the entry(s) in the feed
for($i=0;$i<count($xml->entry);$i++)
{

	//get the id from entry
	$id = $xml->entry[$i]->id;

	//explode the $id by ":"
	$id_parts = explode(":",$id);

	//the last part is the tweet id
	$tweet_id = array_pop($id_parts);

	//get the account link
	$account_link = $xml->entry[$i]->author->uri;

	//get the image link
	$image_link = $xml->entry[$i]->link[1]->attributes()->href;

	//get name from entry and trim the last ")"
	$name = trim($xml->entry[$i]->author->name, ")");

	//explode $name by the rest "(" inside it
	$name_parts = explode("(", $name);

	//get the real name of user from the last part
	$real_name = trim(array_pop($name_parts));

	//the rest part is the screen name
	$screen_name = trim(array_pop($name_parts));

	//get the published time, replace T and Z with " " and trim the last " "
	$published_time = trim(str_replace(array("T","Z")," ",$xml->entry[$i]->published));

	//get the status link
	$status_link = $xml->entry[$i]->link[0]->attributes()->href;

	//get the tweet
	$tweet = $xml->entry[$i]->content;

	//remove <b> and </b> from the tweet. If you want to show bold keyword then you can comment this line
	$tweet = str_replace(array("<b>", "</b>"), "", $tweet);

	//get the source link
	$source = $xml->entry[$i]->source;

	//the result div that holds the information
	echo '<div class="result" id="'. $tweet_id .'">
			<div class="profile_image"><a href="'. $account_link .'"><img src="'. $image_link .'"></a></div>
			<div class="status">
				<div class="content">
					<strong><a href="'. $account_link .'">'.$screen_name.'</a></strong> '. $tweet .'
				</div>
				<div class="time">
					'. $real_name .' at <a href="'. $status_link .'">'. $published_time .'</a> via '. $source .'
				</div>
			</div>
		</div>';
}

echo "</results>";

?>
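The request URL at the top of search.php is hard-coded for the keyword “twitter”. As a minimal sketch of how the same URL could be assembled for any keyword, here is a small JavaScript helper (JavaScript being the language of search.js). The name buildSearchUrl is hypothetical and not part of the original files; the endpoint and the q and lang parameters are the ones search.php already uses.

```javascript
// Hypothetical helper: build the Atom search URL for any keyword.
// The endpoint and the q/lang parameters are taken from search.php.
function buildSearchUrl(query, lang) {
  var base = "http://search.twitter.com/search.atom";
  // encodeURIComponent escapes spaces, "#", "&" and other reserved characters
  return base + "?q=" + encodeURIComponent(query) + "&lang=" + encodeURIComponent(lang);
}

console.log(buildSearchUrl("#php & jquery", "en"));
```

Note that encodeURIComponent writes a space as %20 while PHP’s urlencode writes it as +; both are accepted in a query string.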

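When I first parsed the XML feed into an object with SimpleXML, I found it was not loading the contents of the twitter:source and twitter:lang tags. These tags carry the twitter: namespace prefix, and SimpleXML’s plain property access does not expose prefixed elements (they have to be requested through children() with the namespace).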

<twitter:source><a href="http://twitter.com/">web</a></twitter:source>
<twitter:lang>en</twitter:lang>

So I removed twitter: from these tag names:

//remove "twitter:" from the $response string
$response = str_replace("twitter:", "", $response);

That means the tags became:

<source><a href="http://twitter.com/">web</a></source>
<lang>en</lang>

Now simplexml_load_string parsed them nicely.
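One caveat with a blanket str_replace is that it also rewrites any tweet text that happens to contain the string twitter: itself. A narrower variant touches the prefix only where it opens or closes a tag. The sketch below shows the idea in JavaScript; stripTagPrefix is a hypothetical helper that is not in the original script, and the same pattern could be carried over to PHP with preg_replace.

```javascript
// Hypothetical helper: remove a namespace prefix from tag names only,
// leaving the same text untouched when it appears inside element content.
function stripTagPrefix(xml, prefix) {
  // Escape regex metacharacters in the prefix before building the pattern.
  var safe = prefix.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // Match "<prefix:" or "</prefix:" and keep only the "<" or "</" part.
  var pattern = new RegExp("(</?)" + safe + ":", "g");
  return xml.replace(pattern, "$1");
}

var sample = "<twitter:lang>en</twitter:lang><title>about twitter: search</title>";
console.log(stripTagPrefix(sample, "twitter"));
// the lang tags lose their prefix; "about twitter: search" is left alone
```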

Here is the extra processing I did on some of the useful data from each entry:

This is the id inside every entry:

<id>tag:search.twitter.com,2005:1846263500</id>

We just need 1846263500 from it.

//get the id from entry
$id = $xml->entry[$i]->id;

//explode the $id by ":"
$id_parts = explode(":",$id);

//the last part is the tweet id
$tweet_id = array_pop($id_parts);
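For comparison, the same split-and-take-the-last-part step can be sketched in JavaScript. extractTweetId is a hypothetical name used only for illustration. The id is kept as a string on purpose, since it is only used as an element id and never for arithmetic.

```javascript
// Hypothetical helper mirroring the explode()/array_pop() steps above.
function extractTweetId(entryId) {
  // "tag:search.twitter.com,2005:1846263500" -> ["tag", "search.twitter.com,2005", "1846263500"]
  var parts = String(entryId).split(":");
  // The last part is the tweet id.
  return parts[parts.length - 1];
}

console.log(extractTweetId("tag:search.twitter.com,2005:1846263500")); // 1846263500
```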

The name tag holds both the screen name and the real name:

<name>bauerbauerbauer (Danny Bauer)</name>

It is split into the two parts here:

//get name from entry and trim the last ")"
$name = trim($xml->entry[$i]->author->name, ")");

//explode $name by the rest "(" inside it
$name_parts = explode("(", $name);

//get the real name of user from the last part
$real_name = trim(array_pop($name_parts));

//the rest part is the screen name
$screen_name = trim(array_pop($name_parts));
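The explode approach assumes the real name itself contains no parentheses. A slightly more defensive variant splits at the first “(” only, since that is where the screen name ends. The following JavaScript sketch illustrates this; splitAuthorName is a hypothetical helper, not part of the original files.

```javascript
// Hypothetical helper: split "screen_name (Real Name)" into its two parts.
function splitAuthorName(name) {
  var open = name.indexOf("(");
  if (open === -1) {
    // No parenthesised part: treat the whole string as the screen name.
    return { screenName: name.trim(), realName: "" };
  }
  var realName = name.slice(open + 1).trim();
  // Drop the single closing ")" that ends the real name.
  if (realName.charAt(realName.length - 1) === ")") {
    realName = realName.slice(0, -1).trim();
  }
  return { screenName: name.slice(0, open).trim(), realName: realName };
}

var author = splitAuthorName("bauerbauerbauer (Danny Bauer)");
console.log(author.screenName + " / " + author.realName); // bauerbauerbauer / Danny Bauer
```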

The published tag contains the time the tweet was published:

<published>2009-05-19T11:32:11Z</published>

It is processed here to replace the T and Z with spaces and trim the result:

//get the published time, replace T and Z with " " and trim the last " "
$published_time = trim(str_replace(array("T","Z")," ",$xml->entry[$i]->published));
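The same clean-up can be sketched in JavaScript as a quick check of what the result looks like. formatPublished is a hypothetical name. The trailing Z in the feed means the time is in UTC, so the displayed value is UTC as well.

```javascript
// Hypothetical helper: turn the feed's ISO 8601 timestamp into a plain
// "date time" string, as the PHP line above does.
function formatPublished(isoTime) {
  // Replace the "T" separator with a space and drop the trailing "Z".
  return isoTime.replace("T", " ").replace(/Z$/, "");
}

console.log(formatPublished("2009-05-19T11:32:11Z")); // 2009-05-19 11:32:11
```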

I will describe the output part later.

The display page index.php

This is the simple page to display the search results – index.php

<html>
<head>
	<title>Twitter Search</title>
	<link href="style.css" rel="stylesheet" type="text/css">
	<script type="text/javascript" src="http://code.jquery.com/jquery-latest.pack.js"></script>
	<script src="search.js" type="text/javascript"></script>
</head>
<body>
	<div id="main">
		<h2>Search Results for "Twitter"</h2>
		<div id="update-alert"></div>
		<div id="twitter-update"></div>
	</div>
</body>
</html>

This page acts as the client-side page for AJAX. It sends a request to search.php and displays the response in the #twitter-update div.

The AJAX part using jQuery – search.js

We saw that search.php handles the search Atom feed; it acts as the server-side page. This is the JavaScript file from which we send the request to search.php and process the response – search.js:

function loadSearch()
{
	//show message when the feed is loading
	$("#update-alert").text("Loading new results...");

	//ajax request to get the results from search.php
	$.ajax({
			url : "search.php",
			success : function(results){

			//counter to count new results
			var counter = 0;

			//find each 'div' with class 'result'  from the response and loop through them
			$(results).find("div.result").each(function(i){

				//get the id of the div
				var div_id = $(this).attr("id");

				//check if any div with the same id already exists
				if( $("#twitter-update div#" + div_id).length == 0 )
				{
					//if doesn't exist then prepend this div
					$("#twitter-update").prepend(this);

					//increase the counter
					counter++;
				}
			});

			//show the number of new results
			if(counter == 0)
				$("#update-alert").text("no new result");
			else if(counter == 1)
				$("#update-alert").text("1 new result");
			else
				$("#update-alert").text(counter + " new results");
   		}
	});

}

$(document).ready(function(){
	loadSearch();
	//pass the function itself rather than a string of code to evaluate
	setInterval(loadSearch, 30000);
});

loadSearch() is the main thing here. This function does all the work related to AJAX and display. It is called once when the page loads and then every 30 seconds after that. Now let’s look at the response that search.php sends:

//wrapping the whole output with <results></results>
echo "<results>";

//loop through all the entry(s) in the feed
for($i=0;$i<count($xml->entry);$i++)
{
.....................................................
other codes are truncated
.....................................................

	//the result div that holds the information
	echo '<div class="result" id="'. $tweet_id .'">
			<div class="profile_image"><a href="'. $account_link .'"><img src="'. $image_link .'"></a></div>
			<div class="status">
				<div class="content">
					<strong><a href="'. $account_link .'">'.$screen_name.'</a></strong> '. $tweet .'
				</div>
				<div class="time">
					'. $real_name .' at <a href="'. $status_link .'">'. $published_time .'</a> via '. $source .'
				</div>
			</div>
		</div>';
}

echo "</results>";

When I first wrote this part, the output was not wrapped in any tag like results. There were only the .result divs, and I simply populated the #twitter-update div with the response. But for many keywords the Twitter search Atom feed does not change much within 30 seconds. The Search API’s search method sends 15 results each time (you can load more results by adding extra parameters to the feed request URL), and for a slow feed that is not a new set of 15 results every 30 seconds; it may be 2-3 new results and 12-13 old results that are already loaded in the #twitter-update div. So I had to find a way to handle these duplicate entries. I added $tweet_id as the id of each .result div, so every .result div got a unique id.

echo '<div class="result" id="'. $tweet_id .'">

I also wrapped all the .result divs in a results tag:

echo "<results>";
.........................
15 .result divs each having unique id
.........................
echo "</results>";

So now the whole response acts as XML with a single root element, which lets jQuery’s find() reach every .result div inside it (find() only searches descendants). Here is the part that does the trick:

//find each 'div' with class 'result'  from the response and loop through them
$(results).find("div.result").each(function(i){

	//get the id of the div
	var div_id = $(this).attr("id");

	//check if any div with the same id already exists
	if( $("#twitter-update div#" + div_id).length == 0 )
	{
		//if doesn't exist then prepend this div
		$("#twitter-update").prepend(this);

		//increase the counter
		counter++;
	}
});

As you can see, this part finds every .result div in the response XML (results) and loops through them. For each one it gets the id of the div and checks whether the #twitter-update div already contains a div with that id. If it does not, it prepends the new .result div to #twitter-update. Unfortunately, this part won’t work in IE. IE won’t let you parse the response XML! The good news is that all the other browsers are very friendly with this part and behave very well!
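The duplicate check itself does not depend on the DOM, so it can be sketched as a plain function that takes the ids already on the page and the ids in the latest response. findNewIds is a hypothetical helper and the ids below are sample values; this variant keeps the ids in a lookup table instead of querying #twitter-update for each one.

```javascript
// Hypothetical helper: given the ids already on the page and the ids in the
// latest response, return only the ids that still need to be added.
function findNewIds(existingIds, incomingIds) {
  var seen = Object.create(null);
  existingIds.forEach(function (id) {
    seen[id] = true;
  });
  var fresh = [];
  incomingIds.forEach(function (id) {
    if (!seen[id]) {
      seen[id] = true; // also guards against repeats within one response
      fresh.push(id);
    }
  });
  return fresh;
}

var onPage = ["1855677271", "1855677100"];
var fromFeed = ["1855677400", "1855677271", "1855677100"];
console.log(findNewIds(onPage, fromFeed)); // only 1855677400 is new
```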

The #update-alert div shows “Loading new results…” while the feed is loading, and then a message based on the number of new search results (counter).

//show message when the feed is loading
$("#update-alert").text("Loading new results...");
.........................................................
.........................................................

//show the number of new results
if(counter == 0)
	$("#update-alert").text("no new result");
else if(counter == 1)
	$("#update-alert").text("1 new result");
else
	$("#update-alert").text(counter + " new results");
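The three branches can also be pulled out into a small function so the wording lives in one place. This is only a sketch; resultMessage is a hypothetical name and the messages are the ones used above.

```javascript
// Hypothetical helper: pick the status message for a given number of new results.
function resultMessage(counter) {
  if (counter === 0) return "no new result";
  if (counter === 1) return "1 new result";
  return counter + " new results";
}

console.log(resultMessage(3)); // 3 new results
```

In search.js the if/else chain would then shrink to a single call: $("#update-alert").text(resultMessage(counter));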

CSS Stylesheet – style.css

I added a simple stylesheet to beautify the search results in index.php. You can check this file from here.


Download the source files

Well, that’s all. I hope you enjoyed it. I will be grateful if anyone comes up with a working solution for the IE problem. I am waiting for your comments.

19 Responses
  1. Manjiri

    Hello Bari,
    Ur script is awesome…ne luck on IE?
    Btw it works differently on php version 5.0.4 …, works great on the newer version.
    Thanks

  2. Sebastien

    Hi,

    Thanks for your great work on this topic… !!!

    I’ve been searching a long time for this kind of script. Yours is perfect apart from the IE problem…. :-(

    I found a script that uses Jquery like yours but that works with IE except it refreshes the whole data instead of just refreshing if something new comes up in the feed:
    http://css-tricks.com/video-screencasts/60-ajax-refreshing-rss-content/
    http://nikibrown.com/bananatweets/

    You might find a workaround IE with his code… I tried but I’m not very familiar with JQuery…

    Let me know if that helps !!!

    Thanks again !

  3. thanks a lot bari, your tips was very useful

    hugs from Brazil.

  4. wow great work bari. I just downloaded your sources, but – it is not working on my side. I only get a “no new result”. maybe I forgot something .. or twitter changed something?
    best regards and thanks,
    juergen

  5. Bari,

    I just discovered your blog from a google search today

    keep up these great posts.

    I love learning from them

  6. A wonderful script, thanks for the effort.

    Two suggestions:

    1. Change the ‘for ()’ loop direction in the search.php page to display the results with the latest at the top, rather than the latest at the bottom. The twitter search API has a limit on results given through the API so changing your ‘for ()’ loop from ($i=0;$i<count($xml->entry);$i++) to ($i=14;$i>=0;$i--) will change the result direction with newest tweet result at the top.

    2. Use the suggestions in this link: http://tinyurl.com/lsfc6k to help correct the IE non-display error. This seems like a common bug that users are experiencing with your script. I have customized your code to work in all browsers for my needs with the help of the suggestions in said link.

    Thank you again for the great script.

  7. ritesh ambastha

    Thats a great help !! Thanks a lot.

    How can we keep a input box and take keyword from users to change the value of $q ?

  8. I really liked your blog! I have subscribed to the RSS feed and will read it regularly.

  9. Thank you for the very informative information about all of this!

  10. Hello, nice tutorial, can you upload your source code again, because i’m getting a “not found” error

  11. Awesome post! I’ve been pulling my hair out trying to figure out how to get the out. Thanks!!!

  12. Nash

    Bari, great post!!Any idea if this can be used with the popular Thesis Theme?

  13. Well done mate with explanation step by step your tutorial very easy to understood and then practice it for newbie like me. One answer mate about this script..

    How to limited result become only showing 5 search result?

    Apreciate if you have time answer my question.

    Regards,

  14. Well done mate with explanation step by step your tutorial very easy to understood and then practice it for newbie like me. One question mate about this script..

    How to limited result become only showing 5 search result?

    Apreciate if you have time answer my question.

    Regards,

  15. Thanks for the very useful post. Saved me the trouble of figuring out the parsing myself!

  16. Thank you so much for this feed parser. I was busy getting all information from the atom feed, but the tags with ” : ” were a problem for me. This is complete.

  17. Good post. Even after reading a couple of books on the topic I had not looked at it from this angle, and the post somehow struck a chord.

  18. Dennis

    Greetings:

    Any ideas on a fix so results show on IE?

    I’m very new to all of this and do not know how to make work on Internet Explorer.

    Thank you so much.
