Building a Twitter Status Cloud
Last week, I produced a word map of the statuses of the people I follow on Twitter. Willem Kossen asked if I could release the program under an open source license. Actually, I’ll do something I hope some of you will find even more helpful. I’ll release it for free, into the public domain, along with my comments about how I put it together.
I actually started off trying to come up with some nice GraphViz images of various social networks I’m on. (For more about GraphViz, read my blog post Installing GraphViz in Drupal and Using GraphViz, a Brief Tutorial. You may also want to check out some of the GraphViz images I’ve uploaded to Flickr and a great Visualization of the Madoff Securities “Feeder Funds”.)
From my Flickr images, you’ll see that I like to create images of social networks using GraphViz, and I thought I would try to create an interesting image of my Identi.ca network. I like working with Identi.ca because it is open source and it uses open standards. For example, you can get my network on Identi.ca as a FOAF file. This is a standardized XML format that can easily be parsed.
In PHP, if you have curl installed, you can read a web page fairly easily:
$ch=curl_init();
curl_setopt($ch,CURLOPT_URL,'http://identi.ca/'.$target.'/foaf');
curl_setopt($ch,CURLOPT_RETURNTRANSFER, 1);
$xmlstr = curl_exec($ch);
curl_close($ch);
This little snippet of PHP opens a curl handle, which I’m calling $ch. It goes out and gets the FOAF file for whichever $target I specify. The result is saved in a string called $xmlstr.
With this, you can then parse the XML into an easy-to-use structure using SimpleXML.
try {
$xml = new SimpleXMLElement($xmlstr);
} catch (Exception $e) {
print "Skipping " . $target . "\n";
}
I use a ‘try’ around the calling of SimpleXMLElement in case the $xmlstr doesn’t contain valid XML. In my case, I just skip the records that don’t have valid XML.
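To see the try/catch behave without hitting the network, here is a small self-contained sketch. The helper name parseOrNull and the two sample strings are made up for illustration; they are not part of my original program.

```php
<?php
// Hypothetical helper: returns a SimpleXMLElement, or null if the
// string is not valid XML.
function parseOrNull($xmlstr) {
    // Collect libxml errors quietly instead of printing warnings.
    $previous = libxml_use_internal_errors(true);
    try {
        return new SimpleXMLElement($xmlstr);
    } catch (Exception $e) {
        return null;
    } finally {
        libxml_clear_errors();
        libxml_use_internal_errors($previous);
    }
}

// Made-up sample data: one well-formed document, one truncated one.
$good = '<people><Person><name>Alice</name></Person></people>';
$bad  = '<people><Person>';

foreach (array('good' => $good, 'bad' => $bad) as $label => $xmlstr) {
    $xml = parseOrNull($xmlstr);
    if ($xml === null) {
        print "Skipping " . $label . "\n";
        continue;
    }
    print $label . ": " . $xml->Person[0]->name . "\n";
}
```

The truncated string takes the "Skipping" branch, which is the same thing that happens in my program when a FOAF file comes back broken.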
The next part is where I’ve always needed to explore a little bit to make sure that I get the right syntax. XML documents can be multiple levels deep, and the SimpleXMLElement class maps them into structures within structures within structures in PHP.
In this case, the information about the first person in the FOAF document can be found as
$xml->Person[0]->holdsAccount->OnlineAccount->accountName[0];
The people that the person knows can be found by incrementing the index of Person. So, I wrote a loop to go through the structure and write out all the relationships in GraphViz format. I also built a list of other FOAF files to extract the relationships so I could get additional degrees of separation.
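I didn’t keep that loop, but a rough sketch of the idea looks something like the following. The function name foafToDot and the sample document are assumptions for illustration; the sample is a stripped-down stand-in with no namespaces, not a real Identi.ca FOAF file.

```php
<?php
// Hypothetical sketch: turn a FOAF-like document into GraphViz DOT text.
// Person[0] is treated as the owner; every later Person is someone they know.
function foafToDot(SimpleXMLElement $xml) {
    $owner = (string) $xml->Person[0]->holdsAccount->OnlineAccount->accountName[0];
    $lines = array('digraph foaf {');
    $count = count($xml->Person);
    for ($i = 1; $i < $count; $i++) {
        $friend = (string) $xml->Person[$i]->holdsAccount->OnlineAccount->accountName[0];
        if ($friend === '') {
            continue; // no account name, nothing to draw
        }
        $lines[] = '  "' . $owner . '" -> "' . $friend . '";';
    }
    $lines[] = '}';
    return implode("\n", $lines) . "\n";
}

// Simplified, made-up sample data.
$sample = '<RDF>'
    . '<Person><holdsAccount><OnlineAccount><accountName>alice</accountName></OnlineAccount></holdsAccount></Person>'
    . '<Person><holdsAccount><OnlineAccount><accountName>bob</accountName></OnlineAccount></holdsAccount></Person>'
    . '<Person><holdsAccount><OnlineAccount><accountName>carol</accountName></OnlineAccount></holdsAccount></Person>'
    . '</RDF>';

print foafToDot(new SimpleXMLElement($sample));
```

The output can be saved to a file and handed to GraphViz’s dot command to draw the graph.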
Unfortunately, I have a lot of friends on Identi.ca, and most of them have lots of friends as well, and the graph became unmanageable. I kicked around building some filters to only track special friends, but didn’t come up with anything good, so I set aside the Identi.ca graphing.
MyBlogLog also provides FOAF files. In addition, the MyBlogLog FOAF files include links to other services that users have specified. Unfortunately, the MyBlogLog FOAF files use namespaces, which complicates the parsing. In addition, I probably have even more friends on MyBlogLog than I do on Identi.ca, so I set that aside.
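For anyone who wants to push further with namespaced files, SimpleXML can reach prefixed elements through its children() method. This is only a sketch: the sample document below is made up and is not the actual MyBlogLog layout, though the FOAF namespace URI is the standard one.

```php
<?php
// Made-up sample with prefixed elements; not a real MyBlogLog file.
$xmlstr = '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"'
    . ' xmlns:foaf="http://xmlns.com/foaf/0.1/">'
    . '<foaf:Person><foaf:nick>alice</foaf:nick></foaf:Person>'
    . '<foaf:Person><foaf:nick>bob</foaf:nick></foaf:Person>'
    . '</rdf:RDF>';

$foaf = 'http://xmlns.com/foaf/0.1/';
$xml = new SimpleXMLElement($xmlstr);

// Plain property access does not see the prefixed elements.
$plainCount = count($xml->Person);

// Asking for children in the FOAF namespace does.
$nicks = array();
foreach ($xml->children($foaf)->Person as $person) {
    $nicks[] = (string) $person->children($foaf)->nick;
}

print "Without namespace: " . $plainCount . " people\n";
print "With namespace: " . implode(", ", $nicks) . "\n";
```

The extra children() call at every level is the complication I was referring to.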
Which takes me to Twitter. Twitter also gives you the ability to extract information in XML. As an example, you can get my most recent 100 friends on Twitter, including their name, screen name, location, description, and most recent status. For the status, there is information such as what it says, when it was created, what tool was used, etc.
As I noted, you can get up to 100 friends’ worth of statuses at a time. If you have lots of friends, you need to loop through the pages until you have them all.
So, I used the curl and SimpleXML processing above, together with some extra looping to pull all the statuses. With that, here is the PHP program that I used:
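<?php
// Page through the friends list, printing "name : status" for each friend.
$page = 1;
while (1) {
    // Fetch one page of friends.
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, 'http://twitter.com/statuses/friends/ahynes1.xml?page=' . $page);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    $xmlstr = curl_exec($ch);
    curl_close($ch);

    // Stop if the response is not valid XML.
    try {
        $xml = new SimpleXMLElement($xmlstr);
    } catch (Exception $e) {
        exit;
    }

    // Stop when a page comes back with no users.
    if (!isset($xml->user[0])) exit;

    // Walk the users on this page until we run out.
    $i = 0;
    while (isset($xml->user[$i]) && $xml->user[$i]->name != '') {
        $uname = $xml->user[$i]->name;
        $status = $xml->user[$i]->status->text;
        print $uname . " : " . $status . "\n";
        $i = $i + 1;
    }
    $page = $page + 1;
}
?>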
As you can see, you simply put the name of the person you want in the URL and off you go. Caveats: You don’t need to log in to Twitter to be able to do this, and you can do it for anyone, provided the people they follow don’t have their Tweets protected. However, you will get rate limited if you try to fetch a lot of pages in a short time.
What I did was save the results to a file that you can see here. The next step was to paste the text into Wordle.net. I then took a screen print of the page and saved it as an image. I could probably search around for some other word cloud software and do that as part of the process, but this is good enough for now.
A minor change and this could be used to show the description of the people that I follow, or the people that follow me. Someone else has already set up a word cloud generator like that, and you can see the word cloud of the bios of people that follow me at TwitterSheep.
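If you wanted to keep more of the process in PHP, the counting half of a word cloud is easy enough to sketch. This is not what Wordle does internally, just a simple word tally; the function name wordCounts, the minimum word length, and the sample text are all my own assumptions.

```php
<?php
// Hypothetical helper: count how often each word appears, ignoring
// short words, and return the counts with the most frequent first.
function wordCounts($text, $minLength = 4) {
    // Lowercase, then split on anything that is not a letter, digit or apostrophe.
    $words = preg_split("/[^a-z0-9']+/", strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    $counts = array();
    foreach ($words as $word) {
        if (strlen($word) < $minLength) {
            continue;
        }
        $counts[$word] = isset($counts[$word]) ? $counts[$word] + 1 : 1;
    }
    arsort($counts); // highest counts first, keys preserved
    return $counts;
}

// Made-up sample in the same "name : status" shape the program prints.
$text = "Alice : Coffee first, then more coffee\nBob : Writing some code and drinking coffee\n";

foreach (wordCounts($text) as $word => $count) {
    print $word . " " . $count . "\n";
}
```

A drawing step would still be needed to turn the counts into an image, which is why pasting into Wordle.net was the quicker route for me.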
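As a sketch of that minor change, the only thing that differs is which field gets printed for each user. The sample XML below is made up, and I am assuming the element is called description, to match the fields the friends listing returns.

```php
<?php
// Made-up sample shaped like one page of the friends listing.
$xmlstr = '<users>'
    . '<user><name>Alice</name><description>Knitter and coder</description>'
    . '<status><text>Hello world</text></status></user>'
    . '<user><name>Bob</name><description>Blogger in Connecticut</description>'
    . '<status><text>Good morning</text></status></user>'
    . '</users>';

$xml = new SimpleXMLElement($xmlstr);

// Collect "name : description" instead of "name : status".
$lines = array();
foreach ($xml->user as $user) {
    $lines[] = $user->name . " : " . $user->description;
}

print implode("\n", $lines) . "\n";
```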
So, with that, here is my Friday evening word cloud of statuses of the people that I am following, thanks to a little PHP using curl and SimpleXML as well as the word cloud software at Wordle.net: