Tweet! TWEET! Yes, I’ve gotten myself into Twitter way too much lately. And with that, I think I’m ready to release my Twitter bot.
It’s under the GPL, so feel free to use it for things.
Here it is: Twitter Bot.
And a copy of the GPL v3.
October 11, 2008
A little while ago I built a Twitter bot called AcroPoll. It’s a fairly simple bot: all it does is generate a short string of characters randomly selected from the alphabet and tweet it. It’s part of a game that my friends and I play where someone makes up an acronym and the others rattle off words that fit. We were finding that it picked Z, X and Y just as frequently as the other letters. This posed a problem, since everyone “knows” there are fewer words that start with Z, X and Y than with the rest. Right? Well, sure, if you throw Q in there also.
With the help of some friends I’ve compiled a count of words by first letter. (Of course, this is for US English.)
s : 31675 10.856%
c : 25994 8.909%
p : 23936 8.204%
a : 17704 6.068%
m : 17330 5.940%
d : 16463 5.643%
b : 16076 5.510%
r : 15406 5.280%
t : 15127 5.185%
h : 11510 3.945%
e : 11457 3.927%
f : 10441 3.579%
i : 10346 3.546%
g : 9899 3.393%
u : 9272 3.178%
l : 9263 3.175%
o : 9092 3.116%
n : 7445 2.552%
w : 6584 2.257%
v : 4747 1.627%
k : 4484 1.537%
j : 3041 1.042%
z : 1464 0.502%
q : 1446 0.496%
y : 1206 0.413%
x : 355 0.122%
I think what I find interesting is how many letters land in the 3% range: eight of them, from H down to O. It makes sense when you think about it, but actually seeing it is something else.
I started out with an alphabet much like this:
$alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
But after reworking based on the above data I’ve come up with this:
$alphabet = "SSSSSCCCCCPPPPPAAAAAMMMMDDDDBBBBRRRRTTTTEEEHHHFFFIIIGGGUUULLLOOONNWWVVKKJJZQYX";
I started at the bottom and worked my way up from X to S. Each step up to a higher group earned one more vote. So Q only gets one letter in my modified alphabet because so few words start with it, while T gets four because it has many more. I did group everything above 6% together because I didn’t want S to come up a lot, even though there are more words for it.
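Here is a minimal sketch of how that weighted alphabet could be rebuilt from the word counts and then used to pick letters. The tier cut-offs in votes_for_share() are my reading of the final string, not something taken from the bot's source, and the function names are made up for illustration. random_int() needs PHP 7 or later.

```php
<?php
// Word counts by first letter, taken from the table in the post.
$counts = [
    's' => 31675, 'c' => 25994, 'p' => 23936, 'a' => 17704, 'm' => 17330,
    'd' => 16463, 'b' => 16076, 'r' => 15406, 't' => 15127, 'e' => 11457,
    'h' => 11510, 'f' => 10441, 'i' => 10346, 'g' => 9899,  'u' => 9272,
    'l' => 9263,  'o' => 9092,  'n' => 7445,  'w' => 6584,  'v' => 4747,
    'k' => 4484,  'j' => 3041,  'z' => 1464,  'q' => 1446,  'y' => 1206,
    'x' => 355,
];

// Hypothetical helper: turn a letter's share of all words (in percent)
// into a number of "votes". The cut-offs are an assumption inferred from
// the finished alphabet string; everything at 6% or more is capped at 5.
function votes_for_share(float $percent): int
{
    if ($percent >= 6.0) return 5;
    if ($percent >= 5.0) return 4;
    if ($percent >= 3.0) return 3;
    if ($percent >= 1.0) return 2;
    return 1;
}

// Repeat each letter once per vote to build the weighted alphabet.
function build_alphabet(array $counts): string
{
    $total = array_sum($counts);
    $alphabet = '';
    foreach ($counts as $letter => $count) {
        $votes = votes_for_share(100 * $count / $total);
        $alphabet .= str_repeat(strtoupper($letter), $votes);
    }
    return $alphabet;
}

// Pick $length characters at random; letters that appear more often in
// the weighted alphabet are proportionally more likely to be chosen.
function random_acronym(string $alphabet, int $length): string
{
    $out = '';
    $max = strlen($alphabet) - 1;
    for ($i = 0; $i < $length; $i++) {
        $out .= $alphabet[random_int(0, $max)];
    }
    return $out;
}

$alphabet = build_alphabet($counts);
echo random_acronym($alphabet, 4), "\n";
```

With this weighting S makes up 5 of the 78 slots (about 6.4%) and X makes up 1 (about 1.3%), so the rare letters still show up, just not as often as before.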
October 4, 2008
Twitter. We all love to hate it. I’ve been learning to love it more than hate it. The Twitter API is pretty fancy but does have a few limitations that can make it a little confusing. So as an experiment and a quest to work with the Twitter API, I’ve created a Twitter bot. My inspiration came from my friend and fellow tweeter Claude Nix. A little while ago he wrote a Twitter bot to scrape Whiskey Militia and tweet the latest deal that they have going. Well, WM has a sister site called Steep and Cheap. Now, there is already a Twitter bot for SAC called TweetSAC, but it updates infrequently and randomly. This led me to want to create my own for SAC. I thought: hey, I can do it better. So I did.
Goals
I had a few goals when I set out to create this bot. The bot must:
- Get the latest deal and tweet it.
- Use the Twitter API.
- Not use a database or flat file to store data.
- Use the least amount of bandwidth.
The bot
The bot is a simple PHP script run from a cron job on my webserver. Given the goals above, I have to grab some info from the SAC website. At first I just grabbed the whole page every time and stripped out the part I needed (the title element). That uses a fair bit of bandwidth and isn’t very elegant. I had thought that I could use HTTP headers to see if the page had changed, but the page changes every few seconds. With HTTP headers out, the next step was to limit the amount of data that was downloaded. cURL has a fancy option called CURLOPT_RANGE, which lets you set how many bytes you want to grab with a start and end point. Since the title element is near the start of the document, I can just grab the first kilobyte of data and parse that. The full page weight is about 430k, with about 17k of HTML. With cURL I can reduce the amount of data pulled down to about 1k (less if I play around with it some). Going from the full page down to 1k is a lot more elegant.
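The range trick could look roughly like the sketch below. This is not the bot's actual code: fetch_head() and extract_title() are names I made up, the URL is whatever page you point it at, and a server is free to ignore the Range header and send the whole page anyway, so the parser should not assume it only got 1k.

```php
<?php
// Hypothetical helper: ask the server for only the first $bytes bytes.
// CURLOPT_RANGE takes a "start-end" string; "0-1023" is the first kilobyte.
function fetch_head(string $url, int $bytes = 1024): ?string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);      // return the body instead of printing it
    curl_setopt($ch, CURLOPT_RANGE, '0-' . ($bytes - 1)); // byte range to request
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);                // don't hang the cron job
    $body = curl_exec($ch);
    return $body === false ? null : $body;
}

// Hypothetical helper: pull the text of the title element out of a
// (possibly truncated) chunk of HTML. Returns null if the closing tag
// did not make it into the chunk.
function extract_title(string $html): ?string
{
    if (preg_match('/<title[^>]*>(.*?)<\/title>/is', $html, $m)) {
        return trim(html_entity_decode($m[1], ENT_QUOTES, 'UTF-8'));
    }
    return null;
}

// Usage, with a placeholder URL:
// $chunk = fetch_head('http://example.com/');
// $deal  = $chunk === null ? null : extract_title($chunk);
```

If extract_title() comes back null, the simplest fallback is to bump the byte count and try again rather than guess at a partial title.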
I started out by storing the last deal in a text file alongside the script. Since I had some issues with the file and it didn’t seem like the best solution, I opted to lean on the Twitter API some more. I knew what the current deal was and how it was going to be tweeted. I could also get my last tweet from the Twitter API. So it was as simple as grabbing my last tweet and checking it against what I was about to tweet. If they matched, I wouldn’t tweet; if they didn’t, off to Twitter it went.
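The check itself is tiny. Here is a sketch of it under a couple of assumptions: the last tweet has already been fetched through the Twitter API by some other code (not shown, since the exact call depends on the API version), and whitespace is normalized before comparing in case the text comes back slightly different from how it was sent. The function names are made up.

```php
<?php
// Collapse runs of whitespace and trim, so small formatting differences
// between what was sent and what comes back don't count as a new deal.
function normalize_tweet(string $text): string
{
    return trim(preg_replace('/\s+/', ' ', $text));
}

// Decide whether to post. $lastTweet is null when the account has no
// tweets yet (or the lookup failed and you would rather post anyway).
function should_tweet(?string $lastTweet, string $candidate): bool
{
    if ($lastTweet === null) {
        return true;
    }
    return normalize_tweet($lastTweet) !== normalize_tweet($candidate);
}

// Usage, with hypothetical helpers standing in for the API calls:
// if (should_tweet($lastTweetFromApi, $deal)) { post_tweet($deal); }
```

The nice side effect of using the timeline as the "database" is that the script keeps no state of its own, which is what the no-flat-file goal was after.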
All in all, I learned more about cURL, the Twitter API, cron and PHP. I’m planning on releasing a GPL version of my bot soon.
If you want to check out the Twitter User that my bot tweets to you can find it at: twitter.com/sandctweet.
August 15, 2008