Twitter Bot
Twitter. We all love to hate it. I’ve been learning to love it more than hate it. The Twitter API is pretty fancy but has a few limitations that can make it a little confusing. So as an experiment, and as a quest to work with the Twitter API, I’ve created a Twitter bot. My inspiration came from my friend and fellow tweeter Claude Nix. A little while ago he wrote a Twitter bot to scrape Whiskey Militia and tweet the latest deal they have going. Well, WM has a sister site called Steep and Cheap. There is already a Twitter bot for SAC called TweetSAC, but it updates infrequently and at random intervals. This led me to want to create my own for SAC. I thought, hey, I can do it better. So I did.
Goals
I had a few goals when setting out to create this bot. The bot must:
- Get the latest deal and tweet it.
- Use the Twitter API.
- Not use a database or flat file to store data.
- Use as little bandwidth as possible.
The bot
The bot is a simple PHP script run from a cron job on my webserver. Given the goals above, I have to grab some info from the SAC website. At first I just grabbed the whole page every time and stripped out the part I needed (the title element). That uses a fair bit of bandwidth and isn’t very elegant. I had thought I could use HTTP headers to see whether the page had changed, but the page changes every few seconds. With HTTP headers out, the next step was to limit the amount of data downloaded. cURL has a fancy option called CURLOPT_RANGE, which lets you request a specific byte range by giving a start and end point. Since the title element is near the start of the document, I can grab just the first kilobyte of data and parse that. The full page weighs about 430 KB, of which about 17 KB is HTML. With cURL I can cut the download to about 1 KB (less if I play around with it some). Going from 17 KB of HTML down to 1 KB is quite elegant.
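The ranged fetch can be sketched in Python as follows. This is not the bot’s PHP code: it is a minimal sketch, assuming the server honors HTTP Range requests (which is what CURLOPT_RANGE relies on) and that the title element falls within the first 1024 bytes. The function names `build_range_request`, `extract_title`, and `fetch_title` are hypothetical, and a regex is used only because a truncated page is not valid HTML for a full parser.

```python
import re
import urllib.request
from typing import Optional

# Matches the first <title>...</title> pair, case-insensitively and across newlines.
TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def build_range_request(url: str, nbytes: int = 1024) -> urllib.request.Request:
    """Build a GET request that asks for only the first `nbytes` bytes."""
    # HTTP byte ranges are inclusive, so 0..nbytes-1 covers exactly nbytes bytes
    # (the same "start-end" form that CURLOPT_RANGE takes).
    return urllib.request.Request(url, headers={"Range": f"bytes=0-{nbytes - 1}"})


def extract_title(fragment: bytes) -> Optional[str]:
    """Pull the <title> text out of a (possibly truncated) HTML fragment."""
    # The cut may land mid-character, so replace any undecodable trailing bytes.
    text = fragment.decode("utf-8", errors="replace")
    match = TITLE_RE.search(text)
    if match is None:
        return None  # title not within the fetched range
    # Collapse runs of whitespace and newlines into single spaces.
    return " ".join(match.group(1).split())


def fetch_title(url: str, nbytes: int = 1024) -> Optional[str]:
    """Fetch the start of a page and return its title (performs network I/O)."""
    req = build_range_request(url, nbytes)
    with urllib.request.urlopen(req, timeout=10) as resp:
        # A server that ignores Range replies 200 with the full body;
        # either way, we read at most nbytes from the response.
        fragment = resp.read(nbytes)
    return extract_title(fragment)
```

Usage would be something like `fetch_title(sac_url)`, where `sac_url` stands in for the SAC homepage address. If `extract_title` returns `None`, the title fell outside the range and the byte count needs raising.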
I started out by storing the last deal in a text file alongside the script. Since I had some issues with the file and it didn’t seem like the best solution, I opted to use the Twitter API some more. I knew what the current deal was and how it would be tweeted. I could also get my last tweet from the Twitter API. So it was simple to grab my last tweet and check it against what I was about to tweet. If they matched, I didn’t tweet; if they didn’t, off to Twitter it went.
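The "compare with my last tweet" check, which replaces the text file, can be sketched like this. It is a minimal sketch, not the bot’s actual code: the real Twitter API calls are left out, and `fetch_last_tweet` and `post_tweet` are hypothetical callables standing in for them, which also keeps the logic testable without network access.

```python
from typing import Callable, Optional


def post_if_new(
    candidate: str,
    fetch_last_tweet: Callable[[], Optional[str]],
    post_tweet: Callable[[str], None],
) -> bool:
    """Post `candidate` only if it differs from the account's most recent tweet.

    Twitter itself acts as the "database": no state is stored locally.
    Returns True if a tweet was posted, False if it was skipped as a duplicate.
    """
    last = fetch_last_tweet()  # None when the account has no tweets yet
    # Ignore leading/trailing whitespace so trivial differences don't cause reposts.
    if last is not None and last.strip() == candidate.strip():
        return False  # same deal as last time: stay quiet
    post_tweet(candidate)
    return True
```

One design note: if the tweet contains a link, the text the API hands back may not match what was sent byte for byte (for example, if links are rewritten), so comparing only the deal-title portion may be more robust.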
All in all, I learned more about cURL, the Twitter API, cron and PHP. I’m planning on releasing a GPL version of my bot soon.
If you want to check out the Twitter account my bot tweets to, you can find it at twitter.com/sandctweet.
