Andrew Jaswa

Twitter Bot

Twitter. We all love to hate it. I’ve been learning to love it more than hate it. The Twitter API is pretty fancy but has a few limitations that can make it a little confusing. So, as an experiment and an excuse to work with the Twitter API, I’ve created a Twitter bot. My inspiration came from my friend and fellow tweeter Claude Nix. A little while ago he wrote a Twitter bot to scrape Whiskey Militia and tweet the latest deal that they have going. Well, WM has a sister site called Steep and Cheap. There is already a Twitter bot for SAC called TweetSAC, but it updates infrequently and at random intervals. This led me to want to create my own for SAC. I thought: hey, I can do it better. So I did.

Goals

I had a few goals when I set out to create this bot. The bot must:

  1. Get the latest deal and tweet it.
  2. Use the Twitter API.
  3. Not use a database or flat file to store data.
  4. Use as little bandwidth as possible.

The bot

The bot is a simple PHP script run from a cron job on my webserver. To meet the goals above I have to grab some info from the SAC website. At first I just grabbed the whole page every time and stripped out the part I needed (the title element). That uses a fair bit of bandwidth and isn’t very elegant. I had thought that I could use HTTP headers to see if the page had changed, but the page changes every few seconds. With HTTP headers out, the next step was to limit the amount of data that was downloaded. cURL has a fancy option called CURLOPT_RANGE, which lets you request a range of bytes by giving a start and end point. Since the title element is near the start of the document, I can just grab the first kilobyte of data and parse that. The full page weight is about 430K, of which about 17K is HTML. With cURL I can reduce the amount of data pulled down to about 1K (less if I play around with it some). A reduction down to 1K is quite elegant.
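The partial-download idea can be sketched roughly as follows. This is not the bot's actual code: the bot is PHP and uses cURL, while this is a minimal Python sketch of the same technique using an HTTP Range header. The function names (build_range_request, extract_title, fetch_title) and the 1024-byte default are assumptions made for illustration.

```python
import re
import urllib.request


def build_range_request(url, n_bytes=1024):
    """Build a GET request that asks for only the first n_bytes of a page."""
    req = urllib.request.Request(url)
    # Same idea as cURL's CURLOPT_RANGE set to "0-1023": an HTTP Range header.
    req.add_header("Range", "bytes=0-%d" % (n_bytes - 1))
    return req


def extract_title(chunk):
    """Return the text of the first <title> element in a partial page, or None."""
    text = chunk.decode("utf-8", errors="replace")
    match = re.search(r"<title[^>]*>(.*?)</title>", text,
                      re.IGNORECASE | re.DOTALL)
    if match is None:
        # The title is missing, or the byte limit cut it off before </title>.
        return None
    # Collapse runs of whitespace and newlines into single spaces.
    return " ".join(match.group(1).split())


def fetch_title(url, n_bytes=1024):
    """Download at most n_bytes from url and pull the title out of them."""
    req = build_range_request(url, n_bytes)
    with urllib.request.urlopen(req, timeout=10) as resp:
        # A server may ignore Range and send the whole page, so cap the read.
        chunk = resp.read(n_bytes)
    return extract_title(chunk)
```

Calling fetch_title with the deal page's URL would return the title text, or None if the first kilobyte does not contain a complete title element. One caveat: a server is free to ignore a Range header and send the full page, which is why the sketch also caps the read on the client side.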

I started out by storing the last deal in a text file alongside the script. Since I had some issues with the file and it didn’t seem like the best solution, I opted to lean on the Twitter API some more. I knew what the current deal was and how it was going to be tweeted. I also knew what my last tweet was from the Twitter API. So it was as simple as grabbing my last tweet and checking it against what I was about to tweet. If they matched I wouldn’t tweet, but if they didn’t, off to Twitter it went.
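The "compare against my last tweet" check might look something like this. Again, this is a hedged Python sketch rather than the bot's PHP: the Twitter calls are left as hypothetical callables (fetch_deal, fetch_last_tweet, post_tweet) that the caller supplies, since the post does not show the API requests, and the whitespace normalization is an added assumption to avoid false mismatches.

```python
def normalize(text):
    """Collapse whitespace so trivial spacing differences do not count as changes."""
    return " ".join(text.split())


def should_tweet(candidate, last_tweet):
    """Return True when the candidate differs from the most recent tweet."""
    if last_tweet is None:
        # No previous tweet on the account, so anything is new.
        return True
    return normalize(candidate) != normalize(last_tweet)


def run_once(fetch_deal, fetch_last_tweet, post_tweet):
    """One cron run: tweet the current deal only if it has not been tweeted yet.

    All three arguments are hypothetical callables supplied by the caller:
    fetch_deal() returns the deal text or None, fetch_last_tweet() returns
    the account's latest tweet text or None, post_tweet(text) sends a tweet.
    """
    deal = fetch_deal()
    if deal is None:
        return False  # nothing usable was scraped on this run
    if should_tweet(deal, fetch_last_tweet()):
        post_tweet(deal)
        return True
    return False
```

Because the last tweet itself is the only stored state, no text file or database is needed, which is what goal 3 asks for. The trade-off is that the comparison only works if the tweet text comes back from Twitter in the same form it was sent.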

All in all I learned more about cURL, the Twitter API, cron and PHP. I’m planning on releasing a GPL version of my bot soon.

If you want to check out the Twitter User that my bot tweets to you can find it at: twitter.com/sandctweet.

category code
August 15, 2008

5 Comments »

  1. I’m curious, why are you scraping their site instead of getting the latest deal from their RSS feed and posting that to Twitter?

    Comment by jeremy — August 28, 2008 @ 5:27 pm

  2. Ahh, good question. I guess I forgot to mention that their RSS feed was very slow to update, even worse than the other Twitter bot. It might not be the case anymore, but it gave me the chance to learn the Twitter API and write some code.

    Comment by ajaswa — August 28, 2008 @ 5:31 pm

  3. It would be cool if your bot would separate deals for men and women. For instance, have deals for men go to twitter.com/sandcmen and deals for women to twitter.com/sandcwomen. Then if a deal is not gender specific it could go to both.

    Comment by jeremy — August 28, 2008 @ 5:33 pm

  4. Just wanted to pop in and say nice job. I wrote TweetSAC for almost the same reasons you did and it was a fun project. You are right, my bot sometimes gets stuck and it is because I have it running on an old PC that craps out on me. Another problem I found with scraping the site was ISP caching. Luckily I found that they have a different RSS feed that updates the instant they change the site and you can find it at http://www.steepandcheap.com/docs/steepcheap/rssplus.xml. That is the feed that their desktop app hits so you can pull from that if you want as well.

    Again, very nice job! If yours turns out to be more stable than mine then I will shut mine off and follow yours.

    Comment by Luke — September 3, 2008 @ 7:04 pm

  5. Thanks Luke! I have a few more modifications I need to make to it before I release it. I would love to get your feedback on it.

    Comment by ajaswa — September 5, 2008 @ 6:53 pm
