Andrew Jaswa

Twitter Bot

Twitter. We all love to hate it. I’ve been learning to love it more than hate it. The Twitter API is pretty fancy but does have a few limitations that can make it a little confusing. So as an experiment and a quest to work with the Twitter API, I’ve created a Twitter bot. My inspiration came from my friend and fellow tweeter Claude Nix. A little while ago he wrote a Twitter bot to scrape Whiskey Militia and tweet the latest deal that they have going. Well, WM has a sister site called Steep and Cheap. Now there is already a Twitter bot for SAC called TweetSAC, but it updates infrequently and randomly. This led me to want to create my own for SAC. I thought: hey, I can do it better. So I did.

Goals

I had a few goals when setting out in creating this bot. The bot must:

  1. Get the latest deal and tweet it.
  2. Use the Twitter API.
  3. Not use a database or flat file to store data.
  4. Use the least amount of bandwidth.

The bot

The bot is a simple PHP script run from a cron job on my webserver. Given the goals above, I have to grab some info from the SAC website. At first I just grabbed the whole page every time and stripped out the part I needed (the title element). This uses a fair bit of bandwidth and isn’t very elegant. I had thought that I could use HTTP headers to see if the page had changed, but the page changes every few seconds. With HTTP headers out, the next step was to limit the amount of data that was downloaded. cURL has a fancy option called CURLOPT_RANGE, which lets you request a specific range of bytes by giving a start and end point. Since the title element is near the start of the document, I can just grab the first kilobyte of data and parse that. The full page weight is about 430k, with about 17k of HTML. With cURL I can reduce the amount of data pulled down to about 1k (less if I play around with it some). Going from 17k of HTML down to 1k is quite elegant.
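The range trick can be sketched as follows. This is a minimal illustration in Python rather than the bot's actual PHP, and the function names (`build_range_request`, `extract_title`, `fetch_title`) are made up for the example. It assumes the server honors the HTTP `Range` header, which is the header that CURLOPT_RANGE sets; a server that ignores it sends the whole page, so the read is capped as well.

```python
import re
import urllib.request


def build_range_request(url, num_bytes=1024):
    """Build a request asking only for the first num_bytes bytes of url."""
    request = urllib.request.Request(url)
    # HTTP byte ranges are inclusive, so 0-1023 is the first kilobyte.
    request.add_header("Range", "bytes=0-%d" % (num_bytes - 1))
    return request


def extract_title(chunk):
    """Return the text of the <title> element in a partial HTML chunk, or None."""
    text = chunk.decode("utf-8", errors="replace")
    match = re.search(r"<title[^>]*>(.*?)</title>", text,
                      re.IGNORECASE | re.DOTALL)
    return match.group(1).strip() if match else None


def fetch_title(url, num_bytes=1024):
    """Download at most num_bytes from url and pull the title out of them."""
    request = build_range_request(url, num_bytes)
    with urllib.request.urlopen(request) as response:
        # Cap the read in case the server ignores the Range header.
        chunk = response.read(num_bytes)
    return extract_title(chunk)
```

If the title element ever moved past the first kilobyte, `extract_title` would return `None`, so a caller would want to treat that as "nothing to tweet" rather than as an empty deal.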

I started out by storing the last deal in a text file alongside the script. Since I had some issues with the file and it didn’t seem like the best solution, I opted to use the Twitter API some more. I knew what the current deal was and how that was going to be tweeted. I also knew what my last tweet was from the Twitter API. So it was as simple as grabbing my last tweet and checking it against what I was about to tweet. If they matched I wouldn’t tweet, but if they didn’t, off to Twitter it went.
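The comparison itself is tiny. Here is a sketch in Python (again, not the bot's PHP) with hypothetical helper names; how the last tweet is fetched from the Twitter API is deliberately left out, and the whitespace normalization is an assumption added for robustness, not something the original bot is known to do.

```python
def normalize(text):
    """Collapse runs of whitespace so trivial differences do not count."""
    return " ".join(text.split())


def should_tweet(candidate, last_tweet):
    """Return True when the candidate differs from the most recent tweet.

    last_tweet is None when the account has no tweets yet.
    """
    if last_tweet is None:
        return True
    return normalize(candidate) != normalize(last_tweet)
```

Using the timeline as the only record of state is what removes the need for a database or flat file: the last thing tweeted is the last thing seen.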

All in all, I learned more about cURL, the Twitter API, cron and PHP. I’m planning on releasing a GPL version of my bot soon.

If you want to check out the Twitter user that my bot tweets to, you can find it at: twitter.com/sandctweet.

category code
August 15, 2008

Font Survey

The reason

A while ago I got the idea to do some research on fonts and their usage across the internet. I was trying to figure out what fonts or typefaces people use and why. To me this is rather interesting because if you have any background in design or typography you’ll know that type conveys meaning and emotion. Would this not be true about the web? Could the typeface of a site convey something about the author or the message they are trying to get across? Maybe something they were feeling when they had it designed? Or maybe there is a corporate style guide that the designer had to follow when building the site? Or maybe that style guide was made with the idea of conveying emotion?

Whew… that’s a lot of questions I have. I’m not going to even try to answer them because, frankly, I can’t. I don’t know what was going through the designers’ heads. What I can do is survey websites and present the results.

The start

My initial survey was completed in early 2008 with about 100 sites. The sites I first selected were gleaned from the Alexa top 100 sites for the month of January 2008. Since I am from the US, speak English and am interested in western typefaces, I was only interested in English sites. It would be rather hard for me to try to figure out character sets other than Western/Latin. The rest of the sites I pulled were from sites I visit often.

This gave me a wide range of websites, from news and social networking to retail and design. I figured that 100 sites or so out of the whole internet would be a fair sample to kick things off. I also needed both high-traffic and low-traffic sites to get a better idea of how people use type on the web.

The process

I began by going into the CSS and pulling out the *, html and body selectors and seeing what those were set to. In a lot of cases one of those three selectors set the font for the entire site. Great! Job done! Well… sort of. Some sites didn’t have one of those selectors setting the font. So I had to dig some more. Some sites had it set on the p selector, some just had IDs and classes. I ended up going through lots of CSS, some of it nicely organized and some of it downright disgusting.

As anyone who has been working with CSS and browsers for a bit knows, for the best results you want to set more than one font in your CSS declarations. So seeing something like this was far from uncommon:
font-family:Arial, Helvetica, Verdana, sans-serif;
I collected all the font information I could because, who knows, it could be useful at some point. Most of the font stats in this post and in the survey are based on the first font.
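Splitting a declaration like the one above into its first font and its fallbacks can be sketched in Python as below. The function name is hypothetical and the survey itself may well have been collected by hand; the sketch also assumes no font name contains a comma.

```python
def parse_font_family(declaration):
    """Split a font-family declaration (or bare value) into a list of names."""
    _, separator, value = declaration.partition(":")
    if not separator:
        # No "font-family:" prefix, so treat the whole string as the value.
        value = declaration
    value = value.strip().rstrip(";")
    # Drop surrounding quotes from names such as "Lucida Grande".
    return [name.strip().strip("\"'") for name in value.split(",") if name.strip()]


fonts = parse_font_family("font-family:Arial, Helvetica, Verdana, sans-serif;")
primary, fallbacks = fonts[0], fonts[1:]
```

Here `primary` is `"Arial"`, which is the value the stats in this post are based on, and `fallbacks` holds the remaining three entries.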

The odd bits

While going through the sites I noticed that some sites would use one type for headings, another for body text and yet another for their footer. In the case of Coudal Partners out of Chicago, they use Gill Sans for their H1, Times for the rest of their headings and Verdana for most of everything else. Now this puts me in a tight spot. All three faces are in the site, so I can’t simply lump the site into one category or group. It got me thinking about what people “would/should/could” be reading the most.

I settled on going with what a majority of the text was set to. In the case of Coudal I settled on Verdana. Why? Because my thought was thus: if I (the user) am going to read, I’m going to read the majority of the text, so I’m going to see that face the most. Verdana was used for a majority of the text in this case. I followed this same thinking for all the other sites I collected data on.

So why did Coudal use three typefaces on their site? I’m not sure, but I bet it has something to do with a question I asked before: Could the typeface of a site convey something about the author or the message they are trying to get across?

The interesting bits

One of the more interesting bits I found was the unbalanced serif to sans-serif ratio. In a sample of 112 sites, 10 sites (8.93%) used serif fonts. Of the sites that used sans-serif fonts, Arial came out on top with 46 sites. 35 sites had 1 primary font and 2 secondary fonts. 27 had 4 total fonts set. This one amazed me: 1 site (reference.com) had 8 total fonts set: "Lucida Sans Unicode", "Arial Unicode MS", "Lucida Sans", "Lucida Grande", Verdana, Helvetica, Arial, sans-serif;
This blew me away. Why would anyone want to set 8 fonts?
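For anyone checking the numbers, the arithmetic behind those figures is simple enough to sketch in Python. The helper name is made up for the example; the counts are the ones quoted above.

```python
def share(count, total):
    """Return count as a percentage of total, rounded to two decimals."""
    return round(count / total * 100, 2)


TOTAL_SITES = 112

serif_share = share(10, TOTAL_SITES)   # sites whose first font is a serif
arial_share = share(46, TOTAL_SITES)   # sites whose first font is Arial

# The reference.com stack, split on commas, has eight entries.
reference_stack = ('"Lucida Sans Unicode", "Arial Unicode MS", "Lucida Sans", '
                   '"Lucida Grande", Verdana, Helvetica, Arial, sans-serif')
stack_size = len(reference_stack.split(","))
```

This gives 8.93 for the serif share and 41.07 for Arial's share of all 112 sites, and confirms the count of 8 for the reference.com stack (7 named fonts plus the generic sans-serif fallback).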

Check out the survey

July 16, 2008

Baseline: Markup

The other week I posted a CSS Baseline. So I’ve decided to create its counterpart: a Markup Baseline. I put some thought into whether I should create a markup baseline in the first place. I can’t find any other attempts to create something like this. I believe this is due to the issues I ran into when creating this baseline.

Issues

Purpose

Well-formed markup (semantic markup, that is) is based on the content. Content is usually based on the purpose of a site. So how would you make a baseline for all the different kinds of content? You might just end up with a baseline for every different type of website out there. There would be millions of baselines then. And that wouldn’t be very productive.

doctype

Different types of sites may require different doctypes. I thought about making some PHP functions that would switch the doctype based upon your preference. That seemed a bit more like a framework and out of the scope of this project.

The Baseline

I’ve noticed over the years that I have been creating websites in a similar fashion. Websites I’ve created lately have followed a set of ideas that I’ve concocted. The basics of these ideas start with the structure of the web page. You can usually distill web page structure down to 4 major areas: the header, navigation, the main content area and the footer. Now this doesn’t work in all cases but it should work for most. Again this is an issue with the purpose of the site or page. However most sites will have these 4 elements. In the end I settled on the basic structure of 4 major elements and the XHTML 1.0 Strict doctype.
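To make the shape concrete, here is a small Python sketch that prints a page skeleton with the XHTML 1.0 Strict doctype and the 4 areas. It is an illustration only, not the linked Markup Baseline: the `id` values, the use of `div` elements and the function name are assumptions made for the example.

```python
from html import escape

XHTML_1_0_STRICT = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
    '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">'
)

# Hypothetical ids for the four major areas, in document order.
AREAS = ["header", "navigation", "content", "footer"]


def baseline_page(title):
    """Return a minimal XHTML 1.0 Strict skeleton with the four areas."""
    lines = [
        XHTML_1_0_STRICT,
        '<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">',
        "<head>",
        '  <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />',
        "  <title>%s</title>" % escape(title),
        "</head>",
        "<body>",
    ]
    for area in AREAS:
        lines.append('  <div id="%s"></div>' % area)
    lines += ["</body>", "</html>"]
    return "\n".join(lines)


page = baseline_page("Markup Baseline")
```

Printing `page` gives a starting point that can be filled in per site, which fits the idea of a baseline rather than a framework.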

Markup Baseline

June 11, 2008