Hung Truong: The Blog!

Conditional GETs in App Engine

December 01, 2010 | 1 Minute Read

I’m currently working on an app in Google App Engine that polls feeds periodically and then does stuff with them. I suppose I could use that PubSubHubbub thingy, but I have a feeling that most feeds aren’t using it yet.

Anyway, I did a quick naive implementation that polls about every hour. Apparently the feed parser I’m using is pretty inefficient, because it’s eating up a lot of resources (relatively speaking) on App Engine. I remembered that HTTP is pretty smart, and there’s a way to figure out whether something has changed since the last time you grabbed it: the conditional GET.
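To make the mechanism a bit more concrete, here is a simplified sketch of the decision a server makes when it gets a conditional GET. It is written in Python 3, and the function name `not_modified` and its arguments are made up for illustration. It only does exact ETag matching and skips details like weak validators, so treat it as a rough picture rather than what any real server does.

```python
from email.utils import parsedate_to_datetime


def not_modified(request_headers, current_etag, current_last_modified):
    """Return True if the server could answer 304 Not Modified.

    Simplified: exact ETag match only (no weak validators or "*"),
    and If-None-Match wins over If-Modified-Since when both are sent.
    """
    client_etag = request_headers.get("If-None-Match")
    if client_etag is not None:
        return client_etag == current_etag

    since = request_headers.get("If-Modified-Since")
    if since is not None and current_last_modified is not None:
        # Both values are HTTP dates, e.g. "Wed, 01 Dec 2010 10:00:00 GMT".
        return parsedate_to_datetime(current_last_modified) <= parsedate_to_datetime(since)

    # No validators sent: the client has nothing cached, so send the full body.
    return False
```

If this returns True, the server replies with a 304 and no body, which is what lets the client skip the expensive parsing step.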

Google’s urlfetch doesn’t seem to support conditional GETs (someone tell me if I am wrong). I looked around and found a few tutorials on how to accomplish this in Python using urllib2. The tutorials weren’t exactly what I wanted, so I had to change a few things here or there. Here’s a snippet of code that I’m using:

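import logging
import urllib2

feed = Feed.get()  # my feed object has an etag, last_modified and url property
req = urllib2.Request(feed.url)
# send back whatever validators we saved from the last successful fetch
if feed.etag:
    req.add_header("If-None-Match", feed.etag)
if feed.last_modified:
    req.add_header("If-Modified-Since", feed.last_modified)
try:
    url_handle = urllib2.urlopen(req)
except urllib2.HTTPError, e:
    # urllib2 raises on a 304, so "not modified" ends up here
    if e.code == 304:
        logging.info(e)  # just says 304 didn't change
    else:
        logging.warning(e)  # a real HTTP error; skip this poll
else:
    content = url_handle.read()
    headers = url_handle.info()
    # remember the validators for the next poll
    feed.etag = headers.getheader("ETag")
    feed.last_modified = headers.getheader("Last-Modified")
    feed.put()
    dostuffwith(content)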

This handles my use case, which is doing work if the feed is new and ignoring it if it hasn’t been modified. I could probably wrap this into a function that returns False if the feed hasn’t changed, and the content if it’s new… I’ll probably do that next.
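A rough sketch of what that wrapper might look like, written in Python 3 (where urllib2 became `urllib.request`). The name `fetch_if_changed` and the `opener` parameter are assumptions for illustration. It also returns the new ETag and Last-Modified values along with the content, so the caller can save them for the next poll.

```python
import urllib.error
import urllib.request


def fetch_if_changed(url, etag=None, last_modified=None,
                     opener=urllib.request.urlopen):
    """Return (content, etag, last_modified), or False on a 304."""
    req = urllib.request.Request(url)
    if etag:
        req.add_header("If-None-Match", etag)
    if last_modified:
        req.add_header("If-Modified-Since", last_modified)
    try:
        response = opener(req)
    except urllib.error.HTTPError as e:
        if e.code == 304:
            return False  # not modified since the last poll
        raise  # any other HTTP error is a real problem
    headers = response.info()
    return response.read(), headers.get("ETag"), headers.get("Last-Modified")
```

The caller would then check the result: if it is False, there is nothing to do; otherwise it unpacks the tuple, stores the two validators on the feed object, and hands the content to the parser. The `opener` argument is only there so the function can be exercised without hitting the network.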