Lately I’ve been playing around a lot with Google App Engine. I think it’s finally at a point where it makes sense to develop for it, and it can be fun and profitable to do so. For example, I wrote Instascriber on App Engine, and so far it has cost me a few pennies in CPU cycles (and like $2 for a domain name or something).
Playing around with memcaching and stuff with Instascriber has led me to become a little obsessed with efficiency (not that I wasn’t already). I took a look at Mapskrieg and found that it was performing pretty slowly. My Google Webmaster Tools thing was saying it took on average 6-8 seconds to load! I wanted to decrease that number, and I figured I could make it work on App Engine, so I started working on it last week.
I modified the parser I use to grab craigslist listings to also send them to Mapskrieg. Then I basically rewrote all of the logic that existed in PHP and ported it over to App Engine. This took a little while but it wasn’t too difficult since Mapskrieg is a pretty simple web app. I hardly changed any code in the Maps API implementation, though I’m thinking of moving to v3 (which is apparently supposed to be faster).
I just switched Mapskrieg's hosting from my PHP-based MediaTemple server to App Engine. So far the results look good. I think Mapskrieg is faster in terms of response time, and the caching is definitely smarter than it was before (memcache entries that only get invalidated when their content changes, versus time-based cache expiration).
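The invalidate-on-write idea can be sketched like this. To keep it self-contained, a plain dict stands in for App Engine's memcache (the real code would use google.appengine.api.memcache), and the function and data names are all hypothetical:

```python
listings = {"sf": ["2br in the Mission"]}  # stand-in for datastore contents
cache = {}  # stand-in for memcache

def render_listings(city):
    # Pretend this is the expensive part: datastore query plus templating.
    return "<ul>%s</ul>" % "".join("<li>%s</li>" % l for l in listings.get(city, []))

def get_listings_page(city):
    # Serve the cached page if we have one; rebuild it otherwise.
    page = cache.get(city)
    if page is None:
        page = render_listings(city)
        cache[city] = page  # no expiry: the entry stays valid until the content changes
    return page

def add_listing(city, listing):
    listings.setdefault(city, []).append(listing)
    cache.pop(city, None)  # invalidate only when the content actually changes
```

The point versus a time-based expiry: the cache never serves stale pages and never rebuilds a page whose data didn't change.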
I’m going to watch closely to make sure nothing got borked in the transition, but so far so good. If I can keep Mapskrieg on App Engine, I might downgrade the MediaTemple server to save some money and maybe eventually move other services to App Engine as well.
One funny thing I noticed was the pricing plan for App Engine. I’m currently pruning listings from the database when they get too old or move past the limit of listings per listing type (because I’ll never show them). This is actually older behavior from when I had listing info in MySQL. Back then, the size of the database would affect the performance of selecting rows. Google’s datastore apparently does not get slower with respect to size. This is actually pretty awesome.
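The trimming logic is roughly this. A sketch over plain dicts (the real version deletes datastore entities; MAX_PER_TYPE and posted_at are hypothetical names):

```python
MAX_PER_TYPE = 100  # hypothetical cap on listings kept per listing type

def listings_to_prune(listings, max_per_type=MAX_PER_TYPE):
    # Sort newest-first by posting time and return everything past the cap;
    # those are the entities that would get deleted from the datastore.
    newest_first = sorted(listings, key=lambda l: l["posted_at"], reverse=True)
    return newest_first[max_per_type:]
```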
The weird thing is that adding and removing datastore objects costs quite a lot in CPU time. Additional CPU time past 6.5 hours is $.10 an hour. Storage, on the other hand, is $.01 per 2 gigabytes per day. From my usage, I'm predicting that the CPU cycles I'd use to trim the datastore listings would actually cost more than just leaving them in the datastore and paying for storage. Is that insane or what? I still have to test some things out, but it surprises me that storage would be that much cheaper than the CPU cycles to remove something.
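Some back-of-the-envelope numbers for that, with the per-delete CPU time and listing size as loud assumptions (I haven't measured either precisely):

```python
CPU_DOLLARS_PER_HOUR = 0.10              # billed rate past the free 6.5 hours
STORAGE_DOLLARS_PER_GB_DAY = 0.01 / 2.0  # $.01 per 2 GB per day

DELETE_CPU_SECONDS = 0.05                # assumption: CPU time to delete one listing
LISTING_SIZE_GB = 5.0 / (1024 ** 2)      # assumption: ~5 KB per listing

delete_cost = DELETE_CPU_SECONDS * CPU_DOLLARS_PER_HOUR / 3600.0
storage_cost_per_day = LISTING_SIZE_GB * STORAGE_DOLLARS_PER_GB_DAY

# Days a listing can sit in the datastore before storing it costs more
# than the one-time CPU hit of deleting it.
break_even_days = delete_cost / storage_cost_per_day
```

Under those assumptions, a stale listing would have to linger close to two months before deleting it pays for itself, which matches my hunch that trimming costs more than it saves.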
I’m currently working on an app in Google App Engine that polls feeds periodically and then does stuff with them. I suppose I could use that PubSubHubbub thingy, but I have a feeling most feeds aren’t using it yet.
Anyway, I did a quick naive implementation of polling about every hour or so. Apparently the feed parser I’m using is pretty inefficient, because it’s eating up a lot of resources (relatively speaking) on App Engine. I remembered that HTTP is pretty smart, and there’s a way to figure out if stuff has changed since the last time you grabbed it: conditional GETs with the ETag and Last-Modified headers.
Google’s urlfetch doesn’t seem to support conditional GETs (someone tell me if I’m wrong). I looked around and found a few tutorials on how to accomplish this in Python using urllib2. The tutorials weren’t exactly what I wanted, so I had to change a few things here and there. Here’s the snippet of code I’m using:
import logging
import urllib2

feed = Feed.get()  # my feed object has etag, last_modified and url properties
req = urllib2.Request(feed.url)
# Send the validators from the last fetch so the server can answer 304
if feed.etag:
    req.add_header("If-None-Match", feed.etag)
if feed.last_modified:
    req.add_header("If-Modified-Since", feed.last_modified)
try:
    url_handle = urllib2.urlopen(req)
    content = url_handle.read()
    headers = url_handle.info()
    feed.etag = headers.getheader("ETag")
    feed.last_modified = headers.getheader("Last-Modified")
except urllib2.HTTPError, e:
    logging.info(e)  # a 304 Not Modified raises HTTPError here; nothing changed
This handles my use case: do work if the feed is new, ignore it if it hasn’t been modified. I could probably wrap this into a function that returns False if the feed hasn’t changed and the content if it has… I’ll probably do that next.
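The header-building half of that wrapper is plain HTTP and separates cleanly from the urllib2 plumbing. A minimal sketch, with the function name being my own hypothetical choice:

```python
def conditional_headers(etag=None, last_modified=None):
    # Build the validators for a conditional GET: the server compares these
    # against the current resource and answers 304 if nothing has changed.
    headers = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers
```

The wrapper would add these headers to the urllib2.Request, return False when the HTTPError is a 304, and return the fresh content otherwise.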
Edit: Using the tutorial from this dude, I was able to get the Django “hey, you got the barebones working!” page working. I may have time to mess around with this more in a week or so.
I just wasted an hour trying to get Google App Engine and Django to work together on both my Mac and my PC. Both fail with errors when I try to run the dev server.
Should I assume I’m a bad computer scientist? Or should I assume that Google needs to do more testing/documentation before releasing stuff? Granted, I haven’t used Django before, but it should really work if I follow the directions correctly (and I did, twice)!
I would consider learning the webapp framework, but I’d rather learn something I could use on a server other than Google’s. I’ve been meaning to learn Django for a while now. I had some high hopes that I’d be able to get something cool running in a short amount of time, but I guess not.
People are saying this App Engine thing is a great competitor to Amazon’s EC2. Not quite yet. You can’t run any software you want on App Engine, you’re limited to Python for now, and from my experience, App Engine seems a bit half-baked at this point. What about cron jobs? What about running scripts? What about root access? I can’t see why Google didn’t just copy Amazon and let developers upload a Linux image or something.
My initial impressions are disappointing. Maybe once the semester is over, they’ll have a more stable version of this available…