Archive for the 'Code' Category

DOM Vs. SAX: Only One Will Survive

While working on iPhone stuff, we used a SAX parser to handle XML. I thought this was a wacky way of parsing stuff, and I preferred DOM since it is much easier to understand: the whole document is conveniently in tree form.

Today, I tried parsing a 39 MB XML file with Python’s minidom. Bad idea. It’s currently making my MacBook choke and it also made my desktop PC cry for memory. Apparently SAX is 1337 when it comes to memory efficiency, since it streams through the file one event at a time, while DOM is incredibly inefficient because it builds the whole tree in memory first. Seriously, Python was taking more than 1 GB of memory to parse a 39 MB file! Perhaps my actual use of the minidom stuff was incorrect, but either way, I’ll probably try doing stuff in SAX if the file happens to be even remotely big.

Lesson learned!

Update: SAX totally pwned those large files. All 8 of them! And in less than a minute!
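
For anyone curious what the SAX approach looks like, here is a minimal sketch using Python’s built-in xml.sax module. It is not the script I actually ran; the ElementCounter class, the count_elements function and the sample XML are made up for illustration. The point is that the handler only ever sees one event at a time, so nothing like a full tree gets built in memory.

```python
import io
import xml.sax


class ElementCounter(xml.sax.ContentHandler):
    """Hypothetical handler: tallies elements by tag name as they stream past."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def startElement(self, name, attrs):
        # Called once per opening tag; we keep a running tally, not the element.
        self.counts[name] = self.counts.get(name, 0) + 1


def count_elements(source):
    """Parse a filename or file-like object and return {tag name: count}."""
    handler = ElementCounter()
    xml.sax.parse(source, handler)
    return handler.counts


if __name__ == "__main__":
    # Tiny in-memory sample; for a big file, pass its path instead.
    sample = io.BytesIO(b"<library><book/><book/></library>")
    print(count_elements(sample))  # {'library': 1, 'book': 2}
```

The trade-off is that you have to track any state you care about yourself (like "am I inside a book element right now?"), which is exactly the part that felt wacky compared to walking a DOM tree.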

Hardcore Rails/Server Troubleshooting Session!

So early last night, I noticed that one of my sites, Anime Nano, was flipping out and throwing 502 proxy errors. I tried to do some simple troubleshooting, but the problem seemed to have come on very suddenly, and I figured it might go away just as suddenly. The thing about this particular problem was that I had made no changes to the server. So the problem should fix itself! Unfortunately, life is not so simple.

When I woke up in the morning, not only was the site still throwing errors, it was also affecting my other websites: MapsKrieg, Notecentric, Basugasubakuhatsu, etc. So I figured I should find out what was going on. I sent a support request to MediaTemple, my host, in case it might’ve been something on their end. It took a while for them to reply, but they eventually just told me that something was taking up all the memory and that I should try to optimize the site. Not too helpful. But I don’t really expect MediaTemple to provide this kind of support anyway. And it turned out it wasn’t their fault.

Upon closer inspection, the site was behaving really weirdly. I could tell by the logs that the site was actually rendering things, but every request took about 189 seconds. This was odd. I’ve had experiences before where the rendering took less than a second but the page still took more than 3 seconds to load. But 189 seconds! That was a bit too much.

I had suspected it was something to do with the mongrel cluster that I had set the site up to run on. Basically, I followed the script that MediaTemple provided and I still don’t have a great understanding of how mongrel cluster works. That’s definitely bad.

I tried stripping the view of everything but the content for layout. I got rid of before_filters and tried running the site on the Mongrel cluster as well as the WEBrick server on port 3000. The same thing happened. It took way too long for the site to load. Thinking it might be easier to test on my local machine, I got the site and database from SVN and MySQL and, strangely, it worked fine on my PC.

Now things were slowly fitting together. I was reminded of a post that someone made on Anime Nano about how their feed wasn’t being aggregated into the site. Apparently their feed was unreachable from my server. I had previously chalked this up to weirdness on their server, but I wondered if it wasn’t a problem on my server. The thing about each pageload taking almost exactly 189 seconds pointed to a timeout issue. Then I realized there was a piece of code in Anime Nano that I hadn’t written. It was a plugin for Text Link Ads that I used for some ads on the bottom of the site. That plugin was grabbing a feed of ads and parsing it to show links automatically on the site.

If the Text Link Ads site was unreachable from my server, it was possible that Anime Nano was just waiting to grab that feed on every pageload and timing out. And that’s exactly what happened. I commented out the code for the ads and the site immediately started working again. I’ll have to look into how I can prevent this from happening again before reinstating the ads.

So what did I learn from this whole experience? You should probably understand how code works before randomly integrating it into your site (I added that code a long, long time ago and forgot about it). Also, relying on a third party to provide some information before loading your site is just plain stupid. I can’t believe how badly I had thought out the organization of the site.
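
One way to keep a slow third party from holding a page hostage is to put a short timeout on the fetch and fall back to something harmless when it fails. Here is a rough sketch of that idea in Python (not the Ruby plugin’s actual code). The fetch_ads name, the 2-second timeout and the empty-string fallback are all assumptions picked for illustration; the opener parameter only exists so the function can be exercised without a real network.

```python
import urllib.request


def fetch_ads(url, timeout_seconds=2.0, fallback="", opener=urllib.request.urlopen):
    """Return the remote ad snippet, or `fallback` if the fetch fails or stalls.

    Hypothetical helper: the timeout and fallback values are illustrative.
    """
    try:
        # The timeout stops a dead remote host from blocking for minutes.
        with opener(url, timeout=timeout_seconds) as response:
            return response.read().decode("utf-8", errors="replace")
    except OSError:
        # URLError and socket timeouts are both OSError subclasses, so an
        # unreachable or slow ad server lands here and we render no ads.
        return fallback
```

Even with a timeout, it would probably be smarter to fetch the feed in a background job and cache it, so that no pageload ever waits on someone else’s server at all.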

Setting The Crontab Environment For Your Ruby on Rails Jobs

So today is my birthday. In honor of my birthday, I’ve been migrating some websites from my old server to a new one. Okay, it’s not really in honor of my birthday; I’m just doing it because I have some free time today.

Anyway, one of the things that Anime Nano needs to do is run a function from a Rails model on a schedule using a crontab. If that made no sense to you, don’t worry. Just enjoy the pretty picture of the Rails logo above. If that made sense to you, wow, you’re pretty hardcore.

This is something that I remember spending time getting to work the last time I set up my server. But unfortunately I didn’t write about how I did it, so I had to spend an hour or so figuring it out again. Apparently, jobs run from the crontab get a different environment than the root user’s shell. If you try to run a Ruby script using “ruby script/runner” you’ll just get nothing. No error message, just nothing. I did something on the previous server that set the crontab environment, but I have no idea what I did. I searched. I didn’t find the answer.

So what I ended up doing (this is for future Hung when he has to eventually move his sites to an even newer server) is sticking some lines for my environment into the crontab itself. I just ran ‘env’ and got a bunch of stuff. Comparing the output of ‘env’ run from the crontab itself (I just piped the output to a random text file) with the output from my shell, I saw a few differences. Most notably these ones:

GEM_HOME=/usr/local/rubygems/gems
GEM_PATH=/usr/local/rubygems/gems
RUBYLIB=/usr/local/rubygems/lib/ruby/site_ruby/1.8

And voila, my crontabbed rails script/runner model script thing ran. Hopefully writing this post will save me some time in the future.
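
If future Hung doesn’t feel like eyeballing two ‘env’ dumps again, the comparison can be scripted. This is just a sketch in Python, assuming each dump is plain NAME=value lines saved to text; the parse_env and env_differences names are made up for illustration, and values that span multiple lines are not handled.

```python
def parse_env(text):
    """Turn `env` output (one NAME=value per line) into a dict."""
    env = {}
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        if sep:  # skip lines without '=' (e.g. continuation lines)
            env[name] = value
    return env


def env_differences(shell_text, cron_text):
    """Map each variable that differs to (shell value, cron value or None)."""
    shell_env = parse_env(shell_text)
    cron_env = parse_env(cron_text)
    return {
        name: (value, cron_env.get(name))
        for name, value in shell_env.items()
        if cron_env.get(name) != value
    }
```

Anything that shows up with None on the cron side is a candidate for a NAME=value line at the top of the crontab.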

Oh, and happy birthday to me.

FileZilla: Update Whore

I actually like FileZilla as a good, free, open-source FTP client. But man, does it whore for upgrades! It seems like every single time I open the program, it’s got some kind of upgrade it wants to download.

Seriously, FileZilla is an FTP client. FTP is just a file transfer protocol (whoa, that’s what FTP stands for!). As far as I know, FTP really hasn’t changed much in the past few years. Oh, sure, we’ve got SFTP and FTP over SSL and stuff, but c’mon, how often do you really need to update the core software for FTP!?

What’s that? There’s a way to turn off automatic updates? Okay. But I can’t stand not having the latest version of any software! What’s a nerd to do?

Also, FileZilla: I don’t need to be told to send all bug reports to you every time I update the software. I got that the first time…

Fun Facebook Follies (Bugs)

One nice thing about constantly iterating your code is that you get new code out quickly to your users. The bad part is you probably don’t do enough testing and your users get code before it’s ready.

Facebook gives you a choice to mark notifications that you get as spam. This is for when your friends won’t stop sending you notifications that you’ve been bitten by werewolf-them. Apparently, you can also mark any notification as spam. Even, say, Facebook’s own wall feature. Facebook’s. Wall. Feature.

You know what would be cool? If Facebook developers weren’t so zealous that they skip sanity checks in order to get their pushes out sooner. Unless Facebook wants users to be able to mark their walls as spam. Which is actually understandable, now that I think of it…