Hung Truong: The Blog!

Facebook Information Download: Report Card

October 17, 2010 | 4 Minute Read

Ever since I signed up for Facebook, I’ve wanted a clean and easy way to export the content that my friends and I create on it. See this post for background. Facebook has never really made it easy to do so for end users, though they have an API that could theoretically be used for data export. Just recently, they announced a new feature that allows you to download your information. I gave the feature a test drive and took a look at what you actually get.

The process for grabbing your information is pretty simple. You go to Account Settings -> Download Your Information and then request your data. Facebook sends you an email when the zip file containing your stuff is ready.

The zip file contains a few files and directories: html, photos and videos. Photos contains the photos you’ve uploaded, videos contains the videos, and html contains things like a list of your friends, messages, notes, wall posts and events. These are all stored in html files, which makes it easy for normal users to view them.
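For anyone who wants a quick inventory of the archive before unzipping it, a few lines of Python will do. This is only a minimal sketch: the function name `summarize_archive` is made up for illustration, and it assumes nothing beyond the top-level layout described above (html, photos, videos). If your archive wraps everything in one extra folder, the counts will be grouped under that folder instead.

```python
import zipfile
from collections import Counter


def summarize_archive(zip_source):
    """Count the files under each top-level entry of a zip archive.

    zip_source may be a path or a file-like object.
    """
    counts = Counter()
    with zipfile.ZipFile(zip_source) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # skip directory entries, count only files
            # Everything before the first "/" is the top-level name.
            top_level = info.filename.split("/", 1)[0]
            counts[top_level] += 1
    return dict(counts)
```

Calling `summarize_archive` with the path to your downloaded zip should return a small dictionary mapping each top-level name to the number of files under it.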

From a data portability standpoint, it’s great that you can get all of your photos, videos and messages, etc. I like the fact that they’re in html that’s easy for anyone to browse. Since it works in its own self-contained directory structure, you could theoretically upload the contents to your own web server and host your albums yourself! For most users, the data download feature is really great.

From a programmer’s or hardcore archivist’s point of view, the data download is still lacking. For example, the friends list gives you only your friends’ names. It does not provide a unique identifier for each friend (e.g. their Facebook profile name or id number). That would be useful if you have a friend named “John Smith” and you’d like to know exactly which John it is. Generally, the files just don’t contain enough metadata to keep good records.
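The name-collision problem is easy to demonstrate. The sketch below assumes you have already pulled the names out of the friends file into a plain Python list (how you do that depends on the html, which is not covered here). The helper name `ambiguous_names` and the sample data are invented for illustration.

```python
from collections import Counter


def ambiguous_names(friend_names):
    """Return the names that appear more than once, with their counts."""
    counts = Counter(name.strip() for name in friend_names)
    return {name: n for name, n in counts.items() if n > 1}


# Made-up example data, not taken from a real download.
friends = ["John Smith", "Jane Doe", "John Smith"]
print(ambiguous_names(friends))  # {'John Smith': 2}
```

With names alone there is no way to tell the two entries apart, which is exactly what a unique identifier would solve.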

Let’s say that in the distant future, Facebook has been abandoned. What we have left are the .zip files that people used to download their information. How would we go about reconstructing network ties? With the files as they are today, we can only make assumptions using names, which aren’t unique identifiers. While some people would say that including those would be overkill, they could be pretty easily added via meta tags within the html (or in a separate xml file for hardcore nerds like me).
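To make the separate-xml-file idea concrete, here is a rough sketch of what such a file could look like and how it might be read. Everything about the format is hypothetical: the `friends` and `friend` element names, the `id` and `name` attributes, and the id values are all made up, and Facebook’s download contains nothing like this today.

```python
import xml.etree.ElementTree as ET

# Hypothetical sidecar file; the real download does not include one.
SAMPLE_XML = """
<friends>
  <friend id="1001" name="John Smith"/>
  <friend id="1002" name="John Smith"/>
  <friend id="1003" name="Jane Doe"/>
</friends>
"""


def load_friends(xml_text):
    """Map each friend's unique id to their display name."""
    root = ET.fromstring(xml_text)
    return {node.get("id"): node.get("name") for node in root.iter("friend")}


friends_by_id = load_friends(SAMPLE_XML)
# Keyed by id, the two John Smiths stay distinct people.
print(len(friends_by_id))
```

Keying on the id rather than the name is what would let someone match the same person across two different people’s archives.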

In addition to this lack of metadata, my other complaint is that Facebook only gives you half of your information. You can download a pretty ego-centric payload of data: stuff that friends wrote on your wall and photos you uploaded. You cannot, however, download things that you posted on your friends’ walls. This is especially important because Facebook’s early messaging model was based on wall-to-wall posts. Without the other half of those conversations, my oldest wall posts come without any sort of context at all.

Finally, it seems that Facebook did the unthinkable and deleted user data from before early 2006. I signed up for Facebook in January of 2005, yet the first wall post that shows up in my downloaded info is from February 2006. So there’s a whole year’s worth of wall messages missing. I suppose there’s no way for Facebook to retrieve this info anymore (unless they’re just withholding it because it’s hard to get to).

If you found this tl;dr, here’s a summary:

The Good:

Facebook information download makes it easy for anyone to download their data in a nicely organized and self-contained package. Everyone should go to their account settings right now and download their data, even if they don’t plan on using it for anything in the near future. The download provides a lot of information and is a good faith attempt at letting users “own” their data. There’s still work to be done, however.

The Bad:

The data that’s downloaded lacks enough meaningful metadata. Specifically, the data regarding connections between you and your friends is too general. Unique identifiers for friend connections would be a good first step. Facebook also omits an important side of your data: the stuff you’ve posted to others’ walls. Finally, Facebook’s data download only goes back to early 2006 (for me).

What Facebook Should Do Next:

I think Facebook’s done a really great job so far for this first iteration of data download. Now they should add more metadata and provide data in a cleaner xml or json format. After that, they should enable download of photos and video in their original format (I have a feeling Facebook keeps the original size photos before they resize them). I think that providing users with an easy way to download all of their data will lead to a better relationship and more trust, which is something Facebook could really use.