Hung Truong: The Blog!

Questions About Facebook And Data Portability

April 21, 2010 | 4 Minute Read

Something that is usually on my mind, either in the forefront or the back of it, is data portability. I like “owning” the data that I create, whether it’s from a tweet or a Facebook status message, or even something more mundane, like the bit of knowledge, via a scrobble, that I listened to Dexter Gordon’s Wave at 3:27am on August 16, 2007 (I’m thinking this is in GMT?). The point is that data that I create is my property. I would go as far as to say that the online interactions I have, like friends’ activity that I comment on and interact with, are also my property. For example, a friend’s status message that I reply to, and that ends up becoming a long thread, counts too.

Some services make gathering my data easy. Twitter has a dead simple API. I’ve been toying around with gathering location data from Twitter, from the photos I take on my phone (where it’s embedded), and from other sources. One website that still confuses me a bit with regards to its policies is Facebook.

About two years ago, there was a semi-large fuss made over Facebook joining a Data Portability organization. Apparently it was sort of a me-too move to copy MySpace. Since then, there’s been a Stream API created and you can actually grab stuff from your “activity stream” from Facebook.

The thing is, the rules are super vague and contradict each other. For example, the Statement of Rights and Responsibilities states that

You own all of the content and information you post on Facebook, and you can control how it is shared through your privacy and application settings.

This makes it seem like I have the right to collect that content and information that I create, since I “own” it and have control over it. Yet the Developer Policy states:

  • Storing and Using Data You Receive From Us
    1. You must not store or cache any data you receive from us for more than 24 hours unless doing so is permitted by the offline exception, or that data is explicitly designated as Storable Data.
    2. You must not give data you receive from us to any third party, including ad networks.
    3. You must not use user data you receive from us or collect through running an ad, including information you derive from your targeting criteria, for any purpose off of Facebook, without user consent.
    4. Unless authorized by us, your ads must not display user data – such as users’ names or profile photos – whether that data was obtained from us or otherwise.
    5. You cannot convert user data you receive from us into Independent Data (e.g., by pre-filling user information with data obtained from the API and then asking the user to save the data).
    6. Before making use of user data that may be protected by intellectual property rights (e.g., photos, videos), you must obtain permission from those who provided that data to us.
    7. You must not give your secret key to another party, unless that party is an agent acting on your behalf as an operator of your application, but you must never give your secret key to an ad network. You are responsible for all activities that occur under your account identifiers.

This basically says that I have to delete any information I gather within 24 hours of receiving it. Facebook is making the assumption here that users are not developers and vice versa. I’m not interested in gathering other users’ data; I just want my own. And yet here are two conflicting statements.
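
To make the 24-hour rule concrete, here’s a rough sketch in Python of what it would ask of a script that caches my own stream. This is not Facebook’s code or API: the cache layout, the `storable` flag, and the `purge_expired` name are all made up for illustration, and I’m ignoring the “offline exception” entirely.

```python
from datetime import datetime, timedelta, timezone

# The quoted policy forbids keeping data "for more than 24 hours".
MAX_AGE = timedelta(hours=24)


def purge_expired(cache, now):
    """Keep only the entries the quoted policy would still allow.

    `cache` maps a key to a dict with a `fetched_at` datetime and a
    `storable` flag (a hypothetical stand-in for "Storable Data").
    """
    return {
        key: entry
        for key, entry in cache.items()
        if entry["storable"] or now - entry["fetched_at"] <= MAX_AGE
    }


# Hypothetical cache contents, relative to a fixed "now".
now = datetime(2010, 4, 21, 12, 0, tzinfo=timezone.utc)
cache = {
    "status:1": {"fetched_at": now - timedelta(hours=30), "storable": False},
    "status:2": {"fetched_at": now - timedelta(hours=2), "storable": False},
    "user_id": {"fetched_at": now - timedelta(days=90), "storable": True},
}

kept = purge_expired(cache, now)
print(sorted(kept))  # ['status:2', 'user_id']
```

Under this reading, my own 30-hour-old status message has to go, even though I supposedly “own” it.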

I’ve already used the Facebook Stream API in the past to collect my data. While the policy states this is not allowed, it’s basically unenforceable. What bothers me a bit is that it is against policy for me to gather my own data using Facebook APIs. Twitter allows this, and even goes a step beyond by suggesting that developers cache data to improve performance. To their credit, Google has a “Data Liberation Front” whose purpose is to keep an eye on products and keep data import/export for users as a priority.

I see data portability as a big issue while considering the natural lifecycle of a social networking website. As I use Facebook less and less, I still want to have a connection with those who are on it, and I want to maintain a record of what happened. I hate to think that while I “own” this data, I have no right to access it, especially if I decide to leave the service.

I started writing this post before realizing that Facebook’s annual f8 conference is actually going on today! I guess I can look towards today’s news to see if anything has been announced re: data portability.

EDIT: Well, that was fast! I guess they removed the 24-hour limit thing during the keynote today. What this means for data portability is still up in the air, though.