For my SI 544: Stats class this semester, I worked with two cool dudes, Jim Laing and Sameer Halai. Our project involved using data gathered from a Facebook application to test a hypothesis about the perceived sociability of certain musical instruments.
If you recall, I wrote a blog post a few months ago about the viral vs. non-viral growth of Facebook applications that I had developed. One of those apps, Musical Instruments, lets you list which musical instruments you play. It’s kind of fun because some people play really whacked-out instruments (I play pianica and soprano trombone). I think playing instruments is typically a pretty social experience, which sort of led me to think about comparing the “sociability” of certain instruments to each other with the data gathered from this app.
Users input their instruments via an autocompleting text field. If an instrument already exists in the database (and at least 3 or so users have claimed it), it will autocomplete. In the above screenshot, I’ve typed “Trumpet” and you can see there are many different types of trumpet to choose from. A user can also type an instrument that doesn’t yet exist in the database and it’ll be added automatically. This kind of free vocabulary is nice because it doesn’t require an administrator to continuously approve new instruments.
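Here’s a rough sketch of how that autocomplete rule could work. It’s written in Python for illustration and is not the app’s actual code: the `InstrumentIndex` class, its method names, and the exact cutoff of 3 are all assumptions on my part.

```python
from collections import defaultdict

MIN_USERS = 3  # assumed cutoff; the rule above is only "at least 3 or so"


class InstrumentIndex:
    """Hypothetical free-vocabulary index with a popularity threshold."""

    def __init__(self):
        self._users = defaultdict(set)  # normalized name -> user IDs who claimed it
        self._display = {}              # normalized name -> first spelling seen

    def claim(self, user_id, name):
        """Record that a user plays an instrument, adding it if it is new."""
        key = name.strip().lower()
        self._display.setdefault(key, name.strip())
        self._users[key].add(user_id)

    def suggest(self, prefix, min_users=MIN_USERS):
        """Return instruments matching the prefix that enough users have claimed."""
        p = prefix.strip().lower()
        return sorted(
            self._display[key]
            for key, users in self._users.items()
            if key.startswith(p) and len(users) >= min_users
        )


index = InstrumentIndex()
for uid in (1, 2, 3):
    index.claim(uid, "Trumpet")
    index.claim(uid, "Trumpet (Pocket)")
index.claim(4, "Trumpophone")  # only one user so far, so it is stored but not suggested

suggestions = index.suggest("trump")
```

New instruments get stored right away, but they only show up as suggestions once enough distinct users have claimed them, which keeps one-off typos out of the dropdown.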
The data that the application has access to are:
- The user’s FBID (their unique Facebook user ID, in the form of a number)
- The instruments that the user claims to play
- The number of friends that the user has
We ended up getting 8603 rows of data (user/instrument pairs). After getting a bunch of free text instruments, we went to work classifying many of them into groups and subgroups. So Piccolo is in the group “Flute” and subgroup “Woodwind.”
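The classification step boils down to a lookup table from free-text names to a (group, subgroup) pair. Here’s a minimal Python sketch of that idea; only the Piccolo entry comes from our actual data, and the other entries and the `classify` helper are made up for illustration.

```python
# Free-text instrument name -> (group, subgroup).
# Only "piccolo" is taken from the project; the rest are illustrative guesses.
CLASSIFICATION = {
    "piccolo": ("Flute", "Woodwind"),
    "flute": ("Flute", "Woodwind"),
    "alto flute": ("Flute", "Woodwind"),
}


def classify(raw_name):
    """Return (group, subgroup) for a free-text instrument, or None if unclassified."""
    return CLASSIFICATION.get(raw_name.strip().lower())


piccolo = classify("  Piccolo ")
unknown = classify("Theremin")  # not in the table, so it stays unclassified
```

Normalizing case and whitespace before the lookup catches the easy duplicates; anything that still returns `None` has to be classified by hand or left out.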
We then created a survey asking people to rank 16 instruments in order of sociability. That is, which instruments’ players probably have more friends than others: people who play instrument x probably have more friends than people who play instrument y.
The survey results showed that people thought Vocalists had the most friends and that Guitar was pretty popular too.
From the application data and survey results, we formed a hypothesis: the instruments given a high sociability rank would also have statistically higher mean numbers of friends. So people who played Guitar would have more friends than people who played Flute.
First, we did some basic analysis of the data using R, the free stats program that we were using for class assignments and labs.
This figure shows a histogram of the number of friends. Basically, many people have 0-100 friends, fewer people have 101-200 friends, etc. This probably follows a power law curve, but we didn’t think it was important to find the alpha or anything for our purposes.
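For the curious, the binning behind a histogram like this is simple. We made ours in R, so the Python below is just a stand-in sketch: the `bin_label` helper and the friend counts are made up, and the bin edges follow the 0-100, 101-200 buckets described above.

```python
from collections import Counter


def bin_label(n_friends, width=100):
    """Map a friend count to a bucket label such as '0-100' or '101-200'."""
    if n_friends <= width:
        return f"0-{width}"
    low = ((n_friends - 1) // width) * width + 1
    return f"{low}-{low + width - 1}"


# Made-up friend counts, not the project data.
friend_counts = [0, 12, 87, 100, 101, 150, 200, 201, 1000]
histogram = Counter(bin_label(n) for n in friend_counts)
```

Each bar in the histogram is just the count for one of these labels.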
This is a graph of the mean number of friends, by instrument. This looks like a roughly normal distribution and it shows off the central limit theorem that Lada is always talking about in class.
This is just a boxplot of all of the classified instruments and their number of friends. There are some crazy outliers: people who have 1000 friends. From this boxplot, it’s hard to tell whether any of the differences between the means are actually statistically significant.
Finally, we ran pairwise t-tests on each pair of instruments. We could see that there was a significant difference in the mean number of friends for certain pairs, for example Guitar and Horn, Guitar and Oboe, and Guitar and Saxophone. Looking at the mean number of friends for these instruments, Saxophone players had on average 20 more friends than Guitar players. This is interesting because it runs against our hypothesis: Saxophone was ranked 10.7 (not very sociable) and Guitar was ranked very sociable.
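To give a feel for what a pairwise comparison computes, here’s a small sketch in Python. We ran our tests in R, so this is not our actual analysis: it uses Welch’s unequal-variance form of the t statistic as an assumption, the `welch_t` helper is hypothetical, and the friend counts are made up.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Return (t statistic, approximate degrees of freedom) for two samples."""
    se_a = variance(a) / len(a)  # squared standard error of each sample mean
    se_b = variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(se_a + se_b)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (se_a + se_b) ** 2 / (se_a ** 2 / (len(a) - 1) + se_b ** 2 / (len(b) - 1))
    return t, df


# Made-up friend counts per instrument, for illustration only.
friends = {
    "Guitar": [120, 150, 90, 200, 130],
    "Saxophone": [160, 180, 140, 210, 150],
    "Oboe": [100, 110, 95, 130, 105],
}

# One test per unordered pair of instruments.
results = {
    (x, y): welch_t(friends[x], friends[y])
    for x, y in combinations(friends, 2)
}
```

A negative t for (Guitar, Saxophone) means the Guitar sample has the lower mean. Turning t and the degrees of freedom into a p-value needs the t distribution, which a stats package provides, and with this many pairs the p-values should really be adjusted for multiple comparisons.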
The scope of this project was pretty small, and given some more time, I think we could’ve come up with some more interesting conclusions. Stuff like “is flute really a girly instrument?” by comparing the number of female flute players vs. male flute players, and “do guitar players get more chicks?” by looking at the relationship status of guitar players vs. something like trumpet players (personal burn!).
I was glad my Facebook app actually provided some interesting data. I’ve always been sort of skeptical of the ability of Facebook apps to be profitable, but I think the data that the apps provide is very valuable in the context of social network research. Anyway, I hope you found this post to be somewhat entertaining. I’ve also uploaded the project report and presentation slides in PDF if you want to check them out.
Many thanks to Jim and Sameer for sharing much of the work in this project. I ended up providing the data, formatting it, and giving the final presentation, so props to my teammates!