Hung Truong: The Blog!

Vulnerabilities In The Mechanical Turk Search For Jim Gray

February 03, 2007 | 3 Minute Read


I just read about a really awesome use for Amazon.com’s Mechanical Turk: finding missing people! From the Mechanical Turk’s page:

On Sunday, January 28th, 2007, Jim Gray, a renowned computer scientist was reported missing at sea. As of Thursday, Feb. 1st, the US Coast Guard has called off the search, having found no trace of the boat or any of its emergency equipment.

Follow the story here.

Through the generous efforts of his friends, family, various communities and agencies, detailed satellite imagery has been made available for his last known whereabouts.

You will be presented with 5 images. The task is to indicate any satellite images which contain any foreign objects in the water that may resemble Jim’s sailboat or parts of a boat.

Jim’s sailboat will show up as a regular object with sharp edges, white or nearly white, about 10 pixels long and 4 pixels wide in the image.

If in doubt, be conservative and mark the image. Marked images will be sent to a team of specialists who will determine if they contain information on the whereabouts of Jim Gray.

While I think it’s a really nifty and commendable idea, it bothers me how vulnerable this system seems to be, at least as I understand it. It seems that an image is sent on to others for investigation only if it is marked as containing a sailboat-like object. If it isn’t marked, it is discarded.

In computer science, you’re trained to think as an adversary to the algorithm. While I’d hope there aren’t any people malicious enough to try and sabotage this system, it’s always worth considering the possibility. I’ve thought of a few vulnerabilities that the Mechanical Turk faces in the search for Jim Gray:

Vulnerability 1

The adversary marks each picture as not containing anything of interest, even if it does. In the case that an adversary finds the picture with the sailboat, he and he alone can invalidate it. The data is never considered and the search fails.

I’m assuming a few things about the system. The biggest assumption is that each HIT will be viewed by only one person; the Mechanical Turk page doesn’t say either way. If two people are assigned to each HIT, the odds of this happening are lower, though the system can still be defeated by two adversaries working in tandem.
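To put a rough number on "the odds are lower," here is a minimal sketch in Python. It rests on assumptions of my own, not on anything the Mechanical Turk page states: a fixed fraction of workers are adversaries, reviewers for a HIT are picked independently, honest workers always mark the boat, and the image is lost only if every reviewer who sees it is an adversary. The function name and the 1% figure are made up for illustration.

```python
def miss_probability(adversary_fraction: float, reviewers_per_hit: int) -> float:
    """Chance that every reviewer of the one important image is an adversary.

    Assumes reviewers are chosen independently and that a single honest
    reviewer is enough to get the image marked.
    """
    if not 0.0 <= adversary_fraction <= 1.0:
        raise ValueError("adversary_fraction must be between 0 and 1")
    if reviewers_per_hit < 1:
        raise ValueError("reviewers_per_hit must be at least 1")
    return adversary_fraction ** reviewers_per_hit


if __name__ == "__main__":
    # Hypothetical: 1% of workers are adversaries.
    for k in (1, 2, 3):
        print(f"{k} reviewer(s): miss probability {miss_probability(0.01, k):.6f}")
```

Under these assumptions each extra reviewer multiplies the miss probability by the adversary fraction, which is why even a second reviewer helps a lot. Adversaries who coordinate to land on the same HIT break the independence assumption, which is exactly the "working in tandem" case.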

Vulnerability 2

The adversary marks every picture as interesting. An overwhelming amount of data is sent to the “team of specialists” and we’re back to square one. We can’t simply discard everything the adversary flagged, because one of those pictures could actually be useful.

This kind of attack seems much more malicious to me. At best, you could invalidate all of the adversary’s HITs and reassign them. That could cost precious time, though, since once a HIT is taken, it is locked so that other users can’t attempt it.
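Before you can invalidate an adversary’s HITs, you have to notice the adversary. One simple way, sketched below in Python, is to look at how often each worker marks images. This is my own illustration, not something Mechanical Turk is known to do: the `(worker_id, image_id, is_flagged)` record format, the function name, and both thresholds are hypothetical.

```python
from collections import defaultdict


def suspicious_workers(answers, max_flag_rate=0.5, min_answers=20):
    """Return worker ids whose share of marked images looks too high.

    answers: iterable of (worker_id, image_id, is_flagged) tuples.
    Workers with fewer than min_answers answers are skipped, since a
    handful of answers says little about their behavior.
    """
    totals = defaultdict(int)
    flagged = defaultdict(int)
    for worker_id, _image_id, is_flagged in answers:
        totals[worker_id] += 1
        if is_flagged:
            flagged[worker_id] += 1
    return sorted(
        worker
        for worker, count in totals.items()
        if count >= min_answers and flagged[worker] / count > max_flag_rate
    )
```

Since most of the ocean is empty, an honest worker should mark only a small share of images, so a worker who marks nearly everything stands out. The thresholds would need tuning against real data, and the instruction to "be conservative and mark the image" means honest workers will mark some empty images too.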

Vulnerability 3

The adversary randomly marks non-interesting pictures as interesting, and interesting pictures as non-interesting.

This is probably the most malicious and the hardest to detect. While the odds of finding an interesting picture to mark non-interesting would be low, it could happen.

I can’t think of a way to combat this kind of attack, apart from having two users work on each HIT. Again, this could still be defeated by two adversaries working together to dirty the results.
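Here is a rough Python sketch of what "two users work on each HIT" could look like in practice. It follows the task’s own "if in doubt, be conservative" rule: an image goes to the specialists if any reviewer marked it. The record format and the function name are hypothetical, and I have no idea whether the real system works this way.

```python
from collections import defaultdict


def triage(answers):
    """Combine several reviewers' answers per image.

    answers: iterable of (worker_id, image_id, is_flagged) tuples.
    Returns (forward, disputed):
      forward  - images marked by at least one reviewer
      disputed - images where reviewers disagreed
    """
    votes = defaultdict(list)
    for _worker_id, image_id, is_flagged in answers:
        votes[image_id].append(is_flagged)
    forward = sorted(img for img, v in votes.items() if any(v))
    disputed = sorted(img for img, v in votes.items() if any(v) and not all(v))
    return forward, disputed
```

With this rule, a single adversary can no longer hide the boat as long as the other reviewer is honest. It does nothing about the noise half of the attack, though: every randomly marked image still lands on the specialists’ desks. The disputed list at least shows where reviewers disagreed, which could help spot workers who disagree with everyone else unusually often.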

Conclusion

Even with these vulnerabilities, I think the Mechanical Turk’s search for Jim Gray is an incredible idea. I sincerely hope that they do find him. It would be a great victory for everyone involved. Please do join in the search for Jim Gray. Please don’t follow any of the adversarial techniques I mentioned!

I just hope that the people behind the Mechanical Turk are aware of these vulnerabilities, and have come up with solutions (that are better than mine) to combat any kind of malicious use that could undermine the entire project.

If you have any ideas to combat these vulnerabilities, or have any other ideas for vulnerabilities that I may have missed, please let me know in the comments. I’d be interested from a computer science point of view.