Comparing the locations of photos posted on the Internet with social network contacts, Cornell University computer scientists have found that as few as three "co-locations" for images at different times and places could predict with high probability that two people posting photos were socially connected.
The results have implications for online privacy, the researchers said, but also suggest a quantitative answer to a very old psychological question: What can we conclude from observing coincidences?
"This is a kind of question that goes way back," said Jon Kleinberg, Cornell professor of computer science, who conducted the study with Dan Huttenlocher, dean of the Faculty of Computing and Information Science, and colleagues. "Online data gives us new ways to address it," he said.
The study, "Inferring Social Ties from Geographic Coincidences," is reported online in the Proceedings of the National Academy of Sciences (Dec. 8, 2010). David Crandall, a former Cornell student and postdoctoral researcher now at Indiana University, is the lead author of the study.
The researchers used a database of some 38 million photos uploaded to the Flickr photo-sharing website by about a half million people. The time and place where each photo was taken were provided by GPS-equipped cameras or by people who used Flickr's online interface to indicate the location on a map. Anyone can read this information from a Flickr page.
Flickr also offers a social networking service, and computer analysis showed that when two people posted photos several times from the same locations (often famous landmarks) and at about the same times, this was a good predictor that those people would have a social network link.
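To make the idea of a "co-location" concrete, here is a minimal Python sketch of how such coincidences could be counted. It is an illustration under stated assumptions, not the researchers' actual code: the record layout `(user, lat, lon, day)`, the function names, and the default cell size and time window are all hypothetical. The sketch divides the map into grid cells and counts, for each pair of users, the number of distinct cells in which both took a photo within a given number of days of each other.

```python
from collections import defaultdict
from itertools import combinations
from math import floor


def cell_of(lat, lon, cell_deg):
    """Map a latitude/longitude to the index of its grid cell (hypothetical helper)."""
    return (floor(lat / cell_deg), floor(lon / cell_deg))


def count_colocations(photos, cell_deg=1.0, window_days=1.0):
    """Count distinct grid cells where two users took photos close together in time.

    photos: iterable of (user, lat, lon, day), with day a float number of days.
    This record layout is an assumption made for the sketch.
    Returns a dict mapping a sorted (user_a, user_b) pair to its co-location count.
    """
    # Group photo times by grid cell, then by user.
    by_cell = defaultdict(lambda: defaultdict(list))
    for user, lat, lon, day in photos:
        by_cell[cell_of(lat, lon, cell_deg)][user].append(day)

    counts = defaultdict(int)
    for users in by_cell.values():
        for a, b in combinations(sorted(users), 2):
            # A cell counts once per pair if any two photos fall within the window.
            if any(abs(ta - tb) <= window_days
                   for ta in users[a] for tb in users[b]):
                counts[(a, b)] += 1
    return dict(counts)


if __name__ == "__main__":
    sample = [
        ("alice", 48.858, 2.294, 10.0), ("bob", 48.859, 2.295, 10.2),
        ("alice", 40.689, -74.044, 50.0), ("bob", 40.6895, -74.045, 50.5),
        ("alice", 51.5007, -0.1246, 90.0), ("bob", 51.5008, -0.1247, 90.1),
        ("carol", 48.858, 2.294, 200.0),  # same place, very different time
    ]
    print(count_colocations(sample))
```

Shrinking `cell_deg` or `window_days` makes each coincidence harder to produce by chance, which is the intuition behind treating tighter co-locations as stronger evidence of a social tie.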
"It's not that you know with certainty, but it's a high likelihood that these people know each other," Huttenlocher said. As expected, the probability increases as the analysis moves to smaller areas and shorter time spans.
Flickr is just a convenient place to study the phenomenon, the researchers said. The same conclusions might be drawn from credit card purchases, fare card transactions on the bus and subway, and cell phone records, they suggested.
"It's surprising and not in a reassuring way that so much information comes from so little," Kleinberg said. "You go through life and leave all sorts of records. You're conveying information you deliberately wrote but also conveying broader information. Our research is trying to provide a way of quantifying these risks."
"While it's obvious that a photo you post online reveals information about what is pictured in the photo, what is less obvious is that as you post multiple photos you are probably revealing information which may not be pictured anywhere," Huttenlocher added.
One way to mitigate privacy risks, Kleinberg suggested, would be to "blur" time and space information in permanent records, making it less precise. This research might offer hints on how much blurring is needed, he said.
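As a rough illustration of what such blurring could look like, the Python sketch below snaps a record's coordinates to the corner of a coarse grid cell and its timestamp to the start of a time bucket. The function name, the record layout, and the default cell size and bucket length are assumptions made for this example; the article does not say how much blurring would be enough.

```python
from math import floor


def blur_record(lat, lon, timestamp_s, cell_deg=0.1, bucket_s=86400):
    """Return a less precise copy of a (lat, lon, timestamp) record.

    Coordinates are snapped to the south-west corner of a cell_deg grid cell,
    and the timestamp (seconds) to the start of a bucket_s-second bucket.
    All parameter names and defaults are hypothetical.
    """
    blurred_lat = round(floor(lat / cell_deg) * cell_deg, 6)
    blurred_lon = round(floor(lon / cell_deg) * cell_deg, 6)
    blurred_ts = (timestamp_s // bucket_s) * bucket_s
    return (blurred_lat, blurred_lon, blurred_ts)


if __name__ == "__main__":
    # Two photos taken near each other, a few hours apart on the same day,
    # become indistinguishable once blurred to 1-degree cells and 1-day buckets.
    print(blur_record(48.858, 2.294, 1291800000, cell_deg=1.0))
    print(blur_record(48.859, 2.295, 1291810000, cell_deg=1.0))
```

Larger cells and longer buckets hide more, at the cost of making the stored record less useful; choosing that trade-off is the question the research might help answer.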
The researchers recognized that the photo-sharing process might introduce some bias into their results. For example, people might seek out social contacts with others who had photographed the same site. To control for this, they compared photos posted after a certain date only with social links established before that date. They also controlled for the possibility that friends might upload the same photos, and for the fact that people with many social contacts on Flickr might be more likely to geotag their photos.
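The date-based control can be sketched in a few lines of Python. This is only an illustration of the idea as described here, not the study's procedure: the function name and the record layouts (`(user, lat, lon, posted_day)` for photos and `((user_a, user_b), link_day)` for social links) are hypothetical.

```python
def split_for_temporal_control(photos, links, cutoff_day):
    """Keep photos posted after the cutoff and links formed before it.

    photos: iterable of (user, lat, lon, posted_day)   (assumed layout)
    links:  iterable of ((user_a, user_b), link_day)   (assumed layout)
    Returns (later_photos, earlier_links), where earlier_links is a set of
    frozenset pairs so that (a, b) and (b, a) are treated as the same link.
    """
    later_photos = [p for p in photos if p[3] > cutoff_day]
    earlier_links = {frozenset(pair) for pair, day in links if day < cutoff_day}
    return later_photos, earlier_links


if __name__ == "__main__":
    photos = [("alice", 48.858, 2.294, 5.0), ("bob", 48.859, 2.295, 120.0)]
    links = [(("alice", "bob"), 30.0), (("bob", "carol"), 150.0)]
    later, earlier = split_for_temporal_control(photos, links, cutoff_day=100.0)
    print(later)    # only the photo posted after day 100
    print(earlier)  # only the link formed before day 100
```

Because the retained links all predate the retained photos, a link in this comparison cannot have been formed as a result of two people noticing each other's photos of the same site.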
"I think we've all wondered about questions like this, and there's an opportunity now to start making them precise," Kleinberg concluded. "This paper is trying to begin that line of questioning."