(PhysOrg.com) -- Cornell scientists have downloaded and analyzed nearly 35 million Flickr photos taken by more than 300,000 photographers from around the globe, using a supercomputer at the Cornell Center for Advanced Computing (CAC).
Their research, which was presented at the International World Wide Web Conference in Madrid, April 20-24, provides a new and practical way to automatically organize, label and summarize large-scale collections of digital images. The scalability of the method allows for mining information latent in very large sets of images, raising the intriguing possibility of an online travel guidebook that could automatically identify the best sites to visit on a vacation, as judged by the collective wisdom of the world's photographers.
The research also generated statistics on the world's most photographed cities and landmarks, gleaned from the analysis of the multi-terabyte photo collection:
• The top 25 most photographed cities in the Flickr data are (in order): New York City, London, San Francisco, Paris, Los Angeles, Chicago, Washington, D.C., Seattle, Rome, Amsterdam, Boston, Barcelona, San Diego, Berlin, Las Vegas, Florence, Toronto, Milan, Vancouver, Madrid, Venice, Philadelphia, Austin, Dublin and Portland.
• The top seven most photographed landmarks are (in order): Eiffel Tower, Paris; Trafalgar Square, London; Tate Modern museum, London; Big Ben, London; Notre Dame, Paris; The Eye, London; and the Empire State Building, New York City.
Interestingly, the Apple Store in midtown Manhattan was the fifth-most photographed place in New York City -- and the 28th-most photographed place in the world.
The researchers developed techniques to identify places that people find interesting to photograph, showing results for thousands of locations at both city and landmark scales.
"We developed classification methods for characterizing these locations from visual, textual and temporal features," said Daniel Huttenlocher, the John P. and Rilla Neafsey Professor of Computing, Information Science and Business and Stephen H. Weiss fellow. "These methods reveal that both visual and temporal features improve the ability to estimate the location of a photo compared to using just textual tags."
As the creation of digital data accelerates, said CAC director David Lifka, "supercomputers and high-performance storage systems will be essential in order to quickly store, archive, preserve and retrieve large-scale data collections."
The research was supported in part by the National Science Foundation (NSF) and by funding from Google, Yahoo! and the John D. and Catherine T. MacArthur Foundation. The CAC is supported by Cornell, the NSF, the Department of Defense, the Department of Agriculture and members of its corporate program.
The paper is available for download at www.cs.cornell.edu/~dph/papers/photomap-www09.pdf .
Provided by Cornell University (news : web)
Explore further: Computer scientist publishes new algorithm cluster to data mine health records