Google engineer creates application that monitors Wikipedia content bots

Feb 19, 2014 by Bob Yirka
Screenshot of the application. Credit: arXiv:1402.0412 [cs.DL]

(Phys.org) - Thomas Steiner, a Customer Solutions Engineer at Google Germany GmbH in Hamburg, has created an application that shows very clearly how much of Wikipedia's content is being created or edited by bots versus humans. He has also written a paper describing his efforts and posted it on the preprint server arXiv.

Many people may not realize it, but some of the content appearing on Wikipedia is put there by bots rather than human beings. This is because Wikipedia has grown too large to be managed by people alone, especially given that it is still mostly a volunteer effort.

To keep entries coming and to keep them updated, bots have been created. They grab information from one place and post it to another, so they are not actually writers or composers; they are more like auditors updating files automatically. Many people may also not know that the folks at Wikipedia have created another information repository, Wikidata, a database whose sole purpose is to share data among the different language versions of Wikipedia. If a user in the U.S. enters the results of the New York Marathon into a Wikipedia entry, for example, that data can be automatically ported to Wikidata, where other bots can retrieve it, convert it to the pertinent language and post it to another language version of Wikipedia, all rather seamlessly to readers.

Because of all the automation, some have begun to wonder what portion of Wikipedia pages is generated by humans versus bots. That's where Steiner comes in: he has written an application that anyone can access and use to see, in real time, what percentage of pages are being written by humans versus bots.

The application also reveals other aspects of Wikipedia. A quick glance, for example, shows that bots are doing a lot more of the work of adding information to pages in non-English-speaking countries, which suggests that the majority of Wikipedia content is still being created by real human beings in the U.S. and the U.K. For those who are interested, the application also monitors activity on Wikidata, and it displays the data for both sites in a way that shows which bots are most active.
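For readers curious what such a tally involves, here is a minimal Python sketch of the general idea: count bot and human edits per language version and rank the bots by activity. It is not Steiner's code. The function name `summarize_edits`, the record fields `wiki`, `user` and `is_bot`, and the sample records are all assumptions made up for illustration, and the sketch works on a fixed list rather than a live feed.

```python
from collections import Counter, defaultdict


def summarize_edits(edits):
    """Return (bot share in percent per wiki, bots ranked by edit count).

    Each edit is a dict with the hypothetical keys "wiki", "user" and
    "is_bot"; these names are assumptions, not the application's schema.
    """
    per_wiki = defaultdict(lambda: {"bot": 0, "human": 0})
    bot_activity = Counter()

    for edit in edits:
        kind = "bot" if edit["is_bot"] else "human"
        per_wiki[edit["wiki"]][kind] += 1
        if edit["is_bot"]:
            bot_activity[edit["user"]] += 1

    shares = {}
    for wiki, counts in per_wiki.items():
        total = counts["bot"] + counts["human"]
        shares[wiki] = 100.0 * counts["bot"] / total

    return shares, bot_activity.most_common()


# Made-up sample records for two invented language versions.
SAMPLE_EDITS = [
    {"wiki": "wiki_a", "user": "Alice", "is_bot": False},
    {"wiki": "wiki_a", "user": "ExampleBot", "is_bot": True},
    {"wiki": "wiki_a", "user": "Bob", "is_bot": False},
    {"wiki": "wiki_a", "user": "Carol", "is_bot": False},
    {"wiki": "wiki_b", "user": "ExampleBot", "is_bot": True},
    {"wiki": "wiki_b", "user": "OtherBot", "is_bot": True},
    {"wiki": "wiki_b", "user": "Dave", "is_bot": False},
    {"wiki": "wiki_b", "user": "ExampleBot", "is_bot": True},
]

shares, ranking = summarize_edits(SAMPLE_EDITS)
for wiki, share in sorted(shares.items()):
    print(f"{wiki}: {share:.1f}% of edits made by bots")
for bot, count in ranking:
    print(f"{bot}: {count} edits")
```

A real-time version would update the same counters as each edit arrives instead of looping over a stored list.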

Steiner has also published the code for the application, making it open source. That should allow those who are interested in the murky world of bots to gain an insider's perspective, and perhaps to add to the application's utility.


More information: Bots vs. Wikipedians, Anons vs. Logged-Ins, arXiv:1402.0412 [cs.DL] arxiv.org/abs/1402.0412

Abstract
Wikipedia is a global crowdsourced encyclopedia that at time of writing is available in 287 languages. Wikidata is a likewise global crowdsourced knowledge base that provides shared facts to be used by Wikipedias. In the context of this research, we have developed an application and an underlying Application Programming Interface (API) capable of monitoring realtime edit activity of all language versions of Wikipedia and Wikidata. This application allows us to easily analyze edits in order to answer questions such as "Bots vs. Wikipedians, who edits more?", "Which is the most anonymously edited Wikipedia?", or "Who are the bots and what do they edit?". To the best of our knowledge, this is the first time such an analysis could be done in realtime for Wikidata and for really all Wikipedias—large and small. Our application is available publicly online at the URL this http URL, its code has been open-sourced under the Apache 2.0 license.
