New system solves the 'who is J. Smith' puzzle

December 14, 2006

Penn State researchers have developed an automated system that can determine which "J. Smith" is authoring papers on computer science (the one who teaches at Penn State or the one who teaches at M.I.T.), as well as whether "J. Smith" is John Smith, Jane Smith, Joanna L. Smith or James H. Smith.

The system, which retrieves classes of authors with similar names, considers not just names in making its determination but also other information such as co-authors, dates of publications, citations and keywords.
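The article lists the evidence the system weighs beyond the name itself: co-authors, publication dates, citations and keywords. It does not say how that evidence is combined. The following is a minimal Python sketch under stated assumptions: the `AuthorRecord` fields, the Jaccard overlap measure, the 10-year date window and the fixed `WEIGHTS` are all hypothetical choices for illustration, not the Penn State method.

```python
from dataclasses import dataclass, field


@dataclass
class AuthorRecord:
    """One author mention on one paper (hypothetical schema)."""
    name: str
    coauthors: set = field(default_factory=set)
    year: int = 0
    keywords: set = field(default_factory=set)
    citations: set = field(default_factory=set)


def jaccard(a: set, b: set) -> float:
    """Overlap of two sets, from 0.0 (disjoint or both empty) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical hand-picked weights; a real system would learn these from labeled data.
WEIGHTS = {"coauthors": 0.4, "keywords": 0.3, "citations": 0.2, "year": 0.1}


def similarity(r1: AuthorRecord, r2: AuthorRecord, max_gap: int = 10) -> float:
    """Weighted score in [0, 1] that two mentions refer to the same person."""
    # Publications far apart in time count as weaker evidence; beyond max_gap years, none.
    year_score = max(0.0, 1 - abs(r1.year - r2.year) / max_gap)
    return (WEIGHTS["coauthors"] * jaccard(r1.coauthors, r2.coauthors)
            + WEIGHTS["keywords"] * jaccard(r1.keywords, r2.keywords)
            + WEIGHTS["citations"] * jaccard(r1.citations, r2.citations)
            + WEIGHTS["year"] * year_score)


# Example: two "J. Smith" mentions with overlapping context, and one without.
a = AuthorRecord("J. Smith", {"A. Lee"}, 2004, {"clustering", "databases"}, {"p1"})
b = AuthorRecord("John Smith", {"A. Lee"}, 2005, {"clustering"}, {"p1"})
c = AuthorRecord("J. Smith", {"B. Chen"}, 1990, {"optics"}, {"p9"})
print(round(similarity(a, b), 2))  # 0.84
print(similarity(a, c))            # 0.0
```

The name strings are deliberately ignored in the score: two mentions that share a co-author, a citation and a keyword rank as likely matches, while a second "J. Smith" with no shared context scores zero.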

When tested with 3,355 academic papers written by 490 authors, the system correctly identified authors 90.6 percent of the time.

"It works very similarly to how humans would figure out authors’ identity—by looking at affiliations, topics, publications," said C. Lee Giles, the David Reese Professor of Information Sciences and Technology and principal researcher.

"The system works by using machine-learning methods to cluster together names that the system believes to be similar. If you think there’s another parameter that’s relevant, you can change the algorithm and include it," Giles said.
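Giles describes two ideas: clustering together mentions the system judges similar, and being able to add a new parameter when it seems relevant. The sketch below illustrates both with a swapped-in technique, not the researchers' machine-learning method: single-link clustering via union-find, where a pair of mentions is merged when their mean Jaccard overlap across a pluggable `features` list reaches a hypothetical `threshold` of 0.3.

```python
from itertools import combinations


def jaccard(a, b):
    """Set overlap in [0, 1]; two empty sets count as no evidence (0.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_mentions(mentions, features, threshold=0.3):
    """Single-link clustering: any pair whose mean feature overlap reaches
    `threshold` is merged, and merges are transitive via union-find."""
    if not features:
        raise ValueError("need at least one feature to compare")
    parent = list(range(len(mentions)))

    def find(i):
        # Walk to the cluster root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(mentions)), 2):
        score = sum(jaccard(mentions[i][f], mentions[j][f]) for f in features) / len(features)
        if score >= threshold:
            parent[find(i)] = find(j)

    # Collect mention indices by their cluster root.
    groups = {}
    for i in range(len(mentions)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical mentions of "J. Smith" on three papers.
mentions = [
    {"coauthors": {"A. Lee"}, "keywords": {"databases", "clustering"}},
    {"coauthors": {"A. Lee", "K. Wu"}, "keywords": {"clustering"}},
    {"coauthors": {"M. Ortiz"}, "keywords": {"optics"}},
]
print(cluster_mentions(mentions, ["coauthors", "keywords"]))  # [[0, 1], [2]]
```

Adding another signal, such as publication venue, only means adding a key to each mention and its name to `features`; the clustering code itself does not change.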

The system is explained in a paper, "Efficient Name Disambiguation for Large-Scale Databases," presented at the recent 17th European Conference on Machine Learning and the 10th European Conference on Principles and Practice of Knowledge Discovery in Databases in Berlin. Co-authors were Jian Huang, a doctoral student in the College of Information Sciences and Technology, and Seyda Ertekin, a doctoral student in the Department of Computer Science and Engineering.

Even in academic publications, an author’s identity can be difficult to pin down, because publications vary in how they present names. Some use only a first initial and last name, as in "J. Smith." Others use the full name, as in "C. Lee Giles." But if the surname is common, as in "Smith" or "Chen," knowing the first name may still not be enough to identify the author accurately.
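Before weighing co-authors or topics, a disambiguation system has to decide which name strings could belong to the same person at all. The article does not say how the Penn State system compares names, so this is a minimal sketch under stated assumptions: names are written given names first and surname last, periods only mark initials, and an initial is compatible with any given name that starts with the same letter.

```python
def parse_name(name):
    """Split 'Joanna L. Smith' into (['joanna', 'l'], 'smith').
    Assumes Western order: given names first, surname last."""
    parts = name.replace(".", " ").split()
    return [p.lower() for p in parts[:-1]], parts[-1].lower()


def tokens_compatible(a, b):
    """An initial matches any given name with the same first letter;
    two spelled-out given names must match exactly."""
    if len(a) == 1 or len(b) == 1:
        return a[0] == b[0]
    return a == b


def names_compatible(n1, n2):
    """True if the two strings could plausibly name the same person."""
    given1, last1 = parse_name(n1)
    given2, last2 = parse_name(n2)
    if last1 != last2:
        return False
    # Compare given names position by position; a missing middle name is allowed.
    return all(tokens_compatible(a, b) for a, b in zip(given1, given2))


print(names_compatible("J. Smith", "John Smith"))             # True
print(names_compatible("J. Smith", "Joanna L. Smith"))        # True
print(names_compatible("Joanna L. Smith", "James H. Smith"))  # False
```

A check like this only narrows the candidates: "J. Smith" stays compatible with John, Jane, Joanna L. and James H. Smith alike, which is why co-authors, dates and keywords are still needed to tell them apart.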

Confusion can also arise from the way entities are listed, with some publications choosing Penn State, The Pennsylvania State University or PSU. The researchers’ algorithm can clear up ambiguities surrounding such entities, whether institutions, businesses, funding agencies or other organizations.
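The article says the algorithm resolves variants such as Penn State, The Pennsylvania State University and PSU, but not how. To show only the normalization step, the sketch below swaps in a plain lookup table, which is not the researchers' method; the `ALIASES` entries and the canonical name are hypothetical.

```python
import re

# Hypothetical alias table mapping normalized variants to one canonical name.
ALIASES = {
    "penn state": "Pennsylvania State University",
    "pennsylvania state university": "Pennsylvania State University",
    "the pennsylvania state university": "Pennsylvania State University",
    "psu": "Pennsylvania State University",
}


def normalize_affiliation(raw):
    """Map an affiliation string to its canonical name, or return it unchanged."""
    key = re.sub(r"[^a-z ]", "", raw.lower())  # drop punctuation and digits
    key = " ".join(key.split())                # collapse repeated whitespace
    return ALIASES.get(key, raw)


for raw in ["Penn State", "PSU", "The Pennsylvania State University", "M.I.T."]:
    print(raw, "->", normalize_affiliation(raw))
```

A fixed table breaks down quickly in practice: "PSU" is also used by other institutions, such as Portland State University, so a real system has to use context (co-authors, addresses, topics) rather than the string alone.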

"This method will work on many entity disambiguation problems," Giles said.

The algorithm uses a clustering method to train computers to extract information based on shared properties. Each time the information is clustered, the resulting groupings become smaller.

The algorithm will be part of the next-generation CiteSeer, the largest academic search engine for computer and information-science literature. Giles co-created CiteSeer while he was at NEC.

Source: Penn State
