Trimming time in the stacks

December 21, 2011 By Nicole Freeling

A sophisticated text-analyzing tool developed by a UC Berkeley graduate student could speed literary searches for humanities scholars and other researchers.

For graduate students in the humanities, spending months or years in a library combing through obscure texts is a time-honored, if not always savored, rite of passage. Technologies like Google Scholar have sped up the process somewhat, but students still must sift through tens of thousands of results to pluck a few kernels of useful research from the dross.

Now, as part of her doctoral thesis in computer science, UC Berkeley graduate student Aditi Muralidharan has developed a technology platform that she believes has the potential to transform the process of scholarly research. It could reduce what takes months in the stacks and days in front of a computer screen to an electronic query that takes about five minutes.

Called WordSeer, the program uses rubrics about the way we use and structure language to bring a facsimile of human logic to the business of interpreting search results.  

“Up to now, the state of the art for search in literary scholarship has been to ask a graduate student,” Muralidharan says. That’s because existing technologies lack the intuitive understanding required for meaningful analysis of text.

“Search technology works much better when the target is a known object or piece of data,” she said. “But humanities scholars don’t know what the item they’re looking for is exactly. It could be a document, a paragraph or a sentence. The target is much more obscure.”

WordSeer employs a technology called Natural Language Processing, which uses understanding about parts of speech, usage and word relationships to enable search that is both more broad — in that it allows users to cast about for citations without a key word — and more specific, in that it makes determinations about the relevancy of text.  

The technology itself is about 20 years old but, ironically, the programming language it is based on has itself not been translated into plain English. Thus far it has been executable only by use of a complex programming code.

Muralidharan has been able to create a simple user interface for the technology, allowing users to pose questions of particular sets of text. Berkeley English professors Bryan Wagner and Todd Carmody, for example, employed WordSeer to assist with research exploring American slave narratives to better understand slaves’ relationships with God.  

They posed two questions of the collection (which had been digitized): “What does God do?” and “How was God described?” The query returned thousands of citations and ranked the results by frequency of usage. The search revealed that positive attributes — such as great, wise and merciful — and benevolent actions — such as bless, give and grant — were in the overwhelming majority.

“You don’t have to read all 4,000 matches to get a sense of the overall tone of the collection,” said Muralidharan. In this case, while one might expect slaves in their misery to blame God, the search results indicated the relationship was quite a positive one.
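The idea behind such a query can be sketched in a few lines of Python. This is only an illustration, not WordSeer's actual code: the relation names, the function names and the handful of sample relations are all invented for the example, and a real system would obtain the relations by running a grammatical parser over the digitized texts.

```python
from collections import Counter

# Hypothetical, hand-written grammatical relations in the form
# (relation, head word, dependent word). These are made-up samples,
# not data from the slave-narrative collection.
RELATIONS = [
    ("subject_of", "bless", "God"),
    ("subject_of", "give", "God"),
    ("subject_of", "bless", "God"),
    ("described_as", "God", "merciful"),
    ("described_as", "God", "great"),
    ("described_as", "God", "merciful"),
    ("subject_of", "work", "master"),
]


def rank_actions(relations, entity):
    """Answer "What does <entity> do?": verbs with entity as subject, most frequent first."""
    counts = Counter(
        head for rel, head, dep in relations
        if rel == "subject_of" and dep == entity
    )
    return counts.most_common()


def rank_descriptions(relations, entity):
    """Answer "How is <entity> described?": describing words, most frequent first."""
    counts = Counter(
        dep for rel, head, dep in relations
        if rel == "described_as" and head == entity
    )
    return counts.most_common()


actions = rank_actions(RELATIONS, "God")
descriptions = rank_descriptions(RELATIONS, "God")
print(actions)
print(descriptions)
```

Because the results are ranked by how often each word occurs, a reader can scan the top of each list to gauge the overall tone rather than reading every match.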

“The WordSeer project is one of the most sophisticated tools for computational text analysis I've seen,” said Wagner. “There are many tools that tabulate words and phrases, but WordSeer can discern grammatical structure as well as stylistic features, representing the results in truly interactive visualizations.”  

The project is funded through a grant from the National Endowment for the Humanities. Muralidharan believes the technology could be useful for all kinds of research that depends upon poring over vast quantities of text, including journalistic and legal research. Berkeley School of Information professor Marti Hearst, Muralidharan’s faculty adviser on the project, said this kind of technology has the potential not just to speed literary research, but to influence how scholars make sense of their subject matter.

“Scholars will continue to formulate their hypotheses as they have before, but will also get new ideas by being inspired by patterns in the text that these tools suggest,” Hearst said. “These tools will also help scholars test their hypotheses across many more documents than they can do now manually.”

As a condition of the grant, the program will be freely available and open source. Muralidharan says her intention is not to sell the technology. Rather, it’s to complete her doctoral thesis — and along the way, smooth the process of doing research for some of her fellow scholars.

Provided by University of California - Berkeley

