Trimming time in the stacks

Dec 21, 2011 By Nicole Freeling

A sophisticated text-analyzing tool developed by a UC Berkeley graduate student could speed literary searches for humanities scholars and other researchers.

For graduate students in the humanities, spending months or years in a library combing through obscure texts is a time-honored, if not always savored, rite of passage. Technologies like Google Scholar have sped up the process somewhat, but students still must sift through tens of thousands of results to pluck a few kernels of useful research from the dross.

Now, as part of her doctoral thesis in computer science, UC Berkeley graduate student Aditi Muralidharan has developed a technology platform that she believes has the potential to transform the process of scholarly research. It could reduce what takes months in the stacks and days in front of a computer screen to an electronic query that takes about five minutes.

Called WordSeer, the program uses rubrics about the way we use and structure language to bring a facsimile of human logic to the business of interpreting search results.  

“Up to now, the state of the art for search in literary scholarship has been to ask a graduate student,” Muralidharan says. That’s because existing technologies lack the intuitive understanding required for meaningful analysis of text.

“Search technology works much better when the target is a known object or piece of data,” she said. “But humanities scholars don’t know what the item they’re looking for is exactly. It could be a document, a paragraph or a sentence. The target is much more obscure.”


WordSeer employs a technology called Natural Language Processing, which uses understanding about parts of speech, usage and word relationships to enable search that is both more broad — in that it allows users to cast about for citations without a key word — and more specific, in that it makes determinations about the relevancy of text.  

The technology itself is about 20 years old but, ironically, the programming language it is based on has itself not been translated into plain English. Thus far it has been executable only by use of a complex programming code.

Muralidharan has been able to create a simple user interface for the technology, allowing users to pose questions of particular sets of text. Berkeley English professors Bryan Wagner and Todd Carmody, for example, employed WordSeer to assist with research exploring American slave narratives to better understand slaves’ relationships with God.  

They posed two questions of the collection (which had been digitized): “What does God do?” and “How was God described?” The query returned thousands of citations and ranked the results by frequency of usage. The search revealed that positive attributes — such as great, wise and merciful — and benevolent actions — such as bless, give and grant — were in the overwhelming majority.

“You don’t have to read all 4,000 matches to get a sense of the overall tone of the collection,” said Muralidharan. In this case, while one might expect slaves in their misery to blame God, the search results indicated the relationship was quite a positive one.
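The two queries described above can be thought of as frequency counts over grammatical relations. Here is a minimal sketch in Python, assuming the text has already been run through a dependency parser and reduced to (relation, head, dependent) triples. The triples, their counts, the relation labels and the function name are illustrative assumptions, not WordSeer's actual data or interface.

```python
from collections import Counter

# Hypothetical parser output: (relation, head word, dependent word).
# "nsubj" links a verb to its subject; "amod" links a noun to an adjective.
# These rows are made up for illustration and are not from the real collection.
TRIPLES = [
    ("nsubj", "bless", "God"),
    ("nsubj", "give", "God"),
    ("nsubj", "bless", "God"),
    ("nsubj", "grant", "God"),
    ("nsubj", "run", "master"),
    ("amod", "God", "merciful"),
    ("amod", "God", "great"),
    ("amod", "God", "merciful"),
    ("amod", "house", "great"),
]


def rank_related_words(triples, relation, word, word_is_head):
    """Count words linked to `word` by `relation`, most frequent first."""
    counts = Counter()
    for rel, head, dep in triples:
        if rel != relation:
            continue
        if word_is_head and head == word:
            counts[dep] += 1
        elif not word_is_head and dep == word:
            counts[head] += 1
    return counts.most_common()


# "What does God do?": verbs whose grammatical subject is "God".
god_actions = rank_related_words(TRIPLES, "nsubj", "God", word_is_head=False)

# "How was God described?": adjectives that modify "God".
god_descriptions = rank_related_words(TRIPLES, "amod", "God", word_is_head=True)

print(god_actions)
print(god_descriptions)
```

Ranking by count is what lets a reader judge the overall tone from the top of the list rather than from every individual match.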

“The WordSeer project is one of the most sophisticated tools for computational text analysis I've seen,” said Wagner. “There are many tools that tabulate words and phrases, but WordSeer can discern grammatical structure as well as stylistic features, representing the results in truly interactive visualizations.”  

Muralidharan, whose work is funded through a grant from the National Endowment for the Humanities, believes the technology could be useful for all kinds of research that depends upon poring over vast quantities of text, including journalistic and legal research. Berkeley School of Information professor Marti Hearst, Muralidharan’s faculty adviser on the project, said this kind of technology has the potential not just to speed literary research, but to influence how scholars make sense of their subject matter.

“Scholars will continue to formulate their hypotheses as they have before, but will also get new ideas by being inspired by patterns in the text that these tools suggest,” Hearst said. “These tools will also help scholars test their hypotheses across many more documents than they can do now manually.”

As a condition of the grant, the program will be freely available and open source. Muralidharan says her intention is not to sell the technology. Rather, it’s to complete her doctoral thesis — and along the way, smooth the process of doing research for some of her fellow scholars.
