Stimulus grant will improve physics arXiv

November 18, 2009 By Bill Steele

Stimulus funding will enhance Cornell's arXiv, an e-print repository of scientific papers, helping users identify a work's main concepts, see research reports in context and easily find related work.

"It shouldn't be a one-way channel," said Paul Ginsparg, professor of physics and information science, who heads the new project. It is funded by a three-year, $883,000 grant from the National Science Foundation, with federal money from the American Recovery and Reinvestment Act (ARRA).

The arXiv currently contains close to 600,000 papers in physics, mathematics, computer science, quantitative biology, quantitative finance and statistics, with some 5,000 new papers submitted each month. Researchers submit their work as "preprints" before formal publication. Such preprints were once circulated by hand, until Ginsparg launched the arXiv (pronounced "archive") in 1991 at the Los Alamos National Laboratory. He brought it to Cornell in 2001, where it is now hosted by Cornell University Library.

New tools will link papers by concepts, not just by the citations they contain, and this will help users without advanced expertise -- including some outside the scientific community -- understand the significance of new research, said Ginsparg.

"One of the underlying concepts of the arXiv was leveling the playing field," he explained. "Formerly, new research was available only to a few privileged people. Now everyone has equal access, but because of differential levels of expertise not [all scientists] can as easily assess significance. We will be working on automated tools to help identify and highlight the most important concepts," he said. Along with scientists, he added, the site is closely watched by journalists.

The system also will identify related databases and commentaries. For example, Ginsparg said, if a paper mentions an astronomical object, the computer could serve up a menu of related information, including a database describing the object, the original observations that generated the description, and blogspace commentary.

Computers usually search documents by looking for specific words or phrases, but the same concept is not always described in the same words, and some words mean different things in different contexts. The new algorithms will take a "fuzzier" approach, inferring concepts from the ways terms are used, and will track related documents over five- or 10-year time scales, so users will be able to see the "genealogy" of ideas. Newer documents will be linked to data such as definitions and rules for reasoning about them, which enables machines to infer relationships.
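The article does not describe the project's actual algorithms. As a rough illustration of the general idea of inferring that two terms are related from the ways they are used, here is a minimal Python sketch of distributional similarity: each term is represented by counts of the words that appear near it, and terms with similar contexts score as similar even when the words themselves differ. The toy corpus, the window size and the function names `context_vectors` and `cosine` are hypothetical, chosen only for this example.

```python
from collections import Counter, defaultdict
import math

def context_vectors(docs, window=2):
    """Map each term to a Counter of the terms appearing within `window` positions of it."""
    vectors = defaultdict(Counter)
    for doc in docs:
        tokens = doc.lower().split()
        for i, term in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[term][tokens[j]] += 1
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Hypothetical toy corpus: "quasar" and "blazar" never co-occur,
# but they are used in identical contexts.
corpus = [
    "the quasar emits strong radio waves",
    "the blazar emits strong radio waves",
    "the proof uses a compact manifold",
]
vecs = context_vectors(corpus)
print(round(cosine(vecs["quasar"], vecs["blazar"]), 3))    # 1.0: same usage contexts
print(round(cosine(vecs["quasar"], vecs["manifold"]), 3))  # 0.0: no shared contexts
```

A real system would work over far larger corpora and use weighting and dimensionality reduction rather than raw counts; the sketch only shows why usage patterns can reveal related concepts that exact-word search would miss.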

Other enhancements will provide interoperability with research sites such as PubMed Central and will allow scientists to contribute in newer, more flexible text formats.

Researchers might be more enthusiastic about participating in open access journals and repositories if they could see that their work was more accessible and usable, Ginsparg suggested. "And perhaps the academic community will again play a role at the forefront as the semantic Web 3.0 rolls out," he said. Academic publishing has lagged behind the commercial Internet in providing interactive enhancements that today's students take for granted, he explained. "Configuring research communications infrastructure for the next generation of researchers requires getting into the heads of near-term future researchers -- undergrads and grad students -- coming of age in the Google/Facebook/Twitter era."

The project is expected to generate jobs for two graduate students and a half-time programmer. To date, Cornell has received 124 ARRA grants, totaling more than $99 million.

Provided by Cornell University