Scientists who share data publicly receive more citations

Oct 01, 2013
This is Fig 1 of the paper. Citation density for papers with and without publicly available microarray data, by year of study publication. Credit: Piwowar and Vision

A new study finds that papers with data shared in public gene expression archives received increased numbers of citations for at least five years. The large size of the study allowed the researchers to exclude confounding factors that have plagued prior studies of the effect and to spot a trend of increasing dataset reuse over time. The findings will be important in persuading scientists that they can benefit directly from publicly sharing their data.

The study, which adds to growing evidence for an open data citation benefit across different scientific fields, is entitled "Data reuse and the open data citation advantage". It was conducted by Dr. Heather Piwowar of Duke University and Dr. Todd Vision of the University of North Carolina at Chapel Hill, and published today in PeerJ, a peer-reviewed open access journal.

The study examined citations to over ten thousand articles that generated new gene expression data, a quarter of which had data publicly archived in the GEO and ArrayExpress repositories. Papers with publicly available data received about 9% more citations overall, with the difference increasing over time. The researchers concluded that much of this citation difference was due to actual data reuse.

"Professional advancement in science is still highly dependent on how well your paper gets cited, even in a field like genomics where the data underlying that paper may have far more scientific impact over the long term," said Dr. Vision, a biologist affiliated with the National Evolutionary Synthesis Center and the Dryad Digital Repository. "Until the happy day when hiring and promotion committees catch up with how to value data sharing for its own sake, it is comforting to know that scientists can still receive credit for data sharing in a currency that counts."

The researchers also mined the full text of articles for references to dataset identifiers in order to study trends in data reuse directly. They took the unusual step of discussing in the paper the obstacles they encountered. Dr. Piwowar, at the time of the study a postdoc with the DataONE project, said, "We need more open and cohesive infrastructure to support collecting evidence about the process and products of science. This evidence is needed to inform important policy decisions. For example, data archiving requirements, infrastructure, and education should be informed by evidence about how data is and is not reused."

The mined references revealed that scientists generally stopped publishing papers using their own datasets within two years, while others continued to reuse their data for at least six years. They also showed that data reuse is on the rise. "Not only were the number of reuse papers higher," said Dr. Piwowar, "but analyses from 2002 to 2004 were reusing only one or two datasets, while a quarter of the studies by 2010 were using three or more."

More information: Piwowar HA, Vision TJ. (2013) Data reuse and the open data citation advantage. PeerJ 1:e175 dx.doi.org/10.7717/peerj.175
