Audit finds biodiversity data aggregators 'lose and confuse' data

April 23, 2018, Pensoft Publishers
A snippet of the results from a data processing event. Credit: Dr. Robert Mesibov

In an effort to improve the quality of biodiversity records, the Atlas of Living Australia (ALA) and the Global Biodiversity Information Facility (GBIF) use automated data processing to check individual data items. The records are provided to the ALA and GBIF by museums, herbaria and other biodiversity data sources.

However, an independent analysis of such records reports that ALA and GBIF data processing also leads to data loss and unjustified changes in scientific names.

The study was carried out by Dr Robert Mesibov, an Australian millipede specialist who also works as a data auditor. Dr Mesibov checked around 800,000 records retrieved from the Australian Museum, Museums Victoria and the New Zealand Arthropod Collection. His results are published in the open access journal ZooKeys and are also archived in a public data repository.

"I was mainly interested in changes made by the aggregators to the genus and species names in the records," said Dr Mesibov.

"I found that names in up to 1 in 5 records were changed, often because the aggregator couldn't find the name in the look-up table it used."

Another worrying result concerned type specimens, the reference specimens upon which scientific names are based. On a number of occasions, the aggregators had replaced the name of a type specimen with a name tied to an entirely different type specimen.

The biggest surprise, according to Dr Mesibov, was how much the two aggregators disagreed on names.

"There was very little agreement," he explained. "One aggregator would change a name and the other wouldn't, or would change it in a different way."

Furthermore, dates, names and locality information were sometimes lost from records, mainly due to programming errors in the software used by aggregators to check data items. In some data fields the loss reached 100%, with no original data items surviving the processing.

"The lesson from this audit is that biodiversity data aggregation isn't harmless," said Dr Mesibov. "It can lose and confuse perfectly good data."

"Users of aggregated data should always download both original and processed data items, and should check for data loss or modification, and for replacement of names," he concluded.
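The kind of check Dr Mesibov recommends can be sketched in a few lines of Python. This is a minimal illustration, not the method used in the audit or by ALA or GBIF. It assumes the original and processed records have already been downloaded and paired by a shared record ID, with field values stored as strings. The field names (Darwin Core-style `scientificName` and `eventDate`), the placeholder species names, the toy records and the `audit_fields` and `loss_percent` helpers are all hypothetical.

```python
def audit_fields(original, processed, fields):
    """Compare original and processed versions of the same records.

    Hypothetical helper. original and processed are dicts mapping a
    record ID to a dict of field values. For each field, returns counts of
    records with a non-empty original value ("present"), records where that
    value is empty or missing after processing ("lost"), and records where
    it was replaced by a different value ("changed").
    """
    report = {f: {"present": 0, "lost": 0, "changed": 0} for f in fields}
    for rec_id, orig_rec in original.items():
        # A record missing entirely from the processed set counts as lost data.
        proc_rec = processed.get(rec_id, {})
        for f in fields:
            before = str(orig_rec.get(f) or "").strip()
            after = str(proc_rec.get(f) or "").strip()
            if not before:
                continue  # nothing in the original to lose or change
            report[f]["present"] += 1
            if not after:
                report[f]["lost"] += 1
            elif after != before:
                report[f]["changed"] += 1  # e.g. a replaced name
    return report


def loss_percent(stats):
    """Percentage of original values that did not survive processing."""
    return 100.0 * stats["lost"] / stats["present"] if stats["present"] else 0.0


# Toy records (hypothetical placeholder names, not data from the audit).
original = {
    "r1": {"scientificName": "Aus bus", "eventDate": "1990-05-01"},
    "r2": {"scientificName": "Aus cus", "eventDate": "1991-02-03"},
}
processed = {
    "r1": {"scientificName": "Aus bus", "eventDate": ""},
    "r2": {"scientificName": "Aus", "eventDate": ""},
}

report = audit_fields(original, processed, ["scientificName", "eventDate"])
for field, stats in report.items():
    print(field, stats, f"{loss_percent(stats):.0f}% lost")
```

In this toy case every `eventDate` value is lost (100%) and one of the two names is changed, mirroring the two kinds of problem the audit describes. Flagged records would still need checking by hand, since a changed name is not necessarily a wrong one.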


More information: Robert Mesibov, An audit of some processing effects in aggregated occurrence records, ZooKeys (2018). DOI: 10.3897/zookeys.751.24791
