Audit finds biodiversity data aggregators 'lose and confuse' data

A snippet of the results from a data processing event. Credit: Dr. Robert Mesibov

In an effort to improve the quality of biodiversity records, the Atlas of Living Australia (ALA) and the Global Biodiversity Information Facility (GBIF) use automated data processing to check individual data items. The records are provided to the ALA and GBIF by museums, herbaria and other biodiversity data sources.

However, an independent analysis of such records reports that ALA and GBIF data processing also leads to data loss and unjustified changes in scientific names.

The study was carried out by Dr Robert Mesibov, an Australian millipede specialist who also works as a data auditor. Dr Mesibov checked around 800,000 records retrieved from the Australian Museum, Museums Victoria and the New Zealand Arthropod Collection. His results are published in the open access journal ZooKeys, and also archived in a public data repository.

"I was mainly interested in changes made by the aggregators to the genus and species names in the records," said Dr Mesibov.

"I found that names in up to 1 in 5 records were changed, often because the aggregator couldn't find the name in the look-up table it used."

Another worrying result concerned type specimens, the reference specimens upon which scientific names are based. On a number of occasions, the aggregators were found to have replaced the name of a type specimen with a name tied to an entirely different type specimen.

The biggest surprise, according to Dr Mesibov, was the major disagreement on names between aggregators.

"There was very little agreement," he explained. "One aggregator would change a name and the other wouldn't, or would change it in a different way."

Furthermore, dates, names and locality information were sometimes lost from records, mainly due to programming errors in the software used by aggregators to check data items. In some data fields the loss reached 100%, with no original data items surviving the processing.

"The lesson from this audit is that biodiversity data aggregation isn't harmless," said Dr Mesibov. "It can lose and confuse perfectly good data."

"Users of aggregated data should always download both original and processed data items, and should check for data loss or modification, and for replacement of names," he concluded.
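The kind of check Dr Mesibov recommends could be sketched roughly as follows. This is a minimal illustration only, not the method used in the audit: it assumes the original and processed records have already been loaded into dictionaries keyed by a shared record ID, and the function name `audit_fields`, the field names and the sample records are all hypothetical.

```python
from collections import Counter


def audit_fields(original, processed, fields):
    """Count, per field, values that were lost or changed by processing.

    original, processed: dicts mapping a record ID to a dict of field -> value.
    (Hypothetical layout; real downloads would need to be parsed into it first.)
    """
    lost = Counter()
    changed = Counter()
    for record_id, orig_row in original.items():
        proc_row = processed.get(record_id, {})
        for field in fields:
            before = (orig_row.get(field) or "").strip()
            after = (proc_row.get(field) or "").strip()
            if not before:
                continue  # nothing in the original to lose or change
            if not after:
                lost[field] += 1  # original value did not survive processing
            elif after != before:
                changed[field] += 1  # value was modified or replaced
    return lost, changed


# Made-up example records, not data from the audit
original = {
    "r1": {"scientificName": "Aus bus", "eventDate": "1998-03-14"},
    "r2": {"scientificName": "Cus dus", "eventDate": "2001-11-02"},
}
processed = {
    "r1": {"scientificName": "Aus bus", "eventDate": ""},
    "r2": {"scientificName": "Cus", "eventDate": "2001-11-02"},
}

lost, changed = audit_fields(original, processed, ["scientificName", "eventDate"])
print("lost:", dict(lost))
print("changed:", dict(changed))
```

A field with a high count in `lost` would point to data loss during processing, while entries in `changed` would flag names or other values that deserve a manual look before the processed data are used.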



More information: Robert Mesibov, An audit of some processing effects in aggregated occurrence records, ZooKeys (2018). DOI: 10.3897/zookeys.751.24791
Journal information: ZooKeys

Provided by Pensoft Publishers
Citation: Audit finds biodiversity data aggregators 'lose and confuse' data (2018, April 23) retrieved 26 April 2019 from https://phys.org/news/2018-04-biodiversity-aggregators.html