Online biodiversity databases audited: 'Improvement needed'
The records checked were for native Australian millipede species and were published online by the Global Biodiversity Information Facility (GBIF) and the Atlas of Living Australia (ALA). GBIF and ALA obtain most of their records from cooperating museums but disclaim any responsibility for errors in museum databases, instead warning users that the data may not be accurate or fit for purpose.
The auditing was done voluntarily by Dr Bob Mesibov, who is a millipede specialist and a research associate at the Queen Victoria Museum and Art Gallery in Launceston, Tasmania.
The audit found duplicated records and other bookkeeping problems, as well as errors in scientific nomenclature and in the locations and dates of specimen collections. Location errors were particularly common: 15% of a 'best data' subset of the records were placed at least 5 km from the correct locality.
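For readers who want to see what checks of this kind can look like, here is a minimal sketch in Python. It is not Dr Mesibov's method, which this article does not describe. The record fields (`id`, `species`, `date`, `lat`, `lon`), the function names and the `verified` lookup of independently confirmed coordinates are all assumptions made for illustration. Only the 5 km threshold comes from the audit.

```python
import math
from collections import Counter

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, adequate for a rough check


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def find_duplicates(records):
    """Return the (species, date, lat, lon) keys that occur more than once.

    Hypothetical rule: records sharing all four fields count as duplicates.
    """
    counts = Counter((r["species"], r["date"], r["lat"], r["lon"]) for r in records)
    return [key for key, n in counts.items() if n > 1]


def flag_location_errors(records, verified, threshold_km=5.0):
    """Return (id, offset_km) for records at least threshold_km from a verified point.

    `verified` maps a record id to a (lat, lon) pair confirmed independently,
    for example from a specimen label. Records without an entry are skipped.
    """
    flagged = []
    for r in records:
        ref = verified.get(r["id"])
        if ref is None:
            continue
        offset = haversine_km(r["lat"], r["lon"], ref[0], ref[1])
        if offset >= threshold_km:
            flagged.append((r["id"], offset))
    return flagged
```

A check like this only flags candidates. Deciding which coordinates are actually correct still requires going back to the source, as the audit did by querying the museums.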
"The data quality problem is not trivial," said Dr Mesibov. "On the one hand, the data aggregators like GBIF and ALA are telling the world that they offer one-stop shops for data that can, for example, greatly assist decision-making in conservation and land management. On the other hand, the aggregators are not working to ensure that the data they publish are correct. And bad data aren't very useful."
Dr Mesibov contacted museums directly to alert them to errors he found and to query inconsistencies in the occurrence records. The museums concerned have edited their records and will pass corrections on to GBIF and ALA when updating their contributions to the online databases. Error-correcting at the level of GBIF and ALA is slow and piecemeal, says Dr Mesibov, and should not have to rely on interested outsiders like himself.
"Data cleaning isn't rocket science," said Dr Mesibov. "The aggregators could do much more checking and could collaborate with their providers in sorting out inconsistencies and fixing at least some of the errors. At the moment, that doesn't seem to be happening, so GBIF and ALA users need to take the aggregators' warnings about data quality very seriously."