Green digitization: Botanical collections data answer real-world questions
Even as botany has moved firmly into the era of "big data," some of the most valuable botanical information remains inaccessible for computational analysis, locked in physical form in the orderly stacks of herbaria and museums. Herbarium specimens are plant samples collected in the field, then dried and stored with labels recording the species, the date and location of collection, and other information such as habitat descriptions. The detailed historical record these specimens keep of species occurrence, morphology, and even DNA provides an unparalleled data source for addressing a variety of morphological, ecological, phenological, and taxonomic questions. Efforts are now underway to digitize these data and make them easily accessible for analysis.
Two symposia were convened to discuss the possibilities and promise of digitizing these data: one at the Botanical Society of America's 2017 annual meeting in Fort Worth, Texas, and another at the XIX International Botanical Congress in Shenzhen, China. The proceedings of those symposia have been published as a special issue of Applications in Plant Sciences; the articles discuss a range of methods and remaining challenges for extracting data from botanical collections, as well as applications for collections data once they are digitized. Many of the authors contributing to the issue are involved in iDigBio (Integrated Digitized Biocollections), a new "national coordinating center for the facilitation and mobilization of biodiversity specimen data," as described by Dr. Gil Nelson, a botanist at Florida State University and coeditor of this issue.
iDigBio is funded by the U.S. National Science Foundation's Advancing Digitization of Biodiversity Collections initiative, and has already digitized about 50 million herbarium specimens. According to Dr. Nelson, "A primary significance has been community building among biodiversity scientists, curators, and collections managers, and developing and disseminating recommended practices and technical skills for getting these jobs done." The challenges of digitizing these data are formidable, said Dr. Nelson, and include "developing computer vision techniques for making species determinations and scoring phenological traits, and developing effective natural language processing algorithms for parsing label data."
But as the papers in this issue show, steady progress is being made in developing methods to address these challenges. Nelson et al. (2018) and Contreras (2018) address more nuts-and-bolts issues of data management, the former discussing the need for globally unique IDs for herbarium specimens, and the latter providing a workflow for digitizing new fossil leaf collections. Botella et al. (2018) review and discuss the prospects for "computer vision" aided by deep-learning neural networks that, while in their infancy, could eventually identify species from variable images. Yost et al. (2018) offer a protocol for digitizing data on phenology (the timing of events such as flowering or fruiting) from herbarium specimens.
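To make the idea of globally unique specimen IDs concrete, here is a minimal sketch in Python. It is an illustration only, not the scheme proposed by Nelson et al. (2018): the `SpecimenRecord` fields and the `find_duplicate_guids` helper are hypothetical, and random (version 4) UUIDs are assumed as one possible kind of identifier.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class SpecimenRecord:
    # Hypothetical fields mirroring what a herbarium label typically records
    species: str
    collection_date: str  # ISO 8601 date string, e.g. "1835-06-14"
    locality: str
    # Assumption: a random (version 4) UUID, assigned once at record creation
    guid: str = field(default_factory=lambda: str(uuid.uuid4()))


def find_duplicate_guids(records):
    """Return the set of identifiers that appear on more than one record."""
    seen, duplicates = set(), set()
    for record in records:
        if record.guid in seen:
            duplicates.add(record.guid)
        seen.add(record.guid)
    return duplicates
```

Because the identifier is generated independently of the label contents, two sheets with identical species, date, and locality still receive different IDs, and a check like `find_duplicate_guids` can flag records that were accidentally entered twice when datasets are merged.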
These digitization methods can help unlock valuable herbarium data to address a range of questions. James et al. (2018) discuss how digitized herbarium specimens can be used to show how plant species have responded to global change, for example by using location and time data to model shifts in range. Cantrill (2018) discusses how the Australasian Virtual Herbarium database has been used for ecological and other research. Thiers and Halling (2018) extend the applications to the fungal world, showing how herbarium data can be used as a baseline to determine the distribution of macrofungi in North America. Furthermore, digitization efforts can have real payoff in public perception; Dr. Nelson sees an "increasing presence of biodiversity data and museums in the popular press, which has raised the profiles of herbaria and other collections for the general public." Along these lines, Konrat et al. (2018) show how digital herbarium data can be used to engage citizen scientists.
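As a rough illustration of how location and time data from digitized specimens might hint at a range shift, the sketch below compares the mean collection latitude before and after a chosen year. It is a deliberately simplified assumption, not the modeling approach of James et al. (2018): the function name `mean_latitude_shift` is hypothetical, the records are made up, and a real analysis would need to account for uneven collecting effort across time and space.

```python
from statistics import mean


def mean_latitude_shift(records, split_year):
    """Difference in mean collection latitude (in degrees) between specimens
    collected in or after split_year and those collected before it.

    records is an iterable of (year, latitude) pairs.
    """
    early = [lat for year, lat in records if year < split_year]
    late = [lat for year, lat in records if year >= split_year]
    if not early or not late:
        raise ValueError("need records on both sides of split_year")
    return mean(late) - mean(early)


# Made-up (year, latitude) pairs, for illustration only
toy_records = [(1900, 40.0), (1910, 41.0), (1990, 42.0), (2000, 43.0)]
shift = mean_latitude_shift(toy_records, split_year=1950)
print(f"Mean latitude shift: {shift:+.1f} degrees")
```

A positive value means the later collections were made, on average, farther north than the earlier ones. On its own that is only a prompt for closer study, since a change in where botanists collected would produce the same signal.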
Through centuries of painstaking collection and cataloguing, botanists have created a unique and irreplaceable bank of data in the tens of millions of herbarium specimens worldwide. But converting a dried, pressed plant specimen with a handwritten label from 1835 into a format that you can fit on a USB stick is no small trick. Using creative thinking, sophisticated methodology, and hard work, these scientists are bringing the valuable information locked in herbarium specimens into the digital age.