
Excavating observational data buried in botanical literature
Iliyana Kuzmova

It can be surprisingly hard to map the global distribution of a species and even harder to understand how it has changed with time. Even though many millions of observations are available on the Global Biodiversity Information Facility, many more are buried in books, papers, on specimens and in databases. What if all these observations were available to us? Wouldn’t this help reveal species distributions and how they have changed?

To address interoperability issues, the pro-iBiosphere project is conducting various pilots. One of these pilots is testing ways in which information from legacy texts can be digitized to consolidate data. We plan to use these data to reveal how the distribution of one particular plant species has changed over time: Chenopodium vulvaria L., a small annual weed commonly associated with man-made disturbance. Due to its striking smell of rotten fish, C. vulvaria (also known as "Stinking Goosefoot") is easily identified and unlikely to be confused with any other plant.

While it is considered an invasive weed in California (USA) and Victoria (Australia), in northern Europe it is a declining species and has been included on the Red Lists of several countries. No one knows why this species is decreasing in some places and increasing in others. Indeed, we don’t even really know what its complete distribution is, let alone the rates of spread and decline.

To convert all the paper records of Chenopodium vulvaria into data, we first need to convert them to digital text. Sometimes this can be done with Optical Character Recognition (OCR), but in cases where old-fashioned fonts have been used, the text can instead be transcribed in Wikisource, which is more reliable but slower than OCR. Once we have the digital text, we mark it up using the GoldenGate Editor. This tool allows us to mark up the text semantically so that its meaning is explicit. These data can then be loaded into a database, georeferenced and studied.
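As a rough illustration of the step from marked-up text to database-ready records, the following Python sketch reads a small XML fragment and turns each observation into a structured record. The element names (`treatment`, `observation`, `taxon`, `locality`, `year`) and the sample values are invented for this example. They are not the markup schema of the GoldenGate Editor, nor records from the pilot.

```python
import xml.etree.ElementTree as ET

# Hypothetical marked-up text. The element names and the values are
# invented for illustration; they are not the GoldenGate schema and
# not real observations from the pilot.
MARKED_UP = """
<treatment>
  <observation>
    <taxon>Chenopodium vulvaria</taxon>
    <locality>Example locality A</locality>
    <year>1885</year>
  </observation>
  <observation>
    <taxon>Chenopodium vulvaria</taxon>
    <locality>Example locality B</locality>
    <year>1902</year>
  </observation>
</treatment>
"""


def extract_observations(xml_text):
    """Return one dict per <observation> element found in the markup."""
    root = ET.fromstring(xml_text)
    records = []
    for obs in root.iter("observation"):
        records.append({
            "taxon": obs.findtext("taxon", default="").strip(),
            "locality": obs.findtext("locality", default="").strip(),
            "year": int(obs.findtext("year")),
        })
    return records


records = extract_observations(MARKED_UP)
for record in records:
    print(record["year"], record["taxon"], record["locality"])
```

Records in this form could then be loaded into a database table and georeferenced by looking up each locality, as described above.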

Records of biodiversity are, by nature, patchy and biased both spatially and temporally. They are fortuitous accounts of species collected by many different individuals over long periods for a multitude of reasons. Just like fossil hunters, we only get snippets of information from different places and times. Using these pieces of evidence we can reconstruct the situation of the past through statistical techniques. Undoubtedly, the more information we can gather the more reliable our reconstructions will be. So far, we have gathered over 2000 dated observations of Chenopodium vulvaria from hundreds of books, databases and herbaria. Ultimately, we aim to have the best possible set of observations for this species.

The website of the pilot project is a portal to all the data we have gathered so far on Chenopodium. It is a work in progress, so please do not be surprised if you find gaps in the information. If you happen to have information on Chenopodium, we hope you might consider contributing it.

Article written by Quentin Groom (National Botanic Garden, Belgium)

Also collaborating on the pilot project are Patricia Kelbert, Sabrina Eckert & Susy Fuentes (Botanischer Garten und Botanisches Museum Berlin-Dahlem)




This project has received funding from the European Union’s Seventh Framework Programme for research, technological development and demonstration under grant agreement No 312848