Soraya Sierra*, Rutger Vos* (Naturalis Biodiversity Center)
From 17 to 21 March 2014, software developers and taxonomists came together in Leiden, the Netherlands, for the Biodiversity Data Enrichment Hackathon: a week of intensive, collaborative software development to address the challenges, and highlight the opportunities, in the enrichment of biodiversity data. The event had two goals:
To facilitate re-use and enhancement of biodiversity knowledge by a broad range of stakeholders, such as ecologists and niche modelers.
To foster a community of experts in biodiversity informatics and to build human links between research projects and institutions.
The Hackathon brought together 37 participants (including developers and taxonomists, i.e. scientific professionals who gather, identify, name and classify species) from 10 countries: Belgium, Bulgaria, Canada, Finland, Germany, Italy, the Netherlands, New Zealand, the UK, and the US. The participants brought expertise in processing structured data, text mining, development of ontologies, digital identification keys, geographic information systems, niche modeling, natural language processing, provenance annotation, semantic integration, taxonomic name resolution, web service interfaces, workflow tools, and visualization.
The Biodiversity Data Enrichment Hackathon followed a use-case-driven model, i.e. effort during the Hackathon was prioritized on the basis of compelling end-user scenarios that could be enabled by the combined contributions of people who would not otherwise collaborate. Most use cases and exemplar data were provided by taxonomists. The suggested use cases resulted in nine breakout groups addressing three main themes: (i) mobilizing heritage biodiversity knowledge; (ii) formalizing and linking concepts; and (iii) interoperability between service platforms.
Beyond deriving prototype solutions for each use case, areas of insufficiency were discussed and are being pursued further. It was striking how many possible applications for biodiversity data there were and how quickly solutions could be put together when the normal constraints to collaboration were broken down for a week. At the same time, mobilizing biodiversity knowledge from its silos in heritage literature and natural history collections will continue to require formalization of the concepts (and the links between them) that define our research domain, as well as increased interoperability between the software platforms that operate on these concepts.
The tangible outcomes of the Hackathon are finding sustainable homes in the appropriate code bases (e.g. the code bases for the CDM platform, the Plazi server, the BHL server) and registries and repositories (e.g. the BiodiversityCatalogue, the PyPI index, the NCBO BioPortal), or form the basis of proofs-of-concept for scientific publications and project proposals. The main intangible outcomes of the event are turning out to be the fostering of a community of experts in biodiversity informatics and the strengthened human links between research projects and institutions. The event also demonstrated both the ongoing need for data normalization and integration, e.g. through the application of ontologies, and the opportunities for innovative research that such integration will afford.
Additional information on the Hackathon is available here. The outcomes of the Hackathon will be reported in the Biodiversity Data Journal (May 2014 issue) and presented during the pro-iBiosphere final event.