
The Coordination and Support Action pro-iBiosphere will come to an end on 31 August 2014. The two-year project was launched to investigate ways to increase the accessibility of biodiversity data, improve the efficiency of its curation and broaden the user base of biodiversity data consumers and applications. Ten of its major outcomes have been summarised in the "pro-iBiosphere final brochure".

The project delivered a series of recommendations on various pressing topics to the wider biodiversity informatics community: for instance, on how to improve the use of digital infrastructures among taxonomists, and on how to address barriers to the open exchange of biodiversity knowledge that arise from European laws, in particular European legislation on copyright and database protection rights. The recommendations have been documented in various pro-iBiosphere deliverables (available here).

The project conducted five pilots and organised a total of seven meetings. The enthusiasm, involvement and breadth of community participation in these meetings were very impressive! The pro-iBiosphere final event took place from 10 to 12 June 2014 at Bouchout Castle (Botanic Garden Meise, Belgium) and was attended by more than 75 people. Activities organised during the event included a Workshop on the Biodiversity Catalogue, Demonstrations of the project pilots, Demonstrations of the outcomes of the Data Enrichment Hackathon, a Training on Wikimedia, a Poster session and the Final Conference.

A major highlight of the Final Conference was the official launch ceremony of the Bouchout Declaration for Open Biodiversity Knowledge Management. At present (August 2014), more than 170 institutions and 90 organisations have signed the Declaration. For more information, please see the news items "The Bouchout Declaration: A commitment to open science for better management of nature" (published below) and "The Bouchout Declaration: A contribution from the biodiversity community to Open Digital Science" (published on the Digital Agenda for Europe website).

The conference proceedings, including an event report (detailing the statistics and outputs of the Final Conference), a Storify (i.e. a collection of tweets and pictures) and pictures, are available here.

It has been a pleasure working with all of you!

Soraya Sierra

pro-iBiosphere Project Leader


The 5-year EU project Securing the Conservation of biodiversity across Administrative Levels and spatial, temporal, and Ecological Scales (SCALES) came to an end in July 2014, resulting in a first-of-its-kind description of the challenges that arise in protecting biodiversity across different scales.
A wide range of practical methods and recommendations to improve conservation at regional, national and supranational scales are included in a book published as a synthesis of the project outcomes. The book "Scaling in Ecology and Biodiversity Conservation" was published in advanced open access via Pensoft Publishers' Advanced Books platform. This innovative format, aimed at accelerating data publishing, mining, sharing and reuse, offers a range of semantic enhancements to book content, including content from external sources.
Results are also presented in an easy-to-use interactive SCALETOOL, specifically developed for the needs of policy- and decision-makers. The tool also provides access to a range of biodiversity data and driver maps compiled or created in the project.
Human actions, motivated by social and economic driving forces, generate various pressures, such as habitat loss and fragmentation, climate change, land-use-related disturbance patterns, or species invasions, that have an impact on biodiversity.
Each of these factors acts at characteristic scales, and the scales of social and economic demands, of environmental pressures, of biodiversity impacts, of scientific analysis, and of governmental responses do not necessarily match. However, management of the living world will be effective only if we understand how problems and solutions change with scale.
'The book and the tool are the first of their kind and will be of great help to everyone concerned with the conservation of biodiversity. They provide ideas on how to handle complex issues of scaling in applied and theoretical environmental studies,' says the chief editor, Prof. Klaus Henle.
The book aims to bundle the main results of SCALES in a comprehensive manner and to present them in a way that is usable not only by scientists but also by people making decisions in administration, management, policy or even business and NGOs, that is, people who are more interested in the "practical" side of this issue.
Guidelines, practical solutions and special tools are also presented in a dedicated web-based portal, SCALETOOL, which brings together scientific outcomes scattered widely across the scientific literature.
Original Source:
Henle K, Potts S, Kunin W, Matsinos Y, Simila J, Pantis J, Grobelnik V, Penev L, Settele J (Eds) (2014) Scaling in Ecology and Biodiversity Conservation. Advanced Books: e1169. doi: 10.3897/ab.e1169

Creative Commons, a nonprofit organization that enables the sharing and use of creativity and knowledge through free legal tools, has just signed the Bouchout Declaration, in line with its vision of and commitment to open science and data.

As creator and steward of the legal and technical infrastructure that allows open licensing of content, the nonprofit organization fully supports the Declaration, which calls for the use of licenses or waivers that grant all users a free right to copy, use, distribute, transmit and display the work publicly, as well as to build on the work and to make derivative works, subject to proper attribution consistent with community practices.

Creative Commons, which participated in the activities that led to the Joint Declaration of Data Citation Principles and advocates the use of persistent identifiers to allow discovery and attribution of resources, welcomes the Declaration's call to track the use of identifiers in links and citations, a method that ensures that sources and suppliers of data are credited for their contributions. It equally supports the Declaration's call for persistent identifiers for data objects and physical objects such as specimens, images and taxonomic treatments, with standard mechanisms that take users directly to content and data.

Creative Commons, which works assiduously to promote open policies and practices, naturally supports the Declaration's call for policy developments that foster free and open access to biodiversity data.

If you too believe that open biodiversity information is crucial for science and society, join the movement and sign the Bouchout Declaration!




The next BioVeL meeting will take place in Paris, France, on November 13, 2014.

This one-day event, entitled "BioVeL: In Practice and in Future", aims to present the achievements, experience gained and lessons learnt from the BioVeL initiative, which has been building a virtual laboratory for biodiversity research. The event will also provide an opportunity to introduce BioVeL's plans for the future.

BioVeL is a pilot implementation of some of the core ideas from the LifeWatch Preparatory Phase. In the past three years the project has been working with the biodiversity research community to construct, test, and revise some essential elements of a robust e-infrastructure for biodiversity and ecosystem research.

The event will be structured around the three key goals that encapsulate the BIH2013 initiative:

• Integration: Making better use of existing data and tools.
• Cooperation: Working together towards a global biosphere model.
• Promotion: Informatics leadership to serve the needs of science and policy.

For more information and registration, click here.

For any additional information, please contact: [email protected].

Find out more about the BioVeL project at




The article published on July 7, 2014 on the European Commission's Digital Agenda website presents the Bouchout Declaration, launched by the project on June 12, 2014, as a major contribution from the biodiversity community to Open Digital Science.

The article stresses that only three weeks after its launch, this unprecedented declaration had already been endorsed by more than 70 institutions and 140 individuals from 40 countries around the globe. A total success!

By signing, organizations and individuals encourage an overarching approach to Biodiversity Knowledge Management based on the principles of Open Access, the use of unique stable identifiers for data objects, resolution mechanisms that take users directly to content and data, and registries that allow the discovery, access and re-use of data, as well as an ongoing dialogue to refine the concept of Open Biodiversity Knowledge Management.

The Bouchout Declaration has been translated into 8 languages, all available online on the Bouchout Declaration website.

Follow the Bouchout Declaration on Twitter: @bouchoutdec

Read the full article by the European Commission online at

The pro-iBiosphere Final Conference took place on June 12, 2014 at Bouchout Castle in the Botanic Garden Meise, north of Brussels. More than 75 participants from the biodiversity and e-infrastructure communities joined the active discussions, (i) reviewing project results and the key areas for improvement in the design and implementation of an Open Biodiversity Knowledge Management System (OBKMS) and (ii) providing recommendations on future research needs for the preparation of the next Work Programme (WP 2016-2017) of the EU Horizon 2020 Framework Programme for Research and Innovation.

One of the major highlights of the conference was the official launch ceremony of the Bouchout Declaration on Open Biodiversity Knowledge Management System (OBKMS), at which key biodiversity institutions signed it together, and the release of the Bouchout Declaration website. Following the event, a total of 66 organizations and 116 individuals endorsed the Declaration. The Declaration has been translated into 8 languages, available online on the dedicated website.

The Final Conference was the last of a series of activities collectively called the pro-iBiosphere Final Event, which also included (i) the MS24 Model Evaluation Workshop held on June 9-10, (ii) a Training on Wikimedia, (iii) a Biodiversity Catalogue (BioVeL) Workshop, (iv) Demonstrations of the project pilots, (v) Demonstrations of the outcomes of the pro-iBiosphere Data Enrichment Hackathon, and a Poster session organised during the coffee breaks on June 11.

The proceedings of the Final Conference (updated agenda with presentations, final attendee list and pictures) with conclusions of each session are available on the wiki page.

An event report detailing the Final Conference objectives, programme, promotion, audience and outputs has also been produced and is available here.

For any additional information, please contact us at [email protected].




The ICT Proposers' day 2014 (#ICTpropday) is a networking event organised by the European Commission and will be held in Florence, Italy on the 9th and 10th of October 2014.

This event is specifically dedicated to networking and promoting research and innovation in the field of Information and Communication Technologies. It will focus on networking for the Horizon 2020 Work Programme 2015.

It is free of charge and offers an exceptional opportunity to build quality partnerships, as it will connect academia, research institutes, industrial stakeholders, SMEs and government actors from all over Europe. Registration for the event is now open.

Find out more on ICT Proposers’ Day website. Register now here.



The BioVeL project, which supports research on biodiversity by offering computerized tools ("workflows") to process large amounts of data from cross-disciplinary sources, is proud to announce the release of its Spring 2014 newsletter.

The 5th BioVeL newsletter focuses on the latest developments with the e-laboratory, especially with its new portal. It also addresses the sustainability of the project.

pro-iBiosphere and the BioVeL project have been in close contact in recent months: pro-iBiosphere became a "BioVeL friend", supporting the objectives of the BioVeL project, and BioVeL participated in the pro-iBiosphere Final Event by organising a Workshop on the Biodiversity Catalogue on June 11, 2014 and financially sponsoring catering and lunch on that day.

The Spring 2014 BioVeL newsletter is available online here.

The Bouchout Declaration targets the need for data to be openly accessible, so that scientists can use the information for new types of research and to provide better advice. Currently, data may be prevented from becoming open or usable because of copyright or concerns of the institutions that hold the data, or because they are not in a form that can easily be managed by computers. The Declaration identifies mechanisms to structure open data so that they can be drawn together, queried and analysed on a much larger scale than was previously possible.
The Bouchout Declaration allows the community to demonstrate its support for data to be openly available. It extends previous efforts, like the Berlin Declaration, to the biodiversity sciences. The objective is to promote free and open access to data and information about biodiversity by people and computers. This will help to bring about an inclusive and shared knowledge management infrastructure that will inform our decisions so that we respond more effectively to the challenges of the present and future.
"Biodiversity research is painstakingly built up from the study of billions of specimens over hundreds of years from every region of the Earth. We are now in a position to share this hard-won knowledge freely with everyone who wishes to read, extend, interconnect, or apply it. We should do so as soon as humanly possible. If we do, we will not only make biodiversity research more accessible, discoverable, retrievable, and useful. We will make it more useful for the critical purpose of preserving biodiversity itself," comments Peter Suber from the Harvard Open Access Project on the significance of the declaration.
International initiatives like the Global Biodiversity Information Facility (GBIF) support science and society by gathering, and helping scientists to analyse, knowledge acquired by past generations and from streams of new observations and technologies. GBIF's Executive Secretary Donald Hobern commented: "This knowledge cannot be recreated and needs to be used and reinterpreted over time. We need to manage it as a precious resource of value to the whole human race. This is why Open Biodiversity Knowledge Management matters."
The Bouchout Declaration emerged from the pro-iBiosphere project (a Coordination and Support Action funded through the European Union's Seventh Framework Programme (FP7/2007-2013) under Grant Agreement №312848) in response to the need for better access to biodiversity information. The inaugural ceremony of the Bouchout Declaration (including the official launch of the website) will take place on 12 June 2014 during the final event of the project.
"Museum collections around the world hold invaluable biodiversity information that is often hidden in dark rooms. Digitizing and providing free and open access to these resources through an Open Biodiversity Knowledge Management System in Europe is crucial for the advancement of biodiversity research and better management of nature for a sustainable future. We are happy to be one of the first institutions to endorse the Declaration," concluded Prof. Johannes Vogel, Director General of the Museum für Naturkunde, Berlin.
Universities, research institutions, funding agencies, foundations, publishers, libraries, museums, archives, learned societies, professional associations and individuals who share the vision of the Bouchout Declaration are invited to join the signatories. If you wish to join the list of signatories or would like to receive additional information please email [email protected].
Among the initial signatories are some of the world's leading natural history museums, botanical gardens, and scientific networks.



The Bouchout Declaration is a major output from the pro-iBiosphere project.

The Bouchout Declaration is an opportunity for those organizations, initiatives and individuals who create, manage and use biodiversity information, and who believe in the opportunities and potential of the big data world, to declare their support of the Open Access agenda.

By endorsing the principles of Open Access and discoverability of data, the signatories strengthen the arguments that will be put to governments and funding bodies, and will accelerate the maturation and evolution of Open Biodiversity Knowledge Management, making biodiversity sciences more relevant, innovative, and responsive to societal needs.

To date, 73 signatories from 26 countries share the vision expressed in the Bouchout Declaration.

The official launch of the Bouchout Declaration took place today during the pro-iBiosphere Final Conference at Bouchout Castle, Botanic Garden Meise, Belgium. The official website was unveiled on this occasion.

If you also share the vision of the Bouchout Declaration, we invite you to sign this document here.





The new Advanced Books platform of Pensoft opens new horizons for semantic book publishing
Easy access to legacy data collected over hundreds of years of exploration of nature, for anyone anywhere in the world, from the convenience of their own computer? It may sound futuristic, but a brand new pilot shows that this is possible here and now.

The new workflow demonstrates a re-publication of a volume of Flora Malesiana as a semantically enriched HTML edition on the newly launched Advanced Books publishing platform. The platform was demonstrated today within the EU-funded pro-iBiosphere project, which supported, in part, the re-publication of Flora Malesiana.

When Linnaeus was laying the foundations of taxonomy as a science in his Species Plantarum and Systema Naturae, he probably did not imagine that his methods of publishing natural history data would remain almost unchanged for more than 270 years! Most of the information on the living world is still locked in paper-based legacy literature, especially in fundamental regional treatises such as the Flora, Fauna and Mycota series, which are hardly accessible to readers despite the dramatic changes in publishing technologies over the last decade.

The new pilot, developed by Pensoft Publishers in cooperation with the Naturalis Biodiversity Center, Plazi, and the Botanischer Garten und Botanisches Museum Berlin-Dahlem (BGBM), demonstrates how a fundamental book in natural history can start a new life with Advanced Books. A re-publication of the Flora of Northumberland & Durham, first published in 1838, will be the next to appear, as a result of a collaboration between Botanic Garden Meise (National Botanic Garden of Belgium) and Pensoft.

Flora Malesiana is a systematic account of the flora of Malesia, the plant-geographical unit spanning six countries in Southeast Asia: Indonesia, Malaysia, Singapore, Brunei Darussalam, the Philippines, and Papua New Guinea. The plant treatments are not published in systematic order but as they are completed through the scientific efforts of some 100 collaborators all over the world.

With the new platform, such scientifically important historical monographs, enriched with additional information from up-to-date external sources related to organisms' names, species treatments, information on their ecology, distribution and conservation value, morphological characters, etc., become freely usable for anyone at any place in the world.

Re-publication in advanced open access brings many other benefits of digitization and markup, such as data extraction and collation, distribution and re-use of content, and archiving of different data elements in relevant repositories.

"Advanced Books will bring many outstanding scientific monographs to a new life; however, the platform is not restricted to e-publishing our legacy literature," commented Prof. Lyubomir Penev, Managing Director of Pensoft. "New books are most welcome on the platform, joining their historical predecessors in an open, common, human- and machine-readable data space for the benefit of future researchers and society in general," concluded Prof. Penev.
Original Source:
de Wilde W (2014) Flora Malesiana. Series I - Seed Plants, Volume 14. Myristicaceae. Advanced Books: e1141. doi: 10.3897/ab.e1141


As part of the series of final project outputs, a new pro-iBiosphere article published in the open access journal ZooKeys assesses the need and future for building an Open Biodiversity Knowledge Management System (OBKMS): the infrastructure for a system that will intelligently manage and integrate digital biodiversity information.

Background. The 7th Framework Programme for Research and Technological Development is helping the European Union to prepare for an integrative system for the intelligent management of biodiversity knowledge. The infrastructure that is envisaged, and that will be further developed within the "Horizon 2020" Programme, aims to provide open and free access to taxonomic information to anyone with a requirement for biodiversity data, without the need for individual consent of other persons or institutions. Open and free access to information will foster re-use and improve the quality of data, will accelerate research, and will promote new types of research. Progress towards the goal of free and open access to content is hampered by numerous technical, economic, sociological, legal, and other factors. The present article addresses barriers to the open exchange of biodiversity knowledge that arise from European laws, in particular European legislation on copyright and database protection rights.

We present a legal point of view as to what will be needed to bring distributed information together and facilitate its re-use by data mining, integration into semantic knowledge systems, and similar techniques. We address exceptions and limitations of copyright or database protection within Europe, and we point to the importance of data use agreements. We illustrate how exceptions and limitations have been transposed into national legislation in some European states in ways that create inconsistencies that impede access to biodiversity information.

Conclusions. The legal situation within the EU is unsatisfactory because there are inconsistencies among states that hamper the deployment of an open biodiversity knowledge management system. Scientists within the EU who work with copyright protected works or with protected databases have to be aware of regulations that vary from country to country. This is a major stumbling block to international collaboration and is an impediment to the open exchange of biodiversity knowledge. Such differences should be removed by unifying exceptions and limitations for research purposes in a binding, Europe-wide regulation.

Original Source:

Egloff W, Patterson DJ, Agosti D, Hagedorn G (2014) Open exchange of scientific knowledge and European copyright: The case of biodiversity information. ZooKeys 414: 109–135. doi: 10.3897/zookeys.414.7717



Do not miss this unique opportunity and join us in Meise (Brussels) from June 10-12 to participate in this major event devoted to making fundamental biodiversity data digital, open and re-usable (organized by the pro-iBiosphere project funded by the European Commission DG Connect)!

This pro-iBiosphere Final Event is taking place at a crucial time for the development of new instruments for the future needs of biodiversity research through the preparation of the next WP 2016-2017 of EU Horizon 2020 Framework Programme for Research and Innovation.

Within this context, one of the main aims of the conference will be to provide key recommendations and input from biodiversity experts to the European Commission.

If you plan to attend and have not yet registered, please register here (free of charge).

Follow and contribute to the Final Event discussions by tweeting with the hashtag #pibmei!

For further information on this event please visit the dedicated wiki page here or contact us at [email protected].

by Stephanie Morales
The pro-iBiosphere project supported by the European Commission (DG CONNECT) through its FP7 research funding programme has the pleasure to invite you to join its Final Event.
During its two-year duration, pro-iBiosphere contributed to making fundamental biodiversity data digital, open and re-usable. The achievements of the project will be presented in a series of activities (workshops, trainings, demonstrations and a Final Conference) that will take place from Tuesday the 10th to Thursday the 12th of June 2014 at the Bouchout Castle in the Botanic Garden Meise, in Meise (Brussels), Belgium.
The event wiki page, available here, has recently been updated with additional information on the various activities organised. The Final Conference agenda now includes high-level speakers from around the world, including (i) officials from the European Commission DG Connect and the US National Academy of Sciences, (ii) representatives of botanic gardens, natural history museums and other biodiversity initiatives, and (iii) experts and (iv) researchers specialized in biodiversity informatics and environmental/natural science.
One of the key objectives of this series of events is to provide key recommendations and input from biodiversity experts for the preparation of the next WP 2016-2017, as specifically requested by the European Commission.
Registration numbers have already reached a good level, so if you plan to attend and have not yet registered, we strongly recommend that you do so here as soon as possible (room capacity is limited).
For further information on this event (agenda, concept & objectives, registration) please visit the Event wiki page or contact us at [email protected].


The pro-iBiosphere Final Event will take place on 9 – 13 June 2014 at Bouchout Castle in the Botanic Garden Meise, Brussels. On the third day of the meeting, a special session is dedicated to demonstrating the pro-iBiosphere pilots.
During this session, the task and pilot leaders will demonstrate the tools and workflows developed or improved in the course of the project. The demonstration will be interactive and will allow for discussions, real-time tests and consultations on possible implementations by interested stakeholders. The pilots and demos planned so far are:
Interoperability of taxon treatments
In the past, taxonomic information has been published in numerous scattered outlets and in different formats. Producing a taxonomic revision, or a synthesis such as a flora or fauna, required that the relevant text be located and retyped manually. The current pilot demonstrates a greatly accelerated workflow that takes advantage of the informatics developments of pro-iBiosphere. The workflow locates, identifies, and enhances data included in treatments from both legacy and newly published taxonomic literature, facilitating discovery, analysis, and reuse through the Plazi Treatment Repository (PTR).
The workflow includes the following steps:
  • Step 1: Convert printed taxonomic articles/monographs to digital text format.
  • Step 2a: Mark up generic document features and domain-specific information (taxon treatments) and store the results at Plazi; and also
  • Step 2b: Export newly published treatments marked up during the editorial process (for example, in the journals ZooKeys, PhytoKeys and MycoKeys).
  • Step 3: Browse, search, export and re-use treatments coming from different sources.
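The steps above can be illustrated with a toy Python sketch. It is not the Plazi software and makes several labelled assumptions: Step 1 (digitisation) is taken as already done, so the input is plain text; "mark-up" is reduced to a hypothetical dictionary-based rule in which a line whose first two words form a known taxon name opens a new treatment; and an in-memory `TreatmentStore` (a hypothetical class) stands in for the Plazi Treatment Repository. The sample headings and placeholder text are illustrative only.

```python
from dataclasses import dataclass, field


@dataclass
class Treatment:
    name: str                    # taxon name recognised at the start of the treatment
    source: str                  # provenance, e.g. "legacy volume" or "journal export"
    text: list[str] = field(default_factory=list)


def mark_up(digital_text: str, source: str, known_names: set[str]) -> list[Treatment]:
    """Toy Step 2a: a line whose first two words form a known taxon name opens
    a new treatment; every following non-empty line is attached to it."""
    treatments: list[Treatment] = []
    for line in digital_text.splitlines():
        words = line.split()
        candidate = " ".join(words[:2])
        if len(words) >= 2 and candidate in known_names:
            treatments.append(Treatment(name=candidate, source=source))
        elif treatments and line.strip():
            treatments[-1].text.append(line.strip())
    return treatments


class TreatmentStore:
    """Hypothetical in-memory stand-in for a treatment repository, indexed by name."""

    def __init__(self) -> None:
        self._by_name: dict[str, list[Treatment]] = {}

    def store(self, treatments: list[Treatment]) -> None:
        for t in treatments:
            self._by_name.setdefault(t.name, []).append(t)

    def search(self, name: str) -> list[Treatment]:
        # Step 3: treatments from every source come back together.
        return list(self._by_name.get(name, []))


# Assumed list of names to recognise (in practice this would come from a name index).
NAMES = {"Myristica fragrans", "Myristica argentea"}

# Stand-in for Step 1 output: digitised text of a legacy volume.
legacy_text = """Myristica fragrans Houtt.
[description text]
Myristica argentea Warb.
[description text]
"""

# Stand-in for Step 2b: a treatment exported from a newly published article.
journal_text = """Myristica fragrans Houtt.
[newly published treatment text]
"""

store = TreatmentStore()
store.store(mark_up(legacy_text, "legacy volume", NAMES))
store.store(mark_up(journal_text, "journal export", NAMES))

for hit in store.search("Myristica fragrans"):
    print(hit.source, hit.text)
```

The point of the sketch is the shape of the pipeline: once legacy and newly published treatments share one representation, a single search returns both, which is what makes Step 3 possible. Real treatment mark-up is far richer than a two-word name match.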
Streamlining automated registration of taxon names between publishers and registries
The pre-publication registration of taxonomic and nomenclatural acts with registries such as the International Plant Names Index (IPNI), Index Fungorum, MycoBank, and ZooBank involves two main classes of actors: (1) publishers, and (2) registry curators. The publisher takes responsibility for initiating the registration of nomenclatural acts, so that the workflow can follow a common stepwise model:
  • Step 1. An XML message is sent from the publisher to the registry on acceptance of the manuscript, containing the type of act, taxon names, and preliminary bibliographic metadata; the registry stores the data but does not make them publicly available before the final publication date.
  • Step 2a. A response XML report contains the unique identifier of the act as supplied by the registry and/or any relevant error messages.
  • Step 2b. Error correction and de-duplication are performed manually: human intervention on the registry's or the publisher's side (or both).
  • Step 3. Inclusion of registry supplied identifiers in the published treatments (protologues, nomenclatural acts).
  • Step 4. The information in the registry is made publicly accessible upon publication, with a link from the registry record to the article.
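To make the stepwise model more concrete, here is a minimal Python sketch of the message exchange. Everything in it is hypothetical: the XML element names (`registration`, `actType`, `taxonNames`, `preliminaryTitle`) are not the schema of IPNI, Index Fungorum, MycoBank or ZooBank; `ToyRegistry` stands in for a real registry; identifiers are random UUIDs; and de-duplication is reduced to an exact-name check that hands any clash back for manual resolution (Step 2b). The species name and URL are placeholders.

```python
import uuid
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional


def build_registration_message(act_type: str, names: list[str], title: str) -> str:
    """Step 1: publisher -> registry message on manuscript acceptance.
    Element names are hypothetical, not any real registry's schema."""
    root = ET.Element("registration")
    ET.SubElement(root, "actType").text = act_type
    names_el = ET.SubElement(root, "taxonNames")
    for name in names:
        ET.SubElement(names_el, "name").text = name
    ET.SubElement(root, "preliminaryTitle").text = title
    return ET.tostring(root, encoding="unicode")


@dataclass
class Record:
    act_id: str
    xml: str
    public: bool = False                 # embargoed until publication (Step 1)
    article_link: Optional[str] = None   # filled in at Step 4


class ToyRegistry:
    """Hypothetical registry used only to illustrate the workflow."""

    def __init__(self) -> None:
        self.records: dict[str, Record] = {}
        self.registered_names: set[str] = set()

    def receive(self, message: str) -> tuple[Optional[str], list[str]]:
        """Steps 1 and 2a: store privately, reply with an identifier or errors."""
        root = ET.fromstring(message)
        names = [el.text or "" for el in root.iter("name")]
        errors = [f"possible duplicate: {n}" for n in names if n in self.registered_names]
        if errors:
            return None, errors  # Step 2b: resolved manually by curators/publisher
        act_id = str(uuid.uuid4())
        self.records[act_id] = Record(act_id, message)
        self.registered_names.update(names)
        return act_id, []

    def publish(self, act_id: str, article_link: str) -> None:
        """Step 4: make the record public and link it to the article."""
        record = self.records[act_id]
        record.public = True
        record.article_link = article_link


registry = ToyRegistry()
message = build_registration_message("sp. nov.", ["Examplea hypothetica"], "Draft manuscript")
act_id, errors = registry.receive(message)
# Step 3 would happen here: the publisher inserts act_id into the protologue.
if act_id is not None:
    registry.publish(act_id, "https://example.org/article")  # placeholder URL
```

The design keeps each record private until `publish()` is called, mirroring the embargo described in Step 1; a production exchange would also need an agreed schema, authentication and proper error codes.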
Improved cooperation and interoperability of e-infrastructures
Challenges related to the technical interoperability of biodiversity data manifest themselves as competing standards, ambiguous, poor or absent documentation, a lack of stable identifier systems and the absence of semantic interoperability. To improve interoperability between e-infrastructures, stable identifiers for biodiversity collection objects and a global service registry were identified as the two major achievable goals. The use of state-of-the-art digitisation software and tools for literature markup is another important factor.
  • Steps forward 1: Implementation of HTTP-URIs by 8 major institutions for their collection objects by October 2013 and recommendations for further topics to be explored in detail.
  • Steps forward 2: Agreement on the BiodiversityCatalogue as a global registry for biodiversity-related services, with recommendations for improvements so that it can fill this role even better; services can already be registered now.
  • Steps forward 3: Workflow improvement between the Plazi document registry and the Common Data Model (CDM)-based EDIT Platform for Cybertaxonomy. In the course of this work, a markup granularity table was developed. The pro-iBiosphere pilot portals visualize the data results at different stages and show the possibilities for scientists willing to mark up their data. The markup granularity table explains in detail the workload involved and the corresponding gain in output.

Daniel Mietchen (Museum für Naturkunde Berlin)
For two days in February 2014, a pro-iBiosphere workshop on mark-up of biodiversity literature brought together a group of 20 participants at the Museum für Naturkunde in Berlin. In an introductory talk, Rod Page of the University of Glasgow presented the idea of a biodiversity knowledge graph that interlinks the biodiversity literature with the wider biodiversity information landscape. He then discussed a number of use cases for mark-up (namely archiving, display and citation linking) and questioned whether mark-up was actually necessary for identifying nodes and extracting edges of the knowledge graph, which could be achieved by simple indexing. He also discussed collaborative editing and version control with regard to mark-up.
With this introductory presentation having set the stage for discussing mark-up of the biodiversity literature in general terms and from a long-term perspective, the following presentations looked at specific subsets of that literature corpus, at specific use cases, at specific approaches to mark-up, and at the workflows and business models around them. For example, Dimitris Koureas of the Natural History Museum in London discussed how the mark-up of specimen records in the literature could help with the digitization of specimen labels (which are often transcribed in systematic reviews of taxa or collections), and how tracking specimen citations in the literature could make it possible to assess the impact of collections on current and past research. Another perspective was provided by William Ullate of the Biodiversity Heritage Library, who described how BHL is ramping up its efforts on mark-up, including through gamification.
Throughout the workshop, there was a lively discussion, and the individual talks were given not according to a fixed schedule but when the respective topic came up in the discussion. All presentations are linked from the workshop page on the pro-iBiosphere wiki.
Figure 1: The Biodiversity Knowledge Graph. By Roderic Page.

Soraya Sierra*, Rutger Vos* (Naturalis Biodiversity Center)
From 17 – 21 March 2014, software developers and taxonomists came together in Leiden, the Netherlands, to address the challenges, and highlight the opportunities, in the enrichment of biodiversity data by engaging in intensive, collaborative software development: The Biodiversity Data Enrichment Hackathon. The event had two goals:
  1. To facilitate re-use and enhancement of biodiversity knowledge by a broad range of stakeholders, such as ecologists and niche modelers.
  2. To foster a community of experts in biodiversity informatics and to build human links between research projects and institutions.
The Hackathon brought together 37 participants (including developers and taxonomists, i.e. scientific professionals who gather, identify, name and classify species) from 10 countries: Belgium, Bulgaria, Canada, Finland, Germany, Italy, the Netherlands, New Zealand, the UK, and the US. The participants brought expertise in processing structured data, text mining, development of ontologies, digital identification keys, geographic information systems, niche modeling, natural language processing, provenance annotation, semantic integration, taxonomic name resolution, web service interfaces, workflow tools, and visualization.
The Biodiversity Data Enrichment Hackathon followed a use-case-driven model, in which effort was prioritized on the basis of compelling end-user scenarios that could be enabled by the combined contributions of people who would not otherwise collaborate. Most use cases and exemplar data were provided by taxonomists. The suggested use cases resulted in nine breakout groups addressing three main themes: (i) mobilizing heritage biodiversity knowledge; (ii) formalizing and linking concepts; and (iii) interoperability between service platforms.
Beyond deriving prototype solutions for each use case, the groups discussed areas where current data and tools fall short, and these are being pursued further. It was striking how many possible applications for biodiversity data there were, and how quickly solutions could be put together when the normal constraints on collaboration were removed for a week. At the same time, mobilizing biodiversity knowledge from its silos in heritage literature and natural history collections will continue to require the formalization of the concepts (and the links between them) that define our research domain, as well as increased interoperability between the software platforms that operate on these concepts.
The tangible outcomes of the Hackathon are finding sustainable homes in the appropriate code bases (e.g. those of the CDM platform, the Plazi server and the BHL server) and in registries and repositories (e.g. the BiodiversityCatalogue, the Python Package Index (PyPI) and the NCBO BioPortal), or form the basis of proofs of concept for scientific publications and project proposals. The main intangible outcomes of the event are turning out to be a growing community of experts in biodiversity informatics and stronger human links between research projects and institutions. The event also demonstrated both the ongoing need for data normalization and integration, e.g. through the application of ontologies, and the opportunities for innovative research that such integration will afford.
Additional information about the Hackathon is available here. The outcomes of the Hackathon will be reported in the Biodiversity Data Journal (May 2014 issue) and presented during the pro-iBiosphere final event.

Patricia Kelbert1 and Quentin Groom2
The EDIT Platform for Cybertaxonomy is a convenient tool for managing and editing details of specimens and observations, and the BioVeL workflows for data refinement and niche modelling provide a powerful means to clean up and analyse the distributions of organisms. What was lacking was a seamless way to join the two, so that a researcher can manage their data in a user-friendly interface at one end of the workflow and generate sophisticated distribution models at the other. This problem was tackled by a task group at the recent pro-iBiosphere Hackathon.
One of the pro-iBiosphere pilots used legacy literature as a source of data on historical changes in the distribution of Chenopodium vulvaria. Details of over 2000 observations and specimens were imported into the Common Data Model (CDM) database administered with the Taxonomic EDITor. Many of these data were extracted from legacy literature through a process of digitization and mark-up. They were imported as a whole into the CDM and form a valuable test dataset for bioclimatic niche modelling. In this way, heterogeneous data were homogenised to make them tractable to statistical analysis.
Until now, the link between database and workflow could only be made by experienced users with direct access to the database. During the Hackathon, the task group developed a new Java web service within the CDM library. This web service takes the identifier of a taxon as input and returns a list of specimen or observation details. The fields returned were chosen to meet the input requirements of the BioVeL data refinement workflow, but the output also contains other fields that might be useful in the future. In this manner we have completed the final link in a workflow that starts with 16th-century botanists and ends with 21st-century bioclimatic modelling.
Figure 1. A schema showing the flow of data from legacy publications to modelling workflows. The red arrow shows the additional link in the chain.
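The behaviour of such a service (taxon identifier in, list of occurrence records out) can be sketched as follows. This is a minimal illustration in Python rather than Java, and it is not the CDM library's API: the record structure, the field names in `WORKFLOW_FIELDS`, the `taxon` query parameter and the dummy values are all hypothetical assumptions.

```python
import json
from urllib.parse import parse_qs

# Hypothetical stand-in for CDM occurrence records (dummy values, not real data).
RECORDS = [
    {"taxon_id": "taxon-A", "kind": "specimen", "lat": 51.0, "lon": 4.0, "year": 1900, "notes": "dummy"},
    {"taxon_id": "taxon-A", "kind": "observation", "lat": 52.0, "lon": 5.0, "year": 2000, "notes": "dummy"},
    {"taxon_id": "taxon-B", "kind": "specimen", "lat": 50.0, "lon": 3.0, "year": 1950, "notes": "dummy"},
]
# Hypothetical subset of fields that a downstream refinement workflow might need.
WORKFLOW_FIELDS = ("kind", "lat", "lon", "year")


def occurrences_for_taxon(taxon_id, records, fields=WORKFLOW_FIELDS):
    """Return the selected fields of every record that belongs to taxon_id."""
    return [{f: r.get(f) for f in fields} for r in records if r["taxon_id"] == taxon_id]


def app(environ, start_response):
    """Minimal WSGI endpoint: GET ?taxon=<id> returns the matching records as JSON."""
    query = parse_qs(environ.get("QUERY_STRING", ""))
    taxon_id = query.get("taxon", [""])[0]
    body = json.dumps(occurrences_for_taxon(taxon_id, RECORDS)).encode("utf-8")
    start_response("200 OK", [("Content-Type", "application/json"),
                              ("Content-Length", str(len(body)))])
    return [body]
```

To try it locally, `wsgiref.simple_server.make_server("localhost", 8000, app).serve_forever()` serves the sketch, and a request such as `/?taxon=taxon-A` returns the two dummy records. Keeping the field selection in one place mirrors the design choice described above: the output is shaped for the refinement workflow but can be extended with further fields.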

Bachir Balech (Institute of Biomembranes and Bioenergetics, Italian National Research Center), Christian Brenninkmeijer (University of Manchester), Hannes Hettling (Naturalis Biodiversity Center), Rutger Vos (Naturalis Biodiversity Center)
Phylogenetic analysis workflows in biodiversity research usually involve various software tools connected in series and depend on different sources and types of data. The proliferation of different, mutually incompatible and poorly defined data syntax standards poses significant challenges both for software developers and for end users of such workflows. Recent years have seen the development and adoption of a new, expressive and easy-to-process data standard that aims to remedy this issue: NeXML.
NeXML is an XML standard that supports the representation of (among other things) taxa, character-state matrices and phylogenetic trees, as well as semantic annotations (using RDFa), within a single document. It is therefore specifically tailored to ease the interplay of different tools in comparative evolutionary and biodiversity analysis.
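How taxa and a tree sit together in one NeXML document can be shown with a minimal sketch using Python's standard `xml.etree.ElementTree`. It is not the hackathon's merger service: `build_nexml`, the id scheme (`t1`, `n1`, `e1`) and the three-taxon example tree are illustrative assumptions, and only the OTU and tree blocks are covered (no character matrices or RDFa annotations).

```python
import xml.etree.ElementTree as ET

NEX = "http://www.nexml.org/2009"
XSI = "http://www.w3.org/2001/XMLSchema-instance"
ET.register_namespace("nex", NEX)
ET.register_namespace("xsi", XSI)


def q(tag):
    """Qualify a tag name with the NeXML namespace."""
    return f"{{{NEX}}}{tag}"


def build_nexml(taxa, edges, root):
    """Build a NeXML string with one OTU block and one tree.

    taxa:  list of taxon labels (leaf node names must match these labels)
    edges: list of (parent, child, branch_length) tuples using node names
    root:  name of the root node
    """
    doc = ET.Element(q("nexml"), {"version": "0.9"})
    otus = ET.SubElement(doc, q("otus"), {"id": "otus1"})
    otu_ids = {}
    for i, label in enumerate(taxa, start=1):
        otu_ids[label] = f"t{i}"
        ET.SubElement(otus, q("otu"), {"id": f"t{i}", "label": label})

    trees = ET.SubElement(doc, q("trees"), {"id": "trees1", "otus": "otus1"})
    tree = ET.SubElement(trees, q("tree"), {"id": "tree1", f"{{{XSI}}}type": "nex:FloatTree"})

    # Every node appears once: the root plus the child of each edge.
    names = [root] + [child for _, child, _ in edges]
    node_ids = {name: f"n{i}" for i, name in enumerate(names, start=1)}
    for name in names:
        attrs = {"id": node_ids[name]}
        if name == root:
            attrs["root"] = "true"
        if name in otu_ids:
            attrs["otu"] = otu_ids[name]  # link the leaf node to its taxon
        ET.SubElement(tree, q("node"), attrs)

    # NeXML lists all nodes before the edges that connect them.
    for i, (parent, child, length) in enumerate(edges, start=1):
        ET.SubElement(tree, q("edge"), {"id": f"e{i}", "source": node_ids[parent],
                                         "target": node_ids[child], "length": str(length)})
    return ET.tostring(doc, encoding="unicode")


xml_text = build_nexml(
    ["A", "B", "C"],
    [("root", "A", 0.1), ("root", "inner", 0.2), ("inner", "B", 0.3), ("inner", "C", 0.4)],
    "root",
)
print(xml_text)
```

Because the tree's nodes refer to taxa by id (`otu="t1"`), any metadata later attached to a taxon stays connected to the tree within the same document.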
Since XML documents are generally intended to be handled by software rather than directly by users, a software tool for easily manipulating NeXML files is desirable. To this end, participants of the Biodiversity Data Enrichment Hackathon (Leiden, the Netherlands, 17 – 21 March 2014) developed web services that can (i) construct NeXML documents from data encoded in commonly used phylogenetic file formats, or add metadata to an existing NeXML document, and (ii) extract information identified by the user from a given NeXML file and represent it in a variety of output formats.
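The extraction direction can be sketched by converting the first tree of a NeXML document into Newick, one of the commonly used phylogenetic formats. This is a minimal illustration, not the hackathon's extractor service: `nexml_to_newick` and the `SAMPLE` document are hypothetical, and the sketch assumes a single rooted tree with a node flagged `root="true"` and taxon labels that need no Newick quoting.

```python
import xml.etree.ElementTree as ET

NS = {"nex": "http://www.nexml.org/2009"}

# A small hand-written NeXML document (hypothetical example data).
SAMPLE = """<nexml xmlns="http://www.nexml.org/2009" xmlns:nex="http://www.nexml.org/2009"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" version="0.9">
  <otus id="otus1">
    <otu id="t1" label="A"/><otu id="t2" label="B"/><otu id="t3" label="C"/>
  </otus>
  <trees id="trees1" otus="otus1">
    <tree id="tree1" xsi:type="nex:FloatTree">
      <node id="n1" root="true"/>
      <node id="n2" otu="t1"/>
      <node id="n3"/>
      <node id="n4" otu="t2"/>
      <node id="n5" otu="t3"/>
      <edge id="e1" source="n1" target="n2" length="0.1"/>
      <edge id="e2" source="n1" target="n3" length="0.2"/>
      <edge id="e3" source="n3" target="n4" length="0.3"/>
      <edge id="e4" source="n3" target="n5" length="0.4"/>
    </tree>
  </trees>
</nexml>"""


def nexml_to_newick(xml_text):
    """Extract the first tree of a NeXML document as a Newick string."""
    doc = ET.fromstring(xml_text)
    labels = {otu.get("id"): otu.get("label") for otu in doc.iterfind(".//nex:otu", NS)}
    tree = doc.find(".//nex:tree", NS)
    nodes = {n.get("id"): n for n in tree.findall("nex:node", NS)}
    children, lengths = {}, {}
    for edge in tree.findall("nex:edge", NS):
        children.setdefault(edge.get("source"), []).append(edge.get("target"))
        lengths[edge.get("target")] = edge.get("length")
    # Assumes exactly one node is flagged as the root.
    root = next(nid for nid, node in nodes.items() if node.get("root") == "true")

    def render(nid):
        kids = children.get(nid, [])
        text = "(" + ",".join(render(k) for k in kids) + ")" if kids else ""
        text += labels.get(nodes[nid].get("otu"), "")  # internal nodes have no taxon label
        if lengths.get(nid) is not None:
            text += ":" + lengths[nid]  # branch length of the edge leading to this node
        return text

    return render(root) + ";"


print(nexml_to_newick(SAMPLE))  # (A:0.1,(B:0.3,C:0.4):0.2);
```

Newick keeps only topology, labels and branch lengths, so any semantic annotations in the NeXML source are dropped; that loss is one reason for keeping NeXML as the exchange format between tools.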
To make the NeXML merger and extractor tools easily accessible to the biodiversity research community and to enable their integration into existing workflows, they are implemented as RESTful web services, to be hosted by Naturalis Biodiversity Center and made available in the BiodiversityCatalogue. Clients that use these services can be implemented in a variety of ways; proofs of concept demonstrate that this is straightforward using the popular workflow management tool Taverna, so that the merger and extractor facilities are available to the users of, inter alia, BioVeL workflows. Preliminary tests of the NeXML merger and extractor have been conducted using data inputs and outputs of the BioVeL phylogenetic service set. In addition, the NeXML extractor output has been tested by visualizing a phylogenetic tree with its associated taxon metadata, using an iTOL tool wrapper within a Taverna workflow.
For more information, visit the project wiki:


 Robert Hoehndorf (Aberystwyth University),  Quentin Groom (Botanic Garden Meise), George Gosline (Royal Botanic Gardens Kew), Claus Weiland (Biodiversity and Climate Research Centre / Senckenberg), Thomas Hamann (Naturalis Biodiversity Center)
The aim of the Traits task group at the recent pro-iBiosphere Biodiversity Data Enrichment Hackathon was to extract plant trait data from digitized Floras (i.e. books that describe the plant life occurring in a particular region or period). We wanted to demonstrate the feasibility of using an ontology-based approach to extract and integrate trait information from digitized Floras, even when the Floras are written in different languages. To this end, we addressed two main questions: (1) Can we automatically extract trait and phenotype information from Flora descriptions written in multiple languages (English and French)? (2) Can we represent and integrate the extracted trait and phenotype information semantically using an ontology-based approach?
Extracting structured information about traits and phenotypes from natural language descriptions is a common problem in mobilizing legacy biodiversity data. One tool developed for this purpose is CharaParser [1], which is applied in the Phenoscape project [2] and integrated in the Phenex tool [3]. As the Flora descriptions in our use cases were written in both English and French, and CharaParser primarily supports English-language descriptions, we chose not to use CharaParser during the Hackathon. Instead, we followed a simple text-matching approach applicable to multiple languages. In particular, we identified mentions of plant anatomical entities (taken from the Plant Ontology [4]) and mentions of trait or phenotype terms (from the PATO ontology [5]) in the Flora descriptions, and used a dictionary to translate between French and English terms referring to plant anatomy or plant traits. In the future, we plan to use more sophisticated approaches such as CharaParser to provide a more complete and accurate mark-up of anatomy and phenotype terms in Flora descriptions.
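A dictionary-based matcher of this kind can be sketched in a few lines. The vocabularies below are tiny hypothetical stand-ins for Plant Ontology and PATO labels, `FR_TO_EN` is an illustrative dictionary, and the pairing rule (attach each quality to the most recent anatomy term) is an assumption of this sketch, not necessarily what the task group implemented.

```python
import re

# Hypothetical toy vocabularies standing in for Plant Ontology (anatomy) and PATO (quality) labels.
ANATOMY = {"flower", "leaf", "petal"}
QUALITIES = {"red", "white", "hairy"}
# Hypothetical French-to-English dictionary; tokens not listed are kept unchanged.
FR_TO_EN = {"fleur": "flower", "feuille": "leaf", "pétale": "petal",
            "rouge": "red", "blanc": "white", "blanche": "white", "velue": "hairy"}


def extract_eq(description):
    """Return (entity, quality) pairs found by simple dictionary matching.

    Heuristic (an assumption): each quality term is attached to the most
    recent anatomy term, which suits the telegraphic style of many Floras.
    """
    pairs = []
    entity = None
    # \w+ is Unicode-aware in Python 3, so accented French words form single tokens.
    for token in re.findall(r"\w+", description.lower()):
        term = FR_TO_EN.get(token, token)  # normalise French tokens to English labels
        if term in ANATOMY:
            entity = term
        elif term in QUALITIES and entity is not None:
            pairs.append((entity, term))
    return pairs


print(extract_eq("Flower red; leaf hairy."))
print(extract_eq("Fleur rouge; feuille velue."))
```

Both calls yield the same pairs, which is the point of translating into a shared vocabulary before matching. Plural forms, negation ("not hairy") and longer sentences are not handled, which is where parsers such as CharaParser go further.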
To describe traits semantically, we follow the Entity-Quality (EQ) approach [6], which has been widely applied to characterize model organism phenotypes [7] and disease phenotypes [8]. Using the EQ model, a trait is characterized by the entity (E) in which the trait is observed and the quality (Q) that characterizes the trait. The characterized entity can be an anatomical entity (from the Plant Ontology) or a biological process or function (from the Gene Ontology). The Phenotypic Attribute and Trait Ontology (PATO) contains a rich classification of widely applicable traits. A phenotype is described in a similar way using the EQ pattern, but its quality is a specific value that is a subclass of the trait's quality. For example, the trait "flower color" is described using the entity "flower" (from the Plant Ontology) and the trait "color" (from PATO). The phenotype "flower red" is described using the entity "flower" (from the Plant Ontology) and the quality "red" (from PATO), where "red" is a subclass of "color" in PATO.
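The relationship between a trait and its phenotypes can be made concrete with a small sketch of the EQ pattern. `QUALITY_IS_A` is a hypothetical, heavily simplified set of PATO-like is_a links, and for brevity entities are compared by equality only; a fuller treatment would also use the Plant Ontology hierarchy.

```python
from dataclasses import dataclass

# Hypothetical, heavily simplified is_a links in the spirit of PATO (child -> parent).
QUALITY_IS_A = {"red": "color", "white": "color"}


@dataclass(frozen=True)
class EQ:
    """An Entity-Quality statement: an entity (e.g. Plant Ontology label) plus a quality (e.g. PATO label)."""
    entity: str
    quality: str


def is_subclass(child, ancestor, is_a):
    """True if child equals ancestor or reaches it by following is_a links upwards."""
    while child is not None:
        if child == ancestor:
            return True
        child = is_a.get(child)
    return False


def phenotype_instantiates_trait(phenotype, trait, is_a=QUALITY_IS_A):
    """A phenotype falls under a trait if the entities match and its quality is a subclass of the trait's quality."""
    return phenotype.entity == trait.entity and is_subclass(phenotype.quality, trait.quality, is_a)


flower_color = EQ("flower", "color")   # trait
flower_red = EQ("flower", "red")       # phenotype
print(phenotype_instantiates_trait(flower_red, flower_color))          # True
print(phenotype_instantiates_trait(EQ("leaf", "red"), flower_color))   # False
```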
We then used a data-driven approach to build a Flora Phenotype Ontology (FLOPO) from the EQ statements we identified in the Flora descriptions. FLOPO is an ontology of over 25,000 trait and phenotype terms, each of which has at least one taxon annotation in one of the Floras we processed. A draft of FLOPO is available in BioPortal, and the source code we produced and the data we used are available from
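One way to read "data-driven" here is that terms are generated only from observed annotations. The sketch below illustrates that idea under stated assumptions and is not the actual FLOPO build code: `build_terms`, the `QUALITY_IS_A` links and the taxon identifiers are hypothetical. Each observed (entity, quality) pair becomes a phenotype term, and each ancestor of the quality yields a trait term for the same entity, so every generated term carries at least one taxon.

```python
from collections import defaultdict

# Hypothetical is_a links from phenotype qualities to trait qualities.
QUALITY_IS_A = {"red": "color", "white": "color"}


def build_terms(annotations, is_a=QUALITY_IS_A):
    """Derive ontology terms from (taxon, entity, quality) annotations.

    Every observed EQ pair becomes a phenotype term, and each ancestor of its
    quality yields a trait term for the same entity. Because terms are only
    created from observations, each term has at least one taxon annotation.
    """
    terms = defaultdict(set)
    for taxon, entity, quality in annotations:
        q = quality
        while q is not None:
            terms[(entity, q)].add(taxon)
            q = is_a.get(q)
    return dict(terms)


# Hypothetical annotations as they might come out of the text-matching step.
ANNOTATIONS = [
    ("taxon-1", "flower", "red"),
    ("taxon-2", "flower", "white"),
    ("taxon-2", "leaf", "hairy"),
]
for (entity, quality), taxa in sorted(build_terms(ANNOTATIONS).items()):
    print(f"{entity} {quality}: {sorted(taxa)}")
```

The trait term "flower color" collects the taxa of all its phenotypes, which is what makes trait-level queries across Floras possible.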
We have also started to generate further resources for future use. In particular, we have started to add environmental terms to the Environment Ontology [9], which will allow us to extract some of the environmental conditions in which taxa are found, and we have collected vocabulary and glossary terms that need to be incorporated into FLOPO. We have also experimented with an RDF store that contains FLOPO and its taxon annotations.
[1] Cui, H. (2012) CharaParser for fine-grained semantic annotation of organism morphological descriptions. Journal of the American Society for Information Science and Technology, 63(4). DOI: 10.1002/asi.22618
[6] Gkoutos, G. V., Green, E. C., Mallon, A.-M. M., Hancock, J. M., and Davidson, D. (2005) Using ontologies to describe mouse phenotypes. Genome Biology, 6(1).
[7] Mungall, C., Gkoutos, G., Smith, C., Haendel, M., Lewis, S., and Ashburner, M. (2010) Integrating phenotype ontologies across multiple species. Genome Biology, 11(1), R2.
[8] Robinson, P. N. et al. (2008) The Human Phenotype Ontology: a tool for annotating and analyzing human hereditary disease. American Journal of Human Genetics, 83(5), 610–615.
For more information, please contact Robert Hoehndorf.


This project has received funding from the European Union’s Seventh Framework Programme for research, technological development and demonstration under grant agreement No 312848.