pro-iBiosphere - News

SWeDe (Scientific Web-service Description) - an XML Schema Definition for describing Web Services in the scientific domain

24.04.2014

Niall Beard (University of Manchester), Patricia Kelbert (FUB-BGBM), Bachir Balech (Institute of Biomembranes and Bioenergetics - Italian National Research Center)

At the Biodiversity Data Enrichment Hackathon in Leiden we created an XML Schema Definition for describing Web services in the scientific domain called SWeDe (Scientific Web-service Description).

A web service provider wishing to promote their web service uploads descriptive information to catalogue sites such as the Biodiversity Catalogue, the Tools Registry or any other relevant catalogue. This information should include a textual description of how to use the service, usage conditions such as licensing and restrictions, and other useful annotations.

The purpose of SWeDe is to allow web service providers to maintain a single document describing their web services rather than maintaining documentation across several different catalogues.

Hence, if a provider needs to change some information about their service, they can do so once, in their SWeDe document. Participating catalogues can then download and parse the SWeDe file, either periodically or in real time, display its contents within their sites, and update their databases with any changes accordingly.
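The catalogue-side update cycle described above can be sketched in Python. This is a minimal illustration under stated assumptions, not part of SWeDe itself: it assumes the catalogue stores a hash of the last harvested document so that unchanged files can be skipped, and the element paths in `FIELD_PATHS` (`name`, `ipr/licence`, `contact/email`) are hypothetical placeholders, not the real SWeDe element names, which are defined by the XSD in the project repository.

```python
import hashlib
import xml.etree.ElementTree as ET


def fingerprint(xml_text: str) -> str:
    """Hash the raw document so an unchanged file can be skipped cheaply."""
    return hashlib.sha256(xml_text.encode("utf-8")).hexdigest()


def extract_fields(xml_text: str, paths: dict[str, str]) -> dict:
    """Map catalogue columns to element text; missing elements become None."""
    root = ET.fromstring(xml_text)
    record = {}
    for column, path in paths.items():
        node = root.find(path)
        record[column] = node.text.strip() if node is not None and node.text else None
    return record


def sync(catalogue: dict, service_id: str, xml_text: str, paths: dict[str, str]) -> bool:
    """Refresh one catalogue entry from a downloaded document; True if it changed."""
    digest = fingerprint(xml_text)
    entry = catalogue.get(service_id)
    if entry is not None and entry["digest"] == digest:
        return False  # same document as the last harvest: nothing to update
    catalogue[service_id] = {"digest": digest, **extract_fields(xml_text, paths)}
    return True


# Hypothetical element paths, NOT taken from the real SWeDe XSD.
FIELD_PATHS = {"name": "name", "licence": "ipr/licence", "contact": "contact/email"}
```

In practice a catalogue would fetch `xml_text` from the provider's published location on a schedule (or when notified) and call `sync` for each registered service; hashing first keeps frequent polling cheap.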

The SWeDe Schema was designed by scientists and developers to cover as many aspects of scientific web services as possible. These include attributes such as the scientific category, technological category, projects (e.g. funding), contact information (e.g. institutions, persons), intellectual property rights (IPR) and citations. The SWeDe schema reuses several components from the Access to Biological Collections Data (ABCD) Schema. It can describe services of both predominant service types, REST and SOAP.

In addition to the schema, a rudimentary form for creating your own SWeDe document (code-named the "SWeDe Farmer") was also produced; it can be found at http://swede-farmer.herokuapp.com

Next steps involve collaborating with the Biodiversity Catalogue to parse SWeDe documents, improving the SWeDe Farmer, and disseminating SWeDe to the scientific community.

The full XSD schema can be found in its GitHub repository, and further reading about SWeDe can be found on the pro-iBiosphere wiki.

https://github.com/njall/XS-SWeDe

http://wiki.pro-ibiosphere.eu/wiki/The_SWeDe_Project

http://swede-farmer.herokuapp.com/

For more information please contact [email protected]

Posted by pro-iBiosphere

Running Taverna Workflows within an IPython Notebook

23.04.2014

Alan Williams (University of Manchester), Aleksandra Pawlik (Software Sustainability Institute), Youri Lammers (Naturalis), Ross Mounce (University of Bath)

During the recent pro-iBiosphere Data Enrichment Hackathon, a prototype Taverna Player Client Python package was developed for IPython Notebook. The package lets users list the workflows available on a Taverna Portal, select a workflow and run it within the Notebook. Data from the Python environment can be used as inputs to the workflow, and the results of the workflow run are available for further manipulation in the notebook. Users can interact with the running workflow using the Taverna Player and interaction services.

IPython Notebook [1] provides an interactive computational environment within a web browser. Users can write and execute Python code and combine it with text, mathematical and statistical calculations, plots and HTML display to produce shareable and reusable notebooks. These notebooks can be shared on the IPython Notebook Viewer [2].

Taverna [3] provides a suite of tools for workflow design, editing and execution. This includes the Taverna Workbench, the main tool for creating workflows, and the Taverna Server. Taverna Server enables you to set up a dedicated server for executing workflows remotely, and it can be accessed through a WSDL or a REST API.

Instances of a Taverna Portal can be used to host, share and execute Taverna Workflows. The execution takes place on a Taverna Server and is exposed within the portal using a Taverna Player. The Taverna Player can also be accessed through a REST API.

Following discussions with the developers of IPython Notebook, the exciting potential of running Taverna Workflows from within an IPython Notebook was realized.

Figure 1: Running a workflow in IPython Notebook

The Taverna Player Client can be used to chain together workflows, using the outputs from one workflow run as the inputs to another. The capabilities of IPython Notebook can be used to generate documentation of the overall experiment; the templating mechanisms of jinja2 prove extremely useful for this.

The code for the Taverna Player Client is hosted on GitHub [4] and a description of its current classes is available [5]. An example notebook has been uploaded to the Notebook Viewer [6].

Further work on the Taverna Player Client is planned, including both remote and face-to-face meetings with the developers of IPython Notebook. The Client has been demonstrated to members of the BioVeL [7] and SCAPE [8] projects and to colleagues at the University of Leiden.

We wish to thank the developers of IPython Notebook and Taverna Player, especially for their online support during the recent hackathon.

For more information, contact [email protected]

[1] http://ipython.org/notebook.html

[2] http://nbviewer.ipython.org/

[3] http://www.taverna.org.uk

[4] https://github.com/myGrid/DataHackLeiden

[5] http://dev.mygrid.org.uk/wiki/download/attachments/18972939/tavernaPlayerClient.html

[6] http://nbviewer.ipython.org/urls/raw.githubusercontent.com/myGrid/DataHackLeiden/alan/Player_example.ipynb?create=1

[7] http://www.biovel.eu

[8] http://www.scape-project.eu/

Posted by pro-iBiosphere

Hacking OCR for pro-iBiosphere

22.04.2014

by David P. Shorthouse, Rod Page, Kevin Richards, Marko Tähtinen

Taking his own lead from a pitch he delivered to an audience of receptive biodiversity informaticians at the outset of the March 17-21, 2014 pro-iBiosphere hackathon, Rod Page (University of Glasgow) fashioned an engaging interface to edit the OCR text from scanned pages in the Biodiversity Heritage Library (BHL). He wooed David P. Shorthouse (Canadensys), Kevin Richards (ex Landcare Research New Zealand) and Marko Tähtinen (University of Eastern Finland, BioVeL) away from eight other competing task groups, each of which issued products in a remarkably short amount of time.

The purpose of the pro-iBiosphere hackathon was to "enrich structured biodiversity input data with semantic links to related resources and concepts". The OCR task group led by Rod had a distinctly different starting point, one that is no less important to the semantic linking of biodiversity resources. The unstructured data in the BHL is arguably the richest source of freely accessible information for taxonomists and biodiversity enthusiasts that can be mined into structured data. However, the quality of its OCR output suffers from variable typefaces, layouts and page contrast, bleed-through, artifacts and other issues that occasionally bewilder its OCR engine. As a result, data mining and indexing routines that lift scientific names, place names and other entities in support of semantic linking are not always successful. The browsing interface in the BHL could be made more engaging if visitors had an opportunity to rapidly correct the OCR text while viewing the original scanned image, thus enriching search and discovery for future visitors. Indeed, BHL and its partners were recently awarded a "Digging Into Data Challenge" grant (see http://blog.biodiversitylibrary.org/2014/03/first-meeting-of-mining-biodiversity.html), part of which will employ automated text-cleaning methodologies developed by its Canadian collaborators. An OCR editor might complement this funded work. Likewise, the Finnish National Library has developed its own OCR editor interface (see http://blogs.helsinki.fi/fennougrica/2014/02/21/ocr-text-editor/). Unlike the Finnish editor, which uses ALTO XML as its source documents, the OCR editing interface developed during this hackathon uses BHL’s DjVu XML documents as its source, rendered as HTML5.

The OCR Task Group had one aim: provide a simple interface for interactive editing of text, as well as tools to make inferences from the edits. After four solid days of hacking, the team achieved this aim and integrated value-added features to engage users and to boost developer confidence in reusing the code. The underlying document store is CouchDB, hosted in the cloud on Cloudant, and the team is confident that the proof of concept can be made to scale. The capabilities of the software are:

An in-place panel shows the exact line in the original scanned image while the user edits a single line of OCR text at a time (Figure 1)
Global Names scientific name-finding is integrated in real-time when a user completes a line edit, giving feedback if a scientific name is newly recognized (Figure 2)
Authentication uses the OAuth.io service (https://oauth.io/), so that all edits are tied to users’ OAuth2-provider accounts (e.g. Google, Twitter, GitHub)
Frequencies of common edits are summarized in real-time and other words that may benefit from similar edits are highlighted for users
Batch processes merge all user edits and recreate the text files for possible reintroduction into data mining routines
Unit and integration tests are included

Figure 1. The OCR Editing interface rendered as HTML5, illustrating the original line of text as a clipped image under the line being edited, a scrolling tally of user edits, lines that have been previously edited (yellow highlight) and words that share strings of characters that match previous edits elsewhere on the page (mauve highlight).

Figure 2. Tooltip showing a scientific name newly recognized by the Global Names Recognition and Discovery service when a user completes an edit.
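The real-time tally of common edits and the highlighting of words that may benefit from similar edits could work along the following lines. This Python sketch is an assumption about the approach, not the team's code (which is in the ocr-correction repository): it uses the standard library's `difflib.SequenceMatcher` to find character substitutions between the OCR text and a user's correction, keeps one character of context on each side so that a single-letter fix does not match every word on the page, and flags other words that still contain the uncorrected string. The function names are hypothetical.

```python
from collections import Counter
from difflib import SequenceMatcher


def substitutions(ocr_text: str, corrected: str) -> list[tuple[str, str]]:
    """Character substitutions between OCR text and a correction, with one
    character of context on each side (pure insertions/deletions are ignored)."""
    matcher = SequenceMatcher(a=ocr_text, b=corrected, autojunk=False)
    found = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "replace":
            found.append((ocr_text[max(i1 - 1, 0):i2 + 1],
                          corrected[max(j1 - 1, 0):j2 + 1]))
    return found


def tally(edits: list[tuple[str, str]]) -> Counter:
    """Running count of substitutions across all (ocr_text, corrected) edits."""
    counts = Counter()
    for ocr_text, corrected in edits:
        counts.update(substitutions(ocr_text, corrected))
    return counts


def flag_similar(page_words: list[str], counts: Counter, min_count: int = 1) -> list[str]:
    """Words on the page that still contain a string users have corrected before."""
    patterns = {old for (old, _new), n in counts.items() if n >= min_count}
    return [word for word in page_words if any(p in word for p in patterns)]
```

For example, after the edit `("Ornithoptcra", "Ornithoptera")` the tally records `("tcr", "ter")`, so `flag_similar(["Papilio", "Lepidoptcra"], counts)` returns `["Lepidoptcra"]`. Raising `min_count` would limit highlighting to substitutions that several users have made.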

A proof of concept can be examined at http://bionames.org/~rpage/ocr-correction/ and the MIT-licensed code can be obtained from https://github.com/rdmpage/ocr-correction. The team will follow up with the BHL to share what was accomplished and to discuss how this could be integrated into their web-based interface.

The team spent the last day of the hackathon investigating the production of DjVu XML files from scanned specimen labels. Although investigations are still underway, this particular outcome would be an excellent enhancement to the workflow at the Université de Montréal Biodiversity Centre (David P. Shorthouse) and useful for other members of the Canadensys network. The OCR editing interface may also be useful for the multinational Notes From Nature crowdsourcing initiative (http://www.notesfromnature.org/), as well as for other national, regional and local specimen label digitizing efforts throughout the world.

Rutger Vos and Soraya Sierra (Naturalis, co-organizers) received abundant praise from all participants at the completion of the hackathon, and rightly so. The hackathon was exceptionally well organized, developer team sizes were perfect for each of the nine task groups, each participant’s work ethic was remarkable, facilities were well provisioned, and nibbles and luncheons were delectable. We look forward to the reactions of pro-iBiosphere members at the final event in Meise, Brussels.

Contact:

David P. Shorthouse

Université de Montréal Biodiversity Centre / Canadensys, Montréal, QC CANADA

Email: [email protected]

Posted by pro-iBiosphere

Important principles of identification and web integration: Identifier and Resolution

22.04.2014

by Kevin Richards, email: [email protected]

The topic of "stable unique identifiers" in the biodiversity informatics community has had quite a varied history in recent years. With the fast-changing world of technology, information and the latest approaches to information storage and access, there have been several changes in direction.

In these changing times it seems that trying to stick to basic technologies, especially those that work with standard internet protocols, is the way to go. However, it is important to emphasise the two major components of identifiers: the IDENTIFIER and the RESOLUTION. These important principles of identification and web integration were put to use at the recent Biodiversity Data Enrichment Hackathon that took place on 17-21 March 2014 in Leiden. The importance of identification and resolution is obvious when attempting to link various datasets and information sources in the biodiversity domain.

IDENTIFIER for the data

The first issue for any user of data is the need to identify a particular piece of data. This has traditionally been done using fairly local identifiers such as a number counter (e.g. 1, 2, 3...). With the need to integrate and access data globally, other mechanisms have been required. The simplest approach is the Universally Unique Identifier (UUID). UUIDs are hard to read and quite unappealing to look at, for example "1696AC49-548F-404D-9DEA-8A1C4DDA37F4", but they are still a good mechanism for identifying data in a computer system and hence work well for computing needs.
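For illustration, Python's standard `uuid` module can mint and check identifiers of this kind. The helper names below are hypothetical; random (version 4) UUIDs are assumed, and the upper-case formatting only mirrors the example in the paragraph above.

```python
import uuid


def mint_identifier() -> str:
    """Mint a random (version 4) UUID, upper-cased to match the example above."""
    return str(uuid.uuid4()).upper()


def is_uuid(text: str) -> bool:
    """True if `text` parses as a UUID string (hyphens are optional)."""
    try:
        uuid.UUID(text)
    except ValueError:
        return False
    return True


print(mint_identifier())  # different on every run
print(is_uuid("1696AC49-548F-404D-9DEA-8A1C4DDA37F4"))  # True
print(is_uuid("1,2,3"))  # False: a local counter is not a UUID
```

Because version 4 UUIDs are random, no central registry is needed to avoid collisions, which is what makes them suitable for data created independently in many places.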

RESOLUTION of data by their identifiers

With the increasing demand to have data accessible and linked on the web, other identifier mechanisms are required to allow data to be fetched via their identifiers. Within the biodiversity community several approaches have been taken. Originally, Life Science Identifiers (LSIDs) were promoted because they had several appealing features, namely a degree of indirection from the domain name associated with the data host and a defined protocol for accessing the data and metadata for a particular object. Other identifier systems such as DOIs, PURLs and Handles were also considered. The main benefit of all these identifier systems is that the data are then accessible over the web using web technologies.

Then along came the semantic web, with some really cool ideas about linking data together in a meaningful way and building a giant, reusable and re-purposable set of data. This has become really appealing to biodiversity informaticians and has consequently presented some interesting hurdles to overcome in achieving these attractive ambitions. Firstly, semantic web technologies depend heavily on automation and basic web protocols for harvesting and linking data, so any identifier system that doesn't work well with basic HTTP web protocols is difficult to integrate. As a result, LSIDs have fallen out of favour because of their relatively complex resolution protocol, and basic, stable, permanent URLs have been promoted instead.

A good approach to using this type of identifier is to first pick a very agnostic domain name, i.e. not an institution or university name, but perhaps a "project" name. A good example is the International Plant Names Index project, also known as IPNI (its data system is hosted by the Royal Botanic Gardens, Kew, London). A locally unique identifier is then attached to the chosen domain name. An example of this combination is Zoobank, with its zoobank.org domain name and an identifier for a particular piece of data it hosts, e.g. http://zoobank.org/NomenclaturalActs/8BDC0735-FEA4-4298-83FA-D04F67C3FBEC is a resolvable identifier for the Zoobank record for the taxon "Chromis abyssus".
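A minimal sketch of the "agnostic domain plus locally unique identifier" pattern, using only the Python standard library. The function names are hypothetical. The `resolve` helper assumes the host answers plain HTTP GET requests; passing an RDF media type in `accept` only helps if the host supports content negotiation, which this post does not state for any particular service.

```python
from urllib.parse import urlsplit
from urllib.request import Request, urlopen


def make_uri(domain: str, collection: str, local_id: str) -> str:
    """Stable HTTP URI: project domain + collection path + locally unique id."""
    return f"http://{domain}/{collection}/{local_id}"


def local_part(uri: str) -> str:
    """Recover the locally unique part (last path segment) of such a URI."""
    return urlsplit(uri).path.rstrip("/").rsplit("/", 1)[-1]


def resolve(uri: str, accept: str = "text/html", timeout: float = 10.0) -> bytes:
    """Dereference the identifier with a plain HTTP GET (requires network access)."""
    request = Request(uri, headers={"Accept": accept})
    with urlopen(request, timeout=timeout) as response:
        return response.read()


uri = make_uri("zoobank.org", "NomenclaturalActs",
               "8BDC0735-FEA4-4298-83FA-D04F67C3FBEC")
print(uri)  # http://zoobank.org/NomenclaturalActs/8BDC0735-FEA4-4298-83FA-D04F67C3FBEC
```

The identifier is just an ordinary URL, so any HTTP client or semantic web harvester can dereference it without a special resolution protocol, which is the advantage over LSIDs described above.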

The pro-iBiosphere project has created a Best Practices page for stable URIs that outlines some good approaches to creating identifiers for your data with consideration of semantic web requirements and the latest ideas on identification.

Posted by pro-iBiosphere

Data visualisation task for pro-iBiosphere

22.04.2014

by David King* (Open University), Jeremy Miller (Naturalis), Guido Sautter (Plazi), Serrano Pereira (Naturalis)

* [email protected]

Inspired by Pensoft's development in electronic publishing workflows, in combination with marked-up texts generated using GoldenGATE, Jeremy Miller (Naturalis) devised the design for a dashboard to visualise treatment data with the aim of better understanding that data and assisting with its quality control. Ultimately, Jeremy's vision would make it possible to offer a kind of reverse Biodiversity Data Journal, resurrecting primary data from marked-up legacy literature for aggregation and re-analysis. Our challenge in the recent pro-iBiosphere hackathon, excellently hosted by Naturalis, was to craft a prototype to extract and display the data for Jeremy's dashboard.

Working with GoldenGATE's author, Guido Sautter, allowed us to address one weakness of the original design immediately: rather than process exported GoldenGATE marked-up text to extract statistical data, we could have GoldenGATE extract it for us and make that data available for export. Hence, GoldenGATE's functionality was extended and a new API service was made freely available at http://plazi.cs.umb.edu/GgServer/srsStats for us, and for anyone else who wishes to explore this statistical data. Solid visualisation work by Serrano Pereira, a recent recruit to Naturalis, using the established frameworks jQuery, jqPlot and jVectorMap, rendered the exported data in the form Jeremy envisaged.

A version of the demonstrator produced during the hackathon is currently available at http://plazi.byobu.info/, courtesy of Plazi, a pro-iBiosphere partner. We look forward to refining and enhancing the existing demonstrator in line with feedback from Jeremy and other users, and from its presentation at pro-iBiosphere's final event in June.

Jeremy's original concept for the dashboard is available from https://github.com/Dauvit/Data_enrichment/tree/master/data_visualisation/use_case.

The code for GoldenGATE can be downloaded from https://code.google.com/p/goldengate-tools/.

The documentation for GoldenGATE's statistical export service is available from https://github.com/Dauvit/Data_enrichment/blob/master/data_visualisation/Stats_queries_HOWTO.md.

The code for the demonstrator dashboard can be downloaded from https://github.com/Dauvit/Data_enrichment/tree/master/data_visualisation.

Posted by pro-iBiosphere

The pro-iBiosphere Final Event is well on track

15.04.2014

The pro-iBiosphere final series of events, organized in Brussels (Meise) on June 10-12 at Bouchout Castle in the Botanic Garden Meise domain, is well on track.

The event wiki page has recently been updated with additional information on the different activities organised (workshops, trainings and demonstrations), and the Final Conference agenda now includes high-level speakers from around the world, including (i) officials from the European Commission DG Connect and the US National Academy of Sciences, (ii) representatives from botanic gardens, natural history museums and other biodiversity initiatives, and (iii) experts and researchers specializing in biodiversity informatics and environmental/natural science.

One key objective of this series of events is to ensure that the Final Event provides key recommendations and input from biodiversity experts for the preparation of the next Work Programme (WP 2016-2017), as specifically requested by the European Commission.

The number of registered attendees is already high enough to ensure a thorough exchange of information and experience between stakeholders interested in making fundamental biodiversity data digital, open and reusable. Visit the different activity pages to find out more about the attendance status.

In this context, if you plan to attend and have not yet registered, we strongly recommend that you do so as soon as possible, due to room capacity constraints.

For further information on this event (agenda, concept & objectives, registration) please visit the Event wiki page or contact us at [email protected].

Posted by Stephanie Morales

Despatch from the field: New species discovery, description and data sharing in less than 30 days

27.03.2014

Researchers and the public can now have immediate access to data underlying discovery of new species of life on Earth, under a new streamlined system linking taxonomic research with open data publication.

The partnership paves the way for unlocking and preserving a wealth of 'small data' backing up research conclusions, which often become lost within a few years of an article's publication in an academic journal.

In the first example of the new collaboration in action, the Biodiversity Data Journal carries a peer-reviewed description of a new species of spider discovered during a field course in Borneo just one month ago. At the same time, the data showing the location of the spider's occurrence in nature are automatically harvested by the Global Biodiversity Information Facility (GBIF), and richer data such as images and the species description are exported to the Encyclopedia of Life (EOL).

This contrasts with an average 'shelf life' of twenty-one years between field discovery of a new species and its formal description and naming, according to a recent study in Current Biology.

A group of scientists and students discovered the new species of spider during a field course in Borneo, supervised by Jeremy Miller and Menno Schilthuizen from the Naturalis Biodiversity Center, based in Leiden, the Netherlands. The species was described and submitted online from the field to the Biodiversity Data Journal through a satellite internet connection, along with the underlying data. The manuscript was peer-reviewed and published within two weeks of submission. On the day of publication, GBIF and EOL harvested and included the data in their respective platforms.

The new workflow established between GBIF, EOL and Pensoft Publishers' Biodiversity Data Journal, with the support of the Swiss NGO Plazi, automatically exports treatment and occurrence data into a Darwin Core Archive, a standard format used by GBIF and other networks to share data from many different sources. This means GBIF can extract these data on the day of the article's publication, making them immediately available to science and the public through its portal and web services, further enriching the biodiversity data already freely accessible through the GBIF network. Similarly, the information and multimedia resources become accessible via EOL's species pages.

One of the main purposes of the partnership is to ensure that such data remain accessible for future use in research. A recent study published in Current Biology found that 80 % of scientific data are lost in less than 10 years following their creation.

Donald Hobern, GBIF's Executive Secretary, commented: "A great volume of extremely important information about the world's species is effectively inaccessible, scattered across thousands of small datasets carefully curated by taxonomic researchers. I find it very exciting that this new workflow will help preserve these 'small data' and make them immediately available for re-use through our networks."

"Re-use of data published on paper or in PDF format is a huge challenge in all branches of science", said Prof. Lyubomir Penev, managing director of Pensoft and founder of the Biodiversity Data Journal. "This problem has been tackled firstly by our partners from Plazi who created a workflow to extract data from legacy literature and submit it to GBIF. The workflow currently launched by GBIF, EOL and the Biodiversity Data Journal radically shortens the way from publication of data to their sharing and re-use and makes the whole process cost efficient", added Prof. Penev.

The elaboration of the workflow from BDJ and Plazi to GBIF through Darwin Core Archive was supported by the EU-funded project EU BON (Building the European Biodiversity Observation Network, grant No 308454). The basic concept has been initially discussed and outlined in the course of the pro-iBiosphere project (Coordination and policy development in preparation for a European Open Biodiversity Knowledge Management System, addressing Acquisition, Curation, Synthesis, Interoperability and Dissemination, grant No 312848).

Original source:

Miller J, Schilthuizen M, Burmester J, van der Graaf L, Merckx V, Jocqué M, Kessler P, Fayle T, Breeschoten T, Broeren R, Bouman R, Chua W, Feijen F, Fermont T, Groen K, Groen M, Kil N, de Laat H, Moerland M, Moncoquet C, Panjang E, Philip A, Roca-Eriksen R, Rooduijn B, van Santen M, Swakman V, Evans M, Evans L, Love K, Joscelyne S, Tober A, Wilson H, Ambu L, Goossens B (2014) Dispatch from the field: ecology of micro web-building spiders with description of a new species. Biodiversity Data Journal 2: e1076. DOI: 10.3897/BDJ.2.e1076

Posted by pro-iBiosphere

Outcomes of the pro-iBiosphere Workshop on Sustainable Business Models

26.03.2014

Charlotte Johns, Kew Royal Botanic Gardens, Email: [email protected]

A workshop dedicated to sustainable business models was held during the 5th pro-iBiosphere project meeting on the 11th and 12th of February 2014, at the Museum für Naturkunde (MfN) in Berlin, Germany. It was attended by consortium members and eight external participants with experience in strategic business and finance.

The workshop was divided into four sessions. The first session aimed to agree on the scope of a future "iBiosphere" and to decide which products and services will be included as part of the Open Biodiversity Knowledge Management System (OBKMS). This session was followed by a number of talks given by the external participants, who shared their experience on the sustainability of their projects. Session three looked at enabling factors contributing towards the OBKMS, including open access, data and technology, and communication. The final session concentrated on sustainability and governance and how the management of iBiosphere should be structured.

The main outcome of the workshop was agreement on a list of core products and services which the OBKMS will provide, and an agreement on the core functionality. Information gathered through this milestone also helped to create the draft sustainability model for the OBKMS, which highlights gaps in our present knowledge and helps to decide upon future work which needs to be completed. These workshop outcomes, along with suggestions as to how the OBKMS will be governed and a list of challenges and solutions for a number of enabling functions, can be found within D6.4.2 the ‘Draft Sustainability Report’. The information collected through the workshop will also aid future reports including D6.1.2 ‘Report on Costs’, D6.4.3 ‘Summary of model evaluations’ and D6.4.4 ‘Sustainability recommendations’, to be made available here.

We would like to thank all the participants again for making the workshop a success and for contributing valuable information that will help shape our future pro-iBiosphere sustainability deliverables.

Posted by pro-iBiosphere

REGISTER NOW: pro-iBiosphere Final Event in Meise (Brussels) - June 10-12, 2014

18.03.2014

The pro-iBiosphere Final Event will take place on June 10-12, 2014, at Bouchout Castle, Meise, Belgium (Agentschap Plantentuin Meise, also known as Botanic Garden Meise).

The aim of this series of activities is to present the achievements of the project and its sustainability perspectives.

The week's agenda comprises:

Tuesday June 10 (PM)
Workshop on Model Evaluation

Wednesday June 11 (all day)
Demonstrations on pro-iBiosphere pilots
Demonstrations on outcomes of pro-iBiosphere Data Enrichment Hackathon
Workshop on Biodiversity Catalogue
Training on WikiMedia
Poster session

Thursday June 12 (all day)
Final Conference
Networking Cocktail

Do not miss this unique opportunity and join us in Meise (Brussels)!

Registration is free of charge but compulsory due to room capacity constraints. You can register by filling out the online registration form at http://tiny.cc/pib-final-event.

For complementary information on the Final Event (background, registration, logistics), please visit the dedicated wiki page at http://tiny.cc/wiki_pib_final_event or contact us at [email protected].

piB Final Event announcement

Posted by Camille Torrenti

iMarine Catalogue of Applications

17.03.2014

The iMarine initiative provides a data infrastructure aimed at facilitating open access, data sharing, collaborative analysis, processing and mining, as well as the dissemination of newly generated knowledge. The iMarine data infrastructure is designed to support decision making in high-level challenges that require policy decisions typical of the ecosystem approach.

iMarine has developed a series of applications which can be clustered into four main thematic domains (the so-called Application Bundles: sets of services and technologies grouped according to a family of related tasks for achieving a common objective).

More information on the iMarine Catalogue of Applications here.

Posted by Stephanie Morales

In just a couple of weeks: the release of the BioVeL Portal!

10.03.2014

BioVeL announces the upcoming release of the BioVeL Portal. Designed in response to scientists' needs through a continuous cycle of requests and feedback, the portal will be robust and scalable for handling greater workloads.

An important feature of the Portal will be the ability to do "data sweeps", that is, to initiate multiple runs of the same workflow, each with different input conditions. Other neat points are the organisation of the workflows by categories with "faceted browsing" for easier search, and a complete history of all your own workflow runs. The Portal also lets you share workflows and results with collaborators. And as always with BioVeL tools, the codebase used for the portal benefits from being used across multiple projects.

Access the BioVeL Portal here.

Please do not hesitate to provide your comments to [email protected]

Posted by Stephanie Morales

pro-iBiosphere’s series of workshops held in Berlin

10.02.2014

The pro-iBiosphere project organised two workshops on February 10-12 in Berlin:

February 10: MS12 - Workshop on mark-up of biodiversity literature
February 11: MS12 - Workshop on mark-up of biodiversity literature and MS23 - Workshop on alternative business models
February 12: MS23 - Workshop on alternative business models

The workshops took place at the Museum für Naturkunde Berlin (MFN), located at Invalidenstraße 43 in Berlin.

For complementary information on these events (concept, objectives and outcomes), please visit the dedicated project wiki page here or contact us: [email protected].

Posted by Stephanie Morales

Commission launches pilot to open up publicly funded research data

16.12.2013

Today, 16/12/2013, the European Commission announced the launch of a new Pilot on Open Research Data in Horizon 2020, to ensure that valuable information produced by researchers in many EU-funded projects will be shared freely. Researchers in projects participating in the pilot are asked to make the underlying data needed to validate the results presented in scientific publications and other scientific information available for use by other researchers, innovative industries and citizens. This will lead to better and more efficient science and improved transparency for citizens and society. It will also contribute to economic growth through open innovation. For 2014-2015, topic areas participating in the Open Research Data Pilot will receive funding of around €3 billion.

The Commission recognises that research data is as important as publications. It therefore announced in 2012 that it would experiment with open access to research data (see IP/12/790). The Pilot on Open Research Data in Horizon 2020 does for scientific information what the Open Data Strategy does for public sector information: it aims to improve and maximise access to and re-use of research data generated by projects for the benefit of society and the economy.

The Pilot involves key areas of Horizon 2020:

Future and Emerging Technologies
Research infrastructures – part e-Infrastructures
Leadership in enabling and industrial technologies – Information and Communication Technologies
Societal Challenge: Secure, Clean and Efficient Energy – part Smart cities and communities
Societal Challenge: Climate Action, Environment, Resource Efficiency and Raw materials – with the exception of topics in the area of raw materials
Societal Challenge: Europe in a changing world – inclusive, innovative and reflective Societies
Science with and for Society

Neelie Kroes, Vice-President of the European Commission for the Digital Agenda, said: "We know that sharing and re-using research data holds huge potential for science, society and the economy. This Pilot is an opportunity to see how different disciplines share data in practice and to understand remaining obstacles."

Commissioner Máire Geoghegan-Quinn said: "This pilot is part of our commitment to openness in Horizon 2020. I look forward to seeing the first results, which will be used to help set the course for the future."

Projects may opt out of the pilot to allow for the protection of intellectual property or personal data; in view of security concerns; or should the main objective of their research be compromised by making data openly accessible.

The Pilot will give the Commission a better understanding of what supporting infrastructure is needed and of the impact of limiting factors such as security, privacy or data protection or other reasons for projects opting out of sharing. It will also contribute insights into how best to create incentives for researchers to manage and share their research data.

The Pilot will be monitored throughout Horizon 2020 with a view to developing future Commission policy and EU research funding programmes.

Posted by Iliyana Kuzmova

The pro-iBiosphere project highlighted at the 6th EU-AU Cooperation Forum on ICT in Addis Ababa

13.12.2013

The 6th Africa-EU Cooperation Forum on ICT took place on 2-3 December 2013 at the African Union Conference Center in Addis Ababa, Ethiopia, under the aegis of the European Commission and the African Union Commission.

The event was organised by the EU project EuroAfrica-P8 during African ICT Week, on the occasion of the 50th anniversary of the African Union.

Over 20 sessions, 300 participants had the opportunity to share knowledge and explore possibilities for cooperation within the framework of the Joint Africa-EU Strategic Partnership (JAES).

The meeting gave the pro-iBiosphere project the opportunity to promote its activities by distributing project brochures and to network with potential stakeholders from Africa, such as representatives of the Ethiopian agriculture portal, at the session on ICT for Agriculture on 3 December.

Posted by Stephanie Morales

Working, not drinking, at the Taverna

10.12.2013

Data analysis is a complicated and time-consuming process. Like a craftsman, you need a set of tools to source, reformat, merge and analyse data. Using these tools manually in a workflow can take weeks. Then, when you finally get the workflow working, you often need to run it again with a new set of inputs and parameters. What if there were a piece of software that could couple all these tools together and then rerun everything at the click of a button? This is what Taverna Workbench does. Taverna turns a time-consuming job involving multiple tools into a single machine that does all the work seamlessly.

Taverna Workbench is one of the tools that support the BioVeL project in its stated aim of creating a "virtual e-laboratory that supports research on biodiversity issues". Taverna by itself is like a conductor without an orchestra. The power of Taverna lies in flexibly coupling web services, scripts and all kinds of processing engines to create workflows. For example, an obvious use case is coupling the GBIF web service with a niche modelling engine and rerunning the workflow with different projections of future climate change. However, Taverna can be used to simplify the processing of practically any digital data. Many ecologists use R as their primary statistical software. R can be run from within Taverna, and Taverna lets you couple R scripts to pre- and post-processing steps so that the whole analysis can be rerun more easily.

Taverna is widely used among phylogeneticists and bioinformaticians, and other disciplines are rapidly adopting it. Another unique and powerful feature of Taverna is that people can share, distribute and collaborate on their workflows. On the website myExperiment.org, scientists post their workflows for others to use, criticise and improve upon. The website works like a social network, enabling users to create groups, "like" favourite workflows and exchange ideas. You could spend time programming your own links between services, but Taverna lets you do this easily without sacrificing innovation and adaptability.

One of the important features of Taverna is the seamless way it lets users work with web services. There is a growing list of web services for biodiversity from organisations such as GBIF, EOL and EU-Brazil OpenBio. One of the big issues with web services is that they are, almost by definition, invisible to human users. So how do you find out that they exist? This is where biodiversitycatalogue.org comes in. It allows scientists to discover web services and also describes how they work and what their input and output formats are. The pro-iBiosphere project has helped to improve the catalogue and will help to set priorities for future development. The project now recommends the Biodiversity Catalogue as the central service registration facility for a future Open Biodiversity Knowledge Management System.

Taverna is a relatively new system to the biodiversity community, and through the BioVeL project its user base is growing rapidly. Furthermore, people are finding new uses for it all the time.

On 5 December 2013, BioVeL organised a workshop on Taverna at which pro-iBiosphere was represented. One potential use case of interest to pro-iBiosphere is the automated mark-up of text. Some aspects of automated mark-up are common to many texts, such as the identification of scientific names. On the other hand, other aspects are specific to particular texts, such as the identification of treatment boundaries and language-specific features. Taverna may be used to link generic services with custom scripts to significantly reduce the time it takes to mark up text. Workflows could be created for one particular publication and then tweaked to work for another.

The possibilities of Taverna are almost limitless. It is just the glue, and you decide what to stick together. You might think Taverna sounds like a quiet place for a drink, but it is really the factory floor of data processing.

By Quentin Groom (National Botanic Garden of Belgium)

Posted by Iliyana Kuzmova

Outcomes of the pro-iBiosphere workshop 4 on Business Models

10.12.2013

A pro-iBiosphere workshop on the evaluation of business models took place on 10 October 2013 at the Botanischer Garten und Botanisches Museum (FUB-BGBM) in Berlin, Germany. It was attended by project partners and four external experts. The workshop was split into two sessions, each divided into smaller working groups. In the first session, a prioritised list of the partners' current products and services was drawn up, and the opportunities for and threats to these were assessed. In the second session, participants focussed on the services and activities that would make up a future OBKMS (Open Biodiversity Knowledge Management System) and documented the constraints that might prevent its projected benefits from being realised.

The sessions were very fruitful in terms of content (more than 20 matrices were produced), and all participants, including the external ones, were very active throughout the day. The external participants were a real asset: they helped shape the project vision more precisely while also demonstrating and confirming their interest in the OBKMS. Partners found it very productive to step back and envision the future of the Consortium together. This workshop is not the end of the exercise but a milestone for agreeing on the concepts, methodology and tools to be used to plan project sustainability and to support discussions among the partners. Overall, the workshop achieved its objectives.

The next step will be the release of an event report detailing the workshop outputs and presenting the project's exploitation potential. We will keep you updated on the development of these activities.

For complementary information on the workshop (concept & objectives, agenda, participants list and presentations), visit the dedicated wiki page here.

Posted by Stephanie Morales

pro-iBiosphere Meeting 4: Evaluation of the meeting

05.12.2013

From 8 to 10 October 2013, the 4th pro-iBiosphere meeting took place at the Botanic Garden and Botanical Museum Berlin-Dahlem. In total, 87 participants from 15 countries attended the four workshops:

1. 8 Oct Workshop 1 (M4.1): How to improve technical cooperation and interoperability at the e-infrastructure level (FUB-BGBM). For results from the workshop, see here.

2. 8 Oct Workshop 2 (M4.2): How to promote and foster the development & adoption of common mark-up standards & interoperability between schemas (PLAZI)

3. 9 Oct Workshop 3 (M6.2.2): Workshop on user engagement and benefits (RBGK)

4. 10 Oct Workshop 4 (M6.3.2): Towards sustainability towards service: Meeting to evaluate business models currently in use by partners and relevant non-partners (SIGMA)

Networking

A questionnaire was sent to the participants of the meeting, and a total of 87 people answered it. Overall feedback on the meeting was positive. The workshop agendas and the opportunity to network were the strongest reasons for attending. Half of the delegates were able to establish or strengthen 5-10 contacts during the event (see Figure 1). Delegates appreciated the discussions and the welcoming atmosphere. However, they noted that more time could have been allocated to each workshop and to breaks for discussion.

Workshops

Participants expressed a preference for working in small groups with well-defined targets. They saw little need for presentations, provided that the workshops are well focused and allow ample time for discussion.

30% of the participants rated the quality of the workshops as high, 31% as very good, 13% as acceptable and 1% as below expectations (see Figure 2). Half of the participants are interested in attending future pro-iBiosphere events, and 64 out of 87 would recommend them to colleagues.

Figure 1: Contacts established by participants during the pro-iBiosphere meeting: 38% of participants made between 5 and 10 contacts, 14% made fewer than 5, 4% made none, and 9% made more than 10.

Posted by Iliyana Kuzmova

Improving the technical cooperation and interoperability at the e-infrastructure level

05.12.2013

A pro-iBiosphere workshop on "How to improve technical cooperation and interoperability at the e-infrastructure level" was held at the Botanic Garden and Botanical Museum Berlin-Dahlem (BGBM) on 8 October 2013. A total of 22 participants, representing a wide range of international biodiversity-related institutions and e-infrastructures, were invited to attend. The workshop focused on establishing two highly relevant aspects of interoperability: (i) a consistent space of stable identifiers for collection objects across European taxonomic institutions; and (ii) a central registry for biodiversity-related services.

During the workshop, eight different implementations of stable HTTP-URI-based identifier systems at European and US taxonomic institutions were evaluated positively. These implementations are an important outcome of the fruitful collaboration between pro-iBiosphere and the Information Science and Technology Commission (ISTC) of the Consortium of European Taxonomic Facilities (CETAF).

In addition, the workshop conducted a thorough analysis of the BiodiversityCatalogue (https://www.biodiversitycatalogue.org/), developed by the University of Manchester in the context of the EU 7th Framework Programme project BioVeL. As a result, a detailed list of recommended improvements to the Catalogue was compiled and agreed on. The University of Manchester will use these recommendations to set priorities when further developing the Catalogue. Detailed results from the workshop are available here.

Authors: Anton Güntsch, Sabrina Eckert (FUB-BGBM)

Posted by Iliyana Kuzmova

pro-iBiosphere project highlighted at the ICT2013 event

03.12.2013

The most visible forum for ICT research and innovation in Europe, "ICT2013: Create, Connect and Grow", took place on 6-8 November 2013 in Vilnius (Lithuania). The event comprised some 250 sessions and 200 exhibitors and brought together leading thinkers and the people driving European ICT research and innovation. A total of 6,000 people took part, including researchers, innovators, entrepreneurs, industry representatives and politicians.

ICT2013 allowed participants to share best practices and experiences in big data management and gave them an excellent opportunity to learn about the current state of ICT research in Europe and about Horizon 2020, the new Framework Programme for Research and Innovation.

The pro-iBiosphere project was strongly represented at the event through an exhibition booth and a networking session co-organised with other EC-funded projects (ei4Africa, Chain-reds and e-Science Talk). The exhibition booth, entitled ‘e-Infrastructures at work and the future of research’, showcased information from these four projects. Contacts were made with 20 potential stakeholders, including biodiversity data projects, EC-funded projects managing big data infrastructures (e.g. platforms, storage) and engineers specialised in semantic integration, enhancement, oncology and autonomics.

The networking session ‘What does the future hold for e-Science and Big Data?’ brought together researchers, data owners and service providers (including SMEs) to explore the future of e-science and how to deliver open access to data through Horizon 2020. During this session, the pro-iBiosphere project (represented by Plazi) presented its vision and its potential impact on the biodiversity community and beyond. The session led to a better understanding of how e-infrastructures can solve scientific challenges. Additional information is available here.

Panel participants during the networking session ‘What does the future hold for e-Science and Big Data’

Posted by Camille Torrenti

Vibrant workshop for users of the EDIT Platform for Cybertaxonomy

27.11.2013

The Vibrant workshop for users of the EDIT Platform for Cybertaxonomy was held from 11 to 13 November at the Botanic Garden and Botanical Museum Berlin-Dahlem (FUB-BGBM). Its aim was to explain the new Taxonomic Editor, the Data Portals and other components of the Platform to users, and to give an introduction to Xper2, the software for structured taxonomic descriptions and keys. The meeting brought together 40 participants representing a wide range of international biodiversity-related institutions and e-infrastructures (Euro+Med Plantbase, e-Floras, German Red Data Book editors, Pensoft publishers, Chinese Virtual Herbaria, Atlas Florae Europaeae and more).

The workshop was split into two parts: 1.5 days for the EDITor and the data portals, and 0.5 days for Xper2. For the EDITor and data portal part, three parallel working groups (one in German and two in English) were led by two developers and one or two taxonomists from the FUB-BGBM. One group consisted solely of Euro+Med Plantbase family editors and taxonomic experts. For Xper2, two parallel groups were led by four colleagues from Paris. To demonstrate the new Taxonomic EDITor, two virtual machine images were created with the Data Portal, Taxonomic Editor and cdmserver software preinstalled. One image included a simple dataset of 10 taxa with factual data from the Compositae tribe Cichorieae. The other included a more complex dataset from Euro+Med Plantbase, giving the editors the opportunity to do the hands-on training with the data they handle in the framework of this project.

All three groups gained extensive hands-on experience with the Taxonomic EDITor and were able to observe its direct interaction with the data portals. Feedback from users was generally positive. Some expressed the need for a "light" version of the EDITor that would be easier for less experienced users. Constructive criticism from the participants will help to improve the EDITor's functionality, and some participants will use the EDIT Platform for their projects in the future. The introduction and hands-on training on Xper2 were also well received and yielded fruitful feedback for the presenters of that part of the workshop.

http://cybertaxonomy.eu/


Posted by Sabrina Eckert


This project has received funding from the European Union’s Seventh Framework Programme for research, technological development and demonstration under grant agreement No 312848