From DNA barcoding databases to complex systems

V. Robert1, D. Vu1, S. Szoke1, L. Irinyi1, W. Meyer2, G. Cardinali3

Author address: 

1 Westerdijk Institute, UTRECHT, Netherlands 2 Sydney University, SYDNEY, Australia 3 University of Perugia, PERUGIA, Italy

Abstract: 

DNA barcoding databases have been created by a number of institutions in response to the lack of curation in major repositories such as GenBank and EMBL. The main generalist database, the Barcode of Life Data System (BOLD), is currently managed by the University of Guelph (Canada; http://www.boldsystems.org). Although it has improved over the years, it remains focused mainly on arthropod biodiversity data: 81% of its specimens belong to that clade, while only 0.7% are fungal. Databases such as UNITE (https://unite.ut.ee/), MycoBank (http://www.mycobank.org), FungiBank (http://fungibank.pasteur.fr), ISHAM ITS/EF1a (http://its.mycologylab.org) or the CBS collection (http://www.westerdijkinstitute.nl) focus on fungi and contain a much larger diversity with a higher degree of curation. Curation and quality of barcoding data are crucial, but they are not sufficient on their own. DNA barcoding per se is uninformative at both the taxonomic and the broader biological level; it mainly allows efficient clustering of organisms. This calls for a paradigm shift that requires many more data and data types, tools able to cope with all of them, and automated curation/filtering prior to human curation. Databases of the future will include all possible data associated with the basic units (usually strains, in our case). Ecological, geographical, physiological, molecular and many other (descriptive) data, all in their broad sense, should be included using the most accurate and efficient formats. This would allow humans and, even more importantly, machines to analyze these data and their correlations or relations. Efficient analytical tools (for identification, clustering, data mining, correlation analysis, etc.) would therefore need to be implemented around these database clusters. In the age of (meta-)genomics and big data, speed and efficiency must also be at the core of these systems, as the amount and complexity of the data will rise dramatically.
A high degree of artificial intelligence and automation must be built into such systems. Achieving such advanced systems requires international and multidisciplinary cooperation. We will present the problems associated with these exciting challenges, and possible solutions to them, in the hope of initiating research and collaborative projects.
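
As a purely illustrative sketch of what "automated curation/filtering prior to human curation" could look like for strain-centred records, the Python fragment below flags problematic entries before a curator sees them. Everything in it is an assumption made for illustration and is not taken from any of the databases named above: the `StrainRecord` fields, the `precuration_flags` and `triage` helpers, and the two thresholds are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical thresholds: real values would depend on the marker and on database policy.
MIN_LENGTH = 300               # assumed minimum barcode length, in base pairs
MAX_AMBIGUOUS_FRACTION = 0.01  # assumed ceiling on the share of non-ACGT symbols


@dataclass
class StrainRecord:
    """Hypothetical strain-centred record combining a barcode with other data types."""
    strain_id: str
    taxon_name: str
    barcode: str
    metadata: dict = field(default_factory=dict)  # ecological, geographical, ... data


def precuration_flags(record):
    """Return a list of reasons why the record should go to a human curator."""
    flags = []
    seq = record.barcode.upper()
    if len(seq) < MIN_LENGTH:
        flags.append("sequence too short")
    ambiguous = sum(1 for base in seq if base not in "ACGT")
    if seq and ambiguous / len(seq) > MAX_AMBIGUOUS_FRACTION:
        flags.append("too many ambiguous bases")
    if not record.taxon_name.strip():
        flags.append("missing taxon name")
    if "country" not in record.metadata:
        flags.append("missing geographic origin")
    return flags


def triage(records):
    """Split records into those passing all checks and those needing human review."""
    accepted, review = [], []
    for record in records:
        (review if precuration_flags(record) else accepted).append(record)
    return accepted, review


# Two made-up records: one clean, one failing every check.
good = StrainRecord("STRAIN-0001", "Saccharomyces cerevisiae", "ACGT" * 100, {"country": "NL"})
bad = StrainRecord("STRAIN-0002", "", "ACGN" * 10, {})
accepted, review = triage([good, bad])
```

In this sketch only flagged records reach the human curator, and the list of reasons travels with them. A production system would need far richer checks (taxonomic consistency, chimera detection, cross-database comparison) than the four shown here.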

2018

Abstract No: 

S8.4b

Full conference title: 

20th Congress of the International Society for Human and Animal Mycology, Amsterdam, the Netherlands