DNA sequences from type specimens and type strains - how to increase their number and improve their annotation in NCBI GenBank and related databases

Publication: Contribution to journal › Journal article › Research › peer-reviewed

Standard

DNA sequences from type specimens and type strains - how to increase their number and improve their annotation in NCBI GenBank and related databases. / Renner, Susanne S.; Scherz, Mark D.; Schoch, Conrad L.; Gottschling, Marc; Vences, Miguel.

In: Systematic Biology, 2024.


Harvard

Renner, SS, Scherz, MD, Schoch, CL, Gottschling, M & Vences, M 2024, 'DNA sequences from type specimens and type strains - how to increase their number and improve their annotation in NCBI GenBank and related databases', Systematic Biology. https://doi.org/10.1093/sysbio/syad068

APA

Renner, S. S., Scherz, M. D., Schoch, C. L., Gottschling, M., & Vences, M. (2024). DNA sequences from type specimens and type strains - how to increase their number and improve their annotation in NCBI GenBank and related databases. Systematic Biology. https://doi.org/10.1093/sysbio/syad068

Vancouver

Renner SS, Scherz MD, Schoch CL, Gottschling M, Vences M. DNA sequences from type specimens and type strains - how to increase their number and improve their annotation in NCBI GenBank and related databases. Systematic Biology. 2024. https://doi.org/10.1093/sysbio/syad068

Author

Renner, Susanne S.; Scherz, Mark D.; Schoch, Conrad L.; Gottschling, Marc; Vences, Miguel. / DNA sequences from type specimens and type strains - how to increase their number and improve their annotation in NCBI GenBank and related databases. In: Systematic Biology. 2024.

Bibtex

@article{bc7b9da2e4034270a7f9d474622a3b85,

title = "DNA sequences from type specimens and type strains - how to increase their number and improve their annotation in NCBI GenBank and related databases",

abstract = "Scientific names permit humans and search engines to access knowledge about the biodiversity that surrounds us, and names linked to DNA sequences are playing an ever-greater role in search-and-match identification procedures. Here, we analyze how users and curators of the National Center for Biotechnology Information (NCBI) are flagging and curating sequences derived from nomenclatural type material, which is the only way to improve the quality of DNA-based identification in the long run. For prokaryotes, 18,281 genome assemblies from type strains have been curated by NCBI staff and improve the quality of prokaryote naming. For Fungi, type-derived sequences representing over 21,000 species are now essential for fungus naming and identification. For the remaining eukaryotes, however, the numbers of sequences identifiable as type-derived are minuscule, representing only 1,000 species of arthropods, 8,441 vertebrates, and 430 embryophytes. An increase in the production and curation of such sequences will come from (i) sequencing of types or topotypic specimens in museum collections, (ii) the March 2023 rule changes at the International Nucleotide Sequence Database Collaboration requiring more metadata for specimens, and (iii) efforts by data submitters to facilitate curation, including informing NCBI curators about a specimen's type status. We illustrate different type-data submission journeys and provide best-practice examples from a range of organisms. Expanding the number of type-derived sequences in DNA databases, especially of eukaryotes, is crucial for capturing, documenting, and protecting biodiversity.",

author = "Renner, {Susanne S.} and Scherz, {Mark D.} and Schoch, {Conrad L.} and Marc Gottschling and Miguel Vences",

note = "Published by Oxford University Press on behalf of the Society of Systematic Biologists 2023. This work is written by (a) US Government employee(s) and is in the public domain in the US.",

year = "2024",

doi = "10.1093/sysbio/syad068",

language = "English",

journal = "Systematic Biology",

issn = "1063-5157",

publisher = "Oxford University Press",

}

RIS

TY - JOUR

T1 - DNA sequences from type specimens and type strains - how to increase their number and improve their annotation in NCBI GenBank and related databases

AU - Renner, Susanne S.

AU - Scherz, Mark D.

AU - Schoch, Conrad L.

AU - Gottschling, Marc

AU - Vences, Miguel

N1 - Published by Oxford University Press on behalf of the Society of Systematic Biologists 2023. This work is written by (a) US Government employee(s) and is in the public domain in the US.

PY - 2024

Y1 - 2024

N2 - Scientific names permit humans and search engines to access knowledge about the biodiversity that surrounds us, and names linked to DNA sequences are playing an ever-greater role in search-and-match identification procedures. Here, we analyze how users and curators of the National Center for Biotechnology Information (NCBI) are flagging and curating sequences derived from nomenclatural type material, which is the only way to improve the quality of DNA-based identification in the long run. For prokaryotes, 18,281 genome assemblies from type strains have been curated by NCBI staff and improve the quality of prokaryote naming. For Fungi, type-derived sequences representing over 21,000 species are now essential for fungus naming and identification. For the remaining eukaryotes, however, the numbers of sequences identifiable as type-derived are minuscule, representing only 1,000 species of arthropods, 8,441 vertebrates, and 430 embryophytes. An increase in the production and curation of such sequences will come from (i) sequencing of types or topotypic specimens in museum collections, (ii) the March 2023 rule changes at the International Nucleotide Sequence Database Collaboration requiring more metadata for specimens, and (iii) efforts by data submitters to facilitate curation, including informing NCBI curators about a specimen's type status. We illustrate different type-data submission journeys and provide best-practice examples from a range of organisms. Expanding the number of type-derived sequences in DNA databases, especially of eukaryotes, is crucial for capturing, documenting, and protecting biodiversity.

AB - Scientific names permit humans and search engines to access knowledge about the biodiversity that surrounds us, and names linked to DNA sequences are playing an ever-greater role in search-and-match identification procedures. Here, we analyze how users and curators of the National Center for Biotechnology Information (NCBI) are flagging and curating sequences derived from nomenclatural type material, which is the only way to improve the quality of DNA-based identification in the long run. For prokaryotes, 18,281 genome assemblies from type strains have been curated by NCBI staff and improve the quality of prokaryote naming. For Fungi, type-derived sequences representing over 21,000 species are now essential for fungus naming and identification. For the remaining eukaryotes, however, the numbers of sequences identifiable as type-derived are minuscule, representing only 1,000 species of arthropods, 8,441 vertebrates, and 430 embryophytes. An increase in the production and curation of such sequences will come from (i) sequencing of types or topotypic specimens in museum collections, (ii) the March 2023 rule changes at the International Nucleotide Sequence Database Collaboration requiring more metadata for specimens, and (iii) efforts by data submitters to facilitate curation, including informing NCBI curators about a specimen's type status. We illustrate different type-data submission journeys and provide best-practice examples from a range of organisms. Expanding the number of type-derived sequences in DNA databases, especially of eukaryotes, is crucial for capturing, documenting, and protecting biodiversity.

U2 - 10.1093/sysbio/syad068

DO - 10.1093/sysbio/syad068

M3 - Journal article

C2 - 37956405

JO - Systematic Biology

JF - Systematic Biology

SN - 1063-5157

ER -


Statens Naturhistoriske Museum