Recent advances in biotechnology and bioinformatics have provided a flood of genomic data and tremendous growth in the number of associated data sets. Sequencing projects now include draft assemblies, complete genomes, comparative genomics, and metagenomics, in which genetic material is sequenced directly from environmental samples. The NCBI provides integrated systems for data storage, retrieval, and analysis. GenBank, an archival database of DNA sequences, contains consensus sequences assembled from raw sequence reads. The Trace Archive serves as a repository of raw sequence data from a variety of automated sequencing platforms. In addition, NCBI provides reference sequence collections and specialized tools for sequence analysis and visualization. A novel approach recently developed at NCBI allows the visualization of large phylogenetic trees in an aggregated form with a special representation of subscale details. Rapid advances in sequencing technologies have created new challenges for information systems. A new NCBI resource, the Short Read Archive (SRA), has been designed specifically to handle sequence data from massively parallel sequencing technologies. Specialized metagenomic resources at NCBI include a collection of environmental projects in the Entrez Genome Project database and specialized BLAST databases, including Environmental Samples, Whole Genome Shotgun Reads, and Trace Archives. The NCBI Metagenomics e-book links together related NCBI resources, including sequence data, publications, and analysis tools. The unusual structure of metagenomic data (heterogeneity, fragmentation, redundancy, high error rate, etc.) will require the development of new analysis tools and visualization techniques. New computational tools for the large-scale analysis of complex metagenomic data (both DNA and predicted proteins) are under development. Ongoing work on 16S rRNA analysis and visualization tools will be presented.