As the size and complexity of scientific datasets and the corresponding information stores grow, reporting standards are playing an increasingly active role. Interoperability among standards, however, becomes pivotal for the development of software applications. Here we present the key synergistic standards activities in the omics domain, and the motivation for, and an overview of, the BioInvestigation Index infrastructure at the EBI.
The marriage of conventional methods with (meta)genomics, transcriptomics, proteomics and metabol/nomics technologies (hereafter referred to as ‘omics’) has created not only opportunities, but also substantial new informatics challenges. For example, consider the reporting of a complex multi-omics study of the effect of heavy metals on a population of worms, which involves measuring gene and protein expression in the whole organism (by DNA microarrays and mass spectrometry, respectively), sequencing the genome (by high-throughput sequencing) and conducting a series of conventional environmental analyses. It is essential that such datasets are reported in a standard manner to enable communication, interpretation and analysis.
New approaches are required for describing, formatting, submitting and exchanging both data and metadata (i.e., sample characteristics, study design and execution) from such complex studies. Many groups are rising to this challenge, including the Genomics Standards Consortium (GSC) [1]. However, standards for data content (minimal information checklists), semantics (ontologies) and syntax (file formats) are each being developed to target a particular omics technology or a particular biologically delineated community. Unfortunately, because they remain bounded by individual disciplines, standardisation efforts in general are fragmented and cannot be easily integrated.
This results in unnecessary duplication of effort and, more significantly, in the development of (arbitrarily) different standards, thereby limiting the scope for data exchange. The result of such ‘fragmentation’ is also reflected in the implementations. For example, systems such as ArrayExpress and PRIDE at the EBI, built to store microarray-based and proteomics experiments, respectively, employ different submission/exchange formats and terminologies, as developed by the standardisation initiatives in their domains. In such a scenario, the description and submission of multi-omics studies will be difficult, if not impossible.
Fortunately, several synergistic activities have begun fostering the harmonisation and consolidation of the three kinds of standards being developed. Over 20 projects are registered in the ‘Minimum Information for Biological and Biomedical Investigations’ (MIBBI) portal [2,3], set up to create orthogonal checklist modules. At present, over 60 groups participate under the OBO Foundry umbrella [4,5] with the objective of developing interoperable ontologies. Several groups participate in the Functional Genomics Experiment (FuGE) project [6,7], which underpins the XML-based formats they have developed. Very recently, another complementary initiative has sprung up from a growing number of communities that work collaboratively on a common tabular framework for presenting experimental metadata (ISA-TAB) [8,9]. The reuse of common standards and ontologies will ease the task of software developers, vendors and equipment manufacturers by reducing the time and cost of implementing standards-compliant products. In turn, these products will be valuable interoperable resources for the systems biology community, simplifying the job of data integration. Members of the GSC are actively involved in the OBO, MIBBI and ISA-TAB efforts.
Undoubtedly, the interoperability of reporting standards will ease the task of those developers working to implement standards-compliant systems for complex multi-omics studies, such as the BioInvestigation Index at the EBI [10]. This infrastructure aims to create a common structured representation of the metadata and the sample-data relationship for biological, biomedical and environmental studies employing omics-based technologies along with more conventional methodologies. The BioInvestigation Index infrastructure, along with a first set of publicly available multi-omics datasets, will be launched in December 2008.
1. http://gensc.org
2. http://mibbi.sf.net
3. Taylor, Field, Sansone, ... Rocca-Serra, ... et al. (2008). Nat Biotechnol. 26(8):889-96.
4. http://www.obofoundry.org
5. Smith, Ashburner, Rosse, ... Rocca-Serra, Sansone et al. (2007). Nat Biotechnol. 25(11):1251-5.
6. http://fuge.sf.org
7. Jones, Miller, Aebersold, ... Sansone, ... Taylor et al. (2007). Nat Biotechnol. 25(10):1127-33.
8. http://isa-tab.sf.net
9. Sansone, Rocca-Serra, Brandizi, ... Sklyar, Taylor et al. (2008). OMICS. 2(2):143-9.
10. http://www.ebi.ac.uk/bioinvindex