ABSTRACT
High-throughput data production in proteomics and other post-genome disciplines has created many challenges in data sharing, annotation, and dissemination. Successful systems require that as much thought and effort be put into the social aspects of engineering as into the software. Close relationships with data generators and users are crucial, as is the recognition that developing a database to support a static field is quite different from developing one for a research field where data types, qualities, and relationships change constantly. What happens after initial development is also crucial: models for long-term maintenance and continued development are challenging to implement and require flexibility and significant institutional investment. The Tranche data repository provides public access to the large, complex, and expensive-to-generate data sets common in proteomics. This support for data sharing reinforces the peer review process in proteomics and allows data to be reused through centralized databases and to be cross-correlated and aggregated across data sets. The Tranche project is a distributed, free-to-use, open-source system dedicated specifically to alleviating these problems and designed to be a community resource that interfaces readily with existing databases and computational resources. The Tranche and Proteomecommons.org projects offer several direct benefits to researchers. Together they provide a secure resource for archiving and annotating large data sets while inherently maintaining data integrity and provenance; they also provide a proper citation for data, enable a high degree of compliance with annotation standards, and allow facile access to public data sets.