The FLUX CAPACITOR: next-generation sequencing technologies provide an unprecedented capacity for surveying the nucleic acid content of cells. In particular, this sequencing depth may allow exhaustive sequencing across the large dynamic range of RNA abundances in the cell, overcoming limitations imposed by current (random) clone selection approaches. However, the very short reads produced by the most cost-effective of these technologies make the reconstruction of complete RNA molecules very challenging, because biological processes such as alternative splicing, promoter choice and/or polyadenylation can generate molecules that share a substantial fraction of their sequence.
Herein, we focus specifically on the data produced by the Illumina/Solexa instrument. We first show that, whether using single or paired-end reads, de novo reconstruction of the RNA species present in the cell cannot theoretically be guaranteed. We also demonstrate, however, that when using paired-end reads, most spliceforms can be fully reconstructed, provided a sufficient number of reads. In addition, we show that, assuming knowledge of all expressed spliceforms (either from reconstruction or from preliminary annotations), single reads can be effectively used to reconstruct the abundances of the original RNA molecules. To this end, we model the transcriptome as a flow network and use the observed read counts as capacity constraints that are subsequently optimized (i.e., filtered from noise) in order to give a robust prediction for the abundance of each RNA species. Using this method, we are able to assign expression levels to entire alternative spliceforms and subsequently to the different variants of alternative splicing (AS) events, in contrast to current methods, which measure the expression of genes or exons projected onto the genomic scale. For evaluation and benchmarking, we have developed a sophisticated system to simulate the experimental protocol, and its intrinsic biases, involved in the sequencing of RNA by the Solexa instrument.
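To make the flow-network idea concrete, the following is a minimal sketch and not the optimization used by the method itself. It assumes a toy gene with exons A to D, two hypothetical known spliceforms (`T1`, `T2`), and invented junction read counts. As a stand-in for the noise-filtering step, it uses a plain non-negative least-squares fit solved by coordinate descent. The function name `estimate_abundances` and all inputs are illustrative assumptions.

```python
def estimate_abundances(transcripts, edge_counts, sweeps=200):
    """Fit non-negative transcript abundances to observed edge counts.

    transcripts: dict mapping a transcript name to the list of splice-graph
                 edges (exon pairs) it traverses.
    edge_counts: dict mapping an edge to its observed read count.
    The flux on an edge is modeled as the sum of the abundances of all
    transcripts using that edge; we minimize the squared deviation from
    the observed counts (an assumed, simplified objective).
    """
    edges = sorted(edge_counts)
    names = sorted(transcripts)

    # One 0/1 column per transcript: 1.0 where the transcript uses the edge.
    columns = {}
    for name in names:
        used = set(transcripts[name])
        columns[name] = [1.0 if edge in used else 0.0 for edge in edges]

    abundance = {name: 0.0 for name in names}
    # Residual = observed counts minus counts predicted by current abundances.
    residual = [float(edge_counts[edge]) for edge in edges]

    for _ in range(sweeps):
        for name in names:
            col = columns[name]
            norm = sum(v * v for v in col)
            if norm == 0.0:
                continue  # transcript touches no observed edge
            # Exact minimizer along this coordinate, clipped at zero.
            step = sum(v * r for v, r in zip(col, residual)) / norm
            updated = max(0.0, abundance[name] + step)
            delta = updated - abundance[name]
            residual = [r - delta * v for r, v in zip(residual, col)]
            abundance[name] = updated
    return abundance


# Hypothetical toy data: T2 skips exon C, and both spliceforms share edge A->B.
toy_transcripts = {
    "T1": [("A", "B"), ("B", "C"), ("C", "D")],
    "T2": [("A", "B"), ("B", "D")],
}
toy_counts = {("A", "B"): 42, ("B", "C"): 30, ("C", "D"): 34, ("B", "D"): 10}

abundances = estimate_abundances(toy_transcripts, toy_counts)
print(abundances)
```

In this toy case the counts on the two edges unique to `T1` disagree (30 versus 34), and the fit reconciles them with the shared edge, giving roughly 32 for `T1` and 10 for `T2`. Real data would additionally require normalizing for edge length and sequencing biases, which this sketch ignores.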
We have used the method to investigate changes upon knockdown of the splicing factor PTB in human HeLa cells. We have generated 78 million reads, both single and paired-end. Preliminarily, our results agree with prior results obtained using splicing arrays in more than 80% of the cases. In addition, we detect changes in alternative splicing events that are not interrogated by the microarrays, as well as novel spliceforms, some of them involving very distal loci in the genome.