Maximum-Likelihood Model Averaging To Profile Clustering of Site Types across Discrete Linear Sequences

August 10, 2009

Peer-Reviewed Paper
A major analytical challenge in computational biology is the detection and description of clusters of specified site types, such as polymorphic or...
PLoS Comput Biol. 2009 Jun 26; 5(6):e1000421
Zhang Zhang, Jeffrey P. Townsend

Zhang, Townsend. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Copyright 2015 © Zhang Zhang, Jeffrey P. Townsend. This pubcast is licensed under the terms of the Creative Commons Attribution License 3.0, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.