Metagenomics offers a new approach to studying biogeochemical processes in an environment: the genes involved in a given process provide an approximation of a community’s biogeochemical potential. Lipid biomarkers are well suited to this approach because they bear on a number of biogeochemical questions and some have well-studied biosynthetic pathways. We investigated one important class of lipid biomarkers, the triterpenoids, in a new metagenomic sample collected at station SJ0609.03 in the western tropical Atlantic Ocean, within the offshore Amazon River plume (12°15.43′N, 56°8.74′W). In this metagenome, we identified proteins involved in triterpenoid biosynthesis that have bacterial and eukaryotic homologs. Although the sample was prefiltered through a 5 µm filter and then collected on a 0.2 µm filter for sequencing, the database contains some eukaryotic sequences along with the target prokaryotic sequences. In this context, genes must be identified with a high degree of confidence, and BLAST searches alone are not rigorous enough to determine the affinity of short fragments. We test a method for identifying lipid biomarker biosynthesis genes that couples TBLASTN searches to a phylogenetic method for building maximum likelihood (ML) trees for distantly related proteins by fitting fragments into a reference tree. By examining the relative likelihood of ML trees with the fragment attached at each point in the reference tree, we gain a measure of confidence in the phylogenetic placement of fragment sequences.
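The idea of comparing ML trees across attachment points can be illustrated with a minimal sketch. It assumes that a log-likelihood has already been computed for the tree with the fragment attached at each candidate branch, and it normalizes these values into relative weights that sum to one. The normalization scheme, the function name `placement_weights`, the branch labels, and the log-likelihood values are all hypothetical and chosen for illustration; they are not taken from the study and may differ from the calculation actually used.

```python
import math


def placement_weights(log_likelihoods):
    """Turn per-branch tree log-likelihoods into relative weights summing to 1.

    log_likelihoods maps a branch label to the log-likelihood of the ML tree
    with the fragment attached at that branch (hypothetical inputs).
    """
    best = max(log_likelihoods.values())
    # Subtract the maximum before exponentiating to avoid numerical underflow.
    scaled = {branch: math.exp(ll - best) for branch, ll in log_likelihoods.items()}
    total = sum(scaled.values())
    return {branch: value / total for branch, value in scaled.items()}


# Made-up log-likelihoods for three candidate attachment branches.
example = {"branch_A": -10234.1, "branch_B": -10236.4, "branch_C": -10251.0}
weights = placement_weights(example)
best_branch = max(weights, key=weights.get)
```

Under this scheme, a weight close to one for a single branch would indicate a confident placement, while weights spread over several branches would indicate that the fragment cannot be placed reliably.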