Metagenomics, the process of sequencing the genomes (or pieces of genomes) of all the microorganisms in a single environmental sample and then analyzing their composition, has developed in recent years with the advent of next-generation sequencing techniques. Metagenomic studies are increasing our knowledge of microbial life by providing vast amounts of data on the overall diversity of organisms found in soil, aquatic habitats, the human body, and even what is splattered across car windshields (see here). The unknown organisms found in metagenomic studies so far fall within the three domains of life (Bacteria, Archaea, and Eukaryotes), but scientists have wondered whether other domains exist and have simply gone unnoticed.
A paper by Wu et al. from the laboratory of Jonathan Eisen at UC Davis (see here and here), entitled “Stalking the Fourth Domain in Metagenomic Data: Searching for, Discovering, and Interpreting Novel, Deep Branches in Marker Gene Phylogenetic Trees” and recently published in PLoS ONE, ponders the presence of novel lineages of life by searching for genes with presumed deep origins in the tree of life. Using metagenomic sequences from Craig Venter’s Global Ocean Sampling (GOS) initiative, the authors searched for novel life by probing for genes, such as those associated with ribosomal RNA, that are assumed to have originated early in the evolution of life. Image from a commentary in The Economist.
The researchers began by looking for novelty in the small subunit rRNA gene, a common marker for the phylogenetics of bacteria and archaea, but were unable to resolve these phylogenies at deep levels because robust sequence alignments could not be built for the novel sequences. They ended up focusing on two protein-coding genes that are also assumed to have deep origins: RecA, which is involved in DNA recombination, and RpoB, which encodes a subunit of RNA polymerase, the enzyme that transcribes DNA into RNA. Jonathan Eisen has written a very detailed and illuminating blog post on the background of the study, supplementing the methodology found in the paper. The following figure comes from Norm Pace’s excellent 2009 review article on the tree of life and shows how the basal nodes of many lineages remain unresolved.
When constructing phylogenetic trees of the RecA and RpoB sequences, the authors found novel branches that could not easily be placed among known lineages. They offer four possible explanations for these sequences. The first is that the novel clades come from viruses that have not yet been described. The second is that the sequences are recombinations of previously identified genes, which the authors rule out because of the sequences’ phylogenetic uniqueness. The third is that the sequences are ancient paralogs from organisms for which little or no sequence data exist. The fourth is that the sequences come from as-yet-unknown lineages of organisms, so that their phylogenetic novelty reflects genuinely novel organisms. The authors stress that more data and more rigorous research are needed to investigate these possibly novel clades, but this study is the first of, hopefully, many to address this interesting research question. If you would like to read more about this research, there are numerous commentaries available (see here, here, and here).