This week, it’s been hard to miss the new paper, “How many species are there on Earth and in the Ocean?” published by Mora et al. in the August 2011 issue of the journal PLOS Biology. Commentaries or news articles have appeared in the New York Times, The Economist, The Guardian, Damian Carrington’s Guardian Blog, National Geographic, Yahoo News, AlterNet, MSNBC, Reuters, UNEP, and NewsDaily, and Ed Yong has posted a commentary on his Google+ page. Some well-respected scientists who study biological diversity have joined the debate too: Jonathan Eisen has devoted two blog posts to the paper (one about the actual paper in PLOS Biology and another on the National Geographic commentary), and there is a commentary from Robert May in PLOS Biology about the study and its significance. Since there is ample information on the study elsewhere, let me give a brief summary of the study and some of my feelings about the paper.
It’s quite embarrassing that we have really no clue how much biological diversity is found on this planet. Adding insult to injury is the fact that we have no concept of the current magnitude of the loss of diversity due to human-induced mass extinctions. This paper seeks to predict total global biological diversity by documenting current taxonomic numbers and extrapolating consistent patterns to estimate the number of species that have yet to be identified.
The methods of the authors essentially consisted of three parts. First, the authors compiled a list of approximately 1.2 million species pulled from numerous biological databases. Second, they surveyed a little over 500 taxonomists who were asked to identify the validity of current scientific names and comment on the intensity of current taxonomic efforts to describe new species. Third, the authors analyzed this data to find the estimated global numbers of biological taxa for each phylum.
The authors show a predictable pattern in the numbers of taxa at each level of classification (phylum, class, order, family, and genus), one that holds consistently at least for animals. They fit regressions to these patterns and validated the approach by closely examining 18 taxonomic groups whose total diversity we think we understand. By doing this, the authors come up with a total estimate of 7.7 million species of animals (mostly insects), close to 300,000 species of plants, more than 600,000 species of fungi, and a total estimate of roughly 9 million eukaryotes on Earth. The authors estimate that 86% of species on Earth and 91% of species in the oceans still have not been formally described. Previous estimates of species diversity have ranged widely: anywhere from 3 million to 100 million species.
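To make the extrapolation idea concrete, here is a minimal sketch, not the authors' actual models. It assumes a plain least-squares line fitted to the log10 counts of taxa at each rank and then extended one rank further, to species. The counts in `TOY_COUNTS` are invented for illustration and are not data from Mora et al. The last helper only checks the arithmetic of the figures quoted above, assuming the 1.2 million catalogued species and the 86% figure refer to the same set of species.

```python
import math

# Hypothetical counts of taxa per rank for one imaginary group
# (phylum, class, order, family, genus). Illustrative numbers only.
TOY_COUNTS = [1, 10, 100, 1000, 10000]


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def extrapolate_species(counts):
    """Fit log10(count) against rank index and predict the next rank (species)."""
    ranks = list(range(1, len(counts) + 1))
    log_counts = [math.log10(c) for c in counts]
    slope, intercept = fit_line(ranks, log_counts)
    species_rank = len(counts) + 1
    return 10 ** (intercept + slope * species_rank)


def implied_total(described, undescribed_fraction):
    """Total species implied if `described` is the described share of the total."""
    return described / (1 - undescribed_fraction)


predicted_species = extrapolate_species(TOY_COUNTS)
total_from_post = implied_total(1.2e6, 0.86)
print(f"toy prediction: {predicted_species:,.0f} species")
print(f"implied total from 1.2 million described and 86% undescribed: {total_from_post:,.0f}")
```

The second number comes out at roughly 8.6 million, which is in line with the "roughly 9 million" total quoted above.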
This paper is a novel and worthwhile attempt to determine the total amount of species diversity on this planet. Despite this, I think – and the authors have their own reservations – that there are some serious problems with some of their calculations.
One problem is that the study relies mainly on animals, and on vertebrates in particular, which are the best described of any group, as the baseline for measuring the completeness of species diversity. I would argue that plants and fungi, and obviously bacteria, archaea, and “the protists”, are not known well enough to extrapolate any serious estimate of species numbers, especially when the baseline is the vertebrate animals, whose numbers are largely skewed.
Another problem is our collective definition of species, along with the taxonomic subjectivity in how the higher ranks are categorized. These ranks are based on the homology of shared characters and, I would argue, are largely incomparable outside of each phylum. For example, what one taxonomist calls an order in one grouping may not be equivalent to what another taxonomist calls an order in a completely different grouping.
I should point out that the authors don’t ignore these caveats, but they still exist in their study. In any event, this paper is important because it adds to the dialogue concerning species diversity and the need to estimate, inventory, and preserve the massive amount of diversity we share on the planet.
Published in the June 2011 issue of the journal Nature Biotechnology was a paper reporting on the genome sequence of the date palm, Phoenix dactylifera. This paper, authored by Al-Dous et al., addressed the genome sequencing and de novo assembly of this agriculturally important monocot tree, along with comparative genomics with other plants.
Dates have been found in the tombs of pharaohs estimated at 8,000 years old. Fields of agriculturally planted trees, estimated to be older than 5,000 years, suggest the date palm is one of the oldest cultivated plants in the world. Dates are the most important agricultural crop in the hot and arid regions surrounding the Arabian Gulf and their global production is close to 7 million tons yearly.
Despite this long history of cultivation, there are a few problems to deal with if you are a date grower. As is typical of tree crops, there is a long generation time from seedling to fruit harvest. Additionally, only the female date palm provides fruit, and it takes at least 5 years after seed germination to tell if you have a male or female plant. To make it even harder for a date grower, there are more than 2,000 date varieties, each exhibiting its own color, flavor, size, shape, and ripening schedule, and they are all really hard to keep track of using conventional techniques.
In an effort to provide genetic resources for date growers and breeders, the authors of this study, who were mainly located in Qatar, sequenced and assembled 380 Mb of the estimated 658 Mb genome of the Khalas cultivar, which is known for high fruit quality. Generated using short reads from the Illumina Genome Analyzer IIx platform, this partial sequence excludes numerous large repeated regions, includes a predicted 28,890 genes, and represents 18 pairs of chromosomes. The authors estimate that this draft genome represents roughly 90% of the total genes and 60% of the total genome.
This genome resource also serves a comparative genomics purpose, as the date palm is the first member of the widespread monocot order Arecales to be sequenced. To date, the only monocots with sequenced genomes (for example, corn, rice, and sorghum) have all been in the grass order, the Poales.
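As a quick check of the assembly numbers, the snippet below computes what share of the estimated genome the assembled sequence covers. It uses only the two sizes quoted above; the function name is mine.

```python
ASSEMBLED_MB = 380         # assembled sequence, from the post
ESTIMATED_GENOME_MB = 658  # estimated genome size, from the post


def assembled_fraction(assembled_mb, genome_mb):
    """Share of the estimated genome covered by the assembly."""
    return assembled_mb / genome_mb


fraction = assembled_fraction(ASSEMBLED_MB, ESTIMATED_GENOME_MB)
print(f"{fraction:.1%} of the estimated genome is assembled")  # prints 57.8%
```

That is a little under the "roughly 60%" figure, which is consistent with it being a rounded estimate.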
This report is missing some vital information: in addition to an incomplete genome assembly, there is no metabolic, developmental, or gene network pathway reconstruction for the date palm provided in this paper (and unfortunately this paper also includes some glaring typos in the citation section). In place of these expected analyses, the authors conducted a thorough survey of SNPs in this Khalas cultivar, along with eight additional cultivars common in breeding programs for the date palm. Within these nine cultivars, 3,518,029 SNPs were identified but, quite interestingly, a total of just 32 SNPs could be used to differentiate the cultivars.
In addition to the thorough SNP analysis, the researchers did a full parentage analysis of the cultivars used in this study, which include famous date varieties such as Deglet Noor, Dayri, and Medjool. Here’s an article in Nature Middle East on the importance of understanding this parentage and gender analysis.
Although this is a draft genome that is still being completed and resequenced, the tools provided by the authors, namely the SNP and parentage analyses, should provide date palm breeders with many resources for improved fruit quality, and this genome represents an exciting piece of the monocot evolutionary puzzle.
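To illustrate how a handful of SNPs can tell cultivars apart, here is a minimal sketch under stated assumptions. The greedy selection below is my own stand-in, not the authors' procedure, and the cultivar names and genotypes in `TOY_GENOTYPES` are invented. Each string holds one base per SNP position; the function keeps picking the position that separates the most still-confused pairs of cultivars until every pair differs somewhere.

```python
from itertools import combinations

# Hypothetical genotypes: one base per SNP position, one string per cultivar.
TOY_GENOTYPES = {
    "cultivar_A": "AACG",
    "cultivar_B": "AATG",
    "cultivar_C": "GACG",
    "cultivar_D": "GATG",
}


def distinguishing_snps(genotypes):
    """Greedily choose SNP positions until every pair of cultivars differs at one."""
    names = list(genotypes)
    n_snps = len(next(iter(genotypes.values())))
    unresolved = set(combinations(names, 2))  # pairs not yet told apart
    chosen = []
    while unresolved:
        best_pos, best_split = None, set()
        for pos in range(n_snps):
            if pos in chosen:
                continue
            # Pairs this position would separate.
            split = {(a, b) for a, b in unresolved
                     if genotypes[a][pos] != genotypes[b][pos]}
            if len(split) > len(best_split):
                best_pos, best_split = pos, split
        if best_pos is None:
            raise ValueError("some cultivars have identical genotypes")
        chosen.append(best_pos)
        unresolved -= best_split
    return chosen


panel = distinguishing_snps(TOY_GENOTYPES)
print(panel)  # [0, 2]: two of the four toy positions are enough
```

A greedy pick is not guaranteed to find the smallest possible panel, but it shows why the number of SNPs needed for identification can be tiny compared with the number discovered.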
Horizontal Gene Transfer (HGT) goes against what we typically consider the normal transfer of genetic material from parent to offspring. HGT involves the transfer of genetic material from one organism to another that is not its offspring. Within the bacteria, which can take up foreign DNA through transformation, conjugation, and transduction, there is a fair amount of HGT. Events of HGT have rarely been observed in eukaryotes because numerous barriers exist to prevent foreign nucleotides from entering a cell’s nucleus. Some of these barriers in the fungi include a substantial cell wall made of chitin, multiple cell and nuclear membranes to cross, and a feeding mode in which metabolic enzymes are secreted outside the cell and the resulting nutrients are then taken up. Despite these barriers, there is now evidence of multiple occurrences of HGT in the fungi.
In a recent article published in the journal Current Biology, Jason Slot and Antonis Rokas, both of Vanderbilt University, provided evidence of HGT between two ascomycete clades. In this study, the authors identified a 23-gene cluster that was transferred from the genus Aspergillus to the genus Podospora. The genes in this cluster synthesize the toxic compound sterigmatocystin, a precursor to the aflatoxins, which are noted for their production in Aspergillus. Both genera belong to the subphylum Pezizomycotina, so the two clades are not distantly related, but the transfer was still detected using different methods.
While it’s easy to observe genetic material passed from generation to generation, recognizing HGT is a little more difficult. The main way the researchers identified HGT was by using phylogenetic methods to find gene clusters whose homology cannot be explained by lineage alone.
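The intuition behind that kind of test can be sketched with a much cruder stand-in: a best-hit check on aligned sequences rather than a real phylogenetic analysis. This is not the method used by Slot and Rokas. The species labels, clade names, and sequences below are all invented. A gene is flagged when its most similar copy comes from a species outside the query's own clade, which is the sort of pattern that descent alone does not explain.

```python
def identity(seq_a, seq_b):
    """Fraction of matching positions between two aligned, equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return matches / len(seq_a)


def flag_possible_hgt(query, sequences, clade_of):
    """Return the best-hit species if it lies outside the query's clade, else None."""
    others = [name for name in sequences if name != query]
    best = max(others, key=lambda name: identity(sequences[query], sequences[name]))
    return best if clade_of[best] != clade_of[query] else None


# Hypothetical clade assignments and toy aligned sequences for two genes.
CLADE_OF = {"recipient": "clade_1", "relative": "clade_1", "donor": "clade_2"}
TOY_CLUSTER_GENE = {
    "recipient": "ATGGCTAAGT",
    "relative": "ATGTCAAACT",
    "donor": "ATGGCTAAGA",
}
TOY_HOUSEKEEPING_GENE = {
    "recipient": "ATGAAACCCG",
    "relative": "ATGAAACCCA",
    "donor": "ATGTTTCCGA",
}

print(flag_possible_hgt("recipient", TOY_CLUSTER_GENE, CLADE_OF))       # donor
print(flag_possible_hgt("recipient", TOY_HOUSEKEEPING_GENE, CLADE_OF))  # None
```

Best-hit checks like this are easily fooled by gene loss or uneven rates of evolution, which is one reason tree-based methods are the main evidence in practice.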
Thomas Richards points out in his commentary on the Slot & Rokas paper (also in Current Biology) that, because fungi do not phagotrophically consume their food, they are less likely to incur HGT events. There are two notable hypotheses as to why we nevertheless see HGT in the fungi. First, many secondary pathway genes in eukaryotes are encoded in gene clusters, and the fungi have a fair number of these clusters. Gene clusters, which carry a complete function that natural selection can act on, are therefore more likely than individual genes to persist after transfer. Data from HGT studies in fungi support this hypothesis. Second, fungi are, by their biology and natural history, intimately tied to other organisms, and fulfill roles as saprobes, pathogens, or symbionts. This close intimacy increases the opportunity for genes to transfer from one organism to another. The data suggest that this hypothesis is also true, as many of the recorded instances of HGT in fungi have been observed in organisms with overlapping environments.
Registration for the “Genomic Impact of Eukaryotic Transposable Elements” meeting is now open. The meeting will be held February 24th-28th 2012, at the Asilomar Conference Center, in Pacific Grove, California, USA. The conference will consist of invited speakers, general sessions from submitted abstracts, and workshop sessions devoted to computational analysis of transposable elements.
The organizers of the “Second International Conference on the Progress of the 1000 Plant & Animal Reference Genomes Project” have again announced a call for abstracts for the meeting, which will be held from the 10th to 12th of July in Shenzhen, China. I’ve noticed a large increase in the number of meetings in China (see here) and this meeting is also sponsored by the Beijing Genomics Institute (BGI).
As you can gather from the name, the “1000 Plant and Animal Reference Genomes Project” seeks to provide a total of 1000 plant and animal genomes for the use of researchers (for more information on the project, see here).
This meeting seeks to increase the number of collaborators on this project, particularly from a global perspective. To register for this meeting see here, and stay connected to this meeting and the BGI by following them on Twitter (@BGI-Events). You can even enter yourself in a drawing to win a gift (a soft-drink soda!) when you provide proof you have re-tweeted meeting notices from the BGI. The meeting will have two sessions: one on the progress and prospects of the 1000 plant and animal reference genomes project and another on new developments in sequencing and bioinformatics technology. There will be five workshops: crop genomics and breeding, aquaculture genomics, vegetable and flower genomics, forest and fruit tree genomics, and rare animal genomics (I’m not really sure what “rare” means in this sense).
Ecological genomics is thriving as a discipline, evidenced by the number of research papers published in this area, and this is due to the large amounts of genomic data now available to researchers. Information from individual genomes, “pan-genomes”, and large scale environmental genome sequencing is giving us a more complete picture of biological diversity.
Some of the top researchers in this newly emerging discipline will be speaking at the Jacques Monod Conference “Integrative Ecological Genomics.” The meeting will be held in Roscoff, Brittany, France. Registration is by application (the submission deadline is June 20th 2011) and the number of attendees is capped at 115 people. Information regarding the meeting and registration can be found here and here.