Tag Archives: Next-Generation DNA sequencing

Argonne Soil Metagenomics Meeting, October 2012

The Argonne Soil Metagenomics Meeting is in its 4th year, and this year’s meeting will be held October 3rd to 5th at the Indian Lakes Resort Conference Center, right outside of Chicago.  As in past years, the meeting will focus on all aspects of soil metagenomics.  There are many great speakers lined up, and quite a few of them are addressing fungi in soils.  Meeting registration is open.

Seasonal Trends In Bryophyte-Associated Fungal Communities

I recently returned from the Mycological Society of America annual meeting – this year held at Yale University in New Haven.  There were lots of great talks about fungal genomics, systematics, and ecology – and it’s always good to see old mycological friends and make new ones.

Håvard Kauserud of the University of Oslo, who spoke about recent research from his laboratory, gave one of my favorite talks of the meeting.  His talk took place during a very rewarding afternoon session on fungal ecology.  The Kauserud lab was already highly prolific, and the flood of papers coming out of it has only grown over the last year.  Just this month, a nice commentary on the phenomenon of metagenomic tag switching during amplicon sequencing was published in the journal Fungal Ecology.

Another paper published this month, in the journal New Phytologist, is the study “Seasonal trends in the biomass and structure of bryophyte-associated fungal communities explored by 454 pyrosequencing”, authored by Davey et al., a group of researchers who are members or affiliates of the Kauserud laboratory.  It is this paper I will address here.

Bryophytes represent a portion of the dominant vegetation in boreal forests, but very little is understood about the taxonomy, seasonality, or biomass of the fungi associated with them.  Additionally, microbes associated with mosses may be responsible for nitrogen fixation and nutrient immobilization as epiphytes or on forest soils.  A previous study from the Kauserud lab reported high levels of fungal biomass and active plant cell wall degrading enzymes identified from moss-associated fungi.

As I have mentioned here numerous times, fungi are notoriously hard to identify by cultural and morphological means and are extremely diverse.  To understand this diversity, the authors performed 454 pyrosequencing of the ITS2 region of the ribosomal DNA operon for molecular taxonomic identification against a database of known fungal sequences.  This sequencing was done in concert with an ergosterol HPLC assay that is used to estimate living fungal biomass.

The authors identified a large number of fungi, some presumably moss associated, and the total number of fungi recognized was comparable to that found in forest soils.  The majority of fungi were identified as Ascomycetes, which agrees with other studies investigating vascular plant phyllosphere communities using the primer pair ITS3 and ITS4.  Additionally, this study identified a taxonomic profile consistent with that of a previous study from the Kauserud laboratory that used a cloning and Sanger sequencing approach.  Not surprisingly, this study reports orders of magnitude more fungi but identified roughly the same groups of fungi (Helotiales, Chaetothyriales, Agaricales, and Tremellales).

The researchers addressed seasonal variation by sampling every eight weeks between April and January over the course of a year.  Quite interestingly, this study agrees strongly with other research showing that fungi not only survive under snowpack, but also continue to grow during the winter months.  While the researchers found consistent trends with regard to season, fungal biomass fluctuated depending on the host bryophyte.  Using principal component analyses, the authors show that the fungal communities are structured mainly by host plant and secondarily by the type of bryophyte tissue that was sampled.  This paper is an important contribution to the growing literature showing that plant-associated fungi are extremely diverse and dynamic, with complex relationships to their host plants.
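
For readers who have not met this kind of ordination before, here is a minimal sketch of a principal component analysis on a tiny OTU table.  Everything in it is my own assumption for illustration: the counts are made up, the sample names are hypothetical, the Hellinger transform is simply a common pre-treatment for count data, and power iteration stands in for whatever statistical software the authors actually used.

```python
import math

# Made-up OTU counts: rows are samples, columns are OTUs.
# Two samples each from two hypothetical host mosses.
samples = ["host_A_1", "host_A_2", "host_B_1", "host_B_2"]
counts = [
    [120, 30, 5, 0],
    [110, 40, 8, 1],
    [10, 15, 90, 60],
    [5, 20, 100, 70],
]

def hellinger(table):
    """Square root of relative abundances (an assumed pre-treatment)."""
    return [[math.sqrt(v / sum(row)) for v in row] for row in table]

def first_principal_component(table, iterations=200):
    """Return (loadings, sample scores) for PC1 using power iteration."""
    n, p = len(table), len(table[0])
    means = [sum(row[j] for row in table) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in table]
    # Sample covariance matrix between OTUs (p x p).
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
           for i in range(p)]
    vec = [1.0] + [0.0] * (p - 1)  # arbitrary starting direction
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(v * v for v in nxt))
        vec = [v / norm for v in nxt]
    # Project each centered sample onto the first principal axis.
    scores = [sum(r[j] * vec[j] for j in range(p)) for r in centered]
    return vec, scores

loadings, scores = first_principal_component(hellinger(counts))
for name, score in zip(samples, scores):
    print(f"{name}: PC1 = {score:+.3f}")
```

In this toy table the two samples from each host land on the same side of the first axis, which is the kind of pattern one would read as "structured by host plant".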

A Genome Sequence for Tomato

The average person in the United States eats more than 10 kilograms of tomatoes a year – underscoring the fact that the fruit is one of the most important plant crops in cultivation.  To improve taste, texture, and disease resistance – just to name a few traits – a large consortium of researchers has sequenced and released a draft tomato genome.  In fact, the research consortium has published the genome sequence from two varieties of tomatoes: the domesticated inbred Solanum lycopersicum strain Heinz 1706 – the variety famous for ketchup – and the wild breeding Peruvian ancestor, Solanum pimpinellifolium.

The consortium published the draft genome sequences with a paper entitled “The tomato genome sequence provides insights into fleshy fruit evolution” in the journal Nature.  The consortium officially started sequencing the genome in 2003, but heterozygosity and duplication events made assembling the genome difficult.  The tomato genome is approximately 900 Mb – smaller than the human genome – but certainly not small by eukaryotic standards.  Genetically and phenotypically diverse, the genus Solanum is one of the largest in the angiosperms.

The genomes of Solanum lycopersicum and S. pimpinellifolium show only 0.6% divergence, and there is evidence of recent hybridization between the two species.  Both species show approximately 8% genome divergence from their close relative the potato, Solanum tuberosum.  Across the genus Solanum there have been two genome triplications with subsequent gene loss: one triplication is ancient and shared with the rosid clade, and the other is shared within the Solanaceae, whose genomes appear to be highly syntenic across the family.  The genomes were completed with both Sanger- and Illumina-derived sequences and assembled with the help of physical and genetic maps developed from a long history of tomato breeding efforts.
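
To give a feel for what a figure like "0.6% divergence" means, here is a minimal sketch that counts mismatches between two aligned sequences.  The toy sequences are made up, the function name is mine, and a simple per-site mismatch fraction is only an assumption about how such a number can be computed, not necessarily the consortium's method.

```python
def percent_divergence(seq_a, seq_b):
    """Percent of comparable aligned positions at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        # Skip alignment gaps and ambiguous bases.
        if a in "-N" or b in "-N":
            continue
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * mismatches / compared

# Made-up toy alignment: 20 comparable sites, 2 of which differ.
toy_a = "ACGTACGTAC-GTACGTACGT"
toy_b = "ACGTACCTACAGTACGTACGA"
print(f"{percent_divergence(toy_a, toy_b):.1f}% divergence")
```

At 0.6%, two genomes would differ at roughly 6 of every 1,000 comparable sites, which is far closer than this toy pair.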

There are 34,727 and 35,004 genes identified across the genomes of Solanum lycopersicum and S. pimpinellifolium respectively.  These numbers are similar to those of other plant genomes, and 8,615 of these genes are common to tomato, potato, rice, grape, and Arabidopsis.  Expression was assessed by replicated RNA-Seq of root, leaf, flower, and fruit tissues.  A total of 18,320 orthologous gene pairs were found in tomato and potato, indicating diversifying selection between the two species of Solanum.

The consortium specifically compared tomato to grape in this study, as grape and tomato shared a common ancestor approximately 100 million years ago, before the first whole genome triplication event that preceded the rosid-asterid divergence.  Additionally, both grape and tomato have similar molecular fruit maturation mechanisms.  When comparing the genomes of tomato and grape, approximately 73% of gene models are orthologous.  By estimating genome triplication events, the researchers conclude that the genome triplication event within the Solanaceae occurred roughly 71 million years ago and approximately 7 million years prior to the tomato-potato divergence.

Having a draft genome sequence is an important step toward understanding the molecular biology of the tomato plant.  Genome duplication events gave rise to the diversification of genes responsible for enhanced fruit physiological and chemical development – such as lycopene synthesis – including photoreceptors and transcription factors that influence fruit ripening.  Additionally, tomato has had a contraction in the number of gene families associated with the synthesis of toxic alkaloids – the chemical hallmarks of many members of the Solanaceae.  One interesting question not answered by this research is the genomic mechanism by which the tomato regulates nutrient investment in above-ground fruits while the potato regulates starch investment in below-ground tubers.

These two tomato genomes, along with the genomes of fellow nightshades completed or in the works (potato, pepper, tobacco, petunia, eggplant, etc.), will help breeders to develop traits desired by producers, like long shelf life, and fruit quality traits desired by tomato consumers, such as taste, color, and texture.  In addition to these benefits, the draft tomato genomes will provide insights into the biology and nutrition of the Solanaceous plants, and provide more information for comparative genomics within this economically important group of plants.

Microbial Biogeography Of Public Restroom Surfaces

A very interesting paper by Flores et al., entitled “Microbial Biogeography of Public Restroom Surfaces”, recently appeared in the journal PLOS ONE.  This study, conducted by the Noah Fierer and Rob Knight labs at University of Colorado – Boulder, addressed the diversity of bacteria found at various places in public restrooms.  The novel aspect of this research is the use of culture-independent next-generation sequencing to determine which bacterial species are found at distinct locations in public restrooms.

The restroom has been one of the greatest inventions in human history – especially from a public health perspective.  Without toilets and sinks – not to mention the plumbing infrastructure to get waste away from living spaces – disease-causing bacteria associated with human waste (and let’s not forget other infectious organisms of the human gut, such as intestinal worms) would easily spread from human to human, especially in close living quarters.  A fascinating brief overview of the microbial history of toilets (including some great anecdotes featuring toilet visionary Sir Thomas Crapper) and a commentary on this scientific paper, written by Rob Dunn, can be found on the Scientific American Blogs site.

Using barcoded pyrosequencing of the 16S rRNA gene marker, Flores et al. surveyed the bacteria on ten different surface types (door handles and stall handles – both in and out, faucet handles, soap dispenser, toilet seat, toilet flush handle, floor around toilet, and floor around sink) in twelve different (six male and six female) restrooms on the CU-Boulder campus on a single day.

The researchers identified 19 different bacterial phyla across all of the surfaces sampled.  The majority of sequences (approximately 92%) could be placed within four phyla: the Actinobacteria, Bacteroidetes, Firmicutes, and Proteobacteria.  Human-associated bacteria were strongly associated with restroom surfaces, which is not surprising for indoor environments.

Bacterial communities could be categorized by the surfaces they inhabited.  On toilets, gut-associated bacteria were the dominant group.  Skin-associated bacteria were – not surprisingly – found on surfaces touched by hands, such as door handles.  The restroom floor held the greatest diversity of bacteria – some of which were found in low abundance – as these surfaces contained soil-associated, as well as human-associated, bacteria.  Quite interestingly, the researchers found that some of the toilet flush handles contained soil-associated bacteria, implying that some restroom users flush toilets with their feet to avoid directly touching the handles.

There were no statistically significant differences between bacterial communities found in female and male restrooms, although the relative abundances of some bacterial groups were gender associated.  Members of the bacterial family Lactobacillaceae, which are associated with the vagina, were – not surprisingly – more abundant in and around female restroom toilets than their male counterparts.

The authors used the newly developed software package SourceTracker to determine the similarity of bathroom surface communities to communities from expected and previously published sources, such as human skin, the human gut, urine, soil, and faucet water.  The analysis predicted that human skin was the primary source of restroom surface bacteria.  The human gut was a source of bacteria found on and around toilets.  Despite the presence of many typical soil bacterial groups on restroom floors, soil was not identified as a statistically significant source, probably because soil typically contains a highly diverse taxonomic array of species, many of which are rare.  The authors state that custodial mops and ventilation systems may also have some influence on the floor surfaces, but these were not directly addressed in this study.

The authors show here that human-associated bacteria are the most common microbes found on public restroom surfaces.  Human-influenced source patterns can be determined from the bacterial community structure within the biogeography of restrooms.  This study underscores the importance of hand washing, particularly when using public restrooms, and the techniques used in this paper could be used to track or identify likely pathogenic bacteria on surfaces during infectious outbreaks.

Potato Genome Sequence and Analysis

With next-generation sequencing technologies dropping in price and increasing in throughput, it’s not surprising to find multiple genomes published every week in scientific journals.  Most of these articles don’t qualify for publication in the top tier of journals like they did at the onset of the next-generation sequencing boom, but some genome sequencing projects, such as the potato genome, are high profile enough to warrant publication in top tier journals.

In the July 14th issue of the journal Nature, a draft of the potato (Solanum tuberosum) genome was described in a paper authored by the Potato Genome Sequencing Consortium – a huge group of researchers from 26 institutions.

The potato is the world’s fourth most consumed food crop, the most commonly grown vegetable crop, and a member of the economically important Solanaceae family – otherwise known as the nightshades – which includes tomato, peppers, aubergine (eggplant if you live in the United States), tobacco, and petunia.  Widely distributed in western South America, tuber-forming Solanum species are highly morphologically diverse and easily cross with other varieties for breeding purposes.

It’s been a bumpy road sequencing the potato genome since the project was started in 2006.  The potato genome is an extremely heterozygous autotetraploid, which translates to four highly variable copies of each of the 12 chromosomes.  It’s also the first sequenced Eudicot genome in the Asterid clade, so there are no close genetic relatives to provide the basis for a guided genome assembly.

The consortium began the sequencing by creating a bacterial artificial chromosome (BAC) library of 78,000 clones from a well-studied diploid line providing high-quality potatoes, named RH89-039-16.  The group used the BAC library and 10,000 AFLP markers to create more than 7,000 contigs, which were assembled into a physical map.  The group then identified up to 150 BACs for every chromosome of the potato genome, and verified their locations using fluorescent in situ hybridization.

Heterozygosity was so high in the RH line that, after thorough sequencing, the group hit an impasse with the assembly of the genome.  In an attempt to complement the sequencing of the RH line, the consortium began sequencing a doubled monoploid potato clone, DM1-3 516R44, derived from a diploid wild South American accession.  The DM line has a simpler genome than the RH line and is highly homozygous.

Using both the Illumina Genome Analyzer II and Roche 454 pyrosequencing platforms, and supplementing this data with traditional Sanger sequencing, approximately 96 Gb of data was acquired for the DM line.  The group then used the SOAPdenovo computer program to assemble the reads with a final assembly of 727 Mb for the DM line and a final estimation of 844 Mb for the genome.
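
As a quick back-of-the-envelope check on those numbers, here is a small sketch of the arithmetic.  The function names are mine, I am assuming 1 Gb = 1,000 Mb, and the fold figure is simply raw data divided by estimated genome size, not a coverage value reported by the consortium.

```python
def fraction_assembled(assembled_mb, estimated_genome_mb):
    """Share of the estimated genome captured by the assembly."""
    return assembled_mb / estimated_genome_mb

def raw_fold_coverage(sequenced_gb, estimated_genome_mb):
    """Raw sequence data divided by genome size (assumes 1 Gb = 1,000 Mb)."""
    return sequenced_gb * 1000.0 / estimated_genome_mb

# Figures quoted in the text for the DM line.
assembled = fraction_assembled(727, 844)
coverage = raw_fold_coverage(96, 844)

print(f"assembled: {assembled:.1%} of the estimated genome")  # roughly 86%
print(f"raw data: about {coverage:.0f}-fold the genome size")  # roughly 114-fold
```

So the assembly captures a bit more than six-sevenths of the estimated genome, with the remainder presumably sitting in hard-to-assemble repeats.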

The consortium generated more than 31 Gb of transcriptome data from both the DM and RH line libraries.  These 48 libraries represented major tissue types and developmental stages, and included various responses to abiotic and biotic stresses.  All the reads from the RNA-Seq libraries were mapped to the assembled DM genome.  Using gene prediction methods, along with protein and EST data, the potato genome was predicted to contain 39,000 protein-coding genes, a number in agreement with other plant genomes.  Within these genes, there were an estimated 2,642 asterid-specific and 3,372 potato-lineage-specific genes.  Some of the predicted asterid-specific genes include many novel transcription factors, self-incompatibility factors, and defense-related proteins.  The draft assembly of the genome consists of more than 60% repeated elements.  The largest class of transposable elements is the long terminal repeat retrotransposons (LTRs), which are estimated at 30% of the potato genome.

The potato is notorious for being susceptible to many pathogens and pests.  This well-known susceptibility was one of the main motivations for sequencing the genome and identifying genes responsible for disease resistance and pathogen defense.  The DM genome assembly contains more than 800 putative R genes, responsible for conferring disease resistance, including 408 NBS-LRR-encoding genes, 57 Toll/interleukin-1 receptor (TIR) domains, and 351 non-TIR type resistance genes.  An extreme number of pseudogenes – attributed to indels, frameshift mutations, and misplaced stop codons – were identified within known R gene motifs, which possibly explains the potato’s inability to fight off some specific diseases.

One such well-known disease, Late Blight, caused by Phytophthora infestans, was responsible for the Irish Potato Famine in the 1840s.  Using information from this genome sequencing project and other studies, we now know the variety brought to Europe in the late 16th century happens to lack specific disease resistance genes for Phytophthora infestans.  One could speculate that unbridled transposon jumping caused the inactivation of many R genes in this potato variety.

Unique for the potato is the formation of tubers (the actual potatoes) through the modification of a stolon.  The tomato is very closely related to potato, but does not produce stolons or modified tubers.  The group used transcript data from both potato and tomato to address genetic regulation of the formation of stolons and the transition of stolons to tubers.  Quite interestingly, the formation of stolons and tubers coincides with an up-regulation of genes associated with starch biosynthesis, protein storage, and Kunitz protease inhibitor genes associated with pests and pathogens.

Possibly due to extremely high levels of heterozygosity, it has been difficult to improve the potato through traditional breeding efforts.  It’s estimated that diseases cause a worldwide economic loss of 4.5 billion US dollars to potato crops each year.  Just to attempt to suppress these diseases, copious amounts of pesticides and fungicides are applied to potato crop land each year.  The potato cyst nematode, for example, is an important pest to which researchers hope to improve resistance via breeding initiatives.  Having this draft potato genome sequence will aid in the characterization of existing germplasm collections and the description of allelic variance in breeding efforts to avoid diseases.  The potato genome will also serve as a resource for breeders wanting to improve the quality of other economically important Solanaceous plants such as tomato, pepper, eggplant, and tobacco.

Genome Sequence of the Date Palm

Published in the June 2011 issue of the journal Nature Biotechnology was a paper reporting on the genome sequence of the date palm, Phoenix dactylifera.  This paper, authored by Al-Dous et al., addressed the genome sequencing and de novo assembly of this agriculturally important monocot tree, along with comparative genomics with other plants.

Dates have been found in the tombs of pharaohs estimated at 8,000 years old.  Fields of agriculturally planted trees, estimated to be older than 5,000 years, suggest the date palm is one of the oldest cultivated plants in the world.  Dates are the most important agricultural crop in the hot and arid regions surrounding the Arabian Gulf and their global production is close to 7 million tons yearly.

Despite this long history of cultivation, there are a few problems to deal with if you are a date grower.  Typical of tree crops, there is a long generation time from seedling to fruit harvesting.  Additionally, only the female date palm provides fruit, and it takes at least 5 years after seed germination to tell if you have a male or female plant.  To make it even harder for a date grower, there are more than 2,000 date varieties, each exhibiting its own color, flavor, size, shape, and ripening schedule, and they are all really hard to keep track of using conventional techniques.

In an effort to provide genetic resources for date growers and breeders, the authors of this study – who were mainly located in Qatar – sequenced and assembled 380 Mb of the estimated 658 Mb genome of the Khalas cultivar, which is known for high fruit quality.  Generated using short reads from the Illumina Genome Analyzer IIx platform, this partial sequence excludes numerous large repeated regions, includes a predicted 28,890 genes, and represents 18 pairs of chromosomes.  The authors estimate that this draft genome represents roughly 90% of the total genes and 60% of the total genome.

This genome resource also serves a comparative genomics purpose as the first sequenced member of the widespread monocot order Arecales.  To this date, the only monocots with sequenced genomes – for example, corn, rice, and sorghum – have all been in the grass order, the Poales.

This report is missing some vital information: in addition to an incomplete genome assembly, there is no metabolic, developmental, or gene network pathway reconstruction for the date palm provided in this paper (and unfortunately this paper also includes some glaring typos in the citation section).  In place of these expected analyses, the authors conducted a thorough survey of SNPs in this Khalas cultivar, along with eight additional cultivars common in date palm breeding programs.  Within these nine cultivars, 3,518,029 SNPs were identified but, quite interestingly, a total of 32 SNPs could be used to differentiate the cultivars.
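
To illustrate how a handful of SNPs can tell many cultivars apart, here is a minimal sketch that greedily picks markers until every pair of cultivars differs at one or more chosen sites.  The genotype calls and cultivar labels are made up, and the greedy strategy is my own assumption, not the approach Al-Dous et al. describe.

```python
from itertools import combinations

# Hypothetical genotype calls: cultivar -> genotype at each SNP (made-up data).
genotypes = {
    "cultivar_1": ["AA", "CC", "GT", "TT"],
    "cultivar_2": ["AA", "CT", "GT", "TT"],
    "cultivar_3": ["AG", "CC", "GG", "TT"],
    "cultivar_4": ["AG", "CT", "GG", "TT"],
}

def pick_discriminating_snps(genotypes):
    """Greedily choose SNP indices until all cultivar pairs are told apart."""
    names = sorted(genotypes)
    n_snps = len(genotypes[names[0]])
    unresolved = set(combinations(names, 2))
    chosen = []

    def resolved_by(snp):
        # Pairs still confused that this SNP would separate.
        return {(a, b) for a, b in unresolved
                if genotypes[a][snp] != genotypes[b][snp]}

    while unresolved:
        best = max(range(n_snps), key=lambda snp: len(resolved_by(snp)))
        newly_resolved = resolved_by(best)
        if not newly_resolved:
            raise ValueError("remaining cultivars cannot be told apart")
        chosen.append(best)
        unresolved -= newly_resolved
    return chosen

panel = pick_discriminating_snps(genotypes)
print("SNP indices in the panel:", panel)
```

In this toy case two of the four SNPs are enough, because each one splits the cultivars along a different line; the fourth SNP, identical in every cultivar, carries no information at all.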

In addition to the thorough SNP analysis, the researchers did a full parentage analysis of the cultivars used in this study, which include famous date varieties such as Deglet Noor, Dayri, and Medjool.  Here’s an article in Nature Middle East on the importance of understanding this parentage and gender analysis.

Although this is a draft genome still being completed and undergoing resequencing, the tools provided by the authors, namely the SNP and parentage analyses, should provide date palm breeders with many resources for improving fruit quality, and this genome represents an exciting piece of the monocot evolutionary puzzle.

Activity of Abundant and Rare Bacteria in a Coastal Ocean

It’s been a busy summer, but I’m back to focusing on some recent research.  In fact, there’s been a flurry of recent papers which I plan to highlight here.  I’m exploring fungal and bacterial abundance in forest soils using pyrosequencing techniques in my own research, so I was interested to read this paper on bacterial activity in the ocean off the Delaware coast.

In a study from the July 18th early online edition of the journal PNAS, researchers from the University of Delaware and University of Southern California sequenced the bacteria in seawater off the Delaware coast every month over the course of three years.  The research, authored by Barbara Campbell and her colleagues, measured both 16S rDNA and rRNA using next-generation pyrosequencing techniques.  By measuring both DNA (a marker for species presence and overall abundance) and RNA (a marker for relative activity or, more accurately, ribosome activity) in this constantly shifting ecosystem, the authors hoped to explore and understand the abundance of both rare and frequently found bacteria in a coastal ocean environment.  I already told you about an article featuring the Rappemonad bacteria, some of which were studied in this paper.

It has been hypothesized that, in ocean ecosystems, abundant bacteria are found frequently because they have high growth rates and are better at competing against slower growing bacteria.  Conversely, rare bacteria have long been considered to have slower growth rates, to simply be poor competitors against the more abundant bacteria, or to have more streamlined genomes that are better suited to waiting in dormancy until the right factor, most likely a specific nutrient, comes into play.

More than 600 OTUs (Operational Taxonomic Units – groups of similar sequences used as a stand-in for the distinct organisms observed in the environment) were observed, and these organisms formed the typical rank abundance curve that we have come to expect from environmental sampling, so there were no surprises in that finding.
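
If the term is unfamiliar, a rank abundance curve just sorts OTUs from most to least common and plots their share of the reads.  Here is a minimal sketch with made-up read assignments; the OTU labels and counts are hypothetical and have nothing to do with the Delaware data.

```python
from collections import Counter

def rank_abundance(otu_assignments):
    """Return (rank, otu, relative abundance) tuples, most abundant first."""
    counts = Counter(otu_assignments)
    total = sum(counts.values())
    # Sort by descending count, breaking ties by OTU name for stable output.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(rank, otu, n / total)
            for rank, (otu, n) in enumerate(ranked, start=1)]

# Made-up reads, each already assigned to an OTU.
reads = (["OTU_1"] * 50 + ["OTU_2"] * 25 + ["OTU_3"] * 15 +
         ["OTU_4"] * 5 + ["OTU_5"] * 3 + ["OTU_6"] * 2)

curve = rank_abundance(reads)
for rank, otu, share in curve:
    print(f"{rank:>2}  {otu}  {share:.2f}")
```

The typical shape is a few dominant OTUs followed by a long tail of rare ones, which is what this toy example mimics.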

What was more surprising, or should I say interesting, was what the authors found by comparing both DNA and RNA from their samples.  After quality control of their 454 pyrosequencing reads, the authors included more than 500,000 sequences in their analysis.  More than half of the bacterial OTUs cycled between abundant and rare during the three years of sampling.  Interestingly, almost half of the bacteria were always considered rare, close to 12 percent remained rare and inactive, and less than 5 percent were considered to be always abundant throughout the sampling.  The researchers used quantitative RT-PCR to measure specific DNA and RNA concentrations for five separate OTUs and so verify the findings from the pyrosequencing portion of the study.
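
The DNA versus RNA comparison can be pictured as a simple ratio: an OTU's share of the rRNA reads divided by its share of the rDNA reads.  Here is a minimal sketch of that idea; the read counts are made up, and the 1% cutoff for "rare" is my own assumption for illustration, not a threshold taken from the paper.

```python
def relative_abundance(counts):
    """Convert raw read counts per OTU into fractions of the total."""
    total = sum(counts.values())
    return {otu: n / total for otu, n in counts.items()}

def activity_ratios(rdna_counts, rrna_counts):
    """rRNA share divided by rDNA share for every OTU detected in the DNA."""
    dna = relative_abundance(rdna_counts)
    rna = relative_abundance(rrna_counts)
    return {otu: rna.get(otu, 0.0) / dna[otu] for otu in dna if dna[otu] > 0}

# Made-up read counts for three hypothetical OTUs.
rdna = {"OTU_a": 900, "OTU_b": 95, "OTU_c": 5}
rrna = {"OTU_a": 500, "OTU_b": 480, "OTU_c": 20}
RARE_CUTOFF = 0.01  # assumed threshold for calling an OTU rare

dna_rel = relative_abundance(rdna)
ratios = activity_ratios(rdna, rrna)
for otu in sorted(ratios):
    status = "rare" if dna_rel[otu] < RARE_CUTOFF else "abundant"
    print(f"{otu}: {status}, rRNA:rDNA ratio = {ratios[otu]:.2f}")
```

A ratio well above 1 suggests an OTU is punching above its weight in ribosome content, so in this toy example the rare OTU_c looks far more active than the dominant OTU_a.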

Also quite interesting was that the authors did not observe a pronounced seasonally affected microbial component or an environmental factor that could explain the abundance or scarcity in this ocean environment.  It appears by all accounts that the microbial community observed in this study is constantly changing and may not be regulated by many other factors except the community itself.  See here for a press release from the University of Delaware on this study.