Tag Archives: Plant Biology

Potato Genome Sequence and Analysis

With next-generation sequencing technologies dropping in price and increasing in throughput, it’s not surprising to find multiple genomes published every week in scientific journals.  Most of these articles don’t qualify for publication in the top tier of journals like they did at the onset of the next-generation sequencing boom, but some genome sequencing projects, such as the potato genome, are high profile enough to warrant publication in top tier journals.

In the July 14th issue of the journal Nature, a draft of the potato (Solanum tuberosum) genome was described in a paper authored by the Potato Genome Sequencing Consortium – a huge group of researchers from 26 institutions.

The potato is the world’s fourth most consumed food crop, the most commonly grown vegetable crop, and a member of the economically important Solanaceae family – otherwise known as the nightshades – which includes tomato, peppers, aubergine (eggplant if you live in the United States), tobacco, and petunia.  Widely distributed in western South America, tuber-forming Solanum species are highly morphologically diverse and easily cross with other varieties for breeding purposes.

It’s been a bumpy road sequencing the potato genome since the project was started in 2006.  The potato genome is an extremely heterozygous autotetraploid, which translates to four highly variable copies of each of the 12 chromosomes.  It’s also the first sequenced Eudicot genome in the Asterid clade, so there are no close genetic relatives to provide the basis for a guided genome assembly.

The consortium began the sequencing by creating a bacterial artificial chromosome (BAC) library of 78,000 clones from a well-studied diploid line that produces high-quality potatoes, named RH89-039-16.  The group used the BAC library and 10,000 AFLP markers to create more than 7,000 contigs, which were assembled into a physical map.  The group then identified up to 150 BACs for every chromosome in the potato genome, and verified their locations using fluorescent in situ hybridization.

Heterozygosity was so high in the RH line that, after thorough sequencing, the group hit an impasse with the assembly of the genome.  In an attempt to complement the sequencing of the RH line, the consortium began sequencing a doubled monoploid potato clone, DM1-3 516R44, derived from a diploid wild South American accession.  The DM line has a simpler genome than the RH line and is highly homozygous.

Using both the Illumina Genome Analyzer II and Roche 454 pyrosequencing platforms, and supplementing this data with traditional Sanger sequencing, approximately 96 Gb of data was acquired for the DM line.  The group then used the SOAPdenovo computer program to assemble the reads with a final assembly of 727 Mb for the DM line and a final estimation of 844 Mb for the genome.
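To put those figures in perspective, here is a quick back-of-the-envelope calculation using only the numbers quoted above.  It is a rough sketch, not the consortium's own method: the function names are made up for illustration, 1 Gb is taken to be 1,000 Mb, and the "depth" is simply raw data divided by estimated genome size, ignoring read filtering.

```python
def assembled_fraction(assembled_mb: float, estimated_mb: float) -> float:
    """Share of the estimated genome size captured by the assembly."""
    return assembled_mb / estimated_mb


def raw_depth(raw_gb: float, estimated_mb: float) -> float:
    """Approximate fold coverage: raw sequence data over estimated genome size.

    Assumes 1 Gb = 1,000 Mb and that every raw base counts toward coverage.
    """
    return raw_gb * 1000 / estimated_mb


# Figures for the DM line as quoted in the post.
potato_fraction = assembled_fraction(727, 844)
potato_depth = raw_depth(96, 844)

print(f"Assembled fraction: {potato_fraction:.1%}")  # 86.1%
print(f"Raw depth: {potato_depth:.0f}x")             # 114x
```

In other words, the 727 Mb assembly covers roughly 86% of the estimated 844 Mb genome, built from a bit more than a hundred-fold raw coverage.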

The consortium generated more than 31 Gb of transcriptome data from both the DM and RH line libraries.  These 48 libraries represented major tissue types and developmental stages, and included responses to various abiotic and biotic stresses.  All the reads from the RNA-Seq libraries were mapped to the assembled DM genome.  Using gene prediction methods, along with protein and EST data, the potato genome was predicted to contain 39,000 protein-coding genes, a number in line with other plant genomes.  Within these genes, there were an estimated 2,642 asterid-specific and 3,372 potato-lineage-specific genes.  The predicted asterid-specific genes include many novel transcription factors, self-incompatibility factors, and defence-related proteins.  The draft assembly of the genome consists of more than 60% repeated elements.  The largest class of transposable elements is the long terminal repeat (LTR) retrotransposons, which are estimated to make up 30% of the potato genome.

The potato is notorious for being susceptible to many pathogens and pests.  This well-known susceptibility was one of the motivations for sequencing the genome and identifying the genes responsible for disease resistance and pathogen defense.  The DM genome assembly contains more than 800 putative R genes, responsible for conferring disease resistance, including 408 NBS-LRR-encoding genes (57 with Toll/interleukin-1 receptor (TIR) domains and 351 of the non-TIR type).  A very large number of pseudogenes – attributed to indels, frameshift mutations, and premature stop codons – were identified within known R gene motifs, which possibly explains the potato’s inability to fight off some specific diseases.

One such well-known disease, Late Blight, caused by Phytophthora infestans, was responsible for the Irish Potato Famine in the 1840s.  Using information from this genome sequencing project and other studies, we now know the variety brought to Europe in the late 16th century happens to lack specific disease resistance genes for Phytophthora infestans.  One could speculate that unbridled transposon jumping caused the inactivation of many R genes in this potato variety.

Unique to the potato is the formation of tubers (the actual potatoes) through the modification of a stolon.  The tomato is very closely related to potato, but does not produce stolons or tubers.  The group used transcript data from both potato and tomato to address the genetic regulation of stolon formation and the transition of stolons to tubers.  Interestingly, the formation of stolons and tubers coincides with an up-regulation of genes associated with starch biosynthesis and protein storage, as well as Kunitz protease inhibitor genes associated with defense against pests and pathogens.

Possibly due to extremely high levels of heterozygosity, it has been difficult to improve the potato through traditional breeding efforts.  It’s estimated that diseases cause a worldwide economic loss of 4.5 billion US dollars to potato crops each year.  Just to attempt to suppress these diseases, copious amounts of pesticides and fungicides are applied to potato crop land each year.  The potato cyst nematode, for example, is an important pest that researchers hope to improve resistance to via breeding initiatives.  Having this draft potato genome sequence will aid in the characterization of existing germplasm collections and the description of allelic variation in breeding efforts to avoid diseases.  The potato genome will also serve as a resource for breeders wanting to improve the quality of other economically important Solanaceous plants such as tomato, pepper, eggplant, and tobacco.

Genome Sequence of the Date Palm

Published in the June 2011 issue of the journal Nature Biotechnology was a paper reporting on the genome sequence of the date palm, Phoenix dactylifera.  This paper, authored by Al-Dous et al., addressed the genome sequencing and de novo assembly of this agriculturally important monocot tree, along with comparative genomics with other plants.

Dates have been found in the tombs of pharaohs estimated at 8,000 years old.  Fields of agriculturally planted trees, estimated to be older than 5,000 years, suggest the date palm is one of the oldest cultivated plants in the world.  Dates are the most important agricultural crop in the hot and arid regions surrounding the Arabian Gulf and their global production is close to 7 million tons yearly.

Despite this long agricultural history, there are a few problems to deal with if you are a date grower.  As is typical of tree crops, there is a long generation time from seedling to fruit harvesting.  Additionally, only the female date palm provides fruit, and it takes at least 5 years after seed germination to tell if you have a male or female plant.  To make it even harder for a date grower, there are more than 2,000 date varieties, each exhibiting its own color, flavor, size, shape, and ripening schedule, and they are all really hard to keep track of using conventional techniques.

In an effort to provide genetic resources for date growers and breeders, the authors of this study – who were mainly located in Qatar – sequenced and assembled 380 Mb of the estimated 658 Mb genome of the Khalas cultivar, which is known for high fruit quality.  Generated using short reads from the Illumina Genome Analyzer IIx platform, this partial sequence excludes numerous large repeated regions, includes a predicted 28,890 genes, and represents 18 pairs of chromosomes.  The authors estimate that this draft genome represents roughly 90% of the total genes and 60% of the total genome.

This genome resource also serves a comparative genomics purpose by being the first sequenced member of the widespread monocot order Arecales.  To date, the only monocots with sequenced genomes – for example, corn, rice, and sorghum – have all been in the grass order, the Poales.

This report is missing some vital information: in addition to an incomplete genome assembly, there is no metabolic, developmental, or gene network pathway reconstruction for the date palm provided in this paper (and unfortunately this paper also includes some glaring typos in the citation section).  In place of these expected analyses, the authors conducted a thorough survey of SNPs in this Khalas cultivar, along with eight additional cultivars common in breeding programs for the date palm.  Within these nine cultivars, 3,518,029 SNPs were identified, but interestingly, a set of just 32 SNPs could be used to differentiate the cultivars.

In addition to the thorough SNP analysis, the researchers did a full parentage analysis of the cultivars used in this study, which include famous date varieties such as Deglet Noor, Dayri, and Medjool.  Here’s an article in Nature Middle East on the importance of understanding this parentage and gender analysis.

Although this is a draft genome still being completed and undergoing resequencing, the tools provided by the authors, namely the SNP and parentage analyses, should provide date palm breeders with many resources for improving fruit quality, and this genome represents an exciting piece of the monocot evolutionary puzzle.

PhenoDays 2011 International Symposium

Genomics is directly increasing the amount of information in the hands of agricultural crop breeders, but phenotyping has become the research bottleneck for phenotype-to-genotype associations.

In an effort to alleviate this bottleneck, a group of researchers has organized the PhenoDays 2011 International Symposium, which will be held October 12th to 14th in Wageningen, The Netherlands.  Talks will be given by researchers from both institutional and academic plant breeding groups, as well as representatives from the seed production industry.  In addition, there will be plant phenotyping workshops.  See the symposium website for more information and registration.

Schatz Tree Genetics Colloquium: Genetics, Ecology, and Management of Walnuts & Butternut

The Schatz Tree Genetics Colloquium is a biennial meeting focusing on the genetics of trees.  The purpose of this colloquium series is to advance the knowledge of tree genetics, breeding, and ecology, with each biennial meeting focusing on a specific family, genus, or species of tree(s).

This year’s colloquium meeting is entitled: Genetics, Ecology, and Management of Walnuts and Butternut.  This meeting will be held at the beautiful locale of the Penn State Mont Alto campus, on July 11th and 12th, 2011.  Registration for the meeting is FREE, although there is a $20 cost for the closing banquet.  Advance registration is required as there is no on-site registration for the meeting.  The deadline for registration is June 24th, 2011.

New Phytologist Bioenergy Trees Symposium Wrap-Up

I’m just returning from the New Phytologist “Bioenergy Trees” symposium, which took place from May 17th to 19th at INRA in Nancy, Lorraine, France, and I am pleased to say it was a very productive meeting.  Due to technical difficulties, I was not able to contribute to the online updates via Twitter, but if you’d like to follow the meeting developments you can read the Twitter feeds using the hashtag #26NPS or by following @NewPhyt.

Plant Genome Evolution Meeting, Amsterdam

Thanks to next-generation sequencing, the number of genomes that have been deciphered is rapidly increasing.  Plants have somewhat lagged behind other organisms – due to very large and complex genomes that demand both sequencing and computational effort – but despite these hurdles the number of completed plant genomes is starting to increase rapidly (just look at Phytozome for more evidence of this).

In order to deal with the increasingly large amount of genomic data, the Plant Genome Evolution Meeting, held this year in Amsterdam, The Netherlands, seeks to gather researchers studying plant evolution and comparative genomics.  This symposium is sponsored by the Current Opinion series of scientific journals.  An early conference program has been announced here and registration is located here.


ChloroFilms is a nonprofit project which seeks to develop plant biology education through the promotion of video content about the wonders of plants and plant-associated life.  ChloroFilms provides cash awards for videos produced about plants.  Initial funding for the project came from grants from the American Society of Plant Biologists, the Botanical Society of America, Penn State Institutes for Energy and the Environment, and the Canadian Botanical Association.

Get your cameras ready and start filming!

Here are three award-winning films: