Monthly Archives: January 2011

Harvesting The Apple Genome

The domesticated Apple (Malus x domestica) is the most consumed fruit of the temperate regions of the world.  Apples are members of the Rose family, the Rosaceae, which belongs to a larger group of plants collectively known as the Rosids, a group that staggeringly includes more than one third of all the flowering plants.  Several plants within the Rosaceae have already had their genomes sequenced (Peach, Strawberry), and others are in the works or slated for sequencing (Almond, Pear, Plum, Cherry, Rose, Raspberry, and Apricot, for example).  The latest member of the Rosaceae to have a genome sequence completed (just behind Peach and just ahead of Strawberry) is the ‘Golden Delicious’ cultivar of Apple (photo link).

The Apple Genome Consortium recently reported on the draft genome (see Velasco et al.) in the October issue of the journal Nature Genetics.  This genome was sequenced using a combination of Roche/454 pyrosequencing and traditional Sanger sequencing.  The estimated size of the Apple genome is 742.3 Mb, and the assembled contigs (603.9 Mb) are estimated to cover 81% of it.  Repetitive elements make up about 500 Mb, or roughly 67% of the estimated genome, including 98% of the unassembled portion (138.4 Mb).
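The assembly figures above can be checked with a little arithmetic.  The following is a minimal sketch in Python that uses only the two sizes quoted in the text; the function name `assembly_summary` is hypothetical and is not part of any tool used by the consortium.

```python
# Figures quoted in the text above for the draft Apple genome.
ESTIMATED_GENOME_MB = 742.3   # estimated size of the Apple genome
ASSEMBLED_CONTIGS_MB = 603.9  # total length of the assembled contigs


def assembly_summary(estimated_mb, assembled_mb):
    """Return (assembled fraction, unassembled length in Mb)."""
    if estimated_mb <= 0:
        raise ValueError("estimated genome size must be positive")
    unassembled_mb = estimated_mb - assembled_mb
    return assembled_mb / estimated_mb, unassembled_mb


fraction, unassembled = assembly_summary(ESTIMATED_GENOME_MB, ASSEMBLED_CONTIGS_MB)
print(f"assembled: {fraction:.1%}, unassembled: {unassembled:.1f} Mb")
```

The assembled fraction rounds to the 81% quoted above, and the difference between the two sizes is the 138.4 Mb unassembled portion.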

A comparison of the Apple genome with those of other plants revealed striking differences.  A large number of putative genes have been identified in Apple (57,386).  This value is considerably higher than in most plants with sequenced genomes, such as Arabidopsis thaliana (27,228), Poplar (45,654), Papaya (28,027), Brachypodium distachyon (25,532), Grape (33,514), Rice (40,577), Sorghum (34,496), Cucumber (26,682), Soybean (46,430), Maize (32,540), Strawberry (34,809) and Cocoa (28,798).  A total of 11,444 Apple-specific genes were identified in the study.  The gene density of the Apple genome is approximately equal to that of Poplar and Grape, but lower than that of Arabidopsis, Brachypodium, and Rice.  Like many plant genomes, the Apple genome contains a large number of repeated elements, which made its assembly more difficult than most.
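To make the comparison easier to scan, the gene counts quoted above can be ranked in a few lines of Python.  This is only an illustrative sketch: the counts are copied from the paragraph, and the helper `ratio_to_apple` is a hypothetical name introduced here.

```python
# Putative gene counts as quoted in the paragraph above.
GENE_COUNTS = {
    "Apple": 57386,
    "Arabidopsis thaliana": 27228,
    "Poplar": 45654,
    "Papaya": 28027,
    "Brachypodium distachyon": 25532,
    "Grape": 33514,
    "Rice": 40577,
    "Sorghum": 34496,
    "Cucumber": 26682,
    "Soybean": 46430,
    "Maize": 32540,
    "Strawberry": 34809,
    "Cocoa": 28798,
}


def ratio_to_apple(counts):
    """Map each other species to (Apple gene count / its gene count)."""
    apple = counts["Apple"]
    return {name: apple / n for name, n in counts.items() if name != "Apple"}


# Rank species from most to fewest putative genes and print a small table.
ranked = sorted(GENE_COUNTS.items(), key=lambda item: item[1], reverse=True)
for name, n in ranked:
    print(f"{name:<24}{n:>8,}")
```

Apple tops the ranking, with Soybean and Poplar next; by these counts Apple has a little more than twice as many putative genes as Arabidopsis thaliana.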

One aspect of the Apple’s biology that has been elucidated by this genome is the development of the characteristic fruit, also known as the pome.  This fruit is only found in the tribe Pyreae, which includes both Apple and Pear.  The fruit probably evolved from the widespread duplication of MADS-box genes that, in the case of Apple, regulate the transition from flower to fruit.  The tribe Pyreae had a relatively recent (in geological time, that is: less than 50 million years ago) genome-wide duplication event which has contributed to the wide diversity of MADS-box genes, as well as other gene families.  In addition to this recent genome-wide duplication event, there were prior duplication events which have contributed to the large size of the Apple genome.

Genes corresponding to Apple development, flowering, aroma and taste have been identified by the consortium, as well as genes within the plant that respond to disease and environmental factors, such as temperature and air pollution.  Having the genome sequence of Apple will contribute to understanding the biology of Apple and accelerate the breeding of this economically important crop.

Genome Sequence Of The Woodland Strawberry

Although the strawberry is one of the world’s favorite foods (I don’t really know how to refer to it in common terms because it is neither a fruit nor a berry; it’s technically a “fleshy receptacle” with seeds dotting the outside), it’s a relatively new crop that has its cultivated origin in the last 250 years.  Strawberry is one of many crop species, including Peach, Apple, Pear, Plum, etc., that are members of the Rosaceae, or Rose family.  In the December 26th advance online issue of the journal Nature Genetics, an international consortium of researchers has published the genome of the woodland strawberry, Fragaria vesca (photo link).

The modern cultivated strawberry (Fragaria x ananassa) is a genetic freak and one of the most complex crop plants from a genomic standpoint.  It has a large, octoploid genome that has been derived from as many as four different genome duplication events.  This is one of the reasons that the cultivated strawberry plant (and “berry”) is so much larger than its wild relative: its cells are filled with DNA.  In contrast, the genome of the wild relative, Fragaria vesca, is relatively small (14 chromosomes and approximately 240 Mb in size), which offers experimental opportunities and is advantageous from a plant breeding perspective.

This paper, authored by Shulaev et al., reports that the genome of Fragaria vesca is extremely small for a plant genome: slightly larger than that of Arabidopsis thaliana, the plant with the smallest genome size, but not quite as large as that of Arabidopsis lyrata, a close relative of A. thaliana.  The genome was generated by using a fourth-generation inbred line of the accession H4x4 and by utilizing Roche/454, Illumina/Solexa, and Life Technologies/SOLiD nucleotide sequencing technologies to sequence the genome at approximately 39X coverage.  A well-developed linkage map, derived from many years of Strawberry breeding, provided a resource for the genome assembly, which was assembled de novo and placed into seven pseudochromosomes.  More than 6,000 intact transposable elements were found in the genome, with approximately 16% of the genome occupied by LTR-type retrotransposons.

The genome of Fragaria vesca contains an estimated 34,809 protein coding genes, with an average of 4.8 exons per gene, and an average gene size of 1,160 bp.  When the genomes of Rice (Oryza), Arabidopsis, Grape (Vitis) and Strawberry were compared, a total of 6,233 gene clusters were shared among the four species.  This figure is surprisingly similar to the one reported for Cocoa in the same online issue of this journal.
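For a rough sense of scale, the gene counts and genome sizes quoted in these posts can be turned into genes per megabase.  The sketch below is a back-of-the-envelope calculation only: it assumes the approximate genome sizes given in the text (about 240 Mb for Fragaria vesca, an estimated 742.3 Mb for Apple) and is not how the published papers measured gene density; `genes_per_mb` is a hypothetical helper name.

```python
# Gene counts and approximate genome sizes as quoted in the posts.
GENOMES = {
    # name: (protein coding genes, approximate genome size in Mb)
    "Strawberry (Fragaria vesca)": (34809, 240.0),
    "Apple (Malus x domestica)": (57386, 742.3),
}


def genes_per_mb(gene_count, genome_size_mb):
    """Crude gene density: number of genes divided by genome size in Mb."""
    if genome_size_mb <= 0:
        raise ValueError("genome size must be positive")
    return gene_count / genome_size_mb


# Compute and print the crude density for each genome.
densities = {name: genes_per_mb(*values) for name, values in GENOMES.items()}
for name, density in densities.items():
    print(f"{name}: {density:.0f} genes per Mb")
```

By this crude measure the woodland strawberry packs roughly twice as many genes into each megabase as Apple does, which fits the picture of a compact genome with relatively few repeats.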

The strawberry genome will aid in understanding the biology of this plant, as well as other plants, and will hopefully lead to the breeding of more cold-tolerant, disease-resistant, and even more delicious varieties of this crop.  The strawberry genome research consortium has provided a genome browser for public access.  The consortium is led by Kevin Folta at The University of Florida (Web Portal and Laboratory), and he has posted a detailed story of the completion of the genome over at his blog Illumination.

Sweet Science – Publishing The Chocolate Genome

Despite some added stress, competition in life can be a good thing.  This is especially evident in the science community, where the threat of being scooped can speed up the release of a particular project.  The genome of Theobroma cacao, the plant known to the world as the source of chocolate, was originally scheduled for completion three years from now; instead, two competing international research consortia have published it within the last few months.

Theobroma cacao is a demanding plant with extremely specific needs.  The plants grow as tropical understory trees needing well-drained, yet moist, rich soils and a specific temperature window (between 15 and 32 degrees Celsius) for optimal growth.  In addition to these growing criteria, Theobroma cacao is extremely sensitive to biotic threats, such as fungal and bacterial pathogens and insect pests, as well as abiotic stresses, such as drought and air pollution.  As a result, chocolate crops have faced massive losses to disease, annually averaging around 30% of the crop, and that loss puts a relatively high price on this delicious commodity.

Chocolate comes from the ground seeds of the Theobroma cacao plant, and there are three main cultivars used in making it.  Forastero is the most common variety, accounts for nearly all of the chocolate sold, and is characterized by typical chocolate flavors but has little in the way of subtlety.  Criollo is rarer, accounting for less than 5% of all cacao grown, is more expensive, and is noted for a delicate, rich, high-quality flavor that is nonetheless low in typical chocolate flavors.  Trinitario is a naturally occurring hybrid of the Forastero and Criollo varieties (image link).

In mid-September of this past year, an international consortium including the chocolate producing company Mars, IBM, the US Department of Agriculture (USDA) and numerous academic institutions (Clemson University, Indiana University, and Washington State University) released the genome of the Matina 1-6 genotype of the Forastero variety.  This genome is available through the Cacao Genome Database.  This genotype is one of the most commonly found in the production of chocolate and is used in numerous breeding backgrounds.

Now, another international consortium, the International Cocoa Genome Sequencing Consortium (ICGS), which includes chocolate competitor Hershey, Genoscope, Pennsylvania State University (Guiltinan Lab, Schuster Lab, Carlson Lab, Axtell Lab), and collaborators in France, South Korea, and Brazil, has released the genome of the B97-61/B2 genotype, a member of the Criollo variety.  This publication was released online on December 26th in Nature Genetics and will be included in a future print issue of the journal.

The genome of the B97-61/B2 genotype, like that of the Matina 1-6 genotype, is well assembled for a draft genome.  The B97-61/B2 genome was completed by using a shotgun strategy utilizing Roche/454, Illumina, and traditional Sanger nucleotide sequencing, generating a total of 26 Gb of raw data corresponding to close to 17x coverage of the genome.  This draft assembly corresponds to 76% of the estimated total content of the genome, with the remaining unassembled portions including repeated elements that are notoriously difficult to piece together with any accuracy.  Not surprisingly, a large number of transposable element regions (67,575 to be exact) were found in the Theobroma genome, a finding similar to the genomes of other sequenced plants.

With regard to other plant genomes, the genome of Theobroma includes an estimated 28,798 protein coding genes, an average gene size of 3,346 bp, and about 5 exons per gene; that is, it has more genes, a similar number of exons per gene, and a lower gene density than the genome of Arabidopsis thaliana.  A comparison of the genomes of Theobroma, Arabidopsis, grape (Vitis), soybean (Glycine) and poplar (Populus) revealed 6,362 clusters of genes found in each of these dicot genomes.  The genome of Theobroma included 682 gene families that were specific to it and appeared to be enriched in metabolic and cellular processes, including many putative secondary metabolism genes, such as those involved in the production of flavonoids.

The completion of two complementary genomes of the chocolate plant Theobroma cacao will aid in the development of greater disease resistance, which should result in improved plant breeding efforts.  Not only do these genomes advance our knowledge of important plant biological processes, but hopefully this information will also aid in both the improvement of chocolate quality and the reduction of economic losses experienced by chocolate growers.  Furthermore, we should now have a better understanding of the biochemistry behind the production of flavonoids and other important chemicals in chocolate (some of these chemicals have been shown to reduce stress in people), and this could be important for those people active in competing genome sequencing consortia.