Tag Archives: Genomes

Blast2Go Short Courses Summer & Fall 2012

Last year I told you about two short courses for using Blast2Go for automated functional annotation.  These courses will again be held this year.  Here’s information about this year’s courses:

FOURTH INTERNATIONAL COURSE IN AUTOMATED FUNCTIONAL ANNOTATION AND DATA MINING

In this course you will learn tools and tips for functional annotation, visualization and analysis of novel sequence data making use of Blast2GO.  The course will be offered to 25 participants. Please register now.

UC Davis, Davis, California, US: July 11 – 13, 2012. Registration is open only until the 12th of June!

CIPF, Valencia, Spain: October 24 – 26, 2012.  Registration opens the 1st of July.

For more information and course registration please visit: http://course.blast2go.com/

The Genomes of Two Thermophilic and Biomass-Degrading Fungi, Thielavia terrestris and Myceliophthora thermophila

One of the hurdles to the production of cellulosic biofuel is the economical breakdown of plant biomass.  Currently, fungi used to break down plant biomass operate at, or slightly above, room temperature.  At room temperature the chemical reactions proceed slowly and less efficiently, and the bioreactors may be riddled with contaminating fungi, which further lower the efficiency of the breakdown process.  One scientific goal is to increase the heat in bioreactors in the hope of speeding up the degradation using efficient fungal enzymes that operate at higher temperatures.

In an effort to find thermostable fungal degradative enzymes, researchers have sequenced the genomes of two fungi, Thielavia terrestris and Myceliophthora thermophila, known for their ability to survive at high temperatures, namely 40 °C to 75 °C.  A report entitled “Comparative Genomic Analysis of the Thermophilic Biomass-Degrading Fungi Myceliophthora thermophila and Thielavia terrestris” was published online on October 2nd in the journal Nature Biotechnology.

M. thermophila and T. terrestris are the first thermophilic eukaryotes to have their genomes sequenced.  The 38.7 Mbp genome of M. thermophila and the 36.9 Mbp genome of T. terrestris contain seven and six complete chromosomes, respectively.  The genome of M. thermophila contains 9,110 protein-coding genes, and that of T. terrestris contains 9,813.  Both filamentous Ascomycetes – placed in the class Sordariomycetes and family Chaetomiaceae – have a similar level of genomic organization, barring numerous translocations and transversions.  Across the three species with sequenced genomes in the Chaetomiaceae, large portions of the genomes, some spanning more than 6,000 contiguous genes, are shared in syntenic blocks.
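For a rough sense of how compact these two genomes are, the genome sizes and gene counts quoted above can be turned into gene densities.  This is only a back-of-the-envelope sketch: the function and variable names are my own, and the calculation ignores how genes are actually distributed along the chromosomes.

```python
# Genome size (Mbp) and protein-coding gene counts as quoted in the post.
GENOMES = {
    "Myceliophthora thermophila": (38.7, 9110),
    "Thielavia terrestris": (36.9, 9813),
}


def gene_density(size_mbp, gene_count):
    """Return the average number of protein-coding genes per Mbp."""
    return gene_count / size_mbp


def kb_per_gene(size_mbp, gene_count):
    """Return the average genomic span (in kb) available per gene."""
    return size_mbp * 1000.0 / gene_count


for species, (size_mbp, gene_count) in GENOMES.items():
    density = gene_density(size_mbp, gene_count)
    spacing = kb_per_gene(size_mbp, gene_count)
    print(f"{species}: {density:.0f} genes/Mbp, about {spacing:.1f} kb per gene")
```

By this crude measure T. terrestris packs slightly more genes into a slightly smaller genome than M. thermophila.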

Enzymes for the breakdown of plant matter – which can include a wide array of materials from agricultural and forestry waste, recycled pulp and paper products, leaves, etc. – were discovered across the genomes of both T. terrestris and M. thermophila.  These include numerous carbohydrate-active enzymes (CAZymes) from the glycoside hydrolase, polysaccharide lyase, carbohydrate esterase, and glycosyl transferase families.  Apart from some slight differences in the breakdown of specific plant polysaccharides, such as pectin, both fungi can be categorized as general decomposers with regard to their enzyme repertoire.

The researchers then tested the expression of some enzymes identified in these newly sequenced fungal genomes, and compared their diversity to well characterized enzymes from Trichoderma reesei.  Unlike T. reesei, both M. thermophila and T. terrestris show a proliferation of the GH61 enzyme family, responsible for the degradation of plant cell wall polysaccharides, as well as of the GH10 and GH11 xylanase gene families.  The researchers used RNA-Seq to compare the expression of these enzymes on differing plant materials, such as alfalfa and barley straw, which represented characteristic dicot and monocot plants, respectively.  While there are noticeable differences in the degradation of plant material from dicots and monocots by both T. terrestris and M. thermophila, orthologs from both fungal genomes show similar patterns of gene expression, particularly when growing on complex plant substrates.

Research commentaries on this publication can be found here and here.

7th Annual Joint Genome Institute Users Meeting 2012

The US Department of Energy Joint Genome Institute recently announced that its annual meeting will be held in Walnut Creek, California, from March 20th to 22nd.  Registration is now open.  This should be another great meeting, with another impressive array of speakers.

Structural Variation in Two Human Genomes Mapped by Whole Genome de novo Assembly

I found the Li et al. paper – “Structural Variation in Two Human Genomes Mapped by Whole Genome de novo Assembly” – published in the August issue of Nature Biotechnology interesting for a number of reasons.  As someone mainly interested in fungal and plant genomics, I consider this paper somewhat outside my research focus, but both the novel approach to de novo genome assembly and the emphasis on structural genome variation over single nucleotide polymorphisms (SNPs) in explaining genetic diversity caught my attention.

By using short read sequencing technology from the Illumina platform, the researchers began by sequencing the genomes of two individuals, one person of African descent (NA18507) and one of Asian descent (YH).  As with many genome sequencing studies, there were numerous problems during the assembly process, such as alignment accuracy, recovery of long contiguous stretches of nucleotides, stretches of low or no coverage, and identifying sequencing background noise.  The authors tried to eliminate these issues by developing a strategy focusing on de novo assembly instead of mapping reads to reference genomes.

The novel pipeline was able to identify structural variants – such as insertions, deletions, rearrangements, inversions, etc. – in each of the homozygous assembled genomes, some of which were upwards of 23,000 base pairs in length.  The researchers then validated the structural variations using both experimental and computational methods, and, using data generated for the 1000 Genomes Project, they mapped their identified structural variations in the genomes of 106 other individuals.

While SNPs are easier to observe (perhaps the reason why they have been emphasized so much in recent years?), it seems that structural rearrangements are perhaps the major form of variation in human genomes, and maybe all genomes.  Structural variations are less common than SNPs, but are more individual-specific and appear to be associated with phenotypic characteristics.  A logical next research direction would be to examine the association of structural variations with disease traits or susceptibility.

This paper also suggests that accurately assembling long genomic regions is very important to understanding structural variation.  This can be accomplished either by using technologies that naturally generate longer reads (e.g. Sanger or PacBio sequencing) or by ensuring that short reads can be accurately assembled by computational methods.

As an aside: this group at BGI (formerly the Beijing Genomics Institute) also sequenced the Giant Panda genome.

Potato Genome Sequence and Analysis

With next-generation sequencing technologies dropping in price and increasing in throughput, it’s not surprising to find multiple genomes published every week in scientific journals.  Most of these articles don’t qualify for publication in the top tier of journals like they did at the onset of the next-generation sequencing boom, but some genome sequencing projects, such as the potato genome, are high profile enough to warrant publication in top tier journals.

In the July 14th issue of the journal Nature, a draft of the potato (Solanum tuberosum) genome was described in a paper authored by the Potato Genome Sequencing Consortium – a huge group of researchers from 26 institutions.

The potato is the world’s fourth most consumed food crop, the most commonly grown vegetable crop, and a member of the economically important Solanaceae family – otherwise known as the nightshades – which includes tomato, peppers, aubergine (eggplant if you live in the United States), tobacco, and petunia.  Widely distributed in western South America, tuber-forming Solanum species are highly morphologically diverse and easily cross with other varieties for breeding purposes.

It’s been a bumpy road sequencing the potato genome since the project was started in 2006.  The potato is an extremely heterozygous autotetraploid, which translates to four highly variable copies of each of the 12 chromosomes.  It’s also the first sequenced eudicot genome in the asterid clade, so there are no close genetic relatives to provide the basis for a guided genome assembly.

The consortium began the sequencing by creating a bacterial artificial chromosome (BAC) library of 78,000 clones from RH89-039-16, a well studied diploid line providing high quality potatoes.  The group used the BAC library and 10,000 AFLP markers to create more than 7,000 contigs, which were assembled into a physical map.  The group then identified up to 150 BACs for every chromosome in the potato genome, and verified their locations using fluorescent in situ hybridization.

Heterozygosity was so high in the RH line that after thorough sequencing the group hit an impasse with the assembly of the genome.  In an attempt to complement the sequencing of the RH line, the consortium began sequencing a doubled monoploid potato clone, DM1-3 516R44, derived from a diploid wild South American accession.  The DM line has a simpler genome than the RH line and is highly homozygous.

Using both the Illumina Genome Analyzer II and Roche 454 pyrosequencing platforms, and supplementing this data with traditional Sanger sequencing, approximately 96 Gb of data was acquired for the DM line.  The group then used the SOAPdenovo computer program to assemble the reads with a final assembly of 727 Mb for the DM line and a final estimation of 844 Mb for the genome.
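To put those numbers in perspective, here is a quick back-of-the-envelope calculation of the raw sequencing depth and of how much of the estimated genome ended up in the assembly.  It is only a sketch using the figures quoted above; I am assuming decimal units (1 Gb = 1,000 Mb) and treating all 96 Gb as usable data, which overstates the effective coverage after quality filtering.  The function names are my own.

```python
# Figures quoted in the post for the DM line (decimal units assumed).
RAW_DATA_GB = 96.0            # approximate total sequence data
ASSEMBLY_MB = 727.0           # final assembly size
ESTIMATED_GENOME_MB = 844.0   # estimated genome size


def fold_coverage(raw_data_gb, genome_mb):
    """Return raw sequencing depth: total bases divided by genome size."""
    return raw_data_gb * 1000.0 / genome_mb


def assembled_fraction(assembly_mb, genome_mb):
    """Return the fraction of the estimated genome captured in the assembly."""
    return assembly_mb / genome_mb


coverage = fold_coverage(RAW_DATA_GB, ESTIMATED_GENOME_MB)
fraction = assembled_fraction(ASSEMBLY_MB, ESTIMATED_GENOME_MB)

print(f"Approximate raw coverage: {coverage:.0f}x")
print(f"Fraction of estimated genome assembled: {fraction:.1%}")
```

In other words, the data amount to roughly 114-fold raw coverage, and the assembly accounts for about 86% of the estimated genome.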

The consortium generated more than 31 Gb of transcriptome data from both the DM and RH line libraries.  These 48 libraries represented major tissue types and developmental stages, and included various responses to abiotic and biotic stresses.  All the reads from the RNA-Seq libraries were mapped to the assembled DM genome.  Using gene prediction methods, along with protein and EST data, the potato genome was predicted to contain 39,000 protein-coding genes, a number in line with other plant genomes.  Within these genes, there were an estimated 2,642 asterid-specific and 3,372 potato-lineage-specific genes.  The predicted asterid-specific genes include many novel transcription factors, self-incompatibility factors, and defense-related proteins.  The draft assembly of the genome consists of more than 60% repeated elements.  The largest class of transposable elements is the long terminal repeat (LTR) retrotransposons, which are estimated to make up 30% of the potato genome.

The potato is notorious for being susceptible to many pathogens and pests.  This well known susceptibility was one of the motivations for sequencing the genome and identifying genes responsible for disease resistance and pathogen defense.  The DM genome assembly contains more than 800 putative R genes, responsible for conferring disease resistance, including 408 NBS-LRR-encoding genes, 57 Toll/interleukin-1 receptor (TIR) domains, and 351 non-TIR type resistance genes.  An extreme number of pseudogenes – attributed to indels, frameshift mutations, and misplaced stop codons – were identified within known R gene motifs, which possibly explains the potato’s inability to fight off some specific diseases.

One such well known disease, Late Blight, caused by Phytophthora infestans, was responsible for the Irish Potato Famine in the 1840s.  Using information from this genome sequencing project and other studies, we now know the variety brought to Europe in the late 16th century happens to lack specific disease resistance genes for Phytophthora infestans.  One could speculate that unbridled transposon jumping caused the inactivation of many R genes in this potato variety.

Unique to the potato is the formation of tubers (the actual potatoes) through the modification of a stolon.  The tomato is very closely related to potato, but does not produce stolons or modified tubers.  The group used transcript data from both potato and tomato to address genetic regulation of the formation of stolons and the transition of stolons to tubers.  Quite interestingly, the formation of stolons and tubers coincides with an up-regulation of genes associated with starch biosynthesis and protein storage, as well as Kunitz protease inhibitor genes associated with pests and pathogens.

Possibly due to extremely high levels of heterozygosity, it has been difficult to improve the potato through traditional breeding efforts.  It’s estimated that diseases cause a worldwide economic loss of 4.5 billion US dollars in potato crops each year.  In an attempt to suppress these diseases, copious amounts of pesticides and fungicides are applied to potato crop land each year.  The potato cyst nematode, for example, is an important pest to which researchers hope to improve resistance via breeding initiatives.  Having this draft potato genome sequence will aid in the characterization of existing germplasm collections and the description of allelic variance in breeding efforts to avoid diseases.  The potato genome will also serve as a resource for breeders wanting to improve the quality of other economically important Solanaceous plants such as tomato, pepper, eggplant, and tobacco.

Genomic Impact of Eukaryotic Transposable Elements Meeting

Registration for the “Genomic Impact of Eukaryotic Transposable Elements” meeting is now open.  The meeting will be held February 24th-28th 2012, at the Asilomar Conference Center, in Pacific Grove, California, USA.  The conference will consist of invited speakers, general sessions from submitted abstracts, and workshop sessions devoted to computational analysis of transposable elements.

Beijing Genomics Institute’s International Conference on Genomics 6

It seems like there is a whole series of symposia right now claiming to be the international conference on genomics, and I do not know who holds the rights to the title.  I figure more meetings in this research area can’t hurt the state of the science.  The Beijing Genomics Institute has been sponsoring an annual series, The International Conference on Genomics, now in its 6th year, and, despite the name of “international conference”, it’s always held in or around Beijing.

This year’s meeting, The International Conference on Genomics 6 (ICG-VI), aims to promote research in basic and applied genomics by sponsoring a series of presentations focused on new sequencing techniques and bioinformatic strategies.  There will be sessions centered on new sequencing techniques, transcriptomics, epigenomics, metagenomics, proteomics, bioinformatics and data mining, and social issues relating to new genomic information.  Registration for the meeting can be found here.