Accurate genome annotation remains a challenge. Large-scale sequencing projects usually provide additional experimental data (ESTs, full-length cDNAs) that can be used during annotation to improve the quality of gene models. More recently, sequencing efforts have concentrated on pathogens and model organisms among the Fungi and Protozoa, with a focus on sequencing closely related organisms for evolutionary, genetic and comparative studies. These genomes are relatively small but often lack supporting transcript or protein data. A comparative multi-genome approach can greatly improve the accuracy of gene prediction compared with a single-genome method. The multi-genome Gnomon approach makes it possible to use transcript and protein data from closely related organisms in a single multi-genome annotation run. The method starts from a single-genome Gnomon gene prediction and then uses comparative analysis across multiple genomes to improve the annotation gradually through an iterative process. At each iteration, the best models are selected and used as the training set and as evidence for the next step. Transcript and protein alignments guide the gene model predictions. The most recent version of Gnomon can also use RNA-Seq data, which gives additional support to splice junctions. Eight Aspergillus genomes have been annotated simultaneously using this method; RNA-Seq data are available for four of them. The resulting annotation has proven more consistent across the genomes than annotations of the individual genomes.
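The iterative step described above (select the best models, reuse them as training set and evidence, repeat) can be illustrated with a toy sketch. This is not Gnomon code and does not reflect its actual scoring or interfaces: the `GeneModel` fields, the ortholog-family label, the score threshold, and the fixed score boost are all hypothetical stand-ins chosen only to show the shape of the loop.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneModel:
    genome: str    # genome the model was predicted on
    family: str    # hypothetical ortholog-group label shared across genomes
    score: float   # hypothetical confidence in [0, 1]


def select_best(models, threshold):
    """Keep the models confident enough to act as training set and evidence."""
    return [m for m in models if m.score >= threshold]


def rescore(models, evidence, boost=0.2):
    """Toy stand-in for re-prediction: raise a model's score when a selected
    model from a different genome belongs to the same family."""
    supported = {}
    for e in evidence:
        supported.setdefault(e.family, set()).add(e.genome)
    updated = []
    for m in models:
        other_genomes = supported.get(m.family, set()) - {m.genome}
        new_score = min(1.0, m.score + boost) if other_genomes else m.score
        updated.append(GeneModel(m.genome, m.family, new_score))
    return updated


def iterate_annotation(models, threshold=0.8, max_rounds=5):
    """Repeat select-then-rescore until the selected set stops growing."""
    selected = select_best(models, threshold)
    for _ in range(max_rounds):
        models = rescore(models, selected)
        new_selected = select_best(models, threshold)
        if len(new_selected) == len(selected):
            break
        selected = new_selected
    return models, selected


# Made-up input: one confident model in genome A, weaker models elsewhere.
initial = [
    GeneModel("A", "famX", 0.9),
    GeneModel("B", "famX", 0.65),
    GeneModel("C", "famX", 0.5),
    GeneModel("B", "famY", 0.7),
]
final_models, final_selected = iterate_annotation(initial)
```

In this made-up input, the confident model from genome A progressively pulls in the weaker same-family models from genomes B and C, while the `famY` model, which has no cross-genome support, is never selected.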
Full conference title:
- Asperfest 9 (2012)