What you need to know about HMMs:
A short introduction to signal and content sensors and integrated gene finding approaches based on the hidden Markov model. It is a little dated now, but still serves as an excellent introduction to the difficulties in finding genes ab initio in the human genome.
An excellent assessment, up to 2002, of available approaches to identifying protein-coding genes. It explains why exon boundaries are difficult to detect accurately, and identifies the need both to incorporate more biological knowledge and to build more specialized computational approaches for identifying protein-coding genes.
An excellent, more recent review (2003) of gene and regulatory region prediction using multiple genomes. It sets out the basics of comparative genomics.
A current review paper that compares three state-of-the-art gene finders and offers prescriptions on how the next generation of gene finders should be designed.
The original GENSCAN paper, which lays out the HMM model of gene finding. GENSCAN is still a widely used standard for comparison of gene-finding programs. Open source versions of GENSCAN are not readily available.
This paper introduces SNAP, an ab initio gene finder, and demonstrates that a simplified version of the HMM used by GENSCAN, tuned for a specific genome, outperforms the more general GENSCAN model. SNAP is downloadable here.
How do we incorporate extra knowledge into gene finders? GeneWise and Genomewise demonstrate one approach. GeneWise predicts gene structure using similar protein sequences, and Genomewise provides a final gene structure parse across cDNA- and EST-defined spliced structure. Both algorithms are used by the Ensembl annotation system. The GeneWise algorithm is a principled combination of hidden Markov models (HMMs).
Check out the update, Ensembl 2005.
This paper introduces methods for efficiently incorporating homology information to assist in gene prediction.
A systematic account of how to set up training data for HMM gene finders.
An open source program that integrates an ab initio gene finder with other evidence obtained from homology and ESTs.
A state-of-the-art gene finder which you can access through a web interface.
The original SLAM paper. Introduces pair HMMs for the gene finding task.
Uses identification of orthologous genes and dynamic programming to make predictions of genes across species.
Version 2 of the SLAM paper with more detailed computational results and experimental validation.
This paper describes an open-source pair GHMM (generalized HMM) for gene finding.
The application of Twinscan, Ensembl and SGP2 to the problem of predicting genes on the chicken genome using the human genome as a reference.
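For readers new to HMMs, the decoding step that the HMM-based gene finders in this list rely on can be illustrated with a toy example. The sketch below is not taken from any of the papers above: it is a minimal Viterbi decoder for a hypothetical two-state model (N for noncoding, C for coding), and every probability in it is an invented illustrative value. Real gene finders such as GENSCAN use far richer state structures (exons, introns, splice signals, explicit length distributions).

```python
import math

# Toy two-state HMM. All names and numbers here are illustrative assumptions,
# not parameters from GENSCAN, SNAP, or any other program in this list.
STATES = ("N", "C")  # N = noncoding, C = coding
START = {"N": 0.5, "C": 0.5}
TRANS = {
    "N": {"N": 0.9, "C": 0.1},
    "C": {"N": 0.1, "C": 0.9},
}
EMIT = {
    "N": {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},  # uniform background
    "C": {"A": 0.10, "C": 0.40, "G": 0.40, "T": 0.10},  # GC-rich "coding" state
}


def viterbi(seq):
    """Return the most probable state path for seq as a string of N/C labels."""
    if not seq:
        return ""
    # Work in log space to avoid underflow on long sequences.
    score = {s: math.log(START[s]) + math.log(EMIT[s][seq[0]]) for s in STATES}
    back = []  # back[i][s] = best predecessor of state s at position i + 1
    for base in seq[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: score[p] + math.log(TRANS[p][s]))
            new_score[s] = (
                score[prev] + math.log(TRANS[prev][s]) + math.log(EMIT[s][base])
            )
            pointers[s] = prev
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return "".join(reversed(path))


if __name__ == "__main__":
    print(viterbi("A" * 6 + "GC" * 6 + "T" * 6))
```

With these invented parameters, a long enough GC-rich stretch is labeled C because the emission gain outweighs the cost of the two state switches, while a short GC-rich stretch stays N. The same trade-off between signal strength and transition cost is what the more elaborate models weigh when placing exon boundaries.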