The mouse genome: Experimental examination of gene predictions and transcriptional start sites

Sujit Dike, Vivekanand S. Balija, Lidia U. Nascimento, Zhenyu Xuan, Jacqueline Ou, Theresa Zutavern, Lance E. Palmer, Greg Hannon, Michael Q. Zhang, W. Richard McCombie

Research output: Contribution to journal (Article)

10 Citations (Scopus)

Abstract

The completion of the mouse and other mammalian genome sequences will provide necessary, but not sufficient, knowledge for an understanding of much of mouse biology at the molecular level. As a requisite next step in this process, the genes in mouse and their structure must be elucidated. In particular, knowledge of the transcriptional start site of these genes will be necessary for further study of their regulatory regions. To assess the current state of mouse genome annotation to support this activity, we identified several hundred gene predictions in mouse with varying levels of supporting evidence and tested them using RACE-PCR. Modifications were made to the procedure allowing pooling of RNA samples, resulting in a scaleable procedure. The results illustrate potential errors or omissions in the current 5′ end annotations in 58% of the genes detected. In testing experimentally unsupported gene predictions, we were able to identify 58 that are not usually annotated as genes but produced spliced transcripts (∼25% success rate). In addition, in many genes we were able to detect novel exons not predicted by any gene prediction algorithms. In 19.8% of the genes detected in this study, multiple transcript species were observed. These data show an urgent need to provide direct experimental validation of gene annotations. Moreover, these results show that direct validation using RACE-PCR can be an important component of genome-wide validation. This approach can be a useful tool in the ongoing efforts to increase the quality of gene annotations, especially transcriptional start sites, in complex genomes.

Original language: English (US)
Pages (from-to): 2424-2429
Number of pages: 6
Journal: Genome Research
Volume: 14
Issue number: 12
DOI: https://doi.org/10.1101/gr.3158304
State: Published - Dec 1 2004
Externally published: Yes

Fingerprint

Genome
Genes
Molecular Sequence Annotation
Genome Components
Polymerase Chain Reaction
Recombinant DNA
Nucleic Acid Regulatory Sequences
Molecular Biology
Exons
RNA

ASJC Scopus subject areas

  • Genetics
  • Genetics (clinical)

Cite this

Dike, S., Balija, V. S., Nascimento, L. U., Xuan, Z., Ou, J., Zutavern, T., ... McCombie, W. R. (2004). The mouse genome: Experimental examination of gene predictions and transcriptional start sites. Genome Research, 14(12), 2424-2429. https://doi.org/10.1101/gr.3158304

The mouse genome: Experimental examination of gene predictions and transcriptional start sites. / Dike, Sujit; Balija, Vivekanand S.; Nascimento, Lidia U.; Xuan, Zhenyu; Ou, Jacqueline; Zutavern, Theresa; Palmer, Lance E.; Hannon, Greg; Zhang, Michael Q.; McCombie, W. Richard.

In: Genome Research, Vol. 14, No. 12, 01.12.2004, p. 2424-2429.

Dike, S, Balija, VS, Nascimento, LU, Xuan, Z, Ou, J, Zutavern, T, Palmer, LE, Hannon, G, Zhang, MQ & McCombie, WR 2004, 'The mouse genome: Experimental examination of gene predictions and transcriptional start sites', Genome Research, vol. 14, no. 12, pp. 2424-2429. https://doi.org/10.1101/gr.3158304
Dike, Sujit; Balija, Vivekanand S.; Nascimento, Lidia U.; Xuan, Zhenyu; Ou, Jacqueline; Zutavern, Theresa; Palmer, Lance E.; Hannon, Greg; Zhang, Michael Q.; McCombie, W. Richard. / The mouse genome: Experimental examination of gene predictions and transcriptional start sites. In: Genome Research. 2004; Vol. 14, No. 12. pp. 2424-2429.
@article{51ca3f4a3bd44933b80989574dd16ec9,
title = "The mouse genome: Experimental examination of gene predictions and transcriptional start sites",
abstract = "The completion of the mouse and other mammalian genome sequences will provide necessary, but not sufficient, knowledge for an understanding of much of mouse biology at the molecular level. As a requisite next step in this process, the genes in mouse and their structure must be elucidated. In particular, knowledge of the transcriptional start site of these genes will be necessary for further study of their regulatory regions. To assess the current state of mouse genome annotation to support this activity, we identified several hundred gene predictions in mouse with varying levels of supporting evidence and tested them using RACE-PCR. Modifications were made to the procedure allowing pooling of RNA samples, resulting in a scaleable procedure. The results illustrate potential errors or omissions in the current 5′ end annotations in 58{\%} of the genes detected. In testing experimentally unsupported gene predictions, we were able to identify 58 that are not usually annotated as genes but produced spliced transcripts (∼25{\%} success rate). In addition, in many genes we were able to detect novel exons not predicted by any gene prediction algorithms. In 19.8{\%} of the genes detected in this study, multiple transcript species were observed. These data show an urgent need to provide direct experimental validation of gene annotations. Moreover, these results show that direct validation using RACE-PCR can be an important component of genome-wide validation. This approach can be a useful tool in the ongoing efforts to increase the quality of gene annotations, especially transcriptional start sites, in complex genomes.",
author = "Sujit Dike and Balija, {Vivekanand S.} and Nascimento, {Lidia U.} and Zhenyu Xuan and Jacqueline Ou and Theresa Zutavern and Palmer, {Lance E.} and Greg Hannon and Zhang, {Michael Q.} and McCombie, {W. Richard}",
year = "2004",
month = "12",
day = "1",
doi = "10.1101/gr.3158304",
language = "English (US)",
volume = "14",
pages = "2424--2429",
journal = "Genome Research",
issn = "1088-9051",
publisher = "Cold Spring Harbor Laboratory Press",
number = "12",

}

TY - JOUR

T1 - The mouse genome

T2 - Experimental examination of gene predictions and transcriptional start sites

AU - Dike, Sujit

AU - Balija, Vivekanand S.

AU - Nascimento, Lidia U.

AU - Xuan, Zhenyu

AU - Ou, Jacqueline

AU - Zutavern, Theresa

AU - Palmer, Lance E.

AU - Hannon, Greg

AU - Zhang, Michael Q.

AU - McCombie, W. Richard

PY - 2004/12/1

Y1 - 2004/12/1

AB - The completion of the mouse and other mammalian genome sequences will provide necessary, but not sufficient, knowledge for an understanding of much of mouse biology at the molecular level. As a requisite next step in this process, the genes in mouse and their structure must be elucidated. In particular, knowledge of the transcriptional start site of these genes will be necessary for further study of their regulatory regions. To assess the current state of mouse genome annotation to support this activity, we identified several hundred gene predictions in mouse with varying levels of supporting evidence and tested them using RACE-PCR. Modifications were made to the procedure allowing pooling of RNA samples, resulting in a scaleable procedure. The results illustrate potential errors or omissions in the current 5′ end annotations in 58% of the genes detected. In testing experimentally unsupported gene predictions, we were able to identify 58 that are not usually annotated as genes but produced spliced transcripts (∼25% success rate). In addition, in many genes we were able to detect novel exons not predicted by any gene prediction algorithms. In 19.8% of the genes detected in this study, multiple transcript species were observed. These data show an urgent need to provide direct experimental validation of gene annotations. Moreover, these results show that direct validation using RACE-PCR can be an important component of genome-wide validation. This approach can be a useful tool in the ongoing efforts to increase the quality of gene annotations, especially transcriptional start sites, in complex genomes.

UR - http://www.scopus.com/inward/record.url?scp=19944392880&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=19944392880&partnerID=8YFLogxK

U2 - 10.1101/gr.3158304

DO - 10.1101/gr.3158304

M3 - Article

C2 - 15574821

AN - SCOPUS:19944392880

VL - 14

SP - 2424

EP - 2429

JO - Genome Research

JF - Genome Research

SN - 1088-9051

IS - 12

ER -