UniGene


UniGene Organisms

Homo sapiens
Mus musculus
Rattus norvegicus
Danio rerio
Bos taurus
Xenopus laevis
Arabidopsis thaliana
Oryza sativa
Triticum aestivum
Hordeum vulgare
Zea mays



Related Resources

Human Genome Guide
LocusLink
HomoloGene
dbEST (Database of Expressed Sequence Tags)
Cancer Genome Anatomy Project
I.M.A.G.E. Quality Control


UniGene Resources

 The UniGene System

UniGene is an experimental system for automatically partitioning GenBank sequences into a non-redundant set of gene-oriented clusters. Each UniGene cluster contains sequences that represent a unique gene, as well as related information such as the tissue types in which the gene has been expressed and map location.
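As a rough illustration of what one such cluster holds, the sketch below models a cluster as a small Python record. The class name, field names, identifier and accession strings are all hypothetical and chosen for readability; they do not reflect the actual UniGene data format.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class GeneCluster:
    """One gene-oriented cluster (all field names are hypothetical)."""
    cluster_id: str
    accessions: List[str] = field(default_factory=list)  # member sequences
    tissues: Set[str] = field(default_factory=set)       # where expression was seen
    map_location: Optional[str] = None                   # may be unknown

    def add_sequence(self, accession: str, tissue: Optional[str] = None) -> None:
        # Keep the set non-redundant: each accession is stored once.
        if accession not in self.accessions:
            self.accessions.append(accession)
        if tissue is not None:
            self.tissues.add(tissue)


# Placeholder values, not real UniGene or GenBank identifiers.
cluster = GeneCluster(cluster_id="example.1", map_location="chr1 (placeholder)")
cluster.add_sequence("ACC0001", tissue="liver")
cluster.add_sequence("ACC0002", tissue="brain")
cluster.add_sequence("ACC0001", tissue="liver")  # duplicate, ignored
print(cluster.accessions, sorted(cluster.tissues))
```

The point of the sketch is only that a cluster groups member sequences with per-gene annotations; it does not store a consensus sequence.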

In addition to sequences of well-characterized genes, hundreds of thousands of novel expressed sequence tag (EST) sequences have been included. Consequently, the collection may be useful to the community as a resource for gene discovery. Experimentalists have also used UniGene to select reagents for gene mapping projects and large-scale expression analysis.

The procedures for automated sequence clustering are still under development, however, and results may change from time to time as improvements are made. User feedback has been especially useful in identifying problems, and we encourage you to report any that you encounter.

Note also that no attempt has been made to produce contigs or consensus sequences. There are several reasons why the sequences in a set may not form a single contig. For example, all splice variants of a gene are placed in the same set. Moreover, EST-containing sets often contain 5' and 3' reads from the same cDNA clone, and these reads do not always overlap.
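To see why a set need not form a single contig, consider the following minimal sketch. It is not the UniGene build procedure; it assumes, purely for illustration, that sequences are grouped with a union-find structure over two kinds of evidence, sequence overlap and shared cDNA clone. The function name and all sequence identifiers are hypothetical.

```python
from collections import defaultdict


def cluster_sequences(seq_ids, links):
    """Group sequence IDs into sets using union-find.

    `links` is an iterable of (id_a, id_b) pairs asserting that two
    sequences belong to the same gene, for whatever reason.
    """
    parent = {s: s for s in seq_ids}

    def find(x):
        # Walk to the root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = defaultdict(set)
    for s in seq_ids:
        groups[find(s)].add(s)
    # Return sorted lists so the result is deterministic.
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


# Hypothetical data: est_5p and est_3p are reads from the same cDNA clone
# that do not overlap; mrna_v1 and mrna_v2 are splice variants.
seq_ids = ["est_5p", "est_3p", "mrna_v1", "mrna_v2", "other_est"]
overlap_links = [("est_5p", "mrna_v1"), ("mrna_v1", "mrna_v2")]
clone_links = [("est_5p", "est_3p")]

clusters = cluster_sequences(seq_ids, overlap_links + clone_links)
print(clusters)
```

In this toy input, est_3p joins the set only through the clone link, so the set contains a sequence that overlaps none of the others, and no single contig could be assembled from it.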

Currently, sequences have been processed from six animals (human, rat, mouse, cow, zebrafish and clawed frog) and five plants (wheat, rice, barley, maize and cress). These organisms were chosen because they have the largest amounts of EST data available and span a range of species. Additional organisms may be added in the future.

A representation of the UniGene datasets is available by FTP.

A description of the UniGene build procedure is available.



 UniGene References


An article about the UniGene Collection in the August 1997 NCBI News contains an overview of the project. Although the number of UniGene clusters has changed since that article was written due to improvements in the clustering algorithm, the article provides background information as well as a description of how the collection was used in the Transcript Map project (see Schuler et al., 1996, below).

Additional references include:

Schuler (1997). Pieces of the puzzle: expressed sequence tags and the catalog of human genes. J Mol Med 75(10), 694-698. [PubMed]

Schuler et al. (1996). A gene map of the human genome. Science 274, 540-546. [PubMed] [SCIENCE On-line]

Boguski & Schuler (1995). ESTablishing a human transcript map. Nature Genetics 10, 369-371. [PubMed] [Full Text]
