UniGene is an experimental system for automatically partitioning
GenBank sequences into a non-redundant set of gene-oriented
clusters. Each UniGene cluster contains sequences that represent a
unique gene, as well as related information such as the tissue types
in which the gene has been expressed and its map location.
In addition to sequences of well-characterized genes, hundreds of
thousands of novel expressed sequence tag (EST) sequences have been
included. Consequently, the collection may be of use to the
community as a resource for gene discovery. UniGene has also been
used by experimentalists to select reagents for gene mapping
projects and large-scale expression analysis.
However, it should be noted that the procedures for automated
sequence clustering are still under development and the results may
change from time to time as improvements are made. Feedback from
users has been especially useful in identifying problems, and we
encourage you to report any issues you encounter.
It should also be noted that no attempt has been made to produce
contigs or consensus sequences. There are several reasons why the
sequences of a set may not actually form a single contig. For
example, all of the splicing variants for a gene are put into the
same set. Moreover, EST-containing sets often contain 5' and 3'
reads from the same cDNA clone, but these sequences do not always
overlap.
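Why the 5' and 3' reads of a single clone can fail to overlap comes down to simple arithmetic. The sketch below assumes a simplified model in which each single-pass read starts at its own end of the cDNA insert and runs inward; the function name `reads_overlap` and the lengths used are hypothetical, chosen only for illustration.

```python
def reads_overlap(insert_length: int, read5_length: int, read3_length: int) -> bool:
    """Return True if the 5' and 3' reads of one clone share any sequence.

    Simplified model (an assumption, not the UniGene procedure): each read
    starts at its own end of the insert and runs inward, so the two reads
    share sequence only when together they cover more than the whole insert.
    """
    return read5_length + read3_length > insert_length


# Hypothetical lengths in base pairs, for illustration only.
print(reads_overlap(1500, 450, 450))  # False: 600 bp in the middle are never read
print(reads_overlap(800, 450, 450))   # True: 100 bp are read from both ends
```

Under this model, any clone whose insert is longer than the combined read lengths contributes two sequences to the same cluster that cannot be joined into one contig without additional sequence.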
Currently, sequences from the following animals have been processed:
human, rat, mouse, cow, zebrafish and clawed frog. The plant organisms
are wheat, rice, barley, maize and cress. These species were chosen
because they have the greatest amounts of EST data available and
together cover a broad range of taxa. Additional organisms may be
added in the future.
A representation of the UniGene datasets is available for download by FTP.
A description of the UniGene build procedure is available.
An article about the UniGene Collection in the August 1997 NCBI News
contains an overview of the project. Although the number of UniGene
clusters has changed since that article was written, owing to
improvements in the clustering algorithm, the article provides
background information as well as a description of how the
collection was used in the Transcript Map project (see Schuler et
al., 1996, below).
Additional references include:
Schuler (1997). Pieces of the puzzle: expressed sequence tags and
the catalog of human genes. J Mol Med 75(10), 694-698.
Schuler et al. (1996). A gene map of the human genome.
Science 274, 540-546.
Boguski & Schuler (1995). ESTablishing a human transcript map.
Nature Genetics 10, 369-371.