Body Expression Map of Human Genome
Fig. 2
Identification of alternatively spliced
transcripts and their representatives: there are
millions of redundant EST alignments, because it is
possible for several distinct ESTs to map to the
same locus. In this case, alignments that share
common exons on the same strand should be
placed in the same group. In the figure, all of the
EST sequences are aligned in the 5′ to 3′ direction
and displayed from left to right. Each thick line
represents the alignment of one EST, in which the
narrow yellow boxes are exons, orange boxes are
protein coding regions, and the brown boxes are
introns. (See color plate p. xxi.)
to annotate about 10 500 EST loci with
BodyMap gene expressions.
2.2
Identification of Alternative Splice Forms
Integration of millions of EST alignments
enables us to collect alternatively
spliced transcripts and their representatives.
Figure 2 illustrates a group of
alignments in which the second line from
the bottom represents the EST alignment
of the RefSeq sequence (TAC1,
NM_003182) that contains seven exons,
while some alternatively spliced transcripts
do not use the fourth and sixth
exons. Since the protein-coding region
(CDS) in the second-from-bottom alignment
starts with the second exon and
ends with the last (seventh) exon, the
bottom alignment, which skips the sixth
exon, actually encodes a different protein
than the second-from-bottom alignment.
Similarly, the third- and fourth-from-bottom
alignments encode distinct
proteins.
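The grouping rule described for Figure 2 (alignments that share common exons on the same strand belong to one group) can be illustrated with a small union-find sketch. This is only an illustration under stated assumptions, not the method used for BodyMap: the input tuple layout, the function name `group_alignments`, and the choice to treat two exons as "common" only when their coordinates are identical are all hypothetical.

```python
from collections import defaultdict


def group_alignments(alignments):
    """Group EST alignments that share at least one exon on the same strand.

    alignments: list of (est_id, chrom, strand, exons), where exons is a
    list of (start, end) genomic intervals. Assumption: two alignments
    "share" an exon only if chrom, strand, start and end are all identical.
    """
    parent = list(range(len(alignments)))

    def find(i):
        # Find the group representative, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    first_seen = {}  # exon key -> index of the first alignment using it
    for idx, (_, chrom, strand, exons) in enumerate(alignments):
        for start, end in exons:
            key = (chrom, strand, start, end)
            if key in first_seen:
                # Merge this alignment's group with the earlier one.
                parent[find(idx)] = find(first_seen[key])
            else:
                first_seen[key] = idx

    groups = defaultdict(list)
    for idx, (est_id, *_rest) in enumerate(alignments):
        groups[find(idx)].append(est_id)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical toy data: est1, est2 and est4 are linked through shared
# exons on the plus strand; est3 lies on the minus strand.
example = [
    ("est1", "chr7", "+", [(100, 200), (300, 400)]),
    ("est2", "chr7", "+", [(300, 400), (500, 600)]),
    ("est3", "chr7", "-", [(300, 400)]),
    ("est4", "chr7", "+", [(500, 600), (700, 800)]),
]
grouped = group_alignments(example)
```

Within each resulting group, alignments whose exon lists differ (for example, one that skips an exon present in another) are candidate alternative splice forms. A real pipeline would probably need a more tolerant notion of a shared exon, such as overlapping intervals or matching splice sites, because EST ends are often ragged.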
2.3
Data Cleansing and SNP Classification
Single nucleotide polymorphisms, DNA
sequence variations among individuals,
have been collected and are stored in
databases such as dbSNP
(http://www.ncbi.nlm.nih.gov/SNP) and Japanese SNP
of each SNP involves two sequences
that are located before and after the
SNP nucleotide, for the identification of
the SNP nucleotide, thereby demanding
that the alignments of the two sequences
with the human genome specify the
unique location of the SNP nucleotide.
Thus, selection of those sequences that
map with at least 99% identity to a
unique location in the draft genome is
effective in eliminating the incorrect SNPs.
Furthermore, integration of millions of
EST alignments is helpful in classifying
the aligned SNPs as regulatory, coding, or
noncoding, according to their locations.
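The two steps just described, keeping only SNPs whose flanking sequences map with at least 99% identity to a unique genomic location and then classifying the retained SNPs by location, might be sketched as follows. The function names, the hit and interval layouts, and the way regulatory regions are supplied are assumptions made for illustration; the text does not specify how regulatory regions are delimited.

```python
def unique_location(hits, min_identity=0.99):
    """Return (chrom, pos) if exactly one hit reaches min_identity, else None.

    hits: list of (chrom, pos, identity) candidate placements of one SNP,
    obtained by aligning its flanking sequences to the genome (hypothetical
    layout). The 0.99 threshold follows the 99% identity stated in the text.
    """
    good = [h for h in hits if h[2] >= min_identity]
    if len(good) == 1:
        chrom, pos, _ = good[0]
        return (chrom, pos)
    return None  # unmapped or ambiguously mapped: discard the SNP


def classify_snp(pos, cds_intervals, regulatory_intervals):
    """Classify a SNP position as 'coding', 'regulatory' or 'noncoding'.

    Intervals are inclusive (start, end) pairs on the same chromosome as
    pos. Assumption: CDS intervals come from integrated EST alignments, and
    regulatory intervals are supplied by the caller.
    """
    def inside(intervals):
        return any(start <= pos <= end for start, end in intervals)

    if inside(cds_intervals):
        return "coding"
    if inside(regulatory_intervals):
        return "regulatory"
    return "noncoding"


# Hypothetical usage: one SNP with a single high-identity placement.
location = unique_location([("chr7", 1000, 0.995), ("chr2", 500, 0.90)])
label = classify_snp(150, cds_intervals=[(100, 200)], regulatory_intervals=[(1, 50)])
```

Checking the coding intervals before the regulatory ones is a design choice of this sketch; the text does not state a precedence for the case where the two kinds of region overlap.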
3
Computational Methods
Integration of millions of EST alignments
on the human genome needs browsers that
facilitate efficient searching and browsing
of an enormous quantity of EST
alignments. To achieve this goal, it is
essential to meet specific computational
requirements, such as the acceleration
of sensitive-but-slow (dynamic programming)
alignment algorithms, and the
resolution of EST orientations.