Body Expression Map of Human Genome
3.1 Query into Database of EST Alignments
Since the millions of EST alignments with
the huge human genome constitute a very
large database, various ways of accessing
the alignments are indispensable for obtaining
valuable information. For instance, in a
positional cloning project for hunting a
human disease gene, if some linkage
data suggest that the gene of interest
lies between two sequence-tagged site
markers, it is informative to enumerate
all the known genes and the single
nucleotide polymorphisms in the interval
between the two markers. This would then
be followed by a precise analysis of the
alternatively spliced transcripts or the
individual SNPs that fall within the coding
region of a gene. Furthermore, associating
a BodyMap sequence with the coding region
provides the expression levels in 30
distinct human tissues as supplementary
information. These tasks are greatly
facilitated by query and search functions
in the database that accept the STS marker
name, the GenBank accession number, the
RefSeq symbol name, the BodyMap GS number,
or the raw nucleotide sequence of interest.
It is also helpful to provide a graphical
interface that is capable of browsing
genes in one chromosome or in one
BAC contig, the exon/intron structure
of alternatively spliced transcripts, and
the exact positions of SNPs by zooming
in and out of the map smoothly and
seamlessly. Genome browsers such as
Ensembl, UCSC, NCBI, and GRL partly
support these functions.
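
The interval query described above (listing every known gene and SNP between two STS markers) can be sketched as a simple overlap filter. This is only a minimal illustration and not the database's actual implementation: the `Feature` record, the `features_between` function, and all marker names, feature names, and coordinates are hypothetical and invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    """Hypothetical annotation record; field names are assumptions."""
    name: str    # made-up identifier (gene symbol, SNP id, ...)
    kind: str    # "gene" or "snp"
    chrom: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive


def features_between(features, markers, marker_a, marker_b):
    """Return features overlapping the interval delimited by two markers."""
    chrom_a, pos_a = markers[marker_a]
    chrom_b, pos_b = markers[marker_b]
    if chrom_a != chrom_b:
        raise ValueError("markers lie on different chromosomes")
    lo, hi = sorted((pos_a, pos_b))  # accept the markers in either order
    hits = [f for f in features
            if f.chrom == chrom_a and f.start <= hi and f.end >= lo]
    return sorted(hits, key=lambda f: (f.start, f.end))


# Toy data, invented purely for illustration.
MARKERS = {"STS_A": ("chr1", 1000), "STS_B": ("chr1", 5000)}
FEATURES = [
    Feature("GENE_X", "gene", "chr1", 1200, 2400),
    Feature("SNP_1", "snp", "chr1", 1800, 1800),
    Feature("GENE_Y", "gene", "chr1", 6000, 7000),
    Feature("SNP_2", "snp", "chr2", 1500, 1500),
]

for feature in features_between(FEATURES, MARKERS, "STS_A", "STS_B"):
    print(feature.kind, feature.name, feature.start, feature.end)
```

A linear scan is enough for a toy list. A database holding millions of alignments would more plausibly rely on a sorted or tree-based interval index, which is outside the scope of this sketch.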
3.2 Efficient Computational Alignment of ESTs with the Human Genome
Here, we briefly mention the key ideas
relating to the efficient, complex
computational methods used to align four
million ESTs to a newly revised draft genome
in just one day. Figure 3 depicts an efficient
way of processing millions of ESTs in a
reasonable amount of time while retaining
sensitivity. The algorithm consists of three
key steps. The first step is to build an index
lookup table (hash table) of the positions
of all the nucleotide sequences of length
N (called N-mers) in the DNA sequence.
The index aids in locating a single N-mer,
say an 8-mer, instantly, and this represents
a crucial step in accelerating the overall
performance of mapping millions of ESTs
to the genome. The second step is to
approximate the start and end positions of
each EST alignment by mapping, for
instance, 12-mers at each end of an EST,
as illustrated in the upper part of the
figure. The algorithm scans the EST from
Fig. 3 Accelerated dynamic programming of EST alignments using an index lookup. Panel labels: approximation of the starting and ending positions; optimal alignment of the remaining part. Each panel shows an EST against the genome.
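
The first two steps described in Section 3.2 (building an N-mer position index, then using the N-mers at the two ends of an EST to approximate where its alignment starts and ends) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `build_index` and `approximate_span`, the toy sequences, and the choice of N = 4 are all hypothetical. The sketch reuses a single index for both steps, only looks at the forward strand, and filters candidates on the assumption that introns can only lengthen the genomic span.

```python
from collections import defaultdict


def build_index(genome, n):
    """Map every N-mer in `genome` to the list of its 0-based start positions."""
    index = defaultdict(list)
    for i in range(len(genome) - n + 1):
        index[genome[i:i + n]].append(i)
    return index


def approximate_span(est, index, n):
    """Propose (start, end) genomic spans for an EST from its two end N-mers.

    `end` is exclusive. A pair of seed hits is kept only when the genomic
    distance between them is at least the distance between the seeds in the
    EST itself (assumption: introns add length, and indels are ignored).
    """
    head_hits = index.get(est[:n], [])
    tail_hits = index.get(est[-n:], [])
    min_gap = len(est) - n
    return [(h, t + n)
            for h in head_hits
            for t in tail_hits
            if t - h >= min_gap]


# Toy sequences, invented for illustration: two exons separated by an intron.
EXON1, INTRON, EXON2 = "CGTACGGA", "GTTTTTTAG", "TGGCATCA"
GENOME = "AA" + EXON1 + INTRON + EXON2 + "CC"
EST = EXON1 + EXON2  # the spliced transcript lacks the intron

INDEX = build_index(GENOME, 4)
for start, end in approximate_span(EST, INDEX, 4):
    print(start, end, GENOME[start:end])
```

Each proposed span only brackets the region in which the EST is expected to align. The interior would still need to be aligned exactly, which corresponds to the "optimal alignment of the remaining part" label in Fig. 3. With realistic seed lengths such as the 8-mers and 12-mers mentioned in the text, repeated sequences can produce many candidate spans, so a practical tool would need additional filtering that is not shown here.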