Body Expression Map of Human Genome
the use of entire genomic sequences appears to offer an effective solution to these problems, because subsequences that appear at extremely low frequencies in the genome would be good primers. Thus, the number of occurrences (frequency) of a subsequence of length N (an N-mer) would serve as a criterion to assess subsequence uniqueness. Consequently, our primary interest is to list N-mers that appear only once in the genome, that is, unique N-mers. The number of unique N-mers in the human genome grows as N increases, but it levels off once N exceeds 18. There are about 1.7 billion unique 18-mers, indicating that unique 18-mers make up approximately half of all the 3 billion 18-mers and efficiently cover the entire human genome.
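The frequency criterion described above can be illustrated with a minimal sketch. It assumes a short toy sequence held in memory and counts N-mers on a single strand only. The names `count_nmers`, `unique_nmers`, and `toy_genome` are hypothetical and chosen for illustration. An in-memory counter of this kind would not scale to the roughly 3 billion 18-mers of the human genome, so this shows the idea rather than a practical method.

```python
from collections import Counter


def count_nmers(sequence: str, n: int) -> Counter:
    """Count every overlapping N-mer on one strand of `sequence`."""
    return Counter(sequence[i:i + n] for i in range(len(sequence) - n + 1))


def unique_nmers(sequence: str, n: int) -> list[str]:
    """Return, in sorted order, the N-mers that occur exactly once."""
    counts = count_nmers(sequence, n)
    return sorted(nmer for nmer, freq in counts.items() if freq == 1)


# Toy sequence (an assumption for illustration, not real genomic data).
toy_genome = "ACGTACGTTGCA"
for n in (2, 3, 4):
    print(n, unique_nmers(toy_genome, n))
```

On this toy input the repeated prefix `ACGT` keeps some short N-mers from being unique, which mirrors on a small scale why longer N-mers are more likely to occur only once.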
4.2 Selection of Oligomers, and Genome Markers
As mentioned above, to evaluate the specificity of primers about 20 nucleotides long, the first step is to determine how frequently they occur. However, for longer oligomers or genome markers of about 50 nucleotides, determining their frequencies is not useful because the frequencies are usually 1. A more suitable procedure is to consider the mismatch tolerance of an oligomer, that is, the minimum number of mismatches that allow a given oligomer to match a subsequence other than the target sequence anywhere in the genome. Although calculating the exact value of mismatch tolerance is computationally costly and impractical, it becomes feasible to check whether an oligomer meets the relaxed constraint that its mismatch tolerance is no less than a given threshold.
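The relaxed constraint can be sketched as a brute-force scan. This is an illustration of the definition only, not the procedure used by the authors. It assumes mismatches are counted as substitutions (Hamming distance), that only one strand is searched, and that every window starting at a position other than the target counts as a non-target subsequence. The function names are hypothetical.

```python
def hamming_at_least(a: str, b: str, threshold: int) -> bool:
    """True if equal-length strings a and b differ in at least `threshold` positions."""
    mismatches = 0
    for x, y in zip(a, b):
        if x != y:
            mismatches += 1
            if mismatches >= threshold:
                return True  # stop early: the bound is already reached
    return mismatches >= threshold


def tolerance_at_least(genome: str, target_start: int, length: int, threshold: int) -> bool:
    """Check that no non-target window matches the oligomer with fewer than `threshold` mismatches."""
    oligo = genome[target_start:target_start + length]
    for i in range(len(genome) - length + 1):
        if i == target_start:
            continue  # skip the target site itself
        if not hamming_at_least(oligo, genome[i:i + length], threshold):
            return False
    return True


# Toy sequence (an assumption for illustration): AAAAA at position 0 and
# AAAAT at position 10 differ in a single position.
toy_genome = "AAAAACCCCCAAAATGGGGG"
print(tolerance_at_least(toy_genome, 0, 5, 1))
print(tolerance_at_least(toy_genome, 0, 5, 2))
```

The early exit in `hamming_at_least` reflects why the threshold test is cheaper than computing the exact tolerance: each comparison can stop as soon as the bound is met.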
See also DNA Libraries; Genetics, Molecular Basis of; Genomic DNA Libraries, Construction and Applications; Gel Electrophoresis, 2D-difference.