Anthology of Human Repetitive DNA
DNA transposons are the most abundant source of recruited genes,
despite their relatively low abundance in the genome compared to other
TEs (Table 11). Rtl1 is the only clear example of RT recruited into a
host gene. Genes derived from DNA transposases are probably involved in
specific DNA binding and cleavage (e.g. the V(D)J recombinase).
Surprisingly, eight genes are derived from Gypsy-like LTR
retrotransposons, whose remnants are otherwise not detectable in the
human genome. Two genes are derived from env genes encoded by HERV-W
and HERV-R elements. Proteins from env-derived genes may prevent
retroviral infections by saturating the cellular receptors for
exogenous retroviruses.
The extent of the contribution of TEs to the host protein-coding
capacity is still not well known. The number of well-supported examples
of TE-derived host proteins is relatively small (Table 11). On the
other hand, analyses of human mRNA sequences revealed several hundred
mRNAs with one or several exons derived from TEs. However, some of
these TE-containing sequences may represent common contaminants in EST
libraries. In addition, many of the transcripts contain only short
reading frames, indicating that they do not code for functional
proteins. It should be noted that the majority of cellular RNA
polymerase II transcripts do not code for any proteins and never reach
the cytoplasm. Cellular mRNA control mechanisms such as
nonsense-mediated RNA decay protect cells from potentially pathogenic
truncated or aberrant proteins. Indeed, products of TE-containing
transcripts are seldom detected at the protein level.
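The observation that many TE-containing transcripts carry only short reading frames can be illustrated with a minimal sketch. It scans the three forward frames of a transcript for the longest ATG-to-stop open reading frame. The function names and the 100-codon cutoff are hypothetical choices made for this illustration, not values taken from the text, and a real analysis would also consider the reverse strand and alternative start codons.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq):
    """Return (start, end, n_codons) of the longest ATG..stop ORF.

    Only the three forward frames are scanned. `start` and `end` are
    0-based, half-open coordinates; `end` includes the stop codon, while
    `n_codons` does not count it. Returns (0, 0, 0) if no ORF is found.
    """
    seq = seq.upper()
    best = (0, 0, 0)
    for frame in range(3):
        start = None  # position of the first ATG since the last stop
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                n_codons = (i - start) // 3
                if n_codons > best[2]:
                    best = (start, i + 3, n_codons)
                start = None
    return best


def has_only_short_orfs(seq, min_codons=100):
    """Flag a transcript whose longest ORF is below an assumed cutoff.

    The default of 100 codons is an illustrative assumption only.
    """
    return longest_orf(seq)[2] < min_codons
```

For example, `longest_orf("CCATGAAATTTGGGTAACC")` reports a four-codon reading frame starting at position 2, which `has_only_short_orfs` would flag as unlikely to encode a functional protein under the assumed cutoff.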
In conclusion, while there are a few well-documented examples of novel
functional genes derived from TEs, the extent of this phenomenon is
unknown. Many proteins deposited in public databases are hypothetical
ORFs predicted by statistical methods based on computational analysis
of genomic DNA and/or cDNA sequences. These methods are less than 90%
accurate and, as a result, they may include TEs in erroneously
predicted proteins. Analogously, databases of sites involved in
transcription regulation may include regions whose "regulatory" effects
have been shown in vitro only.
6 Databases and Programs for Analysis of Repetitive DNA
About half of the human genome is composed of various repetitive
elements. Detection of repetitive elements is a basic step in many
biologically important analyses, including, but not limited to,
sequence assembly during genome sequencing, genome annotation,
similarity searches, and gene and coding sequence prediction.
Therefore, specialized tools and databases have been developed for the
detection and masking of repetitive elements.
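Masking, as mentioned above, can be sketched in a few lines. The example assumes that repeat coordinates have already been obtained from some detection step and are supplied as 0-based, half-open intervals. The function name and the interval format are hypothetical and do not correspond to any particular tool. Hard masking replaces repeat bases with N, while soft masking lowercases them so that downstream programs can choose whether to ignore them.

```python
def mask_repeats(seq, intervals, soft=False):
    """Mask repeat intervals in a DNA sequence.

    seq       -- DNA sequence; it is uppercased first, so any existing
                 lowercase (soft-mask) information is discarded
    intervals -- iterable of (start, end) pairs, 0-based and half-open;
                 this coordinate format is an assumption of the sketch
    soft      -- if True, lowercase repeat bases; otherwise write 'N'
    """
    chars = list(seq.upper())
    for start, end in intervals:
        # Clip to the sequence bounds so stray coordinates are harmless.
        start = max(start, 0)
        end = min(end, len(chars))
        for i in range(start, end):
            chars[i] = chars[i].lower() if soft else "N"
    return "".join(chars)
```

Overlapping intervals are handled naturally, because masking a position twice leaves it unchanged. Hard masking is the safer choice before similarity searches, whereas soft masking preserves the underlying sequence for annotation.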
6.1 Databases of Repetitive Elements
The first database of human repetitive elements, Repbase, was published
in 1992. Currently it contains over 620 families and subfamilies of
human repeats. Repbase has since become a regularly updated general
database of eukaryotic repetitive elements and was renamed Repbase
Update (RU). RU is accompanied by the monthly online journal Repbase
Reports, which publishes newly discovered eukaryotic repeats.