Microbiology Reader
Equipment to run microbiology work automatically

Growth Curves of any strain.
Microbiological calculations.


What Is DNA?

Deoxyribonucleic acid (DNA) is a nucleic acid which carries genetic instructions for the biological development of all cellular forms of life and many viruses. DNA is sometimes referred to as the molecule of heredity as it is inherited and used to propagate traits. During reproduction, it is replicated and transmitted to offspring.

In bacteria and other simple cell organisms, DNA is distributed more or less throughout the cell. In the complex cells that make up plants, animals and in other multi-celled organisms, most of the DNA is found in the chromosomes, which are located in the cell nucleus. The energy-generating organelles known as chloroplasts and mitochondria also carry DNA, as do many viruses.

DNA in brief

This section is a brief overview of DNA, suitable for the lay reader and without detailed information. It is intended to serve as an introduction to the topic; more details are found in the rest of the main text.

Genes can be loosely viewed as the organism's "cookbook";
A strand of DNA contains genes, areas that regulate genes, and areas that either have no function or have a function we don't yet know;
DNA is organized as two complementary strands, head-to-toe, with bonds between them that can be "unzipped" like a zipper, separating the strands;
DNA is encoded with four interchangeable "building blocks", called "bases", which can be abbreviated A, T, C, and G; each base "pairs up" with only one other base: A+T, T+A, C+G and G+C; that is, an "A" on one strand of double-stranded DNA will "mate" properly only with a "T" on the other, complementary strand;
The order does matter: A+T is not the same as T+A, just as C+G is not the same as G+C; however, since there are just four possible combinations, naming only one base on the conventionally chosen side of the strand is enough to describe the sequence;
The order of the bases along the length of the DNA is what it's all about: the sequence itself is the description for genes;
Replication is performed by splitting (unzipping) the double strand down the middle via relatively trivial chemical reactions, and recreating the "other half" of each new single strand by drowning each half in a "soup" made of the four bases. Since each of the "bases" can only combine with one other base, the base on the old strand dictates which base will be on the new strand. This way, each split half of the strand plus the bases it collects from the soup will ideally end up as a complete replica of the original, unless a mutation occurs;
Mutations are simply chemical imperfections in this process: a base is accidentally skipped, inserted, or incorrectly copied, or the chain is trimmed, or added to; all other basic mutations can be described as combinations of these accidental "operations".

Overview of molecular structure

[Figure: Schematic representation of DNA, illustrating its structure]

Although sometimes called "the molecule of heredity", pieces of DNA as people typically think of them are not single molecules. Rather, they are pairs of molecules, which entwine like vines to form a double helix (see the schematic illustration).

Each vine-like molecule is a strand of DNA: a chemically linked chain of nucleotides, each of which consists of a sugar, a phosphate and one of four kinds of aromatic, nitrogen-containing "bases". Because DNA strands are composed of these nucleotide subunits, they are polymers.

The diversity of the bases means that there are four kinds of nucleotides, which are commonly referred to by the identity of their bases. These are adenine (A), thymine (T), cytosine (C), and guanine (G).

In a DNA double helix, two polynucleotide strands can associate through the hydrophobic effect. Specificity of which strands stay associated is determined by complementary pairing. Each base forms hydrogen bonds readily to only one other -- A to T and C to G -- so that the identity of the base on one strand dictates the strength of the association; the more complementary bases exist, the stronger and longer-lasting the association.

The cell's machinery is capable of melting or disassociating a DNA double helix, and using each DNA strand as a template for synthesizing a new strand that is complementary to the template and therefore nearly identical to the template's previous partner. Errors that occur in the synthesis are known as mutations. The technique known as PCR mimics this process in vitro, in a nonliving system.
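The template mechanism described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: strands are plain strings, strand direction is ignored, and no copying errors occur. The names `PAIR`, `complement` and `replicate` are made up for this sketch.

```python
# Watson-Crick pairing rules from the text: A pairs with T, C pairs with G.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Return the base-by-base partner of a strand (same left-to-right order)."""
    return "".join(PAIR[base] for base in strand)


def replicate(top: str, bottom: str):
    """Separate a duplex and rebuild the missing partner of each old strand."""
    daughter_1 = (top, complement(top))        # old top strand + newly built partner
    daughter_2 = (complement(bottom), bottom)  # newly built partner + old bottom strand
    return daughter_1, daughter_2


top = "ATGCCGTA"
bottom = complement(top)
d1, d2 = replicate(top, bottom)

# Each daughter duplex should match the parent duplex.
print(d1 == (top, bottom), d2 == (top, bottom))
```

Each daughter duplex keeps one old strand and gains one new strand, which is why an error made while building the new strand (a mutation) is inherited by only one of the two copies.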

Because pairing causes the nucleotide bases to face the helical axis, the sugar and phosphate groups of the nucleotides run along the outside, and the two chains they form are sometimes called the "backbones" of the helix. In fact, it is chemical bonds between the phosphates and the sugars that link one nucleotide to the next in the DNA strand.

The role of the sequence

Within a gene, the sequence of nucleotides along a DNA strand defines a protein, which an organism is liable to manufacture or "express" at one or several points in its life using the information of the sequence. The relationship between the nucleotide sequence and the amino-acid sequence of the protein is determined by simple cellular rules of translation, known collectively as the genetic code. The genetic code is made up of three-letter "words" (termed codons) formed from a sequence of three nucleotides (e.g. ACT, CAG, TTT). The codons are copied into messenger RNA and then read by transfer RNA, with each codon corresponding to a particular amino acid. Since four bases taken three at a time give 4 x 4 x 4 = 64 possible codons, but only 20 amino acids are encoded, most amino acids have more than one possible codon. There are also three "stop" or "nonsense" codons signifying the end of the coding region.
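As a small illustration of codon-by-codon reading, the sketch below counts the possible codons and translates a short DNA coding sequence. The table `PARTIAL_CODE` is deliberately a tiny subset of the standard genetic code, chosen for this example; a real translation would need all 64 entries. The function name `translate` is an assumption of this sketch.

```python
from itertools import product

BASES = "ACGT"

# Every three-letter word over four bases: 4 * 4 * 4 = 64 codons.
all_codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]
print(len(all_codons))

# A small subset of the standard genetic code (DNA codons, coding strand).
PARTIAL_CODE = {
    "ATG": "Met",
    "ACT": "Thr",
    "CAG": "Gln",
    "TTT": "Phe",
    "TTC": "Phe",   # a second codon for the same amino acid
    "TAA": "Stop",
    "TAG": "Stop",
    "TGA": "Stop",
}


def translate(coding_strand: str) -> list:
    """Read the strand three bases at a time until a stop codon is reached."""
    protein = []
    for i in range(0, len(coding_strand) - 2, 3):
        codon = coding_strand[i:i + 3]
        residue = PARTIAL_CODE[codon]  # KeyError for codons outside the subset
        if residue == "Stop":
            break
        protein.append(residue)
    return protein


print(translate("ATGACTCAGTTTTAA"))  # ['Met', 'Thr', 'Gln', 'Phe']
```

Both TTT and TTC map to phenylalanine in the table, a concrete case of one amino acid having more than one codon.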

In many species of organism, only a small fraction of the total sequence of the genome appears to encode protein. The function of the rest is a matter of speculation. It is known that certain nucleotide sequences specify affinity for DNA binding proteins, which play a wide variety of vital roles, in particular through control of replication and transcription. These sequences are frequently called regulatory sequences, and researchers assume that so far they have identified only a tiny fraction of the total that exist. "Junk DNA" represents sequences that do not yet appear to contain genes or to have a function.

Sequence also determines a DNA segment's susceptibility to cleavage by restriction enzymes, the quintessential tools of genetic engineering. The position of cleavage sites throughout an individual's genome determines one kind of an individual's "DNA fingerprint".
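How cleavage-site positions give rise to a "fingerprint" can be sketched as follows. The example uses the well-known EcoRI recognition site GAATTC, which is cut between the G and the first A. The two sample sequences are invented for illustration, only one strand of a linear molecule is scanned, and the function names are assumptions of this sketch.

```python
def cut_positions(sequence: str, site: str = "GAATTC", offset: int = 1) -> list:
    """Return the indices at which the enzyme cuts (offset bases into each site)."""
    positions = []
    start = sequence.find(site)
    while start != -1:
        positions.append(start + offset)
        start = sequence.find(site, start + 1)
    return positions


def fragment_lengths(sequence: str, site: str = "GAATTC", offset: int = 1) -> list:
    """Lengths of the pieces left after cutting a linear molecule at every site."""
    cuts = [0] + cut_positions(sequence, site, offset) + [len(sequence)]
    return [end - begin for begin, end in zip(cuts, cuts[1:])]


# Two invented samples that differ by a single base inside the second site.
sample_a = "TTGAATTCAAAAGGGAATTCTT"
sample_b = "TTGAATTCAAAAGGGAGTTCTT"

print(fragment_lengths(sample_a))  # [3, 12, 7]
print(fragment_lengths(sample_b))  # [3, 19]
```

A single-base difference removes one recognition site, so the two samples yield different sets of fragment lengths. That difference in fragment pattern is what this kind of fingerprint compares.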

The hydrogen bonds between the strands of the double helix are weak enough that they can be easily separated by enzymes. Enzymes known as helicases unwind the strands to facilitate the advance of sequence-reading enzymes such as DNA polymerase. Unwinding builds up twisting strain ahead of the helicase, which is relieved when the phosphate backbone of one of the strands is transiently cleaved so that it can swivel around the other, a job performed by topoisomerases rather than by the helicases themselves. The strands can also be separated by gentle heating, as used in PCR, provided they have fewer than about 10,000 base pairs (10 kilobase pairs, or 10 kbp). The intertwining of the DNA strands makes long segments difficult to separate.

When the ends of a piece of double-helical DNA are joined so that it forms a circle, as in plasmid DNA, the strands are topologically knotted. This means they cannot be separated by gentle heating or by any process that does not involve breaking a strand. The task of unknotting topologically linked strands of DNA falls to enzymes known as topoisomerases. Some of these enzymes unknot circular DNA by cleaving two strands so that another double-stranded segment can pass through. Unknotting is required for the replication of circular DNA as well as for various types of recombination in linear DNA.

The DNA helix can assume one of three slightly different geometries, of which the "B" form described by James D. Watson and Francis Crick is believed to predominate in cells. It is 2 nanometres wide and extends 3.4 nanometres per 10 bp of sequence. This is also the approximate length of sequence in which the helix makes one complete turn about its axis. This frequency of twist (known as the helical pitch) depends largely on stacking forces that each base exerts on its neighbors in the chain.

The narrow breadth of the double helix makes it impossible to detect by conventional electron microscopy, except by heavy staining. At the same time, the DNA found in many cells can be macroscopic in length -- approximately 5 centimetres long for strands in a human chromosome. Consequently, cells must compact or "package" DNA to carry it within them. This is one of the functions of the chromosomes, which contain spool-like proteins known as histones, around which DNA winds.
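The figures quoted above can be checked with simple arithmetic. The sketch below assumes the B-form values given in the text (3.4 nanometres and one helical turn per 10 bp) and treats the DNA as a straight, uniform helix; the function names are made up for this example.

```python
NM_PER_BP = 3.4 / 10   # rise per base pair, from 3.4 nm per 10 bp
BP_PER_TURN = 10       # approximate bases per helical turn, as quoted above
NM_PER_CM = 1e7        # 1 cm = 10,000,000 nm


def length_nm(base_pairs: float) -> float:
    """Contour length of a straight B-form segment, in nanometres."""
    return base_pairs * NM_PER_BP


def helical_turns(base_pairs: float) -> float:
    """Number of complete turns the helix makes over the segment."""
    return base_pairs / BP_PER_TURN


def base_pairs_for_length_cm(length_cm: float) -> float:
    """Number of base pairs needed to span a given length in centimetres."""
    return length_cm * NM_PER_CM / NM_PER_BP


# A 5 cm molecule corresponds to roughly 1.5e8 base pairs.
bp = base_pairs_for_length_cm(5)
print(f"{bp:.3g} bp, {helical_turns(bp):.3g} turns")
```

At 2 nanometres wide, such a molecule is tens of millions of times longer than it is thick, which gives a sense of why packaging is needed.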

The B form of the DNA helix twists 360° per 10.6 bp in the absence of strain. But many molecular biological processes can induce strain. A DNA segment with excess or insufficient helical twisting is referred to, respectively, as positively or negatively "supercoiled". DNA in vivo is typically negatively supercoiled, which facilitates the unwinding of the double-helix required for RNA transcription.

The two other known double-helical forms of DNA, called A and Z, differ modestly in their geometry and dimensions. The A form appears likely to occur only in dehydrated samples of DNA, such as those used in crystallography experiments, and possibly in hybrid pairings of DNA and RNA strands. Segments of DNA that cells have methylated for regulatory purposes may adopt the Z geometry, in which the strands turn about the helical axis like a mirror image of the B form.

DNA sequence reading

The asymmetric shape and linkage of nucleotides means that a DNA strand always has a discernible orientation or directionality. Because of this directionality, close inspection of a double helix reveals that nucleotides are heading one way along one strand (the "ascending strand"), and the other way along the other strand (the "descending strand"). This arrangement of the strands is called antiparallel.

For reasons of chemical nomenclature, people who work with DNA refer to the asymmetric termini of each strand as the 5' and 3' ends (pronounced "five prime" and "three prime"). DNA workers and enzymes alike always read nucleotide sequences in the "5' to 3' direction". In a vertically oriented double helix, the 3' strand is said to be ascending while the 5' strand is said to be descending.

As a result of their antiparallel arrangement and the sequence-reading preferences of enzymes, even if both strands carried identical instead of complementary sequences, cells could properly translate only one of them. The other strand a cell can only read backwards. Molecular biologists call a sequence "sense" if it is translated or translatable, and they call its complement "antisense". It follows then, somewhat paradoxically, that the template for transcription is the antisense strand. The resulting transcript is an RNA replica of the sense strand and is itself sense.
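The sense/antisense relationship can be made concrete with a short sketch. All sequences are written 5' to 3', as is conventional. The sketch relies on one fact not stated above, that RNA carries uracil (U) where DNA carries thymine (T). The example sequence and function names are invented for illustration.

```python
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(strand_5to3: str) -> str:
    """Partner strand of an antiparallel duplex, also written 5' to 3'."""
    return "".join(PAIR[base] for base in reversed(strand_5to3))


def transcribe(template_5to3: str) -> str:
    """RNA built against a DNA template strand, written 5' to 3'.

    The transcript pairs with the template, so it has the sequence of the
    template's partner strand, with U in place of T.
    """
    return reverse_complement(template_5to3).replace("T", "U")


sense = "ATGACTCAG"
antisense = reverse_complement(sense)
rna = transcribe(antisense)   # the antisense strand is the template

print(antisense)  # CTGAGTCAT
print(rna)        # AUGACUCAG
```

The transcript matches the sense strand letter for letter apart from U replacing T, which is the "somewhat paradoxical" point made above: the strand that is copied is the antisense one.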

Some viruses blur the distinction between sense and antisense, because certain sequences of their genomes do double duty, encoding one protein when read 5' to 3' along one strand, and a second protein when read in the opposite direction along the other strand. As a result, the genomes of these viruses are unusually compact for the number of genes they contain, which biologists view as an adaptation.

Topologists like to note that the juxtaposition of the 3' end of one DNA strand beside the 5' end of the other at both termini of a double-helical segment makes the arrangement a "crab canon".

Single-stranded DNA (ssDNA) and repair of mutations

In some viruses DNA appears in a non-helical, single-stranded form. Because many of the DNA repair mechanisms of cells work only on paired bases, viruses that carry single-stranded DNA genomes mutate more frequently than they would otherwise. As a result, such species may adapt more rapidly to avoid extinction. The result would not be so favorable in more complicated and more slowly replicating organisms, however, which may explain why only viruses carry single-stranded DNA. These viruses presumably also benefit from the lower cost of replicating one strand versus two.

The discovery of DNA and the double helix

Working in the 19th century, biochemists initially isolated DNA and RNA (mixed together) from cell nuclei. They were relatively quick to appreciate the polymeric nature of their "nucleic acid" isolates, but realized only later that nucleotides were of two types--one containing ribose and the other deoxyribose. It was this subsequent discovery that led to the identification and naming of DNA as a substance distinct from RNA.

Friedrich Miescher (1844-1895) discovered a substance he called "nuclein" in 1869. Somewhat later, he isolated a pure sample of the material now known as DNA from the sperm of salmon, and in 1889 his pupil, Richard Altmann, named it "nucleic acid". This substance was found to exist only in the chromosomes.

Max Delbrück, Nikolai V. Timofeeff-Ressovsky, and Karl G. Zimmer published results in 1935 suggesting that chromosomes are very large molecules the structure of which can be changed by treatment with X-rays, and that by so changing their structure it was possible to change the heritable characteristics governed by those chromosomes. (Delbrück and Salvador Luria were awarded the Nobel Prize in 1969 for their work on the genetic structure of viruses.) In 1943, Oswald Theodore Avery discovered that traits proper to the "smooth" form of the Pneumococcus could be transferred to the "rough" form of the same bacteria merely by making the killed "smooth" (S) form available to the live "rough" (R) form. Quite unexpectedly, the living R Pneumococcus bacteria were transformed into a new strain of the S form, and the transferred S characteristics turned out to be heritable.

[Figure: James Watson in the Cavendish Laboratory at the University of Cambridge]

In 1944, the renowned physicist Erwin Schrödinger published a brief book entitled What is Life?, where he maintained that chromosomes contained what he called the "hereditary code-script" of life. He added: "But the term code-script is, of course, too narrow. The chromosome structures are at the same time instrumental in bringing about the development they foreshadow. They are law-code and executive power -- or, to use another simile, they are architect's plan and builder's craft -- in one." He conceived of these dual functional elements as being woven into the molecular structure of chromosomes. By understanding the exact molecular structure of the chromosomes one could hope to understand both the "architect's plan" and also how that plan was carried out through the "builder's craft." Francis Crick, James D. Watson, Maurice Wilkins, Rosalind Franklin, Seymour Benzer, et al., took up the physicist's challenge to work out the structure of the chromosomes and the question of how the segments of the chromosomes that were conceived to relate to specific traits could possibly do their jobs.

Just how the presence of specific features in the molecular structure of chromosomes could produce traits and behaviors in living organisms was unimaginable at the time. Because chemical dissection of DNA samples always yielded the same four nucleotides, the chemical composition of DNA appeared simple, perhaps even uniform. Organisms, on the other hand, are fantastically complex individually and widely diverse collectively. Geneticists did not speak of genes as conveyors of "information" in such words, but if they had, they would not have hesitated to quantify the amount of information that genes need to convey as vast. The idea that information might reside in a chemical in the same way that it exists in text--as a finite alphabet of letters arranged in a sequence of unlimited length--had not yet been conceived. It would emerge upon the discovery of DNA's structure, but few researchers imagined that DNA's structure had much to say about genetics.

Antonie Van Leeuwenhoek, 2002 Aug, 81(1-4), 15 - 25
Gene expression analysis of the response by Escherichia coli to seawater; Rozen Y et al.; Gene expression of Escherichia coli cells exposed to seawater for 20 h was compared to that of exponentially growing cells (mops-glucose 0.2%) using DNA microarray technology. The expression of most (ca. 3,000) of the 4,228 open reading frames on the microarray remained unchanged; the relative expression of about 320 genes decreased in seawater, whereas that of ca. one fourth (937) increased. Clearly coherent expression patterns were observed for several functional gene groups. Induced genes were numerous in groups specifying the degradation of small molecules (carbon compounds, amino acids and fatty acids), energy metabolism (aerobic and anaerobic respiration, pyruvate dehydrogenase and TCA cycle), chemotaxis and mobility, flagella biosynthesis, surface structures and phage related functions. Repressed genes were clustered in two groups, cell division and nucleotides biosynthesis, indicating a cessation of growth.

Water Res, 2002 Nov, 36(19), 4753 - 6
Enhancing slow sand filter performance with an acid-soluble seston extract; Weber-Shirk ML; An acid-soluble extract was obtained from Cayuga Lake (Ithaca, NY) seston and applied to slow sand filters at different application rates. Biological activity in the filters was inhibited with 3 mM sodium azide. The filters were challenged with a synthetic raw water containing Escherichia coli. The Cayuga Lake seston extract (CLSE) fed filters removed up to 99.9999% of the influent coliforms while the control filter (no CLSE) removed 50%. Filter performance was correlated with the amount of CLSE applied to the filters.

J Mol Recognit, 2002 Sep-Oct, 15(5), 321 - 30
Expression and functional analysis of recombinant scFv and diabody fragments with specificity for human RhD; Asvadi P et al.; In an attempt to generate recombinant anti-D reagents for possible diagnostic and therapeutic use we cloned the genes encoding the variable (V) domains of a human anti-D antibody secreted by the lymphoblastoid cell line BTSN4. A single-chain Fv (scFv) fragment was constructed using a 21 amino acid linker to join the genes encoding the variable domains of the BTSN4 heavy (VH) and light chains (VL). A diabody construct was also generated by reducing the length of the scFv linker from 21 to 10 residues. The scFv and diabody constructs were cloned into the pFLAG-CTS vector, expressed in E. coli host cells and the recombinant proteins were affinity-isolated from bacterial culture medium. Analysis of the recombinant proteins indicated that they retained the D antigen binding specificity of the parental BTSN4 IgG. Furthermore, both fragments mediated agglutination of papain-treated D positive erythrocytes in the absence of a cross-linking second antibody. While the agglutinating property of BTSN4 diabody was readily explained by the non-covalent association of this protein as a bivalent dimer, oligomeric forms of BTSN4 scFv were not detected when the protein was analysed by size exclusion chromatography. Thus, the agglutinating property of the scFv is not the result of the formation of non-covalently associated multimeric forms of the antibody fragment.

J Infect Dis, 2002 Dec 15, 186(12), 1852 - 6 Epub 2002 Nov 22.
Phylogenetic background and virulence profiles of fluoroquinolone-resistant clinical Escherichia coli isolates from the Netherlands; Johnson JR et al.; Thirteen fluoroquinolone-resistant Escherichia coli (FQREC) isolates from hospitalized patients in the Netherlands were found to represent predominantly (low-virulence) phylogenetic groups A and B1 and to lack extraintestinal virulence traits. These FQREC resemble animal-source E. coli and presumably pose little threat to noncompromised hosts. Similar analysis of other FQREC is needed.

J Infect Dis, 2002 Dec 15, 186(12), 1740 - 7 Epub 2002 Nov 18.
Enterotoxigenic Escherichia coli infections and diarrhea in a cohort of young children in Guinea-Bissau; Steinsland H et al.; In an effort to describe the natural history of enterotoxigenic Escherichia coli (ETEC) infection and diarrhea, 200 children in Guinea-Bissau, West Africa, were followed up from birth until up to age 2 years with weekly stool specimen collection, regardless of whether the children had diarrhea. ETEC isolates were tested for the presence of the porcine and human heat-stable toxins (STp and STh), the heat-labile toxin (LT), and 18 of 21 known colonization factors (CFs). The rate of primary infections increased substantially after age 3 or 6 months (depending on the type of ETEC causing the infection). The pathogenicity of STh-containing ETEC was substantially higher than that of STp-containing ETEC, and STp and STh were associated with separate sets of CFs. Small epidemics were observed, mainly caused by STh-containing ETEC. The difference in epidemic propensity, CF association, and pathogenicity suggests that STh- and STp-containing ETEC represent 2 different groups of human ETEC. Vaccines should primarily target STh-containing ETEC.

J Infect Dis, 2002 Dec 1, 186(11), 1682 - 6 Epub 2002 Nov 04.
A naturally occurring rabbit model of enterohemorrhagic Escherichia coli-induced disease; Garcia A et al.; Enterohemorrhagic Escherichia coli (EHEC) causes hemorrhagic colitis and hemolytic-uremic syndrome (HUS) in humans. The exact mechanism by which EHEC induces disease remains unclear because of the lack of a natural animal model for the disease. An outbreak of bloody diarrhea and sudden death was investigated in a group of Dutch belted rabbits. Two of these rabbits harbored enteropathogenic E. coli O145:H(-), and 1 rabbit was coinfected with EHEC O153:H(-). A partial Shiga toxin 1 gene (stx1) fragment from E. coli O153:H(-) was confirmed by Southern blot and sequence analysis. Toxin production was demonstrated by a HeLa cell cytotoxicity assay. Histopathologic findings in all affected rabbits included erosive and necrotizing enterocolitis with adherent bacterial rods, proliferative glomerulonephritis, tubular necrosis, and fibrin thrombi within small vessels and capillaries. Our findings provide evidence for a naturally occurring animal model of EHEC-induced systemic disease that closely resembles human HUS.

Eur J Pediatr, 2002 Dec, 161(12), 668 - 71 Epub 2002 Nov 06.
Venous sampling can be crucial in identifying the testicular origin of idiopathic male luteinising hormone-independent sexual precocity; Richter-Unruh A et al.; It has been recently shown that male LH-independent sexual precocity is caused by a somatic activating mutation in the luteinising hormone receptor (LHR) of Leydig cell tumours. In each of the patients described to date, the tumour was a well-defined, single encapsulated nodule. We present a 5.7-year-old boy with nodular Leydig cell hyperplasia, who harbours a somatic mutation of the LHR gene. The boy showed the clinical features of severe sexual precocity caused by LH-independent testosterone hypersecretion. Congenital adrenal hyperplasia, hCG- or androgen-secreting tumours, McCune-Albright syndrome, and familial male-limited precocious puberty (or testotoxicosis) were all ruled out as possible causes. A hypoechoic area was detected at the cranial pole of his right testis and a biopsy was performed. Histological examination revealed a lack of mature Leydig cells. When DNA from the affected tissue was isolated and sequenced, no somatic mutation of the LHR gene was found. To further determine the origin of the elevated testosterone levels, venous sampling was performed. Blood samples taken from the right spermatic vein showed an elevated serum testosterone concentration of 259 nmol/l. Unilateral orchiectomy of the right testis was performed, and systemic testosterone concentrations normalised. Histological examination revealed nodular Leydig cell hyperplasia. DNA analysis of the nodular tissue showed a heterozygous mutation in exon 11 of the LHR gene, which caused the replacement of aspartic acid at codon 578 by histidine. CONCLUSION: the somatic activating mutation (Asp578His) of the luteinising hormone receptor gene is not only present in Leydig cell adenomas, but can also be found in nodular Leydig cell hyperplasia. Venous sampling can play a vital role in determining the origin of elevated testosterone levels.

Science, 2002 Nov 22, 298(5598), 1582 - 7
Crystal structure of Escherichia coli MscS, a voltage-modulated and mechanosensitive channel; Bass RB et al.; The mechanosensitive channel of small conductance (MscS) responds both to stretching of the cell membrane and to membrane depolarization. The crystal structure at 3.9 angstroms resolution demonstrates that Escherichia coli MscS folds as a membrane-spanning heptamer with a large cytoplasmic region. Each subunit contains three transmembrane helices (TM1, -2, and -3), with the TM3 helices lining the pore, while TM1 and TM2, with membrane-embedded arginines, are likely candidates for the tension and voltage sensors. The transmembrane pore, apparently captured in an open state, connects to a large chamber, formed within the cytoplasmic region, that connects to the cytoplasm through openings that may function as molecular filters. Although MscS is likely to be structurally distinct from other ion channels, similarities in gating mechanisms suggest common structural elements.

J Biol Chem, 2003 Jan 31, 278(5), 3105 - 11 Epub 2002 Nov 21.
GALLEX, a measurement of heterologous association of transmembrane helices in a biological membrane; Schneider D et al.; Whereas a variety of two-hybrid systems are available to measure the interaction of soluble proteins, related methods are significantly less developed for the measurement of membrane protein interactions. Here we present a two-hybrid system to follow the heterodimerization of membrane proteins in the Escherichia coli inner membrane. The method is based on the repression of a reporter gene activity by two LexA DNA binding domains with different DNA binding specificities. When coupled to transmembrane domains, heterodimeric association is reported by repression of beta-galactosidase synthesis. The LexA-transmembrane chimeric proteins were found to correctly insert into the membrane, and reproducible signals were obtained measuring the homodimerization as well as heterodimerization of wild-type and mutant glycophorin A transmembrane helices. The GALLEX data were compared with data recently gained by other methods and discussed in the general context of heteroassociation of single TM helices. Additionally, the formation of heterodimers between the TM domains of the alpha(4) and the beta(7) integrin subunits were tested. The results show that both homo- and heterodimerization of membrane proteins can be measured accurately using the GALLEX system.

J Biol Chem, 2003 Jan 31, 278(5), 3483 - 8 Epub 2002 Nov 21.
Borg/septin interactions and the assembly of mammalian septin heterodimers, trimers, and filaments; Sheffield PJ et al.; Septins constitute a family of guanine nucleotide-binding proteins that were first discovered in the yeast Saccharomyces cerevisiae but are also present in many other eukaryotes. In yeast they congregate at the bud neck and are required for cell division. Their function in metazoan cells is uncertain, but they have been implicated in exocytosis and cytokinesis. Septins have been purified from cells as hetero-oligomeric filaments, but their mechanism of assembly is unknown. Further studies have been limited by the difficulty in expressing functional septin proteins in bacteria. We now show that stable, soluble septin heterodimers can be produced by co-expression from bicistronic vectors in bacteria and that the co-expression of three septins results in their assembly into filaments. Pre-assembled dimers and trimers bind guanine nucleotide and show a slow GTPase activity. The assembly of a heterodimer from monomers in vitro is accompanied by GTP hydrolysis. Borg3, a downstream effector of the Cdc42 GTPase, binds specifically to a septin heterodimer composed of Sept6 and Sept7 and to the Sept2/6/7 trimer, but not to septin monomers or to other heterodimers. Septins associate through their C-terminal coiled-coil domains, and Borg3 appears to recognize the interface between these domains in Sept6 and Sept7.

J Biol Chem, 2003 Jan 17, 278(3), 1407 - 10 Epub 2002 Nov 22.
The Escherichia coli copper-responsive copA promoter is activated by gold; Stoyanov JV et al.; The copA gene of Escherichia coli encodes a copper transporter and its promoter is normally regulated by Cu(I) ions and CueR, a MerR-like transcriptional activator. We show that CueR can also be activated by gold salts and that Cys(112) and Cys(120) are involved in recognition of gold, silver, and copper salts. Gold activation is unaffected by copper chelating agents but is affected by general metal chelators. This is the first example of specific regulation of transcription by gold, and we briefly speculate that the biological effects of gold antiarthritic drugs may be through their effects on copper management in eukaryotic systems.

J Biol Chem, 2003 Jan 31, 278(5), 3055 - 62 Epub 2002 Nov 20.
Involvement of tyrosines 114 and 139 of subunit NuoB in the proton pathway around cluster N2 in Escherichia coli NADH:ubiquinone oxidoreductase; Flemming D et al.; The proton-pumping NADH:ubiquinone oxidoreductase (complex I) couples the transfer of electrons from NADH to ubiquinone with the translocation of protons across the membrane. Electron transfer is accomplished by FMN and a series of iron-sulfur clusters. Its coupling with proton translocation is not yet understood. Here, we report that the redox reaction of the FeS cluster N2 located on subunit NuoB of the Escherichia coli complex I induces a protonation/deprotonation of tyrosine side chains. Electrochemically induced FT-IR difference spectra revealed characteristic tyrosine signals at 1,515 and 1,498 cm(-1) for the protonated and deprotonated form, respectively. Mutants of three conserved tyrosines on NuoB were generated by complementing a chromosomal in-frame deletion strain with nuoB on a plasmid. Though the single mutations did not alter the electron transport activity of complex I, the EPR signal of cluster N2 was slightly shifted. The tyrosine signals detected by FT-IR spectroscopy were roughly halved in the mutants Y114C and Y139C while only minor changes were detected in the Y154H mutant. The enzymatic activity of the Y114C/Y139F double mutant was 80% reduced, and FT-IR difference spectra of the double mutant revealed a complete loss of the modes characteristic for protonation reactions of tyrosines. Therefore, we propose that tyrosines 114 and 139 on NuoB were protonated upon reduction of cluster N2 and were thus involved in the proton-transfer reaction coupled with its redox reaction.

J Biol Chem, 2003 Jan 24, 278(4), 2636 - 44 Epub 2002 Nov 20.
Solution scattering suggests cross-linking function of telethonin in the complex with titin; Zou P et al.; Telethonin interacts specifically with the two Z-disk IG-like domains (Z1Z2) at the N terminus of titin, the largest presently known protein. Analytical ultracentrifugation and synchrotron radiation x-ray scattering were employed to study the solution structures of Z1Z2 and its complexes with telethonin, and low resolution models were constructed ab initio from the scattering data. A seven residues-long polyhistidine tag was localized at the tip of the Z1 domain by comparison of independent models of native and His-tagged versions of Z1Z2. The stoichiometry and shape of the complex between the telethonin construct lacking the C terminus and Z1Z2 indicate antiparallel association of two Z1Z2 molecules with telethonin acting as a central linker. The complex of full-length telethonin with Z1Z2 appears to also have a 1:2 stoichiometry at concentrations below 1 mg/ml but dimerizes at higher concentrations. These results suggest a possible role of telethonin in linking titin filaments at the Z-disk periphery.

J Bacteriol, 2002 Dec, 184(24), 7047 - 54
The DnaK chaperone is necessary for alpha-complementation of beta-galactosidase in Escherichia coli; Lopes Ferreira N et al.; We show here the involvement of the molecular chaperone DnaK from Escherichia coli in the in vivo alpha-complementation of the beta-galactosidase. In the dnaK756(Ts) mutant, alpha-complementation occurs when the organisms are grown at 30 degrees C but not at 37 or 40 degrees C, although these temperatures are permissive for bacterial growth. Plasmid-driven expression of wild-type dnaK restores the alpha-complementation in the mutant but also stimulates it in a dnaK(+) strain. In a mutant which contains a disrupted dnaK gene (DeltadnaK52::Cm(r)), alpha-complementation is also impaired, even at 30 degrees C. This observation provides an easy and original phenotype to detect subtle functional changes in a protein such as the DnaK756 chaperone, within the physiologically relevant temperature.

J Bacteriol, 2002 Dec, 184(24), 6976 - 86
The Escherichia coli gabDTPC operon: specific gamma-aminobutyrate catabolism and nonspecific induction; Schneider BL et al.; Nitrogen limitation induces the nitrogen-regulated (Ntr) response, which includes proteins that assimilate ammonia and scavenge nitrogen. Nitrogen limitation also induces catabolic pathways that degrade four metabolically related compounds: putrescine, arginine, ornithine, and gamma-aminobutyrate (GABA). We analyzed the structure, function, and regulation of the gab operon, whose products degrade GABA, a proposed intermediate in putrescine catabolism. We showed that the gabDTPC gene cluster constitutes an operon based partially on coregulation of GabT and GabD activities and the polarity of an insertion in gabT on gabC. A DeltagabDT mutant grew normally on all of the nitrogen sources tested except GABA. The unexpected growth with putrescine resulted from specific induction of gab-independent enzymes. Nac was required for gab transcription in vivo and in vitro. Ntr induction did not require GABA, but various nitrogen sources did not induce enzyme activity equally. A gabC (formerly ygaE) mutant grew faster with GABA and had elevated levels of gab operon products, which suggests that GabC is a repressor. GabC is proposed to reduce nitrogen source-specific modulation of expression. Unlike a wild-type strain, a gabC mutant utilized GABA as a carbon source and such growth required sigma(S). Previous studies showing sigma(S)-dependent gab expression in stationary phase involved gabC mutants, which suggests that such expression does not occur in wild-type strains. The seemingly narrow catabolic function of the gab operon is contrasted with the nonspecific (nitrogen source-independent) induction. We propose that the gab operon and the Ntr response itself contribute to putrescine and polyamine homeostasis.

J Bacteriol, 2002 Dec, 184(24), 6866 - 72
Transcription-dependent increase in multiple classes of base substitution mutations in Escherichia coli; Klapacz J et al.; We showed previously that transcription in Escherichia coli promotes C.G-to-T.A transitions due to increased deamination of cytosines to uracils in the nontranscribed but not the transcribed strand (A. Beletskii and A. S. Bhagwat, Proc. Natl. Acad. Sci. USA 93:13919-13924, 1996). To study mutations other than that of C to T, we developed a new genetic assay that selects only base substitution mutations and additionally excludes C.G to T.A transitions. This novel genetic reversion system is based on mutations in a termination codon and involves positive selection for resistance to bleomycin or kanamycin. Using this genetic system, we show here that transcription from a strong promoter increases the level of non-C-to-T as well as C-to-T mutations. We find that high-level transcription increases the level of non-C-to-T mutations in DNA repair-proficient cells in three different sequence contexts in two genes and that the rate of mutation is higher by a factor of 2 to 4 under these conditions. These increases are not caused by a growth advantage for the revertants and are restricted to genes that are induced for transcription. In particular, high levels of transcription do not create a general mutator phenotype in E. coli. Sequence analysis of the revertants revealed that the frequency of several different base substitutions increased upon transcription of the bleomycin resistance gene and that G.C-to-T.A transversions dominated the spectrum in cells transcribing the gene. These results suggest that high levels of transcription promote many different spontaneous base substitutions in E. coli.

J Bacteriol, 2002 Dec, 184(24), 6820 - 9
Requirement for IscS in biosynthesis of all thionucleosides in Escherichia coli; Lauhon CT; Escherichia coli tRNA contains four naturally occurring nucleosides modified with sulfur. Cysteine is the intracellular sulfur source for each of these modified bases. We previously found that the iscS gene, a member of the nifS cysteine desulfurase gene family, is required for 4-thiouridine biosynthesis in E. coli. Since IscS does not bind tRNA, its role is the mobilization and distribution of sulfur to enzymes that catalyze the sulfur insertion steps. In addition to iscS, E. coli contains two other nifS homologs, csdA and csdB, each of which has cysteine desulfurase activity and could potentially donate sulfur for thionucleoside biosynthesis. Double csdA csdB and iscS csdA mutants were prepared or obtained, and all mutants were analyzed for thionucleoside content. It was found that unfractionated tRNA isolated from the iscS mutant strain contained <5% of the level of sulfur found in the parent strain. High-pressure liquid chromatography analysis of tRNA nuclease digests from the mutant strain grown in the presence of {(35)S}cysteine showed that only a small fraction of 2-thiocytidine was present, while the other thionucleosides were absent when cells were isolated during log phase. As expected, digests from the iscS mutant strain contained 6-N-dimethylallyl adenosine (i(6)A) in place of 6-N-dimethylallyl-2-methylthioadenosine and 5-methylaminomethyl uridine (mnm(5)U) instead of 5-methylaminomethyl-2-thiouridine. Prolonged growth of the iscS and iscS csdA mutant strains revealed a gradual increase in levels of 2-thiocytidine and 6-N-dimethylallyl-2-methylthioadenosine with extended incubation (>24 h), while the thiouridines remained absent. This may be due to a residual level of Fe-S cluster biosynthesis in iscS deletion strains. An overall scheme for thionucleoside biosynthesis in E. coli is discussed.

J Bacteriol, 2002 Dec, 184(24), 6796 - 802
Regulation of expression of mas and fadD28, two genes involved in production of dimycocerosyl phthiocerol, a virulence factor of Mycobacterium tuberculosis; Sirakova TD et al.; Transcriptional regulation of genes involved in the biosynthesis of cell wall lipids of Mycobacterium tuberculosis is poorly understood. The gene encoding mycocerosic acid synthase (mas) and fadD28, an adjoining acyl coenzyme A synthase gene, involved in the production of a virulence factor, dimycocerosyl phthiocerol, were cloned from Mycobacterium bovis BCG, and their promoters were analyzed. The putative promoters were fused to the xylE reporter gene, and its expression was measured in Escherichia coli, Mycobacterium smegmatis, and M. bovis BCG. In E. coli, the fadD28 promoter was not functional but the mas promoter was functional. Both fadD28 and mas promoters were functional in M. smegmatis, at approximately two- and sixfold-higher levels, respectively, than the BCG hsp60 promoter. In M. bovis BCG, the fadD28 and mas promoters were functional at three- and fivefold-higher levels, respectively, than the hsp60 promoter. Primer extension analyses identified transcriptional start points 60 and 182 bp upstream of the translational start codons of fadD28 and mas, respectively. Both promoters contain sequences similar to the canonical -10 and -35 hexamers recognized by the sigma(70) subunit of RNA polymerase. Deletions of the upstream regions of both genes indicated that 324 bp of the fadD28 and 228 bp of the mas were essential for promoter activity. Further analysis of the mas promoter showed that a 213-bp region 581 bp upstream of the mas promoter acted as a putative transcriptional enhancer, promoting high-level expression of the mas gene when present in either direction. This represents the identification of a rare example of an enhancer-like element in mycobacteria.

Free Radic Biol Med, 2002 Dec 1, 33(11), 1563 - 74
Specificity and kinetics of a mitochondrial peroxiredoxin of Leishmania infantum; Castro H et al.; In Kinetoplastida, comprising the medically important parasites Trypanosoma brucei, T. cruzi, and Leishmania species, 2-Cys peroxiredoxins described to date have been shown to catalyze reduction of peroxides by the specific thiol trypanothione using tryparedoxin, a thioredoxin-related protein, as an immediate electron donor. Here we show that a mitochondrial peroxiredoxin from L. infantum (LimTXNPx) is also a tryparedoxin peroxidase. In an heterologous system constituted by nicotinamide adenine dinucleotide phosphate (NADPH), T. cruzi trypanothione reductase, trypanothione and Crithidia fasciculata tryparedoxin (CfTXN1 and CfTXN2), the recombinant enzyme purified from Escherichia coli as an N-terminally His-tagged protein preferentially reduces H(2)O(2) and tert-butyl hydroperoxide and less actively cumene hydroperoxide. Linoleic acid hydroperoxide and phosphatidyl choline hydroperoxide are poor substrates in the sense that they are reduced weakly and inhibit the enzyme in a concentration- and time-dependent way. Kinetic parameters deduced for LimTXNPx are a k(cat) of 37.0 s(-1) and K(m) values of 31.9 and 9.1 microM for CfTXN2 and tert-butyl hydroperoxide, respectively. Kinetic analysis indicates that LimTXNPx does not follow the classic ping-pong mechanism described for other TXNPx (Phi(1,2) = 0.8 s x microM(2)). Although the molecular mechanism underlying this finding is unknown, we propose that cooperativity between the redox centers of subunits may explain the unusual kinetic behavior observed. This hypothesis is corroborated by high-resolution electron microscopy and gel chromatography that reveal the native enzyme to preferentially exist as a homodecameric ring structure composed of five dimers.

Biochem Pharmacol, 2002 Dec 15, 64(12), 1785 - 91
Role of poly(ADP-ribose) polymerase activation in endotoxin-induced cardiac collapse in rodents; Pacher P et al.; Reactive oxygen and nitrogen species are overproduced in the cardiovascular system during circulatory shock. Oxidant-induced cell injury involves the activation of poly(ADP-ribose) polymerase (PARP). Using a dual approach of PARP-1 suppression, by genetic deletion or pharmacological inhibition with the new potent phenanthridinone PARP inhibitor PJ34 {the hydrochloride salt of N-(oxo-5,6-dihydro-phenanthridin-2-yl)-N,N-dimethylacetamide}, we studied whether the impaired cardiac function in endotoxic shock is dependent upon the PARP pathway. Escherichia coli endotoxin (lipopolysaccharide, LPS) at 55 mg/kg, i.p., induced a severe depression of the systolic and diastolic contractile function, tachycardia, and a reduction in mean arterial blood pressure in both rats and mice. Treatment with PJ34 significantly improved cardiac function and increased the survival of rodents. In addition, LPS-induced depression of left ventricular performance was significantly less pronounced in PARP-1 knockout mice (PARP(-/-)) as compared with their wild-type littermates (PARP(+/+)). Thus, PARP activation in the cardiovascular system is an important contributory factor to the cardiac collapse and death associated with endotoxin shock.

Psychoneuroendocrinology, 2003 Jan, 28(1), 19 - 34
Chronic treatment with the antidepressant tianeptine attenuates lipopolysaccharide-induced Fos expression in the rat paraventricular nucleus and HPA axis activation; Castanon N et al.; The antidepressant tianeptine has been shown to decrease the response of the hypothalamic-pituitary-adrenal (HPA) axis to stress and to attenuate the behavioral effects of the cytokine inducer, lipopolysaccharide (LPS). Since LPS also activates the HPA axis, the objective of this study was to assess the effects of tianeptine on the HPA axis activation and Fos expression induced by intraperitoneal (i.p.) administration of LPS (30 and 250 microg/kg respectively). Chronic, but not acute, tianeptine treatment (10 mg/kg twice a day for 15 days, i.p.) attenuated LPS-induced increase of plasma ACTH and corticosterone in rats bearing an indwelling catheter in the jugular vein and Fos immunoreactivity in the paraventricular nucleus (PVN). These results open new vistas on the pharmacological activity of tianeptine and provide further insights on the action mechanisms of antidepressants in clinics.

Biochem Biophys Res Commun, 2002 Dec 6, 299(3), 438 - 45
The functional analysis of directed amino-acid alterations in ZntR from Escherichia coli; Khan S et al.; The ZntR protein from Escherichia coli is a member of the MerR-family of transcriptional regulatory proteins and acts as a hyper-sensitive transcriptional switch primarily in response to Zn(II) and Cd(II). The binding of metal-ions to ZntR initiates a mechanism that remodels the cognate promoter, increasing its affinity for RNA polymerase. We have introduced site-directed mutations into zntR and shown that cysteine and histidine residues are important for transcriptional control and have an effect on metal-ion preference, sensitivity and magnitude of induction. We propose a three-dimensional model of the N-terminal region of ZntR based upon the coordinates of the MerR-family regulator BmrR.

Biochem Biophys Res Commun, 2002 Dec 6, 299(3), 395 - 9
Phosphate depletion enhances bone morphogenetic protein-4 gene expression in a cultured mouse marrow stromal cell line ST2; Goseki-Sone M et al.; Alkaline phosphatases (ALPs) are glycosylated, membrane-bound enzymes that hydrolyze various monophosphate esters at an optimum high pH and are present in nearly all living organisms. In Escherichia coli, extracellular phosphate (Pi) limitation induces the ALP gene, indicating a role of extracellular Pi in ALP gene regulation. However, little is known about similar mechanisms in mammalian cells. Previously, we reported that Pi starvation increased the tissue-nonspecific ALP (TNSALP) activity and regulated its expression in the mouse stromal cell line ST2, derived from mouse bone marrow. In the present study, we further examined the effects of Pi starvation on the mechanism of TNSALP induction. The specific activity of TNSALP increased markedly after treatment by Pi starvation for 5 days and RT-PCR analysis revealed that the mRNA of the bone morphogenetic protein-4 (BMP-4) gene was highly stimulated. The combination of Pi depletion and mouse BMP-4 receptor IA/Fc chimera down-regulated the TNSALP activity. These results indicated that Pi depletion stimulates the TNSALP activity for the Pi supplementation, and that this system may involve the signaling pathway of the BMP-4 gene at the transcription level.

J Mol Biol, 2002 Nov 29, 324(3), 457 - 68
Crystal structure of Escherichia coli alkanesulfonate monooxygenase SsuD; Eichhorn E et al.; The FMNH(2)-dependent alkanesulfonate monooxygenase SsuD catalyzes the conversion of alkanesulfonates to the corresponding aldehyde and sulfite. The enzyme allows Escherichia coli to use a wide range of alkanesulfonates as sulfur sources for growth when sulfate or cysteine are not available. The structure of SsuD was solved using the multiwavelength anomalous dispersion method from only four ordered selenium sites per asymmetric unit (one site per 20,800 Da). The final model includes 328 of 380 amino acid residues and was refined to an R-factor of 23.5% (R(free)=27.5%) at 2.3 Å resolution. The X-ray crystal structure of SsuD shows a homotetrameric state for the enzyme, each subunit being composed of a TIM-barrel fold enlarged by four insertion regions that contribute to intersubunit interactions. SsuD is structurally related to a bacterial luciferase and an archaeal coenzyme F(420)-dependent reductase in spite of a low level of sequence identity with these enzymes. The structural relationship is not limited to the beta-barrel region; it includes most but not all extension regions and shows distinct properties for the SsuD TIM-barrel. A likely substrate-binding site is postulated on the basis of the SsuD structure presented here, results from earlier biochemical studies, and structure relatedness to bacterial luciferase. SsuD is related to other FMNH(2)-dependent monooxygenases that show distant sequence relationship to luciferase. Thus, the structure reported here provides a model for enzymes belonging to this family and suggests that they might all fold as TIM-barrel proteins.

Chem Biol, 2002 Nov, 9(11), 1237 - 45
Characterization of a specificity factor for an AAA+ ATPase: assembly of SspB dimers with ssrA-tagged proteins and the ClpX hexamer; Wah DA et al.; SspB, a specificity factor for the ATP-dependent ClpXP protease, stimulates proteolysis of protein substrates bearing the ssrA degradation tag. The SspB protein is shown here to form a stable homodimer with two independent binding sites for ssrA-tagged proteins or peptides. SspB by itself binds to ClpX and stimulates the ATPase activity of this enzyme. In the presence of ATPgammaS, a ternary complex of SspB, GFP-ssrA, and the ClpX ATPase was sufficiently stable to isolate by gel-filtration or ion-exchange chromatography. This complex consists of one SspB dimer, two molecules of GFP-ssrA, and one ClpX hexamer. SspB dimers do not commit bound substrates to ClpXP degradation but increase the affinity and cooperativity of binding of ssrA-tagged substrates to ClpX, facilitating enhanced degradation at low substrate concentrations.

FEMS Microbiol Lett, 2002 Nov 19, 217(1), 89 - 94
PCR performance of the highly thermostable proof-reading B-type DNA polymerase from Pyrococcus abyssi; Dietrich J et al.; DNA polymerase from the archaeon Pyrococcus abyssi strain Orsay was expressed in Escherichia coli. The recombinant DNA polymerase (Pab) was purified to homogeneity by heat treatment followed by 5 steps of chromatography and characterized for PCR applications. Buffer optimization experiments indicated that Pab PCR performance and fidelity parameters were highest in the presence of 20 mM Tris-HCl, pH 9.0, 1.5 mM MgSO4, 25 mM KCl, 10 mM (NH4)2SO4 and 40 microM of each dNTP. Under these conditions, the error rate was 0.66 x 10(-6) mutations/nucleotide/duplication. Pab DNA polymerase, having a half life of 5 h at 100 degrees C, was demonstrated to be highly thermostable in PCR conditions compared to commercial Taq and Pfu DNA polymerases. These characteristics enable Pab to be one of the most efficient thermostable DNA polymerases described, exhibiting very high accuracy compared to other available commercial DNA polymerases and robust thermostable activity. This new DNA polymerase is currently on the market under the name Isis DNA Polymerase (Qbiogene Molecular Biology).

The chemical structure of DNA

In the 1950s, only a few groups made it their goal to determine the structure of DNA. These included an American group led by Linus Pauling, and two groups in Britain. At the University of Cambridge, Crick and Watson were building physical models using metal rods and balls, in which they incorporated the known chemical structures of the nucleotides, as well as the known position of the linkages joining one nucleotide to the next along the polymer. At King's College, London, Maurice Wilkins and Rosalind Franklin were examining x-ray diffraction patterns of DNA fibers.

A key inspiration in the work of all of these teams was the discovery in 1948 by Pauling that many proteins included helical (see alpha helix) shapes. Pauling had deduced this structure from x-ray patterns. Even in the initial crude diffraction data from DNA, it was evident that the structure involved helices. But this insight was only a beginning. There remained the questions of how many strands came together, whether this number was the same for every helix, whether the bases pointed toward the helical axis or away, and ultimately what were the explicit angles and coordinates of all the bonds and atoms. Such questions motivated the modeling efforts of Watson and Crick.

In their modeling, Watson and Crick restricted themselves to what they saw as chemically and biologically reasonable. Still, the breadth of possibilities was very wide. A breakthrough occurred in 1952, when Erwin Chargaff visited Cambridge and inspired Crick with a description of experiments Chargaff had published in 1947. Chargaff had observed that the proportions of the four nucleotides vary between one DNA sample and the next, but that for particular pairs of nucleotides -- adenine and thymine, guanine and cytosine -- the two nucleotides are always present in equal proportions.

Watson and Crick had begun to contemplate double helical arrangements, and they saw that by reversing the directionality of one strand with respect to the other, they could provide an explanation for Chargaff's puzzling finding. This explanation was the complementary pairing of the bases, which also had the effect of ensuring that the distance between the phosphate chains did not vary along a sequence. Watson and Crick were able to discern that this distance was constant and to measure its exact value of 2 nanometres from an X-ray pattern obtained by Franklin. The same pattern also gave them the 3.4 nanometre-per-10 bp "pitch" of the helix. The pair quickly converged upon a model, which they announced before Franklin herself published any of her work.

The great assistance Watson and Crick derived from Franklin's data has become a subject of controversy, and it has angered people who believe Franklin has not received the credit due to her. The most controversial aspect is that Franklin's critical X-ray pattern was shown to Watson and Crick without Franklin's knowledge or permission. Wilkins showed it to them at his lab while Franklin was away.

Watson and Crick's model attracted great interest immediately upon its presentation. Arriving at their conclusion on February 21, 1953, Watson and Crick made their first announcement on February 28. Their paper 'A Structure for Deoxyribose Nucleic Acid' (http://www.nature.com/genomics/human/watson-crick/) was published on April 25. In an influential presentation in 1957, Crick laid out the "Central Dogma", which foretold the relationship between DNA, RNA, and proteins, and articulated the "sequence hypothesis." A critical confirmation of the replication mechanism that was implied by the double-helical structure followed in 1958 in the form of the Meselson-Stahl experiment. Work by Crick and coworkers deciphered the genetic code not long afterward. These findings represent the birth of molecular biology.

Watson, Crick, and Wilkins were awarded the 1962 Nobel Prize in Physiology or Medicine for discovering the molecular structure of DNA. Franklin had died in 1958, and Nobel prizes are not awarded posthumously.

DNA-binding proteins are a broad class of protein molecules that possess certain structural motifs (e.g. helices) which enable them to bind stably to either double- or single-stranded DNA. Examples include proteins whose primary function is to regulate the expression of specific genes (termed transcription factors), proteins involved in the packaging of DNA within the nucleus (histones), nucleic acid-dependent polymerases involved in DNA replication and the transcription of mRNA, and the many accessory proteins involved in these processes.

I Introduction

DNA (deoxyribonucleic acid), molecule that acts as the mechanism of biological inheritance in almost all living creatures. DNA is found in nearly all cells and contains the coded instructions that control the workings of the cell. DNA is passed from parents to offspring, and contains the coded instructions that enable the offspring to develop from a single cell into an adult body. DNA is the most important molecule in life, and an understanding of the structure and function of DNA has been the most important development in biology during the past half-century or more.

II History

Biologists had known since the late 19th century that the structures called chromosomes carry the hereditary material from parents to offspring. Chromosomes exist in a particular part of the cell—the cell nucleus—and are large enough to be visible at certain times with a light microscope. When visible, chromosomes are long, flexible entities resembling a tiny piece of string. Chromosomes contain two molecular constituents: proteins and DNA. The molecular basis of heredity was likely to reside in one or other of these molecules. For many years biologists suspected that the chromosomal proteins were more likely to be the hereditary molecules because proteins were known to be highly variable in their molecular sub-structure, whereas DNA seemed to have a rather uniform structure. It was difficult to see how a uniform molecule could give rise to the great variety of life.

However, DNA was shown to be the molecule of inheritance by an experiment in 1944. (In fact the 1944 experiment is just the most famous landmark in a series of related experiments by several teams of biologists.) In that year a group of American biochemists, O. T. Avery, C. M. MacLeod, and M. J. McCarty, showed that DNA causes a phenomenon called “transformation” in bacteria. In transformation, the properties of bacterial cells are altered when the bacteria are mixed with other bacteria of a different form. Something is passed between the two kinds of bacteria, causing transformation. Avery, MacLeod, and McCarty purified the various kinds of molecule in the cells (fats, proteins, and sugars as well as nucleic acids) and showed that only DNA causes transformation.

The next important advance was the discovery by the American biologist James Watson and British biophysicist Francis Crick in 1953 that DNA has the structure of a double helix. DNA is made up of three kinds of sub-molecule: phosphate, deoxyribose (a sugar), and various bases. DNA contains four kinds of base: adenine (A), cytosine (C), guanine (G), and thymine (T). The Austrian biochemist Erwin Chargaff had noticed that in DNA the amount of A equals the amount of T, and the amount of G equals the amount of C. This suggested that each A is bonded to a T, and each G to a C. (The two rules, that in DNA A bonds with T and G bonds with C, are called the “base-pairing rules”.)
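The base-pairing rules can be illustrated with a short Python sketch. It is only an illustration: the function names `complement_strand` and `base_counts` are hypothetical, and the example sequence is arbitrary. Given one strand, the rules fix the partner strand, and counting the bases over both strands reproduces Chargaff's observation that A equals T and G equals C.

```python
from collections import Counter

# Base-pairing rules: A pairs with T, G pairs with C.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

def complement_strand(strand):
    """Return the partner strand, written in the opposite direction
    (the two strands of the double helix run head-to-toe)."""
    return "".join(PAIR[base] for base in reversed(strand))

def base_counts(strand):
    """Count each base over both strands of the double-stranded molecule."""
    return Counter(strand + complement_strand(strand))

strand = "GATACCA"                     # arbitrary example sequence
print(complement_strand(strand))       # TGGTATC
counts = base_counts(strand)
print(counts["A"] == counts["T"], counts["G"] == counts["C"])  # True True
```

The equalities hold for any input sequence, because every A on one strand contributes exactly one T to the other, and likewise for G and C.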

The other main kind of evidence used by Watson and Crick was that obtained by X-ray diffraction. X-ray diffraction is a method used to deduce the structure of molecules that are too small to observe directly. In the case of DNA, X-ray diffraction suggested that the molecule was some sort of helix. In Watson and Crick's model, DNA consists of two strands (or “backbones”) made up of alternating phosphate and deoxyribose molecules; the bases are attached to the deoxyribose and stick out from the two strands. G bases in one strand are bonded to C bases in the other, and A bases to T bases. The bonds between the bases are hydrogen bonds. In all, the two strands are bonded together on the inside of the molecule by the bases; each strand is twisted into a helical shape, making a double helix.

Knowledge of a molecule's structure does not always help in understanding how the molecule works, but in the case of DNA, Watson and Crick's discovery was hugely important. The double helix pointed to ways in which DNA could be reproduced and to ways in which DNA could encode information.

III How Biological Information is Encoded in DNA

DNA consists of two long, complementary chains of nucleotides. A nucleotide contains one deoxyribose-phosphate link in the DNA chain, and one base. The deoxyribose-phosphate part is always the same, but there are four kinds of base and therefore four kinds of nucleotide. The letters A, C, G, and T can stand for either the bases or the nucleotides containing those bases. The DNA can then be imagined as a long series of letters, corresponding to the order of the nucleotides down the chain. A stretch of DNA can then be symbolized by something like GATACCA... The DNA in a human cell consists of about 6 billion nucleotide letters arranged in a particular sequence. The full set of DNA in a cell is called the genome.

However, the word “genome” can be used in two ways. The 6 billion nucleotides of DNA in a human cell are made up of two equal sets of 3 billion nucleotides, one inherited from that individual's mother and the other from that individual's father. The maternal and paternal DNA contain a very similar set of information, and the basic human genome consists of about 3 billion nucleotides. This is the length of DNA studied in the human genome project. However, the full genome of a cell has two such sets of DNA, and “genome” can refer to either the unit, or the doubled-up, DNA set of a cell.
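To give a feel for these sizes, the sketch below estimates how much storage the raw letter sequence would need. It assumes each nucleotide is one of four equally likely letters and is stored with no compression, so each letter needs 2 bits. The function name `genome_megabytes` is hypothetical, and the figures are rough illustrations rather than measured values.

```python
import math

# With four possible letters (A, C, G, T), each position carries log2(4) = 2 bits.
BITS_PER_NUCLEOTIDE = math.log2(4)

def genome_megabytes(nucleotides):
    """Raw storage for a sequence of the given length, in megabytes (10**6 bytes)."""
    return nucleotides * BITS_PER_NUCLEOTIDE / 8 / 1e6

print(genome_megabytes(3_000_000_000))  # single set: 750.0
print(genome_megabytes(6_000_000_000))  # doubled-up set: 1500.0
```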

Genomes differ between the two main categories of living things, prokaryotes and eukaryotes. Bacteria are the main examples of prokaryotes; almost all multicellular life forms—including mushrooms, flowers, trees, insects, and humans—are eukaryotes. Some single-celled eukaryotes, such as Paramecium and yeasts, also exist. In eukaryotes, most of the DNA is located in a special region of the cell, the nucleus. However, eukaryotic cells also have DNA outside the nucleus, in structures called organelles. In animal cells, the only organelles to contain DNA are the mitochondria. In plant cells, organellar DNA is found in chloroplasts as well as in mitochondria. Prokaryotes, such as bacteria, lack a distinct nucleus; they also lack organelles that contain DNA. Prokaryotic DNA exists in the cell but not in any special compartment. The genome of eukaryotes, therefore, can be divided into nuclear and organellar DNA. The nuclear DNA makes up the overwhelming majority: the 3 billion nucleotides of the human genome are the nuclear component. A mitochondrion contains only about 16,000 nucleotides of DNA.

The information in the DNA exists in units called genes. To a crude approximation, one gene codes for one protein. Most bodily functions are, at a molecular level, carried out by proteins. Some proteins act to digest food, other proteins act to defend our bodies from infectious disease, and others transport materials around the body. A few thousand kinds of protein are needed in the workings of one cell, and a few tens of thousands of proteins are needed in the workings of a multicellular body such as a human body. These thousands of kinds of proteins are coded for in the DNA by thousands of genes.

A gene is a stretch of nucleotides in the DNA. Not all of the DNA forms part of a gene. Indeed, in human DNA only about 5 per cent or less of the DNA codes for genes. The rest is referred to as non-coding DNA and is of uncertain function. The DNA consists of genes separated by inter-genic regions of non-coding DNA.

As noted above, it is not exactly correct to say that a gene codes for a protein. Some proteins, such as haemoglobin, are assembled from the products of more than one gene. One adult haemoglobin molecule is assembled from four chains, two alpha and two beta, and the alpha and beta chains are coded for by separate genes in the DNA. The four units that are assembled to make a haemoglobin molecule are called polypeptides. All proteins are made up of one or more polypeptides. A more accurate definition of a gene is a stretch of DNA that codes for one polypeptide. However, even this definition has exceptions. For a start, some genes code not for proteins but for RNA molecules (such as ribosomal or transfer RNA) that are never translated into protein. Also, some genes can be read in multiple ways, and code for more than one polypeptide (see the description of alternative splicing under “Introns and Exons” below). These exceptions could be dealt with by defining a gene as a stretch of DNA that has a continuous RNA molecule transcribed from it.

The mechanism by which a protein is read off from the DNA is understood in detail. A protein is a chain of amino acids. Living things mainly use 20 different amino acids, and the properties of a protein follow from its particular sequence of these 20 amino acids. In the DNA, three nucleotides code for one amino acid. There are four different kinds of nucleotide in the DNA (A, C, G, and T) making 64 possible sets of three nucleotides (AAA, AAC, ACC... and so on). One set of three nucleotides, coding for one amino acid, is called a codon.

In the late 1950s and the 1960s, molecular biologists worked out which codon coded for which amino acid. The result is a table showing which amino acid is coded for by all of the 64 codons. This table is called the genetic code. For instance, an AAA codon in the DNA codes for the amino acid phenylalanine and a GAA codon codes for leucine. The DNA sequence GAAAAA codes for a leucine followed by a phenylalanine in a protein.

The genetic code contains more codons (64) than are needed to code for all 20 kinds of amino acid. The genetic code is sometimes described as showing “degeneracy” or “redundancy” for this reason. The three-fold redundancy follows inevitably from the use of four bases to code for twenty amino acids. If a set of only two bases were used (AA, AT... and so on), there would only be 16 coding units, which is not enough. The next step up, from two to three nucleotides, takes us from 16 to 64 codons. It is impossible for four bases to be arranged into exactly 20 coding units (one per amino acid), at least in a non-overlapping code. (In theory, a sequence such as ACATAA... could be read in an overlapping manner: the initial ACA, for instance, would code for the first amino acid; then CAT could code for the second amino acid; then ATA for the third, and so on. This would be one example of an overlapping code. It was once thought that DNA uses an overlapping code, because one kind of overlapping code was found to contain 20 distinct units. However, in fact DNA uses a non-overlapping code, with each triplet of nucleotides coding for one amino acid, and the code contains redundancy.)
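The counting argument above can be checked directly. The following is a minimal sketch in Python (the function name `coding_units` is chosen here for illustration and does not come from the text): it lists every possible two-base and three-base unit built from the four bases.

```python
from itertools import product

BASES = "ACGT"

def coding_units(length):
    """Return every possible sequence of `length` bases, in alphabetical order."""
    return ["".join(p) for p in product(BASES, repeat=length)]

doublets = coding_units(2)
triplets = coding_units(3)

print(len(doublets))  # 16, fewer than the 20 amino acids
print(len(triplets))  # 64, more than the 20 amino acids
print(triplets[:3])   # ['AAA', 'AAC', 'AAG']
```

Two bases give too few units and three give a surplus, which is why a non-overlapping triplet code has to be redundant.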

IV Transcription and Translation

The information in the DNA is converted to make proteins in two main stages. The first is transcription. A molecule called messenger RNA (mRNA) is synthesized directly on the DNA. The DNA double helix is first unwound, and the two strands are separated. The mRNA molecule is then formed on one of the DNA strands. mRNA is, like DNA, a chain of nucleotides, with the minor difference that it contains the base uracil (U) instead of thymine (T). The nucleotides of mRNA bond to the DNA strand, with the same pairing rules as in the DNA double helix. If the DNA sequence is ACAGT, a messenger RNA with the sequence UGUCA will be formed from it. The enzyme RNA polymerase catalyses the process of transcription. The mRNA that is transcribed from a gene contains the same information as the DNA strand, but in a form from which a protein can be produced.
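The pairing rule used in transcription can be sketched in a few lines of Python. This is only an illustration: the names `DNA_TO_MRNA` and `transcribe` are invented here, and the sketch follows the text in ignoring the direction in which the strands are read.

```python
# Pairing rules for building mRNA on a DNA strand (U takes the place of T in RNA).
DNA_TO_MRNA = {"A": "U", "C": "G", "G": "C", "T": "A"}

def transcribe(template_dna):
    """Return the mRNA formed base by base on the given DNA strand."""
    return "".join(DNA_TO_MRNA[base] for base in template_dna)

print(transcribe("ACAGT"))  # UGUCA, the example given in the text
```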

(The genetic code is conventionally written in terms of mRNA code, not DNA code. It was stated above that the codon AAA in DNA codes for phenylalanine. AAA in DNA-code corresponds to UUU in mRNA-code, and in the standard table for the genetic code UUU is given as one of the codons for phenylalanine [see table].)

Once the transcription of an mRNA molecule is complete, the mRNA separates from the gene. After various kinds of post-transcriptional modification (see RNA) it makes its way to another structure in the cell, called a ribosome. Ribosomes are the sites of protein assembly and are built largely from a second kind of RNA, called ribosomal RNA (rRNA), together with proteins. The mRNA binds to the ribosome. Now a third kind of RNA, transfer RNA (tRNA), brings the amino acids. A tRNA molecule has an anticodon at one end, a set of three nucleotides that binds to the complementary codon in the mRNA. At the other end, the tRNA carries the amino acid corresponding to that codon. The various tRNAs therefore bind to the mRNA, and their amino acids are lined up in the appropriate order to make the protein. The amino acids are joined up and detached from their tRNA. A protein has been assembled. The conversion of mRNA information, at a ribosome, into a protein is called translation. In summary, the processes of transcription and translation act to decode the DNA information in a gene; the result is a protein molecule, and it is through proteins that DNA exerts its effects in the cells of living creatures.
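The two stages can be put together in a short Python sketch. It is a simplification under stated assumptions: the table `PARTIAL_CODE` holds only four of the 64 codons (enough for the examples used in this article), the function names are invented for illustration, and strand direction is ignored as in the text.

```python
# Pairing rules for transcription (U takes the place of T in RNA).
DNA_TO_MRNA = {"A": "U", "C": "G", "G": "C", "T": "A"}

# A deliberately partial genetic code, written in mRNA terms.
PARTIAL_CODE = {"UUU": "Phe", "CUU": "Leu", "AUG": "Met", "UAA": "STOP"}

def transcribe(template_dna):
    """Stage 1: build the mRNA on the DNA strand."""
    return "".join(DNA_TO_MRNA[base] for base in template_dna)

def translate(mrna):
    """Stage 2: read non-overlapping codons and list the amino acids."""
    amino_acids = []
    for i in range(0, len(mrna) - 2, 3):
        residue = PARTIAL_CODE.get(mrna[i:i + 3], "?")  # "?" = codon not in this partial table
        if residue == "STOP":
            break
        amino_acids.append(residue)
    return amino_acids

mrna = transcribe("GAAAAA")
print(mrna)             # CUUUUU
print(translate(mrna))  # ['Leu', 'Phe']
```

The DNA sequence GAAAAA therefore yields a leucine followed by a phenylalanine, matching the example given earlier for the genetic code.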

In living cells, information flows from DNA to RNA to protein. Cells contain no mechanism to synthesize mRNA from protein, or (with rare exceptions) DNA from mRNA. Francis Crick introduced the expression “the central dogma” of molecular biology to refer to the one-way information flow. The central dogma provides a molecular explanation for the fact that acquired characters are not inherited. Heredity is Mendelian (see Mendel), not Lamarckian (see Lamarck). Some exceptions are known, in which RNA can be reverse-transcribed to form DNA, but they concern special RNA viruses or parts of the DNA that do not code for genes. The exceptions do not violate the central dogma in a deep sense, and the central dogma still stands as a good generalization about information flow in living things.

V Introns and Exons

The genes of many forms of life have an additional feature. The genes contain stretches of DNA, called exons, each of which codes for part of a protein. The exons are divided up by stretches of non-coding DNA, called introns. The genes consist of alternating introns and exons, and the full code for a polypeptide is contained in several scattered exons, rather than in a continuous stretch of DNA. The number of exons varies from gene to gene, but most genes in mice and humans have 12 exons or fewer (though some of our genes have over 60 exons).

The division of genes into exons and introns complicates the story of transcription. The whole gene, consisting of all the exons and introns, is initially transcribed to form mRNA. Then, while the mRNA is still in the cell nucleus, the introns in the mRNA are removed in a process called splicing. The introns are said to be "spliced out". The result is a final mRNA molecule consisting only of the exons, in the correct order to code for the protein. This final mRNA moves to the ribosomes and is translated. The reason why many genes contain introns, which are then removed, is uncertain, though there are several hypotheses.

The presence of introns differs between prokaryotes and eukaryotes. Bacterial genes mainly lack introns. Each bacterial gene simply codes for a polypeptide. Bacterial genes are said to be colinear with their proteins. But in eukaryotes, at least some genes have been found to contain introns whenever they have been looked for. However, introns are rare in the single-celled yeast. Almost all (96 per cent) of yeast genes consist of a single exon, lacking any introns. In fruit flies, only 17 per cent of genes lack introns. In humans, only 6 per cent of genes lack introns.

Biologists are uncertain whether early life forms lacked introns, as bacteria mainly do today, and introns evolved relatively late when complex multicellular life arose. Alternatively, introns may have been present early on, in the common ancestors of modern bacteria and eukaryotes and then been lost in the evolution of bacteria. These are called the "introns-late" and "introns-early" theories.

In life forms whose genes do contain introns, the length of introns ranges widely but the lengths of exons are more uniform. In mice and humans, for example, most exons are 100-300 nucleotides long (coding for 30-100 amino acids); but most introns are in the 100-25,000 nucleotide range. The longest introns are almost 100,000 nucleotides long. All this intron material is spliced out of the mRNA transcript, and the initial mRNA of an average gene is about five times the length of the final mRNA that is translated.

The arrangement of genes into introns and exons enables, in some cases, one gene to be read in more than one way. The process is called alternative splicing. For example, a gene might contain five exons, divided by four introns. One mRNA molecule might be created from exons 1 to 5; a second mRNA molecule from exons 1 to 4 (with exon 5 being spliced out, as well as the four introns); and another mRNA from exons 1 to 2 and 4 to 5 (with exon 3 being discarded along with the introns). Then three different mRNAs, and so three different proteins, will be read from a single gene. Several examples of alternative splicing are known. For instance, a gene called slo codes for a protein that contributes to the acoustic sensitivity of little hairs in our inner ears. Our ears are sensitive to a range of pitch because we have many hairs each tuned to be sensitive to a particular sound frequency. It might be thought that distinct genes would code for the proteins that give the different acoustic sensitivities to the different hairs. But slo contains at least 8 sites of alternative splicing, allowing at least 500 different, but related, mRNAs to be read from it. Alternative splicing is a recent discovery, and biologists do not yet know how important it is in life. It does, however, along with the combinatorial action of genes, show how enormous complexity and variety can arise from relatively few genes.
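The five-exon example can be sketched in Python. The exon sequences below are invented purely for illustration, and `splice` is a hypothetical helper; the point is only that joining different subsets of the same exons gives different final mRNAs.

```python
def splice(exons, keep):
    """Join the exons whose 1-based numbers appear in `keep`, in the order given."""
    return "".join(exons[n - 1] for n in keep)

# Hypothetical five-exon gene (introns already removed from the picture).
exons = ["AUGGCU", "GGA", "CCUAAG", "UUC", "GAUUAA"]

variants = {
    "exons 1 to 5": splice(exons, [1, 2, 3, 4, 5]),
    "exons 1 to 4": splice(exons, [1, 2, 3, 4]),
    "exons 1-2 and 4-5": splice(exons, [1, 2, 4, 5]),
}

for name, mrna in variants.items():
    print(name, mrna)
```

Three distinct mRNA sequences come out of the one gene, as the paragraph describes.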

VI Non-Coding DNA

The information to code for proteins is contained in the genes. The part of the DNA that consists of genes is called coding DNA. (Though even genes contain non-coding DNA, in the form of introns.) The genes, at least in eukaryotes and particularly in multicellular eukaryotes, are located within stretches of non-coding DNA. The non-genic, non-coding DNA may have no function, and it is often called “junk” DNA for this reason. Alternatively, the non-coding DNA (or part of it) may indeed have some function, for instance in regulating gene expression, or ensuring that the genes are appropriately positioned within the DNA, or contributing to architectural features in the large-scale structure of the DNA. The function (if any) of non-coding DNA remains uncertain.

Much of the non-coding DNA is repetitive, consisting of repeats of a certain unit sequence. (The repeats are often of closely related, rather than identical, versions of the unit sequence.) In some cases, the unit sequence is relatively long, approximately as long as one gene. These are called LINEs (long interspersed elements). For example, the human genome contains about 500,000 copies of the LINE1 sequence, each of which is about 1000 nucleotides long. Somewhat shorter are the SINEs (short interspersed elements). The commonest SINE in human DNA is a 260-nucleotide sequence called Alu; the human genome contains over a million copies of Alu.

LINEs and SINEs are examples of transposable elements. Transposable elements, or transposons, are known informally as “jumping genes”. They are able to copy themselves into other sites in the DNA. The metaphor of jumping is imperfect because when we jump we move from one place to another. When a transposable element copies itself elsewhere in the DNA, it creates a second copy in addition to the original copy. Transposable elements therefore tend to proliferate through the DNA, and this helps explain the huge number of copies of certain repeat sequences in human DNA.

Finally, some repetitive DNA has short unit sequences, of approximately ten nucleotides. At various sites around the DNA the unit sequences are found with various numbers of repeats. The repeats are arranged side-by-side; side-by-side repeats of any sequence of DNA are called tandem repeats. One well studied such sequence in human DNA has a unit 10-15 nucleotides long. Human DNA has about 1000 copies of the sequence in all, scattered among several sites. At one site, there might be 10 repeats; at another site, there might be 100 repeats. The short sequences that consist of variable numbers of repeats are called “variable number of tandem repeats” (VNTR) sequences. They are also called mini-satellites.

DNA replicates by the semiconservative model of replication. This model states that during replication of double-stranded DNA, the two strands separate and each separate strand is then used as a template to make a new strand. In this manner, the cell ends up with two identical molecules of the original DNA, with each molecule containing one strand from the original piece of DNA and one newly synthesized strand. This model was first proposed by Watson and Crick and was confirmed by the experiments of Meselson and Stahl. The replication of DNA requires more than just DNA polymerase. For example, the two strands of DNA must be separated before DNA polymerase can replicate the strands. The enzyme that separates the strands is known as helicase. This enzyme first binds to a specific site on the DNA known as the origin of replication (prokaryotes have only a single origin on their chromosome while eukaryotes have several on each of their chromosomes) and separates the strands; this allows the DNA polymerase to bind. As the replication process continues, the helicase moves along the DNA and continues separating the DNA strands. Once the strands are separated, there are at least three problems that must be addressed. First of all, the separation of the strands of DNA induces supercoiling in the remaining part of the DNA. This is similar to what happens when a coiled telephone cord gets overwound and forms knots.

VNTRs are not known to have any function in DNA, but they have been put to use by human beings. The exact number of tandem repeats at any one site changes rapidly between the generations. The exact pattern of repeats of the short unit sequence is peculiar to one individual, and his or her close relatives. This DNA provides the basis of genetic fingerprinting (see DNA Fingerprinting), in which DNA is used as forensic evidence. Genetic fingerprints have been used to establish guilt, or to establish innocence (including several people on death row), in court, and to establish paternity, or non-paternity, in lesser legal disputes.

VII DNA Replication

The DNA is copied, or replicated, once per cell division. It is copied at every cell division within the lifetime of one organism, and in the reproductive cell line that produces sperm and eggs. A large set of perhaps 50 or so enzymes catalyse the DNA replication. DNA polymerases are the most important of these enzymes.

The first step is to unwind the double helix at a certain site (called an origin of replication) and separate the two strands. Two new strands are then formed, one on each of the strands of the parental double helix. The DNA at the site of replication resembles a fork, where the two strands of the parental DNA are split apart to be copied. It is called a replication fork. DNA replication is semi-conservative: that is, each new double helix contains one strand from the parental copy and a second new strand that was copied from it. Theoretically, it could have been that DNA replication was conservative rather than semi-conservative; this would have meant that after replication one of the DNA molecules would have both strands conserved from the parental DNA, and the other DNA molecule would contain two new strands. The semi-conservative nature of DNA replication was shown in a classic experiment called the Meselson-Stahl experiment. Two American geneticists, Matthew Meselson and Franklin W. Stahl, labelled DNA with a heavy isotope of nitrogen, 15N, instead of the normal 14N. The labelled DNA could be distinguished from normal DNA. They then allowed bacteria with labelled DNA to reproduce in an environment containing normal 14N. They found that the offspring DNA all contained heavy 15N, but about half as much as in the original labelled parents. Thus half the parental DNA is preserved in each offspring DNA molecule: DNA replication is semi-conservative.
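The logic of the experiment can be illustrated with a toy model in Python. This is a sketch, not a description of the laboratory method: each DNA molecule is represented as a pair of strand labels, and the function names are invented here. It contrasts what each model of replication predicts after one round of copying in normal 14N.

```python
def replicate_semiconservative(molecules, new="14N"):
    """Each parental strand ends up paired with one newly made strand."""
    offspring = []
    for strand_1, strand_2 in molecules:
        offspring.append((strand_1, new))
        offspring.append((strand_2, new))
    return offspring

def replicate_conservative(molecules, new="14N"):
    """The parental molecule stays intact; the copy has two new strands."""
    offspring = []
    for molecule in molecules:
        offspring.append(molecule)
        offspring.append((new, new))
    return offspring

def heavy_fraction(molecule):
    """Fraction of the molecule's two strands that carry the 15N label."""
    return molecule.count("15N") / 2

parents = [("15N", "15N")]
semi = replicate_semiconservative(parents)
cons = replicate_conservative(parents)

print([heavy_fraction(m) for m in semi])  # [0.5, 0.5]
print([heavy_fraction(m) for m in cons])  # [1.0, 0.0]
```

Only the semi-conservative model predicts that every offspring molecule carries the label at half the parental level, which is the result the paragraph reports.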

The new strand of DNA is made by placing the complementary nucleotides opposite each nucleotide in the parental strand. If the parental strand reads ... CTA ..., for example, then a G followed by an A followed by a T will be bonded next to it. Occasional mistakes occur, and an inappropriate nucleotide is inserted. A 'T' might be put next to the C, instead of a G. These so-called mismatches are detected by enzymes, called proof-reading enzymes, within the complex of enzymes that travel down the replication fork. The mismatched region of the new strand is then removed, and the DNA is re-copied.
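The idea of detecting and correcting a mismatch can be sketched in Python. This is an analogy only: real proof-reading is done by enzymes, and the names `find_mismatches` and `proofread` are invented for illustration.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def find_mismatches(parental, new):
    """Return the positions where the new strand does not pair with the parental strand."""
    return [i for i, (p, n) in enumerate(zip(parental, new)) if COMPLEMENT[p] != n]

def proofread(parental, new):
    """Replace each mismatched nucleotide with the correct complementary one."""
    corrected = list(new)
    for i in find_mismatches(parental, new):
        corrected[i] = COMPLEMENT[parental[i]]
    return "".join(corrected)

print(find_mismatches("CTA", "GAT"))  # [] : correctly copied
print(find_mismatches("CTA", "TAT"))  # [0] : a T was placed opposite the C
print(proofread("CTA", "TAT"))        # GAT
```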

DNA replication is complicated by the fact that the two strands of the double helix are mirror-images of each other. One strand can be thought of as going left to right across the page; but the other strand goes right to left. At a molecular level, this is due to the structure of the deoxyribose in the DNA backbone. Deoxyribose contains five carbon atoms, which are conventionally numbered 1 to 5. The phosphate upstream is attached to carbon atom number 3, the phosphate downstream to carbon atom number 5. The 3’ bond in one strand of the double helix is opposite the 5’ bond in the other strand.

DNA polymerase creates new strands only in the 5' to 3' direction. It is as if it can only copy from left to right, and not from right to left. Copying one of the strands is easy, but how is the other strand copied? The answer is that it is copied in short backwards stretches. The short stretches are called Okazaki fragments, and they are about 1000-2000 nucleotides long. The Okazaki fragments are joined together to make a continuous DNA strand.

VIII Mutations

During DNA replication, the proof-reading enzymes fail to detect a small fraction of miscopied nucleotides. The new, miscopied nucleotide can then be permanently incorporated into the DNA. These changes in the DNA sequence are called mutations. The rate of mutation is low, about 10^-10 per nucleotide every time human DNA is copied. But a non-trivial number of mutations occur every human generation. The human genome is about 6.6 x 10^9 nucleotides long and it is copied about 200 times per generation (between the conception and reproductive maturity of an individual). In all, about 175-200 nucleotides are miscopied in every new human offspring, relative to its parents’ DNA.
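As a rough check on the scale of these numbers, the three figures quoted above can simply be multiplied together. The sketch below assumes nothing beyond those rounded figures; the variable names are chosen here for illustration.

```python
mutation_rate = 1e-10          # miscopied nucleotides per nucleotide per copying
genome_length = 6.6e9          # nucleotides in the human genome, as quoted above
copies_per_generation = 200    # rounds of copying between conception and maturity

expected_mutations = mutation_rate * genome_length * copies_per_generation
print(round(expected_mutations))  # 132
```

The product of the rounded figures is about 130 new mutations per generation, the same order of magnitude as the 175-200 quoted; the text does not say which additional factors account for the difference.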

An enzyme known as topoisomerase (a.k.a. gyrase) relieves the supercoiling. In addition, single-stranded DNA is not very stable and is susceptible to degradation. To prevent this, a set of single-stranded binding proteins is needed to stabilize the separated strands during replication. Finally, DNA polymerase requires a primer to operate. That is, the polymerase can only add nucleotides to an existing strand (the primer). This existing strand is laid down by the enzyme primase. Primase lays down a strand of about 8-10 nucleotides that is complementary to the replicating strand. DNA polymerase can then add the remaining nucleotides to the 3' end of the primer. Replication starts at the origin and progresses in both directions. As the strands are separated, they form replication forks, which are the sites of replication. Because DNA polymerase can only add to the 3' end, the replication of one of the strands is continuous while the other is replicated in small fragments known as Okazaki fragments. This is shown in the following figure. There are now two more problems: how do the Okazaki fragments get joined together, and how does the cell remove the RNA primers? An enzyme known as ligase connects the various fragments of the newly synthesized DNA. The RNA primers are replaced with DNA by a special form of DNA polymerase.

Moreover, simply miscopied nucleotides are not the only kind of mutation. Sometimes a short stretch of DNA may be copied twice over, or missed out. The reason why genetic fingerprinting is possible is that certain short sequences of repetitive DNA (the VNTR DNA) have exceptionally high mutation rates. The mutations consist of changes in the number of tandem repeats of the unit sequence: a sequence that is repeated 50 times may be copied only 49 times, or 51 times, in the offspring. The chance of a change in the number of repeats is about 1 in 100 between a human parent and offspring for every site where a VNTR sequence is located.

Some mutations do not arise as errors in DNA replication. Certain environmental mutagens, such as X-rays and UV-radiation, cause mutations. The radiation can break DNA strands, or cause chemical changes in neighbouring nucleotides such that they form sideways bonds (dimers). Through much of the 20th century, it was supposed that most mutations had external causes in environmental mutagens, but research had established by late in the century that the majority of mutations are internal copying mistakes rather than being caused by external insults.

Mutations are often harmful for the individual that inherits them. However, they also give rise to genetic variation in the population. This genetic variation is the raw material for evolution by natural selection. Mutation is, for the individual, disadvantageous on average, and mutations probably only occur at the tiny rate they do because the metabolic cost of driving the mutation rate down to zero would be prohibitively high. The fact that evolution occurs at all can then be seen as a by-product of the thermodynamic difficulty of further reductions in mutation rate. If the mutation rate really were driven down to zero, evolution would come to a stop.

IX Evolution of DNA

Almost all life on Earth uses DNA as its hereditary material. The only exceptions are certain RNA viruses, which include the influenza virus and HIV (the agent of AIDS). This suggests that DNA started to be used early in the evolution of life. Also, the genetic code is essentially (though not exactly) identical in all life. The genetic code appears to be arbitrary, in the same way that human language is arbitrary: there is nothing about a book that requires it to be called “a book”, and it is called “un livre” in French. Likewise, the triplet UUU does not have to be used to code for phenylalanine. The universality of the genetic code suggests that all modern life is descended from one common ancestor that also used the same genetic code. The code is evolutionarily hard to change once it has evolved, and Francis Crick has described the exact pattern of the genetic code as a "frozen accident".

Although DNA is ancient in evolution, DNA was probably not the earliest hereditary molecule, near the origin of life. DNA has many 'advanced' features that would prevent it from working in very simple life forms. It is a double helix, with the coding information stored inside. In order for the information to be read or replicated, the double helix has to be unwound. Special enzymes are needed to unwind the DNA, and these enzymes would not have existed early on. Moreover, DNA by itself is biologically inert; DNA cannot directly catalyse any metabolic processes. By contrast, RNA codes information in a single-stranded form that can interact directly with the environment. Some RNA molecules can also catalyse biological processes; these RNA molecules are called ribozymes. For these and other reasons, biologists suspect that DNA-based life was preceded by an “RNA world” in which simpler life forms existed and used RNA as their hereditary molecule. Still simpler life forms based on some other replicating molecule may have preceded the RNA world, but we have no evidence about the hypothetical pre-RNA stage.

Life forms that use RNA have small hereditary molecules, coding for only limited information. RNA viruses typically contain fewer than 10 genes, and are less than 10,000 nucleotides long. RNA is a more mutable, less stable molecule than DNA. DNA-based life forms may have evolved because of the lower mutation rates they have. Bacteria had evolved by 3,500 million years ago, and all modern bacteria use DNA. Bacterial genomes are about 1-10 million nucleotides long and contain 1,000-5,000 genes. Multicellular life contains still larger genomes. The DNA of a fruit fly has about 300 million nucleotides and 14,000 genes. The DNA of a human has about 6 billion nucleotides and 30,000 genes. Thus, as more complex life forms have evolved, it has been at least in part by expansion in the codes and coding capacity of the DNA.

Who would have thought a bacterium hanging out in a hot spring in Yellowstone National Park would spark a revolutionary new laboratory technique? The polymerase chain reaction (PCR), now widely used in research laboratories and doctors' offices, relies on the ability of DNA-copying enzymes to remain stable at high temperatures. No problem for Thermus aquaticus, the sultry bacterium from Yellowstone that now helps scientists produce millions of copies of a single DNA segment in a matter of hours. In nature, most organisms copy their DNA in the same way. The PCR mimics this process, only it does it in a test tube. When any cell divides, enzymes called polymerases make a copy of all the DNA in each chromosome. The first step in this process is to "unzip" the two DNA chains of the double helix. As the two strands separate, DNA polymerase makes a copy using each strand as a template. The four nucleotide bases, the building blocks of every piece of DNA, are represented by the letters A, C, G, and T, which stand for their chemical names: adenine, cytosine, guanine, and thymine. The A on one strand always pairs with the T on the other, whereas C always pairs with G. The two strands are said to be complementary to each other. To copy DNA, polymerase requires two other components: a supply of the four nucleotide bases and something called a primer. DNA polymerases, whether from humans, bacteria, or viruses, cannot copy a chain of DNA without a short sequence of nucleotides to "prime" the process, or get it started. So the cell has another enzyme called a primase that actually makes the first few nucleotides of the copy. This short stretch of nucleotides is called a primer. Once the primer is made, the polymerase can take over making the rest of the new chain.
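The phrase "millions of copies" follows from repeated doubling. The Python sketch below assumes an idealized reaction in which every cycle exactly doubles the number of copies; that perfect efficiency is an assumption made here for illustration, and the function names are invented.

```python
def copies_after(cycles, start=1):
    """Number of copies after the given number of ideal doubling cycles."""
    return start * 2 ** cycles

def cycles_needed(target, start=1):
    """Smallest number of ideal cycles that yields at least `target` copies."""
    cycles = 0
    while copies_after(cycles, start) < target:
        cycles += 1
    return cycles

print(copies_after(20))          # 1048576
print(cycles_needed(1_000_000))  # 20
```

Under this idealization, a single DNA segment passes the one-million mark after 20 cycles.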

Biologists are currently at the point of learning much more about how the DNA of various life forms codes for their various kinds of bodies. DNA can now be sequenced rapidly, and the whole genomes of several species have now been sequenced. The sequence of the human genome was completed in 2003. The DNA sequence of a living creature can be used to count how many genes are needed to build that life form, and see how the genes have evolved from other genes in related species of life. The estimate that humans contain 30,000 genes is somewhat lower than the estimates made in the pre-genomic era. Before 2001, biologists thought that humans contain about 60,000-100,000 genes. The full reason for the discrepancy is unknown, but part of the explanation probably lies in mechanisms that allow one gene to be read in many ways (RNA editing, alternative splicing) and in the variety of ways in which genes work together. Humans probably read off more information per gene on average than bacteria do. Biology has now moved into the genomic era, and biologists are using the massive amounts of new DNA sequence data to investigate the evolution and workings of life.

I Introduction

DNA Fingerprinting, use of a person's body samples as a means of identification. Deoxyribonucleic acid (DNA) is a genetic blueprint found in the double strand or “double helix” of molecules called chromosomes located within the cell nuclei of all living beings. With the exception of identical twins, the complete DNA of each individual is unique (see below).

In order to obtain a DNA “fingerprint”, DNA is first extracted from body tissue or fluid such as blood or saliva. Areas of DNA that can be used to distinguish one individual from another are segmented and arranged. Probes are used to mark the segments and X-ray film is placed on the probes and developed to form a pattern of black bars—the DNA “fingerprint”. DNA “fingerprints” are then compared for similarities.

II Uses of DNA Fingerprinting

DNA testing was originally developed for medical purposes in order to detect the presence of genetically inherited diseases. DNA fingerprinting, or DNA typing as it is often called, was first developed as an effective identification technique in 1985. The uses of DNA fingerprinting have expanded to include criminal investigations and forensic science. DNA fingerprinting was first used in a criminal investigation in the United Kingdom in 1987.

DNA evidence has a variety of applications in criminal investigations and forensic science. DNA evidence can confirm someone as a suspect to a crime by comparing DNA specimens found at a crime scene to a suspect's DNA. DNA evidence can also be used to exonerate a suspect.

III The Admissibility of DNA Evidence in Courts

Generally, courts have accepted the reliability of DNA testing and have admitted DNA test results into evidence. However, there have been criticisms of DNA fingerprinting for investigative or forensic purposes.

A Accuracy of Results

The accuracy of DNA fingerprinting has been challenged. First, because DNA segments rather than complete DNA strands are “fingerprinted”, there is a possibility that DNA samples taken from two individuals may yield identical results. For this reason, a finding that DNA fingerprints are identical is accompanied by the probability that the particular DNA pattern could appear in a particular segment of the population. As yet, widespread research confirming the uniqueness of DNA fingerprinting test results has not been conducted. In addition, because humans interpret the results there is always a possibility that mistakes will be made.

B Prohibitive Costs

Because DNA testing is expensive, suspects who are unable to provide their own DNA experts may not be able to adequately defend themselves if charges are brought against them based on DNA evidence and DNA experts are not provided for them. Furthermore, experts hired to testify either in support of or in opposition to the accuracy of DNA evidence may be biased.

C Misuse of Results

DNA fingerprint results may be used for unauthorized purposes such as to identify individuals with certain stigmatizing illnesses such as Acquired Immune Deficiency Syndrome (AIDS). The potential for misuse increases if DNA fingerprint results are databased.

I Introduction

Genetic Engineering, method of changing the inherited characteristics of an organism in a predetermined way by altering its genetic material. This is often done to cause micro-organisms, such as bacteria or viruses, to synthesize increased yields of compounds, to form entirely new compounds, or to adapt to different environments. Other uses of this technology, which is also called recombinant DNA technology, include gene therapy, which is the supply of a functional gene to a person with a genetic disorder or with other diseases such as acquired immune deficiency syndrome (AIDS) or cancer.

Genetic engineering involves the manipulation of deoxyribonucleic acid, or DNA. Important tools in this process are so-called restriction enzymes that are produced by various species of bacteria. Restriction enzymes can recognize a particular sequence of the chain of chemical units, called nucleotide bases, which make up the DNA molecule and cut the DNA at that location. Fragments of DNA generated in this way can be joined using other enzymes called ligases. Restriction enzymes and ligases therefore allow the specific cutting and reassembling of portions of DNA. Also important in the manipulation of DNA are so-called vectors, which are pieces of DNA that can self-replicate (produce copies of themselves) independently of the DNA in the host cell in which they are grown. Examples of vectors include plasmids, viruses, and yeast artificial chromosomes. These vectors permit the generation of multiple copies of a particular piece of DNA, making this a useful method for generating sufficient quantities of material with which to work. The process of engineering a DNA fragment into a vector is called “cloning”, because multiple copies of an identical molecule are produced. Another way, recently discovered, of producing many identical copies of a particular DNA fragment is the polymerase chain reaction. This method is rapid and avoids the need for cloning DNA into a vector.
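The cutting and rejoining steps can be sketched as string operations in Python. This is a simplified illustration: it uses the well-known enzyme EcoRI, which recognizes GAATTC and cuts after the G, but it models only one strand and ignores the staggered ends of a real digest. The function names and the example sequence are invented here.

```python
def digest(dna, site="GAATTC", cut_offset=1):
    """Cut `dna` at every occurrence of `site`, `cut_offset` bases into the site."""
    fragments = []
    start = 0
    pos = dna.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(dna[start:cut])
        start = cut
        pos = dna.find(site, pos + 1)
    fragments.append(dna[start:])
    return fragments

def ligate(fragments):
    """Join fragments end to end, as a ligase would."""
    return "".join(fragments)

dna = "TTGAATTCAAGGGAATTCCC"   # invented sequence containing two EcoRI sites
pieces = digest(dna)
print(pieces)                  # ['TTG', 'AATTCAAGGG', 'AATTCCC']
print(ligate(pieces) == dna)   # True
```

Cutting at a defined sequence and rejoining the pieces is what allows a chosen fragment to be moved into a vector.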

II Gene Therapy

Gene therapy involves supplying a functional gene to cells lacking that function, with the aim of correcting a genetic disorder or acquired disease. Gene therapy can be broadly divided into two categories. The first is alteration of germ cells, that is, sperm or eggs, which results in a permanent genetic change for the whole organism and subsequent generations. This “germ line gene therapy” is considered by many to be unethical in human beings (although several babies were born as a result of this option in 2001 in the United States). The second type of gene therapy, somatic cell therapy, is analogous to an organ transplant. In this case, one or more specific tissues are targeted by direct treatment or by removal of the tissue, addition of the therapeutic gene or genes in the laboratory, and return of the treated cells to the patient. Clinical trials of somatic cell gene therapy began in the late 1990s, mostly for the treatment of cancers and blood, liver, and lung disorders; by 2000 scientists had successfully inserted a human protein gene into specific regions of sheep fibroblast cells.

III Benefits

The process of genetic engineering has great potential. For example, the gene for insulin, normally found only in higher animals, can now be introduced into a bacterial cell by way of a plasmid vector. The bacteria can then be grown in large quantities, giving an abundant source of so-called “recombinant” insulin at a relatively low cost. Production of recombinant insulin is also not dependent on the sometimes variable supply of pancreas tissue from animals. Another important use of genetic engineering is in the manufacture of recombinant factor VIII, the blood-clotting agent missing in patients with haemophilia. Virtually all haemophiliacs who received factor VIII before the mid-1980s have contracted acquired immune deficiency syndrome (AIDS) or hepatitis from viral contaminants in the blood used to make the product. Since that time, donor blood has been screened for the presence of HIV (Human Immunodeficiency Virus) and hepatitis C virus, and the manufacturing process includes steps to inactivate these viruses if they should be present. The possibility of viral contamination is eliminated completely with the use of recombinant factor VIII. Other uses of genetic engineering include increasing the disease resistance of crops, producing pharmaceutical compounds in the milk of animals, generating vaccines, and altering livestock traits.

IV Hazards

While the potential benefits of genetic engineering are considerable, so may be the potential dangers. For example, the introduction of cancer-causing genes into a common infectious organism, such as the influenza virus, could be hazardous. Consequently, in most nations, experiments with recombinant DNA are closely regulated, and those involving infectious agents are permitted only under the strictest conditions of containment. Another concern is that, despite stringent controls, some unforeseen effect might occur as the result of genetic manipulation. With developments in animal cloning proceeding rapidly—cloned animals could be used to produce organs for future transplantation into humans—there is also international concern that the private ownership of human genes should be outlawed in order to control and protect the human genetic blueprint.

DNA EXPLAINED IN EASY TERMS

DNA is material that governs inheritance of eye color, hair color, stature, bone density and many other human and animal traits. DNA is a long but narrow string-like object. A one foot long string or strand of DNA is normally packed into a space roughly equal to a cube 1/millionth of an inch on a side. This is possible only because DNA is a very thin string.

Our body's cells each contain a complete sample of our DNA. One cell is roughly equal in size to the cube described in the previous paragraph. There are muscle cells, brain cells, liver cells, blood cells, sperm cells and others. Basically, every part of the body is made up of these tiny cells and each contains a sample or complement of DNA identical to that of every other cell within a given person. There are a few exceptions. For example, our red blood cells lack DNA. Blood itself can be typed because of the DNA contained in our white blood cells.

Not only does the human body rely on DNA but so do most living things including plants, animals and bacteria.

A strand of DNA is made up of tiny building blocks. There are only four different basic building blocks. Scientists usually refer to these using four letters, one for each building block. The letters are: A, T, G, and C. These four letters are short nicknames for more complicated chemical names, but the letters (A, T, G and C) are used much more commonly than the chemical names, so the latter will not be mentioned here. Another way of referring to the building blocks or letters is to call them bases.

For example, to refer to a particular piece of DNA, we might write: AATTGCCTTTTAAAAA. This is a perfectly acceptable way of describing a piece of DNA. Someone with a machine called a DNA synthesizer could actually synthesize the same piece of DNA from the information AATTGCCTTTTAAAAA alone.

The sequence of bases (letters) can code for many properties of the body's cells. The cells can read this code. Some DNA sequences encode important information for the cell. Such DNA is called, not surprisingly, "coding DNA." Our cells also contain much DNA that doesn't encode anything that we know about. If the DNA doesn't encode anything, it is called non-coding DNA or sometimes, "junk DNA."

The DNA code, or genetic code as it is called, is passed through the sperm and egg to the offspring. A single sperm cell contains about three billion bases consisting of A, T, G and C that follow each other in a well-defined sequence along the strand of DNA.

Both coding and non-coding DNAs may vary from one individual to another. These DNA variations can be used to identify people or at least distinguish one person from another.

A PCR vial contains all the necessary components for DNA duplication: a piece of DNA, large quantities of the four nucleotides, large quantities of the primer sequence, and DNA polymerase. The polymerase is the Taq polymerase, named for Thermus aquaticus, the bacterium from which it was isolated. The three parts of the polymerase chain reaction are carried out in the same vial, but at different temperatures. The first part of the process separates the two DNA chains in the double helix. This is done simply by heating the vial to 90-95 degrees centigrade (about 194-203 degrees Fahrenheit) for 30 seconds. But the primers cannot bind to the DNA strands at such a high temperature, so the vial is cooled to 55 degrees C (about 131 degrees F). At this temperature, the primers bind or "anneal" to the ends of the DNA strands. This takes about 20 seconds. The final step of the reaction is to make a complete copy of the templates. Since the Taq polymerase works best at around 75 degrees C (the temperature of the hot springs where the bacterium was discovered), the temperature of the vial is raised. The Taq polymerase begins adding nucleotides to the primer and eventually makes a complementary copy of the template. If the template contains an A nucleotide, the enzyme adds on a T nucleotide to the primer. If the template contains a G, it adds a C to the new chain, and so on to the end of the DNA strand. This completes one PCR cycle. The three steps in the polymerase chain reaction - the separation of the strands, annealing the primer to the template, and the synthesis of new strands - take less than two minutes. Each is carried out in the same vial. At the end of a cycle, each piece of DNA in the vial has been duplicated. But the cycle can be repeated 30 or more times. Each newly synthesized DNA piece can act as a new template, so the number of copies doubles with every cycle: about 1 million copies of a single piece of DNA after 20 cycles and, in principle, about 1 billion after 30! Taking into account the time it takes to change the temperature of the reaction vial, at least 1 million copies can be ready in about three hours.
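The doubling arithmetic behind PCR can be sketched in a few lines of Python. This is only an illustration: the function names and the `efficiency` parameter are assumptions made for the example, and perfect doubling in every cycle is an idealization that real reactions do not reach.

```python
def copies_after_cycles(cycles, starting_copies=1, efficiency=1.0):
    """Estimate the number of DNA copies after a given number of PCR cycles.

    efficiency is the fraction of templates copied in each cycle
    (1.0 means every template is duplicated, i.e. perfect doubling).
    """
    return starting_copies * (1 + efficiency) ** cycles


def cycles_to_reach(target_copies, starting_copies=1, efficiency=1.0):
    """Count how many cycles are needed to reach at least target_copies."""
    if efficiency <= 0:
        raise ValueError("efficiency must be greater than zero")
    copies = starting_copies
    cycles = 0
    while copies < target_copies:
        copies *= 1 + efficiency  # each cycle multiplies the copy number
        cycles += 1
    return cycles


if __name__ == "__main__":
    for n in (10, 20, 30):
        print(f"{n} cycles: {copies_after_cycles(n):,.0f} copies")
    print("cycles needed for a million copies:", cycles_to_reach(1_000_000))
```

With perfect doubling, a single starting molecule passes one million copies at cycle 20 and one billion at cycle 30, which is also why a single contaminating molecule can be amplified so strongly.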

What is a Locus?

A locus (with a hard "c", pronounced LOW-KUS) is simply a location in the DNA. The plural of locus is loci (with a soft "c", pronounced LOW-SI). Again, DNA is a long string-like object. Until it is extracted from the cells and purified, the long string of human DNA is tightly folded into bundles called chromosomes.

Currently, there are two main types of forensic DNA testing. They are RFLP analysis and PCR-based analysis.

Generally, RFLP analysis requires larger amounts of DNA and the DNA must be undegraded. Crime-scene evidence that is old or that is present in small amounts is often unsuitable for RFLP testing. Warm moist conditions may accelerate DNA degradation rendering it unsuitable for RFLP in a relatively short period of time.

PCR testing often requires less DNA than RFLP testing and the DNA may be partially degraded, more so than is the case with RFLP. However, PCR still has sample size and degradation limitations. PCR tests are extremely sensitive to contaminating DNA at the crime scene and within the test laboratory. During PCR, contaminants may be amplified up to a billion times their original concentration. Contamination can influence PCR results, particularly in the absence of proper handling techniques and proper controls for contamination.

RFLP DNA testing has the following basic steps:

1. The DNA from crime-scene evidence or from a reference sample is cut into fragments with something called a restriction enzyme. The restriction enzyme recognizes a particular short sequence such as AATT that occurs many times in a given cell's DNA. One enzyme commonly used is called Hae III (pronounced: Hay Three) but the choice of enzyme varies. For RFLP to work, the analyst needs thousands of cells. If thousands of cells are present from a single individual, they will all be cut in the same place along their DNA by the enzyme because each cell's DNA is identical to that of every other cell of the same person.

2. The DNA fragments are next sorted according to size by a device called a gel. The DNA is placed at one end of a slab of gelatin and it is drawn through the gel by an electric current. The gel acts like a sieve, allowing small DNA fragments to move more rapidly than larger ones.

3. After the gel has separated the DNA fragments according to size, a blot or replica of the gel is made to trap the DNA fragments in the positions where they end up, with small DNA fragments near one end of the blot and large ones near the other end. The blot is now treated with a piece of DNA called a probe. The probe is simply a piece of DNA that binds to the DNA on the blot in the position where a similar sequence (the target sequence) is located. The target sequence is known to occur only at a specific locus (position on the DNA string).

4. The target DNA fragments that are recognized by the probe are then measured for size. These target fragments come from places in the DNA where there is length polymorphism, which means that the length of the fragments tends to vary from person to person. If two samples are tested and the target DNA fragments from a particular locus are of different lengths, then the samples must be from different people. If they are the same length, then the samples are either from the same person or from two people who by coincidence have fragments of the same length.

Using the same probe and enzyme, the test lab will perform these same steps for many people. The length of the target DNA fragments for each person will be recorded in a database. The distribution of fragment sizes in the database provides a rough idea of how rare or common a fragment of any given length is in a particular population. The commonness of a given size of DNA fragment is called a population frequency.
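The cutting and sizing steps can be imitated on short text strings. The sketch below is a toy model, not a description of laboratory software: the function names, the two sample sequences, and the default recognition site GGCC (the site usually given for Hae III, cut between GG and CC) are all assumptions chosen for illustration, and it compares every fragment rather than only those a probe would pick out.

```python
def digest(sequence, site="GGCC", cut_offset=2):
    """Cut a DNA sequence at every occurrence of a recognition site.

    cut_offset is the position inside the site where the cut falls
    (2 means GG|CC). Fragments are returned in order along the strand.
    """
    fragments = []
    start = 0
    position = sequence.find(site)
    while position != -1:
        cut = position + cut_offset
        fragments.append(sequence[start:cut])
        start = cut
        position = sequence.find(site, position + 1)
    fragments.append(sequence[start:])  # whatever remains after the last cut
    return fragments


def fragment_lengths(sequence, site="GGCC", cut_offset=2):
    """Return fragment lengths, smallest first, as a gel would sort them."""
    return sorted(len(fragment) for fragment in digest(sequence, site, cut_offset))


if __name__ == "__main__":
    # Hypothetical samples that differ only in the length of the T run
    # between the two recognition sites (a length polymorphism).
    sample_a = "AAGGCCTTTTGGCCA"
    sample_b = "AAGGCCTTTTTTTTGGCCA"
    lengths_a = fragment_lengths(sample_a)
    lengths_b = fragment_lengths(sample_b)
    print("sample A fragments:", lengths_a)
    print("sample B fragments:", lengths_b)
    print("same pattern:", lengths_a == lengths_b)
```

Matching fragment lengths would not prove a common source; as the text notes, the population frequency of each length decides how much weight a match carries.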

Here's how DNA fingerprinting works (this procedure is done once on a sample of blood, semen, hair or skin cells from the crime scene, and again on samples taken from the suspect).

1. You take some DNA and cut it with restriction enzymes. These chemicals cut the long DNA molecule each time they recognize a certain sequence of sub-units, or bases. It's sort of like going through a phone book and cutting it just before the name "Norman." Thus each fragment would start with Norman. (Why are we picking on Norm? No reason. In genetics, you could just as easily search for the name "ACCGTAGCACGCCAGT," which are letters in the genetic "code" of DNA.) The goal of this step is to cut up the DNA evidence into manageable fragments.

2. Then you put the fragments on a layer of gelatin and hook up an electric field to the ends of the gelatin. When you switch on the current, it attracts the fragments across the field. The shorter fragments move faster than long ones, and that sorts the fragments by size.

3. You attach radioactive markers to certain sequences of the DNA. For example, you might label every occurrence of DADDADCAT -- a different series of "letters" in DNA's alphabet.

4. You place the markers against a photographic film, wait a couple of weeks, and produce a picture showing the individual markers.

5. Now you repeat the process with different markers, to multiply the accuracy of the test.

[Image: a gel in which fragments of DNA have been "sorted" by an electric field. The shortest and lightest fragments move the furthest. The image was made by tagging the bits of DNA with radioactive tracers, whose radiation exposed the x-ray film. Image courtesy of Miami-Dade Police Department Crime Laboratory, Jose Almirall.]

Still, there are problems with DNA fingerprinting. First of all, samples can be contaminated by even a microscopic bit of genetic junk. That may sound trivial (unless your life is on the line), but it's not. As a group of critics have pointed out, only one-third of 60 police department DNA labs have been accredited by the American Society of Crime Laboratory Directors Laboratory Accreditation Board. That's more than the commercial DNA labs, which almost all operate without accreditation. See "Simpson Trial Shows..." in our bibliography.

There's another problem with DNA results -- making them meaningful to a jury. "When a fingerprint expert testifies, you can actually verify that they did [make the comparison]," says forensic scientist David Stoney of the McCrone Research Institute in Chicago. "A juror can understand the idea, and see what was done."

But when a genetic fingerprinting expert testifies, he says, "the juror has no reference point. They might think, 'This guy went into the lab and did something goofy. Now I have to take his word for it.'" Not only is the procedure more difficult to explain, but the evidence is almost entirely invisible.

And third, the test requires a good sample that's not too small or degraded. But a new technique, using polymerase chain reaction, or PCR, can amplify tiny bits of DNA and produce a faster, although less exact, result. Problem is, the method uses even smaller samples, and is thus even more subject to contamination...

We wish to suggest a structure for the salt of deoxyribose nucleic acid (D.N.A.). This structure has novel features which are of considerable biological interest.

A structure for nucleic acid has already been proposed by Pauling and Corey (1). They kindly made their manuscript available to us in advance of publication. Their model consists of three intertwined chains, with the phosphates near the fibre axis, and the bases on the outside. In our opinion, this structure is unsatisfactory for two reasons: (1) We believe that the material which gives the X-ray diagrams is the salt, not the free acid. Without the acidic hydrogen atoms it is not clear what forces would hold the structure together, especially as the negatively charged phosphates near the axis will repel each other. (2) Some of the van der Waals distances appear to be too small.

Another three-chain structure has also been suggested by Fraser (in the press). In his model the phosphates are on the outside and the bases on the inside, linked together by hydrogen bonds. This structure as described is rather ill-defined, and for this reason we shall not comment on it.

We wish to put forward a radically different structure for the salt of deoxyribose nucleic acid. This structure has two helical chains each coiled round the same axis (see diagram). We have made the usual chemical assumptions, namely, that each chain consists of phosphate diester groups joining β-D-deoxyribofuranose residues with 3',5' linkages. The two chains (but not their bases) are related by a dyad perpendicular to the fibre axis. Both chains follow right-handed helices, but owing to the dyad the sequences of the atoms in the two chains run in opposite directions. Each chain loosely resembles Furberg's (2) model No. 1; that is, the bases are on the inside of the helix and the phosphates on the outside. The configuration of the sugar and the atoms near it is close to Furberg's 'standard configuration', the sugar being roughly perpendicular to the attached base. There is a residue on each chain every 3.4 A. in the z-direction. We have assumed an angle of 36° between adjacent residues in the same chain, so that the structure repeats after 10 residues on each chain, that is, after 34 A. The distance of a phosphorus atom from the fibre axis is 10 A. As the phosphates are on the outside, cations have easy access to them.
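The repeat distance quoted above follows directly from the two stated parameters. A minimal check in Python (the function and constant names are illustrative, not taken from the paper):

```python
RISE_PER_RESIDUE_ANGSTROM = 3.4   # spacing of residues along the fibre axis
TWIST_PER_RESIDUE_DEGREES = 36.0  # assumed angle between adjacent residues


def residues_per_turn(twist_degrees):
    """Number of residues needed for one full 360-degree turn of a chain."""
    return 360.0 / twist_degrees


def helix_repeat(rise, twist_degrees):
    """Distance along the axis after which the structure repeats."""
    return rise * residues_per_turn(twist_degrees)


if __name__ == "__main__":
    turn = residues_per_turn(TWIST_PER_RESIDUE_DEGREES)
    repeat = helix_repeat(RISE_PER_RESIDUE_ANGSTROM, TWIST_PER_RESIDUE_DEGREES)
    print(f"{turn:.0f} residues per turn, repeat of {repeat:.0f} A")
```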

The length of DNA in the nucleus is far greater than the size of the compartment in which it is contained. To fit into this compartment the DNA has to be condensed in some manner. The degree to which DNA is condensed is expressed as its packing ratio: the length of DNA divided by the length into which it is packaged. For example, the shortest human chromosome contains 4.6 x 10^7 bp of DNA (about 10 times the genome size of E. coli). This is equivalent to 14,000 µm of extended DNA. In its most condensed state during mitosis, the chromosome is about 2 µm long. This gives a packing ratio of 7000 (14,000/2).

To achieve the overall packing ratio, DNA is not packaged directly into the final structure of chromatin. Instead, it contains several hierarchies of organization. The first level of packing is achieved by the winding of DNA around a protein core to produce a "bead-like" structure called a nucleosome. This gives a packing ratio of about 6. This structure is invariant in both the euchromatin and heterochromatin of all chromosomes. The second level of packing is the coiling of beads in a helical structure called the 30 nm fiber that is found in both interphase chromatin and mitotic chromosomes. This structure increases the packing ratio to about 40. The final packaging occurs when the fiber is organized in loops, scaffolds and domains that give a final packing ratio of about 1000 in interphase chromosomes and about 10,000 in mitotic chromosomes.

Eukaryotic chromosomes consist of a DNA-protein complex that is organized in a compact manner which permits the large amount of DNA to be stored in the nucleus of the cell. The subunit designation of the chromosome is chromatin. The fundamental unit of chromatin is the nucleosome.
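The packing-ratio arithmetic can be reproduced with a short Python sketch. The function names are made up for the example, and the default spacing of 0.34 nm per base pair is an assumption taken from the 3.4 A residue spacing of the double helix.

```python
def packing_ratio(extended_length, packaged_length):
    """Length of the DNA divided by the length into which it is packaged."""
    return extended_length / packaged_length


def extended_length_um(base_pairs, rise_nm=0.34):
    """Approximate length of fully extended DNA in micrometres.

    rise_nm is the assumed spacing between base pairs (0.34 nm = 3.4 A).
    """
    return base_pairs * rise_nm / 1000.0


if __name__ == "__main__":
    # Figures from the text: 14,000 um of DNA in a 2 um mitotic chromosome.
    print("packing ratio:", packing_ratio(14_000, 2))
    # Cross-check of the extended length from the base-pair count.
    print("estimated extended length (um):", round(extended_length_um(4.6e7)))
```

Under that spacing assumption the estimate for 4.6 x 10^7 bp comes out near 15,600 µm, the same order of magnitude as the rounded 14,000 µm figure used in the text.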

The structure is an open one, and its water content is rather high. At lower water contents we would expect the bases to tilt so that the structure could become more compact.

The novel feature of the structure is the manner in which the two chains are held together by the purine and pyrimidine bases. The planes of the bases are perpendicular to the fibre axis. They are joined together in pairs, a single base from one chain being hydrogen-bonded to a single base from the other chain, so that the two lie side by side with identical z-co-ordinates. One of the pair must be a purine and the other a pyrimidine for bonding to occur. The hydrogen bonds are made as follows: purine position 1 to pyrimidine position 1; purine position 6 to pyrimidine position 6.

If it is assumed that the bases only occur in the structure in the most plausible tautomeric forms (that is, with the keto rather than the enol configurations) it is found that only specific pairs of bases can bond together. These pairs are: adenine (purine) with thymine (pyrimidine), and guanine (purine) with cytosine (pyrimidine).

In other words, if an adenine forms one member of a pair, on either chain, then on these assumptions the other member must be thymine; similarly for guanine and cytosine. The sequence of bases on a single chain does not appear to be restricted in any way. However, if only specific pairs of bases can be formed, it follows that if the sequence of bases on one chain is given, then the sequence on the other chain is automatically determined.
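That the second chain is fixed by the first can be shown with a small Python sketch. The names are illustrative, and the partner strand is written in the opposite direction because the two chains run antiparallel.

```python
# Specific pairing: adenine with thymine, guanine with cytosine.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complementary_strand(sequence):
    """Return the partner strand implied by the pairing rules.

    The result is read in the opposite direction to the input,
    since the two chains of the helix run in opposite directions.
    """
    return "".join(PAIRS[base] for base in reversed(sequence.upper()))


if __name__ == "__main__":
    strand = "AATTGCCTTTTAAAAA"
    partner = complementary_strand(strand)
    print(strand, "pairs with", partner)
    # Applying the rule twice recovers the original chain.
    print(complementary_strand(partner) == strand)
```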

It has been found experimentally (3,4) that the ratio of the amounts of adenine to thymine, and the ratio of guanine to cytosine, are always very close to unity for deoxyribose nucleic acid.

It is probably impossible to build this structure with a ribose sugar in place of the deoxyribose, as the extra oxygen atom would make too close a van der Waals contact. The previously published X-ray data (5,6) on deoxyribose nucleic acid are insufficient for a rigorous test of our structure. So far as we can tell, it is roughly compatible with the experimental data, but it must be regarded as unproved until it has been checked against more exact results. Some of these are given in the following communications. We were not aware of the details of the results presented there when we devised our structure, which rests mainly though not entirely on published experimental data and stereochemical arguments.

It has not escaped our notice that the specific pairing we have postulated immediately suggests a possible copying mechanism for the genetic mate