Microbiology Reader
Equipment to run microbiology work automatically

Growth Curves of any strain.
Microbiological calculations.


What Is a Genome?

In biology, the genome of an organism is the whole of its hereditary information, encoded in DNA (or, for some viruses, RNA). This includes both the genes and the non-coding sequences.

More precisely, the genome of an organism is a complete DNA sequence of one set of chromosomes; for example, one of the two sets that a diploid individual carries in every somatic cell. When people say that the genome of a sexually reproducing species has been "sequenced," they are typically referring to a determination of the sequences of one set of autosomes and one of each type of sex chromosome, which together represent both of the possible sexes. Even in species that exist in only one sex, what is described as "a genome sequence" may be a composite from the chromosomes of various individuals. In general use, the phrase "genetic makeup" is sometimes used to mean the genome of a particular individual or organism. The study of the global properties of genomes of related organisms is usually referred to as genomics, which distinguishes it from genetics, which generally studies the properties of single genes or groups of genes.

Types of genomes

Most biological entities more complex than a virus sometimes or always carry additional genetic material besides that which resides in their chromosomes. In some contexts, such as sequencing the genome of a pathogenic microbe, "genome" is meant to include this auxiliary material, which is carried in plasmids. In such circumstances, "genome" describes all of the genes and non-coding DNA that have the potential to be present.

In vertebrates such as humans, however, "genome" carries the typical connotation of only chromosomal DNA. So although human mitochondria contain genes, these genes are not considered part of the genome. In fact, mitochondria are sometimes said to have their own genome, often referred to as the "mitochondrial genome".

Genomes and genetic variation

Note that a genome does not capture the genetic diversity or the genetic polymorphism of a species. For example, the human genome sequence in principle could be determined from just half the DNA of one cell from one individual. To learn what variations in DNA underlie particular traits or diseases requires comparisons across individuals. This point explains the common usage of "genome" (which parallels a common usage of "gene") to refer not to any particular DNA sequence, but to a whole family of sequences that share a biological context.

Although this concept may seem counterintuitive, it is the same concept that says there is no particular shape that is the shape of a cheetah. Cheetahs vary, and so do the sequences of their genomes. Yet both the individual animals and their sequences share commonalities, so one can learn something about cheetahs and "cheetah-ness" from a single example of either.

Genome projects

The Human Genome Project was organized to map and to sequence the human genome. Other genome projects have covered the mouse, rice, the plant Arabidopsis thaliana, the puffer fish, and bacteria such as E. coli. The cost of sequencing continues to drop, and it is possible that eventually an individual's genome could be sequenced for a few thousand US dollars.

Genome evolution

Genomes are more than the sum of an organism's genes and have traits that may be measured and studied without reference to the details of any particular genes and their products. Researchers compare traits such as chromosome number, chromosome size, gene order, codon usage bias, and G-C content to determine what mechanisms could have produced the great variety of genomes that exist today.
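G-C content is one of the simplest of these genome-wide traits to measure. The following is a minimal sketch, assuming the sequence is available as a plain string of bases; the function name and the example sequence are made up for illustration, and ambiguous bases such as N are simply ignored.

```python
from collections import Counter


def gc_content(seq: str) -> float:
    """Return the fraction of G and C among the unambiguous bases A, C, G, T."""
    counts = Counter(seq.upper())
    gc = counts["G"] + counts["C"]
    total = gc + counts["A"] + counts["T"]
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return gc / total


# Made-up sequence, for illustration only.
example = "ATGCGCGATTAACG"
print(f"G-C content: {gc_content(example):.1%}")
```

Comparing this value across genomes, or in sliding windows along one genome, is a common first step before looking at finer traits such as codon usage.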

Duplications play a major role in shaping the genome. Duplications may range from extension of short tandem repeats, to duplication of a cluster of genes, and all the way to duplications of entire chromosomes or even entire genomes. Such duplications are probably fundamental to the creation of genetic novelty.

Horizontal gene transfer is invoked to explain the often extreme similarity between small portions of the genomes of two organisms that are otherwise very distantly related. Horizontal gene transfer seems to be common among many microbes. Also, eukaryotic cells seem to have experienced a transfer of some genetic material from their chloroplast and mitochondrial genomes to their nuclear chromosomes.

Genomics is the study of an organism's genome and the use of the genes. It deals with the systematic use of genome information, associated with other data, to provide answers in biology, medicine, and industry.

Genomics has the potential of offering new therapeutic methods for the treatment of some diseases, as well as new diagnostic methods. Other applications are in the food and agriculture sectors. The major tools and methods related to genomics are bioinformatics, genetic analysis, measurement of gene expression, and determination of gene function.

Genomics appeared in the 1980s and took off in the 1990s with the initiation of genome projects for several species. The related field of genetics is the study of genes and their role in inheritance.

The first genome to be sequenced in its entirety was that of bacteriophage ΦX174 (5,386 bp) in 1977. The first free-living organism to have its genome sequenced was Haemophilus influenzae (1.8 Mb) in 1995, and since then genomes have been sequenced at a rapid pace. A rough draft of the human genome was completed by the Human Genome Project in early 2001 amid much fanfare.

Comparison of genomes has resulted in some surprising biological discoveries. If a particular DNA sequence or pattern is present among many members of a clade, that sequence is said to have been conserved among the species. Evolutionary conservation of a DNA sequence may imply that it confers a relative selective advantage on the organisms that possess it. Conservation also suggests that the sequence has functional significance: it may be a protein-coding sequence or a regulatory region. Experimental investigation of some of these sequences has shown that some are transcribed into small RNA molecules, although the functions of these RNAs were not immediately apparent.

The identification of similar sequences (including many genes) in two distantly related organisms, but not in other members of one of the clades, has led to the theory that these sequences were acquired by horizontal gene transfer. This phenomenon is most prominent in thermophilic bacteria, where it seems that genes were transferred from Archaea to Eubacteria. It has also been noticed that bacterial genes exist in eukaryotic nuclear genomes and that these genes generally encode mitochondrial and plastid proteins, giving support to the endosymbiotic theory of the origin of these organelles.

It is often stated that a particular organism shares X percent of its DNA with humans. This number indicates the percentage of base pairs that are identical between the two species.

Such numbers usually come from various secondary sources; the underlying data may have originated from measures of DNA-DNA hybridization or from direct sequence comparisons.
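For the direct-comparison case, the percentage of identical base pairs can be sketched as follows. This is an illustrative simplification, assuming the two sequences have already been aligned to equal length with "-" marking gaps; the function name is hypothetical, and published figures depend heavily on how gaps and unalignable regions are treated.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of aligned columns with identical bases, skipping gap columns."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue  # assumption: gap columns are excluded from the count
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * matches / compared


# Made-up aligned fragments, for illustration only.
print(percent_identity("ACGT-ACGT", "ACGTTACCT"))
```

Counting gap columns as mismatches instead would lower the figure, which is one reason different sources quote different percentages for the same pair of species.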

Informally, "omics" is a neologism referring to a field of study in biology whose name ends in the suffix -omics, such as genomics or proteomics. The related neologism "omes" refers to the objects of study of such fields, such as the genome or proteome, respectively (omes stems from the Greek for 'all', 'every' or 'complete').

The original use of the suffix "ome" was in the word "genome", which refers to the complete genetic makeup of an organism. Because of the success of large-scale quantitative biology projects such as genome sequencing, the suffix "ome" has been extended to a host of other contexts. Bioinformaticians and some molecular biologists were amongst the first scientists to start to apply the "ome" suffix widely.

New 'omes' are currently being defined by people in the field of bioinformatics. Observers have claimed that they vie for the most ridiculous 'ome', much like the humorous names assigned to genes in the field of Drosophila developmental genetics. For example, son of sevenless and Darth Vader are both genes involved in regulating the cell cycle.

The omes are a useful way for computational biologists to encapsulate a particular class of cellular processes, or information processing related mechanisms.

Bioinformatics, or computational biology, is the use of mathematical and informational techniques, including statistics, to solve biological problems, usually by creating or using computer programs, mathematical models or both. One of the main areas of bioinformatics is the data mining and analysis of the data gathered by the various genome projects. Other areas are sequence alignment, protein structure prediction, systems biology, protein-protein interactions and virtual evolution. In summary, the various genome projects produce many long lists of letters, and one of the roles of bioinformatics is to attempt to determine the words, grammar, sentences and, ultimately, meaning (functional significance) of those letters. Many hope that developments in this field will ultimately help in the discovery of cures for various diseases, including cancer.

Since the Epstein-Barr virus was sequenced in 1984, the DNA sequences of more and more organisms have been stored in electronic databases. These data are analyzed to identify genes that code for proteins, as well as regulatory sequences. A comparison of genes within a species or between different species can show similarities between protein functions, or relations between species (the use of molecular systematics to construct phylogenetic trees). With the growing amount of data, it has become impossible to analyze DNA sequences manually. Today, computer programs are used to find similar sequences in the genomes of dozens of organisms, within billions of nucleotides. These programs can compensate for mutations (exchanged, deleted or inserted bases) in the DNA sequence, in order to identify sequences that are related, but not identical.

A variant of this sequence alignment is used in the sequencing process itself. So-called shotgun sequencing (used, for example, by Celera Genomics to sequence the human genome) does not give a sequential list of nucleotides, but instead the sequences of thousands of small DNA fragments (each about 600 nucleotides long). The ends of these fragments overlap and, aligned in the right way, make up the complete genome. Shotgun sequencing yields sequence data quickly, but the task of reassembling the fragments can be quite complicated for larger genomes. In the case of the Human Genome Project, it took several months on a supercomputer array to align them correctly. Shotgun sequencing is generally preferred for smaller genomes, such as those of bacteria, and is often used at least partially on organisms with much larger genomes.
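The idea of rebuilding a sequence from overlapping fragments can be illustrated with a toy greedy merge. This is only a sketch under strong assumptions (error-free reads, exact overlaps, a single strand, no repeats long enough to mislead the merge); it is not the method used by any real assembler, and the function names and the example fragments are made up.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that is also a prefix of b, or 0."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(fragments: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of fragments with the largest exact overlap."""
    contigs = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while True:
        best_n, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


# Made-up fragments, for illustration only.
reads = ["CGTTAGC", "ATGGCGTA", "CGTACGTT"]
print(greedy_assemble(reads))
```

Every round compares all pairs of contigs, so the cost grows quickly with the number of fragments. Real reads also contain errors and repeated regions, which is part of why reassembly becomes complicated for larger genomes.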

Another aspect of bioinformatics in sequence analysis is the automatic search for genes and regulatory sequences within a genome. Not all of the nucleotides within a genome are genes. Within the genome of higher organisms, large parts of the DNA do not serve any obvious purpose. This so-called junk DNA may, however, contain unrecognized functional elements. Bioinformatics helps to bridge the gap between genome and proteome projects, for example in the use of DNA sequence for protein identification.
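A very simple form of automatic gene search is a scan for open reading frames (ORFs). The sketch below assumes the standard genetic code (start codon ATG; stop codons TAA, TAG, TGA) and looks only at the three forward reading frames; the function name, the minimum-length threshold and the example sequence are illustrative assumptions. Real gene finders also examine the reverse strand and use statistical models, and eukaryotic genes interrupted by introns are not found this way.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 3) -> list[tuple[int, int, str]]:
    """Return (start, end, sequence) for forward-strand ORFs.

    An ORF here runs from the first ATG in a frame through the next
    in-frame stop codon; end is exclusive and min_codons counts the stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, seq[start:i + 3]))
                start = None
    return orfs


# Made-up sequence, for illustration only.
print(find_orfs("CCATGAAATTTTAGGG"))
```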

Biochemistry, 2000 Sep 5, 39(35), 10877 - 83
Sensitive monitoring of the dynamics of a membrane-bound transport protein by tryptophan phosphorescence spectroscopy; Broos J et al.; This paper presents a tryptophan phosphorescence spectroscopy study on the membrane-bound mannitol transporter, EII(mtl), from E. coli. The protein contains four tryptophans at positions 30, 42, 109, and 117. Phosphorescence decays in buffer at 1 degrees C revealed large variations of the triplet lifetimes of the wild-type protein and four single-tryptophan-containing mutants. They ranged from <70 microseconds for the tryptophan at position 109 to 55 ms for the residue at position 30, attesting to widely different flexibilities of the tryptophan microenvironments. The decay of all tryptophans is multiexponential, reflecting multiple stable conformations of the protein. Both mannitol binding and enzyme phosphorylation had large effects on the triplet lifetimes. Mannitol binding induces a more ordered structure near the mannitol binding site, and the decay becomes significantly more homogeneous. In contrast, enzyme phosphorylation induces a large relaxation of the protein structure at the reporter sites. The implications of these structural changes on the coupling mechanism between the transport and the phosphorylation activity of EII(mtl) are discussed. Taken as a whole, our data show that tryptophan phosphorescence spectroscopy is a very sensitive technique to explore conformational dynamics in membrane proteins.

Biochemistry, 2000 Sep 5, 39(35), 10812 - 22
Proteolysis of the exodomain of recombinant protease-activated receptors: prediction of receptor activation or inactivation by MALDI mass spectrometry; Loew D et al.; Protease-activated receptors (PARs) mediate cell activation after proteolytic cleavage of their extracellular amino terminus. Thrombin selectively cleaves PAR1, PAR3, and PAR4 to induce activation of platelets and vascular cells, while PAR2 is preferentially cleaved by trypsin. In pathological situations, other proteolytic enzymes may be generated in the circulation and could modify the responses of PARs by cleaving their extracellular domains. To assess the ability of such proteases to activate or inactivate PARs, we designed a strategy for locating cleavage sites on the exofacial NH(2)-terminal fragments of the receptors. The first extracellular segments of PAR1 (PAR1E) and PAR2 (PAR2E) expressed as recombinant proteins in Escherichia coli were incubated with a series of proteases likely to be encountered in the circulation during thrombosis or inflammation. Kinetic and dose-response studies were performed, and the cleavage products were analyzed by MALDI-TOF mass spectrometry. Thrombin cleaved PAR1E at the Arg41-Ser42 activation site at concentrations known to induce cellular activation, supporting a native conformation of the recombinant polypeptide. Plasmin, calpain and leukocyte elastase, cathepsin G, and proteinase 3 cleaved at multiple sites and would be expected to disable PAR1 by cleaving COOH-terminal to the activation site. Cleavage specificities were further confirmed using activation site defective PAR1E S42P mutant polypeptides. Surface plasmon resonance studies on immobilized PAR1E or PAR1E S42P were consistent with cleavage results obtained in solution and allowed us to determine affinities of PAR1E-thrombin binding. FACS analyses of intact platelets confirmed the cleavage of PAR1 downstream of the Arg41-Ser42 site. Mass spectrometry studies of PAR2E predicted activation of PAR2 by trypsin through cleavage at the Arg36-Ser37 site, no effect of thrombin, and inactivation of the receptor by plasmin, calpain and leukocyte elastase, cathepsin G, and proteinase 3. The inhibitory effect of elastase was confirmed on native PAR1 and PAR2 on the basis of Ca(2+) signaling studies in endothelial cells. It was concluded that none of the main proteases generated during fibrinolysis or inflammation appears to be able to signal through PAR1 or PAR2. This strategy provides results which can be extended to the native receptor to predict its activation or inactivation, and it could likewise be used to study other PARs or protease-dependent processes.

Biochemistry, 2000 Sep 5, 39(35), 10730 - 8
Steady-state kinetic mechanism of recombinant avocado ACC oxidase: initial velocity and inhibitor studies; Brunhuber NM et al.; The gaseous plant hormone ethylene modulates a wide range of biological processes, including fruit ripening. It is synthesized by the ascorbate-dependent oxidation of 1-aminocyclopropyl-1-carboxylate (ACC), a reaction catalyzed by ACC oxidase. Recombinant avocado (Persea americana) ACC oxidase was expressed in Escherichia coli and purified in milligram quantities, resulting in high levels of ACC oxidase protein and enzyme activity. An optimized assay for the purified enzyme was developed that takes into account the inherent complexities of the assay system. Fe(II) and ascorbic acid form a binary complex that is not the true substrate for the reaction and enhances the degree of ascorbic acid substrate inhibition. The K(d) value for Fe(II) (40 nM, free species) and the K(m)'s for ascorbic acid (2.1 mM), ACC (62 microM), and O(2) (4 microM) were determined. Fe(II) and ACC exhibit substrate inhibition, and a second metal binding site is suggested. Initial velocity measurements and inhibitor studies were used to resolve the kinetic mechanism through the final substrate binding step. Fe(II) binding is followed by either ascorbate or ACC binding, with ascorbate being preferred. This is followed by the ordered addition of molecular oxygen and the last substrate, leading to the formation of the catalytically competent complex. Both Fe(II) and O(2) are in thermodynamic equilibrium with their enzyme forms. The binding of a second molecule of ascorbic acid or ACC leads to significant substrate inhibition. ACC and ascorbate analogues were used to confirm the kinetic mechanism and to identify important determinants of substrate binding.

Biochemistry, 2000 Sep 5, 39(35), 10711 - 9
Interaction of flavodoxin with cobalamin-dependent methionine synthase; Hall DA et al.; Cobalamin-dependent methionine synthase catalyzes the transfer of a methyl group from methyltetrahydrofolate to homocysteine, forming tetrahydrofolate and methionine. The Escherichia coli enzyme, like its mammalian homologue, is occasionally inactivated by oxidation of the cofactor to cob(II)alamin. To return to the catalytic cycle, the cob(II)alamin forms of both the bacterial and mammalian enzymes must be reductively remethylated. Reduced flavodoxin donates an electron for this reaction in E. coli, and S-adenosylmethionine serves as the methyl donor. In humans, the electron is thought to be provided by methionine synthase reductase, a protein containing a domain with a significant degree of homology to flavodoxin. Because of this homology, studies of the interactions between E. coli flavodoxin and methionine synthase provide a model for the mammalian system. To characterize the binding interface between E. coli flavodoxin and methionine synthase, we have employed site-directed mutagenesis and chemical cross-linking using carbodiimide and N-hydroxysuccinimide. Glutamate 61 of flavodoxin is identified as a cross-linked residue, and lysine 959 of the C-terminal activation domain of methionine synthase is assigned as its partner. The mutation of lysine 959 to threonine results in a diminished level of cross-linking, but has only a small effect on the affinity of methionine synthase for flavodoxin. Identification of these cross-linked residues provides evidence in support of a docking model that will be useful in predicting the effects of mutations observed in mammalian homologues of E. coli flavodoxin and methionine synthase.

Biochemistry, 2000 Sep 5, 39(35), 10702 - 10
Structural analysis of glyceraldehyde 3-phosphate dehydrogenase from Escherichia coli: direct evidence of substrate binding and cofactor-induced conformational changes; Yun M et al.; The crystal structures of glyceraldehyde 3-phosphate dehydrogenase (GAPDH) from Escherichia coli have been determined in three different enzymatic states, NAD(+)-free, NAD(+)-bound, and hemiacetal intermediate. The NAD(+)-free structure reported here has been determined from monoclinic and tetragonal crystal forms. The conformational changes in GAPDH induced by cofactor binding are limited to the residues that bind the adenine moiety of NAD(+). Glyceraldehyde 3-phosphate (GAP), the substrate of GAPDH, binds to the enzyme with its C3 phosphate in a hydrophilic pocket, called the "new P(i)" site, which is different from the originally proposed binding site for inorganic phosphate. This observed location of the C3 phosphate is consistent with the flip-flop model proposed for the enzyme mechanism [Skarzynski, T., Moody, P. C., and Wonacott, A. J. (1987) J. Mol. Biol. 193, 171-187]. Via incorporation of the new P(i) site in this model, it is now proposed that the C3 phosphate of GAP initially binds at the new P(i) site and then flips to the P(s) site before hydride transfer. A superposition of NAD(+)-bound and hemiacetal intermediate structures reveals an interaction between the hydroxyl oxygen at the hemiacetal C1 of GAP and the nicotinamide ring. This finding suggests that the cofactor NAD(+) may stabilize the transition state oxyanion of the hemiacetal intermediate in support of the flip-flop model for GAP binding.

Biochemistry, 2000 Sep 5, 39(35), 10662 - 76
Evolution of enzymatic activity in the enolase superfamily: structure of o-succinylbenzoate synthase from Escherichia coli in complex with Mg2+ and o-succinylbenzoate; Thompson TB et al.; The X-ray structures of the ligand free (apo) and the Mg(2+)*o-succinylbenzoate (OSB) product complex of o-succinylbenzoate synthase (OSBS) from Escherichia coli have been solved to 1.65 and 1.77 Å resolution, respectively. The structure of apo OSBS was solved by multiple isomorphous replacement in space group P2(1)2(1)2(1); the structure of the complex with Mg(2+)*OSB was solved by molecular replacement in space group P2(1)2(1)2. The two domain fold found for OSBS is similar to those found for other members of the enolase superfamily: a mixed alpha/beta capping domain formed from segments at the N- and C-termini of the polypeptide and a larger (beta/alpha)(7)beta barrel domain. Two regions of disorder were found in the structure of apo OSBS: (i) the loop between the first two beta-strands in the alpha/beta domain; and (ii) the first sheet-helix pair in the barrel domain. These regions are ordered in the product complex with Mg(2+)*OSB. As expected, the Mg(2+)*OSB pair is bound at the C-terminal end of the barrel domain. The electron density for the phenyl succinate component of the product is well-defined; however, the 1-carboxylate appears to adopt multiple conformations. The metal is octahedrally coordinated by Asp(161), Glu(190), and Asp(213), two water molecules, and one oxygen of the benzoate carboxylate group of OSB. The loop between the first two beta-strands in the alpha/beta motif interacts with the aromatic ring of OSB. Lys(133) and Lys(235) are positioned to function as acid/base catalysts in the dehydration reaction. Few hydrogen bonding or electrostatic interactions are involved in the binding of OSB to the active site; instead, most of the interactions between OSB and the protein are either indirect via water molecules or via hydrophobic interactions. As a result, evolution of both the shape and the volume of the active site should be subject to few structural constraints. This would provide a structural strategy for the evolution of new catalytic activities in homologues of OSBS and a likely explanation for how the OSBS from Amycolatopsis also can catalyze the racemization of N-acylamino acids [Palmer, D. R., Garrett, J. B., Sharma, V., Meganathan, R., Babbitt, P. C., and Gerlt, J. A. (1999) Biochemistry 38, 4252-4258].

Biochemistry, 2000 Sep 5, 39(35), 10656 - 61
Site-directed sulfhydryl labeling of the lactose permease of Escherichia coli: helix X; Venkatesan P et al.; Helix X in the lactose permease of Escherichia coli contains two residues that are irreplaceable with respect to active transport, His322 and Glu325, as well as Lys319, which is charge-paired with Asp240 in helix VII. Structural and dynamic features of transmembrane helix X are investigated here by site-directed thiol modification of 14 single-Cys replacement mutants with N-[(14)C]ethylmaleimide (NEM) in right-side-out membrane vesicles. Permease mutants with a Cys residue at position 326, 327, 329, 330, or 331 in the cytoplasmic half of the transmembrane domain are alkylated by NEM at 25 degrees C, a mutant with Cys at position 315 at the periplasmic surface is labeled in the presence of substrate exclusively, and mutants with Cys at positions 317, 318, 320, 321, 324, 328, 332, or 333 do not react with NEM under the conditions tested. Binding of substrate causes increased labeling of a Cys residue at position 315 and decreased labeling of Cys residues at positions 326, 327, and 329. Studies with methanethiosulfonate ethylsulfonate indicate that Cys residues at positions 326, 329, 330, and 331 in the cytoplasmic half are accessible to the aqueous phase from the periplasmic face of the membrane. Ligand binding results in clear attenuation of solvent accessibility of Cys at position 326 and a marginal increase in accessibility of Cys at position 327 to solvent. The findings indicate that the cytoplasmic half of helix X is more reactive/accessible to thiol reagents and more exposed to solvent than the periplasmic half. Furthermore, positions that reflect ligand-induced conformational changes are located on the same face of helix X as Lys319, His322, and Glu325.

Biochemistry, 2000 Sep 5, 39(35), 10649 - 55
Site-directed sulfhydryl labeling of the lactose permease of Escherichia coli: N-ethylmaleimide-sensitive face of helix II; Venkatesan P et al.; Cys-scanning mutagenesis of helix II in the lactose permease of Escherichia coli [Frillingos, S., Sun, J. et al. (1997) Biochemistry 36, 269-273] indicates that one face contains positions where Cys replacement or Cys replacement followed by treatment with N-ethylmaleimide (NEM) significantly inactivates the protein. In this study, site-directed sulfhydryl modification is utilized in situ to study this face of helix II. [(14)C]NEM labeling of 13 single-Cys mutants, including the nine NEM-sensitive Cys replacements, in right-side-out membrane vesicles is examined. Permease mutants with a single-Cys residue in place of Gly46, Phe49, Gln60, Ser67, or Leu70 are alkylated by NEM at 25 degrees C in 10 min, and mutants with Cys in place of Thr45 and Ser53 are labeled only in the presence of ligand, while mutants with Cys in place of Ile52, Ser56, Leu57, Leu62, Phe63, or Leu65 do not react. Binding of substrate leads to a marked increase in labeling of Cys residues at positions 45, 49, or 53 in the periplasmic half of helix II and a slight decrease in labeling of Cys residues at positions 60 or 67 in the cytoplasmic half. Labeling studies with methanethiosulfonate ethylsulfonate (MTSES) show that positions 45 and 53 are accessible to solvent in the presence of ligand only, while positions 46, 49, 67, and 70 are accessible to solvent in the absence or presence of ligand. Position 60 is also exposed to solvent, and substrate binding causes a decrease in solvent accessibility. The findings demonstrate that the NEM-sensitive face of helix II participates in ligand-induced conformational changes. Remarkably, this membrane-spanning face is accessible to the aqueous phase from the periplasmic side of the membrane. In the following paper in this issue [Venkatesan, P., Hu, Y., and Kaback, H. R. (2000) Biochemistry 39, 10656-10661], the approach is applied to helix X.

Biochemistry, 2000 Sep 5, 39(35), 10641 - 8
Site-directed sulfhydryl labeling of the lactose permease of Escherichia coli: helix VII; Venkatesan P et al.; Site-directed sulfhydryl modification in situ is employed to investigate structural and dynamic features of transmembrane helix VII and the beginning of the periplasmic loop between helices VII and VIII (loop VII/VIII). Essentially all of the Cys-replacement mutants in the periplasmic half of the helix and the portion of loop VII/VIII tested are labeled by N-[(14)C]ethylmaleimide (NEM). In contrast, with the exception of two mutants at the cytoplasmic end of helix VII, none of the mutants in the cytoplasmic half react with the alkylating agent. Labeling of most of the mutants is unaltered by ligand at 25 degrees C. However, at 4 degrees C, conformational changes induced by substrate binding become apparent. In the presence of ligand, permease mutants with a Cys residue at position 241, 242, 244, 245, 246, or 248 undergo a marked increase in labeling, while the reactivity of a Cys at position 238 is slightly decreased. Labeling of the remaining Cys-replacement mutants is unaffected by ligand. Studies with methanethiosulfonate ethylsulfonate (MTSES), a hydrophilic impermeant thiol reagent, show that most of the positions that react with NEM are accessible to MTSES; however, the two NEM-reactive mutants at the cytoplasmic end of helix VII and position 236 in the middle of the membrane-spanning domain are not. The findings demonstrate that positions in helix VII that reflect ligand-induced conformational changes are located in the periplasmic half and accessible to the aqueous phase from the periplasmic face of the membrane. In the following papers in this issue (Venkatesan, P., Lui, Z., Hu, Y., and Kaback, H. R.; Venkatesan, P., Hu, Y., and Kaback, H. R.), the approach is applied to helices II and X.

Biochemistry, 2000 Sep 5, 39(35), 10613 - 8
Interaction between two discontiguous chain segments from the beta-sheet of Escherichia coli thioredoxin suggests an initiation site for folding; Tasayco ML et al.; The approach of comparing folding and folding/binding processes is exquisitely poised to narrow down the regions of the sequence that drive protein folding. We have dissected the small single alpha/beta domain of oxidized Escherichia coli thioredoxin (Trx) into three complementary fragments (N, residues 1-37; M, residues 38-73; and C, residues 74-108) to study them in isolation and upon recombination by far-UV CD and NMR spectroscopy. The isolated fragments show a minimum of ellipticity of ca. 197 nm in their far-UV CD spectra without concentration dependence, chemical shifts of H(alpha) that are close to the random coil values, and no medium- and long-range NOE connectivities in their three-dimensional NMR spectra. These fragments behave as disordered monomers. Only the far-UV CD spectra of binary or ternary mixtures that contain N- and C-fragments are different from the sum of their individual spectra, which is indicative of folding and/or binding of these fragments. Indeed, the cross-peaks corresponding to the rather hydrophobic beta(2) and beta(4) regions of the beta-sheet of Trx disappear from the (1)H-(15)N HSQC spectra of isolated labeled N- and C-fragments, respectively, upon addition of the unlabeled complementary fragments. The disappearing cross-peaks indicate interactions between the beta(2) and beta(4) regions, and their reappearance at lower temperatures indicates unfolding and/or dissociation of heteromers that are predominantly held by hydrophobic forces. Our results argue that the folding of Trx begins by zippering two discontiguous and rather hydrophobic chain segments (beta(2) and beta(4)) corresponding to neighboring strands of the native beta-sheet.

Enzyme Microb Technol, 2000 Oct 1, 27(7), 475 - 481
Molecular cloning and expression of a novel family A endoglucanase gene from Fibrobacter succinogenes S85 in Escherichia coli; Cho KK et al.; A Fibrobacter succinogenes S85 gene that encodes endoglucanase hydrolysing CMC and xylan was cloned and expressed in Escherichia coli DH5 by using pUC19 vector. Recombinant plasmid DNA from a positive clone hydrolysing CMC and xylan was designated as pCMX1, harboring 2,043 bp insert. The entire nucleotide sequence was determined, and an open-reading frame (ORF) was deduced. The nucleotide sequence accession number of the cloned gene sequence in Genbank is U94826. The endoglucanase gene cloned in this study does not have amino sequence homology to the other endoglucanase genes from F. succinogenes S85, but does show sequence homology to family 5 (family A) of glycosyl hydrolases from several species. The ORF encodes a polypeptide of 654 amino acids with a measured molecular weight of 81.3 kDa on SDS-PAGE. Putative signal sequences, Shine-Dalgarno-type ribosomal binding site and promoter sequences (-10) related to the consensus promoter sequences were deduced. The recombinant endoglucanase by E. coli harboring pCMX1 was partially purified and characterized. N-terminal sequences of endoglucanase were Ala-Gln-Pro-Ala-Ala, matched with deduced amino sequences. The temperature range and pH for optimal activity of the purified enzyme were 55-65 degrees C and 5.5, respectively. The enzyme was most stable at pH 6 but unstable under pH 4 with a K(m) value of 0.49% CMC and a V(max) value of 152 U/mg.

Curr Microbiol, 2000 Oct, 41(4), 295 - 9
Overexpression of protein disulfide isomerase in Aspergillus; El-Adawi H et al.; One of the major problems with the production of biotechnologically valuable proteins has been the purification of the product . For Escherichia coli and Saccharomyces cerevisiae, there are several techniques for the purification of intracellular proteins, but these are time consuming and often result in poor yields . Purification can be considerably facilitated, if the product is secreted from the host cell . In the work presented, we have constructed an expression vector (pSGNH2) for the secretion of protein disulfide isomerase (PDI; EC 5.3.4.1) from Aspergillus niger, in which the retention signal His-Asp-Glu-Leu (H-D-E-L) was modified to Ala-Leu-Glu-Gln (A-L-E-Q) via the polymerase chain reaction (PCR) method . The PDI gene was placed under the control of the A . oryzae alpha-amylase promoter . This expression vector was transformed into A . niger NRRL3, resulting in PDI secretion into the medium . The catalytic activity of overexpressed PDI from A . niger was indistinguishable from that of PDI isolated from bovine liver . With further strain improvement and optimization of culture conditions, it could be possible to raise the PDI production to the bioprocessing scale.

J Ocul Pharmacol Ther, 2000 Aug, 16(4), 353 - 61
Constitutive cyclooxygenase-1 and induced cyclooxygenase-2 in isolated human iris inhibited by S(+) flurbiprofen; van Haeringen NJ et al.; The purpose of the present study was to characterize the isoforms of cyclooxygenase (COX) in the human iris before and after stimulation with lipopolysaccharide (LPS) and to determine the selectivity of the nonsteroidal anti-inflammatory drug (NSAID), S(+) flurbiprofen, for inhibition of COX-1 and COX-2 in homogenates of this tissue . Spotblots were made of extracts of human iris in the absence and presence of LPS plus acetylsalicylic acid (aspirin) . After reacting with anti-COX-1 and anti-COX-2 immunoglobulin G, the presence of both immunoreactive COX enzymes was substantiated using an indirect immunoperoxidase method . Authentic COX-1 and COX-2 were used as controls . Using an enzyme immune assay (EIA), the production of prostaglandin E2 (PGE2) was quantified in tissue homogenates of human iris under the same conditions as described above . S(+) flurbiprofen was added to tissue homogenates in order to determine the inhibitory effect on PGE2 production . Half maximal inhibitory concentrations (IC50) of S(+) flurbiprofen for the PGE2 production in the tissue homogenates were determined from concentration inhibition curves . The selectivity of S(+) flurbiprofen for inhibition of COX-1 was expressed as the ratio of IC50 for COX-2/COX-1 . Spotblots of nonstimulated iris-extracts showed positive staining for COX-1 immunoreactivity (-ir) only . After incubation with LPS plus acetylsalicylic acid, positive staining was observed for both COX-1-ir and COX-2-ir . Concentrations of PGE2 released from homogenates of untreated iris varied from 1.5-4 ng/ml, and of LPS-stimulated tissue from 10-20 ng/ml of assay mixture . S(+) flurbiprofen inhibited PGE2 production of untreated tissue homogenates at an IC50 of 8 x 10(-10) M whereas, in the stimulated tissue, IC50 was found to be 3 x 10(-6) M . 
The selectivity of S(+) flurbiprofen for inhibition of constitutively present COX-1, relative to the inhibition of induced COX-2, was 3,600 . Our results indicate that specific expression of COX isoforms in normal human iris was substantiated at the protein level by immunoreaction on spotblots . COX-1 represents the constitutively present enzyme, and COX-2 appears after stimulation with LPS . At the functional level, S(+) flurbiprofen possesses a specificity for COX-1 in inhibiting PGE2 production.

J Ocul Pharmacol Ther, 2000 Aug, 16(4), 345 - 52
Flurbiprofen and enantiomers in ophthalmic solution tested as inhibitors of prostanoid synthesis in human blood; van Haeringen NJ et al.; The purpose of this study was to assess the selectivity and potency of the nonsteroidal anti-inflammatory drug (NSAID), flurbiprofen, and its enantiomers in their inhibition of cyclooxygenase-1 (COX-1) and cyclooxygenase-2 (COX-2) . An assay was used with freshly drawn, heparinized human whole blood, incubated with 25 microM calcium ionophore A23187 during 60 min to produce thromboxane B2 (TXB2) by activity of COX-1 in platelets . Incubation with E . coli lipopolysaccharide (LPS) during 24 hr produced prostaglandin E2 (PGE2) by induction of COX-2 in monocytes, suppressing any possible contribution of COX-1 activity by the addition of acetylsalicylic acid . Concentration inhibition curves were determined with racemic, S(+), and R(-) flurbiprofen in final concentrations ranging from 10(-3) to 10(-10) M . The stereoselectivity of S(+) flurbiprofen vs . R(-) flurbiprofen, expressed as the reciprocal of the ratio of the concentrations giving 50% inhibition (IC50), is 340 for COX-1 and 56 for COX-2 . The selectivity for COX-1 vs . COX-2, expressed as the reciprocal ratio of the IC50, was 32 for racemic, 16 for S(+), and 5.3 for R(-) flurbiprofen . Meloxicam in the same assay showed COX-2 selectivity with a ratio of 0.19.

Mol Endocrinol, 2000 Sep, 14(9), 1377 - 86
A novel glucocorticoid receptor binding element within the murine c-myc promoter; Ma T et al.; In the course of analyzing the murine c-myc promoter response to glucocorticoid, we have identified a novel glucocorticoid response element that does not conform to the consensus glucocorticoid receptor-binding sequence . This c-myc promoter element has the sequence CAGGGTACATGGCGTATGTGTG, which has very little sequence similarity to any known response element . Glucocorticoids activate c-myc/reporter constructs that contain this element . Deletion of these sequences from the c-myc promoter increases basal activity of the promoter and blocks glucocorticoid induction . Insertion of this element into SV40/reporters inhibits basal reporter gene activity in the absence of glucocorticoids . Glucocorticoids stimulate activity of reporters that contain this element . Recombinant glucocorticoid receptor binds to this element in vitro . An unidentified cellular repressor also binds to this element . The activated glucocorticoid receptor displaces this protein(s) . We conclude that the glucocorticoid receptor binds to the c-myc promoter in competition with this protein, which is a repressor of transcription . To our knowledge, no glucocorticoid response element with such properties has ever been reported.

J Chromatogr A, 2000 Aug 18, 890(1), 37 - 43
High-performance chromatofocusing using linear and concave pH gradients formed with simple buffer mixtures . II . Separation of proteins; Kang X et al.; The separation of proteins using high-performance chromatofocusing with linear or concave pH gradients formed using simple mixtures of buffering species in the elution buffer is investigated experimentally . The separation achieved is comparable to that using polyampholyte elution buffers with these types of systems . More specifically, protein band widths at one half of the band height in the range between 0.1 and 0.025 pH units were observed, and good resolution was achieved of protein variants differing by a single amino acid residue in separation times of 30 min or less . An especially useful elution buffer is investigated that contains only four buffering species and that produces a linear pH gradient in the range between pH 9.5 and 6.0 when used together with a particular high-performance column packing made specifically for chromatofocusing . This elution buffer and column packing combination is evaluated by using it for the chromatofocusing of equine myoglobin and human hemoglobin variants . Additional applications are described in which a polyethyleneimine derivatized silica column packing and a pH gradient that is concave in shape are used for the separation of proteins in an E . coli cell lysate.

Ann Neurol, 2000 Sep, 48(3), 330 - 5
Late-onset optic atrophy, ataxia, and myopathy associated with a mutation of a complex II gene; Birch-Machin MA et al.; Genetic defects affecting the mitochondrial respiratory chain are an important cause of neurological disease . Previously, we identified a family with complex II deficiency and late-onset neurodegenerative disease with progressive optic atrophy, ataxia, and myopathy . The affected family members are now shown to carry a C-to-T transition in one allele of the nuclear gene encoding the flavoprotein subunit of complex II . Mutation of the equivalent base in Escherichia coli generates an inactive enzyme unable to bind flavin adenine dinucleotide covalently . Compatible with these findings, our patients have an approximate 50% decrease in complex II and succinate dehydrogenase activity . These results suggest that genetic defects of nuclear-encoded subunits of the mitochondrial respiratory chain can result in late-onset neurodegenerative disease.

Sheng Wu Gong Cheng Xue Bao, 2000 Mar, 16(2), 169 - 72
{Reactivation of denatured lysozyme with immobilized molecular chaperones GroE}; Dong XY et al.; The molecular chaperones GroEL and GroES were expressed in recombinant E. coli and purified by anion exchange chromatography. The renaturation of denatured lysozyme with free and immobilized GroEL/ES or GroEL was studied. We show here that free GroEL alone could reactivate the denatured lysozyme up to a relative activity of over 90%. The immobilized GroEL was also effective in promoting lysozyme refolding. Moreover, the optimal temperature (i.e., 37 degrees C) and pH (i.e., 6 to 8) for the immobilized GroEL-facilitated lysozyme refolding operation were determined. Under the optimal conditions, the activity of lysozyme could be recovered up to 85%. In addition, the immobilized GroEL was used five times in succession without loss of its renaturation ability, indicating its potential for use in practical downstream bioprocesses.

Sheng Wu Gong Cheng Xue Bao, 2000 Mar, 16(2), 158 - 60
{Cloning of taxadiene synthase cDNA from the cell line of Taxus cuspidata}; Hu GW et al.; Taxadiene synthase plays an important role in taxol biosynthesis. RT-PCR was used to clone a taxadiene synthase cDNA fragment from cells of T. cuspidata. The cDNA was cloned into the vector pGEM and transformed into E. coli JM109. The cloned cDNA, named pCBMZ, was further confirmed by Southern blotting assay and was sequenced. The result showed that the taxadiene synthase cDNA of Taxus cuspidata is highly homologous to that of Taxus brevifolia.

Sheng Wu Gong Cheng Xue Bao, 2000 Mar, 16(2), 150 - 4
{Research on renaturation of recombinant human pro-urokinase expressed from Escherichia coli}; Zhu H et al.; Recombinant human pro-urokinase forms insoluble inclusion bodies when overexpressed in Escherichia coli, and it must be denatured and renatured in vitro before it acquires activity. This study aimed to increase the renaturation yield of denatured pro-urokinase. We have evaluated the basic renaturation conditions of pro-urokinase through qualitative and quantitative analysis of pH, temperature, denaturant concentration, protein concentration, and the ratio of reduced to oxidized thiol reagents. The effects of nonspecific additives, step-wise dilution and urea gradient dialysis have also been compared. Optimal conditions for pro-urokinase renaturation, with a yield of about 20%-30%, have been obtained.

Sheng Wu Gong Cheng Xue Bao, 2000 Mar, 16(2), 134 - 6
{Cloning and sequence of cDNA encoding ACC synthase specifically expressed in banana fruit}; Wang XL et al.; A cDNA encoding ACC synthase in banana pulp was amplified by RT-PCR and cloned in E. coli. The 5' terminal region of the ACC synthase transcript was determined by using the 5' RACE procedure. The results showed that the ACC synthase cDNA in banana pulp is 1752 bp in length, including 74 bp of 5' untranslated region, 1461 bp of coding region, which encodes a polypeptide of 486 amino acids, and 217 bp of 3' untranslated region. Northern blot analysis indicates that the ACC synthase mRNA is specifically transcribed in banana fruits.

Cancer Gene Ther, 2000 Aug, 7(8), 1179 - 87
Escherichia coli thymidylate synthase expression protects human cells from the cytotoxic effects of 5-fluorodeoxyuridine more effectively than human thymidylate synthase overexpression; Parsels LA et al.; In this study, we compared the relative abilities of human thymidylate synthase (hTS) and Escherichia coli thymidylate synthase (eTS) expression to confer resistance to the cytotoxic effects of treatment with the TS inhibitor 5-fluorodeoxyuridine (FdUrd) . G418-selected clones expressing either form of the protein were significantly more resistant than the lacZ-expressing clone, VALZ2, to FdUrd-induced cytotoxicity . Although eTS-expressing clones expressed 2- to 3-fold more TS protein than hTS-overexpressing clones, the representative eTS-expressing clone, VAEG8, and hTS-overexpressing clone, VAHGC, were equally sensitive to an FdUrd-induced loss of clonogenicity; in addition, a large fraction of either form of exogenously expressed TS appeared to be inactive in the intact cell . The clones differed, however, in their responses to leucovorin (LV) . Although LV significantly enhanced FdUrd-induced TS inhibition, growth inhibition, and cytotoxicity in VAHGC cells, it had no effect on these parameters in VAEG8 cells . These results suggest that eTS may more efficiently confer resistance to FdUrd plus LV when expressed for the purposes of a "host protection" strategy in vivo.

Protein Sci, 2000 Aug, 9(8), 1559 - 66
Static light scattering studies of OmpF porin: implications for integral membrane protein crystallization; Hitscherich C Jr et al.; Integral membrane proteins carry out some of the most important functions of living cells, yet relatively few details are known about their structures . This is due, in large part, to the difficulties associated with preparing membrane protein crystals suitable for X-ray diffraction analysis . Mechanistic studies of membrane protein crystallization may provide insights that will aid in determining future membrane protein structures . Accordingly, the solution behavior of the bacterial outer membrane protein OmpF porin was studied by static light scattering under conditions favorable for crystal growth . The second osmotic virial coefficient (B22) was found to be a predictor of the crystallization behavior of porin, as has previously been found for soluble proteins . Both tetragonal and trigonal porin crystals were found to form only within a narrow window of B22 values located at approximately -0.5 to -2 X 10(-4) mol mL g(-2), which is similar to the "crystallization slot" observed for soluble proteins . The B22 behavior of protein-free detergent micelles proved very similar to that of porin-detergent complexes, suggesting that the detergent's contribution dominates the behavior of protein-detergent complexes under crystallizing conditions . This observation implies that, for any given detergent, it may be possible to construct membrane protein crystallization screens of general utility by manipulating the solution properties so as to drive detergent B22 values into the crystallization slot . Such screens would limit the screening effort to the detergent systems most likely to yield crystals, thereby minimizing protein requirements and improving productivity.

Protein Sci, 2000 Aug, 9(8), 1530 - 9
Function of a conserved sequence motif in biotin holoenzyme synthetases; Kwon K et al.; The biotin holoenzyme synthetases (BHS) are essential enzymes in all organisms that catalyze post-translational linkage of biotin to biotin-dependent carboxylases . The primary sequences of a large number of these enzymes are now available and homologies are found among all . The glycine-rich sequence, GRGRXG, constitutes one of the homologous regions in these enzymes and, based on its similarity to sequences found in a number of mononucleotide binding enzymes, has been proposed to function in ATP binding in the BHSs . In the Escherichia coli enzyme, the only member of the family for which a three-dimensional structure has been determined, the conserved sequence is found in a partially disordered surface loop . Mutations in the sequence have previously been isolated and characterized in vivo . In this work these single-site mutants, G115S, R118G, and R119W, of the E . coli BHS have been purified and biochemically characterized with respect to binding of small molecule substrates and the intermediate in the biotinylation reaction . Results of this characterization indicate that, rather than functioning in ATP binding, this glycine-rich sequence is required for binding the substrate biotin and the intermediate in the biotinylation reaction, biotinyl-5'-AMP . These results are of general significance for understanding structure-function relationships in biotin holoenzyme synthetases.

Protein Sci, 2000 Aug, 9(8), 1519 - 29
Ligand binding and thermodynamic stability of a multidomain protein, calmodulin; Masino L et al.; Chemical and thermal denaturation of calmodulin has been monitored spectroscopically to determine the stability for the intact protein and its two isolated domains as a function of binding of Ca2+ or Mg2+. The reversible urea unfolding of either isolated apo-domain follows a two-state mechanism with relatively low deltaG(o)20 values of approximately 2.7 (N-domain) and approximately 1.9 kcal/mol (C-domain). The apo-C-domain is significantly unfolded at normal temperatures (20-25 degrees C). The greater affinity of the C-domain for Ca2+ causes it to be more stable than the N-domain at [Ca2+] >= 0.3 mM. By contrast, Mg2+ causes a greater stabilization of the N- rather than the C-domain, consistent with measured Mg2+ affinities. For the intact protein (+/-Ca2+), the bimodal denaturation profiles can be analyzed to give two deltaG(o)20 values, which differ significantly from those of the isolated domains, with one domain being less stable and one domain more stable. The observed stability of the domains is strongly dependent on solution conditions such as ionic strength, as well as specific effects due to metal ion binding. In the intact protein, different folding intermediates are observed, depending on the ionic composition. The results illustrate that a protein of low intrinsic stability is liable to major perturbation of its unfolding properties by environmental conditions and liganding processes and, by extension, mutation. Hence, the observed stability of an isolated domain may differ significantly from the stability of the same structure in a multidomain protein. These results address questions involved in manipulating the stability of a protein or its domains by site directed mutagenesis and protein engineering.

Protein Sci, 2000 Aug, 9(8), 1497 - 502
Bypassing the kinetic trap of serpin protein folding by loop extension; Im H et al.; The native form of some proteins, such as strained plasma serpins (serine protease inhibitors) and the spring-loaded viral membrane fusion proteins, is in a metastable state. The metastable native form is thought to be a folding intermediate in which conversion into the most stable state is blocked by a very high kinetic barrier. In an effort to understand how the spontaneous conversion of the metastable native form into the most stable state is prevented, we designed mutations of alpha1-antitrypsin, a prototype serpin, which can bypass the folding barrier. Extending the reactive center loop of alpha1-antitrypsin converts the molecule into a more stable state. Remarkably, a 30-residue loop extension allows conversion into an extremely stable state, which is comparable to the relaxed cleaved form. Biochemical data strongly suggest that the strain release is due to the insertion of the reactive center loop into the major beta-sheet, A sheet, as in the known stable conformations of serpins. Our results clearly show that extending the reactive center loop is sufficient to bypass the folding barrier of alpha1-antitrypsin and suggest that the constraint held by the polypeptide connection prevents the conversion of the native form into the lowest energy state.

Protein structure prediction is another important application of bioinformatics. The amino acid sequence of a protein, the so-called primary structure, can be easily determined from the sequence on the gene that codes for it. But, the protein can only function correctly if it is folded in a very special and individual way (if it has the correct secondary, tertiary and quaternary structure). The prediction of this folding just by looking at the amino acid sequence is quite difficult. Several methods for computer predictions of protein folding are currently (as of 2004) under development.
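
The first step described above, reading the primary structure off the gene sequence, can be sketched in a few lines of Python. This is a minimal illustration only: it assumes the standard genetic code and a coding-strand DNA string that is already in frame, and the function name translate and the example sequence are hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated using base order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate an in-frame coding DNA sequence up to the first stop codon."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":  # stop codon ends the polypeptide
            break
        protein.append(aa)
    return "".join(protein)


# Hypothetical example sequence (one-letter amino acid codes are returned).
print(translate("ATGGCCCAGCCTGCAGCTTAA"))  # MAQPAA
```

The sketch covers only the easy part. Nothing in it says how the resulting chain folds, which is the hard prediction problem the paragraph refers to.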

One of the key principles in bioinformatics is homology. In the genomic branch of bioinformatics, homology is used to predict the function of a gene. If gene A is homologous to gene B of which the function is known, it is likely to have a similar function. In the structural branch of bioinformatics homology is used to determine which parts of the protein are important in structure formation and interaction with other proteins. In a technique called homology modelling, this information is used to predict the structure of a protein once the structure of a homologous protein is known. This currently remains the only way to predict protein structures reliably.
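
In practice, homology is usually inferred from sequence similarity. A minimal sketch of one common similarity measure, percent identity over an existing alignment, is shown below. It assumes the two sequences have already been aligned (gaps written as "-"); the function name and the example sequences are hypothetical, and high identity is only evidence for homology, not a proof of shared function.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of aligned, non-gap columns in which both residues are identical."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":  # skip columns containing a gap
            continue
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Hypothetical aligned protein fragments: 6 identical residues in 7 compared columns.
print(round(percent_identity("MKT-AYIA", "MKTQAYLA"), 1))
```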

One example is the homology between hemoglobin in humans and the hemoglobin of legumes (leghemoglobin). Both proteins serve to transport oxygen in their respective organisms. Although the two proteins have very different amino acid sequences, their structures are virtually identical, which reflects their near-identical purposes.

Systems biology involves the use of computer simulations of cellular subsystems (such as the networks of metabolites and enzymes which comprise metabolism, signal transduction pathways and gene regulatory networks) to both analyze and visualize the complex connections of these cellular processes. Artificial life or virtual evolution attempts to understand evolutionary processes via the computer simulation of simple (artificial) life forms.

DNA-DNA hybridization is a method in genetics to measure the degree of genetic similarity between DNA sequences. The technique is usually used to determine the genetic "distance" between two species. When several species are compared that way, the similarity values allow the species to be arranged in a phylogenetic tree; it is therefore one possible approach to carrying out molecular systematics.

Method

The DNA from the two species to be compared is extracted, purified and cut into short pieces (e.g., 600-800 base pairs). The double-stranded DNA is then separated by heating into single strands. The single-stranded DNA is then allowed to anneal with the DNA pieces of the other species. The more similar the DNA, the more of the pieces will anneal and form a hybrid (hence the name) double strand. Strands with a high degree of similarity bind more firmly and require more energy to separate: they separate at a higher temperature than dissimilar strands. To assess this "melting temperature", the mixture is heated in small steps. At each step, samples are tested for the amount of single- and double-stranded DNA. This results in a profile from which the amount of similar DNA, and thus the degree of genetic similarity, can be determined.
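
The step from melting profile to similarity measure can be sketched as follows. This is an illustration under stated assumptions, not a laboratory protocol: the melting temperature is taken as the point where half of the DNA is single-stranded, it is found by linear interpolation between measurements, and all numbers are invented for the example.

```python
def melting_temperature(profile):
    """Temperature at which half of the DNA is single-stranded.

    `profile` is a list of (temperature_C, fraction_single_stranded) pairs
    sorted by increasing temperature. The midpoint is found by linear
    interpolation between the two measurements that bracket 0.5.
    """
    for (t0, f0), (t1, f1) in zip(profile, profile[1:]):
        if f0 <= 0.5 <= f1 and f1 > f0:
            return t0 + (0.5 - f0) * (t1 - t0) / (f1 - f0)
    raise ValueError("profile never crosses 50% single-stranded")


# Hypothetical measurements, invented for illustration only.
same_species = [(70, 0.05), (75, 0.10), (80, 0.30), (85, 0.70), (90, 0.95)]
hybrid = [(70, 0.10), (75, 0.30), (80, 0.70), (85, 0.90), (90, 0.98)]

# A larger drop in melting temperature for the hybrid means less similar DNA.
delta_tm = melting_temperature(same_species) - melting_temperature(hybrid)
print(f"Melting temperature drop for the hybrid: {delta_tm:.1f} degrees C")
```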

Advantages and disadvantages

This technique was considered a good one because it takes all possible ways of aligning the sequences into account, so the melting temperature gives a good average measure of similarity. It is no longer considered the best approach, however, because sequences can now be aligned computationally. Hardly any other approach is used currently, but the choice of sequences for the comparison is a major source of contention. Not all sequences evolve at the same rate. Some are so critical that changes in them are lost whenever they cause loss of function of the gene product. Finding the corresponding genes in more distant organisms can also be difficult. Some approaches use non-coding sequences, on the assumption that these are not subject to selection and that all mutations in them are retained faithfully. If one assumes a constant rate of background mutation, these mutations would indicate the age of the sequence lineage. However, these sequences too are difficult to use for cross-genera comparisons.
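
The constant-rate reasoning in the last few sentences can be written as a small calculation. The sketch below assumes a strictly constant substitution rate and makes no correction for repeated changes at the same site; the function name and all numbers are hypothetical. The factor of two reflects that both lineages accumulate mutations independently after they split.

```python
def divergence_time(differences, sites, rate_per_site_per_year):
    """Estimated years since two lineages split, under a constant-rate assumption.

    differences: number of sites that differ between the two sequences
    sites: number of aligned sites compared
    rate_per_site_per_year: assumed substitution rate along one lineage
    """
    distance = differences / sites  # differences per site
    return distance / (2.0 * rate_per_site_per_year)


# Hypothetical example: 20 differences in 1,000 sites at 1e-9 per site per year.
print(f"{divergence_time(20, 1000, 1e-9):.2e} years")
```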

The mitochondrial genome is the genetic material of the mitochondria. The mitochondria are organelles that reproduce themselves semi-autonomously when the eukaryotic cells that they occupy divide.

The genetic material forming the mitochondrial genome is similar in structure to prokaryotic genetic material. It consists of a single circular DNA molecule. Mitochondria are thought to have arisen from intracellular bacterial symbionts; this idea is called the endosymbiotic theory.

The mitochondria of a sexually reproducing animal come only from the mother. The mitochondrial DNA of a human being is essentially the same as that of his or her mother.

In this way, mitochondrial genetic diseases can affect both males and females, but can only be transmitted by females to their offspring.

Compared to the nuclear genome, the mitochondrial genome possesses some very interesting features:

All the genes are carried on a single circular DNA molecule.
The genetic material is not bounded by a nuclear envelope.
The DNA is not packed with proteins.
The genome doesn't contain a lot of non-coding (junk DNA) areas.
Some codons do not follow the universal rules in translation.
Some bases are considered part of two different genes: as the last base of one gene and the first base of the next gene.
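
The point about codons that break the universal rules can be made concrete. The sketch below compares the standard genetic code with the vertebrate mitochondrial code, whose known differences are that TGA codes for tryptophan, ATA codes for methionine, and AGA and AGG act as stop codons. The function name and the example sequence are hypothetical and chosen only to hit those differences.

```python
from itertools import product

# Standard genetic code, with codons enumerated using base order T, C, A, G.
BASES = "TCAG"
STANDARD_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), STANDARD_AA)
}

# Vertebrate mitochondrial code: the standard table with four reassigned codons.
VERTEBRATE_MITO = dict(STANDARD, TGA="W", ATA="M", AGA="*", AGG="*")


def translate(dna, table):
    """Translate an in-frame DNA sequence with the given codon table."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = table[dna[i:i + 3].upper()]
        if aa == "*":  # stop codon
            break
        protein.append(aa)
    return "".join(protein)


example = "ATGATATGAAGATTT"  # hypothetical sequence
print(translate(example, STANDARD))         # MI
print(translate(example, VERTEBRATE_MITO))  # MMW
```

The same DNA gives a different protein depending on the table, which is why mitochondrial genes must be translated with the mitochondrial code.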

Diploid (meaning double in Greek) cells have two copies of each autosome (non-sex chromosome), usually one from the mother and one from the father. Most somatic cells (body cells) of higher organisms are diploid or polyploid (three or more copies of each chromosome, often found in plants), whereas their reproductive cells are usually haploid (they have only one copy of each chromosome).

When reproducing, haploid sex cells (gametes) of both parents will generally merge to form a diploid cell, the zygote, with unique genetic properties, which quickly becomes the embryo.

A somatic cell is a type of cell in an organism, such as the human body.

Cells can be divided into two types: those that are part of the germline and those that are not. Somatic cells are the cells that are not part of the germline.

Your liver is made entirely of somatic cells, your heart is made entirely of somatic cells, your hands are made entirely of somatic cells. Somatic cells are also called "body cells."

The cells in the germline include the gametes (sperm or ovum), the cells that produce the gametes (such as gametocytes), and even the zygote, because it leads to the production of the gametes.

But genetic material in the cells of your liver or arm or hand will never make it to your children.

In humans, somatic cells have 46 chromosomes, making them diploid cells. Gametes, in contrast, have only 23 chromosomes, making them haploid cells. (But not all non-somatic cells have only 23 chromosomes. Consider the zygote.)

Somatic cells can be used in cloning, by a process called somatic cell nuclear transfer.

Genome projects are scientific endeavours that aim to map the genome of a living being or of a species (be it an animal, a plant, a fungus, a bacterium, an archaeon, a protist or a virus), that is, the complete set of genes carried by this living being or virus. The Human Genome Project was such a project. Some have argued that the era of genomics is one of the more fundamental advances in human history.

The Human Genome Project (HGP) endeavoured to map the human genome down to the nucleotide (or base pair) level and to identify all of the 20,000-25,000 genes present in it.

History

The $3 billion project was founded in 1990 by the United States Department of Energy and the U.S. National Institutes of Health, and was expected to take 15 years. Due to widespread international cooperation and advances in the field of genomics (especially in sequence analysis), as well as huge advances in computing technology, a rough draft of the genome was finished in 2000 (announced jointly by US president Bill Clinton and British Prime Minister Tony Blair on June 26, 2000), two years earlier than planned.

The consortium comprised:

China
France
Germany
Japan
United Kingdom
United States

On April 14, 2003, a joint press release announced that the project had been successfully completed, with 99% of the genome sequenced with 99.99% accuracy.

Another reason for the accelerated work was competition from the commercially financed genome project at Celera Genomics, which used a new method called shotgun sequencing. Celera Genomics planned to patent all genes it found, whereas the gene sequences found by the original publicly funded HGP were made freely available to the public, in line with the so-called Bermuda Statement (February 1996), which called for new sequence data to be released within 24 hours. This competition proved to be very good for the Project.

Although the working draft was announced in June 2000, it was not until February 2001 that Celera and the HGP scientists published actual details of their drafts. Special issues of Nature (which published the publicly funded project's scientific paper) and Science (which published Celera's paper) contained descriptions of the methods used to produce the draft sequence, as well as analysis of the sequence. It is hoped that these drafts will provide a 'scaffold' covering about 90% of the genome, on which the remaining gaps can be closed.

Each draft sequence has been checked at least four to five times to increase 'depth of coverage' or accuracy. Approximately 47% of the draft were high-quality sequences - the final version will have been checked eight to nine times giving an error rate of just 1 in 10,000 bases.

The human genome project is one of a number of international genome projects in biology, each aimed at sequencing the DNA of a specific organism. While the human DNA sequence offers the most tangible benefits, important developments in biology and medicine are predicted as a result of the sequencing of model organisms including mice, fruitflies, zebrafish, yeast, nematodes and many microbial organisms and parasites.

In October 2004, researchers of the HGP announced a new estimate of 20,000 to 25,000 genes in the human genome. Previously 30,000 to 40,000 had been predicted, while estimates at the start of the project reached up to as high as 100,000.

Goals

The goals of the original HGP were not only to determine all 3 billion base pairs in the human genome with a minimal error rate, but also to identify all the genes in this vast amount of data. This part of the project is still ongoing, although a preliminary count indicates about 25,000 genes in the human genome, which is far fewer than predicted by most scientists.

Another goal of the HGP was to develop faster, more efficient methods for DNA sequencing and sequence analysis and the transfer of these technologies to industry.

Today, the human DNA sequence is stored in databases and is available to everyone on the Internet. The U.S. National Center for Biotechnology Information (and sister organizations in Europe and Japan) houses the genomic sequence, along with sequences of known and hypothetical genes and proteins. Other organizations, such as the University of California, Santa Cruz, and ENSEMBL, present additional data and annotation, along with powerful tools for visualizing and searching through it. Computer programs have been developed to analyse the data, because the data itself is next to useless without interpretation.

The process of identifying the boundaries of genes and other features in raw DNA sequence is called annotation and is the domain of bioinformatics. While expert biologists make the best annotators, such annotation proceeds slowly, and computer programs are increasingly used to meet the high-throughput demands of genome sequencing projects. The best current technologies for annotation make use of statistical models that take advantage of parallels between DNA sequences and human language, using concepts from computer science such as formal grammars.

Every human has a unique genomic sequence, so the data published by the HGP does not represent the exact sequence of each and every individual's genome. It is the combined genome of a small number of anonymous donors. The HGP genome is a scaffold for future work in identifying differences between individuals. Most of the current effort in identifying differences between individuals involves single nucleotide polymorphisms.
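As a rough illustration of what a single-nucleotide difference looks like in data, the sketch below compares two short, already aligned sequences position by position. The function name and the toy sequences are invented for this example; a true single nucleotide polymorphism is a variant found at appreciable frequency in a population, which a comparison of two sequences alone cannot establish.

```python
def single_base_differences(reference, sample):
    """List (position, reference_base, sample_base) wherever two aligned
    sequences of equal length differ. Positions are 0-based."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (i, ref_base, sample_base)
        for i, (ref_base, sample_base) in enumerate(zip(reference, sample))
        if ref_base != sample_base
    ]


# Toy sequences (hypothetical, not real genomic data).
diffs = single_base_differences("ACGTTGCA", "ACGTAGCA")
print(diffs)
```

Real comparisons first need an alignment step, because insertions and deletions shift positions; this sketch assumes that step has already been done.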

Benefits

Work on the automated interpretation of genome data has just begun. It is hoped that the knowledge gained by understanding the genome will boost the fields of medicine and biotechnology, eventually leading to cures for cancer, Alzheimer's disease and other diseases.

For example, a biological researcher investigating a certain form of cancer may have narrowed down their search to a particular gene. By visiting the human genome database on the World Wide Web, this researcher can examine what other scientists have written about this gene, including (potentially) its three-dimensional structure, its function(s), its evolutionary relationships to other human genes, or to genes in mice or yeast or fruit flies, possible detrimental mutations, interactions with other genes, body tissues in which this gene is activated, diseases associated with this gene... the list of datatypes is long, one reason why bioinformatics is so challenging.

One particularly exciting technology arising from genomics is the DNA microarray (also called DNA chip), an array of probes for simultaneously measuring the expression level of each of the 20,000+ human genes in a given sample. This has aroused great interest as a potential diagnostic tool for science and medicine. It seems likely that there will be many more downstream technologies as a result of the human genome project.

On a more philosophical level, the analysis of similarities between DNA sequences from different organisms is opening new avenues in the study of evolution. In many cases, evolutionary questions can now be framed in terms of molecular biology; indeed, many major evolutionary milestones (the emergence of the ribosome and organelles, the development of embryos with body plans, the vertebrate immune system) can be related to the molecular level. Many questions about the similarities and differences between humans and our closest relatives (the primates, and indeed the other mammals) are expected to be illuminated by the data from this project.

In chemistry, sequence analysis is a technique used to determine the sequence of a polymer formed of several monomers.

In molecular biology and genetics, the same process is called simply "sequencing."

The term "sequence analysis" in biology implies subjecting a DNA or amino acid sequence to sequence alignment, sequence database searches, repeated-sequence searches, or other bioinformatics methods on a computer.

In the field of bioinformatics, a sequence database is a large collection of DNA, protein, or other sequences stored on a computer. A database can include sequences from only one organism, as in databases including all the proteins in Saccharomyces cerevisiae, or it can include sequences from all organisms whose DNA has been sequenced.

Sequence databases can be searched using a variety of methods. The most common is probably searching for a sequence similar to a certain target protein or gene whose sequence is already known to the user. The BLAST program is a method of this type.
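To make the idea of a similarity search concrete, here is a deliberately naive sketch that ranks database entries by how many short subsequences (k-mers) they share with a query. It is not the BLAST algorithm, which uses seeded local alignment and statistical scoring; the function names, the toy database and the choice of k = 3 are all assumptions made for illustration.

```python
def kmers(seq, k=3):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def similarity(a, b, k=3):
    """Jaccard similarity of the k-mer sets of two sequences (0.0 to 1.0)."""
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)


def search(database, query, k=3):
    """Rank database entries (name -> sequence) from most to least similar."""
    scored = [(similarity(seq, query, k), name) for name, seq in database.items()]
    return sorted(scored, reverse=True)


# Toy database (hypothetical sequences, not real records).
toy_db = {
    "seq_a": "ATGGCGTACGTTAGC",
    "seq_b": "ATGGCGTACGTAAGC",
    "seq_c": "TTTTTTTTTTTTTTT",
}
hits = search(toy_db, "ATGGCGTACGTTAGC")
for score, name in hits:
    print(f"{name}: {score:.2f}")
```

The identical entry scores 1.0, the near-identical one scores somewhat lower, and the unrelated one scores 0.0. Production tools differ mainly in how they score partial matches and how they avoid comparing the query against every record.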

A major problem with all the large genetic sequence databases is that records are deposited in them from a wide range of sources, from individual researchers to large genome sequencing centers. As a result, the sequences themselves, and especially the biological annotations attached to these sequences, vary tremendously in quality. Also there is much redundancy, as multiple labs often submit numerous sequences that are identical, or nearly identical, to others in the databases.

Many annotations are based not on laboratory experiments, but on the results of sequence similarity searches for previously-annotated sequences. Of course, once a sequence has been annotated based on similarity to others, and itself deposited in the database, it can also become the basis for future annotations. This leads to the transitive annotation problem because there may be several such annotation transfers by sequence similarity between a particular database record and actual wet-lab experimental information. Therefore, one must always regard the biological annotations in major sequence databases with a considerable degree of skepticism, unless they can be verified by reference to published papers describing high-quality experimental data, or at least by reference to a human-curated sequence database.

Heredity, similarity between parents and offspring. In biology, offspring resemble their parents because the offspring inherit genes, carried on DNA molecules, from their parents (see Genetics). The word heredity is also used in a non-biological sense, in human affairs, to refer to the inheritance of cultural or material goods, such as religious or political beliefs, or land or money. This article is mainly concerned with biological heredity.

In human beings, and many related forms of life, inheritance occurs by a set of detailed mechanisms, some (but not all) of which are well understood. In molecular terms, heredity is due to DNA. The DNA codes for genes, and the genes specify particular proteins. The DNA acts as a set of coded instructions for building a body, given a particular environment. At a reproductive level, inheritance is a sexual process. The offspring contain two copies of each gene, inherited from their two parents. At a cellular level, inheritance proceeds via meiosis, a special kind of cell division that produces the gametes (eggs in females, sperm in males). In meiosis, the two copies of each gene are reduced to a single copy. When male and female gametes combine, the double set is restored. This pattern of heredity is called Mendelian (see Mendel’s Laws).
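The Mendelian pattern described above can be sketched for a single gene: each parent carries two copies, meiosis passes on one of them at random, and fertilization restores the pair. This is a simplified model with invented names (`gamete`, `offspring`) and invented allele labels "A" and "a"; it ignores recombination, linkage and everything else that happens during real meiosis.

```python
import random


def gamete(parent, rng):
    """Simplified meiosis: pass on one of the parent's two gene copies."""
    return rng.choice(parent)


def offspring(mother, father, rng):
    """Fertilization: one copy from each parent restores the double set."""
    return (gamete(mother, rng), gamete(father, rng))


rng = random.Random(0)  # seeded so that repeated runs give the same draws

mother = ("A", "A")
father = ("a", "a")
child = offspring(mother, father, rng)  # one copy from each parent

# Crossing two such children gives a mixture of genotypes.
grandchildren = [offspring(child, child, rng) for _ in range(1000)]
counts = {}
for genotype in grandchildren:
    key = "".join(sorted(genotype))
    counts[key] = counts.get(key, 0) + 1
print(child, counts)
```

With many grandchildren the three genotypes are expected in roughly a 1:2:1 ratio, although the exact counts depend on the random draws.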

However, the particular hereditary mechanism that is used in modern humans is only one of many ways in which heredity occurs in all of life. Viruses, bacteria, and other microbes use different hereditary mechanisms. In this article, we look at the variety of hereditary mechanisms, and how they are related to the forms of life that use them.

Moreover, theoretical biologists have thought up several further ways in which heredity could conceivably occur though in fact it does not. We shall also look at some of these theoretical hereditary mechanisms to see how they shed some light on why life uses the hereditary mechanisms it does rather than some alternative mechanisms.

II The Mechanisms of Heredity in the Main Forms of Life

Heredity is one of the main defining features of life as a whole. The existence of life probably requires three conditions to be met: reproduction, heredity, and variation. Any entities that possess these three attributes will be able to evolve, and may evolve into something we recognize as life. The three conditions are related. Heredity is impossible without reproduction. However, reproduction without heredity is possible. For example, fires can reproduce: a spark from one fire may ignite a second fire elsewhere. But the attributes of the “offspring” fire—its size, its duration, the pattern of flickering flames—depend on local features such as the supplies of combustible material, oxygen, and the wind, rather than on attributes of the “parental” fire. Fires show reproduction without heredity and do not evolve by natural selection; they are not alive.

In life, reproduction is of a kind that produces heredity; offspring tend to resemble their parents. All life reproduces by “template reproduction”; that is, the parental hereditary molecule acts as a template for the production of the offspring hereditary molecule. Template reproduction is the best-known example of a method of reproduction that produces heredity. However, not all template reproduction takes the same form as DNA replication. Other examples of template reproduction include photocopying, old-fashioned printing with metal type or woodcut blocks, and industrial processes in which molten metal is poured into a mould. All these types of template reproduction produce some form of heredity. Fires, by contrast, do not spread by template reproduction.

The third feature of life, variation, is also related to the hereditary process. Variation ultimately arises because of errors, or mutations, in heredity. But the amount of variation in a population depends on the shuffling and reshuffling of genetic variants that already exist as well as the creation of new variants by mutation. The shuffling is effected by a process called recombination, and recombination is another feature of the hereditary mechanism. During meiosis, the two gene-sets that an individual inherits from its two parents are shuffled before they are sent into the gametes. The offspring in the next generation show a greater range of variation in consequence. Thus, heredity is an essential property of life—not only is heredity one of the three defining conditions of life but it also influences the other two conditions. Without heredity, life would not exist.

Moreover, the details of the hereditary mechanism influence the form that life takes on. Over evolutionary time, the hereditary mechanism has changed from simple beginnings at the origin of life to the forms seen in modern life. At each stage, the details of how inheritance occurs constrain the form that life can have. Indeed, many biologists think of the history of life as a series of ways in which genetic information is passed on from one generation to the next. Advances during evolution have depended on changes in the way that heredity occurs. (It is worth noting that hereditary mechanisms have not changed in order to cause any future evolutionary events. However, once the hereditary mechanism changes for some reason, certain other evolutionary changes may become more likely in consequence.)

The main changes that have occurred during evolution in the form of heredity are now discussed in detail.

A The Origin of Heredity

The origin of heredity, together with reproduction, represents the transition from chemistry to biology. We have only inferential and incomplete knowledge of how the transition occurred. No one has observed the origin of life; indeed it is not even known to the nearest hundred million years when life on Earth originated. Heredity and reproduction in all life forms depend on base pairing. In modern DNA the bases are four molecules symbolized by A, C, G, and T. A binds to T, G binds to C. A base sequence such as GCTT will reproduce into CGAA, which in turn reproduces into GCTT. (Modern DNA can only reproduce with the assistance of many enzymes; but the earliest replicating molecules probably reproduced without enzymatic assistance.) The earliest replicating molecules probably relied on pairing between molecules other than A, G, C, and T; but we have several reasons to think that some kind of base pairing was used. One reason is that all life uses base pairing now. Indeed, the evolutionary biologist Leslie Orgel once remarked that if you imagine stripping away all the details of Earthly life one by one to try to define what is essential to all life, you can remove almost everything—skin and bones, eating and breathing, cells, even enzymes—but finally you are left with base pairing, like the smile of the Cheshire cat.
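The GCTT example above can be checked with a few lines of code. This is a minimal sketch of the pairing rule only; the names `PAIRS` and `complement` are invented here, and, like the text, it ignores the fact that the two strands of real DNA run in opposite directions.

```python
# Pairing rule from the text: A binds to T, G binds to C.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(seq):
    """Return the sequence that pairs base-by-base with seq."""
    return "".join(PAIRS[base] for base in seq)


copy = complement("GCTT")
original_again = complement(copy)
print(copy, original_again)
```

Two rounds of template copying return the original sequence, which is why base pairing can carry information from one generation to the next.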

Secondly, base pairing has the right theoretical attributes for a hereditary system that can allow the evolution of Earthly life. Base pairing allows what is called unlimited heredity. Theoreticians distinguish between mechanisms of limited heredity, which permit inheritance for only a small number of states, and mechanisms of unlimited heredity, which permit heredity among a large, or practically infinite, number of states. As an example of a system with limited heredity, consider autocatalytic cycles (cycles that catalyse themselves). Freeman Dyson has discussed how autocatalytic cycles may have arisen near the origin of life. Thus, a set of chemicals that can be symbolized by X, Y, and Z may catalyse one another in a cycle of the form X → Y, Y → Z, Z → X. A system of X, Y, and Z can perpetuate itself, and generate more of X, Y, and Z.

However, only very simple evolution is possible here. There might be a second autocatalytic cycle, X’ → Y’ → Z’. Then XYZ systems might compete with X’Y’Z’ systems, and one or other might increase in frequency depending on the local conditions. But there are only two inherited states (XYZ or X’Y’Z’), and evolution is confined to fluctuations in their relative frequencies. In general, the number of states of autocatalytic systems is likely to be limited, because only certain sets of chemicals will form autocatalytic cycles. The evolution of life, in its modern complexity and variety, requires unlimited heredity. Base pairing permits this property. For example, with four different bases, a sequence of only five bases can have over 1,000 forms: AAAAA, AAAAC, AAACG, and so on. With a longer sequence of bases, the number of heritable states soon becomes astronomical. The evolution of life with the kind of complexity that we see in life on Earth would have required unlimited heredity, and that probably required base pairing or something like it.
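The figure of "over 1,000 forms" follows from simple counting: with four bases there are 4 choices at each position, so a sequence of n bases has 4^n possible forms. The short sketch below enumerates the five-base case and computes the count for longer sequences; the helper name `count_states` is invented for this illustration.

```python
from itertools import product

BASES = "ACGT"


def count_states(length):
    """Number of distinct base sequences of the given length: 4 ** length."""
    return len(BASES) ** length


# Enumerate every possible five-base sequence explicitly.
five_base_sequences = ["".join(p) for p in product(BASES, repeat=5)]

print(len(five_base_sequences), five_base_sequences[:3])
print(count_states(5), count_states(10), count_states(100))
```

Enumeration is only feasible for very short sequences; at 100 bases the count already runs to 61 digits, which is the sense in which heredity by base sequence is effectively unlimited.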

B The RNA World

The earliest living systems probably had heredity but no metabolism. (In genetic terms, they had genotypes but no phenotypes.) The molecule just replicated, without doing anything more. The next stage is for life to catalyse reactions, in the way that modern enzymes do, altering the conditions around the replicating molecule such that it can copy itself more effectively. Biologists suspect that, early in the history of life, there was an “RNA world”. In the RNA world, RNA acted as the hereditary molecule and also catalysed reactions. In most of modern life, DNA is the hereditary molecule and it codes for proteins that catalyse metabolic reactions. In life forms with DNA, heredity is separate from catalysis, and the hereditary molecule contains coded information. In the RNA world, the RNA molecules acted as ribozymes: that is the RNA molecule itself acted as an enzyme rather than coding for a protein that in turn acted as an enzyme. A system in which heredity and catalysis are two functions of the same molecule is a simpler system than one in which the two functions are performed by separate molecules. The simpler system is likely to have come first in evolution. However, the possibilities in the RNA world would have been limited. The need for an RNA molecule to act as a catalyst constrains what shape it can have, and therefore its base sequence.

C Heredity by DNA

DNA cannot act as a catalyst, because all DNA molecules, whatever their base sequence, have the same structure—the famous double helix. When life evolved from using RNA to using DNA as its hereditary molecule, the separation opened up between the coding hereditary molecule and the protein catalysts. The hereditary molecule was now freed from a constraint, as it no longer needed to function as a catalyst, and could evolve a much wider variety of coding sequences.

There is a second reason why, once DNA evolved as the hereditary molecule, life could evolve to be more complex than before. The inheritance of DNA is much more accurate than the inheritance of RNA. A few modern life forms use RNA as the hereditary material. For example, some viruses, such as HIV (the agent of AIDS) and the influenza virus, reproduce via RNA. They have a mutation rate of approximately one error per 10,000 to 100,000 bases. Modern life forms that use DNA all make errors approximately once in 1 to 10 billion bases. DNA-based life forms copy themselves about 100,000 times more accurately than RNA-based life forms. Not all the improvement is due to the use of DNA rather than RNA, but some of it is.

The accuracy of heredity influences how long the hereditary molecule can be. The distinguished German chemist Manfred Eigen has shown that, approximately, a hereditary molecule cannot evolve to be longer than the inverse of its error rate. A hereditary molecule with an error rate of 1 in 1,000 cannot evolve to be more than about 1,000 bases long. If the molecule is any longer, it accumulates too many errors and the life form is unsustainable. Thus, when life evolved from the RNA world to the DNA world, the hereditary molecule could evolve to be much longer. A longer hereditary molecule can code for a more complex life form than a shorter hereditary molecule. The evolution of DNA was for this reason a key step in the rise of complex life on Earth. Without DNA, life would be constrained to be simple replicating molecules such as probably existed in the RNA world. In modern life, only a few simple viruses use RNA as their hereditary material. Larger viruses use DNA, and all cellular life—from single-celled bacteria to the largest plants and animals—uses DNA. Without DNA, there would be no trees or horses, no seaweed or fish.
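The rule of thumb attributed to Eigen above, and the error rates quoted in the preceding section, can be turned into a small calculation. This is a back-of-the-envelope sketch, not Eigen's full model: it assumes errors occur independently at each base, and the function names are invented for illustration.

```python
import math


def max_length(error_rate):
    """Approximate length limit from the text: the inverse of the error rate."""
    return 1.0 / error_rate


def error_free_copy_probability(length, error_rate):
    """Chance that a whole molecule is copied without a single error,
    assuming each base is copied independently."""
    return (1.0 - error_rate) ** length


# Worked example from the text: an error rate of 1 in 1,000.
limit = max_length(1e-3)
at_limit = error_free_copy_probability(1_000, 1e-3)
ten_times_longer = error_free_copy_probability(10_000, 1e-3)

# Error rates quoted earlier: RNA about 1e-4, DNA about 1e-9 per base.
fold_improvement = 1e-4 / 1e-9

print(round(limit), round(at_limit, 3), ten_times_longer, round(fold_improvement))
```

At the limit, roughly 37% of copies (about 1/e) are still error-free; at ten times that length, almost none are, which is the intuition behind the claim that a longer molecule accumulates too many errors to be sustained.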

D The Origin of Meiosis

In the hereditary mechanisms we have considered so far, the offspring inherits a full copy of its parent’s DNA (or RNA). Inheritance is clonal, or asexual. The offspring is genetically identical to its parent, except for mutations. The next major transition in heredity is the evolution of meiosis; the same transition also resulted in the evolution of sex and of Mendelian heredity. Mendelian heredity is the kind of inheritance discovered by Gregor Mendel. A life form with Mendelian heredity has, at least in some part of its life cycle, two copies of its hereditary material (one inherited from the father, the other from the mother). In meiosis, the two copies are reduced to a single copy. The single copy then combines with a second copy from another individual. The reason why sex and meiosis exist is uncertain; there are two main theories at present. Sex and meiosis may help life to cope with mutational error, by increasing the efficiency with which natural selection removes harmful mutations; or they may help life to cope with infectious disease, by increasing the range of novel disease-resistance genotypes.

Mendelian heredity, with sexual reproduction and meiosis, is associated with at least three other big features of life. One is complexity. Simpler life forms, such as bacteria, tend to be clonal, and it is the larger, multicellular life forms that use sex and meiosis. This could be because the more complex life forms suffer more from mutation, or more from infectious disease, than do simpler life forms.

Secondly, meiotic, sexual life comes in the form of distinct species. For example, we can recognize distinct species such as human beings and chimpanzees. Over time, a species produces a distinct lineage, as it reproduces itself from one generation to the next. Genes are shuffled around within a species by interbreeding; but genes are not usually passed between species. Each species evolves independently. On a grand scale, the result is a tree-like pattern of evolution. Lineages branch off, evolve apart, and either produce new branches or go extinct. Before the origin of meiosis, and in non-meiotic life now (such as in bacteria), life either was not arranged in distinct lineages or the lineages were much less clear-cut than in meiotic life forms. Bacteria show a range of forms, but they are less clearly organized into distinct lineages than are multicellular animals and plants.

Thirdly, Mendelian life forms evolve faster than pre-Mendelian life forms. In the fossil record, pre-Mendelian forms such as bacteria hardly change at all for hundreds, even thousands, of millions of years. Mendelian heredity evolved in an early eukaryote (cellular life form in which the DNA is carried in a distinct nucleus within the cell). The time of origin of eukaryotes is uncertain, but it may be somewhat before 2 billion years ago. In the fossil record, there is little evidence of rapid evolution this early. The tree-like pattern of evolution, with distinct branches that undergo change, radiating, splitting, and going extinct, does not clearly emerge in fossils until about 550 million years ago. However, molecular evidence suggests that the pattern is much older, dating back at least 1 to 1.5 billion years. It is likely that the origin of meiosis in eukaryotes set the stage for evolution to proceed in its most familiar form, with distinct evolving lineages and a tree-like branching pattern. Before that there may have been less clear lineages, with more of a blur of forms that hardly changed, and without clear extinction events.

E The Origin of Multicellular and Weismannist Heredity

The simplest modern life forms with meiosis are single-celled creatures such as Paramecium. Meiosis probably originated in single-celled life forms in the past. Multicellular life then evolved from those single-celled ancestors. The earliest multicellular life forms were probably bundles, or rows, of cells, with all the cells having much the same form. But at some point, cell differentiation evolved: that is, a life form containing more than one kind of cell. A human body contains many (perhaps 200 or so) cell types, such as skin cells, blood cells, and muscle cells, and they all develop during the life of an individual. As life forms with many cell types evolved, a new kind of heredity also arose. Biologists distinguish between germ cells and somatic cells within a body. The germ cells are the cells that reproduce the next generation. They are the sperm and egg cells, together with their precursor cells. The line of germ cells, or germ-line, is potentially immortal as it reproduces down the generations. The somatic cells, or soma, consist of all the rest: all the cells in the body that die when the body dies, leaving no descendants. Blood cells, brain cells, liver cells, and so on are all somatic cells.

In human beings, and in most other animals, the somatic cells do not contribute genetically to the offspring in the next generation. Heredity is entirely carried out by the germ cells. This strict distinction between germ and somatic cells is often referred to as Weismannist heredity, after the German biologist August Weismann. Biologists before Weismann had not made this distinction, or at least had not appreciated its importance. Charles Darwin, for example, put forward a theory of heredity, which he called his “provisional hypothesis of pangenesis”, in which all the parts of the body sent hereditary information to the reproductive cells. We now know this does not in fact happen. The DNA in the germ cells is segregated off from the rest of the body.

In some colonial animals, and many plants, the distinction between germ and somatic cells is less rigid than in humans. In them, somatic cells may be capable of reproduction. For example, in marine animals such as sponges, an individual animal may be smashed to pieces by wave or rock action; the pieces may then regenerate into whole new animals. In sponges, most offspring are produced from specialized reproductive cells within the adult body, but it is also possible for offspring to be formed from other cells. Humans probably evolved from ancestors that had partly Weismannist heredity, as in sponges, and over time the division between reproductive germ cells and non-reproductive somatic cells has increased. Leo Buss has described the evolution of Weismannist heredity in plants and animals in The Evolution of Individuality. Richard Dawkins, in The Extended Phenotype (1982), thought about imaginary life forms in which the next generation is not formed from specialized reproductive cells. He suggested that any such life form would have to be simpler than Earthly multicellular life.

Gene, unit of inheritance, a piece of the genetic material that determines the inheritance of a particular characteristic, or group of characteristics. Genes are carried by chromosomes in the cell nucleus and are arranged in a line along each chromosome. Every gene occupies a place, or locus, on the chromosome. Consequently, the word locus has become loosely interchangeable with the word gene.

The genetic material is deoxyribonucleic acid, or DNA (see Nucleic Acids), a molecule that forms the “backbone” of the chromosome. Because the DNA in each chromosome is a single, long, thin, continuous molecule, the genes must be parts of that molecule; and because DNA is a chain of minute subunits known as nucleotide bases, each gene includes many bases. Four different kinds of bases exist in the chain—adenine, guanine, cytosine, and thymine—and their sequence in a gene determines its properties.

Genes exert their effects through the molecules they produce. The immediate products of a gene are molecules of ribonucleic acid (RNA); these are copies of the DNA, except that RNA has the base uracil instead of thymine. The RNA molecules from some genes play a direct part in the metabolism of the organism, but most are used to make protein. Proteins are chains of subunits known as amino acids, and the sequence of bases in the RNA determines the sequence of amino acids in the protein by means of the genetic code (see Genetics: The Genetic Code). The sequence of amino acids in a protein dictates whether it will become part of the structure of the organism, or whether it will become an enzyme for promoting a particular chemical reaction. Thus, changes in the DNA can produce changes that affect the structure or the chemistry of an organism. Due to the complexity of living organisms, usually a number of genes will influence a process or major feature, but sometimes just a few or even one gene will affect an organism considerably. For example, it has been demonstrated in mice that just one gene can significantly affect memory. Inserting an extra copy of the gene that produces a component of the neuron receptor N-methyl-D-aspartate (NMDA) into the mouse genome causes a higher NMDA activity in the mouse’s brain and the mouse learns quicker and has a better memory than non-modified mice. Conversely, mice lacking this NR2B gene have impaired learning and memory. Scientists believe NMDA facilitates the creation of bonds between neurons that permit the association of two distinct stimuli, such as touching something hot and the sensation of pain. This ability is regarded by many scientists to be at the core of memory and learning. It is hoped that such work will aid drug design or gene-based therapies for memory loss in humans if there is sufficient similarity with human brain chemistry.
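The path from gene to protein described above (DNA copied into RNA with uracil in place of thymine, then read three bases at a time through the genetic code) can be sketched in code. The codon table here is a small subset of the standard genetic code chosen for the example, the DNA string is invented, and the DNA is treated as the coding strand so that transcription reduces to swapping T for U.

```python
# A small subset of the standard genetic code (the full table has 64 codons).
CODON_TABLE = {
    "AUG": "Met",   # also the usual start codon
    "UUU": "Phe",
    "GGC": "Gly",
    "GCU": "Ala",
    "UAA": "Stop",
}


def transcribe(dna):
    """Copy a DNA coding strand into RNA: same bases, with U instead of T."""
    return dna.replace("T", "U")


def translate(rna):
    """Read the RNA three bases at a time until a stop codon is reached."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        amino_acid = CODON_TABLE[rna[i:i + 3]]
        if amino_acid == "Stop":
            break
        protein.append(amino_acid)
    return protein


rna = transcribe("ATGTTTGGCGCTTAA")  # hypothetical toy gene
protein = translate(rna)
print(rna, protein)
```

Changing a single base in the DNA string changes the codon it belongs to, which is one way to see how a change in DNA can alter the resulting protein.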

The nucleotide bases in DNA that code the structure of RNAs and proteins are not the only components of genes; groups of bases adjacent to the coding sequences affect the quantities and dispositions of gene products. In higher organisms (animals and plants, rather than bacteria and viruses), the noncoding sequences outnumber the coding ones by a factor of ten or more, and the functions of these noncoding regions are largely unknown. This means that geneticists cannot yet set precise limits on the sizes of animal and plant genes in general. However, with the current advances in genetic mapping (most notably in the Human Genome Project) more and more information on the nature of genes is being gathered. For example, it is now known that the genome of the fruit fly Drosophila melanogaster contains 13,601 genes, and the human chromosome 22 contains an estimated 34.5 million building blocks of DNA, which comprise at least 545 genes and 134 pseudogenes (DNA sequences that resemble genes but do not instruct the cell to produce proteins). Further work on mapped genomes should reveal how certain genetic sequences determine specific protein synthesis and hence structure, metabolic functions, and processes.

In 1994, taking advantage of new capabilities developed by the genome project, DOE initiated the Microbial Genome Program to sequence the genomes of bacteria useful in energy production, environmental remediation, toxic waste reduction, and industrial processing. A follow-on program, Genomics:GTL, builds on data and resources from the Human Genome Project, the Microbial Genome Program, and systems biology. GTL will accelerate understanding of dynamic living systems for solutions to DOE mission challenges in energy and the environment. Despite our reliance on the inhabitants of the microbial world, we know little of their number or their nature: estimates are that less than 0.01% of all microbes have been cultivated and characterized. Microbial genome sequencing will help lay a foundation for knowledge that will ultimately benefit human health and the environment. The economy will benefit from further industrial applications of microbial capabilities.

Information gleaned from the characterization of complete microbial genomes will lead to insights into the development of such new energy-related biotechnologies as photosynthetic systems, microbial systems that function in extreme environments, and organisms that can metabolize readily available renewable resources and waste material with equal facility. Expected benefits also include development of diverse new products, processes, and test methods that will open the door to a cleaner environment. Biomanufacturing will use nontoxic chemicals and enzymes to reduce the cost and improve the efficiency of industrial processes. For example, microbial enzymes have been used to bleach paper pulp, stone wash denim, remove lipstick from glassware, break down starch in brewing, and coagulate milk protein for cheese production. In the health arena, microbial sequences may help researchers find new human genes and shed light on the disease-producing properties of pathogens.

Microbial genomics will also help pharmaceutical researchers gain a better understanding of how pathogenic microbes cause disease. Sequencing these microbes will help reveal vulnerabilities and identify new drug targets.

Gaining a deeper understanding of the microbial world also will provide insights into the strategies and limits of life on this planet. Data generated in this young program have helped scientists identify the minimum number of genes necessary for life and confirm the existence of a third major kingdom of life. Additionally, the new genetic techniques now allow us to establish more precisely the diversity of microorganisms and identify those critical to maintaining or restoring the function and integrity of large and small ecosystems; this knowledge also can be useful in monitoring and predicting environmental change. Finally, studies on microbial communities provide models for understanding biological interactions and evolutionary history.

Los Angeles Times Date: 02/11/01 22:15

WASHINGTON -- Scientists conducting the first thorough survey of human DNA have made a remarkable discovery: To create the complex organism known as a human takes only about twice the genes of a fruit fly or a roundworm.

At the same time, though scientists once described the overwhelming majority of the human genetic code as "junk" with little apparent purpose, it is brimming with remnants of long-dead genes and bits of DNA that "live" amid the genes like parasites, reshuffling and reshaping them over time. This suggests that the "junk" plays a role in the process of evolution and deserves intensive study.

The researchers also concluded that men, far more than women, produce the genetic mutations that bring disease into the human family and allow evolution to move forward.

These and other findings will be reported today by the two teams of researchers that raced each other to map the chemical composition of human DNA, the inherited material within most cells that controls basic cellular operations and plays a role in most disease.

That race ended in a tie in June, giving both teams a claim to one of the most important scientific achievements in history. Since then the teams -- one a private U.S. company, the other an international group funded largely by the U.S. government and a British charity -- have been scouring their data in order to conduct the first broad analysis of what lies within DNA.

More than a dozen papers will be published on their findings in the journals Science and Nature.

The results were to be released today, but a British newspaper broke the embargoed story in its Sunday edition.

The two teams' conclusions, which largely agree with each other, say little directly about potential cures for disease, though that is ultimately the major goal of the research. Still, by revealing more about how genes work, as well as where in the DNA they are clustered, the work over time will help researchers studying a large variety of illnesses.

Working independently, the two teams have concluded that humans have somewhere from 26,000 to 40,000 genes, with the best bet being less than 35,000. One team, led by Celera Genomics Corp., said the number could be as low as 26,000.

Most experts until recently believed it took 100,000 genes or more to build and operate a human.

"That's a `knock you over with a feather' kind of result," said Francis Collins, leader of the international team and director of the National Human Genome Research Institute.

"On one level, it's a blow to the pride of our species: How can we hold our heads up if we have only a few more genes than a worm?" Collins said. "But what it tells you is that our complexity arises from some other source, and we will have to start looking for it."

The reports offer evidence of how the body accomplishes so much with such a small gene set. Where simpler organisms rely on each gene to produce a single protein, human cells are able to skip over parts of genes at times, allowing each gene to produce on average three versions of a protein, and sometimes as many as five.

This suggests that to understand the root causes of disease, researchers will have to intensively study how proteins, as well as genes, interact and how they go awry. Proteins are the workhorses of the body, handling such basic tasks as turning food into energy, signaling among cells and growing from an embryo into a child.

Moreover, said Collins, the studies suggest that each human protein can handle more functions than those in a simpler organism. "If a worm protein is made to clip another protein ... then it's like a cutting knife that does one simple thing," he said. But the analogous human protein "would be like a Cuisinart -- it would have lots of settings and dice and slice and have more flexibility."

The teams also found evidence of how DNA changes over time, causing organisms to evolve.

In one surprising conclusion, the international team found that about 220 genes did not evolve in a straight line from all animals that came before humans. Instead, human ancestors adopted these genes millions of years ago directly from bacteria. In essence, this shows evolution making use of whatever material it found at hand, Collins suggested.

Intriguingly, at least one of the genes in question plays a role in depression. Still, an independent scientist questioned the finding. "I think it's more likely that the gene transfer went the other way, from vertebrates to bacteria," said Philip Green of the University of Washington in Seattle.

The two teams also found hundreds of thousands of copies of mysterious DNA bits that act like parasites, detaching themselves from the genome and then reinserting themselves in a new location. Their existence had been known, but there are far more of them than scientists had thought.

Moreover, one type of these parasites has the ability to move other pieces of DNA with them. The international team found that it could move more DNA than previously known, suggesting that these elements play a role in reshuffling genes to form new ones.

Another type was found to exist only where genes are plentiful, and it may help the body respond to extreme stress, the international team said.

All this, as well as evidence of genes that became inactive long ago, exists in the 98 percent of DNA that does not produce proteins. Once, scientists considered this vast material to be junk. Now they are finding that much of the material has, or had, some kind of biological function.

In addition to studying how genes work, the teams determined that genes are not scattered uniformly throughout DNA. Instead, some areas of the genome, as the sum total of human DNA is called, are packed densely with genes, like a busy, urban area, while others are like "deserts," barren of genes.

Researchers are not sure why this clustering occurs.

In April 2000 the Subcommittee on Energy and Environment of the Committee on Science of the U.S. House of Representatives conducted hearings on the status and benefits of genome sequencing in the public and private sectors. Speakers included representatives of the U.S. HGP and Celera Genomics, members of Congress, and the director of the Office of Science and Technology Policy. Robert Waterston, director of the HGP sequencing center at Washington University, St. Louis, pointed to fruitful data sharing by the HGP and the private sector. Examples included (1) collaborations led by the pharmaceutical company Merck to develop partial sequences identifying genes and (2) the fruit fly sequencing project by Celera and the HGP.

Examples of private-sector enrichment of public data included the SNP consortium, which generated a publicly available map containing human DNA variations. In September 2000, Celera Genomics announced a reference database with more than 2.8 million unique SNPs, including those screened from public-sector databases. In October a public-private consortium announced the joint sequencing of the laboratory mouse. Also, a Monsanto-University of Washington project generated a draft sequence of the rice plant genome for release to the public. These efforts show the value of sharing data to increase knowledge and ensure future discoveries for mutual benefit. Neal Lane (formerly Assistant to the President for Science and Technology and Director of the Office of Science and Technology Policy) echoed the importance of partnerships between public and private sectors in his testimony to the House committee. His observations follow.

"Sequencing the genome...is only the beginning of genomics," he said. "It is the first step into a future of discoveries and innovations that genomics will enable, that the public and private sectors must pursue together...An expanding, evolving partnership has made human genomic discoveries possible and is now poised to make those discoveries beneficial for everyone...I believe that the policies we have pursued will help to strengthen this partnership, allowing genomic discoveries and innovations to move steadily forward for the benefit of our nation and for all humankind."

Completed in 2003, the Human Genome Project (HGP) was a 13-year project coordinated by the U.S. Department of Energy and the National Institutes of Health. During the early years of the HGP, the Wellcome Trust (U.K.) became a major partner; additional contributions came from Japan, France, Germany, China, and others.

What is DNA sequencing?

DNA sequencing, the process of determining the exact order of the 3 billion chemical building blocks (called bases and abbreviated A, T, C, and G) that make up the DNA of the 24 different human chromosomes, was the greatest technical challenge in the Human Genome Project. Achieving this goal has helped reveal the estimated 20,000-25,000 human genes within our DNA as well as the regions controlling them. The resulting DNA sequence maps are being used by 21st Century scientists to explore human biology and other complex phenomena. Meeting Human Genome Project sequencing goals by 2003 required continual improvements in sequencing speed, reliability, and costs. Previously, standard methods were based on separating DNA fragments by gel electrophoresis, which was extremely labor intensive and expensive. Total sequencing output in the community was about 200 Mb for 1998. In January 2003, the DOE Joint Genome Institute alone sequenced 1.5 billion bases for the month.

Newer gel-based sequencers use multiple tiny (capillary) tubes to run standard electrophoretic separations. These separations are much faster than the earlier methods because the tubes dissipate heat well and allow the use of much higher electric fields, so sequencing is completed in shorter times.
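The throughput figures quoted above can be put side by side with a little arithmetic. The following is only a rough sketch: it assumes that "200 Mb" means 200 million bases for the whole of 1998, and that the January 2003 output of the DOE Joint Genome Institute can be treated as a typical month (an assumption made here for illustration, not a claim from the source). The function names are hypothetical.

```python
# Figures quoted in the text above.
COMMUNITY_OUTPUT_1998 = 200_000_000    # bases per year, whole community ("200 Mb")
JGI_OUTPUT_JAN_2003 = 1_500_000_000    # bases in one month, one institute
HUMAN_GENOME_SIZE = 3_000_000_000      # bases in the human genome


def annualize(bases_per_month: int) -> int:
    """Assumption: treat one month's output as typical of the whole year."""
    return bases_per_month * 12


def years_for_one_pass(genome_size: int, bases_per_year: int) -> float:
    """Years needed to read every base once (1x), ignoring overlap and rework."""
    return genome_size / bases_per_year


jgi_per_year = annualize(JGI_OUTPUT_JAN_2003)
rate_ratio = jgi_per_year / COMMUNITY_OUTPUT_1998

print(f"2003 single-institute rate vs. 1998 community rate: {rate_ratio:.0f}x")
print(f"One pass at the 1998 rate: {years_for_one_pass(HUMAN_GENOME_SIZE, COMMUNITY_OUTPUT_1998):.1f} years")
print(f"One pass at the 2003 rate: {years_for_one_pass(HUMAN_GENOME_SIZE, jgi_per_year):.2f} years")
```

Under these assumptions, a single pass over 3 billion bases would have taken about 15 years at the 1998 community rate, compared with about two months at the January 2003 rate of one institute. Real projects read each region several times, so actual effort is a multiple of these numbers.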

Whose genome was sequenced in the public (HGP) and private projects?

The human genome reference sequences do not represent any one person’s genome. Rather, they serve as a starting point for broad comparisons across humanity. The knowledge obtained is applicable to everyone because all humans share the same basic set of genes and genomic regulatory regions that control the development and maintenance of their biological structures and processes.

In the international public-sector Human Genome Project (HGP), researchers collected blood (female) or sperm (male) samples from a large number of donors. Only a few of many collected samples were processed as DNA resources. Thus the donor identities were protected so neither donors nor scientists could know whose DNA was sequenced. DNA clones from many different libraries were used in the overall project.

Technically, it is much easier to prepare DNA cleanly from sperm than from other cell types because of the much higher ratio of DNA to protein in sperm and the much smaller volume in which purifications can be done. Using sperm does provide all chromosomes for study, including equal numbers of sperm with the X (female) or Y (male) sex chromosomes. However, HGP scientists also used white cells from the blood of female donors so as to include female-originated samples.

Scientific vs. Commercial Goals

The HGP's commitment from the outset was to create a scientific standard (an entire reference genome). Most private-sector human genome sequencing projects, however, focused on gathering just enough DNA to meet their customers' needs, probably in the 95% to 99% range for gene-rich, potentially lucrative regions. Such private data continue to be enriched greatly by accurate free public mapping (location) and sequence information. Celera's shotgun sequencing strategy, for example, created millions of tiny fragments that had to be ordered and oriented computationally using HGP research results. Most data at Celera, Incyte, and other genomics information-based companies are proprietary or available only for a fee. In addition, companies are filing numerous patent applications to stake claims to genes and other potentially important DNA fragments.

More than the Reference Sequence

DNA sequencing will continue to be a major emphasis for the foreseeable future as gene sequences are surveyed across various populations. Both the DOE and NIH genome programs continue to support the development of fully integrated and innovative approaches to rapid, low-cost sequencing. Other HGP goals from the final 5-year plan were to enhance bioinformatics (computational) resources to support future research and commercial applications. The HGP also aimed to explore gene function through comparative mouse-human studies, train future scientists, study human variation, and address critical societal issues arising from the increased availability of human genome data and related analytical technologies.

In the Celera Genomics private-sector project, DNA samples from a few different individuals were pooled and processed for sequencing. The DNA resources used for these studies came from anonymous donors of European, African, American (North, Central, South), and Asian ancestry. The lead scientist of Celera Genomics at that time, Craig Venter, has since acknowledged that his DNA was one of those in the pool.

Many small regions of DNA that vary among individuals (called polymorphisms) also were identified during the HGP, mostly single nucleotide polymorphisms (SNPs). Most SNPs are without physiological effect, although a minority contribute to the delightful and beneficial diversity of humanity. A much smaller minority of polymorphisms affect an individual’s susceptibility to disease and response to medical treatments.

Although the HGP has been completed, SNP studies continue in the International HapMap Project, whose goal is to identify patterns of SNP groups (called haplotypes, or “haps”). The DNA samples for the HapMap came from a total of 270 individuals: Yoruba people in Ibadan, Nigeria; Japanese in Tokyo; Han Chinese in Beijing; and the French Centre d’Etude du Polymorphisme Humain (CEPH) resource.

Who sequenced the human genome?

Human Genome Project research was funded at many laboratories around the U.S. by the Department of Energy (DOE), the National Institutes of Health (NIH), or both. A list of the major U.S. Human Genome Project research sites can be found here. Other researchers at numerous colleges, universities, and laboratories throughout the United States have also received DOE and NIH funding for human genome research. At any given time, the DOE Human Genome Program has funded about 100 separate principal investigators. For DOE-funded projects, see Research. To see a list of NIH-funded projects, visit their grants database.

In addition, many large and small private U.S. companies are conducting genome research. For more on the genomics research partnership between the public and private sectors, see the Human Genome Project and the Private Sector Fact Sheet. At least 18 other countries have participated in the Human Genome Project.

What is the difference between draft sequence and finished sequence?

In generating the draft sequence (released in June 2000), scientists determined the order of base pairs in each chromosomal area at least 4 to 5 times (4x to 5x) to ensure data accuracy and to help with reassembling DNA fragments in their original order. This repeated sequencing is known as genome "depth of coverage." Draft sequence data are mostly in the form of fragments about 10,000 base pairs long whose approximate chromosomal locations are known.

To generate the high-quality reference sequence, completed in April 2003, additional sequencing was done to close gaps, reduce ambiguities, and allow for only a single error every 10,000 bases, the agreed-upon standard for the HGP. Investigators believe that a high-quality sequence is critical for recognizing regulatory components of genes that are very important in understanding human biology and such disorders as heart disease, cancer, and diabetes. The finished version provides an estimated 8x to 9x coverage of each chromosome.
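The coverage and accuracy figures above lend themselves to a quick calculation. The sketch below uses the common simplification that average depth of coverage equals total bases read divided by genome size; that uniform-average model, the 500-base read length, and the function names are all assumptions introduced here for illustration, not details from the HGP.

```python
import math

HUMAN_GENOME_SIZE = 3_000_000_000  # bases, as quoted in the text
READ_LENGTH = 500                  # bases per read; hypothetical value for illustration


def depth_of_coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Average number of times each base is read: total bases read / genome size."""
    return num_reads * read_length / genome_size


def reads_needed(coverage: float, read_length: int, genome_size: int) -> int:
    """Number of reads required to reach a target average coverage."""
    return math.ceil(coverage * genome_size / read_length)


def max_expected_errors(genome_size: int, bases_per_error: int = 10_000) -> float:
    """Errors allowed genome-wide under a 'one error per N bases' standard."""
    return genome_size / bases_per_error


draft_reads = reads_needed(4, READ_LENGTH, HUMAN_GENOME_SIZE)      # 4x draft
finished_reads = reads_needed(8, READ_LENGTH, HUMAN_GENOME_SIZE)   # 8x finished

print(f"Reads for 4x coverage: {draft_reads:,}")
print(f"Reads for 8x coverage: {finished_reads:,}")
print(f"Errors allowed at 1 per 10,000 bases: {max_expected_errors(HUMAN_GENOME_SIZE):,.0f}")
```

With these illustrative numbers, doubling the coverage from 4x to 8x doubles the number of reads, and the one-error-per-10,000-bases standard still permits up to about 300,000 errors across a 3-billion-base genome.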

What genomes have been sequenced completely?

The small genomes of several viruses and bacteria and the much larger genomes of three higher organisms have been completely sequenced; they are bakers' or brewers' yeast (Saccharomyces cerevisiae), the roundworm (Caenorhabditis elegans), and the fruit fly (Drosophila melanogaster). In October 2001 the draft sequence of the pufferfish Fugu rubripes, the first vertebrate after the human, was completed; and scientists finished the first genetic sequence of a plant, that of the weed Arabidopsis thaliana, in December 2000. Many more genomes have been completed since then. For information on published and unpublished genomes, see Genomes Online Database (GOLD).

What nonhuman genome sequencing projects are supported by the U.S. Department of Energy?

A list of microbial genome sequencing projects supported by the U.S. Department of Energy Microbial Genome Program is available here.

What happens now that the human genome sequence is completed?

The working draft DNA sequence and the more polished 2003 version represent an enormous achievement, akin in scientific importance, some say, to developing the periodic table of elements. And, as in most major scientific advances, much work remains to realize the full potential of the accomplishment.

Early explorations into the human genome, now joined by projects on the genomes of a number of other organisms, are generating data whose volume and complex analyses are unprecedented in biology. Genomic-scale technologies will be needed to study and compare entire genomes, sets of expressed RNAs or proteins, gene families from a large number of species, variation among individuals, and the classes of gene regulatory elements.

Deriving meaningful knowledge from DNA sequence will define biological research through the coming decades and require the expertise and creativity of teams of biologists, chemists, engineers, and computational scientists, among others. A sampling follows of some research challenges in genetics--what we still won't know, even with the full human sequence in hand.

Single nucleotide polymorphisms or SNPs (pronounced "snips") are DNA sequence variations that occur when a single nucleotide (A, T, C, or G) in the genome sequence is altered. For example, a SNP might change the DNA sequence AAGGCTAA to ATGGCTAA. For a variation to be considered a SNP, it must occur in at least 1% of the population. SNPs, which make up about 90% of all human genetic variation, occur every 100 to 300 bases along the 3-billion-base human genome. Two of every three SNPs involve the replacement of cytosine (C) with thymine (T). SNPs can occur in both coding (gene) and noncoding regions of the genome. Many SNPs have no effect on cell function, but scientists believe others could predispose people to disease or influence their response to a drug.
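The AAGGCTAA to ATGGCTAA example above can be checked with a few lines of code. This is a minimal sketch that assumes the two sequences are already aligned and of equal length; real variant calling works on aligned sequencing reads and is far more involved. The function names, and the choice to measure the 1% threshold as a fraction of sampled chromosomes, are assumptions made for illustration.

```python
def find_snps(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """Return (position, reference_base, sample_base) for every mismatch.

    Positions are 1-based. Assumes the sequences are pre-aligned and equal in length.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (i + 1, ref_base, sample_base)
        for i, (ref_base, sample_base) in enumerate(zip(reference, sample))
        if ref_base != sample_base
    ]


def meets_snp_threshold(variant_count: int, chromosomes_sampled: int,
                        threshold: float = 0.01) -> bool:
    """True if the variant is seen in at least `threshold` of the sampled chromosomes."""
    return variant_count / chromosomes_sampled >= threshold


print(find_snps("AAGGCTAA", "ATGGCTAA"))   # [(2, 'A', 'T')]
print(meets_snp_threshold(3, 200))         # True  (1.5% of sampled chromosomes)
print(meets_snp_threshold(1, 200))         # False (0.5% of sampled chromosomes)
```

The single difference is at position 2, where A is replaced by T. The second function shows how the 1% rule separates a SNP from a rarer variant in a hypothetical sample of 200 chromosomes.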

Although more than 99% of human DNA sequences are the same across the population, variations in DNA sequence can have a major impact on how humans respond to disease; environmental insults such as bacteria, viruses, toxins, and chemicals; and drugs and other therapies. This makes SNPs of great value for biomedical research and for developing pharmaceutical products or medical diagnostics. SNPs are also evolutionarily stable, not changing much from generation to generation, which makes them easier to follow in population studies.

Scientists believe SNP maps will help them identify the multiple genes associated with such complex diseases as cancer, diabetes, vascular disease, and some forms of mental illness. These associations are difficult to establish with conventional gene-hunting methods because a single altered gene may make only a small contribution to the disease.

Several groups worked to find SNPs and ultimately create SNP maps of the human genome. Among these groups were the U.S. Human Genome Project (HGP) and a large group of pharmaceutical companies called the SNP Consortium or TSC project. Because an estimated 3 million SNPs exist, the likelihood of duplication among the groups was small, and the potential payoff was high.

In addition to the pharmacogenomic, diagnostic, and biomedical research implications, SNP maps are helping to identify thousands of additional markers along the genome, thus simplifying navigation of the much larger genome map generated by researchers in the HGP.

How can SNPs be used as risk factors in disease development?

SNPs do not cause disease, but they can help determine the likelihood that someone will develop a particular disease. One of the genes associated with Alzheimer's, apolipoprotein E or ApoE, is a good example of how SNPs affect disease development. This gene contains two SNPs that result in three possible alleles for this gene: E2, E3, and E4. Each allele differs by one DNA base, and the protein product of each gene differs by one amino acid.

Each individual inherits one maternal copy of ApoE and one paternal copy of ApoE. Research has shown that an individual who inherits at least one E4 allele will have a greater chance of getting Alzheimer's. Apparently, the change of one amino acid in the E4 protein alters its structure and function enough to make disease development more likely. Inheriting the E2 allele, on the other hand, seems to indicate that an individual is less likely to develop Alzheimer's.
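Because each person carries two copies of ApoE, the three alleles named above combine into a small, fixed set of genotypes. The sketch below simply enumerates them. It is illustrative only: it says nothing about how large the risk is for any genotype, and the function names are hypothetical.

```python
from itertools import combinations_with_replacement

APOE_ALLELES = ("E2", "E3", "E4")  # the three alleles named in the text


def possible_genotypes(alleles: tuple[str, ...]) -> list[tuple[str, str]]:
    """All unordered pairs of alleles (one maternal copy, one paternal copy)."""
    return list(combinations_with_replacement(alleles, 2))


def carries(genotype: tuple[str, str], allele: str) -> bool:
    """True if at least one of the two inherited copies is the given allele."""
    return allele in genotype


genotypes = possible_genotypes(APOE_ALLELES)
e4_carriers = [g for g in genotypes if carries(g, "E4")]

print(len(genotypes))   # 6
print(e4_carriers)      # [('E2', 'E4'), ('E3', 'E4'), ('E4', 'E4')]
```

Three alleles give six possible genotypes, and three of those include at least one E4 copy.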

Of course, SNPs are not absolute indicators of disease development. Someone who has inherited two E4 alleles may never develop Alzheimer's, while another who has inherited two E2 alleles may. ApoE is just one gene that has been linked to Alzheimer's. Like most common chronic disorders such as heart disease, diabetes, or cancer, Alzheimer's is a disease that can be caused by variations in several genes. The polygenic nature of these disorders is what makes genetic testing for them so complicated.

The answer to this question is based on information provided by the Genome News Network.

Human Genome Project SNP Mapping Goals

In 1998, as part of their last five-year plan, the DOE and NIH Human Genome Programs established goals to identify and map SNPs. These goals were as follows:

Develop technologies for rapid, large-scale identification and scoring of SNPs and other DNA sequence variants.
Identify common variants in the coding regions of most identified genes.
Create a SNP map of at least 100,000 markers.
Develop the intellectual foundations for studies of sequence variation.
Create public resources of DNA samples and cell lines.

What is The SNP Consortium (TSC)?

In April 1999, ten large pharmaceutical companies and the U.K. Wellcome Trust philanthropy announced the establishment of a consortium headed by Arthur L. Holden to find and map 300,000 common SNPs. The goal was to generate a widely accepted, high-quality, extensive, publicly available map using SNPs as markers evenly distributed throughout the human genome. In the end, many more SNPs (1.8 million total) were discovered than planned originally. Now that the SNP discovery phase of the TSC project is essent