Journal of Bacteriology, March 2003, p. 2017-2021, Vol. 185, No. 6
Definition of the Escherichia coli MC4100 Genome by Use of a DNA Array
Joseph E. Peters, Howard Hughes Medical Institute, Department of Molecular Biology and Genetics, Johns Hopkins University School of Medicine, Baltimore, Maryland
Received 5 July 2002/Accepted 10 December 2002
The original E. coli K-12 strain was isolated in 1922 (3). A K-12 derivative called MG1655, which was cured of an endogenous lambda prophage by UV induction and of its conjugal plasmid by growth in the presence of acridine orange, was sequenced and is the basis of a commercially available whole-genome array (7). The K-12 derivative MC4100 was constructed over 25 years ago, when it was used to isolate gene and protein fusions to lacZ, the gene encoding β-galactosidase, by means of bacteriophage derivatives (8). MC4100 was an important host in early gene expression work (reviewed in reference 5). We chose to use one of the newest tools designed for genome-wide gene expression analysis to analyze this strain, which was used in some of the earliest expression analysis experiments. Well beyond providing a proof of principle for mapping deletion endpoints, an accurate description of MC4100 is important for the modern understanding of E. coli. For largely historical reasons, MC4100 has been the strain background of choice for many genetic experiments. Knowing the genetic makeup of MC4100 will remain important for relating work done with this strain to the sequenced K-12 strain MG1655, and it will help investigators decide whether MC4100 is appropriate for a given experiment. The original K-12 strain went through numerous alterations, including X-ray, UV, and chemical mutagenesis, and served as a genetic recipient in multiple crosses with various E. coli K-12 and E. coli B derivatives to generate MC4100. Although physical analysis of the chromosome by pulsed-field gel electrophoresis identified three deletions, the actual extent of these deletions has remained unclear (10, 14-16). We used the E. coli whole-genome array to identify the deletion endpoints in MC4100.
The Sigma/Genosys Panorama E. coli array consists of PCR-amplified DNA products corresponding to the 4,290 open reading frames of strain MG1655, applied as duplicate 10-ng spots on a nylon membrane. Our strategy was to identify deletions in a non-MG1655 E. coli strain by isolating and radioactively labeling chromosomal DNA from MC4100 and using it to probe the E. coli MG1655 gene array. As described below, most genes were found to be present on the basis of an intensity ratio (MC4100 intensity/MG1655 intensity) of about 1. Low ratios indicate putative deletions, which can then be confirmed by PCR amplification and sequence analysis of the region.
To study MC4100 [F- araD139
Chromosomal DNA was isolated from strains grown in Luria broth, treated with RNase, extracted with phenol-chloroform, and suspended in Tris-EDTA, pH 8.0 (2, 19). DNA was sheared to 1-kb fragments by sonication and radioactively labeled with [
Multiple open reading frame deletions can be identified in the K-12 derivative MC4100.
Deletions in MC4100 were identified by calculating the ratio of the normalized spot intensity from MC4100 to that from MG1655 (MC4100 normalized intensity/MG1655 normalized intensity). Spot intensity was normalized by dividing the background-corrected intensity by the overall average intensity for the blot. If a gene is present in both MC4100 and MG1655, the ratio should be about 1. When the MC4100/MG1655 ratios were plotted in gene order, most values were indeed about 1 (Fig. 1). However, over 100 MG1655 open reading frames appeared to be missing from MC4100 on the basis of low intensity ratios. Examining the ratios in gene order pinpointed the three multiple open reading frame deletions already known in MC4100 and showed their extent: ykfD-b0350, b1137-mcrA, and fruB-yeiR (Fig. 1; Table 1). The array data also suggested a previously unknown single open reading frame deletion in the fim genes, which was confirmed by PCR (see below).
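The normalization and ratio calculation described above can be sketched in Python. This is a minimal illustration, not the software used for the study: it assumes a single background value per blot (the text does not say how background was estimated), that the duplicate spots for each open reading frame have already been averaged, and that negative corrected intensities are clamped to zero. The function names are hypothetical.

```python
from statistics import mean


def normalize(raw: dict[str, float], background: float) -> dict[str, float]:
    """Background-correct each spot, then divide by the blot's average corrected intensity."""
    # Assumption: one background value per blot; negative values are clamped to 0.
    corrected = {orf: max(value - background, 0.0) for orf, value in raw.items()}
    average = mean(list(corrected.values()))
    if average == 0:
        raise ValueError("blot has no signal above background")
    return {orf: value / average for orf, value in corrected.items()}


def intensity_ratios(
    tester_raw: dict[str, float],
    tester_background: float,
    reference_raw: dict[str, float],
    reference_background: float,
) -> dict[str, float]:
    """Tester/reference (e.g., MC4100/MG1655) ratio of normalized intensities per ORF.

    ORFs without reference signal are skipped because the ratio is undefined there.
    Dict insertion order is kept, so ORFs supplied in gene order stay in gene order.
    """
    tester = normalize(tester_raw, tester_background)
    reference = normalize(reference_raw, reference_background)
    return {
        orf: tester[orf] / reference[orf]
        for orf in reference
        if orf in tester and reference[orf] > 0
    }
```

Because each blot is divided by its own average intensity, a gene present in both strains lands near 1 even if the two probes were labeled with different efficiencies. With only a handful of spots, a single deleted gene noticeably lowers the tester average and pushes the remaining ratios somewhat above 1; across thousands of spots that effect becomes small.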
Interspersed among the open reading frames shown by PCR to be missing were some open reading frames with ratios close to the value expected for genes that are present (i.e., within 1 standard deviation of the mean). These "false-positive" values are most easily explained by technical causes such as cross-hybridization with similar genes elsewhere in the genome, although we cannot rule out the unlikely possibility that certain genes moved to new positions in the chromosome. Homologous sequences from a variety of sources could allow enough cross-hybridization to produce the false positives identified within the deletions. For example, the six IS1 elements found in MG1655 are all on the array, so IS1 copies retained elsewhere in MC4100 can make the IS1 elements missing within the MC4100 deletions appear to be present. Paralogs could also produce false-positive values within deletions: the putative permease YagG (b0270), which appeared to be present within one of the deletions, could cross-hybridize with seven other known and putative permeases on the MG1655 array that share 25 to 45% identity across the genes. Because cross-hybridization could also stem from contributions of many different open reading frames throughout the chromosome, a one-to-one accounting that pairs each false positive with another open reading frame is not possible.
A deletion of the b1137-mcrA genes indicates that MC4100 lacks the e14 element found in MG1655.
e14 is a genetic element that can move into and out of the chromosome and is found at a specific location in the E. coli K-12 genome. Low ratio values for multiple open reading frames clustered in the region of the e14 element found in MG1655 (6) (b1137-mcrA in Fig. 1 and Table 1). Amplification and sequencing with PCR primers specific to the genes flanking the region (icdA and b1160) confirmed that 15,203 bp, including the e14 element, were missing in MC4100 (Table 1).
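The multi-ORF deletions were found by inspecting ratios plotted in gene order, with occasional "false-positive" open reading frames sitting inside a deletion. One way to automate that inspection is to group low-ratio open reading frames into runs while tolerating a few present-looking genes inside a run. The sketch below does this; the function name, the gap allowance (`max_gap`), and the minimum run size (`min_low`) are hypothetical choices for illustration, not parameters from the study. It treats the gene list as linear (ignoring the circular chromosome), and any run it reports would still need PCR confirmation.

```python
def find_deletion_runs(
    ordered_ratios: list[tuple[str, float]],
    threshold: float,
    max_gap: int = 1,
    min_low: int = 2,
) -> list[tuple[str, str, int]]:
    """Group low-ratio ORFs (given in chromosome order) into candidate deletions.

    Returns (first_low_orf, last_low_orf, number_of_low_orfs) for each run.
    """
    runs = []
    start = end = None  # indices of first and last low-ratio ORF in the open run
    n_low = 0           # low-ratio ORFs counted in the open run
    gap = 0             # consecutive present-looking ORFs since the last low one

    for i, (_orf, ratio) in enumerate(ordered_ratios):
        if ratio < threshold:
            if start is None:          # open a new run
                start, n_low = i, 0
            end = i
            n_low += 1
            gap = 0
        elif start is not None:        # present-looking ORF inside an open run
            gap += 1
            if gap > max_gap:          # too many in a row: close the run
                if n_low >= min_low:
                    runs.append((ordered_ratios[start][0], ordered_ratios[end][0], n_low))
                start, gap = None, 0

    if start is not None and n_low >= min_low:  # run still open at the end of the list
        runs.append((ordered_ratios[start][0], ordered_ratios[end][0], n_low))
    return runs
```

With `max_gap=1`, a single cross-hybridizing gene (such as an IS1 copy) inside a deletion does not split it into two calls, while two or more present-looking genes in a row end the run.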
The loss of e14 is further supported by the observation that MC4100 does not possess the restriction and modification system normally ascribed to e14 (17; M. Sibley and L. Raleigh, personal communication). It remains unclear how the e14 element was lost. It could have excised and been lost, or the intervening region could have been deleted by host-mediated homologous recombination between the 166-bp near-perfect direct repeats that flank the element in MG1655. DNA sequencing indicates that the near-perfect repeat in mcrA, not the repeat in the icdA gene, was maintained.
A deletion of the fruB-yeiR genes likely accounts for the fruA25 allele of MC4100.
Multiple putative missing open reading frames fell in consecutive genes, including a portion of the fruBKA operon, which encodes functions for fructose transport and catabolism (fruB-yeiR in Fig. 1 and Table 1). A deletion had previously been suggested to be associated with the fruA25 allele of MC4100 (15). PCR primers were designed for fruK and b2174, the open reading frames flanking the putative deletion, and sequencing of the resulting PCR product confirmed a 6,678-bp deletion relative to the MG1655 genome sequence. The deletion removes all of fruB and the first 29 amino acids of the coding region of the 312-amino-acid FruK protein. Therefore, the fruA25 allele actually leaves the fruA gene intact but removes the fruBKA promoter.
The genome array can detect a single open reading frame deletion.
Inspection of the array data revealed a few single open reading frames with low ratio values: yaiE, yccE, acs, and fimB (Fig. 1). Using PCR, we confirmed that fimB, the open reading frame with the lowest ratio, did indeed carry a deletion: DNA sequencing showed that a 1,018-bp deletion removed 533 bp of the fimB gene along with 5 bp of the adjacent fimE gene. The fimB-fimE deletion was associated with an IS1 insertion.
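The fimB-fimE lesion shows why a size-based physical method can overlook a rearrangement that the array detects. Using the figures given for this region in the text (1,018 bp of MG1655 sequence deleted and 767 bp of IS1 sequence in its place), the net change in length is only 251 bp. The sketch below makes that arithmetic explicit; the function names are hypothetical, and `resolution_bp` is a hypothetical stand-in for the smallest length difference a mapping method can resolve, not a value from the study.

```python
def net_length_change(deleted_bp: int, inserted_bp: int) -> int:
    """Net change in a region's length; negative means the region got shorter."""
    return inserted_bp - deleted_bp


def visible_by_sizing(deleted_bp: int, inserted_bp: int, resolution_bp: int) -> bool:
    """True if the net length change reaches the (hypothetical) sizing resolution."""
    return abs(net_length_change(deleted_bp, inserted_bp)) >= resolution_bp


# fimB-fimE region of MC4100 relative to MG1655 (values from the text)
DELETED_BP = 1018   # MG1655 sequence removed
INSERTED_BP = 767   # IS1 sequence present in its place

print(net_length_change(DELETED_BP, INSERTED_BP))  # -251

# A method that can only resolve ~1-kb differences would not see this change,
# whereas the array registers the loss of fimB sequence regardless of net length.
print(visible_by_sizing(DELETED_BP, INSERTED_BP, resolution_bp=1000))  # False
```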
IS1 insertion has previously been shown to be associated in some cases with deletion of adjacent DNA sequence (21). Our ability to detect the fimB deletion indicates that probing whole-genome blots with chromosomal DNA is sensitive enough to detect deletions smaller than 1 kb that remove part of an open reading frame. Of additional significance, detecting the deletion in fimB also allowed us to identify an insertion of foreign DNA sequence. This suggests that array techniques may also be used to detect heterologous sequences such as pathogenicity and fitness islands. Unlike techniques such as restriction fragment length polymorphism analysis, this array technique could identify newly inserted DNA even when there is no net change in the size of a given region. Because the 1,018 bp of deleted MG1655 sequence were replaced with 767 bp of heterologous IS1 sequence, the region was only 251 bp shorter overall; a 251-bp deletion would likely be missed by most restriction mapping techniques that use rare-cutting enzymes.
There were also some single open reading frames that gave high MC4100/MG1655 intensity ratios (Fig. 1). Several explanations are possible. High values could indicate that the strain in question contains more copies of these open reading frames than MG1655 does; while it seems unlikely, we cannot formally rule out amplification of single open reading frames, which can occur when a gene is under strong selection (1, 22). It is also formally possible that these genes were deleted in the isolate of MG1655 that we obtained from the American Type Culture Collection.
Experiments with MC4100 suggest the limits of the Panorama array for detecting deletions.
PCR with primers flanking yaiE, yccE, and acs yielded fragments of the same size from MG1655 and MC4100, indicating that these genes are not deleted. It is unclear why these open reading frames display low ratio values. Knowing the lowest MC4100/MG1655 ratios at which no deletion actually exists is important for extending this array technology to other non-MG1655 E. coli strains. Using the array to predict putative missing genes in unsequenced E. coli strains would require establishing an appropriate threshold for ratio values. The threshold for assigning putative deletions should err on the side of accepting some false negatives (open reading frames that appear missing but are present) to minimize the chance of missing real deletions. Based on the results with MC4100, a threshold of 1.5 standard deviations below the mean seems appropriate, because this is just within the minimum value obtained for the three false-negative open reading frames yaiE, yccE, and acs (Fig. 1). This threshold would introduce an additional three open reading frames as potential false negatives. Analysis of the deletion endpoints identified by PCR also indicates how much of a gene must be missing to register with the array technique (Table 2). Our results show that deletions removing only a portion of a gene can be detected. However, small deletions that leave 80% of an open reading frame intact will appear to be present. Because the results are calculated as a ratio of signals from a tester strain and MG1655, the percentage of the open reading frame that is missing, not the absolute number of base pairs, is what matters. Therefore, a deletion of a given size is more readily detected in a small open reading frame than in a large one, where it might go unnoticed.
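The thresholding and fraction-missing arguments above can be sketched as follows. This assumes the cutoff is computed from the mean and sample standard deviation of all open reading frame ratios on the blot (the text does not say whether deleted genes were excluded or which standard deviation estimator was used). The 0.2 `min_fraction` default is only a rough reading of the statement that deletions leaving 80% of an open reading frame intact appear present, not a measured limit. Function names and the open reading frame lengths in the example are hypothetical.

```python
from statistics import mean, stdev


def ratio_cutoff(ratios: dict[str, float], k: float = 1.5) -> float:
    """Cutoff k standard deviations below the mean ratio (sample SD assumed)."""
    values = list(ratios.values())
    return mean(values) - k * stdev(values)


def putative_deletions(ratios: dict[str, float], k: float = 1.5) -> list[str]:
    """ORFs whose ratio falls below the cutoff; these still need PCR confirmation."""
    cutoff = ratio_cutoff(ratios, k)
    return sorted(orf for orf, r in ratios.items() if r < cutoff)


def fraction_missing(orf_length_bp: int, deleted_bp_in_orf: int) -> float:
    """Share of an ORF removed by a deletion; the array ratio tracks this, not bp."""
    return deleted_bp_in_orf / orf_length_bp


def likely_detectable(
    orf_length_bp: int, deleted_bp_in_orf: int, min_fraction: float = 0.2
) -> bool:
    """Rough call: removing 20% or less of an ORF is assumed to look 'present'."""
    return fraction_missing(orf_length_bp, deleted_bp_in_orf) > min_fraction


# The same hypothetical 300-bp deletion in a small and in a large ORF
print(fraction_missing(600, 300))    # 0.5
print(fraction_missing(3000, 300))   # 0.1
print(likely_detectable(600, 300), likely_detectable(3000, 300))  # True False
```

The last line illustrates the point made in the text: an identical number of deleted base pairs removes half of the small open reading frame but only a tenth of the large one, so only the former is expected to show a low ratio.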
We thank Mary Berlyn, Lise Raleigh, and Marion Sibley for providing strains and information. We thank the members of the Craig lab for comments on the manuscript.