Journal of Bacteriology, June 2002, p. 3287-3295, Vol. 184, No. 12
Definition of the Mycobacterial SOS Box and Use To Identify LexA-Regulated Genes in Mycobacterium tuberculosis
Elaine O. Davis,* Edith M. Dullaghan, Division of Mycobacterial Research, National Institute for Medical Research, London NW7 1AA, England
Received 4 January 2002/Accepted 29 March 2002
The basic principles of this regulatory mechanism are found in many other species of bacteria, although the DNA sequence of the LexA binding site, or SOS box, varies. Thus, while the SOS box in E. coli and other enterobacteria has the consensus sequence taCTGTatatatatACAGta (where the bases in lowercase are less well conserved than those in uppercase) (12), it has been suggested that in rhizobia the SOS box is GAAC(N)7GTAC (29); in Bacillus subtilis the SOS box, originally thought to be GAAC(N)4GTTC (4, 5), has more recently been refined as CGAACRNRYGTTCG (30). A motif similar to the original short version of the B. subtilis SOS box has been found upstream of the recA and lexA genes and has been shown to bind LexA in mycobacteria (7, 18, 19). However, the specific bases required for LexA binding have not been determined, as demonstrated by the excessive number of hits found when this sequence was used to search the Mycobacterium tuberculosis genome sequence (3). A precise definition of the mycobacterial SOS box would allow the identification of LexA binding sites in the M. tuberculosis genome and thus aid in the discovery of other LexA-regulated genes. Therefore, we have undertaken an analysis of the effect of single base changes in the mycobacterial SOS box on LexA binding in vivo by comparing the induction ratios obtained using a transcriptional fusion to a LexA-regulated promoter. Using the information obtained, we have been able to identify a number of novel LexA-regulated genes in M. tuberculosis.
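To make the motif notation above concrete, the following is a minimal sketch (not part of the original study) of how degenerate motifs such as GAAC(N)4GTTC or CGAACRNRYGTTCG can be turned into a search pattern. The helper names (`motif_to_regex`, `find_motif`, `reverse_complement`) and the toy sequence are hypothetical, chosen only for illustration; R, Y, and N are read as the standard IUPAC codes (purine, pyrimidine, any base).

```python
import re

# IUPAC nucleotide codes needed for the motifs quoted in the text.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "N": "[ACGT]",
}


def motif_to_regex(motif):
    """Translate an IUPAC motif such as 'GAACNNNNGTTC' into a regex string."""
    return "".join(IUPAC[base] for base in motif.upper())


def reverse_complement(seq):
    """Return the reverse complement of a plain A/C/G/T sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_motif(seq, motif):
    """Return 0-based start positions of exact motif matches on the given strand."""
    # A lookahead is used so that overlapping matches are also reported.
    lookahead = re.compile("(?=(" + motif_to_regex(motif) + "))")
    return [m.start() for m in lookahead.finditer(seq.upper())]


SHORT_BOX = "GAACNNNNGTTC"      # GAAC(N)4GTTC, the original short motif
REFINED_BOX = "CGAACRNRYGTTCG"  # refined B. subtilis motif

toy = "TTTCGAACACATGTTCGAAA"    # invented sequence, for illustration only
print(find_motif(toy, SHORT_BOX))    # [4]
print(find_motif(toy, REFINED_BOX))  # [3]
```

Because GAAC(N)4GTTC is its own reverse complement, scanning one strand is enough for that motif; for a non-palindromic motif the search would be repeated on `reverse_complement(seq)`.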
Recombinant DNA techniques. Plasmid DNA was prepared using SNAP Miniprep kits (Invitrogen). Site-directed mutagenesis was performed as described in the QuikChange Site-Directed Mutagenesis Kit (Stratagene). For other DNA manipulations, standard DNA protocols were followed (24). For each mutant made, the sequences of the promoter region and the junctions to the vector were determined on an ABI Prism 377 DNA sequencer using the ABI Prism dRhodamine dye terminator cycle sequencing kit (PE Applied Biosystems).
Introduction of clones into M. smegmatis and verification by PCR and sequencing. Published protocols were followed for preparing electrocompetent cells of M. smegmatis mc²155 (22) and for electroporation (11). After the resulting transformants had been streaked out, total DNA suitable for PCR was isolated using an InstaGene matrix (Bio-Rad) according to the manufacturer's instructions, except that more bacteria were used. The insert and junctions of each clone were isolated as a PCR product using the primers PMINT2 (ACGAGGGGCATTCACACCAGATTG) and LACR (TTCCCAGTCACGACGTTGTAAAA) with 2.5 U of Pfu Turbo (Stratagene); the cycle conditions were 94°C for 2 min and then 25 cycles of 94°C for 30 s, 58°C for 30 s, and 72°C for 1 min, followed by 72°C for 7 min. Nucleic acid sequences of these PCR products were then determined on an ABI Prism 377 DNA sequencer using the ABI Prism dRhodamine dye terminator cycle sequencing kit (PE Applied Biosystems).
Induction conditions. To induce DNA damage in M. smegmatis transformants, mitomycin C was added to one aliquot of a culture at an A600 of 0.4 to 0.5 to a final concentration of 0.2 µg ml⁻¹, and the culture was incubated for 5 h. An equal volume of the same culture was incubated for the same period of time without any addition, to provide an uninduced control. Following this, the bacteria were harvested, washed three times in Z buffer (17) without β-mercaptoethanol (Z*), and stored as a pellet at -20°C. For M. tuberculosis, cultures were induced at an A600 of
Preparation of cell extracts and β-galactosidase assays. Untreated and mitomycin C-treated bacteria were resuspended in 1 ml of Z* buffer and lysed in the presence of glass beads (150 to 212 µm; Sigma) using a Ribolyser (Hybaid) at a speed setting of 6.5 for 30 s. The supernatant was collected by centrifugation for 5 min at full speed in a microcentrifuge, and the centrifugation was repeated to ensure that no carryover of beads or cell debris occurred. An aliquot of the cell extract was used to determine its protein concentration using a bicinchoninic acid protein assay kit (Pierce). To the remaining extract, β-mercaptoethanol was added to a final concentration of 50 mM. These samples were then used to assay β-galactosidase activity as described previously (17) but with half-size reaction mixtures (500 µl) and reading of the absorbance of 300 µl of reaction mix in a flat-bottom microtiter plate at 405 nm. The specific activity in units per milligram of protein was calculated using the formula defined by Miller (17).
Computer searches. Searches of the whole M. tuberculosis H37Rv genome were performed using the facilities provided at the TubercuList website (http://genolist.pasteur.fr/TubercuList/).
RNA extraction. Commercially available kits were used for the isolation of total RNA (Hybaid Ribolyser Blue kit) from bacterial cultures (100 ml). Contaminating DNA in the RNA preparations was digested using RNase-free DNase (Roche), and the RNA was subsequently cleaned up using an RNeasy MiniKit (Qiagen). RNA concentrations were determined spectrophotometrically at 260 nm.
Microarray analysis. A whole-genome DNA microarray for M. tuberculosis consisting of PCR products designed to minimize cross-hybridization and covering 90% of the predicted open reading frames (ORFs) was kindly supplied by J. Hinds, J. A. Mangan, and P. D. Butcher.
Five micrograms of total RNA was used for first-strand cDNA synthesis using fluorescently labeled Cy3-dCTP and Cy5-dCTP (Amersham Pharmacia Biotech) in a standard reverse transcriptase reaction with Superscript II (Invitrogen). Briefly, total RNA with 6 µg of random primers (Invitrogen) in a reaction volume of 11 µl was denatured at 95°C for 5 min and snap-cooled on ice. Then, 5 µl of 5x first-strand buffer (Invitrogen), 2.5 µl of dithiothreitol (100 mM) (Invitrogen), 2.3 µl of deoxynucleoside triphosphate mix (5 mM concentrations of dATP, dGTP, and dTTP and 2 mM dCTP) (Amersham Pharmacia Biotech), 1.7 µl of Cy3- or Cy5-dCTP, and 2.5 µl of Superscript II were added. The labeling reaction mixtures were incubated at 25°C for 10 min and then at 42°C for 90 min. Microarray slides were incubated in a prehybridization buffer (3.5x SSC buffer [1x SSC is 0.15 M NaCl plus 0.015 M sodium citrate], 0.1% sodium dodecyl sulfate [SDS], and 10 mg of bovine serum albumin/ml) at 60°C for 20 min. After incubation, slides were washed in distilled water for 1 min and then in isopropanol for 1 min. Cy3- and Cy5-labeled RNA samples were combined and cleaned up using a MinElute column (Qiagen) and eluted in 13.5 µl of distilled water. After the addition of 4x SSC and 0.3% SDS, samples were denatured at 95°C for 2 min, cooled, and applied to the microarray slide, which was then covered with a glass coverslip. Each slide was placed in a waterproof hybridization chamber and submerged in a 60°C water bath overnight. After hybridization, slides were washed in 1x SSC plus 0.05% SDS for 2 min and then washed twice in 0.06x SSC for 2 min each. Slides were scanned using a GenePix Axon 4000A scanner (Axon Instruments) at dual wavelengths and set to 600 V. The image data were quantified using GenePix Pro 3.0 software, and bad spots were removed. The data were further analyzed using Genespring 4.0.3 (Silicon Genetics). The data were normalized by dividing each gene's measured intensity by its control channel value in each sample.
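The per-gene normalization described above amounts to a simple ratio of the two channels. The sketch below illustrates that one step only; it is not the GeneSpring implementation, and the function name, the gene labels, the intensities, and the decision to skip genes with a missing or non-positive control value are all assumptions made for the example.

```python
def normalize(measured, control):
    """Divide each gene's measured intensity by its control-channel intensity.

    Genes whose control value is missing or not positive are skipped
    (an assumption of this sketch, not a rule stated in the text).
    """
    ratios = {}
    for gene, signal in measured.items():
        reference = control.get(gene)
        if reference is None or reference <= 0:
            continue
        ratios[gene] = signal / reference
    return ratios


# Invented intensities, for illustration only.
measured = {"geneA": 1200.0, "geneB": 300.0, "geneC": 50.0}
control = {"geneA": 300.0, "geneB": 300.0, "geneC": 0.0}

print(normalize(measured, control))  # {'geneA': 4.0, 'geneB': 1.0}
```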
All of the double changes at both the ±2 and the ±7 positions eliminated DNA damage induction (data not shown), resulting in induction ratios of 0.7 to 0.9, similar to that seen with half of the original motif altered (Fig. 2b, half-site). A similar result was found for four of the six single changes at ±7, with only a very low residual degree of induction seen for the remaining two single changes at these positions (Fig. 2b). The induction ratios were 1.7 for T at +7 and 1.3 for A at -7, compared with an induction ratio of 5.0 for the wild-type sequence. Thus, the bases at the ±7 positions are indeed important in determining whether or not LexA will bind. Of the six single changes at ±2, two resulted in no DNA damage induction (G at +2 and C at -2), while four yielded levels of induction intermediate between that of the wild-type sequence and none (Fig. 2b). Thus, some bases at the ±2 positions are not compatible with LexA binding, while others permit LexA binding to a reduced extent.
Definition of the mycobacterial LexA binding site within the context of the recA SOS box. We then extended the study to examine the roles of individual bases throughout one-half of the symmetrical SOS box and further bases flanking it. M. tuberculosis LexA had previously been shown to bind only to this site within 390 bp upstream of recA by DNase I footprinting (19), where the protected region was clearly centered on the SOS box motif and extended to positions ±11 or ±12 in the numbering system used here. With a single exception, the analysis of the ±2 and ±7 positions showed that comparable results were obtained when the reciprocal change was made in each half of the motif (Fig. 2b), so for this more extensive study we confined changes to one-half of the SOS box. Single changes to each of the other nucleotides were made at each position from -1 to -9 (Fig. 3a).
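Because the SOS box is palindromic, a base placed at position +n in one half corresponds to the complementary base at position -n in the other half, which is why T at +7 pairs with A at -7 and G at +2 pairs with C at -2 above. A minimal sketch of that mapping follows; the function name is hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reciprocal_change(base, position):
    """Map a base at a signed position to the equivalent base in the other half-site."""
    if position == 0:
        raise ValueError("the +/- numbering has no position 0")
    return COMPLEMENT[base.upper()], -position


print(reciprocal_change("T", 7))   # ('A', -7)
print(reciprocal_change("G", 2))   # ('C', -2)
```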
Mismatches from the consensus which are functional. The analysis of the recA SOS box had indicated that no substitutions of the G at position -6 were compatible with LexA binding. In view of the palindromic nature of the SOS box, this would imply that only a C at position +6 would be functional. However, the SOS box upstream of the lexA gene has a T at this position (+6) and has previously been shown to bind LexA in vitro (18) and more recently to confer DNA damage induction on LexA (unpublished results). These apparently contradictory observations can be resolved by examining the bases at ±8, which were previously not suspected of being important in determining LexA binding. When these bases are included in the analysis, we can see that the lexA SOS box has the optimum sequence at each position apart from that at +6 (T rather than C), yielding a functional site. However, the recA SOS box already has a mismatch from the optimum LexA binding sequence at +8 (G rather than A), in the presence of which a further mismatch at ±6 is not tolerated. To confirm this hypothesis, we changed the base at position +8 in the recA SOS box to A, both in the wild-type sequence and in the sequence already containing an A at -6 (equivalent to the T at +6 in the lexA SOS box). We expected that in the presence of A at +8 LexA would be able to bind to the site containing A at -6, resulting in DNA damage-inducible expression. Furthermore, we would predict that the perfect palindrome created by the introduction of A at +8 in the wild-type sequence would yield a site with a higher affinity for LexA, which would be reflected in a lower expression level in the uninduced state and likely a higher induction ratio. Both of these predictions were fulfilled (Fig. 4). With the perfect palindrome (A at +8), the uninduced expression was reduced to 20 U of β-galactosidase/mg of protein, compared with ca. 80 for the wild-type recA SOS box, and the induction ratio increased from 5 to 14. In addition, the construct containing A at +8 and A at -6 was DNA damage inducible; although the induction seen was only 2.2-fold, this is to be compared with 0.9-fold for A at -6 in the native recA SOS box.
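The induction ratio used throughout is taken here to be the specific activity of the mitomycin C-treated sample divided by that of the untreated control. The sketch below works through the figures quoted above under that assumption. The induced activities it prints are back-calculated from the reported uninduced values and ratios; they are not reported in the text, and the function names are hypothetical.

```python
def induction_ratio(induced, uninduced):
    """Ratio of induced to uninduced specific activity (U/mg of protein)."""
    if uninduced <= 0:
        raise ValueError("uninduced activity must be positive")
    return induced / uninduced


def implied_induced_activity(uninduced, ratio):
    """Back-calculate the induced activity implied by a reported ratio."""
    return uninduced * ratio


# Values quoted in the text: ca. 80 U/mg and 5-fold for the wild-type recA
# SOS box; 20 U/mg and 14-fold for the perfect palindrome (A at +8).
wild_type = implied_induced_activity(80, 5)     # 400 U/mg (implied, not reported)
palindrome = implied_induced_activity(20, 14)   # 280 U/mg (implied, not reported)

print(wild_type, palindrome)            # 400 280
print(induction_ratio(palindrome, 20))  # 14.0
```

On these figures the higher induction ratio of the perfect palindrome comes from the lower uninduced baseline rather than from a higher induced level, which fits the interpretation of tighter repression by LexA.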
Many more changes from the optimal sequence were tolerated compared with the previous analysis insofar as expression remained DNA damage inducible, although to a reduced extent (Fig. 5b). The greatest degrees of induction were found with the changes to G or T at -2, although these changes had remained inducible even in the context of the recA SOS box. Although expression was also inducible with the remaining base change at this position (C at -2), this mutation had a much stronger effect on the induction ratio, reducing it to only twofold. The change to C at -8, which is equivalent to the mismatch found in the native recA SOS box, resulted in ca. fivefold induction, which is also what was found above with the wild-type sequence. Two other changes also yielded ca. fivefold induction and might, therefore, be expected to occur in other native SOS boxes; these were A at -7 and C at -5. A number of changes permitted an induction of two- to threefold; in addition to A at -6 and C at -2, discussed above, these changes were A or G at -8, T at -5, and A or T at -3. As the lexA SOS box contains a mismatch from the optimal sequence equivalent to A at -6, it is likely that any of these differences might be found singly in SOS boxes for other genes. It is difficult to know whether the 1.6-fold induction seen with T at -4 and G at -7 is significant in terms of the response of the bacterium to DNA damage.
Previously, searching the genome sequence (6) of M. tuberculosis with the originally defined motif (GAACNNNNGTTC) identified 35 ORFs preceded by a motif which was a perfect match within 500 bp of the upstream sequence (3), which corresponded to 28 distinct sites. The study described here shows that the mycobacterial SOS box is actually more extensive than this, with the bases at positions ±7 and ±8 playing important roles in determining LexA binding. We therefore repeated this search of the M. tuberculosis genome with the optimum SOS box sequence defined above and with the constraint that the motif should be within 500 bp upstream of a start codon. The number of such ORFs was now reduced to 4 with no mismatches, corresponding to 3 sites, and 21 ORFs with a single mismatch, corresponding to 16 distinct sites, a much more plausible number. We then used the newly defined consensus sequence for the mycobacterial SOS box to search the M. tuberculosis genome sequence with no constraints on the location of the motif. This search identified 4 sites with no mismatches and 34 sites with a single mismatch. Of these 34 sites, 14 would not be expected to bind LexA at all, owing to the base which differs from the consensus. A further four sites may not be functional, as the induction ratio was only 1.6 in the test sequence containing those changes. In addition, six sites, including two of these four sites, have a C at -2 (or a G at +2), a substitution which was not tolerated in the presence of the native mismatch in the recA SOS box; thus, these sites may not be functional for LexA binding. Excluding these leaves 12 single-mismatch sites (34 - 14 - 4 - 4, counting the two overlapping sites only once). Therefore, we would predict that there are a total of 16 functional LexA binding sites with zero or one mismatch from the newly defined consensus in the M. tuberculosis genome (Table 2).
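A search allowing zero or one mismatch, as described above, can be sketched as a sliding-window comparison. The example below is illustrative only and is not the TubercuList search facility. It uses the originally defined motif GAACNNNNGTTC because that is the only mycobacterial motif spelled out in this text; the newly defined optimum sequence would be passed in its place. The function names, the toy sequence, and the forward-strand-only upstream filter are assumptions.

```python
def count_mismatches(window, motif):
    """Count positions where window differs from motif; 'N' in the motif matches any base."""
    return sum(1 for w, m in zip(window, motif) if m != "N" and w != m)


def scan(seq, motif, max_mismatches=1):
    """Return (start, mismatches) for every window within the mismatch budget."""
    seq, motif = seq.upper(), motif.upper()
    hits = []
    for start in range(len(seq) - len(motif) + 1):
        mismatches = count_mismatches(seq[start:start + len(motif)], motif)
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits


def upstream_hits(hits, motif_len, start_codon_pos, window=500):
    """Keep hits lying wholly within `window` bp upstream of a start codon.

    Forward strand only; coordinates are 0-based (assumptions of this sketch).
    """
    return [
        (start, mm) for start, mm in hits
        if start_codon_pos - window <= start and start + motif_len <= start_codon_pos
    ]


MOTIF = "GAACNNNNGTTC"                   # originally defined motif from the text
toy = "AAGAACTTTTGTTCAAGAACGGGGGTTAAA"   # invented sequence, for illustration only

hits = scan(toy, MOTIF, max_mismatches=1)
print(hits)                                                  # [(2, 0), (16, 1)]
print(upstream_hits(hits, len(MOTIF), start_codon_pos=14))   # [(2, 0)]
```

Tightening the motif (for example, by specifying the ±7 and ±8 positions) reduces the number of windows that fall within the mismatch budget, which is the effect reported above when the extended SOS box replaced the short motif.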
Are the genes with predicted SOS boxes DNA damage inducible in M. tuberculosis? As part of a separate study we are seeking to identify DNA damage-inducible genes of M. tuberculosis empirically by using microarray analysis. Therefore, we examined the microarray data specifically for the genes listed in Table 2 as potentially being regulated by LexA binding to the identified SOS boxes to see which ones were DNA damage inducible. For many of the SOS boxes there were two potentially regulated genes, one on each strand, although in most cases we would only expect one of each pair to be regulated by a particular site. As alluded to above, only one of the four genes preceded by an SOS box with no mismatches (Rv3370c or dnaE2) was induced following mitomycin C treatment of M. tuberculosis. However, all three of the genes where the perfect SOS box overlapped or was contained within the coding sequence were induced (Table 2). Thus, it would appear that all four LexA binding sites with no mismatches from the consensus are involved in regulating gene expression following DNA damage. A gene which was induced following DNA damage was found for 10 of the 12 SOS boxes with a mismatch from the consensus which were expected to bind LexA. In addition, an ORF, Rv0071, which might be inducible but for which the data were not clear, was appropriately positioned by one of the remaining two sites. In contrast, only one gene, Rv2100, located near the sites thought unlikely to bind LexA, was DNA damage inducible; this gene may not be regulated by that site, as another site is also situated within its coding sequence. Those sites for which no potentially regulated ORF is given in Table 2 were positioned further into an ORF, which was in any case not DNA damage inducible (data not shown).
The majority of the motifs found in the search for which the mismatch was predicted to be incompatible with LexA binding were located far from ORFs, with only one being within 300 bp of a predicted gene and three within 400 bp. None of these genes exhibited DNA damage-inducible expression (data not shown). Thus, it would appear that the newly defined mycobacterial SOS box has good predictive power in terms of identifying LexA binding sites in the M. tuberculosis genome. In one case, two divergently transcribed genes (Rv2578c and Rv2579 or linB) are apparently both regulated by a single SOS box located between them. Although this also appears to be the case for Rv2719c and Rv2720 or lexA, we have shown in a separate study (unpublished data) that Rv2719c is not regulated by this SOS box. Thus, the locations of the SOS boxes which look to be functional range between 123 bp upstream of the translational start site and 240 bp into the coding sequence, although it should be emphasized that these coding regions are predicted ones and some of these ORFs may actually have alternative start sites. The induction ratio for recA observed in the microarray analysis is somewhat higher than that found with the lacZ reporter fusion to the wild-type sequence upstream of recA which was used in the analysis of the SOS box. This is likely to be due to the combined effects of a number of factors. Firstly, the reporter construct used contains only one of the two recA promoters, whereas both are involved in expressing recA from the genome in the global analysis. Secondly, the reporter construct was assayed in M. smegmatis, whereas the microarray measured expression in M. tuberculosis, and there may be a difference in the degree of induction between the two species. Thirdly, the uninduced control for the β-galactosidase assays was incubated in parallel with the induced sample, while for the microarray analysis it consisted of a zero time point. Finally, and linked to the above, in the microarray analysis we were measuring RNA rather than the more stable β-galactosidase protein, accumulation of which from low-level expression during the uninduced incubation would reduce the apparent induction ratio.
For the reasons stated above in the Results section, we considered it likely that the same motif would be recognized in M. tuberculosis as in M. smegmatis. Therefore, we went ahead with searching the M. tuberculosis genome sequence with the new consensus and identified 16 sites which we expected to be functional for LexA binding. The subsequent discovery in M. tuberculosis of DNA damage-inducible genes associated with all but 2 of these 16 sites supports this hypothesis. Furthermore, of eight sites identified with single mismatches predicted to be incompatible with LexA binding, only one had a gene nearby which was induced by DNA damage, and that gene was also associated with another site which was expected to bind LexA. Taken together, these results suggest that the SOS box defined in M. smegmatis also applies to M. tuberculosis. In some cases, there were ORFs in both orientations on the chromosome within a reasonable distance of the SOS box, either (or rarely, both) of which might be regulated, while in a few cases there was no identified ORF in a suitable location. In these latter cases it might be either that the identified SOS box is not performing a regulatory function or that there could be an as-yet-unidentified ORF nearby. Small ORFs in particular are difficult to predict in genome annotation projects, and some small M. tuberculosis proteins have been identified recently which do not have annotated ORFs in the genome sequence (16). By examining the expression before and after DNA damage of the alternative ORFs near SOS boxes, 15 genes were identified whose expression is most likely regulated by LexA. For eight of these genes, the SOS box was located upstream of the ORF, as expected by analogy with LexA regulation in E. coli (26). Surprisingly, however, four of the SOS boxes overlapped the predicted translational start site and a further three (or possibly four, depending on whether both SOS boxes within Rv2100 are functional) SOS boxes were located within the coding sequence. This positioning would imply that in M. tuberculosis LexA can repress expression by interfering with transcription after it has begun as well as by affecting transcription initiation, as is the case in E. coli (26). However, it is possible that translation may actually begin further downstream than these predicted start sites and, thus, all the SOS boxes could in fact be upstream of the genes they regulate.
Interestingly, only 5 of the 15 genes predicted to be LexA regulated have known functions, with the other 10 genes representing novel DNA damage-inducible LexA-regulated genes. Of the five named genes, three (recA, lexA, and ruvC) were already known to be DNA damage inducible in M. tuberculosis (3, 7, 19). The remaining two are dnaE2, which encodes the
At first sight, it may seem surprising that other known DNA repair genes which are part of the SOS regulon in E. coli were not identified in this analysis. However, in a previous study where we specifically examined a number of such genes in M. tuberculosis for DNA damage induction and LexA binding (3), we found that some of the M. tuberculosis homologs were not induced following exposure to mitomycin C (recN, dinP, dinG), while some others did exhibit DNA damage-inducible expression but no binding of LexA was detectable to DNA from 500 bp upstream of the coding region to 100 bp within it (uvrA, ssb). Thus, it appears likely that there is an alternative mechanism of DNA damage induction in M. tuberculosis in addition to that controlled by LexA. This study has identified a set of 15 DNA damage-inducible genes which are most likely regulated by LexA in M. tuberculosis. A similar approach used for E. coli brought the total number of LexA-regulated genes in that species to 31 (8). We do not expect the list of genes identified here to be exhaustive, as we have considered only sites with up to one mismatch from the optimal sequence. There could be some functional SOS boxes with more than one mismatch. Indeed, Rv2719c appears to be regulated by multiple SOS boxes each containing two or more mismatches (unpublished data). Our search of the M. tuberculosis genome revealed the presence of 351 sites containing two mismatches. Elimination of those sites where one of the mismatches alone would prevent LexA binding leaves 96 potential sites, of which we would expect only a small number to be functional. However, identifying which of these sites might be functional is not straightforward with the information currently available, as we cannot tell which combinations of individually tolerated mismatches would still permit LexA binding and which would not. Our initial analysis using the native recA SOS box suggests that very few pairs of mismatches may result in a functional site.