Journal of Bacteriology, August 2003, p. 4997-5002, Vol. 185, No. 16
Identification of Long Intergenic Repeat Sequences Associated with DNA Methylation Sites in Caulobacter crescentus and Other α-Proteobacteria
| ABSTRACT |
|---|
A systematic search for motifs associated with CcrM DNA methylation sites revealed four long (>100-bp) motifs (CIR sequences) present in up to 21 copies in Caulobacter crescentus. The CIR1 and CIR2 motifs exhibit a conserved inverted repeat organization, with a CcrM site in the center of one of the repeats.
| TEXT |
|---|
Methylation of DNA performs key functions in eukaryotic and prokaryotic cells. Bacterial adenine DNA methylation usually occurs in restriction-modification systems, which differentiate between self and non-self DNA (26). Two prominent bacterial methyltransferases, however, are not part of restriction-modification systems: Dam in Escherichia coli and other γ-proteobacteria (4, 6) and CcrM in Caulobacter crescentus and other α-proteobacteria (22). Dam and CcrM regulate gene transcription and the timing of DNA replication initiation and can be important for virulence (7, 8, 20).
Dam is not essential in E. coli. Regulation of transcription by Dam methylation in E. coli requires sequences in addition to the GATC methylation site. Two well-studied examples are phase variation in the pyelonephritis-associated pili (pap) operon (9, 25) and the outer membrane protein antigen 43 promoter (9). In both cases, regulation depends on specific Dam methylation sites, which are distinguished by their surrounding sequence.
In contrast, CcrM is an essential gene in the α-proteobacteria C. crescentus (22), Brucella abortus (20), Sinorhizobium meliloti, and Agrobacterium tumefaciens (12). DNA methylation in C. crescentus regulates transcription in the promoter for ccrM itself (23) and the P1 promoter of ctrA, a global transcriptional regulator (19). Therefore, we sought to determine whether the CcrM recognition site, GANTC, is associated with conserved motifs. We identified four large (>100-bp) intergenic motifs in C. crescentus that contain conserved CcrM sites. Two of these motifs and several other motifs in other α-proteobacteria share three features: (i) they are composed of two inverted repeats; (ii) a CcrM site is in the center of one of the inverted repeats; and (iii) a conserved central linker joins the two inverted repeats. These novel motifs in α-proteobacteria may mediate regulatory functions of CcrM.
Genome sequences were downloaded from GenBank (ftp://ftp.ncbi.nih.gov/GenBank/genomes/Bacteria) and processed with the Genome-Tools package (http://genome-tools.sourceforge.net) (13). Sequence alignments were done with the CLUSTALW 1.82 software program (24) and BLAST (1). Consensus RNA secondary structures were predicted by using ConStruct 2.0 (14, 15), which uses the RNAfold 1.4 algorithm (10, 16, 28). Default settings were used for CLUSTALW and ConStruct.
We examined 15 bp of sequence centered on each CcrM site (5 bp upstream and downstream of each GANTC) in C. crescentus. Excluding those which were associated with known transposases or insertion elements (17), four 15-mers occurred more than four times in intergenic sequences (Table 1; also shown are results for other α-proteobacteria). Sequence conservation around each of these 15-mers extended to over 100 bp (for alignments, see supplementary materials at http://caulobacter.stanford.edu/CIR). Using BLASTN to identify matches to each long motif, we found that only one or two matches do not contain CcrM sites. These long conserved motifs are therefore called Caulobacter CcrM-associated intergenic repeat 1 (CIR1) to CIR4. Two of these motifs, Caulobacter CIR1 and CIR2 (present in 21 and 16 copies, respectively) (Fig. 1A and B), appear to be conserved in other bacteria; only these two motifs in C. crescentus and related motifs in other α-proteobacteria are discussed below.
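The 15-mer search described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the Genome-Tools code used in the study: the function names `site_windows` and `repeated_windows` are hypothetical, windows are counted on the given strand only, and the filtering of transposase- and insertion-element-associated hits is not reproduced.

```python
import re
from collections import Counter


def site_windows(seq, site_regex="GA[ACGT]TC", flank=5):
    """Yield every window of flank + site + flank bases around a site match."""
    seq = seq.upper()
    # The lookahead lets overlapping site matches all be reported.
    for m in re.finditer(f"(?=({site_regex}))", seq):
        start = m.start(1) - flank
        end = m.end(1) + flank
        # Skip sites too close to a sequence end to have full flanks.
        if start >= 0 and end <= len(seq):
            yield seq[start:end]


def repeated_windows(intergenic_seqs, min_count=5, **kwargs):
    """Return windows seen at least min_count times across all sequences.

    min_count=5 mirrors "more than four times" in the text.
    """
    counts = Counter()
    for s in intergenic_seqs:
        counts.update(site_windows(s, **kwargs))
    return {w: n for w, n in counts.items() if n >= min_count}
```

With the defaults, each window is 5 + 5 + 5 = 15 bp. Passing `site_regex="GATC"` with the same flank would give 14-bp windows, since the Dam site is 4 bp long.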
The local gene organization around each Caulobacter CIR1 and CIR2 sequence is shown in Table 2. CIR1 and CIR2 often lie a short distance downstream of flanking open reading frames (ORFs) (Fig. 2); the stop codon is often the beginning of the CIR1 or CIR2 consensus sequence (a distance of 1 in Table 2). In these cases, CIR1 and CIR2 have not truncated the flanking ORFs (based on BLASTP [1] comparisons to the GenBank [5] nonredundant database, only 1 of 26 ORFs whose stop codon is supplied by a CIR1 or CIR2 sequence is truncated). The identities of the flanking ORFs suggest no function for the CIR sequences.
Because Caulobacter CIR1 and CIR2 are close to flanking genes, we expect them to be at least partially transcribed. Both motifs are composed of two 52-bp inverted repeat sequences (arms) separated by a 12-bp linker and thus, if transcribed, are predicted to form two long stem-loops joined by the linker (Fig. 3A). Of 38 differences between CIR1 and CIR2, 20 are compensatory changes preserving potential base pairing. The linkers are conserved and nonpalindromic, allowing CIR1 and CIR2 to be oriented. The CcrM site is in the middle of one of the arms (blue circles in Fig. 3A). The presence of exactly one CcrM site seems important: only 2 of 37 CIR1 and CIR2 sequences have CcrM sites in both arms. Additionally, the arms (each individually an inverted repeat) are nearly inverted repeats of each other, but one arm contains a single difference which destroys what would otherwise be a complementary GANTC site.
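The arm-linker-arm layout and the "one CcrM site" observation can be made concrete with a small sketch. The helper names (`revcomp`, `ccrm_sites`, `split_motif`, `arms_with_site`) are hypothetical, and the sketch assumes a motif of exactly two equal arms plus a linker (52 + 12 + 52 = 116 bp by default), which real copies with insertions or deletions would not satisfy.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def ccrm_sites(seq):
    """Start positions of every GANTC site in seq."""
    return [m.start() for m in re.finditer(r"(?=GA[ACGT]TC)", seq.upper())]


def split_motif(motif, arm_len=52, linker_len=12):
    """Split an arm-linker-arm motif into (left_arm, linker, right_arm)."""
    if len(motif) != 2 * arm_len + linker_len:
        raise ValueError("motif length does not match arm/linker lengths")
    left = motif[:arm_len]
    linker = motif[arm_len:arm_len + linker_len]
    right = motif[arm_len + linker_len:]
    return left, linker, right


def arms_with_site(motif, **kwargs):
    """Count how many of the two arms contain at least one GANTC site."""
    left, _, right = split_motif(motif, **kwargs)
    return sum(1 for arm in (left, right) if ccrm_sites(arm))
```

Because the reverse complement of any GANTC site is itself a GANTC site (for example, `revcomp("GACTC")` is `"GAGTC"`), two arms that were perfect inverted repeats of each other would both carry a site. A single mismatch inside the site in one arm is enough to leave exactly one.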
Two 110-bp motifs in Brucella melitensis (Brucella CIR1 and CIR2, present in 39 and 35 copies, respectively) are strikingly similar to the Caulobacter CIR1 and CIR2 motifs (Fig. 1C and D). The Brucella CIR1 and CIR2 motifs (i) are composed of two inverted repeat arms joined by a central linker (Fig. 3B), (ii) have a CcrM site in the center of one of the inverted repeats, (iii) have a conserved central linker, and (iv) sometimes provide stop codons for flanking ORFs (data not shown). The Brucella CIR1 motif is also often downstream of flanking ORFs (Fig. 2). These ORFs are not related to the flanking ORFs in Caulobacter (finding the best BLAST hit in C. crescentus of the 137 ORFs flanking Brucella CIR1 and CIR2 sequences results in only two ORFs flanking Caulobacter CIR1 or CIR2 sequences; by random chance, one would expect to find three). Thus, flanking ORFs again provide no suggestions for CIR functions.
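The chance expectation quoted above can be framed as a simple proportion. As a sketch, assume (this is not stated in the text) that each best hit lands on a uniformly random C. crescentus ORF. Let n be the number of Brucella query ORFs, k the number of Caulobacter ORFs that flank CIR1 or CIR2, and N the total number of Caulobacter ORFs; k and N are not given here, so only their ratio can be inferred:

```latex
E[\text{hits}] = n \cdot \frac{k}{N},
\qquad
n = 137,\quad E[\text{hits}] \approx 3
\;\Longrightarrow\;
\frac{k}{N} \approx \frac{3}{137} \approx 0.022
```

Under this assumption, observing two such hits is in line with what chance alone would produce.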
Potentially related CIR motifs in other α-proteobacteria are diagrammed in Fig. 4 (for full sequences and alignments, see supplementary materials). The Mesorhizobium CIR1 motif is shorter than those in Caulobacter and Brucella, and the central linker is different. However, it is also composed of two inverted repeats (arms) with a conserved CcrM site in the center of one arm. The Sinorhizobium CIR1 is composed of two inverted repeats, but the conserved CcrM site is within the central linker, whose sequence differs from the Caulobacter and Brucella linkers. However, two motifs previously identified in S. meliloti, RIME1 and RIME2 (for Rhizobium-specific intergenic mosaic elements 1 and 2) (18), also have two inverted repeat arms joined by a central linker. The linker sequence in RIME1 is similar to the Caulobacter and Brucella CIR1 and CIR2 linker, but RIME1 has no conserved CcrM site in its arms. The lack of conserved CcrM sites in RIME1 and RIME2 explains why these sequences were not found by our searches. We found only a previously identified 440-bp motif associated with CcrM sites in Rickettsia prowazekii, with no resemblance to other CIR sequences. Notably, R. prowazekii lacks a CcrM homolog.
A similar search in E. coli for Dam-associated motifs yielded only three 14-mers (the Dam recognition site is 4 bp instead of 5 bp). These were associated with the IS5 transposase, the 23S rRNA gene cluster, and an Rhs element (for "rearrangement hot spot," a large, protein-coding repeat element) (27). Accordingly, no previously identified repeated intergenic sequence in E. coli K-12 is associated with Dam sites. The Caulobacter and Brucella CIR1 and CIR2 motifs resemble IRU/ERIC sequences in E. coli (11, 21). IRU/ERIC sequences are about 120 bp long, highly conserved, palindromic, and present in similar numbers. IRU/ERIC sequences were also found by sequence analysis; they are transcribed and have detectable transcriptional termination activity. However, gene regulation is probably not their primary role because this does not explain their extensive conservation (2, 11, 21). By a similar argument, then, gene regulation is likely not the primary function of the CIR sequences.
The IRU/ERIC sequences differ from CIR1 and CIR2 in important ways, however. IRU/ERIC sequences have no consensus methylation sites, appear usually between genes in an operon (Fig. 2), and have a single conserved stem-loop in their predicted RNA secondary structure (11, 21). No other previously identified repeated intergenic sequences outside of α-proteobacteria are analogous to the Caulobacter and Brucella CIR1 and CIR2 motifs; these CIR motifs are thus a new class of repeated intergenic sequences.
As with repeated intergenic sequences in other bacteria, the function of the CIR motifs is unknown. The association with methylation sites is novel, suggesting that understanding them may shed light on the functions of CcrM methylation. Their predilection for the ends of genes suggests involvement in gene regulation, but they are not similar to known transcriptional terminators, and this would not explain their conservation. Their high conservation suggests a maintenance process, such as gene conversion (as has been postulated for the IRU/ERIC sequences). The GC content of the Caulobacter CIR1 and CIR2 sequences is 44.8% ± 6.3% (all other intergenic sequences are 64.8% ± 11.5%), which suggests a foreign origin. However, they are not similar to known transposases or insertion elements. Furthermore, these sequences may be modular, since there is one hybrid Caulobacter CIR1/CIR2 sequence (arrows in Fig. 1A; Fig. 4), and several other CIR sequences seem to have variants based on different arm sequences (see supplementary materials). Since repeated sequences seem to be found ubiquitously in intergenic sequences in all organisms (3), further characterization of CIR motifs and other intergenic sequences, both upstream and downstream of genes, is essential for understanding genome function and evolution.
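A GC-content comparison of the kind quoted above could be computed along the following lines. This is a minimal sketch: the function names are hypothetical, and treating the "±" values as sample standard deviations across sequences is an assumption, since the text does not say which spread statistic was used.

```python
from statistics import mean, stdev


def gc_percent(seq):
    """Percent G+C among the unambiguous bases (A, C, G, T) of seq."""
    seq = seq.upper()
    total = sum(seq.count(base) for base in "ACGT")
    if total == 0:
        raise ValueError("sequence contains no A, C, G, or T")
    return 100.0 * (seq.count("G") + seq.count("C")) / total


def gc_summary(seqs):
    """Return (mean, sample standard deviation) of per-sequence GC percent.

    Needs at least two sequences; statistics.stdev raises otherwise.
    """
    values = [gc_percent(s) for s in seqs]
    return mean(values), stdev(values)
```

Calling `gc_summary` once on the CIR1 and CIR2 copies and once on the remaining intergenic sequences would give the two mean and spread pairs to compare.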
| ACKNOWLEDGMENTS |
|---|
This work was supported by National Institutes of Health grant GM51426 and NIH grant 2T32GM07365 to the Medical Scientist Training Program (S.L.C.).
| FOOTNOTES |
|---|
* Corresponding author. Mailing address: Department of Developmental Biology, Stanford University School of Medicine, Beckman Center, B300, 279 Campus Dr., Stanford, CA 94304-5329. Phone: (650) 725-7678. Fax: (650) 725-7739. E-mail: shapiro{at}cmgm.stanford.edu.