Journal of Bacteriology, November 2002, p. 6163-6173, Vol. 184, No. 22
Genome Analysis and Strain Comparison of Correia Repeats and Correia Repeat-Enclosed Elements in Pathogenic Neisseria
Shi V. Liu,1, Department of Microbiology and Immunology, Drexel University College of Medicine, Philadelphia, Pennsylvania,1 Sir William Dunn School of Pathology,2 Institute of Molecular Medicine, University of Oxford, Oxford, United Kingdom3
Received 10 July 2002/Accepted 19 August 2002
Correia et al. identified a 26-bp sequence as a repetitive element in the pathogenic Neisseria spp. (8). Subsequent studies showed that Correia repeats (CR) often constitute parts of longer repetitive sequence elements (9, 29). By using two-dimensional S1 nuclease heteroduplex mapping, Correia et al. estimated that in Neisseria gonorrhoeae there are ca. 20 copies of 152-bp elements whose ends are composed of inverted repeats of the 26-bp CR sequence (9). Gotschlich et al. identified a 105-bp sequence element that also contains the CR sequences as terminal inverted repeats, and these authors estimated that the 105-bp element is present at least 20 times in the genome of N. gonorrhoeae R10 (23). The 154-bp Correia element(s) (CE) was considered to be a transcriptional terminator in the division cell wall (dcw) cluster of N. gonorrhoeae CH811, and a search of the sequence databases revealed another 19 copies of similar elements adjacent to different neisserial genes (14). In genome sequences of Neisseria meningitidis, 163 copies of the 26-bp inverted repeat were found in N. meningitidis MC58 (49) and a total of 286 CE (sequences bounded by 26-bp inverted repeats) were found in N. meningitidis Z2491 (41). More recently, Mazzone et al. reported a total of 270, 259, and 110 copies of nemis (Neisseria miniature insertion sequences) in whole genomes of N. meningitidis Z2491 and MC58 and N. gonorrhoeae FA1090, respectively (37).

In the present study, we first analyzed the sequence conservation in all CR found in three completed neisserial genomes: N. meningitidis Z2491 (41), N. meningitidis MC58 (49), and N. gonorrhoeae FA1090 (http://www.genome.ou.edu/gono.html). We then used the most conserved region of the CR to identify those sequence elements that were enclosed by an inverted pair of the CR in these three complete genome sequences.
We analyzed the detailed sequence features of CR-enclosed elements (CREE) and determined DNA sequences upstream of the sialyltransferase gene, lst, in several Neisseria strains to identify the size of indels in the loci of 107-bp CREE. The common sequence features identified among all CREE and the potential mechanisms for the formation of CREE may assist in an understanding of their origin, propagation, and function within the genome. The distinction between N. meningitidis and N. gonorrhoeae may supplement our understanding of the differential pathogenesis of these two clearly related but distinct pathogens.
PCR amplification of lst upstream regions (5'-lst). To obtain template DNA for amplification of 5'-lst, several colonies of Neisseria were suspended in 20 µl of sterile H2O, boiled for 10 min, chilled on ice for 2 min, and then incubated with RNase A (final concentration of 16 µg/ml) at 37°C for 30 min. These suspensions were centrifuged at 8,000 × g for 5 min, and supernatants were collected and placed on ice for immediate use or stored at -20°C for later use. To amplify the 5'-lst, we used primers synthesized according to sequence information derived from N. meningitidis MC58 (accession no. U60660) (18). The forward primer, FuseBam-For1 (5'-CGCTGGATCCGACATCAATATCGG), starts 124 bp from the initiation codon of the putative isocitrate dehydrogenase gene icd on its minus strand and contains a BamHI restriction site (GGATCC) near its 5' end. The reverse primer, FuseBam-Rev1 (5'-CAAAGGATCCTTTTTCAAGCCC), starts 24 bp from the initiation codon of lst on its minus strand and also contains a BamHI site (GGATCC). PCR was performed in thin-wall glass capillary tubes by using an Air Thermo-Cycler (Idaho Technology Model 1605; Idaho Falls, Idaho) with a two-step program. Step one consisted of five cycles of 94°C for 0 s, 40°C for 0 s, and 72°C for 15 s. Step two consisted of 30 cycles of 94°C for 0 s, 55°C for 0 s, and 72°C for 15 s.

DNA sequencing. DNA fragments amplified by PCR were cut from agarose gels after electrophoresis and purified by using a Wizard PCR Preps DNA purification system (Promega, Madison, Wis.). Nucleotide sequences were determined with the terminal primers described above by cycle sequencing with fluorescently labeled dideoxynucleotides (DyeDeoxy terminators; Perkin-Elmer) by using automated DNA sequencing facilities at the MCP Hahnemann School of Medicine.

Genome sequences. The genome sequence of N. meningitidis MC58 (49) was obtained from The Institute of Genomic Research (ftp://ftp.tigr.org/pub/data/n_meningitidis/). The genome sequence of N. meningitidis Z2491 (41) was obtained from the Sanger Center (http://www.sanger.ac.uk/Projects/N_meningitidis/). The genome sequence of N. gonorrhoeae FA1090 was from the University of Oklahoma (http://www.genome.ou.edu/gono.html). The last search for CR and CREE in N. gonorrhoeae FA1090 was based on the sequence released on 15 September 2000, which contains 2,154,110 sequence characters.

BLAST analysis of DNA sequences. BLAST searches were performed by using BLASTN (version 1.4.7) in the version 9.1 Wisconsin package (Genetics Computer Group). The ungapped BLASTN program was used to search for homologues of the prototypic 26-bp CR (5'-GTACCGGTTTTTGTTAATTCACTATA) and of a 105-bp sequence element first identified in the DNA sequence upstream of lst in N. meningitidis MC58. The GCG output information was parsed into a Microsoft Access database by using a custom-written program. The copy number, length, and genomic location of the archived CR and CREE sequences were analyzed by using the filtering, sorting, and querying functions of the Microsoft Access program. For sequence comparison, we converted all sequence elements to the same strand orientation. Consensus sequences for CRs were found by using GCG's PILEUP and LINEUP programs. Consensus sequences for 105-bp sequence elements and other CREE were obtained by using the CLUSTALW multiple sequence alignment tool (http://dot.imgen.bcm.tmc.edu:9331/multi-align/multi-align.html) and the BoxShade program (http://www.ch.embnet.org/software/BOX_form.html). For analysis of the flanking sequences of the 105-bp elements, a custom-written computer program was used to retrieve each 105-bp sequence and its 1-kb flanking region on each side. These retrieved 2,105-bp sequences were used as queries for BLASTN searches against the GenBank "nr" (nonredundant) database by using NCBI's BLAST 2.0 website (http://www.ncbi.nlm.nih.gov/BLAST/) with the default parameters.
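The flank-retrieval step described above (a 105-bp element plus 1 kb on each side, giving a 2,105-bp query) can be illustrated with a minimal Python sketch. This is not the custom program used in the study, which is not described in detail; the function name `extract_with_flanks`, the 0-based coordinates, and the toy sequence are assumptions made only for illustration.

```python
def extract_with_flanks(genome: str, start: int,
                        element_len: int = 105, flank: int = 1000) -> str:
    """Return the element at 0-based `start` plus `flank` bases on each side.

    The genome is treated as a linear string, so flanks are truncated at the
    sequence ends (an assumption of this sketch).
    """
    end = start + element_len
    if start < 0 or end > len(genome):
        raise ValueError("element lies outside the sequence")
    return genome[max(0, start - flank):min(len(genome), end + flank)]


# Toy sequence (hypothetical): a 105-bp block of C between two 1.5-kb runs.
toy_genome = "A" * 1500 + "C" * 105 + "G" * 1500
query = extract_with_flanks(toy_genome, 1500)
print(len(query))  # 2105
```

With the default arguments the query length is 1,000 + 105 + 1,000 = 2,105 bp, matching the query size stated in the text.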
FINDPATTERNS analysis of DNA sequences. Single completed genomic sequences were divided into smaller segments of ca. 300 kb to comply with the length limitation of the GCG programs. These segments were then converted into a GCG-utilizable format by using the FROMFASTA and the data set programs in the GCG package. To avoid potential failure in detecting sequences at the ends of the divided segments, the complete genome sequences were divided into both eight and seven segments, and the numbers of findings were examined for agreement. The presentation of repeats in the genomes is based on the original sequence coordinates indicated in the complete linear sequences. Searching for CREE was performed by using the FINDPATTERNS program of the GCG package. The search pattern used was TATAGTGGATT(N)[0, 328]AATCCACTATA, where "(N)[0, 328]" means any sequence with a length between 0 and 328 bp. This maximum limit of 328 bp in the core region is due to the total length limit of 350 bp in the FINDPATTERNS program (11 bp at each end plus up to 328 bp of core). During the execution of the program, we used mismatch values from 0 to 6. The GCG output information was parsed into a Microsoft Access database by using a custom-written program. Then the copy number, length, and genomic location of the archived sequence information were analyzed by using the filtering, sorting, and querying functions of Microsoft Access.

Gene and ORF identification. The matched segments were manually inspected for genes or open reading frames (ORFs) as identified in the genome reports for N. meningitidis Z2491 (41) and N. meningitidis MC58 (49). The distance of 105-bp elements relative to those identified genes or ORFs was determined by manual inspection of the genomic coordinates. Fine contextual analysis was then performed for N. meningitidis MC58 by using ACEDB (R. Durbin and J. T. Thierry-Mieg. 1991. A C. elegans database. Documentation, code, and data available at http://www.acedb.org), as used in the annotation of the genome sequence of this strain (49).
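The FINDPATTERNS search described in this section, an 11-bp end, a core of 0 to 328 bp, and the inverted 11-bp end, can be approximated by the Python sketch below. It is an illustration only and not the GCG implementation: the function names are hypothetical, only one strand is scanned, and mismatches are counted as a total over both 11-bp ends, which is an assumption about how the mismatch setting applies.

```python
LEFT = "TATAGTGGATT"    # conserved 11-bp terminal sequence
RIGHT = "AATCCACTATA"   # its inverted (reverse-complement) copy
MAX_CORE = 328          # 350-bp total limit minus the two 11-bp ends


def revcomp(seq: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_cree(seq: str, max_mismatch: int = 0, max_core: int = MAX_CORE):
    """Return (start, length) for every LEFT...RIGHT pair within the limits.

    Assumption: mismatches are summed over both ends and compared with
    `max_mismatch`; overlapping and nested hits are all reported.
    """
    seq = seq.upper()
    n = len(LEFT)
    last = len(seq) - n
    # Mismatch count of every window against each end of the pattern.
    left_mm = [hamming(seq[i:i + n], LEFT) for i in range(last + 1)]
    right_mm = [hamming(seq[i:i + n], RIGHT) for i in range(last + 1)]
    hits = []
    for i in range(last + 1):
        if left_mm[i] > max_mismatch:
            continue
        for j in range(i + n, min(i + n + max_core, last) + 1):
            if left_mm[i] + right_mm[j] <= max_mismatch:
                hits.append((i, j + n - i))
    return hits


# Toy example (hypothetical sequence): one 107-bp element in a G background.
toy = "G" * 50 + LEFT + "C" * 85 + RIGHT + "G" * 50
print(find_cree(toy))  # [(50, 107)]
```

Because the two ends are exact reverse complements of each other (`revcomp(LEFT) == RIGHT`), scanning one strand with this pattern also covers elements on the opposite strand.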
Most of the CREE lengths occur one to three times in each neisserial genome. Only the 154- to 156-bp and the 105- to 107-bp CREE have four or more copies per genome (Table 2). Interestingly, CREE of these lengths are also often found as simple CREE contained within complex CREE (Table 2). The longest simple CREE within the limitation of our FINDPATTERNS analysis is 341 bp, which was found only once in each of the three neisserial genomes. The 341-bp CREE of the two N. meningitidis strains are 100% homologous, whereas the 341-bp CREE of N. gonorrhoeae FA1090 shows little homology with the 341-bp CREE of N. meningitidis, having homologies of only 18 bp at the 5' end and 11 bp at the 3' end.

Sequence patterns of simple CREE. To facilitate sequence analysis, we oriented all simple CREE into the same sequence direction. We then inspected all sequences to determine whether CREE of the same length belonged to one or more subtypes. Finally, we aligned each CREE length and subtype that had two or more copies in the genome. We collected all of the consensus CREE sequences and the original sequences of the single-occurrence CREE into a new data set for further analyses. For the convenience of sequence comparison and presentation, we separated the symmetrical terminal inverted repeat regions from the asymmetrical core regions of each CREE and ordered them in separate tables. The terminal inverted repeats of simple CREE are listed in web tables 2 to 4 for N. meningitidis Z2491, N. meningitidis MC58, and N. gonorrhoeae FA1090, respectively, and the core regions of CREE from N. meningitidis Z2491, N. meningitidis MC58, and N. gonorrhoeae FA1090 are listed in web tables 5 to 7, respectively. Inspection of these consensus sequences further revealed the modular nature of the sequence composition in the TIR, as well as the core region of CREE (Fig. 2).
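The orientation and grouping steps described above can be sketched in Python as follows. The criterion used here to pick a common direction (the lexicographically smaller of a sequence and its reverse complement) is a stand-in chosen for simplicity, not the criterion used in the study, and exact-sequence grouping is much cruder than the alignment and manual inspection the authors describe. All function names and the toy sequence are hypothetical.

```python
from collections import defaultdict

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def canonical(seq: str) -> str:
    """Pick one strand deterministically (stand-in orientation rule)."""
    seq = seq.upper()
    return min(seq, revcomp(seq))


def group_by_length(seqs):
    """Map each element length to the set of distinct oriented sequences."""
    groups = defaultdict(set)
    for s in seqs:
        oriented = canonical(s)
        groups[len(oriented)].add(oriented)
    return dict(groups)


# Toy element (hypothetical): the same 25-bp element read from both strands.
element = "TATAGTGGATT" + "ACG" + "AATCCACTATA"
groups = group_by_length([element, revcomp(element)])
print({length: len(variants) for length, variants in groups.items()})  # {25: 1}
```

Once both reads of the element are oriented the same way they collapse to a single entry, so any remaining distinct sequences of the same length are candidates for subtypes.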
In general, 105-bp indels are scattered in each genome (Fig. 5A). However, the two N. meningitidis strains showed some similarity in the spacing pattern of some 105-bp elements. This similarity became clearer when the 14 matched 105-bp loci (Table 4) were plotted in pairs (Fig. 5B). Among these 14 pairs of matched 105-bp loci, 11 pairs are located in similar chromosomal locations. The other three pairs are in different chromosomal locations, but the relative positions among these three loci are similar in the two N. meningitidis strains. When the DNA sequence segment containing these three 105-bp element loci was inverted, these three loci matched well between the two N. meningitidis genomes (data not shown). This result confirms that there is a major sequence inversion between the two N. meningitidis genomes (49).
Genes and ORFs associated with or adjacent to 105-bp indels. The homologous sequences matched by the BLAST analysis with the above 2,105-bp sequence queries were examined for genes and ORFs according to the published annotations for the two N. meningitidis genomes. Most 105-bp indels are intergenic (web tables 12 to 14), whereas some are located within ORFs; however, these ORFs are almost exclusively identified as transposases or insertion sequences except for two of the 17 copies of 105-bp indels in N. gonorrhoeae FA1090, which are inserted into ORFs other than those coding for transposases. Importantly, some 105-bp indels are located close (<200 bp) to Neisseria virulence factors. For example, among the identifiable genes, 105-bp indels are associated with immunoglobulin A-specific serine endopeptidase (iga2), sialyltransferase (lst), outer membrane protein opcA, and class I outer membrane protein porA in the two N. meningitidis strains (web tables 12 and 13). The 105-bp indels are also associated with some metabolic genes such as glyceraldehyde-3-phosphate dehydrogenase (gapA and gapC) and acetate kinase (ackA) and also transporter genes for an ABC transporter ATP-binding protein. The 105-bp indels are adjacent to different sets of genes and ORFs in N. gonorrhoeae FA1090 (web table 14). Detailed analysis of the N. meningitidis MC58 sequence using the ACEDB graphical interface revealed additional contextual information. In four instances (numbers 13, 16, 17, and 23) the 105-bp indels are located between convergent 3' ends of genes, and a further six elements (numbers 3, 4, 5, 6, 11, and 19) are located remotely from likely promoter locations so that they are unlikely to affect expression. One (number 15) is inserted into a dead gene. Four elements (numbers 10, 12, 21, and 22) are present in the same location within copies of the IS1106 transposase.
In seven instances (numbers 1, 2, 7, 8, 14, 18, and 20), the 105-bp indels are located such that they would be expected to form part of the promoter region of the associated gene, typically with the terminal TA forming part of a putative -10 element. The genes in which the 105-bp indels are likely to influence expression are (in the corresponding order): pilF (NMB0329), hypothetical proteins (NMB01387 and NMB0882), lst (NMB0922), a flavin reductase homologue (NMB1359), a hypothetical protein (NMB1782), and a peptidase homologue (NMB1877). The 105-bp indel associated with hypothetical protein NMB0882 is actually located between two divergent promoters, but it is not located such that it would be expected to influence the adjacent cysT gene (NMB0881). Thus, 7 of 23 (ca. 30%) 105-bp indels in N. meningitidis MC58 may affect expression of associated genes.
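The context categories listed above for the 23 105-bp indels of N. meningitidis MC58 can be tallied with a few lines of Python. The element numbers are copied from the text; the category labels and variable names are paraphrases introduced here for illustration.

```python
# Element numbers as listed in the text; labels are paraphrased.
categories = {
    "between convergent 3' ends of genes": [13, 16, 17, 23],
    "remote from likely promoter locations": [3, 4, 5, 6, 11, 19],
    "inserted into a dead gene": [15],
    "within copies of the IS1106 transposase": [10, 12, 21, 22],
    "likely part of a promoter region": [1, 2, 7, 8, 14, 18, 20],
}
TOTAL = 23  # 105-bp indels in N. meningitidis MC58

assigned = sorted(n for numbers in categories.values() for n in numbers)
unassigned = sorted(set(range(1, TOTAL + 1)) - set(assigned))
promoter_fraction = len(categories["likely part of a promoter region"]) / TOTAL

print(f"{promoter_fraction:.1%}")     # 30.4%
print(len(assigned), unassigned)      # 22 [9]
```

The tally reproduces the "7 of 23 (ca. 30%)" figure. It also shows that the numbers listed in the text cover 22 of the 23 elements; number 9 does not appear in any of the listed categories.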
In the present study, we first identified the most conserved region of CR as an 11-bp terminal sequence. Then we used this 11-bp sequence to formulate a sequence pattern that allows retrieval of any sequence bracketed by an inverted pair of this 11-bp sequence (within a specified overall length range and mismatch in the 11-bp sequence region). The FINDPATTERNS analysis performed in this way is much easier than BLASTN analysis for identifying CREE. This FINDPATTERNS analysis also gives more consistent and comparable data than the equivalent BLASTN analysis, since it uses the same sequence pattern for identifying related sequences of varied lengths rather than different queries, as in the BLASTN analysis. In fact, this FINDPATTERNS analysis is more powerful than the equivalent BLASTN analysis because it allows detection of greater diversity in the core regions and includes a wider range of lengths, i.e., more overall hits of more complex CREE. The accuracy of the FINDPATTERNS-based analysis was tested in a direct comparison with the BLASTN-based analysis of 107-bp CREE (105-bp indel). FINDPATTERNS analysis detected 26, 24, and 19 copies of 107-bp CREE, whereas BLASTN analysis retrieved only 26, 23, and 17 copies of 105-bp indels in N. meningitidis Z2491, N. meningitidis MC58, and N. gonorrhoeae FA1090, respectively. In comparing the number of CR and the number of CREE, it is apparent that CREE are indeed the most common form of CE. This is especially true for the two N. meningitidis genomes, in which almost all identified CR are found in CREE, usually as pairs of TIR. We made several interesting new observations in the present study. First, some longer CREE contain CREE of shorter lengths. Second, although the length profile of CREE shows several discrete clusters, there is often a continuous spreading of lengths within each cluster. Third, both the terminal regions of CREE and the core regions of CREE consist of distinctive sequence blocks.
Fourth, the shortest CREE can be as short as 28 bp and comprise just the inverted CR. Our analysis revealed that variations of CREE length reflect various combinations of several common sequence blocks (Fig. 2; see also web tables 2 to 7) and that CREE of the same length do not necessarily share the same sequence. In fact, many CREE lengths contain subtypes of sequences. These lengths include 159-, 156-, 155-, 106-, 96-, and 73-bp CREE in N. meningitidis Z2491 (web tables 2 and 5); 156-, 154-, 106-, and 105-bp CREE in N. meningitidis MC58 (web tables 3 and 6); and 155- and 105-bp CREE in N. gonorrhoeae FA1090 (web tables 4 and 7). The differences in the subtype sequences within these CREE are often due to the different combinations of the terminal inverted repeat sequence blocks and the core sequence blocks. However, the differences are sometimes also due to the presence of completely different core sequences. For example, the 341-bp CREE of N. gonorrhoeae FA1090 is different from the 341-bp CREE of the two N. meningitidis strains. The CREE described in the present study do not contain any ORFs and cannot have any transposase activity. Thus, they are not insertion sequences by conventional definition (36). In this sense, we feel that it may be misleading to call CE (most of which are shown in the present study to be CREE) "nemis (Neisseria miniature insertion sequences)" (37). Instead, CREE may be better treated as indels, i.e., regions of DNA that are present on the chromosome of an organism but absent from closely related organisms (7). The experimental determination of DNA sequences upstream of the sialyltransferase gene lst in multiple strains of N. meningitidis and N. gonorrhoeae revealed that a 107-bp CREE is actually a combination of a 105-bp indel with a TA dinucleotide in the flanking target sequence upstream of the 105-bp indel.
Since all CREE have TA dinucleotides at their termini, it is reasonable to assume that all CREE are also made of target TA dinucleotides and indels of different lengths. This is supported by the observations of Abadi et al. that a 157-bp indel is located between the divergently transcribed mtrR and mtrC genes in N. meningitidis but not in the same region in N. gonorrhoeae (1). Upstream of this 157-bp indel in the N. meningitidis sequence is a TA dinucleotide. Thus, a 159-bp CREE is formed when this 157-bp indel is combined with this flanking TA. Currently, little is known about the origin or propagation of CREE. The discrete clustering of CREE around several lengths and the modular nature of CREE sequences suggest that the sequence diversity of CREE did not arise through sequential single-base additions or deletions. Rather, different CREE might form via insertion or deletion of smaller sequence blocks plus some single-base-pair mutations. The question remains whether all CREE are mobile or whether some CREE are simply "genetic fossils" of past DNA mobilization events. It also remains to be determined how frequently and by what mechanism this mobilization occurs. For example, the 105-bp indel found upstream of the lst gene of N. meningitidis strains might be formed through an insertion of a 103-bp or a 105-bp sequence element (Fig. 6). If the inserted sequence originally exists as a 103-bp sequence element, then a duplication event of the target TA sequence must occur to add another TA to the 3' end of the 103-bp element and form a 105-bp indel. The fact that all CREE contain a TA direct repeat at both ends supports the possibility of TA duplication.
The observation of TA dinucleotides as the most common direct repeat found in the target sequence flanking CREE further suggests that repeated TA duplication may occur as multiple insertion events in the same target sites and that repeated insertion and target duplication events account for the spreading or diffusion of CREE length within each length cluster. Although target duplication during the insertion of noncoding short sequence elements has not been directly demonstrated in prokaryotes, such events have been described for short interspersed elements in eukaryotes (33). A recent study showed that a 107-bp interspersed repeat in Streptococcus pneumoniae could be mobilized via trans-mobilization by using the transposase of IS630-Spn1 and, interestingly, a TA dinucleotide exists upstream of this 107-bp repeat (39). Thus, it is possible for the noncoding CREE to be passively mobilized and for short sequences such as a TA dinucleotide in the insertion target to be duplicated in the insertion process.
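The length bookkeeping in the TA-duplication hypothesis discussed above (a 103-bp element plus a duplicated TA gives a 105-bp indel, and the indel plus the flanking target TA gives a 107-bp CREE) can be checked with a toy Python model. This only illustrates the arithmetic of the proposed scenario and is not a demonstrated mechanism; the function name and all sequences are hypothetical.

```python
def insert_with_ta_duplication(target: str, site: int, element: str) -> str:
    """Insert `element` after the TA at target[site:site+2] and duplicate
    the TA on its 3' side (toy model of the hypothesized event)."""
    if target[site:site + 2] != "TA":
        raise ValueError("no TA dinucleotide at the chosen site")
    return target[:site + 2] + element + "TA" + target[site + 2:]


# Hypothetical empty site and a 103-bp placeholder element.
empty_site = "GGCC" + "TA" + "CCGG"
element = "C" * 103
filled_site = insert_with_ta_duplication(empty_site, 4, element)

indel_len = len(filled_site) - len(empty_site)   # element + duplicated TA
cree = filled_site[4:4 + 2 + len(element) + 2]   # TA + element + TA
print(indel_len, len(cree))  # 105 107
```

The same arithmetic applies to the mtrR-mtrC example: a 157-bp indel combined with its flanking TA corresponds to a 159-bp CREE.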
The relationship between CR and CREE resembles that between REP sequences and REP elements. The conservation of terminal regions in CR, and thus in all CREE, may reflect a highly conserved function for this DNA sequence fragment, such as serving as a binding site for transposase. The 28-bp shortest CREE may represent the simplest indel structure on which a transposase can work. The diversification of the core region may form the structural basis for the diverse functions of various CREE. In this regard, there have been increasing reports associating CREE with distinctive functions. For example, an earlier report demonstrated that a 106-bp CREE has promoter activity for the uvrB gene in N. gonorrhoeae (3). Another study found that a 154-bp CREE appears to act as a transcriptional terminator (14). A recent study demonstrated that nemis (corresponding to some CREE described here) are cotranscribed with nearby cellular genes and subsequently processed at either one or both TIR (37). Since many CREE are located upstream of the coding regions of genes (web tables 1 and 12 to 14), the possibility for their involvement in a much wider range of gene regulation exists. For example, a 107-bp CREE is located near the regF-regG gene cluster (10), a 154-bp CREE is located downstream of the mtrE gene in N. gonorrhoeae (11), a 156-bp CREE exists in N. gonorrhoeae between divergently transcribed frpB and groES (48), a 152-bp CREE is located between carA and carB in both pathogenic and commensal Neisseria strains (34), and a 159-bp CREE is located between divergently transcribed mtrR and mtrC genes in N. meningitidis but not in the same region in N. gonorrhoeae (1). A total of 20 copies of CE were found in intergenic regions adjacent to different neisserial genes, including some virulence genes (14). However, the functional significance of these CREE remains to be experimentally determined.
Pathogenic Neisseria strains contain very "plastic" genomes (2, 6, 17), are naturally competent for transformation (46), and show evidence of extensive horizontal gene transfer (15, 32). The presence of a large number of interspersed DNA repeats in pathogenic genomes could affect the functional and evolutionary behaviors of these pathogens. The abundance of CREE (260, 255, and 98 nonredundant copies) and the percentage of nucleotides contained in these CREE (1.67, 1.55, and 0.59%) in the genomes of N. meningitidis Z2491, N. meningitidis MC58, and N. gonorrhoeae FA1090, respectively, are higher than those described for comparable intergenic repeats in other prokaryotic species. For example, the best-studied REP sequences and REP elements account for ca. 0.54% of the E. coli K-12 genome (4). The impact of these dispersed CREE in pathogenic neisserial genomes may be greater than currently realized because of their abundance and proximity to several virulence genes. Pathogenic Neisseria strains present an interesting example of morphologically and biochemically similar organisms that cause very distinctive diseases, including life-threatening disseminated septicemia and meningitis caused by N. meningitidis and localized urogenital tract disease caused by N. gonorrhoeae. Analyses of the physical chromosomal maps of N. meningitidis (13, 16) and N. gonorrhoeae (12) show a high degree of conservation in overall gene organization between N. meningitidis Z2491 and N. gonorrhoeae FA1090 (13) and between N. meningitidis B1940 and N. gonorrhoeae FA1090 and MS11 (16). Previous DNA hybridization studies estimated that N. meningitidis and N. gonorrhoeae are 90% homologous in genes that are common to both species (24, 27). Whole genome sequence comparison showed that 91.2% of the 2,158 ORFs of N. meningitidis MC58 are similar to the ORFs of N. meningitidis Z2491 (49).
There is no doubt that differences in the genes or ORFs (31, 42, 50) are important in determining the pathogenic differences between different Neisseria strains. However, the question is whether the distinctive pathogenic behaviors of different pathogens can be completely explained by these differences in the genes or ORFs. In this regard, the characterization of significant extragenic differences in one major family of neisserial repeats, CREE, among three pathogenic Neisseria strains may offer some valuable additional insights. A better understanding of CR and CREE may help the study of the evolutionary history and the phylogenetic classification of Neisseria. It is known that DNA loss via indel mutation is a determining factor for genome size reduction in eukaryotes (43). The relative genome sizes of three pathogenic Neisseria strains are 100% (N. meningitidis MC58), 96.1% (N. meningitidis Z2491), and 94.8% (N. gonorrhoeae FA1090). The contribution of CREE to the size variations of these three genomes is 0.08% between the two N. meningitidis genomes and ca. 1% between N. gonorrhoeae and N. meningitidis genomes. Thus, deletion or insertion of CREE alone can be a significant factor in altering the genome size of pathogenic Neisseria strains. CREE may serve as hot spots for genomic recombination and rearrangements, which may involve even larger segments of DNA and affect many different sets of genes, making CREE an important extragenic DNA component in the study of genome function.
This work was supported in part by grants AI33505 and AI20897 to R.F.R. from the National Institutes of Health. N.J.S. is supported by a Wellcome Trust Research Fellowship in Medical Microbiology.