Journal of Bacteriology, June 2003, p. 3307-3316, Vol. 185, No. 11

How Clonal Is Staphylococcus aureus?

Edward J. Feil,1* Jessica E. Cooper,1 Hajo Grundmann,2 D. Ashley Robinson,1 Mark C. Enright,1 Tony Berendt,3 Sharon J. Peacock,4 John Maynard Smith,5 Michael Murphy,6 Brian G. Spratt,7 Catrin E. Moore,3 and Nicholas P. J. Day3

Department of Biology and Biochemistry, University of Bath, Claverton Down, Bath BA2 7AY,1 Microbiology and Infectious Diseases, Queen's Medical Centre, University Hospital Nottingham, Nottingham NG7 2UH,2 Centre for Tropical Medicine, Nuffield Department of Clinical Medicine,3 Nuffield Department of Clinical Laboratory Sciences, University of Oxford,4 National Blood Service, John Radcliffe Hospital, Oxford OX3 9DU,6 School of Biological Sciences, University of Sussex, Falmer, Brighton BN1 9QG,5 Department of Infectious Disease Epidemiology, Faculty of Medicine, Imperial College London, St. Mary's Hospital, London W2 1PG, United Kingdom7

Received 6 January 2003/Accepted 17 March 2003

The extent to which homologous recombination contributes to the emergence and subsequent diversification of clones is also at present unclear, although this question has important implications both for the choice of the most appropriate typing strategy for effective epidemiological surveillance and for vaccine design. The population structure of S. aureus has been studied previously by a variety of techniques, including multilocus enzyme electrophoresis (21, 22), pulsed-field gel electrophoresis (12), and multilocus sequence typing (MLST) (4). These studies have revealed a highly clonal population, consistent with the view that S. aureus, unlike the freely recombining pathogenic species Streptococcus pneumoniae (10), Neisseria meningitidis (9), and Helicobacter pylori (32), is not naturally transformable. However, as discussed elsewhere (11, 30), relatively high rates of recombination within a bacterial population can be masked by sampling bias or the temporary expansion of adaptive genotypes, and detailed analysis of nucleotide sequence data is required to confirm inferences about recombination rates obtained from the apparent extent of linkage disequilibrium in the population. Here we present an analysis of the sequence data obtained from the characterization by MLST of 334 S. aureus isolates recovered from persons with invasive disease or asymptomatic carriage. These data consist of the nucleotide sequences of
Of the 155 isolates recovered from patients with cases of invasive disease, 28 were methicillin-resistant S. aureus (MRSA), of which only one was recovered from a patient with a case of community-acquired disease. Twenty-three of the MRSA isolates belonged to a single clone (sequence type 36 [ST36]), three belonged to ST22 (this clone included the single MRSA isolate from the community), and the remaining two MRSA isolates corresponded to ST12 and ST38.

MLST. MLST was carried out with an ABI 3700 capillary sequencer and a standard set of 14 primers as described previously (4). The seven genes included in the S. aureus MLST scheme are arcC, aroE, glpF, gmk, pta, tpi, and yqiL. Information on these loci, the primer sequences, and PCR conditions is available on the MLST website (http://www.mlst.net). Isolates are defined by the alleles present at the seven loci (the allelic profile), and each unique allelic profile is assigned as an ST. Isolates with the same ST, therefore, have identical sequences at all seven MLST loci and are considered to be members of a single clone. The revised data set (3) was carefully checked for errors, particularly the sequences of variant alleles within single-locus variants (SLVs; those strains which differ at only one locus out of seven from an assigned clonal ancestor; see below). All the variant alleles within SLVs were reamplified and resequenced.

Assignment to clonal complexes. The program BURST was used to divide the 334 isolates into clonal complexes, which are defined as groups of STs in which every ST shares at least five of seven identical alleles with at least one other ST in the group (3). The genotype (ST) that gave rise to each clonal complex (the clonal ancestor) initially will diversify to produce variants that differ at only one of the seven loci (SLVs).
Ancestors of clonal complexes were therefore assigned on the basis that they differ at a single locus from the highest number of other genotypes in the clonal complex (i.e., they define the most SLVs). The likely pattern of descent of each ST from the clonal ancestor is then displayed graphically (Fig. 1). (This analysis has previously been carried out on the corrected data set [3].)

To assess whether any of the clonal complexes represent hypervirulent lineages (i.e., lineages that contain a disproportionate frequency of disease isolates), we examined the distribution of disease and carriage isolates between clonal complexes by contingency table analysis. The frequencies of isolates from persons with nasal carriage, community-acquired disease, and hospital-acquired disease within all clonal complexes and clones that were represented by more than five isolates were compared, with the remaining singletons arbitrarily placed in a single group. The distribution of isolates of differing epidemiological origins was then tested by the chi-square test.
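The grouping and ancestor-assignment rules described above can be illustrated with a minimal Python sketch. It is not the BURST program itself: the function names, the single-linkage reading of "at least five of seven" shared alleles, and the toy allelic profiles are all assumptions made for illustration.

```python
from itertools import combinations

# Hypothetical allelic profiles in the locus order
# (arcC, aroE, glpF, gmk, pta, tpi, yqiL). The ST names and allele numbers
# are invented for illustration and do not come from the study.
PROFILES = {
    "ST_A": (1, 1, 1, 1, 1, 1, 1),
    "ST_B": (1, 1, 1, 1, 1, 1, 2),
    "ST_C": (1, 1, 1, 1, 3, 1, 1),
    "ST_D": (1, 1, 5, 1, 1, 1, 1),
    "ST_E": (1, 1, 5, 1, 1, 1, 2),
    "ST_F": (9, 8, 7, 6, 5, 4, 3),
}


def shared_alleles(p, q):
    """Count loci at which two allelic profiles carry the same allele."""
    return sum(a == b for a, b in zip(p, q))


def clonal_complexes(profiles, min_shared=5):
    """Group STs so that each ST shares >= min_shared alleles with at least
    one other ST in its group (connected components, via union-find)."""
    parent = {st: st for st in profiles}

    def find(st):
        while parent[st] != st:
            st = parent[st]
        return st

    for s, t in combinations(profiles, 2):
        if shared_alleles(profiles[s], profiles[t]) >= min_shared:
            parent[find(s)] = find(t)

    groups = {}
    for st in profiles:
        groups.setdefault(find(st), []).append(st)
    return sorted(sorted(members) for members in groups.values())


def predicted_ancestor(group, profiles):
    """Return the ST that defines the most SLVs within the group.
    Ties are broken alphabetically, which is an arbitrary choice."""
    n_loci = len(profiles[group[0]])

    def slv_count(st):
        return sum(
            shared_alleles(profiles[st], profiles[other]) == n_loci - 1
            for other in group
            if other != st
        )

    return max(sorted(group), key=slv_count)


complexes = clonal_complexes(PROFILES)
for group in complexes:
    if len(group) > 1:
        print(group, "predicted ancestor:", predicted_ancestor(group, PROFILES))
    else:
        print(group, "singleton")
```

In this toy data set ST_A defines three SLVs (ST_B, ST_C, ST_D) and is therefore chosen as the ancestor, while ST_F shares no alleles with any other profile and remains a singleton.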
Estimates of recombination rates. BURST was used to identify the most likely (i.e., parsimonious) ancestral ST within each clonal complex and all those strains which have diverged from the predicted clonal ancestor at a single locus but have remained unchanged at the other six (SLVs; Fig. 1). Estimates of the fraction of these SLVs that have arisen by recombination can be made by a method described elsewhere (9, 10). Briefly, the sequences of the alleles that differ between each ancestral ST and its associated SLVs are compared and are assigned as resulting from either a recombinational replacement or a point mutation. Putative point mutations are assigned on the basis of two criteria: firstly, alleles which have arisen by a single point mutation will differ only at a single nucleotide site, and secondly, de novo point mutation will result in an allele that is very likely to be unique within the data set. Variant alleles in SLVs satisfying these two criteria are thus assigned as having arisen by point mutation. Those that differ at multiple nucleotide sites, or which differ at a single site but correspond to alleles found elsewhere in the data set, are assigned as having arisen by recombination.

A comparison between allelic and nucleotide divergence. If clones diversify predominantly by the stepwise accumulation of point mutations, then two strains which have only very recently diverged will show a high level of similarity both in terms of their respective allelic profiles and in terms of the sequences of their nonidentical alleles. In other words, two very closely related strains will share many identical loci, and furthermore those loci which do differ will do so only at a small number of nucleotide sites.
However, if clones diversify predominantly by recombination, then alleles which happen to differ between very closely related strains may do so at a large number of nucleotide sites, owing to the fact that these alleles have been imported from an unrelated lineage. Therefore, under a mutational model, allelic divergence and sequence divergence (measured with identical alleles excluded) will show a positive correlation, since both parameters will reflect time since the common ancestor, but this relationship will not hold if alleles change predominantly by recombination. We examined the relationship, over all pairwise comparisons, between the number of allelic mismatches and the average number of nucleotide differences per locus (excluding loci that have remained identical). For each pairwise comparison between STs, the number of loci that differed and the mean number of nucleotide differences between nonidentical loci (m) were computed. The average of m was obtained for all pairs of STs that differed at a given number of loci and was plotted against the number of allelic differences. Only a single example of each ST was used in this analysis, since this minimizes the influence of any sampling effects resulting from the overrepresentation of specific clones. In order to calibrate the results from S. aureus, the approach was also used on MLST data for S. pneumoniae, a naturally transformable species that is known to recombine at a high frequency (10). All 575 unique STs from the S. pneumoniae MLST database (as of August 2002) were analyzed by this approach. A program for implementing this analysis (written by D. A. Robinson) is available on request.

ML analysis of congruence. Multilocus data sets are ideal for examining the degree to which the phylogenetic signal varies between gene loci. A maximum-likelihood (ML) method has been described which scores two trees as either significantly congruent (similar) or no more congruent than two trees of random topology (8). The approach compares the ML scores of gene trees against the 99th percentile of the distribution of scores for 200 trees of random topology given the reference data. The HKY85 model of nucleotide substitution was used for tree reconstruction, with the transition/transversion (Ti/Tv) ratio and the
Two genes, A and B, are scored as significantly congruent if the difference between the likelihood scores of the trees for gene A and gene B (
Distribution of disease isolates between clonal complexes. Isolates from persons with nasal carriage, community-acquired invasive disease, and hospital-acquired invasive disease were evenly distributed among the clonal complexes, suggesting no significant differences in their propensity to cause disease (P = 0.24; Table 1). There was a preponderance of isolates from the hospital-acquired disease group in CC30/39, but this was not statistically significant and can be explained by the existence of isolates from the EMRSA-16 clone (ST36) within CC30, all 23 strains of which were from patients with hospital-acquired disease.
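As an illustration of the kind of contingency table test referred to above, the following Python sketch computes a Pearson chi-square statistic and its degrees of freedom. The function name and the counts are hypothetical; the counts are invented and are not the values in Table 1. Converting the statistic to a P value additionally requires the chi-square distribution, which this sketch deliberately leaves out.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for an
    r x c table of observed counts (a list of equal-length rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if row and column categories were independent.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Invented counts: rows are groups of isolates (e.g., clonal complexes),
# columns are carriage, community-acquired, and hospital-acquired isolates.
EXAMPLE_COUNTS = [
    [10, 20, 30],
    [20, 40, 60],
]

statistic, dof = chi_square_statistic(EXAMPLE_COUNTS)
print(f"chi-square = {statistic:.3f} on {dof} degrees of freedom")
```

Because the two invented rows have identical proportions, the statistic is zero here; any departure from proportionality between rows increases it.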
The two remaining SLVs both belong to CC30/39. The only case of parallel paths in the splits graphs of clonal complexes concerns the relationships of these two SLVs (ST34 and ST40) in CC30/39, which implies that they may have arisen by recombination (Fig. 1). ST34 is the more likely to have arisen by recombination because the variant allele differs at two nucleotide sites from the predicted ancestral allele in ST30, and the variant allele is present in strains outside CC30/39. The origin of ST40 is more difficult to characterize, because it falls midway between the two putative ancestors of this complex (i.e., it is an SLV of both ST30 and ST39, which themselves differ at two loci). ST40 possibly reflects a step in the mutational pathway between these two ancestral clones, and this is supported by the observation that ST40 differs at only a single nucleotide site from ST30 (in tpi) and from ST39 (in pta). This may account for the widespread distribution of the variant alleles in ST40 (at tpi and pta) throughout this clonal complex; thus, in this case the presence of the variant alleles within a number of other genotypes is not convincing evidence for recombination and may instead reflect identity by descent. If we cautiously assign ST40 and ST34 as having arisen by recombination, then we can estimate that 33 of the SLVs have arisen by point mutation and that only two have arisen by recombination. As a conservative estimate (with respect to the frequency of mutation), it therefore appears that during the initial stages of clonal diversification alleles are at least 15-fold more likely to change by point mutation than by recombination. S. aureus contrasts with N. meningitidis and S. pneumoniae, which are both naturally transformable, where alleles change between 5- and 10-fold more frequently by recombination than by mutation (7).
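The two assignment criteria applied to each SLV (a single nucleotide difference, and a variant allele that is unique within the data set) can be sketched in Python as follows. The function names and the toy sequences are hypothetical, and the sketch does not attempt the case-by-case judgment applied to ST40, where shared alleles may reflect identity by descent.

```python
def count_site_differences(seq_a, seq_b):
    """Number of nucleotide sites at which two aligned alleles differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("alleles must be aligned and of equal length")
    return sum(x != y for x, y in zip(seq_a, seq_b))


def classify_slv(ancestral_allele, variant_allele, other_alleles):
    """Assign the variant allele of an SLV to point mutation or recombination.

    other_alleles holds the sequences at the same locus observed in the
    rest of the data set (all strains other than the SLV itself).
    """
    n_diff = count_site_differences(ancestral_allele, variant_allele)
    if n_diff == 0:
        raise ValueError("variant allele is identical to the ancestral allele")
    is_unique = variant_allele not in other_alleles
    # Point mutation requires both criteria: one changed site AND a novel allele.
    if n_diff == 1 and is_unique:
        return "point mutation"
    return "recombination"


# Invented 8-bp toy alleles, for illustration only.
ancestor = "ACGTACGT"
print(classify_slv(ancestor, "ACGTACGA", {"ACGTACGT", "TTGTACGT"}))
print(classify_slv(ancestor, "ACGAACGA", set()))
print(classify_slv(ancestor, "ACGTACGA", {"ACGTACGA"}))
```

With the counts reported above, 33 point mutations against 2 recombination events give a ratio of 33/2 = 16.5, which is consistent with the stated lower bound of 15-fold.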
Of the 33 putative point mutations listed in Table 2, 23 are nonsynonymous, 9 are synonymous, and 1 is a nonsense mutation (resulting in the allele glpF13); the possible significance of this observation is discussed below. The presence of all of these point mutations was verified by reamplifying these gene products and resequencing the alleles on both strands.

This approach has been presented previously with MLST data for S. aureus prior to corrections in the data (8). This original analysis suggested that the impact of recombination on clonal diversification in S. aureus is more significant than point mutation and gave estimates comparable to those for N. meningitidis and S. pneumoniae; this estimate was also noted in a recent review (11; see also reference 31). We wish to emphasize that we believe this original estimate to be erroneous, and that the present analysis of the revised data, as discussed above, most accurately reflects the microevolutionary events occurring within clonal complexes of S. aureus.

The relationship between nucleotide and allelic divergence. To examine further the validity of the above approach to estimating the contribution of recombination and mutation to clonal diversification, we extended the analysis to all pairwise comparisons of STs, rather than just focusing on the pairs of STs within major clonal complexes that differ at a single locus. The number of allelic mismatches was plotted against the average number of nucleotide changes at nonidentical loci for the S. aureus data set and for the complete S. pneumoniae MLST data set held on the MLST website (Fig. 2). For S. aureus, there was a clear positive trend between the proportion of differing loci and the average number of nucleotide changes per locus, but no such trend was apparent for S. pneumoniae, thus supporting the suggestion of high rates of recombination within the latter species and relatively low rates within the former.
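The pairwise comparison of allelic and nucleotide divergence can be sketched in Python as below. This is not the program written by D. A. Robinson; the function name, the data layout, and the three-locus toy sequences are assumptions chosen to keep the example short.

```python
from collections import defaultdict
from itertools import combinations


def mean_divergence_by_allelic_mismatch(st_sequences):
    """Map each number of differing loci to the average of m, where m is the
    mean number of nucleotide differences over the nonidentical loci of a
    pair of STs.

    st_sequences maps an ST name to a tuple of aligned allele sequences,
    one per locus, in the same locus order for every ST.
    """
    m_values = defaultdict(list)
    for st_a, st_b in combinations(st_sequences, 2):
        per_locus = [
            sum(x != y for x, y in zip(allele_a, allele_b))
            for allele_a, allele_b in zip(st_sequences[st_a], st_sequences[st_b])
        ]
        # Loci that have remained identical are excluded from the mean.
        nonidentical = [d for d in per_locus if d > 0]
        if not nonidentical:
            continue
        m = sum(nonidentical) / len(nonidentical)
        m_values[len(nonidentical)].append(m)
    return {k: sum(v) / len(v) for k, v in sorted(m_values.items())}


# Invented toy data: three loci of 4 bp each, one entry per unique ST.
TOY_STS = {
    "ST1": ("AAAA", "CCCC", "GGGG"),
    "ST2": ("AAAT", "CCCC", "GGGG"),
    "ST3": ("AATT", "CCGG", "GGGG"),
}

for mismatches, avg_m in mean_divergence_by_allelic_mismatch(TOY_STS).items():
    print(mismatches, "differing loci -> average m =", avg_m)
```

Plotting the returned averages against the number of differing loci gives the kind of curve described in the text: a rising trend is expected under a mutational model, and a flat one when nonidentical alleles are mostly imported by recombination.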
Congruence analysis. By examining the degree of phylogenetic consistency between gene trees, it is possible to gauge the impact of recombination and the feasibility of reconstructing a meaningful intraspecies consensus tree. The results of the congruence analysis for the set of 25 diverse STs are given in Table 3. Of the 42 pairwise comparisons of the seven loci, 23 are significantly congruent. This result contrasts markedly with a previous analysis of congruence on S. aureus MLST data, where only 5 of 42 comparisons were significantly congruent (8); this original estimate is now believed to be unreliable since the analysis was done prior to the corrections in the MLST data set. The significant congruence in 55% of the S. aureus tree comparisons contrasts with that obtained for S. pneumoniae, where all pairwise comparisons were found to be noncongruent (8), and lends further support to the suggestion that recombination is much less frequent in S. aureus than in S. pneumoniae. However, 19 of the pairwise comparisons remain noncongruent, which suggests that recombination has had some impact in S. aureus.
The relationships between STs within each major group are often inconsistent between gene loci. This is compatible with a history of recombination but may also reflect an insufficient number of phylogenetically informative sites to allow a robust reconstruction of the branching order. To investigate further the relationships between the 25 diverse STs, an unrooted ML tree was reconstructed from the concatenated sequences of the seven loci by using the same likelihood model that was used to reconstruct trees for individual gene loci, and this tree also supports the conserved node evident in the individual gene trees (Fig. 4b).

Synonymous and nonsynonymous substitutions. The suggestion that recombination has distorted the relationships between clonal lineages seems at odds with the evidence that clones diversify predominantly by point mutation rather than by recombination. A possible way to reconcile these conflicting lines of evidence is suggested by the observation that 23 of the point mutations within SLVs were nonsynonymous, 9 were synonymous, and 1 resulted in the generation of a stop codon in glpF (Table 2). The genes used for MLST were chosen on the basis that they are ubiquitous "core" housekeeping genes, subject to stabilizing selection; it might therefore be expected that synonymous mutations, which are far more likely to be neutral, should outweigh nonsynonymous substitutions. In order to compare the ratio of synonymous to nonsynonymous substitutions within clonal complexes to that between clonal complexes, the average dS/dN ratios were computed over all pairwise comparisons for each locus from the diverse set of 25 STs. On average over all loci, the dS/dN ratio was far higher for comparisons between the diverse genotypes (8.6) than for comparisons between SLVs and their clonal ancestors (1.3) (Table 4).
In comparing SLVs with their ancestral sequences, we found 729.76 synonymous sites and 2,468.24 nonsynonymous sites (these values are averages for the ancestral sequences of each SLV). Nine synonymous substitutions and 23 nonsynonymous substitutions within SLVs (Table 2) correspond to a dS of 0.012, a dN of 0.009, and a dS/dN ratio of 1.32. This implies that many of the nonsynonymous point mutations observed within the SLVs will in time be lost from the population by purifying selection. The relative impact of point mutation in the long-term evolution of this species may therefore be inflated when only the initial stages of clonal diversification are examined.
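The figures quoted above can be reproduced with a short calculation, assuming that dS and dN are taken as simple proportions of substitutions per site with no correction for multiple hits. The text does not state which estimator was used, so this is an assumption, and the function name is hypothetical.

```python
def substitutions_per_site(n_substitutions, n_sites):
    """Uncorrected proportion of substituted sites (an assumed estimator)."""
    return n_substitutions / n_sites


# Values quoted in the text for SLVs compared with their ancestral sequences.
SYNONYMOUS_SITES = 729.76
NONSYNONYMOUS_SITES = 2468.24
SYNONYMOUS_SUBSTITUTIONS = 9
NONSYNONYMOUS_SUBSTITUTIONS = 23

d_s = substitutions_per_site(SYNONYMOUS_SUBSTITUTIONS, SYNONYMOUS_SITES)
d_n = substitutions_per_site(NONSYNONYMOUS_SUBSTITUTIONS, NONSYNONYMOUS_SITES)
ratio = d_s / d_n

print(f"dS = {d_s:.3f}, dN = {d_n:.3f}, dS/dN = {ratio:.2f}")
```

Under this assumption the calculation gives dS = 0.012, dN = 0.009, and dS/dN = 1.32, matching the reported values.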
Disease isolates are equally represented in all the clonal complexes, suggesting that there is no link between MLST genotype and the propensity to cause disease. This finding appears to be in contrast to a recent study by Booth et al., who detected differences in the frequencies of specific lineages (as defined by pulsed-field gel electrophoresis) when comparing samples of clinical and carried isolates (1). A possible explanation is that the present comparison is based upon clinical and carried isolates drawn from a single well-defined population, thus minimizing any differences between the samples that reflect geographical or temporal structuring and are unrelated to virulence. There is strong evidence in some pathogens for marked differences in the population structures of isolates recovered from persons with disease and carriage. For example, population studies of the gram-positive pathogen S. pneumoniae have demonstrated that isolates from carriers are more diverse than those from disease patients (20, 28, 34), and a recent study suggests that different clones (as defined by MLST) and serotypes show differing potential to cause invasive disease (2). The carriage population structure of the gram-negative pathogen N. meningitidis has also been shown to be more diverse than samples associated with invasive disease (16), and there is some experimental evidence that different clones of this species may differ in their ability to cause invasive disease (35). The explanation as to why the subpopulations of invasive and asymptomatically carried S. aureus appear to be identical by MLST in the present study is unclear but may in part reflect the fact that "invasive disease" encompasses a very wide range of disease symptoms caused by this species and the associated plethora of putative virulence determinants so far identified (17). Furthermore, despite this finding we do not argue that all S. aureus isolates are equally virulent.
The influx and loss of virulence determinants carried on mobile elements will play a large part in determining the virulence of an isolate. The movement of these genes may occur so rapidly that their presence or absence is only weakly linked to the relatively stable clonal background defined by MLST. A recent study by Peacock et al. (27), of the same bacterial strain collection on which the present study is based, noted that the presence or absence of seven putative virulence factors is significantly correlated with the epidemiological origin of the strain (i.e., from disease or asymptomatic carriage). This study demonstrated that bacterial factors do contribute toward the ability of S. aureus to cause disease, whereas the MLST data indicate that these differences are generally not reflected in the "core" genome. Put another way, isolates of the same ST differ in their content of virulence genes and may therefore differ in their ability to cause disease. Unlike earlier studies of the impact of recombination (8), the study by Peacock et al. was carried out with reference to the corrected S. aureus MLST data set, and their conclusions are therefore not compromised by the original errors in the data.

The MLST data also provide no evidence that strains responsible for nosocomial disease represent a distinct subpopulation from strains causing community-acquired disease or strains recovered from asymptomatic carriers. Although the acquisition of genes conferring drug resistance within certain clones confers a strong selective advantage in the hospital environment, MRSA and vancomycin-insensitive S. aureus clones being the most important examples, the MLST data reveal that these clones have evolved from genotypes which were already common in the population (5).

Evidence concerning recombination rates. (i) Clonal diversification.
The analysis of diversification within clonal complexes suggests that alleles, and individual nucleotide sites, are at least 15-fold more likely to change by point mutation than by recombination. This estimate is supported by a similar analysis of 117 isolates recovered from Nottingham, United Kingdom, in which the same clonal complexes are present (13). These data add a further 13 SLVs to the clonal complexes, 12 of which appear to have arisen by point mutation and 1 of which appears to have arisen by recombination (data not shown). Extending the analysis to include all pairwise comparisons between different genotypes reveals a trend of increasing nucleotide divergence with increasing allelic divergence. This result is also consistent with a predominantly mutational mode of evolution. The power of this approach is limited by the number of loci used for MLST; in the S. aureus data set over 50% of all pairwise comparisons differ at all seven loci. This analysis reduces all of these comparisons, which will include the vast majority of comparisons between clonal complexes, to a single data point. The results of both of these analyses contrast strikingly with those obtained with MLST data from S. pneumoniae and N. meningitidis and highlight the fact that the initial stages of clonal diversification in S. aureus appear to be predominantly driven by point mutation, rather than recombination.

(ii) Phylogenetic relationships between clonal lineages. Although there is limited evidence for recombination over the short term (within clonal complexes), or the short to medium term (as suggested by the relationship between allelic and nucleotide diversity), there is evidence that recombination has contributed to the evolution of the S. aureus population over the longer term. Many of the phylogenetic relationships between the closely related clonal complexes are poorly supported and inconsistent between individual gene trees (Fig. 4a), although this may in part reflect factors other than recombination, such as a paucity of phylogenetically informative sites. Despite these inconsistencies, there is also evidence for a conserved node dividing the subsample of 25 diverse STs approximately into halves. The significance of this observation within the context of the evolutionary history of this population is unclear; nevertheless, it serves well to illustrate the middle ground occupied by the significance of homologous recombination on the core genome of this species over the long term. On the one hand, and in contrast to S. pneumoniae and N. meningitidis, recombination has not been so frequent as to completely eliminate the intraspecies phylogenetic signal in this species. On the other hand, certain alleles at specific loci appear to have been horizontally transferred between the two major phylogenetic groups, and statistical tests of congruence between loci identified 45% of tree comparisons as being not significantly congruent. Interestingly, over half of these noncongruent comparisons involved the arcC locus, which encodes carbamate kinase. An inspection of the genome sequence in the vicinity of arcC reveals the presence of a putative virulence factor, clumping factor B (clfB), approximately 1 kb downstream of arcC, and two further putative virulence factors, aureolysin (aur) (29) and isaB, approximately 6 kb upstream of this locus. clfB is known to be associated with the cell wall (25, 26), and isaB is known to elicit an immune response (18); both of these genes probably encode proteins that are exposed to the host immune response and hence are likely to be subject to diversifying selection. Recombinational replacements within these genes, selected as they introduce genetic diversity, will frequently extend into flanking genes and may influence the sequence evolution of arcC.
Although this explanation needs to be examined more closely, such a hitchhiking effect has also been noted within an MLST gene (ddl) of S. pneumoniae (6). The analysis of congruence described here can thus be used to identify gene loci in which recombination has had a particularly high impact on the phylogenetic signal; this has two implications. Firstly, genes such as arcC, which appear to be behaving atypically, could subsequently be removed from the analysis in order to reconstruct the most meaningful phylogeny for a given group of strains. Alternatively, the approach may be employed to investigate differences between loci where there is some a priori reason that they may exhibit various degrees of congruence; thus, the likely effect of diversifying selection on genes encoding proteins exposed to the host immune response can be systematically examined.

The short-term survival of nonsynonymous point mutations. The evidence discussed above suggests an inflation of mutation, relative to recombination, over very short-term evolution. The ratio of synonymous to nonsynonymous substitutions (dS/dN) occurring within clonal complexes approaches parity, whereas for pairwise comparisons of diverse STs it is >8 (Table 4). This suggests that de novo nonsynonymous mutations, though mostly deleterious, are rarely lethal, and most will survive long enough to be sampled. It is particularly striking that one of these point mutations has resulted in a stop codon in the glpF gene. However, the action of purifying selection will mean that few of these deleterious mutations will survive over the longer term. A precedent is set by the study of Nachman et al. (23), who compared the ratios of synonymous and nonsynonymous substitutions within the mitochondrial gene NADH dehydrogenase subunit 3 (ND3) of humans and chimpanzees. They found a higher ratio of nonsynonymous to synonymous substitutions when comparing sequences within a species than when comparing sequences from different species.
Their conclusion was that most of the intraspecies protein polymorphisms are slightly deleterious and are lost from the population before becoming fixed in the different species. For S. aureus, it is possible that most of the polymorphisms within clonal complexes are also slightly deleterious and will mostly have been eliminated in those rare adaptive genotypes that occasionally give rise to new clonal complexes.

Concluding remarks. The results discussed above present a complex picture of the influence of recombination on the evolution and population structure of S. aureus. Firstly, the striking clonal structure of the population is, with the caveats outlined in the introduction, an indication that recombination has had negligible impact on the diversification of the core genome of this species. Such a view is consistent with an examination of intraclonal diversity, which suggests that the vast majority of clonal variants arise by point mutation, rather than recombination. Going further back in the tree, and hence considering longer time scales, phylogenetic approaches suggest that at least some recombination has occurred. This may be explained, at least in part, by purifying selection resulting in the extinction of many de novo point mutations over time. Finally, the atypical phylogenetic signal within arcC demonstrates that recombination has had more influence on some gene loci than others, despite the fact that MLST genes were chosen on the basis that they are likely to represent the stable "core" of the genome. Thus, the degree to which the perceived importance and stability of essential metabolic genes equate with their phylogenetic consistency remains an open question.

We are very grateful to Eddie Holmes and Laurence Hurst for useful discussions and critical comments on the manuscript and to Paul Wilkinson for technical assistance.