Scientific Publications - Work Done by Microbiology Reader
Markus Covert, Eric Knight, Jennifer Reed, Markus Herrgard, Bernhard Palsson
ABSTRACT The flood of high-throughput biological data has led to the expectation that computational (or in silico) models can be used to direct biological discovery, enabling biologists to reconcile heterogeneous data types, find inconsistencies and systematically generate hypotheses. Such a process is fundamentally iterative, where each iteration involves making model predictions, obtaining experimental data, reconciling the predicted outcomes with experimental ones, and using discrepancies to update the in silico model. Here we have reconstructed, on the basis of information derived from literature and databases, the first integrated genome-scale computational model of a transcriptional regulatory and metabolic network. The model accounts for 1,010 genes in Escherichia coli, including 104 regulatory genes whose products together with other stimuli regulate the expression of 479 of the 906 genes in the reconstructed metabolic network. This model is able not only to predict the outcomes of high-throughput growth phenotyping and gene expression experiments, but also to indicate knowledge gaps and identify previously unknown components and interactions in the regulatory and metabolic networks. We find that a systems biology approach that combines genome-scale experimentation and computation can systematically generate hypotheses on the basis of disparate data sources.
SUPPLEMENTARY DATA We have divided our supplementary material into three main sections: network model reconstruction, phenotype data comparison, and microarray data comparison. Each section refers to one or more Microsoft Excel workbooks that contain the actual data.
Network model reconstruction Corresponding worksheets: Supplementary Data 1-3
We based our regulatory model on a pre-existing model of metabolism in Escherichia coli (iJR904)1. There are some differences between the metabolic network used in this study and the network which was presented earlier; the changes are summarized in the table below.
The “Regulation List” worksheet contains a list of all the genes in the model. Each gene is listed by B number and gene name, and the regulatory rules for expression or activity (in the case of transcription factors) are included with references listed by their PubMed IDs (which can be entered into PubMed to obtain the full references). Chapters from the book “Escherichia coli and Salmonella: cellular and molecular biology”, edited by F.C. Neidhardt, are indicated by the prefix NH (e.g., chapter 22 is listed as NH 22).
The “Parameters” worksheet lists all the parameters used in the simulations, including time delays for transcription and translation, the biomass function, non-growth associated ATP maintenance flux, initial metabolite concentrations, and initial biomass concentrations. Concentrations highlighted in yellow on the sheet are the ones that vary across the different experiments. This sheet also includes the lower limits (which correspond to maximal uptake rates) and upper limits for exchange fluxes of extracellular metabolites. Exchange fluxes include those listed in iJR904 with the addition of an h2s exchange flux (see table above). Exchange fluxes are written in the direction in which external metabolites are depleted from the system, so a negative flux value corresponds to that metabolite entering the system. Condition-dependent changes to the lower limits on the exchange fluxes were also taken into account when running simulations: the lower limit of an exchange flux is temporarily set to zero if that metabolite is not present in the medium.
The “Abbreviations” worksheet contains a list of the metabolite abbreviations, and their definitions, that are used in the previous worksheets. Those ending in “(e)” are external metabolites rather than intracellular metabolites. The metabolite list matches that reported in iJR904, with the exception of six additional extracellular metabolites: 5dglcn(e), btn(e), cbi(e), h2o2(e), ppa(e), and thym(e).
These additional metabolites act as stimuli for the regulatory network.
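The exchange-flux convention described for the “Parameters” worksheet can be illustrated with a minimal sketch. The flux names and numerical bounds below are hypothetical placeholders, not values from the worksheet; only the rule itself (a negative flux means uptake, and the lower limit is set to zero when the metabolite is absent from the medium) comes from the text.

```python
def condition_bounds(default_bounds, medium):
    """Return exchange-flux bounds adjusted for one medium.

    default_bounds maps an exchange flux name to (lower, upper).
    A negative flux means the metabolite enters the system, so the
    lower limit is the maximal uptake rate. If the metabolite is not
    in the medium, its lower limit is set to zero (no uptake).
    """
    adjusted = {}
    for flux, (lower, upper) in default_bounds.items():
        if flux not in medium:
            lower = 0.0
        adjusted[flux] = (lower, upper)
    return adjusted


# Hypothetical names and values, for illustration only.
default_bounds = {
    "EX_glc(e)": (-10.0, 1000.0),
    "EX_o2(e)": (-20.0, 1000.0),
    "EX_h2s(e)": (-5.0, 1000.0),
}
bounds = condition_bounds(default_bounds, medium={"EX_glc(e)", "EX_o2(e)"})
print(bounds["EX_h2s(e)"])
```

The default bounds are left untouched, which mirrors the "temporary" nature of the change: each simulated condition gets its own adjusted copy.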
Phenotype data comparison Corresponding worksheets: Supplementary Data 4-5
A more detailed version of Figure 2 is given in “Biolog-Model Comp”, where the predictions of both the regulated and unregulated models are compared with experimental data from the ASAP website (https://asap.ahabs.wisc.edu/annotation/php/logon.php). We considered plates PM1, PM2 and PM3 for this study, and only used data where the knockout strain and the environmental condition could be simulated by the model (e.g., no knockout strains of non-metabolic or associated regulatory genes). The data were compiled as shown and normalized by a cutoff parameter, which we took as 1.2 times the negative control value. If the normalized Biolog growth value was greater than the cutoff parameter, we assigned the condition a qualitative value of “growth”; otherwise, the condition was assigned “no growth”. As shown in “Sensitivity”, variation of this parameter may change some of our specific recommendations for model expansion/biological discovery, but does not affect the overall conclusions of our study.
As mentioned in the main text, 18.3% of the cases were only predicted correctly when regulatory constraints were incorporated with the metabolic model. The following table lists the carbon sources with significantly higher fractions of agreement between model prediction and experimental observations.
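The growth/no-growth call described above can be sketched as follows. This is one reading of the rule, assuming that a Biolog value is compared directly against 1.2 times the negative control value; the function name and the example numbers are illustrative only.

```python
def call_growth(biolog_value, negative_control, cutoff_factor=1.2):
    """Assign a qualitative phenotype from a Biolog measurement.

    Assumption: the threshold is cutoff_factor times the negative
    control value, and only values strictly above it count as growth.
    """
    threshold = cutoff_factor * negative_control
    return "growth" if biolog_value > threshold else "no growth"


# Illustrative numbers, not measured data.
print(call_growth(130.0, 100.0))
print(call_growth(110.0, 100.0))
```

Exposing `cutoff_factor` as a parameter makes it straightforward to repeat the comparison at other cutoffs, in the spirit of the “Sensitivity” worksheet.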
We describe the regulatory effects that lead to these phenotypes briefly here, beginning with the carbon sources. Growth on citrate as a carbon source depends on a transporter (encoded by citT) that is only expressed anaerobically. Part of the pathway for sucrose utilization involves xylose isomerase (encoded by xylA). Synthesis of this enzyme is induced by XylR, which is only active if xylose is present in sufficient concentration. 1,2-Propanediol utilization depends on an L-lactate dehydrogenase whose expression depends on the presence of L-lactate. Several required genes in the pathway for butyric and tartaric acid utilization (atoB, atoD, atoE) are regulated by the activating protein AtoC, which is only active when stimulated by AtoS, which in turn is activated by acetoacetate.
For the nitrogen sources, the presence of guanine downregulates a number of genes involved in pyrimidine and purine biosynthesis (purA, purB, pyrB, pyrF, prsA); prsA is also needed in histidine biosynthesis. Nitrate and nitrite environments require nitrate reductase for utilization, and the subunit of this enzyme encoded by nirB is expressed only under anaerobic conditions, as mediated by Fnr. Similarly, allantoin utilization requires the allC gene product (allantoate amidohydrolase), which is downregulated under aerobic conditions.
A more thorough treatment of the environments and knockouts that were not correctly predicted by the model is given in Appendix A, below.
Finally, there were only 11 cases (out of 13,750) where the results of the comparison between model predictions and experimental observations shown in Figure 2b differed when calculated using iMC1010v2. For all 11 of these cases, iMC1010v1 predicted no growth and iMC1010v2 predicted growth, indicating that the changes result from the relaxation of regulatory rules. Most of the cases (8 out of 11) are now predicted more accurately with iMC1010v2.
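Several of the regulatory effects described above can be written as simple Boolean rules. The sketch below is a simplification made for illustration and is not the rule set from the “Regulation List” worksheet: stimulus names are hypothetical, "sufficient concentration" is reduced to presence or absence, and "downregulated" is treated as fully off.

```python
def expression_state(stimuli):
    """Return {gene: expressed?} for a set of stimuli present in the environment."""
    aerobic = "oxygen" in stimuli
    xylr_active = "xylose" in stimuli        # XylR is active only with xylose
    atos_active = "acetoacetate" in stimuli  # AtoS is activated by acetoacetate
    atoc_active = atos_active                # AtoC is active only when stimulated by AtoS
    guanine = "guanine" in stimuli

    state = {
        "citT": not aerobic,   # expressed only anaerobically
        "xylA": xylr_active,   # induced by XylR
        "nirB": not aerobic,   # anaerobic expression, mediated by Fnr
        "allC": not aerobic,   # downregulated aerobically (simplified to off)
    }
    for gene in ("atoB", "atoD", "atoE"):
        state[gene] = atoc_active
    for gene in ("purA", "purB", "pyrB", "pyrF", "prsA"):
        state[gene] = not guanine  # downregulated by guanine (simplified to off)
    return state


aerobic_guanine = expression_state({"oxygen", "guanine"})
print(aerobic_guanine["citT"], aerobic_guanine["prsA"])
```

In a combined regulatory and metabolic simulation, a gene evaluated as not expressed would have its associated reactions constrained to zero flux, which is how rules of this kind change growth predictions.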
Microarray data Corresponding worksheets: Supplementary Data 6-9
The raw data for the microarray study is given in the accompanying folder “Covert Microarray Raw Data”. The dChip processed data is in “dCHIP data”, and contains a list of all the replicates, coded by ec_aer_<strain>_<O if aerobic, nO if anaerobic>_<replicate>. Therefore, the first replicate of the ∆arcA-fnr- double knockout culture under aerobic conditions would be ec_aer_arcAfnr_O_a. Specific information on culture growth rates, substrate uptake rates and byproduct secretion rates is given in “Culture Data”. A more detailed version of Figure 3, together with the actual rules generated and some notes, is shown in “New Rules”, and the quantitative real-time RT-PCR verification of certain of the shifts is shown in “RT-PCR”. The MIAME checklist located at http://www.mged.org/Workgroups/MIAME/miame_checklist.html is filled out in Appendix B.
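The replicate naming scheme can be decoded mechanically. The following is a minimal sketch; the allowed characters for the strain and replicate fields are an assumption, since the text gives only one example name.

```python
import re

# Pattern for ec_aer_<strain>_<O|nO>_<replicate>; the character classes
# for strain and replicate are assumed, not specified in the source.
NAME_RE = re.compile(
    r"^ec_aer_(?P<strain>[A-Za-z0-9]+)_(?P<oxygen>nO|O)_(?P<replicate>[A-Za-z0-9]+)$"
)


def parse_replicate(name):
    """Split a replicate code into strain, oxygen condition and replicate label."""
    match = NAME_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognized replicate name: {name!r}")
    return {
        "strain": match["strain"],
        "aerobic": match["oxygen"] == "O",
        "replicate": match["replicate"],
    }


print(parse_replicate("ec_aer_arcAfnr_O_a"))
```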
Appendix A. Details of the environments/knockout strains whose growth phenotypes were incorrectly predicted by the model.
The following sections detail possible reasons for the discrepancies and model predictions for carbon and nitrogen sources as well as for knockout strains. Some of the high-discrepancy growth conditions were retested on a Bioscreen C (Helsinki, Finland) with five replicates; the Bioscreen measures growth rates by monitoring OD. M-9 minimal media with 0.2% carbon source was used to test K-12 MG1655 growth on different carbon sources; W-salts media (10.5 g of K2HPO4, 4.5 g of KH2PO4, and 0.241 ml of 1 M MgSO4 per liter) supplemented with 0.2% succinate and 0.2% nitrogen source was used to test wildtype growth on different nitrogen sources. Two controls were used: M-9 minimal media with no carbon source and 0.2% succinate W-salts media with no nitrogen source. Cells were precultured overnight in 0.2% succinate M-9 minimal media and transferred into the different media conditions. The Bioscreen was run over three days, and the relative growth rates (growth rate divided by the appropriate control growth rate) are reported in the tables below. MEME 2 and MAST 3, sequence alignment and comparison tools, were used as reported previously1 to identify putative genes for some of the enzymes that could resolve model and experiment discrepancies.
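The relative growth rate defined above is the growth rate divided by the growth rate of the matching control. A minimal sketch, with made-up numbers and hypothetical key names, might look like this:

```python
def relative_growth_rates(rates, control_rates):
    """Divide each growth rate by the control rate for its source type.

    rates maps (source_type, source_name) to a growth rate, where
    source_type is "carbon" (control: M-9 with no carbon source) or
    "nitrogen" (control: succinate W-salts with no nitrogen source).
    """
    relative = {}
    for (source_type, name), rate in rates.items():
        control = control_rates[source_type]
        if control <= 0:
            raise ValueError("control growth rate must be positive")
        relative[(source_type, name)] = rate / control
    return relative


# Made-up numbers for illustration; these are not Bioscreen results.
example = relative_growth_rates(
    {("carbon", "source A"): 0.5, ("nitrogen", "source B"): 0.75},
    {"carbon": 0.25, "nitrogen": 0.25},
)
print(example)
```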
Carbon Sources
1 Fractional agreement is the fraction of the 110 cases in which the regulatory model prediction matches the Biolog result.
2 Biolog results report how many of the 110 knockouts grow and how many do not grow on the medium.
3 Relative growth rate is with respect to the control. NA indicates that this source was not tested.
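Fractional agreement as defined in footnote 1 is a simple match count. The sketch below assumes that predictions and observations are stored as parallel lists of Booleans (True for growth), one entry per knockout strain; this data layout is an assumption made for illustration.

```python
def fractional_agreement(predicted, observed):
    """Fraction of cases where the model prediction matches the observation."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(predicted)


# Four toy cases rather than the 110 knockouts in the study.
print(fractional_agreement([True, True, False, False],
                           [True, False, False, False]))
```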
Formic Acid (+/-/-), Glycine (+/+/-) and Acetoacetic Acid (-/+/+)
The metabolic and regulatory models incorrectly predict growth phenotypes as measured on the Biolog plates with formate and acetoacetate as carbon sources, while only the regulatory model disagrees with experimental observations with glycine. According to the Biolog plates, E. coli grows with formate as a carbon source and does not grow with acetoacetate as a carbon source (although growth on the latter carbon source has been observed 4). Mixed Biolog results are observed for growth on glycine. Wildtype K-12 was retested for growth on all three carbon sources using the Bioscreen; in all cases the results are in agreement with the regulatory model predictions and disagree with the Biolog results.
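The three-symbol codes in the headings can be read programmatically. The ordering assumed here (Biolog experiment / unregulated metabolic model / regulated model) is inferred from the descriptions of formate, glycine and acetoacetate above and is not stated explicitly in the text.

```python
def parse_outcome(code):
    """Decode a heading code such as "(+/+/-)".

    Assumed order: experiment / metabolic model / regulatory model,
    with "+" meaning growth and "-" meaning no growth.
    """
    symbols = code.strip("()").split("/")
    if len(symbols) != 3 or any(s not in ("+", "-") for s in symbols):
        raise ValueError(f"unrecognized outcome code: {code!r}")
    experiment, metabolic, regulatory = (s == "+" for s in symbols)
    return {
        "experiment": experiment,
        "metabolic_model": metabolic,
        "regulatory_model": regulatory,
        "metabolic_agrees": metabolic == experiment,
        "regulatory_agrees": regulatory == experiment,
    }


glycine = parse_outcome("(+/+/-)")
print(glycine["metabolic_agrees"], glycine["regulatory_agrees"])
```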
Thymidine (+/-/-)
Both the regulated and unregulated models predict that thymidine cannot be used as the sole carbon or nitrogen source. Thymidine can be converted to thymine by thymidine phosphorylase; this enzyme is already in the metabolic network. Older experimental studies have shown that thymine can be degraded by some strains of E. coli 5,6 and it has been proposed that E. coli B contains the reductive pathway involved in uracil and thymine degradation 6 (EC numbers 1.3.1.2 or 1.3.1.1, 3.5.2.2, 3.5.1.6). Sequence comparisons using MEME and MAST indicate that 1.3.1.2 might be encoded by b2106 and 3.5.2.2 might be encoded by b2873 or b0512. Identification of this pathway in E. coli K-12 MG1655 would explain the observed Biolog data. Incorporating the associated metabolic genes and knowledge of how they are regulated would increase the predictive ability of the model.
L-Glutamic Acid (-/+/+)
The inability to grow on glutamate as the sole carbon source is believed to be due to a low transport capacity (NH 20). If measured, a maximum uptake rate for glutamate could be used to further constrain the solutions predicted by the models.
γ-Amino Butyric Acid, L-Arginine, Ornithine and Putrescine (-/+/+)
Both models predict growth on γ-amino butyrate (GABA), arginine, ornithine and putrescine as a sole carbon source. This is in disagreement with the Biolog and Bioscreen data, which indicate that these substrates are not suitable carbon sources. The gab pathway, needed for the degradation of GABA and putrescine, is reported to be expressed at a low constitutive level that is not sufficient to support growth on GABA 7 (although strain W3110 is able to utilize GABA as a carbon source 8). In addition to the gab pathway, arginine and ornithine can also be degraded by enzymes in the ast pathway, but this latter pathway is only expressed under nitrogen limitation. The gabDPTC operon is induced under nitrogen limitation, allowing these compounds to be used as nitrogen sources 8. Constraining the maximum allowable fluxes through the gab pathway or including regulation of these genes in the model would explain the lack of growth and increase the predictive capabilities of the models.
Nitrogen Sources (Succinate medium)
1 Fractional agreement is the fraction of the 110 cases in which the regulatory model prediction matches the Biolog result.
2 Biolog results report how many of the 110 knockouts grow and how many do not grow on the medium.
3 Relative growth rate is with respect to the control. NA indicates that this source was not tested.
Adenine, N-Acetyl-D-Mannosamine and Putrescine (-/+/+)
These three nitrogen sources do not support growth according to the Biolog data, but are predicted to support growth by the regulated and unregulated models. It has been shown previously that E. coli can use adenine as a sole nitrogen source 8, suggesting that the Biolog results might be inaccurate. N-acetyl-D-mannosamine and putrescine were also tested as nitrogen sources using the Bioscreen; growth rates were significantly higher than the control, indicating that the Biolog results incorrectly report a lack of growth.
L-Lysine, L-Methionine, L-Phenylalanine and Xanthine (+/-/-)
Both the Biolog data and Bioscreen data indicate that lysine, methionine, phenylalanine, and xanthine can be used as alternate nitrogen sources. Neither the regulated nor the unregulated model predicts growth with these substrates as nitrogen sources, indicating that the metabolic enzymes that allow incorporation of nitrogen from these substrates are missing from the metabolic network. For the case of lysine, we could not find any data on how nitrogen is removed from lysine. Proposed pathways for methionine, phenylalanine, and xanthine utilization are summarized below.
Methionine aminotransferase activity has been observed in E. coli B, where methionine and α-ketoglutarate are converted to 2-oxo-4-methylthiobutyric acid and glutamate 9; 2-oxo-4-methylthiobutyric acid is then converted into ethylene 10. The pathway and associated genes have not been found in K-12 and so have not yet been included in the models.
Including the phenylpyruvate decarboxylase reaction, which converts phenylpyruvate to phenylacetate (EC 4.1.1.43), as well as the complete phenylacetate degradation pathway (which has not yet been fully characterized) would enable the model to use phenylalanine as a nitrogen source.
A xanthine dehydrogenase activity has been assigned to the xdhA gene product, where xanthine would be converted to uric acid and then presumably to allantoin 8. Allantoin cannot be used as a nitrogen source under aerobic conditions, so how nitrogen is removed from the base remains unclear biochemically.
Alanine-Leucine (+/+/-)
Leucine represses the synthesis of biosynthetic enzymes for isoleucine and valine, which is why the model predicts that E. coli will not grow with leucine or alanine+leucine as the sole nitrogen source. Experimentally, leucine alone as the nitrogen source does not permit growth, but alanine and leucine together do. A lower concentration of leucine might allow for growth with alanine if the repression of the isoleucine and valine biosynthetic enzymes is relaxed.
Guanosine (-/+/+)
Biolog data indicates growth with guanosine in 64 knockouts and no growth with 46 knockouts. Performing more replicates of the Biolog data and possibly testing the knockout strains on the Bioscreen would provide more information as to whether the model or the Biolog data is more accurate.
Knockout strains
All of the major discrepancies between model predictions for knockouts and the Biolog data are cases in which the regulated and unregulated models predict the knockout to be lethal but the experimental data suggest that it is not. Most of these discrepancies involve knockouts that prevent the production of a biomass component.
glgA-, glgC- (+/-/-)
These two genes are involved in the synthesis of glycogen. Three different hypotheses can be made from the model and data discrepancies: (1) glycogen is not an essential biomass component, (2) glycogen phosphorylase is reversible, or (3) there is a new redundant pathway for glycogen synthesis. If incorporated into the model, any of these possibilities could resolve the model and data disagreements.
argB-, argC-, argD-, argE-, argG- (+/-/-)
Deletions of the arginine biosynthesis genes argB, argC, argD, argE, and argG are all lethal according to the model but not in the Biolog data. For argB, argC, argD, and argE the growth phenotype can be explained by making a few reactions reversible in the model (ABUTD, PTRCTA, and ORNDC); the backwards reactions allow for a new route converting glutamate into ornithine (and then arginine). No information could be found regarding the reversibility of these enzymes. For argG there must be another isozyme.
purD-, purH-, metA- (+/-/-)
The genes purD and purH are responsible for the enzymes needed in the early and late steps of purine biosynthesis. One of the early reactions of methionine biosynthesis is carried out by the metA gene product. E. coli must obviously still make purines and methionine, so isozymes or alternate synthesis route