Lecture notes, Spring 2003: BE 203 - Bioinformatics III (Systems Biology), BE 160C/203 Topic I.7, April 23rd, 2002, University of California, San Diego, Department of Bioengineering, Genetic Circuits Research Group
Data for reconstruction: Proteomics
Steve Fong / Bernhard Palsson
INTRODUCTION This topic will mainly discuss experimental methods for proteomics. Most of the methods have been developed in the last few years, so proteomics as a field is still in its infancy. However, there is great excitement about the prospects of proteomics research for understanding cellular function and reconstructing genetic circuits. The hope is that in a few years' time we will routinely be able to use data from proteomics technologies to reconstruct genetic circuits, just as genomics data is used today.
OVERVIEW The proteomics part will be heavily focused on experimental methods for the simple reason that there is very little material available on proteomics data analysis and even less on reconstruction using this data.
1. Proteomics tasks:
2. Proteomics methods:
3. Proteomics data analysis:
   a. Protein interaction network analysis
PROTEOMICS OVERVIEW Proteomics is a vastly larger field, in terms of different methods and applications, than either genomics or transcriptomics. For this reason there are multiple ways in which one could organize an overview of proteomics. Because many of the experimental methods can be used for multiple tasks, this lecture is organized by experimental method, and the specific applications and data analysis techniques are described under each method.
What is proteomics? Proteome: Expressed protein complement of a genome
WHAT IS PROTEOMICS? Proteomics is not a very well-defined topic. It could be described as the large-scale study of protein structure, expression, and function (including modifications and interactions). Proteomics is currently an extremely hot topic, and people from many different fields (biology, bioengineering, bioinformatics, computer science, chemistry) are actively developing ways to characterize the proteome. In this lecture we will ignore a major part of proteomics: large-scale structural characterization of proteins. Although knowing the structures of all proteins would certainly help in annotating their functions, this type of data is only indirectly connected to reconstruction of genetic circuits.
Challenges in proteomics The number of different proteins is much larger than the
number of genes or mRNAs:
CHALLENGES IN PROTEOMICS Proteins are much more difficult to characterize and work with than nucleic acids. This is basically the reason for the predominance of molecular biology over biochemistry in biological research over the last 30 years or so. This slide lists just some of the major reasons why characterizing the proteome is expected to be much more difficult than characterizing the genome or transcriptome. The flip side is that the proteome is vastly more interesting for many practical applications than any of the other "omes". For example, most drugs act on proteins, not nucleic acids, and enzymes, not nucleic acids, are used as catalysts in bioprocessing applications.
Some proteomics tasks 1. Protein interaction mapping
TYPICAL PROTEOMICS TASKS There are many possible tasks in proteomics, of which only four fairly well-developed ones are listed here. For most of these tasks there is more than one method, and each method has its own advantages and disadvantages.
Yeast two-hybrid (Y2H) method Standard genetic screen for physical protein-protein
interactions
YEAST TWO-HYBRID METHOD Figure caption: Principle of two-hybrid library and array screens. (a) Typical two-hybrid screens use a library of random DNA or cDNA fused to a transcriptional activation domain (AD), expressed in yeast ('preys'; circles denote plasmids). The library clones are mated to a strain of opposite mating type that expresses a protein of interest ('bait', B) as a fusion to a DNA-binding domain (DBD). If bait and prey interact in the resulting diploid cells, they reconstitute a transcription factor, which activates a reporter gene whose expression allows the diploid cell to grow on selective media (here, without histidine). As an alternative to mating, prey libraries can also be transformed into the bait strain in order to express bait and prey in the same cell. In any case, positive clones have to be picked, their DNA isolated and the encoded plasmids sequenced in order to identify interacting proteins. (b) Array screens use defined sets of cloned prey ORFs or fragments thereof that are mated systematically to a certain bait strain. Matings and two-hybrid tests can be automated when large sets of preys have to be assayed, as in the case of whole genomes. -- Curr Opin Chem Biol 2002 Feb;6(1):57-62
Applications of Y2H screens Yeast: Two genome-wide Y2H screens have been performed
resulting in >6000 interactions between >4500 proteins (Uetz et al. Nature
403:623 (2000) and Ito et al. PNAS 98:4569 (2001))
APPLICATIONS OF Y2H-TYPE SCREENS There have been three major Y2H screen projects: two in yeast and one in H. pylori. In addition, smaller Y2H screens have been performed for C. elegans and D. melanogaster. Mammalian two-hybrid screens have also been developed to avoid problems associated with doing interaction screens in yeast for non-yeast proteins. -- Figure caption: Mammalian two-hybrid system used by Suzuki et al. About 3500 mouse cDNAs were amplified by PCR and these PCR products (ORF X and ORF Y) mixed with another PCR product that carried a cytomegalovirus (CMV) promoter and either a Gal4 DNA-binding domain (DBD) or a VP16 transcriptional activation domain (AD). Because the two PCR products have overlapping sequences, they can be fused into one DNA fragment by a secondary PCR reaction using primers at the ends of the individual fragments. The final PCR fragments were transfected into tissue culture cells (CHO-K1) together with a reporter plasmid that carried a luciferase gene (Luc). When the encoded proteins interact, the luciferase reporter gene is transcribed and its activity can be measured as fluorescence. All 3500 × 3500 protein combinations were tested. To speed up the screening procedure, various numbers of baits and preys were co-transfected (i.e. pooled), and positive signals were later deconvoluted to identify interacting proteins.
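The idea of pooling and later deconvoluting can be illustrated with a small sketch. Testing every pair individually would require 3500 × 3500 = 12,250,000 assays, so pooling saves work whenever interactions are rare. The code below is only an illustration under strong assumptions: the pooling scheme (fixed-size blocks of baits against fixed-size blocks of preys), the function names, and the error-free assay callbacks are all hypothetical and are not taken from Suzuki et al.

```python
from itertools import product

def make_pools(items, pool_size):
    """Split a list of clones into consecutive pools of at most pool_size."""
    return [items[i:i + pool_size] for i in range(0, len(items), pool_size)]

def deconvolute(bait_pools, prey_pools, pool_is_positive, pair_is_positive):
    """Test pooled combinations first, then retest only pairs from positive pools.

    pool_is_positive(baits, preys) and pair_is_positive(bait, prey) stand in
    for the wet-lab reporter assay (assumed here to be free of false
    positives and false negatives, which real screens are not).
    Returns the list of interacting pairs and the number of assays used.
    """
    hits, n_assays = [], 0
    for baits, preys in product(bait_pools, prey_pools):
        n_assays += 1
        if not pool_is_positive(baits, preys):
            continue  # no interaction anywhere in this pool combination
        for bait, prey in product(baits, preys):
            n_assays += 1
            if pair_is_positive(bait, prey):
                hits.append((bait, prey))
    return hits, n_assays

# Toy example with a simulated "true" interaction set (hypothetical data).
true_pairs = {("B1", "P4")}
baits = [f"B{i}" for i in range(6)]
preys = [f"P{i}" for i in range(6)]

def pair_assay(bait, prey):
    return (bait, prey) in true_pairs

def pool_assay(bait_pool, prey_pool):
    return any(pair_assay(b, p) for b, p in product(bait_pool, prey_pool))

hits, n_assays = deconvolute(make_pools(baits, 3), make_pools(preys, 3),
                             pool_assay, pair_assay)
print(hits, n_assays, "assays instead of", len(baits) * len(preys))
```

In this toy case 4 pooled assays plus 9 follow-up assays replace 36 pairwise assays. The saving shrinks as the fraction of positive pools grows, which is one reason pool sizes have to be chosen with the expected interaction density in mind.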
ANALYSIS OF INTERACTION DATA Nature 2001 May 3;411(6833):41-2 Lethality and centrality in protein networks. Jeong H, Mason SP, Barabasi AL, Oltvai ZN.
Figure caption: Characteristics of the yeast proteome. a, Map of protein-protein interactions. The largest cluster, which contains 78% of all proteins, is shown. The colour of a node signifies the phenotypic effect of removing the corresponding protein (red, lethal; green, non-lethal; orange, slow growth; yellow, unknown). b, Connectivity distribution p(k) of interacting yeast proteins, giving the probability that a given protein interacts with k other proteins. The exponential cut-off (ref. 6) indicates that the number of proteins with more than 20 interactions is slightly less than expected for pure scale-free networks. In the absence of data on the link directions, all interactions have been considered as bidirectional. The parameter controlling the short-length scale correction has value k0 ≈ 1. c, The fraction of essential proteins with exactly k links versus their connectivity, k, in the yeast proteome. The list of 1,572 mutants with known phenotypic profile was obtained from the Proteome database (ref. 13). Detailed statistical analysis, including r = 0.75 for Pearson's linear correlation coefficient, demonstrates a positive correlation between lethality and connectivity. For additional details, see http://www.nd.edu/~networks/cell.
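The two quantities plotted in panels b and c can be computed directly from an interaction list. The following is a minimal sketch, assuming the interactions are available as undirected pairs of protein names and the lethal knockouts as a set of names. The function names and the toy data are illustrative and are not from Jeong et al.

```python
from collections import Counter, defaultdict

def degree_distribution(edges):
    """Return ({protein: degree}, {k: p(k)}) for an undirected edge list."""
    neighbors = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # self-interactions are ignored in this sketch
        neighbors[a].add(b)
        neighbors[b].add(a)
    degree = {protein: len(partners) for protein, partners in neighbors.items()}
    counts = Counter(degree.values())
    n_proteins = len(degree)
    p_k = {k: counts[k] / n_proteins for k in sorted(counts)}
    return degree, p_k

def essential_fraction_by_degree(degree, essential):
    """Fraction of proteins with exactly k links that are essential, per k."""
    flags_by_k = defaultdict(list)
    for protein, k in degree.items():
        flags_by_k[k].append(protein in essential)
    return {k: sum(flags) / len(flags) for k, flags in sorted(flags_by_k.items())}

# Toy network (hypothetical): A is a hub and the only essential protein.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("D", "E")]
essential = {"A"}

degree, p_k = degree_distribution(edges)
print(p_k)
print(essential_fraction_by_degree(degree, essential))
```

Duplicate reports of the same interaction are collapsed by storing partners in a set, which matters when data from several screens are merged.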
Integration with other data Y2H with phage display-derived binding motif analysis to reduce false positives (Tong AHY et al. Science 295:321 (2002))
INTEGRATION OF INTERACTION DATA WITH OTHER DATA TYPES
Science 2002 Jan 11;295(5553):321-4
Bioinformatics 2001 May;17(5):455-60
Pac Symp Biocomput 2002;:413-24
Proteins 2002 May 1;47(2):219-27
Genome Res 2002 Jan;12(1):37-46
Nat Genet 2001 Dec;29(4):482-6
MASS SPECTROMETRY Figure caption: The mass spectrometry approach. (a) A single-stage mass spectrometer. The instrument consists of three components: an ionization source, mass analyzer and ion detector. The mass analyzer that is shown is a time-of-flight (TOF) mass spectrometer. Mass-to-charge ratio (m/z) values are determined by measuring the time it takes ions to move from the ion source to the detector. The time that is required to move this distance can be directly correlated with the m/z value. A mass spectrum of a protein digest is shown to the right of the figure. (b) The components of one type of tandem mass spectrometer. The instrument consists of an ion source, first mass analyzer, gas-phase collision cell, second mass analyzer and ion detector. The first mass analyzer can be used to isolate a particular m/z value for dissociation in the collision cell. The dissociation products are then analyzed in the second mass analyzer. A tandem mass spectrum for a peptide produces a ladder of fragment ions that represent amide bond cleavage. A peptide spectrum is shown to the right of the mass spectrometer.
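The relation between flight time and m/z can be made concrete with an idealized linear TOF model. This is a sketch under textbook assumptions that are not stated in the caption: an ion of charge ze is accelerated through a potential U, gaining kinetic energy zeU, and then drifts through a field-free tube of length L, so that t = L · sqrt(m / (2 z e U)). The drift length and voltage defaults below are arbitrary illustrative values, and real instruments are calibrated empirically.

```python
import math

ELEMENTARY_CHARGE = 1.602176634e-19  # coulombs
DALTON = 1.66053906660e-27           # kilograms per dalton

def flight_time(mz, drift_length_m=1.0, accel_voltage_v=20000.0):
    """Flight time in seconds for an ion with the given m/z (Da per charge).

    Idealized model: all ions start at rest and the acceleration region
    is negligibly short compared with the drift tube.
    """
    mass_per_charge = mz * DALTON / ELEMENTARY_CHARGE  # kg per coulomb
    return drift_length_m * math.sqrt(mass_per_charge / (2.0 * accel_voltage_v))

def mz_from_flight_time(t, drift_length_m=1.0, accel_voltage_v=20000.0):
    """Invert the same model: recover m/z (Da per charge) from a flight time."""
    mass_per_charge = 2.0 * accel_voltage_v * (t / drift_length_m) ** 2
    return mass_per_charge * ELEMENTARY_CHARGE / DALTON

for mz in (500.0, 1000.0, 2000.0):
    print(f"m/z {mz:7.1f} -> {flight_time(mz) * 1e6:.2f} microseconds")
```

Because the flight time grows with the square root of m/z, an ion with four times the m/z takes twice as long to reach the detector.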
Types of mass specs Number of stages:
TYPES OF MASS SPECTROMETERS Mass spectrometry was originally developed as a method to characterize small (organic) molecules in analytical chemistry. In principle all the techniques developed in this application can be modified to work on proteins (which are just large organic molecules). However, in most cases this requires a lot of technology development to allow high reliability and throughput. This slide lists just some of the different types of techniques used in protein mass spectrometry.
PROTEIN IDENTIFICATION USING MASS SPEC Figure caption: Mass spectrometric identification of a protein. A protein is digested using a highly specific protease (typically trypsin) (a) and the derived peptides are analysed in a mass spectrometer (b). One peptide species is selected for collision with an inert gas, such as argon, in the mass spectrometer. The derived fragments of this peptide are measured to give the product mass spectrum (c). Some of these fragments differ in their mass by individual amino acids (leucine and isoleucine having identical masses). Part of the sequence can therefore be read out from a series of peaks in the spectrum. This sequence information is placed in the peptide by the mass of the fragments, and is used in the 'peptide sequence tag' in conjunction with the mass of the peptide and the specificity of the protease (tryptic digest results in K or R at the C terminus of the peptide) to search for a match in the database. A single peptide sequence tag is usually sufficient to unambiguously link a database entry with the investigated protein (d). Alternatively, fragment masses are measured and automatically compared with the predicted fragment masses of all peptides derived from a database to find the best match. In any case, the confidence increases with the number of fragmented peptides matching the same entry. Typically 2-70% of the sequence of the entry is covered by the experimental data, depending on the sample amount. Abbreviations: m/z, mass-to-charge ratio; S. cerevisiae, Saccharomyces cerevisiae.
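Reading part of a sequence from a fragment ladder amounts to matching the mass differences between consecutive peaks against the residue masses of the amino acids. The sketch below assumes singly charged ions from a single fragment series, listed in order of increasing m/z, and uses standard monoisotopic residue masses. The function name, the tolerance, and the example ladder are illustrative assumptions.

```python
# Monoisotopic residue masses in daltons (amino acid minus water).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}

def read_sequence_tag(ladder, tolerance=0.02):
    """Translate gaps between consecutive fragment peaks into residues.

    Each gap is reported as a single letter, as a bracketed set such as
    "[IL]" when several residues fit within the tolerance, or as "?"
    when nothing fits (for example a missing peak or a modification).
    """
    tag = []
    for low, high in zip(ladder, ladder[1:]):
        gap = high - low
        fits = sorted(aa for aa, mass in RESIDUE_MASS.items()
                      if abs(gap - mass) <= tolerance)
        if not fits:
            tag.append("?")
        elif len(fits) == 1:
            tag.append(fits[0])
        else:
            tag.append("[" + "".join(fits) + "]")
    return tag

# Hypothetical ladder: successive gaps correspond to S, then L or I, then G.
ladder = [175.11895, 262.15098, 375.23504, 432.25650]
print(read_sequence_tag(ladder))
```

Leucine and isoleucine always come out as an ambiguous pair because their masses are identical. Glutamine and lysine differ by only about 0.036 Da, so they are distinguished at the tolerance used here but would merge on a less accurate instrument.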
Mass spec data analysis
ANALYSIS OF MASS SPEC SPECTRA Figure caption: Protein identification using peptide mapping information. (a) In the experiment, the proteins are digested with an enzyme and the masses of the proteolytic peptides are measured with mass spectrometry. (b) In the database search, each protein sequence in the database is digested according to the specificity of the enzyme. The masses of the resulting peptides are calculated and a theoretical mass spectrum is constructed. The measured mass spectrum is compared with the theoretical mass spectrum. Genome Res 2001 Feb;11(2):290-9
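The database search in (b) can be sketched in a few lines. This is a simplified illustration, not the scoring used by any published search engine: trypsin is assumed to cleave after every K or R (ignoring missed cleavages and the proline exception), peptide masses are neutral monoisotopic masses, and database entries are ranked simply by the number of measured masses they explain. The sequences and measured masses are made up for the example.

```python
WATER = 18.010565  # monoisotopic mass of H2O in daltons

# Monoisotopic residue masses in daltons (amino acid minus water).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}

def tryptic_peptides(sequence):
    """Cut after every K or R (simplified trypsin specificity)."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        if residue in "KR":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal peptide
    return peptides

def peptide_mass(peptide):
    """Neutral monoisotopic mass: residue masses plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER

def count_matches(measured, sequence, tolerance=0.05):
    """Number of measured masses explained by the theoretical digest."""
    theoretical = [peptide_mass(p) for p in tryptic_peptides(sequence)]
    return sum(any(abs(m - t) <= tolerance for t in theoretical)
               for m in measured)

def rank_database(measured, database, tolerance=0.05):
    """Return (entry name, match count) pairs, best match first."""
    scores = [(name, count_matches(measured, seq, tolerance))
              for name, seq in database.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)

# Toy database and measurement (hypothetical).
database = {"protA": "GASKLLER", "protB": "GGGKAAAR"}
measured = [361.20, 529.32]
print(rank_database(measured, database))
```

Counting matches is the crudest possible score. It favours long proteins, which produce many peptides, so practical tools weight matches by how likely they are to occur by chance.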
Mass spectrometry applications
MASS SPEC APPLICATIONS: IDENTIFYING PROTEIN COMPLEXES Figure caption: Analysing protein interactions. In the 'co-precipitation/mass spectrometry' approach used by Gavin et al. (ref. 1) and Ho et al. (ref. 2), an 'affinity tag' is first attached to a target protein (the 'bait'; a). b, Bait proteins are systematically precipitated, along with any associated proteins, on an 'affinity column'. c, Purified protein complexes are resolved by one-dimensional SDS-PAGE, a technique that involves running an electric charge through the complexes on a gel, so that proteins become separated according to mass. d, Proteins are excised from the gel, digested with the enzyme trypsin, and analysed by mass spectrometry. Database-search algorithms (bioinformatics) are then used to identify specific proteins from their mass spectra.
Traditional 2DGE based profiling
PROTEIN EXPRESSION PROFILING USING 2DGE For a review of 2DGE in general see Lilley KS et al. Curr Opin Chem Biol 6:46 (2002)
Genetic analysis of the mouse brain proteome. Klose J, Nock C, Herrmann M, Stuhler K, Marcus K, Bluggel M, Krause E, Schalkwyk LC, Rastan S, Brown SD, Bussow K, Himmelbauer H, Lehrach H. Abstract: Proteome analysis is a fundamental step in systematic functional genomics. Here we have resolved 8,767 proteins from the mouse brain proteome by large-gel two-dimensional electrophoresis. We detected 1,324 polymorphic proteins from the European collaborative interspecific backcross. Of these, we mapped 665 proteins genetically and identified 466 proteins by mass spectrometry. Qualitatively polymorphic proteins, to 96%, reflect changes in conformation and/or mass. Quantitatively polymorphic proteins show a high frequency (73%) of allele-specific transmission in codominant heterozygotes. Variations in protein isoforms and protein quantity often mapped to chromosomal positions different from that of the structural gene, indicating that single proteins may act as polygenic traits. Genetic analysis of proteomes may detect the types of polymorphism that are most relevant in disease-association studies.
Figure caption: 2-DE brain protein pattern from a B6-SPR hybrid. From the whole 2-DE pattern, consisting of an acid half and a basic half (see Methods), only the acid half is shown. The three protein spot families show that spot families can be recognized in 2-DE patterns on the basis of genetic variation. The spot family of heat-shock 70-kD protein 4 (HSP70, red) consists of 52 'isospots', which occur in 52 double spots (hybrid spots). The two allelic forms vary in the vertical direction in all 52 double spots, with a spacing of 0.5 mm and the B6 spot on top. The enolase 2 family (blue) shows 24 double spots, which vary again in the vertical direction with the B6 spot on top, but with a spacing of 1 mm. The lactate dehydrogenase 2 B chain spot family (LDH2, yellow) includes 17 double spots that form in each case horizontal spot pairs spaced at 25 mm, with the B6 spots always on the left side. This family also includes 10 paV spots, which are interpreted as a difference in degradation rate between the two allelic forms (ref. 11). Four spots were present on the basic half of the 2-DE pattern (data not shown). Nat Genet 2002 Apr;30(4):385-95.
Modern profiling techniques
MODERN PROTEIN EXPRESSION PROFILING TECHNIQUES Figure captions:
MudPIT: Multidimensional chromatography for the separation of complex peptide mixtures. The peptides are first separated and fractionated by electrostatic charge using strong cation exchange high-performance liquid chromatography (SCX HPLC). Increasing salt concentrations are applied to the column, either in a stepwise manner or as a continuous gradient to the mobile phase. Each of the discrete peptide fractions is then further separated by hydrophobicity using RP-μLC with a gradient of increasing organic solvent. The eluting peptides are analyzed by ESI MS/MS.
ICAT: Quantitative analysis of protein abundance using stable-isotope dilution. Mixtures of proteins (e.g. those extracted from cells in two different biological states) are labeled separately such that the proteins exist in either an isotopically 'light' (L) or 'heavy' (H) form. This can be achieved by metabolic labeling during cell growth on stable-isotope-enriched or depleted media [24] or through chemical reaction with an externally introduced isotopically coded reagent [22,31,32]. The labeled protein mixtures are then combined, proteolyzed and the peptides detected by MS. Alternatively, separate protein mixtures can be first proteolyzed, the peptides isotopically labeled, combined and detected by MS [25-27]. The relative abundances of labeled peptides, and hence the relative abundances of the proteins from which they were derived, are determined by comparison of the signal intensities between the light and heavy forms of the peptides. The signal intensities are simultaneously detected at discrete mass-to-charge (m/z) values in the mass spectrometer.
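The final quantification step, comparing light and heavy signal intensities, can be sketched as follows. The caption does not say how peptide-level ratios are combined into a protein-level value, so taking the median of the heavy/light ratios is an assumption made here for illustration, as are the function name and the toy intensities.

```python
from statistics import median

def protein_abundance_ratios(peptide_measurements):
    """Combine peptide-level heavy/light ratios into one ratio per protein.

    peptide_measurements: iterable of (protein, light_intensity, heavy_intensity).
    Peptides with a non-positive intensity in either channel are skipped,
    since no meaningful ratio can be formed for them.
    """
    ratios_by_protein = {}
    for protein, light, heavy in peptide_measurements:
        if light <= 0 or heavy <= 0:
            continue
        ratios_by_protein.setdefault(protein, []).append(heavy / light)
    return {protein: median(ratios)
            for protein, ratios in ratios_by_protein.items()}

# Hypothetical peptide pairs: protein P1 is induced about 8-fold in the
# heavy-labeled state, while P2 is essentially unchanged.
measurements = [
    ("P1", 100.0, 800.0),
    ("P1", 50.0, 450.0),
    ("P2", 200.0, 210.0),
    ("P2", 0.0, 30.0),  # light form not detected, so this pair is skipped
]
print(protein_abundance_ratios(measurements))
```

The median is used rather than the mean so that a single badly quantified peptide does not dominate the protein-level estimate.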
Profiling applications
PROTEIN EXPRESSION PROFILING APPLICATIONS Nat Biotechnol 2001 Mar;19(3):242-7 Science 2001 May 4;292(5518):929-34
Figure captions:
Sensitivity of MudPIT to a wide variety of protein classes. The percentage of proteins identified in this study from a variety of protein classes is presented. The percentages were determined by dividing the number of proteins identified in the study in each category shown by the total number of predicted proteins from each category shown. MIPS (ref. 24) and the Yeast Proteome Database (ref. 33) were used to obtain the predicted numbers of proteins from S. cerevisiae in each class. From left to right are the percentages identified of total proteins, proteins with a CAI <0.2, proteins with a pI <4.3, proteins with a pI >11, proteins with a MW <10 kDa, proteins with a MW >180 kDa, integral membrane proteins (IMPs) with three or more predicted transmembrane domains, and peripheral membrane proteins (PMPs).
Comparison of protein vs mRNA expression ratios. Scatter plot of protein expression versus mRNA expression ratios. Ratios of wt+gal to wt-gal protein expression, measured for each of 289 genes using the ICAT technique, are plotted against the corresponding mRNA expression ratios measured by microarray. Many genes with elevated mRNA or protein expression in wt+gal were metabolic or ribosomal, whereas genes involved in respiration almost always had reduced expression levels. Names of genes that were indistinguishable in both mRNA and protein (due to high sequence similarity) are separated by a slash.
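A scatter plot of this kind is commonly summarized by a correlation coefficient computed on log-transformed ratios, so that a 10-fold increase and a 10-fold decrease are treated symmetrically. The sketch below assumes two dictionaries mapping gene names to expression ratios. The log10 transform, the function names, and the toy numbers are illustrative choices and are not taken from the cited study.

```python
import math

def pearson_r(xs, ys):
    """Pearson's linear correlation coefficient of two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / math.sqrt(var_x * var_y)

def log_ratio_correlation(protein_ratio_by_gene, mrna_ratio_by_gene):
    """Correlate log10 protein ratios with log10 mRNA ratios over shared genes."""
    genes = sorted(set(protein_ratio_by_gene) & set(mrna_ratio_by_gene))
    log_mrna = [math.log10(mrna_ratio_by_gene[g]) for g in genes]
    log_protein = [math.log10(protein_ratio_by_gene[g]) for g in genes]
    return pearson_r(log_mrna, log_protein)

# Hypothetical ratios: protein changes follow mRNA changes but are damped.
protein = {"g1": 10.0, "g2": 1.0, "g3": 0.1, "g4": 2.0}
mrna = {"g1": 100.0, "g2": 1.0, "g3": 0.01}
print(log_ratio_correlation(protein, mrna))
```

Only genes measured in both data sets enter the calculation, which mirrors the situation described in the caption, where protein ratios were available for a limited set of genes.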
Phosphoprotein profiling
PHOSPHOPROTEIN PROFILING Nat Biotechnol 2002 Mar;20(3):301 Phosphoproteome analysis by mass spectrometry and its application to Saccharomyces cerevisiae. Ficarro SB, McCleland ML, Stukenberg PT, Burke DJ, Ross MM, Shabanowitz J, Hunt DF, White FM Abstract: Protein kinases are coded by more than 2,000 genes and thus constitute the largest single enzyme family in the human genome. Most cellular processes are in fact regulated by the reversible phosphorylation of proteins on serine, threonine, and tyrosine residues. At least 30% of all proteins are thought to contain covalently bound phosphate. Despite the importance and widespread occurrence of this modification, identification of sites of protein phosphorylation is still a challenge, even when performed on highly purified protein. Reported here is methodology that should make it possible to characterize most, if not all, phosphoproteins from a whole-cell lysate in a single experiment. Proteins are digested with trypsin and the resulting peptides are then converted to methyl esters, enriched for phosphopeptides by immobilized metal-affinity chromatography (IMAC), and analyzed by nanoflow HPLC/electrospray ionization mass spectrometry. More than 1,000 phosphopeptides were detected when the methodology was applied to the analysis of a whole-cell lysate from Saccharomyces cerevisiae. A total of 216 peptide sequences defining 383 sites of phosphorylation were determined. Of these, 60 were singly phosphorylated, 145 doubly phosphorylated, and 11 triply phosphorylated. Comparison with the literature revealed that 18 of these sites were previously identified, including the doubly phosphorylated motif pTXpY derived from the activation loop of two mitogen-activated protein (MAP) kinases. We note that the methodology can easily be extended to display and quantify differential expression of phosphoproteins in two different cell systems, and therefore demonstrates an approach for "phosphoprofiling" as a measure of cellular states.
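The peptide and site counts quoted in the abstract are consistent with each other, as a short calculation shows. The snippet uses only the numbers given above; the variable names are illustrative.

```python
# Peptides grouped by the number of phosphorylation sites they carry,
# as reported in the abstract.
peptides_by_site_count = {1: 60, 2: 145, 3: 11}

total_peptides = sum(peptides_by_site_count.values())
total_sites = sum(n_sites * n_peptides
                  for n_sites, n_peptides in peptides_by_site_count.items())

print(total_peptides)  # 216 peptide sequences
print(total_sites)     # 383 phosphorylation sites
```

Two thirds of the sequenced peptides carry two phosphate groups, which is why the number of sites is considerably larger than the number of peptides.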
Figure caption: Enrichment of phosphopeptides from complex mixtures by chemical introduction of an affinity tag. Blue wavy lines represent proteins or peptides in a mixture. The wavy line with 'P' designates a phosphoprotein or phosphopeptide. The affinity tag (in this case, biotin) allows selective isolation of the species of interest.
Protein chips
PROTEIN CHIPS Figure caption: Classes of capture molecules for protein microarrays. For specific interaction analysis, different classes of molecules can be immobilized on a planar surface to act as capture molecules in a microarray assay. (a) Illustrates antigen-antibody interaction and (b) shows a scheme of a sandwich immunoassay. In (c), a specific protein-protein interaction is shown. A different class of binders is shown in (d), where synthetic molecules referred to as aptamers act as capture molecules. They can be composed of nucleotides, ribonucleotides or peptides. Interactions of enzymes with their specific substrates are shown in (e), where a substrate (s) for kinases is immobilized and phosphorylated (P) by the respective kinase. A typical example for a receptor-ligand interaction is given in (f), where synthetic low molecular mass compounds are immobilised as capture molecules.
Protein expression profiling
PROTEIN CHIP APPLICATIONS: PROTEIN EXPRESSION PROFILING Genome Biol 2001;2(2):RESEARCH0004 Protein microarrays for highly parallel detection and quantitation of specific proteins and antibodies in complex solutions. Figure caption: Antibody array detection of labeled antigens. 114 different antibodies were spotted onto poly-L-lysine coated slides 6-12 times each at a 375 μm spacing. Six protein mixes were labeled and detected according to the Materials and methods section. The inset in each panel highlights anti-Flag and anti-IgG spots, and the labels indicate the concentration of the antigen applied to each array. The images were normalized (see the Materials and methods section) and contrast adjusted to better show bright features.
Protein interaction screening
APPLICATIONS OF PROTEIN CHIPS: PROTEIN INTERACTION SCREENING Science 2001 Sep 14;293(5537):2101-5 Abstract: To facilitate studies of the yeast proteome, we cloned 5800 open reading frames and overexpressed and purified their corresponding proteins. The proteins were printed onto slides at high spatial density to form a yeast proteome microarray and screened for their ability to interact with proteins and phospholipids. We identified many new calmodulin- and phospholipid-interacting proteins; a common potential binding motif was identified for many of the calmodulin-binding proteins. Thus, microarrays of an entire eukaryotic proteome can be prepared and screened for diverse biochemical activities. The microarrays can also be used to screen protein-drug interactions and to detect posttranslational modifications. Figure caption: GST::yeast proteins were purified in a 96-well format. (A) Sixty samples were examined by immunoblot analysis using anti-GST; 19 representative examples are shown. Greater than 80% of the preparations produce high yields of fusion protein. (B) 6566 protein samples representing 5800 unique proteins were spotted in duplicate on a single nickel-coated microscope slide. The slide was probed with anti-GST (10). (C) An enlarged image of one of the 48 blocks is depicted to the right of the proteome chip.
Protein activity profiling
PROTEIN ACTIVITY PROFILING Nat Genet 2000 Nov;26(3):283-9 Abstract: We have developed a novel protein chip technology that allows the high-throughput analysis of biochemical activities, and used this approach to analyse nearly all of the protein kinases from Saccharomyces cerevisiae. Protein chips are disposable arrays of microwells in silicone elastomer sheets placed on top of microscope slides. The high density and small size of the wells allows for high-throughput batch processing and simultaneous analysis of many individual samples. Only small amounts of protein are required. Of 122 known and predicted yeast protein kinases, 119 were overexpressed and analysed using 17 different substrates and protein chips. We found many novel activities and that a large number of protein kinases are capable of phosphorylating tyrosine. The tyrosine phosphorylating enzymes often share common amino acid residues that lie near the catalytic region. Thus, our study identified a number of novel features of protein kinases and demonstrates that protein chip technology is useful for high-throughput screening of protein biochemical activity. Figure caption: Protein chip fabrication and kinase assays. a, Kinase activities were detected using protein chips. PDMS was poured over the acrylic mold. After curing, the chip containing the wells was peeled away and mounted on a glass slide. The next step included modification of the surface and then attachment of proteins to the wells. Wells were blocked with 1% BSA before kinase, 33P-ATP and buffer were added. After incubation for 30 min at 30 °C, the chips were washed extensively and exposed to both X-ray film and a phosphoimager, which has a resolution of 50 μm and is quantitative. For 12 substrates each kinase assay was repeated at least twice; for the remaining 5 the assays were performed once. b, An enlarged picture of the protein chip.
Conclusions Proteomics refers to large-scale studies of the protein
complement of a genome using a variety of experimental techniques
CONCLUSIONS Proteomics is an exciting and complex field that holds great promise for biological, medical, and bioengineering applications. Proteomics methods have now been developed to the point where their power can be demonstrated in large-scale studies of interesting biological systems. However, we are still a couple of years away from being able to use these technologies routinely the way genomics or transcriptomics methods are used. On the computational side, owing to the lack of published data, only baby steps have been taken in proteomics data analysis. The use of proteomics data in genetic circuit reconstruction also appears to be a few years away, but eventually this type of data will be useful for reconstructing at least those systems in which modifications of protein states play a central role, such as signal transduction.