Journal of Bacteriology, March 2004, p. 1311-1319, Vol. 186, No. 5
Identification and Mapping of Self-Assembling Protein Domains Encoded by the Escherichia coli K-12 Genome by Use of λ Repressor Fusions
| ABSTRACT |
|---|
Self-assembling proteins and protein fragments encoded by the Escherichia coli genome were identified from E. coli K-12 strain MG1655. Libraries of random DNA fragments cloned into a series of λ repressor fusion vectors were subjected to selection for immunity to infection by phage λ. Survivors were identified by sequencing the ends of the inserts, and the fused protein sequence was inferred from the known genomic sequence. Four hundred sixty-three nonredundant open reading frame-encoded interacting sequence tags [ISTs] were recovered from sequencing 2,089 candidates. These ISTs, which range from 16 to 794 amino acids in length, were clustered into families of overlapping fragments, identifying potential homotypic interactions encoded by 232 E. coli genes. Repressor fusions identified ISTs from genes in every protein-based functional category, but membrane proteins were underrepresented. The IST-containing genes were enriched for regulatory proteins and for proteins that form higher-order oligomers. Forty-eight [20.7%] homotypic proteins identified by ISTs are predicted to contain coiled coils. Although most of the IST-containing genes are identifiably related to proteins in other bacterial genomes, more than half of the ISTs do not have identifiable homologs in the Protein Data Bank, suggesting that they may include many novel structures. The data are available online at http://oligomers.tamu.edu/.
| INTRODUCTION |
|---|
For many proteins, quaternary structure is intimately coupled to function and stability. This coupling allows the regulation of many cellular processes to be controlled through specific assembly or disassembly of protein complexes as well as by conformational changes that alter how subunits contact one another.
Proteins use a wide variety of quaternary structures to assemble multisubunit complexes. Genome-wide identification of protein interactions by use of genetic [21, 36, 46, 57, 58] or biochemical [15, 19, 41] screens has provided a wealth of insight into the diversity of structures used for self-assembly. In the annotation of predicted open reading frames [ORFs], assembly interactions are an important feature that provides insights into structure and function. In addition, the involvement of a gene product in a multimeric complex suggests strategies for the generation of assembly-based inhibitors for functional studies [18]. The possibility that protein interactions represent a large and largely underexploited target for drug discovery has also been discussed [6, 55].
The study of the protein interactome has focused on heterotypic interactions, as these can provide links between proteins of unknown function and proteins of known function. However, homotypic interactions, which are found both in homomultimeric proteins and as subcomplexes of heteromultimeric proteins, may be the most common way to form protein complexes in nature [31]. Although by definition self-interaction does not link a protein's function to that of another protein, homotypic interactions are important in the study of protein structure, function, and evolution and should be just as useful as heterotypic interactions as potential targets for disruption in functional studies or drug development.
Homotypic interactions are poorly recovered by both two-hybrid and biochemical interaction screens. It has been shown that a modified version of a one-hybrid system based on fusion proteins to bacteriophage λ repressor can be used to identify homomultimerization domains from the Saccharomyces cerevisiae genome [36]. Our general strategy is to sample genomes for self-assembling domains from libraries of genomic DNA fragments cloned downstream of the λ repressor DNA-binding domain. Clones that confer immunity to infection by phage λ identify self-assembling proteins and protein fragments. Here, we describe a more extensive study to identify and partially localize homotypic interaction domains encoded by the Escherichia coli genome.
| MATERIALS AND METHODS |
|---|
Strains, plasmids, and media. The strains used in this study are derivatives of AG1688 [F'128 lacIq lacZ::Tn5/araD139 Δ[ara-leu]7697 Δ[lac]X74 galE15 galK16 rpsL[Strr] hsdR2 mcrA mcrB1] [20]. The repressor fusion libraries were transformed into JH787 [AG1688 [φ80 Su-3]]. The screen for insert dependence was done with LM58 [JH787 [λLM58]] and LM59 [AG1688 [λLM58]]. λLM58 is a λ imm21 specialized transducing phage that carries a pL-cat reporter. Repressor fusion libraries were constructed in pLM99 [GenBank accession no. AF308739], pLM100 [GenBank accession no. AF308740], and pLM101 [GenBank accession no. AF308741] [35-37]. These vectors contain an amber mutation at codon 103 of the cI segment, between the DNA-binding domain and the DNA insert, which is used for screening for insert dependence [see below]. Expression of the fusion proteins is driven by the P7107 promoter [59], a weak constitutive promoter derived from an operatorless PlacUV5. The P7107 promoter contains multiple mutations relative to PlacUV5 and has the sequences TTTATG and TACATT, respectively, at the -35 and -10 hexamers. While we do not know precisely how strong this promoter is, expression is below the basal levels observed in lacIq1 strains from multicopy expression vectors with the lacUV5 promoter lacking the lacO2 and lacO3 operators.
Luria broth [LB] and LB agar were prepared from premixed powders [Difco]. 2XYT broth was prepared as described by Miller [38].
Repressor fusion library construction. E. coli K-12 MG1655 [kindly provided by Debby Siegele] was used to prepare genomic DNA. Fifteen high-complexity libraries were generated by using either a multienzyme approach [22] or physical DNA shearing [43] to generate inserts used for the repressor fusion libraries. Enzymes were purchased from New England Biolabs [Beverly, Mass.] unless indicated otherwise.
For the multienzyme approach, a combination of restriction enzymes was used to partially digest E. coli genomic DNA and generate ends that are compatible with the cloning sites present in pLM99, pLM100, and pLM101. Ten micrograms of genomic DNA was partially digested for 1 h at the temperature recommended by the manufacturer. Equal amounts of separate CviTI [Megabase Research, Lincoln, Neb.] [8 U], BstUI [5 U], RsaI [2.5 U], and HpyCH4V [2.5 U] partial digests were pooled and cloned into the SmaI site of pLM99, pLM100, and pLM101 to generate three libraries [EB099, EB100, and EB101] in different reading frames. For the ET099, ET100, and ET101 libraries, partial digests of E. coli DNA with TaqI [0.2 U] were ligated into repressor fusion vectors digested with BstBI. The EN099, EN100, and EN101 libraries were generated from partially NlaIII [2.5 U]-digested E. coli DNA ligated into repressor fusion vectors digested with SphI. The ES099, ES100, and ES101 libraries were generated by using partially Sau3AI [2.5 U]-digested E. coli DNA ligated into repressor fusion vectors digested with BglII. In all cases, the digested vector DNA was treated with calf intestinal alkaline phosphatase [Roche] prior to use in ligations.
For mechanical shearing, the DNA was fragmented with a HydroShear apparatus [GeneMachines, San Carlos, Calif.] according to the manufacturer's instructions. Five micrograms of genomic DNA was subjected to 20 cycles at speed code 5. The average size of the resultant DNA fragments was about 2 kb. After shearing, the ends were converted to blunt ends by adding 4.5 U of T4 DNA polymerase and 21 U of Klenow fragment supplemented with 250 µM deoxynucleoside triphosphates in 1x EcoPol buffer [10 mM Tris-HCl [pH 7.5], 5 mM MgCl2, 7.5 mM dithiothreitol]; the reaction mixture was incubated for 40 min at 25°C. The blunt-ended DNA fragments were repurified with a Qiagen QIAquick PCR purification kit. The EH099, EH100, and EH101 libraries were generated by cloning sheared and DNA polymerase-treated DNA into the three repressor fusion vectors digested with SmaI and treated with alkaline phosphatase as described above.
The complexity of the libraries was estimated from the number of transformants obtained in the absence of phage selection. We estimate that each library contains on the order of 10^6 independent inserts. To estimate the fraction of the clones that contained inserts, primers flanking the multiple cloning site [cI-up, 5'-AGTATGCAGCCGTCACTTAG-3', and LM3-R, 5'-GGGGTTATGCTAGTTATTGC-3'] were used to PCR amplify 60 randomly chosen clones from each library. We estimate that, in each of the libraries, 95% of the clones contained a genomic insert, with a typical insert size of 1,000 ± 500 bp. Amplification reactions were done by PCR with Taq DNA polymerase [Promega].
Selection and screening procedure. Detailed procedures for selection and screening have been described previously [37]. Briefly, 10^7 JH787 transformants containing unamplified fusion libraries were plated on LB-ampicillin-kanamycin plates seeded with 10^8 PFU of λKH54 and λKH54h80/plate. The KH54 mutation is a deletion of cI, which prevents the selection phage from forming lysogens, which would be immune to λ. λKH54 and λKH54h80 use different receptors to infect E. coli; using both phages simultaneously reduces the background of receptor mutants that would be seen with only one of the two phages. The plates were incubated at 37°C overnight, and survivors were picked into 96-well plates for further analysis. We performed insert dependence tests as described previously [35-37] for the EB099, EB100, and EB101 libraries and found that all of the survivors were insert dependent, as judged by the dependence of repressor function on an amber suppressor that allows translation of the insert. Therefore, the other libraries were not evaluated for insert dependence.
Identification of interacting fragments. Cultures of isolated colonies were grown overnight in 1.5 ml of 2XYT+Amp broth [200 µg/ml] for plasmid preps in 2-ml deep-well plates [Whatman]. Plasmid DNA was extracted from the positive clones by the Promega MagnaSil method on a BioMek 2000 laboratory automation workstation. Plasmid preps were stored at 4°C until sequencing reactions were performed.
Inserts were identified by automated dye terminator DNA sequencing from the cI-up and LM3-R primers. DNA sequencing reactions were done with the ABI Big Dye terminator kit [Applied Biosystems], and sequences were obtained at the Laboratory for Plant Genome Technologies at Texas A&M University. Sequence trace files were processed with Phred [12] or Sequencher [Gene Codes Corp., Ann Arbor, Mich.]. The sequences from each end of the inserts were identified by BLAST [1] searches against the E. coli protein database [National Center for Biotechnology Information] located at ftp://ftp.ncbi.nlm.nih.gov/blast/db/ecoli.aa.Z, and the full sequences of the interacting sequence tags [ISTs] encoding self-assembling domains were inferred from the reference E. coli genome sequence. Annotations assigning gene names were from EcoGene [50]. The DNA traces, FASTA files, and BLAST reports generated for the identification of ISTs were stored in the Doodle [Database of Oligomerization Domains from Lambda Experiments] database at http://oligomers.tamu.edu [L. Mariño-Ramírez, X. Tang, and J. C. Hu, unpublished data].
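As an illustration of how an IST can be inferred once the insert ends have been placed on the genome, the following minimal Python sketch maps an insert onto an annotated ORF and reports the amino acid range covered when the fusion is read in the ORF's frame. It is not the pipeline used in this study: the `Orf` record, the `ist_from_insert` function, the restriction to forward-strand ORFs, and the example coordinates are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Orf:
    """Hypothetical ORF record; coordinates are 1-based, forward strand only."""
    name: str
    start: int  # first base of the start codon
    end: int    # last base of the stop codon


def ist_from_insert(orf: Orf, first_codon_base: int,
                    insert_end: int) -> Optional[Tuple[int, int]]:
    """Return (first_aa, last_aa) of the ORF covered by an insert, or None.

    first_codon_base is the genomic position of the first base of the first
    insert codon as read through from the upstream repressor segment.
    """
    # The fusion is in frame only if its codons line up with the ORF's codons.
    if (first_codon_base - orf.start) % 3 != 0:
        return None
    n_aa = (orf.end - orf.start + 1) // 3 - 1  # codons excluding the stop codon
    # An insert starting upstream of the ORF covers it from residue 1
    # (upstream in-frame stop codons are ignored in this sketch).
    first_aa = max(1, (first_codon_base - orf.start) // 3 + 1)
    # Count only complete codons that lie inside both the insert and the ORF.
    last_aa = min(n_aa, (min(insert_end, orf.end) - orf.start + 1) // 3)
    if last_aa < first_aa:
        return None
    return first_aa, last_aa


# Illustrative coordinates only: a 50-codon ORF plus its stop codon.
example_orf = Orf(name="hypothetical", start=100, end=252)
print(ist_from_insert(example_orf, first_codon_base=130, insert_end=200))  # (11, 33)
print(ist_from_insert(example_orf, first_codon_base=131, insert_end=200))  # None
```

A fuller treatment would also handle reverse-strand ORFs and inserts that span more than one ORF; both are left out here to keep the frame check easy to follow.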
| RESULTS |
|---|
Identification of homotypic ISTs by use of repressor fusions. The general scheme for our selection for gene fragments encoding self-assembling proteins and protein domains is shown in Fig. 1. We constructed a total of 15 libraries containing quasi-random genomic DNA fragments of the E. coli K-12 strain MG1655 as described in Materials and Methods. Each of the repressor fusion libraries was then subjected to selection for phage immunity, and the ends of the inserts from 2,089 survivor clones were sequenced. By comparing the end sequences to the MG1655 [4] reference sequence, we identified the cloned segments in each candidate, which we refer to as ISTs [21]. The immune clones identified fall into two categories: ORF-encoded clones [2,005 clones] and non-ORF-encoded clones [84 clones]. An ORF-encoded IST is defined as a DNA fragment from a repressor fusion that is read in the same reading frame as it is in annotated E. coli ORFs in the EcoGene database [50]. The 2,005 ORF-encoded clones yielded 463 nonredundant ORF-encoded ISTs. These ISTs were clustered into families of overlapping fragments, identifying potential homotypic interactions in 232 E. coli proteins [Table 1]. Most of the non-ORF-encoded fusions were very short, typically 12 to 20 amino acids [aa] in length, similar to those observed in previous studies [23, 60]. A current list of ORF-encoded ISTs is available in the Doodle database at http://oligomers.tamu.edu.
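The reduction from sequenced clones to nonredundant ISTs and then to families of overlapping fragments can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used to build Table 1: an IST is represented as a hypothetical `(gene, first_aa, last_aa)` tuple, two fragments are placed in the same family when their residue ranges share at least one residue, and the gene names and coordinates are invented.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Ist = Tuple[str, int, int]  # (gene, first_aa, last_aa); hypothetical representation


def cluster_overlapping(ists: Iterable[Ist]) -> Dict[str, List[List[Tuple[int, int]]]]:
    """Group nonredundant ISTs of each gene into families of overlapping fragments."""
    by_gene: Dict[str, Set[Tuple[int, int]]] = defaultdict(set)
    for gene, first, last in ists:
        by_gene[gene].add((first, last))  # identical clones collapse to one IST

    families: Dict[str, List[List[Tuple[int, int]]]] = {}
    for gene, fragments in by_gene.items():
        clusters: List[List[Tuple[int, int]]] = []
        cluster_end = 0
        for first, last in sorted(fragments):
            if clusters and first <= cluster_end:
                # Overlaps the running cluster: extend it.
                clusters[-1].append((first, last))
                cluster_end = max(cluster_end, last)
            else:
                # No overlap: start a new family for this gene.
                clusters.append([(first, last)])
                cluster_end = last
        families[gene] = clusters
    return families


# Invented clones: geneA has a duplicate and two separate regions.
clones = [
    ("geneA", 1, 100), ("geneA", 50, 150), ("geneA", 1, 100),
    ("geneA", 300, 400), ("geneB", 10, 60),
]
for gene, clusters in sorted(cluster_overlapping(clones).items()):
    print(gene, clusters)
```

Sorting the fragments by start position means each fragment only has to be compared with the end of the running cluster, so the grouping is a single pass per gene.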
Annotation of proteins containing homotypic interactions. Figure 2 shows the distribution of genes with ISTs based on the functional classification in GenProtEC [48]. Repressor fusions identified ISTs from genes in every protein-based functional category. Overall, the distribution of IST-containing genes is similar to that observed for the complete genome. However, relative to the genome, repressor fusions are underrepresented in the functional categories for cell structure and transport [which contain many membrane proteins].
Nevertheless, the ISTs from the 27 nonredundant proteins found in the cell structure and transport categories include 9 that are annotated as integral membrane or transmembrane proteins in SWISSPROT. In these cases, the IST could correspond to a periplasmic or cytoplasmic oligomerization domain from a membrane protein. For example, a fragment of the Kch protein corresponding to an intracellular C-terminal dimerization domain was found as an IST. This domain contains a conserved hydrophobic dimer interface also found in eukaryotic transporters [26]. ISTs were also found in the C-terminal domain of YajC, a membrane-associated protein that interacts with the SecA translocation machinery [10, 42], and in EmrE, a multidrug transporter that has been shown to be oligomeric [49].
Genes from the regulation category are overrepresented among the IST-containing genes, consistent with the idea that oligomeric transcription factors are a major component of this functional category. ISTs were found in 62 proteins annotated as transcription factors. The most abundant family of transcriptional regulators in E. coli is the LysR family. Thirteen of the 46 LysR family members were identified. Nine of the 12 members of the DeoR family of transcriptional regulators were identified, as were 5 of the 15 PurR family members and 3 of the 4 RpiR family members.
The diversity of proteins identified by ISTs is also reflected in the evolutionary families they represent. The clusters of orthologous groups of proteins [COGs] database classifies the proteins encoded by 43 sequenced genomes according to their homologous relationships [52]. Of the 232 homo-oligomeric proteins identified by ISTs, 210 have a COG assignment, indicating that they are members of conserved protein families. These 210 proteins are distributed among 153 different COGs of 1,905 present in the E. coli genome [Table 1].
We are especially interested in ISTs that might identify new oligomerization domains or motifs. However, we expect that many of the ISTs will be from proteins whose structural basis for assembly has already been determined. To determine how the ISTs are distributed among known and unknown structures, we performed BLAST sequence similarity searches against the Protein Data Bank [PDB] [3] database. Using a cutoff E value of 10^-6 and sequence identities of more than 70% to detect E. coli proteins or very close homologs, we found that 23 of the 232 proteins identified by ISTs have structures in the PDB. Twenty-one of the 23 structures found are annotated as homotypic oligomers in the Protein Quaternary Structure [PQS] database [17] with a variety of oligomerization states [Fig. 3a]. Although repressor fusions are able to find homodimers, our selection appears to be biased towards recovering higher-order oligomers [Fig. 3b].
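The two thresholds used in the PDB search can be expressed as a simple filter over BLAST hits. The sketch below is illustrative only: the `Hit` record, the function name, and the example hits are hypothetical, and treating the E value cutoff as inclusive is an assumption, since the text does not say whether hits exactly at 10^-6 were kept.

```python
from typing import Iterable, NamedTuple, Set


class Hit(NamedTuple):
    """Hypothetical summary of one BLAST hit against the PDB sequences."""
    query: str               # IST-containing E. coli protein
    subject: str             # PDB entry
    percent_identity: float
    evalue: float


def proteins_with_close_pdb_hit(hits: Iterable[Hit],
                                max_evalue: float = 1e-6,
                                min_identity: float = 70.0) -> Set[str]:
    """Return queries with a hit at or below the E value cutoff and above the identity cutoff."""
    return {
        hit.query
        for hit in hits
        if hit.evalue <= max_evalue and hit.percent_identity > min_identity
    }


# Invented hits: only protA passes both thresholds.
example_hits = [
    Hit("protA", "pdb1", 98.0, 1e-50),
    Hit("protB", "pdb2", 45.0, 1e-20),   # significant but too divergent
    Hit("protC", "pdb3", 90.0, 1e-3),    # similar but not significant
]
print(sorted(proteins_with_close_pdb_hit(example_hits)))  # ['protA']
```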
Coiled-coil predictions with the COILS2 algorithm [34] for all the E. coli ORFs revealed that 495 ORFs, or 11.5%, are predicted to contain coiled coils. Forty-eight homo-oligomeric proteins identified here [20.7%] are predicted to form coiled coils, indicating that the homotypic interaction dataset is enriched for coiled coils. In 40 of these cases [83.3%], the IST includes the region encoding the coiled coil.
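The percentages above, and the size of the enrichment, can be checked with a few lines of Python. The counts (48, 232, 40) and the 11.5% genome-wide frequency are taken from the text; the binomial tail probability is an added illustration and an assumption of this sketch, not a test reported in the study, and it approximates sampling without replacement from the genome.

```python
from math import comb


def percent(part: int, whole: int) -> float:
    return 100.0 * part / whole


def binomial_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k, n + 1))


genome_fraction = 0.115                # 495 ORFs, 11.5% of all ORFs (from the text)
ist_percent = percent(48, 232)         # coiled-coil proteins among IST-containing proteins
covered_percent = percent(40, 48)      # cases where the IST includes the coiled coil

print(f"{ist_percent:.1f}%")           # 20.7%
print(f"{covered_percent:.1f}%")       # 83.3%
print(f"{ist_percent / (100 * genome_fraction):.1f}-fold enrichment")  # 1.8-fold enrichment

# Chance of seeing 48 or more coiled-coil proteins among 232 if each had the
# genome-wide probability of 11.5% (illustrative binomial model only).
print(binomial_tail(48, 232, genome_fraction))
```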
Mapping assembly domains within ORFs. The position of an IST within a gene defines a region sufficient for forming a homotypic interaction. The sizes of the ISTs range from 16 aa for EmrE to 794 aa for YebT. Figure 4a shows the distribution of the lengths of the shortest IST found for each IST-containing gene as a fraction of the length of the complete ORF. Although a majority of the ISTs comprise >90% of the full-length gene product, in many cases the shortest ISTs suggest the presence of a distinct domain that is sufficient for oligomerization. Some genes are represented by single ISTs, whereas in other cases, several ISTs are found for the same gene. The ISTs from the multienzyme libraries generate more multiple hits to the same genes than the libraries made by random shearing. Multiple ISTs in a gene can be used to identify the minimal region or regions involved in a homotypic interaction. For example, we found eight ISTs from ParC, the A subunit of DNA topoisomerase IV; the overlap between these maps the oligomerization domain to aa 333 to 475 [Fig. 4b].
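Mapping a minimal region from several overlapping ISTs amounts to taking the intersection of their residue ranges. The sketch below shows the idea; the function name is hypothetical and the three fragments are invented so that their overlap reproduces the aa 333 to 475 region quoted for ParC. They are not the eight ParC ISTs actually recovered.

```python
from typing import List, Optional, Tuple


def minimal_common_region(fragments: List[Tuple[int, int]]) -> Optional[Tuple[int, int]]:
    """Return the residue range shared by all fragments, or None if there is none."""
    if not fragments:
        return None
    first = max(start for start, _ in fragments)  # latest start
    last = min(end for _, end in fragments)       # earliest end
    return (first, last) if first <= last else None


# Invented fragments chosen so that the shared region is aa 333 to 475.
illustrative_ists = [(300, 475), (333, 600), (310, 520)]
print(minimal_common_region(illustrative_ists))  # (333, 475)
```

When the fragments of a gene fall into more than one non-overlapping group, as described for MutS and ClpB, the intersection is empty and each group has to be treated as a separate candidate region.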
In most cases the ISTs overlap, suggesting that a single region is required for oligomerization. However, ISTs only identify regions that are sufficient to self-assemble and do not rule out the possibility that more than one part of a protein can oligomerize. For MutS, the oligomerization domain in the crystal structure does not overlap with the minimal IST we identified between aa 789 and 853 [Fig. 5a]. The homodimeric E. coli MutS structure was determined by using a fragment containing aa 1 to 800. The IST corresponds to an additional oligomerization domain at the carboxy terminus of the MutS protein, which allows MutS dimers to form tetramers [32].
Two ISTs, containing residues 68 to 558 and 529 to 857, were found in the ClpB protein, a heptameric ATP-dependent chaperone [28]. Although the two ISTs overlap, each contains one of the two AAA motif domains identified within ClpB [Fig. 5b]. The oligomerization of these two segments of ClpB has been examined in vitro [39, 53]. The C-terminal IST contains sequences shown to form hexamers. Interestingly, the N-terminal IST corresponds to a domain that behaves like a monomer in vitro, indicating that the immunity phenotype of the N-terminal IST could involve either improper folding of this fragment when fused to cI or bridging interactions with some other molecule. The N-terminal domain of ClpB has been shown to bind unfolded protein [53], raising the possibility that cI-ClpB [aa 68 to 558] self-assembles by binding to unfolded parts of the fusion protein.
In four cases, the homotypic ISTs correspond to domains of known structure [SucB, Kch, ArgR, and DnaX]. The crystal structure of an oligomeric domain of SucB was determined for a fragment located between aa 173 and 405 [PDB accession no. 1E2O]. The minimal IST was found between aa 191 and 405, and the amino acids not present in the IST are unstructured in the crystal structure. The oligomerization domain of the Kch protein is located between aa 241 and 393 [PDB accession no. 1ID1], and its minimal IST was found between aa 229 and 405. The hexamerization domain of the arginine repressor [ArgR] is located between aa 80 and 156 [PDB accession no. 1XXA], and its minimal IST was found between aa 48 and 156. The minimal IST for DnaX was found between aa 247 and 455, which overlaps with the amino acids seen in the oligomerization domain of the gamma subunits in the clamp loader complex [aa 1 to 373; PDB accession no. 1JR3]. Translational frameshifting within dnaX generates two gene products, tau and gamma [5, 14, 56]. The domain III fragment [aa 222 to 382] of both tau and gamma has been shown to form homotetramers in vitro [7, 16].
| DISCUSSION |
|---|
Using large-scale functional selections with repressor fusions, we identified homotypic interactions for 232 proteins encoded by the E. coli genome. As with a similar study with yeast genomic fragments [37], there are several criteria to support the idea that the ISTs identified here represent bona fide oligomerization domains. First, the strong bias toward fusions from annotated ORFs and in the correct reading frame is consistent with a requirement for correct folding; at higher expression levels, peptides encoded by sequences that are not in frame with annotated ORFs are common [23, 60]. Second, in several cases, structural or biochemical evidence in the literature supports the oligomerization state of specific ISTs. Third, in no case do we identify a fusion to a protein where we have been able to find evidence that the fused domain should be monomeric. Nevertheless, for many of the genes identified here, the ISTs should be viewed as strong but not definitive evidence for oligomerization. False positives, while rare, are expected in the repressor system under our conditions. For example, repressor fusions to E. coli dihydrofolate reductase, a well-characterized monomeric protein, are immune to phage infection and purify as a mixture of monomers and dimers [unpublished data].
Among the genes identified by ISTs, we find oligomerization domains that have been previously identified and many that are novel. The proteins with ISTs that have entries or close homologs in the PDB not only serve as positive controls but also give an idea of the range of different homotypic molecular architectures that can be identified by use of repressor fusions. We used annotations from the PQS database [17] to evaluate the oligomerization of ISTs that correspond to known protein structures. PQS uses an automated algorithm to guess the oligomerization state of a protein by evaluating the surface area buried by protein-protein contacts in crystal structures. While PQS annotations are not perfect, they provide a best guess in cases where biochemical data are not available. In the few cases where PQS annotations do not mark IST-containing proteins as homotypic oligomers, there is good reason to believe that they are homo-oligomeric. GlnG [NtrC] and DnaX have structures in the PDB but are not part of the homotypic PQS subset. However, it is well established that NtrC forms homotypic oligomers [30, 45], and repressor fusions have been used to study the oligomerization determinants within NtrC [13]. The dnaX gene encodes two proteins, the tau and gamma subunits of DNA polymerase III holoenzyme. Translational frameshifting occurs at residue 430, which is within the DnaX ISTs. Thus, the phage immunity of these constructs could be due either to cI fusions to segments of tau, which dimerizes to hold together the two catalytic subunits in the DNA polymerase III holoenzyme [29], or to cI fusions to segments of the gamma subunit, which forms a homotetramer in vitro [7] and is part of a heteropentameric subcomplex of the clamp loader in DNA polymerase III holoenzyme in vivo [25]. For FruR, a member of the PurR family of transcriptional regulators, a nuclear magnetic resonance structure is available for the N-terminal part of the protein [PDB accession no. 1UXD]. However, the ISTs for FruR include C-terminal domains that are not present in the nuclear magnetic resonance structure. Although the oligomeric state of FruR is unknown, the sequences corresponding to the FruR IST form homodimers or homotetramers in other members of the PurR family.
From analysis of E. coli proteins of known structure and their close relatives, it is likely that on the order of half of the proteins encoded in the genome are involved in homotypic assemblies or subassemblies of larger heterotypic complexes [L. Mariño-Ramírez and J. C. Hu, unpublished data]. The 232 proteins identified by ISTs thus represent a sampling of the possible oligomerization domains encoded in the E. coli genome rather than an exhaustive enumeration.
We find oligomeric proteins in all functional categories. The ISTs are biased toward transcription factors and against membrane proteins. The bias toward transcription factors is likely to reflect the tendency of regulators to be active oligomers at very low expression levels, comparable to those used here to avoid false positives. In addition, the low expression levels of most transcription factors may be favorable for the recovery of ISTs, as abundant dimeric proteins could interfere with the activity of repressor fusions by titrating them into inactive heteromultimeric complexes. The recovery of ISTs will also be dependent on the topology of the repressor fusions; the repressor domains must be close enough to each other to bind the operator half-sites. This may prevent fusions to integral membrane proteins from properly localizing. A recent report of the use of repressor fusion vectors specifically tailored to detect transmembrane domains identifies nine proteins that were missed here [33]. That study assayed for the activity of repressor fusion proteins at expression levels above those used here.
Among the 232 proteins, we found several proteins that have been previously identified by others by use of repressor fusions: IbpB [23], PspA [11], and NtrC [13]. However, several E. coli proteins that are known to form active repressor fusions were not found in our screen: FtsZ [8], MalK [27], PhoB [13], Fur [51], BglG [2], and YigA [23]. This is consistent with the idea that our screen was not saturated. In addition, there are many proteins we have not recovered as ISTs that we think should be recoverable in repressor fusions. These include LacZ, LacI, CAP, TrpB, and many other well-studied stable oligomers. In several cases, we obtained ISTs for some members of a conserved family of proteins but not from others that are likely to oligomerize similarly. The LysR family of transcriptional regulators [LTTRs] is the second largest family of proteins present in the E. coli genome. It is likely that all LTTRs assemble through similar homotypic interactions into dimers or tetramers, but we recovered ISTs for only 13 of the 43 members of the LTTR family found in E. coli. Similarly, we have identified ISTs for some, but not all, members of the PurR family of transcription factors.
Although the numbers of clones subjected to phage selection should be sufficient to provide full coverage, the ISTs recovered from both sheared and restriction enzyme libraries are still missing many oligomeric proteins, indicating that nonrandom factors remain. Although random shearing provided a dramatic improvement in coverage compared to previous studies which used partial restriction digests only [54], several factors may skew the recovery of ISTs even from sheared DNA. First, the shearing process itself may not be perfectly random. Second, different numbers of fusion junctions may be possible for different oligomeric proteins, so that some proteins would be overrepresented even in a perfectly random library. Third, genes that are adjacent to other genes that are toxic in high copy will be underrepresented. Fourth, our ability to recover oligomerization domains depends, of course, on the ability of a fusion protein to assemble enough oligomers to provide immunity to phage λ infection. Some oligomeric proteins may simply have dissociation constants that are too high to support repressor activity at the expression levels provided by the weak constitutive promoter in our vectors [37]. Based on the phenotypes of fusions to variant GCN4 leucine zippers [59, 61], we estimate that cells expressing dimeric repressor fusions with dissociation constants in the low micromolar range should be immune to λ. However, these estimates are based on many extrapolations from in vitro to in vivo conditions and may not be applicable to other proteins. Note also that steady-state expression levels are not the only factor affecting whether a clone is recovered. Freshly transformed cells must express sufficient repressor activity from the plasmid to confer immunity before encountering a phage particle seeded on the plate. We know that the plating efficiency on phage plates after transformation with plasmids carrying different repressor fusions varies over several orders of magnitude under the conditions we used to prevent the recovery of transformation siblings [data not shown]. The observed bias toward recovering higher-order oligomers may reflect their improved ability to bind cooperatively to adjacent operators within OR and OL or to form looped repressor-operator complexes between OR and OL [9, 47].
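To illustrate why expression level and dissociation constant together affect whether a dimeric fusion can be recovered, the following sketch computes the fraction of subunits present as dimer for a simple monomer-dimer equilibrium. It is a textbook two-state model offered under stated assumptions, not a calculation from this study: it assumes Kd = [M]^2/[D], ignores DNA binding, cooperativity, and protein turnover, and the concentrations are hypothetical values rather than measured expression levels.

```python
from math import sqrt


def fraction_in_dimer(total_conc: float, kd: float) -> float:
    """Fraction of subunits in dimers for 2M <=> D with Kd = [M]^2 / [D].

    total_conc is the total subunit concentration, [M] + 2[D], in the same
    units as kd. Mass balance gives 2[M]^2/Kd + [M] - total_conc = 0.
    """
    if total_conc <= 0 or kd <= 0:
        raise ValueError("concentrations and Kd must be positive")
    # Positive root of the quadratic in [M].
    monomer = kd * (sqrt(1.0 + 8.0 * total_conc / kd) - 1.0) / 4.0
    return 1.0 - monomer / total_conc


# Hypothetical values in molar units: a 1 µM Kd at three expression levels.
for total in (1e-8, 1e-6, 1e-4):
    print(f"{total:.0e} M total: {fraction_in_dimer(total, 1e-6):.2f} in dimer")
```

In this model half of the subunits are dimeric when the total subunit concentration equals Kd, and the dimer fraction falls off steeply below that point, which is the qualitative behavior invoked above.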
While the present screen has not reached saturation, the ISTs we identified already provide us with a wealth of information about specific E. coli proteins and about oligomeric proteins in general. The identification of an IST defines oligomerization as a biochemical property of the protein containing it and often maps the oligomerization domain within the protein coding sequence. This is often the only functional annotation for hypothetical ORFs and may provide an entry for further studies of biological function. For example, repressor fusions can be used to study how the activity of specific proteins is modulated by controlling oligomerization [2, 24].
In many cases, ISTs identify an oligomerization domain in a specific region of the protein. This suggests the existence of a contiguous and independent folding unit in the protein that drives oligomerization. In other cases, the ISTs found are very close to the full length of the protein. In some cases, the entire protein may be necessary to observe the homotypic interaction. For example, we recovered multiple clones encoding FolX fusions to aa 1 to 120, suggesting that the entire protein was required for a homotypic interaction. FolX forms an octameric ring-like structure where the entire protein appears to be required for proper folding [PDB accession no. 1B9L] [44]. However, we cannot conclude that subdomains are not sufficient for the oligomerization of most proteins where the IST covers the entire ORF, as we may have simply failed to sample the appropriate fragments.
More than half of the proteins identified with repressor fusions do not have an identifiable homotypic homolog in the PDB and may represent new folds. The identification of oligomeric subdomains may be useful for structure determination. Multidomain proteins are often difficult to study, and the ISTs should define independently folding domains that may be more amenable to structure determination than the full-length protein.
IST data can be combined with evolutionary analysis to provide better domain mapping and functional assignments. For example, multiple-sequence alignments of the DeoR family of transcription factors suggest two conserved domains separated by a nonconserved linker. Sequence analysis identified a helix-turn-helix located towards the N terminus of these proteins. The best-characterized member of this family, the DeoR repressor, appears to be an octamer in solution [40], but the location of the oligomerization domain was not previously described. Nine of the members of the DeoR family of transcriptional regulators were identified by using repressor fusions [Fig. 6]. The ISTs include various amounts of the C-terminal end of the ORF, assigning oligomerization function to the conserved C-terminal domain.
| ACKNOWLEDGMENTS |
|---|
This work was supported by Public Health Service Grant R01GM63652-01 from the NIGMS to J.C.H. L.M.-R. was supported in part by a Fulbright/Colciencias/IIE predoctoral fellowship. N.R. was supported in part by the National Science Foundation REU program.
We thank Patricia Klein and Eun-Gyu No for invaluable help with library construction and DNA sequencing, respectively. Rodolfo Aramayo, John Mullet, Debby Siegele, and members of the Hu lab provided useful advice and discussions. Additional technical assistance was provided at various stages of the project by Barbara Blum, Brian Hatten, and Svenja Simon-Marshall.
| FOOTNOTES |
|---|
* Corresponding author. Mailing address: Department of Biochemistry and Biophysics, Texas A&M University, 2128 TAMU, College Station, TX 77843-2128. Phone: [979] 862-4054. Fax: [979] 845-4946. E-mail: jimhu@tamu.edu.
Present address: Computational Biology Branch, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894.
Present address: DCMB, University of Texas Southwestern Graduate School, Dallas, TX 75390.