<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art><ui>1745-7580-6-8</ui><ji>1745-7580</ji><fm>
<dochead>Research</dochead>
<bibl>
<title>
<p>An integrated approach to epitope analysis II: A system for proteomic-scale prediction of immunological characteristics</p>
</title>
<aug>
<au ca="yes" id="A1"><snm>Bremel</snm><mi>D</mi><fnm>Robert</fnm><insr iid="I1"/><email>robert_bremel@iogenetics.com</email></au>
<au id="A2"><snm>Homan</snm><mnm>Jane</mnm><fnm>E</fnm><insr iid="I1"/><email>jane_homan@iogenetics.com</email></au>
</aug>
<insg>
<ins id="I1"><p><sup>1</sup>ioGenetics LLC, 3591 Anderson Street, Madison, WI 53704, USA</p></ins>
</insg>
<source>Immunome Research</source>
<issn>1745-7580</issn>
<pubdate>2010</pubdate>
<volume>6</volume>
<issue>1</issue>
<fpage>8</fpage>
<url>http://www.immunome-research.com/content/6/1/8</url>
<xrefbib><pubidlist><pubid idtype="pmpid">21044290</pubid><pubid idtype="doi">10.1186/1745-7580-6-8</pubid></pubidlist></xrefbib>
</bibl>
<history><rec><date><day>19</day><month>5</month><year>2010</year></date></rec><acc><date><day>2</day><month>11</month><year>2010</year></date></acc><pub><date><day>2</day><month>11</month><year>2010</year></date></pub></history>
<cpyrt><year>2010</year><collab>Bremel and Homan; licensee BioMed Central Ltd.</collab><note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note></cpyrt>
<abs>
<sec>
<st>
<p>Abstract</p>
</st>
<sec>
<st>
<p>Background</p>
</st>
<p>Improving our understanding of the immune response is fundamental to developing strategies to combat a wide range of diseases. We describe an integrated epitope analysis system which is based on principal component analysis of sequences of amino acids, using a multilayer perceptron neural net to conduct QSAR regression predictions for peptide binding affinities to 35 MHC-I and 14 MHC-II alleles.</p>
</sec>
<sec>
<st>
<p>Results</p>
</st>
<p>The approach described allows rapid processing of single proteins, entire proteomes or subsets thereof, as well as multiple strains of the same organism. It enables consideration of the interface between the diversity of microorganisms and the diversity of host immunogenetics. Patterns of binding affinity are linked to topological features, such as extracellular or intramembrane location, and integrated into a graphical display which facilitates conceptual understanding of the interplay of B-cell and T-cell mediated immunity.</p>
<p>Patterns which emerge from application of this approach include the correlations between peptides showing high affinity binding to MHC-I and to MHC-II, and also with predicted B-cell epitopes. These are characterized as coincident epitope groups (CEGs). Also evident are long range patterns across proteins which identify regions of high affinity binding for a permuted population of diverse and heterozygous HLA alleles, as well as subtle differences in reactions with MHCs of individual HLA alleles, which may be important in disease susceptibility, and in vaccine and clinical trial design. Comparisons are shown of predicted epitope mapping derived from application of the QSAR approach with experimentally derived epitope maps from a diverse multi-species dataset, from <it>Staphylococcus aureus</it>, and from vaccinia virus.</p>
</sec>
<sec>
<st>
<p>Conclusions</p>
</st>
<p>A desktop application with interactive graphic capability is shown to be a useful platform for development of prediction and visualization tools for epitope mapping at scales ranging from individual proteins to proteomes from multiple strains of an organism. The possible functional implications of the patterns of peptide epitopes observed are discussed, including their implications for B-cell and T-cell cooperation and cross presentation.</p>
</sec>
</sec>
</abs>
</fm><meta>
<classifications>
<classification id="refman" subtype="user_supplied_xml" type="bmc"/>
</classifications>
</meta><bdy>
<sec>
<st>
<p>Background</p>
</st>
<p>The availability of proteomic information is increasing exponentially. This is especially true for pathogenic microorganisms. Integrating and interpreting the vast amounts of data generated by proteomic analysis, so that they are useful to bench scientists and clinicians, is a growing challenge. Achieving this goal is essential if bioinformatic analysis is to lead to improved vaccines and antibody therapies and to a better understanding of patient and population responses to infections, cancers, autoimmune epitopes, and allergens. Experimental approaches to definition of epitopes are time consuming and expensive; predictive methods can provide maps which could reduce the effort needed in experimental characterization.</p>
<sec>
<st>
<p>Current Challenges in Epitope Analysis</p>
</st>
<p>In reviewing approaches to epitope characterization described in the literature, both experimentally and through the use of computer-based analysis, three broad shortcomings become apparent.</p>
<p>First, literature reports of experimental approaches to epitope characterization have often been narrow in scope, based on the response of individual patients, cells from a few individual donors or single strains of mice, or focused on isolated peptides. This has generated valid data, but data that are specific to a narrow set of circumstances and do not reflect the broader host or organism population. Discovering binding affinity for an MHC molecule of a single HLA haplotype will not necessarily be predictive for a population of diverse heterozygous individuals. Many literature reports claim T-cell epitope characterization but fail to report the MHC restriction (mouse) or HLA of cells used. Limiting consideration to isolated peptides overlooks an important feature of cell biology. Binding to MHC-I and MHC-II molecules is a competitive and dynamic process <abbrgrp>
<abbr bid="B1">1</abbr>
<abbr bid="B2">2</abbr>
</abbrgrp>. MHC molecules bind to peptides selected from among all those competitors which result from the proteolysis of the whole organism. Predictive determinations of preferential epitope binding can thus be made only in the context of the whole proteome or, at the very least, the whole protein, and not for isolated peptides.</p>
<p>Second, from an epidemiologic perspective the outcome of infection is dependent on the interface between a population of heterozygous hosts and a diverse array of microbial strains. Many interactions of individual and strain are possible. Depending on the context, the challenge in vaccine design may be to choose the best combination of epitopes conserved across multiple strains of an organism to protect an entire immunogenetically diverse community (for infectious diseases), or to select the immunostimulant optimal for a specific patient (in cancer immunotherapeutics).</p>
<p>Third, while there is broad recognition that strong T-cell responses are essential to good memory, and in many cases to effective immunity, efforts to characterize B-cell and T-cell responses have not always been well integrated.</p>
<p>B-cell and T-cell cooperative interaction in antigen presentation has been the subject of many landmark papers <abbrgrp>
<abbr bid="B3">3</abbr>
<abbr bid="B4">4</abbr>
<abbr bid="B5">5</abbr>
<abbr bid="B6">6</abbr>
</abbrgrp>. More recently, Sette <it>et al </it>demonstrated that, at least for vaccinia, T-cell stimulation is specific to a B-cell epitope located within the same protein <abbrgrp>
<abbr bid="B7">7</abbr>
</abbrgrp>, pointing to a close determinant association between B-cell and T-cell epitopes. Cross reactivity, or polyspecificity, is a necessary feature of the T-cell recognition of epitopes comprised of MHC-peptide complexes <abbrgrp>
<abbr bid="B8">8</abbr>
<abbr bid="B9">9</abbr>
</abbrgrp>.</p>
<p>There has been increasing recognition that, for both anti-infective immunity and cancer immunity, distinctions between the roles of MHC-I and MHC-II in responding to intracellular or extracellular organisms are not clear cut <abbrgrp>
<abbr bid="B10">10</abbr>
<abbr bid="B11">11</abbr>
<abbr bid="B12">12</abbr>
<abbr bid="B13">13</abbr>
</abbrgrp>. MHC-II molecules bind longer peptides (15-20 amino acids) whereas MHC-I molecules bind shorter peptides of ~9 amino acids or fewer <abbrgrp>
<abbr bid="B1">1</abbr>
</abbrgrp>. Binding of MHC molecules to peptides is characterized by a large degree of degeneracy and it is now recognized that a particular MHC molecule may bind peptides that vary widely in composition and origin <abbrgrp>
<abbr bid="B9">9</abbr>
</abbrgrp>.</p>
<p>B-cell epitopes may be continuous or discontinuous peptides, in some cases requiring multiple linear peptides to be configured together to make up a complete epitope <abbrgrp>
<abbr bid="B14">14</abbr>
</abbrgrp>. Location of B-cell epitope motifs in loops external to the cell membrane may allow for grouping into a multi-component epitope. Multiple peptides may need to act together to provide an immunostimulant adequate to initiate a B-cell response. Batista has described the need for B-cells to have sufficient stimulation to form immune synapses, initiating and enabling the uptake of surface proteins <abbrgrp>
<abbr bid="B15">15</abbr>
</abbrgrp>. In other cases B-cell responses occur independent of T-cell stimulation <abbrgrp>
<abbr bid="B16">16</abbr>
</abbrgrp>.</p>
<p>Most successful antimicrobial vaccines target surface exposed B-cell epitopes and vaccines have been evaluated by their ability to stimulate an antibody response. Peptide epitopes are a major component of the overall epitope complex, or immunome, and are genetically specified. In many cases antibodies to bacterial proteins are indeed protective, and complement fixing antibodies have been used as an index of vaccinal efficacy <abbrgrp>
<abbr bid="B17">17</abbr>
</abbrgrp>.</p>
<p>Immunization protocols for laboratory production of antibodies have long recognized the utility of linkage to a known T-cell epitope <abbrgrp>
<abbr bid="B18">18</abbr>
<abbr bid="B19">19</abbr>
</abbrgrp>. T-cell responses to epitopes arrayed in an organism of interest are harder to evaluate <abbrgrp>
<abbr bid="B20">20</abbr>
</abbrgrp>. Those working in reverse vaccinology <abbrgrp>
<abbr bid="B21">21</abbr>
</abbrgrp> have been frustrated by the difficulty of reliably characterizing T-cell epitopes <abbrgrp>
<abbr bid="B17">17</abbr>
</abbrgrp>. Proteins with multiple transmembrane domains have proven challenging to express as sub-unit vaccines <abbrgrp>
<abbr bid="B21">21</abbr>
</abbrgrp>. In understanding the interaction of B-cell and T-cell responses, it is therefore useful to readily understand the topology of epitopes relative to the cell membrane. In the case of immunotherapeutic cancer vaccines, the ability to stimulate a multifaceted T-cell response may be even more necessary <abbrgrp>
<abbr bid="B22">22</abbr>
<abbr bid="B23">23</abbr>
</abbrgrp>.</p>
</sec>
<sec>
<st>
<p>State of the Art: Epitope prediction programs</p>
</st>
<p>Various bioinformatic programs for B-cell epitope and T-cell epitope analysis are available on the Internet (Additional File <supplr sid="S1">1</supplr>) and have contributed significantly to our understanding. However, a number of limitations are evident. Limits on the sequence size which can be submitted to website servers generally allow only single-protein analysis and thus preclude contextual understanding of competitive binding affinity for a whole proteome.</p>
<suppl id="S1">
<title>
<p>Additional File 1</p>
</title>
<text>
<p>
<b>Listing of internet sites with relevant computing and resource sites (PDF)</b>.</p>
</text>
<file name="1745-7580-6-8-S1.PDF">
   <p>Click here for file</p>
</file>
</suppl>
<sec>
<st>
<p>B-cell epitope predictions</p>
</st>
<p>Schemes for prediction of B-cell epitopes have been available for nearly 30 years. Hopp and Woods <abbrgrp>
<abbr bid="B24">24</abbr>
</abbrgrp> first proposed the use of amino acid sequences to identify the most immunogenic regions in proteins and recognized the relative importance of surface exposure, a concept furthered by Parker <it>et al </it>
<abbrgrp>
<abbr bid="B25">25</abbr>
</abbrgrp>. By using various lengths of peptides as indices to produce scoring metrics, about 70% of the epitopes in a small set of proteins could be accurately predicted. A wide array of methods has been published since, but the predictive performance has not greatly improved <abbrgrp>
<abbr bid="B26">26</abbr>
</abbrgrp>. The field has recently been critically reviewed by Davydov and Tonevitsky <abbrgrp>
<abbr bid="B27">27</abbr>
</abbrgrp>, who use the area under the receiver operating characteristic curve (AROC), a preferred metric for binary classification, as their method of evaluation. When recalculated on this basis, the accuracy reported by Hopp and Woods and the contemporary AROC values are not substantially different.</p>
<p>The availability of the BepiPred program over the Internet (on the servers at the Center for Biological Sequence Analysis (CBS)), and its ability to process partial proteome-scale sequence data, led us to initially utilize this program <abbrgrp>
<abbr bid="B28">28</abbr>
</abbrgrp>. Interestingly, the algorithms rely heavily on the work of Parker <abbrgrp>
<abbr bid="B25">25</abbr>
</abbrgrp>. We subsequently found that the amino acid principal components NN regression approach, which we describe in the accompanying paper <abbrgrp>
<abbr bid="B29">29</abbr>
</abbrgrp>, and which uses the physical property data sets of Hopp and Woods <abbrgrp>
<abbr bid="B30">30</abbr>
</abbrgrp>, Parker <it>et al </it>
<abbrgrp>
<abbr bid="B25">25</abbr>
</abbrgrp>, and others, could produce outputs indistinguishable from BepiPred <abbrgrp>
<abbr bid="B28">28</abbr>
</abbrgrp>. This enabled us to consolidate the computations into a single platform along with MHC binding predictions and facilitated integration with genomic data processing programs.</p>
</sec>
<sec>
<st>
<p>MHC binding Predictions</p>
</st>
<p>DeGroot reviews T-cell epitope mapping systems available publicly and developed commercially <abbrgrp>
<abbr bid="B31">31</abbr>
</abbrgrp>. Many T-cell epitope prediction programs depend on substitution matrix scoring of individual amino acids. As we have discussed in a companion paper <abbrgrp>
<abbr bid="B29">29</abbr>
</abbrgrp>, this does not provide a complete physicochemical description of the binding relationship. Substitution matrices are the backbone of bioinformatics, but were originally developed to assist in understanding evolutionary genetic relationships, not physicochemical properties. Quantitative structure activity relationship (QSAR) approaches that utilize the physicochemical properties of interacting species as a foundation are a more appropriate method. These have been applied by one group but in the context of peptides rather than proteins or proteomes <abbrgrp>
<abbr bid="B32">32</abbr>
<abbr bid="B33">33</abbr>
<abbr bid="B34">34</abbr>
<abbr bid="B35">35</abbr>
<abbr bid="B36">36</abbr>
</abbrgrp>.</p>
<p>The bioinformatic approaches currently available and discussed above are designed to analyze B-cell epitopes or T-cell epitopes but, despite the recognized interplay of B and T cells, fail to integrate the two to provide a complete picture of the immunome.</p>
<p>Virtually all website based programs understandably place limits on sequence size. Further complicating this is the absence of uniformity in size limitations, making consistent data manipulation challenging. The outputs are difficult to integrate when obtained piecemeal. More importantly, from a practical viewpoint, software reliability testing over the internet is at best challenging. Where the programs can be acquired for local use, the Unix/Linux platforms favored by the bioinformatics community are not commonly available in laboratory settings so converting the programs into functional utility in a local setting is not trivial.</p>
<p>Our first goal was thus to produce a unified system that consolidated the various immunological metrics into one set of tools and operated within the context of commercially available software on widely used computing platforms. Prediction of MHC-I and MHC-II binding using the neural network and partial least squares platforms of JMP<sup>&#174; </sup>(also JMP<sup>&#174; </sup>Genomics) <url>http://www.jmp.com</url> is described in an accompanying paper <abbrgrp>
<abbr bid="B29">29</abbr>
</abbrgrp>. Secondly, we recognized the need to examine the interface of immunogenetically diverse patient populations along with an array of different strains of the same organism. Thirdly, we considered a graphical display that allowed visualization of the output of very complex statistical computations to be desirable. Our conceptual model in approaching this third goal was the superior level of understanding of land use provided by geographic information systems (GIS) which overlay multiple information sets of physical and economic geography. We have applied this concept to the microbial surfome "landscape". In this paper we describe an integrated bioinformatics analysis system which we believe approaches these goals.</p>
</sec>
</sec>
</sec>
<sec>
<st>
<p>Methods</p>
</st>
<sec>
<st>
<p>Selection of Benchmark Datasets</p>
</st>
<p>We sought appropriate benchmark datasets to test the system developed. None were available which provided comparable levels of information on all the features we sought to integrate. While a useful repository, the IEDB tabulation of individual epitopes is less useful for proteomic-scale work. We selected three datasets for comparison. The "AntiJen" database provided a benchmark for evaluating a diverse repository of epitopes within the context of entire protein molecules. Two well studied infectious organisms, <it>Staphylococcus aureus </it>and vaccinia virus, enabled retrospective comparisons with published data.</p>
<sec>
<st>
<p>AntiJen</p>
</st>
<p>We examined reference datasets of mapped B-cell epitopes on various websites. Additions or subtractions of sequences have been made to some datasets (reviewed in Davydov <abbrgrp>
<abbr bid="B27">27</abbr>
</abbrgrp>). We sought datasets where epitopes had been mapped for the entire length of a protein and which provided a wide array of source proteins. We downloaded the datasets of identified B-cell epitopes from the site at CBS. The largest one, labeled "AntiJen", is a derivative of that described by Toseland <it>et al </it>
<abbrgrp>
<abbr bid="B37">37</abbr>
</abbrgrp> (but no longer available at the weblink provided in this publication). From the annotations, some of the proteins appear to trace to the time of Hopp and Woods <abbrgrp>
<abbr bid="B24">24</abbr>
</abbrgrp>. This dataset may be accessible from other websites but we report herein our use of it as downloaded from CBS <abbrgrp>
<abbr bid="B38">38</abbr>
<abbr bid="B39">39</abbr>
</abbrgrp> (currently accessible at <url>http://www.cbs.dtu.dk/suppl/immunology/Bepipred.php</url>).</p>
<p>As downloaded, the "AntiJen" data set comprised 124 proteins spanning mammalian, viral, protozoan, bacterial, and other origins (Additional File <supplr sid="S2">2</supplr>, Table S2a), in which 246 B-cell epitopes have been defined experimentally by various labs and various methods. Larsen <it>et al </it>state that "the proteins of this data set are not fully annotated, and the annotation for the non-epitope stretches is not known" <abbrgrp>
<abbr bid="B28">28</abbr>
</abbrgrp>.</p>
<suppl id="S2">
<title>
<p>Additional File 2</p>
</title>
<text>
<p>
<b>AntiJen data (PDF)</b>. S2a. Table of proteins in AntiJen set. S2b. Summary table of analytical results. S2c. Representative graphics from AntiJen set.</p>
</text>
<file name="1745-7580-6-8-S2.PDF">
   <p>Click here for file</p>
</file>
</suppl>
</sec>
<sec>
<st>
<p>Staphylococcus aureus</p>
</st>
<p>Many experimental studies have been conducted to define epitopes on several proteins of <it>Staph. aureus </it>(see Additional File <supplr sid="S3">3</supplr>, Table S3a). Proteomes of multiple strains are available in Genbank. We worked with the 15 strains listed in Additional File <supplr sid="S3">3</supplr>, Table S3b.</p>
<suppl id="S3">
<title>
<p>Additional File 3</p>
</title>
<text>
<p>
<b>
<it>Staph. aureus </it>data (PDF)</b>. S3a. Table of epitopes mapped experimentally in <it>Staph. aureus</it>. S3b. Strains of <it>Staph. aureus </it>analyzed. S3c. Maps of three <it>Staph. aureus </it>toxins. S3d. Map of <it>Staph. aureus </it>Protein A.</p>
</text>
<file name="1745-7580-6-8-S3.PDF">
   <p>Click here for file</p>
</file>
</suppl>
</sec>
<sec>
<st>
<p>Vaccinia</p>
</st>
<p>In view of the detailed experimental epitope mapping information available for vaccinia <abbrgrp>
<abbr bid="B11">11</abbr>
<abbr bid="B40">40</abbr>
<abbr bid="B41">41</abbr>
<abbr bid="B42">42</abbr>
</abbrgrp> and the demonstration by Sette <it>et al </it>
<abbrgrp>
<abbr bid="B7">7</abbr>
</abbrgrp> of the deterministic linkage of B-cell and T-cell epitopes at a protein level in the I1L core protein of vaccinia, we also processed the vaccinia proteome, and report on the results for a subset of four proteins as an illustration of the use of the integrated system to provide predicted HLA-specific differences in binding affinity.</p>
</sec>
</sec>
<sec>
<st>
<p>Process Description</p>
</st>
<p>A system for integrated analysis of proteome scale epitope information was designed which comprises a number of sub-processes. All computations were done and graphics generated using JMP<sup>&#174; </sup>version 8 <url>http://www.jmp.com</url>. Figure <figr fid="F1">1</figr> provides an overview of the system; the sub-processes are described briefly below and in detail in Additional File <supplr sid="S4">4</supplr>.</p>
<fig id="F1"><title><p>Figure 1</p></title><caption><p>Elements of peptide epitope prediction process</p></caption><text>
   <p><b>Elements of peptide epitope prediction process</b>.</p>
</text><graphic file="1745-7580-6-8-1" hint_layout="double"/></fig>
<suppl id="S4">
<title>
<p>Additional File 4</p>
</title>
<text>
<p>
<b>Process description (PDF)</b>.</p>
</text>
<file name="1745-7580-6-8-S4.PDF">
   <p>Click here for file</p>
</file>
</suppl>
<p>
<b>Process A </b>consists of developing a set of Neural Network (NN) binding predictions for 14 MHC-II and 35 MHC-I molecules. Once developed, these equations are stored for further use. Briefly, principal component analysis was carried out on the physical properties of amino acids measured in a total of thirty-one different published studies. In the NN each amino acid is assigned 3 numerical values based on its principal components rather than the standard alphabetical representation commonly used in bioinformatics. This type of descriptor is commonly used in QSAR analysis where it is known as the "z"-scale <abbrgrp>
<abbr bid="B43">43</abbr>
<abbr bid="B44">44</abbr>
</abbrgrp>. The principal component descriptors are uncorrelated, mutually orthogonal metrics, and embody about 90% of the variance in all physical properties of the 20 amino acids commonly found in proteins. The z-scales are not in themselves physical properties, but rather uncorrelated dimensionless proxies for amino acid physical properties that can be used predictively: z<sub>1 </sub>is a hydrophobicity or polarity correlate, z<sub>2 </sub>a size correlate and z<sub>3 </sub>an electronic correlate. A characteristic of principal component analysis is that it also produces a set of descriptors that are appropriately weighted for regression analysis. This process is described in detail in a companion paper where it is benchmarked against several other prediction schemes <abbrgrp>
<abbr bid="B29">29</abbr>
</abbrgrp>.</p>
<p>
<b>Process B</b>, also described in the companion paper, consists of replacing the alphabetic notation of amino acids by z-scales so that each 9-mer in the proteome is represented as a vector of 27 numbers and each 15-mer as a vector of 45 numbers. These numerical values are then used to compute predicted binding affinities for peptides in the proteome using the NN prediction equations from Process A.</p>
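<p>As an illustration only, the Process B encoding can be sketched in Python rather than in JMP. The sketch assumes a lookup table of three descriptors per amino acid; the values in PLACEHOLDER_Z are invented placeholders, not the published z-scales, and all function names are hypothetical. It shows how each overlapping 9-mer becomes a vector of 27 numbers and each 15-mer a vector of 45 numbers.</p>
<p><![CDATA[
```python
from typing import Dict, List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# PLACEHOLDER descriptors: three made-up numbers per amino acid.
# These are NOT the published z-scales; substitute real principal
# component scores before using the output for anything.
PLACEHOLDER_Z: Dict[str, Tuple[float, float, float]] = {
    aa: (float(i), float(i % 5), float(i % 3))
    for i, aa in enumerate(AMINO_ACIDS)
}


def encode_peptide(peptide: str,
                   z: Dict[str, Tuple[float, float, float]] = PLACEHOLDER_Z
                   ) -> List[float]:
    """Concatenate the three descriptors of each residue, N to C terminus."""
    vector: List[float] = []
    for residue in peptide:
        vector.extend(z[residue])
    return vector


def sliding_peptides(protein: str, length: int) -> List[Tuple[int, str]]:
    """Return (1-based start position, peptide) for every overlapping window."""
    return [(i + 1, protein[i:i + length])
            for i in range(len(protein) - length + 1)]


def encode_protein(protein: str, length: int) -> List[Tuple[int, List[float]]]:
    """Encode every overlapping peptide of the given length in a protein."""
    return [(start, encode_peptide(pep))
            for start, pep in sliding_peptides(protein, length)]
```
]]></p>
<p>Calling encode_protein(sequence, 9) or encode_protein(sequence, 15) yields one numeric vector per peptide start position, which is the form of input that a stored prediction equation from Process A would take.</p>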
<p>
<b>Process C </b>involves the use of one of several publicly available programs for protein topology predictions. We have variously used PHOBIUS <abbrgrp>
<abbr bid="B45">45</abbr>
</abbrgrp>, PHILIUS <abbrgrp>
<abbr bid="B46">46</abbr>
</abbrgrp>, MEMSAT <abbrgrp>
<abbr bid="B47">47</abbr>
</abbrgrp>, and TMH <abbrgrp>
<abbr bid="B48">48</abbr>
</abbrgrp>. The output is a probability prediction for each amino acid in the protein as being intracellular, extracellular, within a membrane or a signal peptide. A determination of B-cell epitope predictions is also made. Unlike the MHC predictions which provide a predicted affinity, in the case of B-cell epitopes we are making a binary "yes-no" probability prediction that a specific amino acid lies in a B-cell epitope. The B-cell epitope probability may be achieved by submission to one of several publicly available programs for B-cell epitope predictions <abbrgrp>
<abbr bid="B28">28</abbr>
</abbrgrp>; however, we generated a B-cell epitope prediction based on principal components, enabling us to achieve this step as an integral part of the process.</p>
<p>
<b>Process D </b>integrates the output from the first three processes and involves the use of self-organizing mapping algorithms to identify Coincident Epitope Groups (CEGs) for protein segments likely to be accessible to the immune system. CEGs are peptides in which high affinity MHC binding peptides and B-cell epitopes are found to overlap or whose borders lie within a user-specifiable distance of each other. For the analyses described here, the distance was set at 3 amino acids.</p>
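<p>The coincidence criterion itself can be illustrated with a short Python sketch. This is a simplified interval test and not the self-organizing mapping procedure used in Process D. It assumes that predicted MHC binding peptides and B-cell epitopes are given as inclusive (start, end) residue positions, and that "distance between borders" means the difference between the nearest end and start positions; both are assumptions, and the function names are hypothetical.</p>
<p><![CDATA[
```python
from typing import List, Tuple

Interval = Tuple[int, int]  # inclusive (start, end) residue positions


def border_distance(a: Interval, b: Interval) -> int:
    """Distance between the nearest borders of two segments.

    Zero or a negative value means the segments share at least one residue.
    """
    return max(a[0], b[0]) - min(a[1], b[1])


def coincident_groups(mhc: List[Interval],
                      bcell: List[Interval],
                      max_distance: int = 3
                      ) -> List[Tuple[Interval, Interval, Interval]]:
    """Pair MHC binders and B-cell epitopes that overlap or lie close together.

    Returns (mhc segment, B-cell segment, combined span) for each pair whose
    nearest borders are no more than max_distance residues apart.
    """
    groups = []
    for m in mhc:
        for b in bcell:
            if border_distance(m, b) <= max_distance:
                span = (min(m[0], b[0]), max(m[1], b[1]))
                groups.append((m, b, span))
    return groups
```
]]></p>
<p>With max_distance left at 3, a binder at positions 10 to 18 and a B-cell epitope at positions 21 to 30 would be grouped into a span of 10 to 30, whereas the same pair would be rejected at a distance of 2.</p>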
<p>
<b>Process E </b>is a database task to assemble nearly identical protein sets (NIPS) from different strains of organisms to arrive at a minimalist set of conserved or near-conserved peptide epitopes for further consideration.</p>
</sec>
<sec>
<st>
<p>Standardization</p>
</st>
<p>To facilitate further statistical procedures, the MHC binding affinities (as natural logarithms) were standardized. Standardization is a common statistical process in which the data points are transformed to have a mean of zero and unit variance (and hence unit standard deviation, since the standard deviation is the square root of the variance). Thus all binding affinities of all different supertypes, and paired supertype combinations, were put on the same basis for further computations. This process is reversible so that a more experimentally meaningful ic<sub>50 </sub>can be obtained at any point if desired. Secondly, the Bayesian probabilities for each individual amino acid being in a B-cell epitope were subjected to global standardization like that for the MHC binding affinities. Thus, all the peptides and other metrics subject to statistical screening are standardized, so that thresholding or other selections are made on single or joint normal probability distributions.</p>
<p>Following the standardization processes, the tables of binding affinities contained columns of the original predicted binding affinity data for the different MHC supertypes (as natural logarithms) and the original B-cell epitope probabilities, as well as corresponding columns of standardized (zero mean, unit standard deviation) data of the immunologically relevant endpoints.</p>
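<p>The standardization and its reversal can be written out as a brief Python sketch. It assumes that the sample standard deviation (n - 1 denominator) is used, which the text does not specify, and the function names are hypothetical. A standardized value z is obtained as (x - mean) / sd, and the ic50 in nM is recovered as exp(z * sd + mean).</p>
<p><![CDATA[
```python
import math
from statistics import mean, stdev
from typing import List, Tuple


def standardize(ln_ic50: List[float]) -> Tuple[List[float], float, float]:
    """Transform ln(ic50) values to zero mean and unit standard deviation.

    Returns the standardized values together with the mean and standard
    deviation, which are needed to reverse the transformation.
    Assumption: sample standard deviation (n - 1 denominator).
    """
    mu = mean(ln_ic50)
    sigma = stdev(ln_ic50)
    return [(x - mu) / sigma for x in ln_ic50], mu, sigma


def to_ic50(z: float, mu: float, sigma: float) -> float:
    """Reverse the standardization and the natural logarithm."""
    return math.exp(z * sigma + mu)
```
]]></p>
<p>For example, with a mean ln(ic50) of 4.48 and a standard deviation of 3.11, a standardized value of 0 corresponds to an ic50 of roughly 88 nM and a value of -1 to roughly 4 nM.</p>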
</sec>
<sec>
<st>
<p>Design of Graphical Output</p>
</st>
<p>Visualization of all epitope components in relation to topology facilitates understanding of function. A graphical scheme (Step 13 in Figure <figr fid="F1">1</figr>) was developed that made it possible to readily visualize the topology of proteins at the surface of the organism as well as three standardized probabilities for high affinity binding peptides for MHC-I and MHC-II, and B-epitopes. Predictions for MHC-I and MHC-II binding were done routinely for all organisms, although it is recognized that MHC-I is generally considered most relevant for intracellular infectious organisms and MHC-II for extracellular organisms. Simultaneous visualization of both provides a method of conceptualization of potential cross-presentation of epitopes.</p>
<p>We adopted a convention for graphical display in which the amino acid positions are aligned along the X axis from the N-terminus to the C-terminus. The Y axis is in standardized units (zero mean, unit standard deviation) to show MHC binding affinity. Topological information is displayed in the background shading. The permuted minima ln(ic50), representing the mean population phenotype, are plotted for MHC-I and/or MHC-II at each peptide position; each value is the mean over 105 (MHC-II) or 630 (MHC-I) allelic combinations at that position. The plotted value is a windowed average of the minima over &#177; 4 amino acids from the plotted point. A smooth line fit through the points is produced using the polynomial filter of Savitzky and Golay <abbrgrp>
<abbr bid="B49">49</abbr>
</abbrgrp>. Another line is overlaid to show the standardized probability of B-cell epitope binding. Across the base of the graphic we use ribbons of various colors and intensities to designate regions of high binding affinity and coincidence of B-cell and MHC binding. Different thresholding stringencies can be applied to the ribbons (see Additional File <supplr sid="S5">5</supplr>); we have mostly found the 25<sup>th </sup>percentile of the permuted minimum distribution of MHC binding, along with the 25<sup>th </sup>percentile of B-cell epitope probability, to be useful thresholds, and these are used throughout the graphics below. The 25<sup>th </sup>percentile should be clearly understood to be a threshold from within the permuted minima distributions, which have their own means and standard deviations, and not the 25<sup>th </sup>percentile of all peptides. Colored vertical lines are used to show the behavior of any particular HLA allele as compared to the permuted population phenotype. The line extends from the permuted population value to the standardized value of the indicated HLA at the N-terminus of the peptide 9-mer or 15-mer. By creating a further overlay, experimentally defined epitopes can be compared with predictions. The display can easily be rescaled to visualize the individual amino acids in the peptides.</p>
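<p>The population curve described above can be approximated with the following Python sketch. It rests on two assumptions that are not stated explicitly in the text: that the 105 and 630 allelic combinations are all unordered pairs of the 14 MHC-II and 35 MHC-I alleles, homozygous pairs included, and that the "minimum" for a pair is the lower (stronger binding) of its two ln(ic50) values. Function names are hypothetical, window edges are simply truncated, and the Savitzky-Golay smoothing step is not reproduced.</p>
<p><![CDATA[
```python
from itertools import combinations_with_replacement
from statistics import mean
from typing import Dict, List, Sequence, Tuple


def allele_pairs(alleles: Sequence) -> List[Tuple]:
    """All unordered allele pairs, including an allele paired with itself."""
    return list(combinations_with_replacement(alleles, 2))


def permuted_minima(ln_ic50: Dict[str, List[float]]) -> List[float]:
    """Mean, over all allele pairs, of the lower ln(ic50) at each position.

    ln_ic50 maps an allele name to its predicted ln(ic50) values, one per
    peptide start position; all lists are assumed to have equal length.
    """
    pairs = allele_pairs(sorted(ln_ic50))
    n_positions = len(next(iter(ln_ic50.values())))
    return [
        mean([min(ln_ic50[a][i], ln_ic50[b][i]) for a, b in pairs])
        for i in range(n_positions)
    ]


def windowed_mean(values: List[float], half_width: int = 4) -> List[float]:
    """Average each point with its neighbours within +/- half_width positions.

    Windows are truncated at the ends of the protein (an assumption).
    """
    smoothed = []
    for i in range(len(values)):
        start = max(0, i - half_width)
        stop = min(len(values), i + half_width + 1)
        smoothed.append(mean(values[start:stop]))
    return smoothed
```
]]></p>
<p>Under the pairing assumption, allele_pairs applied to 14 alleles gives 105 combinations and to 35 alleles gives 630, matching the counts quoted for MHC-II and MHC-I.</p>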
<suppl id="S5">
<title>
<p>Additional File 5</p>
</title>
<text>
<p>
<b>Global standardization criteria (PDF)</b>.</p>
</text>
<file name="1745-7580-6-8-S5.PDF">
   <p>Click here for file</p>
</file>
</suppl>
</sec>
</sec>
<sec>
<st>
<p>Results</p>
</st>
<sec>
<st>
<p>Observations of Single Proteomes</p>
</st>
<p>Table <tblr tid="T1">1</tblr> is a summary of the binding affinities for MHC-II supertypes for the surfome (surface proteome) and secretome (secreted proteins) of <it>Staphylococcus aureus </it>COL (Genbank genome accession number = <ext-link ext-link-id="NC_002951" ext-link-type="gen">NC_002951</ext-link>). The surfome consists of all proteins coded for in the genome that have a molecular signature(s) predicting their insertion in cell membranes. Some proteins in the surfome also have signal peptides that control topology but do not lead to secretion.</p>
<tbl id="T1"><title><p>Table 1</p></title><caption><p>MHC-II binding affinities of all overlapping 15-mers in the surfome of Staphylococcus aureus COL NC_002951.</p></caption><tblbdy cols="8">
      <r>
         <c ca="left">
            <p>
               <b>MHC-II Supertype</b>
            </p>
         </c>
         <c ca="right">
            <p>
               <b>Ave ln(ic50)</b>
            </p>
         </c>
         <c ca="right">
            <p>
               <b>Std Dev ln(ic50)</b>
            </p>
         </c>
         <c ca="right">
            <p>
               <b>10%-tile ln(ic50)</b>
            </p>
         </c>
         <c ca="right">
            <p>
               <b>Ave IC50 (nM)</b>
            </p>
         </c>
         <c ca="right">
            <p>
               <b>Ave-SD ic50 (nM)</b>
            </p>
         </c>
         <c ca="right">
            <p>
               <b>10%-tile ic50 (nM)</b>
            </p>
         </c>
         <c ca="right">
            <p>
               <b>Ave-2SD ic50 (nM)</b>
            </p>
         </c>
      </r>
      <r>
         <c cspan="8">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <b>DRB1_0101</b>
            </p>
         </c>
         <c ca="right">
            <p>4.48</p>
         </c>
         <c ca="right">
            <p>3.11</p>
         </c>
         <c ca="right">
            <p>0.54</p>
         </c>
         <c ca="right">
            <p>88.27</p>
         </c>
         <c ca="right">
            <p>3.95</p>
         </c>
         <c ca="right">
            <p>1.72</p>
         </c>
         <c ca="right">
            <p>0.18</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <b>DRB1_0301</b>
            </p>
         </c>
         <c ca="right">
            <p>6.29</p>
         </c>
         <c ca="right">
            <p>1.93</p>
         </c>
         <c ca="right">
            <p>3.81</p>
         </c>
         <c ca="right">
            <p>540.59</p>
         </c>
         <c ca="right">
            <p>78.15</p>
         </c>
         <c ca="right">
            <p>45.28</p>
         </c>
         <c ca="right">
            <p>11.30</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <b>DRB1_0401</b>
            </p>
         </c>
         <c ca="right">
            <p>5.31</p>
         </c>
         <c ca="right">
            <p>2.59</p>
         </c>
         <c ca="right">
            <p>1.95</p>
         </c>
         <c ca="right">
            <p>202.23</p>
         </c>
         <c ca="right">
            <p>15.12</p>
         </c>
         <c ca="right">
            <p>7.04</p>
         </c>
         <c ca="right">
            <p>1.13</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <b>DRB1_0404</b>
            </p>
         </c>
         <c ca="right">
            <p>5.23</p>
         </c>
         <c ca="right">
            <p>2.76</p>
         </c>
         <c ca="right">
            <p>1.63</p>
         </c>
         <c ca="right">
            <p>187.57</p>
         </c>
         <c ca="right">
            <p>11.84</p>
         </c>
         <c ca="right">
            <p>5.12</p>
         </c>
         <c ca="right">
            <p>0.75</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <b>DRB1_0405</b>
            </p>
         </c>
         <c ca="right">
            <p>4.38</p>
         </c>
         <c ca="right">
            <p>1.90</p>
         </c>
         <c ca="right">
            <p>1.92</p>
         </c>
         <c ca="right">
            <p>79.92</p>
         </c>
         <c ca="right">
            <p>11.96</p>
         </c>
         <c ca="right">
            <p>6.81</p>
         </c>
         <c ca="right">
            <p>1.79</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <b>DRB1_0701</b>
            </p>
         </c>
         <c ca="right">
            <p>4.29</p>
         </c>
         <c ca="right">
            <p>2.84</p>
         </c>
         <c ca="right">
            <p>0.62</p>
         </c>
         <c ca="right">
            <p>73.33</p>
         </c>
         <c ca="right">
            <p>4.27</p>
         </c>
         <c ca="right">
            <p>1.85</p>
         </c>
         <c ca="right">
            <p>0.25</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <b>DRB1_0802</b>
            </p>
         </c>
         <c ca="right">
            <p>7.05</p>
         </c>
         <c ca="right">
            <p>2.00</p>
         </c>
         <c ca="right">
            <p>4.48</p>
         </c>
         <c ca="right">
            <p>1151.07</p>
         </c>
         <c ca="right">
            <p>155.45</p>
         </c>
         <c ca="right">
            <p>88.42</p>
         </c>
         <c ca="right">
            <p>20.99</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <b>DRB1_0901</b>
            </p>
         </c>
         <c ca="right">
            <p>5.85</p>
         </c>
         <c ca="right">
            <p>2.48</p>
         </c>
         <c ca="right">
            <p>2.64</p>
         </c>
         <c ca="right">
            <p>346.90</p>
         </c>
         <c ca="right">
            <p>29.03</p>
         </c>
         <c ca="right">
            <p>13.99</p>
         </c>
         <c ca="right">
            <p>2.43</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <b>DRB1_1101</b>
            </p>
         </c>
         <c ca="right">
            <p>5.58</p>
         </c>
         <c ca="right">
            <p>2.52</p>
         </c>
         <c ca="right">
            <p>2.35</p>
         </c>
         <c ca="right">
            <p>265.50</p>
         </c>
         <c ca="right">
            <p>21.39</p>
         </c>
         <c ca="right">
            <p>10.46</p>
         </c>
         <c ca="right">
            <p>1.72</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <b>DRB1_1302</b>
            </p>
         </c>
         <c ca="right">
            <p>7.14</p>
         </c>
         <c ca="right">
            <p>1.95</p>
         </c>
         <c ca="right">
            <p>4.62</p>
         </c>
         <c ca="right">
            <p>1257.67</p>
         </c>
         <c ca="right">
            <p>178.85</p>
         </c>
         <c ca="right">
            <p>101.68</p>
         </c>
         <c ca="right">
            <p>25.43</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <b>DRB1_1501</b>
            </p>
         </c>
         <c ca="right">
            <p>5.86</p>
         </c>
         <c ca="right">
            <p>2.74</p>
         </c>
         <c ca="right">
            <p>2.31</p>
         </c>
         <c ca="right">
            <p>351.12</p>
         </c>
         <c ca="right">
            <p>22.61</p>
         </c>
         <c ca="right">
            <p>10.07</p>
         </c>
         <c ca="right">
            <p>1.46</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <b>DRB3_0101</b>
            </p>
         </c>
         <c ca="right">
            <p>8.26</p>
         </c>
         <c ca="right">
            <p>1.95</p>
         </c>
         <c ca="right">
            <p>5.74</p>
         </c>
         <c ca="right">
            <p>3861.57</p>
         </c>
         <c ca="right">
            <p>547.81</p>
         </c>
         <c ca="right">
            <p>312.37</p>
         </c>
         <c ca="right">
            <p>77.71</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <b>DRB4_0101</b>
            </p>
         </c>
         <c ca="right">
            <p>5.69</p>
         </c>
         <c ca="right">
            <p>2.20</p>
         </c>
         <c ca="right">
            <p>2.81</p>
         </c>
         <c ca="right">
            <p>294.70</p>
         </c>
         <c ca="right">
            <p>32.68</p>
         </c>
         <c ca="right">
            <p>16.67</p>
         </c>
         <c ca="right">
            <p>3.62</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <b>DRB5_0101</b>
            </p>
         </c>
         <c ca="right">
            <p>4.92</p>
         </c>
         <c ca="right">
            <p>2.60</p>
         </c>
         <c ca="right">
            <p>1.58</p>
         </c>
         <c ca="right">
            <p>136.76</p>
         </c>
         <c ca="right">
            <p>10.12</p>
         </c>
         <c ca="right">
            <p>4.85</p>
         </c>
         <c ca="right">
            <p>0.75</p>
         </c>
      </r>
      <r>
         <c cspan="8">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <b>Average</b>
            </p>
         </c>
         <c ca="right">
            <p>
               <b>5.74</b>
            </p>
         </c>
         <c ca="right">
            <p>
               <b>2.40</b>
            </p>
         </c>
         <c ca="right">
            <p>
               <b>2.64</b>
            </p>
         </c>
         <c ca="right">
            <p>
               <b>631.2</b>
            </p>
         </c>
         <c ca="right">
            <p>
               <b>80.2</b>
            </p>
         </c>
         <c ca="right">
            <p>
               <b>44.7</b>
            </p>
         </c>
         <c ca="right">
            <p>
               <b>10.7</b>
            </p>
         </c>
      </r>
      <r>
         <c cspan="8">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <b>Exp(Average) nM</b>
            </p>
         </c>
         <c ca="right">
            <p>310.5</p>
         </c>
         <c ca="right">
            <p>11.0</p>
         </c>
         <c ca="right">
            <p>14.1</p>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
      </r>
   </tblbdy><tblfn>
      <p>The surface proteome consists of all proteins that have one or more predicted transmembrane helices in their structure. The statistics were derived from approximately 216,000 15-mers for 14 supertypes, or about 3.02 million binding predictions. The NNs were trained, and the predictions were made, in the natural logarithmic domain (ln). The statistical parameters are for the entire proteome, as this would constitute the population of peptides presented bound to MHC molecules on the surface of antigen presenting cells.</p>
   </tblfn></tbl>
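<p>The nM columns of Table 1 appear to be back-transformations of the ln(ic50) statistics rather than statistics computed directly in nM: exponentiating the mean, the mean minus one standard deviation, and the mean minus two standard deviations reproduces the tabulated values to within the rounding of the inputs. The following Python sketch illustrates that arithmetic for three rows of the table. The function and variable names are illustrative assumptions and are not part of the software described here.</p>

```python
import math

# Mean and standard deviation of ln(ic50) for three supertypes, copied from Table 1.
LN_STATS = {
    "DRB1_0101": (4.48, 3.11),
    "DRB1_0301": (6.29, 1.93),
    "DRB3_0101": (8.26, 1.95),
}

# Approximate counts quoted in the Table 1 footnote.
N_15MERS = 216_000
N_SUPERTYPES = 14


def ln_to_nm(ln_ic50):
    """Back-transform a natural-log affinity to an IC50 in nM."""
    return math.exp(ln_ic50)


def summarize(mean_ln, sd_ln):
    """Return the nM values at the mean, mean - 1 SD and mean - 2 SD."""
    return {
        "mean": ln_to_nm(mean_ln),
        "mean_minus_1sd": ln_to_nm(mean_ln - sd_ln),
        "mean_minus_2sd": ln_to_nm(mean_ln - 2 * sd_ln),
    }


for allele, (mean_ln, sd_ln) in LN_STATS.items():
    values = summarize(mean_ln, sd_ln)
    print(allele, {name: round(v, 2) for name, v in values.items()})

# 216,000 peptides x 14 supertypes is about 3.02 million predictions.
total_predictions = N_15MERS * N_SUPERTYPES
print(total_predictions)
```

<p>Because the exponential of a mean logarithm is a geometric mean, the back-transformed average is expected to be lower than an arithmetic mean of the underlying nM values would be.</p>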
<p>B-cell epitopes, MHC-II binding, and topology were predicted for 15 strains of <it>Staph. aureus </it>(listed in Additional File <supplr sid="S3">3</supplr>, Table S3b). Predicted B-cell epitopes were located both inside and outside the bacterial cell membrane, but virtually none were found in the transmembrane domains, perhaps because of the alpha-helical structure of the transmembrane peptides. In the very few instances where a predicted B-cell epitope did extend into the membrane (&lt;2%), it penetrated by only a few amino acids. This may represent an error in the prediction of the edge of the transmembrane domain.</p>
<p>A summary of the topology of the proteins and the predicted MHC-I and MHC-II binding affinity of peptides in <it>Staph. aureus </it>COL is shown in Table <tblr tid="T2">2</tblr>. It is recognized that the immunological response to <it>Staph. aureus </it>should be mediated primarily through MHC-II; MHC-I is included for completeness. The table shows the relatively higher binding affinity (lower ln(ic50)) of peptides found in membrane-spanning segments of the proteome. The difference in the ln(ic50) of the MHC-II affinities of the intracellular and extracellular segments is small but statistically significant.</p>
<tbl id="T2"><title><p>Table 2</p></title><caption><p>Characteristics of the surface and secreted proteome of <it>Staphylococcus aureus </it>COL</p></caption><tblbdy cols="4">
      <r>
         <c ca="center">
            <p>
               <b>Total Proteins</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>with TMH</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>with TMH and SP</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>Secreted</b>
            </p>
         </c>
      </r>
      <r>
         <c cspan="4">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>2,615</p>
         </c>
         <c ca="center">
            <p>649</p>
         </c>
         <c ca="center">
            <p>69</p>
         </c>
         <c ca="center">
            <p>186</p>
         </c>
      </r>
      <r>
         <c cspan="4">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <b>MHC-I</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>9-mers</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>Avg ln(ic50) of group</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>Total</b>
            </p>
         </c>
      </r>
      <r>
         <c cspan="4">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left" indent="1">
            <p>inside</p>
         </c>
         <c ca="center">
            <p>48,222</p>
         </c>
         <c ca="center">
            <p>9.3</p>
         </c>
         <c ca="center">
            <p>238,320</p>
         </c>
      </r>
      <r>
         <c ca="left" indent="1">
            <p>membrane</p>
         </c>
         <c ca="center">
            <p>51,643</p>
         </c>
         <c ca="center">
            <p>8.4</p>
         </c>
         <c>
            <p/>
         </c>
      </r>
      <r>
         <c ca="left" indent="1">
            <p>outside</p>
         </c>
         <c ca="center">
            <p>138,455</p>
         </c>
         <c ca="center">
            <p>9.3</p>
         </c>
         <c>
            <p/>
         </c>
      </r>
      <r>
         <c cspan="4">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <b>MHC-II</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>15-mers</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>Avg ln(ic50) of group</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>Total</b>
            </p>
         </c>
      </r>
      <r>
         <c cspan="4">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left" indent="1">
            <p>inside</p>
         </c>
         <c ca="center">
            <p>40,422</p>
         </c>
         <c ca="center">
            <p>6.4</p>
         </c>
         <c ca="center">
            <p>210,466</p>
         </c>
      </r>
      <r>
         <c ca="left" indent="1">
            <p>membrane</p>
         </c>
         <c ca="center">
            <p>38,698</p>
         </c>
         <c ca="center">
            <p>4.6</p>
         </c>
         <c>
            <p/>
         </c>
      </r>
      <r>
         <c ca="left" indent="1">
            <p>outside</p>
         </c>
         <c ca="center">
            <p>131,346</p>
         </c>
         <c ca="center">
            <p>6.6</p>
         </c>
         <c>
            <p/>
         </c>
      </r>
   </tblbdy><tblfn>
      <p>Total proteins, protein topology, and MHC binding characteristics of the different protein domains of <it>Staph. aureus </it>COL surface and secreted proteins. Peptides within transmembrane domains have a significantly higher binding affinity to both MHC-I and MHC-II. The ln(ic50) means shown are for peptides in the dataset for which both the N-terminus and the C-terminus lie in the predicted transmembrane domain. All means were different from one another by ANOVA (p &lt; 0.0001). TMH = transmembrane helix; SP = signal peptide.</p>
   </tblfn></tbl>
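<p>The assignment of overlapping peptides to topological regions in Table 2 can be illustrated with a minimal Python sketch. It assumes that topology is available as one label per residue and that a peptide is counted for a region only when both its N-terminal and C-terminal residues carry that region's label, as the table footnote describes for the transmembrane domain. The label encoding ('i', 'M', 'o'), the toy protein, and the function name are hypothetical and are not taken from the software described here.</p>

```python
from collections import Counter


def classify_kmers(topology, k):
    """Count k-mers whose first and last residues share one topology label.

    `topology` holds one label per residue: 'i' (inside), 'M' (membrane)
    or 'o' (outside). This encoding is an assumption made for the sketch.
    K-mers whose two termini fall in different regions are skipped.
    """
    counts = Counter()
    for start in range(len(topology) - k + 1):
        first = topology[start]
        last = topology[start + k - 1]
        if first == last:
            counts[first] += 1
    return counts


# Toy protein: 12 inside residues, a 20-residue helix, 18 outside residues.
toy_topology = "i" * 12 + "M" * 20 + "o" * 18

nine_mers = classify_kmers(toy_topology, 9)
fifteen_mers = classify_kmers(toy_topology, 15)
print(dict(nine_mers))
print(dict(fifteen_mers))

# The per-region peptide counts in Table 2 sum to the listed totals.
table2_9mers = [48222, 51643, 138455]
table2_15mers = [40422, 38698, 131346]
print(sum(table2_9mers), sum(table2_15mers))
```

<p>Longer peptides are more likely to straddle a region boundary, which is consistent with Table 2 listing fewer 15-mers than 9-mers in every region. A stricter variant would require every residue in the window, not only the two termini, to carry the same label.</p>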
</sec>
<sec>
<st>
<p>Interface with a Diverse Host Population</p>
</st>
<p>The array of genetic variants (supertypes) of HLA molecules in the human population vastly exceeds that for which there are peptide training sets. In addition, each individual carries the MHC molecules of both parental genotypes on their cell membranes, which further increases the combinatorial possibilities. Despite this complexity, examination of the statistics of the predicted binding affinities for a number of different proteins in the proteome of <it>Staph. aureus </it>gave rise to several observations suggesting that it would be possible to derive a system for determining the probability of binding not only for single supertypes, but also for a population of combinatorial supertypes for which trained NNs were available. The processes outlined above make it possible to put entire proteomes (or multiple proteomes), comprising millions of binding affinities, into a single data table in a familiar spreadsheet interface on a standard workstation computer.</p>
<p>Table <tblr tid="T3">3</tblr> shows the predicted binding affinities for each of the DRB supertypes in combination with each of the other DRB molecules (105 pairwise combinations, including the 14 homozygous pairs), simulating heterozygous and homozygous individuals (detail in Additional File <supplr sid="S6">6</supplr>). Inside an antigen presenting cell, where peptides from a digested organism (e.g. <it>Staph. aureus </it>COL) come into contact with MHC-II molecules, the molecule with the higher affinity (the smaller of the two ln affinity numbers) would be expected to dominate the binding process. One of the striking features that emerges from this table (bottom rows of Table <tblr tid="T3">3</tblr>) is the overall advantage of heterozygosity. Individuals randomly inheriting combinatorial pairs of the 14 supertypes stand to have a higher binding affinity than if they had only one type. A second observation is the segregation of different alleles within the sorted list. The heterozygosity advantage, and the fact that the 10<sup>th</sup> percentile threshold lies in a range of affinity considered biologically useful, suggested the possibility of averaging over all genotypes as a means of predicting binding in a population of individuals carrying MHC-II molecules of unknown genotype on their cells (as would be the case in a randomly selected vaccinee population). These results suggest that combinatorial pairs of supertypes need to be considered in statistical selection and screening processes, for example in clinical trials.</p>
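<p>The pairing described above can be sketched in a few lines of Python. Taking the 10th percentile ln(ic50) of each of the 14 supertypes from Table 1, the sketch enumerates every unordered pair, with homozygous pairs included, and records both the average and the minimum of each pair. The function and variable names are illustrative assumptions, and the sketch is not the software used to produce Table 3.</p>

```python
from itertools import combinations_with_replacement
from statistics import mean

# 10th percentile ln(ic50) per supertype, copied from Table 1.
P10_LN_IC50 = {
    "DRB1_0101": 0.54, "DRB1_0301": 3.81, "DRB1_0401": 1.95,
    "DRB1_0404": 1.63, "DRB1_0405": 1.92, "DRB1_0701": 0.62,
    "DRB1_0802": 4.48, "DRB1_0901": 2.64, "DRB1_1101": 2.35,
    "DRB1_1302": 4.62, "DRB1_1501": 2.31, "DRB3_0101": 5.74,
    "DRB4_0101": 2.81, "DRB5_0101": 1.58,
}


def pair_table(p10):
    """Build one row per unordered supertype pair, homozygous pairs included."""
    rows = []
    for s1, s2 in combinations_with_replacement(sorted(p10), 2):
        a, b = p10[s1], p10[s2]
        rows.append({
            "S1": s1,
            "S2": s2,
            "average": (a + b) / 2,
            "min_of_pair": min(a, b),  # lower ln(ic50) means higher affinity
        })
    return rows


rows = pair_table(P10_LN_IC50)
mean_average = mean(row["average"] for row in rows)
mean_min = mean(row["min_of_pair"] for row in rows)
print(len(rows), round(mean_average, 2), round(mean_min, 2))
```

<p>With these inputs the sketch yields 105 pairs, a mean pair average of 2.64 and a mean pair minimum of 1.88, in agreement with the Mean row of Table 3. The gap between the two means is the heterozygosity advantage discussed above.</p>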
<tbl id="T3"><title><p>Table 3</p></title><caption><p>MHC-II binding affinity of heterozygous and homozygous pairs</p></caption><tblbdy cols="7">
      <r>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>
               <b>S1</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>S2</b>
            </p>
         </c>
         <c ca="right">
            <p>
               <b>10%tile S1</b>
            </p>
         </c>
         <c ca="right">
            <p>
               <b>10%tile S2</b>
            </p>
         </c>
         <c ca="right">
            <p>
               <b>10%tile Average</b>
            </p>
         </c>
         <c ca="right">
            <p>
               <b>10%tile min of pair</b>
            </p>
         </c>
      </r>
      <r>
         <c cspan="7">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Top Ten</p>
         </c>
         <c ca="left">
            <p>DRB1_0101</p>
         </c>
         <c ca="left">
            <p>DRB1_0101</p>
         </c>
         <c ca="right">
            <p>0.54</p>
         </c>
         <c ca="right">
            <p>0.54</p>
         </c>
         <c ca="right">
            <p>0.54</p>
         </c>
         <c ca="right">
            <p>0.54</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c cspan="6">
            <hr/>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>DRB1_0301</p>
         </c>
         <c ca="left">
            <p>DRB1_0101</p>
         </c>
         <c ca="right">
            <p>3.81</p>
         </c>
         <c ca="right">
            <p>0.54</p>
         </c>
         <c ca="right">
            <p>2.175</p>
         </c>
         <c ca="right">
            <p>0.54</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c cspan="6">
            <hr/>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>DRB1_0401</p>
         </c>
         <c ca="left">
            <p>DRB1_0101</p>
         </c>
         <c ca="right">
            <p>1.95</p>
         </c>
         <c ca="right">
            <p>0.54</p>
         </c>
         <c ca="right">
            <p>1.245</p>
         </c>
         <c ca="right">
            <p>0.54</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c cspan="6">
            <hr/>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>DRB1_0404</p>
         </c>
         <c ca="left">
            <p>DRB1_0101</p>
         </c>
         <c ca="right">
            <p>1.63</p>
         </c>
         <c ca="right">
            <p>0.54</p>
         </c>
         <c ca="right">
            <p>1.085</p>
         </c>
         <c ca="right">
            <p>0.54</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c cspan="6">
            <hr/>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>DRB1_0405</p>
         </c>
         <c ca="left">
            <p>DRB1_0101</p>
         </c>
         <c ca="right">
            <p>1.92</p>
         </c>
         <c ca="right">
            <p>0.54</p>
         </c>
         <c ca="right">
            <p>1.23</p>
         </c>
         <c ca="right">
            <p>0.54</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c cspan="6">
            <hr/>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>DRB1_0701</p>
         </c>
         <c ca="left">
            <p>DRB1_0101</p>
         </c>
         <c ca="right">
            <p>0.62</p>
         </c>
         <c ca="right">
            <p>0.54</p>
         </c>
         <c ca="right">
            <p>0.58</p>
         </c>
         <c ca="right">
            <p>0.54</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c cspan="6">
            <hr/>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>DRB1_0802</p>
         </c>
         <c ca="left">
            <p>DRB1_0101</p>
         </c>
         <c ca="right">
            <p>4.48</p>
         </c>
         <c ca="right">
            <p>0.54</p>
         </c>
         <c ca="right">
            <p>2.51</p>
         </c>
         <c ca="right">
            <p>0.54</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c cspan="6">
            <hr/>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>DRB1_0901</p>
         </c>
         <c ca="left">
            <p>DRB1_0101</p>
         </c>
         <c ca="right">
            <p>2.64</p>
         </c>
         <c ca="right">
            <p>0.54</p>
         </c>
         <c ca="right">
            <p>1.59</p>
         </c>
         <c ca="right">
            <p>0.54</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c cspan="6">
            <hr/>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>DRB1_1101</p>
         </c>
         <c ca="left">
            <p>DRB1_0101</p>
         </c>
         <c ca="right">
            <p>2.35</p>
         </c>
         <c ca="right">
            <p>0.54</p>
         </c>
         <c ca="right">
            <p>1.445</p>
         </c>
         <c ca="right">
            <p>0.54</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c cspan="6">
            <hr/>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>DRB1_1302</p>
         </c>
         <c ca="left">
            <p>DRB1_0101</p>
         </c>
         <c ca="right">
            <p>4.62</p>
         </c>
         <c ca="right">
            <p>0.54</p>
         </c>
         <c ca="right">
            <p>2.58</p>
         </c>
         <c ca="right">
            <p>0.54</p>
         </c>
      </r>
      <r>
         <c cspan="7">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Bottom Ten</p>
         </c>
         <c ca="left">
            <p>DRB1_0301</p>
         </c>
         <c ca="left">
            <p>DRB1_0301</p>
         </c>
         <c ca="right">
            <p>3.81</p>
         </c>
         <c ca="right">
            <p>3.81</p>
         </c>
         <c ca="right">
            <p>3.81</p>
         </c>
         <c ca="right">
            <p>3.81</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c cspan="6">
            <hr/>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>DRB1_0802</p>
         </c>
         <c ca="left">
            <p>DRB1_0301</p>
         </c>
         <c ca="right">
            <p>4.48</p>
         </c>
         <c ca="right">
            <p>3.81</p>
         </c>
         <c ca="right">
            <p>4.145</p>
         </c>
         <c ca="right">
            <p>3.81</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c cspan="6">
            <hr/>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>DRB1_1302</p>
         </c>
         <c ca="left">
            <p>DRB1_0301</p>
         </c>
         <c ca="right">
            <p>4.62</p>
         </c>
         <c ca="right">
            <p>3.81</p>
         </c>
         <c ca="right">
            <p>4.215</p>
         </c>
         <c ca="right">
            <p>3.81</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c cspan="6">
            <hr/>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>DRB3_0101</p>
         </c>
         <c ca="left">
            <p>DRB1_0301</p>
         </c>
         <c ca="right">
            <p>5.74</p>
         </c>
         <c ca="right">
            <p>3.81</p>
         </c>
         <c ca="right">
            <p>4.775</p>
         </c>
         <c ca="right">
            <p>3.81</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c cspan="6">
            <hr/>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>DRB1_0802</p>
         </c>
         <c ca="left">
            <p>DRB1_0802</p>
         </c>
         <c ca="right">
            <p>4.48</p>
         </c>
         <c ca="right">
            <p>4.48</p>
         </c>
         <c ca="right">
            <p>4.48</p>
         </c>
         <c ca="right">
            <p>4.48</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c cspan="6">
            <hr/>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>DRB1_1302</p>
         </c>
         <c ca="left">
            <p>DRB1_0802</p>
         </c>
         <c ca="right">
            <p>4.62</p>
         </c>
         <c ca="right">
            <p>4.48</p>
         </c>
         <c ca="right">
            <p>4.55</p>
         </c>
         <c ca="right">
            <p>4.48</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c cspan="6">
            <hr/>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>DRB3_0101</p>
         </c>
         <c ca="left">
            <p>DRB1_0802</p>
         </c>
         <c ca="right">
            <p>5.74</p>
         </c>
         <c ca="right">
            <p>4.48</p>
         </c>
         <c ca="right">
            <p>5.11</p>
         </c>
         <c ca="right">
            <p>4.48</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c cspan="6">
            <hr/>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>DRB1_1302</p>
         </c>
         <c ca="left">
            <p>DRB1_1302</p>
         </c>
         <c ca="right">
            <p>4.62</p>
         </c>
         <c ca="right">
            <p>4.62</p>
         </c>
         <c ca="right">
            <p>4.62</p>
         </c>
         <c ca="right">
            <p>4.62</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c cspan="6">
            <hr/>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>DRB3_0101</p>
         </c>
         <c ca="left">
            <p>DRB1_1302</p>
         </c>
         <c ca="right">
            <p>5.74</p>
         </c>
         <c ca="right">
            <p>4.62</p>
         </c>
         <c ca="right">
            <p>5.18</p>
         </c>
         <c ca="right">
            <p>4.62</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c cspan="6">
            <hr/>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>DRB3_0101</p>
         </c>
         <c ca="left">
            <p>DRB3_0101</p>
         </c>
         <c ca="right">
            <p>5.74</p>
         </c>
         <c ca="right">
            <p>5.74</p>
         </c>
         <c ca="right">
            <p>5.74</p>
         </c>
         <c ca="right">
            <p>5.74</p>
         </c>
      </r>
      <r>
         <c cspan="7">
            <hr/>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>
               <b>Mean</b>
            </p>
         </c>
         <c ca="right">
            <p>
               <b>2.92</b>
            </p>
         </c>
         <c ca="right">
            <p>
               <b>2.37</b>
            </p>
         </c>
         <c ca="right">
            <p>
               <b>2.64</b>
            </p>
         </c>
         <c ca="right">
            <p>
               <b>1.88</b>
            </p>
         </c>
      </r>
      <r>
         <c cspan="7">
            <hr/>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>
               <b>Std Dev</b>
            </p>
         </c>
         <c ca="right">
            <p>
               <b>1.47</b>
            </p>
         </c>
         <c ca="right">
            <p>
               <b>1.41</b>
            </p>
         </c>
         <c ca="right">
            <p>
               <b>1.07</b>
            </p>
         </c>
         <c ca="right">
            <p>
               <b>1.08</b>
            </p>
         </c>
      </r>
   </tblbdy><tblfn>
      <p>Ten percentile MHC-II binding affinity statistics for 105 different heterozygous and homozygous supertype combinations for 15-mer peptides from the surface proteome of <it>Staphylococcus aureus </it>COL. The results were obtained using the 14 MHC-II supertypes for which training sets were available to train the NN. The surface proteome is defined as proteins that are predicted to have one or more transmembrane helices and are therefore expected to be inserted into the cell membrane. The top and bottom ten pairs are shown in this summary table; the complete data set is in Additional File <supplr sid="S6">6</supplr>.</p>
   </tblfn></tbl>
<suppl id="S6">
<title>
<p>Additional File 6</p>
</title>
<text>
<p>
<b>Complete dataset summarized in Table 3 (PDF)</b>.</p>
</text>
<file name="1745-7580-6-8-S6.PDF">
   <p>Click here for file</p>
</file>
</suppl>
<p>The different alleles and allelic combinations shown in Table <tblr tid="T3">3</tblr> have significantly different means and variances, which complicates statistical analysis and thresholding. All of the proteins in the <it>Staph. aureus </it>surfome, comprising about 210,000 15-mers, were used in a global standardization process (summarized in Additional File <supplr sid="S5">5</supplr>). By using all the 15-mers in the proteome for standardization, the statistical treatment is brought into line with the biological process, in which an engulfed foreign organism is digested and the peptides available for presentation are drawn from the entire repertoire of the organism. Furthermore, constructing normally distributed populations allows rigorous and meaningful statistical screening and selection based on Gaussian distributions (Figure <figr fid="F2">2</figr>).</p>
<fig id="F2"><title><p>Figure 2</p></title><caption><p>Example of the global standardization process</p></caption><text>
   <p><b>Example of the global standardization process</b>. The global standardization process is shown for DRB*0101 15-mers of the combined secretome and surfome from NC_002951 <it>Staphylococcus aureus </it>COL (289,760 peptides). The highlighted area shows the peptides with an N-terminal amino acid predicted to be in a membrane; this group is shifted from the mean in the original data but is coincident with the mean after standardization. Lower panel: average ln(ic50) of DRB*0101 15-mers computed on a protein basis. In this case the histogram bars are the number of proteins with the indicated ln(ic50). The non-normal distribution is caused by proteins with transmembrane domains, which have higher binding affinity.</p>
</text><graphic file="1745-7580-6-8-2" hint_layout="single"/></fig>
<p>A second distribution anomaly is shown in Figure <figr fid="F2">2</figr>. Not only does binding affinity vary across different MHC alleles, as described above; it also varies between proteins in a proteome, giving rise to distributions like that seen in Figure <figr fid="F2">2</figr>. This demonstrates why organism-level conclusions about binding affinity cannot be drawn from measurements made on peptides in isolation, and should preferably not be drawn from individual proteins. The global standardization process, based on standardized ln(ic50), ranks the affinity of every peptide in a proteome against all other peptides in that proteome. This is the situation that would arise as an infectious organism is digested in an antigen presenting cell.</p>
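<p>The standardization step can be illustrated with a minimal Python sketch. It assumes a plain z-score of ln(ic50) computed over the whole peptide pool for a single allele; the procedure actually used is the one summarized in Additional File 5 and may differ. The function name and the ic50 values are hypothetical.</p>

```python
import math
import statistics


def standardize_ln_ic50(ic50_values):
    """Return zero-mean, unit-variance scores of ln(ic50) over the whole pool."""
    logs = [math.log(v) for v in ic50_values]
    mean = statistics.fmean(logs)
    sd = statistics.pstdev(logs)  # population SD: the pool is the whole proteome
    if sd == 0:
        raise ValueError("all values are identical; cannot standardize")
    return [(x - mean) / sd for x in logs]


# Hypothetical predicted ic50 values (nM) for four peptides and one allele.
pool = [50.0, 500.0, 5000.0, 50000.0]
z_scores = standardize_ln_ic50(pool)
```

<p>Because the mean and standard deviation come from the full pool, each score places a peptide relative to every other peptide in the proteome for that allele; lower scores correspond to higher predicted affinity.</p>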
</sec>
<sec>
<st>
<p>Correlations between MHC-I and MHC-II</p>
</st>
<p>By examining the plots of many different proteins with different types of data portrayal we observed that, despite individual 15-mer peptides showing widely different predicted binding affinities for the different MHC supertypes, there was a tendency for high binding for all supertypes to locate in certain regions of molecules and low binding in other regions. This can be seen as undulations in the averaged mean affinities across a protein sequence. Not only was this the case among MHC-II supertypes; it was also seen with the overall means of all MHC-I and MHC-II supertypes (Figure <figr fid="F3">3</figr>; Table <tblr tid="T4">4</tblr>). After examining many different proteins individually, it emerged that each protein has a characteristic undulation pattern regardless of the supertype or MHC. Computed on an affinity basis, the variations are very large, with the mean affinity varying over a thousand-fold for peptides from different regions within a protein molecule. This long-range variation is superimposed on a large peptide-to-peptide variation within the protein.</p>
<fig id="F3"><title><p>Figure 3</p></title><caption><p>Example of long range variation in mean MHC II affinity across a single protein (Thermonuclease precursor Staphylococcus aureus COL gi 57650135)</p></caption><text>
   <p><b>Example of long range variation in mean MHC II affinity across a single protein (Thermonuclease precursor Staphylococcus aureus COL gi 57650135)</b>. (A) DRB4*0101 and (B) DRB1*0404 have a high correlation (r = 0.6) while (C) DRB3*0101 and (D) DRB1*0901 have a low correlation.</p>
</text><graphic file="1745-7580-6-8-3" hint_layout="double"/></fig>
<tbl id="T4"><title><p>Table 4</p></title><caption><p>Pearson correlation coefficient of ln(ic50) for pairs of MHC-I and MHC-II alleles.</p></caption><tblbdy cols="7">
      <r>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>
               <b>MHC-I</b>
            </p>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>
               <b>MHC-II</b>
            </p>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
      </r>
      <r>
         <c cspan="7">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Top Ten</p>
         </c>
         <c ca="left">
            <p>
               <b>HLA 1</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>HLA 2</b>
            </p>
         </c>
         <c ca="right">
            <p>
               <b>r</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>HLA 1</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>HLA 2</b>
            </p>
         </c>
         <c ca="right">
            <p>
               <b>r</b>
            </p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c cspan="6">
            <hr/>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>A*2402</p>
         </c>
         <c ca="left">
            <p>A*2301</p>
         </c>
         <c ca="right">
            <p>0.67</p>
         </c>
         <c ca="left">
            <p>DRB4*0101</p>
         </c>
         <c ca="left">
            <p>DRB1*0404</p>
         </c>
         <c ca="right">
            <p>0.63</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c cspan="6">
            <hr/>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>B*4403</p>
         </c>
         <c ca="left">
            <p>B*4402</p>
         </c>
         <c ca="right">
            <p>0.60</p>
         </c>
         <c ca="left">
            <p>DRB1*0701</p>
         </c>
         <c ca="left">
            <p>DRB1*0404</p>
         </c>
         <c ca="right">
            <p>0.63</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c cspan="6">
            <hr/>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>A*2403</p>
         </c>
         <c ca="left">
            <p>A*2301</p>
         </c>
         <c ca="right">
            <p>0.54</p>
         </c>
         <c ca="left">
            <p>DRB1*1501</p>
         </c>
         <c ca="left">
            <p>DRB1*0404</p>
         </c>
         <c ca="right">
            <p>0.62</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c cspan="6">
            <hr/>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>B*4403</p>
         </c>
         <c ca="left">
            <p>B*4002</p>
         </c>
         <c ca="right">
            <p>0.51</p>
         </c>
         <c ca="left">
            <p>DRB1*0405</p>
         </c>
         <c ca="left">
            <p>DRB1*0404</p>
         </c>
         <c ca="right">
            <p>0.58</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c cspan="6">
            <hr/>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>B*4501</p>
         </c>
         <c ca="left">
            <p>B*4403</p>
         </c>
         <c ca="right">
            <p>0.50</p>
         </c>
         <c ca="left">
            <p>DRB1*1101</p>
         </c>
         <c ca="left">
            <p>DRB1*0404</p>
         </c>
         <c ca="right">
            <p>0.55</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c cspan="6">
            <hr/>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>A*2403</p>
         </c>
         <c ca="left">
            <p>A*2402</p>
         </c>
         <c ca="right">
            <p>0.49</p>
         </c>
         <c ca="left">
            <p>DRB1*1501</p>
         </c>
         <c ca="left">
            <p>DRB1*0701</p>
         </c>
         <c ca="right">
            <p>0.54</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c cspan="6">
            <hr/>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>B*4403</p>
         </c>
         <c ca="left">
            <p>B*1801</p>
         </c>
         <c ca="right">
            <p>0.47</p>
         </c>
         <c ca="left">
            <p>DRB5*0101</p>
         </c>
         <c ca="left">
            <p>DRB1*1101</p>
         </c>
         <c ca="right">
            <p>0.53</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c cspan="6">
            <hr/>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>B*4501</p>
         </c>
         <c ca="left">
            <p>B*4002</p>
         </c>
         <c ca="right">
            <p>0.45</p>
         </c>
         <c ca="left">
            <p>DRB4*0101</p>
         </c>
         <c ca="left">
            <p>DRB1*1501</p>
         </c>
         <c ca="right">
            <p>0.52</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c cspan="6">
            <hr/>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>B*5301</p>
         </c>
         <c ca="left">
            <p>B*5101</p>
         </c>
         <c ca="right">
            <p>0.41</p>
         </c>
         <c ca="left">
            <p>DRB5*0101</p>
         </c>
         <c ca="left">
            <p>DRB1*0404</p>
         </c>
         <c ca="right">
            <p>0.52</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c cspan="6">
            <hr/>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>B*5301</p>
         </c>
         <c ca="left">
            <p>A*2402</p>
         </c>
         <c ca="right">
            <p>0.40</p>
         </c>
         <c ca="left">
            <p>DRB1*1101</p>
         </c>
         <c ca="left">
            <p>DRB1*0802</p>
         </c>
         <c ca="right">
            <p>0.51</p>
         </c>
      </r>
      <r>
         <c cspan="7">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Bottom Ten</p>
         </c>
         <c ca="left">
            <p>B*5701</p>
         </c>
         <c ca="left">
            <p>B*1801</p>
         </c>
         <c ca="right">
            <p>-0.26</p>
         </c>
         <c ca="left">
            <p>DRB3*0101</p>
         </c>
         <c ca="left">
            <p>DRB1*0405</p>
         </c>
         <c ca="right">
            <p>0.23</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c cspan="6">
            <hr/>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>B*4501</p>
         </c>
         <c ca="left">
            <p>A*2601</p>
         </c>
         <c ca="right">
            <p>-0.26</p>
         </c>
         <c ca="left">
            <p>DRB1*1302</p>
         </c>
         <c ca="left">
            <p>DRB1*0405</p>
         </c>
         <c ca="right">
            <p>0.23</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c cspan="6">
            <hr/>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>B*5701</p>
         </c>
         <c ca="left">
            <p>B*5401</p>
         </c>
         <c ca="right">
            <p>-0.27</p>
         </c>
         <c ca="left">
            <p>DRB1*1302</p>
         </c>
         <c ca="left">
            <p>DRB1*1101</p>
         </c>
         <c ca="right">
            <p>0.23</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c cspan="6">
            <hr/>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>A*3002</p>
         </c>
         <c ca="left">
            <p>A*2301</p>
         </c>
         <c ca="right">
            <p>-0.28</p>
         </c>
         <c ca="left">
            <p>DRB1*0901</p>
         </c>
         <c ca="left">
            <p>DRB1*0301</p>
         </c>
         <c ca="right">
            <p>0.22</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c cspan="6">
            <hr/>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>B*4501</p>
         </c>
         <c ca="left">
            <p>A*6801</p>
         </c>
         <c ca="right">
            <p>-0.29</p>
         </c>
         <c ca="left">
            <p>DRB3*0101</p>
         </c>
         <c ca="left">
            <p>DRB1*0101</p>
         </c>
         <c ca="right">
            <p>0.22</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c cspan="6">
            <hr/>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>B*4403</p>
         </c>
         <c ca="left">
            <p>A*3002</p>
         </c>
         <c ca="right">
            <p>-0.29</p>
         </c>
         <c ca="left">
            <p>DRB1*0802</p>
         </c>
         <c ca="left">
            <p>DRB1*0301</p>
         </c>
         <c ca="right">
            <p>0.21</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c cspan="6">
            <hr/>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>B*4002</p>
         </c>
         <c ca="left">
            <p>A*6801</p>
         </c>
         <c ca="right">
            <p>-0.31</p>
         </c>
         <c ca="left">
            <p>DRB3*0101</p>
         </c>
         <c ca="left">
            <p>DRB1*1101</p>
         </c>
         <c ca="right">
            <p>0.20</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c cspan="6">
            <hr/>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>B*4402</p>
         </c>
         <c ca="left">
            <p>A*2902</p>
         </c>
         <c ca="right">
            <p>-0.31</p>
         </c>
         <c ca="left">
            <p>DRB5*0101</p>
         </c>
         <c ca="left">
            <p>DRB3*0101</p>
         </c>
         <c ca="right">
            <p>0.20</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c cspan="6">
            <hr/>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>B*4501</p>
         </c>
         <c ca="left">
            <p>A*2301</p>
         </c>
         <c ca="right">
            <p>-0.32</p>
         </c>
         <c ca="left">
            <p>DRB1*0301</p>
         </c>
         <c ca="left">
            <p>DRB1*0101</p>
         </c>
         <c ca="right">
            <p>0.20</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c cspan="6">
            <hr/>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>B*4403</p>
         </c>
         <c ca="left">
            <p>A*2902</p>
         </c>
         <c ca="right">
            <p>-0.34</p>
         </c>
         <c ca="left">
            <p>DRB3*0101</p>
         </c>
         <c ca="left">
            <p>DRB1*0802</p>
         </c>
         <c ca="right">
            <p>0.19</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c cspan="6">
            <hr/>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>B*4002</p>
         </c>
         <c ca="left">
            <p>A*3002</p>
         </c>
         <c ca="right">
            <p>-0.46</p>
         </c>
         <c ca="left">
            <p>DRB3*0101</p>
         </c>
         <c ca="left">
            <p>DRB1*0901</p>
         </c>
         <c ca="right">
            <p>0.15</p>
         </c>
      </r>
   </tblbdy><tblfn>
      <p>The Pearson correlation coefficients were computed for all allele combinations on a random sample of 1000 15-mers (MHC-II) and 1000 9-mers (MHC-I) from the <it>Staphylococcus aureus </it>COL dataset. All coefficients shown are significant (p &lt; 0.0001). Note that for MHC-I there are both negatively and positively correlated pairs; some of the MHC-I coefficients near 0.0 were not statistically significant.</p>
   </tblfn></tbl>
<p>It also became apparent that, despite the large differences in affinities between peptides, for a particular peptide some of the ic50 values were highly correlated across MHC alleles. The Pearson correlation coefficients for the top ten and bottom ten pairwise comparisons are shown in Table <tblr tid="T4">4</tblr>. For MHC-II all of the pairwise correlations are statistically significant and positive, though of varying magnitude. For MHC-I there is a subset that is positively correlated, another that is negatively correlated and a third group of non-correlated alleles.</p>
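<p>The pairwise comparison behind Table 4 can be sketched as follows. This is an illustrative Python example only: it computes the Pearson coefficient for every pair of alleles from per-peptide ln(ic50) values, and the allele labels and numbers are hypothetical rather than taken from the dataset.</p>

```python
import math
from itertools import combinations


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) in (0, 1):
        raise ValueError("need two sequences of equal length, at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def pairwise_correlations(ln_ic50_by_allele):
    """Map each unordered allele pair to the r of their per-peptide values."""
    return {
        (a, b): pearson_r(ln_ic50_by_allele[a], ln_ic50_by_allele[b])
        for a, b in combinations(sorted(ln_ic50_by_allele), 2)
    }


# Hypothetical ln(ic50) values for the same four peptides against three alleles.
toy = {
    "allele_A": [1.0, 2.0, 3.0, 4.0],
    "allele_B": [2.0, 4.0, 6.0, 8.0],
    "allele_C": [4.0, 3.0, 2.0, 1.0],
}
correlations = pairwise_correlations(toy)
```

<p>Sorting the resulting dictionary by value would give the top and bottom pairs in the manner of Table 4.</p>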
</sec>
<sec>
<st>
<p>Windowing of High Affinity Binding</p>
</st>
<p>The positive correlations among MHC alleles and the other statistical characteristics led us to experiment with methods of computing binding metrics that encompassed a population of heterozygotic combinations; effectively a population phenotype.</p>
<p>From an immunological perspective the low affinity peptides are irrelevant; it is more useful to calculate a running average of high affinities in a way that captures the undulations in binding affinity across a protein sequence, while also capturing the population phenotype. Based on these concepts we developed a system to compute an average of standardized affinities for the permuted pairs for all supertypes within an adjustable (filtering) window. The window is defined as a stretch of contiguous amino acid positions over which averaging is carried out. Various windows (filtering stringencies) were tested, but the most useful smoothing was achieved with a window of &#177; half the size of the binding pocket (&#177; 4 amino acids), as shown in Figure <figr fid="F4">4</figr>.</p>
<fig id="F4"><title><p>Figure 4</p></title><caption><p>Demonstration of application of a binding window around a high affinity binding 15-mer</p></caption><text>
   <p><b>Demonstration of application of a binding window around a high affinity binding 15-mer</b>. (A) Actual standardized binding affinity; the N-terminus of the 15-mer begins at the point plotted. (B) Peptide movement window of &#177; 4 amino acids and (C) a binding window of &#177; 7 amino acids. The fine line in panels B and C is identical to that plotted in A. Semi-transparent colors: yellow = extracellular, green = transmembrane domains, and pink = intracellular, as predicted by Phobius. Protein: Thermonuclease precursor <it>Staphylococcus aureus </it>COL gi 57650135.</p>
</text><graphic file="1745-7580-6-8-4" hint_layout="single"/></fig>
<p>For MHC-II this is reasonably simple to envisage, as the ends of the pocket are open and peptides longer than 15 amino acids could undergo rapid association:dissociation "jiggling" until the highest binding configuration is found. In practice, the range of affinity constants in a pool of peptides may be as much as 1000-fold, so that higher affinity peptides will very rapidly occupy the MHC binding site and remain bound in place. For MHC-I, with closed ends on the binding pocket, the possibilities are more limited.</p>
<p>Another factor, which has not been included in the predictions at this point, is the effect of differential proteolysis, which will contribute to the variable lengths of peptides available to interact with a binding pocket. Several tests of the potential impact of proteasomal cleavage were carried out with the webserver NetChop 3.1 at CBS on sample protein sets (not shown). From those experiments it appears that peptides are very likely to be cut into pieces shorter than 9 amino acids, so that the MHC-I presentation of a peptide is the result of interception and capture by the MHC binding reaction during proteolytic cleavage. The patterns of proteolytic cleavage of proteins by lysosomal enzymes suggest that they would be equally aggressive and that peptide processing for MHC-II presentation would be expected to be comparable.</p>
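<p>The windowed averaging described in this section can be illustrated with a short Python sketch. It assumes a simple centered running mean over &#177; 4 positions, truncated at the ends of the protein; the exact filter used in the system is not specified at this level of detail, and the function name is hypothetical.</p>

```python
def windowed_mean(scores, half_width=4):
    """Centered running mean over [i - half_width, i + half_width].

    The window is truncated at the sequence ends, so edge positions are
    averaged over fewer values.
    """
    n = len(scores)
    smoothed = []
    for i in range(n):
        lo = max(0, i - half_width)
        hi = min(n, i + half_width + 1)
        window = scores[lo:hi]
        smoothed.append(sum(window) / len(window))
    return smoothed


# Hypothetical per-position scores: one isolated spike, smoothed with +/- 1.
example = windowed_mean([0.0, 0.0, 9.0, 0.0, 0.0], half_width=1)
```

<p>A wider window (for example &#177; 7) gives stronger smoothing at the cost of positional resolution, which is the trade-off illustrated in panels B and C of Figure 4.</p>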
</sec>
<sec>
<st>
<p>Permuting Windows for the Population</p>
</st>
<p>The permuted minima within the window described above (and shown in Figure <figr fid="F4">4</figr>) are averaged to arrive at a single number for all MHC-I allelic combinations and another for all MHC-II combinations for each particular amino acid position. Through experimentation we found that this process produced metrics whose undulations tracked the visually obvious patterns of MHC binding in proteins (as seen in Figure <figr fid="F3">3</figr>). As the numbers were drawn from a standardized dataset, the resulting samples were also normally distributed, albeit at a (negative) distance from the mean of the population as a whole. Thus, statistical thresholds based on normal populations could be applied to these metrics. Each allelic combination was given an equal weight. This is a generalizable concept; it is also possible to compute the predicted population phenotype for various subpopulations using appropriate weighting for genetic frequencies. The windowing operation is not dependent on standardized populations, and the actual ln(ic50) can be used as well to compute a running average of binding affinity for any single allele.</p>
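<p>A minimal Python sketch of the permutation step is given here for illustration. It assumes that, for each allele pair (homozygous pairs included), the lower of the two standardized scores at a position is taken, and that these minima are then averaged over all pairs with optional weights. This is one reading of "permuted minima", not a confirmed description of the implementation; windowing is omitted, and all names and values are hypothetical.</p>

```python
from itertools import combinations_with_replacement


def population_metric(z_by_allele, weights=None):
    """Average, over all allele pairs, of the per-position pair minimum.

    z_by_allele maps allele name -> list of standardized scores per position.
    weights optionally maps (allele_a, allele_b) -> weight; default is equal.
    """
    alleles = sorted(z_by_allele)
    pairs = list(combinations_with_replacement(alleles, 2))
    if weights is None:
        weights = {pair: 1.0 for pair in pairs}
    total_weight = sum(weights[pair] for pair in pairs)
    n_positions = len(z_by_allele[alleles[0]])
    metric = []
    for i in range(n_positions):
        weighted_sum = sum(
            weights[(a, b)] * min(z_by_allele[a][i], z_by_allele[b][i])
            for a, b in pairs
        )
        metric.append(weighted_sum / total_weight)
    return metric


# Hypothetical standardized scores for two alleles at two positions.
toy = {"allele_A": [0.0, -2.0], "allele_B": [-1.0, 1.0]}
metric = population_metric(toy)
```

<p>With 14 alleles this enumeration yields 105 pairs, matching the number of heterozygous and homozygous combinations reported for Table 3. Supplying a weights dictionary built from genotype frequencies would give the subpopulation variant mentioned above.</p>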
<p>The output of these computational processes was tabulated in a master database for the organism (Figure <figr fid="F1">1</figr>, Step 13). Coincident epitope groups were then selected, comprising regions of proteins in which peptides met three criteria: the binding threshold for MHC and the B-cell epitope probability threshold were both in the 10 percentile range, and the run of amino acids in the predicted BEPI peptide was &#8805;4 amino acids. For two normally distributed variables, selecting the 10th percentile in each should, on a probability basis, yield the product of the two probabilities, or about a 1% coincidence in which MHC binding regions overlap either partially or completely with predicted B-epitope regions.</p>
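<p>The expected coincidence rate of about 1% follows from the product rule. The worked step below assumes that the two 10th percentile selections are statistically independent, which is an assumption of this illustration rather than a result shown in the text.</p>

```latex
P(\text{MHC} \cap \text{B-cell})
  = P(\text{MHC}) \, P(\text{B-cell})
  = 0.10 \times 0.10
  = 0.01
```

<p>An observed coincidence rate well above 1% would therefore suggest that high affinity MHC binding and predicted B-cell epitopes are not independently located.</p>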
</sec>
<sec>
<st>
<p>Graphical Display</p>
</st>
<p>Figure <figr fid="F5">5</figr> shows an annotated example of the graphical output from the system we have described above.</p>
<fig id="F5"><title><p>Figure 5</p></title><caption><p>Annotated multidimensional overlay graphics of integrated analysis</p></caption><text>
   <p><b>Annotated multidimensional overlay graphics of integrated analysis</b>. <it>Cryptosporidium parvum </it>(Iowa II) hypothetical protein cdg5_540 (GI: 126649159) is shown as an example. The portion of the overlay graphic shown contains annotations related to various cellular features and protein topology as well as the standardized predictions critical to the immunological recognition of the protein. This provides a graphical means of visualizing a multidimensional database of related information. At various magnification levels actual peptide sequences can be visualized, as well as experimentally mapped locations for any desired HLA molecule.</p>
</text><graphic file="1745-7580-6-8-5" hint_layout="double"/></fig>
</sec>
<sec>
<st>
<p>Examples of Retrospective Comparisons</p>
</st>
<sec>
<st>
<p>AntiJen multi species benchmark dataset</p>
</st>
<p>A summary table of our findings on the AntiJen dataset and representative examples of graphical output for a cross section of these proteins is provided in Additional File <supplr sid="S2">2</supplr>, Table S2b and Additional File <supplr sid="S2">2</supplr>, Figure S2c.</p>
<p>The AntiJen data set comprises proteins from viruses, bacteria, protozoa, mammals, plants, and a number of other sources. Some are surface proteins. B-cell epitopes were found to be located predominantly in the external surface loops, and to a lesser degree in the cytoplasm. As with <it>Staph. aureus</it>, we observed CEGs in all types of proteins. A large percentage (&gt;20%) of B-cell epitopes were affiliated (i.e. overlapping or with their borders within 3 amino acids) with one or more MHC-I or MHC-II high affinity binding peptides. Over 78% of MHC-I high affinity binding peptides were affiliated with one or more B-cell epitopes, as were &gt;95% of MHC-II binding domains. MHC-I and MHC-II high affinity binding domains tended to be affiliated with each other.</p>
<p>Predicted epitopes were more prevalent in membrane associated proteins. Many proteins, particularly those with no transmembrane regions, had quite sparse epitope distribution; nevertheless, CEGs were observed in most.</p>
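<p>The affiliation criterion used above (segments overlapping, or with borders within 3 amino acids) can be expressed as a simple interval test. The Python sketch below is illustrative only: it assumes inclusive (start, end) residue positions and measures the distance between the nearest borders, which is one reading of the criterion and not necessarily the procedure used to generate the reported percentages. All names and coordinates are hypothetical.</p>

```python
def is_affiliated(seg_a, seg_b, max_gap=3):
    """True if two inclusive (start, end) segments overlap or lie within max_gap."""
    a_start, a_end = seg_a
    b_start, b_end = seg_b
    # Distance between nearest borders; zero or negative when segments overlap.
    distance = max(a_start, b_start) - min(a_end, b_end)
    return max_gap >= distance


def fraction_affiliated(epitopes, binders, max_gap=3):
    """Fraction of epitope segments affiliated with at least one binder segment."""
    if not epitopes:
        raise ValueError("no epitope segments supplied")
    hits = sum(
        1 for e in epitopes if any(is_affiliated(e, b, max_gap) for b in binders)
    )
    return hits / len(epitopes)


# Hypothetical B-cell epitope segments and MHC binding 15-mers on one protein.
b_cell_epitopes = [(1, 10), (30, 40), (100, 110)]
mhc_binders = [(12, 26), (50, 64)]
fraction = fraction_affiliated(b_cell_epitopes, mhc_binders)
```

<p>Swapping the two arguments gives the complementary statistic, the fraction of MHC binding peptides affiliated with at least one B-cell epitope.</p>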
</sec>
<sec>
<st>
<p>Staphylococcus aureus</p>
</st>
<p>We have completed analyses of B-cell epitopes, MHC-II binding, and topology for 15 strains of <it>Staph. aureus </it>(strains included are listed in Additional File <supplr sid="S3">3</supplr>, Table S3b). Table <tblr tid="T5">5</tblr> summarizes the results of the analysis, based on predicted average minimum binding permuted for an immunogenetically diverse heterozygous host population. The overlaps of the various epitopes were computed using a SOM algorithm on the centroids and dispersions of the predicted epitope segments.</p>
<tbl id="T5"><title><p>Table 5</p></title><caption><p>Summary of analysis of <it>Staphylococcus aureus </it>strains</p></caption><tblbdy cols="2">
      <r>
         <c ca="left">
            <p>
               <b>Output for <it>Staph aureus </it>COL</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>Metric</b>
            </p>
         </c>
      </r>
      <r>
         <c cspan="2">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>No. of Proteins in <it>Staph aureus </it>COL proteome</p>
         </c>
         <c ca="left">
            <p>2615</p>
         </c>
      </r>
      <r>
         <c cspan="2">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>No. of Surface Proteins (with Transmembrane Helices)</p>
         </c>
         <c ca="left">
            <p>649</p>
         </c>
      </r>
      <r>
         <c cspan="2">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Number of Proteins with Signal Peptides</p>
         </c>
         <c ca="left">
            <p>255</p>
         </c>
      </r>
      <r>
         <c cspan="2">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Sub-proteome analyzed (secreted and membrane affiliated)</p>
         </c>
         <c ca="left">
            <p>835</p>
         </c>
      </r>
      <r>
         <c cspan="2">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Total B-cell epitopes > 4 aa long</p>
         </c>
         <c ca="left">
            <p>14,089</p>
         </c>
      </r>
      <r>
         <c cspan="2">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Total B-cell epitopes overlapping or bordering within 3 aa of an MHC-II high affinity binding peptide</p>
         </c>
         <c ca="left">
            <p>4,527</p>
         </c>
      </r>
      <r>
         <c cspan="2">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Percentage of B-cell epitopes overlapping or bordering within 3 aa of an MHC-II high affinity binding peptide</p>
         </c>
         <c ca="left">
            <p>32.13%</p>
         </c>
      </r>
      <r>
         <c cspan="2">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Total MHC-II high affinity binding peptides</p>
         </c>
         <c ca="left">
            <p>3,230</p>
         </c>
      </r>
      <r>
         <c cspan="2">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <b>Output for 15 Strains of <it>Staph. aureus</it></b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>Metric</b>
            </p>
         </c>
      </r>
      <r>
         <c cspan="2">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Proteomes of <it>Staphylococcus aureus </it>strains analyzed</p>
         </c>
         <c ca="left">
            <p>15</p>
         </c>
      </r>
      <r>
         <c cspan="2">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Unique CEGs detected (all strains)</p>
         </c>
         <c ca="left">
            <p>5646</p>
         </c>
      </r>
      <r>
         <c cspan="2">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>CEGs conserved in 15/15 strains</p>
         </c>
         <c ca="left">
            <p>572</p>
         </c>
      </r>
      <r>
         <c cspan="2">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Additional CEGs conserved in 14/15 strains</p>
         </c>
         <c ca="left">
            <p>364</p>
         </c>
      </r>
      <r>
         <c cspan="2">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Median CEG (amino acids)</p>
         </c>
         <c ca="left">
            <p>25</p>
         </c>
      </r>
      <r>
         <c cspan="2">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Minimum CEG (amino acids)</p>
         </c>
         <c ca="left">
            <p>15</p>
         </c>
      </r>
      <r>
         <c cspan="2">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Maximum CEG (amino acids)</p>
         </c>
         <c ca="left">
            <p>60</p>
         </c>
      </r>
      <r>
         <c cspan="2">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Proteins conserved in 15 strains with conserved CEGs (secreted and membrane affiliated)</p>
         </c>
         <c ca="left">
            <p>140</p>
         </c>
      </r>
   </tblbdy><tblfn>
      <p><it>Staphylococcus aureus </it>COL is used as an example of the properties analyzed and observed within a strain. The data from 15 strains of <it>Staph. aureus </it>are summarized in the lower half of the table. The strains are listed in Additional File <supplr sid="S3">3</supplr>, Table S3b. Epitopes conserved in 14 of 15 strains tend to have only a single amino acid change. Epitopes characterized as B-cell epitopes are within the top 25% on a permuted population basis. Peptides characterized as MHC-II high affinity binding peptides are in the top 25% of binding affinities on a permuted population basis, as defined in Table 4. A CEG is a coincident epitope group comprising a stretch of amino acids where overlapping or adjacent B-cell epitopes as well as MHC high affinity binding peptides are predicted.</p>
   </tblfn></tbl>
<p>Of all predicted B-cell epitopes in the <it>Staph. aureus </it>COL strain, 32.13% were found to overlap or be affiliated with MHC-II high affinity binding peptides; 66.54% of MHC-II high affinity binding peptides were found in affiliation with B-cell epitopes.</p>
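<p>As a quick arithmetic check, the percentages and totals quoted here can be recomputed from the counts in Table 5. The short Python sketch below is illustrative only: the variable names are arbitrary, and the number of MHC-II peptides affiliated with B-cell epitopes is back-calculated from the reported 66.54% rather than taken from the table.</p>

```python
# Counts copied from Table 5 and the accompanying text.
b_cell_epitopes = 14089          # total B-cell epitopes longer than 4 aa
b_cell_near_mhc2 = 4527          # overlapping or within 3 aa of an MHC-II binder
mhc2_binders = 3230              # total MHC-II high affinity binding peptides
mhc2_fraction_near_b_cell = 0.6654   # reported as 66.54% in the text
cegs_total = 5646
cegs_all_15 = 572
cegs_14_of_15 = 364
surfome_conserved = 98
secretome_conserved = 42

# Percentage of B-cell epitopes affiliated with an MHC-II binder.
pct_b_cell = 100 * b_cell_near_mhc2 / b_cell_epitopes

# Back-calculated (approximate) count of MHC-II binders near a B-cell epitope.
approx_mhc2_near_b_cell = round(mhc2_fraction_near_b_cell * mhc2_binders)

# CEGs found in at least 14 of the 15 strains, and their share of all CEGs.
cegs_at_least_14 = cegs_all_15 + cegs_14_of_15
pct_cegs_at_least_14 = 100 * cegs_at_least_14 / cegs_total

# Conserved proteins carrying conserved CEGs (surfome plus secretome).
conserved_proteins = surfome_conserved + secretome_conserved

print(f"{pct_b_cell:.2f}% of B-cell epitopes border an MHC-II binder")
print(f"about {approx_mhc2_near_b_cell} MHC-II binders border a B-cell epitope")
print(f"{cegs_at_least_14} CEGs ({pct_cegs_at_least_14:.1f}%) occur in 14 or more strains")
print(f"{conserved_proteins} conserved proteins carry conserved CEGs")
```

<p>The recomputed values agree with the 32.13% and the 140 conserved proteins reported in Table 5.</p>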
<p>Within the 15 strains of <it>Staph. aureus </it>we mapped a total of 5646 CEGs. Of these, 572 were conserved across all 15 strains. A further 364 were found in 14 of the 15 strains with usually only a single amino acid change in the one non-conserved strain. Of the approximately 2615 proteins making up the proteome of each of the 15 strains, 98 proteins within the surfome and a further 42 proteins in the secretome were conserved across all strains and thus contained conserved CEGs.</p>
<p>To evaluate our observations alongside the experimental findings of others, we identified a number of publications which provide characterization of epitopes within five <it>Staph. aureus </it>proteins, documenting both predicted B-cell epitopes and MHC binding. In Additional File <supplr sid="S3">3</supplr>, Table S3a we tabulate the correlation between our observations of CEGs and the experimental findings for these. Much of the data pre-dates genome sequencing projects and thus reconciling the literature with GenBank is challenging. For example, in some cases peptides were reported with amino acid numbering without consideration of the signal peptide cleavage in the GenBank proteome repository. In some cases, such as with the staphylococcal toxins, the cleavage is substantially distal from the starting methionine. Figure <figr fid="F6">6</figr> shows the graphical plots with the published experimental results overlaid for two proteins; additional plots are found in Additional File <supplr sid="S3">3</supplr>, Figures S3c and S3d. We caution that our plots show the permuted human population average positions for MHC-II binding; publications may report on experimental results derived from single HLA or mouse MHCs. Overall there is remarkable correlation. Where the most detailed fine epitope mapping is compared to our prediction, the mapped contact points either overlap or are within 5 aa. In other cases, where mapping is not so detailed, the overlap is extensive. We have predicted additional high affinity MHC-I and MHC-II binding peptides not identified in the literature.</p>
<fig id="F6"><title><p>Figure 6</p></title><caption><p>Predicted and experimentally mapped epitope regions for two proteins from <it>Staphylococcus aureus </it>COL (NC_002951)</p></caption><text>
   <p><b>Predicted and experimentally mapped epitope regions for two proteins from <it>Staphylococcus aureus </it>COL (NC_002951)</b>. Graphic overlay using proteome-wide standardization of scoring metrics. The blue line is the predicted population phenotypic MHC-II binding. This is computed as described in the methods for fourteen MHC-II alleles permuted as dizygotic combinations within a window &#177; 4 of the amino acid position indicated. The orange line is the predicted probability that the particular amino acid lies within a B-cell epitope. Actual computed data points are plotted along with the line that is the result of smoothing with a polynomial filter <abbrgrp><abbr bid="B49">49</abbr></abbrgrp>. Blue horizontal bands are the regions of high probability MHC-II binding phenotype and orange horizontal bars are high probability predicted B-cell epitope regions. The percentile probabilities used as the threshold are as described in the text and are indicated by the number within the box at the left. The red diamonds (or groups thereof) are experimentally mapped regions of Ig binding. The experimental mapping is described in more detail in Additional File <supplr sid="S3">3</supplr>, Table S3a. (A) LPXTG cell wall surface anchor protein, IsdB (GI:57651738) <abbrgrp><abbr bid="B51">51</abbr></abbrgrp>. (B) ABC transporter, ATP-binding protein (GI:57651892) <abbrgrp><abbr bid="B52">52</abbr></abbrgrp>.</p>
</text><graphic file="1745-7580-6-8-6" hint_layout="double"/></fig>
</sec>
<sec>
<st>
<p>Staph. aureus Iron Regulated Determinant B (IsdB) NC_002951.57651738</p>
</st>
<p>The <it>Staph. aureus </it>protein IsdB is a vaccine candidate; recent papers characterize its immunological features <abbrgrp>
<abbr bid="B50">50</abbr>
<abbr bid="B51">51</abbr>
</abbrgrp>. In Figure <figr fid="F6">6A</figr> predicted regions of MHC-II binding peptides and predicted B-cell epitope regions are shown along with the positions at which point mutations abrogated monoclonal antibody binding <abbrgrp>
<abbr bid="B51">51</abbr>
</abbrgrp>. Several features are noteworthy in the patterns seen, as they appear commonly in proteins we have reviewed. First, the population permuted high affinity binding regions vary by over 2 standard deviation units. This corresponds to over a 1000-fold range in average highest predicted affinity binding. Regions where predicted high affinity human MHC binding peptides are absent sometimes extend over several hundred amino acids (see also <it>Staph. aureus </it>Protein A, Additional File <supplr sid="S3">3</supplr>, Figure S3d). Second, regions with a paucity of high affinity MHC binding peptides tend to have long stretches of high probability B-cell epitopes. All the regions of monoclonal antibody contacts mapped by Brown <it>et al </it>
<abbrgrp>
<abbr bid="B51">51</abbr>
</abbrgrp> are located in regions where the population would be predicted to have high affinity MHC-II binding, and the alignment with the predicted B-cell epitopes is strong. It should be pointed out that the 25th percentile (standardized) B-cell probability threshold is twice the stringency recommended by the BepiPred server; using a lower stringency resulted in all the mapped B-cell epitopes overlapping the predictions. The muteins produce a fine map of discontinuous epitopes and, thus, while the B-cell epitope prediction algorithms can only predict individual linear peptide probabilities, it appears that they reliably predict sub-segments of discontinuous epitopes.</p>
</sec>
<sec>
<st>
<p>Staph. aureus ABC transporter protein NC_002951.57651892</p>
</st>
<p>Figure <figr fid="F6">6B</figr> shows the predicted and mapped patterns for an ABC transporter protein. This protein was identified as a target for clinical immunotherapeutic development based on the work of Burnie <it>et al </it>
<abbrgrp>
<abbr bid="B52">52</abbr>
</abbrgrp>, which documents antibodies recognizing this protein in the sera of patients with staphylococcal septicemia. Unfortunately, neither the HLA of the patients, nor the strains of <it>Staph. aureus </it>were documented. There are several families of this protein and they are present in all strains of <it>Staph. aureus</it>. It is predicted by virtue of its signature motif as an ATP transporter to be a membrane protein. We include proteins of this sort in our surfome dataset by the use of PSORTb <abbrgrp>
<abbr bid="B53">53</abbr>
</abbrgrp> regular expression motif identifiers. We found the equivalent protein in all <it>Staph. aureus </it>strains currently available to fall into two groups differing by two point mutations. A notable feature of the ABC transporter is that all the mapped monoclonal antibody binding regions are in areas predicted to have high affinity MHC-II binding. Two caveats about the results point to a common issue when attempting to reconcile predictions with older studies. The mapped regions were selected for more detailed study by Burnie <it>et al </it>
<abbrgrp>
<abbr bid="B52">52</abbr>
</abbrgrp> simply because the ELISA results for the peptides were more than two standard deviations above the mean. Hence there is not a complete panel of experimental results against which to compare the B-cell epitope predictions. Clearly the experiments would also have produced a number of peptides with results more than one standard deviation above the mean, but those were not reported. As a result, the curators of the IEDB have classified all other peptides in this protein as "B-cell epitope (true negative)". No MHC binding results were reported.</p>
</sec>
<sec>
<st>
<p>Vaccinia</p>
</st>
<p>The complete proteome for VACV Western Reserve was downloaded from GenBank and processed as described above. We generated graphical output for all the proteins and then compared the output for proteins reported as containing immunodominant binding T-cell epitopes <abbrgrp>
<abbr bid="B54">54</abbr>
<abbr bid="B55">55</abbr>
</abbrgrp>. Figure <figr fid="F7">7</figr> shows graphical output for I1L (GI:68275867). Additional File <supplr sid="S7">7</supplr> shows comparable output for proteins A10L (GI:68275926), A14L (GI:68275930), and A17L (GI:68275934).</p>
<fig id="F7"><title><p>Figure 7</p></title><caption><p>Overlay epitope maps of locus I1L (GI:68275867) from Vaccinia virus Western Reserve</p></caption><text>
   <p><b>Overlay epitope maps of locus I1L (GI:68275867) from Vaccinia virus Western Reserve</b>. (A) Vertical lines (dark red) are the N-terminal positions of predicted high affinity binding 9-mer peptides for A*0201 predicted by neural net regression. (B) Vertical lines are the N-terminal positions of predicted high affinity binding 9-mer peptides for A*1101 (red) and B*0702 (blue) predicted by neural net regression. (C) Higher resolution showing fine detail of A*0201 mapping. In all three panels the experimental overlay is for MHC I 9-mer peptides mapped in HLA A*0201/K<sup>b </sup>transgenic mice <abbrgrp><abbr bid="B55">55</abbr></abbrgrp>. Symbols as described in legend to Figure 6. Background is unshaded because this protein is predicted to lack any membrane domains.</p>
</text><graphic file="1745-7580-6-8-7" hint_layout="double"/></fig>
<suppl id="S7">
<title>
<p>Additional File 7</p>
</title>
<text>
<p>
<b>Vaccinia additional figures (PDF)</b>.</p>
</text>
<file name="1745-7580-6-8-S7.PDF">
   <p>Click here for file</p>
</file>
</suppl>
<p>The experimental studies by Pasquetto <abbrgrp>
<abbr bid="B55">55</abbr>
</abbrgrp>, to which we made comparisons, were done in transgenic mice carrying human MHC-I molecules. Thus they represent perhaps the clearest attempt to match <it>in silico </it>predictions to experimental human MHC binding. Figure <figr fid="F7">7</figr> depicts plots for protein I1L shown at two different magnifications, to enable the visualization of peptide sequences in the overlays. As I1L lacks transmembrane domains the background has been left uncolored. The colored vertical lines indicate the specific location of the leading edge (N-terminus of a 9-mer) of predicted high affinity peptides for the particular indicated HLA. The colored lines extend below the permuted population average and indicate that the specific HLA shows higher affinity binding for that peptide than does the population as a whole. Also shown are the locations of predicted B-cell epitopes. Notably, the peptides experimentally mapped by Pasquetto <it>et al </it>
<abbrgrp>
<abbr bid="B55">55</abbr>
</abbrgrp> (and shown in Figure <figr fid="F7">7</figr> by red diamonds) are ones with predicted binding affinity of at least 2.5 standard deviations below the mean.</p>
<p>Protein I1L was also reported to contain a B-cell epitope, which led to the suggestion that B-cell and T-cell epitopes are deterministically linked within the same protein <abbrgrp>
<abbr bid="B7">7</abbr>
</abbrgrp>. Based on the permuted population phenotype, we predict MHC-I and MHC-II high affinity binding peptides, and multiple B-cell epitopes, affiliated in three CEGs. The predictions for each HLA used in transgenic mice by Pasquetto <abbrgrp>
<abbr bid="B55">55</abbr>
</abbrgrp> were examined. HLA-A*0201 (Figure <figr fid="F7">7A</figr> and at higher resolution in 7C) shows a peak of very high affinity binding for the aa 211-219 peptide RLYDYFTRV, a remarkable 3.95 standard deviations below the mean. The predicted initial amino acid of this peak binding coincides exactly with the initial arginine in the 9-mer described by Pasquetto <abbrgrp>
<abbr bid="B55">55</abbr>
</abbrgrp>. Interestingly, we also predict that HLA-A*0201 mice should detect binding of similarly high affinity starting at amino acid 74. As there are ten B-cell binding regions in the top 25% probability, any one or a combination of these could account for the linked epitope response noted by Sette <it>et al </it>
<abbrgrp>
<abbr bid="B7">7</abbr>
</abbrgrp>; however, a group of three predicted B-cell epitopes lies within positions 198-233. Figure <figr fid="F7">7B</figr> shows the binding affinities predicted for HLA-A*1101 and HLA-B*0702. There are also high peaks of affinity, but they are not coincident with those of HLA-A*0201.</p>
</sec>
</sec>
</sec>
<sec>
<st>
<p>Discussion</p>
</st>
<sec>
<st>
<p>Epitope Analysis System</p>
</st>
<p>We describe an integrated epitope analysis system which is based on multi-dimensional and orthogonal physicochemical properties of sequences of amino acids using a multilayer perceptron neural net to conduct QSAR regression predictions for peptide affinities to 35 MHC-I and 14 MHC-II alleles. The system allows rapid processing of single proteins, entire proteomes or subsets, as well as multiple strains of the same organism. It allows consideration of diversity of both microorganisms and of host immunogenetics.</p>
<p>The program can be used to predict B-cell epitope peptides and MHC-I and MHC-II binding peptides, across strains or unique to one strain of organism, plus spatial and topological correlation of membrane proximity. It predicts peptide affinities for HLA supertypes, heterozygous pairs and population-permuted heterozygous pairs.</p>
<p>The system is built on the JMP<sup>&#174; </sup>(and JMP<sup>&#174; </sup>Genomics) data visualization and statistical platform and is configured to run on a desktop computer and generate graphical and tabular outputs. The predictions can be expanded to other MHC molecules as more MHC training sets become available.</p>
<p>We have tested the system retrospectively against proteins from two organisms for which epitopes have been documented independently by many labs and have used the 'AntiJen' benchmark data set. The approach we describe has performed well for a wide variety of prokaryotic and eukaryotic proteins, including mammalian cellular surface and secreted proteins. The graphical visualization output allows perception of patterns among predicted epitopes heretofore not recognized. Just as geographic information systems (GIS) have permitted a more integrated view of landscapes and have provided insights which have aided land use and public policy decisions, the layering and integration of all available immunological and topological information offers new insights into organization in the immune response.</p>
</sec>
<sec>
<st>
<p>Epitope Patterns</p>
</st>
<p>Having applied the integrated analysis system to the test data sets, as well as to other proteomes, a number of patterns emerged. Furthermore, the ability to visualize coincident features within proteins brings a new level of insight into possible functional interactions.</p>
<sec>
<st>
<p>Concordance in Binding Affinity on a Peptide Basis Across HLAs</p>
</st>
<p>The initial core observation was that there is long range coincidence of predicted high affinity MHC binding peptides among different HLAs in a population. When a given protein is broken down into sequential peptides (9-mers or 15-mers) and binding is compared against an array of all HLAs, it is possible to map areas of higher and lower predicted binding affinity, and there is general concordance between HLAs. The distribution is non-random and areas of high binding can be defined on a population basis. We observed closer positive correlation within MHC-IIs at any given peptide than within MHC-I. This non-random distribution and the areas of high binding affinity make it possible to plot a population phenotype binding affinity (using the permuted pairs) and accurately predict peptides which will serve as high affinity MHC binders for the population at large, given that all have the benefit of heterozygous alleles (e.g. in mass vaccination). We have designed the system to permit variation of the stringency in evaluating binding affinity, but have focused on the top 25% binding affinity.</p>
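<p>The decomposition of a protein into sequential peptides can be sketched in a few lines of Python. This is a minimal illustration and not the code used in the system described here; the function name and the 13-residue toy sequence are hypothetical, with only the 9-mer RLYDYFTRV taken from the vaccinia I1L example.</p>

```python
def sliding_peptides(sequence, k):
    """Return (start_position, peptide) for every overlapping k-mer.

    Positions are 1-based and refer to the N-terminal residue of each
    peptide. A sequence shorter than k yields an empty list.
    """
    return [(i + 1, sequence[i:i + k]) for i in range(len(sequence) - k + 1)]


# Hypothetical toy sequence (not a real protein) embedding the I1L 9-mer.
toy_sequence = "MKRLYDYFTRVAG"
nine_mers = sliding_peptides(toy_sequence, 9)

for position, peptide in nine_mers:
    print(position, peptide)
```

<p>Each peptide would then be scored against every allele, so that a protein of length n yields n - 8 scores per MHC-I allele for 9-mers and n - 14 scores per MHC-II allele for 15-mers.</p>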
<p>While there is remarkable general coincidence between HLA alleles for the optimal binding positions when viewed as a population, there are subtle - and sometimes not so subtle - differences in highest affinity peptide binding position of alleles when examined singly. These differences between individual HLAs point to associations of HLA with a number of diseases, and in particular the opportunity to construct personalized vaccines with knowledge of the patient's HLAs. It also has implications when isolated peptides are selected as subunit vaccines for a diverse population and for the design of clinical trials representative of HLA alleles. Reassuringly, heterozygosity offers a distinct advantage in T-cell epitope presentation, bringing new meaning to the survival of the "fittest".</p>
</sec>
<sec>
<st>
<p>Regional Correlations Within Proteins</p>
</st>
<p>When the graphical plots for complete proteins are viewed, other patterns emerge. Many predicted MHC-II and MHC-I binding patterns show minima in the same region. Both MHC classes appear to sample the same "epitope structural space" on the protein, although not necessarily at the same time or in the same location in an antigen presenting cell.</p>
<p>Epitope clustering observed by others <abbrgrp>
<abbr bid="B56">56</abbr>
</abbrgrp> appears to take on a more systematic organization when viewed as an integrated map. Our predictive analysis of the multi-species benchmark dataset, and the other organisms we have examined, lead to the conclusion that there are three groups of immunogenic peptides/polypeptides:</p>
<p indent="1">a. Peptides which comprise B-cell epitopes overlapping with or in close proximity (within a few amino acids) to peptides binding with high affinity to MHC-I or MHC-II molecules, which we have called CEGs;</p>
<p indent="1">b. Those containing B-cell epitopes only;</p>
<p indent="1">c. Peptides which bind to MHC-I or MHC-II molecules, and which are not associated with B-cell epitopes.</p>
<p>The first two groups of epitopes, (a) and (b), comprising B-cell epitopes, are found external, and to a lesser degree internal, to cell membranes. Virtually no predicted B-cell epitopes are mapped within membranes. Group (c), comprising MHC binding peptides without B-cell epitopes, includes some of the highest affinity MHC binding regions located in membranes (Table <tblr tid="T2">2</tblr>).</p>
<p>The associations between predicted MHC binding peptides and predicted B-cell epitopes in CEGs are not random. In the <it>Staph. aureus </it>dataset approximately a third of B-cell epitopes have associated MHC binding peptides (&gt;20% in AntiJen proteins), and approximately two-thirds of MHC high affinity binding regions have affiliated B-cell epitopes.</p>
<p>B-cell epitopes without close MHC binding regions could be components of more complex epitopes as the result of folding or positioning in the membrane. Thus what appear as isolated B-cell epitope sequences may actually be physically associated with components of B-cell epitopes in CEGs. Alternatively, they may act alone as T-independent antigens. Staphylococcal protein A is one example of a protein which functions under some circumstances as a T-independent B-cell antigen <abbrgrp>
<abbr bid="B16">16</abbr>
</abbrgrp>. Additional File <supplr sid="S3">3</supplr>, Figure S3d shows the remarkable B-cell epitope region, and also the pattern of predicted MHC binding peptides, in this protein.</p>
<p>We have shown that predicted binding affinity is very significantly higher among those MHC-II binding peptides found within membranes than among those outside or inside of cells. Some of the highest predicted MHC binding affinity peptides are those located within membranes. Epitopes mapped in proteins A17L and A14L in vaccinia are examples of such peptides.</p>
</sec>
<sec>
<st>
<p>Distribution of Peptide Binding Affinity within a Population</p>
</st>
<p>Binding of peptides to HLA molecules is a competitive process. The distribution of binding affinity determines which peptides are most likely to bind. Distributions of ln(ic50) binding affinities on a protein basis are not normally distributed across alleles (Figure <figr fid="F2">2</figr>). Some alleles approach a normal distribution but many show a bimodal distribution (i.e. peptides tend to be high binders or low binders, rather than medium binders). This characteristic makes standard statistical sampling challenging. Whether or not a protein has a transmembrane domain, some peptides are simply high affinity binders. Standardization makes it possible to establish thresholding criteria based on normally distributed populations and joint-probability distributions.</p>
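<p>To illustrate the idea of standardization and thresholding, the sketch below converts ln(IC50) values to zero-mean, unit-variance scores and flags peptides that fall a chosen number of standard deviations below the mean (lower IC50 meaning tighter binding). It is a simplified stand-in that assumes a plain z-score; the transformation actually used is the one described in the Methods and may differ. The function names, the threshold of -1.0 and the six ln(IC50) values are hypothetical.</p>

```python
from statistics import mean, pstdev


def standardize(ln_ic50_values):
    """Convert ln(IC50) values to zero-mean, unit-variance scores (z-scores)."""
    mu = mean(ln_ic50_values)
    sigma = pstdev(ln_ic50_values)
    if sigma == 0:
        raise ValueError("all values are identical; cannot standardize")
    return [(value - mu) / sigma for value in ln_ic50_values]


def high_affinity_positions(z_scores, threshold=-1.0):
    """Return 1-based positions whose score is at or below the threshold.

    Lower ln(IC50) means higher affinity, so high affinity binders have
    negative standardized scores.
    """
    return [i + 1 for i, z in enumerate(z_scores) if threshold >= z]


# Hypothetical ln(IC50) values for six consecutive peptides (not real data);
# the two low values mimic the "high binder" mode of a bimodal distribution.
example = [9.2, 8.7, 3.1, 9.0, 8.8, 2.9]
z = standardize(example)
hits = high_affinity_positions(z)
print(hits)
```

<p>Because every allele is placed on the same standardized scale, scores from different alleles can be compared or combined position by position.</p>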
<p>Using normalized distributions it is possible to compute population phenotypes which effectively "capture" the intra-protein regional correlations among MHC alleles. In practical terms, only high affinity matters, so we developed a scheme for computing a running average minimum over a variable amino acid "window" (Figure <figr fid="F3">3</figr>). In the population patterns using the whole proteins plots (e.g. Figure <figr fid="F6">6</figr>) we have used unweighted means, i.e. we have assumed all HLA alleles to be equally represented in the population. It would be possible to calculate phenotypes for various sub-populations by changing allele weighting. This might be appropriate, for instance, if selecting a vaccine for an ethnically isolated population.</p>
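<p>One way to picture the population phenotype is sketched below in Python. For each allele the minimum standardized score within a window around each position is taken; for each heterozygous pair the better (lower) of the two alleles is kept; and the pairs are then averaged, optionally with allele weights. The order of these operations, the function names and the product-of-weights pair weighting are assumptions made for illustration and are not taken from the Methods; only the window of &#177; 4 and the use of unweighted means follow the text and the legend to Figure 6.</p>

```python
from itertools import combinations


def windowed_min(scores, half_window):
    """Minimum score within +/- half_window positions of each position."""
    return [
        min(scores[max(0, i - half_window): i + half_window + 1])
        for i in range(len(scores))
    ]


def population_phenotype(scores_by_allele, half_window=4, weights=None):
    """Average, over heterozygous allele pairs, of the pairwise windowed minimum.

    scores_by_allele maps an allele name to per-position standardized scores
    (lower means higher affinity). weights maps allele names to relative
    frequencies; by default all alleles are weighted equally.
    """
    alleles = sorted(scores_by_allele)
    if 2 > len(alleles):
        raise ValueError("at least two alleles are needed to form a pair")
    if weights is None:
        weights = {allele: 1.0 for allele in alleles}
    mins = {a: windowed_min(scores_by_allele[a], half_window) for a in alleles}
    n_positions = len(mins[alleles[0]])
    totals = [0.0] * n_positions
    weight_sum = 0.0
    for a, b in combinations(alleles, 2):
        pair_weight = weights[a] * weights[b]  # assumed weighting scheme
        weight_sum += pair_weight
        for i in range(n_positions):
            totals[i] += pair_weight * min(mins[a][i], mins[b][i])
    if weight_sum == 0:
        raise ValueError("allele weights leave no weighted pair to average")
    return [total / weight_sum for total in totals]


# Hypothetical standardized scores for three alleles over three positions.
toy_scores = {
    "allele_A": [0.0, -1.0, 0.0],
    "allele_B": [0.0, 0.0, -2.0],
    "allele_C": [1.0, 1.0, 1.0],
}
unweighted = population_phenotype(toy_scores, half_window=0)
only_a_and_b = population_phenotype(
    toy_scores, half_window=0,
    weights={"allele_A": 1.0, "allele_B": 1.0, "allele_C": 0.0},
)
print(unweighted)
print(only_a_and_b)
```

<p>Changing the weights argument corresponds to recomputing the phenotype for a sub-population in which some alleles are more common than others.</p>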
</sec>
<sec>
<st>
<p>B-cell Receptor Prediction Programs</p>
</st>
<p>B-cell predictive programs like BepiPred, and the principal component-based analogue of this we have developed, are likely to simply predict regions with physicochemical characteristics that lead to surface exposure and hence which form sites accessible for immunoglobulin attachment as predicted first by Hopp and Wood, and Parker <abbrgrp>
<abbr bid="B24">24</abbr>
<abbr bid="B25">25</abbr>
</abbrgrp>. The correlation with the fine mapping of the discontinuous epitopes for <it>Staph. aureus </it>IsdB <abbrgrp>
<abbr bid="B51">51</abbr>
</abbrgrp> provides strong support for the concept that the B-cell epitope predictors identify linear sub-regions of discontinuous epitopes in proteins. We can speculate how, by hypervariable region mutations during antibody maturation, successive mutations might give rise to new molecules binding to additional short regions in the same vicinity and thereby lead to higher binding affinities for the new antibody mutein. By this concept a discontinuous epitope is a logical outcome of stepwise mutations during the maturation process. The reported "under performance" of B-cell epitope predictors <abbrgrp>
<abbr bid="B26">26</abbr>
</abbrgrp> may be attributable at least in part to characteristics like those we map in <it>Staph. aureus </it>Protein A (region aa 325 to aa 475) and IsdB (aa 460 to aa 610). This characteristic pattern is seen frequently in proteome-scale overlay comparisons.</p>
</sec>
<sec>
<st>
<p>Comparative Standards</p>
</st>
<p>In evaluating a new predictive analytical system, the key question is how accurately it predicts, which raises the further question: "relative to what gold standard?" Given the multidimensionality of the interface between host immunogenetics and pathogen, a "gold standard" has to be specific to the combination of HLA and peptide. Furthermore, measurements of binding affinity which have been derived using peptides in isolation from competition from the rest of the protein/proteome are of limited utility. In light of this we do not believe a standard four quadrant (true positive, false positive, true negative, false negative) scoring system is achievable at present. There are presently few examples in the literature (except perhaps Brown <it>et al </it>
<abbrgrp>
<abbr bid="B51">51</abbr>
</abbrgrp>) where the mapping of B-cell and T-cell epitopes has been done with sufficient detail to actually fill in the scoring quadrants necessary to compute an area under the receiver operating characteristic curve (AROC). The three different types of epitope patterns mentioned above further complicate any attempts at developing a simple scoring system. Our prediction represents the theoretical maximum at any particular chosen statistical threshold. The higher the threshold stringency, the fewer peptides will meet the criteria.</p>
<p>In practice a number of factors limit the number of peptides which actually serve as epitopes. Timing of expression, or expression only under certain environmental conditions, is one filter <abbrgrp>
<abbr bid="B11">11</abbr>
</abbrgrp>. A further major determinant is the rate of proteolysis, first to make the peptides available for binding, and secondly to degrade them into subunits below the threshold of recognition of MHCs. As with epitope binding, enzyme action on the proteins of the organism as a whole is a competitive process. Proteolytic cleavage is a critical process which determines which peptides are available to be bound by MHC molecules and hence displayed on cell surfaces as potential T-cell epitopes, and when such peptides become available or cease to be available due to further digestion. A peptide may have a predicted high binding affinity to an MHC protein, but if it contains a protease cleavage site precluding binding, it may never be presented at the cell surface. This is a factor not yet integrated into the analytical system we describe primarily because the cleavage training sets one would need to produce reliable neural net predictions for the relevant proteases are not available.</p>
</sec>
</sec>
<sec>
<st>
<p>Functional Implications of Patterns Observed</p>
</st>
<p>B-cell recognition gives rise to an antibody response. MHC-peptide binding is an intermediate step to presentation of MHC-peptide complexes for T cell recognition. We can speculate on whether a CEG represents a functional as well as a physical association. Given the consistency of the pattern, a functional association appears likely in which B-cell binding leads to uptake of the adjacent, overlapping or identical peptide to yield high affinity MHC binding peptides (via MHC-I or MHC-II), and, once bound, the peptide-MHC complex can lead to a productive T-cell response.</p>
<p>When viewed in the light of the observations of Batista of B-cell attachment and "pinching off" of surface segments during the formation of an immune synapse <abbrgrp>
<abbr bid="B15">15</abbr>
</abbrgrp>, one can envision how this physical proximity might facilitate the internalization of a peptide with a competitive advantage as a high affinity MHC binder. The dual presentation of B and T-cell epitopes would be consistent with the B-cell-T-cell interaction proposed by Lanzavecchia <abbrgrp>
<abbr bid="B6">6</abbr>
</abbrgrp>. Similarly, preservation of B-cell epitopes by dendritic cells <abbrgrp>
<abbr bid="B57">57</abbr>
<abbr bid="B58">58</abbr>
<abbr bid="B59">59</abbr>
</abbrgrp> would lead to the preservation and delivery to B-cells, not only of B-cell epitopes but also of overlapping MHC binding peptides.</p>
<p>The concept of specificity runs throughout the immunology literature. Generally specificity implies an absolute lock and key relationship. Rather, we see a pattern consistent with the dynamic competition for higher affinity MHC binding positions within the constraints of each set of unique host-pathogen interactions. Confronted with peptides from a different protein or different organism, the competition among peptides for binding to the same MHC molecules would simply be expanded. However, as affinity may range over binding constants of 1000-fold or more, there is a marked difference between high binders and weak binders. This is not inconsistent with a degeneracy in the binding specificity of peptides to MHC molecules. What matters is simply the ability of an MHC molecule to bind with high affinity a peptide representing the microorganism (or other immunogen) presented at that moment. The same MHC could just as well bind a peptide from another unrelated immunogen, not present in the same cell at that time. Given this multi-step process, the effective "specificity" of the adaptive immune response is the product of multiple sequential binding affinities: {B-cell epitope binding} &#215; {MHC binding} &#215; {T-cell receptor binding to MHC-peptide complex}. No one of these steps needs to confer complete "specificity" but the combination increases the uniqueness or specificity of the antigen-immune response, essentially as a combination lock.</p>
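<p>The combination lock analogy can be made concrete with a toy calculation. The Python sketch below multiplies the fraction of candidate peptides that would pass each of the three sequential steps. The fractions are purely hypothetical, chosen only to show how modest selectivity at each step compounds, and the calculation assumes the steps act independently, which is a simplification given the non-random association of B-cell epitopes and MHC binding regions described above.</p>

```python
import math


def combined_selectivity(step_fractions):
    """Multiply the fraction of candidates passing each sequential step."""
    for fraction in step_fractions:
        if fraction > 1.0 or 0.0 > fraction:
            raise ValueError("each fraction must lie between 0 and 1")
    return math.prod(step_fractions)


# Hypothetical pass fractions for each step (illustrative, not measured).
steps = {
    "B-cell epitope binding": 0.25,
    "MHC binding": 0.25,
    "T-cell receptor binding to MHC-peptide complex": 0.10,
}
overall = combined_selectivity(list(steps.values()))
print(f"fraction passing all three steps: {overall:.5f}")
```

<p>No single step in this toy example excludes more than 90% of candidates, yet fewer than 1% pass all three.</p>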
<p>We do not speculate on the downstream implications of T-cell epitope binding, but note that T-cell stimulation may have both up- and down-regulation effects and both positive and negative cytokine-mediated feedback loops. Whether MHC binding regions located in CEGs have different roles from those MHC binding regions which are unrelated to B-cell epitopes is unknown. High affinity MHC binding regions located in membranes may be hidden until internalized and released by proteolysis, as proposed by Benacerraf <abbrgrp>
<abbr bid="B3">3</abbr>
</abbrgrp>. This could occur through internalization of whole organisms or when such peptide fragments are "towed along" by internalization of adjacent tethered peptides.</p>
<p>It is of interest that a large percentage of the peptides predicted to bind MHC-I with high affinity coincide with those predicted to bind MHC-II well. This implies that the precise intracellular pathway, and hence which proteolytic machinery the peptide encounters, may allow such peptides to stimulate T<sub>CD8</sub> or T<sub>CD4</sub>, simultaneously or sequentially. Others have noted the need to understand the degree of overlap in the peptidome leading to stimulation of each pathway <abbrgrp>
<abbr bid="B60">60</abbr>
</abbrgrp>.</p>
<p>Given the apparent frequent overlap of B-cell epitopes with MHC binding regions, we suspect the literature contains observations on many peptide epitopes where, depending on what the experiment was designed to observe, function as either B-cell or T-cell epitopes has been described. Vaughan <abbrgrp>
<abbr bid="B61">61</abbr>
</abbrgrp>, in reviewing the literature on epitope characterization for <it>Plasmodium</it>, noted that 14% of epitopes characterized by some workers as B-cell epitopes are reported by others to be T-cell epitopes, a percentage not dissimilar from that seen in the surfome of <it>Staph. aureus</it> and the AntiJen dataset.</p>
<p>At higher resolution, the fine structure of the diverse binding patterns of different HLAs shown for vaccinia (Figure <figr fid="F7">7</figr>), and observed in other proteins (unpublished data), may shed light on the concept of immunodominance, in which immune responses by an individual are directed to a few peptides <abbrgrp>
<abbr bid="B60">60</abbr>
</abbrgrp>. At one level it underscores the utility of HLA transgenic mice as indicators of immunodominance (binding affinity) for humans <abbrgrp>
<abbr bid="B54">54</abbr>
</abbrgrp>, while also calling into question extrapolation of peptide-level MHC binding evaluations conducted in inbred murine strains and raising questions about the need to reflect immunogenetic diversity in clinical trials. Taking a broader perspective, if each HLA shows highest binding to a very narrow peptide sequence in a given region of a protein (an immunodominant peptide), this may represent a risk of microbial escape mutants <abbrgrp>
<abbr bid="B60">60</abbr>
</abbrgrp>. However, because other HLAs bind to different adjacent peptides with high affinity, heterozygosity provides a backstop. Viewed in an immunogenetically diverse population, it provides a possible survival strategy in which any one escape mutant does not threaten an entire host population.</p>
<p>Overall, the predicted patterns which emerge from mapping imply yet greater coordination and organization of B-cell and T-cell responses than have heretofore been recorded. We do not underestimate the gap between prediction and experimental testing; however, the ability to envision hypothetical functional interactions is a necessary precursor to the design of experimentation.</p>
</sec>
</sec>
<sec>
<st>
<p>Conclusions</p>
</st>
<p>We present a predictive bioinformatics model which provides a means of rapidly analyzing whole proteins or proteomes and predicting a large number of B-cell peptide epitopes and high affinity MHC binding regions indicative of T-cell epitopes. The model also permits correlation of peptide epitopes with topological features. A visualization system of graphical overlays enables ready appreciation of the potential interplay of the features identified. The model is applicable to a wide array of proteins, and we have demonstrated its performance with several well-documented proteins.</p>
<p>Examination of a protein dataset derived from many organisms, and of <it>Staph. aureus</it> and vaccinia, shows a consistent pattern of frequent coincidence of B-cell epitopes with MHC high affinity binding regions, suggesting that the physical proximity of B-cell epitopes to peptides with high affinity for MHC-I and MHC-II may be the norm, and of functional significance. This hypothesis remains to be further tested experimentally.</p>
<p>The data presented here for <it>Staph. aureus</it>, vaccinia, and the proteins in the AntiJen data set are of course a retrospective look at experimentally mapped epitopes with a view to validation of the integrated analysis system we have developed. We have processed a number of other proteomes, and the challenge now is how best to put the system to work as a prospective tool to support vaccine and antibody design and to provide better understanding of the immune response.</p>
</sec>
<sec>
<st>
<p>List of Abbreviations</p>
</st>
<p>AROC: Area under the receiver operating characteristic curve; CBS: Center for Biological Sequence Analysis; CEG: Coincident Epitope Group; GIS: Geographic Information Systems; ic50: Inhibitory concentration 50%; IEDB: Immune Epitope Database; NIPS: Near identical protein sequence; QSAR: Quantitative Structure Activity Relationships; SOM: Self-organizing map.</p>
</sec>
<sec>
<st>
<p>Competing interests</p>
</st>
<p>Drs. Bremel and Homan are founding scientists and employees of ioGenetics LLC. Patent applications have been filed on components of the technology described herein.</p>
</sec>
<sec>
<st>
<p>Authors' contributions</p>
</st>
<p>RDB designed and performed computational analysis. Both authors conceived the study design, contributed to its execution, and drafted the manuscript. Both authors read and approved the final manuscript.</p>
</sec>
</bdy><bm>
<ack>
<sec>
<st>
<p>Acknowledgements</p>
</st>
<p>The authors thank Drs. Michael Imboden and Gary Splitter for their review and helpful comments. They also thank the anonymous reviewers of Immunome Research for their comments.</p>
</sec>
</ack>
<refgrp><bibl id="B1"><title><p>Binding interactions between peptides and proteins of the class II major histocompatibility complex</p></title><aug><au><snm>McFarland</snm><fnm>BJ</fnm></au><au><snm>Beeson</snm><fnm>C</fnm></au></aug><source>Med Res Rev</source><pubdate>2002</pubdate><volume>22</volume><fpage>168</fpage><lpage>203</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1002/med.10006</pubid><pubid idtype="pmpid" link="fulltext">11857638</pubid></pubidlist></xrefbib></bibl><bibl id="B2"><title><p>Peptide binding to MHC class I and II proteins: new avenues from new methods</p></title><aug><au><snm>Yaneva</snm><fnm>R</fnm></au><au><snm>Schneeweiss</snm><fnm>C</fnm></au><au><snm>Zacharias</snm><fnm>M</fnm></au><au><snm>Springer</snm><fnm>S</fnm></au></aug><source>Mol Immunol</source><pubdate>2010</pubdate><volume>47</volume><fpage>649</fpage><lpage>657</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/j.molimm.2009.10.008</pubid><pubid idtype="pmpid" link="fulltext">19910050</pubid></pubidlist></xrefbib></bibl><bibl id="B3"><title><p>A hypothesis to relate the specificity of T lymphocytes and the activity of I region-specific Ir genes in macrophages and B lymphocytes</p></title><aug><au><snm>Benacerraf</snm><fnm>B</fnm></au></aug><source>J Immunol</source><pubdate>1978</pubdate><volume>120</volume><fpage>1809</fpage><lpage>1812</lpage><xrefbib><pubid idtype="pmpid" link="fulltext">77879</pubid></xrefbib></bibl><bibl id="B4"><title><p>Determinant selection is a macrophage dependent immune response gene function</p></title><aug><au><snm>Rosenthal</snm><fnm>AS</fnm></au><au><snm>Barcinski</snm><fnm>MA</fnm></au><au><snm>Blake</snm><fnm>JT</fnm></au></aug><source>Nature</source><pubdate>1977</pubdate><volume>267</volume><fpage>156</fpage><lpage>158</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/267156a0</pubid><pubid idtype="pmpid">16073428</pubid></pubidlist></xrefbib></bibl><bibl id="B5"><title><p>Relative contribution of "determinant selection" 
and "holes in the T-cell repertoire" to T-cell responses</p></title><aug><au><snm>Schaeffer</snm><fnm>EB</fnm></au><au><snm>Sette</snm><fnm>A</fnm></au><au><snm>Johnson</snm><fnm>DL</fnm></au><au><snm>Bekoff</snm><fnm>MC</fnm></au><au><snm>Smith</snm><fnm>JA</fnm></au><au><snm>Grey</snm><fnm>HM</fnm></au><au><snm>Buus</snm><fnm>S</fnm></au></aug><source>Proc Natl Acad Sci USA</source><pubdate>1989</pubdate><volume>86</volume><fpage>4649</fpage><lpage>4653</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1073/pnas.86.12.4649</pubid><pubid idtype="pmcid">287328</pubid><pubid idtype="pmpid">2471972</pubid></pubidlist></xrefbib></bibl><bibl id="B6"><title><p>Antigen-specific interaction between T and B cells</p></title><aug><au><snm>Lanzavecchia</snm><fnm>A</fnm></au></aug><source>Nature</source><pubdate>1985</pubdate><volume>314</volume><fpage>537</fpage><lpage>539</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/314537a0</pubid><pubid idtype="pmpid">3157869</pubid></pubidlist></xrefbib></bibl><bibl id="B7"><title><p>Selective CD4+ T cell help for antibody responses to a large viral pathogen: deterministic linkage of specificities</p></title><aug><au><snm>Sette</snm><fnm>A</fnm></au><au><snm>Moutaftsi</snm><fnm>M</fnm></au><au><snm>Moyron-Quiroz</snm><fnm>J</fnm></au><au><snm>McCausland</snm><fnm>MM</fnm></au><au><snm>Davies</snm><fnm>DH</fnm></au><au><snm>Johnston</snm><fnm>RJ</fnm></au><au><snm>Peters</snm><fnm>B</fnm></au><au><snm>Rafii-El-Idrissi</snm><fnm>BM</fnm></au><au><snm>Hoffmann</snm><fnm>J</fnm></au><au><snm>Su</snm><fnm>HP</fnm></au><au><snm>Singh</snm><fnm>K</fnm></au><au><snm>Garboczi</snm><fnm>DN</fnm></au><au><snm>Head</snm><fnm>S</fnm></au><au><snm>Grey</snm><fnm>H</fnm></au><au><snm>Felgner</snm><fnm>PL</fnm></au><au><snm>Crotty</snm><fnm>S</fnm></au></aug><source>Immunity</source><pubdate>2008</pubdate><volume>28</volume><fpage>847</fpage><lpage>858</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/j.immuni.2008.04.018</pubid><pubid 
idtype="pmcid">2504733</pubid><pubid idtype="pmpid">18549802</pubid></pubidlist></xrefbib></bibl><bibl id="B8"><title><p>A very high level of crossreactivity is an essential feature of the T-cell receptor</p></title><aug><au><snm>Mason</snm><fnm>D</fnm></au></aug><source>Immunol Today</source><pubdate>1998</pubdate><volume>19</volume><fpage>395</fpage><lpage>404</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/S0167-5699(98)01299-7</pubid><pubid idtype="pmpid" link="fulltext">9745202</pubid></pubidlist></xrefbib></bibl><bibl id="B9"><title><p>Polyspecificity of T cell and B cell receptor recognition</p></title><aug><au><snm>Wucherpfennig</snm><fnm>KW</fnm></au><au><snm>Allen</snm><fnm>PM</fnm></au><au><snm>Celada</snm><fnm>F</fnm></au><au><snm>Cohen</snm><fnm>IR</fnm></au><au><snm>De</snm><fnm>BR</fnm></au><au><snm>Garcia</snm><fnm>KC</fnm></au><au><snm>Goldstein</snm><fnm>B</fnm></au><au><snm>Greenspan</snm><fnm>R</fnm></au><au><snm>Hafler</snm><fnm>D</fnm></au><au><snm>Hodgkin</snm><fnm>P</fnm></au><au><snm>Huseby</snm><fnm>ES</fnm></au><au><snm>Krakauer</snm><fnm>DC</fnm></au><au><snm>Nemazee</snm><fnm>D</fnm></au><au><snm>Perelson</snm><fnm>AS</fnm></au><au><snm>Pinilla</snm><fnm>C</fnm></au><au><snm>Strong</snm><fnm>RK</fnm></au><au><snm>Sercarz</snm><fnm>EE</fnm></au></aug><source>Semin Immunol</source><pubdate>2007</pubdate><volume>19</volume><fpage>216</fpage><lpage>224</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/j.smim.2007.02.012</pubid><pubid idtype="pmcid">2034306</pubid><pubid idtype="pmpid">17398114</pubid></pubidlist></xrefbib></bibl><bibl id="B10"><title><p>Antigen processing via autophagy--not only for MHC class II presentation anymore?</p></title><aug><au><snm>Munz</snm><fnm>C</fnm></au></aug><source>Curr Opin Immunol</source><pubdate>2010</pubdate><volume>22</volume><fpage>89</fpage><lpage>93</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/j.coi.2010.01.016</pubid><pubid idtype="pmpid" 
link="fulltext">20149615</pubid></pubidlist></xrefbib></bibl><bibl id="B11"><title><p>Uncovering the interplay between CD8, CD4 and antibody responses to complex pathogens</p></title><aug><au><snm>Moutaftsi</snm><fnm>M</fnm></au><au><snm>Tscharke</snm><fnm>DC</fnm></au><au><snm>Vaughan</snm><fnm>K</fnm></au><au><snm>Koelle</snm><fnm>DM</fnm></au><au><snm>Stern</snm><fnm>L</fnm></au><au><snm>Calvo-Calle</snm><fnm>M</fnm></au><au><snm>Ennis</snm><fnm>F</fnm></au><au><snm>Terajima</snm><fnm>M</fnm></au><au><snm>Sutter</snm><fnm>G</fnm></au><au><snm>Crotty</snm><fnm>S</fnm></au><au><snm>Drexler</snm><fnm>I</fnm></au><au><snm>Franchini</snm><fnm>G</fnm></au><au><snm>Yewdell</snm><fnm>JW</fnm></au><au><snm>Head</snm><fnm>SR</fnm></au><au><snm>Blum</snm><fnm>J</fnm></au><au><snm>Peters</snm><fnm>B</fnm></au><au><snm>Sette</snm><fnm>A</fnm></au></aug><source>Future Microbiol</source><pubdate>2010</pubdate><volume>5</volume><fpage>221</fpage><lpage>239</lpage><xrefbib><pubidlist><pubid idtype="doi">10.2217/fmb.09.110</pubid><pubid idtype="pmpid" link="fulltext">20143946</pubid></pubidlist></xrefbib></bibl><bibl id="B12"><title><p>Vaccination of cytotoxic T lymphocyte-directed peptides elicited and spread humoral and Th1-type immune responses to prostate-specific antigen protein in a prostate cancer patient</p></title><aug><au><snm>Harada</snm><fnm>M</fnm></au><au><snm>Matsueda</snm><fnm>S</fnm></au><au><snm>Yao</snm><fnm>A</fnm></au><au><snm>Noguchi</snm><fnm>M</fnm></au><au><snm>Itoh</snm><fnm>K</fnm></au></aug><source>J Immunother</source><pubdate>2005</pubdate><volume>28</volume><fpage>368</fpage><lpage>375</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1097/01.cji.0000165359.05710.d7</pubid><pubid idtype="pmpid" link="fulltext">16000955</pubid></pubidlist></xrefbib></bibl><bibl id="B13"><title><p>Recognition of prostate-specific antigenic peptide determinants by human CD4 and CD8 T 
cells</p></title><aug><au><snm>Corman</snm><fnm>JM</fnm></au><au><snm>Sercarz</snm><fnm>EE</fnm></au><au><snm>Nanda</snm><fnm>NK</fnm></au></aug><source>Clin Exp Immunol</source><pubdate>1998</pubdate><volume>114</volume><fpage>166</fpage><lpage>172</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1046/j.1365-2249.1998.00678.x</pubid><pubid idtype="pmcid">1905118</pubid><pubid idtype="pmpid">9822272</pubid></pubidlist></xrefbib></bibl><bibl id="B14"><title><p>What is a B-cell epitope?</p></title><aug><au><snm>Van Regenmortel</snm><fnm>MH</fnm></au></aug><source>Methods Mol Biol</source><pubdate>2009</pubdate><volume>524</volume><fpage>3</fpage><lpage>20</lpage><xrefbib><pubidlist><pubid idtype="doi">full_text</pubid><pubid idtype="pmpid" link="fulltext">19377933</pubid></pubidlist></xrefbib></bibl><bibl id="B15"><title><p>B cells acquire antigen from target cells after synapse formation</p></title><aug><au><snm>Batista</snm><fnm>FD</fnm></au><au><snm>Iber</snm><fnm>D</fnm></au><au><snm>Neuberger</snm><fnm>MS</fnm></au></aug><source>Nature</source><pubdate>2001</pubdate><volume>411</volume><fpage>489</fpage><lpage>494</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/35078099</pubid><pubid idtype="pmpid" link="fulltext">11373683</pubid></pubidlist></xrefbib></bibl><bibl id="B16"><title><p>Staphylococcus aureus protein A triggers T cell-independent B cell proliferation by sensitizing B cells for TLR2 ligands</p></title><aug><au><snm>Bekeredjian-Ding</snm><fnm>I</fnm></au><au><snm>Inamura</snm><fnm>S</fnm></au><au><snm>Giese</snm><fnm>T</fnm></au><au><snm>Moll</snm><fnm>H</fnm></au><au><snm>Endres</snm><fnm>S</fnm></au><au><snm>Sing</snm><fnm>A</fnm></au><au><snm>Zahringer</snm><fnm>U</fnm></au><au><snm>Hartmann</snm><fnm>G</fnm></au></aug><source>J Immunol</source><pubdate>2007</pubdate><volume>178</volume><fpage>2803</fpage><lpage>2812</lpage><xrefbib><pubid idtype="pmpid" link="fulltext">17312124</pubid></xrefbib></bibl><bibl id="B17"><title><p>Bridging the 
knowledge gaps in vaccine design</p></title><aug><au><snm>Rappuoli</snm><fnm>R</fnm></au></aug><source>Nat Biotechnol</source><pubdate>2007</pubdate><volume>25</volume><fpage>1361</fpage><lpage>1366</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/nbt1207-1361</pubid><pubid idtype="pmpid" link="fulltext">18066025</pubid></pubidlist></xrefbib></bibl><bibl id="B18"><title><p>B cell responses to a peptide epitope. VII. Antigen-dependent modulation of the germinal center reaction</p></title><aug><au><snm>Agarwal</snm><fnm>A</fnm></au><au><snm>Nayak</snm><fnm>BP</fnm></au><au><snm>Rao</snm><fnm>KV</fnm></au></aug><source>J Immunol</source><pubdate>1998</pubdate><volume>161</volume><fpage>5832</fpage><lpage>5841</lpage><xrefbib><pubid idtype="pmpid" link="fulltext">9834061</pubid></xrefbib></bibl><bibl id="B19"><title><p>B cell responses to a peptide epitope: III. Differential T helper cell thresholds in recruitment of B cell fine specificities</p></title><aug><au><snm>Agarwal</snm><fnm>A</fnm></au><au><snm>Rao</snm><fnm>KV</fnm></au></aug><source>J Immunol</source><pubdate>1997</pubdate><volume>159</volume><fpage>1077</fpage><lpage>1085</lpage><xrefbib><pubid idtype="pmpid" link="fulltext">9233600</pubid></xrefbib></bibl><bibl id="B20"><title><p>Immunomics: discovering new targets for vaccines and therapeutics</p></title><aug><au><snm>De Groot</snm><fnm>AS</fnm></au></aug><source>Drug Discov Today</source><pubdate>2006</pubdate><volume>11</volume><fpage>203</fpage><lpage>209</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/S1359-6446(05)03720-7</pubid><pubid idtype="pmpid" link="fulltext">16580597</pubid></pubidlist></xrefbib></bibl><bibl id="B21"><title><p>Identification of vaccine candidates against serogroup B meningococcus by whole-genome 
sequencing</p></title><aug><au><snm>Pizza</snm><fnm>M</fnm></au><au><snm>Scarlato</snm><fnm>V</fnm></au><au><snm>Masignani</snm><fnm>V</fnm></au><au><snm>Giuliani</snm><fnm>MM</fnm></au><au><snm>Arico</snm><fnm>B</fnm></au><au><snm>Comanducci</snm><fnm>M</fnm></au><au><snm>Jennings</snm><fnm>GT</fnm></au><au><snm>Baldi</snm><fnm>L</fnm></au><au><snm>Bartolini</snm><fnm>E</fnm></au><au><snm>Capecchi</snm><fnm>B</fnm></au><au><snm>Galeotti</snm><fnm>CL</fnm></au><au><snm>Luzzi</snm><fnm>E</fnm></au><au><snm>Manetti</snm><fnm>R</fnm></au><au><snm>Marchetti</snm><fnm>E</fnm></au><au><snm>Mora</snm><fnm>M</fnm></au><au><snm>Nuti</snm><fnm>S</fnm></au><au><snm>Ratti</snm><fnm>G</fnm></au><au><snm>Santini</snm><fnm>L</fnm></au><au><snm>Savino</snm><fnm>S</fnm></au><au><snm>Scarselli</snm><fnm>M</fnm></au><au><snm>Storni</snm><fnm>E</fnm></au><au><snm>Zuo</snm><fnm>P</fnm></au><au><snm>Broeker</snm><fnm>M</fnm></au><au><snm>Hundt</snm><fnm>E</fnm></au><au><snm>Knapp</snm><fnm>B</fnm></au><au><snm>Blair</snm><fnm>E</fnm></au><au><snm>Mason</snm><fnm>T</fnm></au><au><snm>Tettelin</snm><fnm>H</fnm></au><au><snm>Hood</snm><fnm>DW</fnm></au><au><snm>Jeffries</snm><fnm>AC</fnm></au><au><snm>Saunders</snm><fnm>NJ</fnm></au><au><snm>Granoff</snm><fnm>DM</fnm></au><au><snm>Venter</snm><fnm>JC</fnm></au><au><snm>Moxon</snm><fnm>ER</fnm></au><au><snm>Grandi</snm><fnm>G</fnm></au><au><snm>Rappuoli</snm><fnm>R</fnm></au></aug><source>Science</source><pubdate>2000</pubdate><volume>287</volume><fpage>1816</fpage><lpage>1820</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1126/science.287.5459.1816</pubid><pubid idtype="pmpid" link="fulltext">10710308</pubid></pubidlist></xrefbib></bibl><bibl id="B22"><title><p>Presentation of tumour antigens by dendritic cells and challenges faced</p></title><aug><au><snm>Robson</snm><fnm>NC</fnm></au><au><snm>Hoves</snm><fnm>S</fnm></au><au><snm>Maraskovsky</snm><fnm>E</fnm></au><au><snm>Schnurr</snm><fnm>M</fnm></au></aug><source>Curr Opin 
Immunol</source><pubdate>2010</pubdate><volume>22</volume><fpage>137</fpage><lpage>144</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/j.coi.2010.01.002</pubid><pubid idtype="pmpid" link="fulltext">20116984</pubid></pubidlist></xrefbib></bibl><bibl id="B23"><title><p>Processing and presentation of tumor antigens and vaccination strategies</p></title><aug><au><snm>Van der Bruggen</snm><fnm>P</fnm></au><au><snm>Van den Eynde</snm><fnm>BJ</fnm></au></aug><source>Curr Opin Immunol</source><pubdate>2006</pubdate><volume>18</volume><fpage>98</fpage><lpage>104</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/j.coi.2005.11.013</pubid><pubid idtype="pmpid" link="fulltext">16343880</pubid></pubidlist></xrefbib></bibl><bibl id="B24"><title><p>Prediction of protein antigenic determinants from amino acid sequences</p></title><aug><au><snm>Hopp</snm><fnm>TP</fnm></au><au><snm>Woods</snm><fnm>KR</fnm></au></aug><source>Proc Natl Acad Sci USA</source><pubdate>1981</pubdate><volume>78</volume><fpage>3824</fpage><lpage>3828</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1073/pnas.78.6.3824</pubid><pubid idtype="pmcid">319665</pubid><pubid idtype="pmpid">6167991</pubid></pubidlist></xrefbib></bibl><bibl id="B25"><title><p>New hydrophilicity scale derived from high-performance liquid chromatography peptide retention data: correlation of predicted surface residues with antigenicity and X-ray-derived accessible sites</p></title><aug><au><snm>Parker</snm><fnm>JM</fnm></au><au><snm>Guo</snm><fnm>D</fnm></au><au><snm>Hodges</snm><fnm>RS</fnm></au></aug><source>Biochemistry</source><pubdate>1986</pubdate><volume>25</volume><fpage>5425</fpage><lpage>5432</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1021/bi00367a013</pubid><pubid idtype="pmpid">2430611</pubid></pubidlist></xrefbib></bibl><bibl id="B26"><title><p>Benchmarking B cell epitope prediction: underperformance of existing 
methods</p></title><aug><au><snm>Blythe</snm><fnm>MJ</fnm></au><au><snm>Flower</snm><fnm>DR</fnm></au></aug><source>Protein Sci</source><pubdate>2005</pubdate><volume>14</volume><fpage>246</fpage><lpage>248</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1110/ps.041059505</pubid><pubid idtype="pmcid">2253337</pubid><pubid idtype="pmpid">15576553</pubid></pubidlist></xrefbib></bibl><bibl id="B27"><title><p>Prediction of linear B-cell epitopes</p></title><aug><au><snm>Davydov</snm><fnm>YI</fnm></au><au><snm>Tonevitsky</snm><fnm>AG</fnm></au></aug><source>Molecular Biology</source><pubdate>2009</pubdate><volume>43</volume><fpage>150</fpage><lpage>153</lpage><xrefbib><pubid idtype="doi">10.1134/S0026893309010208</pubid></xrefbib></bibl><bibl id="B28"><title><p>Improved method for predicting linear B-cell epitopes</p></title><aug><au><snm>Larsen</snm><fnm>JE</fnm></au><au><snm>Lund</snm><fnm>O</fnm></au><au><snm>Nielsen</snm><fnm>M</fnm></au></aug><source>Immunome Res</source><pubdate>2006</pubdate><volume>2</volume><fpage>2</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1745-7580-2-2</pubid><pubid idtype="pmcid">1479323</pubid><pubid idtype="pmpid">16635264</pubid></pubidlist></xrefbib></bibl><bibl id="B29"><title><p>An integrated approach to epitope analysis I: Dimensional reduction, visualization and prediction of MHC binding using amino acid principal components and regression approaches.</p></title><aug><au><snm>Bremel</snm><fnm>RD</fnm></au><au><snm>Homan</snm><fnm>EJ</fnm></au></aug><source>Immunome Res</source><pubdate>2010</pubdate><volume>6</volume><issue>1</issue><xrefbib><pubid idtype="pmpid" link="fulltext">21044289</pubid></xrefbib></bibl><bibl id="B30"><title><p>Use of hydrophilicity plotting procedures to identify protein antigenic segments and other interaction sites</p></title><aug><au><snm>Hopp</snm><fnm>TP</fnm></au></aug><source>Methods 
Enzymol</source><pubdate>1989</pubdate><volume>178</volume><fpage>571</fpage><lpage>585</lpage><xrefbib><pubidlist><pubid idtype="doi">full_text</pubid><pubid idtype="pmpid">2481215</pubid></pubidlist></xrefbib></bibl><bibl id="B31"><title><p>From genome to vaccine--new immunoinformatics tools for vaccine design</p></title><aug><au><snm>De Groot</snm><fnm>AS</fnm></au><au><snm>Berzofsky</snm><fnm>JA</fnm></au></aug><source>Methods</source><pubdate>2004</pubdate><volume>34</volume><fpage>425</fpage><lpage>428</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/j.ymeth.2004.06.004</pubid><pubid idtype="pmpid" link="fulltext">15542367</pubid></pubidlist></xrefbib></bibl><bibl id="B32"><title><p>QSAR and the Prediction of T-Cell Epitopes</p></title><aug><au><snm>Doytchinova</snm><fnm>IA</fnm></au><au><snm>Flower</snm><fnm>DR</fnm></au></aug><source>Curr Proteomics</source><pubdate>2008</pubdate><volume>5</volume><fpage>73</fpage><lpage>95</lpage><xrefbib><pubid idtype="doi">10.2174/157016408784911945</pubid></xrefbib></bibl><bibl id="B33"><title><p>Prediction of MHC-peptide binding: a systematic and comprehensive overview</p></title><aug><au><snm>Lafuente</snm><fnm>EM</fnm></au><au><snm>Reche</snm><fnm>PA</fnm></au></aug><source>Curr Pharm Des</source><pubdate>2009</pubdate><volume>15</volume><fpage>3209</fpage><lpage>3220</lpage><xrefbib><pubidlist><pubid idtype="doi">10.2174/138161209789105162</pubid><pubid idtype="pmpid">19860671</pubid></pubidlist></xrefbib></bibl><bibl id="B34"><title><p>Toward prediction of binding affinities between the MHC protein and its peptide ligands using quantitative structure-affinity relationship approach</p></title><aug><au><snm>Tian</snm><fnm>F</fnm></au><au><snm>Lv</snm><fnm>F</fnm></au><au><snm>Zhou</snm><fnm>P</fnm></au><au><snm>Yang</snm><fnm>Q</fnm></au><au><snm>Jalbout</snm><fnm>AF</fnm></au></aug><source>Protein Pept 
Lett</source><pubdate>2008</pubdate><volume>15</volume><fpage>1033</fpage><lpage>1043</lpage><xrefbib><pubidlist><pubid idtype="doi">10.2174/092986608786071120</pubid><pubid idtype="pmpid">19075812</pubid></pubidlist></xrefbib></bibl><bibl id="B35"><title><p>Static energy analysis of MHC class I and class II peptide-binding affinity</p></title><aug><au><snm>Davies</snm><fnm>MN</fnm></au><au><snm>Flower</snm><fnm>DR</fnm></au></aug><source>Methods Mol Biol</source><pubdate>2007</pubdate><volume>409</volume><fpage>309</fpage><lpage>320</lpage><xrefbib><pubidlist><pubid idtype="doi">full_text</pubid><pubid idtype="pmpid">18450011</pubid></pubidlist></xrefbib></bibl><bibl id="B36"><title><p>QSAR method for prediction of protein-peptide binding affinity: application to MHC class I molecule HLA-A*0201</p></title><aug><au><snm>Zhao</snm><fnm>C</fnm></au><au><snm>Zhang</snm><fnm>H</fnm></au><au><snm>Luan</snm><fnm>F</fnm></au><au><snm>Zhang</snm><fnm>R</fnm></au><au><snm>Liu</snm><fnm>M</fnm></au><au><snm>Hu</snm><fnm>Z</fnm></au><au><snm>Fan</snm><fnm>B</fnm></au></aug><source>J Mol Graph Model</source><pubdate>2007</pubdate><volume>26</volume><fpage>246</fpage><lpage>254</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/j.jmgm.2006.12.002</pubid><pubid idtype="pmpid" link="fulltext">17275373</pubid></pubidlist></xrefbib></bibl><bibl id="B37"><title><p>AntiJen: a quantitative immunology database integrating functional, thermodynamic, kinetic, biophysical, and cellular data</p></title><aug><au><snm>Toseland</snm><fnm>CP</fnm></au><au><snm>Clayton</snm><fnm>DJ</fnm></au><au><snm>McSparron</snm><fnm>H</fnm></au><au><snm>Hemsley</snm><fnm>SL</fnm></au><au><snm>Blythe</snm><fnm>MJ</fnm></au><au><snm>Paine</snm><fnm>K</fnm></au><au><snm>Doytchinova</snm><fnm>IA</fnm></au><au><snm>Guan</snm><fnm>P</fnm></au><au><snm>Hattotuwagama</snm><fnm>CK</fnm></au><au><snm>Flower</snm><fnm>DR</fnm></au></aug><source>Immunome 
Res</source><pubdate>2005</pubdate><volume>1</volume><fpage>4</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1745-7580-1-4</pubid><pubid idtype="pmcid">1289288</pubid><pubid idtype="pmpid">16305757</pubid></pubidlist></xrefbib></bibl><bibl id="B38"><title><p>JenPep: a database of quantitative functional peptide data for immunology</p></title><aug><au><snm>Blythe</snm><fnm>MJ</fnm></au><au><snm>Doytchinova</snm><fnm>IA</fnm></au><au><snm>Flower</snm><fnm>DR</fnm></au></aug><source>Bioinformatics</source><pubdate>2002</pubdate><volume>18</volume><fpage>434</fpage><lpage>439</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/bioinformatics/18.3.434</pubid><pubid idtype="pmpid" link="fulltext">11934742</pubid></pubidlist></xrefbib></bibl><bibl id="B39"><title><p>JenPep: a novel computational information resource for immunobiology and vaccinology</p></title><aug><au><snm>McSparron</snm><fnm>H</fnm></au><au><snm>Blythe</snm><fnm>MJ</fnm></au><au><snm>Zygouri</snm><fnm>C</fnm></au><au><snm>Doytchinova</snm><fnm>IA</fnm></au><au><snm>Flower</snm><fnm>DR</fnm></au></aug><source>J Chem Inf Comput Sci</source><pubdate>2003</pubdate><volume>43</volume><fpage>1276</fpage><lpage>1287</lpage><xrefbib><pubid idtype="pmpid" link="fulltext">12870921</pubid></xrefbib></bibl><bibl id="B40"><title><p>A consensus epitope prediction approach identifies the breadth of murine T(CD8+)-cell responses to vaccinia virus</p></title><aug><au><snm>Moutaftsi</snm><fnm>M</fnm></au><au><snm>Peters</snm><fnm>B</fnm></au><au><snm>Pasquetto</snm><fnm>V</fnm></au><au><snm>Tscharke</snm><fnm>DC</fnm></au><au><snm>Sidney</snm><fnm>J</fnm></au><au><snm>Bui</snm><fnm>HH</fnm></au><au><snm>Grey</snm><fnm>H</fnm></au><au><snm>Sette</snm><fnm>A</fnm></au></aug><source>Nat Biotechnol</source><pubdate>2006</pubdate><volume>24</volume><fpage>817</fpage><lpage>819</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/nbt1215</pubid><pubid idtype="pmpid" 
link="fulltext">16767078</pubid></pubidlist></xrefbib></bibl><bibl id="B41"><title><p>Vaccinia virus-specific CD4+ T cell responses target a set of antigens largely distinct from those targeted by CD8+ T cell responses</p></title><aug><au><snm>Moutaftsi</snm><fnm>M</fnm></au><au><snm>Bui</snm><fnm>HH</fnm></au><au><snm>Peters</snm><fnm>B</fnm></au><au><snm>Sidney</snm><fnm>J</fnm></au><au><snm>Salek-Ardakani</snm><fnm>S</fnm></au><au><snm>Oseroff</snm><fnm>C</fnm></au><au><snm>Pasquetto</snm><fnm>V</fnm></au><au><snm>Crotty</snm><fnm>S</fnm></au><au><snm>Croft</snm><fnm>M</fnm></au><au><snm>Lefkowitz</snm><fnm>EJ</fnm></au><au><snm>Grey</snm><fnm>H</fnm></au><au><snm>Sette</snm><fnm>A</fnm></au></aug><source>J Immunol</source><pubdate>2007</pubdate><volume>178</volume><fpage>6814</fpage><lpage>6820</lpage><xrefbib><pubid idtype="pmpid" link="fulltext">17513729</pubid></xrefbib></bibl><bibl id="B42"><title><p>T-Cell epitope discovery for variola and vaccinia viruses</p></title><aug><au><snm>Kennedy</snm><fnm>R</fnm></au><au><snm>Poland</snm><fnm>GA</fnm></au></aug><source>Rev Med Virol</source><pubdate>2007</pubdate><volume>17</volume><fpage>93</fpage><lpage>113</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1002/rmv.527</pubid><pubid idtype="pmpid" link="fulltext">17195963</pubid></pubidlist></xrefbib></bibl><bibl id="B43"><title><p>Peptide studies by means of principal properties of amino acids derived from MIF descriptors</p></title><aug><au><snm>Cruciani</snm><fnm>G</fnm></au><au><snm>Baroni</snm><fnm>M</fnm></au><au><snm>Carosati</snm><fnm>E</fnm></au><au><snm>Clementi</snm><fnm>M</fnm></au><au><snm>Valigi</snm><fnm>R</fnm></au><au><snm>Clementi</snm><fnm>S</fnm></au></aug><source>Journal of Chemometrics</source><pubdate>2004</pubdate><volume>18</volume><fpage>146</fpage><lpage>155</lpage><xrefbib><pubid idtype="doi">10.1002/cem.856</pubid></xrefbib></bibl><bibl 
id="B44"><aug><au><snm>Eriksson</snm><fnm>L</fnm></au><au><snm>Johansson</snm><fnm>E</fnm></au><au><snm>Kettaneh-Wold</snm><fnm>N</fnm></au><au><snm>Trygg</snm><fnm>J</fnm></au><au><snm>Wikstrom</snm><fnm>C</fnm></au><au><snm>Wold</snm><fnm>S</fnm></au></aug><source>Multi and Megavariate Data Analysis. Part I: Basic Principles and Applications</source><publisher>Umetrics Academy, Umea, Sweden</publisher><edition>2</edition><pubdate>2006</pubdate></bibl><bibl id="B45"><title><p>A combined transmembrane topology and signal peptide prediction method</p></title><aug><au><snm>Kall</snm><fnm>L</fnm></au><au><snm>Krogh</snm><fnm>A</fnm></au><au><snm>Sonnhammer</snm><fnm>EL</fnm></au></aug><source>J Mol Biol</source><pubdate>2004</pubdate><volume>338</volume><fpage>1027</fpage><lpage>1036</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/j.jmb.2004.03.016</pubid><pubid idtype="pmpid" link="fulltext">15111065</pubid></pubidlist></xrefbib></bibl><bibl id="B46"><title><p>Transmembrane topology and signal peptide prediction using dynamic bayesian networks</p></title><aug><au><snm>Reynolds</snm><fnm>SM</fnm></au><au><snm>Kall</snm><fnm>L</fnm></au><au><snm>Riffle</snm><fnm>ME</fnm></au><au><snm>Bilmes</snm><fnm>JA</fnm></au><au><snm>Noble</snm><fnm>WS</fnm></au></aug><source>PLoS Comput Biol</source><pubdate>2008</pubdate><volume>4</volume><fpage>e1000213</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1371/journal.pcbi.1000213</pubid><pubid idtype="pmcid">2570248</pubid><pubid idtype="pmpid">18989393</pubid></pubidlist></xrefbib></bibl><bibl id="B47"><title><p>Improving the accuracy of transmembrane protein topology prediction using evolutionary information</p></title><aug><au><snm>Jones</snm><fnm>DT</fnm></au></aug><source>Bioinformatics</source><pubdate>2007</pubdate><volume>23</volume><fpage>538</fpage><lpage>544</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/bioinformatics/btl677</pubid><pubid idtype="pmpid" 
link="fulltext">17237066</pubid></pubidlist></xrefbib></bibl><bibl id="B48"><title><p>Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes</p></title><aug><au><snm>Krogh</snm><fnm>A</fnm></au><au><snm>Larsson</snm><fnm>B</fnm></au><au><snm>von Heijne</snm><fnm>G</fnm></au><au><snm>Sonnhammer</snm><fnm>EL</fnm></au></aug><source>J Mol Biol</source><pubdate>2001</pubdate><volume>305</volume><fpage>567</fpage><lpage>580</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1006/jmbi.2000.4315</pubid><pubid idtype="pmpid" link="fulltext">11152613</pubid></pubidlist></xrefbib></bibl><bibl id="B49"><title><p>Smoothing and differentiation of data by simplified least squares procedures</p></title><aug><au><snm>Savitzky</snm><fnm>A</fnm></au><au><snm>Golay</snm><fnm>MJE</fnm></au></aug><source>Anal Chem</source><pubdate>1964</pubdate><volume>36</volume><fpage>1627</fpage><lpage>1639</lpage><xrefbib><pubid idtype="doi">10.1021/ac60214a047</pubid></xrefbib></bibl><bibl id="B50"><title><p>A novel Staphylococcus aureus vaccine: iron surface determinant B induces rapid antibody responses in rhesus macaques and specific increased survival in a murine S. 
aureus sepsis model</p></title><aug><au><snm>Kuklin</snm><fnm>NA</fnm></au><au><snm>Clark</snm><fnm>DJ</fnm></au><au><snm>Secore</snm><fnm>S</fnm></au><au><snm>Cook</snm><fnm>J</fnm></au><au><snm>Cope</snm><fnm>LD</fnm></au><au><snm>McNeely</snm><fnm>T</fnm></au><au><snm>Noble</snm><fnm>L</fnm></au><au><snm>Brown</snm><fnm>MJ</fnm></au><au><snm>Zorman</snm><fnm>JK</fnm></au><au><snm>Wang</snm><fnm>XM</fnm></au><au><snm>Pancari</snm><fnm>G</fnm></au><au><snm>Fan</snm><fnm>H</fnm></au><au><snm>Isett</snm><fnm>K</fnm></au><au><snm>Burgess</snm><fnm>B</fnm></au><au><snm>Bryan</snm><fnm>J</fnm></au><au><snm>Brownlow</snm><fnm>M</fnm></au><au><snm>George</snm><fnm>H</fnm></au><au><snm>Meinz</snm><fnm>M</fnm></au><au><snm>Liddell</snm><fnm>ME</fnm></au><au><snm>Kelly</snm><fnm>R</fnm></au><au><snm>Schultz</snm><fnm>L</fnm></au><au><snm>Montgomery</snm><fnm>D</fnm></au><au><snm>Onishi</snm><fnm>J</fnm></au><au><snm>Losada</snm><fnm>M</fnm></au><au><snm>Martin</snm><fnm>M</fnm></au><au><snm>Ebert</snm><fnm>T</fnm></au><au><snm>Tan</snm><fnm>CY</fnm></au><au><snm>Schofield</snm><fnm>TL</fnm></au><au><snm>Nagy</snm><fnm>E</fnm></au><au><snm>Meineke</snm><fnm>A</fnm></au><au><snm>Joyce</snm><fnm>JG</fnm></au><au><snm>Kurtz</snm><fnm>MB</fnm></au><au><snm>Caulfield</snm><fnm>MJ</fnm></au><au><snm>Jansen</snm><fnm>KU</fnm></au><au><snm>McClements</snm><fnm>W</fnm></au><au><snm>Anderson</snm><fnm>AS</fnm></au></aug><source>Infect Immun</source><pubdate>2006</pubdate><volume>74</volume><fpage>2215</fpage><lpage>2223</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1128/IAI.74.4.2215-2223.2006</pubid><pubid idtype="pmcid">1418914</pubid><pubid idtype="pmpid">16552052</pubid></pubidlist></xrefbib></bibl><bibl id="B51"><title><p>Selection and characterization of murine monoclonal antibodies to Staphylococcus aureus iron-regulated surface determinant B with functional activity in vitro and in 
vivo</p></title><aug><au><snm>Brown</snm><fnm>M</fnm></au><au><snm>Kowalski</snm><fnm>R</fnm></au><au><snm>Zorman</snm><fnm>J</fnm></au><au><snm>Wang</snm><fnm>XM</fnm></au><au><snm>Towne</snm><fnm>V</fnm></au><au><snm>Zhao</snm><fnm>Q</fnm></au><au><snm>Secore</snm><fnm>S</fnm></au><au><snm>Finnefrock</snm><fnm>AC</fnm></au><au><snm>Ebert</snm><fnm>T</fnm></au><au><snm>Pancari</snm><fnm>G</fnm></au><au><snm>Isett</snm><fnm>K</fnm></au><au><snm>Zhang</snm><fnm>Y</fnm></au><au><snm>Anderson</snm><fnm>AS</fnm></au><au><snm>Montgomery</snm><fnm>D</fnm></au><au><snm>Cope</snm><fnm>L</fnm></au><au><snm>McNeely</snm><fnm>T</fnm></au></aug><source>Clin Vaccine Immunol</source><pubdate>2009</pubdate><volume>16</volume><fpage>1095</fpage><lpage>1104</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1128/CVI.00085-09</pubid><pubid idtype="pmcid">2725548</pubid><pubid idtype="pmpid">19553551</pubid></pubidlist></xrefbib></bibl><bibl id="B52"><title><p>Identification of an immunodominant ABC transporter in methicillin-resistant Staphylococcus aureus infections</p></title><aug><au><snm>Burnie</snm><fnm>JP</fnm></au><au><snm>Matthews</snm><fnm>RC</fnm></au><au><snm>Carter</snm><fnm>T</fnm></au><au><snm>Beaulieu</snm><fnm>E</fnm></au><au><snm>Donohoe</snm><fnm>M</fnm></au><au><snm>Chapman</snm><fnm>C</fnm></au><au><snm>Williamson</snm><fnm>P</fnm></au><au><snm>Hodgetts</snm><fnm>SJ</fnm></au></aug><source>Infect Immun</source><pubdate>2000</pubdate><volume>68</volume><fpage>3200</fpage><lpage>3209</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1128/IAI.68.6.3200-3209.2000</pubid><pubid idtype="pmcid">97562</pubid><pubid idtype="pmpid">10816464</pubid></pubidlist></xrefbib></bibl><bibl id="B53"><title><p>PSORT-B: Improving protein subcellular localization prediction for Gram-negative 
bacteria</p></title><aug><au><snm>Gardy</snm><fnm>JL</fnm></au><au><snm>Spencer</snm><fnm>C</fnm></au><au><snm>Wang</snm><fnm>K</fnm></au><au><snm>Ester</snm><fnm>M</fnm></au><au><snm>Tusnady</snm><fnm>GE</fnm></au><au><snm>Simon</snm><fnm>I</fnm></au><au><snm>Hua</snm><fnm>S</fnm></au><au><snm>deFays</snm><fnm>K</fnm></au><au><snm>Lambert</snm><fnm>C</fnm></au><au><snm>Nakai</snm><fnm>K</fnm></au><au><snm>Brinkman</snm><fnm>FS</fnm></au></aug><source>Nucleic Acids Res</source><pubdate>2003</pubdate><volume>31</volume><fpage>3613</fpage><lpage>3617</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/gkg602</pubid><pubid idtype="pmcid">169008</pubid><pubid idtype="pmpid">12824378</pubid></pubidlist></xrefbib></bibl><bibl id="B54"><title><p>Of mice and humans: how good are HLA transgenic mice as a model of human immune responses?</p></title><aug><au><snm>Kotturi</snm><fnm>MF</fnm></au><au><snm>Assarsson</snm><fnm>E</fnm></au><au><snm>Peters</snm><fnm>B</fnm></au><au><snm>Grey</snm><fnm>H</fnm></au><au><snm>Oseroff</snm><fnm>C</fnm></au><au><snm>Pasquetto</snm><fnm>V</fnm></au><au><snm>Sette</snm><fnm>A</fnm></au></aug><source>Immunome Res</source><pubdate>2009</pubdate><volume>5</volume><fpage>3</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1745-7580-5-3</pubid><pubid idtype="pmcid">2702351</pubid><pubid idtype="pmpid">19534819</pubid></pubidlist></xrefbib></bibl><bibl id="B55"><title><p>HLA-A*0201, HLA-A*1101, and HLA-B*0702 transgenic mice recognize numerous poxvirus determinants from a wide variety of viral gene 
products</p></title><aug><au><snm>Pasquetto</snm><fnm>V</fnm></au><au><snm>Bui</snm><fnm>HH</fnm></au><au><snm>Giannino</snm><fnm>R</fnm></au><au><snm>Banh</snm><fnm>C</fnm></au><au><snm>Mirza</snm><fnm>F</fnm></au><au><snm>Sidney</snm><fnm>J</fnm></au><au><snm>Oseroff</snm><fnm>C</fnm></au><au><snm>Tscharke</snm><fnm>DC</fnm></au><au><snm>Irvine</snm><fnm>K</fnm></au><au><snm>Bennink</snm><fnm>JR</fnm></au><au><snm>Peters</snm><fnm>B</fnm></au><au><snm>Southwood</snm><fnm>S</fnm></au><au><snm>Cerundolo</snm><fnm>V</fnm></au><au><snm>Grey</snm><fnm>H</fnm></au><au><snm>Yewdell</snm><fnm>JW</fnm></au><au><snm>Sette</snm><fnm>A</fnm></au></aug><source>J Immunol</source><pubdate>2005</pubdate><volume>175</volume><fpage>5504</fpage><lpage>5515</lpage><xrefbib><pubid idtype="pmpid" link="fulltext">16210659</pubid></xrefbib></bibl><bibl id="B56"><title><p>Reducing risk, improving outcomes: bioengineering less immunogenic protein therapeutics</p></title><aug><au><snm>De Groot</snm><fnm>AS</fnm></au><au><snm>Martin</snm><fnm>W</fnm></au></aug><source>Clin Immunol</source><pubdate>2009</pubdate><volume>131</volume><fpage>189</fpage><lpage>201</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/j.clim.2009.01.009</pubid><pubid idtype="pmpid" link="fulltext">19269256</pubid></pubidlist></xrefbib></bibl><bibl id="B57"><title><p>Cell surface recycling of internalized antigen permits dendritic cell priming of B cells</p></title><aug><au><snm>Bergtold</snm><fnm>A</fnm></au><au><snm>Desai</snm><fnm>DD</fnm></au><au><snm>Gavhane</snm><fnm>A</fnm></au><au><snm>Clynes</snm><fnm>R</fnm></au></aug><source>Immunity</source><pubdate>2005</pubdate><volume>23</volume><fpage>503</fpage><lpage>514</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/j.immuni.2005.09.013</pubid><pubid idtype="pmpid" link="fulltext">16286018</pubid></pubidlist></xrefbib></bibl><bibl id="B58"><title><p>Dendritic cells interact directly with naive B lymphocytes to transfer antigen and initiate class 
switching in a primary T-dependent response</p></title><aug><au><snm>Wykes</snm><fnm>M</fnm></au><au><snm>Pombo</snm><fnm>A</fnm></au><au><snm>Jenkins</snm><fnm>C</fnm></au><au><snm>MacPherson</snm><fnm>GG</fnm></au></aug><source>J Immunol</source><pubdate>1998</pubdate><volume>161</volume><fpage>1313</fpage><lpage>1319</lpage><xrefbib><pubid idtype="pmpid" link="fulltext">9686593</pubid></xrefbib></bibl><bibl id="B59"><title><p>Differential lysosomal proteolysis in antigen-presenting cells determines antigen fate</p></title><aug><au><snm>Delamarre</snm><fnm>L</fnm></au><au><snm>Pack</snm><fnm>M</fnm></au><au><snm>Chang</snm><fnm>H</fnm></au><au><snm>Mellman</snm><fnm>I</fnm></au><au><snm>Trombetta</snm><fnm>ES</fnm></au></aug><source>Science</source><pubdate>2005</pubdate><volume>307</volume><fpage>1630</fpage><lpage>1634</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1126/science.1108003</pubid><pubid idtype="pmpid" link="fulltext">15761154</pubid></pubidlist></xrefbib></bibl><bibl id="B60"><title><p>Immunodominance in TCD8+ responses to viruses: cell biology, cellular immunology, and mathematical models</p></title><aug><au><snm>Yewdell</snm><fnm>JW</fnm></au><au><snm>Del Val</snm><fnm>M</fnm></au></aug><source>Immunity</source><pubdate>2004</pubdate><volume>21</volume><fpage>149</fpage><lpage>153</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/j.immuni.2004.06.015</pubid><pubid idtype="pmpid" link="fulltext">15308096</pubid></pubidlist></xrefbib></bibl><bibl id="B61"><title><p>Meta-analysis of immune epitope data for all Plasmodia: overview and applications for malarial immunobiology and vaccine-related issues</p></title><aug><au><snm>Vaughan</snm><fnm>K</fnm></au><au><snm>Blythe</snm><fnm>M</fnm></au><au><snm>Greenbaum</snm><fnm>J</fnm></au><au><snm>Zhang</snm><fnm>Q</fnm></au><au><snm>Peters</snm><fnm>B</fnm></au><au><snm>Doolan</snm><fnm>DL</fnm></au><au><snm>Sette</snm><fnm>A</fnm></au></aug><source>Parasite 
Immunol</source><pubdate>2009</pubdate><volume>31</volume><fpage>78</fpage><lpage>97</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1111/j.1365-3024.2008.01077.x</pubid><pubid idtype="pmpid" link="fulltext">19149776</pubid></pubidlist></xrefbib></bibl></refgrp>
</bm></art>