<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1745-7580-2-2</ui>
   <ji>1745-7580</ji>
   <fm>
      <dochead>Research</dochead>
      <bibl>
         <title>
            <p>Improved method for predicting linear B-cell epitopes</p>
         </title>
         <aug>
            <au id="A1">
               <snm>Larsen</snm>
               <mnm>Erik Pontoppidan</mnm>
               <fnm>Jens</fnm>
               <insr iid="I1"/>
               <email>jepl@cbs.dtu.dk</email>
            </au>
            <au id="A2" ca="yes">
               <snm>Lund</snm>
               <fnm>Ole</fnm>
               <insr iid="I1"/>
               <email>lund@cbs.dtu.dk</email>
            </au>
            <au id="A3">
               <snm>Nielsen</snm>
               <fnm>Morten</fnm>
               <insr iid="I1"/>
               <email>mniel@cbs.dtu.dk</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Center for Biological Sequence Analysis, BioCentrum-DTU, Building 208, Technical University of Denmark, DK-2800 Kgs. Lyngby, Denmark</p>
            </ins>
         </insg>
         <source>Immunome Research</source>
         <issn>1745-7580</issn>
         <pubdate>2006</pubdate>
         <volume>2</volume>
         <issue>1</issue>
         <fpage>2</fpage>
         <url>http://www.immunome-research.com/content/2/1/2</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">16635264</pubid>
               <pubid idtype="doi">10.1186/1745-7580-2-2</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>16</day>
               <month>2</month>
               <year>2006</year>
            </date>
         </rec>
         <acc>
            <date>
               <day>24</day>
               <month>4</month>
               <year>2006</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>24</day>
               <month>4</month>
               <year>2006</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2006</year>
         <collab>Larsen et al; licensee BioMed Central Ltd.</collab>
         <note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>B-cell epitopes are the sites of molecules that are recognized by antibodies of the immune system. Knowledge of B-cell epitopes may be used in the design of vaccines and diagnostic tests. It is therefore of interest to develop improved methods for predicting B-cell epitopes. In this paper, we describe an improved method for predicting linear B-cell epitopes.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>To this end, three data sets of proteins annotated with linear B-cell epitopes were constructed: one data set was collected from the literature, a second was extracted from the AntiJen database, and a third, consisting of epitopes in HIV proteins, was collected from the Los Alamos HIV database. An unbiased validation of the methods was made by testing them on data sets on which they had been neither trained nor optimized. Performance was measured in a non-parametric way by constructing ROC-curves.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusion</p>
               </st>
               <p>The best single method for predicting linear B-cell epitopes is the hidden Markov model. Combining the hidden Markov model with one of the best propensity scale methods, we obtained the BepiPred method. When tested on the validation data set this method performs significantly better than any of the other methods tested. The server and data sets are publicly available at <url>http://www.cbs.dtu.dk/services/BepiPred</url>.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>Vaccines have mostly been composed of killed or attenuated whole pathogens. For safety reasons, however, it could be desirable to use peptide vaccines that are able to generate an immune response against a given pathogen <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>. Such vaccines could contain peptides representing linear B-cell epitopes from the proteins of the pathogen. Hughes et al. <abbrgrp><abbr bid="B2">2</abbr></abbrgrp> used linear B-cell epitopes to induce protective immunity in mice against <it>P. aeruginosa</it>. Synthetic peptides containing linear B-cell epitopes can also be used to immunize animals in order to raise antibodies against a specific protein; such antibodies can be used, e.g., in screening assays or as diagnostic tools <abbrgrp><abbr bid="B3">3</abbr></abbrgrp>.</p>
         <p>B-cell epitopes are parts of proteins or other molecules that antibodies (made by B-cells) bind. Most protein epitopes are composed of different parts of the polypeptide chain that are brought into spatial proximity by the folding of the protein. These epitopes are called discontinuous, but for approximately 10% of the epitopes, the corresponding antibodies are cross-reactive with a linear peptide fragment of the epitope <abbrgrp><abbr bid="B4">4</abbr></abbrgrp>. These epitopes are denoted linear or continuous and are mainly composed of a single stretch of the polypeptide chain.</p>
         <p>Even though linear B-cell epitopes are thus of limited relevance for a detailed understanding of the humoral immune response, identifying such linear peptide segments is often the first step in the search for antigenic determinants in pathogenic organisms. The traditional experimental approach of peptide scanning is clearly not feasible on a genomic scale. Prediction methods are very cost-effective, and reliable methods for predicting linear B-cell epitopes could therefore guide a genome-wide search for B-cell antigens in pathogenic organisms.</p>
         <p>The classical way of predicting linear B-cell epitopes is by the use of propensity scale methods. These methods assign a propensity value to every amino acid, based on studies of their physico-chemical properties. Fluctuations in the sequence of prediction values are reduced by applying a running average window. This prediction procedure was first developed by Hopp and Woods <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>.</p>
         <p>Pellequer et al. <abbrgrp><abbr bid="B4">4</abbr></abbrgrp> compared several propensity scale methods using a data set of 14 epitope annotated proteins. They found that applying the scales by Parker et al. <abbrgrp><abbr bid="B6">6</abbr></abbrgrp> (hydrophilicity), Chou and Fasman <abbrgrp><abbr bid="B7">7</abbr></abbrgrp> and Levitt <abbrgrp><abbr bid="B8">8</abbr></abbrgrp> (secondary structure) and by Emini et al. <abbrgrp><abbr bid="B9">9</abbr></abbrgrp> (accessibility) gave slightly better results than the other scales tested.</p>
         <p>Alix <abbrgrp><abbr bid="B10">10</abbr></abbrgrp> developed a program called PEOPLE, which predicts the location of linear B-cell epitopes using combinations of propensity scale methods. Odorico and Pellequer <abbrgrp><abbr bid="B11">11</abbr></abbrgrp> developed a program, BEPITOPE, that likewise predicts the location of linear B-cell epitopes using propensity scale methods.</p>
         <p>Recently, Blythe and Flower <abbrgrp><abbr bid="B12">12</abbr></abbrgrp> studied the performance of many propensity scale methods and found that even the best methods predict only marginally better than a random model. They made a thorough study using a data set of 50 epitope mapped proteins from the AntiJen web page <url>http://www.jenner.ac.uk/AntiJen</url><abbrgrp><abbr bid="B13">13</abbr></abbrgrp>.</p>
         <p>In this study, we have developed a novel method for predicting linear B-cell epitopes, BepiPred, which performs significantly better both than random predictions and than a number of the propensity scales tested.</p>
         <p>Even though the present method is a significant improvement over earlier methods for predicting linear B-cell epitopes, it still has major limitations. There is a need for further improvements in predictive power before such systems become generally useful to provide reliable predictions of B-cell epitopes.</p>
      </sec>
      <sec>
         <st>
            <p>Results</p>
         </st>
         <sec>
            <st>
               <p>Predictions by propensity scale methods</p>
            </st>
            <p>We first tested a number of propensity scale methods on the Pellequer data set <abbrgrp><abbr bid="B14">14</abbr></abbrgrp>. For every scale and window size, a ROC-curve and the area under it, the <it>A</it><sub><it>roc</it></sub>-value, were calculated as a measure of the prediction accuracy. To estimate the standard error of the <it>A</it><sub><it>roc</it></sub>-value, <graphic file="1745-7580-2-2-i1.gif"/>, 1000 bootstrap samples were drawn from the predictions. The best scale was found to be the one by Levitt <abbrgrp><abbr bid="B8">8</abbr></abbrgrp> (window size 11, <it>A</it><sub><it>roc</it></sub> = 0.658 &#177; 0.013); this method will be denoted Levitt. The second best scale is the one by Parker et al. <abbrgrp><abbr bid="B6">6</abbr></abbrgrp> (window size 9, <it>A</it><sub><it>roc</it></sub> = 0.654 &#177; 0.013), denoted Parker. The other scales tested did not perform as well as those by Parker et al. <abbrgrp><abbr bid="B6">6</abbr></abbrgrp> and Levitt <abbrgrp><abbr bid="B8">8</abbr></abbrgrp>.</p>
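            <p>The <it>A</it><sub><it>roc</it></sub>-value and its bootstrap standard error can be illustrated with the minimal Python sketch below. It is not the code used in this study: the area is computed with the equivalent Mann-Whitney statistic over per-residue scores (label 1 for epitope residues, 0 otherwise), and residues are assumed to be the unit resampled in the bootstrap. The function names auc and bootstrap_auc_se are hypothetical.</p>
            <p><![CDATA[
```python
import random
import statistics


def auc(scores, labels):
    """Area under the ROC curve, computed as the Mann-Whitney statistic:
    the probability that a randomly chosen epitope residue (label 1) scores
    higher than a randomly chosen non-epitope residue (label 0); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need both epitope and non-epitope residues")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_se(scores, labels, n_boot=1000, seed=0):
    """Bootstrap estimate of the standard error of the AUC.
    Residues are resampled with replacement (an assumed resampling unit);
    resamples that lack one of the two classes are skipped."""
    rng = random.Random(seed)
    pairs = list(zip(scores, labels))
    aucs = []
    while len(aucs) < n_boot:
        sample = [rng.choice(pairs) for _ in pairs]
        labels_b = [y for _, y in sample]
        if 0 < sum(labels_b) < len(labels_b):
            aucs.append(auc([s for s, _ in sample], labels_b))
    return statistics.stdev(aucs)
```
]]></p>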
            <p>Performing a permutation experiment 1000 times, we estimated the P-value for the null hypothesis that a method performs like a random model, against the alternative hypothesis that it performs better than a random model. The resulting P-values for Parker and Levitt were both below 0.1%.</p>
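            <p>A permutation test of this kind can be sketched as follows, assuming that the epitope labels are shuffled across residues while the scores are held fixed, and that the P-value includes the common +1 correction. Neither detail is stated in the text, and the function permutation_p_value is hypothetical. With 1000 permutations the smallest attainable P-value under this correction is 1/1001, just below 0.1%.</p>
            <p><![CDATA[
```python
import random


def auc(scores, labels):
    """Mann-Whitney estimate of the ROC area (ties count one half)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    total = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return total / (len(pos) * len(neg))


def permutation_p_value(scores, labels, n_perm=1000, seed=0):
    """One-sided P-value for H0 'the method performs like a random model'
    against H1 'it performs better'. Epitope labels are shuffled across the
    residues; the P-value is the fraction of shuffles whose AUC reaches the
    observed AUC, with a +1 correction so that it is never exactly zero."""
    rng = random.Random(seed)
    observed = auc(scores, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if auc(scores, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```
]]></p>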
         </sec>
         <sec>
            <st>
               <p>Predictions by hidden Markov models</p>
            </st>
            <p>Experiments were conducted in which hidden Markov models (HMMs) were used to predict the location of linear B-cell epitopes. The models were built from positive windows extracted from the AntiJen data set and tested on the Pellequer data set to find the optimal parameters. Different sizes of the extracted peptide windows, different pseudo-count correction weights for estimating the amino acid frequencies and different sizes of the smoothing window were tested. For the best method, the size of the extracted windows was 5, the size of the smoothing window was 9 and the pseudo-count correction weight was 10<sup>7</sup>. The performance of this method on the Pellequer data set was <it>A</it><sub><it>roc</it></sub> = 0.663 &#177; 0.012. The method with these parameters will be denoted HMM.</p>
         </sec>
         <sec>
            <st>
               <p>Combining methods</p>
            </st>
            <p>To make more accurate predictions, the hidden Markov model (HMM) was combined with each of the two best propensity scale methods (Parker and Levitt). The combinations were made as weighted sums of normalized prediction values. The weights on the two methods were constrained to sum to one, and different weight pairs were tested. The Pellequer data set was used to optimize the parameter values. The combination methods with the highest <it>A</it><sub><it>roc</it></sub>-values were chosen for further comparisons and are shown in Table <tblr tid="T1">1</tblr>. The combination method with the highest <it>A</it><sub><it>roc</it></sub>-value, a combination of HMM and Parker, is denoted BepiPred; it is the candidate method for predicting linear B-cell epitopes in this paper.</p>
            <tbl id="T1">
               <title>
                  <p>Table 1</p>
               </title>
               <caption>
                  <p>Combinations of methods. Predictions on the Pellequer data set.</p>
               </caption>
               <tblbdy cols="5">
                  <r>
                     <c ca="left">
                        <p>Name</p>
                     </c>
                     <c ca="left">
                        <p>Method 1</p>
                     </c>
                     <c ca="left">
                        <p>Method 2</p>
                     </c>
                     <c ca="center">
                        <p>Weight on method 1</p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>A</it>
                           <sub>
                              <it>roc</it>
                           </sub>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="5">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>BepiPred</p>
                     </c>
                     <c ca="left">
                        <p>HMM</p>
                     </c>
                     <c ca="left">
                        <p>Parker</p>
                     </c>
                     <c ca="center">
                        <p>0.60</p>
                     </c>
                     <c ca="left">
                        <p>0.671 &#177; 0.013</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Comb2</p>
                     </c>
                     <c ca="left">
                        <p>HMM</p>
                     </c>
                     <c ca="left">
                        <p>Levitt</p>
                     </c>
                     <c ca="center">
                        <p>0.55</p>
                     </c>
                     <c ca="left">
                        <p>0.669 &#177; 0.013</p>
                     </c>
                  </r>
               </tblbdy>
            </tbl>
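            <p>The combination step can be sketched as a weighted sum of normalized per-residue predictions. This Python sketch assumes z-score normalization, which the text does not specify; the helper names normalize and combine and the example scores are hypothetical. Because the two weights sum to one, a single weight on method 1 (0.60 for BepiPred in Table 1) determines the combination.</p>
            <p><![CDATA[
```python
import statistics


def normalize(values):
    """Z-score normalization of one method's per-residue predictions.
    (Assumption: the exact normalization used for BepiPred is not stated.)"""
    mu = statistics.mean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sd for v in values]


def combine(method1, method2, w1):
    """Weighted sum of normalized predictions; the two weights sum to one."""
    if not 0.0 <= w1 <= 1.0:
        raise ValueError("w1 must lie in [0, 1]")
    if len(method1) != len(method2):
        raise ValueError("predictions must cover the same residues")
    a, b = normalize(method1), normalize(method2)
    return [w1 * x + (1.0 - w1) * y for x, y in zip(a, b)]


# Hypothetical per-residue scores for one protein.
hmm_scores = [0.2, 1.5, 2.1, 0.4, -0.3]
parker_scores = [1.1, 2.4, 3.0, 0.9, 0.5]
combined = combine(hmm_scores, parker_scores, 0.60)  # weight 0.60 on HMM
```
]]></p>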
         </sec>
         <sec>
            <st>
               <p>Validating the methods</p>
            </st>
            <p>To make an unbiased validation of the methods, tests were performed on an independent data set, the HIV data set. The results are shown in Table <tblr tid="T2">2</tblr>. BepiPred is again the best method. ROC-curves for the selected methods are shown in Figure <figr fid="F1">1</figr>, and selected points on these curves are given in Table <tblr tid="T3">3</tblr>.</p>
            <tbl id="T2">
               <title>
                  <p>Table 2</p>
               </title>
               <caption>
                  <p>Validation of the methods on the HIV data set.</p>
               </caption>
               <tblbdy cols="2">
                  <r>
                     <c ca="left">
                        <p>Method</p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>A</it>
                           <sub>
                              <it>roc</it>
                           </sub>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="2">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>BepiPred</p>
                     </c>
                     <c ca="left">
                        <p>0.600 &#177; 0.011</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>HMM</p>
                     </c>
                     <c ca="left">
                        <p>0.586 &#177; 0.011</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Parker</p>
                     </c>
                     <c ca="left">
                        <p>0.586 &#177; 0.011</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Comb2</p>
                     </c>
                     <c ca="left">
                        <p>0.584 &#177; 0.011</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Levitt</p>
                     </c>
                     <c ca="left">
                        <p>0.572 &#177; 0.011</p>
                     </c>
                  </r>
               </tblbdy>
            </tbl>
            <fig id="F1">
               <title>
                  <p>Figure 1</p>
               </title>
               <caption>
                  <p>ROC-curves for selected methods validated on the HIV data set</p>
               </caption>
               <text>
                  <p>ROC-curves for selected methods validated on the HIV data set. See Table 3 for chosen points on the curves.</p>
               </text>
               <graphic file="1745-7580-2-2-1"/>
            </fig>
            <tbl id="T3">
               <title>
                  <p>Table 3</p>
               </title>
               <caption>
                  <p>Sensitivities at selected specificities (both in %) for some of the methods. The values are read from the ROC-curves shown in Figure 1. The methods were validated on the HIV data set.</p>
               </caption>
               <tblbdy cols="4">
                  <r>
                     <c ca="center">
                        <p>specificity</p>
                     </c>
                     <c cspan="3" ca="center">
                        <p>sensitivity</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>BepiPred</p>
                     </c>
                     <c ca="center">
                        <p>Parker</p>
                     </c>
                     <c ca="center">
                        <p>Levitt</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="4">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>90</p>
                     </c>
                     <c ca="center">
                        <p>16.7</p>
                     </c>
                     <c ca="center">
                        <p>17.2</p>
                     </c>
                     <c ca="center">
                        <p>14.5</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>80</p>
                     </c>
                     <c ca="center">
                        <p>30.9</p>
                     </c>
                     <c ca="center">
                        <p>28.8</p>
                     </c>
                     <c ca="center">
                        <p>26.8</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>70</p>
                     </c>
                     <c ca="center">
                        <p>42.6</p>
                     </c>
                     <c ca="center">
                        <p>40.8</p>
                     </c>
                     <c ca="center">
                        <p>39.6</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>60</p>
                     </c>
                     <c ca="center">
                        <p>53.8</p>
                     </c>
                     <c ca="center">
                        <p>50.9</p>
                     </c>
                     <c ca="center">
                        <p>50.1</p>
                     </c>
                  </r>
               </tblbdy>
            </tbl>
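            <p>Reading a sensitivity off a ROC-curve at a fixed specificity, as in Table 3, can be sketched as below. The threshold is taken from the sorted scores of the non-epitope residues so that at least the requested fraction of them is predicted negative; with tied scores the achieved specificity may exceed the target. This is an illustrative assumption about how such points are obtained, and the function name is hypothetical.</p>
            <p><![CDATA[
```python
import math


def sensitivity_at_specificity(scores, labels, specificity):
    """Sensitivity of a score threshold chosen so that at least the requested
    fraction of non-epitope residues (label 0) is predicted negative.
    Residues scoring strictly above the threshold are called epitopes.
    `specificity` is a fraction, e.g. 0.8 for the 80% row of Table 3."""
    neg = sorted(s for s, y in zip(scores, labels) if y == 0)
    pos = [s for s, y in zip(scores, labels) if y == 1]
    # The k-th smallest non-epitope score becomes the threshold; the small
    # tolerance guards against floating-point round-up in the product.
    k = math.ceil(specificity * len(neg) - 1e-9) - 1
    k = min(max(k, 0), len(neg) - 1)
    threshold = neg[k]
    return sum(1 for s in pos if s > threshold) / len(pos)
```
]]></p>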
            <p>Paired t-tests were performed on the predictions for the HIV data set to determine whether one method had a significantly higher prediction accuracy than another. Table <tblr tid="T4">4</tblr> shows that BepiPred was significantly better than all other tested methods, whereas HMM was not significantly better than Parker.</p>
            <tbl id="T4">
               <title>
                  <p>Table 4</p>
               </title>
               <caption>
                  <p>P values (in %) for the comparisons of methods. If a P-value is below the chosen significance level of 5%, the alternative hypothesis, which is that the method to the left is more accurate than the method at the top, can be accepted. The methods were validated on the HIV data set.</p>
               </caption>
               <tblbdy cols="5">
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="right">
                        <p>Levitt</p>
                     </c>
                     <c ca="right">
                        <p>Comb2</p>
                     </c>
                     <c ca="right">
                        <p>Parker</p>
                     </c>
                     <c ca="right">
                        <p>HMM</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="5">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Comb2</p>
                     </c>
                     <c ca="right">
                        <p>0.33</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Parker</p>
                     </c>
                     <c ca="right">
                        <p>12.62</p>
                     </c>
                     <c ca="right">
                        <p>43.61</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>HMM</p>
                     </c>
                     <c ca="right">
                        <p>2.60</p>
                     </c>
                     <c ca="right">
                        <p>20.15</p>
                     </c>
                     <c ca="right">
                        <p>45.86</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>BepiPred</p>
                     </c>
                     <c ca="right">
                        <p>
                           <b>0.04</b>
                        </p>
                     </c>
                     <c ca="right">
                        <p>
                           <b>0.14</b>
                        </p>
                     </c>
                     <c ca="right">
                        <p>
                           <b>1.90</b>
                        </p>
                     </c>
                     <c ca="right">
                        <p>
                           <b>0.13</b>
                        </p>
                     </c>
                  </r>
               </tblbdy>
            </tbl>
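            <p>One way to set up a paired comparison of two methods is sketched below, assuming that both methods are evaluated on the same bootstrap resamples and that the paired t statistic is computed on the per-resample differences in <it>A</it><sub><it>roc</it></sub>. The text does not state what the pairs were, so this is an assumption; in addition, the sketch replaces the t distribution with its normal approximation, which is adequate only for a large number of resamples. The function paired_comparison is hypothetical.</p>
            <p><![CDATA[
```python
import math
import random
import statistics


def auc(scores, labels):
    """Mann-Whitney estimate of the ROC area (ties count one half)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    total = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return total / (len(pos) * len(neg))


def paired_comparison(scores_a, scores_b, labels, n_boot=1000, seed=0):
    """One-sided P-value for H1 'method A has a higher AUC than method B'.
    Both methods are scored on the same bootstrap resamples (the pairing), and
    a paired t statistic is computed on the per-resample AUC differences.
    The P-value uses the normal approximation to the t distribution."""
    rng = random.Random(seed)
    n = len(labels)
    diffs = []
    while len(diffs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        y = [labels[i] for i in idx]
        if 0 < sum(y) < n:  # both classes must be present
            a = auc([scores_a[i] for i in idx], y)
            b = auc([scores_b[i] for i in idx], y)
            diffs.append(a - b)
    mean = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    if sd == 0:
        return 0.5 if mean == 0 else (0.0 if mean > 0 else 1.0)
    t = mean / (sd / math.sqrt(len(diffs)))
    return 0.5 * math.erfc(t / math.sqrt(2))  # upper-tail probability P(Z > t)
```
]]></p>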
         </sec>
      </sec>
      <sec>
         <st>
            <p>Discussion</p>
         </st>
         <p>We have constructed a prediction method for linear B-cell epitopes using a hidden Markov model. Hidden Markov models have not been used for this specific purpose before.</p>
         <p>Our method has fairly low sensitivity. One way of increasing the sensitivity is to lower the applied threshold, but that would also lower the specificity. Pellequer et al. <abbrgrp><abbr bid="B14">14</abbr></abbrgrp> showed that over-predictions could be reduced by combining prediction curves, and further improvements of B-cell epitope prediction methods may be obtained using similar approaches.</p>
         <p>Pellequer et al. <abbrgrp><abbr bid="B4">4</abbr></abbrgrp> compared several propensity scales using one of the data sets in the present study, the Pellequer data set. They applied propensity scale methods to the data set and classified the predictions as positive or negative using a fixed threshold of 0.7 <it>s</it>, where <it>s</it> is the standard deviation of the prediction values. They found that the predictions using the different scales were better than random, consistent with the findings of the present study. Comparing the scales on a subset of nine of the sequences, they found that the scales by Parker et al. <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>, Chou and Fasman <abbrgrp><abbr bid="B7">7</abbr></abbrgrp>, Levitt <abbrgrp><abbr bid="B8">8</abbr></abbrgrp> and Emini et al. <abbrgrp><abbr bid="B9">9</abbr></abbrgrp> gave slightly better results than the other scales tested.</p>
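         <p>A fixed threshold of this kind can be sketched as follows. The sketch assumes the threshold is measured from the mean of the smoothed prediction values (mean + 0.7 <it>s</it>) and uses the population standard deviation; both are assumptions, since only the value 0.7 <it>s</it> is given. The function name classify_by_sd_threshold is hypothetical.</p>
         <p><![CDATA[
```python
import statistics


def classify_by_sd_threshold(values, k=0.7):
    """Predict a residue as epitope when its smoothed propensity exceeds
    mean + k * s, where s is the standard deviation of the prediction values.
    Measuring the threshold from the mean, and using the population standard
    deviation, are assumptions of this sketch."""
    mu = statistics.mean(values)
    s = statistics.pstdev(values)
    return [v > mu + k * s for v in values]
```
]]></p>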
         <p>In the present study, we found that for a similar data set, the scales that performed best were constructed by Levitt <abbrgrp><abbr bid="B8">8</abbr></abbrgrp> and Parker et al. <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>. This corresponds well with the findings of Pellequer et al. <abbrgrp><abbr bid="B4">4</abbr></abbrgrp>.</p>
         <p>Blythe and Flower <abbrgrp><abbr bid="B12">12</abbr></abbrgrp> have found that even the best propensity scale methods perform only marginally better than a random model. They used a data set of 50 epitope mapped proteins from the AntiJen home page <url>http://www.jenner.ac.uk/AntiJen</url><abbrgrp><abbr bid="B13">13</abbr></abbrgrp> and applied many propensity scale methods to the data.</p>
         <p>Our permutation tests showed that the scales by Parker et al. <abbrgrp><abbr bid="B6">6</abbr></abbrgrp> and Levitt <abbrgrp><abbr bid="B8">8</abbr></abbrgrp>, with their optimal window sizes, performed significantly better than random models.</p>
         <p>We have tested several propensity scale methods and optimized their parameters in order to identify the best method. For the Pellequer data set, the best method was the scale by Levitt <abbrgrp><abbr bid="B8">8</abbr></abbrgrp> with a window size of 11. The second best propensity scale method was the scale by Parker et al. <abbrgrp><abbr bid="B6">6</abbr></abbrgrp> with a window size of 7&#8211;11. Parker et al. intended this scale to be used with a window size of 7, which corresponds well with our findings.</p>
      </sec>
      <sec>
         <st>
            <p>Conclusion</p>
         </st>
         <p>We present a novel method for predicting linear B-cell epitopes, BepiPred. It is a combination method, made by combining the predictions of a hidden Markov model and the propensity scale by Parker et al. <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>. We have tested different parameters in order to optimize the hidden Markov model and the propensity scale method.</p>
         <p>We have evaluated the methods using non-parametric ROC-curves and made an unbiased validation using a separate data set. BepiPred had the highest prediction accuracy on the test data set, and it performed significantly better than all other methods tested on the validation data set. Comparing BepiPred with the best propensity scale methods on the validation data set at a specificity of 80%, the sensitivities of BepiPred and of the scales by Parker et al. <abbrgrp><abbr bid="B6">6</abbr></abbrgrp> and Levitt <abbrgrp><abbr bid="B8">8</abbr></abbrgrp> are 30.9%, 28.8% and 26.8%, respectively.</p>
         <p>Future work could include using data from other sources, such as the Immune Epitope Database and Analysis Resource, IEDB <abbrgrp><abbr bid="B15">15</abbr></abbrgrp>, or the Epitome database of structurally inferred antigenic epitopes in proteins <url>http://www.rostlab.org/services/epitome</url>.</p>
      </sec>
      <sec>
         <st>
            <p>Data sets</p>
         </st>
         <p>Three data sets of proteins with linear B-cell epitope annotation were used in this study. In all data sets, the epitopes were identified by measuring the cross-reactivity between the intact protein and the peptide fragment <abbrgrp><abbr bid="B16">16</abbr></abbrgrp>.</p>
         <sec>
            <st>
               <p>The Pellequer data set</p>
            </st>
            <p>The first data set was used for testing and optimizing the methods. Since this data set was unavailable in electronic form, it was recreated by Lund et al. <abbrgrp><abbr bid="B17">17</abbr></abbrgrp>. The epitope annotations were taken from Pellequer et al. <abbrgrp><abbr bid="B14">14</abbr></abbrgrp> and references therein, except for the sequence of scorpion neurotoxin, for which the data were taken from <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>. This data set, denoted the Pellequer data set, contains 14 protein sequences and 83 epitopes. The epitope density is 0.34.</p>
         </sec>
         <sec>
            <st>
               <p>The AntiJen data set</p>
            </st>
            <p>The second data set was used to train and build the hidden Markov model. It was extracted from the AntiJen database, formerly JenPep <abbrgrp><abbr bid="B13">13</abbr></abbrgrp> <url>http://www.jenner.ac.uk/AntiJen</url>. This data set, denoted the AntiJen data set, consists of 127 protein sequences, and the epitope density is 0.08. The proteins of this data set are not fully annotated; the annotation of the non-epitope stretches is not known.</p>
         </sec>
         <sec>
            <st>
               <p>The HIV data set</p>
            </st>
            <p>A third, separate data set was constructed to allow an unbiased validation of the methods. It consists of epitopes found in HIV proteins, taken from the HIV Molecular Immunology Database of the Los Alamos National Laboratory <abbrgrp><abbr bid="B19">19</abbr></abbrgrp> <url>http://www.hiv.lanl.gov</url>. The epitopes in this data set overlap to some degree, so a procedure for determining more accurate borders of the minimal epitopes was applied: if a smaller epitope was contained in a larger epitope, the larger epitope was discarded from the data set. Two of the sequences had no assigned epitopes and were therefore also discarded. The HIV data set consists of 10 protein sequences, and the epitope density is 0.38.</p>
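            <p>The containment rule can be sketched in Python as below, assuming each epitope is given as an inclusive (start, end) interval on one protein. Only the rule stated above (discard an epitope that contains a smaller one) is implemented; the procedure for refining epitope borders is not described here and is not reproduced. The function name minimal_epitopes is hypothetical.</p>
            <p><![CDATA[
```python
def minimal_epitopes(epitopes):
    """Keep only minimal epitopes on one protein.

    `epitopes` is a list of (start, end) residue intervals, inclusive.
    Duplicates are merged, and every epitope that contains a different,
    smaller epitope is discarded.
    """
    unique = sorted(set(epitopes))
    kept = []
    for s, e in unique:
        contains_other = any(
            s <= s2 and e2 <= e and (s2, e2) != (s, e) for s2, e2 in unique
        )
        if not contains_other:
            kept.append((s, e))
    return kept
```
]]></p>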
         </sec>
      </sec>
      <sec>
         <st>
            <p>Methods</p>
         </st>
         <sec>
            <st>
               <p>Propensity scale methods</p>
            </st>
            <p>The propensity scale methods assign a propensity value to every amino acid of the query protein sequence. Fluctuations are reduced by averaging these values over a running window. Near the N- and C-termini we used asymmetric windows to avoid discarding prediction examples. The scales used in this study are based on antigenicity <abbrgrp><abbr bid="B20">20</abbr></abbrgrp>, hydrophilicity <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>, inverted hydrophobicity <abbrgrp><abbr bid="B21">21</abbr><abbr bid="B22">22</abbr></abbrgrp>, accessibility <abbrgrp><abbr bid="B9">9</abbr></abbrgrp> and secondary structure <abbrgrp><abbr bid="B7">7</abbr><abbr bid="B8">8</abbr></abbrgrp>.</p>
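            <p>A running-mean propensity profile with asymmetric terminal windows could look like the sketch below. It assumes the asymmetric window is obtained by clipping a centered window at the sequence ends. The toy scale values are invented and are not taken from any published scale, and the default window length of 7 is an arbitrary choice.</p>
            <p>
```python
def smoothed_propensity(sequence, scale, window=7):
    """Running mean of per-residue propensity values.

    Near the N- and C-termini the window is clipped to the residues that
    exist, so it becomes asymmetric and every residue keeps a score.
    """
    if window % 2 == 0:
        raise ValueError("window length must be odd")
    half = window // 2
    raw = [scale[aa] for aa in sequence]
    smoothed = []
    for n in range(len(raw)):
        start = max(0, n - half)
        stop = min(len(raw), n + half + 1)
        segment = raw[start:stop]
        smoothed.append(sum(segment) / len(segment))
    return smoothed


# Hypothetical three-letter scale, NOT one of the published scales
toy_scale = {"D": 2.0, "K": 1.5, "L": -1.0}
print([round(v, 3) for v in smoothed_propensity("DKLLD", toy_scale, window=3)])
# [1.75, 0.833, -0.167, 0.0, 0.5]
```
            </p>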
         </sec>
         <sec>
            <st>
               <p>Hidden Markov models</p>
            </st>
            <p>Let <b>i </b>= (<it>i</it><sub>1</sub>, <it>i</it><sub>2</sub>, ..., <it>i</it><sub><it>w</it></sub>) denote a window of <it>w </it>amino acids extracted from a protein sequence, and let <it>j </it>denote the position in this window, <it>j </it>= 1...<it>w</it>. On the basis of <b>i</b>, the hidden Markov model predicts whether the center position of the window is annotated as part of an epitope. Near the N- and C-termini, part of the extracted window extends beyond the end of the sequence. These positions are filled with the character 'X', which is ignored when the hidden Markov model is used for prediction. The prediction score for a window is given by</p>
            <p>
               <graphic file="1745-7580-2-2-i2.gif"/>
            </p>
            <p>which is the log odds that the residue at the center position of the window is part of an epitope (Epitope model) as opposed to occurring by chance (Random model).</p>
            <p>To construct the Random model, the background amino acid frequencies of the Swiss-Prot database <abbrgrp><abbr bid="B23">23</abbr></abbrgrp>, <it>q</it><sub><it>i</it></sub>, are used. For the Epitope model, <it>p</it><sub><it>i,j </it></sub>is the effective probability of having amino acid <it>i </it>at position <it>j </it>according to the model.</p>
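            <p>Equation (1) is reproduced only as an image. Assuming it is the usual sum, over the window positions <it>j</it>, of the log ratio between the Epitope-model probability <it>p</it><sub><it>i,j</it></sub> of the residue at that position and its background frequency <it>q</it><sub><it>i</it></sub>, with 'X' positions skipped, a window score could be computed as below. The natural logarithm, the function name and the toy frequencies are assumptions.</p>
            <p>
```python
import math


def window_score(window, p, q):
    """Log-odds score of a window, assuming (1) sums log(p/q) over positions.

    window: string of length w; 'X' marks positions beyond a terminus
    p: list of w dicts, p[j][aa] = effective probability of aa at position j
    q: dict of background frequencies, q[aa]
    """
    score = 0.0
    for j, aa in enumerate(window):
        if aa == "X":
            continue  # padding beyond the terminus does not count
        score += math.log(p[j][aa] / q[aa])
    return score


# Invented two-letter toy model with w = 3 (not trained values)
q = {"D": 0.5, "L": 0.5}
p = [{"D": 0.5, "L": 0.5}, {"D": 0.8, "L": 0.2}, {"D": 0.5, "L": 0.5}]
print(round(window_score("XDL", p, q), 3))  # log(0.8 / 0.5) = 0.47
```
            </p>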
            <p>To calculate the values of <it>p</it><sub><it>i,j</it></sub>, all windows whose center position is annotated as part of an epitope are extracted from a training data set. Again, positions of an extracted window that extend beyond the N- or C-terminus are filled with the character 'X', which is ignored when calculating the parameters.</p>
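            <p>The window extraction described above, followed by per-column counting that ignores the 'X' padding, might be sketched as below. The function names are hypothetical, and the sketch only builds the alignment. It does not reproduce the Gibbs sampler of <abbrgrp><abbr bid="B24">24</abbr></abbrgrp> used to estimate the parameters.</p>
            <p>
```python
def epitope_windows(sequence, labels, w):
    """Windows of odd width w whose center residue is annotated as epitope.

    labels[n] is 1 if residue n is part of an epitope, else 0. Positions
    beyond the N- or C-terminus are filled with 'X'.
    """
    if w % 2 == 0:
        raise ValueError("window width must be odd")
    half = w // 2
    padded = "X" * half + sequence + "X" * half
    # Residue n sits at padded[n + half], so its window is padded[n:n + w]
    return [padded[n:n + w] for n, label in enumerate(labels) if label == 1]


def column_counts(windows, w):
    """Amino acid counts per alignment column, ignoring 'X' padding."""
    counts = [{} for _ in range(w)]
    for win in windows:
        for j, aa in enumerate(win):
            if aa != "X":
                counts[j][aa] = counts[j].get(aa, 0) + 1
    return counts


wins = epitope_windows("KDLAE", [1, 0, 0, 0, 1], w=3)
print(wins)                    # ['XKD', 'AEX']
print(column_counts(wins, 3))  # [{'A': 1}, {'K': 1, 'E': 1}, {'D': 1}]
```
            </p>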
            <p>These extracted peptide windows form a matrix of aligned peptides of width <it>w</it>. From this alignment, <it>p</it><sub><it>i,j </it></sub>is calculated as the pseudo count corrected probability of occurrence of amino acid <it>i </it>in column <it>j</it>, estimated as in <abbrgrp><abbr bid="B24">24</abbr></abbrgrp>. To make the pseudo count correction, pseudo count frequencies, <it>g</it><sub><it>i,j</it></sub>, are calculated. They are given by</p>
            <p>
               <graphic file="1745-7580-2-2-i3.gif"/>
            </p>
            <p>where <it>p</it><sub><it>k,j </it></sub>is the observed frequency of amino acid <it>k </it>in column <it>j </it>of the alignment <abbrgrp><abbr bid="B25">25</abbr></abbrgrp>. Here, <it>p</it><sub><it>k,j </it></sub>denotes the raw frequency before pseudo count correction. The variable <it>b</it><sub><it>i,k </it></sub>is the Blosum 62 substitution matrix frequency, i.e. the frequency with which <it>i </it>is aligned to <it>k </it><abbrgrp><abbr bid="B26">26</abbr></abbrgrp>.</p>
            <p>To illustrate (2), let the window size be <it>w </it>= 1. The model then covers only single residues that are annotated as part of linear B-cell epitopes. If the observed peptides consist of the single amino acids L and V, with frequencies <it>p</it><sub><it>L,1 </it></sub>= 0.5 and <it>p</it><sub><it>V,1 </it></sub>= 0.5, then the pseudo count frequency for, e.g., I is given by</p>
            <p>
               <graphic file="1745-7580-2-2-i4.gif"/>
            </p>
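            <p>Assuming (2) has the form <it>g</it><sub><it>i,j</it></sub> = &#931;<sub><it>k</it></sub> <it>p</it><sub><it>k,j</it></sub> <it>b</it><sub><it>i,k</it></sub>, which is consistent with the definitions above, the pseudo count frequencies of one alignment column can be computed as follows. The sketch mirrors the <it>w </it>= 1 example, with L and V each at frequency 0.5. It uses a made-up three-letter matrix instead of Blosum 62, so it gives 0.2 for I rather than 0.14.</p>
            <p>
```python
def pseudo_count_freqs(observed, b):
    """g[i] = sum over k of observed[k] * b[i][k], for one alignment column.

    observed: observed frequency of each amino acid k in the column
    b: b[i][k], frequency with which i is aligned to k
    """
    return {
        i: sum(f_k * b[i].get(k, 0.0) for k, f_k in observed.items())
        for i in b
    }


# Made-up conditional frequencies over a toy 3-letter alphabet, NOT the
# real Blosum 62 values; for each k, the b[i][k] values sum to 1.
toy_b = {
    "I": {"I": 0.6, "L": 0.2, "V": 0.2},
    "L": {"I": 0.2, "L": 0.6, "V": 0.2},
    "V": {"I": 0.2, "L": 0.2, "V": 0.6},
}
g = pseudo_count_freqs({"L": 0.5, "V": 0.5}, toy_b)
print({aa: round(val, 3) for aa, val in g.items()})
# {'I': 0.2, 'L': 0.4, 'V': 0.4}
```
            </p>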
            <p>The effective amino acid frequencies are calculated as a weighted average of the observed frequency and the pseudo count frequency,</p>
            <p>
               <graphic file="1745-7580-2-2-i5.gif"/>
            </p>
            <p>Here, <it>&#945; </it>is the effective number of sequences in the alignment minus one, and <it>&#946; </it>is the pseudo count correction <abbrgrp><abbr bid="B25">25</abbr></abbrgrp>, also called the weight on low counts. To finish the calculation example, let <it>&#946; </it>be very large, as it is in this work. Then <it>p</it><sub><it>I,1 </it></sub>&#8776; <it>g</it><sub><it>I,1 </it></sub>= 0.14.</p>
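            <p>The weighted average above can be sketched as follows, assuming it has the form (<it>&#945;</it> <it>f</it> + <it>&#946;</it> <it>g</it>)/(<it>&#945;</it> + <it>&#946;</it>), where <it>f</it> is the observed frequency. The sketch continues the example with <it>&#945;</it> = 1, which is an illustrative assumption. As <it>&#946;</it> grows, the effective frequency of I approaches <it>g</it><sub><it>I,1</it></sub> = 0.14.</p>
            <p>
```python
def effective_freq(f, g, alpha, beta):
    """Effective frequency as (alpha*f + beta*g) / (alpha + beta).

    f: observed frequency, g: pseudo count frequency,
    alpha: effective number of sequences minus one,
    beta: weight on the pseudo counts (weight on low counts)
    """
    return (alpha * f + beta * g) / (alpha + beta)


# w = 1 example: I is never observed (f = 0) and g = 0.14. alpha = 1
# assumes the two peptides L and V count as two effective sequences.
for beta in (1, 10, 1000):
    print(beta, round(effective_freq(0.0, 0.14, 1.0, beta), 4))
# 1 0.07
# 10 0.1273
# 1000 0.1399
```
            </p>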
            <p>Note that throughout this work we use the term hidden Markov model to refer to the weight matrix used in (1). The parameters of this ungapped hidden Markov model are calculated using a Gibbs sampler implemented by Nielsen et al. <abbrgrp><abbr bid="B24">24</abbr></abbrgrp>.</p>
            <p>Applying (1) gives a prediction score for every residue of the query sequence. To reduce fluctuations, a smoothing window is then applied at every position. Near the N- and C-termini the window is made asymmetric so that no prediction examples are discarded.</p>
         </sec>
         <sec>
            <st>
               <p>ROC-curves</p>
            </st>
            <p>Applying a prediction method to a data set yields a set of prediction examples, <b>x </b>= (<it>x</it><sub>1</sub>, <it>x</it><sub>2</sub>, ...,<it>x</it><sub><it>N</it></sub>), where <it>n </it>indexes the residues. Every <it>x</it><sub><it>n </it></sub>consists of a target value and a predicted value. The target value is 1 if the residue is annotated as part of an epitope and 0 otherwise. Because asymmetric smoothing windows are used near the N- and C-termini, no residues are discarded, and <it>N </it>equals the number of residues in the data set.</p>
            <p>Using a variable threshold, the prediction examples are classified as positives or negatives. By comparison with the target values, each prediction is either true or false. Each prediction is therefore a true positive (TP), a true negative (TN), a false positive (FP) or a false negative (FN).</p>
            <p>The prediction accuracy is measured by constructing Receiver Operating Characteristic (ROC) curves <abbrgrp><abbr bid="B27">27</abbr></abbrgrp>. For every value of the threshold, the true positive proportion, TP/(TP+FN), and the false positive proportion, FP/(FP+TN), are calculated. A ROC-curve is constructed by plotting the true positive proportion against the false positive proportion for all values of the threshold. Because it covers all thresholds and makes no assumption about the distribution of the prediction scores, it is a non-parametric measure.</p>
            <p>The sensitivity is equal to the true positive proportion, and the specificity, given by TN/(FP+TN), is equal to 1 minus the false positive proportion. A ROC-curve thus displays the trade-off between sensitivity and specificity for all possible thresholds. A good method has a high true positive proportion at a low false positive proportion, i.e. both a high sensitivity and a high specificity. The performance of a method is measured as the area under the curve, the <it>A</it><sub><it>roc</it></sub>-value. For a random prediction, the true positive proportion is expected to equal the false positive proportion at every value of the threshold, so that <it>A</it><sub><it>roc </it></sub>= 0.5. For a perfect method, <it>A</it><sub><it>roc </it></sub>= 1.</p>
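            <p>A direct implementation of the ROC construction above is sketched below. It sweeps the threshold over every distinct score and integrates with the trapezoidal rule. The function names and the convention that a score equal to the threshold counts as positive are assumptions.</p>
            <p>
```python
def roc_points(targets, scores):
    """(false positive proportion, true positive proportion) per threshold.

    A residue is predicted positive when its score is >= the threshold.
    targets are 1 (epitope) or 0; both classes must be present.
    """
    pos = sum(targets)
    neg = len(targets) - pos
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(targets, scores) if t == 1 and s >= thr)
        fp = sum(1 for t, s in zip(targets, scores) if t == 0 and s >= thr)
        points.append((fp / neg, tp / pos))
    return points


def area_under_curve(points):
    """A_roc by the trapezoidal rule over consecutive ROC points."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


targets = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
print(area_under_curve(roc_points(targets, scores)))  # 0.75
```
            </p>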
         </sec>
         <sec>
            <st>
               <p>Bootstrapping</p>
            </st>
            <p>Bootstrapping is used to estimate the standard error of the <it>A</it><sub><it>roc</it></sub>-value, <graphic file="1745-7580-2-2-i1.gif"/>, as a measure of the uncertainty of the <it>A</it><sub><it>roc</it></sub>-value <abbrgrp><abbr bid="B28">28</abbr></abbrgrp>. The standard error is related to the standard deviation, <it>s</it>, by <it>se </it>= <graphic file="1745-7580-2-2-i6.gif"/>, where <it>r </it>is the number of repeats of the underlying experiment <abbrgrp><abbr bid="B29">29</abbr></abbrgrp>.</p>
            <p>Bootstrapping is a method for generating pseudo-replicas (bootstrap samples) of the predictions, denoted <b>x</b>*, which deviate slightly from <b>x</b>. The bootstrap sample, <b>x</b>* = <graphic file="1745-7580-2-2-i7.gif"/>, is defined as a random sample of size <it>N</it>, drawn with replacement from <b>x</b>. A given prediction example from <b>x </b>may therefore appear zero, one, two or more times in <b>x</b>*. In other words, a bootstrap sample is drawn by copying <it>N </it>randomly chosen prediction examples, <it>x</it><sub><it>n</it></sub>, from <b>x </b>into <b>x</b>*. In this way, some variation from <b>x </b>is introduced into <b>x</b>*.</p>
            <p>In total, <it>B </it>bootstrap samples are drawn. Let <b>x</b><sup>*<it>b </it></sup>denote the <it>b</it>'th bootstrap sample. The prediction accuracy of <b>x</b><sup>*<it>b </it></sup>is calculated as <graphic file="1745-7580-2-2-i8.gif"/>.</p>
            <p>The result of the bootstrap experiment is <b>x</b><sup>*1</sup>, <b>x</b><sup>*2</sup>,...,<b>x</b><sup>*<it>B </it></sup>and hence <graphic file="1745-7580-2-2-i9.gif"/>. The standard error of the original <it>A</it><sub><it>roc</it></sub>-value is given by</p>
            <p>
               <graphic file="1745-7580-2-2-i10.gif"/>
            </p>
            <p>where <graphic file="1745-7580-2-2-i11.gif"/> is the mean of <graphic file="1745-7580-2-2-i12.gif"/> over the bootstrap samples, given by <graphic file="1745-7580-2-2-i13.gif"/><abbrgrp><abbr bid="B28">28</abbr></abbrgrp>. Note the similarity to the formula for the sample standard deviation. For large <it>B</it>, <graphic file="1745-7580-2-2-i11.gif"/> is typically close to the original <it>A</it><sub><it>roc</it></sub>-value.</p>
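            <p>The bootstrap procedure can be sketched as follows. The sketch assumes that (4) is the usual bootstrap standard error, i.e. the standard deviation of the <it>B </it>bootstrap <it>A</it><sub><it>roc</it></sub>-values with denominator <it>B </it>- 1. It computes <it>A</it><sub><it>roc</it></sub> through the equivalent rank formulation, which is the probability that an epitope residue outscores a non-epitope residue, with ties counting one half. Redrawing bootstrap samples that lack one of the two classes is a choice made here, because the text does not say how such samples are handled. All names are hypothetical.</p>
            <p>
```python
import random
import statistics


def a_roc(targets, scores):
    """A_roc as the probability that a random epitope residue scores higher
    than a random non-epitope residue (ties count one half), which equals
    the trapezoidal area under the ROC curve."""
    pos = [s for t, s in zip(targets, scores) if t == 1]
    neg = [s for t, s in zip(targets, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_a_roc(targets, scores, B=200, seed=0):
    """Return (standard error, mean) of A_roc over B bootstrap samples."""
    rng = random.Random(seed)
    N = len(targets)
    values = []
    while B > len(values):
        # Draw N prediction examples with replacement
        idx = [rng.randrange(N) for _ in range(N)]
        t = [targets[i] for i in idx]
        if 0 in t and 1 in t:  # A_roc needs both classes; redraw otherwise
            values.append(a_roc(t, [scores[i] for i in idx]))
    # statistics.stdev divides by B - 1
    return statistics.stdev(values), statistics.mean(values)


targets = [1] * 20 + [0] * 20
scores = [0.5 + 0.01 * i for i in range(20)] + [0.3 + 0.02 * i for i in range(20)]
se, mean = bootstrap_a_roc(targets, scores)
print(round(a_roc(targets, scores), 3), round(mean, 3), round(se, 3))
```
            </p>
            <p>Passing the same seed for two methods applied to the same residues draws identical bootstrap samples, which gives the pairing used by the paired t-test.</p>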
         </sec>
         <sec>
            <st>
               <p>Paired t-tests</p>
            </st>
            <p>A paired t-test is performed to determine whether one method is more accurate than another. The <it>H</it><sub>0</sub>-hypothesis for this test is that the two means are equal, <it>&#956;</it><sub>1 </sub>= <it>&#956;</it><sub>2</sub>. In place of <it>&#956;</it>, the bootstrap mean <graphic file="1745-7580-2-2-i11.gif"/>, and hence approximately <it>A</it><sub><it>roc</it></sub>, is used. The starting points are the performance measures of the two methods, <it>A</it><sub><it>roc</it>,<it>M</it>1 </sub>and <it>A</it><sub><it>roc</it>,<it>M</it>2</sub>, where <it>M1 </it>and <it>M2 </it>denote methods 1 and 2. Bootstrapping gives the vectors <graphic file="1745-7580-2-2-i14.gif"/> and <graphic file="1745-7580-2-2-i15.gif"/>. For every <it>b</it>, both members of the pair <graphic file="1745-7580-2-2-i16.gif"/> are calculated from the same bootstrap sample, which makes the two <it>A</it><sub><it>roc</it></sub>-values paired.</p>
            <p>The <it>H</it><sub>0</sub>-hypothesis is therefore <it>A</it><sub><it>roc</it>,<it>M</it>1 </sub>= <it>A</it><sub><it>roc</it>,<it>M</it>2 </sub>and the alternative hypothesis is <it>A</it><sub><it>roc</it>,<it>M</it>1 </sub>> <it>A</it><sub><it>roc</it>,<it>M</it>2</sub>. The test statistic <it>t </it>is given by</p>
            <p>
               <graphic file="1745-7580-2-2-i17.gif"/>
            </p>
            <p>The paired difference of the <it>b</it>'th bootstrap samples, <it>D</it><sup><it>b</it></sup>, is given by</p>
            <p>
               <graphic file="1745-7580-2-2-i18.gif"/>
            </p>
            <p>The variable <graphic file="1745-7580-2-2-i19.gif"/> is calculated as the mean of <it>D</it><sup><it>b</it></sup> over the <it>B </it>bootstrap samples, and <graphic file="1745-7580-2-2-i20.gif"/> is calculated using (4), replacing <graphic file="1745-7580-2-2-i8.gif"/> with <it>D</it><sup><it>b</it></sup>. Under the <it>H</it><sub>0</sub>-hypothesis, the test statistic follows a t-distribution with <it>m </it>= <it>B </it>- 1 degrees of freedom. This distribution approaches the standard normal distribution for <it>m </it>> 30, so that <it>t </it>&#8776; <it>z</it>. The P-value for the test is then given by 1 - <it>F</it>(<it>z</it>), where <it>F</it>(<it>z</it>) is the cumulative standard normal distribution. See <abbrgrp><abbr bid="B29">29</abbr></abbrgrp> for more information about the paired t-test.</p>
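            <p>The paired test can be sketched as below. Because the equations are images, the sketch assumes that the test statistic is the mean paired difference divided by its bootstrap standard error. Both methods score the same resampled residues. The P-value 1 - <it>F</it>(<it>z</it>) uses the normal approximation through Python's statistics.NormalDist, which requires Python 3.8 or later. The function names are hypothetical.</p>
            <p>
```python
import random
import statistics


def a_roc(targets, scores):
    """A_roc with ties counted one half (equal to the trapezoidal ROC area)."""
    pos = [s for t, s in zip(targets, scores) if t == 1]
    neg = [s for t, s in zip(targets, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def paired_bootstrap_test(targets, scores1, scores2, B=200, seed=0):
    """One-sided test of H0: A_roc,M1 = A_roc,M2 versus A_roc,M1 > A_roc,M2.

    Returns (z, P-value), with z = mean(D) / se(D) and P = 1 - F(z).
    """
    rng = random.Random(seed)
    N = len(targets)
    diffs = []
    while B > len(diffs):
        # The same resampled residues are scored by both methods (pairing)
        idx = [rng.randrange(N) for _ in range(N)]
        t = [targets[i] for i in idx]
        if 0 in t and 1 in t:
            d = (a_roc(t, [scores1[i] for i in idx])
                 - a_roc(t, [scores2[i] for i in idx]))
            diffs.append(d)
    se = statistics.stdev(diffs)  # denominator B - 1, as in the se formula
    if se == 0.0:
        raise ValueError("all paired differences are identical")
    z = statistics.mean(diffs) / se
    return z, 1.0 - statistics.NormalDist().cdf(z)
```
            </p>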
         </sec>
         <sec>
            <st>
               <p>Permutation tests</p>
            </st>
            <p>To test the <it>H</it><sub>0</sub>-hypothesis that a method performs like a random model, a permutation experiment can be performed. The alternative hypothesis is that the method performs better than a random model. The target values of the predictions <b>x </b>are randomly permuted. This breaks the link between the targets and the predicted values and results in a new prediction set, <b>x</b><sup><it>perm,p</it></sup>. This is done for <it>p </it>= 1...<it>p</it><sub><it>max</it></sub>. For every <it>p</it>, the prediction accuracy is calculated as <graphic file="1745-7580-2-2-i21.gif"/>. The P-value for the <it>H</it><sub>0</sub>-hypothesis is calculated as the proportion of permutations for which <graphic file="1745-7580-2-2-i21.gif"/> > <it>A</it><sub><it>roc</it></sub>.</p>
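            <p>A minimal sketch of the permutation test is given below, using the same rank-based <it>A</it><sub><it>roc</it></sub> helper. Following the text, only permutations with a strictly larger <it>A</it><sub><it>roc</it></sub> are counted. The function names and the default number of permutations are assumptions.</p>
            <p>
```python
import random


def a_roc(targets, scores):
    """A_roc with ties counted one half (equal to the trapezoidal ROC area)."""
    pos = [s for t, s in zip(targets, scores) if t == 1]
    neg = [s for t, s in zip(targets, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def permutation_p_value(targets, scores, n_perm=1000, seed=0):
    """Proportion of target permutations whose A_roc exceeds the observed one."""
    rng = random.Random(seed)
    observed = a_roc(targets, scores)
    shuffled = list(targets)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the link between targets and scores
        if a_roc(shuffled, scores) > observed:
            exceed += 1
    return exceed / n_perm
```
            </p>
            <p>Shuffling the targets keeps the number of epitope and non-epitope residues fixed, so <it>A</it><sub><it>roc</it></sub> is defined for every permutation.</p>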
         </sec>
      </sec>
      <sec>
         <st>
            <p>Competing interests</p>
         </st>
         <p>The author(s) declare that they have no competing interests.</p>
      </sec>
      <sec>
         <st>
            <p>Authors' contributions</p>
         </st>
         <p>JEPL collected the AntiJen and HIV data sets, developed, tested and validated the prediction methods and drafted the manuscript. OL created the Pellequer data set. MN implemented the programs for the prediction methods. All authors read and approved the final manuscript.</p>
      </sec>
   </bdy>
   <bm>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>A totally synthetic polyoxime malaria vaccine containing <it>Plasmodium falciparum </it>B cell and universal T cell epitopes elicits immune response in volunteers of diverse HLA types</p>
            </title>
            <aug>
               <au>
                  <snm>Nardin</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Calvo-Calle</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Oliveira</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Nussenzweig</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Schneider</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Tiercy</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Loutan</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Hochstrasser</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Rose</snm>
                  <fnm>K</fnm>
               </au>
            </aug>
            <source>J Immunol</source>
            <pubdate>2001</pubdate>
            <volume>166</volume>
            <fpage>481</fpage>
            <lpage>489</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">11123327</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B2">
            <title>
               <p>Ability of synthetic peptides representing epitopes of outer membrane protein F of <it>Pseudomonas aeruginosa </it>to afford protection against <it>P. aeruginosa </it>infection in a murine acute pneumonia model</p>
            </title>
            <aug>
               <au>
                  <snm>Hughes</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Gilleland</snm>
                  <fnm>HJ</fnm>
               </au>
            </aug>
            <source>Vaccine</source>
            <pubdate>1995</pubdate>
            <volume>13</volume>
            <issue>18</issue>
            <fpage>1750</fpage>
            <lpage>1753</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/0264-410X(95)00166-X</pubid>
                  <pubid idtype="pmpid" link="fulltext">8701588</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>The diagnostic properties of rheumatoid arthritis antibodies recognizing a cyclic citrullinated peptide</p>
            </title>
            <aug>
               <au>
                  <snm>Schellekens</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Visser</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>de Jong</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>van den Hoogen</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Hazes</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Breedveld</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>van Venrooij</snm>
                  <fnm>W</fnm>
               </au>
            </aug>
            <source>Arthritis Rheum</source>
            <pubdate>2000</pubdate>
            <volume>43</volume>
            <fpage>155</fpage>
            <lpage>163</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1002/1529-0131(200001)43:1&lt;155::AID-ANR20>3.0.CO;2-3</pubid>
                  <pubid idtype="pmpid">10643712</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>Predicting the location of continuous epitopes in proteins from their primary structure</p>
            </title>
            <aug>
               <au>
                  <snm>Pellequer</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Westhof</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Van Regenmortel</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Methods Enzymol</source>
            <pubdate>1991</pubdate>
            <volume>203</volume>
            <fpage>176</fpage>
            <lpage>201</lpage>
            <xrefbib>
               <pubid idtype="pmpid">1722270</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <title>
               <p>Prediction of protein antigenic determinants from amino acid sequence</p>
            </title>
            <aug>
               <au>
                  <snm>Hopp</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Woods</snm>
                  <fnm>K</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci U S A</source>
            <pubdate>1981</pubdate>
            <volume>78</volume>
            <issue>6</issue>
            <fpage>3824</fpage>
            <lpage>3828</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">319665</pubid>
                  <pubid idtype="pmpid">6167991</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B6">
            <title>
               <p>New hydrophilicity scale derived from High-Performance Liquid Chromatography peptide retention data: correlation of predicted surface residues with antigenicity and X-ray-derived accessible sites</p>
            </title>
            <aug>
               <au>
                  <snm>Parker</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Guo</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Hodges</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Biochemistry</source>
            <pubdate>1986</pubdate>
            <volume>25</volume>
            <fpage>5425</fpage>
            <lpage>5432</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1021/bi00367a013</pubid>
                  <pubid idtype="pmpid">2430611</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>Prediction of the secondary structure of proteins from their amino acid sequence</p>
            </title>
            <aug>
               <au>
                  <snm>Chou</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Fasman</snm>
                  <fnm>G</fnm>
               </au>
            </aug>
            <source>Adv Enzymol Relat Areas Mol Biol</source>
            <pubdate>1978</pubdate>
            <volume>47</volume>
            <fpage>45</fpage>
            <lpage>148</lpage>
            <xrefbib>
               <pubid idtype="pmpid">364941</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B8">
            <title>
               <p>Conformational preferences of amino acids in globular proteins</p>
            </title>
            <aug>
               <au>
                  <snm>Levitt</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Biochemistry</source>
            <pubdate>1978</pubdate>
            <volume>17</volume>
            <issue>20</issue>
            <fpage>4277</fpage>
            <lpage>4285</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1021/bi00613a026</pubid>
                  <pubid idtype="pmpid">708713</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B9">
            <title>
               <p>Induction of hepatitis A virus-neutralizing antibody by a virus specific synthetic peptide</p>
            </title>
            <aug>
               <au>
                  <snm>Emini</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Hughes</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Perlow</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Boger</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>J Virol</source>
            <pubdate>1985</pubdate>
            <volume>55</volume>
            <issue>3</issue>
            <fpage>836</fpage>
            <lpage>839</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">255070</pubid>
                  <pubid idtype="pmpid">2991600</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B10">
            <title>
               <p>Predictive estimation of protein linear epitopes by using the program PEOPLE</p>
            </title>
            <aug>
               <au>
                  <snm>Alix</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Vaccine</source>
            <pubdate>1999</pubdate>
            <volume>18</volume>
            <issue>3</issue>
            <fpage>311</fpage>
            <lpage>314</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0264-410X(99)00329-1</pubid>
                  <pubid idtype="pmpid" link="fulltext">10506656</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <title>
               <p>BEPITOPE: predicting the location of continuous epitopes and patterns in proteins</p>
            </title>
            <aug>
               <au>
                  <snm>Odorico</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Pellequer</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>J Mol Recognit</source>
            <pubdate>2003</pubdate>
            <volume>16</volume>
            <fpage>20</fpage>
            <lpage>22</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1002/jmr.602</pubid>
                  <pubid idtype="pmpid" link="fulltext">12557235</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B12">
            <title>
               <p>Benchmarking B cell epitope prediction: Underperformance of existing methods</p>
            </title>
            <aug>
               <au>
                  <snm>Blythe</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Flower</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Protein Sci</source>
            <pubdate>2005</pubdate>
            <volume>14</volume>
            <fpage>246</fpage>
            <lpage>248</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1110/ps.041059505</pubid>
                  <pubid idtype="pmpid" link="fulltext">15576553</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B13">
            <title>
               <p>JenPep: a novel computational information resource for immunobiology and vaccinology</p>
            </title>
            <aug>
               <au>
                  <snm>McSparron</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Blythe</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Zygouri</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Doytchinova</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Flower</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>J Chem Inf Comput Sci</source>
            <pubdate>2003</pubdate>
            <volume>43</volume>
            <issue>4</issue>
            <fpage>1276</fpage>
            <lpage>1287</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1021/ci030461e</pubid>
                  <pubid idtype="pmpid" link="fulltext">12870921</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B14">
            <title>
               <p>Correlation between the location of antigenic sites and the prediction of turns in proteins</p>
            </title>
            <aug>
               <au>
                  <snm>Pellequer</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Westhof</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Van Regenmortel</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Immunol Lett</source>
            <pubdate>1993</pubdate>
            <volume>36</volume>
            <fpage>83</fpage>
            <lpage>99</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/0165-2478(93)90072-A</pubid>
                  <pubid idtype="pmpid">7688347</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B15">
            <title>
               <p>The Immune Epitope Database and Analysis Resource: From Vision to Blueprint</p>
            </title>
            <aug>
               <au>
                  <snm>Peters</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Sidney</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Bourne</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Bui</snm>
                  <fnm>HH</fnm>
               </au>
               <au>
                  <snm>Buus</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Doh</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Fieri</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Kronenberg</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Kubo</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Lund</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Nemazee</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Ponomarenko</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Sathiamurthy</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Schoenberger</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Steward</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Surko</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Way</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Wilson</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Sette</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>PLoS Biol</source>
            <pubdate>2005</pubdate>
            <volume>3</volume>
            <fpage>e91</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1065705</pubid>
                  <pubid idtype="pmpid" link="fulltext">15760272</pubid>
                  <pubid idtype="doi">10.1371/journal.pbio.0030091</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B16">
            <title>
               <p>Synthetic Peptides as Antigens</p>
            </title>
            <aug>
               <au>
                  <snm>Van Regenmortel</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Muller</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Laboratory Techniques in Biochemistry and Molecular Biology</source>
            <publisher>Elsevier</publisher>
            <editor>Pillai S, Van der Vliet P</editor>
            <pubdate>1999</pubdate>
            <volume>28</volume>
         </bibl>
         <bibl id="B17">
            <aug>
               <au>
                  <snm>Lund</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Nielsen</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Lundegaard</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Ke&#351;mir</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Brunak</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Immunological Bioinformatics</source>
            <publisher>The MIT Press</publisher>
            <pubdate>2005</pubdate>
         </bibl>
         <bibl id="B18">
            <title>
               <p>Fine molecular analysis of the antigenicity of the Androctonus australis hector scorpion neurotoxin II: a new antigenic epitope disclosed by the Pepscan method</p>
            </title>
            <aug>
               <au>
                  <snm>Devaux</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Juin</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Mansuelle</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Granier</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>Mol Immunol</source>
            <pubdate>1993</pubdate>
            <volume>30</volume>
            <issue>12</issue>
            <fpage>1061</fpage>
            <lpage>1068</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/0161-5890(93)90152-2</pubid>
                  <pubid idtype="pmpid">7690110</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B19">
            <aug>
               <au>
                  <snm>Korber</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Brander</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Haynes</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Koup</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Moore</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Walker</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Watkins</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <cnm>(Eds)</cnm>
               </au>
            </aug>
            <source>HIV Immunology and HIV/SIV Vaccine Databases 2003</source>
            <publisher>Los Alamos National Laboratory, Theoretical Biology and Biophysics, Los Alamos, New Mexico</publisher>
            <pubdate>2003</pubdate>
         </bibl>
         <bibl id="B20">
            <title>
               <p>Prediction of sequential antigenic regions in proteins</p>
            </title>
            <aug>
               <au>
                  <snm>Welling</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Weijer</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>van der Zee</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Welling-Wester</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>FEBS Lett</source>
            <pubdate>1985</pubdate>
            <volume>188</volume>
            <issue>2</issue>
            <fpage>215</fpage>
            <lpage>218</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/0014-5793(85)80374-4</pubid>
                  <pubid idtype="pmpid" link="fulltext">2411595</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B21">
            <title>
               <p>A simple method for displaying the hydropathic character of a protein</p>
            </title>
            <aug>
               <au>
                  <snm>Kyte</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Doolittle</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>1982</pubdate>
            <volume>157</volume>
            <fpage>105</fpage>
            <lpage>132</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/0022-2836(82)90515-0</pubid>
                  <pubid idtype="pmpid" link="fulltext">7108955</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B22">
            <title>
               <p>Hydrophobicity scales and computational techniques for detecting amphipathic structures in proteins</p>
            </title>
            <aug>
               <au>
                  <snm>Cornette</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Cease</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Margalit</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Spouge</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Berzofsky</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>DeLisi</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>1987</pubdate>
            <volume>195</volume>
            <issue>3</issue>
            <fpage>659</fpage>
            <lpage>685</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/0022-2836(87)90189-6</pubid>
                  <pubid idtype="pmpid" link="fulltext">3656427</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B23">
            <title>
               <p>The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000</p>
            </title>
            <aug>
               <au>
                  <snm>Bairoch</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Apweiler</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2000</pubdate>
            <volume>28</volume>
            <fpage>45</fpage>
            <lpage>48</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">102476</pubid>
                  <pubid idtype="pmpid" link="fulltext">10592178</pubid>
                  <pubid idtype="doi">10.1093/nar/28.1.45</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B24">
            <title>
               <p>Improved prediction of MHC class I and class II epitopes using a novel Gibbs sampling approach</p>
            </title>
            <aug>
               <au>
                  <snm>Nielsen</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Lundegaard</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Worning</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Hvid</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Lamberth</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Buus</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Brunak</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Lund</snm>
                  <fnm>O</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2004</pubdate>
            <volume>20</volume>
            <issue>9</issue>
            <fpage>1388</fpage>
            <lpage>1397</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/bth100</pubid>
                  <pubid idtype="pmpid" link="fulltext">14962912</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B25">
            <title>
               <p>Gapped BLAST and PSI-BLAST: a new generation of protein database search programs</p>
            </title>
            <aug>
               <au>
                  <snm>Altschul</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Madden</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Schaffer</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Miller</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Lipman</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>1997</pubdate>
            <volume>25</volume>
            <fpage>3389</fpage>
            <lpage>3402</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">146917</pubid>
                  <pubid idtype="pmpid" link="fulltext">9254694</pubid>
                  <pubid idtype="doi">10.1093/nar/25.17.3389</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B26">
            <title>
               <p>Amino acid substitution matrices from protein blocks</p>
            </title>
            <aug>
               <au>
                  <snm>Henikoff</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Henikoff</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci U S A</source>
            <pubdate>1992</pubdate>
            <volume>89</volume>
            <fpage>10915</fpage>
            <lpage>10919</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">50453</pubid>
                  <pubid idtype="pmpid" link="fulltext">1438297</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B27">
            <title>
               <p>Measuring the accuracy of diagnostic systems</p>
            </title>
            <aug>
               <au>
                  <snm>Swets</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>1988</pubdate>
            <volume>240</volume>
            <issue>4857</issue>
            <fpage>1285</fpage>
            <lpage>1293</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmpid">3287615</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B28">
            <aug>
               <au>
                  <snm>Efron</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Tibshirani</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>An Introduction to the Bootstrap</source>
            <publisher>Chapman &amp; Hall</publisher>
            <edition>first</edition>
            <pubdate>1993</pubdate>
         </bibl>
         <bibl id="B29">
            <aug>
               <au>
                  <snm>Johnson</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Probability and Statistics for Engineers</source>
            <publisher>Prentice Hall International, Inc.</publisher>
            <edition>seventh</edition>
            <pubdate>2005</pubdate>
         </bibl>
      </refgrp>
   </bm>
</art>

