Machine learning competition in immunology – Prediction of HLA class I binding peptides
Abstract
Experimental studies of the immune system and related applications, such as characterization of immune responses against pathogens, vaccine design, or optimization of therapies, are combinatorially complex, time-consuming, and expensive. The main methods for large-scale identification of T-cell epitopes from pathogen or cancer proteomes involve either reverse immunology or high-throughput mass spectrometry (HTMS). Reverse immunology approaches involve pre-screening of proteomes by computational algorithms, followed by experimental validation of selected targets ([Mora et al., 2006], [De Groot et al., 2008] and [Larsen et al., 2010]). HTMS involves HLA typing, immunoaffinity chromatography of HLA molecules, HLA extraction, and chromatography combined with tandem mass spectrometry, followed by the application of computational algorithms for peptide characterization (Bassani-Sternberg et al., 2010). Hundreds of naturally processed HLA class I associated peptides have been identified in individual studies using HTMS in normal (Escobar et al., 2008), cancer ([Antwi et al., 2009] and [Bassani-Sternberg et al., 2010]), autoimmunity-related (Ben Dror et al., 2010), and infected samples (Wahl et al., 2010). Computational algorithms are an essential step in high-throughput identification of T-cell epitope candidates in both the reverse immunology and HTMS approaches. Peptide binding to MHC molecules is the single most selective step in defining a T-cell epitope; the accuracy of computational algorithms for predicting peptide binding therefore determines the accuracy of the overall method. Computational predictions of peptide binding to HLA, both class I and class II, use a variety of algorithms ranging from binding motifs to advanced machine learning techniques ([Brusic et al., 2004] and [Lafuente and Reche, 2009]), and standards for their assessment have been developed.
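To illustrate the simplest end of that range, the following is a minimal sketch of a binding-motif pre-screen of the kind a reverse immunology pipeline might start from. The motif table, the function names, and the test sequence are all assumptions made for illustration: the anchor residues are loosely modeled on preferences commonly described for HLA-A*02:01 (position 2 and the C-terminal position of a 9-mer) and are not taken from this study.

```python
# Hypothetical anchor motif for a 9-mer: 1-based position -> allowed residues.
# The residue sets are illustrative assumptions, not values from the paper.
MOTIF = {2: set("LM"), 9: set("VL")}


def matches_motif(peptide, motif=MOTIF, length=9):
    """Return True if the peptide has the expected length and every
    anchor position holds one of the allowed residues."""
    if len(peptide) != length:
        return False
    return all(peptide[pos - 1] in allowed for pos, allowed in motif.items())


def scan_protein(sequence, motif=MOTIF, length=9):
    """Slide a window over a protein sequence and return
    (1-based start position, peptide) for every window matching the motif."""
    hits = []
    for start in range(len(sequence) - length + 1):
        peptide = sequence[start:start + length]
        if matches_motif(peptide, motif, length):
            hits.append((start + 1, peptide))
    return hits


if __name__ == "__main__":
    # Synthetic sequence, used only to exercise the scan.
    print(scan_protein("GGALAAAAAAVGG"))
```

A motif match of this kind only flags candidates. It ignores the contribution of non-anchor positions, which is one reason the more advanced machine learning techniques mentioned above are used.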
Computational servers that predict peptide binding to several common HLA class I alleles have been assessed by different groups (see [Peters et al., 2006], [Lin et al., 2008] and [Gowthaman et al., 2010]). Some of these models were reported to be highly accurate, while others need improvement.