Abstract
Major histocompatibility complex (MHC) molecules play a crucial role in adaptive immunity. They sample peptides from self and non-self proteins and present them on the cell surface for recognition by CD8+ and CD4+ T lymphocytes, which can then initiate immune responses. Identifying the peptides that bind to MHC molecules is therefore essential for understanding the nature of immune responses and for discovering T cell epitopes useful in the design of new vaccines and immunotherapies. In humans, MHC molecules are referred to as human leucocyte antigen (HLA) molecules and are encoded by extremely polymorphic genes on chromosome 6. Because of this polymorphism, thousands of different MHC molecules exist, which makes the experimental identification of peptide-MHC interactions very costly. This has created a need for in silico methods for predicting peptide-MHC binding, and over the last decade several such methods have been successfully developed and used for epitope discovery. My PhD project has been dedicated to improving methods for predicting peptide-MHC interactions by developing new strategies for training prediction algorithms based on machine learning techniques. Several MHC class I binding prediction algorithms have been developed, and because of their high accuracy they are used by many immunologists to support the conventional experimental process of epitope discovery. However, the accuracy of these methods depends on the data available for the MHC molecule in question, which makes it difficult for the non-expert end-user to choose the most suitable predictor. The first paper in this thesis presents NetMHCcons, a new, publicly available consensus method for MHC class I binding prediction. NetMHCcons combines three state-of-the-art prediction tools and provides the most accurate predictions for any given MHC molecule.
While methods for predicting MHC class I binding have reached very high accuracy and are widely used in immunological research, the situation for MHC class II is less clear. The open binding groove of MHC class II molecules and the differences in polymorphism among the MHC encoding genes make prediction of peptide binding to MHC class II molecules a complicated problem. We addressed these issues in order to develop the first pan-specific predictor covering all three human class II isotypes: HLA-DR, HLA-DP and HLA-DQ. The second paper introduces NetMHCIIpan-3.0, a predictor based on artificial neural networks that can predict binding affinities for any human MHC class II molecule. Chapter 4 of this thesis gives an overview of bioinformatics tools developed by the Immunological Bioinformatics group at the Center for Biological Sequence Analysis. The chapter explains in detail how to use the different methods in T cell epitope discovery research, including how to provide the input and how to interpret the output. In the last chapter, I present the results of a bioinformatics analysis of epitopes from the yellow fever virus. The analysis revealed no distinct regions of higher epitope density within the viral polyprotein. Moreover, the epitope density of the individual proteins was shown to depend mostly on protein length and amino acid composition, underlining the importance of identifying peptide-MHC interactions. Finally, using yellow fever virus epitopes, we demonstrated the strength of the %Rank score compared with the binding affinity score of MHC prediction methods, suggesting that %Rank should be considered when selecting potential T cell epitopes.
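To illustrate why a rank-based score can be more informative than a raw binding score, the sketch below computes a percentile rank: the percentage of background peptides, predicted for the same MHC molecule, that score at least as high as the query peptide. This is a minimal illustration under stated assumptions, not the implementation used in any of the tools above. The function name `percentile_rank`, the convention that a higher score means stronger binding, the counting of ties as "at least as strong", and the toy background scores are all hypothetical.

```python
from bisect import bisect_left
from typing import Sequence


def percentile_rank(query_score: float, background_scores: Sequence[float]) -> float:
    """Hypothetical %Rank sketch (assumes higher score = stronger binding).

    Returns the percentage of background scores that are >= query_score,
    so a low %Rank means the query binds more strongly than most background
    peptides predicted for the same MHC molecule.
    """
    if not background_scores:
        raise ValueError("background_scores must be non-empty")
    ordered = sorted(background_scores)
    # Index of the first background score >= query_score; everything from
    # there to the end scores at least as high as the query.
    n_at_least_as_strong = len(ordered) - bisect_left(ordered, query_score)
    return 100.0 * n_at_least_as_strong / len(ordered)


if __name__ == "__main__":
    # Toy background distributions for two hypothetical MHC molecules.
    molecule_a_background = [0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5, 0.6]
    molecule_b_background = [0.3, 0.4, 0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.9]
    # The same raw score ranks very differently against the two backgrounds.
    print(percentile_rank(0.55, molecule_a_background))  # 10.0
    print(percentile_rank(0.55, molecule_b_background))  # 70.0
```

The same raw score of 0.55 falls in the top 10% for the first hypothetical molecule but only at 70% for the second. This is the kind of molecule-specific normalisation that a rank-based score provides and a raw affinity score does not.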
In summary, this thesis presents methods for predicting peptide binding to both MHC class I and class II molecules, which is important for driving immunological research in the field of T cell epitope discovery and for a general understanding of cellular immune responses.