Abstract
The utopian balance between the individual cells of a meta-organism, such as a human, is maintained by a sophisticated web of regulatory mechanisms that are the result of at least 3.5 billion years of Darwinian evolution. The complexity of these mechanisms is what makes the study of living organisms, and of life in general, a fascinating pursuit, but it is also the main reason why we have not yet found an ultimate cure for cancer, this ancient disease. Because of the vast number of pathways that make up the regulatory machinery, a healthy cell may follow multiple routes to malignancy, which makes cancer one of the most complex diseases humanity has ever had to face. Over the past few decades, however, advances in DNA sequencing technologies have enabled significant progress towards understanding its underlying biological processes. Meticulous studies of the genomes of malignant cells have revealed that some of their genetic and epigenetic alterations can be exploited as specific targets against them. These strategies either use the phenotypic changes induced by these alterations in targeted therapies, or help the host's immune system to target cancer epitopes displayed on the surface of the malignant cells. The work presented in this thesis can be separated into a technical and an exploratory part. The exploratory projects focus on identifying deficiencies in the homologous recombination (HR) double-strand break (DSB) repair pathway through the detection of genetic markers of HR-deficiency (HRD). The technical manuscripts, on the other hand, aim to resolve analytical issues that may distort the detectability of these markers. Part I of the thesis is a general introduction to cancer genomics and its related fields. It is meant to summarize my understanding of the topic and to describe all the features and concepts that are necessary to comprehend the contents of the manuscripts that comprise Part II.
The first biomarkers of HR-deficiency that relied on large-scale genomic scars were developed on microarray data, and it was debatable whether they could be directly applied to next-generation whole exome or whole genome sequencing data. Paper I forms a bridge between these two technologies, as it compares the genomic scars extracted from matched TCGA whole exome sequences and single nucleotide polymorphism (SNP) arrays. Papers II and III are the first to report the presence of clear genomic signs of HR-deficiency in two different tumor types. Paper II investigates whether breast cancer brain metastases (BCBM) have HR phenotypes that differ from those of their primary tumors, and concludes that the levels of HR-deficiency-associated genomic scars are significantly higher in BCBM tumors than in their primary counterparts. Paper III addresses HR-deficiency in non-small-cell lung cancers (NSCLC), and reports HRD-related markers in lung squamous cell carcinomas and adenocarcinomas. The primary goal of these papers is to draw the attention of future clinical trials to the existence of likely PARP-inhibitor-sensitive subgroups of patients within the investigated tumor types. Paper IV is highly technical but has important clinical utility, and it is also connected to the extraction of HRD-related biomarkers. It describes a machine learning application that can be used to filter formalin-induced sequencing artifacts from paired-end next-generation sequencing data. Although formalin-fixed paraffin-embedded tissue processing is a highly efficient way of storing cancer specimens, without an artifact filtration step neither the detection of individual mutations nor the extraction of DNA aberration-based biomarkers is reliable enough to support conclusions.