Meta-Analysis of Heterogeneous Data Sources for Genome-Scale Identification of Risk Genes in Complex Phenotypes
Abstract
Meta‐analyses of large‐scale association studies typically proceed within a single data type and do not exploit potentially complementary sources of molecular evidence. Here, we present an approach that combines heterogeneous data from genome‐wide association (GWA) studies, protein‐protein interaction screens, disease similarity, linkage studies, and gene expression experiments into a multi‐layered evidence network. The network is used to prioritize the entire protein‐coding part of the genome and to identify a shortlist of candidate genes. We report results specifically for bipolar disorder, a genetically complex disease for which GWA studies have been only moderately successful. We experimentally validate one such candidate, YWHAH, by genotyping five variations in 640 patients and 1,377 controls. We found a significant allelic association for the rs1049583 polymorphism in YWHAH (adjusted P = 5.6e−3) with an odds ratio of 1.28 [1.12–1.48], which replicates a previous case‐control study. In addition, we demonstrate the general applicability of our approach using type 2 diabetes data sets. The method augments moderately powered GWA data and represents a validated, flexible, and publicly available framework for identifying risk genes in highly polygenic diseases. The method is made available as a web service.
Genet. Epidemiol. 35:318‐332, 2011. © 2011 Wiley‐Liss, Inc.