Abstract
Background: In gene expression analysis, overlapping genes, splice variants, and fusion transcripts are potential sources of data analysis artefacts, depending on how the observed intensity is assigned to one or more genes. Here we exemplify this with an in-depth analysis of the INS-IGF2 fusion transcript, which has recently been reported to be among the most highly expressed transcripts in human pancreatic beta cells and whose protein has been proposed as a novel autoantigen in Type 1 Diabetes.

Results: Through RNA sequencing and variant-specific qPCR analyses we demonstrate that the true abundance of INS-IGF2 is >20,000-fold lower than that of INS in human beta cells, and we suggest an explanation for the artefacts that previously led to overestimation of its expression level in selected studies. We reinvestigated the previously reported detection of INS-IGF2 using antibodies in both Western blotting and immunohistochemistry. We found that the one available commercial antibody (BO1P), raised against recombinant INS-IGF2, shows strong cross-reactivity with native proinsulin, and we did not detect INS-IGF2 protein in the human beta cell line EndoC-βH1. Furthermore, using highly sensitive proteomics analysis, we could not detect INS-IGF2 protein in samples of human islets or in EndoC-βH1 cells.

Conclusions: Sequence features such as fusion transcripts spanning multiple genes can lead to unexpected results in gene expression analysis, and care must be taken in generating and interpreting the results. For the specific case of INS-IGF2, we conclude that the abundance of the fusion transcript/protein is far lower than previously reported, and that the immunoreagents currently available for detecting INS-IGF2 protein cross-react strongly with native human proinsulin. Finally, we were unable to detect INS-IGF2 protein by proteomics analysis.