
Advancing Tissue Prediction Using Read-Based DNA Methylation Modelling Towards a Multi-Omics Integration

Author(s)

  • Toon Callens (Presenting Author) | CompOmics, VIB-UGent Center for Medical Biotechnology | TECHNOLOGIEPARK-ZWIJNAARDE 75, 9052, Zwijnaarde, Belgium

Abstract

To improve tissue-of-origin classification in oncological settings such as Cancers of Unknown Primary (CUPs), I developed a deep learning model that uses read-level DNA methylation information in the form of alpha values. Unlike traditional beta values, alpha values retain individual-read resolution and thereby preserve the heterogeneity of epigenetic landscapes. CpG islands were used as biologically meaningful regions to generate per-sample alpha distribution vectors, which served as input features for tissue classification. Initially, feature selection based on cosine similarity was used to find Differentially Methylated Regions (DMRs), that is, the most tissue-informative regions. A convolutional neural network trained on this selected feature space then achieved tissue prediction accuracies exceeding 95%.
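The feature construction described above can be illustrated with a minimal sketch. It assumes that an alpha value is the fraction of methylated CpGs on a single read, that an alpha distribution vector is a normalized histogram of those values over the reads covering one CpG island, and that a low cosine similarity between two tissues' vectors marks a candidate DMR. The function names, the bin count, and the toy reads are illustrative assumptions, not details taken from the abstract.

```python
import math


def alpha_value(read_calls):
    """Fraction of methylated CpGs on one read (assumed definition).

    read_calls is a list of 0/1 methylation calls, one per CpG on the read.
    """
    if not read_calls:
        raise ValueError("read covers no CpG sites")
    return sum(read_calls) / len(read_calls)


def alpha_distribution(reads, n_bins=5):
    """Normalized histogram of per-read alpha values for one region.

    n_bins is a hypothetical choice; the abstract does not state the binning.
    """
    counts = [0] * n_bins
    for calls in reads:
        alpha = alpha_value(calls)
        # alpha == 1.0 would index past the end, so clamp it into the last bin
        idx = min(int(alpha * n_bins), n_bins - 1)
        counts[idx] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * n_bins
    return [c / total for c in counts]


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy reads over one CpG island in two tissues (made-up data)
tissue_a_reads = [[1, 1, 1, 1], [1, 1, 1, 0], [1, 1, 1, 1]]
tissue_b_reads = [[0, 0, 0, 0], [0, 0, 0, 1], [0, 0, 0, 0]]

vec_a = alpha_distribution(tissue_a_reads)
vec_b = alpha_distribution(tissue_b_reads)
similarity = cosine_similarity(vec_a, vec_b)
```

In this toy case the two histograms share no occupied bins, so the similarity is zero and the region would rank as strongly tissue-informative under the assumed criterion. A beta value for the same region would only report an average methylation level and would discard the per-read pattern.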
Next, I removed the feature selection step to model all methylated CpG islands. SHAP values were computed to assess feature importance per tissue and were then mapped to gene promoter regions, allowing downstream integration with proteomics. This work complements prior efforts within our group on proteomics-based machine learning tissue classification; together, they establish a baseline for biologically interpretable, multi-omics tissue prediction. Ultimately, these integrated approaches pave the way for improved diagnostics in CUPs and other tissue-related pathologies.
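One way to picture the mapping from per-island importance scores to gene promoters is a simple interval-overlap join, sketched below. It assumes the importance scores (for example mean absolute SHAP values per CpG island for one tissue) have already been computed, and it defines a promoter as a fixed window around the transcription start site. The window sizes, function names, gene names, and coordinates are all hypothetical and are not taken from the abstract.

```python
def promoter_interval(tss, strand, upstream=1000, downstream=200):
    """Half-open promoter window around a TSS (window sizes are assumptions)."""
    if strand == "+":
        return max(0, tss - upstream), tss + downstream
    # On the minus strand, "upstream" lies at higher coordinates
    return max(0, tss - downstream), tss + upstream


def overlaps(a_start, a_end, b_start, b_end):
    """True if two half-open intervals share at least one position."""
    return a_start < b_end and b_start < a_end


def map_importance_to_genes(islands, genes, upstream=1000, downstream=200):
    """Sum island importance scores per gene whose promoter the island overlaps.

    islands: list of (chrom, start, end, importance)
    genes:   list of (name, chrom, tss, strand)
    """
    gene_scores = {}
    for name, gene_chrom, tss, strand in genes:
        p_start, p_end = promoter_interval(tss, strand, upstream, downstream)
        for chrom, start, end, score in islands:
            if chrom == gene_chrom and overlaps(start, end, p_start, p_end):
                gene_scores[name] = gene_scores.get(name, 0.0) + score
    return gene_scores


# Made-up islands with precomputed importance scores for one tissue
islands = [
    ("chr1", 900, 1500, 0.4),
    ("chr1", 5000, 5600, 0.1),
    ("chr2", 900, 1500, 0.3),
]
# Hypothetical genes; names and coordinates are placeholders
genes = [
    ("GENE_A", "chr1", 1200, "+"),
    ("GENE_B", "chr1", 5500, "-"),
    ("GENE_C", "chr3", 1000, "+"),
]

gene_scores = map_importance_to_genes(islands, genes)
```

The resulting gene-level scores give a key that can be joined against protein-level features. The nested loop is quadratic and only suitable for illustration; a genome-scale version would use sorted intervals or an interval tree.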