Engineering a Tissue-Specific Imputation Panel for the Human Methylome

PhD Project
Supervisors
Riccardo Marioni, riccardo.marioni@ed.ac.uk
Daniel McCartney, daniel.mccartney@ed.ac.uk
 
Project Description
Genetic and genomic data are crucial for understanding disease mechanisms and for improving risk prediction. The genome contains billions of pieces of information, and measuring every piece (for example, through whole-genome sequencing) is expensive both computationally and economically. However, many pieces, particularly those that lie close together, are strongly correlated. We can therefore measure only a subset and impute the missing values, which is highly cost-effective.

Imputation is now well established for genetic data, specifically single nucleotide polymorphisms (SNPs). No analogue currently exists for other omics layers, such as the epigenetic modification DNA methylation (DNAm). DNAm plays a critical role in gene regulation and can also accurately track lifestyle behaviours and environmental exposures.

We will use the world’s largest saliva- and blood-based DNAm datasets and replicate our findings in other cohorts of diverse ancestries, ages and backgrounds. Using a range of statistical, machine learning (ML) and artificial intelligence (AI) approaches, we will build a robust imputation process for DNAm and make it available to users through a web-based front-end server. This will save the global research community millions of pounds that would otherwise be spent generating new array data, and will lead to new discoveries from association studies in biomedical research.