Events

Our CDT runs a variety of events throughout the year, aimed at different audiences. Below you can find information about our upcoming events and register to attend.

Distinguished lecture by Prof Mark van der Laan:

17 June 2025, 4pm

Targeted Machine Learning, Highly Adaptive Lasso, and Causal Inference for Generating Actionable Information


Traditional statistics often relies on simplistic assumptions and is not tailored to the real questions of interest. Under model misspecification, confidence intervals and p-values are unreliable: as the sample size grows, the probability that the confidence interval contains the true estimand approaches zero, and the probability of a type-I error approaches 1. Machine learning, on the other hand, is typically focussed on prediction and often lacks theory supporting statistical inference. A lack of pre-specification generally lets human bias drive the findings, further undermining the reliability of a typical data analysis. The real question of interest is often the causal effect of a policy or intervention, acting on an exposure or treatment, on the distribution of an outcome of interest; more generally, it is a quantity defined in an ideal world without missingness, censoring, drop-out, or confounding.

Targeted Learning is a subfield of statistics that follows a general scientific roadmap: 1) accurately translating the real-world question into a formal statistical estimation problem, defined by a causal estimand, a statistical estimand that identifies the causal estimand under stated assumptions, and a statistical model; 2) constructing a targeted maximum likelihood estimator (TMLE) of the statistical estimand from a general template; and 3) performing a sensitivity analysis that addresses the possible causal gap between the statistical and causal estimands.

The TMLE is an optimal, machine-learning-based plug-in estimator of the estimand that comes with formal statistical inference. The three pillars of TMLE are super-learning, the Highly Adaptive Lasso (HAL), and the TMLE update step. Through super-learning, it can incorporate high-dimensional and diverse data sources, such as images, genomics, and NLP features, together with state-of-the-art algorithms tailored to those sources, such as deep learning. To optimize finite-sample performance, the specification of the TMLE update step can be tailored to the specific experiment and statistical estimation problem at hand, while remaining theoretically grounded, optimal, and benchmarked.

We motivate, explain, and give an overview of targeted learning; describe the key role of super-learning and HAL; discuss some of the key choices and considerations in specifying the TMLE update step; and outline how to construct an a priori specified statistical analysis plan (SAP) based on targeted learning, using outcome-blind simulations to choose the best specification of the SAP. We also discuss various case studies, including a Sentinel and FDA real-world evidence (RWE) demonstration project of targeted learning that illustrates SAP specification on real data.