
PCIC 2023

Speakers


Derrick Bennett

Personal Website

Derrick A. Bennett (he/him), PhD, CStat, Associate Professor
University Research Lecturer, Senior Statistician
Nuffield Department of Population Health
Medical Research Council Population Health Research Unit at the University of Oxford

Alcohol and Cardiovascular Disease: Is Moderate Drinking Really Beneficial for Cardiovascular Disease?

Derrick Bennett

Nuffield Department of Population Health

Ross L. Prentice

Personal Website

Professor Emeritus, Cancer Prevention Program
Public Health Sciences Division, Fred Hutch

Intention-to-Treat Comparisons in Randomized Trials

Ross L. Prentice

Fred Hutch

Xihong Lin

Personal Website

Professor
Harvard University and Member of the U.S. National Academy of Sciences

Statistical Inference for Large-scale Causal Mediation Analysis in Genome-wide Studies

Xihong Lin

Harvard University

Ewout W. Steyerberg

Personal Website

Professor of Clinical Biostatistics & Medical Decision Making
Chair, Dept of Biomedical Data Sciences
Leiden University Medical Center
Leiden, The Netherlands

Causal Inference and Counterfactual Prediction of Treatment Benefit: Modeling in Randomized Controlled Trials and Observational Studies

Ewout W. Steyerberg

Leiden University

Xi Lin

Personal Website

Ph.D., University of Oxford

Combining Randomized and Observational Data using a Power Likelihood

Xi Lin

University of Oxford

Shu Yang

Personal Website

Associate Professor
NC State University

Enhancing Treatment Effect Estimation: A Model Robust Approach Integrating Randomized Experiments and Historical Controls using the Double Penalty Integration Estimator

Shu Yang

NC State University

Evan Rosenman

Personal Website

Assistant Professor
Claremont McKenna College

Shrinkage Estimation for Causal Inference and Experimental Design

Evan Rosenman

Claremont McKenna College

Bénédicte Colnet

Personal Website

Ph.D., Inria

Reweighting the RCT for Generalization: Finite Sample Error and Variable Selection

Bénédicte Colnet

Inria

Wei Li

Personal Website

Associate Professor, Renmin University of China
Wei Li is an Associate Professor in the School of Statistics at Renmin University of China.
He obtained his PhD in Probability and Statistics from Peking University in 2018 and his BS in Mathematics and Applied Mathematics from Nankai University in 2013.
His research interests focus on causal inference, missing data, and high-dimensional statistics.

Sparse Mediation Analysis with Unmeasured Mediator-outcome Confounding

Wei Li

Renmin University of China

Zhexiao Lin

Ph.D., University of California, Berkeley
Zhexiao Lin is a PhD student in the Department of Statistics at the University of California, Berkeley (advised by Professors Peng Ding, Peter Bickel and Fang Han).
He received his master's degree from the University of Washington in 2022 and his bachelor's degree from Zhejiang University in 2020.

On Regression-Adjusted Imputation Estimators of the Average Treatment Effect

Zhexiao Lin

University of California, Berkeley

Hanzhong Liu

Personal Website

Associate Professor
Tsinghua University

Hanzhong Liu is an Associate Professor at the Center for Statistical Science in the Department of Industrial Engineering at Tsinghua University. He received his PhD in Statistics from Peking University and then worked with Prof. Bin Yu as a Postdoctoral Scholar in the Department of Statistics at UC Berkeley. His research focuses on developing statistical theory and methodologies for solving high-dimensional data problems and drawing causal inference.

Randomization and covariate adjustment in split-plot designs

Hanzhong Liu

Tsinghua University

Zhongshang Yuan

Personal Website

Professor
Shandong University

Likelihood based Mendelian randomization analysis with automated instrument selection and horizontal pleiotropic modeling

Zhongshang Yuan

Shandong University

Hongkai Li

Personal Website

Doctor, Professor in Department of Biostatistics, Shandong University.
Executive director of Causal Inference Branch of China Field Statistics Research Society, Director of Health and Medical Big Data Society of China Industrial Statistics Teaching Research Society, member of Statistical Theory and Method Research Society of China Health Information and Health and Medical Big Data Society, member of Shandong Province Brain glioma multidisciplinary Joint Committee.
His research interests include control methods for unknown confounding, data integration under the framework of causal inference, mediation effect analysis, control methods for measurement error and selection bias, and statistical genetic methodology.
He has published more than 60 SCI articles in internationally renowned journals.

A meta-analysis method integrating multiple Mendelian randomization studies

Hongkai Li

Shandong University

Yumou Qiu

Personal Website

Associate Professor, Department of Statistics, Iowa State University
yumouqiu@iastate.edu

Yumou Qiu

Iowa State University

Anqi Zhao

Personal Website

Assistant Professor, Duke University

To Adjust or not to Adjust? Estimating the Average Treatment Effect in Randomized Experiments with Missing Covariates

Anqi Zhao

Duke University

Jae Kwang Kim

Personal Website

Professor, Iowa State University

Triply robust propensity score estimation under missing at random

Jae Kwang Kim

Iowa State University

Jin Tian

Personal Website

Professor, Iowa State University

Estimating Causal Effects from Observational and Experimental Studies

Jin Tian

Iowa State University

Zheng Zhang

Personal Website

Assistant Professor
Renmin University of China

Causal Inference of General Treatment Effects using Neural Networks with A Diverging Number of Confounders

Zheng Zhang

Renmin University of China

Lin Liu

Personal Website

Associate Professor, Shanghai Jiao Tong University

Recent Advances in Machine-Learning-based Causal Inference

Lin Liu

Shanghai Jiao Tong University

Kun Zhang

Personal Website

Professor, Carnegie Mellon University

Advances in Causal Representation Learning: Discovery of the Hidden World

Kun Zhang

Carnegie Mellon University

Yifan Cui

Personal Website

Professor, Zhejiang University

Proximal Causal Learning of Heterogeneous Treatment Effects

Yifan Cui

Zhejiang University

Wei Chen

Personal Website

Wei Chen is a lecturer at the School of Computer, Guangdong University of Technology. She received the B.S. degree in computer science and the Ph.D. degree in computer application engineering from the Guangdong University of Technology, Guangzhou, China, in 2015 and 2020, respectively. She was a visiting student at Carnegie Mellon University, Pittsburgh, PA, USA, from 2018 to 2019. Her research interests include causal discovery and its applications.

Causal Discovery with Latent Variables Based on Higher-Order Cumulants

Wei Chen

Guangdong University of Technology

Linbo Wang

Personal Website

Assistant Professor, Department of Statistical Sciences, University of Toronto
linbo.wang@utoronto.ca

The Promises of Parallel Outcomes

Linbo Wang

University of Toronto

Lu Wang

Personal Website

Professor
University of Michigan-Ann Arbor

How to Use Latent Patient Preference when Evaluating the Optimal Dynamic Treatment Regimes?

Lu Wang

University of Michigan-Ann Arbor

Cong Jiang

Personal Website

Ph.D., University of Montreal

Vaccine Effectiveness Estimation under the Test-Negative Design: Identifiability and Efficiency Theory for Causal Inference under Conditional Exchangeability

Cong Jiang

University of Montreal

Guanbo Wang

Personal Website

Ph.D., Harvard University

Transporting Subgroup Treatment Effects under Multi-Source Data

Guanbo Wang

Harvard T.H. Chan School of Public Health

Hengrui Cai

Personal Website

Assistant Professor, University of California-Irvine

Towards Causal Revolution: On Learning Heterogeneity and Non-Spuriousness in Causal Graphs

Hengrui Cai

University of California-Irvine

Ting Ye

Personal Website

Dr. Ting Ye is the Genentech Endowed Assistant Professor in Biostatistics at the University of Washington.
Her research focuses on covariate adjustment in randomized controlled trials, Mendelian randomization, and other natural experiment methods for causal inference.

Debiased Multivariable Mendelian Randomization

Ting Ye

University of Washington

Sai Li

Personal Website

Sai Li is an associate professor (without tenure) at the Institute of Statistics and Big Data, Renmin University of China.
She received her PhD degree in Statistics from Rutgers University.
She worked as a postdoctoral researcher at University of Pennsylvania after graduation.
Her research interests include methods and theories for high-dimensional statistics, transfer learning, and causal inference.

Leveraging Local Distributions in Mendelian Randomization: Uncertain Opinions are Invalid

Sai Li

Renmin University of China

Jie Zheng

Professor
Shanghai Jiao Tong University School of Medicine, Ruijin Hospital
Jie obtained a PhD degree in genetic epidemiology from the University of Bristol in 2015. In 2018, Jie received the Vice-Chancellor fellowship and in 2021 the Springboard Award from the Academy of Medical Sciences. In January 2022, he officially joined Shanghai Jiao Tong University School of Medicine as a professor. Jie has a multidisciplinary background in statistical genetics, epidemiology, and bioinformatics, with experience in building platforms such as MR-Base and LD Hub, applying the causal inference method of Mendelian randomization, and conducting multi-ancestry omics analysis.
So far, Jie has published over 100 papers, with over 8800 citations (h-index = 31, i10-index = 55). In addition, Jie was supported by 2 fellowships and 4 research grants in the UK. With the support of these research funds, he led a multidisciplinary team in the UK until 2021. After returning to China, he received four fellowships and a major grant to establish a research team in Shanghai to support the development of causal inference in large-scale biobanks in China.

Multi-omics Mendelian Randomization Supports Drug Target Discovery

Jie Zheng

Shanghai Jiao Tong University

Haoyu Zhang

Personal Website

Tenure-track Investigator, National Cancer Institute
I was appointed an Earl Stadtman tenure-track investigator at NCI in August 2022.
I completed postdoctoral training in the Department of Biostatistics at Harvard under the guidance of Dr. Xihong Lin from 2019 to 2022.
I received my Ph.D. in biostatistics from the Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland, in 2019 under the guidance of Dr. Nilanjan Chatterjee and Dr. Ni Zhao.
I received my B.S. in statistics from Zhejiang University in Hangzhou, China.

A robust whole-genome Mendelian randomization approach for improved estimation and inference of causal effects

Haoyu Zhang

National Cancer Institute

Fabrizia Mealli

Personal Website

Professor, Department of Statistics, University of Florence
fabrizia.mealli@unifi.it

Selecting Subpopulations for Causal Inference in Regression Discontinuity Designs

Fabrizia Mealli

University of Florence

Zhichao Jiang

Personal Website

Policy learning with asymmetric utilities

Zhichao Jiang

Sun Yat-sen University

Fan Yang

Personal Website

Professor
Tsinghua University

Mediation Analysis with the Mediator and Outcome Missing not at Random

Fan Yang

Tsinghua University

Satoshi Hattori

Personal Website

Professor, Osaka University

A simple sensitivity analysis method for the average causal effect via linear programming subject to estimating equation constraints

Satoshi Hattori

Osaka University

Ronghui Xu

Professor, University of California, San Diego

Doubly Robust Inference Under Possibly Misspecified Marginal Structural Cox Model

Ronghui Xu

UC San Diego

Theis Lange

Personal Website

Professor
University of Copenhagen

Reanalyzing four large RCTs within intensive care using TMLE; does it deliver on the theoretical promises?

Theis Lange

University of Copenhagen

Xiao-Hua Zhou

Personal Website

Endowed Chair Professor
Peking University

Causal Inference in Companion Diagnostic Test Studies

Xiao-Hua Zhou

Peking University

Richard Guo

Personal Website

Research Associate, Cambridge University

Harnessing Extra Randomness: Replicability, Flexibility and Causality

Richard Guo

University of Cambridge

Qingyuan Zhao

Personal Website

Assistant Professor, Cambridge University

Confounder selection via graph expansion

Qingyuan Zhao

University of Cambridge

Thomas S. Richardson

Personal Website

Professor
University of Washington

Generalizing Conditional Independence: Nested Markov Models

Thomas S. Richardson

University of Washington

James M. Robins

Personal Website

Professor
Harvard University

On Minimaxity and Admissibility of Double Machine Learning (DML) Estimators Under Minimal Assumptions

James M. Robins

Harvard T. H. Chan School of Public Health

Jinzhu Jia

Personal Website

Professor
School of Public Health, Peking University
jzjia@pku.edu.cn

Mendelian randomization analysis with pleiotropy-robust log-linear models for binary outcomes

Jinzhu Jia

Peking University

Ruohan Zhan

Hong Kong University

Policy Learning in Adaptive Experiments

Ruohan Zhan

Hong Kong University

Min Zhang

Personal Website

Vanke Chair Professor
Tsinghua University

Robust Method for Optimal Treatment Decision Making Based on Survival Data

Min Zhang

Tsinghua University

Wenjie Hu

Ph.D., Peking University

Identification and estimation of treatment effects on long-term outcomes in clinical trials with external observational data

Wenjie Hu

Peking University

Liuhua Peng

Personal Website

Senior Lecturer
University of Melbourne

Extreme Continuous Treatment Effects: Measures, Estimation and Inference

Liuhua Peng

University of Melbourne

Mingming Gong

Personal Website

Associate Professor, Sun Yat-Sen University

Counterfactual Fairness with Partially Known Causal Graph

Mingming Gong

The University of Melbourne

Kun Kuang

Personal Website

Associate Professor, Zhejiang University
kunkuang@zju.edu.cn

Causal Inference in Complex Environments

Kun Kuang

Zhejiang University

Ilya Shpitser

Personal Website

Associate Professor
Johns Hopkins University

The Proximal ID Algorithm

Ilya Shpitser

Johns Hopkins University

Yang Ni

Personal Website

Associate Professor, Texas A&M University

Causal Discovery from Multivariate Functional Data

Yang Ni

Texas A&M University

Lu Cheng

Personal Website

Assistant Professor, University of Illinois Chicago

Applied Causal Inference with Surrogate Representation

Lu Cheng

University of Illinois Chicago

Biwei Huang

Personal Website

University of California, San Diego

Recent Advances in Causal Discovery with Hidden Confounders

Biwei Huang

University of California, San Diego

Chengxi Zang

Personal Website

Instructor
Weill Cornell Medicine

Generating Real-World Evidence for Understanding Long COVID

Chengxi Zang

Weill Cornell Medicine

An Zhang

Personal Website

Ph.D., National University of Singapore

Causality-enhanced Recommendation Simulation with Large Language Model-based Agents

An Zhang

National University of Singapore

Wei Chen

Personal Website

Professor, Chinese Academy of Sciences

Towards Trustworthy AI by Identifying the Causal Latent Variables

Wei Chen

Chinese Academy of Sciences

Peng Cui

Personal Website

Associate Professor (Tenured), Lab of Media and Network, Department of Computer Science and Technology, Tsinghua University
He has published more than 100 papers in well-known conferences and journals in the fields of data mining and multimedia, and has won best paper awards at 7 international conferences and journals. He won the ACM China New Star Award in 2015 and the CCF-IEEECS Young Scientist Award in 2018. He is currently an outstanding member of CCF and a senior member of IEEE.

Peng Cui

Tsinghua University

PCIC 2022

Speakers


Milan Studený

Personal Website

Senior Research Fellow, Academy of Sciences of the Czech Republic
studeny@utia.cas.cz
On Structural Imsets for Describing and Learning Graphical Models
The lecture will be a brief overview of the method of structural imsets for describing probabilistic conditional independence (CI) structures induced by discrete random variables. These are specific vectors with integer components indexed by subsets of a basic set $N$ of variables. Greater emphasis will be put on describing graphical models of CI structure. This leads to the idea of an integer linear programming (ILP) approach to structural learning of decomposable graphical models.
The talk will recall some joint research with James Cussens.

Milan Studený

Academy of Sciences of the Czech Republic

Zhongyi Hu

PhD Candidate, Department of Statistics, University of Oxford
zhongyi.hu@keble.ox.ac.uk
Towards Standard Imsets for Maximal Ancestral Graphs
Imsets are an algebraic method for representing conditional independence models. They have many attractive properties when applied to such models, and they are particularly nice for working with directed acyclic graph (DAG) models. In particular, the 'standard' imset for a DAG is in one-to-one correspondence with the independences it induces, and hence is a label for its Markov equivalence class.
We present a proposed extension to standard imsets for maximal ancestral graph (MAG) models, using the 'parameterizing set' representation. By construction, our imset also represents the Markov equivalence class of the MAG. We show that for many such graphs our proposed imset is perfectly Markovian with respect to the graph, thus providing a scoring criterion by measuring the discrepancy for a list of independences that define the model; this gives an alternative to the usual BIC score. Unfortunately, for some models the representation does not work, and in certain cases does not represent any independences at all. We prove that it does work for simple MAGs where there are only heads of size less than three, as well as for a large class of purely bidirected models. We also show that of independence models that do represent the MAG, the one we give is the simplest possible, in a manner we make precise. Further, we refine the ordered local Markov property, which relates to finding the best imsets representing general MAGs.
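As a rough illustration of the DAG case mentioned in the abstract above (not the MAG extension presented in the talk), the sketch below computes the standard imset of a small DAG using the usual formula $u_G = \delta_N - \delta_\emptyset + \sum_{i \in N} (\delta_{pa(i)} - \delta_{\{i\} \cup pa(i)})$. The function name `standard_imset` and the parent-dictionary input format are assumptions made for this example only.

```python
from collections import Counter


def standard_imset(parents):
    """Return the standard imset of a DAG as {frozenset: nonzero integer}.

    `parents` maps every node to an iterable of its parents
    (a hypothetical input format chosen for this sketch).
    """
    nodes = frozenset(parents)
    u = Counter()
    u[nodes] += 1           # + delta_N
    u[frozenset()] -= 1     # - delta_emptyset
    for node, pa in parents.items():
        pa = frozenset(pa)
        u[pa] += 1              # + delta_{pa(i)}
        u[pa | {node}] -= 1     # - delta_{{i} union pa(i)}
    # Keep only the nonzero coordinates of the integer vector.
    return {subset: coef for subset, coef in u.items() if coef != 0}


chain = {"a": [], "b": ["a"], "c": ["b"]}            # a -> b -> c
reversed_chain = {"a": ["b"], "b": ["c"], "c": []}   # a <- b <- c
collider = {"a": [], "b": ["a", "c"], "c": []}       # a -> b <- c

# Markov-equivalent DAGs share one imset; the collider gets a different one.
print(standard_imset(chain) == standard_imset(reversed_chain))  # True
print(standard_imset(chain) == standard_imset(collider))        # False
```

The two chains encode the same single independence (a and c given b) and so map to the same vector, while the collider encodes marginal independence of a and c and maps to a different one.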

Zhongyi Hu

University of Oxford

Bryan Andrews

Postdoctoral Fellow, Department of Philosophy, Carnegie Mellon University
bjandrews@andrew.cmu.edu
Using Imsets to Score Causal Models with Latent Confounding
Directed acyclic graph (DAG) models have become widely studied and applied in statistics and machine learning -- indeed, their simplicity facilitates efficient procedures for learning and inference. Unfortunately, these models are not closed under marginalization, making them poorly equipped to handle systems with latent confounding. Acyclic directed mixed graph (ADMG) models characterize margins of DAG models, making them far better suited to handle such systems. However, ADMG models have not seen widespread use due to their increased complexity. In this talk, I will discuss an extension of the characteristic imset (for DAG models) to ADMG models. I will discuss a factorization criterion for ADMG models admitted by the extension and discuss its equivalence to the global Markov property. Finally, I will demonstrate the utility of the factorization by formulating a consistent scoring criterion for learning ADMGs.

Bryan Andrews

Carnegie Mellon University

James Cussens

Personal Website

Senior Lecturer, Department of Computer Science, University of Bristol
james.cussens@bristol.ac.uk
Imsets and Supermodular Functions
Supermodular functions provide a "dual" alternative to structural imsets for representing conditional independence (CI) structures. In this talk I will outline the key aspects of the relationship between imsets and supermodular functions and explore the pros and cons of using supermodular functions to represent CI structures. For any given CI relation it is easier to check whether it holds using the supermodular function representation than with the imset representation. Moreover, computing "marginal" supermodular functions is also easy. On the other hand there is, at present, no "standard" supermodular function representation of a Bayesian network structure (unlike the case for imsets). The rank function of any matroid is a submodular function and thus matroids provide compact representations of certain supermodular functions. I will consider how, if at all, this connection to matroids can be exploited.
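As a hedged illustration of why the check mentioned above is simple (the notation here is assumed, not taken from the talk): for pairwise disjoint sets $A$, $B$, $C$ and a supermodular function $m$ on subsets of the variable set, the statement "$A$ is independent of $B$ given $C$" is represented by $m$ exactly when a four-term difference vanishes.

```latex
\[
  \langle m, u_{\langle A,B \mid C\rangle} \rangle
  \;=\; m(A \cup B \cup C) + m(C) - m(A \cup C) - m(B \cup C)
  \;=\; 0 .
\]
```

Supermodularity guarantees that this quantity is never negative. For example, taking $m(S) = -H(X_S)$, the negative joint entropy of the variables in $S$, the difference equals the conditional mutual information $I(X_A; X_B \mid X_C)$, which is zero exactly when the conditional independence holds.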

James Cussens

University of Bristol

Jean Morrison

PhD, Assistant Professor, Department of Biostatistics, University of Michigan
jvmorr@umich.edu
Controlling Weak Instrument Bias in Multivariable Mendelian Randomization Using Empirical Shrinkage
Mendelian randomization is a powerful causal inference technique. However, there are several sources of bias that can distort causal estimates made by MR. One particularly troubling source of bias is directional pleiotropy that occurs when confounding variables have substantial heritability. One solution to this problem is multivariable Mendelian randomization (MVMR). However, current MVMR approaches suffer from unavoidable substantial weak instrument bias when applied with more than a few variables or prohibitive computational times. This is problematic for modern applications where it may be desirable to adjust for many potential confounders. I will introduce a new method that dramatically reduces weak instrument bias and has a linearly scaling computational demand.

Jean Morrison

University of Michigan

Linbo Wang

Personal Website

PhD, Assistant Professor, Department of Statistical Sciences, University of Toronto
linbo.wang@utoronto.ca
The Synthetic Instrument
In many observational studies, researchers are interested in studying the effects of multiple treatments on the same outcome. Unmeasured confounding is a key challenge in these studies as it may bias the causal effect estimate. To mitigate this bias, we introduce a novel device, called the synthetic instrument, to leverage the information contained in multiple treatments for causal effect identification and estimation. We show that under linear structural equation models, the problem of causal effect estimation can be formulated as an $\ell_0$ penalization problem, and hence can be solved efficiently using off-the-shelf software. Simulations show that our approach outperforms state-of-the-art methods in both low-dimensional and high-dimensional settings. We further illustrate our method using a mouse obesity dataset.

Linbo Wang

University of Toronto

Yanxun Xu

Personal Website

PhD, Associate Professor, Department of Applied Mathematics and Statistics, Johns Hopkins University
yanxun.xu@jhu.edu
A Bayesian Reinforcement Learning Framework for Optimizing Sequential Combination Antiretroviral Therapy in People with HIV
Numerous adverse effects (e.g., depression) have been reported for combination antiretroviral therapy (cART) despite its remarkable success on viral suppression in people with HIV (PWH). To improve long-term health outcomes for PWH, there is an urgent need to design personalized optimal cART with the lowest risk of comorbidity in the emerging field of precision medicine for HIV. Large-scale HIV studies offer researchers unprecedented opportunities to optimize personalized cART in a data-driven manner. However, the large number of possible drug combinations for cART makes the estimation of cART effects a high-dimensional combinatorial problem, imposing challenges in both statistical inference and decision-making. We develop a Bayesian reinforcement learning framework for optimizing sequential cART assignments. Applying the proposed approach to a dataset from the Women's Interagency HIV Study, we demonstrate its clinical utility in assisting physicians to make effective treatment decisions, serving the purpose of both viral suppression and comorbidity risk reduction.

Yanxun Xu

Johns Hopkins University

Yingqi Zhao

Personal Website

PhD, Associate Professor, Public Health Sciences Division, Fred Hutch
yqzhao@fredhutch.org
Constructing Stabilized Dynamic Surveillance Rules for Optimal Monitoring Schedules

Yingqi Zhao received her PhD in biostatistics from the University of North Carolina, Chapel Hill in 2012. She is currently an Associate Professor at Fred Hutchinson Cancer Research Center. Her research focus includes methodologies for personalized medicine, dynamic treatment regimes, observational studies and machine learning. Specific applications of this work include cancer treatment and prevention, health care delivery for complex type II diabetes patients and childhood obesity surveillance. Her work in personalized medicine is particularly notable for these applications and has been the basis for much subsequent work on developing biomarker-based treatment rules. She is actively engaged in collaborations with the SWOG clinical oncology cooperative group.

Yingqi Zhao

Fred Hutch

Lu Wang

Personal Website

University of Michigan
luwang@umich.edu
Estimating the Optimal Dynamic Treatment Regime with Restrictions Using Observational Data
A dynamic treatment regime (DTR) is a sequence of decision rules that provide guidance on how to treat individuals based on their static and time-varying status. Existing observational data are often used to generate hypotheses about effective DTRs. A common challenge with observational data, however, is the need for analysts to consider "restrictions" on the treatment sequences. Such restrictions may be necessary for settings where (i) one or more treatment sequences that were offered to individuals when the data were collected are no longer considered viable in practice; (ii) specific treatment sequences are no longer available; or (iii) the scientific focus of the analysis concerns a specific type of treatment sequence (e.g., "stepped-up" treatments). To address this challenge, we propose a Restricted Tree-based Reinforcement Learning (RT-RL) method that searches for an interpretable DTR with the maximum expected outcome, given a (set of) user-specified restriction(s), which specifies treatment options (at each stage) that ought not to be considered as part of the estimated tree-based DTR. In simulations, we evaluate the performance of RT-RL versus the standard approach of ignoring the partial data for individuals not following the (set of) restriction(s). The method is illustrated using an observational dataset to estimate a two-stage stepped-up DTR for guiding the level of care placement for adolescents with substance use disorder.

Lu Wang

University of Michigan

Dehan Kong

Personal Website

University of Toronto
dehan.kong@utoronto.ca
Fighting Noise with Noise: Causal Inference with Many Candidate Instruments
Instrumental variable methods provide useful tools for inferring causal effects in the presence of unmeasured confounding. To apply these methods with large-scale data sets, a major challenge is to find valid instruments from a possibly large candidate set. In practice, most of the candidate instruments are often not relevant for studying a particular exposure of interest. Moreover, not all relevant candidate instruments are valid as they may directly influence the outcome of interest. In this article, we propose a data-driven method for causal inference with many candidate instruments that addresses these two challenges simultaneously. A key component of our proposal is a novel resampling method, which constructs pseudo variables to remove irrelevant candidate instruments having spurious correlations with the exposure. Synthetic data analyses show that the proposed method performs favourably compared to existing methods. We apply our method to a Mendelian randomization study estimating the effect of obesity on health-related quality of life.

Dehan Kong

University of Toronto

Mireille Schnitzer

Personal Website

Université de Montréal
mireille.schnitzer@umontreal.ca
Outcome-Adaptive LASSO for Confounder Selection with Time-Varying Treatments
Data sparsity is a common problem when conducting causal inference with time-varying binary treatments, especially when treatment can change over many time-points. Many methods involve weighting by the inverse of the probability of treatment, which requires modeling the probability of treatment at each time point. Under sparsity, it is possible to pool these models over time, but when correlations between covariates and treatment vary over time, this can lead to bias. Furthermore, with a large covariate space assumed to be a non-minimal sufficient adjustment set, reducing the adjustment set can greatly improve the variance of the estimator. We consider a novel approach to longitudinal confounder selection using a longitudinal outcome adaptive fused LASSO that will data-adaptively select covariates and collapse the treatment model parameters over time-points with the goal of improving the efficiency of the estimator while minimizing confounding bias.
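To make the weighting step mentioned in the abstract above concrete, here is a minimal sketch of unstabilized inverse-probability-of-treatment weights for a binary treatment observed at several time points. It is not the outcome-adaptive fused LASSO from the talk: the function name `iptw_weights` is hypothetical, and the propensities are assumed to have been estimated already by some treatment model.

```python
from math import prod


def iptw_weights(treatments, propensities):
    """Unstabilized inverse-probability-of-treatment weights.

    treatments[i][t]   : observed binary treatment (0 or 1) of subject i at time t
    propensities[i][t] : assumed, pre-estimated P(A_t = 1 | history) for subject i
    """
    weights = []
    for a_row, p_row in zip(treatments, propensities):
        # Probability of the treatment actually received at each time point.
        probs = [p if a == 1 else 1.0 - p for a, p in zip(a_row, p_row)]
        # The weight is the inverse of the product over time points.
        weights.append(1.0 / prod(probs))
    return weights


# Toy data: two subjects, two time points (made-up numbers).
A = [[1, 0], [1, 1]]
P = [[0.5, 0.5], [0.8, 0.5]]
print(iptw_weights(A, P))
```

Because the weight is a product over time points, a few small per-period probabilities can produce very large weights, which is one reason sparsity and the choice of adjustment set matter in this setting.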

Mireille Schnitzer

Université de Montréal

Yifan Cui

Personal Website

Zhejiang University
cuiyf@zju.edu.cn
Estimating Heterogeneous Treatment Effects with Right-Censored Data via Causal Survival Forests
Forest-based methods have recently gained in popularity for non-parametric treatment effect estimation. Building on this line of work, we introduce causal survival forests, which can be used to estimate heterogeneous treatment effects in a survival and observational setting where outcomes may be right-censored. Our approach relies on orthogonal estimating equations to robustly adjust for both censoring and selection effects under unconfoundedness. In our experiments, we find our approach to perform well relative to a number of baselines.

Yifan Cui

Zhejiang University

Theis Lange

Personal Website

University of Copenhagen
thlan@sund.ku.dk
Introducing The Joint Initiative for Causal Inference – A New Way to Work Together for Causal Inference Development
Theis Lange is currently Professor and head of the Department of Public Health at the University of Copenhagen. His primary fields of research are causal inference and mediation, statistical analysis of clinical trials, and non-linear dynamic models.

Theis Lange

University of Copenhagen

Helene Charlotte Wiese Rytgaard

Personal Website

University of Copenhagen
hely@sund.ku.dk
Continuous-Time TMLE for Multivariate Causal Parameters in Time-to-Event Settings
Targeted learning (TMLE) is a general methodology for semiparametric efficient substitution estimation of causal parameters that combines machine learning with asymptotic statistical inference. The continuous-time TMLE is a generalization of the targeted learning methods for longitudinal data settings where interventions, covariates and outcome can happen at any subject-specific point in time. This talk considers the continuous-time TMLE for causal effect estimation in classical time-to-event settings with baseline treatment decisions possibly confounded by pre-treatment covariates. Particular focus will be put on estimation of survival and absolute risk probabilities simultaneously across time and event types. The presented estimators are not only asymptotically linear and efficient, following an asymptotic distribution fully characterized by the (multivariate) nonparametric efficient influence function, but are also guaranteed to respect parameter space constraints such as monotonicity and probabilities summing to 1.

Helene Charlotte Wiese Rytgaard is currently Assistant Professor at the Section of Biostatistics, University of Copenhagen. Her research interests are within the fields of causal inference, time-to-event analysis and machine learning, and her work is concerned with the use of data-adaptive methods for drawing valid causal conclusions based on large-scale time-varying observational data. One of her main focus areas is the further development of methods for proper adjustment for time-dependent confounding in the analysis of longitudinal data.

Helene Charlotte Wiese Rytgaard

University of Copenhagen

Edwin Fong

Novo Nordisk
CHEF@novonordisk.com
Drop-in of Concomitant Medication
Randomization within a clinical trial ensures that the allocation of treatment is unconfounded, but intercurrent events which occur post-randomization are deemed irrelevant and not accounted for under the intention-to-treat principle. Within clinical trials in diabetes, the initiation of concomitant medication is a key intercurrent event. In this talk, I will investigate the use of longitudinal targeted maximum likelihood estimation (LTMLE) to account for the imbalanced drop-in of concomitant medication between the treatment and placebo arms.

I am currently a data scientist within Methods, Innovation & Outreach at Novo Nordisk, where I develop and apply novel statistical methods within the healthcare setting in collaboration with academia. Previously, I completed my PhD at the Department of Statistics, University of Oxford and the Alan Turing Institute, supervised by Professor Chris Holmes. I am broadly interested in causal inference, machine learning, and Bayesian inference, with a primary focus on applications to clinical trial and observational data. During my PhD, I focused on the foundational intersection between prediction and Bayesian inference. I also investigated scalable Bayesian nonparametric methods such as the Bayesian bootstrap, with a focus on computation and model misspecification.

Edwin Fong

Novo Nordisk

Lauren Dang

University of California, Berkeley
lauren.eyler@berkeley.edu
A Cross-Validated Targeted Maximum Likelihood Estimator for Data-Adaptive Experiment Selection Applied to the Augmentation of RCT Control Arms with Observational Data
When running an adequately powered randomized clinical trial (RCT) is infeasible, augmenting the control arm with external data may increase power but at the risk of introducing bias. Existing methods for combining data sources generally rely on stringent assumptions or may have decreased coverage or power in the presence of bias. We propose a cross-validated targeted maximum likelihood estimator (CV-TMLE) to data-adaptively select the optimal experiment - RCT only (if no unbiased external data exists) or RCT with external data. Our algorithm maps the union of empirical data distributions into a selector of the experiment that optimizes the bias-variance tradeoff for the causal effect of interest, where bias is estimated from the combined data with or without information from a negative control outcome (NCO). We apply this CV-TMLE to estimate the average treatment effect (ATE) using simulated RCT and external data with bias ranging from zero to five times the standard error of the RCT CV-TMLE estimator. The CV-TMLE (for all levels of bias: coverage 93-97%, power 62-81%) had similar coverage and improved power compared to the RCT-only CV-TMLE (coverage 95%, power 62%), a t-test (coverage 94%, power 25%), and the Bayesian meta-analytic-predictive priors (RBesT) approach (coverage 93-97%, power 21-33%) and improved coverage compared to a TMLE-based test-then-pool approach (coverage 73-95%, power 62-93%). We also apply the experiment-selector CV-TMLE to distinguish biased versus unbiased extra controls by region in an analysis of the effect of liraglutide on change in hemoglobin A1c from the LEADER trial.

Lauren Dang

University of California, Berkeley

Kim Katrine Bjerring Clemmensen

Novo Nordisk, MD, PhD, Senior Data Scientist, Postdoc
KKRC@novonordisk.com
Generalisability and Transportability in the Context of Target Trial Emulations
The target trial approach for observational data analysis was formally proposed by Hernán and Robins in 2016. The target trial is the (hypothetical) randomized trial that we would have conducted to answer our question of interest had it been possible to conduct a randomized controlled trial (RCT). Using this approach, the analysis of observational data can be viewed as an attempt to emulate that target trial. In this presentation, the target trial approach is briefly introduced, and examples of how the approach can be used for questions on generalisability and transportability of results from RCTs will be given.

Kim Katrine Bjerring Clemmensen

Novo Nordisk

Anqi Zhao

National University of Singapore
No Star Is Good News: A Unified Look at Rerandomization Based on P-Values from Covariate Balance Tests
RCTs balance all covariates on average and provide the gold standard for estimating treatment effects. Chance imbalances nevertheless exist to some degree in realized treatment allocations, subjecting subsequent inference to possibly large variability. Modern scientific publications require the reporting of covariate balance tables with not only covariate means by treatment group but also the associated p-values from significance tests of their differences. The practical need to avoid small p-values renders balance checks and rerandomization by hypothesis testing an attractive tool for improving covariate balance in RCTs. We examine a variety of potentially useful schemes for rerandomization based on p-values (ReP) from covariate balance tests, and demonstrate their impact on subsequent inference. The main findings are twofold. First, the estimator from the fully interacted regression is asymptotically the most efficient under all ReP schemes examined, and permits convenient regression-assisted inference identical to that under complete randomization. Second, ReP improves not only covariate balance but also the efficiency of the estimators from the unadjusted and additive regressions.
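As a rough illustration of the idea of rerandomization based on p-values, the sketch below redraws a completely randomized allocation until every covariate's balance test has a p-value of at least a chosen cutoff. It is not the speaker's implementation: the function names, the two-sample z-test with a normal approximation, the joint "all p-values above `alpha`" acceptance rule, and the default cutoff are all assumptions made for this example, and the abstract considers a variety of schemes.

```python
import math
import random


def balance_p_value(x, z):
    """Two-sided p-value for the difference in covariate means between arms.

    Assumption of this sketch: a two-sample z-test with a normal
    approximation. Each arm needs at least two units and some variation.
    """
    treated = [xi for xi, zi in zip(x, z) if zi == 1]
    control = [xi for xi, zi in zip(x, z) if zi == 0]

    def mean(v):
        return sum(v) / len(v)

    def var(v):
        m = mean(v)
        return sum((vi - m) ** 2 for vi in v) / (len(v) - 1)

    se = math.sqrt(var(treated) / len(treated) + var(control) / len(control))
    stat = (mean(treated) - mean(control)) / se
    # P(|Z| > |stat|) for a standard normal Z.
    return math.erfc(abs(stat) / math.sqrt(2))


def rerandomize_by_p_value(covariates, n_treated, alpha=0.1,
                           max_draws=10_000, seed=0):
    """Redraw allocations until all balance p-values are at least alpha.

    `covariates` is a list of covariate columns, each of length n.
    The acceptance rule and defaults are hypothetical choices.
    """
    rng = random.Random(seed)
    n = len(covariates[0])
    base = [1] * n_treated + [0] * (n - n_treated)
    for _ in range(max_draws):
        z = base[:]
        rng.shuffle(z)
        if all(balance_p_value(x, z) >= alpha for x in covariates):
            return z
    raise RuntimeError("no acceptable allocation found within max_draws")
```

A stricter cutoff `alpha` rejects more allocations and so trades more redraws for better balance; the inferential consequences of that choice are the subject of the talk.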

Anqi Zhao

National University of Singapore

Hanzhong Liu

Associate Professor
Tsinghua University

Design-Based Theory for Cluster Rerandomization
Complete randomization balances covariates on average, but covariate imbalance often exists in finite samples. Rerandomization can ensure covariate balance in the realized experiment by discarding the undesired treatment assignments. Many field experiments in public health and social sciences assign the treatment at the cluster level due to logistical constraints or policy considerations. Moreover, they are frequently combined with rerandomization in the design stage. We define cluster rerandomization as a cluster-randomized experiment compounded with rerandomization to balance covariates at the individual or cluster level. Existing asymptotic theory can only deal with rerandomization with treatments assigned at the individual level, leaving that for cluster rerandomization an open problem. To fill the gap, we provide a design-based theory for cluster rerandomization. Moreover, we compare two cluster rerandomization schemes that use prior information on the importance of the covariates: one based on the weighted Euclidean distance and the other based on the Mahalanobis distance with tiers of covariates. We demonstrate that the former dominates the latter with optimal weights and orthogonalized covariates. Last but not least, we discuss the role of covariate adjustment in the analysis stage and recommend covariate-adjusted procedures that can be conveniently implemented by least squares with the associated robust standard errors.
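The cluster rerandomization scheme based on a weighted Euclidean distance can be pictured with a small sketch. This is only an illustration under stated assumptions, not the design analysed in the talk: the helper names, the use of unweighted cluster-level covariate means, the squared-distance form, and the acceptance threshold are all hypothetical choices.

```python
import random


def cluster_means(unit_covariates, cluster_ids):
    """Average each covariate within clusters; returns {cluster: [means]}."""
    sums, counts = {}, {}
    for row, c in zip(unit_covariates, cluster_ids):
        acc = sums.setdefault(c, [0.0] * len(row))
        for k, value in enumerate(row):
            acc[k] += value
        counts[c] = counts.get(c, 0) + 1
    return {c: [s / counts[c] for s in acc] for c, acc in sums.items()}


def weighted_distance(means, treated, weights):
    """Weighted squared Euclidean distance between the two arms' averages
    of cluster-level covariate means."""
    control = [c for c in means if c not in treated]
    dist = 0.0
    for k, w in enumerate(weights):
        t_avg = sum(means[c][k] for c in treated) / len(treated)
        c_avg = sum(means[c][k] for c in control) / len(control)
        dist += w * (t_avg - c_avg) ** 2
    return dist


def cluster_rerandomize(means, n_treated, weights, threshold,
                        max_draws=10_000, seed=0):
    """Assign whole clusters to treatment, discarding draws whose weighted
    distance exceeds the (hypothetical) threshold."""
    rng = random.Random(seed)
    clusters = sorted(means)
    for _ in range(max_draws):
        treated = set(rng.sample(clusters, n_treated))
        if weighted_distance(means, treated, weights) <= threshold:
            return treated
    raise RuntimeError("no acceptable allocation found within max_draws")
```

Larger weights on covariates believed to matter more make imbalance in those covariates count more heavily against an allocation, which is how prior information on covariate importance enters this kind of scheme.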

Hanzhong Liu is an Associate Professor at the Center for Statistical Science in the Department of Industrial Engineering at Tsinghua University. He received his PhD in Statistics from Peking University and then worked with Prof. Bin Yu as a Postdoctoral Scholar in the Department of Statistics at UC Berkeley. His research focuses on developing statistical theory and methodologies for solving high-dimensional data problems and drawing causal inference.

Hanzhong Liu

Tsinghua University

Feng Xie

Beijing Technology and Business University
Identification of Linear Latent Hierarchical Structure
Traditional causal discovery methods mainly focus on estimating causal relations among measured variables, but in many real-world problems, such as questionnaire-based psychometric studies, measured variables are generated by latent variables that are causally related. In this talk, we will investigate the problem of discovering the hidden causal variables and estimating the causal structure, including both the causal relations among latent variables and those between latent and measured variables. We relax the frequently-used measurement assumption and allow the children of latent variables to be latent as well, and hence deal with a specific type of latent hierarchical causal structure.

Feng Xie is currently an associate professor in the Department of Applied Statistics at Beijing Technology and Business University (BTBU). Before joining BTBU, he completed his Ph.D. in the School of Computer Science at Guangdong University of Technology (2017 to 2020), and then did postdoctoral research in the Department of Probability and Statistics at Peking University from 2020 to 2022. From 2019 to 2020, he was a visiting Ph.D. student in the Department of Philosophy at Carnegie Mellon University. His research interests include causal discovery, latent variable models, and causal representation learning.

Feng Xie

Beijing Technology and Business University

Wang Miao

Peking University
Paradoxes and Resolutions for Semiparametric Data Fusion with Individual Data and Summary Statistics
Suppose we have available individual data from an internal study and various types of summary statistics from relevant external studies. External summary statistics have been used as constraints on the internal data distribution, which promised to improve the statistical inference; however, paradoxical results arise in such data integration: efficiency loss may occur if the uncertainty of the summary statistics is not negligible, and estimation bias can emerge if they are obtained from a different population from the internal study. We investigate these paradoxical results in a semiparametric framework. We establish the semiparametric efficiency bound for estimating a general functional of the internal data distribution, which is shown to be no larger than that using only internal data. We propose a data-fused efficient estimator that achieves this bound so that the efficiency paradox is resolved. This initial data-fused efficient estimator is further regularized with an adaptive lasso penalty so that the resultant estimator can achieve the same asymptotic distribution as the oracle one that uses only unbiased summary statistics, which resolves the bias paradox. Simulations and an application to Helicobacter pylori infection data are used to illustrate the proposed methods.

Wang Miao is currently Assistant Professor at the School of Mathematical Sciences, Peking University. He obtained BS and PhD degrees in 2012 and 2017 at Peking University, and then did postdoctoral research at the Department of Biostatistics at Harvard University. His research interests span causal inference, missing data analysis, data fusion, semiparametrics, and their application in modern data science and AI research.

Wang Miao

Peking University

Xiao-Hua Zhou

PKU Endowed Chair Professor
Peking University
Causal Inference of Truncation-by-Death with Unmeasured Confounding
Clinical studies often encounter truncation by death, which may render some outcomes undefined. Statistical analysis based solely on observed survivors may lead to biased results because the characteristics of survivors may differ between treatment groups. In this case, the commonly used meaningful causal parameter is the survivor average causal effect (SACE), which may not be identifiable when there is unmeasured confounding between the treatment assignment and survival or outcome processes. In this talk, we first show that the survivor average causal effect on the control is identifiable based on a substitutional variable under appropriate assumptions.
Next, we propose an augmented inverse probability weighting (AIPW) type estimator for this estimand with robustness to model misspecification. Finally, the proposed method is applied to investigate the effects of allogeneic stem cell transplantation types on leukemia relapse. This is joint work with Drs. Yuhao Deng and Y. Chang at Peking University.
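For readers unfamiliar with AIPW-type estimators, the sketch below shows the textbook AIPW estimator of the average treatment effect. It is not the estimator proposed in the talk, which targets a survivor average causal effect and relies on a substitutional variable; the function name and arguments are hypothetical, and the propensity scores and outcome-model predictions are assumed to have been fitted elsewhere.

```python
def aipw_ate(y, a, propensity, mu1, mu0):
    """Textbook AIPW estimate of the average treatment effect.

    y          : observed outcomes
    a          : treatment indicators (1 = treated, 0 = control)
    propensity : fitted P(A = 1 | X), strictly between 0 and 1
    mu1, mu0   : fitted outcome predictions under treatment and control
    """
    total = 0.0
    for yi, ai, ei, m1, m0 in zip(y, a, propensity, mu1, mu0):
        total += (
            m1 - m0                              # outcome-model contrast
            + ai * (yi - m1) / ei                # weighted residual, treated
            - (1 - ai) * (yi - m0) / (1 - ei)    # weighted residual, control
        )
    return total / len(y)
```

The residual terms correct the outcome-model contrast using inverse probability weights, so the estimate remains consistent if either the outcome model or the propensity model is correctly specified. This is the kind of robustness to model misspecification that AIPW-type estimators are known for.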

Xiao-Hua Zhou

Peking University

Sara Magliacane

University of Amsterdam & Research Scientist at IBM Watson AI Lab
sara.magliacane@gmail.com
Causality-Inspired ML: What can Causality Do for ML?
Applying machine learning to real-world cases often requires methods that are robust w.r.t. heterogeneity, data that are missing not at random or corrupt, selection bias, non-i.i.d. data, etc., and that can generalize across different domains. Moreover, many tasks are inherently trying to answer causal questions and gather actionable insights, a task for which correlations are usually not enough. Several of these issues are addressed in the rich causal inference literature. On the other hand, classical causal inference methods often require either complete knowledge of a causal graph or enough experimental data (interventions) to estimate it accurately. Recently, a new line of research has focused on causality-inspired machine learning, i.e. on the application of ideas from causal inference to machine learning methods without necessarily knowing or even trying to estimate the complete causal graph. In this talk, I will present an example of this line of research in the unsupervised domain adaptation case, in which we have labelled data in a set of source domains and unlabelled data in a target domain ("zero-shot"), for which we want to predict the labels. In particular, given certain assumptions, our approach is able to select a set of provably "stable" features (a separating set), for which the generalization error can be bounded, even in case of arbitrarily large distribution shifts. As opposed to other works, it also exploits the information in the unlabelled target data, allowing for some unseen shifts w.r.t. the source domains. While using ideas from causal inference, our method never aims at reconstructing the causal graph or even the Markov equivalence class, showing that causal inference ideas can help machine learning even in this more relaxed setting.

Sara Magliacane

University of Amsterdam & IBM Watson AI Lab

Fredrik Johansson

Chalmers University of Technology, Sweden
fredrik.johansson@chalmers.se
Efficient Learning Using Privileged Information with Known Causal Structure
In domains where sample sizes are limited, efficient learning is critical. Yet, for many machine learning problems, standard practice routinely leaves substantial information unused. One example is prediction of an outcome at the end of a time series based on variables collected at a baseline time point, for example, the 30-day risk of mortality for a patient upon admission to a hospital. In applications, it is common that intermediate samples, collected between baseline and end points, are discarded, as they are not available as input for prediction when the learned model is used. We say that this information is privileged, as it is available only at training time. In this talk, we show that making use of known causal structure and privileged information from intermediate time series can lead to much more efficient learning. We give conditions under which it is provably preferable to classical learning, and a suite of empirical results to support these findings.

Fredrik Johansson

Chalmers University of Technology

Kun Zhang

Carnegie Mellon University & Mohamed bin Zayed University of Artificial Intelligence
kunz1@cmu.edu
Advances in Causal Representation Learning
This talk is concerned with causal representation learning, which aims to reveal the underlying high-level hidden causal variables and their relations. It can be seen as a special case of causal discovery, whose goal is to recover the underlying causal structure or causal model from observational data. The modularity property of a causal system implies properties of minimal changes and independent changes of causal representations, and I will explain how such properties make it possible to recover the underlying causal representations from observational data with identifiability guarantees: under appropriate assumptions, the learned representations are consistent with the underlying causal process. The talk will consider various settings with independent and identically distributed (i.i.d.) data, temporal data, or data with distribution shift as input, and demonstrate when identifiable causal representation learning can benefit from the flexibility of deep learning and when it has to impose parametric assumptions on the causal process.

Kun Zhang

Carnegie Mellon University & Mohamed bin Zayed University of Artificial Intelligence

Zhichao Jiang

Sun Yat-sen University
jzcpanda@163.com
Safe Policy Learning through Extrapolation
Algorithmic recommendations and decisions have become ubiquitous in today’s society. Many of these and other data-driven policies are based on known, deterministic rules to ensure their transparency and interpretability. This is especially true when such policies are used for public policy decision-making. For example, algorithmic pre-trial risk assessments, which serve as our motivating application, provide relatively simple, deterministic classification scores and recommendations to help judges make release decisions. Unfortunately, existing methods for policy learning are not applicable because they require existing policies to be stochastic rather than deterministic. We develop a robust optimization approach that partially identifies the expected utility of a policy, and then finds an optimal policy by minimizing the worst-case regret. The resulting policy is conservative but has a statistical safety guarantee, allowing the policy-maker to limit the probability of producing a worse outcome than the existing policy. Lastly, we apply the proposed methodology to a unique field experiment on pre-trial risk assessments.

Zhichao Jiang

Sun Yat-sen University

Zhonghua Liu

Columbia University
zl2509@cumc.columbia.edu
Mendelian Randomization Mixed-Scale Treatment Effect Robust Identification and Estimation for Causal Inference
Standard Mendelian randomization analysis can produce biased results if the genetic variant defining an instrumental variable (IV) is confounded and/or has a horizontal pleiotropic effect on the outcome of interest not mediated by the treatment variable. We provide novel identification conditions for the causal effect of a treatment in the presence of unmeasured confounding by leveraging a possibly invalid IV for which both the IV independence and exclusion restriction assumptions may be violated. The proposed Mendelian Randomization Mixed-Scale Treatment Effect Robust Identification (MR MiSTERI) approach relies on (i) an assumption that the treatment effect does not vary with the possibly invalid IV on the additive scale; (ii) that the confounding bias does not vary with the possibly invalid IV on the odds ratio scale; and (iii) that the residual variance for the outcome is heteroscedastic with respect to the possibly invalid IV. Although assumptions (i) and (ii) have respectively appeared in the IV literature, assumption (iii) has not; we formally establish that their conjunction can identify a causal effect even with an invalid IV. MR MiSTERI is shown to be particularly advantageous in the presence of pervasive heterogeneity of pleiotropic effects on the additive scale. We propose a simple and consistent three-stage estimator that can be used as a preliminary estimator to a carefully constructed efficient one-step-update estimator. In order to incorporate multiple, possibly correlated and weak invalid IVs, a common challenge in MR studies, we develop a MAny Weak Invalid Instruments (MR MaWII MiSTERI) approach for strengthened identification and improved estimation accuracy. Both simulation studies and UK Biobank data analysis results demonstrate the robustness of the proposed methods.

Zhonghua Liu

Columbia University

Lin Liu

Shanghai Jiao Tong University
linliu@sjtu.edu.cn
DNN-Based Causal Inference and New Stable Empirical Higher Order Influence Functions
Deep neural nets (DNNs) have been successful in a variety of statistical problems, from image classification to solving nonlinear inverse problems. In recent years, there has also been a growing body of literature studying how well DNNs can learn causal effects. In this talk, we show that the current theoretical results are, however, largely unsatisfactory, potentially due to the implicit bias of the DNN training process. As a partial remedy, in Liu et al. 2020 Stat. Sci., we developed a higher-order influence function (HOIF)-based assumption-lean hypothesis testing procedure that can detect bias of causal effect estimators regardless of the properties of the true or the estimated nuisance functions. But the proposed procedure has many unresolved open problems. In the later part of the talk, we will introduce new numerically stable, empirical HOIFs. The new HOIFs share very similar asymptotic properties to those proposed before, yet enjoy much better finite-sample properties. Finally, we demonstrate the improved finite-sample performance with simulation studies.

Lin Liu

Shanghai Jiao Tong University

Jingshu Wang

University of Chicago
jingshuw@uchicago.edu
Causal Mediation Analysis with Mendelian Randomization
Understanding the pathogenic mechanism of common diseases is a fundamental goal in clinical research. As randomized controlled experiments are not always feasible, Mendelian Randomization (MR), which uses natural genetic mutations as instruments, has become a popular alternative method for probing the causal mechanisms of common diseases. However, current MR methods typically ignore the temporal relationship between the risk factors and disease progression. In this talk, I will discuss a statistical approach based on Mendelian Randomization to evaluate the causal mediation effects of a sequence of risk factors in a temporal order on a later disease status. To increase efficiency and robustness, our approach is built on a fully Bayesian framework and allows adjustment for pleiotropic effects. I will illustrate the performance of our approach both in simulations and in real data case studies.

Jingshu Wang

University of Chicago

Wei Li

Renmin University of China
weilistat@ruc.edu.cn
Retrospective Causal Inference with Multiple Effect Variables
As highlighted in Dawid (2000) and Pearl & Mackenzie (2018), deducing the causes of given effects is a more challenging problem than evaluating the effects of causes in causal inference. Lu et al. (2022) proposed an approach for deducing causes of a single effect variable based on posterior causal effects. In many applications, there are multiple effect variables, and thus they can be used simultaneously to more accurately deduce the causes. To retrospectively deduce causes from multiple effects, we propose multivariate posterior total, intervention and direct causal effects conditional on the observed evidence.
We describe the assumptions of no-confounding and monotonicity, under which we prove identifiability of the multivariate posterior causal effects and provide their identification equations. When the causal relationships among the causes and effects are described by a causal network, both the assumptions and identification equations can be simplified.
The proposed approach can be applied for causal attributions, medical diagnosis, blame and responsibility in various studies with multiple effect or outcome variables. Two examples are used to illustrate the proposed approach.

Wei Li

Renmin University of China

Jinzhu Jia

School of Public Health, Peking University
jzjia@pku.edu.cn
Evaluating Causes of Effects by Posterior Effects of Causes
For the case with a single causal variable, Dawid et al. (2014) defined the probability of causation and Pearl (2000) defined the probability of necessity to assess the causes of effects. For a case with multiple causes which may affect each other, this paper defines the posterior total and direct causal effects based on the evidence observed for post-treatment variables, which could be viewed as measurements of causes of effects. The posterior causal effects involve the probabilities of counterfactual variables. Thus, like probability of causation, probability of necessity and the direct causal effects, the identifiability of the posterior total and direct causal effects requires more assumptions than the identifiability of the traditional causal effects conditional on pre-treatment variables. We present assumptions required for the identifiability of the posterior causal effects and provide identification equations. Further, when the causal relationships among multiple causes and an endpoint may be depicted by causal networks, we can simplify both the required assumptions and the identification equations of the posterior total and direct causal effects. Finally, using numerical examples, we compare the posterior total and direct causal effects with other measures for evaluating the causes of effects and the population attributable risks.

Jinzhu Jia

Peking University

Gary Chan

Department of Biostatistics, University of Washington
kcgchan@uw.edu
A Simple Asymptotic Non-Conservative Test for Indirect Effects

Gary Chan

University of Washington

Dylan Small

Department of Statistics and Data Science, The Wharton School, University of Pennsylvania
dsmall@wharton.upenn.edu
Testing an Elaborate Theory of a Causal Hypothesis
When R.A. Fisher was asked what can be done in observational studies to clarify the step from association to causation, he replied, “Make your theories elaborate” -- when constructing a causal hypothesis, envisage as many different consequences of its truth as possible and plan observational studies to discover whether each of these consequences is found to hold. William Cochran called “this multi-phasic attack…one of the most potent weapons in observational studies.” Statistical tests for the various pieces of the elaborate theory help to clarify how much the causal hypothesis is corroborated. In practice, the degree of corroboration of the causal hypothesis has been assessed by a verbal description of which of the several tests provides evidence for which of the several predictions. This verbal approach can miss quantitative patterns. We develop a quantitative approach to making statistical inference about the amount of the elaborate theory that is supported by evidence.

Dylan Small

University of Pennsylvania

Emilija Perkovic

University of Washington
perkovic@uw.edu
Total Causal Effects in MPDAGs: Identification and Minimal Enumeration
We present a necessary and sufficient causal identification formula for maximally oriented partially directed acyclic graphs (MPDAGs) and a recursive algorithm for possible causal effect enumeration when the causal effect is not identified. We also characterize the minimal additional edge orientations required to identify a given total effect. A recursive algorithm is developed to enumerate subclasses of DAGs, such that the total effect in each subclass is identified as a distinct functional of the observed distribution. This result resolves an issue with existing methods, which often report possible total effects with duplicates, namely those numerically distinct due to sampling variability but causally identical.

Emilija Perkovic

University of Washington

Ricardo Silva

Professor of Statistical Machine Learning and Data Science
Department of Statistical Science, University College London, UK
ricardo.silva@ucl.ac.uk
On Prediction, Action and Interference in Algorithmic Fairness

Ricardo Silva is a Professor of Statistical Machine Learning and Data Science at the Department of Statistical Science, UCL. He also holds an Adjunct Faculty position at the Gatsby Computational Neuroscience Unit, UCL, and a Faculty Fellowship at the Alan Turing Institute. Ricardo obtained a PhD in Machine Learning from Carnegie Mellon University in 2005, followed by postdoctoral positions at the Gatsby Unit and at the Statistical Laboratory, University of Cambridge. His main interests are in causal inference, graphical models, and probabilistic machine learning. His research has received funding from organisations such as EPSRC, Innovate UK, the Office of Naval Research, Winton Research and Adobe Research. Ricardo has also served on the senior program committee of several top machine learning conferences, including acting as a Senior Area Chair at the NeurIPS and ICML conferences and being a Program Chair and Conference Chair for the Uncertainty in Artificial Intelligence conference.

Ricardo Silva

University College London

Jake Fawkes

PhD Student, Department of Statistics, University of Oxford, UK
jake.fawkes@stats.ox.ac.uk
Problems due to Selection and an Overcommitment to Ignorability in Causal Fairness
In this talk we discuss which causal models can correctly capture the causal features in fairness problems. We begin by pointing out that the current causal fairness literature often commits to DAGs with independent noise and ancestrally closed sensitive attributes. From here we argue that these assumptions are often significantly too strong to capture the correct causal features such as counterfactuals and causal effects. Our argument is based upon two points: that these models commit to ignorability, and that fairness datasets arise from a complex selection process. We derive conditions this selection must satisfy for ignorability to hold and discuss implications for causal fairness when this assumption does not hold.

Jake Fawkes is a PhD student at the University of Oxford Statistics Department working under the supervision of Robin Evans and Dino Sejdinovic. His research centres on causal machine learning with a focus on applying causality to the fairness and explainability of machine learning methods.

Jake Fawkes

University of Oxford

Razieh Nabi

Assistant Professor, Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, USA
razieh.nabi@emory.edu
A Causal and Counterfactual View of (Un)fairness in Automated Decision Making
Despite the illusion of objectivity, algorithms make use of subjective judgements of human beings at every step of their development. A particular worry in the context of automated decision making is perpetuating injustice, i.e., when maximizing “utility” maintains, reinforces, or even introduces unfair dependencies between sensitive features (e.g., race, gender, age, sexual orientation), decisions, and outcomes. It is therefore essential that automated decisions respect principles of fairness, particularly in socially-impactful settings such as healthcare, social welfare, and criminal justice. In this talk, we show how to use methods from causal inference and constrained optimization to make optimal but fair decisions that would “break the cycle of injustice” by correcting for the unfair dependence of both decisions and outcomes on sensitive features.

Razieh Nabi is a Rollins Assistant Professor in the Department of Biostatistics and Bioinformatics at Emory Rollins School of Public Health. Her research is situated at the intersection of machine learning and statistics, focusing on causal inference and its applications in healthcare and social justice. More broadly, her work spans problems in causal inference, mediation analysis, algorithmic fairness, semiparametric inference, graphical models, and missing data. She received her PhD (2021) in Computer Science from Johns Hopkins University.

Razieh Nabi

Emory University
Learn About

Daniel Malinsky

Personal Website

Assistant Professor, Department of Biostatistics, Mailman School of Public Health, Columbia University, USA
d.malinsky@columbia.edu
Causal Determinants of Postoperative Length of Stay in Cardiac Surgery Using Causal Graphical Learning
Many goals within causal inference, including estimating average treatment effects and understanding path-specific mechanisms, depend on knowing the qualitative causal structure underlying a domain. In this work we apply methods for graphical causal discovery (specifically the FCI algorithm) to observational data in the form of electronic health records (EHR) from Johns Hopkins Hospital. Our goal is to understand the causal determinants of postoperative length of stay for patients undergoing cardiac surgery procedures, in order to inform possible interventions that support faster patient recovery. We discuss the challenges in applying causal discovery methods to electronic health records and opportunities for future work.

Daniel Malinsky's methodological research focuses mostly on causal inference: developing statistical methods and machine learning tools to support inference about treatment effects, interventions, and policies. Current research topics include graphical structure learning (a.k.a. causal discovery or causal model selection), semiparametric inference, time series analysis, and missing data. Application areas of particular interest include environmental determinants of health and health disparities. Dr. Malinsky also studies algorithmic fairness: understanding and counteracting the biases introduced by data science tools deployed in socially-impactful settings. Finally, Dr. Malinsky has interests in the philosophy of science and the foundations of statistics. Before joining Columbia University, Dr. Malinsky was a postdoctoral fellow at Johns Hopkins University and he earned his PhD at Carnegie Mellon University.

Daniel Malinsky

Columbia University
Learn About

Chaochao Lu

Personal Website

PhD, University of Cambridge & Max Planck Institute for Intelligent Systems
cl641@cam.ac.uk
Learning Causal Representations for Generalization in Reinforcement Learning
A fundamental challenge in imitation and reinforcement learning is to learn policies, representations, or dynamics that do not build on spurious correlations and generalize beyond the specific environments that they were trained on. We investigate these generalization problems from a unified view. For this, we propose a general framework to tackle them with theoretical guarantees on both identifiability and generalizability under mild assumptions on environmental changes. By leveraging a diverse set of training environments, we construct a data representation that ignores any spurious features and consistently predicts target variables well across environments. Following this approach, we build invariant predictors in terms of policy, representations, and dynamics. We theoretically show that the resulting policies, representations, and dynamics are able to generalize to unseen environments. Extensive experiments on both synthetic and real-world datasets show that our methods attain improved generalization over a variety of baselines.

Chaochao Lu

University of Cambridge & Max Planck Institute for Intelligent Systems
Learn About

Jundong Li

Personal Website

Assistant Professor, University of Virginia
jundong@virginia.edu
Learning Causality with Graphs
The ability to learn causality is considered a significant component of human-level intelligence and can serve as the foundation of AI. In causality learning, one fundamental problem is to understand the causal effects of a specific treatment (e.g., prescription of medicine) on an important outcome (e.g., cure of a disease), with significant implications in various high-impact domains such as health care, education, and e-commerce. One prevalent way to solve the problem is to directly use observational data, since the alternative, randomized experiments, could be expensive, time-consuming, and even unethical in many scenarios. However, existing data-driven methods are often limited since they: (1) assume that observational data is independent and identically distributed (i.i.d.) and that different units cannot interfere with each other; and (2) ignore the influence of hidden confounders (i.e., the unobserved variables that affect both the treatment and the outcome). Meanwhile, real-world data is often connected and can be abstracted as graphs (e.g., social networks, biological networks, and knowledge graphs). The ubiquity of graph data across many influential areas also brings opportunities to control the influence of hidden confounders and build more effective models that yield unbiased causal effect estimates. In this talk, I will introduce our recent research efforts in causal effects learning with graphs. Specifically, we attempt to answer the following research questions: How to utilize graph information among observational data for causal effects learning? How to harness the power of historical information to tame the influence of hidden confounders for causal effects learning when the graph is continuously evolving?

Jundong Li is an Assistant Professor in the Department of Electrical and Computer Engineering, with a joint appointment in the Department of Computer Science, and School of Data Science. He received his Ph.D. degree in Computer Science at Arizona State University in 2019, M.Sc. degree in Computer Science at University of Alberta in 2014, and B.Eng. degree in Software Engineering at Zhejiang University in 2012. His research interests are generally in data mining and machine learning, with a particular focus on graph mining/graph machine learning, causal inference, and algorithmic fairness. As a result of his research work, he has published over 100 papers in high-impact venues (including KDD, WWW, IJCAI, AAAI, WSDM, EMNLP, CIKM, ICDM, SDM, ECML-PKDD, CSUR, TPAMI, TKDE, TKDD, TIST, etc), with over 5,500 citation count. He has won several prestigious awards, including SIGKDD 2022 Best Research Paper Award, NSF CAREER Award, JP Morgan Chase Faculty Research Award, Cisco Faculty Research Award, and being selected for the AAAI 2021 New Faculty Highlights roster.

Jundong Li

University of Virginia
Learn About

Fei Wang

Personal Website

Cornell University
feiwang.cornell@gmail.com
Emulating Clinical Trials with Large Scale Electronic Health Records
Drug discovery and development is an expensive and time-consuming process. Improving the efficiency and efficacy of clinical trials is crucial to the pharmaceutical industry. In recent years, large-scale real-world patient data, such as longitudinal electronic health records, have been accumulated. These data contain practice-based evidence, including treatment effectiveness and safety information for individual patients. Effective exploration of this information can help generate better hypotheses and inform clinical trial design. Clinical trial emulation refers to the process of mimicking clinical trials with real-world patient data. Because these data are retrospective, we need to carefully control for potential confounding factors to allow objective estimation of treatment effectiveness. In this talk, I will discuss some of our recent research on trial emulation for identifying repurposable drugs for treating Alzheimer's disease and for identifying potential symptoms and conditions of post-acute SARS-CoV-2 infection sequelae, and point out potential challenges and future directions.

Fei Wang

Cornell University
Learn About

Kun Kuang

Personal Website

Associate Professor, Zhejiang University
kunkuang@zju.edu.cn
Causal Inference with Instrumental Variables
Causal questions arise in many areas, such as health care, economics, political science, and digital marketing. Does a new medication perform better against a certain illness than the old ones? Does a new marketing strategy improve the sales of a certain product? All these questions can be addressed with causal inference techniques.
The gold standard for causal inference is the randomized experiment, for example, A/B testing. However, fully randomized experiments are usually extremely expensive and sometimes even infeasible. Hence, there is strong demand for automatic statistical approaches that infer causal effects from observational studies.
In this talk, we will present some new challenges for causal inference in real-world big data scenarios, including (1) high-dimensional and noisy variables, (2) unobserved confounders, and (3) complex treatment variables. We will mainly focus on the challenges from unobserved confounders and introduce recently proposed IV methods from both the causal inference and machine learning communities. Specifically, we first introduce how to combine confounder balancing techniques with an IV regression model to obtain a confounder-balanced IV regression. Then, we discuss how to generate a representation from observed variables that serves the role of an IV. Finally, we introduce how to learn a latent group IV with the Meta-EM algorithm from data fusion for causal inference with unobserved confounders.
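As background for the IV methods mentioned in this abstract, the following is a minimal sketch of classical two-stage least squares with one instrument and one treatment. It is not any of the speaker's methods (confounder-balanced IV regression, learned IV representations, or Meta-EM); the simulated data, the function name `two_stage_least_squares`, and all coefficients are assumptions made for illustration.

```python
import numpy as np

def two_stage_least_squares(y, t, z):
    """Estimate the effect of treatment t on outcome y using instrument z."""
    n = len(y)
    Z = np.column_stack([np.ones(n), z])
    # Stage 1: project the treatment onto the instrument.
    gamma, *_ = np.linalg.lstsq(Z, t, rcond=None)
    t_hat = Z @ gamma
    # Stage 2: regress the outcome on the fitted treatment.
    X = np.column_stack([np.ones(n), t_hat])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]

def simulate(n=20000, effect=2.0, seed=0):
    """Hypothetical data: u confounds t and y, z affects y only through t."""
    rng = np.random.default_rng(seed)
    u = rng.normal(size=n)            # unobserved confounder
    z = rng.normal(size=n)            # instrument
    t = 0.8 * z + u + rng.normal(size=n)
    y = effect * t + 1.5 * u + rng.normal(size=n)
    return y, t, z

y, t, z = simulate()
naive = np.polyfit(t, y, 1)[0]        # ordinary regression slope, biased by u
iv = two_stage_least_squares(y, t, z)
print(f"naive slope: {naive:.2f}, 2SLS estimate: {iv:.2f}")
```

In this simulation the naive slope absorbs the confounder's contribution, while the two-stage estimate should land near the true effect of 2 because the instrument is independent of the confounder by construction.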

Kun Kuang

Zhejiang University

PCIC 2021

Speakers

Learn About

Fan Li

Personal Website

Duke University
Is Being an Only Child Harmful to Psychological Health? Evidence from a Local Instrumental Variable Analysis of the China One-Child Policy
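It is well known that China once enforced a strict family-planning policy. Some of the literature claims that being an only child has a positive effect on psychological health, but such claims can hardly be called causal. The authors analyze data from the China Family Panel Studies (CFPS); descriptive statistics show that one-child families are wealthier and better educated. The challenge for causal inference is that only-child status (the treatment) is not randomly assigned. The authors use the one-child policy as an instrumental variable: the policy was enforced with different intensity in different regions, which makes it more plausibly exogenous. Because the outcome is ordinal and the instrument is continuous, the authors adopt a local instrumental variable (local IV) approach.
Under a set of assumptions (instrument exogeneity, a non-weak instrument (positivity), monotonicity of the treatment in the instrument, and the exclusion restriction), they define the causal parameters: the average treatment effect on the treated (ATT), the policy-relevant treatment effect (PRTE), and the marginal treatment effect (MTE). Note that the PRTE and MTE are local effects that depend on the joint potential values of the treatment under different instrument values, similar to principal stratification. The treatment can be written in a latent selection representation, so that the ATT and PRTE can be expressed as integrals of the MTE with respect to particular distributions of the latent variable. After specifying the latent selection model, the posterior of the MTE is computed by MCMC. The authors find that being an only child harms self-reported psychological health.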

Fan Li, PhD, is an expert in causal inference, the field of statistics concerned with evaluating treatments in randomized experiments and observational studies. She earned her BSc in Mathematics from Peking University and her PhD in Biostatistics from Johns Hopkins University. Prior to joining the Duke faculty, she completed a postdoctoral fellowship at Harvard Medical School's Department of Health Care Policy. Her research includes advanced Bayesian methods for causal inference, missing data, and variable selection. Dr. Li's applied interests span the social sciences, economics, health policy, epidemiology and engineering.

Fan Li

Duke University
Learn About

Clark Glymour

Personal Website

Carnegie Mellon University
Two Dogmas of Methodology
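The talk examines, at a philosophical level, the legitimacy of automated causal search and rebuts several criticisms of automated causal search from observational data, for example:

  1. "Hume proved that causality does not exist." In fact, Hume never said this.
  2. "Evidence of causality can only be obtained from controlled experiments." But Fisher, who said this, also studied deterministic sciences.
  3. "Experimental outcomes must be specified in advance." This is a slogan, and it is ignored in actual research.
  4. "Automated scientific discovery is impossible" (because latent variables cannot be observed, and so on). In fact, algorithms that address this already exist.
  5. The multiple-testing paradox. This can be handled by trading off true/false positives against true/false negatives, and every correction method needs a set over which to correct.
  6. "Search assumes faithfulness or non-redundancy." In fact, some algorithms do not need it.
The speaker argues that criticisms of automated causal search are either mistaken or apply equally to causal inference from experiments. He believes that, with the development of new algorithms and the abandonment of dogma, automated, data-driven search will continue to develop.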

Glymour is the founder of the Philosophy Department at Carnegie Mellon University, a Guggenheim Fellow, a Fellow of the Center for Advanced Study in Behavioral Sciences, a Phi Beta Kappa lecturer, and a Fellow of the statistics section of the AAAS. Glymour and his collaborators created the causal interpretation of Bayes nets.

Clark Glymour

Carnegie Mellon University
Learn About

Sofia Triantafillou

Personal Website

University of Pittsburgh
Combining Observational and Experimental Data for Personalized Causal Prediction
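The motivation for this work is that we have a treatment X, an outcome Y, and some pre-treatment covariates, and we want to predict Y | do(X). Observational data have a large sample size, but confounding biases the causal effect estimate; experimental data have a small sample size, but the causal effect estimate is unbiased. How can the two be combined? The question is which covariates to include for better prediction. In observational prediction, we want to predict P(Y | V): if Z is the Markov boundary of Y in the causal graph, then P(Y | V) = P(Y | Z), and many methods exist for learning Markov boundaries from observational data. For causal prediction, we instead want to predict P(Y | do(X), V). We call the Markov boundary of Y in the intervened graph the interventional Markov boundary (IMB). With experimental data, the IMB can be identified using Markov boundary (MB) algorithms; with observational data alone, the IMB cannot always be identified. Further, define the causal Markov boundary (CMB) as a set Z such that P(Y | do(X), Z \ X) is identifiable, Z contains the maximal information about Y | do(X), and Z is minimal. The CMB is a subset of the Markov boundary and is not necessarily unique. If IMB = CMB, we can use IMB + CMB; otherwise, we can use only the IMB. A Bayesian score can be used to test whether Z is a CMB or an IMB.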

Sofia Triantafillou is an Assistant Professor in the Department of Biomedical Informatics at the University of Pittsburgh.
Research interests:
Developing causal discovery methods in the presence of latent variables, methods for integrative analysis of multiple experiments, modelling and application of causal discovery on neural and biological data.

Sofia Triantafillou

University of Pittsburgh
Learn About

Yanxun Xu

Personal Website

PhD, Associate Professor, Department of Applied Mathematics and Statistics, Johns Hopkins University
yanxun.xu@jhu.edu
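The background for this study is that, after a kidney transplant, physicians follow up with the patient and decide on new treatment and the next follow-up date. The questions are: How does creatinine change over time? How does creatinine affect survival? How do physicians treat patients? The final goal is to find the optimal treatment strategy. This involves four models: a longitudinal model, a survival model, a dosage model, and a follow-up model.
Patients' clinical measurements and physicians' decisions are recorded at different times, and the idea is to fit these with a Bayesian joint model. A linear mixed-effects model links the longitudinal and dosage models, and the survival model is linked to the other three by including the longitudinal effect, the dosage effect, and the follow-up model in the hazard function. With the Bayesian joint model, specifying priors for the unknown parameters yields the posterior distribution. Finally, maximizing a reward function gives the optimal treatment regime. The authors apply the method to real kidney transplant data.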

Yanxun Xu is an assistant professor in the Department of Applied Mathematics and Statistics. Her research focuses on Bayesian statistics; cancer genomics; clinical trial design; graphical models; nonparametric Bayesian statistical inference for big data analysis; high-throughput genomic data; and proteomics data. She earned her doctorate (2013) at Rice University; her master’s (2010) at Texas Tech University; and her bachelor’s (2007) at Beijing University of Aeronautics and Astronautics.

Yanxun Xu

Johns Hopkins University
Learn About

Mark van der Laan

Personal Website

University of California, Berkeley
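Typically, in statistical learning we establish an approximate sampling distribution for an estimator in order to carry out statistical inference. In targeted learning, we construct estimates and confidence intervals based on a realistic statistical model for the data, which calls for highly data-adaptive statistical methods.
The first-order TMLE proceeds as follows: given an initial estimator, construct a universal least favorable parametric submodel, compute the maximum likelihood estimate along it, plug it into the target parameter, and thereby solve the efficient influence curve estimating equation. Defining the exact second-order remainder of the target parameter and combining it with the score function yields the key equation for the targeted maximum likelihood estimator. If the remainder is small enough, the resulting estimator is an asymptotically efficient plug-in estimator. The first-order TMLE optimizes the exact total remainder of the target parameter relative to the empirical mean. Assuming that the nuisance functions are càdlàg with bounded variation, the TMLE using the initial estimator is asymptotically efficient.
The second-order TMLE still uses an initial estimator. It defines the exact total remainder of the first-order TMLE and minimizes that remainder in a similar way. Using the second-order TMLE can substantially reduce bias. More generally, a k-th order TMLE allows a linear expansion with a (k+1)-th order difference.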

The goal of Mark's research group is to develop statistical methods to estimate/learn causal and non-causal parameters of interest, based on potentially complex and high dimensional data from randomized clinical trials or observational longitudinal studies, or from cross-sectional (e.g., case-control sampling) studies. The model assumptions under which these methods are valid should be clearly formulated, so that they can be subject to scrutiny. The estimates should be accompanied by confidence regions for the true parameter values or other types of confidence measures (e.g., variability/reproducibility of clusters as measured by the bootstrap). The longitudinal data structures may involve high dimensional measurements such as whole genome profiles at various points in time; censoring and missingness of data due to a subject not responding well to treatment (or not feeling well); and changes of treatment at various points in time, based on variables related to the outcome of interest. The methods are designed to rely on as few assumptions as possible on nuisance parameters so that they provide maximally objective statistical inference and testing procedures. To develop and refine these methods, the group works with simulated and real data in collaboration with biologists, medical researchers, epidemiologists, and others.

Mark van der Laan

University of California, Berkeley
Learn About

Fan Xia

Personal Website

National Alzheimer's Coordinating Center, University of Washington, Seattle
Decomposition, Identification and Multiply Robust Estimation of Natural Mediation Effects with Multiple Mediators
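In causal inference, identification assumptions help us arrive at the target estimand, and statistical assumptions help us obtain estimators. There may be multiple pathways between the treatment and the outcome, and we decompose the causal effect into two parts: the total (natural) direct effect and the total (natural) indirect effect. With multiple mediators, especially when the causal structure is unknown, we cannot assume a causal structure among the mediators, so we consider the following decomposition: total indirect effect = exit indirect effect (EIE) through the k-th mediator + interaction effect among the mediators (INT).
The advantage of this decomposition is that, when intervening on the mediators does not affect the mediators' potential values, the EIE equals the conventional total indirect effect. If there is no interaction among the mediators, INT equals zero. The authors present two sets of identification assumptions for identifying the average causal effect or each component of the decomposition, build working parametric models under these assumptions, and study moment estimators and quadruply robust estimators.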

Fan Xia is a postdoctoral fellow at NACC. Her primary areas of interest include causal inference and cluster randomized trials, especially causal mediation analysis and stepped wedge designs.

Fan Xia

University of Washington
Learn About

Peng Cui

Personal Website

Associate Professor (Tenured), Lab of Media and Network, Department of Computer Science and Technology, Tsinghua University
Deep Stable Learning and Heterogeneous Risk Minimization
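Machine learning methods have achieved great success in many fields, but the vast majority rely on the i.i.d. assumption (here meaning that the training and test sets follow the same distribution), which greatly limits their range of application. When the test distribution differs substantially from the training distribution, the model performs poorly on the test set and therefore lacks stability. This is known as the out-of-distribution generalization problem.
To address it, Cui first introduces the notion of stable learning, which refers to methods that perform well and stably under different test distributions. He then points out that the key to solving the problem is to find causal features. Taking an AI system that recognizes dogs as an example, the grass in the background can be seen as merely correlated information, and it is the main reason the algorithm generalizes poorly. For the algorithm to recognize dogs well against different backgrounds, this correlated information must be excluded and the causal features that characterize the dog must be found.
Borrowing the idea from causal inference of weighting to balance covariates and thereby remove confounding, Cui proposes a series of weighting-based stable learning methods and presents some of their algorithmic details and theoretical results. Stable learning methods choose weights so as to remove correlations among the features as far as possible, which strips out the merely associated information in an image, retains the causal information, and yields more stable estimates.
In addition, starting from invariant causal prediction, Cui proposes a heterogeneous risk minimization framework for achieving stable learning.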

He has published more than 100 papers in famous conferences and periodicals in the field of data mining and multimedia, and has won the best paper awards in 7 international conferences and journals. He won the ACM China New Star Award in 2015 and the CCF-IEEECS Young Scientist Award in 2018. He is currently an outstanding member of CCF and a senior member of IEEE.

Peng Cui

Tsinghua University
Learn About

Ryo Okui

Personal Website

Seoul National University
Inference on Effect Size after Multiple Hypothesis Testing
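Selective inference, or post-selection inference, refers to methods that first use hypothesis tests to pick out parameters of interest and then perform statistical inference on the selected parameters. For example, in an RCT studying the effect of financial subsidies on charitable giving, the t-test results are as follows:
The results show that the subsidy has a significant effect only on "$ given including match" and "response rate" (highlighted in red). Reporting and presenting only these two results directly would be incorrect, because they have already passed through our own screening. Indeed, this kind of "reporting only the good news" is common in practice: only the significant results are reported and the insignificant ones are hidden, so the conclusions presented are biased. We want to know, among the selected variables, how many (referred to as the effect size) remain significant after removing the bias caused by selective reporting. Okui proposes a method based on the truncated normal distribution for determining the effect size.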
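To illustrate how a truncated normal distribution enters after selection, here is a minimal sketch of a generic conditional-likelihood correction for a single z-statistic that is reported only when it passes a two-sided test. It is not the speaker's procedure; the unit-variance normal model, the threshold 1.96, the grid search, and the names `selection_prob` and `conditional_mle` are assumptions for illustration.

```python
import math
import numpy as np

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def selection_prob(mu, c):
    """P(|Z| > c) when Z ~ N(mu, 1): the chance the result gets reported."""
    return 1.0 - norm_cdf(c - mu) + norm_cdf(-c - mu)

def conditional_mle(z, c=1.96, grid=None):
    """Maximize the truncated-normal log-likelihood of z given |z| > c."""
    if grid is None:
        grid = np.linspace(-10.0, 10.0, 4001)
    loglik = [-0.5 * (z - m) ** 2 - math.log(selection_prob(m, c)) for m in grid]
    return float(grid[int(np.argmax(loglik))])

shrunk = conditional_mle(2.2)   # barely significant: pulled toward zero
far = conditional_mle(6.0)      # overwhelmingly significant: almost unchanged
print(f"z=2.2 -> {shrunk:.2f}, z=6.0 -> {far:.2f}")
```

The design point is that conditioning on selection divides the usual likelihood by the probability of being selected, so statistics that only just cleared the threshold are shrunk heavily, while statistics far beyond it are left nearly untouched.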

Ryo is an Associate Professor in the Department of Economics, Seoul National University. His research interests include microeconometrics, applied microeconomics, and experimental economics.

Ryo Okui

Seoul National University
Learn About

Jinzhu Jia

Personal Website

Peking University
Improved Covariate Adjusted Average Treatment Effect Estimate
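Under the Neyman-Rubin model, in a randomized controlled trial the difference in mean outcomes between the treatment and control groups is an unbiased estimator of the average treatment effect (ATE), but this estimator makes no use of covariate information. Although the covariates are independent of the treatment, they may affect the potential outcomes, so intuitively, incorporating covariate information should improve the efficiency of the ATE estimator.
Existing work shows that, for a continuous outcome, putting the covariates directly into a linear regression with the covariate coefficients constrained to be equal in the treatment and control groups does not necessarily improve on the difference-in-means estimator. Interestingly, if the covariate coefficients are allowed to differ between the treatment and control groups, the resulting ATE estimator always improves on the difference-in-means estimator, whether or not the linear model is correctly specified, and this result holds even in high-dimensional settings.
A natural question is whether these conclusions still hold when the outcome is binary and logistic regression is used for the adjustment. The answer is: not necessarily. Jia finds that the ATE estimator from logistic regression adjustment is consistent and asymptotically normal; if the logistic regression is correctly specified, it is more efficient than both the simple difference-in-means estimator and the linear regression adjustment estimator; if the logistic regression is misspecified, analogous conclusions do not hold. The conclusions for binary Y are therefore asymmetric to those for continuous Y.
By analyzing the cause of this asymmetry, Jia proposes a GEE-based (generalized estimating equations) ATE estimator. This estimator is efficient and is therefore more efficient than the simple difference-in-means estimator.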
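The continuous-outcome result summarized in this abstract (separate covariate coefficients per arm) can be sketched with a small simulation. This is an illustrative sketch only, not the speaker's GEE-based estimator; the data-generating process, sample sizes, and function names are assumptions.

```python
import numpy as np

def diff_in_means(y, t):
    """Unadjusted ATE estimate: treated mean minus control mean."""
    return y[t == 1].mean() - y[t == 0].mean()

def interacted_adjustment(y, t, x):
    """OLS of y on t, centered x, and t * centered x.

    Centering x makes the coefficient on t an estimate of the ATE while
    letting the covariate slopes differ between the two arms.
    """
    xc = x - x.mean(axis=0)
    design = np.column_stack([np.ones(len(y)), t, xc, t[:, None] * xc])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]

def one_trial(rng, n=500):
    """One hypothetical completely randomized trial with true ATE = 1."""
    x = rng.normal(size=(n, 2))
    t = rng.permutation(np.repeat([0, 1], n // 2))
    y0 = x[:, 0] + 0.5 * x[:, 1] ** 2 + rng.normal(size=n)  # nonlinear in x
    y1 = y0 + 1.0 + 2.0 * x[:, 0]                           # heterogeneous effect
    y = np.where(t == 1, y1, y0)
    return diff_in_means(y, t), interacted_adjustment(y, t, x)

rng = np.random.default_rng(1)
draws = np.array([one_trial(rng) for _ in range(500)])
var_unadjusted, var_adjusted = draws.var(axis=0)
print(f"variance unadjusted: {var_unadjusted:.4f}, adjusted: {var_adjusted:.4f}")
```

The outcome model here is deliberately not linear in the covariates, so the linear working model is misspecified; the adjusted estimator should still show a smaller sampling variance across the repeated trials.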

Jinzhu Jia is a researcher and doctoral supervisor at the School of Public Health, Peking University. He graduated from Peking University in January 2009 and was a postdoctoral researcher at UC Berkeley from January 2009 to December 2010. From January 2011 to January 2018, he worked in the Department of Probability and Statistics of the School of Mathematical Sciences of Peking University and the Statistics Center of Peking University, during which he visited Harvard University for one year. He joined the School of Public Health of Peking University in February 2018. His main research interests are high-dimensional statistical inference, big data analysis, statistical machine learning, causal inference, and biostatistics. He has published many papers on the theory of variable selection methods, statistical learning for high-dimensional and big data, and causal inference. He serves as deputy secretary-general of the China Probability and Statistics Society, executive director of the Young Statisticians' Association, director of the Computing Statistics Branch of the Field Statistics Research Society, and director of the High-dimensional Data Statistics Association of the Field Statistics Research Society.

Jinzhu Jia

Peking University
Learn About

Xiao-Hua Zhou

Personal Website

PKU Endowed Chair Professor
Peking University
Covariate-Specific Treatment Effect Curve in Precision Using Observational Data with High-Dimensional Covariates
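With advances in basic science and ever more data being collected, precision medicine, which aims to tailor treatment to patient characteristics, has attracted great attention in modern biomedical research. Its success depends on statistical methods that estimate individualized treatment rules accurately and stably.
Current methods for estimating individualized treatment rules fall broadly into two classes. The first directly minimizes or maximizes the population mean outcome; these methods usually recast the optimization as a weighted classification problem and solve it with machine learning algorithms, and their limitation is that valid confidence intervals are hard to obtain. The second selects the optimal individualized treatment rule by estimating heterogeneous causal effects together with the corresponding confidence intervals or confidence bands.
The methods Zhou proposes belong to the second class but allow high-dimensional data and unmeasured confounding. For different outcome types, Zhou first defines covariate-specific treatment effect (CSTE) curves, which represent heterogeneous causal effects and have a clear causal interpretation. In high-dimensional settings where the observational data contain unmeasured confounding, Zhou defines a local CSTE curve via an instrumental variable, discusses identification, and provides estimators and the corresponding confidence intervals.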

Xiao-Hua Zhou is a professor and doctoral supervisor at Peking University. He is currently head of the Department of Biostatistics, School of Public Health, Peking University; Director of the Data Center of Traditional Chinese Medicine University of Beijing Big Data Research Institute; Deputy Director of the Big Data Center of Medical and Health; Director of the Biostatistics Laboratory of Beijing International Mathematical Research Center; Chairman of the China Branch of the International Association of Biological Statistics; and President of the Biomedical Statistics Branch of the China Society for Field Statistics. He is a member of the American Association for the Advancement of Science, the American Statistical Society, and the International Institute of Statistics. He is currently associate editor of Statistics in Medicine, a top journal in biostatistics, and editor of Biostatistics & Epidemiology (China Branch of the International Society for Biostatistics). He has published more than 240 SCI-indexed papers, including in top international statistics and biostatistics journals such as J. R. Statist. Soc. B, Journal of the American Statistical Association, Biometrika, Ann. Statist., Biometrics, and Stat. Med.; he is first or corresponding author on more than 130 of them.

Xiao-Hua Zhou

Peking University
Learn About

Jiji Zhang

Personal Website

Hong Kong Baptist University
Do-Calculus and Modularity in Causal Markov Category
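Consider categorical random variables. A causal Bayesian network over a set of variables V consists of a directed acyclic graph and a set of conditional probability functions, from which a family of interventional conditional distributions can be derived according to an intervention rule. These conditional probabilities and the exogenous mechanisms constitute causal modules; an intervention breaks the corresponding module but leaves the other modules unchanged. This can be expressed in the language of category theory. A category consists of objects and arrows, with each arrow going from its domain to its codomain. F is called a functor if it preserves domains and codomains, composition, and identities. A category equipped with a monoidal product is called a monoidal category. A causal theory is built on a directed acyclic graph, with morphisms representing causal mechanisms or causal processes. Within a causal theory, one can define a causal effect morphism from the intervened variables to the outcome variables. The speaker uses the notion of trek separation to state conditions under which the causal effect can be decomposed. This work shows that the do-calculus lies at the core of causal theories.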

Zhang Jiji received his PhD in Logic, Computation, and Methodology from the Department of Philosophy at Carnegie Mellon University in 2006, and taught previously at California Institute of Technology and Lingnan University, before joining the Department of Religion and Philosophy at Hong Kong Baptist University in 2021. His philosophical interests lie mainly in philosophy of science, formal epistemology, and logic. The interdisciplinary part of his research centers around the topic of causation, addressing both the epistemological and logical aspects of causal reasoning, and the statistical and computational aspects of causal modelling and discovery. His work has appeared in both premier journals in philosophy, such as Journal of Philosophical Logic, British Journal for the Philosophy of Science, Philosophy of Science, Synthese, etc., and in leading venues in computer science and statistics, such as Artificial Intelligence, Journal of Machine Learning Research, Statistical Science, as well as some top conference proceedings in the field of Artificial Intelligence. With the new opportunities provided by the Ethical and Theoretical AI lab, he aims to work with colleagues across and beyond the university to apply causal modelling tools to shed new light on some important issues with implications for AI ethics, including especially machine learning interpretability, algorithmic bias, and AI-powered personalized medicine.

Jiji Zhang

Hong Kong Baptist University
Learn About

Ruichu Cai

Personal Website

Guangdong University of Technology
Hidden Causal Representation Learning via Surrogate Variables
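Discovering causal relations among latent variables has attracted attention in many fields. How can latent variables be discovered? How can the causal relations among latent variables be determined from observed data? When there are four observed variables, the Triad condition can be used to determine how many latent variables lie above them, but it cannot identify the direction between latent variables, and with only three variables the number of latent variables cannot be determined. Can higher-order moment information accomplish the task? In the case of non-Gaussian independent noise, asymmetry in the relations between variables can be used to determine the direction between latent variables. The speaker then proposes the generalized independent noise condition, which uses observed variables as surrogate variables and yields conditions on covariances. The Triad condition can be viewed as a special case of the generalized independent noise condition. Finally, the speaker considers the linear non-Gaussian latent variable model: building on generalized independent noise, causal clusters are found first, and then the directions between latent variables are determined.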

Ruichu Cai is a professor and doctoral supervisor at the School of Computer Science, Guangdong University of Technology. He received a doctorate in engineering from South China University of Technology in 2010 and joined Guangdong University of Technology; he was named associate professor in 2011 and professor and doctoral supervisor in 2015. During 2007-2009 and 2013-2014 he visited and studied at the National University of Singapore and the UIUC Advanced Digital Science Research Center. He has led 2 National Natural Science Foundation of China projects, 1 Provincial Outstanding Youth Fund project, 1 Pearl River Science and Technology Rising Star project, and other projects. Dr. Cai focuses on research in fields such as causality discovery and high-dimensional data mining. He has published more than 30 papers, including at important conferences such as ICML, SIGMOD, and SDM and in internationally renowned journals such as TNNLS, Bioinformatics, TKDE, NN, and PR. He holds 4 authorized invention patents, 2 of which have been implemented in NetEase’s mailbox. Related achievements have won the second prize of the Provincial Science and Technology Award (as the fourth completer) and the first prize of the Provincial Science and Technology Award (as the third completer).

Ruichu Cai

Guangdong University of Technology
Learn About

Sebastian Engelke

Personal Website

University of Geneva
Causality for Extreme Values
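Extreme value theory is mainly used to analyze rare phenomena such as financial crises, floods, typhoons, and fires. In statistical terms, the events of interest lie in the tails of the distribution. Recently, attribution analysis of extreme events, which aims to identify the risk factors that cause extreme events, has attracted wide attention.
Engelke's talk has two parts. The first covers causal discovery methods in heavy-tailed models. Engelke first introduces the definition of the causal tail coefficient, which can be used to determine the causal direction between any two variables. For any p variables, the causal tail coefficient can be used to recover their causal order, with theoretical guarantees.
The second part discusses estimation of the quantile treatment effect (QTE) for heavy-tailed distributions. The QTE has many applications; for example, it can answer questions such as: by how much can an education program increase the income of the poorest 0.1% of people? Engelke gives an estimator of the extremal QTE (the QTE as the quantile level tends to 1) and discusses its asymptotic properties.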
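As a rough illustration of the first part of the talk, the sketch below computes an empirical version of a causal tail coefficient: the average empirical-CDF value of one variable over the observations where the other variable is among its k largest. The abstract does not give the formula, so this form of the estimator, the heavy-tailed linear simulation, and the choice of k are all assumptions for illustration.

```python
import numpy as np

def causal_tail_coefficient(x_from, x_to, k):
    """Mean empirical-CDF value of x_to over the k largest values of x_from."""
    n = len(x_from)
    # Ranks scaled to (0, 1] act as empirical CDF values of x_to.
    cdf_to = (np.argsort(np.argsort(x_to)) + 1) / n
    top = np.argsort(x_from)[-k:]          # indices of the k largest x_from
    return float(cdf_to[top].mean())

# Hypothetical heavy-tailed linear model in which x1 causes x2.
rng = np.random.default_rng(0)
n, k = 20000, 200
x1 = rng.standard_t(df=2.5, size=n)
x2 = 0.8 * x1 + rng.standard_t(df=2.5, size=n)

g12 = causal_tail_coefficient(x1, x2, k)   # extremes of the cause
g21 = causal_tail_coefficient(x2, x1, k)   # extremes of the effect
print(f"x1 -> x2: {g12:.2f}, x2 -> x1: {g21:.2f}")
```

The asymmetry is the point of the construction: an extreme value of the cause tends to push the effect into its own tail, whereas an extreme value of the effect may come from its own noise, in which case the cause looks ordinary.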

Sebastian is an assistant professor at the Research Center for Statistics at the University of Geneva, where he holds an Eccellenza grant. He was a visiting professor at the Department of Statistical Sciences at the University of Toronto from 2018 to 2019. Previously he was an Ambizione fellow at EPF Lausanne with Anthony Davison. Sebastian studied Mathematics at the University of Göttingen and UC Berkeley, and he finished his PhD as a Deutsche Telekom Foundation fellow in 2013 at the University of Göttingen with Martin Schlather. His research interests are in extreme value theory, spatial statistics, graphical models and data science. Since 2018, Sebastian has been Associate Editor of the Springer journal Extremes and the Scandinavian Journal of Statistics.

Sebastian Engelke

University of Geneva
Learn About

Wang Miao

Personal Website

Peking University
Semiparametric Inference for Nonignorable Nonresponse with Paradata
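Miao first introduces paradata. Paradata are records that track the process of collecting survey data, such as the date of a telephone interview, the interview time, the mode of contact, and the respondent's attitude. This information is not part of the structured data, but it is still informative, so using paradata sensibly can improve survey quality. Callback data are one kind of paradata: they record the number of contact attempts and provide information about the missing-data mechanism.
Missing data are common in structured data. The difficulty with data missing not at random (MNAR) is that the estimand is generally not identifiable. Existing solutions fall into three classes: restricted parametric models, instrumental variable methods, and shadow variable methods. All three require strong assumptions that may not hold in practice.
Miao proposes a method that uses callback data to handle MNAR. It extends the restricted parametric models to nonparametric models; Miao discusses the identification assumptions, derives the semiparametric efficiency bound, and provides a series of semiparametric efficient estimators.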

Wang is currently an assistant professor in the Department of Probability and Statistics at Peking University. He completed his undergraduate and Ph.D. studies in the School of Mathematical Sciences at Peking University from 2008 to 2017. He did postdoctoral research at the Department of Biostatistics at Harvard University from 2017 to 2018. He joined Peking University in 2018.

Wang Miao

Peking University
Learn About

Michael Elliott

Personal Website

University of Michigan & Vertex
Accounting for Selection Bias due to Death in Estimating the Effect of Wealth Shock on Cognition for the Health and Retirement Study
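The speaker studies the effect of a decline in economic status on the cognition of retirees. Because the data do not come from a randomized experiment, and because there are also time-varying confounders and truncation by death, analyzing the causal effect is quite complicated. For time-varying confounding, the marginal structural models proposed by Robins can remove the confounding and give unbiased estimates of the causal effect. However, marginal structural models do not account for truncation by death. The speaker therefore uses principal stratification to estimate the survivor causal effect. The speaker replaces the earlier PENCOMP algorithm with the BARTps algorithm and uses imputation to estimate counterfactual survival status and potential outcomes. Analysis of real data indicates that a decline in economic status has no significant negative effect on the cognition of retirees.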

Michael Elliott is a Professor of Biostatistics at the University of Michigan School of Public Health and Research Scientist at the Institute for Social Research. He received his PhD in biostatistics in 1999 from the University of Michigan. Prior to joining the University of Michigan in 2005, he held an appointment as an Assistant Professor at the Department of Biostatistics and Epidemiology at the University of Pennsylvania School of Medicine, and prior to that as a Visiting Professor of Biostatistics at the University of Michigan School of Public Health and as a Visiting Research Scientist at the University of Michigan Transportation Research Institute. Dr. Elliott's statistical research interests center on the broad topic of "missing data," including the design and analysis of sample surveys, causal and counterfactual inference, and latent variable models. He has worked closely with collaborators in injury research, pediatrics, women's health, and the social determinants of physical and mental health. Dr. Elliott serves as an Associate Editor for the Journal of the American Statistical Association and the Journal of Survey Statistics and Methodology.

Michael Elliott

University of Michigan & Vertex
Learn About

Vincent Tan

Vertex
Accounting for Selection Bias due to Death in Estimating the Effect of Wealth Shock on Cognition for the Health and Retirement Study
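The speaker studies the effect of a decline in economic status on the cognition of retirees. Because the data do not come from a randomized experiment, and because there are also time-varying confounders and truncation by death, analyzing the causal effect is quite complicated. For time-varying confounding, the marginal structural models proposed by Robins can remove the confounding and give unbiased estimates of the causal effect. However, marginal structural models do not account for truncation by death. The speaker therefore uses principal stratification to estimate the survivor causal effect. The speaker replaces the earlier PENCOMP algorithm with the BARTps algorithm and uses imputation to estimate counterfactual survival status and potential outcomes. Analysis of real data indicates that a decline in economic status has no significant negative effect on the cognition of retirees.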

Vincent started working at Vertex Pharmaceuticals as a Senior Biostatistician in January 2020. He has served as a Study Biostatistician for 6 studies: 2 with successful database lock, 2 transitioned, and 2 ongoing (1 of which just had an interim analysis data cut). 3 protocols and 1 SAP were completed during this period. The disease areas that he has worked in are Cystic Fibrosis and Type 1 Diabetes. The study designs that he has been involved in are Phase 1 BA, Phase 3 mechanistic studies, Phase 3 open-label studies, and Phase 1/2 studies. He looks forward to continuing to develop his knowledge of the role of a Study Biostatistician in the pharmaceutical industry at Vertex Pharmaceuticals.

Vincent Tan

Vertex
Learn About

Linbo Wang

Personal Website

PhD, Assistant Professor, Department of Statistical Sciences, University of Toronto
Causal Inference on Distribution Functions
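Traditional causal inference usually considers the causal effect of a treatment on a real-valued outcome, whereas the speaker is interested in the effect of a treatment on a distribution function. The motivating question is how marriage affects the distribution of physical activity. When the outcome is a distribution, one must first define the average of distributions and the difference between two distributions. For the first, the speaker proposes using the Wasserstein barycenter to average different distributions and points out that an average obtained this way reflects properties of the original distributions. For the second, the speaker uses quantiles to define a suitable difference between potential outcomes that carries a causal meaning. With the causal parameter defined, the speaker gives estimation methods analogous to those for real-valued outcomes, including inverse probability weighting, regression, and doubly robust methods.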
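For one-dimensional distributions, the 2-Wasserstein barycenter has a simple form: its quantile function is the pointwise average of the individual quantile functions. The sketch below uses that fact to average distribution-valued outcomes within two groups and to contrast the groups quantile by quantile. It is only an illustration of the averaging step, not the speaker's estimator: it ignores confounding and weighting, and the simulated normal outcomes and function name are assumptions.

```python
import numpy as np

def barycenter_quantiles(samples, probs):
    """Quantile function of the 1-D 2-Wasserstein barycenter.

    Each element of `samples` holds draws from one unit's outcome
    distribution; the barycenter's quantiles are the mean of the
    per-unit quantiles at each probability level.
    """
    per_unit = np.array([np.quantile(s, probs) for s in samples])
    return per_unit.mean(axis=0)

rng = np.random.default_rng(0)
probs = np.linspace(0.01, 0.99, 99)        # probs[49] is the median

# Hypothetical distribution-valued outcomes for three units per group.
treated = [rng.normal(loc=m, scale=1.0, size=5000) for m in (1.0, 2.0, 3.0)]
control = [rng.normal(loc=m, scale=2.0, size=5000) for m in (0.0, 1.0, 2.0)]

q_treated = barycenter_quantiles(treated, probs)
q_control = barycenter_quantiles(control, probs)
quantile_contrast = q_treated - q_control  # difference at each quantile level
print(f"median contrast: {quantile_contrast[49]:.2f}")
```

Averaging quantile functions keeps the shape of the inputs: the barycenter of unit-variance normals centered at 1, 2, and 3 is again a unit-variance normal centered at 2, whereas a mixture of the three would be wider and flatter.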

Linbo Wang received his PhD in Biostatistics from the University of Washington in 2016. Prior to joining the University of Toronto, he spent two years at the Harvard Causal Inference Program. His research interests include causal inference, graphical models, and modern statistical inference in infinite-dimensional models. He is the recipient of several research awards, including an NSERC Discovery Accelerator Supplement in 2019.

Linbo Wang

University of Toronto
Learn About

Robin Evans

Personal Website

Associate Professor, Department of Statistics, University of Oxford
evans@stats.ox.ac.uk
Parameterizing and Simulating from Causal Models
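In causal inference problems we often specify a model for the potential outcomes, and simulating data from a distribution that agrees with the specified model is then not a simple problem. The speaker gave a solution for parameterizing and simulating from such distributions. The data distribution is decomposed into three parts: the model for the quantity of interest, an existing distribution model, and the dependence between the two, which can be represented with a copula. With this decomposition, the speaker can simulate from the required distribution by rejection sampling.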
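
For readers unfamiliar with rejection sampling, here is a generic minimal sketch. It is not the speaker's copula-based construction: the Beta(2, 2) target, the Uniform(0, 1) proposal, and the helper name `rejection_sample` are assumptions chosen only to show the accept/reject step.

```python
import numpy as np

def rejection_sample(target_pdf, bound, n, rng):
    """Draw n samples on [0, 1] from target_pdf using a Uniform(0, 1) proposal.

    Requires target_pdf(x) <= bound for all x in [0, 1].
    """
    accepted = []
    while len(accepted) < n:
        x = rng.uniform(0.0, 1.0, size=n)   # proposals
        u = rng.uniform(0.0, 1.0, size=n)   # acceptance thresholds
        # Accept a proposal x with probability target_pdf(x) / bound.
        accepted.extend(x[u * bound <= target_pdf(x)])
    return np.array(accepted[:n])

def beta22_pdf(x):
    # Density of Beta(2, 2); its maximum is 1.5 at x = 0.5.
    return 6.0 * x * (1.0 - x)

rng = np.random.default_rng(0)
draws = rejection_sample(beta22_pdf, bound=1.5, n=20000, rng=rng)
```

The acceptance rate equals 1 / bound (about two thirds here), so a tight bound keeps the sampler efficient.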

Robin is an Associate Professor in Statistics, and a fellow of Jesus College. He received his PhD in Statistics from the University of Washington in 2011, and was a Postdoctoral Research Fellow at the Statistical Laboratory in Cambridge from 2011 to 2013. His research interests include graphical models, causal inference, latent variable models and algebraic and semi-parametric statistics.

Robin Evans

Oxford University
Learn About

Shu Yang

Personal Website

North Carolina State University
Semiparametric Efficient Estimation of Structural Nested Mean Models with Irregularly Spaced Observations
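The speaker presented methods for analyzing longitudinal data whose observation times are irregular. Structural nested mean models (SNMMs) generally assume that all data are measured at fixed time points, which often does not hold in the real world. Assuming no unmeasured confounding, the speaker first discussed estimation of causal effects in discrete time and then used martingale tools to propose an estimation method for continuous time. The method is doubly robust and locally efficient. In addition, under ignorable censoring, the speaker pointed out that the complete-case analysis is optimal among inverse probability weighted estimators.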

Shu Yang graduated from Iowa State University in 2014 with a major in Mathematics and a co-major in Statistics, working with J.K. Kim and Z. Zhu. After graduation, she joined the Harvard T.H. Chan School of Public Health as a postdoc with Judith Lok. She has been a faculty member at NC State since 2016.

Shu Yang

North Carolina State University
Learn About

Eric J. Tchetgen

Personal Website

University of Pennsylvania
Proximal Causal Inference
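Unmeasured confounding is a very important problem in causal analysis, and the speaker introduced a new inference approach that can handle it. The speaker first recalled that ignorability is the condition needed to identify causal effects, and argued that in many cases we measure not the exact confounders but proxies of them. In that case, the speaker showed that the causal effect is identifiable when two proxy variables are available and a bridge function exists. These identification conditions considerably relax the strict conditions under which instrumental variables identify causal effects. The speaker also briefly explained that in the nonparametric setting the bridge function can be estimated by a minimax method. Under linear models, the speaker gave a two-stage least squares procedure analogous to the one used with instrumental variables, which makes the new method very convenient to apply.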
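
The linear two-stage procedure can be sketched as follows. This is a minimal illustration under assumed linear models, not the speaker's code: the simulated data, the variable names (`A` treatment, `Z` and `W` the two proxies, `U` the unmeasured confounder), and the helper names are all hypothetical.

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients with an intercept in position 0."""
    X1 = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef

def proximal_2sls(A, Z, W, Y):
    """Two-stage least squares with two proxies (linear-model sketch).

    Stage 1: regress the outcome-side proxy W on (A, Z).
    Stage 2: regress Y on (A, fitted W); the coefficient on A is the estimate.
    """
    g = ols(np.column_stack([A, Z]), W)
    W_hat = g[0] + g[1] * A + g[2] * Z
    b = ols(np.column_stack([A, W_hat]), Y)
    return b[1]

# Simulated data (assumed): U confounds A and Y but is never observed.
rng = np.random.default_rng(1)
n = 20000
U = rng.normal(size=n)
Z = U + rng.normal(size=n)              # proxy not affecting Y directly
W = U + rng.normal(size=n)              # proxy not affected by A or Z
A = U + rng.normal(size=n)
Y = 2.0 * A + U + rng.normal(size=n)    # true effect of A is 2

naive = ols(A.reshape(-1, 1), Y)[1]     # confounded regression of Y on A
prox = proximal_2sls(A, Z, W, Y)        # proxy-adjusted estimate
```

In this simulation the naive regression is biased upward by the confounder, while the two-stage estimate should land close to the true effect of 2.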

Eric J. Tchetgen Tchetgen is the Luddy Family President’s Distinguished Professor at the Wharton School of the University of Pennsylvania.
Professor Tchetgen Tchetgen comes to the University of Pennsylvania from Harvard University, where he has served since 2008 as Professor of Biostatistics and Epidemiologic Methods with joint appointments in the departments of Biostatistics and Epidemiology at the T.H. Chan School of Public Health.
He researches infectious diseases, including HIV/AIDS, and the role of genetic and social factors in the patterns, causes, and effects of public health. Professor Tchetgen Tchetgen has received grants from the National Institutes of Health and the Centers for Disease Control.
He completed his Ph.D. in Biostatistics at Harvard University in 2006 under the supervision of Professor James M. Robins. He received his B.S. in Electrical Engineering from Yale University in 1999.

Eric J. Tchetgen

University of Pennsylvania
Learn About

Zhonghua Liu

University of Hong Kong
MRCIP: A Robust Mendelian Randomization Method Accounting for Correlated and Idiosyncratic Pleiotropy
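Mendelian randomization is increasingly important in genetic studies. Initially, genetic variants were generally used directly as instrumental variables. The speaker pointed out that in practice these variants are often correlated and pleiotropic, so the instrumental variable assumptions fail. There are existing methods for handling pleiotropy, but they require additional assumptions, such as the InSIDE assumption. To address this, the speaker proposed the MRCIP method, which uses a parametric model that includes a random-effects model and is fitted with the PRW-EM algorithm. Applying the method to coronary artery disease data, the speaker concluded that the InSIDE assumption very likely does not hold, so methods that rely on it are likely to give biased estimates.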

Dr. Liu received his doctorate in biostatistics from Harvard University, advised by Prof. Xihong Lin. He worked on Wall Street as a quantitative strategist in NYC before joining HKU. His current research interests are: Statistical inference for massive data, Big Data Analytics, Causal Mediation Analysis, Machine Learning, Signal Detection, Statistical Genetics and Genomics.

Zhonghua Liu

University of Hong Kong
Learn About

Peng Ding

Personal Website

University of California, Berkeley
Model-Assisted Analyses of Cluster-Randomized Experiments
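The speaker first introduced cluster-randomized experiments, which are widely used in experimental design, and reviewed earlier methods for analyzing such data, including generalized estimating equations, mixed-effects models, and multilevel models. However, the parameters estimated by these models have a causal interpretation only when the model is correctly specified; under misspecification, these methods fail. The speaker proposed new estimators. The analysis uses a finite-sample framework in which only the treatment assignment is random. The speaker found that the estimator from a regression on all individual-level data actually has a larger asymptotic variance than the estimator from a regression on the cluster-level averages of the variables.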

Peng Ding is currently an assistant professor in the Department of Statistics at the University of California, Berkeley. He studied at Peking University from 2004 to 2011, receiving a bachelor's degree in mathematics and economics and a master's degree in statistics. He studied at Harvard University from 2011 to 2015, receiving a Ph.D. in statistics, and then did postdoctoral research in the Department of Epidemiology at Harvard School of Public Health.

Peng Ding

University of California, Berkeley
Learn About

Carlos Cinelli

Personal Website

University of Washington
An Omitted Variable Bias Framework for Sensitivity Analysis of Instrumental Variables
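The speaker first reviewed the classical two-stage least squares method and the Anderson-Rubin approach. In observational studies, however, there may be unobserved covariates, and the speaker asked how statistical inference would change if such covariates were added to the regression model. The speaker carried out a sensitivity analysis for this question. If an unobserved covariate could change the sign of the first-stage regression coefficient, then the resulting causal effect could be either positive or negative; that is, the regression omitting that covariate yields no useful conclusion. The speaker then described how to construct worst-case confidence intervals and explained the results of a real-data analysis.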

Carlos is a PhD candidate in the Department of Statistics at the University of California, Los Angeles (2016-2021). Starting in Fall 2021, he is joining the Department of Statistics at the University of Washington as an Assistant Professor. His research focuses on developing new causal and statistical methods for transparent and robust causal claims in the empirical sciences.

Carlos Cinelli

University of Washington
Learn About

Ting Ye

Personal Website

University of Washington
A Simple Cure for Bias from Weak Instruments and Horizontal Pleiotropy in Mendelian Randomization
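The speaker studied the weak-instrument problem in Mendelian randomization, where the instruments are often only weakly associated with the treatment. The speaker showed that in some settings the inverse-variance weighted (IVW) estimator is biased, which has also been confirmed in applications. In addition, the instruments in Mendelian randomization often affect the outcome directly. The speaker proposed a debiased IVW estimator and proved its unbiasedness and asymptotic normality. Finally, the speaker applied the method to the BMI-CAD dataset, and suggested that the debiased IVW method can serve as a preliminary analysis for complex data, providing a simple initial conclusion.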
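
To make the weak-instrument bias concrete, the sketch below compares a standard inverse-variance weighted (IVW) estimator with a debiased variant on simulated two-sample summary statistics. It assumes the debiasing step subtracts the squared standard error of each instrument-exposure estimate in the denominator; the simulation settings and function names are hypothetical and not taken from the talk.

```python
import numpy as np

def ivw(gamma_hat, Gamma_hat, se_y):
    """Standard IVW estimate from per-variant summary statistics."""
    w = 1.0 / se_y**2
    return np.sum(w * gamma_hat * Gamma_hat) / np.sum(w * gamma_hat**2)

def debiased_ivw(gamma_hat, Gamma_hat, se_x, se_y):
    """Debiased IVW (assumed form): remove the measurement-error
    contribution se_x**2 from each squared exposure estimate."""
    w = 1.0 / se_y**2
    return np.sum(w * gamma_hat * Gamma_hat) / np.sum(w * (gamma_hat**2 - se_x**2))

# Simulated summary data (assumed): many weak, independent instruments.
rng = np.random.default_rng(2)
p, beta = 5000, 0.5                               # instruments, true effect
se_x = np.full(p, 0.02)                           # SEs of exposure estimates
se_y = np.full(p, 0.02)                           # SEs of outcome estimates
gamma = rng.normal(0.0, 0.02, size=p)             # true variant-exposure effects
gamma_hat = gamma + rng.normal(0.0, se_x)         # noisy exposure estimates
Gamma_hat = beta * gamma + rng.normal(0.0, se_y)  # noisy outcome estimates

est_ivw = ivw(gamma_hat, Gamma_hat, se_y)
est_divw = debiased_ivw(gamma_hat, Gamma_hat, se_x, se_y)
```

With instruments this weak, the IVW estimate is pulled toward zero because the noise in `gamma_hat` inflates its denominator, while the debiased version should stay near the true effect of 0.5.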

Ting Ye is an Assistant Professor of Biostatistics at the University of Washington. Before joining UW, she was a postdoctoral fellow in the Statistics Department of the Wharton School, University of Pennsylvania (mentored by Dylan Small and Sean Hennessy).

Ting Ye

University of Washington
Learn About

Stijn Vansteelandt

Personal Website

Ghent University (Belgium) and London School of Hygiene & Tropical Medicine
Assumption-Lean Inference for Cox Regression Parameters
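The hazard ratio is an important concept in epidemiology and pharmaceutical research. Despite lacking a clear causal interpretation, the proportional hazards model remains a useful and reasonable approximation. The problem is that hazards need not be proportional, or the covariate modeling may be wrong. When the model is misspecified, it is unclear what the partial likelihood estimator converges to. One can make the Cox model more complex, but that creates a trade-off between simplicity and correctness. What should be done? The speaker argued that one should target a nonparametric estimand rather than a model, and develop nonparametric inference based on the efficient influence function, so that inference does not depend on whether the model is correct.
Suppose one wishes to use a Cox proportional hazards model to summarize the association between an exposure and a time-to-event outcome given some covariates. For a binary exposure, this can be summarized by the log hazard ratio. The speaker considered the case where the hazard ratio varies with covariates or time and summarized it with a weighted average. This gives a model-free estimand, which can be estimated with plug-in estimators or machine learning methods. By computing the influence function, asymptotic theory can be used to remove the plug-in bias. Finally, the speaker compared several methods in simulation experiments.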

Stijn Vansteelandt graduated with a Master's degree in Mathematics from Ghent University in 1998, and obtained a PhD in Mathematics (Statistics) in 2002 at the same university. After postdoctoral research at the Department of Biostatistics of the Harvard School of Public Health, he returned to Ghent University in 2004, where he is now Full Professor (80%) in the Department of Applied Mathematics, Computer Science and Statistics. He is furthermore Professor of Statistical Methodology (20%) in the Department of Medical Statistics at the London School of Hygiene and Tropical Medicine.

Stijn Vansteelandt

Ghent University (Belgium) and London School of Hygiene & Tropical Medicine
Learn About

Jon Michael Gran

Personal Website

University of Oslo
Estimating Causal Effects on Multi-State Outcomes - An Application to Return to Work after Sick Leave Using Norwegian Registry Data
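The speaker extended traditional time-to-event data to multi-state data, where the interest is in the effect of a baseline treatment on a multi-state outcome (process). In particular, the method was applied to a Norwegian government program aimed at increasing employment and return to work. The speaker considered six states (employment, education, full-time sick leave, part-time sick leave, unemployment, and death) and presented the transition diagram. The whole process can be viewed as a stochastic process with time-varying transition intensities. The distribution of individuals across states can be described by state occupation probabilities. The goal is to estimate the counterfactual occupation probabilities under the baseline treatment; average causal effects or other quantities of interest can then be defined through the counterfactual occupation probabilities. Occupation probabilities are estimated with the Aalen-Johansen estimator. When there are covariates, it is straightforward to average using inverse probability weighting (propensity scores). Finally, the speaker illustrated the method with real data containing 450,000 transitions and pointed out several directions for future research.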

Jon Michael Gran is an associate professor at the Department of Biostatistics of the University of Oslo. His current projects are "Causal and statistical models for patient outcomes after traumatic injury" and "Effects of workplace initiatives on sick leave and work participation - new statistical and causal models to utilise population registries".

Jon Michael Gran

University of Oslo
Learn About

Mats J. Stensrud

Personal Website

University of Geneva
Causal Inference when Treatment Resources Are Limited
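In many practical settings resources are limited, for example vaccines and organs. Causal inference with limited resources is challenging: patients are causally connected (producing samples that are not independent and identically distributed), and counterfactual strategies depend on the characteristics of all patients. Consider n causally connected individuals, with data available on only the first N of them; the observed information includes clinical characteristics, treatment, and survival outcome. A general dynamic treatment regime is defined as a mapping from a patient's characteristics to treatment, and the speaker's approach incorporates the characteristics of all patients. The question of interest is to identify the average survival of the n baseline individuals under treatment regime g, called the cluster average potential outcome under g.
The statistical model involves a set of conditions A, under which regularity conditions on the structural equations hold, yielding outcomes that are independent and identically distributed given past measurements as well as consistent estimation of nuisance parameters; and a set of conditions B (positivity, exchangeability, and consistency), used for counterfactual identification.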

Mats is a tenure-track assistant professor of statistics at the Department of Mathematics, EPFL. His research focuses on methods for causal inference. He is particularly interested in settings with exposures and outcomes that depend on time, that is, longitudinal data. Many of his works are inspired by applications in (bio)medicine.
Before he came to EPFL, he was privileged to work with Miguel Hernán and other excellent researchers at Harvard School of Public Health as a Kolokotrones research fellow and Fulbright Research Scholar. He also had the pleasure of being a part-time postdoctoral researcher under the supervision of Kjetil Røysland and Odd Aalen at the University of Oslo. Before he became a full-time academic, he had a short career as a resident doctor in internal medicine.
Mats received his MD, Dr.Philos in Neuroscience and BSc in Mathematics from the University of Oslo. He also holds an MSc in Statistics from the University of Oxford.

Mats J. Stensrud

University of Geneva
Learn About

Kosuke Imai

Personal Website

Harvard University
Experimental Evaluation of Machine Learning Algorithms for Causal Inference
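Kosuke Imai of Harvard University gave a talk titled "Experimental Evaluation of Machine Learning Algorithms for Causal Inference". Existing methods use machine learning algorithms in experimental studies to estimate heterogeneous treatment effects and to construct individualized treatment rules. But do machine learning algorithms work in practice? Their performance should be evaluated empirically, which avoids assuming good properties of the algorithms, quantifies uncertainty accurately, and remains applicable when the sample size is small. The goal is to use a randomized experiment to evaluate general individualized treatment rules. Under Neyman's inferential framework, bias and variance can be assessed for measures of the average causal effect, and a good individualized treatment rule should do better than random assignment. The speaker described how to estimate and evaluate individualized treatment rules via cross-fitting (the difficulty is accounting for estimation uncertainty and evaluation uncertainty at the same time), as well as evaluation under a budget constraint. Finally, the speaker presented simulation experiments and real-data results, and pointed out anticipated extensions, such as to heterogeneous causal effects.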

Kosuke Imai is a professor in the Department of Government and the Department of Statistics at Harvard University. He is also an affiliate of the Institute for Quantitative Social Science. Before moving to Harvard in 2018, Imai taught at Princeton University for 15 years where he was the founding director of the Program in Statistics and Machine Learning. In addition, Imai served as the President of the Society for Political Methodology from 2017 to 2019 and was elected fellow in 2017. He has been Professor of Visiting Status in the Faculty of Law and Graduate Schools of Law and Politics at the University of Tokyo.

Kosuke Imai

Harvard University

PCIC 2020

Speakers

Learn About

Donald B. Rubin

Personal Website

Professor, Yau Mathematical Science Center, Tsinghua University
Emeritus Professor, Department of Statistics, Harvard University
dbrubin@me.com

Donald B. Rubin

Tsinghua University
Learn About

Donglin Zeng

Personal Website

University of North Carolina at Chapel Hill

Donglin Zeng

University of North Carolina at Chapel Hill
Learn About

Linbo Wang

Personal Website

PhD, Assistant Professor, Department of Statistical Sciences, University of Toronto
linbo.wang@utoronto.ca

Linbo Wang

University of Toronto
Learn About

James M. Robins

Personal Website

Harvard T. H. Chan School of Public Health

James M. Robins

Harvard T. H. Chan School of Public Health
Learn About

Zhichao Jiang

Personal Website

University of Massachusetts Amherst
jzcpanda@163.com

Zhichao Jiang

University of Massachusetts Amherst
Learn About

Peng Cui

Personal Website

Tsinghua University
Associate Professor (Tenured), Lab of Media and Network, Department of Computer Science and Technology. He has published more than 100 papers in leading conferences and journals in the fields of data mining and multimedia, and has won best paper awards at 7 international conferences and journals. He won the ACM China New Star Award in 2015 and the CCF-IEEE CS Young Scientist Award in 2018. He is currently a distinguished member of CCF and a senior member of IEEE.

Peng Cui

Tsinghua University
Learn About

Peng Ding

Personal Website

University of California, Berkeley
Model-Assisted Analyses of Cluster-Randomized Experiments
Peng Ding is currently an assistant professor in the Department of Statistics at the University of California, Berkeley. He studied at Peking University from 2004 to 2011, receiving a bachelor's degree in mathematics and economics and a master's degree in statistics. He studied at Harvard University from 2011 to 2015, receiving a Ph.D. in statistics, and then did postdoctoral research in the Department of Epidemiology at Harvard School of Public Health.

Peng Ding

University of California, Berkeley
Learn About

Lihua Lei

Personal Website

Stanford University

Lihua Lei

Stanford University
Learn About

Wang Miao

Personal Website

Peking University

Wang Miao

Peking University
Learn About

Lin Liu

Personal Website

Shanghai Jiao Tong University
linliu@sjtu.edu.cn

Lin Liu

Shanghai Jiao Tong University
Learn About

Fei Wu

Personal Website

Zhejiang University

Fei Wu

Zhejiang University
Learn About

Yuhao Wang

Personal Website

Tsinghua University

Yuhao Wang

Tsinghua University
Learn About

Shu Yang

Personal Website

North Carolina State University

Shu Yang

North Carolina State University
Learn About

Betsy Ogburn

Personal Website

Johns Hopkins Bloomberg School of Public Health

Betsy Ogburn

Johns Hopkins University
Learn About

Qingyuan Zhao

Personal Website

University of Cambridge

Qingyuan Zhao

University of Cambridge
Learn About

Ingeborg Waernbaum

Personal Website

Umeå University

Ingeborg Waernbaum

Umeå University
Learn About

Ilya Shpitser

Personal Website

Johns Hopkins University

Ilya Shpitser

Johns Hopkins University
Learn About

Lu Wang

Personal Website

University of Michigan
luwang@umich.edu
Estimating the Optimal Dynamic Treatment Regime with Restrictions Using Observational Data

Lu Wang

University of Michigan
Learn About

Walter Dempsey

Personal Website

University of Michigan

Walter Dempsey

University of Michigan
Learn About

Peter Spirtes

Personal Website

Carnegie Mellon University

Peter Spirtes

Carnegie Mellon University
Learn About

Thomas S. Richardson

Personal Website

Professor, Department of Statistics, University of Washington
thomasr@uw.edu

Thomas S. Richardson

University of Washington
Learn About

Bernhard Schölkopf

Personal Website

MPI for Intelligent Systems

Bernhard Schölkopf

MPI for Intelligent Systems
Learn About

Lexin Li

Personal Website

University of California, Berkeley

Lexin Li

University of California, Berkeley
Learn About

Richard Guo

Personal Website

University of Washington, Seattle

Richard Guo

University of Washington, Seattle
Learn About

Emilija Perkovic

Personal Website

University of Washington, Seattle
perkovic@uw.edu

Emilija Perkovic

University of Washington, Seattle
Learn About

Kun Zhang

Personal Website

Carnegie Mellon University
kunz1@cmu.edu

Kun Zhang

Carnegie Mellon University
Learn About

Anqi Zhao

Personal Website

National University of Singapore

Anqi Zhao

National University of Singapore
Learn About

Xiao-Hua Zhou

Personal Website

PKU Endowed Chair Professor
Beijing International Center for Mathematical Research Chair, Department of Biostatistics, Peking University
azhou@math.pku.edu.cn

Xiao-Hua Zhou

Peking University
Learn About

Zhenhua Lin

Personal Website

National University of Singapore

Zhenhua Lin

National University of Singapore
Learn About

Shohei Shimizu

Personal Website

Shiga University

Shohei Shimizu

Shiga University
Learn About

Theis Lange

Personal Website

University of Copenhagen
thlan@sund.ku.dk
Theis Lange is currently Professor and Head of the Department of Public Health at the University of Copenhagen. His primary fields of research are causal inference and mediation, statistical analysis of clinical trials, and non-linear dynamic models.

Theis Lange

University of Copenhagen
Learn About

Torben Martinussen

Personal Website

University of Copenhagen

Torben Martinussen

University of Copenhagen
Learn About

Zhiqiang Tan

Personal Website

Rutgers University

Zhiqiang Tan

Rutgers University
Learn About

Lu Mao

Personal Website

University of Wisconsin-Madison

Lu Mao

University of Wisconsin-Madison