Guidelines for using multiple imputation without bias in studies

Theme: Translational data science

Workstream: Large, complex datasets

Status: This project is ongoing

In health research, study data often have missing values. Missing data can bias results and make them less precise.

Researchers often deal with this problem using complete records analysis (CRA), which includes only participants with full data, or multiple imputation (MI), which fills in each missing value several times with plausible values drawn from a statistical model and then combines the results across the filled-in datasets.
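To make the difference between CRA and MI concrete, here is a minimal sketch on simulated data. Everything in it is an assumption made for illustration and is not taken from the project: the variable names, the simulated exposure and outcome, the missingness mechanism (the exposure is missing more often when the outcome is high), and the imputation model (a bootstrap-based regression imputation pooled with Rubin's rules). In this particular setup CRA is expected to be biased because missingness depends on the outcome, while MI uses a correctly specified imputation model.

```python
import random
import statistics

def ols(xs, ys):
    """Simple linear regression of ys on xs: (intercept, slope, residual sd)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    return intercept, slope, (rss / (len(xs) - 2)) ** 0.5

def slope_with_variance(xs, ys):
    """Slope of ys on xs and its estimated sampling variance."""
    _, slope, resid_sd = ols(xs, ys)
    mx = statistics.fmean(xs)
    sxx = sum((x - mx) ** 2 for x in xs)
    return slope, resid_sd ** 2 / sxx

rng = random.Random(2024)

# Simulated study (hypothetical): true exposure-outcome slope is 2.
data = []
for _ in range(2000):
    x = rng.gauss(0, 1)                  # exposure
    y = 1 + 2 * x + rng.gauss(0, 1)      # outcome
    # Exposure is missing more often when the (always observed) outcome is high.
    missing = y > 1 and rng.random() < 0.7
    data.append((None if missing else x, y))

# Complete records analysis: drop participants with a missing exposure.
complete = [(x, y) for x, y in data if x is not None]
cra_slope, _ = slope_with_variance([x for x, _ in complete],
                                   [y for _, y in complete])

# Multiple imputation: m imputed datasets, each analysed separately.
m = 20
estimates, variances = [], []
for _ in range(m):
    # Bootstrap the complete records so imputations reflect model uncertainty.
    boot = rng.choices(complete, k=len(complete))
    a, b, s = ols([y for _, y in boot], [x for x, _ in boot])  # x given y
    imputed = [(x if x is not None else a + b * y + rng.gauss(0, s), y)
               for x, y in data]
    est, var = slope_with_variance([x for x, _ in imputed],
                                   [y for _, y in imputed])
    estimates.append(est)
    variances.append(var)

# Rubin's rules: pooled estimate and total variance.
mi_slope = statistics.fmean(estimates)
within = statistics.fmean(variances)
between = statistics.variance(estimates)
mi_se = (within + (1 + 1 / m) * between) ** 0.5

print(f"CRA slope: {cra_slope:.2f}")
print(f"MI slope:  {mi_slope:.2f} (SE {mi_se:.2f})")
```

Changing the missingness line (for example, making missingness depend on the exposure itself) is a quick way to see that neither method is safe in every situation.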

MI gives unbiased results only under certain conditions, and it’s not always easy to tell whether these conditions are met, especially when a lot of data is missing. Knowing whether CRA or MI will produce biased (misleading) results is important when studying how exposures, such as smoking, affect health outcomes.

Project aims

This project aims to help researchers decide when it is safe to use MI to estimate the relationship between an exposure and an outcome. We will: 

  1. Develop an easy-to-follow method (an algorithm) that shows whether MI can be used without bias in a full dataset 
  2. Extend this method to check whether MI can give unbiased results in just a part of the data (a subsample), where some information is complete and some is imputed 

We will use a visual tool called a directed acyclic graph (DAG) to help researchers apply this method to their own studies. 
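As an illustration of how a DAG that includes missingness can be written down and queried, here is a small sketch. It is not the project's algorithm, which this page does not describe. The variables, the arrows, and the missingness indicator named R_smoking are all hypothetical.

```python
# Hypothetical DAG: each variable maps to its direct causes (parents).
# R_smoking is a missingness indicator: whether smoking was recorded.
parents = {
    "age": [],
    "smoking": ["age"],
    "lung_function": ["smoking", "age"],
    "R_smoking": ["age"],
}

def ancestors(node, parents):
    """Return every variable with a directed path into `node`."""
    seen = set()
    stack = list(parents[node])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen

def is_acyclic(parents):
    """A graph is a DAG if no variable is its own ancestor."""
    return all(node not in ancestors(node, parents) for node in parents)

causes_of_missingness = ancestors("R_smoking", parents)

print("Valid DAG:", is_acyclic(parents))
print("Causes of missing smoking data:", sorted(causes_of_missingness))
print("Outcome causes missingness:", "lung_function" in causes_of_missingness)
```

Listing what causes the missingness, and in particular whether the outcome is among those causes, is the kind of question a DAG makes easy to ask before choosing between CRA and MI.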

What we hope to achieve

By the end of the project, we aim to give researchers clear guidance on when and how to use MI correctly. Our algorithm will help them avoid mistakes that could lead to incorrect conclusions. We’ll also provide real-life examples to show how the method works in practice. Ultimately, this work will improve the quality and reliability of research findings about the risk factors for disease.