Large, complex datasets

Reliable and reproducible analysis of large, complex datasets

Theme Translational data science

In this workstream, we use machine learning and other state-of-the-art methods to analyse large-scale linked electronic health records to inform translational research. Translational research takes the results of early-stage research and applies them to humans.

Complex, high-dimensional data, such as digital images, can’t be analysed efficiently using standard methods. Machine learning is being rapidly adopted to analyse medical imaging, but suitably labelled data for this purpose are lacking. This leads to poor accuracy and results that can’t be reproduced.

Automated, scalable methods can overcome issues such as missing data, misclassification and confounding factors. All these issues can bias analysis, giving misleading results.

We are developing state-of-the-art methods to address bias in machine learning, alongside a large, labelled dataset of images for evaluating machine learning methods.

We are also developing training in machine learning, including consideration of ethical issues. This work will benefit all the Bristol BRC themes.


Reducing bias in research: Building better tools to combine study results

When researchers want to know whether something causes a health outcome, like whether a vitamin…

Theme Translational data science

Workstream Large, complex datasets

Predicting mental illness risk using health records

Serious mental illnesses like bipolar disorder, suicidal thoughts, and post-traumatic stress disorder (PTSD) can…

Theme Translational data science

Workstream Large, complex datasets

Guidelines for using multiple imputation without bias in studies

In health research, it’s common to have missing information in study data. This missing…

Theme Translational data science

Workstream Large, complex datasets

Improving decisions on what to focus on in research using large datasets

Research using de-personalised data from electronic health records is increasingly common. Electronic…

Theme Translational data science

Workstream Large, complex datasets

Combining data and AI to predict heart problems following Covid

Electronic health records contain a wealth of information that has the potential to be…

Theme Translational data science

Workstreams Clinical informatics platforms; Large, complex datasets

Do ethnicity and coexisting health conditions impact high-risk diabetes?

About a third of people diagnosed with type 2 diabetes have very high blood sugar…

Theme Translational data science

Workstreams Clinical informatics platforms; Large, complex datasets

Handling missing data in large electronic healthcare record datasets

Electronic healthcare records (EHRs) are created when healthcare professionals record information about the health of…

Theme Translational data science

Workstream Large, complex datasets