Large, complex datasets

Reliable and reproducible analysis of large, complex datasets

Theme Translational data science

In this workstream, we use machine learning and other state-of-the-art methods to analyse large-scale linked electronic health records and inform translational research. Translational research takes the results of early-stage research and applies them to humans.

Complex data with many dimensions, such as digital images, can’t be analysed efficiently using standard methods. Machine learning is being rapidly adopted to analyse medical images, but there is a shortage of suitably labelled data for this purpose. This leads to poor accuracy and results that can’t be reproduced.

Issues such as missing data, misclassification and confounding factors can all bias an analysis, giving misleading results. Automated, scalable methods can overcome these issues.

We are developing state-of-the-art methods to address bias in machine learning, alongside a large, labelled dataset of images for evaluating machine learning methods.

We are also developing training in machine learning, including consideration of ethical issues. This work will benefit all the Bristol BRC themes.


Great Western Secure Data Environment

NHS England are developing and deploying a national secure data environment for research. A secure…

Theme Translational data science

Workstream Clinical informatics platforms

Preventing cardiovascular events in stroke patients

Having a stroke means you are more likely to experience a subsequent cardiovascular event. Cardiovascular…

Theme Translational data science

Workstream Genetic evidence to prioritise intervention

Exploring how obesity influences cancer survival

Evidence from different studies suggests that obesity or body mass index (BMI) might play a…

Theme Translational data science

Workstream Genetic evidence to prioritise intervention

Using biomarkers and machine learning to predict antidepressant resistance

Around half of patients with depression don’t improve after taking antidepressants. Clinicians need to…

Theme Translational data science

Workstream Omics for prediction and prognosis

Can DNA methylation biomarkers predict whether pleural effusion is caused by cancer?

Pleural effusion, where fluid builds up in the cavity around the lungs, can develop…

Theme Translational data science

Workstream Omics for prediction and prognosis

Using DNA methylation biomarkers to understand Parkinson’s disease severity and progression

The Biogen Tel Aviv Parkinson Project (BeatPD) looks in depth at clinical and genetic information…

Theme Translational data science

Workstream Omics for prediction and prognosis

Biomarkers for screening and diagnosing lung cancer

In the UK, only 15 per cent of people diagnosed with lung cancer will still…

Theme Translational data science

Workstream Omics for prediction and prognosis

Creating the infrastructure to enable translational data analysis at scale

Our priority is to create the data infrastructure to enable analysis of linked administrative and…

Theme Translational data science

Workstreams Clinical informatics platforms; Large, complex datasets

Data-driven approaches to drug target prioritisation

Despite more money going towards developing drugs, the success rate of getting new drugs to…

Theme Translational data science

Workstream Genetic evidence to prioritise intervention