AI and saliva proteins could help improve early detection of head and neck cancer
Researchers supported by the NIHR Biomedical Research Centre: Bristol have shown that artificial intelligence could help improve the early detection of head and neck cancer using saliva samples. Crucially, they’ve demonstrated that this is possible even when only small numbers of patient samples are available for study.
In a proof-of-concept study, the team combined:
- Deep learning
- Large-scale proteomic data from UK Biobank
- Synthetic data generation
This allowed them to learn cancer-related patterns that were transferable across sample types, i.e. from blood plasma to saliva. Findings from the study suggest this approach could help address some of the biggest challenges in biomarker research, including small datasets, differences between tissue types and imbalanced case numbers.
Proteomics is the large-scale study of proteins. Proteins collected from human blood, saliva and tissues might play a role in cancer detection because they can reflect how a tumour changes as it develops. This makes them valuable candidates for identifying cancers early enough for them to still be treatable.
The researchers used plasma protein data from more than 13,000 cancer cases and nearly 40,000 controls in UK Biobank to train deep learning models. They tested this approach in an independent study of 156 saliva samples from people with and without head and neck cancer.
The resulting model outperformed a range of more traditional machine learning approaches. The team then used deep learning to add 10,000 synthetic cancer samples to the training set and trained a second model to detect cancer. This second model performed substantially better, suggesting that deep learning may be better able to capture the complex, health-related biological relationships reflected in protein data.
Importantly, the study found that the model could detect cancers across multiple stages, including early-stage disease. The authors say this may reflect shared systemic protein signals between blood and saliva, rather than tissue-specific effects alone.
Analysis of the model also highlighted several proteins already linked to cancer biology, including IL6, CXCL13 and CXCL17, adding confidence that the predictions were biologically meaningful as well as statistically robust.
The team says the results should be seen as an encouraging early step rather than a finished clinical tool. The study was limited by:
- The relatively small size of the saliva dataset
- The use of a pre-selected panel of 92 cancer-related proteins
- Training and testing participants who were overwhelmingly White British
The authors note that larger and more diverse studies, including other cancer types and broader molecular measurements, will be needed to confirm how well the approach works across populations and tissues.
Paul Yousefi, Senior Research Fellow at the University of Bristol and senior author of the study, said:
“These findings show that combining large-scale proteomic data with deep learning and carefully generated synthetic data can reveal cancer signals that might otherwise be missed in smaller studies.
“While this is still an early proof-of-concept, it gives us a promising route towards more sensitive, less invasive approaches to cancer detection.”
This work lays important groundwork for future research aimed at improving earlier diagnosis and widening the clinical use of saliva-based cancer testing.