new publication

AI-generated synthetic data for cancer research and clinical trials

24 February 2026

Clinical cancer research is facing two major challenges: restricted data sharing and rapidly rising clinical trial costs. Patient datasets are often confined to individual institutions, and transferring them requires complex legal and administrative procedures. At the same time, clinical trials are becoming more expensive. Recruitment is difficult, particularly in precision oncology, where therapies target small molecular subgroups. This results in prolonged development timelines, higher failure rates and the need for large, resource-intensive control groups.

Eckardt, JN., Hahn, W., Prelaj, A., Bornhäuser, M., Middeke, JM., Kather, JN.: Artificial intelligence-generated synthetic data for cancer research and clinical trials. Nat Rev Cancer (2026). https://doi.org/10.1038/s41568-026-00912-4

In the recently published review article “Artificial intelligence-generated synthetic data for cancer research and clinical trials” in Nature Reviews Cancer, researchers critically examine whether AI-generated synthetic data could help address both challenges. Synthetic data, produced using advanced artificial intelligence models, are gaining traction in healthcare research, especially in high-stakes fields such as haematology and oncology.

A possible solution? AI-generated synthetic data

Synthetic data are AI-generated datasets that closely replicate the statistical patterns and behavior of real patient data without being exact copies of individual records. In practice, researchers obtain a cohort of synthetic patients that behaves like the original cohort. Such synthetic cohorts can be tailored to meet specific eligibility criteria in clinical trials and may therefore augment or, under certain conditions, even substitute for traditional control groups. However, synthetic data are not a universal solution. The authors highlight that the lack of standardization in training data selection, model evaluation, bias mitigation, privacy preservation and quality assurance remains a major challenge, limiting the reliability and safe application of synthetic data. The review outlines clear design standards: defining the specific purpose of data generation upfront, carefully selecting and preprocessing variables, training and comparing multiple generative models with hyperparameter optimization, benchmarking synthetic data against real-data references for fidelity and utility, conducting bias and privacy audits, filtering medically implausible records, and ensuring transparent documentation and independent oversight. With robust validation and responsible oversight, synthetic data could become a powerful enabler of data sharing, scientific collaboration and trial design.
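
To make two of these checks more concrete, the following is a minimal Python sketch of filtering medically implausible records and benchmarking the fidelity of one variable against a real-data reference. It is an illustration under stated assumptions, not the procedure used in the review: the variable names, the plausibility bounds and the choice of the two-sample Kolmogorov-Smirnov statistic as a fidelity measure are all hypothetical, and the values are toy numbers rather than patient data.

```python
from bisect import bisect_right

# Hypothetical plausibility bounds (assumption for illustration only);
# real limits would have to be set by clinical experts.
PLAUSIBLE_RANGES = {
    "age_years": (18, 100),
    "hemoglobin_g_dl": (3.0, 20.0),
}


def is_plausible(record, ranges=PLAUSIBLE_RANGES):
    """Return True if every bounded variable lies within its allowed range."""
    return all(low <= record[name] <= high for name, (low, high) in ranges.items())


def filter_implausible(records, ranges=PLAUSIBLE_RANGES):
    """Keep only the synthetic records that pass all plausibility bounds."""
    return [record for record in records if is_plausible(record, ranges)]


def ks_statistic(real_values, synthetic_values):
    """Two-sample Kolmogorov-Smirnov statistic.

    This is the largest gap between the two empirical cumulative
    distribution functions: 0 means identical, 1 means fully separated.
    """
    real_sorted = sorted(real_values)
    synth_sorted = sorted(synthetic_values)
    return max(
        abs(
            bisect_right(real_sorted, x) / len(real_sorted)
            - bisect_right(synth_sorted, x) / len(synth_sorted)
        )
        for x in real_sorted + synth_sorted
    )


if __name__ == "__main__":
    # Toy values, invented purely for illustration.
    real = [
        {"age_years": 61, "hemoglobin_g_dl": 10.4},
        {"age_years": 47, "hemoglobin_g_dl": 12.1},
        {"age_years": 72, "hemoglobin_g_dl": 9.3},
    ]
    synthetic = [
        {"age_years": 58, "hemoglobin_g_dl": 11.0},
        {"age_years": 230, "hemoglobin_g_dl": 9.8},  # implausible age
        {"age_years": 69, "hemoglobin_g_dl": 10.1},
    ]

    cleaned = filter_implausible(synthetic)
    gap = ks_statistic(
        [r["age_years"] for r in real],
        [r["age_years"] for r in cleaned],
    )
    print(f"kept {len(cleaned)} of {len(synthetic)} synthetic records")
    print(f"KS statistic for age_years: {gap:.2f}")
```

A single-variable comparison like this covers only a small part of fidelity. Utility, bias and privacy audits, which the review also calls for, would need their own dedicated checks.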
