new publication

AI-generated synthetic data for cancer research and clinical trials

24 February 2026

Clinical cancer research is facing two major challenges: restricted data sharing and rapidly rising clinical trial costs. Patient datasets are often confined to individual institutions, and transferring them requires complex legal and administrative procedures. At the same time, clinical trials are becoming more expensive. Recruitment is difficult, particularly in precision oncology, where therapies target small molecular subgroups. This results in prolonged development timelines, higher failure rates and the need for large, resource-intensive control groups.

Eckardt, JN., Hahn, W., Prelaj, A., Bornhäuser, M., Middeke, JM., Kather, JN.: Artificial intelligence-generated synthetic data for cancer research and clinical trials. Nat Rev Cancer (2026). https://doi.org/10.1038/s41568-026-00912-4

In the recently published review article “Artificial intelligence-generated synthetic data for cancer research and clinical trials” in Nature Reviews Cancer, researchers critically examine whether AI-generated synthetic data could help address both challenges. Synthetic data, produced using advanced artificial intelligence models, are gaining traction in healthcare research, especially in high-stakes fields such as haematology and oncology.

A possible solution? AI-generated synthetic data

Synthetic data are AI-generated datasets that closely replicate the statistical patterns and behavior of real patient data without being exact copies of individual records. In practice, researchers obtain a cohort of synthetic patients that behaves like the original cohort. Such synthetic cohorts can be tailored to meet specific eligibility criteria in clinical trials and may therefore augment or, under certain conditions, even substitute traditional control groups.

However, synthetic data are not a universal solution. The authors highlight that the lack of standardization in training data selection, model evaluation, bias mitigation, privacy preservation and quality assurance remains a major challenge, limiting the reliability and safe application of synthetic data.

To address this, the review outlines clear design standards: defining the specific purpose of data generation upfront, carefully selecting and preprocessing variables, training and comparing multiple generative models with hyperparameter optimization, benchmarking synthetic data against real-data references for fidelity and utility, conducting bias and privacy audits, filtering medically implausible records, and ensuring transparent documentation and independent oversight.

With robust validation and responsible oversight, synthetic data could become a powerful enabler of data sharing, scientific collaboration and trial design.
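Three of the quality checks named in the design standards (benchmarking fidelity against real data, a simple privacy audit, and filtering medically implausible records) can be illustrated in a few lines of Python. This is a minimal sketch under our own assumptions, not the authors' method: the two-sample Kolmogorov-Smirnov statistic stands in as a fidelity measure, the distance to the closest real record stands in as a privacy check, and the variable names, plausibility ranges and toy values are all hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

# A tabular patient record: variable name -> numeric value (hypothetical schema).
Record = Dict[str, float]

# Hypothetical plausibility rules (variable -> allowed [low, high]); in practice
# such rules would be defined together with clinical experts.
PLAUSIBLE_RANGES: Dict[str, Tuple[float, float]] = {
    "age": (18.0, 100.0),
    "hemoglobin_g_dl": (3.0, 20.0),
}


def filter_implausible(records: List[Record],
                       ranges: Dict[str, Tuple[float, float]] = PLAUSIBLE_RANGES) -> List[Record]:
    """Keep only synthetic records whose values all fall inside the allowed ranges."""
    return [r for r in records
            if all(lo <= r[k] <= hi for k, (lo, hi) in ranges.items())]


def ks_statistic(real: Sequence[float], synth: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic for one variable: the largest gap
    between the empirical CDFs of real and synthetic values (0 = identical)."""
    a, b = sorted(real), sorted(synth)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    # Step through the pooled values in ascending order, advancing past ties
    # in both samples before measuring the gap between the two CDFs.
    while i < n and j < m:
        x = min(a[i], b[j])
        while i < n and a[i] <= x:
            i += 1
        while j < m and b[j] <= x:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d


def distance_to_closest_record(synth: List[Record], real: List[Record],
                               keys: Sequence[str]) -> List[float]:
    """For each synthetic record, the Euclidean distance to its nearest real
    record. A distance of 0 means a real patient was reproduced verbatim.
    (Variables are not rescaled here; an actual audit would normalize them first.)"""
    return [min(math.dist([s[k] for k in keys], [r[k] for k in keys]) for r in real)
            for s in synth]


if __name__ == "__main__":
    keys = ["age", "hemoglobin_g_dl"]
    # Toy values for illustration only, not patient data.
    real = [{"age": 54, "hemoglobin_g_dl": 12.1},
            {"age": 67, "hemoglobin_g_dl": 10.4},
            {"age": 71, "hemoglobin_g_dl": 11.8}]
    synth = [{"age": 58, "hemoglobin_g_dl": 11.9},
             {"age": 67, "hemoglobin_g_dl": 10.4},   # exact copy of a real record
             {"age": 140, "hemoglobin_g_dl": 12.0}]  # implausible age

    kept = filter_implausible(synth)
    print("records kept after plausibility filter:", len(kept))
    for k in keys:
        ks = ks_statistic([r[k] for r in real], [s[k] for s in kept])
        print(f"KS statistic for {k}: {ks:.2f}")
    dcr = distance_to_closest_record(kept, real, keys)
    print("exact copies of real patients:", sum(d == 0.0 for d in dcr))
```

A KS statistic near 0 indicates closely matching distributions for a single variable, and a distance of 0 flags a synthetic record that copies a real patient. A full validation would typically also examine joint distributions, downstream utility and bias across subgroups, which this sketch leaves out.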