Short Commentary

Synthetic Data Generation in Biomedical Research: Opportunities, Methods and Applications of Generative Adversarial Networks

Journal of Biomedical Research & Environmental Sciences

Article Details

Open Access
Article Type: Short Commentary
Subject: Biology Group
Author: Marco Parrillo
Issue: Volume 7, Issue 6
Pages: 1-7
Received: 2026-05-23
Accepted: 2026-06-02
Published: 2026-06-03

Abstract

The exponential growth of biomedical data, combined with increasingly stringent privacy regulations such as the Health Insurance Portability and Accountability Act (HIPAA) and the General Data Protection Regulation (GDPR), has created a significant bottleneck in the development of artificial intelligence (AI) and machine learning (ML) models for clinical and translational research. Synthetic data generation has emerged as a promising solution, enabling researchers to produce statistically realistic datasets that preserve the distributional properties of real patient data without exposing sensitive information. This commentary argues that methods based on Generative Adversarial Networks (GANs), and the Conditional Tabular GAN (CTGAN) in particular, represent a practical and scalable pathway for privacy-preserving biomedical AI: they outperform classical anonymisation techniques in downstream ML utility, handle the mixed tabular data types that are pervasive in clinical records, and generalise across oncology, genomics, clinical trial simulation, and electronic health record synthesis. We review the theoretical foundations and practical applications of synthetic data generation in biomedical contexts, with a particular focus on GANs and CTGAN, and discuss emerging approaches including diffusion-based generative models and federated synthetic data generation. We examine key use cases, outline methodological considerations for validating the fidelity and utility of generated datasets, and address critical limitations including privacy leakage risks, model bias, and unresolved ethical and regulatory questions. Our analysis demonstrates that GAN-based approaches can produce synthetic biomedical records that support downstream ML tasks with accuracy comparable to that of models trained on real data, opening a viable pathway toward privacy-preserving, data-rich biomedical research.

Copyright

© 2026 Parrillo M. Distributed under the Creative Commons Attribution 4.0 (CC BY 4.0) license.

How to cite this article

Parrillo M. Synthetic Data Generation in Biomedical Research: Opportunities, Methods and Applications of Generative Adversarial Networks. J Biomed Res Environ Sci. 2026 June 03; 7(6): 7. doi: 10.37872/jbres2304
