David Joshua Ferguson*
Volume 6, Issue 5
Dates: Received: 2025-05-16 | Accepted: 2025-05-24 | Published: 2025-05-28
Pages: 556-562
Abstract
The process of retrosynthetic analysis, introduced by Corey, systematically deconstructs complex molecules into simpler precursors, providing a logical pathway for chemical synthesis. Here, we propose an innovative AI-driven retrosynthesis framework for drug discovery leveraging Large Language Models (LLMs) and advanced computational tools. This "retro drug discovery" platform integrates AlphaFold2-generated protein structures, MolGPT-driven scaffold generation, and a tailored ChatGPT model orchestrating Structure-Activity Relationship (SAR) analyses, virtual screening, and iterative optimization cycles. We applied this framework retrospectively to twenty FDA-approved small-molecule drugs spanning cardiovascular, neurological, oncological, and endocrine therapeutic areas. Each case study illustrates how AI systems can recapitulate historical discovery pathways with high fidelity, as demonstrated by metrics including structural similarity (average Tanimoto coefficient ≈ 0.82) and bioactivity-prediction concordance (mean Pearson r ≈ 0.78). The methodology emphasizes bioisosteric replacements, scaffold hopping, and pharmacophore optimization, reflecting human medicinal-chemistry strategies. The implementation of an AI-driven retrosynthetic platform, "ChemGPT Discover," exemplifies automation of medicinal-chemistry processes, enhancing efficiency in hit-to-lead development. Our results validate the capability of LLM-assisted retrosynthesis to rediscover known drug leads accurately, underscoring the transformative potential of AI in accelerating drug discovery and medicinal chemistry research.
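The two evaluation metrics named in the abstract can be illustrated with a minimal sketch. This is not the article's implementation: it assumes molecular fingerprints are represented as sets of on-bit indices, and the function names (`tanimoto`, `pearson_r`) and all toy values are hypothetical, chosen only to show how each metric is computed.

```python
from math import sqrt
from typing import Sequence, Set


def tanimoto(fp_a: Set[int], fp_b: Set[int]) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    if not fp_a and not fp_b:
        # Convention chosen for this sketch: two empty fingerprints score 0.
        return 0.0
    shared = len(fp_a & fp_b)
    # |A ∩ B| / (|A| + |B| - |A ∩ B|)
    return shared / (len(fp_a) + len(fp_b) - shared)


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between predicted and measured activity values."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally sized samples with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / sqrt(var_x * var_y)


# Hypothetical toy data: on-bit indices for a generated and a reference molecule.
generated_fp = {1, 4, 7, 9, 12}
reference_fp = {1, 4, 7, 9, 15}
similarity = tanimoto(generated_fp, reference_fp)  # 4 shared bits out of 6 distinct

# Hypothetical predicted vs. measured activities (e.g. pIC50 values).
predicted = [6.1, 7.3, 8.0, 5.2]
measured = [6.0, 7.0, 8.4, 5.5]
concordance = pearson_r(predicted, measured)
```

In practice the fingerprints would come from a cheminformatics toolkit such as RDKit, which the article's reference list cites; the set-based form is used here only to keep the sketch dependency-free.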
DOI: 10.37871/jbres2110
Copyright
© 2025 Ferguson DJ. Distributed under Creative Commons CC-BY 4.0.
How to cite this article
Ferguson DJ. AI-Driven Retrosynthesis Framework for Drug Discovery: The Use of LLMs. J Biomed Res Environ Sci. 2025 May 28;6(5):556-562. doi: 10.37871/jbres2110. Article ID: JBRES2110. Available at: https://www.jelsciences.com/articles/jbres2110.pdf
References
- DiMasi JA, Grabowski HG, Hansen RW. Innovation in the pharmaceutical industry: New estimates of R&D costs. J Health Econ. 2016 May;47:20-33. doi: 10.1016/j.jhealeco.2016.01.012. Epub 2016 Feb 12. PMID: 26928437.
- Schneider G. Automating drug discovery. Nat Rev Drug Discov. 2018 Feb;17(2):97-113. doi: 10.1038/nrd.2017.232. Epub 2017 Dec 15. PMID: 29242609.
- Vamathevan J, Clark D, Czodrowski P, Dunham I, Ferran E, Lee G, Li B, Madabhushi A, Shah P, Spitzer M, Zhao S. Applications of machine learning in drug discovery and development. Nat Rev Drug Discov. 2019 Jun;18(6):463-477. doi: 10.1038/s41573-019-0024-5. PMID: 30976107; PMCID: PMC6552674.
- Jumper J, Evans R, Pritzel A, Green T, Figurnov M, Ronneberger O, Tunyasuvunakool K, Bates R, Žídek A, Potapenko A, Bridgland A, Meyer C, Kohl SAA, Ballard AJ, Cowie A, Romera-Paredes B, Nikolov S, Jain R, Adler J, Back T, Petersen S, Reiman D, Clancy E, Zielinski M, Steinegger M, Pacholska M, Berghammer T, Bodenstein S, Silver D, Vinyals O, Senior AW, Kavukcuoglu K, Kohli P, Hassabis D. Highly accurate protein structure prediction with AlphaFold. Nature. 2021 Aug;596(7873):583-589. doi: 10.1038/s41586-021-03819-2. Epub 2021 Jul 15. PMID: 34265844; PMCID: PMC8371605.
- Corey EJ. General methods for the construction of complex molecules. Pure Appl Chem. 1967;14:19-37. doi: 10.1351/pac196714010019.
- Varadi M, Anyango S, Deshpande M, Nair S, Natassia C, Yordanova G, Yuan D, Stroe O, Wood G, Laydon A, Žídek A, Green T, Tunyasuvunakool K, Petersen S, Jumper J, Clancy E, Green R, Vora A, Lutfi M, Figurnov M, Cowie A, Hobbs N, Kohli P, Kleywegt G, Birney E, Hassabis D, Velankar S. AlphaFold Protein Structure Database: massively expanding the structural coverage of protein-sequence space with high-accuracy models. Nucleic Acids Res. 2022 Jan 7;50(D1):D439-D444. doi: 10.1093/nar/gkab1061. PMID: 34791371; PMCID: PMC8728224.
- Bagal V, Aggarwal R, Vinod PK, Priyakumar UD. MolGPT: Molecular Generation Using a Transformer-Decoder Model. J Chem Inf Model. 2022 May 9;62(9):2064-2076. doi: 10.1021/acs.jcim.1c00600. Epub 2021 Oct 25. PMID: 34694798.
- Santos R, Ursu O, Gaulton A, Bento AP, Donadi RS, Bologa CG, Karlsson A, Al-Lazikani B, Hersey A, Oprea TI, Overington JP. A comprehensive map of molecular drug targets. Nat Rev Drug Discov. 2017 Jan;16(1):19-34. doi: 10.1038/nrd.2016.230. Epub 2016 Dec 2. PMID: 27910877; PMCID: PMC6314433.
- Roth BD. The discovery and development of atorvastatin, a potent novel hypolipidemic agent. Prog Med Chem. 2002;40:1-22. doi: 10.1016/s0079-6468(08)70080-8. PMID: 12516521.
- Cushman DW, Ondetti MA. History of the design of captopril and related inhibitors of angiotensin converting enzyme. Hypertension. 1991;17:589-592.
- Capdeville R, Buchdunger E, Zimmermann J, Matter A. Glivec (STI571, imatinib), a rationally developed, targeted anticancer drug. Nat Rev Drug Discov. 2002 Jul;1(7):493-502. doi: 10.1038/nrd839. PMID: 12120256.
- AlphaFold Protein Structure Database.
- Tian W, Chen C, Lei X, Zhao J, Liang J. CASTp 3.0: computed atlas of surface topography of proteins. Nucleic Acids Res. 2018 Jul 2;46(W1):W363-W367. doi: 10.1093/nar/gky473. PMID: 29860391; PMCID: PMC6031066.
- Mendez D, Gaulton A, Bento AP, Chambers J, De Veij M, Félix E, Magariños MP, Mosquera JF, Mutowo P, Nowotka M, Gordillo-Marañón M, Hunter F, Junco L, Mugumbate G, Rodriguez-Lopez M, Atkinson F, Bosc N, Radoux CJ, Segura-Cabrera A, Hersey A, Leach AR. ChEMBL: towards direct deposition of bioassay data. Nucleic Acids Res. 2019 Jan 8;47(D1):D930-D940. doi: 10.1093/nar/gky1075. PMID: 30398643; PMCID: PMC6323927.
- Eberhardt J, Santos-Martins D, Tillack AF, Forli S. AutoDock Vina 1.2.0: New Docking Methods, Expanded Force Field, and Python Bindings. J Chem Inf Model. 2021 Aug 23;61(8):3891-3898. doi: 10.1021/acs.jcim.1c00203. Epub 2021 Jul 19. PMID: 34278794; PMCID: PMC10683950.
- RDKit: Open-source cheminformatics.
- Jain AN, Nicholls A. Recommendations for evaluation of computational methods. J Comput Aided Mol Des. 2008 Mar-Apr;22(3-4):133-9. doi: 10.1007/s10822-008-9196-5. Epub 2008 Mar 13. PMID: 18338228; PMCID: PMC2311385.