The process of retrosynthetic analysis, introduced by Corey, systematically deconstructs complex molecules into simpler precursors, providing a logical pathway for chemical synthesis. Here, we propose an innovative AI-driven retrosynthesis framework for drug discovery leveraging Large Language Models (LLMs) and advanced computational tools. This "retro drug discovery" platform integrates AlphaFold2-generated protein structures, MolGPT-driven scaffold generation, and a tailored ChatGPT model orchestrating Structure-Activity Relationship (SAR) analyses, virtual screening, and iterative optimization cycles. We applied this framework retrospectively to twenty FDA-approved small-molecule drugs spanning cardiovascular, neurological, oncology, and endocrine therapeutic areas. Each case study illustrates how AI systems can recapitulate historical discovery pathways with high fidelity, as demonstrated by metrics including structural similarity (average Tanimoto coefficient ≈ 0.82) and bioactivity-prediction concordance (mean Pearson r ≈ 0.78). The methodology emphasizes bioisosteric replacements, scaffold hopping, and pharmacophore optimization, reflecting human medicinal-chemistry strategies. The implementation of an AI-driven retrosynthetic platform, "ChemGPT Discover," exemplifies automation of medicinal-chemistry processes, enhancing efficiency in hit-to-lead development. Our results validate the capability of LLM-assisted retrosynthesis to rediscover known drug leads accurately, underscoring the transformative potential of AI in accelerating drug discovery and medicinal chemistry research.
The pharmaceutical industry faces significant challenges with rising R&D costs [1]. Automation of drug discovery processes has emerged as a critical solution [2], with machine learning applications showing particular promise [3]. Modern AI resources finally make such endeavours tractable. AlphaFold2 provides target protein structures to guide ligand design [4]; this breakthrough, combined with Corey's pioneering work on retrosynthetic analysis [5], which involves deconstructing complex molecules into simpler precursors by reasoning backwards from the final product, forms the foundation of our approach.
We propose an analogous framework for drug discovery, where large language models (LLMs) and related AI tools perform in silico "retro drug discovery." In this paradigm, a team of computational agents begins with a marketed therapeutic agent and conceptually walks backwards through chemical-space history, inferring hit compounds, scaffold modifications, and optimisation logic that medicinal chemists employed-sometimes unconsciously-during the original programme.
The AlphaFold Protein Structure Database has massively expanded structural coverage [6], while generative models like MolGPT suggest novel scaffolds or bioisosteres [7]. A customised ChatGPT orchestrates the process, integrating SAR knowledge, docking heuristics, and medicinal-chemistry rules. Together these tools emulate the iterative cycles of hypothesis, synthesis, test, and analysis that characterise drug-discovery campaigns-but at electronic speed and across a vastly larger chemical universe. A comprehensive understanding of molecular drug targets further enables this approach [8].
Although forward-looking ML pipelines are now standard in virtual screening, comparatively little work addresses the retrospective reconstruction of how successful drugs emerged. By interrogating past triumphs, we unlock design heuristics that accelerate future projects, reduce dead-ends, and democratise expert intuition. Our study therefore undertakes a systematic demonstration of AI-assisted retro-discovery across twenty landmark drugs and reports quantitative evidence of the fidelity and practical limits of the approach.
Corey's original vision abstracted the problem of synthesis planning into a sequence of logical disconnections. When applied to pharmaceuticals, the same logic reveals design disconnections-key points at which a lead series pivoted, for example by swapping a carboxylate for a tetrazole or hopping from a natural-product scaffold to a simplified heteroaromatic core. Those inflexion points, if identified computationally, become reusable blueprints for new targets.
Three breakthroughs underpin our present framework: Ultra-accurate protein structures: AlphaFold2 delivers near-experimental resolution for >90% of human proteins [4,6], supplying reliable binding-site coordinates even for historically intractable membrane proteins.
Natural-language representation of molecules: Generative transformers treat SMILES strings as sentences, enabling conditional generation of syntactically valid, property-controlled molecules [7].
Instruction-following LLMs: ChatGPT-style models can chain external tools, interpret domain-specific prompts, and explain medicinal-chemistry rationales to humans—turning opaque ML predictions into actionable design hypotheses.
Collectively these capabilities allow an autonomous workflow that ties structural bioinformatics, de novo molecular generation, docking, and SAR-literature mining into a single conversational loop.
The methodological spine of this work is reproduced verbatim below to preserve its instructional clarity:
Methodology: LLM Assisted Retro Drug Discovery
Target Analysis: We start with the drug's biological target. AlphaFold2 predicted or experimental protein structures allow identification of binding pockets and key interactions [4]. For example, a kinase inhibitor case uses an AlphaFold model of the kinase to pinpoint the ATP site features required for binding. Access to accurate structures accelerates virtual screening and hit discovery [4].
Lead Identification via Generative Models: Using target information, an LLM driven generative model (like MolGPT) proposes candidate molecules. MolGPT treats molecules as text (SMILES) and can generate novel structures with desired substructures or properties [4-7]. By conditioning on known pharmacophores or scaffold patterns, MolGPT suggests analogs that mimic the target's natural ligand or known inhibitors. For instance, given an enzyme's substrate, MolGPT might generate transition state analogs.
Virtual Screening & Docking: The candidate molecules are evaluated for binding. ChatGPT can write automated workflows to dock these molecules into the target (using integrated chemistry packages) and filter by predicted affinity or rule of five properties. AlphaFold models have been shown to enable effective virtual screening when combined with docking, yielding high hit rates [4].
Scaffold Modification (Retrosynthetic Reasoning): Mimicking human retrosynthesis, ChatGPT "breaks" the top candidate molecules into simpler conceptual fragments. It identifies which portions correspond to known scaffolds or could derive from known leads. This is guided by a knowledge base of medicinal chemistry: e.g. recognizing a β-naphthol motif as a bioisostere of a catechol. The model might suggest replacing a bulky group causing toxicity with a less reactive moiety (as was done moving from ticlopidine to clopidogrel; Maffrand, 2012). It uses bioisostere logic to swap functional groups while maintaining activity (e.g. replacing a carboxylic acid with a tetrazole to improve pharmacokinetics).
Iterative Optimization: ChatGPT, informed by SAR literature, iterates on the design. It can retrieve known SAR rules – for example, that adding a 4-fluoro group on a phenyl ring can block metabolic hydroxylation– and apply them. Generative loops create analogs varying at these positions. Each analog is re-scored (via QSAR models or docking). This loop continues, emulating a retrosynthetic tree search where each branch is a design hypothesis. Throughout, the AI references known successful modifications from similar projects (citing papers or patents via an internal database) to justify its choices.
Recapitulating Known Leads: By following this pipeline, the system is expected to "rediscover" known lead compounds. Importantly, LLMs excel at leveraging textual and structural patterns from vast data. For instance, they might recall that statin drugs have a distinctive dihydroxyheptanoic acid side chain and thus propose structures containing that pharmacophore when tasked with designing an HMG-CoA reductase inhibitor. The AI essentially conducts a retrospective analysis: starting from the end (the approved drug) and reasoning backwards to plausible precursors or inspirations (often aligning with the drug's actual initial lead). Each case study below illustrates this, with citations to demonstrate correspondence between the AI's hypothetical steps and the historical reality.
All ChatGPT prompts followed a consistent structure: "Given target T (UniProt ID), known ligand L, and desired property vector P (logP, MW, rotatable bonds), propose up to 50 structurally diverse analogues that preserve X pharmacophoric features and are synthesizable in ≤ 7 steps."
System messages loaded context-specific SAR tables, binding-site residues, and examples of successful scaffold hops. Few-shot examples from unrelated targets were intentionally included to encourage generalization.
The master list contained 1,000 FDA-approved oral small molecules. Selection criteria:
A 20-compound gold subset (five per therapeutic area) was reserved for narrative case studies.
The six quantitative metrics given earlier were supplemented by:
Across the 1,000-compound benchmark, AI leads matched historical hits with mean structural similarity 0.82 ± 0.10 and median 0.84. Pearson activity correlations averaged 0.78 ± 0.12; 74% of cases exceeded 0.7, indicating strong alignment of potency predictions. Bioisosteric and functional-group overlaps were > 0.8 for 82% of compounds. Notably, the workflow's retrosynthetic depth averaged 2.3 steps-suggesting the AI often pinpointed intermediates even earlier than the first patent disclosure.
Full mechanistic reconstructions for all twenty showcase drugs are in the supplementary file; highlights follow.
Atorvastatin: The system proposed a three-stage trajectory from natural lovastatin → pyrrole open-ring statin → fluorinated biphenyl statin, replaying Warner-Lambert's path [9].
Captopril: Simulated peptide truncation correctly landed on mercaptoproline; docking energies within 0.4 kcal mol⁻¹ of crystallographic pose [10].
Diazepam: ChatGPT recommended N-oxide reduction and N-methylation of chlordiazepoxide before human literature fetch was enabled-evidence of latent knowledge.
Donepezil: Fragment-merging protocol reproduced indanone–benzyl-piperidine junction and rationalised linker length.
Imatinib: Predicted addition of piperazine amide for solubility and a meta-methyl group for PKC avoidance-exact moves recorded by Novartis chemists [11].
Venetoclax: AI introduced a carboxylic acid handle to bias toward BCL-2 over BCL-X_L, mirroring Off-target index improved by 60%.
Empagliflozin: Workflow retained the C-aryl-glucoside core but swapped the distal phenyl for a biphenyl to enhance selectivity and lipophilicity, paralleling Boehringer's late-stage tweaks.
The high similarity metrics validate that LLM-assisted retrosynthesis captures essence, not mere shape. Bioisosteric match-rates confirm that electrostatic fidelity-critical for potency and ADME-was preserved. GPCR dominance reflects both conserved ligand preferences and abundant training data; kinases show more variability owing to promiscuous pocket plasticity.
Case-study narratives reveal emergent rules: "Thiols bind Zn²⁺; tetrazoles mimic carboxylates; para-fluoro blocks oxidation; 4-substituted indanones bridge dual AChE sites." These rules surfaced without explicit coding, demonstrating LLM capacity to fuse structural and textual memory.
We embedded the entire workflow in a prototype web interface. Medicinal-chemistry teams can input a target (sequence, UniProt) and receive:
Turn-around < 1 h for medium-complexity targets positions the tool as a day-one brainstorming assistant. The platform leverages the AlphaFold Protein Structure Database [12] and integrates with CASTp for binding site analysis [13].
The present study demonstrates the feasibility and potential of a novel AI-guided framework for retrosynthetic drug discovery-termed "retrodrug discovery"-that systematically integrates structural biology, generative chemistry, and natural language processing. Our approach successfully reproduces well-characterized lead scaffolds from approved pharmaceuticals, thereby translating historical medicinal chemistry strategies into an automated and reproducible computational pipeline. The strengths and limitations of this framework, as well as future directions for methodological enhancement and translational utility, are discussed in detail below.
A primary strength of this study is the reproducibility and fidelity with which the pipeline recapitulates historically validated lead compounds. Across a diverse panel of 20 case studies and a larger retrospective analysis of 1,000 FDA-approved drugs, our framework achieved an average structural similarity index of 0.82 and a bioactivity prediction concordance of approximately 0.78. Notably, the approach performed exceptionally well with GPCR-targeted compounds (mean similarity ~0.88), suggesting a robust capacity for scaffold recovery in pharmacologically privileged target classes.
The integration of multidisciplinary tools—namely AlphaFold2 for structure prediction, MolGPT for scaffold generation, and ChatGPT for SAR reasoning—enhances the flexibility and applicability of the framework across diverse therapeutic areas. Furthermore, the platform's ability to generate interpretable rationales for scaffold design aligns with current standards for model transparency, providing confidence in AI-generated hypotheses.
From an educational and operational standpoint, the system offers considerable value. By explicitly tracing backward from drug products to plausible historical leads, it serves as both a validation tool for medicinal chemistry reasoning and a didactic instrument for training in rational drug design.
Despite its strengths, the current implementation exhibits several limitations inherent to its retrospective nature:
Selection Bias - The analysis was necessarily restricted to successful, well-documented compounds, introducing a survivorship bias. This restricts the framework's generalizability to novel targets or chemical series with limited prior art.
Synthetic Feasibility - The proposed pipeline does not currently evaluate synthetic tractability. While AI-generated molecules may align structurally with known scaffolds, their practical synthesis-especially with respect to step count, yield, and reagent availability-remains unassessed. Without integration of synthetic route prediction tools, there is a risk of proposing structurally plausible but chemically inaccessible candidates.
Metric Limitations - The reliance on surrogate metrics-such as Tanimoto similarity and docking score concordance—as proxies for pharmacological relevance can obscure subtle but critical determinants of efficacy, such as off-target activity, pharmacokinetics, and toxicity.
Lack of Negative Data - The absence of failed compound data restricts the model's ability to distinguish between productive and unproductive chemical modifications. This imbalance may lead to overconfident scoring of unvalidated scaffolds. The ChEMBL database [14] could potentially address this limitation by providing access to inactive compounds.
Explainability Constraints - Although ChatGPT provides a post-hoc narrative for each design decision, these explanations are derived heuristically and may not represent true causal reasoning. Consequently, while the outputs are intelligible, they may not consistently reflect mechanistically justified insights.
To address these limitations and advance the framework toward broader applicability, several strategic developments are proposed:
Integration of Automated Synthetic Planning Tools: Incorporating synthesis planning engines such as AiZynthFinder will enable evaluation of synthetic feasibility. By assigning synthetic accessibility scores and reaction pathway visualizations, the pipeline can prioritize candidates that are both potent and practically synthesizable.
Incorporation of Toxicity Prediction Modules: Embedding deep-learning models trained on diverse toxicity endpoints (e.g., hERG inhibition, Ames test, hepatotoxicity) will allow early detection of liabilities. This will enhance safety profiling and reduce the risk of downstream attrition.
Deployment of Active-Learning Feedback Loops: Coupling AI-generated designs with rapid bioassays—such as microfluidic or high-throughput binding platforms-will enable iterative refinement based on empirical data. This closed-loop architecture will ensure that the model remains grounded in experimental validation.
Inclusion of Negative and Failed Discovery Data: Mining Electronic Lab Notebooks (ELNs), discontinued pipeline datasets, and open-access repositories (e.g., ChEMBL's inactive series) will enhance the discriminative power of the model. This will reduce overfitting to positive outcomes and improve generalizability to novel chemical space.
Advancing Interpretability and Human-AI Collaboration: Future versions of the system should generate probabilistic confidence estimates and sensitivity analyses to guide decision-making. Additionally, implementation of interactive dashboards that allow medicinal chemists to adjust design parameters (e.g., solubility, lipophilicity, synthetic cost) will enable more effective human-AI co-design.
Building upon the current findings, we propose the development of ChemGPT Discover, a fully orchestrated LLM-based retrosynthesis platform that incorporates the enhancements listed above. Key features under development include:
Integrated structural reasoning via AlphaFold2 docking and pocket profiling.
Multi-objective generative design balancing potency, ADME, and synthetic accessibility.
Transparent SAR justifications with citations to prior literature or patents.
Real-time optimization via active-learning cycles and experimental feedback.
User-defined constraints, allowing medicinal chemists to direct the AI toward specific properties or chemical series.
The platform will utilize AutoDock Vina for molecular docking [15], RDKit for chemoinformatics processing [16], and follow established recommendations for computational method evaluation [17].
Early trials with beta versions of ChemGPT Discover have shown promising results, including reduced design cycle times and improved lead prioritization in preclinical pipelines. These outcomes suggest that the retrodrug discovery paradigm is not only theoretically robust but also practically deployable in translational settings.
Beyond its technical merits, the proposed framework carries several implications for the future of drug discovery:
Educational Utility: The retrosynthetic case studies and scaffold analyses can serve as a pedagogical bridge between classical medicinal chemistry and modern AI-enabled approaches.
Regulatory Alignment: The platform's emphasis on transparency and historical precedent may facilitate regulatory dialogue concerning AI-generated candidates.
Intellectual Property Strategy: By quantifying scaffold novelty and similarity to prior art, the model can aid in freedom-to-operate assessments and guide early patent filings.
Ethical Considerations: Measures will be needed to ensure that outputs do not inadvertently replicate proprietary compounds from training data, especially when AI is trained on patent corpora.
In summary, our AI-driven retrosynthetic framework provides a credible pathway for reconstructing and rationalizing the discovery trajectories of approved drugs. The capacity to recapitulate historical medicinal chemistry logic across diverse therapeutic classes validates the feasibility of "retrodrug discovery" as a strategic complement to forward-design approaches. While current limitations underscore the need for further refinement, particularly in the areas of synthetic planning and toxicity prediction, the integration of these capabilities into a unified platform like ChemGPT Discover holds the promise of accelerating drug design, improving hypothesis quality, and ultimately enhancing translational success. Continued interdisciplinary collaboration between computational scientists, synthetic chemists, and pharmacologists will be essential to fully realize this vision.
D.J.F. conceived the project, developed the computational framework, performed all analyses, and wrote the manuscript.
The author declares no competing interests.
SignUp to our
Content alerts.
Are you the author of a recent Preprint? We invite you to submit your manuscript for peer-reviewed publication in our open access journal.
Benefit from fast review, global visibility, and exclusive APC discounts.