1. Recap: A Cross-Modality “AI in Drug Discovery” Map
In Parts 1–6, we walked through:
- Small molecules × AI
- Antibodies, bispecifics, and ADCs × AI
- Nucleic acid and RNA therapeutics × AI
- Cell and gene therapies × AI
Each time, we looked at the value chain and discussed what AI can do today and what remains difficult. In this final part, we step back to build an integrated view across:
- Phases (from discovery to post-marketing),
- Modalities (small molecules, antibodies, RNA, cell and gene therapies, etc.), and
- Stakeholders (R&D, corporate functions, manufacturing, regulators, investors, consultants).
Our goal is to provide a shared mental model of where AI can realistically add value, and where biology, experiments, and human judgment must remain in the lead.
2. Phase-by-Phase Roadmap: From Discovery to Post-Marketing
We can decompose the end-to-end journey from discovery to market into six phases:
- (1) Target discovery and disease understanding
- (2) Hit identification and lead generation
- (3) Lead optimization and preclinical development
- (4) Clinical development (Phase I–III)
- (5) Approval, pricing, and market access
- (6) Post-marketing (RWE, safety, lifecycle management)
2-1. (1) Target Discovery and Disease Understanding
Here, teams integrate multi-omics, imaging, literature, and clinical evidence to decide where to intervene. AI supports:
- Clustering and factor analysis of large omics datasets (bulk/single-cell RNA-seq, proteomics, etc.).
- Generating and ranking mechanistic hypotheses across human, in vitro, and in vivo data.
- Mapping known targets, modalities, and competitive landscapes across publications, patents, and trial registries.
In this phase, AI’s primary role is to highlight patterns and prioritize hypotheses, not to autonomously “pick the right target.”
2-2. (2) Hit Identification and Lead Generation
Across modalities, AI is used to expand the space of possible candidates while focusing experiments on the most promising ones:
- Small molecules: generative models, virtual screening, and docking surrogates.
- Antibodies and bispecifics: paratope/epitope prediction, sequence generation, immunogenicity-risk scoring.
- RNA/nucleic acids: sequence design, on-/off-target prediction, UTR and modification patterns.
- Cell/gene therapy: CAR/TCR architecture, capsid sequence, promoter design.
The common pattern is to treat AI as a tool for structuring the search space and prioritizing experimental shots on goal.
2-3. (3) Lead Optimization and Preclinical Development
In lead optimization, we try to evolve “potent but problematic” candidates into “potent and acceptable” ones. AI can:
- Perform multi-objective optimization across activity, ADME, safety, developability, and cost.
- Build predictive models linking structure, sequence, formulation, and process conditions to experimental readouts.
- Support human PK/PD predictions and first-in-human dose setting based on integrated preclinical data.
Still, predicting rare serious adverse events or long-term safety in humans remains fundamentally hard. In this phase, AI is a tool for prioritizing and classifying risk, not for guaranteeing safety.
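To make "multi-objective optimization" concrete, here is a minimal sketch of one common building block: Pareto filtering, which keeps only candidates that no other candidate beats on every objective. The candidate names and scores are hypothetical, and we assume every objective has already been scaled so that higher is better. Real pipelines typically add uncertainty estimates and hard constraints on top of this.

```python
from typing import Dict, List

Scores = Dict[str, float]


def dominates(a: Scores, b: Scores, objectives: List[str]) -> bool:
    """True if a is at least as good as b everywhere and strictly better somewhere."""
    at_least_as_good = all(a[k] >= b[k] for k in objectives)
    strictly_better = any(a[k] > b[k] for k in objectives)
    return at_least_as_good and strictly_better


def pareto_front(candidates: Dict[str, Scores], objectives: List[str]) -> List[str]:
    """Return the names of candidates that no other candidate dominates."""
    front = []
    for name, scores in candidates.items():
        dominated = any(
            dominates(other, scores, objectives)
            for other_name, other in candidates.items()
            if other_name != name
        )
        if not dominated:
            front.append(name)
    return sorted(front)


# Hypothetical, illustrative scores (higher is better for every objective).
candidates = {
    "cmpd_A": {"potency": 0.9, "adme": 0.4, "safety": 0.6},
    "cmpd_B": {"potency": 0.7, "adme": 0.7, "safety": 0.7},
    "cmpd_C": {"potency": 0.6, "adme": 0.6, "safety": 0.6},
    "cmpd_D": {"potency": 0.9, "adme": 0.3, "safety": 0.5},
}

front = pareto_front(candidates, ["potency", "adme", "safety"])
print(front)
```

In this toy example, cmpd_C is dominated by cmpd_B and cmpd_D by cmpd_A, so only cmpd_A and cmpd_B survive. The filter does not say which of the two to advance; that trade-off stays with the project team.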
2-4. (4) Clinical Development (Phase I–III)
In clinical phases, AI’s potential lies in:
- Integrating biomarkers, genomics, imaging, and EHR data for patient selection and stratification.
- Monitoring changes in endpoints and safety signals in near real time, informing decisions about adaptation or termination.
- Leveraging external and real-world control data to improve trial-design efficiency.
Given regulatory and statistical constraints, AI output must be framed as decision support for statisticians and clinicians, not as an autonomous driver of trial decisions.
2-5. (5) Approval, Pricing, and Market Access
AI is also increasingly relevant around approval and access:
- Simulating cost-effectiveness and budget impact under different scenarios using trial and RWE data.
- Exploring future scenarios for label expansions and combinations to maximize value.
- Analyzing HTA and reimbursement trends to optimize launch sequences across countries.
Here, AI is best viewed as an engine for running multiple “what-if” scenarios and sensitivity analyses, helping teams understand trade-offs, rather than as a black-box recommendation engine.
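As a small illustration of the "what-if" framing, the sketch below recomputes an incremental cost-effectiveness ratio (ICER, incremental cost divided by incremental QALYs) under a few scenarios. All numbers, scenario names, and the willingness-to-pay threshold are hypothetical placeholders, not values from any real assessment.

```python
from typing import Dict


def icer(delta_cost: float, delta_qaly: float) -> float:
    """Incremental cost-effectiveness ratio: extra cost per extra QALY gained."""
    if delta_qaly <= 0:
        raise ValueError("ICER is only meaningful here for a positive QALY gain")
    return delta_cost / delta_qaly


# Hypothetical base case: new therapy vs. standard of care.
base_case = {"delta_cost": 30000.0, "delta_qaly": 0.5}

# Each scenario overrides one input at a time (one-way sensitivity analysis).
scenarios: Dict[str, Dict[str, float]] = {
    "base": {},
    "lower_incremental_cost": {"delta_cost": 24000.0},
    "lower_benefit": {"delta_qaly": 0.25},
}

# Assumed threshold, purely for illustration.
WILLINGNESS_TO_PAY = 50000.0

results = {}
for name, override in scenarios.items():
    inputs = {**base_case, **override}
    value = icer(**inputs)
    results[name] = value
    verdict = "below" if value <= WILLINGNESS_TO_PAY else "above"
    print(f"{name}: ICER = {value:,.0f} per QALY ({verdict} threshold)")
```

Even this toy version shows the point of the exercise: the output is a table of trade-offs across assumptions, which the team then interprets, rather than a single recommendation.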
2-6. (6) Post-Marketing: RWE, Safety, and Lifecycle Management
After approval, real-world and safety data accumulate rapidly. AI supports:
- Signal detection using AE reports, EHRs, registries, and claims data.
- Analyzing relationships among prescribing patterns, co-medications, and patient characteristics to identify risky usage patterns.
- Evaluating real-world effectiveness and generating hypotheses for label expansions or regimen optimization.
Because RWE involves multiple stakeholders (clinicians, regulators, payers, industry), this may become one of the earliest areas where AI’s role stabilizes and becomes routine.
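Signal detection on spontaneous AE reports often starts with simple disproportionality statistics before any machine learning is involved. A minimal sketch of one such measure, the proportional reporting ratio (PRR), follows. The counts are invented, and the screening thresholds (`min_prr`, `min_cases`) are illustrative assumptions rather than a regulatory standard.

```python
def proportional_reporting_ratio(a: int, b: int, c: int, d: int) -> float:
    """PRR from a 2x2 table of report counts.

    a: reports of the event of interest for the drug of interest
    b: reports of all other events for the drug of interest
    c: reports of the event of interest for all other drugs
    d: reports of all other events for all other drugs
    """
    if a + b == 0 or c + d == 0 or c == 0:
        raise ValueError("PRR is undefined for these counts")
    return (a / (a + b)) / (c / (c + d))


def is_candidate_signal(a: int, b: int, c: int, d: int,
                        min_prr: float = 2.0, min_cases: int = 3) -> bool:
    """Flag a drug-event pair for human review (thresholds are assumptions)."""
    return a >= min_cases and proportional_reporting_ratio(a, b, c, d) >= min_prr


# Invented counts: 20 of 200 reports for the drug mention the event,
# versus 100 of 9,800 reports for all other drugs.
prr = proportional_reporting_ratio(20, 180, 100, 9700)
print(round(prr, 2), is_candidate_signal(20, 180, 100, 9700))
```

A flag from a screen like this is a prompt for pharmacovigilance review, not evidence of causality; reporting biases and confounding still have to be assessed by people.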
3. Strengths and Weaknesses of AI by Modality
Let us now summarize “where AI is strong and where it is weak” across modalities.
3-1. Small Molecules: Particularly Strong in Design and Optimization
- Strengths: generative design, virtual screening, multi-objective optimization for ADME/safety, synthesis-route suggestions.
- Weaknesses: long-term safety, rare toxicities, and complex DDIs in humans.
Thanks to decades of accumulated data, model performance is relatively robust, making small-molecule spaces fertile ground for practical AI deployments.
3-2. Antibodies, Bispecifics, and ADCs: Sequence, Structure, and Developability
- Strengths: sequence design, affinity prediction, immunogenicity assessment, developability screening.
- Weaknesses: holistic in vivo behavior of ADCs that combines linker, payload, and heterogeneous target expression.
Antibodies are structurally and data-wise well suited to AI, and we can expect continued progress here.
3-3. RNA and Nucleic-Acid Therapeutics: Compressing Sequence–Modification–Delivery Space
- Strengths: sequence design, on-/off-target prediction, UTR/modification patterns, LNP formulation search.
- Weaknesses: long-term safety, rare immune events, and whole-body behavior with patient variability.
Because the combinatorial space is enormous, AI’s role in compressing the design space is especially critical here.
3-4. Cell and Gene Therapies: Highly Challenging, Yet Useful in Design and Manufacturing
- Strengths: coarse-grained CAR/TCR/vector design, manufacturing monitoring and anomaly detection, RWD-based risk stratification.
- Weaknesses: long-term tumorigenicity, insertional mutagenesis, and individualized immune dynamics.
Given the “high-dimensional, low-sample” nature of the data, AI is best treated as a tool for visualization and pattern recognition rather than precise prediction in this space.
4. Final Synthesis: What AI Can and Cannot Do
4-1. What AI Is Good At
- Compressing search spaces: filtering vast candidate sets into promising and risky subsets.
- Multi-objective optimization: balancing activity, ADME, safety, manufacturability, and cost.
- Pattern discovery: revealing correlations and clusters that are hard to see with human intuition alone.
- Scenario simulation and sensitivity analysis: running “what-if” scenarios to pinpoint bottlenecks and leverage points.
4-2. What Remains Difficult or Often Misunderstood
- AI is not a universal “automatic drug designer”. It is a collection of tools for partial optimization and decision support, not a replacement for the full discovery and development process.
- Predicting long-term human safety or rare events is fundamentally hard. Overconfidence here is dangerous.
- Extrapolation beyond the training domain is risky. A model trained on data from one platform can lose much of its predictive performance when applied to another.
- AI does not automatically reduce costs. Data infrastructure, talent, and governance often require substantial upfront investment.
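The extrapolation risk above is often handled with an explicit applicability-domain check. One simple version, sketched below for small molecules, asks how similar a query compound is to its nearest neighbor in the training set using Tanimoto similarity on fingerprint bits. The fingerprints here are hypothetical sets of "on" bit indices, and the 0.5 cutoff is an assumption that would need to be calibrated per model.

```python
from typing import List, Set, Tuple


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto (Jaccard) similarity between two sets of fingerprint bits."""
    union = a | b
    if not union:
        return 1.0  # convention: two empty fingerprints are treated as identical
    return len(a & b) / len(union)


def in_domain(query: Set[int], training: List[Set[int]],
              threshold: float = 0.5) -> Tuple[bool, float]:
    """Return (is the query inside the domain?, similarity to nearest training compound)."""
    best = max(tanimoto(query, fp) for fp in training)
    return best >= threshold, best


# Hypothetical fingerprints, represented as sets of "on" bit indices.
training_set = [{1, 2, 3, 4}, {2, 3, 5, 8}]
query_near = {1, 2, 3, 5}
query_far = {10, 11, 12}

print(in_domain(query_near, training_set))
print(in_domain(query_far, training_set))
```

Predictions for compounds that fail a check like this are not necessarily wrong, but they deserve lower trust and earlier experimental confirmation.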
5. A Practical Checklist for Implementing AI in Drug Discovery
In practice, success depends less on which model you choose and more on how you embed AI into the organization. A simplified checklist:
5-1. Data Strategy and Infrastructure
- Have you clearly defined which data should be treated as strategic assets for each modality?
- Are experimental, manufacturing, and clinical data linked in an analyzable way?
- Do you have processes for data quality, missingness, and labeling?
5-2. Model Development and MLOps
- Are each model’s purpose, domain of applicability, and constraints explicitly documented?
- Do you have mechanisms for monitoring drift and versioning models over time?
- Is there a structured forum where domain experts review model outputs?
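For the drift-monitoring question, one lightweight option is the population stability index (PSI), which compares how a model input or score is distributed in a reference dataset versus recent data. The sketch below is a minimal standard-library version. The bin edges, the example values, and the small `eps` floor for empty bins are all assumptions chosen for illustration.

```python
import math
from bisect import bisect_right
from typing import List, Sequence


def bin_fractions(values: Sequence[float], edges: Sequence[float],
                  eps: float = 1e-6) -> List[float]:
    """Fraction of values per bin; empty bins are floored at eps to avoid log(0)."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect_right(edges, v)] += 1
    total = len(values)
    return [max(c / total, eps) for c in counts]


def psi(reference: Sequence[float], current: Sequence[float],
        edges: Sequence[float]) -> float:
    """Population stability index between a reference and a current sample."""
    ref = bin_fractions(reference, edges)
    cur = bin_fractions(current, edges)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


# Hypothetical assay readouts: training-era data vs. a recent batch.
edges = [2.5, 4.5, 6.5]
reference = [1, 2, 3, 4, 5, 6, 7, 8]
recent_batch = [5, 6, 7, 7, 8, 8, 8, 8]

print(psi(reference, reference, edges))
print(psi(reference, recent_batch, edges))
```

An identical distribution gives a PSI of zero, and larger values indicate a bigger shift. Which value should trigger a review or retraining is a governance decision to document alongside the model.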
5-3. Governance and Accountability
- Is it clear how much to trust each model and who holds final responsibility for decisions?
- Can you explain model usage and limitations in a way that withstands regulatory scrutiny?
- If using external AI vendors, are data use, IP, and liability clearly defined in contracts?
5-4. Talent and Culture
- Do domain experts (medicinal chemists, biologists, clinicians, etc.) and data scientists collaborate frequently?
- Is the culture oriented toward using AI effectively, rather than “outsourcing thinking to AI”?
- Do you have a learning loop that captures both successes and failures and feeds them back into practice?
6. How Investors and Consultants Can Assess “AI in Drug Discovery”
For investors and consultants, distinguishing substance from hype is critical. Useful questions include:
- Beyond algorithms, what is the depth and uniqueness of data and experimental infrastructure?
- Is there a coherent strength in specific modalities (small molecules, antibodies, RNA, cell/gene therapy), rather than generic claims?
- Can the team articulate concrete KPIs for what AI has improved, both qualitatively and quantitatively?
- Is there a credible story from lab to market, including regulatory, manufacturing, and supply-chain considerations?
- Is the talent portfolio balanced among scientific, data, and business expertise?
Focusing only on the “AI” label can be misleading. The durable differentiation often lies in data design, deep modality understanding, and integrated operations that tie AI tightly to experiments and decision-making.
My Thoughts and Future Outlook
Across this series, a central theme emerges: “AI in drug discovery” is not a single technology, but rather a set of tools applied differently across modalities and phases. For small molecules and antibodies, AI is already being used at a practical level in design and optimization. For RNA, cell, and gene therapies, data structure and sample-size limitations mean AI is, for now, mainly a tool for exploration efficiency and risk stratification rather than precise prediction.
Looking ahead over the next decade, we can expect more multi-modal foundation models and deeper integration with automated labs, accelerating the cycle of design → experiment → analysis → redesign. When that happens, competitive advantage will likely depend less on who deploys the flashiest models and more on who has been systematically designing their data and AI workflows from an early stage: which data they collect, at what granularity, and in which parts of the pipeline. AI in drug discovery is, beneath the hype, a long-term game of cumulative learning. The differences built up quietly today will crystallize into meaningful cross-modality advantages several years from now.
I hope this series helps researchers, pharma employees, corporate functions, investors, and consultants establish a shared baseline for discussing what to expect from AI in drug discovery—and where to remain cautious. The next step is local: defining, for each organization, which AI use cases make the most sense as a first step given its modalities, pipeline, and structure, and then implementing them in an iterative, learning-oriented way.
This article has been edited by the Morningglorysciences team.