1. Why Small Molecules Became the Primary Testbed for AI
When you scan AI-in-drug-discovery papers and case studies, the first thing you notice is how many of them focus on small molecules. This is not just because small-molecule R&D has a long history; there are structural reasons:
- Chemical structures can be represented as SMILES or graphs, which are convenient for machine learning.
- Decades of activity data from HTS and profiling campaigns are available.
- QSAR and related methods are already embedded in workflows, making AI feel like an incremental extension rather than a radical change.
- Automation in synthesis and screening enables fast Design–Make–Test–Analyze (DMTA) cycles that AI can help orchestrate.
In short, small-molecule projects offer rich data, mature representations, and room for large efficiency gains, which makes them a natural testbed for AI. In this part, we focus on how AI is used from library design and hit finding through lead optimization.
2. The Small-Molecule Workflow and Where AI Fits
A typical small-molecule discovery workflow includes:
- (1) Target selection and validation
- (2) Library design
- (3) Hit finding (HTS / virtual screening)
- (4) Hit-to-lead and series selection
- (5) Lead optimization (MPO, ADMET optimization, candidate selection)
Step (1) is closely tied to the omics and network analysis discussed in earlier parts. Here, we zoom in on steps (2)–(5).
2-1. Library Design
In library design, AI helps to:
- Assess chemical space coverage of existing libraries.
- Learn “good chemistry” and “structures to avoid” from historical successes and failures.
- Design focused libraries tailored to particular targets, pathways, or modalities.
The goal is to move from purely diversity-driven libraries toward information-rich libraries aligned with disease and target strategy.
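As a rough illustration of the coverage assessment mentioned above, coverage can be estimated as the fraction of a reference set that has a sufficiently similar neighbour in the library. This is a minimal sketch, not a production method: fingerprints are represented as plain sets of "on" bit indices (in practice they would come from a cheminformatics toolkit), and the function names, toy data, and the 0.5 similarity threshold are all assumptions chosen for illustration.

```python
from typing import FrozenSet, List

# Stand-in for a real molecular fingerprint: the set of "on" bit indices.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity between two bit sets."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def coverage(library: List[Fingerprint],
             reference: List[Fingerprint],
             threshold: float = 0.5) -> float:
    """Fraction of reference compounds with a library neighbour
    whose similarity is at or above `threshold` (an assumed cutoff)."""
    if not reference:
        return 0.0
    covered = sum(
        1 for ref in reference
        if any(tanimoto(ref, lib) >= threshold for lib in library)
    )
    return covered / len(reference)


# Toy data, invented for illustration only.
library = [frozenset({1, 2, 3, 4}), frozenset({10, 11, 12})]
reference = [
    frozenset({1, 2, 3, 5}),      # close to the first library member
    frozenset({20, 21}),          # no similar library member
    frozenset({10, 11, 12, 13}),  # close to the second library member
]

print(f"coverage = {coverage(library, reference):.2f}")
```

A low coverage value against a target-relevant reference set is one possible signal that a focused library, rather than more general diversity, is what is missing.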
2-2. Hit Finding: HTS and Virtual Screening
During hit finding, AI is used in several ways:
- Pre-screening libraries virtually to narrow down candidates before HTS.
- Re-scoring noisy HTS data to recover plausible hits and reduce false positives/negatives.
- Combining docking with AI-based rescoring to leverage both physics-based and data-driven signals.
The primary value here is not “massively increasing hit rates” but reducing cost and time while minimizing missed opportunities.
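One simple way to combine a physics-based signal with a data-driven one is rank averaging, sketched below. This is only one of many possible consensus schemes and is not presented as the standard approach. The compound IDs and scores are invented, the convention that lower docking scores are better is an assumption (it depends on the docking program), and `ranks` and `consensus_order` are hypothetical helper names.

```python
from typing import Dict, List


def ranks(scores: Dict[str, float], higher_is_better: bool) -> Dict[str, int]:
    """Map each compound ID to its rank (1 = best).
    Tied scores keep their insertion order, a simplification."""
    ordered = sorted(scores, key=scores.get, reverse=higher_is_better)
    return {cid: pos for pos, cid in enumerate(ordered, start=1)}


def consensus_order(docking: Dict[str, float],
                    ml_prob: Dict[str, float]) -> List[str]:
    """Order compounds by the mean of their docking rank and ML rank.
    Both dictionaries must contain the same compound IDs."""
    if docking.keys() != ml_prob.keys():
        raise ValueError("docking and ml_prob must score the same compounds")
    dock_rank = ranks(docking, higher_is_better=False)  # assumed: lower = better
    ml_rank = ranks(ml_prob, higher_is_better=True)
    mean_rank = {cid: (dock_rank[cid] + ml_rank[cid]) / 2 for cid in docking}
    # Break ties on the ID so the output is deterministic.
    return sorted(mean_rank, key=lambda cid: (mean_rank[cid], cid))


# Invented scores for four hypothetical compounds.
docking = {"cpd_a": -9.1, "cpd_b": -7.4, "cpd_c": -8.3, "cpd_d": -6.0}
ml_prob = {"cpd_a": 0.35, "cpd_b": 0.80, "cpd_c": 0.85, "cpd_d": 0.10}

print(consensus_order(docking, ml_prob))
```

Working with ranks rather than raw scores sidesteps the fact that docking energies and model probabilities live on different scales, at the cost of discarding information about how large the score gaps are.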
2-3. Hit-to-Lead and Series Selection
Once hits are identified, teams face questions such as:
- Which scaffolds are worth deep investment?
- Which series has greater long-term optimization potential?
- Which scaffolds should be dropped early because of safety or ADMET risk?
AI supports these decisions by assessing not only potency but also predicted ADMET, off-target profiles, and synthetic accessibility, helping teams choose series with a better chance of surviving downstream MPO.
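A multi-criteria comparison of this kind can be sketched with a Pareto filter: a series is set aside only if another series is at least as good on every criterion and strictly better on at least one. The example below is illustrative only. The series names and numbers are invented, and it assumes every criterion has already been rescaled so that higher is better (for example, an ADMET score where 1.0 means low predicted risk).

```python
from typing import Dict, List

Profile = Dict[str, float]  # criterion name -> value, higher is better


def dominates(a: Profile, b: Profile) -> bool:
    """True if `a` is at least as good as `b` everywhere and better somewhere."""
    return all(a[k] >= b[k] for k in a) and any(a[k] > b[k] for k in a)


def pareto_front(series: Dict[str, Profile]) -> List[str]:
    """Names of series that no other series dominates."""
    return sorted(
        name for name, profile in series.items()
        if not any(
            dominates(other, profile)
            for other_name, other in series.items()
            if other_name != name
        )
    )


# Invented profiles for three hypothetical series.
series = {
    "series_1": {"potency": 8.0, "admet": 0.3, "synth": 0.5},
    "series_2": {"potency": 7.0, "admet": 0.8, "synth": 0.7},
    "series_3": {"potency": 6.5, "admet": 0.6, "synth": 0.6},
}

print(pareto_front(series))
```

Here the most potent series and the better-balanced series both survive, while the third is worse than `series_2` on every criterion. The filter narrows the choice but leaves the final trade-off to the project team.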
2-4. Lead Optimization and MPO
In lead optimization, AI primarily strengthens the Design and Analyze phases of the DMTA loop:
- Suggesting “what to make next” using generative models and predictive models.
- Computing composite scores that integrate potency, selectivity, ADMET, and synthetic feasibility.
- Visualizing structure–activity relationships (SAR) and highlighting promising directions or red flags.
The key is not to let AI “blindly search” but to embed human strategy and constraints into the search process.
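One common pattern for a composite score is to map each property onto a 0-to-1 desirability and then take the geometric mean, so that a single unacceptable property pulls the whole score to zero. The sketch below assumes simple linear ramps. The property names, the "worst" and "best" limits, and the example compound are invented for illustration and would need to reflect the project's own strategy and constraints.

```python
import math
from typing import Dict, Tuple


def desirability(value: float, worst: float, best: float) -> float:
    """Linear ramp from 0 at `worst` to 1 at `best`, clipped to [0, 1].
    Use worst > best for properties where lower values are preferred."""
    if worst == best:
        raise ValueError("worst and best must differ")
    frac = (value - worst) / (best - worst)
    return max(0.0, min(1.0, frac))


def mpo_score(props: Dict[str, float],
              ranges: Dict[str, Tuple[float, float]]) -> float:
    """Geometric mean of per-property desirabilities."""
    if not ranges:
        raise ValueError("at least one property range is required")
    scores = [
        desirability(props[name], worst, best)
        for name, (worst, best) in ranges.items()
    ]
    if min(scores) == 0.0:
        return 0.0  # one unacceptable property vetoes the compound
    return math.exp(sum(math.log(s) for s in scores) / len(scores))


# Hypothetical (worst, best) limits chosen by the project team.
RANGES = {
    "pIC50": (5.0, 8.0),        # higher is better
    "logS": (-6.0, -4.0),       # higher (more soluble) is better
    "clearance": (50.0, 10.0),  # lower is better
}

compound = {"pIC50": 6.5, "logS": -5.0, "clearance": 30.0}
print(f"MPO score = {mpo_score(compound, RANGES):.2f}")
```

The ranges are where human strategy enters: tightening or loosening a limit changes what the search treats as acceptable without touching the underlying models.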
3. Representative AI Approaches for Small Molecules
Let us group the main AI approaches used for small molecules by their role.
3-1. “Reinvented QSAR”: Modern Predictive Models
While QSAR is an old concept, the toolset has evolved:
- Molecular descriptors + classical ML (random forests, gradient boosting, etc.).
- Molecular graphs + GNNs.
- SMILES- or sequence-based Transformers and related deep models.
These models power tasks such as:
- On-target potency prediction.
- Off-target and safety flag prediction.
- ADMET property prediction (solubility, permeability, clearance, etc.).
They are applied across hit, hit-to-lead, and lead optimization stages.
3-2. Structure-Based Methods: Docking Plus AI
When protein structures are available, AI can augment docking and molecular dynamics:
- Re-scoring docking poses using AI to improve ranking quality.
- Learning binding affinity from protein–ligand complexes and 3D features.
- Clustering pocket properties to suggest new binding modes or allosteric sites.
The combination of physics-based and data-driven components aims to blend the strengths of both, while mitigating their respective weaknesses.
3-3. De Novo Design and Generative Models
Generative models are used to propose new molecules. Common use cases include:
- Local exploration around existing series, guided by SAR.
- Scaffold hopping to bridge gaps in chemical space and IP space.
- Conditional generation under target and property constraints.
In practice, only a small fraction of generated molecules will ever be synthesized. Therefore, strong filters for chemical validity, synthetic feasibility, and safety risk are essential.
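The filtering step can be sketched as an ordered cascade in which each generated candidate is rejected at the first filter it fails, and the reason is recorded. This is a minimal sketch under stated assumptions: the `Candidate` fields are hypothetical and would in practice be filled in by a structure parser, a synthetic accessibility model, and a structural-alert checker. The 4.5 cutoff and all example values are invented.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Candidate:
    ident: str            # hypothetical identifier for a generated molecule
    is_valid: bool        # assumed output of a structure parser
    sa_score: float       # assumed synthetic accessibility score (lower = easier)
    alerts: List[str] = field(default_factory=list)  # assumed safety alerts


# Ordered cascade: cheap, decisive checks first. The cutoff is illustrative.
FILTERS: List[Tuple[str, Callable[[Candidate], bool]]] = [
    ("validity", lambda c: c.is_valid),
    ("synthetic_feasibility", lambda c: c.sa_score <= 4.5),
    ("safety_alerts", lambda c: not c.alerts),
]


def triage(candidates: List[Candidate]) -> Tuple[List[Candidate], Dict[str, str]]:
    """Return the candidates that pass every filter, plus a map from
    rejected identifier to the name of the first filter it failed."""
    kept: List[Candidate] = []
    rejected: Dict[str, str] = {}
    for cand in candidates:
        failed = next((name for name, ok in FILTERS if not ok(cand)), None)
        if failed is None:
            kept.append(cand)
        else:
            rejected[cand.ident] = failed
    return kept, rejected


# Invented examples, one per outcome.
candidates = [
    Candidate("gen_001", True, 2.1),
    Candidate("gen_002", False, 0.0),  # score unused for invalid structures
    Candidate("gen_003", True, 6.8),
    Candidate("gen_004", True, 3.0, ["nitro_aromatic"]),
]

kept, rejected = triage(candidates)
print([c.ident for c in kept])
print(rejected)
```

Recording why each molecule was rejected is a deliberate choice here: the rejection counts per filter show whether the generator is mostly producing invalid, impractical, or risky structures, which is useful feedback for tuning it.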
3-4. Retrosynthesis and Synthetic Feasibility
AI also addresses the question “How do we make this?”:
- Retrosynthesis models that propose synthetic routes.
- Synthetic accessibility scores to filter out impractical designs.
- Ranking routes with an eye on protecting-group strategies and scale-up.
This helps avoid spending DMTA cycles on molecules that are attractive on paper but impractical in the lab or at manufacturing scale.
4. Practical Use Cases
Below are some illustrative scenarios of the kind encountered in practice.
- Use Case 1: Pre-HTS Virtual Triage
Pre-score libraries based on models trained on public and internal assays to decide which plates to screen.
→ Reduce HTS cost and time while preserving hit diversity.
- Use Case 2: Series Selection in Hit-to-Lead
Compare series not only by potency but also by predicted ADMET, off-target risk, and synthetic feasibility.
→ Prioritize series with better long-term MPO potential, not just short-term numbers.
- Use Case 3: MPO Navigation
Visualize trends in properties as functions of structural changes, highlighting directions that tend to harm solubility or increase toxicity.
→ Complement chemist intuition and reduce unproductive exploration.
- Use Case 4: Early Safety Flagging
Use historical safety and off-target data to flag high-risk chemotypes early.
→ Reduce the likelihood of late-stage surprises and costly resets.
- Use Case 5: Integrating Chemical and IP Space
Combine patent and structure data to identify “open” regions of chemical space with attractive properties.
→ Align discovery strategy with IP strategy using AI as a joint navigator.
5. What AI Can and Cannot Do (Yet) for Small Molecules
Small-molecule projects have provided many early AI success stories, but the limitations are equally clear.
5-1. AI Is Very Good at “Failing Fast”
Today, AI’s strongest contribution is often helping teams fail fast:
- Eliminating obviously high-risk structures early.
- Flagging compounds whose predicted ADMET profiles are almost certainly unworkable.
- Avoiding repeatedly investing in known “dead-end” scaffolds.
This allows teams to focus resources on candidates with better odds, which can have a substantial impact on timelines and cost.
5-2. AI Does Not Yet Own Mechanism Discovery
What AI does not yet do reliably is independently discover entirely new mechanisms or targets. In practice:
- Humans still define the hypothesis space (e.g., which protein domain to address, which pathway to modulate).
- AI explores and optimizes within that hypothesis space.
Of the two jobs – designing the hypothesis space and exploring it – AI is excellent at the second, but the first still relies heavily on deep biological understanding and human creativity.
5-3. Downstream Realities: Synthesis, Scale-Up, and Formulation
Even when AI proposes attractive molecules, hard constraints remain:
- Are synthetic routes realistic for bench chemists?
- Will scale-up produce acceptable impurity profiles and stability?
- Are formulation and manufacturing costs compatible with the intended market?
Ignoring these leads to an accumulation of designs that look ideal in silico but are not actionable in real-world development.
6. Setting KPIs and Expectations: R&D, Corporate, and Investors
How you judge success for small-molecule AI projects depends heavily on your KPIs.
- R&D teams
• Shorter DMTA cycle times per iteration.
• More “meaningful learning” per compound synthesized and assayed.
• Reduced overall cost to reach candidate selection.
- Corporate and DX functions
• Productivity differences between AI-enabled and non-AI projects across the portfolio.
• Degree of reuse across projects (platform vs. one-off tools).
• Talent mix between chemists and data scientists.
- Investors and consultants
• Depth of AI integration relative to peers, and quality and uniqueness of data assets.
• For AI-focused companies: advantages in data access and integration, not just algorithms.
• For pharma: concrete project outcomes, not just AI spending.
Without clear KPIs, organizations risk treating “using AI” as an end in itself, making it difficult to assess impact a few years down the line.
My Thoughts and Future Outlook
A recurring pattern in successful small-molecule AI programs is an early, explicit separation between what AI is good at and what it should not be asked to do. AI is extremely good at ranking, filtering, and prioritizing – essentially, helping teams fail fast and focus on promising directions – but it is not yet a replacement for target selection or mechanistic insight. The most productive teams treat AI as a powerful search engine inside a hypothesis space that humans define, not as an oracle that will generate strategy from scratch.
As generative and foundation models evolve, the next frontier will be how tightly they are coupled to synthesis, scale-up, and formulation realities. Platforms that bring chemistry, process development, and manufacturing constraints closer to the “design loop” are likely to build a long-term edge. In the next parts of this series, we will look at other modalities – antibodies, peptides, nucleic acids, and cell and gene therapies – and examine where lessons from small molecules transfer, and where we must fundamentally rethink design principles for AI-driven discovery.
This article has been edited by the Morningglorysciences team.