1. Why “Antibodies & Biologics × AI” Is Different from Small Molecules
In Part 3, we focused on small-molecule projects. In Part 4, we turn to antibodies and other biologics (including bispecifics, antibody–drug conjugates, and fusion proteins). The assumptions here are quite different from those in small-molecule chemistry, and AI is asked to play a different set of roles.
Antibodies and biologics have several distinctive characteristics:
- Structural complexity: large proteins with hundreds to more than a thousand amino acids (a full-length IgG is roughly 150 kDa), multi-domain and multimeric structures, and post-translational modifications such as glycans.
- Tight coupling between sequence, structure, and function: minor sequence changes can dramatically alter affinity, specificity, and stability.
- High importance of developability: aggregation, viscosity, immunogenicity, expression, and manufacturability impose stringent constraints.
- Data are primarily sequence- and structure-based: unlike small molecules, the core representations are amino-acid sequences and 3D protein structures.
AI is therefore used to support sequence design, affinity maturation, specificity tuning, and developability prediction. Because this work is tightly linked to experimental systems, manufacturing, and regulation, teams need to decide deliberately where and how AI is applied.
2. The Biologics Discovery Workflow and Where AI Fits
Let us first outline a typical antibody/biologics discovery workflow and identify where AI is commonly used.
- (1) Target selection and epitope strategy.
- (2) Antibody generation (e.g., hybridoma, phage display, B-cell screening).
- (3) Affinity maturation and specificity optimization (sequence engineering).
- (4) Developability engineering (aggregation, viscosity, immunogenicity, expression, etc.).
- (5) CMC, scale-up, and formulation development.
Today, AI is most commonly used in steps (2)–(4), but applications in (1) (epitope strategy) and (5) (process optimization) are also emerging.
2-1. AI in Antibody Generation (Step 2)
During antibody generation, AI helps in several ways:
- Clustering sequences from phage display or B-cell repertoires to prioritize promising clones.
- Combining sequence/structure and epitope information to select diverse clones targeting relevant epitopes.
- Comparing candidates against public antibody databases to assess novelty and IP risk.
The emphasis shifts from “test as many clones as possible and sort it out later” to “use AI to focus experimental effort on the most informative clones.”
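The clone-prioritization idea above can be illustrated with a minimal sketch. It assumes each clone is represented only by its heavy-chain CDR3 string, uses a simple mismatch count as the distance, and picks the most enriched clone per cluster. The clone IDs, sequences, enrichment values, and the `max_dist` threshold are all hypothetical; a real pipeline would likely use proper alignment, paired chains, and a learned similarity.

```python
from typing import Dict, List


def hamming(a: str, b: str) -> int:
    """Mismatched positions; any length difference is counted as mismatches."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


def greedy_cluster(cdr3_by_clone: Dict[str, str], max_dist: int = 2) -> List[List[str]]:
    """Put each clone into the first cluster whose seed is within max_dist."""
    clusters: List[List[str]] = []
    for clone_id, seq in cdr3_by_clone.items():
        for members in clusters:
            seed_seq = cdr3_by_clone[members[0]]  # first member acts as seed
            if hamming(seq, seed_seq) <= max_dist:
                members.append(clone_id)
                break
        else:
            clusters.append([clone_id])  # no close seed found: start a new cluster
    return clusters


def pick_representatives(clusters: List[List[str]], enrichment: Dict[str, float]) -> List[str]:
    """Choose the most enriched clone from each cluster for follow-up."""
    return [max(members, key=lambda c: enrichment[c]) for members in clusters]


# Hypothetical CDR3 sequences and selection enrichment values.
clones = {
    "c1": "ARDYYGSSYFDY",
    "c2": "ARDYYGSTYFDY",
    "c3": "ARGGWLRPFDY",
    "c4": "ARDYYGSSYFDV",
    "c5": "ARGGWLRPFDH",
}
enrichment = {"c1": 12.0, "c2": 30.5, "c3": 8.1, "c4": 3.3, "c5": 9.7}

clusters = greedy_cluster(clones, max_dist=2)
representatives = pick_representatives(clusters, enrichment)
print(clusters)
print(representatives)
```

Testing one representative per cluster, rather than every clone, is one simple way to spend experimental effort on sequence diversity instead of near-duplicates.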
2-2. Affinity Maturation and Specificity Optimization (Step 3)
This is one of the areas where AI most clearly adds value. Typical uses include:
- Proposing CDR mutations that jointly consider affinity and specificity.
- Predicting affinity, cross-reactivity, and polyspecificity risk from sequence.
- Comparing candidate epitopes using AI-based scoring to guide epitope focus.
Historically, large mutational libraries and extensive screening dominated this step. Now, AI-guided mutational proposals and focused library design are increasingly used to reduce wet-lab burden.
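As a rough illustration of focused library design, the sketch below enumerates single substitutions at chosen CDR positions and keeps only the top-scoring few. The `toy_score` function is a placeholder standing in for a trained affinity or specificity model, and the parent sequence, positions, and library size are hypothetical.

```python
from typing import Callable, Iterable, List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def single_mutants(parent: str, positions: Iterable[int]) -> List[Tuple[str, str]]:
    """Enumerate every single substitution at the given 0-based positions."""
    variants = []
    for pos in positions:
        wild_type = parent[pos]
        for aa in AMINO_ACIDS:
            if aa != wild_type:
                label = f"{wild_type}{pos + 1}{aa}"  # 1-based label, e.g. S5Y
                variants.append((label, parent[:pos] + aa + parent[pos + 1:]))
    return variants


def focused_library(
    parent: str,
    positions: Iterable[int],
    score_fn: Callable[[str], float],
    size: int,
) -> List[Tuple[str, str]]:
    """Score all single mutants and keep the `size` highest-scoring ones."""
    scored = [(score_fn(seq), label, seq) for label, seq in single_mutants(parent, positions)]
    scored.sort(key=lambda item: item[0], reverse=True)  # stable: ties keep enumeration order
    return [(label, seq) for _, label, seq in scored[:size]]


def toy_score(seq: str) -> float:
    """Placeholder score (assumption): NOT a real affinity predictor."""
    return seq.count("Y") + 0.5 * seq.count("W")


parent_cdr = "GFTFSSYA"  # hypothetical CDR sequence
library = focused_library(parent_cdr, positions=[4, 5], score_fn=toy_score, size=3)
for label, seq in library:
    print(label, seq)
```

Here 38 possible single mutants are reduced to a three-member library; in practice the scoring model, not the enumeration, determines how useful the reduction is.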
2-3. Developability Engineering (Step 4)
For biologics, developability is a critical determinant of success. AI models are used to predict:
- Aggregation propensity, viscosity, and stability in solution.
- Expression levels and ease of purification.
- Immunogenicity risk (e.g., T-cell epitopes).
- Non-specific binding and self-reactivity.
By scoring these properties early at the sequence-design stage, teams can filter out hard-to-develop candidates before they consume substantial resources.
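A very simple form of early sequence-level screening is motif-based liability flagging, sketched below. The motifs used (N-X-S/T glycosylation sequons, NG/NS deamidation sites, DG/DS isomerization sites) are commonly flagged sequence liabilities, but the hydrophobic residue set and the 0.6 threshold are illustrative assumptions, and the candidate sequences are hypothetical. Trained developability models would go well beyond this kind of rule.

```python
import re
from typing import Dict, List

# Lookaheads are used so that overlapping motif occurrences are all reported.
LIABILITY_PATTERNS = {
    "n_glycosylation": re.compile(r"(?=N[^P][ST])"),  # N-X-S/T sequon, X != P
    "deamidation": re.compile(r"(?=N[GS])"),
    "isomerization": re.compile(r"(?=D[GS])"),
}
HYDROPHOBIC = set("AVILMFW")  # illustrative choice of residues (assumption)


def find_liabilities(seq: str) -> Dict[str, List[int]]:
    """Return 1-based start positions of each liability motif found."""
    hits: Dict[str, List[int]] = {}
    for name, pattern in LIABILITY_PATTERNS.items():
        positions = [m.start() + 1 for m in pattern.finditer(seq)]
        if positions:
            hits[name] = positions
    return hits


def hydrophobic_fraction(seq: str) -> float:
    """Fraction of residues in the (assumed) hydrophobic set."""
    return sum(aa in HYDROPHOBIC for aa in seq) / len(seq)


def passes_screen(seq: str, max_hydrophobic: float = 0.6) -> bool:
    """Pass only if no motif is flagged and hydrophobicity is under threshold."""
    return not find_liabilities(seq) and hydrophobic_fraction(seq) <= max_hydrophobic


# Hypothetical CDR sequences.
candidates = {"v1": "ARDYYGSSYFDY", "v2": "ARNGSYWFDY", "v3": "AVILLFWVIY"}
kept = [name for name, seq in candidates.items() if passes_screen(seq)]
print(find_liabilities(candidates["v2"]))
print(kept)
```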
3. Biologics-Specific Data Types and AI Models
The dominant data types for antibodies and biologics differ from those for small molecules.
3-1. Sequence Data (Amino-Acid Sequences)
For antibodies, heavy- and light-chain sequences are the primary inputs.
- Framework regions (FR) and complementarity-determining regions (CDRs).
- V(D)J recombination patterns, isotypes, and subclasses.
- Position-specific amino-acid frequencies and mutational tolerance.
AI models here are typically sequence-based deep learning models (LSTMs, Transformers, and protein language models). Many are pre-trained on large antibody datasets and fine-tuned for specific projects.
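Position-specific amino-acid frequencies, mentioned in the list above, can be computed directly from aligned sequences. The sketch below assumes the sequences are already aligned to equal length; the example sequences and the `min_freq` cutoff for calling a residue "tolerated" are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def position_frequencies(aligned: List[str]) -> List[Dict[str, float]]:
    """Per-column amino-acid frequencies for equal-length aligned sequences."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("sequences must be non-empty and aligned to equal length")
    n = len(aligned)
    profile = []
    for column in zip(*aligned):  # iterate over alignment columns
        counts = Counter(column)
        profile.append({aa: count / n for aa, count in counts.items()})
    return profile


def tolerated(profile: List[Dict[str, float]], position: int, min_freq: float = 0.2) -> List[str]:
    """Residues seen at `position` (0-based) with frequency >= min_freq."""
    return sorted(aa for aa, freq in profile[position].items() if freq >= min_freq)


# Hypothetical aligned CDR sequences.
aligned = ["GFTFSSYA", "GFTFSDYA", "GYTFSSYG", "GFTFSNYA"]
profile = position_frequencies(aligned)
print(profile[1])
print(tolerated(profile, 5))
```

A profile like this is a crude proxy for mutational tolerance; protein language models can be seen as learning a much richer, context-dependent version of the same kind of statistic.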
3-2. Structural and 3D Information
3D structures of antibodies and antigens are essential for epitope–paratope analysis.
- Structures of antibodies alone (e.g., Fab, scFv).
- Complexes of antibodies with antigens.
- Glycans and other modifications, flexibility, and conformational changes.
Advances in structure prediction now allow us to generate approximate 3D models from sequence and derive structural features as model inputs. AI can then learn from binding modes and contact patterns to support affinity and specificity prediction.
3-3. Developability Data
Experimental data related to developability provide valuable labels for AI:
- Thermal stability (Tm) and onset of aggregation.
- Viscosity and particle counts at high concentration.
- Expression yields and purification recovery.
- In vitro and in vivo clearance and distribution.
Linking these measurements to sequence and structural features enables models to learn patterns associated with poor developability. However, these datasets are often company-specific, limited in size, and biased toward certain platforms or formats.
4. Representative AI Use Patterns for Antibodies & Biologics
We can group common AI use patterns in antibodies and biologics into several categories.
4-1. Sequence Generation and Optimization
Generative models and protein language models are used to explore and optimize sequence space.
- Learning CDR sequence distributions from existing antibodies and sampling “natural-looking” variants.
- Performing conditional generation under affinity or developability constraints.
- Transforming sequences to increase “human-likeness” (e.g., humanization and de-immunization).
Because these models can generate huge numbers of candidates, effective multi-stage filtering with predictive models is critical in practice.
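One way to picture multi-stage filtering is as a funnel that applies cheap checks first and records how many candidates survive each stage. The sketch below is only illustrative: the stage names, the predicates, and the candidate sequences are hypothetical, and the last stage is a toy stand-in for a predictive model.

```python
from typing import Callable, List, Sequence, Tuple

Stage = Tuple[str, Callable[[str], bool]]


def run_funnel(candidates: Sequence[str], stages: Sequence[Stage]) -> Tuple[List[str], List[Tuple[str, int]]]:
    """Apply each stage in order and log the number of survivors."""
    survivors = list(candidates)
    log = [("generated", len(survivors))]
    for name, keep in stages:
        survivors = [seq for seq in survivors if keep(seq)]
        log.append((name, len(survivors)))
    return survivors, log


# Hypothetical stages, ordered from cheapest to most expensive.
stages: List[Stage] = [
    ("valid_length", lambda s: 8 <= len(s) <= 20),
    ("no_free_cysteine", lambda s: "C" not in s),
    ("toy_affinity", lambda s: s.count("Y") + s.count("W") >= 3),  # placeholder model
]

generated = ["ARDYYGSSYFDY", "ARCYYGSSYFDY", "ARDY", "ARDGGGSSGFDG", "ARDWYGSSYFDY"]
survivors, log = run_funnel(generated, stages)
print(log)
print(survivors)
```

Keeping the per-stage counts makes it easy to see which filter removes the most candidates and whether a threshold is too aggressive.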
4-2. Affinity and Specificity Prediction
In affinity maturation, only a small fraction of possible variants can be tested experimentally, so AI-driven prioritization is valuable.
- Predicting affinity using sequence-only models.
- Incorporating structural information from antibody and complex models.
- Scoring off-target binding and cross-reactivity risk.
Absolute predictions remain challenging, but AI models are increasingly reliable for ranking candidate mutations relative to one another.
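The difference between absolute accuracy and ranking quality can be made concrete with a rank correlation. The sketch below implements Spearman's coefficient with the standard library and compares it with mean absolute error on hypothetical measured and predicted values; the numbers are invented so that the predictions are on the wrong scale but in the right order.

```python
from typing import List, Sequence


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1  # extend over a run of tied values
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical values: measured affinities vs. model scores in arbitrary units.
measured = [7.1, 8.3, 6.5, 9.0, 7.8]
predicted = [0.2, 0.9, 0.1, 1.5, 0.6]

rho = spearman(measured, predicted)
mae = sum(abs(p - m) for p, m in zip(predicted, measured)) / len(measured)
print(f"Spearman rho = {rho:.2f}, mean absolute error = {mae:.2f}")
```

In this toy case the absolute error is large while the rank correlation is perfect, which is the situation in which a model is still useful for choosing which mutations to test next.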
4-3. Developability Prediction and Design
For developability, AI is particularly strong at early detection of “bad actors.”
- Learning features associated with high aggregation or viscosity and flagging risky sequences.
- Estimating the impact of specific mutations on stability and expression.
- Combining immunogenicity prediction with sequence design to reduce T-cell epitope burden.
Many organizations now run AI-based developability screens before and after candidate selection to reduce late-stage surprises.
4-4. Complex Modalities: Bispecifics, ADCs, and Fusion Proteins
For more complex modalities – bispecifics, ADCs, cytokine or receptor fusion proteins – AI can contribute along several axes:
- Designing linkers and junctions (ADC linkers, fusion junctions).
- Balancing affinity and specificity of each arm in a bispecific format.
- Balancing potency, duration, and safety for immune-modulating biologics.
Because these modalities have many moving parts, AI is useful for structuring the design space and extracting design rules. However, limited data and high complexity mean that hybrid approaches combining physics, expert knowledge, and AI are still essential.
5. What AI Can and Cannot Do (Yet) for Biologics
While AI has clear value in biologics, its limitations differ in nature from those seen with small molecules.
5-1. Strengths: Exploring Sequence Space and Failing Fast
Areas where AI tends to perform reliably include:
- Efficiently exploring and ranking variants around existing clones.
- Early elimination of sequences with clear developability red flags.
- Supporting humanization and de-immunization with candidate sequence suggestions.
As with small molecules, AI is especially effective for prioritization and failure acceleration, not for magically producing perfect designs in one shot.
5-2. Remaining Challenges: Epitope Strategy and Immune-Complex Modeling
Some tasks remain largely human- and physics-driven:
- Choosing epitope strategies that are biologically and clinically meaningful.
- Modeling immune complexes, Fc functions, and interactions with effector cells.
- High-precision prediction of rare events such as severe immune-related adverse events.
Future advances in structure prediction, multi-scale simulation, and integrated clinical data may extend AI’s reach here, but in the near term fully automated design remains unrealistic.
5-3. Data Bias and Generalization
Antibody datasets often over-represent certain targets and epitopes, so generalization to novel targets is not guaranteed. Internal developability datasets may be biased toward specific platforms and formats, making cross-project reuse non-trivial. Explicit checks for bias and generalizability are therefore crucial.
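One explicit generalization check is to hold out all data for one target at a time, so that a model is always evaluated on a target it never saw in training. The sketch below only builds the index splits and reports how much of the dataset each target contributes; the target labels are hypothetical, and model training and scoring are left out.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterator, List, Tuple


def group_shares(groups: List[str]) -> Dict[str, float]:
    """Fraction of records per group, to surface over-represented targets."""
    counts = Counter(groups)
    return {g: c / len(groups) for g, c in counts.items()}


def leave_one_group_out(groups: List[str]) -> Iterator[Tuple[str, List[int], List[int]]]:
    """Yield (held_out_group, train_indices, test_indices) for each group."""
    by_group: Dict[str, List[int]] = defaultdict(list)
    for idx, group in enumerate(groups):
        by_group[group].append(idx)
    for held_out in sorted(by_group):
        test_idx = by_group[held_out]
        train_idx = [i for i in range(len(groups)) if groups[i] != held_out]
        yield held_out, train_idx, test_idx


# Hypothetical target label for each antibody record in a dataset.
targets = ["target_A", "target_A", "target_B", "target_B", "target_B", "target_C"]

shares = group_shares(targets)
splits = list(leave_one_group_out(targets))
print(shares)
for held_out, train_idx, test_idx in splits:
    print(held_out, train_idx, test_idx)
```

Comparing performance on these held-out targets with performance on a random split gives a rough sense of how much of a model's apparent accuracy depends on having seen the target before. The same grouping can be applied to platforms or formats for developability data.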
6. KPIs and Expectations for Antibodies & Biologics × AI
Finally, how should organizations measure the value of AI in this space?
- R&D and antibody engineering teams
• Reduced number of variants and experiments per affinity-maturation round.
• Fewer reworks triggered by late-discovered developability issues.
• Greater sequence-space coverage with the same headcount and timelines.
- Corporate functions, CMC, and manufacturing
• Shifts in the proportion of project kills attributable to developability issues.
• Earlier detection of formulation and manufacturing problems in development.
• Adoption of platform-level frameworks for antibody engineering and developability assessment.
- Investors and consultants
• Not just “using AI,” but owning reusable data and models for antibody/biologics design.
• Ability to integrate AI into complex modalities such as bispecifics and ADCs.
• Observable improvements in success rates and throughput across the biologics pipeline.
Clear KPIs help ensure that AI is evaluated as a means to improve the overall quality and efficiency of the pipeline, not an end in itself.
My Thoughts and Future Outlook
In antibodies and biologics, AI has enormous potential, but expectations can easily become misaligned with reality because this space is even more tightly coupled to experimental systems, manufacturing, and regulatory requirements than small molecules. AI is undeniably strong at exploring sequence space and flagging developability risks, yet high-level design questions – epitope strategy, immune-system orchestration, clinical positioning – still rely heavily on human hypothesis generation. Making these role boundaries explicit early on can help teams avoid frustration and “AI fatigue.”
As protein language models, structure prediction, and simulation continue to advance, we may be able to model the chain from sequence to structure to function to clinical outcomes in a more integrated way. If that happens, the primary bottleneck will likely be not the algorithms but how systematically organizations capture and integrate experimental and manufacturing data. In the next parts of this series, we will turn to nucleic-acid medicines and cell and gene therapies, and ask to what extent AI can provide shared design principles across these increasingly complex modalities.
This article has been edited by the Morningglorysciences team.