From Beginner to Expert: AI in Drug Discovery – A Definitive Guide from Lab to Market (Part 6: “Cell & Gene Therapy × AI”)

This installment maps how AI is used across cell and gene therapy, from CAR-T/TCR and vector design to manufacturing and quality control, and clarifies both its potential and its current limitations in these complex modalities.

1. Why “Cell & Gene Therapy × AI” Is Both Difficult and Important

In Part 6, we focus on cell therapies (e.g., CAR-T, TCR-T) and gene therapies (e.g., AAV or lentiviral vectors). Compared with small molecules, antibodies, or RNA therapeutics, these modalities are challenging for several reasons:

  • The therapeutic unit is a living cell or vector, with many sources of variability.
  • The intervention acts at the genomic level, raising concerns about long-term and irreversible effects.
  • The manufacturing process is effectively part of the therapy, and batch-to-batch and patient-to-patient variability is high.
  • Safety must account not only for acute toxicity but also for insertional mutagenesis, tumorigenicity, and complex immune events over time.

AI is beginning to show value in three key areas: design, manufacturing & quality, and data integration. At the same time, we are far from any realistic notion of fully automated AI-designed therapies; the key is to be very specific about which parts of the problem AI should tackle.

2. The Cell & Gene Therapy Value Chain and Where AI Fits

Although different platforms have their own specifics, we can abstract a common value chain as follows:

  • (1) Disease and target selection (target cells/tissues, antigens, genes).
  • (2) Therapeutic concept design (CAR/TCR design, gene-transfer strategy, etc.).
  • (3) Vector and construct design (promoters, enhancers, cassettes, etc.).
  • (4) Cell manufacturing and gene-transfer process design and optimization.
  • (5) Non-clinical evaluation (in vitro and in vivo models).
  • (6) Clinical trials and real-world data collection.
  • (7) Commercial manufacturing, quality control, and supply chain.

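AI is currently applied most actively in steps (2)–(4) and (6), with growing use in target selection (1) and in commercial manufacturing and quality control (7).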

2-1. Target and Concept Design (Steps 1–2)

For CAR-T/TCR targets and gene-therapy target tissues or cells, AI supports:

  • Integration of single-cell RNA-seq, spatial transcriptomics, and proteomics to quantify differences between tumor and normal tissues.
  • Estimating on-target off-tumor risk by modeling antigen expression patterns across tissues.
  • Positioning the therapeutic concept within the existing treatment landscape and lines of therapy.

AI does not “decide” the target autonomously; it acts as a filter that uses multi-omics evidence to highlight promising candidates and flag hazardous ones.
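To make the “filter” role concrete, here is a minimal Python sketch of rule-based triage for CAR-T antigen candidates. It assumes per-tissue mean expression values have already been summarized from single-cell or bulk atlases; the antigen names, the `CRITICAL_TISSUES` set, and both thresholds are hypothetical placeholders, not validated safety criteria.

```python
from dataclasses import dataclass

# Hypothetical set of tissues where off-tumor expression is treated as a hard stop
CRITICAL_TISSUES = {"heart", "lung", "brain"}

@dataclass
class AntigenProfile:
    name: str
    tumor_expr: float               # mean normalized expression in tumor cells
    normal_expr: dict[str, float]   # tissue -> mean normalized expression (non-empty)

def triage(ag: AntigenProfile, max_critical: float = 0.1, min_ratio: float = 10.0) -> str:
    """Return 'flag', 'review', or 'prioritize' with a reason; thresholds are illustrative."""
    critical = sorted(t for t in CRITICAL_TISSUES if ag.normal_expr.get(t, 0.0) > max_critical)
    if critical:
        return f"flag: expressed in critical tissue(s) {critical}"
    worst_tissue, worst_expr = max(ag.normal_expr.items(), key=lambda kv: kv[1])
    ratio = ag.tumor_expr / max(worst_expr, 1e-9)  # tumor vs. highest-expressing normal tissue
    if ratio < min_ratio:
        return f"review: tumor/normal ratio {ratio:.1f} (highest in {worst_tissue})"
    return "prioritize"

candidates = [
    AntigenProfile("antigen_A", 5.0, {"heart": 0.5, "liver": 0.01}),
    AntigenProfile("antigen_B", 5.0, {"liver": 1.0, "heart": 0.0}),
    AntigenProfile("antigen_C", 5.0, {"liver": 0.1, "lung": 0.05}),
]
for ag in candidates:
    print(ag.name, "->", triage(ag))
```

The output is a ranked shortlist with reasons, which is the kind of artifact a target-selection team can argue with, rather than a single “best target.”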

2-2. CAR/TCR and Gene-Cassette Design (Steps 2–3)

Design of CARs, TCRs, and gene-therapy constructs is one of the more natural applications of AI:

  • Learning relationships between CAR signaling-domain architectures (e.g., co-stimulatory domains, spacer length) and activation, persistence, and toxicity.
  • Modeling associations among TCR sequences, affinity, specificity, and self-reactivity or cross-reactivity.
  • Linking AAV capsid sequences to tropism and immunogenicity.
  • Relating promoter/enhancer and cassette structure to expression levels and kinetics.

These models are far from perfect, but they are increasingly useful for coarse-grained design and prioritization early in the process.
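A minimal sketch of what “coarse-grained design and prioritization” can look like in code: enumerate a small combinatorial CAR design space, drop combinations excluded by rules, and rank the rest with whatever predictive model is available. The binder IDs, the exclusion rule, and the scoring function are hypothetical stand-ins; CD28 and 4-1BB are real co-stimulatory domains, but nothing here encodes their actual biology.

```python
from itertools import product
from typing import Callable

COSTIM = ["CD28", "4-1BB"]            # co-stimulatory domains
HINGE = ["short", "medium", "long"]   # spacer/hinge length classes
BINDER = ["binder_A", "binder_B"]     # hypothetical antigen-binding domains

Design = dict[str, str]

def enumerate_designs(exclude: Callable[[Design], bool]) -> list[Design]:
    """All combinations of the modules above, minus those an exclusion rule rejects."""
    designs = []
    for costim, hinge, binder in product(COSTIM, HINGE, BINDER):
        d = {"costim": costim, "hinge": hinge, "binder": binder}
        if not exclude(d):
            designs.append(d)
    return designs

def prioritize(designs: list[Design], score: Callable[[Design], float], top_k: int = 3) -> list[Design]:
    """Rank by a model score (higher is better) and keep a shortlist for testing."""
    return sorted(designs, key=score, reverse=True)[:top_k]

# Hypothetical rule: a made-up liability for binder_B combined with a long hinge
rule = lambda d: d["binder"] == "binder_B" and d["hinge"] == "long"
# Placeholder score; in practice this would be a trained model's prediction
toy_score = lambda d: (d["costim"] == "4-1BB") + 0.1 * (d["hinge"] == "medium")

space = enumerate_designs(rule)
print(len(space), "designs remain")
for d in prioritize(space, toy_score):
    print(d)
```

Keeping the exclusion rules separate from the scoring model makes it easy to audit which candidates were removed by expert knowledge and which were merely ranked low by the model.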

2-3. Manufacturing and Quality (Steps 4 & 7)

For many organizations, the most pressing need is to apply AI to stabilize manufacturing and quality.

  • Modeling relationships between culture conditions (media, cytokines, gas, time, etc.) and cell phenotype, effector function, and viability.
  • Using sensor data from cell-processing equipment for real-time anomaly detection and predictive maintenance.
  • Combining batch-level quality attributes (surface markers, functional assays, vector copy number, etc.) into release-decision support tools.

In practice, AI is best deployed as part of “instrument + analytics” systems, with humans in the loop, rather than as a fully autonomous controller.
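As an illustration of release-decision support, the sketch below checks a batch's quality attributes against specification limits and routes it to “hold,” “review,” or “recommend release,” leaving the decision itself to QA. The attribute names, the limits, and the 10% alert margin are hypothetical placeholders, not regulatory specifications.

```python
from typing import Optional

# Hypothetical specs: attribute -> (lower limit, upper limit); None means unbounded
SPECS: dict[str, tuple[Optional[float], Optional[float]]] = {
    "viability_pct": (70.0, None),
    "car_positive_pct": (15.0, None),
    "vcn_per_cell": (None, 5.0),      # vector copy number
}
ALERT_MARGIN = 0.10  # values within 10% of a limit are sent for human review

def release_support(batch: dict[str, float]) -> tuple[str, list[str]]:
    failures, warnings = [], []
    for attr, (lo, hi) in SPECS.items():
        if attr not in batch:
            failures.append(f"{attr}: no result")
            continue
        v = batch[attr]
        if (lo is not None and v < lo) or (hi is not None and v > hi):
            failures.append(f"{attr}={v}: out of specification")
        elif (lo is not None and v < lo * (1 + ALERT_MARGIN)) or (
            hi is not None and v > hi * (1 - ALERT_MARGIN)
        ):
            warnings.append(f"{attr}={v}: close to limit")
    if failures:
        return "hold", failures + warnings
    if warnings:
        return "review", warnings
    return "recommend release", []

print(release_support({"viability_pct": 90.0, "car_positive_pct": 40.0, "vcn_per_cell": 2.0}))
print(release_support({"viability_pct": 75.0, "car_positive_pct": 40.0, "vcn_per_cell": 4.8}))
```

The “review” tier is the human-in-the-loop element: the tool surfaces borderline batches with reasons instead of silently passing or failing them.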

3. Data Types and AI Models Specific to Cell & Gene Therapy

This space is dominated by data types quite different from those of small molecules or conventional biologics.

3-1. Single-Cell and Spatial Omics

Understanding CAR-T targets and the tumor microenvironment increasingly relies on single-cell and spatial data:

  • Cell-type-specific expression profiles across tumor cells, immune cells, and stromal cells.
  • Spatial organization of cell types and signaling pathways within the tumor microenvironment.
  • Changes in cell composition and states before and after therapy.

Machine-learning methods then handle clustering, dimensionality reduction, trajectory inference, and cross-dataset integration, supporting target selection and the generation of hypotheses about resistance.
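For readers new to single-cell analysis, the core of the clustering step can be sketched in a few lines of NumPy: log-transform counts, reduce dimensionality with PCA, then cluster with k-means. This uses simulated counts for two hypothetical cell populations and a hand-rolled k-means; real pipelines add normalization, feature selection, batch correction, and usually graph-based clustering (for example in Scanpy or Seurat).

```python
import numpy as np

def pca(X: np.ndarray, n_components: int) -> np.ndarray:
    """Project mean-centered data onto its top principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def kmeans(Z: np.ndarray, k: int, n_iter: int = 100, seed: int = 0) -> np.ndarray:
    """Lloyd's algorithm with farthest-point initialization; returns cluster labels."""
    rng = np.random.default_rng(seed)
    centers = [Z[rng.integers(len(Z))]]
    for _ in range(1, k):
        d = np.min([((Z - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Z[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new = np.array([Z[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

# Simulated counts: 100 cells per population, 50 genes; population B overexpresses genes 0-9
rng = np.random.default_rng(1)
rates_a = np.ones(50)
rates_b = np.ones(50)
rates_b[:10] = 10.0
counts = np.vstack([rng.poisson(rates_a, size=(100, 50)), rng.poisson(rates_b, size=(100, 50))])
truth = np.repeat([0, 1], 100)

labels = kmeans(pca(np.log1p(counts), n_components=5), k=2)
agreement = max(np.mean(labels == truth), np.mean(labels != truth))  # cluster IDs are arbitrary
print(f"agreement with simulated populations: {agreement:.2f}")
```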

3-2. Vector and Construct Sequence Data

AAV and lentiviral vectors, as well as CAR/TCR cassettes, are encoded as DNA sequences:

  • Capsid sequences and their relationships to tropism and immunogenicity.
  • Promoter/enhancer sequences and expression levels in specific tissues or cell types.
  • CAR/TCR sequences and their links to signaling strength, persistence, and exhaustion.

AI here uses sequence-based models (DNA/protein language models, CNNs, Transformers) to approximate the mapping from sequence to function. Given limited data, these models often need to be paired with mechanistic/physics-based or rule-based approaches.
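To illustrate the sequence-to-function mapping in its simplest form, the sketch below one-hot encodes DNA sequences and fits a ridge regression to predict expression. The sequences and the “expression” signal are simulated, with the effect positions chosen arbitrarily; ridge on one-hot features is a deliberately simple baseline of the kind that suits small datasets, not a stand-in for the CNN or Transformer models mentioned above.

```python
import numpy as np

BASE_INDEX = {"A": 0, "C": 1, "G": 2, "T": 3}

def one_hot(seq: str) -> np.ndarray:
    """Flattened (len(seq) * 4) one-hot vector; ambiguous bases such as 'N' stay all-zero."""
    x = np.zeros((len(seq), 4))
    for pos, base in enumerate(seq.upper()):
        if base in BASE_INDEX:
            x[pos, BASE_INDEX[base]] = 1.0
    return x.ravel()

def fit_ridge(X: np.ndarray, y: np.ndarray, lam: float = 1.0):
    """Ridge regression with an unpenalized intercept (closed form on centered data)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    w = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ (y - y_mean))
    return w, y_mean - x_mean @ w

# Simulated data: 20-nt sequences; "expression" rises with G at position 5 and A at position 10
rng = np.random.default_rng(0)
seqs = ["".join(rng.choice(list("ACGT"), size=20)) for _ in range(400)]
y = np.array([1.0 * (s[5] == "G") + 0.5 * (s[10] == "A") for s in seqs]) + rng.normal(0, 0.1, 400)
X = np.array([one_hot(s) for s in seqs])

w, b = fit_ridge(X[:300], y[:300])
pred = X[300:] @ w + b
print("held-out correlation:", round(float(np.corrcoef(pred, y[300:])[0, 1]), 3))
```

In a hybrid setup, known biology (for example, required transcription-factor binding sites) would enter as hand-crafted features or constraints alongside such a model.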

3-3. Manufacturing and Bioprocess Data

Cell and vector manufacturing generate time-series and batch-level data:

  • Time courses of cell density, metabolites, dissolved oxygen, pH, and other process variables.
  • Flow cytometry and imaging data characterizing surface markers and morphology.
  • Batch-level quality attributes, yields, and root causes of failures.

Time-series and anomaly-detection models can learn from these data to enable more stable manufacturing and earlier detection of deviations.
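A minimal sketch of the anomaly-detection idea for a single process variable: compare each new reading with the mean and standard deviation of a trailing window, and raise an alert when the deviation exceeds a threshold. The simulated dissolved-oxygen trace, the window length, and the threshold are assumptions for illustration; production systems typically use multivariate methods and validated alarm limits.

```python
import numpy as np

def rolling_zscore_alerts(x, window: int = 12, threshold: float = 4.0):
    """Return (time index, z-score) for readings that deviate strongly from the trailing window."""
    x = np.asarray(x, dtype=float)
    alerts = []
    for t in range(window, len(x)):
        ref = x[t - window:t]
        mu, sd = ref.mean(), ref.std(ddof=1)
        z = (x[t] - mu) / max(sd, 1e-9)
        if abs(z) > threshold:
            alerts.append((t, round(float(z), 1)))
    return alerts

# Simulated dissolved-oxygen readings (% air saturation) with a sudden dip at t = 50
rng = np.random.default_rng(0)
do_trace = 40 + rng.normal(0, 0.3, size=96)
do_trace[50] = 30.0
print(rolling_zscore_alerts(do_trace))
```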

4. Representative AI Use Cases in Cell & Gene Therapy

4-1. CAR/TCR Design and Optimization

In practice, AI is used to support CAR/TCR design in several ways:

  • Mining CAR/TCR datasets to learn relationships between domain architectures and clinical outcomes, generating initial design templates.
  • Scoring self-reactivity and cross-reactivity risk based on epitopes, HLA types, and TCR sequences.
  • Combining single-cell data with T-cell phenotypes to identify signaling designs that favor less exhausted cell states.

We are not at the point where AI can “design the best CAR/TCR in one shot,” but it is well-suited for constraining the design space and flagging dangerous regions.
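One simple ingredient of cross-reactivity screening is searching for self-peptides that closely resemble the target epitope. The sketch below counts mismatches against a small peptide library; the peptides are made-up 9-mers, the two-mismatch cutoff is arbitrary, and a realistic pipeline would also weight TCR-contact positions and predict HLA binding for each candidate.

```python
def mismatches(a: str, b: str) -> int:
    """Number of differing positions between two equal-length peptides."""
    return sum(x != y for x, y in zip(a, b))

def cross_reactivity_candidates(epitope: str, self_peptides: dict[str, str], max_mismatches: int = 2):
    """Self-peptides of the same length within `max_mismatches` of the epitope, closest first."""
    hits = []
    for name, pep in self_peptides.items():
        if len(pep) != len(epitope):
            continue  # this simple screen only compares same-length peptides
        mm = mismatches(epitope, pep)
        if mm <= max_mismatches:
            hits.append((name, pep, mm))
    return sorted(hits, key=lambda h: h[2])

# Made-up target epitope and self-peptide library, for illustration only
target = "ALDKTEYLV"
library = {"self_1": "ALEKTEYLV", "self_2": "ALDKSEYLI", "self_3": "GLNRTQWAF", "self_4": "ALDKTEY"}
print(cross_reactivity_candidates(target, library))
```

Hits from a screen like this are candidates for experimental testing, which is exactly the “flagging dangerous regions” role rather than a final safety verdict.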

4-2. AAV and Lentiviral Vector Design

For viral vectors, AI is applied to:

  • Learn from capsid mutation-library screens to associate sequence with tropism and immunogenicity, and propose new capsid candidates.
  • Relate promoter/enhancer sequences to expression patterns, supporting cassette designs that achieve the desired expression in target tissues.
  • Analyze vector integration patterns and copy numbers in the context of insertional mutagenesis risk.

Because data are often sparse, a practical pattern is to iterate through in silico prediction → small-scale experiments → model refinement in short cycles.
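The prediction → experiment → refinement cycle can be sketched as a small active-learning loop: fit a cheap surrogate model, test the top-ranked untested candidates, add the results, and refit. Everything here is simulated: the three numeric features stand in for encoded capsid or cassette variants, `run_assay` is a hypothetical stand-in for a small-scale experiment, and the quadratic surrogate is chosen only because it suits this toy response.

```python
import numpy as np

rng = np.random.default_rng(0)

def true_response(x: np.ndarray) -> np.ndarray:
    """Hypothetical ground truth, unknown in practice: best designs lie near (0.7, 0.7, 0.7)."""
    return -np.sum((x - 0.7) ** 2, axis=1)

def run_assay(x: np.ndarray) -> np.ndarray:
    """Stand-in for a small-scale wet-lab experiment: true response plus measurement noise."""
    return true_response(x) + rng.normal(0.0, 0.01, size=len(x))

def features(x: np.ndarray) -> np.ndarray:
    """Intercept, linear, and squared terms: a deliberately simple surrogate model."""
    return np.hstack([np.ones((len(x), 1)), x, x**2])

pool = rng.uniform(0.0, 1.0, size=(500, 3))               # candidate designs, 3 encoded features
measured = [int(i) for i in rng.choice(len(pool), size=10, replace=False)]
y = list(run_assay(pool[measured]))

for cycle in range(5):                                    # five predict -> test -> refit cycles
    coef, *_ = np.linalg.lstsq(features(pool[measured]), np.array(y), rcond=None)
    pred = features(pool) @ coef
    pred[measured] = -np.inf                              # never re-test a measured design
    batch = [int(i) for i in np.argsort(pred)[::-1][:5]]  # top 5 predicted, untested designs
    measured += batch
    y += list(run_assay(pool[batch]))
    print(f"cycle {cycle + 1}: best measured response so far = {max(y):.3f}")
```

With real data the surrogate would usually also need an uncertainty estimate, so that some experiments explore poorly understood regions instead of only exploiting the current best guess.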

4-3. Monitoring and Optimizing Cell Manufacturing

Cell manufacturing suffers from patient-to-patient and batch-to-batch variability even under nominally identical protocols. AI is used to:

  • Analyze sensor, imaging, and flow-cytometry data in near real time to detect emerging deviations between batches.
  • Mine historical process and quality data to discover parameter patterns associated with high yields and robust function.
  • Integrate release-test data to flag batches at higher risk for clinical performance issues.

Again, the realistic model is a human–AI co-monitoring setup, not full automation.
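One common way to operationalize batch-to-batch comparison is a “golden batch” envelope: the mean trajectory of historical good batches plus or minus k standard deviations at each time point. The sketch below assumes the trajectories are already aligned on the same time grid; the three-batch history and the cell-density values are made up, and a real envelope would need many more batches to be stable.

```python
import numpy as np

def golden_batch_envelope(history: np.ndarray, k: float = 3.0):
    """history: (n_batches, n_timepoints) trajectories from historical good batches."""
    mean = history.mean(axis=0)
    sd = history.std(axis=0, ddof=1)
    return mean - k * sd, mean + k * sd

def out_of_envelope(batch, lower, upper) -> list[int]:
    """Time indices where the new batch leaves the envelope."""
    batch = np.asarray(batch, dtype=float)
    return np.flatnonzero((batch < lower) | (batch > upper)).tolist()

# Made-up viable cell density (10^6 cells/mL) on days 0, 3, 6, 9 for three past batches
history = np.array([
    [1.0, 2.0, 4.0, 8.0],
    [1.1, 2.2, 4.2, 8.4],
    [0.9, 1.8, 3.8, 7.6],
])
lower, upper = golden_batch_envelope(history)
new_batch = [1.0, 2.1, 3.0, 5.0]   # expansion stalls after day 3
print("out of envelope at time indices:", out_of_envelope(new_batch, lower, upper))
```

An envelope breach is a prompt for the operator to investigate, consistent with co-monitoring rather than automatic intervention.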

4-4. Clinical and Real-World Data Analysis

Even though sample sizes are limited, clinical and real-world data in cell and gene therapy are rich and heterogeneous. AI can:

  • Integrate biomarkers, genomic features, and microenvironment data to identify patient subgroups with higher response probabilities.
  • Learn patterns associated with severe adverse events and help define high-risk subpopulations.
  • Use longitudinal data to extract risk factors for long-term outcomes during follow-up.

The goal is not just post hoc explanation but also to inform next-generation protocol design and patient selection.
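With tens of patients per subgroup, a response rate is only meaningful together with its uncertainty. A minimal sketch using the Wilson score interval for a binomial proportion; the subgroup names and counts are invented for illustration.

```python
import math

def wilson_interval(responders: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score interval for a response rate (z = 1.96)."""
    if n == 0:
        return (0.0, 1.0)
    p = responders / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return (center - half, center + half)

# Invented subgroup counts: (responders, patients)
subgroups = {"biomarker_high": (14, 20), "biomarker_low": (6, 15)}
for name, (r, n) in subgroups.items():
    lo, hi = wilson_interval(r, n)
    print(f"{name}: {r}/{n} = {r / n:.0%}, 95% CI {lo:.0%} to {hi:.0%}")
```

Here the two intervals overlap, so an apparent 70% versus 40% difference from these counts alone would be weak evidence for a responder subgroup.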

5. What AI Can and Cannot Do (Yet) in Cell & Gene Therapy

5-1. Strengths: Structuring Design Spaces, Stabilizing Manufacturing, and Stratifying Risk

Today, AI is relatively strong in:

  • Structuring CAR/TCR, vector, and promoter design spaces to avoid obviously risky regions.
  • Using process data to support early deviation detection and yield improvement.
  • Stratifying responders vs. non-responders and high-risk subgroups based on clinical and real-world data.

In all of these, AI is best treated as an information-support layer for human decision-makers, not as a fully autonomous designer.

5-2. Remaining Challenges: Long-Term Safety, Tumorigenicity, and Individual Variability

AI remains weak at tasks such as:

  • Predicting how integration events may affect patients years later.
  • Modeling long-term interactions among immune reconstitution, tumor evolution, and microenvironment dynamics.
  • Anticipating rare but severe adverse events from small early-stage trials.

In these domains, AI can assist in identifying risk factors and stratifying patients, but final safety judgments must remain firmly with clinical and safety experts.
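A quick calculation shows why small early trials cannot rule out rare adverse events. If no events are seen in n patients, the exact one-sided 95% upper confidence bound on the event rate p solves (1 - p)^n = 0.05, which is close to the well-known “rule of three” approximation 3/n.

```python
def upper_bound_zero_events(n: int, alpha: float = 0.05) -> float:
    """Exact one-sided upper confidence bound on an event rate after 0 events in n patients."""
    return 1.0 - alpha ** (1.0 / n)

for n in (20, 50, 100):
    print(f"n={n}: exact bound {upper_bound_zero_events(n):.3f}, rule of three {3 / n:.3f}")
```

For example, even with zero severe events among 50 treated patients, a true rate of roughly 6% remains statistically compatible with the data.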

5-3. Data Sparsity and the “High-Dimensional, Low-Sample” Problem

Cell and gene therapy datasets are a classic example of “high-dimensional, low-sample” conditions:

  • Single-cell, spatial omics, manufacturing, and clinical data are all high-dimensional.
  • For a given product or protocol, the number of patients may remain in the tens to hundreds for quite some time.

This reality forces us to keep models relatively simple and to rely on hybrid approaches that combine mechanistic knowledge and physics-based models with AI.
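The “high-dimensional, low-sample” problem can be seen directly in a small simulation: with 40 samples and 500 features, a nearly unregularized linear model fits the training data essentially perfectly, which says nothing about how well it predicts new patients. The sketch below uses ridge regression (one simple choice among many) with leave-one-out cross-validation as the honest measure; the cohort size, feature count, and effect sizes are simulated assumptions.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Ridge weights via the dual form w = X^T (X X^T + lam*I)^-1 y; cheap when p >> n."""
    alpha = np.linalg.solve(X @ X.T + lam * np.eye(X.shape[0]), y)
    return X.T @ alpha

def loo_mse(X, y, lam):
    """Leave-one-out mean squared error: refit n times, each time predicting the held-out sample."""
    errors = []
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        w = ridge_fit(X[keep], y[keep], lam)
        errors.append(float(y[i] - X[i] @ w) ** 2)
    return float(np.mean(errors))

# Simulated cohort: 40 patients, 500 features, outcome driven by 3 features plus noise.
# Data are zero-mean by construction, so no intercept is fitted.
rng = np.random.default_rng(0)
n, p = 40, 500
X = rng.normal(size=(n, p))
y = X[:, :3] @ np.array([1.0, -1.0, 0.5]) + rng.normal(0.0, 0.5, size=n)

for lam in (1e-6, 100.0, 1000.0):
    w = ridge_fit(X, y, lam)
    train_mse = float(np.mean((y - X @ w) ** 2))
    print(f"lambda={lam:g}: training MSE={train_mse:.4f}, leave-one-out MSE={loo_mse(X, y, lam):.3f}")
```

The gap between training and leave-one-out error is the quantity to watch; mechanistic priors (for example, restricting features to biologically plausible ones) are one way to shrink it.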

6. KPIs and Expectations for R&D, Corporate Functions, and Investors

To evaluate AI in cell and gene therapies, different stakeholders need different KPIs.

  • R&D and process-development teams
    • Reduction in time and cost per design–manufacture–test cycle.
    • Reduction in batch-to-batch and patient-to-patient variability in key quality attributes.
    • Changes in the frequency and impact of unexplained process deviations after AI deployment.
  • Corporate functions, quality, and supply chain
    • Degree of visibility into quality indicators across sites and lines.
    • Scope and results of AI-enabled real-time quality monitoring.
    • Reduced risk and verification cost when implementing process changes.
  • Investors and consultants
    • Depth of cell and gene therapy–specific data and know-how, beyond generic AI claims.
    • Coverage across layers such as CAR/TCR design, vector design, manufacturing/quality, and clinical/RWD analytics.
    • How AI-enabled processes are integrated into regulatory interactions and approval pathways.

With clear KPIs, organizations are better positioned to avoid situations where “AI increased complexity without improving outcomes.”
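For the variability KPI, one straightforward metric is the coefficient of variation (CV = sample standard deviation / mean) of a key quality attribute before and after AI deployment. The batch values below are invented, and with so few batches (and possibly other process changes at the same time) such a comparison would need a proper statistical test before being attributed to AI.

```python
import statistics

def cv_percent(values: list[float]) -> float:
    """Coefficient of variation in percent: sample standard deviation / mean."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)

def variability_kpi(before: list[float], after: list[float]) -> dict[str, float]:
    cv_before, cv_after = cv_percent(before), cv_percent(after)
    return {
        "cv_before_pct": cv_before,
        "cv_after_pct": cv_after,
        "relative_reduction_pct": 100.0 * (cv_before - cv_after) / cv_before,
    }

# Invented %CAR+ values for five batches before and five after deployment
before = [30.0, 45.0, 25.0, 50.0, 35.0]
after = [38.0, 42.0, 36.0, 44.0, 40.0]
print({k: round(v, 1) for k, v in variability_kpi(before, after).items()})
```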

My Thoughts and Future Outlook

Among all modalities discussed in this series, cell and gene therapies may be the most challenging for AI. Because design, manufacturing, and patient-level variability all compound, it is structurally difficult to “predict accurately from small datasets,” and this is where the limits of AI become most visible. At the same time, AI is already proving useful for structuring design spaces for CAR/TCRs and vectors, making manufacturing variability more transparent, and stratifying risk in clinical data.

As sample sizes grow, follow-up periods lengthen, and data infrastructures for manufacturing and omics improve, we can expect faster feedback loops linking design → manufacturing → clinical outcomes → redesign. When that happens, the key differentiator will not be who has the fanciest algorithm but rather who has been systematically designing their data and AI workflows from an early stage. In the remaining parts of this series, we will step back and synthesize a practical roadmap for AI-driven drug discovery across modalities, including implications for investment and business strategy.

This article has been edited by the Morningglorysciences team.


Author of this article

After completing graduate school, I studied at a top-tier research hospital in the U.S., where I became involved in earnest in the creation of treatments and therapeutics. I have since worked for several major pharmaceutical companies in the U.S., focusing on research, business, venture creation, and investment. During this time, I have also served as a faculty member in a graduate program at a university.
