Proteins Are “More Stable” Than We Thought: Mega-scale Experiments and Interpretable Models Refresh Design Rules

2025-08-14

Even when a structure has been solved, its folding thermodynamics (ΔG) has long been hard to measure at scale. Recent work narrows this gap with mega-scale measurements and a simple scheme: additive energy models with sparse pairwise couplings. This article connects measurement, prediction, and drug-oriented design, and is pitched at readers from students to researchers.

1. Why revisit protein stability now?

Assays like cDNA display proteolysis can evaluate ~900k sequences in a week under matched conditions, yielding ~776k absolute stability values. Such scale provides the fuel that powers both understanding and design.

2. Three pillars (what’s been updated)

Mega-scale measurements (Nature 2023): absolute stabilities for natural and de novo mini-domains, covering all single mutants and selected doubles in one framework.
Genetic architecture (Nature 2024): accurate high-dimensional prediction using additive free energies plus a small, sparse set of pairwise couplings tied to contacts and backbone distance.
“More robust” cores (Science, DDN 2025): cores behave less like Jenga and more like LEGO. With compensatory mutations, surprisingly bold combinations can still fold; a model trained on one SH3 generalized to 50k+ SH3 sequences.

3. cDNA display proteolysis, explained

Folded proteins resist proteolysis more than unfolded ones. The assay challenges large variant libraries with proteases, then uses NGS to quantify survivors and convert an estimated metric (e.g., K₅₀) into stability.
Caveats: the inferred stability can deviate from the true ΔG when unfolding is not cooperative (not two-state), when the system is not at equilibrium, or when variants aggregate. The method is especially effective for small or single domains.
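
To make the conversion step concrete, here is a minimal sketch under a deliberately simplified two-state assumption: cleavage happens only from the unfolded state, so the observed K₅₀ equals the unfolded-state K₅₀ divided by the fraction unfolded. The function name, the `k50_unfolded` input, and the temperature default are illustrative assumptions; the published analysis pipeline is more elaborate than this.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol*K)

def folding_dg_from_k50(k50_obs, k50_unfolded, temp_k=298.15):
    """Estimate the folding free energy (kcal/mol, negative = stable).

    Assumption (simplified, not the published model):
        k50_obs = k50_unfolded * (1 + K_fold),  K_fold = [folded]/[unfolded]
    so K_fold = k50_obs / k50_unfolded - 1 and dG_fold = -RT ln(K_fold).
    """
    ratio = k50_obs / k50_unfolded
    if ratio <= 1.0:
        # No measurable protection from folding: dG is undefined in this model.
        raise ValueError("k50_obs must exceed k50_unfolded")
    k_fold = ratio - 1.0
    return -R_KCAL * temp_k * math.log(k_fold)
```

Under these assumptions, a variant whose K₅₀ is 101 times the unfolded-state value comes out at roughly −2.7 kcal/mol at 25 °C. Values near the `ratio <= 1` boundary correspond to the regime where the assay loses resolution, which is one reason the caveats above matter.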

4. A simpler key: additive models + sparse pair couplings

We cannot “measure everything” in combinatorial space. An interpretable energy model—additive ΔΔG terms plus a sparse set of pairwise couplings and an explicit link for global epistasis—delivers strong performance. Couplings align with structural contacts and decay with backbone distance.
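
As a rough illustration of what such a model computes, the sketch below sums additive ΔΔG terms, adds any pairwise couplings present in a sparse lookup table, and passes the total through a two-state sigmoid as the global-epistasis link. The mutation labels, function names, and the choice of a two-state sigmoid are assumptions made for illustration, and ΔG follows the folding convention (negative = stable).

```python
import math

RT = 0.0019872 * 298.15  # kcal/mol at 25 °C

def predict_dg(variant, dg_wt, ddg_single, coupling):
    """Folding dG of a variant = wild type + additive terms + sparse couplings.

    variant:    iterable of mutation labels, e.g. {"A12V", "L30F"}
    ddg_single: dict label -> additive ddG (kcal/mol)
    coupling:   dict (label_a, label_b) -> coupling energy, keys sorted;
                pairs missing from the dict are treated as uncoupled (0.0).
    """
    muts = sorted(variant)
    dg = dg_wt + sum(ddg_single[m] for m in muts)
    for idx, a in enumerate(muts):
        for b in muts[idx + 1:]:
            dg += coupling.get((a, b), 0.0)
    return dg

def fraction_folded(dg_fold):
    """Global-epistasis link: two-state Boltzmann sigmoid, bounded in (0, 1)."""
    return 1.0 / (1.0 + math.exp(dg_fold / RT))

# Hypothetical numbers, for illustration only.
ddg_single = {"A12V": 1.0, "L30F": 1.5}
coupling = {("A12V", "L30F"): -0.5}
dg = predict_dg({"A12V", "L30F"}, dg_wt=-3.0,
                ddg_single=ddg_single, coupling=coupling)
print(dg, fraction_folded(dg))
```

Because the coupling table is sparse, the number of parameters grows roughly with the number of measured pairs rather than with all possible pairs, which is what keeps the model interpretable.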

Randomly piling up many mutations usually unfolds proteins (e.g., 2–8% folded at 5 mutations; <0.2% at 10). Because so few random combinations fold, measurements are better spent on deliberately chosen variants and couplings than on random sampling.

5. “More robust than expected” cores—enabling bolder designs

Recent SH3 work shows that otherwise deleterious mutations can be tolerated with compensatory changes, making cores more “LEGO-like” than fragile. This supports bolder combinatorial edits and practical resurfacing campaigns.

6. Direct applications (antibodies, enzymes, clinical variants)

Resurfacing to lower immunogenicity: redesign surfaces without losing stability by anchoring energy terms in experimental data.
Stabilization design: data-driven redesigns can add stability while preserving function.
Variant interpretation: ΔΔG plus (sparse) couplings aid mechanistic hypotheses and prioritization.

7. Mini-guide (6-step workflow)

Define the target domain and run a shallow DMS to map single-mutant ΔΔG.
Use contacts/backbone distance to select a small set of pair couplings for targeted measurement.
Model the nonlinearity (global epistasis) between energy and phenotype.
Train the additive + sparse-coupling energy model and screen the design space.
Profile function, developability, immunogenicity.
Advance staged experimental validation (biophysics → activity → cell).
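
Step 2 of the workflow above can be sketched as follows: given Cα coordinates, keep residue pairs that are in spatial contact and rank them by distance in space, then by separation along the backbone. The 8 Å cutoff, the ranking rule, and the function name are hypothetical choices for illustration, not values taken from the cited studies.

```python
import math
from itertools import combinations

def candidate_pairs(ca_coords, max_contact_dist=8.0):
    """Return residue pairs in spatial contact, closest first.

    ca_coords: dict mapping residue number -> (x, y, z) C-alpha position in Å.
    max_contact_dist: hypothetical contact cutoff in Å (an assumption).
    Each result is (res_i, res_j, spatial_distance, sequence_separation).
    """
    pairs = []
    for (i, xyz_i), (j, xyz_j) in combinations(sorted(ca_coords.items()), 2):
        dist = math.dist(xyz_i, xyz_j)
        if dist <= max_contact_dist:
            pairs.append((i, j, dist, j - i))
    # Rank by spatial distance, then by separation along the backbone.
    return sorted(pairs, key=lambda p: (p[2], p[3]))
```

The top of the returned list is a reasonable starting set of double mutants to measure; how many pairs to take is a budget decision rather than something the code can settle.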

8. Glossary for students

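ΔG / ΔΔG: free energy of folding / its change upon mutation. With ΔG defined as G(folded) − G(unfolded), a more negative ΔG means a more stable fold; some papers and datasets report the unfolding free energy instead, which flips the sign.
Global epistasis: the nonlinear mapping from free energy (continuous, unbounded) to the measured phenotype (often bounded, e.g., fraction folded). It makes additive energy changes look non-additive at the phenotype level.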
Sparse pair couplings: only a subset of residue pairs contribute meaningful energetic interactions; linked to contacts and backbone proximity.
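
For readers who want to see these definitions as arithmetic, the sketch below computes ΔΔG and a pairwise coupling from four measured ΔG values using a standard double-mutant cycle. The function names and the example numbers are hypothetical, and the folding sign convention (negative = stable) is assumed.

```python
def ddg(dg_mutant, dg_wt):
    """Change in folding free energy upon mutation (positive = destabilizing)."""
    return dg_mutant - dg_wt

def pair_coupling(dg_wt, dg_a, dg_b, dg_ab):
    """Double-mutant cycle: deviation of the double mutant from additivity.

    Zero means mutations a and b act independently; a negative value means
    the double mutant is more stable than the two single effects predict.
    """
    return ddg(dg_ab, dg_wt) - ddg(dg_a, dg_wt) - ddg(dg_b, dg_wt)

# Hypothetical measurements in kcal/mol, for illustration only.
print(pair_coupling(dg_wt=-3.0, dg_a=-2.0, dg_b=-1.5, dg_ab=-1.0))
```

In this made-up example the singles cost 1.0 and 1.5 kcal/mol but the double costs only 2.0, so the coupling is −0.5 kcal/mol, a small compensatory interaction.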

9. Building a “good loop” between experiments and ML

Consistent, large-scale measurements guide model training; in turn, models guide which pair couplings and designs to measure next—closing the loop of measurement → model → design.

10. Take-home messages

Mega-scale assays now deliver absolute stabilities for vast libraries in about a week.
Interpretable energy models (additive + sparse couplings) are surprisingly strong.
Protein cores are more robust than expected: plan bolder combinatorial designs and resurfacing with compensatory mutations in mind.

This article was produced by the Morningglorysciences editorial team.

Author of this article

Morning Glory Sciences

After completing graduate school, I studied at a top-tier research hospital in the U.S., where I worked in earnest on developing treatments and therapeutics. I have worked for several major pharmaceutical companies, focusing on research, business, venture creation, and investment in the U.S. During this time, I have also served as a faculty member in a university graduate program.