u/XpertAI

ODesign vs BoltzGen: are we entering the “general-purpose biomolecular design model” era?

Paper: ODesign: A World Model for Biomolecular Interaction Design
Repo: ODesign

Wanted to discuss ODesign, especially in the context of models like BoltzGen and RFdiffusion3.

The key distinction is that ODesign is closer to an “all-to-all” biomolecular design model, while BoltzGen is more like a universal protein/peptide binder design model. ODesign tries to design across multiple molecular modalities:

  • proteins
  • peptides
  • DNA/RNA
  • small molecules
  • multimolecular complexes

BoltzGen, by contrast, mainly designs protein-like binders: miniproteins, peptides, cyclic peptides, nanobodies, and antibody-like binders, against many target types.

So the difference is roughly:

BoltzGen:
“Given a biomolecular target, design a protein/peptide binder.”

ODesign:
“Given a biomolecular target/interface, design the appropriate molecular partner, potentially protein, nucleic acid, or ligand.”

That makes ODesign broader in ambition, but BoltzGen currently looks stronger on experimental validation. BoltzGen reports validation across nanobodies, miniproteins, peptides, cyclic peptides, and challenging target classes, while ODesign’s wet-lab validation appears mainly focused on protein minibinders so far, with other modality validation still pending.

Technically, ODesign is interesting because it builds on an AlphaFold3-like structure-prediction backbone. It uses unified generative tokens for different chemical modalities, then performs conditional all-atom diffusion to generate coordinates. After that, an inverse-folding/type-design module assigns amino acids, nucleotides, or ligand atom types depending on the modality. The clever part is the masking system. ODesign can mask at different levels:

  • whole molecule/entity level
  • residue/token level
  • atom/motif level

That lets it handle tasks like binder design, motif scaffolding, ligand-binding protein design, aptamer-like design, and ligand generation in one framework.
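The hierarchy is easy to picture as a per-atom boolean mask built at whichever level you choose. Here is a minimal sketch of that idea — the data layout and all names are my own illustration, not ODesign's actual code:

```python
# Hypothetical sketch of hierarchical design masks in the style described for
# ODesign: a mask can be set at the entity, token (residue), or atom level.
# The layout and function names are illustrative assumptions, not the real API.

def build_design_mask(entities, level, selection):
    """Return a flat per-atom boolean mask: True = position to be designed.

    entities: list of (entity_id, [(token_id, n_atoms), ...])
    level: "entity" | "token" | "atom"
    selection: set of ids to mask at the chosen level
    """
    mask = []
    atom_index = 0
    for entity_id, tokens in entities:
        for token_id, n_atoms in tokens:
            for _ in range(n_atoms):
                if level == "entity":
                    masked = entity_id in selection
                elif level == "token":
                    masked = (entity_id, token_id) in selection
                else:  # "atom"
                    masked = atom_index in selection
                mask.append(masked)
                atom_index += 1
    return mask

# Example: a two-entity complex; design the whole second entity (a binder)
complex_layout = [
    ("target", [(0, 4), (1, 4)]),   # fixed target: 2 tokens, 4 atoms each
    ("binder", [(0, 4), (1, 4)]),   # entity to be generated
]
mask = build_design_mask(complex_layout, level="entity", selection={"binder"})
assert mask == [False] * 8 + [True] * 8
```

The same call with `level="atom"` and a set of atom indices would express motif-scaffolding-style tasks, which is roughly why one masking scheme can cover binder design, scaffolding, and ligand generation.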

Compared with other models:

RFdiffusion3 is probably the closest “serious” competitor from the protein-design side. It is all-atom and can design proteins in the context of ligands, DNA/RNA, and other molecules, but it is still mostly about generating proteins, not freely switching between protein, nucleic acid, and ligand outputs.

I think BoltzGen feels closer to a practical wet-lab binder design tool today.
ODesign feels like the broader future direction: a unified model for programmable molecular interaction design across modalities.

The big question is whether ODesign’s cross-modality promise will translate experimentally beyond protein minibinders. If it can actually produce validated RNA/DNA binders, ligand designs, and non-protein interaction partners, that would be a major step beyond current protein-centric design workflows.

Curious what people think: are these “world models” actually becoming useful design engines, or are we still mostly benchmarking pretty structures until the wet-lab hit rates catch up?

u/XpertAI — 18 hours ago

PXDesign from ByteDance Seed looks surprisingly good for general de novo protein binder design

Paper: PXDesign: Fast, Modular, and Accurate De Novo Design of Protein Binders
Repo: PXDesign

I know some people may be skeptical because PXDesign comes from ByteDance Seed, but for general de novo protein binder design, this is honestly one of the more impressive and practical papers I’ve seen while researching de novo design.

To be clear, I’m not talking about antibody-specific design or niche antibody engineering tasks. I mean general protein binder generation against protein targets.

The main claim is strong: PXDesign reports 20–73% nanomolar binder hit rates across five of six tested targets, with wet-lab validation on IL-7RA, SARS-CoV-2 RBD, PD-L1, TrkA, VEGF-A, and TNF-α. It combines two parts:

  1. PXDesign-d, a diffusion-based generator for making candidate binders.
  2. PXDesign-h, a hallucination/optimization-based approach using Protenix-style structure prediction.

The diffusion model seems to be the real workhorse. It is fast, generates structurally diverse binders, and appears better suited for large-scale exploratory campaigns than slower hallucination methods. They also put a lot of effort into filtering and ranking, comparing AF2-style filters with Protenix-based filters, and showing that Protenix often improves enrichment and ranking.

What I like most is that this is not just another “we generated nice-looking structures” paper. They actually test designs experimentally, report hit rates, compare against methods like AlphaProteo, RFdiffusion, Chai, and Latent-X, and release a benchmarking framework.

The important caveat is that this is not peer-reviewed. Also, TNF-α failed, and the authors are pretty open about limitations in filtering thresholds, dataset sparsity, and experimental throughput.

But overall, for de novo protein binder design, PXDesign looks strong. I would not treat it as a universal solution, and I would not use it as an antibody design tool, but for general binder generation it seems very reliable and worth paying attention to.

u/XpertAI — 2 days ago

Absci releases Origin-1 paper (updated) + repo for de novo antibody design against “zero-prior” epitopes

Paper: Origin-1: a generative AI platform for de novo antibody design against novel epitopes

Repo: Origin-1

The main idea is that Origin-1 designs antibodies against zero-prior epitopes, meaning target sites where there are no prior antibody-antigen complex structures or close structural templates available.

From the paper, Origin-1 is presented as a two-stage design-and-score system. The design side, AbsciGen, generates antibody-antigen complexes and designs paired heavy/light chain CDR sequences against a specified epitope. The scoring side, AbsciBind, then filters candidates using co-folding-based scoring and developability criteria before anything is tested experimentally.

In other words, the platform is not just generating antibody sequences. It is trying to generate an epitope-specific binding pose, design the antibody CDRs around that pose, and then rank/filter the designs before wet-lab validation.

In the paper, they tested the system across 10 human protein targets and report validated antibodies for 4 targets: COL6A3, AZGP1, CHI3L2, and IL36RA.

They also report structural validation for two of the designs. For COL6A3 and AZGP1, cryo-EM structures matched the designed binding modes at 3.0–3.1 Å resolution, with reported DockQ scores of 0.73–0.83.

For IL36RA, they went further and used AI-guided affinity maturation to improve the binder into a functional antagonist, reporting 104 nM potency.

The repo includes supporting study data, including in silico and in vitro results, SPR data, and generated computational models. But the model itself is not being released. This is an open data/results release around a proprietary antibody design platform, not an open-source model release with weights and inference code.

u/XpertAI — 5 days ago

AFSample2T is out: making AlphaFold2 more useful for GPCR virtual screening

Repo: AFSample2
Paper: Improving AlphaFold2 Performance in Virtual Screens Targeting GPCRs by Enhancing Binding-Site Conformational Sampling

The AFSample2T paper is finally out, and it is a cool example of where AI protein modelling is heading.

Vanilla AlphaFold2 is great at predicting a likely protein structure, but it usually collapses toward one dominant conformation. For GPCR drug discovery, that can be limiting because the binding pocket is flexible, and small local changes can strongly affect docking and virtual screening results.

AFSample2T tackles this by using targeted MSA masking around the binding site. Instead of perturbing the whole protein, it selectively weakens the evolutionary signal near the pocket, pushing AF2 to sample more ligand-compatible GPCR conformations.
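Mechanically, that amounts to masking MSA columns near pocket residues so AF2 sees a weaker local coevolution signal on each sampling run. A toy sketch of what such targeted column masking could look like — the function name and masking scheme are my assumptions, not the authors' implementation:

```python
import random

# Illustrative sketch of targeted MSA column masking near a binding pocket,
# in the spirit of what the AFSample2T paper describes. Names and the exact
# masking scheme are assumptions, not the authors' code.

def mask_msa_near_pocket(msa, pocket_columns, p_mask=0.7, seed=0):
    """Replace residues in pocket-adjacent MSA columns with a gap token.

    msa: list of aligned sequences (equal-length strings)
    pocket_columns: set of 0-based column indices near the binding site
    p_mask: per-sequence probability of gapping each pocket column
    """
    rng = random.Random(seed)
    masked = []
    for seq in msa:
        chars = list(seq)
        for col in pocket_columns:
            if rng.random() < p_mask:
                chars[col] = "-"   # weaken the evolutionary signal locally
        masked.append("".join(chars))
    return masked

msa = ["MKTAYIAK", "MKSAYLAK", "MRTAYIGK"]
out = mask_msa_near_pocket(msa, pocket_columns={3, 4}, p_mask=1.0)
assert all(s[3] == "-" and s[4] == "-" for s in out)
assert all(s[0] == "M" for s in out)  # columns outside the pocket untouched
```

Running this with different seeds before each AF2 call would give an ensemble of pocket conformations rather than one dominant state, which is the whole point for virtual screening.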

So the difference is basically:

Vanilla AF2: predicts one likely structure.
AFSample2T: samples alternative binding-site conformations for better virtual screening.

That matters because drug discovery needs more than static structures. It needs useful receptor states.

u/XpertAI — 5 days ago

OpenBind just released its first public structure-affinity dataset, and they are also teasing OpenBind-1.

Blog: OpenBind’s first release: A structure–affinity dataset for structure-based AI

Dataset: OpenBind

The release includes 925 crystallographic binding events from 699 compounds, with affinity measurements for 601 compounds, focused on EV-A71 / CVA16 2A protease.

What makes this interesting is that it is not just another protein-ligand benchmark scraped from public structures. It is a dense experimental campaign where structures and binding measurements are linked across a single target system.

That feels pretty valuable for AI protein-ligand modelling because a lot of current methods still struggle with real structure-based design problems: receptor state choice, cross-docking, affinity trends, and whether models actually understand local SAR instead of just memorising near-neighbour structures.

They are also teasing OpenBind-1, a predictive model trained using the UK’s Isambard-AI compute cluster. The whole project is open science/open access, which makes it much more useful for benchmarking, fine-tuning, and community testing.

u/XpertAI — 7 days ago

Has anyone here tried Pro-1?

Blog: Fully open source reasoning model (8b and 70b) trained using GRPO towards a physics based reward function for protein stability.

Repo: Pro1

I feel like it is pretty underrated, especially because it takes a different approach from a lot of protein design models.

Most tools in this space are either sequence-only, structure-prediction-first, or diffusion-based. Pro-1 is interesting because it uses an LLM-style reasoning loop with a physics-based reward signal, mainly around Rosetta stability scoring.

We have been testing it for antibody design, and the useful part is that it can reason over the sequence, structural context, prior mutations, and design goals before suggesting changes. That makes it feel less like a black-box generator and more like a guided protein engineering assistant.

The downside is that it is heavy computationally, especially if you want to run the full loop properly with structure prediction/scoring. It is not a lightweight “generate 100 sequences instantly” kind of model.

Still, I think the direction is underrated: using language models to reason through protein engineering decisions, then grounding them with physics-based scoring.

Curious if anyone else has tested it, especially for antibodies or enzyme stability.

u/XpertAI — 7 days ago

Came across this repo: ProSolNet

It looks like ProSolNet predicts protein solubility using multimodal features from sequence, structure, graph-based representations, and surface features. There is also a ProSolNet_mut model for predicting solubility changes caused by mutations.

The method looks interesting, especially because it combines 3D structure and surface features rather than only sequence embeddings.

I could not find an associated paper, preprint, DOI, or benchmark write-up linked in the repo. Has anyone here tried it, or know whether there is a manuscript behind it?

Also curious how people think this compares to tools like Protein-Sol, SoluProt, ProSol-Multi, ProtSolM, etc.

u/XpertAI — 9 days ago

Paper: Zero-shot design of a de novo metalloenzyme

Repo: dEVA

Interesting new bioRxiv preprint from El Nesr et al. on dEVA, a multi-objective protein design framework for de novo enzyme design.

What makes this paper interesting is that the authors do not just introduce a design algorithm. They use it to design a functional enzyme.

Their main example is desB, a de novo bi-zinc metalloenzyme designed to catalyze phosphate ester hydrolysis. This is a hard design problem because you are not only asking for a folded protein. You need the active site geometry, metal coordination, sequence, side chains, and catalytic environment to all work together.

That is where dEVA comes in.

Instead of optimizing designs against one score or doing sequential filtering, dEVA evolves a population of protein candidates across multiple objectives. In this case, the authors optimized for sequence/side-chain compatibility using LigandMPNN and for a catalytically competent zinc coordination sphere using Metal3D-Cat.
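The selection step in this kind of multi-objective evolution is worth making concrete: instead of collapsing scores into one number, you keep the Pareto-nondominated candidates. A minimal sketch — the toy objectives here are stand-ins, not dEVA's actual LigandMPNN or Metal3D-Cat scores:

```python
# Minimal sketch of Pareto selection, the core move in multi-objective
# evolutionary design frameworks like dEVA. Lower scores are better; the
# candidates and objective values below are purely illustrative.

def dominates(a, b):
    """True if score vector a is at least as good as b on every objective
    and strictly better on at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(population):
    """population: list of (candidate, (obj1, obj2, ...)) pairs.
    Keep every candidate that no other candidate dominates."""
    return [
        (cand, scores)
        for cand, scores in population
        if not any(dominates(other, scores) for _, other in population)
    ]

# Toy population scored on two made-up objectives
pop = [("A", (1.0, 3.0)), ("B", (2.0, 1.0)), ("C", (2.5, 3.5))]
front = pareto_front(pop)
assert [c for c, _ in front] == ["A", "B"]  # C is dominated, so it is dropped
```

An evolutionary loop then mutates the survivors and repeats, so no single objective can silently sacrifice the others — which matters when active-site geometry and sequence compatibility both have to hold at once.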

The result was desB, which the paper reports can hydrolyze both phosphomonoesters and phosphodiesters, with catalytic efficiency up to 1500 M⁻¹s⁻¹ and a rate enhancement up to 3 × 10¹³. Importantly, this was done zero-shot, without directed evolution, natural templates, predefined motifs, or evolutionary information.

I think the useful takeaway for researchers is not only “they designed a metalloenzyme.”

It is that dEVA gives a framework for protein design problems where multiple constraints have to be satisfied at the same time. For example, you could imagine adapting the same idea to optimize for binding geometry, active-site placement, stability, ligand coordination, catalytic motifs, or other custom scoring functions depending on the research problem.

That feels especially relevant for AI protein design, because many real applications are not single-objective. A model can generate something that looks structurally plausible, but function usually depends on several biochemical constraints being compatible at once.

Curious what people here think:

Is evolutionary multi-objective optimization likely to become a standard layer on top of generative protein models, especially for functional design tasks like enzymes?

u/XpertAI — 10 days ago

TL;DR: A lot of protein-protein interaction work still comes down to three questions: what does the complex look like, does it seem energetically believable, and which interface residues matter most? There are now far more tools for each step, so I’m curious what people here trust most in practice.

When I think about working on a protein-protein complex, I still break it into those three layers. First I need a structure or complex model I can inspect. Then I want some sense of whether the interaction looks stable enough to take seriously. Then I want to know which residues are doing most of the work at the interface, especially if the goal is mutagenesis, binder design, or understanding where the leverage is.

The older workflow for that was pretty straightforward. Homology modelling or template-based modelling for structure, molecular dynamics for energetics, and then feature-based or ML-style hot spot prediction on top. I still think that stack makes sense because each step answers a different question.

What has changed is the toolset. For structure, I assume a lot of people now start with AlphaFold 3, Chai-1, Boltz-2, OpenFold, or RoseTTAFold-style tools before touching older homology workflows. For energetics, I still see people use OpenMM, GROMACS, MM/PBSA-style calculations, or just MD as a sanity check after prediction. And for hot spots, it feels much less standardized. Some people still rely on alanine-scanning logic or Rosetta-style interface analysis, while others use newer ML predictors.

My bias is that the newer structure stack is clearly better, but the rest of the workflow still matters. For PPIs especially, I care just as much about what people use after the structure step.

So I’m curious, what tools would you pick for each layer today: structure prediction, binding or energetics, and hot spot prediction?

References:

Abramson, J. et al. (2024) ‘Accurate structure prediction of biomolecular interactions with AlphaFold 3’, Nature, 630, pp. 493–500. doi: 10.1038/s41586-024-07487-w.

Chen, Y.C. et al. (2024) ‘PPI-hotspotID for detecting protein-protein interaction hot spots from the free protein structure’, eLife, 13, RP96643. doi: 10.7554/eLife.96643.3.

u/XpertAI — 11 days ago

TL;DR: Target identification is starting to feel like a data-convergence problem: expression, genetics, prior biology, and perturbation signals all pointing toward the same protein. But once that happens, the next computational question is different: what do you build around that target? That is where the field seems to be splitting into different generative stacks, from RFdiffusion and BindCraft for binder design to newer systems like BoltzGen and ODesign that push toward broader all-atom or multimodal interaction design.

What I like about this framing is that target ID is not really the finish line anymore. It is the handoff. Once a target candidate survives the biology, you still have to decide what modality makes sense and what design engine you trust. For a while that mostly meant more classical workflows: inspect the structure, look for pockets, do docking, engineer around known scaffolds, maybe try local sequence design. I still think that older stack matters because it is easier to interpret and easier to debug when something fails.

What feels different now is that the generative tools are getting much closer to the actual discovery step. RFdiffusion made it feel realistic to generate proteins and binders directly from structural constraints, and BindCraft pushed that further into a more usable binder-design workflow. BoltzGen and ODesign are the examples I find most interesting right now because they seem to be reaching for a bigger design space instead of staying inside one modality. That feels more aligned with where AI drug discovery is heading: not just finding the right target, but deciding what kind of molecule you should build against it in the first place.

My bias is still that the newer stack is better, but only if it helps with real decisions. Not just generating something impressive-looking, but helping answer whether a target is better approached with a miniprotein, peptide, antibody-like binder, or something else entirely.

What are people here using after target ID lands on a protein: RFdiffusion, BindCraft, BoltzGen, ODesign, or something else? And which one has felt most useful in practice?

References:

Watson, J.L. et al. (2023) ‘De novo design of protein structure and function with RFdiffusion’, Nature, 620, pp. 1089–1100. doi: 10.1038/s41586-023-06415-8.

Pacesa, M. et al. (2025) ‘One-shot design of functional protein binders with BindCraft’, Nature, 646(8084), pp. 483–492. doi: 10.1038/s41586-025-09429-6.

Stark, H. et al. (2025) ‘BoltzGen: Toward Universal Binder Design’, bioRxiv [Preprint], 24 November. doi: 10.1101/2025.11.20.689494.

Zhang, O. et al. (2025) ‘ODesign: A World Model for Biomolecular Interaction Design’, arXiv preprint, arXiv:2510.22304.

u/XpertAI — 12 days ago

Paper: Protein Language Model Embeddings Improve Generalization of Implicit Transfer Operators
Repo: PLaTITO

The paper introduces PLaTITO, a generative molecular dynamics approach that conditions transferable implicit transfer operators on protein language model embeddings.

The main idea is pretty neat: instead of learning MD surrogates only from structural or trajectory data, the model also uses sequence-level representations from protein language models to improve transfer to unseen proteins.
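The conditioning idea is simple to sketch: the learned propagator takes the current coordinates plus a fixed per-protein embedding and returns the state one lag time later. Everything below is a toy stand-in (a linear map instead of the paper's network), just to show the shape of the interface:

```python
# Illustrative sketch of a transfer operator conditioned on a sequence
# embedding, in the spirit of PLaTITO: x_{t+lag} = f(x_t, embedding).
# The linear "model" and all names are my assumptions, not the paper's code.

def conditioned_step(state, embedding, lag, weights):
    """One surrogate-dynamics step. (lag stays in the signature because the
    operator is trained for a fixed lag time; this toy model ignores it.)"""
    w_x, w_e, bias = weights
    return [w_x * x + w_e * e + bias for x, e in zip(state, embedding)]

def rollout(state, embedding, lag, weights, n_steps):
    """Chain surrogate steps to emulate a long trajectory cheaply."""
    traj = [state]
    for _ in range(n_steps):
        state = conditioned_step(state, embedding, lag, weights)
        traj.append(state)
    return traj

# Two "proteins" with different embeddings drift toward different long-time
# states under the same operator weights — which is the point of conditioning:
# one model, transferable across sequences.
weights = (0.5, 1.0, 0.0)
traj_a = rollout([0.0, 0.0], [1.0, -1.0], 1, weights, n_steps=20)
traj_b = rollout([0.0, 0.0], [-1.0, 1.0], 1, weights, n_steps=20)
assert abs(traj_a[-1][0] - 2.0) < 1e-3 and abs(traj_b[-1][0] + 2.0) < 1e-3
```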

A few things that stood out:

  • It targets the generalization problem in generative MD, where models often work well on known systems but struggle out-of-distribution.
  • The authors report that coarse-grained TITO models are more data-efficient than Boltzmann Emulators.
  • Adding pLM embeddings improves OOD generalization, including on fast-folding protein benchmarks.
  • The repo includes pretrained checkpoints, a quick-start notebook, trajectory generation scripts, and instructions for reproducing the paper results.

This feels like an interesting direction for protein simulation: using foundation model embeddings not just for structure/function prediction, but as conditioning signals for dynamics.

Curious what people think: are pLM-conditioned generative MD models likely to become practically useful for protein engineering workflows, or are we still mostly in benchmark territory?

u/XpertAI — 12 days ago

Paper: The past, present and future of de novo protein design

I thought it was worth discussing here because it is not just another “AI for proteins is getting better” article.

The review basically divides the field into a few design frontiers:

Designing new folds and assemblies:

This is probably the most mature part of de novo design now. The review goes from early examples like Top7, one of the classic atomic-level de novo protein fold designs, to designed TIM barrels, repeat proteins, symmetric nanoparticles, 1D fibres, 2D lattices and even 3D protein assemblies.

What is interesting is how broad this category has become. It is not just soluble globular proteins anymore. The examples include transmembrane beta barrels, designed conducting nanopores, bottom-up designed Ca²⁺ channels, designed voltage-gated anion channels, and mechanically coupled axle-rotor protein assemblies.

The field has also moved into application-driven assemblies, like designed protein nanoparticle vaccines for MERS-CoV and influenza, pH-responsive antibody nanoparticles and designed protein crystals.

So the question is becoming less “can we make a protein fold?” and more “what architecture should we make for a useful biological, therapeutic or material function?”

Designing protein binders:

This is where the AI protein design hype feels most justified. The review highlights workflows based around RFdiffusion-style backbone generation, ProteinMPNN-style sequence design and structure prediction, but the examples are what make it convincing.

There are designed binders for viral targets, including picomolar SARS-CoV-2 miniprotein inhibitors, designed miniproteins against MERS-CoV, RSV immunogen design and inhibitors targeting SARS-CoV-2 Omicron variants.

It also points to newer general workflows like BindCraft, one-shot functional protein binder design, and examples where de novo designed proteins neutralize snake venom toxins.

The review also includes antibody and peptide-like directions: de novo antibody design with SE(3) diffusion, RFdiffusion-based antibody design, beta-pairing targeted binder design and de novo protein-binding macrocycles.

That does not mean every binder works, or that affinity, specificity, expression and developability are solved. But the framing is that protein-target binder design is moving from a heroic custom project toward a more generalizable workflow. That is a big deal for therapeutics, diagnostics, target validation and synthetic biology, because binders are basically programmable biological handles.

Small-molecule binders and enzymes are harder:

This part is interesting because the review is much more cautious. Binding a protein surface is one thing. Designing a precise pocket for a small molecule, or designing an enzyme that stabilizes a high-energy transition state, is much harder.

The examples they show include designed binders for small molecules like apixaban, methotrexate, cholic acid, digoxigenin and cortisol, plus drug-binding proteins designed with predictable binding energy and specificity.

For enzymes, the review points to progress in designed luciferases, serine hydrolases, heme enzymes, porphyrin-containing catalysts, artificial metathases and metallohydrolases.

But this still feels like one of the major unsolved frontiers. We can now make structures that look right, and sometimes bind the right ligand, but getting strong catalytic activity, specificity and evolvability is still difficult. Catalysis is not just shape complementarity. It requires geometry, dynamics, electrostatics, proton transfer, transition-state stabilization and sometimes conformational changes all working together.

The next step is dynamic proteins, switches and nanomachines:

The most exciting section to me is the future-looking one. Static structure design is becoming powerful, but biology is full of proteins that move, switch, sense, gate, assemble, disassemble and couple one event to another.

The review gives examples like modular and tunable protein biosensors, sensors for endogenous Ras activity, bioactive protein switches, designed protein logic for targeting cells with combinations of surface antigens, small-molecule safety switches for CAR-T cells, stimulus-responsive two-state hinge proteins and deep-learning-guided design of dynamic proteins.

So the next challenge is not just designing a stable object. It is designing a system with multiple states and controlled transitions between them. That includes biosensors, logic gates, responsive materials, designed channels, artificial photosystems and eventually protein systems that perform functions nature never evolved.

My takeaway: the field is moving from designing shapes to designing behavior.

The review’s most important point, in my opinion, is that protein design is becoming less about proving that de novo design is possible and more about deciding what we should actually build.

Curious what people here think: Are we actually close to “solving” protein binder design, or is that still too optimistic?

And for the next phase, do you think the bigger breakthrough will come from better generative models, better experimental feedback loops, or better physical modeling of dynamics/catalysis?

u/XpertAI — 12 days ago

Repo: BAGEL

Paper: BAGEL: Protein engineering via exploration of an energy landscape

Came across BAGEL, a new open-source framework for programmable protein engineering, and thought it was worth discussing here.

The basic idea is pretty intuitive: instead of treating protein design as a fixed pipeline, BAGEL lets you define a protein engineering problem as an “energy landscape” and then search through sequence space using modular objectives.

What I like about BAGEL is that it does not seem to be trying to be one model that solves everything. It is more like a framework where you can combine different design pressures:

  • “Make this region structurally confident”
  • “Bring these two protein regions into contact”
  • “Avoid exposed hydrophobic patches”
  • “Preserve this catalytic motif”
  • “Bind this target but not that off-target”
  • “Generate diverse variants while keeping the functional site intact”

That makes it quite interesting for protein engineering problems, because many useful designs are not single-objective. You rarely just want “a folded protein.” You want something that folds, binds, avoids a close homolog, keeps a motif, stays soluble, and gives you enough sequence diversity to test.
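Structurally, that combination of design pressures is just a weighted sum of independent penalty terms that a sampler minimizes over sequence space. A toy sketch of the pattern — the two objectives below are my own illustrations, not BAGEL's built-in terms or API:

```python
# Sketch of the "modular energy landscape" idea: total energy is a weighted
# sum of independent objective terms over a candidate sequence, which a
# sampler then minimizes. Both penalty terms here are illustrative stand-ins.

def total_energy(seq, terms):
    """terms: list of (weight, fn) where fn maps a sequence to a penalty."""
    return sum(w * fn(seq) for w, fn in terms)

def motif_penalty(seq, motif="HDS", start=2):
    """Count mismatches against a catalytic motif at a fixed position."""
    return sum(a != b for a, b in zip(seq[start:start + len(motif)], motif))

def hydrophobic_run_penalty(seq, hydrophobic=set("AVLIFMW"), max_run=3):
    """Penalize hydrophobic runs longer than max_run (a crude solubility term)."""
    run = worst = 0
    for aa in seq:
        run = run + 1 if aa in hydrophobic else 0
        worst = max(worst, run)
    return max(0, worst - max_run)

terms = [(10.0, motif_penalty), (1.0, hydrophobic_run_penalty)]
good = "GKHDSGKE"    # motif intact, no long hydrophobic run
bad = "GKADSVLIF"    # motif broken at one position, 4-residue run
assert total_energy(good, terms) == 0.0
assert total_energy(bad, terms) > 0.0
```

Swapping terms in and out, or reweighting them, is what lets one framework cover binder design, off-target avoidance, and motif preservation without retraining anything.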

Some examples where BAGEL could be useful:

Peptide binder design: Designing short binders against a target protein, especially when you want to bias the search toward confident interfaces.

Targeting disordered regions: This is a cool one. BAGEL was shown on intrinsically disordered epitopes, where a designed binder can potentially stabilize a region that is otherwise flexible or poorly structured. That could be relevant for targets that do not present a nice rigid pocket.

Species-selective or state-selective binders: The multi-state setup is probably one of the strongest parts. You can optimize one sequence to bind one target while avoiding another similar target. In principle, this could be useful for cross-reactivity design, species selectivity, off-target avoidance, or model-organism-to-human translation.

Enzyme variant generation: Another useful application is exploring sequence diversity while preserving catalytic residues or an active-site geometry. That is interesting as a way to seed directed evolution with more focused libraries instead of mutating blindly.

Custom design campaigns: Since the framework is modular, you could imagine plugging in future folding models, embedding models, interface predictors, solubility predictors, or custom lab-derived scoring functions.

Curious what people here think: is this kind of modular energy-landscape approach the right direction for practical AI protein engineering, or do you think end-to-end generative models will dominate?

u/XpertAI — 13 days ago

Repo: SimpleFold
Paper: SimpleFold: Folding Proteins is Simpler than You Think

Apple released SimpleFold, and I think it is worth discussing because it is doing something quite different from the usual AlphaFold-style models.

The interesting part is not just “Apple made a protein model.”

The interesting part is that SimpleFold asks a pretty direct question: How much protein-specific architecture do we actually need for folding?

Most modern structure predictors are built around a lot of specialised machinery: pair representations, triangle attention, recycling, diffusion modules, complex-specific handling, templates, MSAs, ligand representations, confidence heads, and so on.

SimpleFold goes in the opposite direction.

It uses a general-purpose transformer and a flow-matching generative objective to predict protein structures. In simpler terms: it starts from noisy coordinates and learns how to move them toward a plausible folded structure.
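The flow-matching objective itself is compact: interpolate between noise and target coordinates, and regress the velocity that carries one to the other; at inference you integrate that velocity field from fresh noise. A toy 1D sketch of the construction (this is the generic flow-matching recipe, not SimpleFold's actual code):

```python
import random

# Minimal sketch of the flow-matching setup SimpleFold-style models use.
# The "model" is replaced by the exact closed-form velocity to show the
# mechanics; in the real system a transformer regresses this velocity.

def flow_matching_pair(x_noise, x_target, t):
    """Linear interpolant x_t = (1 - t) * noise + t * target, paired with the
    constant velocity (target - noise) the network is trained to predict."""
    x_t = [(1 - t) * n + t * x for n, x in zip(x_noise, x_target)]
    velocity = [x - n for n, x in zip(x_noise, x_target)]
    return x_t, velocity

def euler_integrate(x_noise, velocity_fn, steps=10):
    """Move samples from noise toward structure with Euler steps over t in [0, 1]."""
    x = list(x_noise)
    dt = 1.0 / steps
    for _ in range(steps):
        v = velocity_fn(x)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x

# Toy 1D "coordinates": with the exact velocity field, integration recovers
# the target regardless of the starting noise.
noise = [random.gauss(0, 1) for _ in range(5)]
target = [1.0, -2.0, 0.5, 3.0, -1.0]
_, v_true = flow_matching_pair(noise, target, t=0.0)
final = euler_integrate(noise, lambda x: v_true, steps=10)
assert all(abs(f - t) < 1e-9 for f, t in zip(final, target))
```

The sampling-steps knob in `euler_integrate` is also why inference cost is tunable: fewer steps is faster but rougher, which matches the paper's speed/accuracy trade-offs.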

That makes it feel less like “AlphaFold with a few changes” and more like a simplified generative folding system.

And I think that is why the model is interesting.

Not because it replaces AlphaFold3, Boltz-2, Chai-1, or Protenix-v2.

But because it suggests that for single-chain protein folding, a simpler scalable architecture might go surprisingly far.

On speed, SimpleFold can be very efficient, especially for long proteins and smaller checkpoints, because it avoids some expensive AlphaFold-style components. But the largest 3B model with many sampling steps is not automatically faster than everything else.

On accuracy, it is competitive for monomer folding, but it does not clearly beat AlphaFold2 across benchmarks. The impressive part is that it gets close with a much simpler architecture.

Where I would use it:

  • folding single protein chains
  • generating multiple plausible conformations
  • testing local inference, especially on Apple Silicon
  • benchmarking simpler generative folding approaches

Where I would not use it first:

  • protein-ligand binding
  • antibody-antigen complexes
  • protein-protein interface modelling
  • DNA / RNA / protein complexes
  • binding affinity prediction

For those, I would still reach for AF3, Chai-1, Boltz-2, or Protenix-v2 depending on the task.

SimpleFold is narrower, but that is also what makes it interesting.

It is not trying to solve every biomolecular modelling problem. It is asking whether protein folding itself can be done with less specialised machinery than we thought.

Curious what people think: is this just a clean research model, or does it point toward the next generation of simpler folding architectures?

u/XpertAI — 13 days ago