u/ContributionFun3037

I trained a transformer to model cell differentiation as developmental flow instead of static clusters.

Animation of the model’s learned developmental landscape across pseudotime, where nearby cell states gradually organize into branching developmental structures.

I built a small transformer project inspired by Waddington’s epigenetic landscape idea, where the goal was to model developmental progression as probabilistic movement through cell state space across pseudotime.

The setup is pretty simple:

• order cells by pseudotime
• use PCA-compressed embeddings as inputs
• train a causal transformer to autoregressively predict future developmental states
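
For anyone who wants something concrete, here is a minimal NumPy sketch of the data side of that setup. It is not the project's actual code: the function names, the sliding-window scheme, and the defaults (`n_components=8`, `seq_len=16`) are assumptions made for illustration. It orders cells by pseudotime, compresses them with PCA via SVD, and cuts next-step (input, target) windows, plus the lower-triangular mask a causal transformer would use.

```python
import numpy as np

def prepare_sequences(expr, pseudotime, n_components=8, seq_len=16):
    """Order cells by pseudotime, PCA-compress, and cut next-step windows.

    expr:       (n_cells, n_genes) expression matrix (assumed input format)
    pseudotime: (n_cells,) pseudotime value per cell
    """
    order = np.argsort(pseudotime)              # 1. order cells by pseudotime
    x = expr[order] - expr.mean(axis=0)         # center each gene before PCA
    # 2. PCA via SVD: rows of vt are the principal axes
    _, _, vt = np.linalg.svd(x, full_matrices=False)
    z = x @ vt[:n_components].T                 # (n_cells, n_components)
    t = pseudotime[order]
    # 3. sliding windows of length seq_len + 1 over the ordered cells
    starts = np.arange(0, len(z) - seq_len)
    idx = starts[:, None] + np.arange(seq_len + 1)[None, :]
    windows = z[idx]                            # (n_windows, seq_len + 1, n_components)
    inputs, targets = windows[:, :-1], windows[:, 1:]   # targets are shifted by one step
    times = t[idx][:, :-1]                      # pseudotime of each input position
    return inputs, targets, times

def causal_mask(seq_len):
    """True where attention is allowed: position i may attend to positions <= i."""
    return np.tril(np.ones((seq_len, seq_len), dtype=bool))
```

The shifted targets are what make the training autoregressive: at every position the model is asked for the next state while the mask hides everything after that position.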

Instead of just predicting discrete labels or trajectories, the model learns something closer to developmental flow dynamics. After projecting predictions into 2D space, the learned representation starts producing flow-like trajectories, branching regions, and varying uncertainty across developmental space.
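
One way a flow field could be derived from the projected predictions is sketched below. This is an assumed approach, not necessarily what the project does: take each cell's current 2D position and its predicted next position, then average the displacement vectors inside each cell of a regular grid. The function name and `grid_size` are hypothetical.

```python
import numpy as np

def binned_flow_field(current_2d, predicted_2d, grid_size=20):
    """Average predicted displacement per grid cell of a 2D projection.

    current_2d, predicted_2d: (n_cells, 2) arrays in the same 2D projection.
    Returns u, v (mean x/y displacement) and counts, each (grid_size, grid_size).
    Grid cells with no points hold NaN in u and v.
    """
    disp = predicted_2d - current_2d
    lo, hi = current_2d.min(axis=0), current_2d.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)      # avoid dividing by zero on flat axes
    # integer bin per point, clipped so the maximum lands in the last bin
    bins = np.clip(((current_2d - lo) / span * grid_size).astype(int), 0, grid_size - 1)
    flat = bins[:, 1] * grid_size + bins[:, 0]  # row = y bin, column = x bin
    n_bins = grid_size ** 2
    counts = np.bincount(flat, minlength=n_bins)
    u = np.bincount(flat, weights=disp[:, 0], minlength=n_bins)
    v = np.bincount(flat, weights=disp[:, 1], minlength=n_bins)
    with np.errstate(invalid="ignore", divide="ignore"):
        u = np.where(counts > 0, u / counts, np.nan)
        v = np.where(counts > 0, v / counts, np.nan)
    shape = (grid_size, grid_size)
    return u.reshape(shape), v.reshape(shape), counts.reshape(shape)
```

The `counts` grid is returned alongside the vectors so sparsely populated regions can be masked or down-weighted before plotting.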

One thing that helped a surprising amount was using the actual pseudotime values as positional encodings (i.e., passed through sinusoidal embeddings) instead of just the sequence index. The predictions became noticeably more stable, presumably because the model had some notion of developmental progression itself.
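
A rough sketch of what "pseudotime as positional encoding" can look like: the standard sinusoidal formula, evaluated at each cell's continuous pseudotime value rather than at its integer position in the sequence. The `scale` parameter is an assumption of this sketch (pseudotime is often normalized to [0, 1], which would otherwise barely move the low-frequency channels), and `d_model` is assumed to be even.

```python
import numpy as np

def pseudotime_encoding(pseudotime, d_model=32, max_period=10000.0, scale=1000.0):
    """Sinusoidal encoding evaluated at continuous pseudotime values.

    pseudotime: array-like of any shape, e.g. (batch, seq_len)
    d_model:    encoding width, assumed even
    scale:      hypothetical factor stretching pseudotime before encoding
    """
    t = np.asarray(pseudotime, dtype=float)[..., None] * scale
    # geometric ladder of frequencies, as in the usual sinusoidal scheme
    freqs = np.exp(-np.log(max_period) * np.arange(0, d_model, 2) / d_model)
    angles = t * freqs                          # (..., d_model // 2)
    enc = np.empty(angles.shape[:-1] + (d_model,))
    enc[..., 0::2] = np.sin(angles)             # even channels: sine
    enc[..., 1::2] = np.cos(angles)             # odd channels: cosine
    return enc
```

With this scheme two cells that are adjacent in the sequence but far apart in pseudotime get very different encodings, while cells with equal pseudotime get identical ones, which plain sequence-order encodings cannot express.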

A few visualizations from the learned developmental landscape and predicted flow dynamics:

Learned developmental flow field across pseudotime. Streamlines show predicted movement through developmental state space.

3D developmental landscape inspired by Waddington’s epigenetic landscape framing.

Prediction uncertainty across developmental space. Branching regions became much less certain than stable trajectories.

Fate proportions evolving across pseudotime stages.
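
As a rough illustration of the last plot, fate proportions per stage can be computed by binning cells into equal-width pseudotime stages and normalizing the fate counts within each stage. This is a sketch under assumptions: it presumes each cell already has a discrete fate label (from annotation or from the model), and the function name and `n_stages` are hypothetical.

```python
import numpy as np

def fate_proportions(pseudotime, fate_labels, n_stages=5):
    """Fraction of each fate within equal-width pseudotime stages.

    Returns (fates, edges, props), where props[s, f] is the share of fate
    fates[f] among cells in stage s. Empty stages get all-zero rows.
    """
    pseudotime = np.asarray(pseudotime, dtype=float)
    fate_labels = np.asarray(fate_labels)
    fates, fate_idx = np.unique(fate_labels, return_inverse=True)
    edges = np.linspace(pseudotime.min(), pseudotime.max(), n_stages + 1)
    # digitize against interior edges only, so the maximum falls in the last stage
    stage = np.digitize(pseudotime, edges[1:-1])
    counts = np.zeros((n_stages, len(fates)))
    np.add.at(counts, (stage, fate_idx), 1)     # accumulate one count per cell
    totals = counts.sum(axis=1, keepdims=True)
    props = np.divide(counts, totals, out=np.zeros_like(counts), where=totals > 0)
    return fates, edges, props
```

Each row of `props` can be drawn as one bar of a stacked bar chart, or the columns plotted as lines across stages.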

Also worth mentioning: I used AI pretty heavily for generating and iterating on visualization code. The actual modeling/experiments were manual and done by me, but AI made exploration way faster on the plotting side.

Still very exploratory, and I’m not really from the biology side originally, so there’s definitely a decent chance I’m misunderstanding or reinventing ideas that already exist in the single-cell literature. Any corrections, criticism, or pointers are genuinely welcome.

Mostly curious whether people think this kind of probabilistic flow framing is actually useful beyond standard trajectory inference / pseudotime methods.

Not sure if posting the repo is allowed here (if it is, I'll leave the link in the comments).

reddit.com
u/ContributionFun3037 — 3 days ago