A visual workspace for "Transformer Surgery": Building, pruning, and exporting hybrid architectures (Gemma 4, Mistral, Llama and more)
I’ve spent a lot of time lately digging into the "surgical" side of LLMs—specifically trying to understand how the internal math changes when you mix architectural concepts, like putting a Llama-style MLP into a Gemma-style soft-capping attention block.
One thing that consistently slows down research is how rigid the standard libraries are. If you want to swap a normalization layer or test a hybrid GQA/SWA (Grouped-Query/Sliding Window) setup, you usually end up monkey-patching deep inside a modeling_xxx.py file or writing one-off scripts that break when you change a hidden dimension.
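To make the kind of hybrid concrete: below is a minimal, self-contained sketch of a single decoder block that pairs Gemma-2-style attention logit soft-capping with a Llama-style SwiGLU MLP. All names, dimensions, and the cap value are illustrative assumptions for this post, not OLLA's actual export output.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    """Minimal RMSNorm, as used by both Llama and Gemma."""
    def __init__(self, dim, eps=1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps
    def forward(self, x):
        return self.weight * x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)

class HybridBlock(nn.Module):
    """Gemma-2-style soft-capped attention + Llama-style SwiGLU MLP.
    Illustrative dimensions; cap=50.0 matches Gemma 2's attention soft-cap."""
    def __init__(self, dim=256, n_heads=4, softcap=50.0):
        super().__init__()
        self.n_heads, self.head_dim, self.softcap = n_heads, dim // n_heads, softcap
        self.qkv = nn.Linear(dim, 3 * dim, bias=False)
        self.o = nn.Linear(dim, dim, bias=False)
        hidden = int(8 * dim / 3)  # Llama-style ~8/3 expansion for SwiGLU
        self.gate = nn.Linear(dim, hidden, bias=False)
        self.up = nn.Linear(dim, hidden, bias=False)
        self.down = nn.Linear(hidden, dim, bias=False)
        self.norm1, self.norm2 = RMSNorm(dim), RMSNorm(dim)

    def forward(self, x):
        b, t, d = x.shape
        h = self.norm1(x)
        q, k, v = self.qkv(h).chunk(3, dim=-1)
        q = q.view(b, t, self.n_heads, self.head_dim).transpose(1, 2)
        k = k.view(b, t, self.n_heads, self.head_dim).transpose(1, 2)
        v = v.view(b, t, self.n_heads, self.head_dim).transpose(1, 2)
        logits = q @ k.transpose(-2, -1) / self.head_dim ** 0.5
        # Gemma-2-style soft-capping: squash logits into (-cap, cap) via tanh
        logits = self.softcap * torch.tanh(logits / self.softcap)
        mask = torch.triu(torch.full((t, t), float("-inf")), diagonal=1)
        attn = F.softmax(logits + mask, dim=-1)
        x = x + self.o((attn @ v).transpose(1, 2).reshape(b, t, d))
        # Llama-style SwiGLU MLP: down(silu(gate(x)) * up(x))
        h = self.norm2(x)
        return x + self.down(F.silu(self.gate(h)) * self.up(h))
```

Writing even this toy block by hand shows why the boilerplate adds up: every design change (GQA, RoPE, sliding windows) ripples through the reshape and projection logic.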
To solve this for my own research, I built a visual workspace called Neural Playground (part of OLLA) that handles the boilerplate and exports the results as clean, runnable PyTorch code. I’m opening it up for others to use for their own prototyping and architecture experiments.
What you can do with it:
- Deconstruct Model Families: Inspect the exact layer structures of Mistral, Llama, Gemma, and Phi.
- Configure Every Parameter: Directly adjust KV heads, RoPE settings, hidden sizes, and attention variants through the UI.
- Export to PyTorch: Once you’ve designed a hybrid variant, you can export the entire thing as a clean PyTorch project.
- Local Pruning: I’ve also included a one-click local checkpoint pruner with VRAM reporting to see the impact of architectural changes before you even hit train.
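For a sense of what the VRAM reporting captures, here is a back-of-envelope version of the math: a rough parameter count for a Llama-style decoder and a weights-only memory estimate. This is a hedged sketch of the idea, not the pruner's actual implementation; the function names and the 20% overhead factor are my assumptions.

```python
def llama_style_param_count(vocab, dim, n_layers, n_heads, n_kv_heads, ffn_dim):
    """Rough parameter count for a Llama-style decoder with untied embeddings.
    GQA shows up as smaller k/v projections when n_kv_heads < n_heads."""
    head_dim = dim // n_heads
    attn = dim * (n_heads * head_dim)          # q projection
    attn += 2 * dim * (n_kv_heads * head_dim)  # k, v projections (GQA-aware)
    attn += (n_heads * head_dim) * dim         # o projection
    mlp = 3 * dim * ffn_dim                    # gate, up, down (SwiGLU)
    norms = 2 * dim                            # two RMSNorms per block
    return 2 * vocab * dim + n_layers * (attn + mlp + norms)

def estimate_vram_gb(n_params, bytes_per_param=2, overhead=1.2):
    """Weights-only inference estimate: fp16/bf16 = 2 bytes per parameter,
    plus an assumed ~20% buffer for activations and KV cache."""
    return n_params * bytes_per_param * overhead / 1024**3

# Example: a Llama-2-7B-like config
n = llama_style_param_count(vocab=32000, dim=4096, n_layers=32,
                            n_heads=32, n_kv_heads=32, ffn_dim=11008)
print(f"{n/1e9:.2f}B params, ~{estimate_vram_gb(n):.1f} GB")
```

The useful part is that both numbers respond immediately to architectural edits, e.g. dropping `n_kv_heads` from 32 to 8 shrinks only the k/v projection terms, which is exactly the kind of before/after delta the pruner reports.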
Why I’m sharing this: I’m looking for technical feedback from people who do a lot of model surgery or local deployment. Specifically:
- Are there specific hybrid combinations (like MoE variants) that are currently a pain for you to implement manually?
- What additional "model surgery" tools would be most useful? I'm currently looking at adding Knowledge Distillation support next.
The project is live at: https://olla.work. I’m hoping this helps lower the barrier to entry for custom architecture research and helps people "see" the math behind the layers.