u/Otaku_7nfy

I Found a Hidden Ratio in Transformers That Predicts Geometric Stability

I analyzed several decoder transformer models using Lyapunov spectral analysis and found that the ratio of the MLP and attention spectral norms strongly indicates whether a model will collapse to rank 1 by its final layers.
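
For anyone who wants a concrete handle on "collapse to rank 1", here is a minimal sketch of one way to measure it on a matrix of hidden states. The function name `rank1_energy_fraction` and the choice of metric (share of squared singular-value energy in the top direction) are illustrative assumptions, not necessarily what the repo uses.

```python
import numpy as np


def rank1_energy_fraction(hidden: np.ndarray) -> float:
    """Share of squared singular-value energy held by the top singular direction.

    hidden: (num_tokens, d_model) matrix of token representations at one layer.
    A value near 1.0 means the rows are almost all multiples of a single vector,
    i.e. the representation is close to rank 1. (Hypothetical helper.)
    """
    s = np.linalg.svd(hidden, compute_uv=False)  # singular values, descending
    total = float(np.sum(s ** 2))
    if total == 0.0:
        return 0.0  # all-zero input: no energy in any direction
    return float(s[0] ** 2 / total)


rng = np.random.default_rng(0)
# Synthetic stand-ins for hidden states, not real model activations.
healthy = rng.standard_normal((64, 32))
collapsed = np.outer(rng.standard_normal(64), rng.standard_normal(32))

print("healthy:  ", rank1_energy_fraction(healthy))
print("collapsed:", rank1_energy_fraction(collapsed))
```

Tracking this number layer by layer would show whether the representations drift toward a single direction as depth increases.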

I found that keeping this ratio roughly between 0.5 and 2 works best for keeping the model stable through the final layers.
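
A rough sketch of how such a check could look in NumPy. Which weight matrices go into each norm, and which norm is the numerator, are assumptions made here for illustration (the band is symmetric under inversion, so the direction does not change the verdict). The helper names are hypothetical; see the repo for the actual definition.

```python
import numpy as np


def spectral_norm(w: np.ndarray) -> float:
    """Largest singular value of a 2-D weight matrix."""
    return float(np.linalg.norm(w, ord=2))


def spectral_ratio(w_mlp: np.ndarray, w_attn: np.ndarray) -> float:
    """MLP spectral norm divided by attention spectral norm.

    Assumption: each block is summarized by a single weight matrix.
    """
    return spectral_norm(w_mlp) / spectral_norm(w_attn)


def in_stable_band(ratio: float, low: float = 0.5, high: float = 2.0) -> bool:
    """True if the ratio falls inside the 0.5 to 2 range from the post."""
    return low <= ratio <= high


# Toy matrices standing in for one layer's weights (not real model weights).
w_mlp = np.diag([3.0, 1.0])
w_attn = np.diag([2.0, 1.0])

ratio = spectral_ratio(w_mlp, w_attn)
print(ratio, in_stable_band(ratio))
```

In practice you would loop over layers and flag any whose ratio leaves the band.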

Paper/Github repo: https://github.com/yousef-rafat/the-1-1-rule

u/Otaku_7nfy — 8 days ago
