
u/oatmealcraving

Healing ReLU neural networks with fast transforms
Collect the ReLU decisions in a layer (converted to 0 or 1) into a diagonal decision matrix D. A ReLU layer then becomes:
DWx
A neural network then might be W₃D₂W₂D₁W₁x.
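As a quick sanity check of the DWx form, here is a minimal pure-Python sketch. The names (`matvec`, `decisions`, `W1`) and the random 4x4 weights are made up for illustration; D is stored as just its diagonal.

```python
import random

def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * a for m, a in zip(row, v)) for row in M]

def relu(v):
    return [max(0.0, a) for a in v]

def decisions(v):
    """Diagonal of D: 1 where the pre-activation is positive, else 0."""
    return [1.0 if a > 0 else 0.0 for a in v]

random.seed(0)
n = 4
W1 = [[random.gauss(0, 1) for _ in range(n)] for _ in range(n)]
x = [random.gauss(0, 1) for _ in range(n)]

pre = matvec(W1, x)                            # W1 x
d1 = decisions(pre)                            # diagonal entries of D1
layer_relu = relu(pre)                         # ReLU(W1 x)
layer_dwx = [d * a for d, a in zip(d1, pre)]   # D1 W1 x
```

Both `layer_relu` and `layer_dwx` hold the same values. D depends on x, so the network is only linear once the decisions are fixed.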
The product Wₙ₊₁Dₙ fractures Wₙ₊₁ by column selection: every column whose ReLU decision is 0 is switched off.
You can apply some healing balm in the form of the matrix H equivalent to the fast Walsh-Hadamard transform (WHT).
Then you have WHD, and because of the one-to-all connectivity of H (every entry of H is nonzero), W is no longer fractured.
In a sense H absorbs the sparsity effects of D prior to the weight matrix.
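A small sketch of the difference, assuming a 4-unit layer with two units switched off. The values in `gated` and `W` are arbitrary, and the 1/√n scaling of H is my assumption (the post does not specify a normalization).

```python
def hadamard(n):
    """Sylvester construction of the n x n Hadamard matrix
    (n a power of 2), scaled by 1/sqrt(n)."""
    H = [[1.0]]
    while len(H) < n:
        H = [row + row for row in H] + [row + [-v for v in row] for row in H]
    s = n ** -0.5
    return [[s * v for v in row] for row in H]

def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * a for m, a in zip(row, v)) for row in M]

# Gated activations D y: units 1 and 2 were switched off by ReLU.
gated = [1.5, 0.0, 0.0, 2.0]
W = [[1.0, 2.0, 3.0, 4.0],
     [5.0, 6.0, 7.0, 8.0]]

# Without H: columns of W that meet a zero contribute nothing to W D y.
used_without_h = [j for j, g in enumerate(gated) if g != 0.0]

# With H: H D y is dense here, so every column of W contributes to W H D y.
mixed = matvec(hadamard(4), gated)
used_with_h = [j for j, m in enumerate(mixed) if m != 0.0]

out_plain = matvec(W, gated)   # W D y
out_healed = matvec(W, mixed)  # W H D y
```

Here `used_without_h` is `[0, 3]` while `used_with_h` is `[0, 1, 2, 3]`. Exact cancellation in H D y is possible for special values, so "dense" holds generically rather than always.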
You needn't fear the spectral bias of fast transforms when they are used inside a neural network. All the math sees is matrices of orthogonal vectors.
At the input and output of the neural network you may have to account for the spectral bias.
Also, H is symmetric and (when scaled by 1/√n) self-inverse, so to backpropagate through it you just apply it to the gradient.
You can do some attic rummaging here if you like:
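A minimal sketch of the fast transform itself, assuming the 1/√n scaling and a power-of-2 length. The function name `fwht` and the test vectors are made up for illustration. It checks that applying the transform twice returns the input, and that the backward pass (Hᵀg = Hg) satisfies the adjoint identity ⟨Hx, g⟩ = ⟨x, Hg⟩.

```python
import math

def fwht(v):
    """Fast Walsh-Hadamard transform in O(n log n), scaled by 1/sqrt(n).
    len(v) must be a power of 2."""
    out = list(v)
    n = len(out)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of 2")
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = out[j], out[j + h]
                out[j], out[j + h] = a + b, a - b   # butterfly step
        h *= 2
    scale = 1.0 / math.sqrt(n)
    return [scale * a for a in out]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

x = [1.0, -2.0, 0.5, 3.0, 0.0, 4.0, -1.0, 2.5]
roundtrip = fwht(fwht(x))          # H H x, should give back x

g = [0.3, -1.0, 2.0, 0.0, 1.5, -0.5, 0.25, 1.0]   # upstream gradient
grad_x = fwht(g)                   # backward pass: H^T g = H g
lhs = dot(fwht(x), g)              # <H x, g>
rhs = dot(x, grad_x)               # <x, H g>
```

Without the 1/√n factor, applying the transform twice gives n·x rather than x, so the unscaled version is self-inverse only up to a constant.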
[D] Hash table aspects of ReLU neural networks
If you collect the ReLU decisions into a diagonal matrix with 0 or 1 entries then a ReLU layer is DWx, where W is the weight matrix and x the input.
What then is Wₙ₊₁Dₙ where Wₙ₊₁ is the matrix of weights for the next layer?
It can be seen as a (locality sensitive) hash table lookup of a linear mapping (effective matrix). It can also be seen as an associative memory in itself with Dₙ as the key.
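A rough sketch of the lookup view, with tiny hand-picked weights. All names (`W1`, `W2`, `table`, `forward`) are hypothetical. The decision pattern is the key, and the column-masked matrix W₂D₁ is the value stored under it.

```python
def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * a for m, a in zip(row, v)) for row in M]

W1 = [[1.0, -1.0],
      [-1.0, 1.0],
      [1.0, 1.0]]          # 3 hidden units, 2 inputs
W2 = [[1.0, 2.0, 3.0]]     # next layer: 1 output, 3 inputs

table = {}                 # decision pattern -> effective matrix W2 D1

def forward(x):
    pre = matvec(W1, x)
    key = tuple(1 if a > 0 else 0 for a in pre)   # diagonal of D1
    if key not in table:
        # Column selection: zero the columns of W2 whose decision is 0.
        table[key] = [[w * d for w, d in zip(row, key)] for row in W2]
    return matvec(table[key], pre)

def direct(x):
    """Ordinary forward pass, W2 ReLU(W1 x), for comparison."""
    return matvec(W2, [max(0.0, a) for a in matvec(W1, x)])

x_a, x_b, x_c = [2.0, 1.0], [3.0, 1.0], [-1.0, 2.0]
outs = [forward(x) for x in (x_a, x_b, x_c)]
```

`x_a` and `x_b` are close enough to produce the same decision pattern, so they share one table entry, while `x_c` lands in a different bucket. That is the locality-sensitive part.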
There is a discussion here:
https://discourse.numenta.org/t/gated-linear-associative-memory/12300
The viewpoints are not fully integrated yet and there are notation problems.
Nevertheless the concepts are very simple and you could hope that people can follow along without difficulty, despite the arguments being in such a preliminary state.