u/oatmealcraving

▲ 0 r/deeplearning+1 crossposts

Healing ReLU neural networks with fast transforms

If you lift the ReLU decisions in a layer (converted to 0 or 1) into a diagonal decision matrix D, a ReLU layer then becomes:

DWx

A neural network then might be W₃D₂W₂D₁W₁x.
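As a minimal sketch of the idea (the sizes and random weights here are made-up examples): with D holding the 0/1 ReLU decisions on its diagonal, DWx reproduces ReLU(Wx) exactly.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 4))   # example weight matrix
x = rng.standard_normal(4)        # example input

z = W @ x
D = np.diag((z > 0).astype(float))  # ReLU decisions, 0 or 1, on the diagonal

# D W x is exactly the ReLU layer output
assert np.allclose(D @ W @ x, np.maximum(z, 0.0))
```

Stacking these gives the piecewise-linear form W₃D₂W₂D₁W₁x for a fixed input.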

Wₙ₊₁Dₙ fractures Wₙ₊₁ by column selection: each 0 on the diagonal of Dₙ wipes out the corresponding column of Wₙ₊₁.
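The column-selection effect can be seen directly (W2 and the decision pattern below are hypothetical examples):

```python
import numpy as np

rng = np.random.default_rng(1)
W2 = rng.standard_normal((4, 4))          # next layer's weights
D1 = np.diag([1.0, 0.0, 1.0, 0.0])        # example ReLU decisions from the previous layer

F = W2 @ D1  # "fractured" matrix: columns of W2 facing a 0 decision are deleted
assert np.allclose(F[:, 1], 0.0) and np.allclose(F[:, 3], 0.0)
assert np.allclose(F[:, 0], W2[:, 0]) and np.allclose(F[:, 2], W2[:, 2])
```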

You can apply some healing balm in the form of H, the matrix equivalent of the fast Walsh-Hadamard transform (WHT).

Then you have WHD, and due to the one-to-all connectivity of H, W is no longer fractured.

In a sense H absorbs the sparsity effects of D prior to the weight matrix.
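A minimal sketch of a WHD layer, assuming a power-of-two width (the fwht routine below is the standard O(n log n) iterative transform, normalized by 1/√n so it is orthonormal; W1, W2 and the sizes are made-up examples):

```python
import numpy as np

def fwht(x):
    """Fast Walsh-Hadamard transform of a length-2^k vector, normalized by 1/sqrt(n)."""
    x = x.astype(float).copy()
    n = x.shape[0]
    h = 1
    while h < n:
        for i in range(0, n, h * 2):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b
        h *= 2
    return x / np.sqrt(n)

rng = np.random.default_rng(2)
n = 8
W1 = rng.standard_normal((n, n))
W2 = rng.standard_normal((n, n))
x = rng.standard_normal(n)

dx = np.maximum(W1 @ x, 0.0)   # D1 W1 x: a sparse, fractured vector
hdx = fwht(dx)                 # H D1 W1 x: each input spreads to all outputs
y = W2 @ hdx                   # W2 H D1 W1 x: no column of W2 is cut off
```

Because every entry of Hv depends on every entry of v, the zeros produced by D no longer delete whole columns of the following weight matrix.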

You needn't fear the spectral bias behavior of fast transforms internally in a neural network. All the math sees is matrices of orthogonal vectors.

At the input and output of the neural network you may have to account for the spectral bias.

Also, H (in its 1/√n normalized form) is self-inverse, so to backpropagate through it you just apply it again.
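This is easy to check with the explicit matrix (built here by the Sylvester construction for a small 8×8 example):

```python
import numpy as np

# Sylvester construction of the normalized 8x8 Walsh-Hadamard matrix
H = np.array([[1.0]])
for _ in range(3):
    H = np.block([[H, H], [H, -H]])
H /= np.sqrt(H.shape[0])

x = np.arange(8, dtype=float)
assert np.allclose(H @ (H @ x), x)  # self-inverse: applying H twice recovers x
assert np.allclose(H, H.T)          # symmetric, so the backward pass is also just H
```

Since the backward pass of a linear map y = Hx multiplies the incoming gradient by Hᵀ, and Hᵀ = H = H⁻¹, the same transform serves for both directions.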

You can do some attic rummaging here if you like:

https://archive.org/details/@seanc4s

u/oatmealcraving — 10 hours ago
▲ 42 r/MachineLearning+2 crossposts

[D] Hash table aspects of ReLU neural networks

If you collect the ReLU decisions into a diagonal matrix with 0 or 1 entries then a ReLU layer is DWx, where W is the weight matrix and x the input.

What then is Wₙ₊₁Dₙ where Wₙ₊₁ is the matrix of weights for the next layer?

It can be seen as a (locality-sensitive) hash table lookup of a linear mapping (effective matrix). It can also be seen as an associative memory in itself, with Dₙ as the key.
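The lookup viewpoint can be sketched in a few lines (a hypothetical two-layer net with made-up sizes): the binary decision pattern acts as the key, and each key selects one effective linear map out of up to 2ⁿ possibilities.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4
W1 = rng.standard_normal((n, n))
W2 = rng.standard_normal((n, n))

def forward(x):
    z = W1 @ x
    d = (z > 0).astype(float)      # the "key": binary ReLU decisions
    return W2 @ (d * z), tuple(d)  # output plus the key that produced it

x = rng.standard_normal(n)
y, key = forward(x)

# The key selects an effective matrix; inputs sharing a key share a linear map
D = np.diag(key)
assert np.allclose(y, (W2 @ D @ W1) @ x)
```

Nearby inputs tend to trip the same decisions and so hash to the same effective matrix, which is where the locality-sensitive flavor comes from.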

There is a discussion here:

https://discourse.numenta.org/t/gated-linear-associative-memory/12300

The viewpoints are not fully integrated yet and there are notation problems.

Nevertheless, the concepts are very simple, and one can hope that people will follow along without difficulty, despite the arguments being in such a preliminary state.

u/oatmealcraving — 3 days ago