




Hi everyone!
I'm the freelancer who made a post a few days ago with questions about the temporal "barcode" on the weight matrix. I wanted to share some updates on my project and the progress I've made. I have now equipped the run with more informative plots for a better view of the network.
So, the recap:
I built an SNN with the following specs:
Architecture: 2x256 hidden layers. AdLIF neurons, feed-forward.
Encoding: Pure Latency Coding (Time-to-First-Spike). I moved away from Poisson to capture the temporal structure of the input.
Learning Rule: Purely local STDP for the internal layers. Weight clamp (0.001 to 1.0). Synaptic scaling: multiplicative normalization (L1-norm based). A "surprise-driven" learning-rate gating mechanism for the STDP.
Readout: A simple linear readout head with a leaky lowpass filter. This is the only part of the system that uses the optimizer for supervised classification.
Dataset: MNIST (0-9 digits, 28x28).
The learning rates were 2e-2 for the STDP and 2.0 for the linear readout.
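To make the recap concrete, here is a minimal NumPy sketch of the two core mechanisms: TTFS latency encoding and a single clamped, L1-normalized STDP step. The shapes (784 inputs, 256 hidden), the trace-based update, and the mapping onto the 25 ms window are illustrative simplifications, not the exact implementation.

```python
import numpy as np

def ttfs_encode(image, t_max=25.0):
    """Time-to-first-spike latency coding: brighter pixels fire earlier.
    t_max is the integration window in ms (25 ms in this run)."""
    x = image.astype(float) / 255.0
    # Full intensity -> spike at t=0; zero intensity -> pushed to t_max (effectively silent).
    return (1.0 - x) * t_max

def stdp_update(w, pre_trace, post_spike, lr=2e-2, w_min=1e-3, w_max=1.0):
    """One local STDP potentiation step with the weight clamp (0.001 to 1.0),
    followed by multiplicative L1-norm synaptic scaling.
    w has shape (n_pre, n_post); pre_trace and post_spike are 1-D vectors."""
    target_l1 = np.abs(w).sum(axis=0)             # each neuron's total input weight, preserved below
    w = w + lr * np.outer(pre_trace, post_spike)  # pre-before-post potentiation
    w = np.clip(w, w_min, w_max)                  # hard weight clamp
    scale = target_l1 / np.abs(w).sum(axis=0)     # multiplicative L1 normalization
    return w * scale
```

The clamp-then-normalize order means STDP can only redistribute a neuron's fixed input-weight budget, which is where the competition between synapses comes from.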
And now, my project progression update:
First, I replaced the STDP learning-rate gating mechanism with a more refined accuracy-driven one. Using the readout loss as the gating signal turned out to be unstable, so I switched to accuracy: if accuracy is dropping, the gate opens; if it is increasing, the gate closes. I set the activation threshold at 80% accuracy.
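The valve logic boils down to something like the sketch below. Only the open/close direction and the 80% activation threshold come from the run; the step size, clamping range, and the fully-open behavior below threshold are assumptions for illustration.

```python
def update_valve(valve, acc, prev_acc, threshold=0.80, step=0.02,
                 v_min=0.0, v_max=1.0):
    """Accuracy-driven learning-rate gate (illustrative sketch).
    Below the activation threshold the valve is assumed fully open;
    above it, a drop in accuracy opens the valve and a rise closes it."""
    if acc < threshold:
        return v_max            # gating not yet active: full plasticity
    if acc < prev_acc:
        valve += step           # accuracy dropping: allow more learning
    elif acc > prev_acc:
        valve -= step           # accuracy improving: freeze the good state
    return min(max(valve, v_min), v_max)

# The effective STDP rate is then lr_eff = base_lr * valve.
```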
Then, because I was curious, I did an ablation study on the readout's weight decay parameter. The optimal value landed in an interesting range: 1e-1. It seems these strange ranges are this SNN's personal quirk.
After that I thought: I want to see the run and the logs with my own eyes. What I saw made me think my logger or the system was broken. "That's not possible," I thought, "something must have gone wrong; I'm screwing something up." I checked, and there was no error or bug: this was the real performance. I had never seen a neural net reach such high accuracy literally at the beginning of training:
Steps: 500 | Loss: 1.160 | Acc: 96.40% | Valve: 0.42 | Sp L1: 0.070 | Sp L2: 0.068 | Sp Tot: 0.069 | W-Delta: 26.1120
Steps: 1000 | Loss: 1.122 | Acc: 96.20% | Valve: 0.22 | Sp L1: 0.056 | Sp L2: 0.048 | Sp Tot: 0.052 | W-Delta: 8.2499
Steps: 1500 | Loss: 0.914 | Acc: 96.40% | Valve: 0.20 | Sp L1: 0.059 | Sp L2: 0.043 | Sp Tot: 0.051 | W-Delta: 5.6242
Steps: 2000 | Loss: 0.524 | Acc: 96.60% | Valve: 0.20 | Sp L1: 0.053 | Sp L2: 0.035 | Sp Tot: 0.044 | W-Delta: 4.9330
Steps: 2500 | Loss: 0.603 | Acc: 96.00% | Valve: 0.23 | Sp L1: 0.050 | Sp L2: 0.028 | Sp Tot: 0.039 | W-Delta: 6.1300
Steps: 3000 | Loss: 0.476 | Acc: 96.20% | Valve: 0.22 | Sp L1: 0.048 | Sp L2: 0.024 | Sp Tot: 0.036 | W-Delta: 4.8336
Steps: 3500 | Loss: 0.624 | Acc: 96.80% | Valve: 0.18 | Sp L1: 0.052 | Sp L2: 0.031 | Sp Tot: 0.041 | W-Delta: 5.4503
Then I let it run for 30,000 steps. The last steps:
Steps: 27000 | Loss: 0.361 | Acc: 96.20% | Valve: 0.22 | Sp L1: 0.040 | Sp L2: 0.016 | Sp Tot: 0.028 | W-Delta: 0.8722
Steps: 27500 | Loss: 0.354 | Acc: 96.20% | Valve: 0.22 | Sp L1: 0.040 | Sp L2: 0.016 | Sp Tot: 0.028 | W-Delta: 0.8889
Steps: 28000 | Loss: 0.325 | Acc: 96.40% | Valve: 0.21 | Sp L1: 0.040 | Sp L2: 0.016 | Sp Tot: 0.028 | W-Delta: 0.7192
Steps: 28500 | Loss: 0.360 | Acc: 96.20% | Valve: 0.22 | Sp L1: 0.040 | Sp L2: 0.016 | Sp Tot: 0.028 | W-Delta: 0.8478
Steps: 29000 | Loss: 0.333 | Acc: 96.40% | Valve: 0.21 | Sp L1: 0.040 | Sp L2: 0.016 | Sp Tot: 0.028 | W-Delta: 0.7299
Steps: 29500 | Loss: 0.362 | Acc: 96.20% | Valve: 0.22 | Sp L1: 0.040 | Sp L2: 0.016 | Sp Tot: 0.028 | W-Delta: 0.6118
Steps: 30000 | Loss: 0.353 | Acc: 96.40% | Valve: 0.21 | Sp L1: 0.040 | Sp L2: 0.016 | Sp Tot: 0.028 | W-Delta: 0.8433
It turns out that when you push the system to its mathematical extremes, the "dead end" of neuromorphic learning (which is what a lot of people call it) might not be so dead after all.
Key takeaways from this run:
Instant Lock-in: The network reached 96.40% accuracy by step 500. It didn't just learn; it practically "recognized" the patterns immediately. Despite slight fluctuations, the network was incredibly confident: accuracy stayed above 96% throughout the entire run.
Structural Purge: By combining a brutal weight decay with fast learning rates, the network became extremely sparse. Layer-1 sparsity dropped to 4% and layer-2 sparsity to 1.6%, leaving only the most critical, high-precision synapses alive. Ruthless digital Darwinism.
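For reference, one way to quantify "synapses still alive" is the fraction of weights above the lower clamp. This is an illustrative metric, not necessarily the exact one behind the Sp L1 / Sp L2 columns in the logs; the 1e-3 cutoff matches the weight clamp's floor.

```python
import numpy as np

def active_fraction(w, eps=1e-3):
    """Fraction of synapses whose magnitude is still above the
    lower weight clamp (eps); the rest have been purged to the floor."""
    return float((np.abs(w) > eps).mean())
```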
Temporal Stability: Despite the discrete 1.0ms timesteps, the 25ms integration window and Latency Coding created a causal chain so robust that the confusion matrix is almost a perfect diagonal.
New adaptive gating: I refined the "Learning Valve" to be accuracy-driven. As it turned out, this was a good decision. As soon as the network hit the 96% mark, the valve throttled down learning to 0.20, effectively freezing the successful internal state.
It is fascinating to see how biological principles (STDP, latency coding, etc.) combined with "cruel" mathematical constraints can create such efficiency, and how they can outperform complex surrogate gradient methods on this task, at least in speed, as we can see.
Now imagine the learning speed and efficiency if this ran on real neuromorphic hardware rather than von Neumann hardware; even so, the entire 30,000 steps took no more than ~30 seconds on a laptop processor.
However, I'm having a bit of trouble interpreting the receptive fields. Since the dark spot sits in the middle, the neurons seem to respond not to the digit itself but to its outline and surroundings. Did the neurons learn to recognize the silhouette? Am I understanding this correctly?
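For anyone who wants to look at the same thing, this is roughly how a receptive field is pulled out of the input weight matrix. The (784, 256) shape is an assumption from the 28x28 input and the 256-unit first layer; with TTFS coding, strong weights on background pixels would mean the neuron listens to the late-firing surround rather than the early-firing stroke, which would indeed look like a silhouette detector.

```python
import numpy as np

def receptive_field(w_in, neuron_idx):
    """Reshape one hidden neuron's input weights back into image space
    for inspection. Assumes w_in has shape (784, 256)."""
    return w_in[:, neuron_idx].reshape(28, 28)
```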
I remain open to any exchange of ideas, criticism, or explanations I can learn from.