




Hi everyone!
I'm the freelancer who made a post a few days ago with questions about the temporal "barcode" on the weight matrix. I wanted to share some updates on my project and the progress I've made. I have now equipped the run with more informative plots for a better view of the network.
So, the recap:
I built an SNN with the following specs:
Architecture: 2x256 hidden layers. AdLIF neurons, feed-forward.
Encoding: Pure Latency Coding (Time-to-First-Spike). I moved away from Poisson to capture the temporal structure of the input.
Learning Rule: Purely local STDP for the internal layers. Weight clamp (0.001 to 1.0). Synaptic scaling: multiplicative normalization (L1-norm based). A "surprise-driven" learning-rate gating mechanism for the STDP.
Readout: A simple linear readout head with a leaky lowpass filter. This is the only part of the system that uses the optimizer for supervised classification.
Dataset: MNIST (0-9 digits, 28x28).
The learning rates were 2e-2 for the STDP and 2.0 for the linear readout.
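To make the recap concrete, here is a minimal NumPy sketch of the two core mechanisms: TTFS latency encoding and a single clamped, L1-normalized STDP step. The shapes (784 inputs, 256 hidden), the trace-based update, and the mapping onto the 25 ms window are illustrative simplifications, not the exact implementation.

```python
import numpy as np

def ttfs_encode(image, t_max=25.0):
    """Time-to-first-spike latency coding: brighter pixels fire earlier.
    t_max is the integration window in ms (25 ms in this run)."""
    x = image.astype(float) / 255.0
    # Full intensity -> spike at t=0; zero intensity -> pushed to t_max (effectively silent).
    return (1.0 - x) * t_max

def stdp_update(w, pre_trace, post_spike, lr=2e-2, w_min=1e-3, w_max=1.0):
    """One local STDP potentiation step with the weight clamp (0.001 to 1.0),
    followed by multiplicative L1-norm synaptic scaling.
    w has shape (n_pre, n_post); pre_trace and post_spike are 1-D vectors."""
    target_l1 = np.abs(w).sum(axis=0)             # each neuron's total input weight, preserved below
    w = w + lr * np.outer(pre_trace, post_spike)  # pre-before-post potentiation
    w = np.clip(w, w_min, w_max)                  # hard weight clamp
    scale = target_l1 / np.abs(w).sum(axis=0)     # multiplicative L1 normalization
    return w * scale
```

The clamp-then-normalize order means STDP can only redistribute a neuron's fixed input-weight budget, which is where the competition between synapses comes from.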
And now, my project progression update:
First, I replaced the STDP learning-rate gating mechanism with a more refined accuracy-driven one. Using the readout loss as the gating signal turned out to be unstable, so I switched to accuracy: if accuracy is dropping, the gate opens; if it is increasing, the gate closes. I set the activation threshold at 80% accuracy.
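The valve logic boils down to something like the sketch below. Only the open/close direction and the 80% activation threshold come from the run; the step size, clamping range, and the fully-open behavior below threshold are assumptions for illustration.

```python
def update_valve(valve, acc, prev_acc, threshold=0.80, step=0.02,
                 v_min=0.0, v_max=1.0):
    """Accuracy-driven learning-rate gate (illustrative sketch).
    Below the activation threshold the valve is assumed fully open;
    above it, a drop in accuracy opens the valve and a rise closes it."""
    if acc < threshold:
        return v_max            # gating not yet active: full plasticity
    if acc < prev_acc:
        valve += step           # accuracy dropping: allow more learning
    elif acc > prev_acc:
        valve -= step           # accuracy improving: freeze the good state
    return min(max(valve, v_min), v_max)

# The effective STDP rate is then lr_eff = base_lr * valve.
```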
Then, because I was curious, I did an ablation study on the readout's weight decay parameter. The optimal value landed in an interesting range: 1e-1. It seems these strange ranges are this SNN's personal quirk.
After that I thought: I want to see the run and the logs with my own eyes. What I saw made me think my logger or the system was broken. "That's not possible," I thought, "something must have gone wrong; I'm screwing something up." I checked, and there was no error or bug: this was the real performance. I had never seen a neural net reach such high accuracy literally at the beginning of training:
Steps: 500 | Loss: 1.160 | Acc: 96.40% | Valve: 0.42 | Sp L1: 0.070 | Sp L2: 0.068 | Sp Tot: 0.069 | W-Delta: 26.1120
Steps: 1000 | Loss: 1.122 | Acc: 96.20% | Valve: 0.22 | Sp L1: 0.056 | Sp L2: 0.048 | Sp Tot: 0.052 | W-Delta: 8.2499
Steps: 1500 | Loss: 0.914 | Acc: 96.40% | Valve: 0.20 | Sp L1: 0.059 | Sp L2: 0.043 | Sp Tot: 0.051 | W-Delta: 5.6242
Steps: 2000 | Loss: 0.524 | Acc: 96.60% | Valve: 0.20 | Sp L1: 0.053 | Sp L2: 0.035 | Sp Tot: 0.044 | W-Delta: 4.9330
Steps: 2500 | Loss: 0.603 | Acc: 96.00% | Valve: 0.23 | Sp L1: 0.050 | Sp L2: 0.028 | Sp Tot: 0.039 | W-Delta: 6.1300
Steps: 3000 | Loss: 0.476 | Acc: 96.20% | Valve: 0.22 | Sp L1: 0.048 | Sp L2: 0.024 | Sp Tot: 0.036 | W-Delta: 4.8336
Steps: 3500 | Loss: 0.624 | Acc: 96.80% | Valve: 0.18 | Sp L1: 0.052 | Sp L2: 0.031 | Sp Tot: 0.041 | W-Delta: 5.4503
Then I let it run for 30,000 steps. The last steps:
Steps: 27000 | Loss: 0.361 | Acc: 96.20% | Valve: 0.22 | Sp L1: 0.040 | Sp L2: 0.016 | Sp Tot: 0.028 | W-Delta: 0.8722
Steps: 27500 | Loss: 0.354 | Acc: 96.20% | Valve: 0.22 | Sp L1: 0.040 | Sp L2: 0.016 | Sp Tot: 0.028 | W-Delta: 0.8889
Steps: 28000 | Loss: 0.325 | Acc: 96.40% | Valve: 0.21 | Sp L1: 0.040 | Sp L2: 0.016 | Sp Tot: 0.028 | W-Delta: 0.7192
Steps: 28500 | Loss: 0.360 | Acc: 96.20% | Valve: 0.22 | Sp L1: 0.040 | Sp L2: 0.016 | Sp Tot: 0.028 | W-Delta: 0.8478
Steps: 29000 | Loss: 0.333 | Acc: 96.40% | Valve: 0.21 | Sp L1: 0.040 | Sp L2: 0.016 | Sp Tot: 0.028 | W-Delta: 0.7299
Steps: 29500 | Loss: 0.362 | Acc: 96.20% | Valve: 0.22 | Sp L1: 0.040 | Sp L2: 0.016 | Sp Tot: 0.028 | W-Delta: 0.6118
Steps: 30000 | Loss: 0.353 | Acc: 96.40% | Valve: 0.21 | Sp L1: 0.040 | Sp L2: 0.016 | Sp Tot: 0.028 | W-Delta: 0.8433
It turns out that when you push the system to its mathematical extremes, the "dead end" of neuromorphic learning (which is what a lot of people call it) might not be so dead after all.
Key takeaways from this run:
Instant Lock-in: The network reached 96.40% accuracy by step 500. It didn't just learn; it practically "recognized" the patterns immediately. Despite slight fluctuations, the network was incredibly confident: accuracy stayed above 96% throughout the entire run.
Structural Purge: By combining a brutal weight decay with fast learning rates, the network became extremely sparse. Layer-1 sparsity dropped to 4% and layer-2 sparsity to 1.6%, leaving only the most critical, high-precision synapses alive. Ruthless digital Darwinism.
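For reference, one way to quantify "synapses still alive" is the fraction of weights above the lower clamp. This is an illustrative metric, not necessarily the exact one behind the Sp L1 / Sp L2 columns in the logs; the 1e-3 cutoff matches the weight clamp's floor.

```python
import numpy as np

def active_fraction(w, eps=1e-3):
    """Fraction of synapses whose magnitude is still above the
    lower weight clamp (eps); the rest have been purged to the floor."""
    return float((np.abs(w) > eps).mean())
```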
Temporal Stability: Despite the discrete 1.0ms timesteps, the 25ms integration window and Latency Coding created a causal chain so robust that the confusion matrix is almost a perfect diagonal.
New adaptive gating: I refined the "Learning Valve" to be accuracy-driven. As it turned out, this was a good decision. As soon as the network hit the 96% mark, the valve throttled down learning to 0.20, effectively freezing the successful internal state.
It is fascinating to see how biological principles (STDP, latency coding, etc.) combined with "cruel" mathematical constraints can create such efficiency, and how they can outperform complex surrogate gradient methods on this task, at least in speed, as we can see.
Now imagine the learning speed and efficiency if this ran on real neuromorphic hardware rather than von Neumann hardware; even so, the entire 30,000 steps took no more than ~30 seconds on a laptop processor.
However, I'm having a bit of trouble interpreting the receptive fields. Since the dark spot sits in the middle, the neurons seem to respond not to the digit itself but to its outline and surroundings. Did the neurons learn to recognize the silhouette? Am I understanding this correctly?
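For anyone who wants to look at the same thing, this is roughly how a receptive field is pulled out of the input weight matrix. The (784, 256) shape is an assumption from the 28x28 input and the 256-unit first layer; with TTFS coding, strong weights on background pixels would mean the neuron listens to the late-firing surround rather than the early-firing stroke, which would indeed look like a silhouette detector.

```python
import numpy as np

def receptive_field(w_in, neuron_idx):
    """Reshape one hidden neuron's input weights back into image space
    for inspection. Assumes w_in has shape (784, 256)."""
    return w_in[:, neuron_idx].reshape(28, 28)
```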
I remain open to any exchange of ideas, criticism, or explanations I can learn from.