

I was exploring Convolutional Neural Networks (CNNs) in more depth and had the idea of building a dependency-free, header-only CNN library for C++20.
While researching, I came across tiny-dnn, a CNN library for C++14. It is very fast, but its developers stopped updating it back in 2016, so I took on the challenge of writing my own CNN library from scratch for C++20, with heavy performance tuning for the CPU. The results came close to what I was aiming for.
I benchmarked it against PyTorch, and the results were good enough to share. I have documented the library, along with the benchmark results, here. In some cases it even outperformed PyTorch, which surprised me too.
Documentation: "https://lnkd.in/gNFF74JJ"
To give a rough idea of how fast the engine is: it reaches 97.51% accuracy on the MNIST dataset after just 25 seconds of training, with a throughput of 2k+ images/second.
Processor: AMD Ryzen 7 5800H (mobile)
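A quick sanity check on those numbers: at exactly 2,000 images/second, 25 seconds of training is 50,000 images. That is about 0.83 of one pass over MNIST's 60,000-image training set, or more if throughput is higher than 2k. The sketch below shows one way such a throughput figure could be measured. `images_per_second`, `epochs_covered` and `measure_throughput` are hypothetical helpers written for illustration. They are not part of CustomCNN's API, and the training step is assumed to consume a fixed batch size per call.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>

// Hypothetical helpers for illustration only; they are not part of CustomCNN.

// Throughput = images processed / elapsed wall-clock seconds (0 if no time elapsed).
inline double images_per_second(std::size_t images, double seconds) {
    return seconds > 0.0 ? static_cast<double>(images) / seconds : 0.0;
}

// How many passes over a dataset of `dataset_size` images `images` represents.
inline double epochs_covered(std::size_t images, std::size_t dataset_size) {
    assert(dataset_size > 0);
    return static_cast<double>(images) / static_cast<double>(dataset_size);
}

// Times `steps` calls of a training step that is ASSUMED to consume exactly
// `batch_size` images per call, using a monotonic clock.
inline double measure_throughput(const std::function<void()>& step,
                                 std::size_t batch_size, std::size_t steps) {
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    for (std::size_t i = 0; i < steps; ++i) step();
    const std::chrono::duration<double> elapsed = clock::now() - start;
    return images_per_second(batch_size * steps, elapsed.count());
}
```

For example, `images_per_second(50000, 25.0)` gives 2000, and `epochs_covered(50000, 60000)` gives roughly 0.83.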
An overview of the engine:
DAG (directed acyclic graph) layout
Zero allocation
Multithreading support
L1/L2 cache optimization
...and a lot more going on internally. Here is the repository link:
"https://github.com/KunwarPrabhat/CustomCNN"
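Two of the listed points, multithreading and L1/L2 cache optimization, are commonly combined in the matrix-multiply kernels that convolution layers are often lowered to (for example via im2col). Whether CustomCNN works this way is an assumption. The sketch below is a generic illustration of cache tiling plus row-parallel threading, not the library's actual code, and `blocked_gemm` and its `tile` parameter are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Illustrative sketch, NOT CustomCNN's kernel.
// Computes C (M x N) += A (M x K) * B (K x N); all matrices are row-major.
// Loops are tiled so small blocks of A, B and C are reused while still in cache,
// and rows of C are split across threads so no two threads write the same element.
inline void blocked_gemm(const float* A, const float* B, float* C,
                         std::size_t M, std::size_t N, std::size_t K,
                         std::size_t num_threads, std::size_t tile = 64) {
    assert(tile > 0);
    num_threads = std::max<std::size_t>(1, std::min(num_threads, M));

    // Each worker owns the half-open row range [row_begin, row_end) of C.
    auto worker = [&](std::size_t row_begin, std::size_t row_end) {
        for (std::size_t i0 = row_begin; i0 < row_end; i0 += tile) {
            const std::size_t i1 = std::min(i0 + tile, row_end);
            for (std::size_t k0 = 0; k0 < K; k0 += tile) {
                const std::size_t k1 = std::min(k0 + tile, K);
                for (std::size_t j0 = 0; j0 < N; j0 += tile) {
                    const std::size_t j1 = std::min(j0 + tile, N);
                    // i-k-j order streams contiguously through rows of B and C.
                    for (std::size_t i = i0; i < i1; ++i)
                        for (std::size_t k = k0; k < k1; ++k) {
                            const float a = A[i * K + k];
                            for (std::size_t j = j0; j < j1; ++j)
                                C[i * N + j] += a * B[k * N + j];
                        }
                }
            }
        }
    };

    std::vector<std::thread> threads;
    const std::size_t rows_per_thread = (M + num_threads - 1) / num_threads;
    for (std::size_t t = 0; t < num_threads; ++t) {
        const std::size_t begin = t * rows_per_thread;
        const std::size_t end = std::min(begin + rows_per_thread, M);
        if (begin >= end) break;
        threads.emplace_back(worker, begin, end);
    }
    for (auto& th : threads) th.join();
}
```

The default tile of 64 is an arbitrary starting point that would need tuning per CPU. Spawning threads and a `std::vector` on every call also allocates. A zero-allocation engine would more likely keep a persistent thread pool and preallocated buffers.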
The engine is still at an early stage, so there is a lot that can be improved, and I'd love for more developers to contribute if they're interested :)) Here are two side-by-side benchmarks.