u/BloodyBBenzene

I was exploring convolutional neural networks (CNNs) in more depth and had the idea of writing a dependency-free, header-only CNN library for C++20.

I did some research and found tiny-dnn, a CNN library for C++14. It is super fast, but the developers stopped updating it back in 2016, so I took on the challenge of writing my own CNN library from scratch for C++20, with extreme performance tuning for CPU. I got close to what I was expecting.

I benchmarked it against PyTorch, and the results were good enough to post. I have documented the library, along with the benchmark results, at the documentation link below. In some cases it outperformed PyTorch, which surprised me too.

Documentation- "https://Inkd.in/gNFF74JJ"

To give a rough idea of how fast the engine is: it reaches 97.51% accuracy on the MNIST dataset in just 25 seconds of training, with a throughput of 2k+ images/second.

Processor: Ryzen 7 5800H (mobile)

Overview:

My engine uses a DAG layout

It has zero allocation

Multithreading support

L1/L2 cache optimization

There is also a lot more going on internally. Here is the repository link:

"https://github.com/KunwarPrabhat/CustomCNN"
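
For readers unfamiliar with the term, a "DAG layout" generally means the layers form a directed acyclic graph rather than a plain chain, so one layer's output can feed several later layers (a skip connection, for example). The post does not say how the engine represents its graph. The following is only a minimal sketch of one common approach, assuming nodes are stored in topological order and using scalar values in place of tensors. `Node` and `run_graph` are hypothetical names, not part of the library.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical illustration only; these names are not from the CustomCNN repository.
struct Node {
    std::vector<std::size_t> inputs;                     // indices of earlier nodes this node reads
    std::function<float(const std::vector<float>&)> op;  // combines the input values into one output
};

// Assumes nodes are stored in topological order: every input index is
// smaller than the index of the node that consumes it.
std::vector<float> run_graph(const std::vector<Node>& nodes) {
    std::vector<float> values(nodes.size());
    std::vector<float> args;
    for (std::size_t i = 0; i < nodes.size(); ++i) {
        args.clear();
        for (std::size_t in : nodes[i].inputs) {
            assert(in < i && "graph is not in topological order");
            args.push_back(values[in]);
        }
        values[i] = nodes[i].op(args);  // each node is evaluated exactly once
    }
    return values;
}
```

With this layout a forward pass is a single loop over the node array, and a node with two entries in `inputs` models a branch merging back into the main path.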

My engine is still at an early stage, so there are a lot of things that can be fixed. I need more developers to contribute, if they're interested :)) Two side-by-side benchmark images are attached to this post.

u/BloodyBBenzene — 14 days ago
▲ 40 r/cpp

Edit: By "zero allocation" I meant that my engine pre-allocates its memory, so it doesn't have to ask the OS for more during inference. I am not saying it uses no memory at all; I am saying it doesn't allocate at runtime. It's basically pre-allocation.
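
The pre-allocation idea described in the edit can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not code from the repository: `Workspace`, `take`, `reset`, and `scale_layer` are hypothetical names, and a real engine would also need to handle alignment and per-thread buffers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical illustration only; these names are not from the CustomCNN repository.
// All scratch memory is reserved once, then handed out as slices.
class Workspace {
public:
    // The only heap allocation happens here, before inference starts.
    explicit Workspace(std::size_t total_floats) : storage_(total_floats) {}

    // Hand out the next `count` floats of the pre-allocated block.
    // No allocator call is made; only an offset is advanced.
    float* take(std::size_t count) {
        assert(used_ + count <= storage_.size() && "workspace too small");
        float* slice = storage_.data() + used_;
        used_ += count;
        return slice;
    }

    // Make the whole block reusable for the next forward pass.
    void reset() { used_ = 0; }

    std::size_t used() const { return used_; }
    std::size_t capacity() const { return storage_.size(); }

private:
    std::vector<float> storage_;
    std::size_t used_ = 0;
};

// Toy "layer": writes input * factor into a buffer borrowed from the workspace.
float* scale_layer(const float* input, std::size_t n, float factor, Workspace& ws) {
    float* out = ws.take(n);
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = input[i] * factor;
    }
    return out;
}
```

Calling `reset()` before each forward pass means every pass reuses the same memory, so the steady-state loop never goes back to the allocator. The trade-off is that the worst-case buffer size has to be known up front.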
