u/jowe81

I tried building a simple self-hosted fitness tracker… and it kind of spiraled into this.

It actually started pretty dumb:
I was doing pushups in my basement and thought “couldn’t a camera just count reps and maybe draw a skeleton on top?”

I had played around with face recognition before, and since training isn’t really optional for me (Parkinson’s), I figured… why not try.

The first PoC was:

Ubuntu 20.04
an old NVIDIA Tesla P4
a single Reolink IP cam

It worked… badly. But enough to get hooked.

Then things escalated:

added more cameras (ended up with 3)
tried doing proper multi-view + 3D reconstruction
spent ~2 weeks in calibration hell (ChArUco boards, triangulation, you name it)

At one point I thought I was clever and rotated the cameras 90° to get better vertical resolution.

That decision alone probably cost me several years of life:
cw/ccw confusion, projection errors, reprojection errors… everything was wrong in ways that almost looked right.
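
For anyone who hasn't been bitten by this yet, here is a minimal sketch of the bookkeeping involved (not the project's actual code). It assumes pose detection runs on a frame that was rotated upright in software while calibration lives in the unrotated sensor frame, and that pixel coordinates are zero-based indices. The function names are made up for illustration. Mixing up the cw and ccw variants gives keypoints that are mirrored rather than obviously broken, which is exactly the "almost looked right" failure mode.

```python
def rotate_point_cw(x, y, width, height):
    """Map pixel (x, y) of a width x height image to where it lands
    after the image is rotated 90 degrees clockwise."""
    return height - 1 - y, x


def unrotate_point_cw(xr, yr, width, height):
    """Inverse of rotate_point_cw; width/height refer to the ORIGINAL image."""
    return yr, height - 1 - xr


def rotate_point_ccw(x, y, width, height):
    """Map pixel (x, y) of a width x height image to where it lands
    after the image is rotated 90 degrees counter-clockwise."""
    return y, width - 1 - x


def unrotate_point_ccw(xr, yr, width, height):
    """Inverse of rotate_point_ccw; width/height refer to the ORIGINAL image."""
    return width - 1 - yr, xr


def roundtrip_ok(points, width, height):
    """Cheap sanity check: rotating and un-rotating must return each point."""
    for x, y in points:
        if unrotate_point_cw(*rotate_point_cw(x, y, width, height), width, height) != (x, y):
            return False
        if unrotate_point_ccw(*rotate_point_ccw(x, y, width, height), width, height) != (x, y):
            return False
    return True
```

A round-trip check like `roundtrip_ok` on the four image corners is a quick way to catch a swapped direction before it reaches triangulation.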

Even when pose detection worked perfectly per stream, 3D fusion would just refuse to cooperate.

Also learned the hard way:

cheap IP cams + no real timestamps = synchronization nightmare
Tesla P4 + 3D = technically possible, practically suffering
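
To make the sync problem concrete, here is a rough sketch of one common workaround: stamp each frame with the host's arrival time and pair frames across streams by nearest timestamp, dropping pairs whose skew is too large. This is an assumption about how one could do it, not a description of this project's pipeline. The names `nearest_index`, `match_frames` and `max_skew` are hypothetical, and arrival-time stamps still carry network and decoder jitter, so this only bounds the error rather than removing it.

```python
from bisect import bisect_left


def nearest_index(timestamps, t):
    """Return the index of the entry in sorted `timestamps` closest to t."""
    i = bisect_left(timestamps, t)
    if i == 0:
        return 0
    if i == len(timestamps):
        return len(timestamps) - 1
    # Pick whichever neighbour is closer to t.
    return i if timestamps[i] - t < t - timestamps[i - 1] else i - 1


def match_frames(anchor_ts, aux_ts, max_skew):
    """For each anchor timestamp, return the index of the nearest auxiliary
    frame, or None if the closest one is more than max_skew seconds away."""
    matches = []
    for t in anchor_ts:
        if not aux_ts:
            matches.append(None)
            continue
        j = nearest_index(aux_ts, t)
        matches.append(j if abs(aux_ts[j] - t) <= max_skew else None)
    return matches
```

Frames that come back as `None` can simply be skipped for fusion, so a late auxiliary frame degrades to single-view for that instant instead of poisoning the 3D result.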

There was a brief detour with an Insta360 over USB (v4l2)… which was about as stable as you’d expect.

Current setup (less cursed, still questionable life choices):

AMD server + NVIDIA A2
1× Basler 4K industrial cam (side view)
2× IP cams (front)
RTMPose (133 keypoints) + MotionAGFormer (2D→3D)
hybrid multi-view approach with an “anchor stream” + auxiliary views

Now it can (more or less):

track full body (including hands/face)
count reps (state-machine based)
evaluate form (depth, symmetry, tempo, alignment, etc.)
render a live 3D model on the TV
identify the user via face recognition
log everything down to individual reps in SQLite
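
As an illustration of the rep counting and per-rep logging, here is a minimal sketch, assuming a single joint angle (say, the elbow angle in degrees for pushups) drives a two-state machine with hysteresis. The thresholds, the class name `RepCounter` and the table layout are all made up for the example; the real system is more involved than this.

```python
import sqlite3


class RepCounter:
    """Two-state (up/down) rep counter with hysteresis on a joint angle."""

    def __init__(self, down_below=90.0, up_above=160.0):
        self.down_below = down_below  # enter "down" below this angle
        self.up_above = up_above      # return to "up" above this angle
        self.state = "up"
        self.reps = 0
        self.min_angle = None         # deepest angle seen in the current rep

    def update(self, angle):
        """Feed one sample; return the rep's minimum angle when a rep
        completes, otherwise None."""
        if self.state == "up":
            if angle < self.down_below:
                self.state = "down"
                self.min_angle = angle
        else:
            self.min_angle = min(self.min_angle, angle)
            if angle > self.up_above:
                self.state = "up"
                self.reps += 1
                return self.min_angle
        return None


def log_reps(conn, exercise, angles):
    """Count reps in a sequence of angles and store one row per rep."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS reps "
        "(exercise TEXT, rep_no INTEGER, min_angle REAL)"
    )
    counter = RepCounter()
    for angle in angles:
        depth = counter.update(angle)
        if depth is not None:
            conn.execute(
                "INSERT INTO reps VALUES (?, ?, ?)",
                (exercise, counter.reps, depth),
            )
    conn.commit()
    return counter.reps
```

The gap between the two thresholds is the important part: jitter around a single cutoff would otherwise count the same rep several times. Storing the minimum angle per rep gives a crude depth metric that a form score could build on.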

There’s also a (very early) voice coach and a YAML-based exercise system.

Where I want to take this:

better 3D visualization (SMPL-X instead of current prototype)
more robust scoring (right now it’s still pretty basic)
eventually a “real” coach that adapts workouts based on training history

Also worth mentioning:
Without tools like Codex / Claude I probably wouldn’t have been able to build this at all. This project is way beyond what I could realistically code solo from scratch.

What I’m curious about:

multi-view CV setups: how do you handle sync/calibration reliably outside a lab?
better approaches for exercise phase detection than simple state machines?
stabilizing 2D→3D lifting in noisy environments
or just general “you’ve gone too far” feedback

Would love to hear thoughts or similar projects.

Multi-camera real-time fitness tracking with RTMPose + 2D→3D lifting (self-hosted project)