u/L42ARO — reddlx

Odyseus - Spatial VLM : Projecting 2D reasoning into 3D outputs (open source repo)

I've always argued that Physical AI for robotics needs actionable outputs like 3D coordinates, not bullet points or nice paragraphs.
So I decided to experiment by combining a VLM with monocular depth estimation, essentially projecting 2D reasoning into 3D. I called it Odyseus - Spatial VLM.

Tech Stack:
- VLM: Qwen 3.6
- Depth Estimation: Depth Anything 3 - Metric Large

Worked pretty well, so I figured I'd share. Check out the repo: https://github.com/MercuriusTech/Odyseus-Spatial-VLM
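For anyone wondering what "projecting 2D reasoning into 3D" can look like in practice, here is a minimal sketch of the standard pinhole back-projection step. It is not taken from the repo. It assumes the VLM returns a pixel coordinate (u, v), that the depth model gives a metric depth in meters along the optical axis at that pixel, and that the camera intrinsics are known. The `Intrinsics` class, the `backproject` function, and all the numbers are hypothetical and only there for illustration.

```python
from dataclasses import dataclass


@dataclass
class Intrinsics:
    """Pinhole camera intrinsics in pixels (hypothetical example values below)."""
    fx: float
    fy: float
    cx: float
    cy: float


def backproject(u: float, v: float, depth_m: float, K: Intrinsics) -> tuple:
    """Lift pixel (u, v) with metric depth into a camera-frame (X, Y, Z) point.

    Assumes depth_m is measured along the optical axis (z-depth),
    not along the viewing ray.
    """
    x = (u - K.cx) / K.fx * depth_m
    y = (v - K.cy) / K.fy * depth_m
    return (x, y, depth_m)


# Made-up intrinsics for a 640x480 image.
K = Intrinsics(fx=600.0, fy=600.0, cx=320.0, cy=240.0)

# Pretend the VLM pointed at pixel (440, 240) and the depth map says 2.0 m there.
point = backproject(440.0, 240.0, 2.0, K)
print(point)
```

The result is in the camera frame (X right, Y down, Z forward under the usual convention), so a robot would still need a camera-to-base transform before acting on it.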

u/L42ARO — 3 days ago

▲ 254 r/robotics+1 crossposts

Spatial VLM : Projecting 2D reasoning into 3D output (open source demo)

Tech Stack:
- VLM: Qwen 3.6
- Depth Estimation: Depth Anything 3 - Metric Large

Worked pretty well, so I figured I'd share. Check out the repo: https://github.com/MercuriusTech/Odyseus-Spatial-VLM

u/L42ARO — 3 days ago

▲ 106 r/robotics+1 crossposts

Hi, I've been having a hard time getting a low-distortion camera (around 60deg-90deg FOV), so I was forced to use a wide 160deg fisheye camera. I need it for a vSLAM platform I'm building, but the raw video was too distorted to be usable, so I vibecoded a toolkit to estimate my camera's intrinsic parameters and undistort the footage. It took me some time. At first the distortion was still there, so I wrote a program that helps me sample ~60 frames, with a mini guide on which positions to record for best results. And yeah, it worked: I was able to undistort the video from my 160deg camera, so I figured I'd share.

I know this ain't nothing new or groundbreaking. There are probably tools out there that already do this, and I was just too lazy to look them up and set them up. But hey, if this turns out helpful for someone besides just me, I'm happy with that.

REPO LINK: https://github.com/L42ARO/Fisheye-Calibration
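For context on what the intrinsic parameters are used for once calibration is done, here is a small sketch of the equidistant fisheye distortion model documented for OpenCV's fisheye module, plus its inverse solved with Newton's method. This is not code from the repo and I'm not claiming the toolkit works this way. The coefficients `k` are made up, and `distort` / `undistort` are hypothetical helper names. Points are in normalized image coordinates, i.e. after subtracting the principal point and dividing by the focal length.

```python
import math


def distort(x: float, y: float, k: tuple) -> tuple:
    """Map undistorted normalized coords to fisheye-distorted normalized coords.

    Model: theta = atan(r), theta_d = theta * (1 + k1*theta^2 + ... + k4*theta^8).
    """
    r = math.hypot(x, y)
    if r < 1e-12:
        return (x, y)
    t2 = math.atan(r) ** 2
    theta_d = math.atan(r) * (1 + k[0] * t2 + k[1] * t2**2 + k[2] * t2**3 + k[3] * t2**4)
    scale = theta_d / r
    return (x * scale, y * scale)


def undistort(xd: float, yd: float, k: tuple, iters: int = 20) -> tuple:
    """Invert distort(): recover theta from theta_d by Newton iteration."""
    theta_d = math.hypot(xd, yd)
    if theta_d < 1e-12:
        return (xd, yd)
    theta = theta_d  # initial guess
    for _ in range(iters):
        t2 = theta * theta
        f = theta * (1 + k[0] * t2 + k[1] * t2**2 + k[2] * t2**3 + k[3] * t2**4) - theta_d
        df = 1 + 3 * k[0] * t2 + 5 * k[1] * t2**2 + 7 * k[2] * t2**3 + 9 * k[3] * t2**4
        theta -= f / df
    scale = math.tan(theta) / theta_d
    return (xd * scale, yd * scale)


# Made-up coefficients, NOT from a real calibration.
k = (-0.02, 0.003, 0.0, 0.0)

xd, yd = distort(1.0, 0.5, k)
xu, yu = undistort(xd, yd, k)
print((xd, yd), "->", (xu, yu))
```

Undistorting a full frame applies the same idea per pixel: for each output pixel, compute where it lands in the distorted image and sample there. The inverse only makes sense for rays less than 90deg off the optical axis, which a 160deg FOV lens (80deg half-angle) stays within.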

u/L42ARO — 8 days ago