r/computervision

Images 1–9 — Decade-long project to turn quantum physics & computing math into computer graphics

Decade-long project to turn quantum physics & computing math into computer graphics

Hi

If you are remotely interested in programming on new computational models, oh boy, this is for you. I am the dev behind Quantum Odyssey (AMA! I love taking questions). I've worked on it for about 6 years; the goal was to make a super immersive space where anyone can learn quantum computing through Zachlike (open-ended) logic puzzles, compete on leaderboards, and explore lots of community-made content on finding the most optimal quantum algorithms. The game has a unique set of visuals capable of representing any sort of quantum dynamics for any number of qubits, and that is pretty much what now makes it possible for anybody 12 and up to actually learn quantum logic without having to worry about the mathematics behind it.

This game is very different from what you'd normally expect of a programming/logic puzzle game, so try it with an open mind.

Stuff you'll play & learn a ton about

  • Boolean Logic – bits, operators (NAND, OR, XOR, AND…), and classical arithmetic (adders). Learn how these combine to build anything classical, and how to port them to a quantum computer.
  • Quantum Logic – qubits, the math behind them (linear algebra, SU(2), complex numbers), a universal gate set (beyond the Clifford set), and building tensors to evolve systems. Freely combine or create your own gates to build anything you can imagine, using polar or complex numbers.
  • Quantum Phenomena – storing and retrieving information in the X, Y, Z bases; superposition (pure and mixed states), interference, entanglement, the no-cloning rule, reversibility, and how the measurement basis changes what you see.
  • Core Quantum Tricks – phase kickback, amplitude amplification, storing information in phase and retrieving it through interference, building custom gates and tensors, and defining any entanglement scenario. (Control logic is handled separately from other gates.)
  • Famous Quantum Algorithms – explore Deutsch–Jozsa, Grover's search, the quantum Fourier transform, Bernstein–Vazirani, and more.
  • Build & See Quantum Algorithms in Action – instead of just writing/reading equations, make and watch algorithms unfold step by step so they become clear, visual, and unforgettable. Quantum Odyssey is built to grow into a full universal quantum computing learning platform. If a universal quantum computer can do it, we aim to bring it into the game, so your quantum journey never ends.
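For readers who want to peek at the math the game hides: the gate-and-tensor machinery in the bullets above fits in a few lines of NumPy (this example is mine, not from the game): a Hadamard, a tensor product, and a CNOT producing a Bell state.

```python
import numpy as np

# Single-qubit Hadamard gate: creates an equal superposition from |0>
H = np.array([[1, 1],
              [1, -1]]) / np.sqrt(2)

# Two-qubit CNOT gate: flips the target qubit when the control is |1>
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]])

I = np.eye(2)

# Start in |00>, apply H to the first qubit (tensor product with identity),
# then entangle with CNOT: the result is the Bell state (|00> + |11>)/sqrt(2)
state = np.zeros(4)
state[0] = 1.0
state = CNOT @ np.kron(H, I) @ state

print(state)  # amplitudes ~[0.707, 0, 0, 0.707]
```

Measuring either qubit of this state collapses the other, which is the entanglement scenario the game lets you build visually instead of via matrices.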

PS: We now have a player creating QM/QC tutorials using the game. Enjoy over 50 hours of content on his YT channel here: https://www.youtube.com/@MackAttackx

Also, a Twitch streamer with 300+ hours in the game: https://www.twitch.tv/beardhero

u/QuantumOdysseyGame — 4 hours ago

Real-Time Waste Sorting/Classification using CV

In this use case, the system tackles the slow, dirty, and often dangerous process of manual waste sorting by instantly identifying and segmenting different types of trash. Every piece of garbage moving through the frame is detected and classified into distinct categories such as plastic bottles, plastic containers, plastic bags, and waste paper. Using segmentation masks, the model precisely outlines the boundaries of each item, making it highly effective for environments where waste is clustered or overlapping.

To achieve this level of accuracy, the model leverages RetinaMask, which provides high-fidelity, pixel-level prediction to handle the complex, deformed shapes that crushed bottles and torn plastic bags typically present. Everything overlays live on the video feed to provide a real-time sorting and classification dashboard.

High level workflow:

  • Collected raw video footage of mixed waste including bottles, bags, containers, and paper.
  • Trained a YOLO11 model with a custom augmented dataset (incorporating rotations and flips) to prevent overfitting and ensure robust detection of mangled waste.
  • Implemented RetinaMask logic during inference for precise, high-resolution segmentation masks around complex shapes.
  • Ran inference per frame to get bounding boxes, segmentation masks, and specific class labels (bottles, containers, bags, paper).
  • Visualized the automated classification and segmentation masks as a live overlay on the raw video footage.

This kind of pipeline is useful for recycling center operators, automated waste sorting facilities, robotic sorting pipelines (guiding robotic arms for precise picking), and environmental tech teams looking to prevent contamination in recycling streams.

code: Link
video: Link

u/Full_Piano_3448 — 21 hours ago
Help with a Computer Vision Homework - Homography

My homework gives me the following two images; using homography, I have to create a front view of the painting and eliminate the person in front of it.

https://preview.redd.it/xc9beb5eq4tg1.jpg?width=1920&format=pjpg&auto=webp&s=1bbfb112201d2821aaa541f08a3cd1d035a6ae95

The two images in question

I managed to warp the first photo so both pictures now are in the same plane, pictured below:

https://preview.redd.it/0j3wshsoq4tg1.jpg?width=1920&format=pjpg&auto=webp&s=abc8cac993a36d2a437fd22eb9e3e912c3182dc3

But I don't really know how to continue from here. I'm not sure how to remove the person from the picture, aside from maybe splitting each picture in half and stitching both halves together, but I doubt that's what my professor wants me to do.

Besides, I'm honestly not even completely sure these photos are actually in a front-view perspective: when I compared them with the reference image the professor gave us, the ones I got still look a bit skewed, and it's not like I can use the solution to get the real coordinates. So I'm a bit lost on what to do.

In case it helps, these are the exact instructions we have:

  1. Writing a program to read JPG images, calculating the homography matrices between them, and trying to project part of them into a front view. Note: the frame of the painting is a circle.

  2. Please manually find at least 5 matching points in both images to find the homography, and eliminate the people to have a clean painting. Finally, please convert into (i.e., fill in) a perfect circle. Save your result as a JPG file (named as Student_ID.jpg).

  3. In this homework, you can use any method including third-party lib. to perform, but please do NOT directly use any commercial software to create the image for this assignment.

u/Paco_Alpaco — 7 hours ago

I have developed a new way to convert a single video into a 4DGS model that can be viewed as a personal 3D theater. It's 50× smaller than sequential models, supports 2M splats per second, and has native audio.

The original video was 47 MB and the whole model is 99 MB, with minimal fluctuation even in a multi-cut, multi-scene 2-minute video. In the coming weeks I'll upload the demo and the viewer, which I'm working on and which is based on the Radia gallery. Modeling and rendering took me only 24 minutes on an L4. More refinements are coming and I'll upload more examples in the future; you can send me your videos.

u/ninjawick — 15 hours ago
Help Needed!

I’m building a vision system to count parts in a JEDEC tray (fixed grid, fixed camera, controlled lighting). Different products may have different package sizes, but the tray layout is known.

Is deep learning (YOLO/CNN) actually better here, or is traditional CV (ROI + threshold/contours) usually enough?

As a beginner in this field, what I tried was just basic preprocessing and a bunch of morphological operations (erode/dilate). It was successful for big ICs, but for small ones it doesn't work, as the morphological operations tend to close the contour. I've also tried YOLO, but it gives false positives on empty pockets, detecting them as IC units.

Any recommendations on what I should learn?
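Given the fixed grid and camera, a classical per-pocket occupancy check sidesteps both failure modes above; here's a minimal sketch (the grid split and the brightness threshold are hypothetical and would need tuning on real tray images):

```python
import numpy as np

def count_parts(gray, rows, cols, fill_thresh=0.15):
    """Count occupied pockets in a fixed-grid tray image.

    gray: 2D uint8 image, already cropped to the tray.
    A pocket counts as occupied when the fraction of pixels brighter
    than the assumed tray background exceeds fill_thresh.
    """
    h, w = gray.shape
    ph, pw = h // rows, w // cols
    count = 0
    for r in range(rows):
        for c in range(cols):
            pocket = gray[r * ph:(r + 1) * ph, c * pw:(c + 1) * pw]
            fill = np.mean(pocket > 128)  # hypothetical background threshold
            if fill > fill_thresh:
                count += 1
    return count
```

Because each pocket is scored independently against a fill ratio rather than via morphology, small ICs can't get closed away, and empty pockets score near zero instead of triggering false positives.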

u/Grouchy_Signal139 — 10 hours ago
Images 1–2 — Unitree L1 Lidar DIY viewer has some data offset by approx 16 degrees.

Unitree L1 Lidar DIY viewer has some data offset by approx 16 degrees.

I have an eventual goal of running the L1 Lidar directly over UART to a MCU.

As an intermediate step I've been developing a C++ PC viewer (using the official UART>USB serial module) to get the payloads and decoding down but have been struggling to understand where this double image phenomenon is coming from.

The official Unilidar viewer doesn't show this double image, and I've been able to confirm this is not a rendering bug: it appears in the data itself. When zooming in on near-field test objects, there is a complementary/alternating striping effect, indicating both images contain real depth points rather than simple duplicates.

My initial thought was that it's a temporal/async issue coming from a secondary or auxiliary process, which with a naive decode ends up with an offset that just needs to be buffered and matched. All my tests so far indicate this is genuine data that isn't being processed properly, rather than a render artifact of duplicated data.

Has anyone seen anything like this before from any LIDAR products or have any ideas how to untangle the depth points, potentially with a good reference test for a manual alignment?
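A quick diagnostic sketch (assuming the second sub-image really is a constant-yaw duplicate): rotate the points suspected of belonging to the offset image by the measured ~16° about the scan axis and check whether the two images collapse into one.

```python
import numpy as np

def apply_yaw_offset(points, offset_deg, mask):
    """Rotate the selected points (mask=True) about the Z axis by offset_deg.

    points: (N, 3) array of x, y, z samples from the decoded payload.
    mask:   boolean array marking points believed to belong to the
            offset sub-image (e.g. every other return).
    """
    a = np.radians(offset_deg)
    rot = np.array([[np.cos(a), -np.sin(a), 0],
                    [np.sin(a),  np.cos(a), 0],
                    [0,          0,         1]])
    out = points.copy()
    out[mask] = points[mask] @ rot.T
    return out
```

If a fixed 16° correction aligns the stripes on a flat-wall test target, the bug is likely a constant angular term in the decode (e.g. a missed azimuth field) rather than a timing/buffering issue.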

u/RipWooden6509 — 4 hours ago

Need some suggestion with industrial MV software

Hi there everyone! I recently received a couple of project proposals for implementing an MV system for quality control of spare parts. I've studied the case with an expert, and a deep learning approach might be the best option, mainly because cycle times are pretty short and the differences are too tight for metrology or other approaches.

Having said that, does anyone have experience with MVTec, Keyence, or VisionPro from Cognex? Bearing in mind that I live in Europe, I'd like to know about their tech support, pricing, and learning curve.

Related to MVTec: what's the conventional hardware for embedding? I recently read that they suggest ARM-based boards, so I'm not sure whether a Jetson or an industrial IPC would fit.

Thanks a lot!

u/No-Sympathy2403 — 4 hours ago

6D pose estimation on Android phones

Hi everyone, I want to run a 6D pose estimation algorithm on an Android phone. I don’t need a high frame rate, around one frame per second is sufficient. The target is a known object (e.g., a table or chair), and I already have its 3D model from photogrammetry. I only have a standard RGB camera (no depth sensor).

What is the best 6D pose estimation library or algorithm for this setup? Ideally, it should be easy to use, lightweight enough to run on a mobile device, and preferably free or open-source. Thanks!

u/FeaturePretend1624 — 10 hours ago

Three days in a hole with ChatGPT

I'm an avid amateur astro imager. I can't write code anymore now that I'm 80 years old, but I've always got ideas for things I could use to improve my experience and capabilities. So, now that I have a ChatGPT subscription, I decided to give it a whirl. What I learned was that ChatGPT was overly optimistic about its domain knowledge when it came to astro-imaging. It would congratulate me over and over again as I slowly but surely went down a rabbit hole with my first project. Eventually, it told me that my project and approach were simply "a fantasy". It was correct, but it only told me that when I gave up. The lesson is that if you want code written by ChatGPT, you'd better figure out exactly what you want. Don't expect it to have a clue about specialized domains. Anyone have similar experiences?

u/astronomer1946 — 10 hours ago

Hello, I have a question.

I'm working on a computer vision project where merchandisers take pictures of store shelves. My task is to detect the products in the image so I can identify competitors vs. my company's products.

I thought about two approaches:

  1. Use YOLO to detect products on the shelves, annotate them, and train a model to classify which products belong to my company.

  2. Create folders with images of each company's products, generate embeddings for them (possibly using OCR to extract and embed text), and when a new image arrives use vector search to identify which company the product belongs to.

Does this make sense, or is there a better approach for this problem?

(note that I don't have big resources to train a big model)

thanks in advance

u/ryan7ait — 6 hours ago

Supervisely tight bounding polygon

I have a series of photographs of different core boxes, which are a uniform rectangular container used to hold and display drill core. A tedious part of my job right now is manually cropping in on the core tray of each photograph, which is a task I'd rather automate.

Since the photographs are taken by hand, there is often a slight angle, so a bounding box parallel to the axis of the photograph won't be sufficient. I need a polygon which tightly encompasses the core tray, with four nodes, one for each corner of the tray. For this reason I believe I need instance segmentation rather than object recognition, please correct me if I'm wrong.

I started off by training a YOLO11m-seg model on 150 photographs I annotated myself, leaving all other parameters at their defaults. The results were subpar: the predictions were consistently and significantly smaller than my annotations, which would cut off the edges of my core trays.

I think my model may have failed to learn that the (highly variable) core displayed within the trays is irrelevant; the edges of the trays are all that matter.

I tried upgrading to a YOLO11l-seg model, hoping it would be smarter, but I always get a memory crash on my 8 GB of RAM, even after setting the batch size to 2 and the number of workers to 0.

Any advice on how to train a model which can accurately make a tight bounding polygon based on the four corners of a core tray would be appreciated.

u/General_Degenerate- — 6 hours ago

Multi-camera real-time fitness tracking with RTMPose + 2D→3D lifting (self-hosted project)

I tried building a simple self-hosted fitness tracker… and it kind of spiraled into this.

It actually started pretty dumb:
I was doing pushups in my basement and thought “couldn’t a camera just count reps and maybe draw a skeleton on top?”

I had played around with face recognition before, and since training isn't really optional for me (Parkinson's), I figured… why not try.

The first PoC was:

  • Ubuntu 20.04
  • an old NVIDIA Tesla P4
  • a single Reolink IP cam

It worked… badly. But enough to get hooked.

Then things escalated:

  • added more cameras (ended up with 3)
  • tried doing proper multi-view + 3D reconstruction
  • spent ~2 weeks in calibration hell (Charuco boards, triangulation, you name it)

At one point I thought I was clever and rotated the cameras 90° to get better vertical resolution.

That decision alone probably cost me several years of life:
cw/ccw confusion, projection errors, reprojection errors… everything was wrong in ways that almost looked right.

Even when pose detection worked perfectly per stream, 3D fusion would just refuse to cooperate.

Also learned the hard way:

  • cheap IP cams + no real timestamps = synchronization nightmare
  • Tesla P4 + 3D = technically possible, practically suffering

There was a brief detour with an Insta360 over USB (v4l2)… which was about as stable as you’d expect.

Current setup (less cursed, still questionable life choices):

  • AMD server + NVIDIA A2
  • 1× Basler 4K industrial cam (side view)
  • 2× IP cams (front)
  • RTMPose (133 keypoints) + MotionAGFormer (2D→3D)
  • hybrid multi-view approach with an “anchor stream” + auxiliary views

Now it can (more or less):

  • track full body (including hands/face)
  • count reps (state-machine based)
  • evaluate form (depth, symmetry, tempo, alignment, etc.)
  • render a live 3D model on the TV
  • identify the user via face recognition
  • log everything down to individual reps in SQLite
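The state-machine rep counter mentioned above is small enough to sketch; here's a minimal hysteresis version over a single joint-angle signal (the thresholds are hypothetical and exercise-specific):

```python
class RepCounter:
    """Count reps from a joint-angle stream using a two-state machine
    with hysteresis, so jitter near a single threshold can't double-count."""

    def __init__(self, down_thresh=90.0, up_thresh=160.0):
        self.down_thresh = down_thresh  # e.g. elbow angle at pushup bottom
        self.up_thresh = up_thresh      # e.g. elbow angle at lockout
        self.state = "UP"
        self.reps = 0

    def update(self, angle):
        if self.state == "UP" and angle < self.down_thresh:
            self.state = "DOWN"
        elif self.state == "DOWN" and angle > self.up_thresh:
            self.state = "UP"
            self.reps += 1  # one full down-up cycle completed
        return self.reps
```

Tempo and depth scoring fall out of the same signal: time spent in each state and the minimum angle reached per cycle.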

There’s also a (very early) voice coach and a YAML-based exercise system.

Where I want to take this:

  • better 3D visualization (SMPL-X instead of current prototype)
  • more robust scoring (right now it’s still pretty basic)
  • eventually a “real” coach that adapts workouts based on training history

Also worth mentioning:
Without tools like Codex / Claude I probably wouldn’t have been able to build this at all. This project is way beyond what I could realistically code solo from scratch.

What I’m curious about:

  • multi-view CV setups: how do you handle sync/calibration reliably in real-world setups?
  • better approaches for exercise phase detection than simple state machines?
  • stabilizing 2D→3D lifting in noisy environments
  • or just general “you’ve gone too far” feedback

Would love to hear thoughts or similar projects.

u/jowe81 — 18 hours ago
So, I am working on an AI/ML-driven Disaster Detection Model

https://preview.redd.it/97c1kmglo0tg1.jpg?width=1080&format=pjpg&auto=webp&s=f08f5bd353354b87d89ab2739263c29d96b7a3fa

So I am new to coding and I genuinely need help with coding and building things. This is a problem statement that is not from our college curriculum, and we are not taught high-end programming at our institution. We have to make an application that takes data from IoT devices or satellite images to detect disasters early and give warnings. Is there any API or model built for this that would help us, or anything related to it that I'm missing? Sorry to interrupt; I'm still learning, sometimes from friends, classmates, and the internet, and ChatGPT has been my buddy. Please tell me about important sources related to this project.

u/WarTop8796 — 21 hours ago

I'm having some confusion on YOLO (PnP?) vs April Tags for tracking an object?

Can YOLO be used to track the position of an object as well as an AprilTag can? Or is YOLO just good for saying "hey, found it", but not so much for tracking movement in space over time?

Also, on a Pi 4, would AprilTags be faster/cheaper and more accurate than YOLO?

u/Nyxtia — 11 hours ago

Exception queues matter more than people admit in document pipelines

I think a lot of document workflow pain comes from queue design, not just extraction quality.

A system can parse plenty of pages and still create operational drag if every unclear case lands in one generic review bucket.

What breaks

  • Blurry images, layout shifts, changed versions, and missing fields all look the same in the queue
  • Retries and review-worthy cases compete with each other
  • Reviewers have to open each case before they even know what kind of issue they’re looking at

What I’d do

  • Split exceptions by reason instead of one catch-all queue
  • Attach source-page context and extracted output to each flagged case
  • Separate infrastructure retries from document-specific review flow
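A minimal sketch of the reason-keyed routing described above (the reason taxonomy and case fields are illustrative, not from any particular product):

```python
from collections import defaultdict, deque
from dataclasses import dataclass

@dataclass
class FlaggedCase:
    doc_id: str
    reason: str        # e.g. "blurry_image", "layout_shift", "missing_field"
    page_context: str  # source-page snippet the reviewer needs up front
    extracted: dict    # whatever the extractor managed to pull out

RETRYABLE = {"timeout", "rate_limited"}  # infra issues, not reviewer work

class ExceptionRouter:
    """Route flagged cases into per-reason review queues and keep
    infrastructure retries out of the reviewers' way."""

    def __init__(self):
        self.review = defaultdict(deque)  # reason -> queue of cases
        self.retry = deque()              # infra-only retry queue

    def route(self, case):
        if case.reason in RETRYABLE:
            self.retry.append(case)
        else:
            self.review[case.reason].append(case)

    def next_for_reviewer(self, reason):
        """Reviewers pull from a reason-specific queue, with the
        page context and extracted output already attached."""
        q = self.review[reason]
        return q.popleft() if q else None
```

Because each case carries its context, the reviewer knows what kind of issue they're looking at before opening anything, which is the whole point of splitting the queue.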

Options shortlist

  • General OCR/document APIs plus your own routing layer
  • Internal review tooling with better queue metadata
  • Queue/orchestration systems for prioritization and triage
  • Document ops tools built around exception handling

My bias is that “human in the loop” only helps if the reviewer gets useful context fast.

Curious how others structure exception types in production. If you’ve found a cleaner queue pattern for messy documents, I’d genuinely like to hear it.

u/Careless_Diamond7500 — 14 hours ago

Looking for arXiv cs.CV endorser (first submission – thin-obstacle segmentation)

Hello,

I am preparing my first arXiv submission in the cs.CV category and I am currently looking for an endorser.

The paper focuses on segmenting thin obstacles (e.g., wires and branches) for UAV navigation; these are particularly challenging due to low contrast and extreme class imbalance. The approach is a modular early-fusion framework combining RGB, depth, and edge cues, evaluated on the DDOS dataset across multiple configurations (U-Net, DeepLabV3, pretrained and non-pretrained).

If anyone with cs.CV endorsement is open to taking a quick look and possibly endorsing, I would really appreciate it.

Thank you in advance!

u/negar_fathi — 17 hours ago

I don't know how to add liveness detection and facial recognition to our attendance system. Are there open source models I can use or do I have to train one?

I'm creating an attendance system for a capstone project that has facial recognition and liveness detection. Problem is, I don't exactly know where to start with either.

If there are any open-source models, where do I get them, and what downsides could I face in using them?

And I don't think I'm equipped to train a model myself. How does training a model work, and what would I need to do it?

u/Applesareterrible — 7 hours ago