
This is just a heads-up for those of you using the GenAI function with Frigate. It's extraordinarily powerful: I was able to cut my notification spam to absolutely zero. I only get alerts for threat levels 1 and 2 now, they're 100% accurate, and I can tell at a glance whether I need to watch the video. I've saved all my notifications from the past three days and made a short video (at the bottom of the post). When it's actually me, my wife, or both of us getting in the car, we almost never get notifications, because part of my prompt is "if [my name] or [wife's name] are recognized, which you'll know if they're passed along to you in the object list, no matter what is happening in the scene it's threat level 0". So the few clips of us in the video are from times when it didn't recognize our faces or we deliberately hid them.
Anyway, back to the main point of the post: with a well-crafted prompt, I'm able to use "dumber" models for additional speed without losing the details I'm looking for. I've got an absolutely MASSIVE prompt and am currently running gemma 4 26b A4B on an AMD MI60 GPU with 32GB of VRAM, with two parallel slots, each with its own 130k context window. My current Frigate metrics show my review summaries at:
Average Inference Time 50722.53ms
(It would be about 15 seconds faster if I didn't allocate the 1024 thinking tokens, but I've found that without any thinking at all, this model can get details wrong even with a good prompt.)
I'll save my entire prompt for another post, since I had to modify a core Python file (frigate/genai/__init__.py) to have my prompt come after the "analysis guidelines". I also had to modify those guidelines: a number of their elements were interfering with the mechanical rules I wanted followed strictly, and there were too many references to letting the model "think" about the scene rather than precisely following what I told it to do. It would talk itself out of a threat level and assign a 0 to things I wanted to be threat level 2.
Just a quick example: if it were the middle of the day and I ran out to grab something from the car with my face hidden, it would assign threat level 0 because "this appears to be a homeowner retrieving something from your car". While that's technically correct, my prompt says that any time ANY unknown person EVER opens or interacts with my car, it HAS to be threat level 2. We park in a shared townhouse lot on a street where plenty of people walk their dogs. Our car has been broken into twice, and my wife occasionally forgets to lock the doors. If someone the facial recognition doesn't recognize opens a car door, I don't want the AI "guessing" whether that's a problem...it's ALWAYS a problem.
Anyway, like I said, that's for a different post.
====================================================================
VEHICLE DOOR SIDE CLASSIFICATION — U.S. DRIVER STANDARD
DETERMINISTIC RULE SET (NO HEURISTICS)
====================================================================
None of what follows is "relative" to anything. It is to be interpreted only as written, explicitly. No reasoning or thinking. Follow in the order given:
STEP 1
1 - Can you see the headlights and/or grille? If so, the vehicle is "front facing" - stop - otherwise
2 - Can you see the taillights and/or the trunk? If so, the vehicle is "rear facing" - stop
Now you know whether it's front or rear facing
END STEP 1
STEP 2
1 - Divide the entire viewable frame with a vertical line through the center of the viewable vehicle (to be done per vehicle). That gives you two "rectangles" that between them cover the entire frame.
2 - Is the person or object of interest in the left rectangle? If so, they're "left side" - stop - otherwise
3 - Is the person or object of interest in the right rectangle? Then they're "right side" - stop
Now you know whether they're on the "left side" or "right side"
END STEP 2
STEP 3
1 - "Front facing" combined with "left side" = passenger side - stop - otherwise
2 - "Front facing" combined with "right side" = driver side - stop - otherwise
3 - "Rear facing" combined with "left side" = driver side - stop - otherwise
4 - "Rear facing" combined with "right side" = passenger side - stop
END STEP 3
STEP 4
OUTPUT FORMAT (MANDATORY):
- Use EXACTLY ONE of the following:
- "Driver-side door"
- "Passenger-side door"
- "Door side undetermined"
OUTPUT ORDER (MANDATORY — follow this sequence exactly):
1. Resolve vehicle orientation and door side first (silently or shown) using the steps above
2. Explain to yourself, mechanically walking through the steps, how you arrived at the determination
3. Write the scene description using the resolved door side label
Do NOT write the scene description before the door side is resolved (if it's relevant to the scene)
The scene description MUST use the door side label produced by the
classification steps above — never an independently assumed label.
END STEP 4
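The mapping in Steps 1-3 is small enough to sanity-check as ordinary code, which is also a handy way to spot-check the LLM's answers against ground truth. Here's a minimal Python sketch of the same deterministic rules (the function and argument names are mine, not part of the prompt):

```python
def classify_door_side(orientation: str, frame_side: str) -> str:
    """Mechanical door-side classification, U.S. driver standard.

    orientation: "front" if headlights/grille are visible,
                 "rear" if taillights/trunk are visible.
    frame_side:  "left" or "right" half of the frame, split at the
                 center of the vehicle (per vehicle).
    Returns one of the three mandatory output labels.
    """
    mapping = {
        ("front", "left"): "Passenger-side door",
        ("front", "right"): "Driver-side door",
        ("rear", "left"): "Driver-side door",
        ("rear", "right"): "Passenger-side door",
    }
    # Anything that doesn't resolve cleanly falls through to "undetermined".
    return mapping.get((orientation, frame_side), "Door side undetermined")
```

Note that front and rear facing simply flip the left/right mapping, which is exactly the mistake the model kept making before the prompt forced it through these steps.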
So that's just one part of my very long prompt: the vehicle door side classification. You might think even that is absurdly long for one thing. I tried many, many iterations before realizing I should just upload screenshots of the things it was getting wrong to my local LLM and ask it why it came to the conclusions it did. This helped me immensely in writing a prompt that got the results I wanted. For example, early on I had the front/rear-facing part working (it could identify the front/rear of the car), but it would CONTINUOUSLY say "driver side". So I uploaded a screenshot and asked why, and it said "well, the person is on the right side of the frame"...which is technically correct: I was on the right side of the frame, but not the right side of the car. There's more to it than that, but that's a quick summary.
So, if you're not getting the results you want from your prompt (or don't have a very good one to begin with), try deliberately staging some events on your camera, taking screenshots of them, uploading them to your LLM of choice, and having it explain how it's coming to the incorrect conclusions that it is.