Where does AI-generated embedded code fail?
AI-generated code is easy to spot in code review these days. The code itself is clean: signal handling, error handling, and structure all look good. What's missing is embedded domain knowledge.
Recent catches from review:
- CAN logging daemon writing every message straight to /var/log/ on eMMC, at 100 ms message intervals. Storage wears out in months
- No volatile on ISR-shared variables. The compiler is free to hoist the read out of the loop, so the main loop never sees the flag change
- Zero timing margin: timeout = expected response time. Works on the bench, fails intermittently in the field
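To make the second and third catches concrete, here is a minimal C sketch of what the reviewed-and-fixed version might look like. It is not the code from the review: `uart_rx_isr`, `wait_for_rx`, `timeout_with_margin_ms`, and `fake_now_ms` are hypothetical names, the tick source is a host-side stand-in so the snippet runs on a PC, and the margin percentage is an assumption you would derive from measured worst-case timing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Written by the ISR, read by the main loop. Without volatile the compiler
 * may load the flag once and keep it in a register, so the polling loop
 * below could spin forever. */
static volatile bool rx_ready = false;

/* Hypothetical ISR. On a real target this would be attached to the
 * interrupt vector; here it is an ordinary function. */
void uart_rx_isr(void)
{
    rx_ready = true;
}

/* Host-side stand-in for a hardware tick counter (assumption):
 * it advances by 1 ms on every call. */
static uint32_t fake_ms = 0;
uint32_t fake_now_ms(void)
{
    return fake_ms++;
}

/* Timeout = expected response time plus an explicit margin, instead of
 * the bare expected time. The 64-bit intermediate avoids overflow in
 * the multiplication. */
uint32_t timeout_with_margin_ms(uint32_t expected_ms, uint32_t margin_pct)
{
    assert(margin_pct <= 1000u); /* sanity bound chosen for this sketch */
    return expected_ms + (uint32_t)(((uint64_t)expected_ms * margin_pct) / 100u);
}

/* Polls the flag until it is set or timeout_ms has elapsed.
 * Returns true if the event arrived in time. */
bool wait_for_rx(uint32_t timeout_ms, uint32_t (*now_ms)(void))
{
    uint32_t start = now_ms();
    while (!rx_ready) {
        /* Unsigned subtraction stays correct across tick wraparound. */
        if ((uint32_t)(now_ms() - start) >= timeout_ms) {
            return false;
        }
    }
    rx_ready = false; /* consume the event */
    return true;
}
```

A caller would use something like `wait_for_rx(timeout_with_margin_ms(100, 50), fake_now_ms)`. Note that volatile only forces the read to happen; it does not make multi-byte or read-modify-write accesses atomic, so anything bigger than a single flag needs more care.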
All of it compiles clean and runs fine in a quick test. The problems only show up on real hardware.
AI tools aren't the issue. I use them too. The problem is trusting the output because it looks clean.
LLMs do well with what you explicitly tell them, but they drop implicit domain knowledge: eMMC wear, volatile semantics, IRQ context restrictions. Nobody puts these in a prompt.
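For the eMMC case, the implicit knowledge is roughly "don't turn every CAN frame into its own write". One way to express that is to batch lines in RAM and write them out in larger chunks. This is a sketch under assumptions, not the daemon from the review: `log_batcher` and its functions are hypothetical, and `LOG_BUF_SIZE` is a placeholder you would tune against the flash page size and how much data you can afford to lose on power failure.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumption: 4 KiB batch size, chosen only for illustration. */
#define LOG_BUF_SIZE 4096u

typedef struct {
    FILE    *out;               /* destination log file */
    char     buf[LOG_BUF_SIZE]; /* RAM staging buffer */
    size_t   used;              /* bytes currently staged */
    unsigned flushes;           /* number of batched writes issued */
} log_batcher;

void log_batcher_init(log_batcher *lb, FILE *out)
{
    assert(lb != NULL && out != NULL);
    lb->out = out;
    lb->used = 0;
    lb->flushes = 0;
}

/* Writes the staged bytes in one call. Returns 0 on success, -1 on error. */
int log_batcher_flush(log_batcher *lb)
{
    if (lb->used == 0) {
        return 0;
    }
    if (fwrite(lb->buf, 1, lb->used, lb->out) != lb->used) {
        return -1;
    }
    if (fflush(lb->out) != 0) {
        return -1;
    }
    lb->used = 0;
    lb->flushes++;
    return 0;
}

/* Stages one line in RAM; only touches the file when the buffer is full. */
int log_batcher_append(log_batcher *lb, const char *line)
{
    size_t len = strlen(line);
    if (len > LOG_BUF_SIZE) {
        return -1; /* line does not fit in the staging buffer */
    }
    if (lb->used + len > LOG_BUF_SIZE && log_batcher_flush(lb) != 0) {
        return -1;
    }
    memcpy(lb->buf + lb->used, line, len);
    lb->used += len;
    return 0;
}
```

Batching is only one option. Logging to a tmpfs mount and copying out periodically is another, and how many flash writes actually happen still depends on the filesystem and sync behavior, so treat this as a starting point and measure on the target.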
I ran some tests: explicit prompts ("declare a volatile int flag") vs implicit ("communicate via a flag between ISR and main loop") showed a ~35 percentage point gap. HumanEval and SWE-bench only test explicit-style prompts, so this gap doesn't show up in the numbers.
I now maintain a silent-failure checklist in my project config and add a line every time I catch one in review. I can only write down traps I already know about, but at least the same failure types don't recur.
If you've caught similar failures, I'd like to hear about them.