u/Front-Whereas-3050

Project Aurelia — A 3-model architecture (80B + 13B + 9B) that physically reacts to my real-time heart rate via mmWave radar, spatial awareness via LiDAR, and vibration via accelerometer. All on a Framework Desktop + eGPU

Hey everyone,

I’ve been building a multi-agent system in my spare time, and I just open-sourced the repository. I was getting tired of the standard text-in/text-out chat paradigm and wanted to build a genuinely situated AI—one that actually perceives the physical environment and my physiological state in real time without hitting a single cloud API. Everything runs on my 128GB Framework Desktop with an AMD Radeon Pro V620 (32GB) attached over OCuLink through a Minisforum DEG1 dock.

Repository: https://github.com/anitherone556-max/Project-Aurelia.git

The TL;DR:

Project Aurelia is a completely local, biometric-aware multi-agent architecture. It continuously reads my heart rate, respiration, proximity, and system thermals, translates those metrics into a "biological" state, and injects them into an 80B MoE executive model's behavior loop.

The Cognitive Stack & Hardware Setup

I’m running this across a split compute setup to guarantee background tasks don't starve the main conversational model:

  • The Executive Cortex (80B MoE - Qwen3-Next-A3B): Runs on a Framework Desktop (Strix Halo) leveraging 96GB of unified system memory to eliminate PCIe bottlenecks. It handles the core reasoning, mood state, and UI delivery.
  • The Sensory Thalamus (9B - Qwen3.5): Also in unified memory. This acts as a signal transduction layer. It takes raw hardware arrays from my sensors and translates them into clinical "biological" observations (e.g., instead of feeding the 80B "HR: 120", it feeds it "[PULSE]: Spiking. Tense, racing rhythm"). This preserves the AI's persona and hides the hardware numbers. (A rough sketch of this step follows the list.)
  • The Subconscious Action Engine (13B): Physically isolated on a Radeon Pro V620 connected via OCuLink. This loops in the background handling autonomous Python execution, web searches, and file parsing. Because it has dedicated silicon, it can run heavy reasoning loops without lagging the 80B.
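
To make the Thalamus step concrete, here's a simplified sketch of the transduction idea. It's not lifted verbatim from the repo; the prompt wording, the SensorFrame fields, and the query_local_model callable are illustrative stand-ins for whatever local inference backend serves the 9B.

```python
from dataclasses import dataclass

@dataclass
class SensorFrame:
    heart_rate_bpm: float    # from the mmWave FFT pipeline
    respiration_rpm: float   # breaths per minute
    distance_mm: int         # VL53L1X reading
    cpu_temp_c: float        # from HWiNFO shared memory

THALAMUS_PROMPT = """You are a signal-transduction layer.
Translate these raw readings into short clinical observations.
Never mention numbers or hardware; describe the body's state.

heart_rate={hr:.0f} bpm, respiration={rr:.0f} rpm, distance={dist} mm, cpu_temp={temp:.1f} C

Reply with lines like:
[PULSE]: ...
[BREATH]: ...
[PRESENCE]: ...
"""

def transduce(frame: SensorFrame, query_local_model) -> str:
    """Turn raw metrics into persona-safe observations for the 80B executive."""
    prompt = THALAMUS_PROMPT.format(
        hr=frame.heart_rate_bpm,
        rr=frame.respiration_rpm,
        dist=frame.distance_mm,
        temp=frame.cpu_temp_c,
    )
    # e.g. returns "[PULSE]: Spiking. Tense, racing rhythm."
    return query_local_model(prompt)
```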

The Sensor Pipeline (The Omni Hub)

  • FMCW mmWave Radar (60 GHz): Pulls raw I/Q signal data into a 20-second rolling buffer and runs an FFT pipeline over it to extract my heart rate and respiration (rough sketch after this list).
  • VL53L1X LiDAR: Validates my physical presence and distance at the desk.
  • HWiNFO Shared Memory: Reads actual CPU/GPU thermals. (I built a hardware-gated "Unstable" mood lock—the 80B cannot throw a crisis-level behavioral response unless the actual silicon thermals cross a danger threshold).
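
If you're wondering what the FFT step actually does, here's a minimal numpy sketch of the idea, assuming you've already isolated the range bin covering the chest and know the slow-time sample rate. The real pipeline does more conditioning (clutter removal, artifact rejection) than this.

```python
import numpy as np

def estimate_vitals(iq_buffer: np.ndarray, fs: float) -> tuple[float, float]:
    """Minimal FFT-based HR/respiration estimate from a rolling I/Q buffer.

    iq_buffer: complex slow-time samples from the range bin covering the chest
    fs: slow-time (chirp) sample rate in Hz
    """
    # Chest micro-motion shows up as phase modulation of the reflected signal.
    phase = np.unwrap(np.angle(iq_buffer))
    phase = phase - phase.mean()

    spectrum = np.abs(np.fft.rfft(phase * np.hanning(len(phase))))
    freqs = np.fft.rfftfreq(len(phase), d=1.0 / fs)

    def peak_in_band(lo_hz: float, hi_hz: float) -> float:
        band = (freqs >= lo_hz) & (freqs <= hi_hz)
        return float(freqs[band][np.argmax(spectrum[band])])

    respiration_rpm = peak_in_band(0.1, 0.5) * 60.0  # ~6-30 breaths/min
    heart_rate_bpm = peak_in_band(0.8, 2.0) * 60.0   # ~48-120 beats/min
    return heart_rate_bpm, respiration_rpm
```

With a 20-second window the raw FFT bin spacing is 1/20 Hz, i.e. about 3 BPM, which is part of why the rolling buffer length matters.

The thermal mood lock from the HWiNFO bullet boils down to a gate like this (the threshold and mood names here are illustrative):

```python
def allowed_mood(requested_mood: str, cpu_temp_c: float, gpu_temp_c: float,
                 danger_c: float = 90.0) -> str:
    """Hardware gate: the crisis-level mood is only reachable if the silicon
    thermals (read from HWiNFO shared memory) are actually in the danger zone."""
    if requested_mood == "Unstable" and max(cpu_temp_c, gpu_temp_c) < danger_c:
        return "Stable"  # downgrade: text alone can't trigger a crisis state
    return requested_mood
```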

If my heart rate spikes, the Omni Hub detects the variance and fires a "Thalamic Interrupt" straight into the async orchestrator, forcing the 80B to drop its current task and react to my physiological state instantly.
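
Mechanically, the interrupt path is just asyncio pre-emption. Here's a stripped-down sketch of the pattern (class and function names are illustrative, not the repo's exact ones):

```python
import asyncio

class Orchestrator:
    """Stripped-down sketch of the Thalamic Interrupt path."""

    def __init__(self):
        self.interrupts = asyncio.Queue()
        self.current_task = None

    async def fire_interrupt(self, observation: str):
        # Called by the Omni Hub when biometric variance crosses a threshold.
        if self.current_task and not self.current_task.done():
            self.current_task.cancel()          # drop whatever the 80B was doing
        await self.interrupts.put(observation)  # react to this next

    async def run(self, run_executive):
        while True:
            observation = await self.interrupts.get()
            self.current_task = asyncio.create_task(run_executive(observation))
            try:
                await self.current_task
            except asyncio.CancelledError:
                pass  # pre-empted by a newer, more urgent interrupt

async def monitor_heart_rate(orchestrator, read_hr, baseline=70.0, threshold=25.0):
    """Poll the Omni Hub and fire an interrupt when HR deviates sharply."""
    while True:
        hr = await read_hr()
        if abs(hr - baseline) > threshold:
            await orchestrator.fire_interrupt(
                "[PULSE]: Spiking. Tense, racing rhythm.")
        await asyncio.sleep(1.0)
```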

Memory

It uses a hybrid RRF (Reciprocal Rank Fusion) memory engine combining ChromaDB for semantic search and SQLite FTS5 for exact keyword matching with BM25 ranking. I also built in a mood-congruent retrieval multiplier, so if the 80B shifts into an "Analytical" or "Protective" mood, it preferentially surfaces long-term memories encoded in that same state.
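
For reference, the RRF fusion plus the mood multiplier is only a few lines. A sketch, under the assumption that each memory carries a "mood" metadata tag (the actual schema in the repo may differ):

```python
def rrf_fuse(semantic_hits, keyword_hits, current_mood, k=60, mood_boost=1.5):
    """Fuse two best-first result lists with Reciprocal Rank Fusion.

    semantic_hits / keyword_hits: lists of (doc_id, metadata) tuples,
    e.g. from a ChromaDB query and a SQLite FTS5 BM25 query.
    The "mood" metadata key is illustrative, not the repo's actual schema.
    """
    scores, meta = {}, {}
    for hits in (semantic_hits, keyword_hits):
        for rank, (doc_id, md) in enumerate(hits, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
            meta[doc_id] = md
    # Mood-congruent boost: memories encoded in the current mood rank higher.
    for doc_id, md in meta.items():
        if md.get("mood") == current_mood:
            scores[doc_id] *= mood_boost
    return sorted(scores, key=scores.get, reverse=True)
```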

I built this solo over the last month. The FFT biometric extraction works well but is susceptible to motion artifacts, so I'm looking into VMD (variational mode decomposition) or CNN-based reconstruction next.

I’d love for this community to tear the architecture apart, test the logic, or fork it. Let me know what you think!
