Hi everyone,
I am a first-year B.Tech student, and I recently published a theoretical architecture preprint on Zenodo exploring how to bypass the Thermal Wall and RC Delay limits using a quasi-delay-insensitive (QDI) paradigm.
The TL;DR of the LAGS Architecture:
- Core: Locally Asynchronous, Globally Synchronous (LAGS). Execution islands use 4-Phase RTZ handshakes and Valid Bit completion detection (single-rail, not dual-rail) to act as glitch-filtered QDI pipelines.
- Thermal Management: A hardware-level Token Ring acts as a strict power-gating enabler, forcing a rotating thermal duty-cycle across the NoC to prevent Dark Silicon meltdowns without OS intervention.
- The EDA Compromise: I know pure async is an EDA nightmare. To make this theoretically fabricable, the internal NoC is clockless, but the boundaries are wrapped in standard Synchronous Interfaces (Two-Flop Synchronizers) to act as a Trojan Horse for Static Timing Analysis (STA) tools.
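To make the handshake in the first bullet concrete, here is a minimal behavioural sketch in Python of one 4-phase return-to-zero transfer on a single-rail channel with a valid bit. It is not the implementation from the preprint: the class name `FourPhaseChannel` and its fields are hypothetical, and the model runs the four phases strictly in order, so it says nothing about real wire delays or about the data settling before `valid` is observed.

```python
from dataclasses import dataclass, field


@dataclass
class FourPhaseChannel:
    """Toy model of one single-rail channel: data bus, valid bit, ack wire."""
    valid: int = 0
    ack: int = 0
    data: int = 0
    trace: list = field(default_factory=list)

    def _log(self, event):
        # Record the event together with the wire levels after it.
        self.trace.append((event, self.valid, self.ack))

    def transfer(self, word):
        # The channel must have returned to zero before a new transfer.
        assert self.valid == 0 and self.ack == 0, "channel not idle"

        # Phase 1: sender drives the data, then raises valid.
        # (Assumption: in hardware the data must be stable before valid is seen.)
        self.data = word
        self.valid = 1
        self._log("valid+")

        # Phase 2: receiver latches the data and raises ack.
        latched = self.data
        self.ack = 1
        self._log("ack+")

        # Phase 3: sender sees ack and returns valid to zero.
        self.valid = 0
        self._log("valid-")

        # Phase 4: receiver sees valid low and returns ack to zero.
        self.ack = 0
        self._log("ack-")
        return latched


if __name__ == "__main__":
    ch = FourPhaseChannel()
    print([ch.transfer(w) for w in (0xA, 0xB)])
    print([event for event, _, _ in ch.trace])
```

Every word costs four signal transitions, and the channel is idle again only after `ack-`, which is the per-transfer overhead that the RTZ choice implies.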
What I am looking for: I am preparing to move into Phase 1 (VHDL/FPGA deployment) to empirically test the Token Ring thermal heuristics and interrupt latency.
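Before any FPGA work, the rotation itself can be sanity-checked in software. The sketch below is a toy model only: the function `simulate_token_ring`, the `hold` length, and the `heat` and `cool` constants are all hypothetical placeholders, and the linear heating and cooling is an assumption chosen for simplicity, not a thermal model from the preprint.

```python
def simulate_token_ring(num_islands=4, steps=12, hold=1, heat=3.0, cool=1.0):
    """Rotate a single token around the ring; only the token holder is powered.

    Returns the per-step owner schedule and the final per-island temperatures
    (arbitrary units above ambient).
    """
    temps = [0.0] * num_islands
    schedule = []
    for step in range(steps):
        # The token stays on one island for `hold` steps, then moves on.
        owner = (step // hold) % num_islands
        schedule.append(owner)
        for island in range(num_islands):
            if island == owner:
                # Powered island heats up by a fixed amount per step.
                temps[island] += heat
            else:
                # Power-gated islands cool toward ambient (floor at 0).
                temps[island] = max(0.0, temps[island] - cool)
    return schedule, temps


if __name__ == "__main__":
    schedule, temps = simulate_token_ring(num_islands=4, steps=8)
    print(schedule)
    print(temps)
```

With one token and N islands, each island is powered for 1/N of the time, so the steady-state peak depends on whether the cooling during the N-1 gated slots can offset the heating during the one active slot.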
Before I start writing any HDL, I want brutal feedback on the theoretical bottlenecks. Specifically:
- Does my synchronous boundary wrapper adequately satisfy modern STA tools, or will the tools still choke on the internal QDI logic?
- For those working with massive NoCs, does my assumption about the Token Ring acting as a strict hardware memory fence hold up under heavy localized data dependencies?
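As a reference point for the first question, the usual first-order reliability estimate for a flop synchronizer is MTBF = exp(t_r / tau) / (T_w * f_clk * f_data). The sketch below just evaluates that formula. The function name and every numeric value in it are placeholders for illustration, not figures from the preprint or from any real process.

```python
import math


def synchronizer_mtbf(t_resolve, tau, t_window, f_clk, f_data):
    """First-order mean time between synchronizer failures, in seconds.

    t_resolve: time allowed for metastability to resolve (s)
    tau:       regeneration time constant of the flop (s)
    t_window:  metastability capture window of the flop (s)
    f_clk:     receiving clock frequency (Hz)
    f_data:    rate of asynchronous data transitions (Hz)
    """
    return math.exp(t_resolve / tau) / (t_window * f_clk * f_data)


if __name__ == "__main__":
    # Placeholder values only: 1 ns to resolve, 50 ps tau, 20 ps window,
    # 500 MHz receiving clock, 100 MHz asynchronous event rate.
    print(synchronizer_mtbf(1e-9, 50e-12, 20e-12, 500e6, 100e6))
```

The exponential term is why the resolution time given to the first flop matters far more than the other parameters: each extra `tau` of slack multiplies the MTBF by e.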
Tear it apart. I want to know where the physical limits break my logic before I try to simulate it. Thanks!
u/LuckySalary — 14 days ago