u/LuckySalary — reddlx

Hi everyone,

I am a first-year B.Tech student, and I recently published a theoretical architecture preprint on Zenodo. It explores how to bypass the thermal wall and RC-delay limits using a quasi-delay-insensitive (QDI) design paradigm.

Link to Paper: https://zenodo.org/records/20055657?token=eyJhbGciOiJIUzUxMiJ9.eyJpZCI6Ijk5MGI1MzU2LTEyZGItNDA5Zi1iYzJjLTYwN2JlZDg4ZWRiYiIsImRhdGEiOnt9LCJyYW5kb20iOiIwNmZkZjA2ZmE5ZTRhMjE1MmNiMzNmNjhkZDM2ODhjYSJ9.qWnAPAz0EvW4OB819gAJ_jwncxSqO9w59BX9SKoC6mOUSPgglVEwbwKb2B9OkegSu6CtGlmlQBKjyJ0zxdD7cg

The TL;DR of the LAGS Architecture:

Core: Locally Asynchronous, Globally Synchronous (LAGS). Execution islands use 4-phase return-to-zero (RTZ) handshakes and valid-bit completion detection (single-rail, not dual-rail) to act as glitch-filtered QDI pipelines.
Thermal Management: A hardware-level Token Ring acts as a strict power-gating enabler, forcing a rotating thermal duty-cycle across the NoC to prevent Dark Silicon meltdowns without OS intervention.
The EDA Compromise: I know pure async is an EDA nightmare. To make this theoretically fabricable, the internal NoC is clockless, but the boundaries are wrapped in standard Synchronous Interfaces (Two-Flop Synchronizers) to act as a Trojan Horse for Static Timing Analysis (STA) tools.
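
To make the first two points concrete, here is a minimal behavioural sketch in Python. It is not taken from the paper and is not a hardware model. It assumes a single token, a fixed hold time per island, and that only the token holder is powered. The names `four_phase_rtz`, `token_ring_schedule`, and `duty_cycle` are hypothetical, chosen only for illustration.

```python
def four_phase_rtz(data_items):
    """Return the (req, ack, data) states of a 4-phase RTZ handshake.

    Each transfer takes four phases, and both wires return to zero
    before the next transfer can start.
    """
    trace = []
    for item in data_items:
        trace.append((1, 0, item))   # sender raises req, data is valid
        trace.append((1, 1, item))   # receiver latches data, raises ack
        trace.append((0, 1, None))   # sender returns req to zero
        trace.append((0, 0, None))   # receiver returns ack to zero
    return trace


def token_ring_schedule(num_islands, hold_cycles, total_cycles):
    """Return, for each cycle, the index of the island holding the token.

    Assumption: the token holder is the only powered island, and the
    token moves to the next island after `hold_cycles` cycles.
    """
    return [(cycle // hold_cycles) % num_islands
            for cycle in range(total_cycles)]


def duty_cycle(schedule, island):
    """Fraction of cycles during which `island` held the token."""
    return schedule.count(island) / len(schedule)


if __name__ == "__main__":
    for state in four_phase_rtz(["A", "B"]):
        print(state)
    schedule = token_ring_schedule(num_islands=4, hold_cycles=3, total_cycles=24)
    print(schedule)
    print(duty_cycle(schedule, island=0))
```

Under these assumptions each island's duty cycle approaches 1/N for N islands, which is the rotating thermal budget the Token Ring is meant to enforce. A real implementation would also have to account for wake-up latency and state retention when an island is gated.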

What I am looking for: I am preparing to move into Phase 1 (VHDL/FPGA deployment) to empirically test the Token Ring thermal heuristics and interrupt latency.

Before I start writing any HDL, I want brutal feedback on the theoretical bottlenecks. Specifically:

Does my synchronous boundary wrapper adequately satisfy modern STA tools, or will the tools still choke on the internal QDI logic?
For those working with massive NoCs, does my assumption about the Token Ring acting as a strict hardware memory fence hold up under heavy localized data-dependency?

Tear it apart. I want to know where the physical limits break my logic before I try to simulate it. Thanks!