

Added a 16x Spark Cluster to my homelab over the last few days. Curious if this is the largest Spark cluster anyone has built.
About 2 years ago I renovated my basement and built a personal lab/datacenter into my office. I had a dedicated 100-amp panel with industrial outlets installed, plus a dedicated direct-attach exhaust system for a custom soundproof server rack I put in the room.
I started with a GH200 and have been steadily growing the lab from there.
—
Setup of the Sparks was time consuming but honestly smoother than I expected. Each Spark runs Nvidia's flavor of Ubuntu out of the box with nearly everything pre-installed and ready to go. For setup I had to rack them, power them on, create the same user/pass across all nodes, wait about 20 minutes per node for updates, then configure passwordless SSH, jumbo frames, IPs, etc., which I scripted to save time.
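For anyone curious what the fan-out looks like, here's a minimal sketch of that kind of per-node loop. It's not my exact script: the IP range, username, and interface names are placeholders, and it assumes you've already done ssh-copy-id to each node once.

```python
#!/usr/bin/env python3
"""Rough sketch of the per-node setup loop (IPs, user, and NIC names are assumptions)."""
import subprocess

NODES = [f"192.168.10.{10 + i}" for i in range(16)]  # placeholder addressing scheme
USER = "spark"                                        # same user created on every node

def run_remote(host: str, cmd: str) -> None:
    # Relies on passwordless SSH already being in place for this user.
    subprocess.run(["ssh", f"{USER}@{host}", cmd], check=True)

for host in NODES:
    # Enable jumbo frames on both fabric interfaces (interface names are placeholders).
    run_remote(host, "sudo ip link set enp1s0f0np0 mtu 9000")
    run_remote(host, "sudo ip link set enp1s0f1np1 mtu 9000")
    # Pull updates non-interactively instead of babysitting each node.
    run_remote(host, "sudo apt-get update && sudo apt-get -y upgrade")
```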
Each Spark connects to the FS N8510 switch with a single QSFP56 cable. The DGX Spark exposes two NIC interfaces over that one port, so you get dual-rail over a single cable. I'm seeing 100 to 111 Gbps per rail, which aggregates to roughly the advertised 200 Gbps.
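If you want to sanity-check each rail yourself, something like this works (assumes `iperf3 -s` is running on the peer node; the per-rail addresses and stream count here are placeholders, not my setup):

```python
#!/usr/bin/env python3
"""Quick per-rail bandwidth check between two Sparks using iperf3."""
import json
import subprocess

# Hypothetical per-rail peer addresses; each rail gets its own subnet in this sketch.
RAIL_PEERS = {"rail0": "10.0.0.12", "rail1": "10.0.1.12"}

for rail, peer in RAIL_PEERS.items():
    # -J emits JSON, -P 8 runs 8 parallel streams to help saturate the link.
    out = subprocess.run(
        ["iperf3", "-c", peer, "-P", "8", "-J"],
        capture_output=True, text=True, check=True,
    ).stdout
    gbps = json.loads(out)["end"]["sum_received"]["bits_per_second"] / 1e9
    print(f"{rail}: {gbps:.1f} Gbps")
```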
Why this over H100s or a GB300?
Unified memory. The whole point is maximizing unified memory capacity within the Nvidia ecosystem. With 8 nodes I was serving GLM-5.1-NVFP4 (434GB) at TP=8. Next I'm going to test DeepSeek and Kimi.
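Napkin math for why TP=8 is comfortable with a 434GB checkpoint, assuming 128GB of unified memory per Spark. The overhead figure is a guess to cover runtime, activations, and KV cache, not something I measured:

```python
#!/usr/bin/env python3
"""Back-of-the-envelope check that a model fits across N Sparks at a given TP degree."""

NODE_MEM_GB = 128    # unified memory per DGX Spark
WEIGHTS_GB = 434     # the NVFP4 checkpoint mentioned above
OVERHEAD_GB = 12     # rough allowance for runtime, activations, KV cache (assumption)

def fits(tp: int) -> bool:
    per_node = WEIGHTS_GB / tp + OVERHEAD_GB
    print(f"TP={tp}: ~{per_node:.0f} GB per node (limit {NODE_MEM_GB} GB)")
    return per_node <= NODE_MEM_GB

for tp in (4, 8, 16):
    fits(tp)
# TP=4 is right at the edge; TP=8 leaves real headroom for long-context KV cache.
```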
The longer-term plan is a prefill/decode split: the Spark cluster handles prefill (massive parallel throughput), and once the M5 Ultra Mac Studios drop I'll add 2 to 4 into the rack for decode. A rough sketch of that routing is below.
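Purely conceptual; the node names, pool sizes, and hand-off are placeholders for a system that doesn't exist yet:

```python
#!/usr/bin/env python3
"""Conceptual sketch of the disaggregated plan: prefill pool vs decode pool.
Everything here is a placeholder, not a working serving stack."""
from dataclasses import dataclass
from itertools import cycle

PREFILL_NODES = cycle([f"spark-{i:02d}" for i in range(16)])  # compute-bound prompt processing
DECODE_NODES = cycle(["studio-0", "studio-1"])                # planned decode nodes (placeholders)

@dataclass
class Request:
    prompt: str

def route(req: Request) -> tuple[str, str]:
    # Prefill (parallel, compute-heavy) lands on a Spark; token-by-token decode
    # (memory-bandwidth-bound) moves to a decode node once the KV cache is handed off.
    return next(PREFILL_NODES), next(DECODE_NODES)

print(route(Request("hello")))
```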
—
Full rack, top to bottom:
- 1U Brush Panel
- OPNSense Firewall
- Mikrotik 10Gb switch (internet uplink)
- Mikrotik 100Gb switch (HPC to NAS)
- 1U Brush Panel
- QNAP 374TB all U.2 NAS
- Management Server
- Dual 4090 Workstation
- Backup Dual 4090 Workstation (identical specs)
- FS 200Gbps QSFP56 Fabric Switch (Spark cluster)
- 1U Brush Panel
- 8x DGX Spark Shelf One
- 8x DGX Spark Shelf Two
- 2U Spacer Panel
- SuperMicro 4x H100 NVL Station
- GH200