NVIDIA DGX Spark: Complete Guide to the $4,699 AI Mini-Supercomputer (2026)
NVIDIA DGX Spark puts a Grace Blackwell superchip on your desk — 1 petaflop, 128 GB unified memory, $4,699. Complete buyer's guide with benchmarks, thermal analysis, and comparisons to the RTX 5090 and Mac Studio.
NVIDIA's DGX Spark puts a Grace Blackwell superchip on your desk. A 1.2 kg box — smaller than a Mac Mini — that delivers 1 petaflop of AI compute, 128 GB of unified memory, and the full CUDA stack. No cloud bills, no SSH into a rented GPU, no waiting for queue slots.
First announced as "Project DIGITS" at CES 2025, then renamed and unveiled at GTC in March 2025, the DGX Spark started shipping in late 2025 as NVIDIA's most aggressive play at putting data center-class AI hardware within reach of individual developers, researchers, and small teams.
But is it worth $4,699? And who is it actually for? This guide covers every spec, benchmark, limitation, and comparison you need to make the right call.
DGX Spark Full Specifications
| Specification | Details |
|---|---|
| Superchip | NVIDIA GB10 Grace Blackwell |
| CPU | 20-core ARM (10× Cortex-X925 @ 4 GHz + 10× Cortex-A725 @ 2.8 GHz) |
| GPU | Blackwell architecture, 6,144 CUDA cores |
| AI Performance | 1 PFLOP (FP4 sparse) |
| Memory | 128 GB LPDDR5x unified (CPU + GPU shared) |
| Memory Bandwidth | 273 GB/s |
| Storage | 1 TB or 4 TB NVMe M.2 SSD |
| Networking | Wi-Fi 7, 10 GbE Ethernet, 2× QSFP (ConnectX-7, 200 Gbps aggregate), Bluetooth 5.4 |
| Ports | 4× USB-C, HDMI 2.1a |
| Power | 240W external PSU (GPU TDP: ~140W) |
| Dimensions | 150 × 150 × 50.5 mm |
| Weight | 1.2 kg |
| OS | DGX OS 7.4.0 (Ubuntu 24.04, kernel 6.17) |
| CUDA | 13.0.2 |
| Price | $4,699 (Founder's Edition, 4 TB) |
The spec sheet reads like a miniaturized data center node. The 20-core ARM CPU trades blows with Apple's M4 performance cores in single-threaded benchmarks. The QSFP ports — usually found on rack servers — let you link two Sparks at 200 Gbps for a combined 256 GB memory pool. And the full CUDA 13 stack means every NVIDIA framework, container, and tool works natively.
Pricing: From $3,999 to $4,699
The DGX Spark has had a bumpy pricing history. At CES 2025, NVIDIA announced it at $2,999. By the time reservations opened at GTC, the price had risen to $3,999, and the Founder's Edition (4 TB SSD, gold-tinted metal chassis) shipped at that price in late 2025. In February 2026, it increased again to $4,699.
Partner versions are also available from Acer (Veriton GN100), ASUS (Ascent GX10), Dell (Pro Max GB10), and MSI (EdgeXpert MS-C931). These typically ship with 1 TB storage at slightly lower prices, though availability has been inconsistent.
For what you get — full Blackwell GPU, 128 GB unified memory, 200 Gbps networking, and a complete DGX software stack — the $4,699 price point is unprecedented. Two years ago, this level of AI compute required a $30,000+ workstation or a substantial monthly cloud bill.
What's Actually Inside the GB10 Superchip
The GB10 is a system-on-chip that combines a Blackwell GPU die with a Grace CPU on a single package, sharing 128 GB of LPDDR5x memory through a unified memory architecture.
The GPU side features 6,144 CUDA cores on the Blackwell architecture — the same generation that powers NVIDIA's B100 and B200 data center GPUs. It supports FP4, FP8, FP16, BF16, and FP32 precision, with hardware acceleration for transformer workloads. The headline 1 PFLOP figure refers to FP4 sparse performance — a theoretical peak, but the real-world numbers are still impressive.
The CPU side runs 20 ARM cores in a big.LITTLE configuration: 10 high-performance Cortex-X925 cores at 4 GHz and 10 efficiency Cortex-A725 cores at 2.8 GHz. In single-core performance, these are competitive with Apple's M4 and significantly faster than any Arm server chip from two years ago.
The unified memory is the defining architectural choice — and the biggest trade-off. All 128 GB of LPDDR5x is shared between CPU and GPU, meaning any model that fits in 128 GB can be loaded regardless of GPU VRAM limits. But unified memory comes at a bandwidth cost: 273 GB/s is excellent for a consumer device but an order of magnitude slower than an H100's 3.35 TB/s of HBM3. This bandwidth gap is the single most important number to understand about the DGX Spark.
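A back-of-envelope model makes the bandwidth bottleneck concrete: each decoded token requires streaming every active weight from memory once, so decode throughput is bounded above by bandwidth divided by the model's in-memory size. A minimal sketch (illustrative only — real systems also pay for KV-cache traffic and activations, so observed numbers land below this ceiling):

```python
def decode_ceiling_toks(params_billion: float,
                        bytes_per_param: float,
                        bandwidth_gb_s: float) -> float:
    """Upper bound on decode tokens/sec for a memory-bandwidth-bound LLM.

    Every decoded token must read all active weights once, so
    throughput <= bandwidth / model_size_in_bytes.
    """
    model_gb = params_billion * bytes_per_param  # weights footprint in GB
    return bandwidth_gb_s / model_gb

# DGX Spark (273 GB/s) running Llama 3.1 70B at FP8 (1 byte/param):
spark_70b = decode_ceiling_toks(70, 1.0, 273)  # ~3.9 tok/s theoretical ceiling
# The observed ~2.7 tok/s sits under that ceiling, as expected.
```

The same formula explains why 8B models feel so much faster: at FP8 the ceiling jumps to roughly 34 tok/s before batching or speculative decoding.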
Real-World Inference Performance
Small Models (8B–20B): The Sweet Spot
The DGX Spark excels with models in the 8–20 billion parameter range. Benchmark data from the LMSYS team:
| Model | Framework | Batch | Prefill (tok/s) | Decode (tok/s) |
|---|---|---|---|---|
| Llama 3.1 8B FP8 | SGLang | 1 | 7,991 | 20.5 |
| Llama 3.1 8B FP8 | SGLang | 32 | 7,949 | 368 |
| Llama 3.1 8B NVFP4 | TRT-LLM | 1 | 10,257 | 38.7 |
| DeepSeek-R1 14B FP8 | SGLang | 8 | 2,074 | 83.5 |
| GPT-OSS 20B MXFP4 | Ollama | 1 | 2,053 | 49.7 |
Those batch-32 numbers for Llama 3.1 8B — 368 tokens per second decode — are genuinely impressive for a desktop device drawing under 240W. With NVIDIA's speculative decoding (EAGLE3), throughput can push even higher.
For single-user interactive use, the 20–50 tok/s decode range with 8B–20B models delivers a responsive chat experience. This is the DGX Spark's core value proposition: local, private, fast inference on capable open-source models without cloud dependency.
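Since Ollama ships pre-installed and exposes an HTTP API on port 11434 by default, hitting a local model takes only the standard library. A minimal client sketch (the model tag and prompt are illustrative; it assumes a model has already been pulled):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default endpoint

def build_payload(model: str, prompt: str) -> dict:
    """Request body for Ollama's /api/generate endpoint (non-streaming)."""
    return {"model": model, "prompt": prompt, "stream": False}

def generate(model: str, prompt: str) -> str:
    """Send a prompt to the local Ollama server and return the response text."""
    data = json.dumps(build_payload(model, prompt)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Example (requires `ollama pull llama3.1:8b` first):
# print(generate("llama3.1:8b", "Summarize unified memory in one sentence."))
```

Point the URL at the Spark's LAN address instead of localhost and the same script works from your laptop.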
Large Models (70B+): It Works, But Slowly
The 128 GB unified memory can load models up to 200B+ parameters. Whether you'd *want* to run them interactively is another question:
| Model | Prefill (tok/s) | Decode (tok/s) |
|---|---|---|
| Llama 3.1 70B FP8 | 80 | 2.7 |
| Qwen3 235B NVFP4 (2× Sparks linked) | 23,477 | 11.7 |
At 2.7 tokens/second, Llama 70B is usable for batch processing, testing prompts, and prototyping — not for real-time conversation. The bottleneck is memory bandwidth: serving 70 billion parameters through 273 GB/s creates a fundamental throughput ceiling.
The dual-Spark configuration running Qwen3-235B at 11.7 tok/s decode is a striking demo of what's possible when you link two units via QSFP. But that's a $9,400+ setup before considering the networking cables. For teams choosing which Qwen version to run on the Spark, see our Qwen 3.5 vs 2.5 comparison.
The Bandwidth Reality Check
Sebastian Raschka's analysis captures it well: "The DGX Spark performs roughly on par with the 6× more expensive H100" for single-sample inference on small models. But the H100 dominates for batched workloads and large models because it has 12× more memory bandwidth.
If your use case is interactive inference with 8B–20B models, the DGX Spark delivers excellent performance per dollar. If you need 70B+ at conversational speed, you need either a discrete GPU with HBM or a higher-bandwidth unified memory system (like Apple Silicon with its 800+ GB/s on the M4 Ultra).
Fine-Tuning Capabilities
The DGX Spark handles LoRA and QLoRA fine-tuning well on models up to 70B. Full fine-tuning works on smaller models. Three frameworks are well-supported:
| Model | Method | Framework | Peak Tokens/sec |
|---|---|---|---|
| Llama 3.2 3B | Full | NeMo | 13,520 |
| Llama 3.1 8B | LoRA | Unsloth | 53,658 |
| Llama 3.3 70B | QLoRA | Unsloth | 5,079 |
Unsloth deserves special mention — it delivers 2.5× speed-ups over standard Hugging Face Transformers on the Spark.
A critical advantage over Apple Silicon: the DGX Spark supports full torch.compile. The Apple MPS backend throws InductorErrors when you attempt the same thing. If your workflow involves training — even fine-tuning — the full CUDA compatibility matters more than most spec comparisons suggest.
Setup: Easier Than You'd Expect
DGX OS is Ubuntu 24.04 with NVIDIA's driver stack pre-configured. You get two setup options:
Desktop mode: Plug in a monitor, keyboard, and mouse before first boot. You get a full Ubuntu desktop experience with the DGX Dashboard, JupyterLab, and all tools pre-installed.
Headless mode: Power on without peripherals. The Spark creates a Wi-Fi hotspot for initial configuration. SSH in, configure networking, and you're running a remote AI server.
The pre-installed software stack is comprehensive:
- CUDA 13.0.2
- Docker with GPU passthrough
- Ollama (pre-installed)
- DGX Dashboard (system monitoring + JupyterLab)
- NGC container support (pull and run PyTorch containers within minutes)
NVIDIA Sync is the remote management client — install it on your laptop and access JupyterLab, VS Code (remote mode), or any custom endpoint running on the Spark. Combined with Tailscale, you can access your DGX Spark from anywhere.
> ⚠️ Critical warning: Don't interrupt the initial software image download during setup. It can't be resumed, and you'll need a factory reset.
The Thermal Problem
This is the DGX Spark's most documented issue. Multiple owners report thermal throttling at ~100W — well below the 240W power supply capacity — with CPU temperatures hitting 95°C during sustained loads. Units have been observed throttling, rebooting, or shutting down after 20–30 minutes of continuous heavy workload.
John Carmack publicly flagged thermal throttling on his unit. NVIDIA has issued firmware patches that improve behavior, but the fundamental physics remain: dissipating 140W+ of GPU heat plus CPU, NVMe, and NIC thermals in a 150 mm cube with shared thermal pathways is aggressive engineering.
Practical mitigations:
- Keep ambient temperature below 30°C
- Don't block the vents (give it room to breathe)
- Clear caches between intensive tasks
- Keep firmware updated (NVIDIA has pushed multiple thermal fixes)
- Accept that the Spark works best for bursty workloads — inference serving, dev cycles, short fine-tuning — not sustained multi-hour training
If you plan to run overnight training jobs regularly, the thermal design may frustrate you. For inference serving and development work, it's manageable.
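If you're watching for the 95°C throttle zone, DGX OS ships with the standard NVIDIA driver tooling, so `nvidia-smi` queries work as usual. A small monitoring sketch (the 90°C alert threshold is our own choice, not an NVIDIA recommendation):

```python
import subprocess

def parse_temps(csv_text: str) -> list[int]:
    """Parse nvidia-smi CSV output: one temperature (Celsius) per line."""
    return [int(line.strip()) for line in csv_text.splitlines() if line.strip()]

def gpu_temps() -> list[int]:
    """Query current GPU temperature(s) in Celsius via nvidia-smi."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=temperature.gpu", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_temps(out)

# Alert before the observed ~95 C throttle point:
# if any(t >= 90 for t in gpu_temps()):
#     print("warning: GPU nearing throttle territory")
```

Run it on a cron or in a loop during long jobs and you'll see throttling coming before the clocks drop.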
DGX Spark vs RTX 5090 vs Mac Studio: The Real Comparison
This is the comparison everyone searching for "DGX Spark review" actually wants. Here's an honest breakdown:
DGX Spark vs RTX 5090
| Feature | DGX Spark | RTX 5090 (in a build) |
|---|---|---|
| Price | $4,699 (complete system) | ~$2,000 GPU + ~$1,500 system = ~$3,500 |
| Memory | 128 GB unified | 32 GB GDDR7 |
| Memory Bandwidth | 273 GB/s | 1,792 GB/s |
| Max Model Size | ~200B parameters | ~20B parameters (FP16) |
| Small Model Speed | Fast | Much faster (5090 wins 7/10 benchmarks) |
| Large Model Capability | Runs 70B (slowly) | Can't fit 70B at all |
| CUDA Compatibility | Full | Full |
| Form Factor | Complete mini-PC | Requires full desktop build |
Verdict: The RTX 5090 is significantly faster for models that fit in 32 GB — roughly 6.5× more memory bandwidth translates directly to faster token generation. But it *cannot run* models larger than ~20B parameters at full precision. If you need to run 70B+ models locally, the DGX Spark is your only option in this price range. If 8B–20B models cover your use case, the 5090 build is faster and cheaper.
DGX Spark vs Mac Studio M4 Ultra (192 GB)
| Feature | DGX Spark | Mac Studio M4 Ultra 192 GB |
|---|---|---|
| Price | $4,699 | ~$5,999 |
| Memory | 128 GB unified | 192 GB unified |
| Memory Bandwidth | 273 GB/s | 819.2 GB/s |
| Inference Speed (70B) | ~2.7 tok/s | ~8–12 tok/s |
| CUDA Support | Full | None (Metal/MPS only) |
| torch.compile | Full support | Broken (InductorErrors) |
| Fine-tuning | Excellent (LoRA/QLoRA/full) | Limited (no torch.compile) |
| OS | Ubuntu 24.04 (DGX OS) | macOS |
| Dual-unit Linking | Yes (QSFP, 200 Gbps) | No |
Verdict: For pure inference speed — especially on 70B+ models — the Mac Studio M4 Ultra wins decisively thanks to 3× the memory bandwidth. It also has more memory (192 GB vs 128 GB). But the DGX Spark has full CUDA support, torch.compile, native Docker GPU passthrough, and the entire NVIDIA AI ecosystem. If you're *developing* AI — training, fine-tuning, containerized workflows, or prototyping for deployment to NVIDIA infrastructure — the Spark's software advantage matters enormously. If you primarily *run* models and want the fastest inference, the Mac Studio is the better buy.
DGX Spark vs Multi-GPU Home Server (3× RTX 3090)
A popular community build: three used RTX 3090s (24 GB each) in a workstation for an aggregate 72 GB of VRAM, with ~936 GB/s of memory bandwidth per card. Cost: roughly $2,500–$3,500 depending on the rest of the build.
This config delivers 124 tok/s on 120B models — over 3× faster than DGX Spark's ~38 tok/s. But it draws 1,000W+, requires a full tower case, generates significant noise and heat, and needs NVLink or tensor-parallelism configuration across GPUs.
Verdict: If you want raw throughput above all else and don't mind the power draw, noise, and complexity, multi-GPU builds outperform the Spark per dollar. The Spark wins on simplicity, form factor, power efficiency, and noise.
Who Should Buy the DGX Spark
Buy it if you are:
- An ML engineer targeting NVIDIA infrastructure. The Spark's real superpower is as a development kit. Code that runs on the Spark runs on DGX Station, DGX SuperPOD, and cloud NVIDIA instances. It's a desktop-sized development environment for production NVIDIA workflows.
- A researcher who needs to run 30B–70B models locally with privacy. Healthcare, legal, finance, defense — any field where data can't leave your premises. The Spark fits models no consumer GPU can handle.
- A developer who wants a quiet, compact AI workstation. Compared to a multi-GPU tower, the Spark is silent at idle, tiny, and power-efficient for typical development workloads.
- A team that wants to link two units. The QSFP dual-link configuration running 235B-parameter models is unique at this price point.
Don't buy it if:
- You only run 8B–13B models. An RTX 5090 build is faster and cheaper. A Mac Mini with 32 GB handles these comfortably too.
- You need sustained training for hours. Thermal throttling makes marathon training sessions problematic.
- You want the fastest 70B+ inference. A Mac Studio M4 Ultra with 192 GB is 3× faster for large-model inference due to bandwidth advantage.
- You're on a tight budget. Used multi-GPU builds deliver more raw performance per dollar. See our complete hardware buyer's guide for all options from $600 to $10,000+.
Software Ecosystem: The Underrated Advantage
This is where the DGX Spark pulls ahead of competitors that might beat it on raw specs. NVIDIA's software stack is deeply integrated:
- Ollama pre-installed — pull a model, start chatting (see our Ollama vs LM Studio vs llama.cpp comparison)
- TensorRT-LLM optimized containers for production-style serving with OpenAI-compatible API
- NeMo for enterprise fine-tuning workflows
- NGC container catalog — thousands of pre-built AI containers
- NVIDIA Sync for remote management from any device
- Full Docker GPU passthrough out of the box
- DGX Cloud integration — seamlessly scale from desk to cloud
If you're building AI applications that need to deploy on NVIDIA hardware at scale, developing on a Spark means zero friction when moving to production. This is the strategic reason NVIDIA built it — it's a gateway drug to their data center ecosystem. The same Grace Blackwell architecture powers the sim-to-real pipelines now driving the humanoid robot race — developing locally on a Spark and deploying to DGX Cloud is becoming a standard workflow in physical AI labs.
Recommended Complementary Hardware
To get the most from your DGX Spark setup, consider these upgrades:
Fast External Storage
The built-in NVMe is fine for OS and active models, but if you're working with large datasets or want a model library, external NVMe storage makes a big difference.
Samsung T9 Portable SSD (4 TB) — USB 3.2 Gen 2×2 speeds up to 2,000 MB/s. Excellent for a model and dataset library alongside your Spark.
WD Black SN850X NVMe SSD (4 TB) — If you're adding internal storage to a partner-version Spark with an available M.2 slot, this is the go-to: 7,300 MB/s sequential read.
Monitor
You'll want a quality display for JupyterLab and VS Code sessions when working locally.
Dell UltraSharp U2723QE (27" 4K IPS) — USB-C hub with 90W power delivery, perfect for a clean desk setup. Plug in the Spark via USB-C for display + power in one cable.
10 GbE Networking
If you're linking two Sparks or connecting to a NAS for dataset access, 10 GbE makes a real difference.
QNAP QSW-1105-5T (5-Port 2.5 GbE Switch) — Affordable unmanaged switch for a small AI lab setup. Not full 10 GbE, but a significant upgrade from gigabit for most home networks.
ASUS XG-C100C (10 GbE PCIe Card) — For your main workstation to connect to the Spark's 10 GbE port at full speed.
USB-C Hub / Dock
The Spark has 4 USB-C ports but no USB-A. A quality dock keeps your peripherals organized.
CalDigit TS4 Thunderbolt 4 Dock — 18 ports including USB-A, Ethernet, SD card, and display outputs. The premium choice for a serious workstation setup.
*Disclosure: Links above are affiliate links. ToolHalla may earn a commission at no extra cost to you. We only recommend hardware we'd actually use.*
DGX Spark FAQ
Can the DGX Spark run Windows?
No. It runs DGX OS (Ubuntu 24.04) exclusively. There's no Windows driver support for the GB10 superchip.
How loud is it?
Nearly silent at idle. Under heavy inference load, the fan ramps up but stays quieter than a typical desktop PC. Under sustained training load, it gets noticeably louder before thermal throttling kicks in.
Can I upgrade the RAM?
No. The 128 GB LPDDR5x is part of the GB10 superchip package. It's not socketed or upgradeable.
Can I use it as a daily desktop computer?
Yes. DGX OS is full Ubuntu 24.04 with desktop support. Browsing, coding in VS Code, running Docker — it's a very capable ARM Linux workstation. But most users run it headless as a remote AI server.
What's the difference between the Founder's Edition and partner models?
The Founder's Edition has a gold metal chassis, 4 TB SSD, and ships directly from NVIDIA. Partner models (Acer, ASUS, Dell, MSI) typically offer 1 TB storage at lower price points with their own chassis designs. The GB10 superchip and 128 GB memory are identical across all versions.
Can I run it 24/7 as a server?
For inference serving at typical loads, yes. For sustained full-load training, thermal throttling will be an issue. Most users report stable 24/7 operation when the workload is bursty (serving requests, not constant maximum compute).
Is the $4,699 price worth it vs. cloud GPUs?
At current cloud pricing (~$2–3/hour for an A100), the Spark pays for itself after roughly 1,600–2,350 hours of GPU time. If you're running models daily, it breaks even within a year or so. Plus: no latency, no data leaving your machine, no surprise bills.
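The break-even arithmetic is simple enough to sanity-check yourself (cloud rates are assumptions — plug in your own provider's pricing):

```python
def break_even_hours(device_cost: float, cloud_rate_per_hour: float) -> float:
    """Hours of cloud GPU time that would cost as much as buying the device."""
    return device_cost / cloud_rate_per_hour

# DGX Spark at $4,699 vs. typical A100 cloud rates:
at_3_per_hr = break_even_hours(4699, 3.0)  # ~1,566 hours
at_2_per_hr = break_even_hours(4699, 2.0)  # ~2,350 hours
# At ~8 hours of use per working day, that's roughly 9-14 months.
```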
Final Verdict
The NVIDIA DGX Spark is the most interesting piece of AI hardware released in the last two years. Not because it's the fastest — it isn't. Not because it's the cheapest — it isn't that either. It's interesting because it's the first time NVIDIA's full data center software stack runs on something you can hold in one hand.
For ML engineers building on NVIDIA's ecosystem, it's a genuine productivity multiplier. Develop locally, deploy to DGX Cloud or on-prem DGX stations, with zero code changes. That workflow didn't exist at this price point before. Pair it with Nemotron 3 for agentic workloads and you have a complete local-to-cloud development pipeline.
For pure inference performance hunters, the comparison math is clear: Mac Studio M4 Ultra wins on bandwidth-limited large-model speed, RTX 5090 wins on small-model throughput, multi-GPU builds win on raw performance per dollar. The Spark wins on none of these individual metrics — but it wins on the *combination* of memory capacity, form factor, CUDA compatibility, and software ecosystem. If you're running multi-agent orchestration locally, the full CUDA stack matters more than raw tok/s.
The thermal issues are real and documented. The price increase from $3,999 to $4,699 stings. The 273 GB/s memory bandwidth is the fundamental bottleneck that limits large-model performance.
But if you need to run 30B–70B+ models locally, with full CUDA, in a box smaller than a textbook, there's exactly one option in 2026. And that's the DGX Spark.
Rating: 7.8/10 — A compelling local AI workstation for development and inference, held back by thermal throttling and memory bandwidth. Best for NVIDIA-ecosystem developers and privacy-sensitive workloads. Not the right choice for budget builds or sustained training.
*Last updated: March 2026*