GPU Infrastructure: The 1M+ GPU Deployment Powering US AI
The United States datacenter industry has deployed over 1 million high-performance GPUs as of 2025, representing the largest concentration of AI computing infrastructure in history. This unprecedented buildout, driven by the explosive growth of large language models and generative AI, has transformed NVIDIA from a graphics card manufacturer into the world’s most valuable semiconductor company and created entirely new categories of specialized AI infrastructure operators.
Executive Summary
- Total Deployment: 1M+ GPUs across US datacenters (conservative estimate based on disclosed projects)
- Market Leader: NVIDIA commands 95%+ market share for AI training workloads
- Major Deployments: xAI Colossus (230K GPUs), CoreWeave (250K+ GPUs), Meta Prometheus (500K+ planned)
- Current Generation: NVIDIA H100/H200 dominating deployments; Blackwell B200/B300 ramping production
- Power Evolution: 400W (A100) → 700W (H100) → 1,400W (B300) requiring liquid cooling revolution
- Supply Constraints: 6-12 month lead times for latest GPUs; allocation strategies determine competitive advantage
- Alternative Suppliers: AMD MI300X gaining traction; custom silicon (Google TPU, AWS Trainium) for specific workloads
- Future Roadmap: NVIDIA GB300, AMD MI350, custom accelerators pushing toward 2,000W per chip
This page documents the GPU infrastructure underpinning the AI revolution, with detailed specifications, deployment configurations, and strategic implications for datacenter operators.
GPU Market Evolution (2020-2025)
The AI Infrastructure Inflection Point
The GPU datacenter market underwent a fundamental transformation between 2020 and 2025:
2020-2021: Pre-Transformer Dominance
- NVIDIA A100 launched (May 2020): 400W TDP, 40/80GB memory
- Primary workloads: Computer vision, reinforcement learning, scientific computing
- Deployment scale: Hundreds to low thousands of GPUs per cluster
- Market: Dominated by cloud hyperscalers (AWS, Azure, GCP) and research institutions
2022: GPT-3 and Scale Realization
- OpenAI’s GPT-3 training demonstrated value of massive-scale models
- NVIDIA H100 announcement (March 2022): 700W TDP, 80GB HBM3, 3x AI performance vs A100
- Early adopters (CoreWeave, Lambda Labs) invest heavily in H100 allocations
- GPU supply becomes strategic competitive advantage
2023: ChatGPT and Demand Explosion
- ChatGPT launch (November 2022) triggers unprecedented AI infrastructure demand
- H100 lead times extend to 6-12 months; allocations selling at premium on secondary market
- Specialized AI cloud providers (CoreWeave, Lambda Labs, Crusoe) emerge as alternatives to hyperscalers
- NVIDIA market cap surpasses $1 trillion (first semiconductor company to achieve this)
2024-2025: The GPU Infrastructure Arms Race
- xAI Colossus: 100,000 H100 GPUs deployed in 122 days (September 2024)
- Meta announces Prometheus: 500,000+ GPUs for AGI research (2026)
- NVIDIA Blackwell (B200/B300) architecture launch: 5x performance improvement over Hopper
- Total US GPU deployments exceed 1 million units
- AMD MI300X emerges as viable alternative; hyperscalers develop custom AI chips
NVIDIA’s Dominance
NVIDIA’s competitive advantages have created near-monopoly in AI training:
Technology Leadership:
- 5+ year architectural lead over competitors (Transformer Engine, NVLink, Tensor Cores)
- CUDA software ecosystem creates switching costs
- Continuous performance improvements (2x per generation)
Ecosystem Lock-In:
- PyTorch and TensorFlow optimized for NVIDIA CUDA
- Largest library of pre-trained models and frameworks
- Developer familiarity and tooling maturity
Supply Chain Control:
- Strategic allocation of GPUs to key customers (OpenAI, Microsoft, Meta)
- Preferential access for cloud partners (CoreWeave, Lambda Labs)
- Long-term supply agreements enable capacity planning
Market Share (AI training workloads, 2024):
- NVIDIA: 95%+
- AMD: 3-4% (growing with MI300X)
- Custom silicon (Google TPU, AWS Trainium): 1-2%
GPU Specifications: Generation Comparison
NVIDIA AI GPU Portfolio
Model | Launch | Memory | Memory BW | TDP (SXM) | FP16 | FP8 | Price | Availability |
---|---|---|---|---|---|---|---|---|
A100 | 2020-05 | 40/80GB HBM2e | 1.6/2.0 TB/s | 400W | 312 TFLOPS | — | $10K-15K | Mature supply |
H100 | 2022-09 | 80GB HBM3 | 3.35 TB/s | 700W | 1,000 TFLOPS | 2,000 TFLOPS | $25K-40K | Good supply |
H200 | 2024-03 | 141GB HBM3e | 4.8 TB/s | 700W | 1,000 TFLOPS | 2,000 TFLOPS | $30K-45K | Limited supply |
B200 | 2025-Q1 | 192GB HBM3e | 7.7 TB/s | 1,000W | — | 9,000 TFLOPS (FP4) | $35K-50K | Ramping |
B300 | 2025-Q2 | 288GB HBM3e | Enhanced | 1,400W | — | 14,000 TFLOPS (FP4) | $50K-70K | Initial production |
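For a rough sense of how throughput per watt has moved across these generations, the sketch below computes peak low-precision TFLOPS per watt from the nominal figures in the table. It is illustrative only: the precisions differ by generation (FP16 for A100, FP8 for Hopper, FP4 for Blackwell), so the ratios are indicative rather than a like-for-like efficiency comparison.

```python
from dataclasses import dataclass

@dataclass
class GpuSpec:
    name: str
    tdp_w: int          # SXM TDP from the table above
    peak_tflops: float  # densest low-precision throughput listed
    precision: str      # precision that figure refers to

# Nominal figures from the portfolio table; B200/B300 use FP4, earlier parts FP16/FP8.
GPUS = [
    GpuSpec("A100", 400, 312, "FP16"),
    GpuSpec("H100", 700, 2000, "FP8"),
    GpuSpec("H200", 700, 2000, "FP8"),
    GpuSpec("B200", 1000, 9000, "FP4"),
    GpuSpec("B300", 1400, 14000, "FP4"),
]

for g in GPUS:
    # TFLOPS per watt; note the precisions differ, so this is not an
    # apples-to-apples efficiency comparison across generations.
    print(f"{g.name}: {g.peak_tflops / g.tdp_w:.2f} TFLOPS/W ({g.precision})")
```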
Detailed Specifications
NVIDIA A100: The Workhorse (2020-2023)
Architecture: Ampere
Target: General-purpose AI training and inference
Specifications:
- Memory: 40GB or 80GB HBM2e
- Memory Bandwidth: 1.6 TB/s (40GB), 2.0 TB/s (80GB)
- TDP: 250W (PCIe), 400W (SXM4)
- FP64: 19.5 TFLOPS (SXM4)
- FP16/BF16: 312 TFLOPS (with sparsity)
- INT8: 624 TOPS
- Interconnect: NVLink 3.0 (600 GB/s per GPU)
- Form Factors: PCIe Gen4, SXM4
Key Features:
- Third-generation Tensor Cores
- Multi-Instance GPU (MIG) technology (partition into 7 instances)
- Structural sparsity support (2x performance for eligible workloads)
Deployment Status: Mature; widely available; primary GPU for 2020-2023 AI infrastructure
Major Deployments: Meta RSC (16,000 GPUs), early CoreWeave clusters, AWS P4d instances
NVIDIA H100: Current Workhorse (2023-2025)
Architecture: Hopper
Target: Large language model training, generative AI
Specifications:
- Memory: 80GB HBM3
- Memory Bandwidth: 3.35 TB/s (roughly 67% more than the A100 80GB)
- TDP: 300-350W (PCIe), 400W (NVL), 700W (SXM5)
- FP64: 60 TFLOPS
- FP16/BF16: 1,000 TFLOPS (Tensor Cores)
- FP8: 2,000 TFLOPS (new precision)
- INT8: 4,000 TOPS
- Interconnect: NVLink 4.0 (900 GB/s per GPU, 3.6 TB/s bisectional in 8-GPU systems via NVSwitch)
- Form Factors: PCIe Gen5, NVL (air-cooled 2U server), SXM5 (highest performance)
Key Features:
- Transformer Engine: Hardware-accelerated FP8 for transformer models (2x throughput vs FP16)
- Fourth-generation Tensor Cores: Support for FP8, FP16, BF16, INT8
- DPX Instructions: Dynamic programming acceleration
- Confidential Computing: Hardware-based security for multi-tenant workloads
Performance vs A100:
- LLM Training: 3-4x faster (FP8 Transformer Engine)
- LLM Inference: 6x faster with larger batch sizes
- HPC (FP64): 3x faster
Deployment Status: Primary deployment GPU for 2023-2025; strong supply
Major Deployments:
- xAI Colossus: 150,000 H100 GPUs (largest single deployment)
- CoreWeave: 16,384 H100 SXM5 under single InfiniBand fabric
- Lambda Labs: Thousands of H100s in 1-Click Clusters
- Microsoft Azure ND H100 v5: 8x H100 per VM with 3.2 Tbps InfiniBand
- Applied Digital Ellendale: 50,000 H100 SXM capacity
- AWS P5 instances: 8x H100 per instance
NVIDIA H200: Enhanced Hopper (2024-2025)
Architecture: Hopper (enhanced)
Target: LLM inference and training with larger context windows
Specifications:
- Memory: 141GB HBM3e (76% increase vs H100)
- Memory Bandwidth: 4.8 TB/s (43% increase vs H100)
- TDP: 700W (SXM)
- Compute: Same as H100 (1,000 TFLOPS FP16, 2,000 TFLOPS FP8)
- Interconnect: NVLink 4.0 (same as H100)
Key Advantages Over H100:
- 76% more memory: Enables larger models or bigger batch sizes
- 43% more bandwidth: Reduces memory-bound workload bottlenecks
- 30-35% better LLM inference: Particularly for long-context models (LLAMA 3.1 405B)
Performance Comparison (LLAMA 3.1 405B inference):
- H200: 35% higher throughput vs H100
- Critical for models approaching context limits on H100
Deployment Status: Ramping production (Q4 2024 - Q1 2025)
Major Deployments:
- CoreWeave: 42,000 H200 GPUs, first cloud provider to deploy (August 2024)
- xAI Colossus: 50,000 H200 GPUs
- Microsoft Azure ND H200 v5: GA October 2024
- Lambda Labs: H200 available in 1-Click Clusters (16-512 GPUs)
- AWS P5e instances: H200 in EC2 UltraClusters (up to 20,000 GPUs)
NVIDIA B200: Blackwell Architecture (2025)
Architecture: Blackwell
Target: Next-generation LLM training and inference
Specifications:
- Memory: 180-192GB HBM3e
- Memory Bandwidth: 7.7 TB/s (60% increase vs H200)
- TDP: 1,000W (rated), ~600W typical sustained
- FP64: 37 TFLOPS
- FP4: 9,000 TFLOPS (dense, new ultra-low precision)
- FP8: 4,500 TFLOPS
- Interconnect: Fifth-generation NVLink (1.8 TB/s per GPU)
Key Features:
- Second-generation Transformer Engine: FP4 precision support
- 208 billion transistors: roughly 2.6x the H100's 80 billion
- Dual-die design: Two GPU dies connected by high-speed interconnect
- Enhanced Tensor Cores: FP4, FP6, FP8, FP16, BF16, INT8 support
Performance vs H100:
- LLM Training: 2.5x faster (FP8)
- LLM Inference: 5x faster (FP4 with acceptable accuracy)
- Memory capacity: 2.4x larger (192GB vs 80GB)
Deployment Status: Early production (Q1-Q2 2025), ramping through 2025
Major Deployments: Lambda Labs B200 clusters announced; CoreWeave reservations
NVIDIA B300: Blackwell Ultra (2025)
Architecture: Blackwell Ultra
Target: Inference-optimized, highest-density training
Specifications:
- Memory: 288GB HBM3e (50% more than B200)
- Memory Bandwidth: Enhanced over B200
- TDP: 1,400W (highest TDP GPU ever produced)
- FP4: 14,000 TFLOPS (dense, 55.6% faster than B200)
- FP64: 1.25 TFLOPS (optimized for inference, not HPC)
- HBM Stacks: 12-high (vs 8-high for B200)
Key Features:
- Inference-Optimized: Lower FP64 performance, higher FP4/FP8
- Massive Memory: 288GB enables largest models or highest batch sizes
- High-Density Training: LLMs, diffusion models
Performance vs B200:
- FP4 Inference: 55.6% faster
- Memory: 50% more capacity
Deployment Status: Initial production (Q2 2025)
Major Deployments:
- CoreWeave: First hyperscaler to deploy GB300 NVL72 platform (announced)
NVIDIA GB200 NVL72: Rack-Scale System (2025)
Architecture: Grace Blackwell Superchip (rack-scale)
Target: Largest LLM training clusters
Configuration:
- GPUs: 72x Blackwell B200 GPUs per rack
- CPUs: 36x NVIDIA Grace ARM CPUs per rack
- GPU Memory: 13.5TB total per rack
- Performance: 1.44 exaflops per rack
- Power: 120 kW per rack
- Form Factor: Liquid-cooled rack-scale system (18 nodes)
Architecture:
- Each node: 2x Grace CPUs + 4x B200 GPUs
- 18 nodes per rack (must deploy in multiples of 18)
- NVLink Switch interconnects all 72 GPUs
- Fifth-generation NVLink provides 130 TB/s aggregate bandwidth
Deployment Constraint: Must deploy full 18-node racks (not server-by-server)
Deployment Status: Initial deployments (Q1-Q2 2025)
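As a quick sanity check on the rack-level figures above, the arithmetic below assumes 192GB per B200 (the top of the range quoted in the B200 section); the small gap against the quoted 13.5TB per rack likely reflects usable-capacity accounting or rounding.

```python
# Rough arithmetic check of the GB200 NVL72 rack figures quoted above.
NODES_PER_RACK = 18
GPUS_PER_NODE = 4      # each node: 2x Grace CPUs + 4x B200 GPUs
CPUS_PER_NODE = 2
GPU_MEM_GB = 192       # assumed per-B200 capacity (top of the quoted range)
RACK_POWER_KW = 120

gpus = NODES_PER_RACK * GPUS_PER_NODE              # 72 GPUs
cpus = NODES_PER_RACK * CPUS_PER_NODE              # 36 Grace CPUs
gpu_mem_tb = gpus * GPU_MEM_GB / 1000              # ~13.8 TB (quoted: 13.5 TB)
kw_per_gpu = RACK_POWER_KW / gpus                  # ~1.7 kW/GPU including CPUs,
                                                   # NVLink switches, and fans
print(gpus, cpus, round(gpu_mem_tb, 1), round(kw_per_gpu, 2))
```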
NVIDIA GB300 NVL72: Ultimate Rack-Scale (2025)
Architecture: Grace Blackwell Ultra Superchip (rack-scale)
Target: Highest-performance AI training
Configuration:
- GPUs: 72x Blackwell Ultra B300 GPUs per rack
- CPUs: 36x NVIDIA Grace ARM CPUs per rack
- DPUs: 18x NVIDIA BlueField-3 DPUs per rack
- GPU Memory: 21TB total per rack (1.5x vs GB200)
- Performance: 1.1 exaflops FP4 per rack; NVIDIA claims 10x user responsiveness, 5x throughput per watt, and 50x reasoning-inference output vs Hopper
- Power: ~140 kW per rack
- Form Factor: Liquid-cooled rack-scale system
Key Advantages:
- 50% more memory per rack vs GB200 (21TB vs 13.5TB)
- Inference-optimized: 10x responsiveness, 50x reasoning vs Hopper
- Integrated networking: BlueField-3 DPUs provide network offload
Deployment Status: Initial production (Q2-Q3 2025)
Major Deployments: CoreWeave (first deployment), Dell, Switch, Vertiv partnerships
Cooling Requirement: Vertiv CDU 121 (121 kW capacity) optimized for GB300
AMD Competition: MI300X
Architecture: CDNA 3
Launch: December 2023
Target: NVIDIA H100 alternative for LLM workloads
Specifications:
- Memory: 192GB HBM3 (2.4x more than H100)
- Memory Bandwidth: 5.3 TB/s (77% more than H100)
- TDP: 750W
- FP16: 1,300 TFLOPS (Tensor operations)
- FP8: 2,600 TFLOPS
- INT8: 5,200 TOPS
- Interconnect: AMD Infinity Fabric (proprietary)
Key Advantages:
- 192GB memory: Largest GPU memory available (2.4x H100, 1.36x H200)
- Memory bandwidth: 5.3 TB/s (exceeds H100, approaches H200)
- Cost: Typically 20-30% less expensive than H100
- Open software: ROCm platform (CUDA alternative)
Challenges:
- Software ecosystem: ROCm maturity lags CUDA
- Framework support: PyTorch/TensorFlow optimization ongoing
- Developer familiarity: Smaller community vs CUDA
Performance (LLM inference, vendor claims):
- 1.2-1.6x faster than H100 for large models (benefits from 192GB memory)
- Competitive with H100 for training (within 10-20%)
Deployment Status: Early production deployments (Q4 2024 - Q1 2025)
Major Deployments:
- Crusoe Energy: $400M order for thousands of MI300X accelerators
- Oracle Cloud Infrastructure: MI300X instances
- Microsoft Azure: ND MI300X v5 series (announced)
Market Impact: AMD MI300X represents first credible NVIDIA alternative for AI training, with 192GB memory advantage compelling for largest models.
Google TPU: Custom AI Accelerator
Google’s Tensor Processing Units (TPUs) represent an alternative architecture optimized for TensorFlow workloads:
TPU v5e (Cloud TPU)
- Cost-Efficient: $1.20/chip-hour
- Performance: 2.5x throughput/dollar vs TPU v4
- Configuration: Pods up to 256 chips, 400 Tbps aggregate bandwidth, 100 petaOps INT8
TPU v5p (High-Performance)
- Most Powerful TPU: 8,960 chips per pod
- Topology: 3D torus with 4,800 Gbps/chip interconnect
- Performance: 2.8x faster LLM training vs TPU v4
TPU v6 Trillium (Latest Generation)
- Performance: 4.7x peak compute per chip vs v5e
- Memory: 2x HBM memory, 2x internal bandwidth, 2x chip-to-chip interconnect
- Efficiency: 67% more energy efficient than v5e
- Scale: 91 exaflops in single cluster
TPU v7 Ironwood (Inference-Optimized)
- Scale: 9,216 liquid-cooled chips
- Power: ~10 MW for full system
- Target: “Age of Inference” - first Google TPU optimized for serving models
- Configurations: 256-chip and 9,216-chip pods
Google Cloud Strategy: TPUs for Google workloads and TensorFlow users; NVIDIA A3 VMs (H100) for PyTorch and broader ecosystem.
AWS Custom Silicon
Amazon Web Services developed custom AI accelerators to reduce NVIDIA dependency:
AWS Trainium (Training)
- Target: 100B+ parameter model training
- Configuration: Trn1.32xlarge with 16 Trainium accelerators, 512GB memory
- Cost: 50% lower cost-to-train vs comparable EC2 GPU instances
- Frameworks: PyTorch, TensorFlow via AWS Neuron SDK
AWS Inferentia2 (Inference)
- Target: High-throughput LLM inference
- Configuration: Inf2.48xlarge with 12 Inferentia2 accelerators, 384GB memory
- Performance: 4x throughput, 10x lower latency vs Inferentia v1
- Capability: Deploy 175B parameter model in single instance
AWS Strategy: Custom silicon for cost-sensitive workloads; NVIDIA GPUs (P5/P5e with H100/H200) for performance-critical and framework-flexible deployments.
Major GPU Deployments
xAI Colossus: World’s Largest AI Supercomputer
Overview: Elon Musk’s xAI company deployed the world’s largest AI training cluster in Memphis, Tennessee.
Location: Memphis, Tennessee (converted Electrolux factory, 785,000 sq ft)
GPU Inventory:
Phase | Date | GPU Type | GPU Count | Total |
---|---|---|---|---|
Phase 1 | Sep 2024 | H100 | 100,000 | 100,000 |
Expansion | Dec 2024 | H100 | 50,000 | 150,000 |
Expansion | Mar 2025 | H200 | 50,000 | 200,000 |
Expansion | Jun 2025 | GB200 | 30,000 | 230,000 |
Current Total | Jun 2025 | Mixed | — | 230,000 |
Future Plans:
- Second Memphis facility: 110,000 GB200 GPUs
- Total target: 1,000,000 GPUs across multiple facilities
Infrastructure:
- Power: 300 MW (150 MW utility + 150 MW Tesla Megapack battery backup)
- Networking: NVIDIA Spectrum-X Ethernet with RDMA (not InfiniBand), single fabric for 100K H100s
- Switches: NVIDIA Spectrum SN5600 (800 Gb/s), BlueField-3 SuperNICs
- Cooling: Hybrid air and liquid cooling (Supermicro infrastructure)
- Construction Time: 122 days from groundbreaking to Phase 1 operational (infrastructure speed record)
Technology Partners:
- NVIDIA (GPUs, networking)
- Supermicro (servers, liquid cooling)
- Tesla (Megapack batteries)
Deployment Significance:
- Largest single AI cluster deployed (100K GPUs on single fabric)
- Fastest infrastructure deployment in datacenter history (122 days)
- Demonstrated viability of Ethernet RDMA for massive GPU clusters (alternative to InfiniBand)
- Proved existing industrial buildings can host AI supercomputers (not requiring purpose-built facilities)
Workloads: Training Grok AI models (xAI’s ChatGPT competitor)
CoreWeave: 250K+ GPU Cloud Fleet
Overview: Specialized GPU cloud provider operating largest independent GPU fleet.
Total GPU Inventory: 250,000 GPUs (end of 2024)
GPU Breakdown:
GPU Type | Count | Configuration | Notes |
---|---|---|---|
H100 SXM5 | 16,384+ | Single InfiniBand fat-tree fabric | Trained GPT-3 in under 11 minutes with 3,500 H100s |
H200 | 42,000 | Clusters up to 42,000 GPUs | First cloud provider to deploy H200 (August 2024) |
GB200 NVL72 | TBD | 72 GPUs per rack, beginning deployment | 1.44 exaflops per rack |
GB300 NVL72 | TBD | First hyperscaler to deploy | 72 Blackwell Ultra GPUs, ~140 kW/rack |
Infrastructure:
- Facilities: 33 operational data centers (US + Europe)
- Active Power: 420 MW
- Contracted Power: 2,200 MW (pipeline)
- Cooling: 100% liquid cooling for all new facilities from 2025 onwards (~130 kW racks)
- Networking: NVIDIA Quantum-2 InfiniBand 400Gb/s (3.2 Tbps per VM), BlueField-3 DPUs
Technology Strategy:
- Priority access to latest NVIDIA GPUs (Elite Cloud Service Provider)
- Purpose-built for AI: Kubernetes-native bare-metal architecture
- 35x faster instance spin-up vs traditional clouds
- 80% cost advantage vs hyperscalers for GPU workloads
Major Customers:
- Microsoft (62% of 2024 revenue)
- OpenAI ($22.4B total contracts)
- Meta, Cohere, Stability AI, IBM
Geographic Footprint:
- US: Pennsylvania, New Jersey, Indiana, Illinois, Georgia, Ohio, Nevada, Washington, Oregon, Texas, Virginia, New York
- Europe: UK (London, Crawley), Norway, Sweden, Spain
Growth Trajectory:
- 2024: 28 data centers globally
- 2025: 38 data centers target
- Revenue: $1.92B (2024), 737% YoY growth
Competitive Position: Largest specialized AI cloud, positioned between hyperscalers and smaller GPU providers.
Meta Platforms: The AGI Research Buildout
Overview: Meta is building massive AI infrastructure for Llama model training and AGI research.
GPU Deployments:
AI Research SuperCluster (RSC) - 2022
- GPUs: 16,000 NVIDIA A100
- Systems: 760 DGX A100 systems
- Performance: 1,895 petaflops (TF32)
- Networking: NVIDIA Quantum 200Gb/s InfiniBand
- Storage: 185PB all-flash (Pure Storage), 46PB cache, 16TB/s training data throughput
- Status: Operational, one of world’s fastest supercomputers at launch
24K GPU Clusters (Two Variants) - 2024
- Total GPUs: 49,152 H100 (two clusters of 24,576 each)
- Platform: Grand Teton (OCP open hardware), YV3 Sierra Point servers
- Networking: Two different architectures tested
- Cluster 1: RoCE (RDMA over Converged Ethernet) at 400Gbps
- Cluster 2: NVIDIA Quantum2 InfiniBand
- Storage: Tectonic distributed storage with Linux Filesystem in Userspace API
- Purpose: Compare RoCE vs InfiniBand at scale; train Llama 3
- Announcement: March 2024
Prometheus AI Cluster - 2026 (Planned)
- Total GPUs: ~500,000 GPUs (alternative estimate: 1.3M H100-equivalent)
- GPU Mix: NVIDIA Blackwell, AMD MI300X, Meta MTIA custom chips
- Location: New Albany, Ohio
- Power: 1,020 MW (1+ GW)
- Performance: 2+ exaflops mixed-precision, 3.2 trillion TFLOPS
- Rack Design: Catalina high-power AI racks (~140 kW per rack, air-assisted liquid cooling)
- Power Generation: Two 200MW on-site natural gas plants
- Networking: Arista 7808 switches with Broadcom Jericho and Ramon ASICs
- Deployment: Multiple datacenter buildings + colocation + temporary weather-proof tents
- Expected Launch: 2026
- Purpose: Llama 4 training and AGI research
Future Plans:
- Hyperion: 5+ GW multi-year development following Prometheus
Technology Approach:
- Open Compute Project (OCP): Open-source hardware designs
- Multi-vendor GPU strategy: NVIDIA, AMD, custom MTIA chips
- Network architecture experiments: Testing RoCE and InfiniBand at scale
- Cooling innovation: Catalina racks with air-assisted liquid cooling
Strategic Significance:
- Largest disclosed corporate AI infrastructure buildout (500K+ GPUs)
- Multi-vendor approach reduces NVIDIA dependency
- OCP contributions democratize AI infrastructure design
- On-site power generation addresses grid capacity constraints
Applied Digital: Ellendale HPC Campus
Overview: Purpose-built HPC datacenter in North Dakota optimized for AI training.
Location: Ellendale, North Dakota (Polaris Forge 1)
GPU Capacity: Nearly 50,000 H100 SXM-class GPUs in single parallel compute cluster
Infrastructure:
- Facility Size: 342,000 sq ft, multi-story design
- Power: 180 MW initial, 400 MW campus potential, 1+ GW under study
- GPU Type: H100 SXM (current), expandable to future generations
- Cooling: Closed-loop, waterless, direct-to-chip liquid cooling
- Climate Advantage: North Dakota cold climate reduces cooling power
- Status: Energized December 2024
Technology Partners:
- Supermicro (GPU servers)
- NVIDIA (Preferred Cloud Partner)
Business Model:
- 15-year lease to CoreWeave: $7 billion total revenue
- Additional capacity: 250 MW CoreWeave lease commitment
Innovation:
- Zero water consumption: Dry coolers eliminate evaporative cooling
- Multi-story design: High-density vertical infrastructure
- Single cluster: All 50K GPUs interconnected for parallel training
Competitive Advantage: Waterless cooling addresses environmental concerns; North Dakota power costs and climate enable economics.
Lambda Labs: Gigawatt-Scale GPU Cloud
Overview: GPU cloud platform targeting AI training and inference market.
GPU Offerings:
- H100 SXM: 8x H100 instances at $2.59/hr/GPU
- H100 Clusters: 16-512 interconnected H100 GPUs (expandable to 64-2,040+ GPUs)
- H200: Available in 1-Click Clusters (16-512 GPUs minimum)
- B200: Available in 1-Click Clusters
- A100: 80GB at $1.29/hr
Networking: NVIDIA Quantum-2 InfiniBand 400Gb/s for large clusters (non-blocking)
Infrastructure:
- Facilities: Multiple US locations (Texas, California)
- Power: Gigawatt-scale capacity
- Cooling: Liquid-cooled infrastructure for highest-density GPUs (~130 kW racks)
Major Projects:
Dallas-Fort Worth DFW-04
- Location: Plano, Texas
- Facility Size: 425,500 sq ft (39,500 sqm)
- Partner: Aligned Data Centers
- Technology: Liquid-cooled infrastructure for highest-density GPUs
- Construction: October 2025 - October 2026
Mountain View MV1
- Location: Mountain View, California
- Partner: ECL
- Power Source: Hydrogen fuel cells (off-grid)
- Status: Operational (September 2024)
TerraSite-TX1
- Location: Houston, Texas
- Power: 50 MW initial, scalable to 1,000 MW (1 GW)
- Campus: 600 acres
- Partner: ECL
- Power Source: Hydrogen
- Status: First 50MW coming online summer 2025
- Significance: Gigawatt-scale campus powered entirely by hydrogen
Competitive Position: Third-largest independent GPU cloud (behind CoreWeave and Crusoe), focused on developer-friendly 1-Click Clusters.
Crusoe Energy: 100K GPU Building Capacity
Overview: Energy-optimized AI infrastructure using stranded and renewable energy.
GPU Strategy:
- Per-Building Capacity: Up to 100,000 GPUs on single integrated network fabric
- AMD Order: $400M investment in thousands of AMD Instinct MI300X accelerators
- Deployment: Across US facilities with sustainable energy sources
Major Project: Stargate Abilene (Project Polaris)
- Location: Abilene, Texas (Lancium Clean Campus)
- Power: 1.2 GW (initial), scalable to 1.8 GW
- GPU Capacity: 64,000+ NVIDIA GB200 Blackwell GPUs by end 2026
- Buildings: 8-building campus, 4 million sq ft
- Power Generation: 360 MW on-site natural gas turbines + 1.2 GW ERCOT grid connection (60%+ renewable)
- Cooling: Closed-loop liquid cooling, zero-water evaporation
- Customer: Oracle (15-year lease), OpenAI (primary tenant)
- Investment: $40B (Oracle GPU procurement + infrastructure)
Technology Approach:
- Behind-the-meter power generation (natural gas + renewables)
- Advanced emissions controls (SCR technology, 90% lower emissions)
- Battery energy storage systems (BESS) to capture excess renewable energy
- Modular datacenter design (rapid deployment)
Competitive Advantage: Energy-first approach solves power availability constraint; operational Stargate site validates model.
Microsoft Azure: Hyperscale AI Cloud
GPU Offerings:
ND H100 v5 Series
- GPUs: 8x H100 80GB per VM
- GPU Memory: 640GB total
- Networking: 3.2 Tbps Quantum-2 InfiniBand per VM (dedicated 400Gb/s per GPU)
- CPU: 96 physical cores (4th Gen Intel Xeon Scalable)
- NVLink: 3.6 TB/s bisectional between 8 local GPUs
- Target: High-end deep learning training and tightly coupled Gen AI
- Status: Generally Available (2023)
ND H200 v5 Series
- GPUs: 8x H200 per VM
- GPU Memory: 1,128GB total (141GB per H200)
- Performance: 35% throughput increase over H100 for LLAMA 3.1 405B inference
- CPU: AMD EPYC Genoa (variants)
- Networking: Same as ND H100 v5 (3.2 Tbps InfiniBand)
- Status: Generally Available (October 2024)
Future: NVIDIA Blackwell Ultra
- GPUs: NVIDIA Blackwell Ultra-based VMs planned for later 2025
Scale: Can scale to thousands of GPUs with Quantum-2 InfiniBand fabric
Operating Systems: Ubuntu 20.04/22.04, RHEL 7.9/8.7/9.3, AlmaLinux 8.8/9.2, SLES 15
Competitive Position: Hyperscaler with both NVIDIA GPUs and custom infrastructure; partnership with OpenAI drives GPU procurement.
Amazon Web Services: EC2 GPU Instances and UltraClusters
GPU Offerings:
EC2 P5 Instances (H100)
- GPUs: 8x H100 80GB per instance
- GPU Memory: 640GB total
- Networking: 3,200 Gbps Elastic Fabric Adapter (EFA) Gen2
- NVSwitch: 900 GB/s per GPU (3.6 TB/s bisectional per instance)
- CPU: 3rd Gen AMD EPYC
- System Memory: 2TB
- Local Storage: 30TB NVMe
- Performance: 6x faster time to solution, 40% lower cost vs previous gen (P4d)
- Status: Generally Available (July 2023)
EC2 P5e Instances (H200)
- GPUs: NVIDIA H200
- Deployment: EC2 UltraClusters up to 20,000 H100/H200 GPUs
- Status: Generally Available (September 2024)
EC2 UltraClusters
- Scale: Up to 20,000 H100/H200 GPUs per cluster
- Networking: Petabit-scale non-blocking network
- Purpose: Largest distributed training workloads
Custom Silicon (alternative to NVIDIA):
- Trainium (training): Trn1.32xlarge with 16 Trainium accelerators, 50% cost savings
- Inferentia2 (inference): Inf2.48xlarge with 12 Inferentia2 accelerators, 4x throughput vs v1
Competitive Position: Hyperscaler with both NVIDIA GPUs and custom alternatives (Trainium/Inferentia) for cost-sensitive workloads.
Google Cloud: A3 VMs and TPU Integration
GPU Offerings:
A3 VMs (H100)
- GPUs: NVIDIA H100
- Deployment: Delivered as “GPU Supercomputer” with optimized networking
- Status: Generally Available
Future: NVIDIA B200
- GPUs: Google announced hosting NVIDIA B200 GPUs and specialized DGX boxes with Blackwell
TPU Strategy (Alternative to GPUs):
- TPU v5e, v5p, v6 Trillium, v7 Ironwood (see Custom Silicon section)
- TPUs for Google workloads and TensorFlow optimization
- GPUs for PyTorch and broader ecosystem compatibility
Competitive Position: Unique dual-strategy with custom TPUs (optimized for TensorFlow/JAX) and NVIDIA GPUs (PyTorch/broad compatibility).
Oracle Cloud Infrastructure: Bare Metal GPU at Scale
GPU Offerings:
BM.GPU.H100.8 (Bare Metal)
- GPUs: 8x H100 80GB per bare metal instance
- NVLink: 3.2 TB/s bisectional bandwidth via NVSwitch and NVLink 4.0
- CPU: 4th Gen Intel Xeon (112 cores)
- System Memory: 2TB
- Storage: 16x 3.84TB NVMe drives
- Status: Generally Available (September 2023)
OCI Supercluster
- Scale: Up to 16,384 H100 GPUs in single cluster
- Networking: Ultra-low-latency, scale from single node to tens of thousands of GPUs
- Architecture: Bare metal (not virtualized) for maximum performance
Future GPUs
- H200: Announced for upcoming availability
- Blackwell: NVIDIA Blackwell GPUs announced for future deployment
Performance Claim: 30x better AI inference, 4x better training vs A100
Competitive Position: Bare metal GPU instances eliminate hypervisor overhead; Supercluster architecture enables massive scale. Strategic partnership with NVIDIA.
Power Requirements and Infrastructure
Power Consumption by GPU Generation
GPU Model | TDP (Watts) | 8-GPU Server (Watts) | Rack (5-6 servers, kW) | Cooling Requirement |
---|---|---|---|---|
NVIDIA A100 | 400W | 3,200W | 15-20 kW | Air cooling possible |
NVIDIA H100 | 700W | 5,600W | 30-40 kW | Liquid cooling preferred |
NVIDIA H200 | 700W | 5,600W | 30-40 kW | Liquid cooling required |
NVIDIA B200 | 1,000W | 8,000W | 50-60 kW | Liquid cooling required |
NVIDIA B300 | 1,400W | 11,200W | 70-80 kW | Liquid cooling mandatory |
GB200 NVL72 | — | — | 120 kW | Rack-scale liquid cooling |
GB300 NVL72 | — | — | 140 kW | Rack-scale liquid cooling |
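The server and rack figures above can be approximated from GPU TDP alone. The sketch below is a back-of-envelope model; the servers-per-rack count and the non-GPU overhead fraction (CPUs, NICs, fans, power conversion) are assumptions chosen for illustration, not vendor specifications.

```python
# Back-of-envelope rack power from GPU TDP, roughly in line with the table above.
def rack_power_kw(gpu_tdp_w: int,
                  gpus_per_server: int = 8,
                  servers_per_rack: int = 5,
                  non_gpu_overhead: float = 0.25) -> float:
    # Server draw = GPU TDPs plus an assumed 25% for everything else in the box.
    server_w = gpus_per_server * gpu_tdp_w * (1 + non_gpu_overhead)
    return servers_per_rack * server_w / 1000

for name, tdp in [("A100", 400), ("H100", 700), ("H200", 700),
                  ("B200", 1000), ("B300", 1400)]:
    print(f"{name}: ~{rack_power_kw(tdp):.0f} kW per rack")
```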
Rack Density Evolution
Traditional Datacenters (pre-AI):
- Power Density: 5-15 kW per rack
- Cooling: Air cooling with hot/cold aisle containment
- Footprint: Standard 42U racks with 15-20% utilization
Early AI Infrastructure (A100 era, 2020-2023):
- Power Density: 15-30 kW per rack
- Cooling: Optimized air cooling, rear-door heat exchangers
- Challenge: Pushing limits of air cooling
Modern AI Infrastructure (H100/H200 era, 2023-2025):
- Power Density: 100-140 kW per rack (standard)
- Cooling: Direct liquid cooling mandatory (cold plates, CDUs)
- Vendors: Vertiv CoolChip, Supermicro DLC-2, HPE Cray, Lenovo Neptune, Dell
Next-Generation AI (Blackwell B300/GB300, 2025-2026):
- Power Density: 140-200 kW per rack
- Cooling: Rack-scale liquid cooling (GB300 NVL72 requires Vertiv CDU 121 at 121 kW)
- Challenge: Approaching limits of single-phase direct liquid cooling
Future Vision (2027+):
- Power Density: 200-300+ kW per rack
- Cooling: Immersion cooling (single-phase or two-phase), on-chip microfluidics
- Implications: Complete redesign of datacenter power delivery and thermal management
Cooling Infrastructure Requirements
Direct-to-Chip Liquid Cooling (Current Standard)
Architecture:
- Cold Plates: Mounted directly on GPUs, CPUs, memory, VRMs with microchannel heat transfer
- Coolant Distribution Units (CDUs): Separate facility chilled water (primary) from server coolant (secondary)
- Manifolds: Distribute coolant to multiple servers in rack or row
- Heat Rejection: Transfer heat to facility chilled water, cooling towers, or dry coolers
Heat Capture Efficiency:
- 70-80% typical (Dell, HPE)
- 98% advanced (Supermicro DLC-2)
- Remaining heat cooled by low-volume air
CDU Capacity Requirements:
- 100 kW CDU: 1-2 AI racks (Vertiv CoolChip CDU 100)
- 121 kW CDU: 1x GB300 NVL72 rack (Vertiv CoolChip CDU 121)
- 350 kW CDU: Retrofit applications, liquid-to-air (Vertiv CoolChip CDU 350)
- 600 kW CDU: Row-level cooling (Vertiv CoolChip CDU 600)
- 2.3 MW CDU: Building-level cooling (Vertiv CoolChip CDU 2300)
Inlet Water Temperature: Up to 45°C for advanced systems (Supermicro), enables free cooling and district heating integration
Major Deployments:
- CoreWeave: 100% liquid-cooled infrastructure for 130-140 kW racks
- Meta Prometheus: Catalina high-power racks (~140 kW) with air-assisted liquid cooling
- xAI Colossus: 100,000 H100 GPUs with Supermicro liquid cooling
- Applied Digital Ellendale: Waterless closed-loop liquid cooling for 50,000 H100 GPUs
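To size the liquid loop behind these CDU figures, the standard heat-balance relation Q = ṁ · cp · ΔT gives the required coolant flow. The sketch below assumes a water-like coolant, a 10°C loop temperature rise, and ~80% liquid heat capture (the remainder handled by air); all three are illustrative assumptions.

```python
# Rough coolant flow required to remove a rack's heat via direct liquid cooling.
def coolant_flow_lpm(rack_kw: float, capture: float = 0.8,
                     delta_t_c: float = 10.0) -> float:
    cp = 4186.0      # J/(kg*C), water
    rho = 1000.0     # kg/m^3, water
    q_w = rack_kw * 1000 * capture            # heat into the liquid loop, watts
    kg_per_s = q_w / (cp * delta_t_c)         # mass flow from Q = m * cp * dT
    return kg_per_s / rho * 1000 * 60         # convert to litres per minute

for rack_kw in (40, 120, 140):   # H100-era rack, GB200 NVL72, GB300 NVL72
    print(f"{rack_kw} kW rack: ~{coolant_flow_lpm(rack_kw):.0f} L/min")
```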
Infrastructure Cost per GPU
Capital Expenditure (rough estimates):
Component | A100 Era | H100/H200 Era | Blackwell Era |
---|---|---|---|
GPU Hardware | $10K-15K | $25K-40K | $35K-70K |
Server (8 GPUs) | $120K-150K | $250K-400K | $350K-600K |
Networking (per GPU) | $2K-5K | $5K-10K | $10K-15K |
Cooling Infrastructure | $500-1K | $2K-5K | $5K-10K |
Power Infrastructure | $1K-2K | $3K-5K | $5K-8K |
Facility Overhead | $500-1K | $1K-2K | $2K-4K |
Total per GPU | $14K-24K | $36K-62K | $57K-107K |
Operational Expenditure (annual per GPU):
Component | A100 Era | H100/H200 Era | Blackwell Era |
---|---|---|---|
Power (@ $0.10/kWh, 80% utilization) | $280 | $490 | $980 |
Cooling (additional power) | $70 | $120 | $240 |
Maintenance | $500-1K | $1K-2K | $2K-3K |
Facility Overhead | $200-500 | $500-1K | $1K-2K |
Total Annual per GPU | $1K-2K | $2K-4K | $4K-7K |
Note: Prices highly variable based on supplier, allocation, volume, location, and market conditions. H100 secondary market reached 2-3x list price during peak shortage (2023).
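The annual power line in the opex table follows directly from TDP, utilization, and electricity price; the sketch below reproduces it under the table's stated assumptions ($0.10/kWh, 80% utilization).

```python
# Per-GPU annual electricity cost: energy = TDP * utilization * hours per year.
def annual_power_cost(tdp_w: int, utilization: float = 0.8,
                      price_per_kwh: float = 0.10) -> float:
    kwh_per_year = tdp_w / 1000 * utilization * 8760   # 8,760 hours in a year
    return kwh_per_year * price_per_kwh

for name, tdp in [("A100", 400), ("H100/H200", 700), ("B300", 1400)]:
    print(f"{name}: ${annual_power_cost(tdp):,.0f}/year")   # ~$280 / $491 / $981
```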
Networking Requirements for GPU Clusters
Network Architecture for AI Training
Large-scale GPU training requires high-bandwidth, low-latency networking between GPUs:
Intra-Server GPU Communication:
- NVLink: Direct GPU-to-GPU communication within server
- NVSwitch: All-to-all connectivity for 8-GPU servers
- Bandwidth: 900 GB/s per GPU (H100/H200), 3.6 TB/s bisectional for 8 GPUs
Inter-Server GPU Communication:
- InfiniBand: Traditional fabric for HPC and AI
- Ethernet with RDMA: Emerging alternative (xAI Colossus proves viability)
NVIDIA Quantum-2 InfiniBand
Specification:
- Generation: 7th generation InfiniBand (NDR)
- Speed: 400 Gb/s per port
- Switch Configuration: 64x 400Gb/s ports or 128x 200Gb/s ports (32 OSFP connectors)
- Form Factor: 1U switch (air-cooled and liquid-cooled variants)
- Throughput: 51.2 Tb/s bidirectional aggregated
- Packet Rate: 66.5 billion packets per second
- Scalability: Up to 2,048 ports per configuration
- Adapters: ConnectX-7 InfiniBand (PCIe Gen4/Gen5, single or dual 400Gb/s ports)
Features:
- Software-defined networking (SDN)
- In-Network Computing acceleration
- Performance isolation (multi-tenancy support)
- Advanced acceleration engines for collective operations
- RDMA (Remote Direct Memory Access) for zero-copy data transfer
Deployments:
- CoreWeave: 3.2 Tbps per VM, 16,384 H100 SXM5 on single fabric
- Lambda Labs: Clusters up to 2,040+ GPUs on single Quantum-2 fabric
- Meta 24K Clusters: NVIDIA Quantum2 InfiniBand variant (one of two architectures tested)
- Microsoft Azure ND H100 v5: Dedicated 400Gb/s per GPU (3.2 Tbps per VM)
Backward Compatibility: 400Gb/s ports can connect to existing 200Gb/s or 100Gb/s infrastructure
NVIDIA Spectrum-X Ethernet
Specification:
- Technology: Ethernet with RDMA
- Switch Model: NVIDIA Spectrum SN5600
- Port Speed: Up to 800 Gb/s
- ASIC: Spectrum-4
- NIC: NVIDIA BlueField-3 SuperNICs
xAI Colossus Deployment:
- Scale: 100,000 H100 GPUs on single RDMA fabric
- Architecture: Spectrum SN5600 switches (800Gb/s), BlueField-3 SuperNICs
- Significance: Largest AI supercomputer using Ethernet (not InfiniBand), proving Ethernet RDMA viability for massive GPU clusters
Advantages vs InfiniBand:
- Lower cost per port (commodity Ethernet economics)
- Broader vendor ecosystem (not NVIDIA-exclusive)
- Familiar operations for datacenter teams
Challenges vs InfiniBand:
- Higher latency (microseconds) vs InfiniBand (sub-microsecond)
- Less mature for AI/HPC workloads (but xAI validates architecture)
RoCE (RDMA over Converged Ethernet)
Specification:
- Technology: RDMA over Converged Ethernet
- Speed: 400 Gb/s endpoints
Meta 24K GPU Cluster Deployment:
- Scale: 24,576 H100 GPUs (one of two clusters)
- Architecture: 400Gbps RoCE endpoints
- Purpose: Compare RoCE vs InfiniBand performance at scale
Status: Meta testing both RoCE and InfiniBand architectures to determine optimal network for future deployments.
Network Topology
Fat-Tree (Most Common):
- Non-blocking architecture: Any server can communicate with any other at full bandwidth
- Requires expensive spine switches and massive cabling
- Used by CoreWeave, Lambda Labs, Microsoft Azure
Dragonfly+ (Alternative):
- Lower cost than fat-tree for large scales
- Some blocking, but optimized for AI traffic patterns
- Explored by hyperscalers for 50K+ GPU clusters
3D Torus (Google TPU):
- Custom topology for TPU pods
- Optimized for specific workload patterns
- Not applicable to NVIDIA/AMD GPUs
Bandwidth Requirements
Per GPU (ideal):
- Local (intra-server): 900 GB/s per GPU (NVLink)
- Network (inter-server): 400 Gb/s per GPU (InfiniBand/Ethernet RDMA)
- Total: 3.2 Tbps per 8-GPU server
Collective Operations:
- All-reduce (parameter synchronization): Bandwidth-limited
- All-to-all (expert routing): Latency-sensitive
- Broadcast, reduce-scatter: Critical for distributed training
Scaling Challenge: 100,000 GPUs require 4-5 stages of switching (spine, super-spine, etc.), each adding latency and cost.
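To see why all-reduce is bandwidth-limited at these scales, the sketch below estimates the time for one full-gradient all-reduce using the standard ring/tree bound, where each GPU transfers roughly 2(N-1)/N times the gradient size. The parameter count, BF16 gradient size, per-GPU network rate, and efficiency factor are illustrative assumptions; in practice, frameworks shard this traffic and overlap it with compute.

```python
# Estimated wall-clock time for one bandwidth-optimal all-reduce over the network.
def allreduce_seconds(params: float, n_gpus: int,
                      bytes_per_param: int = 2,      # assumed BF16 gradients
                      net_gbps: float = 400.0,       # per-GPU network rate
                      efficiency: float = 0.7) -> float:
    data_bytes = params * bytes_per_param
    per_gpu_traffic = 2 * (n_gpus - 1) / n_gpus * data_bytes   # ring all-reduce bound
    usable_bytes_per_s = net_gbps * 1e9 / 8 * efficiency
    return per_gpu_traffic / usable_bytes_per_s

# Example: synchronizing a 405B-parameter model across 1,024 GPUs at 400 Gb/s each.
print(f"~{allreduce_seconds(405e9, 1024):.0f} s per full-gradient all-reduce")
```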
Supply Chain Dynamics
GPU Allocation Strategies
NVIDIA’s allocation process determines who gets latest GPUs:
Tier 1 Allocation (Prioritized):
- Strategic Cloud Partners: CoreWeave, Lambda Labs (NVIDIA investors)
- Hyperscalers: Microsoft/Azure (OpenAI partnership), AWS, Google Cloud
- Key AI Companies: OpenAI, Meta, xAI (strategic relationships)
- Government/National Labs: US Department of Energy supercomputing centers
Tier 2 Allocation:
- Enterprise Customers: Large corporations with multi-year commitments
- OEM Partners: Dell, HPE, Lenovo (build-to-order)
- Colocation Providers: Equinix, Digital Realty, CyrusOne
Tier 3 Allocation (Lowest Priority):
- Small Cloud Providers: Spot market purchases
- Startups: Limited quantities, long lead times
- Individual Purchases: Consumer/workstation GPUs only (no datacenter allocation)
Allocation Factors:
- Existing Relationship: Long-term customers prioritized
- Order Size: Larger commitments (10,000+ GPUs) receive priority
- Strategic Value: Partners that drive NVIDIA software ecosystem
- Equity Stake: NVIDIA investments (CoreWeave) ensure allocation
Lead Times and Availability
Current Status (October 2025):
GPU Model | Lead Time | Availability | Notes |
---|---|---|---|
A100 | 1-2 months | Good supply | Mature production, ample capacity |
H100 | 2-4 months | Good supply | Production matured through 2024 |
H200 | 3-6 months | Moderate supply | Ramping production, strong demand |
B200 | 6-9 months | Limited supply | Early production ramp |
B300 | 9-12 months | Very limited | Initial production allocation |
GB200 NVL72 | 12+ months | Pre-orders | Requires full rack orders (18 nodes) |
GB300 NVL72 | 12+ months | Pre-orders | Extremely limited initial production |
Historical Context:
- 2023 H100 Shortage: Lead times reached 12+ months, secondary market at 200-300% premium
- 2024 H100 Supply Improvement: Lead times normalized to 3-6 months
- 2025 Blackwell Ramp: Similar shortage pattern expected, preferential allocation to Tier 1 customers
Secondary Market and Gray Market
Dynamics:
- During shortages, allocated GPUs resold on secondary market at premium
- Peak H100 secondary-market pricing reached 2-3x the $25K-$30K list price (2023)
- Brokers facilitate bulk purchases from customers with excess allocation
- Major buyers: Startups, AI labs without direct NVIDIA relationships
Risks:
- No warranty (NVIDIA honors only original purchaser)
- Potential for counterfeit or refurbished units
- No firmware/software support
- Procurement uncertainty
Current Market (October 2025):
- Secondary market premiums minimal for H100/H200 (improved supply)
- Strong demand for B200/B300 allocations (not yet widely available on secondary market)
Alternative GPU Procurement
AMD MI300X:
- Lead Time: 3-6 months (shorter than NVIDIA for equivalent performance tier)
- Availability: Improving (not constrained like NVIDIA)
- Price: 20-30% less than NVIDIA equivalents
- Limitation: Smaller ecosystem, software maturity, customer hesitation
China Suppliers (Export-Controlled):
- US export controls (October 2022 and October 2023) restrict exports of H100/H200/Blackwell-class GPUs to China
- “China-compliant” variants (A800, H800) with reduced performance
- Enforcement challenges; some GPUs diverted through third countries
Custom Silicon (Hyperscaler-Only):
- Google TPU, AWS Trainium/Inferentia, Meta MTIA
- Only available on respective cloud platforms (not procurable)
- Reduces NVIDIA dependency for internal workloads
Future Roadmap: 2025-2027
NVIDIA Roadmap
2025:
- Blackwell Ramp: B200, B300, GB200, GB300 production volume increases Q2-Q4
- GB300 Deployments: CoreWeave first deployments, expanding through year
- Spectrum-X Adoption: More deployments following xAI Colossus validation
2026:
- Blackwell Refresh: Potential “Blackwell Ultra+” mid-generation refresh
- Next Architecture Announcement: Post-Blackwell architecture preview (Rubin rumored)
2027:
- Next-Gen Architecture: Successor to Blackwell (Rubin?)
- Projected Specs: 2,000W+ TDP, 400-500GB memory, 10-15 TB/s bandwidth
- Performance: 3-5x Blackwell
- Cooling: May require immersion cooling or on-chip microfluidics
- Multi-Die Integration: Further scaling through chiplet architectures
AMD Roadmap
2025:
- MI300X Production: Volume ramp through year
- CDNA 4 Architecture (MI350): Expected Q4 2025 announcement
- Target: Match/exceed NVIDIA Blackwell performance
- Memory: 200-250GB HBM3e
- ROCm Maturity: Continued software ecosystem investment
2026:
- MI350 Deployments: Production deployments by major cloud providers
- Market Share Goal: 10% of AI training market (from 3-4% today)
Google TPU Roadmap
2025:
- TPU v7 (Ironwood) GA: Production availability following preview
- TPU Adoption: Increased TensorFlow/JAX workload migration
2026:
- TPU v8: Expected next generation
- Focus: Inference optimization (following v7 inference focus)
- Scale: Larger pod configurations
AWS Custom Silicon
2025:
- Trainium2: Next-generation training chip
- Target: Competitive with NVIDIA Blackwell
- Performance: 4-5x Trainium v1
- Inferentia3: Inference chip refresh
- Target: 4x Inferentia2
- Model Support: 500B+ parameter models in single instance
Industry Trends (2025-2027)
Power Consumption:
- 2025: 1,000-1,400W per GPU (Blackwell B200/B300)
- 2026: 1,500-2,000W per GPU (refreshes, next-gen)
- 2027: 2,000W+ per GPU (new architectures)
Rack Density:
- 2025: 140-200 kW per rack standard for AI
- 2026: 200-300 kW per rack
- 2027: 300+ kW per rack (immersion cooling may become standard)
Memory Capacity:
- 2025: 192-288GB per GPU (B200/B300)
- 2026: 300-400GB per GPU
- 2027: 500GB+ per GPU (enabling trillion-parameter models on fewer GPUs)
Cooling Technology:
- 2025: Direct liquid cooling standard
- 2026: Advanced liquid cooling (rack-scale systems)
- 2027: Immersion cooling or on-chip cooling for highest densities
Total GPU Deployments (US only):
- End 2025: 1.5M+ GPUs (50% growth)
- End 2026: 2.5M+ GPUs (67% growth)
- End 2027: 4M+ GPUs (60% growth, maturing market)
Market Dynamics:
- NVIDIA dominance continues but shrinks from 95% to 85-90% (AMD gains share)
- Custom silicon grows from 1-2% to 5-10% (hyperscaler internal optimization)
- GPU supply constraints persist through 2025, moderate by 2026
- Secondary market premiums decline as production capacity improves
Strategic Implications for Datacenter Operators
GPU Infrastructure as Competitive Advantage
2025 Reality:
- Access to latest GPUs determines AI infrastructure competitiveness
- 6-12 month lead times require strategic planning and relationships
- Allocation prioritization more valuable than pricing (scarcity > cost)
Operator Strategies:
Tier 1 Strategies (CoreWeave, xAI, Meta):
- Strategic NVIDIA relationships (equity investments, long-term commitments)
- Pre-orders for next-generation GPUs (commit before specs finalized)
- Multi-generation roadmap planning (2-3 year GPU procurement pipeline)
Tier 2 Strategies (Colocation Providers, Mid-Tier Cloud):
- OEM partnerships (Dell, HPE, Supermicro) for allocation access
- Multi-vendor approach (AMD MI300X to diversify supply)
- Customer pre-commitments to justify large GPU orders
Tier 3 Strategies (Startups, Smaller Operators):
- Cloud provider utilization (CoreWeave, Lambda Labs) rather than ownership
- Secondary market procurement (accept warranty and support limitations)
- Alternative workloads (inference on older GPUs, not latest-generation training)
Build vs Buy Decision Framework
Build (Own GPUs):
- Pros: No markup, full control, depreciation tax benefits, long-term cost savings
- Cons: Large capital outlay ($50M+ for 1,000 GPUs), allocation uncertainty, obsolescence risk, operational complexity
Buy (Cloud Provider):
- Pros: Pay-as-you-go, no capex, access to latest hardware, operational simplicity
- Cons: Higher long-term cost (2-3x vs ownership), capacity constraints, vendor lock-in
Break-Even Analysis:
- Cloud cost: $1,500-$3,000 per GPU per month at 100% utilization
- Ownership cost: $36K-$62K upfront capex per GPU (H100/H200 era) plus roughly $300/month in operating expense ($3,600/year)
- Break-even: 12-18 months at 100% utilization; 24-36 months at 50% utilization (a rough calculation sketch follows this list)
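A minimal break-even sketch, using the favorable end of the ranges above (H100/H200-era capex of ~$36K per GPU, $3,000/month cloud pricing, $300/month owned-GPU opex); all inputs are adjustable assumptions rather than quoted vendor pricing.

```python
# Cloud-vs-ownership break-even in months. Cloud spend is assumed to scale with
# utilization, while owned-GPU opex is treated as roughly fixed.
def breakeven_months(capex: float = 36_000,            # $ per GPU, all-in
                     own_opex_month: float = 300,      # $ per GPU per month
                     cloud_month_full: float = 3_000,  # $ per GPU-month at 100% use
                     utilization: float = 1.0) -> float:
    cloud_spend = cloud_month_full * utilization
    return capex / (cloud_spend - own_opex_month)

print(f"100% utilization: ~{breakeven_months(utilization=1.0):.0f} months")  # ~13
print(f"50% utilization:  ~{breakeven_months(utilization=0.5):.0f} months")  # ~30
```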
Decision Factors:
- Utilization: >50% favors ownership; <30% favors cloud
- Scale: <100 GPUs favors cloud; >1,000 GPUs favors ownership
- Workload: Training (long-duration) favors ownership; inference (bursty) favors cloud
- Capital Access: Equity-funded startups may prefer cloud; established companies favor ownership
Liquid Cooling Investment Imperative
Current Reality:
- H100/H200: 700W TDP requires liquid cooling for rack densities >30 kW
- Blackwell B200/B300: 1,000-1,400W mandates liquid cooling
- GB300 NVL72: 140 kW/rack impossible without rack-scale liquid cooling
Infrastructure Requirements:
- CDU Capacity: 121-140 kW per rack (Vertiv CoolChip CDU 121 for GB300)
- Facility Chilled Water: Upgraded capacity for 10-20x heat load increase
- Floor Loading: Liquid-filled racks weigh 2,000-3,000 lbs (structural analysis required)
- Leak Detection: Mandatory for liquid-cooled facilities (protects millions of dollars of GPU hardware)
Retrofit vs Greenfield:
- Greenfield: Design liquid cooling from start ($300-500/kW premium vs air-cooled)
- Retrofit: Convert air-cooled to liquid ($1,000-2,000/kW, limited by floor loading, chilled water capacity)
Economics:
- Upfront: 30-50% higher capex for liquid cooling infrastructure
- Operational: 40% power savings, 60% footprint reduction, 20% lower TCO (Supermicro claims)
- Competitive: Mandatory for AI workloads; air-cooled facilities lose AI customers
Recommendation: All new datacenter construction should include liquid cooling infrastructure; retrofit existing facilities targeting AI customers.
Multi-Vendor GPU Strategy
Risk Mitigation:
- NVIDIA allocation constraints create capacity risk
- Single-vendor dependency limits negotiating leverage
- Technology risk (architectural dead-ends)
AMD MI300X as Alternative:
- Advantages: 192GB memory (largest), lower cost (20-30%), better availability
- Challenges: ROCm software maturity, customer hesitation, smaller ecosystem
- Use Cases: Inference (memory-bound), specific training workloads with PyTorch/AMD optimization
Custom Silicon (Hyperscaler-Only):
- Google TPU: TensorFlow/JAX workloads
- AWS Trainium/Inferentia: Cost-sensitive training/inference
- Meta MTIA: Internal workloads only
- Limitation: Not available for independent operators
Recommendation: Maintain primary NVIDIA relationship for performance-critical workloads; pilot AMD MI300X for memory-intensive and cost-sensitive deployments; avoid custom silicon unless hyperscaler-scale.
GPU Obsolescence and Refresh Cycles
Depreciation Reality:
- An H100 purchased in 2023 for $25K-$40K is worth roughly $5K-$10K three years later (70-80% depreciation)
- Performance-per-dollar improves 2-3x every 18-24 months
- Customers demand latest GPUs (older generations uncompetitive for new contracts)
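A residual-value sketch consistent with the 70-80% three-year depreciation cited above; the 40% annual decline rate is an assumption chosen to land in that range, not a market quote.

```python
# Residual value under a constant annual percentage decline (declining balance).
def residual_value(purchase_price: float, years: int,
                   annual_decline: float = 0.40) -> float:
    return purchase_price * (1 - annual_decline) ** years

for price in (25_000, 40_000):   # H100 list-price range at purchase (2023)
    value = residual_value(price, 3)
    print(f"${price:,} H100 -> ~${value:,.0f} after 3 years "
          f"({1 - value / price:.0%} depreciation)")
```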
Refresh Strategies:
Aggressive Refresh (18-24 months):
- Maximize competitiveness (always latest GPUs)
- High capital expenditure (continuous purchases)
- Suitable for: AI cloud providers (CoreWeave, Lambda Labs)
Moderate Refresh (24-36 months):
- Balance performance and cost
- Cascade older GPUs to inference workloads
- Suitable for: Hyperscalers (AWS, Azure, GCP)
Extended Life (36-48 months):
- Minimize capex (extend GPU life)
- Accept reduced competitiveness for training
- Suitable for: Cost-sensitive deployments, inference-only
Secondary Market:
- Sell 2-3 year old GPUs to smaller operators or international markets
- Typical recovery: 10-30% of original price
- Reduces effective depreciation cost
Recommendation: Plan for 24-30 month refresh cycles; cascade older GPUs to inference; factor 70-80% depreciation into financial models.
Conclusion: The GPU Infrastructure Foundation
Over 1 million high-performance GPUs deployed across US datacenters represent the physical infrastructure enabling the AI revolution. NVIDIA’s architectural and ecosystem advantages have created near-monopoly in AI training, but emerging competition from AMD and custom silicon suggests gradual diversification. Power consumption evolution from 400W (A100) to 1,400W (B300) has forced complete transformation of datacenter cooling infrastructure, making liquid cooling mandatory for competitive AI deployments.
Key Takeaways
- Scale: 1M+ GPUs in US (2025), growing to 2.5M+ (2026) and 4M+ (2027)
- Deployments: xAI Colossus (230K GPUs), CoreWeave (250K+ GPUs), Meta Prometheus (500K+ planned) demonstrate gigascale GPU infrastructure
- Technology: NVIDIA H100/H200 dominate current deployments; Blackwell B200/B300 ramping production
- Power: GPU TDP increased 3.5x from A100 (400W) to B300 (1,400W), requiring liquid cooling revolution
- Supply: 6-12 month lead times for latest GPUs; allocation strategy determines competitive advantage
- Competition: AMD MI300X gaining traction (192GB memory advantage); custom silicon (TPU, Trainium) for specific workloads
- Economics: GPU ownership break-even 12-24 months at high utilization; 70-80% depreciation over 3 years
- Future: 2,000W+ GPUs by 2027 may require immersion cooling or on-chip microfluidics
Strategic Priorities for Operators
Near-Term (2025):
- Secure Blackwell B200/B300 allocations (pre-orders, strategic relationships)
- Deploy liquid cooling infrastructure (mandatory for competitive AI)
- Evaluate AMD MI300X for memory-intensive workloads (diversify supply)
Medium-Term (2026):
- Plan for 200-300 kW rack densities (advanced liquid or immersion cooling)
- Refresh A100 and early H100 deployments (obsolescence management)
- Develop multi-vendor GPU strategies (reduce NVIDIA dependency)
Long-Term (2027+):
- Prepare for 300+ kW racks (immersion cooling, on-chip cooling)
- Invest in software ecosystem for AMD/custom silicon (hedge NVIDIA concentration)
- Consider GPU ownership vs cloud provider based on utilization economics
The GPU infrastructure buildout represents one of the largest technology capital expenditure cycles in history, comparable to internet backbone buildout (1990s) and smartphone infrastructure (2010s). Operators who successfully navigate GPU allocation, cooling infrastructure, and refresh economics will capture outsized share of AI infrastructure market through 2030.
Data sources: NVIDIA, AMD, Google, AWS specifications; operator disclosures from CoreWeave, xAI, Meta, Applied Digital, Lambda Labs, Crusoe Energy; analyst estimates; datacenter industry publications. Analysis current as of October 2025.