Cooling Infrastructure: The Liquid Revolution Driven by AI
The explosive growth of AI workloads has fundamentally transformed datacenter cooling requirements. Where traditional datacenters operated at 15-20 kW per rack with air cooling, modern AI infrastructure demands 100-140 kW per rack, with next-generation systems pushing toward 300+ kW. This transformation requires a complete shift from air-based cooling to advanced liquid cooling technologies.
Executive Summary
- Power Density Gap: Traditional air cooling maxes out at 20-30 kW/rack; AI workloads require 100-300+ kW/rack
- Cooling Technology Shift: Direct liquid cooling (DLC) and immersion cooling becoming standard for AI infrastructure
- Market Growth: Immersion cooling market projected to grow from $4.87B (2025) to $11.10B (2030) at a 17.91% CAGR
- Vendor Ecosystem: Vertiv, Supermicro, HPE/Cray, GRC, LiquidStack, Asperitas leading liquid cooling innovation
- Water Efficiency: Liquid cooling captures 70-98% of heat at chip level, dramatically reducing water consumption
- Future Requirements: NVIDIA GB300 NVL72 racks at 140 kW standard; future generations may exceed 300 kW per rack
The transition to liquid cooling represents one of the most significant infrastructure shifts in datacenter history, driven primarily by the physics of AI accelerator thermal management.
Why Cooling is Critical for AI Datacenters
The Power Density Challenge
AI accelerators generate unprecedented heat loads:
GPU Generation | TDP (Watts) | System Power (8-GPU server, or NVL72 rack) | Cooling Requirement |
---|---|---|---|
NVIDIA A100 | 400W (SXM4) | ~15-20 kW | Air cooling possible |
NVIDIA H100 | 700W (SXM5) | ~30-40 kW | Liquid cooling preferred |
NVIDIA H200 | 700W (SXM) | ~30-40 kW | Liquid cooling required |
NVIDIA B200 | 1,000W | ~50-60 kW | Liquid cooling required |
NVIDIA B300 | 1,400W | ~70-80 kW | Liquid cooling mandatory |
GB200 NVL72 | — | 120 kW/rack | Rack-scale liquid cooling |
GB300 NVL72 | — | 140 kW/rack | Rack-scale liquid cooling |
The Physics Problem: Air has a low volumetric heat capacity, so it carries very little heat per unit volume. At high power densities, air cooling therefore requires (see the worked comparison after this list):
- Massive airflow volumes (noise, fan power overhead)
- Low inlet temperatures (expensive chilled water systems)
- Large floor space (hot/cold aisle containment)
- Poor heat capture efficiency (typically 50-70%)
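A rough heat-balance sketch illustrates why air runs out of headroom. The comparison below estimates the volume flow needed to remove 140 kW with air versus water; the temperature rises and fluid properties are textbook-style assumptions for illustration, not vendor figures.

```python
# Heat balance Q = rho * V_dot * cp * dT, solved for volumetric flow.
# Fluid properties are approximate textbook values; dT choices are illustrative.

def flow_m3_per_s(heat_w, rho_kg_m3, cp_j_kg_k, delta_t_k):
    """Volumetric flow rate (m^3/s) needed to absorb heat_w watts."""
    return heat_w / (rho_kg_m3 * cp_j_kg_k * delta_t_k)

rack_heat_w = 140_000  # GB300 NVL72-class rack

air_flow = flow_m3_per_s(rack_heat_w, rho_kg_m3=1.2, cp_j_kg_k=1005, delta_t_k=15)
water_flow = flow_m3_per_s(rack_heat_w, rho_kg_m3=997, cp_j_kg_k=4180, delta_t_k=10)

print(f"Air:   {air_flow:6.1f} m^3/s  (~{air_flow * 2119:,.0f} CFM)")
print(f"Water: {water_flow * 60_000:6.1f} L/min")
```

On these assumptions, one rack needs on the order of 16,000 CFM of air but only about 200 L/min of water, which is the core of the density argument.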
The Economics of Liquid Cooling
While liquid cooling systems have higher upfront costs, they deliver significant operational advantages:
Benefits:
- 40% power savings: Reduced cooling overhead and fan power
- 60% footprint reduction: Higher rack densities allow more compute per square foot
- 40% lower water consumption: More efficient heat transfer reduces evaporative cooling needs
- 20% lower TCO: Total cost of ownership advantages over facility lifetime
- 98% heat capture: (Supermicro DLC-2) Direct chip contact captures nearly all waste heat
- Heat reuse potential: High-temperature coolant (up to 45°C inlet) enables district heating integration
Challenges:
- Higher initial capital expenditure for CDU infrastructure
- Specialized facilities and maintenance requirements
- Supply chain constraints for cooling components
- Retrofit complexity for existing air-cooled facilities
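As a rough illustration of the trade-off described above, the sketch below estimates a simple payback period for the liquid-cooling capital premium on a single 140 kW rack. Every input (electricity rate, PUE values, capex premium) is an illustrative assumption, not vendor pricing.

```python
# Back-of-envelope payback estimate for one 140 kW rack.
# All inputs below are illustrative assumptions, not quoted prices.

rack_it_load_kw = 140
hours_per_year = 8760
electricity_usd_per_kwh = 0.08     # assumed industrial tariff

pue_air = 1.5                      # hypothetical air-cooled facility
pue_liquid = 1.15                  # hypothetical liquid-cooled facility
capex_premium_usd = 120_000        # assumed CDU + manifold + cold-plate premium

annual_kwh_air = rack_it_load_kw * pue_air * hours_per_year
annual_kwh_liquid = rack_it_load_kw * pue_liquid * hours_per_year
annual_savings_usd = (annual_kwh_air - annual_kwh_liquid) * electricity_usd_per_kwh

print(f"Annual energy saved:  {annual_kwh_air - annual_kwh_liquid:,.0f} kWh")
print(f"Annual cost savings:  ${annual_savings_usd:,.0f}")
print(f"Simple payback:       {capex_premium_usd / annual_savings_usd:.1f} years")
```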
Cooling Technologies Comparison
Technology Overview Table
Technology | Rack Density | Heat Capture | PUE Range | Water Usage | Complexity | Cost | Maturity |
---|---|---|---|---|---|---|---|
Traditional Air Cooling | 15-20 kW | 50-70% | 1.4-1.8 | High (evaporative towers) | Low | $ | Mature |
Optimized Air (Hot Aisle) | 20-30 kW | 60-75% | 1.3-1.5 | Medium-High | Medium | $$ | Mature |
Direct-to-Chip Liquid (DLC) | 100-140 kW | 70-98% | 1.1-1.3 | Low-Medium | Medium-High | $$$ | Mature |
Single-Phase Immersion | 200-300 kW | 95%+ | 1.05-1.15 | Very Low | High | $$$$ | Early Commercial |
Two-Phase Immersion | 300+ kW | 95%+ | 1.05-1.15 | Very Low | Very High | $$$$ | Emerging |
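To make the PUE column concrete: PUE is total facility power divided by IT power, so overhead scales directly with it. The quick sketch below applies representative values from the ranges above to a hypothetical 100 MW IT load.

```python
# Cooling/overhead power implied by different PUE values for a 100 MW IT load.
it_load_mw = 100
for label, pue in [("Traditional air", 1.6), ("Direct liquid (DLC)", 1.2), ("Immersion", 1.1)]:
    overhead_mw = it_load_mw * (pue - 1)
    print(f"{label:20s} PUE {pue:>4}: ~{overhead_mw:.0f} MW of non-IT power")
```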
Direct-to-Chip Liquid Cooling (DLC)
Technology: Cold plates with microchannels mounted directly on high-power components (CPUs, GPUs, memory, VRMs). Single-phase cooling fluid (water/glycol mix or dielectric fluid) circulates through cold plates to absorb heat.
Architecture:
- Cold Plates: Precision-machined metal plates with microchannel structures for maximum surface area
- Coolant Distribution Units (CDUs): Separate primary facility chilled water from secondary coolant loops
- Manifolds: Distribute coolant to multiple servers in a rack
- Heat Rejection: Transfer heat to facility chilled water or dry coolers
Power Density: 100-140 kW per rack (current); 200 kW+ (future generations)
Heat Capture Efficiency:
- 70-80% typical (Dell, HPE)
- 98% advanced (Supermicro DLC-2)
- Remaining 2-30% cooled by low-volume airflow for components not on cold plates
Advantages:
- Proven at scale for AI deployments
- Compatible with existing datacenter infrastructure (with CDU additions)
- Modular and serviceable at component level
- Can retrofit into raised-floor environments
- Noise reduction: As low as 50 dB (Supermicro) vs 80+ dB for high-airflow air cooling
Key Deployments:
- CoreWeave: 100% liquid-cooled infrastructure for 130-140 kW racks
- Meta: Air-assisted liquid cooling for Catalina high-power racks (~140 kW)
- xAI Colossus: 100,000 H100 GPUs with liquid cooling
- Applied Digital Ellendale: Closed-loop, waterless, direct-to-chip cooling for 50,000 H100 GPUs
Leading Vendors: Vertiv, Supermicro, Lenovo Neptune, Dell, HPE/Cray
Immersion Cooling: Single-Phase
Technology: IT equipment fully submerged in non-conductive dielectric fluid. Heat transfers from components directly to surrounding fluid through natural or forced convection. Fluid remains in liquid state (no phase change).
Fluid Types:
- Synthetic dielectric fluids (3M Novec, specialty engineered fluids)
- Mineral oils (for some applications)
Architecture:
- Immersion Tanks: Sealed enclosures containing servers and dielectric fluid
- Heat Exchangers: External heat exchangers cool the warmed fluid
- Fluid Circulation: Pumps circulate fluid through cooling loop (or natural convection for passive systems)
Power Density: 200-300 kW per rack equivalent
Advantages:
- Very high power densities (2-3x DLC)
- Eliminates server fans and traditional cooling infrastructure
- Excellent for high-density GPU clusters and crypto mining
- Natural cooling through fluid convection (passive systems)
- Dust and humidity protection
Challenges:
- Server warranty considerations
- Fluid management and maintenance complexity
- Higher upfront infrastructure costs
- Limited serviceability during operation
- Fluid disposal and environmental considerations
Key Vendor: Green Revolution Cooling (GRC)
Deployments:
- High-density crypto mining (80-100 kW+ rack densities)
- AI training clusters (emerging)
- Edge computing in harsh environments
Immersion Cooling: Two-Phase
Technology: IT equipment submerged in dielectric fluid with low boiling point (typically 50-65°C). Heat from components causes fluid to boil (phase change to vapor), carrying heat away. Vapor condenses on cooled coils above tank, returning as liquid.
Physics Advantage: Phase-change (liquid to vapor) absorbs significantly more heat energy than single-phase sensible heating, enabling even higher power densities with less fluid circulation.
Power Density: 300+ kW per rack equivalent
Advantages:
- Highest power densities achievable
- Most efficient heat transfer mechanism (latent heat of vaporization)
- Minimal or no pumping required (natural convection cycle)
- Passive cooling systems possible
Challenges:
- Complex fluid management (pressure control, vapor containment)
- Specialized fluids with environmental considerations (GWP)
- Higher costs than single-phase
- Limited commercial deployments to date
- More complex servicing procedures
Key Vendor: LiquidStack (pioneering commercial two-phase systems)
Deployments:
- Research and demonstration projects
- Specialized hyperscale applications (limited public information)
- Emerging for next-generation 300+ kW rack densities
Passive Immersion Cooling
Technology: Variant of immersion cooling that eliminates pumps entirely, relying on natural convection and thermosiphon effects.
Advantages:
- Highest reliability (fewer moving parts)
- Lower power consumption (no pumps)
- Reduced maintenance
Key Vendor: Asperitas (pioneering passive immersion since 2017)
Status: Production-scale deployments with measurable power savings
Vendor Ecosystem Deep Dive
Vertiv: CDU Market Leader
Specialization: Coolant Distribution Units (CDUs) for direct liquid cooling
Product Portfolio:
Product | Capacity | Type | Target Application |
---|---|---|---|
CoolChip CDU 70 | 70 kW | Liquid-to-liquid | Small AI clusters |
CoolChip CDU 100 | 100 kW | Liquid-to-liquid | Mid-size AI racks |
CoolChip CDU 121 | 121 kW | Liquid-to-liquid | NVIDIA GB300 NVL72 cabinets (optimized) |
CoolChip CDU 350 | 350 kW | Liquid-to-air | Retrofit applications |
CoolChip CDU 600 | 600 kW | Liquid-to-liquid | Large cluster support |
CoolChip CDU 2300 | 2.3 MW | Liquid-to-liquid | Row or building-level cooling |
Technology Details:
- Single-phase direct-to-chip cooling with cold plates
- CDUs separate primary chilled water loops from secondary fluid networks for server protection
- Liquid carries roughly 3,000x more heat per unit volume than air (volumetric heat capacity of water vs air)
- Modular design allows scaling from single rack to entire buildings
Key Deployments:
- CoreWeave GB300 NVL72 systems (CDU 121)
- Multiple hyperscale and colocation providers
Innovation: Vertiv’s portfolio addresses the full spectrum from retrofit (liquid-to-air CDU 350) to greenfield hyperscale (2.3 MW CDU 2300), enabling liquid cooling adoption across existing and new facilities.
Supermicro: Rack-Scale Liquid Cooling
Specialization: Complete liquid-cooled server and rack systems optimized for AI workloads
DLC-2 System Specifications:
- 98% heat capture efficiency (industry-leading)
- Inlet water temperature: Up to 45°C (enables heat reuse and reduces chiller requirements)
- Noise level: As low as 50 dB (vs 80+ dB for equivalent air-cooled systems)
- Components cooled: CPU, GPU, PCIe Switch, DIMM, VRM, PSU (comprehensive)
CDU Products:
Product | Capacity | Configuration | Target Deployment |
---|---|---|---|
In-Rack CDU | 250 kW | Supports 64x 1000W NVIDIA Blackwell GPUs + 16x 500W CPUs in 48U rack | Single-rack AI systems |
In-Row CDU | 1.8 MW | Row-level cooling | Multi-rack AI clusters |
Technology Innovation:
- Cold plates with microchannels dissipate up to 1,600W for next-gen NVIDIA GPUs
- Single AI rack can generate over 100 kW heat (standard design target)
- Rack density: Up to 96 NVIDIA Blackwell GPUs per rack (highest density commercially available)
Performance Benefits:
- 40% power savings (reduced cooling overhead)
- 60% reduced footprint (higher rack densities)
- 40% lower water consumption (efficient heat transfer)
- 20% lower TCO (total cost of ownership)
Key Deployments:
- Applied Digital (GPU servers across North Dakota facilities)
- xAI Colossus (100,000 GPU cluster infrastructure)
- NVIDIA GB200 NVL72 reference design (72 Blackwell GPUs, 36 Grace CPUs per rack)
Market Position: Supermicro’s end-to-end approach (servers + CDUs + management software) makes them a one-stop solution for liquid-cooled AI infrastructure, particularly attractive to operators building greenfield facilities.
HPE (Cray): 100% Liquid-Cooled Supercomputing
Specialization: Purpose-built liquid-cooled supercomputing systems
Philosophy: 100% direct liquid cooling - no hybrid air/liquid systems
HPE Cray EX Platform:
- Supports up to 500W processors (CPU or GPU)
- Density: Up to 512 processors per cabinet (64 compute blade slots, 8 chassis)
- Sealed cooling: Closed-loop system with no heated air exhaust into datacenter
GPU Systems:
Product | GPU Configuration | Form Factor | Target Workload |
---|---|---|---|
HPE Cray XD670 | 8x NVIDIA H200 or H100 | 5U | LLM training, NLP |
HPE Cray EX154n | Up to 224 NVIDIA Blackwell GPUs per cabinet | Blade | Next-gen AI supercomputing |
Coolant Distribution:
- HPE Coolant Distribution Unit (CDU): 1.2 MW cooling capacity
- Configuration: One CDU supports a maximum of four cabinets
- Components: Heat exchanger, pumps, control valve, sensors, controller, valves, piping
Advantages:
- Proven at exascale supercomputing (world’s fastest supercomputers use Cray liquid cooling)
- Purpose-built for maximum performance per watt
- Complete thermal isolation from datacenter environment
Deployments:
- National lab supercomputers (Frontier, Aurora, El Capitan)
- HPC and AI training at scale
- Cloud providers deploying dedicated AI supercomputing infrastructure
Market Position: HPE/Cray targets the highest-performance segment where 100% liquid cooling is non-negotiable for power efficiency and performance.
Lenovo Neptune: Mainstream Liquid Cooling
Specialization: Bringing liquid cooling to mainstream enterprise and cloud
Experience: 13+ years of liquid cooling development and deployment
Product Line:
Product | Type | Heat Capture | Energy Savings | Notes |
---|---|---|---|---|
Neptune Core | Open-loop direct water cooling | Up to 80% | — | Targets CPUs, GPUs, memory |
Neptune 6th Generation | Advanced DLC | — | Up to 40% | Latest generation |
ThinkSystem SR780a V3 | 8 fully interconnected NVIDIA GPUs | — | — | Lenovo Neptune liquid cooling |
Technology:
- Open-loop direct water cooling to high-power components
- Minimal airflow requirements (low noise, low fan power)
- Targets CPUs, GPUs, and memory
Deployments:
- AI workloads
- HPC clusters
- Enterprise AI infrastructure
Market Position: Lenovo focuses on making liquid cooling accessible to enterprises transitioning from air-cooled infrastructure, with emphasis on ease of deployment and long-term support.
Dell Technologies: DLC Portfolio
Specialization: Direct Liquid Cooling (DLC) across server portfolio
Portfolio: 12+ server platforms now DLC-ready
Key Platform:
Product | Configuration | Target Workload |
---|---|---|
PowerEdge XE9640 | 4x Intel Data Center GPU Max 1550 (DLC-cooled) | HPC and AI/ML |
Technology:
- Cold plates cool high-powered CPUs and GPUs
- Air cools remaining components (hybrid approach)
- 70-80% of heat removed at chip level (DLC portion)
Advantages:
- DLC manages heat density at chip level
- Easier retrofit into existing infrastructure vs full immersion
- Broad portfolio supports diverse workload requirements
Deployments:
- HPC workloads
- AI/ML training and inference
- Enterprise cloud infrastructure
Market Position: Dell’s extensive portfolio and global support infrastructure make them attractive for large enterprises and cloud providers with diverse datacenter footprints requiring standardized cooling approaches.
Green Revolution Cooling (GRC): Immersion Pioneer
Founded: 2009
Specialization: Single-phase immersion cooling
Technology:
- IT equipment submerged in dielectric fluid
- Increases performance while reducing power consumption
- Proven for extreme-density applications
Target Applications:
- High-density crypto mining (80-100 kW+ rack densities)
- AI model training
- HPC clusters requiring maximum density
Partnerships:
- Asperitas (education partnership on immersion cooling benefits and misconceptions)
Market Position: GRC pioneered commercial immersion cooling and continues to lead in single-phase immersion for the highest-density applications where DLC is insufficient.
LiquidStack: Two-Phase Innovation
Specialization: Two-phase immersion cooling (industry leader)
Technology:
- 2-phase immersion: Fluid boils on hot components, vapor condenses on cooled coils
- Industry awards recognizing its highly efficient liquid-cooled datacenter solutions
Facilities:
- Headquarters: Carrollton, Texas
- Inaugurated: March 2025 (3x production capacity increase)
Target Applications:
- AI data centers (highest density)
- Hyperscale computing
- Edge computing in space-constrained environments
Market Position: LiquidStack is pioneering commercial adoption of two-phase immersion, targeting next-generation 300+ kW rack densities that challenge even single-phase immersion systems.
Asperitas: Passive Immersion
Founded: First solution launched 2017
Specialization: Passive immersion cooling (no pumps)
Technology:
- Natural convection and thermosiphon effects
- Eliminates pumps for higher reliability
- Minimizes moving parts
Advantages:
- Highest reliability (fewest moving parts in industry)
- Lower ongoing power consumption
- Reduced maintenance complexity
Deployments:
- Production-scale immersion with measurable power savings
- High-density applications requiring maximum uptime
Partnerships:
- GRC (education partnership on immersion cooling benefits)
Market Position: Asperitas’s passive approach addresses the reliability and maintenance concerns that have slowed immersion cooling adoption, potentially enabling broader deployment.
Colocation Provider Liquid Cooling
Equinix: Liquid Cooling at Scale
Availability: 100 data centers across 45 metros
Technology: Direct-to-chip liquid cooling
Power Density: Customer requests reaching 60-80 kW per rack, now supported within standard offerings
Efficiency Benefits:
- Liquid-cooled racks (2,000 servers at 30 kW/rack) use 30% less energy vs air-cooled
- 66% less space vs air-cooled (15 kW/rack) for same compute capacity
Innovation Center:
- Co-Innovation Facility (CIF): Ashburn, Virginia
- Testing liquid-cooled NVIDIA GPUs
- Sustainable AI platform development
Strategy: Equinix is retrofitting liquid cooling across its global portfolio to meet AI workload demand, making liquid cooling accessible to colocation customers without requiring purpose-built facilities.
CyrusOne: 300 kW Intelliscale
Availability: All new datacenter designs; cost-effective retrofits for existing facilities
Technologies:
- Liquid-to-chip cooling
- Rear door heat exchanger
- Immersion cooling (full spectrum approach)
Power Density: Up to 300 kW per rack (Intelliscale solution)
Intelliscale Product:
- AI workload-specific datacenter solution
- Modular manufacturing for rapid deployment
- Efficient cooling up to 300 kW per rack
Strategy: CyrusOne is positioning Intelliscale as a turnkey solution for AI operators, targeting the highest power densities in the colocation market.
Flexential: High-Density AI Alliance
Strategy: Partnership approach with AI cloud providers
Technology: High-density liquid cooling
Key Deployments:
- Hillsboro, Oregon: 9 MW for CoreWeave (2024)
- Douglasville, Georgia: 9 MW for CoreWeave (2024)
Approach: Flexential is adapting existing facilities for liquid cooling through partnerships with AI-native operators like CoreWeave, leveraging their expertise in high-density AI infrastructure.
Aligned Data Centers: Purpose-Built Liquid Cooling
Strategy: New facilities designed liquid-cooling-first
Technology: Liquid-cooled infrastructure optimized for highest-density GPUs
Key Project:
- DFW-04: Plano, Texas
- 425,500 sq ft (39,500 sqm)
- Customer: Lambda Labs
- Construction: October 2025 - October 2026
- Designed for highest-density GPU configurations
Approach: Aligned is building greenfield facilities with liquid cooling as the foundational design principle, eliminating retrofit compromises.
Case Studies: Liquid Cooling in Production
CoreWeave: 130-140 kW Standard Density
Overview: GPU cloud computing leader with 250,000 GPU fleet (end 2024)
Cooling Strategy: 100% liquid cooling for all new facilities from 2025 onwards
Infrastructure:
- 33 operational facilities across United States and Europe
- All datacenters designed with liquid cooling foundation
- Purpose-built to support NVIDIA GB200/GB300 NVL72 clusters
Power Density:
- Current: ~130 kW racks standard
- GB200 NVL72: 120 kW per rack (72 Blackwell GPUs)
- GB300 NVL72: ~140 kW per rack (72 Blackwell Ultra GPUs)
Technology Partners:
- Vertiv (CDU 121 for GB300 systems)
- NVIDIA (architecture optimization)
Deployment Timeline:
- 2024: Mix of air-cooled (legacy) and liquid-cooled (new)
- 2025+: All new capacity liquid-cooled
Business Impact:
- 420 MW active power across 33 facilities
- 2,200 MW contracted power pipeline
- Liquid cooling essential to achieving contracted capacity within existing facility footprints
Quote Significance: CoreWeave’s CEO has stated liquid cooling is “non-negotiable” for delivering competitive AI infrastructure at scale, highlighting the technology shift from optional to mandatory.
xAI Colossus: 100,000 GPU Liquid-Cooled Cluster
Overview: World’s largest AI supercomputer (as of deployment)
Location: Memphis, Tennessee (former Electrolux factory, 785,000 sq ft)
Scale:
- Phase 1: 100,000 NVIDIA H100 GPUs (September 2024)
- Current: 230,000 GPUs (150K H100, 50K H200, 30K GB200) (June 2025)
- Future: 1,000,000 GPUs target across multiple facilities
Cooling Strategy: Hybrid air and liquid cooling
Infrastructure:
- 300 MW power (150 MW utility + 150 MW Tesla Megapack batteries)
- Single RDMA fabric interconnecting all GPUs
- Deployed in 122 days (infrastructure speed record)
Technology Partner: Supermicro
- Liquid-cooled server infrastructure
- Rack-scale cooling systems
- CDU deployment across facility
Innovation: Colossus demonstrates that 100,000+ GPU clusters can be deployed in existing industrial buildings (not purpose-built datacenters) using liquid cooling, dramatically reducing time-to-production for AI infrastructure.
Networking: NVIDIA Spectrum-X Ethernet with RDMA (not InfiniBand), proving liquid cooling works with diverse network architectures.
Meta: Catalina High-Power AI Racks
Overview: Social media giant building gigawatt-scale AI infrastructure
Cooling Evolution:
- AI Research SuperCluster (RSC) - 2022: 16,000 A100 GPUs (air-cooled)
- 24K GPU Clusters - 2024: 49,152 H100 GPUs across 2 clusters (hybrid cooling)
- Prometheus - 2026: 500,000+ GPUs (1+ GW, liquid cooling)
Catalina Rack System:
- Power density: ~140 kW per rack
- Technology: Air-assisted liquid cooling
- Design: Proprietary rack design for OCP (Open Compute Project) servers
- Platform: Grand Teton (OCP open hardware) with YV3 Sierra Point servers
Infrastructure Scale:
- Prometheus location: New Albany, Ohio
- Power: 1,020 MW (1+ GW)
- Deployment: Multiple datacenter buildings + colocation + temporary weather-proof tents
- Expected launch: 2026
- Purpose: Llama 4 training and AGI research
Cooling Strategy: Meta is deploying multiple cooling approaches in parallel:
- Direct liquid cooling for high-power racks
- Air-assisted liquid cooling for hybrid deployments
- Testing RoCE vs InfiniBand networking with different thermal architectures
Innovation: Meta’s open-source approach through OCP is driving industry standardization of liquid cooling for AI, enabling broader ecosystem adoption.
Applied Digital Ellendale: Waterless Liquid Cooling
Overview: Purpose-built HPC datacenter in North Dakota
Location: Ellendale, North Dakota
Scale:
- Initial: 180 MW
- Campus potential: 400 MW
- Pipeline: 1+ GW under study
GPU Capacity: Nearly 50,000 H100 SXM-class GPUs in single parallel compute cluster
Cooling Technology: Closed-loop, waterless, direct-to-chip liquid cooling
Innovation:
- Zero water consumption: Dry coolers reject heat without evaporative cooling
- Climate advantage: North Dakota’s cold climate reduces cooling power requirements
- Multi-story design: High-density racks in multi-level datacenter (space efficiency)
Status: Energized December 2024
Technology Partner: Supermicro (GPU servers and cooling infrastructure)
Environmental Impact:
- Eliminates millions of gallons of annual water consumption vs traditional cooling
- Demonstrates viability of waterless liquid cooling at scale
- Cold climate provides natural cooling assist (lower ambient temperatures)
Business Model: 15-year lease to CoreWeave for $7 billion total revenue, demonstrating the economics of purpose-built liquid-cooled infrastructure.
QTS Freedom Design: Water-Free Cooling at Scale
Overview: Hyperscale datacenter operator pioneering water-free cooling
Technology: Proprietary water-free cooling system
Benefits:
- Eliminates regional water stress concerns
- No water towers or evaporative cooling
- Suitable for arid regions (Arizona, Nevada, Texas)
Deployment: Multiple facilities across QTS portfolio
Innovation: QTS’s approach addresses growing regulatory and community concerns about datacenter water consumption, particularly in water-stressed regions experiencing datacenter growth (Phoenix, Las Vegas).
Technology Details: While specific details are proprietary, water-free cooling typically combines:
- Air-side economization (free cooling when ambient temps permit)
- Adiabatic cooling (limited water use only in extreme conditions)
- High-efficiency chillers with dry coolers
- Direct liquid cooling to reduce overall cooling loads
Technology Deep Dive: Direct Liquid Cooling Architecture
Cold Plate Design
Function: Transfer heat from chip surface to flowing coolant
Construction:
- Base Plate: High-conductivity metal (copper or aluminum) machined to match chip surface
- Microchannel Structure: Precisely engineered channels maximize surface area and turbulent flow
- Inlet/Outlet Ports: Connect to coolant distribution system
- Thermal Interface Material (TIM): Fills microscopic gaps between chip and cold plate
Performance:
- Modern cold plates can dissipate 1,600W per component (Supermicro specification for next-gen GPUs)
- Thermal resistance < 0.1 K/W (industry-leading designs)
- Flow rates: Typically 0.5-2.0 liters per minute per cold plate
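A quick consistency check on the cold-plate figures above: the sketch below computes the coolant temperature rise at the quoted flow rates and component powers, assuming a water-based coolant with textbook properties (the fluid properties are assumptions).

```python
# Coolant temperature rise across one cold plate: dT = Q / (m_dot * cp).
def delta_t_k(heat_w, flow_l_min, rho_kg_m3=997.0, cp_j_kg_k=4180.0):
    m_dot_kg_s = rho_kg_m3 * (flow_l_min / 1000.0) / 60.0
    return heat_w / (m_dot_kg_s * cp_j_kg_k)

for power_w in (700, 1000, 1600):       # H100-class through next-gen GPU TDPs
    for flow in (0.5, 1.0, 2.0):        # L/min per cold plate, per the range above
        print(f"{power_w:5d} W at {flow:3.1f} L/min -> dT = {delta_t_k(power_w, flow):5.1f} K")
```

At 1,600W, roughly 2 L/min keeps the per-plate temperature rise near 10 K, which is why flow rates scale up with next-generation GPU TDPs.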
Components Cooled (comprehensive DLC systems):
- CPUs (500W+)
- GPUs (700-1,400W current; higher future)
- Memory DIMMs (15-30W each, but high density)
- Voltage Regulator Modules (VRMs) (high power density, localized hot spots)
- PCIe switches (critical for GPU-to-GPU communication)
- Power supplies (efficiency increases when components run cooler)
Coolant Distribution Units (CDUs)
Function: Interface between facility chilled water and server coolant loops
Architecture:
Facility Chilled Water (Primary Loop)
↓
Heat Exchanger (in CDU)
↓
Secondary Coolant Loop (isolated, controlled)
↓
Manifold Distribution (to racks)
↓
Cold Plates (on servers)
↓
Return Manifold (from racks)
↓
Heat Exchanger (heat transfer back to primary)
↓
Facility Heat Rejection (cooling towers, dry coolers, etc.)
Key Components:
- Heat Exchanger: Plate-and-frame or brazed plate design for high efficiency
- Pumps: Variable-speed pumps maintain required flow rates
- Control Valve: Modulates primary chilled water flow to maintain secondary loop temperature
- Sensors: Temperature, pressure, flow monitoring throughout system
- Controller: PID control maintains setpoints and alarm conditions
- Filtration: Protects servers from particulates and corrosion products
- Piping: Manifolds distribute coolant to/from racks
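A minimal sketch of the PID control loop described above, assuming the controller modulates the primary chilled-water valve to hold a secondary supply-temperature setpoint. The gains, setpoint, and toy plant model are hypothetical illustrations, not any vendor's firmware.

```python
# Toy CDU controller: a PID loop holds the secondary supply temperature by
# modulating the primary chilled-water valve (0-100% open).

class Pid:
    def __init__(self, kp, ki, kd, setpoint_c):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.setpoint_c = setpoint_c
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, measured_c, dt_s):
        error = measured_c - self.setpoint_c        # positive when loop runs warm
        self.integral += error * dt_s
        derivative = (error - self.prev_error) / dt_s
        self.prev_error = error
        valve = self.kp * error + self.ki * self.integral + self.kd * derivative
        return min(100.0, max(0.0, valve))          # clamp to valid valve position

pid = Pid(kp=8.0, ki=0.5, kd=1.0, setpoint_c=32.0)  # assumed 32 °C supply setpoint

temp_c = 40.0                                       # start warm
for t in range(10):
    valve_pct = pid.update(temp_c, dt_s=1.0)
    temp_c += 0.6 - 0.02 * valve_pct                # toy plant: IT heat vs. valve cooling
    print(f"t={t:2d}s  supply={temp_c:5.2f} °C  valve={valve_pct:5.1f}%")
```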
Isolation Benefits:
- Protects servers from facility water quality issues
- Allows optimized coolant chemistry (corrosion inhibitors, biocides)
- Pressure isolation prevents facility pressure transients from affecting servers
- Enables different fluids (facility uses water; secondary may use water/glycol or dielectric fluids)
Capacity Scaling:
- Small: 70-100 kW (single rack)
- Medium: 350-600 kW (row-level)
- Large: 1.2-2.3 MW (building-level)
Manifold Distribution
Function: Distribute coolant from CDU to multiple servers
Configurations:
- In-Rack Manifolds: CDU integrated into the rack (Supermicro approach)
  - Pros: Compact, minimizes piping runs, rack-level control
  - Cons: Limited to single-rack cooling capacity
- In-Row Manifolds: CDU serves multiple racks in a row
  - Pros: Efficient for traditional row-oriented datacenter layouts
  - Cons: More complex piping, potential for imbalanced flow
- Overhead Distribution: Ceiling-mounted manifolds drop coolant to racks
  - Pros: Minimal floor space impact, clean aesthetics
  - Cons: More complex installation, requires structural support
Quick Disconnects: Tool-less quick-disconnect couplings enable hot-swapping servers without draining coolant systems, critical for serviceability.
Coolant Types
Coolant | Temperature Range | Advantages | Disadvantages | Typical Use |
---|---|---|---|---|
Water | 5-45°C | High thermal capacity, low cost, readily available | Corrosion risk, freezing risk, conductivity concerns | Facility primary loops |
Water/Glycol Mix | -40 to 100°C | Freeze protection, corrosion inhibitors | Lower thermal capacity than pure water, periodic replacement | Outdoor equipment, cold climates |
Dielectric Fluids | -50 to 100°C+ | Electrically non-conductive (direct component contact safe), no corrosion | Higher cost, lower thermal capacity, environmental disposal concerns | Immersion cooling, critical direct-contact applications |
Engineered Fluids | Varies | Optimized thermal and electrical properties | Highest cost, specialized handling | High-performance and specialized applications |
Water Quality: For water-based coolants, critical parameters include:
- Conductivity: < 5 µS/cm (minimizes galvanic corrosion)
- pH: 7.0-9.0 (neutral to slightly alkaline)
- Dissolved oxygen: < 20 ppb (minimizes corrosion)
- Particulates: < 100 NTU (prevents cold plate clogging)
- Biocides: Prevent bacterial growth in closed loops
Heat Rejection Systems
Facility-Level Heat Removal: CDUs transfer heat to facility chilled water; facility must reject heat to ambient.
Options:
- Cooling Towers (Evaporative):
  - Cooling capacity: Excellent (can achieve very low water temperatures)
  - Water consumption: High (evaporation)
  - Efficiency: High (evaporative cooling is thermodynamically efficient)
  - Climate suitability: All climates
  - Considerations: Water availability, drift (water droplet release), Legionella risk
- Dry Coolers (Air-Cooled Heat Exchangers):
  - Cooling capacity: Moderate (limited by dry-bulb ambient temperature)
  - Water consumption: Zero
  - Efficiency: Lower than evaporative (higher temperature approach)
  - Climate suitability: Cold/moderate climates (less effective in hot regions)
  - Considerations: Larger footprint, higher fan power, noise
- Adiabatic Coolers (Hybrid):
  - Cooling capacity: High (dry cooling + evaporative assist)
  - Water consumption: Low (only during peak cooling demand)
  - Efficiency: Good balance
  - Climate suitability: All climates
  - Considerations: Complexity, maintenance of both systems
- District Heating Integration:
  - Concept: Sell waste heat to district heating systems for residential/commercial building heating
  - Requirements: High coolant return temperature (40-60°C+), proximity to district heating infrastructure
  - Benefits: Monetizes waste heat, improves datacenter sustainability metrics
  - Deployments: Common in Nordic countries; emerging in other regions
Temperature Approach: The difference between ambient conditions and achievable chilled water temperature drives cooling efficiency:
- Evaporative cooling: 3-5°C approach to wet-bulb temperature
- Dry cooling: 8-12°C approach to dry-bulb temperature
- Hybrid: 5-8°C approach (variable based on mode)
Higher Inlet Temperatures: One key advantage of liquid cooling is accepting higher inlet water temperatures (up to 45°C for advanced systems like Supermicro DLC-2). This enables:
- Free cooling (economization) for more hours per year
- Reduced chiller runtime
- District heating integration (heat is valuable at higher temperatures)
- Lower facility cooling costs
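A small sketch ties the approach temperatures above to the 45°C inlet limit: achievable supply temperature is roughly the ambient reference plus the approach. The ambient conditions below are illustrative, not site data.

```python
# Achievable coolant supply temperature = ambient reference + approach temperature.
max_inlet_c = 45.0   # advanced DLC inlet limit cited above

scenarios = [
    ("Dry cooler, 20 °C dry-bulb", 20.0, 10.0),   # approach from the 8-12 °C range
    ("Dry cooler, 35 °C dry-bulb", 35.0, 10.0),
    ("Evaporative, 22 °C wet-bulb", 22.0, 4.0),   # approach from the 3-5 °C range
]
for label, ambient_c, approach_c in scenarios:
    supply_c = ambient_c + approach_c
    verdict = "within inlet limit (free cooling)" if supply_c <= max_inlet_c else "needs chiller assist"
    print(f"{label:28s} -> supply = {supply_c:4.1f} °C  ({verdict})")
```

Even a 35°C day with a 10°C dry-cooler approach stays under a 45°C inlet limit, which is why high-inlet-temperature liquid cooling can run chiller-free for most of the year in many climates.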
Immersion Cooling Deep Dive
Single-Phase Immersion Architecture
Tank Design:
- Enclosure: Sealed tank with removable lid for server access
- Size: Typically equivalent to 1-3 traditional racks
- Server mounting: Vertical or horizontal orientation (vertical more common)
- Fluid volume: 200-500 liters typical (depends on tank size and server density)
Cooling Cycle:
- Servers generate heat → components transfer heat to surrounding fluid
- Warmed fluid rises (natural convection) or is pumped through external heat exchanger
- Heat exchanger cools fluid (facility chilled water on other side)
- Cooled fluid returns to tank bottom
- Cycle repeats
Passive vs Active:
- Passive (Asperitas): No pumps; natural convection and thermosiphon effects circulate fluid
- Active (GRC): Pumps force circulation for higher heat loads
Fluid Properties (typical single-phase immersion fluid):
- Dielectric strength: > 35 kV
- Thermal conductivity: 0.1-0.15 W/m·K
- Viscosity: 1-3 cSt (low viscosity aids natural convection)
- Flash point: > 100°C (safety)
- Global Warming Potential (GWP): Varies by fluid (newer fluids targeting low GWP)
Server Modifications:
- Remove fans (fluid convection replaces airflow)
- May remove heatsinks (fluid direct-contacts components in some designs)
- Seal or remove components incompatible with immersion (batteries, certain capacitors)
- Use immersion-rated storage devices (or seal existing)
Serviceability:
- Server removal: Requires fluid drainage or lifting from tank (fluid drips off)
- Fluid management: Periodic topping off (minimal losses in sealed systems)
- Heat exchanger maintenance: Standard chilled water equipment
Two-Phase Immersion Architecture
Tank Design:
- Lower section: Liquid pool where servers are submerged
- Upper section: Vapor space where boiling vapor rises
- Condenser coils: Mounted in vapor space above servers
- Sealed system: Pressure-controlled to maintain desired boiling point
Cooling Cycle:
- Servers generate heat → fluid boils on hot components (phase change: liquid → vapor)
- Vapor rises to upper tank section
- Vapor contacts chilled condenser coils → condenses back to liquid (phase change: vapor → liquid)
- Condensed liquid drips/flows back to lower pool
- Cycle repeats (thermosiphon - no pumps required)
Fluid Properties (two-phase immersion fluid):
- Boiling point: 50-65°C at atmospheric pressure (engineered for datacenter component temperatures)
- Dielectric strength: > 35 kV
- Latent heat of vaporization: High (this is the key advantage)
- GWP: Variable (some fluids have high GWP; industry developing alternatives)
Physics Advantage:
- Latent heat: Boiling absorbs ~100-200x more energy than sensible heating (raising temperature of liquid)
- Isothermal: Components operate at nearly constant temperature (boiling point)
- Passive: Natural circulation (vapor rises, liquid falls) requires no pumps
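To put numbers on the latent-heat advantage, the comparison below uses textbook water properties and assumed values for a typical engineered dielectric fluid (not a specific product datasheet).

```python
# Heat absorbed per kg: boiling (latent heat) vs. a sensible temperature rise.
fluids = {
    "water":            (2257.0, 4.18),  # latent heat kJ/kg, liquid cp kJ/(kg*K)
    "dielectric fluid": (100.0, 1.1),    # assumed typical engineered fluid
}
for name, (h_fg, cp) in fluids.items():
    for dt in (1.0, 5.0):
        ratio = h_fg / (cp * dt)
        print(f"{name:17s} dT={dt:3.1f} K: boiling absorbs ~{ratio:6.1f}x more heat per kg")
```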
Challenges:
- Fluid containment: Vapor pressure requires sealed systems
- Condenser sizing: Must condense all generated vapor
- Fluid costs: Specialized fluids are expensive
- Environmental: Some fluids have high GWP (regulatory concerns)
Emerging Applications:
- 300+ kW racks (future GPU generations)
- Space-constrained edge deployments (highest compute per volume)
- Harsh environments (complete component protection)
Future Evolution: 300+ kW Racks and Beyond
NVIDIA GB300 Requirements
GB300 NVL72 Specifications:
- GPU count: 72 Blackwell Ultra GPUs per rack
- CPU count: 36 NVIDIA Grace CPUs per rack
- DPU count: 18 NVIDIA BlueField-3 DPUs per rack
- Total GPU memory: 21 TB per rack
- Performance: 1.1 exaflops FP4 per rack
- Power consumption: ~140 kW per rack
Cooling Implications:
- Liquid cooling mandatory: 140 kW cannot be air-cooled at reasonable costs
- CDU capacity: Requires 140+ kW CDUs (Vertiv CDU 121 optimized for this)
- Rack-scale design: Cooling system integrated with rack (not server-by-server)
- Deployment constraint: Must deploy in multiples of 18 nodes (full NVL72 rack)
Facilities Impact:
- Traditional datacenter: 15-20 kW/rack → 140 kW/rack = 7-9x density increase
- Floor space efficiency: Same compute in 1/7 the footprint (capital expenditure efficiency)
- Power delivery: Requires high-density power distribution (busway, not traditional PDUs)
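Illustrative rack-count arithmetic for a 10,000-GPU deployment at these densities; the air-cooled baseline (one 8-GPU server per ~20 kW rack) is an assumption consistent with the A100-era figures above, not a measured deployment.

```python
import math

gpus = 10_000
nvl72_racks = math.ceil(gpus / 72)   # 72 GPUs per liquid-cooled NVL72 rack
air_racks = math.ceil(gpus / 8)      # assumed one 8-GPU server per ~20 kW air-cooled rack

print(f"GB300 NVL72 racks (~140 kW each): {nvl72_racks}")
print(f"Air-cooled racks (~20 kW each):   {air_racks}")
print(f"Rack-position ratio: ~{air_racks / nvl72_racks:.0f}x")
```

The roughly 9x ratio lines up with the 7-9x density increase noted above.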
Vendor Ecosystem:
- Vertiv (CDU 121)
- CoreWeave (first GB300 deployment announced)
- Dell, Switch, Vertiv (initial deployment partners per NVIDIA)
Next-Generation Projections
2026-2027 GPU Generations (industry speculation based on thermal trends):
- GPU TDP: 1,600-2,000W per GPU
- Rack power: 200-300 kW per rack (8-16 GPU configurations)
- Cooling requirement: Advanced direct liquid cooling or single-phase immersion
2028+ Directions:
- On-chip cooling integration: Microfluidic cooling channels integrated into chip packaging
- Immersion standard: Two-phase immersion may become standard for 300+ kW racks
- Cryogenic cooling: Liquid nitrogen or refrigerant-based cooling for extreme densities (experimental)
On-Chip Cooling Integration
Concept: Instead of cold plates attached to chip surfaces, integrate cooling channels directly into chip packaging.
Approaches:
- Microfluidic channels: Microscopic channels etched into silicon interposer or package substrate
- 3D stacking: Cooling layers integrated between stacked chiplets
- Embedded heat pipes: Micro heat pipes in package substrate
Advantages:
- Minimal thermal resistance (cooling at source)
- Supports 2,000W+ per chip
- Enables extreme compute densities
Challenges:
- Manufacturing complexity
- Reliability (leaks catastrophic)
- Cost
- Serviceability (chip-level vs server-level maintenance)
Status: Research phase; demonstrations at 100-500W scale; commercial deployment 5-10 years out.
Heat Reuse Opportunities
District Heating Integration:
Traditional datacenters waste heat (rejected to ambient via cooling towers). Liquid cooling enables heat reuse:
Requirements:
- High temperature: 40-60°C+ coolant return temperature (liquid cooling achieves this)
- Proximity: < 5-10 km to district heating network (heat transfer losses)
- Demand: Nearby residential/commercial heating demand
Economics:
- Revenue: Sell waste heat to district heating operator (€10-30 per MWh thermal typical)
- Sustainability: Dramatically improves datacenter PUE and carbon metrics
- Community relations: Transforms datacenter from energy consumer to community energy provider
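A rough sketch of the revenue range implied by the €10-30/MWh figure above; the facility size, recoverable-heat fraction, and heating-season utilization are assumptions for illustration.

```python
it_load_mw = 50.0        # assumed liquid-cooled IT load
heat_capture = 0.90      # assumed fraction of IT heat recoverable at useful temperature
utilization = 0.60       # assumed fraction of the year with heating demand
hours_per_year = 8760

mwh_thermal = it_load_mw * heat_capture * utilization * hours_per_year
for eur_per_mwh in (10, 30):
    print(f"At €{eur_per_mwh}/MWh(th): ~€{mwh_thermal * eur_per_mwh / 1e6:.1f}M per year")
```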
Deployments:
- Nordic countries: Multiple datacenters selling waste heat (Stockholm, Helsinki, Copenhagen)
- Emerging: Germany, Netherlands, UK exploring district heating integration
- US: Limited (sparse district heating infrastructure), but potential in cold-climate cities
Barriers:
- US lacks district heating infrastructure (common in Europe)
- Requires long-term contracts (datacenter and heating utility commitments)
- Initial capital investment for heat exchange and piping
Future Potential: As liquid cooling becomes standard, heat reuse could transform datacenter economics and sustainability profiles, particularly in cold climates with heating demand matching datacenter waste heat availability.
Water Usage Concerns and Solutions
The Water Problem:
Traditional datacenter cooling consumes enormous water volumes:
- Evaporative cooling towers: 1-2 million gallons per MW per year
- Large datacenter (100 MW): 100-200 million gallons per year
- Arid regions (Arizona, Nevada, Texas): Water stress + growing datacenter demand = regulatory and community opposition
Community Opposition Examples:
- Tucson, Arizona: City council unanimously voted down $250M Amazon datacenter citing water concerns
- Multiple projects delayed or canceled due to water consumption opposition
Liquid Cooling Water Efficiency:
Liquid cooling improves water efficiency through:
- Higher heat capture efficiency (70-98% vs 50-70% air): Less total cooling load to reject
- Higher temperature heat rejection: Enables dry cooling or reduced evaporative cooling hours
- Closed-loop systems: Minimal water consumption (only makeup for leaks)
Waterless Cooling Technologies:
Technology | Water Savings | Power Density | Limitations |
---|---|---|---|
Dry Coolers (air-cooled heat exchangers) | 100% (zero water) | Moderate (limited by ambient temp) | Less effective in hot climates; higher power consumption |
Adiabatic Cooling | 90-95% (water only during peak heat) | High | Complexity; some water still required |
Closed-Loop Liquid Cooling | 95-99% (no evaporative losses) | Very High (100-300 kW) | Higher capital cost |
Regulatory Trends:
- Increasing scrutiny of datacenter water consumption
- Water impact studies required for large project approvals
- Preference/requirements for waterless or low-water cooling in water-stressed regions
Future Direction: Liquid cooling + dry heat rejection will likely become mandatory in arid regions, with water consumption a key metric in project approvals alongside power consumption.
Market Trends and Projections
Immersion Cooling Market Growth
Market Size:
- 2025: $4.87 billion
- 2030: $11.10 billion (projected)
- CAGR: 17.91% (2025-2030)
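A quick consistency check on the figures above, compounding the 2025 base at the quoted CAGR:

```python
start_b, cagr, years = 4.87, 0.1791, 5
end_b = start_b * (1 + cagr) ** years
print(f"Implied 2030 market size: ${end_b:.2f}B")   # ~$11.1B, matching the projection
```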
Growth Drivers:
- AI and ML training workloads (highest power densities)
- Cryptocurrency mining (mature immersion cooling market)
- Edge computing in space-constrained locations
- Sustainability mandates (PUE improvements)
Market Leaders:
- Green Revolution Cooling (GRC) - Single-phase pioneer
- LiquidStack - Two-phase innovation leader
- Asperitas - Passive immersion
- Submer - SmartPod systems
Adoption Barriers:
- Higher upfront costs vs air cooling
- Perceived operational complexity
- Server warranty concerns (diminishing as vendor support grows)
- Conservative datacenter operations culture
Adoption Accelerators:
- NVIDIA and AMD GPUs requiring liquid cooling (forcing market adoption)
- Hyperscaler validation (Meta, xAI, CoreWeave deployments prove immersion at scale)
- Colocation provider offerings (Equinix, CyrusOne making liquid cooling accessible)
- Regulatory pressure on water consumption (favors immersion/liquid vs evaporative air cooling)
Power Density Trends
Historical Evolution:
Era | Years | Rack Density | Cooling Technology | Workload Driver |
---|---|---|---|---|
Traditional | 2000-2015 | 5-10 kW | Air cooling (hot aisle/cold aisle) | General compute, web |
High-Density Air | 2015-2022 | 15-20 kW | Optimized air (containment, economization) | Cloud, virtualization |
Early AI | 2020-2024 | 30-50 kW | Hybrid air/liquid | GPU training (A100, early H100) |
AI Standard | 2024-2026 | 100-140 kW | Direct liquid cooling mandatory | H100, H200, GB200, GB300 |
Next-Gen AI | 2026+ | 200-300+ kW | Immersion cooling | Future GPU generations |
Current Reality (2025):
- Traditional datacenters: 15-20 kW/rack (air-cooled, legacy infrastructure)
- AI infrastructure standard: 100-140 kW/rack (liquid-cooled, greenfield/purpose-built)
- Cutting-edge: 300 kW/rack (CyrusOne Intelliscale, immersion cooling)
Implications:
- Stranded assets: Air-cooled datacenter capacity struggles to attract AI workloads
- Retrofit challenge: Converting air-cooled facilities to liquid cooling is complex and costly
- Greenfield advantage: Purpose-built liquid-cooled facilities command premium pricing
- Colocation evolution: Providers must retrofit or build new to remain competitive
GPU Deployment Scale (Liquid Cooling Drivers)
Major Deployments:
Operator | GPU Count | Cooling Strategy | Timeline |
---|---|---|---|
CoreWeave | 250,000 (end 2024) | 100% liquid cooling for new facilities | Ongoing |
xAI Colossus | 230,000 (150K H100, 50K H200, 30K GB200) | Liquid cooling | 2024-2025 |
Meta Prometheus | 500,000+ planned (2026) | Liquid cooling (Catalina racks) | 2026 |
xAI Future | 1,000,000 target | Liquid cooling | Multi-year |
AWS UltraCluster | 20,000 H100/H200 per cluster | Liquid cooling for dense deployments | Ongoing |
Crusoe Energy | 100,000 per building capacity | Liquid cooling | Planned |
Market Dynamics:
- Total market approaching 1-2 million high-end GPUs deployed in liquid-cooled infrastructure by end 2025
- Each 1,000 GPUs translates to roughly 20-25 liquid-cooled racks (assuming 8 GPUs per server and 5-6 servers per rack; see the helper sketch after this list)
- Liquid cooling infrastructure market driven by GPU procurement (NVIDIA GPU shortages mirror CDU supply constraints)
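A small helper for the rack-count rule of thumb above; the GPUs-per-server and servers-per-rack figures are deployment assumptions, not fixed constants.

```python
import math

def racks_needed(gpu_count, gpus_per_server=8, servers_per_rack=5):
    servers = math.ceil(gpu_count / gpus_per_server)
    return math.ceil(servers / servers_per_rack)

for servers_per_rack in (5, 6):
    print(f"1,000 GPUs at 8 GPUs/server, {servers_per_rack} servers/rack -> "
          f"{racks_needed(1000, servers_per_rack=servers_per_rack)} racks")
```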
Vendor Response:
- CDU manufacturers (Vertiv, Supermicro) expanding production capacity
- Server OEMs (Dell, HPE, Lenovo, Supermicro) making liquid cooling standard option
- Colocation providers retrofitting facilities to capture AI workload demand
Conclusion: The Liquid Cooling Imperative
The transition from air cooling to liquid cooling represents a fundamental shift in datacenter infrastructure, driven by the inexorable physics of AI accelerator thermal management. As GPU power consumption increases from 700W (H100) to 1,400W+ (B300) and rack densities reach 140 kW (GB300 NVL72) with projections toward 300+ kW, liquid cooling has evolved from an exotic technology for supercomputers to a business requirement for competitive AI infrastructure.
Key Takeaways
- Technology Maturity: Direct liquid cooling is proven at scale (CoreWeave, xAI, Meta) and commercially available from multiple vendors (Vertiv, Supermicro, HPE, Dell, Lenovo).
- Economic Advantage: 40% power savings, 60% footprint reduction, and 20% lower TCO make liquid cooling economically superior for AI workloads despite higher upfront costs.
- Environmental Benefits: 70-98% heat capture efficiency and 40% lower water consumption address growing sustainability and regulatory pressures.
- Market Momentum: $4.87B immersion cooling market growing 17.91% annually; all major AI infrastructure operators adopting liquid cooling as standard.
- Future-Proof: Only liquid cooling scales to the 200-300+ kW rack densities required for next-generation AI accelerators; air cooling has reached its physical limits.
- Vendor Ecosystem: A mature ecosystem spanning CDUs (Vertiv), immersion tanks (GRC, LiquidStack), and integrated server solutions (Supermicro, HPE) ensures competitive procurement and multi-source supply chains.
- Colocation Adoption: Equinix, CyrusOne, Aligned, and others are making liquid cooling available to customers without requiring purpose-built facilities, democratizing access to high-density AI infrastructure.
Strategic Implications
For AI Infrastructure Operators:
- Liquid cooling is now mandatory, not optional for competitive AI infrastructure
- Plan for 140 kW racks as baseline; design for 200+ kW for future-proofing
- Prioritize vendors with proven liquid cooling deployments at scale
- Consider waterless cooling for locations with water stress or regulatory constraints
For Datacenter Developers:
- Greenfield facilities should be designed liquid-cooling-first (avoiding costly retrofits)
- Partner with liquid cooling vendors early in design phase
- Plan for 2-3x power density headroom (140 kW today, 200-300 kW tomorrow)
- Water-free cooling may be mandatory in arid regions (regulatory and community pressure)
For Colocation Providers:
- Retrofit existing facilities or build new to remain competitive for AI workloads
- Offer liquid cooling as standard service (not premium option)
- Educate customers on benefits (density, efficiency, sustainability)
- Invest in operational expertise (maintenance, fluid management)
For Technology Vendors:
- Server OEMs: Integrate liquid cooling in design (not afterthought)
- Cooling vendors: Expand production capacity (demand exceeds supply)
- Chip designers: Prepare for on-chip cooling integration (5-10 year horizon)
The Path Forward
The datacenter industry is in the midst of its most significant infrastructure transformation since the adoption of virtualization. Liquid cooling, once limited to supercomputing and niche applications, is becoming the standard thermal management approach for the AI era. As GPU power consumption continues to increase and rack densities approach 300 kW, the question is no longer whether to adopt liquid cooling, but which liquid cooling technology best fits specific deployment requirements.
Organizations that embrace this transition early will gain competitive advantages in performance, efficiency, and cost. Those that delay risk being left with stranded air-cooled assets unable to support the AI workloads driving datacenter demand growth. The liquid revolution is here, and the infrastructure decisions made today will determine competitive positioning for the next decade.
Data sources: NVIDIA specifications, vendor technical documentation, industry deployments as of October 2025. Market projections from Mordor Intelligence and industry analysts. Deployment data from public company announcements and technical publications.