Amazon Web Services (AWS) has announced a major leap in AI infrastructure by designing its own in-row cooling technology to handle the heat generated by Nvidia's new GPU-powered AI systems. As the generative AI arms race accelerates, traditional air cooling is proving inadequate for the heat thrown off by next-gen systems like Nvidia's GB200 NVL72, which packs up to 72 GPUs into a single rack.
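A rough back-of-the-envelope estimate shows why. The per-GPU draw and the air-cooled rack ceiling below are industry ballpark figures assumed for illustration, not numbers from AWS or Nvidia:

```python
# Back-of-the-envelope: why air cooling falls short for a GB200 NVL72 rack.
# Assumptions (ballpark figures, not from AWS or Nvidia announcements):
#   ~1.2 kW per Blackwell GPU, plus CPU/networking/power-conversion overhead;
#   a conventional air-cooled rack handles roughly 10-20 kW.

GPUS_PER_RACK = 72
WATTS_PER_GPU = 1200        # assumed per-GPU draw, in watts
OVERHEAD_FACTOR = 1.4       # assumed CPUs, NVLink switches, fans, PSU losses

rack_kw = GPUS_PER_RACK * WATTS_PER_GPU * OVERHEAD_FACTOR / 1000
print(f"Estimated rack load: ~{rack_kw:.0f} kW")      # ~121 kW

AIR_COOLED_LIMIT_KW = 20    # generous ceiling for traditional air cooling
print(f"~{rack_kw / AIR_COOLED_LIMIT_KW:.0f}x a typical air-cooled rack")
```

Under these assumptions, a single NVL72-class rack dissipates on the order of six times what a conventional air-cooled rack is built to handle.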
Rather than relying on external suppliers or building new data centers from scratch, AWS engineers created the In-Row Heat Exchanger (IRHX)—a plug-and-play solution tailored for both new and existing data centers.
Nvidia GPUs have become the cornerstone of AI model training and deployment, but their intense power consumption demands efficient heat dissipation systems. AWS initially considered third-party liquid cooling solutions and even building new liquid-cooled data centers. However, those plans were abandoned due to inefficiency and lack of scalability.
“They would take up too much data center floor space or increase water usage substantially,” said Dave Brown, AWS’s Vice President of Compute and Machine Learning Services.
“And while some solutions work in smaller volumes, they simply can't support AWS’s global scale.”
Instead, the company chose to innovate internally—leading to the creation of the IRHX.
The In-Row Heat Exchanger fits between server racks and circulates chilled water to absorb and carry away heat. It enables AWS to scale up GPU workloads without rebuilding data centers or compromising its sustainability targets, and it efficiently cools Nvidia's Blackwell GPU-based systems, which combine extreme performance with extreme rack density.
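AWS has not published IRHX specifications, but the basic physics of water cooling shows why the approach scales. A minimal sketch using the standard sensible-heat relation Q = m_dot * c_p * dT, with an assumed ~120 kW rack load and a 10 °C coolant temperature rise (both assumptions, not published figures):

```python
# Sketch: chilled-water flow needed to absorb one GB200-class rack's heat.
# Uses the standard sensible-heat relation Q = m_dot * c_p * dT.
# The 120 kW load and 10 degC rise are assumptions, not published IRHX specs.

RACK_LOAD_W = 120_000    # assumed rack heat load, watts
CP_WATER = 4186          # specific heat of water, J/(kg*K)
DELTA_T = 10             # assumed coolant temperature rise, K

m_dot = RACK_LOAD_W / (CP_WATER * DELTA_T)    # mass flow, kg/s
liters_per_min = m_dot * 60                   # 1 kg of water ~= 1 liter

print(f"Required flow: ~{m_dot:.1f} kg/s (~{liters_per_min:.0f} L/min)")
# -> ~2.9 kg/s, or ~172 L/min, per rack
```

Roughly 170 liters per minute of water can carry off what would otherwise require enormous volumes of chilled air, which is why liquid cooling wins at this density.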
Customers can now access these systems through AWS's P6e compute instances, which support large-scale AI model training and inference on Nvidia's latest architecture.
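Provisioning these instances works like any other EC2 instance type. A minimal sketch using boto3, AWS's Python SDK, follows; the instance type string and AMI ID are placeholders assumed for illustration, so check AWS's EC2 documentation for the exact P6e instance names available in your region:

```python
# Minimal sketch: launching a P6e instance via boto3 (AWS's Python SDK).
# The InstanceType and ImageId values below are placeholder assumptions,
# not values confirmed by this article; consult the EC2 docs.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",  # placeholder: a Deep Learning AMI ID
    InstanceType="u-p6e-gb200x72",    # placeholder: verify the exact P6e size
    MinCount=1,
    MaxCount=1,
)
print(response["Instances"][0]["InstanceId"])
```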
AWS, the world’s largest cloud infrastructure provider, has a long-standing tradition of designing in-house hardware—from custom Graviton chips to Trainium AI accelerators and now its own cooling systems. By reducing dependency on external vendors, AWS improves cost efficiency, accelerates deployment speed, and gains more control over its data center ecosystems.
In Q1 2025, AWS posted its highest operating margin since at least 2014, highlighting the financial benefits of vertical integration in cloud infrastructure.
AWS is not alone in customizing hardware to meet AI demands. Microsoft, the world’s second-largest cloud provider, introduced its “Sidekicks” cooling system in 2023 to support its internally developed Maia AI chips. Companies like CoreWeave have also partnered with Nvidia to deploy high-performance AI clusters, but AWS’s scale and ability to engineer in-house solutions give it a unique edge.
With the global AI infrastructure market expanding rapidly, the focus is shifting from raw chip performance to sustainable, efficient systems that can support those chips over the long term.
As AI workloads become increasingly power-hungry, AWS’s proactive approach—designing custom cooling solutions like the IRHX—positions it ahead of the curve. By tackling the thermal challenges of modern GPUs, Amazon is not only ensuring the future-readiness of its data centers but also reinforcing its leadership in the high-stakes cloud and AI war.
With Nvidia’s chips at the heart of this evolution, infrastructure innovations like these will define who leads the next generation of computing.