How Thermal Management is Changing in the Age of the Kilowatt Chip - Slashdot
source link: https://tech.slashdot.org/story/23/12/27/1320226/how-thermal-management-is-changing-in-the-age-of-the-kilowatt-chip
Now that the first systems based on the GH200 are making their way to market, it's become clear that form factor is being dictated more by power density than by anything else. It essentially boils down to how much surface area you have to dissipate the heat. Dig through the systems available today from Supermicro, Gigabyte, QCT, Pegatron, HPE, and others and you'll quickly notice a trend. Up to about 500 W per rack unit (RU), or 1 kW in the case of Supermicro's MGX ARS-111GL-NHR, these systems are largely air cooled. While hot, that is still a manageable thermal load to dissipate, working out to about 21-24 kW per rack. That's well within the power delivery and thermal management capacity of modern datacenters, especially those making use of rear door heat exchangers.
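The 21-24 kW per rack figure follows from simple multiplication. Here is a minimal sketch of that arithmetic, assuming fully populated racks of 42U and 48U; the excerpt does not state the rack heights, so those two values are an assumption chosen because they reproduce the quoted range. The function name `rack_load_kw` is illustrative.

```python
def rack_load_kw(watts_per_ru: float, rack_units: int) -> float:
    """Thermal load of a fully populated rack, in kilowatts."""
    return watts_per_ru * rack_units / 1000.0


# Assumption: 42U and 48U racks (heights not given in the article).
# 500 W/RU is the air-cooled figure quoted in the text.
for height in (42, 48):
    load = rack_load_kw(500, height)
    print(f"{height}U rack at 500 W/RU: {load:.0f} kW")
```

Real racks rarely devote every rack unit to compute, so this is an upper bound for a given density rather than a deployment estimate.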
However, this changes when system builders start cramming more than a kilowatt of accelerators into each chassis. At that point most of the OEM systems we looked at switched to direct liquid cooling. Gigabyte's H263-V11, for example, offers up to four GH200 nodes in a single 2U chassis. At roughly a kilowatt per node, that's two kilowatts per rack unit. So while a system like Nvidia's air-cooled DGX H100, with its eight 700 W H100s and twin Sapphire Rapids CPUs, has a higher TDP at 10.2 kW, it's actually less power dense at 1.2 kW/RU.
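The comparison above is a power-per-rack-unit calculation, sketched below under two assumptions that the excerpt does not state outright: each GH200 node draws about 1 kW (implied by the 2 kW/RU figure for four nodes in 2U), and the DGX H100 occupies an 8U chassis. The function name `power_density_kw_per_ru` is illustrative.

```python
def power_density_kw_per_ru(chassis_power_w: float, chassis_height_ru: int) -> float:
    """Chassis power divided by chassis height, in kW per rack unit."""
    return chassis_power_w / chassis_height_ru / 1000.0


# Gigabyte H263-V11: four GH200 nodes in 2U.
# Assumption: ~1000 W per node, implied by the article's 2 kW/RU figure.
gigabyte = power_density_kw_per_ru(4 * 1000, 2)

# Nvidia DGX H100: 10.2 kW total, per the article.
# Assumption: 8U chassis height (not stated in the article).
dgx = power_density_kw_per_ru(10_200, 8)

print(f"Gigabyte H263-V11: {gigabyte:.2f} kW/RU")
print(f"Nvidia DGX H100:   {dgx:.2f} kW/RU")
```

With the assumed 8U height, the DGX H100 works out to a little under 1.3 kW/RU, in line with the roughly 1.2 kW/RU quoted above, and well below the 2 kW/RU of the liquid-cooled 2U system despite the higher total draw.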