How Thermal Management is Changing in the Age of the Kilowatt Chip - Slashdot - JOYK Joy of Geek, Geek News, Link all geek

How Thermal Management is Changing in the Age of the Kilowatt Chip

An anonymous reader shares a report: As Moore's Law slowed to a crawl, chips, particularly those used in AI and high-performance computing (HPC), have steadily gotten hotter. In 2023 we saw accelerators enter the kilowatt range with the arrival of Nvidia's GH200 Superchips. We've known these chips would be hot for a while now -- Nvidia has been teasing the CPU-GPU franken-chip for the better part of two years. What we didn't know until recently is how OEMs and systems builders would respond to such a power-dense part. Would most of the systems be liquid cooled? Or, would most stick to air cooling? How many of these accelerators would they try to cram into a single box, and how big would the box be?

Now that the first systems based on the GH200 are making their way to market, it's become clear that form factor is being dictated more by power density than by anything else. It essentially boils down to how much surface area you have to dissipate the heat. Dig through the systems available today from Supermicro, Gigabyte, QCT, Pegatron, HPE, and others and you'll quickly notice a trend. Up to about 500 W per rack unit (RU) -- 1 kW in the case of Supermicro's MGX ARS-111GL-NHR -- these systems are largely air cooled. While hot, that is still a manageable thermal load to dissipate, working out to about 21-24 kW per rack. That's well within the power delivery and thermal management capacity of modern datacenters, especially those making use of rear door heat exchangers.
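The 21-24 kW per rack figure is consistent with multiplying 500 W per RU across a fully populated rack. A minimal sketch of that arithmetic, assuming rack heights of 42U and 48U (the report does not state which rack sizes it used) and a hypothetical helper named rack_power_kw:

```python
def rack_power_kw(watts_per_ru: float, rack_units: int) -> float:
    """Total rack load in kW if every rack unit draws watts_per_ru."""
    return watts_per_ru * rack_units / 1000.0


# Assumption: 42U and 48U racks, every unit filled with 500 W/RU systems.
for rack_units in (42, 48):
    load = rack_power_kw(500, rack_units)
    print(f"{rack_units}U rack at 500 W/RU: {load:.1f} kW")
```

In practice some rack units go to switches and power distribution, so real racks would likely land a little below these numbers.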

However, this changes when system builders start cramming more than a kilowatt of accelerators into each chassis. At this point most of the OEM systems we looked at switched to direct liquid cooling. Gigabyte's H263-V11, for example, offers up to four GH200 nodes in a single 2U chassis. That's two kilowatts per rack unit. So while a system like Nvidia's air-cooled DGX H100 with its eight 700 W H100s and twin Sapphire Rapids CPUs has a higher TDP at 10.2 kW, it's actually less power dense at 1.2 kW/RU.
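The power density comparison above is total chassis power divided by chassis height. A rough sketch, assuming about 1 kW per GH200 node and an 8U chassis for the DGX H100 (neither value is stated in the report); the helper name power_density_kw_per_ru is hypothetical:

```python
def power_density_kw_per_ru(total_kw: float, chassis_ru: int) -> float:
    """Power density in kW per rack unit for a single chassis."""
    return total_kw / chassis_ru


# Gigabyte H263-V11: four GH200 nodes in 2U, assuming ~1 kW per node.
gigabyte = power_density_kw_per_ru(4 * 1.0, 2)

# Nvidia DGX H100: 10.2 kW total, assuming an 8U chassis.
dgx = power_density_kw_per_ru(10.2, 8)

print(f"H263-V11: {gigabyte:.2f} kW/RU")
print(f"DGX H100: {dgx:.2f} kW/RU")
```

Under these assumptions the DGX H100 comes out near 1.3 kW/RU, in the same range as the roughly 1.2 kW/RU quoted, and well below the liquid-cooled 2U system.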
