2

Amazon EC2 的 Trn1 正式開放使用

 1 year ago
source link: https://blog.gslin.org/archives/2022/10/17/10924/amazon-ec2-%e7%9a%84-trn1-%e6%ad%a3%e5%bc%8f%e9%96%8b%e6%94%be%e4%bd%bf%e7%94%a8/
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
neoserver,ios ssh client

Amazon EC2 的 Trn1 正式開放使用

AWS 自家研發晶片的 trn1.* 上線了:「Amazon EC2 Trn1 Instances for High-Performance Model Training are Now Available」。

先前三家雲端的廠商只有 Google Cloud PlatformTPU 可以 train & evaluate,現在 AWS 推出 AWS Trainium,補上 train 這塊的產品。其中官方宣稱可以比 GPU 架構少 50% 的計算成本:

Trainium-based EC2 Trn1 instances solve this challenge by delivering faster time-to-train while offering up to 50% cost-to-train savings over comparable GPU-based instances.

然後 PyTorchTensorFlow 都有支援:

The Neuron plugin natively integrates with popular ML frameworks, such as PyTorch and TensorFlow.

另外用 neuron-ls 可以看到 Neuron 裝置的資訊,不過沒看懂為什麼要 mask 掉 private ip 的資訊:

Jlp7Q7Z.png

大型的 cluster 會使用 Amazon FSx for Lustre 整合提供服務:

For large-scale model training, Trn1 instances integrate with Amazon FSx for Lustre high-performance storage and are deployed in EC2 UltraClusters. EC2 UltraClusters are hyperscale clusters interconnected with a non-blocking petabit-scale network.

IyMFZT3.png

但第一波開放的區域有點少,只有萬年美東一區 us-east-1 與美西二區 us-west-2

You can launch Trn1 instances today in the AWS US East (N. Virginia) and US West (Oregon) Regions as On-Demand, Reserved, and Spot Instances or as part of a Savings Plan.

us-east-1trn1.2xlarge 的價錢是 US$1.34375/hr,但沒有實際跑過比較好像沒辦法評估到底行不行...

但總算是擺出個產品對打看看,畢竟要夠大才能去訂製這些東西。

Related

AWS 開始推自己的 Machine Learning Chip

除了常見的 GPU 類,以及之前公佈過的 FPGA 外,這次 AWS 推出的是自己做的晶片 AWS Inferentia,以及對應到 EC2 上的機種 inf1:「Amazon EC2 Update – Inf1 Instances with AWS Inferentia Chips for High Performance Cost-Effective Inferencing」。 從介紹可以看到支援的形式: Each AWS Inferentia chip supports up to 128 TOPS (trillions of operations per second) of performance at low power to enable multiple chips…

December 4, 2019

In "AWS"

Amazon EC2 的 F1 type 開放一般使用

AWS 提供更快計算 Bitcoin 的 FPGA 機種開放一般使用了:「Amazon EC2 F1 Instances, Customizable FPGAs for Hardware Acceleration Are Now Generally Available」。 在 AWS 開始提供服務後,應該會有更多 library 支援吧... 現在現有的應用要上去還得自己先刻些東西,不像 TensorFlow 可以透過 GPU 運算。 F1 instances include the latest 16 nm Xilinx UltraScale Plus FPGA with local 64 GiB DDR4 ECC protected memory, with a dedicated…

April 30, 2017

In "AWS"

Amazon EC2 推出 VT1 Instance

看到 Amazon EC2 推出新機種 vt1,專門為影片壓縮而推出的 family type:「New – Amazon EC2 VT1 Instances for Live Multi-stream Video Transcoding」。 主要是透過 Alveo U30 Data Center Accelerator Card 這張卡加速,號稱比 GPU 機器還要省 30% 的費用 (CPU 的話可以到 60%): These VT1 instances feature Xilinx® Alveo™ U30 media accelerator transcoding cards with accelerated H.264/AVC and H.265/HEVC codecs and…

September 16, 2021

In "AWS"

a611ee8db44c8d03a20edf0bf5a71d80?s=49&d=identicon&r=gAuthor Gea-Suan LinPosted on October 17, 2022Categories AWS, Cloud, Computer, Hardware, Murmuring, Network, ServiceTags amazon, aws, cloud, cost, ec2, family, high, instance, learning, machine, ml, model, neuron, performance, service, train, training, trainium, trn1, type

Leave a Reply

Your email address will not be published. Required fields are marked *

Comment *

Name *

Email *

Website

Notify me of follow-up comments by email.

Notify me of new posts by email.

To respond on your own website, enter the URL of your response which should contain a link to this post's permalink URL. Your response will then appear (possibly after moderation) on this page. Want to update or remove your response? Update or delete your post and re-enter your post's URL again. (Learn More)

Post navigation


Recommend

About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK