Amazon EC2 Trn1 Instances Are Now Generally Available

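The trn1.* instances, built on AWS's own in-house chip, have launched: "Amazon EC2 Trn1 Instances for High-Performance Model Training are Now Available".

Until now, of the three major cloud vendors, only Google Cloud Platform had its own chip (the TPU) for both train & evaluate. With AWS Trainium, AWS fills in the training side of its lineup. AWS claims it can cut compute cost by up to 50% compared with GPU-based instances: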

Trainium-based EC2 Trn1 instances solve this challenge by delivering faster time-to-train while offering up to 50% cost-to-train savings over comparable GPU-based instances.

PyTorch and TensorFlow are both supported:

The Neuron plugin natively integrates with popular ML frameworks, such as PyTorch and TensorFlow.

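You can also run neuron-ls to see information about the Neuron devices, though I could not work out why the private IP is masked out in its output.

Large clusters are served through integration with Amazon FSx for Lustre: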
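For anyone who wants to capture that device listing from a script, here is a minimal sketch. It assumes only that the `neuron-ls` binary is on the PATH of a Trn1 instance with the Neuron tools installed; the helper name `run_neuron_ls` is my own, and no flags are passed because I have not verified which options a given SDK version accepts.

```python
import shutil
import subprocess
from typing import Optional


def run_neuron_ls() -> Optional[str]:
    """Return the text printed by `neuron-ls`, or None if the tool is not installed.

    Hypothetical helper: it runs the command with no arguments and does not
    try to parse the table, since the exact output format is an assumption.
    """
    exe = shutil.which("neuron-ls")
    if exe is None:
        return None
    result = subprocess.run([exe], capture_output=True, text=True, check=False)
    return result.stdout


if __name__ == "__main__":
    output = run_neuron_ls()
    if output is None:
        print("neuron-ls not found; are the Neuron tools installed?")
    else:
        print(output)
```

On a machine without the Neuron tools the function simply returns None, so the same script can be run anywhere without failing.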

For large-scale model training, Trn1 instances integrate with Amazon FSx for Lustre high-performance storage and are deployed in EC2 UltraClusters. EC2 UltraClusters are hyperscale clusters interconnected with a non-blocking petabit-scale network.

The first wave of regions is rather small, though: only the perennial us-east-1 (US East) and us-west-2 (US West):

You can launch Trn1 instances today in the AWS US East (N. Virginia) and US West (Oregon) Regions as On-Demand, Reserved, and Spot Instances or as part of a Savings Plan.

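In us-east-1, trn1.2xlarge is priced at US$1.34375/hr, but without actually running a comparison there seems to be no way to judge whether it is any good...

Still, AWS has at least put a product on the table to compete; after all, you have to be big enough to have custom chips like this made.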
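To make the "up to 50% cost-to-train savings" claim concrete, here is a rough sketch of the arithmetic, assuming cost-to-train is simply hourly price times wall-clock hours. Only the trn1.2xlarge price comes from this post; the GPU price and duration are made-up placeholders, and the function names are my own.

```python
# On-Demand price for trn1.2xlarge in us-east-1, as quoted in this post.
TRN1_2XLARGE_USD_PER_HOUR = 1.34375


def cost_to_train(usd_per_hour: float, hours: float) -> float:
    """Cost of one training run, assumed to be hourly price times hours."""
    return usd_per_hour * hours


def max_trn1_hours_for_savings(
    gpu_usd_per_hour: float,
    gpu_hours: float,
    savings: float = 0.5,
    trn1_usd_per_hour: float = TRN1_2XLARGE_USD_PER_HOUR,
) -> float:
    """Longest Trn1 run that still saves `savings` (a fraction) versus the GPU run."""
    gpu_cost = cost_to_train(gpu_usd_per_hour, gpu_hours)
    return (1.0 - savings) * gpu_cost / trn1_usd_per_hour


if __name__ == "__main__":
    # Hypothetical GPU baseline: both numbers are placeholders, not real prices.
    gpu_price, gpu_hours = 3.00, 10.0
    print("Trn1 cost for 24 hours:", cost_to_train(TRN1_2XLARGE_USD_PER_HOUR, 24))
    print("Max Trn1 hours for 50% savings:",
          max_trn1_hours_for_savings(gpu_price, gpu_hours))
```

The point of the second function is that the hourly price alone says nothing: whether the savings materialize depends on how long the same job takes on Trn1, which is exactly the part that needs a real benchmark.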

Author: Gea-Suan Lin. Posted on October 17, 2022. Categories: AWS, Cloud, Computer, Hardware, Murmuring, Network, Service. Tags: amazon, aws, cloud, cost, ec2, family, high, instance, learning, machine, ml, model, neuron, performance, service, train, training, trainium, trn1, type
