Amazon EC2 Trn1 Instances Are Now Generally Available
source link: https://blog.gslin.org/archives/2022/10/17/10924/amazon-ec2-%e7%9a%84-trn1-%e6%ad%a3%e5%bc%8f%e9%96%8b%e6%94%be%e4%bd%bf%e7%94%a8/
The trn1.* instances, built on AWS's in-house chip, are now live: "Amazon EC2 Trn1 Instances for High-Performance Model Training are Now Available".
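Until now, of the three major cloud providers, only Google Cloud Platform had its own accelerator (TPU) that could both train and evaluate. With AWS Trainium, AWS fills the training gap in its lineup. The official claim is up to 50% lower compute cost than GPU-based instances: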
Trainium-based EC2 Trn1 instances solve this challenge by delivering faster time-to-train while offering up to 50% cost-to-train savings over comparable GPU-based instances.
Both PyTorch and TensorFlow are supported:
The Neuron plugin natively integrates with popular ML frameworks, such as PyTorch and TensorFlow.
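Also, neuron-ls shows information about the Neuron devices, though I don't understand why the private IP had to be masked in the output (see the screenshot at the source link).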
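As a rough illustration of reading that device information from a script, here is a minimal Python sketch. It assumes the neuron-ls binary is on PATH and accepts a --json-output flag; the flag and the helper name list_neuron_devices are assumptions for this example, not something the post shows. The shape of the parsed data is not inspected, since the post does not show it.

```python
import json
import shutil
import subprocess


def list_neuron_devices(binary="neuron-ls"):
    """Return the parsed JSON output of neuron-ls, or None if unavailable.

    Assumption: the tool supports a --json-output flag. If the binary is
    missing, exits with an error, or prints something that is not JSON,
    this helper returns None instead of raising.
    """
    path = shutil.which(binary)
    if path is None:
        return None  # tool not installed (e.g. not running on a Trn1 instance)
    result = subprocess.run(
        [path, "--json-output"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return None
    try:
        return json.loads(result.stdout)
    except json.JSONDecodeError:
        return None


devices = list_neuron_devices()
if devices is None:
    print("neuron-ls not available on this machine")
else:
    print(devices)
```

On a machine without the Neuron tools the helper simply returns None, so the sketch can be tried anywhere before moving to an actual Trn1 instance.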
Large clusters integrate with Amazon FSx for Lustre for storage:
For large-scale model training, Trn1 instances integrate with Amazon FSx for Lustre high-performance storage and are deployed in EC2 UltraClusters. EC2 UltraClusters are hyperscale clusters interconnected with a non-blocking petabit-scale network.
The first wave of regions is rather limited, though: only the perennial US East region us-east-1 and the US West region us-west-2:
You can launch Trn1 instances today in the AWS US East (N. Virginia) and US West (Oregon) Regions as On-Demand, Reserved, and Spot Instances or as part of a Savings Plan.
In us-east-1, trn1.2xlarge is priced at US$1.34375/hr, but without actually running a comparison it seems hard to judge whether it is any good...
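To make the "cost-to-train" comparison concrete, the arithmetic can be sketched in Python as below. Only the trn1.2xlarge hourly price comes from the post; the GPU hourly price and both run times are hypothetical placeholders that would have to be replaced with measured values from a real training job.

```python
# Hourly on-demand price for trn1.2xlarge in us-east-1, as quoted in the post.
TRN1_2XLARGE_USD_PER_HOUR = 1.34375


def cost_to_train(usd_per_hour, hours):
    """Total cost of one training run: hourly price times wall-clock hours."""
    return usd_per_hour * hours


def relative_savings(candidate_cost, baseline_cost):
    """Fraction saved by the candidate versus the baseline (0.5 means 50%)."""
    return 1.0 - candidate_cost / baseline_cost


# Hypothetical inputs: these numbers are NOT from the post or from AWS pricing.
hypothetical_gpu_usd_per_hour = 3.00
hypothetical_gpu_hours = 8.0
hypothetical_trn1_hours = 10.0

trn1_cost = cost_to_train(TRN1_2XLARGE_USD_PER_HOUR, hypothetical_trn1_hours)
gpu_cost = cost_to_train(hypothetical_gpu_usd_per_hour, hypothetical_gpu_hours)
print(f"trn1: ${trn1_cost:.2f}, gpu: ${gpu_cost:.2f}, "
      f"savings: {relative_savings(trn1_cost, gpu_cost):.1%}")
```

The point of the sketch is that the hourly price alone says little: the comparison depends on how long the same job takes on each instance type, which is exactly the measurement that is missing here.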
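Still, AWS has at least put a product out there to compete; after all, you have to be big enough to commission custom silicon like this.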
Author: Gea-Suan Lin
Posted on: October 17, 2022
Categories: AWS, Cloud, Computer, Hardware, Murmuring, Network, Service
Tags: amazon, aws, cloud, cost, ec2, family, high, instance, learning, machine, ml, model, neuron, performance, service, train, training, trainium, trn1, type