source link: https://www.neowin.net/news/nvidia-announces-tensorrt-llm-for-windows-that-boosts-llms-by-up-to-4-times-with-rtx-gpus/
NVIDIA announces TensorRT-LLM for Windows that boosts LLMs by up to 4 times with RTX GPUs

Green electronic Nvidia logo on a dark background

NVIDIA is already the king of generative AI in terms of hardware. Its GPUs power the data centers that Microsoft, OpenAI, and others use to run AI services like Bing Chat, ChatGPT, and more. Today, NVIDIA announced a new software tool designed to boost the performance of large language models (LLMs) on local Windows PCs.

In a blog post, NVIDIA announced that its open-source TensorRT-LLM library, which was previously released for data centers, is now available for Windows PCs. The headline feature is that TensorRT-LLM allows LLMs to run up to four times faster on Windows PCs equipped with NVIDIA GeForce RTX GPUs.
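For developers, using the library looks roughly like the minimal sketch below. It assumes TensorRT-LLM's high-level Python LLM API and a LLaMA 2 model name purely for illustration; the exact interface and models available in the initial Windows release may differ, so treat this as a sketch rather than the official workflow.

```python
# Minimal sketch of generating text with TensorRT-LLM's high-level Python API.
# The model name and the availability of this API in the Windows release are
# assumptions for illustration; check NVIDIA's developer site for specifics.
from tensorrt_llm import LLM, SamplingParams

# Loading the model builds (or reuses) a TensorRT engine optimized for the GPU.
llm = LLM(model="meta-llama/Llama-2-7b-chat-hf")
params = SamplingParams(temperature=0.8, top_p=0.95)

# Batching several prompts at once is where the speedup is most visible.
prompts = [
    "Write a haiku about GPUs.",
    "Summarize what TensorRT-LLM does in one sentence.",
]
for output in llm.generate(prompts, params):
    print(output.outputs[0].text)
```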

NVIDIA describes the benefits of TensorRT-LLM for both developers and end users in the post:

At higher batch sizes, this acceleration significantly improves the experience for more sophisticated LLM use — like writing and coding assistants that output multiple, unique auto-complete results at once. The result is accelerated performance and improved quality that lets users select the best of the bunch.


The blog post showed an example of how this works in practice. When the question "How does NVIDIA ACE generate emotional responses?" was put to the base LLaMA 2 model on its own, it failed to offer an accurate response.

However, when the LLM was paired with a vector library or vector database and asked the same question, it not only generated an accurate answer, but TensorRT-LLM also delivered the response faster. TensorRT-LLM should be available soon on NVIDIA's developer site.
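As a rough illustration of the retrieval step described above, here is a minimal, self-contained sketch. The embed() and generate() functions are hypothetical stand-ins, not part of any NVIDIA API; in a real setup you would swap in an actual embedding model, a vector database, and a TensorRT-LLM-accelerated model.

```python
import numpy as np

# Hypothetical stand-ins: replace with a real embedding model and a
# TensorRT-LLM-accelerated LLM. These names are illustrative only.
def embed(text: str) -> np.ndarray:
    # Toy deterministic embedding so the example runs anywhere;
    # not semantically meaningful.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.standard_normal(64)

def generate(prompt: str) -> str:
    return f"[LLM response to a prompt of {len(prompt)} characters]"

# A tiny "vector database": documents plus their embeddings.
documents = [
    "NVIDIA ACE generates emotional responses by combining speech, "
    "animation, and language models for game characters.",
    "TensorRT-LLM accelerates large language model inference on RTX GPUs.",
    "RTX Video Super Resolution 1.5 improves upscaling of online video.",
]
doc_vectors = np.stack([embed(d) for d in documents])

def retrieve(query: str, k: int = 1) -> list[str]:
    """Return the k documents most similar to the query (cosine similarity)."""
    q = embed(query)
    sims = doc_vectors @ q / (
        np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(q) + 1e-9
    )
    return [documents[i] for i in np.argsort(sims)[::-1][:k]]

question = "How does NVIDIA ACE generate emotional responses?"
context = "\n".join(retrieve(question))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
print(generate(prompt))
```

The key idea is that the retrieved passage is prepended to the prompt, so the model answers from current, domain-specific text rather than from its training data alone, while TensorRT-LLM speeds up the generation step itself.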

NVIDIA also added some AI-based features in today's new GeForce driver update. These include version 1.5 of its RTX Video Super Resolution feature, which offers better upscaling and fewer compression artifacts when watching online videos. The update also adds TensorRT AI acceleration for the Stable Diffusion Web UI, letting people with GeForce RTX GPUs generate images from the AI art tool faster than before.

