7

llama.cpp 的載入速度加速

 1 year ago
source link: https://blog.gslin.org/archives/2023/04/01/11123/llama-cpp-%e7%9a%84%e8%bc%89%e5%85%a5%e9%80%9f%e5%ba%a6%e5%8a%a0%e9%80%9f/
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

llama.cpp 的載入速度加速

Hacker News 上看到「Llama.cpp 30B runs with only 6GB of RAM now (github.com/ggerganov)」這個消息,原 pull request 在「Make loading weights 10-100x faster #613」這邊。

這個 PR 的作者 Justine Tunney 在 PR 上有提到他改變 model 檔案格式,以便改用 mmap(),大幅降低了需要預先讀取的時間 (因為變成 lazy-loading style),而且這也讓系統可以利用 cache page,避免了 double buffering 的問題:

This was accomplished by changing the file format so we can mmap() weights directly into memory without having to read() or copy them thereby ensuring the kernel can make its file cache pages directly accessible to our inference processes; and secondly, that the file cache pages are much less likely to get evicted (which would force loads to hit disk) because they're no longer competing with memory pages that were needlessly created by gigabytes of standard i/o.

這讓我想到在資料庫領域中,PostgreSQL 也會用 mmap() 操作,有點類似的概念。

另外 Justine Tunney 在這邊的 comment 有提到一個意外觀察到的現象,他發現實際在計算的時候用到的 model 內容意外的少:他用一個簡單的 prompt 測試,發現 20GB 的 30B model 檔案在他的 Intel 機器上實際只用到了 1.6GB 左右:

If I run 30B on my Intel machine:

[...]

As we can see, 400k page faults happen, which means only 1.6 gigabytes ((411522 * 4096) / (1024 * 1024)) of the 20 gigabyte weights file actually needed to be used.

這點他還在懷疑是不是他的修改有 bug,但目前他覺得不太像,也看不太出來:

Now, since my change is so new, it's possible my theory is wrong and this is just a bug. I don't actually understand the inner workings of LLaMA 30B well enough to know why it's sparse. Maybe we made some kind of rare mistake where llama.cpp is somehow evaluating 30B as though it were the 7B model. Anything's possible, however I don't think it's likely. I was pretty careful in writing this change, to compare the deterministic output of the LLaMA model, before and after the Git commit occurred. I haven't however actually found the time to reconcile the output of LLaMA C++ with something like PyTorch. It'd be great if someone could help with that, and possibly help us know why, from more a data science (rather than systems engineering perspective) why 30B is sparse.

如果不是 bug 的話,這其實冒出了一個很有趣的訊號,表示這些 model 是有可能再瘦身的?

Related

Meltdown 與 Spectre 都有用到的 FLUSH+RELOAD

Meltdown 與 Spectre 攻擊裡都有用到的 FLUSH+RELOAD 技巧。這個技巧是出自於 2013 年的「Flush+Reload: a High Resolution, Low Noise, L3 Cache Side-Channel Attack」。當時還因此對 GnuPG 發了一個 CVE-2013-4242。 FLUSH+RELOAD 是希望透過 shared memory & cache 得到 side channel information,藉此突破安全機制。 論文裡面提到兩個攻擊模式,一種是在同一個 OS 裡面 (same-OS),另外一種是在同一台機器,但是是兩個不同的 VM (cross-VM)。攻擊的前提是要拿到與 GnuPG process 相同的 shared memory。兩個環境的作法都是透過 mmap() GnuPG 的執行檔以取得 shared memory。 在 same-OS 的情況下會使用同一個 process:…

January 5, 2018

In "Computer"

FreeBSD 上 PHP5 的 pecl-APC 效能

環境是單顆 E5405 的機器,上面的作業系統是 FreeBSD 7.1,應用程式的部份是 apache 2.2 (worker) + mod_fastcgi 2.4.6 + PHP 5.2.8 + APC 3.0.19。 由於前陣子發現 PHP 在 MP 架構下效能不太好,偶而會有一堆 php-cgi 卡住,用 top 會發現卡在 "lockf" 這個狀態,這時候前端的 L4 switch 會看到 500 Internal Server Error 而把這台機器暫時離線。等到 php-cgi 慢慢消化完,前端的 L4 switch 又會抓到 200。 在 php-cgi 卡住的狀況下試著用 gdb 找原因,當時 backtrace 發現是卡在 APC…

January 28, 2009

In "Computer"

關於 jquery-latest.js...

jQuery 官方希望大家不要再使用 jquery-latest.js 了:「Don’t Use jquery-latest.js」。 由於他們發現有大量的網站使用 jquery-latest.js,如果直接照著字面上的意思升級到 2.0,會造成這些網站在 IE{6,7,8} 上爛掉: To mitigate the risk of “breaking the web”, the jQuery team decided back in 2013 that jquery-latest.js could not be upgraded to the 2.0 branch even though that is technically the latest version. There would just be too many…

July 5, 2014

In "Browser"

a611ee8db44c8d03a20edf0bf5a71d80?s=49&d=identicon&r=gAuthor Gea-Suan LinPosted on April 1, 2023Categories Computer, Murmuring, SoftwareTags ai, buffer, cache, cpp, double, learning, llama, machine, mmap, performance, speed, time

Leave a Reply

Your email address will not be published. Required fields are marked *

Comment *

Name *

Email *

Website

Notify me of follow-up comments by email.

Notify me of new posts by email.

To respond on your own website, enter the URL of your response which should contain a link to this post's permalink URL. Your response will then appear (possibly after moderation) on this page. Want to update or remove your response? Update or delete your post and re-enter your post's URL again. (Learn More)

Post navigation


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK