LLMs currently available for commercial use

I saw this discussion in Ask Hacker News Weekly, where someone asked which LLMs can currently be used commercially: "Ask HN: Open source LLM for commercial use?".

One reply suggests that Google's Flan family is probably the strongest option right now? The models can be downloaded from Hugging Face:

I've seen this question asked repeatedly in many LLaMa threads, currently the best models that are truly open are the released models from the Flan family by Google, which includes Flan-T5[0] and Flan-UL2[1]. According to its paper, Flan-UL2 performs slightly better than Flan-T5-XXL.
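For anyone who wants to try the Flan checkpoints mentioned in the quote, here is a minimal sketch of pulling one from Hugging Face with the `transformers` library. The Hub IDs in `FLAN_MODELS` are my assumption of where Google's uploads live, and the `generate` helper and its defaults are made up for illustration, so check both against the Hub before relying on them. It also assumes `transformers` and PyTorch are installed.

```python
# Hugging Face Hub IDs for the Flan checkpoints named in the quote.
# Assumption: these match Google's public uploads; verify on the Hub.
FLAN_MODELS = {
    "flan-t5-small": "google/flan-t5-small",  # small enough for a laptop CPU
    "flan-t5-xxl": "google/flan-t5-xxl",
    "flan-ul2": "google/flan-ul2",
}


def generate(prompt: str, model_key: str = "flan-t5-small",
             max_new_tokens: int = 64) -> str:
    """Download a Flan checkpoint (cached after first use) and run one prompt."""
    # Imported lazily so this module loads even without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    model_id = FLAN_MODELS[model_key]
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

    # Flan models are encoder-decoder, so the prompt is encoded once
    # and the answer is decoded from the generated token IDs.
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate("Translate English to German: How old are you?")` downloads the small checkpoint on first use. The larger entries need far more memory, so starting with the small one is the safer way to check that the setup works.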

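That puts them at roughly GPT-3 level, still some distance from GPT-3.5 or the ChatGPT derived from it. But if you tune them for a specific scenario, they should still be usable: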

These models perform slightly better than GPT-3 under some tasks[2], but they're still far from achieving the results from GPT-3.5 and GPT-4. This becomes evident when you try to use them in the real world; they're not "good enough" for general use cases, unlike ChatGPT models. However, if you can restrict your use case to one particular domain, you can achieve pretty good results by further fine-tuning these models.

Another reply mentions some other models:

The ones I saw mentioned so far were Flan, Cerebras, GPT-J, and RWKV.

Not yet mentioned:

* Pythia https://github.com/EleutherAI/pythia

* GLM-130B https://github.com/THUDM/GLM-130B - see also ChatGLM-6B https://github.com/THUDM/ChatGLM-6B

* GPT-NeoX-20B https://huggingface.co/EleutherAI/gpt-neox-20b

* GeoV-9B https://github.com/geov-ai/geov

* BLOOM https://huggingface.co/bigscience/bloom and BLOOMZ https://huggingface.co/bigscience/bloomz

So if you need one, this list looks like a good place to start digging...


Author: Gea-Suan Lin
Posted on: April 17, 2023
Categories: Computer, Murmuring
Tags: ai, commercial, flan, google, language, large, learning, license, llm, machine, model, open, source, use
