Researchers develop Unified-IO 2, the first multimodal model spanning vision, language, audio, and action

source link: https://www.aixinzhijie.com/article/6842426

2023-12-30 09:29

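According to a December 30 report by Xinzhiyuan (新智元), researchers from the Allen Institute for AI, the University of Illinois Urbana-Champaign, and the University of Washington have proposed Unified-IO 2. It is reported to be the first autoregressive multimodal model capable of understanding and generating images, text, audio, and actions.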

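Unified-IO 2 achieved state-of-the-art results on the GRIT benchmark and performed strongly across more than 30 benchmarks, covering image generation and understanding, text understanding, video and audio understanding, and robotic manipulation. The researchers plan to release the models to the research community to help advance scientific research.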
