Meta open sources ImageBind AI model that combines six different data types

Meta announced today that it is open-sourcing a new AI model called ImageBind. It is a multimodal AI designed to work with six different types of data: text, audio, video, 3D, thermal, and motion. ImageBind can take input in any one of these supported modalities and relate it to the others.

For instance, given a picture of a beach, it can find the sound of waves. When it is fed a photo of a tiger and the sound of a waterfall, the system can produce a video that combines both, Meta CEO Mark Zuckerberg explained on his Instagram broadcast channel. "This is a step towards AIs that understand the world around them more like we do, which will make them a lot more useful and will open up totally new ways to create things," he said.
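The beach-to-waves example is a form of cross-modal retrieval. ImageBind maps every supported modality into a single shared embedding space, so related inputs of different types end up close together. The sketch below illustrates only that retrieval step, under loud assumptions: the three-dimensional vectors are made-up stand-ins for real model embeddings, and the helper names `cosine` and `nearest` are hypothetical, not part of ImageBind's code.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def nearest(query, candidates):
    """Return the label whose embedding is most similar to the query embedding."""
    return max(candidates, key=lambda label: cosine(query, candidates[label]))

# Hypothetical toy embeddings standing in for outputs of trained image and audio
# encoders that share one embedding space (real embeddings are much larger).
beach_image = [0.9, 0.1, 0.2]
audio_clips = {
    "waves": [0.8, 0.2, 0.1],
    "traffic": [0.1, 0.9, 0.3],
    "birdsong": [0.2, 0.3, 0.9],
}

print(nearest(beach_image, audio_clips))  # → waves
```

Because every modality lands in the same space, the same `nearest` lookup would work for any query/candidate pairing (text to audio, audio to image, and so on) without training a separate model for each pair.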

Meta explains in a blog post that ImageBind takes an approach similar to the way humans gather information from multiple senses and process it all simultaneously and holistically. In the future, Meta plans to extend support to other senses such as touch, speech, smell, and brain fMRI signals, which it says will enable richer human-centric AI models.

For reference, existing AI models such as OpenAI's DALL-E 2, Midjourney, and Stable Diffusion are trained to link text and images. These systems take natural-language text prompts as input and generate a corresponding image.

ImageBind could have various applications. For instance, it could improve search for pictures, videos, audio files, or text messages by letting users combine text, audio, and image queries. Meta's AI tool Make-A-Scene, which currently generates images from text prompts, could use ImageBind to generate images from audio. Meta has published a research paper [PDF] describing the open-source model, but it has yet to release a tool or consumer product based on it.
