Meta open sources ImageBind AI model that combines six different data types

 1 year ago
source link: https://www.neowin.net/news/meta-open-sources-imagebind-ai-model-that-combines-six-different-data-types/

Meta announced today that it's open-sourcing a new AI model called ImageBind. It's a multimodal model designed to work with six types of data: images and video, text, audio, depth (3D), thermal (infrared), and motion (IMU) data. ImageBind can take input in any one of these modalities and relate it to the others.

For instance, given a picture of a beach, it can find the sound of waves. Given a photo of a tiger and the sound of a waterfall, the system can produce a video that combines both, Meta CEO Mark Zuckerberg explained on his Instagram broadcast channel. "This is a step towards AIs that understand the world around them more like we do, which will make them a lot more useful and will open up totally new ways to create things," he said.
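The beach-to-waves example is a form of cross-modal retrieval. The ImageBind paper describes encoding every modality into one shared embedding space, so related inputs end up as nearby vectors. The following is a minimal Python sketch of that lookup, assuming the embeddings have already been computed. The vectors and clip names are invented for illustration and are not real ImageBind outputs, and `cosine` and `nearest` are hypothetical helpers, not part of Meta's released code.

```python
import math

# Hypothetical toy embeddings (assumption): real ImageBind vectors come from
# trained encoders and have far more dimensions than three.
AUDIO_CLIPS = {
    "ocean waves": [0.9, 0.1, 0.0],
    "city traffic": [0.1, 0.9, 0.2],
    "birdsong": [0.2, 0.3, 0.9],
}

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def nearest(query, candidates):
    """Return the candidate name whose embedding is most similar to the query."""
    return max(candidates, key=lambda name: cosine(query, candidates[name]))

beach_photo = [0.8, 0.2, 0.1]  # hypothetical embedding of a beach image
print(nearest(beach_photo, AUDIO_CLIPS))  # prints "ocean waves"
```

Because all modalities share one space, the same `nearest` call would work for any query/candidate pairing (text to audio, audio to image, and so on) under this assumption.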

Meta explains in a blog post that ImageBind's approach mirrors how humans gather information from multiple senses and process it all simultaneously and holistically. In the future, Meta plans to extend the model to other senses, such as touch, speech, smell, and brain fMRI signals, which it says will enable richer, human-centric AI models.

For reference, existing AI models like OpenAI's DALL-E 2, Midjourney, and Stable Diffusion are trained to link text and images. These systems take natural-language text prompts as input and generate images accordingly.

ImageBind has a range of potential applications. For instance, it could improve search across pictures, videos, audio files, and text messages by accepting queries that combine text, audio, and images. Meta's AI tool Make-A-Scene, which currently uses text prompts to generate images, could use ImageBind to generate images from audio. Meta has published a research paper describing the open-source model, but it has yet to release a tool or consumer product based on it.
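Search with a query that mixes modalities can be sketched in a few lines. The ImageBind paper reports composing modalities by adding their embeddings ("embedding arithmetic"); the Python sketch below assumes that approach. The vectors, clip titles, and helper names (`normalize`, `combine`, `search`) are hypothetical, and the code only ranks existing clips rather than generating new media.

```python
import math

def normalize(v):
    """Scale a vector to unit length so each modality contributes equally."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def combine(*embeddings):
    """Compose a query by summing unit-normalized embeddings, then renormalizing."""
    total = [sum(xs) for xs in zip(*map(normalize, embeddings))]
    return normalize(total)

def search(query, catalog):
    """Rank catalog items by cosine similarity with the query (best first)."""
    q = normalize(query)
    scores = {name: sum(a * b for a, b in zip(q, normalize(vec)))
              for name, vec in catalog.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical toy embeddings in a shared 3-d space (assumption); the axes
# loosely stand for "animal", "water", and "urban".
VIDEOS = {
    "tiger by a waterfall": [0.7, 0.7, 0.0],
    "tiger in a zoo": [0.8, 0.0, 0.6],
    "fountain in a plaza": [0.0, 0.6, 0.8],
}
tiger_photo = [1.0, 0.0, 0.1]      # hypothetical image embedding
waterfall_sound = [0.0, 1.0, 0.1]  # hypothetical audio embedding

query = combine(tiger_photo, waterfall_sound)
print(search(query, VIDEOS)[0])  # prints "tiger by a waterfall"
```

Normalizing before summing keeps one modality from dominating the combined query simply because its raw vector is longer.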