Imagic: AI Image Editing from Text Commands
source link: https://hackernoon.com/imagic-ai-image-editing-from-text-commands
6 min
by @whatsai
Louis Bouchard
@whatsai
I explain Artificial Intelligence terms and news to non-experts.
Too Long; Didn't Read
Imagic takes a diffusion-based model able to generate images from text and adapts it to edit images. You can generate an image and then teach the model to edit it any way you want. Paper: Imagic: Text-Based Real Image Editing with Diffusion Models. arXiv preprint arXiv:2210.09276. Use it with Stable Diffusion: https://github.com/justinpinkney/stable-diffusion/blob/main/notebooks/imagic.ipynb
This week’s paper may just be your next favorite model to date.
If you think recent image generation models like DALL·E or Stable Diffusion are cool, you just won’t believe how incredible this one is.
"This one" is Imagic.
Imagic takes a diffusion-based model able to generate images from text and adapts it to edit images. Just look at that... You can generate an image and then teach the model to edit it any way you want.
Learn more in the video below...
References:
►Read the full article: https://www.louisbouchard.ai/imagic/
►Kawar, B., Zada, S., Lang, O., Tov, O., Chang, H., Dekel, T., Mosseri, I. and Irani, M., 2022. Imagic: Text-Based Real Image Editing with Diffusion Models. arXiv preprint arXiv:2210.09276.
► Use it with Stable Diffusion: https://github.com/justinpinkney/stable-diffusion/blob/main/notebooks/imagic.ipynb
►My Newsletter (A new AI application explained weekly to your emails!): https://www.louisbouchard.ai/newsletter/
Video Transcript
Look at that: you can generate an image and then teach the model to edit it any way you want. This is a pretty big step towards having your very own Photoshop designer for free. The model not only understands what you want to show, but it's also able to stay realistic while keeping the properties of the initial images. Just look at how the dog stays the same in all the images here. This task is called text-conditioned image editing, which means editing images using only text and an initial image, something that was pretty much impossible not even a year ago. Now look at what it can do. Yes, this is all done from a single input image and a short sentence describing what you'd like to have. How amazing is that? The only thing even cooler is how it works. Let's dive into it.

But first, if you are currently learning AI or want to start learning it, you will love this opportunity. I know how hard it can be to make real progress when learning AI; sometimes extra structure and accountability can be what propels you to the next level. If that sounds like you, join the sponsor of this video, Delta Academy. At Delta Academy you learn reinforcement learning by building game AIs in a live cohort. Go from zero to AlphaGo through expert-crafted interactive tutorials, live discussions with experts, and weekly AI-building competitions. It's not just another course-spam website: it's intense, hands-on, focused on high quality, and designed by experts from DeepMind, Oxford, and Cambridge. It's where coders go to future-proof their career from the advance of AI, and have fun. Plus, with a live community of peers and experts to push you forward, you'll write iconic algorithms in Python, ranging from DQN to AlphaGo, one of the coolest programs ever made. Join them now through my link below and use the promo code What's AI to get 10% off.
So how does Imagic work? As we said, it takes an image and a caption to edit said image, and you can even generate multiple variations of it. This model, like the vast majority of the papers released these days, is based on diffusion models. More specifically, it takes an image generator model that was already trained to generate images from text and adapts it to image editing. In their case, it uses Imagen, which I covered in a previous video: a diffusion-based generative model able to create high-definition images after being trained on a huge dataset of image-caption pairs. In the case of Imagic, they simply take this pre-trained Imagen model as a baseline and make modifications to it in order to edit the images sent as input, keeping the image-specific appearance, such as the dog's breed and identity, and editing it
following our text.

So, to start, we have to encode both the text and the initial image so that they can be understood by our Imagen model. Once this is done, we optimize our text encodings, our text embeddings, to better fit our initial image: basically, we take our text representation and optimize it for the initial image, producing what the paper calls the optimized embedding, e_opt. This makes sure the model understands that, in this example, we want to generate the same kind of image with a similar-looking bird and background.
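This embedding-optimization step can be sketched in a few lines. This is a toy illustration only, not the real pipeline: the actual method optimizes the embedding through Imagen's diffusion loss, while here a tiny linear `generate` function stands in for the frozen model, and all the vectors and sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins (assumptions for illustration, not the real Imagen components):
# a frozen linear "generator" mapping a text embedding to an image vector.
W = rng.normal(size=(4, 8))          # frozen generator weights
def generate(e):
    return W @ e                     # stands in for the frozen diffusion model

input_image = rng.normal(size=4)     # the image we want to reconstruct
e_tgt = rng.normal(size=8)           # embedding of the target caption

# Step 1 of Imagic: keep the generator frozen and optimize the text
# embedding so that the generated image matches the input image.
e_opt = e_tgt.copy()
lr = 0.01
for _ in range(2000):
    residual = generate(e_opt) - input_image  # reconstruction error
    e_opt -= lr * (W.T @ residual)            # gradient of 0.5 * ||residual||**2

print("error before:", np.linalg.norm(generate(e_tgt) - input_image))
print("error after: ", np.linalg.norm(generate(e_opt) - input_image))
```

The key point the sketch captures is that only the embedding moves in this step; the generator's weights never change.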
We then take our pre-trained image generator and fine-tune it, meaning that we retrain the Imagen model while keeping the optimized text embeddings we just produced frozen. So these two steps are used to get the text embedding closer to the image embedding by freezing one of the two and moving the other closer, which ensures that we optimize for both the text and the initial image, not only one of the two.
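This fine-tuning step can be sketched just as minimally, again with a small linear model standing in for the generator (all names and sizes here are illustrative assumptions): this time the embedding stays frozen and the model's weights are updated until the input image is reproduced.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-ins for illustration, not the real Imagen components.
input_image = rng.normal(size=4)  # the image to reproduce
e_opt = rng.normal(size=8)        # optimized embedding from step 1, now frozen
W = rng.normal(size=(4, 8))       # pre-trained generator weights

# Step 2 of Imagic: freeze the optimized embedding and fine-tune the
# generator so it reproduces the input image from e_opt.
W_ft = W.copy()
lr = 0.02
for _ in range(2000):
    residual = W_ft @ e_opt - input_image    # reconstruction error
    W_ft -= lr * np.outer(residual, e_opt)   # gradient of 0.5 * ||residual||**2 w.r.t. W_ft

print("fine-tuned error:", np.linalg.norm(W_ft @ e_opt - input_image))
```

Freezing one side while training the other, in both steps, is what ties the text embedding and the input image together without collapsing onto only one of them.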
Now that our model understands the initial image and our text, and understands that they are similar, we need to teach it to generate new image variations for this text. This part is super simple. Our target text embeddings and our image-optimized embeddings are very similar, but still not exactly the same. The only thing we do here is take the optimized embedding in our encoded space and move it a bit toward the target text embedding. At this point, if you ask the Imagic model to generate an image using the optimized text embedding, it should give you the same image as your input image; so if you move the embedding a bit toward your target text embedding, it will also edit the image a bit toward what you want. The more you move it in this space, the bigger the edit will be, and the farther away you will get from your initial image. So the only thing you need to figure out now is the size of the step you want to take toward your text, and voilà: when you find the perfect balance, you have a new model able to generate as many variations as you want, conserving the important image attributes while editing them the way you want.
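This step amounts to a simple linear interpolation between the two embeddings, with a step size (called `eta` below; the name is my choice for illustration) controlling how strong the edit is. The embeddings here are tiny made-up vectors rather than real text-encoder outputs:

```python
import numpy as np

def interpolate(e_tgt, e_opt, eta):
    """Move the optimized embedding a step of size eta toward the target text.

    eta = 0 reproduces the input image; larger eta gives a stronger edit,
    at the cost of drifting farther from the original image.
    """
    return eta * e_tgt + (1.0 - eta) * e_opt

# Toy embeddings (stand-ins for real text-encoder outputs).
e_opt = np.array([1.0, 0.0, 0.0])   # embedding optimized to fit the input image
e_tgt = np.array([0.0, 1.0, 0.0])   # embedding of the edit text

for eta in (0.0, 0.3, 0.7, 1.0):
    e = interpolate(e_tgt, e_opt, eta)
    print(eta, e)   # the fine-tuned generator would then decode from e
```

Sweeping `eta` like this is exactly the "finding the perfect balance" described above: small values barely change the image, large values edit it more aggressively.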
Of course, the results are not perfect yet, as you can see here, where the model either does not edit properly or makes random modifications to the initial image, like cropping or zooming inappropriately. Still, it's pretty impressive if you ask me. I find the pace of image-generation progress incredible, and that's both amazing and scary at the same time.

I'd love to know your opinion on these kinds of image-generating and image-editing models. Do you think they are a good or a bad thing? What kinds of consequences can you think of from such models becoming more and more powerful? You can find more details on the specific parameters they use to achieve these results in their paper, which I definitely invite you to read. I also invite you to watch my Imagen video if you'd like more information about the image-generation part and how it works.

Huge thanks to my friends at Delta Academy for working on making learning AI fun, something I am passionate about. Please give it a try and let me know what you think; I personally love this way of teaching and I am sure you will too. Thank you for supporting my work by checking out their website and by watching the whole video. I hope you enjoyed it, and I will see you next week with another amazing paper!