Imagic: AI Image Editing from Text Commands
source link: https://hackernoon.com/imagic-ai-image-editing-from-text-commands
6 min
by @whatsai
Louis Bouchard
@whatsai
I explain Artificial Intelligence terms and news to non-experts.
Too Long; Didn't Read
Imagic takes a diffusion-based model able to generate images from text and adapts it to edit images. You can generate an image and then teach the model to edit it any way you want. Paper: Imagic: Text-Based Real Image Editing with Diffusion Models. arXiv preprint arXiv:2210.09276. Use it with Stable Diffusion: https://github.com/justinpinkney/stable-diffusion/blob/main/notebooks/imagic.ipynb
This week’s paper may just be your next favorite model to date.
If you think recent image generation models like DALL·E or Stable Diffusion are cool, you just won’t believe how incredible this one is.
"This one" is Imagic.
Imagic takes a diffusion-based model able to generate images from text and adapts it to edit images. Just look at that... You can generate an image and then teach the model to edit it any way you want.
Learn more in the video below...
References:
►Read the full article: https://www.louisbouchard.ai/imagic/
►Kawar, B., Zada, S., Lang, O., Tov, O., Chang, H., Dekel, T., Mosseri, I. and Irani, M., 2022. Imagic: Text-Based Real Image Editing with Diffusion Models. arXiv preprint arXiv:2210.09276.
► Use it with Stable Diffusion: https://github.com/justinpinkney/stable-diffusion/blob/main/notebooks/imagic.ipynb
►My Newsletter (A new AI application explained weekly to your emails!): https://www.louisbouchard.ai/newsletter/
Video Transcript
Look at that: you can generate an image and then teach the model to edit it any way you want. This is a pretty big step towards having your very own Photoshop designer for free. The model not only understands what you want to show, but it's also able to stay realistic while keeping the properties of the initial images. Just look at how the dog stays the same in all the images here. This task is called text-conditioned image editing, which means editing images using only text and an initial image, something that was pretty much impossible not even a year ago. Now look at what it can do. Yes, this is all done from a single input image and a short sentence describing what you'd like to have. How amazing is that? The only thing even cooler is how it works. Let's dive into it.

But first, if you are currently learning AI or want to start learning it, you will love this opportunity. I know how hard it can be to make real progress when learning AI; sometimes extra structure and accountability can be what propels you to the next level. If that sounds like you, join the sponsor of this video, Delta Academy. At Delta Academy you learn reinforcement learning by building game AIs in a live cohort. Go from zero to AlphaGo through expert-crafted interactive tutorials, live discussions with experts, and weekly AI-building competitions. It's not just another course-spam website: it's intense, hands-on, focused on high quality, and designed by experts from DeepMind, Oxford, and Cambridge. It's where coders go to future-proof their career from the advance of AI, and have fun. Plus, with a live community of peers and experts to push you forward, you'll write iconic algorithms in Python, ranging from DQN to AlphaGo, one of the coolest programs ever made. Join them now through my link below and use the promo code What's AI to get 10% off.
So how does Imagic work? As we said, it takes an image and a caption to edit said image, and you can even generate multiple variations of it. This model, like the vast majority of the papers released these days, is based on diffusion models. More specifically, it takes an image generator model that was already trained to generate images from text and adapts it to image editing. In their case, it uses Imagen, which I covered in a previous video: a diffusion-based generative model able to create high-definition images after being trained on a huge dataset of image-caption pairs. In the case of Imagic, they simply take this pre-trained Imagen model as a baseline and make modifications to it in order to edit the images sent as input, keeping the image-specific appearance, such as the dog's breed and identity, and editing it
following our text.

So, to start, we have to encode both the text and the initial image so that they can be understood by our Imagen model. Once this is done, we optimize our text encodings, our text embeddings, to better fit our initial image: basically, we take our text representation and optimize it for the initial image, producing what the paper calls the optimized embedding, e_opt. This makes sure the model understands that, in this example, we want to generate the same kind of image with a similar-looking bird and background.
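This embedding-optimization step can be sketched in a few lines. This is a toy illustration only, not the real pipeline: the actual method optimizes the embedding through Imagen's diffusion loss, while here a tiny linear `generate` function stands in for the frozen model, and all the vectors and sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins (assumptions for illustration, not the real Imagen components):
# a frozen linear "generator" mapping a text embedding to an image vector.
W = rng.normal(size=(4, 8))          # frozen generator weights
def generate(e):
    return W @ e                     # stands in for the frozen diffusion model

input_image = rng.normal(size=4)     # the image we want to reconstruct
e_tgt = rng.normal(size=8)           # embedding of the target caption

# Step 1 of Imagic: keep the generator frozen and optimize the text
# embedding so that the generated image matches the input image.
e_opt = e_tgt.copy()
lr = 0.01
for _ in range(2000):
    residual = generate(e_opt) - input_image  # reconstruction error
    e_opt -= lr * (W.T @ residual)            # gradient of 0.5 * ||residual||**2

print("error before:", np.linalg.norm(generate(e_tgt) - input_image))
print("error after: ", np.linalg.norm(generate(e_opt) - input_image))
```

The key point the sketch captures is that only the embedding moves in this step; the generator's weights never change.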
We then take our pre-trained image generator and fine-tune it, meaning that we retrain the Imagen model while keeping the optimized text embeddings we just produced frozen. So these two steps are used to get the text embedding closer to the image embedding by freezing one of the two and moving the other closer, which ensures that we optimize for both the text and the initial image, not only one of the two.
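This fine-tuning step can be sketched just as minimally, again with a small linear model standing in for the generator (all names and sizes here are illustrative assumptions): this time the embedding stays frozen and the model's weights are updated until the input image is reproduced.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-ins for illustration, not the real Imagen components.
input_image = rng.normal(size=4)  # the image to reproduce
e_opt = rng.normal(size=8)        # optimized embedding from step 1, now frozen
W = rng.normal(size=(4, 8))       # pre-trained generator weights

# Step 2 of Imagic: freeze the optimized embedding and fine-tune the
# generator so it reproduces the input image from e_opt.
W_ft = W.copy()
lr = 0.02
for _ in range(2000):
    residual = W_ft @ e_opt - input_image    # reconstruction error
    W_ft -= lr * np.outer(residual, e_opt)   # gradient of 0.5 * ||residual||**2 w.r.t. W_ft

print("fine-tuned error:", np.linalg.norm(W_ft @ e_opt - input_image))
```

Freezing one side while training the other, in both steps, is what ties the text embedding and the input image together without collapsing onto only one of them.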
Now that our model understands the initial image and our text, and understands that they are similar, we need to teach it to generate new image variations for this text. This part is super simple. Our target text embeddings and our image-optimized embeddings are very similar, but still not exactly the same. The only thing we do here is take the optimized embedding in our encoded space and move it a bit toward the target text embedding. At this point, if you ask the Imagic model to generate an image using the optimized text embedding, it should give you the same image as your input image; so if you move the embedding a bit toward your target text embedding, it will also edit the image a bit toward what you want. The more you move it in this space, the bigger the edit will be, and the farther away you will get from your initial image. So the only thing you need to figure out now is the size of the step you want to take toward your text, and voilà: when you find the perfect balance, you have a new model able to generate as many variations as you want, conserving the important image attributes while editing them the way you want.
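This step amounts to a simple linear interpolation between the two embeddings, with a step size (called `eta` below; the name is my choice for illustration) controlling how strong the edit is. The embeddings here are tiny made-up vectors rather than real text-encoder outputs:

```python
import numpy as np

def interpolate(e_tgt, e_opt, eta):
    """Move the optimized embedding a step of size eta toward the target text.

    eta = 0 reproduces the input image; larger eta gives a stronger edit,
    at the cost of drifting farther from the original image.
    """
    return eta * e_tgt + (1.0 - eta) * e_opt

# Toy embeddings (stand-ins for real text-encoder outputs).
e_opt = np.array([1.0, 0.0, 0.0])   # embedding optimized to fit the input image
e_tgt = np.array([0.0, 1.0, 0.0])   # embedding of the edit text

for eta in (0.0, 0.3, 0.7, 1.0):
    e = interpolate(e_tgt, e_opt, eta)
    print(eta, e)   # the fine-tuned generator would then decode from e
```

Sweeping `eta` like this is exactly the "finding the perfect balance" described above: small values barely change the image, large values edit it more aggressively.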
Of course, the results are not perfect yet, as you can see here, where the model either does not edit properly or makes random modifications to the initial image, like cropping or zooming inappropriately. Still, it's pretty impressive if you ask me. I find the pace of image-generation progress incredible, and that's both amazing and scary at the same time.

I'd love to know your opinion on these kinds of image-generating and image-editing models. Do you think they are a good or a bad thing? What kinds of consequences can you think of from such models becoming more and more powerful? You can find more details on the specific parameters they use to achieve these results in their paper, which I definitely invite you to read. I also invite you to watch my Imagen video if you'd like more information about the image-generation part and how it works.

Huge thanks to my friends at Delta Academy for working on making learning AI fun, something I am passionate about. Please give it a try and let me know what you think; I personally love this way of teaching and I am sure you will too. Thank you for supporting my work by checking out their website and by watching the whole video. I hope you enjoyed it, and I will see you next week with another amazing paper!