THE BEST Photo to 3D AI Model !
source link: https://hackernoon.com/the-best-photo-to-3d-ai-model
As if taking a picture wasn't a challenging enough technological feat, we are now doing the opposite: modeling the world from pictures. I've covered amazing AI-based models that can take images and turn them into high-quality scenes. It is a challenging task: taking a few images from the two-dimensional picture world and reconstructing how the object or person would look in the real world.
Take a few pictures and instantly have a realistic model to insert into your product. How cool is that?!
The results have dramatically improved upon the first model I covered in 2020, called NeRF. And this improvement isn’t only about the quality of the results. NVIDIA made it even better.
Not only is the quality comparable, if not better, but it is also more than 1,000 times faster, after less than two years of research.
Watch the video
References
►Read the full article: https://www.louisbouchard.ai/nvidia-photos-into-3d-scenes/
►NVIDIA's blog post (credit to video): https://blogs.nvidia.com/blog/2022/03/25/instant-nerf-research-3d-ai/
►NVIDIA's video: https://nvlabs.github.io/instant-ngp/assets/mueller2022instant.mp4
►Paper: Thomas Müller, Alex Evans, Christoph Schied and Alexander Keller, 2022, "Instant Neural Graphics Primitives with a Multiresolution Hash Encoding", https://nvlabs.github.io/instant-ngp/assets/mueller2022instant.pdf
►Project link: https://nvlabs.github.io/instant-ngp/
►Code: https://github.com/NVlabs/instant-ngp
►My Newsletter (A new AI application explained weekly to your emails!): https://www.louisbouchard.ai/newsletter/
Video Transcript
As if taking a picture wasn't a challenging enough technological feat, we are now doing the opposite: modeling the world from pictures. I've covered amazing AI-based models that could take images and turn them into high-quality scenes. It is a challenging task that consists of taking a few images in the two-dimensional picture world to create how the object or person would look in the real world. You can easily see how useful this technology is for many industries like video games, animation, movies, or advertising. Take a few pictures and instantly have a realistic model to insert into your product.

The results have dramatically improved upon the first model I covered in 2020, called NeRF. And this improvement isn't only about the quality of the results. NVIDIA made it even better. Not only is the quality comparable, if not better, but it's more than one thousand times faster, with less than two years of research. This is the pace of AI research: exponential gains in quality and efficiency, a big factor that makes this field so incredible. You will be lost with the new techniques and quality of the results if you miss just a couple of days, which is why I first created this channel and why you should also subscribe.

Just look at those 3D models. These cool models only needed a dozen pictures, and the AI guessed the missing spots and created this beauty in seconds. Something like this took hours to produce with NeRF. Let's dive into how they made this much progress on so many fronts in so little time.

But first, I'd like to take a few seconds to talk about Activeloop, an amazing company I recently stumbled on, and they are now sponsoring this video. Activeloop is becoming popular with its open-source dataset format for AI, Hub, one of the top 10 Python packages in 2021. With Activeloop Hub, you can treat your datasets as NumPy-like arrays. As a result, you have a simple dataset API for creating, storing, version-controlling, and querying AI datasets of any size. It's perfect for collaborating with your team and iterating on your datasets. The feature I like the most is being able to stream my datasets while training models in PyTorch or TensorFlow. This means anyone can access any slice of the data and start training models in seconds, no matter how big the dataset is. Just like that. How cool is that? With all these neat features, Hub definitely frees me from building data pipelines so I can train my models faster. Activeloop has just released more than 100 image, video, and audio datasets, available almost instantly with a single line of code. Try them out in your workflows and let me know in the comments below how it works. I'd love to know what you build with them.
Instant NeRF attacks the task of inverse rendering, which consists of rendering a 3D representation from pictures (a dozen in this case), approximating the real shape of the object and how light will behave on it so that it looks realistic in any new scene. Here, NeRF stands for neural radiance fields. I will only do a quick overview of how NeRFs work, as I already covered this kind of network in multiple videos, which I invite you to watch for more detail and a better understanding.

Quickly, NeRFs are a type of neural network. They take images and camera settings as inputs and learn how to produce an initial 3D representation of the objects or scenes in the picture, then fine-tune this representation using learned parameters from a supervised learning setting. This means that we need a 3D object and a few images of it at different known angles to train it, and the network will learn to recreate the object. To make the results as good as possible, we need pictures from multiple viewpoints, like this, to be sure we capture all or most sides of the object. And we train this network to understand general object shapes and light radiance. We are asking it to learn how to fill in the missing parts based on what it has seen before and how light reacts to them in the 3D world.

Basically, it would be like asking you to draw a human without giving any details on the hands. You'd automatically assume the person has five fingers based on your knowledge. This is easy for us, as we have many years of experience under our belt and one essential thing current AIs are lacking: our intelligence. We can create links where there are none and do many unbelievable things. On the opposite side, AI needs specific rules, or at least examples to follow, which is why we need to give it what an object looks like in the real world during its training phase to improve. Then, after such a training process, you only feed the images with the camera angles at inference time, and it produces the final model in a few hours.

Did I say a few hours? I'm sorry, I was still in 2021. It now does that in a few seconds. This new version by NVIDIA, called Instant NeRF, is indeed 1,000 times faster than its NeRF predecessor from a year ago. Why? Because of multi-resolution hash grid encoding. Multi-what? Multi-resolution hash grid encoding. They explained it very clearly with this sentence:
"We reduce the cost with a versatile new input encoding that permits the use of a smaller network without sacrificing quality, thus significantly reducing the number of floating point and memory access operations."
In short, they change how the NeRF network will see the inputs (so, our initial 3D model prediction), making them more digestible and information-efficient so that a smaller network can be used while keeping the quality of the outputs the same. Keeping such a high quality using a smaller network is possible because we are not only learning the weights of the NeRF network during training but also the way we are transforming those inputs beforehand. So the input is transformed using trained functions (here, steps one to four), compressed in a hash table to focus on valuable information extremely quickly, and then sent to a much smaller network in step five, as the inputs are similarly much smaller now. They are storing values of any type in the table, with keys indicating where they are stored, for super-efficient parallel modifications and removing the lookup time for big arrays during training and inference. This transformation and a much smaller network are why Instant NeRF is so much faster and why it made it into this video. And voilà, this is how NVIDIA is now able to generate 3D models like these in seconds.
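
To make the encoding idea above more concrete, here is a minimal pure-Python sketch of a multi-resolution hash grid encoding. It is an illustration under stated assumptions, not NVIDIA's implementation (which runs as optimized CUDA code). The names `spatial_hash` and `HashGridEncoding` and all parameter values are hypothetical and far smaller than anything practical. The primes and the geometric resolution schedule follow the paper's description. As a simplification, every level is hashed here, and the feature tables are only randomly initialized, whereas in the real method they are trained together with the network.

```python
import math
import random

# Per-dimension multipliers used by the paper's spatial hash (the first is 1).
PRIMES = (1, 2654435761, 805459861)


def spatial_hash(corner, table_size):
    """XOR the integer grid coordinates, each scaled by a prime, then wrap."""
    h = 0
    for coord, prime in zip(corner, PRIMES):
        # Mask to 32 bits to mimic unsigned integer overflow.
        h ^= (coord * prime) & 0xFFFFFFFF
    return h % table_size


class HashGridEncoding:
    """Toy multi-resolution hash encoding for points in the unit cube."""

    def __init__(self, n_levels=4, table_size=2**10, n_features=2,
                 base_res=4, max_res=64, seed=0):
        rng = random.Random(seed)
        self.table_size = table_size
        self.n_features = n_features
        # Resolutions grow geometrically from base_res towards max_res.
        growth = math.exp(
            (math.log(max_res) - math.log(base_res)) / (n_levels - 1))
        self.resolutions = [
            math.floor(base_res * growth ** level) for level in range(n_levels)]
        # One table of small feature vectors per level. Assumption: random
        # values stand in for entries that training would normally learn.
        self.tables = [
            [[rng.uniform(-1e-4, 1e-4) for _ in range(n_features)]
             for _ in range(table_size)]
            for _ in self.resolutions]

    def encode(self, point):
        """Return the concatenated per-level features of one 3D point."""
        features = []
        for res, table in zip(self.resolutions, self.tables):
            scaled = [p * res for p in point]
            base = [math.floor(s) for s in scaled]
            frac = [s - b for s, b in zip(scaled, base)]
            level_feat = [0.0] * self.n_features
            # Visit the 8 corners of the grid cell that contains the point.
            for bits in range(8):
                offset = [(bits >> axis) & 1 for axis in range(3)]
                corner = [b + o for b, o in zip(base, offset)]
                # Hash the corner and look up its feature vector.
                entry = table[spatial_hash(corner, self.table_size)]
                # Trilinear interpolation weight for this corner.
                weight = 1.0
                for f, o in zip(frac, offset):
                    weight *= f if o else 1.0 - f
                for i in range(self.n_features):
                    level_feat[i] += weight * entry[i]
            # Concatenate the features of all levels.
            features.extend(level_feat)
        return features


encoder = HashGridEncoding()
encoded = encoder.encode((0.3, 0.5, 0.9))
print(len(encoded))  # n_levels * n_features values
```

The resulting short feature vector is what a small network would receive in place of the raw coordinates. Nothing in the lookup depends on what the table entries represent, which is consistent with the point made in the video that the table can store values of any type.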
If this wasn't cool enough, I said that it can store values of any type, which means that this technique can be used not only with NeRFs but also with other super cool applications, like gigapixel images, that become just as incredibly efficient.

Of course, this was just an overview of this new paper attacking this super interesting task in a novel way. I invite you to read their excellent paper for more technical detail about the multi-resolution hash grid encoding approach and their implementation. A link to the paper and their code is in the description below. Thank you for watching the whole video. Please take a second to let me know what you think of the overall quality of the videos and new editing. I will see you next week with another amazing paper!