
Generate Novel Artistic Artworks with Deep Learning


[Image: an example of Neural Style Transfer]

1. Problem Statement

In this article, I will describe how to use deep learning to compose images in the style of another image (ever wish you could paint like Picasso or Van Gogh?). This is known as neural style transfer! The technique is outlined in Leon A. Gatys' paper, A Neural Algorithm of Artistic Style, which is a great read and definitely worth checking out.

But, what is neural style transfer?

Neural style transfer is an optimization technique that takes three images: a content image, a style reference image (such as an artwork by a famous painter), and the input image you want to style. It blends them together so that the input image is transformed to look like the content image, but "painted" in the style of the style image, bridging the worlds of deep learning and art!

For example, let's take an image of this turtle and Katsushika Hokusai's The Great Wave off Kanagawa:

[Image: Green Sea Turtle, by P. Lindgren [CC BY-SA 3.0 (https://creativecommons.org/licenses/by-sa/3.0)], from Wikimedia Commons]

[Image: The Great Wave off Kanagawa (https://en.wikipedia.org/wiki/The_Great_Wave_off_Kanagawa)]

Now, how would it look if Hokusai decided to paint this turtle exclusively with this style? Something like this?

[Image: the turtle, rendered in the style of The Great Wave]

Is this magic or just deep learning? Fortunately, this doesn’t involve any witchcraft: style transfer is a fun and interesting technique that showcases the capabilities and internal representations of neural networks.

The principle of neural style transfer is to define two distance functions: one that describes how different the content of two images is (the content distance), and one that describes how different two images are in terms of their style (the style distance). Then, given a desired style image, a desired content image, and the input image (initialized with the content image), we transform the input image to minimize its content distance from the content image and its style distance from the style image. In short, we transform the base input image by minimizing the content and style distances (losses) with backpropagation, creating an image that matches the content of the content image and the style of the style image.

In this article, we will be generating an image of the Louvre museum in Paris (content image C), mixed with a painting by Claude Monet, a leader of the impressionist movement (style image S).

2. Transfer Learning

Neural Style Transfer (NST) uses a previously trained convolutional network, and builds on top of that. The idea of using a network trained on a different task and applying it to a new task is called transfer learning .

Following the original NST paper, I will be using the VGG network; specifically VGG-19, a 19-layer version of the VGG network. This model has already been trained on the very large ImageNet database, and has thus learned to recognize a variety of low-level features (at the shallower layers) and high-level features (at the deeper layers).

The following code loads the parameters of the VGG model (refer to the Github repo for more information):

import pprint

pp = pprint.PrettyPrinter(indent=4)
model = load_vgg_model("pretrained-model/imagenet-vgg-verydeep-19.mat")
pp.pprint(model)

The model is stored in a Python dictionary in which each key is a layer's variable name and each value is the tensor for that layer.
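For example, once the model is loaded, you can select a layer's tensor by its key. A short illustration (the key names follow the repo's load_vgg_model() convention):

# 'input' is a TensorFlow variable that we can assign an image to
input_tensor = model['input']

# Tensor holding the activations of layer conv4_2
out = model['conv4_2']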

3. Neural Style Transfer (NST)

We will build the Neural Style Transfer (NST) algorithm in three steps:

  • Build the content cost function J_content(C, G).
  • Build the style cost function J_style(S, G).
  • Put it together to obtain the overall cost J(G) = α * J_content(C, G) + β * J_style(S, G).

3.1 Computing content cost

In our running example, the content image C will be the picture of the Louvre Museum in Paris (scaled to 400 × 300):

content_image = scipy.misc.imread("images/louvre.jpg")
imshow(content_image);


The content image (C) shows the Louvre museum’s pyramid surrounded by old Paris buildings, against a sunny sky with a few clouds.

3.1.1 Match content of generated image G with image C

As mentioned earlier, the shallower layers of a ConvNet tend to detect lower-level features such as edges and simple textures, while the deeper layers tend to detect higher-level features such as more complex textures and object classes.

We would like the generated image G to have content similar to the input image C. Suppose you have chosen some layer's activations to represent the content of an image. In practice, you'll get the most visually pleasing results if you choose a layer in the middle of the network, neither too shallow nor too deep.

Note: After you have finished this article’s example, feel free to experiment with different layers, to see how the results vary.

First, we will set the image C as the input to the pre-trained VGG network, and run forward propagation. Let a^(C) be the hidden layer activations in the layer you have chosen. This will be an nH × nW × nC tensor.

Repeat this process with the image G: set G as the input, and run forward propagation. Let a^(G) be the corresponding hidden layer activations.

We will then define the content cost function as:

J_content(C, G) = (1 / (4 × nH × nW × nC)) × Σ (a^(C) − a^(G))², summed over all entries

Here, nH, nW, and nC are respectively the height, width, and number of channels of the hidden layer you have chosen; they appear in the normalization term of the cost.

For clarity, note that a^(C) and a^(G) are the 3D volumes corresponding to a hidden layer's activations. In order to compute the cost J_content(C, G), it can also be convenient to unroll these 3D volumes into a 2D matrix, as shown below.


Unrolling 3D volumes of activation layers into a 2D matrix.

Technically, this unrolling step isn't needed to compute J_content, but it is good practice for when you do need to carry out a similar operation later, when computing the style cost J_style.

Implementing compute_content_cost()

The compute_content_cost() function computes the content cost using TensorFlow.

The 3 steps to implement this function are:

  1. Retrieve dimensions from a_G .
  2. Unroll a_C and a_G as explained in the picture above.
  3. Compute the content cost.
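Here is a minimal TensorFlow 1.x sketch of these three steps; it assumes a_C and a_G are activation tensors of shape (1, n_H, n_W, n_C), as in the rest of this article:

def compute_content_cost(a_C, a_G):
    # Step 1: retrieve dimensions from a_G
    m, n_H, n_W, n_C = a_G.get_shape().as_list()

    # Step 2: unroll the 3D volumes into 2D matrices of shape (n_C, n_H * n_W)
    a_C_unrolled = tf.transpose(tf.reshape(a_C, [n_H * n_W, n_C]))
    a_G_unrolled = tf.transpose(tf.reshape(a_G, [n_H * n_W, n_C]))

    # Step 3: normalized sum of squared differences
    J_content = tf.reduce_sum(tf.square(a_C_unrolled - a_G_unrolled)) / (4 * n_H * n_W * n_C)
    return J_content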

In summary, the content cost takes a hidden layer activation of the neural network and measures how different a^(C) and a^(G) are. When we minimize the content cost later, this will help make sure G has content similar to C.

3.2 Computing style cost

For our running example, we will use the following style image:

[Image: the style image S, a painting by Claude Monet, a leader of the impressionist movement]

3.2.1 Style matrix

The style matrix is also called a Gram matrix. In linear algebra, the Gram matrix G of a set of vectors (v₁, …, vₙ) is the matrix of dot products, whose entries are Gᵢⱼ = vᵢᵀ vⱼ = np.dot(vᵢ, vⱼ).

In other words, Gᵢⱼ compares how similar vᵢ is to vⱼ: if they are highly similar, you would expect them to have a large dot product, and thus Gᵢⱼ to be large.
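As a quick toy illustration (plain NumPy, with vectors chosen for this example):

import numpy as np

v1 = np.array([1.0, 2.0])
v2 = np.array([2.0, 4.0])   # same direction as v1: highly similar
v3 = np.array([-2.0, 1.0])  # orthogonal to v1: dissimilar

V = np.stack([v1, v2, v3])  # rows are the vectors
G = V @ V.T                 # Gram matrix of all pairwise dot products
print(G)
# [[ 5. 10.  0.]
#  [10. 20.  0.]
#  [ 0.  0.  5.]]

Note how the similar pair (v1, v2) produces the large off-diagonal entry 10, while the orthogonal pairs produce 0.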

Note that there is an unfortunate collision in the variable names used here. We are following common terminology used in the literature. G is used to denote the Style matrix (or Gram matrix); G also denotes the generated image. For this example, we will use G gram to refer to the Gram matrix, and G to denote the generated image.

In Neural Style Transfer (NST), you can compute the Style matrix by multiplying the “unrolled” filter matrix with its transpose:

G_gram = A × Aᵀ, where A is the matrix of unrolled filter activations, of shape (nC, nH × nW)

G_gram measures the correlation between two filters:

The result is a matrix of dimension (nC, nC), where nC is the number of filters (channels). The value G_gram(i, j) measures how similar the activations of filter i are to the activations of filter j.

G_gram also measures the prevalence of patterns or textures:

The diagonal elements G_gram(i, i) measure how "active" filter i is. For example, suppose filter i is detecting vertical textures in the image. Then G_gram(i, i) measures how common vertical textures are in the image as a whole. If G_gram(i, i) is large, this means that the image has a lot of vertical texture.

Implementing gram_matrix()
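A minimal sketch in TensorFlow 1.x, assuming A is the unrolled filter matrix of shape (n_C, n_H * n_W):

def gram_matrix(A):
    # G_gram = A Aᵀ, of shape (n_C, n_C)
    GA = tf.matmul(A, tf.transpose(A))
    return GA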

3.2.2 Style cost

The goal will be to minimize the distance between the Gram matrix of the style image S and the Gram matrix of the generated image G.

For now, we are using only a single hidden layer a^[l]. The corresponding style cost for this layer is defined as:

J_style^[l](S, G) = (1 / (4 × nC² × (nH × nW)²)) × Σᵢ Σⱼ (G_gram^(S)ᵢⱼ − G_gram^(G)ᵢⱼ)²

Implementing compute_layer_style_cost()

The 4 steps to implement this function are:

  1. Retrieve dimensions from the hidden layer activations a_G .
  2. Unroll the hidden layer activations a_S and a_G into 2D matrices, as explained in the figure above.
  3. Compute the Style matrix of the images S and G with the function we had previously written.
  4. Compute the Style cost.
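Putting these steps together, a TensorFlow 1.x sketch (same shape conventions as compute_content_cost() above):

def compute_layer_style_cost(a_S, a_G):
    # Step 1: retrieve dimensions from a_G
    m, n_H, n_W, n_C = a_G.get_shape().as_list()

    # Step 2: unroll the hidden layer activations into shape (n_C, n_H * n_W)
    a_S = tf.transpose(tf.reshape(a_S, [n_H * n_W, n_C]))
    a_G = tf.transpose(tf.reshape(a_G, [n_H * n_W, n_C]))

    # Step 3: compute the Gram matrices of the images S and G
    GS = gram_matrix(a_S)
    GG = gram_matrix(a_G)

    # Step 4: normalized squared distance between the two Gram matrices
    J_style_layer = tf.reduce_sum(tf.square(GS - GG)) / (4 * n_C**2 * (n_H * n_W)**2)
    return J_style_layer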

3.2.3 Style Weights

So far, we have captured the style from only one layer. We get better results if we "merge" style costs from several different layers. Each layer is given a weight (λˡ) that reflects how much it contributes to the style. By default, we'll give each layer equal weight, with the weights adding up to 1. After completing this example, feel free to experiment with different weights to see how they change the generated image G.

You can combine the style costs for different layers as follows:

J_style(S, G) = Σₗ λˡ × J_style^[l](S, G)

where the values for λˡ are given in STYLE_LAYERS.

STYLE_LAYERS = [
 ('conv1_1', 0.2),
 ('conv2_1', 0.2),
 ('conv3_1', 0.2),
 ('conv4_1', 0.2),
 ('conv5_1', 0.2)]

Implementing compute_style_cost()

This function calls the compute_layer_style_cost(...) function several times and weights the results using the values in STYLE_LAYERS.

Description of compute_style_cost

For each layer:

  • Select the activation (the output tensor) of the current layer.
  • Get the style of the style image S from the current layer.
  • Get the style of the generated image G from the current layer.
  • Compute the style cost for the current layer.
  • Add the weighted style cost to the overall style cost (J_style).

Once done with the loop:

  • Return the overall style cost.
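A sketch of that loop; it assumes, as in the code later in this article, that the style image S has already been assigned as the model input and that sess is the active session:

def compute_style_cost(model, STYLE_LAYERS):
    J_style = 0
    for layer_name, coeff in STYLE_LAYERS:
        # Select the output tensor of the current layer
        out = model[layer_name]
        # Style of S: evaluate these activations now, with S as the model input
        a_S = sess.run(out)
        # Style of G: keep the tensor; it is evaluated later, with G as the input
        a_G = out
        # Add the weighted style cost of this layer to the overall style cost
        J_style += coeff * compute_layer_style_cost(a_S, a_G)
    return J_style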

Note: In the for-loop above, a_G is a tensor that hasn't been evaluated yet. It will be evaluated and updated at each iteration when we run the TensorFlow graph in model_nn() below.

In summary, the style of an image can be represented using the Gram matrix of a hidden layer’s activations. We get even better results by combining this representation from multiple different layers. This is in contrast to the content representation, where usually using just a single hidden layer is sufficient. In addition, minimizing the style cost will cause the image G to follow the style of the image S .

3.3 Defining the total cost to optimize

Finally, let’s create a cost function that minimizes both the style and the content cost. The formula is:

J(G) = α × J_content(C, G) + β × J_style(S, G)

Implementing total_cost()

The total cost is a linear combination of the content cost J_content (C, G) and the style cost J_style (S, G).

α and β are hyperparameters that control the relative weighting between content and style.
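The function itself is a one-line combination of the two costs; a minimal sketch:

def total_cost(J_content, J_style, alpha=10, beta=40):
    # J(G) = alpha * J_content(C, G) + beta * J_style(S, G)
    J = alpha * J_content + beta * J_style
    return J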

4. Solving the optimization problem

Finally, let’s put everything together to implement Neural Style Transfer!

Here’s what the program will have to do:

  1. Create an Interactive Session
  2. Load the content image
  3. Load the style image
  4. Randomly initialize the image to be generated
  5. Load the VGG19 model
  6. Build the TensorFlow graph:
  • Run the content image through the VGG19 model and compute the content cost
  • Run the style image through the VGG19 model and compute the style cost
  • Compute the total cost
  • Define the optimizer and the learning rate

7. Initialize the TensorFlow graph and run it for a large number of iterations, updating the generated image at every step.

Let's go through the individual steps in detail.

Interactive Sessions

We’ve previously implemented the overall cost J(G) . We’ll now set up TensorFlow to optimize this with respect to G .

To do so, our program has to reset the graph and use an “ Interactive Session ”. Unlike a regular session, the “Interactive Session” installs itself as the default session to build a graph. This allows us to run variables without constantly needing to refer to the session object (calling sess.run() ), which simplifies the code.

# Reset the graph
tf.reset_default_graph()

# Start interactive session
sess = tf.InteractiveSession()

Content image

Let’s load, reshape, and normalize our content image (the Louvre museum picture):

content_image = scipy.misc.imread("images/louvre.jpg")
content_image = reshape_and_normalize_image(content_image)

Style image

Let's load, reshape, and normalize our style image:

style_image = scipy.misc.imread("images/starry_night.jpg")
style_image = reshape_and_normalize_image(style_image)

Generated image correlated with content image

Now, we initialize the generated image as a noisy image created from the content_image .

By initializing the pixels of the generated image to be mostly noise, yet slightly correlated with the content image, we help the content of the generated image match the content of the content image more rapidly.

Feel free to look at nst_utils.py in the Github repo to see the details of generate_noise_image(...).
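Roughly, such a helper blends uniform random noise with the content image. A sketch of the idea (the noise range and the noise_ratio default here are assumptions; see nst_utils.py for the actual values):

def generate_noise_image(content_image, noise_ratio=0.6):
    # Uniform noise with the same shape as the (reshaped) content image
    noise_image = np.random.uniform(-20, 20, content_image.shape).astype('float32')
    # Mostly noise, but slightly correlated with the content image
    return noise_image * noise_ratio + content_image * (1 - noise_ratio)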

generated_image = generate_noise_image(content_image)
imshow(generated_image[0]);

[Image: the initial generated image, mostly noise, produced by generate_noise_image(content_image)]

Load pre-trained VGG19 model

Next, as explained before, we shall load the VGG19 model.

model = load_vgg_model("pretrained-model/imagenet-vgg-verydeep-19.mat")

Content Cost

To get the program to compute the content cost, we will now assign a_C and a_G to be the appropriate hidden layer activations. We will use layer conv4_2 to compute the content cost. The code below does the following:

  1. Assign the content image to be the input to the VGG model.
  2. Set a_C to be the tensor giving the hidden layer activation for layer conv4_2 .
  3. Set a_G to be the tensor giving the hidden layer activation for the same layer.
  4. Compute the content cost using a_C and a_G .

Note: At this point, a_G is a tensor and hasn't been evaluated. It will be evaluated and updated at each iteration when we run the TensorFlow graph in model_nn() below.

# Assign the content image to be the input of the VGG model
sess.run(model['input'].assign(content_image))

# Select the output tensor of layer conv4_2
out = model['conv4_2']

# Set a_C to be the hidden layer activation from the layer we have selected
a_C = sess.run(out)

# Set a_G to be the hidden layer activation from the same layer. Here, a_G references
# model['conv4_2'] and isn't evaluated yet. Later in the code, we'll assign the image G
# as the model input, so that when we run the session, this will be the activations
# drawn from the appropriate layer, with G as input.
a_G = out

# Compute the content cost
J_content = compute_content_cost(a_C, a_G)

Style cost

# Assign the input of the model to be the "style" image
sess.run(model['input'].assign(style_image))

# Compute the style cost
J_style = compute_style_cost(model, STYLE_LAYERS)

Total cost

Now that we have the content cost (J_content) and the style cost (J_style), compute the total cost J by calling total_cost().

J = total_cost(J_content, J_style, alpha=10, beta=40)

Optimizer

Here, I used the Adam optimizer to minimize the total cost J .

# Define the optimizer
optimizer = tf.train.AdamOptimizer(2.0)

# Define train_step
train_step = optimizer.minimize(J)

Implementing model_nn()

The function initializes the variables of the TensorFlow graph, assigns the input image (the initial generated image) as the input of the VGG19 model, and runs the train_step operation (created in the code above) for a large number of steps.
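A sketch of model_nn() along these lines; it relies on J, J_content, J_style, and train_step defined above, and assumes save_image(...) is the saving helper from nst_utils.py:

def model_nn(sess, input_image, num_iterations=200):
    # Initialize the graph's variables
    sess.run(tf.global_variables_initializer())

    # Assign the noisy initial image as the input of the VGG19 model
    sess.run(model['input'].assign(input_image))

    for i in range(num_iterations):
        # One optimization step on the total cost J; this updates model['input']
        sess.run(train_step)

        # Read back the current generated image
        generated_image = sess.run(model['input'])

        if i % 20 == 0:
            Jt, Jc, Js = sess.run([J, J_content, J_style])
            print("Iteration %d: total = %g, content = %g, style = %g" % (i, Jt, Jc, Js))
            save_image("output/" + str(i) + ".png", generated_image)

    # Save the final generated image
    save_image("output/generated_image.jpg", generated_image)
    return generated_image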

Run the following code snippet to generate an artistic image. It should take about 3 minutes on a CPU for every 20 iterations, but you start observing attractive results after roughly 140 iterations. Neural Style Transfer is generally trained using GPUs.

model_nn(sess, generated_image)

You're done! After running this, you should see something like the image presented below:

[Image: the generated image of the Louvre in the style of Monet]

Here are a few other examples:

  • The beautiful ruins of the ancient city of Persepolis (Iran), with the style of Van Gogh's The Starry Night.


  • The tomb of Cyrus the Great in Pasargadae with the style of a Ceramic Kashi from Ispahan.


  • A scientific study of a turbulent fluid, with the style of an abstract blue fluid painting.


5. Conclusion

You are now able to use Neural Style Transfer to generate artistic images. Neural Style Transfer is an algorithm that, given a content image C and a style image S, can generate an artistic image.

It uses representations (hidden layer activations) based on a pre-trained ConvNet. The content cost function is computed using one hidden layer’s activations; the style cost function for one layer is computed using the Gram matrix of that layer’s activations. The overall style cost function is obtained using several hidden layers.

Lastly, optimizing the total cost function results in synthesizing new images.

6. Citations & References

Github repo: https://github.com/TheClub4/artwork-neural-style-transfer

Special thanks to deeplearning.ai . Images courtesy of deeplearning.ai .

The Neural Style Transfer algorithm is due to Gatys et al. (2015). Harish Narayanan and Github user "log0" also have highly readable write-ups from which we drew inspiration. The pre-trained network used in this implementation is a VGG network, due to Simonyan and Zisserman (2015). The pre-trained weights come from the work of the MatConvNet team.

