10 steps to a better Dockerfile

 2 years ago
source link: https://developers.redhat.com/articles/2021/10/12/10-steps-better-dockerfile

The journey to the cloud typically starts with containerizing your apps. One of the first challenges developers face is writing the blueprint for those container images, also known as a Dockerfile. This article guides you through ten steps (numbered 0 through 9) to writing better Dockerfiles. The basis for our example is a popular Spring application.

Prerequisites

Make sure you have the tools this example relies on: Git to clone the repository, a JDK to compile the application, and Docker to build and run the images.

I'm also assuming that you have some understanding of Java, Docker, and Dockerfiles. That is, if you have a Dockerfile, I expect that you can build the image and run a container using that image.

The application we'll use is spring-petclinic, which you should download by executing the following command:

git clone https://github.com/spring-projects/spring-petclinic.git

Compile the application to build the .jar file, because the rest of the article assumes that you have a precompiled application with a .jar file.
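
The compile step can be sketched as follows. This is a minimal sketch that assumes the repository was cloned into ./spring-petclinic and that you build with the Maven wrapper script (mvnw) that the project ships; the function name build_petclinic is only an illustration. The version in the jar's file name follows the commit you build, while the Dockerfiles in this article assume spring-petclinic-2.4.5.jar.

```shell
# Sketch: compile Spring Petclinic and list the resulting jar.
# Assumption: the clone lives in ./spring-petclinic unless a path is passed in.
build_petclinic() {
    (
        cd "${1:-spring-petclinic}" || exit 1
        # Maven wrapper shipped with the project; list the jar only if the build succeeds.
        ./mvnw package && ls target/*.jar
    )
}
```

Call build_petclinic from the directory where you ran git clone. The subshell keeps your current working directory unchanged.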

Step 0: Start with a working Dockerfile

This is the foundation for all the rest of the steps in this article. We'll start with a very basic Dockerfile.

Contrary to popular belief, image size is not your first concern when writing a Dockerfile. If you're used to apt-get, you don't need to switch to a different package manager to convert your 700MB Ubuntu image to a 10MB Alpine image. The first step is to just get the Dockerfile to work, preferably using the same build commands that you're accustomed to.

Create a Dockerfile at the root of the project with the following contents:

FROM debian
RUN apt-get update


COPY . /app

# Install OpenJDK-11
RUN apt-get update && \
    apt-get install -y openjdk-11-jdk && \
    apt-get install -y ant && \
    apt-get clean;

# Fix certificate issues
RUN apt-get update && \
    apt-get install ca-certificates-java && \
    apt-get clean && \
    update-ca-certificates -f;

# Set up JAVA_HOME -- useful for docker commandline
ENV JAVA_HOME /usr/lib/jvm/java-11-openjdk-amd64/
RUN export JAVA_HOME


ENTRYPOINT ["java","-jar","/app/target/spring-petclinic-2.4.5.jar"]

This Dockerfile works, but it could be improved upon. Just like a hundred-year-old house, we'll need to do some renovations.

Step 1: Optimize caching

In the first improvement step, we'll tackle caching.

"What is caching?" you might ask. Each instruction in the Dockerfile creates a new image layer and adds it to your local image cache. That image then becomes the read-only parent for the image created by the next instruction.

Layers stack on top of previous layers, adding functionality incrementally. Once Docker caches an image layer for an instruction, the layer doesn't need to be rebuilt. Caching reuses existing image layers and helps save expensive network calls. To read more about caching, please visit the article Intro Guide to Dockerfile Best Practices.

Here are a couple of things to keep in mind about caching:

  • Order your steps in the Dockerfile from the least to the most frequently changing steps, to optimize your caching.
  • When copying files into your images, be very specific about what you want to copy, because any change in the files you're copying will break the cache.
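
One complementary way to stay specific about what gets copied is a .dockerignore file, which keeps the listed paths out of the build context entirely. The sketch below writes one from the shell. The entries are illustrative assumptions, not taken from the original article, so adjust them to your project.

```shell
# Sketch: keep files that never belong in the image out of the build context,
# so that editing them cannot invalidate a cached "COPY ." layer.
# The three entries are illustrative assumptions, not a recommended list.
cat > .dockerignore <<'EOF'
.git
.idea
*.md
EOF
```

Run this at the root of the project, next to the Dockerfile. Note that it overwrites any existing .dockerignore.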

The following Dockerfile introduces caching into our build:

# Note: The order matters for caching
FROM debian
RUN apt-get update

## Improvement 1A. Move copy command from here...

## Improvement 1B. Have cacheable units cached together. The apt-get update and install should happen together or not at all.

# Install OpenJDK-11
RUN apt-get update && \
    apt-get install -y openjdk-11-jdk && \
    apt-get install -y ant && \
    apt-get clean;

# Fix certificate issues
RUN apt-get update && \
    apt-get install ca-certificates-java && \
    apt-get clean && \
    update-ca-certificates -f;

# Set up JAVA_HOME -- useful for docker commandline
ENV JAVA_HOME /usr/lib/jvm/java-11-openjdk-amd64/
RUN export JAVA_HOME

## Improvement 1A. ...to here
## This is because we're building from a pre-compiled application and don't need to copy the source files.

COPY target/spring-petclinic-2.4.5.jar /app/

ENTRYPOINT ["java","-jar","/app/spring-petclinic-2.4.5.jar"]

Step 2: Eliminate unnecessary dependencies

The next improvement is to eliminate dependencies that you don't need in your image. You can always add the dependencies later if you need them. The new Dockerfile is:


FROM debian
RUN apt-get update

## Improvement 2: Remove unnecessary dependencies using the --no-install-recommends option.

# Install OpenJDK-11
RUN apt-get update && \
    apt-get install -y --no-install-recommends openjdk-11-jdk && \
    apt-get install -y ant && \
    apt-get clean;

# Fix certificate issues
RUN apt-get update && \
    apt-get install ca-certificates-java && \
    apt-get clean && \
    update-ca-certificates -f;

# Set up JAVA_HOME -- useful for docker commandline
ENV JAVA_HOME /usr/lib/jvm/java-11-openjdk-amd64/
RUN export JAVA_HOME

COPY target/spring-petclinic-2.4.5.jar /app/

ENTRYPOINT ["java","-jar","/app/spring-petclinic-2.4.5.jar"]

Step 3: Use an official image and specific image tags

This step carries out possibly the most important improvements to the Dockerfile, because they affect security and maintainability:

  • Use official images, which are vetted for vulnerabilities, whenever possible.
  • Do not, under any circumstances, use the latest tag. Just don't.

If you’re tagging an image with latest, that’s all the information you have later, apart from the image ID. How will you distinguish between an old "latest" and a new "latest" as your application evolves? Operations such as deploying a new version of your application or rolling back to an earlier version are simply not possible if you don’t have two distinctly tagged images with stable tags.
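
As a minimal sketch of the alternative, the snippet below tags the image with an explicit version. The repository name petclinic and the helper name build_versioned_image are hypothetical; the version 2.4.5 simply mirrors the jar used in this article.

```shell
# Sketch: tag the image with an explicit version instead of relying on "latest".
# "petclinic" is a hypothetical repository name; 2.4.5 matches the jar in this article.
APP_VERSION="2.4.5"
IMAGE_REF="petclinic:${APP_VERSION}"

build_versioned_image() {
    # Run from the directory that contains the Dockerfile.
    docker build -t "$IMAGE_REF" .
}
```

With one distinct tag per release, rolling back is a matter of running the previous tag rather than guessing which "latest" you have.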

The new Dockerfile for this step is:

# Use official image and specific image tags

## Improvement 3: Using an official image that already has Java and lists a specific version.
FROM openjdk:11
RUN apt-get update

COPY target/spring-petclinic-2.4.5.jar /app/

ENTRYPOINT ["java","-jar","/app/spring-petclinic-2.4.5.jar"]

Suddenly the Dockerfile looks less ugly. Don't get too happy, though, because it will get complicated again.

Step 4: Look for minimal flavors

In this step, we look for minimal flavors that will do the job.

Word of caution: This choice involves a balance between convenience and image size.

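In practice, that means a minimal image can take away tools you're used to. Alpine-based images, for example, ship the apk package manager, so you can no longer run apt-get: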

## Improvement 4: Look for minimal flavors.
FROM openjdk:17-alpine

## Warning: You're not going to get apt-get anymore
# RUN apt-get update

COPY target/spring-petclinic-2.4.5.jar /app/

ENTRYPOINT ["java","-jar","/app/spring-petclinic-2.4.5.jar"]

Step 5: Build from source, not the build artifact

So far, we have been using a prebuilt binary and copying only the .jar file. However, if the Dockerfile is indeed the blueprint, the source of truth should be the source code, not the build artifact. So we try the following:

## Improvement 5A: Build from source in a consistent environment.
FROM maven:3.8.1-ibmjava-8-alpine

WORKDIR /app

COPY pom.xml .

COPY src ./src

RUN mvn -e -B package

ENTRYPOINT ["java","-jar","/app/target/spring-petclinic-2.4.5.jar"]

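But this change introduces a problem: Every time you make a code change, the dependencies are fetched again, because changing src invalidates the cache for the layer that runs mvn package. The solution is to make dependency resolution a separate step that depends only on pom.xml: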

## Improvement 5A: Build from source in a consistent environment.
FROM maven:3.8.1-ibmjava-8-alpine

WORKDIR /app

COPY pom.xml .

## Improvement 5B: Separate dependency from build step for caching.
RUN mvn -e -B dependency:resolve

COPY src ./src

RUN mvn -e -B package

ENTRYPOINT ["java","-jar","/app/target/spring-petclinic-2.4.5.jar"]

Step 6: Introduce multi-stage builds

So far you've made great progress. But...the image has become a lot bigger again. You have also added your development and build tools in the final image. Multi-stage builds to the rescue! Learn more about multi-stage builds in this article on the Docker site. The Dockerfile with a multi-stage build is:

## Improvement 6: Multi-stage build.
FROM maven:3.8.1-ibmjava-8-alpine AS builder
WORKDIR /app
COPY pom.xml .
RUN mvn -e -B dependency:resolve
COPY src ./src
RUN mvn -e -B package

FROM openjdk:17-alpine AS release
COPY --from=builder /app/target/spring-petclinic-2.4.5.jar /app/
ENTRYPOINT ["java","-jar","/app/spring-petclinic-2.4.5.jar"]

Now you have achieved a separation of concerns using different stages. Moreover, you'll be pushing only the release stage to the registry; the files from the previous stages will not be shared. You can build just the release stage using the following command:

docker build -t <tag-name> . --target release

Step 7: Use global ARG

This next improvement is more of a best practice. It deals with the ARG instruction, which defines a variable for later use. Normally, the variable is available only while the image is being built; running containers can't access its value. Defining an ARG variable before the first FROM creates a global variable that the FROM line of every stage can reference. To use its value inside a stage, redeclare it there with a bare ARG instruction (for example, ARG flavor). ARG is the only instruction that can precede FROM in the Dockerfile:

## Improvement 7: Use global ARG.

ARG flavor=alpine

FROM maven:3.8.1-ibmjava-8-$flavor AS builder
WORKDIR /app
COPY pom.xml .
RUN mvn -e -B dependency:resolve
COPY src ./src
RUN mvn -e -B package

FROM openjdk:17-$flavor AS release
COPY --from=builder /app/target/spring-petclinic-2.4.5.jar /app/
ENTRYPOINT ["java","-jar","/app/spring-petclinic-2.4.5.jar"]
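
The default value of a global ARG can be overridden when you build. The following is a sketch only: the helper name build_with_flavor and the petclinic image name are hypothetical, and whatever value you pass must exist as a tag suffix for both base images named in the Dockerfile.

```shell
# Sketch: override the global ARG "flavor" at build time.
# Assumption: the value exists as a tag suffix for BOTH base images in the Dockerfile.
build_with_flavor() {
    flavor="${1:-alpine}"   # same value as the Dockerfile's default
    docker build --build-arg "flavor=${flavor}" -t "petclinic:2.4.5-${flavor}" .
}
```

Calling build_with_flavor without an argument reproduces the default build. Passing a different flavor swaps both base images in one place.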

Step 8: Skip tests (optional)

This step is for people who want to skip tests using the -DskipTests option. You can run the tests as separate stages and skip the tests in your build stage:

ARG flavor=alpine

FROM maven:3.8.1-ibmjava-8-$flavor AS builder
WORKDIR /app
COPY pom.xml .
RUN mvn -e -B dependency:resolve
COPY src ./src
## Improvement 8: Run tests as separate stages. You can skip the tests here.
RUN mvn -e -B package -DskipTests

FROM builder AS unit-test
RUN mvn -e -B test

FROM openjdk:17-$flavor AS release
COPY --from=builder /app/target/spring-petclinic-2.4.5.jar /app/
ENTRYPOINT ["java","-jar","/app/spring-petclinic-2.4.5.jar"]

FROM release AS integration-test
RUN apk add --no-cache curl
# If you have an actual integration test script, refer to that below. You'll also need to copy that script from the build stage
# RUN ./test/int-test.sh
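
With the tests split into their own stages, you choose what runs by selecting a build target. The sketch below assumes the stage names from the Dockerfile above; the helper names and the petclinic image name are hypothetical. With BuildKit, stages that the selected target does not depend on are skipped, so building release does not run the unit-test stage.

```shell
# Sketch: build one named stage of the multi-stage Dockerfile.
# A failing RUN (for example, a failing "mvn test") makes docker build exit non-zero.
build_stage() {
    stage="$1"
    docker build --target "$stage" -t "petclinic:${stage}" .
}

# Run the unit tests first and build the release image only if they pass.
test_then_release() {
    build_stage unit-test && build_stage release
}
```

In a pipeline, test_then_release gives you a gate: the release image is built only when the unit-test stage succeeds.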

Step 9: Skip caching (optional)

Caching is not always ideal. For example, when you clone a Git repository, the git clone command might never change, but the repository will. Caching can't recognize the change to the repository.

The simplest solution to avoid these issues is to not use the cache at all for such scenarios. A nice discussion on the Docker community forum lists best practices for getting code into a container (git clone versus copy versus data container).

The --no-cache argument completely discards the cache, so that all steps of the Dockerfile are always executed. For security patches or updates and use cases such as the repository issue just mentioned, it's helpful to periodically use the --no-cache argument.

The FROM instruction is the only line that is not affected by the --no-cache argument. If the base image is present in the machine, it won’t be pulled again.
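
If you also want a fresh base image, docker build accepts a --pull flag alongside --no-cache. A minimal sketch follows; the helper name rebuild_from_scratch and the default tag are hypothetical.

```shell
# Sketch: rebuild every layer and also refresh the base image.
# --no-cache ignores cached layers; --pull asks Docker to check the registry for a
# newer copy of the FROM image instead of reusing the local one.
rebuild_from_scratch() {
    docker build --no-cache --pull -t "${1:-petclinic:2.4.5}" .
}
```

Running this on a schedule, for example in a nightly job, is one way to pick up security patches without giving up the cache for everyday builds.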

Conclusion

The final version of the Dockerfile looks nowhere near as slim (in terms of the number of lines, not image size) as what we had in Step 3. However, we've made lots of improvements in reproducibility, maintainability, and consistency. If you've enjoyed this article or have any feedback, please leave a comment or give me a shout-out on LinkedIn.

