Using the Dockerfile described at the end of my last post, I built an image that weighed 1.2GB. Not crazy, but still a lot. Running a shell in the image (docker run -ti myapp /bin/sh) showed that the actual files used only around 500MB of disk space, so where was the rest coming from?
It turns out that even if you delete dependencies (compilers and the like), unless you do so in the same command that installed them, Docker's "layering" feature keeps a copy of that software as an intermediate state. If you think about it, this makes sense: you might change a more recent layer (say, decide to stop uninstalling certain dependencies), and Docker would still be able to reuse the cache from all previous layers.
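As a sketch of the difference (the package names are illustrative, not the ones from my actual Dockerfile): deleting the toolchain in a separate RUN does not shrink the image, because the earlier layer still contains it, while doing everything in one RUN keeps it out of every committed layer.

```dockerfile
# Anti-pattern: each RUN commits a layer, so purging build-essential
# in a later step leaves a full copy of it in the earlier layer.
RUN apt-get update && apt-get install -y build-essential
RUN make install
RUN apt-get purge -y build-essential   # too late: the image stays big

# Better: install, build, and clean up in a single RUN, so the
# toolchain never appears in any committed layer.
RUN apt-get update && apt-get install -y build-essential \
    && make install \
    && apt-get purge -y build-essential \
    && rm -rf /var/lib/apt/lists/*
```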
If you use the multistage feature, those intermediate layers remain part of the "builder" image and are not carried over to the final image.
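A minimal sketch of what that looks like (the base images and paths are assumptions, since my app uses Ruby and Bundler): everything installed in the builder stage, including its cached layers, is discarded; only what you explicitly COPY reaches the final image.

```dockerfile
# Builder stage: compilers and build dependencies live only here.
FROM ruby:2.5 AS builder
WORKDIR /app
COPY Gemfile Gemfile.lock ./
RUN bundle install

# Final stage: only the built artifacts are copied over; the builder's
# layers (and their intermediate state) stay behind.
FROM ruby:2.5-slim
WORKDIR /app
COPY --from=builder /usr/local/bundle /usr/local/bundle
COPY . .
```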
Using Multistage prevents leaking secret tokens
I also discovered that with my previous approach, my claim
It is possible to provide an argument to a docker image, which can be used by bundler to authenticate with Github, and not have this token end up in the final image, by using a combination of Docker build-time variables, and Bundler support for credentials via ENV variables
was false: you could still see the GitHub token by checking the history of the image.
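The check is easy to reproduce: ARG values consumed by a RUN step are recorded in the layer metadata, so the token shows up in plain text (the variable name here is illustrative).

```shell
# Inspect the full, untruncated build history of the image;
# the builder's ARG value appears in the recorded RUN commands.
docker history --no-trunc myapp | grep GITHUB_TOKEN
```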
The good news is that by using multistage builds (and provided that only the final image is uploaded to the registry), this “token” is not added to the final image’s history, as the bundle phase is replaced by a copy.
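Concretely, the ARG and the RUN that consumes it live only in the builder stage (BUNDLE_GITHUB__COM is Bundler's environment variable for github.com credentials; the base images are assumptions):

```dockerfile
# The token is only ever visible inside the builder stage.
FROM ruby:2.5 AS builder
ARG GITHUB_TOKEN
ENV BUNDLE_GITHUB__COM=${GITHUB_TOKEN}
RUN bundle install

# The final image's history contains only this COPY instruction,
# not the ARG or the RUN that used it.
FROM ruby:2.5-slim
COPY --from=builder /usr/local/bundle /usr/local/bundle
```

Note this still assumes only the final image is pushed; if you push the builder stage as well, the token travels with it.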
I still need to decide exactly which binaries and libraries to copy from /usr/lib instead of copying the complete directory, but with this change I was able to reduce the image size to 600MB, roughly a 50% improvement.