De-containerizing my stuff
Build, push, pull
When working with containers and container images, the typical process is to:
- build the container image,
- push the image to a container registry,
- then pull the image into the container runtime environment (local Docker / podman, Kubernetes pod, ...).
The build process follows the steps in the configuration (Dockerfile), with each step producing a layer. Steps whose input data is unchanged can be served from the cache and skipped for performance reasons. Pushing and pulling involves copying or replicating these layers between machines. Thus, the smaller a container image and the more cached (skipped) layers, the more efficient the build process and the deployment of any container (image) revision.
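The build / push / pull workflow above can be sketched with the Docker CLI. The registry, namespace, and image name below are placeholders for illustration, not values from this post:

```shell
# Build the image from the Dockerfile in the current directory;
# steps with unchanged input are served from the layer cache.
docker build -t registry.example.com/myteam/myapp:1.0 .

# Push the layers to the container registry; only layers the
# registry does not already have are uploaded.
docker push registry.example.com/myteam/myapp:1.0

# On the target machine, pull the image into the local runtime;
# again, only missing layers are transferred.
docker pull registry.example.com/myteam/myapp:1.0
docker run --rm -p 8080:8080 registry.example.com/myteam/myapp:1.0
```

The same flow applies with podman (`podman build` / `push` / `pull`) or when a Kubernetes node pulls the image for a pod.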
Add the Db2 Python client
The Db2 Python client (actually, its four versions) is based on the C language Call Level Interface (CLI / ODBC). Adding the Db2 Python client to a project, especially to a container image, comes at a hefty price in size. Over the years, I have experimented with multi-stage builds to reduce the image size and speed up the build process. A multi-stage build first creates one or more intermediary images. Their data is abandoned later on, but you can copy objects over to the next stage. This feature allows you to prepare the Db2 Python client with all its build requirements, then only keep the "nugget" - the core driver.
You can find one such multi-stage Dockerfile configuration in my IBM Cloud code samples. It uses a regular Python container as the base image to install all requirements into a Python virtual environment, which includes downloading and preparing the Db2 client. Here is an excerpt (see the linked file for all steps and comments):
```Dockerfile
FROM python:3.8 AS builder
WORKDIR /app
...
ENV PATH="/venv/bin:$PATH"
...
RUN python -m venv /venv
COPY requirements.txt ./
RUN pip install --no-cache-dir -r requirements.txt
```
The second stage uses a slim Python container as the base image. Only the files from the virtual environment are copied over. Then, some necessary OS libraries like libxml2 are installed, other files are added, and the container entrypoint is defined (excerpt again):
```Dockerfile
FROM python:3.8-slim AS app
ENV PATH="/venv/bin:$PATH"
WORKDIR /app
EXPOSE 8080
COPY --from=builder /venv /venv
RUN apt-get update && \
    apt-get install -y --no-install-recommends \
    libxml2 && \
    rm -rf /var/lib/apt/lists/*
COPY ./ ./
ENTRYPOINT ["gunicorn", "--bind", "0.0.0.0:8080", "ghstats:app"]
```
The above allows for small container images and a quick build process. If only the Python app changes as part of regular code development, then almost everything is cached. This keeps deployments of new code revisions fast.
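A rough way to see the effect yourself is to rebuild after a code-only change and compare image sizes; the image name below is a placeholder for illustration:

```shell
# First build: every step runs.
docker build -t myapp:latest .

# Change only application code, then rebuild: the builder stage and
# the apt-get layer are served from the cache; mainly COPY ./ ./
# and later steps rerun.
docker build -t myapp:latest .

# Build just the first stage separately (via --target) to compare
# the full builder image with the slim production image:
docker build --target builder -t myapp:builder .
docker images myapp
```

The size difference between `myapp:builder` and `myapp:latest` is roughly what the multi-stage build saves on every push and pull.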
Conclusions
Sometimes, defining more steps can reduce the number of steps actually executed. This is the case when using multi-stage builds for your apps with the Db2 Python client. It cuts unnecessary files from the production image, which means smaller image sizes and quicker file transfers.
If you have feedback, suggestions, or questions about this post, please reach out to me on Twitter (@data_henrik) or LinkedIn.