Monday, May 16, 2022

Containerize your Db2 Python app

De-containerizing my stuff
By following my blog posts (here or at the IBM Cloud site) or my code samples, you might know that many of my apps are coded in Python. Moreover, many projects involve IBM Db2 (see "How to connect from Python to Db2"). With Docker and other container technologies around, and with many projects involving Kubernetes / OpenShift or Knative / IBM Cloud Code Engine, the next question is how to containerize / dockerize your Db2 application written in Python. Well, here I share some of my experience...

Build, push, pull

When working with containers and container images, the typical process is 

  • to build the container image,
  • push the image to a container registry,
  • then pull the image into the container runtime environment (local Docker / podman, Kubernetes pod, ...).

The build process follows the steps in the build configuration (Dockerfile), each producing a layer. Steps whose input data is unchanged can be skipped (served from cache) for performance reasons. Pushing and pulling involves copying or replicating these layers between machines. Thus, the smaller a container image and the more cached (skipped) layers, the more efficient the build process and the deployment of any container (image) revision.
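As a quick sketch, the three steps map to three docker CLI commands. The registry and image names below are made up for illustration (us.icr.io is the IBM Cloud Container Registry domain); substitute your own:

```shell
# Build the container image from the Dockerfile in the current directory
docker build -t us.icr.io/my-namespace/db2app:v1 .

# Push the image to the container registry
docker push us.icr.io/my-namespace/db2app:v1

# Pull the image in the target runtime environment and run it
docker pull us.icr.io/my-namespace/db2app:v1
docker run -p 8080:8080 us.icr.io/my-namespace/db2app:v1
```

With podman, the same subcommands apply unchanged.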

Add the Db2 Python client

The Db2 Python client, actually its four flavors (ibm_db, ibm_db_dbi, ibm_db_sa, ibm_db_django), is based on the C language Call Level Interface (CLI / ODBC). Adding the Db2 Python client to a project, especially to a container image, therefore adds considerable size. Over the years, I have experimented with multi-stage builds to reduce the image size and speed up the build process. A multi-stage build first creates one or more intermediary images. Their data is abandoned later on, but you can copy objects over to the next stage. This feature allows you to prepare the Db2 Python client with all its build requirements, then keep only the "nugget" - the core driver.

You can find one such multi-stage Dockerfile configuration in my IBM Cloud code samples. It uses a regular Python container as the base to install all requirements into a Python virtual environment. This includes downloading and preparing the Db2 client. Here is an excerpt (see the linked file for all steps and comments):

FROM python:3.8 AS builder
WORKDIR /app
...
ENV PATH="/venv/bin:$PATH" 
...
RUN python -m venv /venv
COPY requirements.txt ./
RUN pip install --no-cache-dir -r requirements.txt
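The requirements.txt drives what pip installs into the virtual environment, including the Db2 driver. A hypothetical example for a Flask-based app served by gunicorn could look like this (the exact packages depend on your app; pin versions as appropriate):

```text
# hypothetical requirements.txt
ibm_db
Flask
gunicorn
```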

The second stage utilizes a slim Python container as the base. Only the files from the virtual environment are copied over. Then, some necessary OS libraries like libxml2 are installed, the application files are added, and the container entrypoint is defined (excerpt again):

FROM python:3.8-slim AS app
ENV PATH="/venv/bin:$PATH"
WORKDIR /app
EXPOSE 8080
COPY --from=builder /venv /venv
RUN apt-get update && \
    apt-get install -y --no-install-recommends \
        libxml2 && \
    rm -rf /var/lib/apt/lists/*
COPY ./ ./
ENTRYPOINT ["gunicorn", "--bind", "0.0.0.0:8080", "ghstats:app"]
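At runtime, the containerized app connects to Db2 through the driver that was copied over. The sketch below assembles a CLI/ODBC connection string from environment variables; the variable names and the helper function are assumptions for illustration, not part of the sample repository:

```python
import os

def build_dsn(database, hostname, port, uid, pwd, ssl=True):
    # Assemble a Db2 CLI/ODBC connection string as accepted by
    # ibm_db.connect(); SECURITY=SSL enables TLS, which managed
    # Db2 services typically require.
    parts = [
        f"DATABASE={database}",
        f"HOSTNAME={hostname}",
        f"PORT={port}",
        "PROTOCOL=TCPIP",
        f"UID={uid}",
        f"PWD={pwd}",
    ]
    if ssl:
        parts.append("SECURITY=SSL")
    return ";".join(parts) + ";"

# In the container, read credentials from environment variables
# (names and defaults below are made up; adapt to your deployment):
dsn = build_dsn(
    os.environ.get("DB2_DATABASE", "bludb"),
    os.environ.get("DB2_HOSTNAME", "localhost"),
    os.environ.get("DB2_PORT", "50001"),
    os.environ.get("DB2_UID", "user"),
    os.environ.get("DB2_PWD", "password"),
)
# import ibm_db
# conn = ibm_db.connect(dsn, "", "")
```

Passing credentials via the environment (or mounted secrets in Kubernetes / Code Engine) keeps them out of the image layers.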

The above allows for small container images and a quick build process. If only the Python app changes as part of regular code development, then almost everything is served from cache. This keeps deployments of new code revisions fast.

Conclusions

Sometimes, defining more steps can reduce the number of steps actually executed. That is the case when using multi-stage builds for your apps with the Db2 Python client. They cut unnecessary build-time files from the production image, which means smaller images and quicker file transfers.

If you have feedback, suggestions, or questions about this post, please reach out to me on Twitter (@data_henrik) or LinkedIn.