pdfium docker setup and deployment

published: August 21, 2025

tl;dr

  • PDFium: Google’s open-source PDF rendering library, used in Chrome
  • Binary install: pre-built from bblanchon/pdfium-binaries
  • Python: use pypdfium2 for complete bindings
  • Docker patterns: multi-stage builds, architecture detection, optimized layers
  • Production: security hardening, memory limits, health checks

overview

PDFium is the open-source PDF rendering and manipulation library that Google maintains and ships in Chrome. Running it in a container isolates PDF processing from the host and lets it scale horizontally, which suits microservice architectures and cloud deployments.

installation approaches

pre-built binaries

The simplest approach uses pre-compiled binaries from bblanchon/pdfium-binaries:

# Download and extract PDFium binaries
cd /usr \
    && wget "https://github.com/bblanchon/pdfium-binaries/releases/latest/download/pdfium-linux-${ARCHITECTURE}.tgz" \
        -O /tmp/pdfium.tar.gz \
    && tar -xzf /tmp/pdfium.tar.gz \
    && rm /tmp/pdfium.tar.gz

Supported architectures:

  • x64 (amd64) - Intel/AMD 64-bit
  • x86 - Intel/AMD 32-bit
  • arm64 - ARM 64-bit
  • arm - ARM 32-bit
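The architecture suffix in the asset name can be derived from the machine type. A minimal sketch, assuming the `pdfium-linux-<arch>.tgz` naming used above; the function names and the set of `uname -m` aliases are illustrative and not part of any PDFium tooling:

```python
import platform

# Map `uname -m` style machine names to the architecture suffixes
# used in the release asset names listed above (aliases are assumptions).
MACHINE_TO_PDFIUM_ARCH = {
    "x86_64": "x64",
    "amd64": "x64",
    "i386": "x86",
    "i686": "x86",
    "aarch64": "arm64",
    "arm64": "arm64",
    "armv7l": "arm",
}

BASE_URL = "https://github.com/bblanchon/pdfium-binaries/releases/latest/download"


def pdfium_arch(machine=None):
    """Return the PDFium architecture suffix for a machine name."""
    machine = (machine or platform.machine()).lower()
    try:
        return MACHINE_TO_PDFIUM_ARCH[machine]
    except KeyError:
        # Fail loudly instead of silently downloading the wrong binary.
        raise ValueError(f"unsupported architecture: {machine}") from None


def pdfium_asset_url(machine=None):
    """Build the download URL for the pre-built Linux archive."""
    return f"{BASE_URL}/pdfium-linux-{pdfium_arch(machine)}.tgz"
```

Calling `pdfium_asset_url("aarch64")` yields a URL ending in `pdfium-linux-arm64.tgz`; with no argument the current machine is used.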

pypdfium2 integration

For Python applications, pypdfium2 provides bindings to PDFium, and its wheels bundle a PDFium binary. Install it in your Dockerfile:

# Using pip
RUN pip install pypdfium2

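# Using uv (faster); --system installs into the image's Python rather than a virtualenv
RUN uv pip install --system pypdfium2

# Pin a minimum version (quote the specifier so the shell does not treat > as a redirect)
RUN pip install "pypdfium2>=5.0.0b2"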

building from source

For maximum control or unsupported platforms:

# Install build dependencies
RUN apt-get update && apt-get install -y \
    git \
    python3 \
    build-essential \
    cmake \
    ninja-build

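# gclient, gn, and ninja are provided by depot_tools
RUN git clone https://chromium.googlesource.com/chromium/tools/depot_tools.git /opt/depot_tools
ENV PATH=/opt/depot_tools:$PATH

# Fetch PDFium and its dependencies with gclient (a plain git clone is not enough), then build
# (further system packages may be needed; see build/install-build-deps.sh in the checkout)
RUN cd /opt \
    && gclient config --unmanaged https://pdfium.googlesource.com/pdfium.git \
    && gclient sync \
    && cd /opt/pdfium \
    && gn gen out/Release --args='is_debug=false pdf_is_standalone=true' \
    && ninja -C out/Release pdfium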

production dockerfile patterns

minimal python service

FROM python:3.12-slim

# Install system dependencies (pypdfium2 wheels bundle PDFium, so trim this list to what your application needs)
RUN apt-get update && apt-get install -y \
    wget \
    libglib2.0-0 \
    libnss3 \
    libatk1.0-0 \
    libatk-bridge2.0-0 \
    libcups2 \
    libxcomposite1 \
    libxrandr2 \
    libxss1 \
    libgtk-3-0 \
    libasound2 \
    && rm -rf /var/lib/apt/lists/*

# Install pypdfium2 (quoted so the shell does not parse >= as a redirect)
RUN pip install --no-cache-dir "pypdfium2>=5.0.0b2"

# Copy application
COPY app.py /app/
WORKDIR /app

CMD ["python", "app.py"]
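The Dockerfile above copies an `app.py` that is not shown. A minimal sketch of what its rendering core might look like, assuming Pillow is installed alongside pypdfium2 (needed for `to_pil()`); the function names and the output naming scheme are hypothetical:

```python
from pathlib import Path


def output_path(out_dir, pdf_path, page_index):
    """Name each PNG after the source file and its 1-based page number."""
    stem = Path(pdf_path).stem
    return Path(out_dir) / f"{stem}-page{page_index + 1:03d}.png"


def render_pdf(pdf_path, out_dir, scale=2.0):
    """Render every page of pdf_path to a PNG file in out_dir."""
    # Imported lazily so the naming helper works without pypdfium2 installed.
    import pypdfium2 as pdfium

    Path(out_dir).mkdir(parents=True, exist_ok=True)
    pdf = pdfium.PdfDocument(pdf_path)
    written = []
    try:
        for index in range(len(pdf)):
            page = pdf[index]
            # scale=1.0 corresponds to 72 DPI, so 2.0 renders at 144 DPI.
            image = page.render(scale=scale).to_pil()
            target = output_path(out_dir, pdf_path, index)
            image.save(target)
            written.append(target)
            # Release each page promptly to keep memory flat on long documents.
            page.close()
    finally:
        pdf.close()
    return written
```

A call such as `render_pdf("/data/report.pdf", "/tmp/out")` would write `report-page001.png`, `report-page002.png`, and so on into `/tmp/out`.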

multi-stage build with binary installation

# Build stage
FROM ubuntu:24.04 AS builder

ARG ARCHITECTURE=x64
# Extract into a dedicated prefix so the runtime stage can copy exact paths
RUN apt-get update && apt-get install -y wget \
    && mkdir -p /opt/pdfium \
    && wget "https://github.com/bblanchon/pdfium-binaries/releases/latest/download/pdfium-linux-${ARCHITECTURE}.tgz" \
        -O /tmp/pdfium.tar.gz \
    && tar -xzf /tmp/pdfium.tar.gz -C /opt/pdfium

# Runtime stage
FROM ubuntu:24.04

# Copy PDFium binaries (the archive ships lib/ and include/ at its top level)
COPY --from=builder /opt/pdfium/lib/libpdfium.so /usr/lib/
COPY --from=builder /opt/pdfium/include/ /usr/include/pdfium/

# Install runtime dependencies
RUN apt-get update && apt-get install -y \
    libstdc++6 \
    && rm -rf /var/lib/apt/lists/*

# Set library path
ENV LD_LIBRARY_PATH=/usr/lib:$LD_LIBRARY_PATH

COPY app /app
WORKDIR /app
CMD ["./app"]

enterprise python api service

Based on real production deployment patterns:

ARG REGISTRY_HOST=registry.example.com
ARG REGISTRY_PORT=443
ARG PLATFORM=amd64

# Base image with Python and common tools
FROM ${REGISTRY_HOST}:${REGISTRY_PORT}/ubuntu-python:24.04-${PLATFORM} AS base

# Switch to root for system installations
USER root

# Architecture mapping for PDFium binaries
ARG ARCHITECTURE
ENV ARCHITECTURE=${ARCHITECTURE:-x64}

# Install PDFium binary
RUN cd /usr \
    && wget "https://github.com/bblanchon/pdfium-binaries/releases/latest/download/pdfium-linux-${ARCHITECTURE}.tgz" \
        -O /tmp/pdfium.tar.gz \
    && tar -xzf /tmp/pdfium.tar.gz \
    && rm /tmp/pdfium.tar.gz

# Switch to application user
USER app
WORKDIR /app

# Copy dependency files first (cache optimization)
COPY --chown=app:app pyproject.toml /app/
COPY --chown=app:app uv.lock /app/

# Install Python dependencies
RUN uv sync --no-dev --no-cache

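# Copy application code and the server config the startup script refers to
# (hypercorn.toml is assumed to sit at the build context root)
COPY --chown=app:app ./src /app/src
COPY --chown=app:app hypercorn.toml /app/

# Create startup script (printf avoids relying on echo interpreting \n)
RUN printf '%s\n' \
    '#!/bin/bash' \
    'cd /app' \
    'exec uv run hypercorn --config hypercorn.toml -w "${NUM_WORKERS:-4}" src.api:app' \
    > /app/run.sh && chmod +x /app/run.sh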

EXPOSE 8000
CMD ["bash", "/app/run.sh"]
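The startup script serves an ASGI application at `src.api:app`, which is not shown. As a rough sketch of the smallest module that would satisfy that import path, here is a framework-free ASGI callable; the `/healthz` route and the JSON payloads are hypothetical, and a real service would more likely use a web framework:

```python
import json


async def app(scope, receive, send):
    """Tiny ASGI application exposing GET /healthz."""
    if scope["type"] == "lifespan":
        # Acknowledge server startup and shutdown events.
        while True:
            message = await receive()
            if message["type"] == "lifespan.startup":
                await send({"type": "lifespan.startup.complete"})
            elif message["type"] == "lifespan.shutdown":
                await send({"type": "lifespan.shutdown.complete"})
                return
    if scope["type"] != "http":
        return  # other protocols are out of scope for this sketch

    if scope["method"] == "GET" and scope["path"] == "/healthz":
        status, payload = 200, {"status": "ok"}
    else:
        status, payload = 404, {"error": "not found"}

    body = json.dumps(payload).encode("utf-8")
    await send({
        "type": "http.response.start",
        "status": status,
        "headers": [
            (b"content-type", b"application/json"),
            (b"content-length", str(len(body)).encode("ascii")),
        ],
    })
    await send({"type": "http.response.body", "body": body})
```

PDF endpoints would be added as further branches on `scope["path"]`, or by swapping this callable for a framework application exposed under the same name.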

environment configuration

library paths

# Set library path for PDFium
ENV LD_LIBRARY_PATH=/usr/lib:$LD_LIBRARY_PATH

# For custom installation paths
ENV PDFIUM_PATH=/opt/pdfium
ENV LD_LIBRARY_PATH=$PDFIUM_PATH/lib:$LD_LIBRARY_PATH

pypdfium2 build configuration

# Build-time: Specify platform/binary to use during installation
# Options: auto, system-search, sourcebuild, linux_x64, etc.
ENV PDFIUM_PLATFORM=auto

# Build-time: Use system headers and binary (paths)
ENV PDFIUM_HEADERS=/usr/include/pdfium
ENV PDFIUM_BINARY=/usr/lib/libpdfium.so

pypdfium2 runtime configuration

# Logging level for CLI tools (DEBUG, INFO, WARNING, ERROR)
ENV PYPDFIUM_LOGLEVEL=WARNING

application-level configuration

# These are application-specific, not pypdfium2 variables
# Worker configuration for your app
ENV NUM_WORKERS=4

# Memory limits for your app (implement in your code)
ENV APP_MAX_MEMORY=2048

# Rendering defaults for your app (implement in your code)
ENV APP_PDF_SCALE=2.0
ENV APP_PDF_DPI=300
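Because these variables are application-specific, the application has to read and interpret them itself. A minimal sketch of one way to do that; the `RenderConfig` class and the rule that `APP_PDF_DPI` overrides `APP_PDF_SCALE` are assumptions of this example. The conversion relies on PDF page units being 1/72 inch, so a scale of 1.0 renders at 72 DPI:

```python
import os
from dataclasses import dataclass

POINTS_PER_INCH = 72  # PDF user-space units per inch


@dataclass(frozen=True)
class RenderConfig:
    num_workers: int
    max_memory_mb: int
    scale: float


def load_config(env=None):
    """Read the application variables, falling back to the defaults shown above."""
    env = os.environ if env is None else env
    if "APP_PDF_DPI" in env:
        # Assumption: an explicit DPI wins and is converted to a scale factor.
        scale = float(env["APP_PDF_DPI"]) / POINTS_PER_INCH
    else:
        scale = float(env.get("APP_PDF_SCALE", "2.0"))
    return RenderConfig(
        num_workers=int(env.get("NUM_WORKERS", "4")),
        max_memory_mb=int(env.get("APP_MAX_MEMORY", "2048")),
        scale=scale,
    )
```

Note that setting both `APP_PDF_SCALE=2.0` and `APP_PDF_DPI=300` describes two different resolutions (144 DPI versus 300 DPI), so an application needs some precedence rule like the one sketched here.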

platform-specific considerations

arm architecture support

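# Detect architecture at build time (TARGETARCH is set by BuildKit;
# 32-bit ARM is reported as "arm", with the version in TARGETVARIANT)
ARG TARGETARCH
RUN ARCHITECTURE=$(case "${TARGETARCH}" in \
    "amd64") echo "x64" ;; \
    "386") echo "x86" ;; \
    "arm64") echo "arm64" ;; \
    "arm") echo "arm" ;; \
    *) echo "Unsupported TARGETARCH: ${TARGETARCH}" >&2; exit 1 ;; \
    esac) \
    && wget "https://github.com/bblanchon/pdfium-binaries/releases/latest/download/pdfium-linux-${ARCHITECTURE}.tgz"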

alpine linux

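Alpine uses musl libc rather than glibc, so use the musl builds of PDFium and install the C++ runtime libraries:

FROM alpine:3.19

# Install runtime libraries (gcompat is only needed for glibc-linked binaries)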
RUN apk add --no-cache \
    gcompat \
    libstdc++ \
    libgcc

# Install PDFium (musl builds have limited support)
# Note: x86 musl build is currently failing, only x64 works
RUN wget -O- https://github.com/bblanchon/pdfium-binaries/releases/latest/download/pdfium-linux-musl-x64.tgz \
    | tar -xz -C /usr/local

optimization strategies

container size reduction

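# Use slim base images: python:3.12-slim instead of python:3.12
FROM python:3.12-slim

# Clean package manager cache (chain this onto the RUN that installs packages;
# files deleted in a later layer still count toward image size)
RUN apt-get clean && rm -rf /var/lib/apt/lists/*

# Remove unnecessary files (same caveat: delete in the layer that created them)
RUN find /usr/local -name "*.pyc" -delete \
    && find /usr/local -name "__pycache__" -type d -delete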

# Use --no-cache-dir with pip
RUN pip install --no-cache-dir pypdfium2

build caching

# Separate dependency installation from code
COPY requirements.txt .
RUN pip install -r requirements.txt

# Then copy application code
COPY . .

security hardening

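# Install packages while still root, without recommended extras
RUN apt-get update && apt-get install -y \
    --no-install-recommends \
    ca-certificates \
    && update-ca-certificates \
    && rm -rf /var/lib/apt/lists/*

# Keep application files root-owned: readable, but not writable by the runtime user
RUN chmod -R 755 /app

# Run as non-root user (switch last, after all root-only steps)
RUN useradd -m -u 1000 pdfapp
USER pdfapp

# Read-only root filesystem is a runtime option, not a Dockerfile instruction
# docker run --read-only --tmpfs /tmp ...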

common issues and solutions

missing shared libraries

Problem: libpdfium.so: cannot open shared object file

Solution:

# Ensure library path is set
ENV LD_LIBRARY_PATH=/usr/lib:$LD_LIBRARY_PATH

# Or update ldconfig
RUN echo "/usr/lib" > /etc/ld.so.conf.d/pdfium.conf \
    && ldconfig
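When the error persists, it helps to check where the library actually is. The following is a rough diagnostic sketch, not a reimplementation of the dynamic loader: it only looks in `LD_LIBRARY_PATH` and a few conventional directories (the `DEFAULT_DIRS` list is an assumption, and `ld.so.cache` and multiarch paths are ignored), then attempts a real load with `ctypes`:

```python
import ctypes
import os
from pathlib import Path

# Conventional fallback directories; an assumption of this sketch.
DEFAULT_DIRS = ("/usr/local/lib", "/usr/lib", "/lib")


def candidate_dirs(ld_library_path=None):
    """List directories to search: LD_LIBRARY_PATH entries first, then defaults."""
    if ld_library_path is None:
        ld_library_path = os.environ.get("LD_LIBRARY_PATH", "")
    dirs = [d for d in ld_library_path.split(":") if d]
    dirs.extend(d for d in DEFAULT_DIRS if d not in dirs)
    return dirs


def find_library(name="libpdfium.so", ld_library_path=None):
    """Return the first matching file path, or None if nothing is found."""
    for directory in candidate_dirs(ld_library_path):
        path = Path(directory) / name
        if path.is_file():
            return path
    return None


def can_load(path):
    """Try to load the shared object; False means a missing or broken library."""
    try:
        ctypes.CDLL(str(path))
        return True
    except OSError:
        return False
```

If `find_library()` returns a path but `can_load()` fails, the file exists but is unusable, which usually points to an architecture mismatch or a missing dependency rather than a search-path problem.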

font rendering issues

Problem: PDF text appears as boxes or missing characters

Solution:

# Install font packages
RUN apt-get update && apt-get install -y \
    fonts-liberation \
    fonts-dejavu-core \
    fontconfig \
    && fc-cache -fv

memory constraints

Problem: OOM errors with large PDFs

Solution:

# Configure container limits at runtime
# docker run --memory=2g --memory-swap=2g

# Note: Memory limits must be enforced at container/application level
# pypdfium2 doesn't have built-in memory limit environment variables
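Enforcing a limit at the application level starts with knowing what the container is allowed to use. A minimal sketch that reads the cgroup limit set by `docker run --memory` and derives a per-page pixel budget; the 25% fraction and the 4 bytes per pixel are illustrative assumptions, and the function names are hypothetical:

```python
from pathlib import Path

CGROUP_V2_LIMIT = "/sys/fs/cgroup/memory.max"
CGROUP_V1_LIMIT = "/sys/fs/cgroup/memory/memory.limit_in_bytes"


def parse_limit(text):
    """Return the limit in bytes, or None when cgroup v2 reports 'max' (unlimited)."""
    value = text.strip()
    if value == "max":
        return None
    return int(value)


def container_memory_limit(paths=(CGROUP_V2_LIMIT, CGROUP_V1_LIMIT)):
    """Read the first available cgroup memory limit file."""
    for path in paths:
        try:
            return parse_limit(Path(path).read_text())
        except (OSError, ValueError):
            continue
    return None


def max_bitmap_pixels(limit_bytes, fraction=0.25, bytes_per_pixel=4):
    """Pixel budget for one rendered page, using a fraction of the limit."""
    return int(limit_bytes * fraction) // bytes_per_pixel
```

An application could compare `width * height * scale**2` for a page against this budget and lower the scale, or reject the page, before rendering. On cgroup v1 an unlimited container reports a very large number rather than `max`, so treat implausibly large values as unlimited.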

architecture mismatches

Problem: Binary incompatible with container architecture

Solution:

# Use buildx for multi-arch builds
# docker buildx build --platform linux/amd64,linux/arm64 .

# Or detect at build time from the build container's machine type
RUN ARCH=$(uname -m) && \
    case $ARCH in \
        x86_64) PDFIUM_ARCH="x64" ;; \
        aarch64) PDFIUM_ARCH="arm64" ;; \
        *) echo "Unsupported arch: $ARCH" && exit 1 ;; \
    esac && \
    wget "https://github.com/bblanchon/pdfium-binaries/releases/latest/download/pdfium-linux-${PDFIUM_ARCH}.tgz"

testing and validation

basic functionality test

# Add test script to container
COPY test_pdfium.py /app/
RUN python /app/test_pdfium.py && echo "PDFium test passed"

# test_pdfium.py
import pypdfium2 as pdfium

# Test basic functionality
pdf = pdfium.PdfDocument.new()
page = pdf.new_page(width=595, height=842)  # A4 size
assert len(pdf) == 1
print("PDFium installation verified")

health checks

HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
    CMD python -c "import pypdfium2; print('healthy')" || exit 1
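The check above only proves that the module imports. A slightly deeper probe can also exercise the native library by building a document in memory. This is a sketch under the assumption that it is saved as a separate script (the file name `healthcheck.py` is hypothetical); it uses the same pypdfium2 calls as the test script earlier:

```python
def pdfium_is_healthy():
    """Return True if pypdfium2 imports and can build a one-page document."""
    try:
        import pypdfium2 as pdfium

        pdf = pdfium.PdfDocument.new()
        try:
            pdf.new_page(width=595, height=842)  # A4 size in PDF points
            return len(pdf) == 1
        finally:
            pdf.close()
    except Exception:
        # Import errors and native loader failures both count as unhealthy.
        return False


def main():
    """Map the probe result to a process exit code (0 = healthy)."""
    return 0 if pdfium_is_healthy() else 1
```

A script version would end with `raise SystemExit(main())` and be wired in as `HEALTHCHECK CMD python /app/healthcheck.py`. Keep the probe cheap, since it runs on every interval.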
