pdfium docker setup and deployment
tl;dr
- PDFium: Google’s open-source PDF rendering library used in Chrome
- Binary install: Pre-built from bblanchon/pdfium-binaries
- Python: Use pypdfium2 for complete bindings
- Docker patterns: Multi-stage builds, architecture detection, optimized layers
- Production: Security hardening, memory limits, health checks
overview
PDFium is an open-source PDF rendering and manipulation library that Google maintains for Chrome, built on code originally developed by Foxit Software. Running it in a container isolates PDF processing from the host and lets it scale horizontally, which suits microservice architectures and cloud deployments.
installation approaches
pre-built binaries
The simplest approach uses pre-compiled binaries from bblanchon/pdfium-binaries:
# Download and extract PDFium binaries
cd /usr \
&& wget "https://github.com/bblanchon/pdfium-binaries/releases/latest/download/pdfium-linux-${ARCHITECTURE}.tgz" \
-O /tmp/pdfium.tar.gz \
&& tar -xzf /tmp/pdfium.tar.gz \
&& rm /tmp/pdfium.tar.gz
Supported architectures:
- x64 (amd64) - Intel/AMD 64-bit
- x86 - Intel/AMD 32-bit
- arm64 - ARM 64-bit
- arm - ARM 32-bit
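The release URL follows the same naming pattern for every architecture, so it can be built programmatically. A minimal sketch, assuming the `pdfium-linux-<arch>.tgz` naming used in the download command above; the function name `pdfium_release_url` is hypothetical and not part of any library:

```python
# Architectures listed above for the Linux builds of bblanchon/pdfium-binaries.
SUPPORTED_ARCHITECTURES = ("x64", "x86", "arm64", "arm")

BASE_URL = "https://github.com/bblanchon/pdfium-binaries/releases/latest/download"


def pdfium_release_url(architecture: str) -> str:
    """Return the download URL of the latest Linux build for one architecture."""
    if architecture not in SUPPORTED_ARCHITECTURES:
        raise ValueError(f"unsupported architecture: {architecture!r}")
    return f"{BASE_URL}/pdfium-linux-{architecture}.tgz"


if __name__ == "__main__":
    # Print one URL per supported architecture.
    for arch in SUPPORTED_ARCHITECTURES:
        print(pdfium_release_url(arch))
```

Rejecting unknown names early avoids a confusing 404 later in the build.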
pypdfium2 integration
For Python applications, pypdfium2 provides complete bindings that you can install from your Dockerfile:
# Using pip
RUN pip install pypdfium2
# Using uv (faster)
RUN uv pip install pypdfium2
# For specific version with platform support
RUN pip install "pypdfium2>=5.0.0b2"
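Once the package is installed, rendering a page takes only a few calls. The sketch below is illustrative rather than definitive: it assumes Pillow is also installed (needed by `to_pil()`), and the helper names `scale_for_dpi` and `render_first_page` are hypothetical. pypdfium2 is imported inside the function so the module still loads where the package is absent.

```python
# PDF page sizes are expressed in points: 72 points per inch.
POINTS_PER_INCH = 72


def scale_for_dpi(dpi: float) -> float:
    """Convert a target DPI to a render scale factor (scale 1.0 is 72 DPI)."""
    return dpi / POINTS_PER_INCH


def render_first_page(pdf_path: str, png_path: str, dpi: float = 150) -> None:
    """Render the first page of pdf_path and save it as a PNG file."""
    # Lazy import: only needed when a page is actually rendered.
    import pypdfium2 as pdfium

    pdf = pdfium.PdfDocument(pdf_path)
    try:
        page = pdf[0]
        bitmap = page.render(scale=scale_for_dpi(dpi))
        bitmap.to_pil().save(png_path)  # to_pil() requires Pillow
    finally:
        pdf.close()
```

A call such as `render_first_page("input.pdf", "page1.png", dpi=300)` would then produce a 300 DPI image; both file names are placeholders.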
building from source
For maximum control or unsupported platforms:
# Install build dependencies
RUN apt-get update && apt-get install -y \
git \
python3 \
build-essential \
cmake \
ninja-build
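# Install depot_tools, which provides gclient, gn, and ninja
RUN git clone https://chromium.googlesource.com/chromium/tools/depot_tools.git /opt/depot_tools
ENV PATH=/opt/depot_tools:$PATH
# Fetch PDFium with its dependencies, then build
RUN cd /opt \
&& gclient config --unmanaged https://pdfium.googlesource.com/pdfium.git \
&& gclient sync \
&& cd /opt/pdfium \
&& gn gen out/Release --args='is_debug=false pdf_is_standalone=true' \
&& ninja -C out/Release pdfium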
production dockerfile patterns
minimal python service
FROM python:3.12-slim
# Install system dependencies
RUN apt-get update && apt-get install -y \
wget \
libglib2.0-0 \
libnss3 \
libatk1.0-0 \
libatk-bridge2.0-0 \
libcups2 \
libxcomposite1 \
libxrandr2 \
libxss1 \
libgtk-3-0 \
libasound2 \
&& rm -rf /var/lib/apt/lists/*
# Install pypdfium2
# Quote the requirement so the shell does not treat ">" as a redirect
RUN pip install --no-cache-dir "pypdfium2>=5.0.0b2"
# Copy application
COPY app.py /app/
WORKDIR /app
CMD ["python", "app.py"]
multi-stage build with binary installation
# Build stage
FROM ubuntu:24.04 AS builder
ARG ARCHITECTURE=x64
# Extract into a dedicated directory so the files are easy to copy later
RUN apt-get update && apt-get install -y wget \
&& mkdir -p /opt/pdfium \
&& wget "https://github.com/bblanchon/pdfium-binaries/releases/latest/download/pdfium-linux-${ARCHITECTURE}.tgz" \
-O /tmp/pdfium.tar.gz \
&& tar -xzf /tmp/pdfium.tar.gz -C /opt/pdfium
# Runtime stage
FROM ubuntu:24.04
# Copy PDFium binaries (the archive unpacks into lib/ and include/)
COPY --from=builder /opt/pdfium/lib/libpdfium.so /usr/lib/
COPY --from=builder /opt/pdfium/include /usr/include/pdfium
# Install runtime dependencies
RUN apt-get update && apt-get install -y \
libstdc++6 \
&& rm -rf /var/lib/apt/lists/*
# Set library path
ENV LD_LIBRARY_PATH=/usr/lib:$LD_LIBRARY_PATH
COPY app /app
WORKDIR /app
CMD ["./app"]
enterprise python api service
Based on real production deployment patterns:
ARG REGISTRY_HOST=registry.example.com
ARG REGISTRY_PORT=443
ARG PLATFORM=amd64
# Base image with Python and common tools
FROM ${REGISTRY_HOST}:${REGISTRY_PORT}/ubuntu-python:24.04-${PLATFORM} AS base
# Switch to root for system installations
USER root
# Architecture mapping for PDFium binaries
ARG ARCHITECTURE
ENV ARCHITECTURE=${ARCHITECTURE:-x64}
# Install PDFium binary
RUN cd /usr \
&& wget "https://github.com/bblanchon/pdfium-binaries/releases/latest/download/pdfium-linux-${ARCHITECTURE}.tgz" \
-O /tmp/pdfium.tar.gz \
&& tar -xzf /tmp/pdfium.tar.gz \
&& rm /tmp/pdfium.tar.gz
# Switch to application user
USER app
WORKDIR /app
# Copy dependency files first (cache optimization)
COPY --chown=app:app pyproject.toml /app/
COPY --chown=app:app uv.lock /app/
# Install Python dependencies
RUN uv sync --no-dev --no-cache
# Copy application code
COPY --chown=app:app ./src /app/src
# Create startup script
# printf is used because echo handles "\n" differently across shells
RUN printf '%s\n' \
'#!/bin/bash' \
'cd /app' \
'NUM_WORKERS=${NUM_WORKERS:-4}' \
'uv run hypercorn --config hypercorn.toml -w "$NUM_WORKERS" src.api:app' \
> /app/run.sh && chmod +x /app/run.sh
EXPOSE 8000
CMD ["bash", "/app/run.sh"]
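The startup script serves `src.api:app`, which Hypercorn expects to be an ASGI callable. The source does not show that module, so the following is only a hypothetical stand-in using the standard library: a bare ASGI app with a single `/health` route, plus a `call` helper (also hypothetical) that exercises it without a server.

```python
import asyncio
import json


async def app(scope, receive, send):
    """Hypothetical ASGI application exposing a single GET /health route."""
    if scope["type"] == "lifespan":
        # Acknowledge startup and shutdown events sent by the server.
        while True:
            message = await receive()
            if message["type"] == "lifespan.startup":
                await send({"type": "lifespan.startup.complete"})
            elif message["type"] == "lifespan.shutdown":
                await send({"type": "lifespan.shutdown.complete"})
                return
    if scope["type"] != "http":
        return  # other protocols are ignored in this sketch

    if scope["method"] == "GET" and scope["path"] == "/health":
        status, payload = 200, {"status": "ok"}
    else:
        status, payload = 404, {"error": "not found"}

    body = json.dumps(payload).encode("utf-8")
    await send({
        "type": "http.response.start",
        "status": status,
        "headers": [
            (b"content-type", b"application/json"),
            (b"content-length", str(len(body)).encode("ascii")),
        ],
    })
    await send({"type": "http.response.body", "body": body})


async def call(method, path):
    """Drive the app once without a server; returns (status, decoded body)."""
    sent = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)

    await app({"type": "http", "method": method, "path": path}, receive, send)
    return sent[0]["status"], json.loads(sent[1]["body"])
```

A real service would more likely use an ASGI framework; the point here is only the shape of the object that the `src.api:app` argument refers to.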
environment configuration
library paths
# Set library path for PDFium
ENV LD_LIBRARY_PATH=/usr/lib:$LD_LIBRARY_PATH
# For custom installation paths
ENV PDFIUM_PATH=/opt/pdfium
ENV LD_LIBRARY_PATH=$PDFIUM_PATH/lib:$LD_LIBRARY_PATH
pypdfium2 build configuration
# Build-time: Specify platform/binary to use during installation
# Options: auto, system-search, sourcebuild, linux_x64, etc.
ENV PDFIUM_PLATFORM=auto
# Build-time: Use system headers and binary (paths)
ENV PDFIUM_HEADERS=/usr/include/pdfium
ENV PDFIUM_BINARY=/usr/lib/libpdfium.so
pypdfium2 runtime configuration
# Logging level for CLI tools (DEBUG, INFO, WARNING, ERROR)
ENV PYPDFIUM_LOGLEVEL=WARNING
application-level configuration
# These are application-specific, not pypdfium2 variables
# Worker configuration for your app
ENV NUM_WORKERS=4
# Memory limits for your app (implement in your code)
ENV APP_MAX_MEMORY=2048
# Rendering defaults for your app (implement in your code)
ENV APP_PDF_SCALE=2.0
ENV APP_PDF_DPI=300
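Because these variables are application-specific, the application has to read them itself. One possible sketch, using the same defaults as the `ENV` lines above; the `AppConfig` class and `load_config` function are hypothetical names, and treating `APP_MAX_MEMORY` as megabytes is an assumption the source does not state:

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class AppConfig:
    num_workers: int
    max_memory_mb: int  # assumption: APP_MAX_MEMORY is given in megabytes
    pdf_scale: float
    pdf_dpi: int


def load_config(env: Mapping[str, str] = os.environ) -> AppConfig:
    """Read the application variables, falling back to the documented defaults."""
    return AppConfig(
        num_workers=int(env.get("NUM_WORKERS", "4")),
        max_memory_mb=int(env.get("APP_MAX_MEMORY", "2048")),
        pdf_scale=float(env.get("APP_PDF_SCALE", "2.0")),
        pdf_dpi=int(env.get("APP_PDF_DPI", "300")),
    )
```

Passing the environment as a parameter keeps the loader easy to test with a plain dictionary.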
platform-specific considerations
arm architecture support
# Detect architecture at build time
ARG TARGETARCH
RUN ARCHITECTURE=$(case ${TARGETARCH} in \
"amd64") echo "x64" ;; \
"arm64") echo "arm64" ;; \
"arm") echo "arm" ;; \
*) echo "x64" ;; \
esac) \
&& wget "https://github.com/bblanchon/pdfium-binaries/releases/latest/download/pdfium-linux-${ARCHITECTURE}.tgz"
alpine linux
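Alpine uses musl libc rather than glibc, so use the musl builds of PDFium. The gcompat layer below only matters if the image also runs glibc-linked binaries: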
FROM alpine:3.19
# Install compatibility layer
RUN apk add --no-cache \
gcompat \
libstdc++ \
libgcc
# Install PDFium (musl builds have limited support)
# Note: x86 musl build is currently failing, only x64 works
RUN wget -O- https://github.com/bblanchon/pdfium-binaries/releases/latest/download/pdfium-linux-musl-x64.tgz \
| tar -xz -C /usr/local
optimization strategies
container size reduction
# Use slim base images (python:3.12-slim instead of python:3.12)
FROM python:3.12-slim
# Clean package manager cache (append this to the RUN that installs packages;
# a separate RUN cannot shrink an earlier layer)
RUN apt-get clean && rm -rf /var/lib/apt/lists/*
# Remove unnecessary files
RUN find /usr/local -name "*.pyc" -delete \
&& find /usr/local -name "__pycache__" -type d -delete
# Use --no-cache-dir with pip
RUN pip install --no-cache-dir pypdfium2
build caching
# Separate dependency installation from code
COPY requirements.txt .
RUN pip install -r requirements.txt
# Then copy application code
COPY . .
security hardening
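# Install CA certificates while still root
RUN apt-get update && apt-get install -y \
--no-install-recommends \
ca-certificates \
&& update-ca-certificates \
&& rm -rf /var/lib/apt/lists/*
# Make application files read-only for non-owners
RUN chmod -R 755 /app
# Run as non-root user (declare USER after every step that needs root)
RUN useradd -m -u 1000 pdfapp
USER pdfapp
# Read-only root filesystem is a runtime setting, not an image setting
# docker run --read-only --tmpfs /tmp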
common issues and solutions
missing shared libraries
Problem: libpdfium.so: cannot open shared object file
Solution:
# Ensure library path is set
ENV LD_LIBRARY_PATH=/usr/lib:$LD_LIBRARY_PATH
# Or update ldconfig
RUN echo "/usr/lib" > /etc/ld.so.conf.d/pdfium.conf \
&& ldconfig
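To confirm from Python that the dynamic loader can actually find and load the library, a small `ctypes` probe is enough. This is a sketch under assumptions: `find_pdfium` is a hypothetical helper name, and the lookup relies on the standard `ctypes.util.find_library` search, which may behave differently across platforms.

```python
import ctypes
import ctypes.util
from typing import Optional


def find_pdfium(explicit_path: Optional[str] = None) -> Optional[str]:
    """Return a loadable libpdfium name or path, or None if loading fails."""
    candidate = explicit_path or ctypes.util.find_library("pdfium")
    if candidate is None:
        return None  # the loader does not know any libpdfium
    try:
        ctypes.CDLL(candidate)  # raises OSError if the file cannot be loaded
    except OSError:
        return None
    return candidate


if __name__ == "__main__":
    print(find_pdfium() or "libpdfium not found")
```

Running it during the image build, or at container start, surfaces a missing or incompatible library before the first request does.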
font rendering issues
Problem: PDF text appears as boxes or missing characters
Solution:
# Install font packages
RUN apt-get update && apt-get install -y \
fonts-liberation \
fonts-dejavu-core \
fontconfig \
&& fc-cache -fv
memory constraints
Problem: OOM errors with large PDFs
Solution:
# Configure container limits at runtime
# docker run --memory=2g --memory-swap=2g
# Note: Memory limits must be enforced at container/application level
# pypdfium2 doesn't have built-in memory limit environment variables
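At the application level, one option on Linux is to cap the process's address space with the standard `resource` module. This is a sketch, not a recommendation from pypdfium2: the function names are hypothetical, the module is Unix-only, and an allocation failure inside native code may still terminate the process rather than raise `MemoryError`.

```python
import resource


def mb_to_bytes(megabytes: int) -> int:
    """Convert mebibytes to bytes."""
    return megabytes * 1024 * 1024


def limit_address_space(max_mb: int) -> int:
    """Cap this process's virtual address space; returns the soft limit in bytes."""
    requested = mb_to_bytes(max_mb)
    _soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    # The soft limit may not exceed the hard limit, so clamp when one is set.
    if hard != resource.RLIM_INFINITY:
        requested = min(requested, hard)
    resource.setrlimit(resource.RLIMIT_AS, (requested, hard))
    return requested
```

Calling `limit_address_space(2048)` early in a worker process keeps a single oversized PDF from exhausting the whole container, and the container limit set with `--memory` remains the outer bound.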
architecture mismatches
Problem: Binary incompatible with container architecture
Solution:
# Use buildx for multi-arch builds
# docker buildx build --platform linux/amd64,linux/arm64 .
# Or detect during the build, inside the build container
RUN ARCH=$(uname -m) && \
case $ARCH in \
x86_64) PDFIUM_ARCH="x64" ;; \
aarch64) PDFIUM_ARCH="arm64" ;; \
*) echo "Unsupported arch: $ARCH" && exit 1 ;; \
esac && \
wget "https://github.com/bblanchon/pdfium-binaries/releases/latest/download/pdfium-linux-${PDFIUM_ARCH}.tgz"
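When the check has to happen in the running application rather than in the image build, the same mapping can be expressed in Python. A minimal sketch mirroring the `case` statement above; `pdfium_arch` and the mapping table are hypothetical names:

```python
import platform

# Keys are machine names as reported by `uname -m` on Linux.
MACHINE_TO_PDFIUM_ARCH = {
    "x86_64": "x64",
    "aarch64": "arm64",
}


def pdfium_arch(machine: str = "") -> str:
    """Map a machine name (default: the current host) to a PDFium build name."""
    machine = machine or platform.machine()
    try:
        return MACHINE_TO_PDFIUM_ARCH[machine]
    except KeyError:
        raise RuntimeError(f"Unsupported arch: {machine}") from None
```

Failing loudly on unknown machines, as the shell version does with `exit 1`, is safer than silently falling back to x64.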
testing and validation
basic functionality test
# Add test script to container
COPY test_pdfium.py /app/
RUN python /app/test_pdfium.py && echo "PDFium test passed"
# test_pdfium.py
import pypdfium2 as pdfium
# Test basic functionality
pdf = pdfium.PdfDocument.new()
page = pdf.new_page(width=595, height=842) # A4 size
assert len(pdf) == 1
print("PDFium installation verified")
health checks
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
CMD python -c "import pypdfium2; print('healthy')" || exit 1
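An import-only check proves the package is installed but not that the native library works. A slightly deeper check could create a document, as the test script earlier does. The sketch below assumes a hypothetical file `/app/healthcheck.py`; `module_available` and `main` are illustrative names.

```python
import importlib.util
import sys


def module_available(name: str) -> bool:
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(name) is not None


def main() -> int:
    """Return 0 when PDFium can create a one-page document, 1 otherwise."""
    if not module_available("pypdfium2"):
        print("unhealthy: pypdfium2 is not installed", file=sys.stderr)
        return 1
    import pypdfium2 as pdfium

    pdf = pdfium.PdfDocument.new()
    pdf.new_page(width=595, height=842)  # A4 size in points
    healthy = len(pdf) == 1
    pdf.close()
    print("healthy" if healthy else "unhealthy")
    return 0 if healthy else 1
```

With the file in the working directory, the health check command could then be something like `python -c "import sys, healthcheck; sys.exit(healthcheck.main())"`; a longer `--timeout` than 3s may be needed on slow hosts.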
references
- PDFium Official Repository
- bblanchon/pdfium-binaries - Pre-built PDFium binaries
- pypdfium2 Documentation - Python bindings documentation
- pypdfium2 GitHub - Source code and issues
- Docker Best Practices - Official Docker guidelines
- Chrome PDFium Documentation - Chrome’s PDF implementation