CI/CD Pipeline

This document describes the Continuous Integration and Continuous Deployment (CI/CD) pipeline for the Dataknobs project. Our pipeline ensures code quality, runs comprehensive tests, and automates deployment processes.

Table of Contents

Overview
Pipeline Architecture
GitHub Actions Workflows
Code Quality Checks
Testing Pipeline
Build and Package
Deployment Strategy
Security Scanning
Monitoring and Alerts
Troubleshooting
Best Practices
Resources

Overview

Our CI/CD pipeline is built on GitHub Actions and provides:

Continuous Integration: Automated testing and quality checks on every commit
Continuous Deployment: Automated releases and package publishing
Multi-environment Support: Development, staging, and production deployments
Security Integration: Automated security scanning and vulnerability detection
Performance Monitoring: Performance regression detection

Pipeline Architecture

flowchart TD
    A[Code Commit] --> B[Trigger CI Pipeline]
    B --> C[Code Quality Checks]
    B --> D[Security Scanning]
    B --> E[Unit Tests]
    B --> F[Integration Tests]

    C --> G[Build Packages]
    D --> G
    E --> G
    F --> G

    G --> H{Branch Check}
    H -->|main| I[Deploy to Staging]
    H -->|develop| J[Deploy to Dev]
    H -->|release/*| K[Deploy to Production]

    I --> L[Staging Tests]
    J --> M[Dev Tests]
    K --> N[Production Health Check]

    L --> O[Promote to Production]
    M --> P[Integration Testing]
    N --> Q[Monitor & Alert]
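
The branch check in the flowchart amounts to a small lookup from branch name to target environment. The sketch below is one way to express that lookup in Python. `DEPLOY_TARGETS` and `deploy_target` are hypothetical names used for illustration only, and treating `release/*` as a glob pattern is an assumption about how the diagram should be read.

```python
from fnmatch import fnmatchcase
from typing import Optional

# Branch patterns and their targets, copied from the flowchart.
# Hypothetical table: the first matching pattern wins.
DEPLOY_TARGETS = [
    ("main", "staging"),
    ("develop", "dev"),
    ("release/*", "production"),
]


def deploy_target(branch: str) -> Optional[str]:
    """Return the environment a branch deploys to, or None if it has no target."""
    for pattern, environment in DEPLOY_TARGETS:
        # fnmatchcase is case-sensitive on every platform, unlike fnmatch.
        if fnmatchcase(branch, pattern):
            return environment
    return None
```

Branches that match no pattern, such as feature branches, return `None`, which corresponds to running CI without any deployment.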

GitHub Actions Workflows

Security note: All GitHub Actions references in these examples are pinned to full commit SHAs, not branch or tag refs. Mutable refs (@master, @v1) can be redirected if a repository is compromised. Always pin to an immutable SHA and add a version comment for readability.
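
The pinning rule can be checked mechanically. The following is a minimal sketch, not an existing Dataknobs tool: `unpinned_actions` is a hypothetical helper that scans workflow text for `uses:` references whose ref is not a 40-character hexadecimal SHA. It relies on a simple regular expression rather than a YAML parser, so it assumes each `uses:` entry sits on its own line.

```python
import re

# "uses: owner/repo[/path]@ref", optionally preceded by a list dash.
USES_RE = re.compile(r"^\s*(?:-\s*)?uses:\s*([^\s@#]+)@([^\s#]+)")
# A full commit SHA is exactly 40 lowercase hex characters.
FULL_SHA_RE = re.compile(r"[0-9a-f]{40}")


def unpinned_actions(workflow_text):
    """Return (line_number, action, ref) for each `uses:` ref that is not a full SHA."""
    findings = []
    for number, line in enumerate(workflow_text.splitlines(), start=1):
        match = USES_RE.match(line)
        if match and not FULL_SHA_RE.fullmatch(match.group(2)):
            findings.append((number, match.group(1), match.group(2)))
    return findings
```

Local actions referenced by path (for example `uses: ./local/action`) carry no `@ref` and are skipped by this check.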

Main CI Workflow

# .github/workflows/ci.yml
name: CI Pipeline

on:
  push:
    branches: [ main, develop ]
  pull_request:
    branches: [ main, develop ]
  schedule:
    # Run daily at 6 AM UTC
    - cron: '0 6 * * *'

env:
  PYTHON_VERSION: '3.11'
  POETRY_VERSION: '1.6.1'

jobs:
  code-quality:
    name: Code Quality Checks
    runs-on: ubuntu-latest

    steps:
    - name: Checkout code
      uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5  # v4

    - name: Set up Python
      uses: actions/setup-python@7f4fc3e22c37d6ff65e88745f38bd3157c663f7c  # v4
      with:
        python-version: ${{ env.PYTHON_VERSION }}

    - name: Install Poetry
      uses: snok/install-poetry@76e04a911780d5b312d89783f7b1cd627778900a  # v1
      with:
        version: ${{ env.POETRY_VERSION }}

    - name: Install dependencies
      run: |
        poetry install --with dev,test

    - name: Run Black (formatting)
      run: |
        poetry run black --check packages/ tests/

    - name: Run isort (import sorting)
      run: |
        poetry run isort --check-only packages/ tests/

    - name: Run flake8 (linting)
      run: |
        poetry run flake8 packages/ tests/

    - name: Run mypy (type checking)
      run: |
        poetry run mypy packages/

    - name: Run pylint (code analysis)
      run: |
        poetry run pylint packages/ --output-format=github

  security-scan:
    name: Security Scanning
    runs-on: ubuntu-latest

    steps:
    - name: Checkout code
      uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5  # v4

    - name: Run Bandit (security linting)
      run: |
        pip install bandit[toml]
        bandit -r packages/ -f json -o bandit-report.json

    - name: Upload Bandit results
      uses: actions/upload-artifact@ff15f0306b3f739f7b6fd43fb5d26cd321bd4de5  # v3
      if: always()
      with:
        name: bandit-results
        path: bandit-report.json

    - name: Run Safety (dependency vulnerabilities)
      run: |
        pip install safety
        safety check --json --output safety-report.json

    - name: Upload Safety results
      uses: actions/upload-artifact@ff15f0306b3f739f7b6fd43fb5d26cd321bd4de5  # v3
      if: always()
      with:
        name: safety-results
        path: safety-report.json

  test:
    name: Test Suite
    runs-on: ${{ matrix.os }}
    strategy:
      matrix:
        os: [ubuntu-latest, windows-latest, macos-latest]
        python-version: ['3.12', '3.13']
        exclude:
          # Reduce matrix size for faster builds
          - os: macos-latest
            python-version: '3.12'

    steps:
    - name: Checkout code
      uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5  # v4

    - name: Set up Python ${{ matrix.python-version }}
      uses: actions/setup-python@7f4fc3e22c37d6ff65e88745f38bd3157c663f7c  # v4
      with:
        python-version: ${{ matrix.python-version }}

    - name: Install Poetry
      uses: snok/install-poetry@76e04a911780d5b312d89783f7b1cd627778900a  # v1
      with:
        version: ${{ env.POETRY_VERSION }}

    - name: Install dependencies
      run: |
        poetry install --with dev,test

    - name: Run unit tests
      run: |
        poetry run pytest tests/unit/ \
          --cov=packages/ \
          --cov-report=xml \
          --cov-report=term-missing \
          --junit-xml=junit-unit.xml

    - name: Run integration tests
      run: |
        poetry run pytest tests/integration/ \
          --junit-xml=junit-integration.xml

    - name: Upload test results
      uses: actions/upload-artifact@ff15f0306b3f739f7b6fd43fb5d26cd321bd4de5  # v3
      if: always()
      with:
        name: test-results-${{ matrix.os }}-${{ matrix.python-version }}
        path: |
          junit-*.xml
          coverage.xml

    - name: Upload coverage to Codecov
      uses: codecov/codecov-action@ab904c41d6ece82784817410c45d8b8c02684457  # v3
      if: matrix.os == 'ubuntu-latest' && matrix.python-version == '3.12'
      with:
        file: ./coverage.xml
        flags: unittests
        name: codecov-umbrella
        fail_ci_if_error: true

  build:
    name: Build Packages
    runs-on: ubuntu-latest
    needs: [code-quality, security-scan, test]

    steps:
    - name: Checkout code
      uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5  # v4
      with:
        fetch-depth: 0  # Full history for versioning

    - name: Set up Python
      uses: actions/setup-python@7f4fc3e22c37d6ff65e88745f38bd3157c663f7c  # v4
      with:
        python-version: ${{ env.PYTHON_VERSION }}

    - name: Install Poetry
      uses: snok/install-poetry@76e04a911780d5b312d89783f7b1cd627778900a  # v1
      with:
        version: ${{ env.POETRY_VERSION }}

    - name: Build packages
      run: |
        cd packages/common && poetry build
        cd ../structures && poetry build
        cd ../utils && poetry build
        cd ../xization && poetry build

    - name: Upload build artifacts
      uses: actions/upload-artifact@ff15f0306b3f739f7b6fd43fb5d26cd321bd4de5  # v3
      with:
        name: packages
        path: packages/*/dist/

    - name: Test package installation
      run: |
        pip install packages/common/dist/*.whl
        pip install packages/structures/dist/*.whl
        pip install packages/utils/dist/*.whl
        pip install packages/xization/dist/*.whl
        python -c "import dataknobs_common, dataknobs_structures, dataknobs_utils, dataknobs_xization"
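
The `strategy.matrix` block in the test job expands into one job per combination of `os` and `python-version`, minus any combination matched by an `exclude` rule. The sketch below models that expansion in Python to make the job count easy to reason about. `expand_matrix` is a hypothetical helper, and the matching rule it implements (a rule matches when every key in it equals the job's value) is a simplified reading of GitHub's behavior.

```python
from itertools import product


def expand_matrix(axes, exclude=()):
    """Expand matrix axes into job dicts, dropping combinations matched by an exclude rule.

    A rule matches a job when every key in the rule equals the job's value for that key.
    """
    keys = list(axes)
    jobs = [dict(zip(keys, values)) for values in product(*axes.values())]
    return [
        job for job in jobs
        if not any(all(job.get(k) == v for k, v in rule.items()) for rule in exclude)
    ]


# Axes as declared in the test job.
axes = {
    "os": ["ubuntu-latest", "windows-latest", "macos-latest"],
    "python-version": ["3.12", "3.13"],
}
```

With these axes, `expand_matrix(axes)` yields six jobs. An exclude rule only reduces that number when its values actually appear in the axes; a rule naming a Python version that is not listed removes nothing.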

Release Workflow

# .github/workflows/release.yml
name: Release Pipeline

on:
  push:
    tags:
      - 'v*.*.*'
  release:
    types: [published]

jobs:
  release:
    name: Build and Publish Release
    runs-on: ubuntu-latest

    steps:
    - name: Checkout code
      uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5  # v4
      with:
        fetch-depth: 0

    - name: Set up Python
      uses: actions/setup-python@7f4fc3e22c37d6ff65e88745f38bd3157c663f7c  # v4
      with:
        python-version: '3.11'

    - name: Install Poetry
      uses: snok/install-poetry@76e04a911780d5b312d89783f7b1cd627778900a  # v1
      with:
        version: '1.6.1'

    - name: Build packages
      run: |
        cd packages/common && poetry build
        cd ../structures && poetry build
        cd ../utils && poetry build
        cd ../xization && poetry build

    - name: Publish to PyPI
      env:
        POETRY_PYPI_TOKEN_PYPI: ${{ secrets.PYPI_API_TOKEN }}
      run: |
        cd packages/common && poetry publish
        cd ../structures && poetry publish
        cd ../utils && poetry publish
        cd ../xization && poetry publish

    - name: Create GitHub Release
      uses: softprops/action-gh-release@de2c0eb89ae2a093876385947365aca7b0e5f844  # v1
      with:
        files: packages/*/dist/*
        generate_release_notes: true
      env:
        GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
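
The `v*.*.*` tag filter is a glob, so it also accepts tags such as `vfoo.bar.baz`. A release job may want a stricter check before publishing. The sketch below is one possible guard, assuming the ref arrives in the `refs/tags/...` form; `version_from_ref` is a hypothetical helper, not part of the workflow above.

```python
import re

# Strict form of the `v*.*.*` tag filter: three dot-separated integers after "v".
RELEASE_TAG_RE = re.compile(r"v(\d+)\.(\d+)\.(\d+)")
TAG_PREFIX = "refs/tags/"


def version_from_ref(ref):
    """Return 'X.Y.Z' for a ref like 'refs/tags/vX.Y.Z', else None."""
    if not ref.startswith(TAG_PREFIX):
        return None
    match = RELEASE_TAG_RE.fullmatch(ref[len(TAG_PREFIX):])
    return ".".join(match.groups()) if match else None
```

A caller could compare the returned version with the version declared in each package before running `poetry publish`, and stop the release when they differ.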

Documentation Workflow

# .github/workflows/docs.yml
name: Documentation

on:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]
    paths: [ 'docs/**', 'packages/**/*.py', 'mkdocs.yml' ]

jobs:
  docs:
    name: Build and Deploy Documentation
    runs-on: ubuntu-latest

    steps:
    - name: Checkout code
      uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5  # v4

    - name: Set up Python
      uses: actions/setup-python@7f4fc3e22c37d6ff65e88745f38bd3157c663f7c  # v4
      with:
        python-version: '3.11'

    - name: Install dependencies
      run: |
        pip install -r docs/requirements.txt

    - name: Build documentation
      run: |
        mkdocs build --strict

    - name: Deploy to GitHub Pages
      if: github.ref == 'refs/heads/main'
      uses: peaceiris/actions-gh-pages@373f7f263a76c20808c831209c920827a82a2847  # v3
      with:
        github_token: ${{ secrets.GITHUB_TOKEN }}
        publish_dir: ./site

Code Quality Checks

Code Formatting

# Black - Code formatting
black --line-length 88 packages/ tests/

# isort - Import sorting
isort packages/ tests/

# Configuration in pyproject.toml
[tool.black]
line-length = 88
target-version = ['py38']
include = '\.pyi?$'

[tool.isort]
profile = "black"
multi_line_output = 3
line_length = 88

Linting

# flake8 - Style and error checking
flake8 packages/ tests/

# pylint - Advanced code analysis
pylint packages/ --output-format=colorized

# Configuration in setup.cfg
[flake8]
max-line-length = 88
extend-ignore = E203, W503
exclude = __pycache__,*.egg-info,.tox,.venv

Type Checking

# mypy - Static type checking
mypy packages/

# Configuration in pyproject.toml
[tool.mypy]
python_version = "3.8"
warn_return_any = true
warn_unused_configs = true
disallow_untyped_defs = true

Testing Pipeline

Test Execution Strategy

# Unit tests - Fast, isolated
pytest tests/unit/ -x --ff

# Integration tests - Slower, component interaction
pytest tests/integration/ --maxfail=5

# End-to-end tests - Slowest, full workflows (parallel run requires pytest-xdist)
pytest tests/e2e/ -n auto --dist=loadscope

# Performance tests - Benchmark critical paths
pytest tests/performance/ --benchmark-only

Test Configuration

# pyproject.toml
[tool.pytest.ini_options]
testpaths = ["tests"]
addopts = [
    "--strict-markers",
    "--strict-config",
    "--cov=packages",
    "--cov-branch",
    "--cov-report=term-missing:skip-covered",
    "--cov-report=html:reports/coverage",
    "--cov-report=xml",
    "--junit-xml=reports/junit.xml",
]
markers = [
    "slow: marks tests as slow",
    "integration: marks tests as integration tests",
    "e2e: marks tests as end-to-end tests",
    "performance: marks tests as performance benchmarks",
]
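
The `--junit-xml` option in the configuration above writes a machine-readable report that later pipeline steps can inspect. As a rough sketch, the totals can be read with the standard library alone. `summarize_junit` is a hypothetical helper, and the attribute names it reads (`tests`, `failures`, `errors`, `skipped`) are assumed to be present on each `<testsuite>` element, as they typically are in pytest's output.

```python
import xml.etree.ElementTree as ET


def summarize_junit(xml_text):
    """Sum the test, failure, error, and skip counts across all suites in a JUnit report."""
    root = ET.fromstring(xml_text)
    # Reports usually wrap suites in <testsuites>; accept a bare <testsuite> root as well.
    suites = [root] if root.tag == "testsuite" else root.findall("testsuite")
    totals = {"tests": 0, "failures": 0, "errors": 0, "skipped": 0}
    for suite in suites:
        for key in totals:
            totals[key] += int(suite.get(key, 0))
    return totals
```

A step could call this on the contents of `reports/junit.xml` and post the totals to a pull request or a chat channel.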

Build and Package

Package Building

# Build all packages
for package in common structures utils xization; do
    cd packages/$package
    poetry build
    cd ../..
done

# Verify package contents
tar -tzf packages/structures/dist/dataknobs-structures-*.tar.gz

# Test installation
pip install packages/structures/dist/*.whl
python -c "import dataknobs_structures; print('✓ Package imports successfully')"

Version Management

# Semantic versioning with poetry
poetry version patch    # 1.0.0 -> 1.0.1
poetry version minor    # 1.0.0 -> 1.1.0
poetry version major    # 1.0.0 -> 2.0.0

# Automatic versioning with commitizen
cz bump --changelog
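
The bump rules in the comments above (increment one part, reset the parts to its right) can be written out as a few lines of Python. This is an illustrative sketch of the rule only, not how Poetry implements it; `bump` is a hypothetical function and it handles plain `MAJOR.MINOR.PATCH` strings without pre-release or build suffixes.

```python
def bump(version, part):
    """Return `version` ('MAJOR.MINOR.PATCH') with `part` incremented and lower parts reset."""
    major, minor, patch = (int(piece) for piece in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part: {part!r}")
```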

Deployment Strategy

Environment Configuration

# environments/development.yml
apiVersion: v1
kind: ConfigMap
metadata:
  name: dataknobs-config-dev
data:
  ENVIRONMENT: "development"
  LOG_LEVEL: "DEBUG"
  DATABASE_URL: "postgresql://dev-db:5432/dataknobs_dev"
  ELASTICSEARCH_URL: "http://dev-elasticsearch:9200"

Deployment Scripts

#!/bin/bash
# scripts/deploy.sh

set -e

ENVIRONMENT=${1:-development}
VERSION=${2:-latest}

echo "Deploying to $ENVIRONMENT (version: $VERSION)"

# Validate environment
case $ENVIRONMENT in
  development|staging|production)
    echo "Valid environment: $ENVIRONMENT"
    ;;
  *)
    echo "Invalid environment: $ENVIRONMENT"
    exit 1
    ;;
esac

# Deploy packages
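for package in common structures utils xization; do
    echo "Deploying dataknobs-$package..."
    if [ "$VERSION" = "latest" ]; then
        # "==latest" is not a valid pip version specifier; install the newest release instead
        pip install --upgrade "dataknobs-$package"
    else
        pip install --upgrade "dataknobs-$package==$VERSION"
    fi
done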

# Health check
python -c "
import dataknobs_common, dataknobs_structures, dataknobs_utils, dataknobs_xization
print('✓ All packages deployed successfully')
"

echo "Deployment complete!"

Container Deployment

# docker/Dockerfile.prod
FROM python:3.11-slim

# Set working directory
WORKDIR /app

# Install system dependencies
RUN apt-get update && apt-get install -y \
    gcc \
    && rm -rf /var/lib/apt/lists/*

# Copy requirements
COPY requirements.txt .

# Install Python dependencies
RUN pip install --no-cache-dir -r requirements.txt

# Copy application
COPY packages/ ./packages/

# Install dataknobs packages
RUN pip install -e packages/common \
    && pip install -e packages/structures \
    && pip install -e packages/utils \
    && pip install -e packages/xization

# Health check
HEALTHCHECK --interval=30s --timeout=30s --start-period=5s --retries=3 \
    CMD python -c "import dataknobs_common, dataknobs_structures, dataknobs_utils, dataknobs_xization" || exit 1

# Run application
CMD ["python", "-m", "your_application"]

Security Scanning

Dependency Scanning

# Safety - Check for known vulnerabilities
safety check
safety check --json --output safety-report.json

# pip-audit - Alternative vulnerability scanner
pip-audit --format=json --output=audit-report.json

Code Security Scanning

# Bandit - Security linting
bandit -r packages/ -f json -o bandit-report.json

# Semgrep - Static analysis security scanner
semgrep --config=p/python packages/
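
A JSON report is most useful when something acts on it. The sketch below shows one way a pipeline step might gate on the Bandit report written above. It assumes the report has a top-level `results` list whose entries carry an `issue_severity` field with the values `LOW`, `MEDIUM`, or `HIGH`; that layout should be confirmed against the Bandit version in use. `findings_at_or_above` is a hypothetical helper.

```python
import json

# Assumed severity labels in the report, ordered from least to most severe.
SEVERITY_ORDER = {"LOW": 1, "MEDIUM": 2, "HIGH": 3}


def findings_at_or_above(report_text, threshold="HIGH"):
    """Return report entries whose severity is at least `threshold`."""
    report = json.loads(report_text)
    minimum = SEVERITY_ORDER[threshold]
    return [
        finding
        for finding in report.get("results", [])
        if SEVERITY_ORDER.get(str(finding.get("issue_severity", "")).upper(), 0) >= minimum
    ]
```

A CI step could read `bandit-report.json`, pass its contents to this function, and exit with a non-zero status when the returned list is not empty.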

Container Security

# .github/workflows/security.yml
- name: Run Trivy vulnerability scanner
  uses: aquasecurity/trivy-action@c1824fd6edce30d7ab345a9989de00bbd46ef284  # v0.34.0
  with:
    image-ref: 'dataknobs:latest'
    format: 'sarif'
    output: 'trivy-results.sarif'

- name: Upload Trivy scan results
  uses: github/codeql-action/upload-sarif@9e8d0789d4a0fa9ceb6b1738f7e269594bdd67f0  # v3.28.9
  with:
    sarif_file: 'trivy-results.sarif'

Monitoring and Alerts

Pipeline Monitoring

# Slack notifications for failures
- name: Notify Slack on failure
  if: failure()
  uses: 8398a7/action-slack@77eaa4f1c608a7d68b38af4e3f739dcd8cba273e  # v3
  with:
    status: failure
    channel: '#ci-cd'
    text: 'CI Pipeline failed for ${{ github.repository }}'
  env:
    SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK }}

Performance Monitoring

# tests/performance/test_benchmarks.py
import pytest

@pytest.mark.benchmark
def test_tree_creation_performance(benchmark):
    """Benchmark tree creation performance."""
    from dataknobs_structures import Tree

    def create_large_tree():
        root = Tree("root")
        for i in range(1000):
            child = root.add_child(f"child_{i}")
            for j in range(10):
                child.add_child(f"grandchild_{i}_{j}")
        return root

    result = benchmark(create_large_tree)
    assert result.num_children == 1000
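
A benchmark on its own only records a timing; detecting a regression requires comparing it with an earlier run. The sketch below illustrates that comparison under simple assumptions: both runs are given as dictionaries mapping a benchmark name to its mean time in seconds, and a slowdown beyond a fractional `tolerance` counts as a regression. `find_regressions` and the 10% default are hypothetical choices, not project settings.

```python
def find_regressions(baseline, current, tolerance=0.10):
    """Return {name: fractional_slowdown} for benchmarks slower than baseline by more than `tolerance`."""
    regressions = {}
    for name, base_mean in baseline.items():
        new_mean = current.get(name)
        # Skip benchmarks missing from the current run or with an unusable baseline.
        if new_mean is None or base_mean <= 0:
            continue
        change = (new_mean - base_mean) / base_mean
        if change > tolerance:
            regressions[name] = change
    return regressions
```

For example, a benchmark that moves from a 2.0 s mean to 2.5 s is reported with a slowdown of 0.25, while one that moves from 1.0 s to 1.05 s stays within the default tolerance.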

Health Checks

#!/bin/bash
# scripts/health-check.sh

set -e

echo "Running health checks..."

# Check package imports
python -c "
import dataknobs_common, dataknobs_structures, dataknobs_utils, dataknobs_xization
print('✓ All packages import successfully')
"

# Check basic functionality
python -c "
from dataknobs_structures import Tree
tree = Tree('test')
child = tree.add_child('child')
assert tree.num_children == 1
print('✓ Basic functionality works')
"

echo "Health checks passed!"
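
The import check in the script stops at the first failure, which hides any further broken packages. A Python variant can report all of them at once. The sketch below is one such variant; `failed_imports` and `REQUIRED_MODULES` are hypothetical names, and the module list is copied from the script above.

```python
import importlib

# Module names taken from the health-check script.
REQUIRED_MODULES = (
    "dataknobs_common",
    "dataknobs_structures",
    "dataknobs_utils",
    "dataknobs_xization",
)


def failed_imports(module_names):
    """Try to import each module and return {name: error description} for those that fail."""
    failures = {}
    for name in module_names:
        try:
            importlib.import_module(name)
        except Exception as exc:  # report any import-time error, not only ImportError
            failures[name] = f"{type(exc).__name__}: {exc}"
    return failures
```

Calling `failed_imports(REQUIRED_MODULES)` and exiting with a non-zero status when the result is not empty would mirror the behavior of the shell script.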

Troubleshooting

Common Pipeline Issues

Test Failures:

# Debug test failures locally
pytest tests/unit/test_failing.py -vv --tb=long

# Run tests with the same Poetry version as CI
docker run --rm -v "$(pwd)":/app python:3.11-slim bash -c "
cd /app &&
pip install poetry==1.6.1 &&
poetry install --with dev,test &&
poetry run pytest
"

Build Failures:

# Check build locally
cd packages/structures
poetry build
pip install dist/*.whl
python -c "import dataknobs_structures"

Dependency Conflicts:

# Check dependency resolution
poetry lock --check
poetry show --tree

# Update dependencies
poetry update

Pipeline Debugging

# Add debug step to workflow
- name: Debug Environment
  run: |
    echo "Python version: $(python --version)"
    echo "Pip version: $(pip --version)"
    echo "Poetry version: $(poetry --version)"
    echo "Installed packages:"
    pip list
    echo "Environment variables:"
    env | sort

Performance Issues

# Profile test execution (requires the pytest-profiling plugin)
pytest --profile

# Run tests in parallel (requires the pytest-xdist plugin)
pytest -n auto

# Use test result caching
pytest --cache-clear  # Clear cache
pytest --lf           # Run last failed
pytest --ff           # Run failures first

Best Practices

Pipeline Optimization

Parallel Execution: Run independent jobs in parallel
Caching: Cache dependencies and build artifacts
Fail Fast: Stop on first critical failure
Incremental Testing: Only test changed code when possible
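
Dependency caching works by deriving a cache key from everything that determines the installed environment, so the cache is reused until one of those inputs changes. The sketch below illustrates the idea; `cache_key` is a hypothetical helper, and the choice of inputs (operating system, Python version, and the bytes of `poetry.lock`) is an assumption about what should invalidate the cache.

```python
import hashlib


def cache_key(os_name, python_version, lock_contents):
    """Build a cache key that changes whenever the OS, Python version, or lock file changes."""
    digest = hashlib.sha256(lock_contents).hexdigest()[:16]
    return f"poetry-{os_name}-py{python_version}-{digest}"
```

Two runs with an identical lock file on the same platform produce the same key and can share a cache; any change to the lock file produces a new key and forces a fresh install.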

Security Best Practices

Secret Management: Use GitHub secrets for sensitive data
Least Privilege: Grant minimal required permissions
Dependency Scanning: Regularly scan for vulnerabilities
Code Signing: Sign releases for authenticity

Monitoring and Alerting

Pipeline Metrics: Track build times and failure rates
Quality Gates: Enforce code quality thresholds
Early Warning: Alert on degrading metrics
Post-deployment Monitoring: Monitor application health
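
Tracking build times and failure rates needs only a record of each run's outcome and duration. As a minimal sketch, assuming runs are available as `(succeeded, duration_seconds)` pairs from whatever source the team uses, the two metrics can be computed as follows. `pipeline_metrics` is a hypothetical helper.

```python
def pipeline_metrics(runs):
    """Summarize runs given as (succeeded, duration_seconds) pairs."""
    if not runs:
        return {"runs": 0, "failure_rate": 0.0, "mean_duration": 0.0}
    failures = sum(1 for succeeded, _ in runs if not succeeded)
    total_duration = sum(duration for _, duration in runs)
    return {
        "runs": len(runs),
        "failure_rate": failures / len(runs),
        "mean_duration": total_duration / len(runs),
    }
```

Comparing these values across a rolling window of recent runs is one way to produce the early warning described above.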

Resources

This CI/CD pipeline ensures high code quality, comprehensive testing, and reliable deployments for the Dataknobs project. For questions or improvements, please create an issue or start a discussion on GitHub.