Quality Checks Process for DataKnobs¶
Overview¶
DataKnobs uses a developer-driven quality assurance process where developers run comprehensive tests locally before creating pull requests. This approach ensures code quality while keeping CI/CD pipelines fast and cost-effective.
Key Principle: All tests, including integration tests with real services (PostgreSQL, Elasticsearch, LocalStack), must pass locally before a PR can be merged.
Quick Start¶
Before creating a pull request, run the full quality check suite:
./bin/run-quality-checks.sh
This single command will:
1. ✅ Start all required Docker services
2. ✅ Run unit tests
3. ✅ Run integration tests with real services
4. ✅ Check code style and linting
5. ✅ Generate coverage reports
6. ✅ Create artifacts in .quality-artifacts/
Important: The artifacts must be committed with your PR!
Detailed Process¶
1. Development Workflow¶
During development, you can run tests incrementally:
# Run only unit tests (fast, no services needed)
uv run pytest packages/*/tests/ -v -m "not integration"
# Run specific package tests
uv run pytest packages/data/tests/ -v
# Run with coverage
uv run pytest packages/*/tests/ -v --cov=packages --cov-report=term
2. Pre-PR Quality Checks (Required)¶
Before creating a PR to main or develop:
# Ensure Docker is running
docker info
# Run the complete quality check suite
./bin/run-quality-checks.sh
The script will:
- Start PostgreSQL, Elasticsearch, and LocalStack containers
- Wait for services to be healthy
- Run all tests (unit and integration)
- Perform linting and style checks
- Generate artifacts in .quality-artifacts/
Expected output:
═══════════════════════════════════════════════════════════════
Quality Check Summary
═══════════════════════════════════════════════════════════════
Linting: ✓ PASSED
Style Check: ✓ PASSED
Unit Tests: ✓ PASSED
Integration Tests: ✓ PASSED
✓ All critical checks passed!
Artifacts saved to: .quality-artifacts/
You can now create your pull request.
3. Commit the Artifacts¶
After running quality checks successfully:
# Add the quality artifacts to your commit
git add .quality-artifacts/
# Commit with your code changes
git commit -m "feat: implement new feature with passing quality checks"
# Push and create PR
git push origin your-branch
4. CI Validation¶
When you create a PR, GitHub Actions will:
1. Validate artifacts exist - Checks for required files
2. Verify freshness - Ensures artifacts are < 24 hours old
3. Confirm tests passed - Validates all tests show "PASS" status
4. Check coverage - Ensures minimum coverage threshold (70%)
The CI validation is fast (< 30 seconds) because it only validates the artifacts rather than re-running the tests.
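The freshness check above can be sketched in a few lines. This is an illustration of the idea, not the actual CI code; the function name and the use of file modification time are assumptions:

```python
import os
import time


def artifacts_are_fresh(summary_path: str, max_age_hours: float = 24.0) -> bool:
    """Return True if the artifact file was generated within the allowed window.

    Uses the file's modification time as a proxy for when the quality
    checks were last run.
    """
    age_seconds = time.time() - os.path.getmtime(summary_path)
    return age_seconds < max_age_hours * 3600
```

CI would apply this to `.quality-artifacts/quality-summary.json` and reject the PR when it returns False.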
Required Services¶
The quality checks require these Docker services:
| Service | Purpose | Port | Test Type |
|---|---|---|---|
| PostgreSQL | Database storage | 5432 | Integration |
| Elasticsearch | Search functionality | 9200 | Integration |
| LocalStack | S3-compatible storage | 4566 | Integration |
Services are automatically started by run-quality-checks.sh.
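The "wait for services to be healthy" step amounts to polling each service's port until it accepts connections. A simplified sketch of that idea in Python (illustrative only; the script's actual health checks may differ):

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0) -> bool:
    """Poll a TCP port until it accepts connections or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connection means the service is at least listening
            with socket.create_connection((host, port), timeout=2):
                return True
        except OSError:
            time.sleep(1)
    return False


# Example: block until Elasticsearch is reachable before integration tests
# wait_for_port("localhost", 9200)
```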
Manual Service Management¶
If you need to manage services manually:
# Start all services
docker-compose up -d postgres elasticsearch localstack
# Check service health
docker-compose ps
# View service logs
docker-compose logs elasticsearch
# Stop services
docker-compose down
Artifact Structure¶
Quality checks generate these artifacts in .quality-artifacts/:
.quality-artifacts/
├── quality-summary.json # Overall pass/fail status
├── environment.json # Python version, OS, git info
├── unit-test-results.xml # JUnit format test results
├── integration-test-results.xml
├── coverage.xml # Coverage report
├── lint-report.json # Pylint results
├── style-check.json # Ruff style check results
└── signature.sha256 # Integrity checksum
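The `signature.sha256` integrity checksum could be computed along these lines. This is a sketch of the general technique (hash every artifact file in a deterministic order, excluding the signature itself); the real script's hashing scheme may differ:

```python
import hashlib
from pathlib import Path


def compute_signature(artifact_dir: str) -> str:
    """Hash all artifact files, sorted for determinism, skipping the signature file."""
    digest = hashlib.sha256()
    for path in sorted(Path(artifact_dir).rglob("*")):
        if path.is_file() and path.name != "signature.sha256":
            # Include the file name so renames also change the signature
            digest.update(path.name.encode())
            digest.update(path.read_bytes())
    return digest.hexdigest()
```

CI can recompute this over the committed artifacts and compare it with the stored `signature.sha256` to detect any manual edits.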
Test Organization¶
Tests should be organized with pytest markers:
import pytest
from elasticsearch import Elasticsearch

# Unit test (no external services)
def test_data_model():
    assert DataModel().validate() is True

# Integration test (requires services)
@pytest.mark.integration
def test_elasticsearch_query():
    es = Elasticsearch("http://localhost:9200")
    result = es.search(index="test")
    assert result["hits"]["total"]["value"] >= 0
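For the `integration` marker to be recognized (and to avoid `PytestUnknownMarkWarning`), it must be registered with pytest. One way is a root-level `conftest.py`; this is a sketch, since the project may instead declare markers in `pyproject.toml`:

```python
# conftest.py (hypothetical) - register the custom marker so that
# `-m "not integration"` filters cleanly and pytest emits no warnings
def pytest_configure(config):
    config.addinivalue_line(
        "markers",
        "integration: test requires Docker services (PostgreSQL, Elasticsearch, LocalStack)",
    )
```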
Troubleshooting¶
Services Won't Start¶
# Check if ports are already in use
lsof -i :5432 # PostgreSQL
lsof -i :9200 # Elasticsearch
lsof -i :4566 # LocalStack
# Reset Docker services
docker-compose down -v
docker-compose up -d
Tests Pass Locally but CI Rejects Artifacts¶
Common causes:
- Artifacts too old - Re-run ./bin/run-quality-checks.sh
- Forgot to commit artifacts - Run git add .quality-artifacts/
- Modified artifacts - Don't edit files in .quality-artifacts/
Integration Tests Fail¶
# Check service connectivity
curl http://localhost:9200/_cluster/health # Elasticsearch
psql postgresql://postgres:postgres@localhost:5432/dataknobs # PostgreSQL
curl http://localhost:4566/_localstack/health # LocalStack
# View service logs
docker-compose logs postgres
docker-compose logs elasticsearch
docker-compose logs localstack
Out of Disk Space¶
# Clean up old Docker data
docker system prune -a --volumes
# Remove old test data
rm -rf ~/dataknobs_postgres_data
rm -rf ~/dataknobs_elasticsearch_data
rm -rf ~/dataknobs_localstack_data
Configuration¶
Linting and Code Style¶
DataKnobs uses Ruff for linting and code formatting. The project has a carefully configured set of linting rules that balance code quality with practicality. See the Linting Configuration documentation for details on:
- Which error types are ignored and why
- How to run linting checks
- Understanding the remaining important errors
To run linting checks:
# Check specific package
uv run bin/validate.sh data
# Auto-fix formatting issues
uv run ruff format packages/*/src
Type Checking and Python Compatibility¶
DataKnobs maintains Python 3.9+ compatibility with modern type hints. See the Python Compatibility Guide for important requirements.
Key requirements:
- All Python files with type hints must include from __future__ import annotations
- Use modern type hint syntax (str | None instead of Optional[str])
- Run type checking with uv run mypy to use project dependencies
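With the future import, the modern `str | None` syntax is accepted even on Python 3.9, because annotations are stored as strings instead of being evaluated at definition time. A minimal example:

```python
from __future__ import annotations


def find_user(name: str, default: str | None = None) -> str | None:
    # Without the future import, `str | None` would raise a TypeError
    # at function definition time on Python 3.9.
    return name or default
```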
To run type checking:
# Check entire data package
uv run mypy packages/data/src/dataknobs_data
# Check specific file
uv run mypy packages/data/src/dataknobs_data/validation/constraints.py
Current status (August 31, 2025):
- ✅ All tests pass on Python 3.9.6
- ✅ 49 source files updated with future annotations
- 📊 MyPy errors reduced from 774 to 767
Environment Variables¶
The quality check scripts respect these environment variables:
# Maximum age of artifacts for CI validation (hours)
export MAX_AGE_HOURS=24
# Minimum required code coverage (percentage)
export REQUIRED_COVERAGE=70
# Custom pytest markers
export PYTEST_MARKERS="not slow"
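Inside the scripts, these knobs might be read with simple defaults. A sketch of the pattern (the helper name is hypothetical, not the scripts' actual code):

```python
import os


def get_int_env(name: str, default: int) -> int:
    """Read an integer environment variable, falling back to a default."""
    value = os.environ.get(name)
    return int(value) if value else default


max_age_hours = get_int_env("MAX_AGE_HOURS", 24)
required_coverage = get_int_env("REQUIRED_COVERAGE", 70)
```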
Customizing Checks¶
To add custom checks, modify bin/run-quality-checks.sh:
# Add your custom check
print_status "Running custom security scan..."
if run_security_scan; then
print_success "Security scan passed"
else
print_error "Security scan failed"
OVERALL_STATUS="FAIL"
fi
Benefits of This Approach¶
- Fast CI/CD - GitHub Actions runs in < 30 seconds vs 5-10 minutes
- Real Integration Testing - Tests run against actual services, not mocks
- Cost Effective - No cloud service costs for every PR
- Developer Ownership - Developers verify their code works before PR
- Audit Trail - Artifacts provide evidence of test execution
FAQ¶
Q: Why not run tests in CI?
A: Running PostgreSQL, Elasticsearch, and LocalStack for every PR is expensive and slow. Local testing with artifact validation is faster and more cost-effective.
Q: What if I don't have Docker?
A: Docker is required for integration tests. You can still run unit tests without Docker: uv run pytest -m "not integration"
Q: Can I skip integration tests?
A: For PRs to feature branches, you might skip integration tests. For PRs to main, they're required.
Q: How do I add a new service?
A: Add it to docker-compose.override.yml, update bin/run-quality-checks.sh to wait for it, and document it here.
Q: What if artifacts are accidentally modified?
A: The signature check will detect this. Re-run ./bin/run-quality-checks.sh to regenerate valid artifacts.
Summary¶
- Before PR: Run ./bin/run-quality-checks.sh
- Commit: Include .quality-artifacts/ in your commit
- CI Validates: Artifacts are checked automatically
- Merge: Only if all checks pass
This process ensures high code quality while keeping CI/CD fast and economical!