Contributing to Dataknobs¶

We welcome contributions to the Dataknobs project! This guide will help you get started with contributing code, documentation, bug reports, and feature requests.

Table of Contents¶

Code of Conduct
Getting Started
Development Setup
How to Contribute
Coding Standards
Testing Guidelines
Documentation
Submitting Changes
Review Process
Community

Code of Conduct¶

By participating in this project, you agree to abide by our Code of Conduct:

Be respectful: Treat all participants with respect and courtesy
Be inclusive: Welcome newcomers and encourage diverse perspectives
Be constructive: Focus on what is best for the community
Be patient: Remember that people have different skill levels and backgrounds
Be collaborative: Work together to resolve conflicts and reach consensus

Getting Started¶

Prerequisites¶

Before contributing, make sure you have:

Python 3.8+ installed
Git for version control
GitHub account for submitting contributions
Basic understanding of Python and software development practices
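
The first two prerequisites can be checked from Python itself. The following is a minimal sketch, not part of the Dataknobs tooling: the name `check_prerequisites` is made up for illustration, and the minimum version is simply the one stated in the list above.

```python
import shutil
import sys

MIN_PYTHON = (3, 8)  # minimum version taken from the prerequisites list above


def check_prerequisites(min_python=MIN_PYTHON):
    """Return a list of human-readable problems; an empty list means all good."""
    problems = []
    if sys.version_info[:2] < min_python:
        found = ".".join(str(part) for part in sys.version_info[:3])
        wanted = ".".join(str(part) for part in min_python)
        problems.append(f"Python {wanted}+ required, found {found}")
    # shutil.which returns None when the executable is not on PATH
    if shutil.which("git") is None:
        problems.append("git not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print(f"Missing prerequisite: {problem}")
```

The script prints nothing when both checks pass.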

Find an Issue to Work On¶

Browse our GitHub Issues
Look for issues labeled good first issue if you're new to the project
Check issues labeled help wanted for areas where we need assistance
Comment on the issue to let others know you're working on it

Types of Contributions¶

We welcome various types of contributions:

Bug fixes: Help us identify and fix issues
New features: Add functionality that benefits users
Documentation: Improve guides, tutorials, and API docs
Tests: Increase code coverage and test quality
Performance improvements: Optimize existing code
Examples: Add usage examples and tutorials

Development Setup¶

1. Fork and Clone¶

# Fork the repository on GitHub, then clone your fork
git clone https://github.com/yourusername/dataknobs.git
cd dataknobs

# Add upstream remote
git remote add upstream https://github.com/original/dataknobs.git

2. Create Development Environment¶

Using UV (Recommended)¶

# Install UV if you haven't already
curl -LsSf https://astral.sh/uv/install.sh | sh

# Install all packages
uv sync --all-packages

# Install the dk command for easy development
./setup-dk.sh

Using pip (Alternative)¶

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install development dependencies
pip install -r requirements-dev.txt

# Install packages in development mode
pip install -e packages/common
pip install -e packages/structures
pip install -e packages/utils
pip install -e packages/xization

3. Verify Setup¶

# Using the dk command (recommended)
dk test           # Run tests
dk check          # Quick quality check
dk diagnose       # If something fails

# Or using traditional commands
pytest            # Run tests
ruff check packages/  # Check code style
mypy packages/    # Run type checking

4. Development Workflow with dk¶

The dk command simplifies your development workflow:

# Quick development cycle
dk check data     # Quick check while developing
dk fix            # Auto-fix style issues
dk test data      # Test your changes

# Before submitting PR
dk pr             # Full quality checks
dk diagnose       # If checks fail

See the dk Command Guide for full details.

How to Contribute¶

Reporting Bugs¶

When reporting bugs, please include:

Clear title: Briefly describe the issue
Description: Detailed explanation of the problem
Reproduction steps: How to reproduce the issue
Expected behavior: What should happen
Actual behavior: What actually happens
Environment: Python version, OS, package versions
Code samples: Minimal example demonstrating the issue
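
The environment details can be gathered with the standard library rather than typed by hand. This is a sketch under assumptions: the distribution names below are guessed from the `dataknobs-<name>` naming pattern and may need adjusting, and `collect_environment` is a hypothetical helper, not a Dataknobs command.

```python
import platform
from importlib import metadata

# Assumed distribution names (following the "dataknobs-<name>" pattern); adjust as needed.
PACKAGES = ["dataknobs-structures", "dataknobs-utils", "dataknobs-xization"]


def collect_environment(packages=PACKAGES):
    """Return '- key: value' lines describing the current environment."""
    lines = [
        f"- Python version: {platform.python_version()}",
        f"- OS: {platform.platform()}",
    ]
    for name in packages:
        try:
            version = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Report missing packages instead of failing
            version = "not installed"
        lines.append(f"- {name}: {version}")
    return lines


if __name__ == "__main__":
    print("\n".join(collect_environment()))
```

The output can be pasted directly into the Environment section of a bug report.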

Bug Report Template:

## Bug Description
Brief description of the issue.

## Steps to Reproduce
1. Step one
2. Step two
3. Step three

## Expected Behavior
Describe what you expected to happen.

## Actual Behavior
Describe what actually happened.

## Environment
- Python version: 3.9.7
- Dataknobs version: 1.0.0
- OS: Ubuntu 20.04

## Code Sample
```python
# Minimal code example
from dataknobs_structures import Tree
tree = Tree("test")
# Issue occurs here
```

Requesting Features¶

When requesting features:

Use case: Explain why this feature is needed
Detailed description: What the feature should do
Proposed API: How users would interact with it
Alternatives considered: Other approaches you've thought of
Implementation notes: Any technical considerations

Feature Request Template:

## Feature Description
Brief description of the proposed feature.

## Use Case
Why is this feature needed? What problem does it solve?

## Proposed Implementation
How should this feature work? Include API examples.

```python
# Example of proposed API
from dataknobs_utils import new_feature
result = new_feature.process_data(data)
```

## Alternatives Considered
What other approaches did you consider?

## Additional Context
Any other relevant information.

Making Code Changes¶

1. Create a Feature Branch¶

# Update your main branch
git checkout main
git pull upstream main

# Create feature branch
git checkout -b feature/your-feature-name
# or for bug fixes:
git checkout -b bugfix/issue-description

2. Make Your Changes¶

Write clean, readable code
Follow existing code patterns
Add appropriate comments
Update docstrings for public APIs

3. Add Tests¶

# Example test structure (your_new_method and expected_type are placeholders)
import pytest
from dataknobs_structures import Tree

class TestYourFeature:
    def test_basic_functionality(self):
        """Test the basic functionality of your feature."""
        # Arrange
        tree = Tree("test")

        # Act
        result = tree.your_new_method()

        # Assert
        assert result is not None
        assert isinstance(result, expected_type)

    def test_edge_cases(self):
        """Test edge cases and error conditions."""
        tree = Tree(None)
        with pytest.raises(ValueError):
            tree.your_new_method()

4. Update Documentation¶

Update docstrings for new/modified functions
Add usage examples
Update README if needed
Add entries to CHANGELOG if appropriate

Adding a New Package¶

When adding a new package to the monorepo, follow this process to ensure it's properly integrated:

1. Register the Package¶

Add an entry for your package to the central registry, .dataknobs/packages.json. The category field may be "core", "experimental", or "legacy":

{
  "name": "newpackage",
  "pypi_name": "dataknobs-newpackage",
  "description": "Brief description of the package",
  "version": "0.1.0",
  "category": "core",
  "requires_docs_build": true,
  "deprecated": false
}
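
Before running the project's validator, a registry entry can be sanity-checked locally. The sketch below is an illustration only: the required keys and allowed categories are copied from the example entry in this section, the real registry schema may differ, and `validate_entry` is a hypothetical helper rather than part of the Dataknobs tooling.

```python
import json

# Keys and categories are copied from the example entry in this guide;
# the real registry schema may differ.
REQUIRED_FIELDS = {
    "name": str,
    "pypi_name": str,
    "description": str,
    "version": str,
    "category": str,
    "requires_docs_build": bool,
    "deprecated": bool,
}
ALLOWED_CATEGORIES = {"core", "experimental", "legacy"}


def validate_entry(entry):
    """Return a list of problems found in one registry entry (a dict)."""
    problems = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in entry:
            problems.append(f"missing field: {field}")
        elif not isinstance(entry[field], expected):
            problems.append(f"{field} should be {expected.__name__}")
    category = entry.get("category")
    if isinstance(category, str) and category not in ALLOWED_CATEGORIES:
        problems.append(f"unknown category: {category}")
    return problems


if __name__ == "__main__":
    # json.loads rejects comments, so the entry must be plain JSON
    entry = json.loads(
        '{"name": "newpackage", "pypi_name": "dataknobs-newpackage",'
        ' "description": "Brief description of the package",'
        ' "version": "0.1.0", "category": "core",'
        ' "requires_docs_build": true, "deprecated": false}'
    )
    print(validate_entry(entry))
```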

2. Validate Package References¶

Run the validation script to check what needs updating:

# Using dk command
dk validate-pkgs

# Or directly
uv run python bin/validate-package-references.py

The validator will tell you exactly which files need updates.

3. Update Files as Needed¶

The validation typically catches:

GitHub Workflows: Add package to docs build steps
Release Workflow: Add to package selection dropdown
README.md: List the package in installation examples
Documentation: Add to package tables
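
Conceptually, a reference check like this confirms that every registered package name appears in each file that is supposed to mention it. The sketch below shows only that idea with plain substring matching. It is not the logic of bin/validate-package-references.py, and `find_missing_references` and its inputs are hypothetical.

```python
def find_missing_references(pypi_names, documents):
    """Map each document name to the registered packages it never mentions.

    pypi_names: iterable of package names, e.g. "dataknobs-newpackage".
    documents: dict mapping a document name to its text content.
    """
    pypi_names = list(pypi_names)
    missing = {}
    for doc_name, text in documents.items():
        # Plain substring matching: crude, but enough to show the idea
        absent = [name for name in pypi_names if name not in text]
        if absent:
            missing[doc_name] = absent
    return missing


if __name__ == "__main__":
    docs = {
        "README.md": "pip install dataknobs-utils dataknobs-newpackage",
        "docs/index.md": "| dataknobs-utils | Utilities |",
    }
    print(find_missing_references(["dataknobs-utils", "dataknobs-newpackage"], docs))
```

In this made-up example only docs/index.md is reported, because it never mentions dataknobs-newpackage.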

4. Verify Before PR¶

# Run full PR checks (includes validation)
dk pr

# Or just run validation
dk validate-pkgs

Why This Matters¶

The package registry system ensures:

✅ No missing references when adding packages
✅ Consistent package information across files
✅ Automated validation in CI
✅ Clear documentation of package metadata

For more details, see .dataknobs/README.md for the full package registry documentation.

Coding Standards¶

Python Style Guide¶

We follow PEP 8 with some modifications:

Line length: 88 characters (Black default)
Imports: Use isort for import organization
Docstrings: Google style docstrings
Type hints: Required for all public APIs

Code Formatting¶

# Format code with Black
black packages/

# Sort imports with isort
isort packages/

# Check formatting
black --check packages/
isort --check-only packages/

Docstring Style¶

def example_function(param1: str, param2: int = 0) -> bool:
    """Brief description of the function.

    Longer description if needed. Explain the purpose,
    behavior, and any important details.

    Args:
        param1: Description of param1.
        param2: Description of param2. Defaults to 0.

    Returns:
        Description of return value.

    Raises:
        ValueError: If param1 is empty.
        TypeError: If param2 is not an integer.

    Example:
        Basic usage example:

        >>> result = example_function("test", 5)
        >>> print(result)
        True
    """
    if not param1:
        raise ValueError("param1 cannot be empty")

    # Implementation here
    return True
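
Because the Example section uses interpreter-style `>>>` prompts, it can be executed with the standard-library `doctest` module, which helps keep examples from going stale. A minimal sketch, assuming nothing about how the project itself runs doctests: `is_non_empty` and `run_examples` are hypothetical names used only for illustration.

```python
import doctest


def is_non_empty(text: str) -> bool:
    """Report whether a string contains any characters.

    Args:
        text: The string to inspect.

    Returns:
        True if text has at least one character, otherwise False.

    Example:
        >>> is_non_empty("test")
        True
        >>> is_non_empty("")
        False
    """
    return len(text) > 0


def run_examples(func) -> doctest.TestResults:
    """Run the doctest examples found in func's docstring."""
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    # Make the function visible to its own examples under its own name
    for test in finder.find(func, name=func.__name__, globs={func.__name__: func}):
        runner.run(test)
    return runner.summarize(verbose=False)


if __name__ == "__main__":
    print(run_examples(is_non_empty))
```

The returned result exposes `failed` and `attempted` counts, so a non-zero `failed` value signals an out-of-date example.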

Type Hints¶

Important: All files with type hints must include from __future__ import annotations for Python 3.9 compatibility. See the Python Compatibility Guide for details.

from __future__ import annotations

from pathlib import Path
from typing import Any, Union

# Good examples (modern style with future annotations)
def process_files(file_paths: list[Path]) -> dict[str, Any]:
    """Process multiple files and return results."""
    pass

def get_value(data: dict[str, Any], key: str, default: str | None = None) -> str | None:
    """Get value from dictionary with optional default."""
    pass

# For complex types, create type aliases. Aliases are evaluated at runtime,
# so the future import does not cover them: use Union instead of | on Python 3.9
DocumentData = dict[str, Union[str, int, list[str]]]
ProcessingResult = dict[str, Union[bool, str, list[DocumentData]]]

Testing Guidelines¶

Test Structure¶

Organize tests to match the package structure:

tests/
├── unit/                    # Unit tests
│   ├── structures/
│   ├── utils/
│   └── xization/
├── integration/           # Integration tests
└── fixtures/              # Test fixtures and data

Writing Good Tests¶

import pytest
from unittest.mock import Mock, patch
from dataknobs_utils import file_utils

class TestFileUtils:
    """Test file utility functions."""

    def test_filepath_generator_basic(self):
        """Test basic filepath generation."""
        # Use descriptive test names
        # Test the happy path first
        pass

    def test_filepath_generator_empty_directory(self):
        """Test filepath generation with empty directory."""
        # Test edge cases
        pass

    def test_filepath_generator_nonexistent_path(self):
        """Test filepath generation with nonexistent path."""
        # Test error conditions
        with pytest.raises(FileNotFoundError):
            list(file_utils.filepath_generator("/nonexistent/path"))

    @patch('os.walk')
    def test_filepath_generator_with_mock(self, mock_walk):
        """Test filepath generation with mocked filesystem."""
        # Mock external dependencies when needed
        mock_walk.return_value = [("/test", [], ["file1.txt", "file2.txt"])]

        result = list(file_utils.filepath_generator("/test"))

        assert len(result) == 2
        assert "/test/file1.txt" in result
        assert "/test/file2.txt" in result
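
Mocking `os.walk` ties a test to one implementation detail. Where the filesystem is cheap to set up, an alternative is to run the code against a real temporary directory. The sketch below assumes nothing about `file_utils.filepath_generator` itself: `list_files` is a hypothetical stand-in for the function under test.

```python
import os
import tempfile
from pathlib import Path


def list_files(root):
    """Hypothetical stand-in for the function under test: yield file paths under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in sorted(filenames):
            yield os.path.join(dirpath, filename)


def test_list_files_in_real_directory():
    """Exercise the function against a real (temporary) directory tree."""
    with tempfile.TemporaryDirectory() as temp_dir:
        # Arrange: create two files, one of them in a subdirectory
        root = Path(temp_dir)
        (root / "file1.txt").write_text("a")
        (root / "sub").mkdir()
        (root / "sub" / "file2.txt").write_text("b")

        # Act
        result = set(list_files(temp_dir))

        # Assert: build expected paths with os.path.join so the test
        # passes regardless of the platform's path separator
        expected = {
            os.path.join(temp_dir, "file1.txt"),
            os.path.join(temp_dir, "sub", "file2.txt"),
        }
        assert result == expected
```

The test needs no pytest-specific imports, yet pytest still collects it because the function name starts with `test_`.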

Test Coverage¶

# Run tests with coverage
pytest --cov=packages/ --cov-report=html

# View coverage report (macOS; use xdg-open on Linux)
open htmlcov/index.html

# Fail the run if coverage drops below 90%
pytest --cov=packages/ --cov-fail-under=90

Integration Tests¶

# tests/integration/test_pipeline.py
import tempfile
from pathlib import Path
from dataknobs_utils import file_utils
from dataknobs_xization import normalize
from dataknobs_structures import Tree

def test_complete_text_processing_pipeline():
    """Test complete text processing pipeline integration."""
    with tempfile.TemporaryDirectory() as temp_dir:
        # Create test data
        test_file = Path(temp_dir) / "test.txt"
        test_file.write_text("getUserName() & validateInput")

        # Test file reading
        content = next(file_utils.fileline_generator(str(test_file)))
        assert content == "getUserName() & validateInput"

        # Test normalization
        normalized = normalize.expand_camelcase_fn(content)
        assert "get User Name" in normalized

        # Test tree structure
        tree = Tree(normalized)
        assert tree.data == normalized

Documentation¶

API Documentation¶

We use MkDocs with mkdocstrings for API documentation:

def new_function(param: str) -> str:
    """Brief description of the function.

    Longer description with examples and usage notes.

    Args:
        param: Description of the parameter.

    Returns:
        Description of the return value.

    Example:
        >>> result = new_function("test")
        >>> print(result)
        processed: test
    """
    return f"processed: {param}"

User Documentation¶

When adding new features, update:

User Guide: Add usage examples
API Reference: Ensure docstrings are complete
Examples: Add practical examples
README: Update if the change affects installation or basic usage

Documentation Style¶

Use clear, concise language
Provide practical examples
Include code snippets that work
Explain not just "how" but "why"
Use proper Markdown formatting

Submitting Changes¶

Pre-submission Checklist¶

Before submitting your pull request, run the quality checks described in this section.

Running Pre-commit Checks¶

# Using dk command (recommended)
dk pr              # Run full PR quality checks
dk diagnose        # If checks fail, see what went wrong
dk fix             # Auto-fix style issues
dk test --last     # Re-run only failed tests

# Or manually run individual checks
uv run ruff check packages/    # Style check
uv run ruff format packages/   # Format code
uv run pylint packages/*/src   # Linting
uv run mypy packages/          # Type checking
uv run pytest                  # Run tests
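
When running the manual checks, it can help to stop at the first failure instead of scrolling through combined output. The following is a rough sketch of that idea and not a replacement for `dk pr`: `run_checks` and `main` are hypothetical helpers, and the command list is a subset of the manual commands shown above.

```python
import subprocess
import sys

# A subset of the manual check commands listed in this guide.
CHECKS = [
    ["uv", "run", "ruff", "check", "packages/"],
    ["uv", "run", "mypy", "packages/"],
    ["uv", "run", "pytest"],
]


def run_checks(commands):
    """Run each command in order; return the first failing command, or None."""
    for command in commands:
        print("$", " ".join(command))
        try:
            completed = subprocess.run(command)
        except FileNotFoundError:
            # The executable itself is missing: treat it as a failed check
            return command
        if completed.returncode != 0:
            return command
    return None


def main():
    """Run all checks and exit with an error message on the first failure."""
    failed = run_checks(CHECKS)
    if failed is not None:
        sys.exit(f"Check failed: {' '.join(failed)}")
```

Calling `main()` from a small script in the repository root runs the checks in order.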

Commit Messages¶

Use conventional commit messages:

type(scope): brief description

Longer description if needed.

- Bullet point changes
- Another change

Fixes #123

Types:

feat: New features
fix: Bug fixes
docs: Documentation changes
style: Code formatting changes
refactor: Code refactoring
test: Adding or updating tests
chore: Maintenance tasks

Examples:

git commit -m "feat(structures): add tree traversal method

Add breadth-first traversal option to Tree.find_nodes()
method to improve search performance for shallow targets.

- Add traversal parameter with 'dfs' and 'bfs' options
- Update tests and documentation
- Maintain backward compatibility

Fixes #45"
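
The header line of a commit message can be checked mechanically against the `type(scope): brief description` format. The sketch below is one possible check, not a hook shipped with Dataknobs: `parse_commit_header` is a hypothetical name, and treating the scope as optional is an assumption borrowed from the Conventional Commits convention.

```python
import re

# Types copied from the list in this guide.
COMMIT_TYPES = ("feat", "fix", "docs", "style", "refactor", "test", "chore")

# type, optional "(scope)", a colon and a space, then a non-empty description
HEADER_PATTERN = re.compile(
    r"^(?P<type>[a-z]+)(?:\((?P<scope>[^()\s]+)\))?: (?P<description>\S.*)$"
)


def parse_commit_header(message):
    """Return (type, scope, description) for a valid header, else None.

    Only the first line of the message is inspected.
    """
    lines = message.splitlines()
    if not lines:
        return None
    match = HEADER_PATTERN.match(lines[0])
    if match is None or match.group("type") not in COMMIT_TYPES:
        return None
    return match.group("type"), match.group("scope"), match.group("description")


if __name__ == "__main__":
    print(parse_commit_header("feat(structures): add tree traversal method"))
```

A function like this could be called from a local commit-msg hook to reject malformed headers before they reach review.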

Creating Pull Request¶

Push your branch to your fork:

git push origin feature/your-feature-name

Create pull request on GitHub:
Use the pull request template
Provide clear title and description
Link related issues
Add screenshots if relevant

Pull request template:

## Description
Brief description of the changes.

## Changes Made
- List of changes
- Another change

## Testing
- [ ] Unit tests added/updated
- [ ] Integration tests pass
- [ ] Manual testing performed

## Documentation
- [ ] Docstrings updated
- [ ] User guide updated
- [ ] Examples added

## Related Issues
Fixes #123
Closes #456

Review Process¶

What Reviewers Look For¶

Code Quality: Follows style guidelines; clear and readable; proper error handling; efficient algorithms
Testing: Adequate test coverage; tests actually test the feature; edge cases covered; no flaky tests
Documentation: Clear docstrings; updated user documentation; examples work as expected
Compatibility: Doesn't break existing APIs; works across supported Python versions; handles backward compatibility

Addressing Feedback¶

Respond to comments promptly
Ask questions if feedback is unclear
Make requested changes
Update tests and documentation as needed
Mark conversations as resolved when addressed

Approval Process¶

At least one maintainer approval required
All checks must pass
No unresolved conversations
Documentation updated

Community¶

Communication Channels¶

GitHub Issues: Bug reports and feature requests
GitHub Discussions: Questions and community discussions
Pull Requests: Code contributions and reviews

Getting Help¶

Check existing documentation
Search GitHub issues
Ask in GitHub Discussions
Create new issue if needed

Recognition¶

We recognize contributors through:

Contributor list in README
Release notes acknowledgments
GitHub contributor statistics
Special recognition for significant contributions

Becoming a Maintainer¶

Active contributors may be invited to become maintainers based on:

Quality and quantity of contributions
Understanding of the codebase
Helpfulness to community members
Commitment to project values

Resources¶

Development Guide - Main development documentation
Architecture Overview - System design
Testing Guide - Detailed testing information
Python Style Guide - PEP 8 coding standards
Semantic Versioning - Versioning guidelines

Questions?¶

If you have questions about contributing:

Check the Development Guide
Search existing GitHub Issues
Start a GitHub Discussion
Create a new issue if your question hasn't been addressed

Thank you for contributing to Dataknobs! 🎉