Contributing to Dataknobs¶
We welcome contributions to the Dataknobs project! This guide will help you get started with contributing code, documentation, bug reports, and feature requests.
Table of Contents¶
- Code of Conduct
- Getting Started
- Development Setup
- How to Contribute
- Coding Standards
- Testing Guidelines
- Documentation
- Submitting Changes
- Review Process
- Community
Code of Conduct¶
By participating in this project, you agree to abide by our Code of Conduct:
- Be respectful: Treat all participants with respect and courtesy
- Be inclusive: Welcome newcomers and encourage diverse perspectives
- Be constructive: Focus on what is best for the community
- Be patient: Remember that people have different skill levels and backgrounds
- Be collaborative: Work together to resolve conflicts and reach consensus
Getting Started¶
Prerequisites¶
Before contributing, make sure you have:
- Python 3.8+ installed
- Git for version control
- GitHub account for submitting contributions
- Basic understanding of Python and software development practices
Find an Issue to Work On¶
- Browse our GitHub Issues
- Look for issues labeled "good first issue" if you're new to the project
- Check issues labeled "help wanted" for areas where we need assistance
- Comment on the issue to let others know you're working on it
Types of Contributions¶
We welcome various types of contributions:
- Bug fixes: Help us identify and fix issues
- New features: Add functionality that benefits users
- Documentation: Improve guides, tutorials, and API docs
- Tests: Increase code coverage and test quality
- Performance improvements: Optimize existing code
- Examples: Add usage examples and tutorials
Development Setup¶
1. Fork and Clone¶
# Fork the repository on GitHub, then clone your fork
git clone https://github.com/yourusername/dataknobs.git
cd dataknobs
# Add upstream remote
git remote add upstream https://github.com/original/dataknobs.git
2. Create Development Environment¶
Using UV (Recommended)¶
# Install UV if you haven't already
curl -LsSf https://astral.sh/uv/install.sh | sh
# Install all packages
uv sync --all-packages
# Install the dk command for easy development
./setup-dk.sh
Using pip (Alternative)¶
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install development dependencies
pip install -r requirements-dev.txt
# Install packages in development mode
pip install -e packages/common
pip install -e packages/structures
pip install -e packages/utils
pip install -e packages/xization
3. Verify Setup¶
# Using the dk command (recommended)
dk test # Run tests
dk check # Quick quality check
dk diagnose # If something fails
# Or using traditional commands
pytest # Run tests
ruff check packages/ # Check code style
mypy packages/ # Run type checking
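If tests fail with import errors, it can help to confirm that each package resolves from the active environment before digging further. The following is a minimal sketch, not part of the dk tooling; the import names are assumed from the imports used in this guide, and dataknobs_common in particular is only inferred from the packages/common directory:

```python
from __future__ import annotations

import importlib.util

# Import names assumed from this guide; adjust to match the packages you installed.
EXPECTED_MODULES = [
    "dataknobs_common",  # assumption: inferred from packages/common
    "dataknobs_structures",
    "dataknobs_utils",
    "dataknobs_xization",
]


def missing_modules(names: list[str]) -> list[str]:
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(EXPECTED_MODULES)
    if missing:
        print("Not importable:", ", ".join(missing))
    else:
        print("All packages importable")
```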
4. Development Workflow with dk¶
The dk command simplifies your development workflow:
# Quick development cycle
dk check data # Quick check while developing
dk fix # Auto-fix style issues
dk test data # Test your changes
# Before submitting PR
dk pr # Full quality checks
dk diagnose # If checks fail
See the dk Command Guide for full details.
How to Contribute¶
Reporting Bugs¶
When reporting bugs, please include:
- Clear title: Briefly describe the issue
- Description: Detailed explanation of the problem
- Reproduction steps: How to reproduce the issue
- Expected behavior: What should happen
- Actual behavior: What actually happens
- Environment: Python version, OS, package versions
- Code samples: Minimal example demonstrating the issue
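To gather the Environment details quickly, a short script like the one below prints them in the template's format. This is a sketch under assumptions: the distribution names follow the dataknobs-<package> pattern of the pypi_name field in the package registry and may not match every package you have installed:

```python
from __future__ import annotations

import platform
from importlib import metadata

# Assumed PyPI distribution names, following the "dataknobs-<package>" pattern.
DISTRIBUTIONS = [
    "dataknobs-common",
    "dataknobs-structures",
    "dataknobs-utils",
    "dataknobs-xization",
]


def environment_report(distributions: list[str]) -> dict[str, str]:
    """Collect the details requested in the Environment section of a bug report."""
    report = {
        "Python version": platform.python_version(),
        "OS": platform.platform(),
    }
    for dist in distributions:
        try:
            report[dist] = metadata.version(dist)
        except metadata.PackageNotFoundError:
            report[dist] = "not installed"
    return report


if __name__ == "__main__":
    for key, value in environment_report(DISTRIBUTIONS).items():
        print(f"- {key}: {value}")
```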
Bug Report Template:
## Bug Description
Brief description of the issue.
## Steps to Reproduce
1. Step one
2. Step two
3. Step three
## Expected Behavior
Describe what you expected to happen.
## Actual Behavior
Describe what actually happened.
## Environment
- Python version: 3.9.7
- Dataknobs version: 1.0.0
- OS: Ubuntu 20.04
## Code Sample
```python
# Minimal code example
from dataknobs_structures import Tree
tree = Tree("test")
# Issue occurs here
```
Requesting Features¶
When requesting features, include:
1. Use case: Explain why this feature is needed
2. Detailed description: What the feature should do
3. Proposed API: How users would interact with it
4. Alternatives considered: Other approaches you've thought of
5. Implementation notes: Any technical considerations
Feature Request Template:
## Feature Description
Brief description of the proposed feature.
## Use Case
Why is this feature needed? What problem does it solve?
## Proposed Implementation
How should this feature work? Include API examples.
```python
# Example of proposed API
from dataknobs_utils import new_feature
result = new_feature.process_data(data)
```
## Alternatives Considered
What other approaches did you consider?
## Additional Context
Any other relevant information.
Making Code Changes¶
1. Create a Feature Branch¶
# Update your main branch
git checkout main
git pull upstream main
# Create feature branch
git checkout -b feature/your-feature-name
# or for bug fixes:
git checkout -b bugfix/issue-description
2. Make Your Changes¶
- Write clean, readable code
- Follow existing code patterns
- Add appropriate comments
- Update docstrings for public APIs
3. Add Tests¶
# Example test structure
import pytest
from dataknobs_structures import Tree
class TestYourFeature:
    def test_basic_functionality(self):
        """Test the basic functionality of your feature."""
        # Arrange
        tree = Tree("test")
        # Act (your_new_method is a placeholder for the method you added)
        result = tree.your_new_method()
        # Assert (replace expected_type with the type your method returns)
        assert result is not None
        assert isinstance(result, expected_type)
    def test_edge_cases(self):
        """Test edge cases and error conditions."""
        tree = Tree(None)
        with pytest.raises(ValueError):
            tree.your_new_method()
4. Update Documentation¶
- Update docstrings for new/modified functions
- Add usage examples
- Update README if needed
- Add entries to CHANGELOG if appropriate
Adding a New Package¶
When adding a new package to the monorepo, follow this process to ensure it's properly integrated:
1. Register the Package¶
Add an entry for your package to the central registry in .dataknobs/packages.json. Set category to "core", "experimental", or "legacy":
{
  "name": "newpackage",
  "pypi_name": "dataknobs-newpackage",
  "description": "Brief description of the package",
  "version": "0.1.0",
  "category": "core",
  "requires_docs_build": true,
  "deprecated": false
}
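Before running the project's validator, you can sanity-check a new entry yourself. The sketch below assumes every field in the example entry is required, that category accepts only the three values listed, and that pypi_name always follows the dataknobs-<name> pattern; the authoritative schema is described in .dataknobs/README.md, and entry_problems is a hypothetical helper:

```python
from __future__ import annotations

# Field names and types taken from the example entry above (assumed required).
REQUIRED_FIELDS = {
    "name": str,
    "pypi_name": str,
    "description": str,
    "version": str,
    "category": str,
    "requires_docs_build": bool,
    "deprecated": bool,
}
CATEGORIES = {"core", "experimental", "legacy"}


def entry_problems(entry: dict) -> list[str]:
    """Return human-readable problems with a registry entry (empty list if none)."""
    problems = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in entry:
            problems.append(f"missing field: {field}")
        elif not isinstance(entry[field], expected_type):
            problems.append(f"{field} should be {expected_type.__name__}")
    if entry.get("category") not in CATEGORIES:
        problems.append(f"category must be one of {sorted(CATEGORIES)}")
    # Assumption: the PyPI name is always "dataknobs-" plus the package name.
    if entry.get("pypi_name") != f"dataknobs-{entry.get('name')}":
        problems.append("pypi_name does not follow the dataknobs-<name> pattern")
    return problems
```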
2. Validate Package References¶
Run the validation script to check what needs updating:
The validator will tell you exactly which files need updates.
3. Update Files as Needed¶
The validation typically catches:
- GitHub Workflows: Add package to docs build steps
- Release Workflow: Add to package selection dropdown
- README.md: List the package in installation examples
- Documentation: Add to package tables
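At its core, a reference check of this kind looks for the package's PyPI name in each file that lists packages. The following hypothetical sketch illustrates the idea only and is not the project's validator; it takes file contents as strings, so which files to scan is left to the caller:

```python
from __future__ import annotations


def files_missing_reference(pypi_name: str, files: dict[str, str]) -> list[str]:
    """Return the paths whose text never mentions pypi_name, sorted.

    `files` maps a path (for example "README.md") to that file's contents.
    """
    return sorted(path for path, text in files.items() if pypi_name not in text)
```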
4. Verify Before PR¶
Why This Matters¶
The package registry system ensures:
- ✅ No missing references when adding packages
- ✅ Consistent package information across files
- ✅ Automated validation in CI
- ✅ Clear documentation of package metadata
See .dataknobs/README.md for the full package registry documentation.
Coding Standards¶
Python Style Guide¶
We follow PEP 8 with some modifications:
- Line length: 88 characters (Black default)
- Imports: Use isort for import organization
- Docstrings: Google style docstrings
- Type hints: Required for all public APIs
Code Formatting¶
# Format code with Black
black packages/
# Sort imports with isort
isort packages/
# Check formatting
black --check packages/
isort --check-only packages/
Docstring Style¶
def example_function(param1: str, param2: int = 0) -> bool:
    """Brief description of the function.

    Longer description if needed. Explain the purpose,
    behavior, and any important details.

    Args:
        param1: Description of param1.
        param2: Description of param2. Defaults to 0.

    Returns:
        Description of return value.

    Raises:
        ValueError: If param1 is empty.
        TypeError: If param2 is not an integer.

    Example:
        Basic usage example:

        >>> result = example_function("test", 5)
        >>> print(result)
        True
    """
    if not param1:
        raise ValueError("param1 cannot be empty")
    if not isinstance(param2, int):
        raise TypeError("param2 must be an integer")
    # Implementation here
    return True
Type Hints¶
Important: All files with type hints must include from __future__ import annotations for Python 3.9 compatibility. See the Python Compatibility Guide for details.
from __future__ import annotations
from pathlib import Path
from typing import Any, Union
# Good examples (modern style with future annotations)
def process_files(file_paths: list[Path]) -> dict[str, Any]:
    """Process multiple files and return results."""
    pass
def get_value(data: dict[str, Any], key: str, default: str | None = None) -> str | None:
    """Get value from dictionary with optional default."""
    pass
# For complex types, create type aliases. Aliases are ordinary runtime
# expressions that the future import does not defer, and Python 3.9 does not
# support X | Y between types at runtime, so use Union in aliases.
DocumentData = dict[str, Union[str, int, list[str]]]
ProcessingResult = dict[str, Union[bool, str, list[DocumentData]]]
Testing Guidelines¶
Test Structure¶
Organize tests to match the package structure:
tests/
├── unit/ # Unit tests
│ ├── structures/
│ ├── utils/
│ └── xization/
├── integration/ # Integration tests
└── fixtures/ # Test fixtures and data
Writing Good Tests¶
import pytest
from unittest.mock import Mock, patch
from dataknobs_utils import file_utils
class TestFileUtils:
    """Test file utility functions."""
    def test_filepath_generator_basic(self):
        """Test basic filepath generation."""
        # Use descriptive test names
        # Test the happy path first
        pass
    def test_filepath_generator_empty_directory(self):
        """Test filepath generation with empty directory."""
        # Test edge cases
        pass
    def test_filepath_generator_nonexistent_path(self):
        """Test filepath generation with nonexistent path."""
        # Test error conditions
        with pytest.raises(FileNotFoundError):
            list(file_utils.filepath_generator("/nonexistent/path"))
    @patch('os.walk')
    def test_filepath_generator_with_mock(self, mock_walk):
        """Test filepath generation with mocked filesystem."""
        # Mock external dependencies when needed
        mock_walk.return_value = [("/test", [], ["file1.txt", "file2.txt"])]
        result = list(file_utils.filepath_generator("/test"))
        assert len(result) == 2
        assert "/test/file1.txt" in result
        assert "/test/file2.txt" in result
Test Coverage¶
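# Run tests with coverage
pytest --cov=packages/ --cov-report=html
# View coverage report (open is macOS-specific; use xdg-open on Linux)
open htmlcov/index.html
# Aim for at least 90% coverage; this fails the run if total coverage is below 90%
pytest --cov=packages/ --cov-fail-under=90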
Integration Tests¶
# tests/integration/test_pipeline.py
import tempfile
from pathlib import Path
from dataknobs_utils import file_utils
from dataknobs_xization import normalize
from dataknobs_structures import Tree
def test_complete_text_processing_pipeline():
    """Test complete text processing pipeline integration."""
    with tempfile.TemporaryDirectory() as temp_dir:
        # Create test data
        test_file = Path(temp_dir) / "test.txt"
        test_file.write_text("getUserName() & validateInput")
        # Test file reading
        content = next(file_utils.fileline_generator(str(test_file)))
        assert content == "getUserName() & validateInput"
        # Test normalization
        normalized = normalize.expand_camelcase_fn(content)
        assert "get User Name" in normalized
        # Test tree structure
        tree = Tree(normalized)
        assert tree.data == normalized
Documentation¶
API Documentation¶
We use MkDocs with mkdocstrings for API documentation:
def new_function(param: str) -> str:
    """Brief description of the function.

    Longer description with examples and usage notes.

    Args:
        param: Description of the parameter.

    Returns:
        Description of the return value.

    Example:
        >>> result = new_function("test")
        >>> print(result)
        processed: test
    """
    return f"processed: {param}"
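Because doctest compares an example's displayed output literally (print() shows a string without quotes, while a bare expression shows its repr with quotes), a mistyped expected line goes unnoticed until the examples are actually executed. A minimal sketch using the standard library's doctest module to run one function's examples and count failures; doctest_failures is a hypothetical helper name, and pytest's --doctest-modules option does the same across a whole package:

```python
import doctest


def new_function(param: str) -> str:
    """Brief description of the function.

    Example:
        >>> print(new_function("test"))
        processed: test
    """
    return f"processed: {param}"


def doctest_failures(obj) -> int:
    """Run the >>> examples in obj's docstring and return how many failed."""
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    failed = 0
    # Supply obj under its own name so the examples can call it.
    for test in finder.find(obj, globs={obj.__name__: obj}):
        failed += runner.run(test).failed
    return failed


if __name__ == "__main__":
    print("doctest failures:", doctest_failures(new_function))
```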
User Documentation¶
When adding new features, update:
- User Guide: Add usage examples
- API Reference: Ensure docstrings are complete
- Examples: Add practical examples
- README: Update if the change affects installation or basic usage
Documentation Style¶
- Use clear, concise language
- Provide practical examples
- Include code snippets that work
- Explain not just "how" but "why"
- Use proper Markdown formatting
Submitting Changes¶
Pre-submission Checklist¶
Before submitting your pull request:
- Code follows style guidelines
- All tests pass locally
- New tests added for new functionality
- Documentation updated
- Type hints added
- Docstrings written/updated
- CHANGELOG updated (if applicable)
Running Pre-commit Checks¶
# Using dk command (recommended)
dk pr # Run full PR quality checks
dk diagnose # If checks fail, see what went wrong
dk fix # Auto-fix style issues
dk test --last # Re-run only failed tests
# Or manually run individual checks
uv run ruff check packages/ # Style check
uv run ruff format packages/ # Format code
uv run pylint packages/*/src # Linting
uv run mypy packages/ # Type checking
uv run pytest # Run tests
Commit Messages¶
Use conventional commit messages:
type(scope): brief description
Longer description if needed.
- Bullet point changes
- Another change
Fixes #123
Types:
- feat: New features
- fix: Bug fixes
- docs: Documentation changes
- style: Code formatting changes
- refactor: Code refactoring
- test: Adding or updating tests
- chore: Maintenance tasks
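A subject line in this format can be checked mechanically, for example in a local commit-msg hook. The sketch below is an illustration, not an enforced project rule: it accepts only the types listed above and treats the scope as optional, as the Conventional Commits specification does; parse_subject is a hypothetical name:

```python
from __future__ import annotations

import re

# Types listed above; the "(scope)" part is treated as optional.
COMMIT_TYPES = ("feat", "fix", "docs", "style", "refactor", "test", "chore")
SUBJECT_RE = re.compile(
    r"^(?P<type>" + "|".join(COMMIT_TYPES) + r")"
    r"(?:\((?P<scope>[\w-]+)\))?"
    r": (?P<description>\S.*)$"
)


def parse_subject(message: str) -> dict[str, str | None] | None:
    """Parse the first line of a commit message; return None if it doesn't match."""
    lines = message.splitlines()
    subject = lines[0] if lines else ""
    match = SUBJECT_RE.match(subject)
    return match.groupdict() if match else None
```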
Examples:
git commit -m "feat(structures): add tree traversal method
Add breadth-first traversal option to Tree.find_nodes()
method to improve search performance for shallow targets.
- Add traversal parameter with 'dfs' and 'bfs' options
- Update tests and documentation
- Maintain backward compatibility
Fixes #45"
Creating Pull Request¶
- Push your branch to your fork:
  git push origin feature/your-feature-name
- Create a pull request on GitHub:
  - Use the pull request template
  - Provide a clear title and description
  - Link related issues
  - Add screenshots if relevant
Pull request template:
## Description
Brief description of the changes.
## Changes Made
- List of changes
- Another change
## Testing
- [ ] Unit tests added/updated
- [ ] Integration tests pass
- [ ] Manual testing performed
## Documentation
- [ ] Docstrings updated
- [ ] User guide updated
- [ ] Examples added
## Related Issues
Fixes #123
Closes #456
Review Process¶
What Reviewers Look For¶
- Code Quality
  - Follows style guidelines
  - Clear and readable
  - Proper error handling
  - Efficient algorithms
- Testing
  - Adequate test coverage
  - Tests actually test the feature
  - Edge cases covered
  - No flaky tests
- Documentation
  - Clear docstrings
  - Updated user documentation
  - Examples work as expected
- Compatibility
  - Doesn't break existing APIs
  - Works across supported Python versions
  - Handles backward compatibility
Addressing Feedback¶
- Respond to comments promptly
- Ask questions if feedback is unclear
- Make requested changes
- Update tests and documentation as needed
- Mark conversations as resolved when addressed
Approval Process¶
- At least one maintainer approval required
- All checks must pass
- No unresolved conversations
- Documentation updated
Community¶
Communication Channels¶
- GitHub Issues: Bug reports and feature requests
- GitHub Discussions: Questions and community discussions
- Pull Requests: Code contributions and reviews
Getting Help¶
- Check existing documentation
- Search GitHub issues
- Ask in GitHub Discussions
- Create new issue if needed
Recognition¶
We recognize contributors through:
- Contributor list in README
- Release notes acknowledgments
- GitHub contributor statistics
- Special recognition for significant contributions
Becoming a Maintainer¶
Active contributors may be invited to become maintainers based on:
- Quality and quantity of contributions
- Understanding of the codebase
- Helpfulness to community members
- Commitment to project values
Resources¶
- Development Guide - Main development documentation
- Architecture Overview - System design
- Testing Guide - Detailed testing information
- Python Style Guide - PEP 8 coding standards
- Semantic Versioning - Versioning guidelines
Questions?¶
If you have questions about contributing:
- Check the Development Guide
- Search existing GitHub Issues
- Start a GitHub Discussion
- Create a new issue if your question hasn't been addressed
Thank you for contributing to Dataknobs! 🎉