Linting Configuration Guidelines¶
Overview¶
This document explains the rationale behind the linting and type checking configuration for the Dataknobs project. It serves as a reference for understanding which error types are considered important versus cosmetic, and why certain rules are ignored.
The actual configuration is in pyproject.toml. For specific errors that need to be fixed in each package, see the package-specific checklists (e.g., packages/data/docs/linting-errors-checklist.md).
Recent Cleanup (August 2025)¶
A comprehensive linting cleanup was performed, reducing Ruff errors in the data package from ~40 to 10. Key achievements:
- Fixed all functional issues (undefined names, import shadowing, unused variables)
- Modernized NumPy random generation (NPY002)
- Fixed loop variable overwrites that revealed a bug in vector search
- Moved type-checking imports appropriately (TC001/003/004)
- Established clear guidelines for stylistic vs functional errors
Error Categories and Decisions¶
1. Important Errors to Keep (NOT ignored)¶
Critical Bugs¶
- F811: Redefinition of unused variable - Can mask real bugs
- F821: Undefined name - Runtime errors
- B904: Raise without from inside except - Loses exception context
- PLE0704: Bare raise not in exception handler - Invalid Python
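To illustrate B904, here is a minimal sketch of exception chaining. ConfigError and parse_port are hypothetical names invented for this example; they are not part of the Dataknobs codebase.

```python
class ConfigError(Exception):
    """Hypothetical application-level error, used only for this illustration."""


def parse_port(raw: str) -> int:
    """Convert a raw setting to an int, wrapping parse failures."""
    try:
        return int(raw)
    except ValueError as err:
        # B904: "from err" keeps the original ValueError attached as __cause__,
        # so the traceback shows both the wrapper and the root failure.
        raise ConfigError(f"invalid port: {raw!r}") from err


try:
    parse_port("eighty")
except ConfigError as wrapped:
    original = wrapped.__cause__  # the ValueError raised by int()
```

Without the from clause, Python still records the original exception as implicit context, but the traceback describes it as a second error raised during handling rather than as the direct cause.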
Code Quality¶
- F401: Unused imports (except in __init__.py)
- F402: Import shadowing
- B007: Unused loop variable not prefixed with underscore
- B027: Empty method without abstract decorator (use # noqa: B027 for intentional empty implementations)
- PLR1714: Consider merging multiple comparisons
- PLR5501: Consider using elif
- RUF005: Consider iterable unpacking
- NPY002: Replace legacy np.random.rand - use modern np.random.default_rng()
- SIM101: Multiple isinstance calls - merge for clarity
- PYI056: Use += for __all__ modifications - better type checker support
- PLW2901: Loop variable overwritten - can indicate logic errors
- TC001/TC003/TC004: Type checking imports - proper placement for performance
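Two of the rules above are easier to see in code. The sketch below shows a PLW2901 fix and a type-checking-only import of the kind the TC rules look for. clean_pairs is a hypothetical helper; the annotation is quoted so that the guarded import is never needed at runtime.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # TC003: this standard-library import is used only in annotations,
    # so it lives under TYPE_CHECKING and costs nothing at runtime.
    from collections.abc import Iterable


def clean_pairs(lines: "Iterable[str]") -> list[tuple[str, str]]:
    """Return (raw, cleaned) pairs for each input line."""
    pairs = []
    for line in lines:
        # PLW2901 would fire on "line = line.strip().lower()": after that
        # reassignment the raw value is gone and (line, line) would be stored.
        cleaned = line.strip().lower()
        pairs.append((line, cleaned))
    return pairs
```

Giving the transformed value its own name keeps the loop variable intact, which is the usual fix for this rule.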
Security¶
- S3: Various security issues
2. Ignored Error Categories¶
Whitespace/Formatting (Auto-fixable)¶
- W291, W293: Whitespace issues - Cosmetic, can be auto-fixed
- E501: Line too long - Already configured at 100 chars
Documentation¶
- D105, D107: Missing docstrings in special methods - Often self-explanatory
- D200, D415, D417: Docstring formatting - Minor style issues
Type Annotations¶
- ANN204: Missing return type for __init__ - Always returns None
- ANN001, ANN003: Missing type annotations - Often obvious from context
- ANN201, ANN202, ANN205: Missing return types - Can be inferred
Import Location¶
- PLC0415: Import outside top level - Function-level imports are sometimes needed for:
  - Lazy loading for performance
  - Avoiding circular dependencies
  - Conditional imports
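As a rough illustration of why these imports are tolerated, the sketch below defers two standard-library imports to call time and checks for an optional dependency without importing it. Both function names are hypothetical.

```python
def rows_to_csv(rows: list[list[str]]) -> str:
    """Serialize rows to CSV text (hypothetical helper for illustration)."""
    # Deferred imports: csv and io are loaded on the first call,
    # not when the enclosing module is imported.
    import csv
    import io

    buffer = io.StringIO()
    csv.writer(buffer).writerows(rows)
    return buffer.getvalue()


def optional_backend_available(module_name: str) -> bool:
    """Report whether a top-level module can be imported, without importing it."""
    import importlib.util

    return importlib.util.find_spec(module_name) is not None
```

The same shape works for breaking an import cycle: the module that would close the cycle is imported inside the one function that needs it.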
Code Simplification (Stylistic Preference)¶
- SIM102: Combine nested if - Sometimes clearer as nested
- SIM103: Return negated condition directly - Sometimes clearer with explicit if/else
- SIM108: Use ternary operator - Can reduce readability
- SIM118: Use key in dict instead of key in dict.keys() - Explicit .keys() can be clearer
- PLW3301: Nested max calls - More readable when nested for complex expressions
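- RUF006: Store asyncio.create_task reference - Treated here as needed only when the task must be cancelled later; note that the event loop holds only weak references to tasks, so an unreferenced fire-and-forget task can be garbage-collected before it finishes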
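Although RUF006 is ignored in this configuration, the pattern it asks for is short. The sketch below follows the approach recommended in the asyncio documentation for keeping a strong reference to background tasks; the helper names (spawn, _record, _background_tasks) are hypothetical.

```python
import asyncio

# Strong references to in-flight tasks; the event loop itself keeps only weak ones.
_background_tasks: set = set()


async def _record(results: list, value: int) -> None:
    await asyncio.sleep(0)  # yield to the event loop once
    results.append(value)


def spawn(results: list, value: int) -> asyncio.Task:
    """Start a background task and hold a reference until it completes."""
    task = asyncio.create_task(_record(results, value))
    _background_tasks.add(task)
    # Drop the reference once the task is done so the set does not grow forever.
    task.add_done_callback(_background_tasks.discard)
    return task


async def main() -> list:
    results: list = []
    tasks = [spawn(results, n) for n in range(3)]
    await asyncio.gather(*tasks)
    return results
```

Calling asyncio.run(main()) runs the three tasks to completion; in real fire-and-forget code the gather call would be absent, which is exactly when the stored reference matters.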
Complexity Metrics¶
- PLR0911: Too many returns - Already limited to 6
- PLR0912: Too many branches - Already limited to 12
- PLR0915: Too many statements - Already limited to 50
Unused Arguments¶
- ARG001, ARG002, ARG004: Unused arguments - Often required by:
  - Interface contracts
  - Callback signatures
  - Override methods
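A small sketch of the callback case: the listener below must accept a payload argument to match the expected signature, even though it never reads it. Listener, Auditor, and emit are hypothetical names.

```python
from typing import Callable

# Every listener must accept (name, payload), whether or not it uses both.
Listener = Callable[[str, dict], None]


class Auditor:
    def __init__(self) -> None:
        self.seen: list = []

    def on_event(self, name: str, payload: dict) -> None:
        # ARG002 would flag "payload" as unused, but removing it would
        # break the Listener contract that emit() relies on.
        self.seen.append(name)


def emit(listeners: list, name: str, payload: dict) -> None:
    """Call every listener with the same two arguments."""
    for listener in listeners:
        listener(name, payload)


auditor = Auditor()
emit([auditor.on_event], "saved", {"id": 1})
```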
Type System Updates¶
- UP035, UP038: Modern type syntax - Gradual migration
- UP028: Yield from - Not always clearer
Remaining Important Errors¶
After configuration, focus on these error types that indicate real issues:
Critical Bugs (Must Fix)¶
- F811: Redefinition of unused variable - Can mask real bugs
- F821: Undefined name - Will cause runtime errors
- PLE0704: Bare raise not in exception handler - Invalid Python
- B904: Raise without from in except - Loses exception context
Code Quality Issues (Should Fix)¶
- F841: Local variable assigned but never used - Dead code
- F401: Unused imports (except in __init__.py) - Dead code
- F402: Import shadowing - Can cause confusion
- PLW0127: Self-assignment - Use # noqa: PLW0127 when intentional for documentation
Security¶
- S3: Various security issues - Always important to address
MyPy Type Checking Configuration¶
Common Error Categories¶
- attr-defined: Accessing undefined attributes, often on None
- no-untyped-def: Missing type annotations
- union-attr: Union type attribute access without guards
- assignment: Type incompatibilities in assignments
- arg-type: Wrong argument types passed to functions
- no-any-return: Returning Any from typed functions
- unreachable: Dead code - indicates logic errors
- import-untyped: Missing type stubs for third-party libraries
- import-not-found: Optional dependencies not installed
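Two of these categories have short, typical fixes. The sketch below is illustrative only and uses hypothetical function names: shout narrows a union before attribute access (union-attr), and load_count converts an Any value explicitly before returning it (no-any-return).

```python
import json
from typing import Union


def shout(value: Union[int, str]) -> str:
    # union-attr: calling value.upper() directly is rejected because int
    # has no such method; isinstance narrows the type first.
    if isinstance(value, str):
        return value.upper()
    return str(value)


def load_count(raw: str) -> int:
    # no-any-return: json.loads returns Any, so returning data["count"]
    # directly would leak Any through an int-typed function.
    data = json.loads(raw)
    return int(data["count"])
```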
Configuration Strategy¶
- Third-party libraries: Add to ignore list when stubs unavailable
- Complex modules: Relax strictness for gradual migration
- Optional dependencies: Ignore imports for feature-specific libraries
- Legacy code: Use per-module overrides to disable strict checking
Priority for Fixes¶
- Unreachable code - Always indicates logic errors
- None attribute access - Add proper type guards
- Type mismatches - Fix as you modify code
- Missing annotations - Add gradually, prioritize public APIs
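For the second item, a type guard is usually a plain None check before the attribute access. A minimal sketch, with User, find_user, and display_name as hypothetical names:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class User:
    name: str


def find_user(users: dict, user_id: int) -> Optional[User]:
    """Look up a user; returns None when the id is unknown."""
    return users.get(user_id)


def display_name(users: dict, user_id: int) -> str:
    user = find_user(users, user_id)
    # Guard: without this check, user.name is an attribute access on
    # Optional[User], which mypy reports and which fails at runtime on None.
    if user is None:
        return "<unknown>"
    return user.name
```

After the early return, mypy treats user as User for the rest of the function, so no cast is needed.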
Running Validation¶
# Run linting checks
uv run bin/validate.sh [package-name]
# Run type checking
uv run mypy packages/[package-name]/src
# Run both with detailed output
uv run bin/validate.sh [package-name] --verbose
Package-Specific Checklists¶
Each package should maintain its own error checklist documenting specific issues to address:
- Location: packages/[package-name]/docs/linting-errors-checklist.md
- Format: Checkbox list organized by priority
- Updates: As errors are fixed, check them off and remove when complete
Example: packages/data/docs/linting-errors-checklist.md