Skip to content

dataknobs-structures Complete API Reference

Complete auto-generated API documentation from source code docstrings.

💡 Also see: - Curated Guide - Hand-crafted tutorials and examples - Package Overview - Introduction and getting started - Source Code - View on GitHub


dataknobs_structures

Core data structures for AI knowledge bases and data processing.

The dataknobs-structures package provides fundamental data structures for building AI applications, knowledge bases, and data processing pipelines. It includes tree structures, document containers, record stores, and conditional dictionaries.

Modules

Tree - Hierarchical data structures

The Tree class provides a flexible node-based tree structure with: - Parent-child relationships with bidirectional links - Depth-first and breadth-first traversal - Node search and filtering - Path finding and common ancestor detection - Graphviz visualization support

Document - Text with metadata

Classes for managing text documents with associated metadata: - Text: Container combining content and metadata - TextMetaData: Structured metadata with IDs and labels - MetaData: Generic key-value metadata container

RecordStore - Tabular data management

A flexible record store that represents data as: - List of dictionaries (Python native) - pandas DataFrame (data analysis) - TSV/CSV files (disk persistence)

cdict - Conditional dictionary

A dictionary that validates items before acceptance using the strategy pattern. Rejected items are tracked separately for inspection.

Quick Examples

Using Tree
from dataknobs_structures import Tree

# Build tree structure
root = Tree("root")
child1 = root.add_child("child1")
child2 = root.add_child("child2")
grandchild = child1.add_child("grandchild")

# Search and traverse
found = root.find_nodes(lambda n: "child" in str(n.data))
edges = root.get_edges()  # All parent-child pairs
Using Text with metadata
from dataknobs_structures import Text, TextMetaData

# Create document with metadata
metadata = TextMetaData(
    text_id="doc_001",
    text_label="article",
    author="Alice",
    category="technology"
)
doc = Text("This is the document content...", metadata)

print(doc.text_id)     # "doc_001"
print(doc.text_label)  # "article"
print(doc.metadata.get_value("author"))  # "Alice"
Using RecordStore
from dataknobs_structures import RecordStore

# Create store with disk backing
store = RecordStore("/data/users.tsv")

# Add records
store.add_rec({"id": 1, "name": "Alice", "age": 30})
store.add_rec({"id": 2, "name": "Bob", "age": 25})

# Access as DataFrame or list
df = store.df
records = store.records

# Persist changes
store.save()
Using cdict
from dataknobs_structures import cdict

# Only accept positive integers
positive = cdict(lambda d, k, v: isinstance(v, int) and v > 0)
positive['a'] = 5    # Accepted
positive['b'] = -1   # Rejected

print(positive)          # {'a': 5}
print(positive.rejected)  # {'b': -1}

Design Philosophy

The structures in this package are designed to be: 1. Simple - Easy to understand and use with minimal boilerplate 2. Flexible - Support multiple representations and use cases 3. Composable - Work well together in larger systems 4. Practical - Solve real problems in AI and data workflows

Installation

pip install dataknobs-structures

For more detailed documentation, see the individual class and function docstrings.

Modules:

Name Description
conditional_dict

Conditional dictionary with validation using the strategy pattern.

document

Text and metadata containers for document processing.

record_store

Record storage with in-memory and disk persistence.

tree

Tree data structure with parent-child relationships and traversal methods.

Classes:

Name Description
cdict

Dictionary that validates key-value pairs before acceptance.

Text

Container combining text content with metadata.

TextMetaData

Specialized metadata container for text documents.

RecordStore

Container for tabular records with memory and disk representations.

Tree

A tree node with arbitrary data and parent-child relationships.

Functions:

Name Description
build_tree_from_string

Build a Tree from a parenthesized string representation.

Classes

cdict

cdict(accept_fn: Callable[[Dict, Any, Any], bool], *args: Any, **kwargs: Any)

Bases: dict

Dictionary that validates key-value pairs before acceptance.

A dictionary subclass that applies a validation function to each item before allowing it to be set. Items that fail validation are stored in a separate "rejected" dictionary rather than being added to the main dictionary.

This uses the strategy pattern where the validation logic is provided as a function at initialization time, allowing flexible and reusable validation rules.

Attributes:

Name Type Description
rejected Dict

Dictionary containing key-value pairs that failed validation.

accept_fn

The validation function used to check items.

Example
# Only accept string keys
str_keys = cdict(lambda d, k, v: isinstance(k, str))
str_keys['valid'] = 1      # Accepted
str_keys[123] = 2          # Rejected

# Only accept even numbers
evens = cdict(lambda d, k, v: isinstance(v, int) and v % 2 == 0)
evens['a'] = 2    # Accepted
evens['b'] = 3    # Rejected
evens['c'] = 4    # Accepted

print(evens)           # {'a': 2, 'c': 4}
print(evens.rejected)  # {'b': 3}

# Prevent duplicates using dict state
no_dups = cdict(lambda d, k, v: k not in d)
no_dups['x'] = 1
no_dups['x'] = 2  # Rejected (key already exists)

Initialize conditional dictionary with validation function.

Parameters:

Name Type Description Default
accept_fn Callable[[Dict, Any, Any], bool]

Validation function with signature (dict, key, value) -> bool. Returns True to accept the key-value pair, False to reject it. The dict parameter is the current state of this dictionary.

required
*args Any

Initial items as a mapping or iterable of key-value pairs.

()
**kwargs Any

Additional initial items as keyword arguments.

{}
Example
# Validation function receives dict, key, and value
def validate(d, key, val):
    return isinstance(val, int) and val > 0

# Initialize empty
d1 = cdict(validate)

# Initialize with items
d2 = cdict(validate, {'a': 1, 'b': 2})

# Initialize with kwargs
d3 = cdict(validate, x=5, y=10)

Methods:

Name Description
__setitem__

Set an item if it passes validation, otherwise reject it.

setdefault

Set default value if key doesn't exist and passes validation.

update

Update dictionary with key-value pairs, validating each.

Source code in packages/structures/src/dataknobs_structures/conditional_dict.py
def __init__(
    self, accept_fn: Callable[[Dict, Any, Any], bool], *args: Any, **kwargs: Any
) -> None:
    """Initialize conditional dictionary with validation function.

    Args:
        accept_fn: Validation function with signature (dict, key, value) -> bool.
            Returns True to accept the key-value pair, False to reject it.
            The dict parameter is the current state of this dictionary.
        *args: Initial items as a mapping or iterable of key-value pairs.
        **kwargs: Additional initial items as keyword arguments.

    Example:
        ```python
        # Validation function receives dict, key, and value
        def validate(d, key, val):
            return isinstance(val, int) and val > 0

        # Initialize empty
        d1 = cdict(validate)

        # Initialize with items
        d2 = cdict(validate, {'a': 1, 'b': 2})

        # Initialize with kwargs
        d3 = cdict(validate, x=5, y=10)
        ```
    """
    super().__init__()
    self._rejected: Dict[Any, Any] = {}
    self.accept_fn = accept_fn
    # super().__init__(*args, **kwargs)
    self.update(*args, **kwargs)
Attributes
rejected property
rejected: Dict

Dictionary of rejected key-value pairs.

Returns:

Type Description
Dict

Dictionary containing items that failed validation.

Example
d = cdict(lambda _, k, v: v > 0)
d['a'] = 5   # Accepted
d['b'] = -1  # Rejected

print(d.rejected)  # {'b': -1}
Functions
__setitem__
__setitem__(key: Any, value: Any) -> None

Set an item if it passes validation, otherwise reject it.

Parameters:

Name Type Description Default
key Any

The dictionary key.

required
value Any

The value to set.

required
Example
d = cdict(lambda _, k, v: isinstance(v, str))
d['name'] = 'Alice'  # Accepted
d['age'] = 30        # Rejected (not a string)
Source code in packages/structures/src/dataknobs_structures/conditional_dict.py
def __setitem__(self, key: Any, value: Any) -> None:
    """Set an item if it passes validation, otherwise reject it.

    Args:
        key: The dictionary key.
        value: The value to set.

    Example:
        ```python
        d = cdict(lambda _, k, v: isinstance(v, str))
        d['name'] = 'Alice'  # Accepted
        d['age'] = 30        # Rejected (not a string)
        ```
    """
    if self.accept_fn(self, key, value):
        super().__setitem__(key, value)
    else:
        self._rejected[key] = value
setdefault
setdefault(key: Any, default: Any = None) -> Any

Set default value if key doesn't exist and passes validation.

If the key exists, returns its value. If not, validates the default value and sets it if accepted (or rejects it if not).

Parameters:

Name Type Description Default
key Any

The dictionary key.

required
default Any

Default value to set if key doesn't exist. Defaults to None.

None

Returns:

Type Description
Any

The existing value if key exists, the default value if set, or None

Any

if the default was rejected.

Example
d = cdict(lambda _, k, v: v > 0)
d['a'] = 5

print(d.setdefault('a', 10))  # 5 (existing value)
print(d.setdefault('b', 3))   # 3 (accepted and set)
print(d.setdefault('c', -1))  # None (rejected)
print(d)  # {'a': 5, 'b': 3}
Source code in packages/structures/src/dataknobs_structures/conditional_dict.py
def setdefault(self, key: Any, default: Any = None) -> Any:
    """Set default value if key doesn't exist and passes validation.

    If the key exists, returns its value. If not, validates the default value
    and sets it if accepted (or rejects it if not).

    Args:
        key: The dictionary key.
        default: Default value to set if key doesn't exist. Defaults to None.

    Returns:
        The existing value if key exists, the default value if set, or None
        if the default was rejected.

    Example:
        ```python
        d = cdict(lambda _, k, v: v > 0)
        d['a'] = 5

        print(d.setdefault('a', 10))  # 5 (existing value)
        print(d.setdefault('b', 3))   # 3 (accepted and set)
        print(d.setdefault('c', -1))  # None (rejected)
        print(d)  # {'a': 5, 'b': 3}
        ```
    """
    rv = None
    if key not in self:
        if self.accept_fn(self, key, default):
            super().__setitem__(key, default)
            rv = default
        else:
            self._rejected[key] = default
    else:
        rv = self[key]
    return rv
update
update(*args: Any, **kwargs: Any) -> None

Update dictionary with key-value pairs, validating each.

Accepts either a mapping object, an iterable of key-value pairs, or keyword arguments. Each item is validated and added or rejected individually.

Parameters:

Name Type Description Default
*args Any

Mapping or iterable of (key, value) pairs.

()
**kwargs Any

Key-value pairs as keyword arguments.

{}
Example
d = cdict(lambda _, k, v: isinstance(v, int))

# Update from dict
d.update({'a': 1, 'b': 'text', 'c': 3})
print(d)           # {'a': 1, 'c': 3}
print(d.rejected)  # {'b': 'text'}

# Update from kwargs
d.update(x=5, y='invalid')
print(d)           # {'a': 1, 'c': 3, 'x': 5}
print(d.rejected)  # {'b': 'text', 'y': 'invalid'}
Source code in packages/structures/src/dataknobs_structures/conditional_dict.py
def update(self, *args: Any, **kwargs: Any) -> None:
    """Update dictionary with key-value pairs, validating each.

    Accepts either a mapping object, an iterable of key-value pairs, or
    keyword arguments. Each item is validated and added or rejected individually.

    Args:
        *args: Mapping or iterable of (key, value) pairs.
        **kwargs: Key-value pairs as keyword arguments.

    Example:
        ```python
        d = cdict(lambda _, k, v: isinstance(v, int))

        # Update from dict
        d.update({'a': 1, 'b': 'text', 'c': 3})
        print(d)           # {'a': 1, 'c': 3}
        print(d.rejected)  # {'b': 'text'}

        # Update from kwargs
        d.update(x=5, y='invalid')
        print(d)           # {'a': 1, 'c': 3, 'x': 5}
        print(d.rejected)  # {'b': 'text', 'y': 'invalid'}
        ```
    """
    # Handle positional argument if present
    if args:
        other = args[0]
        if hasattr(other, "keys"):
            # It's a mapping-like object
            for key in other.keys():
                self.__setitem__(key, other[key])
        else:
            # It's an iterable of key-value pairs
            for key, value in other:
                self.__setitem__(key, value)
    # Handle keyword arguments
    for key, value in kwargs.items():
        self.__setitem__(key, value)

Text

Text(text: str, metadata: TextMetaData | None)

Container combining text content with metadata.

Wraps a text string with associated metadata for document processing, text analysis, and NLP pipelines. Provides convenient access to both the text content and its metadata attributes.

Attributes:

Name Type Description
text str

The text content string.

text_id Any

Unique identifier from metadata.

text_label str

Label/category from metadata.

metadata TextMetaData

Full TextMetaData object.

Example
# Create text with metadata
metadata = TextMetaData(
    text_id="email_001",
    text_label="customer_inquiry",
    sender="customer@example.com"
)
doc = Text("Hello, I have a question about...", metadata)

# Access content and metadata
print(doc.text)        # "Hello, I have a question about..."
print(doc.text_id)     # "email_001"
print(doc.text_label)  # "customer_inquiry"
print(doc.metadata.get_value("sender"))  # "customer@example.com"

# Minimal usage (auto-creates metadata)
simple_doc = Text("Some text", None)
print(simple_doc.text_id)  # 0 (default)

Initialize text document with content and metadata.

Parameters:

Name Type Description Default
text str

The text content string.

required
metadata TextMetaData | None

TextMetaData object with document metadata. If None, creates default metadata with text_id=0 and text_label="text".

required
Example
# With metadata
meta = TextMetaData(text_id="doc1", text_label="article")
doc = Text("Article content...", meta)

# Without metadata (uses defaults)
doc = Text("Content without metadata", None)
Source code in packages/structures/src/dataknobs_structures/document.py
def __init__(
    self,
    text: str,
    metadata: TextMetaData | None,
) -> None:
    """Initialize text document with content and metadata.

    Args:
        text: The text content string.
        metadata: TextMetaData object with document metadata. If None,
            creates default metadata with text_id=0 and text_label="text".

    Example:
        ```python
        # With metadata
        meta = TextMetaData(text_id="doc1", text_label="article")
        doc = Text("Article content...", meta)

        # Without metadata (uses defaults)
        doc = Text("Content without metadata", None)
        ```
    """
    self._text = text
    self._metadata = metadata if metadata is not None else TextMetaData(0, TEXT_LABEL)
Attributes
text property
text: str

The text content.

Returns:

Type Description
str

The text string.

text_id property
text_id: Any

The unique identifier from metadata.

Returns:

Type Description
Any

The text_id value.

text_label property
text_label: str

The label/category from metadata.

Returns:

Type Description
str

The text_label value.

metadata property
metadata: TextMetaData

The metadata object.

Returns:

Type Description
TextMetaData

The TextMetaData instance containing all metadata.

Functions

TextMetaData

TextMetaData(text_id: Any, text_label: str = TEXT_LABEL, **kwargs: Any)

Bases: MetaData

Specialized metadata container for text documents.

Extends MetaData to provide standard text document metadata fields including a unique identifier (text_id) and a label/category (text_label).

Attributes:

Name Type Description
text_id Any

Unique identifier for the text document.

text_label str | Any

Label or category for the text (e.g., "article", "email").

data dict[str, Any]

Full metadata dictionary including text_id, text_label, and any kwargs.

Example
# Create text metadata
meta = TextMetaData(
    text_id="doc_001",
    text_label="article",
    author="Jane Smith",
    word_count=1500
)

print(meta.text_id)      # "doc_001"
print(meta.text_label)   # "article"
print(meta.get_value("author"))  # "Jane Smith"

Initialize text metadata with ID and label.

Parameters:

Name Type Description Default
text_id Any

Unique identifier for the text. Can be any type (string, int, etc.).

required
text_label str

Label or category for the text. Defaults to "text".

TEXT_LABEL
**kwargs Any

Additional optional metadata key-value pairs.

{}
Example
# Minimal metadata
meta = TextMetaData(text_id="123")

# With custom label and extra fields
meta = TextMetaData(
    text_id="doc_001",
    text_label="news_article",
    published="2025-01-01",
    section="technology"
)
Source code in packages/structures/src/dataknobs_structures/document.py
def __init__(self, text_id: Any, text_label: str = TEXT_LABEL, **kwargs: Any) -> None:
    """Initialize text metadata with ID and label.

    Args:
        text_id: Unique identifier for the text. Can be any type (string, int, etc.).
        text_label: Label or category for the text. Defaults to "text".
        **kwargs: Additional optional metadata key-value pairs.

    Example:
        ```python
        # Minimal metadata
        meta = TextMetaData(text_id="123")

        # With custom label and extra fields
        meta = TextMetaData(
            text_id="doc_001",
            text_label="news_article",
            published="2025-01-01",
            section="technology"
        )
        ```
    """
    super().__init__(
        {
            TEXT_ID_ATTR: text_id,
            TEXT_LABEL_ATTR: text_label,
        },
        **kwargs,
    )
Attributes
text_id property
text_id: Any

The unique identifier for this text.

Returns:

Type Description
Any

The text_id value from metadata.

text_label property
text_label: str | Any

The label/category for this text.

Returns:

Type Description
str | Any

The text_label value from metadata.

Functions

RecordStore

RecordStore(
    tsv_fpath: str | None, df: DataFrame | None = None, sep: str = "\t"
)

Container for tabular records with memory and disk representations.

Manages a sequence of records (rows) that can be represented as a list of dictionaries, a pandas DataFrame, and/or a TSV/CSV file on disk. The store automatically synchronizes between these representations and provides simple CRUD operations.

Attributes:

Name Type Description
tsv_fpath

Path to the backing file on disk (None if not persisted).

df DataFrame | None

The records as a pandas DataFrame (lazily created from records).

records List[Dict[str, Any]]

The records as a list of dictionaries.

Example
# Create with disk persistence
store = RecordStore("/data/results.tsv")

# Add records
store.add_rec({"user": "alice", "score": 100})
store.add_rec({"user": "bob", "score": 95})

# Access data
print(len(store.records))  # 2
print(store.df.shape)      # (2, 2)

# Save and restore
store.save()
store.clear()
store.restore()  # Reloads from disk
Note

If tsv_fpath is None, the store operates entirely in memory with no disk persistence.

Initialize record store with optional file backing.

Parameters:

Name Type Description Default
tsv_fpath str | None

Path to TSV/CSV file on disk. If None, operates without disk persistence. If the file exists, loads data from it.

required
df DataFrame | None

Optional initial DataFrame to populate the store. Ignored if tsv_fpath points to an existing file.

None
sep str

File separator character. Defaults to tab ("\t") for TSV files. Use "," for CSV.

'\t'
Example
# In-memory only
store = RecordStore(None)

# With disk persistence
store = RecordStore("/data/records.tsv")

# CSV with comma separator
store = RecordStore("/data/records.csv", sep=",")

# Initialize with DataFrame
import pandas as pd
df = pd.DataFrame([{"a": 1}, {"a": 2}])
store = RecordStore("/data/data.tsv", df=df)

Methods:

Name Description
clear

Clear all records without saving changes.

add_rec

Add a record to the store.

save

Save records to the backing file.

restore

Restore records from disk, discarding in-memory changes.

Source code in packages/structures/src/dataknobs_structures/record_store.py
def __init__(
    self,
    tsv_fpath: str | None,
    df: pd.DataFrame | None = None,
    sep: str = "\t",
):
    r"""Initialize record store with optional file backing.

    Args:
        tsv_fpath: Path to TSV/CSV file on disk. If None, operates without
            disk persistence. If the file exists, loads data from it.
        df: Optional initial DataFrame to populate the store. Ignored if
            tsv_fpath points to an existing file.
        sep: File separator character. Defaults to tab ("\\t") for TSV files.
            Use "," for CSV.

    Example:
        ```python
        # In-memory only
        store = RecordStore(None)

        # With disk persistence
        store = RecordStore("/data/records.tsv")

        # CSV with comma separator
        store = RecordStore("/data/records.csv", sep=",")

        # Initialize with DataFrame
        import pandas as pd
        df = pd.DataFrame([{"a": 1}, {"a": 2}])
        store = RecordStore("/data/data.tsv", df=df)
        ```
    """
    self.tsv_fpath = tsv_fpath
    self.init_df = df
    self.sep = sep
    self._df: pd.DataFrame | None = None
    self._recs: List[Dict[str, Any]] = []  # Initialize as empty list, not None
    self._init_data(df)
Attributes
df property
df: DataFrame | None

Get records as a pandas DataFrame.

Lazily creates the DataFrame from records if it doesn't exist.

Returns:

Type Description
DataFrame | None

DataFrame representation of records, or None if no records.

Example
store = RecordStore(None)
store.add_rec({"name": "Alice", "age": 30})
store.add_rec({"name": "Bob", "age": 25})

df = store.df
print(df.shape)  # (2, 2)
print(df.columns.tolist())  # ['name', 'age']
records property
records: List[Dict[str, Any]]

Get records as a list of dictionaries.

Returns:

Type Description
List[Dict[str, Any]]

List of record dictionaries.

Example
store = RecordStore(None)
store.add_rec({"id": 1, "value": "A"})

for rec in store.records:
    print(rec["id"], rec["value"])
Functions
clear
clear() -> None

Clear all records without saving changes.

Removes all records from memory but does not affect the backing file. Use save() afterwards if you want to persist the empty state to disk.

Example
store = RecordStore("/data/records.tsv")
print(len(store.records))  # 100

store.clear()
print(len(store.records))  # 0

# File still contains 100 records until save() is called
Source code in packages/structures/src/dataknobs_structures/record_store.py
def clear(self) -> None:
    """Clear all records without saving changes.

    Removes all records from memory but does not affect the backing file.
    Use save() afterwards if you want to persist the empty state to disk.

    Example:
        ```python
        store = RecordStore("/data/records.tsv")
        print(len(store.records))  # 100

        store.clear()
        print(len(store.records))  # 0

        # File still contains 100 records until save() is called
        ```
    """
    self._recs.clear()
    self._df = None
add_rec
add_rec(rec: Dict[str, Any]) -> None

Add a record to the store.

Appends a new record and invalidates the DataFrame cache (will be rebuilt on next access to df property).

Parameters:

Name Type Description Default
rec Dict[str, Any]

Dictionary representing a single record (row).

required
Example
store = RecordStore("/data/users.tsv")

store.add_rec({"user_id": 1, "name": "Alice", "active": True})
store.add_rec({"user_id": 2, "name": "Bob", "active": False})

store.save()  # Persist to disk
Source code in packages/structures/src/dataknobs_structures/record_store.py
def add_rec(self, rec: Dict[str, Any]) -> None:
    """Add a record to the store.

    Appends a new record and invalidates the DataFrame cache (will be
    rebuilt on next access to df property).

    Args:
        rec: Dictionary representing a single record (row).

    Example:
        ```python
        store = RecordStore("/data/users.tsv")

        store.add_rec({"user_id": 1, "name": "Alice", "active": True})
        store.add_rec({"user_id": 2, "name": "Bob", "active": False})

        store.save()  # Persist to disk
        ```
    """
    self._recs.append(rec)
    self._df = None
save
save() -> None

Save records to the backing file.

Writes the current records to disk as a TSV/CSV file. Does nothing if tsv_fpath is None (in-memory only mode).

Example
store = RecordStore("/data/results.tsv")
store.add_rec({"metric": "accuracy", "value": 0.95})
store.add_rec({"metric": "precision", "value": 0.92})

store.save()  # Writes to /data/results.tsv
Note

The file is written with headers and without row indices.

Source code in packages/structures/src/dataknobs_structures/record_store.py
def save(self) -> None:
    """Save records to the backing file.

    Writes the current records to disk as a TSV/CSV file. Does nothing if
    tsv_fpath is None (in-memory only mode).

    Example:
        ```python
        store = RecordStore("/data/results.tsv")
        store.add_rec({"metric": "accuracy", "value": 0.95})
        store.add_rec({"metric": "precision", "value": 0.92})

        store.save()  # Writes to /data/results.tsv
        ```

    Note:
        The file is written with headers and without row indices.
    """
    if self.tsv_fpath is not None and self.df is not None:
        self.df.to_csv(self.tsv_fpath, sep=self.sep, index=False)
restore
restore(df: DataFrame | None = None) -> None

Restore records from disk, discarding in-memory changes.

Reloads data from the backing file (if it exists) or from the provided DataFrame. All current in-memory changes are lost.

Parameters:

Name Type Description Default
df DataFrame | None

Optional DataFrame to restore from. If None, uses the backing file (if available) or the initial DataFrame.

None
Example
store = RecordStore("/data/records.tsv")
original_count = len(store.records)

# Make changes
store.add_rec({"new": "data"})
store.clear()

# Undo changes
store.restore()
print(len(store.records))  # Back to original_count
Note

If tsv_fpath is None and no df is provided, restores to the initial DataFrame or creates an empty store.

Source code in packages/structures/src/dataknobs_structures/record_store.py
def restore(self, df: pd.DataFrame | None = None) -> None:
    """Restore records from disk, discarding in-memory changes.

    Reloads data from the backing file (if it exists) or from the provided
    DataFrame. All current in-memory changes are lost.

    Args:
        df: Optional DataFrame to restore from. If None, uses the backing
            file (if available) or the initial DataFrame.

    Example:
        ```python
        store = RecordStore("/data/records.tsv")
        original_count = len(store.records)

        # Make changes
        store.add_rec({"new": "data"})
        store.clear()

        # Undo changes
        store.restore()
        print(len(store.records))  # Back to original_count
        ```

    Note:
        If tsv_fpath is None and no df is provided, restores to the initial
        DataFrame or creates an empty store.
    """
    self._init_data(df if df is not None else self.init_df)

Tree

Tree(data: Any, parent: Union[Tree, Any] = None, child_pos: int | None = None)

A tree node with arbitrary data and parent-child relationships.

Each Tree instance represents a node in a tree structure that maintains: - Arbitrary data of any type - An ordered list of child nodes - A reference to its parent node (None for root) - Bidirectional links between parents and children for efficient traversal

The tree structure supports both depth-first and breadth-first traversal, node searching, ancestor finding, and various tree manipulation operations. All nodes are doubly-linked, allowing traversal both up (to parents) and down (to children).

Attributes:

Name Type Description
data Any

The data contained in this node (any type).

children List[Tree] | None

List of child Tree nodes, or None if no children.

parent Tree | None

Parent Tree node, or None if this is a root node.

root Tree

The root node of this tree (traverses up to find it).

depth int

Number of hops from the root to this node (root has depth 0).

Example
# Create a tree structure
root = Tree("root")
child1 = Tree("child1", parent=root)
child2 = Tree("child2", parent=root)

# Add grandchildren
grandchild = Tree("grandchild", parent=child1)

# Navigate the tree
print(grandchild.root.data)  # "root"
print(grandchild.depth)      # 2
print(child1.num_children)   # 1
Note

Tree nodes can be created standalone and connected later, or created with a parent reference for immediate attachment. When a node is added as a child, it's automatically removed from any previous parent.

Initialize a tree node with optional parent attachment.

Creates a new tree node containing arbitrary data. If a parent is provided, the new node is automatically added as a child of that parent. If the parent is not already a Tree instance, it will be wrapped in one.

Parameters:

Name Type Description Default
data Any

The data to be contained within this node. Can be any type.

required
parent Union[Tree, Any]

Optional parent node to attach to. Can be a Tree instance or data that will be wrapped in a Tree. If None, creates a root node.

None
child_pos int | None

Optional 0-based position among parent's children. If None, appends to the end of the parent's children list.

None
Example
# Create root node
root = Tree("root")

# Add children
child1 = Tree("child1", parent=root)
child2 = Tree("child2", parent=root, child_pos=0)  # Insert at start

# Create subtree
grandchild = Tree("grandchild", parent=child1)

Methods:

Name Description
__repr__

Return string representation of this tree.

has_children

Check if this node has any children.

has_parent

Check if this node has a parent.

add_child

Add a child node to this node.

add_edge

Add a parent-child edge to this tree.

prune

Remove this node from its parent.

find_nodes

Find nodes matching a condition using depth-first or breadth-first search.

collect_terminal_nodes

Collect all leaf nodes (nodes with no children) in this tree.

get_edges

Get all parent-child edges in this tree.

get_path

Get the path from the root to this node.

is_ancestor

Check if this node is an ancestor of another node.

find_deepest_common_ancestor

Find the deepest (closest) common ancestor of this node and another.

as_string

Get a string representation of this tree.

get_deepest_left

Get the leftmost terminal descendant of this node.

get_deepest_right

Get the rightmost terminal descendant of this node.

build_dot

Build a Graphviz Digraph for visualizing this tree.

Source code in packages/structures/src/dataknobs_structures/tree.py
def __init__(
    self,
    data: Any,
    parent: Union[Tree, Any] = None,
    child_pos: int | None = None,
):
    """Initialize a tree node with optional parent attachment.

    Creates a new tree node containing arbitrary data. If a parent is provided,
    the new node is automatically added as a child of that parent. If the parent
    is not already a Tree instance, it will be wrapped in one.

    Args:
        data: The data to be contained within this node. Can be any type.
        parent: Optional parent node to attach to. Can be a Tree instance or
            data that will be wrapped in a Tree. If None, creates a root node.
        child_pos: Optional 0-based position among parent's children. If None,
            appends to the end of the parent's children list.

    Example:
        ```python
        # Create root node
        root = Tree("root")

        # Add children
        child1 = Tree("child1", parent=root)
        child2 = Tree("child2", parent=root, child_pos=0)  # Insert at start

        # Create subtree
        grandchild = Tree("grandchild", parent=child1)
        ```
    """
    self._data = data
    self._children: List[Tree] | None = None
    self._parent: Tree | None = None
    if parent is not None:
        if not isinstance(parent, Tree):
            parent = Tree(parent)
        parent.add_child(self, child_pos)
Attributes
data property writable
data: Any

The data contained in this node.

Returns:

Type Description
Any

The arbitrary data stored in this node.

children property
children: List[Tree] | None

This node's children as an ordered list.

Returns:

Type Description
List[Tree] | None

List of child Tree nodes, or None if this node has no children.

parent property writable
parent: Tree | None

This node's parent.

Returns:

Type Description
Tree | None

Parent Tree node, or None if this is a root node.

root property
root: Tree

The root node of this tree.

Traverses up the parent chain to find the root node.

Returns:

Type Description
Tree

The root Tree node (the node with no parent).

sibnum property
sibnum: int

This node's position among its siblings.

Returns:

Type Description
int

0-based index among parent's children, or 0 if this is a root node.

num_siblings property
num_siblings: int

Total number of siblings including this node.

Returns:

Type Description
int

Number of children the parent has (including this node), or 1 if root.

next_sibling property
next_sibling: Tree | None

The next sibling of this node.

Returns:

Type Description
Tree | None

Next sibling Tree node in parent's children list, or None if this is

Tree | None

the last child or a root node.

prev_sibling property
prev_sibling: Tree | None

The previous sibling of this node.

Returns:

Type Description
Tree | None

Previous sibling Tree node in parent's children list, or None if this

Tree | None

is the first child or a root node.

num_children property
num_children: int

Number of children under this node.

Returns:

Type Description
int

Count of direct children (0 if no children).

depth property
depth: int

Depth of this node in the tree.

The depth is the number of edges from the root to this node. The root node has depth 0, its children have depth 1, and so on.

Returns:

Type Description
int

Number of hops from the root to this node (0 for root).

Example
root = Tree("root")
child = Tree("child", parent=root)
grandchild = Tree("grandchild", parent=child)

print(root.depth)        # 0
print(child.depth)       # 1
print(grandchild.depth)  # 2
Functions
__repr__
__repr__() -> str

Return string representation of this tree.

Returns:

Type Description
str

Multi-line string representation using as_string() method.

Source code in packages/structures/src/dataknobs_structures/tree.py
def __repr__(self) -> str:
    """Return string representation of this tree.

    Returns:
        Multi-line string representation using as_string() method.
    """
    return self.as_string(delim="  ", multiline=True)
has_children
has_children() -> bool

Check if this node has any children.

Returns:

Type Description
bool

True if this node has at least one child, False otherwise.

Source code in packages/structures/src/dataknobs_structures/tree.py
def has_children(self) -> bool:
    """Check if this node has any children.

    Returns:
        True if this node has at least one child, False otherwise.
    """
    return self._children is not None and len(self._children) > 0
has_parent
has_parent() -> bool

Check if this node has a parent.

Returns:

Type Description
bool

True if this node has a parent, False if this is a root node.

Note

The root of a tree has no parent.

Source code in packages/structures/src/dataknobs_structures/tree.py
def has_parent(self) -> bool:
    """Check if this node has a parent.

    Returns:
        True if this node has a parent, False if this is a root node.

    Note:
        The root of a tree has no parent.
    """
    return self._parent is not None
add_child
add_child(node_or_data: Union[Tree, Any], child_pos: int | None = None) -> Tree

Add a child node to this node.

If the child is already part of another tree, it will be pruned from that tree first. If node_or_data is not a Tree instance, it will be wrapped in a new Tree node. The child's parent reference is automatically updated.

Parameters:

Name Type Description Default
node_or_data Union[Tree, Any]

The child to add. Can be a Tree instance or any data that will be wrapped in a Tree node.

required
child_pos int | None

Optional 0-based position to insert the child. If None, appends to the end. If provided, inserts at the specified position, shifting existing children.

None

Returns:

Type Description
Tree

The child Tree node (either the provided node or a newly created one).

Example
# Create parent and add children
parent = Tree("parent")
child1 = parent.add_child("child1")
child2 = parent.add_child("child2")

# Insert at specific position
child0 = parent.add_child("child0", child_pos=0)

# Method chaining
root = Tree("root").add_child("a").add_child("b")
Note

The child's parent attribute is automatically updated to reference this node, maintaining the tree's structural integrity.

Source code in packages/structures/src/dataknobs_structures/tree.py
def add_child(self, node_or_data: Union[Tree, Any], child_pos: int | None = None) -> Tree:
    """Add a child node to this node.

    If the child is already part of another tree, it will be pruned from that
    tree first. If node_or_data is not a Tree instance, it will be wrapped in
    a new Tree node. The child's parent reference is automatically updated.

    Args:
        node_or_data: The child to add. Can be a Tree instance or any data that
            will be wrapped in a Tree node.
        child_pos: Optional 0-based position to insert the child. If None,
            appends to the end. If provided, inserts at the specified position,
            shifting existing children.

    Returns:
        The child Tree node (either the provided node or a newly created one).

    Example:
        ```python
        # Create parent and add children
        parent = Tree("parent")
        child1 = parent.add_child("child1")
        child2 = parent.add_child("child2")

        # Insert at specific position
        child0 = parent.add_child("child0", child_pos=0)

        # Method chaining
        root = Tree("root").add_child("a").add_child("b")
        ```

    Note:
        The child's parent attribute is automatically updated to reference
        this node, maintaining the tree's structural integrity.
    """
    if self._children is None:
        self._children = []
    if isinstance(node_or_data, Tree):
        child = node_or_data
        child.prune()
        if child_pos is not None and child_pos < len(self._children) and child_pos >= 0:
            self._children.insert(child_pos, child)
        else:
            self._children.append(child)
    else:
        child = Tree(node_or_data, self, child_pos=child_pos)
    child.parent = self
    return child
add_edge
add_edge(
    parent_node_or_data: Union[Tree, Any], child_node_or_data: Union[Tree, Any]
) -> Tuple[Tree, Tree]

Add a parent-child edge to this tree.

Creates or reuses nodes to establish a parent-child relationship. If the parent or child already exist in the tree (matched by data equality), they are reused. If the child exists elsewhere, it's moved to be a child of the parent. If neither exist, the parent is added as a child of this node.

Parameters:

Name Type Description Default
parent_node_or_data Union[Tree, Any]

The parent node (Tree instance) or data to match/create.

required
child_node_or_data Union[Tree, Any]

The child node (Tree instance) or data to match/create.

required

Returns:

Type Description
Tuple[Tree, Tree]

Tuple of (parent_node, child_node) Tree instances.

Example
# Build tree by adding edges
root = Tree("root")
root.add_edge("parent1", "child1")
root.add_edge("parent1", "child2")  # Reuses existing parent1
root.add_edge("child1", "grandchild")

# Result:
# root
#   parent1
#     child1
#       grandchild
#     child2
Note

Nodes are matched by data equality. If a node with matching data already exists in the tree, it will be reused rather than creating a duplicate.

Source code in packages/structures/src/dataknobs_structures/tree.py
def add_edge(
    self,
    parent_node_or_data: Union[Tree, Any],
    child_node_or_data: Union[Tree, Any],
) -> Tuple[Tree, Tree]:
    """Add a parent-child edge to this tree.

    Creates or reuses nodes to establish a parent-child relationship. If the
    parent or child already exist in the tree (matched by data equality), they
    are reused. If the child exists elsewhere, it's moved to be a child of the
    parent. If neither exist, the parent is added as a child of this node.

    Args:
        parent_node_or_data: The parent node (Tree instance) or data to match/create.
        child_node_or_data: The child node (Tree instance) or data to match/create.

    Returns:
        Tuple of (parent_node, child_node) Tree instances.

    Example:
        ```python
        # Build tree by adding edges
        root = Tree("root")
        root.add_edge("parent1", "child1")
        root.add_edge("parent1", "child2")  # Reuses existing parent1
        root.add_edge("child1", "grandchild")

        # Result:
        # root
        #   parent1
        #     child1
        #       grandchild
        #     child2
        ```

    Note:
        Nodes are matched by data equality. If a node with matching data
        already exists in the tree, it will be reused rather than creating
        a duplicate.
    """
    parent = None
    child = None

    if isinstance(parent_node_or_data, Tree):
        parent = parent_node_or_data
        # if it is not in this tree ...
        if (
            len(
                self.find_nodes(lambda node: node == parent, include_self=True, only_first=True)
            )
            == 0
        ):
            # ...then add it as a child of self
            self.add_child(parent)
    else:
        # can we find the data in this tree ...
        found = self.find_nodes(
            lambda node: node.data == parent_node_or_data, include_self=True, only_first=True
        )
        if len(found) > 0:
            parent = found[0]
        else:
            parent = self.add_child(parent_node_or_data)

    if isinstance(child_node_or_data, Tree):
        child = parent.add_child(child_node_or_data)
    else:
        # can we find the data in this tree ...
        found = self.find_nodes(
            lambda node: node.data == child_node_or_data, include_self=True, only_first=True
        )
        if len(found) > 0:
            child = parent.add_child(found[0])
        else:
            child = parent.add_child(child_node_or_data)

    return (parent, child)
prune
prune() -> Tree | None

Remove this node from its parent.

Detaches this node from its parent's children list and sets this node's parent to None. The node and its subtree remain intact.

Returns:

Type Description
Tree | None

This node's former parent, or None if this was already a root node.

Example
root = Tree("root")
child = root.add_child("child")
grandchild = child.add_child("grandchild")

# Prune child (and its subtree) from root
former_parent = child.prune()
print(former_parent.data)  # "root"
print(child.parent)        # None
print(child.children)      # [grandchild] - subtree intact
Source code in packages/structures/src/dataknobs_structures/tree.py
def prune(self) -> Tree | None:
    """Remove this node from its parent.

    Detaches this node from its parent's children list and sets this node's
    parent to None. The node and its subtree remain intact.

    Returns:
        This node's former parent, or None if this was already a root node.

    Example:
        ```python
        root = Tree("root")
        child = root.add_child("child")
        grandchild = child.add_child("grandchild")

        # Prune child (and its subtree) from root
        former_parent = child.prune()
        print(former_parent.data)  # "root"
        print(child.parent)        # None
        print(child.children)      # [grandchild] - subtree intact
        ```
    """
    result = self._parent
    if self._parent is not None:
        if self._parent.children is not None:
            self._parent.children.remove(self)
        self._parent = None
    return result
find_nodes
find_nodes(
    accept_node_fn: Callable[[Tree], bool],
    traversal: str = "dfs",
    include_self: bool = True,
    only_first: bool = False,
    highest_only: bool = False,
) -> List[Tree]

Find nodes matching a condition using depth-first or breadth-first search.

Searches the tree using the specified traversal strategy and returns all nodes for which the accept function returns True.

Parameters:

Name Type Description Default
accept_node_fn Callable[[Tree], bool]

Function that takes a Tree node and returns True to include it in results, False to skip it.

required
traversal str

Search strategy - either 'dfs' (depth-first) or 'bfs' (breadth-first). Defaults to 'dfs'.

'dfs'
include_self bool

If True, considers this node in the search. If False, starts with this node's children. Defaults to True.

True
only_first bool

If True, stops after finding the first match. Defaults to False.

False
highest_only bool

If True, doesn't search below matched nodes (stops descending when a match is found). Defaults to False.

False

Returns:

Type Description
List[Tree]

List of Tree nodes that matched the accept function.

Example
root = Tree("root")
root.add_child("apple")
root.add_child("banana")
child = root.add_child("parent")
child.add_child("apricot")

# Find all nodes containing 'a'
found = root.find_nodes(lambda n: 'a' in str(n.data))
# Returns: [root, apple, banana, parent, apricot]

# Find first node containing 'a'
first = root.find_nodes(
    lambda n: 'a' in str(n.data),
    only_first=True
)
# Returns: [root]

# Breadth-first search
bfs = root.find_nodes(lambda n: 'a' in str(n.data), traversal='bfs')
Source code in packages/structures/src/dataknobs_structures/tree.py
def find_nodes(
    self,
    accept_node_fn: Callable[[Tree], bool],
    traversal: str = "dfs",
    include_self: bool = True,
    only_first: bool = False,
    highest_only: bool = False,
) -> List[Tree]:
    """Find nodes matching a condition using depth-first or breadth-first search.

    Searches the tree using the specified traversal strategy and returns all
    nodes for which the accept function returns True.

    Args:
        accept_node_fn: Function that takes a Tree node and returns True to
            include it in results, False to skip it.
        traversal: Search strategy - either 'dfs' (depth-first) or 'bfs'
            (breadth-first). Defaults to 'dfs'.
        include_self: If True, considers this node in the search. If False,
            starts with this node's children. Defaults to True.
        only_first: If True, stops after finding the first match. Defaults
            to False.
        highest_only: If True, doesn't search below matched nodes (stops
            descending when a match is found). Defaults to False.

    Returns:
        List of Tree nodes that matched the accept function.

    Example:
        ```python
        root = Tree("root")
        root.add_child("apple")
        root.add_child("banana")
        child = root.add_child("parent")
        child.add_child("apricot")

        # Find all nodes containing 'a'
        found = root.find_nodes(lambda n: 'a' in str(n.data))
        # Returns: [root, apple, banana, parent, apricot]

        # Find first node containing 'a'
        first = root.find_nodes(
            lambda n: 'a' in str(n.data),
            only_first=True
        )
        # Returns: [root]

        # Breadth-first search
        bfs = root.find_nodes(lambda n: 'a' in str(n.data), traversal='bfs')
        ```
    """
    queue: Deque[Tree] = deque()
    found: List[Tree] = []
    if include_self:
        queue.append(self)
    elif self.children:
        queue.extend(self.children)
    while bool(queue):  # true while length(queue) > 0
        item = queue.popleft()
        if accept_node_fn(item):
            found.append(item)
            if only_first:
                break
            elif highest_only:
                continue
        if item.children:
            if traversal == "dfs":
                queue.extendleft(reversed(item.children))
            elif traversal == "bfs":
                queue.extend(item.children)
    return found
collect_terminal_nodes
collect_terminal_nodes(
    accept_node_fn: Callable[[Tree], bool] | None = None,
    _found: List[Tree] | None = None,
) -> List[Tree]

Collect all leaf nodes (nodes with no children) in this tree.

Recursively traverses the tree and collects nodes that have no children, optionally filtering them with an accept function.

Parameters:

Name Type Description Default
accept_node_fn Callable[[Tree], bool] | None

Optional function to filter which terminal nodes to include. If None, includes all terminal nodes.

None
_found List[Tree] | None

Internal parameter for recursion. Do not use.

None

Returns:

Type Description
List[Tree]

List of terminal (leaf) Tree nodes.

Example
root = Tree("root")
child1 = root.add_child("child1")
child1.add_child("leaf1")
child1.add_child("leaf2")
root.add_child("leaf3")

# Collect all terminal nodes
leaves = root.collect_terminal_nodes()
# Returns: [leaf1, leaf2, leaf3]

# Collect only specific leaves
filtered = root.collect_terminal_nodes(
    lambda n: "1" in str(n.data)
)
# Returns: [leaf1]
Source code in packages/structures/src/dataknobs_structures/tree.py
def collect_terminal_nodes(
    self, accept_node_fn: Callable[[Tree], bool] | None = None, _found: List[Tree] | None = None
) -> List[Tree]:
    """Collect all leaf nodes (nodes with no children) in this tree.

    Recursively traverses the tree and collects nodes that have no children,
    optionally filtering them with an accept function.

    Args:
        accept_node_fn: Optional function to filter which terminal nodes to
            include. If None, includes all terminal nodes.
        _found: Internal parameter for recursion. Do not use.

    Returns:
        List of terminal (leaf) Tree nodes.

    Example:
        ```python
        root = Tree("root")
        child1 = root.add_child("child1")
        child1.add_child("leaf1")
        child1.add_child("leaf2")
        root.add_child("leaf3")

        # Collect all terminal nodes
        leaves = root.collect_terminal_nodes()
        # Returns: [leaf1, leaf2, leaf3]

        # Collect only specific leaves
        filtered = root.collect_terminal_nodes(
            lambda n: "1" in str(n.data)
        )
        # Returns: [leaf1]
        ```
    """
    if _found is None:
        _found = []
    if not self._children:
        if accept_node_fn is None or accept_node_fn(self):
            _found.append(self)
    else:
        for child in self._children:
            child.collect_terminal_nodes(accept_node_fn=accept_node_fn, _found=_found)
    return _found
get_edges
get_edges(
    traversal: str = "bfs", include_self: bool = True, as_data: bool = True
) -> List[Tuple[Union[Tree, Any], Union[Tree, Any]]]

Get all parent-child edges in this tree.

Returns a list of (parent, child) tuples representing all edges in the tree, using either depth-first or breadth-first traversal.

Parameters:

Name Type Description Default
traversal str

Search strategy - either 'dfs' (depth-first) or 'bfs' (breadth-first). Defaults to 'bfs'.

'bfs'
include_self bool

If True, includes edges from this node to its children. If False, starts with this node's children. Defaults to True.

True
as_data bool

If True, returns data values in tuples. If False, returns Tree node instances. Defaults to True.

True

Returns:

Type Description
List[Tuple[Union[Tree, Any], Union[Tree, Any]]]

List of (parent, child) tuples, either as data values or Tree nodes.

Example
root = Tree("root")
child1 = root.add_child("child1")
child2 = root.add_child("child2")
child1.add_child("grandchild")

# Get edges as data
edges = root.get_edges()
# Returns: [("root", "child1"), ("root", "child2"),
#           ("child1", "grandchild")]

# Get edges as Tree nodes
edges_nodes = root.get_edges(as_data=False)
# Returns: [(root_node, child1_node), ...]
Source code in packages/structures/src/dataknobs_structures/tree.py
def get_edges(
    self,
    traversal: str = "bfs",
    include_self: bool = True,
    as_data: bool = True,
) -> List[Tuple[Union[Tree, Any], Union[Tree, Any]]]:
    """Get all parent-child edges in this tree.

    Returns a list of (parent, child) tuples representing all edges in the tree,
    using either depth-first or breadth-first traversal.

    Args:
        traversal: Search strategy - either 'dfs' (depth-first) or 'bfs'
            (breadth-first). Defaults to 'bfs'.
        include_self: If True, includes edges from this node to its children.
            If False, starts with this node's children. Defaults to True.
        as_data: If True, returns data values in tuples. If False, returns
            Tree node instances. Defaults to True.

    Returns:
        List of (parent, child) tuples, either as data values or Tree nodes.

    Example:
        ```python
        root = Tree("root")
        child1 = root.add_child("child1")
        child2 = root.add_child("child2")
        child1.add_child("grandchild")

        # Get edges as data
        edges = root.get_edges()
        # Returns: [("root", "child1"), ("root", "child2"),
        #           ("child1", "grandchild")]

        # Get edges as Tree nodes
        edges_nodes = root.get_edges(as_data=False)
        # Returns: [(root_node, child1_node), ...]
        ```
    """
    queue: Deque[Tree] = deque()
    result: List[Tuple[Union[Tree, Any], Union[Tree, Any]]] = []
    if self.children:
        queue.extend(self.children)
    while bool(queue):  # true while length(queue) > 0
        item = queue.popleft()
        if item.parent:
            if item.parent != self or include_self:
                result.append((item.parent.data, item.data) if as_data else (item.parent, item))
        if item.children:
            if traversal == "dfs":
                queue.extendleft(reversed(item.children))
            elif traversal == "bfs":
                queue.extend(item.children)
    return result
get_path
get_path() -> List[Tree]

Get the path from the root to this node.

Returns:

Type Description
List[Tree]

Ordered list of Tree nodes from root to this node (inclusive).

Example
root = Tree("root")
child = Tree("child", parent=root)
grandchild = Tree("grandchild", parent=child)

path = grandchild.get_path()
print([n.data for n in path])  # ["root", "child", "grandchild"]
Source code in packages/structures/src/dataknobs_structures/tree.py
def get_path(self) -> List[Tree]:
    """Get the path from the root to this node.

    Returns:
        Ordered list of Tree nodes from root to this node (inclusive).

    Example:
        ```python
        root = Tree("root")
        child = Tree("child", parent=root)
        grandchild = Tree("grandchild", parent=child)

        path = grandchild.get_path()
        print([n.data for n in path])  # ["root", "child", "grandchild"]
        ```
    """
    path: Deque[Tree] = deque()
    node: Tree | None = self
    while node is not None:
        path.appendleft(node)
        node = node.parent
    return list(path)
is_ancestor
is_ancestor(other: Tree, self_is_ancestor: bool = False) -> bool

Check if this node is an ancestor of another node.

An ancestor is any node on the path from another node to the root.

Parameters:

Name Type Description Default
other Tree

The potential descendant node to check.

required
self_is_ancestor bool

If True, considers a node to be its own ancestor. Defaults to False.

False

Returns:

Type Description
bool

True if this node is an ancestor of the other node, False otherwise.

Example
root = Tree("root")
child = Tree("child", parent=root)
grandchild = Tree("grandchild", parent=child)

print(root.is_ancestor(grandchild))         # True
print(child.is_ancestor(grandchild))        # True
print(grandchild.is_ancestor(root))         # False
print(grandchild.is_ancestor(grandchild))   # False
print(grandchild.is_ancestor(grandchild, self_is_ancestor=True))  # True
Source code in packages/structures/src/dataknobs_structures/tree.py
def is_ancestor(self, other: Tree, self_is_ancestor: bool = False) -> bool:
    """Check if this node is an ancestor of another node.

    An ancestor is any node on the path from another node to the root.

    Args:
        other: The potential descendant node to check.
        self_is_ancestor: If True, considers a node to be its own ancestor.
            Defaults to False.

    Returns:
        True if this node is an ancestor of the other node, False otherwise.

    Example:
        ```python
        root = Tree("root")
        child = Tree("child", parent=root)
        grandchild = Tree("grandchild", parent=child)

        print(root.is_ancestor(grandchild))         # True
        print(child.is_ancestor(grandchild))        # True
        print(grandchild.is_ancestor(root))         # False
        print(grandchild.is_ancestor(grandchild))   # False
        print(grandchild.is_ancestor(grandchild, self_is_ancestor=True))  # True
        ```
    """
    result = False
    parent = other if self_is_ancestor else other.parent
    while parent is not None:
        if parent == self:
            result = True
            break
        parent = parent.parent
    return result
find_deepest_common_ancestor
find_deepest_common_ancestor(other: Tree | None) -> Tree | None

Find the deepest (closest) common ancestor of this node and another.

The deepest common ancestor is the lowest node that is an ancestor of both nodes. This is also known as the lowest common ancestor (LCA).

Parameters:

Name Type Description Default
other Tree | None

The other node to find a common ancestor with. Can be None.

required

Returns:

Type Description
Tree | None

The deepest common ancestor Tree node, or None if other is None or

Tree | None

if the nodes are not in the same tree.

Example
root = Tree("root")
left = root.add_child("left")
right = root.add_child("right")
left_child = left.add_child("left_child")
right_child = right.add_child("right_child")

# Find common ancestor
ancestor = left_child.find_deepest_common_ancestor(right_child)
print(ancestor.data)  # "root"

# Nodes in same branch
ancestor = left_child.find_deepest_common_ancestor(left)
print(ancestor.data)  # "left"

# Same node
ancestor = left_child.find_deepest_common_ancestor(left_child)
print(ancestor.data)  # "left_child"
Source code in packages/structures/src/dataknobs_structures/tree.py
def find_deepest_common_ancestor(self, other: Tree | None) -> Tree | None:
    """Find the deepest (closest) common ancestor of this node and another.

    The deepest common ancestor is the lowest node that is an ancestor of
    both nodes. This is also known as the lowest common ancestor (LCA).

    Args:
        other: The other node to find a common ancestor with. Can be None.

    Returns:
        The deepest common ancestor Tree node, or None if other is None or
        if the nodes are not in the same tree.

    Example:
        ```python
        root = Tree("root")
        left = root.add_child("left")
        right = root.add_child("right")
        left_child = left.add_child("left_child")
        right_child = right.add_child("right_child")

        # Find common ancestor
        ancestor = left_child.find_deepest_common_ancestor(right_child)
        print(ancestor.data)  # "root"

        # Nodes in same branch
        ancestor = left_child.find_deepest_common_ancestor(left)
        print(ancestor.data)  # "left"

        # Same node
        ancestor = left_child.find_deepest_common_ancestor(left_child)
        print(ancestor.data)  # "left_child"
        ```
    """
    if other is None:
        return None
    if self == other:
        return self
    result: Tree | None = None
    mypath, otherpath = self.get_path(), other.get_path()
    mypathlen, otherpathlen = len(mypath), len(otherpath)
    mypathidx, otherpathidx = 0, 0
    while mypathidx < mypathlen and otherpathidx < otherpathlen:
        mynode, othernode = mypath[mypathidx], otherpath[otherpathidx]
        mypathidx += 1
        otherpathidx += 1
        if mynode != othernode:
            break  # diverged
        else:
            result = mynode
    return result
as_string
as_string(delim: str = ' ', multiline: bool = False) -> str

Get a string representation of this tree.

Creates a parenthesized string representation of the tree structure, with optional multiline formatting and custom indentation.

Parameters:

Name Type Description Default
delim str

The delimiter/indentation to use between levels. Defaults to a single space.

' '
multiline bool

If True, includes newlines for readability. If False, creates a compact single-line representation. Defaults to False.

False

Returns:

Type Description
str

String representation of this tree and its descendants.

Example
root = Tree("root")
child1 = root.add_child("child1")
child1.add_child("leaf1")
root.add_child("child2")

# Compact representation
print(root.as_string())
# Output: (root child1 leaf1 child2)

# Multiline with indentation
print(root.as_string(delim="  ", multiline=True))
# Output:
# (root
#   child1
#     leaf1
#   child2)
Source code in packages/structures/src/dataknobs_structures/tree.py
def as_string(self, delim: str = " ", multiline: bool = False) -> str:
    """Get a string representation of this tree.

    Creates a parenthesized string representation of the tree structure,
    with optional multiline formatting and custom indentation.

    Args:
        delim: The delimiter/indentation to use between levels. Defaults to
            a single space.
        multiline: If True, includes newlines for readability. If False,
            creates a compact single-line representation. Defaults to False.

    Returns:
        String representation of this tree and its descendants.

    Example:
        ```python
        root = Tree("root")
        child1 = root.add_child("child1")
        child1.add_child("leaf1")
        root.add_child("child2")

        # Compact representation
        print(root.as_string())
        # Output: (root child1 leaf1 child2)

        # Multiline with indentation
        print(root.as_string(delim="  ", multiline=True))
        # Output:
        # (root
        #   child1
        #     leaf1
        #   child2)
        ```
    """
    result = ""
    if self._children:
        btwn = "\n" if multiline else ""
        result = "(" + str(self.data)
        for child in self._children:
            d = (child.depth if multiline else 1) * delim
            result += btwn + d + child.as_string(delim=delim, multiline=multiline)
        result += ")"
    else:
        result = str(self.data)
    return result
get_deepest_left
get_deepest_left() -> Tree

Get the leftmost terminal descendant of this node.

Follows the leftmost (first) child at each level until reaching a leaf node.

Returns:

Type Description
Tree

The terminal (leaf) node reached by always taking the first child.

Example
root = Tree("root")
left = root.add_child("left")
left.add_child("left_left")
root.add_child("right")

leftmost = root.get_deepest_left()
print(leftmost.data)  # "left_left"
Source code in packages/structures/src/dataknobs_structures/tree.py
def get_deepest_left(self) -> Tree:
    """Get the leftmost terminal descendant of this node.

    Follows the leftmost (first) child at each level until reaching a leaf node.

    Returns:
        The terminal (leaf) node reached by always taking the first child.

    Example:
        ```python
        root = Tree("root")
        left = root.add_child("left")
        left.add_child("left_left")
        root.add_child("right")

        leftmost = root.get_deepest_left()
        print(leftmost.data)  # "left_left"
        ```
    """
    node = self
    while node.has_children() and node.children is not None:
        node = node.children[0]
    return node
get_deepest_right
get_deepest_right() -> Tree

Get the rightmost terminal descendant of this node.

Follows the rightmost (last) child at each level until reaching a leaf node.

Returns:

Type Description
Tree

The terminal (leaf) node reached by always taking the last child.

Example
root = Tree("root")
root.add_child("left")
right = root.add_child("right")
right.add_child("right_right")

rightmost = root.get_deepest_right()
print(rightmost.data)  # "right_right"
Source code in packages/structures/src/dataknobs_structures/tree.py
def get_deepest_right(self) -> Tree:
    """Get the rightmost terminal descendant of this node.

    Follows the rightmost (last) child at each level until reaching a leaf node.

    Returns:
        The terminal (leaf) node reached by always taking the last child.

    Example:
        ```python
        root = Tree("root")
        root.add_child("left")
        right = root.add_child("right")
        right.add_child("right_right")

        rightmost = root.get_deepest_right()
        print(rightmost.data)  # "right_right"
        ```
    """
    node = self
    while node.has_children() and node.children is not None:
        node = node.children[-1]
    return node
build_dot
build_dot(
    node_name_fn: Callable[[Tree], str] | None = None, **kwargs: Any
) -> graphviz.graphs.Digraph

Build a Graphviz Digraph for visualizing this tree.

Creates a directed graph representation of the tree structure that can be rendered to various formats (PNG, PDF, SVG, etc.) using Graphviz.

Parameters:

Name Type Description Default
node_name_fn Callable[[Tree], str] | None

Optional function to generate node labels. Takes a Tree node and returns a string. If None, uses str(node.data).

None
**kwargs Any

Additional keyword arguments passed to graphviz.Digraph constructor (e.g., name, format, node_attr, edge_attr).

{}

Returns:

Type Description
Digraph

A graphviz.Digraph object representing this tree.

Example
root = Tree("root")
child1 = root.add_child("child1")
child2 = root.add_child("child2")

# Basic usage
dot = root.build_dot(name='MyTree', format='png')
print(dot.source)  # View the DOT source

# Render to file
dot.render('/tmp/tree', format='png')

# Custom node labels
dot = root.build_dot(
    node_name_fn=lambda n: f"[{n.data}]",
    node_attr={'shape': 'box', 'style': 'filled', 'fillcolor': 'lightblue'}
)

# Display in Jupyter
from IPython.display import Image
Image(filename=dot.render('/tmp/tree'))
Note

Requires the graphviz package and Graphviz system installation.

Source code in packages/structures/src/dataknobs_structures/tree.py
def build_dot(
    self, node_name_fn: Callable[[Tree], str] | None = None, **kwargs: Any
) -> graphviz.graphs.Digraph:
    """Build a Graphviz Digraph for visualizing this tree.

    Creates a directed graph representation of the tree structure that can be
    rendered to various formats (PNG, PDF, SVG, etc.) using Graphviz.

    Args:
        node_name_fn: Optional function to generate node labels. Takes a Tree
            node and returns a string. If None, uses str(node.data).
        **kwargs: Additional keyword arguments passed to graphviz.Digraph
            constructor (e.g., name, format, node_attr, edge_attr).

    Returns:
        A graphviz.Digraph object representing this tree.

    Example:
        ```python
        root = Tree("root")
        child1 = root.add_child("child1")
        child2 = root.add_child("child2")

        # Basic usage
        dot = root.build_dot(name='MyTree', format='png')
        print(dot.source)  # View the DOT source

        # Render to file
        dot.render('/tmp/tree', format='png')

        # Custom node labels
        dot = root.build_dot(
            node_name_fn=lambda n: f"[{n.data}]",
            node_attr={'shape': 'box', 'style': 'filled', 'fillcolor': 'lightblue'}
        )

        # Display in Jupyter
        from IPython.display import Image
        Image(filename=dot.render('/tmp/tree'))
        ```

    Note:
        Requires the graphviz package and Graphviz system installation.
    """
    if node_name_fn is None:
        def node_name_fn(n):
            return str(n.data)
    dot = graphviz.Digraph(**kwargs)
    ids = {}  # ids[node] -> id
    for idx, node in enumerate(self.root.find_nodes(lambda _n: True, traversal="bfs")):
        ids[node] = idx
        dot.node(f"N_{idx:03}", node_name_fn(node))
    for node1, node2 in self.get_edges(as_data=False):
        idx1 = ids[node1]
        idx2 = ids[node2]
        dot.edge(f"N_{idx1:03}", f"N_{idx2:03}")
    return dot

Functions

build_tree_from_string

build_tree_from_string(from_string: str) -> Tree

Build a Tree from a parenthesized string representation.

Parses a string representation of a tree (such as produced by Tree.as_string()) and reconstructs the tree structure. The string format uses parentheses to indicate parent-child relationships.

Parameters:

Name Type Description Default
from_string str

The tree string in parenthesized format, e.g., "(root child1 child2)".

required

Returns:

Type Description
Tree

The reconstructed Tree with all nodes and structure preserved.

Example
# Build from string
tree_str = "(root (child1 leaf1 leaf2) child2)"
tree = build_tree_from_string(tree_str)

print(tree.data)              # "root"
print(tree.num_children)      # 2
print(tree.children[0].data)  # "child1"

# Roundtrip: tree -> string -> tree
original = Tree("root")
original.add_child("child")
tree_str = original.as_string()
reconstructed = build_tree_from_string(tree_str)
Source code in packages/structures/src/dataknobs_structures/tree.py
def build_tree_from_string(from_string: str) -> Tree:
    """Build a Tree from a parenthesized string representation.

    Parses a string representation of a tree (such as produced by Tree.as_string())
    and reconstructs the tree structure. The string format uses parentheses to
    indicate parent-child relationships.

    Args:
        from_string: The tree string in parenthesized format, e.g., "(root child1 child2)".

    Returns:
        The reconstructed Tree with all nodes and structure preserved.

    Example:
        ```python
        # Build from string
        tree_str = "(root (child1 leaf1 leaf2) child2)"
        tree = build_tree_from_string(tree_str)

        print(tree.data)              # "root"
        print(tree.num_children)      # 2
        print(tree.children[0].data)  # "child1"

        # Roundtrip: tree -> string -> tree
        original = Tree("root")
        original.add_child("child")
        tree_str = original.as_string()
        reconstructed = build_tree_from_string(tree_str)
        ```
    """
    if not from_string.strip().startswith("("):
        return Tree(from_string)
    data = OneOrMore(nested_expr()).parse_string(from_string)
    return build_tree_from_list(data.as_list())