dataknobs-structures Complete API Reference¶
Complete auto-generated API documentation from source code docstrings.
💡 Also see: - Curated Guide - Hand-crafted tutorials and examples - Package Overview - Introduction and getting started - Source Code - View on GitHub
dataknobs_structures ¶
Core data structures for AI knowledge bases and data processing.
The dataknobs-structures package provides fundamental data structures for building AI applications, knowledge bases, and data processing pipelines. It includes tree structures, document containers, record stores, and conditional dictionaries.
Modules¶
Tree - Hierarchical data structures¶
The Tree class provides a flexible node-based tree structure with: - Parent-child relationships with bidirectional links - Depth-first and breadth-first traversal - Node search and filtering - Path finding and common ancestor detection - Graphviz visualization support
Document - Text with metadata¶
Classes for managing text documents with associated metadata: - Text: Container combining content and metadata - TextMetaData: Structured metadata with IDs and labels - MetaData: Generic key-value metadata container
RecordStore - Tabular data management¶
A flexible record store that represents data as: - List of dictionaries (Python native) - pandas DataFrame (data analysis) - TSV/CSV files (disk persistence)
cdict - Conditional dictionary¶
A dictionary that validates items before acceptance using the strategy pattern. Rejected items are tracked separately for inspection.
Quick Examples¶
Using Tree¶
from dataknobs_structures import Tree
# Build tree structure
root = Tree("root")
child1 = root.add_child("child1")
child2 = root.add_child("child2")
grandchild = child1.add_child("grandchild")
# Search and traverse
found = root.find_nodes(lambda n: "child" in str(n.data))
edges = root.get_edges() # All parent-child pairs
Using Text with metadata¶
from dataknobs_structures import Text, TextMetaData
# Create document with metadata
metadata = TextMetaData(
text_id="doc_001",
text_label="article",
author="Alice",
category="technology"
)
doc = Text("This is the document content...", metadata)
print(doc.text_id) # "doc_001"
print(doc.text_label) # "article"
print(doc.metadata.get_value("author")) # "Alice"
Using RecordStore¶
from dataknobs_structures import RecordStore
# Create store with disk backing
store = RecordStore("/data/users.tsv")
# Add records
store.add_rec({"id": 1, "name": "Alice", "age": 30})
store.add_rec({"id": 2, "name": "Bob", "age": 25})
# Access as DataFrame or list
df = store.df
records = store.records
# Persist changes
store.save()
Using cdict¶
from dataknobs_structures import cdict
# Only accept positive integers
positive = cdict(lambda d, k, v: isinstance(v, int) and v > 0)
positive['a'] = 5 # Accepted
positive['b'] = -1 # Rejected
print(positive) # {'a': 5}
print(positive.rejected) # {'b': -1}
Design Philosophy¶
The structures in this package are designed to be: 1. Simple - Easy to understand and use with minimal boilerplate 2. Flexible - Support multiple representations and use cases 3. Composable - Work well together in larger systems 4. Practical - Solve real problems in AI and data workflows
Installation¶
For more detailed documentation, see the individual class and function docstrings.
Modules:
| Name | Description |
|---|---|
conditional_dict |
Conditional dictionary with validation using the strategy pattern. |
document |
Text and metadata containers for document processing. |
record_store |
Record storage with in-memory and disk persistence. |
tree |
Tree data structure with parent-child relationships and traversal methods. |
Classes:
| Name | Description |
|---|---|
cdict |
Dictionary that validates key-value pairs before acceptance. |
Text |
Container combining text content with metadata. |
TextMetaData |
Specialized metadata container for text documents. |
RecordStore |
Container for tabular records with memory and disk representations. |
Tree |
A tree node with arbitrary data and parent-child relationships. |
Functions:
| Name | Description |
|---|---|
build_tree_from_string |
Build a Tree from a parenthesized string representation. |
Classes¶
cdict ¶
Bases: dict
Dictionary that validates key-value pairs before acceptance.
A dictionary subclass that applies a validation function to each item before allowing it to be set. Items that fail validation are stored in a separate "rejected" dictionary rather than being added to the main dictionary.
This uses the strategy pattern where the validation logic is provided as a function at initialization time, allowing flexible and reusable validation rules.
Attributes:
| Name | Type | Description |
|---|---|---|
rejected |
Dict
|
Dictionary containing key-value pairs that failed validation. |
accept_fn |
The validation function used to check items. |
Example
# Only accept string keys
str_keys = cdict(lambda d, k, v: isinstance(k, str))
str_keys['valid'] = 1 # Accepted
str_keys[123] = 2 # Rejected
# Only accept even numbers
evens = cdict(lambda d, k, v: isinstance(v, int) and v % 2 == 0)
evens['a'] = 2 # Accepted
evens['b'] = 3 # Rejected
evens['c'] = 4 # Accepted
print(evens) # {'a': 2, 'c': 4}
print(evens.rejected) # {'b': 3}
# Prevent duplicates using dict state
no_dups = cdict(lambda d, k, v: k not in d)
no_dups['x'] = 1
no_dups['x'] = 2 # Rejected (key already exists)
Initialize conditional dictionary with validation function.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
accept_fn
|
Callable[[Dict, Any, Any], bool]
|
Validation function with signature (dict, key, value) -> bool. Returns True to accept the key-value pair, False to reject it. The dict parameter is the current state of this dictionary. |
required |
*args
|
Any
|
Initial items as a mapping or iterable of key-value pairs. |
()
|
**kwargs
|
Any
|
Additional initial items as keyword arguments. |
{}
|
Example
Methods:
| Name | Description |
|---|---|
__setitem__ |
Set an item if it passes validation, otherwise reject it. |
setdefault |
Set default value if key doesn't exist and passes validation. |
update |
Update dictionary with key-value pairs, validating each. |
Source code in packages/structures/src/dataknobs_structures/conditional_dict.py
Attributes¶
rejected
property
¶
Functions¶
__setitem__ ¶
Set an item if it passes validation, otherwise reject it.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
key
|
Any
|
The dictionary key. |
required |
value
|
Any
|
The value to set. |
required |
Example
Source code in packages/structures/src/dataknobs_structures/conditional_dict.py
setdefault ¶
Set default value if key doesn't exist and passes validation.
If the key exists, returns its value. If not, validates the default value and sets it if accepted (or rejects it if not).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
key
|
Any
|
The dictionary key. |
required |
default
|
Any
|
Default value to set if key doesn't exist. Defaults to None. |
None
|
Returns:
| Type | Description |
|---|---|
Any
|
The existing value if key exists, the default value if set, or None |
Any
|
if the default was rejected. |
Example
Source code in packages/structures/src/dataknobs_structures/conditional_dict.py
update ¶
Update dictionary with key-value pairs, validating each.
Accepts either a mapping object, an iterable of key-value pairs, or keyword arguments. Each item is validated and added or rejected individually.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
*args
|
Any
|
Mapping or iterable of (key, value) pairs. |
()
|
**kwargs
|
Any
|
Key-value pairs as keyword arguments. |
{}
|
Example
d = cdict(lambda _, k, v: isinstance(v, int))
# Update from dict
d.update({'a': 1, 'b': 'text', 'c': 3})
print(d) # {'a': 1, 'c': 3}
print(d.rejected) # {'b': 'text'}
# Update from kwargs
d.update(x=5, y='invalid')
print(d) # {'a': 1, 'c': 3, 'x': 5}
print(d.rejected) # {'b': 'text', 'y': 'invalid'}
Source code in packages/structures/src/dataknobs_structures/conditional_dict.py
Text ¶
Container combining text content with metadata.
Wraps a text string with associated metadata for document processing, text analysis, and NLP pipelines. Provides convenient access to both the text content and its metadata attributes.
Attributes:
| Name | Type | Description |
|---|---|---|
text |
str
|
The text content string. |
text_id |
Any
|
Unique identifier from metadata. |
text_label |
str
|
Label/category from metadata. |
metadata |
TextMetaData
|
Full TextMetaData object. |
Example
# Create text with metadata
metadata = TextMetaData(
text_id="email_001",
text_label="customer_inquiry",
sender="customer@example.com"
)
doc = Text("Hello, I have a question about...", metadata)
# Access content and metadata
print(doc.text) # "Hello, I have a question about..."
print(doc.text_id) # "email_001"
print(doc.text_label) # "customer_inquiry"
print(doc.metadata.get_value("sender")) # "customer@example.com"
# Minimal usage (auto-creates metadata)
simple_doc = Text("Some text", None)
print(simple_doc.text_id) # 0 (default)
Initialize text document with content and metadata.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
text
|
str
|
The text content string. |
required |
metadata
|
TextMetaData | None
|
TextMetaData object with document metadata. If None, creates default metadata with text_id=0 and text_label="text". |
required |
Example
Source code in packages/structures/src/dataknobs_structures/document.py
Attributes¶
text_id
property
¶
The unique identifier from metadata.
Returns:
| Type | Description |
|---|---|
Any
|
The text_id value. |
text_label
property
¶
The label/category from metadata.
Returns:
| Type | Description |
|---|---|
str
|
The text_label value. |
metadata
property
¶
The metadata object.
Returns:
| Type | Description |
|---|---|
TextMetaData
|
The TextMetaData instance containing all metadata. |
Functions¶
TextMetaData ¶
Bases: MetaData
Specialized metadata container for text documents.
Extends MetaData to provide standard text document metadata fields including a unique identifier (text_id) and a label/category (text_label).
Attributes:
| Name | Type | Description |
|---|---|---|
text_id |
Any
|
Unique identifier for the text document. |
text_label |
str | Any
|
Label or category for the text (e.g., "article", "email"). |
data |
dict[str, Any]
|
Full metadata dictionary including text_id, text_label, and any kwargs. |
Example
Initialize text metadata with ID and label.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
text_id
|
Any
|
Unique identifier for the text. Can be any type (string, int, etc.). |
required |
text_label
|
str
|
Label or category for the text. Defaults to "text". |
TEXT_LABEL
|
**kwargs
|
Any
|
Additional optional metadata key-value pairs. |
{}
|
Example
Source code in packages/structures/src/dataknobs_structures/document.py
RecordStore ¶
Container for tabular records with memory and disk representations.
Manages a sequence of records (rows) that can be represented as a list of dictionaries, a pandas DataFrame, and/or a TSV/CSV file on disk. The store automatically synchronizes between these representations and provides simple CRUD operations.
Attributes:
| Name | Type | Description |
|---|---|---|
tsv_fpath |
Path to the backing file on disk (None if not persisted). |
|
df |
DataFrame | None
|
The records as a pandas DataFrame (lazily created from records). |
records |
List[Dict[str, Any]]
|
The records as a list of dictionaries. |
Example
# Create with disk persistence
store = RecordStore("/data/results.tsv")
# Add records
store.add_rec({"user": "alice", "score": 100})
store.add_rec({"user": "bob", "score": 95})
# Access data
print(len(store.records)) # 2
print(store.df.shape) # (2, 2)
# Save and restore
store.save()
store.clear()
store.restore() # Reloads from disk
Note
If tsv_fpath is None, the store operates entirely in memory with no disk persistence.
Initialize record store with optional file backing.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
tsv_fpath
|
str | None
|
Path to TSV/CSV file on disk. If None, operates without disk persistence. If the file exists, loads data from it. |
required |
df
|
DataFrame | None
|
Optional initial DataFrame to populate the store. Ignored if tsv_fpath points to an existing file. |
None
|
sep
|
str
|
File separator character. Defaults to tab ("\t") for TSV files. Use "," for CSV. |
'\t'
|
Example
# In-memory only
store = RecordStore(None)
# With disk persistence
store = RecordStore("/data/records.tsv")
# CSV with comma separator
store = RecordStore("/data/records.csv", sep=",")
# Initialize with DataFrame
import pandas as pd
df = pd.DataFrame([{"a": 1}, {"a": 2}])
store = RecordStore("/data/data.tsv", df=df)
Methods:
| Name | Description |
|---|---|
clear |
Clear all records without saving changes. |
add_rec |
Add a record to the store. |
save |
Save records to the backing file. |
restore |
Restore records from disk, discarding in-memory changes. |
Source code in packages/structures/src/dataknobs_structures/record_store.py
Attributes¶
df
property
¶
Get records as a pandas DataFrame.
Lazily creates the DataFrame from records if it doesn't exist.
Returns:
| Type | Description |
|---|---|
DataFrame | None
|
DataFrame representation of records, or None if no records. |
records
property
¶
Functions¶
clear ¶
Clear all records without saving changes.
Removes all records from memory but does not affect the backing file. Use save() afterwards if you want to persist the empty state to disk.
Example
Source code in packages/structures/src/dataknobs_structures/record_store.py
add_rec ¶
Add a record to the store.
Appends a new record and invalidates the DataFrame cache (will be rebuilt on next access to df property).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
rec
|
Dict[str, Any]
|
Dictionary representing a single record (row). |
required |
Example
Source code in packages/structures/src/dataknobs_structures/record_store.py
save ¶
Save records to the backing file.
Writes the current records to disk as a TSV/CSV file. Does nothing if tsv_fpath is None (in-memory only mode).
Example
Note
The file is written with headers and without row indices.
Source code in packages/structures/src/dataknobs_structures/record_store.py
restore ¶
Restore records from disk, discarding in-memory changes.
Reloads data from the backing file (if it exists) or from the provided DataFrame. All current in-memory changes are lost.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
df
|
DataFrame | None
|
Optional DataFrame to restore from. If None, uses the backing file (if available) or the initial DataFrame. |
None
|
Example
Note
If tsv_fpath is None and no df is provided, restores to the initial DataFrame or creates an empty store.
Source code in packages/structures/src/dataknobs_structures/record_store.py
Tree ¶
A tree node with arbitrary data and parent-child relationships.
Each Tree instance represents a node in a tree structure that maintains: - Arbitrary data of any type - An ordered list of child nodes - A reference to its parent node (None for root) - Bidirectional links between parents and children for efficient traversal
The tree structure supports both depth-first and breadth-first traversal, node searching, ancestor finding, and various tree manipulation operations. All nodes are doubly-linked, allowing traversal both up (to parents) and down (to children).
Attributes:
| Name | Type | Description |
|---|---|---|
data |
Any
|
The data contained in this node (any type). |
children |
List[Tree] | None
|
List of child Tree nodes, or None if no children. |
parent |
Tree | None
|
Parent Tree node, or None if this is a root node. |
root |
Tree
|
The root node of this tree (traverses up to find it). |
depth |
int
|
Number of hops from the root to this node (root has depth 0). |
Example
# Create a tree structure
root = Tree("root")
child1 = Tree("child1", parent=root)
child2 = Tree("child2", parent=root)
# Add grandchildren
grandchild = Tree("grandchild", parent=child1)
# Navigate the tree
print(grandchild.root.data) # "root"
print(grandchild.depth) # 2
print(child1.num_children) # 1
Note
Tree nodes can be created standalone and connected later, or created with a parent reference for immediate attachment. When a node is added as a child, it's automatically removed from any previous parent.
Initialize a tree node with optional parent attachment.
Creates a new tree node containing arbitrary data. If a parent is provided, the new node is automatically added as a child of that parent. If the parent is not already a Tree instance, it will be wrapped in one.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
data
|
Any
|
The data to be contained within this node. Can be any type. |
required |
parent
|
Union[Tree, Any]
|
Optional parent node to attach to. Can be a Tree instance or data that will be wrapped in a Tree. If None, creates a root node. |
None
|
child_pos
|
int | None
|
Optional 0-based position among parent's children. If None, appends to the end of the parent's children list. |
None
|
Example
Methods:
| Name | Description |
|---|---|
__repr__ |
Return string representation of this tree. |
has_children |
Check if this node has any children. |
has_parent |
Check if this node has a parent. |
add_child |
Add a child node to this node. |
add_edge |
Add a parent-child edge to this tree. |
prune |
Remove this node from its parent. |
find_nodes |
Find nodes matching a condition using depth-first or breadth-first search. |
collect_terminal_nodes |
Collect all leaf nodes (nodes with no children) in this tree. |
get_edges |
Get all parent-child edges in this tree. |
get_path |
Get the path from the root to this node. |
is_ancestor |
Check if this node is an ancestor of another node. |
find_deepest_common_ancestor |
Find the deepest (closest) common ancestor of this node and another. |
as_string |
Get a string representation of this tree. |
get_deepest_left |
Get the leftmost terminal descendant of this node. |
get_deepest_right |
Get the rightmost terminal descendant of this node. |
build_dot |
Build a Graphviz Digraph for visualizing this tree. |
Source code in packages/structures/src/dataknobs_structures/tree.py
Attributes¶
data
property
writable
¶
The data contained in this node.
Returns:
| Type | Description |
|---|---|
Any
|
The arbitrary data stored in this node. |
children
property
¶
This node's children as an ordered list.
Returns:
| Type | Description |
|---|---|
List[Tree] | None
|
List of child Tree nodes, or None if this node has no children. |
parent
property
writable
¶
This node's parent.
Returns:
| Type | Description |
|---|---|
Tree | None
|
Parent Tree node, or None if this is a root node. |
root
property
¶
The root node of this tree.
Traverses up the parent chain to find the root node.
Returns:
| Type | Description |
|---|---|
Tree
|
The root Tree node (the node with no parent). |
sibnum
property
¶
This node's position among its siblings.
Returns:
| Type | Description |
|---|---|
int
|
0-based index among parent's children, or 0 if this is a root node. |
num_siblings
property
¶
Total number of siblings including this node.
Returns:
| Type | Description |
|---|---|
int
|
Number of children the parent has (including this node), or 1 if root. |
next_sibling
property
¶
prev_sibling
property
¶
num_children
property
¶
Number of children under this node.
Returns:
| Type | Description |
|---|---|
int
|
Count of direct children (0 if no children). |
depth
property
¶
Depth of this node in the tree.
The depth is the number of edges from the root to this node. The root node has depth 0, its children have depth 1, and so on.
Returns:
| Type | Description |
|---|---|
int
|
Number of hops from the root to this node (0 for root). |
Functions¶
__repr__ ¶
Return string representation of this tree.
Returns:
| Type | Description |
|---|---|
str
|
Multi-line string representation using as_string() method. |
has_children ¶
Check if this node has any children.
Returns:
| Type | Description |
|---|---|
bool
|
True if this node has at least one child, False otherwise. |
Source code in packages/structures/src/dataknobs_structures/tree.py
has_parent ¶
Check if this node has a parent.
Returns:
| Type | Description |
|---|---|
bool
|
True if this node has a parent, False if this is a root node. |
Note
The root of a tree has no parent.
Source code in packages/structures/src/dataknobs_structures/tree.py
add_child ¶
Add a child node to this node.
If the child is already part of another tree, it will be pruned from that tree first. If node_or_data is not a Tree instance, it will be wrapped in a new Tree node. The child's parent reference is automatically updated.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
node_or_data
|
Union[Tree, Any]
|
The child to add. Can be a Tree instance or any data that will be wrapped in a Tree node. |
required |
child_pos
|
int | None
|
Optional 0-based position to insert the child. If None, appends to the end. If provided, inserts at the specified position, shifting existing children. |
None
|
Returns:
| Type | Description |
|---|---|
Tree
|
The child Tree node (either the provided node or a newly created one). |
Example
Note
The child's parent attribute is automatically updated to reference this node, maintaining the tree's structural integrity.
Source code in packages/structures/src/dataknobs_structures/tree.py
add_edge ¶
add_edge(
parent_node_or_data: Union[Tree, Any], child_node_or_data: Union[Tree, Any]
) -> Tuple[Tree, Tree]
Add a parent-child edge to this tree.
Creates or reuses nodes to establish a parent-child relationship. If the parent or child already exist in the tree (matched by data equality), they are reused. If the child exists elsewhere, it's moved to be a child of the parent. If neither exist, the parent is added as a child of this node.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
parent_node_or_data
|
Union[Tree, Any]
|
The parent node (Tree instance) or data to match/create. |
required |
child_node_or_data
|
Union[Tree, Any]
|
The child node (Tree instance) or data to match/create. |
required |
Returns:
| Type | Description |
|---|---|
Tuple[Tree, Tree]
|
Tuple of (parent_node, child_node) Tree instances. |
Example
Note
Nodes are matched by data equality. If a node with matching data already exists in the tree, it will be reused rather than creating a duplicate.
Source code in packages/structures/src/dataknobs_structures/tree.py
353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 | |
prune ¶
Remove this node from its parent.
Detaches this node from its parent's children list and sets this node's parent to None. The node and its subtree remain intact.
Returns:
| Type | Description |
|---|---|
Tree | None
|
This node's former parent, or None if this was already a root node. |
Example
Source code in packages/structures/src/dataknobs_structures/tree.py
find_nodes ¶
find_nodes(
accept_node_fn: Callable[[Tree], bool],
traversal: str = "dfs",
include_self: bool = True,
only_first: bool = False,
highest_only: bool = False,
) -> List[Tree]
Find nodes matching a condition using depth-first or breadth-first search.
Searches the tree using the specified traversal strategy and returns all nodes for which the accept function returns True.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
accept_node_fn
|
Callable[[Tree], bool]
|
Function that takes a Tree node and returns True to include it in results, False to skip it. |
required |
traversal
|
str
|
Search strategy - either 'dfs' (depth-first) or 'bfs' (breadth-first). Defaults to 'dfs'. |
'dfs'
|
include_self
|
bool
|
If True, considers this node in the search. If False, starts with this node's children. Defaults to True. |
True
|
only_first
|
bool
|
If True, stops after finding the first match. Defaults to False. |
False
|
highest_only
|
bool
|
If True, doesn't search below matched nodes (stops descending when a match is found). Defaults to False. |
False
|
Returns:
| Type | Description |
|---|---|
List[Tree]
|
List of Tree nodes that matched the accept function. |
Example
root = Tree("root")
root.add_child("apple")
root.add_child("banana")
child = root.add_child("parent")
child.add_child("apricot")
# Find all nodes containing 'a'
found = root.find_nodes(lambda n: 'a' in str(n.data))
# Returns: [root, apple, banana, parent, apricot]
# Find first node containing 'a'
first = root.find_nodes(
lambda n: 'a' in str(n.data),
only_first=True
)
# Returns: [root]
# Breadth-first search
bfs = root.find_nodes(lambda n: 'a' in str(n.data), traversal='bfs')
Source code in packages/structures/src/dataknobs_structures/tree.py
collect_terminal_nodes ¶
collect_terminal_nodes(
accept_node_fn: Callable[[Tree], bool] | None = None,
_found: List[Tree] | None = None,
) -> List[Tree]
Collect all leaf nodes (nodes with no children) in this tree.
Recursively traverses the tree and collects nodes that have no children, optionally filtering them with an accept function.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
accept_node_fn
|
Callable[[Tree], bool] | None
|
Optional function to filter which terminal nodes to include. If None, includes all terminal nodes. |
None
|
_found
|
List[Tree] | None
|
Internal parameter for recursion. Do not use. |
None
|
Returns:
| Type | Description |
|---|---|
List[Tree]
|
List of terminal (leaf) Tree nodes. |
Example
root = Tree("root")
child1 = root.add_child("child1")
child1.add_child("leaf1")
child1.add_child("leaf2")
root.add_child("leaf3")
# Collect all terminal nodes
leaves = root.collect_terminal_nodes()
# Returns: [leaf1, leaf2, leaf3]
# Collect only specific leaves
filtered = root.collect_terminal_nodes(
lambda n: "1" in str(n.data)
)
# Returns: [leaf1]
Source code in packages/structures/src/dataknobs_structures/tree.py
get_edges ¶
get_edges(
traversal: str = "bfs", include_self: bool = True, as_data: bool = True
) -> List[Tuple[Union[Tree, Any], Union[Tree, Any]]]
Get all parent-child edges in this tree.
Returns a list of (parent, child) tuples representing all edges in the tree, using either depth-first or breadth-first traversal.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
traversal
|
str
|
Search strategy - either 'dfs' (depth-first) or 'bfs' (breadth-first). Defaults to 'bfs'. |
'bfs'
|
include_self
|
bool
|
If True, includes edges from this node to its children. If False, starts with this node's children. Defaults to True. |
True
|
as_data
|
bool
|
If True, returns data values in tuples. If False, returns Tree node instances. Defaults to True. |
True
|
Returns:
| Type | Description |
|---|---|
List[Tuple[Union[Tree, Any], Union[Tree, Any]]]
|
List of (parent, child) tuples, either as data values or Tree nodes. |
Example
root = Tree("root")
child1 = root.add_child("child1")
child2 = root.add_child("child2")
child1.add_child("grandchild")
# Get edges as data
edges = root.get_edges()
# Returns: [("root", "child1"), ("root", "child2"),
# ("child1", "grandchild")]
# Get edges as Tree nodes
edges_nodes = root.get_edges(as_data=False)
# Returns: [(root_node, child1_node), ...]
Source code in packages/structures/src/dataknobs_structures/tree.py
get_path ¶
Get the path from the root to this node.
Returns:
| Type | Description |
|---|---|
List[Tree]
|
Ordered list of Tree nodes from root to this node (inclusive). |
Example
Source code in packages/structures/src/dataknobs_structures/tree.py
is_ancestor ¶
Check if this node is an ancestor of another node.
An ancestor is any node on the path from another node to the root.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
other
|
Tree
|
The potential descendant node to check. |
required |
self_is_ancestor
|
bool
|
If True, considers a node to be its own ancestor. Defaults to False. |
False
|
Returns:
| Type | Description |
|---|---|
bool
|
True if this node is an ancestor of the other node, False otherwise. |
Example
root = Tree("root")
child = Tree("child", parent=root)
grandchild = Tree("grandchild", parent=child)
print(root.is_ancestor(grandchild)) # True
print(child.is_ancestor(grandchild)) # True
print(grandchild.is_ancestor(root)) # False
print(grandchild.is_ancestor(grandchild)) # False
print(grandchild.is_ancestor(grandchild, self_is_ancestor=True)) # True
Source code in packages/structures/src/dataknobs_structures/tree.py
find_deepest_common_ancestor ¶
Find the deepest (closest) common ancestor of this node and another.
The deepest common ancestor is the lowest node that is an ancestor of both nodes. This is also known as the lowest common ancestor (LCA).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
other
|
Tree | None
|
The other node to find a common ancestor with. Can be None. |
required |
Returns:
| Type | Description |
|---|---|
Tree | None
|
The deepest common ancestor Tree node, or None if other is None or |
Tree | None
|
if the nodes are not in the same tree. |
Example
root = Tree("root")
left = root.add_child("left")
right = root.add_child("right")
left_child = left.add_child("left_child")
right_child = right.add_child("right_child")
# Find common ancestor
ancestor = left_child.find_deepest_common_ancestor(right_child)
print(ancestor.data) # "root"
# Nodes in same branch
ancestor = left_child.find_deepest_common_ancestor(left)
print(ancestor.data) # "left"
# Same node
ancestor = left_child.find_deepest_common_ancestor(left_child)
print(ancestor.data) # "left_child"
Source code in packages/structures/src/dataknobs_structures/tree.py
as_string ¶
Get a string representation of this tree.
Creates a parenthesized string representation of the tree structure, with optional multiline formatting and custom indentation.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
delim
|
str
|
The delimiter/indentation to use between levels. Defaults to a single space. |
' '
|
multiline
|
bool
|
If True, includes newlines for readability. If False, creates a compact single-line representation. Defaults to False. |
False
|
Returns:
| Type | Description |
|---|---|
str
|
String representation of this tree and its descendants. |
Example
root = Tree("root")
child1 = root.add_child("child1")
child1.add_child("leaf1")
root.add_child("child2")
# Compact representation
print(root.as_string())
# Output: (root child1 leaf1 child2)
# Multiline with indentation
print(root.as_string(delim=" ", multiline=True))
# Output:
# (root
# child1
# leaf1
# child2)
Source code in packages/structures/src/dataknobs_structures/tree.py
get_deepest_left ¶
Get the leftmost terminal descendant of this node.
Follows the leftmost (first) child at each level until reaching a leaf node.
Returns:
| Type | Description |
|---|---|
Tree
|
The terminal (leaf) node reached by always taking the first child. |
Example
Source code in packages/structures/src/dataknobs_structures/tree.py
get_deepest_right ¶
Get the rightmost terminal descendant of this node.
Follows the rightmost (last) child at each level until reaching a leaf node.
Returns:
| Type | Description |
|---|---|
Tree
|
The terminal (leaf) node reached by always taking the last child. |
Example
Source code in packages/structures/src/dataknobs_structures/tree.py
build_dot ¶
build_dot(
node_name_fn: Callable[[Tree], str] | None = None, **kwargs: Any
) -> graphviz.graphs.Digraph
Build a Graphviz Digraph for visualizing this tree.
Creates a directed graph representation of the tree structure that can be rendered to various formats (PNG, PDF, SVG, etc.) using Graphviz.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
node_name_fn
|
Callable[[Tree], str] | None
|
Optional function to generate node labels. Takes a Tree node and returns a string. If None, uses str(node.data). |
None
|
**kwargs
|
Any
|
Additional keyword arguments passed to graphviz.Digraph constructor (e.g., name, format, node_attr, edge_attr). |
{}
|
Returns:
| Type | Description |
|---|---|
Digraph
|
A graphviz.Digraph object representing this tree. |
Example
root = Tree("root")
child1 = root.add_child("child1")
child2 = root.add_child("child2")
# Basic usage
dot = root.build_dot(name='MyTree', format='png')
print(dot.source) # View the DOT source
# Render to file
dot.render('/tmp/tree', format='png')
# Custom node labels
dot = root.build_dot(
node_name_fn=lambda n: f"[{n.data}]",
node_attr={'shape': 'box', 'style': 'filled', 'fillcolor': 'lightblue'}
)
# Display in Jupyter
from IPython.display import Image
Image(filename=dot.render('/tmp/tree'))
Note
Requires the graphviz package and Graphviz system installation.
Source code in packages/structures/src/dataknobs_structures/tree.py
Functions¶
build_tree_from_string ¶
Build a Tree from a parenthesized string representation.
Parses a string representation of a tree (such as produced by Tree.as_string()) and reconstructs the tree structure. The string format uses parentheses to indicate parent-child relationships.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
from_string
|
str
|
The tree string in parenthesized format, e.g., "(root child1 child2)". |
required |
Returns:
| Type | Description |
|---|---|
Tree
|
The reconstructed Tree with all nodes and structure preserved. |
Example
# Build from string
tree_str = "(root (child1 leaf1 leaf2) child2)"
tree = build_tree_from_string(tree_str)
print(tree.data) # "root"
print(tree.num_children) # 2
print(tree.children[0].data) # "child1"
# Roundtrip: tree -> string -> tree
original = Tree("root")
original.add_child("child")
tree_str = original.as_string()
reconstructed = build_tree_from_string(tree_str)