Basic Usage¶
This guide covers the fundamental features of Dataknobs packages.
Data Structures¶
Trees¶
Trees are hierarchical data structures that represent parent-child relationships between nodes.
from dataknobs_structures import Tree
# Create a tree manually
tree = Tree()
root = tree.add_node("root")
child1 = tree.add_child(root, "child1")
child2 = tree.add_child(root, "child2")
leaf = tree.add_child(child1, "leaf")
# Traverse the tree
for node in tree.traverse():
    print(f"Node: {node.value}, Level: {node.level}")
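To make the `level` attribute used in the traversal above concrete, here is a minimal, self-contained sketch of a pre-order traversal that reports each node's depth. `SimpleNode` and `traverse` are hypothetical names written for illustration only; they are not the `dataknobs_structures` implementation, whose traversal order and node attributes may differ.

```python
from typing import Iterator, List, Optional


class SimpleNode:
    """Hypothetical stand-in for a tree node; not the dataknobs Tree class."""

    def __init__(self, value: str, parent: Optional["SimpleNode"] = None) -> None:
        self.value = value
        self.parent = parent
        self.children: List["SimpleNode"] = []

    @property
    def level(self) -> int:
        # Depth equals the number of ancestors, so the root is at level 0.
        depth, node = 0, self
        while node.parent is not None:
            depth, node = depth + 1, node.parent
        return depth

    def add_child(self, value: str) -> "SimpleNode":
        child = SimpleNode(value, parent=self)
        self.children.append(child)
        return child


def traverse(node: SimpleNode) -> Iterator[SimpleNode]:
    # Pre-order depth-first walk: yield a node, then each subtree in order.
    yield node
    for child in node.children:
        yield from traverse(child)


# Same shape as the tree built above.
root = SimpleNode("root")
child1 = root.add_child("child1")
child2 = root.add_child("child2")
leaf = child1.add_child("leaf")

visited = [(n.value, n.level) for n in traverse(root)]
for value, level in visited:
    print(f"Node: {value}, Level: {level}")
```

Computing `level` from the parent chain keeps it correct if a subtree is moved, at the cost of a walk to the root on each access.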
Documents¶
Documents represent text with metadata and structure.
from dataknobs_structures import Text, TextMetaData
# Create a document with metadata
metadata = TextMetaData(
    source="example.txt",
    created_at="2024-01-01",
    author="John Doe"
)
text = Text("This is the document content.", metadata)
print(f"Content: {text.content}")
print(f"Source: {text.metadata.source}")
Conditional Dictionaries¶
Conditional dictionaries store a key-value pair only if it passes an acceptance function, which receives the dictionary, the key, and the value.
from dataknobs_structures import cdict
# Create a conditional dict that only accepts string values
def accept_strings(d, k, v):
    return isinstance(v, str)
cd = cdict(accept_strings, {"name": "Alice", "age": 30})
# Only "name" will be stored
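The acceptance-function idea can be sketched without the library. The `ConditionalDict` class below, including its `rejected` attribute, is a hypothetical illustration of the pattern and makes no claim about how `cdict` is implemented.

```python
from typing import Any, Callable, Dict, Optional


class ConditionalDict(dict):
    """Hypothetical sketch: store an item only if accept_fn approves it."""

    def __init__(
        self,
        accept_fn: Callable[[dict, Any, Any], bool],
        initial: Optional[Dict[Any, Any]] = None,
    ) -> None:
        super().__init__()
        self.accept_fn = accept_fn
        self.rejected: Dict[Any, Any] = {}  # items that failed the check
        # Route initial items through __setitem__ so they are filtered too.
        for key, value in (initial or {}).items():
            self[key] = value

    def __setitem__(self, key: Any, value: Any) -> None:
        if self.accept_fn(self, key, value):
            super().__setitem__(key, value)
        else:
            self.rejected[key] = value


def accept_strings(d, k, v):
    return isinstance(v, str)


cd = ConditionalDict(accept_strings, {"name": "Alice", "age": 30})
cd["city"] = "Paris"   # accepted: value is a string
cd["zip"] = 75001      # rejected: value is an int
print(dict(cd))
print(cd.rejected)
```

Note that in this sketch only item assignment is filtered; `dict.update` bypasses `__setitem__` on subclasses, so a fuller version would override it as well.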
Utilities¶
JSON Utilities¶
from dataknobs_utils import json_utils
data = {
    "users": [
        {"name": "Alice", "age": 30},
        {"name": "Bob", "age": 25}
    ]
}
# Get nested values
first_user = json_utils.get_value(data, "users.0")
print(first_user)  # {'name': 'Alice', 'age': 30}
# Set nested values
json_utils.set_value(data, "users.0.age", 31)
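For readers curious how a dotted path such as `users.0.age` can be resolved, the following is a minimal sketch in plain Python. The helpers `get_by_path` and `set_by_path` are hypothetical names, and the rule that numeric segments index into lists is an assumption of this sketch, not a statement about `json_utils`.

```python
from typing import Any


def _step(container: Any, key: str) -> Any:
    # Lists are indexed by integer position, dicts by string key.
    if isinstance(container, list):
        return container[int(key)]
    return container[key]


def get_by_path(data: Any, path: str, default: Any = None) -> Any:
    """Follow a dot-separated path; return default if any step is missing."""
    current = data
    try:
        for key in path.split("."):
            current = _step(current, key)
    except (KeyError, IndexError, ValueError, TypeError):
        return default
    return current


def set_by_path(data: Any, path: str, value: Any) -> None:
    """Assign value at a dot-separated path whose parent already exists."""
    *parents, last = path.split(".")
    current = data
    for key in parents:
        current = _step(current, key)
    if isinstance(current, list):
        current[int(last)] = value
    else:
        current[last] = value


data = {
    "users": [
        {"name": "Alice", "age": 30},
        {"name": "Bob", "age": 25},
    ]
}

first_user = get_by_path(data, "users.0")
set_by_path(data, "users.0.age", 31)
missing = get_by_path(data, "users.5.name", default="n/a")
print(first_user, missing)
```

Returning a default from the getter while letting the setter raise is a deliberate choice here: a failed read is often expected, whereas a failed write usually signals a bug.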
File Utilities¶
from dataknobs_utils import file_utils
# Read and write files
content = file_utils.read_file("input.txt")
processed = content.upper()
file_utils.write_file("output.txt", processed)
# Work with JSON files
data = file_utils.read_json("config.json")
data["updated"] = True
file_utils.write_json("config.json", data)
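The read-modify-write pattern shown above for JSON files can also be expressed with the standard library alone, which may help when checking what the helpers are expected to do. `update_json` is a hypothetical helper for this sketch, and the temporary directory stands in for a real `config.json`.

```python
import json
import tempfile
from pathlib import Path


def update_json(path: Path, **changes) -> dict:
    """Load a JSON object, apply top-level changes, and write it back."""
    data = json.loads(path.read_text(encoding="utf-8"))
    data.update(changes)
    path.write_text(json.dumps(data, indent=2), encoding="utf-8")
    return data


with tempfile.TemporaryDirectory() as tmp:
    config_path = Path(tmp) / "config.json"
    # Seed the file so the example is self-contained.
    config_path.write_text(json.dumps({"name": "demo"}), encoding="utf-8")

    update_json(config_path, updated=True)
    reloaded = json.loads(config_path.read_text(encoding="utf-8"))

print(reloaded)
```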
Text Processing¶
Normalization¶
from dataknobs_xization import basic_normalization_fn
# Basic text normalization
text = " HELLO World!!! "
normalized = basic_normalization_fn(text)
print(normalized) # "hello world!"
# Custom normalization
def custom_normalize(text):
    return text.lower().replace("!", "").strip()
result = custom_normalize("Hello World!")
print(result) # "hello world"
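Because a custom normalizer is just a function from string to string, several small steps can be chained into one. The sketch below uses only the standard library; `compose`, `collapse_whitespace`, and `strip_exclamations` are hypothetical helpers, not part of `dataknobs_xization`.

```python
import re
from functools import reduce
from typing import Callable, Iterable

Normalizer = Callable[[str], str]


def compose(steps: Iterable[Normalizer]) -> Normalizer:
    """Return a function that applies each step in order, left to right."""
    ordered = list(steps)
    return lambda text: reduce(lambda acc, step: step(acc), ordered, text)


def collapse_whitespace(text: str) -> str:
    # Replace any run of whitespace with one space and trim the ends.
    return re.sub(r"\s+", " ", text).strip()


def strip_exclamations(text: str) -> str:
    return text.replace("!", "")


normalize = compose([str.lower, strip_exclamations, collapse_whitespace])
result = normalize("  HELLO   World!!!  ")
print(result)
```

Order matters: whitespace is collapsed last so that any gaps left by removed characters are cleaned up too.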
Tokenization¶
from dataknobs_xization import masking_tokenizer
# Tokenize text with masking
text = "John Doe lives at 123 Main St"
tokenizer = masking_tokenizer.MaskingTokenizer()
tokens = tokenizer.tokenize(text)
# Tokens will include masked versions for sensitive data
for token in tokens:
    print(f"Token: {token.value}, Type: {token.type}")
Working with RecordStore¶
from dataknobs_structures import RecordStore
# Create a record store
store = RecordStore()
# Add records
store.add_record("user:1", {"name": "Alice", "age": 30})
store.add_record("user:2", {"name": "Bob", "age": 25})
# Retrieve records
user1 = store.get_record("user:1")
print(user1)  # {'name': 'Alice', 'age': 30}
# Query records
young_users = store.query(lambda r: r.get("age", 0) < 30)
for user in young_users:
    print(user)
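The add, retrieve, and query operations above follow a common keyed-store pattern, sketched here as a small in-memory class. `InMemoryStore` is a hypothetical illustration that borrows the method names from the example; it says nothing about how `RecordStore` persists or indexes data.

```python
from typing import Any, Callable, Dict, Iterator

Record = Dict[str, Any]


class InMemoryStore:
    """Hypothetical keyed record store; not the dataknobs RecordStore."""

    def __init__(self) -> None:
        self._records: Dict[str, Record] = {}

    def add_record(self, key: str, record: Record) -> None:
        # Store a shallow copy so later edits to the caller's dict do not leak in.
        self._records[key] = dict(record)

    def get_record(self, key: str) -> Record:
        # Raises KeyError if the key is unknown.
        return self._records[key]

    def query(self, predicate: Callable[[Record], bool]) -> Iterator[Record]:
        # Lazily yield every record for which the predicate is true.
        return (r for r in self._records.values() if predicate(r))


store = InMemoryStore()
store.add_record("user:1", {"name": "Alice", "age": 30})
store.add_record("user:2", {"name": "Bob", "age": 25})

young_users = list(store.query(lambda r: r.get("age", 0) < 30))
print(young_users)
```

Using `r.get("age", 0)` in the predicate, as in the example above, means records without an `age` field count as age 0 and therefore match; pass a different default if that is not the intent.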
Error Handling¶
Dataknobs packages report failures by raising exceptions. Catch the specific exception you expect first, then fall back to a generic handler:
from dataknobs_structures import Tree
try:
    tree = Tree()
    # Attempt to access non-existent node
    node = tree.get_node("nonexistent")
except KeyError as e:
    print(f"Node not found: {e}")
except Exception as e:
    print(f"Unexpected error: {e}")
Beyond the Basics¶
Dataknobs includes powerful packages for more advanced use cases:
For AI Applications:
- Bots Package - Build intelligent chatbots with memory and RAG
- LLM Package - Integrate language models with prompt management
For Data Engineering:
- FSM Package - Orchestrate complex workflows with finite state machines
- Data Package - Unified interface across PostgreSQL, Elasticsearch, S3, and more
- Config Package - Environment-aware configuration management
Next Steps¶
- Explore Advanced Usage for complex scenarios and heavier packages
- Read Best Practices for production deployments
- Check the API Reference for detailed documentation
- Browse Examples for real-world use cases