Getting Started with Dataknobs
This guide will help you get up and running with Dataknobs based on your use case.
Prerequisites
- Python 3.12 or higher
- pip or uv package manager
Installation
Choose Based on Your Use Case
Installing with uv
If you're using the uv package manager:
Installing from Source
Clone the repository and install in development mode:
Quick Start by Use Case
Building AI Chatbots
Create an intelligent chatbot with memory and tools:
import asyncio
from dataknobs_bots import DynaBot, BotContext

async def main():
    # Configure bot from dictionary or YAML
    config = {
        "llm": {
            "provider": "openai",
            "model": "gpt-4",
            "temperature": 0.7
        },
        "conversation_storage": {"backend": "memory"},
        "memory": {
            "type": "buffer",
            "max_messages": 10
        },
        "prompts": {
            "support_assistant": "You are a helpful customer support assistant."
        },
        "system_prompt": {"name": "support_assistant"}
    }
    # Create bot
    bot = await DynaBot.from_config(config)
    # Create context for conversation
    context = BotContext(
        conversation_id="support-001",
        client_id="my-company",
        user_id="customer123"
    )
    # Chat with context retention
    response = await bot.chat("I need help with my order", context)
    print(response)

asyncio.run(main())
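The `"memory": {"type": "buffer", "max_messages": 10}` setting suggests a sliding window over the most recent messages. The following is a minimal standard-library sketch of that idea, not the dataknobs_bots implementation: `BufferMemory` and its methods are hypothetical names chosen for illustration, and the real memory backend may behave differently.

```python
from collections import deque


class BufferMemory:
    """Hypothetical sliding-window memory; NOT the dataknobs_bots class."""

    def __init__(self, max_messages: int = 10) -> None:
        # A deque with maxlen drops the oldest entry once it is full.
        self._messages = deque(maxlen=max_messages)

    def add(self, role: str, content: str) -> None:
        self._messages.append({"role": role, "content": content})

    def history(self) -> list:
        # Oldest-to-newest view of the messages still in the window.
        return list(self._messages)


memory = BufferMemory(max_messages=3)
for i in range(5):
    memory.add("user", f"message {i}")

# Only the three most recent messages remain in the buffer.
print([m["content"] for m in memory.history()])
```

A bounded buffer keeps prompt size predictable, at the cost of forgetting anything older than the window.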
Processing Data with Workflows
Build robust ETL pipelines with finite state machines:
from dataknobs_fsm import SimpleFSM, DataHandlingMode
# Define a data processing pipeline
config = {
    "name": "user_import",
    "states": [
        {"name": "load", "is_start": True},
        {"name": "validate"},
        {"name": "transform"},
        {"name": "save", "is_end": True}
    ],
    "arcs": [
        {"from": "load", "to": "validate"},
        {"from": "validate", "to": "transform"},
        {"from": "transform", "to": "save"}
    ]
}
fsm = SimpleFSM(config, data_mode=DataHandlingMode.COPY)
result = fsm.process({"users": [{"name": "Alice", "age": 30}]})
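The configuration above describes a linear chain of states joined by arcs. As a rough illustration of how such a definition can be sanity-checked before it is handed to an FSM, here is a standard-library sketch. `walk_states` is a hypothetical helper, it does not use dataknobs_fsm, and it assumes each state has at most one outgoing arc.

```python
def walk_states(config: dict) -> list:
    """Follow arcs from the start state and return the visited state names.

    Hypothetical helper for illustration; assumes at most one outgoing arc
    per state, as in a linear pipeline.
    """
    next_state = {arc["from"]: arc["to"] for arc in config["arcs"]}
    end_states = {s["name"] for s in config["states"] if s.get("is_end")}
    current = next(s["name"] for s in config["states"] if s.get("is_start"))

    path = [current]
    while current not in end_states:
        if current not in next_state:
            raise ValueError(f"state {current!r} has no outgoing arc")
        current = next_state[current]
        if current in path:
            raise ValueError(f"cycle detected at state {current!r}")
        path.append(current)
    return path


pipeline = {
    "states": [
        {"name": "load", "is_start": True},
        {"name": "validate"},
        {"name": "transform"},
        {"name": "save", "is_end": True},
    ],
    "arcs": [
        {"from": "load", "to": "validate"},
        {"from": "validate", "to": "transform"},
        {"from": "transform", "to": "save"},
    ],
}

print(walk_states(pipeline))
```

A check like this catches a missing arc or an unreachable end state at configuration time rather than in the middle of a run.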
Working with Multiple Data Backends
Use a unified interface across different storage systems:
from dataknobs_config import Config
from dataknobs_data import database_factory, Record, Query
# Load configuration (supports environment variables)
config = Config("config.yaml") # or dict
config.register_factory("database", database_factory)
# Get database instance - backend determined by config
# Supports: Memory, File, PostgreSQL, Elasticsearch, S3
db = config.get_instance("databases", "primary")
# Unified API regardless of backend
record = Record({"name": "Alice", "email": "alice@example.com"})
record_id = db.create(record)
# Query with same API across all backends
results = db.search(Query().filter("name", "=", "Alice"))
Learn more about Data → | Learn more about Config →
Integrating LLMs
Manage prompts and multi-provider LLM access:
import asyncio
from dataknobs_llm import create_llm_provider, LLMMessage

async def main():
    # Create LLM provider
    llm = create_llm_provider({
        "provider": "openai",
        "model": "gpt-4",
        "api_key": "your-key"
    })
    # Generate completion
    messages = [
        LLMMessage(role="user", content="What's the capital of France?")
    ]
    response = await llm.generate(messages)
    print(response.content)
    # Continue conversation: resend the history plus the new question
    messages.append(LLMMessage(role="assistant", content=response.content))
    messages.append(LLMMessage(role="user", content="What's its population?"))
    response = await llm.generate(messages)  # Maintains context

# generate() is awaited, so it must run inside an event loop
asyncio.run(main())
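In the example above, context carries over because the earlier turns are appended to `messages` and sent again with the follow-up question. The sketch below makes that visible with a stub provider. `Message` and `CountingProvider` are hypothetical stand-ins written for this illustration; they are not dataknobs_llm classes and no real LLM is called.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Message:
    """Stand-in for a chat message; hypothetical, not dataknobs_llm.LLMMessage."""
    role: str
    content: str


class CountingProvider:
    """Hypothetical stub provider that reports how much history it was given."""

    async def generate(self, messages: list) -> Message:
        return Message("assistant", f"saw {len(messages)} message(s)")


async def demo() -> list:
    llm = CountingProvider()
    messages = [Message("user", "What's the capital of France?")]
    replies = []

    first = await llm.generate(messages)
    replies.append(first.content)

    # The follow-up only has context because the earlier turns are resent.
    messages.append(first)
    messages.append(Message("user", "What's its population?"))
    second = await llm.generate(messages)
    replies.append(second.content)
    return replies


replies = asyncio.run(demo())
print(replies)
```

The second call receives three messages: the original question, the assistant's reply, and the follow-up.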
Working with Data Structures
Use trees, documents, and utilities for common tasks:
from dataknobs_structures import Tree
from dataknobs_utils import json_utils
from dataknobs_xization import normalize
# Hierarchical data
tree = Tree("root")
chapter1 = tree.add_child("Chapter 1")
chapter1.add_child("Section 1.1")
chapter1.add_child("Section 1.2")
# Navigate tree
for node in tree.traverse():
    print(f"{' ' * node.level}{node.value}")
# JSON utilities
data = {"users": {"alice": {"age": 30, "city": "Paris"}}}
age = json_utils.get_value(data, "users.alice.age") # 30
# Text normalization
text = " Hello WORLD!!! "
normalized = normalize.basic_normalization_fn(text) # "hello world!"
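The `"users.alice.age"` argument above is a dotted path into nested dictionaries. For readers unfamiliar with the pattern, here is a minimal standard-library sketch of such a lookup. `get_by_path` is a hypothetical function written for this illustration; it is not how `json_utils.get_value` is implemented, and the `default` parameter is an assumption.

```python
from typing import Any


def get_by_path(data: dict, path: str, default: Any = None) -> Any:
    """Hypothetical dotted-path lookup; not the json_utils implementation."""
    current: Any = data
    for key in path.split("."):
        # Stop as soon as the path leaves the nested-dict structure.
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


data = {"users": {"alice": {"age": 30, "city": "Paris"}}}
print(get_by_path(data, "users.alice.age"))
print(get_by_path(data, "users.bob.age", default="n/a"))
```

Returning a default for a missing key avoids a chain of `KeyError` checks at every call site.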
Learn more about Structures → | Learn more about Utils →
Understanding the Package Ecosystem
Dataknobs packages are organized by capability:
Configuration & Data Layer (Foundation for other packages):
- dataknobs-config: Environment-aware configuration management
- dataknobs-data: Unified data access across multiple backends
AI & LLM Capabilities (Building intelligent applications):
- dataknobs-llm: LLM integration with prompt management
- dataknobs-bots: Pre-built AI agents with memory and tools
Workflow & Processing (Orchestrating complex operations):
- dataknobs-fsm: Finite state machines for robust pipelines
Core Utilities (Building blocks):
- dataknobs-structures: Trees, documents, record stores
- dataknobs-utils: JSON, file operations, integrations
- dataknobs-xization: Text normalization and tokenization
- dataknobs-common: Shared base classes
Next Steps
Based on what you want to build:
For AI/ML Projects:
1. Start with Bots Quickstart for chatbots
2. Or LLM Quickstart for custom LLM integration
3. Add Data package for persistence
4. Use FSM for complex AI workflows
For Data Engineering:
1. Begin with FSM Quickstart
2. Add Data package for backend abstraction
3. Use Config for environment management
4. Explore Data Examples
For General Development:
1. Check Basic Usage Guide for structures and utilities
2. Explore Advanced Usage for patterns
3. Browse Examples for real-world use cases
Getting Help
- Documentation: Comprehensive guides in the User Guide
- API Reference: Detailed API documentation for each package
- Examples: Real-world usage examples
- GitHub Issues: Report bugs or request features at GitHub
Common Patterns
Combining Packages
The packages are designed to be combined. Two common combinations:
# FSM + Data + Config: Robust data pipeline
from dataknobs_fsm import SimpleFSM
from dataknobs_data import database_factory
from dataknobs_config import Config
config = Config("pipeline.yaml")
config.register_factory("database", database_factory)
# FSM can access database through config
# pipeline_config is an FSM definition dict, like the one in "Processing Data with Workflows"
fsm = SimpleFSM(pipeline_config)
fsm.context["db"] = config.get_instance("databases", "primary")
# Bots + LLM + Data: Chatbot with persistence
from dataknobs_bots import BotRegistry
from dataknobs_data import MemoryDatabase
registry = BotRegistry()
db = MemoryDatabase() # For conversation history
bot = registry.create_bot("assistant", {
    "llm": {"provider": "openai"},
    "memory": {"type": "buffer"}
})
Environment-Based Configuration
All packages support environment variables through Config:
# config.yaml
databases:
  primary:
    backend: ${DB_BACKEND:memory}  # Default to memory
    connection: ${DB_CONNECTION:}
llm:
  provider: ${LLM_PROVIDER:openai}
  api_key: ${OPENAI_API_KEY}
from dataknobs_config import Config
# Reads from environment or uses defaults
config = Config("config.yaml")
Troubleshooting
Import Errors
If an import fails, check that you are using the per-package module names (with underscores) rather than the old `dataknobs.` namespace:
# ✅ Correct
from dataknobs_structures import Tree
from dataknobs_bots import BotRegistry
# ❌ Old style (deprecated)
from dataknobs.structures import Tree
Missing Dependencies
Some packages require additional dependencies:
# For PostgreSQL support
pip install psycopg2-binary
# For Elasticsearch
pip install elasticsearch
# For S3 support
pip install boto3
# For LLM providers
pip install openai anthropic
See the Installation Guide for complete dependency information.
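To find out which of the optional dependencies listed above are missing from the current environment, a small check like the following can help. It is a standard-library sketch: `OPTIONAL_MODULES` and `missing_optional` are hypothetical names, and the mapping only covers the packages named in this section.

```python
from importlib.util import find_spec

# Import names of the optional dependencies listed above, keyed by feature.
OPTIONAL_MODULES = {
    "PostgreSQL": "psycopg2",
    "Elasticsearch": "elasticsearch",
    "S3": "boto3",
    "OpenAI": "openai",
    "Anthropic": "anthropic",
}


def missing_optional(modules: dict = OPTIONAL_MODULES) -> dict:
    """Return {feature: module} for every module that cannot be imported."""
    return {
        feature: module
        for feature, module in modules.items()
        if find_spec(module) is None
    }


for feature, module in missing_optional().items():
    print(f"{feature} support needs an extra package (import name: {module})")
```

`find_spec` only looks the module up without importing it, so the check has no side effects.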