User Guide¶
Welcome to the Dataknobs User Guide. This section provides detailed information on using Dataknobs effectively for building knowledge-centric applications.
Getting Started¶
- Quick Start - Get up and running quickly
- Basic Usage - Common patterns and examples
- Advanced Usage - Advanced features and techniques
- Best Practices - Recommended patterns
Core Capabilities¶
AI Agents & Chatbots¶
Package: dataknobs-bots
Configuration-driven AI agents with memory, RAG, reasoning strategies, and multi-tenancy.
- User Guide - Comprehensive tutorials
- Configuration - Bot configuration options
- Architecture - Understanding bot components
- Tools - Built-in and custom tools
Use Cases: Customer support bots, virtual assistants, knowledge base Q&A, multi-user chat systems
Configuration Management¶
Package: dataknobs-config
Flexible configuration with environment variable support, factory patterns, and cross-references.
- Configuration System - Understanding the configuration architecture
- Environment Variables - Using environment-based configuration
- Factory Registration - Dynamic object creation
Use Cases: Multi-environment deployments, dynamic backend selection, application configuration
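To illustrate the idea behind environment-based configuration (independent of the dataknobs-config API, whose actual functions are documented on the Environment Variables page), here is a minimal sketch of `${VAR}` placeholder interpolation over a config structure:

```python
import os
import re

_VAR = re.compile(r"\$\{([A-Z0-9_]+)\}")

def interpolate(value, env=os.environ):
    """Replace ${VAR} placeholders in strings; recurse into dicts and lists."""
    if isinstance(value, str):
        # Unknown variables are left as-is rather than raising
        return _VAR.sub(lambda m: env.get(m.group(1), m.group(0)), value)
    if isinstance(value, dict):
        return {k: interpolate(v, env) for k, v in value.items()}
    if isinstance(value, list):
        return [interpolate(v, env) for v in value]
    return value

config = {"database": {"host": "${DB_HOST}", "port": 5432}}
resolved = interpolate(config, env={"DB_HOST": "db.internal"})
# resolved["database"]["host"] == "db.internal"
```

The same config file can then drive development, staging, and production simply by changing the environment.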
Data Abstraction¶
Package: dataknobs-data
Unified interface across Memory, File, PostgreSQL, Elasticsearch, and S3 backends with transactions and migrations.
- Record Model - Working with records
- Query Interface - Building queries
- Backends - Choosing and configuring backends
- Async Pooling - High-performance async operations
- Pandas Integration - DataFrame workflows
Use Cases: Backend-agnostic data access, ETL pipelines, multi-backend applications, data migration
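The payoff of a unified interface is that application code stays unchanged while the backend swaps out underneath it. As a concept sketch only (these class names are illustrative, not the dataknobs-data API; see the Record Model and Backends pages for the real one), an in-memory backend behind a minimal abstract interface might look like:

```python
from abc import ABC, abstractmethod

class Database(ABC):
    """Minimal backend-agnostic interface (illustrative only)."""
    @abstractmethod
    def put(self, record_id, record): ...
    @abstractmethod
    def get(self, record_id): ...
    @abstractmethod
    def query(self, **filters): ...

class MemoryDatabase(Database):
    """In-memory implementation; a file or PostgreSQL backend would
    implement the same three methods."""
    def __init__(self):
        self._records = {}
    def put(self, record_id, record):
        self._records[record_id] = dict(record)
    def get(self, record_id):
        return self._records.get(record_id)
    def query(self, **filters):
        return [r for r in self._records.values()
                if all(r.get(k) == v for k, v in filters.items())]

db = MemoryDatabase()
db.put("1", {"name": "alice", "role": "admin"})
db.put("2", {"name": "bob", "role": "user"})
admins = db.query(role="admin")
```

Code written against the abstract interface never needs to know which backend it is talking to.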
Workflow Orchestration¶
Package: dataknobs-fsm
Finite State Machine framework for building robust, testable data processing pipelines.
- FSM Basics - Introduction to the FSM framework
- Data Handling Modes - Understanding COPY, REFERENCE, and DIRECT modes
- Resources - Built-in resource managers (DB, HTTP, LLM, files)
- Streaming Workflows - Building streaming data pipelines
- Configuration - YAML/JSON-based workflow definitions
- Debugging FSMs - Using AdvancedFSM for debugging
Use Cases: ETL pipelines, data validation, multi-step processing, workflow automation
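To show the shape of a state-machine pipeline (a generic sketch, not the dataknobs-fsm engine; see FSM Basics for the real API), states and arcs can be modeled as transitions with per-arc transform functions:

```python
def run_fsm(states, arcs, data):
    """Walk from the start state to an end state, applying arc transforms.

    states: {name: {"is_start": bool, "is_end": bool}}
    arcs:   {from_state: (to_state, transform_fn)}
    """
    current = next(n for n, s in states.items() if s.get("is_start"))
    while not states[current].get("is_end"):
        current, transform = arcs[current]
        data = transform(data)
    return data

states = {
    "load": {"is_start": True},
    "normalize": {},
    "save": {"is_end": True},
}
arcs = {
    "load": ("normalize", lambda d: d.strip().lower()),
    "normalize": ("save", lambda d: {"text": d}),
}
result = run_fsm(states, arcs, "  Hello World  ")
# result == {"text": "hello world"}
```

Keeping each transform small and pure is what makes FSM pipelines easy to test: each arc can be exercised in isolation.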
LLM Integration¶
Package: dataknobs-llm
Multi-provider LLM integration with prompt management, conversations, versioning, and tool calling.
- Prompts - Prompt template management and versioning
- Conversations - Multi-turn conversation handling
- Flows - Complex LLM workflows
- Tools and Enhancements - Function calling and tool use
- Performance - Optimization and cost tracking
Use Cases: Chatbots, content generation, code analysis, document summarization, Q&A systems
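To give a feel for what prompt management with versioning involves (a hypothetical sketch; `PromptRegistry` and its methods are illustrative names, not the dataknobs-llm API documented on the Prompts page):

```python
class PromptRegistry:
    """Illustrative prompt store: each name holds an ordered list of versions."""
    def __init__(self):
        self._prompts = {}  # name -> list of template strings
    def register(self, name, template):
        self._prompts.setdefault(name, []).append(template)
        return len(self._prompts[name])  # 1-based version number
    def render(self, name, version=None, **params):
        versions = self._prompts[name]
        # Default to the latest version; otherwise pick by 1-based index
        template = versions[-1] if version is None else versions[version - 1]
        return template.format(**params)

registry = PromptRegistry()
registry.register("summarize", "Summarize this text: {text}")
registry.register("summarize", "Summarize in {n} bullet points: {text}")
latest = registry.render("summarize", n=3, text="report")
v1 = registry.render("summarize", version=1, text="report")
```

Pinning a prompt version lets you change templates without silently altering the behavior of deployed applications.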
Data Structures¶
Package: dataknobs-structures
Core data structures for organizing knowledge: trees, documents, record stores, conditional dictionaries.
- Tree Structures - Hierarchical data organization
- Documents - Text and metadata handling
- Record Stores - Simple key-value storage
- Conditional Dictionaries - Filtered dictionaries
Use Cases: Hierarchical data, document management, knowledge graphs, data organization
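As a rough illustration of the "conditional dictionary" idea (a dict that only accepts entries passing a predicate; this sketch is not the dataknobs-structures class, which is documented above):

```python
class ConditionalDict(dict):
    """Dict that silently drops entries failing a condition (illustrative)."""
    def __init__(self, condition, *args, **kwargs):
        self._condition = condition
        super().__init__()
        for k, v in dict(*args, **kwargs).items():
            self[k] = v  # route through __setitem__ so the condition applies
    def __setitem__(self, key, value):
        if self._condition(key, value):
            super().__setitem__(key, value)

# Keep only non-empty string values
d = ConditionalDict(lambda k, v: isinstance(v, str) and v)
d["name"] = "alice"
d["empty"] = ""    # dropped: empty string
d["count"] = 3     # dropped: not a string
```

This pattern is handy for building records where invalid or empty fields should never appear at all.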
Utilities¶
Package: dataknobs-utils
Utility functions for JSON manipulation, file operations, HTTP requests, and more.
- JSON Utils - JSON navigation and manipulation
- File Utils - File I/O operations
- Elasticsearch - Elasticsearch helpers
- LLM Utils - LLM-related utilities
Use Cases: JSON processing, file handling, search integration, API interactions
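The kind of navigation helper the JSON Utils page covers can be sketched as a dotted-path lookup (the function name `json_get` here is an assumption for illustration, not necessarily the real utility):

```python
def json_get(obj, path, default=None):
    """Look up a dotted path like 'a.b.0.c' in nested dicts and lists."""
    current = obj
    for part in path.split("."):
        if isinstance(current, list):
            try:
                current = current[int(part)]
            except (ValueError, IndexError):
                return default
        elif isinstance(current, dict) and part in current:
            current = current[part]
        else:
            return default
    return current

doc = {"user": {"emails": [{"address": "a@example.com"}]}}
addr = json_get(doc, "user.emails.0.address")
# addr == "a@example.com"
```

A single path expression replaces a chain of nested `.get()` calls and index checks.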
Text Processing¶
Package: dataknobs-xization
Text normalization, tokenization, masking, and lexical analysis for NLP and data processing.
- Tokenization - Text tokenization strategies
- Normalization - Text normalization functions
- Masking - PII and sensitive data masking
Use Cases: Data anonymization, text preprocessing, NLP pipelines, search indexing
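To make the masking use case concrete, here is a minimal PII-masking sketch using only the standard library (independent of the dataknobs-xization API; see the Masking page for the library's own functions):

```python
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def mask_pii(text):
    """Replace email addresses and SSN-shaped numbers with placeholder tokens."""
    text = EMAIL.sub("<EMAIL>", text)
    return SSN.sub("<SSN>", text)

masked = mask_pii("Contact jane@example.com, SSN 123-45-6789.")
# masked == "Contact <EMAIL>, SSN <SSN>."
```

Masking before indexing or logging keeps sensitive values out of downstream systems entirely.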
Common Workflows¶
Building a Data Pipeline¶
Combine FSM, Data, and Config packages:
```python
from dataknobs_fsm import SimpleFSM
from dataknobs_data import database_factory
from dataknobs_config import Config

# Load configuration
config = Config("pipeline.yaml")
config.register_factory("database", database_factory)

# Access databases through the config
source_db = config.get_instance("databases", "source")
target_db = config.get_instance("databases", "target")

# Define the FSM workflow
fsm = SimpleFSM({
    "states": [...],
    "arcs": [...]
})

# Make the databases available to workflow states, then process
fsm.context["source"] = source_db
fsm.context["target"] = target_db
result = fsm.process(data)
```
Building an AI Chatbot¶
Combine Bots, LLM, and Data packages:
```python
from dataknobs_bots import BotRegistry
from dataknobs_data import PostgresDatabase

# Persistent storage for conversations
db = PostgresDatabase(connection_string="...")

# Configure the bot with memory and RAG
bot_config = {
    "llm": {"provider": "openai", "model": "gpt-4"},
    "memory": {"type": "vector", "db": db},
    "knowledge_base": {"type": "elasticsearch", "index": "docs"}
}

registry = BotRegistry()
bot = registry.create_bot("support", bot_config)

# Multi-session conversations with persistence
response = bot.chat("How do I reset my password?", session_id="user123")
```
Processing Text at Scale¶
Combine FSM, Data, and Xization packages:
```python
from dataknobs_fsm import SimpleFSM
from dataknobs_data import S3Database
from dataknobs_xization import normalize

# Read from S3, process, write back
s3_db = S3Database(bucket="documents")

fsm_config = {
    "name": "text_processor",
    "states": [
        {"name": "load", "is_start": True},
        {"name": "normalize"},
        {"name": "save", "is_end": True}
    ],
    "arcs": [
        {
            "from": "load",
            "to": "normalize",
            "transform": {
                "type": "inline",
                "code": "lambda data, ctx: normalize.basic_normalization_fn(data['text'])"
            }
        }
    ]
}

fsm = SimpleFSM(fsm_config)
```
Learning Path¶
Beginners - Start Here:
1. Quick Start - Get familiar with basic concepts
2. Basic Usage - Learn core data structures and utilities
3. Examples - See practical applications

Intermediate - Build Applications:
1. Configuration System - Environment management
2. Data Abstraction - Backend-agnostic data access
3. FSM Workflows - Build robust pipelines
4. Advanced Usage - Advanced patterns

Advanced - AI & Complex Systems:
1. LLM Integration - Integrate language models
2. AI Agents - Build intelligent chatbots
3. Streaming Workflows - Real-time processing
4. Production Best Practices - Deploy at scale
Package Integration¶
Dataknobs packages are designed to work together seamlessly:
- Config → Data: Dynamic backend configuration
- Data → FSM: Database access in workflows
- LLM → Bots: LLM integration in AI agents
- Bots → Data: Persistent conversation memory
- FSM → LLM: LLM calls in workflow states
- Utils → Everything: Common utilities across all packages
Additional Resources¶
- API Reference - Complete API documentation
- Examples - Real-world usage examples
- Development Guide - Contributing and extending
- Migration Guide - Upgrading from legacy versions