dataknobs-bots

Configuration-driven AI agents and chatbots for the DataKnobs ecosystem

Overview

The dataknobs-bots package provides a comprehensive framework for building sophisticated AI agents and chatbots with:

  • Configuration-First Design - Define bot behavior entirely through YAML/JSON configuration
  • Multi-Tenant Architecture - Single bot instance serves multiple clients with isolated conversations
  • Memory Management - Buffer and vector memory implementations for context management
  • RAG Support - Built-in knowledge base with document chunking and semantic search
  • Tool Integration - Configuration-driven tool loading without code changes
  • Reasoning Strategies - Simple, ReAct, Wizard, and Grounded/Hybrid reasoning patterns
  • Production Ready - PostgreSQL storage, error handling, horizontal scaling, and logging

Quick Start

# Install the package
pip install dataknobs-bots

import asyncio
from dataknobs_bots import DynaBot, BotContext

async def main():
    # Configuration
    config = {
        "llm": {
            "provider": "ollama",
            "model": "gemma3:1b",
            "temperature": 0.7,
            "max_tokens": 1000
        },
        "conversation_storage": {
            "backend": "memory"
        },
        "memory": {
            "type": "buffer",
            "max_messages": 10
        }
    }

    # Create bot from configuration
    bot = await DynaBot.from_config(config)

    # Create conversation context
    context = BotContext(
        conversation_id="conv-001",
        client_id="demo-client",
        user_id="user-123"
    )

    # Chat with the bot
    response = await bot.chat("Hello! What can you help me with?", context)
    print(f"Bot: {response}")

if __name__ == "__main__":
    asyncio.run(main())

Key Features

🤖 Configuration-Driven Architecture

  • Zero-Code Bot Creation - Define entire bot behavior through configuration
  • Hot Reload - Update bot configuration without restarting
  • Environment Overrides - Override settings via environment variables
  • XRef System - Reuse and compose configuration across bots

Learn more →

👥 Multi-Tenancy & Scaling

  • Client Isolation - Each client gets isolated conversations and data
  • Bot Registry - Centralized bot management with LRU caching
  • Stateless Design - Perfect for horizontal scaling in containerized environments
  • Connection Pooling - PostgreSQL connection pooling for high concurrency

Learn more →

🧠 Memory Systems

  • Buffer Memory - Simple sliding window of recent messages
  • Vector Memory - Semantic search over conversation history
  • Async Support - Fully asynchronous for high throughput
  • Conversation Storage - Memory, PostgreSQL, or custom backends

Learn more →
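The memory backend is selected through the `memory` key of the bot configuration, as in the Quick Start. A minimal sketch of the two documented types — the buffer keys match the Quick Start exactly, while the vector-memory keys shown here (`top_k`) are illustrative assumptions, not a documented schema:

```python
# Buffer memory: simple sliding window of recent messages.
# These keys ("type", "max_messages") match the Quick Start example.
buffer_memory = {
    "type": "buffer",
    "max_messages": 20,  # keep only the 20 most recent messages in context
}

# Vector memory: semantic search over conversation history.
# "top_k" is a hypothetical key for how many similar messages to recall;
# check the memory guide for the actual schema.
vector_memory = {
    "type": "vector",
    "top_k": 5,
}

config = {"memory": buffer_memory}
```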

📚 RAG (Retrieval Augmented Generation)

  • Document Ingestion - Load markdown, JSON, JSONL documents from directories
  • Smart Chunking - Uses dataknobs-xization for intelligent document chunking
  • Hybrid Search - Combined text + vector search with RRF fusion
  • Vector Search - FAISS, Chroma, Pinecone support
  • Context Injection - Automatic context injection into prompts

Learn more →
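A knowledge base is enabled entirely through configuration. This sketch combines the keys from the Customer Support and Document Q&A use cases below (`enabled`, `documents_path`, `chunking.max_chunk_size`); any other chunking options are left to the guide:

```python
# RAG knowledge base configuration. Documents under "documents_path"
# (markdown, JSON, JSONL) are ingested and chunked via dataknobs-xization.
config = {
    "knowledge_base": {
        "enabled": True,
        "documents_path": "./docs",           # directory of source documents
        "chunking": {"max_chunk_size": 500},  # cap each chunk at ~500 units
    }
}
```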

🛠️ Tool Integration

  • Configuration-Driven - Load tools directly from configuration
  • XRef Support - Reference predefined tools across configurations
  • Built-in Tools - Knowledge search and common utilities
  • Custom Tools - Easy-to-implement tool interface

Learn more →
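Tools are declared as a list of class references in configuration, following the shape used in the Personal Assistant example below. The class paths here mirror that example and stand in for whatever tool classes your deployment provides:

```python
# Each tool entry names an importable class and its constructor params.
# The bot loads these at configuration time; no code changes are required.
config = {
    "tools": [
        {"class": "tools.CalendarTool", "params": {}},
        # "params" is passed to the tool; its contents are tool-specific
        # (the "units" key below is purely illustrative).
        {"class": "tools.WeatherTool", "params": {"units": "metric"}},
    ]
}
```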

🎯 Reasoning Strategies

  • Simple Reasoning - Direct LLM response
  • ReAct Reasoning - Reasoning + Acting pattern for tool-using agents
  • Wizard Reasoning - Guided conversational flows with FSM-backed state management
  • Grounded / Hybrid - Deterministic KB retrieval, with optional ReAct tool use
  • Configurable - Switch reasoning strategies via configuration
  • Extensible - Implement, register, and configure custom strategies via the plugin registry

Built-in strategies → | Custom strategies →
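Switching strategies is a one-line configuration change. The `"react"` identifier appears in the Personal Assistant example below; the other names here follow the strategy list above but are assumptions — the registry's exact identifiers may differ:

```python
# ReAct agent: the strategy name "react" is used in the use-case examples.
react_config = {"reasoning": {"strategy": "react"}}

# Assumed identifiers for the other built-in strategies; verify against
# the built-in strategies guide before relying on them.
wizard_config = {"reasoning": {"strategy": "wizard"}}
grounded_config = {"reasoning": {"strategy": "grounded"}}
```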

📊 Wizard Observability & Task Tracking

  • Task Tracking - Granular progress tracking within wizard stages
  • Transition Audit Trail - Complete history of state transitions with timestamps
  • State Snapshots - Read-only snapshots for UI display and debugging
  • Dependency Management - Task dependencies with automatic availability tracking
  • Progress Calculation - Task-based or stage-based progress metrics

Learn more →

📦 Artifact System

  • Artifact Registry - Async CRUD with provenance tracking and lifecycle hooks
  • Versioning - Create new versions while preserving history
  • Provenance - Full creation context: sources, tool chain, LLM invocations, revisions
  • Lifecycle Management - 8-state lifecycle with enforced transitions
  • Wizard Transforms - Pre-built transforms for create, review, revise, approve workflows
  • LLM Tools - 5 context-aware tools for conversational artifact management
  • Display Helpers - Markdown formatting for evaluations, comparisons, and provenance
  • Assessment Sessions - Quiz workflows with response tracking and scoring

Learn more →

✅ Rubric Evaluation System

  • Multi-Criteria Assessment - Weighted criteria with configurable quality levels
  • Three Scoring Methods - Deterministic functions, JSON Schema validation, LLM classification
  • Feedback Generation - LLM-enhanced or deterministic evaluation summaries
  • Rubric Registry - Versioned rubric storage backed by AsyncDatabase
  • Meta-Rubric - Built-in quality validation for rubric definitions

Learn more →

🧠 Context Accumulator

  • Progressive Context - Build context throughout conversation
  • Assumption Tracking - Track inferred vs. confirmed information
  • Prompt Injection - Generate formatted context for prompts
  • Priority Sections - Organize context by importance

Learn more →

🔄 Task Injection

  • Event-Driven Tasks - Add tasks dynamically based on events
  • Artifact Events - React to artifact creation, review completion
  • Stage Events - Respond to stage transitions
  • Block Transitions - Prevent progress until conditions are met

Learn more →

🎯 Focus Guards

  • Drift Detection - Identify when conversations go off-topic
  • Tangent Tolerance - Allow some flexibility before correction
  • Keyword Detection - Configure topic and off-topic keywords
  • Correction Prompts - Generate prompts to redirect conversation

Learn more →

📋 Config Versioning

  • Immutable Versions - Track configuration changes over time
  • Rollback - Restore previous configurations
  • Diff - Compare versions to see changes
  • Audit Trail - Complete history with timestamps and reasons

Learn more →

⚙️ Per-Request Config Overrides

  • Dynamic Model Selection - Switch models per-request without creating new bots
  • Parameter Tuning - Override temperature, max_tokens, top_p per request
  • A/B Testing - Compare models or parameters without redeployment
  • Cost Optimization - Route simple queries to cheaper models
  • Usage Tracking - Metadata tracks which overrides were applied

Learn more →

🔄 Dynamic Registration

  • Registry Backends - Store bot configs in PostgreSQL, S3, or HTTP services
  • Caching with TTL - LRU cache with time-based expiration
  • Hot Reload - Update bot configs without application restarts
  • Event-Driven Invalidation - Event bus integration for cache invalidation
  • Knowledge Storage - S3/file storage for raw knowledge files before embedding

Learn more →

Architecture

graph TD
    A[DynaBot] --> B[LLM Provider]
    A --> C[Conversation Storage]
    A --> D[Tool Registry]
    A --> E[Memory]
    A --> F[Knowledge Base]
    A --> G[Reasoning Strategy]

    B --> H[Ollama/OpenAI/Anthropic]
    C --> I[Memory/PostgreSQL]
    E --> J[Buffer/Vector]
    F --> K[RAG Pipeline]
    G --> L[Simple/ReAct/Wizard]

    style A fill:#4CAF50
    style B fill:#2196F3
    style C fill:#FF9800

Package Structure

dataknobs_bots/
├── bot/                    # Core bot implementation
│   ├── base.py            # DynaBot class
│   ├── context.py         # BotContext dataclass
│   ├── manager.py         # BotManager for multi-tenant bot management
│   └── registry.py        # Bot registry
├── middleware/            # Request/response middleware
│   ├── base.py           # Middleware interface
│   ├── cost.py           # Cost tracking
│   └── logging.py        # Logging middleware
├── api/                   # FastAPI integration
│   ├── dependencies.py   # Dependency injection
│   └── exceptions.py     # API exceptions
├── memory/                # Memory implementations
│   ├── base.py           # Memory interface
│   ├── buffer.py         # Buffer memory
│   └── vector.py         # Vector memory
├── knowledge/            # RAG implementation
│   └── rag.py           # RAGKnowledgeBase class
├── artifacts/           # Artifact management
│   ├── models.py       # Artifact, ArtifactStatus, ArtifactTypeDefinition
│   ├── provenance.py   # ProvenanceRecord, SourceReference, ToolInvocation, LLMInvocation, RevisionRecord
│   ├── registry.py     # ArtifactRegistry (async CRUD, lifecycle, review integration)
│   ├── transitions.py  # Status transition rules (TransitionValidator)
│   ├── transforms.py   # Wizard transforms (create, review, revise, approve, save_draft)
│   ├── tools.py        # LLM tools (Create, Update, Query, SubmitForReview, Get)
│   ├── display.py      # Markdown formatting (evaluation, comparison, provenance)
│   └── assessment.py   # Assessment sessions (StudentResponse, AssessmentSession, CumulativePerformance)
├── rubrics/             # Rubric evaluation system
│   ├── models.py       # Rubric, RubricCriterion, RubricLevel, ScoringMethod, CriterionResult, RubricEvaluation
│   ├── executor.py     # FunctionRegistry, RubricExecutor (deterministic, schema, LLM decode)
│   ├── feedback.py     # Feedback generation (LLM-enhanced and deterministic)
│   ├── registry.py     # RubricRegistry (versioned storage, target lookup)
│   └── meta.py         # Meta-rubric (structural quality validation)
├── context/             # Context management
│   ├── accumulator.py  # ConversationContext, Assumption
│   └── builder.py      # ContextBuilder, ContextPersister
├── config/              # Configuration utilities
│   ├── resolution.py   # Resource resolution
│   └── versioning.py   # ConfigVersionManager
├── reasoning/           # Reasoning strategies
│   ├── base.py         # ReasoningStrategy interface & StrategyCapabilities
│   ├── registry.py     # Strategy plugin registry
│   ├── simple.py       # Simple reasoning
│   ├── react.py        # ReAct reasoning
│   ├── wizard.py       # Wizard reasoning
│   ├── grounded.py     # Grounded reasoning (deterministic retrieval)
│   ├── hybrid.py       # Hybrid reasoning (grounded + ReAct)
│   ├── wizard_fsm.py   # FSM wrapper for wizards
│   ├── wizard_loader.py # Config loader
│   ├── wizard_hooks.py # Lifecycle hooks
│   ├── observability.py # Task tracking & state snapshots
│   ├── task_injection.py # Dynamic task injection
│   └── focus_guard.py  # Drift detection & correction
├── tools/              # Built-in tools
│   └── knowledge_search.py
└── utils/              # Utilities

Documentation

Guides

Reasoning Strategies

Artifact & Evaluation System

  • Artifact System - Versioned artifacts with provenance, lifecycle, transforms, tools, and display helpers
  • Rubric Evaluation - Multi-criteria evaluation with deterministic, schema, and LLM scoring
  • Context Accumulator - Build and manage conversation context

Advanced Features

API Reference

Examples

Installation

Basic Installation

pip install dataknobs-bots

With Optional Dependencies

# PostgreSQL storage
pip install dataknobs-bots[postgres]

# Vector memory with FAISS
pip install dataknobs-bots[faiss]

# All optional dependencies
pip install dataknobs-bots[all]

# For development
pip install dataknobs-bots[dev]

Requirements

  • Python 3.12+
  • dataknobs-config (automatically installed)
  • dataknobs-llm (automatically installed)
  • dataknobs-data (automatically installed)
  • dataknobs-xization (automatically installed)

Use Cases

Customer Support Bot

Multi-tenant bot with knowledge base for customer support:

config = {
    "llm": {"provider": "openai", "model": "gpt-4"},
    "conversation_storage": {"backend": "postgres"},
    "memory": {"type": "buffer", "max_messages": 20},
    "knowledge_base": {
        "enabled": True,
        "documents_path": "./support_docs"
    }
}

bot = await DynaBot.from_config(config)

Personal Assistant

Agent with tools for task automation:

config = {
    "llm": {"provider": "anthropic", "model": "claude-3-sonnet"},
    "reasoning": {"strategy": "react"},
    "tools": [
        {"class": "tools.CalendarTool", "params": {}},
        {"class": "tools.EmailTool", "params": {}},
        {"class": "tools.WeatherTool", "params": {}}
    ]
}

bot = await DynaBot.from_config(config)

Document Q&A

RAG-powered document question answering:

config = {
    "llm": {"provider": "ollama", "model": "llama3.1:8b"},
    "knowledge_base": {
        "enabled": True,
        "documents_path": "./company_docs",
        "chunking": {"max_chunk_size": 500}
    }
}

bot = await DynaBot.from_config(config)

Testing

# Run tests
cd packages/bots
uv run pytest tests/ -v

# With coverage
uv run pytest tests/ --cov=src/dataknobs_bots --cov-report=html

# Run integration tests (requires Ollama)
TEST_OLLAMA=true uv run pytest tests/integration/ -v

Contributing

Contributions are welcome! Please see the Contributing Guide for details.

License

MIT License - see LICENSE for details.

Support

Roadmap

  • Streaming responses
  • Per-request LLM config overrides (A/B testing, dynamic model selection)
  • Artifact tracking with provenance, lifecycle management, and wizard transforms
  • Rubric evaluation system with deterministic, schema, and LLM scoring
  • Assessment sessions with response tracking and cumulative performance
  • Context accumulator with assumption tracking
  • Task injection for dynamic workflows
  • Focus guards for drift detection
  • Configuration versioning
  • Multi-modal support (images, audio)
  • Advanced memory strategies (hybrid, hierarchical)
  • Tool marketplace
  • Web UI for bot management
  • Performance monitoring and analytics
  • Voice interface support