
dataknobs-bots Complete API Reference

Complete auto-generated API documentation from source code docstrings.

💡 Also see:

- Curated API Guide - Hand-crafted tutorials and examples
- Package Overview - Introduction and getting started
- Source Code - View on GitHub


dataknobs_bots

DataKnobs Bots - Configuration-driven AI agents.

Modules:

Name Description
api

FastAPI integration components for dataknobs_bots.

artifacts

Artifact management for conversational workflows.

bot

Bot core components.

config

Configuration utilities for DynaBot.

context

Context management for conversational workflows.

generators

Deterministic content generators for structured output production.

knowledge

Knowledge base implementations for DynaBot.

memory

Memory implementations for DynaBot.

middleware

Middleware components for bot request/response lifecycle.

providers

Provider creation utilities for dataknobs-bots.

reasoning

Reasoning strategies for DynaBot.

registry

Registry module for bot registration storage and management.

review

Review system for validating artifacts.

rubrics

Rubric-based evaluation system for structured content assessment.

testing

Testing utilities for dataknobs-bots.

tools

Tools for DynaBot.

utils

Utility functions and helpers for the dataknobs_bots package.

Classes:

Name Description
BotContext

Runtime context for bot execution.

BotManager

Manages multiple DynaBot instances for multi-tenancy.

BotRegistry

Multi-tenant bot registry with caching and environment support.

DynaBot

Configuration-driven chatbot leveraging the DataKnobs ecosystem.

UndoResult

Result of an undo operation.

ConfigDraftManager

File-based draft manager for interactive config creation.

ConfigTemplate

A reusable DynaBot configuration template.

ConfigTemplateRegistry

Registry for managing and applying configuration templates.

ConfigValidator

Pluggable validation engine for DynaBot configurations.

DraftMetadata

Metadata for a configuration draft.

DynaBotConfigBuilder

Fluent builder for DynaBot configurations.

DynaBotConfigSchema

Queryable registry of valid DynaBot configuration options.

TemplateVariable

Definition of a template variable.

ToolCatalog

Registry mapping tool names to class paths and default configuration.

ToolEntry

Metadata for a tool in the catalog.

ValidationResult

Result of validating a configuration.

RAGKnowledgeBase

RAG knowledge base using dataknobs-xization for chunking and vector search.

BufferMemory

Simple buffer memory keeping last N messages.

CompositeMemory

Combines multiple memory strategies into one.

Memory

Abstract base class for memory implementations.

SummaryMemory

Memory that summarizes older messages to maintain long context windows.

VectorMemory

Vector-based semantic memory using dataknobs-data vector stores.

CostTrackingMiddleware

Middleware for tracking LLM API costs and usage.

LoggingMiddleware

Middleware for tracking conversation interactions.

Middleware

Base class for bot middleware.

ReActReasoning

ReAct (Reasoning + Acting) strategy.

ReasoningStrategy

Abstract base class for reasoning strategies.

SimpleReasoning

Simple reasoning strategy that makes direct LLM calls.

StrategyCapabilities

Declares what a reasoning strategy manages autonomously.

StrategyRegistry

Registry mapping strategy names to their factories.

BotTestHarness

High-level test helper for ALL DynaBot behavioral tests.

CaptureReplay

Loads a capture JSON file and creates pre-loaded EchoProviders.

TurnResult

Result of a single bot.chat() or bot.greet() turn.

WizardConfigBuilder

Fluent builder for wizard configuration dicts.

AddKBResourceTool

Tool for adding a resource to the knowledge base resource list.

CheckKnowledgeSourceTool

Tool for verifying a knowledge source directory exists and has content.

GetTemplateDetailsTool

Tool for getting detailed information about a template.

IngestKnowledgeBaseTool

Tool for writing the KB ingestion manifest and finalizing KB config.

KnowledgeSearchTool

Tool for searching the knowledge base.

ListAvailableToolsTool

Tool for listing tools available to configure for a bot.

ListKBResourcesTool

Tool for listing currently tracked knowledge base resources.

ListTemplatesTool

Tool for listing available configuration templates.

PreviewConfigTool

Tool for previewing the configuration being built.

RemoveKBResourceTool

Tool for removing a resource from the knowledge base resource list.

SaveConfigTool

Tool for saving/finalizing the configuration.

Functions:

Name Description
normalize_wizard_state

Normalize wizard metadata to canonical structure.

create_default_catalog

Create a new ToolCatalog pre-populated with built-in tools.

create_knowledge_base_from_config

Create knowledge base from configuration.

create_memory_from_config

Create memory instance from configuration.

create_reasoning_from_config

Create reasoning strategy from configuration.

register_strategy

Register a custom reasoning strategy.

inject_providers

Inject LLM providers into a DynaBot instance for testing.

Attributes:

Name Type Description
default_catalog ToolCatalog

Module-level singleton catalog pre-populated with built-in tools.

Attributes

default_catalog module-attribute

default_catalog: ToolCatalog = ToolCatalog()

Module-level singleton catalog pre-populated with built-in tools.
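A module-level singleton of this shape is typically just a bare instantiation at import time, so every importer shares one pre-populated instance. The sketch below illustrates the pattern with a hypothetical `MiniCatalog` (not the real `ToolCatalog` API; the registered class path is illustrative only):

```python
class MiniCatalog:
    """Hypothetical stand-in for a catalog mapping tool names to class paths."""

    def __init__(self) -> None:
        self._entries: dict[str, str] = {}

    def register(self, name: str, class_path: str) -> None:
        # Later registrations with the same name overwrite earlier ones
        self._entries[name] = class_path

    def list_tools(self) -> list[str]:
        return sorted(self._entries)


# Module-level singleton: created once at import, shared by all importers
default_catalog = MiniCatalog()
default_catalog.register("knowledge_search", "dataknobs_bots.tools.KnowledgeSearchTool")
print(default_catalog.list_tools())
```

Because the instance is created at import time, any module that does `from ... import default_catalog` sees the same registrations; `create_default_catalog` exists for callers who want a fresh, independent catalog instead.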

Classes

BotContext dataclass

BotContext(
    conversation_id: str,
    client_id: str,
    user_id: str | None = None,
    session_metadata: dict[str, Any] = dict(),
    request_metadata: dict[str, Any] = dict(),
)

Runtime context for bot execution.

Supports dict-like access for dynamic attributes via request_metadata. Use context["key"] or context.get("key") for dynamic data.
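The dict-like access pattern described above can be sketched with a minimal stand-in dataclass (a simplified illustration of the pattern, not the actual BotContext implementation):

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class MiniContext:
    """Simplified stand-in showing BotContext's dict-like access pattern."""

    conversation_id: str
    client_id: str
    request_metadata: dict[str, Any] = field(default_factory=dict)

    def __getitem__(self, key: str) -> Any:
        # Delegates to request_metadata; raises KeyError if absent
        return self.request_metadata[key]

    def __setitem__(self, key: str, value: Any) -> None:
        self.request_metadata[key] = value

    def __contains__(self, key: str) -> bool:
        return key in self.request_metadata

    def get(self, key: str, default: Any = None) -> Any:
        return self.request_metadata.get(key, default)


ctx = MiniContext(conversation_id="conv-1", client_id="client-1")
ctx["locale"] = "en-US"           # stored in request_metadata
print(ctx["locale"])              # dict-style read
print("locale" in ctx)            # membership check
print(ctx.get("missing", "n/a"))  # safe read with default
```

Note that fixed fields such as `conversation_id` remain ordinary attributes; only dynamic, per-request data flows through the `[]`/`get` interface backed by `request_metadata`.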

Attributes:

Name Type Description
conversation_id str

Unique identifier for the conversation

client_id str

Identifier for the client/tenant

user_id str | None

Optional user identifier

session_metadata dict[str, Any]

Metadata for the session

request_metadata dict[str, Any]

Metadata for the current request (also used for dict-like access)

Methods:

Name Description
__getitem__

Get item from request_metadata using dict-like access.

__setitem__

Set item in request_metadata using dict-like access.

__contains__

Check if key exists in request_metadata.

get

Get item from request_metadata with optional default.

copy

Create a copy of this context with optional field overrides.

Functions
__getitem__
__getitem__(key: str) -> Any

Get item from request_metadata using dict-like access.

Parameters:

Name Type Description Default
key str

Key to retrieve

required

Returns:

Type Description
Any

Value from request_metadata

Raises:

Type Description
KeyError

If key not found in request_metadata

Source code in packages/bots/src/dataknobs_bots/bot/context.py
def __getitem__(self, key: str) -> Any:
    """Get item from request_metadata using dict-like access.

    Args:
        key: Key to retrieve

    Returns:
        Value from request_metadata

    Raises:
        KeyError: If key not found in request_metadata
    """
    return self.request_metadata[key]
__setitem__
__setitem__(key: str, value: Any) -> None

Set item in request_metadata using dict-like access.

Parameters:

Name Type Description Default
key str

Key to set

required
value Any

Value to store

required
Source code in packages/bots/src/dataknobs_bots/bot/context.py
def __setitem__(self, key: str, value: Any) -> None:
    """Set item in request_metadata using dict-like access.

    Args:
        key: Key to set
        value: Value to store
    """
    self.request_metadata[key] = value
__contains__
__contains__(key: str) -> bool

Check if key exists in request_metadata.

Parameters:

Name Type Description Default
key str

Key to check

required

Returns:

Type Description
bool

True if key exists in request_metadata

Source code in packages/bots/src/dataknobs_bots/bot/context.py
def __contains__(self, key: str) -> bool:
    """Check if key exists in request_metadata.

    Args:
        key: Key to check

    Returns:
        True if key exists in request_metadata
    """
    return key in self.request_metadata
get
get(key: str, default: Any = None) -> Any

Get item from request_metadata with optional default.

Parameters:

Name Type Description Default
key str

Key to retrieve

required
default Any

Default value if key not found

None

Returns:

Type Description
Any

Value from request_metadata or default

Source code in packages/bots/src/dataknobs_bots/bot/context.py
def get(self, key: str, default: Any = None) -> Any:
    """Get item from request_metadata with optional default.

    Args:
        key: Key to retrieve
        default: Default value if key not found

    Returns:
        Value from request_metadata or default
    """
    return self.request_metadata.get(key, default)
copy
copy(**overrides: Any) -> BotContext

Create a copy of this context with optional field overrides.

Creates shallow copies of session_metadata and request_metadata dicts to avoid mutation issues between the original and copy.

Parameters:

Name Type Description Default
**overrides Any

Field values to override in the copy

{}

Returns:

Type Description
BotContext

New BotContext instance with copied values

Example

ctx = BotContext(conversation_id="conv-1", client_id="client-1")
ctx2 = ctx.copy(conversation_id="conv-2")
ctx2.conversation_id  # 'conv-2'

Source code in packages/bots/src/dataknobs_bots/bot/context.py
def copy(self, **overrides: Any) -> "BotContext":
    """Create a copy of this context with optional field overrides.

    Creates shallow copies of session_metadata and request_metadata dicts
    to avoid mutation issues between the original and copy.

    Args:
        **overrides: Field values to override in the copy

    Returns:
        New BotContext instance with copied values

    Example:
        >>> ctx = BotContext(conversation_id="conv-1", client_id="client-1")
        >>> ctx2 = ctx.copy(conversation_id="conv-2")
        >>> ctx2.conversation_id
        'conv-2'
    """
    return BotContext(
        conversation_id=overrides.get("conversation_id", self.conversation_id),
        client_id=overrides.get("client_id", self.client_id),
        user_id=overrides.get("user_id", self.user_id),
        session_metadata=overrides.get(
            "session_metadata", dict(self.session_metadata)
        ),
        request_metadata=overrides.get(
            "request_metadata", dict(self.request_metadata)
        ),
    )

BotManager

BotManager(
    config_loader: ConfigLoaderType | None = None,
    environment: EnvironmentConfig | str | None = None,
    env_dir: str | Path = "config/environments",
)

Manages multiple DynaBot instances for multi-tenancy.

Deprecated. Use BotRegistry or InMemoryBotRegistry instead.

BotManager handles:

- Bot instance creation and caching
- Client-level isolation
- Configuration loading and validation
- Bot lifecycle management
- Environment-aware resource resolution (optional)

Each client/tenant gets its own bot instance, which can serve multiple users. The underlying DynaBot architecture ensures conversation isolation through BotContext with different conversation_ids.

Attributes:

Name Type Description
bots

Cache of bot_id -> DynaBot instances

config_loader

Optional configuration loader (sync or async)

environment_name str | None

Current environment name (if environment-aware)

Example
# Basic usage with inline configuration
manager = BotManager()
bot = await manager.get_or_create("my-bot", config={
    "llm": {"provider": "openai", "model": "gpt-4o"},
    "conversation_storage": {"backend": "memory"},
})

# With environment-aware configuration
manager = BotManager(environment="production")
bot = await manager.get_or_create("my-bot", config={
    "bot": {
        "llm": {"$resource": "default", "type": "llm_providers"},
        "conversation_storage": {"$resource": "db", "type": "databases"},
    }
})

# With config loader function
def load_config(bot_id: str) -> dict:
    return load_yaml(f"configs/{bot_id}.yaml")

manager = BotManager(config_loader=load_config)
bot = await manager.get_or_create("my-bot")

# List active bots
active_bots = manager.list_bots()

Initialize BotManager.

Parameters:

Name Type Description Default
config_loader ConfigLoaderType | None

Optional configuration loader. Can be:

- An object with a .load(bot_id) method (sync or async)
- A callable function: bot_id -> config_dict (sync or async)
- None (configurations must be provided explicitly)

None
environment EnvironmentConfig | str | None

Environment name or EnvironmentConfig for resource resolution. If None, environment-aware features are disabled unless an EnvironmentAwareConfig is passed to get_or_create(). If a string, loads environment config from env_dir.

None
env_dir str | Path

Directory containing environment config files. Only used if environment is a string name.

'config/environments'

Methods:

Name Description
get_or_create

Get existing bot or create new one.

get

Get a bot without creating it if it doesn't exist.

remove

Remove bot instance.

reload

Reload bot instance with fresh configuration.

list_bots

List all active bot IDs.

get_bot_count

Get count of active bots.

clear_all

Clear all bot instances.

get_portable_config

Get portable configuration for storage.

__repr__

String representation.

Source code in packages/bots/src/dataknobs_bots/bot/manager.py
def __init__(
    self,
    config_loader: ConfigLoaderType | None = None,
    environment: EnvironmentConfig | str | None = None,
    env_dir: str | Path = "config/environments",
):
    """Initialize BotManager.

    Args:
        config_loader: Optional configuration loader.
            Can be:
            - An object with a `.load(bot_id)` method (sync or async)
            - A callable function: bot_id -> config_dict (sync or async)
            - None (configurations must be provided explicitly)
        environment: Environment name or EnvironmentConfig for resource resolution.
            If None, environment-aware features are disabled unless
            an EnvironmentAwareConfig is passed to get_or_create().
            If a string, loads environment config from env_dir.
        env_dir: Directory containing environment config files.
            Only used if environment is a string name.
    """
    warnings.warn(_DEPRECATION_MESSAGE, DeprecationWarning, stacklevel=2)

    self._bots: dict[str, DynaBot] = {}
    self._config_loader = config_loader
    self._env_dir = Path(env_dir)

    # Load environment config if specified
    self._environment: EnvironmentConfig | None = None
    if environment is not None:
        try:
            from dataknobs_config import EnvironmentConfig

            if isinstance(environment, str):
                self._environment = EnvironmentConfig.load(environment, env_dir)
            else:
                self._environment = environment
            logger.info(f"Initialized BotManager with environment: {self._environment.name}")
        except ImportError:
            logger.warning(
                "dataknobs_config not installed, environment-aware features disabled"
            )
    else:
        logger.info("Initialized BotManager")
Attributes
environment_name property
environment_name: str | None

Get current environment name, or None if not environment-aware.

environment property
environment: EnvironmentConfig | None

Get current environment config, or None if not environment-aware.

Functions
get_or_create async
get_or_create(
    bot_id: str,
    config: dict[str, Any] | EnvironmentAwareConfig | None = None,
    use_environment: bool | None = None,
    config_key: str = "bot",
) -> DynaBot

Get existing bot or create new one.

Parameters:

Name Type Description Default
bot_id str

Bot identifier (e.g., "customer-support", "sales-assistant")

required
config dict[str, Any] | EnvironmentAwareConfig | None

Optional bot configuration. Can be:

- dict with resolved values (traditional)
- dict with $resource references (requires environment)
- EnvironmentAwareConfig instance

If not provided and config_loader is set, the configuration will be loaded.

None
use_environment bool | None

Whether to use environment-aware resolution.

- True: Use environment for $resource resolution
- False: Use config as-is (no resolution)
- None (default): Auto-detect based on whether the manager has an environment configured or config is an EnvironmentAwareConfig

None
config_key str

Key within config containing bot configuration. Defaults to "bot". Set to None to use root config. Only used when use_environment is True.

'bot'

Returns:

Type Description
DynaBot

DynaBot instance

Raises:

Type Description
ValueError

If config is None and no config_loader is set

Example
# Traditional usage (no environment resolution)
manager = BotManager()
bot = await manager.get_or_create("support-bot", config={
    "llm": {"provider": "openai", "model": "gpt-4"},
    "conversation_storage": {"backend": "memory"},
})

# Environment-aware usage with $resource references
manager = BotManager(environment="production")
bot = await manager.get_or_create("support-bot", config={
    "bot": {
        "llm": {"$resource": "default", "type": "llm_providers"},
        "conversation_storage": {"$resource": "db", "type": "databases"},
    }
})

# Explicit environment resolution control
bot = await manager.get_or_create(
    "support-bot",
    config=my_config,
    use_environment=True,
    config_key="bot"
)
Source code in packages/bots/src/dataknobs_bots/bot/manager.py
async def get_or_create(
    self,
    bot_id: str,
    config: dict[str, Any] | EnvironmentAwareConfig | None = None,
    use_environment: bool | None = None,
    config_key: str = "bot",
) -> DynaBot:
    """Get existing bot or create new one.

    Args:
        bot_id: Bot identifier (e.g., "customer-support", "sales-assistant")
        config: Optional bot configuration. Can be:
            - dict with resolved values (traditional)
            - dict with $resource references (requires environment)
            - EnvironmentAwareConfig instance
            If not provided and config_loader is set, will load configuration.
        use_environment: Whether to use environment-aware resolution.
            - True: Use environment for $resource resolution
            - False: Use config as-is (no resolution)
            - None (default): Auto-detect based on whether manager has
              an environment configured or config is EnvironmentAwareConfig
        config_key: Key within config containing bot configuration.
                   Defaults to "bot". Set to None to use root config.
                   Only used when use_environment is True.

    Returns:
        DynaBot instance

    Raises:
        ValueError: If config is None and no config_loader is set

    Example:
        ```python
        # Traditional usage (no environment resolution)
        manager = BotManager()
        bot = await manager.get_or_create("support-bot", config={
            "llm": {"provider": "openai", "model": "gpt-4"},
            "conversation_storage": {"backend": "memory"},
        })

        # Environment-aware usage with $resource references
        manager = BotManager(environment="production")
        bot = await manager.get_or_create("support-bot", config={
            "bot": {
                "llm": {"$resource": "default", "type": "llm_providers"},
                "conversation_storage": {"$resource": "db", "type": "databases"},
            }
        })

        # Explicit environment resolution control
        bot = await manager.get_or_create(
            "support-bot",
            config=my_config,
            use_environment=True,
            config_key="bot"
        )
        ```
    """
    # Return cached bot if exists
    if bot_id in self._bots:
        logger.debug(f"Returning cached bot: {bot_id}")
        return self._bots[bot_id]

    # Load configuration if not provided
    if config is None:
        if self._config_loader is None:
            raise ValueError(
                f"No configuration provided for bot '{bot_id}' "
                "and no config_loader is set"
            )
        config = await self._load_config(bot_id)

    # Determine whether to use environment resolution
    is_env_aware_config = False
    try:
        from dataknobs_config import EnvironmentAwareConfig

        is_env_aware_config = isinstance(config, EnvironmentAwareConfig)
    except ImportError:
        pass

    should_use_environment = use_environment
    if should_use_environment is None:
        # Auto-detect: use environment if manager has one or config is EnvironmentAwareConfig
        should_use_environment = self._environment is not None or is_env_aware_config

    # Create new bot
    logger.info(f"Creating new bot: {bot_id} (environment_aware={should_use_environment})")

    if should_use_environment:
        bot = await DynaBot.from_environment_aware_config(
            config,
            environment=self._environment,
            env_dir=self._env_dir,
            config_key=config_key,
        )
    else:
        # Traditional path - use config as-is
        bot = await DynaBot.from_config(config)

    # Cache and return
    self._bots[bot_id] = bot
    return bot
get async
get(bot_id: str) -> DynaBot | None

Get a bot without creating it if it doesn't exist.

Parameters:

Name Type Description Default
bot_id str

Bot identifier

required

Returns:

Type Description
DynaBot | None

DynaBot instance if exists, None otherwise

Source code in packages/bots/src/dataknobs_bots/bot/manager.py
async def get(self, bot_id: str) -> DynaBot | None:
    """Get bot without creating if doesn't exist.

    Args:
        bot_id: Bot identifier

    Returns:
        DynaBot instance if exists, None otherwise
    """
    return self._bots.get(bot_id)
remove async
remove(bot_id: str) -> bool

Remove bot instance.

Parameters:

Name Type Description Default
bot_id str

Bot identifier

required

Returns:

Type Description
bool

True if bot was removed, False if didn't exist

Source code in packages/bots/src/dataknobs_bots/bot/manager.py
async def remove(self, bot_id: str) -> bool:
    """Remove bot instance.

    Args:
        bot_id: Bot identifier

    Returns:
        True if bot was removed, False if didn't exist
    """
    if bot_id in self._bots:
        logger.info(f"Removing bot: {bot_id}")
        del self._bots[bot_id]
        return True
    return False
reload async
reload(bot_id: str) -> DynaBot

Reload bot instance with fresh configuration.

Parameters:

Name Type Description Default
bot_id str

Bot identifier

required

Returns:

Type Description
DynaBot

New DynaBot instance

Raises:

Type Description
ValueError

If no config_loader is set

Source code in packages/bots/src/dataknobs_bots/bot/manager.py
async def reload(self, bot_id: str) -> DynaBot:
    """Reload bot instance with fresh configuration.

    Args:
        bot_id: Bot identifier

    Returns:
        New DynaBot instance

    Raises:
        ValueError: If no config_loader is set
    """
    if self._config_loader is None:
        raise ValueError("Cannot reload without config_loader")

    # Remove existing bot
    await self.remove(bot_id)

    # Create new one
    return await self.get_or_create(bot_id)
list_bots
list_bots() -> list[str]

List all active bot IDs.

Returns:

Type Description
list[str]

List of bot identifiers

Source code in packages/bots/src/dataknobs_bots/bot/manager.py
def list_bots(self) -> list[str]:
    """List all active bot IDs.

    Returns:
        List of bot identifiers
    """
    return list(self._bots.keys())
get_bot_count
get_bot_count() -> int

Get count of active bots.

Returns:

Type Description
int

Number of active bot instances

Source code in packages/bots/src/dataknobs_bots/bot/manager.py
def get_bot_count(self) -> int:
    """Get count of active bots.

    Returns:
        Number of active bot instances
    """
    return len(self._bots)
clear_all async
clear_all() -> None

Clear all bot instances.

Useful for testing or when restarting the service.

Source code in packages/bots/src/dataknobs_bots/bot/manager.py
async def clear_all(self) -> None:
    """Clear all bot instances.

    Useful for testing or when restarting the service.
    """
    logger.info("Clearing all bot instances")
    self._bots.clear()
get_portable_config
get_portable_config(
    config: dict[str, Any] | EnvironmentAwareConfig,
) -> dict[str, Any]

Get portable configuration for storage.

Extracts portable config (with $resource references intact, environment variables unresolved) suitable for storing in registries or databases.

Parameters:

Name Type Description Default
config dict[str, Any] | EnvironmentAwareConfig

Configuration to make portable. Can be dict or EnvironmentAwareConfig.

required

Returns:

Type Description
dict[str, Any]

Portable configuration dictionary

Example
manager = BotManager(environment="production")

# Get portable config from EnvironmentAwareConfig
portable = manager.get_portable_config(env_aware_config)

# Store in registry (portable across environments)
await registry.store(bot_id, portable)
Source code in packages/bots/src/dataknobs_bots/bot/manager.py
def get_portable_config(
    self,
    config: dict[str, Any] | EnvironmentAwareConfig,
) -> dict[str, Any]:
    """Get portable configuration for storage.

    Extracts portable config (with $resource references intact,
    environment variables unresolved) suitable for storing in
    registries or databases.

    Args:
        config: Configuration to make portable.
            Can be dict or EnvironmentAwareConfig.

    Returns:
        Portable configuration dictionary

    Example:
        ```python
        manager = BotManager(environment="production")

        # Get portable config from EnvironmentAwareConfig
        portable = manager.get_portable_config(env_aware_config)

        # Store in registry (portable across environments)
        await registry.store(bot_id, portable)
        ```
    """
    return DynaBot.get_portable_config(config)
__repr__
__repr__() -> str

String representation.

Source code in packages/bots/src/dataknobs_bots/bot/manager.py
def __repr__(self) -> str:
    """String representation."""
    bots = ", ".join(self._bots.keys())
    env = f", environment={self._environment.name!r}" if self._environment else ""
    return f"BotManager(bots=[{bots}], count={len(self._bots)}{env})"

BotRegistry

BotRegistry(
    backend: RegistryBackend | None = None,
    environment: EnvironmentConfig | str | None = None,
    env_dir: str | Path = "config/environments",
    cache_ttl: int = 300,
    max_cache_size: int = 1000,
    validate_on_register: bool = True,
    config_key: str = "bot",
)

Multi-tenant bot registry with caching and environment support.

The BotRegistry manages multiple bot instances for different clients/tenants. It provides:

- Pluggable storage backends via the RegistryBackend protocol
- Environment-aware configuration resolution
- Portability validation to ensure configs work across environments
- LRU-style caching with TTL for bot instances
- Thread-safe access

This enables:

- Multi-tenant SaaS platforms
- A/B testing with different bot configurations
- Horizontal scaling with stateless bot instances
- Cross-environment deployment with portable configs

Attributes:

Name Type Description
backend RegistryBackend

Storage backend for configurations

environment EnvironmentConfig | None

Environment for $resource resolution

cache_ttl int

Time-to-live for cached bots in seconds

max_cache_size int

Maximum number of bots to cache

Example
from dataknobs_bots.bot import BotRegistry
from dataknobs_bots.registry import InMemoryBackend

# Create registry
registry = BotRegistry(
    backend=InMemoryBackend(),
    environment="production",
    cache_ttl=300,
)
await registry.initialize()

# Register portable configuration
await registry.register("client-123", {
    "bot": {
        "llm": {"$resource": "default", "type": "llm_providers"},
    }
})

# Get bot for a client
bot = await registry.get_bot("client-123")

# Use the bot
response = await bot.chat(message, context)

Initialize bot registry.

Parameters:

Name Type Description Default
backend RegistryBackend | None

Storage backend for configurations. If None, uses InMemoryBackend.

None
environment EnvironmentConfig | str | None

Environment name or EnvironmentConfig for $resource resolution. If None, configs are used as-is without environment resolution.

None
env_dir str | Path

Directory containing environment config files. Only used if environment is a string name.

'config/environments'
cache_ttl int

Cache time-to-live in seconds (default: 300)

300
max_cache_size int

Maximum cached bots (default: 1000)

1000
validate_on_register bool

If True, validate config portability when registering (default: True)

True
config_key str

Key within config containing bot configuration. Defaults to "bot". Used during environment resolution.

'bot'

Methods:

Name Description
initialize

Initialize the registry and backend.

close

Close the registry and backend.

register

Register or update a bot configuration.

get_bot

Get bot instance for a client.

get_config

Get stored configuration for a bot.

get_registration

Get full registration including metadata.

unregister

Remove a bot registration (hard delete).

deactivate

Deactivate a bot registration (soft delete).

exists

Check if an active bot registration exists.

list_bots

List all active bot IDs.

count

Count active bot registrations.

get_cached_bots

Get list of currently cached bot IDs.

clear_cache

Clear all cached bot instances.

register_client

Register or update a client's bot configuration.

remove_client

Remove a client from the registry.

get_cached_clients

Get list of currently cached client IDs.

__repr__

String representation.

Source code in packages/bots/src/dataknobs_bots/bot/registry.py
def __init__(
    self,
    backend: RegistryBackend | None = None,
    environment: EnvironmentConfig | str | None = None,
    env_dir: str | Path = "config/environments",
    cache_ttl: int = 300,
    max_cache_size: int = 1000,
    validate_on_register: bool = True,
    config_key: str = "bot",
):
    """Initialize bot registry.

    Args:
        backend: Storage backend for configurations.
            If None, uses InMemoryBackend.
        environment: Environment name or EnvironmentConfig for
            $resource resolution. If None, configs are used as-is
            without environment resolution.
        env_dir: Directory containing environment config files.
            Only used if environment is a string name.
        cache_ttl: Cache time-to-live in seconds (default: 300)
        max_cache_size: Maximum cached bots (default: 1000)
        validate_on_register: If True, validate config portability
            when registering (default: True)
        config_key: Key within config containing bot configuration.
            Defaults to "bot". Used during environment resolution.
    """
    self._backend = backend or InMemoryBackend()
    self._env_dir = Path(env_dir)
    self._cache_ttl = cache_ttl
    self._max_cache_size = max_cache_size
    self._validate_on_register = validate_on_register
    self._config_key = config_key

    # Bot instance cache: bot_id -> (DynaBot, cached_timestamp)
    self._cache: dict[str, tuple[DynaBot, float]] = {}
    self._lock = asyncio.Lock()
    self._initialized = False

    # Load environment config if specified
    self._environment: EnvironmentConfig | None = None
    if environment is not None:
        try:
            from dataknobs_config import EnvironmentConfig as EnvConfig

            if isinstance(environment, str):
                self._environment = EnvConfig.load(environment, env_dir)
            else:
                self._environment = environment
            logger.info(f"BotRegistry using environment: {self._environment.name}")
        except ImportError:
            logger.warning(
                "dataknobs_config not installed, environment-aware features disabled"
            )
Attributes
backend property
backend: RegistryBackend

Get the storage backend.

environment property
environment: EnvironmentConfig | None

Get current environment config, or None if not environment-aware.

environment_name property
environment_name: str | None

Get current environment name, or None if not environment-aware.

cache_ttl property
cache_ttl: int

Get cache TTL in seconds.

max_cache_size property
max_cache_size: int

Get maximum cache size.

Functions
initialize async
initialize() -> None

Initialize the registry and backend.

Must be called before using the registry.

Source code in packages/bots/src/dataknobs_bots/bot/registry.py
async def initialize(self) -> None:
    """Initialize the registry and backend.

    Must be called before using the registry.
    """
    if not self._initialized:
        await self._backend.initialize()
        self._initialized = True
        logger.info("BotRegistry initialized")
close async
close() -> None

Close the registry and backend.

Closes all cached bot instances and the storage backend.

Source code in packages/bots/src/dataknobs_bots/bot/registry.py
async def close(self) -> None:
    """Close the registry and backend.

    Closes all cached bot instances and the storage backend.
    """
    async with self._lock:
        for bot_id, (bot, _) in self._cache.items():
            await self._close_bot(bot_id, bot)
        self._cache.clear()
    await self._backend.close()
    self._initialized = False
    logger.info("BotRegistry closed")
register async
register(
    bot_id: str,
    config: dict[str, Any],
    status: str = "active",
    skip_validation: bool = False,
) -> Registration

Register or update a bot configuration.

Stores a portable configuration in the backend. By default, validates that the configuration is portable (no resolved local values).

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `bot_id` | `str` | Unique bot identifier | *required* |
| `config` | `dict[str, Any]` | Bot configuration dictionary (should be portable) | *required* |
| `status` | `str` | Registration status | `'active'` |
| `skip_validation` | `bool` | If True, skip portability validation | `False` |

Returns:

| Type | Description |
| --- | --- |
| `Registration` | Registration object with metadata |

Raises:

| Type | Description |
| --- | --- |
| `PortabilityError` | If config is not portable and validation is enabled |

Example
# Register with portable config
reg = await registry.register("support-bot", {
    "bot": {
        "llm": {"$resource": "default", "type": "llm_providers"},
    }
})
print(f"Registered at: {reg.created_at}")

# Update existing registration
reg = await registry.register("support-bot", new_config)
print(f"Updated at: {reg.updated_at}")
Source code in packages/bots/src/dataknobs_bots/bot/registry.py
async def register(
    self,
    bot_id: str,
    config: dict[str, Any],
    status: str = "active",
    skip_validation: bool = False,
) -> Registration:
    """Register or update a bot configuration.

    Stores a portable configuration in the backend. By default, validates
    that the configuration is portable (no resolved local values).

    Args:
        bot_id: Unique bot identifier
        config: Bot configuration dictionary (should be portable)
        status: Registration status (default: active)
        skip_validation: If True, skip portability validation

    Returns:
        Registration object with metadata

    Raises:
        PortabilityError: If config is not portable and validation is enabled

    Example:
        ```python
        # Register with portable config
        reg = await registry.register("support-bot", {
            "bot": {
                "llm": {"$resource": "default", "type": "llm_providers"},
            }
        })
        print(f"Registered at: {reg.created_at}")

        # Update existing registration
        reg = await registry.register("support-bot", new_config)
        print(f"Updated at: {reg.updated_at}")
        ```
    """
    # Validate portability if enabled
    if self._validate_on_register and not skip_validation:
        validate_portability(config)

    # Validate capability requirements if environment is available
    if self._validate_on_register and self._environment and not skip_validation:
        from .validation import validate_bot_capabilities

        # Extract the bot section if config_key is set
        bot_section = config.get(self._config_key, config) if self._config_key else config
        cap_warnings = validate_bot_capabilities(bot_section, self._environment)
        for warning in cap_warnings:
            logger.warning("Bot %s: %s", bot_id, warning)

    # Store in backend
    registration = await self._backend.register(bot_id, config, status)

    # Invalidate cache for this bot
    async with self._lock:
        if bot_id in self._cache:
            old_bot, _ = self._cache.pop(bot_id)
            await self._close_bot(bot_id, old_bot)
            logger.debug(f"Invalidated cache for bot: {bot_id}")

    logger.info(f"Registered bot: {bot_id}")
    return registration
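Portable configurations defer environment-specific values behind `$resource` references (as in the example above), which the registry resolves at bot-creation time. A small helper for inspecting which references a config will resolve — `find_resource_refs` is an illustrative sketch, not part of the library:

```python
def find_resource_refs(config, path=""):
    """Collect (path, resource-name) pairs for every {"$resource": ...}
    node in a nested config structure. Handy for auditing what a
    portable config will resolve against an environment at load time."""
    refs = []
    if isinstance(config, dict):
        if "$resource" in config:
            refs.append((path, config["$resource"]))
        for key, value in config.items():
            refs.extend(find_resource_refs(value, f"{path}.{key}".lstrip(".")))
    elif isinstance(config, list):
        for i, value in enumerate(config):
            refs.extend(find_resource_refs(value, f"{path}[{i}]"))
    return refs


cfg = {"bot": {"llm": {"$resource": "default", "type": "llm_providers"}}}
print(find_resource_refs(cfg))  # [('bot.llm', 'default')]
```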
get_bot async
get_bot(bot_id: str, force_refresh: bool = False) -> DynaBot

Get bot instance for a client.

Bots are cached for performance. If a cached bot exists and hasn't expired, it's returned. Otherwise, a new bot is created from the stored configuration with environment resolution applied.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `bot_id` | `str` | Bot identifier | *required* |
| `force_refresh` | `bool` | If True, bypass cache and create fresh bot | `False` |

Returns:

| Type | Description |
| --- | --- |
| `DynaBot` | DynaBot instance for the client |

Raises:

| Type | Description |
| --- | --- |
| `KeyError` | If no registration exists for the bot_id |
| `ValueError` | If bot configuration is invalid |

Example
# Get cached bot
bot = await registry.get_bot("client-123")

# Force refresh (e.g., after config change)
bot = await registry.get_bot("client-123", force_refresh=True)
Source code in packages/bots/src/dataknobs_bots/bot/registry.py
async def get_bot(
    self,
    bot_id: str,
    force_refresh: bool = False,
) -> DynaBot:
    """Get bot instance for a client.

    Bots are cached for performance. If a cached bot exists and hasn't
    expired, it's returned. Otherwise, a new bot is created from the
    stored configuration with environment resolution applied.

    Args:
        bot_id: Bot identifier
        force_refresh: If True, bypass cache and create fresh bot

    Returns:
        DynaBot instance for the client

    Raises:
        KeyError: If no registration exists for the bot_id
        ValueError: If bot configuration is invalid

    Example:
        ```python
        # Get cached bot
        bot = await registry.get_bot("client-123")

        # Force refresh (e.g., after config change)
        bot = await registry.get_bot("client-123", force_refresh=True)
        ```
    """
    async with self._lock:
        # Check cache
        if not force_refresh and bot_id in self._cache:
            bot, cached_at = self._cache[bot_id]
            if time.time() - cached_at < self._cache_ttl:
                logger.debug(f"Returning cached bot: {bot_id}")
                return bot

        # Close stale/replaced bot if present
        if bot_id in self._cache:
            old_bot, _ = self._cache.pop(bot_id)
            await self._close_bot(bot_id, old_bot)

        # Load configuration from backend
        config = await self._backend.get_config(bot_id)
        if config is None:
            raise KeyError(f"No bot configuration found for: {bot_id}")

        # Create bot with environment resolution if configured
        if self._environment is not None:
            logger.debug(f"Creating bot with environment resolution: {bot_id}")
            bot = await DynaBot.from_environment_aware_config(
                config,
                environment=self._environment,
                env_dir=self._env_dir,
                config_key=self._config_key,
            )
        else:
            # Traditional path - use config as-is
            # Extract bot config if wrapped in config_key
            bot_config = config.get(self._config_key, config)
            logger.debug(f"Creating bot without environment resolution: {bot_id}")
            bot = await DynaBot.from_config(bot_config)

        # Cache the bot
        self._cache[bot_id] = (bot, time.time())
        logger.info(f"Created bot: {bot_id}")

        # Evict old entries if cache is full
        if len(self._cache) > self._max_cache_size:
            await self._evict_oldest()

        return bot
get_config async
get_config(bot_id: str) -> dict[str, Any] | None

Get stored configuration for a bot.

Returns the portable configuration as stored, without environment resolution applied.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `bot_id` | `str` | Bot identifier | *required* |

Returns:

| Type | Description |
| --- | --- |
| `dict[str, Any] \| None` | Configuration dict if found, None otherwise |

Source code in packages/bots/src/dataknobs_bots/bot/registry.py
async def get_config(self, bot_id: str) -> dict[str, Any] | None:
    """Get stored configuration for a bot.

    Returns the portable configuration as stored, without
    environment resolution applied.

    Args:
        bot_id: Bot identifier

    Returns:
        Configuration dict if found, None otherwise
    """
    return await self._backend.get_config(bot_id)
get_registration async
get_registration(bot_id: str) -> Registration | None

Get full registration including metadata.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `bot_id` | `str` | Bot identifier | *required* |

Returns:

| Type | Description |
| --- | --- |
| `Registration \| None` | Registration if found, None otherwise |

Source code in packages/bots/src/dataknobs_bots/bot/registry.py
async def get_registration(self, bot_id: str) -> Registration | None:
    """Get full registration including metadata.

    Args:
        bot_id: Bot identifier

    Returns:
        Registration if found, None otherwise
    """
    return await self._backend.get(bot_id)
unregister async
unregister(bot_id: str) -> bool

Remove a bot registration (hard delete).

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `bot_id` | `str` | Bot identifier | *required* |

Returns:

| Type | Description |
| --- | --- |
| `bool` | True if removed, False if not found |

Source code in packages/bots/src/dataknobs_bots/bot/registry.py
async def unregister(self, bot_id: str) -> bool:
    """Remove a bot registration (hard delete).

    Args:
        bot_id: Bot identifier

    Returns:
        True if removed, False if not found
    """
    # Remove from cache
    async with self._lock:
        if bot_id in self._cache:
            old_bot, _ = self._cache.pop(bot_id)
            await self._close_bot(bot_id, old_bot)

    result = await self._backend.unregister(bot_id)
    if result:
        logger.info(f"Unregistered bot: {bot_id}")
    return result
deactivate async
deactivate(bot_id: str) -> bool

Deactivate a bot registration (soft delete).

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `bot_id` | `str` | Bot identifier | *required* |

Returns:

| Type | Description |
| --- | --- |
| `bool` | True if deactivated, False if not found |

Source code in packages/bots/src/dataknobs_bots/bot/registry.py
async def deactivate(self, bot_id: str) -> bool:
    """Deactivate a bot registration (soft delete).

    Args:
        bot_id: Bot identifier

    Returns:
        True if deactivated, False if not found
    """
    # Remove from cache
    async with self._lock:
        if bot_id in self._cache:
            old_bot, _ = self._cache.pop(bot_id)
            await self._close_bot(bot_id, old_bot)

    result = await self._backend.deactivate(bot_id)
    if result:
        logger.info(f"Deactivated bot: {bot_id}")
    return result
exists async
exists(bot_id: str) -> bool

Check if an active bot registration exists.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `bot_id` | `str` | Bot identifier | *required* |

Returns:

| Type | Description |
| --- | --- |
| `bool` | True if registration exists and is active |

Source code in packages/bots/src/dataknobs_bots/bot/registry.py
async def exists(self, bot_id: str) -> bool:
    """Check if an active bot registration exists.

    Args:
        bot_id: Bot identifier

    Returns:
        True if registration exists and is active
    """
    return await self._backend.exists(bot_id)
list_bots async
list_bots() -> list[str]

List all active bot IDs.

Returns:

| Type | Description |
| --- | --- |
| `list[str]` | List of active bot identifiers |

Source code in packages/bots/src/dataknobs_bots/bot/registry.py
async def list_bots(self) -> list[str]:
    """List all active bot IDs.

    Returns:
        List of active bot identifiers
    """
    return await self._backend.list_ids()
count async
count() -> int

Count active bot registrations.

Returns:

| Type | Description |
| --- | --- |
| `int` | Number of active registrations |

Source code in packages/bots/src/dataknobs_bots/bot/registry.py
async def count(self) -> int:
    """Count active bot registrations.

    Returns:
        Number of active registrations
    """
    return await self._backend.count()
get_cached_bots
get_cached_bots() -> list[str]

Get list of currently cached bot IDs.

Returns:

| Type | Description |
| --- | --- |
| `list[str]` | List of bot IDs with cached instances |

Source code in packages/bots/src/dataknobs_bots/bot/registry.py
def get_cached_bots(self) -> list[str]:
    """Get list of currently cached bot IDs.

    Returns:
        List of bot IDs with cached instances
    """
    return list(self._cache.keys())
clear_cache async
clear_cache() -> None

Clear all cached bot instances.

Closes each bot before removing it from cache. Does not affect stored registrations.

Source code in packages/bots/src/dataknobs_bots/bot/registry.py
async def clear_cache(self) -> None:
    """Clear all cached bot instances.

    Closes each bot before removing it from cache.
    Does not affect stored registrations.
    """
    async with self._lock:
        for bot_id, (bot, _) in self._cache.items():
            await self._close_bot(bot_id, bot)
        self._cache.clear()
    logger.debug("Cleared bot cache")
register_client async
register_client(client_id: str, bot_config: dict[str, Any]) -> None

Register or update a client's bot configuration.

Deprecated: Use `register` instead.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `client_id` | `str` | Client/tenant identifier | *required* |
| `bot_config` | `dict[str, Any]` | Bot configuration dictionary | *required* |
Source code in packages/bots/src/dataknobs_bots/bot/registry.py
async def register_client(
    self, client_id: str, bot_config: dict[str, Any]
) -> None:
    """Register or update a client's bot configuration.

    .. deprecated::
        Use :meth:`register` instead.

    Args:
        client_id: Client/tenant identifier
        bot_config: Bot configuration dictionary
    """
    await self.register(client_id, bot_config)
remove_client async
remove_client(client_id: str) -> None

Remove a client from the registry.

Deprecated: Use `unregister` instead.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `client_id` | `str` | Client/tenant identifier | *required* |
Source code in packages/bots/src/dataknobs_bots/bot/registry.py
async def remove_client(self, client_id: str) -> None:
    """Remove a client from the registry.

    .. deprecated::
        Use :meth:`unregister` instead.

    Args:
        client_id: Client/tenant identifier
    """
    await self.unregister(client_id)
get_cached_clients
get_cached_clients() -> list[str]

Get list of currently cached client IDs.

Deprecated: Use `get_cached_bots` instead.

Returns:

| Type | Description |
| --- | --- |
| `list[str]` | List of client IDs with cached bots |

Source code in packages/bots/src/dataknobs_bots/bot/registry.py
def get_cached_clients(self) -> list[str]:
    """Get list of currently cached client IDs.

    .. deprecated::
        Use :meth:`get_cached_bots` instead.

    Returns:
        List of client IDs with cached bots
    """
    return self.get_cached_bots()
__repr__
__repr__() -> str

String representation.

Source code in packages/bots/src/dataknobs_bots/bot/registry.py
def __repr__(self) -> str:
    """String representation."""
    env = f", environment={self._environment.name!r}" if self._environment else ""
    return (
        f"BotRegistry(backend={self._backend!r}, "
        f"cached={len(self._cache)}{env})"
    )

DynaBot

DynaBot(
    llm: AsyncLLMProvider,
    prompt_builder: AsyncPromptBuilder,
    conversation_storage: ConversationStorage,
    tool_registry: ToolRegistry | None = None,
    memory: Memory | None = None,
    knowledge_base: KnowledgeBase | None = None,
    kb_auto_context: bool = True,
    reasoning_strategy: Any | None = None,
    middleware: list[Middleware] | None = None,
    system_prompt_name: str | None = None,
    system_prompt_content: str | None = None,
    system_prompt_rag_configs: list[dict[str, Any]] | None = None,
    default_temperature: float = 0.7,
    default_max_tokens: int = 1000,
    context_transform: Callable[[str], str] | None = None,
    max_tool_iterations: int = _DEFAULT_MAX_TOOL_ITERATIONS,
    tool_timeout: float = _DEFAULT_TOOL_TIMEOUT,
    tool_loop_timeout: float = _DEFAULT_TOOL_LOOP_TIMEOUT,
)

Configuration-driven chatbot leveraging the DataKnobs ecosystem.

DynaBot provides a flexible, configuration-driven bot that can be customized for different use cases through YAML/JSON configuration files.

Added in version 0.14.0: DynaBot-level tool execution loop — strategies that pass tools to the LLM but do not execute `tool_calls` themselves (e.g. `SimpleReasoning`) now have their tool calls executed automatically by the bot pipeline.

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| `llm` | | LLM provider for generating responses |
| `prompt_builder` | | Prompt builder for managing prompts |
| `conversation_storage` | | Storage backend for conversations |
| `tool_registry` | | Registry of available tools |
| `memory` | | Optional memory implementation for context |
| `knowledge_base` | | Optional knowledge base for RAG |
| `reasoning_strategy` | | Optional reasoning strategy |
| `middleware` | `list[Middleware]` | List of middleware for request/response processing |
| `system_prompt_name` | | Name of the system prompt template to use |
| `system_prompt_content` | | Inline system prompt content (alternative to name) |
| `system_prompt_rag_configs` | | RAG configurations for inline system prompts |
| `default_temperature` | | Default temperature for LLM generation |
| `default_max_tokens` | | Default max tokens for LLM generation |

Initialize DynaBot.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `llm` | `AsyncLLMProvider` | LLM provider instance | *required* |
| `prompt_builder` | `AsyncPromptBuilder` | Prompt builder instance | *required* |
| `conversation_storage` | `ConversationStorage` | Conversation storage backend | *required* |
| `tool_registry` | `ToolRegistry \| None` | Optional tool registry | `None` |
| `memory` | `Memory \| None` | Optional memory implementation | `None` |
| `knowledge_base` | `KnowledgeBase \| None` | Optional knowledge base | `None` |
| `kb_auto_context` | `bool` | Whether to auto-inject KB results into messages. When False, the KB is still available for tool-based access but not automatically queried on every message. | `True` |
| `reasoning_strategy` | `Any \| None` | Optional reasoning strategy | `None` |
| `middleware` | `list[Middleware] \| None` | Optional list of Middleware instances | `None` |
| `system_prompt_name` | `str \| None` | Name of system prompt template (mutually exclusive with content) | `None` |
| `system_prompt_content` | `str \| None` | Inline system prompt content (mutually exclusive with name) | `None` |
| `system_prompt_rag_configs` | `list[dict[str, Any]] \| None` | RAG configurations for inline system prompts | `None` |
| `default_temperature` | `float` | Default temperature (0-1) | `0.7` |
| `default_max_tokens` | `int` | Default max tokens to generate | `1000` |
| `context_transform` | `Callable[[str], str] \| None` | Optional callable applied to each content string (KB chunks, memory context) before it is injected into the prompt. Use this to sanitize or fence external content against prompt injection. | `None` |
| `max_tool_iterations` | `int` | Maximum number of tool execution rounds before returning. When a strategy returns a response with `tool_calls`, DynaBot executes the tools and re-generates. This cap prevents infinite loops when the model keeps requesting the same tools. | `_DEFAULT_MAX_TOOL_ITERATIONS` |
| `tool_timeout` | `float` | Per-tool execution timeout in seconds. If a single tool call exceeds this duration, it is cancelled and an error observation is recorded. | `_DEFAULT_TOOL_TIMEOUT` |
| `tool_loop_timeout` | `float` | Wall-clock budget in seconds for the tool execution loop (across all iterations). Checked at the start of each iteration and before each LLM re-call. For `chat()`, the LLM re-call is also bounded by the remaining budget via `asyncio.wait_for()`. For `stream_chat()`, a streaming re-call that starts within budget runs to completion (async generators cannot be reliably cancelled mid-chunk). Individual tool executions are always bounded by `tool_timeout`. | `_DEFAULT_TOOL_LOOP_TIMEOUT` |
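The interaction between the per-call `tool_timeout` and the wall-clock `tool_loop_timeout` can be illustrated with stdlib `asyncio` alone. This is a simplified standalone sketch of the pattern described above (`call_tool` and `tool_loop` are hypothetical stand-ins, not DynaBot internals):

```python
import asyncio
import time


async def call_tool(duration: float, tool_timeout: float) -> str:
    """Run one (fake) tool call, bounded by a per-call timeout."""
    try:
        await asyncio.wait_for(asyncio.sleep(duration), timeout=tool_timeout)
        return "ok"
    except asyncio.TimeoutError:
        return "error: tool call timed out"


async def tool_loop(durations, tool_timeout=0.05, loop_timeout=0.2):
    """Execute tool calls until done or the wall-clock budget is spent."""
    deadline = time.monotonic() + loop_timeout
    results = []
    for d in durations:
        if time.monotonic() >= deadline:  # budget check each iteration
            results.append("error: tool loop budget exhausted")
            break
        results.append(await call_tool(d, tool_timeout))
    return results


print(asyncio.run(tool_loop([0.01, 0.5, 0.01])))
# The slow call is cut at tool_timeout, but the loop keeps
# going while the overall budget remains.
```

As in DynaBot, a per-call timeout converts a hung tool into an error observation rather than aborting the whole turn, while the loop-level deadline caps total latency.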

Methods:

Name Description
register_provider

Register an auxiliary LLM/embedding provider by role.

get_provider

Get a registered provider by role.

from_config

Create DynaBot from configuration.

from_environment_aware_config

Create DynaBot with environment-aware configuration.

get_portable_config

Extract portable configuration for storage.

chat

Process a chat message.

greet

Generate a bot-initiated greeting before the user speaks.

stream_chat

Stream chat response token by token.

get_conversation

Retrieve conversation history.

clear_conversation

Clear a conversation's history.

get_wizard_state

Get current wizard state for a conversation.

close

Close the bot and clean up resources.

__aenter__

Async context manager entry.

__aexit__

Async context manager exit - ensures cleanup.

get_conversation_manager

Get a cached conversation manager by conversation ID.

undo_last_turn

Undo the last conversational turn (user message + bot response).

rewind_to_turn

Rewind conversation to after the given turn number.

Source code in packages/bots/src/dataknobs_bots/bot/base.py
def __init__(
    self,
    llm: AsyncLLMProvider,
    prompt_builder: AsyncPromptBuilder,
    conversation_storage: ConversationStorage,
    tool_registry: ToolRegistry | None = None,
    memory: Memory | None = None,
    knowledge_base: KnowledgeBase | None = None,
    kb_auto_context: bool = True,
    reasoning_strategy: Any | None = None,
    middleware: list[Middleware] | None = None,
    system_prompt_name: str | None = None,
    system_prompt_content: str | None = None,
    system_prompt_rag_configs: list[dict[str, Any]] | None = None,
    default_temperature: float = 0.7,
    default_max_tokens: int = 1000,
    context_transform: Callable[[str], str] | None = None,
    max_tool_iterations: int = _DEFAULT_MAX_TOOL_ITERATIONS,
    tool_timeout: float = _DEFAULT_TOOL_TIMEOUT,
    tool_loop_timeout: float = _DEFAULT_TOOL_LOOP_TIMEOUT,
):
    """Initialize DynaBot.

    Args:
        llm: LLM provider instance
        prompt_builder: Prompt builder instance
        conversation_storage: Conversation storage backend
        tool_registry: Optional tool registry
        memory: Optional memory implementation
        knowledge_base: Optional knowledge base
        kb_auto_context: Whether to auto-inject KB results into messages.
            When False, the KB is still available for tool-based access
            but not automatically queried on every message.
        reasoning_strategy: Optional reasoning strategy
        middleware: Optional list of Middleware instances
        system_prompt_name: Name of system prompt template (mutually exclusive with content)
        system_prompt_content: Inline system prompt content (mutually exclusive with name)
        system_prompt_rag_configs: RAG configurations for inline system prompts
        default_temperature: Default temperature (0-1)
        default_max_tokens: Default max tokens to generate
        context_transform: Optional callable applied to each content string
            (KB chunks, memory context) before it is injected into the
            prompt.  Use this to sanitize or fence external content
            against prompt injection.
        max_tool_iterations: Maximum number of tool execution rounds
            before returning.  When a strategy returns a response with
            ``tool_calls``, DynaBot executes the tools and re-generates.
            This cap prevents infinite loops when the model keeps
            requesting the same tools.
        tool_timeout: Per-tool execution timeout in seconds.  If a
            single tool call exceeds this duration, it is cancelled
            and an error observation is recorded.
        tool_loop_timeout: Wall-clock budget in seconds for the
            tool execution loop (across all iterations).  Checked
            at the start of each iteration and before each LLM
            re-call.  For ``chat()``, the LLM re-call is also
            bounded by the remaining budget via
            ``asyncio.wait_for()``.  For ``stream_chat()``, a
            streaming re-call that starts within budget runs to
            completion (async generators cannot be reliably
            cancelled mid-chunk).  Individual tool executions are always
            bounded by ``tool_timeout``.
    """
    self.llm = llm
    self.prompt_builder = prompt_builder
    self.conversation_storage = conversation_storage
    self.tool_registry = tool_registry or ToolRegistry()
    self.memory = memory
    self.knowledge_base = knowledge_base
    self._kb_auto_context = kb_auto_context
    self.reasoning_strategy = reasoning_strategy
    self.middleware: list[Middleware] = middleware or []
    self.system_prompt_name = system_prompt_name
    self.system_prompt_content = system_prompt_content
    self.system_prompt_rag_configs = system_prompt_rag_configs
    self.default_temperature = default_temperature
    self.default_max_tokens = default_max_tokens
    self._context_transform = context_transform
    self._max_tool_iterations = max_tool_iterations
    if tool_timeout < 0:
        raise ValueError(
            f"tool_timeout must be non-negative, got {tool_timeout}"
        )
    if tool_loop_timeout < 0:
        raise ValueError(
            f"tool_loop_timeout must be non-negative, got "
            f"{tool_loop_timeout}"
        )
    self._tool_timeout = tool_timeout
    self._tool_loop_timeout = tool_loop_timeout
    self._owns_llm = True  # Set False by from_config() when llm= injected
    self._conversation_managers: dict[str, ConversationManager] = {}
    self._turn_checkpoints: dict[str, list[tuple[str, int]]] = {}
    self._providers: dict[str, AsyncLLMProvider] = {}
Attributes
all_providers property
all_providers: dict[str, AsyncLLMProvider]

All registered providers keyed by role.

Always includes "main" (self.llm). Subsystems add their own entries during construction. Returns a fresh dict (snapshot) on each call.

Functions
register_provider
register_provider(role: str, provider: AsyncLLMProvider) -> None

Register an auxiliary LLM/embedding provider by role.

Providers registered here are included in all_providers for observability and enumeration. The registry is a catalog — it does not manage provider lifecycle. Each subsystem closes the providers it created (originator-owns-lifecycle).

The "main" role is reserved for self.llm and cannot be overwritten.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `role` | `str` | Unique role identifier (e.g. `"memory_embedding"`). | *required* |
| `provider` | `AsyncLLMProvider` | The provider instance. | *required* |
Source code in packages/bots/src/dataknobs_bots/bot/base.py
def register_provider(self, role: str, provider: AsyncLLMProvider) -> None:
    """Register an auxiliary LLM/embedding provider by role.

    Providers registered here are included in ``all_providers`` for
    observability and enumeration.  The registry is a catalog — it
    does not manage provider lifecycle.  Each subsystem closes the
    providers it created (originator-owns-lifecycle).

    The ``"main"`` role is reserved for ``self.llm`` and cannot be
    overwritten.

    Args:
        role: Unique role identifier (e.g. ``"memory_embedding"``).
        provider: The provider instance.
    """
    if role == PROVIDER_ROLE_MAIN:
        logger.warning(
            "Cannot register provider with reserved role %r — "
            "use the 'llm' constructor parameter instead",
            PROVIDER_ROLE_MAIN,
        )
        return
    self._providers[role] = provider
get_provider
get_provider(role: str) -> AsyncLLMProvider | None

Get a registered provider by role.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `role` | `str` | Provider role identifier. | *required* |

Returns:

| Type | Description |
| --- | --- |
| `AsyncLLMProvider \| None` | The provider, or None if not registered. |

Source code in packages/bots/src/dataknobs_bots/bot/base.py
def get_provider(self, role: str) -> AsyncLLMProvider | None:
    """Get a registered provider by role.

    Args:
        role: Provider role identifier.

    Returns:
        The provider, or ``None`` if not registered.
    """
    if role == PROVIDER_ROLE_MAIN:
        return self.llm
    return self._providers.get(role)
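The role-keyed catalog semantics above (reserved `"main"` role, fresh snapshot from `all_providers`) can be sketched in a few lines of plain Python. `ProviderCatalog` below is a hypothetical standalone illustration using plain objects in place of providers, not the library's class:

```python
class ProviderCatalog:
    """Sketch of a role-keyed provider catalog with a reserved 'main'
    role, mirroring register_provider/get_provider semantics."""

    MAIN = "main"

    def __init__(self, main_provider):
        self._main = main_provider
        self._providers = {}

    def register(self, role: str, provider) -> bool:
        if role == self.MAIN:
            return False  # reserved: the main provider is fixed at construction
        self._providers[role] = provider
        return True

    def get(self, role: str):
        if role == self.MAIN:
            return self._main
        return self._providers.get(role)

    @property
    def all_providers(self) -> dict:
        # Fresh snapshot on each call, always including "main"
        return {self.MAIN: self._main, **self._providers}


catalog = ProviderCatalog("main-llm")
catalog.register("memory_embedding", "embed-model")
print(catalog.get("main"))            # main-llm
print(catalog.register("main", "x"))  # False (reserved role)
```

Note the catalog holds references only: as documented above, lifecycle stays with whichever subsystem created each provider.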
from_config async classmethod
from_config(
    config: dict[str, Any],
    *,
    llm: AsyncLLMProvider | None = None,
    middleware: list[Middleware] | None = None,
) -> DynaBot

Create DynaBot from configuration.

Parameters:

- `config` (`dict[str, Any]`, *required*): Configuration dictionary containing:
    - `llm`: LLM configuration (provider, model, etc.). Optional when the `llm` kwarg is provided.
    - `conversation_storage`: Storage configuration. Two modes:
        - `backend`: Database backend key for the default DataknobsConversationStorage (e.g. `"memory"`, `"sqlite"`, `"postgres"`).
        - `storage_class`: Dotted import path to a custom ConversationStorage class (e.g. `"myapp.storage:AcmeStorage"`). The class must implement `ConversationStorage`, including the async `create(config)` classmethod.
    - `tools`: Optional list of tool configurations
    - `memory`: Optional memory configuration
    - `knowledge_base`: Optional knowledge base configuration
    - `reasoning`: Optional reasoning strategy configuration
    - `middleware`: Optional middleware configurations (ignored when the `middleware` kwarg is provided)
    - `prompts`: Optional prompts library (dict of name -> content)
    - `system_prompt`: Optional system prompt configuration (see below)
    - `config_base_path`: Optional base directory for resolving relative config file paths (e.g. wizard_config). When set, relative paths in nested configs are resolved against this directory instead of the current working directory.
- `llm` (`AsyncLLMProvider | None`, default `None`): Pre-built LLM provider. When provided, `config["llm"]` is optional and the provider is used as-is (no initialization or cleanup — the caller owns the lifecycle). Use this to share a single provider across multiple bot instances.
- `middleware` (`list[Middleware] | None`, default `None`): Pre-built middleware list. When provided, replaces any middleware defined in config.

Returns:

| Type | Description |
| --- | --- |
| `DynaBot` | Configured DynaBot instance |

System Prompt Formats

The `system_prompt` can be specified in multiple ways:

- **String**: Smart detection: if the string exists as a template name in the prompt library, it's used as a template reference; otherwise it's treated as inline content.
- **Dict with `name`**: `{"name": "template_name"}`: explicit template reference
- **Dict with `name` + `strict`**: `{"name": "template_name", "strict": true}`: raises an error if the template doesn't exist
- **Dict with `content`**: `{"content": "inline prompt text"}`: inline content
- **Dict with `content` + `rag_configs`**: inline content with RAG enhancement
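Written out as plain dicts, the accepted shapes look like this (template names, prompt text, and the `rag_configs` entry are illustrative values, not library defaults):

```python
# Each entry is a valid value for config["system_prompt"].
system_prompt_variants = [
    # String: template name if it exists in the prompt
    # library, otherwise treated as inline content
    "helpful_assistant",
    # Explicit template reference
    {"name": "helpful_assistant"},
    # Template reference that errors if the template is missing
    {"name": "helpful_assistant", "strict": True},
    # Inline content
    {"content": "You are a helpful bot."},
    # Inline content with RAG enhancement (rag_configs
    # shape is illustrative)
    {"content": "You are a helpful bot.",
     "rag_configs": [{"query": "style guide"}]},
]
```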
Example
bot = await DynaBot.from_config(config)

# With a shared provider
shared_llm = OllamaProvider({"provider": "ollama", "model": "llama3.2"})
await shared_llm.initialize()
bot = await DynaBot.from_config(
    {"conversation_storage": {"backend": "memory"}},
    llm=shared_llm,
)

# With pre-built middleware
bot = await DynaBot.from_config(config, middleware=[my_middleware])
Source code in packages/bots/src/dataknobs_bots/bot/base.py
@classmethod
async def from_config(
    cls,
    config: dict[str, Any],
    *,
    llm: AsyncLLMProvider | None = None,
    middleware: list[Middleware] | None = None,
) -> DynaBot:
    """Create DynaBot from configuration.

    Args:
        config: Configuration dictionary containing:
            - llm: LLM configuration (provider, model, etc.).
              Optional when the ``llm`` kwarg is provided.
            - conversation_storage: Storage configuration.  Two modes:
                - ``backend``: Database backend key for the default
                  DataknobsConversationStorage (e.g. ``"memory"``,
                  ``"sqlite"``, ``"postgres"``).
                - ``storage_class``: Dotted import path to a custom
                  ConversationStorage class (e.g.
                  ``"myapp.storage:AcmeStorage"``).  The class must
                  implement ``ConversationStorage`` including the
                  async ``create(config)`` classmethod.
            - tools: Optional list of tool configurations
            - memory: Optional memory configuration
            - knowledge_base: Optional knowledge base configuration
            - reasoning: Optional reasoning strategy configuration
            - middleware: Optional middleware configurations (ignored
              when the ``middleware`` kwarg is provided)
            - prompts: Optional prompts library (dict of name -> content)
            - system_prompt: Optional system prompt configuration (see below)
            - config_base_path: Optional base directory for resolving
              relative config file paths (e.g. wizard_config). When set,
              relative paths in nested configs are resolved against this
              directory instead of the current working directory.
        llm: Pre-built LLM provider.  When provided, ``config["llm"]``
            is optional and the provider is used as-is (no initialization
            or cleanup — the caller owns the lifecycle).  Use this to
            share a single provider across multiple bot instances.
        middleware: Pre-built middleware list.  When provided, replaces
            any middleware defined in config.

    Returns:
        Configured DynaBot instance

    System Prompt Formats:
        The system_prompt can be specified in multiple ways:

        - String: Smart detection - if the string exists as a template name
          in the prompt library, it's used as a template reference; otherwise
          it's treated as inline content.

        - Dict with name: `{"name": "template_name"}` - explicit template reference
        - Dict with name + strict: `{"name": "template_name", "strict": true}` -
          raises error if template doesn't exist
        - Dict with content: `{"content": "inline prompt text"}` - inline content
        - Dict with content + rag_configs: inline content with RAG enhancement

    Example:
        ```python
        bot = await DynaBot.from_config(config)

        # With a shared provider
        shared_llm = OllamaProvider({"provider": "ollama", "model": "llama3.2"})
        await shared_llm.initialize()
        bot = await DynaBot.from_config(
            {"conversation_storage": {"backend": "memory"}},
            llm=shared_llm,
        )

        # With pre-built middleware
        bot = await DynaBot.from_config(config, middleware=[my_middleware])
        ```
    """
    if llm is not None:
        # Caller-owned provider — skip creation/initialization.
        # Caller is responsible for lifecycle (initialize/close).
        llm_config = config.get("llm", {})
        bot = await cls._build_from_config(
            config, llm, llm_config, middleware_override=middleware
        )
        bot._owns_llm = False  # Caller owns lifecycle
        return bot

    # Create LLM provider from config
    llm_config = config["llm"]

    from dataknobs_llm.llm import LLMProviderFactory

    created_llm = LLMProviderFactory(is_async=True).create(llm_config)
    await created_llm.initialize()

    # Everything below can fail; ensure the provider is closed on error
    # so we don't leak aiohttp sessions or other resources.
    try:
        return await cls._build_from_config(
            config, created_llm, llm_config,
            middleware_override=middleware,
        )
    except Exception:
        await created_llm.close()
        raise
from_environment_aware_config async classmethod
from_environment_aware_config(
    config: EnvironmentAwareConfig | dict[str, Any],
    environment: EnvironmentConfig | str | None = None,
    env_dir: str | Path = "config/environments",
    config_key: str = "bot",
) -> DynaBot

Create DynaBot with environment-aware configuration.

This is the recommended entry point for environment-portable bots. Resource references ($resource) are resolved against the environment config, and environment variables are substituted at instantiation time (late binding).

Parameters:

- `config` (`EnvironmentAwareConfig | dict[str, Any]`, required): EnvironmentAwareConfig instance or dict with $resource references. If a dict, it will be wrapped in EnvironmentAwareConfig.
- `environment` (`EnvironmentConfig | str | None`, default `None`): Environment name or EnvironmentConfig instance. If None, auto-detects from the DATAKNOBS_ENVIRONMENT env var. Ignored if config is already an EnvironmentAwareConfig.
- `env_dir` (`str | Path`, default `'config/environments'`): Directory containing environment config files. Only used if environment is a string name.
- `config_key` (`str`, default `'bot'`): Key within config containing the bot configuration. Set to None to use the root config.

Returns:

`DynaBot`: Fully initialized DynaBot instance with resolved resources

Example
# With portable config dict
config = {
    "bot": {
        "llm": {
            "$resource": "default",
            "type": "llm_providers",
            "temperature": 0.7,
        },
        "conversation_storage": {
            "$resource": "conversations",
            "type": "databases",
        },
    }
}
bot = await DynaBot.from_environment_aware_config(config)

# With explicit environment
bot = await DynaBot.from_environment_aware_config(
    config,
    environment="production",
    env_dir="configs/environments"
)

# With EnvironmentAwareConfig instance
from dataknobs_config import EnvironmentAwareConfig
env_config = EnvironmentAwareConfig.load_app("my-bot", ...)
bot = await DynaBot.from_environment_aware_config(env_config)
Note

The config should use $resource references for infrastructure:

bot:
  llm:
    $resource: default      # Logical name
    type: llm_providers     # Resource type
    temperature: 0.7        # Behavioral param (portable)

The environment config provides concrete bindings:

resources:
  llm_providers:
    default:
      provider: openai
      model: gpt-4
      api_key: ${OPENAI_API_KEY}
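
The ${OPENAI_API_KEY} placeholder above is substituted at instantiation time. A minimal sketch of such late-binding substitution (an illustration of the concept, not the library's actual implementation):

```python
import os
import re

def substitute_env(value: str) -> str:
    """Replace ${VAR} placeholders with current environment values."""
    return re.sub(
        r"\$\{(\w+)\}",
        lambda m: os.environ.get(m.group(1), ""),
        value,
    )

# Late binding: the value is read when the bot is built,
# not when the config file was written.
os.environ["OPENAI_API_KEY"] = "sk-demo"
resolved = substitute_env("api_key: ${OPENAI_API_KEY}")
# resolved == "api_key: sk-demo"
```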

Source code in packages/bots/src/dataknobs_bots/bot/base.py
@classmethod
async def from_environment_aware_config(
    cls,
    config: EnvironmentAwareConfig | dict[str, Any],
    environment: EnvironmentConfig | str | None = None,
    env_dir: str | Path = "config/environments",
    config_key: str = "bot",
) -> DynaBot:
    """Create DynaBot with environment-aware configuration.

    This is the recommended entry point for environment-portable bots.
    Resource references ($resource) are resolved against the environment
    config, and environment variables are substituted at instantiation time
    (late binding).

    Args:
        config: EnvironmentAwareConfig instance or dict with $resource references.
               If dict, will be wrapped in EnvironmentAwareConfig.
        environment: Environment name or EnvironmentConfig instance.
                    If None, auto-detects from DATAKNOBS_ENVIRONMENT env var.
                    Ignored if config is already an EnvironmentAwareConfig.
        env_dir: Directory containing environment config files.
                Only used if environment is a string name.
        config_key: Key within config containing bot configuration.
                   Defaults to "bot". Set to None to use root config.

    Returns:
        Fully initialized DynaBot instance with resolved resources

    Example:
        ```python
        # With portable config dict
        config = {
            "bot": {
                "llm": {
                    "$resource": "default",
                    "type": "llm_providers",
                    "temperature": 0.7,
                },
                "conversation_storage": {
                    "$resource": "conversations",
                    "type": "databases",
                },
            }
        }
        bot = await DynaBot.from_environment_aware_config(config)

        # With explicit environment
        bot = await DynaBot.from_environment_aware_config(
            config,
            environment="production",
            env_dir="configs/environments"
        )

        # With EnvironmentAwareConfig instance
        from dataknobs_config import EnvironmentAwareConfig
        env_config = EnvironmentAwareConfig.load_app("my-bot", ...)
        bot = await DynaBot.from_environment_aware_config(env_config)
        ```

    Note:
        The config should use $resource references for infrastructure:
        ```yaml
        bot:
          llm:
            $resource: default      # Logical name
            type: llm_providers     # Resource type
            temperature: 0.7        # Behavioral param (portable)
        ```

        The environment config provides concrete bindings:
        ```yaml
        resources:
          llm_providers:
            default:
              provider: openai
              model: gpt-4
              api_key: ${OPENAI_API_KEY}
        ```
    """
    from dataknobs_config import EnvironmentAwareConfig, EnvironmentConfig

    # Wrap dict in EnvironmentAwareConfig if needed
    if isinstance(config, dict):
        # Load or use provided environment
        if isinstance(environment, EnvironmentConfig):
            env_config = environment
        else:
            env_config = EnvironmentConfig.load(environment, env_dir)

        config = EnvironmentAwareConfig(
            config=config,
            environment=env_config,
        )
    elif environment is not None:
        # Switch environment on existing EnvironmentAwareConfig
        config = config.with_environment(environment, env_dir)

    # Resolve resources and env vars (late binding happens here)
    if config_key:
        resolved = config.resolve_for_build(config_key)
    else:
        resolved = config.resolve_for_build()

    # Delegate to existing from_config
    return await cls.from_config(resolved)
get_portable_config staticmethod
get_portable_config(
    config: EnvironmentAwareConfig | dict[str, Any],
) -> dict[str, Any]

Extract portable configuration for storage.

Returns configuration with $resource references intact and environment variables unresolved. This is the config that should be stored in registries or databases for cross-environment portability.

Parameters:

- `config` (`EnvironmentAwareConfig | dict[str, Any]`, required): EnvironmentAwareConfig instance or portable dict

Returns:

`dict[str, Any]`: Portable configuration dictionary

Example
from dataknobs_config import EnvironmentAwareConfig

# From EnvironmentAwareConfig
env_config = EnvironmentAwareConfig.load_app("my-bot", ...)
portable = DynaBot.get_portable_config(env_config)

# Store portable config in registry
await registry.store(bot_id, portable)

# Dict passes through unchanged
portable = DynaBot.get_portable_config({"bot": {...}})
Source code in packages/bots/src/dataknobs_bots/bot/base.py
@staticmethod
def get_portable_config(
    config: EnvironmentAwareConfig | dict[str, Any],
) -> dict[str, Any]:
    """Extract portable configuration for storage.

    Returns configuration with $resource references intact
    and environment variables unresolved. This is the config
    that should be stored in registries or databases for
    cross-environment portability.

    Args:
        config: EnvironmentAwareConfig instance or portable dict

    Returns:
        Portable configuration dictionary

    Example:
        ```python
        from dataknobs_config import EnvironmentAwareConfig

        # From EnvironmentAwareConfig
        env_config = EnvironmentAwareConfig.load_app("my-bot", ...)
        portable = DynaBot.get_portable_config(env_config)

        # Store portable config in registry
        await registry.store(bot_id, portable)

        # Dict passes through unchanged
        portable = DynaBot.get_portable_config({"bot": {...}})
        ```
    """
    # Import here to avoid circular dependency at module level
    try:
        from dataknobs_config import EnvironmentAwareConfig

        if isinstance(config, EnvironmentAwareConfig):
            return config.get_portable_config()
    except ImportError:
        pass

    # Dict passes through (assumed already portable)
    return config
chat async
chat(
    message: str,
    context: BotContext,
    temperature: float | None = None,
    max_tokens: int | None = None,
    rag_query: str | None = None,
    llm_config_overrides: dict[str, Any] | None = None,
    plugin_data: dict[str, Any] | None = None,
    **kwargs: Any,
) -> str

Process a chat message.

Parameters:

- `message` (`str`, required): User message to process
- `context` (`BotContext`, required): Bot execution context
- `temperature` (`float | None`, default `None`): Optional temperature override
- `max_tokens` (`int | None`, default `None`): Optional max tokens override
- `rag_query` (`str | None`, default `None`): Optional explicit query for knowledge base retrieval. If provided, this is used instead of the message for RAG. Useful when the message contains literal text to analyze (e.g., "Analyze this prompt: [prompt text]") but you want to search for analysis techniques instead.
- `llm_config_overrides` (`dict[str, Any] | None`, default `None`): Optional dict to override LLM config fields for this request only. Supported fields: model, temperature, max_tokens, top_p, stop_sequences, seed, options.
- `plugin_data` (`dict[str, Any] | None`, default `None`): Optional dict to seed turn.plugin_data before middleware runs. Enables caller-managed lifecycle patterns (e.g., passing a DB session handle that middleware can use and finally_turn can close).
- `**kwargs` (`Any`, default `{}`): Additional arguments

Returns:

`str`: Bot response as string

Example
context = BotContext(
    conversation_id="conv-123",
    client_id="client-456",
    user_id="user-789"
)
response = await bot.chat("Hello!", context)

# With explicit RAG query
response = await bot.chat(
    "Analyze this: Write a poem about cats",
    context,
    rag_query="prompt analysis techniques evaluation"
)

# With LLM config overrides (switch model per-request)
response = await bot.chat(
    "Explain quantum computing",
    context,
    llm_config_overrides={"model": "gpt-4-turbo", "temperature": 0.9}
)
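The plugin_data lifecycle pattern above can be sketched with a hypothetical middleware; the class and session names here are illustrative, not part of the package:

```python
import asyncio
from types import SimpleNamespace

class SessionCleanupMiddleware:
    """Hypothetical middleware: closes a caller-provided DB session."""

    async def finally_turn(self, turn):
        # finally_turn fires on success and on error, so the
        # caller-owned handle is always released.
        session = turn.plugin_data.pop("db_session", None)
        if session is not None:
            await session.close()

class FakeSession:
    def __init__(self):
        self.closed = False

    async def close(self):
        self.closed = True

async def demo() -> bool:
    session = FakeSession()
    # Stand-in for the TurnState seeded via plugin_data={"db_session": ...}
    turn = SimpleNamespace(plugin_data={"db_session": session})
    await SessionCleanupMiddleware().finally_turn(turn)
    return session.closed
```

In practice the caller would pass `plugin_data={"db_session": session}` to `bot.chat(...)` and register the middleware on the bot, rather than invoking the hook directly.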
Source code in packages/bots/src/dataknobs_bots/bot/base.py
async def chat(
    self,
    message: str,
    context: BotContext,
    temperature: float | None = None,
    max_tokens: int | None = None,
    rag_query: str | None = None,
    llm_config_overrides: dict[str, Any] | None = None,
    plugin_data: dict[str, Any] | None = None,
    **kwargs: Any,
) -> str:
    """Process a chat message.

    Args:
        message: User message to process
        context: Bot execution context
        temperature: Optional temperature override
        max_tokens: Optional max tokens override
        rag_query: Optional explicit query for knowledge base retrieval.
                  If provided, this is used instead of the message for RAG.
                  Useful when the message contains literal text to analyze
                  (e.g., "Analyze this prompt: [prompt text]") but you want
                  to search for analysis techniques instead.
        llm_config_overrides: Optional dict to override LLM config fields
                  for this request only. Supported fields: model, temperature,
                  max_tokens, top_p, stop_sequences, seed, options.
        plugin_data: Optional dict to seed ``turn.plugin_data`` before
                  middleware runs.  Enables caller-managed lifecycle
                  patterns (e.g., passing a DB session handle that
                  middleware can use and ``finally_turn`` can close).
        **kwargs: Additional arguments

    Returns:
        Bot response as string

    Example:
        ```python
        context = BotContext(
            conversation_id="conv-123",
            client_id="client-456",
            user_id="user-789"
        )
        response = await bot.chat("Hello!", context)

        # With explicit RAG query
        response = await bot.chat(
            "Analyze this: Write a poem about cats",
            context,
            rag_query="prompt analysis techniques evaluation"
        )

        # With LLM config overrides (switch model per-request)
        response = await bot.chat(
            "Explain quantum computing",
            context,
            llm_config_overrides={"model": "gpt-4-turbo", "temperature": 0.9}
        )
        ```
    """
    turn = TurnState(
        mode=TurnMode.CHAT,
        message=message,
        context=context,
        rag_query=rag_query,
        temperature=temperature,
        max_tokens=max_tokens,
        llm_config_overrides=llm_config_overrides,
        plugin_data=plugin_data or {},
    )
    try:
        await self._prepare_turn(turn)
        response = await self._generate_response(
            turn.manager, temperature, max_tokens, llm_config_overrides
        )

        # DynaBot-level tool execution loop.  Strategies that handle
        # tool_calls internally (e.g. ReAct) return responses without
        # tool_calls, so this loop is a no-op for them.
        loop_start = time.monotonic()
        for _iteration in range(self._max_tool_iterations):
            if (
                not self.tool_registry
                or not getattr(response, "tool_calls", None)
            ):
                break
            if time.monotonic() - loop_start >= self._tool_loop_timeout:
                logger.warning(
                    "Tool execution loop exceeded wall-clock timeout "
                    "(%.1fs)",
                    self._tool_loop_timeout,
                    extra={
                        "conversation_id": getattr(
                            turn.manager, "conversation_id", None
                        ),
                    },
                )
                break
            await self._execute_tools(turn, response.tool_calls)
            # Accumulate usage from intermediate LLM calls
            turn.accumulate_usage(response)
            # Enforce remaining loop budget on the LLM re-call
            remaining = self._tool_loop_timeout - (
                time.monotonic() - loop_start
            )
            if remaining <= 0:
                logger.warning(
                    "Tool loop budget exhausted before LLM re-call "
                    "(%.1fs budget)",
                    self._tool_loop_timeout,
                    extra={
                        "conversation_id": getattr(
                            turn.manager, "conversation_id", None
                        ),
                    },
                )
                break
            try:
                response = await asyncio.wait_for(
                    turn.manager.complete(
                        tools=list(self.tool_registry) or None,
                        temperature=temperature or self.default_temperature,
                        max_tokens=max_tokens or self.default_max_tokens,
                        llm_config_overrides=llm_config_overrides,
                    ),
                    timeout=remaining,
                )
            except (TimeoutError, asyncio.TimeoutError):
                logger.warning(
                    "LLM re-call exceeded remaining tool loop "
                    "budget (%.1fs remaining of %.1fs)",
                    remaining,
                    self._tool_loop_timeout,
                    extra={
                        "conversation_id": getattr(
                            turn.manager, "conversation_id", None
                        ),
                    },
                )
                break
        else:
            # Loop completed without break — cap hit
            if self.tool_registry and getattr(
                response, "tool_calls", None
            ):
                logger.warning(
                    "Tool execution loop reached max iterations (%d) "
                    "with pending tool_calls",
                    self._max_tool_iterations,
                    extra={
                        "conversation_id": getattr(
                            turn.manager, "conversation_id", None
                        ),
                    },
                )

        turn.response = response
        turn.response_content = self._extract_response_content(response)
        turn.populate_from_response(response, self.llm)
        await self._finalize_turn(turn)
        return turn.response_content
    except Exception as e:
        await self._call_on_error_middleware(e, message, context)
        raise
    finally:
        await self._call_finally_turn_middleware(turn)
greet async
greet(
    context: BotContext,
    *,
    initial_context: dict[str, Any] | None = None,
    plugin_data: dict[str, Any] | None = None,
) -> str | None

Generate a bot-initiated greeting before the user speaks.

Delegates to the reasoning strategy's greet() method. Returns None if the bot has no reasoning strategy or the strategy does not support greetings (e.g. non-wizard strategies).

No user message is added to conversation history — the greeting is a bot-initiated assistant message only.

Parameters:

- `context` (`BotContext`, required): Bot execution context
- `initial_context` (`dict[str, Any] | None`, default `None`): Optional dict of initial data to seed into the reasoning strategy's state before generating the greeting. For wizard strategies, these values are merged into wizard_state.data so they are available to the start stage's prompt template and transforms.
- `plugin_data` (`dict[str, Any] | None`, default `None`): Optional dict to seed turn.plugin_data before middleware runs. See chat() for details. When reasoning_strategy is None, no turn is initiated but finally_turn still fires if plugin_data was provided, ensuring cleanup.

Returns:

`str | None`: Greeting string, or None if the bot does not support greetings

Note

Middleware lifecycle for greet:

- on_turn_start(turn) and before_message("") are called before greeting generation.
- after_turn(turn) and after_message(...) are called on success (only when a response is generated).
- finally_turn(turn) fires on success, on error, and when the strategy returns None (no greeting).
- If an error occurs, on_error hooks receive message="" since there is no user message.
- If a middleware hook itself fails, on_hook_error is called on all middleware.

Example
context = BotContext(conversation_id="conv-123", client_id="harness")
greeting = await bot.greet(context, initial_context={"user_name": "Alice"})
if greeting:
    print(f"Bot says: {greeting}")
Source code in packages/bots/src/dataknobs_bots/bot/base.py
async def greet(
    self,
    context: BotContext,
    *,
    initial_context: dict[str, Any] | None = None,
    plugin_data: dict[str, Any] | None = None,
) -> str | None:
    """Generate a bot-initiated greeting before the user speaks.

    Delegates to the reasoning strategy's ``greet()`` method. Returns
    ``None`` if the bot has no reasoning strategy or the strategy does
    not support greetings (e.g. non-wizard strategies).

    No user message is added to conversation history — the greeting
    is a bot-initiated assistant message only.

    Args:
        context: Bot execution context
        initial_context: Optional dict of initial data to seed into
            the reasoning strategy's state before generating the
            greeting. For wizard strategies, these values are merged
            into ``wizard_state.data`` so they are available to the
            start stage's prompt template and transforms.
        plugin_data: Optional dict to seed ``turn.plugin_data`` before
            middleware runs.  See ``chat()`` for details.

            When ``reasoning_strategy`` is ``None``, no turn is
            initiated but ``finally_turn`` still fires if
            ``plugin_data`` was provided, ensuring cleanup.

    Returns:
        Greeting string, or None if the bot does not support greetings

    Note:
        Middleware lifecycle for greet: ``on_turn_start(turn)`` and
        ``before_message("")`` are called before greeting generation;
        ``after_turn(turn)`` and ``after_message(...)`` are called on
        success (only when a response is generated);
        ``finally_turn(turn)`` fires on success, error, and when
        the strategy returns ``None`` (no greeting).
        If an error occurs, ``on_error`` hooks receive
        ``message=""`` since there is no user message.  If a
        middleware hook itself fails, ``on_hook_error`` is called on
        all middleware.

    Example:
        ```python
        context = BotContext(conversation_id="conv-123", client_id="harness")
        greeting = await bot.greet(context, initial_context={"user_name": "Alice"})
        if greeting:
            print(f"Bot says: {greeting}")
        ```
    """
    if not self.reasoning_strategy:
        if plugin_data is not None:
            turn = TurnState(
                mode=TurnMode.GREET,
                message="",
                context=context,
                plugin_data=plugin_data,
            )
            await self._call_finally_turn_middleware(turn)
        return None

    turn = TurnState(
        mode=TurnMode.GREET,
        message="",
        context=context,
        initial_context=initial_context,
        plugin_data=plugin_data or {},
    )
    try:
        await self._prepare_turn(turn)

        response = await self.reasoning_strategy.greet(
            manager=turn.manager,
            llm=self.llm,
            initial_context=initial_context,
        )

        if response is None:
            return None

        turn.response = response
        turn.response_content = self._extract_response_content(response)
        # Note: greet responses are not checked for tool_calls.
        # Greetings are bot-initiated and strategies are not expected
        # to request tool calls during greet.  If this assumption
        # changes, add the tool execution loop here (matching
        # chat/stream_chat).
        turn.populate_from_response(response, self.llm)
        await self._finalize_turn(turn)
        return turn.response_content
    except Exception as e:
        await self._call_on_error_middleware(e, "", context)
        raise
    finally:
        await self._call_finally_turn_middleware(turn)
stream_chat async
stream_chat(
    message: str,
    context: BotContext,
    temperature: float | None = None,
    max_tokens: int | None = None,
    rag_query: str | None = None,
    llm_config_overrides: dict[str, Any] | None = None,
    plugin_data: dict[str, Any] | None = None,
    **kwargs: Any,
) -> AsyncGenerator[LLMStreamResponse, None]

Stream chat response token by token.

Similar to chat() but yields LLMStreamResponse objects as they are generated, providing both the text delta and rich metadata (usage, finish_reason, is_final) for each chunk.

Parameters:

- `message` (`str`, required): User message to process
- `context` (`BotContext`, required): Bot execution context
- `temperature` (`float | None`, default `None`): Optional temperature override
- `max_tokens` (`int | None`, default `None`): Optional max tokens override
- `rag_query` (`str | None`, default `None`): Optional explicit query for knowledge base retrieval. If provided, this is used instead of the message for RAG.
- `llm_config_overrides` (`dict[str, Any] | None`, default `None`): Optional dict to override LLM config fields for this request only. Supported fields: model, temperature, max_tokens, top_p, stop_sequences, seed, options.
- `plugin_data` (`dict[str, Any] | None`, default `None`): Optional dict to seed turn.plugin_data before middleware runs. See chat() for details.
- `**kwargs` (`Any`, default `{}`): Additional arguments passed to LLM

Yields:

`LLMStreamResponse` objects with `.delta` (text), `.is_final`, `.usage`, and `.finish_reason` attributes.

Example
context = BotContext(
    conversation_id="conv-123",
    client_id="client-456",
    user_id="user-789"
)

# Stream and display in real-time
async for chunk in bot.stream_chat("Explain quantum computing", context):
    print(chunk.delta, end="", flush=True)
print()  # Newline after streaming

# Accumulate response
full_response = ""
async for chunk in bot.stream_chat("Hello!", context):
    full_response += chunk.delta

# With LLM config overrides
async for chunk in bot.stream_chat(
    "Explain quantum computing",
    context,
    llm_config_overrides={"model": "gpt-4-turbo"}
):
    print(chunk.delta, end="", flush=True)
Note

Conversation history is automatically updated after streaming completes. When a reasoning_strategy is configured, the strategy produces the complete response and it is emitted as a single stream chunk.

Cleanup guarantee: finally_turn middleware fires via a finally block inside the async generator. In Python, async generator finally blocks execute only when the generator is fully consumed, explicitly closed (await gen.aclose()), or garbage collected. Callers that break out of the stream early should use contextlib.aclosing to guarantee prompt cleanup::

from contextlib import aclosing

async with aclosing(bot.stream_chat("msg", ctx)) as stream:
    async for chunk in stream:
        if done:
            break  # aclose() fires finally_turn
Source code in packages/bots/src/dataknobs_bots/bot/base.py
async def stream_chat(
    self,
    message: str,
    context: BotContext,
    temperature: float | None = None,
    max_tokens: int | None = None,
    rag_query: str | None = None,
    llm_config_overrides: dict[str, Any] | None = None,
    plugin_data: dict[str, Any] | None = None,
    **kwargs: Any,
) -> AsyncGenerator[LLMStreamResponse, None]:
    """Stream chat response token by token.

    Similar to chat() but yields ``LLMStreamResponse`` objects as they are
    generated, providing both the text delta and rich metadata (usage,
    finish_reason, is_final) for each chunk.

    Args:
        message: User message to process
        context: Bot execution context
        temperature: Optional temperature override
        max_tokens: Optional max tokens override
        rag_query: Optional explicit query for knowledge base retrieval.
                  If provided, this is used instead of the message for RAG.
        llm_config_overrides: Optional dict to override LLM config fields
                  for this request only. Supported fields: model, temperature,
                  max_tokens, top_p, stop_sequences, seed, options.
        plugin_data: Optional dict to seed ``turn.plugin_data`` before
                  middleware runs.  See ``chat()`` for details.
        **kwargs: Additional arguments passed to LLM

    Yields:
        LLMStreamResponse objects with ``.delta`` (text), ``.is_final``,
        ``.usage``, and ``.finish_reason`` attributes.

    Example:
        ```python
        context = BotContext(
            conversation_id="conv-123",
            client_id="client-456",
            user_id="user-789"
        )

        # Stream and display in real-time
        async for chunk in bot.stream_chat("Explain quantum computing", context):
            print(chunk.delta, end="", flush=True)
        print()  # Newline after streaming

        # Accumulate response
        full_response = ""
        async for chunk in bot.stream_chat("Hello!", context):
            full_response += chunk.delta

        # With LLM config overrides
        async for chunk in bot.stream_chat(
            "Explain quantum computing",
            context,
            llm_config_overrides={"model": "gpt-4-turbo"}
        ):
            print(chunk.delta, end="", flush=True)
        ```

    Note:
        Conversation history is automatically updated after streaming completes.
        When a reasoning_strategy is configured, the strategy produces the
        complete response and it is emitted as a single stream chunk.

        **Cleanup guarantee:** ``finally_turn`` middleware fires via a
        ``finally`` block inside the async generator.  In Python, async
        generator ``finally`` blocks execute only when the generator is
        fully consumed, explicitly closed (``await gen.aclose()``), or
        garbage collected.  Callers that break out of the stream early
        should use ``contextlib.aclosing`` to guarantee prompt cleanup::

            from contextlib import aclosing

            async with aclosing(bot.stream_chat("msg", ctx)) as stream:
                async for chunk in stream:
                    if done:
                        break  # aclose() fires finally_turn
    """
    turn = TurnState(
        mode=TurnMode.STREAM,
        message=message,
        context=context,
        rag_query=rag_query,
        temperature=temperature,
        max_tokens=max_tokens,
        llm_config_overrides=llm_config_overrides,
        plugin_data=plugin_data or {},
    )
    streaming_error: Exception | None = None
    stream_fully_consumed = False

    try:
        await self._prepare_turn(turn)

        # Track tool_calls across streaming rounds so the tool
        # execution loop can pick them up after the initial stream.
        pending_tool_calls: list[Any] | None = None

        if self.reasoning_strategy:
            # Delegate to the strategy's stream_generate().
            # Strategies with true streaming (SimpleReasoning) yield
            # LLMStreamResponse chunks; others yield a single complete
            # response that we wrap as a stream chunk.
            async for chunk in self.reasoning_strategy.stream_generate(
                manager=turn.manager,
                llm=self.llm,
                tools=list(self.tool_registry) or None,
                temperature=temperature or self.default_temperature,
                max_tokens=max_tokens or self.default_max_tokens,
                llm_config_overrides=llm_config_overrides,
            ):
                if isinstance(chunk, LLMStreamResponse):
                    turn.stream_chunks.append(chunk.delta)
                    if chunk.is_final or chunk.usage:
                        turn.populate_from_final_stream_chunk(
                            chunk, self.llm
                        )
                    # Intercept tool_calls: suppress is_final so the
                    # consumer knows more content may follow.
                    if chunk.tool_calls and self.tool_registry:
                        pending_tool_calls = chunk.tool_calls
                        yield LLMStreamResponse(
                            delta=chunk.delta,
                            is_final=False,
                            usage=chunk.usage,
                            model=chunk.model,
                        )
                    else:
                        yield chunk
                else:
                    # Strategy yielded a complete LLMResponse — wrap it
                    content = self._extract_response_content(chunk)
                    turn.stream_chunks.append(content)
                    turn.populate_from_response(chunk, self.llm)
                    # Check for tool_calls on the LLMResponse
                    if (
                        getattr(chunk, "tool_calls", None)
                        and self.tool_registry
                    ):
                        pending_tool_calls = chunk.tool_calls
                        yield LLMStreamResponse(
                            delta=content, is_final=False,
                        )
                    else:
                        yield LLMStreamResponse(
                            delta=content,
                            is_final=True,
                            finish_reason="stop",
                        )
        else:
            # No reasoning strategy — stream directly from LLM
            async for chunk in turn.manager.stream_complete(
                tools=list(self.tool_registry) or None,
                llm_config_overrides=llm_config_overrides,
                temperature=temperature or self.default_temperature,
                max_tokens=max_tokens or self.default_max_tokens,
                **kwargs,
            ):
                turn.stream_chunks.append(chunk.delta)
                if chunk.is_final or chunk.usage:
                    turn.populate_from_final_stream_chunk(chunk, self.llm)
                if chunk.tool_calls and self.tool_registry:
                    pending_tool_calls = chunk.tool_calls
                    yield LLMStreamResponse(
                        delta=chunk.delta,
                        is_final=False,
                        usage=chunk.usage,
                        model=chunk.model,
                    )
                else:
                    yield chunk

        # DynaBot-level tool execution loop for streaming.
        # Execute pending tool_calls, then re-stream until no
        # more tool_calls or max iterations reached.
        loop_start = time.monotonic()
        for _iteration in range(self._max_tool_iterations):
            if not pending_tool_calls or not self.tool_registry:
                break
            if time.monotonic() - loop_start >= self._tool_loop_timeout:
                logger.warning(
                    "Streaming tool execution loop exceeded "
                    "wall-clock timeout (%.1fs)",
                    self._tool_loop_timeout,
                    extra={
                        "conversation_id": getattr(
                            turn.manager, "conversation_id", None
                        ),
                    },
                )
                break
            await self._execute_tools(turn, pending_tool_calls)
            # Accumulate usage from intermediate streaming rounds
            turn.accumulate_usage_from_stream()
            pending_tool_calls = None

            # Check remaining budget before starting LLM re-stream
            remaining = self._tool_loop_timeout - (
                time.monotonic() - loop_start
            )
            if remaining <= 0:
                logger.warning(
                    "Streaming tool loop budget exhausted before "
                    "LLM re-stream (%.1fs budget)",
                    self._tool_loop_timeout,
                    extra={
                        "conversation_id": getattr(
                            turn.manager, "conversation_id", None
                        ),
                    },
                )
                break

            async for chunk in turn.manager.stream_complete(
                tools=list(self.tool_registry) or None,
                temperature=temperature or self.default_temperature,
                max_tokens=max_tokens or self.default_max_tokens,
                llm_config_overrides=llm_config_overrides,
            ):
                turn.stream_chunks.append(chunk.delta)
                if chunk.is_final or chunk.usage:
                    turn.populate_from_final_stream_chunk(
                        chunk, self.llm
                    )
                if chunk.tool_calls and self.tool_registry:
                    pending_tool_calls = chunk.tool_calls
                    yield LLMStreamResponse(
                        delta=chunk.delta,
                        is_final=False,
                        usage=chunk.usage,
                        model=chunk.model,
                    )
                else:
                    yield chunk
        else:
            # Loop completed without break — cap hit
            if pending_tool_calls and self.tool_registry:
                logger.warning(
                    "Streaming tool execution loop reached max "
                    "iterations (%d) with pending tool_calls",
                    self._max_tool_iterations,
                    extra={
                        "conversation_id": getattr(
                            turn.manager, "conversation_id", None
                        ),
                    },
                )

        stream_fully_consumed = True

    except Exception as e:
        streaming_error = e
        await self._call_on_error_middleware(e, message, context)
        raise
    finally:
        # Only finalize when the stream was fully consumed (not
        # on early exit via aclose/break, which would write
        # partial data to conversation history).
        if streaming_error is None and stream_fully_consumed:
            turn.response_content = "".join(turn.stream_chunks)
            await self._finalize_turn(turn)
        await self._call_finally_turn_middleware(turn)
get_conversation async
get_conversation(conversation_id: str) -> Any

Retrieve conversation history.

This method fetches the complete conversation state including all messages, metadata, and the message tree structure. Useful for displaying conversation history, debugging, analytics, or exporting conversations.

Parameters:

Name Type Description Default
conversation_id str

Unique identifier of the conversation to retrieve

required

Returns:

Type Description
Any

ConversationState object containing the full conversation history, or None if the conversation does not exist

Example
# Retrieve a conversation
conv_state = await bot.get_conversation("conv-123")

# Access messages
messages = conv_state.message_tree

# Access metadata
print(conv_state.metadata)
See Also
  • clear_conversation(): Clear/delete a conversation
  • chat(): Add messages to a conversation
Source code in packages/bots/src/dataknobs_bots/bot/base.py
async def get_conversation(self, conversation_id: str) -> Any:
    """Retrieve conversation history.

    This method fetches the complete conversation state including all messages,
    metadata, and the message tree structure. Useful for displaying conversation
    history, debugging, analytics, or exporting conversations.

    Args:
        conversation_id: Unique identifier of the conversation to retrieve

    Returns:
        ConversationState object containing the full conversation history,
        or None if the conversation does not exist

    Example:
        ```python
        # Retrieve a conversation
        conv_state = await bot.get_conversation("conv-123")

        # Access messages
        messages = conv_state.message_tree

        # Access metadata
        print(conv_state.metadata)
        ```

    See Also:
        - clear_conversation(): Clear/delete a conversation
        - chat(): Add messages to a conversation
    """
    return await self.conversation_storage.load_conversation(conversation_id)
clear_conversation async
clear_conversation(conversation_id: str) -> bool

Clear a conversation's history.

This method removes the conversation from both persistent storage and the internal cache. The next chat() call with this conversation_id will start a fresh conversation. Useful for:

  • Implementing "start over" functionality
  • Privacy/data deletion requirements
  • Testing and cleanup
  • Resetting conversation context

Parameters:

Name Type Description Default
conversation_id str

Unique identifier of the conversation to clear

required

Returns:

Type Description
bool

True if the conversation was deleted, False if it didn't exist

Example
# Clear a conversation
deleted = await bot.clear_conversation("conv-123")

if deleted:
    print("Conversation deleted")
else:
    print("Conversation not found")

# Next chat will start fresh
response = await bot.chat("Hello!", context)
Note

This operation is permanent and cannot be undone. The conversation cannot be recovered after deletion.

See Also
  • get_conversation(): Retrieve conversation before clearing
  • chat(): Will create new conversation after clearing
Source code in packages/bots/src/dataknobs_bots/bot/base.py
async def clear_conversation(self, conversation_id: str) -> bool:
    """Clear a conversation's history.

    This method removes the conversation from both persistent storage and the
    internal cache. The next chat() call with this conversation_id will start
    a fresh conversation. Useful for:

    - Implementing "start over" functionality
    - Privacy/data deletion requirements
    - Testing and cleanup
    - Resetting conversation context

    Args:
        conversation_id: Unique identifier of the conversation to clear

    Returns:
        True if the conversation was deleted, False if it didn't exist

    Example:
        ```python
        # Clear a conversation
        deleted = await bot.clear_conversation("conv-123")

        if deleted:
            print("Conversation deleted")
        else:
            print("Conversation not found")

        # Next chat will start fresh
        response = await bot.chat("Hello!", context)
        ```

    Note:
        This operation is permanent and cannot be undone. The conversation
        cannot be recovered after deletion.

    See Also:
        - get_conversation(): Retrieve conversation before clearing
        - chat(): Will create new conversation after clearing
    """
    # Remove from cache if present
    if conversation_id in self._conversation_managers:
        del self._conversation_managers[conversation_id]

    # Delete from storage
    return await self.conversation_storage.delete_conversation(conversation_id)
get_wizard_state async
get_wizard_state(conversation_id: str) -> dict[str, Any] | None

Get current wizard state for a conversation.

This method provides public access to wizard state without requiring access to private conversation managers. It checks the in-memory manager first (most current) and falls back to persisted storage.

Parameters:

Name Type Description Default
conversation_id str

Conversation identifier

required

Returns:

Type Description
dict[str, Any] | None

Wizard state dict with canonical structure, or None if no wizard active or conversation not found.

The returned dict follows the canonical schema:

{
    "current_stage": str,
    "stage_index": int,
    "total_stages": int,
    "progress": float,
    "completed": bool,
    "data": dict,
    "can_skip": bool,
    "can_go_back": bool,
    "suggestions": list[str],
    "history": list[str],
}

Example
# Get wizard state for a conversation
state = await bot.get_wizard_state("conv-123")

if state:
    print(f"Current stage: {state['current_stage']}")
    print(f"Progress: {state['progress'] * 100:.0f}%")
    print(f"Collected data: {state['data']}")
Source code in packages/bots/src/dataknobs_bots/bot/base.py
async def get_wizard_state(self, conversation_id: str) -> dict[str, Any] | None:
    """Get current wizard state for a conversation.

    This method provides public access to wizard state without requiring
    access to private conversation managers. It checks the in-memory
    manager first (most current) and falls back to persisted storage.

    Args:
        conversation_id: Conversation identifier

    Returns:
        Wizard state dict with canonical structure, or None if no wizard
        active or conversation not found.

    The returned dict follows the canonical schema:
        {
            "current_stage": str,
            "stage_index": int,
            "total_stages": int,
            "progress": float,
            "completed": bool,
            "data": dict,
            "can_skip": bool,
            "can_go_back": bool,
            "suggestions": list[str],
            "history": list[str],
        }

    Example:
        ```python
        # Get wizard state for a conversation
        state = await bot.get_wizard_state("conv-123")

        if state:
            print(f"Current stage: {state['current_stage']}")
            print(f"Progress: {state['progress'] * 100:.0f}%")
            print(f"Collected data: {state['data']}")
        ```
    """
    # Fast path: in-memory cache
    manager = self._conversation_managers.get(conversation_id)
    if manager and manager.metadata:
        wizard_meta = manager.metadata.get("wizard")
        if wizard_meta:
            return self._normalize_wizard_state(wizard_meta)

    # Slow path: fall back to persisted storage
    state = await self.conversation_storage.load_conversation(conversation_id)
    if state and state.metadata:
        wizard_meta = state.metadata.get("wizard")
        if wizard_meta:
            return self._normalize_wizard_state(wizard_meta)

    return None
close async
close() -> None

Close the bot and clean up resources.

This method closes the LLM provider, conversation storage backend, reasoning strategy, and releases associated resources like HTTP connections and database connections. Should be called when the bot is no longer needed, especially in testing or when creating temporary bot instances.

Example
bot = await DynaBot.from_config(config)
try:
    response = await bot.chat("Hello", context)
finally:
    await bot.close()
Note

After calling close(), the bot should not be used for further operations. Create a new bot instance if needed.

Source code in packages/bots/src/dataknobs_bots/bot/base.py
async def close(self) -> None:
    """Close the bot and clean up resources.

    This method closes the LLM provider, conversation storage backend,
    reasoning strategy, and releases associated resources like HTTP
    connections and database connections. Should be called when the bot
    is no longer needed, especially in testing or when creating temporary
    bot instances.

    Example:
        ```python
        bot = await DynaBot.from_config(config)
        try:
            response = await bot.chat("Hello", context)
        finally:
            await bot.close()
        ```

    Note:
        After calling close(), the bot should not be used for further operations.
        Create a new bot instance if needed.
    """
    # Each subsystem owns the lifecycle of the providers it created.
    # The provider registry is a catalog for observability — it does
    # not manage lifecycle.  DynaBot only closes self.llm (the main
    # provider it created).

    # Close subsystems — each closes its own providers and resources.
    if self.knowledge_base:
        try:
            await self.knowledge_base.close()
        except Exception:
            logger.exception("Error closing knowledge base")

    if self.reasoning_strategy:
        try:
            await self.reasoning_strategy.close()
        except Exception:
            logger.exception("Error closing reasoning strategy")

    if self.memory:
        try:
            await self.memory.close()
        except Exception:
            logger.exception("Error closing memory store")

    # Close conversation storage
    if self.conversation_storage:
        try:
            await self.conversation_storage.close()
        except Exception:
            logger.exception("Error closing conversation storage")

    # Close main LLM provider only if DynaBot created it.
    # When from_config(llm=...) was used, the caller owns the lifecycle.
    if self._owns_llm and self.llm and hasattr(self.llm, "close"):
        try:
            await self.llm.close()
        except Exception:
            logger.exception("Error closing main LLM provider")
__aenter__ async
__aenter__() -> Self

Async context manager entry.

Returns:

Type Description
Self

Self for use in async with statement

Source code in packages/bots/src/dataknobs_bots/bot/base.py
async def __aenter__(self) -> Self:
    """Async context manager entry.

    Returns:
        Self for use in async with statement
    """
    return self
__aexit__ async
__aexit__(
    exc_type: type[BaseException] | None,
    exc_val: BaseException | None,
    exc_tb: TracebackType | None,
) -> None

Async context manager exit - ensures cleanup.

Parameters:

Name Type Description Default
exc_type type[BaseException] | None

Exception type if an exception occurred

required
exc_val BaseException | None

Exception value if an exception occurred

required
exc_tb TracebackType | None

Exception traceback if an exception occurred

required
Source code in packages/bots/src/dataknobs_bots/bot/base.py
async def __aexit__(
    self,
    exc_type: type[BaseException] | None,
    exc_val: BaseException | None,
    exc_tb: TracebackType | None,
) -> None:
    """Async context manager exit - ensures cleanup.

    Args:
        exc_type: Exception type if an exception occurred
        exc_val: Exception value if an exception occurred
        exc_tb: Exception traceback if an exception occurred
    """
    await self.close()
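The entry/exit pair above enables `async with` usage. A minimal sketch, assuming `config` and `context` are defined as in the earlier examples on this page:

```python
# Hedged sketch: `config` and `context` are assumed to be defined as in
# the earlier examples on this page.
async def run_once(config, context):
    bot = await DynaBot.from_config(config)
    async with bot:  # __aexit__ calls close() even if chat() raises
        return await bot.chat("Hello", context)
```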
get_conversation_manager
get_conversation_manager(conversation_id: str) -> ConversationManager | None

Get a cached conversation manager by conversation ID.

Returns None if no manager exists for the given ID (i.e. no turn has been processed for that conversation yet). Use this for cross-layer integration testing (e.g. injecting LLM-layer ConversationMiddleware into a manager after construction).

Parameters:

Name Type Description Default
conversation_id str

Conversation identifier

required

Returns:

Type Description
ConversationManager | None

Cached ConversationManager, or None
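For test setup, a small helper sketch (hedged: `bot` is any constructed DynaBot instance; the helper name is illustrative):

```python
# Hedged sketch: `bot` is assumed to be a constructed DynaBot instance.
def has_active_manager(bot, conversation_id: str) -> bool:
    """True once at least one turn has been processed for the conversation."""
    return bot.get_conversation_manager(conversation_id) is not None
```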

Source code in packages/bots/src/dataknobs_bots/bot/base.py
def get_conversation_manager(
    self, conversation_id: str
) -> ConversationManager | None:
    """Get a cached conversation manager by conversation ID.

    Returns ``None`` if no manager exists for the given ID (i.e. no
    turn has been processed for that conversation yet).  Use this for
    cross-layer integration testing (e.g. injecting LLM-layer
    ``ConversationMiddleware`` into a manager after construction).

    Args:
        conversation_id: Conversation identifier

    Returns:
        Cached ConversationManager, or None
    """
    return self._conversation_managers.get(conversation_id)
undo_last_turn async
undo_last_turn(context: BotContext) -> UndoResult

Undo the last conversational turn (user message + bot response).

Navigates the conversation tree back to the node_id recorded before the last turn started. The next chat() call will create a new branch from that point. The original branch is preserved in the tree.

Also rolls back:

  • Memory layer (pop N messages based on node depth difference)
  • Wizard FSM state (restored from per-node metadata)
  • Memory banks (reverted via backend-managed checkpointing)

Parameters:

Name Type Description Default
context BotContext

Bot execution context (identifies the conversation).

required

Returns:

Type Description
UndoResult

UndoResult with details about what was undone.

Raises:

Type Description
ValueError

If there's nothing to undo (at start of conversation).
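A hedged usage sketch, assuming `bot` and `context` exist as in the earlier chat() examples, showing how to handle the nothing-to-undo case:

```python
# Hedged sketch: `bot` and `context` are assumed to exist as in the
# earlier chat() examples on this page.
async def undo_with_feedback(bot, context) -> str:
    try:
        result = await bot.undo_last_turn(context)
    except ValueError:
        return "Nothing to undo"
    return (
        f"Undid {result.undone_user_message!r}; "
        f"{result.remaining_turns} turn(s) remain"
    )
```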

Source code in packages/bots/src/dataknobs_bots/bot/base.py
async def undo_last_turn(self, context: BotContext) -> UndoResult:
    """Undo the last conversational turn (user message + bot response).

    Navigates the conversation tree back to the node_id recorded before
    the last turn started. The next chat() call will create a new branch
    from that point. The original branch is preserved in the tree.

    Also rolls back:
    - Memory layer (pop N messages based on node depth difference)
    - Wizard FSM state (restored from per-node metadata)
    - Memory banks (reverted via backend-managed checkpointing)

    Args:
        context: Bot execution context (identifies the conversation).

    Returns:
        UndoResult with details about what was undone.

    Raises:
        ValueError: If there's nothing to undo (at start of conversation).
    """
    conv_id = context.conversation_id
    manager = self._conversation_managers.get(conv_id)
    if manager is None or manager.state is None:
        raise ValueError("No active conversation")

    checkpoints = self._turn_checkpoints.get(conv_id, [])
    if not checkpoints:
        raise ValueError("Nothing to undo")

    checkpoint_node_id, checkpoint_mem_count = checkpoints.pop()

    # Identify what we're undoing (last user message + last bot response).
    # For user messages, prefer raw_content from node metadata so that
    # UndoResult.undone_user_message reflects the original user input
    # rather than the KB/memory-augmented version.
    undone_user = ""
    undone_bot = ""
    nodes = manager.state.get_current_nodes()
    for node in reversed(nodes):
        role = node.message.role
        if role == "assistant" and not undone_bot:
            content = node.message.content
            undone_bot = content if isinstance(content, str) else str(content)
        elif role == "user" and not undone_user:
            raw = node.metadata.get("raw_content")
            if raw is not None:
                undone_user = raw
            else:
                content = node.message.content
                undone_user = content if isinstance(content, str) else str(content)
            break

    # Navigate back — next add_message() creates a sibling branch
    await manager.switch_to_node(checkpoint_node_id)

    # Roll back memory — use stored message count for accuracy
    current_mem_count = 0
    if self.memory:
        try:
            current_mem_count = len(await self.memory.get_context(""))
        except Exception:
            current_mem_count = 0
    messages_to_pop = current_mem_count - checkpoint_mem_count
    if self.memory and messages_to_pop > 0:
        try:
            await self.memory.pop_messages(messages_to_pop)
        except (ValueError, NotImplementedError):
            logger.warning(
                "Memory pop_messages failed for %d messages",
                messages_to_pop,
                exc_info=True,
            )

    # Restore wizard FSM state from checkpoint node's metadata
    self._restore_wizard_from_node(manager, checkpoint_node_id)

    # Revert banks via backend-managed checkpointing
    self._undo_banks_to_checkpoint(checkpoint_node_id)

    # Count remaining turns
    remaining_messages = manager.messages
    user_count = sum(
        1 for m in remaining_messages
        if (m.get("role") if isinstance(m, dict) else getattr(m, "role", "")) == "user"
    )

    return UndoResult(
        undone_user_message=undone_user,
        undone_bot_response=undone_bot,
        remaining_turns=user_count,
        branching=True,
    )
rewind_to_turn async
rewind_to_turn(context: BotContext, turn: int) -> UndoResult

Rewind conversation to after the given turn number.

Turn 0 is the first user-bot exchange. Rewinding to turn -1 means back to the start (before any user messages).

Parameters:

Name Type Description Default
context BotContext

Bot execution context.

required
turn int

Turn number to rewind to (-1 for conversation start).

required

Returns:

Type Description
UndoResult

UndoResult with details about what was undone.

Raises:

Type Description
ValueError

If turn number is invalid.
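The turn-to-checkpoint arithmetic can be sketched in isolation (`checkpoints` below is a stand-in list; the real entries are node IDs recorded per turn):

```python
# Stand-in checkpoint list: one entry per completed turn;
# checkpoints[0] was recorded before turn 0 started.
checkpoints = ["node-a", "node-b", "node-c"]

turn = 0                    # rewind to just after the first exchange
target_count = turn + 1     # keep only checkpoints[0]
turns_to_undo = len(checkpoints) - target_count
print(turns_to_undo)        # → 2 (two undo_last_turn() calls)
```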

Source code in packages/bots/src/dataknobs_bots/bot/base.py
async def rewind_to_turn(
    self, context: BotContext, turn: int
) -> UndoResult:
    """Rewind conversation to after the given turn number.

    Turn 0 is the first user-bot exchange. Rewinding to turn -1
    means back to the start (before any user messages).

    Args:
        context: Bot execution context.
        turn: Turn number to rewind to (-1 for conversation start).

    Returns:
        UndoResult with details about what was undone.

    Raises:
        ValueError: If turn number is invalid.
    """
    conv_id = context.conversation_id
    checkpoints = self._turn_checkpoints.get(conv_id, [])
    target_count = turn + 1  # checkpoints[0] is before turn 0

    if target_count < 0 or target_count > len(checkpoints):
        raise ValueError(
            f"Invalid turn {turn}: conversation has "
            f"{len(checkpoints)} turns"
        )

    turns_to_undo = len(checkpoints) - target_count
    result = None
    for _ in range(turns_to_undo):
        result = await self.undo_last_turn(context)

    if result is None:
        raise ValueError("Nothing to undo")
    return result

UndoResult dataclass

UndoResult(
    undone_user_message: str,
    undone_bot_response: str,
    remaining_turns: int,
    branching: bool,
)

Result of an undo operation.

ConfigDraftManager

ConfigDraftManager(
    output_dir: Path,
    draft_prefix: str = "_draft-",
    max_age_hours: float = 24.0,
    metadata_key: str = "_draft",
)

File-based draft manager for interactive config creation.

Manages the lifecycle of configuration drafts: creation, incremental updates, finalization, and cleanup of stale drafts.

Draft files are named {prefix}{draft_id}.yaml and stored in the output directory. When a config_name is provided, a named alias file {config_name}.yaml is also maintained.
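A hedged lifecycle sketch (the import path and the config keys are assumptions based on this page's module listing; the actual layout may differ):

```python
from pathlib import Path

def start_draft(output_dir: Path) -> str:
    # Assumed import path, based on this page's module listing.
    from dataknobs_bots.config import ConfigDraftManager

    manager = ConfigDraftManager(output_dir=output_dir)
    # Config keys here are illustrative, not a validated schema.
    draft_id = manager.create_draft({"llm": {"model": "gpt-4o"}}, stage="llm")
    manager.update_draft(
        draft_id,
        {"llm": {"model": "gpt-4o"}, "memory": {"type": "buffer"}},
        stage="memory",
        config_name="demo-bot",  # also maintains the demo-bot.yaml alias
    )
    return draft_id
```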

Initialize the draft manager.

Parameters:

Name Type Description Default
output_dir Path

Directory for draft and config files.

required
draft_prefix str

Prefix for draft file names.

'_draft-'
max_age_hours float

Default maximum age for stale draft cleanup.

24.0
metadata_key str

Key used to store draft metadata in config files.

'_draft'

Methods:

Name Description
create_draft

Create a new draft from a config dict.

update_draft

Update an existing draft.

get_draft

Retrieve a draft and its metadata.

finalize

Finalize a draft into a completed configuration.

discard

Discard a draft by removing its file.

list_drafts

List all current drafts.

cleanup_stale

Remove drafts older than the specified age.

Attributes:

Name Type Description
output_dir Path

The output directory for drafts.

Source code in packages/bots/src/dataknobs_bots/config/drafts.py
def __init__(
    self,
    output_dir: Path,
    draft_prefix: str = "_draft-",
    max_age_hours: float = 24.0,
    metadata_key: str = "_draft",
) -> None:
    """Initialize the draft manager.

    Args:
        output_dir: Directory for draft and config files.
        draft_prefix: Prefix for draft file names.
        max_age_hours: Default maximum age for stale draft cleanup.
        metadata_key: Key used to store draft metadata in config files.
    """
    self._output_dir = output_dir
    self._draft_prefix = draft_prefix
    self._max_age_hours = max_age_hours
    self._metadata_key = metadata_key
Attributes
output_dir property
output_dir: Path

The output directory for drafts.

Functions
create_draft
create_draft(config: dict[str, Any], stage: str | None = None) -> str

Create a new draft from a config dict.

Parameters:

Name Type Description Default
config dict[str, Any]

Configuration dictionary to save as draft.

required
stage str | None

Current wizard stage.

None

Returns:

Type Description
str

The generated draft ID.

Source code in packages/bots/src/dataknobs_bots/config/drafts.py
def create_draft(
    self,
    config: dict[str, Any],
    stage: str | None = None,
) -> str:
    """Create a new draft from a config dict.

    Args:
        config: Configuration dictionary to save as draft.
        stage: Current wizard stage.

    Returns:
        The generated draft ID.
    """
    draft_id = uuid.uuid4().hex[:8]
    now = datetime.now(timezone.utc).isoformat()

    metadata = DraftMetadata(
        draft_id=draft_id,
        created_at=now,
        last_updated=now,
        stage=stage,
    )

    self._write_draft(draft_id, config, metadata)
    logger.info(
        "Created draft %s at stage '%s'",
        draft_id,
        stage,
        extra={"draft_id": draft_id, "stage": stage},
    )
    return draft_id
update_draft
update_draft(
    draft_id: str,
    config: dict[str, Any],
    stage: str | None = None,
    config_name: str | None = None,
) -> None

Update an existing draft.

Parameters:

Name Type Description Default
draft_id str

The draft ID to update.

required
config dict[str, Any]

Updated configuration dictionary.

required
stage str | None

Current wizard stage.

None
config_name str | None

Optional name for the config file alias.

None

Raises:

Type Description
FileNotFoundError

If the draft file does not exist.

Source code in packages/bots/src/dataknobs_bots/config/drafts.py
def update_draft(
    self,
    draft_id: str,
    config: dict[str, Any],
    stage: str | None = None,
    config_name: str | None = None,
) -> None:
    """Update an existing draft.

    Args:
        draft_id: The draft ID to update.
        config: Updated configuration dictionary.
        stage: Current wizard stage.
        config_name: Optional name for the config file alias.

    Raises:
        FileNotFoundError: If the draft file does not exist.
    """
    draft_path = self._draft_path(draft_id)
    if not draft_path.exists():
        raise FileNotFoundError(f"Draft not found: {draft_id}")

    existing = self._read_file(draft_path)
    existing_meta = existing.get(self._metadata_key, {})
    now = datetime.now(timezone.utc).isoformat()

    metadata = DraftMetadata(
        draft_id=draft_id,
        created_at=existing_meta.get("created_at", now),
        last_updated=now,
        stage=stage or existing_meta.get("stage"),
        config_name=config_name or existing_meta.get("config_name"),
    )

    self._write_draft(draft_id, config, metadata)

    # Also write named alias file if config_name is set
    if metadata.config_name:
        self._write_named_file(metadata.config_name, config, metadata)

    logger.info(
        "Updated draft %s at stage '%s'",
        draft_id,
        stage,
        extra={"draft_id": draft_id, "stage": stage},
    )
get_draft
get_draft(draft_id: str) -> tuple[dict[str, Any], DraftMetadata] | None

Retrieve a draft and its metadata.

Parameters:

Name Type Description Default
draft_id str

The draft ID to retrieve.

required

Returns:

Type Description
tuple[dict[str, Any], DraftMetadata] | None

Tuple of (config_dict, metadata), or None if not found.

Source code in packages/bots/src/dataknobs_bots/config/drafts.py
def get_draft(
    self, draft_id: str
) -> tuple[dict[str, Any], DraftMetadata] | None:
    """Retrieve a draft and its metadata.

    Args:
        draft_id: The draft ID to retrieve.

    Returns:
        Tuple of (config_dict, metadata), or None if not found.
    """
    draft_path = self._draft_path(draft_id)
    if not draft_path.exists():
        return None

    data = self._read_file(draft_path)
    meta_dict = data.pop(self._metadata_key, {})
    metadata = DraftMetadata.from_dict(meta_dict)
    return data, metadata
finalize
finalize(draft_id: str, final_name: str | None = None) -> dict[str, Any]

Finalize a draft into a completed configuration.

Strips draft metadata, writes the final config file, and removes the draft file.

Parameters:

Name Type Description Default
draft_id str

The draft ID to finalize.

required
final_name str | None

Name for the final config file. If not provided, uses the config_name from draft metadata.

None

Returns:

Type Description
dict[str, Any]

The finalized configuration dict (without draft metadata).

Raises:

Type Description
FileNotFoundError

If the draft does not exist.

ValueError

If no final name can be determined.

Source code in packages/bots/src/dataknobs_bots/config/drafts.py
def finalize(
    self,
    draft_id: str,
    final_name: str | None = None,
) -> dict[str, Any]:
    """Finalize a draft into a completed configuration.

    Strips draft metadata, writes the final config file, and
    removes the draft file.

    Args:
        draft_id: The draft ID to finalize.
        final_name: Name for the final config file. If not provided,
            uses the config_name from draft metadata.

    Returns:
        The finalized configuration dict (without draft metadata).

    Raises:
        FileNotFoundError: If the draft does not exist.
        ValueError: If no final name can be determined.
    """
    result = self.get_draft(draft_id)
    if result is None:
        raise FileNotFoundError(f"Draft not found: {draft_id}")

    config, metadata = result
    name = final_name or metadata.config_name
    if not name:
        raise ValueError(
            "No final_name provided and draft has no config_name set"
        )

    # Write final file without metadata
    self._ensure_output_dir()
    final_path = self._output_dir / f"{name}.yaml"
    self._write_yaml(final_path, config)

    # Remove draft file
    draft_path = self._draft_path(draft_id)
    if draft_path.exists():
        draft_path.unlink()

    logger.info(
        "Finalized draft %s as '%s'",
        draft_id,
        name,
        extra={"draft_id": draft_id, "final_name": name},
    )
    return config
discard
discard(draft_id: str) -> bool

Discard a draft by removing its file.

Parameters:

Name Type Description Default
draft_id str

The draft ID to discard.

required

Returns:

Type Description
bool

True if the draft was found and removed, False otherwise.

Source code in packages/bots/src/dataknobs_bots/config/drafts.py
def discard(self, draft_id: str) -> bool:
    """Discard a draft by removing its file.

    Args:
        draft_id: The draft ID to discard.

    Returns:
        True if the draft was found and removed, False otherwise.
    """
    draft_path = self._draft_path(draft_id)
    if draft_path.exists():
        draft_path.unlink()
        logger.info("Discarded draft %s", draft_id)
        return True
    return False
list_drafts
list_drafts() -> list[DraftMetadata]

List all current drafts.

Returns:

Type Description
list[DraftMetadata]

List of DraftMetadata for all draft files.

Source code in packages/bots/src/dataknobs_bots/config/drafts.py
def list_drafts(self) -> list[DraftMetadata]:
    """List all current drafts.

    Returns:
        List of DraftMetadata for all draft files.
    """
    result: list[DraftMetadata] = []
    if not self._output_dir.exists():
        return result

    for path in sorted(self._output_dir.glob(f"{self._draft_prefix}*.yaml")):
        try:
            data = self._read_file(path)
            meta_dict = data.get(self._metadata_key, {})
            if meta_dict:
                result.append(DraftMetadata.from_dict(meta_dict))
        except Exception:
            logger.exception("Failed to read draft: %s", path)
    return result
cleanup_stale
cleanup_stale(max_age_hours: float | None = None) -> int

Remove drafts older than the specified age.

Also strips stale draft metadata blocks from named config files.

Parameters:

Name Type Description Default
max_age_hours float | None

Maximum age in hours. Defaults to the manager's configured max_age_hours.

None

Returns:

Type Description
int

Number of stale drafts removed.

Source code in packages/bots/src/dataknobs_bots/config/drafts.py
def cleanup_stale(self, max_age_hours: float | None = None) -> int:
    """Remove drafts older than the specified age.

    Also strips stale draft metadata blocks from named config files.

    Args:
        max_age_hours: Maximum age in hours. Defaults to the
            manager's configured max_age_hours.

    Returns:
        Number of stale drafts removed.
    """
    age_limit = max_age_hours if max_age_hours is not None else self._max_age_hours
    cutoff = time.time() - (age_limit * 3600)
    cleaned = 0

    if not self._output_dir.exists():
        return 0

    # Clean draft files
    for path in self._output_dir.glob(f"{self._draft_prefix}*.yaml"):
        try:
            data = self._read_file(path)
            meta = data.get(self._metadata_key, {})
            last_updated = meta.get("last_updated", "")
            if last_updated and _parse_timestamp(last_updated) < cutoff:
                path.unlink()
                cleaned += 1
                logger.info("Cleaned stale draft: %s", path.name)
        except Exception:
            logger.exception("Failed to cleanup draft: %s", path)

    # Strip stale metadata from named config files
    for path in self._output_dir.glob("*.yaml"):
        if path.name.startswith(self._draft_prefix):
            continue
        try:
            data = self._read_file(path)
            meta = data.get(self._metadata_key, {})
            if not meta:
                continue
            last_updated = meta.get("last_updated", "")
            if last_updated and _parse_timestamp(last_updated) < cutoff:
                data.pop(self._metadata_key, None)
                self._write_yaml(path, data)
                logger.info(
                    "Stripped stale metadata from %s", path.name
                )
        except Exception:
            logger.exception(
                "Failed to strip metadata from: %s", path
            )

    return cleaned

ConfigTemplate dataclass

ConfigTemplate(
    name: str,
    description: str = "",
    version: str = "1.0.0",
    tags: list[str] = list(),
    variables: list[TemplateVariable] = list(),
    structure: dict[str, Any] = dict(),
)

A reusable DynaBot configuration template.

Templates define a configuration structure with variable placeholders ({{var}}) that are substituted when the template is applied.

Attributes:

Name Type Description
name str

Template identifier (underscores internally).

description str

Human-readable description.

version str

Semantic version string.

tags list[str]

Tags for filtering and categorization.

variables list[TemplateVariable]

List of template variables.

structure dict[str, Any]

The config structure with {{var}} placeholders.

Methods:

Name Description
get_required_variables

Get variables that must be provided.

get_optional_variables

Get variables that have defaults or are not required.

to_dict

Convert to dictionary representation.

from_dict

Create a ConfigTemplate from a dictionary.

from_yaml_file

Load a ConfigTemplate from a YAML file.

Functions
get_required_variables
get_required_variables() -> list[TemplateVariable]

Get variables that must be provided.

Source code in packages/bots/src/dataknobs_bots/config/templates.py
def get_required_variables(self) -> list[TemplateVariable]:
    """Get variables that must be provided."""
    return [v for v in self.variables if v.required]
get_optional_variables
get_optional_variables() -> list[TemplateVariable]

Get variables that have defaults or are not required.

Source code in packages/bots/src/dataknobs_bots/config/templates.py
def get_optional_variables(self) -> list[TemplateVariable]:
    """Get variables that have defaults or are not required."""
    return [v for v in self.variables if not v.required]
to_dict
to_dict() -> dict[str, Any]

Convert to dictionary representation.

Source code in packages/bots/src/dataknobs_bots/config/templates.py
def to_dict(self) -> dict[str, Any]:
    """Convert to dictionary representation."""
    return {
        "name": self.name,
        "description": self.description,
        "version": self.version,
        "tags": self.tags,
        "variables": [v.to_dict() for v in self.variables],
        "structure": self.structure,
    }
from_dict classmethod
from_dict(data: dict[str, Any]) -> ConfigTemplate

Create a ConfigTemplate from a dictionary.

Parameters:

Name Type Description Default
data dict[str, Any]

Dictionary with template fields.

required

Returns:

Type Description
ConfigTemplate

A new ConfigTemplate instance.

Source code in packages/bots/src/dataknobs_bots/config/templates.py
@classmethod
def from_dict(cls, data: dict[str, Any]) -> ConfigTemplate:
    """Create a ConfigTemplate from a dictionary.

    Args:
        data: Dictionary with template fields.

    Returns:
        A new ConfigTemplate instance.
    """
    variables = [
        TemplateVariable.from_dict(v) for v in data.get("variables", [])
    ]
    return cls(
        name=data.get("name", ""),
        description=data.get("description", ""),
        version=data.get("version", "1.0.0"),
        tags=data.get("tags", []),
        variables=variables,
        structure=data.get("structure", {}),
    )
from_yaml_file classmethod
from_yaml_file(path: Path) -> ConfigTemplate

Load a ConfigTemplate from a YAML file.

Parameters:

Name Type Description Default
path Path

Path to the YAML file.

required

Returns:

Type Description
ConfigTemplate

A new ConfigTemplate instance.

Raises:

Type Description
FileNotFoundError

If the file does not exist.

YAMLError

If the file is not valid YAML.

Source code in packages/bots/src/dataknobs_bots/config/templates.py
@classmethod
def from_yaml_file(cls, path: Path) -> ConfigTemplate:
    """Load a ConfigTemplate from a YAML file.

    Args:
        path: Path to the YAML file.

    Returns:
        A new ConfigTemplate instance.

    Raises:
        FileNotFoundError: If the file does not exist.
        yaml.YAMLError: If the file is not valid YAML.
    """
    with open(path) as f:
        data = yaml.safe_load(f)
    if data is None:
        data = {}
    template = cls.from_dict(data)
    if not template.name:
        template.name = path.stem.replace("-", "_")
    return template

ConfigTemplateRegistry

ConfigTemplateRegistry()

Registry for managing and applying configuration templates.

Supports registration, tag-based filtering, variable validation, and template application with variable substitution.

Methods:

Name Description
register

Register a template.

get

Get a template by name.

list_templates

List templates, optionally filtered by tags.

load_from_file

Load and register a template from a YAML file.

load_from_directory

Load and register all templates from a directory.

apply_template

Apply a template with variable substitution.

validate_variables

Validate variables against a template's requirements.

Source code in packages/bots/src/dataknobs_bots/config/templates.py
def __init__(self) -> None:
    self._templates: dict[str, ConfigTemplate] = {}
Functions
register
register(template: ConfigTemplate) -> None

Register a template.

Parameters:

Name Type Description Default
template ConfigTemplate

The template to register.

required
Source code in packages/bots/src/dataknobs_bots/config/templates.py
def register(self, template: ConfigTemplate) -> None:
    """Register a template.

    Args:
        template: The template to register.
    """
    self._templates[template.name] = template
    logger.debug("Registered template: %s", template.name)
get
get(name: str) -> ConfigTemplate | None

Get a template by name.

Parameters:

Name Type Description Default
name str

Template name.

required

Returns:

Type Description
ConfigTemplate | None

The template, or None if not found.

Source code in packages/bots/src/dataknobs_bots/config/templates.py
def get(self, name: str) -> ConfigTemplate | None:
    """Get a template by name.

    Args:
        name: Template name.

    Returns:
        The template, or None if not found.
    """
    return self._templates.get(name)
list_templates
list_templates(tags: list[str] | None = None) -> list[ConfigTemplate]

List templates, optionally filtered by tags.

Parameters:

Name Type Description Default
tags list[str] | None

If provided, only return templates that have all specified tags.

None

Returns:

Type Description
list[ConfigTemplate]

List of matching templates.

Source code in packages/bots/src/dataknobs_bots/config/templates.py
def list_templates(
    self, tags: list[str] | None = None
) -> list[ConfigTemplate]:
    """List templates, optionally filtered by tags.

    Args:
        tags: If provided, only return templates that have all specified tags.

    Returns:
        List of matching templates.
    """
    templates = list(self._templates.values())
    if tags:
        tag_set = set(tags)
        templates = [t for t in templates if tag_set.issubset(set(t.tags))]
    return templates
load_from_file
load_from_file(path: Path) -> ConfigTemplate

Load and register a template from a YAML file.

Parameters:

Name Type Description Default
path Path

Path to the YAML file.

required

Returns:

Type Description
ConfigTemplate

The loaded template.

Source code in packages/bots/src/dataknobs_bots/config/templates.py
def load_from_file(self, path: Path) -> ConfigTemplate:
    """Load and register a template from a YAML file.

    Args:
        path: Path to the YAML file.

    Returns:
        The loaded template.
    """
    template = ConfigTemplate.from_yaml_file(path)
    self.register(template)
    return template
load_from_directory
load_from_directory(directory: Path) -> int

Load and register all templates from a directory.

Scans for *.yaml and *.yml files, skipping files named README or base.

Parameters:

Name Type Description Default
directory Path

Directory to scan.

required

Returns:

Type Description
int

Number of templates loaded.

Source code in packages/bots/src/dataknobs_bots/config/templates.py
def load_from_directory(self, directory: Path) -> int:
    """Load and register all templates from a directory.

    Scans for ``*.yaml`` and ``*.yml`` files, skipping files named
    ``README`` or ``base``.

    Args:
        directory: Directory to scan.

    Returns:
        Number of templates loaded.
    """
    count = 0
    for ext in ("*.yaml", "*.yml"):
        for path in sorted(directory.glob(ext)):
            if path.stem.lower() in ("readme", "base"):
                continue
            try:
                self.load_from_file(path)
                count += 1
            except Exception:
                logger.exception("Failed to load template from %s", path)
    return count
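Directory loading skips README and base files (matched case-insensitively by stem) before attempting a load. The filtering alone can be sketched as follows; the file names are hypothetical:

```python
import tempfile
from pathlib import Path

SKIP_STEMS = ("readme", "base")

def template_paths(directory: Path) -> list[Path]:
    """Collect *.yaml then *.yml paths, skipping README/base stems."""
    paths: list[Path] = []
    for ext in ("*.yaml", "*.yml"):
        for path in sorted(directory.glob(ext)):
            if path.stem.lower() in SKIP_STEMS:
                continue
            paths.append(path)
    return paths

with tempfile.TemporaryDirectory() as tmp:
    d = Path(tmp)
    for name in ("README.yaml", "base.yaml", "support-bot.yaml", "faq.yml"):
        (d / name).touch()
    names = [p.name for p in template_paths(d)]

assert names == ["support-bot.yaml", "faq.yml"]
```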
apply_template
apply_template(name: str, variables: dict[str, Any]) -> dict[str, Any]

Apply a template with variable substitution.

Deep-copies the template structure and substitutes all {{var}} placeholders with values from the variables dict.

Parameters:

Name Type Description Default
name str

Template name.

required
variables dict[str, Any]

Variable values to substitute.

required

Returns:

Type Description
dict[str, Any]

The resolved configuration dict.

Raises:

Type Description
KeyError

If the template is not found.

Source code in packages/bots/src/dataknobs_bots/config/templates.py
def apply_template(
    self,
    name: str,
    variables: dict[str, Any],
) -> dict[str, Any]:
    """Apply a template with variable substitution.

    Deep-copies the template structure and substitutes all ``{{var}}``
    placeholders with values from the variables dict.

    Args:
        name: Template name.
        variables: Variable values to substitute.

    Returns:
        The resolved configuration dict.

    Raises:
        KeyError: If the template is not found.
    """
    template = self._templates.get(name)
    if template is None:
        raise KeyError(f"Template not found: {name}")

    # Build full variable map: user values + defaults
    var_map = _build_variable_map(template, variables)

    structure = copy.deepcopy(template.structure)
    result: dict[str, Any] = substitute_template_vars(
        structure, var_map, preserve_missing=True
    )
    return result
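The substitution step — deep-copy, then replace each {{var}} token — can be mirrored with a small recursive walker. This stand-in reflects an assumption about substitute_template_vars (string-level replacement, missing placeholders preserved); it is not the library function itself:

```python
import copy
import re
from typing import Any

_VAR = re.compile(r"\{\{(\w+)\}\}")

def substitute(value: Any, variables: dict[str, Any]) -> Any:
    # Recurse into dicts/lists; replace {{name}} in strings, keep unknown ones.
    if isinstance(value, dict):
        return {k: substitute(v, variables) for k, v in value.items()}
    if isinstance(value, list):
        return [substitute(v, variables) for v in value]
    if isinstance(value, str):
        return _VAR.sub(
            lambda m: str(variables.get(m.group(1), m.group(0))), value
        )
    return value

structure = {"llm": {"provider": "{{provider}}", "model": "{{model}}"}}
resolved = substitute(copy.deepcopy(structure), {"provider": "openai"})
assert resolved["llm"]["provider"] == "openai"
assert resolved["llm"]["model"] == "{{model}}"  # missing var preserved
```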
validate_variables
validate_variables(name: str, variables: dict[str, Any]) -> ValidationResult

Validate variables against a template's requirements.

Checks that required variables are present and that values match any defined choices constraints.

Parameters:

Name Type Description Default
name str

Template name.

required
variables dict[str, Any]

Variable values to validate.

required

Returns:

Type Description
ValidationResult

ValidationResult with any issues found.

Source code in packages/bots/src/dataknobs_bots/config/templates.py
def validate_variables(
    self,
    name: str,
    variables: dict[str, Any],
) -> ValidationResult:
    """Validate variables against a template's requirements.

    Checks that required variables are present and that values
    match any defined choices constraints.

    Args:
        name: Template name.
        variables: Variable values to validate.

    Returns:
        ValidationResult with any issues found.
    """
    template = self._templates.get(name)
    if template is None:
        return ValidationResult.error(f"Template not found: {name}")

    result = ValidationResult.ok()

    for var in template.variables:
        if var.required and var.name not in variables:
            if var.default is None:
                result = result.merge(
                    ValidationResult.error(
                        f"Missing required variable: {var.name}"
                    )
                )
        if var.choices is not None and var.name in variables:
            value = variables[var.name]
            if value not in var.choices:
                result = result.merge(
                    ValidationResult.error(
                        f"Variable '{var.name}' has invalid value '{value}'. "
                        f"Valid choices: {var.choices}"
                    )
                )

    return result
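The two checks above — required variables present (unless they have a default) and values within choices — can be sketched as a plain error collector. The spec entries mirror the TemplateVariable fields; the variable names are illustrative:

```python
from typing import Any

def check_variables(spec: list[dict], variables: dict[str, Any]) -> list[str]:
    """Collect error strings for missing required vars and invalid choices."""
    errors: list[str] = []
    for var in spec:
        name = var["name"]
        if var.get("required") and name not in variables and var.get("default") is None:
            errors.append(f"Missing required variable: {name}")
        choices = var.get("choices")
        if choices is not None and name in variables and variables[name] not in choices:
            errors.append(
                f"Variable '{name}' has invalid value '{variables[name]}'. "
                f"Valid choices: {choices}"
            )
    return errors

spec = [
    {"name": "provider", "required": True, "choices": ["openai", "anthropic"]},
    {"name": "temperature", "required": False, "default": 0.7},
]
assert check_variables(spec, {"provider": "openai"}) == []
assert check_variables(spec, {}) == ["Missing required variable: provider"]
assert "invalid value" in check_variables(spec, {"provider": "groq"})[0]
```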

ConfigValidator

ConfigValidator(schema: DynaBotConfigSchema | None = None)

Pluggable validation engine for DynaBot configurations.

Runs a pipeline of validators against a config dict and collects all errors and warnings into a single ValidationResult.

Example
validator = ConfigValidator()

# Add custom validator
def check_api_key(config):
    if "api_key" in str(config):
        return ValidationResult.warning("Config contains an API key")
    return ValidationResult.ok()

validator.register_validator("api_key_check", check_api_key)
result = validator.validate(my_config)

Initialize the validator.

Parameters:

Name Type Description Default
schema DynaBotConfigSchema | None

Optional config schema for schema-based validation.

None

Methods:

Name Description
register_validator

Register a named validation function.

validate

Run all validators against a configuration.

validate_completeness

Check that a config has the minimum required fields.

validate_portability

Check that a config is portable across environments.

validate_component

Validate a specific component section of the config.

Source code in packages/bots/src/dataknobs_bots/config/validation.py
def __init__(self, schema: DynaBotConfigSchema | None = None) -> None:
    """Initialize the validator.

    Args:
        schema: Optional config schema for schema-based validation.
    """
    self._schema = schema
    self._validators: dict[str, ValidatorFn] = {}
Functions
register_validator
register_validator(name: str, validator: ValidatorFn) -> None

Register a named validation function.

Parameters:

Name Type Description Default
name str

Unique name for this validator.

required
validator ValidatorFn

Function that takes a config dict and returns ValidationResult.

required
Source code in packages/bots/src/dataknobs_bots/config/validation.py
def register_validator(self, name: str, validator: ValidatorFn) -> None:
    """Register a named validation function.

    Args:
        name: Unique name for this validator.
        validator: Function that takes a config dict and returns ValidationResult.
    """
    self._validators[name] = validator
    logger.debug("Registered validator: %s", name)
validate
validate(config: dict[str, Any]) -> ValidationResult

Run all validators against a configuration.

Runs completeness check, schema validation (if schema provided), and all registered custom validators.

Parameters:

Name Type Description Default
config dict[str, Any]

Configuration dictionary to validate.

required

Returns:

Type Description
ValidationResult

Merged ValidationResult from all validators.

Source code in packages/bots/src/dataknobs_bots/config/validation.py
def validate(self, config: dict[str, Any]) -> ValidationResult:
    """Run all validators against a configuration.

    Runs completeness check, schema validation (if schema provided),
    and all registered custom validators.

    Args:
        config: Configuration dictionary to validate.

    Returns:
        Merged ValidationResult from all validators.
    """
    result = self.validate_completeness(config)

    if self._schema is not None:
        result = result.merge(self._schema.validate(config))

    for name, validator in self._validators.items():
        try:
            result = result.merge(validator(config))
        except Exception:
            logger.exception("Validator '%s' raised an exception", name)
            result = result.merge(
                ValidationResult.error(f"Validator '{name}' failed with an error")
            )

    return result
validate_completeness
validate_completeness(config: dict[str, Any]) -> ValidationResult

Check that a config has the minimum required fields.

A valid DynaBot config must have at minimum an LLM configuration and conversation storage configuration.

Parameters:

Name Type Description Default
config dict[str, Any]

Configuration dictionary to check.

required

Returns:

Type Description
ValidationResult

ValidationResult with errors for missing required fields.

Source code in packages/bots/src/dataknobs_bots/config/validation.py
def validate_completeness(self, config: dict[str, Any]) -> ValidationResult:
    """Check that a config has the minimum required fields.

    A valid DynaBot config must have at minimum an LLM configuration
    and conversation storage configuration.

    Args:
        config: Configuration dictionary to check.

    Returns:
        ValidationResult with errors for missing required fields.
    """
    result = ValidationResult.ok()

    # Check for LLM config (flat or portable format)
    bot = config.get("bot", config)
    has_llm = "llm" in bot
    if not has_llm:
        result = result.merge(
            ValidationResult.error(
                "Missing required 'llm' configuration. "
                "Set llm.provider and llm.model, or use a $resource reference."
            )
        )

    # Check for conversation storage
    has_storage = "conversation_storage" in bot
    if not has_storage:
        result = result.merge(
            ValidationResult.error(
                "Missing required 'conversation_storage' configuration. "
                "Set conversation_storage.backend, "
                "conversation_storage.storage_class, "
                "or use a $resource reference."
            )
        )

    return result
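The minimum-fields rule accepts both the flat format and the portable format (where settings live under a bot key). A standalone sketch of that lookup; the `$resource` value string below is only a placeholder, not the documented reference syntax:

```python
from typing import Any

REQUIRED_SECTIONS = ("llm", "conversation_storage")

def missing_sections(config: dict[str, Any]) -> list[str]:
    # Portable configs nest settings under "bot"; flat configs do not.
    bot = config.get("bot", config)
    return [section for section in REQUIRED_SECTIONS if section not in bot]

flat = {"llm": {"provider": "openai"}, "conversation_storage": {"backend": "memory"}}
portable = {"bot": {"llm": "$resource-placeholder"}}
assert missing_sections(flat) == []
assert missing_sections(portable) == ["conversation_storage"]
assert missing_sections({}) == ["llm", "conversation_storage"]
```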
validate_portability
validate_portability(config: dict[str, Any]) -> ValidationResult

Check that a config is portable across environments.

Wraps the portability checker from registry.portability to return a ValidationResult instead of raising exceptions.

Parameters:

Name Type Description Default
config dict[str, Any]

Configuration dictionary to check.

required

Returns:

Type Description
ValidationResult

ValidationResult with portability issues as warnings.

Source code in packages/bots/src/dataknobs_bots/config/validation.py
def validate_portability(self, config: dict[str, Any]) -> ValidationResult:
    """Check that a config is portable across environments.

    Wraps the portability checker from registry.portability to return
    a ValidationResult instead of raising exceptions.

    Args:
        config: Configuration dictionary to check.

    Returns:
        ValidationResult with portability issues as warnings.
    """
    try:
        issues = validate_portability(config, raise_on_error=False)
    except PortabilityError as e:
        return ValidationResult.error(str(e))

    if issues:
        return ValidationResult(
            valid=True,
            warnings=[f"Portability: {issue}" for issue in issues],
        )
    return ValidationResult.ok()
validate_component
validate_component(component: str, config: dict[str, Any]) -> ValidationResult

Validate a specific component section of the config.

Parameters:

Name Type Description Default
component str

Component name (e.g., 'llm', 'memory').

required
config dict[str, Any]

The component's configuration dictionary.

required

Returns:

Type Description
ValidationResult

ValidationResult for that component.

Source code in packages/bots/src/dataknobs_bots/config/validation.py
def validate_component(
    self, component: str, config: dict[str, Any]
) -> ValidationResult:
    """Validate a specific component section of the config.

    Args:
        component: Component name (e.g., 'llm', 'memory').
        config: The component's configuration dictionary.

    Returns:
        ValidationResult for that component.
    """
    if self._schema is None:
        return ValidationResult.ok()

    schema = self._schema.get_component_schema(component)
    if schema is None:
        return ValidationResult.warning(
            f"No schema registered for component '{component}'"
        )

    return _validate_against_schema(component, config, schema)

DraftMetadata dataclass

DraftMetadata(
    draft_id: str,
    created_at: str,
    last_updated: str,
    stage: str | None = None,
    complete: bool = False,
    config_name: str | None = None,
)

Metadata for a configuration draft.

Attributes:

Name Type Description
draft_id str

Unique identifier for the draft.

created_at str

ISO 8601 creation timestamp.

last_updated str

ISO 8601 last update timestamp.

stage str | None

Current wizard stage when draft was saved.

complete bool

Whether the draft represents a complete config.

config_name str | None

Optional name for the final config file.

Methods:

Name Description
to_dict

Convert to dictionary representation.

from_dict

Create DraftMetadata from a dictionary.

Functions
to_dict
to_dict() -> dict[str, Any]

Convert to dictionary representation.

Source code in packages/bots/src/dataknobs_bots/config/drafts.py
def to_dict(self) -> dict[str, Any]:
    """Convert to dictionary representation."""
    result: dict[str, Any] = {
        "id": self.draft_id,
        "created_at": self.created_at,
        "last_updated": self.last_updated,
        "complete": self.complete,
    }
    if self.stage is not None:
        result["stage"] = self.stage
    if self.config_name is not None:
        result["config_name"] = self.config_name
    return result
from_dict classmethod
from_dict(data: dict[str, Any]) -> DraftMetadata

Create DraftMetadata from a dictionary.

Parameters:

Name Type Description Default
data dict[str, Any]

Dictionary with metadata fields.

required

Returns:

Type Description
DraftMetadata

A new DraftMetadata instance.

Source code in packages/bots/src/dataknobs_bots/config/drafts.py
@classmethod
def from_dict(cls, data: dict[str, Any]) -> DraftMetadata:
    """Create DraftMetadata from a dictionary.

    Args:
        data: Dictionary with metadata fields.

    Returns:
        A new DraftMetadata instance.
    """
    return cls(
        draft_id=data.get("id", data.get("draft_id", "")),
        created_at=data.get("created_at", ""),
        last_updated=data.get("last_updated", ""),
        stage=data.get("stage"),
        complete=data.get("complete", False),
        config_name=data.get("config_name"),
    )
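Note the key asymmetry above: to_dict writes the identifier under id, while from_dict accepts either id or draft_id. A minimal mirror of that roundtrip (a cut-down stand-in, not the real DraftMetadata):

```python
from dataclasses import dataclass

@dataclass
class Meta:
    """Minimal stand-in for DraftMetadata covering the id/draft_id asymmetry."""
    draft_id: str
    created_at: str = ""
    last_updated: str = ""
    complete: bool = False

    def to_dict(self) -> dict:
        # Serializes the identifier under "id", matching DraftMetadata.to_dict.
        return {"id": self.draft_id, "created_at": self.created_at,
                "last_updated": self.last_updated, "complete": self.complete}

    @classmethod
    def from_dict(cls, data: dict) -> "Meta":
        # Accepts "id" (preferred) or legacy "draft_id".
        return cls(
            draft_id=data.get("id", data.get("draft_id", "")),
            created_at=data.get("created_at", ""),
            last_updated=data.get("last_updated", ""),
            complete=data.get("complete", False),
        )

m = Meta("abc12345", "2024-01-01T00:00:00+00:00", "2024-01-02T00:00:00+00:00")
assert Meta.from_dict(m.to_dict()) == m          # roundtrip is lossless
assert Meta.from_dict({"draft_id": "abc12345"}).draft_id == "abc12345"
```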

DynaBotConfigBuilder

DynaBotConfigBuilder(schema: DynaBotConfigSchema | None = None)

Fluent builder for DynaBot configurations.

Provides setter methods for each DynaBot component that return self for method chaining. Consumer-specific sections are added via set_custom_section().

Two output formats:

- build() returns the flat format compatible with DynaBot.from_config()
- build_portable() returns the environment-aware format with $resource references and a bot wrapper key

Initialize the builder.

Parameters:

Name Type Description Default
schema DynaBotConfigSchema | None

Optional schema for validation. If not provided, a default schema is created.

None

Methods:

Name Description
set_llm

Set the LLM provider configuration (flat/direct format).

set_llm_resource

Set the LLM configuration using a $resource reference.

set_conversation_storage

Set the conversation storage backend (flat/direct format).

set_conversation_storage_resource

Set conversation storage using a $resource reference.

set_conversation_storage_class

Set conversation storage using a custom ConversationStorage class.

set_memory

Set the memory configuration.

set_config_base_path

Set base path for resolving relative config file paths.

set_reasoning

Set the reasoning strategy.

set_reasoning_wizard

Set wizard reasoning with a config path, inline dict, or WizardConfig.

set_system_prompt

Set the system prompt configuration.

set_knowledge_base

Set the knowledge base configuration.

add_tool

Add a tool to the bot configuration.

add_tool_by_name

Add a tool to the config by looking up its catalog entry.

add_tools_by_name

Add multiple tools by name from the catalog.

add_middleware

Add middleware to the bot configuration.

set_custom_section

Set a custom (domain-specific) config section.

from_template

Initialize the builder from a template.

merge_overrides

Merge override values into the current configuration.

validate

Validate the current configuration.

build

Build the flat configuration dict.

build_portable

Build the portable configuration with $resource references.

to_yaml

Serialize the portable configuration as YAML.

reset

Reset the builder to an empty state.

from_config

Create a builder pre-populated from an existing config.

Source code in packages/bots/src/dataknobs_bots/config/builder.py
def __init__(self, schema: DynaBotConfigSchema | None = None) -> None:
    """Initialize the builder.

    Args:
        schema: Optional schema for validation. If not provided, a
            default schema is created.
    """
    self._schema = schema or DynaBotConfigSchema()
    self._config: dict[str, Any] = {}
    self._custom_sections: dict[str, Any] = {}
    self._validator = ConfigValidator(self._schema)
Functions
set_llm
set_llm(provider: str, model: str | None = None, **kwargs: Any) -> Self

Set the LLM provider configuration (flat/direct format).

Parameters:

Name Type Description Default
provider str

LLM provider name (e.g., 'ollama', 'openai').

required
model str | None

Model name or identifier.

None
**kwargs Any

Additional provider-specific settings (temperature, max_tokens, etc.).

{}

Returns:

Type Description
Self

self for method chaining.

Source code in packages/bots/src/dataknobs_bots/config/builder.py
def set_llm(
    self,
    provider: str,
    model: str | None = None,
    **kwargs: Any,
) -> Self:
    """Set the LLM provider configuration (flat/direct format).

    Args:
        provider: LLM provider name (e.g., 'ollama', 'openai').
        model: Model name or identifier.
        **kwargs: Additional provider-specific settings
            (temperature, max_tokens, etc.).

    Returns:
        self for method chaining.
    """
    llm_config: dict[str, Any] = {"provider": provider}
    if model is not None:
        llm_config["model"] = model
    llm_config.update(kwargs)
    self._config["llm"] = llm_config
    return self
set_llm_resource
set_llm_resource(
    resource_name: str = "default",
    resource_type: str = "llm_providers",
    **overrides: Any,
) -> Self

Set the LLM configuration using a $resource reference.

Parameters:

Name Type Description Default
resource_name str

Resource name to resolve at runtime.

'default'
resource_type str

Resource type category.

'llm_providers'
**overrides Any

Override values applied after resolution.

{}

Returns:

Type Description
Self

self for method chaining.

Source code in packages/bots/src/dataknobs_bots/config/builder.py
def set_llm_resource(
    self,
    resource_name: str = "default",
    resource_type: str = "llm_providers",
    **overrides: Any,
) -> Self:
    """Set the LLM configuration using a $resource reference.

    Args:
        resource_name: Resource name to resolve at runtime.
        resource_type: Resource type category.
        **overrides: Override values applied after resolution.

    Returns:
        self for method chaining.
    """
    llm_config: dict[str, Any] = {
        "$resource": resource_name,
        "type": resource_type,
    }
    llm_config.update(overrides)
    self._config["llm"] = llm_config
    return self
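The resulting section carries a `$resource` marker that is resolved at runtime, with overrides layered on top after resolution. A sketch of the dict shape produced by the source above:

```python
# Shape produced by set_llm_resource("default", temperature=0.0);
# the override value here is illustrative.
llm_config = {"$resource": "default", "type": "llm_providers"}
llm_config.update({"temperature": 0.0})  # overrides applied after resolution
assert llm_config == {
    "$resource": "default",
    "type": "llm_providers",
    "temperature": 0.0,
}
```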
set_conversation_storage
set_conversation_storage(backend: str, **kwargs: Any) -> Self

Set the conversation storage backend (flat/direct format).

Parameters:

Name Type Description Default
backend str

Storage backend name (e.g., 'memory', 'sqlite').

required
**kwargs Any

Additional backend-specific settings.

{}

Returns:

Type Description
Self

self for method chaining.

Source code in packages/bots/src/dataknobs_bots/config/builder.py
def set_conversation_storage(
    self,
    backend: str,
    **kwargs: Any,
) -> Self:
    """Set the conversation storage backend (flat/direct format).

    Args:
        backend: Storage backend name (e.g., 'memory', 'sqlite').
        **kwargs: Additional backend-specific settings.

    Returns:
        self for method chaining.
    """
    storage_config: dict[str, Any] = {"backend": backend}
    storage_config.update(kwargs)
    self._config["conversation_storage"] = storage_config
    return self
set_conversation_storage_resource
set_conversation_storage_resource(
    resource_name: str = "conversations",
    resource_type: str = "databases",
    **overrides: Any,
) -> Self

Set conversation storage using a $resource reference.

Parameters:

Name Type Description Default
resource_name str

Resource name to resolve at runtime.

'conversations'
resource_type str

Resource type category.

'databases'
**overrides Any

Override values applied after resolution.

{}

Returns:

Type Description
Self

self for method chaining.

Source code in packages/bots/src/dataknobs_bots/config/builder.py
def set_conversation_storage_resource(
    self,
    resource_name: str = "conversations",
    resource_type: str = "databases",
    **overrides: Any,
) -> Self:
    """Set conversation storage using a $resource reference.

    Args:
        resource_name: Resource name to resolve at runtime.
        resource_type: Resource type category.
        **overrides: Override values applied after resolution.

    Returns:
        self for method chaining.
    """
    storage_config: dict[str, Any] = {
        "$resource": resource_name,
        "type": resource_type,
    }
    storage_config.update(overrides)
    self._config["conversation_storage"] = storage_config
    return self
set_conversation_storage_class
set_conversation_storage_class(storage_class: str, **kwargs: Any) -> Self

Set conversation storage using a custom ConversationStorage class.

The class must implement ConversationStorage and provide an async create(config: dict) -> ConversationStorage classmethod.

Parameters:

Name Type Description Default
storage_class str

Dotted import path to the storage class (e.g., "myapp.storage:AcmeConversationStorage").

required
**kwargs Any

Additional config passed to create().

{}

Returns:

Type Description
Self

self for method chaining.

Source code in packages/bots/src/dataknobs_bots/config/builder.py
def set_conversation_storage_class(
    self,
    storage_class: str,
    **kwargs: Any,
) -> Self:
    """Set conversation storage using a custom ConversationStorage class.

    The class must implement ``ConversationStorage`` and provide an async
    ``create(config: dict) -> ConversationStorage`` classmethod.

    Args:
        storage_class: Dotted import path to the storage class
            (e.g., ``"myapp.storage:AcmeConversationStorage"``).
        **kwargs: Additional config passed to ``create()``.

    Returns:
        self for method chaining.
    """
    storage_config: dict[str, Any] = {"storage_class": storage_class}
    storage_config.update(kwargs)
    self._config["conversation_storage"] = storage_config
    return self
set_memory
set_memory(memory_type: str, **kwargs: Any) -> Self

Set the memory configuration.

Parameters:

Name Type Description Default
memory_type str

Memory type (e.g., 'buffer', 'vector').

required
**kwargs Any

Additional memory settings (max_messages, etc.).

{}

Returns:

Type Description
Self

self for method chaining.

Source code in packages/bots/src/dataknobs_bots/config/builder.py
def set_memory(self, memory_type: str, **kwargs: Any) -> Self:
    """Set the memory configuration.

    Args:
        memory_type: Memory type (e.g., 'buffer', 'vector').
        **kwargs: Additional memory settings (max_messages, etc.).

    Returns:
        self for method chaining.
    """
    memory_config: dict[str, Any] = {"type": memory_type}
    memory_config.update(kwargs)
    self._config["memory"] = memory_config
    return self
set_config_base_path
set_config_base_path(path: str | Path) -> Self

Set base path for resolving relative config file paths.

When set, relative paths in nested configs (e.g. wizard_config) are resolved against this directory instead of the current working directory.

Parameters:

Name Type Description Default
path str | Path

Base directory path (string or Path object).

required

Returns:

Type Description
Self

self for method chaining.

Source code in packages/bots/src/dataknobs_bots/config/builder.py
def set_config_base_path(self, path: str | Path) -> Self:
    """Set base path for resolving relative config file paths.

    When set, relative paths in nested configs (e.g. ``wizard_config``)
    are resolved against this directory instead of the current working
    directory.

    Args:
        path: Base directory path (string or Path object).

    Returns:
        self for method chaining.
    """
    self._config["config_base_path"] = str(path)
    return self
set_reasoning
set_reasoning(strategy: str, **kwargs: Any) -> Self

Set the reasoning strategy.

Parameters:

Name Type Description Default
strategy str

Reasoning strategy (e.g., 'simple', 'react', 'wizard').

required
**kwargs Any

Additional strategy settings.

{}

Returns:

Type Description
Self

self for method chaining.

Source code in packages/bots/src/dataknobs_bots/config/builder.py
def set_reasoning(self, strategy: str, **kwargs: Any) -> Self:
    """Set the reasoning strategy.

    Args:
        strategy: Reasoning strategy (e.g., 'simple', 'react', 'wizard').
        **kwargs: Additional strategy settings.

    Returns:
        self for method chaining.
    """
    reasoning_config: dict[str, Any] = {"strategy": strategy}
    reasoning_config.update(kwargs)
    self._config["reasoning"] = reasoning_config
    return self
set_reasoning_wizard
set_reasoning_wizard(
    wizard_config: str | dict[str, Any] | WizardConfig, **kwargs: Any
) -> Self

Set wizard reasoning with a config path, inline dict, or WizardConfig.

When wizard_config is a WizardConfig object, the caller is responsible for writing it to disk via wizard_config.to_file() before the bot loads.

When wizard_config is a dict, it is stored inline in the reasoning config and loaded via WizardConfigLoader.load_from_dict() at bot startup.

Parameters:

Name Type Description Default
wizard_config str | dict[str, Any] | WizardConfig

Path to a wizard YAML file, an inline dict (compatible with WizardConfigLoader.load_from_dict()), or a WizardConfig object (whose name is used as the config path identifier).

required
**kwargs Any

Additional reasoning settings (extraction_config, etc.).

{}

Returns:

Type Description
Self

self for method chaining.

Source code in packages/bots/src/dataknobs_bots/config/builder.py
def set_reasoning_wizard(
    self,
    wizard_config: str | dict[str, Any] | WizardConfig,
    **kwargs: Any,
) -> Self:
    """Set wizard reasoning with a config path, inline dict, or WizardConfig.

    When ``wizard_config`` is a ``WizardConfig`` object, the caller
    is responsible for writing it to disk via ``wizard_config.to_file()``
    before the bot loads.

    When ``wizard_config`` is a ``dict``, it is stored inline in the
    reasoning config and loaded via
    ``WizardConfigLoader.load_from_dict()`` at bot startup.

    Args:
        wizard_config: Path to a wizard YAML file, an inline dict
            (compatible with ``WizardConfigLoader.load_from_dict()``),
            or a ``WizardConfig`` object (whose ``name`` is used as
            the config path identifier).
        **kwargs: Additional reasoning settings
            (extraction_config, etc.).

    Returns:
        self for method chaining.
    """
    if isinstance(wizard_config, dict):
        return self.set_reasoning(
            "wizard", wizard_config=wizard_config, **kwargs
        )
    config_path = (
        wizard_config
        if isinstance(wizard_config, str)
        else wizard_config.name
    )
    return self.set_reasoning(
        "wizard", wizard_config=config_path, **kwargs
    )
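The dispatch above can be summarized: dicts are stored inline, strings are treated as file paths, and `WizardConfig` objects contribute their `name`. A stand-in sketch of that routing (the wizard path and inline dict contents are illustrative):

```python
from typing import Any


def wizard_reasoning_section(wizard_config: Any) -> dict[str, Any]:
    # Mirrors set_reasoning_wizard: inline dicts pass through as-is;
    # anything else is reduced to a path-like identifier.
    if isinstance(wizard_config, dict):
        return {"strategy": "wizard", "wizard_config": wizard_config}
    path = wizard_config if isinstance(wizard_config, str) else wizard_config.name
    return {"strategy": "wizard", "wizard_config": path}


assert wizard_reasoning_section("wizards/intake.yaml") == {
    "strategy": "wizard",
    "wizard_config": "wizards/intake.yaml",
}
inline = {"states": [{"name": "start"}]}
assert wizard_reasoning_section(inline)["wizard_config"] is inline
```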
set_system_prompt
set_system_prompt(
    content: str | None = None,
    name: str | None = None,
    rag_configs: list[dict[str, Any]] | None = None,
) -> Self

Set the system prompt configuration.

Provide either content (inline prompt) or name (template reference). Optionally add RAG configurations for prompt enhancement.

Parameters:

Name Type Description Default
content str | None

Inline prompt content.

None
name str | None

Prompt template name.

None
rag_configs list[dict[str, Any]] | None

RAG configurations for prompt enhancement.

None

Returns:

Type Description
Self

self for method chaining.

Source code in packages/bots/src/dataknobs_bots/config/builder.py
def set_system_prompt(
    self,
    content: str | None = None,
    name: str | None = None,
    rag_configs: list[dict[str, Any]] | None = None,
) -> Self:
    """Set the system prompt configuration.

    Provide either ``content`` (inline prompt) or ``name`` (template
    reference). Optionally add RAG configurations for prompt enhancement.

    Args:
        content: Inline prompt content.
        name: Prompt template name.
        rag_configs: RAG configurations for prompt enhancement.

    Returns:
        self for method chaining.
    """
    if content is not None and name is None and rag_configs is None:
        self._config["system_prompt"] = content
    else:
        prompt_config: dict[str, Any] = {}
        if content is not None:
            prompt_config["content"] = content
        if name is not None:
            prompt_config["name"] = name
        if rag_configs is not None:
            prompt_config["rag_configs"] = rag_configs
        self._config["system_prompt"] = prompt_config
    return self
set_knowledge_base
set_knowledge_base(**kwargs: Any) -> Self

Set the knowledge base configuration.

Parameters:

Name Type Description Default
**kwargs Any

Knowledge base settings (enabled, type, vector_store, embedding, etc.).

{}

Returns:

Type Description
Self

self for method chaining.

Source code in packages/bots/src/dataknobs_bots/config/builder.py
def set_knowledge_base(self, **kwargs: Any) -> Self:
    """Set the knowledge base configuration.

    Args:
        **kwargs: Knowledge base settings (enabled, type,
            vector_store, embedding, etc.).

    Returns:
        self for method chaining.
    """
    self._config["knowledge_base"] = dict(kwargs)
    return self
add_tool
add_tool(tool_class: str, **params: Any) -> Self

Add a tool to the bot configuration.

Parameters:

Name Type Description Default
tool_class str

Fully qualified tool class name.

required
**params Any

Tool constructor parameters.

{}

Returns:

Type Description
Self

self for method chaining.

Source code in packages/bots/src/dataknobs_bots/config/builder.py
def add_tool(self, tool_class: str, **params: Any) -> Self:
    """Add a tool to the bot configuration.

    Args:
        tool_class: Fully qualified tool class name.
        **params: Tool constructor parameters.

    Returns:
        self for method chaining.
    """
    tools = self._config.setdefault("tools", [])
    tool_entry: dict[str, Any] = {"class": tool_class}
    if params:
        tool_entry["params"] = dict(params)
    tools.append(tool_entry)
    return self
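Each tool entry is a dict with a `class` key, and a `params` key is added only when parameters are supplied. A sketch of the entry shape (the tool class path here is hypothetical, not a class shipped by the package):

```python
from typing import Any

# Entry shape appended to config["tools"] by add_tool; the class
# path below is a made-up example.
tool_entry: dict[str, Any] = {"class": "myapp.tools.CalculatorTool"}
params = {"precision": 4}
if params:
    tool_entry["params"] = dict(params)

assert tool_entry == {
    "class": "myapp.tools.CalculatorTool",
    "params": {"precision": 4},
}
```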
add_tool_by_name
add_tool_by_name(
    catalog: ToolCatalog, name: str, **param_overrides: Any
) -> Self

Add a tool to the config by looking up its catalog entry.

Resolves the tool name to a class path via the catalog and adds it with default params (overridable).

Parameters:

Name Type Description Default
catalog ToolCatalog

Tool catalog for name resolution.

required
name str

Tool name to look up.

required
**param_overrides Any

Override default params.

{}

Returns:

Type Description
Self

self for method chaining.

Raises:

Type Description
NotFoundError

If tool name is not in the catalog.

Source code in packages/bots/src/dataknobs_bots/config/builder.py
def add_tool_by_name(
    self,
    catalog: ToolCatalog,
    name: str,
    **param_overrides: Any,
) -> Self:
    """Add a tool to the config by looking up its catalog entry.

    Resolves the tool name to a class path via the catalog and adds
    it with default params (overridable).

    Args:
        catalog: Tool catalog for name resolution.
        name: Tool name to look up.
        **param_overrides: Override default params.

    Returns:
        self for method chaining.

    Raises:
        NotFoundError: If tool name is not in the catalog.
    """
    config = catalog.to_bot_config(name, **param_overrides)
    tools = self._config.setdefault("tools", [])
    tools.append(config)
    return self
add_tools_by_name
add_tools_by_name(
    catalog: ToolCatalog,
    names: Sequence[str],
    overrides: dict[str, dict[str, Any]] | None = None,
) -> Self

Add multiple tools by name from the catalog.

Parameters:

Name Type Description Default
catalog ToolCatalog

Tool catalog for name resolution.

required
names Sequence[str]

Tool names to add.

required
overrides dict[str, dict[str, Any]] | None

Per-tool param overrides keyed by tool name.

None

Returns:

Type Description
Self

self for method chaining.

Source code in packages/bots/src/dataknobs_bots/config/builder.py
def add_tools_by_name(
    self,
    catalog: ToolCatalog,
    names: Sequence[str],
    overrides: dict[str, dict[str, Any]] | None = None,
) -> Self:
    """Add multiple tools by name from the catalog.

    Args:
        catalog: Tool catalog for name resolution.
        names: Tool names to add.
        overrides: Per-tool param overrides keyed by tool name.

    Returns:
        self for method chaining.
    """
    configs = catalog.to_bot_configs(names, overrides)
    tools = self._config.setdefault("tools", [])
    tools.extend(configs)
    return self
add_middleware
add_middleware(middleware_class: str, **params: Any) -> Self

Add middleware to the bot configuration.

Parameters:

Name Type Description Default
middleware_class str

Fully qualified middleware class name.

required
**params Any

Middleware constructor parameters.

{}

Returns:

Type Description
Self

self for method chaining.

Source code in packages/bots/src/dataknobs_bots/config/builder.py
def add_middleware(self, middleware_class: str, **params: Any) -> Self:
    """Add middleware to the bot configuration.

    Args:
        middleware_class: Fully qualified middleware class name.
        **params: Middleware constructor parameters.

    Returns:
        self for method chaining.
    """
    middleware = self._config.setdefault("middleware", [])
    mw_entry: dict[str, Any] = {"class": middleware_class}
    if params:
        mw_entry["params"] = dict(params)
    middleware.append(mw_entry)
    return self
set_custom_section
set_custom_section(key: str, value: Any) -> Self

Set a custom (domain-specific) config section.

This is the extension point for consumers to add sections like educational, customer_service, domain, etc.

Parameters:

Name Type Description Default
key str

Section key name.

required
value Any

Section value (dict, list, or scalar).

required

Returns:

Type Description
Self

self for method chaining.

Source code in packages/bots/src/dataknobs_bots/config/builder.py
def set_custom_section(self, key: str, value: Any) -> Self:
    """Set a custom (domain-specific) config section.

    This is the extension point for consumers to add sections like
    ``educational``, ``customer_service``, ``domain``, etc.

    Args:
        key: Section key name.
        value: Section value (dict, list, or scalar).

    Returns:
        self for method chaining.
    """
    self._custom_sections[key] = value
    return self
from_template
from_template(template: ConfigTemplate, variables: dict[str, Any]) -> Self

Initialize the builder from a template.

Deep-copies the template structure, substitutes variables, and uses the result as the builder's base configuration.

Parameters:

Name Type Description Default
template ConfigTemplate

The template to apply.

required
variables dict[str, Any]

Variable values for substitution.

required

Returns:

Type Description
Self

self for method chaining.

Source code in packages/bots/src/dataknobs_bots/config/builder.py
def from_template(
    self,
    template: ConfigTemplate,
    variables: dict[str, Any],
) -> Self:
    """Initialize the builder from a template.

    Deep-copies the template structure, substitutes variables, and
    uses the result as the builder's base configuration.

    Args:
        template: The template to apply.
        variables: Variable values for substitution.

    Returns:
        self for method chaining.
    """
    from .templates import _build_variable_map

    from dataknobs_config.template_vars import substitute_template_vars

    var_map = _build_variable_map(template, variables)
    structure = copy.deepcopy(template.structure)
    resolved: dict[str, Any] = substitute_template_vars(
        structure, var_map, preserve_missing=True
    )

    # If structure has a 'bot' key, use its contents as the config
    if "bot" in resolved:
        self._config = dict(resolved.pop("bot"))
        # Remaining top-level keys become custom sections
        for key, value in resolved.items():
            self._custom_sections[key] = value
    else:
        self._config = resolved

    return self
merge_overrides
merge_overrides(overrides: dict[str, Any]) -> Self

Merge override values into the current configuration.

Performs recursive dict merge for nested dictionaries.

Parameters:

Name Type Description Default
overrides dict[str, Any]

Override values to merge.

required

Returns:

Type Description
Self

self for method chaining.

Source code in packages/bots/src/dataknobs_bots/config/builder.py
def merge_overrides(self, overrides: dict[str, Any]) -> Self:
    """Merge override values into the current configuration.

    Performs recursive dict merge for nested dictionaries.

    Args:
        overrides: Override values to merge.

    Returns:
        self for method chaining.
    """
    self._config = _deep_merge(self._config, overrides)
    return self
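The exact implementation of `_deep_merge` is not shown on this page, but the documented behavior (recursive merge for nested dicts) conventionally looks like the sketch below, assuming non-dict override values simply replace the base value:

```python
from typing import Any


def deep_merge(base: dict[str, Any], overrides: dict[str, Any]) -> dict[str, Any]:
    # A typical recursive merge: nested dicts merge key-by-key;
    # everything else is replaced by the override value.
    merged = dict(base)
    for key, value in overrides.items():
        if isinstance(merged.get(key), dict) and isinstance(value, dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


base = {"llm": {"provider": "ollama", "model": "llama3"}, "memory": {"type": "buffer"}}
result = deep_merge(base, {"llm": {"model": "llama3.1"}})
assert result["llm"] == {"provider": "ollama", "model": "llama3.1"}
assert result["memory"] == {"type": "buffer"}
```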
validate
validate() -> ValidationResult

Validate the current configuration.

Returns:

Type Description
ValidationResult

ValidationResult with any errors and warnings.

Source code in packages/bots/src/dataknobs_bots/config/builder.py
def validate(self) -> ValidationResult:
    """Validate the current configuration.

    Returns:
        ValidationResult with any errors and warnings.
    """
    config = self._build_internal()
    return self._validator.validate(config)
build
build() -> dict[str, Any]

Build the flat configuration dict.

The returned dict is compatible with DynaBot.from_config(). Validates before returning and raises ValueError if there are errors.

Returns:

Type Description
dict[str, Any]

Flat configuration dictionary.

Raises:

Type Description
ValueError

If the configuration has validation errors.

Source code in packages/bots/src/dataknobs_bots/config/builder.py
def build(self) -> dict[str, Any]:
    """Build the flat configuration dict.

    The returned dict is compatible with ``DynaBot.from_config()``.
    Validates before returning and raises ValueError if there are errors.

    Returns:
        Flat configuration dictionary.

    Raises:
        ValueError: If the configuration has validation errors.
    """
    config = self._build_internal()
    result = self._validator.validate(config)
    if not result.valid:
        raise ValueError(
            "Configuration validation failed:\n"
            + "\n".join(f"  - {e}" for e in result.errors)
        )
    for warning in result.warnings:
        logger.warning("Config warning: %s", warning)
    return config
build_portable
build_portable() -> dict[str, Any]

Build the portable configuration with $resource references.

Wraps the config under a bot key and includes any custom sections as top-level siblings.

Returns:

Type Description
dict[str, Any]

Portable configuration dict with bot wrapper.

Raises:

Type Description
ValueError

If the configuration has validation errors.

Source code in packages/bots/src/dataknobs_bots/config/builder.py
def build_portable(self) -> dict[str, Any]:
    """Build the portable configuration with $resource references.

    Wraps the config under a ``bot`` key and includes any custom
    sections as top-level siblings.

    Returns:
        Portable configuration dict with ``bot`` wrapper.

    Raises:
        ValueError: If the configuration has validation errors.
    """
    config = self._build_internal()
    result = self._validator.validate(config)
    if not result.valid:
        raise ValueError(
            "Configuration validation failed:\n"
            + "\n".join(f"  - {e}" for e in result.errors)
        )
    for warning in result.warnings:
        logger.warning("Config warning: %s", warning)

    # Separate core bot config from custom sections
    bot_config: dict[str, Any] = {}
    custom: dict[str, Any] = {}
    for key, value in config.items():
        if key in self._custom_sections:
            custom[key] = value
        else:
            bot_config[key] = value

    portable: dict[str, Any] = {"bot": bot_config}
    portable.update(custom)
    return portable
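The separation step above can be sketched in isolation: keys that were registered as custom sections are lifted out to sit beside the `bot` wrapper (the section names below are illustrative):

```python
from typing import Any


def to_portable(config: dict[str, Any], custom_keys: set[str]) -> dict[str, Any]:
    # Mirrors the split in build_portable: custom sections become
    # top-level siblings of the "bot" wrapper.
    bot_config = {k: v for k, v in config.items() if k not in custom_keys}
    custom = {k: v for k, v in config.items() if k in custom_keys}
    return {"bot": bot_config, **custom}


merged = {"llm": {"provider": "ollama"}, "educational": {"grade": 5}}
portable = to_portable(merged, {"educational"})
assert portable == {
    "bot": {"llm": {"provider": "ollama"}},
    "educational": {"grade": 5},
}
```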
to_yaml
to_yaml() -> str

Serialize the portable configuration as YAML.

Returns:

Type Description
str

YAML string representation.

Source code in packages/bots/src/dataknobs_bots/config/builder.py
def to_yaml(self) -> str:
    """Serialize the portable configuration as YAML.

    Returns:
        YAML string representation.
    """
    portable = self.build_portable()
    return yaml.dump(portable, default_flow_style=False, sort_keys=False)
reset
reset() -> Self

Reset the builder to an empty state.

Returns:

Type Description
Self

self for method chaining.

Source code in packages/bots/src/dataknobs_bots/config/builder.py
def reset(self) -> Self:
    """Reset the builder to an empty state.

    Returns:
        self for method chaining.
    """
    self._config = {}
    self._custom_sections = {}
    return self
from_config classmethod
from_config(config: dict[str, Any]) -> DynaBotConfigBuilder

Create a builder pre-populated from an existing config.

Supports both flat format and portable format (with bot wrapper).

Parameters:

Name Type Description Default
config dict[str, Any]

Existing configuration dictionary.

required

Returns:

Type Description
DynaBotConfigBuilder

A new builder instance with the config loaded.

Source code in packages/bots/src/dataknobs_bots/config/builder.py
@classmethod
def from_config(cls, config: dict[str, Any]) -> DynaBotConfigBuilder:
    """Create a builder pre-populated from an existing config.

    Supports both flat format and portable format (with ``bot`` wrapper).

    Args:
        config: Existing configuration dictionary.

    Returns:
        A new builder instance with the config loaded.
    """
    builder = cls()
    if "bot" in config:
        bot = dict(config["bot"])
        builder._config = bot
        for key, value in config.items():
            if key != "bot":
                builder._custom_sections[key] = value
    else:
        builder._config = dict(config)
    return builder
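This is the inverse of `build_portable()`: a `bot` wrapper is unwrapped into the core config, and any remaining top-level keys become custom sections. A sketch of the unwrapping (section names are illustrative):

```python
from typing import Any


def split_portable(
    config: dict[str, Any],
) -> tuple[dict[str, Any], dict[str, Any]]:
    # Mirrors from_config: portable format is unwrapped;
    # flat format passes through with no custom sections.
    if "bot" in config:
        bot = dict(config["bot"])
        custom = {k: v for k, v in config.items() if k != "bot"}
        return bot, custom
    return dict(config), {}


bot, custom = split_portable(
    {"bot": {"llm": {"provider": "openai"}}, "domain": {"name": "sales"}}
)
assert bot == {"llm": {"provider": "openai"}}
assert custom == {"domain": {"name": "sales"}}
```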

DynaBotConfigSchema

DynaBotConfigSchema()

Queryable registry of valid DynaBot configuration options.

Auto-registers the 8 default DynaBot components on initialization. Consumers can register additional extensions for domain-specific sections.

Methods:

Name Description
register_component

Register a core DynaBot component schema.

register_extension

Register a consumer-specific config extension.

get_component_schema

Get the JSON Schema for a component.

get_extension_schema

Get the JSON Schema for an extension.

get_valid_options

Get valid options for a field within a component or extension.

validate

Validate a config against all registered schemas.

get_full_schema

Get the combined schema for all components and extensions.

to_description

Generate a human-readable description for LLM system prompts.

Source code in packages/bots/src/dataknobs_bots/config/schema.py
def __init__(self) -> None:
    self._components: dict[str, ComponentSchema] = {}
    self._extensions: dict[str, ComponentSchema] = {}
    self._register_defaults()
Functions
register_component
register_component(
    name: str,
    schema: dict[str, Any],
    description: str = "",
    required: bool = False,
) -> None

Register a core DynaBot component schema.

Parameters:

Name Type Description Default
name str

Component name.

required
schema dict[str, Any]

JSON Schema-like definition.

required
description str

Human-readable description.

''
required bool

Whether this component is required.

False
Source code in packages/bots/src/dataknobs_bots/config/schema.py
def register_component(
    self,
    name: str,
    schema: dict[str, Any],
    description: str = "",
    required: bool = False,
) -> None:
    """Register a core DynaBot component schema.

    Args:
        name: Component name.
        schema: JSON Schema-like definition.
        description: Human-readable description.
        required: Whether this component is required.
    """
    self._components[name] = ComponentSchema(
        name=name,
        description=description,
        schema=schema,
        required=required,
    )
    logger.debug("Registered component schema: %s", name)
register_extension
register_extension(
    name: str, schema: dict[str, Any], description: str = ""
) -> None

Register a consumer-specific config extension.

Extensions are domain-specific sections (e.g., 'educational', 'customer_service') that aren't part of the core DynaBot schema.

Parameters:

Name Type Description Default
name str

Extension name.

required
schema dict[str, Any]

JSON Schema-like definition.

required
description str

Human-readable description.

''
Source code in packages/bots/src/dataknobs_bots/config/schema.py
def register_extension(
    self,
    name: str,
    schema: dict[str, Any],
    description: str = "",
) -> None:
    """Register a consumer-specific config extension.

    Extensions are domain-specific sections (e.g., 'educational',
    'customer_service') that aren't part of the core DynaBot schema.

    Args:
        name: Extension name.
        schema: JSON Schema-like definition.
        description: Human-readable description.
    """
    self._extensions[name] = ComponentSchema(
        name=name,
        description=description or f"Extension: {name}",
        schema=schema,
    )
    logger.debug("Registered extension schema: %s", name)
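An extension schema is a JSON-Schema-like dict describing the section's fields. The sketch below shows a plausible shape for an `educational` extension; the field names and enum values are hypothetical, not defined by the package:

```python
from typing import Any

# Hypothetical JSON-Schema-like definition for a domain section.
educational_schema: dict[str, Any] = {
    "type": "object",
    "properties": {
        "grade_level": {"type": "integer", "enum": [1, 2, 3, 4, 5]},
        "subject": {"type": "string"},
    },
}

# With a DynaBotConfigSchema instance in hand, registration would look like:
#   schema.register_extension(
#       "educational", educational_schema,
#       description="Settings for tutoring bots.",
#   )

assert "grade_level" in educational_schema["properties"]
```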
get_component_schema
get_component_schema(name: str) -> dict[str, Any] | None

Get the JSON Schema for a component.

Parameters:

Name Type Description Default
name str

Component name.

required

Returns:

Type Description
dict[str, Any] | None

JSON Schema dict, or None if not registered.

Source code in packages/bots/src/dataknobs_bots/config/schema.py
def get_component_schema(self, name: str) -> dict[str, Any] | None:
    """Get the JSON Schema for a component.

    Args:
        name: Component name.

    Returns:
        JSON Schema dict, or None if not registered.
    """
    component = self._components.get(name)
    if component is not None:
        return component.schema
    return None
get_extension_schema
get_extension_schema(name: str) -> dict[str, Any] | None

Get the JSON Schema for an extension.

Parameters:

Name Type Description Default
name str

Extension name.

required

Returns:

Type Description
dict[str, Any] | None

JSON Schema dict, or None if not registered.

Source code in packages/bots/src/dataknobs_bots/config/schema.py
def get_extension_schema(self, name: str) -> dict[str, Any] | None:
    """Get the JSON Schema for an extension.

    Args:
        name: Extension name.

    Returns:
        JSON Schema dict, or None if not registered.
    """
    ext = self._extensions.get(name)
    if ext is not None:
        return ext.schema
    return None
get_valid_options
get_valid_options(component: str, field_name: str) -> list[str]

Get valid options for a field within a component or extension.

Parameters:

Name Type Description Default
component str

Component or extension name.

required
field_name str

Field name to query.

required

Returns:

Type Description
list[str]

List of valid option strings.

Source code in packages/bots/src/dataknobs_bots/config/schema.py
def get_valid_options(self, component: str, field_name: str) -> list[str]:
    """Get valid options for a field within a component or extension.

    Args:
        component: Component or extension name.
        field_name: Field name to query.

    Returns:
        List of valid option strings.
    """
    comp = self._components.get(component) or self._extensions.get(component)
    if comp is not None:
        return comp.get_valid_options(field_name)
    return []
validate
validate(config: dict[str, Any]) -> ValidationResult

Validate a config against all registered schemas.

Parameters:

Name Type Description Default
config dict[str, Any]

Full DynaBot configuration dict.

required

Returns:

Type Description
ValidationResult

ValidationResult with all schema violations.

Source code in packages/bots/src/dataknobs_bots/config/schema.py
def validate(self, config: dict[str, Any]) -> ValidationResult:
    """Validate a config against all registered schemas.

    Args:
        config: Full DynaBot configuration dict.

    Returns:
        ValidationResult with all schema violations.
    """
    result = ValidationResult.ok()
    bot = config.get("bot", config)

    for name, comp in self._components.items():
        if comp.required and name not in bot:
            result = result.merge(
                ValidationResult.error(f"Missing required component: {name}")
            )
        if name in bot and isinstance(bot[name], dict):
            result = result.merge(
                _validate_against_schema(name, bot[name], comp.schema)
            )

    for name, ext in self._extensions.items():
        if name in bot and isinstance(bot[name], dict):
            result = result.merge(
                _validate_against_schema(name, bot[name], ext.schema)
            )

    return result
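The flow above — a required-component check followed by per-section schema validation — can be sketched with plain dicts. This is an illustrative stand-in, not the library's implementation: the real `_validate_against_schema` helper is not shown here, so the hypothetical `_check_section` below only enforces `enum` constraints.

```python
# Illustrative stand-in mirroring the documented validate() flow; the
# real _validate_against_schema helper is not shown here, so
# _check_section below only enforces enum constraints.
def _check_section(name, section, schema):
    errors = []
    for field, spec in schema.get("properties", {}).items():
        if "enum" in spec and field in section and section[field] not in spec["enum"]:
            errors.append(f"{name}.{field}: invalid value {section[field]!r}")
    return errors

def validate_sketch(config, components):
    """components maps name -> {"required": bool, "schema": {...}}."""
    bot = config.get("bot", config)  # accept wrapped or bare configs
    errors = []
    for name, comp in components.items():
        if comp["required"] and name not in bot:
            errors.append(f"Missing required component: {name}")
        if isinstance(bot.get(name), dict):
            errors.extend(_check_section(name, bot[name], comp["schema"]))
    return errors

components = {
    "llm": {
        "required": True,
        "schema": {"properties": {"provider": {"enum": ["openai", "anthropic"]}}},
    },
}
print(validate_sketch({"bot": {"llm": {"provider": "openai"}}}, components))  # []
print(validate_sketch({"bot": {}}, components))  # ['Missing required component: llm']
```

Note that `config.get("bot", config)` lets the validator accept both a wrapped config (`{"bot": {...}}`) and a bare component mapping.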
get_full_schema
get_full_schema() -> dict[str, Any]

Get the combined schema for all components and extensions.

Returns:

Type Description
dict[str, Any]

Dict mapping component/extension names to their schemas.

Source code in packages/bots/src/dataknobs_bots/config/schema.py
def get_full_schema(self) -> dict[str, Any]:
    """Get the combined schema for all components and extensions.

    Returns:
        Dict mapping component/extension names to their schemas.
    """
    result: dict[str, Any] = {}
    for name, comp in self._components.items():
        result[name] = {
            "description": comp.description,
            "required": comp.required,
            "schema": comp.schema,
        }
    for name, ext in self._extensions.items():
        result[name] = {
            "description": ext.description,
            "required": False,
            "extension": True,
            "schema": ext.schema,
        }
    return result
to_description
to_description() -> str

Generate a human-readable description for LLM system prompts.

Returns:

Type Description
str

Structured text describing all available configuration options.

Source code in packages/bots/src/dataknobs_bots/config/schema.py
def to_description(self) -> str:
    """Generate a human-readable description for LLM system prompts.

    Returns:
        Structured text describing all available configuration options.
    """
    lines: list[str] = ["# DynaBot Configuration Options", ""]

    lines.append("## Core Components")
    lines.append("")
    for name, comp in self._components.items():
        req = " (required)" if comp.required else " (optional)"
        lines.append(f"### {name}{req}")
        if comp.description:
            lines.append(comp.description)
        props = comp.schema.get("properties", {})
        if props:
            lines.append("")
            for field_name, field_schema in props.items():
                desc = field_schema.get("description", "")
                enum_values = field_schema.get("enum")
                line = f"- **{field_name}**"
                if desc:
                    line += f": {desc}"
                if enum_values:
                    line += f" (options: {', '.join(str(v) for v in enum_values)})"
                lines.append(line)
        lines.append("")

    if self._extensions:
        lines.append("## Extensions")
        lines.append("")
        for name, ext in self._extensions.items():
            lines.append(f"### {name}")
            if ext.description:
                lines.append(ext.description)
            props = ext.schema.get("properties", {})
            if props:
                lines.append("")
                for field_name, field_schema in props.items():
                    desc = field_schema.get("description", "")
                    line = f"- **{field_name}**"
                    if desc:
                        line += f": {desc}"
                    lines.append(line)
            lines.append("")

    return "\n".join(lines)

TemplateVariable dataclass

TemplateVariable(
    name: str,
    description: str = "",
    type: str = "string",
    required: bool = False,
    default: Any = None,
    choices: list[Any] | None = None,
    validation: dict[str, Any] | None = None,
)

Definition of a template variable.

Attributes:

Name Type Description
name str

Variable name used in {{name}} placeholders.

description str

Human-readable description.

type str

Variable type (string, integer, boolean, enum, array).

required bool

Whether the variable must be provided.

default Any

Default value if not provided.

choices list[Any] | None

Valid values for enum-type variables.

validation dict[str, Any] | None

JSON Schema constraints for the value.

Methods:

Name Description
to_dict

Convert to dictionary representation.

from_dict

Create a TemplateVariable from a dictionary.

Functions
to_dict
to_dict() -> dict[str, Any]

Convert to dictionary representation.

Source code in packages/bots/src/dataknobs_bots/config/templates.py
def to_dict(self) -> dict[str, Any]:
    """Convert to dictionary representation."""
    result: dict[str, Any] = {
        "name": self.name,
        "type": self.type,
        "required": self.required,
    }
    if self.description:
        result["description"] = self.description
    if self.default is not None:
        result["default"] = self.default
    if self.choices is not None:
        result["choices"] = self.choices
    if self.validation is not None:
        result["validation"] = self.validation
    return result
from_dict classmethod
from_dict(data: dict[str, Any]) -> TemplateVariable

Create a TemplateVariable from a dictionary.

Parameters:

Name Type Description Default
data dict[str, Any]

Dictionary with variable fields.

required

Returns:

Type Description
TemplateVariable

A new TemplateVariable instance.

Source code in packages/bots/src/dataknobs_bots/config/templates.py
@classmethod
def from_dict(cls, data: dict[str, Any]) -> TemplateVariable:
    """Create a TemplateVariable from a dictionary.

    Args:
        data: Dictionary with variable fields.

    Returns:
        A new TemplateVariable instance.
    """
    return cls(
        name=data["name"],
        description=data.get("description", ""),
        type=data.get("type", "string"),
        required=data.get("required", False),
        default=data.get("default"),
        choices=data.get("choices"),
        validation=data.get("validation"),
    )

ToolCatalog

ToolCatalog()

Bases: Registry[ToolEntry]

Registry mapping tool names to class paths and default configuration.

Provides a single source of truth for tool metadata, enabling config builders to reference tools by name and produce correct bot/wizard configs.

Built on Registry[ToolEntry] for thread safety, metrics, and consistent error handling.

Example
catalog = ToolCatalog()
catalog.register_tool(
    name="knowledge_search",
    class_path="dataknobs_bots.tools.knowledge_search.KnowledgeSearchTool",
    description="Search the knowledge base.",
    tags=("general", "rag"),
    requires=("knowledge_base",),
)
config = catalog.to_bot_config("knowledge_search", k=10)

Initialize the catalog.

Methods:

Name Description
register_tool

Register a tool in the catalog.

register_entry

Register a pre-built ToolEntry.

register_from_dict

Register a tool from a dict (e.g., loaded from YAML).

register_many_from_dicts

Register multiple tools from dicts.

register_from_class

Register a tool class that provides catalog_metadata().

list_tools

List all registered tools, optionally filtered by tags.

get_names

Get all registered tool names.

to_bot_config

Generate a bot config tool entry for the named tool.

to_bot_configs

Generate bot config entries for multiple tools.

get_requirements

Get the union of all requirements for the given tool names.

check_requirements

Check that tool requirements are satisfied by a config dict.

instantiate_tool

Import and instantiate a tool from its catalog entry.

create_tool_registry

Create a ToolRegistry populated from catalog entries.

to_dict

Serialize entire catalog to a dict (for YAML output).

from_dict

Create a catalog from a dict (e.g., loaded from YAML).

Source code in packages/bots/src/dataknobs_bots/config/tool_catalog.py
def __init__(self) -> None:
    """Initialize the catalog."""
    super().__init__("tool_catalog", enable_metrics=True)
Functions
register_tool
register_tool(
    name: str,
    class_path: str,
    description: str = "",
    default_params: dict[str, Any] | None = None,
    tags: Sequence[str] = (),
    requires: Sequence[str] = (),
) -> None

Register a tool in the catalog.

Parameters:

Name Type Description Default
name str

Tool's runtime name.

required
class_path str

Fully-qualified class path.

required
description str

Human-readable description.

''
default_params dict[str, Any] | None

Default constructor params.

None
tags Sequence[str]

Categorization tags.

()
requires Sequence[str]

Dependency identifiers.

()
Source code in packages/bots/src/dataknobs_bots/config/tool_catalog.py
def register_tool(
    self,
    name: str,
    class_path: str,
    description: str = "",
    default_params: dict[str, Any] | None = None,
    tags: Sequence[str] = (),
    requires: Sequence[str] = (),
) -> None:
    """Register a tool in the catalog.

    Args:
        name: Tool's runtime name.
        class_path: Fully-qualified class path.
        description: Human-readable description.
        default_params: Default constructor params.
        tags: Categorization tags.
        requires: Dependency identifiers.
    """
    entry = ToolEntry(
        name=name,
        class_path=class_path,
        description=description,
        default_params=default_params or {},
        tags=frozenset(tags),
        requires=frozenset(requires),
    )
    self.register(name, entry)
register_entry
register_entry(entry: ToolEntry) -> None

Register a pre-built ToolEntry.

Parameters:

Name Type Description Default
entry ToolEntry

ToolEntry to register.

required
Source code in packages/bots/src/dataknobs_bots/config/tool_catalog.py
def register_entry(self, entry: ToolEntry) -> None:
    """Register a pre-built ToolEntry.

    Args:
        entry: ToolEntry to register.
    """
    self.register(entry.name, entry)
register_from_dict
register_from_dict(data: dict[str, Any]) -> None

Register a tool from a dict (e.g., loaded from YAML).

Parameters:

Name Type Description Default
data dict[str, Any]

Dict with name and class_path keys.

required
Source code in packages/bots/src/dataknobs_bots/config/tool_catalog.py
def register_from_dict(self, data: dict[str, Any]) -> None:
    """Register a tool from a dict (e.g., loaded from YAML).

    Args:
        data: Dict with ``name`` and ``class_path`` keys.
    """
    entry = ToolEntry.from_dict(data)
    self.register(entry.name, entry)
register_many_from_dicts
register_many_from_dicts(entries: list[dict[str, Any]]) -> None

Register multiple tools from dicts.

Parameters:

Name Type Description Default
entries list[dict[str, Any]]

List of tool definition dicts.

required
Source code in packages/bots/src/dataknobs_bots/config/tool_catalog.py
def register_many_from_dicts(self, entries: list[dict[str, Any]]) -> None:
    """Register multiple tools from dicts.

    Args:
        entries: List of tool definition dicts.
    """
    for data in entries:
        self.register_from_dict(data)
register_from_class
register_from_class(tool_class: type) -> None

Register a tool class that provides catalog_metadata().

Computes class_path automatically from the class's module path.

Parameters:

Name Type Description Default
tool_class type

A tool class with a catalog_metadata() classmethod.

required

Raises:

Type Description
ValueError

If tool_class does not implement catalog_metadata().

Source code in packages/bots/src/dataknobs_bots/config/tool_catalog.py
def register_from_class(self, tool_class: type) -> None:
    """Register a tool class that provides ``catalog_metadata()``.

    Computes ``class_path`` automatically from the class's module path.

    Args:
        tool_class: A tool class with a ``catalog_metadata()`` classmethod.

    Raises:
        ValueError: If tool_class does not implement ``catalog_metadata()``.
    """
    if not hasattr(tool_class, "catalog_metadata") or not callable(
        tool_class.catalog_metadata
    ):
        raise ValueError(
            f"{tool_class.__name__} does not implement catalog_metadata()"
        )
    meta = tool_class.catalog_metadata()
    class_path = f"{tool_class.__module__}.{tool_class.__qualname__}"
    self.register_tool(
        name=meta["name"],
        class_path=class_path,
        description=meta.get("description", ""),
        default_params=meta.get("default_params"),
        tags=meta.get("tags", ()),
        requires=meta.get("requires", ()),
    )
list_tools
list_tools(tags: Sequence[str] | None = None) -> list[ToolEntry]

List all registered tools, optionally filtered by tags.

Parameters:

Name Type Description Default
tags Sequence[str] | None

If provided, return only tools that have ANY of the specified tags (union semantics).

None

Returns:

Type Description
list[ToolEntry]

List of matching ToolEntry instances.

Source code in packages/bots/src/dataknobs_bots/config/tool_catalog.py
def list_tools(
    self,
    tags: Sequence[str] | None = None,
) -> list[ToolEntry]:
    """List all registered tools, optionally filtered by tags.

    Args:
        tags: If provided, return only tools that have ANY of the
            specified tags (union semantics).

    Returns:
        List of matching ToolEntry instances.
    """
    entries = self.list_items()
    if tags:
        tag_set = frozenset(tags)
        entries = [e for e in entries if e.tags & tag_set]
    return entries
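The ANY-tag (union) filter can be illustrated with plain dict entries standing in for `ToolEntry` instances — a tool matches if it shares at least one tag with the query:

```python
# Sketch of the union-semantics tag filter used by list_tools();
# entries are plain dicts here rather than ToolEntry instances.
def filter_by_tags(entries, tags=None):
    if not tags:
        return list(entries)  # no filter: return everything
    tag_set = frozenset(tags)
    return [e for e in entries if frozenset(e["tags"]) & tag_set]

tools = [
    {"name": "knowledge_search", "tags": ["general", "rag"]},
    {"name": "calculator", "tags": ["math"]},
]
names = [e["name"] for e in filter_by_tags(tools, ["rag", "math"])]
# both tools match: union semantics require only one shared tag each
```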
get_names
get_names() -> list[str]

Get all registered tool names.

Returns:

Type Description
list[str]

List of tool names.

Source code in packages/bots/src/dataknobs_bots/config/tool_catalog.py
def get_names(self) -> list[str]:
    """Get all registered tool names.

    Returns:
        List of tool names.
    """
    return self.list_keys()
to_bot_config
to_bot_config(name: str, **param_overrides: Any) -> dict[str, Any]

Generate a bot config tool entry for the named tool.

Returns a dict suitable for DynaBot._resolve_tool(): {"class": "full.class.path", "params": {...}}

Parameters:

Name Type Description Default
name str

Tool name to look up.

required
**param_overrides Any

Override default params.

{}

Returns:

Type Description
dict[str, Any]

Bot config dict for the tool.

Raises:

Type Description
NotFoundError

If tool name is not registered.

Source code in packages/bots/src/dataknobs_bots/config/tool_catalog.py
def to_bot_config(self, name: str, **param_overrides: Any) -> dict[str, Any]:
    """Generate a bot config tool entry for the named tool.

    Returns a dict suitable for ``DynaBot._resolve_tool()``:
    ``{"class": "full.class.path", "params": {...}}``

    Args:
        name: Tool name to look up.
        **param_overrides: Override default params.

    Returns:
        Bot config dict for the tool.

    Raises:
        NotFoundError: If tool name is not registered.
    """
    entry = self.get(name)
    return entry.to_bot_config(**param_overrides)
to_bot_configs
to_bot_configs(
    names: Sequence[str], overrides: dict[str, dict[str, Any]] | None = None
) -> list[dict[str, Any]]

Generate bot config entries for multiple tools.

Parameters:

Name Type Description Default
names Sequence[str]

Tool names to include.

required
overrides dict[str, dict[str, Any]] | None

Per-tool param overrides keyed by tool name.

None

Returns:

Type Description
list[dict[str, Any]]

List of bot config dicts.

Source code in packages/bots/src/dataknobs_bots/config/tool_catalog.py
def to_bot_configs(
    self,
    names: Sequence[str],
    overrides: dict[str, dict[str, Any]] | None = None,
) -> list[dict[str, Any]]:
    """Generate bot config entries for multiple tools.

    Args:
        names: Tool names to include.
        overrides: Per-tool param overrides keyed by tool name.

    Returns:
        List of bot config dicts.
    """
    overrides = overrides or {}
    return [
        self.to_bot_config(name, **overrides.get(name, {}))
        for name in names
    ]
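The per-tool override merge can be sketched as follows. The `defaults` mapping and the `tools.*` class paths are hypothetical; the point is the merge order — entry defaults first, then the override dict keyed by tool name:

```python
# Sketch of per-tool override handling in to_bot_configs(): each
# tool's default params are merged with its entry in `overrides`,
# keyed by tool name. Class paths below are hypothetical.
defaults = {
    "knowledge_search": {"k": 5},
    "calculator": {},
}

def to_bot_configs_sketch(names, overrides=None):
    overrides = overrides or {}
    configs = []
    for name in names:
        params = {**defaults[name], **overrides.get(name, {})}  # overrides win
        cfg = {"class": f"tools.{name}"}
        if params:
            cfg["params"] = params  # omitted entirely when empty
        configs.append(cfg)
    return configs

cfgs = to_bot_configs_sketch(
    ["knowledge_search", "calculator"],
    overrides={"knowledge_search": {"k": 10}},
)
# cfgs[0] carries the overridden k=10; cfgs[1] has no params key
```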
get_requirements
get_requirements(names: Sequence[str]) -> frozenset[str]

Get the union of all requirements for the given tool names.

Parameters:

Name Type Description Default
names Sequence[str]

Tool names to check.

required

Returns:

Type Description
frozenset[str]

Set of all requirement identifiers across the named tools.

Source code in packages/bots/src/dataknobs_bots/config/tool_catalog.py
def get_requirements(self, names: Sequence[str]) -> frozenset[str]:
    """Get the union of all requirements for the given tool names.

    Args:
        names: Tool names to check.

    Returns:
        Set of all requirement identifiers across the named tools.
    """
    reqs: set[str] = set()
    for name in names:
        entry = self.get(name)
        reqs.update(entry.requires)
    return frozenset(reqs)
check_requirements
check_requirements(
    tool_names: Sequence[str], config: dict[str, Any]
) -> list[str]

Check that tool requirements are satisfied by a config dict.

Returns a list of warning messages for any unmet requirements. Tools with no requirements are always satisfied.

Parameters:

Name Type Description Default
tool_names Sequence[str]

Names of tools to check.

required
config dict[str, Any]

Bot config dict to check against (top-level keys).

required

Returns:

Type Description
list[str]

List of warning strings (empty if all requirements met).

Source code in packages/bots/src/dataknobs_bots/config/tool_catalog.py
def check_requirements(
    self,
    tool_names: Sequence[str],
    config: dict[str, Any],
) -> list[str]:
    """Check that tool requirements are satisfied by a config dict.

    Returns a list of warning messages for any unmet requirements.
    Tools with no requirements are always satisfied.

    Args:
        tool_names: Names of tools to check.
        config: Bot config dict to check against (top-level keys).

    Returns:
        List of warning strings (empty if all requirements met).
    """
    warnings: list[str] = []
    for name in tool_names:
        entry = self.get(name)
        for req in sorted(entry.requires):
            if req not in config:
                warnings.append(
                    f"Tool '{name}' requires '{req}' "
                    f"but it is not configured"
                )
    return warnings
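A requirement is considered met when its identifier appears as a top-level key in the config dict. A self-contained sketch (with a hypothetical `tool_requires` mapping in place of catalog lookups):

```python
# Sketch of check_requirements(): a requirement is met when its key
# appears at the top level of the bot config dict. tool_requires is
# a hypothetical stand-in for catalog entry lookups.
tool_requires = {
    "knowledge_search": {"knowledge_base"},
    "calculator": set(),  # no requirements: always satisfied
}

def check_requirements_sketch(tool_names, config):
    warnings = []
    for name in tool_names:
        for req in sorted(tool_requires[name]):
            if req not in config:
                warnings.append(
                    f"Tool '{name}' requires '{req}' but it is not configured"
                )
    return warnings

print(check_requirements_sketch(["knowledge_search", "calculator"], {"llm": {}}))
# ["Tool 'knowledge_search' requires 'knowledge_base' but it is not configured"]
```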
instantiate_tool
instantiate_tool(name: str, **param_overrides: Any) -> Any

Import and instantiate a tool from its catalog entry.

Uses resolve_callable() to import the class, then instantiates it with default_params merged with overrides. Prefers from_config() if the class defines it.

Parameters:

Name Type Description Default
name str

Tool name to instantiate.

required
**param_overrides Any

Override default params.

{}

Returns:

Type Description
Any

Instantiated tool.

Raises:

Type Description
NotFoundError

If name not in catalog.

ImportError

If class cannot be imported.

ValueError

If resolved class is not callable.

Source code in packages/bots/src/dataknobs_bots/config/tool_catalog.py
def instantiate_tool(self, name: str, **param_overrides: Any) -> Any:
    """Import and instantiate a tool from its catalog entry.

    Uses ``resolve_callable()`` to import the class, then instantiates
    it with ``default_params`` merged with overrides. Prefers
    ``from_config()`` if the class defines it.

    Args:
        name: Tool name to instantiate.
        **param_overrides: Override default params.

    Returns:
        Instantiated tool.

    Raises:
        NotFoundError: If name not in catalog.
        ImportError: If class cannot be imported.
        ValueError: If resolved class is not callable.
    """
    from dataknobs_bots.tools.resolve import resolve_callable

    entry = self.get(name)
    tool_class = resolve_callable(entry.class_path)
    params = dict(entry.default_params)
    params.update(param_overrides)

    if hasattr(tool_class, "from_config") and callable(
        tool_class.from_config
    ):
        return tool_class.from_config(params)
    return tool_class(**params) if params else tool_class()
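The construction rules — merge defaults with overrides, prefer `from_config()` when the class defines it, otherwise call the constructor — can be demonstrated with two dummy tool classes (illustrative only, not library classes):

```python
# Sketch of instantiate_tool()'s construction rules with dummy tool
# classes: from_config() is preferred when present; otherwise the
# class is called directly with the merged params.
class FromConfigTool:
    def __init__(self, k):
        self.k = k

    @classmethod
    def from_config(cls, params):
        return cls(**params)

class PlainTool:
    def __init__(self, k=1):
        self.k = k

def build(tool_class, default_params, **overrides):
    params = {**default_params, **overrides}  # overrides win over defaults
    if hasattr(tool_class, "from_config") and callable(tool_class.from_config):
        return tool_class.from_config(params)
    return tool_class(**params) if params else tool_class()

a = build(FromConfigTool, {"k": 5}, k=10)  # constructed via from_config, k=10
b = build(PlainTool, {})                   # direct call with no params
```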
create_tool_registry
create_tool_registry(
    names: Sequence[str] | None = None,
    overrides: dict[str, dict[str, Any]] | None = None,
    strict: bool = False,
) -> Any

Create a ToolRegistry populated from catalog entries.

Imports and instantiates each named tool, registering them in a new ToolRegistry.

Parameters:

Name Type Description Default
names Sequence[str] | None

Tool names to include (default: all registered).

None
overrides dict[str, dict[str, Any]] | None

Per-tool param overrides keyed by tool name.

None
strict bool

If True, raise on instantiation failure. If False (default), skip failed tools and log warnings.

False

Returns:

Type Description
Any

ToolRegistry with instantiated tools.

Source code in packages/bots/src/dataknobs_bots/config/tool_catalog.py
def create_tool_registry(
    self,
    names: Sequence[str] | None = None,
    overrides: dict[str, dict[str, Any]] | None = None,
    strict: bool = False,
) -> Any:
    """Create a ToolRegistry populated from catalog entries.

    Imports and instantiates each named tool, registering them in a
    new ``ToolRegistry``.

    Args:
        names: Tool names to include (default: all registered).
        overrides: Per-tool param overrides keyed by tool name.
        strict: If True, raise on instantiation failure.
            If False (default), skip failed tools and log warnings.

    Returns:
        ToolRegistry with instantiated tools.
    """
    from dataknobs_llm.tools import ToolRegistry

    registry = ToolRegistry()
    target_names = list(names) if names else self.list_keys()
    overrides = overrides or {}

    for name in target_names:
        try:
            tool = self.instantiate_tool(name, **overrides.get(name, {}))
            registry.register_tool(tool)
        except Exception as e:
            if strict:
                raise
            logger.warning(
                "Failed to instantiate tool '%s': %s", name, e
            )

    return registry
to_dict
to_dict() -> dict[str, Any]

Serialize entire catalog to a dict (for YAML output).

Returns:

Type Description
dict[str, Any]

Dict with tools key containing list of tool dicts.

Source code in packages/bots/src/dataknobs_bots/config/tool_catalog.py
def to_dict(self) -> dict[str, Any]:
    """Serialize entire catalog to a dict (for YAML output).

    Returns:
        Dict with ``tools`` key containing list of tool dicts.
    """
    return {
        "tools": [entry.to_dict() for entry in self.list_items()]
    }
from_dict classmethod
from_dict(data: dict[str, Any]) -> ToolCatalog

Create a catalog from a dict (e.g., loaded from YAML).

Parameters:

Name Type Description Default
data dict[str, Any]

Dict with tools key containing list of tool dicts.

required

Returns:

Type Description
ToolCatalog

New ToolCatalog populated from the data.

Source code in packages/bots/src/dataknobs_bots/config/tool_catalog.py
@classmethod
def from_dict(cls, data: dict[str, Any]) -> ToolCatalog:
    """Create a catalog from a dict (e.g., loaded from YAML).

    Args:
        data: Dict with ``tools`` key containing list of tool dicts.

    Returns:
        New ToolCatalog populated from the data.
    """
    catalog = cls()
    for tool_data in data.get("tools", []):
        catalog.register_from_dict(tool_data)
    return catalog
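A catalog file in the `to_dict()`/`from_dict()` shape might look like the fragment below (a hypothetical example; only `name` and `class_path` are required — the other fields are omitted from serialized output when empty):

```yaml
tools:
  - name: knowledge_search
    class_path: dataknobs_bots.tools.knowledge_search.KnowledgeSearchTool
    description: Search the knowledge base.
    default_params:
      k: 5
    tags: [general, rag]
    requires: [knowledge_base]
```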

ToolEntry dataclass

ToolEntry(
    name: str,
    class_path: str,
    description: str = "",
    default_params: dict[str, Any] = None,
    tags: frozenset[str] = frozenset(),
    requires: frozenset[str] = frozenset(),
)

Metadata for a tool in the catalog.

Captures the information needed to:

- Generate bot config entries (class path + params)
- Reference tools in wizard stage configs (name)
- Discover tools by capability (tags)
- Validate tool dependencies (requires)


Methods:

Name Description
__post_init__

Set default_params to empty dict if None.

to_dict

Serialize to dict (suitable for YAML output).

from_dict

Deserialize from dict (e.g., loaded from YAML).

to_bot_config

Generate a bot config tool entry.

Functions
__post_init__
__post_init__() -> None

Set default_params to empty dict if None.

Source code in packages/bots/src/dataknobs_bots/config/tool_catalog.py
def __post_init__(self) -> None:
    """Set default_params to empty dict if None."""
    if self.default_params is None:
        object.__setattr__(self, "default_params", {})
to_dict
to_dict() -> dict[str, Any]

Serialize to dict (suitable for YAML output).

Omits empty/default fields for clean output.

Source code in packages/bots/src/dataknobs_bots/config/tool_catalog.py
def to_dict(self) -> dict[str, Any]:
    """Serialize to dict (suitable for YAML output).

    Omits empty/default fields for clean output.
    """
    result: dict[str, Any] = {
        "name": self.name,
        "class_path": self.class_path,
    }
    if self.description:
        result["description"] = self.description
    if self.default_params:
        result["default_params"] = dict(self.default_params)
    if self.tags:
        result["tags"] = sorted(self.tags)
    if self.requires:
        result["requires"] = sorted(self.requires)
    return result
from_dict classmethod
from_dict(data: dict[str, Any]) -> ToolEntry

Deserialize from dict (e.g., loaded from YAML).

Source code in packages/bots/src/dataknobs_bots/config/tool_catalog.py
@classmethod
def from_dict(cls, data: dict[str, Any]) -> ToolEntry:
    """Deserialize from dict (e.g., loaded from YAML)."""
    return cls(
        name=data["name"],
        class_path=data["class_path"],
        description=data.get("description", ""),
        default_params=data.get("default_params") or {},
        tags=frozenset(data.get("tags") or ()),
        requires=frozenset(data.get("requires") or ()),
    )
to_bot_config
to_bot_config(**param_overrides: Any) -> dict[str, Any]

Generate a bot config tool entry.

Returns a dict suitable for DynaBot._resolve_tool(): {"class": "full.class.path", "params": {...}}

Parameters:

Name Type Description Default
**param_overrides Any

Override default params.

{}

Returns:

Type Description
dict[str, Any]

Bot config dict for this tool.

Source code in packages/bots/src/dataknobs_bots/config/tool_catalog.py
def to_bot_config(self, **param_overrides: Any) -> dict[str, Any]:
    """Generate a bot config tool entry.

    Returns a dict suitable for ``DynaBot._resolve_tool()``:
    ``{"class": "full.class.path", "params": {...}}``

    Args:
        **param_overrides: Override default params.

    Returns:
        Bot config dict for this tool.
    """
    params = dict(self.default_params)
    params.update(param_overrides)
    config: dict[str, Any] = {"class": self.class_path}
    if params:
        config["params"] = params
    return config

ValidationResult dataclass

ValidationResult(
    valid: bool, errors: list[str] = list(), warnings: list[str] = list()
)

Result of validating a configuration.

Attributes:

Name Type Description
valid bool

Whether the configuration passed validation.

errors list[str]

List of error messages (validation failures).

warnings list[str]

List of warning messages (non-blocking issues).

Methods:

Name Description
merge

Merge another validation result into this one.

ok

Create a successful validation result.

error

Create a failed validation result with a single error.

warning

Create a successful validation result with a warning.

to_dict

Convert to dictionary representation.

Functions
merge
merge(other: ValidationResult) -> ValidationResult

Merge another validation result into this one.

The merged result is valid only if both results are valid.

Parameters:

Name Type Description Default
other ValidationResult

Another validation result to merge.

required

Returns:

Type Description
ValidationResult

A new ValidationResult with combined errors and warnings.

Source code in packages/bots/src/dataknobs_bots/config/validation.py
def merge(self, other: ValidationResult) -> ValidationResult:
    """Merge another validation result into this one.

    The merged result is valid only if both results are valid.

    Args:
        other: Another validation result to merge.

    Returns:
        A new ValidationResult with combined errors and warnings.
    """
    return ValidationResult(
        valid=self.valid and other.valid,
        errors=self.errors + other.errors,
        warnings=self.warnings + other.warnings,
    )
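The merge semantics — AND on validity, concatenation of errors and warnings — can be shown with a minimal stand-in dataclass (not the library class itself):

```python
from dataclasses import dataclass, field

# Minimal stand-in for ValidationResult showing merge() semantics:
# the combined result is valid only if both inputs are valid, and
# errors/warnings are concatenated in order.
@dataclass
class Result:
    valid: bool
    errors: list = field(default_factory=list)
    warnings: list = field(default_factory=list)

    def merge(self, other: "Result") -> "Result":
        return Result(
            valid=self.valid and other.valid,
            errors=self.errors + other.errors,
            warnings=self.warnings + other.warnings,
        )

ok = Result(valid=True)
warn = Result(valid=True, warnings=["memory not configured"])
err = Result(valid=False, errors=["Missing required component: llm"])

merged = ok.merge(warn).merge(err)
# merged.valid is False: one invalid input poisons the chain
```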
ok classmethod
ok() -> ValidationResult

Create a successful validation result.

Source code in packages/bots/src/dataknobs_bots/config/validation.py
@classmethod
def ok(cls) -> ValidationResult:
    """Create a successful validation result."""
    return cls(valid=True)
error classmethod
error(message: str) -> ValidationResult

Create a failed validation result with a single error.

Parameters:

Name Type Description Default
message str

The error message.

required
Source code in packages/bots/src/dataknobs_bots/config/validation.py
@classmethod
def error(cls, message: str) -> ValidationResult:
    """Create a failed validation result with a single error.

    Args:
        message: The error message.
    """
    return cls(valid=False, errors=[message])
warning classmethod
warning(message: str) -> ValidationResult

Create a successful validation result with a warning.

Parameters:

Name Type Description Default
message str

The warning message.

required
Source code in packages/bots/src/dataknobs_bots/config/validation.py
@classmethod
def warning(cls, message: str) -> ValidationResult:
    """Create a successful validation result with a warning.

    Args:
        message: The warning message.
    """
    return cls(valid=True, warnings=[message])
to_dict
to_dict() -> dict[str, Any]

Convert to dictionary representation.

Source code in packages/bots/src/dataknobs_bots/config/validation.py
def to_dict(self) -> dict[str, Any]:
    """Convert to dictionary representation."""
    return {
        "valid": self.valid,
        "errors": self.errors,
        "warnings": self.warnings,
    }
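
Taken together, these helpers compose naturally: results from several validators can be merged, and one failure fails the whole merge. The sketch below mirrors the documented behavior with a stand-in dataclass (`LocalResult` is a hypothetical name used so the snippet runs without dataknobs_bots installed; the `valid`, `errors`, and `warnings` fields come from `to_dict` above):

```python
from dataclasses import dataclass, field

@dataclass
class LocalResult:
    """Stand-in mirroring ValidationResult's documented fields."""
    valid: bool
    errors: list = field(default_factory=list)
    warnings: list = field(default_factory=list)

    def merge(self, other: "LocalResult") -> "LocalResult":
        # Valid only if both results are valid; errors/warnings concatenate.
        return LocalResult(
            valid=self.valid and other.valid,
            errors=self.errors + other.errors,
            warnings=self.warnings + other.warnings,
        )

ok = LocalResult(valid=True)
warn = LocalResult(valid=True, warnings=["model not pinned"])
err = LocalResult(valid=False, errors=["missing vector_store"])

merged = ok.merge(warn).merge(err)
print(merged.valid)     # False: one failed result fails the merge
print(merged.errors)    # ['missing vector_store']
print(merged.warnings)  # ['model not pinned']
```

Because `merge` returns a new result rather than mutating in place, chains like the one above are safe to reuse across validators.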

RAGKnowledgeBase

RAGKnowledgeBase(
    vector_store: Any,
    embedding_provider: Any,
    chunking_config: dict[str, Any] | None = None,
    merger_config: MergerConfig | None = None,
    formatter_config: FormatterConfig | None = None,
)

Bases: KnowledgeBase

RAG knowledge base using dataknobs-xization for chunking and vector search.

This implementation:

- Parses markdown documents using dataknobs-xization
- Chunks documents intelligently based on structure
- Stores chunks with embeddings in vector store
- Provides semantic search for relevant context

Attributes:

Name Type Description
vector_store

Vector store backend from dataknobs_data

embedding_provider

LLM provider for generating embeddings

chunking_config

Configuration for document chunking

Initialize RAG knowledge base.

Parameters:

Name Type Description Default
vector_store Any

Vector store backend instance

required
embedding_provider Any

LLM provider with embed() method

required
chunking_config dict[str, Any] | None

Configuration for chunking:

- max_chunk_size: Maximum chunk size in characters
- combine_under_heading: Combine text under same heading
- quality_filter: ChunkQualityConfig for filtering
- generate_embeddings: Whether to generate enriched embedding text

None
merger_config MergerConfig | None

Configuration for chunk merging (optional)

None
formatter_config FormatterConfig | None

Configuration for context formatting (optional)

None

Methods:

Name Description
from_config

Create RAG knowledge base from configuration.

load_markdown_document

Load and chunk a markdown document from a file.

load_documents_from_directory

Load all markdown documents from a directory.

load_json_document

Load and chunk a JSON document by converting it to markdown.

load_yaml_document

Load and chunk a YAML document by converting it to markdown.

load_csv_document

Load and chunk a CSV document by converting it to markdown.

load_from_directory

Load documents from a directory using KnowledgeBaseConfig.

load_markdown_text

Load markdown content from a string.

query

Query knowledge base for relevant chunks.

hybrid_query

Query knowledge base using hybrid search (text + vector).

format_context

Format search results for LLM context.

count

Get the number of chunks in the knowledge base.

clear

Clear all documents from the knowledge base.

save

Save the knowledge base to persistent storage.

providers

Return the embedding provider, keyed by role.

set_provider

Replace the embedding provider if the role matches.

close

Close the knowledge base and release resources.

__aenter__

Async context manager entry.

__aexit__

Async context manager exit - ensures cleanup.

Source code in packages/bots/src/dataknobs_bots/knowledge/rag.py
def __init__(
    self,
    vector_store: Any,
    embedding_provider: Any,
    chunking_config: dict[str, Any] | None = None,
    merger_config: MergerConfig | None = None,
    formatter_config: FormatterConfig | None = None,
):
    """Initialize RAG knowledge base.

    Args:
        vector_store: Vector store backend instance
        embedding_provider: LLM provider with embed() method
        chunking_config: Configuration for chunking:
            - max_chunk_size: Maximum chunk size in characters
            - combine_under_heading: Combine text under same heading
            - quality_filter: ChunkQualityConfig for filtering
            - generate_embeddings: Whether to generate enriched embedding text
        merger_config: Configuration for chunk merging (optional)
        formatter_config: Configuration for context formatting (optional)
    """
    self.vector_store = vector_store
    self.embedding_provider = embedding_provider
    self.chunking_config = chunking_config or {
        "max_chunk_size": 500,
        "combine_under_heading": True,
    }

    # Initialize merger and formatter
    self.merger = ChunkMerger(merger_config) if merger_config else ChunkMerger()
    self.formatter = ContextFormatter(formatter_config) if formatter_config else ContextFormatter()
Functions
from_config async classmethod
from_config(config: dict[str, Any]) -> RAGKnowledgeBase

Create RAG knowledge base from configuration.

Parameters:

Name Type Description Default
config dict[str, Any]

Configuration dictionary with:

- vector_store: Vector store configuration
- embedding: Nested embedding config dict (preferred), e.g. {"provider": "ollama", "model": "nomic-embed-text"}
- embedding_provider / embedding_model: Legacy flat keys
- chunking: Optional chunking configuration
- documents_path: Optional path to load documents from
- document_pattern: Optional glob pattern for documents

required

Returns:

Type Description
RAGKnowledgeBase

Configured RAGKnowledgeBase instance

Example
config = {
    "vector_store": {
        "backend": "faiss",
        "dimensions": 768,
        "collection": "docs"
    },
    "embedding": {
        "provider": "ollama",
        "model": "nomic-embed-text",
    },
    "chunking": {
        "max_chunk_size": 500
    },
    "documents_path": "./docs"
}
kb = await RAGKnowledgeBase.from_config(config)
Source code in packages/bots/src/dataknobs_bots/knowledge/rag.py
@classmethod
async def from_config(cls, config: dict[str, Any]) -> "RAGKnowledgeBase":
    """Create RAG knowledge base from configuration.

    Args:
        config: Configuration dictionary with:
            - vector_store: Vector store configuration
            - embedding: Nested embedding config dict (preferred), e.g.
              ``{"provider": "ollama", "model": "nomic-embed-text"}``
            - embedding_provider / embedding_model: Legacy flat keys
            - chunking: Optional chunking configuration
            - documents_path: Optional path to load documents from
            - document_pattern: Optional glob pattern for documents

    Returns:
        Configured RAGKnowledgeBase instance

    Example:
        ```python
        config = {
            "vector_store": {
                "backend": "faiss",
                "dimensions": 768,
                "collection": "docs"
            },
            "embedding": {
                "provider": "ollama",
                "model": "nomic-embed-text",
            },
            "chunking": {
                "max_chunk_size": 500
            },
            "documents_path": "./docs"
        }
        kb = await RAGKnowledgeBase.from_config(config)
        ```
    """
    from dataknobs_data.vector.stores import VectorStoreFactory

    from ..providers import create_embedding_provider

    # Create vector store
    vs_config = config["vector_store"]
    factory = VectorStoreFactory()
    vector_store = factory.create(**vs_config)
    await vector_store.initialize()

    # Create embedding provider
    embedding_provider = await create_embedding_provider(config)

    # Create merger config if specified
    merger_config = None
    if "merger" in config:
        merger_config = MergerConfig(**config["merger"])

    # Create formatter config if specified
    formatter_config = None
    if "formatter" in config:
        formatter_config = FormatterConfig(**config["formatter"])

    # Create instance
    kb = cls(
        vector_store=vector_store,
        embedding_provider=embedding_provider,
        chunking_config=config.get("chunking", {}),
        merger_config=merger_config,
        formatter_config=formatter_config,
    )

    # Load documents if path provided
    if "documents_path" in config:
        await kb.load_documents_from_directory(
            config["documents_path"], config.get("document_pattern", "**/*.md")
        )

    return kb
load_markdown_document async
load_markdown_document(
    filepath: str | Path, metadata: dict[str, Any] | None = None
) -> int

Load and chunk a markdown document from a file.

Reads the file and delegates to load_markdown_text for parsing, chunking, embedding, and storage.

Parameters:

Name Type Description Default
filepath str | Path

Path to markdown file

required
metadata dict[str, Any] | None

Optional metadata to attach to all chunks

None

Returns:

Type Description
int

Number of chunks created

Example
num_chunks = await kb.load_markdown_document(
    "docs/api.md",
    metadata={"category": "api", "version": "1.0"}
)
Source code in packages/bots/src/dataknobs_bots/knowledge/rag.py
async def load_markdown_document(
    self, filepath: str | Path, metadata: dict[str, Any] | None = None
) -> int:
    """Load and chunk a markdown document from a file.

    Reads the file and delegates to :meth:`load_markdown_text` for
    parsing, chunking, embedding, and storage.

    Args:
        filepath: Path to markdown file
        metadata: Optional metadata to attach to all chunks

    Returns:
        Number of chunks created

    Example:
        ```python
        num_chunks = await kb.load_markdown_document(
            "docs/api.md",
            metadata={"category": "api", "version": "1.0"}
        )
        ```
    """
    filepath = Path(filepath)
    with open(filepath, encoding="utf-8") as f:
        markdown_text = f.read()

    return await self.load_markdown_text(
        markdown_text,
        source=str(filepath),
        metadata=metadata,
    )
load_documents_from_directory async
load_documents_from_directory(
    directory: str | Path, pattern: str = "**/*.md"
) -> dict[str, Any]

Load all markdown documents from a directory.

Parameters:

Name Type Description Default
directory str | Path

Directory path containing documents

required
pattern str

Glob pattern for files to load (default: `**/*.md`)

'**/*.md'

Returns:

Type Description
dict[str, Any]

Dictionary with loading statistics:

- total_files: Number of files processed
- total_chunks: Total chunks created
- errors: List of errors encountered

Example
results = await kb.load_documents_from_directory(
    "docs/",
    pattern="**/*.md"
)
print(f"Loaded {results['total_chunks']} chunks from {results['total_files']} files")
Source code in packages/bots/src/dataknobs_bots/knowledge/rag.py
async def load_documents_from_directory(
    self, directory: str | Path, pattern: str = "**/*.md"
) -> dict[str, Any]:
    """Load all markdown documents from a directory.

    Args:
        directory: Directory path containing documents
        pattern: Glob pattern for files to load (default: **/*.md)

    Returns:
        Dictionary with loading statistics:
            - total_files: Number of files processed
            - total_chunks: Total chunks created
            - errors: List of errors encountered

    Example:
        ```python
        results = await kb.load_documents_from_directory(
            "docs/",
            pattern="**/*.md"
        )
        print(f"Loaded {results['total_chunks']} chunks from {results['total_files']} files")
        ```
    """
    directory = Path(directory)
    results = {"total_files": 0, "total_chunks": 0, "errors": []}

    for filepath in directory.glob(pattern):
        if not filepath.is_file():
            continue

        try:
            num_chunks = await self.load_markdown_document(
                filepath, metadata={"filename": filepath.name}
            )
            results["total_files"] += 1
            results["total_chunks"] += num_chunks
        except Exception as e:
            results["errors"].append({"file": str(filepath), "error": str(e)})

    return results
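
The default `**/*.md` pattern relies on pathlib's recursive glob, so nested documents are picked up while other extensions are skipped. This can be checked in isolation (the temporary layout below is illustrative):

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "guide").mkdir()
    (root / "intro.md").write_text("# Intro", encoding="utf-8")
    (root / "guide" / "setup.md").write_text("# Setup", encoding="utf-8")
    (root / "notes.txt").write_text("not matched", encoding="utf-8")

    # "**/*.md" recurses into subdirectories but skips other extensions
    matched = sorted(p.name for p in root.glob("**/*.md"))
    print(matched)  # ['intro.md', 'setup.md']
```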
load_json_document async
load_json_document(
    filepath: str | Path,
    metadata: dict[str, Any] | None = None,
    schema: str | None = None,
    transformer: ContentTransformer | None = None,
    title: str | None = None,
) -> int

Load and chunk a JSON document by converting it to markdown.

This method converts JSON data to markdown format using ContentTransformer, then processes it like any other markdown document.

Parameters:

Name Type Description Default
filepath str | Path

Path to JSON file

required
metadata dict[str, Any] | None

Optional metadata to attach to all chunks

None
schema str | None

Optional schema name (requires transformer with registered schema)

None
transformer ContentTransformer | None

Optional ContentTransformer instance with custom configuration

None
title str | None

Optional document title for the markdown

None

Returns:

Type Description
int

Number of chunks created

Example
# Generic conversion
num_chunks = await kb.load_json_document(
    "data/patterns.json",
    metadata={"content_type": "patterns"}
)

# With custom schema
transformer = ContentTransformer()
transformer.register_schema("pattern", {
    "title_field": "name",
    "sections": [
        {"field": "description", "heading": "Description"},
        {"field": "example", "heading": "Example", "format": "code"}
    ]
})
num_chunks = await kb.load_json_document(
    "data/patterns.json",
    transformer=transformer,
    schema="pattern"
)
Source code in packages/bots/src/dataknobs_bots/knowledge/rag.py
async def load_json_document(
    self,
    filepath: str | Path,
    metadata: dict[str, Any] | None = None,
    schema: str | None = None,
    transformer: ContentTransformer | None = None,
    title: str | None = None,
) -> int:
    """Load and chunk a JSON document by converting it to markdown.

    This method converts JSON data to markdown format using ContentTransformer,
    then processes it like any other markdown document.

    Args:
        filepath: Path to JSON file
        metadata: Optional metadata to attach to all chunks
        schema: Optional schema name (requires transformer with registered schema)
        transformer: Optional ContentTransformer instance with custom configuration
        title: Optional document title for the markdown

    Returns:
        Number of chunks created

    Example:
        ```python
        # Generic conversion
        num_chunks = await kb.load_json_document(
            "data/patterns.json",
            metadata={"content_type": "patterns"}
        )

        # With custom schema
        transformer = ContentTransformer()
        transformer.register_schema("pattern", {
            "title_field": "name",
            "sections": [
                {"field": "description", "heading": "Description"},
                {"field": "example", "heading": "Example", "format": "code"}
            ]
        })
        num_chunks = await kb.load_json_document(
            "data/patterns.json",
            transformer=transformer,
            schema="pattern"
        )
        ```
    """
    import json

    filepath = Path(filepath)

    # Read JSON
    with open(filepath, encoding="utf-8") as f:
        data = json.load(f)

    # Convert to markdown
    if transformer is None:
        transformer = ContentTransformer()

    markdown_text = transformer.transform_json(
        data,
        schema=schema,
        title=title or filepath.stem.replace("_", " ").title(),
    )

    return await self.load_markdown_text(
        markdown_text,
        source=str(filepath),
        metadata=metadata,
    )
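
When no explicit `title` is passed, the method above derives one from the file stem. That fallback is plain stdlib and can be reproduced directly (`default_title` is a hypothetical helper for illustration):

```python
from pathlib import Path

def default_title(filepath: str) -> str:
    # Mirrors the fallback: stem, underscores to spaces, title case
    return Path(filepath).stem.replace("_", " ").title()

print(default_title("data/design_patterns.json"))  # Design Patterns
print(default_title("faq.json"))                   # Faq
```

The same derivation applies to load_yaml_document and load_csv_document below.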
load_yaml_document async
load_yaml_document(
    filepath: str | Path,
    metadata: dict[str, Any] | None = None,
    schema: str | None = None,
    transformer: ContentTransformer | None = None,
    title: str | None = None,
) -> int

Load and chunk a YAML document by converting it to markdown.

Parameters:

Name Type Description Default
filepath str | Path

Path to YAML file

required
metadata dict[str, Any] | None

Optional metadata to attach to all chunks

None
schema str | None

Optional schema name (requires transformer with registered schema)

None
transformer ContentTransformer | None

Optional ContentTransformer instance with custom configuration

None
title str | None

Optional document title for the markdown

None

Returns:

Type Description
int

Number of chunks created

Example
num_chunks = await kb.load_yaml_document(
    "data/config.yaml",
    metadata={"content_type": "configuration"}
)
Source code in packages/bots/src/dataknobs_bots/knowledge/rag.py
async def load_yaml_document(
    self,
    filepath: str | Path,
    metadata: dict[str, Any] | None = None,
    schema: str | None = None,
    transformer: ContentTransformer | None = None,
    title: str | None = None,
) -> int:
    """Load and chunk a YAML document by converting it to markdown.

    Args:
        filepath: Path to YAML file
        metadata: Optional metadata to attach to all chunks
        schema: Optional schema name (requires transformer with registered schema)
        transformer: Optional ContentTransformer instance with custom configuration
        title: Optional document title for the markdown

    Returns:
        Number of chunks created

    Example:
        ```python
        num_chunks = await kb.load_yaml_document(
            "data/config.yaml",
            metadata={"content_type": "configuration"}
        )
        ```
    """
    filepath = Path(filepath)

    # Convert to markdown
    if transformer is None:
        transformer = ContentTransformer()

    markdown_text = transformer.transform_yaml(
        filepath,
        schema=schema,
        title=title or filepath.stem.replace("_", " ").title(),
    )

    return await self.load_markdown_text(
        markdown_text,
        source=str(filepath),
        metadata=metadata,
    )
load_csv_document async
load_csv_document(
    filepath: str | Path,
    metadata: dict[str, Any] | None = None,
    title: str | None = None,
    title_field: str | None = None,
    transformer: ContentTransformer | None = None,
) -> int

Load and chunk a CSV document by converting it to markdown.

Each row becomes a section with the first column (or title_field) as heading.

Parameters:

Name Type Description Default
filepath str | Path

Path to CSV file

required
metadata dict[str, Any] | None

Optional metadata to attach to all chunks

None
title str | None

Optional document title for the markdown

None
title_field str | None

Column to use as section title (default: first column)

None
transformer ContentTransformer | None

Optional ContentTransformer instance with custom configuration

None

Returns:

Type Description
int

Number of chunks created

Example
num_chunks = await kb.load_csv_document(
    "data/faq.csv",
    title="Frequently Asked Questions",
    title_field="question"
)
Source code in packages/bots/src/dataknobs_bots/knowledge/rag.py
async def load_csv_document(
    self,
    filepath: str | Path,
    metadata: dict[str, Any] | None = None,
    title: str | None = None,
    title_field: str | None = None,
    transformer: ContentTransformer | None = None,
) -> int:
    """Load and chunk a CSV document by converting it to markdown.

    Each row becomes a section with the first column (or title_field) as heading.

    Args:
        filepath: Path to CSV file
        metadata: Optional metadata to attach to all chunks
        title: Optional document title for the markdown
        title_field: Column to use as section title (default: first column)
        transformer: Optional ContentTransformer instance with custom configuration

    Returns:
        Number of chunks created

    Example:
        ```python
        num_chunks = await kb.load_csv_document(
            "data/faq.csv",
            title="Frequently Asked Questions",
            title_field="question"
        )
        ```
    """
    filepath = Path(filepath)

    # Convert to markdown
    if transformer is None:
        transformer = ContentTransformer()

    markdown_text = transformer.transform_csv(
        filepath,
        title=title or filepath.stem.replace("_", " ").title(),
        title_field=title_field,
    )

    return await self.load_markdown_text(
        markdown_text,
        source=str(filepath),
        metadata=metadata,
    )
load_from_directory async
load_from_directory(
    directory: str | Path,
    config: KnowledgeBaseConfig | None = None,
    progress_callback: Any | None = None,
) -> dict[str, Any]

Load documents from a directory using KnowledgeBaseConfig.

This method uses the xization DirectoryProcessor to process documents with configurable patterns, chunking, and metadata. It supports markdown, JSON, and JSONL files with streaming for large files.

Parameters:

Name Type Description Default
directory str | Path

Directory path containing documents

required
config KnowledgeBaseConfig | None

Optional KnowledgeBaseConfig. If not provided, attempts to load from knowledge_base.json/yaml in the directory, or uses defaults.

None
progress_callback Any | None

Optional callback function(file_path, num_chunks) for progress

None

Returns:

Type Description
dict[str, Any]

Dictionary with loading statistics:

- total_files: Number of files processed
- total_chunks: Total chunks created
- files_by_type: Count of files by type (markdown, json, jsonl)
- errors: List of errors encountered
- documents: List of processed document info

Example
# With auto-loaded config from directory
results = await kb.load_from_directory("./docs")

# With explicit config
config = KnowledgeBaseConfig(
    name="product-docs",
    default_chunking={"max_chunk_size": 800},
    patterns=[
        FilePatternConfig(pattern="api/**/*.json", text_fields=["title", "description"]),
        FilePatternConfig(pattern="**/*.md"),
    ],
    exclude_patterns=["**/drafts/**"],
)
results = await kb.load_from_directory("./docs", config=config)
print(f"Loaded {results['total_chunks']} chunks from {results['total_files']} files")
Source code in packages/bots/src/dataknobs_bots/knowledge/rag.py
async def load_from_directory(
    self,
    directory: str | Path,
    config: KnowledgeBaseConfig | None = None,
    progress_callback: Any | None = None,
) -> dict[str, Any]:
    """Load documents from a directory using KnowledgeBaseConfig.

    This method uses the xization DirectoryProcessor to process documents
    with configurable patterns, chunking, and metadata. It supports markdown,
    JSON, and JSONL files with streaming for large files.

    Args:
        directory: Directory path containing documents
        config: Optional KnowledgeBaseConfig. If not provided, attempts to load
               from knowledge_base.json/yaml in the directory, or uses defaults.
        progress_callback: Optional callback function(file_path, num_chunks) for progress

    Returns:
        Dictionary with loading statistics:
            - total_files: Number of files processed
            - total_chunks: Total chunks created
            - files_by_type: Count of files by type (markdown, json, jsonl)
            - errors: List of errors encountered
            - documents: List of processed document info

    Example:
        ```python
        # With auto-loaded config from directory
        results = await kb.load_from_directory("./docs")

        # With explicit config
        config = KnowledgeBaseConfig(
            name="product-docs",
            default_chunking={"max_chunk_size": 800},
            patterns=[
                FilePatternConfig(pattern="api/**/*.json", text_fields=["title", "description"]),
                FilePatternConfig(pattern="**/*.md"),
            ],
            exclude_patterns=["**/drafts/**"],
        )
        results = await kb.load_from_directory("./docs", config=config)
        print(f"Loaded {results['total_chunks']} chunks from {results['total_files']} files")
        ```
    """
    import numpy as np

    directory = Path(directory)

    # Load or use provided config
    if config is None:
        config = KnowledgeBaseConfig.load(directory)

    # Create processor
    processor = DirectoryProcessor(config, directory)

    # Track results
    results: dict[str, Any] = {
        "total_files": 0,
        "total_chunks": 0,
        "files_by_type": {"markdown": 0, "json": 0, "jsonl": 0},
        "errors": [],
        "documents": [],
    }

    # Process each document
    for doc in processor.process():
        doc_info: dict[str, Any] = {
            "source": doc.source_file,
            "type": doc.document_type,
            "chunks": 0,
            "errors": doc.errors,
        }

        if doc.has_errors:
            results["errors"].extend([
                {"file": doc.source_file, "error": err}
                for err in doc.errors
            ])
            results["documents"].append(doc_info)
            continue

        # Process chunks for this document
        vectors = []
        ids = []
        metadatas = []

        source_stem = Path(doc.source_file).stem

        for chunk in doc.chunks:
            # Get text for embedding
            text_for_embedding = chunk.get("embedding_text") or chunk.get("text", "")

            if not text_for_embedding:
                continue

            # Generate embedding
            embedding = await self.embedding_provider.embed(text_for_embedding)

            # Convert to numpy if needed
            if not isinstance(embedding, np.ndarray):
                embedding = np.array(embedding, dtype=np.float32)

            # Build chunk ID
            chunk_index = chunk.get("chunk_index", len(vectors))
            chunk_id = f"{source_stem}_{chunk_index}"

            # Build metadata
            chunk_metadata = {
                "text": chunk.get("text", ""),
                "source": doc.source_file,
                "chunk_index": chunk_index,
                "document_type": doc.document_type,
            }

            # Add chunk-specific metadata
            if "metadata" in chunk:
                chunk_metadata.update(chunk["metadata"])

            # Add document-level metadata
            if doc.metadata:
                for key, value in doc.metadata.items():
                    if key not in chunk_metadata:
                        chunk_metadata[key] = value

            vectors.append(embedding)
            ids.append(chunk_id)
            metadatas.append(chunk_metadata)

        # Batch insert into vector store
        if vectors:
            await self.vector_store.add_vectors(
                vectors=vectors, ids=ids, metadata=metadatas
            )

        doc_info["chunks"] = len(vectors)
        results["total_files"] += 1
        results["total_chunks"] += len(vectors)
        results["files_by_type"][doc.document_type] += 1
        results["documents"].append(doc_info)

        # Call progress callback if provided
        if progress_callback:
            progress_callback(doc.source_file, len(vectors))

    return results
load_markdown_text async
load_markdown_text(
    markdown_text: str, source: str, metadata: dict[str, Any] | None = None
) -> int

Load markdown content from a string.

Parses, chunks, embeds, and stores the markdown text. This is the shared implementation used by load_markdown_document, load_json_document, load_yaml_document, and load_csv_document.

Parameters:

Name Type Description Default
markdown_text str

Markdown content to load

required
source str

Source identifier for metadata

required
metadata dict[str, Any] | None

Optional metadata to attach to all chunks

None

Returns:

Type Description
int

Number of chunks created

Source code in packages/bots/src/dataknobs_bots/knowledge/rag.py
async def load_markdown_text(
    self,
    markdown_text: str,
    source: str,
    metadata: dict[str, Any] | None = None,
) -> int:
    """Load markdown content from a string.

    Parses, chunks, embeds, and stores the markdown text.  This is
    the shared implementation used by :meth:`load_markdown_document`,
    :meth:`load_json_document`, :meth:`load_yaml_document`, and
    :meth:`load_csv_document`.

    Args:
        markdown_text: Markdown content to load
        source: Source identifier for metadata
        metadata: Optional metadata to attach to all chunks

    Returns:
        Number of chunks created
    """
    import numpy as np

    # Parse markdown
    tree = parse_markdown(markdown_text)

    # Build quality filter config if specified
    quality_filter = None
    if "quality_filter" in self.chunking_config:
        qf_config = self.chunking_config["quality_filter"]
        if isinstance(qf_config, ChunkQualityConfig):
            quality_filter = qf_config
        elif isinstance(qf_config, dict):
            quality_filter = ChunkQualityConfig(**qf_config)

    # Chunk the document with enhanced options
    chunks = chunk_markdown_tree(
        tree,
        max_chunk_size=self.chunking_config.get("max_chunk_size", 500),
        heading_inclusion=HeadingInclusion.IN_METADATA,
        combine_under_heading=self.chunking_config.get("combine_under_heading", True),
        quality_filter=quality_filter,
        generate_embeddings=self.chunking_config.get("generate_embeddings", True),
    )

    # Process and store chunks
    vectors = []
    ids = []
    metadatas = []

    # Generate a base ID from source
    source_stem = Path(source).stem if source else "doc"

    for i, chunk in enumerate(chunks):
        # Use embedding_text if available, otherwise use chunk text
        text_for_embedding = chunk.metadata.embedding_text or chunk.text

        # Generate embedding
        embedding = await self.embedding_provider.embed(text_for_embedding)

        # Convert to numpy if needed
        if not isinstance(embedding, np.ndarray):
            embedding = np.array(embedding, dtype=np.float32)

        # Prepare metadata with new fields
        chunk_id = f"{source_stem}_{i}"
        chunk_metadata = {
            "text": chunk.text,
            "source": source,
            "chunk_index": i,
            "heading_path": chunk.metadata.heading_display or chunk.metadata.get_heading_path(),
            "headings": chunk.metadata.headings,
            "heading_levels": chunk.metadata.heading_levels,
            "line_number": chunk.metadata.line_number,
            "chunk_size": chunk.metadata.chunk_size,
            "content_length": chunk.metadata.content_length,
        }

        # Merge with user metadata
        if metadata:
            chunk_metadata.update(metadata)

        vectors.append(embedding)
        ids.append(chunk_id)
        metadatas.append(chunk_metadata)

    # Batch insert into vector store
    if vectors:
        await self.vector_store.add_vectors(
            vectors=vectors, ids=ids, metadata=metadatas
        )

    return len(chunks)
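
The chunk IDs stored above follow a source-stem-plus-index scheme, which matters if you later need to locate or deduplicate chunks in the vector store. The scheme can be reproduced with just pathlib (`make_chunk_id` is a hypothetical helper for illustration):

```python
from pathlib import Path

def make_chunk_id(source: str, index: int) -> str:
    # Mirrors load_markdown_text: "<source stem>_<chunk index>",
    # falling back to "doc" when no source is given
    stem = Path(source).stem if source else "doc"
    return f"{stem}_{index}"

print(make_chunk_id("docs/api.md", 0))  # api_0
print(make_chunk_id("", 3))             # doc_3
```

Note that two files with the same stem in different directories would collide under this scheme, so distinct source paths with distinct stems are assumed.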
query async
query(
    query: str,
    k: int = 5,
    filter_metadata: dict[str, Any] | None = None,
    min_similarity: float = 0.0,
    merge_adjacent: bool = False,
    max_chunk_size: int | None = None,
) -> list[dict[str, Any]]

Query knowledge base for relevant chunks.

Parameters:

Name Type Description Default
query str

Query text to search for

required
k int

Number of results to return

5
filter_metadata dict[str, Any] | None

Optional metadata filters

None
min_similarity float

Minimum similarity score (0-1)

0.0
merge_adjacent bool

Whether to merge adjacent chunks with same heading

False
max_chunk_size int | None

Maximum size for merged chunks (uses merger config default if not specified)

None

Returns:

Type Description
list[dict[str, Any]]

List of result dictionaries with:

- text: Chunk text
- source: Source file
- heading_path: Heading hierarchy
- similarity: Similarity score
- metadata: Full chunk metadata

Example
results = await kb.query(
    "How do I configure the database?",
    k=3,
    merge_adjacent=True
)
for result in results:
    print(f"[{result['similarity']:.2f}] {result['heading_path']}")
    print(result['text'])
Source code in packages/bots/src/dataknobs_bots/knowledge/rag.py
async def query(
    self,
    query: str,
    k: int = 5,
    filter_metadata: dict[str, Any] | None = None,
    min_similarity: float = 0.0,
    merge_adjacent: bool = False,
    max_chunk_size: int | None = None,
) -> list[dict[str, Any]]:
    """Query knowledge base for relevant chunks.

    Args:
        query: Query text to search for
        k: Number of results to return
        filter_metadata: Optional metadata filters
        min_similarity: Minimum similarity score (0-1)
        merge_adjacent: Whether to merge adjacent chunks with same heading
        max_chunk_size: Maximum size for merged chunks (uses merger config default if not specified)

    Returns:
        List of result dictionaries with:
            - text: Chunk text
            - source: Source file
            - heading_path: Heading hierarchy
            - similarity: Similarity score
            - metadata: Full chunk metadata

    Example:
        ```python
        results = await kb.query(
            "How do I configure the database?",
            k=3,
            merge_adjacent=True
        )
        for result in results:
            print(f"[{result['similarity']:.2f}] {result['heading_path']}")
            print(result['text'])
        ```
    """
    import numpy as np

    # Generate query embedding
    query_embedding = await self.embedding_provider.embed(query)

    # Convert to numpy if needed
    if not isinstance(query_embedding, np.ndarray):
        query_embedding = np.array(query_embedding, dtype=np.float32)

    # Search vector store
    search_results = await self.vector_store.search(
        query_vector=query_embedding,
        k=k,
        filter=filter_metadata,
        include_metadata=True,
    )

    # Format results
    results = []
    for _chunk_id, similarity, chunk_metadata in search_results:
        if chunk_metadata and similarity >= min_similarity:
            results.append(
                {
                    "text": chunk_metadata.get("text", ""),
                    "source": chunk_metadata.get("source", ""),
                    "heading_path": chunk_metadata.get("heading_path", ""),
                    "similarity": similarity,
                    "metadata": chunk_metadata,
                }
            )

    # Apply chunk merging if requested
    if merge_adjacent and results:
        # Update merger config if max_chunk_size specified
        if max_chunk_size is not None:
            merger = ChunkMerger(MergerConfig(max_merged_size=max_chunk_size))
        else:
            merger = self.merger

        merged_chunks = merger.merge(results)
        results = merger.to_result_list(merged_chunks)

    return results
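The merge step above groups consecutive results that share a heading. The helper below is a minimal, self-contained sketch of that idea; the real `ChunkMerger`/`MergerConfig` pair in the library is more configurable, so treat this as illustrative only:

```python
# Illustrative sketch of merging adjacent chunks with the same heading_path.
# Hypothetical helper: the actual ChunkMerger in dataknobs_bots may behave
# differently (e.g. separators, metadata handling).

def merge_adjacent_chunks(results, max_merged_size=1000):
    merged = []
    for r in results:
        if (
            merged
            and merged[-1]["heading_path"] == r["heading_path"]
            and len(merged[-1]["text"]) + len(r["text"]) + 1 <= max_merged_size
        ):
            # Same heading and still under the size cap: append the text
            merged[-1]["text"] += "\n" + r["text"]
            # Keep the best similarity among the merged members
            merged[-1]["similarity"] = max(merged[-1]["similarity"], r["similarity"])
        else:
            merged.append(dict(r))
    return merged

results = [
    {"heading_path": "Setup > DB", "text": "Part one.", "similarity": 0.9},
    {"heading_path": "Setup > DB", "text": "Part two.", "similarity": 0.8},
    {"heading_path": "Usage", "text": "Other section.", "similarity": 0.7},
]
merged = merge_adjacent_chunks(results)
# The two "Setup > DB" chunks collapse into a single entry
```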
hybrid_query async
hybrid_query(
    query: str,
    k: int = 5,
    text_weight: float = 0.5,
    vector_weight: float = 0.5,
    fusion_strategy: str = "rrf",
    text_fields: list[str] | None = None,
    filter_metadata: dict[str, Any] | None = None,
    min_similarity: float = 0.0,
    merge_adjacent: bool = False,
    max_chunk_size: int | None = None,
) -> list[dict[str, Any]]

Query knowledge base using hybrid search (text + vector).

Combines keyword matching with semantic vector search for improved retrieval quality. Uses Reciprocal Rank Fusion (RRF) or weighted score fusion to combine results.

Parameters:

Name Type Description Default
query str

Query text to search for

required
k int

Number of results to return

5
text_weight float

Weight for text search (0.0 to 1.0)

0.5
vector_weight float

Weight for vector search (0.0 to 1.0)

0.5
fusion_strategy str

Fusion method - "rrf" (default), "weighted_sum", or "native"

'rrf'
text_fields list[str] | None

Fields to search for text matching (default: ["text"])

None
filter_metadata dict[str, Any] | None

Optional metadata filters

None
min_similarity float

Minimum combined score (0-1)

0.0
merge_adjacent bool

Whether to merge adjacent chunks with same heading

False
max_chunk_size int | None

Maximum size for merged chunks

None

Returns:

Type Description
list[dict[str, Any]]

List of result dictionaries with:

- text: Chunk text
- source: Source file
- heading_path: Heading hierarchy
- similarity: Combined similarity score
- text_score: Score from text search (if available)
- vector_score: Score from vector search (if available)
- metadata: Full chunk metadata

Example
# Default RRF fusion
results = await kb.hybrid_query(
    "How do I configure the database?",
    k=5,
)

# Weighted toward vector search
results = await kb.hybrid_query(
    "database configuration",
    k=5,
    text_weight=0.3,
    vector_weight=0.7,
)

# Weighted sum fusion
results = await kb.hybrid_query(
    "configure database",
    k=5,
    fusion_strategy="weighted_sum",
)

for result in results:
    print(f"[{result['similarity']:.2f}] {result['heading_path']}")
    print(f"  text_score={result.get('text_score', 'N/A')}")
    print(f"  vector_score={result.get('vector_score', 'N/A')}")
    print(result['text'])
Source code in packages/bots/src/dataknobs_bots/knowledge/rag.py
async def hybrid_query(
    self,
    query: str,
    k: int = 5,
    text_weight: float = 0.5,
    vector_weight: float = 0.5,
    fusion_strategy: str = "rrf",
    text_fields: list[str] | None = None,
    filter_metadata: dict[str, Any] | None = None,
    min_similarity: float = 0.0,
    merge_adjacent: bool = False,
    max_chunk_size: int | None = None,
) -> list[dict[str, Any]]:
    """Query knowledge base using hybrid search (text + vector).

    Combines keyword matching with semantic vector search for improved
    retrieval quality. Uses Reciprocal Rank Fusion (RRF) or weighted
    score fusion to combine results.

    Args:
        query: Query text to search for
        k: Number of results to return
        text_weight: Weight for text search (0.0 to 1.0)
        vector_weight: Weight for vector search (0.0 to 1.0)
        fusion_strategy: Fusion method - "rrf" (default), "weighted_sum", or "native"
        text_fields: Fields to search for text matching (default: ["text"])
        filter_metadata: Optional metadata filters
        min_similarity: Minimum combined score (0-1)
        merge_adjacent: Whether to merge adjacent chunks with same heading
        max_chunk_size: Maximum size for merged chunks

    Returns:
        List of result dictionaries with:
            - text: Chunk text
            - source: Source file
            - heading_path: Heading hierarchy
            - similarity: Combined similarity score
            - text_score: Score from text search (if available)
            - vector_score: Score from vector search (if available)
            - metadata: Full chunk metadata

    Example:
        ```python
        # Default RRF fusion
        results = await kb.hybrid_query(
            "How do I configure the database?",
            k=5,
        )

        # Weighted toward vector search
        results = await kb.hybrid_query(
            "database configuration",
            k=5,
            text_weight=0.3,
            vector_weight=0.7,
        )

        # Weighted sum fusion
        results = await kb.hybrid_query(
            "configure database",
            k=5,
            fusion_strategy="weighted_sum",
        )

        for result in results:
            print(f"[{result['similarity']:.2f}] {result['heading_path']}")
            print(f"  text_score={result.get('text_score', 'N/A')}")
            print(f"  vector_score={result.get('vector_score', 'N/A')}")
            print(result['text'])
        ```
    """
    from dataknobs_data.vector.hybrid import (
        FusionStrategy,
        HybridSearchConfig,
        reciprocal_rank_fusion,
        weighted_score_fusion,
    )
    import numpy as np

    # Generate query embedding
    query_embedding = await self.embedding_provider.embed(query)

    # Convert to numpy if needed
    if not isinstance(query_embedding, np.ndarray):
        query_embedding = np.array(query_embedding, dtype=np.float32)

    # Check if vector store supports hybrid search natively
    has_hybrid = hasattr(self.vector_store, "hybrid_search")

    # Default text fields for knowledge base chunks
    search_text_fields = text_fields or ["text"]

    # Map string to FusionStrategy enum
    strategy_map = {
        "rrf": FusionStrategy.RRF,
        "weighted_sum": FusionStrategy.WEIGHTED_SUM,
        "native": FusionStrategy.NATIVE,
    }
    strategy = strategy_map.get(fusion_strategy.lower(), FusionStrategy.RRF)

    if has_hybrid and strategy == FusionStrategy.NATIVE:
        # Use vector store's native hybrid search
        config = HybridSearchConfig(
            text_weight=text_weight,
            vector_weight=vector_weight,
            fusion_strategy=strategy,
            text_fields=search_text_fields,
        )
        hybrid_results = await self.vector_store.hybrid_search(
            query_text=query,
            query_vector=query_embedding,
            text_fields=search_text_fields,
            k=k,
            config=config,
            filter=filter_metadata,
        )

        # Convert HybridSearchResult to our result format
        results = []
        for hr in hybrid_results:
            if hr.combined_score >= min_similarity:
                # Extract metadata from record
                record_metadata = {}
                if hasattr(hr.record, "data"):
                    record_metadata = hr.record.data or {}
                elif hasattr(hr.record, "metadata"):
                    record_metadata = hr.record.metadata or {}

                results.append({
                    "text": record_metadata.get("text", ""),
                    "source": record_metadata.get("source", ""),
                    "heading_path": record_metadata.get("heading_path", ""),
                    "similarity": hr.combined_score,
                    "text_score": hr.text_score,
                    "vector_score": hr.vector_score,
                    "metadata": record_metadata,
                })
    else:
        # Client-side hybrid search implementation
        # Step 1: Vector search
        vector_results = await self.vector_store.search(
            query_vector=query_embedding,
            k=k * 2,  # Get more for fusion
            filter=filter_metadata,
            include_metadata=True,
        )

        # Step 2: Text search (simple keyword matching on stored chunks)
        # For vector stores without text search, we search in retrieved chunks
        # and also do a broader metadata-based text match if supported

        # Build vector result map
        vector_scores: list[tuple[str, float]] = []
        chunks_by_id: dict[str, dict[str, Any]] = {}

        for chunk_id, similarity, chunk_metadata in vector_results:
            if chunk_metadata:
                vector_scores.append((chunk_id, similarity))
                chunks_by_id[chunk_id] = chunk_metadata

        # Simple text matching on chunk content
        query_lower = query.lower()
        query_terms = query_lower.split()
        text_scores: list[tuple[str, float]] = []

        for chunk_id, chunk_metadata in chunks_by_id.items():
            text_content = ""
            for field in search_text_fields:
                value = chunk_metadata.get(field, "")
                if value:
                    text_content += " " + str(value)

            text_content_lower = text_content.lower()

            # Calculate text match score
            if query_lower in text_content_lower:
                # Exact phrase match
                score = 1.0
            else:
                # Term overlap score
                matched_terms = sum(1 for term in query_terms if term in text_content_lower)
                score = matched_terms / len(query_terms) if query_terms else 0.0

            if score > 0:
                text_scores.append((chunk_id, score))

        # Sort text scores descending
        text_scores.sort(key=lambda x: x[1], reverse=True)

        # Step 3: Fuse results
        if strategy == FusionStrategy.WEIGHTED_SUM:
            total = text_weight + vector_weight
            if total > 0:
                norm_text = text_weight / total
                norm_vector = vector_weight / total
            else:
                norm_text = norm_vector = 0.5

            fused = weighted_score_fusion(
                text_results=text_scores,
                vector_results=vector_scores,
                text_weight=norm_text,
                vector_weight=norm_vector,
                normalize_scores=True,
            )
        else:
            # Default to RRF
            fused = reciprocal_rank_fusion(
                text_results=text_scores,
                vector_results=vector_scores,
                k=60,
                text_weight=text_weight,
                vector_weight=vector_weight,
            )

        # Build result list
        text_score_map = dict(text_scores)
        vector_score_map = dict(vector_scores)

        results = []
        for chunk_id, combined_score in fused[:k]:
            if combined_score < min_similarity:
                continue

            chunk_metadata = chunks_by_id.get(chunk_id)
            if not chunk_metadata:
                continue

            results.append({
                "text": chunk_metadata.get("text", ""),
                "source": chunk_metadata.get("source", ""),
                "heading_path": chunk_metadata.get("heading_path", ""),
                "similarity": combined_score,
                "text_score": text_score_map.get(chunk_id),
                "vector_score": vector_score_map.get(chunk_id),
                "metadata": chunk_metadata,
            })

    # Apply chunk merging if requested
    if merge_adjacent and results:
        if max_chunk_size is not None:
            merger = ChunkMerger(MergerConfig(max_merged_size=max_chunk_size))
        else:
            merger = self.merger

        merged_chunks = merger.merge(results)
        results = merger.to_result_list(merged_chunks)

    return results
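The client-side fallback above scores text matches by term overlap and then fuses the two rankings. The sketch below shows both steps in isolation, using the standard RRF formula (score = Σ weight / (k + rank)); the library's own helpers in `dataknobs_data.vector.hybrid` may differ in details:

```python
# Standalone sketch of the two client-side steps: term-overlap text
# scoring and Reciprocal Rank Fusion (RRF). Illustrative only.

def text_match_score(query: str, text: str) -> float:
    q, t = query.lower(), text.lower()
    if q in t:  # exact phrase match
        return 1.0
    terms = q.split()
    hits = sum(1 for term in terms if term in t)
    return hits / len(terms) if terms else 0.0

def rrf(text_results, vector_results, k=60, text_weight=0.5, vector_weight=0.5):
    # score(doc) = sum over ranked lists of weight / (k + rank)
    scores = {}
    for weight, ranked in ((text_weight, text_results), (vector_weight, vector_results)):
        for rank, (doc_id, _score) in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + weight / (k + rank)
    return sorted(scores.items(), key=lambda x: x[1], reverse=True)

text_ranked = [("a", 1.0), ("b", 0.5)]
vector_ranked = [("b", 0.9), ("c", 0.8)]
fused = rrf(text_ranked, vector_ranked)
# "b" appears in both lists, so it accumulates a contribution from each
```

Note that RRF only uses ranks, not the raw scores, which is why it needs no score normalization; the weighted_sum strategy normalizes scores instead.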
format_context
format_context(results: list[dict[str, Any]], wrap_in_tags: bool = True) -> str

Format search results for LLM context.

Convenience method to format results using the configured formatter.

Parameters:

Name Type Description Default
results list[dict[str, Any]]

Search results from query()

required
wrap_in_tags bool

Whether to wrap in <knowledge_base> tags

True

Returns:

Type Description
str

Formatted context string

Source code in packages/bots/src/dataknobs_bots/knowledge/rag.py
def format_context(
    self,
    results: list[dict[str, Any]],
    wrap_in_tags: bool = True,
) -> str:
    """Format search results for LLM context.

    Convenience method to format results using the configured formatter.

    Args:
        results: Search results from query()
        wrap_in_tags: Whether to wrap in <knowledge_base> tags

    Returns:
        Formatted context string
    """
    context = self.formatter.format(results)
    if wrap_in_tags:
        context = self.formatter.wrap_for_prompt(context)
    return context
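A plain-Python illustration of what a context formatter like this produces. The actual formatter is configurable, so the exact layout below is an assumption; only the wrap_in_tags behavior is taken from the docstring:

```python
# Illustrative formatter: renders query() results as an LLM context block.
# The real formatter in dataknobs_bots is pluggable; this layout is a guess.

def format_context_sketch(results, wrap_in_tags=True):
    sections = []
    for r in results:
        header = f"## {r['heading_path']} (source: {r['source']})"
        sections.append(f"{header}\n{r['text']}")
    context = "\n\n".join(sections)
    if wrap_in_tags:
        context = f"<knowledge_base>\n{context}\n</knowledge_base>"
    return context

out = format_context_sketch(
    [{"heading_path": "Setup", "source": "docs/setup.md", "text": "Run init."}]
)
```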
count async
count(filter: dict[str, Any] | None = None) -> int

Get the number of chunks in the knowledge base.

Delegates to the underlying vector store's count method.

Parameters:

Name Type Description Default
filter dict[str, Any] | None

Optional metadata filter to count only matching chunks

None

Returns:

Type Description
int

Number of chunks stored (optionally filtered)

Example
total = await kb.count()
domain_count = await kb.count(filter={"domain_id": "my-domain"})
Source code in packages/bots/src/dataknobs_bots/knowledge/rag.py
async def count(self, filter: dict[str, Any] | None = None) -> int:
    """Get the number of chunks in the knowledge base.

    Delegates to the underlying vector store's count method.

    Args:
        filter: Optional metadata filter to count only matching chunks

    Returns:
        Number of chunks stored (optionally filtered)

    Example:
        ```python
        total = await kb.count()
        domain_count = await kb.count(filter={"domain_id": "my-domain"})
        ```
    """
    return await self.vector_store.count(filter)
clear async
clear() -> None

Clear all documents from the knowledge base.

Warning: This removes all stored chunks and embeddings.

Source code in packages/bots/src/dataknobs_bots/knowledge/rag.py
async def clear(self) -> None:
    """Clear all documents from the knowledge base.

    Warning: This removes all stored chunks and embeddings.
    """
    if hasattr(self.vector_store, "clear"):
        await self.vector_store.clear()
    else:
        raise NotImplementedError(
            "Vector store does not support clearing. "
            "Consider creating a new knowledge base with a fresh collection."
        )
save async
save() -> None

Save the knowledge base to persistent storage.

This persists the vector store index and metadata to disk. Only applicable for vector stores that support persistence (e.g., FAISS).

Example
await kb.load_markdown_document("docs/api.md")
await kb.save()  # Persist to disk
Source code in packages/bots/src/dataknobs_bots/knowledge/rag.py
async def save(self) -> None:
    """Save the knowledge base to persistent storage.

    This persists the vector store index and metadata to disk.
    Only applicable for vector stores that support persistence (e.g., FAISS).

    Example:
        ```python
        await kb.load_markdown_document("docs/api.md")
        await kb.save()  # Persist to disk
        ```
    """
    if hasattr(self.vector_store, "save"):
        await self.vector_store.save()
providers
providers() -> dict[str, Any]

Return the embedding provider, keyed by role.

Source code in packages/bots/src/dataknobs_bots/knowledge/rag.py
def providers(self) -> dict[str, Any]:
    """Return the embedding provider, keyed by role."""
    from dataknobs_bots.bot.base import PROVIDER_ROLE_KB_EMBEDDING

    if self.embedding_provider is not None:
        return {PROVIDER_ROLE_KB_EMBEDDING: self.embedding_provider}
    return {}
set_provider
set_provider(role: str, provider: Any) -> bool

Replace the embedding provider if the role matches.

Source code in packages/bots/src/dataknobs_bots/knowledge/rag.py
def set_provider(self, role: str, provider: Any) -> bool:
    """Replace the embedding provider if the role matches."""
    from dataknobs_bots.bot.base import PROVIDER_ROLE_KB_EMBEDDING

    if role == PROVIDER_ROLE_KB_EMBEDDING:
        self.embedding_provider = provider
        return True
    return False
close async
close() -> None

Close the knowledge base and release resources.

This method:

- Saves the vector store to disk (if persistence is configured)
- Closes the vector store connection
- Closes the embedding provider (releases HTTP sessions)

Should be called when done using the knowledge base to prevent resource leaks (e.g., unclosed aiohttp sessions).

Example
kb = await RAGKnowledgeBase.from_config(config)
try:
    await kb.load_markdown_document("docs/api.md")
    results = await kb.query("How do I configure?")
finally:
    await kb.close()
Source code in packages/bots/src/dataknobs_bots/knowledge/rag.py
async def close(self) -> None:
    """Close the knowledge base and release resources.

    This method:
    - Saves the vector store to disk (if persistence is configured)
    - Closes the vector store connection
    - Closes the embedding provider (releases HTTP sessions)

    Should be called when done using the knowledge base to prevent
    resource leaks (e.g., unclosed aiohttp sessions).

    Example:
        ```python
        kb = await RAGKnowledgeBase.from_config(config)
        try:
            await kb.load_markdown_document("docs/api.md")
            results = await kb.query("How do I configure?")
        finally:
            await kb.close()
        ```
    """
    # Close vector store (will save if persist_path is set)
    if hasattr(self.vector_store, "close"):
        await self.vector_store.close()

    # Close embedding provider (releases HTTP client sessions)
    if hasattr(self.embedding_provider, "close"):
        await self.embedding_provider.close()
__aenter__ async
__aenter__() -> Self

Async context manager entry.

Returns:

Type Description
Self

Self for use in async with statement

Example
async with await RAGKnowledgeBase.from_config(config) as kb:
    await kb.load_markdown_document("docs/api.md")
    results = await kb.query("How do I configure?")
# Automatically saved and closed
Source code in packages/bots/src/dataknobs_bots/knowledge/rag.py
async def __aenter__(self) -> Self:
    """Async context manager entry.

    Returns:
        Self for use in async with statement

    Example:
        ```python
        async with await RAGKnowledgeBase.from_config(config) as kb:
            await kb.load_markdown_document("docs/api.md")
            results = await kb.query("How do I configure?")
        # Automatically saved and closed
        ```
    """
    return self
__aexit__ async
__aexit__(
    exc_type: type[BaseException] | None,
    exc_val: BaseException | None,
    exc_tb: TracebackType | None,
) -> None

Async context manager exit - ensures cleanup.

Parameters:

Name Type Description Default
exc_type type[BaseException] | None

Exception type if an exception occurred

required
exc_val BaseException | None

Exception value if an exception occurred

required
exc_tb TracebackType | None

Exception traceback if an exception occurred

required
Source code in packages/bots/src/dataknobs_bots/knowledge/rag.py
async def __aexit__(
    self,
    exc_type: type[BaseException] | None,
    exc_val: BaseException | None,
    exc_tb: types.TracebackType | None,
) -> None:
    """Async context manager exit - ensures cleanup.

    Args:
        exc_type: Exception type if an exception occurred
        exc_val: Exception value if an exception occurred
        exc_tb: Exception traceback if an exception occurred
    """
    await self.close()

BufferMemory

BufferMemory(max_messages: int = 10)

Bases: Memory

Simple buffer memory keeping last N messages.

This implementation uses a fixed-size buffer that keeps the most recent messages in memory. When the buffer is full, the oldest messages are automatically removed.

Attributes:

Name Type Description
max_messages

Maximum number of messages to keep in buffer

messages deque[dict[str, Any]]

Deque containing the messages

Initialize buffer memory.

Parameters:

Name Type Description Default
max_messages int

Maximum number of messages to keep

10

Methods:

Name Description
add_message

Add message to buffer.

get_context

Get all messages in buffer.

clear

Clear all messages from buffer.

pop_messages

Remove and return the last N messages from the buffer.

Source code in packages/bots/src/dataknobs_bots/memory/buffer.py
def __init__(self, max_messages: int = 10):
    """Initialize buffer memory.

    Args:
        max_messages: Maximum number of messages to keep
    """
    self.max_messages = max_messages
    self.messages: deque[dict[str, Any]] = deque(maxlen=max_messages)
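The `deque(maxlen=...)` in `__init__` is what enforces the buffer limit: appending past capacity silently drops the oldest entry, which is exactly the eviction behavior described above.

```python
from collections import deque

# A deque with maxlen evicts from the left when a new item is appended
# past capacity, mirroring how BufferMemory keeps only the last N messages.
messages = deque(maxlen=3)
for i in range(5):
    messages.append({"role": "user", "content": f"msg {i}"})

contents = [m["content"] for m in messages]
# Only the three most recent messages remain
```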
Functions
add_message async
add_message(
    content: str, role: str, metadata: dict[str, Any] | None = None
) -> None

Add message to buffer.

Parameters:

Name Type Description Default
content str

Message content

required
role str

Message role

required
metadata dict[str, Any] | None

Optional metadata

None
Source code in packages/bots/src/dataknobs_bots/memory/buffer.py
async def add_message(
    self, content: str, role: str, metadata: dict[str, Any] | None = None
) -> None:
    """Add message to buffer.

    Args:
        content: Message content
        role: Message role
        metadata: Optional metadata
    """
    self.messages.append({"content": content, "role": role, "metadata": metadata or {}})
get_context async
get_context(current_message: str) -> list[dict[str, Any]]

Get all messages in buffer.

The current_message parameter is not used in buffer memory since we simply return all buffered messages in order.

Parameters:

Name Type Description Default
current_message str

Not used in buffer memory

required

Returns:

Type Description
list[dict[str, Any]]

List of all buffered messages

Source code in packages/bots/src/dataknobs_bots/memory/buffer.py
async def get_context(self, current_message: str) -> list[dict[str, Any]]:
    """Get all messages in buffer.

    The current_message parameter is not used in buffer memory since
    we simply return all buffered messages in order.

    Args:
        current_message: Not used in buffer memory

    Returns:
        List of all buffered messages
    """
    return list(self.messages)
clear async
clear() -> None

Clear all messages from buffer.

Source code in packages/bots/src/dataknobs_bots/memory/buffer.py
async def clear(self) -> None:
    """Clear all messages from buffer."""
    self.messages.clear()
pop_messages async
pop_messages(count: int = 2) -> list[dict[str, Any]]

Remove and return the last N messages from the buffer.

Parameters:

Name Type Description Default
count int

Number of messages to remove from the end.

2

Returns:

Type Description
list[dict[str, Any]]

The removed messages in the order they were stored.

Raises:

Type Description
ValueError

If count exceeds available messages or is < 1.

Source code in packages/bots/src/dataknobs_bots/memory/buffer.py
async def pop_messages(self, count: int = 2) -> list[dict[str, Any]]:
    """Remove and return the last N messages from the buffer.

    Args:
        count: Number of messages to remove from the end.

    Returns:
        The removed messages in the order they were stored.

    Raises:
        ValueError: If count exceeds available messages or is < 1.
    """
    if count < 1:
        raise ValueError(f"count must be >= 1, got {count}")
    if count > len(self.messages):
        raise ValueError(
            f"Cannot pop {count} messages, only {len(self.messages)} available"
        )
    removed = []
    for _ in range(count):
        removed.append(self.messages.pop())
    removed.reverse()
    return removed

CompositeMemory

CompositeMemory(strategies: list[Memory], *, primary_index: int = 0)

Bases: Memory

Combines multiple memory strategies into one.

All sub-strategies receive every add_message() call independently. On get_context(), the primary strategy's results appear first, followed by deduplicated results from secondary strategies.

Graceful degradation: if any strategy fails on a read or write, the composite logs a warning and continues with the remaining strategies.

Attributes:

Name Type Description
primary Memory

The primary memory strategy (results appear first).

strategies list[Memory]

All sub-strategies in order.

Initialize composite memory.

Parameters:

Name Type Description Default
strategies list[Memory]

List of memory strategy instances.

required
primary_index int

Index of the primary strategy in the list.

0

Raises:

Type Description
ValueError

If strategies is empty or primary_index is out of range.

Methods:

Name Description
add_message

Forward message to all strategies.

get_context

Collect context from all strategies, primary first.

clear

Clear all strategies. Log and continue on individual failures.

pop_messages

Delegate to primary strategy only.

close

Close all strategies. Log and continue on individual failures.

providers

Aggregate providers from all sub-strategies.

set_provider

Forward to all sub-strategies; return True if any accepted.

Source code in packages/bots/src/dataknobs_bots/memory/composite.py
def __init__(
    self,
    strategies: list[Memory],
    *,
    primary_index: int = 0,
) -> None:
    """Initialize composite memory.

    Args:
        strategies: List of memory strategy instances.
        primary_index: Index of the primary strategy in the list.

    Raises:
        ValueError: If strategies is empty or primary_index is out of range.
    """
    if not strategies:
        raise ValueError("CompositeMemory requires at least one strategy")
    if primary_index < 0 or primary_index >= len(strategies):
        raise ValueError(
            f"primary_index {primary_index} out of range for "
            f"{len(strategies)} strategies"
        )
    self._strategies = strategies
    self._primary_index = primary_index
Attributes
primary property
primary: Memory

The primary memory strategy.

strategies property
strategies: list[Memory]

All sub-strategies (defensive copy).

Functions
add_message async
add_message(
    content: str, role: str, metadata: dict[str, Any] | None = None
) -> None

Forward message to all strategies.

If a strategy raises, the error is logged and remaining strategies still receive the message.

Source code in packages/bots/src/dataknobs_bots/memory/composite.py
async def add_message(
    self, content: str, role: str, metadata: dict[str, Any] | None = None
) -> None:
    """Forward message to all strategies.

    If a strategy raises, the error is logged and remaining strategies
    still receive the message.
    """
    for i, strategy in enumerate(self._strategies):
        try:
            await strategy.add_message(content, role, metadata)
        except _STRATEGY_ERRORS:
            logger.warning(
                "Memory strategy %d (%s) failed on add_message",
                i,
                type(strategy).__name__,
                exc_info=True,
            )
get_context async
get_context(current_message: str) -> list[dict[str, Any]]

Collect context from all strategies, primary first.

Results from the primary strategy appear first. All results are deduplicated by (role, content) — if a message with the same role and content already appeared, it is not repeated.

Source code in packages/bots/src/dataknobs_bots/memory/composite.py
async def get_context(self, current_message: str) -> list[dict[str, Any]]:
    """Collect context from all strategies, primary first.

    Results from the primary strategy appear first. All results are
    deduplicated by ``(role, content)`` — if a message with the same
    role and content already appeared, it is not repeated.
    """
    results: list[dict[str, Any]] = []
    seen: set[tuple[str, str]] = set()

    # Primary first
    try:
        primary_results = await self.primary.get_context(current_message)
        for msg in primary_results:
            key = (msg.get("role", ""), msg.get("content", ""))
            if key not in seen:
                results.append(msg)
                seen.add(key)
    except _STRATEGY_ERRORS:
        logger.warning(
            "Primary memory strategy (%s) failed on get_context",
            type(self.primary).__name__,
            exc_info=True,
        )

    # Secondaries — skip primary, dedup by (role, content)
    for i, strategy in enumerate(self._strategies):
        if i == self._primary_index:
            continue
        try:
            secondary_results = await strategy.get_context(current_message)
            for msg in secondary_results:
                key = (msg.get("role", ""), msg.get("content", ""))
                if key not in seen:
                    results.append(msg)
                    seen.add(key)
        except _STRATEGY_ERRORS:
            logger.warning(
                "Memory strategy %d (%s) failed on get_context",
                i,
                type(strategy).__name__,
                exc_info=True,
            )

    return results
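The `(role, content)` deduplication used above can be exercised in isolation. This standalone version assumes the same ordering rule: primary results first, then any secondary messages not already seen.

```python
# Standalone version of the (role, content) dedup in
# CompositeMemory.get_context. Illustrative; error handling omitted.
def dedup_contexts(primary, *secondaries):
    results = []
    seen = set()
    for batch in (primary, *secondaries):
        for msg in batch:
            key = (msg.get("role", ""), msg.get("content", ""))
            if key not in seen:
                results.append(msg)
                seen.add(key)
    return results

primary = [{"role": "user", "content": "hi"}]
secondary = [
    {"role": "user", "content": "hi"},         # duplicate, dropped
    {"role": "assistant", "content": "hello"},  # new, kept
]
merged = dedup_contexts(primary, secondary)
```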
clear async
clear() -> None

Clear all strategies. Log and continue on individual failures.

Source code in packages/bots/src/dataknobs_bots/memory/composite.py
async def clear(self) -> None:
    """Clear all strategies. Log and continue on individual failures."""
    for i, strategy in enumerate(self._strategies):
        try:
            await strategy.clear()
        except _STRATEGY_ERRORS:
            logger.warning(
                "Memory strategy %d (%s) failed on clear",
                i,
                type(strategy).__name__,
                exc_info=True,
            )
pop_messages async
pop_messages(count: int = 2) -> list[dict[str, Any]]

Delegate to primary strategy only.

Secondary strategies (especially vector) may not support undo. If the primary doesn't support it, NotImplementedError propagates.

Source code in packages/bots/src/dataknobs_bots/memory/composite.py
async def pop_messages(self, count: int = 2) -> list[dict[str, Any]]:
    """Delegate to primary strategy only.

    Secondary strategies (especially vector) may not support undo.
    If the primary doesn't support it, NotImplementedError propagates.
    """
    return await self.primary.pop_messages(count)
close async
close() -> None

Close all strategies. Log and continue on individual failures.

Source code in packages/bots/src/dataknobs_bots/memory/composite.py
async def close(self) -> None:
    """Close all strategies. Log and continue on individual failures."""
    for i, strategy in enumerate(self._strategies):
        try:
            await strategy.close()
        except _STRATEGY_ERRORS:
            logger.warning(
                "Memory strategy %d (%s) failed on close",
                i,
                type(strategy).__name__,
                exc_info=True,
            )
providers
providers() -> dict[str, Any]

Aggregate providers from all sub-strategies.

If multiple strategies expose the same role, the last one wins and a warning is logged.

Source code in packages/bots/src/dataknobs_bots/memory/composite.py
def providers(self) -> dict[str, Any]:
    """Aggregate providers from all sub-strategies.

    If multiple strategies expose the same role, the last one wins
    and a warning is logged.
    """
    result: dict[str, Any] = {}
    for i, strategy in enumerate(self._strategies):
        for role, provider in strategy.providers().items():
            if role in result:
                logger.warning(
                    "Provider role %r already registered by an earlier "
                    "strategy; strategy %d (%s) overwrites it",
                    role,
                    i,
                    type(strategy).__name__,
                )
            result[role] = provider
    return result
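The last-one-wins aggregation amounts to repeated `dict.update` calls over the strategies in order. A standalone sketch (the role names here are illustrative strings, not the package's actual role constants):

```python
from typing import Any


def aggregate_providers(strategy_providers: list[dict[str, Any]]) -> dict[str, Any]:
    """Merge provider dicts in strategy order; later strategies overwrite earlier ones."""
    result: dict[str, Any] = {}
    for providers in strategy_providers:
        result.update(providers)  # duplicate roles: the later strategy wins
    return result


merged = aggregate_providers([
    {"memory_embedding": "provider_a"},
    {"memory_embedding": "provider_b", "summary_llm": "provider_c"},
])
# "memory_embedding" resolves to provider_b, from the later strategy
```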
set_provider
set_provider(role: str, provider: Any) -> bool

Forward to all sub-strategies; return True if any accepted.

Source code in packages/bots/src/dataknobs_bots/memory/composite.py
def set_provider(self, role: str, provider: Any) -> bool:
    """Forward to all sub-strategies; return True if any accepted."""
    accepted = False
    for strategy in self._strategies:
        if strategy.set_provider(role, provider):
            accepted = True
    return accepted

Memory

Bases: ABC

Abstract base class for memory implementations.

Methods:

Name Description
add_message

Add message to memory.

get_context

Get relevant context for current message.

clear

Clear all memory.

providers

Return LLM providers managed by this memory, keyed by role.

set_provider

Replace a provider managed by this memory.

close

Release resources held by this memory implementation.

pop_messages

Remove and return the last N messages from memory.

Functions
add_message abstractmethod async
add_message(
    content: str, role: str, metadata: dict[str, Any] | None = None
) -> None

Add message to memory.

Parameters:

Name Type Description Default
content str

Message content

required
role str

Message role (user, assistant, system, etc.)

required
metadata dict[str, Any] | None

Optional metadata for the message

None
Source code in packages/bots/src/dataknobs_bots/memory/base.py
@abstractmethod
async def add_message(
    self, content: str, role: str, metadata: dict[str, Any] | None = None
) -> None:
    """Add message to memory.

    Args:
        content: Message content
        role: Message role (user, assistant, system, etc.)
        metadata: Optional metadata for the message
    """
    pass
get_context abstractmethod async
get_context(current_message: str) -> list[dict[str, Any]]

Get relevant context for current message.

Parameters:

Name Type Description Default
current_message str

The current message to get context for

required

Returns:

Type Description
list[dict[str, Any]]

List of relevant message dictionaries

Source code in packages/bots/src/dataknobs_bots/memory/base.py
@abstractmethod
async def get_context(self, current_message: str) -> list[dict[str, Any]]:
    """Get relevant context for current message.

    Args:
        current_message: The current message to get context for

    Returns:
        List of relevant message dictionaries
    """
    pass
clear abstractmethod async
clear() -> None

Clear all memory.

Source code in packages/bots/src/dataknobs_bots/memory/base.py
@abstractmethod
async def clear(self) -> None:
    """Clear all memory."""
    pass
providers
providers() -> dict[str, Any]

Return LLM providers managed by this memory, keyed by role.

Subsystems declare the providers they own so that the bot can register them in the provider catalog without reaching into private attributes. The default returns an empty dict (no providers).

Returns:

Type Description
dict[str, Any]

Dict mapping provider role names to provider instances.

Source code in packages/bots/src/dataknobs_bots/memory/base.py
def providers(self) -> dict[str, Any]:
    """Return LLM providers managed by this memory, keyed by role.

    Subsystems declare the providers they own so that the bot can
    register them in the provider catalog without reaching into
    private attributes.  The default returns an empty dict (no
    providers).

    Returns:
        Dict mapping provider role names to provider instances.
    """
    return {}
set_provider
set_provider(role: str, provider: Any) -> bool

Replace a provider managed by this memory.

Called by inject_providers to wire a test provider into the actual subsystem, not just the registry catalog. The default returns False (role not recognized). Concrete subclasses override to accept their known roles.

Parameters:

Name Type Description Default
role str

Provider role name (e.g. PROVIDER_ROLE_MEMORY_EMBEDDING).

required
provider Any

Replacement provider instance.

required

Returns:

Type Description
bool

True if the role was recognized and the provider updated, False otherwise.

Source code in packages/bots/src/dataknobs_bots/memory/base.py
def set_provider(self, role: str, provider: Any) -> bool:
    """Replace a provider managed by this memory.

    Called by ``inject_providers`` to wire a test provider into the
    actual subsystem, not just the registry catalog.  The default
    returns ``False`` (role not recognized).  Concrete subclasses
    override to accept their known roles.

    Args:
        role: Provider role name (e.g. ``PROVIDER_ROLE_MEMORY_EMBEDDING``).
        provider: Replacement provider instance.

    Returns:
        ``True`` if the role was recognized and the provider updated,
        ``False`` otherwise.
    """
    return False
close async
close() -> None

Release resources held by this memory implementation.

The default is a no-op. Subclasses that create providers or open connections (e.g. VectorMemory, SummaryMemory) should override to clean up.

Source code in packages/bots/src/dataknobs_bots/memory/base.py
async def close(self) -> None:  # noqa: B027 — intentional no-op default
    """Release resources held by this memory implementation.

    The default is a no-op.  Subclasses that create providers or open
    connections (e.g. ``VectorMemory``, ``SummaryMemory``) should
    override to clean up.
    """
pop_messages async
pop_messages(count: int = 2) -> list[dict[str, Any]]

Remove and return the last N messages from memory.

Used for conversation undo. The count is determined by the caller based on node depth difference (not a fixed 2).

Parameters:

Name Type Description Default
count int

Number of messages to remove from the end.

2

Returns:

Type Description
list[dict[str, Any]]

The removed messages in the order they were stored.

Raises:

Type Description
NotImplementedError

If the implementation does not support undo.

Source code in packages/bots/src/dataknobs_bots/memory/base.py
async def pop_messages(self, count: int = 2) -> list[dict[str, Any]]:
    """Remove and return the last N messages from memory.

    Used for conversation undo. The count is determined by the caller
    based on node depth difference (not a fixed 2).

    Args:
        count: Number of messages to remove from the end.

    Returns:
        The removed messages in the order they were stored.

    Raises:
        NotImplementedError: If the implementation does not support undo.
    """
    raise NotImplementedError(
        f"{type(self).__name__} does not support pop_messages"
    )

SummaryMemory

SummaryMemory(
    llm_provider: AsyncLLMProvider,
    recent_window: int = 10,
    summary_prompt: str | None = None,
    *,
    owns_llm_provider: bool = False,
)

Bases: Memory

Memory that summarizes older messages to maintain long context windows.

Maintains a rolling buffer of recent messages. When the buffer exceeds a configurable threshold, the oldest messages are compressed into a running summary using the LLM provider. This trades exact message recall for a much longer effective context window.

get_context() returns the summary (if any) as a system message, followed by the recent verbatim messages.

Attributes:

Name Type Description
llm_provider

LLM provider used for generating summaries

recent_window

Number of recent messages to keep verbatim

summary_prompt

Template for the summarization prompt

Initialize summary memory.

Parameters:

Name Type Description Default
llm_provider AsyncLLMProvider

Async LLM provider for generating summaries

required
recent_window int

Number of recent messages to keep verbatim. When the buffer has more than recent_window messages, the oldest are summarized.

10
summary_prompt str | None

Custom summarization prompt template. Must contain {existing_summary} and {new_messages} placeholders.

None
owns_llm_provider bool

Whether this instance owns the provider's lifecycle. True when a dedicated provider was created for this memory; False when reusing the bot's main LLM.

False

Methods:

Name Description
add_message

Add a message and trigger summarization if the buffer is full.

get_context

Return the running summary followed by recent messages.

providers

Return the summary LLM provider for catalog registration.

set_provider

Replace the summary LLM provider if the role matches.

close

Close the LLM provider if this instance owns it.

clear

Clear both the running summary and the message buffer.

pop_messages

Remove and return the last N messages from the recent window.

Source code in packages/bots/src/dataknobs_bots/memory/summary.py
def __init__(
    self,
    llm_provider: AsyncLLMProvider,
    recent_window: int = 10,
    summary_prompt: str | None = None,
    *,
    owns_llm_provider: bool = False,
) -> None:
    """Initialize summary memory.

    Args:
        llm_provider: Async LLM provider for generating summaries
        recent_window: Number of recent messages to keep verbatim.
                      When the buffer has more than ``recent_window``
                      messages, the oldest are summarized.
        summary_prompt: Custom summarization prompt template. Must
                       contain ``{existing_summary}`` and
                       ``{new_messages}`` placeholders.
        owns_llm_provider: Whether this instance owns the provider's
            lifecycle. True when a dedicated provider was created for
            this memory; False when reusing the bot's main LLM.
    """
    self.llm_provider = llm_provider
    self.recent_window = recent_window
    self.summary_prompt = summary_prompt or DEFAULT_SUMMARY_PROMPT
    self._owns_llm_provider = owns_llm_provider
    self._messages: deque[dict[str, Any]] = deque()
    self._summary: str = ""
Functions
add_message async
add_message(
    content: str, role: str, metadata: dict[str, Any] | None = None
) -> None

Add a message and trigger summarization if the buffer is full.

When the number of buffered messages exceeds recent_window, the oldest messages are summarized into the running summary using the LLM provider. On LLM failure, older messages are dropped to keep the buffer within bounds (graceful degradation).

Parameters:

Name Type Description Default
content str

Message content

required
role str

Message role (user, assistant, system)

required
metadata dict[str, Any] | None

Optional metadata for the message

None
Source code in packages/bots/src/dataknobs_bots/memory/summary.py
async def add_message(
    self, content: str, role: str, metadata: dict[str, Any] | None = None
) -> None:
    """Add a message and trigger summarization if the buffer is full.

    When the number of buffered messages exceeds ``recent_window``,
    the oldest messages are summarized into the running summary using
    the LLM provider. On LLM failure, older messages are dropped to
    keep the buffer within bounds (graceful degradation).

    Args:
        content: Message content
        role: Message role (user, assistant, system)
        metadata: Optional metadata for the message
    """
    self._messages.append(
        {"content": content, "role": role, "metadata": metadata or {}}
    )

    if len(self._messages) > self.recent_window:
        await self._summarize_oldest()
get_context async
get_context(current_message: str) -> list[dict[str, Any]]

Return the running summary followed by recent messages.

Parameters:

Name Type Description Default
current_message str

The current message (not used by summary memory, kept for interface compatibility)

required

Returns:

Type Description
list[dict[str, Any]]

List of message dicts. If a summary exists it is the first element with role="system"; the remaining elements are the recent verbatim messages.

Source code in packages/bots/src/dataknobs_bots/memory/summary.py
async def get_context(self, current_message: str) -> list[dict[str, Any]]:
    """Return the running summary followed by recent messages.

    Args:
        current_message: The current message (not used by summary memory,
                        kept for interface compatibility)

    Returns:
        List of message dicts. If a summary exists it is the first
        element with ``role="system"``; the remaining elements are
        the recent verbatim messages.
    """
    context: list[dict[str, Any]] = []

    if self._summary:
        context.append(
            {
                "content": f"[Conversation summary]: {self._summary}",
                "role": "system",
                "metadata": {"is_summary": True},
            }
        )

    context.extend(self._messages)
    return context
providers
providers() -> dict[str, Any]

Return the summary LLM provider for catalog registration.

Always reports the provider for discovery and observability. The _owns_llm_provider flag controls lifecycle (close()), not visibility — consistent with VectorMemory, RAGKnowledgeBase, and WizardReasoning.

Source code in packages/bots/src/dataknobs_bots/memory/summary.py
def providers(self) -> dict[str, Any]:
    """Return the summary LLM provider for catalog registration.

    Always reports the provider for discovery and observability.
    The ``_owns_llm_provider`` flag controls lifecycle (``close()``),
    not visibility — consistent with VectorMemory, RAGKnowledgeBase,
    and WizardReasoning.
    """
    from dataknobs_bots.bot.base import PROVIDER_ROLE_SUMMARY_LLM

    if self.llm_provider is not None:
        return {PROVIDER_ROLE_SUMMARY_LLM: self.llm_provider}
    return {}
set_provider
set_provider(role: str, provider: Any) -> bool

Replace the summary LLM provider if the role matches.

Source code in packages/bots/src/dataknobs_bots/memory/summary.py
def set_provider(self, role: str, provider: Any) -> bool:
    """Replace the summary LLM provider if the role matches."""
    from dataknobs_bots.bot.base import PROVIDER_ROLE_SUMMARY_LLM

    if role == PROVIDER_ROLE_SUMMARY_LLM:
        self.llm_provider = provider
        return True
    return False
close async
close() -> None

Close the LLM provider if this instance owns it.

When a dedicated provider was created for this memory (via the llm config key), this instance owns its lifecycle. When the bot's main LLM was passed in as a fallback, the bot owns it.

Source code in packages/bots/src/dataknobs_bots/memory/summary.py
async def close(self) -> None:
    """Close the LLM provider if this instance owns it.

    When a dedicated provider was created for this memory (via the
    ``llm`` config key), this instance owns its lifecycle. When the
    bot's main LLM was passed in as a fallback, the bot owns it.
    """
    if self._owns_llm_provider and self.llm_provider and hasattr(self.llm_provider, "close"):
        try:
            await self.llm_provider.close()
        except Exception:
            logger.exception("Error closing summary LLM provider")
clear async
clear() -> None

Clear both the running summary and the message buffer.

Source code in packages/bots/src/dataknobs_bots/memory/summary.py
async def clear(self) -> None:
    """Clear both the running summary and the message buffer."""
    self._messages.clear()
    self._summary = ""
pop_messages async
pop_messages(count: int = 2) -> list[dict[str, Any]]

Remove and return the last N messages from the recent window.

Only messages still in the recent buffer can be popped. Messages that have already been summarized are irreversibly compressed and cannot be individually removed.

Parameters:

Name Type Description Default
count int

Number of messages to remove from the end.

2

Returns:

Type Description
list[dict[str, Any]]

The removed messages in the order they were stored.

Raises:

Type Description
ValueError

If count exceeds available (unsummarized) messages or is < 1.

Source code in packages/bots/src/dataknobs_bots/memory/summary.py
async def pop_messages(self, count: int = 2) -> list[dict[str, Any]]:
    """Remove and return the last N messages from the recent window.

    Only messages still in the recent buffer can be popped. Messages that
    have already been summarized are irreversibly compressed and cannot be
    individually removed.

    Args:
        count: Number of messages to remove from the end.

    Returns:
        The removed messages in the order they were stored.

    Raises:
        ValueError: If count exceeds available (unsummarized) messages
            or is < 1.
    """
    if count < 1:
        raise ValueError(f"count must be >= 1, got {count}")
    if count > len(self._messages):
        raise ValueError(
            f"Cannot pop {count} messages, only {len(self._messages)} "
            f"unsummarized messages available in the recent window"
        )
    removed = []
    for _ in range(count):
        removed.append(self._messages.pop())
    removed.reverse()
    return removed
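The rolling-buffer behavior described above can be sketched with a stubbed summarizer in place of the LLM provider. This is a toy reproduction of the buffering logic, not the real `SummaryMemory` (which batches summarization differently and calls an actual LLM):

```python
import asyncio
from collections import deque
from typing import Any


class FakeSummarizer:
    """Stand-in for the LLM: reports how many messages have been compressed."""

    async def summarize(self, existing: str, messages: list[dict[str, Any]]) -> str:
        prior = int(existing.split()[0]) if existing else 0
        return f"{prior + len(messages)} messages summarized"


class RollingSummaryBuffer:
    """Toy reproduction of the summarize-oldest-when-full pattern."""

    def __init__(self, summarizer: FakeSummarizer, recent_window: int = 3) -> None:
        self.summarizer = summarizer
        self.recent_window = recent_window
        self._messages: deque[dict[str, Any]] = deque()
        self._summary = ""

    async def add_message(self, content: str, role: str) -> None:
        self._messages.append({"content": content, "role": role})
        if len(self._messages) > self.recent_window:
            oldest = [self._messages.popleft()]
            self._summary = await self.summarizer.summarize(self._summary, oldest)

    async def get_context(self) -> list[dict[str, Any]]:
        context: list[dict[str, Any]] = []
        if self._summary:
            context.append(
                {"content": f"[Conversation summary]: {self._summary}", "role": "system"}
            )
        context.extend(self._messages)
        return context


async def demo() -> list[dict[str, Any]]:
    mem = RollingSummaryBuffer(FakeSummarizer(), recent_window=3)
    for i in range(5):
        await mem.add_message(f"message {i}", "user")
    return await mem.get_context()


context = asyncio.run(demo())
# first element is the system summary; the last 3 messages stay verbatim
```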

VectorMemory

VectorMemory(
    vector_store: Any,
    embedding_provider: Any,
    max_results: int = 5,
    similarity_threshold: float = 0.7,
    default_metadata: dict[str, Any] | None = None,
    default_filter: dict[str, Any] | None = None,
    owns_embedding_provider: bool = False,
    owns_vector_store: bool = False,
)

Bases: Memory

Vector-based semantic memory using dataknobs-data vector stores.

This implementation stores messages with vector embeddings and retrieves relevant messages based on semantic similarity.

Attributes:

Name Type Description
vector_store

Vector store backend from dataknobs_data.vector.stores

embedding_provider

LLM provider for generating embeddings

max_results

Maximum number of results to return

similarity_threshold

Minimum similarity score for results

Initialize vector memory.

Parameters:

Name Type Description Default
vector_store Any

Vector store backend instance

required
embedding_provider Any

LLM provider with embed() method

required
max_results int

Maximum number of similar messages to return

5
similarity_threshold float

Minimum similarity score (0-1)

0.7
default_metadata dict[str, Any] | None

Metadata merged into every add_message() call. Caller-supplied metadata overrides these defaults. Use for tenant scoping, e.g. {"user_id": "u123"}.

None
default_filter dict[str, Any] | None

Filter merged into every get_context() search call. Use to scope reads to a tenant, e.g. {"user_id": "u123"}.

None
owns_embedding_provider bool

If True, close() will close the embedding provider. Set by from_config for resources it creates. Default False for externally-injected providers.

False
owns_vector_store bool

If True, close() will close the vector store. Set by from_config for resources it creates. Default False for externally-injected stores.

False

Methods:

Name Description
from_config

Create VectorMemory from configuration.

add_message

Add message with vector embedding.

get_context

Get semantically relevant messages.

providers

Return the embedding provider, keyed by role.

set_provider

Replace the embedding provider if the role matches.

close

Close owned resources.

clear

Clear all vectors from memory.

Source code in packages/bots/src/dataknobs_bots/memory/vector.py
def __init__(
    self,
    vector_store: Any,
    embedding_provider: Any,
    max_results: int = 5,
    similarity_threshold: float = 0.7,
    default_metadata: dict[str, Any] | None = None,
    default_filter: dict[str, Any] | None = None,
    owns_embedding_provider: bool = False,
    owns_vector_store: bool = False,
):
    """Initialize vector memory.

    Args:
        vector_store: Vector store backend instance
        embedding_provider: LLM provider with embed() method
        max_results: Maximum number of similar messages to return
        similarity_threshold: Minimum similarity score (0-1)
        default_metadata: Metadata merged into every ``add_message()``
            call. Caller-supplied metadata overrides these defaults.
            Use for tenant scoping, e.g. ``{"user_id": "u123"}``.
        default_filter: Filter merged into every ``get_context()``
            search call. Use to scope reads to a tenant, e.g.
            ``{"user_id": "u123"}``.
        owns_embedding_provider: If True, ``close()`` will close the
            embedding provider. Set by ``from_config`` for resources
            it creates. Default False for externally-injected providers.
        owns_vector_store: If True, ``close()`` will close the vector
            store. Set by ``from_config`` for resources it creates.
            Default False for externally-injected stores.
    """
    self.vector_store = vector_store
    self.embedding_provider = embedding_provider
    self.max_results = max_results
    self.similarity_threshold = similarity_threshold
    self._default_metadata = default_metadata or {}
    self._default_filter = default_filter or {}
    self._owns_embedding_provider = owns_embedding_provider
    self._owns_vector_store = owns_vector_store
Functions
from_config async classmethod
from_config(config: dict[str, Any]) -> VectorMemory

Create VectorMemory from configuration.

Parameters:

Name Type Description Default
config dict[str, Any]

Configuration dictionary with:

- backend: Vector store backend type
- dimension: Vector store dimension (singular; default 1536)
- collection: Collection/index name (optional)
- embedding: Nested embedding config dict (preferred), e.g. {"provider": "ollama", "model": "nomic-embed-text", "dimensions": 768}
- embedding_provider / embedding_model: Legacy flat keys. Note: dimensions (plural) at the top level is forwarded to the embedding provider, not the vector store. Use dimension (singular) for the vector store size.
- max_results: Max results to return (default 5)
- similarity_threshold: Min similarity score (default 0.7)

required

Returns:

Type Description
VectorMemory

Configured VectorMemory instance

Source code in packages/bots/src/dataknobs_bots/memory/vector.py
@classmethod
async def from_config(cls, config: dict[str, Any]) -> "VectorMemory":
    """Create VectorMemory from configuration.

    Args:
        config: Configuration dictionary with:
            - backend: Vector store backend type
            - dimension: Vector store dimension (singular; default 1536)
            - collection: Collection/index name (optional)
            - embedding: Nested embedding config dict (preferred), e.g.
              ``{"provider": "ollama", "model": "nomic-embed-text",
              "dimensions": 768}``
            - embedding_provider / embedding_model: Legacy flat keys.
              Note: ``dimensions`` (plural) at the top level is forwarded
              to the embedding provider, not the vector store.  Use
              ``dimension`` (singular) for the vector store size.
            - max_results: Max results to return (default 5)
            - similarity_threshold: Min similarity score (default 0.7)

    Returns:
        Configured VectorMemory instance
    """
    from dataknobs_data.vector.stores import VectorStoreFactory

    from ..providers import create_embedding_provider

    # Create vector store
    store_config = {
        "backend": config.get("backend", "memory"),
        "dimensions": config.get("dimension", 1536),
    }

    # Add optional store parameters
    if "collection" in config:
        store_config["collection_name"] = config["collection"]
    if "persist_path" in config:
        store_config["persist_path"] = config["persist_path"]

    # Merge any additional store_params
    if "store_params" in config:
        store_config.update(config["store_params"])

    factory = VectorStoreFactory()
    vector_store = factory.create(**store_config)
    await vector_store.initialize()

    # Create embedding provider
    embedding_provider = await create_embedding_provider(config)

    return cls(
        vector_store=vector_store,
        embedding_provider=embedding_provider,
        max_results=config.get("max_results", 5),
        similarity_threshold=config.get("similarity_threshold", 0.7),
        default_metadata=config.get("default_metadata"),
        default_filter=config.get("default_filter"),
        owns_embedding_provider=True,
        owns_vector_store=True,
    )
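Putting the keys documented above together, a configuration passed to `from_config` might look like the following sketch. The key names follow the docstring; the backend, model, and tenant values are placeholders:

```python
# Illustrative configuration for VectorMemory.from_config.
vector_memory_config = {
    "backend": "memory",           # vector store backend type
    "dimension": 768,              # store dimension (singular!)
    "collection": "chat_memory",   # optional collection/index name
    "embedding": {                 # nested embedding config (preferred form)
        "provider": "ollama",
        "model": "nomic-embed-text",
        "dimensions": 768,         # plural: forwarded to the embedding provider
    },
    "max_results": 5,
    "similarity_threshold": 0.7,
    "default_metadata": {"user_id": "u123"},  # tenant scoping on writes
    "default_filter": {"user_id": "u123"},    # tenant scoping on reads
}
```

Note the singular/plural distinction: `dimension` sizes the vector store, while `dimensions` inside `embedding` configures the embedding provider.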
add_message async
add_message(
    content: str, role: str, metadata: dict[str, Any] | None = None
) -> None

Add message with vector embedding.

Parameters:

Name Type Description Default
content str

Message content

required
role str

Message role

required
metadata dict[str, Any] | None

Optional caller-supplied metadata. Merged after default_metadata (from init) and system base fields (content, role, timestamp, id). Caller metadata has highest precedence.

None
Source code in packages/bots/src/dataknobs_bots/memory/vector.py
async def add_message(
    self, content: str, role: str, metadata: dict[str, Any] | None = None
) -> None:
    """Add message with vector embedding.

    Args:
        content: Message content
        role: Message role
        metadata: Optional caller-supplied metadata. Merged after
            ``default_metadata`` (from init) and system base fields
            (``content``, ``role``, ``timestamp``, ``id``).
            Caller metadata has highest precedence.
    """
    # Generate embedding
    embedding = await self.embedding_provider.embed(content)

    # Convert to numpy array if needed
    if not isinstance(embedding, np.ndarray):
        embedding = np.array(embedding, dtype=np.float32)

    # Merge order: defaults < base fields (system-controlled) < caller metadata
    msg_metadata = dict(self._default_metadata)
    msg_metadata.update({
        "content": content,
        "role": role,
        "timestamp": datetime.now().isoformat(),
        "id": str(uuid4()),
    })
    if metadata:
        msg_metadata.update(metadata)

    # Store in vector store
    await self.vector_store.add_vectors(
        vectors=[embedding], ids=[msg_metadata["id"]], metadata=[msg_metadata]
    )
get_context async
get_context(current_message: str) -> list[dict[str, Any]]

Get semantically relevant messages.

Parameters:

Name Type Description Default
current_message str

Current message to find context for

required

Returns:

Type Description
list[dict[str, Any]]

List of relevant message dictionaries sorted by similarity

Source code in packages/bots/src/dataknobs_bots/memory/vector.py
async def get_context(self, current_message: str) -> list[dict[str, Any]]:
    """Get semantically relevant messages.

    Args:
        current_message: Current message to find context for

    Returns:
        List of relevant message dictionaries sorted by similarity
    """
    # Generate query embedding
    query_embedding = await self.embedding_provider.embed(current_message)

    # Convert to numpy array if needed
    if not isinstance(query_embedding, np.ndarray):
        query_embedding = np.array(query_embedding, dtype=np.float32)

    # Search for similar vectors
    search_kwargs: dict[str, Any] = {
        "query_vector": query_embedding,
        "k": self.max_results,
        "include_metadata": True,
    }
    if self._default_filter:
        search_kwargs["filter"] = dict(self._default_filter)

    results = await self.vector_store.search(**search_kwargs)

    # Format results
    context = []
    for _vector_id, similarity, msg_metadata in results:
        if msg_metadata and similarity >= self.similarity_threshold:
            context.append(
                {
                    "content": msg_metadata.get("content", ""),
                    "role": msg_metadata.get("role", ""),
                    "similarity": similarity,
                    "metadata": msg_metadata,
                }
            )

    return context
providers
providers() -> dict[str, Any]

Return the embedding provider, keyed by role.

Source code in packages/bots/src/dataknobs_bots/memory/vector.py
def providers(self) -> dict[str, Any]:
    """Return the embedding provider, keyed by role."""
    from dataknobs_bots.bot.base import PROVIDER_ROLE_MEMORY_EMBEDDING

    if self.embedding_provider is not None:
        return {PROVIDER_ROLE_MEMORY_EMBEDDING: self.embedding_provider}
    return {}
set_provider
set_provider(role: str, provider: Any) -> bool

Replace the embedding provider if the role matches.

Source code in packages/bots/src/dataknobs_bots/memory/vector.py
def set_provider(self, role: str, provider: Any) -> bool:
    """Replace the embedding provider if the role matches."""
    from dataknobs_bots.bot.base import PROVIDER_ROLE_MEMORY_EMBEDDING

    if role == PROVIDER_ROLE_MEMORY_EMBEDDING:
        self.embedding_provider = provider
        return True
    return False
close async
close() -> None

Close owned resources.

Only closes resources that this instance owns (created in from_config). Externally-injected resources are left open for the caller to manage.

Source code in packages/bots/src/dataknobs_bots/memory/vector.py
async def close(self) -> None:
    """Close owned resources.

    Only closes resources that this instance owns (created in
    ``from_config``). Externally-injected resources are left open
    for the caller to manage.
    """
    if (
        self._owns_embedding_provider
        and self.embedding_provider
        and hasattr(self.embedding_provider, "close")
    ):
        try:
            await self.embedding_provider.close()
        except Exception:
            logger.exception("Error closing embedding provider")

    if (
        self._owns_vector_store
        and self.vector_store
        and hasattr(self.vector_store, "close")
    ):
        try:
            await self.vector_store.close()
        except Exception:
            logger.exception("Error closing vector store")
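The ownership rule above (only resources created in from_config are closed; injected ones are left for the caller) can be sketched standalone. OwnedResource and MemorySketch are illustrative stand-ins, not dataknobs_bots classes:

```python
import asyncio

class OwnedResource:
    """Stand-in resource that records whether close() was called."""
    def __init__(self) -> None:
        self.closed = False

    async def close(self) -> None:
        self.closed = True

class MemorySketch:
    """Mirrors the _owns_* guard used by close() above."""
    def __init__(self, provider: OwnedResource, owns_provider: bool) -> None:
        self.embedding_provider = provider
        self._owns_embedding_provider = owns_provider

    async def close(self) -> None:
        # Close only resources this instance owns.
        if self._owns_embedding_provider and hasattr(self.embedding_provider, "close"):
            await self.embedding_provider.close()

injected = OwnedResource()  # passed in by the caller
owned = OwnedResource()     # as if created in from_config

async def main() -> None:
    await MemorySketch(injected, owns_provider=False).close()
    await MemorySketch(owned, owns_provider=True).close()

asyncio.run(main())
assert injected.closed is False  # caller still manages it
assert owned.closed is True      # owned, so closed
```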
clear async
clear() -> None

Clear all vectors from memory.

Delegates to the vector store's clear() method. Use with caution if the store is shared across multiple memory instances.

Raises:

Type Description
NotImplementedError

If the backing vector store does not support clear().

Source code in packages/bots/src/dataknobs_bots/memory/vector.py
async def clear(self) -> None:
    """Clear all vectors from memory.

    Delegates to the vector store's ``clear()`` method. Use with caution
    if the store is shared across multiple memory instances.

    Raises:
        NotImplementedError: If the backing vector store does not
            support ``clear()``.
    """
    if hasattr(self.vector_store, "clear"):
        await self.vector_store.clear()
    else:
        raise NotImplementedError(
            "Vector store does not support clearing. "
            "Consider creating a new VectorMemory instance with a fresh collection."
        )
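Because clear() raises NotImplementedError for stores without a clear() method, callers that run against arbitrary backends may want a defensive wrapper. This is a sketch; StoreWithoutClear and safe_clear are illustrative names, not part of dataknobs_bots:

```python
import asyncio

class StoreWithoutClear:
    """Stand-in for a vector store backend lacking clear()."""

async def safe_clear(vector_store) -> bool:
    # Mirror of the hasattr guard in clear() above.
    if hasattr(vector_store, "clear"):
        await vector_store.clear()
        return True
    raise NotImplementedError("Vector store does not support clearing.")

async def main() -> bool:
    try:
        return await safe_clear(StoreWithoutClear())
    except NotImplementedError:
        # Fall back, e.g. recreate the memory with a fresh collection.
        return False

cleared = asyncio.run(main())
assert cleared is False
```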

CostTrackingMiddleware

CostTrackingMiddleware(
    track_tokens: bool = True, cost_rates: dict[str, Any] | None = None
)

Bases: Middleware

Middleware for tracking LLM API costs and usage.

Monitors token usage across different providers (Ollama, OpenAI, Anthropic, etc.) to help optimize costs and track budgets.

Attributes:

Name Type Description
track_tokens

Whether to track token usage

cost_rates

Token cost rates per provider/model

usage_stats

Accumulated usage statistics by client_id

Example
# Create middleware with default rates
middleware = CostTrackingMiddleware()

# Or with custom rates
middleware = CostTrackingMiddleware(
    cost_rates={
        "openai": {
            "gpt-4o": {"input": 0.0025, "output": 0.01},
        },
    }
)

# Get stats
stats = middleware.get_client_stats("my-client")
total = middleware.get_total_cost()

# Export to JSON
json_data = middleware.export_stats_json()

Initialize cost tracking middleware.

Parameters:

Name Type Description Default
track_tokens bool

Enable token tracking

True
cost_rates dict[str, Any] | None

Optional custom cost rates (merged with defaults)

None

Methods:

Name Description
on_turn_start

Log estimated input tokens at the start of a turn.

after_turn

Track costs after turn completion using TurnState data.

on_error

Log errors but don't track costs for failed requests.

on_hook_error

Track middleware hook failures.

get_client_stats

Get usage statistics for a client.

get_all_stats

Get all usage statistics.

get_total_cost

Get total cost across all clients.

get_total_tokens

Get total tokens across all clients.

clear_stats

Clear usage statistics.

export_stats_json

Export all statistics as JSON.

export_stats_csv

Export statistics as CSV (one row per client).

Source code in packages/bots/src/dataknobs_bots/middleware/cost.py
def __init__(
    self,
    track_tokens: bool = True,
    cost_rates: dict[str, Any] | None = None,
):
    """Initialize cost tracking middleware.

    Args:
        track_tokens: Enable token tracking
        cost_rates: Optional custom cost rates (merged with defaults)
    """
    self.track_tokens = track_tokens
    # Merge custom rates with defaults
    self.cost_rates = self.DEFAULT_RATES.copy()
    if cost_rates:
        for provider, rates in cost_rates.items():
            if provider in self.cost_rates:
                if isinstance(rates, dict) and isinstance(
                    self.cost_rates[provider], dict
                ):
                    self.cost_rates[provider].update(rates)
                else:
                    self.cost_rates[provider] = rates
            else:
                self.cost_rates[provider] = rates

    self._usage_stats: dict[str, dict[str, Any]] = {}
    self._logger = logging.getLogger(f"{__name__}.CostTracker")
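The rate merge in __init__ is one level deep: a custom entry for a known provider is merged into that provider's default table, but an individual model entry is replaced wholesale. A non-mutating sketch of that merge (the default numbers below are illustrative, not the package's actual DEFAULT_RATES):

```python
# Illustrative defaults; the real DEFAULT_RATES lives in middleware/cost.py.
DEFAULT_RATES = {
    "openai": {
        "gpt-4o": {"input": 0.0025, "output": 0.01},
        "gpt-4o-mini": {"input": 0.00015, "output": 0.0006},
    },
}

def merge_rates(defaults: dict, custom: dict) -> dict:
    """Non-mutating variant of the provider-level merge in __init__."""
    merged = defaults.copy()
    for provider, rates in custom.items():
        if provider in merged and isinstance(rates, dict) and isinstance(merged[provider], dict):
            # Merge per provider: unspecified models keep their defaults.
            merged[provider] = {**merged[provider], **rates}
        else:
            merged[provider] = rates
    return merged

rates = merge_rates(
    DEFAULT_RATES,
    {"openai": {"gpt-4o": {"input": 0.002, "output": 0.008}}},
)
assert rates["openai"]["gpt-4o"] == {"input": 0.002, "output": 0.008}  # replaced wholesale
assert "gpt-4o-mini" in rates["openai"]                                # default preserved
```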
Functions
on_turn_start async
on_turn_start(turn: TurnState) -> str | None

Log estimated input tokens at the start of a turn.

Parameters:

Name Type Description Default
turn TurnState

Turn state at the start of the pipeline.

required

Returns:

Type Description
str | None

None (no message transform).

Source code in packages/bots/src/dataknobs_bots/middleware/cost.py
async def on_turn_start(self, turn: TurnState) -> str | None:
    """Log estimated input tokens at the start of a turn.

    Args:
        turn: Turn state at the start of the pipeline.

    Returns:
        None (no message transform).
    """
    # Estimate input tokens (rough approximation: ~4 chars per token)
    estimated_tokens = len(turn.message) // 4
    self._logger.debug("Estimated input tokens: %d", estimated_tokens)
    return None
after_turn async
after_turn(turn: TurnState) -> None

Track costs after turn completion using TurnState data.

Uses real token usage when the provider reports it, otherwise estimates from message/response text length (~4 chars per token).

Parameters:

Name Type Description Default
turn TurnState

Completed turn state with usage and response data.

required
Source code in packages/bots/src/dataknobs_bots/middleware/cost.py
async def after_turn(self, turn: TurnState) -> None:
    """Track costs after turn completion using TurnState data.

    Uses real token usage when the provider reports it, otherwise
    estimates from message/response text length (~4 chars per token).

    Args:
        turn: Completed turn state with usage and response data.
    """
    if not self.track_tokens:
        return

    client_id = turn.context.client_id
    provider = turn.provider_name or "unknown"
    model = turn.model or "unknown"

    if turn.usage:
        input_tokens = int(
            turn.usage.get(
                "input",
                turn.usage.get("prompt_tokens", 0),
            )
        )
        output_tokens = int(
            turn.usage.get(
                "output",
                turn.usage.get("completion_tokens", 0),
            )
        )
        estimated = False
    else:
        # Estimate from text length (~4 chars per token).
        # Note: turn.message is the user's message before KB/memory
        # augmentation.  The actual LLM input includes system prompt,
        # KB chunks, and memory context, so this underestimates real
        # input tokens.  When real usage data is available (above
        # branch), this fallback is not reached.
        input_tokens = len(turn.message) // 4
        output_tokens = len(turn.response_content) // 4
        estimated = True

    hook_counter = (
        "stream_turns" if turn.is_streaming else "chat_turns"
    )
    cost = self._record_usage(
        client_id, hook_counter,
        provider, model, input_tokens, output_tokens,
    )

    total = self._usage_stats[client_id]["total_cost_usd"]
    mode_label = turn.mode.value
    est_marker = " (estimated)" if estimated else ""
    self._logger.info(
        "Turn complete (%s) - Client %s: %s/%s - "
        "%d in + %d out tokens%s, cost: $%.6f, total: $%.6f",
        mode_label, client_id, provider, model,
        input_tokens, output_tokens, est_marker, cost, total,
    )
on_error async
on_error(error: Exception, message: str, context: BotContext) -> None

Log errors but don't track costs for failed requests.

Parameters:

Name Type Description Default
error Exception

The exception that occurred

required
message str

User message that caused the error

required
context BotContext

Bot context

required
Source code in packages/bots/src/dataknobs_bots/middleware/cost.py
async def on_error(
    self, error: Exception, message: str, context: BotContext
) -> None:
    """Log errors but don't track costs for failed requests.

    Args:
        error: The exception that occurred
        message: User message that caused the error
        context: Bot context
    """
    client_id = context.client_id
    if client_id not in self._usage_stats:
        self._usage_stats[client_id] = self._new_client_stats(client_id)
    self._usage_stats[client_id]["on_error_calls"] += 1

    self._logger.warning(
        "Error during request for client %s: %s", client_id, error,
    )
on_hook_error async
on_hook_error(hook_name: str, error: Exception, context: BotContext) -> None

Track middleware hook failures.

Parameters:

Name Type Description Default
hook_name str

Name of the hook that failed

required
error Exception

The exception raised by the middleware hook

required
context BotContext

Bot context

required
Source code in packages/bots/src/dataknobs_bots/middleware/cost.py
async def on_hook_error(
    self, hook_name: str, error: Exception, context: BotContext
) -> None:
    """Track middleware hook failures.

    Args:
        hook_name: Name of the hook that failed
        error: The exception raised by the middleware hook
        context: Bot context
    """
    client_id = context.client_id
    if client_id not in self._usage_stats:
        self._usage_stats[client_id] = self._new_client_stats(client_id)
    self._usage_stats[client_id]["on_hook_error_calls"] += 1

    self._logger.warning(
        "Middleware hook %s failed for client %s: %s",
        hook_name, client_id, error,
    )
get_client_stats
get_client_stats(client_id: str) -> dict[str, Any] | None

Get usage statistics for a client.

Parameters:

Name Type Description Default
client_id str

Client identifier

required

Returns:

Type Description
dict[str, Any] | None

Usage statistics or None if not found

Source code in packages/bots/src/dataknobs_bots/middleware/cost.py
def get_client_stats(self, client_id: str) -> dict[str, Any] | None:
    """Get usage statistics for a client.

    Args:
        client_id: Client identifier

    Returns:
        Usage statistics or None if not found
    """
    return self._usage_stats.get(client_id)
get_all_stats
get_all_stats() -> dict[str, dict[str, Any]]

Get all usage statistics.

Returns:

Type Description
dict[str, dict[str, Any]]

Dictionary mapping client_id to statistics

Source code in packages/bots/src/dataknobs_bots/middleware/cost.py
def get_all_stats(self) -> dict[str, dict[str, Any]]:
    """Get all usage statistics.

    Returns:
        Dictionary mapping client_id to statistics
    """
    return self._usage_stats.copy()
get_total_cost
get_total_cost() -> float

Get total cost across all clients.

Returns:

Type Description
float

Total cost in USD

Source code in packages/bots/src/dataknobs_bots/middleware/cost.py
def get_total_cost(self) -> float:
    """Get total cost across all clients.

    Returns:
        Total cost in USD
    """
    return float(
        sum(stats["total_cost_usd"] for stats in self._usage_stats.values())
    )
get_total_tokens
get_total_tokens() -> dict[str, int]

Get total tokens across all clients.

Returns:

Type Description
dict[str, int]

Dictionary with 'input', 'output', and 'total' token counts

Source code in packages/bots/src/dataknobs_bots/middleware/cost.py
def get_total_tokens(self) -> dict[str, int]:
    """Get total tokens across all clients.

    Returns:
        Dictionary with 'input', 'output', and 'total' token counts
    """
    input_tokens = sum(
        stats["total_input_tokens"] for stats in self._usage_stats.values()
    )
    output_tokens = sum(
        stats["total_output_tokens"] for stats in self._usage_stats.values()
    )
    return {
        "input": input_tokens,
        "output": output_tokens,
        "total": input_tokens + output_tokens,
    }
clear_stats
clear_stats(client_id: str | None = None) -> None

Clear usage statistics.

Parameters:

Name Type Description Default
client_id str | None

If provided, clear only this client. Otherwise clear all.

None
Source code in packages/bots/src/dataknobs_bots/middleware/cost.py
def clear_stats(self, client_id: str | None = None) -> None:
    """Clear usage statistics.

    Args:
        client_id: If provided, clear only this client. Otherwise clear all.
    """
    if client_id:
        if client_id in self._usage_stats:
            del self._usage_stats[client_id]
    else:
        self._usage_stats.clear()
export_stats_json
export_stats_json(indent: int = 2) -> str

Export all statistics as JSON.

Parameters:

Name Type Description Default
indent int

JSON indentation level

2

Returns:

Type Description
str

JSON string of all statistics

Source code in packages/bots/src/dataknobs_bots/middleware/cost.py
def export_stats_json(self, indent: int = 2) -> str:
    """Export all statistics as JSON.

    Args:
        indent: JSON indentation level

    Returns:
        JSON string of all statistics
    """
    return json.dumps(self._usage_stats, indent=indent)
export_stats_csv
export_stats_csv() -> str

Export statistics as CSV (one row per client).

Returns:

Type Description
str

CSV string with headers

Source code in packages/bots/src/dataknobs_bots/middleware/cost.py
def export_stats_csv(self) -> str:
    """Export statistics as CSV (one row per client).

    Returns:
        CSV string with headers
    """
    lines = [
        "client_id,total_requests,total_input_tokens,total_output_tokens,total_cost_usd"
    ]
    for client_id, stats in self._usage_stats.items():
        lines.append(
            f"{client_id},{stats['total_requests']},"
            f"{stats['total_input_tokens']},{stats['total_output_tokens']},"
            f"{stats['total_cost_usd']:.6f}"
        )
    return "\n".join(lines)
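The CSV produced by export_stats_csv() reads back cleanly with the stdlib csv module. A sketch, using a sample row that mirrors the header shown above (the numbers are illustrative):

```python
import csv
import io

# Sample output in the shape export_stats_csv() emits.
sample = (
    "client_id,total_requests,total_input_tokens,total_output_tokens,total_cost_usd\n"
    "my-client,3,1200,450,0.004500"
)

# DictReader keys each row by the header line.
rows = list(csv.DictReader(io.StringIO(sample)))
assert rows[0]["client_id"] == "my-client"
assert int(rows[0]["total_input_tokens"]) == 1200
assert float(rows[0]["total_cost_usd"]) == 0.0045
```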

LoggingMiddleware

LoggingMiddleware(
    log_level: str = "INFO",
    include_metadata: bool = True,
    json_format: bool = False,
)

Bases: Middleware

Middleware for tracking conversation interactions.

Logs all user messages and bot responses with context for monitoring, debugging, and analytics.

Uses the unified TurnState hooks:

  • on_turn_start — logs incoming user message
  • after_turn — logs turn completion with response, usage, tools

Attributes:

Name Type Description
log_level

Logging level to use (default: INFO)

include_metadata

Whether to include full context metadata

json_format

Whether to output logs in JSON format

Example
# Basic usage
middleware = LoggingMiddleware()

# With JSON format for log aggregation
middleware = LoggingMiddleware(
    log_level="INFO",
    include_metadata=True,
    json_format=True
)

Initialize logging middleware.

Parameters:

Name Type Description Default
log_level str

Logging level (DEBUG, INFO, WARNING, ERROR)

'INFO'
include_metadata bool

Whether to log full context metadata

True
json_format bool

Whether to output in JSON format

False

Methods:

Name Description
on_turn_start

Log incoming user message at the start of a turn.

after_turn

Log turn completion with unified data for all turn types.

on_error

Called when an error occurs during message processing.

on_hook_error

Called when a middleware hook itself raises.

Source code in packages/bots/src/dataknobs_bots/middleware/logging.py
def __init__(
    self,
    log_level: str = "INFO",
    include_metadata: bool = True,
    json_format: bool = False,
):
    """Initialize logging middleware.

    Args:
        log_level: Logging level (DEBUG, INFO, WARNING, ERROR)
        include_metadata: Whether to log full context metadata
        json_format: Whether to output in JSON format
    """
    self.log_level = log_level
    self.include_metadata = include_metadata
    self.json_format = json_format
    self._logger = logging.getLogger(f"{__name__}.ConversationLogger")
    self._logger.setLevel(getattr(logging, log_level.upper()))
Functions
on_turn_start async
on_turn_start(turn: TurnState) -> str | None

Log incoming user message at the start of a turn.

Parameters:

Name Type Description Default
turn TurnState

Turn state at the start of the pipeline.

required

Returns:

Type Description
str | None

None (no message transform).

Source code in packages/bots/src/dataknobs_bots/middleware/logging.py
async def on_turn_start(self, turn: TurnState) -> str | None:
    """Log incoming user message at the start of a turn.

    Args:
        turn: Turn state at the start of the pipeline.

    Returns:
        None (no message transform).
    """
    log_data: dict[str, Any] = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "event": "user_message",
        "mode": turn.mode.value,
        "client_id": turn.context.client_id,
        "user_id": turn.context.user_id,
        "conversation_id": turn.context.conversation_id,
        "message_length": len(turn.message),
    }

    if self.include_metadata:
        log_data["session_metadata"] = turn.context.session_metadata
        log_data["request_metadata"] = turn.context.request_metadata

    if self.json_format:
        self._logger.info(json.dumps(log_data))
    else:
        self._logger.info("User message: %s", log_data)

    # Log content at DEBUG level (first 200 chars)
    self._logger.debug("Message content: %.200s...", turn.message)
    return None
after_turn async
after_turn(turn: TurnState) -> None

Log turn completion with unified data for all turn types.

Parameters:

Name Type Description Default
turn TurnState

Completed turn state.

required
Source code in packages/bots/src/dataknobs_bots/middleware/logging.py
async def after_turn(self, turn: TurnState) -> None:
    """Log turn completion with unified data for all turn types.

    Args:
        turn: Completed turn state.
    """
    log_data: dict[str, Any] = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "event": "turn_complete",
        "mode": turn.mode.value,
        "client_id": turn.context.client_id,
        "user_id": turn.context.user_id,
        "conversation_id": turn.context.conversation_id,
        "response_length": len(turn.response_content),
    }

    if turn.usage:
        log_data["tokens_used"] = turn.usage
    if turn.provider_name:
        log_data["provider"] = turn.provider_name
    if turn.model:
        log_data["model"] = turn.model
    if turn.tool_executions:
        log_data["tool_executions"] = len(turn.tool_executions)

    if self.include_metadata:
        log_data["session_metadata"] = turn.context.session_metadata
        log_data["request_metadata"] = turn.context.request_metadata

    if self.json_format:
        self._logger.info(json.dumps(log_data))
    else:
        self._logger.info("Turn complete: %s", log_data)

    # Log content at DEBUG level (first 200 chars)
    self._logger.debug("Response content: %.200s...", turn.response_content)
on_error async
on_error(error: Exception, message: str, context: BotContext) -> None

Called when an error occurs during message processing.

Parameters:

Name Type Description Default
error Exception

The exception that occurred

required
message str

User message that caused the error

required
context BotContext

Bot context

required
Source code in packages/bots/src/dataknobs_bots/middleware/logging.py
async def on_error(
    self, error: Exception, message: str, context: BotContext
) -> None:
    """Called when an error occurs during message processing.

    Args:
        error: The exception that occurred
        message: User message that caused the error
        context: Bot context
    """
    log_data = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "event": "error",
        "client_id": context.client_id,
        "user_id": context.user_id,
        "conversation_id": context.conversation_id,
        "error_type": type(error).__name__,
        "error_message": str(error),
    }

    if self.json_format:
        self._logger.error(json.dumps(log_data), exc_info=error)
    else:
        self._logger.error(
            "Error processing message: %s", log_data, exc_info=error
        )
on_hook_error async
on_hook_error(hook_name: str, error: Exception, context: BotContext) -> None

Called when a middleware hook itself raises.

Parameters:

Name Type Description Default
hook_name str

Name of the hook that failed

required
error Exception

The exception raised by the middleware hook

required
context BotContext

Bot context

required
Source code in packages/bots/src/dataknobs_bots/middleware/logging.py
async def on_hook_error(
    self, hook_name: str, error: Exception, context: BotContext
) -> None:
    """Called when a middleware hook itself raises.

    Args:
        hook_name: Name of the hook that failed
        error: The exception raised by the middleware hook
        context: Bot context
    """
    log_data = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "event": "hook_error",
        "hook_name": hook_name,
        "client_id": context.client_id,
        "user_id": context.user_id,
        "conversation_id": context.conversation_id,
        "error_type": type(error).__name__,
        "error_message": str(error),
    }

    if self.json_format:
        self._logger.warning(json.dumps(log_data), exc_info=error)
    else:
        self._logger.warning(
            "Middleware hook %s failed: %s", hook_name, log_data, exc_info=error
        )

Middleware

Base class for bot middleware.

Middleware provides hooks into the bot request/response lifecycle. All hooks are concrete no-ops — subclasses override only the hooks they need.

Preferred hooks (receive full TurnState):

  • on_turn_start(turn) — before processing; can write plugin_data and optionally transform the message.
  • after_turn(turn) — after any turn completes (chat, stream, greet); unified successor to after_message and post_stream.
  • finally_turn(turn) — fires after every turn on both success and error paths. Use for resource cleanup. For stream_chat, requires full consumption or aclosing().
  • on_tool_executed(execution, context) — after each tool call.

Legacy hooks (kept for backward compatibility):

  • before_message(message, context) — use on_turn_start instead.
  • after_message(response, context, **kwargs) — use after_turn instead.
  • post_stream(message, response, context) — use after_turn instead.

Error hooks (no TurnState equivalent — still primary):

  • on_error(error, message, context) — request failed.
  • on_hook_error(hook_name, error, context) — a hook failed.
Error semantics

on_error fires when the bot request fails — the caller does NOT receive a response. on_hook_error fires when a middleware's own hook raises after the request already succeeded — the caller DID receive a response, but a middleware could not complete its post-processing.

Example
class MyMiddleware(Middleware):
    async def on_turn_start(self, turn):
        turn.plugin_data["started"] = True
        return None  # or return transformed message

    async def after_turn(self, turn):
        log.info("Turn %s done", turn.mode.value)

    async def on_error(self, error, message, context):
        log.error("Request failed: %s", error)

Methods:

Name Description
before_message

Called before processing user message.

after_message

Called after generating bot response (non-streaming).

post_stream

Called after streaming response completes.

on_error

Called when a request-level error occurs during message processing.

on_hook_error

Called when a middleware hook itself raises an exception.

on_turn_start

Called at the start of every turn, before message processing.

after_turn

Called after any turn completes (chat, stream, or greet).

finally_turn

Called after every turn, on both success and error paths.

on_tool_executed

Called after each tool execution within a turn.

Functions
before_message async
before_message(message: str, context: BotContext) -> None

Called before processing user message.

Deprecated: Use on_turn_start instead, which provides the full TurnState including plugin_data for cross-middleware communication and supports message transforms.

Parameters:

Name Type Description Default
message str

User's input message

required
context BotContext

Bot context with conversation and user info

required
Source code in packages/bots/src/dataknobs_bots/middleware/base.py
async def before_message(
    self, message: str, context: BotContext
) -> None:
    """Called before processing user message.

    .. deprecated::
        Use ``on_turn_start`` instead, which provides the full
        ``TurnState`` including ``plugin_data`` for cross-middleware
        communication and supports message transforms.

    Args:
        message: User's input message
        context: Bot context with conversation and user info
    """
after_message async
after_message(response: str, context: BotContext, **kwargs: Any) -> None

Called after generating bot response (non-streaming).

Deprecated: Use after_turn instead, which fires for all turn types (chat, stream, greet) and provides the full TurnState with usage data, tool executions, and plugin data.

Parameters:

Name Type Description Default
response str

Bot's generated response

required
context BotContext

Bot context

required
**kwargs Any

Additional data (e.g., tokens_used, response_time_ms, provider, model)

{}
Source code in packages/bots/src/dataknobs_bots/middleware/base.py
async def after_message(
    self, response: str, context: BotContext, **kwargs: Any
) -> None:
    """Called after generating bot response (non-streaming).

    .. deprecated::
        Use ``after_turn`` instead, which fires for all turn types
        (chat, stream, greet) and provides the full ``TurnState``
        with usage data, tool executions, and plugin data.

    Args:
        response: Bot's generated response
        context: Bot context
        **kwargs: Additional data (e.g., tokens_used, response_time_ms, provider, model)
    """
post_stream async
post_stream(message: str, response: str, context: BotContext) -> None

Called after streaming response completes.

Deprecated: Use after_turn instead, which fires for all turn types and provides real token usage data from the provider.

Parameters:

Name Type Description Default
message str

Original user message that triggered the stream

required
response str

Complete accumulated response from streaming

required
context BotContext

Bot context

required
Source code in packages/bots/src/dataknobs_bots/middleware/base.py
async def post_stream(
    self, message: str, response: str, context: BotContext
) -> None:
    """Called after streaming response completes.

    .. deprecated::
        Use ``after_turn`` instead, which fires for all turn types
        and provides real token usage data from the provider.

    Args:
        message: Original user message that triggered the stream
        response: Complete accumulated response from streaming
        context: Bot context
    """
on_error async
on_error(error: Exception, message: str, context: BotContext) -> None

Called when a request-level error occurs during message processing.

This hook fires when the bot request fails (preparation, generation, or memory/middleware post-processing in before_message). The caller does NOT receive a response.

Parameters:

Name Type Description Default
error Exception

The exception that occurred

required
message str

User message that caused the error

required
context BotContext

Bot context

required
Source code in packages/bots/src/dataknobs_bots/middleware/base.py
async def on_error(
    self, error: Exception, message: str, context: BotContext
) -> None:
    """Called when a request-level error occurs during message processing.

    This hook fires when the bot request fails (preparation, generation,
    or memory/middleware post-processing in ``before_message``). The
    caller does NOT receive a response.

    Args:
        error: The exception that occurred
        message: User message that caused the error
        context: Bot context
    """
on_hook_error async
on_hook_error(hook_name: str, error: Exception, context: BotContext) -> None

Called when a middleware hook itself raises an exception.

This fires when a post-generation middleware hook raises: after_turn, finally_turn, on_tool_executed, after_message, post_stream, or on_error. On the success path, the response was already delivered; on the error path (e.g. finally_turn after a failed turn), it was not. In either case, the middleware could not complete its own post-processing (e.g., a logging sink was unreachable, a metrics backend timed out).

Note: on_turn_start exceptions are NOT routed here — they are re-raised to abort the request (matching before_message semantics), so on_error fires instead.

Parameters:

Name Type Description Default
hook_name str

Name of the hook that failed (e.g. "after_turn", "finally_turn", "on_tool_executed", "on_error")

required
error Exception

The exception raised by the middleware hook

required
context BotContext

Bot context

required
Source code in packages/bots/src/dataknobs_bots/middleware/base.py
async def on_hook_error(
    self, hook_name: str, error: Exception, context: BotContext
) -> None:
    """Called when a middleware hook itself raises an exception.

    This fires when a post-generation middleware hook raises:
    ``after_turn``, ``finally_turn``, ``on_tool_executed``,
    ``after_message``, ``post_stream``, or ``on_error``.  On the
    success path, the response was already delivered; on the error
    path (e.g. ``finally_turn`` after a failed turn), it was not.
    In either case, the middleware could not complete its own
    post-processing (e.g., a logging sink was unreachable, a
    metrics backend timed out).

    Note: ``on_turn_start`` exceptions are NOT routed here —
    they are re-raised to abort the request (matching
    ``before_message`` semantics), so ``on_error`` fires instead.

    Args:
        hook_name: Name of the hook that failed (e.g.
            ``"after_turn"``, ``"finally_turn"``,
            ``"on_tool_executed"``, ``"on_error"``)
        error: The exception raised by the middleware hook
        context: Bot context
    """
on_turn_start async
on_turn_start(turn: TurnState) -> str | None

Called at the start of every turn, before message processing.

Receives the full TurnState including plugin_data for cross-middleware communication. Middleware can:

  • Write to turn.plugin_data to share data with downstream pipeline participants (LLM middleware, tools, after_turn).
  • Return a transformed message string to replace turn.message before it reaches the LLM (e.g., PII stripping, attack sanitization). Transforms chain: each middleware receives the message as modified by the previous one.
  • Return None to leave the message unchanged.

Parameters:

Name Type Description Default
turn TurnState

Turn state at the start of the pipeline.

required

Returns:

Type Description
str | None

Transformed message string, or None to keep the original.

Source code in packages/bots/src/dataknobs_bots/middleware/base.py
async def on_turn_start(
    self, turn: TurnState
) -> str | None:
    """Called at the start of every turn, before message processing.

    Receives the full ``TurnState`` including ``plugin_data`` for
    cross-middleware communication. Middleware can:

    - Write to ``turn.plugin_data`` to share data with downstream
      pipeline participants (LLM middleware, tools, ``after_turn``).
    - Return a transformed message string to replace ``turn.message``
      before it reaches the LLM (e.g., PII stripping, attack
      sanitization). Transforms chain: each middleware receives the
      message as modified by the previous one.
    - Return ``None`` to leave the message unchanged.

    Args:
        turn: Turn state at the start of the pipeline.

    Returns:
        Transformed message string, or ``None`` to keep the original.
    """
    return None
after_turn async
after_turn(turn: TurnState) -> None

Called after any turn completes (chat, stream, or greet).

Provides the full TurnState with usage data, tool executions, and response content regardless of how the turn was initiated. This is the unified successor to after_message and post_stream — implement this for uniform post-turn handling.

The legacy hooks (after_message / post_stream) continue to fire as well, so existing middleware is unaffected.

Parameters:

Name Type Description Default
turn TurnState

Complete turn state with all pipeline data.

required
Source code in packages/bots/src/dataknobs_bots/middleware/base.py
async def after_turn(self, turn: TurnState) -> None:
    """Called after any turn completes (chat, stream, or greet).

    Provides the full ``TurnState`` with usage data, tool executions,
    and response content regardless of how the turn was initiated.
    This is the unified successor to ``after_message`` and
    ``post_stream`` — implement this for uniform post-turn handling.

    The legacy hooks (``after_message`` / ``post_stream``) continue
    to fire as well, so existing middleware is unaffected.

    Args:
        turn: Complete turn state with all pipeline data.
    """
finally_turn async
finally_turn(turn: TurnState) -> None

Called after every turn, on both success and error paths.

Use this for resource cleanup (closing DB sessions, releasing locks, flushing buffers) that must happen regardless of outcome. after_turn is conditional — it does not fire on error paths or when greet() returns None. Do not assume after_turn has already run when writing finally_turn logic.

For chat() and greet(), this hook fires reliably via a finally block. For stream_chat() (an async generator), the finally block fires only when the generator is fully consumed, explicitly closed (aclose()), or garbage collected. Callers that break out of the stream early should use contextlib.aclosing to guarantee prompt cleanup.

plugin_data populated by on_turn_start (or seeded from the call site via the plugin_data parameter on chat() / stream_chat() / greet()) is available here.

This is an observational hook — failures are logged and reported via on_hook_error but do not prevent other middleware from running.

Parameters:

Name Type Description Default
turn TurnState

Turn state at the end of the pipeline. On error paths or no-strategy greet() paths, response_content may be empty and manager may be None. plugin_data is always available.

required
Source code in packages/bots/src/dataknobs_bots/middleware/base.py
async def finally_turn(self, turn: TurnState) -> None:
    """Called after every turn, on both success and error paths.

    Use this for resource cleanup (closing DB sessions, releasing
    locks, flushing buffers) that must happen regardless of outcome.
    ``after_turn`` is conditional — it does not fire on error paths
    or when ``greet()`` returns ``None``.  Do not assume
    ``after_turn`` has already run when writing ``finally_turn``
    logic.

    For ``chat()`` and ``greet()``, this hook fires reliably via a
    ``finally`` block.  For ``stream_chat()`` (an async generator),
    the ``finally`` block fires only when the generator is fully
    consumed, explicitly closed (``aclose()``), or garbage
    collected.  Callers that break out of the stream early should
    use ``contextlib.aclosing`` to guarantee prompt cleanup.

    ``plugin_data`` populated by ``on_turn_start`` (or seeded from
    the call site via the ``plugin_data`` parameter on ``chat()`` /
    ``stream_chat()`` / ``greet()``) is available here.

    This is an observational hook — failures are logged and reported
    via ``on_hook_error`` but do not prevent other middleware from
    running.

    Args:
        turn: Turn state at the end of the pipeline.  On error paths
            or no-strategy ``greet()`` paths, ``response_content``
            may be empty and ``manager`` may be ``None``.
            ``plugin_data`` is always available.
    """
on_tool_executed async
on_tool_executed(execution: ToolExecution, context: BotContext) -> None

Called after each tool execution within a turn.

Fired once per tool invocation, before after_turn. All on_tool_executed calls happen post-turn during _finalize_turn(), not in real-time as tools execute — this hook is for auditing and logging, not for aborting or rate-limiting mid-turn.

Ordering note: DynaBot-level tool executions appear first, followed by strategy-level executions (e.g. ReAct). In practice only one source produces executions per turn.

Parameters:

Name Type Description Default
execution ToolExecution

Record of the tool execution (name, params, result, error, duration).

required
context BotContext

Bot context for the current turn.

required
Source code in packages/bots/src/dataknobs_bots/middleware/base.py
async def on_tool_executed(
    self, execution: ToolExecution, context: BotContext
) -> None:
    """Called after each tool execution within a turn.

    Fired once per tool invocation, before ``after_turn``.  All
    ``on_tool_executed`` calls happen **post-turn** during
    ``_finalize_turn()``, not in real-time as tools execute — this
    hook is for auditing and logging, not for aborting or
    rate-limiting mid-turn.

    Ordering note: DynaBot-level tool executions appear first,
    followed by strategy-level executions (e.g. ReAct).  In
    practice only one source produces executions per turn.

    Args:
        execution: Record of the tool execution (name, params, result,
            error, duration).
        context: Bot context for the current turn.
    """

ReActReasoning

ReActReasoning(
    max_iterations: int = 5,
    verbose: bool = False,
    store_trace: bool = False,
    artifact_registry: Any | None = None,
    review_executor: Any | None = None,
    context_builder: Any | None = None,
    extra_context: dict[str, Any] | None = None,
    prompt_refresher: Callable[[], str] | None = None,
    greeting_template: str | None = None,
)

Bases: ReasoningStrategy

ReAct (Reasoning + Acting) strategy.

This strategy implements the ReAct pattern, where the LLM:

  1. Reasons about what to do (Thought)
  2. Takes an action (using tools if needed)
  3. Observes the result
  4. Repeats until the task is complete

This is useful for:

  • Multi-step problem solving
  • Tasks requiring tool use
  • Complex reasoning chains

Attributes:

Name Type Description
max_iterations

Maximum number of reasoning loops

verbose

Whether to enable debug-level logging

store_trace

Whether to store reasoning trace in conversation metadata

Example
strategy = ReActReasoning(
    max_iterations=5,
    verbose=True,
    store_trace=True
)
response = await strategy.generate(
    manager=conversation_manager,
    llm=llm_provider,
    tools=[search_tool, calculator_tool]
)

Initialize ReAct reasoning strategy.

Parameters:

Name Type Description Default
max_iterations int

Maximum reasoning/action iterations

5
verbose bool

Enable debug-level logging for reasoning steps

False
store_trace bool

Store reasoning trace in conversation metadata

False
artifact_registry Any | None

Optional ArtifactRegistry for artifact management

None
review_executor Any | None

Optional ReviewExecutor for running reviews

None
context_builder Any | None

Optional ContextBuilder for building conversation context

None
extra_context dict[str, Any] | None

Optional extra key-value pairs to merge into the ToolExecutionContext for every tool call (e.g. banks, custom state)

None
prompt_refresher Callable[[], str] | None

Optional callback that returns a fresh system prompt string. Called after tool execution in each iteration to update system_prompt_override in the next manager.complete() call. This prevents stale context when mutating tools change artifact/bank state mid-loop.

None
greeting_template str | None

Optional Jinja2 template for bot-initiated greetings (inherited from ReasoningStrategy).

None

Methods:

Name Description
from_config

Create ReActReasoning from a configuration dict.

generate

Generate response using ReAct loop.

Source code in packages/bots/src/dataknobs_bots/reasoning/react.py
def __init__(
    self,
    max_iterations: int = 5,
    verbose: bool = False,
    store_trace: bool = False,
    artifact_registry: Any | None = None,
    review_executor: Any | None = None,
    context_builder: Any | None = None,
    extra_context: dict[str, Any] | None = None,
    prompt_refresher: Callable[[], str] | None = None,
    greeting_template: str | None = None,
):
    """Initialize ReAct reasoning strategy.

    Args:
        max_iterations: Maximum reasoning/action iterations
        verbose: Enable debug-level logging for reasoning steps
        store_trace: Store reasoning trace in conversation metadata
        artifact_registry: Optional ArtifactRegistry for artifact management
        review_executor: Optional ReviewExecutor for running reviews
        context_builder: Optional ContextBuilder for building conversation context
        extra_context: Optional extra key-value pairs to merge into the
            ToolExecutionContext for every tool call (e.g. banks, custom state)
        prompt_refresher: Optional callback that returns a fresh system
            prompt string.  Called after tool execution in each iteration
            to update ``system_prompt_override`` in the next
            ``manager.complete()`` call.  This prevents stale context
            when mutating tools change artifact/bank state mid-loop.
        greeting_template: Optional Jinja2 template for bot-initiated
            greetings (inherited from ReasoningStrategy).
    """
    super().__init__(greeting_template=greeting_template)
    self.max_iterations = max_iterations
    self.verbose = verbose
    self.store_trace = store_trace
    self._artifact_registry = artifact_registry
    self._review_executor = review_executor
    self._context_builder = context_builder
    self._extra_context = extra_context
    self._prompt_refresher = prompt_refresher
Attributes
artifact_registry property
artifact_registry: Any | None

Get the artifact registry if configured.

review_executor property
review_executor: Any | None

Get the review executor if configured.

context_builder property
context_builder: Any | None

Get the context builder if configured.

Functions
from_config classmethod
from_config(config: dict[str, Any], **_kwargs: Any) -> ReActReasoning

Create ReActReasoning from a configuration dict.

Parameters:

Name Type Description Default
config dict[str, Any]

Configuration dict with optional keys: max_iterations, verbose, store_trace, greeting_template.

required
**_kwargs Any

Ignored (no KB or provider injection needed).

{}

Returns:

Type Description
ReActReasoning

Configured ReActReasoning instance.

Source code in packages/bots/src/dataknobs_bots/reasoning/react.py
@classmethod
def from_config(cls, config: dict[str, Any], **_kwargs: Any) -> ReActReasoning:  # type: ignore[override]
    """Create ReActReasoning from a configuration dict.

    Args:
        config: Configuration dict with optional keys:
            max_iterations, verbose, store_trace, greeting_template.
        **_kwargs: Ignored (no KB or provider injection needed).

    Returns:
        Configured ReActReasoning instance.
    """
    return cls(
        max_iterations=config.get("max_iterations", 5),
        verbose=config.get("verbose", False),
        store_trace=config.get("store_trace", False),
        greeting_template=config.get("greeting_template"),
    )
generate async
generate(
    manager: Any, llm: Any, tools: list[Any] | None = None, **kwargs: Any
) -> Any

Generate response using ReAct loop.

The ReAct loop:

  1. Generate a response (may include tool calls)
  2. If tool calls are present, execute them
  3. Add observations to the conversation
  4. Repeat until there are no more tool calls or max iterations is reached

Parameters:

Name Type Description Default
manager Any

ConversationManager instance

required
llm Any

LLM provider instance

required
tools list[Any] | None

Optional list of available tools

None
**kwargs Any

Generation parameters

{}

Returns:

Type Description
Any

Final LLM response

Source code in packages/bots/src/dataknobs_bots/reasoning/react.py
async def generate(
    self,
    manager: Any,
    llm: Any,
    tools: list[Any] | None = None,
    **kwargs: Any,
) -> Any:
    """Generate response using ReAct loop.

    The ReAct loop:
    1. Generate response (may include tool calls)
    2. If tool calls present, execute them
    3. Add observations to conversation
    4. Repeat until no more tool calls or max iterations

    Args:
        manager: ConversationManager instance
        llm: LLM provider instance
        tools: Optional list of available tools
        **kwargs: Generation parameters

    Returns:
        Final LLM response
    """
    # Clear any stale tool executions from a previous call.
    # Each generate() call should start with a fresh list so
    # concurrent async calls on the same strategy instance don't
    # accumulate records from earlier calls.
    self._tool_executions.clear()

    if not tools:
        # No tools available, fall back to simple generation
        logger.info(
            "ReAct: No tools available, falling back to simple generation",
            extra={"conversation_id": manager.conversation_id},
        )
        return await manager.complete(**kwargs)

    # Initialize trace if enabled
    trace = [] if self.store_trace else None

    # Get log level based on verbose setting
    log_level = logging.DEBUG if self.verbose else logging.INFO

    logger.log(
        log_level,
        "ReAct: Starting reasoning loop",
        extra={
            "conversation_id": manager.conversation_id,
            "max_iterations": self.max_iterations,
            "tools_available": len(tools),
        },
    )

    # Track previous iteration's tool calls for duplicate detection
    prev_tool_calls: list[tuple[str, str]] | None = None

    # ReAct loop
    for iteration in range(self.max_iterations):
        iteration_trace = {
            "iteration": iteration + 1,
            "tool_calls": [],
        }

        logger.log(
            log_level,
            "ReAct: Starting iteration",
            extra={
                "conversation_id": manager.conversation_id,
                "iteration": iteration + 1,
                "max_iterations": self.max_iterations,
            },
        )

        # Generate response with tools
        try:
            response = await manager.complete(tools=tools, **kwargs)
        except ToolsNotSupportedError as e:
            logger.error(
                "ReAct: Model '%s' does not support tools — "
                "returning graceful response to user",
                e.model,
                extra={"conversation_id": manager.conversation_id},
            )
            return LLMResponse(
                content=(
                    "I'm configured to use tools for this task, but my "
                    "current language model doesn't support tool calling. "
                    "Please contact the administrator to update the model "
                    "configuration."
                ),
                model=e.model,
                finish_reason="error",
            )

        # Check if we have tool calls
        if not hasattr(response, "tool_calls") or not response.tool_calls:
            # No tool calls, we're done
            logger.log(
                log_level,
                "ReAct: No tool calls in response, finishing",
                extra={
                    "conversation_id": manager.conversation_id,
                    "iteration": iteration + 1,
                },
            )

            if trace is not None:
                iteration_trace["status"] = "completed"
                trace.append(iteration_trace)
                await self._store_trace(manager, trace)

            return response

        num_tool_calls = len(response.tool_calls)
        logger.log(
            log_level,
            "ReAct: Executing tool calls",
            extra={
                "conversation_id": manager.conversation_id,
                "iteration": iteration + 1,
                "num_tools": num_tool_calls,
                "tools": [tc.name for tc in response.tool_calls],
            },
        )

        # Duplicate detection: compare (name, sorted params JSON)
        # with previous iteration to avoid infinite loops
        current_calls = [
            (tc.name, json.dumps(tc.parameters, sort_keys=True))
            for tc in response.tool_calls
        ]

        if prev_tool_calls is not None and current_calls == prev_tool_calls:
            logger.warning(
                "ReAct: Duplicate tool calls detected, breaking loop",
                extra={
                    "conversation_id": manager.conversation_id,
                    "iteration": iteration + 1,
                    "duplicate_calls": [tc.name for tc in response.tool_calls],
                },
            )

            # Add explanatory message so the final LLM call doesn't
            # see dangling tool_calls with no corresponding observations.
            tool_names = [tc.name for tc in response.tool_calls]
            await manager.add_message(
                content=(
                    f"System notice: The tools {tool_names} were already "
                    "called with identical parameters in the previous step. "
                    "Their results are already in the conversation above. "
                    "Please use those results to respond to the user."
                ),
                role="system",
            )

            if trace is not None:
                iteration_trace["status"] = "duplicate_tool_calls_detected"
                trace.append(iteration_trace)
                await self._store_trace(manager, trace)

            break

        prev_tool_calls = current_calls

        # Build execution context for tools that need it
        tool_context = ToolExecutionContext.from_manager(manager)

        # Extend context with artifact/review infrastructure if available
        extra_context: dict[str, Any] = {}
        if self._artifact_registry is not None:
            extra_context["artifact_registry"] = self._artifact_registry
        if self._review_executor is not None:
            extra_context["review_executor"] = self._review_executor
        if self._context_builder is not None:
            try:
                conversation_context = await self._context_builder.build(manager)
                extra_context["conversation_context"] = conversation_context
            except Exception as e:
                logger.warning("Failed to build conversation context: %s", e)
        if self._extra_context:
            extra_context.update(self._extra_context)
        if extra_context:
            tool_context = tool_context.with_extra(**extra_context)

        # Execute all tool calls
        for tool_call in response.tool_calls:
            tool_trace = {
                "name": tool_call.name,
                "parameters": tool_call.parameters,
            }

            try:
                # Find the tool
                tool = self._find_tool(tool_call.name, tools)
                if not tool:
                    observation = f"Error: Tool '{tool_call.name}' not found"
                    tool_trace["status"] = "error"
                    tool_trace["error"] = "Tool not found"

                    logger.warning(
                        "ReAct: Tool not found",
                        extra={
                            "conversation_id": manager.conversation_id,
                            "iteration": iteration + 1,
                            "tool_name": tool_call.name,
                        },
                    )
                else:
                    # Execute the tool with context injection
                    # Context-aware tools will extract _context and use it
                    # Regular tools will ignore _context via **kwargs
                    t0 = time.monotonic()
                    result = await tool.execute(
                        **tool_call.parameters, _context=tool_context
                    )
                    duration_ms = (time.monotonic() - t0) * 1000
                    try:
                        observation = f"Tool result: {json.dumps(result, default=str)}"
                    except (TypeError, ValueError):
                        observation = f"Tool result: {result}"
                    tool_trace["status"] = "success"
                    tool_trace["result"] = str(result)

                    # Record for DynaBot on_tool_executed middleware hook
                    self._tool_executions.append(ToolExecution(
                        tool_name=tool_call.name,
                        parameters=tool_call.parameters,
                        result=result,
                        duration_ms=duration_ms,
                    ))

                    logger.log(
                        log_level,
                        "ReAct: Tool executed successfully",
                        extra={
                            "conversation_id": manager.conversation_id,
                            "iteration": iteration + 1,
                            "tool_name": tool_call.name,
                            "result_length": len(str(result)),
                        },
                    )

                # Add observation using role="tool" so providers can
                # pair it with the assistant's tool_calls in history.
                await manager.add_message(
                    content=f"Observation from {tool_call.name}: {observation}",
                    role="tool",
                    name=tool_call.name,
                    tool_call_id=tool_call.id,
                )

            except Exception as e:
                # Handle tool execution errors — use role="tool" so the
                # error is paired with the tool call in conversation.
                error_msg = f"Error executing tool {tool_call.name}: {e!s}"
                tool_trace["status"] = "error"
                tool_trace["error"] = str(e)

                # Record failed execution for middleware hook
                self._tool_executions.append(ToolExecution(
                    tool_name=tool_call.name,
                    parameters=tool_call.parameters,
                    error=str(e),
                ))

                logger.error(
                    "ReAct: Tool execution failed",
                    extra={
                        "conversation_id": manager.conversation_id,
                        "iteration": iteration + 1,
                        "tool_name": tool_call.name,
                        "error": str(e),
                    },
                    exc_info=True,
                )

                await manager.add_message(
                    content=error_msg,
                    role="tool",
                    name=tool_call.name,
                    tool_call_id=tool_call.id,
                )

            if trace is not None:
                iteration_trace["tool_calls"].append(tool_trace)

        if trace is not None:
            iteration_trace["status"] = "continued"
            trace.append(iteration_trace)

        # Refresh system prompt so the next iteration sees current
        # artifact/bank state (e.g. after load_from_catalog).
        if self._prompt_refresher is not None:
            kwargs["system_prompt_override"] = self._prompt_refresher()

    else:
        # for-else: only reached when the loop exhausts all iterations
        # without a break (i.e. not triggered by duplicate detection)
        logger.log(
            log_level,
            "ReAct: Max iterations reached, generating final response",
            extra={
                "conversation_id": manager.conversation_id,
                "iterations_used": self.max_iterations,
            },
        )

        if trace is not None:
            trace.append({"status": "max_iterations_reached"})
            await self._store_trace(manager, trace)

    # Refresh prompt for the final complete() call as well.
    if self._prompt_refresher is not None:
        kwargs["system_prompt_override"] = self._prompt_refresher()

    return await manager.complete(**kwargs)
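The duplicate-detection key used inside the loop can be extracted for illustration: two tool calls compare equal when the name and the canonicalized parameters match, regardless of dict key order.

```python
import json

def call_key(name: str, parameters: dict) -> tuple[str, str]:
    # sort_keys=True makes the JSON canonical, so parameter
    # ordering cannot defeat the duplicate check.
    return (name, json.dumps(parameters, sort_keys=True))
```

This is why a model re-issuing the same call with reordered arguments still trips the "duplicate tool calls detected" break rather than looping forever.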

ReasoningStrategy

ReasoningStrategy(*, greeting_template: str | None = None)

Bases: ABC

Abstract base class for reasoning strategies.

Reasoning strategies control how the bot processes information and generates responses. Different strategies can implement different levels of reasoning complexity.

All strategies support an optional greeting_template — a Jinja2 template string rendered with initial_context variables when greet() is called. Strategies that need richer greeting behavior (e.g. WizardReasoning with FSM-driven stage responses) override greet() entirely.

Parameters:

Name Type Description Default
greeting_template str | None

Optional Jinja2 template for bot-initiated greetings. Variables from initial_context are available as top-level template variables (e.g. {{ user_name }}).

None

Examples:

  • Simple: Direct LLM call
  • Chain-of-Thought: Break down reasoning into steps
  • ReAct: Reason and act in a loop with tools

Methods:

Name Description
capabilities

Declare what this strategy manages autonomously.

from_config

Create a strategy instance from a configuration dict.

get_source_configs

Extract source configuration dicts from a strategy config.

get_and_clear_tool_executions

Return tool executions recorded during the last generate() call.

greet

Generate an initial bot greeting before the user speaks.

add_source

Add a retrieval source to this strategy.

providers

Return LLM providers managed by this strategy, keyed by role.

set_provider

Replace a provider managed by this strategy.

close

Release resources held by this strategy.

generate

Generate response using this reasoning strategy.

stream_generate

Stream response using this reasoning strategy.

Source code in packages/bots/src/dataknobs_bots/reasoning/base.py
def __init__(self, *, greeting_template: str | None = None) -> None:
    self._greeting_template = greeting_template
    self._tool_executions: list[ToolExecution] = []
Functions
capabilities classmethod
capabilities() -> StrategyCapabilities

Declare what this strategy manages autonomously.

The default returns no capabilities. Concrete strategies override to declare their actual capabilities.

Returns:

Type Description
StrategyCapabilities

Frozen dataclass describing strategy capabilities.

Source code in packages/bots/src/dataknobs_bots/reasoning/base.py
@classmethod
def capabilities(cls) -> StrategyCapabilities:
    """Declare what this strategy manages autonomously.

    The default returns no capabilities.  Concrete strategies
    override to declare their actual capabilities.

    Returns:
        Frozen dataclass describing strategy capabilities.
    """
    return StrategyCapabilities()
from_config classmethod
from_config(config: dict[str, Any], **_kwargs: Any) -> Self

Create a strategy instance from a configuration dict.

The base implementation extracts greeting_template and passes it to the constructor. Concrete strategies with richer configuration override this classmethod.

Parameters:

Name Type Description Default
config dict[str, Any]

Strategy configuration dict.

required
**_kwargs Any

Additional context (e.g. knowledge_base).

{}

Returns:

Type Description
Self

Configured strategy instance.

Source code in packages/bots/src/dataknobs_bots/reasoning/base.py
@classmethod
def from_config(cls, config: dict[str, Any], **_kwargs: Any) -> Self:
    """Create a strategy instance from a configuration dict.

    The base implementation extracts ``greeting_template`` and
    passes it to the constructor.  Concrete strategies with richer
    configuration override this classmethod.

    Args:
        config: Strategy configuration dict.
        **_kwargs: Additional context (e.g. ``knowledge_base``).

    Returns:
        Configured strategy instance.
    """
    return cls(greeting_template=config.get("greeting_template"))
get_source_configs classmethod
get_source_configs(config: dict[str, Any]) -> list[dict[str, Any]]

Extract source configuration dicts from a strategy config.

DynaBot calls this after creating the strategy to discover which sources to construct and wire in via :meth:add_source. The default looks for a top-level "sources" key, which is the convention used by GroundedReasoning.

Strategies with non-standard config layouts (e.g. HybridReasoning, where sources are nested under "grounded") override this to return the correct list.

Parameters:

Name Type Description Default
config dict[str, Any]

The full strategy configuration dict.

required

Returns:

Type Description
list[dict[str, Any]]

List of source configuration dicts (may be empty).

Source code in packages/bots/src/dataknobs_bots/reasoning/base.py
@classmethod
def get_source_configs(cls, config: dict[str, Any]) -> list[dict[str, Any]]:
    """Extract source configuration dicts from a strategy config.

    ``DynaBot`` calls this after creating the strategy to discover
    which sources to construct and wire in via :meth:`add_source`.
    The default looks for a top-level ``"sources"`` key, which is
    the convention used by ``GroundedReasoning``.

    Strategies with non-standard config layouts (e.g.
    ``HybridReasoning``, where sources are nested under
    ``"grounded"``) override this to return the correct list.

    Args:
        config: The full strategy configuration dict.

    Returns:
        List of source configuration dicts (may be empty).
    """
    return config.get("sources", [])
get_and_clear_tool_executions
get_and_clear_tool_executions() -> list[ToolExecution]

Return tool executions recorded during the last generate() call.

Strategies that execute tools (e.g. ReAct) append ToolExecution records to self._tool_executions during their generation loop. DynaBot calls this after generate() returns to collect the records and fire on_tool_executed middleware hooks.

Returns:

Type Description
list[ToolExecution]

List of tool execution records (cleared after retrieval).

Source code in packages/bots/src/dataknobs_bots/reasoning/base.py
def get_and_clear_tool_executions(self) -> list[ToolExecution]:
    """Return tool executions recorded during the last generate() call.

    Strategies that execute tools (e.g. ReAct) append
    ``ToolExecution`` records to ``self._tool_executions`` during
    their generation loop.  DynaBot calls this after
    ``generate()`` returns to collect the records and fire
    ``on_tool_executed`` middleware hooks.

    Returns:
        List of tool execution records (cleared after retrieval).
    """
    result = list(self._tool_executions)
    self._tool_executions.clear()
    return result
greet async
greet(
    manager: ReasoningManagerProtocol,
    llm: Any,
    *,
    initial_context: dict[str, Any] | None = None,
    **kwargs: Any,
) -> Any | None

Generate an initial bot greeting before the user speaks.

The default implementation renders greeting_template (if set) with initial_context variables using Jinja2 and returns the result as an LLMResponse. Returns None when no template is configured.

WizardReasoning fully overrides this with FSM-driven greeting generation from the wizard's start stage.

Parameters:

Name Type Description Default
manager ReasoningManagerProtocol

ConversationManager or compatible manager instance

required
llm Any

LLM provider instance

required
initial_context dict[str, Any] | None

Optional dict of data available as Jinja2 template variables (e.g. {"user_name": "Alice"} makes {{ user_name }} resolve to "Alice").

None
**kwargs Any

Additional generation parameters

{}

Returns:

Type Description
Any | None

LLMResponse if a greeting was generated, or None

Source code in packages/bots/src/dataknobs_bots/reasoning/base.py
async def greet(
    self,
    manager: ReasoningManagerProtocol,
    llm: Any,
    *,
    initial_context: dict[str, Any] | None = None,
    **kwargs: Any,
) -> Any | None:
    """Generate an initial bot greeting before the user speaks.

    The default implementation renders ``greeting_template`` (if set)
    with ``initial_context`` variables using Jinja2 and returns the
    result as an ``LLMResponse``.  Returns ``None`` when no template
    is configured.

    ``WizardReasoning`` fully overrides this with FSM-driven greeting
    generation from the wizard's start stage.

    Args:
        manager: ConversationManager or compatible manager instance
        llm: LLM provider instance
        initial_context: Optional dict of data available as Jinja2
            template variables (e.g. ``{"user_name": "Alice"}``
            makes ``{{ user_name }}`` resolve to ``"Alice"``).
        **kwargs: Additional generation parameters

    Returns:
        LLMResponse if a greeting was generated, or None
    """
    if self._greeting_template is None:
        return None
    context = initial_context or {}
    env = jinja2.Environment(undefined=jinja2.Undefined)
    text = env.from_string(self._greeting_template).render(**context)
    return LLMResponse(content=text, model="template", finish_reason="stop")
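As a standalone illustration of the default behavior, the sketch below mirrors the rendering step with plain Jinja2; the `LLMResponse` wrapper is replaced by a bare string, and `render_greeting` is an illustrative name, not part of the package API:

```python
import jinja2

def render_greeting(greeting_template, initial_context=None):
    # Mirrors the default greet() path: no template -> no greeting.
    if greeting_template is None:
        return None
    context = initial_context or {}
    # jinja2.Undefined renders missing variables as empty strings
    # rather than raising, matching the environment used above.
    env = jinja2.Environment(undefined=jinja2.Undefined)
    return env.from_string(greeting_template).render(**context)

print(render_greeting("Hello {{ user_name }}!", {"user_name": "Alice"}))  # Hello Alice!
print(render_greeting(None))  # None
```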
add_source
add_source(source: Any) -> None

Add a retrieval source to this strategy.

Strategies that declare manages_sources=True in their capabilities() MUST override this method. DynaBot calls it during config-driven source construction.

The default raises NotImplementedError so that a 3rd-party strategy that forgets to implement it fails loudly.

Parameters:

Name Type Description Default
source Any

A GroundedSource instance (or compatible).

required

Raises:

Type Description
NotImplementedError

If not overridden by a subclass.

Source code in packages/bots/src/dataknobs_bots/reasoning/base.py
def add_source(self, source: Any) -> None:
    """Add a retrieval source to this strategy.

    Strategies that declare ``manages_sources=True`` in their
    :meth:`capabilities` MUST override this method.  ``DynaBot``
    calls it during config-driven source construction.

    The default raises ``NotImplementedError`` so that a 3rd-party
    strategy that forgets to implement it fails loudly.

    Args:
        source: A ``GroundedSource`` instance (or compatible).

    Raises:
        NotImplementedError: If not overridden by a subclass.
    """
    raise NotImplementedError(
        f"{type(self).__name__} does not implement add_source(). "
        f"Strategies that declare manages_sources=True in "
        f"capabilities() must override this method."
    )
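A minimal sketch of the override pattern, using plain strings as stand-ins for `GroundedSource` instances (the class name is illustrative; a real strategy would subclass `ReasoningStrategy` and declare `manages_sources=True` in `capabilities()`):

```python
class SourceManagingStrategy:
    """Illustrative strategy that manages its own retrieval sources."""

    def __init__(self):
        self._sources = []

    def add_source(self, source):
        # DynaBot calls this once per configured source after
        # factory creation; store it for later retrieval calls.
        self._sources.append(source)

strategy = SourceManagingStrategy()
strategy.add_source("docs-index")  # stand-in for a GroundedSource
strategy.add_source("faq-index")
print(len(strategy._sources))  # 2
```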
providers
providers() -> dict[str, Any]

Return LLM providers managed by this strategy, keyed by role.

Subsystems declare the providers they own so that the bot can register them in the provider catalog without reaching into private attributes. The default returns an empty dict (no providers).

Returns:

Type Description
dict[str, Any]

Dict mapping provider role names to provider instances.

Source code in packages/bots/src/dataknobs_bots/reasoning/base.py
def providers(self) -> dict[str, Any]:
    """Return LLM providers managed by this strategy, keyed by role.

    Subsystems declare the providers they own so that the bot can
    register them in the provider catalog without reaching into
    private attributes.  The default returns an empty dict (no
    providers).

    Returns:
        Dict mapping provider role names to provider instances.
    """
    return {}
set_provider
set_provider(role: str, provider: Any) -> bool

Replace a provider managed by this strategy.

Called by inject_providers to wire a test provider into the actual subsystem, not just the registry catalog. The default returns False (role not recognized). Concrete subclasses override to accept their known roles.

Parameters:

Name Type Description Default
role str

Provider role name (e.g. PROVIDER_ROLE_EXTRACTION).

required
provider Any

Replacement provider instance.

required

Returns:

Type Description
bool

True if the role was recognized and the provider updated, False otherwise.

Source code in packages/bots/src/dataknobs_bots/reasoning/base.py
def set_provider(self, role: str, provider: Any) -> bool:
    """Replace a provider managed by this strategy.

    Called by ``inject_providers`` to wire a test provider into the
    actual subsystem, not just the registry catalog.  The default
    returns ``False`` (role not recognized).  Concrete subclasses
    override to accept their known roles.

    Args:
        role: Provider role name (e.g. ``PROVIDER_ROLE_EXTRACTION``).
        provider: Replacement provider instance.

    Returns:
        ``True`` if the role was recognized and the provider updated,
        ``False`` otherwise.
    """
    return False
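A sketch of a concrete override, with `providers()` included to show the pairing; the role name `"extraction"` is an illustrative stand-in for a real role constant such as `PROVIDER_ROLE_EXTRACTION`, and strings stand in for provider instances:

```python
class ExtractionOwningStrategy:
    """Illustrative strategy that owns one provider under the 'extraction' role."""

    def __init__(self, extraction_provider):
        self._extraction_provider = extraction_provider

    def providers(self):
        # Expose owned providers for the catalog, keyed by role.
        return {"extraction": self._extraction_provider}

    def set_provider(self, role, provider):
        # Accept only roles this strategy recognizes.
        if role == "extraction":
            self._extraction_provider = provider
            return True
        return False

strategy = ExtractionOwningStrategy("real-provider")
assert strategy.set_provider("extraction", "test-provider") is True
print(strategy.providers())  # {'extraction': 'test-provider'}
assert strategy.set_provider("unknown-role", "x") is False
```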
close async
close() -> None

Release resources held by this strategy.

Default no-op. Subclasses that hold resources (LLM providers, database connections, asyncio tasks) should override to release them. Called by DynaBot.close().

Source code in packages/bots/src/dataknobs_bots/reasoning/base.py
async def close(self) -> None:  # noqa: B027
    """Release resources held by this strategy.

    Default no-op. Subclasses that hold resources (LLM providers,
    database connections, asyncio tasks) should override to release
    them. Called by ``DynaBot.close()``.
    """
generate abstractmethod async
generate(
    manager: ReasoningManagerProtocol,
    llm: Any,
    tools: list[Any] | None = None,
    **kwargs: Any,
) -> Any

Generate response using this reasoning strategy.

Pass tools through to manager.complete(tools=tools) so the LLM can see available tools. If the returned response contains tool_calls, DynaBot will execute them automatically in a post-strategy loop — strategies do not need to handle tool execution unless they want full control (like ReActReasoning).

Strategies that execute tools internally should record them via self._tool_executions.append(ToolExecution(...)) and consume tool_calls before returning, so the DynaBot loop is a no-op.

Parameters:

Name Type Description Default
manager ReasoningManagerProtocol

ConversationManager or compatible manager instance

required
llm Any

LLM provider instance

required
tools list[Any] | None

Optional list of available tools. Forward to manager.complete(tools=tools) for LLM visibility.

None
**kwargs Any

Additional generation parameters (temperature, max_tokens, etc.)

{}

Returns:

Type Description
Any

LLM response object

Example
response = await strategy.generate(
    manager=conversation_manager,
    llm=llm_provider,
    tools=[search_tool, calculator_tool],
    temperature=0.7,
    max_tokens=1000
)
Source code in packages/bots/src/dataknobs_bots/reasoning/base.py
@abstractmethod
async def generate(
    self,
    manager: ReasoningManagerProtocol,
    llm: Any,
    tools: list[Any] | None = None,
    **kwargs: Any,
) -> Any:
    """Generate response using this reasoning strategy.

    Pass ``tools`` through to ``manager.complete(tools=tools)`` so
    the LLM can see available tools.  If the returned response
    contains ``tool_calls``, ``DynaBot`` will execute them
    automatically in a post-strategy loop — strategies do not need
    to handle tool execution unless they want full control (like
    ``ReActReasoning``).

    Strategies that execute tools internally should record them via
    ``self._tool_executions.append(ToolExecution(...))`` and
    consume ``tool_calls`` before returning, so the DynaBot loop
    is a no-op.

    Args:
        manager: ConversationManager or compatible manager instance
        llm: LLM provider instance
        tools: Optional list of available tools.  Forward to
            ``manager.complete(tools=tools)`` for LLM visibility.
        **kwargs: Additional generation parameters (temperature, max_tokens, etc.)

    Returns:
        LLM response object

    Example:
        ```python
        response = await strategy.generate(
            manager=conversation_manager,
            llm=llm_provider,
            tools=[search_tool, calculator_tool],
            temperature=0.7,
            max_tokens=1000
        )
        ```
    """
    pass
stream_generate async
stream_generate(
    manager: ReasoningManagerProtocol,
    llm: Any,
    tools: list[Any] | None = None,
    **kwargs: Any,
) -> AsyncIterator[Any]

Stream response using this reasoning strategy.

The default implementation wraps generate() and yields the complete response as a single item. Subclasses that support true token-level streaming (e.g. SimpleReasoning) should override this to yield incremental chunks.

Parameters:

Name Type Description Default
manager ReasoningManagerProtocol

ConversationManager or compatible manager instance

required
llm Any

LLM provider instance

required
tools list[Any] | None

Optional list of available tools

None
**kwargs Any

Additional generation parameters

{}

Yields:

Type Description
AsyncIterator[Any]

LLM response or stream chunk objects

Source code in packages/bots/src/dataknobs_bots/reasoning/base.py
async def stream_generate(
    self,
    manager: ReasoningManagerProtocol,
    llm: Any,
    tools: list[Any] | None = None,
    **kwargs: Any,
) -> AsyncIterator[Any]:
    """Stream response using this reasoning strategy.

    The default implementation wraps ``generate()`` and yields the
    complete response as a single item.  Subclasses that support true
    token-level streaming (e.g. ``SimpleReasoning``) should override
    this to yield incremental chunks.

    Args:
        manager: ConversationManager or compatible manager instance
        llm: LLM provider instance
        tools: Optional list of available tools
        **kwargs: Additional generation parameters

    Yields:
        LLM response or stream chunk objects
    """
    result = await self.generate(manager, llm, tools=tools, **kwargs)
    yield result
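The default wrap-and-yield behavior can be sketched standalone; the `generate` stub below stands in for a concrete strategy's implementation:

```python
import asyncio

async def generate(**kwargs):
    # Stand-in for a concrete strategy's generate().
    return "full response"

async def stream_generate(**kwargs):
    # Default implementation: the complete response as a single chunk.
    result = await generate(**kwargs)
    yield result

async def collect():
    return [chunk async for chunk in stream_generate()]

print(asyncio.run(collect()))  # ['full response']
```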

SimpleReasoning

SimpleReasoning(*, greeting_template: str | None = None)

Bases: ReasoningStrategy

Simple reasoning strategy that makes direct LLM calls.

This is the most straightforward strategy - it simply passes the conversation to the LLM and returns the response without any additional reasoning steps.

Use this when:

- You want direct, fast responses
- The task doesn't require complex reasoning
- You're using a powerful model that doesn't need guidance

Example
strategy = SimpleReasoning()
response = await strategy.generate(
    manager=conversation_manager,
    llm=llm_provider,
    temperature=0.7
)

Methods:

Name Description
generate

Generate response with a simple LLM call.

stream_generate

Stream response with true token-level streaming.

Source code in packages/bots/src/dataknobs_bots/reasoning/simple.py
def __init__(self, *, greeting_template: str | None = None) -> None:
    super().__init__(greeting_template=greeting_template)
Functions
generate async
generate(
    manager: Any, llm: Any, tools: list[Any] | None = None, **kwargs: Any
) -> Any

Generate response with a simple LLM call.

Parameters:

Name Type Description Default
manager Any

ConversationManager instance

required
llm Any

LLM provider instance (not used directly)

required
tools list[Any] | None

Optional list of tools

None
**kwargs Any

Generation parameters

{}

Returns:

Type Description
Any

LLM response

Source code in packages/bots/src/dataknobs_bots/reasoning/simple.py
async def generate(
    self,
    manager: Any,
    llm: Any,
    tools: list[Any] | None = None,
    **kwargs: Any,
) -> Any:
    """Generate response with a simple LLM call.

    Args:
        manager: ConversationManager instance
        llm: LLM provider instance (not used directly)
        tools: Optional list of tools
        **kwargs: Generation parameters

    Returns:
        LLM response
    """
    # Use the conversation manager's generate method
    # which handles the LLM call with the conversation history
    return await manager.complete(tools=tools, **kwargs)
stream_generate async
stream_generate(
    manager: Any, llm: Any, tools: list[Any] | None = None, **kwargs: Any
) -> AsyncIterator[Any]

Stream response with true token-level streaming.

Delegates to manager.stream_complete() which yields LLMStreamResponse chunks as they arrive from the provider.

Parameters:

Name Type Description Default
manager Any

ConversationManager instance

required
llm Any

LLM provider instance (not used directly)

required
tools list[Any] | None

Optional list of tools

None
**kwargs Any

Generation parameters

{}

Yields:

Type Description
AsyncIterator[Any]

LLM stream response chunks

Source code in packages/bots/src/dataknobs_bots/reasoning/simple.py
async def stream_generate(
    self,
    manager: Any,
    llm: Any,
    tools: list[Any] | None = None,
    **kwargs: Any,
) -> AsyncIterator[Any]:
    """Stream response with true token-level streaming.

    Delegates to ``manager.stream_complete()`` which yields
    ``LLMStreamResponse`` chunks as they arrive from the provider.

    Args:
        manager: ConversationManager instance
        llm: LLM provider instance (not used directly)
        tools: Optional list of tools
        **kwargs: Generation parameters

    Yields:
        LLM stream response chunks
    """
    async for chunk in manager.stream_complete(tools=tools, **kwargs):
        yield chunk

StrategyCapabilities dataclass

StrategyCapabilities(manages_sources: bool = False)

Declares what a reasoning strategy manages autonomously.

Used by DynaBot and other consumers to decide which orchestration steps to perform (e.g. source construction, auto-context) without hard-coding strategy names.

All fields default to False; concrete strategies override only the capabilities they possess. New fields can be added with default=False without breaking existing strategies.

Attributes:

Name Type Description
manages_sources bool

Strategy manages its own retrieval sources (grounded/hybrid). When True, DynaBot performs config-driven source construction after factory creation and disables redundant auto_context.
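The capability-driven dispatch can be sketched by re-declaring the dataclass (a mirror of the documented fields for illustration, not an import from the package):

```python
from dataclasses import dataclass

@dataclass
class StrategyCapabilities:
    # Mirrors the documented field; new capabilities would be
    # added here with default=False.
    manages_sources: bool = False

def needs_source_construction(caps: StrategyCapabilities) -> bool:
    # Consumers branch on capabilities, never on strategy names.
    return caps.manages_sources

print(needs_source_construction(StrategyCapabilities()))  # False
print(needs_source_construction(StrategyCapabilities(manages_sources=True)))  # True
```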

StrategyRegistry

StrategyRegistry()

Registry mapping strategy names to their factories.

Unlike PluginRegistry (which caches singleton instances and has a (key, config) factory signature), StrategyRegistry creates a fresh instance per call — strategies are per-bot, not singletons.

When PluginRegistry gains a create() method for fresh-instance factory invocation (consumer-gaps plan Item 65), this class should be migrated to use it as its backing store.

Methods:

Name Description
register

Register a strategy factory under the given name.

create

Create a strategy instance from a config dict.

get_factory

Return the factory for a strategy name, or None.

is_registered

Check whether a strategy name is registered.

list_keys

Return sorted list of registered strategy names.

Source code in packages/bots/src/dataknobs_bots/reasoning/registry.py
def __init__(self) -> None:
    self._factories: dict[str, StrategyFactory] = {}
    self._initialized = False
    self._lock = threading.RLock()
Functions
register
register(
    name: str, factory: StrategyFactory, *, override: bool = False
) -> None

Register a strategy factory under the given name.

Parameters:

Name Type Description Default
name str

Strategy name (used in config strategy field).

required
factory StrategyFactory

A ReasoningStrategy subclass or callable (config, **kwargs) -> ReasoningStrategy.

required
override bool

If True, silently replace an existing registration. Otherwise raise ValueError.

False

Raises:

Type Description
ValueError

If name is already registered and override is False.

Source code in packages/bots/src/dataknobs_bots/reasoning/registry.py
def register(
    self,
    name: str,
    factory: StrategyFactory,
    *,
    override: bool = False,
) -> None:
    """Register a strategy factory under the given name.

    Args:
        name: Strategy name (used in config ``strategy`` field).
        factory: A ``ReasoningStrategy`` subclass or callable
            ``(config, **kwargs) -> ReasoningStrategy``.
        override: If ``True``, silently replace an existing
            registration.  Otherwise raise ``ValueError``.

    Raises:
        ValueError: If ``name`` is already registered and
            ``override`` is ``False``.
    """
    self._ensure_builtins()
    canonical = name.lower()
    with self._lock:
        if canonical in self._factories and not override:
            raise ValueError(
                f"Strategy '{canonical}' is already registered. "
                f"Use override=True to replace it."
            )
        self._factories[canonical] = factory
    logger.debug("Registered strategy '%s'", canonical)
create
create(config: dict[str, Any], **kwargs: Any) -> ReasoningStrategy

Create a strategy instance from a config dict.

Extracts config["strategy"] (default "simple"), looks up the factory, and calls it. For ReasoningStrategy subclasses the factory is cls.from_config(config, **kwargs). For plain callables the factory is called as factory(config, **kwargs).

Parameters:

Name Type Description Default
config dict[str, Any]

Strategy configuration dict (must contain strategy key).

required
**kwargs Any

Forwarded to the factory (e.g. knowledge_base).

{}

Returns:

Type Description
ReasoningStrategy

Configured strategy instance.

Raises:

Type Description
ValueError

If the strategy name is not registered.

Source code in packages/bots/src/dataknobs_bots/reasoning/registry.py
def create(
    self,
    config: dict[str, Any],
    **kwargs: Any,
) -> ReasoningStrategy:
    """Create a strategy instance from a config dict.

    Extracts ``config["strategy"]`` (default ``"simple"``), looks up
    the factory, and calls it.  For ``ReasoningStrategy`` subclasses
    the factory is ``cls.from_config(config, **kwargs)``.  For plain
    callables the factory is called as ``factory(config, **kwargs)``.

    Args:
        config: Strategy configuration dict (must contain
            ``strategy`` key).
        **kwargs: Forwarded to the factory (e.g. ``knowledge_base``).

    Returns:
        Configured strategy instance.

    Raises:
        ValueError: If the strategy name is not registered.
    """
    self._ensure_builtins()
    name = config.get("strategy", "simple").lower()
    factory = self._factories.get(name)
    if factory is None:
        available = ", ".join(sorted(self._factories))
        raise ValueError(
            f"Unknown reasoning strategy: '{name}'. "
            f"Available strategies: {available}"
        )

    if isinstance(factory, type) and issubclass(factory, ReasoningStrategy):
        return factory.from_config(config, **kwargs)
    return factory(config, **kwargs)
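The register/create round trip can be sketched with a standalone mini-registry that mirrors the documented semantics (lowercase canonical names, `"simple"` default, ValueError on duplicates and unknown names); locking and built-in registration are omitted:

```python
class MiniStrategyRegistry:
    def __init__(self):
        self._factories = {}

    def register(self, name, factory, *, override=False):
        canonical = name.lower()  # names are canonicalized to lowercase
        if canonical in self._factories and not override:
            raise ValueError(f"Strategy '{canonical}' is already registered.")
        self._factories[canonical] = factory

    def create(self, config, **kwargs):
        name = config.get("strategy", "simple").lower()
        factory = self._factories.get(name)
        if factory is None:
            raise ValueError(f"Unknown reasoning strategy: '{name}'")
        return factory(config, **kwargs)  # fresh instance per call

registry = MiniStrategyRegistry()
registry.register("MyStrategy", lambda config, **kw: {"config": config, **kw})
instance = registry.create({"strategy": "MYSTRATEGY"}, knowledge_base="kb")
print(instance["knowledge_base"])  # kb
```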
get_factory
get_factory(name: str) -> StrategyFactory | None

Return the factory for a strategy name, or None.

Source code in packages/bots/src/dataknobs_bots/reasoning/registry.py
def get_factory(self, name: str) -> StrategyFactory | None:
    """Return the factory for a strategy name, or ``None``."""
    self._ensure_builtins()
    return self._factories.get(name.lower())
is_registered
is_registered(name: str) -> bool

Check whether a strategy name is registered.

Source code in packages/bots/src/dataknobs_bots/reasoning/registry.py
def is_registered(self, name: str) -> bool:
    """Check whether a strategy name is registered."""
    self._ensure_builtins()
    return name.lower() in self._factories
list_keys
list_keys() -> list[str]

Return sorted list of registered strategy names.

Source code in packages/bots/src/dataknobs_bots/reasoning/registry.py
def list_keys(self) -> list[str]:
    """Return sorted list of registered strategy names."""
    self._ensure_builtins()
    return sorted(self._factories)

BotTestHarness

BotTestHarness(
    bot: Any,
    provider: EchoProvider,
    extractor: ConfigurableExtractor | None,
    context: Any,
)

High-level test helper for ALL DynaBot behavioral tests.

Wraps the full setup ceremony (bot creation, provider injection, tool registration, middleware wiring, context management) into one object. Use create() to build, chat()/greet() to run turns.

For non-wizard tests, use bot_config= with any DynaBot config:

Example
async with await BotTestHarness.create(
    bot_config={
        "llm": {"provider": "echo", "model": "test"},
        "conversation_storage": {"backend": "memory"},
        "reasoning": {"strategy": "simple"},
    },
    main_responses=[
        tool_call_response("my_tool", {"q": "test"}),
        text_response("Here are the results"),
    ],
    tools=[my_tool],
    middleware=[my_middleware],
) as harness:
    result = await harness.chat("search")
    assert result.response == "Here are the results"
    # Streaming: harness.bot.stream_chat("msg", harness.context)

For wizard tests, use wizard_config= with WizardConfigBuilder:

Example
async with await BotTestHarness.create(
    wizard_config=config,
    main_responses=["Got it!", "All set!"],
    extraction_results=[
        [{"name": "Alice"}],
        [{"domain_id": "chess"}, {"name": "Alice", "domain_id": "chess"}],
    ],
) as harness:
    result = await harness.chat("My name is Alice")
    assert harness.wizard_data["name"] == "Alice"
    assert harness.wizard_stage == "gather"

Methods:

Name Description
create

Create a harness with a fully wired DynaBot.

chat

Run a chat turn and capture wizard state.

greet

Run a greet turn and capture wizard state.

close

Close the bot and release resources.

Attributes:

Name Type Description
wizard_stage str | None

Current wizard stage from the last turn.

wizard_data dict[str, Any]

Wizard state data dict from the last turn.

wizard_state dict[str, Any] | None

Full wizard state from the last turn.

last_response str

Response text from the last turn.

turn_count int

Number of turns executed.

bot Any

The underlying DynaBot instance.

context Any

The BotContext used for this harness's turns.

provider EchoProvider

The main EchoProvider (for call history assertions).

extractor ConfigurableExtractor | None

The ConfigurableExtractor (for call verification).

Source code in packages/bots/src/dataknobs_bots/testing.py
def __init__(
    self,
    bot: Any,
    provider: EchoProvider,
    extractor: ConfigurableExtractor | None,
    context: Any,
) -> None:
    self._bot = bot
    self._provider = provider
    self._extractor = extractor
    self._context = context
    self._turn_count = 0
    self._last_result: TurnResult | None = None
Attributes
wizard_stage property
wizard_stage: str | None

Current wizard stage from the last turn.

wizard_data property
wizard_data: dict[str, Any]

Wizard state data dict from the last turn.

wizard_state property
wizard_state: dict[str, Any] | None

Full wizard state from the last turn.

last_response property
last_response: str

Response text from the last turn.

turn_count property
turn_count: int

Number of turns executed.

bot property
bot: Any

The underlying DynaBot instance.

context property
context: Any

The BotContext used for this harness's turns.

provider property
provider: EchoProvider

The main EchoProvider (for call history assertions).

extractor property
extractor: ConfigurableExtractor | None

The ConfigurableExtractor (for call verification).

Functions
create async classmethod
create(
    *,
    wizard_config: dict[str, Any] | None = None,
    bot_config: dict[str, Any] | None = None,
    main_responses: list[Any] | None = None,
    extraction_results: list[list[dict[str, Any]]] | None = None,
    system_prompt: str = "You are a helpful assistant.",
    conversation_id: str = "test-conv",
    client_id: str = "test",
    extraction_scope: str = "current_message",
    tools: list[Any] | None = None,
    middleware: list[Any] | None = None,
    strict_tools: bool = True,
    strict: bool = False,
) -> BotTestHarness

Create a harness with a fully wired DynaBot.

Provide either wizard_config (auto-wires bot config) or bot_config (full control).

Parameters:

Name Type Description Default
wizard_config dict[str, Any] | None

Wizard config dict (e.g. from WizardConfigBuilder.build()). Auto-wires EchoProvider, ConfigurableExtractor, and memory storage.

None
bot_config dict[str, Any] | None

Complete bot config dict for DynaBot.from_config(). When provided, wizard_config is ignored.

None
main_responses list[Any] | None

Responses to queue on the main EchoProvider. Accepts strings or LLMResponse objects (e.g. from text_response() / tool_call_response()).

None
extraction_results list[list[dict[str, Any]]] | None

Per-turn extraction results. Each inner list contains dicts for one turn's extraction calls. Flattened into a ConfigurableExtractor sequence internally.

None
system_prompt str

System prompt text.

'You are a helpful assistant.'
conversation_id str

Conversation ID for the test context.

'test-conv'
client_id str

Client ID for the test context.

'test'
extraction_scope str

Default extraction scope for the wizard. Only applies when wizard_config is used; ignored when bot_config is provided directly.

'current_message'
tools list[Any] | None

Optional list of Tool instances to register on the bot. Useful for ReAct strategy tests that need tool execution.

None
middleware list[Any] | None

Optional list of Middleware instances to append to the bot. Useful for testing middleware hooks like after_turn and on_tool_executed.

None
strict_tools bool

If True (default), the EchoProvider raises ValueError when a scripted response contains tool_calls but no tools were provided to complete(). Set to False for tests that intentionally exercise unexpected tool_calls with no registered tools.

True
strict bool

If True, the EchoProvider raises ResponseQueueExhaustedError when all scripted responses have been consumed, instead of falling back to echo behavior. Catches under-scripted tests.

False

Returns:

Type Description
BotTestHarness

Configured BotTestHarness instance.

Raises:

Type Description
ValueError

If neither wizard_config nor bot_config is provided.

Source code in packages/bots/src/dataknobs_bots/testing.py
@classmethod
async def create(
    cls,
    *,
    wizard_config: dict[str, Any] | None = None,
    bot_config: dict[str, Any] | None = None,
    main_responses: list[Any] | None = None,
    extraction_results: list[list[dict[str, Any]]] | None = None,
    system_prompt: str = "You are a helpful assistant.",
    conversation_id: str = "test-conv",
    client_id: str = "test",
    extraction_scope: str = "current_message",
    tools: list[Any] | None = None,
    middleware: list[Any] | None = None,
    strict_tools: bool = True,
    strict: bool = False,
) -> BotTestHarness:
    """Create a harness with a fully wired DynaBot.

    Provide either ``wizard_config`` (auto-wires bot config) or
    ``bot_config`` (full control).

    Args:
        wizard_config: Wizard config dict (e.g. from
            ``WizardConfigBuilder.build()``). Auto-wires EchoProvider,
            ConfigurableExtractor, and memory storage.
        bot_config: Complete bot config dict for ``DynaBot.from_config()``.
            When provided, ``wizard_config`` is ignored.
        main_responses: Responses to queue on the main EchoProvider.
            Accepts strings or ``LLMResponse`` objects (e.g. from
            ``text_response()`` / ``tool_call_response()``).
        extraction_results: Per-turn extraction results. Each inner list
            contains dicts for one turn's extraction calls. Flattened
            into a ``ConfigurableExtractor`` sequence internally.
        system_prompt: System prompt text.
        conversation_id: Conversation ID for the test context.
        client_id: Client ID for the test context.
        extraction_scope: Default extraction scope for the wizard.
            Only applies when ``wizard_config`` is used; ignored
            when ``bot_config`` is provided directly.
        tools: Optional list of ``Tool`` instances to register on the
            bot. Useful for ReAct strategy tests that need tool
            execution.
        middleware: Optional list of ``Middleware`` instances to append
            to the bot. Useful for testing middleware hooks like
            ``after_turn`` and ``on_tool_executed``.
        strict_tools: If True (default), the EchoProvider raises
            ValueError when a scripted response contains tool_calls
            but no tools were provided to complete(). Set to False
            for tests that intentionally exercise unexpected
            tool_calls with no registered tools.
        strict: If True, the EchoProvider raises
            ``ResponseQueueExhaustedError`` when all scripted
            responses have been consumed, instead of falling back
            to echo behavior.  Catches under-scripted tests.

    Returns:
        Configured ``BotTestHarness`` instance.

    Raises:
        ValueError: If neither ``wizard_config`` nor ``bot_config``
            is provided.
    """
    from .bot.base import DynaBot
    from .bot.context import BotContext

    if bot_config is None and wizard_config is None:
        raise ValueError(
            "Either wizard_config or bot_config must be provided"
        )

    # Build extraction results
    extractor: ConfigurableExtractor | None = None
    if extraction_results is not None:
        flat_results = [
            SimpleExtractionResult(data=data, confidence=0.9)
            for turn_results in extraction_results
            for data in turn_results
        ]
        extractor = ConfigurableExtractor(results=flat_results)

    # Build bot config if not provided
    if bot_config is None:
        assert wizard_config is not None
        wizard_cfg = copy.deepcopy(wizard_config)

        existing_settings = wizard_cfg.get("settings", {})
        if "extraction_scope" not in existing_settings:
            wizard_cfg["settings"] = {
                "extraction_scope": extraction_scope,
                **existing_settings,
            }

        # When scripted extraction results are provided, force LLM
        # extraction on stages that would otherwise use verbatim
        # capture (single required string field).  Without this,
        # the ConfigurableExtractor is silently bypassed and tests
        # get the raw user message instead of scripted results.
        #
        # This applies to ALL schema stages uniformly.  In multi-stage
        # wizards where a specific stage should still use verbatim
        # capture, set ``capture_mode="verbatim"`` explicitly on that
        # stage — the guard below respects explicit overrides at both
        # the top-level and collection_config levels.
        if extraction_results is not None:
            for stage_def in wizard_cfg.get("stages", []):
                if (
                    stage_def.get("schema")
                    and stage_def.get("capture_mode") in (None, "auto")
                ):
                    col = stage_def.get("collection_config") or {}
                    if col.get("capture_mode") in (None, "auto"):
                        stage_def["capture_mode"] = "extract"

        bot_config = {
            "llm": {"provider": "echo", "model": "echo-test"},
            "conversation_storage": {"backend": "memory"},
            "prompts": {
                "assistant": system_prompt,
            },
            "system_prompt": "assistant",
            "reasoning": {
                "strategy": "wizard",
                "wizard_config": wizard_cfg,
                "extraction_config": {
                    "provider": "echo",
                    "model": "echo-extraction",
                },
            },
        }

    # Create bot
    bot = await DynaBot.from_config(bot_config)

    # Close the original provider created by from_config() — we replace
    # it with a fresh EchoProvider that has a clean response queue.
    original_provider = bot.llm
    if hasattr(original_provider, "close"):
        await original_provider.close()

    # Create a fresh provider with known state
    provider = EchoProvider(
        {"provider": "echo", "model": "echo-test"},
        strict_tools=strict_tools,
        strict=strict,
    )
    if main_responses:
        provider.set_responses(main_responses)

    # Inject fresh provider and extractor
    inject_providers(bot, main_provider=provider, extractor=extractor)

    # Register tools if provided
    if tools:
        for tool in tools:
            bot.tool_registry.register_tool(tool)

    # Append middleware if provided
    if middleware:
        for mw in middleware:
            bot.middleware.append(mw)

    context = BotContext(
        conversation_id=conversation_id,
        client_id=client_id,
    )

    return cls(
        bot=bot,
        provider=provider,
        extractor=extractor,
        context=context,
    )
chat async
chat(message: str, **kwargs: Any) -> TurnResult

Run a chat turn and capture wizard state.

Parameters:

Name Type Description Default
message str

User message.

required
**kwargs Any

Additional kwargs passed to bot.chat().

{}

Returns:

Type Description
TurnResult

TurnResult with response and wizard state snapshot.

Source code in packages/bots/src/dataknobs_bots/testing.py
async def chat(self, message: str, **kwargs: Any) -> TurnResult:
    """Run a chat turn and capture wizard state.

    Args:
        message: User message.
        **kwargs: Additional kwargs passed to ``bot.chat()``.

    Returns:
        ``TurnResult`` with response and wizard state snapshot.
    """
    response = await self._bot.chat(message, self._context, **kwargs)
    self._turn_count += 1

    state = await self._bot.get_wizard_state(
        self._context.conversation_id,
    )
    result = TurnResult(
        response=response or "",
        wizard_stage=state["current_stage"] if state else None,
        wizard_data=state.get("data", {}) if state else {},
        wizard_state=state,
        turn_index=self._turn_count,
    )
    self._last_result = result
    return result
greet async
greet(**kwargs: Any) -> TurnResult

Run a greet turn and capture wizard state.

Parameters:

Name Type Description Default
**kwargs Any

Additional kwargs passed to bot.greet().

{}

Returns:

Type Description
TurnResult

TurnResult with response and wizard state snapshot.

Source code in packages/bots/src/dataknobs_bots/testing.py
async def greet(self, **kwargs: Any) -> TurnResult:
    """Run a greet turn and capture wizard state.

    Args:
        **kwargs: Additional kwargs passed to ``bot.greet()``.

    Returns:
        ``TurnResult`` with response and wizard state snapshot.
    """
    response = await self._bot.greet(self._context, **kwargs)
    self._turn_count += 1

    state = await self._bot.get_wizard_state(
        self._context.conversation_id,
    )
    result = TurnResult(
        response=response or "",
        wizard_stage=state["current_stage"] if state else None,
        wizard_data=state.get("data", {}) if state else {},
        wizard_state=state,
        turn_index=self._turn_count,
    )
    self._last_result = result
    return result
close async
close() -> None

Close the bot and release resources.

Source code in packages/bots/src/dataknobs_bots/testing.py
async def close(self) -> None:
    """Close the bot and release resources."""
    await self._bot.close()

CaptureReplay

CaptureReplay(data: dict[str, Any])

Loads a capture JSON file and creates pre-loaded EchoProviders.

Capture files contain serialized LLM request/response pairs from real provider runs, organized by turn. CaptureReplay deserializes these and creates EchoProviders queued with the correct responses, enabling deterministic replay of captured conversations.

Attributes:

Name Type Description
metadata dict[str, Any]

Capture session metadata (description, model info, timestamps)

turns list[dict[str, Any]]

List of turn dicts with wizard state, user messages, bot responses

format_version str

Capture file format version

Example
replay = CaptureReplay.from_file("captures/quiz_basic.json")

# Get providers for replay
main = replay.main_provider()
extraction = replay.extraction_provider()

# Or inject directly into a bot
replay.inject_into_bot(bot)

Methods:

Name Description
from_file

Load a capture replay from a JSON file.

from_dict

Create a CaptureReplay from a dict (e.g., already-parsed JSON).

main_provider

Create an EchoProvider queued with main-role responses.

extraction_provider

Create an EchoProvider queued with extraction-role responses.

inject_into_bot

Replace providers on a DynaBot with capture-replay EchoProviders.

Source code in packages/bots/src/dataknobs_bots/testing.py
def __init__(
    self,
    data: dict[str, Any],
) -> None:
    self.format_version: str = data.get("format_version", "1.0")
    self.metadata: dict[str, Any] = data.get("metadata", {})
    self.turns: list[dict[str, Any]] = data.get("turns", [])
    self._data = data

    # Pre-separate LLM calls by role for provider creation
    self._main_responses: list[LLMResponse] = []
    self._extraction_responses: list[LLMResponse] = []
    self._parse_calls()
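A minimal capture dict matching the top-level keys the constructor reads. This is a hand-written sketch, not a real capture file; the per-turn structure is parsed by `_parse_calls()`, which is not shown in this excerpt, so `turns` is left empty:

```python
# Top-level capture-file shape, mirroring CaptureReplay.__init__ above.
# The contents of each turn entry are an assumption not covered here.
capture = {
    "format_version": "1.0",
    "metadata": {"description": "quiz run", "model": "echo-test"},
    "turns": [],
}

# __init__ falls back to these defaults when keys are missing:
format_version = capture.get("format_version", "1.0")
metadata = capture.get("metadata", {})
turns = capture.get("turns", [])
```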
Functions
from_file classmethod
from_file(path: str | Path) -> CaptureReplay

Load a capture replay from a JSON file.

Parameters:

Name Type Description Default
path str | Path

Path to the capture JSON file

required

Returns:

Type Description
CaptureReplay

CaptureReplay instance

Raises:

Type Description
FileNotFoundError

If the file does not exist

JSONDecodeError

If the file is not valid JSON

Source code in packages/bots/src/dataknobs_bots/testing.py
@classmethod
def from_file(cls, path: str | Path) -> CaptureReplay:
    """Load a capture replay from a JSON file.

    Args:
        path: Path to the capture JSON file

    Returns:
        CaptureReplay instance

    Raises:
        FileNotFoundError: If the file does not exist
        json.JSONDecodeError: If the file is not valid JSON
    """
    with open(path) as f:
        data = json.load(f)
    return cls(data)
from_dict classmethod
from_dict(data: dict[str, Any]) -> CaptureReplay

Create a CaptureReplay from a dict (e.g., already-parsed JSON).

Parameters:

Name Type Description Default
data dict[str, Any]

Capture data dict

required

Returns:

Type Description
CaptureReplay

CaptureReplay instance

Source code in packages/bots/src/dataknobs_bots/testing.py
@classmethod
def from_dict(cls, data: dict[str, Any]) -> CaptureReplay:
    """Create a CaptureReplay from a dict (e.g., already-parsed JSON).

    Args:
        data: Capture data dict

    Returns:
        CaptureReplay instance
    """
    return cls(data)
main_provider
main_provider() -> EchoProvider

Create an EchoProvider queued with main-role responses.

Returns:

Type Description
EchoProvider

EchoProvider with responses in capture order

Source code in packages/bots/src/dataknobs_bots/testing.py
def main_provider(self) -> EchoProvider:
    """Create an EchoProvider queued with main-role responses.

    Returns:
        EchoProvider with responses in capture order
    """
    provider = EchoProvider({"provider": "echo", "model": "capture-replay"})
    if self._main_responses:
        provider.set_responses(self._main_responses)
    return provider
extraction_provider
extraction_provider() -> EchoProvider

Create an EchoProvider queued with extraction-role responses.

Returns:

Type Description
EchoProvider

EchoProvider with responses in capture order

Source code in packages/bots/src/dataknobs_bots/testing.py
def extraction_provider(self) -> EchoProvider:
    """Create an EchoProvider queued with extraction-role responses.

    Returns:
        EchoProvider with responses in capture order
    """
    provider = EchoProvider({"provider": "echo", "model": "capture-replay"})
    if self._extraction_responses:
        provider.set_responses(self._extraction_responses)
    return provider
inject_into_bot
inject_into_bot(bot: Any) -> None

Replace providers on a DynaBot with capture-replay EchoProviders.

Creates main and extraction EchoProviders from the captured data and injects them into the bot using inject_providers.

Parameters:

Name Type Description Default
bot Any

A DynaBot instance

required
Source code in packages/bots/src/dataknobs_bots/testing.py
def inject_into_bot(self, bot: Any) -> None:
    """Replace providers on a DynaBot with capture-replay EchoProviders.

    Creates main and extraction EchoProviders from the captured data
    and injects them into the bot using ``inject_providers``.

    Args:
        bot: A DynaBot instance
    """
    inject_providers(
        bot,
        main_provider=self.main_provider(),
        extraction_provider=self.extraction_provider() if self._extraction_responses else None,
    )

TurnResult dataclass

TurnResult(
    response: str,
    wizard_stage: str | None = None,
    wizard_data: dict[str, Any] = dict(),
    wizard_state: dict[str, Any] | None = None,
    turn_index: int = 0,
)

Result of a single bot.chat() or bot.greet() turn.

Captures the bot response along with a snapshot of wizard state at the end of the turn.

Attributes:

Name Type Description
response str

Bot response text.

wizard_stage str | None

Current wizard stage after this turn, or None if no wizard.

wizard_data dict[str, Any]

Wizard state data dict after this turn.

wizard_state dict[str, Any] | None

Full normalized wizard state after this turn, or None.

turn_index int

One-based turn index (1 = first turn).

Attributes
response instance-attribute
response: str

Bot response text.

wizard_stage class-attribute instance-attribute
wizard_stage: str | None = None

Current wizard stage after this turn, or None if no wizard.

wizard_data class-attribute instance-attribute
wizard_data: dict[str, Any] = field(default_factory=dict)

Wizard state data dict after this turn.

wizard_state class-attribute instance-attribute
wizard_state: dict[str, Any] | None = None

Full normalized wizard state after this turn, or None.

turn_index class-attribute instance-attribute
turn_index: int = 0

One-based turn index (1 = first turn).

WizardConfigBuilder

WizardConfigBuilder(name: str, version: str = '1.0')

Fluent builder for wizard configuration dicts.

Replaces verbose inline dict construction (40+ lines) with a readable chained API. Performs build-time validation to catch common mistakes.

Example
config = (WizardConfigBuilder("my-wizard")
    .stage("gather", is_start=True, prompt="Tell me your name.")
        .field("name", field_type="string", required=True)
        .field("domain", field_type="string")
        .transition("done", "data.get('name') and data.get('domain')")
    .stage("done", is_end=True, prompt="All done!")
    .settings(extraction_scope="current_message")
    .build())

Methods:

Name Description
stage

Add a stage to the wizard config.

field

Add a field to the current stage's schema.

transition

Add a transition from the current stage.

settings

Set wizard-level settings.

build

Build and validate the wizard config dict.

Source code in packages/bots/src/dataknobs_bots/testing.py
def __init__(self, name: str, version: str = "1.0") -> None:
    self._name = name
    self._version = version
    self._stages: list[dict[str, Any]] = []
    self._settings: dict[str, Any] = {}
    self._current_stage: dict[str, Any] | None = None
Functions
stage
stage(
    name: str,
    *,
    is_start: bool = False,
    is_end: bool = False,
    prompt: str = "",
    response_template: str | None = None,
    mode: str | None = None,
    extraction_scope: str | None = None,
    auto_advance: bool | None = None,
    skip_extraction: bool | None = None,
    derivation_enabled: bool | None = None,
    recovery_enabled: bool | None = None,
    confirm_first_render: bool | None = None,
    confirm_on_new_data: bool | None = None,
    can_skip: bool | None = None,
    skip_default: bool | None = None,
    can_go_back: bool | None = None,
    reasoning: str | None = None,
    max_iterations: int | None = None,
    capture_mode: str | None = None,
    routing_transforms: list[str] | None = None,
    **extra_fields: Any,
) -> WizardConfigBuilder

Add a stage to the wizard config.

After calling stage(), subsequent field() and transition() calls apply to this stage.

Parameters:

Name Type Description Default
name str

Stage name (unique identifier).

required
is_start bool

Whether this is the start stage.

False
is_end bool

Whether this is an end stage.

False
prompt str

Stage prompt text.

''
response_template str | None

Jinja2 template rendered after extraction to confirm captured data.

None
mode str | None

Stage mode (e.g. "conversation").

None
extraction_scope str | None

Per-stage extraction scope override.

None
auto_advance bool | None

Per-stage auto-advance override.

None
skip_extraction bool | None

Whether to skip extraction on this stage.

None
derivation_enabled bool | None

Per-stage field derivation override. Set to False to suppress derivation on this stage.

None
recovery_enabled bool | None

Per-stage recovery pipeline override. Set to False to suppress all recovery on this stage.

None
confirm_first_render bool | None

Whether to pause for confirmation on first render when new data is extracted. Default True. Set to False to skip confirmation and evaluate transitions immediately.

None
confirm_on_new_data bool | None

Whether to re-confirm when schema property values change on subsequent renders.

None
can_skip bool | None

Whether the user can skip this stage.

None
skip_default bool | None

Default value to use when the stage is skipped.

None
can_go_back bool | None

Whether the user can navigate back from this stage.

None
reasoning str | None

Reasoning strategy for this stage (e.g. "react").

None
max_iterations int | None

Maximum ReAct iterations for this stage.

None
capture_mode str | None

Extraction capture mode — "auto" (default), "verbatim" (raw input), or "extract" (force LLM extraction).

None
routing_transforms list[str] | None

List of transform function names to execute before transition condition evaluation.

None
**extra_fields Any

Additional stage config fields passed through to the stage dict verbatim. Use for less common fields (e.g. llm_assist=True, navigation={...}) without needing explicit builder parameters.

{}

Returns:

Type Description
WizardConfigBuilder

Self for method chaining.

Source code in packages/bots/src/dataknobs_bots/testing.py
def stage(
    self,
    name: str,
    *,
    is_start: bool = False,
    is_end: bool = False,
    prompt: str = "",
    response_template: str | None = None,
    mode: str | None = None,
    extraction_scope: str | None = None,
    auto_advance: bool | None = None,
    skip_extraction: bool | None = None,
    derivation_enabled: bool | None = None,
    recovery_enabled: bool | None = None,
    confirm_first_render: bool | None = None,
    confirm_on_new_data: bool | None = None,
    can_skip: bool | None = None,
    skip_default: bool | None = None,
    can_go_back: bool | None = None,
    reasoning: str | None = None,
    max_iterations: int | None = None,
    capture_mode: str | None = None,
    routing_transforms: list[str] | None = None,
    **extra_fields: Any,
) -> WizardConfigBuilder:
    """Add a stage to the wizard config.

    After calling ``stage()``, subsequent ``field()`` and
    ``transition()`` calls apply to this stage.

    Args:
        name: Stage name (unique identifier).
        is_start: Whether this is the start stage.
        is_end: Whether this is an end stage.
        prompt: Stage prompt text.
        response_template: Jinja2 template rendered after extraction
            to confirm captured data.
        mode: Stage mode (e.g. ``"conversation"``).
        extraction_scope: Per-stage extraction scope override.
        auto_advance: Per-stage auto-advance override.
        skip_extraction: Whether to skip extraction on this stage.
        derivation_enabled: Per-stage field derivation override.
            Set to ``False`` to suppress derivation on this stage.
        recovery_enabled: Per-stage recovery pipeline override.
            Set to ``False`` to suppress all recovery on this stage.
        confirm_first_render: Whether to pause for confirmation on
            first render when new data is extracted. Default ``True``.
            Set to ``False`` to skip confirmation and evaluate
            transitions immediately.
        confirm_on_new_data: Whether to re-confirm when schema
            property values change on subsequent renders.
        can_skip: Whether the user can skip this stage.
        skip_default: Default value to use when the stage is skipped.
        can_go_back: Whether the user can navigate back from this
            stage.
        reasoning: Reasoning strategy for this stage
            (e.g. ``"react"``).
        max_iterations: Maximum ReAct iterations for this stage.
        capture_mode: Extraction capture mode — ``"auto"``
            (default), ``"verbatim"`` (raw input), or ``"extract"``
            (force LLM extraction).
        routing_transforms: List of transform function names to
            execute before transition condition evaluation.
        **extra_fields: Additional stage config fields passed through
            to the stage dict verbatim. Use for less common fields
            (e.g. ``llm_assist=True``, ``navigation={...}``) without
            needing explicit builder parameters.

    Returns:
        Self for method chaining.
    """
    # Finalize previous stage
    self._finalize_current_stage()

    stage: dict[str, Any] = {"name": name, "prompt": prompt}
    if is_start:
        stage["is_start"] = True
    if is_end:
        stage["is_end"] = True
    if response_template is not None:
        stage["response_template"] = response_template
    if mode is not None:
        stage["mode"] = mode
    if extraction_scope is not None:
        stage["extraction_scope"] = extraction_scope
    if auto_advance is not None:
        stage["auto_advance"] = auto_advance
    if skip_extraction is not None:
        stage["skip_extraction"] = skip_extraction
    if derivation_enabled is not None:
        stage["derivation_enabled"] = derivation_enabled
    if recovery_enabled is not None:
        stage["recovery_enabled"] = recovery_enabled
    if confirm_first_render is not None:
        stage["confirm_first_render"] = confirm_first_render
    if confirm_on_new_data is not None:
        stage["confirm_on_new_data"] = confirm_on_new_data
    if can_skip is not None:
        stage["can_skip"] = can_skip
    if skip_default is not None:
        stage["skip_default"] = skip_default
    if can_go_back is not None:
        stage["can_go_back"] = can_go_back
    if reasoning is not None:
        stage["reasoning"] = reasoning
    if max_iterations is not None:
        stage["max_iterations"] = max_iterations
    if capture_mode is not None:
        stage["capture_mode"] = capture_mode
    if routing_transforms is not None:
        stage["routing_transforms"] = routing_transforms
    if extra_fields:
        # Prevent accidental override of structural keys set by
        # positional/explicit parameters above.
        reserved = {"name", "prompt", "is_start", "is_end"}
        safe_fields = {
            k: v for k, v in extra_fields.items()
            if k not in reserved
        }
        stage.update(safe_fields)

    self._current_stage = stage
    return self
field
field(
    name: str,
    *,
    field_type: str = "string",
    required: bool = False,
    description: str | None = None,
    enum: list[str] | None = None,
    default: Any = None,
    x_extraction: dict[str, Any] | None = None,
) -> WizardConfigBuilder

Add a field to the current stage's schema.

Must be called after stage().

Parameters:

Name Type Description Default
name str

Field name.

required
field_type str

JSON Schema type ("string", "integer", etc.).

'string'
required bool

Whether this field is required.

False
description str | None

Field description.

None
enum list[str] | None

Allowed values.

None
default Any

Default value.

None
x_extraction dict[str, Any] | None

Extraction hints (x-extraction schema extension).

None

Returns:

Type Description
WizardConfigBuilder

Self for method chaining.

Raises:

Type Description
ValueError

If no current stage is set.

Source code in packages/bots/src/dataknobs_bots/testing.py
def field(
    self,
    name: str,
    *,
    field_type: str = "string",
    required: bool = False,
    description: str | None = None,
    enum: list[str] | None = None,
    default: Any = None,
    x_extraction: dict[str, Any] | None = None,
) -> WizardConfigBuilder:
    """Add a field to the current stage's schema.

    Must be called after ``stage()``.

    Args:
        name: Field name.
        field_type: JSON Schema type (``"string"``, ``"integer"``, etc.).
        required: Whether this field is required.
        description: Field description.
        enum: Allowed values.
        default: Default value.
        x_extraction: Extraction hints (``x-extraction`` schema extension).

    Returns:
        Self for method chaining.

    Raises:
        ValueError: If no current stage is set.
    """
    if self._current_stage is None:
        raise ValueError("field() must be called after stage()")

    schema = self._current_stage.setdefault("schema", {
        "type": "object",
        "properties": {},
        "required": [],
    })
    props = schema.setdefault("properties", {})

    field_def: dict[str, Any] = {"type": field_type}
    if description is not None:
        field_def["description"] = description
    if enum is not None:
        field_def["enum"] = enum
    if default is not None:
        field_def["default"] = default
    if x_extraction is not None:
        field_def["x-extraction"] = x_extraction

    props[name] = field_def

    if required:
        req_list = schema.setdefault("required", [])
        if name not in req_list:
            req_list.append(name)

    return self
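Successive `field()` calls accumulate into a single JSON Schema object on the stage. The plain-dict sketch below reproduces that accretion (shape taken from the `field()` source above) without the builder class:

```python
# Schema dict built up by two field() calls on one stage:
# field("name", required=True) then field("domain").
schema: dict = {"type": "object", "properties": {}, "required": []}

for name, ftype, required in [
    ("name", "string", True),
    ("domain", "string", False),
]:
    schema["properties"][name] = {"type": ftype}
    if required and name not in schema["required"]:
        schema["required"].append(name)
```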
transition
transition(
    target: str, condition: str | None = None, priority: int | None = None
) -> WizardConfigBuilder

Add a transition from the current stage.

Must be called after stage().

Parameters:

Name Type Description Default
target str

Target stage name.

required
condition str | None

Python expression evaluated against wizard state.

None
priority int | None

Transition evaluation priority (lower = first).

None

Returns:

Type Description
WizardConfigBuilder

Self for method chaining.

Raises:

Type Description
ValueError

If no current stage is set.

Source code in packages/bots/src/dataknobs_bots/testing.py
def transition(
    self,
    target: str,
    condition: str | None = None,
    priority: int | None = None,
) -> WizardConfigBuilder:
    """Add a transition from the current stage.

    Must be called after ``stage()``.

    Args:
        target: Target stage name.
        condition: Python expression evaluated against wizard state.
        priority: Transition evaluation priority (lower = first).

    Returns:
        Self for method chaining.

    Raises:
        ValueError: If no current stage is set.
    """
    if self._current_stage is None:
        raise ValueError("transition() must be called after stage()")

    transitions = self._current_stage.setdefault("transitions", [])
    t: dict[str, Any] = {"target": target}
    if condition is not None:
        t["condition"] = condition
    if priority is not None:
        t["priority"] = priority
    transitions.append(t)
    return self
settings
settings(**kwargs: Any) -> WizardConfigBuilder

Set wizard-level settings.

Parameters:

Name Type Description Default
**kwargs Any

Settings key-value pairs (e.g. extraction_scope="current_message", scope_escalation={"enabled": True}).

{}

Returns:

Type Description
WizardConfigBuilder

Self for method chaining.

Source code in packages/bots/src/dataknobs_bots/testing.py
def settings(self, **kwargs: Any) -> WizardConfigBuilder:
    """Set wizard-level settings.

    Args:
        **kwargs: Settings key-value pairs (e.g.
            ``extraction_scope="current_message"``,
            ``scope_escalation={"enabled": True}``).

    Returns:
        Self for method chaining.
    """
    self._settings.update(kwargs)
    return self
build
build() -> dict[str, Any]

Build and validate the wizard config dict.

Returns:

Type Description
dict[str, Any]

Complete wizard configuration dict compatible with WizardConfigLoader.load_from_dict().

Raises:

Type Description
ValueError

If validation fails (no start stage, no end stage, transition to nonexistent stage).

Source code in packages/bots/src/dataknobs_bots/testing.py
def build(self) -> dict[str, Any]:
    """Build and validate the wizard config dict.

    Returns:
        Complete wizard configuration dict compatible with
        ``WizardConfigLoader.load_from_dict()``.

    Raises:
        ValueError: If validation fails (no start stage, no end stage,
            transition to nonexistent stage).
    """
    self._finalize_current_stage()

    config: dict[str, Any] = {
        "name": self._name,
        "version": self._version,
        "stages": list(self._stages),
    }
    if self._settings:
        config["settings"] = dict(self._settings)

    self._validate(config)
    return config
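For reference, the dict that `build()` emits for the two-stage example in the class docstring has this shape. It is assembled by hand here from the `stage()`, `field()`, `transition()`, and `build()` sources shown above, not captured from a real run:

```python
# Expected output shape of WizardConfigBuilder("my-wizard")...build()
# for the docstring example (hand-written sketch).
expected = {
    "name": "my-wizard",
    "version": "1.0",
    "stages": [
        {
            "name": "gather",
            "prompt": "Tell me your name.",
            "is_start": True,
            "schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "domain": {"type": "string"},
                },
                "required": ["name"],
            },
            "transitions": [
                {
                    "target": "done",
                    "condition": "data.get('name') and data.get('domain')",
                },
            ],
        },
        {"name": "done", "prompt": "All done!", "is_end": True},
    ],
    "settings": {"extraction_scope": "current_message"},
}
```

Note that `settings` only appears when at least one `settings()` call was made, matching the `if self._settings:` guard in `build()`.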

AddKBResourceTool

AddKBResourceTool(knowledge_dir: Path | None = None)

Bases: ContextAwareTool

Tool for adding a resource to the knowledge base resource list.

Supports adding file references (from the source directory) or inline content that gets written to the knowledge directory.

Wizard data read/written:

- _kb_resources: list[dict] — resource list (append)
- domain_id: str — used for knowledge directory organization

Attributes:

Name Type Description
_knowledge_dir

Optional base directory for writing inline content.

Initialize the tool.

Parameters:

Name Type Description Default
knowledge_dir Path | None

Base directory for knowledge files. Used when writing inline content to disk. Resolved from wizard data _knowledge_dir if not provided here.

None

Methods:

Name Description
catalog_metadata

Return catalog metadata for this tool class.

execute_with_context

Add a KB resource.

Source code in packages/bots/src/dataknobs_bots/tools/kb_tools.py
def __init__(self, knowledge_dir: Path | None = None) -> None:
    """Initialize the tool.

    Args:
        knowledge_dir: Base directory for knowledge files. Used when
            writing inline content to disk. Resolved from wizard data
            ``_knowledge_dir`` if not provided here.
    """
    super().__init__(
        name="add_kb_resource",
        description=(
            "Add a resource to the bot's knowledge base. Can add "
            "a file from the source directory or inline content."
        ),
    )
    self._knowledge_dir = knowledge_dir
Attributes
schema property
schema: dict[str, Any]

Return JSON Schema for tool parameters.

Functions
catalog_metadata classmethod
catalog_metadata() -> dict[str, Any]

Return catalog metadata for this tool class.

Source code in packages/bots/src/dataknobs_bots/tools/kb_tools.py
@classmethod
def catalog_metadata(cls) -> dict[str, Any]:
    """Return catalog metadata for this tool class."""
    return {
        "name": "add_kb_resource",
        "description": (
            "Add a resource to the bot's knowledge base."
        ),
        "tags": ("configbot", "kb"),
    }
execute_with_context async
execute_with_context(
    context: ToolExecutionContext,
    path: str,
    title: str = "",
    resource_type: str = "file",
    content: str | None = None,
    description: str | None = None,
    **kwargs: Any,
) -> dict[str, Any]

Add a KB resource.

Parameters:

Name Type Description Default
context ToolExecutionContext

Execution context with wizard state.

required
path str

Resource path or filename.

required
title str

Optional display title.

''
resource_type str

Type of resource ('file' or 'inline').

'file'
content str | None

Inline content (required if resource_type='inline').

None
description str | None

Optional resource description.

None

Returns:

Type Description
dict[str, Any]

Dict with add result.

Source code in packages/bots/src/dataknobs_bots/tools/kb_tools.py
async def execute_with_context(
    self,
    context: ToolExecutionContext,
    path: str,
    title: str = "",
    resource_type: str = "file",
    content: str | None = None,
    description: str | None = None,
    **kwargs: Any,
) -> dict[str, Any]:
    """Add a KB resource.

    Args:
        context: Execution context with wizard state.
        path: Resource path or filename.
        title: Optional display title.
        resource_type: Type of resource ('file' or 'inline').
        content: Inline content (required if resource_type='inline').
        description: Optional resource description.

    Returns:
        Dict with add result.
    """
    wizard_data = _get_wizard_data_ref(context)
    resources: list[dict[str, Any]] = wizard_data.setdefault(
        "_kb_resources", []
    )

    # Check for duplicate
    existing_paths = {r["path"] for r in resources}
    if path in existing_paths:
        return {
            "success": False,
            "error": f"Resource already exists: {path}",
            "existing_resources": len(resources),
        }

    resource: dict[str, Any] = {
        "path": path,
        "type": resource_type,
    }
    if title:
        resource["title"] = title
    if description:
        resource["description"] = description

    # Handle inline content — write to knowledge directory
    if resource_type == "inline":
        if not content:
            return {
                "success": False,
                "error": "Content is required for inline resources",
            }
        kb_dir = _resolve_knowledge_dir(self._knowledge_dir, wizard_data)
        if kb_dir is None:
            return {
                "success": False,
                "error": "No knowledge directory configured",
            }
        domain_id = wizard_data.get("domain_id", "default")
        target_dir = kb_dir / domain_id
        target_dir.mkdir(parents=True, exist_ok=True)
        target_path = target_dir / path
        target_path.write_text(content, encoding="utf-8")
        resource["source"] = str(target_path)
        logger.debug(
            "Wrote inline resource: %s",
            target_path,
            extra={"conversation_id": context.conversation_id},
        )

    resources.append(resource)

    logger.debug(
        "Added KB resource: %s (type=%s)",
        path,
        resource_type,
        extra={"conversation_id": context.conversation_id},
    )

    return {
        "success": True,
        "resource": resource,
        "total_resources": len(resources),
    }
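The duplicate-path guard at the top of `execute_with_context` can be sketched as plain dict logic, independent of the `ToolExecutionContext` machinery (names and result keys mirror the source above):

```python
# Standalone sketch of the duplicate check and append performed by
# AddKBResourceTool.execute_with_context (file resources only).
resources = [{"path": "faq.md", "type": "file"}]

def add_resource(path: str, resource_type: str = "file") -> dict:
    existing_paths = {r["path"] for r in resources}
    if path in existing_paths:
        return {
            "success": False,
            "error": f"Resource already exists: {path}",
            "existing_resources": len(resources),
        }
    resource = {"path": path, "type": resource_type}
    resources.append(resource)
    return {
        "success": True,
        "resource": resource,
        "total_resources": len(resources),
    }

dup = add_resource("faq.md")      # rejected: path already present
ok = add_resource("glossary.md")  # appended
```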

CheckKnowledgeSourceTool

CheckKnowledgeSourceTool()

Bases: ContextAwareTool

Tool for verifying a knowledge source directory exists and has content.

Checks the specified path for files matching common document patterns and records the results in wizard data for subsequent tools.

Wizard data written:

- source_verified: bool — whether the source was found
- files_found: list[str] — matching file names
- _source_path_resolved: str — the resolved absolute path
- _kb_resources: list[dict] — initialized if not present

Initialize the tool.

Methods:

Name Description
catalog_metadata

Return catalog metadata for this tool class.

execute_with_context

Check the knowledge source directory.

Attributes:

Name Type Description
schema dict[str, Any]

Return JSON Schema for tool parameters.

Source code in packages/bots/src/dataknobs_bots/tools/kb_tools.py
def __init__(self) -> None:
    """Initialize the tool."""
    super().__init__(
        name="check_knowledge_source",
        description=(
            "Check if a knowledge source directory exists and contains "
            "files that can be used for the knowledge base."
        ),
    )
Attributes
schema property
schema: dict[str, Any]

Return JSON Schema for tool parameters.

Functions
catalog_metadata classmethod
catalog_metadata() -> dict[str, Any]

Return catalog metadata for this tool class.

Source code in packages/bots/src/dataknobs_bots/tools/kb_tools.py
@classmethod
def catalog_metadata(cls) -> dict[str, Any]:
    """Return catalog metadata for this tool class."""
    return {
        "name": "check_knowledge_source",
        "description": (
            "Check if a knowledge source directory exists and "
            "contains usable files."
        ),
        "tags": ("configbot", "kb"),
    }
execute_with_context async
execute_with_context(
    context: ToolExecutionContext,
    source_path: str,
    file_patterns: list[str] | None = None,
    **kwargs: Any,
) -> dict[str, Any]

Check the knowledge source directory.

Parameters:

Name Type Description Default
context ToolExecutionContext

Execution context with wizard state.

required
source_path str

Path to the knowledge source directory.

required
file_patterns list[str] | None

Optional glob patterns to match files.

None

Returns:

Type Description
dict[str, Any]

Dict with verification results.

Source code in packages/bots/src/dataknobs_bots/tools/kb_tools.py
async def execute_with_context(
    self,
    context: ToolExecutionContext,
    source_path: str,
    file_patterns: list[str] | None = None,
    **kwargs: Any,
) -> dict[str, Any]:
    """Check the knowledge source directory.

    Args:
        context: Execution context with wizard state.
        source_path: Path to the knowledge source directory.
        file_patterns: Optional glob patterns to match files.

    Returns:
        Dict with verification results.
    """
    wizard_data = _get_wizard_data_ref(context)
    patterns = file_patterns or _DEFAULT_GLOB_PATTERNS

    path = Path(source_path).expanduser().resolve()
    if not path.exists() or not path.is_dir():
        wizard_data["source_verified"] = False
        wizard_data["files_found"] = []
        logger.debug(
            "Knowledge source not found: %s",
            path,
            extra={"conversation_id": context.conversation_id},
        )
        return {
            "exists": False,
            "error": f"Directory not found: {source_path}",
            "files_found": [],
        }

    # Find matching files
    found_files: list[str] = []
    for pattern in patterns:
        for match in path.glob(pattern):
            if match.is_file():
                found_files.append(match.name)
    found_files = sorted(set(found_files))

    # Update wizard data
    wizard_data["source_verified"] = True
    wizard_data["files_found"] = found_files
    wizard_data["_source_path_resolved"] = str(path)
    if "_kb_resources" not in wizard_data:
        wizard_data["_kb_resources"] = []

    # Auto-populate _kb_resources with discovered files
    resources: list[dict[str, Any]] = wizard_data["_kb_resources"]
    existing_paths = {r.get("path") for r in resources}
    for fname in found_files:
        if fname not in existing_paths:
            resources.append(
                {"path": fname, "type": "file", "source": str(path / fname)}
            )

    logger.debug(
        "Checked knowledge source: %s (%d files)",
        path,
        len(found_files),
        extra={"conversation_id": context.conversation_id},
    )

    return {
        "exists": True,
        "path": str(path),
        "files_found": found_files,
        "file_count": len(found_files),
        "patterns_checked": patterns,
    }
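The directory check and glob-based discovery above can be sketched with the stdlib alone. The default patterns (_DEFAULT_GLOB_PATTERNS) are not shown in this snippet, so the "*.md"/"*.txt" defaults below are placeholder assumptions:

```python
import pathlib
import tempfile

def check_source(source_path: str, patterns: tuple[str, ...] = ("*.md", "*.txt")) -> dict:
    """Verify a directory exists and list files matching glob patterns (sketch)."""
    path = pathlib.Path(source_path).expanduser().resolve()
    if not path.is_dir():
        return {"exists": False, "files_found": []}
    # Deduplicate names across patterns, keep files only, sort for stable output
    found = sorted({m.name for pat in patterns for m in path.glob(pat) if m.is_file()})
    return {"exists": True, "path": str(path), "files_found": found, "file_count": len(found)}

tmp = tempfile.mkdtemp()
pathlib.Path(tmp, "faq.md").write_text("...", encoding="utf-8")
pathlib.Path(tmp, "notes.txt").write_text("...", encoding="utf-8")
result = check_source(tmp)
missing = check_source(tmp + "/does-not-exist")
```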

GetTemplateDetailsTool

GetTemplateDetailsTool(template_registry: ConfigTemplateRegistry)

Bases: ContextAwareTool

Tool for getting detailed information about a template.

Returns the full template definition including all variables, their types, defaults, and constraints.

Attributes:

Name Type Description
_registry

Template registry to query.

Initialize the tool.

Parameters:

Name Type Description Default
template_registry ConfigTemplateRegistry

Registry containing available templates.

required

Methods:

Name Description
catalog_metadata

Return catalog metadata for this tool class.

from_config

Create from YAML-compatible configuration.

execute_with_context

Get template details.

Source code in packages/bots/src/dataknobs_bots/tools/config_tools.py
def __init__(self, template_registry: ConfigTemplateRegistry) -> None:
    """Initialize the tool.

    Args:
        template_registry: Registry containing available templates.
    """
    super().__init__(
        name="get_template_details",
        description=(
            "Get detailed information about a specific configuration "
            "template, including all variables and their requirements."
        ),
    )
    self._registry = template_registry
Attributes
schema property
schema: dict[str, Any]

Return JSON Schema for tool parameters.

Functions
catalog_metadata classmethod
catalog_metadata() -> dict[str, Any]

Return catalog metadata for this tool class.

Source code in packages/bots/src/dataknobs_bots/tools/config_tools.py
@classmethod
def catalog_metadata(cls) -> dict[str, Any]:
    """Return catalog metadata for this tool class."""
    return {
        "name": "get_template_details",
        "description": (
            "Get detailed information about a specific "
            "configuration template."
        ),
        "tags": ("configbot",),
        "requires": ("template_registry",),
        "default_params": {"template_dir": "configs/templates"},
    }
from_config classmethod
from_config(config: dict[str, Any]) -> GetTemplateDetailsTool

Create from YAML-compatible configuration.

Parameters:

Name Type Description Default
config dict[str, Any]

Dict with template_dir key pointing to a directory containing template YAML files.

required

Returns:

Type Description
GetTemplateDetailsTool

Configured GetTemplateDetailsTool instance.

Source code in packages/bots/src/dataknobs_bots/tools/config_tools.py
@classmethod
def from_config(cls, config: dict[str, Any]) -> GetTemplateDetailsTool:
    """Create from YAML-compatible configuration.

    Args:
        config: Dict with ``template_dir`` key pointing to a
            directory containing template YAML files.

    Returns:
        Configured GetTemplateDetailsTool instance.
    """
    from pathlib import Path

    template_dir = config.get("template_dir", "configs/templates")
    registry = ConfigTemplateRegistry()
    path = Path(template_dir)
    if path.is_dir():
        registry.load_from_directory(path)
    return cls(template_registry=registry)
execute_with_context async
execute_with_context(
    context: ToolExecutionContext, template_name: str, **kwargs: Any
) -> dict[str, Any]

Get template details.

Parameters:

Name Type Description Default
context ToolExecutionContext

Execution context.

required
template_name str

Name of the template.

required

Returns:

Type Description
dict[str, Any]

Dict with template details, or error if not found.

Source code in packages/bots/src/dataknobs_bots/tools/config_tools.py
async def execute_with_context(
    self,
    context: ToolExecutionContext,
    template_name: str,
    **kwargs: Any,
) -> dict[str, Any]:
    """Get template details.

    Args:
        context: Execution context.
        template_name: Name of the template.

    Returns:
        Dict with template details, or error if not found.
    """
    template = self._registry.get(template_name)
    if template is None:
        return {
            "error": f"Template not found: {template_name}",
            "available": [
                t.name for t in self._registry.list_templates()
            ],
        }

    logger.debug(
        "Retrieved template details: %s",
        template_name,
        extra={"conversation_id": context.conversation_id},
    )

    return {
        "name": template.name,
        "description": template.description,
        "version": template.version,
        "tags": template.tags,
        "variables": [v.to_dict() for v in template.variables],
        "required_variables": [
            v.to_dict() for v in template.get_required_variables()
        ],
        "optional_variables": [
            v.to_dict() for v in template.get_optional_variables()
        ],
    }
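The miss-path behavior above (report the error plus what is available) is worth noting when driving this tool from an LLM. A stdlib-only sketch with a stand-in registry (TinyTemplateRegistry is hypothetical; the real ConfigTemplateRegistry has a richer surface):

```python
class TinyTemplateRegistry:
    """Minimal stand-in for the registry's get/list surface (sketch)."""

    def __init__(self) -> None:
        self._templates: dict[str, dict] = {}

    def register(self, template: dict) -> None:
        self._templates[template["name"]] = template

    def get(self, name: str):
        return self._templates.get(name)

    def list_names(self) -> list[str]:
        return sorted(self._templates)

def get_template_details(registry: TinyTemplateRegistry, name: str) -> dict:
    template = registry.get(name)
    if template is None:
        # Mirror the tool's miss-path: name the error and list what IS available
        return {"error": f"Template not found: {name}", "available": registry.list_names()}
    return dict(template)

registry = TinyTemplateRegistry()
registry.register({"name": "support-bot", "description": "RAG support bot"})
hit = get_template_details(registry, "support-bot")
miss = get_template_details(registry, "sales-bot")
```

Returning the available names on a miss lets the LLM self-correct instead of retrying blindly.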

IngestKnowledgeBaseTool

IngestKnowledgeBaseTool(knowledge_dir: Path | None = None)

Bases: ContextAwareTool

Tool for writing the KB ingestion manifest and finalizing KB config.

Writes a manifest.json file listing resources and chunking parameters, and updates wizard data with the final KB configuration for inclusion in the bot config.

Wizard data read:

- _kb_resources: list[dict] — resources to include
- domain_id: str — domain identifier
- files_found: list[str] — auto-discovered files (fallback)
- _source_path_resolved: str — resolved source path

Wizard data written:

- kb_config: dict — final KB configuration for the bot config
- kb_resources: list[dict] — finalized resource list (public key)
- ingestion_complete: bool — whether ingestion manifest was written

Attributes:

Name Type Description
_knowledge_dir

Optional base directory for knowledge files.

Initialize the tool.

Parameters:

Name Type Description Default
knowledge_dir Path | None

Base directory for knowledge files. Resolved from wizard data _knowledge_dir if not provided here.

None

Methods:

Name Description
catalog_metadata

Return catalog metadata for this tool class.

execute_with_context

Write ingestion manifest and finalize KB config.

Source code in packages/bots/src/dataknobs_bots/tools/kb_tools.py
def __init__(self, knowledge_dir: Path | None = None) -> None:
    """Initialize the tool.

    Args:
        knowledge_dir: Base directory for knowledge files. Resolved
            from wizard data ``_knowledge_dir`` if not provided here.
    """
    super().__init__(
        name="ingest_knowledge_base",
        description=(
            "Finalize and ingest the knowledge base resources. "
            "Writes an ingestion manifest and prepares the KB "
            "configuration for the bot."
        ),
    )
    self._knowledge_dir = knowledge_dir
Attributes
schema property
schema: dict[str, Any]

Return JSON Schema for tool parameters.

Functions
catalog_metadata classmethod
catalog_metadata() -> dict[str, Any]

Return catalog metadata for this tool class.

Source code in packages/bots/src/dataknobs_bots/tools/kb_tools.py
@classmethod
def catalog_metadata(cls) -> dict[str, Any]:
    """Return catalog metadata for this tool class."""
    return {
        "name": "ingest_knowledge_base",
        "description": (
            "Finalize and ingest the knowledge base resources."
        ),
        "tags": ("configbot", "kb"),
    }
execute_with_context async
execute_with_context(
    context: ToolExecutionContext, chunk_size: int = 512, **kwargs: Any
) -> dict[str, Any]

Write ingestion manifest and finalize KB config.

Parameters:

Name Type Description Default
context ToolExecutionContext

Execution context with wizard state.

required
chunk_size int

Size of text chunks.

512

Returns:

Type Description
dict[str, Any]

Dict with ingestion result.

Source code in packages/bots/src/dataknobs_bots/tools/kb_tools.py
async def execute_with_context(
    self,
    context: ToolExecutionContext,
    chunk_size: int = 512,
    **kwargs: Any,
) -> dict[str, Any]:
    """Write ingestion manifest and finalize KB config.

    Args:
        context: Execution context with wizard state.
        chunk_size: Size of text chunks.

    Returns:
        Dict with ingestion result.
    """
    wizard_data = _get_wizard_data_ref(context)
    domain_id = wizard_data.get("domain_id", "default")
    resources = wizard_data.get("_kb_resources", [])
    source_path = wizard_data.get("_source_path_resolved")

    # Fallback: if no explicit resources, use auto-discovered files
    if not resources and wizard_data.get("files_found"):
        resources = [
            {"path": f, "type": "file"}
            for f in wizard_data["files_found"]
        ]

    if not resources:
        return {
            "success": False,
            "error": "No resources to ingest. Add resources first.",
        }

    kb_dir = _resolve_knowledge_dir(self._knowledge_dir, wizard_data)
    if kb_dir is None:
        return {
            "success": False,
            "error": "No knowledge directory configured",
        }

    # Write manifest
    manifest_dir = kb_dir / domain_id
    manifest_dir.mkdir(parents=True, exist_ok=True)
    manifest = {
        "domain_id": domain_id,
        "source_path": source_path,
        "resources": resources,
        "chunking": {
            "chunk_size": chunk_size,
        },
    }
    manifest_path = manifest_dir / "manifest.json"
    manifest_path.write_text(
        json.dumps(manifest, indent=2), encoding="utf-8"
    )

    # Build KB config for the bot configuration
    kb_config: dict[str, Any] = {
        "enabled": True,
        "type": "rag",
        "documents_path": str(manifest_dir),
        "chunking": {
            "chunk_size": chunk_size,
        },
    }

    # Update wizard data with finalized KB config
    wizard_data["kb_config"] = kb_config
    wizard_data["kb_resources"] = resources
    wizard_data["ingestion_complete"] = True

    logger.info(
        "Wrote KB manifest for '%s' with %d resources",
        domain_id,
        len(resources),
        extra={
            "domain_id": domain_id,
            "resource_count": len(resources),
            "conversation_id": context.conversation_id,
        },
    )

    return {
        "success": True,
        "domain_id": domain_id,
        "manifest_path": str(manifest_path),
        "resource_count": len(resources),
        "chunk_size": chunk_size,
    }
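The manifest layout written above (<kb_dir>/<domain_id>/manifest.json with resources and chunking parameters) can be sketched directly from the source:

```python
import json
import pathlib
import tempfile

def write_manifest(kb_dir: str, domain_id: str, resources: list[dict],
                   chunk_size: int = 512) -> pathlib.Path:
    """Write a manifest.json under kb_dir/domain_id (sketch of the tool's output)."""
    manifest_dir = pathlib.Path(kb_dir) / domain_id
    manifest_dir.mkdir(parents=True, exist_ok=True)
    manifest = {
        "domain_id": domain_id,
        "resources": resources,
        "chunking": {"chunk_size": chunk_size},
    }
    manifest_path = manifest_dir / "manifest.json"
    manifest_path.write_text(json.dumps(manifest, indent=2), encoding="utf-8")
    return manifest_path

kb_dir = tempfile.mkdtemp()
path = write_manifest(kb_dir, "support", [{"path": "faq.md", "type": "file"}])
loaded = json.loads(path.read_text(encoding="utf-8"))
```

The directory holding the manifest then becomes documents_path in the generated kb_config.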

KnowledgeSearchTool

KnowledgeSearchTool(knowledge_base: Any, name: str = 'knowledge_search')

Bases: ContextAwareTool

Tool for searching the knowledge base.

This tool allows LLMs to search the bot's knowledge base for relevant information during conversations.

Demonstrates the umbrella pattern for tools:

- Static dependency: knowledge_base (via constructor injection)
- Dynamic context: conversation_id, user_id (via ToolExecutionContext)

Example
# Create tool with knowledge base (static dependency)
tool = KnowledgeSearchTool(knowledge_base=kb)

# Register with bot
bot.tool_registry.register_tool(tool)

# LLM can now call the tool
# Context is automatically injected by reasoning strategy
results = await tool.execute(
    query="How do I configure the database?",
    max_results=3
)

Initialize knowledge search tool.

Parameters:

Name Type Description Default
knowledge_base Any

RAGKnowledgeBase instance to search

required
name str

Tool name (default: knowledge_search)

'knowledge_search'

Methods:

Name Description
catalog_metadata

Return catalog metadata for this tool class.

execute_with_context

Execute knowledge base search with context.

Attributes:

Name Type Description
schema dict[str, Any]

Get JSON schema for tool parameters.

Source code in packages/bots/src/dataknobs_bots/tools/knowledge_search.py
def __init__(self, knowledge_base: Any, name: str = "knowledge_search"):
    """Initialize knowledge search tool.

    Args:
        knowledge_base: RAGKnowledgeBase instance to search
        name: Tool name (default: knowledge_search)
    """
    super().__init__(
        name=name,
        description="Search the knowledge base for relevant information. "
        "Use this when you need to find documentation, examples, or "
        "specific information to answer user questions.",
    )
    # Static dependency - doesn't change per-request
    self.knowledge_base = knowledge_base
Attributes
schema property
schema: dict[str, Any]

Get JSON schema for tool parameters.

Returns:

Type Description
dict[str, Any]

JSON Schema for the tool parameters

Functions
catalog_metadata classmethod
catalog_metadata() -> dict[str, Any]

Return catalog metadata for this tool class.

Source code in packages/bots/src/dataknobs_bots/tools/knowledge_search.py
@classmethod
def catalog_metadata(cls) -> dict[str, Any]:
    """Return catalog metadata for this tool class."""
    return {
        "name": "knowledge_search",
        "description": (
            "Search the knowledge base for relevant information."
        ),
        "tags": ("general", "rag"),
        "requires": ("knowledge_base",),
    }
execute_with_context async
execute_with_context(
    context: ToolExecutionContext,
    query: str,
    max_results: int = 3,
    **kwargs: Any,
) -> dict[str, Any]

Execute knowledge base search with context.

Parameters:

Name Type Description Default
context ToolExecutionContext

Execution context with conversation/user info

required
query str

Search query text

required
max_results int

Maximum number of results (default: 3)

3
**kwargs Any

Additional arguments (ignored)

{}

Returns:

Type Description
dict[str, Any]

Dictionary with search results:

- query: Original query
- results: List of relevant chunks
- num_results: Number of results found
- conversation_id: ID of conversation (if available)

Example
result = await tool.execute(
    query="How do I configure the database?",
    max_results=3
)
for chunk in result['results']:
    print(f"{chunk['heading_path']}: {chunk['text']}")
Source code in packages/bots/src/dataknobs_bots/tools/knowledge_search.py
async def execute_with_context(
    self,
    context: ToolExecutionContext,
    query: str,
    max_results: int = 3,
    **kwargs: Any,
) -> dict[str, Any]:
    """Execute knowledge base search with context.

    Args:
        context: Execution context with conversation/user info
        query: Search query text
        max_results: Maximum number of results (default: 3)
        **kwargs: Additional arguments (ignored)

    Returns:
        Dictionary with search results:
            - query: Original query
            - results: List of relevant chunks
            - num_results: Number of results found
            - conversation_id: ID of conversation (if available)

    Example:
        ```python
        result = await tool.execute(
            query="How do I configure the database?",
            max_results=3
        )
        for chunk in result['results']:
            print(f"{chunk['heading_path']}: {chunk['text']}")
        ```
    """
    # Clamp max_results to valid range
    max_results = max(1, min(10, max_results))

    # Log search with context for observability
    logger.debug(
        "Knowledge search",
        extra={
            "query": query,
            "max_results": max_results,
            "conversation_id": context.conversation_id,
            "user_id": context.user_id,
        },
    )

    # Search knowledge base
    results = await self.knowledge_base.query(query, k=max_results)

    # Format response with optional context info
    response: dict[str, Any] = {
        "query": query,
        "results": [
            {
                "text": r["text"],
                "source": r["source"],
                "heading": r["heading_path"],
                "similarity": round(r["similarity"], 3),
            }
            for r in results
        ],
        "num_results": len(results),
    }

    # Include conversation_id for traceability if available
    if context.conversation_id:
        response["conversation_id"] = context.conversation_id

    return response
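Two details of the implementation above are easy to miss: max_results is silently clamped to 1..10, and similarity scores are rounded to three decimals in the response. A stdlib sketch of both:

```python
def clamp_max_results(n: int) -> int:
    """Clamp to the tool's valid 1..10 range."""
    return max(1, min(10, n))

def format_results(query: str, raw: list[dict]) -> dict:
    """Shape raw KB hits into the tool's response payload (sketch)."""
    return {
        "query": query,
        "results": [
            {
                "text": r["text"],
                "source": r["source"],
                "heading": r["heading_path"],
                "similarity": round(r["similarity"], 3),
            }
            for r in raw
        ],
        "num_results": len(raw),
    }

raw_hits = [
    {"text": "Set the DB URL in config.yaml.", "source": "config.md",
     "heading_path": "Setup > Database", "similarity": 0.87654},
]
response = format_results("How do I configure the database?", raw_hits)
```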

ListAvailableToolsTool

ListAvailableToolsTool(available_tools: list[dict[str, Any]])

Bases: ContextAwareTool

Tool for listing tools available to configure for a bot.

Takes a constructor-injected catalog of available tools and lets the LLM browse them, optionally filtering by category. The catalog data is consumer-specific — each DynaBot consumer provides its own list.

Attributes:

Name Type Description
_tools

The available tool catalog.

Initialize the tool.

Parameters:

Name Type Description Default
available_tools list[dict[str, Any]]

List of tool descriptors. Each dict should have at minimum name and description keys. Optional: category, params, class.

required

Methods:

Name Description
catalog_metadata

Return catalog metadata for this tool class.

execute_with_context

List available tools.

Source code in packages/bots/src/dataknobs_bots/tools/config_tools.py
def __init__(self, available_tools: list[dict[str, Any]]) -> None:
    """Initialize the tool.

    Args:
        available_tools: List of tool descriptors. Each dict should
            have at minimum ``name`` and ``description`` keys.
            Optional: ``category``, ``params``, ``class``.
    """
    super().__init__(
        name="list_available_tools",
        description=(
            "List tools that can be added to the bot configuration. "
            "Optionally filter by category."
        ),
    )
    self._tools = available_tools
Attributes
schema property
schema: dict[str, Any]

Return JSON Schema for tool parameters.

Functions
catalog_metadata classmethod
catalog_metadata() -> dict[str, Any]

Return catalog metadata for this tool class.

Source code in packages/bots/src/dataknobs_bots/tools/config_tools.py
@classmethod
def catalog_metadata(cls) -> dict[str, Any]:
    """Return catalog metadata for this tool class."""
    return {
        "name": "list_available_tools",
        "description": (
            "List tools that can be added to the bot configuration."
        ),
        "tags": ("configbot",),
    }
execute_with_context async
execute_with_context(
    context: ToolExecutionContext, category: str | None = None, **kwargs: Any
) -> dict[str, Any]

List available tools.

Parameters:

Name Type Description Default
context ToolExecutionContext

Execution context.

required
category str | None

Optional category to filter by.

None

Returns:

Type Description
dict[str, Any]

Dict with matching tools, count, and available categories.

Source code in packages/bots/src/dataknobs_bots/tools/config_tools.py
async def execute_with_context(
    self,
    context: ToolExecutionContext,
    category: str | None = None,
    **kwargs: Any,
) -> dict[str, Any]:
    """List available tools.

    Args:
        context: Execution context.
        category: Optional category to filter by.

    Returns:
        Dict with matching tools, count, and available categories.
    """
    if category:
        filtered = [
            t for t in self._tools
            if t.get("category", "").lower() == category.lower()
        ]
    else:
        filtered = list(self._tools)

    categories = sorted({
        t["category"] for t in self._tools if "category" in t
    })

    logger.debug(
        "Listed %d available tools (category=%s)",
        len(filtered),
        category,
        extra={"conversation_id": context.conversation_id},
    )

    return {
        "tools": filtered,
        "count": len(filtered),
        "categories": categories,
    }

ListKBResourcesTool

ListKBResourcesTool()

Bases: ContextAwareTool

Tool for listing currently tracked knowledge base resources.

Reads _kb_resources and _source_path_resolved from wizard data to show what resources have been added so far.

Initialize the tool.

Methods:

Name Description
catalog_metadata

Return catalog metadata for this tool class.

execute_with_context

List KB resources.

Attributes:

Name Type Description
schema dict[str, Any]

Return JSON Schema for tool parameters.

Source code in packages/bots/src/dataknobs_bots/tools/kb_tools.py
def __init__(self) -> None:
    """Initialize the tool."""
    super().__init__(
        name="list_kb_resources",
        description=(
            "List the knowledge base resources that have been added "
            "to the current bot configuration."
        ),
    )
Attributes
schema property
schema: dict[str, Any]

Return JSON Schema for tool parameters.

Functions
catalog_metadata classmethod
catalog_metadata() -> dict[str, Any]

Return catalog metadata for this tool class.

Source code in packages/bots/src/dataknobs_bots/tools/kb_tools.py
@classmethod
def catalog_metadata(cls) -> dict[str, Any]:
    """Return catalog metadata for this tool class."""
    return {
        "name": "list_kb_resources",
        "description": (
            "List the knowledge base resources added to the "
            "current bot configuration."
        ),
        "tags": ("configbot", "kb"),
    }
execute_with_context async
execute_with_context(
    context: ToolExecutionContext, **kwargs: Any
) -> dict[str, Any]

List KB resources.

Parameters:

Name Type Description Default
context ToolExecutionContext

Execution context with wizard state.

required

Returns:

Type Description
dict[str, Any]

Dict with resource list and source path.

Source code in packages/bots/src/dataknobs_bots/tools/kb_tools.py
async def execute_with_context(
    self,
    context: ToolExecutionContext,
    **kwargs: Any,
) -> dict[str, Any]:
    """List KB resources.

    Args:
        context: Execution context with wizard state.

    Returns:
        Dict with resource list and source path.
    """
    wizard_data = _get_wizard_data_ref(context)
    resources = wizard_data.get("_kb_resources", [])
    source_path = wizard_data.get("_source_path_resolved")

    logger.debug(
        "Listed %d KB resources",
        len(resources),
        extra={"conversation_id": context.conversation_id},
    )

    return {
        "resources": resources,
        "count": len(resources),
        "source_path": source_path,
    }

ListTemplatesTool

ListTemplatesTool(template_registry: ConfigTemplateRegistry)

Bases: ContextAwareTool

Tool for listing available configuration templates.

Allows the LLM to discover what templates are available, optionally filtered by tags.

Attributes:

Name Type Description
_registry

Template registry to query.

Initialize the tool.

Parameters:

Name Type Description Default
template_registry ConfigTemplateRegistry

Registry containing available templates.

required

Methods:

Name Description
catalog_metadata

Return catalog metadata for this tool class.

from_config

Create from YAML-compatible configuration.

execute_with_context

List available templates.

Source code in packages/bots/src/dataknobs_bots/tools/config_tools.py
def __init__(self, template_registry: ConfigTemplateRegistry) -> None:
    """Initialize the tool.

    Args:
        template_registry: Registry containing available templates.
    """
    super().__init__(
        name="list_templates",
        description=(
            "List available bot configuration templates. "
            "Optionally filter by tags to find templates for "
            "specific use cases."
        ),
    )
    self._registry = template_registry
Attributes
schema property
schema: dict[str, Any]

Return JSON Schema for tool parameters.

Functions
catalog_metadata classmethod
catalog_metadata() -> dict[str, Any]

Return catalog metadata for this tool class.

Source code in packages/bots/src/dataknobs_bots/tools/config_tools.py
@classmethod
def catalog_metadata(cls) -> dict[str, Any]:
    """Return catalog metadata for this tool class."""
    return {
        "name": "list_templates",
        "description": (
            "List available bot configuration templates."
        ),
        "tags": ("configbot",),
        "requires": ("template_registry",),
        "default_params": {"template_dir": "configs/templates"},
    }
from_config classmethod
from_config(config: dict[str, Any]) -> ListTemplatesTool

Create from YAML-compatible configuration.

Parameters:

Name Type Description Default
config dict[str, Any]

Dict with template_dir key pointing to a directory containing template YAML files.

required

Returns:

Type Description
ListTemplatesTool

Configured ListTemplatesTool instance.

Source code in packages/bots/src/dataknobs_bots/tools/config_tools.py
@classmethod
def from_config(cls, config: dict[str, Any]) -> ListTemplatesTool:
    """Create from YAML-compatible configuration.

    Args:
        config: Dict with ``template_dir`` key pointing to a
            directory containing template YAML files.

    Returns:
        Configured ListTemplatesTool instance.
    """
    from pathlib import Path

    template_dir = config.get("template_dir", "configs/templates")
    registry = ConfigTemplateRegistry()
    path = Path(template_dir)
    if path.is_dir():
        registry.load_from_directory(path)
    return cls(template_registry=registry)
execute_with_context async
execute_with_context(
    context: ToolExecutionContext, tags: list[str] | None = None, **kwargs: Any
) -> dict[str, Any]

List available templates.

Parameters:

Name Type Description Default
context ToolExecutionContext

Execution context.

required
tags list[str] | None

Optional tags to filter by.

None

Returns:

Type Description
dict[str, Any]

Dict with list of template summaries.

Source code in packages/bots/src/dataknobs_bots/tools/config_tools.py
async def execute_with_context(
    self,
    context: ToolExecutionContext,
    tags: list[str] | None = None,
    **kwargs: Any,
) -> dict[str, Any]:
    """List available templates.

    Args:
        context: Execution context.
        tags: Optional tags to filter by.

    Returns:
        Dict with list of template summaries.
    """
    templates = self._registry.list_templates(tags=tags)

    logger.debug(
        "Listed %d templates (tags=%s)",
        len(templates),
        tags,
        extra={"conversation_id": context.conversation_id},
    )

    return {
        "templates": [
            {
                "name": t.name,
                "description": t.description,
                "version": t.version,
                "tags": t.tags,
                "variables_count": len(t.variables),
                "required_variables": [
                    v.name for v in t.get_required_variables()
                ],
            }
            for t in templates
        ],
        "count": len(templates),
    }

PreviewConfigTool

PreviewConfigTool(
    builder_factory: Callable[[dict[str, Any]], DynaBotConfigBuilder],
)

Bases: ContextAwareTool

Tool for previewing the configuration being built.

Uses a consumer-provided builder_factory to construct the configuration from wizard data. This is the key extension point: the factory encapsulates domain-specific logic.

Attributes:

Name Type Description
_builder_factory

Callable that creates a configured builder from wizard data.

Initialize the tool.

Parameters:

Name Type Description Default
builder_factory Callable[[dict[str, Any]], DynaBotConfigBuilder]

Function that takes wizard collected data and returns a configured DynaBotConfigBuilder. This is where consumers inject domain-specific config logic.

required

Methods:

Name Description
catalog_metadata

Return catalog metadata for this tool class.

from_config

Create from YAML-compatible configuration.

execute_with_context

Preview the current configuration.

Source code in packages/bots/src/dataknobs_bots/tools/config_tools.py
def __init__(
    self,
    builder_factory: Callable[[dict[str, Any]], DynaBotConfigBuilder],
) -> None:
    """Initialize the tool.

    Args:
        builder_factory: Function that takes wizard collected data
            and returns a configured DynaBotConfigBuilder. This is
            where consumers inject domain-specific config logic.
    """
    super().__init__(
        name="preview_config",
        description=(
            "Preview the bot configuration being built from the "
            "current wizard data. Shows what the final config will "
            "look like."
        ),
    )
    self._builder_factory = builder_factory
Attributes
schema property
schema: dict[str, Any]

Return JSON Schema for tool parameters.

Functions
catalog_metadata classmethod
catalog_metadata() -> dict[str, Any]

Return catalog metadata for this tool class.

Source code in packages/bots/src/dataknobs_bots/tools/config_tools.py
@classmethod
def catalog_metadata(cls) -> dict[str, Any]:
    """Return catalog metadata for this tool class."""
    return {
        "name": "preview_config",
        "description": (
            "Preview the bot configuration being built from "
            "the current wizard data."
        ),
        "tags": ("configbot",),
        "requires": ("builder_factory",),
    }
from_config classmethod
from_config(config: dict[str, Any]) -> PreviewConfigTool

Create from YAML-compatible configuration.

Parameters:

Name Type Description Default
config dict[str, Any]

Dict with builder_factory key — a dotted import path to a callable that accepts wizard data and returns a DynaBotConfigBuilder.

required

Returns:

Type Description
PreviewConfigTool

Configured PreviewConfigTool instance.

Source code in packages/bots/src/dataknobs_bots/tools/config_tools.py
@classmethod
def from_config(cls, config: dict[str, Any]) -> PreviewConfigTool:
    """Create from YAML-compatible configuration.

    Args:
        config: Dict with ``builder_factory`` key — a dotted
            import path to a callable that accepts wizard data
            and returns a ``DynaBotConfigBuilder``.

    Returns:
        Configured PreviewConfigTool instance.
    """
    from .resolve import resolve_callable

    factory_ref = config["builder_factory"]
    factory = resolve_callable(factory_ref)
    return cls(builder_factory=factory)
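A `from_config` entry for this tool might look like the following fragment. The dotted path `my_app.config.build_bot_config` is hypothetical; it stands in for any importable callable that accepts wizard data and returns a `DynaBotConfigBuilder`:

```yaml
# Hypothetical PreviewConfigTool.from_config payload — only the
# builder_factory key is required; the dotted path is illustrative.
builder_factory: my_app.config.build_bot_config
```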
execute_with_context async
execute_with_context(
    context: ToolExecutionContext, format: str = "summary", **kwargs: Any
) -> dict[str, Any]

Preview the current configuration.

Parameters:

Name Type Description Default
context ToolExecutionContext

Execution context with wizard state.

required
format str

Output format ('summary', 'full', or 'yaml').

'summary'

Returns:

Type Description
dict[str, Any]

Dict with the configuration preview.

Source code in packages/bots/src/dataknobs_bots/tools/config_tools.py
async def execute_with_context(
    self,
    context: ToolExecutionContext,
    format: str = "summary",
    **kwargs: Any,
) -> dict[str, Any]:
    """Preview the current configuration.

    Args:
        context: Execution context with wizard state.
        format: Output format ('summary', 'full', or 'yaml').

    Returns:
        Dict with the configuration preview.
    """
    wizard_data = _get_wizard_data(context)
    if not wizard_data:
        return {"error": "No wizard data available for preview"}

    try:
        builder = self._builder_factory(wizard_data)
        config = builder._build_internal()
    except Exception as e:
        logger.exception("Failed to build config for preview")
        return {"error": f"Failed to build configuration: {e}"}

    logger.debug(
        "Generated config preview (format=%s)",
        format,
        extra={"conversation_id": context.conversation_id},
    )

    if format == "yaml":
        return {"yaml": yaml.dump(config, default_flow_style=False, sort_keys=False)}
    elif format == "full":
        return {"config": config}
    else:
        return _build_summary(config)

RemoveKBResourceTool

RemoveKBResourceTool()

Bases: ContextAwareTool

Tool for removing a resource from the knowledge base resource list.

Wizard data read/written:

- _kb_resources: list[dict] — resource list (removed by path)

Initialize the tool.

Methods:

Name Description
catalog_metadata

Return catalog metadata for this tool class.

execute_with_context

Remove a KB resource.

Attributes:

Name Type Description
schema dict[str, Any]

Return JSON Schema for tool parameters.

Source code in packages/bots/src/dataknobs_bots/tools/kb_tools.py
def __init__(self) -> None:
    """Initialize the tool."""
    super().__init__(
        name="remove_kb_resource",
        description=(
            "Remove a resource from the bot's knowledge base "
            "resource list."
        ),
    )
Attributes
schema property
schema: dict[str, Any]

Return JSON Schema for tool parameters.

Functions
catalog_metadata classmethod
catalog_metadata() -> dict[str, Any]

Return catalog metadata for this tool class.

Source code in packages/bots/src/dataknobs_bots/tools/kb_tools.py
@classmethod
def catalog_metadata(cls) -> dict[str, Any]:
    """Return catalog metadata for this tool class."""
    return {
        "name": "remove_kb_resource",
        "description": (
            "Remove a resource from the bot's knowledge base "
            "resource list."
        ),
        "tags": ("configbot", "kb"),
    }
execute_with_context async
execute_with_context(
    context: ToolExecutionContext, path: str, **kwargs: Any
) -> dict[str, Any]

Remove a KB resource.

Parameters:

Name Type Description Default
context ToolExecutionContext

Execution context with wizard state.

required
path str

Path of the resource to remove.

required

Returns:

Type Description
dict[str, Any]

Dict with removal result.

Source code in packages/bots/src/dataknobs_bots/tools/kb_tools.py
async def execute_with_context(
    self,
    context: ToolExecutionContext,
    path: str,
    **kwargs: Any,
) -> dict[str, Any]:
    """Remove a KB resource.

    Args:
        context: Execution context with wizard state.
        path: Path of the resource to remove.

    Returns:
        Dict with removal result.
    """
    wizard_data = _get_wizard_data_ref(context)
    resources: list[dict[str, Any]] = wizard_data.get("_kb_resources", [])

    original_count = len(resources)
    updated = [r for r in resources if r["path"] != path]

    if len(updated) == original_count:
        return {
            "success": False,
            "error": f"Resource not found: {path}",
            "available": [r["path"] for r in resources],
        }

    wizard_data["_kb_resources"] = updated

    logger.debug(
        "Removed KB resource: %s",
        path,
        extra={"conversation_id": context.conversation_id},
    )

    return {
        "success": True,
        "removed": path,
        "remaining_resources": len(updated),
    }
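The removal itself is a plain filter on the resource list. A self-contained sketch of that logic, mirroring the source above outside the tool class:

```python
# Mirrors execute_with_context: drop the entry whose "path" matches,
# and report an error when nothing matched.
resources = [{"path": "docs/a.md"}, {"path": "docs/b.md"}]
path = "docs/a.md"

updated = [r for r in resources if r["path"] != path]
if len(updated) == len(resources):
    result = {"success": False, "error": f"Resource not found: {path}"}
else:
    result = {
        "success": True,
        "removed": path,
        "remaining_resources": len(updated),
    }
```

Note that a second removal of the same path would hit the error branch, since the list no longer shrinks.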

SaveConfigTool

SaveConfigTool(
    draft_manager: ConfigDraftManager,
    on_save: Callable[[str, dict[str, Any]], Any] | None = None,
    builder_factory: Callable[[dict[str, Any]], DynaBotConfigBuilder]
    | None = None,
    portable: bool = False,
)

Bases: ContextAwareTool

Tool for saving/finalizing the configuration.

Finalizes the draft and writes the final config file. Optionally calls a consumer-provided callback for post-save actions (e.g., registering the bot with a manager).

When portable=True, the builder's build_portable() method is used instead of _build_internal(), producing a config with a bot wrapper key suitable for environment-aware deployment.

Attributes:

Name Type Description
_draft_manager

Draft manager for file operations.

_on_save

Optional callback invoked after successful save.

_builder_factory

Optional factory for building config from wizard data.

_portable

Whether to use portable (bot-wrapped) output format.

Initialize the tool.

Parameters:

Name Type Description Default
draft_manager ConfigDraftManager

Manager for draft file operations.

required
on_save Callable[[str, dict[str, Any]], Any] | None

Optional callback called with (config_name, config) after successful save. Can be used for post-save actions like bot registration.

None
builder_factory Callable[[dict[str, Any]], DynaBotConfigBuilder] | None

Optional factory to build final config from wizard data before saving.

None
portable bool

When True, use build_portable() for output (wraps config under bot key with custom sections as siblings). When False (default), use _build_internal() for flat format.

False

Methods:

Name Description
catalog_metadata

Return catalog metadata for this tool class.

from_config

Create from YAML-compatible configuration.

execute_with_context

Save the configuration.

Source code in packages/bots/src/dataknobs_bots/tools/config_tools.py
def __init__(
    self,
    draft_manager: ConfigDraftManager,
    on_save: Callable[[str, dict[str, Any]], Any] | None = None,
    builder_factory: Callable[[dict[str, Any]], DynaBotConfigBuilder] | None = None,
    portable: bool = False,
) -> None:
    """Initialize the tool.

    Args:
        draft_manager: Manager for draft file operations.
        on_save: Optional callback called with (config_name, config)
            after successful save. Can be used for post-save actions
            like bot registration.
        builder_factory: Optional factory to build final config from
            wizard data before saving.
        portable: When True, use ``build_portable()`` for output
            (wraps config under ``bot`` key with custom sections as
            siblings). When False (default), use ``_build_internal()``
            for flat format.
    """
    super().__init__(
        name="save_config",
        description=(
            "Save and finalize the bot configuration. Writes the "
            "final config file and optionally activates the bot."
        ),
    )
    self._draft_manager = draft_manager
    self._on_save = on_save
    self._builder_factory = builder_factory
    self._portable = portable
Attributes
schema property
schema: dict[str, Any]

Return JSON Schema for tool parameters.

Functions
catalog_metadata classmethod
catalog_metadata() -> dict[str, Any]

Return catalog metadata for this tool class.

Source code in packages/bots/src/dataknobs_bots/tools/config_tools.py
@classmethod
def catalog_metadata(cls) -> dict[str, Any]:
    """Return catalog metadata for this tool class."""
    return {
        "name": "save_config",
        "description": (
            "Save and finalize the bot configuration."
        ),
        "tags": ("configbot",),
        "requires": ("draft_manager",),
    }
from_config classmethod
from_config(config: dict[str, Any]) -> SaveConfigTool

Create from YAML-compatible configuration.

Parameters:

Name Type Description Default
config dict[str, Any]

Dict with keys:

- config_dir (str): Output directory for configs.
- builder_factory (str, optional): Dotted import path.
- on_save (str, optional): Dotted import path.
- portable (bool, optional): Use portable output format.

required

Returns:

Type Description
SaveConfigTool

Configured SaveConfigTool instance.

Source code in packages/bots/src/dataknobs_bots/tools/config_tools.py
@classmethod
def from_config(cls, config: dict[str, Any]) -> SaveConfigTool:
    """Create from YAML-compatible configuration.

    Args:
        config: Dict with keys:
            - ``config_dir`` (str): Output directory for configs.
            - ``builder_factory`` (str, optional): Dotted import path.
            - ``on_save`` (str, optional): Dotted import path.
            - ``portable`` (bool, optional): Use portable output format.

    Returns:
        Configured SaveConfigTool instance.
    """
    from pathlib import Path

    config_dir = config.get("config_dir", "configs")
    manager = ConfigDraftManager(output_dir=Path(config_dir))

    on_save = None
    factory = None
    if "on_save" in config or "builder_factory" in config:
        from .resolve import resolve_callable

        if "on_save" in config:
            on_save = resolve_callable(config["on_save"])
        if "builder_factory" in config:
            factory = resolve_callable(config["builder_factory"])

    portable = config.get("portable", False)
    return cls(
        draft_manager=manager,
        on_save=on_save,
        builder_factory=factory,
        portable=portable,
    )
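Putting those keys together, a hypothetical `from_config` payload could look like this; the `my_app` dotted paths are illustrative, not part of the package:

```yaml
# Hypothetical SaveConfigTool.from_config payload — dotted paths must
# resolve to importable callables at load time.
config_dir: configs
builder_factory: my_app.config.build_bot_config
on_save: my_app.registry.register_bot
portable: true
```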
execute_with_context async
execute_with_context(
    context: ToolExecutionContext,
    config_name: str | None = None,
    activate: bool = False,
    **kwargs: Any,
) -> dict[str, Any]

Save the configuration.

Parameters:

Name Type Description Default
context ToolExecutionContext

Execution context with wizard state.

required
config_name str | None

Name for the config file.

None
activate bool

Whether to activate the bot.

False

Returns:

Type Description
dict[str, Any]

Dict with save result (success, file path, etc.).

Source code in packages/bots/src/dataknobs_bots/tools/config_tools.py
async def execute_with_context(
    self,
    context: ToolExecutionContext,
    config_name: str | None = None,
    activate: bool = False,
    **kwargs: Any,
) -> dict[str, Any]:
    """Save the configuration.

    Args:
        context: Execution context with wizard state.
        config_name: Name for the config file.
        activate: Whether to activate the bot.

    Returns:
        Dict with save result (success, file path, etc.).
    """
    wizard_data = _get_wizard_data(context)
    if not wizard_data:
        return {"success": False, "error": "No wizard data available"}

    # Determine config name
    name = config_name or wizard_data.get("domain_id") or wizard_data.get("config_name")
    if not name:
        return {
            "success": False,
            "error": "No config_name provided and no domain_id in wizard data",
        }

    # Build final config
    if self._builder_factory is not None:
        try:
            builder = self._builder_factory(wizard_data)
            if self._portable:
                config = builder.build_portable()
            else:
                config = builder._build_internal()
        except Exception as e:
            return {"success": False, "error": f"Failed to build configuration: {e}"}
    else:
        config = {
            k: v for k, v in wizard_data.items() if not k.startswith("_")
        }

    # Check for existing draft — finalize cleans up the draft file,
    # but we always use the freshly-built config (draft may be stale)
    draft_id = wizard_data.get("_draft_id")
    if draft_id:
        try:
            self._draft_manager.finalize(draft_id, final_name=name)
        except FileNotFoundError:
            logger.warning("Draft %s not found, saving directly", draft_id)
    final_config = config

    # Write the final file
    output_dir = self._draft_manager.output_dir
    output_dir.mkdir(parents=True, exist_ok=True)
    final_path = output_dir / f"{name}.yaml"
    with open(final_path, "w") as f:
        yaml.dump(final_config, f, default_flow_style=False, sort_keys=False)

    logger.info(
        "Saved configuration '%s' to %s",
        name,
        final_path,
        extra={
            "config_name": name,
            "activate": activate,
            "conversation_id": context.conversation_id,
        },
    )

    # Run consumer callback
    if self._on_save is not None:
        try:
            self._on_save(name, final_config)
        except Exception:
            logger.exception("on_save callback failed for '%s'", name)

    return {
        "success": True,
        "config_name": name,
        "file_path": str(final_path),
        "activated": activate,
    }

Functions

normalize_wizard_state

normalize_wizard_state(wizard_meta: dict[str, Any]) -> dict[str, Any]

Normalize wizard metadata to canonical structure.

Handles both old nested format (fsm_state.current_stage) and new flat format (current_stage directly).

Parameters:

Name Type Description Default
wizard_meta dict[str, Any]

Raw wizard metadata from manager or storage

required

Returns:

Type Description
dict[str, Any]

Normalized wizard state dict with canonical fields:
current_stage, stage_index, total_stages, progress, completed,
data, can_skip, can_go_back, suggestions, history, stages,
subflow_depth, and (when in a subflow) subflow_stage.

Source code in packages/bots/src/dataknobs_bots/bot/base.py
def normalize_wizard_state(wizard_meta: dict[str, Any]) -> dict[str, Any]:
    """Normalize wizard metadata to canonical structure.

    Handles both old nested format (fsm_state.current_stage) and
    new flat format (current_stage directly).

    Args:
        wizard_meta: Raw wizard metadata from manager or storage

    Returns:
        Normalized wizard state dict with canonical fields:
        current_stage, stage_index, total_stages, progress, completed,
        data, can_skip, can_go_back, suggestions, history, stages,
        subflow_depth, and (when in a subflow) subflow_stage.
    """
    # Handle nested fsm_state format (legacy)
    fsm_state = wizard_meta.get("fsm_state", {})

    # Prefer direct fields, fall back to fsm_state
    current_stage = (
        wizard_meta.get("current_stage")
        or wizard_meta.get("stage")  # Old response format
        or fsm_state.get("current_stage")
    )

    result: dict[str, Any] = {
        "current_stage": current_stage,
        "stage_index": (
            wizard_meta.get("stage_index") or fsm_state.get("stage_index", 0)
        ),
        "total_stages": wizard_meta.get("total_stages", 0),
        "progress": wizard_meta.get("progress", 0.0),
        "completed": wizard_meta.get("completed", False),
        "data": wizard_meta.get("data") or fsm_state.get("data", {}),
        "can_skip": wizard_meta.get("can_skip", False),
        "can_go_back": wizard_meta.get("can_go_back", True),
        "suggestions": wizard_meta.get("suggestions", []),
        "history": wizard_meta.get("history") or fsm_state.get("history", []),
        "stages": wizard_meta.get("stages", []),
    }

    # Subflow context: present when wizard is executing a subflow
    subflow_stage = wizard_meta.get("subflow_stage")
    if subflow_stage:
        result["subflow_stage"] = subflow_stage
        result["subflow_depth"] = 1  # _build_wizard_metadata exposes top subflow
    else:
        result["subflow_depth"] = 0

    return result
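The key compatibility step is the fallback chain for the current stage. A minimal, self-contained sketch of that resolution order, extracted from the function above:

```python
# Fallback order used for current_stage: the flat "current_stage"
# field wins, then the old response-format "stage" field, then the
# legacy nested fsm_state["current_stage"].
def resolve_stage(meta):
    fsm_state = meta.get("fsm_state", {})
    return (
        meta.get("current_stage")
        or meta.get("stage")
        or fsm_state.get("current_stage")
    )

flat = {"current_stage": "collect_name"}
old_response = {"stage": "collect_name"}
legacy = {"fsm_state": {"current_stage": "collect_name"}}

# All three formats normalize to the same stage name.
assert resolve_stage(flat) == resolve_stage(old_response) == resolve_stage(legacy)
```

When none of the three fields is present, the result is `None`, matching the behavior of the full function.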

create_default_catalog

create_default_catalog() -> ToolCatalog

Create a new ToolCatalog pre-populated with built-in tools.

Returns a fresh catalog (not the module-level singleton) so consumers can extend it without affecting other users of default_catalog.

Returns:

Type Description
ToolCatalog

New ToolCatalog with all built-in tools registered.

Source code in packages/bots/src/dataknobs_bots/config/tool_catalog.py
def create_default_catalog() -> ToolCatalog:
    """Create a new ToolCatalog pre-populated with built-in tools.

    Returns a fresh catalog (not the module-level singleton) so consumers
    can extend it without affecting other users of ``default_catalog``.

    Returns:
        New ToolCatalog with all built-in tools registered.
    """
    catalog = ToolCatalog()
    for entry in default_catalog.list_items():
        catalog.register_entry(entry)
    return catalog

create_knowledge_base_from_config async

create_knowledge_base_from_config(config: dict[str, Any]) -> KnowledgeBase

Create knowledge base from configuration.

Parameters:

Name Type Description Default
config dict[str, Any]

Knowledge base configuration with:

- type: Type of knowledge base (currently only 'rag' supported)
- vector_store: Vector store configuration
- embedding_provider: LLM provider for embeddings
- embedding_model: Model to use for embeddings
- chunking: Optional chunking configuration
- documents_path: Optional path to load documents
- document_pattern: Optional file pattern

required

Returns:

Type Description
KnowledgeBase

Configured knowledge base instance

Raises:

Type Description
ValueError

If knowledge base type is not supported

Example
config = {
    "type": "rag",
    "vector_store": {
        "backend": "memory",
        "dimensions": 384
    },
    "embedding_provider": "echo",
    "embedding_model": "test"
}
kb = await create_knowledge_base_from_config(config)
Source code in packages/bots/src/dataknobs_bots/knowledge/__init__.py
async def create_knowledge_base_from_config(config: dict[str, Any]) -> KnowledgeBase:
    """Create knowledge base from configuration.

    Args:
        config: Knowledge base configuration with:
            - type: Type of knowledge base (currently only 'rag' supported)
            - vector_store: Vector store configuration
            - embedding_provider: LLM provider for embeddings
            - embedding_model: Model to use for embeddings
            - chunking: Optional chunking configuration
            - documents_path: Optional path to load documents
            - document_pattern: Optional file pattern

    Returns:
        Configured knowledge base instance

    Raises:
        ValueError: If knowledge base type is not supported

    Example:
        ```python
        config = {
            "type": "rag",
            "vector_store": {
                "backend": "memory",
                "dimensions": 384
            },
            "embedding_provider": "echo",
            "embedding_model": "test"
        }
        kb = await create_knowledge_base_from_config(config)
        ```
    """
    kb_type = config.get("type", "rag").lower()

    if kb_type == "rag":
        return await RAGKnowledgeBase.from_config(config)
    else:
        raise ValueError(
            f"Unknown knowledge base type: {kb_type}. " f"Available types: rag"
        )

create_memory_from_config async

create_memory_from_config(
    config: dict[str, Any], llm_provider: Any | None = None
) -> Memory

Create memory instance from configuration.

Parameters:

Name Type Description Default
config dict[str, Any]

Memory configuration with 'type' field and type-specific params

required
llm_provider Any | None

Optional LLM provider instance, required for summary memory

None

Returns:

Type Description
Memory

Configured Memory instance

Raises:

Type Description
ValueError

If memory type is not recognized or required params missing

Example
# Buffer memory
config = {
    "type": "buffer",
    "max_messages": 10
}
memory = await create_memory_from_config(config)

# Vector memory
config = {
    "type": "vector",
    "backend": "faiss",
    "dimension": 768,
    "embedding_provider": "ollama",
    "embedding_model": "nomic-embed-text"
}
memory = await create_memory_from_config(config)

# Summary memory (uses bot's LLM as fallback)
config = {
    "type": "summary",
    "recent_window": 10,
}
memory = await create_memory_from_config(config, llm_provider=llm)

# Summary memory with its own dedicated LLM
config = {
    "type": "summary",
    "recent_window": 10,
    "llm": {
        "provider": "ollama",
        "model": "gemma3:1b",
    },
}
memory = await create_memory_from_config(config)

# Composite memory (multiple strategies)
config = {
    "type": "composite",
    "strategies": [
        {"type": "buffer", "max_messages": 50},
        {
            "type": "vector",
            "backend": "memory",
            "dimension": 768,
            "embedding_provider": "ollama",
            "embedding_model": "nomic-embed-text",
        },
    ],
    "primary": 0,
}
memory = await create_memory_from_config(config)
Source code in packages/bots/src/dataknobs_bots/memory/__init__.py
async def create_memory_from_config(
    config: dict[str, Any],
    llm_provider: Any | None = None,
) -> Memory:
    """Create memory instance from configuration.

    Args:
        config: Memory configuration with 'type' field and type-specific params
        llm_provider: Optional LLM provider instance, required for summary memory

    Returns:
        Configured Memory instance

    Raises:
        ValueError: If memory type is not recognized or required params missing

    Example:
        ```python
        # Buffer memory
        config = {
            "type": "buffer",
            "max_messages": 10
        }
        memory = await create_memory_from_config(config)

        # Vector memory
        config = {
            "type": "vector",
            "backend": "faiss",
            "dimension": 768,
            "embedding_provider": "ollama",
            "embedding_model": "nomic-embed-text"
        }
        memory = await create_memory_from_config(config)

        # Summary memory (uses bot's LLM as fallback)
        config = {
            "type": "summary",
            "recent_window": 10,
        }
        memory = await create_memory_from_config(config, llm_provider=llm)

        # Summary memory with its own dedicated LLM
        config = {
            "type": "summary",
            "recent_window": 10,
            "llm": {
                "provider": "ollama",
                "model": "gemma3:1b",
            },
        }
        memory = await create_memory_from_config(config)

        # Composite memory (multiple strategies)
        config = {
            "type": "composite",
            "strategies": [
                {"type": "buffer", "max_messages": 50},
                {
                    "type": "vector",
                    "backend": "memory",
                    "dimension": 768,
                    "embedding_provider": "ollama",
                    "embedding_model": "nomic-embed-text",
                },
            ],
            "primary": 0,
        }
        memory = await create_memory_from_config(config)
        ```
    """
    memory_type = config.get("type", "buffer").lower()

    if memory_type == "buffer":
        return BufferMemory(max_messages=config.get("max_messages", 10))

    elif memory_type == "vector":
        return await VectorMemory.from_config(config)

    elif memory_type == "summary":
        # Track whether a dedicated provider was created (owns lifecycle)
        # vs reusing the bot's main LLM (bot owns lifecycle)
        has_dedicated_llm = "llm" in config
        summary_llm = await _resolve_summary_llm(config, llm_provider)
        return SummaryMemory(
            llm_provider=summary_llm,
            recent_window=config.get("recent_window", 10),
            summary_prompt=config.get("summary_prompt"),
            owns_llm_provider=has_dedicated_llm,
        )

    elif memory_type == "composite":
        strategy_configs = config.get("strategies", [])
        strategies: list[Memory] = []
        try:
            for strategy_config in strategy_configs:
                strategy = await create_memory_from_config(
                    strategy_config, llm_provider
                )
                strategies.append(strategy)
            if not strategies:
                raise ValueError(
                    "Composite memory requires at least one strategy "
                    "in 'strategies' list"
                )
            return CompositeMemory(
                strategies=strategies,
                primary_index=config.get("primary", 0),
            )
        except Exception:
            # Clean up any already-initialized strategies
            for s in strategies:
                try:
                    await s.close()
                except Exception:
                    logger.warning(
                        "Failed to close strategy during cleanup: %s",
                        type(s).__name__,
                        exc_info=True,
                    )
            raise

    else:
        raise ValueError(
            f"Unknown memory type: {memory_type}. "
            f"Available types: buffer, composite, summary, vector"
        )

create_reasoning_from_config

create_reasoning_from_config(
    config: dict[str, Any], *, knowledge_base: Any | None = None
) -> ReasoningStrategy

Create reasoning strategy from configuration.

Delegates to the StrategyRegistry singleton. Built-in strategies (simple, react, wizard, grounded, hybrid) are registered automatically; third-party strategies can be added via register_strategy.

See each strategy class's from_config() for available config keys (e.g. ReActReasoning.from_config, WizardReasoning.from_config).

Parameters:

Name Type Description Default
config dict[str, Any]

Reasoning configuration dict. The strategy key selects the strategy type (default "simple"). All other keys are forwarded to the strategy's from_config() classmethod.

required
knowledge_base Any | None

Optional knowledge base instance forwarded as a kwarg to the strategy factory.

None

Returns:

Type Description
ReasoningStrategy

Configured reasoning strategy instance.

Raises:

Type Description
ValueError

If strategy type is not registered.

Example
# Simple reasoning
config = {"strategy": "simple"}
strategy = create_reasoning_from_config(config)

# Grounded reasoning (deterministic KB retrieval)
config = {
    "strategy": "grounded",
    "intent": {"mode": "extract", "num_queries": 3},
    "retrieval": {"top_k": 5},
}
strategy = create_reasoning_from_config(config, knowledge_base=kb)
Source code in packages/bots/src/dataknobs_bots/reasoning/__init__.py
def create_reasoning_from_config(
    config: dict[str, Any],
    *,
    knowledge_base: Any | None = None,
) -> ReasoningStrategy:
    """Create reasoning strategy from configuration.

    Delegates to the :class:`StrategyRegistry` singleton.  Built-in
    strategies (simple, react, wizard, grounded, hybrid) are registered
    automatically; 3rd-party strategies can be added via
    :func:`register_strategy`.

    See each strategy class's ``from_config()`` for available config
    keys (e.g. ``ReActReasoning.from_config``,
    ``WizardReasoning.from_config``).

    Args:
        config: Reasoning configuration dict.  The ``strategy`` key
            selects the strategy type (default ``"simple"``).  All
            other keys are forwarded to the strategy's
            ``from_config()`` classmethod.
        knowledge_base: Optional knowledge base instance forwarded
            as a kwarg to the strategy factory.

    Returns:
        Configured reasoning strategy instance.

    Raises:
        ValueError: If strategy type is not registered.

    Example:
        ```python
        # Simple reasoning
        config = {"strategy": "simple"}
        strategy = create_reasoning_from_config(config)

        # Grounded reasoning (deterministic KB retrieval)
        config = {
            "strategy": "grounded",
            "intent": {"mode": "extract", "num_queries": 3},
            "retrieval": {"top_k": 5},
        }
        strategy = create_reasoning_from_config(config, knowledge_base=kb)
        ```
    """
    return get_registry().create(config, knowledge_base=knowledge_base)

register_strategy

register_strategy(
    name: str, factory: StrategyFactory, *, override: bool = False
) -> None

Register a custom reasoning strategy.

Parameters:

Name Type Description Default
name str

Strategy name (used in reasoning.strategy config).

required
factory StrategyFactory

ReasoningStrategy subclass or factory callable.

required
override bool

Replace existing registration if True.

False

Example:

from dataknobs_bots.reasoning.registry import register_strategy

class MyStrategy(ReasoningStrategy):
    ...

register_strategy("my_strategy", MyStrategy)
Source code in packages/bots/src/dataknobs_bots/reasoning/registry.py
def register_strategy(
    name: str,
    factory: StrategyFactory,
    *,
    override: bool = False,
) -> None:
    """Register a custom reasoning strategy.

    Args:
        name: Strategy name (used in ``reasoning.strategy`` config).
        factory: ``ReasoningStrategy`` subclass or factory callable.
        override: Replace existing registration if ``True``.

    Example::

        from dataknobs_bots.reasoning.registry import register_strategy

        class MyStrategy(ReasoningStrategy):
            ...

        register_strategy("my_strategy", MyStrategy)
    """
    _registry.register(name, factory, override=override)

inject_providers

inject_providers(
    bot: Any,
    main_provider: AsyncLLMProvider | None = None,
    extraction_provider: AsyncLLMProvider | None = None,
    *,
    extractor: Any | None = None,
    **role_providers: AsyncLLMProvider,
) -> None

Inject LLM providers into a DynaBot instance for testing.

For main_provider, directly replaces bot.llm (the "main" role is always served from this attribute, not the registry catalog).

For extraction_provider and **role_providers, updates both the registry catalog and the actual subsystem wiring via set_provider().

For extractor, calls strategy.set_extractor() to replace the reasoning strategy's extractor entirely. Use this to inject a ConfigurableExtractor (which is not an AsyncLLMProvider and cannot be wired through set_provider()).

Lifecycle note: bot.close() closes self.llm (the main provider) unconditionally, so an injected main_provider is closed together with the bot. For subsystem providers (memory embedding, extraction), ownership flags control whether close() acts on them.

If bot does not implement register_provider, catalog registration is skipped; only subsystem wiring via set_provider() is performed.

Parameters:

Name Type Description Default
bot Any

A DynaBot instance (or any object with llm and reasoning_strategy attributes).

required
main_provider AsyncLLMProvider | None

Provider to use for main LLM calls. If None, the existing provider is kept.

None
extraction_provider AsyncLLMProvider | None

Provider to use for schema extraction. If None, the existing provider is kept.

None
extractor Any | None

A ConfigurableExtractor (or compatible object) to replace the wizard's SchemaExtractor directly. Mutually exclusive with extraction_provider.

None
**role_providers AsyncLLMProvider

Additional providers keyed by role name (e.g. memory_embedding=echo_provider). Each provider is registered in the catalog AND wired into the owning subsystem via set_provider().

{}
Example
from dataknobs_llm import EchoProvider
from dataknobs_bots.testing import inject_providers

main = EchoProvider()
extraction = EchoProvider()
inject_providers(bot, main, extraction)
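The example above covers only the positional providers. The wiring contract for the keyword paths (main replaces bot.llm directly; extraction and role providers go through the catalog and set_provider(); extractor and extraction_provider are mutually exclusive) can be sketched with stubs. Everything below is a stand-in re-implemented inline for illustration, not the real DynaBot, provider, or inject_providers code, and the "extraction" role string stands in for the PROVIDER_ROLE_EXTRACTION constant:

```python
# Stub stand-ins for DynaBot, its reasoning strategy, and providers.
class StubProvider:
    def __init__(self, name):
        self.name = name

class StubStrategy:
    def __init__(self):
        self.providers = {}

    def set_provider(self, role, provider):
        self.providers[role] = provider

class StubBot:
    def __init__(self):
        self.llm = StubProvider("original-main")
        self.reasoning_strategy = StubStrategy()
        self.catalog = {}

    def register_provider(self, role, provider):
        self.catalog[role] = provider

def inject(bot, main=None, extraction=None, *, extractor=None):
    # Mirrors the documented mutual-exclusion check.
    if extractor is not None and extraction is not None:
        raise ValueError("extractor and extraction_provider are mutually exclusive")
    if main is not None:
        bot.llm = main  # "main" role bypasses the catalog entirely
    if extraction is not None:
        bot.register_provider("extraction", extraction)  # catalog entry
        bot.reasoning_strategy.set_provider("extraction", extraction)  # wiring

bot = StubBot()
inject(bot, main=StubProvider("echo"), extraction=StubProvider("echo-x"))
assert bot.llm.name == "echo"
assert bot.catalog["extraction"].name == "echo-x"
```

Note that the main provider lands only on `bot.llm` and never in the catalog, matching the statement above that the "main" role is served from that attribute.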
Source code in packages/bots/src/dataknobs_bots/testing.py
def inject_providers(
    bot: Any,
    main_provider: AsyncLLMProvider | None = None,
    extraction_provider: AsyncLLMProvider | None = None,
    *,
    extractor: Any | None = None,
    **role_providers: AsyncLLMProvider,
) -> None:
    """Inject LLM providers into a DynaBot instance for testing.

    For ``main_provider``, directly replaces ``bot.llm`` (the ``"main"``
    role is always served from this attribute, not the registry catalog).

    For ``extraction_provider`` and ``**role_providers``, updates both the
    registry catalog and the actual subsystem wiring via ``set_provider()``.

    For ``extractor``, calls ``strategy.set_extractor()`` to replace
    the reasoning strategy's extractor entirely.  Use this to inject a
    ``ConfigurableExtractor`` (which is not an ``AsyncLLMProvider`` and
    cannot be wired through ``set_provider()``).

    **Lifecycle note:** ``bot.close()`` will close ``self.llm`` (the main
    provider) unconditionally — the caller should be aware that an
    injected ``main_provider`` will be closed when the bot is closed.
    For subsystem providers (memory embedding, extraction), ownership
    flags control whether ``close()`` acts on them.

    If ``bot`` does not implement ``register_provider``, catalog
    registration is skipped; only subsystem wiring via ``set_provider()``
    is performed.

    Args:
        bot: A DynaBot instance (or any object with ``llm`` and
            ``reasoning_strategy`` attributes).
        main_provider: Provider to use for main LLM calls. If None,
            the existing provider is kept.
        extraction_provider: Provider to use for schema extraction.
            If None, the existing provider is kept.
        extractor: A ``ConfigurableExtractor`` (or compatible object)
            to replace the wizard's ``SchemaExtractor`` directly.
            Mutually exclusive with ``extraction_provider``.
        **role_providers: Additional providers keyed by role name
            (e.g. ``memory_embedding=echo_provider``).  Each provider
            is registered in the catalog AND wired into the owning
            subsystem via ``set_provider()``.

    Example:
        ```python
        from dataknobs_llm import EchoProvider
        from dataknobs_bots.testing import inject_providers

        main = EchoProvider()
        extraction = EchoProvider()
        inject_providers(bot, main, extraction)
        ```
    """
    if extractor is not None and extraction_provider is not None:
        raise ValueError(
            "extractor and extraction_provider are mutually exclusive"
        )

    if main_provider is not None:
        bot.llm = main_provider

    if extractor is not None:
        strategy = getattr(bot, "reasoning_strategy", None)
        if strategy is not None and hasattr(strategy, "set_extractor"):
            strategy.set_extractor(extractor)
        else:
            logger.warning(
                "Bot has no reasoning_strategy.set_extractor — "
                "skipping extractor injection"
            )

    if extraction_provider is not None:
        from dataknobs_bots.bot.base import PROVIDER_ROLE_EXTRACTION

        # Update the registry entry
        if hasattr(bot, "register_provider"):
            bot.register_provider(PROVIDER_ROLE_EXTRACTION, extraction_provider)

        # Also update the actual extractor so subsystem calls use it
        strategy = getattr(bot, "reasoning_strategy", None)
        if strategy is None:
            logger.warning(
                "Bot has no reasoning_strategy — skipping extraction provider injection"
            )
        elif hasattr(strategy, "set_provider"):
            strategy.set_provider(PROVIDER_ROLE_EXTRACTION, extraction_provider)
        else:
            # Fallback for strategies without set_provider (e.g. test stubs)
            extractor = getattr(strategy, "_extractor", None)
            if extractor is None:
                logger.warning(
                    "Reasoning strategy has no _extractor — "
                    "skipping extraction provider injection"
                )
            else:
                extractor.provider = extraction_provider
                if hasattr(extractor, "_owns_provider"):
                    extractor._owns_provider = False

    # Wire role-based providers into catalog AND subsystems
    for role, provider in role_providers.items():
        if hasattr(bot, "register_provider"):
            bot.register_provider(role, provider)

        # Wire into the actual subsystem that owns this role
        _wire_role_provider(bot, role, provider)