Skip to content

Getting Started with Dataknobs

This guide will help you get up and running with Dataknobs based on your use case.

Prerequisites

  • Python 3.12 or higher
  • pip or uv package manager

Installation

Choose Based on Your Use Case

# For building chatbots and AI agents
pip install dataknobs-bots dataknobs-llm

# Add data persistence if needed
pip install dataknobs-data dataknobs-config
# For ETL and workflow orchestration
pip install dataknobs-fsm dataknobs-data dataknobs-config
# Core utilities and data structures
pip install dataknobs-structures dataknobs-utils dataknobs-xization
# Install all packages
pip install dataknobs-config dataknobs-data dataknobs-fsm \
            dataknobs-llm dataknobs-bots \
            dataknobs-structures dataknobs-utils dataknobs-xization

Installing with uv

If you're using the uv package manager:

uv pip install dataknobs-bots dataknobs-llm  # Or any combination

Installing from Source

Clone the repository and install in development mode:

git clone https://github.com/kbs-labs/dataknobs.git
cd dataknobs
uv sync --all-packages

Quick Start by Use Case

Building AI Chatbots

Create an intelligent chatbot with memory and tools:

import asyncio
from dataknobs_bots import DynaBot, BotContext

async def main():
    # Configure bot from dictionary or YAML
    config = {
        "llm": {
            "provider": "openai",
            "model": "gpt-4",
            "temperature": 0.7
        },
        "conversation_storage": {"backend": "memory"},
        "memory": {
            "type": "buffer",
            "max_messages": 10
        },
        "prompts": {
            "support_assistant": "You are a helpful customer support assistant."
        },
        "system_prompt": {"name": "support_assistant"}
    }

    # Create bot
    bot = await DynaBot.from_config(config)

    # Create context for conversation
    context = BotContext(
        conversation_id="support-001",
        client_id="my-company",
        user_id="customer123"
    )

    # Chat with context retention
    response = await bot.chat("I need help with my order", context)
    print(response)

asyncio.run(main())

Learn more about Bots →

Processing Data with Workflows

Build robust ETL pipelines with finite state machines:

from dataknobs_fsm import SimpleFSM, DataHandlingMode

# Define a data processing pipeline
config = {
    "name": "user_import",
    "states": [
        {"name": "load", "is_start": True},
        {"name": "validate"},
        {"name": "transform"},
        {"name": "save", "is_end": True}
    ],
    "arcs": [
        {"from": "load", "to": "validate"},
        {"from": "validate", "to": "transform"},
        {"from": "transform", "to": "save"}
    ]
}

fsm = SimpleFSM(config, data_mode=DataHandlingMode.COPY)
result = fsm.process({"users": [{"name": "Alice", "age": 30}]})

Learn more about FSM →

Working with Multiple Data Backends

Use a unified interface across different storage systems:

from dataknobs_config import Config
from dataknobs_data import database_factory, Record, Query

# Load configuration (supports environment variables)
config = Config("config.yaml")  # or dict
config.register_factory("database", database_factory)

# Get database instance - backend determined by config
# Supports: Memory, File, PostgreSQL, Elasticsearch, S3
db = config.get_instance("databases", "primary")

# Unified API regardless of backend
record = Record({"name": "Alice", "email": "alice@example.com"})
record_id = db.create(record)

# Query with same API across all backends
results = db.search(Query().filter("name", "=", "Alice"))

Learn more about Data → | Learn more about Config →

Integrating LLMs

Manage prompts and multi-provider LLM access:

from dataknobs_llm import create_llm_provider, LLMMessage

# Create LLM provider
llm = create_llm_provider({
    "provider": "openai",
    "model": "gpt-4",
    "api_key": "your-key"
})

# Generate completion
messages = [
    LLMMessage(role="user", content="What's the capital of France?")
]
response = await llm.generate(messages)
print(response.content)

# Continue conversation
messages.append(LLMMessage(role="assistant", content=response.content))
messages.append(LLMMessage(role="user", content="What's its population?"))
response = await llm.generate(messages)  # Maintains context

Learn more about LLM →

Working with Data Structures

Use trees, documents, and utilities for common tasks:

from dataknobs_structures import Tree
from dataknobs_utils import json_utils
from dataknobs_xization import normalize

# Hierarchical data
tree = Tree("root")
chapter1 = tree.add_child("Chapter 1")
chapter1.add_child("Section 1.1")
chapter1.add_child("Section 1.2")

# Navigate tree
for node in tree.traverse():
    print(f"{'  ' * node.level}{node.value}")

# JSON utilities
data = {"users": {"alice": {"age": 30, "city": "Paris"}}}
age = json_utils.get_value(data, "users.alice.age")  # 30

# Text normalization
text = "  Hello   WORLD!!!  "
normalized = normalize.basic_normalization_fn(text)  # "hello world!"

Learn more about Structures → | Learn more about Utils →

Understanding the Package Ecosystem

Dataknobs packages are organized by capability:

Configuration & Data Layer (Foundation for other packages): - dataknobs-config: Environment-aware configuration management - dataknobs-data: Unified data access across multiple backends

AI & LLM Capabilities (Building intelligent applications): - dataknobs-llm: LLM integration with prompt management - dataknobs-bots: Pre-built AI agents with memory and tools

Workflow & Processing (Orchestrating complex operations): - dataknobs-fsm: Finite state machines for robust pipelines

Core Utilities (Building blocks): - dataknobs-structures: Trees, documents, record stores - dataknobs-utils: JSON, file operations, integrations - dataknobs-xization: Text normalization and tokenization - dataknobs-common: Shared base classes

Next Steps

Based on what you want to build:

For AI/ML Projects: 1. Start with Bots Quickstart for chatbots 2. Or LLM Quickstart for custom LLM integration 3. Add Data package for persistence 4. Use FSM for complex AI workflows

For Data Engineering: 1. Begin with FSM Quickstart 2. Add Data package for backend abstraction 3. Use Config for environment management 4. Explore Data Examples

For General Development: 1. Check Basic Usage Guide for structures and utilities 2. Explore Advanced Usage for patterns 3. Browse Examples for real-world use cases

Getting Help

Common Patterns

Combining Packages

Packages work seamlessly together:

# FSM + Data + Config: Robust data pipeline
from dataknobs_fsm import SimpleFSM
from dataknobs_data import database_factory
from dataknobs_config import Config

config = Config("pipeline.yaml")
config.register_factory("database", database_factory)

# FSM can access database through config
fsm = SimpleFSM(pipeline_config)
fsm.context["db"] = config.get_instance("databases", "primary")

# Bots + LLM + Data: Chatbot with persistence
from dataknobs_bots import BotRegistry
from dataknobs_data import MemoryDatabase

registry = BotRegistry()
db = MemoryDatabase()  # For conversation history

bot = registry.create_bot("assistant", {
    "llm": {"provider": "openai"},
    "memory": {"type": "buffer"}
})

Environment-Based Configuration

All packages support environment variables through Config:

# config.yaml
databases:
  primary:
    backend: ${DB_BACKEND:memory}  # Default to memory
    connection: ${DB_CONNECTION:}

llm:
  provider: ${LLM_PROVIDER:openai}
  api_key: ${OPENAI_API_KEY}
from dataknobs_config import Config

# Reads from environment or uses defaults
config = Config("config.yaml")

Troubleshooting

Import Errors

Use the new package names with underscores:

# ✅ Correct
from dataknobs_structures import Tree
from dataknobs_bots import BotRegistry

# ❌ Old style (deprecated)
from dataknobs.structures import Tree

Missing Dependencies

Some packages require additional dependencies:

# For PostgreSQL support
pip install psycopg2-binary

# For Elasticsearch
pip install elasticsearch

# For S3 support
pip install boto3

# For LLM providers
pip install openai anthropic

See the Installation Guide for complete dependency information.