Dataknobs Documentation

Welcome to Dataknobs - simple, standardized tools for working productively with knowledge and data.

  • Getting Started: Get up and running with Dataknobs in minutes (Quick Start)
  • Modular Packages: Explore our modular architecture with specialized packages (Package Overview)
  • API Reference: Complete API documentation with examples (API Docs)
  • User Guide: Learn best practices and advanced usage patterns (User Guide)

What is Dataknobs?

Dataknobs provides simple, standardized implementations and interfaces for data structures, tools, and processes that enable effective and productive work with knowledge. Whether you're building AI applications, processing large datasets, orchestrating complex workflows, or creating intelligent chatbots, Dataknobs gives you the building blocks to work efficiently and responsibly.

Core Capabilities:

  • Configuration Management: Flexible, environment-aware configuration with factory patterns
  • Data Abstraction: Unified interface across Memory, File, PostgreSQL, Elasticsearch, and S3 backends
  • Workflow Orchestration: Finite State Machines for building robust data processing pipelines
  • LLM Integration: Prompt management, conversations, versioning, and tool calling
  • AI Agents: Configuration-driven chatbots with memory, RAG, and reasoning capabilities
  • Data Structures: Trees, documents, and record stores for organizing information
  • Text Processing: Tokenization, normalization, and text analysis
  • Utilities: JSON processing, file handling, and integration tools

Our Mission:

Dataknobs is open-source because we believe in democratizing access to data through practical tools. We're committed to promoting responsible and ethical use of technology and AI by engineering safeguards into processes that work with data at any scale.

Package Overview

| Package | Description | Version |
|---|---|---|
| dataknobs-bots | Configuration-driven AI agents with RAG, memory, and reasoning strategies | 0.6.9 |
| dataknobs-common | Foundation library with exceptions, registries, serialization, and event bus | 1.3.8 |
| dataknobs-config | Modular configuration system with environment variable overrides and factories | 0.3.8 |
| dataknobs-data | Unified data abstraction layer with multiple backends | 0.4.13 |
| dataknobs-fsm | Finite State Machine framework for workflows with data modes and resource management | 0.1.14 |
| dataknobs-llm | Unified LLM abstraction with prompt management and conversations | 0.5.5 |
| dataknobs-structures | Core data structures for AI knowledge bases and document processing | 1.0.5 |
| dataknobs-utils | Utilities for file I/O, JSON processing, HTTP requests, and integrations | 1.2.5 |
| dataknobs-xization | Text normalization, tokenization, annotation, and markdown chunking library | 1.3.0 |
| dataknobs | Legacy compatibility package (deprecated) | 0.1.1 |

Quick Installation

# Configuration and data abstraction
pip install dataknobs-config dataknobs-data

# AI and LLM capabilities
pip install dataknobs-llm dataknobs-bots

# Workflow orchestration
pip install dataknobs-fsm

# Data structures and utilities
pip install dataknobs-structures dataknobs-utils dataknobs-xization

# Install all packages
pip install dataknobs-config dataknobs-data dataknobs-fsm \
            dataknobs-llm dataknobs-bots \
            dataknobs-structures dataknobs-utils dataknobs-xization
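
After installing, it can be useful to confirm which packages are actually present in the active environment. The following is a minimal sketch that uses only the Python standard library (`importlib.metadata`); the helper name `installed_versions` is hypothetical and not part of Dataknobs, and the distribution names are copied from the Package Overview table.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as listed in the Package Overview table.
PACKAGES = [
    "dataknobs-config",
    "dataknobs-data",
    "dataknobs-fsm",
    "dataknobs-llm",
    "dataknobs-bots",
    "dataknobs-structures",
    "dataknobs-utils",
    "dataknobs-xization",
]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Print one line per package, e.g. "dataknobs-fsm: not installed".
for name, ver in installed_versions(PACKAGES).items():
    print(f"{name}: {ver or 'not installed'}")
```

Packages that report "not installed" can then be added with the matching `pip install` command above.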

Quick Examples

Configuration-Driven Database

from dataknobs_config import Config
from dataknobs_data import database_factory, Record, Query

# Load configuration (values can be overridden by environment variables)
config = Config("config.yaml")
config.register_factory("database", database_factory)

# Create database from config - supports PostgreSQL, Elasticsearch, S3, etc.
db = config.get_instance("databases", "primary")

# Unified API across all backends
record = Record({"name": "Alice", "role": "engineer"})
record_id = db.create(record)
results = db.search(Query().filter("role", "=", "engineer"))
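
To make the factory idea concrete, here is a self-contained sketch of how a configuration object might hand construction off to a registered factory. It does not use Dataknobs at all: `MiniConfig`, `InMemoryDatabase`, `toy_database_factory`, and the layout of the settings dictionary are hypothetical stand-ins chosen for illustration, and the real `Config` and `database_factory` may work differently.

```python
class InMemoryDatabase:
    """Toy backend that stores records in a dict keyed by a string id."""

    def __init__(self):
        self._records = {}
        self._next_id = 1

    def create(self, record):
        record_id = str(self._next_id)
        self._next_id += 1
        self._records[record_id] = dict(record)
        return record_id

    def search(self, field, value):
        return [r for r in self._records.values() if r.get(field) == value]


def toy_database_factory(settings):
    """Hypothetical factory: choose a backend class from the settings."""
    backends = {"memory": InMemoryDatabase}
    return backends[settings["backend"]]()


class MiniConfig:
    """Hypothetical config holder that builds objects via registered factories."""

    def __init__(self, data):
        self._data = data
        self._factories = {}

    def register_factory(self, name, factory):
        self._factories[name] = factory

    def get_instance(self, section, name):
        settings = self._data[section][name]
        return self._factories[settings["factory"]](settings)


# Assumed settings layout; the real config.yaml schema is not shown in this page.
config = MiniConfig(
    {"databases": {"primary": {"factory": "database", "backend": "memory"}}}
)
config.register_factory("database", toy_database_factory)

db = config.get_instance("databases", "primary")
record_id = db.create({"name": "Alice", "role": "engineer"})
matches = db.search("role", "engineer")
```

The point of the pattern is that calling code asks for "databases / primary" by name, so switching backends becomes a configuration change rather than a code change.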

AI Chatbot with Memory

import asyncio
from dataknobs_bots import DynaBot, BotContext

async def main():
    # Configure the bot's LLM, conversation storage, and memory (from YAML or a dict)
    bot_config = {
        "llm": {"provider": "openai", "model": "gpt-4"},
        "conversation_storage": {"backend": "memory"},
        "memory": {"type": "buffer", "max_messages": 10}
    }

    # Create bot
    bot = await DynaBot.from_config(bot_config)

    # Create context for conversation
    context = BotContext(
        conversation_id="conv-001",
        client_id="my-app",
        user_id="user123"
    )

    # Multi-turn conversation with context
    response1 = await bot.chat("What's the weather in Paris?", context)
    response2 = await bot.chat("How about tomorrow?", context)  # Remembers context

asyncio.run(main())
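
The `"memory": {"type": "buffer", "max_messages": 10}` setting suggests a sliding window over recent messages. The sketch below shows one common way such a buffer can behave, using `collections.deque` from the standard library. `BufferMemory` is a hypothetical stand-in, not the Dataknobs class, and the actual eviction behavior in dataknobs-bots is an assumption here.

```python
from collections import deque


class BufferMemory:
    """Hypothetical stand-in: keep only the most recent max_messages messages."""

    def __init__(self, max_messages=10):
        # A deque with maxlen silently drops the oldest item when full.
        self._messages = deque(maxlen=max_messages)

    def add(self, role, content):
        self._messages.append({"role": role, "content": content})

    def history(self):
        return list(self._messages)


memory = BufferMemory(max_messages=3)
memory.add("user", "What's the weather in Paris?")
memory.add("assistant", "first reply")
memory.add("user", "How about tomorrow?")
memory.add("assistant", "second reply")

# Four messages were added to a buffer of three, so the first one is gone.
recent = memory.history()
```

With a window like this, a follow-up question is only answerable in context while the earlier turns are still inside the buffer, which is the trade-off `max_messages` controls.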

Data Processing Workflow

from dataknobs_fsm import SimpleFSM, DataHandlingMode

# Define workflow with inline transformations
pipeline_config = {
    "name": "etl_pipeline",
    "states": [
        {"name": "extract", "is_start": True},
        {"name": "transform"},
        {"name": "load", "is_end": True}
    ],
    "arcs": [
        {
            "from": "extract",
            "to": "transform",
            "transform": {
                "type": "inline",
                "code": "lambda data, ctx: {'records': data['raw_data']}"
            }
        },
        {
            "from": "transform",
            "to": "load",
            "transform": {
                "type": "inline",
                "code": "lambda data, ctx: [r.upper() for r in data['records']]"
            }
        }
    ]
}

fsm = SimpleFSM(pipeline_config, data_mode=DataHandlingMode.COPY)
result = fsm.process({"raw_data": ["item1", "item2"]})
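
To see what the two inline transforms do to the data, independent of the FSM machinery, the sketch below applies the same two functions in order using plain Python. `run_arcs` is a hypothetical helper; how `SimpleFSM` actually invokes transforms, and the shape of the value returned by `fsm.process`, are not stated here and may differ.

```python
def extract_to_transform(data, ctx):
    """Same logic as the first inline transform: wrap raw items under 'records'."""
    return {"records": data["raw_data"]}


def transform_to_load(data, ctx):
    """Same logic as the second inline transform: upper-case every record."""
    return [r.upper() for r in data["records"]]


def run_arcs(data, transforms, ctx=None):
    """Hypothetical helper: feed the output of each transform into the next."""
    for transform in transforms:
        data = transform(data, ctx)
    return data


payload = run_arcs(
    {"raw_data": ["item1", "item2"]},
    [extract_to_transform, transform_to_load],
)
print(payload)  # ['ITEM1', 'ITEM2']
```

Note that the second transform returns a list rather than a dict, so any state added after "load" would need to expect a list as its input.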

Simple Data Structures

from dataknobs_structures import Tree
from dataknobs_utils import json_utils
from dataknobs_xization import normalize

# Hierarchical data organization
tree = Tree("root")
child = tree.add_child("child1")
child.add_child("grandchild")

# JSON navigation
data = {"users": {"alice": {"age": 30}}}
age = json_utils.get_value(data, "users.alice.age")

# Text normalization
text = "Hello WORLD!"
normalized = normalize.basic_normalization_fn(text)  # "hello world!"
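
For readers curious about what dot-path navigation involves, here is a small standalone sketch of the idea in plain Python. `get_by_path` is a hypothetical helper written for illustration; it is not the implementation of `json_utils.get_value`, whose handling of missing keys, lists, and defaults may differ.

```python
def get_by_path(data, path, default=None):
    """Walk nested dicts using a dot-separated path; return default if any key is missing."""
    current = data
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


data = {"users": {"alice": {"age": 30}}}
age = get_by_path(data, "users.alice.age")             # 30
missing = get_by_path(data, "users.bob.age")           # None
fallback = get_by_path(data, "users.bob.age", default=0)  # 0
```

Returning a default instead of raising keeps lookups on loosely structured JSON short, at the cost of not distinguishing a missing key from a stored `None`.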

Use Cases

For Data Engineers: Build robust ETL pipelines with the FSM package, get unified data access through the Data package, and manage configuration flexibly with the Config package.

For AI/ML Developers: Integrate LLMs with prompt management, create intelligent chatbots with memory and RAG, and orchestrate complex AI workflows.

For Application Developers: Use simple data structures, text processing utilities, and standardized interfaces to build applications faster.

For Researchers: Use open-source tools for working with knowledge bases, experiment with different storage backends, and build reproducible workflows.

Migration from Legacy

If you're using the old dataknobs package, see our Migration Guide for upgrading to the new modular structure.

Contributing

We welcome contributions! Dataknobs is open-source to democratize access to productive data tools. See our Contributing Guide for details.

License

Dataknobs is released under the MIT License. See License for details.