Memory Chatbot Example

A chatbot with conversation memory for context-aware responses.

Overview

This example demonstrates:

  • Buffer memory for conversation context
  • Context-aware responses using message history
  • Configuration-based memory setup

Prerequisites

# Install Ollama: https://ollama.ai/

# Pull the required model
ollama pull gemma3:1b

# Install dataknobs-bots
pip install dataknobs-bots

What Changed from the Simple Chatbot?

We added a memory section to the configuration:

config = {
    "llm": {
        "provider": "ollama",
        "model": "gemma3:1b"
    },
    "conversation_storage": {
        "backend": "memory"
    },
    "memory": {
        "type": "buffer",        # Buffer memory (sliding window)
        "max_messages": 10        # Keep last 10 messages
    }
}

How It Works

Buffer Memory

Buffer memory maintains a sliding window of recent messages:

  1. User sends message → Added to buffer
  2. Bot responds → Response added to buffer
  3. Buffer exceeds max_messages → Oldest messages removed
  4. Next message → Recent history included in context
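
The sliding-window behavior above can be sketched with a `deque`. This is a minimal illustration of the mechanism, not the actual dataknobs-bots implementation; the class and method names are hypothetical.

```python
from collections import deque


class BufferMemory:
    """Minimal sliding-window memory: keeps only the last max_messages entries."""

    def __init__(self, max_messages: int = 10):
        # A deque with maxlen silently discards the oldest entry when full
        self.buffer = deque(maxlen=max_messages)

    def add(self, role: str, content: str) -> None:
        self.buffer.append({"role": role, "content": content})

    def get_context(self) -> list[dict]:
        # Recent history, oldest first, ready to prepend to the LLM prompt
        return list(self.buffer)


memory = BufferMemory(max_messages=4)
for i in range(6):
    memory.add("user", f"message {i}")

print([m["content"] for m in memory.get_context()])
# → ['message 2', 'message 3', 'message 4', 'message 5']
```

Because `maxlen` handles eviction automatically, step 3 above (dropping the oldest messages) requires no explicit bookkeeping.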

Memory Flow

graph LR
    A[User Message] --> B[Add to Buffer]
    B --> C[Get Recent Context]
    C --> D[LLM with Context]
    D --> E[Bot Response]
    E --> B

Complete Code

02_chatbot_with_memory.py
"""Chatbot with memory example.

This example demonstrates:
- Buffer memory configuration
- Context retention across messages
- Memory limits and management
- How the bot remembers previous conversation

Required Ollama model:
    ollama pull gemma3:1b
"""

import asyncio

from dataknobs_bots import BotContext, DynaBot


async def main():
    """Run a chatbot with memory."""
    print("=" * 60)
    print("Chatbot with Memory Example")
    print("=" * 60)
    print()
    print("This example shows a chatbot that remembers context.")
    print("Required: ollama pull gemma3:1b")
    print()

    # Configuration with buffer memory
    config = {
        "llm": {
            "provider": "ollama",
            "model": "gemma3:1b",
            "temperature": 0.7,
            "max_tokens": 500,
        },
        "conversation_storage": {
            "backend": "memory",
        },
        "memory": {
            "type": "buffer",
            "max_messages": 10,  # Remember last 10 messages
        },
        "prompts": {
            "helpful_assistant": "You are a helpful AI assistant with excellent memory. "
            "You remember details from earlier in the conversation and can reference them."
        },
        "system_prompt": {
            "name": "helpful_assistant",
        },
    }

    print("Creating bot with buffer memory...")
    bot = await DynaBot.from_config(config)
    print("✓ Bot created successfully")
    print(f"✓ Memory: Buffer (max {config['memory']['max_messages']} messages)")
    print()

    # Create context for this conversation
    context = BotContext(
        conversation_id="memory-chat-001",
        client_id="example-client",
        user_id="demo-user",
    )

    # Conversation demonstrating memory
    messages = [
        "Hello! My name is Alice and I love reading science fiction.",
        "What's your favorite sci-fi book?",
        "Do you remember my name?",
        "What did I tell you I love to read?",
        "Can you recommend a sci-fi book for me based on what you know about my interests?",
    ]

    for i, user_message in enumerate(messages, 1):
        print(f"[{i}] User: {user_message}")

        response = await bot.chat(
            message=user_message,
            context=context,
        )

        print(f"[{i}] Bot: {response}")
        print()

        # Add a small delay between messages
        if i < len(messages):
            await asyncio.sleep(1)

    print("=" * 60)
    print("Conversation complete!")
    print()
    print("Memory demonstration:")
    print("- The bot remembered the user's name (Alice)")
    print("- The bot remembered the user's interest (science fiction)")
    print("- The bot used this context to make relevant recommendations")
    print()
    print(f"Memory buffer stores last {config['memory']['max_messages']} messages")


if __name__ == "__main__":
    asyncio.run(main())

Running the Example

cd packages/bots
python examples/02_chatbot_with_memory.py

Expected Output

The bot now remembers previous messages:

User: My name is Alice.
Bot: Hello Alice! Nice to meet you.

User: What's my name?
Bot: Your name is Alice.

User: What did I just tell you?
Bot: You told me your name is Alice.

Memory Types

Buffer Memory

Simple sliding window (used in this example):

"memory": {
    "type": "buffer",
    "max_messages": 10  # Last 10 messages
}

Pros: Fast, simple, predictable
Cons: Limited context, doesn't prioritize important information

Summary Memory

LLM-based compression of older messages:

"memory": {
    "type": "summary",
    "recent_window": 10  # Keep last 10 messages verbatim
}

Pros: Very long effective context, preserves key points
Cons: Loses exact wording of old messages

Uses the bot's LLM by default. For a dedicated summarization model:

"memory": {
    "type": "summary",
    "recent_window": 10,
    "llm": {
        "provider": "ollama",
        "model": "gemma3:1b"  # Lightweight model for summaries
    }
}
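
The compression step can be sketched roughly as follows. The `summarize` function stands in for the LLM summarization call, and all names here are illustrative, not the library's API; a real implementation would also fold the previous summary into each new one.

```python
def summarize(messages: list[dict]) -> str:
    """Placeholder for an LLM summarization call."""
    return f"Summary of {len(messages)} earlier messages."


class SummaryMemory:
    """Keeps the last recent_window messages verbatim; compresses the rest."""

    def __init__(self, recent_window: int = 10):
        self.recent_window = recent_window
        self.messages: list[dict] = []
        self.summary: str | None = None

    def add(self, role: str, content: str) -> None:
        self.messages.append({"role": role, "content": content})
        if len(self.messages) > self.recent_window:
            # Fold overflowing messages into the summary, keep the window verbatim
            overflow = self.messages[: -self.recent_window]
            self.summary = summarize(overflow)
            self.messages = self.messages[-self.recent_window:]

    def get_context(self) -> list[dict]:
        context = []
        if self.summary:
            # The summary is injected as a system message ahead of recent history
            context.append({"role": "system", "content": self.summary})
        return context + self.messages
```

This is why summary memory trades exact wording for a much longer effective context: old messages survive only as the summary line.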

Vector Memory

Semantic search over conversation history:

"memory": {
    "type": "vector",
    "max_messages": 100,
    "top_k": 5,  # Retrieve 5 most relevant messages
    "embedding_provider": "ollama",
    "embedding_model": "nomic-embed-text"
}

Pros: Finds relevant messages regardless of recency
Cons: Slower, requires embedding model
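
The retrieval step can be sketched with cosine similarity over stored embeddings. This is a toy illustration under assumed names, not the dataknobs-bots implementation; `embed` stands in for an embedding call such as one to `nomic-embed-text`.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


class VectorMemory:
    """Toy semantic memory: stores (embedding, message) pairs, retrieves top_k by similarity."""

    def __init__(self, embed, top_k: int = 5):
        self.embed = embed  # embedding function, e.g. an Ollama embedding call
        self.top_k = top_k
        self.entries: list[tuple[list[float], dict]] = []

    def add(self, role: str, content: str) -> None:
        self.entries.append((self.embed(content), {"role": role, "content": content}))

    def get_context(self, query: str) -> list[dict]:
        # Rank all stored messages by similarity to the query embedding
        q = self.embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(e[0], q), reverse=True)
        return [msg for _, msg in ranked[: self.top_k]]
```

Unlike the buffer, retrieval here depends on the current query, which is why vector memory can surface an old but relevant message that a sliding window would have evicted.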

Tenant scoping: Use default_metadata and default_filter for multi-tenant isolation:

"memory": {
    "type": "vector",
    "backend": "pgvector",
    "dimension": 768,
    "embedding_provider": "ollama",
    "embedding_model": "nomic-embed-text",
    "default_metadata": {"user_id": "u123"},   # Tagged on writes
    "default_filter": {"user_id": "u123"},     # Scoped on reads
}

Composite Memory

Combine multiple strategies for best-of-both-worlds context:

"memory": {
    "type": "composite",
    "primary": 0,         # Index of primary strategy
    "strategies": [
        {
            "type": "summary",
            "recent_window": 10
        },
        {
            "type": "vector",
            "backend": "memory",
            "dimension": 384,
            "embedding_provider": "ollama",
            "embedding_model": "nomic-embed-text"
        }
    ]
}

All strategies receive every message. On read, primary results appear first, then deduplicated secondary results. If any strategy fails, the composite continues with the remaining ones.
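
The read-side merge described above can be sketched as follows. Each strategy is modeled as a callable returning a list of messages; the names and shape are illustrative, not the library's API.

```python
def composite_read(strategies, primary: int = 0) -> list[dict]:
    """Merge context from multiple strategies: primary results first, then
    deduplicated results from the others; failing strategies are skipped."""
    ordered = [strategies[primary]] + [
        s for i, s in enumerate(strategies) if i != primary
    ]
    seen: set[str] = set()
    merged: list[dict] = []
    for strategy in ordered:
        try:
            results = strategy()  # each strategy returns a list of messages
        except Exception:
            continue  # tolerate individual strategy failures
        for msg in results:
            if msg["content"] not in seen:
                seen.add(msg["content"])
                merged.append(msg)
    return merged


def primary_buffer():
    return [{"role": "user", "content": "a"}, {"role": "user", "content": "b"}]


def secondary_vector():
    return [{"role": "user", "content": "b"}, {"role": "user", "content": "c"}]


def failing_store():
    raise RuntimeError("store down")


print([m["content"] for m in composite_read([primary_buffer, secondary_vector, failing_store])])
# → ['a', 'b', 'c']
```

Note that the duplicate "b" from the secondary strategy is dropped, and the failing store does not prevent a response.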

Pros: Combines recent-context awareness with semantic recall
Cons: Uses more resources (multiple stores, possible embedding calls)

Choosing max_messages

max_messages   Use Case                        Token Usage
5-10           Short conversations             Low
10-20          Standard conversations          Medium
20-50          Long conversations              High
50+            Document-length conversations   Very High

Recommendation: Start with 10-20 for most use cases.
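
As a rough back-of-the-envelope check on the table above (assuming ~75 tokens per chat message, an illustrative figure; real message lengths vary widely):

```python
def estimate_context_tokens(max_messages: int, avg_tokens_per_message: int = 75) -> int:
    """Rough upper bound on tokens the memory buffer adds to each prompt."""
    return max_messages * avg_tokens_per_message


for n in (10, 20, 50):
    print(n, estimate_context_tokens(n))
# → 10 750
#   20 1500
#   50 3750
```

Since this history is resent with every request, doubling `max_messages` roughly doubles per-request token cost.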

Key Takeaways

  1. Context Awareness - Bot remembers conversation history
  2. Easy Configuration - Just add a memory section
  3. Sliding Window - Automatic management of context size
  4. Token Efficiency - Only recent messages included

Customization

Longer Memory

"memory": {
    "type": "buffer",
    "max_messages": 20  # Remember more messages
}

Summary Memory

"memory": {
    "type": "summary",
    "recent_window": 10
}

Semantic Memory

"memory": {
    "type": "vector",
    "max_messages": 100,
    "embedding_provider": "ollama",
    "embedding_model": "nomic-embed-text"
}

Composite Memory (Summary + Vector)

"memory": {
    "type": "composite",
    "strategies": [
        {"type": "summary", "recent_window": 10},
        {
            "type": "vector",
            "backend": "memory",
            "dimension": 384,
            "embedding_provider": "ollama",
            "embedding_model": "nomic-embed-text"
        }
    ]
}

What's Next?

To add knowledge retrieval, see the RAG Chatbot Example.