Memory Chatbot Example¶
A chatbot with conversation memory for context-aware responses.
Overview¶
This example demonstrates:
- Buffer memory for conversation context
- Context-aware responses using message history
- Configuration-based memory setup
Prerequisites¶
```bash
# Install Ollama: https://ollama.ai/

# Pull the required model
ollama pull gemma3:1b

# Install dataknobs-bots
pip install dataknobs-bots
```
What Changed from Simple Chatbot?¶
We added a memory section to the configuration:
```python
config = {
    "llm": {
        "provider": "ollama",
        "model": "gemma3:1b"
    },
    "conversation_storage": {
        "backend": "memory"
    },
    "memory": {
        "type": "buffer",    # Buffer memory (sliding window)
        "max_messages": 10   # Keep last 10 messages
    }
}
```
How It Works¶
Buffer Memory¶
Buffer memory maintains a sliding window of recent messages:
- User sends message → Added to buffer
- Bot responds → Response added to buffer
- Buffer exceeds max_messages → Oldest messages removed
- Next message → Recent history included in context
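The sliding-window behavior above can be sketched with a plain `collections.deque`. This is a simplified stand-in for the library's buffer, not its actual implementation:

```python
from collections import deque


class SlidingBuffer:
    """Minimal sliding-window buffer: keeps only the last max_messages."""

    def __init__(self, max_messages: int = 10):
        # A deque with maxlen drops the oldest entry automatically
        self.buffer = deque(maxlen=max_messages)

    def add(self, role: str, content: str) -> None:
        self.buffer.append({"role": role, "content": content})

    def context(self) -> list[dict]:
        """Recent history to include in the next LLM call."""
        return list(self.buffer)


memory = SlidingBuffer(max_messages=4)
for i in range(6):
    memory.add("user", f"message {i}")

# Only the 4 most recent messages survive
print([m["content"] for m in memory.context()])
# → ['message 2', 'message 3', 'message 4', 'message 5']
```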
Memory Flow¶
```mermaid
graph LR
    A[User Message] --> B[Add to Buffer]
    B --> C[Get Recent Context]
    C --> D[LLM with Context]
    D --> E[Bot Response]
    E --> B
```
Complete Code¶
"""Chatbot with memory example.
This example demonstrates:
- Buffer memory configuration
- Context retention across messages
- Memory limits and management
- How the bot remembers previous conversation
Required Ollama model:
ollama pull gemma3:1b
"""
import asyncio
from dataknobs_bots import BotContext, DynaBot
async def main():
"""Run a chatbot with memory."""
print("=" * 60)
print("Chatbot with Memory Example")
print("=" * 60)
print()
print("This example shows a chatbot that remembers context.")
print("Required: ollama pull gemma3:1b")
print()
# Configuration with buffer memory
config = {
"llm": {
"provider": "ollama",
"model": "gemma3:1b",
"temperature": 0.7,
"max_tokens": 500,
},
"conversation_storage": {
"backend": "memory",
},
"memory": {
"type": "buffer",
"max_messages": 10, # Remember last 10 messages
},
"prompts": {
"helpful_assistant": "You are a helpful AI assistant with excellent memory. "
"You remember details from earlier in the conversation and can reference them."
},
"system_prompt": {
"name": "helpful_assistant",
},
}
print("Creating bot with buffer memory...")
bot = await DynaBot.from_config(config)
print("✓ Bot created successfully")
print(f"✓ Memory: Buffer (max {config['memory']['max_messages']} messages)")
print()
# Create context for this conversation
context = BotContext(
conversation_id="memory-chat-001",
client_id="example-client",
user_id="demo-user",
)
# Conversation demonstrating memory
messages = [
"Hello! My name is Alice and I love reading science fiction.",
"What's your favorite sci-fi book?",
"Do you remember my name?",
"What did I tell you I love to read?",
"Can you recommend a sci-fi book for me based on what you know about my interests?",
]
for i, user_message in enumerate(messages, 1):
print(f"[{i}] User: {user_message}")
response = await bot.chat(
message=user_message,
context=context,
)
print(f"[{i}] Bot: {response}")
print()
# Add a small delay between messages
if i < len(messages):
await asyncio.sleep(1)
print("=" * 60)
print("Conversation complete!")
print()
print("Memory demonstration:")
print("- The bot remembered the user's name (Alice)")
print("- The bot remembered the user's interest (science fiction)")
print("- The bot used this context to make relevant recommendations")
print()
print(f"Memory buffer stores last {config['memory']['max_messages']} messages")
if __name__ == "__main__":
asyncio.run(main())
Running the Example¶
Expected Output¶
The bot now remembers previous messages:
```text
User: My name is Alice.
Bot: Hello Alice! Nice to meet you.

User: What's my name?
Bot: Your name is Alice.

User: What did I just tell you?
Bot: You told me your name is Alice.
```
Memory Types¶
Buffer Memory¶
Simple sliding window (used in this example):
**Pros:** Fast, simple, predictable

**Cons:** Limited context; doesn't prioritize important information
Summary Memory¶
LLM-based compression of older messages:
**Pros:** Very long effective context; preserves key points

**Cons:** Loses exact wording of old messages
Uses the bot's LLM by default. For a dedicated summarization model:
"memory": {
"type": "summary",
"recent_window": 10,
"llm": {
"provider": "ollama",
"model": "gemma3:1b" # Lightweight model for summaries
}
}
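The compression idea can be illustrated with a short sketch. The `summarize` function below is a hypothetical stand-in for the LLM call a real summary memory would make; only the windowing logic is the point:

```python
def summarize(messages: list[str]) -> str:
    # Stand-in for an LLM call; a real summary memory would prompt the model here
    return f"[summary of {len(messages)} earlier messages]"


def build_context(history: list[str], recent_window: int = 10) -> list[str]:
    """Keep the last recent_window messages verbatim; compress the rest."""
    if len(history) <= recent_window:
        return list(history)
    older, recent = history[:-recent_window], history[-recent_window:]
    return [summarize(older)] + recent


history = [f"msg {i}" for i in range(25)]
ctx = build_context(history, recent_window=10)
print(ctx[0])    # → [summary of 15 earlier messages]
print(len(ctx))  # → 11
```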
Vector Memory¶
Semantic search over conversation history:
"memory": {
"type": "vector",
"max_messages": 100,
"top_k": 5, # Retrieve 5 most relevant messages
"embedding_provider": "ollama",
"embedding_model": "nomic-embed-text"
}
**Pros:** Finds relevant messages regardless of recency

**Cons:** Slower; requires an embedding model
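The retrieval step amounts to ranking stored messages by embedding similarity and keeping the `top_k`. A minimal sketch with toy 2-d vectors standing in for real `nomic-embed-text` embeddings (not the library's implementation):

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_top_k(query_vec: list[float], stored: list[dict], k: int = 2) -> list[str]:
    """Return the k stored messages most similar to the query embedding."""
    ranked = sorted(stored, key=lambda m: cosine(query_vec, m["vec"]), reverse=True)
    return [m["text"] for m in ranked[:k]]


# Toy "embeddings"; a real vector memory would embed each message on write
stored = [
    {"text": "I love sci-fi books", "vec": [1.0, 0.1]},
    {"text": "The weather is nice", "vec": [0.0, 1.0]},
    {"text": "Recommend a sci-fi novel", "vec": [0.9, 0.2]},
]

print(retrieve_top_k([1.0, 0.0], stored, k=2))
# → ['I love sci-fi books', 'Recommend a sci-fi novel']
```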
Tenant scoping: Use default_metadata and default_filter for multi-tenant isolation:
"memory": {
"type": "vector",
"backend": "pgvector",
"dimension": 768,
"embedding_provider": "ollama",
"embedding_model": "nomic-embed-text",
"default_metadata": {"user_id": "u123"}, # Tagged on writes
"default_filter": {"user_id": "u123"}, # Scoped on reads
}
Composite Memory¶
Combine multiple strategies for best-of-both-worlds context:
"memory": {
"type": "composite",
"primary": 0, # Index of primary strategy
"strategies": [
{
"type": "summary",
"recent_window": 10
},
{
"type": "vector",
"backend": "memory",
"dimension": 384,
"embedding_provider": "ollama",
"embedding_model": "nomic-embed-text"
}
]
}
All strategies receive every message. On read, primary results appear first, then deduplicated secondary results. If any strategy fails, the composite continues with the remaining ones.
**Pros:** Combines recent-context awareness with semantic recall

**Cons:** Uses more resources (multiple stores, possible embedding calls)
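The read-path merge (primary results first, then deduplicated secondary results) can be sketched as a simple ordered union. This illustrates the behavior described above, not the library's actual code:

```python
def merge_results(primary: list[str], secondary: list[str]) -> list[str]:
    """Primary results first, then secondary results not already present."""
    seen = set(primary)
    merged = list(primary)
    for msg in secondary:
        if msg not in seen:
            seen.add(msg)
            merged.append(msg)
    return merged


recent = ["msg 8", "msg 9", "msg 10"]   # e.g. from the summary strategy
semantic = ["msg 2", "msg 9", "msg 5"]  # e.g. from the vector strategy

print(merge_results(recent, semantic))
# → ['msg 8', 'msg 9', 'msg 10', 'msg 2', 'msg 5']
```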
Choosing max_messages¶
| max_messages | Use Case | Token Usage |
|---|---|---|
| 5-10 | Short conversations | Low |
| 10-20 | Standard conversations | Medium |
| 20-50 | Long conversations | High |
| 50+ | Document-length conversations | Very High |
Recommendation: Start with 10-20 for most use cases.
Key Takeaways¶
- ✅ Context Awareness - Bot remembers conversation history
- ✅ Easy Configuration - Just add a `memory` section
- ✅ Sliding Window - Automatic management of context size
- ✅ Token Efficiency - Only recent messages included
Customization¶
Longer Memory¶
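For longer memory, the buffer configuration presumably just needs a larger `max_messages` (a sketch based on the buffer options shown earlier; mind the token cost):

```python
"memory": {
    "type": "buffer",
    "max_messages": 50  # Keep more history at the cost of more tokens per request
}
```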
Summary Memory¶
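A minimal summary-memory configuration, mirroring the options shown in the Memory Types section above:

```python
"memory": {
    "type": "summary",
    "recent_window": 10  # Messages kept verbatim; older ones are summarized
}
```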
Semantic Memory¶
"memory": {
"type": "vector",
"max_messages": 100,
"embedding_provider": "ollama",
"embedding_model": "nomic-embed-text"
}
Composite Memory (Summary + Vector)¶
"memory": {
"type": "composite",
"strategies": [
{"type": "summary", "recent_window": 10},
{
"type": "vector",
"backend": "memory",
"dimension": 384,
"embedding_provider": "ollama",
"embedding_model": "nomic-embed-text"
}
]
}
What's Next?¶
To add knowledge retrieval, see the RAG Chatbot Example.
Related Examples¶
- Simple Chatbot - Basic bot without memory
- RAG Chatbot - Add knowledge base
- Multi-Tenant Bot - Multiple clients