RAG Chatbot Example¶
A chatbot with knowledge base integration using Retrieval Augmented Generation (RAG).
Overview¶
This example demonstrates:
- Loading markdown documents into a knowledge base
- Vector search for relevant information
- Automatic context injection into prompts
- RAG-enhanced responses
Prerequisites¶
```bash
# Install Ollama: https://ollama.ai/

# Pull required models
ollama pull gemma3:1b         # For chat
ollama pull nomic-embed-text  # For embeddings

# Install dataknobs-bots with FAISS
pip install dataknobs-bots[faiss]
```
What is RAG?¶
Retrieval Augmented Generation enhances LLM responses by:
- Retrieval - Finding relevant documents from a knowledge base
- Augmentation - Adding retrieved context to the prompt
- Generation - LLM generates response with context
RAG Flow¶
```mermaid
graph LR
    A[User Question] --> B[Embed Question]
    B --> C[Vector Search]
    C --> D[Retrieve Top K Docs]
    D --> E[Inject into Prompt]
    E --> F[LLM Response]
```
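In code, the flow looks roughly like the sketch below. This is conceptual only: `rag_answer`, `llm.generate`, and `doc.text` are illustrative names, not part of the dataknobs-bots API (the bot performs these steps internally when a knowledge base is configured).

```python
# Conceptual RAG flow. Illustrative names, not the dataknobs-bots API.
async def rag_answer(question: str, kb, llm, top_k: int = 3) -> str:
    # Embed the question and run a vector search (see Query Parameters below)
    docs = await kb.query(query=question, k=top_k)
    # Inject the retrieved chunks into the prompt
    context = "\n\n".join(doc.text for doc in docs)
    prompt = f"Context:\n{context}\n\nQuestion: {question}"
    # Generate a response grounded in the retrieved context
    return await llm.generate(prompt)
```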
Configuration¶
Add a knowledge_base section:
```python
config = {
    "llm": {
        "provider": "ollama",
        "model": "gemma3:1b"
    },
    "conversation_storage": {
        "backend": "memory"
    },
    "knowledge_base": {
        "enabled": True,
        "documents_path": "./my_docs",  # Directory with markdown files
        "vector_store": {
            "backend": "faiss",  # FAISS vector database
            "dimension": 384     # Embedding dimension
        },
        "embedding_provider": "ollama",
        "embedding_model": "nomic-embed-text",
        "chunking": {
            "max_chunk_size": 500  # Maximum chunk size
        }
    }
}
```
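With the `knowledge_base` section in place, the bot is created the same way as in the other examples; document loading and retrieval happen behind the scenes:

```python
bot = await DynaBot.from_config(config)
```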
Complete Code¶
"""RAG (Retrieval Augmented Generation) chatbot example.
This example demonstrates:
- Knowledge base integration
- Document chunking and indexing
- Vector search and retrieval
- Context-aware responses using RAG
- Swapping storage backends (memory vs postgres)
Required Ollama models:
ollama pull gemma3:1b # For chat
ollama pull nomic-embed-text # For embeddings
Usage:
# With in-memory storage (default)
python examples/03_rag_chatbot.py
# With PostgreSQL storage
STORAGE_BACKEND=postgres python examples/03_rag_chatbot.py
"""
import asyncio
import os
from dataknobs_bots import BotContext, DynaBot
# Sample documents for the knowledge base
SAMPLE_DOCUMENTS = [
{
"id": "doc1",
"text": """
Python is a high-level, interpreted programming language known for its
simplicity and readability. Created by Guido van Rossum and first released
in 1991, Python emphasizes code readability with its use of significant
indentation. It supports multiple programming paradigms including
procedural, object-oriented, and functional programming.
""",
"metadata": {"source": "python_intro", "category": "programming"},
},
{
"id": "doc2",
"text": """
The DataKnobs ecosystem is a collection of Python packages designed for
building data-intensive applications. It includes modules for database
abstraction, LLM integration, configuration management, and state machines.
The ecosystem prioritizes modularity, type safety, and ease of use.
""",
"metadata": {"source": "dataknobs_intro", "category": "framework"},
},
{
"id": "doc3",
"text": """
Retrieval Augmented Generation (RAG) is a technique that combines information
retrieval with large language models. It works by first retrieving relevant
documents from a knowledge base, then using those documents as context for
generating responses. This approach helps reduce hallucinations and provides
more accurate, grounded answers.
""",
"metadata": {"source": "rag_explained", "category": "ai"},
},
{
"id": "doc4",
"text": """
Vector databases store data as high-dimensional vectors (embeddings) that
represent the semantic meaning of text. They enable similarity search,
allowing you to find documents that are semantically similar to a query
even if they don't share exact keywords. This is fundamental to modern
information retrieval systems.
""",
"metadata": {"source": "vector_db_intro", "category": "database"},
},
]
async def main():
"""Run a RAG chatbot conversation."""
print("=" * 60)
print("RAG Chatbot Example")
print("=" * 60)
print()
print("This example shows a chatbot with knowledge base integration.")
print("Required: ollama pull gemma3:1b nomic-embed-text")
print()
# Determine storage backend from environment
storage_backend = os.getenv("STORAGE_BACKEND", "memory")
print(f"Storage backend: {storage_backend}")
print()
# Base configuration
storage_config = {"backend": storage_backend}
# Add postgres-specific config if needed
if storage_backend == "postgres":
storage_config.update(
{
"host": os.getenv("POSTGRES_HOST", "localhost"),
"port": int(os.getenv("POSTGRES_PORT", "5432")),
"user": os.getenv("POSTGRES_USER", "postgres"),
"password": os.getenv("POSTGRES_PASSWORD", "postgres"),
"database": os.getenv("POSTGRES_DB", "dynabot_examples"),
"table": "rag_conversations",
"schema": "public",
}
)
# Configuration with knowledge base
config = {
"llm": {
"provider": "ollama",
"model": "gemma3:1b",
"temperature": 0.7,
"max_tokens": 500,
},
"conversation_storage": storage_config,
"knowledge_base": {
"enabled": True,
"provider": "vector", # Use vector-based knowledge base
"embedding_model": "nomic-embed-text",
"embedding_provider": "ollama",
"chunk_size": 200,
"top_k": 3, # Retrieve top 3 relevant documents
},
"prompts": {
"rag_assistant": "You are a knowledgeable AI assistant. "
"When answering questions, use the provided knowledge context to give "
"accurate, detailed responses. If the context doesn't contain relevant "
"information, say so honestly."
},
"system_prompt": {
"name": "rag_assistant",
},
}
print("Creating bot with knowledge base...")
bot = await DynaBot.from_config(config)
print("✓ Bot created successfully")
print("✓ Knowledge base enabled (vector-based)")
print()
# Index documents into knowledge base
print("Indexing sample documents...")
for doc in SAMPLE_DOCUMENTS:
await bot.knowledge_base.add_document(
doc_id=doc["id"],
text=doc["text"],
metadata=doc["metadata"],
)
print(f"✓ Indexed {len(SAMPLE_DOCUMENTS)} documents")
print()
# Create context for this conversation
context = BotContext(
conversation_id="rag-chat-001",
client_id="example-client",
user_id="demo-user",
)
# Questions that should be answered using the knowledge base
questions = [
"What is Python and who created it?",
"Can you explain what RAG is and how it works?",
"What is the DataKnobs ecosystem?",
"How do vector databases work?",
"What's the weather like today?", # Not in knowledge base
]
for i, question in enumerate(questions, 1):
print(f"[{i}] User: {question}")
response = await bot.chat(
message=question,
context=context,
)
print(f"[{i}] Bot: {response}")
print()
# Add a small delay between messages
if i < len(questions):
await asyncio.sleep(1)
print("=" * 60)
print("RAG demonstration complete!")
print()
print("Notice how the bot:")
print("- Used knowledge base to answer questions 1-4 accurately")
print("- Admitted when information wasn't in the knowledge base (question 5)")
print("- Retrieved relevant context before generating responses")
print()
print(f"Storage backend used: {storage_backend}")
if storage_backend == "memory":
print("To use PostgreSQL: STORAGE_BACKEND=postgres python examples/03_rag_chatbot.py")
if __name__ == "__main__":
asyncio.run(main())
Running the Example¶
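Run the script directly; the storage backend is selected with the STORAGE_BACKEND environment variable:

```bash
# With in-memory storage (default)
python examples/03_rag_chatbot.py

# With PostgreSQL storage
STORAGE_BACKEND=postgres python examples/03_rag_chatbot.py
```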
How It Works¶
1. Document Preparation¶
Create markdown documents in your `documents_path` directory:
```markdown
# Product Features

Our product offers:

- Fast processing
- Easy integration
- Scalable architecture

# Pricing

- Basic: $10/month
- Pro: $50/month
- Enterprise: Contact sales
```
2. Document Loading¶
The bot automatically:
- Loads all markdown files from `documents_path`
- Chunks documents intelligently (respects headers)
- Creates embeddings for each chunk
- Stores embeddings in the FAISS vector database
3. Query Time¶
When a user asks a question:
- Question is embedded using the same model
- Vector search finds most similar chunks
- Relevant chunks are injected into the prompt
- LLM generates response with context
Expected Output¶
```text
Loading knowledge base...
Loading documents from: ./my_docs
Loaded 3 documents with 24 chunks
✓ Knowledge base ready

User: What are the product features?
Bot: According to our documentation, the product offers:
- Fast processing
- Easy integration
- Scalable architecture

User: How much does the Pro plan cost?
Bot: The Pro plan costs $50/month.
```
Vector Store Backends¶
FAISS (Recommended for Development)¶
- Pros: Fast, local, no external services
- Cons: In-memory only; doesn't persist across restarts
Chroma¶
- Pros: Persists to disk, easy to use
- Cons: Requires a separate package
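A sketch of what the configuration might look like, by analogy with the FAISS and Pinecone examples; the `persist_directory` key is an assumption, so check the package docs for the exact option names:

```python
"vector_store": {
    "backend": "chroma",
    "persist_directory": "./chroma_db"  # Assumed option name
}
```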
Pinecone (Production)¶
"vector_store": {
"backend": "pinecone",
"api_key": "your-api-key",
"environment": "us-west1-gcp",
"index_name": "my-index"
}
- Pros: Managed, scalable, persistent
- Cons: Costs money, requires an API key
Chunking Strategies¶
Default Chunking¶
Good for general content.
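This is the setting used in the configuration above:

```python
"chunking": {
    "max_chunk_size": 500  # Default used in the configuration above
}
```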
Larger Chunks¶
Better for dense technical content.
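For example (the value is illustrative; tune it to your content):

```python
"chunking": {
    "max_chunk_size": 1000  # Illustrative value for dense technical content
}
```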
Smaller Chunks¶
Better for FAQ-style content.
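For example (again illustrative; the complete code example above uses 200):

```python
"chunking": {
    "max_chunk_size": 200  # Illustrative value for short, self-contained FAQ entries
}
```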
Query Parameters¶
Control retrieval behavior:
```python
results = await kb.query(
    query="What are the features?",
    k=5,                 # Top 5 chunks
    min_similarity=0.7,  # Minimum similarity threshold
    filter_metadata={    # Filter by metadata
        "category": "product"
    },
)
```
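The returned results can then be inspected or formatted into a prompt. The attribute names below are illustrative; the actual result shape depends on the knowledge base implementation:

```python
for result in results:
    # Illustrative attributes: chunk text, metadata, and a similarity score
    print(result.text, result.metadata, result.score)
```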
Best Practices¶
Document Organization¶
```text
docs/
├── product/
│   ├── features.md
│   └── pricing.md
├── support/
│   ├── faq.md
│   └── troubleshooting.md
└── api/
    ├── getting-started.md
    └── reference.md
```
Document Format¶
Use clear markdown structure:
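Headers give the chunker natural boundaries (chunking respects headers, as noted above). A minimal illustrative layout:

```markdown
# Topic

Short overview paragraph.

## Subtopic

- Key point one
- Key point two
```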
Metadata¶
Add metadata to documents for filtering:
```python
await kb.load_markdown_document(
    "docs/product.md",
    metadata={"category": "product", "version": "1.0"},
)
```
Key Takeaways¶
- ✅ Grounded Responses - Bot answers from your documents
- ✅ Smart Chunking - Respects document structure
- ✅ Vector Search - Finds relevant content semantically
- ✅ Easy Setup - Just point to document directory
Common Issues¶
Embedding Model Not Found¶
Solution:
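```bash
ollama pull nomic-embed-text
```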
FAISS Not Installed¶
Solution:
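```bash
pip install dataknobs-bots[faiss]
```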
No Documents Found¶
Solution: Ensure the documents directory exists and contains .md files.
What's Next?¶
To add tool use and reasoning, see the ReAct Agent Example.
Related Examples¶
- Memory Chatbot - Add conversation memory
- ReAct Agent - Add tools and reasoning
- Custom Tools - Configuration-driven tools