  • pstryder 4 hours ago
    # Show HN: Cathedral – Self-hosted AI chat with persistent memory, tools, and RAG

    I wanted an AI setup where nothing gets lost between sessions and I control everything locally.

    Cathedral is a self-hosted chat interface with automatic context injection from a persistent knowledge store. Relevant memories and documents are retrieved and injected into the prompt before the LLM sees your message — the agent doesn't need to "decide" to search memory. It just has the context. No tool calls required for recall.
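
    Roughly, the injection step looks like this (a simplified sketch, not the actual code; `store` and the field names are illustrative):

    ```python
    def build_prompt(user_message: str, store, top_k: int = 5) -> list[dict]:
        """Retrieve relevant memories/documents and inject them before the LLM call."""
        # Embed the incoming message and query the vector store (pgvector or FAISS).
        hits = store.search(user_message, top_k=top_k)
        context = "\n".join(f"- {h.text} (confidence {h.confidence:.2f})" for h in hits)
        # The model gets the context as part of the prompt; no tool call needed for recall.
        return [
            {"role": "system", "content": f"Relevant memory and documents:\n{context}"},
            {"role": "user", "content": user_message},
        ]
    ```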

    The memory system extracts knowledge from conversations automatically — observations with confidence scores, concepts with relationships, synthesized patterns. High-signal memories stay hot, noise fades to cold storage. Nothing is overwritten, only superseded with lineage intact.
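
    To make "superseded, not overwritten" concrete, here's the general shape of a memory record (illustrative field names, not the real schema):

    ```python
    from dataclasses import dataclass, field
    from datetime import datetime, timezone

    @dataclass
    class Memory:
        id: int
        text: str                 # extracted observation, concept, or synthesized pattern
        confidence: float         # extraction confidence score, 0.0 to 1.0
        tier: str = "hot"         # high-signal stays "hot"; low-signal fades to "cold"
        superseded_by: int | None = None  # newer record's id; the old row is never deleted
        created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

    def supersede(old: Memory, new: Memory) -> None:
        """Update by linking, not overwriting, so lineage stays intact and queryable."""
        old.superseded_by = new.id
    ```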

    *What's in the box:*

    - Persistent semantic memory across sessions (pgvector or FAISS)
    - Document library with automatic RAG injection
    - 31 provider-agnostic tools across 6 subsystems (memory, files, shell, web, documents, sub-agents)
    - Tool calling via a JSON-in-text protocol: works with any LLM, no function-calling API required (see the sketch after this list)
    - Per-thread personalities with separate system prompts
    - Multi-modal: vision, image analysis, audio transcription
    - Shell execution and file management with security controls
    - Web search and page fetching (DuckDuckGo, Brave, SearXNG)
    - Sub-agent spawning for background tasks
    - AES-256-GCM encryption, session locking, path validation
    - 60+ slash commands
    - Full web UI: Jinja2 + vanilla JS + Tailwind, no framework dependencies
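
    The JSON-in-text tool protocol is easiest to show as a parser sketch. The `TOOL_CALL:` marker below is made up for illustration (the real markers differ), but the idea is the same: the host parses plain model output, so any LLM can drive the tools.

    ```python
    import json

    # Hypothetical wire format: the model emits lines like
    #   TOOL_CALL: {"tool": "web_search", "args": {"query": "pgvector vs faiss"}}
    def extract_tool_calls(reply: str) -> list[dict]:
        calls = []
        for line in reply.splitlines():
            line = line.strip()
            if not line.startswith("TOOL_CALL:"):
                continue
            try:
                obj = json.loads(line[len("TOOL_CALL:"):])
            except json.JSONDecodeError:
                continue  # malformed JSON is treated as ordinary prose
            if isinstance(obj, dict) and "tool" in obj:
                calls.append(obj)
        return calls
    ```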

    *Works with any LLM backend* — OpenRouter (40+ models), local GGUF models via llama.cpp, Claude CLI, or your own gateway. The memory layer sits above your choice of provider.
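
    That seam is just a small interface; sketched here with placeholder names:

    ```python
    from typing import Protocol

    class LLMBackend(Protocol):
        """Anything that turns messages into text can sit under the memory layer."""
        def complete(self, messages: list[dict]) -> str: ...

    class OpenRouterBackend:
        def __init__(self, api_key: str, model: str):
            self.api_key, self.model = api_key, model

        def complete(self, messages: list[dict]) -> str:
            ...  # POST to OpenRouter's chat completions endpoint

    # Memory injection and tool parsing happen above this seam, so swapping in
    # llama.cpp or a CLI wrapper doesn't touch the rest of the system.
    ```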

    *Two database options:* PostgreSQL + pgvector for production, or SQLite + FAISS for zero-setup local use.
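
    Selection can hinge on a single connection string (hypothetical config keys, shown for flavor):

    ```python
    import os

    DATABASE_URL = os.getenv("DATABASE_URL", "sqlite+aiosqlite:///cathedral.db")

    if DATABASE_URL.startswith("postgresql"):
        VECTOR_BACKEND = "pgvector"  # embeddings live in the same Postgres database
    else:
        VECTOR_BACKEND = "faiss"     # local index file next to SQLite, zero setup
    ```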

    *Stack:* Python/FastAPI, SQLAlchemy 2.0 async, SSE streaming, vanilla JS frontend. No npm. No React. No build step.
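
    The SSE side is the standard FastAPI pattern, roughly:

    ```python
    import asyncio
    from fastapi import FastAPI
    from fastapi.responses import StreamingResponse

    app = FastAPI()

    async def token_stream():
        # Stand-in for tokens streaming back from the LLM backend.
        for token in ["Hello", ", ", "world"]:
            yield f"data: {token}\n\n"  # one SSE frame per token
            await asyncio.sleep(0)

    @app.get("/chat/stream")
    async def chat_stream():
        return StreamingResponse(token_stream(), media_type="text/event-stream")
    ```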

    *Designed for local/private deployment.* Cathedral gives you shell access, file operations, and tool execution — it's your workstation, not a public service. Run it on your machine or behind a VPN.

    I built this because I run multiple AI agents and got tired of every session starting from zero. Cloud platforms own your context and lose it on their schedule. This keeps everything local, persistent, and under your control.

    MIT licensed. Solo developer.

    GitHub: https://github.com/PStryder/Cathedral