Deployment Overview
Everything you need to run the MEMORISE AI stack, on your own machine or on cloud infrastructure. This section covers the active components, their configuration, and step-by-step guides for local and production deployment.
What you deploy
flowchart TB
app["In-memorial app"]
rs["RudderStack<br/>(event source)"]
subgraph apps["Application services (you run these)"]
api["AI Engine API<br/>FastAPI · :8000"]
ce["Content Engine<br/>FastAPI · :8002"]
end
subgraph data["Data stores"]
qd[("Qdrant<br/>:6333")]
rd[("Redis<br/>:6379")]
end
llm["Ollama / OpenRouter<br/>(LLM)"]
omeka["Omeka CMS<br/>(content source)"]
app -->|events| rs -->|webhook| api
app -->|search / recommend| api
api --> qd & rd & llm
ce --> qd
ce --> omeka
classDef store fill:#EFEAE0,stroke:#A8895B,color:#423D34;
class qd,rd store;
Active components and ports
The stack runs on a Kubernetes cluster. Each component we operate has its own walkthrough, covering local Docker for development and Kubernetes manifests. Keycloak, the LLM endpoint, and Omeka are run by other teams; you only set their URLs and keys in Configuration.
| Component | Role | Port | Deploy guide |
|---|---|---|---|
| AI Engine API | serves the REST API (search, recommend, narrative) and the ingest webhook | 8000 | Walkthrough |
| Content Engine | ingests content into Qdrant | 8002 | Walkthrough |
| Qdrant | vector + tag store | 6333, 6334 | Walkthrough |
| Redis | event buffer + materialized user model | 6379 | Walkthrough |
| RudderStack | behavioral event source into POST /api/ingest | n/a | Walkthrough |
| Ollama / OpenRouter | LLM for narratives | 11434 / API | run by another team; set its URL/key |
| Keycloak | auth for the LLM gateway | 8080 | run by another team |
| Omeka | content source of truth | n/a | already deployed by the content team |
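The ports in the table above can be probed with a small script once the services are started. This is a minimal sketch, not part of the MEMORISE codebase: the port numbers come from the table, while the helper names `is_port_open` and `check_stack` and the default host `localhost` are assumptions for a single-machine setup.

```python
from __future__ import annotations

import socket

# Ports copied from the component table; adjust if your deployment remaps them.
DEFAULT_PORTS = {
    "AI Engine API": 8000,
    "Content Engine": 8002,
    "Qdrant": 6333,
    "Redis": 6379,
}


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False


def check_stack(host: str = "localhost", ports: dict[str, int] | None = None) -> dict[str, bool]:
    """Map each component name to whether its port accepts TCP connections."""
    ports = DEFAULT_PORTS if ports is None else ports
    return {name: is_port_open(host, port) for name, port in ports.items()}
```

Calling `check_stack()` returns a dictionary such as `{"Qdrant": True, "Redis": False, ...}`. An open port only shows that something is listening; it does not confirm that the service behind it is healthy.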
User and event capture
Visitor behavior flows through the recsys path: RudderStack sends events to POST /api/ingest, which writes them to the Redis event buffer, and from there they feed the materialized user model. There is no separate relational database in the active stack.
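For a smoke test of the ingest path without RudderStack, a request to POST /api/ingest can be built by hand. The sketch below is an illustration under assumptions: the payload follows the general shape of a RudderStack `track` event (`type`, `event`, `userId`, `properties`, `timestamp`), but the exact schema the AI Engine API accepts is not specified here, and the helper name `build_ingest_request` and the event name `item_viewed` are hypothetical.

```python
import json
import urllib.request
from datetime import datetime, timezone


def build_ingest_request(base_url: str, user_id: str, event: str, properties: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request for the ingest webhook."""
    # Assumed payload shape, modelled on a RudderStack "track" event.
    payload = {
        "type": "track",
        "event": event,
        "userId": user_id,
        "properties": properties,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/ingest",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical example event; send it with urllib.request.urlopen(request).
request = build_ingest_request(
    "http://localhost:8000", "visitor-1", "item_viewed", {"item_id": "42"}
)
```

Building the request separately from sending it keeps the payload easy to inspect before anything reaches the API.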
Minimum vs full stack
- Minimum to serve recommendations: AI Engine API, Qdrant, Redis, plus a populated Qdrant collection (run the Content Engine once).
- Add narratives: an LLM endpoint (Ollama or OpenRouter), and Keycloak if you use the MEMORISE LLM gateway.
- Add behavioral capture: a RudderStack source pointed at POST /api/ingest.
The recsys core degrades gracefully: if REDIS_URL or QDRANT_API_URL is unset, it falls back to in-memory fakes, so the API boots even before the infrastructure exists (useful for smoke tests).
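The fallback described above can be pictured as a lookup on the two environment variables. This is a minimal sketch of the idea, not the recsys core's actual code: only the names `REDIS_URL` and `QDRANT_API_URL` come from this page, the function `choose_backends` is hypothetical, and it assumes each store falls back independently of the other.

```python
import os
from typing import Mapping


def choose_backends(env: Mapping[str, str] = os.environ) -> dict:
    """Pick a real or in-memory backend per store, based on which variables are set."""
    # An unset or empty variable selects the in-memory fake for that store.
    return {
        "event_buffer": "redis" if env.get("REDIS_URL") else "in-memory",
        "vector_store": "qdrant" if env.get("QDRANT_API_URL") else "in-memory",
    }
```

With no variables set, `choose_backends({})` selects the in-memory fake for both stores, which is the smoke-test case. Passing only `REDIS_URL` switches the event buffer to Redis and leaves the vector store in memory.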
Prerequisites
- Docker and Docker Compose (local), or a Kubernetes cluster (cloud).
- Python 3.12+ if running services outside containers.
- Network access to your chosen LLM endpoint.
Pick a path
- Configuration: every environment variable, grouped by service.
- Local with Docker Compose: one machine, from zero to a working API.
- Cloud and production: managed data stores, Kubernetes, security, and scaling.