Cloud and Production¶
This page covers moving from a single machine to managed infrastructure. The application services stay the same; the data stores become managed or clustered, and security is hardened.
Topology¶
```mermaid
flowchart TB
    lb["Ingress / Load balancer<br/>TLS"]
    subgraph k8s["Kubernetes cluster"]
        api["AI Engine API<br/>(Deployment, N replicas)"]
        ce["Content Engine<br/>(Deployment or CronJob)"]
    end
    subgraph managed["Managed data services"]
        qd[("Qdrant Cloud")]
        rd[("Managed Redis")]
    end
    rs["RudderStack<br/>(event source)"]
    llm["LLM endpoint"]
    lb --> api
    api --> qd & rd & llm
    ce --> qd
    rs --> api
    classDef store fill:#EFEAE0,stroke:#A8895B,color:#423D34;
    class qd,rd store;
```
1. Data stores: use managed services¶
| Store | Managed option | Notes |
|---|---|---|
| Qdrant | Qdrant Cloud | Set QDRANT_API_URL to the cluster URL and QDRANT_API_KEY to the cluster's API key |
| Redis | Any managed Redis | Set REDIS_URL; enable persistence for the user model |
Keep COLLECTION_NAME and EMBEDDING_MODEL identical across the API and the Content Engine.
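A mismatch between the two services tends to fail quietly: vectors written under one embedding model and queried with another still return results, just poor ones. The following is a minimal sketch of a pre-deploy check, assuming each service's settings are available as plain dictionaries. The function name `find_shared_setting_mismatches` and the sample values are hypothetical and not part of either service.

```python
# Settings that the text above says must match across both services.
SHARED_KEYS = ("COLLECTION_NAME", "EMBEDDING_MODEL")


def find_shared_setting_mismatches(api_env, content_engine_env, keys=SHARED_KEYS):
    """Return {key: (api_value, content_engine_value)} for keys that differ or are missing."""
    mismatches = {}
    for key in keys:
        api_value = api_env.get(key)
        ce_value = content_engine_env.get(key)
        # A key missing on either side counts as a mismatch.
        if api_value is None or api_value != ce_value:
            mismatches[key] = (api_value, ce_value)
    return mismatches


# Hypothetical values, for illustration only.
api_env = {"COLLECTION_NAME": "items", "EMBEDDING_MODEL": "model-a"}
content_engine_env = {"COLLECTION_NAME": "items", "EMBEDDING_MODEL": "model-b"}
mismatches = find_shared_setting_mismatches(api_env, content_engine_env)
print(mismatches)  # {'EMBEDDING_MODEL': ('model-a', 'model-b')}
```

A check like this could run in CI against the rendered ConfigMaps before a rollout, failing the pipeline when the returned dictionary is non-empty.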
2. Build and push images¶
The API ships a Dockerfile (python:3.12-slim, Poetry, uvicorn on 8000). Build and push both
services to your registry:
```shell
docker build -t registry.example.com/memorise/ai-engine-api:1.0 ai-engine-api
docker push registry.example.com/memorise/ai-engine-api:1.0
# content-engine: add a Dockerfile that installs '.[api,qdrant,embed]' and runs `content-engine`
```
3. Run on Kubernetes¶
Deploy each service as a Deployment with a Service and an Ingress. Inject configuration from a
Secret and ConfigMap rather than a baked .env:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata: { name: ai-engine-api }
spec:
  replicas: 3
  selector: { matchLabels: { app: ai-engine-api } }
  template:
    metadata: { labels: { app: ai-engine-api } }
    spec:
      containers:
        - name: api
          image: registry.example.com/memorise/ai-engine-api:1.0
          ports: [{ containerPort: 8000 }]
          envFrom:
            - secretRef: { name: ai-engine-secrets }
            - configMapRef: { name: ai-engine-config }
          readinessProbe:
            httpGet: { path: /, port: 8000 }
          livenessProbe:
            httpGet: { path: /, port: 8000 }
```
Run the Content Engine as a long-lived Deployment if you use POST /ingest at runtime, or
as a CronJob if you only periodically sync from Omeka with POST /sync/omeka.
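For the periodic-sync case, the scheduled job needs something that issues the POST /sync/omeka call. The sketch below only builds that request with the standard library. It assumes the Content Engine is reachable over HTTP when the job fires, that the endpoint accepts an empty body, and that it honours the same X-API-Key header as ingest. None of these is confirmed here, and the host name in the example is hypothetical.

```python
import urllib.request


def build_sync_request(base_url, api_key=None):
    """Build (but do not send) a POST request for the /sync/omeka endpoint."""
    url = base_url.rstrip("/") + "/sync/omeka"
    headers = {}
    if api_key:
        # Assumption: sync is protected by the same header as ingest.
        headers["X-API-Key"] = api_key
    # Assumption: the endpoint needs no request body.
    return urllib.request.Request(url, data=b"", headers=headers, method="POST")


# Hypothetical in-cluster address, for illustration only.
request = build_sync_request("http://content-engine:8000/", api_key="example-key")
print(request.get_method(), request.full_url)
```

Sending it is then a single `urllib.request.urlopen(request, timeout=...)` call inside the job, with a non-2xx response surfacing as an exception so the job is marked as failed.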
4. Behavioral capture¶
Point a RudderStack source at the recsys ingest webhook (POST /api/ingest). Events land
in the Redis buffer and are folded into the materialized user model on refresh. No separate
database is required.
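To make "folded into the materialized user model" concrete, here is a small in-memory sketch of the idea. It is not the recsys implementation: the event fields `user_id` and `item_id`, the count-based model, and the function name `fold_events` are all assumptions chosen for illustration, and a plain list stands in for the Redis buffer.

```python
from collections import Counter, defaultdict


def fold_events(events, user_models=None):
    """Fold buffered events into per-user interaction counts.

    Each event is assumed to be a dict with "user_id" and "item_id" keys.
    """
    models = defaultdict(Counter)
    # Start from the existing model, if any, so a refresh is incremental.
    if user_models:
        for user_id, counts in user_models.items():
            models[user_id].update(counts)
    for event in events:
        models[event["user_id"]][event["item_id"]] += 1
    return {user_id: dict(counts) for user_id, counts in models.items()}


# A list standing in for the event buffer.
buffer = [
    {"user_id": "u1", "item_id": "a"},
    {"user_id": "u1", "item_id": "a"},
    {"user_id": "u2", "item_id": "b"},
]
model = fold_events(buffer)
print(model)  # {'u1': {'a': 2}, 'u2': {'b': 1}}
```

Because the fold takes the previous model as input, the same function covers both an incremental refresh and a full rebuild from an empty model.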
5. Security hardening¶
- Lock down ingest: set INGEST_API_KEY on the Content Engine and send it as X-API-Key. Do not expose /ingest publicly.
- Tighten CORS: the API default allows all origins, which is a demo posture. Restrict it to your front-end origins.
- Secrets: keep keys (QDRANT_API_KEY, OPENROUTER_API_KEY, KEYCLOAK_*) in a secret manager, not in images.
- TLS at the ingress for every public endpoint.
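The first two items come down to two small checks: compare the presented key against the configured one, and match the request origin against an explicit allow-list. The sketch below shows both with the standard library only. The function names and example origins are hypothetical, and how the checks are wired into the services' request handling is left open.

```python
import hmac


def is_valid_api_key(presented, expected):
    """Constant-time comparison; an unset expected key rejects every request."""
    if not expected or presented is None:
        return False
    return hmac.compare_digest(presented.encode(), expected.encode())


def parse_allowed_origins(raw):
    """Split a comma-separated origin list into a normalized set."""
    return {origin.strip().rstrip("/") for origin in raw.split(",") if origin.strip()}


def is_allowed_origin(origin, allowed):
    """Exact-match check with no wildcard support."""
    return origin is not None and origin.rstrip("/") in allowed


# Hypothetical front-end origins, for illustration only.
allowed = parse_allowed_origins("https://app.example.com, https://admin.example.com/")
print(is_allowed_origin("https://app.example.com", allowed))   # True
print(is_allowed_origin("https://evil.example.net", allowed))  # False
print(is_valid_api_key("s3cret", "s3cret"))                    # True
```

Rejecting every request when the expected key is unset is a deliberate choice in this sketch: a missing secret then fails closed instead of silently disabling authentication.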
6. Scaling and reliability¶
- AI Engine API is stateless on the serve path (it reads the materialized user model from Redis), so scale it horizontally behind the load balancer.
- Latency targets (requirements WP5-NFR-01/02) call for sub-second suggestions and state updates; co-locate the API, Redis, and Qdrant in the same region.
- Redis holds the user model with a TTL; enable persistence, or accept that a cold model is rebuilt from the event buffer on the next refresh.
- Take regular Qdrant snapshots for backup.
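The sub-second latency target above can be checked against measured request durations, such as those the API already logs. The sketch below is one way to do that. The 95th percentile and the 1.0 s threshold are assumptions made for this example, since the exact wording of WP5-NFR-01/02 is not reproduced here.

```python
import statistics


def p95(durations_s):
    """95th percentile of request durations in seconds (needs at least two samples)."""
    # quantiles(n=100) returns 99 cut points; index 94 is the 95th percentile.
    return statistics.quantiles(durations_s, n=100, method="inclusive")[94]


def meets_sub_second_target(durations_s, target_s=1.0):
    """True when the 95th percentile is below the target."""
    return p95(durations_s) < target_s


# Hypothetical samples: mostly fast requests with a slow tail.
fast_with_slow_tail = [0.2] * 80 + [1.5] * 20
print(meets_sub_second_target([0.2] * 100))         # True
print(meets_sub_second_target(fast_with_slow_tail))  # False
```

A percentile is used instead of the mean because a slow tail, for example cross-region round trips to Redis or Qdrant, can hide behind a healthy average.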
7. Observability¶
- The API logs request duration, status, client IP, and user agent (Loguru). Ship logs to your aggregator.
- Health endpoints: GET / (API) and GET /health (Content Engine) for readiness and liveness probes.
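Outside the cluster's own probes, the same endpoints are handy for a post-deploy smoke check. The following is a minimal sketch using only the standard library. The function name `is_healthy` and the service URLs in the usage note are hypothetical.

```python
import urllib.error
import urllib.request


def is_healthy(url, timeout=2.0):
    """Return True when the endpoint answers with a 2xx status within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # Covers HTTP error statuses, refused connections, DNS failures, and timeouts.
        return False
```

For example, `is_healthy("https://api.example.com/")` and `is_healthy("http://content-engine:8000/health")` would cover the two services, with both addresses standing in for whatever your deployment exposes.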