Cloud and Production

This page covers moving from a single machine to managed infrastructure. The application services stay the same; the data stores become managed or clustered, and security is hardened.

Topology

flowchart TB
    lb["Ingress / Load balancer<br/>TLS"]
    subgraph k8s["Kubernetes cluster"]
        api["AI Engine API<br/>(Deployment, N replicas)"]
        ce["Content Engine<br/>(Deployment or CronJob)"]
    end
    subgraph managed["Managed data services"]
        qd[("Qdrant Cloud")]
        rd[("Managed Redis")]
    end
    rs["RudderStack<br/>(event source)"]
    llm["LLM endpoint"]

    lb --> api
    api --> qd & rd & llm
    ce --> qd
    rs --> api

    classDef store fill:#EFEAE0,stroke:#A8895B,color:#423D34;
    class qd,rd store;

1. Data stores: use managed services

Store	Managed option	Notes
Qdrant	Qdrant Cloud	set `QDRANT_API_URL` to the cluster URL and `QDRANT_API_KEY` to its API key
Redis	any managed Redis	set `REDIS_URL`; enable persistence for the user model

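Keep COLLECTION_NAME and EMBEDDING_MODEL identical across the API and the Content Engine, so that the API queries the same collection, in the same embedding space, that the Content Engine writes to.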
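A mismatch between the two services may not surface as an obvious error, so it can help to compare the settings before a rollout. The check below is a minimal sketch of such a guard, assuming each service's settings are available as a plain mapping (for example, parsed from its ConfigMap). The function name `find_setting_mismatches` and the sample values are illustrative and not part of either service.

```python
# Settings that must match between the AI Engine API and the Content Engine.
SHARED_KEYS = ("COLLECTION_NAME", "EMBEDDING_MODEL")


def find_setting_mismatches(api_env, content_engine_env, keys=SHARED_KEYS):
    """Return {key: (api_value, content_engine_value)} for each shared key that differs."""
    mismatches = {}
    for key in keys:
        api_value = api_env.get(key)
        ce_value = content_engine_env.get(key)
        # A key that is missing on either side counts as a mismatch too.
        if api_value is None or api_value != ce_value:
            mismatches[key] = (api_value, ce_value)
    return mismatches


# Hypothetical values, for illustration only.
api_env = {"COLLECTION_NAME": "items", "EMBEDDING_MODEL": "model-a"}
content_engine_env = {"COLLECTION_NAME": "items", "EMBEDDING_MODEL": "model-b"}

mismatches = find_setting_mismatches(api_env, content_engine_env)
for key, (api_value, ce_value) in mismatches.items():
    print(f"{key}: API={api_value!r} Content Engine={ce_value!r}")
```

An empty result means the two configurations agree on every shared key.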

2. Build and push images

The API ships a Dockerfile (python:3.12-slim, Poetry, uvicorn on 8000). Build and push both services to your registry:

docker build -t registry.example.com/memorise/ai-engine-api:1.0 ai-engine-api
docker push  registry.example.com/memorise/ai-engine-api:1.0
# content-engine: add a Dockerfile that installs '.[api,qdrant,embed]' and runs `content-engine`

3. Run on Kubernetes

Deploy each service as a Deployment with a Service and an Ingress. Inject configuration from a Secret and a ConfigMap rather than baking a .env file into the image:

apiVersion: apps/v1
kind: Deployment
metadata: { name: ai-engine-api }
spec:
  replicas: 3
  selector: { matchLabels: { app: ai-engine-api } }
  template:
    metadata: { labels: { app: ai-engine-api } }
    spec:
      containers:
        - name: api
          image: registry.example.com/memorise/ai-engine-api:1.0
          ports: [{ containerPort: 8000 }]
          envFrom:
            - secretRef:  { name: ai-engine-secrets }
            - configMapRef: { name: ai-engine-config }
          readinessProbe:
            httpGet: { path: /, port: 8000 }
          livenessProbe:
            httpGet: { path: /, port: 8000 }

Run the Content Engine as a long-lived Deployment if you use POST /ingest at runtime, or as a CronJob if you only periodically sync from Omeka with POST /sync/omeka.

4. Behavioral capture

Point a RudderStack source at the recsys ingest webhook (POST /api/ingest). Events land in the Redis buffer and are folded into the materialized user model on refresh. No separate database is required.
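To make the buffer-then-fold flow concrete, here is a minimal in-memory sketch. It is not the recsys implementation: a plain list stands in for the Redis buffer, the user model is reduced to per-user interaction counts, and the `item_id` property is a hypothetical field name. The `userId` and `properties` fields follow the usual RudderStack event shape.

```python
from collections import Counter


def fold_events(buffer, model=None):
    """Fold buffered events into a per-user count of item interactions.

    Returns a new model; the input model and buffer are left unchanged.
    """
    folded = {user: Counter(counts) for user, counts in (model or {}).items()}
    for event in buffer:
        user = event.get("userId")
        item = event.get("properties", {}).get("item_id")  # hypothetical field
        if user is None or item is None:
            continue  # skip events that cannot be attributed to a user and item
        folded.setdefault(user, Counter())[item] += 1
    return folded


# A list stands in for the Redis event buffer.
buffer = [
    {"type": "track", "event": "item_viewed", "userId": "u1", "properties": {"item_id": "a"}},
    {"type": "track", "event": "item_viewed", "userId": "u1", "properties": {"item_id": "a"}},
    {"type": "track", "event": "item_viewed", "userId": "u2", "properties": {"item_id": "b"}},
    {"type": "track", "event": "page_loaded"},  # no user or item: ignored
]

model = fold_events(buffer)
print(model)
```

Passing a previous model as the second argument folds new events on top of it, which mirrors the refresh step described above.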

5. Security hardening

Lock down ingest: set INGEST_API_KEY on the Content Engine and send it in the X-API-Key header. Do not expose /ingest publicly.
Tighten CORS: the API default allows all origins, which is a demo posture. Restrict it to your front-end origins.
Secrets: keep keys (QDRANT_API_KEY, OPENROUTER_API_KEY, KEYCLOAK_*) in a secret manager, not in images.
Terminate TLS at the ingress for every public endpoint.
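As a sketch of the ingest lock-down, a client calling the Content Engine might attach the key as shown below. Only INGEST_API_KEY, X-API-Key, and POST /ingest come from this guide; the host name and the payload fields are placeholders, and the key is assumed to travel as an HTTP header.

```python
import json
import os
import urllib.request


def build_ingest_request(base_url, api_key, payload):
    """Build (but do not send) an authenticated POST /ingest request."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/ingest",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
        method="POST",
    )


# Read the key from the environment (injected from the Secret), never from code.
api_key = os.environ.get("INGEST_API_KEY", "example-key")  # fallback for illustration only
request = build_ingest_request(
    "http://content-engine.internal",  # placeholder in-cluster address
    api_key,
    {"title": "Example item"},  # placeholder payload; the real schema is not shown here
)
print(request.get_method(), request.full_url)

# To send it from inside the cluster:
# with urllib.request.urlopen(request, timeout=10) as response:
#     print(response.status)
```

Building the request separately from sending it keeps the header logic easy to check without network access.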

6. Scaling and reliability

The AI Engine API is stateless on the serve path (it reads the materialized user model from Redis), so scale it horizontally behind the load balancer.
Latency targets (requirements WP5-NFR-01/02) call for sub-second suggestions and state updates; co-locate the API, Redis, and Qdrant in the same region.
Redis holds the user model with a TTL; enable persistence, or accept that a cold model is rebuilt from the event buffer on the next refresh.
Take Qdrant snapshots for backup.
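One way to check the sub-second target is to compare a high percentile of measured latencies against a one-second budget. The sketch below assumes you have already collected request durations in seconds (for example, from the API logs). The choice of the 95th percentile and the function names are assumptions, since the requirements quoted here only say "sub-second".

```python
def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list; pct is an integer from 1 to 100."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    # Nearest-rank method: smallest value with at least pct% of samples at or below it.
    rank = (pct * len(ordered) + 99) // 100  # integer form of ceil(pct * n / 100)
    return ordered[rank - 1]


def meets_latency_budget(samples, budget_seconds=1.0, pct=95):
    """True if the chosen percentile of the samples is under the budget."""
    return percentile(samples, pct) < budget_seconds


# Made-up durations in seconds, for illustration only.
durations = [0.12, 0.20, 0.18, 0.35, 0.22, 0.41, 0.16, 0.27, 0.30, 1.40]
print(percentile(durations, 50), meets_latency_budget(durations))
```

With these sample values the median is well under a second, but the single slow request pushes the 95th percentile over the budget, which is the kind of tail that cross-region hops tend to produce.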

7. Observability

The API logs request duration, status, client IP, and user agent through Loguru. Ship these logs to your aggregator.
Health endpoints: use GET / (API) and GET /health (Content Engine) for readiness and liveness probes.