
Cloud and Production

This page covers the move from a single machine to managed infrastructure. The application services stay the same; the data stores become managed or clustered, and security is hardened.

Topology

flowchart TB
    lb["Ingress / Load balancer<br/>TLS"]
    subgraph k8s["Kubernetes cluster"]
        api["AI Engine API<br/>(Deployment, N replicas)"]
        ce["Content Engine<br/>(Deployment or CronJob)"]
    end
    subgraph managed["Managed data services"]
        qd[("Qdrant Cloud")]
        rd[("Managed Redis")]
    end
    rs["RudderStack<br/>(event source)"]
    llm["LLM endpoint"]

    lb --> api
    api --> qd & rd & llm
    ce --> qd
    rs --> api

    classDef store fill:#EFEAE0,stroke:#A8895B,color:#423D34;
    class qd,rd store;

1. Data stores: use managed services

| Store | Managed option | Notes |
|---|---|---|
| Qdrant | Qdrant Cloud | Set QDRANT_API_URL to the cluster URL and QDRANT_API_KEY to its API key |
| Redis | Any managed Redis | Set REDIS_URL; enable persistence for the user model |

Keep COLLECTION_NAME and EMBEDDING_MODEL identical across the API and the Content Engine.
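
A pre-deploy check can catch drift between the two services before it reaches production. The following is a minimal sketch, assuming each service's configuration is available as a plain dictionary (for example, parsed from its ConfigMap). The function name and the example values are hypothetical and not part of the project.

```python
# Settings that must match between the API and the Content Engine (from the text above).
SHARED_KEYS = ("COLLECTION_NAME", "EMBEDDING_MODEL")


def find_mismatches(api_env: dict, ce_env: dict) -> dict:
    """Return {key: (api_value, content_engine_value)} for every shared key that differs."""
    mismatches = {}
    for key in SHARED_KEYS:
        api_value = api_env.get(key)
        ce_value = ce_env.get(key)
        # A value missing on either side also counts as a mismatch.
        if api_value is None or api_value != ce_value:
            mismatches[key] = (api_value, ce_value)
    return mismatches


# Placeholder values for illustration only; these are not project defaults.
api_env = {"COLLECTION_NAME": "items", "EMBEDDING_MODEL": "model-a"}
ce_env = {"COLLECTION_NAME": "items", "EMBEDDING_MODEL": "model-b"}
print(find_mismatches(api_env, ce_env))
```

An empty result means the shared settings agree; anything else is worth failing the deployment pipeline on.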

2. Build and push images

The API ships a Dockerfile (python:3.12-slim base image, Poetry, uvicorn on port 8000). Build and push both services to your registry:

docker build -t registry.example.com/memorise/ai-engine-api:1.0 ai-engine-api
docker push  registry.example.com/memorise/ai-engine-api:1.0
# content-engine: add a Dockerfile that installs '.[api,qdrant,embed]' and runs `content-engine`

3. Run on Kubernetes

Deploy each service as a Deployment with a Service and an Ingress. Inject configuration from a Secret and a ConfigMap rather than from a .env file baked into the image:

apiVersion: apps/v1
kind: Deployment
metadata: { name: ai-engine-api }
spec:
  replicas: 3
  selector: { matchLabels: { app: ai-engine-api } }
  template:
    metadata: { labels: { app: ai-engine-api } }
    spec:
      containers:
        - name: api
          image: registry.example.com/memorise/ai-engine-api:1.0
          ports: [{ containerPort: 8000 }]
          envFrom:
            - secretRef:  { name: ai-engine-secrets }
            - configMapRef: { name: ai-engine-config }
          readinessProbe:
            httpGet: { path: /, port: 8000 }
          livenessProbe:
            httpGet: { path: /, port: 8000 }

Run the Content Engine as a long-lived Deployment if you use POST /ingest at runtime, or as a CronJob if you only periodically sync from Omeka with POST /sync/omeka.
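
For the periodic option, the scheduled job only needs to call the sync endpoint once and report the result. The sketch below uses the Python standard library and assumes that the Content Engine is reachable at a base URL you supply, that POST /sync/omeka accepts an empty body, and that it accepts the same X-API-Key header as /ingest. None of these details are confirmed by this page, so adjust them to the real endpoint.

```python
import urllib.request
from typing import Optional


def build_sync_request(base_url: str, api_key: Optional[str] = None) -> urllib.request.Request:
    """Build the POST /sync/omeka request without sending it."""
    headers = {}
    if api_key:
        # Assumption: the sync endpoint honours the same X-API-Key header as /ingest.
        headers["X-API-Key"] = api_key
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/sync/omeka",
        data=b"",  # assumption: empty body; the real endpoint may expect parameters
        headers=headers,
        method="POST",
    )


def run_sync(base_url: str, api_key: Optional[str] = None, timeout: float = 300.0) -> int:
    """Send the request and return the HTTP status code.

    urllib raises an exception on network failures and on 4xx/5xx responses,
    so a scheduled job running this exits non-zero when the sync fails.
    """
    request = build_sync_request(base_url, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

A scheduled job would call `run_sync(...)` with the in-cluster address of the Content Engine and the key read from the mounted Secret.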

4. Behavioral capture

Point a RudderStack source at the recsys ingest webhook (POST /api/ingest). Events land in the Redis buffer and are folded into the materialized user model on refresh. No separate database is required.
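
To make the buffer-then-fold flow concrete, here is a deliberately simplified sketch that uses in-memory structures in place of Redis. The event fields `userId` and `properties` follow the RudderStack event format; the `item_id` property and the counter-based "user model" are hypothetical stand-ins, since this page does not describe the real model.

```python
from collections import Counter, defaultdict
from typing import Optional


def fold_events(buffer: list, models: Optional[dict] = None) -> dict:
    """Fold buffered events into per-user interaction counts.

    `buffer` is a list of event dicts; `models` maps user id -> Counter of item ids.
    Existing Counters passed in via `models` are updated in place.
    """
    folded = defaultdict(Counter, models or {})
    for event in buffer:
        user_id = event.get("userId")
        item_id = event.get("properties", {}).get("item_id")  # hypothetical property name
        if user_id is None or item_id is None:
            continue  # skip events this sketch cannot attribute to a user and item
        folded[user_id][item_id] += 1
    return dict(folded)


buffer = [
    {"userId": "u1", "event": "view", "properties": {"item_id": "a"}},
    {"userId": "u1", "event": "view", "properties": {"item_id": "a"}},
    {"userId": "u2", "event": "view", "properties": {"item_id": "b"}},
    {"event": "view", "properties": {"item_id": "c"}},  # no userId, skipped
]
models = fold_events(buffer)
```

In the deployed system the buffer and the resulting model both live in Redis, which is why no separate database is needed.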

5. Security hardening

  • Lock down ingest: set INGEST_API_KEY on the Content Engine and send the same value in the X-API-Key request header. Do not expose /ingest publicly.
  • Tighten CORS: the API default allows all origins, which is a demo posture. Restrict it to your front-end origins.
  • Secrets: keep keys (QDRANT_API_KEY, OPENROUTER_API_KEY, KEYCLOAK_*) in a secret manager, not in images.
  • TLS: terminate TLS at the ingress for every public endpoint.
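
The first two items come down to two small checks: comparing a presented key against the configured one, and matching a request origin against an allow-list. The sketch below illustrates both with the standard library. It is not the services' actual implementation; the function names are hypothetical, and failing closed when no key is configured is a choice made for this sketch.

```python
import hmac


def api_key_is_valid(headers: dict, expected_key: str) -> bool:
    """Check the X-API-Key header against the configured key in constant time."""
    if not expected_key:
        return False  # sketch's choice: reject everything when no key is configured
    # HTTP header names are case-insensitive, so normalise before the lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    presented = normalized.get("x-api-key", "")
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(presented.encode(), expected_key.encode())


def origin_is_allowed(origin: str, allowed_origins: set) -> bool:
    """Exact-match allow-list; no wildcards, unlike the demo default."""
    return origin in allowed_origins


# Placeholder values for illustration only.
ALLOWED = {"https://app.example.com"}
print(api_key_is_valid({"X-API-Key": "s3cret"}, "s3cret"))
print(origin_is_allowed("https://evil.example.net", ALLOWED))
```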

6. Scaling and reliability

  • AI Engine API is stateless on the serve path (it reads the materialized user model from Redis), so scale it horizontally behind the load balancer.
  • Latency targets (requirements WP5-NFR-01/02) call for sub-second suggestions and state updates; co-locate the API, Redis, and Qdrant in the same region.
  • Redis holds the user model with a TTL; enable persistence, or accept that a cold model is rebuilt from the event buffer on the next refresh.
  • Take Qdrant snapshots for backup.
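
The TTL trade-off in the Redis item is a read-through pattern: serve the cached model while it is fresh, and rebuild it when it has expired or was lost. The sketch below models that with an in-memory store standing in for Redis. The `TTLStore` class, the `user_model:` key prefix, and the `rebuild` callback are all hypothetical names used for illustration.

```python
import time


class TTLStore:
    """Tiny in-memory stand-in for a key-value store with per-key expiry."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._data = {}

    def set(self, key, value, ttl_seconds):
        self._data[key] = (value, self._clock() + ttl_seconds)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # expired: behave as if the key never existed
            return None
        return value


def get_user_model(store, user_id, rebuild, ttl_seconds=3600):
    """Return the cached model, rebuilding and re-caching it on a miss."""
    key = f"user_model:{user_id}"  # hypothetical key format
    model = store.get(key)
    if model is None:
        model = rebuild(user_id)  # e.g. fold the event buffer again
        store.set(key, model, ttl_seconds)
    return model
```

With persistence enabled, a restart does not empty the store, so the rebuild path runs only after a TTL expiry. Without it, every user's first request after a restart pays the rebuild cost.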

7. Observability

  • The API logs request duration, status, client IP, and user agent (Loguru). Ship logs to your aggregator.
  • Health endpoints: GET / (API) and GET /health (Content Engine) for readiness and liveness probes.
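
Beyond the Kubernetes probes, the same endpoints can back a post-deploy smoke test. The sketch below polls a set of URLs with the standard library and treats any 2xx response as healthy. The base URLs in the usage note are placeholders; only the paths GET / and GET /health come from this page.

```python
import http.client
import urllib.error
import urllib.request


def is_healthy(url: str, timeout: float = 2.0) -> bool:
    """Return True if a GET to `url` answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Covers connection errors, timeouts, and 4xx/5xx responses.
        return False


def check_endpoints(endpoints: dict, probe=is_healthy) -> dict:
    """Map each service name to the result of probing its health URL."""
    return {name: probe(url) for name, url in endpoints.items()}
```

A smoke test might call `check_endpoints({"api": "https://api.example.com/", "content-engine": "http://content-engine.internal/health"})` and fail the rollout if any value is `False`.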