Search Subsystem (legacy stack)

The ai_engine.search package is the serving stack in production today: it is what the FastAPI /api/search/* endpoints call. It is coupled directly to Qdrant (no port indirection) and recomputes user state from Postgres on every request.

flowchart TD
    gs["GlobalSearch<br/><i>umbrella orchestrator</i>"]
    cs["CommonSearch<br/><i>point/vector fetch</i>"]
    vs["VectorSearch<br/><i>fastembed encode + search</i>"]
    geo["GeoSearch<br/><i>GeoRadius + hybrid</i>"]
    ur["UserRecommender<br/><i>Qdrant recommend</i>"]
    pb["ProjectionBuilder<br/><i>events → reading projection</i>"]
    us["UserState<br/><i>reading time · success</i>"]
    db["DB_Interface<br/><i>Postgres events</i>"]
    qd[("Qdrant<br/>omeka-items")]

    gs --> cs & vs & geo & ur
    ur --> pb --> us
    pb --> db
    pb --> cs
    cs & vs & geo & ur --> qd

    classDef store fill:#EFEAE0,stroke:#A8895B,color:#423D34;
    class qd store;

`GlobalSearch`, the umbrella

A single entry point that routes by its inputs:

| Call | Routes to | Returns |
| --- | --- | --- |
| `search(text=...)` | `VectorSearch.search` | semantic top-k |
| `search(lat, lon, radius)` | `GeoSearch.search` | geo radius scroll |
| `search(text, lat, lon, radius)` | `GeoSearch.hybrid_search` | vector constrained by geo |
| `similar(item_id)` | `CommonSearch.get_vector` → `VectorSearch.search(vector=...)` | nearest items |
| `random()` | `CommonSearch.get_random_item` | one random item |
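
The input-based routing for `search` in the table can be sketched as a small dispatch function. This is a minimal illustration, not the package's actual code: `route_search` is a hypothetical helper that returns the name of the target method instead of calling it, and the rule that all three of `lat`, `lon`, and `radius` must be present for a geo query is an assumption.

```python
from typing import Optional


def route_search(
    text: Optional[str] = None,
    lat: Optional[float] = None,
    lon: Optional[float] = None,
    radius: Optional[float] = None,
) -> str:
    """Return the name of the handler the routing table selects (hypothetical helper)."""
    # Assumption: a geo query needs all three of lat, lon, and radius.
    has_geo = lat is not None and lon is not None and radius is not None
    if text and has_geo:
        return "GeoSearch.hybrid_search"  # vector search constrained by geo
    if has_geo:
        return "GeoSearch.search"  # geo radius scroll
    if text:
        return "VectorSearch.search"  # semantic top-k
    raise ValueError("search() needs text, a geo radius, or both")
```

Returning a name keeps the sketch free of any Qdrant dependency; the real orchestrator would call the matching component instead.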

All calls return a uniform SearchResult (items, query parameters, and next_offset). Items are built via hit_to_item, which normalizes a Qdrant ScoredPoint or Record into a dict with a score and an optional highlighted snippet.
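
As a rough sketch of that normalization, the snippet below uses a stand-in `FakeHit` class in place of Qdrant's `ScoredPoint` and `Record`, so it runs without a Qdrant client. The field names on `SearchResult`, the `snippet` argument, and the exact dict layout are assumptions for illustration; the real `hit_to_item` may differ.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class FakeHit:
    """Stand-in for a Qdrant ScoredPoint (score set) or Record (no score)."""
    id: Any
    payload: Optional[dict] = None
    score: Optional[float] = None


@dataclass
class SearchResult:
    """Assumed shape: items plus the query parameters and a paging offset."""
    items: list
    query: dict = field(default_factory=dict)
    next_offset: Optional[Any] = None


def hit_to_item(hit: Any, snippet: Optional[str] = None) -> dict:
    """Flatten a hit into a plain dict with a score and an optional snippet."""
    item = dict(hit.payload or {})
    item["id"] = hit.id
    # Records coming from scroll/retrieve carry no score, so default to None.
    item["score"] = getattr(hit, "score", None)
    if snippet is not None:
        item["snippet"] = snippet
    return item


hits = [FakeHit(id=1, payload={"title": "Old bridge"}, score=0.91), FakeHit(id=2)]
result = SearchResult(items=[hit_to_item(h) for h in hits], query={"text": "bridge"})
```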

User recommendation (legacy)

UserRecommender.recommend_for_user is the predecessor to the recsys redesign:

flowchart LR
    uid["user_id"] --> sig["_get_user_signals"]
    sig --> proj["ProjectionBuilder.get_user_projection"]
    proj --> dwell["dwell vs estimated reading time"]
    dwell --> split{"success?"}
    split -->|yes| pos["positive item_ids"]
    split -->|no| neg["negative item_ids"]
    pos & neg --> qd["Qdrant recommend<br/>AVERAGE_VECTOR strategy"]
    qd --> res["SearchResult"]

Engagement is binary here: an interaction counts as successful when dwell >= estimated_reading_time (checked by UserState.is_interaction_successful). Recall is Qdrant's built-in recommend on the mean positive and negative vectors. The recsys redesign replaces this with continuous engagement, expert-tag fusion, and MMR diversity; see Recsys data flow.
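
The binary split can be illustrated in plain Python. This is a sketch under stated assumptions: the projection is modelled as `(item_id, dwell_seconds, estimated_reading_time)` tuples, which is a hypothetical shape, and `mean_vector` only shows the averaging idea. In the legacy stack the averaging and the final query are done by Qdrant's recommend call, not by client code.

```python
from typing import List, Sequence, Tuple

# Hypothetical projection row: (item_id, dwell_seconds, estimated_reading_time)
ProjectionRow = Tuple[str, float, float]


def split_by_success(projection: Sequence[ProjectionRow]) -> Tuple[List[str], List[str]]:
    """Split item ids into positives and negatives using the binary dwell rule."""
    positive: List[str] = []
    negative: List[str] = []
    for item_id, dwell, estimated in projection:
        # Success means the user stayed at least as long as the estimate.
        (positive if dwell >= estimated else negative).append(item_id)
    return positive, negative


def mean_vector(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Component-wise mean of equally sized vectors."""
    if not vectors:
        raise ValueError("need at least one vector")
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


pos, neg = split_by_success([("a", 30.0, 20.0), ("b", 5.0, 20.0), ("c", 20.0, 20.0)])
```

Here `pos` holds the ids that would be passed as positive examples and `neg` the negative ones.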

Reading-time estimate

Shared by both stacks (UserState.compute_reading_time):

reading_time = word_count / READING_SPEED_WPS  (+ IMG_EXTRA_FIXED_TIME if has_image)

Defaults: READING_SPEED_WPS = 4.2 (~250 wpm), IMG_EXTRA_FIXED_TIME = 1.3 s.
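
A minimal sketch of the formula with the defaults above. The function signatures are assumptions; only the constants and the formula come from this page.

```python
READING_SPEED_WPS = 4.2      # words per second (~250 wpm)
IMG_EXTRA_FIXED_TIME = 1.3   # extra seconds when the item has an image


def compute_reading_time(word_count: int, has_image: bool = False) -> float:
    """Estimated reading time in seconds."""
    seconds = word_count / READING_SPEED_WPS
    if has_image:
        seconds += IMG_EXTRA_FIXED_TIME
    return seconds


def is_interaction_successful(dwell_seconds: float, word_count: int, has_image: bool = False) -> bool:
    """Binary engagement rule: dwell must reach the estimated reading time."""
    return dwell_seconds >= compute_reading_time(word_count, has_image)
```

For a 420-word item this gives roughly 100 s, or roughly 101.3 s if the item has an image.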

Ingestion

ingest_content.py is the offline Omeka → Qdrant pipeline (load Parquet → filter by id / word count → SentenceTransformer encode → upsert PointStruct → create datetime + geo payload indices). Most of it is scaffolding; qdrant_index.py is the minimal index-creation script.
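
The filter stage of that pipeline might look like the sketch below. It is an illustration only: the row keys `id` and `text`, the `min_words` threshold, and the function name are hypothetical, and the encode and upsert stages are left out because they need SentenceTransformer and a running Qdrant instance.

```python
from typing import Iterable, Iterator, Optional, Set


def filter_rows(
    rows: Iterable[dict],
    allowed_ids: Optional[Set[int]] = None,
    min_words: int = 1,
) -> Iterator[dict]:
    """Keep rows whose id is allowed and whose text has enough words."""
    for row in rows:
        # Hypothetical keys: "id" and "text".
        if allowed_ids is not None and row["id"] not in allowed_ids:
            continue
        if len(row.get("text", "").split()) < min_words:
            continue
        yield row


rows = [
    {"id": 1, "text": "a short caption"},
    {"id": 2, "text": "a much longer description of the archive item"},
    {"id": 3, "text": ""},
]
kept = list(filter_rows(rows, allowed_ids={1, 2}, min_words=5))
```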
