Vector Store (Qdrant)¶

Where embeddings and metadata live together for fast, filtered, geospatial retrieval.

Once items are embedded, they need a home that supports two things at once: find by meaning (nearest vectors) and filter by fact (time, place, type, language). A plain vector store treats metadata as an afterthought. The MEMORISE corpus needs both, because a testimony is worth retrieving by what it is about and by where and when it happened.

The store also serves two query styles from one place: a visitor's typed query embedded and matched against content, and the visitor's own behavioral vector used as the query to recommend more of what they engaged with. Both can be combined with filters such as language, victim group, event category, or time period.

The engine uses Qdrant, an open-source vector search engine that stores high-dimensional embeddings alongside rich, flexible metadata. Key features for MEMORISE:

Unified semantic and metadata retrieval: explicit queries and behavioral vectors both run as similarity search, combined with metadata filters, in one framework.
Flexible schema: heterogeneous sources can carry different metadata. A survivor interview can hold recording date and languages; a 3D reconstruction can hold geo-coordinates and time ranges; both live in one collection with unified search.
Scalable search: fast ANN retrieval (HNSW) with structured filters.
Geospatial support: vectors annotated with latitude/longitude enable point-radius search (items within 20 km of Bergen-Belsen), polygon/geofence search (testimonies inside the Warsaw Ghetto boundary), and proximity ranking (order by closeness to a visitor's GPS location).

flowchart LR
    q["query vector or<br/>behavioral vector"] --> ann["ANN search (HNSW)"]
    filt["filters: time · place ·<br/>type · language"] --> ann
    ann --> hits["ranked candidates"]
    classDef st fill:#EFEAE0,stroke:#A8895B,color:#423D34;

Example combined query: "all testimonies mentioning Auschwitz within a 50 km radius of Krakow, recorded between 1945 and 1950, semantically similar to a visitor's last viewed content." Content is provided to Qdrant through the Omeka Content Management System.

Geospatial vector data around Bergen-Belsen in Qdrant — Qdrant with geospatial annotation: items around Bergen-Belsen, supporting point-radius, geofence, and proximity-ranked queries combined with semantic similarity.

Tags in payload. Expert tags ride in the payload as flat tag_labels (facet:label, a KEYWORD index) for recall via Filter(should=[...]), plus graded weights used by the pure-Python tag scorer. In code the store sits behind the ContentStore port (search_vector, search_tags, get, get_vectors); see Adapters. The tag-system/tags.json holds the working taxonomy.

Content Engine service and API

This collection is written by the content-engine service. It formats any source (Omeka, CSV, JSON) into the shared ContentDocument contract, embeds the text, and upserts to Qdrant; the AI Engine recommender only reads the result. The service runs on port 8002 with GET /health, POST /ingest, and POST /sync/omeka. Full detail in the interactive Content Manager API reference.