Diversity

Why pure relevance is not enough, and how the engine deliberately varies what it shows.

If the system only shows the items most similar to a visitor's query or profile, the top set can overemphasise a narrow slice of the corpus: testimonies from the same camp, repeated accounts of the same deportation, several items of the same media type. That maximises short-term relevance but risks a skewed, incomplete view of the historical record. This narrowing is the confirmation bias that the diversity step is designed to counter.

For a cultural heritage setting this is undesirable. Visitors should meet content aligned with their interests and also encounter different perspectives, modalities, and historical contexts that broaden understanding. Left unchecked, confirmation bias would limit discovery and reduce the educational and commemorative value of the platform. So the engine trades a little relevance for variety on purpose.

Diversity is a re-ranking step that runs after hybrid scoring. Once each candidate has a relevance score, many of the top items may still be near-duplicates of one another. Maximal Marginal Relevance (MMR), or submodular re-ranking, refines the list by balancing relevance with novelty:

\[ \mathrm{MMR} = \arg\max_{d \,\in\, \text{candidates} \setminus \text{selected}} \big[\, \lambda \cdot \mathrm{Relevance}(d) - (1-\lambda)\cdot \max_{d' \in \text{selected}} \mathrm{Similarity}(d, d') \,\big] \]

Each new item is chosen to be relevant to the visitor yet dissimilar to the items already selected. \(\lambda \in [0, 1]\) controls the balance: a higher value favours raw relevance, a lower value favours spread.
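
As a small worked example of this trade-off, take \(\lambda = 0.7\) and two remaining candidates. The numbers are invented for illustration and do not come from the MEMORISE corpus: candidate \(a\) has relevance 0.9 but a similarity of 0.95 to an item already selected, while candidate \(b\) has relevance 0.8 and a similarity of only 0.2.

\[ \begin{aligned} \text{score}(a) &= 0.7 \cdot 0.9 - 0.3 \cdot 0.95 = 0.63 - 0.285 = 0.345 \\ \text{score}(b) &= 0.7 \cdot 0.8 - 0.3 \cdot 0.2 = 0.56 - 0.06 = 0.50 \end{aligned} \]

Under these assumed values the less relevant but more novel candidate \(b\) is picked next. With \(\lambda = 1\) the similarity term vanishes and \(a\) would win on relevance alone.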

flowchart LR
    scored["scored candidates"] --> pick["pick top-1 by relevance"]
    pick --> step{"selected < limit?"}
    step -->|yes| nxt["pick argmax<br/>λ·relevance − (1−λ)·max sim to selected"]
    nxt --> step
    step -->|no| out["diverse ranked list"]
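
The loop in the flowchart can be sketched in a few lines of Python. This is a minimal illustration, not the implemented mmr_rerank documented in Ranking: the function name `mmr_rerank_sketch`, its parameters, the use of cosine similarity over item vectors, and the sample data are all assumptions made for the example.

```python
from math import sqrt
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_rerank_sketch(
    relevance: Sequence[float],
    vectors: Sequence[Sequence[float]],
    limit: int,
    lam: float = 0.7,
) -> List[int]:
    """Greedy MMR (hypothetical helper): return candidate indices in diversified order."""
    remaining = list(range(len(relevance)))
    selected: List[int] = []

    def mmr_score(i: int) -> float:
        # With nothing selected yet the penalty is 0, so the first pick
        # is simply the most relevant candidate.
        max_sim = max(
            (cosine(vectors[i], vectors[j]) for j in selected), default=0.0
        )
        return lam * relevance[i] - (1 - lam) * max_sim

    while remaining and len(selected) < limit:
        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Illustrative data: items 0 and 1 are near-duplicates, item 2 is unrelated.
relevance = [0.95, 0.93, 0.80, 0.60]
vectors = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0], [0.7, 0.7]]

print(mmr_rerank_sketch(relevance, vectors, limit=3))           # [0, 2, 1]
print(mmr_rerank_sketch(relevance, vectors, limit=3, lam=1.0))  # [0, 1, 2]
```

In this made-up data, item 1 is almost as relevant as item 0 but nearly identical to it, so with \(\lambda = 0.7\) the unrelated item 2 is promoted ahead of it. With \(\lambda = 1.0\) the order falls back to plain relevance.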

This step matters in MEMORISE because the corpus is large, multimodal, and thematically dense; without it, recommendations become monotonous or reinforce narrow patterns of attention. The implemented mmr_rerank (default \(\lambda = 0.7\)) is documented in Ranking.

Math rendering

In words: at each step, MMR picks the not-yet-selected item that maximises lambda times its relevance minus (one minus lambda) times its highest similarity to any already-selected item.