# Ranking
`recsys/ranking/` scores candidates and fuses them into a diverse, explainable list. It is
pure functions throughout. The hard contract: every scorer returns a value in [0,1], so the
weighted sum is valid without rescaling.
## scorers.py
| Function | Signature | Range | Formula |
|---|---|---|---|
| `cosine` | `(a, b) -> float` | [-1,1] | standard cosine similarity; 0 if either side is missing or zero |
| `score_semantic` | `(signals, candidate_vector) -> float` | [0,1] | `(cosine(taste_vec, cand_vec) + 1) / 2` |
| `score_tag` | `(signals, content) -> float` | [0,1] | `Σ aff[l]·w[l] / Σ aff[l]` |
`score_tag` is the graded counterpart to Qdrant's coarse tag recall: it computes the
affinity-weighted overlap of the user's `tag_affinity` with the candidate's tag weights,
normalized by total affinity. Geo and popularity scorers are planned and off by default.
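The `score_tag` formula can be sketched as below. The `Signals.tag_affinity` and `Content.tag_weights` dicts keyed by tag are hypothetical stand-ins, and candidate tag weights are assumed to lie in [0,1], which keeps the result in [0,1]. Ignoring non-positive affinities and returning 0 for an empty affinity profile are choices made for this sketch, not documented behavior.

```python
from dataclasses import dataclass, field


@dataclass
class Signals:
    """Hypothetical: only the field score_tag reads (tag -> user affinity)."""
    tag_affinity: dict[str, float] = field(default_factory=dict)


@dataclass
class Content:
    """Hypothetical: candidate tag -> weight, assumed to lie in [0, 1]."""
    tag_weights: dict[str, float] = field(default_factory=dict)


def score_tag(signals: Signals, content: Content) -> float:
    """Affinity-weighted tag overlap: sum(aff[l] * w[l]) / sum(aff[l])."""
    # Assumption: only positive affinities count toward the total.
    aff = {tag: a for tag, a in signals.tag_affinity.items() if a > 0}
    total = sum(aff.values())
    if total == 0:
        return 0.0  # no affinity signal at all, so the scorer contributes nothing
    # Tags the candidate lacks count as weight 0.
    overlap = sum(a * content.tag_weights.get(tag, 0.0) for tag, a in aff.items())
    return overlap / total
```

For a user with affinities `{"ghost": 3, "romance": 1}`, a candidate tagged only `ghost` at weight 1.0 scores 3/4, since three quarters of the user's affinity mass lands on a fully matching tag.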
## fusion.py
### Weighted fusion, explainable
`weighted_fuse(per_scorer: dict[str, float], weights: FusionWeights) -> tuple[float, dict[str, float]]`
It returns the fused score and a `{scorer: weight·score}` breakdown that rides along in each
`ScoredCandidate`, so every recommendation can answer "why this item?".
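A sketch of `weighted_fuse`, assuming `FusionWeights` behaves like a mapping from scorer name to weight (the real type may be a dataclass with one field per scorer). Scorers with no configured weight contribute 0. If the weights are non-negative and sum to 1, which is an assumption of this sketch, the fused score is a convex combination of [0,1] scores and therefore also lies in [0,1].

```python
from typing import Mapping

# Hypothetical: FusionWeights is assumed to act like {scorer name: weight};
# the real type may be a dataclass with one field per scorer.
FusionWeights = Mapping[str, float]


def weighted_fuse(
    per_scorer: dict[str, float], weights: FusionWeights
) -> tuple[float, dict[str, float]]:
    """Return (fused score, {scorer: weight * score}) for one candidate."""
    # Each breakdown entry is that scorer's contribution; a scorer with no
    # configured weight contributes 0.
    breakdown = {
        name: weights.get(name, 0.0) * score for name, score in per_scorer.items()
    }
    # The fused score is the sum of the contributions, so the breakdown
    # explains it completely.
    return sum(breakdown.values()), breakdown
```

Returning the breakdown alongside the total is what makes the "why this item?" answer cheap: the caller never has to recompute per-scorer contributions.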
### MMR rerank, diversity
Greedy Maximal Marginal Relevance:
```mermaid
flowchart LR
sc["scored candidates<br/>(fused score each)"] --> init["pick top-1 by fused"]
init --> step{"selected < limit?"}
step -->|yes| pick["pick argmax<br/>λ·relevance − (1−λ)·max sim to selected"]
pick --> step
step -->|no| out["ranked list (≤ limit)"]
```
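The greedy loop in the flowchart can be sketched as follows. `Scored` is a hypothetical stand-in for `ScoredCandidate` carrying an embedding, item-to-item similarity is assumed to be cosine on those embeddings (pluggable via `sim`), and the parameter is spelled `lam` because `lambda` is a Python keyword.

```python
import math
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Scored:
    """Hypothetical stand-in for ScoredCandidate: id, fused score, embedding."""
    id: str
    fused: float
    vector: tuple[float, ...]


def vector_cosine(a: Scored, b: Scored) -> float:
    """Assumed item-item similarity: cosine of the embeddings (0 if either is zero)."""
    na = math.sqrt(sum(x * x for x in a.vector))
    nb = math.sqrt(sum(y * y for y in b.vector))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a.vector, b.vector)) / (na * nb)


def mmr_rerank(
    candidates: Sequence[Scored],
    lam: float = 0.7,
    limit: int = 10,
    sim: Callable[[Scored, Scored], float] = vector_cosine,
) -> list[Scored]:
    """Greedy Maximal Marginal Relevance over fused scores."""
    pool = list(candidates)
    if not pool or limit <= 0:
        return []
    # Seed with the single most relevant candidate (the top-1 by fused score).
    top = max(range(len(pool)), key=lambda i: pool[i].fused)
    selected = [pool.pop(top)]

    def mmr_score(c: Scored) -> float:
        # Relevance minus a penalty for resembling anything already selected.
        redundancy = max(sim(c, s) for s in selected)
        return lam * c.fused - (1.0 - lam) * redundancy

    while pool and len(selected) < limit:
        best = max(range(len(pool)), key=lambda i: mmr_score(pool[i]))
        selected.append(pool.pop(best))
    return selected
```

Each step rescans the remaining pool against everything selected so far, so this naive version makes O(n · limit²) similarity calls; caching each candidate's running maximum similarity and updating it only against the newest pick would reduce that to O(n · limit).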
λ = `mmr_lambda` (default 0.7) trades relevance against diversity: a higher λ favors the raw
fused score, a lower λ pushes variety. This keeps the list from being ten near-duplicate
stories on the same theme.
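A single greedy step with illustrative numbers (not taken from the project's data) shows the trade-off. Suppose item A is already selected (so S = {A}), a near-duplicate A′ has relevance 0.85 and similarity 0.95 to A, and an unrelated B has relevance 0.60 and similarity 0.10 to A:

$$
\begin{aligned}
\mathrm{MMR}(c) &= \lambda\,\mathrm{rel}(c) - (1-\lambda)\max_{s \in S}\mathrm{sim}(c, s) \\
\lambda = 0.7:\quad \mathrm{MMR}(A') &= 0.7 \cdot 0.85 - 0.3 \cdot 0.95 = 0.595 - 0.285 = 0.310 \\
\mathrm{MMR}(B) &= 0.7 \cdot 0.60 - 0.3 \cdot 0.10 = 0.420 - 0.030 = 0.390 \\
\lambda = 0.9:\quad \mathrm{MMR}(A') &= 0.9 \cdot 0.85 - 0.1 \cdot 0.95 = 0.765 - 0.095 = 0.670 \\
\mathrm{MMR}(B) &= 0.9 \cdot 0.60 - 0.1 \cdot 0.10 = 0.540 - 0.010 = 0.530
\end{aligned}
$$

At the default λ = 0.7, B is picked despite its lower relevance; at λ = 0.9, the redundancy penalty is too small and the near-duplicate A′ wins.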
Property tested
MMR keeps the top-1 relevant item; raising λ gives more relevance and lowering λ gives more
spread; output length is always ≤ `limit`.
## Putting it together
```mermaid
flowchart TD
cand["Candidates (semantic ∪ tag)"] --> ss["score_semantic"]
cand --> st["score_tag"]
ss & st --> wf["weighted_fuse → fused + breakdown"]
wf --> mmr["mmr_rerank(λ, limit)"]
mmr --> rec["Recommendation.items"]
```
This pipeline is the body of `Recommender.recommend_for_signals`; see
Orchestration.
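The composition in the diagram can be sketched as glue over injected pure functions. `rank_candidates` and the `ScoredCandidate` fields shown are hypothetical; scorers are assumed to take one candidate with the user's signals already bound (for example via `functools.partial`), and `fuse`/`rerank` stand in for `weighted_fuse` with fixed weights and `mmr_rerank` with fixed λ and limit.

```python
from dataclasses import dataclass
from typing import Callable, Generic, Sequence, TypeVar

C = TypeVar("C")


@dataclass(frozen=True)
class ScoredCandidate(Generic[C]):
    """Hypothetical shape: the candidate, its fused score, and the breakdown."""
    candidate: C
    fused: float
    breakdown: dict[str, float]


def rank_candidates(
    candidates: Sequence[C],
    scorers: dict[str, Callable[[C], float]],
    fuse: Callable[[dict[str, float]], tuple[float, dict[str, float]]],
    rerank: Callable[[list[ScoredCandidate[C]]], list[ScoredCandidate[C]]],
) -> list[ScoredCandidate[C]]:
    """Score every candidate with every scorer, fuse, then rerank for diversity."""
    scored: list[ScoredCandidate[C]] = []
    for cand in candidates:
        # One [0, 1] score per scorer name, e.g. {"semantic": ..., "tag": ...}.
        per_scorer = {name: score(cand) for name, score in scorers.items()}
        fused, breakdown = fuse(per_scorer)
        # The breakdown travels with the candidate to explain the final ranking.
        scored.append(ScoredCandidate(cand, fused, breakdown))
    return rerank(scored)
```

Keeping every stage a pure function lets each one be swapped out or property-tested in isolation, with the glue itself testable using toy scorers.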