
Signals

recsys/signals/ turns raw events into the user model. Both modules are pure (no I/O) and fully covered by unit and property tests.

engagement.py: continuous strength

Three public functions, no class:

| Function | Signature (abridged) | Does |
| --- | --- | --- |
| estimate_reading_time | (word_count, has_image, cfg) -> float | Seconds to consume content. |
| engagement_strength | (*, dwell_seconds, est_reading_time, end_reason, visits, survey_rating, cfg) -> float | Continuous blend in ~[-1,1]. |
| classify_outcome | (strength, cfg) -> Outcome | Threshold → positive / negative / neutral. |

dwell_ratio = min(dwell / est_reading_time, dwell_cap) / dwell_cap            # [0,1]
completion  = {next_button:1.0, link:0.6, close_button:0.0, abandon:-0.5}[end_reason]
revisit     = 1 - exp(-visits / 2)
survey      = (rating - 3) / 2                                                # 1..5 → [-1,1]
strength    = wd·dwell_ratio + wc·completion + wr·revisit + ws·survey

Weights wd, wc, wr, ws come from RecConfig.engagement. This replaces the legacy binary check (dwell >= estimate) with a graded signal: partial reads, abandons, and revisits all move the needle.
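
The blend above can be sketched in a few lines of Python. This is a minimal illustration, not the module's actual code: the EngagementCfg class, its weight and threshold values, the string outcomes, and the choice to treat a missing survey as 0 are all assumptions made for the example. Only the four terms and their formulas come from the text above.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EngagementCfg:
    """Hypothetical stand-in for RecConfig.engagement; all values are made up."""
    wd: float = 0.4             # weight of dwell_ratio
    wc: float = 0.3             # weight of completion
    wr: float = 0.1             # weight of revisit
    ws: float = 0.2             # weight of survey
    dwell_cap: float = 1.5      # cap on dwell / est_reading_time
    pos_threshold: float = 0.3  # assumed threshold for "positive"
    neg_threshold: float = -0.05  # assumed threshold for "negative"


COMPLETION = {"next_button": 1.0, "link": 0.6, "close_button": 0.0, "abandon": -0.5}


def engagement_strength(*, dwell_seconds: float, est_reading_time: float,
                        end_reason: str, visits: int,
                        survey_rating: Optional[int], cfg: EngagementCfg) -> float:
    # Dwell relative to the estimate, capped and rescaled to [0, 1].
    dwell_ratio = min(dwell_seconds / est_reading_time, cfg.dwell_cap) / cfg.dwell_cap
    completion = COMPLETION[end_reason]
    # Saturating revisit term: each extra visit adds less than the previous one.
    revisit = 1.0 - math.exp(-visits / 2)
    # Assumption: no survey answer contributes nothing (the source does not say).
    survey = 0.0 if survey_rating is None else (survey_rating - 3) / 2
    return (cfg.wd * dwell_ratio + cfg.wc * completion
            + cfg.wr * revisit + cfg.ws * survey)


def classify_outcome(strength: float, cfg: EngagementCfg) -> str:
    if strength >= cfg.pos_threshold:
        return "positive"
    if strength <= cfg.neg_threshold:
        return "negative"
    return "neutral"


cfg = EngagementCfg()
# A full read, finished with the next button, revisited, rated 5/5.
strong = engagement_strength(dwell_seconds=90, est_reading_time=60,
                             end_reason="next_button", visits=2,
                             survey_rating=5, cfg=cfg)
# A quick abandon after a tenth of the estimated reading time, no survey.
weak = engagement_strength(dwell_seconds=6, est_reading_time=60,
                           end_reason="abandon", visits=1,
                           survey_rating=None, cfg=cfg)
```

With these illustrative weights the full read lands well above the positive threshold and the quick abandon dips below zero, which is the graded behavior the legacy binary check could not express.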

Property tested

The property tests cover dwell monotonicity (more dwell never lowers strength), abandon ⇒ negative contribution, and survey extremes mapping to ±1.
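
As a rough illustration of what such properties look like, the sketch below checks them with plain random sampling. The source does not say which test framework the project uses, so no property-testing library is assumed here; the helper names and the DWELL_CAP value are hypothetical.

```python
import random

DWELL_CAP = 1.5  # hypothetical cap; the real value comes from the config
COMPLETION = {"next_button": 1.0, "link": 0.6, "close_button": 0.0, "abandon": -0.5}


def dwell_ratio(dwell: float, est: float, cap: float = DWELL_CAP) -> float:
    return min(dwell / est, cap) / cap


def survey_term(rating: int) -> float:
    return (rating - 3) / 2


def check_properties(trials: int = 1000, seed: int = 0) -> bool:
    rng = random.Random(seed)
    for _ in range(trials):
        est = rng.uniform(1.0, 600.0)
        shorter = rng.uniform(0.0, 1200.0)
        longer = shorter + rng.uniform(0.0, 600.0)
        # Monotonicity: more dwell never lowers the dwell term.
        assert dwell_ratio(shorter, est) <= dwell_ratio(longer, est)
        # The dwell term stays inside [0, 1] thanks to the cap.
        assert 0.0 <= dwell_ratio(shorter, est) <= 1.0
    # An abandon always contributes a negative completion value.
    assert COMPLETION["abandon"] < 0
    # Survey extremes map to exactly -1 and +1.
    assert survey_term(1) == -1.0 and survey_term(5) == 1.0
    return True
```

Because the other three terms do not depend on dwell and the dwell weight is presumably non-negative, monotonicity of the dwell term carries over to the full strength.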

signal_builder.py: fold into UserSignals

flowchart TD
    ev["Sequence[InteractionEvent]"] --> agg["aggregate_views()"]
    agg --> va["dict[content_id → ViewAggregate]<br/>(dwell paired, visits, end_reason,<br/>last_ts, survey_rating)"]
    va --> loop["per content"]
    loop --> est["estimate_reading_time"]
    loop --> str["engagement_strength"]
    str --> cls["classify_outcome"]
    str --> dec["recency decay<br/>w = strength · 0.5^(age/half_life)"]
    cls -->|positive| pos["positives[cid] = w"]
    cls -->|negative| neg["negatives[cid] = w"]
    imp["impressions never viewed"] --> sn["soft negatives<br/>w = soft_neg · decay"]
    str --> taff["tag_affinity += tag.weight · w"]
    pos --> tv["taste_vector =<br/>L2-norm centroid of positive vecs"]
    demo["demographics"] --> daff["person_who:* affinity"]
    pos & neg & sn & taff & tv & daff --> us["UserSignals"]
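
The decay node in the flowchart, w = strength · 0.5^(age/half_life), can be sketched as follows. The function names, the use of timezone-aware datetimes, and the clamping of negative ages to zero are assumptions for illustration; only the formula itself comes from the diagram.

```python
from datetime import datetime, timedelta, timezone


def recency_decay(last_ts: datetime, now: datetime, half_life_days: float) -> float:
    """Return 0.5 ** (age / half_life), with age measured in days."""
    # Assumption: events stamped in the future are treated as age 0.
    age_days = max((now - last_ts).total_seconds(), 0.0) / 86400.0
    return 0.5 ** (age_days / half_life_days)


def decayed_weight(strength: float, last_ts: datetime, now: datetime,
                   half_life_days: float) -> float:
    """Weight stored in positives/negatives: strength scaled by recency."""
    return strength * recency_decay(last_ts, now, half_life_days)


def soft_negative(soft_neg: float, shown_ts: datetime, now: datetime,
                  half_life_days: float) -> float:
    """Weight for an impression that was never viewed: soft_neg scaled by recency."""
    return soft_neg * recency_decay(shown_ts, now, half_life_days)


now = datetime(2024, 1, 31, tzinfo=timezone.utc)
# One half-life ago: the engagement weight is halved.
w_view = decayed_weight(0.8, now - timedelta(days=14), now, half_life_days=14)
# Two half-lives ago: the soft negative keeps a quarter of its base weight.
w_skip = soft_negative(0.1, now - timedelta(days=28), now, half_life_days=14)
```

An exponential half-life keeps the weighting scale-free: doubling the age always halves the weight again, regardless of how old the event already is.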

ViewAggregate

A ViewAggregate folds all views of one content item together: content_id, dwell_seconds, visits, end_reason, last_ts, survey_rating. aggregate_views is robust to path-B async (separate START/END events) and to sources that already carry explicit dwell_seconds.
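
A minimal sketch of this kind of folding is shown below. The InteractionEvent shape here (a kind field with "start", "end", and "view" values, float timestamps) is invented for the example and is not the project's real event schema; the ViewAggregate fields follow the list above. Explicit dwell_seconds is preferred when present, and START/END pairs are used otherwise.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Sequence


@dataclass
class InteractionEvent:
    """Hypothetical event shape, for illustration only."""
    content_id: str
    kind: str                      # "start", "end", or "view" (assumed values)
    ts: float                      # seconds since epoch
    dwell_seconds: Optional[float] = None
    end_reason: Optional[str] = None
    survey_rating: Optional[int] = None


@dataclass
class ViewAggregate:
    content_id: str
    dwell_seconds: float = 0.0
    visits: int = 0
    end_reason: Optional[str] = None
    last_ts: float = 0.0
    survey_rating: Optional[int] = None


def aggregate_views(events: Sequence[InteractionEvent]) -> Dict[str, ViewAggregate]:
    aggs: Dict[str, ViewAggregate] = {}
    open_start: Dict[str, float] = {}  # content_id -> ts of an unmatched start
    for ev in sorted(events, key=lambda e: e.ts):
        agg = aggs.setdefault(ev.content_id, ViewAggregate(ev.content_id))
        agg.last_ts = max(agg.last_ts, ev.ts)
        if ev.survey_rating is not None:
            agg.survey_rating = ev.survey_rating  # latest rating wins
        if ev.kind == "start":
            open_start[ev.content_id] = ev.ts
            agg.visits += 1
        elif ev.kind == "end":
            started = open_start.pop(ev.content_id, None)
            if ev.dwell_seconds is not None:
                agg.dwell_seconds += ev.dwell_seconds   # explicit dwell wins
            elif started is not None:
                agg.dwell_seconds += ev.ts - started    # pair with the open start
            agg.end_reason = ev.end_reason or agg.end_reason
        elif ev.kind == "view":
            # Single event that already carries its own dwell.
            agg.visits += 1
            agg.dwell_seconds += ev.dwell_seconds or 0.0
            agg.end_reason = ev.end_reason or agg.end_reason
    return aggs


events = [
    InteractionEvent("a", "start", ts=100.0),
    InteractionEvent("a", "end", ts=130.0, end_reason="next_button"),
    InteractionEvent("a", "view", ts=200.0, dwell_seconds=20.0, end_reason="link"),
    InteractionEvent("b", "start", ts=50.0),  # never closed: counts as a visit, no dwell
]
views = aggregate_views(events)
```

Sorting by timestamp first means the pairing does not depend on arrival order, which matters when START and END events are delivered asynchronously.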

build_user_signals

build_user_signals(*, user_id, events, contents, vectors, now, cfg, demographics=None)
    -> UserSignals

Key behaviors:

  • Recency decay with half_life_days: recent engagement weighs more.
  • Soft negatives: content impressed (shown) but never viewed becomes a weak negative, scaled by soft_negative_weight × decay. Teaches the model what the user skipped.
  • Tag affinity: accumulates facet:label → weight from engaged content's tags, scaled by engagement weight.
  • Taste vector: L2-normalized centroid of positively-engaged content vectors; the query vector for semantic recall.
  • Demographic affinity: survey/demographics map to person_who:* facets for cold start (e.g. age bucket → person_who:child).
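
The tag-affinity and taste-vector steps can be sketched with plain lists. The data shapes (a dict of decayed weights per content id, tags as (facet:label, weight) pairs, vectors as lists of floats) are assumptions, as is the choice of an unweighted centroid; the source only states that the result is the L2-normalized centroid of positively engaged content vectors.

```python
import math
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def tag_affinity(engaged: Dict[str, float],
                 tags_by_content: Dict[str, List[Tuple[str, float]]]) -> Dict[str, float]:
    """Accumulate facet:label -> sum of tag.weight * engagement weight."""
    affinity: Dict[str, float] = defaultdict(float)
    for cid, w in engaged.items():
        for tag, tag_weight in tags_by_content.get(cid, []):
            affinity[tag] += tag_weight * w
    return dict(affinity)


def taste_vector(positives: Dict[str, float],
                 vectors: Dict[str, List[float]]) -> Optional[List[float]]:
    """L2-normalized centroid of the vectors of positively engaged content."""
    vecs = [vectors[cid] for cid in positives if cid in vectors]
    if not vecs:
        return None  # no positives yet (cold start)
    dim = len(vecs[0])
    # Assumption: plain (unweighted) mean of the positive vectors.
    centroid = [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]
    norm = math.sqrt(sum(x * x for x in centroid))
    return None if norm == 0.0 else [x / norm for x in centroid]


positives = {"a": 0.8, "b": 0.4}
tags = {
    "a": [("topic:space", 1.0), ("person_who:child", 0.5)],
    "b": [("topic:space", 0.5)],
}
vectors = {"a": [1.0, 0.0], "b": [0.0, 1.0]}

affinity = tag_affinity(positives, tags)
taste = taste_vector(positives, vectors)
```

Normalizing the centroid to unit length lets it be used directly as a query vector with cosine or dot-product similarity during semantic recall.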

This single function is the brain. The online updater rebuilds the signals from the event buffer on each refresh, so there is exactly one definition of "the user model". See Orchestration and the serving model.


Full auto-generated reference

Code reference → Recsys package.