Event Catalog

The shared vocabulary of things a visitor can do. Every behavioral signal in the system starts as one of these events.

Browse the full catalog

Every event, its schema, and its producers and consumers are documented and browsable at the hosted EventCatalog. This page covers the subset the recsys normalizer consumes.

Think of the event catalog as the alphabet of behavior. If every team invented its own words for "the visitor opened a story," nothing downstream could agree on meaning. A fixed catalog prevents that: a small set of named events, each with a clear meaning and a few fields.

The events fall into intuitive groups:

  • Viewing: a visitor starts viewing a piece of content and later stops. The start tells us what they chose (and what else was on offer but ignored); the end tells us how they left (advanced on purpose, or bailed?).
  • Searching: a visitor looks something up and maybe clicks a result.
  • Answering: a visitor responds to a survey, giving explicit signal for cold start.

Two subtle but important ideas:

  • Impressions matter. Knowing what was shown but not chosen is as informative as knowing what was chosen: it teaches the model what to de-prioritize.
  • How a view ends is a rating. Advancing with the next button is a quiet thumbs-up; abandoning is a quiet thumbs-down.

Catalog as consumed by the normalizer (recsys/adapters/rudderstack.py):

| Event | Key fields (raw) | Meaning |
|---|---|---|
| CONTENT_VIEW_STARTED | content.content_id, context.candidates[].content_id | view begins; candidates = impressions |
| CONTENT_VIEW_ENDED | details.reason | view ends; reason becomes end_reason |
| CONTENT_LOOKUP | details.query_text, details.clicked_id | search + optional click |
| SURVEY_ANSWERED | answers[] | explicit survey signal |
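
The field paths in the table can be read as lookups into a nested payload. The following is a minimal sketch of that reading, not the code in recsys/adapters/rudderstack.py: the helper names `get_path` and `extract_key_fields` are hypothetical, the raw payload is assumed to be a nested dict, and the chosen item is assumed to be dropped from the candidate list so that impressions hold only shown-but-not-chosen items.

```python
from typing import Any


def get_path(payload: dict[str, Any], dotted: str, default: Any = None) -> Any:
    """Walk a dotted path such as 'details.reason' through nested dicts."""
    node: Any = payload
    for key in dotted.split("."):
        if not isinstance(node, dict) or key not in node:
            return default
        node = node[key]
    return node


def extract_key_fields(event: str, raw: dict[str, Any]) -> dict[str, Any]:
    """Pull the key fields listed in the catalog table out of one raw event."""
    if event == "CONTENT_VIEW_STARTED":
        chosen = get_path(raw, "content.content_id")
        candidates = get_path(raw, "context.candidates", []) or []
        shown = [c.get("content_id") for c in candidates if isinstance(c, dict)]
        # Assumption: the chosen item may appear among the candidates; drop it.
        impressions = [cid for cid in shown if cid and cid != chosen]
        return {"content_id": chosen, "impressions": impressions}
    if event == "CONTENT_VIEW_ENDED":
        return {"end_reason": get_path(raw, "details.reason")}
    if event == "CONTENT_LOOKUP":
        return {
            "query_text": get_path(raw, "details.query_text"),
            "clicked_id": get_path(raw, "details.clicked_id"),  # None if no click
        }
    if event == "SURVEY_ANSWERED":
        # Passed through unchanged; the shape of each answer is not assumed here.
        return {"answers": get_path(raw, "answers", [])}
    raise ValueError(f"unknown event: {event}")
```

A missing path yields `None` rather than an error, which matches the "optional click" wording for CONTENT_LOOKUP; whether the real normalizer is equally lenient is not stated here.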

end_reason vocabulary (drives the completion signal):

| reason | meaning | engagement contribution |
|---|---|---|
| next_button | advanced on purpose | +1.0 |
| link | followed a link | +0.6 |
| close_button | closed | 0.0 |
| abandon | left without ending | -0.5 |
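
The vocabulary above maps naturally onto an enum plus a lookup table. This is a sketch under assumptions: the `EndReason` name comes from the normalized form on this page, but the member names, the `ENGAGEMENT_CONTRIBUTION` table, and the `engagement_contribution` function are illustrative and may not match the project's code.

```python
from enum import Enum


class EndReason(str, Enum):
    """The four end_reason values from the vocabulary table."""
    NEXT_BUTTON = "next_button"
    LINK = "link"
    CLOSE_BUTTON = "close_button"
    ABANDON = "abandon"


# Engagement contribution per reason, copied from the table.
ENGAGEMENT_CONTRIBUTION = {
    EndReason.NEXT_BUTTON: 1.0,
    EndReason.LINK: 0.6,
    EndReason.CLOSE_BUTTON: 0.0,
    EndReason.ABANDON: -0.5,
}


def engagement_contribution(reason: str) -> float:
    """Map a raw details.reason string to its contribution.

    Raises ValueError for a reason outside the vocabulary.
    """
    return ENGAGEMENT_CONTRIBUTION[EndReason(reason)]
```

Rejecting unknown reasons loudly is one possible design choice; a tolerant normalizer might instead map them to a neutral 0.0.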

Normalized form: every event becomes one InteractionEvent:

InteractionEvent(
    user_id, event, ts,
    session_id=None, content_id=None,
    dwell_seconds=None,        # paired from STARTED.ts to ENDED.ts
    end_reason=None,           # EndReason enum
    query_text=None, clicked_id=None,
    impressions=[],            # shown-but-not-chosen
    survey_answers={},
)
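
For readers who want something concrete, the shape above could be written as a dataclass. This is a sketch only: the field names and defaults come from the listing above, while the type annotations are assumptions, and `end_reason` is typed as a plain string here instead of the EndReason enum to keep the block self-contained.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class InteractionEvent:
    user_id: str
    event: str
    ts: datetime
    session_id: Optional[str] = None
    content_id: Optional[str] = None
    dwell_seconds: Optional[float] = None  # paired from STARTED.ts to ENDED.ts
    end_reason: Optional[str] = None       # EndReason enum in the source
    query_text: Optional[str] = None
    clicked_id: Optional[str] = None
    # default_factory gives each event its own list and dict; a literal [] or {}
    # default would be shared between instances (and dataclasses reject it).
    impressions: list[str] = field(default_factory=list)  # shown-but-not-chosen
    survey_answers: dict[str, str] = field(default_factory=dict)
```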

CONTENT_VIEW_STARTED and CONTENT_VIEW_ENDED events for the same content and session are paired into dwell_seconds by the normalizer (not in SQL), so every source produces identical events. Full normalization map: AI Engine, Recsys data flow.
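
The pairing step might look roughly like the sketch below. It is an illustration, not the normalizer's actual logic: `pair_dwell` is a hypothetical name, events are plain dicts with timezone-consistent `datetime` values in `ts`, and attaching dwell_seconds to the ENDED event (with `None` when no matching start was seen) is an assumption.

```python
from datetime import datetime, timezone


def pair_dwell(events):
    """Pair STARTED/ENDED per (session_id, content_id) and compute dwell_seconds.

    Returns the ENDED events, each copied with a 'dwell_seconds' key added.
    """
    open_views = {}  # (session_id, content_id) -> ts of the latest STARTED
    paired = []
    for ev in sorted(events, key=lambda e: e["ts"]):
        key = (ev.get("session_id"), ev.get("content_id"))
        if ev["event"] == "CONTENT_VIEW_STARTED":
            open_views[key] = ev["ts"]
        elif ev["event"] == "CONTENT_VIEW_ENDED":
            started = open_views.pop(key, None)
            dwell = None
            if started is not None:
                dwell = (ev["ts"] - started).total_seconds()
            paired.append({**ev, "dwell_seconds": dwell})
    return paired
```

Doing this in application code keeps the result independent of whichever store or source delivered the raw events, which is the point made above about every source producing identical events.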

Canonical event schema. Raw events are persisted to a dedicated event store with this field set (D5.3, adapted from the Google Analytics event model):

| Field | Description | Example |
|---|---|---|
| event_name | name or type of the event (page_view, click, search) | view |
| event_id | unique identifier for the event | evt_123456789 |
| user_id | pseudonymized user identifier | u_ab3f9x7 |
| session_id | groups events within one session | s_20250917_01 |
| timestamp | event time (ISO 8601 UTC) | 2025-09-17T10:15:23Z |
| content_id | optional reference to content metadata | doc_9981 |
| query | search query, if the event is a search | Warsaw uprising diaries |
| dwell_ms | time spent on the item, milliseconds | 48210 |
| scroll_pct | depth of scroll interaction, percent | 87 |
| device | device or browser information | Chrome on Android |
| referrer | referring source | homepage |
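
A record in this field set could be assembled as in the sketch below. The function names `to_iso_utc` and `canonical_event` are hypothetical, and so is the split into required and optional fields: the table only marks content_id as optional and query as search-specific. The 0 to 100 range check on scroll_pct is also an assumption drawn from "percent".

```python
from datetime import datetime, timezone

OPTIONAL_FIELDS = {"content_id", "query", "dwell_ms", "scroll_pct", "device", "referrer"}


def to_iso_utc(when: datetime) -> str:
    """Format an aware datetime as ISO 8601 UTC, e.g. 2025-09-17T10:15:23Z."""
    if when.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    return when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def canonical_event(event_name, event_id, user_id, session_id, when, **optional):
    """Build one canonical event record as a plain dict."""
    unknown = set(optional) - OPTIONAL_FIELDS
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    scroll = optional.get("scroll_pct")
    if scroll is not None and not 0 <= scroll <= 100:
        raise ValueError("scroll_pct must be between 0 and 100")
    record = {
        "event_name": event_name,
        "event_id": event_id,
        "user_id": user_id,  # expected to be pseudonymized already
        "session_id": session_id,
        "timestamp": to_iso_utc(when),
    }
    # Omit optional fields that were not supplied.
    record.update({k: v for k, v in optional.items() if v is not None})
    return record
```

For example, calling `canonical_event("view", "evt_123456789", "u_ab3f9x7", "s_20250917_01", when, content_id="doc_9981", dwell_ms=48210, scroll_pct=87)` with `when` set to 10:15:23 UTC on 2025-09-17 yields a record whose timestamp matches the example column.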

Candidate schema standards considered: the Google Analytics event model, the open-source UBI schema, and W3C Activity Streams 2.0. The project does not commit to one; the requirement is that all partners use a consistent schema and storage approach so data from different prototypes aligns.

Signals captured

Interaction history captures navigation flow, dwell time, scroll depth and interaction style, search behavior (queries, reformulations, abandoned searches), engagement actions (bookmark, save, share), and drop-off patterns. These become the behavioral model.

To emit these events from a UI, see Send Events.