Event Catalog

The shared vocabulary of things a visitor can do. Every behavioral signal in the system starts as one of these events.

Browse the full catalog

Every event, its schema, and its producers and consumers are documented and browsable at the hosted EventCatalog. This page covers the subset the recsys normalizer consumes.

Think of the event catalog as the alphabet of behavior. If every team invented its own words for "the visitor opened a story," nothing downstream could agree on meaning. A fixed catalog prevents that: a small set of named events, each with a clear meaning and a few fields.

The events fall into intuitive groups:

Viewing: a visitor starts and ends looking at a piece of content. The start tells us what they chose (and what else was on offer but ignored); the end tells us how they left (advanced on purpose? bailed?).
Searching: a visitor looks something up and maybe clicks a result.
Answering: a visitor responds to a survey, giving explicit signal for cold start.

Two subtle but important ideas:

Impressions matter. Knowing what was shown but not chosen is as informative as knowing what was chosen: it teaches the model what to de-prioritize.
How a view ends is a rating. Advancing with the next button is a quiet thumbs-up; abandoning is a quiet thumbs-down.

Catalog as consumed by the normalizer (recsys/adapters/rudderstack.py):

Event	Key fields (raw)	Meaning
`CONTENT_VIEW_STARTED`	`content.content_id`, `context.candidates[].content_id`	view begins; candidates = impressions
`CONTENT_VIEW_ENDED`	`details.reason`	view ends; reason becomes `end_reason`
`CONTENT_LOOKUP`	`details.query_text`, `details.clicked_id`	search + optional click
`SURVEY_ANSWERED`	`answers[]`	explicit survey signal
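As an illustration of the "candidates = impressions" rule, here is a minimal sketch of pulling the chosen item and the impressions out of a raw `CONTENT_VIEW_STARTED` payload. The function name `extract_view_started` and the sample payload are hypothetical; only the field paths `content.content_id` and `context.candidates[].content_id` come from the table above. The sketch assumes the chosen item may also appear among the candidates and filters it out, which may differ from what the normalizer actually does.

```python
from __future__ import annotations

from typing import Any, Optional


def extract_view_started(raw: dict[str, Any]) -> tuple[Optional[str], list[str]]:
    """Return (chosen content_id, impressions) from a raw CONTENT_VIEW_STARTED payload."""
    chosen = (raw.get("content") or {}).get("content_id")
    candidates = (raw.get("context") or {}).get("candidates") or []
    # Impressions are the candidates that were shown but not chosen.
    # Assumption: the chosen item may be listed among the candidates, so drop it.
    impressions = [
        c["content_id"]
        for c in candidates
        if c.get("content_id") is not None and c["content_id"] != chosen
    ]
    return chosen, impressions


# Hypothetical payload shaped after the field paths in the table.
raw_event = {
    "event": "CONTENT_VIEW_STARTED",
    "content": {"content_id": "doc_9981"},
    "context": {
        "candidates": [
            {"content_id": "doc_9981"},
            {"content_id": "doc_1042"},
            {"content_id": "doc_2210"},
        ]
    },
}
chosen, impressions = extract_view_started(raw_event)
print(chosen, impressions)  # doc_9981 ['doc_1042', 'doc_2210']
```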

end_reason vocabulary (drives the completion signal):

Reason	Meaning	Engagement contribution
`next_button`	advanced on purpose	`+1.0`
`link`	followed a link	`+0.6`
`close_button`	closed	`0.0`
`abandon`	left without explicitly ending the view	`-0.5`
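The vocabulary above can be expressed as a small lookup. This is a sketch only: the document names an `EndReason` enum, but the member names, the `ENGAGEMENT_CONTRIBUTION` mapping, and the `completion_signal` helper are assumptions made for illustration, not the project's actual code.

```python
from enum import Enum


class EndReason(str, Enum):
    # Values mirror the raw details.reason strings in the table above.
    NEXT_BUTTON = "next_button"
    LINK = "link"
    CLOSE_BUTTON = "close_button"
    ABANDON = "abandon"


# Engagement contribution per end reason, copied from the table.
ENGAGEMENT_CONTRIBUTION = {
    EndReason.NEXT_BUTTON: 1.0,
    EndReason.LINK: 0.6,
    EndReason.CLOSE_BUTTON: 0.0,
    EndReason.ABANDON: -0.5,
}


def completion_signal(reason: str) -> float:
    """Map a raw reason string to its engagement contribution.

    Raises ValueError for a reason outside the vocabulary.
    """
    return ENGAGEMENT_CONTRIBUTION[EndReason(reason)]


print(completion_signal("next_button"))  # 1.0
print(completion_signal("abandon"))      # -0.5
```

Rejecting unknown reasons loudly, rather than defaulting them to `0.0`, is a design choice of this sketch: it keeps a typo in a producer from silently reading as a neutral close.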

Normalized form. Every event becomes one InteractionEvent:

InteractionEvent(
    user_id, event, ts,
    session_id=None, content_id=None,
    dwell_seconds=None,        # paired from STARTED.ts to ENDED.ts
    end_reason=None,           # EndReason enum
    query_text=None, clicked_id=None,
    impressions=[],            # shown-but-not-chosen
    survey_answers={},
)

A CONTENT_VIEW_STARTED and a CONTENT_VIEW_ENDED for the same content and session are paired into dwell_seconds by the normalizer (not in SQL), so every source produces identical events. Full normalization map: AI Engine, Recsys data flow.
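The pairing step might look roughly like the sketch below. It is not the code in recsys/adapters/rudderstack.py: the `View` dataclass, the `pair_views` function, and the flat event dictionaries (including the assumption that the ENDED event carries `content_id` and a top-level `reason`) are all hypothetical simplifications.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class View:
    user_id: str
    session_id: str
    content_id: str
    ts: datetime
    dwell_seconds: Optional[float] = None  # stays None if no ENDED arrives
    end_reason: Optional[str] = None


def pair_views(events: list[dict]) -> list[View]:
    """Pair STARTED/ENDED events on (session_id, content_id), in timestamp order."""
    open_views: dict[tuple[str, str], View] = {}
    out: list[View] = []
    for e in sorted(events, key=lambda ev: ev["ts"]):
        key = (e["session_id"], e["content_id"])
        if e["event"] == "CONTENT_VIEW_STARTED":
            view = View(e["user_id"], e["session_id"], e["content_id"], e["ts"])
            open_views[key] = view
            out.append(view)
        elif e["event"] == "CONTENT_VIEW_ENDED" and key in open_views:
            view = open_views.pop(key)
            view.dwell_seconds = (e["ts"] - view.ts).total_seconds()
            view.end_reason = e.get("reason")
        # An ENDED event with no matching STARTED is ignored in this sketch.
    return out


t0 = datetime(2025, 9, 17, 10, 15, 0, tzinfo=timezone.utc)
base = {"user_id": "u_ab3f9x7", "session_id": "s_20250917_01"}
events = [
    {**base, "event": "CONTENT_VIEW_ENDED", "content_id": "doc_9981",
     "ts": t0 + timedelta(seconds=48), "reason": "next_button"},
    {**base, "event": "CONTENT_VIEW_STARTED", "content_id": "doc_9981", "ts": t0},
    {**base, "event": "CONTENT_VIEW_STARTED", "content_id": "doc_1042",
     "ts": t0 + timedelta(seconds=50)},
]
views = pair_views(events)
print(views[0].dwell_seconds, views[0].end_reason)  # 48.0 next_button
print(views[1].dwell_seconds)                       # None
```

Doing the pairing in application code, as the text notes, means the same logic applies regardless of which source delivered the raw events.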

Canonical event schema. Raw events are persisted to a dedicated event store with this field set (D5.3, adapted from the Google Analytics event model):

Field	Description	Example
`event_name`	name or type of the event (e.g. page_view, click, search)	`view`
`event_id`	unique identifier for the event	`evt_123456789`
`user_id`	pseudonymized user identifier	`u_ab3f9x7`
`session_id`	groups events within one session	`s_20250917_01`
`timestamp`	event time (ISO 8601 UTC)	`2025-09-17T10:15:23Z`
`content_id`	optional reference to content metadata	`doc_9981`
`query`	search query, if the event is a search	`Warsaw uprising diaries`
`dwell_ms`	time spent on the item, milliseconds	`48210`
`scroll_pct`	depth of scroll interaction, percent	`87`
`device`	device or browser information	`Chrome on Android`
`referrer`	referring source	`homepage`
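For illustration, the field set above could be modeled as a small record type with basic validation. This is a sketch under stated assumptions: the class name `CanonicalEvent`, the choice of which fields are required, and the range checks are not specified by the source, which only marks `content_id` as optional and `query` as search-specific.

```python
from __future__ import annotations

from dataclasses import asdict, dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class CanonicalEvent:
    # Assumed required: identity, session, and time fields.
    event_name: str
    event_id: str
    user_id: str
    session_id: str
    timestamp: str  # ISO 8601 UTC, e.g. 2025-09-17T10:15:23Z
    # Assumed optional: everything that depends on the event type.
    content_id: Optional[str] = None
    query: Optional[str] = None
    dwell_ms: Optional[int] = None
    scroll_pct: Optional[int] = None
    device: Optional[str] = None
    referrer: Optional[str] = None

    def __post_init__(self) -> None:
        # datetime.fromisoformat accepts a trailing "Z" only from Python 3.11,
        # so normalize it to an explicit offset first.
        parsed = datetime.fromisoformat(self.timestamp.replace("Z", "+00:00"))
        offset = parsed.utcoffset()
        if offset is None or offset.total_seconds() != 0:
            raise ValueError("timestamp must be ISO 8601 in UTC")
        if self.dwell_ms is not None and self.dwell_ms < 0:
            raise ValueError("dwell_ms must be non-negative")
        if self.scroll_pct is not None and not 0 <= self.scroll_pct <= 100:
            raise ValueError("scroll_pct must be between 0 and 100")


# Built from the example column of the table.
event = CanonicalEvent(
    event_name="view",
    event_id="evt_123456789",
    user_id="u_ab3f9x7",
    session_id="s_20250917_01",
    timestamp="2025-09-17T10:15:23Z",
    content_id="doc_9981",
    dwell_ms=48210,
    scroll_pct=87,
    device="Chrome on Android",
    referrer="homepage",
)
print(asdict(event)["dwell_ms"])  # 48210
```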

Candidate schema standards considered: the Google Analytics event model, the open-source UBI schema, and W3C Activity Streams 2.0. The project does not commit to one; the requirement is that all partners use a consistent schema and storage approach so data from different prototypes aligns.

Signals captured

Interaction history captures navigation flow, dwell time, scroll depth, interaction style, search behavior (queries, reformulations, abandoned searches), engagement actions (bookmark, save, share), and drop-off patterns. These signals feed the behavioral model.
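As one example of deriving such a signal, abandoned searches could be read off normalized `CONTENT_LOOKUP` events. The sketch below assumes that "abandoned" means a lookup with no `clicked_id`; that definition, the `abandoned_searches` helper, and the dictionary shape are illustrative assumptions rather than the project's stated rule.

```python
from __future__ import annotations


def abandoned_searches(lookups: list[dict]) -> list[str]:
    """Return the query_text of lookups that ended without a clicked result."""
    return [e["query_text"] for e in lookups if e.get("clicked_id") is None]


lookups = [
    {"query_text": "Warsaw uprising diaries", "clicked_id": None},
    {"query_text": "Warsaw uprising letters", "clicked_id": "doc_9981"},
]
print(abandoned_searches(lookups))  # ['Warsaw uprising diaries']
```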

To emit these events from a UI, see Send Events.