Evaluation

How the engine will be assessed against its goals for engagement and learning. Performance and engagement are measured quantitatively; user experience and satisfaction are assessed qualitatively.

Building the engine is not enough: it has to demonstrably help visitors learn and engage, and do so respectfully. The evaluation plan groups its checks into three modules. Each module is marked as internal (run by the team) or external (run with users) and comes with example metrics, so that the assessment stays concrete.

User modelling and adaptation (internal) assesses how well the system builds profiles and adapts over time.

Aspect	Metric
User persona alignment	correlation between survey/persona data and generated recommendations (qualitative)
Adaptivity over time	improvement in relevant content after repeated interactions; simulations of extreme interest, frustration, disinterest (quantitative)
Cold-start handling	success rate for new or anonymous profiles, measured as sustained engagement in the chosen topic (quantitative)
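
As an illustration of how the two quantitative rows above could be computed, the following is a minimal sketch, not the project's implementation. The `Session` record, the definition of "sustained engagement" (a minimum number of viewed items with a minimum share on the chosen topic), the thresholds, and the relevance scores are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Session:
    """One visitor session (hypothetical record layout)."""
    chosen_topic: str
    viewed_topics: Sequence[str]  # topic of each viewed item, in order
    is_cold_start: bool           # True for new or anonymous profiles


def is_sustained(session: Session, min_share: float = 0.5, min_items: int = 3) -> bool:
    """Assumed definition: enough items viewed, and at least
    `min_share` of them belong to the chosen topic."""
    if len(session.viewed_topics) < min_items:
        return False
    on_topic = sum(1 for t in session.viewed_topics if t == session.chosen_topic)
    return on_topic / len(session.viewed_topics) >= min_share


def cold_start_success_rate(sessions: Sequence[Session]) -> float:
    """Share of cold-start sessions that show sustained engagement."""
    cold = [s for s in sessions if s.is_cold_start]
    if not cold:
        raise ValueError("no cold-start sessions to evaluate")
    return sum(is_sustained(s) for s in cold) / len(cold)


def adaptivity_gain(relevance: Sequence[float], window: int = 3) -> float:
    """Mean relevance of the last `window` interactions minus
    the mean of the first `window` (positive means improvement)."""
    if len(relevance) < 2 * window:
        raise ValueError("need at least 2 * window interactions")
    early = sum(relevance[:window]) / window
    late = sum(relevance[-window:]) / window
    return late - early
```

A positive `adaptivity_gain` on real or simulated interaction traces would indicate that recommendations become more relevant with repeated use; how relevance itself is scored is left open here.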

User experience (external) measures engagement, learning, and perceived personalisation, coordinated with WP7.

Aspect	Metric
Knowledge gain	pre- and post-surveys and user interviews
Engagement and immersion	average session duration and navigation depth from implicit indicators
Personalisation experience	survey results, target mean at least 4 on a 5-point Likert scale, plus interviews
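
The personalisation target and the implicit engagement indicators above can be checked with very little code. The sketch below is illustrative only: the function names and the `(duration_seconds, depth)` session layout are assumptions, and only the target of 4 on a 5-point scale comes from the plan.

```python
from statistics import mean
from typing import Sequence, Tuple

LIKERT_TARGET = 4.0  # target mean on a 5-point scale, as stated in the plan


def likert_mean(responses: Sequence[int]) -> float:
    """Mean of 5-point Likert responses; rejects empty or out-of-range input."""
    if not responses:
        raise ValueError("no survey responses")
    if any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("responses must be integers from 1 to 5")
    return float(mean(responses))


def meets_target(responses: Sequence[int], target: float = LIKERT_TARGET) -> bool:
    """True when the mean response reaches the target."""
    return likert_mean(responses) >= target


def engagement_summary(sessions: Sequence[Tuple[float, int]]) -> Tuple[float, float]:
    """Average session duration (seconds) and average navigation depth.

    Each session is an assumed (duration_seconds, depth) pair.
    """
    if not sessions:
        raise ValueError("no sessions to summarise")
    durations = [duration for duration, _ in sessions]
    depths = [depth for _, depth in sessions]
    return float(mean(durations)), float(mean(depths))
```

For example, `meets_target([5, 4, 4, 3])` holds because the mean is exactly 4, while `meets_target([3, 4, 4, 4])` does not. The interviews and pre- and post-surveys remain qualitative inputs that a summary statistic like this cannot replace.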

Ethical (internal) evaluates sensitivity and balance, per the MEMORISE ethical guidelines.

Aspect	Metric
Diversity and sensitivity	distributional analysis of recommended content across themes, communities, and narratives, attending to similarity and dissimilarity so the nuance of HNP is communicated
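
One possible starting point for the distributional analysis above is to compute the share of recommendations per theme and summarise how evenly they are spread. The sketch below uses normalised Shannon entropy as the balance measure; that choice, and all function names, are assumptions for illustration and are not prescribed by the plan or by the MEMORISE ethical guidelines.

```python
from collections import Counter
from math import log
from typing import Dict, Optional, Sequence


def theme_shares(recommended_themes: Sequence[str]) -> Dict[str, float]:
    """Fraction of recommendations that fall under each theme."""
    counts = Counter(recommended_themes)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no recommendations to analyse")
    return {theme: n / total for theme, n in counts.items()}


def normalised_entropy(shares: Dict[str, float], n_themes: Optional[int] = None) -> float:
    """Shannon entropy of the shares divided by its maximum, log(k).

    1.0 means recommendations are spread evenly over k themes;
    values near 0 mean they concentrate on very few themes.
    Pass `n_themes` to count themes that were never recommended.
    """
    k = n_themes if n_themes is not None else len(shares)
    if k < 2:
        return 0.0
    entropy = -sum(p * log(p) for p in shares.values() if p > 0)
    return entropy / log(k)
```

A single balance score only flags concentration. The per-theme shares still need to be read alongside the content itself to judge sensitivity, similarity, and dissimilarity across communities and narratives.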

These modules will be put into practice and reported in the next deliverable, D5.4 (performance evaluation of the Individual Experience framework on final applications).