HNP Corpora and Knowledge Graph
This page describes the raw corpus of Heritage related to Nazi Persecution (HNP) and how it is turned into a structured, queryable Knowledge Graph.
The starting point is unprocessed material gathered from many memorials and partners. It is historically valuable but fragmented: each item arrives with only basic metadata and few explicit links to the people, places, and events it touches. The data spans modalities:
- Text: digitised diaries, testimonies, archival records.
- Images: photographs, artwork, historical maps.
- Audio and video: interviews and survivor testimonies.
The Knowledge Graph (KG) is the structured layer built on top. It reads each item and connects it into a network of entities (people, places, events, organisations, objects) joined by meaningful relationships, for example "Person was deported to Camp" or "Event occurred at Location". This turns isolated artefacts into an integrated resource in which the engine can move from a diary to the place it describes, and from there to other testimonies of the same event.
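The traversal described above can be sketched with plain subject-predicate-object triples. This is a minimal illustration, not the project's storage model: the identifiers (`doc:diary_1`, `place:camp_a`) and predicate names (`describes`, `deported_to`) are hypothetical, and the shared node here is a place rather than an event.

```python
from collections import defaultdict

# Hypothetical identifiers and predicate names, for illustration only.
TRIPLES = [
    ("doc:diary_1", "describes", "place:camp_a"),
    ("doc:testimony_7", "describes", "place:camp_a"),
    ("doc:testimony_9", "describes", "place:city_b"),
    ("person:p1", "deported_to", "place:camp_a"),
]


def build_index(triples):
    """Index triples by subject and by object so the graph can be walked both ways."""
    by_subject = defaultdict(list)
    by_object = defaultdict(list)
    for subject, predicate, obj in triples:
        by_subject[subject].append((predicate, obj))
        by_object[obj].append((predicate, subject))
    return by_subject, by_object


def related_documents(doc, triples):
    """Return other documents describing any place that `doc` describes."""
    by_subject, by_object = build_index(triples)
    places = [obj for predicate, obj in by_subject[doc] if predicate == "describes"]
    found = set()
    for place in places:
        for predicate, subject in by_object[place]:
            if predicate == "describes" and subject != doc:
                found.add(subject)
    return sorted(found)


print(related_documents("doc:diary_1", TRIPLES))
```

Indexing by object as well as by subject is what makes the second hop cheap: once the place is known, every document pointing at it can be looked up directly.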
Building the graph. Populating the KG relies on NLP pipelines plus manual curation:
flowchart LR
text["Text source"] --> ner["NER<br/>detect mentions"]
ner --> el["Entity Linking<br/>resolve to URI"]
el --> voc["Controlled vocabularies<br/>WO2 NIOD · VHA"]
voc --> kg[("Knowledge Graph")]
classDef st fill:#EFEAE0,stroke:#A8895B,color:#423D34;
class kg st;
- NER detects mentions of persons, places, organisations, events, and temporal expressions.
- Entity Linking resolves each mention to a unique authoritative identifier (URI).
- Controlled vocabularies supply the stable identifiers, synonyms, multilingual labels, and hierarchy: the NIOD WO2 Thesaurus gives archival precision (camps, ghettos, organisations, events), and the VHA vocabulary adds experiential categories such as "Deportation/Transport" or "Life in Ghettos". Example: a diary mentioning "Westerbork" is recognised as a location and linked to the WO2 URI for the Westerbork transit camp.
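The three steps above can be sketched end to end for the Westerbork example. This is a toy stand-in under stated assumptions: a dictionary lookup replaces the NER step (a production pipeline would typically use a trained model), the `Concept` class and function names are hypothetical, the synonyms are illustrative, and the URI is a placeholder rather than a real WO2 identifier.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    """One controlled-vocabulary entry (hypothetical shape)."""
    uri: str
    label: str
    kind: str
    synonyms: tuple = ()


# Placeholder URI and illustrative synonyms; not taken from the WO2 Thesaurus.
VOCABULARY = [
    Concept(
        uri="https://example.org/wo2/westerbork-transit-camp",
        label="Westerbork",
        kind="Location",
        synonyms=("Kamp Westerbork", "Camp Westerbork"),
    ),
]


def build_lookup(vocabulary):
    """Map every lower-cased label and synonym to its concept."""
    lookup = {}
    for concept in vocabulary:
        for name in (concept.label, *concept.synonyms):
            lookup[name.lower()] = concept
    return lookup


def detect_mentions(text, lookup):
    """Stand-in for NER: find known surface forms, preferring longer names."""
    names = sorted(lookup, key=len, reverse=True)
    if not names:
        return []
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(name) for name in names) + r")\b",
        re.IGNORECASE,
    )
    return [(match.start(), match.group(0)) for match in pattern.finditer(text)]


def link_entities(text, vocabulary):
    """Resolve each detected mention to the URI of its vocabulary concept."""
    lookup = build_lookup(vocabulary)
    links = []
    for offset, surface in detect_mentions(text, lookup):
        concept = lookup[surface.lower()]
        links.append(
            {"mention": surface, "offset": offset, "uri": concept.uri, "kind": concept.kind}
        )
    return links


for link in link_entities("We arrived at Westerbork in the autumn.", VOCABULARY):
    print(link)
```

Routing synonyms through the same lookup is what lets different surface forms of one place resolve to a single identifier, which is the role the controlled vocabularies play in the pipeline.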
Ontology. Two groups of classes (full spec: MEMORISE Ontology):
| Group | Classes |
|---|---|
| Historical (past realities) | Person (life events), Event (time, persons, places), Location (camps, ghettos, cities, geo-coordinates), Physical Object (artefacts, photos, maps) |
| Historiographic (how history is recorded) | Document (diaries, letters, testimonies, records), Media Item, Archive / Collection |
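To make the two groups concrete, here is a minimal sketch of a subset of the classes as Python dataclasses. The field names and the `group_of` helper are assumptions for illustration; the MEMORISE Ontology remains the authoritative specification, and Physical Object, Media Item, and Archive / Collection are omitted for brevity.

```python
from dataclasses import dataclass, field
from typing import Optional


# Historical classes: past realities.
@dataclass
class Location:
    uri: str
    label: str
    latitude: Optional[float] = None   # geo-coordinates, when known
    longitude: Optional[float] = None


@dataclass
class Person:
    uri: str
    name: str
    life_events: list = field(default_factory=list)  # URIs of Event entities


@dataclass
class Event:
    uri: str
    label: str
    date: Optional[str] = None
    persons: list = field(default_factory=list)  # URIs of Person entities
    places: list = field(default_factory=list)   # URIs of Location entities


# Historiographic classes: how history is recorded.
@dataclass
class Document:
    uri: str
    title: str
    doc_type: str                                 # e.g. "diary", "letter", "testimony"
    mentions: list = field(default_factory=list)  # URIs of historical entities


HISTORICAL = (Person, Event, Location)
HISTORIOGRAPHIC = (Document,)


def group_of(entity):
    """Return which ontology group an entity instance belongs to."""
    if isinstance(entity, HISTORICAL):
        return "historical"
    if isinstance(entity, HISTORIOGRAPHIC):
        return "historiographic"
    raise TypeError(f"Unknown entity type: {type(entity).__name__}")


camp = Location(uri="ex:loc1", label="Example camp")
diary = Document(uri="ex:doc1", title="Diary", doc_type="diary", mentions=[camp.uri])
print(group_of(camp), group_of(diary))
```

Keeping references as URIs rather than nested objects mirrors how graph entities point at one another and keeps a document separate from the historical entities it mentions.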
The KG is built and maintained by the knowledge-graph and NLP pipelines, and the engine consumes it as the symbolic grounding signal in hybrid scoring, connecting related people, places, and events across sources.