Kohesio and the EU Knowledge Graph

Anne Thollard | Max De Wilde
Data Oriented Services (DORIS)
CNECT.R3
Presentation for DG ECHO
30.06.2023

Outline

  1. Wikibase
  2. The EU Knowledge Graph
  3. Kohesio
  4. Other use cases

What is Wikibase?

The Wikimedia Foundation hosts many wikis...
Wikidata is one of them
Wikibase is the software behind Wikidata
Architecture
Docker image
Comparison with other solutions
Source: Dennis Diefenbach, Max De Wilde and Samantha Alipio, "Wikibase as an Infrastructure for Knowledge Graphs: the EU Knowledge Graph" (ISWC 2021)

The EU Knowledge Graph

A repository to store structured information about the European Union
https://linkedopendata.eu/

Why use Wikibase?

User-friendly interface
Graph structure
Can be queried
Can be edited by humans and by bots
Scales well
Wikibase powers Wikidata, one of the largest existing knowledge graphs, which contains billions of triples
Multilingual
Full track of changes

Current content

European institutions
European countries
Capital cities
DGs
Buildings and canteens
Projects funded by the EU
Beneficiaries of EU funds

Importing data

1. Take any structured data

2. Model the data

  • We need entities like buildings, offices...
  • We need properties like address, opening hours, occupant...
  • Whenever possible, reuse Wikidata entities/properties or other existing ones

3. Keep identifiers

Keep the external identifiers of the source data so that entities can be linked to other resources!
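
Steps 2 and 3 can be sketched in a few lines of Python. This is only an illustration: the property IDs (P_ADDRESS, P_OCCUPANT, P_SOURCE_ID) and the Building class are hypothetical placeholders, not identifiers from the EU Knowledge Graph. The real IDs have to be looked up in the target Wikibase.

```python
from dataclasses import dataclass, field

# Hypothetical property IDs, for illustration only.
# Real property IDs must be looked up in the target Wikibase.
PROP_ADDRESS = "P_ADDRESS"
PROP_OCCUPANT = "P_OCCUPANT"


@dataclass
class Building:
    """A source record modelled as an entity with a few properties."""
    label: str
    address: str
    occupant: str
    # External identifiers kept from the source data: property ID -> value.
    external_ids: dict = field(default_factory=dict)


def to_statements(building):
    """Turn a Building into a list of (property ID, value) pairs."""
    statements = [
        (PROP_ADDRESS, building.address),
        (PROP_OCCUPANT, building.occupant),
    ]
    # Sorted so that the output order is stable across runs.
    for prop, value in sorted(building.external_ids.items()):
        statements.append((prop, value))
    return statements


example = Building(
    label="Example building",
    address="1 Example Street",
    occupant="Example DG",
    external_ids={"P_SOURCE_ID": "B-001"},  # made-up identifier
)
print(to_statements(example))
```

Keeping the source identifier as its own statement is what later allows the entity to be matched against the original dataset or another resource.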

4. Import using Wikibase APIs

We always use Pywikibot
But there are alternatives...
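
To give an idea of what such an import sends to Wikibase, the sketch below builds the parameters of a wbeditentity API call by hand, using only the Python standard library. It does not use Pywikibot and it sends nothing over the network. The property ID "P1" and the helper names are assumptions made for the example. A real call also needs an authenticated session and a CSRF token, which are left out here.

```python
import json


def label_block(labels):
    """Build the 'labels' part of the entity JSON from {language: text}."""
    return {lang: {"language": lang, "value": text} for lang, text in labels.items()}


def string_claim(property_id, value):
    """Build one statement whose value is a plain string."""
    return {
        "mainsnak": {
            "snaktype": "value",
            "property": property_id,
            "datavalue": {"value": value, "type": "string"},
        },
        "type": "statement",
        "rank": "normal",
    }


def build_new_item_request(labels, string_statements):
    """Return the parameters for a wbeditentity call that creates an item.

    The 'token' parameter required by the API is deliberately omitted.
    """
    data = {
        "labels": label_block(labels),
        "claims": [string_claim(p, v) for p, v in string_statements],
    }
    return {
        "action": "wbeditentity",
        "new": "item",
        "data": json.dumps(data),
        "format": "json",
    }


# "P1" is a hypothetical property ID used only for this example.
request = build_new_item_request(
    {"en": "Example building", "fr": "Bâtiment exemple"},
    [("P1", "1 Example Street")],
)
print(request["data"])
```

Libraries such as Pywikibot hide this JSON behind higher-level objects, but the payload they produce follows the same structure.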
The imported data is understandable, aligned with existing concepts, queryable, and easy to reuse
But DG REGIO moved to another building!
How to stay in line with reality?

Keeping the data in sync

Wikidata
EU Knowledge Graph

For this we use Wikibase Sync

https://github.com/the-qa-company/WikibaseSync

What is it?

Similar to WikibaseImport but...
  1. you can run it locally
  2. it can sync items and properties
  3. local changes are not overwritten
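
The third point, not overwriting local changes, can be illustrated with a small merge function. This is not the WikibaseSync code. It is a minimal sketch that assumes we already know which labels were edited locally (the locally_edited set is a hypothetical input).

```python
def sync_labels(local, remote, locally_edited):
    """Merge remote labels into local ones without touching local edits.

    local, remote: {language code: label}
    locally_edited: set of language codes changed by hand in the local wiki
    """
    merged = dict(local)
    for lang, value in remote.items():
        if lang in locally_edited:
            # A human or bot changed this label locally: keep it.
            continue
        merged[lang] = value
    return merged


local = {"en": "DG REGIO (local wording)", "fr": "ancien libellé"}
remote = {"en": "DG REGIO", "fr": "nouveau libellé", "de": "neues Label"}

print(sync_labels(local, remote, locally_edited={"en"}))
```

The English label survives because it was edited locally, while the French label is refreshed and the German one is added.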

WikibaseUpdater

  • A bot based on WikibaseSync that checks that the data is synchronised
  • Refreshed every 5 minutes

Services provided

Data exports

Available at https://data.linkedopendata.eu

Query service

SPARQL endpoint
Available at https://query.linkedopendata.eu
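
As an illustration of how the query service can be used from a script, the sketch below builds a SPARQL request URL and reads values out of a response in the standard SPARQL JSON results format. The endpoint path in ENDPOINT is an assumption (the slide only gives the host name), the query is a generic example, and the sample response is made up so that the code runs without network access.

```python
from urllib.parse import urlencode

# Assumption: the exact path of the SPARQL endpoint is not given in the
# slides; check the query service page for the real one.
ENDPOINT = "https://query.linkedopendata.eu/bigdata/namespace/wdq/sparql"

# Generic example query: five entities with an English label.
QUERY = """PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?item ?label WHERE {
  ?item rdfs:label ?label .
  FILTER(LANG(?label) = "en")
} LIMIT 5"""


def build_query_url(endpoint, sparql):
    """Return a GET URL asking the endpoint for JSON results."""
    return endpoint + "?" + urlencode({"query": sparql, "format": "json"})


def extract_values(result_json, variable):
    """Collect the values bound to one variable in a SPARQL JSON result."""
    return [
        binding[variable]["value"]
        for binding in result_json["results"]["bindings"]
        if variable in binding
    ]


# Made-up response in the SPARQL JSON results format, for illustration.
sample_response = {
    "head": {"vars": ["item", "label"]},
    "results": {
        "bindings": [
            {
                "item": {"type": "uri", "value": "https://example.org/entity/Q1"},
                "label": {"type": "literal", "xml:lang": "en", "value": "Example"},
            }
        ]
    },
}

print(build_query_url(ENDPOINT, QUERY))
print(extract_values(sample_response, "label"))
```

The URL can then be fetched with any HTTP client, and the returned JSON passed to extract_values.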

Use case: Kohesio

Transparent communication on projects co-funded by the EU

What is Kohesio?

  • Cohesion funds are managed together with national and local authorities in the 27 EU member states
  • The member states have a legal obligation to publish the list of projects and beneficiaries on their national websites
  • The goal of Kohesio is to aggregate this data and make it publicly available in an easy, open way

Data sources

  1. Dozens of Excel files describing the projects
  2. Additional vocabularies specific to Cohesion Policy: categories of intervention, thematic objectives, etc.
  3. Data about geographic entities (NUTS)
  4. Wikidata

Enriching the data

  1. Translating project labels and descriptions
  2. Computing location on the map (geocoding)
  3. Inferring the NUTS region
  4. Linking beneficiaries to Wikidata
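
Step 3 can be pictured as a point-in-polygon test: once a project has coordinates, we look for the region whose boundary contains them. The sketch below uses the classic ray-casting test on toy square regions. The region codes and shapes are invented for the example and this is not the Kohesio pipeline. A real implementation would use the official NUTS boundaries and most likely a geospatial library.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test: is (lon, lat) inside the polygon?

    polygon is a list of (lon, lat) vertices; the last vertex is
    implicitly connected back to the first.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge cross the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def infer_region(lon, lat, regions):
    """Return the code of the first region containing the point, or None."""
    for code, polygon in regions.items():
        if point_in_polygon(lon, lat, polygon):
            return code
    return None


# Toy regions with invented codes, not real NUTS data.
TOY_REGIONS = {
    "TOY_A": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "TOY_B": [(2, 0), (4, 0), (4, 2), (2, 2)],
}

print(infer_region(1, 1, TOY_REGIONS))
print(infer_region(3, 1, TOY_REGIONS))
print(infer_region(5, 5, TOY_REGIONS))
```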

Building a website on top

Available at https://kohesio.ec.europa.eu/

Mostly open source

Code on https://github.com/ec-doris/
Contributing back to Wikibase (e.g. BatchIngestion)
Memorandum of understanding with WMDE

Other use cases

  • with DIGIT: linked data solutions in Europe
  • with OP: linking vocabularies like Eurovoc
  • with RTD: closer integration with Horizon projects
  • with Eurostat: Local Administrative Units (LAUs)
  • with OIB: historical archives about organisations
More ideas welcome but we need to prioritise! 😊

Acknowledgements

  • Dennis Diefenbach @ The QA Company
  • Knowledge Management Team @ DG REGIO
  • Wikimedia Deutschland (WMDE)

Thank you! Questions?