Wikidata and Wikibase

Activity report

Max De Wilde
Data Oriented Services (DORIS)
CNECT.R3
@ CRDM coordination group
30.06.2023

Outline

  1. Wikibase
  2. The EU Knowledge Graph
  3. SEMIC activities
  4. Use cases

What is Wikibase?

The Wikimedia Foundation hosts many wikis...
Wikidata is one of them
Wikibase is the software behind Wikidata
Architecture
Docker image
Comparison with other solutions
Source: Dennis Diefenbach, Max De Wilde and Samantha Alipio, "Wikibase as an Infrastructure for Knowledge Graphs: the EU Knowledge Graph" (ISWC 2021)

The EU Knowledge Graph

A repository to store structured information about the European Union
https://linkedopendata.eu/

Why use Wikibase?

User-friendly interface
Graph structure
Can be queried (SPARQL)
Can be edited by humans and by bots
Scales well
Wikibase powers Wikidata, one of the largest existing KGs, which contains billions of triples
Multilingual
Full change history (every edit is tracked)

Current content

European institutions
European countries
Capital cities
Directorates-General (DGs)
Buildings and canteens
Projects funded by the EU
Beneficiaries of EU funds
Linked Data solutions

Importing data

1. Take any structured data

2. Model the data

  • We need entities like buildings, offices...
  • We need properties like address, opening hours, occupant...
  • Whenever possible, reuse Wikidata entities/properties or other existing ones
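As an illustration of steps 1 and 2, the sketch below maps one flat source record (a building) to the JSON entity format Wikibase uses for items: multilingual labels plus statements keyed by property ID. The property IDs (P1 "instance of", P2 "address"), the class item Q100 and the example record are hypothetical placeholders, not actual EU Knowledge Graph identifiers or data.

```python
import json

# Hypothetical IDs: replace with the IDs of your own Wikibase
# (or with reused Wikidata IDs, as recommended above).
INSTANCE_OF = "P1"
ADDRESS = "P2"
BUILDING = "Q100"


def item_claim(prop: str, target_qid: str) -> dict:
    """Statement whose value is another item (e.g. 'instance of: building')."""
    return {
        "mainsnak": {
            "snaktype": "value",
            "property": prop,
            "datavalue": {
                "value": {"entity-type": "item", "id": target_qid},
                "type": "wikibase-entityid",
            },
        },
        "type": "statement",
        "rank": "normal",
    }


def string_claim(prop: str, text: str) -> dict:
    """Statement whose value is a plain string (e.g. an address)."""
    return {
        "mainsnak": {
            "snaktype": "value",
            "property": prop,
            "datavalue": {"value": text, "type": "string"},
        },
        "type": "statement",
        "rank": "normal",
    }


def building_to_entity(record: dict) -> dict:
    """Map one flat source record to Wikibase item JSON (one label per language)."""
    labels = {
        lang: {"language": lang, "value": name}
        for lang, name in record["names"].items()
    }
    return {
        "labels": labels,
        "claims": {
            INSTANCE_OF: [item_claim(INSTANCE_OF, BUILDING)],
            ADDRESS: [string_claim(ADDRESS, record["address"])],
        },
    }


if __name__ == "__main__":
    example = {  # hypothetical record
        "names": {"en": "Example building", "fr": "Bâtiment exemple"},
        "address": "Example street 1, Brussels",
    }
    print(json.dumps(building_to_entity(example), ensure_ascii=False, indent=2))
```

Keeping labels per language is what makes the result multilingual out of the box; more properties (opening hours, occupant...) follow the same two helper patterns.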

3. Keep identifiers

Store external identifiers so that they can be used to link to other resources!
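Keeping source identifiers pays off twice: re-imports become idempotent (a record already imported is recognised by its external ID instead of being duplicated), and identifiers can be turned into links. Wikibase expresses the link pattern as a formatter URL in which `$1` stands for the identifier value. A minimal sketch, assuming a hypothetical in-memory index; in practice the lookup could be a SPARQL query or a local cache.

```python
from typing import Dict, Optional, Tuple


def format_external_link(formatter_url: str, identifier: str) -> str:
    """Expand a formatter URL such as 'https://www.wikidata.org/wiki/$1'.

    No escaping is done here: percent-encode the value first if it may
    contain reserved URL characters.
    """
    return formatter_url.replace("$1", identifier)


class ExternalIdIndex:
    """Hypothetical index from (source, external ID) to the local item ID."""

    def __init__(self) -> None:
        self._by_external: Dict[Tuple[str, str], str] = {}

    def lookup(self, source: str, external_id: str) -> Optional[str]:
        """Return the local item ID if this record was already imported."""
        return self._by_external.get((source, external_id))

    def register(self, source: str, external_id: str, local_qid: str) -> None:
        """Record a mapping; refuse to map one external ID to two items."""
        key = (source, external_id)
        existing = self._by_external.get(key)
        if existing is not None and existing != local_qid:
            raise ValueError(f"{key} already mapped to {existing}")
        self._by_external[key] = local_qid
```

Before creating an item, an importer would call `lookup()` and update the existing item instead of creating a duplicate when an ID is found.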

4. Import using Wikibase APIs

We always use Pywikibot
But there are alternatives...
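A minimal sketch of the import step with Pywikibot, assuming Pywikibot is installed, a family file for the target Wikibase is configured and the bot account is logged in (the `code`/`family` arguments are placeholders). The import loop takes the item-creation function as a parameter, so it can be tried with a stub before writing to a live instance; the Pywikibot part creates each new item with `ItemPage(repo)` followed by `editEntity()`.

```python
from typing import Callable, Iterable, List

# create_item(entity_json, edit_summary) -> new item ID, e.g. "Q123"
CreateItem = Callable[[dict, str], str]


def import_entities(entities: Iterable[dict], create_item: CreateItem,
                    summary: str = "Bot import") -> List[str]:
    """Create one item per entity (Wikibase JSON) and return the new IDs in order."""
    return [create_item(entity, summary) for entity in entities]


def pywikibot_creator(code: str, family: str) -> CreateItem:
    """Return a creator that writes to a Wikibase through Pywikibot.

    Assumes a user-config and family file exist for (code, family).
    """
    import pywikibot  # imported lazily so this module loads without it

    repo = pywikibot.Site(code, family).data_repository()

    def create(entity: dict, summary: str) -> str:
        item = pywikibot.ItemPage(repo)           # new, not-yet-saved item
        item.editEntity(entity, summary=summary)  # one API edit per item
        return item.getID()

    return create
```

Passing the creator in keeps the bulk logic testable and makes it easy to swap Pywikibot for one of the alternatives.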
The imported data is understandable, aligned with existing concepts, queryable, and easy to reuse
But DG REGIO moved to another building!
How do we keep the data in line with reality?

Keeping the data in sync

Wikidata ↔ EU Knowledge Graph

For this we use WikibaseSync

https://github.com/the-qa-company/WikibaseSync

What is it?

Similar to the WikibaseImport extension, but...
  1. you can run it locally
  2. it can sync items and properties
  3. local changes are not overwritten
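The third point (local changes are not overwritten) can be illustrated with a three-way comparison per property: keep a snapshot of the values last copied from Wikidata, and overwrite a local value only if it still equals that snapshot. This is a hypothetical sketch of the idea, with values simplified to strings; it is not WikibaseSync's actual implementation.

```python
from typing import Dict, List

# property ID -> list of values (simplified to plain strings)
Values = Dict[str, List[str]]


def merge_from_remote(local: Values, last_synced: Values, remote: Values) -> Values:
    """Three-way merge: follow the remote unless the property was edited locally."""
    merged: Values = dict(local)  # properties that exist only locally are kept
    for prop in set(remote) | set(last_synced):
        local_vals = local.get(prop, [])
        base_vals = last_synced.get(prop, [])
        remote_vals = remote.get(prop, [])
        if local_vals == base_vals:
            # Untouched locally since the last sync: follow Wikidata.
            if remote_vals:
                merged[prop] = list(remote_vals)
            else:
                merged.pop(prop, None)
        # Otherwise the local edit wins and is left as is.
    return merged
```

In the DG REGIO example, a new address entered on Wikidata would flow into the EU Knowledge Graph, unless someone had already corrected that address locally.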

WikibaseUpdater

  • A bot based on WikibaseSync that checks that the data is synchronised
  • Runs every 5 minutes
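One way to picture such an updater is a polling loop: every 5 minutes, ask Wikidata's MediaWiki API which items changed since the last run (`list=recentchanges`), keep only those mirrored in the EU Knowledge Graph, and re-sync them. The function names, the mirrored-ID set, the `resync` callback and the User-Agent string are hypothetical, and this is not necessarily how WikibaseUpdater works.

```python
import json
import time
import urllib.parse
import urllib.request
from typing import Callable, Iterable, Set

WIKIDATA_API = "https://www.wikidata.org/w/api.php"
INTERVAL_SECONDS = 5 * 60  # the 5-minute cycle
USER_AGENT = "eu-kg-updater-sketch/0.1 (hypothetical)"


def recent_changes_url(since_iso: str) -> str:
    """URL listing main-namespace (item) changes since a timestamp, oldest first.

    Only the first batch (up to rclimit) is requested; a real bot would
    follow the API's 'continue' parameters.
    """
    params = {
        "action": "query",
        "list": "recentchanges",
        "rcnamespace": "0",
        "rcstart": since_iso,
        "rcdir": "newer",
        "rcprop": "title|timestamp",
        "rclimit": "500",
        "format": "json",
    }
    return WIKIDATA_API + "?" + urllib.parse.urlencode(params)


def items_to_refresh(response: dict, mirrored: Set[str]) -> Set[str]:
    """Wikidata item IDs that changed and are mirrored locally."""
    changes = response.get("query", {}).get("recentchanges", [])
    return {c["title"] for c in changes if c["title"] in mirrored}


def fetch_json(url: str) -> dict:
    req = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def now_iso() -> str:
    return time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime())


def run_forever(mirrored: Set[str], resync: Callable[[Iterable[str]], None]) -> None:
    since = now_iso()
    while True:
        time.sleep(INTERVAL_SECONDS)
        next_since = now_iso()  # taken before fetching, so no change is skipped
        resync(items_to_refresh(fetch_json(recent_changes_url(since)), mirrored))
        since = next_since
```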

Services provided

Data exports

Available at https://data.linkedopendata.eu

Query service

SPARQL endpoint
Available at https://query.linkedopendata.eu
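A minimal sketch of querying the SPARQL endpoint from Python with the standard library. The raw endpoint path below is an assumption (Wikibase query services usually expose it under a different path than the user-facing page), so check it before use. Results use the standard SPARQL 1.1 JSON results format, which the helper flattens into plain dictionaries.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict, List

# Assumption: hypothetical raw endpoint path behind https://query.linkedopendata.eu
ENDPOINT = "https://query.linkedopendata.eu/proxy/wdqs/bigdata/namespace/wdq/sparql"

QUERY = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?item ?label WHERE {
  ?item rdfs:label ?label .
  FILTER(LANG(?label) = "en")
} LIMIT 10
"""


def run_query(query: str, endpoint: str = ENDPOINT) -> dict:
    """Send a SPARQL query via HTTP GET and return the parsed JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    req = urllib.request.Request(url, headers={
        "Accept": "application/sparql-results+json",
        "User-Agent": "eu-kg-query-sketch/0.1 (hypothetical)",
    })
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def flatten(results: dict) -> List[Dict[str, str]]:
    """Turn SPARQL JSON bindings into {variable: value} rows (unbound vars are absent)."""
    return [
        {var: cell["value"] for var, cell in row.items()}
        for row in results["results"]["bindings"]
    ]


if __name__ == "__main__":
    for row in flatten(run_query(QUERY)):
        print(row["item"], row["label"])
```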

SEMIC activities

Use cases

  • with OP: linking more vocabularies like EuroVoc
  • with RTD: closer integration with Horizon projects
  • with Eurostat: Local Administrative Units (LAUs)
  • with OIB: historical archives about organisations
  • with GROW: platform for single market obstacles
  • with VLOCA (Flanders): open city architecture KB
More ideas are welcome, but we need to prioritise! 😊

Acknowledgements

  • Dennis Diefenbach @ The QA Company
  • SEMIC Team @ DIGIT and PwC
  • Knowledge Management Team @ DG REGIO
  • Wikimedia Deutschland (WMDE)
Thank you! Questions?