Integrating COVID-related Open Data published by national governments in a Knowledge Graph

Supervisor: Prof. Dr. Andreas Harth

Description: With the pandemic still ongoing and the situation becoming in parts less transparent, there have been numerous calls for more transparent and open reporting of data on the course of the pandemic and its effects. While most countries publish mandatory dashboards and infection numbers, reporting on other factors still appears to be highly heterogeneous across countries, so that a comparable view of developments is hard to achieve.

In this thesis, the goal is to survey and compare the availability of Open Data on the pandemic across EU countries, and to assess the feasibility of integrating (and potentially prototype an integration of) data published by different countries into a Data Processing Pipeline suitable for comparative analyses.

Related questions to be answered include:

Which aspects of COVID-related figures at the national level are publicly available in each country?

• infection numbers
• test numbers, kinds of tests used?
• reproduction number (R0 or effective R)?
• availability of testing facilities
• 7-day incidence?
• at which level of regional granularity?
• age distributions?
• vaccine numbers? Which types of vaccines? at which temporal/regional granularity? Age distribution? Risk group definitions and planned vaccine phases? Vaccine order and delivery numbers? Production numbers?
• hospitalisations and hospital capacities? ICU beds, etc.
• lockdown measures? (shops, restaurants, childcare, school closures (per grade), home office obligations)
• quarantine regulations and travel restrictions
• socio-economic effects (such as unemployment numbers, GDP effects)

How is this data published?

• structured/machine-readable?
• data formats
• spatial and temporal granularity
• available via APIs?
• which schema is used

Overall goal: find out how comparable data about COVID-19 is across EU countries, or pinpoint the heterogeneity problems across the EU.
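One typical comparability problem: sources may publish the same indicator under different field names, or only publish raw daily case counts instead of a 7-day incidence. The following is a minimal sketch, not a prescribed method, of mapping source-specific records onto one common record and deriving the 7-day incidence per 100,000 inhabitants (new cases over the last 7 days, divided by population, times 100,000). The source names, field names and the `DailyReport` record are hypothetical placeholders, not actual national schemas.

```python
from dataclasses import dataclass


@dataclass
class DailyReport:
    """Hypothetical common record: new cases reported for one region on one day."""
    region: str
    day: int  # day index, e.g. days since a fixed reference date (assumption)
    new_cases: int


# Hypothetical source-specific field names, mapped onto the common record fields.
FIELD_MAPPINGS = {
    "source_a": {"region": "region_code", "day": "day_index", "new_cases": "cases_new"},
    "source_b": {"region": "nuts_id", "day": "d", "new_cases": "daily_positive"},
}


def normalize(source, row):
    """Translate one raw row (a dict) from the given source into a DailyReport."""
    m = FIELD_MAPPINGS[source]
    return DailyReport(
        region=row[m["region"]],
        day=int(row[m["day"]]),
        new_cases=int(row[m["new_cases"]]),
    )


def seven_day_incidence(reports, region, day, population):
    """New cases in the 7 days ending at `day`, per 100,000 inhabitants."""
    # Keyed by day: assumes one record per region and day (a later duplicate overwrites).
    window = {r.day: r.new_cases for r in reports
              if r.region == region and day - 6 <= r.day <= day}
    if len(window) < 7:
        raise ValueError(f"incomplete 7-day window for {region} ending on day {day}")
    return sum(window.values()) * 100_000 / population
```

With seven daily records of 10 new cases each for a region of 70,000 inhabitants, `seven_day_incidence` yields 100.0. Raising an error on gaps, rather than silently summing fewer days, keeps incomplete reporting visible, which is itself one of the heterogeneity problems to be documented.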

The assessment of the data availability situation and the development of an integration prototype could potentially be split into two separate thesis topics. For a prototype implementation, we are particularly interested in the deployment of Knowledge Graph technologies [1], i.e., building a dynamic Knowledge Graph of COVID-related information that would eventually enable or enrich country- or region-wise analyses and studies such as the examples in [2,3,4].
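To illustrate what integrating national figures into such a Knowledge Graph could look like, here is a minimal, dependency-free sketch that describes each published figure as a set of subject-predicate-object triples and answers simple pattern queries across countries. It reuses two terms of the W3C RDF Data Cube vocabulary (`qb:Observation`, `qb:dataSet`); the `http://example.org/covid/` namespace, the region/date/measure properties and all identifiers are hypothetical. A real prototype would more likely use an RDF store and SPARQL.

```python
from typing import NamedTuple

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
QB = "http://purl.org/linked-data/cube#"  # W3C RDF Data Cube vocabulary
EX = "http://example.org/covid/"          # hypothetical namespace for this sketch


class Triple(NamedTuple):
    s: str
    p: str
    o: object


def observation_triples(country, region, date, measure, value):
    """Describe one published figure as a qb:Observation (hypothetical properties)."""
    obs = f"{EX}obs/{country}/{region}/{date}/{measure}"
    return [
        Triple(obs, RDF_TYPE, QB + "Observation"),
        Triple(obs, QB + "dataSet", f"{EX}dataset/{country}"),
        Triple(obs, EX + "region", f"{EX}region/{region}"),
        Triple(obs, EX + "date", date),
        Triple(obs, EX + measure, value),
    ]


def match(graph, s=None, p=None, o=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return [t for t in graph
            if (s is None or t.s == s)
            and (p is None or t.p == p)
            and (o is None or t.o == o)]
```

Keeping the graph as a Python `set` makes re-loading an unchanged figure idempotent, which matters for a dynamic graph that is refreshed whenever a country republishes its data. For example, after adding one (made-up) `newCases` observation each for two hypothetical countries, `match(graph, p=EX + "newCases")` returns both, independently of how each country originally published them.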