DEMSCORE and FAIR Data

According to the overall FAIR principles, data shall be:

  • Findable
  • Accessible
  • Interoperable
  • Reusable

Demscore is committed to upholding these principles in our work and ensuring that our data is managed in line with best practices.

The following section outlines several ways in which Demscore strives to meet the FAIR principles, following the definition and guidelines from the GO FAIR Foundation and the Swedish Research Council.

Principles

Findable

F1: Data and metadata have a unique and permanent identifier

All datasets created through Demscore are assigned a unique Download ID, which can be used to replicate the exact data download when the ID is shared and entered into the Download Interface.

During data processing in Demscore, the original identifying code strings in each dataset are kept. If modification is necessary for merging due to a mismatch in identifiers, this is discussed with the creators/collectors of the original data. Modifications of identifiers are limited to inserting default codes for missing values, and country code modifications to enable deterministic linkage between different sources. All changes made are documented in the reference material.
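As a minimal sketch of the kind of identifier modification described above, and not Demscore's actual processing code, the following Python snippet fills missing identifiers with a default code and maps source-specific country codes onto a shared code so that two sources can be joined deterministically. The names `MISSING_CODE`, `COUNTRY_CODE_MAP`, `harmonize_identifier` and `document_changes`, as well as the example codes, are hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical default code inserted where a source has no identifier.
MISSING_CODE = "-11111"

# Hypothetical mapping from source-specific country codes to a shared code.
COUNTRY_CODE_MAP = {
    "GER": "DEU",
    "ROM": "ROU",
}


def harmonize_identifier(code: Optional[str]) -> str:
    """Return a linkable identifier, keeping the original wherever possible."""
    if code is None or code.strip() == "":
        return MISSING_CODE
    cleaned = code.strip()
    # Only codes listed in the mapping are changed; all others stay as-is.
    return COUNTRY_CODE_MAP.get(cleaned, cleaned)


def document_changes(codes: List[Optional[str]]) -> List[Tuple[Optional[str], str]]:
    """Collect (original, harmonized) pairs that differ, e.g. for documentation."""
    changes = []
    for original in codes:
        harmonized = harmonize_identifier(original)
        if harmonized != original:
            changes.append((original, harmonized))
    return changes
```

Keeping a list of changed pairs mirrors the idea that every modification is documented, while untouched identifiers pass through unchanged.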

F2: Data are described with detailed machine-readable metadata in a way that enables searches to be processed mechanically

Demscore provides the original metadata, i.e., codebook entries and recommended citations, for each individual dataset and variable. In addition, the Demscore reference documents link to the original reference material, which users can consult when they need additional information, for example on data collection methods.

F3: The provenance (origin) of data and metadata is described in detail

All datasets included in Demscore are published sources with their own citations and DOIs. For each variable, a recommended citation for the original dataset and individual variable (if applicable) is included in the Demscore autogenerated codebooks.

All reference documents include links to the original datasets and codebooks.

The Demscore website gives detailed descriptions of all Modules and their original data sources.

The Demscore Methodology Document describes in detail how the Demscore merge process works and how users can customize their datasets.

F4: Metadata include the identifier of the data it describes

Each dataset created through Demscore is assigned a unique Download ID. This ID can be shared by the creator of the dataset with peers, supervisors, reviewers, etc. Anyone with the Download ID can create the exact same dataset through the Demscore Download Interface by pasting the ID into the “Download By Download ID” interface.

The unique Download IDs as well as the information on downloaded settings for each ID are stored in an encrypted database table.
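The idea of storing the settings behind each Download ID can be sketched as follows. This is an illustration under stated assumptions, not the Demscore implementation: the class and field names (`DownloadSettings`, `DownloadRegistry`, `version`, `output_unit`, `variables`) are hypothetical, a plain in-memory dictionary stands in for the database table, and encryption is left out.

```python
import uuid
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class DownloadSettings:
    """Hypothetical record of what a user selected for one download."""
    version: str                # data version the download was made from
    output_unit: str            # chosen download format
    variables: Tuple[str, ...]  # selected variable names


class DownloadRegistry:
    """In-memory stand-in for the table that stores Download IDs."""

    def __init__(self) -> None:
        self._table: Dict[str, DownloadSettings] = {}

    def register(self, settings: DownloadSettings) -> str:
        """Assign a new unique ID to the settings and store them."""
        download_id = uuid.uuid4().hex
        self._table[download_id] = settings
        return download_id

    def lookup(self, download_id: str) -> DownloadSettings:
        """Return the stored settings so the same dataset can be rebuilt."""
        return self._table[download_id]
```

Because the ID resolves to the full set of stored settings, anyone holding it can request the same selection again without knowing how it was originally assembled.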

F5: Metadata can be found via a searchable internet service

The Demscore download interface includes a search function allowing users to browse all variables included in the infrastructure. This interface shows not only the variable names, but also the codebook descriptions, time spans covered and dataset the variable originates from. The interface allows users to immediately select a variable for download.

In addition, all of this information is also available in autogenerated static files uploaded to the Demscore website.
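To illustrate what searching variable metadata can look like, here is a minimal sketch assuming a simple case-insensitive substring match. The record type `VariableEntry`, its fields, and the function `search_variables` are hypothetical and do not describe how the Demscore download interface is actually built.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VariableEntry:
    """Hypothetical metadata record for one variable."""
    name: str
    description: str  # codebook description
    start_year: int   # first year covered
    end_year: int     # last year covered
    dataset: str      # dataset the variable originates from


def search_variables(entries: List[VariableEntry], query: str) -> List[VariableEntry]:
    """Case-insensitive substring match on name, description and dataset."""
    needle = query.lower()
    return [
        entry for entry in entries
        if needle in entry.name.lower()
        or needle in entry.description.lower()
        or needle in entry.dataset.lower()
    ]
```

Each hit carries the description, time span and source dataset along with the name, which is the information the search results are described as showing.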

Accessible

A1: Metadata can be accessed through their identifier via a standardised communications protocol

All data can be accessed through the Demscore website and download interface (see F4).

Each dataset created through Demscore is assigned a unique Download ID. This ID can be shared by the creator of the dataset with peers, supervisors, reviewers, etc. Anyone with the Download ID can create the exact same dataset through the Demscore Download Interface by pasting the ID in the “Download By Download ID” interface.

The unique Download IDs as well as the information on downloaded settings for each ID are stored in an encrypted database table.

A2: Digital objects can be reached and read in an open, free and universally implementable way

The Demscore website uses HTTP/1.1 for communication between the browser and the server.

HTTP/1.1 supports persistent connections but does not offer features like multiplexing (which allows multiple concurrent requests over a single connection).

Multiplexing is available in HTTP/2 and HTTP/3.

A3: It is possible to create different user roles and mechanisms for verifying users and controlling access to digital objects, when necessary

The HTTP/1.1 protocol supports secure transmission of credentials or tokens and does not prevent the use of authentication systems.

At no point does the Demscore website require users to log in or provide credentials.

A4: Metadata are accessible even when data are no longer accessible

The metadata available on the Demscore website is updated biannually. The Download ID works across versions, meaning that even if v6 is the current version of Demscore, users can still use a Download ID created with version 3 to retrieve data from version 3. However, the download interface itself only contains the most recent data version.

Older versions of datasets are accessible through the partner Modules’ websites.

Interoperable

I1: Metadata and data are reported according to semantic descriptions that are standardised or generally accepted, documented and made accessible

Demscore itself does not collect data. However, all variables are available via the download interface with their original codebook descriptions.

Demscore standardizes variable names for data processing but provides the original variable tag in the autogenerated reference documents.

The Demscore Methodology Document lays out all merge decisions as well as the general approach to linking and harmonizing data in the infrastructure.

I2: Vocabularies, terminologies or ontologies used are generally accepted and checked, and descriptions of these are accessible

All datasets included in Demscore are published sources with their own citations and DOIs. For each variable, a recommended citation for the original dataset and individual variable (if applicable) is included in the Demscore autogenerated codebooks.

All reference documents include links to the original datasets and codebooks.

The Demscore website gives detailed descriptions of all Modules and their original data sources.

The Demscore Methodology Document provides a detailed, standardized description of all merge and aggregation processes performed for data preparation.

When linking data, Demscore lays out all merge decisions in detail in the Methodology Document, as well as in the publicly accessible code on GitHub.

In addition, the website provides a FAQ section that addresses frequently asked questions regarding the use of Demscore data.

I3: Relationships between different data or metadata are described in a way that enables the context to be understood

If a variable available for download in Demscore is the result of aggregation during the data harmonization and linking process, the exact steps are outlined in the publicly available GitHub code repository as well as described in the codebook entries for each variable.

In addition, Demscore displays all information, such as codebook entries, from the original reference documents in the autogenerated codebooks.

Demscore also provides merge scores indicating how well a variable translates to a chosen download format. The merge scores thus give the user an indication of what to expect from the downloaded data in terms of the number of available observations.
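The exact definition of the merge scores is not given here. Purely as an illustration of the kind of quantity that relates to "number of available observations", the sketch below computes the share of rows in a chosen format for which a variable has a non-missing value. The function name and the formula are assumptions and should not be read as Demscore's merge score.

```python
from typing import Optional, Sequence


def share_of_available_observations(values: Sequence[Optional[float]]) -> float:
    """Fraction of rows that hold a non-missing value (hypothetical measure)."""
    if not values:
        # No rows in the chosen format: nothing is available.
        return 0.0
    available = sum(1 for value in values if value is not None)
    return available / len(values)
```

A value near 1 would mean that almost every row in the chosen format has a value for the variable, while a value near 0 would mean that few observations survive the translation.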

Reusable

R1: Digital objects include different types of contextual descriptions that enable understanding, and determination whether data is suitable for the purpose of the reuse

All variables in the download interface are available with descriptions (codebook entries) for users to check if the variable is suited for their purposes.

If a variable available for download in Demscore is the result of aggregation during the data harmonization and linking process, the exact steps are outlined in the publicly available GitHub code repository as well as described in the codebook entries for each variable.

In addition, Demscore displays all information, such as codebook entries, from the original reference documents in the autogenerated codebooks.

Furthermore, Demscore provides merge scores indicating how well a variable translates to a chosen download format. The merge scores thus give the user an indication of what to expect from the downloaded data in terms of the number of available observations.

R2: Conditions for how metadata or data can be used are stated

The autogenerated codebooks provided with the data download include information on the license of each included data source.

The Demscore GitHub code can be used under the Creative Commons Attribution-ShareAlike 4.0 International License (CC BY-SA 4.0), as stated in both public repositories.

R3: Metadata and data are structured and documented according to the standards and generally accepted formats applicable for the purpose

The work-in-progress data and databases, as well as the live databases that hold download information, are backed up regularly through scheduled backups.