Merge Scores
We provide three merge scores in Demscore:
- The number of non-missing observations in the original Output Unit of the variable.
- The number of non-missing observations in the chosen end Output Unit for that variable.
- For direct translations, the number of observations lost between the variable in its original Output Unit and the variable in the chosen end Output Unit.
Here are a few general guidelines on how to read and interpret the merge scores offered in Demscore:
- If the score for a variable is very high in the original Output Unit but very low in the end Output Unit, and at the same time the score for lost observations is very high, the overlap in identifier combinations between these two Output Units is low.
- If the merge score is high in the original Output Unit but low in the end Output Unit, and at the same time the number of lost observations is low, you have probably chosen a variable that is available for only a few identifier combinations compared with the identifier combinations in the end Output Unit. The end Output Unit nevertheless covers many of the observations from the variable's original Output Unit.
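The two guidelines above can be illustrated with a small sketch. It is not the Demscore implementation: it assumes a direct translation that simply matches identifier combinations (here hypothetical country-year pairs), and the function name `merge_scores`, the toy data, and the two share values are assumptions added for illustration. The shares (count divided by the number of rows in the unit) are included only to make "high" and "low" comparable across units of different size; how Demscore itself scales the scores is not specified here.

```python
def merge_scores(original, end_ids):
    """Toy merge scores for one variable (illustrative assumption only).

    original: dict mapping identifier combination -> value (None = missing)
    end_ids:  list of identifier combinations in the end Output Unit
    """
    end_set = set(end_ids)
    # Direct translation: carry a value over wherever the identifiers match.
    translated = {i: original.get(i) for i in end_ids}

    n_original = sum(v is not None for v in original.values())
    n_end = sum(v is not None for v in translated.values())
    # Lost: non-missing original observations with no matching identifier.
    n_lost = sum(v is not None and i not in end_set for i, v in original.items())

    return {
        "original": n_original,
        "end": n_end,
        "lost": n_lost,
        "original_share": n_original / len(original),
        "end_share": n_end / len(end_ids),
    }


# Hypothetical variable observed for countries A and B, 2000-2002.
original = {
    ("A", 2000): 1.0, ("A", 2001): 2.0, ("A", 2002): 3.0,
    ("B", 2000): 4.0, ("B", 2001): 5.0, ("B", 2002): None,
}

# Guideline 1: the end unit shares only one identifier combination.
low_overlap_ids = [("A", 2000), ("C", 2000), ("C", 2001), ("D", 2000), ("D", 2001)]
low = merge_scores(original, low_overlap_ids)
print(low)  # 5 non-missing originally, 1 in the end unit, 4 lost

# Guideline 2: the end unit contains every original identifier plus many more.
wide_ids = [(c, y) for c in "ABCDEFGHIJ" for y in (2000, 2001, 2002)]
sparse = merge_scores(original, wide_ids)
print(sparse)  # 5 non-missing originally, 5 in the end unit, 0 lost
```

In the first case most observations are lost because the identifier combinations barely overlap. In the second case nothing is lost, but the variable fills only a small share of the much larger end Output Unit.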
The merge scores in Demscore v1 can thus give users a hint of what to expect from the downloaded data. However, we recommend that users also inspect their customised dataset and check which observations matched before deciding whether to use the chosen variable in their analysis.
Information on which identifier combinations "get lost" during a translation from one Output Unit to another can be found here.
Please note that this information is currently available only for selected combinations of Output Units. We add files continuously and, in the future, aim to include in the download file the observations lost for each variable when it is translated to the chosen end Output Unit.