How to Select Data by Output Unit
1. Select Output Unit of Interest
Navigate to the Output Unit Selection Page. Here, you can select from various Output Unit options, such as:
- Country-Year
- Country-Region
- Cabinet and Party
- Dyad and Conflict
- Date and Event
- Predictions
The Output Unit you select determines how your chosen variables are merged: it sets the key identifiers used to align and integrate the data. Each Output Unit comes with information on its year coverage, its country coverage, and a description of the unit. For example, the Country-Year unit is suited to examining country-level data over time. This guide uses the UCDP Organized Violence Country-Year Output Unit as its running example.
Output Unit: An Output Unit (e.g., QoG Country-Year) is an output format in which variables from one or more datasets can be retrieved through a strictly defined output grid. The unit table that defines this grid has a fixed number of rows, contains unit identifier columns with the u_ prefix, and is sorted by those columns. Each Output Unit also fixes the definitions used at the level of observation, such as country definitions. For example, variables from a QoG dataset may have been collected under QoG country definitions, but in Demscore they can also be retrieved through a V-Dem Output Unit, which follows V-Dem country definitions.
If you are interested in studying organized violence and its relationship to political changes, for instance, you could select this Output Unit and merge it with democracy-related datasets for the same country-year combinations.
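The merge described above can be pictured as a left join onto the unit grid. The following Python sketch is only an illustration of that idea, not Demscore's actual implementation. The column names u_country and u_year, the variable name dem_index, and all values are hypothetical.

```python
# Illustrative only: column names and values are hypothetical, not real data.
UNIT_KEYS = ("u_country", "u_year")

# The unit table defines the fixed output grid: one row per country-year.
unit_table = [
    {"u_country": "A", "u_year": 1990},
    {"u_country": "A", "u_year": 1991},
    {"u_country": "B", "u_year": 1990},
]

# Variables from another dataset, observed for only some country-years.
democracy = [
    {"u_country": "A", "u_year": 1990, "dem_index": 0.42},
    {"u_country": "B", "u_year": 1990, "dem_index": 0.77},
]


def merge_onto_grid(grid, data, keys=UNIT_KEYS):
    """Left-join `data` onto `grid`; the grid's rows and order are preserved."""
    lookup = {tuple(row[k] for k in keys): row for row in data}
    value_cols = sorted({c for row in data for c in row if c not in keys})
    merged = []
    for unit in grid:
        match = lookup.get(tuple(unit[k] for k in keys), {})
        # Units with no matching observation get None (a missing value).
        merged.append({**unit, **{c: match.get(c) for c in value_cols}})
    return merged


merged = merge_onto_grid(unit_table, democracy)
```

Because the grid drives the join, the result always has as many rows as the unit table, and country-years without an observation simply carry a missing value.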
Once you have selected this Output Unit, click the “Click Here to Download Data in the Output Unit ‘UCDP Organized Violence Country Year’” button.
2. Select a File Format
The next step is to choose a file format for your dataset. You can download the data in one of the following formats:
- R (.rds)
- Stata (.dta)
- CSV (.csv)
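If you choose CSV and want to work in Python, the file can be read with the standard library. The sketch below uses an in-memory stand-in for the download. The column names, the values, and the file name my_download.csv in the comment are hypothetical.

```python
import csv
import io

# Stand-in for a downloaded CSV file; the columns and values are made up.
sample_csv = io.StringIO(
    "u_country,u_year,some_variable\n"
    "A,1990,12\n"
    "A,1991,\n"
)


def read_rows(handle):
    """Read a CSV download into a list of dicts, one per row."""
    return list(csv.DictReader(handle))


rows = read_rows(sample_csv)
# For a real file (hypothetical name):
#   with open("my_download.csv", newline="") as f:
#       rows = read_rows(f)
```

Note that the csv module returns every value as a string, so numeric columns need to be converted before analysis.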
3. Customize your Dataset: Unit Columns, Empty Rows, and Countries
Now, you can customize your dataset by adjusting the following options:
Include Unit Columns
We recommend including Unit Columns, as they provide unique identifiers for each row in the dataset. These are helpful when merging or comparing datasets in the future.
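One way to see why unit columns matter is to check that their combination identifies each row exactly once. The sketch below assumes, as described in the Output Unit definition, that unit identifier columns carry the u_ prefix. The specific column names and values are hypothetical.

```python
from collections import Counter


def duplicate_units(rows, prefix="u_"):
    """Return unit-identifier combinations that occur in more than one row."""
    counts = Counter(
        tuple(sorted((k, v) for k, v in row.items() if k.startswith(prefix)))
        for row in rows
    )
    return [dict(key) for key, n in counts.items() if n > 1]


# Hypothetical rows: each country-year appears exactly once.
rows = [
    {"u_country": "A", "u_year": 1990, "x": 1},
    {"u_country": "A", "u_year": 1991, "x": 2},
]
duplicates = duplicate_units(rows)  # empty list: identifiers are unique
```

An empty result means the unit columns can safely serve as merge keys.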
Exclude Empty Rows
By default, all rows in the dataset are included, even those with missing data. However, you have the option to exclude empty rows, which can be useful if the variables you have selected have very few observations in your chosen Output Unit.
If you opt to exclude empty rows, any row that consists entirely of missing values (such as Demscore’s default placeholder -11111 or true missing values like NA) will be removed. This gives you a more compact dataset containing only rows with at least one observed value, which is especially useful when dealing with sparse variables.
However, keep in mind that excluding empty rows makes it harder to merge or compare this dataset with others that use the same Output Unit. If the number of rows differs between datasets, the rows can no longer be matched by position, and you will need to merge on the unit identifiers (such as country and year) instead.
Tip: Use this option when working with variables that have limited data points, but be cautious if you plan to merge with other datasets later.
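The effect of this option can be approximated locally. The sketch below is an assumption about the behavior, not Demscore's code: it treats -11111, None, empty strings, and "NA" as missing, and it ignores the unit columns (assumed to carry the u_ prefix) when deciding whether a row is empty. All column names and values are hypothetical.

```python
# Values treated as missing in this sketch: the -11111 placeholder named above,
# plus None, empty strings, and "NA" as stand-ins for true missing values.
MISSING = {None, "", "NA", -11111, "-11111"}


def is_empty(row, unit_prefix="u_"):
    """True if every non-unit column in the row holds a missing value."""
    values = [v for k, v in row.items() if not k.startswith(unit_prefix)]
    return all(v in MISSING for v in values)


def drop_empty_rows(rows):
    """Keep only rows with at least one observed (non-missing) value."""
    return [row for row in rows if not is_empty(row)]


rows = [
    {"u_country": "A", "u_year": 1990, "x": 5, "y": -11111},
    {"u_country": "A", "u_year": 1991, "x": -11111, "y": None},
]
kept = drop_empty_rows(rows)  # only the 1990 row has an observed value
```

The first row survives because x is observed, even though y is missing. Only rows in which every selected variable is missing are dropped.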
Include All Countries
You can choose to include data for all countries or select specific ones. If you wish to focus on certain countries, uncheck the “Include all countries” box, and manually search for the countries you want to include. In this example, we will select all countries.
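If you download all countries and later decide to narrow the selection, the same filter is easy to apply afterwards. This is a minimal sketch. The column name u_country and the country labels are hypothetical.

```python
def select_countries(rows, countries, country_col="u_country"):
    """Keep only rows whose country identifier is in `countries`."""
    wanted = set(countries)
    return [row for row in rows if row[country_col] in wanted]


# Hypothetical rows for three countries.
rows = [
    {"u_country": "A", "u_year": 1990},
    {"u_country": "B", "u_year": 1990},
    {"u_country": "C", "u_year": 1990},
]
subset = select_countries(rows, ["A", "C"])
```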
4. Select Variables
Next, select the variables for your dataset. Each variable is displayed with a label and its corresponding Demscore internal long tag (in parentheses), which indicates the dataset from which the variable originates.
For example, variables with the tag “ucdp_orgv_cy” come from the UCDP Country-Year Dataset on Organized Violence within Country Borders. Choose variables that align with your research focus.
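Once the data is downloaded, the tags can help you keep track of which columns come from which dataset. The sketch below assumes that each column name begins with its dataset tag followed by an underscore. That naming pattern, the tag other_ds, and the variable names are assumptions for illustration; only the tag ucdp_orgv_cy is taken from the text above.

```python
def group_by_dataset(columns, dataset_tags):
    """Map each dataset tag to the columns whose names start with that tag."""
    groups = {tag: [] for tag in dataset_tags}
    # Try longer tags first so a tag that is a prefix of another
    # cannot claim the other tag's columns.
    ordered = sorted(dataset_tags, key=len, reverse=True)
    for col in columns:
        for tag in ordered:
            if col.startswith(tag + "_"):
                groups[tag].append(col)
                break
    return groups


# Hypothetical column names; unit columns match no dataset tag.
columns = [
    "u_country",
    "u_year",
    "ucdp_orgv_cy_example_var",
    "other_ds_example_var",
]
groups = group_by_dataset(columns, ["ucdp_orgv_cy", "other_ds"])
```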
In our example, we will select variables related to organized violence and democracy. Additionally, we will include control variables, such as GDP and population size, to account for socioeconomic factors.
5. Select Year Range
Now, choose the year range for your dataset. You can either select a specific time period or include the entire range of years available for your chosen Output Unit. In our example, we will select the year range from 1990 to 2022, as we want to investigate the post-Cold War period up to the present.
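The year range acts as an inclusive filter on the year identifier. The sketch below shows the same restriction applied to already downloaded rows. The column name u_year and the rows are hypothetical.

```python
def within_years(rows, start, end, year_col="u_year"):
    """Keep rows whose year falls in the inclusive range [start, end]."""
    return [row for row in rows if start <= int(row[year_col]) <= end]


# Hypothetical rows around the boundaries of the example range.
rows = [
    {"u_country": "A", "u_year": 1989},
    {"u_country": "A", "u_year": 1990},
    {"u_country": "A", "u_year": 2022},
    {"u_country": "A", "u_year": 2023},
]
post_cold_war = within_years(rows, 1990, 2022)
```

The int() conversion lets the same function work when years arrive as strings, as they do when read from a CSV file.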
6. Generate Dataset
Once you have finalized your selections, click the “Generate Dataset” button. Your dataset will be processed and made available for download, along with a customized codebook that provides details on the variables you selected.