The 'Download by Codebook Section' option allows users to browse and download variables grouped by thematic sections, making it easy to access data on specific topics across multiple datasets. This option is particularly useful for users interested in downloading all relevant variables related to a specific theme without searching each dataset or codebook individually.

1. Select the 'By Codebook Section' Option in the Download Interface

To begin, select the 'By Codebook Section' option in the Download Interface.

2. Search for Keywords and Select Codebook Section

You are now on the 'Generate Dataset by Codebook Section' page, where all variables are organized into thematic sections. This setup allows you to quickly explore topics of interest, drawing from all available datasets.

You can search for keywords related to your topic of interest within the codebook section intros. For example, if your research focuses on conflict, you can type in that keyword. In the table, you can view details about each codebook section, including the dataset name, codebook section name, the number of variables included in that section, and a codebook intro further describing the variables.

Once you have selected all the codebook sections of interest, click the 'Generate Dataset' button at the top left corner or at the bottom of the page.

3. Customize Your Dataset: Output Unit, File Format, Unit Columns, and Empty Rows

Output Unit

Choose an Output Unit based on the selected codebook section(s). The system will suggest only those output units in which all selected variables are available. For example, "QoG Country-Year" means that you will retrieve a dataset with one row per country and year, using QoG country definitions and available year identifiers.

For this example, we will select 'QoG Country-Year'.
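If you want to confirm after downloading that the file really has one row per country and year, a check along these lines may help. It is only a minimal sketch: the column names `country` and `year` are assumptions for illustration, and the identifier columns in your download may be named differently.

```python
from collections import Counter

def duplicate_units(rows, keys=("country", "year")):
    """Return the unit identifiers that appear in more than one row."""
    counts = Counter(tuple(row[k] for k in keys) for row in rows)
    return [unit for unit, n in counts.items() if n > 1]

# Hypothetical rows; column names and values are illustrative only.
sample_rows = [
    {"country": "Sweden", "year": 2000, "var_a": 0.8},
    {"country": "Sweden", "year": 2001, "var_a": 0.9},
    {"country": "Norway", "year": 2000, "var_a": 0.7},
]

# An empty list means every country-year combination occurs exactly once.
duplicates = duplicate_units(sample_rows)
print(duplicates)
```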

File Format

Select a file format for your dataset. Available options include:

  • R (.rds)
  • Stata (.dta)
  • CSV (.csv)

Include Unit Columns

We recommend including Unit Columns to add unique identifiers for each row, which are helpful when merging or comparing datasets later.

Exclude Empty Rows

By default, all rows in the dataset are included, even those with missing data. However, you have the option to exclude empty rows, i.e. rows in which none of your selected variables has a value for the chosen output unit.

If you choose to exclude empty rows, any row that consists entirely of missing values (such as Demscore's default placeholder -11111, or true missing values like NA) will be removed. This can help you create a more streamlined dataset that focuses only on rows with meaningful data, which is especially useful when dealing with sparse variables.
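The empty-row rule described above can be illustrated with a short sketch. This is not the download system's own implementation; it only mimics the rule on rows you have already loaded. The column names and the exact set of missing-value markers (-11111, NA, empty cells, NaN) are assumptions for this example.

```python
import math

PLACEHOLDER = -11111  # placeholder for missing data named in this guide

def is_missing(value):
    """Return True if a cell holds a missing-value marker (assumed set)."""
    if value is None:
        return True
    text = str(value).strip()
    if text in ("", "NA"):
        return True
    try:
        number = float(text)
    except ValueError:
        return False  # non-numeric text counts as real data
    return number == PLACEHOLDER or math.isnan(number)

def drop_empty_rows(rows, unit_columns):
    """Keep rows where at least one non-unit column holds a real value."""
    kept = []
    for row in rows:
        data_values = [v for k, v in row.items() if k not in unit_columns]
        if any(not is_missing(v) for v in data_values):
            kept.append(row)
    return kept

# Hypothetical rows; column names and values are illustrative only.
rows = [
    {"country": "Sweden", "year": 2000, "var_a": 0.8, "var_b": "NA"},
    {"country": "Sweden", "year": 2001, "var_a": -11111, "var_b": "NA"},
    {"country": "Norway", "year": 2000, "var_a": "", "var_b": 3},
]
filtered = drop_empty_rows(rows, unit_columns={"country", "year"})
print(len(filtered))  # 2: only the Sweden 2001 row is entirely missing
```

Note that a row is dropped only when every selected variable is missing; rows with at least one real value are kept even if other cells are empty.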

However, keep in mind that excluding empty rows can make it harder to merge or compare this dataset with others that use the same output unit. If the number of rows differs between datasets, you can no longer align them row by row and will need to merge on the unit identifiers (such as country and year) instead.
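A merge on unit identifiers could look roughly like the following. It is a sketch under stated assumptions: the identifier columns are taken to be `country` and `year`, both datasets are lists of row dictionaries, and the helper name `merge_on_units` is hypothetical. Units present in only one dataset are kept, with `None` filling the columns that dataset lacks.

```python
def merge_on_units(left, right, keys=("country", "year")):
    """Outer-join two lists of row dicts on the unit identifier columns."""
    merged = {}
    columns = []
    for row in list(left) + list(right):
        unit = tuple(row[k] for k in keys)
        merged.setdefault(unit, {}).update(row)
        for name in row:
            if name not in columns:
                columns.append(name)  # remember column order of first appearance
    # Fill columns a unit never received with None.
    return [
        {name: merged[unit].get(name) for name in columns}
        for unit in sorted(merged)
    ]

# Hypothetical datasets with different row counts.
dataset_a = [
    {"country": "Norway", "year": 2000, "var_a": 1.0},
    {"country": "Sweden", "year": 2000, "var_a": 2.0},
]
dataset_b = [
    {"country": "Sweden", "year": 2000, "var_b": 5},
]

combined = merge_on_units(dataset_a, dataset_b)
for row in combined:
    print(row)
```

This is why including Unit Columns matters: without the identifiers there is nothing reliable to join on once the row counts differ.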

Tip: Use this option when working with variables that have limited data points, but be cautious if you plan to merge with other datasets later.

4. Generate Dataset

Once you have finalized your selections, click the 'Generate Dataset' button. Your dataset will be processed and made available for download, along with a customized codebook containing details on the variables you selected.