An entry is a collection of variables.
An entry class is preceded by one or more pound signs #, in the form of:
    # <entry class>
    - <key 1>: <value 1>
    - <key 2>: <value 2>
    ...
    - <key n>: <value n>
In previous examples, # Example is a Level-1 Entry of class Example, as denoted by the single leading pound sign.
All REAM files start with a Level-1 Entry, and contain exactly one Level-1 Entry.
Entries are useful when describing an object with multiple attributes:
    # Country
    - name: Belgium
    - capital: Brussels
    - population: 11433256
    - euro_zone: TRUE
Here we define an object of the Country class, whose name is Belgium, capital is Brussels, population is 11433256, and which is part of the euro zone.
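To make the structure of an entry concrete, here is a minimal Python sketch that reads a single, un-nested entry into its level, class, and variables. It is not the REAM parser: `parse_entry` is a hypothetical helper, it assumes the header and each variable sit on their own line, and it keeps every value as a plain string.

```python
def parse_entry(text):
    """Parse one un-nested entry into (level, class_name, variables).

    Hypothetical helper for illustration only; not part of REAM.
    """
    level, class_name, variables = 0, None, {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            # The number of leading pound signs gives the entry level.
            level = len(line) - len(line.lstrip("#"))
            class_name = line[level:].strip()
        elif line.startswith("- "):
            # Split "key: value" at the first colon only.
            key, _, value = line[2:].partition(":")
            variables[key.strip()] = value.strip()
    return level, class_name, variables


source = """\
# Country
- name: Belgium
- capital: Brussels
- population: 11433256
- euro_zone: TRUE
"""

level, class_name, variables = parse_entry(source)
print(level, class_name)
print(variables)
```

Values are left untyped here; whether `11433256` becomes a number or `TRUE` a boolean is a decision for the real toolchain, not something this sketch settles.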
Let's annotate the dataset:
    # Country
    - name: Belgium
      > short for the Kingdom of Belgium
    - capital: Brussels
    - population: 11433256
      > data from 2019; retrieved from World Bank
    - euro_zone: TRUE
      > joined in 1999
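One way to picture annotations is as notes attached to the variable they follow. The sketch below is an illustration under assumptions, not REAM's implementation: `parse_annotated` is a hypothetical helper, and it assumes each annotation is written on its own line starting with `>` directly after the variable it describes.

```python
def parse_annotated(text):
    """Map each key to {"value": ..., "note": ...}.

    Hypothetical helper; assumes one variable or annotation per line.
    """
    variables = {}
    last_key = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("- "):
            key, _, value = line[2:].partition(":")
            last_key = key.strip()
            variables[last_key] = {"value": value.strip(), "note": None}
        elif line.startswith(">") and last_key is not None:
            # Attach the annotation to the most recent variable.
            variables[last_key]["note"] = line[1:].strip()
    return variables


source = """\
# Country
- name: Belgium
  > short for the Kingdom of Belgium
- capital: Brussels
- population: 11433256
  > data from 2019; retrieved from World Bank
- euro_zone: TRUE
  > joined in 1999
"""

annotated = parse_annotated(source)
for key, item in annotated.items():
    print(key, item["value"], item["note"])
```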
Entries can have zero variables, in which case the entry consists of its class line alone (for example, # Country).
Keys should be unique within an entry. The following code is meant to raise an error:
    # Country
    - name: Belgium
    - language: Dutch
    - language: French
    - language: German
The current parser doesn't check for duplicate keys yet, so technically this is still valid. This rule will be enforced in future versions.
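Until the parser enforces this rule, a duplicate-key check is easy to run by hand. The following is a minimal sketch, assuming one variable per line; `duplicate_keys` is a hypothetical helper and not part of REAM.

```python
from collections import Counter


def duplicate_keys(text):
    """Return the keys that appear more than once in a single entry.

    Hypothetical helper; assumes one "- key: value" variable per line.
    """
    keys = [
        line.strip()[2:].partition(":")[0].strip()
        for line in text.splitlines()
        if line.strip().startswith("- ")
    ]
    counts = Counter(keys)
    return sorted(key for key, n in counts.items() if n > 1)


source = """\
# Country
- name: Belgium
- language: Dutch
- language: French
- language: German
"""

dupes = duplicate_keys(source)
print(dupes)
```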
Entries can be nested, and the level of an entry is denoted by the number of leading pound signs #.
So a Level-1 Entry takes the form of # <Level 1 Entry Class>, a Level-2 Entry takes the form of ## <Level 2 Entry Class>, and so forth.
    # Country
    - name: Belgium
    ## Language
    - name: Dutch
    ## Language
    - name: French
    ## Language
    - name: German
The # Country entry has one variable, name, and three Level-2 child entries.
The ## Language subentries are also terminal nodes, as they do not contain any subentries.
When compiling the dataset, the parser looks for all terminal nodes in the REAM file and flattens the data structure.
Thus the previous example produces a dataset with three rows (one for each terminal node) and two columns (one for each variable).
Note that variable keys are scoped, so ## Language is allowed to have a variable with the key name even though its parent entry # Country also contains a variable with the same key.
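The flattening step can be sketched in a few lines of Python. This is an illustration under assumptions rather than the actual compiler: `parse` and `flatten` are hypothetical helpers, the input is assumed to have one header or variable per line, and the `Class.key` column names are an invented convention used here only to keep the two scoped `name` keys apart.

```python
def parse(text):
    """Build a tree of entries from level-marked headers (hypothetical helper)."""
    root, stack = None, []
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("#"):
            level = len(line) - len(line.lstrip("#"))
            entry = {"class": line[level:].strip(), "level": level,
                     "vars": {}, "children": []}
            # Close every open entry at the same or a deeper level;
            # whatever remains on top of the stack is the parent.
            while stack and stack[-1]["level"] >= level:
                stack.pop()
            if stack:
                stack[-1]["children"].append(entry)
            else:
                root = entry
            stack.append(entry)
        elif line.startswith("- ") and stack:
            key, _, value = line[2:].partition(":")
            stack[-1]["vars"][key.strip()] = value.strip()
    return root


def flatten(entry, inherited=None):
    """Return one row per terminal node, carrying ancestor variables along."""
    row = dict(inherited or {})
    prefix = entry["class"]
    for key, value in entry["vars"].items():
        row[prefix + "." + key] = value  # assumed column naming
    if not entry["children"]:
        return [row]  # terminal node: emit a row
    rows = []
    for child in entry["children"]:
        rows.extend(flatten(child, row))
    return rows


source = """\
# Country
- name: Belgium
## Language
- name: Dutch
## Language
- name: French
## Language
- name: German
"""

rows = flatten(parse(source))
for row in rows:
    print(row)
```

Each terminal `## Language` node yields one row, and the parent's `name` is repeated on every row, which gives the three rows and two columns described above.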
Entries must be nested in order. Level-2 Entries can only be nested in a Level-1 Entry, Level-3 Entries can only be nested in a Level-2 Entry, and so forth. Compare the datasets compiled from the following two examples with the previous one:
    # Country
    - name: Belgium
    ## Language
    - name: Dutch
      > This is in a Level 2 Entry
    ### Language
    - name: French
      > This is in a Level 3 Entry
    ### Language
    - name: German
      > This is in a Level 3 Entry

    # Country
    - name: Belgium
    ## Language
    - name: Dutch
      > This is in a Level 2 Entry
    ## Language
    - name: French
      > This is in a Level 2 Entry
    ### Language
    > This is in a Level 3 Entry
    - name: German
A visualization of the differences between the three schemas is as follows. The terminal nodes are colored yellow.
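The terminal nodes of the three schemas can also be worked out from the entry levels alone: an entry is terminal when the entry that follows it is not nested deeper. The sketch below uses hypothetical helpers (`terminal_nodes`, `nested_in_order`) and represents each schema as a list of `(level, label)` pairs, with the language name standing in for each Language entry.

```python
def terminal_nodes(entries):
    """Return labels of entries that have no subentry.

    `entries` is a list of (level, label) pairs in file order.
    """
    result = []
    for i, (level, label) in enumerate(entries):
        is_last = i == len(entries) - 1
        # The next entry is a child only if it sits at a deeper level.
        if is_last or entries[i + 1][0] <= level:
            result.append(label)
    return result


def nested_in_order(entries):
    """Check that the file starts at Level 1 and never skips a level going down."""
    if not entries or entries[0][0] != 1:
        return False
    return all(b[0] <= a[0] + 1 for a, b in zip(entries, entries[1:]))


schemas = {
    "all Level 2": [(1, "Country"), (2, "Dutch"), (2, "French"), (2, "German")],
    "two Level 3 under Dutch": [(1, "Country"), (2, "Dutch"), (3, "French"), (3, "German")],
    "one Level 3 under French": [(1, "Country"), (2, "Dutch"), (2, "French"), (3, "German")],
}

for title, entries in schemas.items():
    print(title, nested_in_order(entries), terminal_nodes(entries))
```

Under this rule the first schema has three terminal nodes and the other two have two each, so the compiled datasets differ in their number of rows.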
An entry can contain subentries of different classes:
    # Country
    - name: Belgium
    ## City
    - name: Brussels
    ## Language
    - name: Dutch
Also, entries of the same class need not have identical variables, nor the same variable order.
    # Country
    - name: Belgium
    ## Language
    - name: Dutch
    - size: 0.59
    ## Language
    - size: 0.4
    - name: French
    ## Language
    - name: German
Observe that the order of the variables is preserved by default.
The datasets compiled by the last two examples are not too useful for analysis. To compile quality analysis-ready datasets, we should specify the schema of the datasets in the codebook (not yet implemented).