Paper: "From Breaks to Bridges: An Architecture for Survey Redesigns That Expand Conceptual Scope"
Authors: Germán Tessmer (Universidad Nacional de Rosario) and Bárbara Boggiano (Universidad Alberto Hurtado)
Contact: german.tessmer@unr.edu.ar | Project page
This repository contains the complete R pipeline to reproduce all results, figures, and tables in the paper. The pipeline constructs a bridged labour-formality series for Argentina (2016Q4--2025Q3) from EPH microdata, combining deterministic harmonization, a two-factor latent structure, LASSO-selected predictive models (LPM, GLM, SLS), and a hybrid backcasting rule.
The paper's online appendices are included in this repository at appendix/2026_FromBreakstoBridges-OnlineAppendix.pdf.
- R >= 4.5.0
- renv for package management (run `renv::restore()` to install all dependencies from `renv.lock`)
- Hardware: 16 GB RAM recommended; 7+ CPU cores for parallel bootstrap and LOCO cross-validation
- OS: Developed on Windows 11; should work on Linux/macOS with minor path adjustments in `parametros.R`
- EPH Microdata (primary): Pre-processed panel files from the EPH--Observatorio repository. Please cite this dataset as:
  Tessmer, Germán; Moleres, Santiago; Hess, Laureano Elian, 2024, "Encuesta Permanente de Hogares de Argentina (EPH - Observatorio). Explicada, etiquetada y ampliada", https://doi.org/10.57715/UNR/BL85Z8, RDA UNR, V3.
  - DOI: 10.57715/UNR/BL85Z8
  - Place the `.RData` files in the directory specified by `RUTA_BASES` in `script/config/parametros.R` (default: `C:/oes/eph_rdos/capa2/`).
- SIPA Administrative Data: Published by the Secretaría de Trabajo, Argentina.
  - URL: https://www.argentina.gob.ar/trabajo/estadisticas/situacion-y-evolucion-del-trabajo-registrado
  - Used only for external validation (Layer 7, script `10e`).
- INDEC Official Microdata: Original EPH individual and household databases.
No restricted-access or confidential data are used.
- Clone this repository and open the R project.
- Install dependencies: `renv::restore()`
- Edit `script/config/parametros.R`:
  - Set `RUTA_BASES` to the directory containing the EPH `.RData` files.
  - Adjust `N_CORES` if needed (default: auto-detected minus 1).
- Run the full pipeline: `source("00_master_runner.R")`
The master runner executes all layers sequentially. Total runtime is approximately 4--6 hours depending on hardware (the heterofactor estimation in Layer 3 is the bottleneck).
Individual layers can also be run independently by sourcing scripts in order.
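A minimal sketch of running a single layer by hand is shown below. The helper `run_layer()` is hypothetical and not part of the repository. It assumes that the scripts in a layer directory sort into their intended execution order by file name (for example `06a` before `06b`), and that any shared configuration such as `script/config/parametros.R` has already been loaded if the scripts require it.

```r
# Hypothetical helper (not part of the repository): source every .R script
# found under one layer directory, in alphabetical order of path.
# Assumption: alphabetical order matches the intended execution order.
run_layer <- function(layer_dir, dry_run = FALSE) {
  if (!dir.exists(layer_dir)) {
    stop("Layer directory not found: ", layer_dir)
  }
  scripts <- sort(list.files(layer_dir, pattern = "\\.[Rr]$",
                             full.names = TRUE, recursive = TRUE))
  for (s in scripts) {
    message("Running ", s)
    # With dry_run = TRUE the scripts are only listed, not executed.
    if (!dry_run) source(s, echo = FALSE)
  }
  invisible(scripts)
}

# Example usage (commented out so the block runs outside the repository):
# source("script/config/parametros.R")
# run_layer("script/02_proxies", dry_run = TRUE)
```

Using `dry_run = TRUE` first is a cheap way to confirm the order before committing to a long run, since later layers presumably depend on files written by earlier ones.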
| Layer | Directory | Description | Key Scripts |
|---|---|---|---|
| 0 | `script/00_diccionarios/` | Variable dictionaries and crosswalks | `00_diccionarios.R` |
| 1 | `script/01_datos_base/` | Panel construction, taxonomy, income cleaning, ICH | `01` -- `03c` |
| 2 | `script/02_proxies/` | Proxy system for latent factors | `04`, `05` |
| 3 | `script/03_heterofactor/` | Two-factor latent model (FIML) | `06a` -- `06d` |
| 4 | `script/04_modelado/{LPM,GLM,SLS}/` | LASSO model estimation (3 families) | `07a` -- `07e` |
| 5 | `script/05_backcasting/{LPM,GLM,SLS}/` | Historical backcasting and hybrid construction | `08a` -- `08c` |
| 6 | `script/06_comparativo/` | Cross-model comparison | `09a` -- `09c` |
| 7 | `script/07_robusto/` | Robustness checks (16 scripts) | `10a` -- `10o` |
All paths, seeds, temporal parameters, and model suffixes are defined in script/config/parametros.R. Key settings:
- `ANIO_INI` / `TRIM_INI` -- `ANIO_FIN` / `TRIM_FIN`: Panel time range (default: 2016Q4--2025Q3)
- `N_TRIMESTRES_TRAINING`: Number of overlap quarters used for model training (default: 4)
- `SEED_GLOBAL`: Random seed for reproducibility (default: 123)
- `N_CORES`: Parallel workers for bootstrap and cross-validation
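For orientation, these settings might look roughly like the fragment below. This is an illustrative sketch, not the actual contents of `script/config/parametros.R`: the variable names and default values come from this README, while the exact expressions, in particular how `N_CORES` is auto-detected and the floor of one worker, are assumptions.

```r
# Illustrative sketch only; the real parametros.R may be organized differently.

# Directory holding the EPH .RData files (README default shown).
RUTA_BASES <- "C:/oes/eph_rdos/capa2/"

# Panel time range (defaults: 2016Q4 to 2025Q3).
ANIO_INI <- 2016
TRIM_INI <- 4
ANIO_FIN <- 2025
TRIM_FIN <- 3

# Number of overlap quarters used for model training.
N_TRIMESTRES_TRAINING <- 4

# Random seed for reproducibility.
SEED_GLOBAL <- 123

# Parallel workers: detected cores minus one (assumed implementation).
# The floor of 1 and the NA handling are additions for safety, not from the source.
N_CORES <- max(1L, parallel::detectCores() - 1L, na.rm = TRUE)
```

On Linux or macOS, only `RUTA_BASES` should need to change, according to the note on path adjustments above.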
All outputs are written to rdos/ (created automatically):
- `rdos/datos/` -- Processed panel files (`.rds`)
- `rdos/modelos/` -- Fitted models (`.rds`)
- `rdos/contratos/` -- Validation contracts with citable statistics (`.rds`)
- `rdos/reportes/` -- HTML diagnostic reports and CSV summaries
- `rdos/figuras/` -- All figures as PDF files
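After a run, a quick way to confirm that each output directory was populated is sketched below. The function `check_outputs()` is a hypothetical helper, not part of the repository. It only counts files in the subdirectories listed above and makes no assumption about their names or contents.

```r
# Hypothetical helper (not part of the repository): summarize how many files
# each expected output subdirectory contains under the output root.
check_outputs <- function(root = "rdos") {
  subdirs <- c("datos", "modelos", "contratos", "reportes", "figuras")
  paths <- file.path(root, subdirs)
  # list.files() returns an empty vector for a missing directory,
  # so absent directories are reported with zero files.
  n_files <- vapply(paths, function(p) length(list.files(p)), integer(1))
  data.frame(
    directory = subdirs,
    exists = dir.exists(paths),
    n_files = unname(n_files),
    stringsAsFactors = FALSE
  )
}

# Example usage from the project root:
# print(check_outputs("rdos"))
```

Individual `.rds` outputs can then be loaded with base R's `readRDS()`.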
The R scripts are written in Spanish (comments, variable names, diagnostic messages). Each script includes a bilingual header summarizing its purpose, inputs, and outputs in English. The pipeline structure and variable naming follow a consistent convention documented in the headers and in the variable dictionary (rdos/inputs/diccionarios/).
MIT License. See LICENSE for details.