Causal Inference-Based Covariate Selection for Binary Variables Via the Linear Probability Model

Abstract

Understanding causal mechanisms in observational data is a challenging task, and selecting covariates is crucial for de-confounding the causal relation of interest. The non-Gaussian Forward Selection (nGFS) algorithm selects eligible covariates for consistently estimating a causal effect between continuous variables; this study extends it to the binary variable case. Given that many constructs in the educational sciences are categorical in nature, we systematically investigated the capability of nGFS to handle binary data via the linear probability model. Comprehensive Monte Carlo simulation experiments were used to assess the algorithm's effectiveness in providing unbiased estimates of causal effects with binary predictors and/or binary outcomes. Results indicate that nGFS maintains robust covariate selection performance with a binary focal predictor and a continuous outcome when sample sizes are relatively large (e.g., n > 500). The nGFS based on the linear probability model, however, is not suited to covariate selection in the context of binary outcomes. An empirical data example from education research demonstrates the application of nGFS to observational data. Overall, the findings highlight the utility of nGFS in accurately identifying relevant covariates and estimating causal effects in data scenarios prevalent in educational research.
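
For readers unfamiliar with the linear probability model, the following minimal sketch may help. It is not the nGFS algorithm and is not taken from the article. It only illustrates, under assumed and purely hypothetical simulation settings (the coefficients, the sample size, the uniform confounder, and the helper name lpm_fit are all invented for illustration), how ordinary least squares applied to a 0/1 outcome yields coefficients that read as probability differences, and how omitting a confounding covariate biases the estimated effect of a binary focal predictor.

```python
import numpy as np


def lpm_fit(y, *predictors):
    """Least-squares fit of y on an intercept plus the given predictors.

    With a 0/1 outcome this is the linear probability model: each slope
    is read as a change in P(y = 1) per unit change in the predictor.
    (Hypothetical helper, not part of any published nGFS code.)
    """
    X = np.column_stack([np.ones(len(y)), *predictors]).astype(float)
    coef, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    return coef


# Assumed data-generating process (illustrative values only).
rng = np.random.default_rng(0)
n = 200_000
z = rng.uniform(0.0, 1.0, n)                    # non-Gaussian confounder
x = rng.binomial(1, 0.2 + 0.6 * z)              # binary focal predictor
y = rng.binomial(1, 0.1 + 0.3 * x + 0.4 * z)    # binary outcome, true effect 0.3

unadjusted = lpm_fit(y, x)[1]      # confounder omitted
adjusted = lpm_fit(y, x, z)[1]     # confounder included

print(f"unadjusted effect of x: {unadjusted:.3f}")
print(f"adjusted effect of x:   {adjusted:.3f}")
```

In this assumed setup the true probabilities are linear in x and z and stay within [0, 1], so the adjusted slope should land close to 0.3 while the unadjusted slope overstates it. The sketch says nothing about how nGFS decides which covariates to include.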

Department(s)

Psychological Science

Keywords and Phrases

binary variable; causal inference; covariate selection; linear probability model; observational data

International Standard Serial Number (ISSN)

1940-0683; 0022-0973

Document Type

Article - Journal

Document Version

Citation

File Type

text

Language(s)

English

Rights

© 2026 Taylor and Francis Group; Routledge, All rights reserved.

Publication Date

01 Jan 2025
