Regression Discontinuity Design (RDD)
Regression Discontinuity Design (RDD) is a quasi-experimental method that exploits arbitrary cutoffs in treatment assignment rules to identify causal effects. RDD compares units just above and below a threshold to estimate local treatment effects.
When to Use RDD
Ideal Conditions:
- Treatment assignment is determined by a continuous variable (running variable) crossing a threshold
- Assignment rule is strictly enforced and known
- Units cannot precisely manipulate the running variable around the cutoff
- Sufficient observations near the cutoff
Common Applications:
- Educational interventions (test score cutoffs for remedial programs)
- Financial aid eligibility (income thresholds)
- Policy interventions (age cutoffs, geographic boundaries)
- Medical treatments (clinical thresholds for treatment)
- Electoral systems (vote share thresholds)
Not Suitable When:
- Treatment assignment is not based on a clear cutoff
- Running variable can be easily manipulated
- Insufficient observations near the cutoff
- Multiple simultaneous cutoffs exist
Theoretical Background
The RDD Framework
Basic Setup:
- Running Variable (X): Continuous variable determining treatment assignment
- Cutoff (c): Threshold value where treatment assignment changes
- Treatment Assignment: :math:`D_i = 1` if :math:`X_i \geq c`, :math:`D_i = 0` if :math:`X_i < c`
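Sharp RDD: Treatment assignment is a deterministic function of the running variable: :math:`D_i = \mathbf{1}(X_i \geq c)`
Fuzzy RDD: Treatment probability changes discontinuously at the cutoff, but assignment is not deterministic: :math:`P(D_i = 1 | X_i = x) = g_1(x)` if :math:`x \geq c`, and :math:`g_0(x)` if :math:`x < c`
Where :math:`g_1(c) \neq g_0(c)`.
RDD Estimand: The treatment effect at the cutoff: :math:`\tau = E[Y_i(1) - Y_i(0) | X_i = c]`
Sharp RDD Estimation: :math:`\tau = \lim_{x \to c^+} E[Y_i | X_i = x] - \lim_{x \to c^-} E[Y_i | X_i = x]`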
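The sharp estimate can be illustrated with a small standalone sketch. It is not the Causal Agent implementation: the helper names (`fit_line`, `sharp_rdd_estimate`), the simulated data, and the choice of a uniform kernel with a fixed bandwidth are all assumptions made for illustration. It fits a separate straight line on each side of the cutoff and takes the difference of the two fitted values at the cutoff.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def sharp_rdd_estimate(x, y, cutoff, bandwidth):
    """Difference between the two one-sided fitted values at the cutoff."""
    # Centre the running variable so each intercept is the fitted value at the cutoff.
    left = [(xi - cutoff, yi) for xi, yi in zip(x, y) if cutoff - bandwidth <= xi < cutoff]
    right = [(xi - cutoff, yi) for xi, yi in zip(x, y) if cutoff <= xi <= cutoff + bandwidth]
    left_at_cutoff, _ = fit_line([p[0] for p in left], [p[1] for p in left])
    right_at_cutoff, _ = fit_line([p[0] for p in right], [p[1] for p in right])
    return right_at_cutoff - left_at_cutoff


# Simulated data (illustrative assumption): linear trend plus a jump of 5 at a score of 70.
rng = random.Random(0)
true_effect = 5.0
x = [rng.uniform(40, 100) for _ in range(2000)]
y = [20 + 0.5 * xi + true_effect * (xi >= 70) + rng.gauss(0, 0.5) for xi in x]

tau_hat = sharp_rdd_estimate(x, y, cutoff=70, bandwidth=10)
print(f"Estimated jump at the cutoff: {tau_hat:.2f} (true value {true_effect})")
```

Only observations within the bandwidth enter either fit, which is why the estimate is local to the cutoff.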
Key Assumptions
Continuity of Potential Outcomes
Definition: Potential outcomes are continuous at the cutoff in the absence of treatment.
Mathematical: :math:`\lim_{x \to c^+} E[Y_i(0) | X_i = x] = \lim_{x \to c^-} E[Y_i(0) | X_i = x]`
Why it matters: This is the core identifying assumption that allows causal interpretation.
Testing: The assumption cannot be tested directly. As indirect evidence, check for discontinuities in predetermined covariates at the cutoff.
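A short derivation may help show why continuity gives the jump in the observed outcome a causal reading. It covers the sharp design only and additionally assumes that :math:`E[Y_i(1) | X_i = x]` is continuous at the cutoff, which goes slightly beyond the condition stated above. Here :math:`\tau` denotes the treatment effect at the cutoff.

```latex
\begin{aligned}
\lim_{x \to c^+} E[Y_i | X_i = x] - \lim_{x \to c^-} E[Y_i | X_i = x]
  &= \lim_{x \to c^+} E[Y_i(1) | X_i = x] - \lim_{x \to c^-} E[Y_i(0) | X_i = x] \\
  &= E[Y_i(1) | X_i = c] - E[Y_i(0) | X_i = c] \\
  &= E[Y_i(1) - Y_i(0) | X_i = c] = \tau
\end{aligned}
```

The first equality uses the sharp assignment rule (all units above the cutoff are treated, none below). The second uses continuity of both conditional expectations at the cutoff.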
No Precise Manipulation
Definition: Units cannot precisely control their value of the running variable around the cutoff.
Why it matters: If units can manipulate assignment, selection bias is reintroduced.
Testing: Apply the McCrary density test, which checks for a discontinuity in the density of the running variable at the cutoff.
No Other Discontinuities
Definition: No other treatments or interventions change discontinuously at the same cutoff.
Why it matters: Other discontinuous changes would confound the treatment effect.
Testing: Examine institutional rules and policy changes around the cutoff.
Types of RDD
Fuzzy RDD
Characteristics:
- Treatment probability changes at cutoff but assignment is not deterministic
- Some units above cutoff don’t receive treatment (non-compliance)
- Some units below cutoff receive treatment (always-takers)
Estimation:
- Use instrumental variables approach
- Running variable above/below cutoff as instrument for treatment
- Estimates Local Average Treatment Effect (LATE) for compliers
Two-Stage Approach:
- First Stage: :math:`D_i = \alpha_0 + \alpha_1 \mathbf{1}(X_i \geq c) + f(X_i) + \epsilon_i`
- Second Stage: :math:`Y_i = \beta_0 + \tau \hat{D}_i + g(X_i) + u_i`
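The instrumental variables logic can be sketched in its ratio form: the jump in the outcome at the cutoff divided by the jump in treatment take-up. The snippet below is a hedged illustration, not the library's estimator. The helpers (`fit_line`, `jump_at_cutoff`), the simulated compliance rates of 0.2 and 0.8, and the constant effect of 4 are assumptions chosen for the example.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def jump_at_cutoff(x, values, cutoff, bandwidth):
    """Local linear estimate of the discontinuity in `values` at the cutoff."""
    left = [(xi - cutoff, vi) for xi, vi in zip(x, values) if cutoff - bandwidth <= xi < cutoff]
    right = [(xi - cutoff, vi) for xi, vi in zip(x, values) if cutoff <= xi <= cutoff + bandwidth]
    left_fit, _ = fit_line([p[0] for p in left], [p[1] for p in left])
    right_fit, _ = fit_line([p[0] for p in right], [p[1] for p in right])
    return right_fit - left_fit


rng = random.Random(1)
true_late = 4.0
x = [rng.uniform(40, 100) for _ in range(6000)]
# Imperfect compliance: take-up probability jumps from 0.2 to 0.8 at the cutoff.
d = [1 if rng.random() < (0.8 if xi >= 70 else 0.2) else 0 for xi in x]
y = [20 + 0.5 * xi + true_late * di + rng.gauss(0, 0.5) for xi, di in zip(x, d)]

first_stage_jump = jump_at_cutoff(x, d, cutoff=70, bandwidth=10)   # jump in take-up
reduced_form_jump = jump_at_cutoff(x, y, cutoff=70, bandwidth=10)  # jump in outcome
late_hat = reduced_form_jump / first_stage_jump

print(f"First-stage jump: {first_stage_jump:.2f}")
print(f"LATE estimate: {late_hat:.2f} (true value {true_late})")
```

A first-stage jump close to zero makes the ratio unstable, so the size of that jump is worth reporting alongside the LATE.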
Implementation in Causal Agent
Fuzzy RDD Analysis
# Fuzzy RDD with imperfect compliance
result = agent.analyze(
    data=rdd_data,
    treatment='actually_treated',  # actual treatment received
    outcome='test_score',
    running_var='prior_score',
    cutoff_value=70,
    method='fuzzy_rdd'
)
print(f"LATE Estimate: {result.late}")
print(f"First-stage jump: {result.first_stage_jump}")
Bandwidth Selection
# Custom bandwidth selection
result = agent.analyze(
    data=rdd_data,
    treatment='above_cutoff',
    outcome='test_score',
    running_var='prior_score',
    cutoff_value=70,
    bandwidth_method='optimal',  # or 'cross_validation', 'rule_of_thumb'
    bandwidth_value=5.0  # manual bandwidth
)
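To give a feel for what a cross-validation bandwidth choice involves, here is a minimal sketch in the spirit of the procedure described in Imbens & Lemieux (2008). It is not a description of what `bandwidth_method='cross_validation'` does internally. Each point near the cutoff is predicted from a straight line fitted only to neighbours on its far side, which mimics estimation at a boundary. The helper names, the candidate grid, the scoring window, and the simulated data are assumptions for illustration.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def cv_error(x, y, cutoff, h, window):
    """Mean squared one-sided prediction error for bandwidth h."""
    data = list(zip(x, y))
    squared_errors = []
    for xi, yi in data:
        if abs(xi - cutoff) > window:
            continue  # only score points close to the cutoff
        if xi < cutoff:
            # Predict from neighbours strictly to the left, as in a boundary fit.
            neighbours = [(xj, yj) for xj, yj in data if xi - h <= xj < xi]
        else:
            # Predict from neighbours strictly to the right.
            neighbours = [(xj, yj) for xj, yj in data if xi < xj <= xi + h]
        if len(neighbours) < 3:
            continue  # too few points for a stable line
        a, b = fit_line([p[0] for p in neighbours], [p[1] for p in neighbours])
        squared_errors.append((yi - (a + b * xi)) ** 2)
    return sum(squared_errors) / len(squared_errors)


# Simulated data (illustrative assumption): curved trend with a jump of 5 at 70.
rng = random.Random(3)
x = [rng.uniform(40, 100) for _ in range(600)]
y = [20 + 0.5 * (xi - 70) + 0.01 * (xi - 70) ** 2 + 5.0 * (xi >= 70) + rng.gauss(0, 1.0)
     for xi in x]

candidates = [2, 5, 10, 15]
cv_scores = {h: cv_error(x, y, cutoff=70, h=h, window=5) for h in candidates}
best_h = min(cv_scores, key=cv_scores.get)

for h, score in cv_scores.items():
    print(f"Bandwidth {h}: CV error = {score:.3f}")
print(f"Selected bandwidth: {best_h}")
```

Small bandwidths give noisy predictions and large ones pick up curvature, so the selected value reflects the bias-variance tradeoff on this particular sample.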
Diagnostic Tests and Validation
Manipulation Testing
Test whether units can precisely manipulate the running variable:
# McCrary density test
manipulation_test = agent.mccrary_test(
    data=rdd_data,
    running_var='prior_score',
    cutoff_value=70
)
print(f"McCrary test p-value: {manipulation_test.p_value}")
print(f"Density discontinuity: {manipulation_test.discontinuity}")
What to look for:
- Non-significant p-value (no evidence of manipulation)
- Smooth density around the cutoff
- No unusual bunching just above or below cutoff
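As a rough complement to a formal density test, bunching can be screened by counting observations in a narrow window on each side of the cutoff and asking whether the split is compatible with a 50/50 draw. This is a simpler binomial comparison, not the McCrary test, and it assumes the density is roughly flat inside the window. The function names, the window width, and the synthetic scores are illustrative assumptions.

```python
import math


def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value: total probability of outcomes no more likely than k."""
    pmf = [math.comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(n + 1)]
    threshold = pmf[k] * (1 + 1e-9)  # small tolerance for floating-point ties
    return min(1.0, sum(q for q in pmf if q <= threshold))


def density_check(scores, cutoff, window):
    """Count observations just below and just above the cutoff and test for imbalance."""
    below = sum(1 for s in scores if cutoff - window <= s < cutoff)
    above = sum(1 for s in scores if cutoff <= s < cutoff + window)
    return below, above, binomial_two_sided_p(above, below + above)


# Evenly spaced synthetic scores between 40 and 100 (no manipulation by construction).
clean = [40 + 60 * i / 2000 for i in range(2000)]
# Hypothetical manipulation: everyone scoring in [69, 70) is bumped up by one point.
manipulated = [s + 1 if 69 <= s < 70 else s for s in clean]

below_c, above_c, p_clean = density_check(clean, cutoff=70, window=2)
below_m, above_m, p_manip = density_check(manipulated, cutoff=70, window=2)

print(f"Clean:       {below_c} below, {above_c} above, p = {p_clean:.3f}")
print(f"Manipulated: {below_m} below, {above_m} above, p = {p_manip:.3g}")
```

A small p-value here is a prompt to look more closely, for example with a histogram and a formal density test, rather than proof of manipulation.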
Covariate Balance Testing
Check for discontinuities in predetermined characteristics:
# Test balance of covariates at cutoff
balance_test = agent.covariate_balance_rdd(
    data=rdd_data,
    covariates=['age', 'gender', 'socioeconomic_status'],
    running_var='prior_score',
    cutoff_value=70
)
print("Covariate balance results:")
for var, result in balance_test.items():
    print(f"{var}: discontinuity = {result.discontinuity:.3f}, p = {result.p_value:.3f}")
Interpretation:
- Non-significant discontinuities support validity
- Significant jumps suggest potential confounding
- Pattern of imbalances may indicate manipulation
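Conceptually, a covariate balance check reuses the RDD estimator with each predetermined covariate in place of the outcome. The sketch below shows that idea on simulated covariates that vary smoothly with the score, so the estimated jumps should be near zero. The helper names, the covariates `age` and `socioeconomic_index`, and the data are assumptions, and the sketch reports point estimates only, without standard errors or p-values.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def jump_at_cutoff(x, values, cutoff, bandwidth):
    """Local linear estimate of the discontinuity in `values` at the cutoff."""
    left = [(xi - cutoff, vi) for xi, vi in zip(x, values) if cutoff - bandwidth <= xi < cutoff]
    right = [(xi - cutoff, vi) for xi, vi in zip(x, values) if cutoff <= xi <= cutoff + bandwidth]
    left_fit, _ = fit_line([p[0] for p in left], [p[1] for p in left])
    right_fit, _ = fit_line([p[0] for p in right], [p[1] for p in right])
    return right_fit - left_fit


rng = random.Random(2)
x = [rng.uniform(40, 100) for _ in range(2000)]
covariates = {
    # Simulated predetermined characteristics with no jump at the cutoff.
    "age": [15 + 0.02 * xi + rng.gauss(0, 0.5) for xi in x],
    "socioeconomic_index": [0.01 * xi + rng.gauss(0, 0.5) for xi in x],
}

# Treat each covariate as a placebo outcome and estimate its jump at the cutoff.
balance = {name: jump_at_cutoff(x, values, cutoff=70, bandwidth=10)
           for name, values in covariates.items()}

for name, jump in balance.items():
    print(f"{name}: estimated discontinuity = {jump:.3f}")
```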
Bandwidth Sensitivity
Test robustness to bandwidth choice:
# Sensitivity to bandwidth selection
bandwidth_sensitivity = agent.bandwidth_sensitivity(
    data=rdd_data,
    treatment='above_cutoff',
    outcome='test_score',
    running_var='prior_score',
    cutoff_value=70,
    bandwidth_range=[2, 3, 4, 5, 6, 7, 8]
)
print("Bandwidth sensitivity results:")
for bw, estimate in bandwidth_sensitivity.items():
    print(f"Bandwidth {bw}: Effect = {estimate.effect:.3f} (SE = {estimate.se:.3f})")
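The reason bandwidth matters can be seen in a small simulation: when the true relationship is curved, a wide bandwidth forces a straight line through the curvature and biases the estimated jump, while a narrow bandwidth is less biased but noisier. The sketch below is illustrative only. The cubic data-generating process, the bandwidth grid, and the helper names are assumptions, not part of the library.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def jump_at_cutoff(x, values, cutoff, bandwidth):
    """Local linear estimate of the discontinuity in `values` at the cutoff."""
    left = [(xi - cutoff, vi) for xi, vi in zip(x, values) if cutoff - bandwidth <= xi < cutoff]
    right = [(xi - cutoff, vi) for xi, vi in zip(x, values) if cutoff <= xi <= cutoff + bandwidth]
    left_fit, _ = fit_line([p[0] for p in left], [p[1] for p in left])
    right_fit, _ = fit_line([p[0] for p in right], [p[1] for p in right])
    return right_fit - left_fit


# Simulated data (illustrative assumption): cubic trend plus a jump of 5 at 70.
rng = random.Random(4)
true_effect = 5.0
x = [rng.uniform(40, 100) for _ in range(4000)]
y = [20 + 0.5 * (xi - 70) + 0.002 * (xi - 70) ** 3 + true_effect * (xi >= 70)
     + rng.gauss(0, 0.5) for xi in x]

estimates = {bw: jump_at_cutoff(x, y, cutoff=70, bandwidth=bw) for bw in [2, 4, 8, 16]}

for bw, est in estimates.items():
    print(f"Bandwidth {bw}: estimated effect = {est:.2f} (true value {true_effect})")
```

Estimates that stay stable across a range of reasonable bandwidths are reassuring. A steady drift as the bandwidth widens, as in this simulation, points to functional-form bias.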
Placebo Cutoff Tests
Test for treatment effects at fake cutoffs:
# Placebo tests at alternative cutoffs
placebo_tests = agent.placebo_cutoff_tests(
    data=rdd_data,
    treatment='above_cutoff',
    outcome='test_score',
    running_var='prior_score',
    true_cutoff=70,
    placebo_cutoffs=[65, 67.5, 72.5, 75]
)
print("Placebo test results:")
for cutoff, result in placebo_tests.items():
    print(f"Cutoff {cutoff}: Effect = {result.effect:.3f}, p = {result.p_value:.3f}")
Interpretation:
- Non-significant effects at placebo cutoffs support validity
- Significant effects suggest confounding or model misspecification
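One detail that is easy to get wrong in a placebo test is contamination from the real discontinuity. A common remedy, sketched below under stated assumptions, is to estimate each placebo jump using only observations on the same side of the true cutoff. The helper names, placebo values, bandwidth, and simulated data are illustrative and do not describe the internals of `placebo_cutoff_tests`.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def jump_at_cutoff(x, values, cutoff, bandwidth):
    """Local linear estimate of the discontinuity in `values` at the cutoff."""
    left = [(xi - cutoff, vi) for xi, vi in zip(x, values) if cutoff - bandwidth <= xi < cutoff]
    right = [(xi - cutoff, vi) for xi, vi in zip(x, values) if cutoff <= xi <= cutoff + bandwidth]
    left_fit, _ = fit_line([p[0] for p in left], [p[1] for p in left])
    right_fit, _ = fit_line([p[0] for p in right], [p[1] for p in right])
    return right_fit - left_fit


def placebo_jump(x, y, placebo, true_cutoff, bandwidth):
    """Estimate a jump at a fake cutoff using data from one side of the true cutoff only."""
    if placebo < true_cutoff:
        keep = [(xi, yi) for xi, yi in zip(x, y) if xi < true_cutoff]
    else:
        keep = [(xi, yi) for xi, yi in zip(x, y) if xi >= true_cutoff]
    return jump_at_cutoff([p[0] for p in keep], [p[1] for p in keep], placebo, bandwidth)


# Simulated data (illustrative assumption): the only real jump is 5 points at 70.
rng = random.Random(6)
x = [rng.uniform(40, 100) for _ in range(4000)]
y = [20 + 0.5 * xi + 5.0 * (xi >= 70) + rng.gauss(0, 0.5) for xi in x]

real_jump = jump_at_cutoff(x, y, cutoff=70, bandwidth=5)
placebo_jumps = {c: placebo_jump(x, y, c, true_cutoff=70, bandwidth=5) for c in [60, 65, 75, 80]}

print(f"True cutoff 70: jump = {real_jump:.2f}")
for c, jump in placebo_jumps.items():
    print(f"Placebo cutoff {c}: jump = {jump:.2f}")
```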
Functional Form Testing
Test sensitivity to polynomial order and functional form:
# Test different polynomial orders
functional_form_test = agent.functional_form_sensitivity(
    data=rdd_data,
    treatment='above_cutoff',
    outcome='test_score',
    running_var='prior_score',
    cutoff_value=70,
    polynomial_orders=[1, 2, 3, 4]
)
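A functional-form check can be sketched by refitting the same local regression with polynomials of increasing order on each side of the cutoff and comparing the estimated jumps. The code below is a dependency-free illustration, not the library routine. The small normal-equations solver, the helper names, the orders tried, and the simulated data are assumptions.

```python
import random


def poly_value_at_zero(us, ys, order):
    """Least-squares polynomial fit of the given order; returns the fitted value at u = 0."""
    k = order + 1
    # Normal equations: a[r][c] = sum of u^(r+c), b[r] = sum of y * u^r.
    a = [[sum(u ** (r + c) for u in us) for c in range(k)] for r in range(k)]
    b = [sum(y * u ** r for u, y in zip(us, ys)) for r in range(k)]
    # Gaussian elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, k):
            factor = a[r][col] / a[col][col]
            for c in range(col, k):
                a[r][c] -= factor * a[col][c]
            b[r] -= factor * b[col]
    # Back substitution.
    beta = [0.0] * k
    for r in reversed(range(k)):
        beta[r] = (b[r] - sum(a[r][c] * beta[c] for c in range(r + 1, k))) / a[r][r]
    return beta[0]


def jump_with_order(x, y, cutoff, bandwidth, order):
    """Jump at the cutoff from separate polynomial fits on each side."""
    # Rescale to u in [-1, 1] so that powers of u stay well conditioned.
    left = [((xi - cutoff) / bandwidth, yi) for xi, yi in zip(x, y)
            if cutoff - bandwidth <= xi < cutoff]
    right = [((xi - cutoff) / bandwidth, yi) for xi, yi in zip(x, y)
             if cutoff <= xi <= cutoff + bandwidth]
    left_fit = poly_value_at_zero([p[0] for p in left], [p[1] for p in left], order)
    right_fit = poly_value_at_zero([p[0] for p in right], [p[1] for p in right], order)
    return right_fit - left_fit


# Simulated data (illustrative assumption): mildly curved trend with a jump of 5 at 70.
rng = random.Random(7)
true_effect = 5.0
x = [rng.uniform(40, 100) for _ in range(4000)]
y = [20 + 0.5 * (xi - 70) + 0.0005 * (xi - 70) ** 3 + true_effect * (xi >= 70)
     + rng.gauss(0, 0.5) for xi in x]

estimates = {order: jump_with_order(x, y, cutoff=70, bandwidth=10, order=order)
             for order in [1, 2, 3]}

for order, est in estimates.items():
    print(f"Polynomial order {order}: estimated effect = {est:.2f}")
```

Higher orders reduce bias from curvature but make the fitted value at the boundary noisier, which is one reason local linear fits are the usual starting point.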
Best Practices
Design and Data Collection
Running Variable Selection:
- Choose variables that determine treatment assignment
- Ensure precise measurement around cutoff
- Document assignment rules clearly
- Consider multiple running variables if relevant
Sample Size Planning:
- Focus observations near the cutoff
- Ensure adequate power for local effects
- Consider optimal sample allocation
- Plan for potential manipulation
Data Quality:
- Verify assignment rule implementation
- Check for measurement error in running variable
- Document any exceptions or overrides
- Collect rich covariate data for validation
Analysis Implementation
Bandwidth Selection:
- Use data-driven optimal bandwidth methods
- Report sensitivity to bandwidth choice
- Consider different bandwidths for different outcomes
- Balance bias-variance tradeoff
Functional Form:
- Start with local linear regression
- Test sensitivity to polynomial order
- Consider nonparametric methods
- Avoid overfitting with high-order polynomials
Standard Errors:
- Use robust standard errors
- Consider clustering if appropriate
- Account for bandwidth selection uncertainty
- Report confidence intervals
Validation and Robustness
Assumption Testing:
- Always conduct manipulation tests
- Check covariate balance at cutoff
- Test for other discontinuities
- Examine institutional details
Sensitivity Analysis:
- Vary bandwidth systematically
- Test different functional forms
- Exclude observations very close to cutoff
- Use alternative estimation methods
Transparency:
- Report all diagnostic tests
- Show graphical evidence
- Discuss institutional context
- Acknowledge limitations
Common Pitfalls and Solutions
Pitfall: Using inappropriate bandwidth
Solution: Use optimal bandwidth methods and test sensitivity
Pitfall: Ignoring manipulation possibilities
Solution: Always conduct McCrary tests and examine institutional incentives
Pitfall: Overfitting with high-order polynomials
Solution: Use local linear regression and test functional form sensitivity
Pitfall: Misinterpreting local effects as global
Solution: Clearly state that RDD estimates local effects at the cutoff
Pitfall: Inadequate sample size near cutoff
Solution: Focus data collection near cutoff and conduct power analysis
Example: Educational Remediation Program
Research Question: What is the effect of mandatory tutoring on student achievement?
Setting: Students with test scores below 70 are required to attend tutoring
- Running Variable: Prior test score (0-100)
- Cutoff: Score of 70
- Treatment: Mandatory tutoring participation
- Outcome: End-of-year test score
Analysis:
# Sharp RDD analysis
result = agent.analyze(
    data=education_rdd,
    treatment='mandatory_tutoring',
    outcome='end_year_score',
    running_var='prior_test_score',
    cutoff_value=70
)
# Validation tests
manipulation_test = agent.mccrary_test(
    data=education_rdd,
    running_var='prior_test_score',
    cutoff_value=70
)
balance_test = agent.covariate_balance_rdd(
    data=education_rdd,
    covariates=['age', 'gender', 'free_lunch'],
    running_var='prior_test_score',
    cutoff_value=70
)
print(f"RDD Treatment Effect: {result.ate:.2f} points")
print(f"95% CI: [{result.ci_lower:.2f}, {result.ci_upper:.2f}]")
print(f"McCrary test p-value: {manipulation_test.p_value:.3f}")
Results Interpretation: Students just below the cutoff (required to attend tutoring) scored X points higher on the end-of-year test compared to students just above the cutoff. The McCrary test shows no evidence of score manipulation (p = 0.XX).
Advanced RDD Methods
Multi-Cutoff RDD
When multiple cutoffs exist:
# Multiple cutoffs analysis
result = agent.analyze(
    data=multi_cutoff_data,
    treatment='treatment_intensity',
    outcome='outcome_var',
    running_var='score',
    cutoff_values=[50, 70, 85],
    method='multi_cutoff_rdd'
)
Geographic RDD
Using geographic boundaries as cutoffs:
# Geographic discontinuity
result = agent.analyze(
    data=geographic_data,
    treatment='policy_exposure',
    outcome='outcome_var',
    running_var='distance_to_boundary',
    cutoff_value=0,
    method='geographic_rdd'
)
Regression Kink Design
When treatment intensity (rather than probability) changes at cutoff:
# Regression kink design
result = agent.analyze(
    data=kink_data,
    treatment='treatment_intensity',
    outcome='outcome_var',
    running_var='eligibility_score',
    cutoff_value=75,
    method='regression_kink'
)
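In a kink design the quantity of interest is a ratio of slope changes rather than level jumps: the change in the slope of the outcome at the cutoff divided by the change in the slope of treatment intensity. The following sketch illustrates that ratio on simulated data. It is not the `regression_kink` implementation, and the helper names, the benefit schedule, and the effect size of 2 are assumptions.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def slope_change(x, values, cutoff, bandwidth):
    """Difference between the local slopes just above and just below the cutoff."""
    left = [(xi, vi) for xi, vi in zip(x, values) if cutoff - bandwidth <= xi < cutoff]
    right = [(xi, vi) for xi, vi in zip(x, values) if cutoff <= xi <= cutoff + bandwidth]
    _, left_slope = fit_line([p[0] for p in left], [p[1] for p in left])
    _, right_slope = fit_line([p[0] for p in right], [p[1] for p in right])
    return right_slope - left_slope


rng = random.Random(5)
true_effect = 2.0  # effect of one unit of treatment intensity on the outcome
x = [rng.uniform(50, 100) for _ in range(4000)]
# Treatment intensity is continuous in the score, but its slope changes at 75 (the kink).
intensity = [0.5 * max(xi - 75, 0.0) for xi in x]
y = [10 + 0.1 * xi + true_effect * bi + rng.gauss(0, 0.5) for xi, bi in zip(x, intensity)]

outcome_kink = slope_change(x, y, cutoff=75, bandwidth=10)
treatment_kink = slope_change(x, intensity, cutoff=75, bandwidth=10)
kink_estimate = outcome_kink / treatment_kink

print(f"Slope change in outcome: {outcome_kink:.3f}")
print(f"Slope change in treatment intensity: {treatment_kink:.3f}")
print(f"Kink estimate: {kink_estimate:.2f} (true value {true_effect})")
```

Because slopes are harder to estimate than levels, kink estimates generally need more data near the cutoff than a standard RDD for comparable precision.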
Further Reading
Foundational Papers:
- Thistlethwaite, D.L. & Campbell, D.T. (1960). “Regression-Discontinuity Analysis: An Alternative to the Ex Post Facto Experiment”
- Hahn, J., Todd, P. & Van der Klaauw, W. (2001). “Identification and Estimation of Treatment Effects with a Regression-Discontinuity Design”
- Imbens, G.W. & Lemieux, T. (2008). “Regression Discontinuity Designs: A Guide to Practice”
Modern Developments:
- Calonico, S., Cattaneo, M.D. & Titiunik, R. (2014). “Robust Nonparametric Confidence Intervals for Regression-Discontinuity Designs”
- Cattaneo, M.D., Idrobo, N. & Titiunik, R. (2019). “A Practical Introduction to Regression Discontinuity Designs: Foundations”
- Kolesár, M. & Rothe, C. (2018). “Inference in Regression Discontinuity Designs with a Discrete Running Variable”
Practical Guides:
- Lee, D.S. & Lemieux, T. (2010). “Regression Discontinuity Designs in Economics”
- Jacob, R., Zhu, P., Somers, M.A. & Bloom, H. (2012). “A Practical Guide to Regression Discontinuity”