Economic Policy Impact: Minimum Wage Analysis
This case study demonstrates how CAIS analyzes the causal impact of minimum wage policies using regression discontinuity design. We’ll explore how the agent identifies and exploits policy discontinuities for causal identification.
Problem Statement
Research Question: What is the causal effect of minimum wage increases on employment levels?
Context: Different states implemented minimum wage increases at different times, creating geographic discontinuities at state borders. We can exploit these discontinuities to identify causal effects while controlling for other economic factors.
Policy Relevance: Understanding the employment effects of minimum wage policies is crucial for evidence-based policy making and has been a subject of extensive economic debate.
Dataset Overview
Source: State-level employment and wage data with geographic identifiers
Sample Size: 2,847 county-month observations
Treatment: Minimum wage increase (binary)
Outcome: Employment rate (percentage)
Key Variables:
employment_rate: County employment rate (%)
min_wage_increase: Binary indicator for minimum wage increase
distance_to_border: Signed distance to state border (miles); negative values lie on the side without the increase
population: County population
median_income: Median household income
unemployment_rate: Baseline unemployment rate
import pandas as pd
from causal_agent import CausalAgent
# Load the minimum wage dataset
df = pd.read_csv('data/all_data/min_wage_data.csv')
print("Dataset shape:", df.shape)
print("\nTreatment distribution:")
print(df['min_wage_increase'].value_counts())
print("\nRunning variable summary:")
print(df['distance_to_border'].describe())
Agent Decision-Making Process
Let’s trace through how CAIS identifies the regression discontinuity design opportunity.
Step 1: Initial Data Analysis
The agent examines the dataset structure and identifies the discontinuous treatment assignment:
agent = CausalAgent()
# Agent performs automatic data analysis
analysis_result = agent.analyze(
    data=df,
    treatment='min_wage_increase',
    outcome='employment_rate',
    verbose=True
)
Agent Output:
🔍 CAIS Data Analysis
=====================
Dataset Characteristics:
- Sample size: 2,847 observations
- Treatment variable: 'min_wage_increase' (binary)
- Outcome variable: 'employment_rate' (continuous)
- Missing values: 0.8% (manageable)
Treatment Assignment Analysis:
- Treatment group: 1,423 observations (50.0%)
- Control group: 1,424 observations (50.0%)
- Assignment appears NON-RANDOM ⚠️
🔍 Searching for Running Variables...
Potential Running Variable Detected:
- Variable: 'distance_to_border'
- Type: Continuous
- Range: [-150.2, 149.8] miles
- Cutoff candidate: 0 miles (state border)
📊 Discontinuity Analysis:
- Treatment probability jump at cutoff: 0.98
- Sharp discontinuity detected ✓
- Bandwidth assessment: Optimal = 45.2 miles
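The treatment probability jump reported above can be checked by hand. The following is a minimal sketch on synthetic data, not the agent's internal routine: the `treatment_jump` helper, the 20-mile window, and the 0.95 threshold for calling the design sharp are all illustrative assumptions.

```python
import random

def treatment_jump(running, treated, cutoff=0.0, window=20.0):
    """Share treated just right of the cutoff minus share treated just left."""
    left = [t for r, t in zip(running, treated) if cutoff - window <= r < cutoff]
    right = [t for r, t in zip(running, treated) if cutoff <= r <= cutoff + window]
    return sum(right) / len(right) - sum(left) / len(left)

# Synthetic stand-in for distance_to_border and min_wage_increase.
# Assumption: the treated side is distance >= 0, with 1% of rows misclassified.
rng = random.Random(0)
running = [rng.uniform(-150, 150) for _ in range(3000)]
treated = [int(r >= 0) if rng.random() > 0.01 else int(r < 0) for r in running]

jump = treatment_jump(running, treated)
design = "sharp" if jump >= 0.95 else "fuzzy"  # illustrative threshold
print(f"Treatment probability jump at cutoff: {jump:.2f} ({design})")
```

A jump close to 1 means location almost fully determines treatment, which is what justifies a sharp design; a clearly smaller jump would point toward the fuzzy variant.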
Step 3: RDD Specification and Validation
The agent implements RDD with appropriate specification choices:
# Agent automatically implements RDD analysis
rdd_results = analysis_result.get_rdd_analysis()
print("RDD Specification:")
print(rdd_results.specification)
RDD Implementation:
📊 Regression Discontinuity Implementation
==========================================
Running Variable: distance_to_border (miles)
Cutoff: 0 miles (state border)
Treatment: min_wage_increase (binary)
Outcome: employment_rate (%)
Specification Choices:
- Bandwidth: 45.2 miles (MSE-optimal)
- Polynomial order: 1 (linear)
- Kernel: Triangular
- Sample size: 1,847 observations within bandwidth
Model Equation:
employment_rate = α + β×min_wage_increase + γ×distance_to_border +
δ×(distance_to_border × min_wage_increase) + ε
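The model equation above can be estimated without any special library. The sketch below assumes synthetic data with a built-in jump of -1.34 and uses hypothetical helper names (`weighted_intercept`, `sharp_rdd`); it is not the agent's implementation. Fitting a separate kernel-weighted line on each side of the cutoff and differencing the two intercepts yields the same β as the interacted equation.

```python
import random

def weighted_intercept(points, bandwidth):
    """Triangular-kernel weighted least squares of y on d; returns the fit at d = 0."""
    sw = swd = swy = swdd = swdy = 0.0
    for d, y in points:
        w = 1.0 - abs(d) / bandwidth  # triangular kernel weight
        sw += w
        swd += w * d
        swy += w * y
        swdd += w * d * d
        swdy += w * d * y
    slope = (sw * swdy - swd * swy) / (sw * swdd - swd * swd)
    return (swy - slope * swd) / sw

def sharp_rdd(running, outcome, cutoff=0.0, bandwidth=45.2):
    """Jump in the outcome at the cutoff (beta in the model equation)."""
    left, right = [], []
    for r, y in zip(running, outcome):
        d = r - cutoff
        if abs(d) < bandwidth:
            (right if d >= 0 else left).append((d, y))
    return weighted_intercept(right, bandwidth) - weighted_intercept(left, bandwidth)

# Synthetic data for illustration only: linear trend plus a jump of -1.34 at 0.
rng = random.Random(42)
running = [rng.uniform(-150, 150) for _ in range(4000)]
outcome = [63.2 + 0.01 * r - 1.34 * (r >= 0) + rng.gauss(0, 0.5) for r in running]

beta = sharp_rdd(running, outcome)
print(f"Estimated jump at the border: {beta:.2f}")
```

The triangular kernel gives observations nearest the border the most weight, which is why it is a common default for boundary estimation.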
Continuity Checks:
🔍 RDD Assumption Validation
============================
1. Continuity of Covariates:
- Population: No jump at cutoff (p = 0.234)
- Median income: No jump at cutoff (p = 0.456)
- Baseline unemployment: No jump at cutoff (p = 0.123)
✓ Covariates continuous at cutoff
2. Density Test (McCrary):
- Test statistic: 0.089
- P-value: 0.234
✓ No evidence of manipulation
3. Bandwidth Sensitivity:
- 30 miles: -1.23 [-2.45, -0.01]
- 45 miles: -1.34 [-2.12, -0.56] (selected)
- 60 miles: -1.28 [-1.98, -0.58]
✓ Results stable across bandwidths
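The density test in item 2 asks whether observations bunch on one side of the cutoff. As a rough illustration, the sketch below swaps in a much simpler check than the McCrary test: it compares raw counts in a narrow window on each side using a normal approximation to a binomial test. The `density_check` helper, the 10-mile window, and the simulated sorting are assumptions made for the example.

```python
import math
import random

def density_check(running, cutoff=0.0, window=10.0):
    """Two-sided p-value for equal counts on either side of the cutoff
    (normal approximation to a binomial test with p = 0.5)."""
    n_left = sum(1 for r in running if cutoff - window <= r < cutoff)
    n_right = sum(1 for r in running if cutoff <= r < cutoff + window)
    n = n_left + n_right
    z = (n_right - n / 2) / math.sqrt(n / 4)
    return math.erfc(abs(z) / math.sqrt(2))

rng = random.Random(1)
smooth = [rng.uniform(-150, 150) for _ in range(3000)]
# Simulated sorting: units within 5 miles on the left relocate across the border.
sorted_units = [-r if -5 <= r < 0 else r for r in smooth]

print(f"No sorting: p = {density_check(smooth):.3f}")
print(f"Sorting:    p = {density_check(sorted_units):.3g}")
```

A small p-value signals bunching near the border and would cast doubt on the design. The count comparison ignores the local slope of the density, so it is cruder than the test the agent reports.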
Step 4: Treatment Effect Estimation
With the RDD design validated, the agent estimates the causal effect:
# Get RDD treatment effect results
results = analysis_result.get_results()
print("RDD Results:")
print(results.summary())
Causal Effect Results:
🎯 RDD Treatment Effect Results
===============================
Local Average Treatment Effect (LATE): -1.34 percentage points
95% Confidence Interval: [-2.12, -0.56]
P-value: 0.001
Interpretation:
Minimum wage increases cause a 1.34 percentage point
reduction in employment rates at the state border.
This represents a statistically significant negative
employment effect of minimum wage policy.
Effect Size:
- Relative to baseline: -2.1% employment reduction
- Economic significance: Moderate negative effect
- Policy implication: Employment-wage tradeoff exists
Visual Evidence:
📈 RDD Plot Interpretation
==========================
Key Visual Features:
✓ Clear discontinuous jump in employment at border
✓ Smooth trends on both sides of cutoff
✓ No obvious confounding jumps in covariates
✓ Adequate density of observations near cutoff
Treatment Effect Visualization:
- Left of cutoff (no min wage): ~63.2% employment
- Right of cutoff (min wage): ~61.9% employment
- Discontinuous jump: -1.34 percentage points
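The relative effect size reported in the results follows directly from the point estimate and the employment rate just left of the cutoff. A short worked calculation, using only the figures shown above:

```python
late_pp = -1.34        # estimated jump at the border, percentage points
baseline_rate = 63.2   # employment rate just left of the cutoff, percent

relative_effect = late_pp / baseline_rate * 100
print(f"Relative effect: {relative_effect:.1f}%")  # prints "Relative effect: -2.1%"
```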
Method Exclusion Examples
Let’s examine why other methods were excluded for this dataset:
Difference-in-Differences
Why Excluded:
❌ Difference-in-Differences: EXCLUDED
Reason: Insufficient temporal variation
- Requires: Panel data with pre/post treatment periods
- Available: Cross-sectional data at single time point
- Missing: Time series variation in treatment
- Alternative: Could work with panel data over time
What Would Be Needed:
- Multiple time periods before and after policy implementation
- Variation in timing of minimum wage increases across states
- Parallel trends assumption between treatment and control states
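For comparison, this is what the simplest difference-in-differences estimate would look like if before/after data were available. The sketch uses made-up employment rates and a hypothetical `did_estimate` helper; it shows the 2x2 case only and relies on the parallel trends assumption listed above.

```python
def did_estimate(rows):
    """2x2 difference-in-differences from (treated_group, post_period, outcome) rows."""
    def cell_mean(group, period):
        values = [y for g, p, y in rows if g == group and p == period]
        return sum(values) / len(values)
    treated_change = cell_mean(1, 1) - cell_mean(1, 0)
    control_change = cell_mean(0, 1) - cell_mean(0, 0)
    return treated_change - control_change

# Made-up employment rates (%) for illustration only.
rows = [
    (1, 0, 62.5), (1, 0, 63.5),  # treated states, before the increase
    (1, 1, 61.5), (1, 1, 62.5),  # treated states, after the increase
    (0, 0, 59.5), (0, 0, 60.5),  # control states, before
    (0, 1, 60.0), (0, 1, 61.0),  # control states, after
]

effect = did_estimate(rows)
print(f"DiD estimate: {effect:+.2f} percentage points")  # prints "DiD estimate: -1.50 percentage points"
```

Treated states fall by 1.0 point while control states rise by 0.5, so the estimate attributes the 1.5-point gap to the policy.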
Instrumental Variables
Why Excluded:
❌ Instrumental Variables: EXCLUDED
Reason: RDD provides cleaner identification
- Geographic discontinuity is stronger than potential instruments
- No need for additional instruments when RDD is available
- RDD assumptions more credible than IV exclusion restriction
- Conclusion: RDD is preferred identification strategy
When IV Might Be Preferred:
- If geographic discontinuity were fuzzy rather than sharp
- If other factors also jumped discontinuously at border
- If manipulation of running variable were suspected
Propensity Score Methods
Why Excluded:
❌ Propensity Score Methods: EXCLUDED
Reason: Geographic treatment assignment
- Treatment determined by location, not individual characteristics
- Propensity scores would be deterministic (0 or 1)
- No meaningful variation to model treatment probability
- RDD exploits geographic variation more appropriately
When Matching Might Work:
- If analyzing individual-level data within states
- If treatment varied by individual characteristics
- If geographic variation were not available
Robustness Analysis
The agent performs comprehensive RDD robustness checks:
Bandwidth Sensitivity
# Agent tests multiple bandwidths
robustness = analysis_result.get_robustness_checks()
print("Bandwidth Sensitivity:")
for check in robustness['bandwidth_tests']:
    print(f"Bandwidth {check.bandwidth}: {check.estimate} {check.ci}")
Bandwidth Results:
🔍 Bandwidth Sensitivity Analysis
=================================
Bandwidth Selection Methods:
- MSE-optimal: 45.2 miles → -1.34 [-2.12, -0.56]
- CER-optimal: 38.7 miles → -1.41 [-2.28, -0.54]
- Rule-of-thumb: 52.1 miles → -1.28 [-1.95, -0.61]
Manual Bandwidth Tests:
- 30 miles: -1.23 [-2.45, -0.01] (wider CI, smaller sample)
- 60 miles: -1.28 [-1.98, -0.58] (similar estimate)
- 75 miles: -1.19 [-1.87, -0.51] (similar estimate)
Conclusion: Results robust across reasonable bandwidths
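A bandwidth sweep of this kind can be reproduced with a short loop. The sketch below is an assumption-laden illustration on synthetic data with a true jump of -1.34; `rdd_jump` is a hypothetical helper, not part of the agent's API, and it reports point estimates and sample sizes only (no confidence intervals).

```python
import random

def rdd_jump(running, outcome, bandwidth, cutoff=0.0):
    """Local linear sharp-RDD estimate with a triangular kernel.
    Returns (jump, number of observations inside the bandwidth)."""
    fits, n_used = {}, 0
    for side in (False, True):  # False = left of cutoff, True = right
        sw = swd = swy = swdd = swdy = 0.0
        for r, y in zip(running, outcome):
            d = r - cutoff
            if abs(d) >= bandwidth or (d >= 0) != side:
                continue
            w = 1.0 - abs(d) / bandwidth
            sw, swd, swy = sw + w, swd + w * d, swy + w * y
            swdd, swdy = swdd + w * d * d, swdy + w * d * y
            n_used += 1
        slope = (sw * swdy - swd * swy) / (sw * swdd - swd * swd)
        fits[side] = (swy - slope * swd) / sw
    return fits[True] - fits[False], n_used

# Synthetic data for illustration only.
rng = random.Random(3)
running = [rng.uniform(-150, 150) for _ in range(4000)]
outcome = [63.2 + 0.01 * r - 1.34 * (r >= 0) + rng.gauss(0, 0.5) for r in running]

results = {bw: rdd_jump(running, outcome, bw) for bw in (30.0, 45.2, 60.0, 75.0)}
for bw, (est, n) in results.items():
    print(f"{bw:>5} miles: {est:+.2f} (n = {n})")
```

Narrower bandwidths use fewer observations and give noisier estimates, while wider ones lean more heavily on the linearity assumption. Stable estimates across the sweep are the pattern the agent's output describes.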
Polynomial Order Sensitivity
📊 Polynomial Order Robustness
==============================
Specification Tests:
- Linear (order 1): -1.34 [-2.12, -0.56] (selected)
- Quadratic (order 2): -1.28 [-2.18, -0.38] (similar)
- Cubic (order 3): -1.41 [-2.35, -0.47] (similar)
Model Selection:
- AIC favors: Linear specification
- BIC favors: Linear specification
- Cross-validation: Linear performs best
Conclusion: Linear specification appropriate
Placebo Tests
🧪 Placebo and Falsification Tests
==================================
Placebo Cutoffs:
- -25 miles: 0.12 [-0.89, 1.13] (not significant)
- +25 miles: -0.23 [-1.34, 0.88] (not significant)
- -50 miles: 0.34 [-0.78, 1.46] (not significant)
Placebo Outcomes:
- Population density: 0.05 [-0.12, 0.22] (not significant)
- Median income: 234 [-1,234, 1,702] (not significant)
- Education levels: 0.02 [-0.15, 0.19] (not significant)
Conclusion: No spurious discontinuities detected
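The placebo-cutoff idea is to rerun the same estimator at locations where no policy change occurs. A minimal sketch, assuming synthetic data with a single true jump at 0 and a hypothetical `jump_at` helper; the 20-mile bandwidth is chosen here so that placebo windows at ±25 and -50 miles never straddle the real border.

```python
import random

def jump_at(running, outcome, cutoff, bandwidth=20.0):
    """Triangular-kernel local linear estimate of the outcome jump at `cutoff`."""
    def fit(points):
        sw = sum(w for w, d, y in points)
        swd = sum(w * d for w, d, y in points)
        swy = sum(w * y for w, d, y in points)
        swdd = sum(w * d * d for w, d, y in points)
        swdy = sum(w * d * y for w, d, y in points)
        slope = (sw * swdy - swd * swy) / (sw * swdd - swd ** 2)
        return (swy - slope * swd) / sw  # fitted value at the cutoff
    pts = [(1 - abs(r - cutoff) / bandwidth, r - cutoff, y)
           for r, y in zip(running, outcome) if abs(r - cutoff) < bandwidth]
    left = [p for p in pts if p[1] < 0]
    right = [p for p in pts if p[1] >= 0]
    return fit(right) - fit(left)

# Synthetic data for illustration only: one real jump of -1.34 at 0 miles.
rng = random.Random(7)
running = [rng.uniform(-150, 150) for _ in range(6000)]
outcome = [63.2 + 0.01 * r - 1.34 * (r >= 0) + rng.gauss(0, 0.5) for r in running]

estimates = {c: jump_at(running, outcome, c) for c in (-50, -25, 0, 25)}
for cutoff, est in estimates.items():
    label = "true cutoff" if cutoff == 0 else "placebo"
    print(f"{cutoff:>4} miles: {est:+.2f} ({label})")
```

Estimates near zero at the placebo locations, alongside a clear jump at the true cutoff, support reading the border discontinuity as a policy effect rather than an artifact of the estimator.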
Decision Tree Alternative Scenarios
Let’s explore how different data characteristics would change the analysis:
Scenario 1: Fuzzy Discontinuity
Hypothetical: Treatment probability jumps but doesn’t reach 100%
flowchart TD
A[Fuzzy RDD Data] --> B{Is this randomized?}
B -->|No ✗| C{Panel data available?}
C -->|No ✗| D{Running variable with cutoff?}
D -->|Yes ✓| E{Sharp discontinuity?}
E -->|No ✗| F{Fuzzy discontinuity?}
F -->|Yes ✓| G[Fuzzy Regression<br/>Discontinuity]
style A fill:#e3f2fd
style B fill:#ffebee
style C fill:#ffebee
style D fill:#fff3e0
style E fill:#ffebee
style F fill:#fff3e0
style G fill:#e8f5e8
Alternative Analysis:
- Method: Fuzzy RDD (instrumental variables approach)
- First stage: Running variable predicts treatment probability
- Second stage: Predicted treatment affects outcome
- Interpretation: Local average treatment effect for compliers
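The two stages above reduce to a ratio at the cutoff: the jump in the outcome divided by the jump in treatment probability (the Wald estimator). The sketch below assumes synthetic data in which treatment probability rises from 0.2 to 0.8 at the border and the true effect is -1.34; `local_jump` and `fuzzy_rdd` are hypothetical helper names.

```python
import random

def local_jump(running, values, bandwidth=45.0, cutoff=0.0):
    """Triangular-kernel local linear jump in `values` at the cutoff."""
    fits = {}
    for side in (False, True):
        sw = swd = swv = swdd = swdv = 0.0
        for r, v in zip(running, values):
            d = r - cutoff
            if abs(d) >= bandwidth or (d >= 0) != side:
                continue
            w = 1.0 - abs(d) / bandwidth
            sw, swd, swv = sw + w, swd + w * d, swv + w * v
            swdd, swdv = swdd + w * d * d, swdv + w * d * v
        slope = (sw * swdv - swd * swv) / (sw * swdd - swd * swd)
        fits[side] = (swv - slope * swd) / sw
    return fits[True] - fits[False]

def fuzzy_rdd(running, treated, outcome):
    """Wald estimator: reduced-form jump divided by first-stage jump."""
    first_stage = local_jump(running, treated)
    reduced_form = local_jump(running, outcome)
    return reduced_form / first_stage, first_stage

# Synthetic data for illustration only: imperfect compliance at the border.
rng = random.Random(11)
running = [rng.uniform(-150, 150) for _ in range(8000)]
treated = [int(rng.random() < (0.8 if r >= 0 else 0.2)) for r in running]
outcome = [63.2 + 0.01 * r - 1.34 * t + rng.gauss(0, 0.5)
           for r, t in zip(running, treated)]

late, first_stage = fuzzy_rdd(running, treated, outcome)
print(f"First-stage jump: {first_stage:.2f}")
print(f"Fuzzy RDD estimate (LATE for compliers): {late:.2f}")
```

Dividing by the first-stage jump rescales the diluted outcome jump back to a per-treated-unit effect, which is why a weak first stage makes the estimate unstable.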
Scenario 2: Panel Data Available
Hypothetical: Same counties observed before and after policy changes
flowchart TD
A[Panel RDD Data] --> B{Is this randomized?}
B -->|No ✗| C{Panel data available?}
C -->|Yes ✓| D{Treatment timing varies?}
D -->|Yes ✓| E{Also running variable?}
E -->|Yes ✓| F[Difference-in-Differences<br/>or RDD]
style A fill:#e3f2fd
style B fill:#ffebee
style C fill:#fff3e0
style D fill:#fff3e0
style E fill:#fff3e0
style F fill:#e8f5e8
Method Choice Considerations:
- DiD: Exploits timing variation, controls for fixed effects
- RDD: Exploits geographic variation, local identification
- Combined: Could use both for robustness
- Agent would likely prefer DiD for stronger identification
Economic Interpretation
Policy Implications
Employment Effects:
- Magnitude: 1.34 percentage point reduction
- Baseline employment: 63.2%
- Relative effect: 2.1% reduction in employment
- Economic significance: Moderate negative effect
Cost-Benefit Analysis:
💰 Economic Impact Assessment
=============================
Employment Effects:
- Jobs lost per county: ~89 jobs (based on labor force)
- Total affected workers: ~164,000 across all counties
- Employment rate decrease: 1.34 percentage points
Wage Effects (for remaining workers):
- Minimum wage increase: $2.50/hour average
- Annual wage gain: ~$5,200 per worker
- Total wage gains: ~$1.2 billion
Net Welfare Effects:
- Worker benefits: Higher wages for employed
- Worker costs: Some job losses
- Employer costs: Higher labor costs
- Consumer effects: Potentially higher prices
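The per-worker wage figure above is consistent with a full-time, year-round schedule. The back-of-envelope check below makes that assumption explicit (40 hours per week for 52 weeks, which the output does not state) and derives the number of workers the total would imply under it.

```python
hourly_increase = 2.50     # average minimum wage increase, $/hour (from the output above)
hours_per_year = 40 * 52   # assumption: full-time, year-round work

annual_gain = hourly_increase * hours_per_year
print(f"Annual wage gain per worker: ${annual_gain:,.0f}")

total_gain = 1.2e9         # total wage gains, $ (from the output above)
implied_workers = total_gain / annual_gain
print(f"Implied workers receiving the raise: {implied_workers:,.0f}")
```

Part-time workers would gain proportionally less per year, so the implied worker count is only as reliable as the full-time assumption.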
Policy Recommendations:
Gradual Implementation: Phase in wage increases to minimize employment disruption
Targeted Support: Provide job training for displaced workers
Regional Variation: Consider local economic conditions
Monitoring: Track long-term employment and wage effects
Limitations and External Validity
RDD-Specific Limitations:
Local Effects: Results only apply near state borders
External Validity: May not generalize to interior regions
Short-term Effects: Cannot capture long-term adjustments
Spillover Effects: May miss cross-border labor mobility
Economic Considerations:
⚠️ Interpretation Caveats
=========================
Geographic Limitations:
- Results specific to border counties
- May differ from state-wide effects
- Border economies may be unique
Temporal Limitations:
- Cross-sectional snapshot
- Cannot capture dynamic adjustments
- Firms may adapt over time
Equilibrium Effects:
- Partial equilibrium analysis
- May miss general equilibrium responses
- Price and wage adjustments not captured
Comparison with Literature
Existing Research:
- Card & Krueger (1994): No employment effects using DiD
- Neumark & Wascher (2000): Negative employment effects
- Dube et al. (2010): Mixed results depending on method
CAIS Contribution:
- Systematic method selection based on data structure
- Transparent identification strategy
- Comprehensive robustness analysis
- Replicable methodology
Learning Objectives Achieved
After working through this case study, you should understand:
✅ Regression Discontinuity: How to identify and exploit policy discontinuities
✅ Running Variables: How to detect continuous assignment variables
✅ Sharp vs. Fuzzy: Different types of discontinuous treatment assignment
✅ Bandwidth Selection: How to choose optimal bandwidth for RDD
✅ Assumption Testing: How to validate RDD assumptions
✅ Economic Interpretation: How to translate RDD results into policy insights
Next Steps
Explore Fuzzy RDD: Analyze cases with imperfect compliance
Try Alternative Bandwidths: Test sensitivity to bandwidth choice
Examine Heterogeneity: Look for effects across different county types
Read Method Documentation: Deep dive into ../methods/quasi_experimental/regression_discontinuity
Related Case Studies:
- Education Policy Analysis: Learning Mindset Intervention - Randomized experiment analysis
- Healthcare Treatment Effects: Hospital Treatment Analysis - Propensity score matching
- Marketing Campaign Evaluation: Instrumental Variables Analysis - Instrumental variables approach
Download Materials:
- Minimum Wage Dataset
- Complete Analysis Notebook
- Replication Code