Observational Methods
====================

Observational methods estimate causal effects from non-experimental data by adjusting for confounding variables under explicit identifying assumptions about how units select into treatment.

.. toctree::
   :maxdepth: 2

   propensity_score_matching
   propensity_score_weighting
   backdoor_adjustment
   linear_regression

Overview
--------

Observational methods are used when experimental or quasi-experimental designs are not available. They rely on the assumption that all confounding variables are observed and adjusted for (unconfoundedness, also called selection on observables), which makes causal identification possible through statistical adjustment.
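
In potential-outcomes notation, the identifying conditions can be written compactly. The symbols below are introduced here for illustration and are not part of the Causal Agent API: :math:`T` is a binary treatment, :math:`X` the observed covariates, and :math:`Y(0), Y(1)` the potential outcomes. Unconfoundedness and positivity are usually stated as:

.. math::

   (Y(0), Y(1)) \perp\!\!\!\perp T \mid X
   \qquad \text{and} \qquad
   0 < P(T = 1 \mid X = x) < 1 \ \text{for all } x

When both hold, the average treatment effect is identified from observed data by adjusting for :math:`X`:

.. math::

   \mathrm{ATE} = E\big[\, E[Y \mid T = 1, X] - E[Y \mid T = 0, X] \,\big]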

**Key Advantages:**

* Can be applied to most observational datasets that record the treatment, the outcome, and relevant covariates
* Utilize existing data sources
* Cost-effective compared to experiments
* Allow for large sample sizes

**Key Limitations:**

* Strong unconfoundedness assumption required
* Vulnerable to omitted variable bias
* Selection on unobservables cannot be ruled out
* Require rich covariate data

Method Details
--------------

**Propensity Score Matching**
   Matches treated and control units with similar propensity scores.
   
   * **When to use**: Rich covariate data, clear treatment definition
   * **Key assumption**: Unconfoundedness given observed covariates
   * **Strengths**: Intuitive matching logic, balances covariates
   * **Limitations**: Curse of dimensionality when many covariates enter the propensity model, common support issues
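
A minimal sketch of one-to-one nearest-neighbor matching on an estimated propensity score, using only NumPy on simulated data. The data-generating process, the helper ``fit_logistic``, and all variable names are assumptions made for illustration; this is not Causal Agent's internal implementation.

.. code-block:: python

   import numpy as np

   def fit_logistic(X, t, n_iter=25):
       """Logistic regression via Newton's method; X must include an intercept column."""
       beta = np.zeros(X.shape[1])
       for _ in range(n_iter):
           p = 1.0 / (1.0 + np.exp(-X @ beta))
           gradient = X.T @ (t - p)
           hessian = X.T @ (X * (p * (1.0 - p))[:, None])
           beta += np.linalg.solve(hessian, gradient)
       return beta

   # Simulated data (illustrative): x confounds treatment t and outcome y.
   rng = np.random.default_rng(0)
   n = 2000
   x = rng.normal(size=n)
   t = rng.binomial(1, 1.0 / (1.0 + np.exp(-0.8 * x)))
   y = 2.0 * t + 1.5 * x + rng.normal(size=n)  # true effect is 2.0

   # Step 1: estimate propensity scores from the observed covariate.
   X = np.column_stack([np.ones(n), x])
   ps = 1.0 / (1.0 + np.exp(-X @ fit_logistic(X, t)))

   # Step 2: match each treated unit to the control unit with the closest
   # propensity score (matching with replacement).
   treated = np.flatnonzero(t == 1)
   control = np.flatnonzero(t == 0)
   distance = np.abs(ps[treated][:, None] - ps[control][None, :])
   matched_control = control[distance.argmin(axis=1)]

   # Step 3: average the matched outcome differences to estimate the ATT.
   att = np.mean(y[treated] - y[matched_control])
   naive = y[t == 1].mean() - y[t == 0].mean()
   print(f"naive difference: {naive:.2f}, matched ATT estimate: {att:.2f}")

Matching with replacement keeps every treated unit but can reuse the same control many times, so in practice it is worth checking how often each control is matched.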

**Propensity Score Weighting**
   Reweights observations to balance treatment and control groups.
   
   * **When to use**: When matching is not feasible or desirable
   * **Key assumption**: Unconfoundedness and positivity
   * **Strengths**: Uses all observations, flexible weighting schemes
   * **Limitations**: Sensitive to extreme weights, model dependence
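
The reweighting idea can be sketched with normalized inverse probability weights on simulated data. The helper ``fit_logistic``, the simulated variables, and the true effect of 2.0 are illustrative assumptions, not part of Causal Agent.

.. code-block:: python

   import numpy as np

   def fit_logistic(X, t, n_iter=25):
       """Logistic regression via Newton's method; X must include an intercept column."""
       beta = np.zeros(X.shape[1])
       for _ in range(n_iter):
           p = 1.0 / (1.0 + np.exp(-X @ beta))
           gradient = X.T @ (t - p)
           hessian = X.T @ (X * (p * (1.0 - p))[:, None])
           beta += np.linalg.solve(hessian, gradient)
       return beta

   # Simulated data (illustrative): x confounds treatment t and outcome y.
   rng = np.random.default_rng(0)
   n = 5000
   x = rng.normal(size=n)
   t = rng.binomial(1, 1.0 / (1.0 + np.exp(-0.8 * x)))
   y = 2.0 * t + 1.5 * x + rng.normal(size=n)  # true effect is 2.0

   # Estimate propensity scores, then weight each unit by the inverse
   # probability of the treatment it actually received.
   X = np.column_stack([np.ones(n), x])
   ps = 1.0 / (1.0 + np.exp(-X @ fit_logistic(X, t)))
   weights = np.where(t == 1, 1.0 / ps, 1.0 / (1.0 - ps))

   # Normalized (Hajek) estimator: difference of weighted group means.
   mean_treated = np.average(y[t == 1], weights=weights[t == 1])
   mean_control = np.average(y[t == 0], weights=weights[t == 0])
   ate = mean_treated - mean_control

   print(f"IPW ATE estimate: {ate:.2f}, largest weight: {weights.max():.1f}")

Printing the largest weight is a cheap first check for the extreme-weight problem noted above: a handful of very large weights means a few units dominate the estimate.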

**Backdoor Adjustment**
   Controls for confounders identified through causal graphs.
   
   * **When to use**: When causal graph is well-understood
   * **Key assumption**: Backdoor criterion satisfied
   * **Strengths**: Principled confounder selection
   * **Limitations**: Requires causal graph knowledge
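
For a discrete confounder, the backdoor adjustment formula can be computed directly by stratification. The sketch below assumes the simple graph ``Z -> T``, ``Z -> Y``, ``T -> Y``, in which ``Z`` alone satisfies the backdoor criterion. The variable names and simulated effect sizes are hypothetical.

.. code-block:: python

   import numpy as np

   # Simulated data consistent with the assumed graph Z -> T, Z -> Y, T -> Y.
   rng = np.random.default_rng(0)
   n = 6000
   z = rng.integers(0, 3, size=n)                        # confounder with 3 levels
   t = rng.binomial(1, np.array([0.2, 0.5, 0.8])[z])     # treatment depends on z
   y = 1.0 * t + 2.0 * z + rng.normal(size=n)            # true effect is 1.0

   def backdoor_mean(t_val):
       """E[Y | do(T = t_val)] = sum over z of P(Z = z) * E[Y | T = t_val, Z = z]."""
       total = 0.0
       for z_val in np.unique(z):
           in_stratum = z == z_val
           total += in_stratum.mean() * y[in_stratum & (t == t_val)].mean()
       return total

   ate = backdoor_mean(1) - backdoor_mean(0)
   naive = y[t == 1].mean() - y[t == 0].mean()
   print(f"naive difference: {naive:.2f}, backdoor-adjusted ATE: {ate:.2f}")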

**Linear Regression**
   Controls for confounders through linear regression adjustment.
   
   * **When to use**: Linear relationships, continuous outcomes
   * **Key assumption**: Correct functional form, no omitted variables
   * **Strengths**: Simple, interpretable, widely understood
   * **Limitations**: Strong functional form assumptions
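
Regression adjustment amounts to reading off the treatment coefficient from an ordinary least squares fit that includes the confounders. A minimal NumPy sketch on simulated data follows. The covariate names ``x1`` and ``x2`` and the linear data-generating process are assumptions, and the standard error shown is the conventional homoskedastic one.

.. code-block:: python

   import numpy as np

   # Simulated data (illustrative): x1 and x2 confound treatment and outcome.
   rng = np.random.default_rng(0)
   n = 2000
   x1 = rng.normal(size=n)
   x2 = rng.normal(size=n)
   t = rng.binomial(1, 1.0 / (1.0 + np.exp(-(0.8 * x1 - 0.5 * x2))))
   y = 2.0 * t + 1.5 * x1 - 1.0 * x2 + rng.normal(size=n)  # true effect is 2.0

   # Design matrix: intercept, treatment, confounders.
   X = np.column_stack([np.ones(n), t, x1, x2])
   coef, *_ = np.linalg.lstsq(X, y, rcond=None)

   # Conventional OLS standard error for the treatment coefficient.
   residuals = y - X @ coef
   sigma2 = residuals @ residuals / (n - X.shape[1])
   cov = sigma2 * np.linalg.inv(X.T @ X)
   effect = coef[1]
   se = np.sqrt(cov[1, 1])
   ci = (effect - 1.96 * se, effect + 1.96 * se)

   print(f"effect: {effect:.2f}, 95% CI: ({ci[0]:.2f}, {ci[1]:.2f})")

The estimate is only as good as the linear specification: if the outcome depends on the confounders non-linearly, the same code can return a biased coefficient with a deceptively narrow interval.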

Implementation in Causal Agent
------------------------------

Causal Agent automatically selects and implements appropriate observational methods:

.. code-block:: python

   from causal_agent import CausalAgent
   
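   # observational_data is assumed to be loaded beforehand and to contain
   # the treatment, outcome, and covariate columns named below.
   # Causal Agent selects the observational method it judges most appropriate.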
   agent = CausalAgent()
   result = agent.analyze(
       data=observational_data,
       treatment='treatment_variable',
       outcome='outcome_variable',
       covariates=['covar1', 'covar2', 'covar3']
   )

**Automatic Method Selection**
   Causal Agent chooses methods based on:

   * Data characteristics (sample size, covariate richness)
   * Treatment assignment patterns
   * Outcome variable type
   * User preferences and constraints

**Covariate Selection**
   * Automatic confounder detection
   * Causal graph-based selection when available
   * Statistical significance-based inclusion
   * Domain knowledge integration

Assumption Validation
---------------------

**Unconfoundedness**
   * Cannot be directly tested
   * Sensitivity analyses for unobserved confounding
   * Placebo tests using pre-treatment outcomes
   * Comparison with experimental benchmarks when available

**Positivity/Common Support**
   * Propensity score distribution overlap
   * Trimming observations outside common support
   * Diagnostic plots and statistics
   * Sensitivity to support restrictions
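
One simple way to operationalize common support is the min-max rule: keep only units whose propensity score lies inside the range covered by both groups. In the sketch below the true assignment probabilities stand in for estimated scores, and all names are illustrative. Other trimming rules, such as fixed thresholds, are equally defensible.

.. code-block:: python

   import numpy as np

   # Simulated data; ps is a stand-in for propensity scores that would
   # normally be estimated from the covariates.
   rng = np.random.default_rng(0)
   n = 3000
   x = rng.normal(size=n)
   ps = 1.0 / (1.0 + np.exp(-1.5 * x))
   t = rng.binomial(1, ps)

   # Common support under the min-max rule: the score range in which both
   # treated and control units are observed.
   lower = max(ps[t == 1].min(), ps[t == 0].min())
   upper = min(ps[t == 1].max(), ps[t == 0].max())
   keep = (ps >= lower) & (ps <= upper)

   print(f"common support: [{lower:.3f}, {upper:.3f}]")
   print(f"units kept: {keep.sum()} of {n} ({keep.mean():.1%})")

Trimming changes the population the estimate refers to, so the share of dropped units should be reported alongside the effect estimate.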

**Correct Specification**
   * Model specification tests
   * Functional form diagnostics
   * Residual analysis
   * Cross-validation approaches

Balance Assessment
------------------

**Covariate Balance**
   * Standardized mean differences
   * Variance ratios
   * Kolmogorov-Smirnov tests
   * Graphical balance assessment
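
The first two diagnostics are straightforward to compute by hand. The sketch below defines illustrative helpers ``standardized_mean_difference`` and ``variance_ratio`` (hypothetical names, not Causal Agent functions) and compares balance before and after inverse probability weighting on simulated data, with the true assignment probabilities standing in for estimated scores.

.. code-block:: python

   import numpy as np

   def standardized_mean_difference(x, t, w=None):
       """(Weighted) difference in group means divided by the pooled unweighted SD."""
       w = np.ones(len(x)) if w is None else w
       mean_treated = np.average(x[t == 1], weights=w[t == 1])
       mean_control = np.average(x[t == 0], weights=w[t == 0])
       pooled_sd = np.sqrt((x[t == 1].var(ddof=1) + x[t == 0].var(ddof=1)) / 2)
       return (mean_treated - mean_control) / pooled_sd

   def variance_ratio(x, t):
       """Ratio of the covariate's sample variance in treated vs. control units."""
       return x[t == 1].var(ddof=1) / x[t == 0].var(ddof=1)

   # Simulated data; ps is a stand-in for estimated propensity scores.
   rng = np.random.default_rng(0)
   n = 10000
   x = rng.normal(size=n)
   ps = 1.0 / (1.0 + np.exp(-0.8 * x))
   t = rng.binomial(1, ps)
   weights = np.where(t == 1, 1.0 / ps, 1.0 / (1.0 - ps))

   smd_raw = standardized_mean_difference(x, t)
   smd_weighted = standardized_mean_difference(x, t, weights)
   vr = variance_ratio(x, t)

   print(f"SMD before weighting: {smd_raw:.3f}")
   print(f"SMD after weighting:  {smd_weighted:.3f}")
   print(f"variance ratio (unweighted): {vr:.2f}")

A common rule of thumb treats an absolute standardized mean difference below 0.1 as adequate balance, though the threshold is a convention rather than a test.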

**Propensity Score Balance**
   * Propensity score distribution comparison
   * Stratification balance tests
   * Matching quality diagnostics
   * Weighting effectiveness measures

Best Practices
--------------

**Study Design**
   * Collect rich covariate data
   * Include pre-treatment outcomes when possible
   * Consider multiple comparison groups
   * Document data collection process

**Analysis**
   * Check balance before and after adjustment
   * Conduct sensitivity analyses
   * Use multiple methods for robustness
   * Report all diagnostic results

**Interpretation**
   * Acknowledge unconfoundedness assumption
   * Discuss potential sources of bias
   * Consider external validity
   * Report confidence intervals and uncertainty

Sensitivity Analysis
--------------------

**Unobserved Confounding**
   * Rosenbaum bounds for matched samples
   * Imbens sensitivity analysis
   * Simulation-based approaches
   * Benchmarking against known confounders
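
As a simple illustration of the simulation-based idea (this is neither Rosenbaum bounds nor the Imbens approach), the linear omitted-variable-bias identity shows how a hypothetical unobserved confounder ``u`` would shift a regression estimate. The bias equals ``gamma * delta``, where ``gamma`` is the coefficient on ``u`` in the outcome regression and ``delta`` is the coefficient on treatment when ``u`` is regressed on treatment and covariates. All names and values below are assumptions.

.. code-block:: python

   import numpy as np

   def ols(X, y):
       """Ordinary least squares coefficients."""
       return np.linalg.lstsq(X, y, rcond=None)[0]

   # Simulated data in which u confounds t and y but is treated as unobserved.
   rng = np.random.default_rng(0)
   n = 5000
   x = rng.normal(size=n)
   u = rng.normal(size=n)
   t = rng.binomial(1, 1.0 / (1.0 + np.exp(-(0.5 * x + 0.7 * u))))
   y = 2.0 * t + 1.0 * x + 1.2 * u + rng.normal(size=n)

   ones = np.ones(n)
   short_est = ols(np.column_stack([ones, t, x]), y)[1]   # what the analyst sees
   long_fit = ols(np.column_stack([ones, t, x, u]), y)    # infeasible in practice
   long_est, gamma = long_fit[1], long_fit[3]
   delta = ols(np.column_stack([ones, t, x]), u)[1]       # imbalance in u given x

   print(f"short: {short_est:.3f}, long + gamma * delta: {long_est + gamma * delta:.3f}")

   # Sensitivity sweep: bias-adjusted estimates under hypothesized strengths.
   for g in (0.5, 1.0, 2.0):
       for d in (0.1, 0.3, 0.5):
           print(f"gamma={g}, delta={d}: adjusted estimate {short_est - g * d:.2f}")

In a real analysis ``u`` is not available, so only the sweep applies: the question becomes how large ``gamma`` and ``delta`` would need to be, relative to observed confounders, to change the conclusion.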

**Model Specification**
   * Alternative functional forms
   * Different covariate sets
   * Various matching/weighting schemes
   * Robustness to outliers

**Sample Restrictions**
   * Different common support definitions
   * Trimming strategies
   * Subgroup analyses
   * Temporal stability

Common Challenges
-----------------

**Data Quality Issues**
   * Missing covariate data
   * Measurement error in variables
   * Inconsistent variable definitions
   * Sample selection issues

**Methodological Challenges**
   * Curse of dimensionality
   * Extreme propensity scores
   * Poor covariate balance
   * Model dependence

**Interpretation Issues**
   * Distinguishing correlation from causation
   * Communicating uncertainty
   * Addressing skepticism about assumptions
   * Policy relevance of estimates

Advanced Topics
---------------

**Machine Learning Methods**
   * Targeted maximum likelihood estimation (TMLE)
   * Double machine learning
   * Causal forests
   * Neural network-based methods

**Multiple Treatments**
   * Generalized propensity scores
   * Multiple treatment matching
   * Dose-response relationships
   * Treatment interaction effects

**Time-Varying Treatments**
   * Marginal structural models
   * G-computation
   * Inverse probability weighting over time
   * Sequential ignorability