causal_agent.methods package
Causal inference methods for the causal_agent module.
This package contains implementations of various causal inference methods that can be selected and applied by the causal_agent pipeline.
- class causal_agent.methods.CausalMethod
Bases: ABC
Base class for all causal inference methods.
This abstract class defines the required methods that all causal inference implementations must provide. It ensures a consistent interface across different methods like propensity score matching, instrumental variables, etc.
Each implementation should handle the specifics of the causal inference method while conforming to this interface.
- abstractmethod validate_assumptions(df, treatment, outcome, covariates)
Validate method assumptions against the dataset.
- Parameters:
df (DataFrame) – Dataset containing causal variables
treatment (str) – Name of treatment variable
outcome (str) – Name of outcome variable
covariates (List[str]) – List of covariate names
- Returns:
Dict containing validation results with keys:
assumptions_valid (bool): Whether all assumptions are met
failed_assumptions (List[str]): List of failed assumptions
warnings (List[str]): List of warnings
suggestions (List[str]): Suggestions for addressing issues
- abstractmethod estimate_effect(df, treatment, outcome, covariates)
Estimate causal effect using this method.
- Parameters:
df (DataFrame) – Dataset containing causal variables
treatment (str) – Name of treatment variable
outcome (str) – Name of outcome variable
covariates (List[str]) – List of covariate names
- Returns:
Dict containing estimation results with keys:
effect_estimate (float): Estimated causal effect
confidence_interval (tuple): Confidence interval (lower, upper)
p_value (float): P-value of the estimate
additional_metrics (Dict): Any method-specific metrics
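A minimal sketch of how a concrete method might conform to this interface. The CausalMethod class below is a local stand-in that mirrors the two documented abstract methods, not the package class itself, and NaiveDifference is a hypothetical implementation invented for illustration. The confidence interval and p-value are left as NaN because this sketch does not compute them.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List

import pandas as pd


class CausalMethod(ABC):
    """Local stand-in mirroring the documented interface (not the package class)."""

    @abstractmethod
    def validate_assumptions(self, df, treatment, outcome, covariates) -> Dict[str, Any]:
        ...

    @abstractmethod
    def estimate_effect(self, df, treatment, outcome, covariates) -> Dict[str, Any]:
        ...


class NaiveDifference(CausalMethod):
    """Hypothetical method: raw difference in group means, no adjustment."""

    def validate_assumptions(self, df, treatment, outcome, covariates):
        failed: List[str] = []
        # Any non-missing value other than 0 or 1 violates the binary assumption.
        if set(df[treatment].dropna().unique()) - {0, 1}:
            failed.append("treatment must be binary (0/1)")
        return {
            "assumptions_valid": not failed,
            "failed_assumptions": failed,
            "warnings": [],
            "suggestions": ["recode the treatment as 0/1"] if failed else [],
        }

    def estimate_effect(self, df, treatment, outcome, covariates):
        treated = df.loc[df[treatment] == 1, outcome]
        control = df.loc[df[treatment] == 0, outcome]
        return {
            "effect_estimate": float(treated.mean() - control.mean()),
            "confidence_interval": (float("nan"), float("nan")),  # not computed here
            "p_value": float("nan"),  # not computed here
            "additional_metrics": {"n_treated": len(treated), "n_control": len(control)},
        }


df = pd.DataFrame({"t": [0, 0, 1, 1], "y": [1.0, 2.0, 3.0, 5.0]})
method = NaiveDifference()
checks = method.validate_assumptions(df, "t", "y", [])
result = method.estimate_effect(df, "t", "y", [])
```

Both return values use the dictionary keys listed above, which is what lets a caller treat every method uniformly.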
- causal_agent.methods.psm_estimate_effect(df, treatment, outcome, covariates, **kwargs)
Estimate the average treatment effect on the treated (ATT) using Propensity Score Matching. Tries DoWhy’s PSM first and falls back to a custom implementation if DoWhy fails. In either case, the standard error is a bootstrap SE computed with the custom implementation.
- causal_agent.methods.psw_estimate_effect(df, treatment, outcome, covariates, **kwargs)
Generic propensity score weighting (IPW) implementation.
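- Parameters:
df (DataFrame) – Dataset containing causal variables
treatment (str) – Name of treatment variable
outcome (str) – Name of outcome variable
covariates (List[str]) – List of covariate names
**kwargs – Method-specific parameters
- Returns:
Dictionary with effect estimate and diagnostics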
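For orientation, the idea behind inverse probability weighting can be sketched with NumPy alone. This is an illustration under stated assumptions, not the package's implementation: the data are simulated, the propensity model is a one-covariate logistic regression fitted by a hand-written Newton-Raphson helper (fit_logistic is a hypothetical name), and the estimator is the normalized (Hajek) form of the ATE.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)                       # single confounder
p_true = 1 / (1 + np.exp(-0.8 * x))          # true treatment probability
t = rng.binomial(1, p_true)
y = 2.0 * t + 1.5 * x + rng.normal(size=n)   # simulated true effect is 2.0


def fit_logistic(X, t, n_iter=25):
    """Logistic regression by Newton-Raphson; X must include an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1 / (1 + np.exp(-X @ beta))
        grad = X.T @ (t - p)
        hess = (X * (p * (1 - p))[:, None]).T @ X
        beta += np.linalg.solve(hess, grad)
    return beta


X = np.column_stack([np.ones(n), x])
beta = fit_logistic(X, t)
ps = 1 / (1 + np.exp(-X @ beta))             # estimated propensity scores
ps = np.clip(ps, 0.01, 0.99)                 # trim extreme scores to stabilize weights

# Normalized (Hajek) inverse probability weighted estimate of the ATE.
w1 = t / ps
w0 = (1 - t) / (1 - ps)
ate_ipw = np.sum(w1 * y) / np.sum(w1) - np.sum(w0 * y) / np.sum(w0)

# Unweighted comparison, which is confounded by x.
naive = y[t == 1].mean() - y[t == 0].mean()
```

The weighted estimate should sit much closer to the simulated effect than the naive comparison, because the weights rebalance the confounder across the two groups.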
- causal_agent.methods.iv_estimate_effect(df, treatment, outcome, covariates, query=None, dataset_description=None, llm=None, **kwargs)
- causal_agent.methods.did_estimate_effect(df, treatment, outcome, covariates, dataset_description=None, query=None, **kwargs)
Difference-in-Differences estimation using DoWhy with Statsmodels fallback.
- Parameters:
df (DataFrame) – Dataset containing causal variables
treatment (str) – Name of treatment variable (or variable indicating treated group)
outcome (str) – Name of outcome variable
covariates (List[str]) – List of covariate names
dataset_description (str | None) – Optional description of the dataset
query (str | None) – Optional user query for context
**kwargs – Method-specific parameters (e.g., time_var, group_var, or an llm instance if needed)
- Returns:
Dictionary with effect estimate and diagnostics
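As a reminder of what a difference-in-differences estimate is, the two-group, two-period case can be sketched as follows. This is a toy illustration, not the DoWhy or Statsmodels code path; the column names group and post and the data values are assumptions made for the example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "group": [0, 0, 0, 0, 1, 1, 1, 1],   # 1 = treated group
    "post":  [0, 0, 1, 1, 0, 0, 1, 1],   # 1 = after the intervention
    "y":     [10.0, 12.0, 13.0, 15.0, 20.0, 22.0, 28.0, 30.0],
})

# Change in the treated group minus change in the control group.
means = df.groupby(["group", "post"])["y"].mean()
did_from_means = (means.loc[(1, 1)] - means.loc[(1, 0)]) - (
    means.loc[(0, 1)] - means.loc[(0, 0)]
)

# The same number is the interaction coefficient in y ~ group + post + group:post.
g = df["group"].to_numpy(dtype=float)
p = df["post"].to_numpy(dtype=float)
X = np.column_stack([np.ones(len(df)), g, p, g * p])
coef, *_ = np.linalg.lstsq(X, df["y"].to_numpy(), rcond=None)
did_from_ols = coef[3]
```

The regression form is the one that generalizes to covariates and multiple periods, which is why DiD is usually estimated that way in practice.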
- causal_agent.methods.rdd_estimate_effect(df, treatment, outcome, running_variable, cutoff_value, covariates=None, bandwidth=None, query=None, llm=None, **kwargs)
Estimates the causal effect using Regression Discontinuity Design.
Tries the DoWhy implementation first if use_dowhy=True; otherwise uses the fallback implementation.
- Parameters:
df (DataFrame) – Input DataFrame.
treatment (str) – Name of the treatment variable (often implicitly defined by the cutoff). DoWhy may still require it; the fallback does not use it directly.
outcome (str) – Name of the outcome variable.
running_variable (str) – Name of the variable determining treatment assignment.
cutoff_value – The threshold value for the running variable.
covariates (List[str] | None) – Optional list of covariate names (support varies).
bandwidth (float | None) – Optional bandwidth around the cutoff. If None, a default is used.
use_dowhy – Whether to attempt using the DoWhy library first.
query (str | None) – Optional user query for context.
llm (langchain.chat_models.base.BaseChatModel | None) – Optional Language Model instance.
**kwargs – Additional keyword arguments for underlying methods.
- Returns:
Dictionary containing estimation results.
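To make the roles of running_variable, cutoff_value and bandwidth concrete, here is a sketch of a sharp regression discontinuity estimate by local linear regression. It is an illustration under assumptions (simulated data, a fixed bandwidth of 0.25, uniform weighting), not a description of how the DoWhy or fallback implementation works.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4000
running = rng.uniform(-1, 1, size=n)
cutoff_value = 0.0
treated = (running >= cutoff_value).astype(float)   # sharp design: cutoff decides treatment
# Simulated outcome with a jump of 3.0 at the cutoff.
y = 1.0 + 0.5 * running + 3.0 * treated + rng.normal(scale=0.5, size=n)

bandwidth = 0.25
mask = np.abs(running - cutoff_value) <= bandwidth  # keep observations near the cutoff
r = running[mask] - cutoff_value                    # centered running variable
d = treated[mask]

# Separate intercepts and slopes on each side; the coefficient on d is the jump.
X = np.column_stack([np.ones(r.size), d, r, d * r])
coef, *_ = np.linalg.lstsq(X, y[mask], rcond=None)
rdd_effect = coef[1]
```

Centering the running variable at the cutoff is what makes the treatment coefficient equal to the discontinuity at the threshold rather than at zero.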
- causal_agent.methods.dim_estimate_effect(df, treatment, outcome, query=None, llm=None, **kwargs)
Estimates the causal effect using Difference in Means (via OLS).
Ignores any provided covariates.
- Parameters:
df (DataFrame) – Input DataFrame.
treatment (str) – Name of the binary treatment variable column (should be 0 or 1).
outcome (str) – Name of the outcome variable column.
query (str | None) – Optional user query for context.
llm (langchain.chat_models.base.BaseChatModel | None) – Optional Language Model instance.
**kwargs – Additional keyword arguments (ignored).
- Returns:
Dictionary containing estimation results:
'effect_estimate': The difference in means (treatment coefficient).
'p_value': The p-value associated with the difference.
'confidence_interval': The 95% confidence interval for the difference.
'standard_error': The standard error of the difference.
'formula': The regression formula used.
'model_summary': Summary object from statsmodels.
'diagnostics': Basic group statistics.
'interpretation': LLM interpretation.
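The phrase "Difference in Means (via OLS)" relies on a standard identity: regressing the outcome on an intercept and a 0/1 treatment indicator gives a treatment coefficient equal to the difference in group means. A small sketch with made-up numbers, using NumPy least squares rather than statsmodels:

```python
import numpy as np

t = np.array([0, 0, 0, 1, 1, 1], dtype=float)   # binary treatment indicator
y = np.array([1.0, 2.0, 3.0, 4.0, 6.0, 8.0])    # outcome

# OLS of y on [1, t]; the second coefficient is the treatment effect.
X = np.column_stack([np.ones_like(t), t])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
ols_effect = coef[1]

# Direct difference in group means for comparison.
diff_in_means = y[t == 1].mean() - y[t == 0].mean()
```

Using the regression form gives the standard error, p-value and confidence interval listed above as a by-product of the fit.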
- causal_agent.methods.lr_estimate_effect(df, treatment, outcome, covariates=None, query_str=None, llm=None, **kwargs)
Estimates the causal effect using Linear Regression (OLS).
- Parameters:
df (DataFrame) – Input DataFrame.
treatment (str) – Name of the treatment variable column.
outcome (str) – Name of the outcome variable column.
covariates (List[str] | None) – Optional list of covariate names.
query_str (str | None) – Optional user query for context (e.g., for LLM).
llm (langchain.chat_models.base.BaseChatModel | None) – Optional Language Model instance.
**kwargs – Additional keyword arguments.
- Returns:
Dictionary containing estimation results:
'effect_estimate': The estimated coefficient for the treatment variable.
'p_value': The p-value associated with the treatment coefficient.
'confidence_interval': The 95% confidence interval for the effect.
'standard_error': The standard error of the treatment coefficient.
'formula': The regression formula used.
'model_summary': Summary object from statsmodels.
'diagnostics': Placeholder for diagnostic results.
'interpretation': Placeholder for LLM interpretation.
- causal_agent.methods.ba_estimate_effect(df, treatment, outcome, covariates, query=None, llm=None, **kwargs)
Estimates the causal effect using Backdoor Adjustment (via OLS regression).
Assumes the provided covariates list satisfies the backdoor criterion.
- Parameters:
df (DataFrame) – Input DataFrame.
treatment (str) – Name of the treatment variable column.
outcome (str) – Name of the outcome variable column.
covariates (List[str]) – List of covariate names forming the backdoor adjustment set.
query (str | None) – Optional user query for context (e.g., for LLM).
llm (langchain.chat_models.base.BaseChatModel | None) – Optional Language Model instance.
**kwargs – Additional keyword arguments.
- Returns:
Dictionary containing estimation results:
'effect_estimate': The estimated coefficient for the treatment variable.
'p_value': The p-value associated with the treatment coefficient.
'confidence_interval': The 95% confidence interval for the effect.
'standard_error': The standard error of the treatment coefficient.
'formula': The regression formula used.
'model_summary': Summary object from statsmodels.
'diagnostics': Placeholder for diagnostic results.
'interpretation': LLM interpretation.
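Why the adjustment set matters can be seen in a small simulation. The sketch below assumes a single confounder z that satisfies the backdoor criterion and a linear outcome model; ols_coefs is a hypothetical helper, and the code illustrates the technique rather than the package's implementation.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5000
z = rng.normal(size=n)                        # confounder (the adjustment set)
t = 0.9 * z + rng.normal(size=n)              # treatment depends on z
y = 1.5 * t + 2.0 * z + rng.normal(size=n)    # simulated true effect of t is 1.5


def ols_coefs(columns, y):
    """Least-squares coefficients for y on an intercept plus the given columns."""
    X = np.column_stack([np.ones(len(y)), *columns])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


unadjusted = ols_coefs([t], y)[1]      # y ~ t: biased by the open backdoor path via z
adjusted = ols_coefs([t, z], y)[1]     # y ~ t + z: backdoor path blocked
```

The adjusted coefficient recovers the simulated effect only because z is a valid adjustment set here; the function makes the same assumption about the covariates it is given.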
- causal_agent.methods.estimate_effect_gps(df, treatment, outcome, covariates, **kwargs)
Estimates the causal effect using the Generalized Propensity Score method for continuous treatments.
This function will be called by the method_executor_tool.
- Parameters:
df (DataFrame) – The input DataFrame.
treatment (str) – The name of the continuous treatment variable column.
outcome (str) – The name of the outcome variable column.
**kwargs (Any) – Additional arguments for controlling the estimation, including:
- gps_model_spec (dict): Specification for the GPS model (T ~ X).
- outcome_model_spec (dict): Specification for the outcome model (Y ~ T, GPS).
- t_values_range (list or dict): Specification for treatment levels for ADRF.
- n_bootstraps (int): Number of bootstrap replications for SEs.
- Returns:
A dictionary containing the estimation results, including:
"effect_estimate": Typically the ADRF or a specific contrast.
"standard_error": Standard error for the primary effect estimate.
"confidence_interval": Confidence interval for the primary estimate.
"adrf_curve": Data representing the Average Dose-Response Function.
"specific_contrasts": Any calculated specific contrasts.
"diagnostics": Results from diagnostic checks (e.g., balance).
"method_details": Description of the method and models used.
"parameters_used": Dictionary of parameters used.
Submodules
causal_agent.methods.causal_method module
Abstract base class for all causal inference methods.
This module defines the interface that all causal inference methods must implement, ensuring consistent behavior across different methods.
- class causal_agent.methods.causal_method.CausalMethod
Bases: ABC
Base class for all causal inference methods.
This abstract class defines the required methods that all causal inference implementations must provide. It ensures a consistent interface across different methods like propensity score matching, instrumental variables, etc.
Each implementation should handle the specifics of the causal inference method while conforming to this interface.
- abstractmethod validate_assumptions(df, treatment, outcome, covariates)
Validate method assumptions against the dataset.
- Parameters:
df (DataFrame) – Dataset containing causal variables
treatment (str) – Name of treatment variable
outcome (str) – Name of outcome variable
covariates (List[str]) – List of covariate names
- Returns:
Dict containing validation results with keys:
assumptions_valid (bool): Whether all assumptions are met
failed_assumptions (List[str]): List of failed assumptions
warnings (List[str]): List of warnings
suggestions (List[str]): Suggestions for addressing issues
- abstractmethod estimate_effect(df, treatment, outcome, covariates)
Estimate causal effect using this method.
- Parameters:
df (DataFrame) – Dataset containing causal variables
treatment (str) – Name of treatment variable
outcome (str) – Name of outcome variable
covariates (List[str]) – List of covariate names
- Returns:
Dict containing estimation results with keys:
effect_estimate (float): Estimated causal effect
confidence_interval (tuple): Confidence interval (lower, upper)
p_value (float): P-value of the estimate
additional_metrics (Dict): Any method-specific metrics
causal_agent.methods.utils module
Utility functions for causal inference methods.
This module provides common utility functions used across different causal inference methods.
- causal_agent.methods.utils.check_binary_treatment(treatment_series)
Check if treatment variable is binary.
- Parameters:
treatment_series (Series) – Series containing treatment variable
- Returns:
Boolean indicating if treatment is binary
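One plausible reading of "binary" is sketched below: exactly two distinct non-missing values, both drawn from 0 and 1. The helper is_binary is hypothetical and the rule is an assumption; the documented function may apply a different definition (for example, accepting any two distinct values).

```python
import pandas as pd


def is_binary(series: pd.Series) -> bool:
    """Assumed rule: two distinct non-missing values, both in {0, 1}."""
    values = set(series.dropna().unique())
    return len(values) == 2 and values <= {0, 1}


binary_with_missing = is_binary(pd.Series([0, 1, 1, None]))   # missing values are ignored
three_levels = is_binary(pd.Series([0, 1, 2]))
constant = is_binary(pd.Series([1, 1, 1]))
```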
- causal_agent.methods.utils.calculate_standardized_differences(df, treatment, covariates)
Calculate standardized differences between treated and control groups.
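A common definition of the standardized difference for a covariate is the difference in group means divided by the pooled standard deviation, sqrt((var_treated + var_control) / 2). The sketch below assumes that definition and a 0/1 treatment column; standardized_differences is a hypothetical helper, and the documented function may differ in its details or return format.

```python
import numpy as np
import pandas as pd


def standardized_differences(df, treatment, covariates):
    """Standardized mean difference per covariate (assumed pooled-SD definition)."""
    treated = df[df[treatment] == 1]
    control = df[df[treatment] == 0]
    out = {}
    for cov in covariates:
        pooled_sd = np.sqrt((treated[cov].var() + control[cov].var()) / 2)
        diff = treated[cov].mean() - control[cov].mean()
        # Guard against zero variance in both groups.
        out[cov] = float(diff / pooled_sd) if pooled_sd > 0 else 0.0
    return out


df = pd.DataFrame({
    "t":   [0, 0, 0, 1, 1, 1],
    "age": [30.0, 32.0, 34.0, 40.0, 42.0, 44.0],
})
smd = standardized_differences(df, "t", ["age"])
```

An absolute standardized difference below 0.1 is a widely used rule of thumb for acceptable balance, which matches the default threshold=0.1 in plot_covariate_balance.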
- causal_agent.methods.utils.check_overlap(df, treatment, propensity_scores, threshold=0.5)
Check overlap in propensity scores between treated and control groups.
- Returns:
Dictionary with overlap statistics
- causal_agent.methods.utils.plot_propensity_overlap(df, treatment, propensity_scores, save_path=None)
Plot overlap in propensity scores.
- causal_agent.methods.utils.plot_covariate_balance(standardized_diffs, threshold=0.1, save_path=None)
Plot standardized differences for covariates before and after matching.
- causal_agent.methods.utils.check_temporal_structure(df)
Check if dataset has temporal structure.
- causal_agent.methods.utils.check_for_discontinuities(df, outcome, threshold_zscore=3.0)
Check for potential discontinuities in continuous variables.
- causal_agent.methods.utils.find_potential_instruments(df, treatment, outcome, correlation_threshold=0.3)
Find potential instrumental variables.
- causal_agent.methods.utils.test_parallel_trends(df, treatment, outcome, time_var, unit_var)
Test for parallel trends assumption in difference-in-differences.
- causal_agent.methods.utils.preprocess_data(df, treatment_var, outcome_var, covariates, verbose=True)
Preprocess the dataset to handle missing values and encode categorical variables.
- Returns:
Preprocessed dataset, updated treatment variable name, updated outcome variable name, updated covariates list, and column mappings.
Subpackages
- causal_agent.methods.backdoor_adjustment package
- causal_agent.methods.diff_in_means package
- causal_agent.methods.generalized_propensity_score package
- causal_agent.methods.instrumental_variable package
- causal_agent.methods.linear_regression package
- causal_agent.methods.propensity_score package
- estimate_propensity_scores()
- estimate_matching_effect()
- estimate_weighting_effect()
- assess_balance()
- plot_overlap()
- plot_balance()
- Submodules
- causal_agent.methods.propensity_score.base module
- causal_agent.methods.propensity_score.diagnostics module
- causal_agent.methods.propensity_score.llm_assist module
- causal_agent.methods.propensity_score.matching module
- causal_agent.methods.propensity_score.weighting module
- causal_agent.methods.regression_discontinuity package