causal_agent.methods package

Causal inference methods for the causal_agent module.

This package contains implementations of various causal inference methods that can be selected and applied by the causal_agent pipeline.

class causal_agent.methods.CausalMethod[source]

Bases: ABC

Base class for all causal inference methods.

This abstract class defines the required methods that all causal inference implementations must provide. It ensures a consistent interface across different methods like propensity score matching, instrumental variables, etc.

Each implementation should handle the specifics of the causal inference method while conforming to this interface.

abstractmethod validate_assumptions(df, treatment, outcome, covariates)[source]

Validate method assumptions against the dataset.

Parameters:

df (DataFrame) – DataFrame containing the dataset
treatment (str) – Name of the treatment variable column
outcome (str) – Name of the outcome variable column
covariates (List[str]) – List of covariate column names

Returns:

Dict containing validation results with keys:

assumptions_valid (bool): Whether all assumptions are met
failed_assumptions (List[str]): List of failed assumptions
warnings (List[str]): List of warnings
suggestions (List[str]): Suggestions for addressing issues

Return type:

Dict

abstractmethod estimate_effect(df, treatment, outcome, covariates)[source]

Estimate causal effect using this method.

Parameters:

df (DataFrame) – DataFrame containing the dataset
treatment (str) – Name of the treatment variable column
outcome (str) – Name of the outcome variable column
covariates (List[str]) – List of covariate column names

Returns:

Dict containing estimation results with keys:

effect_estimate (float): Estimated causal effect
confidence_interval (tuple): Confidence interval (lower, upper)
p_value (float): P-value of the estimate
additional_metrics (Dict): Any method-specific metrics

Return type:

Dict

abstractmethod generate_code(dataset_path, treatment, outcome, covariates)[source]

Generate executable code for this causal method.

Parameters:

dataset_path (str) – Path to the dataset file
treatment (str) – Name of the treatment variable column
outcome (str) – Name of the outcome variable column
covariates (List[str]) – List of covariate column names

Returns:

String containing executable Python code implementing this method

Return type:

str

abstractmethod explain()[source]

Explain this causal method, its assumptions, and when to use it.

Returns:

String with detailed explanation of the method

Return type:

str
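
As a rough illustration of the interface described above, the sketch below defines a stand-in abstract class with the same four method names and a toy subclass that fills them in. `CausalMethodSketch`, `NaiveDifference`, and everything inside them are assumptions made for this example; they are not part of causal_agent, and a real implementation would subclass `causal_agent.methods.CausalMethod` instead.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List

import pandas as pd


class CausalMethodSketch(ABC):
    """Stand-in that mirrors the documented interface (not the real class)."""

    @abstractmethod
    def validate_assumptions(self, df, treatment, outcome, covariates) -> Dict[str, Any]: ...

    @abstractmethod
    def estimate_effect(self, df, treatment, outcome, covariates) -> Dict[str, Any]: ...

    @abstractmethod
    def generate_code(self, dataset_path, treatment, outcome, covariates) -> str: ...

    @abstractmethod
    def explain(self) -> str: ...


class NaiveDifference(CausalMethodSketch):
    """Hypothetical toy method: raw difference in group means."""

    def validate_assumptions(self, df, treatment, outcome, covariates):
        failed: List[str] = []
        # Any treatment value other than 0 or 1 counts as a failed assumption.
        if set(df[treatment].dropna().unique()) - {0, 1}:
            failed.append("treatment must be coded 0/1")
        return {
            "assumptions_valid": not failed,
            "failed_assumptions": failed,
            "warnings": [],
            "suggestions": ["recode the treatment as 0/1"] if failed else [],
        }

    def estimate_effect(self, df, treatment, outcome, covariates):
        means = df.groupby(treatment)[outcome].mean()
        return {
            "effect_estimate": float(means[1] - means[0]),
            "confidence_interval": (None, None),  # not computed in this sketch
            "p_value": None,                      # not computed in this sketch
            "additional_metrics": {"n": len(df)},
        }

    def generate_code(self, dataset_path, treatment, outcome, covariates):
        return (
            "import pandas as pd\n"
            f"df = pd.read_csv({dataset_path!r})\n"
            f"m = df.groupby({treatment!r})[{outcome!r}].mean()\n"
            "print(m[1] - m[0])\n"
        )

    def explain(self):
        return "Compares mean outcomes of treated and control units."


toy = pd.DataFrame({"t": [0, 0, 1, 1], "y": [1.0, 2.0, 4.0, 5.0]})
method = NaiveDifference()
check = method.validate_assumptions(toy, "t", "y", [])
result = method.estimate_effect(toy, "t", "y", [])
```

The returned dictionaries use the key names listed for validate_assumptions and estimate_effect, which is the part of the contract a caller can rely on.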

causal_agent.methods.psm_estimate_effect(df, treatment, outcome, covariates, **kwargs)

Estimate the average treatment effect on the treated (ATT) using Propensity Score Matching. Tries DoWhy’s PSM estimator first and falls back to a custom implementation if DoWhy fails. In either case, the standard error is obtained by bootstrapping the custom implementation.
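
To make the matching step concrete, here is a minimal sketch of one-to-one nearest-neighbour matching (with replacement) on precomputed propensity scores. It is not the package's implementation: the function name `att_nearest_neighbor` and its arguments are hypothetical, and the propensity model and the bootstrap standard error are left out.

```python
import numpy as np


def att_nearest_neighbor(propensity, treated, outcome):
    """ATT from 1-nearest-neighbour matching on the propensity score."""
    propensity = np.asarray(propensity, dtype=float)
    treated = np.asarray(treated).astype(bool)
    outcome = np.asarray(outcome, dtype=float)

    ps_t, ps_c = propensity[treated], propensity[~treated]
    y_t, y_c = outcome[treated], outcome[~treated]

    # For every treated unit, find the control unit with the closest score.
    nearest = np.abs(ps_t[:, None] - ps_c[None, :]).argmin(axis=1)

    # ATT: mean gap between each treated outcome and its matched control.
    return float(np.mean(y_t - y_c[nearest]))


ps = [0.20, 0.30, 0.70, 0.80, 0.22, 0.78]
t = [0, 0, 0, 0, 1, 1]
y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
att = att_nearest_neighbor(ps, t, y)
```

Here the treated unit with score 0.22 is matched to the control at 0.20 and the one at 0.78 to the control at 0.80, so the ATT is the mean of the two outcome gaps.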

causal_agent.methods.psw_estimate_effect(df, treatment, outcome, covariates, **kwargs)

Generic propensity score weighting (IPW) implementation.

Parameters:

df (DataFrame) – Dataset containing causal variables
treatment (str) – Name of treatment variable
outcome (str) – Name of outcome variable
covariates (List[str]) – List of covariate names
**kwargs – Method-specific parameters (e.g., weight_type, trim_threshold, query)

Returns:

Dictionary with effect estimate and diagnostics

Return type:

Dict[str, Any]
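
The weighting idea can be sketched in a few lines. The example below computes a normalized (Hajek) inverse-probability-weighted estimate of the average treatment effect from precomputed propensity scores. The function name `ipw_ate` and its `trim` argument are hypothetical; `trim` only loosely corresponds to the `trim_threshold` keyword mentioned above, whose exact semantics are not documented here.

```python
import numpy as np


def ipw_ate(propensity, treated, outcome, trim=0.01):
    """Normalized IPW estimate of the ATE from given propensity scores."""
    # Clip extreme scores so that no single unit receives a huge weight.
    e = np.clip(np.asarray(propensity, dtype=float), trim, 1.0 - trim)
    t = np.asarray(treated, dtype=float)
    y = np.asarray(outcome, dtype=float)

    w1 = t / e                # weights for treated units
    w0 = (1.0 - t) / (1.0 - e)  # weights for control units

    # Weighted mean outcome under treatment minus weighted mean under control.
    return float(np.sum(w1 * y) / np.sum(w1) - np.sum(w0 * y) / np.sum(w0))


e = [0.5, 0.5, 0.5, 0.5]
t = [1, 1, 0, 0]
y = [3.0, 5.0, 1.0, 2.0]
ate = ipw_ate(e, t, y)
```

With constant propensity scores the weights cancel and the estimate reduces to the plain difference in group means.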

causal_agent.methods.iv_estimate_effect(df, treatment, outcome, covariates, query=None, dataset_description=None, llm=None, **kwargs)

causal_agent.methods.did_estimate_effect(df, treatment, outcome, covariates, dataset_description=None, query=None, **kwargs)

Difference-in-Differences estimation using DoWhy with Statsmodels fallback.

Parameters:

df (DataFrame) – Dataset containing causal variables
treatment (str) – Name of treatment variable (or variable indicating treated group)
outcome (str) – Name of outcome variable
covariates (List[str]) – List of covariate names
dataset_description (str | None) – Optional description of the dataset
**kwargs – Method-specific parameters (e.g., time_var, group_var, query, llm instance if needed)

Returns:

Dictionary with effect estimate and diagnostics

Return type:

Dict[str, Any]
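
For intuition, the canonical two-group, two-period estimate can be computed directly from four cell means. This sketch is independent of the DoWhy and Statsmodels code paths mentioned above; the function `did_2x2` and the column names `treated` and `post` are assumptions for illustration.

```python
import pandas as pd


def did_2x2(df, outcome, group_col, post_col):
    """Difference-in-differences from the four group-by-period means."""
    means = df.groupby([group_col, post_col])[outcome].mean()
    treated_change = means.loc[(1, 1)] - means.loc[(1, 0)]
    control_change = means.loc[(0, 1)] - means.loc[(0, 0)]
    # The control group's change stands in for the treated group's
    # counterfactual change (parallel trends assumption).
    return float(treated_change - control_change)


panel = pd.DataFrame({
    "treated": [0, 0, 0, 0, 1, 1, 1, 1],
    "post":    [0, 0, 1, 1, 0, 0, 1, 1],
    "y":       [1.0, 3.0, 2.0, 4.0, 2.0, 4.0, 6.0, 8.0],
})
effect = did_2x2(panel, "y", "treated", "post")
```

The same number is the coefficient on the interaction term in a regression of the outcome on group, period, and group × period.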

causal_agent.methods.rdd_estimate_effect(df, treatment, outcome, running_variable, cutoff_value, covariates=None, bandwidth=None, query=None, llm=None, **kwargs)

Estimates the causal effect using Regression Discontinuity Design.

Tries the DoWhy implementation first if use_dowhy=True; otherwise uses the fallback implementation.

Parameters:

df (DataFrame) – Input DataFrame.
treatment (str) – Name of the treatment variable (often implicitly defined by the cutoff). The DoWhy path may still require it; the fallback does not use it directly.
outcome (str) – Name of the outcome variable.
running_variable (str) – Name of the variable determining treatment assignment.
cutoff_value – The threshold value for the running variable.
covariates (List[str] | None) – Optional list of covariate names (support varies).
bandwidth (float | None) – Optional bandwidth around the cutoff. If None, a default is used.
use_dowhy – Whether to attempt using the DoWhy library first.
query (str | None) – Optional user query for context.
llm (langchain.chat_models.base.BaseChatModel | None) – Optional Language Model instance.
**kwargs – Additional keyword arguments for underlying methods.

Returns:

Dictionary containing estimation results.

Return type:

Dict[str, Any]
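
A sharp RDD estimate can be sketched as the gap between two local linear fits evaluated at the cutoff. The code below is only an illustration of that idea with a uniform kernel; `rdd_local_linear` is a hypothetical helper and does not reproduce the DoWhy path, the fallback, or the default bandwidth used by the package.

```python
import numpy as np


def rdd_local_linear(running, outcome, cutoff, bandwidth):
    """Jump in the outcome at the cutoff from separate linear fits."""
    x = np.asarray(running, dtype=float) - cutoff  # centre at the cutoff
    y = np.asarray(outcome, dtype=float)

    window = np.abs(x) <= bandwidth
    left = window & (x < 0)
    right = window & (x >= 0)

    # np.polyfit(deg=1) returns [slope, intercept]; because x is centred,
    # the intercept is the fitted outcome exactly at the cutoff.
    left_at_cutoff = np.polyfit(x[left], y[left], 1)[1]
    right_at_cutoff = np.polyfit(x[right], y[right], 1)[1]
    return float(right_at_cutoff - left_at_cutoff)


running = np.linspace(0.0, 10.0, 101)
outcome = 1.0 + 0.5 * running + 2.0 * (running >= 5.0)
jump = rdd_local_linear(running, outcome, cutoff=5.0, bandwidth=2.0)
```

The synthetic data are noiseless with a true jump of 2.0 at the cutoff, so the estimate recovers it up to floating-point error.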

causal_agent.methods.dim_estimate_effect(df, treatment, outcome, query=None, llm=None, **kwargs)

Estimates the causal effect using Difference in Means (via OLS).

Ignores any provided covariates.

Parameters:

df (DataFrame) – Input DataFrame.
treatment (str) – Name of the binary treatment variable column (should be 0 or 1).
outcome (str) – Name of the outcome variable column.
query (str | None) – Optional user query for context.
llm (langchain.chat_models.base.BaseChatModel | None) – Optional Language Model instance.
**kwargs – Additional keyword arguments (ignored).

Returns:

Dictionary containing estimation results with keys:

'effect_estimate': The difference in means (treatment coefficient).
'p_value': The p-value associated with the difference.
'confidence_interval': The 95% confidence interval for the difference.
'standard_error': The standard error of the difference.
'formula': The regression formula used.
'model_summary': Summary object from statsmodels.
'diagnostics': Basic group statistics.
'interpretation': LLM interpretation.

Return type:

Dict
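
The phrase "Difference in Means (via OLS)" rests on a simple identity: in a regression of the outcome on an intercept and a 0/1 treatment dummy, the treatment coefficient equals the raw gap between group means. The sketch below checks this with NumPy's least-squares solver; `difference_in_means_ols` is a hypothetical helper, not the package function, and it omits the p-value, standard error, and other fields listed above.

```python
import numpy as np


def difference_in_means_ols(treatment, outcome):
    """Treatment coefficient from OLS of outcome on intercept + dummy."""
    t = np.asarray(treatment, dtype=float)
    y = np.asarray(outcome, dtype=float)
    design = np.column_stack([np.ones_like(t), t])  # intercept, treatment
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return float(coef[1])


t = [0, 0, 0, 1, 1, 1]
y = [1.0, 2.0, 3.0, 4.0, 6.0, 8.0]
ols_effect = difference_in_means_ols(t, y)
raw_gap = float(np.mean(y[3:]) - np.mean(y[:3]))
```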

causal_agent.methods.lr_estimate_effect(df, treatment, outcome, covariates=None, query_str=None, llm=None, **kwargs)

Estimates the causal effect using Linear Regression (OLS).

Parameters:

df (DataFrame) – Input DataFrame.
treatment (str) – Name of the treatment variable column.
outcome (str) – Name of the outcome variable column.
covariates (List[str] | None) – Optional list of covariate names.
query_str (str | None) – Optional user query for context (e.g., for LLM).
llm (langchain.chat_models.base.BaseChatModel | None) – Optional Language Model instance.
**kwargs – Additional keyword arguments.

Returns:

Dictionary containing estimation results with keys:

'effect_estimate': The estimated coefficient for the treatment variable.
'p_value': The p-value associated with the treatment coefficient.
'confidence_interval': The 95% confidence interval for the effect.
'standard_error': The standard error of the treatment coefficient.
'formula': The regression formula used.
'model_summary': Summary object from statsmodels.
'diagnostics': Placeholder for diagnostic results.
'interpretation': Placeholder for LLM interpretation.

Return type:

Dict

causal_agent.methods.ba_estimate_effect(df, treatment, outcome, covariates, query=None, llm=None, **kwargs)

Estimates the causal effect using Backdoor Adjustment (via OLS regression).

Assumes the provided covariates list satisfies the backdoor criterion.

Parameters:

df (DataFrame) – Input DataFrame.
treatment (str) – Name of the treatment variable column.
outcome (str) – Name of the outcome variable column.
covariates (List[str]) – List of covariate names forming the backdoor adjustment set.
query (str | None) – Optional user query for context (e.g., for LLM).
llm (langchain.chat_models.base.BaseChatModel | None) – Optional Language Model instance.
**kwargs – Additional keyword arguments.

Returns:

Dictionary containing estimation results with keys:

'effect_estimate': The estimated coefficient for the treatment variable.
'p_value': The p-value associated with the treatment coefficient.
'confidence_interval': The 95% confidence interval for the effect.
'standard_error': The standard error of the treatment coefficient.
'formula': The regression formula used.
'model_summary': Summary object from statsmodels.
'diagnostics': Placeholder for diagnostic results.
'interpretation': LLM interpretation.

Return type:

Dict
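
To show why the adjustment set matters, the sketch below compares the treatment coefficient from OLS with and without a confounder on simulated data. It uses NumPy rather than statsmodels, and `ols_treatment_coef` is a hypothetical helper, so treat it as an illustration of regression-based backdoor adjustment rather than the package's code.

```python
import numpy as np


def ols_treatment_coef(treatment, outcome, covariates=None):
    """Coefficient on the treatment from OLS, optionally with covariates."""
    t = np.asarray(treatment, dtype=float)
    y = np.asarray(outcome, dtype=float)
    columns = [np.ones_like(t), t]  # intercept, treatment
    if covariates is not None:
        columns.append(np.asarray(covariates, dtype=float).reshape(len(t), -1))
    design = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return float(coef[1])


rng = np.random.default_rng(0)
z = rng.normal(size=500)                    # confounder
t = (z + rng.normal(size=500) > 0) * 1.0    # treatment depends on z
y = 2.0 * t + 3.0 * z                       # true effect of t is 2.0 (no noise)

naive = ols_treatment_coef(t, y)                    # omits z: biased upward
adjusted = ols_treatment_coef(t, y, covariates=z)   # conditions on z
```

Because `z` raises both the chance of treatment and the outcome, the unadjusted coefficient overstates the effect, while including `z` recovers the simulated value of 2.0.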

causal_agent.methods.estimate_effect_gps(df, treatment, outcome, covariates, **kwargs)[source]

Estimates the causal effect using the Generalized Propensity Score method for continuous treatments.

This function will be called by the method_executor_tool.

Parameters:

df (DataFrame) – The input DataFrame.
treatment (str) – The name of the continuous treatment variable column.
outcome (str) – The name of the outcome variable column.
covariates (List[str]) – A list of covariate column names.
**kwargs (Any) – Additional arguments for controlling the estimation, including:
gps_model_spec (dict): Specification for the GPS model (T ~ X).
outcome_model_spec (dict): Specification for the outcome model (Y ~ T, GPS).
t_values_range (list or dict): Specification for treatment levels for ADRF.
n_bootstraps (int): Number of bootstrap replications for SEs.

Returns:

A dictionary containing the estimation results, including:

"effect_estimate": Typically the ADRF or a specific contrast.
"standard_error": Standard error for the primary effect estimate.
"confidence_interval": Confidence interval for the primary estimate.
"adrf_curve": Data representing the Average Dose-Response Function.
"specific_contrasts": Any calculated specific contrasts.
"diagnostics": Results from diagnostic checks (e.g., balance).
"method_details": Description of the method and models used.
"parameters_used": Dictionary of parameters used.

Return type:

Dict
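
The three stages named above (a treatment model T ~ X, an outcome model Y ~ T, GPS, and an averaged dose-response curve) can be sketched as follows. This is a simplified version in the spirit of the Hirano-Imbens approach, assuming a linear treatment model with normal errors and a quadratic outcome model; `gps_adrf` and its arguments are hypothetical, and bootstrap standard errors and balance diagnostics are omitted.

```python
import numpy as np


def gps_adrf(t, x, y, t_grid):
    """Estimate an average dose-response curve on t_grid via the GPS."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float).reshape(len(t), -1)

    # 1) Treatment model T ~ X: linear mean, normal residuals.
    design = np.column_stack([np.ones(len(t)), x])
    beta, *_ = np.linalg.lstsq(design, t, rcond=None)
    mu = design @ beta
    sigma = np.std(t - mu)

    def density(t_val):
        # GPS: conditional density of treatment level t_val given each unit's X.
        return np.exp(-0.5 * ((t_val - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

    def features(t_val, r):
        t_val = np.broadcast_to(t_val, r.shape)
        return np.column_stack([np.ones_like(r), t_val, t_val**2, r, r**2, t_val * r])

    # 2) Outcome model Y ~ T, GPS (quadratic with interaction).
    gamma, *_ = np.linalg.lstsq(features(t, density(t)), y, rcond=None)

    # 3) ADRF: for each grid value, predict for every unit and average.
    return np.array([np.mean(features(tv, density(tv)) @ gamma) for tv in t_grid])


rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=n)
t = 0.5 * x + rng.normal(size=n)
y = 2.0 * t + x + 0.1 * rng.normal(size=n)
curve = gps_adrf(t, x, y, t_grid=[-1.0, 0.0, 1.0])
```

In the simulated data the outcome rises with the treatment level, so the estimated curve should increase across the grid; how closely it tracks the true slope depends on how well the quadratic outcome model approximates the true surface.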

Submodules

causal_agent.methods.causal_method module

Abstract base class for all causal inference methods.

This module defines the interface that all causal inference methods must implement, ensuring consistent behavior across different methods.

class causal_agent.methods.causal_method.CausalMethod[source]

Bases: ABC

Base class for all causal inference methods.

This abstract class defines the required methods that all causal inference implementations must provide. It ensures a consistent interface across different methods like propensity score matching, instrumental variables, etc.

Each implementation should handle the specifics of the causal inference method while conforming to this interface.

abstractmethod validate_assumptions(df, treatment, outcome, covariates)[source]

Validate method assumptions against the dataset.

Parameters:

df (DataFrame) – DataFrame containing the dataset
treatment (str) – Name of the treatment variable column
outcome (str) – Name of the outcome variable column
covariates (List[str]) – List of covariate column names

Returns:

Dict containing validation results with keys:

assumptions_valid (bool): Whether all assumptions are met
failed_assumptions (List[str]): List of failed assumptions
warnings (List[str]): List of warnings
suggestions (List[str]): Suggestions for addressing issues

Return type:

Dict

abstractmethod estimate_effect(df, treatment, outcome, covariates)[source]

Estimate causal effect using this method.

Parameters:

df (DataFrame) – DataFrame containing the dataset
treatment (str) – Name of the treatment variable column
outcome (str) – Name of the outcome variable column
covariates (List[str]) – List of covariate column names

Returns:

Dict containing estimation results with keys:

effect_estimate (float): Estimated causal effect
confidence_interval (tuple): Confidence interval (lower, upper)
p_value (float): P-value of the estimate
additional_metrics (Dict): Any method-specific metrics

Return type:

Dict

abstractmethod generate_code(dataset_path, treatment, outcome, covariates)[source]

Generate executable code for this causal method.

Parameters:

dataset_path (str) – Path to the dataset file
treatment (str) – Name of the treatment variable column
outcome (str) – Name of the outcome variable column
covariates (List[str]) – List of covariate column names

Returns:

String containing executable Python code implementing this method

Return type:

str

abstractmethod explain()[source]

Explain this causal method, its assumptions, and when to use it.

Returns:

String with detailed explanation of the method

Return type:

str

causal_agent.methods.utils module

Utility functions for causal inference methods.

This module provides common utility functions used across different causal inference methods.

causal_agent.methods.utils.check_binary_treatment(treatment_series)[source]

Check if treatment variable is binary.

Parameters:

treatment_series (Series) – Series containing treatment variable

Returns:

Boolean indicating if treatment is binary

Return type:

bool
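
One plausible way to perform such a check is to count distinct non-missing values. The helper below, `looks_binary`, is a hypothetical stand-in; the actual function may apply a stricter rule, such as requiring the values to be exactly 0 and 1.

```python
import pandas as pd


def looks_binary(series: pd.Series) -> bool:
    """True when the series has exactly two distinct non-missing values."""
    return series.dropna().nunique() == 2


binary_flag = looks_binary(pd.Series([0, 1, 1, None]))
dose_flag = looks_binary(pd.Series([0.5, 1.0, 2.5]))
```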

causal_agent.methods.utils.calculate_standardized_differences(df, treatment, covariates)[source]

Calculate standardized differences between treated and control groups.

Parameters:

df (DataFrame) – DataFrame containing the data
treatment (str) – Name of treatment variable
covariates (List[str]) – List of covariate variable names

Returns:

Dictionary with standardized differences for each covariate

Return type:

Dict[str, float]
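
A common definition of the standardized difference divides the gap in group means by the square root of the average of the two group variances. The sketch below uses that definition, assuming a 0/1 treatment column; `standardized_differences` is a hypothetical helper, and the package function may use a different pooling formula.

```python
import numpy as np
import pandas as pd


def standardized_differences(df, treatment, covariates):
    """Standardized mean difference per covariate (treated minus control)."""
    treated = df[df[treatment] == 1]
    control = df[df[treatment] == 0]
    result = {}
    for col in covariates:
        # Pool the two sample variances by simple averaging.
        pooled_sd = np.sqrt((treated[col].var() + control[col].var()) / 2.0)
        gap = treated[col].mean() - control[col].mean()
        result[col] = float(gap / pooled_sd) if pooled_sd > 0 else 0.0
    return result


data = pd.DataFrame({
    "t":   [0, 0, 0, 1, 1, 1],
    "age": [1.0, 2.0, 3.0, 3.0, 4.0, 5.0],
})
smd = standardized_differences(data, "t", ["age"])
```

Absolute values above roughly 0.1 are often read as a sign of imbalance, which matches the default threshold of plot_covariate_balance.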

causal_agent.methods.utils.check_overlap(df, treatment, propensity_scores, threshold=0.5)[source]

Check overlap in propensity scores between treated and control groups.

Parameters:

df (DataFrame) – DataFrame containing the data
treatment (str) – Name of treatment variable
propensity_scores (ndarray) – Array of propensity scores
threshold (float) – Threshold for sufficient overlap (proportion of range)

Returns:

Dictionary with overlap statistics

Return type:

Dict[str, Any]
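
One way to read "proportion of range" is the width of the common support divided by the full range of the scores. The sketch below implements that reading; `overlap_summary` and the keys of its result are hypothetical and may not match the statistics the package returns.

```python
import numpy as np


def overlap_summary(treated, propensity, threshold=0.5):
    """Share of the propensity-score range covered by both groups."""
    t = np.asarray(treated).astype(bool)
    ps = np.asarray(propensity, dtype=float)

    # Common support: the interval where both groups have scores.
    low = max(ps[t].min(), ps[~t].min())
    high = min(ps[t].max(), ps[~t].max())

    full_range = ps.max() - ps.min()
    share = max(high - low, 0.0) / full_range if full_range > 0 else 0.0
    return {
        "common_support": (float(low), float(high)),
        "overlap_share": float(share),
        "sufficient": bool(share >= threshold),
    }


summary = overlap_summary([0, 0, 0, 1, 1, 1], [0.1, 0.3, 0.5, 0.3, 0.5, 0.9])
```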

causal_agent.methods.utils.plot_propensity_overlap(df, treatment, propensity_scores, save_path=None)[source]

Plot overlap in propensity scores.

Parameters:

df (DataFrame) – DataFrame containing the data
treatment (str) – Name of treatment variable
propensity_scores (ndarray) – Array of propensity scores
save_path (str | None) – Optional path to save the plot

causal_agent.methods.utils.plot_covariate_balance(standardized_diffs, threshold=0.1, save_path=None)[source]

Plot standardized differences for covariates before and after matching.

Parameters:

standardized_diffs (Dict[str, float]) – Dictionary with standardized differences
threshold (float) – Threshold for acceptable balance
save_path (str | None) – Optional path to save the plot

causal_agent.methods.utils.check_temporal_structure(df)[source]

Check if dataset has temporal structure.

Parameters:

df (DataFrame) – DataFrame to check

Returns:

Dictionary with temporal structure information

Return type:

Dict[str, Any]

causal_agent.methods.utils.check_for_discontinuities(df, outcome, threshold_zscore=3.0)[source]

Check for potential discontinuities in continuous variables.

Parameters:

df (DataFrame) – DataFrame to check
outcome (str) – Name of outcome variable
threshold_zscore (float) – Z-score threshold for detecting discontinuities

Returns:

Dictionary with discontinuity information

Return type:

Dict[str, Any]

causal_agent.methods.utils.find_potential_instruments(df, treatment, outcome, correlation_threshold=0.3)[source]

Find potential instrumental variables.

Parameters:

df (DataFrame) – DataFrame to check
treatment (str) – Name of treatment variable
outcome (str) – Name of outcome variable
correlation_threshold (float) – Threshold for correlation with treatment

Returns:

Dictionary with potential instruments information

Return type:

Dict[str, Any]
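
A correlation screen of this kind can be sketched as below. It checks only relevance (association with the treatment); the exclusion restriction cannot be verified from correlations alone. `candidate_instruments` is a hypothetical helper and returns a plain list rather than the dictionary described above.

```python
import pandas as pd


def candidate_instruments(df, treatment, outcome, correlation_threshold=0.3):
    """Numeric columns whose |correlation| with the treatment meets the threshold."""
    others = df.drop(columns=[treatment, outcome]).select_dtypes("number")
    corr = others.corrwith(df[treatment]).abs()
    keep = corr[corr >= correlation_threshold].sort_values(ascending=False)
    return keep.index.tolist()


frame = pd.DataFrame({
    "z":     [1, 2, 3, 4, 5, 6],
    "noise": [1, -1, -1, 1, 1, -1],
    "t":     [1.1, 2.0, 2.9, 4.2, 5.0, 6.1],
    "y":     [2.0, 4.1, 6.2, 7.9, 10.1, 12.0],
})
candidates = candidate_instruments(frame, "t", "y")
```

Only `z` passes the screen here, because `noise` is nearly uncorrelated with the treatment.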

causal_agent.methods.utils.test_parallel_trends(df, treatment, outcome, time_var, unit_var)[source]

Test for parallel trends assumption in difference-in-differences.

Parameters:

df (DataFrame) – DataFrame to check
treatment (str) – Name of treatment variable
outcome (str) – Name of outcome variable
time_var (str) – Name of time variable
unit_var (str) – Name of unit variable

Returns:

Dictionary with parallel trends test results

Return type:

Dict[str, Any]

causal_agent.methods.utils.preprocess_data(df, treatment_var, outcome_var, covariates, verbose=True)[source]

Preprocess the dataset to handle missing values and encode categorical variables.

Parameters:

df (pd.DataFrame) – The dataset
treatment_var (str) – The treatment variable name
outcome_var (str) – The outcome variable name
covariates (list) – List of covariate variable names
verbose (bool) – Whether to print verbose output

Returns:

Preprocessed dataset, updated treatment var name, updated outcome var name, updated covariates list, and column mappings.

Return type:

Tuple[pd.DataFrame, str, str, List[str], Dict[str, Any]]
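
The general shape of such a preprocessing step can be sketched as follows: drop rows missing the treatment or outcome, fill numeric covariates with the median, and one-hot encode categorical covariates. `preprocess_sketch` is hypothetical, and the imputation and encoding choices here are assumptions that may differ from what preprocess_data actually does.

```python
import pandas as pd


def preprocess_sketch(df, treatment_var, outcome_var, covariates):
    """Return (clean df, treatment, outcome, covariate names, mappings)."""
    out = df.dropna(subset=[treatment_var, outcome_var]).copy()
    mappings = {}
    new_covariates = []
    for col in covariates:
        if pd.api.types.is_numeric_dtype(out[col]):
            # Numeric covariate: fill gaps with the column median.
            out[col] = out[col].fillna(out[col].median())
            new_covariates.append(col)
        else:
            # Categorical covariate: one-hot encode, dropping the first level.
            dummies = pd.get_dummies(out[col], prefix=col, drop_first=True, dtype=int)
            out = pd.concat([out.drop(columns=[col]), dummies], axis=1)
            mappings[col] = list(dummies.columns)
            new_covariates.extend(dummies.columns)
    return out, treatment_var, outcome_var, new_covariates, mappings


raw = pd.DataFrame({
    "t":    [1, 0, 1, None],
    "y":    [1.0, 2.0, 3.0, 4.0],
    "age":  [30.0, None, 50.0, 40.0],
    "city": ["a", "b", "a", "b"],
})
clean, t_name, y_name, covs, maps = preprocess_sketch(raw, "t", "y", ["age", "city"])
```

The returned mapping records which encoded columns came from which original column, which is what lets a caller translate results back to the original variable names.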

causal_agent.methods.utils.check_collinearity(df, covariates)[source]
