causal_agent.methods package

Causal inference methods for the causal_agent module.

This package contains implementations of various causal inference methods that can be selected and applied by the causal_agent pipeline.

class causal_agent.methods.CausalMethod[source]

Bases: ABC

Base class for all causal inference methods.

This abstract class defines the required methods that all causal inference implementations must provide. It ensures a consistent interface across different methods like propensity score matching, instrumental variables, etc.

Each implementation should handle the specifics of the causal inference method while conforming to this interface.

abstractmethod validate_assumptions(df, treatment, outcome, covariates)[source]

Validate method assumptions against the dataset.

Parameters:
  • df (DataFrame) – DataFrame containing the dataset

  • treatment (str) – Name of the treatment variable column

  • outcome (str) – Name of the outcome variable column

  • covariates (List[str]) – List of covariate column names

Returns:

Dict containing validation results with keys:

  • assumptions_valid (bool): Whether all assumptions are met

  • failed_assumptions (List[str]): List of failed assumptions

  • warnings (List[str]): List of warnings

  • suggestions (List[str]): Suggestions for addressing issues

Return type:

Dict

abstractmethod estimate_effect(df, treatment, outcome, covariates)[source]

Estimate causal effect using this method.

Parameters:
  • df (DataFrame) – DataFrame containing the dataset

  • treatment (str) – Name of the treatment variable column

  • outcome (str) – Name of the outcome variable column

  • covariates (List[str]) – List of covariate column names

Returns:

Dict containing estimation results with keys:

  • effect_estimate (float): Estimated causal effect

  • confidence_interval (tuple): Confidence interval (lower, upper)

  • p_value (float): P-value of the estimate

  • additional_metrics (Dict): Any method-specific metrics

Return type:

Dict

abstractmethod generate_code(dataset_path, treatment, outcome, covariates)[source]

Generate executable code for this causal method.

Parameters:
  • dataset_path (str) – Path to the dataset file

  • treatment (str) – Name of the treatment variable column

  • outcome (str) – Name of the outcome variable column

  • covariates (List[str]) – List of covariate column names

Returns:

String containing executable Python code implementing this method

Return type:

str

abstractmethod explain()[source]

Explain this causal method, its assumptions, and when to use it.

Returns:

String with detailed explanation of the method

Return type:

str
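As an illustration of the interface, the sketch below implements all four abstract methods for a deliberately simple estimator. The CausalMethod class here is a local stand-in that mirrors the documented signatures, not the package's class, and NaiveDifferenceInMeans is a hypothetical name. A dict of lists stands in for the DataFrame so the example runs with the standard library alone.

```python
from abc import ABC, abstractmethod
from statistics import NormalDist, mean, variance
from typing import Any, Dict


class CausalMethod(ABC):
    """Local stand-in mirroring the documented interface (not the real class)."""

    @abstractmethod
    def validate_assumptions(self, df, treatment, outcome, covariates) -> Dict[str, Any]: ...

    @abstractmethod
    def estimate_effect(self, df, treatment, outcome, covariates) -> Dict[str, Any]: ...

    @abstractmethod
    def generate_code(self, dataset_path, treatment, outcome, covariates) -> str: ...

    @abstractmethod
    def explain(self) -> str: ...


class NaiveDifferenceInMeans(CausalMethod):
    """Hypothetical method: unadjusted difference in group means."""

    def validate_assumptions(self, df, treatment, outcome, covariates):
        values = set(df[treatment])
        failed = [] if values == {0, 1} else ["binary_treatment"]
        return {
            "assumptions_valid": not failed,
            "failed_assumptions": failed,
            "warnings": ["covariates are ignored"] if covariates else [],
            "suggestions": ["encode treatment as 0/1"] if failed else [],
        }

    def estimate_effect(self, df, treatment, outcome, covariates):
        pairs = list(zip(df[treatment], df[outcome]))
        treated = [y for t, y in pairs if t == 1]
        control = [y for t, y in pairs if t == 0]
        effect = mean(treated) - mean(control)
        # Unequal-variance standard error with a normal approximation.
        se = (variance(treated) / len(treated) + variance(control) / len(control)) ** 0.5
        z = NormalDist().inv_cdf(0.975)
        p_value = 2 * (1 - NormalDist().cdf(abs(effect) / se))
        return {
            "effect_estimate": effect,
            "confidence_interval": (effect - z * se, effect + z * se),
            "p_value": p_value,
            "additional_metrics": {"standard_error": se},
        }

    def generate_code(self, dataset_path, treatment, outcome, covariates):
        # The generated script uses pandas; it is returned as text, not executed here.
        return (
            "import pandas as pd\n"
            f"df = pd.read_csv({dataset_path!r})\n"
            f"means = df.groupby({treatment!r})[{outcome!r}].mean()\n"
            "print(means[1] - means[0])\n"
        )

    def explain(self):
        return "Compares mean outcomes of treated and control units; valid under randomization."


data = {"t": [1, 1, 1, 0, 0, 0], "y": [5.0, 6.0, 7.0, 2.0, 3.0, 4.0]}
method = NaiveDifferenceInMeans()
report = method.validate_assumptions(data, "t", "y", [])
result = method.estimate_effect(data, "t", "y", [])
```

Both returned dictionaries use exactly the keys listed in the method descriptions above, which is what lets a pipeline treat different methods interchangeably.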

causal_agent.methods.psm_estimate_effect(df, treatment, outcome, covariates, **kwargs)

Estimate the ATT using Propensity Score Matching. Tries DoWhy’s PSM first and falls back to a custom implementation if DoWhy fails. The standard error is always bootstrapped using the custom implementation, regardless of which path produced the point estimate.
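To make the matching idea concrete, here is a minimal sketch of an ATT estimate from 1-nearest-neighbour matching with replacement. It is not the package's custom implementation: the function name att_nearest_neighbor is hypothetical, and the propensity scores are assumed to have been estimated already (for example by a logistic regression of treatment on the covariates).

```python
from statistics import mean
from typing import Sequence


def att_nearest_neighbor(
    treatment: Sequence[int],
    outcome: Sequence[float],
    propensity: Sequence[float],
) -> float:
    """ATT from 1-nearest-neighbour matching (with replacement) on the propensity score."""
    rows = list(zip(treatment, outcome, propensity))
    controls = [(p, y) for t, y, p in rows if t == 0]
    treated = [(p, y) for t, y, p in rows if t == 1]
    if not controls or not treated:
        raise ValueError("need at least one treated and one control unit")

    diffs = []
    for p_t, y_t in treated:
        # Closest control in propensity-score distance; controls may be reused.
        _, y_c = min(controls, key=lambda c: abs(c[0] - p_t))
        diffs.append(y_t - y_c)
    return mean(diffs)


t = [1, 1, 0, 0, 0]
y = [10.0, 8.0, 7.0, 5.0, 1.0]
ps = [0.8, 0.6, 0.75, 0.55, 0.1]
att = att_nearest_neighbor(t, y, ps)
```

Averaging only over treated units is what makes this an ATT rather than an ATE. The control with score 0.1 is never used because no treated unit is close to it.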

causal_agent.methods.psw_estimate_effect(df, treatment, outcome, covariates, **kwargs)

Generic propensity score weighting (IPW) implementation.

Parameters:
  • df (DataFrame) – Dataset containing causal variables

  • treatment (str) – Name of treatment variable

  • outcome (str) – Name of outcome variable

  • covariates (List[str]) – List of covariate names

  • **kwargs – Method-specific parameters (e.g., weight_type, trim_threshold, query)

Returns:

Dictionary with effect estimate and diagnostics

Return type:

Dict[str, Any]
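The weighting idea can be sketched as a normalized (Hajek) inverse-probability-weighted ATE. This is an illustration under assumptions, not the package's code: ipw_ate and its trim parameter are hypothetical names (trim plays a role similar to the trim_threshold keyword mentioned above), and the propensity scores are assumed to be precomputed.

```python
from typing import Sequence


def ipw_ate(
    treatment: Sequence[int],
    outcome: Sequence[float],
    propensity: Sequence[float],
    trim: float = 0.01,
) -> float:
    """Normalized (Hajek) inverse-probability-weighted estimate of the ATE."""
    num1 = den1 = num0 = den0 = 0.0
    for t, y, p in zip(treatment, outcome, propensity):
        # Clip extreme scores so that no single unit receives an unbounded weight.
        p = min(max(p, trim), 1.0 - trim)
        if t == 1:
            w = 1.0 / p
            num1 += w * y
            den1 += w
        else:
            w = 1.0 / (1.0 - p)
            num0 += w * y
            den0 += w
    if den1 == 0.0 or den0 == 0.0:
        raise ValueError("need at least one treated and one control unit")
    # Weighted mean of treated outcomes minus weighted mean of control outcomes.
    return num1 / den1 - num0 / den0


ate = ipw_ate([1, 1, 0, 0], [4.0, 6.0, 1.0, 3.0], [0.5, 0.5, 0.5, 0.5])
```

With a constant propensity score of 0.5 every unit gets the same weight, so the estimate reduces to the plain difference in means.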

causal_agent.methods.iv_estimate_effect(df, treatment, outcome, covariates, query=None, dataset_description=None, llm=None, **kwargs)

causal_agent.methods.did_estimate_effect(df, treatment, outcome, covariates, dataset_description=None, query=None, **kwargs)

Difference-in-Differences estimation using DoWhy with Statsmodels fallback.

Parameters:
  • df (DataFrame) – Dataset containing causal variables

  • treatment (str) – Name of treatment variable (or variable indicating treated group)

  • outcome (str) – Name of outcome variable

  • covariates (List[str]) – List of covariate names

  • dataset_description (str | None) – Optional description of the dataset

  • **kwargs – Method-specific parameters (e.g., time_var, group_var, query, llm instance if needed)

Returns:

Dictionary with effect estimate and diagnostics

Return type:

Dict[str, Any]
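For the canonical two-group, two-period case, the estimand can be sketched directly from four cell means. The function did_2x2 and the column names treated, post and y are hypothetical, and a list of dicts stands in for the DataFrame; the package itself is documented as going through DoWhy or Statsmodels rather than this shortcut.

```python
from statistics import mean
from typing import Dict, List


def did_2x2(rows: List[Dict[str, float]], group_key: str = "treated",
            time_key: str = "post", outcome_key: str = "y") -> float:
    """Difference-in-differences from the four group-by-period cell means."""

    def cell(group: int, period: int) -> float:
        values = [r[outcome_key] for r in rows
                  if r[group_key] == group and r[time_key] == period]
        if not values:
            raise ValueError(f"empty cell: group={group}, period={period}")
        return mean(values)

    # Change in the treated group minus change in the control group.
    return (cell(1, 1) - cell(1, 0)) - (cell(0, 1) - cell(0, 0))


rows = [
    {"treated": 1, "post": 0, "y": 10.0}, {"treated": 1, "post": 0, "y": 12.0},
    {"treated": 1, "post": 1, "y": 16.0}, {"treated": 1, "post": 1, "y": 18.0},
    {"treated": 0, "post": 0, "y": 8.0},  {"treated": 0, "post": 0, "y": 10.0},
    {"treated": 0, "post": 1, "y": 10.0}, {"treated": 0, "post": 1, "y": 12.0},
]
effect = did_2x2(rows)
```

The same number is the coefficient on the interaction term in an OLS regression of y on treated, post and treated × post, which is the form a regression-based fallback would typically estimate.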

causal_agent.methods.rdd_estimate_effect(df, treatment, outcome, running_variable, cutoff_value, covariates=None, bandwidth=None, query=None, llm=None, **kwargs)

Estimates the causal effect using Regression Discontinuity Design.

Tries the DoWhy implementation first if use_dowhy=True; otherwise uses the fallback implementation.

Parameters:
  • df (DataFrame) – Input DataFrame.

  • treatment (str) – Name of the treatment variable (often implicitly defined by the cutoff). DoWhy might still need it; the fallback doesn’t use it directly.

  • outcome (str) – Name of the outcome variable.

  • running_variable (str) – Name of the variable determining treatment assignment.

  • cutoff_value – The threshold value for the running variable.

  • covariates (List[str] | None) – Optional list of covariate names (support varies).

  • bandwidth (float | None) – Optional bandwidth around the cutoff. If None, a default is used.

  • use_dowhy – Whether to attempt using the DoWhy library first. Not part of the signature shown above, so presumably supplied through **kwargs.

  • query (str | None) – Optional user query for context.

  • llm (langchain.chat_models.base.BaseChatModel | None) – Optional Language Model instance.

  • **kwargs – Additional keyword arguments for underlying methods.

Returns:

Dictionary containing estimation results.

Return type:

Dict[str, Any]
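A sharp RDD estimate can be sketched as the gap between two local linear fits evaluated at the cutoff. This is a simplified illustration with a uniform kernel, not the package's fallback: sharp_rdd and _fit_line are hypothetical names, and the bandwidth must be given explicitly here because the package's default bandwidth rule is not documented.

```python
from statistics import mean
from typing import Sequence, Tuple


def _fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """OLS intercept and slope for y = a + b * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("running variable has no variation on this side")
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    return y_bar - slope * x_bar, slope


def sharp_rdd(running: Sequence[float], outcome: Sequence[float],
              cutoff: float, bandwidth: float) -> float:
    """Jump in the outcome at the cutoff from separate linear fits on each side."""
    pairs = list(zip(running, outcome))
    # Centre the running variable so each intercept is the fitted value at the cutoff.
    left = [(x - cutoff, y) for x, y in pairs if cutoff - bandwidth <= x < cutoff]
    right = [(x - cutoff, y) for x, y in pairs if cutoff <= x <= cutoff + bandwidth]
    if len(left) < 2 or len(right) < 2:
        raise ValueError("need at least two observations on each side of the cutoff")
    a_left, _ = _fit_line(*zip(*left))
    a_right, _ = _fit_line(*zip(*right))
    return a_right - a_left


x = [-3.0, -2.0, -1.0, 1.0, 2.0, 3.0]
y = [1 + 0.5 * xi + (2.0 if xi >= 0 else 0.0) for xi in x]  # true jump of 2 at x = 0
jump = sharp_rdd(x, y, cutoff=0.0, bandwidth=3.0)
```

Note that the treatment column never appears: in a sharp design, treatment is fully determined by which side of the cutoff the running variable falls on.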

causal_agent.methods.dim_estimate_effect(df, treatment, outcome, query=None, llm=None, **kwargs)

Estimates the causal effect using Difference in Means (via OLS).

Ignores any provided covariates.

Parameters:
  • df (DataFrame) – Input DataFrame.

  • treatment (str) – Name of the binary treatment variable column (should be 0 or 1).

  • outcome (str) – Name of the outcome variable column.

  • query (str | None) – Optional user query for context.

  • llm (langchain.chat_models.base.BaseChatModel | None) – Optional Language Model instance.

  • **kwargs – Additional keyword arguments (ignored).

Returns:

Dictionary containing estimation results, with keys:

  • 'effect_estimate': The difference in means (treatment coefficient).

  • 'p_value': The p-value associated with the difference.

  • 'confidence_interval': The 95% confidence interval for the difference.

  • 'standard_error': The standard error of the difference.

  • 'formula': The regression formula used.

  • 'model_summary': Summary object from statsmodels.

  • 'diagnostics': Basic group statistics.

  • 'interpretation': LLM interpretation.

Return type:

Dict
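The phrase "Difference in Means (via OLS)" relies on a standard identity: when the only regressor is a 0/1 treatment indicator, the OLS slope equals the difference between the two group means. The short check below demonstrates this with made-up numbers; ols_slope is a hypothetical helper, not part of the package.

```python
from statistics import mean
from typing import Sequence


def ols_slope(x: Sequence[float], y: Sequence[float]) -> float:
    """Slope of the simple regression y = a + b * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


treatment = [1, 1, 1, 0, 0]
outcome = [7.0, 9.0, 8.0, 4.0, 6.0]

slope = ols_slope(treatment, outcome)
treated_mean = mean(y for t, y in zip(treatment, outcome) if t == 1)
control_mean = mean(y for t, y in zip(treatment, outcome) if t == 0)
difference = treated_mean - control_mean
```

Fitting the regression rather than subtracting means directly is what yields the standard error, p-value and confidence interval listed above as by-products of the same fit.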

causal_agent.methods.lr_estimate_effect(df, treatment, outcome, covariates=None, query_str=None, llm=None, **kwargs)

Estimates the causal effect using Linear Regression (OLS).

Parameters:
  • df (DataFrame) – Input DataFrame.

  • treatment (str) – Name of the treatment variable column.

  • outcome (str) – Name of the outcome variable column.

  • covariates (List[str] | None) – Optional list of covariate names.

  • query_str (str | None) – Optional user query for context (e.g., for LLM).

  • llm (langchain.chat_models.base.BaseChatModel | None) – Optional Language Model instance.

  • **kwargs – Additional keyword arguments.

Returns:

Dictionary containing estimation results, with keys:

  • 'effect_estimate': The estimated coefficient for the treatment variable.

  • 'p_value': The p-value associated with the treatment coefficient.

  • 'confidence_interval': The 95% confidence interval for the effect.

  • 'standard_error': The standard error of the treatment coefficient.

  • 'formula': The regression formula used.

  • 'model_summary': Summary object from statsmodels.

  • 'diagnostics': Placeholder for diagnostic results.

  • 'interpretation': Placeholder for LLM interpretation.

Return type:

Dict

causal_agent.methods.ba_estimate_effect(df, treatment, outcome, covariates, query=None, llm=None, **kwargs)

Estimates the causal effect using Backdoor Adjustment (via OLS regression).

Assumes the provided covariates list satisfies the backdoor criterion.

Parameters:
  • df (DataFrame) – Input DataFrame.

  • treatment (str) – Name of the treatment variable column.

  • outcome (str) – Name of the outcome variable column.

  • covariates (List[str]) – List of covariate names forming the backdoor adjustment set.

  • query (str | None) – Optional user query for context (e.g., for LLM).

  • llm (langchain.chat_models.base.BaseChatModel | None) – Optional Language Model instance.

  • **kwargs – Additional keyword arguments.

Returns:

Dictionary containing estimation results, with keys:

  • 'effect_estimate': The estimated coefficient for the treatment variable.

  • 'p_value': The p-value associated with the treatment coefficient.

  • 'confidence_interval': The 95% confidence interval for the effect.

  • 'standard_error': The standard error of the treatment coefficient.

  • 'formula': The regression formula used.

  • 'model_summary': Summary object from statsmodels.

  • 'diagnostics': Placeholder for diagnostic results.

  • 'interpretation': LLM interpretation.

Return type:

Dict
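To show why adjusting for a backdoor set matters, the sketch below compares a naive slope with an adjusted one on data where a single covariate z confounds the effect. It is a simplification under stated assumptions: only one covariate is handled, and the adjustment uses the Frisch-Waugh-Lovell residual-on-residual regression, which gives the same treatment coefficient as the multiple regression y ~ t + z. The helper names are hypothetical.

```python
from statistics import mean
from typing import List, Sequence


def _slope(x: Sequence[float], y: Sequence[float]) -> float:
    """Slope of the simple regression y = a + b * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx


def _residualize(v: Sequence[float], z: Sequence[float]) -> List[float]:
    """Residuals of v after regressing it on z (with an intercept)."""
    b = _slope(z, v)
    a = mean(v) - b * mean(z)
    return [vi - (a + b * zi) for vi, zi in zip(v, z)]


def adjusted_effect(t: Sequence[float], y: Sequence[float], z: Sequence[float]) -> float:
    """Treatment coefficient from y ~ t + z via residual-on-residual regression."""
    t_res = _residualize(t, z)
    y_res = _residualize(y, z)
    return sum(a * b for a, b in zip(t_res, y_res)) / sum(a * a for a in t_res)


z = [0, 0, 0, 1, 1, 1]                      # confounder
t = [0, 0, 1, 0, 1, 1]                      # treatment, more likely when z = 1
y = [1 + 2 * ti + 3 * zi for ti, zi in zip(t, z)]  # true effect of t is 2

naive = _slope(t, y)          # ignores z and absorbs part of its effect
adjusted = adjusted_effect(t, y, z)
```

Here the naive slope is 3 while the adjusted coefficient recovers the true effect of 2. The adjustment is only valid if the covariates really do satisfy the backdoor criterion, which the function assumes rather than checks.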

causal_agent.methods.estimate_effect_gps(df, treatment, outcome, covariates, **kwargs)[source]

Estimates the causal effect using the Generalized Propensity Score method for continuous treatments.

This function will be called by the method_executor_tool.

Parameters:
  • df (DataFrame) – The input DataFrame.

  • treatment (str) – The name of the continuous treatment variable column.

  • outcome (str) – The name of the outcome variable column.

  • covariates (List[str]) – A list of covariate column names.

  • **kwargs (Any) – Additional arguments for controlling the estimation, including:

      - gps_model_spec (dict): Specification for the GPS model (T ~ X).

      - outcome_model_spec (dict): Specification for the outcome model (Y ~ T, GPS).

      - t_values_range (list or dict): Specification for treatment levels for ADRF.

      - n_bootstraps (int): Number of bootstrap replications for SEs.

Returns:

A dictionary containing the estimation results, including:

  • "effect_estimate": Typically the ADRF or a specific contrast.

  • "standard_error": Standard error for the primary effect estimate.

  • "confidence_interval": Confidence interval for the primary estimate.

  • "adrf_curve": Data representing the Average Dose-Response Function.

  • "specific_contrasts": Any calculated specific contrasts.

  • "diagnostics": Results from diagnostic checks (e.g., balance).

  • "method_details": Description of the method and models used.

  • "parameters_used": Dictionary of parameters used.

Return type:

Dict

Submodules

causal_agent.methods.causal_method module

Abstract base class for all causal inference methods.

This module defines the interface that all causal inference methods must implement, ensuring consistent behavior across different methods.

class causal_agent.methods.causal_method.CausalMethod[source]

Bases: ABC

Base class for all causal inference methods.

This abstract class defines the required methods that all causal inference implementations must provide. It ensures a consistent interface across different methods like propensity score matching, instrumental variables, etc.

Each implementation should handle the specifics of the causal inference method while conforming to this interface.

abstractmethod validate_assumptions(df, treatment, outcome, covariates)[source]

Validate method assumptions against the dataset.

Parameters:
  • df (DataFrame) – DataFrame containing the dataset

  • treatment (str) – Name of the treatment variable column

  • outcome (str) – Name of the outcome variable column

  • covariates (List[str]) – List of covariate column names

Returns:

Dict containing validation results with keys:

  • assumptions_valid (bool): Whether all assumptions are met

  • failed_assumptions (List[str]): List of failed assumptions

  • warnings (List[str]): List of warnings

  • suggestions (List[str]): Suggestions for addressing issues

Return type:

Dict

abstractmethod estimate_effect(df, treatment, outcome, covariates)[source]

Estimate causal effect using this method.

Parameters:
  • df (DataFrame) – DataFrame containing the dataset

  • treatment (str) – Name of the treatment variable column

  • outcome (str) – Name of the outcome variable column

  • covariates (List[str]) – List of covariate column names

Returns:

Dict containing estimation results with keys:

  • effect_estimate (float): Estimated causal effect

  • confidence_interval (tuple): Confidence interval (lower, upper)

  • p_value (float): P-value of the estimate

  • additional_metrics (Dict): Any method-specific metrics

Return type:

Dict

abstractmethod generate_code(dataset_path, treatment, outcome, covariates)[source]

Generate executable code for this causal method.

Parameters:
  • dataset_path (str) – Path to the dataset file

  • treatment (str) – Name of the treatment variable column

  • outcome (str) – Name of the outcome variable column

  • covariates (List[str]) – List of covariate column names

Returns:

String containing executable Python code implementing this method

Return type:

str

abstractmethod explain()[source]

Explain this causal method, its assumptions, and when to use it.

Returns:

String with detailed explanation of the method

Return type:

str

causal_agent.methods.utils module

Utility functions for causal inference methods.

This module provides common utility functions used across different causal inference methods.

causal_agent.methods.utils.check_binary_treatment(treatment_series)[source]

Check if treatment variable is binary.

Parameters:

treatment_series (Series) – Series containing treatment variable

Returns:

Boolean indicating if treatment is binary

Return type:

bool

causal_agent.methods.utils.calculate_standardized_differences(df, treatment, covariates)[source]

Calculate standardized differences between treated and control groups.

Parameters:
  • df (DataFrame) – DataFrame containing the data

  • treatment (str) – Name of treatment variable

  • covariates (List[str]) – List of covariate variable names

Returns:

Dictionary with standardized differences for each covariate

Return type:

Dict[str, float]
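One common definition of the standardized difference divides the gap in group means by the pooled standard deviation, sqrt((var_treated + var_control) / 2). The sketch below uses that definition, which may differ in detail from the package's own; standardized_differences is a hypothetical name and a dict of lists stands in for the DataFrame.

```python
from statistics import mean, variance
from typing import Dict, List, Mapping, Sequence


def standardized_differences(
    data: Mapping[str, Sequence[float]], treatment: str, covariates: List[str]
) -> Dict[str, float]:
    """Standardized mean difference (treated minus control) for each covariate."""
    result = {}
    for name in covariates:
        pairs = list(zip(data[treatment], data[name]))
        treated = [x for t, x in pairs if t == 1]
        control = [x for t, x in pairs if t == 0]
        # Pool the two sample variances with equal weight.
        pooled_sd = ((variance(treated) + variance(control)) / 2) ** 0.5
        diff = mean(treated) - mean(control)
        result[name] = diff / pooled_sd if pooled_sd > 0 else 0.0
    return result


data = {
    "t": [1, 1, 1, 0, 0, 0],
    "age": [3.0, 5.0, 7.0, 2.0, 4.0, 6.0],
    "income": [1.0, 2.0, 3.0, 1.0, 2.0, 3.0],
}
smd = standardized_differences(data, "t", ["age", "income"])
```

In this toy data, age has a standardized difference of 0.5 and would be flagged as imbalanced under a 0.1 rule of thumb, while income is perfectly balanced.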

causal_agent.methods.utils.check_overlap(df, treatment, propensity_scores, threshold=0.5)[source]

Check overlap in propensity scores between treated and control groups.

Parameters:
  • df (DataFrame) – DataFrame containing the data

  • treatment (str) – Name of treatment variable

  • propensity_scores (ndarray) – Array of propensity scores

  • threshold (float) – Threshold for sufficient overlap (proportion of range)

Returns:

Dictionary with overlap statistics

Return type:

Dict[str, Any]
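The exact statistics returned by check_overlap are not listed here, so the following is only one plausible reading of "proportion of range": the width of the region where both groups have propensity scores, divided by the full range of scores. The function name overlap_summary and the dictionary keys are assumptions made for illustration.

```python
from typing import Any, Dict, Sequence


def overlap_summary(
    treatment: Sequence[int],
    propensity_scores: Sequence[float],
    threshold: float = 0.5,
) -> Dict[str, Any]:
    """Share of the propensity-score range covered by both groups."""
    pairs = list(zip(treatment, propensity_scores))
    treated = [p for t, p in pairs if t == 1]
    control = [p for t, p in pairs if t == 0]
    if not treated or not control:
        raise ValueError("need at least one treated and one control unit")

    # Common support: scores that lie within the range of both groups.
    low = max(min(treated), min(control))
    high = min(max(treated), max(control))
    full_range = max(propensity_scores) - min(propensity_scores)
    width = max(high - low, 0.0)
    proportion = width / full_range if full_range > 0 else 0.0
    return {
        "common_support": (low, high),
        "overlap_proportion": proportion,
        "sufficient_overlap": proportion >= threshold,
    }


stats = overlap_summary([1, 1, 1, 0, 0, 0], [0.4, 0.6, 0.9, 0.1, 0.3, 0.7])
```

In this example the common support is [0.4, 0.7] out of a full range of [0.1, 0.9], so the overlap proportion falls below the default threshold of 0.5.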

causal_agent.methods.utils.plot_propensity_overlap(df, treatment, propensity_scores, save_path=None)[source]

Plot overlap in propensity scores.

Parameters:
  • df (DataFrame) – DataFrame containing the data

  • treatment (str) – Name of treatment variable

  • propensity_scores (ndarray) – Array of propensity scores

  • save_path (str | None) – Optional path to save the plot

causal_agent.methods.utils.plot_covariate_balance(standardized_diffs, threshold=0.1, save_path=None)[source]

Plot standardized differences for covariates before and after matching.

Parameters:
  • standardized_diffs (Dict[str, float]) – Dictionary with standardized differences

  • threshold (float) – Threshold for acceptable balance

  • save_path (str | None) – Optional path to save the plot

causal_agent.methods.utils.check_temporal_structure(df)[source]

Check if dataset has temporal structure.

Parameters:

df (DataFrame) – DataFrame to check

Returns:

Dictionary with temporal structure information

Return type:

Dict[str, Any]

causal_agent.methods.utils.check_for_discontinuities(df, outcome, threshold_zscore=3.0)[source]

Check for potential discontinuities in continuous variables.

Parameters:
  • df (DataFrame) – DataFrame to check

  • outcome (str) – Name of outcome variable

  • threshold_zscore (float) – Z-score threshold for detecting discontinuities

Returns:

Dictionary with discontinuity information

Return type:

Dict[str, Any]

causal_agent.methods.utils.find_potential_instruments(df, treatment, outcome, correlation_threshold=0.3)[source]

Find potential instrumental variables.

Parameters:
  • df (DataFrame) – DataFrame to check

  • treatment (str) – Name of treatment variable

  • outcome (str) – Name of outcome variable

  • correlation_threshold (float) – Threshold for correlation with treatment

Returns:

Dictionary with potential instruments information

Return type:

Dict[str, Any]

Test for parallel trends assumption in difference-in-differences.

Parameters:
  • df (DataFrame) – DataFrame to check

  • treatment (str) – Name of treatment variable

  • outcome (str) – Name of outcome variable

  • time_var (str) – Name of time variable

  • unit_var (str) – Name of unit variable

Returns:

Dictionary with parallel trends test results

Return type:

Dict[str, Any]

causal_agent.methods.utils.preprocess_data(df, treatment_var, outcome_var, covariates, verbose=True)[source]

Preprocess the dataset to handle missing values and encode categorical variables.

Parameters:
  • df (pd.DataFrame) – The dataset

  • treatment_var (str) – The treatment variable name

  • outcome_var (str) – The outcome variable name

  • covariates (list) – List of covariate variable names

  • verbose (bool) – Whether to print verbose output

Returns:

Preprocessed dataset, updated treatment var name, updated outcome var name, updated covariates list, and column mappings.

Return type:

Tuple[pd.DataFrame, str, str, List[str], Dict[str, Any]]

causal_agent.methods.utils.check_collinearity(df, covariates)[source]

Subpackages