causal_agent.methods.instrumental_variable package

Submodules

causal_agent.methods.instrumental_variable.diagnostics module

causal_agent.methods.instrumental_variable.diagnostics.calculate_first_stage_f_statistic(df, treatment, instruments, covariates)

Calculates the F-statistic for instrument relevance in the first-stage regression.

Regresses the treatment on the instruments and covariates (treatment ~ instruments + covariates) and tests the joint significance of the instrument coefficients.

Parameters:
  • df (DataFrame) – Input DataFrame.

  • treatment (str) – Name of the treatment variable.

  • instruments (List[str]) – List of instrument variable names.

  • covariates (List[str]) – List of covariate names.

Returns:

A tuple containing (F-statistic, p-value). Returns (None, None) on error.

Return type:

Tuple[float | None, float | None]
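A minimal sketch of the computation described above, assuming a classical (non-robust) joint F-test built from a restricted fit (constant + covariates) and an unrestricted fit (constant + covariates + instruments): F = ((RSS_r - RSS_u) / q) / (RSS_u / (n - k)), with q instruments and k unrestricted regressors. The helper name `first_stage_f_sketch` is hypothetical and this is not the package's implementation.

```python
import numpy as np
import pandas as pd
from scipy import stats


def first_stage_f_sketch(df: pd.DataFrame, treatment: str,
                         instruments: list, covariates: list):
    """Hypothetical sketch: joint F-test of the instrument coefficients in
    treatment ~ const + covariates + instruments (homoskedastic errors)."""
    try:
        y = df[treatment].to_numpy(dtype=float)
        n = len(y)
        const = np.ones((n, 1))
        X_cov = df[covariates].to_numpy(dtype=float)   # shape (n, 0) if no covariates
        X_inst = df[instruments].to_numpy(dtype=float)

        X_r = np.hstack([const, X_cov])           # restricted: instruments excluded
        X_u = np.hstack([const, X_cov, X_inst])   # unrestricted: instruments included

        def rss(X):
            beta, *_ = np.linalg.lstsq(X, y, rcond=None)
            resid = y - X @ beta
            return float(resid @ resid)

        q = X_inst.shape[1]            # number of restrictions tested
        df_resid = n - X_u.shape[1]    # residual degrees of freedom
        rss_r, rss_u = rss(X_r), rss(X_u)
        f_stat = ((rss_r - rss_u) / q) / (rss_u / df_resid)
        p_value = float(stats.f.sf(f_stat, q, df_resid))
        return f_stat, p_value
    except Exception:
        # Mirror the documented contract: (None, None) on error.
        return None, None
```

A large F (and small p-value) indicates that the instruments move the treatment after controlling for the covariates.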

causal_agent.methods.instrumental_variable.diagnostics.run_overidentification_test(sm_results, df, treatment, outcome, instruments, covariates)

Runs an overidentification test (Sargan-Hansen) if applicable.

This test is only applicable when the model is overidentified, i.e., when the number of instruments exceeds the number of endogenous regressors (typically one: the treatment variable).

Requires results from a statsmodels IV estimation.

Parameters:
  • sm_results (Any | None) – The fitted results object from statsmodels IV2SLS.fit().

  • df (DataFrame) – Input DataFrame.

  • treatment (str) – Name of the treatment variable.

  • outcome (str) – Name of the outcome variable.

  • instruments (List[str]) – List of instrument variable names.

  • covariates (List[str]) – List of covariate names.

Returns:

(test_statistic, p_value, status_message) or (None, None, error_message)

Return type:

Tuple
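The Sargan statistic can be sketched as n·R² from regressing the 2SLS residuals on all exogenous variables, compared against a chi-squared distribution with (number of instruments - number of endogenous regressors) degrees of freedom. This sketch assumes a single endogenous regressor and homoskedastic errors, recomputes 2SLS itself with NumPy instead of reading a statsmodels results object, and uses the hypothetical name `sargan_sketch`; the heteroskedasticity-robust Hansen J variant is not shown.

```python
import numpy as np
import pandas as pd
from scipy import stats


def sargan_sketch(df: pd.DataFrame, treatment: str, outcome: str,
                  instruments: list, covariates: list):
    """Hypothetical sketch of the Sargan test with one endogenous regressor."""
    n_endog = 1
    if len(instruments) <= n_endog:
        return None, None, "Not applicable: model is exactly identified."
    n = len(df)
    const = np.ones((n, 1))
    W = df[covariates].to_numpy(dtype=float)
    Z = np.hstack([const, W, df[instruments].to_numpy(dtype=float)])  # exogenous set
    X = np.hstack([const, W, df[[treatment]].to_numpy(dtype=float)])  # regressors
    y = df[outcome].to_numpy(dtype=float)

    # Stage 1: project the regressors onto the exogenous set.
    gamma, *_ = np.linalg.lstsq(Z, X, rcond=None)
    X_hat = Z @ gamma
    # Stage 2: fit on projected regressors; residuals use the original X.
    beta, *_ = np.linalg.lstsq(X_hat, y, rcond=None)
    resid = y - X @ beta

    # Auxiliary regression of the 2SLS residuals on all exogenous variables.
    delta, *_ = np.linalg.lstsq(Z, resid, rcond=None)
    fitted = Z @ delta
    r2 = 1.0 - np.sum((resid - fitted) ** 2) / np.sum((resid - resid.mean()) ** 2)
    stat = float(n * r2)
    dof = len(instruments) - n_endog
    p_value = float(stats.chi2.sf(stat, dof))
    return stat, p_value, f"Sargan test with {dof} degree(s) of freedom."
```

A small p-value means the instruments disagree about the effect, so at least one of them likely violates the exclusion restriction; the test cannot say which one.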

causal_agent.methods.instrumental_variable.diagnostics.run_iv_diagnostics(df, treatment, outcome, instruments, covariates, sm_results=None, dw_results=None)

Runs standard IV diagnostic checks.

Parameters:
  • df (DataFrame) – Input DataFrame.

  • treatment (str) – Name of the treatment variable.

  • outcome (str) – Name of the outcome variable.

  • instruments (List[str]) – List of instrument variable names.

  • covariates (List[str]) – List of covariate names.

  • sm_results (Any | None) – Optional fitted results object from statsmodels IV2SLS.fit().

  • dw_results (Any | None) – Optional results object from DoWhy (structure may vary).

Returns:

Dictionary containing diagnostic results.

Return type:

Dict[str, Any]
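One way such an aggregation could look, assuming the dictionary is assembled from the tuple outputs of the two helpers above. The keys `first_stage_f_statistic` and `overid_test` are taken from the examples given for `interpret_iv_results`; the remaining keys, the helper name, and the weak-instrument flag at F < 10 (a common rule of thumb) are assumptions for illustration.

```python
def summarize_iv_diagnostics_sketch(first_stage, overid, weak_threshold=10.0):
    """Hypothetical sketch: merge (F, p) and (stat, p, message) tuples into
    one diagnostics dictionary. Most key names are illustrative only."""
    f_stat, f_p = first_stage
    overid_stat, overid_p, overid_msg = overid
    return {
        "first_stage_f_statistic": f_stat,
        "first_stage_p_value": f_p,
        # Rule of thumb: a first-stage F below ~10 suggests weak instruments.
        "weak_instruments": None if f_stat is None else f_stat < weak_threshold,
        "overid_test": {
            "statistic": overid_stat,
            "p_value": overid_p,
            "message": overid_msg,
        },
    }
```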

causal_agent.methods.instrumental_variable.estimator module

causal_agent.methods.instrumental_variable.estimator.build_iv_graph_gml(treatment, outcome, instruments, covariates)

Constructs a GML string representing the causal graph for IV.

Assumptions:

  • Instruments cause Treatment.

  • Covariates cause Treatment and Outcome.

  • Treatment causes Outcome.

  • Instruments do NOT directly cause Outcome (exclusion).

  • Instruments are NOT caused by Covariates (can be relaxed if needed).

  • An unobserved confounder (U) affects Treatment and Outcome.

Parameters:
  • treatment (str) – Name of the treatment variable.

  • outcome (str) – Name of the outcome variable.

  • instruments (List[str]) – List of instrument variable names.

  • covariates (List[str]) – List of covariate names.

Returns:

A GML graph string.

Return type:

str
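A minimal sketch of a graph encoding the listed assumptions, written as a GML string of the kind DoWhy's `CausalModel` accepts through its `graph` argument. The function name, the confounder label `U`, and the string layout are assumptions; the sketch does not mark `U` as unobserved, and the package's own string may differ.

```python
def build_iv_graph_gml_sketch(treatment, outcome, instruments, covariates,
                              confounder="U"):
    """Hypothetical sketch of an IV causal graph as a GML string."""
    nodes = [treatment, outcome, confounder, *instruments, *covariates]
    edges = [(z, treatment) for z in instruments]               # relevance
    edges += [(w, treatment) for w in covariates]               # covariates -> treatment
    edges += [(w, outcome) for w in covariates]                 # covariates -> outcome
    edges.append((treatment, outcome))                          # effect of interest
    edges += [(confounder, treatment), (confounder, outcome)]   # unobserved confounding
    # Exclusion: no instrument -> outcome edge; no covariate -> instrument edge.

    lines = ["graph [", "  directed 1"]
    lines += [f'  node [ id "{v}" label "{v}" ]' for v in nodes]
    lines += [f'  edge [ source "{a}" target "{b}" ]' for a, b in edges]
    lines.append("]")
    return "\n".join(lines)
```

Encoding the exclusion restriction as a missing edge is what lets a graph-based tool recognize the instruments as valid for identification.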

causal_agent.methods.instrumental_variable.estimator.format_iv_results(estimate, raw_results, diagnostics, treatment, outcome, instrument, method_used, llm=None)

Formats the results from IV estimation into a standardized dictionary.

Parameters:
  • estimate (float | None) – The point estimate of the causal effect.

  • raw_results (Dict) – Dictionary containing raw outputs from DoWhy/statsmodels.

  • diagnostics (Dict) – Dictionary containing diagnostic results.

  • treatment (str) – Name of the treatment variable.

  • outcome (str) – Name of the outcome variable.

  • instrument (List[str]) – List of instrument variable names.

  • method_used (str) – ‘dowhy’ or ‘statsmodels’.

  • llm (langchain.chat_models.base.BaseChatModel | None) – Optional LLM instance for interpretation.

Returns:

Standardized results dictionary.

Return type:

Dict[str, Any]

causal_agent.methods.instrumental_variable.estimator.estimate_effect(df, treatment, outcome, covariates, query=None, dataset_description=None, llm=None, **kwargs)
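This function has no docstring, so its internals are not documented here. As a hedged illustration of the core computation an IV estimator performs, the sketch below computes a two-stage least squares (2SLS) point estimate with NumPy. The function name and its explicit `instruments` argument are hypothetical and say nothing about how `estimate_effect` itself selects instruments.

```python
import numpy as np
import pandas as pd


def two_stage_least_squares_sketch(df: pd.DataFrame, treatment: str, outcome: str,
                                   instruments: list, covariates: list) -> float:
    """Hypothetical sketch: 2SLS point estimate of the treatment coefficient."""
    n = len(df)
    const = np.ones((n, 1))
    W = df[covariates].to_numpy(dtype=float)
    Z = np.hstack([const, W, df[instruments].to_numpy(dtype=float)])
    X = np.hstack([const, W, df[[treatment]].to_numpy(dtype=float)])
    y = df[outcome].to_numpy(dtype=float)

    # First stage: fitted values of all regressors from the exogenous set.
    X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
    # Second stage: OLS of the outcome on the fitted regressors.
    beta = np.linalg.lstsq(X_hat, y, rcond=None)[0]
    return float(beta[-1])  # the treatment is the last column of X
```

Only the point estimate is returned: standard errors taken from a naive second-stage OLS are incorrect, which is why dedicated IV routines compute an adjusted covariance.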

causal_agent.methods.instrumental_variable.llm_assist module

LLM assistance functions for Instrumental Variable (IV) analysis.

This module provides functions for LLM-based assistance in instrumental variable analysis, including identifying potential instruments, validating IV assumptions, and interpreting results.

causal_agent.methods.instrumental_variable.llm_assist.identify_instrument_variable(df_cols, query, llm=None)

Uses an LLM to identify potential instrumental variables among the available columns.

Parameters:
  • df_cols (List[str]) – List of column names from the dataset

  • query (str) – User’s causal query text

  • llm (langchain.chat_models.base.BaseChatModel | None) – Optional LLM model instance

Returns:

List of column names identified as potential instruments

Return type:

List[str]
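The prompt and parsing logic of this function are not shown. One defensive post-processing step, sketched here under the assumption that the LLM replies with either a JSON list or a plain comma- or line-separated list of names, is to keep only names that actually appear in `df_cols`, which filters out hallucinated columns. The helper `parse_instrument_reply_sketch` is hypothetical.

```python
import json
import re


def parse_instrument_reply_sketch(reply: str, df_cols: list) -> list:
    """Hypothetical sketch: turn an LLM reply into instrument names that
    exist in the dataset, preserving order and dropping duplicates."""
    try:
        candidates = json.loads(reply)          # preferred: a JSON list of names
        if not isinstance(candidates, list):
            candidates = []
    except json.JSONDecodeError:
        # Fallback: split free text on commas, newlines, or semicolons.
        candidates = re.split(r"[,\n;]", reply)
    known = set(df_cols)
    cleaned = []
    for c in candidates:
        name = str(c).strip().strip("'\"`- ")   # drop quotes, backticks, bullets
        if name in known and name not in cleaned:
            cleaned.append(name)
    return cleaned
```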

causal_agent.methods.instrumental_variable.llm_assist.validate_instrument_assumptions_qualitative(treatment, outcome, instrument, covariates, query, llm=None)

Uses an LLM to provide a qualitative assessment of the IV assumptions.

Parameters:
  • treatment (str) – Treatment variable name

  • outcome (str) – Outcome variable name

  • instrument (List[str]) – List of instrumental variable names

  • covariates (List[str]) – List of covariate variable names

  • query (str) – User’s causal query text

  • llm (langchain.chat_models.base.BaseChatModel | None) – Optional LLM model instance

Returns:

Dictionary with qualitative assessments of exclusion and exogeneity assumptions

Return type:

Dict[str, str]
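A minimal sketch, assuming the model is called through LangChain's `invoke` interface (chat models return a message whose text is in `.content`) and that the result is keyed by `exclusion` and `exogeneity`. The prompt wording, key names, helper name, and the fallback string used when no LLM is given are all assumptions.

```python
def assess_iv_assumptions_sketch(treatment, outcome, instrument, covariates,
                                 query, llm=None):
    """Hypothetical sketch: ask a chat model about exclusion and exogeneity.
    `llm` is any object whose invoke(prompt) returns a message with .content."""
    context = (
        f"Query: {query}\nTreatment: {treatment}\nOutcome: {outcome}\n"
        f"Instrument(s): {', '.join(instrument)}\n"
        f"Covariates: {', '.join(covariates)}\n"
    )
    questions = {
        "exclusion": "Could the instrument(s) affect the outcome other than "
                     "through the treatment?",
        "exogeneity": "Could the instrument(s) share unobserved causes with "
                      "the outcome?",
    }
    if llm is None:
        return {key: "No LLM provided; assumption not assessed." for key in questions}
    return {key: llm.invoke(context + q).content for key, q in questions.items()}
```

Because exclusion and exogeneity cannot be fully verified from data, a qualitative review like this complements, rather than replaces, the statistical diagnostics.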

causal_agent.methods.instrumental_variable.llm_assist.interpret_iv_results(results, diagnostics, llm=None)

Uses an LLM to interpret IV results in natural language.

Parameters:
  • results (Dict[str, Any]) – Dictionary of estimation results (e.g., effect_estimate, p_value, confidence_interval)

  • diagnostics (Dict[str, Any]) – Dictionary of diagnostic test results (e.g., first_stage_f_statistic, overid_test)

  • llm (langchain.chat_models.base.BaseChatModel | None) – Optional LLM model instance

Returns:

String containing natural language interpretation of results

Return type:

str
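When no LLM is available, a rule-based fallback could still produce a short summary from the same dictionaries. This sketch assumes the example key names from the parameter descriptions (`effect_estimate`, `p_value`, `confidence_interval`, `first_stage_f_statistic`), a 5% significance level, and the F < 10 weak-instrument rule of thumb; whether the package has such a fallback is not documented, and the helper name is hypothetical.

```python
def interpret_iv_results_fallback_sketch(results: dict, diagnostics: dict) -> str:
    """Hypothetical rule-based summary of IV results (no LLM involved)."""
    est = results.get("effect_estimate")
    if est is None:
        return "No effect estimate is available."
    parts = [f"The estimated causal effect is {est:.3g}."]
    ci = results.get("confidence_interval")
    if ci is not None:
        parts.append(f"Confidence interval: [{ci[0]:.3g}, {ci[1]:.3g}].")
    p = results.get("p_value")
    if p is not None:
        verdict = "statistically significant" if p < 0.05 else "not statistically significant"
        parts.append(f"The estimate is {verdict} at the 5% level (p = {p:.3g}).")
    f_stat = diagnostics.get("first_stage_f_statistic")
    if f_stat is not None and f_stat < 10:
        parts.append(f"Caution: the first-stage F of {f_stat:.3g} is below 10, "
                     "so the instruments may be weak.")
    return " ".join(parts)
```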