causal_agent

Causal Agent - Automated Causal Inference with Large Language Models.

The causal_agent module provides an LLM-powered tool for generating data-driven answers to natural language causal queries. It automatically:

  • Parses natural language causal questions

  • Analyzes dataset characteristics and variables

  • Selects appropriate causal inference methods

  • Executes causal analysis with proper diagnostics

  • Interprets results in plain language

Example

>>> from causal_agent import run_causal_analysis
>>> result = run_causal_analysis(
...     query="What is the effect of education on income?",
...     dataset_path="data.csv",
...     dataset_description="Education and income dataset"
... )
>>> print(f"Effect: {result['results']['results']['effect_estimate']}")

The module supports various causal inference methods, including:

  • Randomized Controlled Trials (RCT)

  • Difference-in-Differences (DiD)

  • Instrumental Variables (IV)

  • Regression Discontinuity Design (RDD)

  • Propensity Score Matching/Weighting

  • Backdoor Adjustment

  • Linear Regression with controls
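To illustrate what one of the listed methods estimates, here is a minimal sketch of a two-group, two-period Difference-in-Differences calculation on group means. It is not the module's implementation: the function name did_estimate and the sample numbers are made up for illustration, and a real analysis would also need diagnostics such as a parallel-trends check.

```python
from statistics import mean


def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Hypothetical helper: DiD on group means.

    Effect = (treated change over time) - (control change over time).
    """
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


# Illustrative numbers only; they do not come from any real dataset.
effect = did_estimate(
    treated_pre=[10.0, 12.0],
    treated_post=[15.0, 17.0],
    control_pre=[9.0, 11.0],
    control_post=[11.0, 13.0],
)
print(effect)  # 3.0: treated rose by 5, control rose by 2
```

Subtracting the control group's change removes the time trend shared by both groups, which is why the method needs both groups observed before and after treatment.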

Functions

analyze_dataset(dataset_path[, llm_client, ...])

Analyze a dataset to identify important characteristics for causal inference.

create_workflow_state_update(current_step, ...)

Create a standardized workflow state update dictionary.

format_output(query, method, results, ...[, ...])

Format final results including numerical estimates and explanations.

generate_explanation(method_info, ...[, ...])

Generate a comprehensive explanation text for the causal analysis.

interpret_query(query_info, dataset_analysis)

Interpret the query using a hybrid heuristic/LLM approach to identify variables.

parse_input(query[, dataset_path_arg, ...])

Parse the user's causal query using an LLM and regular expressions.

run_causal_analysis(query, dataset_path[, ...])

Run causal analysis on a dataset based on a user query.

validate_method(method_info, ...)

Validate the selected causal method against dataset characteristics.

causal_agent.run_causal_analysis(query, dataset_path, dataset_description=None, api_key=None)[source]

Run causal analysis on a dataset based on a user query.

Parameters:
  • query (str) – User’s causal question

  • dataset_path (str) – Path to the dataset

  • dataset_description (str | None) – Optional textual description of the dataset

  • api_key (str | None) – Optional OpenAI API key (deprecated; the value is ignored)

Returns:

Dictionary containing the final formatted analysis results from the agent’s last step.

Return type:

Dict[str, Any]
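The Example above reads the estimate through the nested key path result['results']['results']['effect_estimate']. Since the return type is only documented as Dict[str, Any], a caller may prefer to read that path defensively. The sketch below assumes the key path from the Example; get_effect_estimate is a hypothetical helper and not part of causal_agent, and fake_result is a hand-built stand-in for a real return value.

```python
from typing import Any, Dict, Optional


def get_effect_estimate(result: Dict[str, Any]) -> Optional[Any]:
    """Hypothetical helper: walk the key path used in the Example.

    Returns None instead of raising if any level is missing or not a dict.
    """
    node: Any = result
    for key in ("results", "results", "effect_estimate"):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


# Hand-built stand-in for a run_causal_analysis return value (assumed shape).
fake_result = {"results": {"results": {"effect_estimate": 0.42}}}

print(get_effect_estimate(fake_result))  # 0.42
print(get_effect_estimate({}))           # None
```

Returning None keeps downstream reporting code from failing with a KeyError when an analysis step did not produce an estimate.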