Causal AI Scientist
0.1.2

Getting Started

  • Getting Started
    • Installation Guide
      • Prerequisites
      • Quick Start (Recommended)
      • Installation Methods
        • Method 1: pip (PyPI)
        • Method 2: Conda Environment
        • Method 4: Development Installation
      • Configuration Setup
      • Environment-Specific Instructions
        • Google Colab
        • Jupyter Notebook
      • Verification
      • Next Steps
        • Getting Additional Help
    • Quickstart Tutorial
      • Overview
      • Prerequisites
      • Step 1: Setup and Configuration
      • Step 2: Prepare Your Data
      • Step 3: Run Your First Analysis
      • Step 4: Understanding the Results
      • Step 5: Exploring Different Queries
      • Step 6: Working with Your Own Data
      • Common Use Cases
      • Understanding Method Selection
      • Next Steps
      • Troubleshooting Quick Fixes
    • Your First Causal Analysis
      • Understanding Causal Questions
        • What is Causal Inference?
        • Good vs. Poor Causal Questions
      • Step-by-Step Analysis Walkthrough
        • Step 1: Problem Setup
        • Step 2: Data Preparation
        • Step 3: Running the Analysis
        • Step 4: Understanding the Results
        • Step 5: Interpreting the Results
        • Step 6: Examining Method Selection
        • Step 7: Validating Results
      • Common Patterns and What They Mean
        • Different Types of Results
        • Understanding Confidence Intervals
      • Troubleshooting Common Issues
        • Data Quality Issues
        • Method Selection Issues
        • Result Interpretation Issues
      • Next Steps and Advanced Topics
    • What You’ll Learn
    • Prerequisites
    • Next Steps

User Guide

  • User Guide
    • Basic Usage
      • Core Workflow
      • Python API Usage
        • Single Analysis
        • Understanding Results
      • Command Line Interface
        • Single Analysis
      • Common Analysis Patterns
        • Experimental Data (RCT)
        • Observational Data
        • Time Series / Panel Data
        • Instrumental Variables
        • Regression Discontinuity
      • Working with Results
        • Extracting Key Information
        • Interpreting Diagnostics
      • Error Handling
        • Common Issues and Solutions
      • Best Practices
        • Data Preparation
        • Query Formulation
        • Result Interpretation
      • Next Steps
    • Advanced Usage
      • Advanced Configuration
        • Environment Variables
        • Programmatic Configuration
      • Method Selection Control
        • Understanding Automatic Selection
        • Influencing Method Selection
      • Custom Analysis Workflows
        • Multi-Method Comparison
        • Sensitivity Analysis
      • Integration Patterns
        • Jupyter Notebook Integration
        • Pipeline Integration
      • Custom Data Preprocessing
        • Data Validation and Cleaning
      • Performance Optimization
        • Caching Results
      • Best Practices for Advanced Usage
      • Next Steps
    • Batch Processing
      • Command Line Batch Processing
        • Basic Batch Analysis
        • Metadata CSV Format
        • Example Batch Command
      • Next Steps
    • Configuration
      • LLM Provider Configuration
        • Supported Providers
        • OpenAI Configuration
        • Anthropic Configuration
        • Together AI Configuration
      • Environment Configuration
        • Using .env Files
      • Next Steps
    • Guide Overview
    • Common Workflows
    • Best Practices

Tutorials & Examples

  • Tutorials & Examples
    • Interactive Notebooks
      • Notebook Categories
      • Running the Notebooks
      • Notebook Features
    • Case Studies
      • Education Policy Analysis: Learning Mindset Intervention
        • Problem Statement
        • Dataset Overview
        • Agent Decision-Making Process
        • Decision Tree Walkthrough
        • Method Exclusion Examples
        • Real-World Implications
        • Comparison with Alternative Approaches
        • Learning Objectives Achieved
        • Next Steps
      • Healthcare Treatment Effects: Hospital Treatment Analysis
        • Problem Statement
        • Dataset Overview
        • Agent Decision-Making Process
        • Method Exclusion Examples
        • Robustness Analysis
        • Decision Tree Alternative Scenarios
        • Clinical Implications
        • Comparison with Traditional Analysis
        • Learning Objectives Achieved
        • Next Steps
      • Marketing Campaign Evaluation: Instrumental Variables Analysis
        • Problem Statement
        • Dataset Overview
        • Agent Decision-Making Process
        • Method Exclusion Examples
        • Robustness Analysis
        • Decision Tree Alternative Scenarios
        • Business Implications
        • Comparison with Traditional Analysis
        • Learning Objectives Achieved
        • Next Steps
      • Economic Policy Impact: Minimum Wage Analysis
        • Problem Statement
        • Dataset Overview
        • Agent Decision-Making Process
        • Method Exclusion Examples
        • Robustness Analysis
        • Decision Tree Alternative Scenarios
        • Economic Interpretation
        • Learning Objectives Achieved
        • Next Steps
      • Technology Product Features: A/B Testing Analysis
        • Problem Statement
        • Dataset Overview
        • Agent Decision-Making Process
        • Method Exclusion Examples
        • Robustness Analysis
        • Business Decision Framework
        • Comparison with Traditional A/B Testing
        • Alternative Experimental Designs
        • Long-term Monitoring Strategy
        • Learning Objectives Achieved
        • Next Steps
      • Case Study Structure
      • Featured Case Studies
      • Learning Objectives
      • Datasets and Resources
    • Code Examples
      • Dataset Properties and Method Selection Gallery
        • Overview
        • Gallery Examples
        • Method Exclusion Examples
        • Dataset Property Decision Matrix
        • Common Decision Patterns
        • Next Steps
      • Decision Path Comparisons: Similar Datasets, Different Methods
        • Overview
        • Comparison 1: Randomized vs. Observational Education Data
        • Comparison 2: Cross-Sectional vs. Panel Policy Data
        • Comparison 3: Sharp vs. Fuzzy Discontinuity
        • Comparison 4: Strong vs. Weak Instrument
        • Comparison 5: Good vs. Poor Covariate Overlap
        • Key Learning Points
        • Next Steps
      • Example Categories
      • Quick Start Examples
      • Example Datasets
      • Usage Tips
      • Contributing Examples
    • Learning Path
    • Tutorial Categories

Causal Inference Methods

  • Causal Inference Methods
    • Overview of Causal Inference Methods
      • What is Causal Inference?
      • The Fundamental Problem of Causal Inference
      • Method Categories in Causal Agent
        • Experimental Methods
        • Quasi-Experimental Methods
        • Observational Methods
      • How Causal Agent Selects Methods
      • The Decision Tree Process
      • Method Assumptions and Validity
      • Best Practices
      • Getting Started
    • Method Selection Decision Tree
      • Complete Decision Tree Algorithm
      • Dataset Property Influence Visualization
      • Decision Criteria Explained
        • 1. Randomized Experiment Check
        • 2. Data Structure Analysis
        • 3. Instrumental Variable Assessment
        • 4. Treatment Variable Type
        • 5. Covariate Assessment
      • Step-by-Step Decision Walkthroughs
        • Walkthrough 1: Randomized Controlled Trial
        • Walkthrough 2: Panel Data Analysis
        • Walkthrough 6: Complex Multi-Treatment Scenario
        • Walkthrough 7: Weak Instrument Scenario
        • Edge Cases and Troubleshooting
        • Algorithm Robustness and Validation
        • Walkthrough 3: Regression Discontinuity
        • Walkthrough 4: Observational Study with Rich Covariates
        • Walkthrough 5: Instrumental Variables Analysis
      • Method Selection Examples
        • Example 1: A/B Test Analysis
        • Example 2: Policy Evaluation
        • Example 3: Observational Study
      • Decision Node Documentation
        • Node 1: Randomization Assessment
        • Node 2A: RCT Covariate Assessment
        • Node 2B: Data Structure Assessment
        • Node 3C: Treatment Variable Type Assessment
        • Node 4A: Instrumental Variable Assessment (Binary Treatment)
        • Node 5A: Covariate Richness Assessment
        • Node 6A: Covariate Overlap Assessment
        • Priority Ordering and Method Selection
      • Understanding Method Recommendations
        • Priority Ordering
        • Alternative Methods
      • Customizing Method Selection
        • Excluding Methods
        • Forcing Method Selection
      • Validating Method Choice
      • Interactive Tools and Utilities
        • Method Comparison Tool
        • Method Diagnostic Tool
      • Decision Tree Algorithm Implementation
      • Next Steps
    • Experimental Methods
      • Randomized Controlled Trials (RCT)
        • When to Use RCTs
        • Theoretical Background
        • Key Assumptions
        • Types of RCT Analysis
        • Implementation in CAIS
        • Diagnostic Tests and Validation
        • Best Practices
        • Common Pitfalls and Solutions
        • Example: Educational Intervention RCT
        • Further Reading
      • Overview
      • Method Details
      • Implementation in CAIS
      • Best Practices
      • Common Challenges
    • Quasi-Experimental Methods
      • Difference-in-Differences (DiD)
        • When to Use DiD
        • Theoretical Background
        • Key Assumptions
        • Implementation in CAIS
        • Diagnostic Tests and Validation
        • Advanced DiD Methods
        • Best Practices
        • Common Pitfalls and Solutions
        • Example: Minimum Wage Policy Evaluation
        • Extensions and Related Methods
        • Further Reading
      • Instrumental Variables (IV)
        • When to Use IV
        • Theoretical Background
        • Key Assumptions
        • Types of IV Estimands
        • Implementation in Causal Agent
        • Diagnostic Tests and Validation
        • Common IV Applications
        • Best Practices
        • Common Pitfalls and Solutions
        • Example: Returns to Education
        • Advanced IV Methods
        • Further Reading
      • Regression Discontinuity Design (RDD)
        • When to Use RDD
        • Theoretical Background
        • Key Assumptions
        • Types of RDD
        • Implementation in Causal Agent
        • Diagnostic Tests and Validation
        • Best Practices
        • Common Pitfalls and Solutions
        • Example: Educational Remediation Program
        • Advanced RDD Methods
        • Further Reading
      • Overview
      • Method Details
      • Implementation in Causal Agent
      • Assumption Validation
      • Best Practices
      • Common Pitfalls
    • Observational Methods
      • Propensity Score Matching
        • When to Use PSM
        • Theoretical Background
        • Key Assumptions
        • Types of Matching
        • Implementation in Causal Agent
        • Diagnostic Tests and Validation
        • Best Practices
        • Common Pitfalls and Solutions
        • Example: Job Training Program Evaluation
        • Extensions and Related Methods
        • Further Reading
      • Overview
      • Method Details
      • Implementation in Causal Agent
      • Assumption Validation
      • Balance Assessment
      • Best Practices
      • Sensitivity Analysis
      • Common Challenges
      • Advanced Topics
    • Method Selection Guide
    • Method Comparison
    • Choosing the Right Method

Theoretical Background

  • Theoretical Background
    • Causal Inference Basics
      • What is Causal Inference?
      • The Fundamental Problem
      • How Automated Systems Approach This Problem
      • Key Concepts for Automated Analysis
        • Confounding
        • Selection Bias
        • Treatment Assignment Mechanisms
      • The Agent’s Decision-Making Process
      • Types of Causal Questions
      • Common Misconceptions
      • Why Automated Causal Analysis Matters
      • Next Steps
    • Agent Architecture and Decision-Making Process
      • Overview of the Autonomous Agent
      • Agent Workflow: Step-by-Step Process
        • 1. Initial Data Analysis
        • 2. Variable Identification
        • 3. Treatment Assignment Analysis
        • 4. Decision Tree Navigation
        • 5. Method Selection and Prioritization
        • 6. Assumption Testing and Validation
        • 7. Effect Estimation
        • 8. Result Interpretation and Communication
      • LLM Integration Architecture
        • Data Understanding Prompts
        • Method Selection Prompts
        • Result Interpretation Prompts
      • Error Handling and Recovery
      • Agent Limitations and Human Oversight
      • Continuous Learning and Improvement
    • LLM Integration in Causal Analysis
      • Why LLMs in Causal Inference?
      • LLM Integration Architecture
      • Stage 1: Data Understanding and Variable Identification
      • Stage 2: Method Selection and Prioritization
      • Stage 3: Assumption Testing and Validation
      • Stage 4: Result Interpretation and Communication
      • Advanced LLM Integration Features
      • Quality Assurance and Validation
      • Limitations and Future Directions
    • Method Selection and Decision-Making
      • The Method Selection Challenge
      • The Agent’s Decision Framework
        • Decision Tree Overview
      • Stage 1: Treatment Assignment Analysis
        • Random Assignment (Experimental)
        • As-Good-As-Random Assignment (Quasi-Experimental)
        • Non-Random Assignment (Observational)
      • Stage 2: Data Structure Analysis
        • Panel Data Detection
        • Cross-Sectional Data
        • Time Series Data
      • Stage 3: Method Prioritization
        • Identification Strength Hierarchy
        • Assumption Assessment
      • Stage 4: Robustness and Sensitivity Analysis
        • Multiple Method Implementation
        • Assumption Testing Protocol
      • Stage 5: Method Selection Decision
        • Decision Integration
        • Communicating Uncertainty
      • Common Decision Scenarios
      • Best Practices for Users
    • Result Interpretation and Communication
      • The Challenge of Causal Interpretation
      • Understanding Causal Effect Estimates
        • Types of Causal Effects
        • Effect Magnitudes and Practical Significance
        • Confidence Intervals and Uncertainty
      • Method-Specific Interpretation Considerations
        • Randomized Experiments
        • Difference-in-Differences
        • Instrumental Variables
        • Regression Discontinuity
        • Propensity Score Methods
      • Communicating Limitations and Assumptions
        • Assumption Violations
        • External Validity
      • Tailoring Communication to Different Audiences
        • Academic Audiences
        • Policy Makers
        • General Public
        • Stakeholders and Practitioners
      • Handling Negative or Null Results
      • Best Practices for Result Interpretation
    • Glossary
      • A
      • B
      • C
      • D
      • E
      • F
      • I
      • L
      • M
      • N
      • O
      • P
      • Q
      • R
      • S
      • T
      • U
      • V
      • Common Acronyms
      • Statistical Terms
      • AI and Machine Learning Terms
      • Research Design Terms
      • Policy Evaluation Terms
    • Learning Path
    • Key Concepts
    • Common Pitfalls and How the Agent Addresses Them
    • Agent Capabilities and Limitations

API Reference

  • API Reference
    • Module Reference
      • causal_agent
        • causal_agent.analyze_dataset
        • causal_agent.create_workflow_state_update
        • causal_agent.format_output
        • causal_agent.generate_explanation
        • causal_agent.interpret_query
        • causal_agent.parse_input
        • causal_agent.run_causal_analysis
        • causal_agent.validate_method
        • run_causal_analysis()
      • causal_agent.components package
        • parse_input()
        • analyze_dataset()
        • interpret_query()
        • select_method()
        • validate_method()
        • generate_explanation()
        • format_output()
        • create_workflow_state_update()
        • Submodules
        • causal_agent.components.dataset_analyzer module
        • causal_agent.components.decision_tree module
        • causal_agent.components.decision_tree_llm module
        • causal_agent.components.explanation_generator module
        • causal_agent.components.input_parser module
        • causal_agent.components.method_validator module
        • causal_agent.components.output_formatter module
        • causal_agent.components.query_interpreter module
        • causal_agent.components.state_manager module
      • causal_agent.tools package
        • input_parser_tool()
        • dataset_analyzer_tool()
        • query_interpreter_tool()
        • method_selector_tool()
        • method_validator_tool()
        • method_executor_tool()
        • explanation_generator_tool()
        • output_formatter_tool()
        • Submodules
        • causal_agent.tools.data_analyzer module
        • causal_agent.tools.dataset_analyzer_tool module
        • causal_agent.tools.explanation_generator_tool module
        • causal_agent.tools.input_parser_tool module
        • causal_agent.tools.method_executor_tool module
        • causal_agent.tools.method_selector_tool module
        • causal_agent.tools.method_validator_tool module
        • causal_agent.tools.output_formatter_tool module
        • causal_agent.tools.query_interpreter_tool module
      • causal_agent.utils package
        • Submodules
        • causal_agent.utils.agent module
        • causal_agent.utils.llm_helpers module
      • causal_agent.methods package
        • CausalMethod
        • psm_estimate_effect()
        • psw_estimate_effect()
        • iv_estimate_effect()
        • did_estimate_effect()
        • rdd_estimate_effect()
        • dim_estimate_effect()
        • lr_estimate_effect()
        • ba_estimate_effect()
        • estimate_effect_gps()
        • Submodules
        • causal_agent.methods.causal_method module
        • causal_agent.methods.utils module
        • Subpackages
      • Core Modules
      • Auto-Generated Documentation
      • Navigation
      • Code Examples
    • API Overview
      • causal_agent
        • causal_agent.analyze_dataset
        • causal_agent.create_workflow_state_update
        • causal_agent.format_output
        • causal_agent.generate_explanation
        • causal_agent.interpret_query
        • causal_agent.parse_input
        • causal_agent.run_causal_analysis
        • causal_agent.validate_method
        • run_causal_analysis()
    • Quick Reference
    • Cross-References and Links
    • Navigation Tips

Development

  • Development
    • Architecture
      • System Overview
      • High-Level Architecture
      • Agent Workflow
      • Component Architecture
        • Core Components
        • Analysis Components
      • Tool Architecture
      • Method Implementation Architecture
      • LLM Integration Architecture
      • Data Flow Architecture
      • Testing Architecture
      • Extension Points
      • Performance Considerations
      • Security and Privacy
    • Extending Methods
      • Overview
      • Method Implementation Structure
        • Base Method Interface
        • Method Categories
      • Step-by-Step Method Implementation
        • Step 1: Create Method Implementation
        • Step 2: Create Method-Specific Components
        • Step 3: Update Decision Tree Logic
        • Step 4: Update Method Executor
        • Step 5: Create LLM Integration
        • Step 6: Create Comprehensive Tests
        • Step 7: Integration Tests
        • Step 8: Create Synthetic Data for Testing
        • Step 9: Documentation
      • Testing Your Implementation
        • Comprehensive Testing Strategy
        • Running Tests
        • Validation Checklist
      • Best Practices
        • Code Quality
        • Statistical Rigor
        • Integration
      • Common Pitfalls
      • Getting Help
    • LLM Integration
      • Overview
      • LLM Provider Architecture
        • Supported Providers
        • Configuration Management
        • Environment Configuration
      • Prompt Engineering Patterns
        • Core Prompt Structure
        • Variable Identification Prompts
        • Method Selection Prompts
        • Result Interpretation Prompts
      • Response Processing Architecture
        • Structured Output Parsing
        • Response Validation Schemas
        • Error Handling and Retry Logic
      • Prompt Optimization Strategies
        • Few-Shot Learning
        • Chain-of-Thought Reasoning
        • Prompt Versioning and A/B Testing
      • Integration with Decision Tree
        • LLM-Enhanced Decision Logic
      • Performance Optimization
        • Caching Strategies
        • Batch Processing
      • Monitoring and Debugging
        • LLM Call Logging
      • Testing LLM Integration
        • Mock LLM Responses
        • Integration Testing
      • Best Practices
        • Prompt Design
        • Error Handling
        • Performance
        • Security
    • Synthetic Data Generation System
      • Overview
      • System Architecture and Decision Tree Integration
        • Decision Tree Validation Through Synthetic Data
      • Data Generation Framework
        • Core Components
        • Configuration System
        • Base Data Generator Architecture
        • Base Data Generator
      • Method-Specific Generators and Decision Tree Testing
        • Randomized Controlled Trial (RCT) Generator
        • Multi-Treatment RCT Generator
        • Difference-in-Differences (DiD) Generators
        • Instrumental Variables (IV) Generators
        • Regression Discontinuity (RDD) Generator
        • Propensity Score Generators
        • Front-Door Criterion Generator
        • Difference-in-Differences Generator
        • Instrumental Variables Generator
        • Regression Discontinuity Generator
        • Propensity Score Generator
      • Data Generation Workflow and Scripts
        • Generation Pipeline Overview
        • Step 1: Configuration and Parameter Setup
        • Step 2: Raw Data Generation
        • Step 3: Context Generation with LLM Integration
        • Step 4: Data Finalization and Integration
        • Logging and Quality Control
        • Batch Processing and Agent Testing
      • Scenario Generation and Testing
        • Assumption Violation Scenarios
        • Edge Case and Robustness Testing
      • Usage Examples and Best Practices
        • Complete Workflow Example
        • Batch Testing Example
        • Best Practices for Synthetic Data Generation
      • Integration with CAIS Testing Framework
        • Continuous Integration Testing
        • Performance Benchmarking
        • Quality Assurance and Validation
      • Future Enhancements and Extensions
        • Planned Improvements
        • Contributing to the Synthetic Data System
      • Conclusion
        • Data Validation
      • Testing Integration
        • Using Synthetic Data in Tests
        • Continuous Integration
      • Best Practices
        • Data Generation Guidelines
        • Validation Standards
        • Testing Integration
    • Testing Framework
      • Overview
      • Test Organization
        • Directory Structure
      • Base Test Infrastructure
        • Base Test Classes
        • Pytest Configuration
      • Unit Testing
        • Component Unit Tests
        • Method Unit Tests
        • Tool Unit Tests
      • Integration Testing
        • Workflow Integration Tests
        • LLM Integration Tests
      • End-to-End Testing
        • Complete Workflow Tests
      • Performance Testing
        • Scalability Tests
        • Method Performance Tests
      • Test Automation and CI/CD
        • GitHub Actions Configuration
        • Test Coverage and Quality
      • Best Practices
        • Test Design Principles
        • Data Management
        • Continuous Integration
    • Getting Started with Development
    • Development Workflow
    • Project Structure
    • Contribution Areas
    • Development Standards

About

  • About CAIS
    • Citation Information
      • BibTeX Format
      • License and Usage Terms
      • Contact for Citation Questions
    • Changelog
      • Version 0.1.2 (Current)
      • Version 0.1.1
      • Version 0.1.0
      • Contributing to Changelog
      • Release Process
      • Contact for Release Information
    • License and Usage Terms
      • MIT License
      • What This Means
      • Usage Guidelines
      • Third-Party Dependencies
      • Data and Model Licenses
      • Contributing and License
      • License Compliance
      • Frequently Asked Questions
      • Contact for License Questions
      • License History
    • Project Overview
        • Alternative Methods
      • Customizing Method Selection
        • Excluding Methods
        • Forcing Method Selection
      • Validating Method Choice
      • Interactive Tools and Utilities
        • Method Comparison Tool
        • Method Diagnostic Tool
      • Decision Tree Algorithm Implementation
      • Next Steps
    • Experimental Methods
      • Randomized Controlled Trials (RCT)
        • When to Use RCTs
        • Theoretical Background
        • Key Assumptions
        • Types of RCT Analysis
        • Implementation in CAIS
        • Diagnostic Tests and Validation
        • Best Practices
        • Common Pitfalls and Solutions
        • Example: Educational Intervention RCT
        • Further Reading
      • Overview
      • Method Details
      • Implementation in CAIS
      • Best Practices
      • Common Challenges
    • Quasi-Experimental Methods
      • Difference-in-Differences (DiD)
        • When to Use DiD
        • Theoretical Background
        • Key Assumptions
        • Implementation in CAIS
        • Diagnostic Tests and Validation
        • Advanced DiD Methods
        • Best Practices
        • Common Pitfalls and Solutions
        • Example: Minimum Wage Policy Evaluation
        • Extensions and Related Methods
        • Further Reading
      • Instrumental Variables (IV)
        • When to Use IV
        • Theoretical Background
        • Key Assumptions
        • Types of IV Estimands
        • Implementation in Causal Agent
        • Diagnostic Tests and Validation
        • Common IV Applications
        • Best Practices
        • Common Pitfalls and Solutions
        • Example: Returns to Education
        • Advanced IV Methods
        • Further Reading
      • Regression Discontinuity Design (RDD)
        • When to Use RDD
        • Theoretical Background
        • Key Assumptions
        • Types of RDD
        • Implementation in Causal Agent
        • Diagnostic Tests and Validation
        • Best Practices
        • Common Pitfalls and Solutions
        • Example: Educational Remediation Program
        • Advanced RDD Methods
        • Further Reading
      • Overview
      • Method Details
      • Implementation in Causal Agent
      • Assumption Validation
      • Best Practices
      • Common Pitfalls
    • Observational Methods
      • Propensity Score Matching
        • When to Use PSM
        • Theoretical Background
        • Key Assumptions
        • Types of Matching
        • Implementation in Causal Agent
        • Diagnostic Tests and Validation
        • Best Practices
        • Common Pitfalls and Solutions
        • Example: Job Training Program Evaluation
        • Extensions and Related Methods
        • Further Reading
      • Overview
      • Method Details
      • Implementation in Causal Agent
      • Assumption Validation
      • Balance Assessment
      • Best Practices
      • Sensitivity Analysis
      • Common Challenges
      • Advanced Topics
    • Method Selection Guide
    • Method Comparison
    • Choosing the Right Method

Theoretical Background

  • Theoretical Background
    • Causal Inference Basics
      • What is Causal Inference?
      • The Fundamental Problem
      • How Automated Systems Approach This Problem
      • Key Concepts for Automated Analysis
        • Confounding
        • Selection Bias
        • Treatment Assignment Mechanisms
      • The Agent’s Decision-Making Process
      • Types of Causal Questions
      • Common Misconceptions
      • Why Automated Causal Analysis Matters
      • Next Steps
    • Agent Architecture and Decision-Making Process
      • Overview of the Autonomous Agent
      • Agent Workflow: Step-by-Step Process
        • 1. Initial Data Analysis
        • 2. Variable Identification
        • 3. Treatment Assignment Analysis
        • 4. Decision Tree Navigation
        • 5. Method Selection and Prioritization
        • 6. Assumption Testing and Validation
        • 7. Effect Estimation
        • 8. Result Interpretation and Communication
      • LLM Integration Architecture
        • Data Understanding Prompts
        • Method Selection Prompts
        • Result Interpretation Prompts
      • Error Handling and Recovery
      • Agent Limitations and Human Oversight
      • Continuous Learning and Improvement
    • LLM Integration in Causal Analysis
      • Why LLMs in Causal Inference?
      • LLM Integration Architecture
      • Stage 1: Data Understanding and Variable Identification
      • Stage 2: Method Selection and Prioritization
      • Stage 3: Assumption Testing and Validation
      • Stage 4: Result Interpretation and Communication
      • Advanced LLM Integration Features
      • Quality Assurance and Validation
      • Limitations and Future Directions
    • Method Selection and Decision-Making
      • The Method Selection Challenge
      • The Agent’s Decision Framework
        • Decision Tree Overview
      • Stage 1: Treatment Assignment Analysis
        • Random Assignment (Experimental)
        • As-Good-As-Random Assignment (Quasi-Experimental)
        • Non-Random Assignment (Observational)
      • Stage 2: Data Structure Analysis
        • Panel Data Detection
        • Cross-Sectional Data
        • Time Series Data
      • Stage 3: Method Prioritization
        • Identification Strength Hierarchy
        • Assumption Assessment
      • Stage 4: Robustness and Sensitivity Analysis
        • Multiple Method Implementation
        • Assumption Testing Protocol
      • Stage 5: Method Selection Decision
        • Decision Integration
        • Communicating Uncertainty
      • Common Decision Scenarios
      • Best Practices for Users
    • Result Interpretation and Communication
      • The Challenge of Causal Interpretation
      • Understanding Causal Effect Estimates
        • Types of Causal Effects
        • Effect Magnitudes and Practical Significance
        • Confidence Intervals and Uncertainty
      • Method-Specific Interpretation Considerations
        • Randomized Experiments
        • Difference-in-Differences
        • Instrumental Variables
        • Regression Discontinuity
        • Propensity Score Methods
      • Communicating Limitations and Assumptions
        • Assumption Violations
        • External Validity
      • Tailoring Communication to Different Audiences
        • Academic Audiences
        • Policy Makers
        • General Public
        • Stakeholders and Practitioners
      • Handling Negative or Null Results
      • Best Practices for Result Interpretation
    • Glossary
      • A
      • B
      • C
      • D
      • E
      • F
      • I
      • L
      • M
      • N
      • O
      • P
      • Q
      • R
      • S
      • T
      • U
      • V
      • Common Acronyms
      • Statistical Terms
      • AI and Machine Learning Terms
      • Research Design Terms
      • Policy Evaluation Terms
    • Learning Path
    • Key Concepts
    • Common Pitfalls and How the Agent Addresses Them
    • Agent Capabilities and Limitations

API Reference

  • API Reference
    • Module Reference
      • causal_agent
        • causal_agent.analyze_dataset
        • causal_agent.create_workflow_state_update
        • causal_agent.format_output
        • causal_agent.generate_explanation
        • causal_agent.interpret_query
        • causal_agent.parse_input
        • causal_agent.run_causal_analysis
        • causal_agent.validate_method
        • run_causal_analysis()
      • causal_agent.components package
        • parse_input()
        • analyze_dataset()
        • interpret_query()
        • select_method()
        • validate_method()
        • generate_explanation()
        • format_output()
        • create_workflow_state_update()
        • Submodules
        • causal_agent.components.dataset_analyzer module
        • causal_agent.components.decision_tree module
        • causal_agent.components.decision_tree_llm module
        • causal_agent.components.explanation_generator module
        • causal_agent.components.input_parser module
        • causal_agent.components.method_validator module
        • causal_agent.components.output_formatter module
        • causal_agent.components.query_interpreter module
        • causal_agent.components.state_manager module
      • causal_agent.tools package
        • input_parser_tool()
        • dataset_analyzer_tool()
        • query_interpreter_tool()
        • method_selector_tool()
        • method_validator_tool()
        • method_executor_tool()
        • explanation_generator_tool()
        • output_formatter_tool()
        • Submodules
        • causal_agent.tools.data_analyzer module
        • causal_agent.tools.dataset_analyzer_tool module
        • causal_agent.tools.explanation_generator_tool module
        • causal_agent.tools.input_parser_tool module
        • causal_agent.tools.method_executor_tool module
        • causal_agent.tools.method_selector_tool module
        • causal_agent.tools.method_validator_tool module
        • causal_agent.tools.output_formatter_tool module
        • causal_agent.tools.query_interpreter_tool module
      • causal_agent.utils package
        • Submodules
        • causal_agent.utils.agent module
        • causal_agent.utils.llm_helpers module
      • causal_agent.methods package
        • CausalMethod
        • psm_estimate_effect()
        • psw_estimate_effect()
        • iv_estimate_effect()
        • did_estimate_effect()
        • rdd_estimate_effect()
        • dim_estimate_effect()
        • lr_estimate_effect()
        • ba_estimate_effect()
        • estimate_effect_gps()
        • Submodules
        • causal_agent.methods.causal_method module
        • causal_agent.methods.utils module
        • Subpackages
      • Core Modules
      • Auto-Generated Documentation
      • Navigation
      • Code Examples
    • API Overview
      • causal_agent
        • causal_agent.analyze_dataset
        • causal_agent.create_workflow_state_update
        • causal_agent.format_output
        • causal_agent.generate_explanation
        • causal_agent.interpret_query
        • causal_agent.parse_input
        • causal_agent.run_causal_analysis
        • causal_agent.validate_method
        • run_causal_analysis()
    • Quick Reference
    • Cross-References and Links
    • Navigation Tips

Development

  • Development
    • Architecture
      • System Overview
      • High-Level Architecture
      • Agent Workflow
      • Component Architecture
        • Core Components
        • Analysis Components
      • Tool Architecture
      • Method Implementation Architecture
      • LLM Integration Architecture
      • Data Flow Architecture
      • Testing Architecture
      • Extension Points
      • Performance Considerations
      • Security and Privacy
    • Extending Methods
      • Overview
      • Method Implementation Structure
        • Base Method Interface
        • Method Categories
      • Step-by-Step Method Implementation
        • Step 1: Create Method Implementation
        • Step 2: Create Method-Specific Components
        • Step 3: Update Decision Tree Logic
        • Step 4: Update Method Executor
        • Step 5: Create LLM Integration
        • Step 6: Create Comprehensive Tests
        • Step 7: Integration Tests
        • Step 8: Create Synthetic Data for Testing
        • Step 9: Documentation
      • Testing Your Implementation
        • Comprehensive Testing Strategy
        • Running Tests
        • Validation Checklist
      • Best Practices
        • Code Quality
        • Statistical Rigor
        • Integration
      • Common Pitfalls
      • Getting Help
    • LLM Integration
      • Overview
      • LLM Provider Architecture
        • Supported Providers
        • Configuration Management
        • Environment Configuration
      • Prompt Engineering Patterns
        • Core Prompt Structure
        • Variable Identification Prompts
        • Method Selection Prompts
        • Result Interpretation Prompts
      • Response Processing Architecture
        • Structured Output Parsing
        • Response Validation Schemas
        • Error Handling and Retry Logic
      • Prompt Optimization Strategies
        • Few-Shot Learning
        • Chain-of-Thought Reasoning
        • Prompt Versioning and A/B Testing
      • Integration with Decision Tree
        • LLM-Enhanced Decision Logic
      • Performance Optimization
        • Caching Strategies
        • Batch Processing
      • Monitoring and Debugging
        • LLM Call Logging
      • Testing LLM Integration
        • Mock LLM Responses
        • Integration Testing
      • Best Practices
        • Prompt Design
        • Error Handling
        • Performance
        • Security
    • Synthetic Data Generation System
      • Overview
      • System Architecture and Decision Tree Integration
        • Decision Tree Validation Through Synthetic Data
      • Data Generation Framework
        • Core Components
        • Configuration System
        • Base Data Generator Architecture
        • Base Data Generator
      • Method-Specific Generators and Decision Tree Testing
        • Randomized Controlled Trial (RCT) Generator
        • Multi-Treatment RCT Generator
        • Difference-in-Differences (DiD) Generators
        • Instrumental Variables (IV) Generators
        • Regression Discontinuity (RDD) Generator
        • Propensity Score Generators
        • Front-Door Criterion Generator
        • Difference-in-Differences Generator
        • Instrumental Variables Generator
        • Regression Discontinuity Generator
        • Propensity Score Generator
      • Data Generation Workflow and Scripts
        • Generation Pipeline Overview
        • Step 1: Configuration and Parameter Setup
        • Step 2: Raw Data Generation
        • Step 3: Context Generation with LLM Integration
        • Step 4: Data Finalization and Integration
        • Logging and Quality Control
        • Batch Processing and Agent Testing
      • Scenario Generation and Testing
        • Assumption Violation Scenarios
        • Edge Case and Robustness Testing
      • Usage Examples and Best Practices
        • Complete Workflow Example
        • Batch Testing Example
        • Best Practices for Synthetic Data Generation
      • Integration with CAIS Testing Framework
        • Continuous Integration Testing
        • Performance Benchmarking
        • Quality Assurance and Validation
      • Future Enhancements and Extensions
        • Planned Improvements
        • Contributing to the Synthetic Data System
      • Conclusion
        • Data Validation
      • Testing Integration
        • Using Synthetic Data in Tests
        • Continuous Integration
      • Best Practices
        • Data Generation Guidelines
        • Validation Standards
        • Testing Integration
    • Testing Framework
      • Overview
      • Test Organization
        • Directory Structure
      • Base Test Infrastructure
        • Base Test Classes
        • Pytest Configuration
      • Unit Testing
        • Component Unit Tests
        • Method Unit Tests
        • Tool Unit Tests
      • Integration Testing
        • Workflow Integration Tests
        • LLM Integration Tests
      • End-to-End Testing
        • Complete Workflow Tests
      • Performance Testing
        • Scalability Tests
        • Method Performance Tests
      • Test Automation and CI/CD
        • GitHub Actions Configuration
        • Test Coverage and Quality
      • Best Practices
        • Test Design Principles
        • Data Management
        • Continuous Integration
    • Getting Started with Development
    • Development Workflow
    • Project Structure
    • Contribution Areas
    • Development Standards

About

  • About CAIS
    • Citation Information
      • BibTeX Format
      • License and Usage Terms
      • Contact for Citation Questions
    • Changelog
      • Version 0.1.2 (Current)
      • Version 0.1.1
      • Version 0.1.0
      • Contributing to Changelog
      • Release Process
      • Contact for Release Information
    • License and Usage Terms
      • MIT License
      • What This Means
      • Usage Guidelines
      • Third-Party Dependencies
      • Data and Model Licenses
      • Contributing and License
      • License Compliance
      • Frequently Asked Questions
      • Contact for License Questions
      • License History
    • Project Overview
  • Tutorials & Examples
  • Code Examples
  • Dataset Properties and Method Selection Gallery
  • View page source

Dataset Properties and Method Selection Gallery

This gallery demonstrates how different dataset characteristics lead CAIS to select different causal inference methods. Each example shows the decision tree path and explains why specific methods are chosen or excluded.

Overview

CAIS uses a systematic decision tree to select the most appropriate causal inference method based on your data characteristics. This gallery provides visual examples of how different data properties lead to different method selections.

Key Decision Factors:
- Randomization status
- Data structure (cross-sectional, panel, etc.)
- Treatment variable type (binary, continuous, categorical)
- Available instruments
- Covariate richness and overlap

Gallery Examples

Example 1: Perfect Randomized Experiment

Dataset Characteristics:
- Randomized controlled trial
- Binary treatment assignment
- Rich baseline covariates
- Perfect compliance

flowchart TD
    A[RCT Dataset] --> B{Is this randomized?}
    B -->|Yes ✓| C{Are covariates available?}
    C -->|Yes ✓| D[Linear Regression<br/>with Covariates]

    style A fill:#e3f2fd
    style B fill:#e8f5e8
    style C fill:#fff3e0
    style D fill:#e8f5e8
    

Agent Decision Process:

🎯 Method Selection: Linear Regression with Covariates

Decision Path:
1. Randomization check: ✅ PASSED (balanced assignment)
2. Covariate assessment: ✅ AVAILABLE (baseline measures)
3. Selected method: Linear regression with covariates

Why this method?
✓ Randomization ensures causal identification
✓ Covariates improve precision (reduce standard errors)
✓ Transparent and interpretable results
✓ Optimal for experimental data

Example Datasets: Learning mindset intervention, A/B tests, clinical trials
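The precision claim above can be checked on simulated data. The following is a minimal sketch that uses only the Python standard library and does not call CAIS; the helper names (`fit_line`, `covariate_adjusted_effect`, `simulate_trial`) and the data-generating process are assumptions made for illustration. It compares a plain difference in means with a covariate-adjusted estimate across repeated simulated trials.

```python
import random
import statistics

def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def residuals(x, y):
    """Residuals of y after regressing it on x (with an intercept)."""
    a, b = fit_line(x, y)
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]

def difference_in_means(t, y):
    treated = [yi for ti, yi in zip(t, y) if ti == 1]
    control = [yi for ti, yi in zip(t, y) if ti == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)

def covariate_adjusted_effect(t, x, y):
    """Coefficient on t in the regression y ~ t + x (Frisch-Waugh-Lovell)."""
    _, slope = fit_line(residuals(x, t), residuals(x, y))
    return slope

def simulate_trial(rng, n=200, true_effect=2.0):
    """One simulated RCT: x is a baseline covariate, t is a coin flip."""
    x = [rng.gauss(0, 1) for _ in range(n)]
    t = [rng.randint(0, 1) for _ in range(n)]
    y = [true_effect * ti + 3.0 * xi + rng.gauss(0, 1) for ti, xi in zip(t, x)]
    return difference_in_means(t, y), covariate_adjusted_effect(t, x, y)

rng = random.Random(0)
draws = [simulate_trial(rng) for _ in range(200)]
unadjusted = [d[0] for d in draws]
adjusted = [d[1] for d in draws]

sd_unadjusted = statistics.stdev(unadjusted)
sd_adjusted = statistics.stdev(adjusted)
print(f"difference in means: mean={statistics.mean(unadjusted):.2f}, sd={sd_unadjusted:.2f}")
print(f"covariate-adjusted:  mean={statistics.mean(adjusted):.2f}, sd={sd_adjusted:.2f}")
```

Both estimators centre on the simulated effect because treatment is randomized; the adjusted one varies less from trial to trial because the covariate absorbs outcome variance.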

—

Example 2: Observational Data with Rich Covariates

Dataset Characteristics:
- Non-randomized observational study
- Binary treatment
- Rich set of confounding variables
- Good covariate overlap

flowchart TD
    A[Observational Dataset] --> B{Is this randomized?}
    B -->|No ✗| C{Panel data available?}
    C -->|No ✗| D{Running variable?}
    D -->|No ✗| E{Binary treatment?}
    E -->|Yes ✓| F{Instrumental variable?}
    F -->|No ✗| G{Rich covariates?}
    G -->|Yes ✓| H{Good overlap?}
    H -->|Yes ✓| I[Propensity Score<br/>Matching]

    style A fill:#e3f2fd
    style B fill:#ffebee
    style C fill:#ffebee
    style D fill:#ffebee
    style E fill:#fff3e0
    style F fill:#ffebee
    style G fill:#fff3e0
    style H fill:#fff3e0
    style I fill:#e8f5e8
    

Agent Decision Process:

🎯 Method Selection: Propensity Score Matching

Decision Path:
1. Randomization check: ❌ FAILED (selection bias detected)
2. Panel data check: ❌ NOT AVAILABLE
3. Running variable check: ❌ NOT AVAILABLE
4. Treatment type: ✅ BINARY
5. Instrumental variable: ❌ NOT AVAILABLE
6. Covariate richness: ✅ RICH COVARIATES
7. Overlap assessment: ✅ GOOD OVERLAP
8. Selected method: Propensity score matching

Why this method?
✓ Handles selection bias through matching
✓ Rich covariates enable credible matching
✓ Good overlap ensures valid comparisons
✓ Transparent balance assessment

Example Datasets: Hospital treatment effects, job training programs, educational interventions
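To make the matching step concrete, here is a minimal sketch of one-to-one nearest-neighbour matching on an estimated propensity score. It is not the CAIS implementation: the single confounder, the hand-rolled logistic fit, matching with replacement, and all function names are assumptions chosen to keep the example dependency-free.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_propensity(x, t, steps=300, lr=0.5):
    """Fit P(T=1 | x) = sigmoid(b0 + b1*x) by gradient ascent on the log-likelihood."""
    b0 = b1 = 0.0
    n = len(x)
    for _ in range(steps):
        g0 = g1 = 0.0
        for xi, ti in zip(x, t):
            err = ti - sigmoid(b0 + b1 * xi)
            g0 += err
            g1 += err * xi
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return [sigmoid(b0 + b1 * xi) for xi in x]

def matched_att(ps, t, y):
    """ATT from 1:1 nearest-neighbour matching on the propensity score (with replacement)."""
    controls = [(p, yi) for p, ti, yi in zip(ps, t, y) if ti == 0]
    diffs = []
    for p, ti, yi in zip(ps, t, y):
        if ti == 1:
            _, y_match = min(controls, key=lambda c: abs(c[0] - p))
            diffs.append(yi - y_match)
    return sum(diffs) / len(diffs)

def naive_difference(t, y):
    treated = [yi for ti, yi in zip(t, y) if ti == 1]
    control = [yi for ti, yi in zip(t, y) if ti == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)

# Simulated observational data: x drives both treatment uptake and the outcome.
rng = random.Random(42)
n, true_effect = 1000, 1.0
x = [rng.gauss(0, 1) for _ in range(n)]
t = [1 if rng.random() < sigmoid(xi) else 0 for xi in x]
y = [true_effect * ti + 2.0 * xi + rng.gauss(0, 1) for ti, xi in zip(t, x)]

ps = fit_propensity(x, t)
naive = naive_difference(t, y)
att = matched_att(ps, t, y)
print(f"naive difference: {naive:.2f}")
print(f"matched ATT:      {att:.2f}  (simulated effect: {true_effect})")
```

The naive comparison absorbs the confounder's effect, while the matched estimate compares treated units with controls that had a similar probability of treatment. A real analysis would also inspect covariate balance and overlap after matching.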

—

Example 3: Panel Data with Treatment Timing

Dataset Characteristics:
- Panel data (multiple time periods)
- Treatment timing varies across units
- Clear before/after periods
- Parallel trends plausible

flowchart TD
    A[Panel Dataset] --> B{Is this randomized?}
    B -->|No ✗| C{Panel data available?}
    C -->|Yes ✓| D{Treatment timing varies?}
    D -->|Yes ✓| E[Difference-in-Differences]

    style A fill:#e3f2fd
    style B fill:#ffebee
    style C fill:#fff3e0
    style D fill:#fff3e0
    style E fill:#e8f5e8
    

Agent Decision Process:

🎯 Method Selection: Difference-in-Differences

Decision Path:
1. Randomization check: ❌ FAILED
2. Panel data check: ✅ AVAILABLE (multiple time periods)
3. Treatment timing: ✅ VARIES across units
4. Selected method: Difference-in-differences

Why this method?
✓ Exploits timing variation for identification
✓ Controls for time-invariant confounders
✓ Handles unobserved heterogeneity that is constant over time
✓ Robust to selection on time-invariant unobservables, provided parallel trends holds

Key assumption: Parallel trends between treatment and control

Example Datasets: Policy evaluations, minimum wage studies, healthcare reforms
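The arithmetic behind difference-in-differences is easiest to see in the two-group, two-period case. The sketch below uses made-up numbers and a hypothetical helper `did_2x2`; it does not cover the staggered-timing designs described above, which need more than four cell means.

```python
from statistics import mean

def did_2x2(rows):
    """rows: iterable of (treated, post, outcome), with treated and post in {0, 1}."""
    cell = {(g, p): [] for g in (0, 1) for p in (0, 1)}
    for treated, post, outcome in rows:
        cell[(treated, post)].append(outcome)
    change_treated = mean(cell[(1, 1)]) - mean(cell[(1, 0)])
    change_control = mean(cell[(0, 1)]) - mean(cell[(0, 0)])
    # The control group's change stands in for what the treated group
    # would have experienced without treatment (parallel trends).
    return change_treated - change_control

rows = [
    (0, 0, 9.0), (0, 0, 11.0),   # control, before: mean 10
    (0, 1, 11.0), (0, 1, 13.0),  # control, after:  mean 12
    (1, 0, 13.0), (1, 0, 15.0),  # treated, before: mean 14
    (1, 1, 18.0), (1, 1, 20.0),  # treated, after:  mean 19
]
effect = did_2x2(rows)
print(effect)  # (19 - 14) - (12 - 10) = 3.0
```

The treated group starts at a higher level than the control group, which does not bias the estimate: only a difference in trends would.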

—

Example 4: Sharp Regression Discontinuity

Dataset Characteristics:
- Continuous running variable
- Sharp cutoff for treatment assignment
- Treatment probability jumps discontinuously
- No manipulation of running variable

flowchart TD
    A[RDD Dataset] --> B{Is this randomized?}
    B -->|No ✗| C{Panel data available?}
    C -->|No ✗| D{Running variable with cutoff?}
    D -->|Yes ✓| E{Sharp discontinuity?}
    E -->|Yes ✓| F[Regression Discontinuity<br/>Design]

    style A fill:#e3f2fd
    style B fill:#ffebee
    style C fill:#ffebee
    style D fill:#fff3e0
    style E fill:#fff3e0
    style F fill:#e8f5e8
    

Agent Decision Process:

🎯 Method Selection: Regression Discontinuity Design

Decision Path:
1. Randomization check: ❌ FAILED
2. Panel data check: ❌ NOT AVAILABLE
3. Running variable: ✅ DETECTED (continuous assignment variable)
4. Discontinuity: ✅ SHARP (treatment probability jumps)
5. Selected method: Regression discontinuity design

Why this method?
✓ Exploits discontinuous assignment rule
✓ Local randomization around cutoff
✓ Credible identification strategy
✓ Transparent assumptions

Key assumption: Continuity of potential outcomes at cutoff

Example Datasets: Scholarship eligibility, policy thresholds, age-based programs
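A sharp discontinuity estimate can be sketched as the gap between two local linear fits evaluated at the cutoff. The example below is illustrative only: the fixed bandwidth, the uniform (rectangular) kernel, the simulated data, and the names `fit_line` and `sharp_rdd_effect` are assumptions, and it is not how CAIS necessarily estimates RDD effects.

```python
import random

def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def sharp_rdd_effect(running, outcome, cutoff=0.0, bandwidth=0.5):
    """Jump in the outcome at the cutoff, from separate linear fits on each side."""
    left = [(r - cutoff, y) for r, y in zip(running, outcome)
            if cutoff - bandwidth <= r < cutoff]
    right = [(r - cutoff, y) for r, y in zip(running, outcome)
             if cutoff <= r <= cutoff + bandwidth]
    # After centring the running variable, each intercept is the fitted
    # outcome exactly at the cutoff.
    a_left, _ = fit_line(*zip(*left))
    a_right, _ = fit_line(*zip(*right))
    return a_right - a_left

# Simulated data: units at or above the cutoff are treated, with a jump of 2.0.
rng = random.Random(1)
true_jump = 2.0
running = [rng.uniform(-1, 1) for _ in range(2000)]
outcome = [1.0 + 0.5 * r + (true_jump if r >= 0 else 0.0) + rng.gauss(0, 0.5)
           for r in running]

estimate = sharp_rdd_effect(running, outcome)
print(f"estimated jump at cutoff: {estimate:.2f}")
```

Only observations within the bandwidth contribute, which is why the estimate is local to the cutoff rather than an average effect for the whole sample.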

—

Example 5: Instrumental Variables

Dataset Characteristics:
- Endogenous treatment assignment
- Valid instrumental variable available
- Strong first-stage relationship
- Credible exclusion restriction

flowchart TD
    A[IV Dataset] --> B{Is this randomized?}
    B -->|No ✗| C{Panel data available?}
    C -->|No ✗| D{Running variable?}
    D -->|No ✗| E{Binary treatment?}
    E -->|Yes ✓| F{Instrumental variable?}
    F -->|Yes ✓| G{Valid instrument?}
    G -->|Yes ✓| H[Instrumental Variables]

    style A fill:#e3f2fd
    style B fill:#ffebee
    style C fill:#ffebee
    style D fill:#ffebee
    style E fill:#fff3e0
    style F fill:#fff3e0
    style G fill:#fff3e0
    style H fill:#e8f5e8
    

Agent Decision Process:

🎯 Method Selection: Instrumental Variables

Decision Path:
1. Randomization check: ❌ FAILED
2. Panel data check: ❌ NOT AVAILABLE
3. Running variable check: ❌ NOT AVAILABLE
4. Treatment type: ✅ BINARY
5. Instrumental variable: ✅ DETECTED
6. Instrument validation: ✅ VALID (relevance + exogeneity)
7. Selected method: Instrumental variables

Why this method?
✓ Handles unmeasured confounding
✓ Valid instrument provides exogenous variation
✓ Strong first-stage relationship
✓ Credible exclusion restriction

Key assumptions: Relevance, exogeneity, exclusion restriction

Example Datasets: Marketing campaigns with server downtime, education with distance instruments
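With a binary instrument and a binary treatment, the instrumental variables estimate reduces to the Wald ratio: the instrument's effect on the outcome divided by its effect on treatment uptake. The following sketch uses simulated data with an unmeasured confounder `u`; the function name `wald_estimate` and the data-generating process are assumptions for illustration, not part of CAIS.

```python
import random
from statistics import mean

def wald_estimate(z, d, y):
    """Return (IV estimate, first stage) for a binary instrument z."""
    y1 = mean([yi for zi, yi in zip(z, y) if zi == 1])
    y0 = mean([yi for zi, yi in zip(z, y) if zi == 0])
    d1 = mean([di for zi, di in zip(z, d) if zi == 1])
    d0 = mean([di for zi, di in zip(z, d) if zi == 0])
    first_stage = d1 - d0          # relevance: how much z shifts treatment
    reduced_form = y1 - y0         # effect of z on the outcome
    return reduced_form / first_stage, first_stage

rng = random.Random(7)
n, true_effect = 4000, 1.0
z = [rng.randint(0, 1) for _ in range(n)]            # instrument, as good as random
u = [rng.gauss(0, 1) for _ in range(n)]              # unmeasured confounder
d = [1 if 2 * zi + ui > 1 else 0 for zi, ui in zip(z, u)]
y = [true_effect * di + 2.0 * ui + rng.gauss(0, 1) for di, ui in zip(d, u)]

naive = (mean([yi for di, yi in zip(d, y) if di == 1])
         - mean([yi for di, yi in zip(d, y) if di == 0]))
iv, first_stage = wald_estimate(z, d, y)
print(f"naive difference: {naive:.2f}")
print(f"first stage:      {first_stage:.2f}")
print(f"Wald IV estimate: {iv:.2f}  (simulated effect: {true_effect})")
```

The naive comparison is biased because `u` raises both treatment uptake and the outcome. The Wald ratio avoids this because `z` affects the outcome only through treatment in this simulation, which is the exclusion restriction listed above.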

—

Example 6: Continuous Treatment with IV

Dataset Characteristics:
- Continuous treatment variable
- Endogeneity concerns
- Valid instrumental variable
- No clear cutoff or panel structure

flowchart TD
    A[Continuous Treatment] --> B{Is this randomized?}
    B -->|No ✗| C{Panel data available?}
    C -->|No ✗| D{Running variable?}
    D -->|No ✗| E{Binary treatment?}
    E -->|No ✗| F{Continuous treatment}
    F --> G{Instrumental variable?}
    G -->|Yes ✓| H[Instrumental Variables<br/>Continuous Treatment]

    style A fill:#e3f2fd
    style B fill:#ffebee
    style C fill:#ffebee
    style D fill:#ffebee
    style E fill:#ffebee
    style F fill:#fff3e0
    style G fill:#fff3e0
    style H fill:#e8f5e8
    

Agent Decision Process:

🎯 Method Selection: IV with Continuous Treatment

Decision Path:
1. Randomization check: ❌ FAILED
2. Panel data check: ❌ NOT AVAILABLE
3. Running variable check: ❌ NOT AVAILABLE
4. Treatment type: ✅ CONTINUOUS
5. Instrumental variable: ✅ AVAILABLE
6. Selected method: IV with continuous treatment

Why this method?
✓ Handles continuous endogenous treatment
✓ Valid instrument provides identification
✓ Can estimate dose-response relationships
✓ Flexible functional form specification

Example Datasets: Advertising intensity, education years, healthcare dosage
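For a continuous treatment with a single instrument, two-stage least squares collapses to a ratio of covariances, cov(z, y) / cov(z, d). The sketch below assumes a linear dose-response and simulated data; the helper names are hypothetical and the example is independent of CAIS.

```python
import random

def covariance(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (n - 1)

def ols_slope(d, y):
    """Slope of y on d, ignoring confounding."""
    return covariance(d, y) / covariance(d, d)

def iv_slope(z, d, y):
    """Single-instrument IV estimate of the effect of d on y."""
    return covariance(z, y) / covariance(z, d)

rng = random.Random(3)
n, true_slope = 4000, 0.5
z = [rng.gauss(0, 1) for _ in range(n)]   # instrument
u = [rng.gauss(0, 1) for _ in range(n)]   # unmeasured confounder
d = [0.8 * zi + ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]   # continuous dose
y = [true_slope * di + 2.0 * ui + rng.gauss(0, 1) for di, ui in zip(d, u)]

ols = ols_slope(d, y)
iv = iv_slope(z, d, y)
print(f"OLS slope: {ols:.2f}")
print(f"IV slope:  {iv:.2f}  (simulated slope: {true_slope})")
```

The OLS slope overstates the effect because the confounder moves dose and outcome together. The IV slope uses only the variation in dose that comes from the instrument.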

Method Exclusion Examples

Understanding why methods are excluded is as important as understanding why they’re selected.

Example 7: Why Not Difference-in-Differences?

Dataset: Cross-sectional observational data with rich covariates

❌ Difference-in-Differences: EXCLUDED

Data Requirements Not Met:
- Requires: Panel data with multiple time periods
- Available: Cross-sectional data (single time point)
- Missing: Pre-treatment outcome measurements
- Missing: Variation in treatment timing

Alternative Selected: Propensity Score Matching
- Uses available rich covariates
- Handles selection bias through matching
- Appropriate for cross-sectional data

—

Example 8: Why Not Regression Discontinuity?

Dataset: Observational data without clear assignment rule

❌ Regression Discontinuity: EXCLUDED

Data Requirements Not Met:
- Requires: Continuous running variable with sharp cutoff
- Available: Discretionary treatment assignment
- Missing: Clear assignment rule or threshold
- Problem: No discontinuous treatment probability

Alternative Selected: Propensity Score Methods
- Handles discretionary assignment
- Uses observed characteristics for matching
- Appropriate for non-rule-based assignment

—

Example 9: Why Not Instrumental Variables?

Dataset: Randomized experiment with perfect compliance

❌ Instrumental Variables: EXCLUDED

Not Needed:
- Randomization already provides identification
- No endogeneity concerns in experimental data
- IV would be less efficient than direct analysis
- Perfect compliance eliminates need for instruments

Selected Method: Linear Regression with Covariates
- Leverages randomization for identification
- More efficient than IV approach
- Simpler interpretation and implementation
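The three exclusion examples above follow a common pattern: a method is ruled out either because a data requirement is missing or because a stronger design makes it unnecessary. A minimal sketch of that logic follows. `DatasetProperties` and `exclusion_reasons` are hypothetical names invented for this illustration and are not part of the `causal_agent` API.

```python
from dataclasses import dataclass

@dataclass
class DatasetProperties:
    """Hypothetical summary of the properties used in Examples 7 to 9."""
    randomized: bool = False
    panel_data: bool = False
    running_variable: bool = False
    instrument: bool = False

def exclusion_reasons(props):
    """Map each excluded method to the reason it does not apply."""
    reasons = {}
    if not props.panel_data:
        reasons["Difference-in-Differences"] = (
            "requires panel data with multiple time periods")
    if not props.running_variable:
        reasons["Regression Discontinuity"] = (
            "requires a continuous running variable with a cutoff")
    if props.randomized:
        reasons["Instrumental Variables"] = (
            "not needed: randomization already provides identification")
    elif not props.instrument:
        reasons["Instrumental Variables"] = "requires a valid instrument"
    return reasons

# Example 9: a randomized experiment with perfect compliance.
rct = DatasetProperties(randomized=True)
for method, reason in exclusion_reasons(rct).items():
    print(f"{method}: EXCLUDED ({reason})")
```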

Dataset Property Decision Matrix

This matrix shows how different combinations of data characteristics lead to method selection:

Method Selection Matrix

| Randomized | Panel Data | Running Var | Instrument | Treatment Type | Selected Method |
|------------|------------|-------------|------------|----------------|-----------------|
| ✅ Yes | Any | Any | Any | Binary | Linear Regression + Covariates |
| ✅ Yes | Any | Any | Any | Continuous | Linear Regression + Covariates |
| ❌ No | ✅ Yes | Any | Any | Any | Difference-in-Differences |
| ❌ No | ❌ No | ✅ Yes | Any | Any | Regression Discontinuity |
| ❌ No | ❌ No | ❌ No | ✅ Yes | Binary | Instrumental Variables |
| ❌ No | ❌ No | ❌ No | ✅ Yes | Continuous | IV Continuous Treatment |
| ❌ No | ❌ No | ❌ No | ❌ No | Binary | Propensity Score Methods |
| ❌ No | ❌ No | ❌ No | ❌ No | Continuous | Linear Regression + Controls |
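Read top to bottom, the matrix is an ordered sequence of checks in which the first matching row wins. The sketch below encodes exactly the eight rows shown. The function name `method_from_matrix` and its parameters are hypothetical, chosen for this illustration; the actual CAIS decision tree considers more factors (for example covariate richness and overlap).

```python
def method_from_matrix(randomized, panel_data, running_variable,
                       instrument, treatment_type):
    """Return the method named in the first matching row of the matrix."""
    if treatment_type not in ("binary", "continuous"):
        raise ValueError("the matrix only covers binary and continuous treatments")
    if randomized:
        return "Linear Regression + Covariates"
    if panel_data:
        return "Difference-in-Differences"
    if running_variable:
        return "Regression Discontinuity"
    if instrument:
        return ("Instrumental Variables" if treatment_type == "binary"
                else "IV Continuous Treatment")
    if treatment_type == "binary":
        return "Propensity Score Methods"
    return "Linear Regression + Controls"

# Cross-sectional observational data, binary treatment, no instrument:
print(method_from_matrix(False, False, False, False, "binary"))
# Same data but with an instrument and a continuous treatment:
print(method_from_matrix(False, False, False, True, "continuous"))
```

Because the checks are ordered, a randomized dataset is routed to regression even if it also has panel structure or an instrument, which is what the "Any" cells in the first rows express.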

Common Decision Patterns

Pattern 1: Experimental Data Priority

Rule: Randomized experiments are always preferred when available

Priority Hierarchy:
1. Randomized Controlled Trial → Linear Regression + Covariates
2. Natural Experiment (RDD/IV) → RDD or IV
3. Quasi-Experiment (DiD) → Difference-in-Differences
4. Observational (Matching) → Propensity Score Methods
5. Observational (Regression) → Linear Regression + Controls

Pattern 2: Data Structure Drives Method

Rule: Method selection follows data availability hierarchy

Data Structure Priority:
1. Randomization → Experimental methods
2. Panel + Timing → Difference-in-Differences
3. Running Variable → Regression Discontinuity
4. Valid Instrument → Instrumental Variables
5. Rich Covariates → Propensity Score Methods
6. Limited Data → Linear Regression

Pattern 3: Treatment Type Considerations

Rule: Treatment variable type affects method choice within categories

Treatment Type Adaptations:
- Binary Treatment: Standard methods (matching, IV, etc.)
- Continuous Treatment: Specialized versions (generalized PS, IV)
- Categorical Treatment: Multinomial approaches
- Time-Varying Treatment: Dynamic methods
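Applying these adaptations requires knowing the treatment type first. One simple way to guess it from a column of values is sketched below. The heuristic, the `max_categories` threshold, and the function name `classify_treatment` are assumptions for illustration only; they do not describe how CAIS detects treatment types, and time-varying treatments cannot be identified from a single column in this way.

```python
def classify_treatment(values, max_categories=10):
    """Guess whether a treatment column is binary, categorical, or continuous."""
    distinct = {v for v in values if v is not None}
    if len(distinct) < 2:
        raise ValueError("treatment has no variation")
    if len(distinct) == 2:
        return "binary"
    # Strings, or a small number of distinct levels, are treated as categories.
    if any(isinstance(v, str) for v in distinct) or len(distinct) <= max_categories:
        return "categorical"
    return "continuous"

print(classify_treatment([0, 1, 1, 0, 1]))                  # binary
print(classify_treatment(["low", "medium", "high", "low"]))  # categorical
print(classify_treatment([0.1 * i for i in range(50)]))      # continuous
```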

Next Steps

  1. Apply to Your Data: Use the decision framework with your datasets

  2. Explore Case Studies: See detailed examples in Case Studies

  3. Read Method Documentation: Deep dive into specific methods in Causal Inference Methods

Related Resources:
- Method Selection Decision Tree: Complete decision tree documentation
- Case Studies: Detailed case studies by domain
- Quickstart Tutorial: Quick start guide for CAIS

© Copyright 2024, CAIS Team.