Basic Usage
===========

This guide covers the fundamental workflows for conducting causal analysis with CAIS. Whether you are new to causal inference or already use other tools, it walks you through the most common analysis patterns.

Core Workflow
-------------

CAIS follows a structured workflow that guides you through the causal analysis process:

1. **Input Parsing**: Understanding your research question and dataset
2. **Dataset Analysis**: Examining data structure and variable types
3. **Query Interpretation**: Identifying treatment, outcome, and control variables
4. **Method Selection**: Choosing the appropriate causal inference method
5. **Method Validation**: Checking assumptions and prerequisites
6. **Method Execution**: Running the analysis with diagnostics
7. **Result Interpretation**: Generating explanations and insights
8. **Output Formatting**: Presenting results in a structured format
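
As an illustration of step 2, the sketch below shows the kind of inspection a dataset-analysis stage might perform. It uses pandas to label each column as binary, numeric, or categorical. This is not CAIS's internal implementation; ``profile_columns`` is a hypothetical helper and the data are toy values.

.. code-block:: python

    import pandas as pd


    def profile_columns(df: pd.DataFrame) -> dict:
        """Label each column as 'binary', 'numeric', or 'categorical' (hypothetical helper)."""
        profile = {}
        for col in df.columns:
            non_missing = df[col].dropna()
            if non_missing.nunique() == 2:
                profile[col] = "binary"       # candidate treatment indicator
            elif pd.api.types.is_numeric_dtype(non_missing):
                profile[col] = "numeric"      # candidate outcome or covariate
            else:
                profile[col] = "categorical"
        return profile


    # Toy data for illustration only
    df = pd.DataFrame({
        "treat": [0, 1, 1, 0],
        "age": [23, 31, 45, 27],
        "re78": [1200.0, 5400.0, 0.0, 3100.0],
        "region": ["north", "south", "east", "west"],
    })
    print(profile_columns(df))
    # {'treat': 'binary', 'age': 'numeric', 're78': 'numeric', 'region': 'categorical'}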

Python API Usage
-----------------

Single Analysis
~~~~~~~~~~~~~~~

The most common use case is analyzing a single dataset with a specific causal question:

.. code-block:: python

    from causal_agent import run_causal_analysis
    
    # Basic analysis
    result = run_causal_analysis(
        query="What is the effect of job training on earnings?",
        dataset_path="data/lalonde_data.csv",
        dataset_description="LaLonde job training experiment data"
    )
    
    # Access key results
    effect_estimate = result['results']['results']['effect_estimate']
    method_used = result['results']['results']['method_used']
    treatment_var = result['results']['variables']['treatment_variable']
    outcome_var = result['results']['variables']['outcome_variable']
    
    print(f"Method: {method_used}")
    print(f"Effect of {treatment_var} on {outcome_var}: {effect_estimate}")

Understanding Results
~~~~~~~~~~~~~~~~~~~~

``run_causal_analysis`` returns a nested dictionary. The example below shows its general shape; which diagnostics appear depends on the method used:

.. code-block:: python

    # Example result structure
    {
        'results': {
            'results': {
                'effect_estimate': 1794.34,
                'standard_error': 632.85,
                'confidence_interval': [554.95, 3033.73],
                'p_value': 0.0045,
                'method_used': 'propensity_score_matching'
            },
            'variables': {
                'treatment_variable': 'treat',
                'outcome_variable': 're78',
                'covariates': ['age', 'education', 'black', 'hispanic', 'married']
            },
            'diagnostics': {
                'balance_statistics': {...},
                'assumption_checks': {...}
            },
            'explanation': "The analysis found a significant positive effect..."
        }
    }
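
Indexing three levels deep raises ``KeyError`` whenever a field is absent. A small helper that reads the nested dictionary with ``dict.get`` returns ``None`` instead. The key paths below follow the example structure above; ``summarize_result`` is a hypothetical name, not part of the CAIS API.

.. code-block:: python

    from typing import Any


    def summarize_result(result: dict) -> dict[str, Any]:
        """Flatten key fields of a run_causal_analysis result; missing fields become None."""
        block = result.get("results", {})
        estimates = block.get("results", {})
        variables = block.get("variables", {})
        return {
            "method": estimates.get("method_used"),
            "treatment": variables.get("treatment_variable"),
            "outcome": variables.get("outcome_variable"),
            "estimate": estimates.get("effect_estimate"),
            "std_error": estimates.get("standard_error"),
            "ci": estimates.get("confidence_interval"),
            "p_value": estimates.get("p_value"),
        }


    # A trimmed copy of the example structure, without a confidence interval
    example = {
        "results": {
            "results": {"effect_estimate": 1794.34, "standard_error": 632.85,
                        "method_used": "propensity_score_matching"},
            "variables": {"treatment_variable": "treat", "outcome_variable": "re78"},
        }
    }
    summary = summarize_result(example)
    print(summary["method"], summary["estimate"], summary["ci"])
    # propensity_score_matching 1794.34 None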

Command Line Interface
----------------------

For quick analyses or for integration into scripts, use the command-line interface.

Single Analysis
~~~~~~~~~~~~~~~

.. code-block:: bash

    # Basic command
    causal_agent run data/lalonde_data.csv "What is the effect of job training on earnings?"
    
    # With dataset description
    causal_agent run data/lalonde_data.csv \
        "What is the effect of job training on earnings?" \
        --desc "LaLonde job training experiment with treatment and control groups"
    
    # Specify LLM provider and model
    causal_agent run data/lalonde_data.csv \
        "What is the effect of job training on earnings?" \
        --llm-provider anthropic \
        --llm-name claude-3-5-sonnet-latest
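
To call the CLI from a Python script, you can build the arguments as a list and pass it to ``subprocess.run``. The sketch below uses only the ``run`` subcommand and the ``--desc``, ``--llm-provider`` and ``--llm-name`` options shown above. ``build_cli_args`` and ``run_cli`` are hypothetical helpers.

.. code-block:: python

    import subprocess
    from typing import Optional


    def build_cli_args(dataset_path: str, query: str,
                       desc: Optional[str] = None,
                       llm_provider: Optional[str] = None,
                       llm_name: Optional[str] = None) -> list[str]:
        """Assemble a causal_agent CLI invocation as an argument list (hypothetical helper)."""
        args = ["causal_agent", "run", dataset_path, query]
        if desc:
            args += ["--desc", desc]
        if llm_provider:
            args += ["--llm-provider", llm_provider]
        if llm_name:
            args += ["--llm-name", llm_name]
        return args


    def run_cli(args: list[str]) -> str:
        """Run the command and return its stdout; raises CalledProcessError on failure."""
        completed = subprocess.run(args, capture_output=True, text=True, check=True)
        return completed.stdout


    args = build_cli_args(
        "data/lalonde_data.csv",
        "What is the effect of job training on earnings?",
        llm_provider="anthropic",
        llm_name="claude-3-5-sonnet-latest",
    )
    # output = run_cli(args)  # requires causal_agent to be installed and on PATH

Passing a list instead of a single shell string means the spaces and question mark in the query need no quoting.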

Common Analysis Patterns
------------------------

Experimental Data (RCT)
~~~~~~~~~~~~~~~~~~~~~~~

When you have randomized controlled trial data:

.. code-block:: python

    result = run_causal_analysis(
        query="What is the treatment effect in this randomized experiment?",
        dataset_path="data/rct_data.csv",
        dataset_description="Randomized controlled trial with treatment and control groups"
    )
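
Under randomization, the difference in mean outcomes between treated and control units is an unbiased estimate of the average treatment effect, which makes it a useful sanity check on the estimate CAIS reports. The sketch below computes it with an unpooled (Neyman) standard error. ``difference_in_means`` is a hypothetical helper and the data are toy values.

.. code-block:: python

    import numpy as np


    def difference_in_means(y: np.ndarray, t: np.ndarray) -> tuple[float, float]:
        """Return (ATE estimate, Neyman standard error) for a binary treatment t."""
        y_t, y_c = y[t == 1], y[t == 0]
        ate = y_t.mean() - y_c.mean()
        # Unpooled variance: each group's sample variance (ddof=1) divided by its size
        se = np.sqrt(y_t.var(ddof=1) / len(y_t) + y_c.var(ddof=1) / len(y_c))
        return float(ate), float(se)


    y = np.array([5.0, 7.0, 6.0, 2.0, 3.0, 4.0])
    t = np.array([1, 1, 1, 0, 0, 0])
    ate, se = difference_in_means(y, t)
    print(ate, round(se, 4))  # 3.0 0.8165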

Observational Data
~~~~~~~~~~~~~~~~~~

For observational studies where you need to control for confounders:

.. code-block:: python

    result = run_causal_analysis(
        query="What is the effect of education on income, controlling for background factors?",
        dataset_path="data/observational_data.csv", 
        dataset_description="Survey data with education, income, and demographic variables"
    )
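
One standard way to "control for" observed confounders is regression adjustment: regress the outcome on the treatment and the covariates, then read off the treatment coefficient. The NumPy sketch below applies it to simulated data in which a background variable drives both education and income. It assumes a correctly specified linear model with no unobserved confounding, and it is not necessarily the method CAIS would select. ``regression_adjusted_effect`` is a hypothetical helper.

.. code-block:: python

    import numpy as np


    def regression_adjusted_effect(y, t, covariates):
        """OLS coefficient on t from regressing y on [1, t, covariates]."""
        X = np.column_stack([np.ones(len(y)), t, covariates])
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        return float(coef[1])  # column 1 holds the treatment


    # Simulated data: the true effect of education on income is 1.5
    rng = np.random.default_rng(0)
    background = rng.normal(size=500)                       # confounder
    education = 12 + 2 * background + rng.normal(size=500)
    income = 1.5 * education + 4.0 * background + rng.normal(size=500)

    naive = np.polyfit(education, income, 1)[0]            # slope ignoring the confounder
    adjusted = regression_adjusted_effect(income, education, background)
    print(f"naive: {naive:.2f}, adjusted: {adjusted:.2f}")  # naive is biased upward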

Time Series / Panel Data
~~~~~~~~~~~~~~~~~~~~~~~~

For difference-in-differences or other temporal analyses:

.. code-block:: python

    result = run_causal_analysis(
        query="What was the effect of the policy change on outcomes over time?",
        dataset_path="data/panel_data.csv",
        dataset_description="Panel data with pre/post policy implementation periods"
    )
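
In the simplest two-group, two-period design, the difference-in-differences estimate is the change in the treated group's mean outcome minus the change in the control group's. It identifies the effect under the parallel-trends assumption. The pandas sketch below computes it from a long-format panel; the column names ``treated``, ``post`` and ``y`` and the helper ``did_2x2`` are illustrative assumptions.

.. code-block:: python

    import pandas as pd


    def did_2x2(df: pd.DataFrame, group: str = "treated", period: str = "post",
                outcome: str = "y") -> float:
        """(treated post - treated pre) - (control post - control pre)."""
        means = df.groupby([group, period])[outcome].mean()
        treated_change = means.loc[(1, 1)] - means.loc[(1, 0)]
        control_change = means.loc[(0, 1)] - means.loc[(0, 0)]
        return float(treated_change - control_change)


    # Toy panel: two treated and two control units, each observed pre and post
    panel = pd.DataFrame({
        "unit":    [1, 1, 2, 2, 3, 3, 4, 4],
        "treated": [1, 1, 1, 1, 0, 0, 0, 0],
        "post":    [0, 1, 0, 1, 0, 1, 0, 1],
        "y":       [10, 15, 12, 17, 9, 11, 11, 13],
    })
    print(did_2x2(panel))  # 3.0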

Instrumental Variables
~~~~~~~~~~~~~~~~~~~~~

When you have an instrument for causal identification:

.. code-block:: python

    result = run_causal_analysis(
        query="What is the effect of education on wages using distance to college as an instrument?",
        dataset_path="data/iv_data.csv",
        dataset_description="Data with education, wages, and distance to college as instrument"
    )
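
With one instrument and one endogenous treatment, two-stage least squares reduces to the simple IV estimator cov(Z, Y) / cov(Z, D). The NumPy sketch below simulates an unobserved "ability" factor that confounds education and wages, then compares ordinary least squares with the IV estimate. The variables are simulated for illustration, and ``iv_estimate`` is a hypothetical helper.

.. code-block:: python

    import numpy as np


    def iv_estimate(y, d, z):
        """Single-instrument IV estimate: cov(z, y) / cov(z, d)."""
        return float(np.cov(z, y)[0, 1] / np.cov(z, d)[0, 1])


    # Simulated data: the true effect of education on wages is 0.5
    rng = np.random.default_rng(1)
    n = 5000
    ability = rng.normal(size=n)                    # unobserved confounder
    distance = rng.normal(size=n)                   # instrument: affects wages only via education
    education = 12 - 0.8 * distance + ability + rng.normal(size=n)
    wages = 0.5 * education + 2.0 * ability + rng.normal(size=n)

    ols = np.polyfit(education, wages, 1)[0]        # biased upward by ability
    iv = iv_estimate(wages, education, distance)
    print(f"OLS: {ols:.2f}, IV: {iv:.2f}")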

Regression Discontinuity
~~~~~~~~~~~~~~~~~~~~~~~

For sharp designs, in which treatment is assigned by whether a running variable crosses a fixed cutoff:

.. code-block:: python

    result = run_causal_analysis(
        query="What is the effect of the scholarship program on test scores?",
        dataset_path="data/rdd_data.csv",
        dataset_description="Student data with test scores and scholarship eligibility cutoff"
    )
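
In a sharp design the effect is the jump in the outcome at the cutoff. A common estimator fits separate lines within a bandwidth on each side and takes the difference of their values at the cutoff (local linear regression). In the minimal sketch below the bandwidth is fixed by hand, whereas practical implementations choose it from the data. The simulated eligibility score and the helper ``rd_estimate`` are hypothetical.

.. code-block:: python

    import numpy as np


    def rd_estimate(running, outcome, cutoff, bandwidth):
        """Sharp RD: difference of local linear fits evaluated at the cutoff."""
        x = running - cutoff                        # center so the cutoff sits at 0
        left = (x < 0) & (x >= -bandwidth)
        right = (x >= 0) & (x <= bandwidth)
        # np.polyfit returns [slope, intercept]; the intercept is the fit at x = 0
        _, b_left = np.polyfit(x[left], outcome[left], 1)
        _, b_right = np.polyfit(x[right], outcome[right], 1)
        return float(b_right - b_left)


    # Simulated data: eligibility at score >= 70 raises test scores by 5 points
    rng = np.random.default_rng(2)
    score = rng.uniform(40, 100, size=2000)         # running variable
    eligible = (score >= 70).astype(float)
    test_score = 50 + 0.3 * score + 5.0 * eligible + rng.normal(size=2000)
    estimate = rd_estimate(score, test_score, cutoff=70, bandwidth=10)
    print(f"estimated jump: {estimate:.2f}")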

Working with Results
--------------------

Extracting Key Information
~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

    # Get the main causal effect estimate
    effect = result['results']['results']['effect_estimate']
    se = result['results']['results']['standard_error']
    ci = result['results']['results'].get('confidence_interval', None)

    # Check statistical significance
    p_value = result['results']['results']['p_value']
    is_significant = p_value < 0.05

    # Get variable information
    variables = result['results']['variables']
    treatment = variables['treatment_variable']
    outcome = variables['outcome_variable']
    covariates = variables.get('covariates', [])
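
To cross-check a reported interval or p-value from ``effect_estimate`` and ``standard_error``, a normal (Wald) approximation gives both. This assumes an approximately normal sampling distribution; CAIS may compute intervals differently for some methods, so treat the result as a rough check. ``wald_summary`` is a hypothetical helper built on the standard library's ``statistics.NormalDist``.

.. code-block:: python

    from statistics import NormalDist


    def wald_summary(estimate: float, std_error: float, level: float = 0.95):
        """Normal-approximation confidence interval and two-sided p-value."""
        normal = NormalDist()                       # standard normal
        z_crit = normal.inv_cdf(0.5 + level / 2)    # about 1.96 for a 95% interval
        half_width = z_crit * std_error
        z = estimate / std_error
        p_value = 2 * (1 - normal.cdf(abs(z)))
        return (estimate - half_width, estimate + half_width), p_value


    ci, p = wald_summary(3.0, 1.0)
    print(f"95% CI: ({ci[0]:.2f}, {ci[1]:.2f}), p = {p:.4f}")
    # 95% CI: (1.04, 4.96), p = 0.0027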


Interpreting Diagnostics
~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

    # Access diagnostic information (stored next to 'results' and 'variables')
    diagnostics = result['results'].get('diagnostics', {})
    
    # For propensity score methods
    if 'balance_statistics' in diagnostics:
        balance = diagnostics['balance_statistics']
        print("Covariate balance after matching:")
        for var, stats in balance.items():
            print(f"  {var}: standardized difference = {stats['std_diff']:.3f}")
    
    # For IV methods  
    if 'first_stage_f_stat' in diagnostics:
        f_stat = diagnostics['first_stage_f_stat']
        print(f"First stage F-statistic: {f_stat:.2f}")
        if f_stat < 10:
            print("Warning: Weak instrument (F < 10)")
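
The balance check above prints a standardized difference for each covariate. A common definition, assumed here (CAIS's ``std_diff`` may be computed differently), is the difference in group means divided by the square root of the average of the two group variances. Absolute values below about 0.1 are often read as adequate balance. ``standardized_difference`` is a hypothetical helper.

.. code-block:: python

    import numpy as np


    def standardized_difference(x: np.ndarray, t: np.ndarray) -> float:
        """(mean_treated - mean_control) / sqrt((var_treated + var_control) / 2)."""
        x_t, x_c = x[t == 1], x[t == 0]
        pooled_sd = np.sqrt((x_t.var(ddof=1) + x_c.var(ddof=1)) / 2)
        return float((x_t.mean() - x_c.mean()) / pooled_sd)


    # Toy covariate: treated units are on average 5 years older
    age = np.array([25.0, 30.0, 35.0, 20.0, 25.0, 30.0])
    treat = np.array([1, 1, 1, 0, 0, 0])
    print(round(standardized_difference(age, treat), 3))  # 1.0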

Error Handling
--------------

When an analysis cannot be completed, the returned dictionary contains an ``'error'`` entry describing the problem. Check for it before reading any estimates:

.. code-block:: python

    result = run_causal_analysis(
        query="What is the effect of X on Y?",
        dataset_path="data/problematic_data.csv"
    )
    
    # Check for errors
    if 'error' in result:
        print(f"Analysis failed: {result['error']}")
    else:
        # Process successful results
        effect = result['results']['results']['effect_estimate']
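
In longer scripts it can be cleaner to turn an error entry into an exception, so a failure stops the pipeline instead of surfacing later as a ``KeyError``. A minimal sketch, assuming failures are reported under an ``'error'`` key as in the example above; ``CausalAnalysisError`` and ``effect_or_raise`` are hypothetical names.

.. code-block:: python

    class CausalAnalysisError(RuntimeError):
        """Raised when a result dictionary reports an error (hypothetical)."""


    def effect_or_raise(result: dict) -> float:
        """Return the effect estimate, or raise if the analysis reported an error."""
        if "error" in result:
            raise CausalAnalysisError(result["error"])
        try:
            return result["results"]["results"]["effect_estimate"]
        except KeyError as exc:
            raise CausalAnalysisError(f"result is missing expected key {exc}") from exc


    try:
        effect_or_raise({"error": "example failure message"})
    except CausalAnalysisError as err:
        print(f"Analysis failed: {err}")  # Analysis failed: example failure message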

Common Issues and Solutions
~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Missing Variables**
    If CAIS can't identify treatment or outcome variables, be more specific in your query:
    
    .. code-block:: python
    
        # Too vague: "What causes what?"
        # Specific: names the treatment (education) and the outcome (income)
        query = "What is the effect of education on income?"

**Data Format Issues**
    Make sure the CSV has a header row with column names and that numeric variables are stored as numbers rather than text:
    
    .. code-block:: python
    
        import pandas as pd
        df = pd.read_csv("data.csv")
        print(df.dtypes)  # Check variable types
        print(df.head())  # Check data format

**Method Selection Issues**
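    If the automatically selected method does not suit your study design, read the ``explanation`` field of the result, which describes why particular methods were chosen or rejected. Stating design details in the query or dataset description (for example, that treatment was randomized, or which variable is an instrument) gives the selection step more to work with.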

Best Practices
--------------

Data Preparation
~~~~~~~~~~~~~~~~

1. **Clean Variable Names**: Use descriptive, consistent variable names
2. **Handle Missing Data**: Address missing values before analysis
3. **Check Data Types**: Ensure treatment variables are properly coded (0/1 for binary)
4. **Document Your Data**: Provide clear dataset descriptions
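
Parts of this checklist can be automated with pandas. The sketch below coerces numeric columns (turning stray text into missing values), drops rows missing any analysis variable, and recodes a two-valued treatment to 0/1. Dropping incomplete rows is only one option and can bias estimates when data are not missing at random. The column names and the helper ``prepare_for_analysis`` are illustrative.

.. code-block:: python

    import pandas as pd


    def prepare_for_analysis(df, treatment, numeric_cols, treated_value):
        """Coerce numeric columns, drop incomplete rows, and code treatment as 0/1."""
        out = df.copy()
        for col in numeric_cols:
            # Entries such as "n/a" become NaN instead of raising an error
            out[col] = pd.to_numeric(out[col], errors="coerce")
        # Complete-case analysis: drop rows missing the treatment or any numeric column
        out = out.dropna(subset=[treatment, *numeric_cols])
        values = set(out[treatment].unique())
        if len(values) != 2 or treated_value not in values:
            raise ValueError(f"{treatment!r} must take exactly two values, one of them {treated_value!r}")
        # assign() returns a new frame with the treatment recoded to 1 (treated) / 0
        return out.assign(**{treatment: (out[treatment] == treated_value).astype(int)})


    raw = pd.DataFrame({
        "program": ["yes", "no", "yes", "no", "no"],
        "age": ["23", "31", "n/a", "40", "29"],
        "earnings": [1200, 800, 1500, None, 950],
    })
    clean = prepare_for_analysis(raw, "program", ["age", "earnings"], treated_value="yes")
    print(clean)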

Query Formulation
~~~~~~~~~~~~~~~~~

1. **Be Specific**: Clearly state treatment and outcome variables
2. **Use Causal Language**: Frame questions in terms of effects and causation
3. **Provide Context**: Include relevant background information in dataset descriptions

Result Interpretation
~~~~~~~~~~~~~~~~~~~~

1. **Check Assumptions**: Review diagnostic tests and assumption checks
2. **Consider Effect Size**: Look beyond statistical significance to practical significance
3. **Validate Results**: Compare with domain knowledge and alternative methods
4. **Document Decisions**: Keep track of analysis choices for reproducibility
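
For the last point, one lightweight approach is to save the query, dataset path, notes, and full result dictionary to a JSON file next to your analysis. The sketch below uses only the standard library. ``default=str`` converts values JSON cannot encode natively (for example NumPy integers or ``datetime`` objects) to strings, so the file is meant for review rather than exact round-tripping. ``save_analysis_record`` is a hypothetical helper.

.. code-block:: python

    import json
    from datetime import datetime, timezone
    from pathlib import Path


    def save_analysis_record(path, query, dataset_path, result, notes=""):
        """Write the inputs, a UTC timestamp, and the full result to a JSON file."""
        record = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "query": query,
            "dataset_path": dataset_path,
            "notes": notes,
            "result": result,
        }
        Path(path).write_text(json.dumps(record, indent=2, default=str), encoding="utf-8")
        return record


    # Example, with `result` returned by run_causal_analysis:
    # save_analysis_record("analysis_log.json", query, "data/lalonde_data.csv", result,
    #                      notes="Dropped rows with missing earnings.")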

Next Steps
----------

- For more advanced features and customization options, see :doc:`advanced_usage`
- To process multiple datasets efficiently, see :doc:`batch_processing`  
- For LLM provider setup and configuration, see :doc:`configuration`
- For detailed method documentation, see :doc:`../methods/index`