Quickstart Tutorial

Get up and running with Causal Agent in under 10 minutes! This tutorial will walk you through your first causal analysis using the Causal AI Scientist.

Overview

In this quickstart, you’ll learn how to:

  1. Set up Causal Agent with your API key

  2. Load a sample dataset

  3. Run your first causal analysis

  4. Interpret the results

Prerequisites

Before starting, make sure you have:

  • Causal Agent installed (see Installation Guide)

  • An OpenAI API key (or other supported LLM provider)

  • Basic familiarity with Python

Step 1: Setup and Configuration

First, let’s set up your environment and API key:

import os
from causal_agent import run_causal_analysis

# Set your API key (replace with your actual key)
os.environ['OPENAI_API_KEY'] = 'your-openai-api-key-here'

# Alternatively, create a .env file with your API key
# OPENAI_API_KEY=your-openai-api-key-here
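If you take the .env route, the file still has to be read into the environment before the analysis runs. The sketch below shows one way to do that with only the standard library. The helper name load_env_file is hypothetical and not part of Causal Agent, and it assumes simple KEY=VALUE lines with no multi-line values or variable expansion.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Copy KEY=VALUE pairs from a .env-style file into os.environ.

    Variables that are already set are left untouched. Returns the names
    of the variables this call added.
    """
    env_path = Path(path)
    if not env_path.is_file():
        return []

    loaded = []
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip().strip("'\"")  # drop optional surrounding quotes
        if key and key not in os.environ:
            os.environ[key] = value
            loaded.append(key)
    return loaded


loaded_names = load_env_file()  # looks for ./.env; returns [] if it is absent
print("Loaded from .env:", loaded_names)
print("API key set:", "OPENAI_API_KEY" in os.environ)
```

Keeping the key in a .env file that stays out of version control avoids hardcoding it in a script or notebook.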

Step 2: Prepare Your Data

For this tutorial, we’ll use a sample dataset about job training programs. You can load the sample through Causal Agent or use your own data:

import pandas as pd

# Option 1: Use Causal Agent sample data
from causal_agent.synthetic import load_sample_data

# Load a sample job training dataset
data = load_sample_data('job_training')
data.to_csv('job_training_data.csv', index=False)

# Option 2: Use your own data
# data = pd.read_csv('your_dataset.csv')

Sample Data Structure:

# Let's examine the data structure
print(data.head())
print(f"Dataset shape: {data.shape}")
print(f"Columns: {list(data.columns)}")

Expected output (the data.head() preview is omitted here):

Dataset shape: (1000, 8)
Columns: ['participant_id', 'job_training', 'age', 'education', 'prior_income', 'post_income', 'employment_status', 'region']
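Before running an analysis, it can help to confirm that the treatment column takes exactly two values and that the treatment and outcome columns have no gaps. The following is a minimal sketch using only the standard library. The function name check_treatment_and_missing is hypothetical, and the rows are made up for illustration; they are not the real sample data.

```python
import csv
import io


def check_treatment_and_missing(csv_file, treatment_col, outcome_col):
    """Count rows per treatment value and rows missing treatment or outcome."""
    reader = csv.DictReader(csv_file)
    header = reader.fieldnames or []
    absent = [c for c in (treatment_col, outcome_col) if c not in header]
    if absent:
        raise KeyError(f"Missing expected column(s): {absent}")

    group_sizes = {}
    rows_with_missing = 0
    for row in reader:
        treatment = (row[treatment_col] or "").strip()
        outcome = (row[outcome_col] or "").strip()
        if not treatment or not outcome:
            rows_with_missing += 1
            continue
        group_sizes[treatment] = group_sizes.get(treatment, 0) + 1

    return {
        "group_sizes": group_sizes,
        "rows_with_missing_values": rows_with_missing,
        "is_binary_treatment": len(group_sizes) == 2,
    }


# Tiny made-up rows for illustration only; not the real sample data.
toy_csv = io.StringIO(
    "job_training,post_income\n"
    "1,31000\n"
    "0,27500\n"
    "1,\n"
    "0,29000\n"
)
report = check_treatment_and_missing(toy_csv, "job_training", "post_income")
print(report)
```

To run the same check on the file written in this step, open it with open('job_training_data.csv', newline='') and pass the file object instead of toy_csv.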

Step 3: Run Your First Analysis

Now let’s run a causal analysis to answer: “Does participating in job training increase income?”

# Run causal analysis
result = run_causal_analysis(
    query="Does participating in job training increase income?",
    dataset_path="job_training_data.csv",
    dataset_description="""
    This dataset contains information about individuals who may or may not have
    participated in a job training program. It includes demographic information
    (age, education), employment history (prior_income, employment_status),
    treatment status (job_training), outcome (post_income), and geographic
    information (region).
    """
)

print("Analysis complete!")

Step 4: Understanding the Results

Causal Agent returns a nested result dictionary. Let’s explore what it contains:

# Print the main results
print("=== CAUSAL ANALYSIS RESULTS ===")
print(f"Query: {result['query']}")
print(f"Method Used: {result['results']['results']['method_used']}")
print(f"Treatment Variable: {result['results']['variables']['treatment_variable']}")
print(f"Outcome Variable: {result['results']['variables']['outcome_variable']}")
print(f"Causal Effect: {result['results']['results']['effect_estimate']}")
print(f"Standard Error: {result['results']['results']['standard_error']}")
print(f"P-value: {result['results']['results']['p_value']}")

# Print the interpretation
print("\n=== INTERPRETATION ===")
print(result['explanation']['final_explanation_text'])

Sample Output:

=== CAUSAL ANALYSIS RESULTS ===
Query: Does participating in job training increase income?
Method Used: Propensity Score Matching
Treatment Variable: job_training
Outcome Variable: post_income
Causal Effect: 2847.32
Standard Error: 423.18
P-value: 0.001

=== INTERPRETATION ===
The analysis suggests that participating in job training increases income by
approximately $2,847 on average. This effect is statistically significant
(p < 0.05), indicating that job training has a positive causal impact on
post-training income levels.
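An effect estimate and its standard error can also be turned into an approximate confidence interval. The sketch below assumes the estimator is roughly normally distributed, which is an assumption made here and not necessarily how Causal Agent computes uncertainty. The function name normal_confidence_interval is hypothetical.

```python
from statistics import NormalDist


def normal_confidence_interval(estimate, standard_error, level=0.95):
    """Return (lower, upper) for a two-sided normal-approximation interval."""
    if standard_error < 0:
        raise ValueError("standard_error must be non-negative")
    if not 0 < level < 1:
        raise ValueError("level must be strictly between 0 and 1")
    # Critical value of the standard normal, about 1.96 for a 95% level.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    margin = z * standard_error
    return estimate - margin, estimate + margin


# Effect and standard error copied from the sample output above.
lower, upper = normal_confidence_interval(2847.32, 423.18)
print(f"95% CI: [{lower:.2f}, {upper:.2f}]")
```

Under this approximation, the sample estimate corresponds to an interval of roughly $2,018 to $3,677, which excludes zero.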

Step 5: Exploring Different Queries

Try different causal questions with the same dataset:

# Different causal questions
queries = [
    "What is the effect of education level on income?",
    "Does age affect the likelihood of participating in job training?",
    "How does region influence employment outcomes?"
]

for query in queries:
    print(f"\n--- Analyzing: {query} ---")
    result = run_causal_analysis(
        query=query,
        dataset_path="job_training_data.csv",
        dataset_description="Job training dataset with demographic and outcome variables"
    )

    print(f"Method: {result['results']['results']['method_used']}")
    print(f"Effect: {result['results']['results']['effect_estimate']}")
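When looping over several queries, it can be convenient to collect one summary row per query and to keep going if a single query fails. The sketch below does that with two hypothetical helpers, get_nested and summarize_queries. The key paths follow the nested layout printed in Step 4, and fake_analyze is a stand-in for run_causal_analysis so the example runs without an API key.

```python
def get_nested(mapping, path, default=None):
    """Follow a sequence of keys through nested dicts; return default on a miss."""
    current = mapping
    for key in path:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


def summarize_queries(queries, analyze):
    """Call analyze(query) for each query and collect one summary row each."""
    rows = []
    for query in queries:
        try:
            result = analyze(query)
        except Exception as exc:  # record the failure and continue
            rows.append({"query": query, "method": None,
                         "effect": None, "error": str(exc)})
            continue
        rows.append({
            "query": query,
            "method": get_nested(result, ["results", "results", "method_used"]),
            "effect": get_nested(result, ["results", "results", "effect_estimate"]),
            "error": None,
        })
    return rows


# Stand-in for run_causal_analysis; returns placeholder values, not real results.
def fake_analyze(query):
    if "region" in query:
        raise RuntimeError("simulated failure")
    return {"results": {"results": {"method_used": "placeholder",
                                    "effect_estimate": 0.0}}}


for row in summarize_queries(
    ["What is the effect of education level on income?",
     "How does region influence employment outcomes?"],
    fake_analyze,
):
    print(row)
```

With the real function, analyze could be a small wrapper such as lambda q: run_causal_analysis(query=q, dataset_path="job_training_data.csv", dataset_description="...").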

Step 6: Working with Your Own Data

To analyze your own dataset, follow this template:

# Template for your own analysis
your_result = run_causal_analysis(
    query="Your causal question here",
    dataset_path="path/to/your/data.csv",
    dataset_description="""
    Describe your dataset here:
    - What does each row represent?
    - What are the key variables?
    - What is the context/domain?
    - Any important data collection details?
    """
)

# Examine results
# Key paths follow the same nested layout used in Step 4
print(f"Method selected: {your_result['results']['results']['method_used']}")
print(f"Treatment: {your_result['results']['variables']['treatment_variable']}")
print(f"Outcome: {your_result['results']['variables']['outcome_variable']}")
print(f"Effect: {your_result['results']['results']['effect_estimate']}")
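The questions in the template's dataset_description can be answered in a structured way and then joined into a single string. The sketch below shows one possible approach. The helper build_dataset_description is hypothetical and not part of Causal Agent, and the variable notes are taken from the job training description in Step 3.

```python
def build_dataset_description(row_unit, variables, context="", collection_notes=""):
    """Assemble a plain-text dataset description from structured notes."""
    lines = [f"Each row represents {row_unit}."]
    if variables:
        lines.append("Key variables:")
        for name, meaning in variables.items():
            lines.append(f"- {name}: {meaning}")
    if context:
        lines.append(f"Context: {context}")
    if collection_notes:
        lines.append(f"Data collection: {collection_notes}")
    return "\n".join(lines)


description = build_dataset_description(
    row_unit="one individual",
    variables={
        "job_training": "treatment status",
        "post_income": "outcome",
        "age": "demographic information",
    },
    context="job training program",
)
print(description)
```

The resulting string can be passed as the dataset_description argument in the template above.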

Common Use Cases

Here are some example queries you can try with different types of data:

Education Research:

query = "Does class size reduction improve student test scores?"
# Dataset should have: class_size, test_scores, student demographics

Healthcare:

query = "What is the effect of a new treatment on patient recovery time?"
# Dataset should have: treatment_received, recovery_days, patient characteristics

Economics:

query = "Does a minimum wage increase affect employment rates?"
# Dataset should have: min_wage_policy, employment_rate, regional controls

Marketing:

query = "How does email marketing affect customer purchase behavior?"
# Dataset should have: email_received, purchase_amount, customer demographics
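Each use case above names the columns its dataset should contain. A quick header check can catch a missing column before any analysis is attempted. The sketch below is one way to do it with the standard library. The mapping USE_CASE_COLUMNS and the function missing_columns are hypothetical names, and only the explicitly named columns are checked because the demographic and control columns are not spelled out.

```python
import csv
import io

# Column names taken from the comments in the use cases above.
USE_CASE_COLUMNS = {
    "education": ["class_size", "test_scores"],
    "healthcare": ["treatment_received", "recovery_days"],
    "economics": ["min_wage_policy", "employment_rate"],
    "marketing": ["email_received", "purchase_amount"],
}


def missing_columns(csv_file, required):
    """Return the required column names that are absent from the CSV header."""
    header = next(csv.reader(csv_file), [])
    present = {name.strip() for name in header}
    return [name for name in required if name not in present]


# Header-only example; the customer_age column name is hypothetical.
header_only = io.StringIO("email_received,customer_age\n")
print(missing_columns(header_only, USE_CASE_COLUMNS["marketing"]))
# → ['purchase_amount']
```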

Understanding Method Selection

Causal Agent automatically selects the most appropriate causal inference method based on your data characteristics:

# Check what method was selected and why
print(f"Selected Method: {result['results']['results']['method_used']}")
print(f"Method Reasoning: {result['results'].get('method_reasoning', 'Not available')}")

# Common methods Causal Agent might select:
# - Randomized Controlled Trial (RCT) analysis
# - Propensity Score Matching
# - Difference-in-Differences (DiD)
# - Instrumental Variables (IV)
# - Regression Discontinuity Design (RDD)
# - Linear Regression with controls
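To build intuition for why one method might be chosen over another, the sketch below maps a few coarse study-design flags to the methods listed above, following common textbook guidance. It is purely illustrative: the function suggest_method and its flags are hypothetical, and this is not Causal Agent's actual selection logic.

```python
def suggest_method(randomized=False, has_instrument=False, has_cutoff=False,
                   has_pre_post_periods=False, has_covariates=False):
    """Map coarse study-design flags to candidate method names (illustrative)."""
    if randomized:
        # Random assignment removes confounding by design.
        return ["Randomized Controlled Trial (RCT) analysis"]
    if has_instrument:
        # A variable that shifts treatment but affects the outcome only through it.
        return ["Instrumental Variables (IV)"]
    if has_cutoff:
        # Treatment is assigned by a threshold on a running variable.
        return ["Regression Discontinuity Design (RDD)"]
    if has_pre_post_periods:
        # Treated and control groups are observed before and after treatment.
        return ["Difference-in-Differences (DiD)"]
    if has_covariates:
        # Observational data with measured confounders.
        return ["Propensity Score Matching", "Linear Regression with controls"]
    return []


# Flags set by hand for an observational dataset with measured covariates.
print(suggest_method(has_covariates=True))
```

The order of the checks is a design choice made for this sketch: stronger identification strategies are tried first, and adjustment on observed covariates is the fallback.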

Next Steps

Congratulations! You’ve completed your first causal analysis with Causal Agent. Here’s what to explore next:

Immediate Next Steps:

  1. Try the detailed tutorial: Your First Causal Analysis - Learn more about interpreting results

  2. Explore different datasets: Use the sample datasets in the data/ directory

  3. Learn about methods: Causal Inference Methods - Understand when each method is used

Advanced Usage:

  1. User Guide - Advanced configuration and batch processing

  2. Tutorials & Examples - Domain-specific examples and case studies

  3. API Reference - Complete function documentation

Troubleshooting Quick Fixes

API Key Issues:

# Verify your API key is set
import os
print("API Key set:", "OPENAI_API_KEY" in os.environ)

Import Errors:

# Upgrade or reinstall the package (run this in a terminal, not in Python)
pip install --upgrade causal-agent

Data Format Issues:

import pandas as pd

# Ensure your data is in CSV format with proper headers
data = pd.read_csv('your_data.csv')
print(data.dtypes)  # Check data types
print(data.isnull().sum())  # Check for missing values

Ready for more? Continue to Your First Causal Analysis for a deeper dive into causal analysis concepts!