Quickstart Tutorial

Get up and running with Causal Agent in under 10 minutes! This tutorial will walk you through your first causal analysis using the Causal AI Scientist.

Overview 

In this quickstart, you’ll learn how to:

Set up Causal Agent with your API key
Load a sample dataset
Run your first causal analysis
Interpret the results

Prerequisites 

Before starting, make sure you have:

Causal Agent installed (see Installation Guide)
An OpenAI API key (or other supported LLM provider)
Basic familiarity with Python

Step 1: Setup and Configuration 

First, let’s set up your environment and API key:

import os
from causal_agent import run_causal_analysis

# Set your API key (replace with your actual key)
os.environ['OPENAI_API_KEY'] = 'your-openai-api-key-here'

# Alternatively, create a .env file with your API key
# OPENAI_API_KEY=your-openai-api-key-here
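If you rely on the environment variable instead of hardcoding the key, it can help to fail fast before any LLM call is made. The sketch below is a hypothetical helper (require_api_key is not part of Causal Agent). It raises a clear error when the variable is missing, empty, or still set to the placeholder value used above.

```python
import os


def require_api_key(var_name: str = "OPENAI_API_KEY",
                    placeholder: str = "your-openai-api-key-here") -> str:
    """Return the API key stored in `var_name`, or raise if it is unusable.

    Hypothetical helper for illustration; not part of the causal_agent package.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"{var_name} is not set; export it or add it to your .env file")
    if key == placeholder:
        raise RuntimeError(f"{var_name} still holds the placeholder value; replace it with your real key")
    return key
```

Calling require_api_key() just before run_causal_analysis(...) turns a confusing authentication failure deep inside the analysis into an immediate, readable error.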

Step 2: Prepare Your Data 

For this tutorial, we’ll use a sample dataset about job training programs. You can load the sample data provided by Causal Agent or use your own dataset:

import pandas as pd

# Option 1: Use Causal Agent sample data
from causal_agent.synthetic import load_sample_data

# Load a sample job training dataset
data = load_sample_data('job_training')
data.to_csv('job_training_data.csv', index=False)

# Option 2: Use your own data
# data = pd.read_csv('your_dataset.csv')

Sample Data Structure:

# Let's examine the data structure
print(data.head())
print(f"Dataset shape: {data.shape}")
print(f"Columns: {list(data.columns)}")

Expected output (the data.head() preview is omitted):

Dataset shape: (1000, 8)
Columns: ['participant_id', 'job_training', 'age', 'education', 'prior_income', 'post_income', 'employment_status', 'region']
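Because the analysis relies on the CSV headers, a quick pre-flight check that the columns named in your query and description actually exist can save a failed LLM call. This is a minimal standard-library sketch, assuming a comma-separated file with a header row; check_columns is a hypothetical helper, not part of Causal Agent.

```python
import csv


def check_columns(csv_path: str, required: list[str]) -> list[str]:
    """Return the required column names that are missing from the CSV header.

    Hypothetical pre-flight helper; assumes a comma-separated file with a header row.
    """
    with open(csv_path, newline="") as f:
        header = next(csv.reader(f), [])  # first row, or [] for an empty file
    present = {name.strip() for name in header}
    return [col for col in required if col not in present]
```

For the tutorial data, check_columns("job_training_data.csv", ["job_training", "post_income"]) returns an empty list when both the treatment and outcome columns are present; any names it returns are missing from the file.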

Step 3: Run Your First Analysis 

Now let’s run a causal analysis to answer: “Does participating in job training increase income?”

# Run causal analysis
result = run_causal_analysis(
    query="Does participating in job training increase income?",
    dataset_path="job_training_data.csv",
    dataset_description="""
    This dataset contains information about individuals who may or may not have
    participated in a job training program. It includes demographic information
    (age, education), employment history (prior_income, employment_status),
    treatment status (job_training), outcome (post_income), and geographic
    information (region).
    """
)

print("Analysis complete!")

Step 4: Understanding the Results 

run_causal_analysis returns a nested dictionary holding the original query, the estimation results, and a plain-language explanation. Let’s explore what it contains:

# Print the main results
print("=== CAUSAL ANALYSIS RESULTS ===")
print(f"Query: {result['query']}")
print(f"Method Used: {result['results']['results']['method_used']}")
print(f"Treatment Variable: {result['results']['variables']['treatment_variable']}")
print(f"Outcome Variable: {result['results']['variables']['outcome_variable']}")
print(f"Causal Effect: {result['results']['results']['effect_estimate']}")
print(f"Standard Error: {result['results']['results']['standard_error']}")
print(f"P-value: {result['results']['results']['p_value']}")

# Print the interpretation
print("\n=== INTERPRETATION ===")
print(result['explanation']['final_explanation_text'])
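Chained lookups such as result['results']['results']['p_value'] raise KeyError if a field is absent, for instance when a method reports no standard error. A small, hypothetical helper (get_field is not part of Causal Agent) can read a dotted key path with a fallback instead. The paths used in the test mirror the snippet above; treat them as assumptions about the result layout rather than a documented schema.

```python
from typing import Any, Mapping


def get_field(result: Mapping[str, Any], path: str, default: Any = None) -> Any:
    """Follow a dotted key path (e.g. "results.results.p_value") through nested dicts.

    Returns `default` as soon as a key is missing or a non-dict value is reached.
    Hypothetical helper; not part of causal_agent.
    """
    current: Any = result
    for key in path.split("."):
        if not isinstance(current, Mapping) or key not in current:
            return default
        current = current[key]
    return current
```

For example, print(get_field(result, "results.results.p_value", "n/a")) prints "n/a" instead of crashing when no p-value was reported.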

Sample Output:

=== CAUSAL ANALYSIS RESULTS ===
Query: Does participating in job training increase income?
Method Used: Propensity Score Matching
Treatment Variable: job_training
Outcome Variable: post_income
Causal Effect: 2847.32
Standard Error: 423.18
P-value: 0.001

=== INTERPRETATION ===
The analysis suggests that participating in job training increases income by
approximately $2,847 on average. This effect is statistically significant
(p < 0.05), indicating that job training has a positive causal impact on
post-training income levels.
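To see how the reported numbers relate, you can turn the effect estimate and standard error into a z-statistic and an approximate 95% confidence interval. This sketch assumes a normal (large-sample) approximation; Causal Agent may compute its p-value differently, so treat it as a sanity check rather than a reproduction of the tool's output.

```python
import math


def normal_summary(effect: float, std_error: float, z_crit: float = 1.96) -> dict:
    """Approximate z-statistic, two-sided p-value, and 95% CI under a normal approximation."""
    z = effect / std_error
    # Two-sided p-value for a standard normal: P(|Z| >= |z|) = erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    half_width = z_crit * std_error
    return {
        "z": z,
        "p_value": p_value,
        "ci_low": effect - half_width,
        "ci_high": effect + half_width,
    }


summary = normal_summary(2847.32, 423.18)
print(f"z = {summary['z']:.2f}, 95% CI = [{summary['ci_low']:.0f}, {summary['ci_high']:.0f}]")
```

With the sample numbers above this prints z = 6.73, 95% CI = [2018, 3677]. The interval excludes zero, which is consistent with the significance claim in the interpretation.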

Step 5: Exploring Different Queries 

Try different causal questions with the same dataset:

# Different causal questions
queries = [
    "What is the effect of education level on income?",
    "Does age affect the likelihood of participating in job training?",
    "How does region influence employment outcomes?"
]

for query in queries:
    print(f"\n--- Analyzing: {query} ---")
    result = run_causal_analysis(
        query=query,
        dataset_path="job_training_data.csv",
        dataset_description="Job training dataset with demographic and outcome variables"
    )

    print(f"Method: {result['results']['results']['method_used']}")
    print(f"Effect: {result['results']['results']['effect_estimate']}")
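When looping over several questions, one failed call (for example a query the data cannot support, or an API timeout) stops the whole loop. The sketch below is a hypothetical wrapper, not part of Causal Agent. It takes the analysis function as a parameter, records failures instead of raising, and collects the method and effect for each query. The nested key paths are the same assumptions as in the loop above.

```python
from typing import Any, Callable, Dict, List


def run_queries(queries: List[str],
                analyze: Callable[..., Dict[str, Any]],
                dataset_path: str,
                dataset_description: str) -> List[Dict[str, Any]]:
    """Run each query through `analyze` and return one summary row per query.

    Errors are captured per query so that one failure does not abort the batch.
    """
    rows = []
    for query in queries:
        row: Dict[str, Any] = {"query": query, "method": None, "effect": None, "error": None}
        try:
            result = analyze(query=query,
                             dataset_path=dataset_path,
                             dataset_description=dataset_description)
            estimates = result.get("results", {}).get("results", {})
            row["method"] = estimates.get("method_used")
            row["effect"] = estimates.get("effect_estimate")
        except Exception as exc:  # keep going; record what went wrong
            row["error"] = f"{type(exc).__name__}: {exc}"
        rows.append(row)
    return rows
```

In real use you would call run_queries(queries, run_causal_analysis, "job_training_data.csv", "Job training dataset with demographic and outcome variables") and inspect the error field of any row whose method is None. Passing the analysis function in also makes the wrapper easy to test with a stand-in.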

Step 6: Working with Your Own Data 

To analyze your own dataset, follow this template:

# Template for your own analysis
your_result = run_causal_analysis(
    query="Your causal question here",
    dataset_path="path/to/your/data.csv",
    dataset_description="""
    Describe your dataset here:
    - What does each row represent?
    - What are the key variables?
    - What is the context/domain?
    - Any important data collection details?
    """
)

# Examine results (same key layout as in Step 4)
print(f"Method selected: {your_result['results']['results']['method_used']}")
print(f"Treatment: {your_result['results']['variables']['treatment_variable']}")
print(f"Outcome: {your_result['results']['variables']['outcome_variable']}")
print(f"Effect: {your_result['results']['results']['effect_estimate']}")
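The four prompts in the template above can also be filled in programmatically, which keeps descriptions consistent across datasets. This is only a hypothetical formatting helper (build_description is not part of Causal Agent). It produces the plain-text string you pass as dataset_description.

```python
def build_description(row_meaning: str,
                      key_variables: dict[str, str],
                      context: str,
                      collection_notes: str = "") -> str:
    """Assemble a plain-text dataset description from the template's four prompts."""
    lines = [f"Each row represents {row_meaning}.", "Key variables:"]
    for name, meaning in key_variables.items():
        lines.append(f"- {name}: {meaning}")
    lines.append(f"Context: {context}")
    if collection_notes:  # optional fourth prompt
        lines.append(f"Data collection: {collection_notes}")
    return "\n".join(lines)


description = build_description(
    row_meaning="one participant in a job training program",
    key_variables={
        "job_training": "treatment indicator for program participation",
        "post_income": "outcome, income after the program",
    },
    context="labor economics program evaluation",
)
print(description)
```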

Common Use Cases 

Here are some example queries you can try with different types of data:

Education Research:

query = "Does class size reduction improve student test scores?"
# Dataset should have: class_size, test_scores, student demographics

Healthcare:

query = "What is the effect of a new treatment on patient recovery time?"
# Dataset should have: treatment_received, recovery_days, patient characteristics

Economics:

query = "Does a minimum wage increase affect employment rates?"
# Dataset should have: min_wage_policy, employment_rate, regional controls

Marketing:

query = "How does email marketing affect customer purchase behavior?"
# Dataset should have: email_received, purchase_amount, customer demographics

Understanding Method Selection 

Causal Agent automatically selects the most appropriate causal inference method based on your data characteristics:

# Check which method was selected for the most recent result, and why
print(f"Selected Method: {result['results']['results']['method_used']}")
print(f"Method Reasoning: {result['results'].get('method_reasoning', 'Not available')}")

# Common methods Causal Agent might select:
# - Randomized Controlled Trial (RCT) analysis
# - Propensity Score Matching
# - Difference-in-Differences (DiD)
# - Instrumental Variables (IV)
# - Regression Discontinuity Design (RDD)
# - Linear Regression with controls

Next Steps 

Congratulations! You’ve completed your first causal analysis with Causal Agent. Here’s what to explore next:

Immediate Next Steps:

Try the detailed tutorial: Your First Causal Analysis - Learn more about interpreting results
Explore different datasets: Use the sample datasets in the data/ directory
Learn about methods: Causal Inference Methods - Understand when each method is used

Advanced Usage:

User Guide: User Guide - Advanced configuration and batch processing
Tutorials: Tutorials & Examples - Domain-specific examples and case studies
API Reference: API Reference - Complete function documentation

Getting Help:

Troubleshooting: See the troubleshooting section below
Community: Join our GitHub Discussions
Issues: Report bugs on GitHub Issues

Troubleshooting Quick Fixes 

API Key Issues:

# Verify your API key is set (and not empty)
import os
print("API Key set:", bool(os.environ.get("OPENAI_API_KEY")))

Import Errors:

# Reinstall if needed
pip install --upgrade causal-agent

Data Format Issues:

# Ensure your data is in CSV format with proper headers
import pandas as pd

data = pd.read_csv('your_data.csv')
print(data.dtypes)  # Check data types
print(data.isnull().sum())  # Check for missing values
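Propensity score matching, for example, is defined for a binary treatment, so it is worth confirming that your treatment column takes exactly two values. This is a minimal standard-library sketch. column_levels and looks_binary are hypothetical helpers that only report what they find and do not recode anything; values are compared as raw strings, so "1" and "1.0" count as different levels.

```python
import csv
from collections import Counter


def column_levels(csv_path: str, column: str) -> Counter:
    """Count the distinct non-empty values in one CSV column (values are read as strings)."""
    counts: Counter = Counter()
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            value = (row.get(column) or "").strip()
            if value:  # skip missing cells
                counts[value] += 1
    return counts


def looks_binary(levels: Counter) -> bool:
    """True only for exactly two distinct values; one level means nothing to compare."""
    return len(levels) == 2
```

For example, looks_binary(column_levels("your_data.csv", "job_training")) should be True before you ask a question that treats job_training as the treatment.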

Ready for more? Continue to Your First Causal Analysis for a deeper dive into causal analysis concepts!