causal_agent.components package

causal_agent.components.analyze_dataset(dataset_path, llm_client=None, dataset_description=None, original_query=None)[source]

Analyze a dataset to identify important characteristics for causal inference.

Parameters:

dataset_path (str) – Path to the dataset file
llm_client (langchain_core.language_models.BaseChatModel | None) – Optional LLM client for enhanced analysis
dataset_description (str | None) – Optional description of the dataset for context

Returns:

dataset_info: Basic information about the dataset
columns: List of column names
potential_treatments: List of potential treatment variables (possibly LLM augmented)
potential_outcomes: List of potential outcome variables (possibly LLM augmented)
temporal_structure_detected: Whether temporal structure was detected
panel_data_detected: Whether panel data structure was detected
potential_instruments_detected: Whether potential instruments were detected
discontinuities_detected: Whether discontinuities were detected
llm_augmentation: Status of LLM augmentation if used

Return type:

Dict containing dataset analysis results
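
The documented flags can be read straight from the returned dictionary. The following is a minimal sketch, not part of the package: `candidate_designs` and `FLAG_TO_DESIGN` are hypothetical names, the sample dictionary is made up, and the flag-to-design mapping is a common textbook association rather than the package's own selection logic.

```python
from typing import Any, Dict, List

# Hypothetical mapping from documented boolean flags to study designs.
FLAG_TO_DESIGN = {
    "panel_data_detected": "difference_in_differences",
    "potential_instruments_detected": "instrumental_variable",
    "discontinuities_detected": "regression_discontinuity",
}


def candidate_designs(analysis: Dict[str, Any]) -> List[str]:
    """Return the designs whose enabling flag is truthy in the analysis dict."""
    return [design for flag, design in FLAG_TO_DESIGN.items() if analysis.get(flag)]


# Made-up result shaped like the documented return value.
sample = {
    "columns": ["unit", "year", "treated", "wage"],
    "potential_treatments": ["treated"],
    "potential_outcomes": ["wage"],
    "temporal_structure_detected": True,
    "panel_data_detected": True,
    "potential_instruments_detected": False,
    "discontinuities_detected": False,
}
designs = candidate_designs(sample)
print(designs)
```

Missing keys are treated as "not detected", so the helper also works on a partial dictionary.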

causal_agent.components.interpret_query(query_info, dataset_analysis, dataset_description=None)[source]

Interpret query using hybrid heuristic/LLM approach to identify variables.

Parameters:

query_info (Dict[str, Any]) – Information extracted from the user’s query (text, hints).
dataset_analysis (Dict[str, Any]) – Information about the dataset structure (columns, types, etc.).
dataset_description (str | None) – Optional textual description of the dataset.
llm – Optional language model instance.

Returns:

Dict containing identified variables (treatment, outcome, covariates, etc., and is_rct).

Return type:

causal_agent.components.select_method(dataset_properties, excluded_methods=None)[source]

causal_agent.components.validate_method(method_info, dataset_analysis, variables)[source]

Validate the selected causal method against dataset characteristics.

Parameters:

method_info (Dict[str, Any]) – Information about the selected method from decision_tree
dataset_analysis (Dict[str, Any]) – Dataset analysis results from dataset_analyzer
variables (Dict[str, Any]) – Identified variables from query_interpreter

Returns:

valid: Boolean indicating if method is valid
concerns: List of concerns/issues with the selected method
alternative_suggestions: Alternative methods if the selected method is problematic
recommended_method: Updated method recommendation if issues are found

Return type:

Dict with validation results
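
A caller typically has to decide what to do when `valid` is false. The sketch below shows one possible policy using only the four documented keys; `resolve_method` is a hypothetical helper and the sample values are invented for illustration.

```python
from typing import Any, Dict


def resolve_method(selected: str, validation: Dict[str, Any]) -> str:
    """Keep the selected method when valid; otherwise fall back in order."""
    if validation.get("valid", False):
        return selected
    # Prefer the explicit recommendation when one is provided.
    recommended = validation.get("recommended_method")
    if recommended:
        return recommended
    # Otherwise take the first alternative, or keep the original selection.
    alternatives = validation.get("alternative_suggestions") or []
    return alternatives[0] if alternatives else selected


# Invented validation result using the documented keys.
validation = {
    "valid": False,
    "concerns": ["Too few pre-treatment periods"],
    "alternative_suggestions": ["instrumental_variable"],
    "recommended_method": None,
}
final_method = resolve_method("difference_in_differences", validation)
print(final_method)
```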

causal_agent.components.generate_explanation(method_info, validation_result, variables, results, dataset_analysis=None, dataset_description=None, llm=None)[source]

Generates a comprehensive explanation text for the causal analysis.

Parameters:

method_info (Dict[str, Any]) – Dictionary containing selected method details.
validation_result (Dict[str, Any]) – Dictionary containing method validation results.
variables (Dict[str, Any]) – Dictionary containing identified variables.
results (Dict[str, Any]) – Dictionary containing numerical results from the method execution.
dataset_analysis (Dict[str, Any] | None) – Optional dictionary with dataset analysis details.
dataset_description (str | None) – Optional string describing the dataset.
llm (langchain_core.language_models.BaseChatModel | None) – Optional language model instance (for potential future use in generation).

Returns:

Dictionary containing the final explanation text.

Return type:

Dict[str, str]

causal_agent.components.format_output(query, method, results, explanation, dataset_analysis=None, dataset_description=None)[source]

Format final results including numerical estimates and explanations.

Parameters:

query (str) – Original user query
method (str) – Causal inference method used (string name)
results (Dict[str, Any]) – Numerical results from method_executor_tool
explanation (Dict[str, Any]) – Structured explanation object from explainer_tool
dataset_analysis (Dict[str, Any] | None) – Optional dictionary of dataset analysis results
dataset_description (str | None) – Optional string description of the dataset

Returns:

Dict with formatted output fields ready for presentation.

Return type:

FormattedOutput

causal_agent.components.create_workflow_state_update(current_step, step_completed_flag, next_tool, next_step_reason, error=None)[source]

Create a standardized workflow state update dictionary.

Parameters:

current_step (str) – Current step in the workflow (e.g., “input_processing”)
step_completed_flag (bool) – Flag indicating which step was completed (e.g., “query_parsed”)
next_tool (str) – Name of the next tool to call
next_step_reason (str) – Reason message for the next step
error (str | None) – Optional error message if the step failed

Returns:

Dictionary containing the workflow_state sub-dictionary

Return type:

Submodules

causal_agent.components.dataset_analyzer module

Dataset analyzer component for causal inference.

This module provides functionality to analyze datasets to detect characteristics relevant for causal inference methods, including temporal structure, potential instrumental variables, discontinuities, and variable relationships.

causal_agent.components.dataset_analyzer.analyze_dataset(dataset_path, llm_client=None, dataset_description=None, original_query=None)[source]

Analyze a dataset to identify important characteristics for causal inference.

Parameters:

dataset_path (str) – Path to the dataset file
llm_client (langchain_core.language_models.BaseChatModel | None) – Optional LLM client for enhanced analysis
dataset_description (str | None) – Optional description of the dataset for context

Returns:

dataset_info: Basic information about the dataset
columns: List of column names
potential_treatments: List of potential treatment variables (possibly LLM augmented)
potential_outcomes: List of potential outcome variables (possibly LLM augmented)
temporal_structure_detected: Whether temporal structure was detected
panel_data_detected: Whether panel data structure was detected
potential_instruments_detected: Whether potential instruments were detected
discontinuities_detected: Whether discontinuities were detected
llm_augmentation: Status of LLM augmentation if used

Return type:

Dict containing dataset analysis results

causal_agent.components.dataset_analyzer.detect_temporal_structure(df, llm_client=None, dataset_description=None, original_query=None)[source]

Detect temporal structure in the dataset, using LLM for enhanced identification.

Parameters:

df (DataFrame) – DataFrame to analyze
llm_client (langchain_core.language_models.BaseChatModel | None) – Optional LLM client for enhanced identification
dataset_description (str | None) – Optional description of the dataset for context

Returns:

has_temporal_structure: Whether temporal structure exists
temporal_columns: Primary time column identified (or list if multiple from heuristic)
is_panel_data: Whether data is in panel format
time_column: Primary time column identified for panel data
id_column: Primary unit ID column identified for panel data
time_periods: Number of time periods (if panel data)
units: Number of unique units (if panel data)
identification_method: How time/unit vars were identified (‘LLM’, ‘Heuristic’, ‘None’)

Return type:

Dict with information about temporal structure
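
To make the panel-related keys concrete, here is a minimal heuristic sketch. It is not the package's implementation: it works on a list of row dictionaries instead of a DataFrame, it assumes the ID and time columns are already known, and `describe_panel` is a hypothetical name.

```python
from typing import Any, Dict, List


def describe_panel(rows: List[Dict[str, Any]], id_column: str, time_column: str) -> Dict[str, Any]:
    """Summarize temporal structure given known unit and time columns."""
    units = {row[id_column] for row in rows}
    periods = {row[time_column] for row in rows}
    pairs = {(row[id_column], row[time_column]) for row in rows}
    # Treat the data as a panel when every (unit, period) pair is unique
    # and at least one unit is observed in more than one period.
    is_panel = len(pairs) == len(rows) and len(rows) > len(units) and len(periods) > 1
    return {
        "has_temporal_structure": len(periods) > 1,
        "is_panel_data": is_panel,
        "time_column": time_column,
        "id_column": id_column,
        "time_periods": len(periods),
        "units": len(units),
        "identification_method": "Heuristic",
    }


# Two units observed over three years each.
rows = [{"firm": f, "year": y} for f in ("A", "B") for y in (2019, 2020, 2021)]
info = describe_panel(rows, id_column="firm", time_column="year")
print(info["is_panel_data"], info["units"], info["time_periods"])
```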

causal_agent.components.dataset_analyzer.find_potential_instruments(df, llm_client=None, potential_treatments=None, potential_outcomes=None, dataset_description=None)[source]

Find potential instrumental variables in the dataset, using LLM if available. Falls back to heuristic method if LLM fails or is not available.

Parameters:

df (DataFrame) – DataFrame to analyze
llm_client (langchain_core.language_models.BaseChatModel | None) – Optional LLM client for enhanced identification
potential_treatments (List[str]) – Optional list of potential treatment variables
potential_outcomes (List[str]) – Optional list of potential outcome variables
dataset_description (str | None) – Optional description of the dataset for context

Returns:

List of potential instrumental variables with their properties

Return type:

List[Dict[str, Any]]

causal_agent.components.dataset_analyzer.detect_discontinuities(df)[source]

Identify discontinuities in continuous variables (for RDD).

Parameters:

df (DataFrame) – DataFrame to analyze

Returns:

Dict with information about detected discontinuities

Return type:

Dict[str, Any]
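
The idea behind a discontinuity check can be illustrated with a crude difference in means around a candidate cutoff. This is a sketch under stated assumptions, not the method used by `detect_discontinuities`: the function name, the explicit outcome and cutoff arguments, and the fixed bandwidth are all hypothetical, and a real RDD analysis would fit local regressions instead.

```python
from statistics import mean
from typing import Optional, Sequence


def jump_at_cutoff(
    running: Sequence[float],
    outcome: Sequence[float],
    cutoff: float,
    bandwidth: float,
) -> Optional[float]:
    """Difference in mean outcome just right vs. just left of the cutoff."""
    left = [y for x, y in zip(running, outcome) if cutoff - bandwidth <= x < cutoff]
    right = [y for x, y in zip(running, outcome) if cutoff <= x < cutoff + bandwidth]
    if not left or not right:
        return None  # Not enough data on one side to compare.
    return mean(right) - mean(left)


# Toy data: the outcome switches from 0 to 1 at a score of 5.
scores = list(range(10))
passed = [1.0 if s >= 5 else 0.0 for s in scores]
jump = jump_at_cutoff(scores, passed, cutoff=5, bandwidth=3)
print(jump)
```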

causal_agent.components.dataset_analyzer.assess_variable_relationships(df, corr_matrix)[source]

Assess relationships between variables in the dataset.

Parameters:

df (DataFrame) – DataFrame to analyze
corr_matrix (DataFrame) – Precomputed correlation matrix for numeric columns

Returns:

strongly_correlated_pairs: Pairs of strongly correlated variables
potential_confounders: Variables that might be confounders

Return type:

Dict with information about variable relationships
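
Extracting strongly correlated pairs from a precomputed correlation matrix can be sketched as follows. The nested-dictionary matrix, the function name, and the 0.7 threshold are assumptions for illustration; the package's actual threshold is not documented here.

```python
from itertools import combinations
from typing import Dict, List, Tuple

CorrMatrix = Dict[str, Dict[str, float]]


def strongly_correlated_pairs(corr: CorrMatrix, threshold: float = 0.7) -> List[Tuple[str, str, float]]:
    """List each unordered variable pair whose |correlation| meets the threshold."""
    pairs = []
    for a, b in combinations(sorted(corr), 2):
        r = corr[a][b]
        if abs(r) >= threshold:
            pairs.append((a, b, r))
    return pairs


# Invented symmetric correlation matrix for three variables.
corr = {
    "age": {"age": 1.0, "income": 0.82, "visits": -0.10},
    "income": {"age": 0.82, "income": 1.0, "visits": -0.75},
    "visits": {"age": -0.10, "income": -0.75, "visits": 1.0},
}
pairs = strongly_correlated_pairs(corr)
print(pairs)
```

Using the absolute value keeps strong negative correlations, which matter as much as positive ones when screening for potential confounders.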

causal_agent.components.decision_tree module

Decision tree component for selecting causal inference methods.

This module implements the decision tree logic to select the most appropriate causal inference method based on dataset characteristics and available variables.

causal_agent.components.decision_tree.select_method(dataset_properties, excluded_methods=None)[source]

causal_agent.components.decision_tree.rule_based_select_method(dataset_analysis, variables, is_rct, llm, dataset_description, original_query, excluded_methods=None)[source]

Wrapper function to select a causal method based on dataset properties and the query.

Parameters:

dataset_analysis (Dict) – results of dataset analysis
variables (Dict) – dictionary of variable names and types
is_rct (bool) – whether the dataset is from a randomized controlled trial
llm (BaseChatModel) – language model instance for generating prompts
dataset_description (str) – description of the dataset
original_query (str) – the original user query
excluded_methods (List[str], optional) – list of methods to exclude from selection
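
A rule-based selector of this kind can be pictured as an ordered list of checks with an exclusion filter. The sketch below is illustrative only and does not reproduce the package's decision tree: `pick_method`, the rule order, and the method names other than `difference_in_differences` and `instrumental_variable` are hypothetical.

```python
from typing import Any, Dict, List, Optional


def pick_method(
    dataset_analysis: Dict[str, Any],
    is_rct: bool,
    excluded_methods: Optional[List[str]] = None,
) -> Optional[str]:
    """Return the first applicable method that is not excluded."""
    excluded = set(excluded_methods or [])
    # Ordered (condition, method) rules; earlier rules take priority.
    rules = [
        (is_rct, "difference_in_means"),
        (bool(dataset_analysis.get("panel_data_detected")), "difference_in_differences"),
        (bool(dataset_analysis.get("potential_instruments_detected")), "instrumental_variable"),
        (bool(dataset_analysis.get("discontinuities_detected")), "regression_discontinuity"),
        (True, "regression_adjustment"),  # Fallback when nothing else applies.
    ]
    for applies, method in rules:
        if applies and method not in excluded:
            return method
    return None


choice = pick_method({"panel_data_detected": True}, is_rct=False)
retry = pick_method(
    {"panel_data_detected": True},
    is_rct=False,
    excluded_methods=["difference_in_differences"],
)
print(choice, retry)
```

Passing a previously rejected method in `excluded_methods` lets a caller re-run selection after validation fails.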

class causal_agent.components.decision_tree.DecisionTreeEngine(verbose=False)[source]

Bases: object

Engine for applying decision trees to select appropriate causal methods.

This class wraps the functional decision tree implementation to provide an object-oriented interface for method selection.

__init__(verbose=False)[source]

select_method(df, treatment, outcome, covariates, dataset_analysis, query_details)[source]

Apply decision tree to select appropriate causal method.

causal_agent.components.decision_tree_llm module

LLM-based Decision tree component for selecting causal inference methods.

This module implements the decision tree logic via an LLM prompt to select the most appropriate causal inference method based on dataset characteristics and available variables.

class causal_agent.components.decision_tree_llm.DecisionTreeLLMEngine(verbose=False)[source]

Bases: object

Engine for applying an LLM-based decision tree to select appropriate causal methods.

__init__(verbose=False)[source]

Initialize the LLM decision tree engine.

Parameters:

verbose (bool) – Whether to print verbose information.

select_method_llm(dataset_analysis, variables, is_rct=False, llm=None, excluded_methods=None)[source]

Apply LLM-based decision tree to select appropriate causal method.

Parameters:

dataset_analysis (Dict[str, Any]) – Dataset analysis results.
variables (Dict[str, Any]) – Identified variables from query_interpreter.
is_rct (bool) – Boolean indicating if the data comes from an RCT.
llm (langchain_core.language_models.BaseChatModel | None) – Langchain BaseChatModel instance for making the call.
excluded_methods (List[str] | None) – Optional list of method names to exclude from selection.

Returns:

Dict with selected method, justification, and assumptions. Example:

    {
        "selected_method": "difference_in_differences",
        "method_justification": "Reasoning…",
        "method_assumptions": ["Assumption 1", …],
        "alternative_methods": ["instrumental_variable"]
    }

Return type:

causal_agent.components.explanation_generator module

Explanation generator component for causal inference methods.

This module generates explanations for causal inference methods, including what the method does, its assumptions, and how it will be applied to the dataset.

causal_agent.components.explanation_generator.generate_explanation(method_info, validation_result, variables, results, dataset_analysis=None, dataset_description=None, llm=None)[source]

Generates a comprehensive explanation text for the causal analysis.

Parameters:

method_info (Dict[str, Any]) – Dictionary containing selected method details.
validation_result (Dict[str, Any]) – Dictionary containing method validation results.
variables (Dict[str, Any]) – Dictionary containing identified variables.
results (Dict[str, Any]) – Dictionary containing numerical results from the method execution.
dataset_analysis (Dict[str, Any] | None) – Optional dictionary with dataset analysis details.
dataset_description (str | None) – Optional string describing the dataset.
llm (langchain_core.language_models.BaseChatModel | None) – Optional language model instance (for potential future use in generation).

Returns:

Dictionary containing the final explanation text.

Return type:

Dict[str, str]

causal_agent.components.explanation_generator.get_method_explanation(method)[source]

Get explanation for what the method does.

Parameters:

method (str) – Causal inference method name

Returns:

String explaining what the method does

Return type:

str

causal_agent.components.explanation_generator.explain_assumptions(assumptions)[source]

Explain each assumption of the method.

Parameters:

assumptions (List[str]) – List of assumption names

Returns:

List of dictionaries with assumption name and explanation

Return type:

List[Dict[str, str]]
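
The documented return shape, a list of name/explanation dictionaries, can be produced with a simple lookup table. This is a sketch only: the dictionary keys `assumption` and `explanation`, the assumption identifiers, and the fallback text are hypothetical and may differ from what the package returns.

```python
from typing import Dict, List

# Hypothetical lookup table; the explanations are standard textbook statements.
ASSUMPTION_NOTES = {
    "parallel_trends": (
        "Absent treatment, treated and control groups would have followed "
        "the same trend in the outcome."
    ),
    "exclusion_restriction": (
        "The instrument affects the outcome only through the treatment."
    ),
}


def explain(assumptions: List[str]) -> List[Dict[str, str]]:
    """Pair each assumption name with an explanation, keeping input order."""
    return [
        {
            "assumption": name,
            "explanation": ASSUMPTION_NOTES.get(name, "No explanation available."),
        }
        for name in assumptions
    ]


explained = explain(["parallel_trends", "unknown_assumption"])
for item in explained:
    print(item["assumption"], "->", item["explanation"])
```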

causal_agent.components.explanation_generator.explain_application(method, treatment, outcome, covariates, variables)[source]

Explain how the method will be applied to the dataset.

Parameters:

method (str) – Causal inference method name
treatment (str) – Treatment variable name
outcome (str) – Outcome variable name
covariates (List[str]) – List of covariate names
variables (Dict[str, Any]) – Dictionary of identified variables

Returns:

String explaining the application

Return type:

causal_agent.components.explanation_generator.explain_limitations(method, concerns)[source]

Explain the limitations of the method based on validation concerns.

Parameters:

method (str) – Causal inference method name
concerns (List[str]) – List of concerns from validation

Returns:

String explaining the limitations

Return type:

causal_agent.components.explanation_generator.generate_interpretation_guide(method, treatment, outcome)[source]

Generate guide for interpreting the results.

Parameters:

method (str) – Causal inference method name
treatment (str) – Treatment variable name
outcome (str) – Outcome variable name

Returns:

String with interpretation guide

Return type:

causal_agent.components.input_parser module

Input parser component for extracting information from causal queries.

This module provides functionality to parse user queries and extract key elements such as the causal question, relevant variables, and constraints.

class causal_agent.components.input_parser.ParsedVariables(*, treatment=<factory>, outcome=<factory>, covariates_mentioned=<factory>, grouping_vars=<factory>, instruments_mentioned=<factory>)[source]

Bases: BaseModel

treatment: List[str]

outcome: List[str]

covariates_mentioned: List[str] | None

grouping_vars: List[str] | None

instruments_mentioned: List[str] | None
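
The signature shows every field defaulting to a factory, so each instance receives its own fresh value rather than a shared one. The sketch below mirrors the documented fields with a standard-library dataclass as a stand-in, since it does not depend on Pydantic; `ParsedVariablesSketch` is a hypothetical name and performs none of the validation the real `BaseModel` subclass does. The `list` factories follow the `default_factory=list` entries in `model_fields`.

```python
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class ParsedVariablesSketch:
    """Stand-in mirroring the documented fields; no validation is performed."""

    treatment: List[str] = field(default_factory=list)
    outcome: List[str] = field(default_factory=list)
    covariates_mentioned: Optional[List[str]] = field(default_factory=list)
    grouping_vars: Optional[List[str]] = field(default_factory=list)
    instruments_mentioned: Optional[List[str]] = field(default_factory=list)


first = ParsedVariablesSketch(treatment=["training"], outcome=["wage"])
second = ParsedVariablesSketch()

# Each instance gets its own list, so mutating one does not affect the other.
first.covariates_mentioned.append("age")
print(asdict(first))
print(asdict(second))
```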

__copy__()

Returns a shallow copy of the model.

__deepcopy__(memo=None)

Returns a deep copy of the model.

classmethod __get_pydantic_json_schema__(core_schema, handler, /)

Hook into generating the model’s JSON schema.

Parameters:

core_schema (CoreSchema) – A pydantic-core CoreSchema. You can ignore this argument and call the handler with a new CoreSchema, wrap this CoreSchema ({‘type’: ‘nullable’, ‘schema’: current_schema}), or just call the handler with the original schema.
handler (GetJsonSchemaHandler) – Call into Pydantic’s internal JSON schema generation. This will raise a pydantic.errors.PydanticInvalidForJsonSchema if JSON schema generation fails. Since this gets called by BaseModel.model_json_schema you can override the schema_generator argument to that function to change JSON schema generation globally for a type.

Returns:

A JSON schema, as a Python object.

Return type:

JsonSchemaValue

__init__(**data)

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

__iter__()

So dict(model) works.

__pretty__(fmt, **kwargs)

Used by devtools (https://python-devtools.helpmanual.io/) to pretty print objects.

classmethod __pydantic_init_subclass__(**kwargs)

This is intended to behave just like __init_subclass__, but is called by ModelMetaclass only after basic class initialization is complete. In particular, attributes like model_fields will be present when this is called, but forward annotations are not guaranteed to be resolved yet, meaning that creating an instance of the class may fail.

This is necessary because __init_subclass__ will always be called by type.__new__, and it would require a prohibitively large refactor to the ModelMetaclass to ensure that type.__new__ was called in such a manner that the class would already be sufficiently initialized.

This will receive the same kwargs that would be passed to the standard __init_subclass__, namely, any kwargs passed to the class definition that aren’t used internally by Pydantic.

Parameters:

**kwargs (Any) – Any keyword arguments passed to the class definition that aren’t used internally by Pydantic.

Note

You may want to override __pydantic_on_complete__() instead, which is called once the class and its fields are fully initialized and ready for validation.

classmethod __pydantic_on_complete__()

This is called once the class and its fields are fully initialized and ready to be used.

This typically happens when the class is created (just before __pydantic_init_subclass__() is called on the superclass), except when forward annotations are used that could not immediately be resolved. In that case, it will be called later, when the model is rebuilt automatically or explicitly using model_rebuild().

__repr_name__()

Name of the instance’s class, used in __repr__.

__repr_recursion__(object)

Returns the string representation of a recursive object.

__rich_repr__()

Used by Rich (https://rich.readthedocs.io/en/stable/pretty.html) to pretty print objects.

classmethod construct(_fields_set=None, **values)

copy(*, include=None, exclude=None, update=None, deep=False)

Returns a copy of the model.

Warning: Deprecated. This method is now deprecated; use model_copy instead.

If you need include or exclude, use:

    data = self.model_dump(include=include, exclude=exclude, round_trip=True)
    data = {**data, **(update or {})}
    copied = self.model_validate(data)

Parameters:

include (AbstractSetIntStr | MappingIntStrAny | None) – Optional set or mapping specifying which fields to include in the copied model.
exclude (AbstractSetIntStr | MappingIntStrAny | None) – Optional set or mapping specifying which fields to exclude in the copied model.
update (Dict[str, Any] | None) – Optional dictionary of field-value pairs to override field values in the copied model.
deep (bool) – If True, the values of fields that are Pydantic models will be deep-copied.

Returns:

A copy of the model with included, excluded and updated fields as specified.

Return type:

Self

dict(*, include=None, exclude=None, by_alias=False, exclude_unset=False, exclude_defaults=False, exclude_none=False)

classmethod from_orm(obj)

json(*, include=None, exclude=None, by_alias=False, exclude_unset=False, exclude_defaults=False, exclude_none=False, encoder=PydanticUndefined, models_as_dict=PydanticUndefined, **dumps_kwargs)

model_computed_fields = {}

model_config = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

classmethod model_construct(_fields_set=None, **values)

Creates a new instance of the Model class with validated data.

Creates a new model setting __dict__ and __pydantic_fields_set__ from trusted or pre-validated data. Default values are respected, but no other validation is performed.

Note: model_construct() generally respects the model_config.extra setting on the provided model. That is, if model_config.extra == ‘allow’, then all extra passed values are added to the model instance’s __dict__ and __pydantic_extra__ fields. If model_config.extra == ‘ignore’ (the default), then all extra passed values are ignored. Because no validation is performed with a call to model_construct(), having model_config.extra == ‘forbid’ does not result in an error if extra values are passed, but they will be ignored.

Parameters:

_fields_set (set[str] | None) – A set of field names that were originally explicitly set during instantiation. If provided, this is directly used for the model_fields_set attribute. Otherwise, the field names from the values argument will be used.
values (Any) – Trusted or pre-validated data dictionary.

Returns:

A new instance of the Model class with validated data.

Return type:

model_copy(*, update=None, deep=False)

Returns a copy of the model.

Note: The underlying instance’s __dict__ attribute is copied. This might have unexpected side effects if you store anything in it, on top of the model fields (e.g. the value of cached properties created with functools.cached_property).

Parameters:

update (Mapping[str, Any] | None) – Values to change/add in the new model. Note: the data is not validated before creating the new model. You should trust this data.
deep (bool) – Set to True to make a deep copy of the model.

Returns:

New model instance.

Return type:

model_dump(*, mode='python', include=None, exclude=None, context=None, by_alias=None, exclude_unset=False, exclude_defaults=False, exclude_none=False, exclude_computed_fields=False, round_trip=False, warnings=True, fallback=None, serialize_as_any=False, polymorphic_serialization=None)

Generate a dictionary representation of the model, optionally specifying which fields to include or exclude.

Parameters:

mode (Literal['json', 'python'] | str) – The mode in which to_python should run. If mode is ‘json’, the output will only contain JSON serializable types. If mode is ‘python’, the output may contain non-JSON-serializable Python objects.
include (set[int] | set[str] | Mapping[int, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | Mapping[str, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | None) – A set of fields to include in the output.
exclude (set[int] | set[str] | Mapping[int, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | Mapping[str, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | None) – A set of fields to exclude from the output.
context (Any | None) – Additional context to pass to the serializer.
by_alias (bool | None) – Whether to use the field’s alias in the dictionary key if defined.
exclude_unset (bool) – Whether to exclude fields that have not been explicitly set.
exclude_defaults (bool) – Whether to exclude fields that are set to their default value.
exclude_none (bool) – Whether to exclude fields that have a value of None.
exclude_computed_fields (bool) – Whether to exclude computed fields. While this can be useful for round-tripping, it is usually recommended to use the dedicated round_trip parameter instead.
round_trip (bool) – If True, dumped values should be valid as input for non-idempotent types such as Json[T].
warnings (bool | Literal['none', 'warn', 'error']) – How to handle serialization errors. False/“none” ignores them, True/“warn” logs errors, “error” raises a pydantic_core.PydanticSerializationError.
fallback (Callable[[Any], Any] | None) – A function to call when an unknown value is encountered. If not provided, a pydantic_core.PydanticSerializationError is raised.
serialize_as_any (bool) – Whether to serialize fields with duck-typing serialization behavior.
polymorphic_serialization (bool | None) – Whether to use model and dataclass polymorphic serialization for this call.

Returns:

A dictionary representation of the model.

Return type:

model_dump_json(*, indent=None, ensure_ascii=False, include=None, exclude=None, context=None, by_alias=None, exclude_unset=False, exclude_defaults=False, exclude_none=False, exclude_computed_fields=False, round_trip=False, warnings=True, fallback=None, serialize_as_any=False, polymorphic_serialization=None)

Generates a JSON representation of the model using Pydantic’s to_json method.

Parameters:

indent (int | None) – Indentation to use in the JSON output. If None is passed, the output will be compact.
ensure_ascii (bool) – If True, the output is guaranteed to have all incoming non-ASCII characters escaped. If False (the default), these characters will be output as-is.
include (set[int] | set[str] | Mapping[int, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | Mapping[str, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | None) – Field(s) to include in the JSON output.
exclude (set[int] | set[str] | Mapping[int, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | Mapping[str, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | None) – Field(s) to exclude from the JSON output.
context (Any | None) – Additional context to pass to the serializer.
by_alias (bool | None) – Whether to serialize using field aliases.
exclude_unset (bool) – Whether to exclude fields that have not been explicitly set.
exclude_defaults (bool) – Whether to exclude fields that are set to their default value.
exclude_none (bool) – Whether to exclude fields that have a value of None.
exclude_computed_fields (bool) – Whether to exclude computed fields. While this can be useful for round-tripping, it is usually recommended to use the dedicated round_trip parameter instead.
round_trip (bool) – If True, dumped values should be valid as input for non-idempotent types such as Json[T].
warnings (bool | Literal['none', 'warn', 'error']) – How to handle serialization errors. False/“none” ignores them, True/“warn” logs errors, “error” raises a pydantic_core.PydanticSerializationError.
fallback (Callable[[Any], Any] | None) – A function to call when an unknown value is encountered. If not provided, a pydantic_core.PydanticSerializationError is raised.
serialize_as_any (bool) – Whether to serialize fields with duck-typing serialization behavior.
polymorphic_serialization (bool | None) – Whether to use model and dataclass polymorphic serialization for this call.

Returns:

A JSON string representation of the model.

Return type:

property model_extra: dict[str, Any] | None

Get extra fields set during validation.

Returns:

A dictionary of extra fields, or None if config.extra is not set to “allow”.

model_fields = {'covariates_mentioned': FieldInfo(annotation=Union[List[str], NoneType], required=False, default_factory=list, description='Covariate/control variable(s) explicitly mentioned in the query.'), 'grouping_vars': FieldInfo(annotation=Union[List[str], NoneType], required=False, default_factory=list, description='Variable(s) identifying groups or units for analysis.'), 'instruments_mentioned': FieldInfo(annotation=Union[List[str], NoneType], required=False, default_factory=list, description='Potential instrumental variable(s) mentioned.'), 'outcome': FieldInfo(annotation=List[str], required=False, default_factory=list, description='Variable(s) representing the outcome/result.'), 'treatment': FieldInfo(annotation=List[str], required=False, default_factory=list, description='Variable(s) representing the treatment/intervention.')}

property model_fields_set: set[str]

Returns the set of fields that have been explicitly set on this model instance.

Returns:

A set of strings representing the fields that have been set, i.e. that were not filled from defaults.

classmethod model_json_schema(by_alias=True, ref_template=DEFAULT_REF_TEMPLATE, schema_generator=GenerateJsonSchema, mode='validation', *, union_format='any_of')

Generates a JSON schema for a model class.

Parameters:

by_alias (bool) – Whether to use attribute aliases or not.
ref_template (str) – The reference template.
union_format (Literal['any_of', 'primitive_type_array']) – The format to use when combining schemas from unions together. Can be one of:
- ‘any_of’: Use the anyOf keyword (https://json-schema.org/understanding-json-schema/reference/combining#anyOf) to combine schemas (the default).
- ‘primitive_type_array’: Use the type keyword (https://json-schema.org/understanding-json-schema/reference/type) as an array of strings, containing each type of the combination. If any of the schemas is not a primitive type (string, boolean, null, integer or number) or contains constraints/metadata, falls back to any_of.
schema_generator (type[GenerateJsonSchema]) – To override the logic used to generate the JSON schema, as a subclass of GenerateJsonSchema with your desired modifications
mode (Literal['validation', 'serialization']) – The mode in which to generate the schema.

Returns:

The JSON schema for the given model class.

Return type:

classmethod model_parametrized_name(params)

Compute the class name for parametrizations of generic classes.

This method can be overridden to achieve a custom naming scheme for generic BaseModels.

Parameters:

params (tuple[type[Any], ...]) – Tuple of types of the class. Given a generic class Model with 2 type variables and a concrete model Model[str, int], the value (str, int) would be passed to params.

Returns:

String representing the new class where params are passed to cls as type variables.

Raises:

TypeError – Raised when trying to generate concrete names for non-generic models.

Return type:

str

model_post_init(context, /)

Override this method to perform additional initialization after __init__ and model_construct. This is useful if you want to do some validation that requires the entire model to be initialized.

classmethod model_rebuild(*, force=False, raise_errors=True, _parent_namespace_depth=2, _types_namespace=None)

Try to rebuild the pydantic-core schema for the model.

This may be necessary when one of the annotations is a ForwardRef which could not be resolved during the initial attempt to build the schema, and automatic rebuilding fails.

Parameters:

force (bool) – Whether to force the rebuilding of the model schema, defaults to False.
raise_errors (bool) – Whether to raise errors, defaults to True.
_parent_namespace_depth (int) – The depth level of the parent namespace, defaults to 2.
_types_namespace (MappingNamespace | None) – The types namespace, defaults to None.

Returns:

Returns None if the schema is already “complete” and rebuilding was not required. If rebuilding was required, returns True if rebuilding was successful, otherwise False.

Return type:

bool | None
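
The forward-reference case can be sketched as follows. QueryStub and VariablesStub are hypothetical stand-ins, not the real causal_agent classes; the example only assumes that the referenced class is defined after the model that names it.

```python
from pydantic import BaseModel


class QueryStub(BaseModel):
    # "VariablesStub" is not defined yet, so the schema cannot be completed here.
    variables: "VariablesStub"


class VariablesStub(BaseModel):
    treatment: str


first = QueryStub.model_rebuild()   # the forward reference can now be resolved
second = QueryStub.model_rebuild()  # schema is already complete

print(first, second)
print(QueryStub(variables={"treatment": "discount"}))
```

The second call does no work because force was left at its default of False.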

classmethod model_validate(obj, *, strict=None, extra=None, from_attributes=None, context=None, by_alias=None, by_name=None)

Validate a pydantic model instance.

Parameters:

obj (Any) – The object to validate.
strict (bool | None) – Whether to enforce types strictly.
extra (Literal['allow', 'ignore', 'forbid'] | None) – Whether to ignore, allow, or forbid extra data during model validation. See the extra configuration value (pydantic.ConfigDict.extra) for details.
from_attributes (bool | None) – Whether to extract data from object attributes.
context (Any | None) – Additional context to pass to the validator.
by_alias (bool | None) – Whether to use the field’s alias when validating against the provided input data.
by_name (bool | None) – Whether to use the field’s name when validating against the provided input data.

Raises:

ValidationError – If the object could not be validated.

Returns:

The validated model instance.

Return type:

Self

classmethod model_validate_json(json_data, *, strict=None, extra=None, context=None, by_alias=None, by_name=None)

Usage Documentation: [JSON Parsing](../concepts/json.md#json-parsing)

Validate the given JSON data against the Pydantic model.

Parameters:

json_data (str | bytes | bytearray) – The JSON data to validate.
strict (bool | None) – Whether to enforce types strictly.
extra (Literal['allow', 'ignore', 'forbid'] | None) – Whether to ignore, allow, or forbid extra data during model validation. See the extra configuration value (pydantic.ConfigDict.extra) for details.
context (Any | None) – Extra variables to pass to the validator.
by_alias (bool | None) – Whether to use the field’s alias when validating against the provided input data.
by_name (bool | None) – Whether to use the field’s name when validating against the provided input data.

Returns:

The validated Pydantic model.

Raises:

ValidationError – If json_data is not a JSON string or the object could not be validated.

Return type:

Self

classmethod model_validate_strings(obj, *, strict=None, extra=None, context=None, by_alias=None, by_name=None)

Validate the given object with string data against the Pydantic model.

Parameters:

obj (Any) – The object containing string data to validate.
strict (bool | None) – Whether to enforce types strictly.
extra (Literal['allow', 'ignore', 'forbid'] | None) – Whether to ignore, allow, or forbid extra data during model validation. See the extra configuration value (pydantic.ConfigDict.extra) for details.
context (Any | None) – Extra variables to pass to the validator.
by_alias (bool | None) – Whether to use the field’s alias when validating against the provided input data.
by_name (bool | None) – Whether to use the field’s name when validating against the provided input data.

Returns:

The validated Pydantic model.

Return type:

Self

classmethod parse_file(path, *, content_type=None, encoding='utf8', proto=None, allow_pickle=False)

classmethod parse_obj(obj)

classmethod parse_raw(b, *, content_type=None, encoding='utf8', proto=None, allow_pickle=False)

classmethod schema(by_alias=True, ref_template=DEFAULT_REF_TEMPLATE)

classmethod schema_json(*, by_alias=True, ref_template=DEFAULT_REF_TEMPLATE, **dumps_kwargs)

classmethod update_forward_refs(**localns)

classmethod validate(value)

class causal_agent.components.input_parser.ParsedQueryInfo(*, query_type, variables, constraints=<factory>, dataset_path_mentioned=None)[source]

Bases: BaseModel

query_type: str

variables: ParsedVariables

constraints: List[str] | None

dataset_path_mentioned: str | None
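
A minimal sketch of how a model with these four fields behaves. Because ParsedVariables is defined elsewhere in the package, the example uses hypothetical stand-ins (VariablesStub, QueryInfoStub); the treatment and outcome fields of the stub are assumptions made only for illustration.

```python
from typing import List, Optional

from pydantic import BaseModel, Field, ValidationError


class VariablesStub(BaseModel):
    # Placeholder for ParsedVariables; both fields are assumptions.
    treatment: Optional[str] = None
    outcome: Optional[str] = None


class QueryInfoStub(BaseModel):
    # Mirrors the four documented fields of ParsedQueryInfo.
    query_type: str
    variables: VariablesStub
    constraints: Optional[List[str]] = Field(default_factory=list)
    dataset_path_mentioned: Optional[str] = None


info = QueryInfoStub(
    query_type="EFFECT_ESTIMATION",
    variables={"treatment": "discount", "outcome": "sales"},
)
print(info.constraints)             # the default factory supplies an empty list
print(info.dataset_path_mentioned)  # optional field falls back to None

try:
    QueryInfoStub(variables={})  # query_type is required and missing here
except ValidationError as exc:
    missing = [err["loc"] for err in exc.errors()]
    print(missing)
```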

__copy__()

Returns a shallow copy of the model.

__deepcopy__(memo=None)

Returns a deep copy of the model.

classmethod __get_pydantic_json_schema__(core_schema, handler, /)

Hook into generating the model’s JSON schema.

Parameters:

core_schema (CoreSchema) – A pydantic-core CoreSchema. You can ignore this argument and call the handler with a new CoreSchema, wrap this CoreSchema ({'type': 'nullable', 'schema': current_schema}), or just call the handler with the original schema.
handler (GetJsonSchemaHandler) – Call into Pydantic’s internal JSON schema generation. This will raise a pydantic.errors.PydanticInvalidForJsonSchema if JSON schema generation fails. Since this gets called by BaseModel.model_json_schema you can override the schema_generator argument to that function to change JSON schema generation globally for a type.

Returns:

A JSON schema, as a Python object.

Return type:

JsonSchemaValue

__init__(**data)

Create a new model by parsing and validating input data from keyword arguments.

Raises ValidationError (pydantic_core.ValidationError) if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

__iter__()

So dict(model) works.
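
The difference between dict(model) and model_dump() can be sketched as follows; InnerStub and OuterStub are hypothetical models introduced only for this example.

```python
from typing import Optional

from pydantic import BaseModel


class InnerStub(BaseModel):
    treatment: Optional[str] = None


class OuterStub(BaseModel):
    query_type: str
    variables: InnerStub


outer = OuterStub(query_type="CORRELATION", variables=InnerStub(treatment="price"))

shallow = dict(outer)      # relies on __iter__, which yields (field name, value) pairs
deep = outer.model_dump()  # recursively converts nested models to dictionaries

print(type(shallow["variables"]).__name__)
print(type(deep["variables"]).__name__)
```

dict(model) leaves nested models as model instances, so model_dump() is usually the better choice when a plain nested dictionary is needed.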

__pretty__(fmt, **kwargs)

Used by devtools (https://python-devtools.helpmanual.io/) to pretty print objects.

classmethod __pydantic_init_subclass__(**kwargs)

This is intended to behave just like __init_subclass__, but is called by ModelMetaclass only after basic class initialization is complete. In particular, attributes like model_fields will be present when this is called, but forward annotations are not guaranteed to be resolved yet, meaning that creating an instance of the class may fail.

This is necessary because __init_subclass__ will always be called by type.__new__, and it would require a prohibitively large refactor to the ModelMetaclass to ensure that type.__new__ was called in such a manner that the class would already be sufficiently initialized.

This will receive the same kwargs that would be passed to the standard __init_subclass__, namely, any kwargs passed to the class definition that aren’t used internally by Pydantic.

Parameters:

**kwargs (Any) – Any keyword arguments passed to the class definition that aren’t used internally by Pydantic.

Note

You may want to override __pydantic_on_complete__() instead, which is called once the class and its fields are fully initialized and ready for validation.

classmethod __pydantic_on_complete__()

This is called once the class and its fields are fully initialized and ready to be used.

This typically happens when the class is created (just before __pydantic_init_subclass__() is called on the superclass), except when forward annotations are used that could not immediately be resolved. In that case, it will be called later, when the model is rebuilt automatically or explicitly using model_rebuild().

__repr_name__()

Name of the instance’s class, used in __repr__.

__repr_recursion__(object)

Returns the string representation of a recursive object.

__rich_repr__()

Used by Rich (https://rich.readthedocs.io/en/stable/pretty.html) to pretty print objects.

classmethod construct(_fields_set=None, **values)

copy(*, include=None, exclude=None, update=None, deep=False)

Returns a copy of the model.

Warning: Deprecated. This method is now deprecated; use model_copy instead.

If you need include or exclude, use:

    data = self.model_dump(include=include, exclude=exclude, round_trip=True)
    data = {**data, **(update or {})}
    copied = self.model_validate(data)

Parameters:

include (AbstractSetIntStr | MappingIntStrAny | None) – Optional set or mapping specifying which fields to include in the copied model.
exclude (AbstractSetIntStr | MappingIntStrAny | None) – Optional set or mapping specifying which fields to exclude in the copied model.
update (Dict[str, Any] | None) – Optional dictionary of field-value pairs to override field values in the copied model.
deep (bool) – If True, the values of fields that are Pydantic models will be deep-copied.

Returns:

A copy of the model with included, excluded and updated fields as specified.

Return type:

Self

dict(*, include=None, exclude=None, by_alias=False, exclude_unset=False, exclude_defaults=False, exclude_none=False)

classmethod from_orm(obj)

json(*, include=None, exclude=None, by_alias=False, exclude_unset=False, exclude_defaults=False, exclude_none=False, encoder=PydanticUndefined, models_as_dict=PydanticUndefined, **dumps_kwargs)

model_computed_fields = {}

model_config = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

classmethod model_construct(_fields_set=None, **values)

Creates a new instance of the Model class with validated data.

Creates a new model setting __dict__ and __pydantic_fields_set__ from trusted or pre-validated data. Default values are respected, but no other validation is performed.

Note: model_construct() generally respects the model_config.extra setting on the provided model. That is, if model_config.extra == 'allow', then all extra passed values are added to the model instance’s __dict__ and __pydantic_extra__ fields. If model_config.extra == 'ignore' (the default), then all extra passed values are ignored. Because no validation is performed with a call to model_construct(), having model_config.extra == 'forbid' does not result in an error if extra values are passed, but they will be ignored.

Parameters:

_fields_set (set[str] | None) – A set of field names that were originally explicitly set during instantiation. If provided, this is directly used for the model_fields_set attribute (pydantic.BaseModel.model_fields_set). Otherwise, the field names from the values argument will be used.
values (Any) – Trusted or pre-validated data dictionary.

Returns:

A new instance of the Model class with validated data.

Return type:

Self
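
A small sketch of what “no other validation is performed” means in practice. PathStub is a hypothetical model, and its max_rows field is invented purely to show that defaults are still applied.

```python
from typing import Optional

from pydantic import BaseModel, ValidationError


class PathStub(BaseModel):
    """Hypothetical stand-in with one optional string field and one default."""

    dataset_path: Optional[str] = None
    max_rows: int = 1000  # invented field, only here to show defaults


# model_construct trusts its input: the wrong type below is stored as-is.
trusted = PathStub.model_construct(dataset_path=123)
print(trusted.dataset_path)      # the int is kept, no coercion or error
print(trusted.max_rows)          # the default value is still filled in
print(trusted.model_fields_set)  # only the explicitly passed field

# The validating constructor rejects the same input.
try:
    PathStub(dataset_path=123)
    rejected = False
except ValidationError:
    rejected = True
print(rejected)
```

For this reason model_construct is best reserved for data that has already been validated elsewhere.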

model_copy(*, update=None, deep=False)

Usage Documentation: [model_copy](../concepts/models.md#model-copy)

Returns a copy of the model.

Note: The underlying instance’s __dict__ attribute is copied. This might have unexpected side effects if you store anything in it, on top of the model fields (e.g. the value of cached properties, see functools.cached_property).

Parameters:

update (Mapping[str, Any] | None) – Values to change/add in the new model. Note: the data is not validated before creating the new model. You should trust this data.
deep (bool) – Set to True to make a deep copy of the model.

Returns:

New model instance.

Return type:

Self
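
The effect of deep and update can be sketched as follows, using a hypothetical QueryStub model that borrows two field names from ParsedQueryInfo.

```python
from typing import List, Optional

from pydantic import BaseModel, Field


class QueryStub(BaseModel):
    query_type: str
    constraints: Optional[List[str]] = Field(default_factory=list)


original = QueryStub(query_type="EFFECT_ESTIMATION", constraints=["X > 10"])

shallow = original.model_copy()           # field values are shared
deep = original.model_copy(deep=True)     # field values are copied recursively
updated = original.model_copy(update={"query_type": 42})  # update is not validated

print(shallow.constraints is original.constraints)
print(deep.constraints is original.constraints)
print(updated.query_type)
```

Because update bypasses validation, the copy can end up holding a value that the model would normally reject, as the integer query_type shows.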

model_dump(*, mode='python', include=None, exclude=None, context=None, by_alias=None, exclude_unset=False, exclude_defaults=False, exclude_none=False, exclude_computed_fields=False, round_trip=False, warnings=True, fallback=None, serialize_as_any=False, polymorphic_serialization=None)

Usage Documentation: [model_dump](../concepts/serialization.md#python-mode)

Generate a dictionary representation of the model, optionally specifying which fields to include or exclude.

Parameters:

mode (Literal['json', 'python'] | str) – The mode in which to_python should run. If mode is ‘json’, the output will only contain JSON serializable types. If mode is ‘python’, the output may contain non-JSON-serializable Python objects.
include (set[int] | set[str] | Mapping[int, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | Mapping[str, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | None) – A set of fields to include in the output.
exclude (set[int] | set[str] | Mapping[int, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | Mapping[str, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | None) – A set of fields to exclude from the output.
context (Any | None) – Additional context to pass to the serializer.
by_alias (bool | None) – Whether to use the field’s alias in the dictionary key if defined.
exclude_unset (bool) – Whether to exclude fields that have not been explicitly set.
exclude_defaults (bool) – Whether to exclude fields that are set to their default value.
exclude_none (bool) – Whether to exclude fields that have a value of None.
exclude_computed_fields (bool) – Whether to exclude computed fields. While this can be useful for round-tripping, it is usually recommended to use the dedicated round_trip parameter instead.
round_trip (bool) – If True, dumped values should be valid as input for non-idempotent types such as Json[T].
warnings (bool | Literal['none', 'warn', 'error']) – How to handle serialization errors. False/"none" ignores them, True/"warn" logs errors, "error" raises a PydanticSerializationError (pydantic_core.PydanticSerializationError).
fallback (Callable[[Any], Any] | None) – A function to call when an unknown value is encountered. If not provided, a PydanticSerializationError (pydantic_core.PydanticSerializationError) error is raised.
serialize_as_any (bool) – Whether to serialize fields with duck-typing serialization behavior.
polymorphic_serialization (bool | None) – Whether to use model and dataclass polymorphic serialization for this call.

Returns:

A dictionary representation of the model.

Return type:

dict[str, Any]
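
A short sketch of the most commonly used options. RunStub is a hypothetical model; its run_date field is an assumption added to show the difference between the two modes.

```python
from datetime import date
from typing import Optional

from pydantic import BaseModel


class RunStub(BaseModel):
    query_type: str
    run_date: date
    dataset_path: Optional[str] = None


run = RunStub(query_type="DESCRIPTIVE", run_date=date(2024, 1, 2))

as_python = run.model_dump()           # keeps the date object
as_json = run.model_dump(mode="json")  # converts the date to an ISO string
no_none = run.model_dump(exclude_none=True)
only_type = run.model_dump(include={"query_type"})

print(as_python["run_date"], type(as_python["run_date"]).__name__)
print(as_json["run_date"], type(as_json["run_date"]).__name__)
print(sorted(no_none))
print(only_type)
```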

model_dump_json(*, indent=None, ensure_ascii=False, include=None, exclude=None, context=None, by_alias=None, exclude_unset=False, exclude_defaults=False, exclude_none=False, exclude_computed_fields=False, round_trip=False, warnings=True, fallback=None, serialize_as_any=False, polymorphic_serialization=None)

Usage Documentation: [model_dump_json](../concepts/serialization.md#json-mode)

Generates a JSON representation of the model using Pydantic’s to_json method.

Parameters:

indent (int | None) – Indentation to use in the JSON output. If None is passed, the output will be compact.
ensure_ascii (bool) – If True, the output is guaranteed to have all incoming non-ASCII characters escaped. If False (the default), these characters will be output as-is.
include (set[int] | set[str] | Mapping[int, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | Mapping[str, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | None) – Field(s) to include in the JSON output.
exclude (set[int] | set[str] | Mapping[int, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | Mapping[str, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | None) – Field(s) to exclude from the JSON output.
context (Any | None) – Additional context to pass to the serializer.
by_alias (bool | None) – Whether to serialize using field aliases.
exclude_unset (bool) – Whether to exclude fields that have not been explicitly set.
exclude_defaults (bool) – Whether to exclude fields that are set to their default value.
exclude_none (bool) – Whether to exclude fields that have a value of None.
exclude_computed_fields (bool) – Whether to exclude computed fields. While this can be useful for round-tripping, it is usually recommended to use the dedicated round_trip parameter instead.
round_trip (bool) – If True, dumped values should be valid as input for non-idempotent types such as Json[T].
warnings (bool | Literal['none', 'warn', 'error']) – How to handle serialization errors. False/"none" ignores them, True/"warn" logs errors, "error" raises a PydanticSerializationError (pydantic_core.PydanticSerializationError).
fallback (Callable[[Any], Any] | None) – A function to call when an unknown value is encountered. If not provided, a PydanticSerializationError (pydantic_core.PydanticSerializationError) error is raised.
serialize_as_any (bool) – Whether to serialize fields with duck-typing serialization behavior.
polymorphic_serialization (bool | None) – Whether to use model and dataclass polymorphic serialization for this call.

Returns:

A JSON string representation of the model.

Return type:

str

property model_extra: dict[str, Any] | None

Get extra fields set during validation.

Returns:

A dictionary of extra fields, or None if config.extra is not set to “allow”.
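
A minimal sketch of the two cases. OpenStub and ClosedStub are hypothetical models; the documented models here use an empty model_config, so they behave like ClosedStub.

```python
from pydantic import BaseModel, ConfigDict


class OpenStub(BaseModel):
    model_config = ConfigDict(extra="allow")

    query_type: str


class ClosedStub(BaseModel):
    # No extra setting, so the default ("ignore") applies.
    query_type: str


open_model = OpenStub(query_type="OTHER", note="kept")
closed_model = ClosedStub(query_type="OTHER", note="dropped")

print(open_model.model_extra)
print(closed_model.model_extra)
```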

model_fields = {'constraints': FieldInfo(annotation=Union[List[str], NoneType], required=False, default_factory=list, description="Constraints or conditions mentioned (e.g., 'X > 10', 'country = USA')."), 'dataset_path_mentioned': FieldInfo(annotation=Union[str, NoneType], required=False, default=None, description='Dataset path explicitly mentioned in the query, if any.'), 'query_type': FieldInfo(annotation=str, required=True, description='Type of query (e.g., EFFECT_ESTIMATION, COUNTERFACTUAL, CORRELATION, DESCRIPTIVE, OTHER). Required.'), 'variables': FieldInfo(annotation=ParsedVariables, required=True, description='Variables identified in the query.')}

property model_fields_set: set[str]

Returns the set of fields that have been explicitly set on this model instance.

Returns:

A set of strings representing the fields that have been set, i.e. that were not filled from defaults.
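
The distinction between “explicitly set” and “filled from defaults” can be sketched as follows; FieldsStub is a hypothetical model that reuses two field names from ParsedQueryInfo.

```python
from typing import Optional

from pydantic import BaseModel


class FieldsStub(BaseModel):
    query_type: str
    dataset_path_mentioned: Optional[str] = None


implicit = FieldsStub(query_type="COUNTERFACTUAL")
explicit = FieldsStub(query_type="COUNTERFACTUAL", dataset_path_mentioned=None)

# Both instances hold the same values, but only one set the optional field.
print(implicit == explicit)
print(sorted(implicit.model_fields_set))
print(sorted(explicit.model_fields_set))

# exclude_unset relies on this bookkeeping.
print(implicit.model_dump(exclude_unset=True))
print(explicit.model_dump(exclude_unset=True))
```

Passing a value that happens to equal the default still counts as setting the field.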

classmethod model_json_schema(by_alias=True, ref_template=DEFAULT_REF_TEMPLATE, schema_generator=GenerateJsonSchema, mode='validation', *, union_format='any_of')

Generates a JSON schema for a model class.

Parameters:

by_alias (bool) – Whether to use attribute aliases or not.
ref_template (str) – The reference template.
union_format (Literal['any_of', 'primitive_type_array']) – The format to use when combining schemas from unions together. Can be one of:
- ‘any_of’: Use the [anyOf](https://json-schema.org/understanding-json-schema/reference/combining#anyOf) keyword to combine schemas (the default).
- ‘primitive_type_array’: Use the [type](https://json-schema.org/understanding-json-schema/reference/type) keyword as an array of strings, containing each type of the combination. If any of the schemas is not a primitive type (string, boolean, null, integer or number) or contains constraints/metadata, falls back to any_of.
schema_generator (type[GenerateJsonSchema]) – To override the logic used to generate the JSON schema, as a subclass of GenerateJsonSchema with your desired modifications
mode (Literal['validation', 'serialization']) – The mode in which to generate the schema.

Returns:

The JSON schema for the given model class.

Return type:

dict[str, Any]
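
To make the default behaviour concrete, here is a sketch that generates a schema for a hypothetical PathStub model with one optional string field. It uses only default arguments and does not exercise union_format.

```python
import json
from typing import Optional

from pydantic import BaseModel, Field


class PathStub(BaseModel):
    dataset_path: Optional[str] = Field(
        default=None, description="Hypothetical description text."
    )


schema = PathStub.model_json_schema()
print(json.dumps(schema, indent=2))

# An Optional[str] field is a union of str and None, combined with anyOf.
field_schema = schema["properties"]["dataset_path"]
print(field_schema["anyOf"])
```

The description passed to Field is carried into the generated schema, which is how field descriptions such as those listed in model_fields reach downstream consumers.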

classmethod model_parametrized_name(params)

Compute the class name for parametrizations of generic classes.

This method can be overridden to achieve a custom naming scheme for generic BaseModels.

Parameters:

params (tuple[type[Any], ...]) – Tuple of types of the class. Given a generic class Model with 2 type variables and a concrete model Model[str, int], the value (str, int) would be passed to params.

Returns:

String representing the new class where params are passed to cls as type variables.

Raises:

TypeError – Raised when trying to generate concrete names for non-generic models.

Return type:

str

model_post_init(context, /)

Override this method to perform additional initialization after __init__ and model_construct. This is useful if you want to do some validation that requires the entire model to be initialized.

classmethod model_rebuild(*, force=False, raise_errors=True, _parent_namespace_depth=2, _types_namespace=None)

Try to rebuild the pydantic-core schema for the model.

This may be necessary when one of the annotations is a ForwardRef which could not be resolved during the initial attempt to build the schema, and automatic rebuilding fails.

Parameters:

force (bool) – Whether to force the rebuilding of the model schema, defaults to False.
raise_errors (bool) – Whether to raise errors, defaults to True.
_parent_namespace_depth (int) – The depth level of the parent namespace, defaults to 2.
_types_namespace (MappingNamespace | None) – The types namespace, defaults to None.

Returns:

Returns None if the schema is already “complete” and rebuilding was not required. If rebuilding was required, returns True if rebuilding was successful, otherwise False.

Return type:

bool | None

classmethod model_validate(obj, *, strict=None, extra=None, from_attributes=None, context=None, by_alias=None, by_name=None)

Validate a pydantic model instance.

Parameters:

obj (Any) – The object to validate.
strict (bool | None) – Whether to enforce types strictly.
extra (Literal['allow', 'ignore', 'forbid'] | None) – Whether to ignore, allow, or forbid extra data during model validation. See the extra configuration value (pydantic.ConfigDict.extra) for details.
from_attributes (bool | None) – Whether to extract data from object attributes.
context (Any | None) – Additional context to pass to the validator.
by_alias (bool | None) – Whether to use the field’s alias when validating against the provided input data.
by_name (bool | None) – Whether to use the field’s name when validating against the provided input data.

Raises:

ValidationError – If the object could not be validated.

Returns:

The validated model instance.

Return type:

Self
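
The strict and from_attributes parameters can be sketched as follows. CountStub and Row are hypothetical; n_rows is an invented field used to show type coercion.

```python
from typing import Optional

from pydantic import BaseModel, ValidationError


class CountStub(BaseModel):
    dataset_path: Optional[str] = None
    n_rows: int = 0


class Row:
    """Plain object standing in for something like an ORM row."""

    def __init__(self, dataset_path: str, n_rows: int) -> None:
        self.dataset_path = dataset_path
        self.n_rows = n_rows


# Default (lax) mode coerces the numeric string to an int.
from_dict = CountStub.model_validate({"dataset_path": "data/a.csv", "n_rows": "12"})

# from_attributes reads fields from object attributes instead of mapping keys.
from_obj = CountStub.model_validate(Row("data/b.csv", 7), from_attributes=True)

# Strict mode refuses the same string-to-int coercion.
try:
    CountStub.model_validate({"n_rows": "12"}, strict=True)
    strict_ok = True
except ValidationError:
    strict_ok = False

print(from_dict.n_rows, from_obj.dataset_path, strict_ok)
```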

classmethod model_validate_json(json_data, *, strict=None, extra=None, context=None, by_alias=None, by_name=None)

Usage Documentation: [JSON Parsing](../concepts/json.md#json-parsing)

Validate the given JSON data against the Pydantic model.

Parameters:

json_data (str | bytes | bytearray) – The JSON data to validate.
strict (bool | None) – Whether to enforce types strictly.
extra (Literal['allow', 'ignore', 'forbid'] | None) – Whether to ignore, allow, or forbid extra data during model validation. See the extra configuration value (pydantic.ConfigDict.extra) for details.
context (Any | None) – Extra variables to pass to the validator.
by_alias (bool | None) – Whether to use the field’s alias when validating against the provided input data.
by_name (bool | None) – Whether to use the field’s name when validating against the provided input data.

Returns:

The validated Pydantic model.

Raises:

ValidationError – If json_data is not a JSON string or the object could not be validated.

Return type:

Self

classmethod model_validate_strings(obj, *, strict=None, extra=None, context=None, by_alias=None, by_name=None)

Validate the given object with string data against the Pydantic model.

Parameters:

obj (Any) – The object containing string data to validate.
strict (bool | None) – Whether to enforce types strictly.
extra (Literal['allow', 'ignore', 'forbid'] | None) – Whether to ignore, allow, or forbid extra data during model validation. See the extra configuration value (pydantic.ConfigDict.extra) for details.
context (Any | None) – Extra variables to pass to the validator.
by_alias (bool | None) – Whether to use the field’s alias when validating against the provided input data.
by_name (bool | None) – Whether to use the field’s name when validating against the provided input data.

Returns:

The validated Pydantic model.

Return type:

Self
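
A sketch of validating an object whose values are all strings, as might come from query parameters or environment variables. RunStub and both of its fields are hypothetical.

```python
from datetime import date

from pydantic import BaseModel


class RunStub(BaseModel):
    attempts: int
    run_date: date


# Every value arrives as a string and is parsed into the annotated type.
run = RunStub.model_validate_strings({"attempts": "2", "run_date": "2024-01-02"})

print(run.attempts, type(run.attempts).__name__)
print(run.run_date, type(run.run_date).__name__)
```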

classmethod parse_file(path, *, content_type=None, encoding='utf8', proto=None, allow_pickle=False)

classmethod parse_obj(obj)

classmethod parse_raw(b, *, content_type=None, encoding='utf8', proto=None, allow_pickle=False)

classmethod schema(by_alias=True, ref_template=DEFAULT_REF_TEMPLATE)

classmethod schema_json(*, by_alias=True, ref_template=DEFAULT_REF_TEMPLATE, **dumps_kwargs)

classmethod update_forward_refs(**localns)

classmethod validate(value)

class causal_agent.components.input_parser.ExtractedPath(*, dataset_path=None)[source]

Bases: BaseModel

dataset_path: str | None
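
A model with a single optional field like this one is convenient for parsing small JSON payloads. The sketch below uses a hypothetical ExtractedPathStub that mirrors the documented field; the payloads and file name are made up for illustration.

```python
from typing import Optional

from pydantic import BaseModel, ValidationError


class ExtractedPathStub(BaseModel):
    # Mirrors the single documented field of ExtractedPath.
    dataset_path: Optional[str] = None


found = ExtractedPathStub.model_validate_json('{"dataset_path": "data/sales.csv"}')
absent = ExtractedPathStub.model_validate_json("{}")

print(found.dataset_path)
print(absent.dataset_path)

# Input that is not valid JSON raises ValidationError rather than a JSON error.
try:
    ExtractedPathStub.model_validate_json("not json")
except ValidationError as exc:
    error_type = exc.errors()[0]["type"]
    print(error_type)
```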

__copy__()

Returns a shallow copy of the model.

__deepcopy__(memo=None)

Returns a deep copy of the model.

classmethod __get_pydantic_json_schema__(core_schema, handler, /)

Hook into generating the model’s JSON schema.

Parameters:

core_schema (CoreSchema) – A pydantic-core CoreSchema. You can ignore this argument and call the handler with a new CoreSchema, wrap this CoreSchema ({'type': 'nullable', 'schema': current_schema}), or just call the handler with the original schema.
handler (GetJsonSchemaHandler) – Call into Pydantic’s internal JSON schema generation. This will raise a pydantic.errors.PydanticInvalidForJsonSchema if JSON schema generation fails. Since this gets called by BaseModel.model_json_schema you can override the schema_generator argument to that function to change JSON schema generation globally for a type.

Returns:

A JSON schema, as a Python object.

Return type:

JsonSchemaValue

__init__(**data)

Create a new model by parsing and validating input data from keyword arguments.

Raises ValidationError (pydantic_core.ValidationError) if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

__iter__()

So dict(model) works.

__pretty__(fmt, **kwargs)

Used by devtools (https://python-devtools.helpmanual.io/) to pretty print objects.

classmethod __pydantic_init_subclass__(**kwargs)

This is intended to behave just like __init_subclass__, but is called by ModelMetaclass only after basic class initialization is complete. In particular, attributes like model_fields will be present when this is called, but forward annotations are not guaranteed to be resolved yet, meaning that creating an instance of the class may fail.

This is necessary because __init_subclass__ will always be called by type.__new__, and it would require a prohibitively large refactor to the ModelMetaclass to ensure that type.__new__ was called in such a manner that the class would already be sufficiently initialized.

This will receive the same kwargs that would be passed to the standard __init_subclass__, namely, any kwargs passed to the class definition that aren’t used internally by Pydantic.

Parameters:

**kwargs (Any) – Any keyword arguments passed to the class definition that aren’t used internally by Pydantic.

Note

You may want to override __pydantic_on_complete__() instead, which is called once the class and its fields are fully initialized and ready for validation.

classmethod __pydantic_on_complete__()

This is called once the class and its fields are fully initialized and ready to be used.

This typically happens when the class is created (just before __pydantic_init_subclass__() is called on the superclass), except when forward annotations are used that could not immediately be resolved. In that case, it will be called later, when the model is rebuilt automatically or explicitly using model_rebuild().

__repr_name__()

Name of the instance’s class, used in __repr__.

__repr_recursion__(object)

Returns the string representation of a recursive object.

__rich_repr__()

Used by Rich (https://rich.readthedocs.io/en/stable/pretty.html) to pretty print objects.

classmethod construct(_fields_set=None, **values)

copy(*, include=None, exclude=None, update=None, deep=False)

Returns a copy of the model.

Warning: Deprecated. This method is now deprecated; use model_copy instead.

If you need include or exclude, use:

    data = self.model_dump(include=include, exclude=exclude, round_trip=True)
    data = {**data, **(update or {})}
    copied = self.model_validate(data)

Parameters:

include (AbstractSetIntStr | MappingIntStrAny | None) – Optional set or mapping specifying which fields to include in the copied model.
exclude (AbstractSetIntStr | MappingIntStrAny | None) – Optional set or mapping specifying which fields to exclude in the copied model.
update (Dict[str, Any] | None) – Optional dictionary of field-value pairs to override field values in the copied model.
deep (bool) – If True, the values of fields that are Pydantic models will be deep-copied.

Returns:

A copy of the model with included, excluded and updated fields as specified.

Return type:

Self

dict(*, include=None, exclude=None, by_alias=False, exclude_unset=False, exclude_defaults=False, exclude_none=False)

classmethod from_orm(obj)

json(*, include=None, exclude=None, by_alias=False, exclude_unset=False, exclude_defaults=False, exclude_none=False, encoder=PydanticUndefined, models_as_dict=PydanticUndefined, **dumps_kwargs)

model_computed_fields = {}

model_config = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

classmethod model_construct(_fields_set=None, **values)

Creates a new instance of the Model class with validated data.

Creates a new model setting __dict__ and __pydantic_fields_set__ from trusted or pre-validated data. Default values are respected, but no other validation is performed.

Note: model_construct() generally respects the model_config.extra setting on the provided model. That is, if model_config.extra == 'allow', then all extra passed values are added to the model instance’s __dict__ and __pydantic_extra__ fields. If model_config.extra == 'ignore' (the default), then all extra passed values are ignored. Because no validation is performed with a call to model_construct(), having model_config.extra == 'forbid' does not result in an error if extra values are passed, but they will be ignored.

Parameters:

_fields_set (set[str] | None) – A set of field names that were originally explicitly set during instantiation. If provided, this is directly used for the model_fields_set attribute (pydantic.BaseModel.model_fields_set). Otherwise, the field names from the values argument will be used.
values (Any) – Trusted or pre-validated data dictionary.

Returns:

A new instance of the Model class with validated data.

Return type:

Self

model_copy(*, update=None, deep=False)

Usage Documentation: [model_copy](../concepts/models.md#model-copy)

Returns a copy of the model.

Note: The underlying instance’s __dict__ attribute is copied. This might have unexpected side effects if you store anything in it, on top of the model fields (e.g. the value of cached properties, see functools.cached_property).

Parameters:

update (Mapping[str, Any] | None) – Values to change/add in the new model. Note: the data is not validated before creating the new model. You should trust this data.
deep (bool) – Set to True to make a deep copy of the model.

Returns:

New model instance.

Return type:

Self

model_dump(*, mode='python', include=None, exclude=None, context=None, by_alias=None, exclude_unset=False, exclude_defaults=False, exclude_none=False, exclude_computed_fields=False, round_trip=False, warnings=True, fallback=None, serialize_as_any=False, polymorphic_serialization=None)

Usage Documentation: [model_dump](../concepts/serialization.md#python-mode)

Generate a dictionary representation of the model, optionally specifying which fields to include or exclude.

Parameters:

mode (Literal['json', 'python'] | str) – The mode in which to_python should run. If mode is ‘json’, the output will only contain JSON serializable types. If mode is ‘python’, the output may contain non-JSON-serializable Python objects.
include (set[int] | set[str] | Mapping[int, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | Mapping[str, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | None) – A set of fields to include in the output.
exclude (set[int] | set[str] | Mapping[int, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | Mapping[str, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | None) – A set of fields to exclude from the output.
context (Any | None) – Additional context to pass to the serializer.
by_alias (bool | None) – Whether to use the field’s alias in the dictionary key if defined.
exclude_unset (bool) – Whether to exclude fields that have not been explicitly set.
exclude_defaults (bool) – Whether to exclude fields that are set to their default value.
exclude_none (bool) – Whether to exclude fields that have a value of None.
exclude_computed_fields (bool) – Whether to exclude computed fields. While this can be useful for round-tripping, it is usually recommended to use the dedicated round_trip parameter instead.
round_trip (bool) – If True, dumped values should be valid as input for non-idempotent types such as Json[T].
warnings (bool | Literal['none', 'warn', 'error']) – How to handle serialization errors. False/“none” ignores them, True/“warn” logs errors, and “error” raises a PydanticSerializationError (pydantic_core.PydanticSerializationError).
fallback (Callable[[Any], Any] | None) – A function to call when an unknown value is encountered. If not provided, a PydanticSerializationError (pydantic_core.PydanticSerializationError) is raised.
serialize_as_any (bool) – Whether to serialize fields with duck-typing serialization behavior.
polymorphic_serialization (bool | None) – Whether to use model and dataclass polymorphic serialization for this call.

Returns:

A dictionary representation of the model.

Return type:

dict[str, Any]

model_dump_json(*, indent=None, ensure_ascii=False, include=None, exclude=None, context=None, by_alias=None, exclude_unset=False, exclude_defaults=False, exclude_none=False, exclude_computed_fields=False, round_trip=False, warnings=True, fallback=None, serialize_as_any=False, polymorphic_serialization=None)

Usage documentation (Pydantic): [model_dump_json](../concepts/serialization.md#json-mode)

Generates a JSON representation of the model using Pydantic’s to_json method.

Parameters:

indent (int | None) – Indentation to use in the JSON output. If None is passed, the output will be compact.
ensure_ascii (bool) – If True, the output is guaranteed to have all incoming non-ASCII characters escaped. If False (the default), these characters will be output as-is.
include (set[int] | set[str] | Mapping[int, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | Mapping[str, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | None) – Field(s) to include in the JSON output.
exclude (set[int] | set[str] | Mapping[int, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | Mapping[str, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | None) – Field(s) to exclude from the JSON output.
context (Any | None) – Additional context to pass to the serializer.
by_alias (bool | None) – Whether to serialize using field aliases.
exclude_unset (bool) – Whether to exclude fields that have not been explicitly set.
exclude_defaults (bool) – Whether to exclude fields that are set to their default value.
exclude_none (bool) – Whether to exclude fields that have a value of None.
exclude_computed_fields (bool) – Whether to exclude computed fields. While this can be useful for round-tripping, it is usually recommended to use the dedicated round_trip parameter instead.
round_trip (bool) – If True, dumped values should be valid as input for non-idempotent types such as Json[T].
warnings (bool | Literal['none', 'warn', 'error']) – How to handle serialization errors. False/“none” ignores them, True/“warn” logs errors, and “error” raises a PydanticSerializationError (pydantic_core.PydanticSerializationError).
fallback (Callable[[Any], Any] | None) – A function to call when an unknown value is encountered. If not provided, a PydanticSerializationError (pydantic_core.PydanticSerializationError) is raised.
serialize_as_any (bool) – Whether to serialize fields with duck-typing serialization behavior.
polymorphic_serialization (bool | None) – Whether to use model and dataclass polymorphic serialization for this call.

Returns:

A JSON string representation of the model.

Return type:

str

property model_extra: dict[str, Any] | None

Get extra fields set during validation.

Returns:

A dictionary of extra fields, or None if config.extra is not set to “allow”.

model_fields = {'dataset_path': FieldInfo(annotation=Union[str, NoneType], required=False, default=None, description='File path or URL for the dataset mentioned in the query.')}

property model_fields_set: set[str]

Returns the set of fields that have been explicitly set on this model instance.

Returns:

A set of strings representing the fields that have been set, i.e. that were not filled from defaults.
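
As a rough illustration of how model_fields_set interacts with the exclude_unset option of model_dump, the sketch below uses DatasetPathModel, a hypothetical stand-in that only mirrors the single dataset_path field listed under model_fields. It is not the class documented here.

```python
from typing import Optional

from pydantic import BaseModel, Field


class DatasetPathModel(BaseModel):
    """Hypothetical stand-in with the same single field as the documented model."""

    dataset_path: Optional[str] = Field(
        default=None,
        description="File path or URL for the dataset mentioned in the query.",
    )


implicit = DatasetPathModel()                   # field filled from its default
explicit = DatasetPathModel(dataset_path=None)  # same value, but set explicitly

print(implicit.model_fields_set)  # set()
print(explicit.model_fields_set)  # {'dataset_path'}

# exclude_unset drops fields that are absent from model_fields_set.
print(implicit.model_dump(exclude_unset=True))  # {}
print(explicit.model_dump(exclude_unset=True))  # {'dataset_path': None}
```

The two instances hold equal values, yet only the explicitly constructed one reports dataset_path as set, which is why their exclude_unset dumps differ.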

classmethod model_json_schema(by_alias=True, ref_template=DEFAULT_REF_TEMPLATE, schema_generator=GenerateJsonSchema, mode='validation', *, union_format='any_of')

Generates a JSON schema for a model class.

Parameters:

by_alias (bool) – Whether to use attribute aliases or not.
ref_template (str) – The reference template.
union_format (Literal['any_of', 'primitive_type_array']) – The format to use when combining schemas from unions together. Can be one of:
- 'any_of': Use the [anyOf](https://json-schema.org/understanding-json-schema/reference/combining#anyOf) keyword to combine schemas (the default).
- 'primitive_type_array': Use the [type](https://json-schema.org/understanding-json-schema/reference/type) keyword as an array of strings, containing each type of the combination. If any of the schemas is not a primitive type (string, boolean, null, integer or number) or contains constraints/metadata, falls back to any_of.
schema_generator (type[GenerateJsonSchema]) – To override the logic used to generate the JSON schema, as a subclass of GenerateJsonSchema with your desired modifications
mode (Literal['validation', 'serialization']) – The mode in which to generate the schema.

Returns:

The JSON schema for the given model class.

Return type:

dict[str, Any]

classmethod model_parametrized_name(params)

Compute the class name for parametrizations of generic classes.

This method can be overridden to achieve a custom naming scheme for generic BaseModels.

Parameters:

params (tuple[type[Any], ...]) – Tuple of types of the class. Given a generic class Model with 2 type variables and a concrete model Model[str, int], the value (str, int) would be passed to params.

Returns:

String representing the new class where params are passed to cls as type variables.

Raises:

TypeError – Raised when trying to generate concrete names for non-generic models.

Return type:

str

model_post_init(context, /)

Override this method to perform additional initialization after __init__ and model_construct. This is useful if you want to do some validation that requires the entire model to be initialized.

classmethod model_rebuild(*, force=False, raise_errors=True, _parent_namespace_depth=2, _types_namespace=None)

Try to rebuild the pydantic-core schema for the model.

This may be necessary when one of the annotations is a ForwardRef which could not be resolved during the initial attempt to build the schema, and automatic rebuilding fails.

Parameters:

force (bool) – Whether to force the rebuilding of the model schema, defaults to False.
raise_errors (bool) – Whether to raise errors, defaults to True.
_parent_namespace_depth (int) – The depth level of the parent namespace, defaults to 2.
_types_namespace (MappingNamespace | None) – The types namespace, defaults to None.

Returns:

Returns None if the schema is already “complete” and rebuilding was not required. If rebuilding _was_ required, returns True if rebuilding was successful, otherwise False.

Return type:

bool | None

classmethod model_validate(obj, *, strict=None, extra=None, from_attributes=None, context=None, by_alias=None, by_name=None)

Validate a pydantic model instance.

Parameters:

obj (Any) – The object to validate.
strict (bool | None) – Whether to enforce types strictly.
extra (Literal['allow', 'ignore', 'forbid'] | None) – Whether to ignore, allow, or forbid extra data during model validation. See the extra configuration value (pydantic.ConfigDict.extra) for details.
from_attributes (bool | None) – Whether to extract data from object attributes.
context (Any | None) – Additional context to pass to the validator.
by_alias (bool | None) – Whether to use the field’s alias when validating against the provided input data.
by_name (bool | None) – Whether to use the field’s name when validating against the provided input data.

Raises:

ValidationError – If the object could not be validated.

Returns:

The validated model instance.

Return type:

Self

classmethod model_validate_json(json_data, *, strict=None, extra=None, context=None, by_alias=None, by_name=None)

Usage documentation (Pydantic): [JSON Parsing](../concepts/json.md#json-parsing)

Validate the given JSON data against the Pydantic model.

Parameters:

json_data (str | bytes | bytearray) – The JSON data to validate.
strict (bool | None) – Whether to enforce types strictly.
extra (Literal['allow', 'ignore', 'forbid'] | None) – Whether to ignore, allow, or forbid extra data during model validation. See the extra configuration value (pydantic.ConfigDict.extra) for details.
context (Any | None) – Extra variables to pass to the validator.
by_alias (bool | None) – Whether to use the field’s alias when validating against the provided input data.
by_name (bool | None) – Whether to use the field’s name when validating against the provided input data.

Returns:

The validated Pydantic model.

Raises:

ValidationError – If json_data is not a JSON string or the object could not be validated.

Return type:

Self
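
A minimal sketch of JSON validation with model_validate_json follows. DatasetPathModel is a hypothetical stand-in that mirrors only the dataset_path field listed under model_fields, and both the try_parse helper and the example path are made up for illustration.

```python
from typing import Optional

from pydantic import BaseModel, ValidationError


class DatasetPathModel(BaseModel):
    """Hypothetical stand-in with the same single field as the documented model."""

    dataset_path: Optional[str] = None


# Well-formed JSON whose value matches the field type validates cleanly.
parsed = DatasetPathModel.model_validate_json('{"dataset_path": "data/lalonde.csv"}')
print(parsed.dataset_path)  # data/lalonde.csv


def try_parse(raw: str) -> Optional[DatasetPathModel]:
    """Hypothetical helper: return the validated model, or None on failure."""
    try:
        return DatasetPathModel.model_validate_json(raw)
    except ValidationError:
        # Raised for malformed JSON as well as for values of the wrong type.
        return None
```

Catching ValidationError in one place covers both failure modes named in the Raises entry above: input that is not valid JSON, and JSON that does not fit the model.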

classmethod model_validate_strings(obj, *, strict=None, extra=None, context=None, by_alias=None, by_name=None)

Validate the given object with string data against the Pydantic model.

Parameters:

obj (Any) – The object containing string data to validate.
strict (bool | None) – Whether to enforce types strictly.
extra (Literal['allow', 'ignore', 'forbid'] | None) – Whether to ignore, allow, or forbid extra data during model validation. See the extra configuration value (pydantic.ConfigDict.extra) for details.
context (Any | None) – Extra variables to pass to the validator.
by_alias (bool | None) – Whether to use the field’s alias when validating against the provided input data.
by_name (bool | None) – Whether to use the field’s name when validating against the provided input data.

Returns:

The validated Pydantic model.

Return type:

Self

classmethod parse_file(path, *, content_type=None, encoding='utf8', proto=None, allow_pickle=False)

classmethod parse_obj(obj)

classmethod parse_raw(b, *, content_type=None, encoding='utf8', proto=None, allow_pickle=False)

classmethod schema(by_alias=True, ref_template=DEFAULT_REF_TEMPLATE)

classmethod schema_json(*, by_alias=True, ref_template=DEFAULT_REF_TEMPLATE, **dumps_kwargs)

classmethod update_forward_refs(**localns)

classmethod validate(value)

causal_agent.components.input_parser.extract_dataset_path(query, llm=None)[source]

Extract dataset path from the query using regex patterns, with LLM fallback.

Parameters:

query (str) – The user’s causal question text
llm (langchain_core.language_models.BaseChatModel | None) – The shared LLM client instance for fallback.

Returns:

String with dataset path or None if not found

Return type:

str | None

causal_agent.components.input_parser.parse_input(query, dataset_path_arg=None, dataset_info=None, llm=None)[source]

Parse the user’s causal query using LLM and regex.

Parameters:

query (str) – The user’s causal question text.
dataset_path_arg (str | None) – Path to dataset if provided directly as an argument.
dataset_info (Dict | None) – Dictionary with dataset context (columns, types, etc.).
llm (langchain_core.language_models.BaseChatModel | None) – The shared LLM client instance.

Returns:

Dict containing parsed query information.

Return type:

causal_agent.components.input_parser.extract_dataset_path_regex(query)[source]

Extract dataset path from the query using regex patterns.

Parameters:

query (str) – The user’s causal question text

Returns:

String with dataset path or None if not found

Return type:

str | None
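
The patterns this function actually uses are not listed here. As a sketch of the general idea only, the example below searches a query for a whitespace-free token ending in a common data-file extension. The function name, the pattern, and the extension list are all assumptions.

```python
import re
from typing import Optional

# Hypothetical pattern: an optionally quoted token with no whitespace that
# ends in one of a few common data-file extensions.
_PATH_PATTERN = re.compile(
    r"""['"]?([^\s'"]+\.(?:csv|tsv|parquet|xlsx?|json))['"]?""",
    re.IGNORECASE,
)


def extract_path_sketch(query: str) -> Optional[str]:
    """Return the first path-like token in the query, or None if there is none."""
    match = _PATH_PATTERN.search(query)
    return match.group(1) if match else None


print(extract_path_sketch("Does training raise wages in data/lalonde.csv?"))
print(extract_path_sketch("Does smoking cause cancer?"))
```

A pattern like this cannot handle paths that contain spaces, which is one reason an LLM fallback such as the one in extract_dataset_path can be useful.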

causal_agent.components.method_validator module

Method validator component for causal inference methods.

This module validates the selected causal inference method against dataset characteristics and available variables.

causal_agent.components.method_validator.validate_method(method_info, dataset_analysis, variables)[source]

Validate the selected causal method against dataset characteristics.

Parameters:

method_info (Dict[str, Any]) – Information about the selected method from decision_tree
dataset_analysis (Dict[str, Any]) – Dataset analysis results from dataset_analyzer
variables (Dict[str, Any]) – Identified variables from query_interpreter

Returns:

valid: Boolean indicating if method is valid
concerns: List of concerns/issues with the selected method
alternative_suggestions: Alternative methods if the selected method is problematic
recommended_method: Updated method recommendation if issues are found

Return type:

Dict with validation results
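
To make the shape of the returned dictionary concrete, here is a minimal sketch that fills the four documented keys. The checks and the input keys it reads (selected_method, alternatives, treatment_variable, instrument_variable) are assumptions for illustration. Only the columns key of the dataset analysis and the four result keys come from this documentation.

```python
from typing import Any, Dict, List


def validate_method_sketch(
    method_info: Dict[str, Any],
    dataset_analysis: Dict[str, Any],
    variables: Dict[str, Any],
) -> Dict[str, Any]:
    """Return a result dict with the four documented keys (checks are illustrative)."""
    method = method_info.get("selected_method")                      # hypothetical key
    alternatives: List[str] = list(method_info.get("alternatives", []))  # hypothetical key
    concerns: List[str] = []

    # Illustrative check: the treatment must be an actual dataset column.
    treatment = variables.get("treatment_variable")                  # hypothetical key
    if treatment not in dataset_analysis.get("columns", []):
        concerns.append(f"Treatment variable {treatment!r} is not a dataset column.")

    # Illustrative check: an instrumental-variable analysis needs an instrument.
    if method == "instrumental_variable" and not variables.get("instrument_variable"):
        concerns.append("No instrument variable was identified.")

    valid = not concerns
    return {
        "valid": valid,
        "concerns": concerns,
        "alternative_suggestions": alternatives,
        # Fall back to the first alternative only when the method has concerns.
        "recommended_method": method if valid or not alternatives else alternatives[0],
    }
```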

causal_agent.components.method_validator.validate_propensity_score_matching(validation_result, dataset_analysis, variables)[source]

Validate propensity score matching method requirements.

Parameters:

validation_result (Dict[str, Any]) – Current validation result to update
dataset_analysis (Dict[str, Any]) – Dataset analysis results
variables (Dict[str, Any]) – Identified variables

causal_agent.components.method_validator.validate_regression_adjustment(validation_result, dataset_analysis, variables)[source]

Validate regression adjustment method requirements.

Parameters:

validation_result (Dict[str, Any]) – Current validation result to update
dataset_analysis (Dict[str, Any]) – Dataset analysis results
variables (Dict[str, Any]) – Identified variables

causal_agent.components.method_validator.validate_instrumental_variable(validation_result, dataset_analysis, variables)[source]

Validate instrumental variable method requirements.

Parameters:

validation_result (Dict[str, Any]) – Current validation result to update
dataset_analysis (Dict[str, Any]) – Dataset analysis results
variables (Dict[str, Any]) – Identified variables

causal_agent.components.method_validator.validate_difference_in_differences(validation_result, dataset_analysis, variables)[source]

Validate difference-in-differences method requirements.

Parameters:

validation_result (Dict[str, Any]) – Current validation result to update
dataset_analysis (Dict[str, Any]) – Dataset analysis results
variables (Dict[str, Any]) – Identified variables

causal_agent.components.method_validator.validate_regression_discontinuity(validation_result, dataset_analysis, variables)[source]

Validate regression discontinuity method requirements.

Parameters:

validation_result (Dict[str, Any]) – Current validation result to update
dataset_analysis (Dict[str, Any]) – Dataset analysis results
variables (Dict[str, Any]) – Identified variables

causal_agent.components.method_validator.validate_backdoor_adjustment(validation_result, dataset_analysis, variables)[source]

Validate backdoor adjustment method requirements.

Parameters:

validation_result (Dict[str, Any]) – Current validation result to update
dataset_analysis (Dict[str, Any]) – Dataset analysis results
variables (Dict[str, Any]) – Identified variables

causal_agent.components.method_validator.recommend_alternative(method, concerns, alternatives)[source]

Recommend an alternative method if the current one has issues.

Parameters:

method (str) – Current method
concerns (List[str]) – List of concerns with the current method
alternatives (List[str]) – List of alternative methods suggested by the decision tree

Returns:

String with the recommended method

Return type:

str
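
The selection rule is not spelled out in this reference. One plausible reading, shown purely as an assumption, keeps the current method when there are no concerns and otherwise returns the first alternative that differs from it.

```python
from typing import List


def recommend_alternative_sketch(
    method: str, concerns: List[str], alternatives: List[str]
) -> str:
    """Keep the current method unless it has concerns and another option exists."""
    if not concerns:
        return method
    for candidate in alternatives:
        if candidate != method:
            return candidate
    # No usable alternative: fall back to the current method.
    return method
```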

causal_agent.components.output_formatter module

Output formatter component for causal inference results.

This module formats the results of causal analysis into a clear, structured output for presentation to the user.

causal_agent.components.output_formatter.format_output(query, method, results, explanation, dataset_analysis=None, dataset_description=None)[source]

Format final results including numerical estimates and explanations.

Parameters:

query (str) – Original user query
method (str) – Causal inference method used (string name)
results (Dict[str, Any]) – Numerical results from method_executor_tool
explanation (Dict[str, Any]) – Structured explanation object from explainer_tool
dataset_analysis (Dict[str, Any] | None) – Optional dictionary of dataset analysis results
dataset_description (str | None) – Optional string description of the dataset

Returns:

Dict with formatted output fields ready for presentation.

Return type:

FormattedOutput

causal_agent.components.query_interpreter module

Query interpreter component for causal inference.

This module provides functionality to match query concepts to actual dataset variables, identifying treatment, outcome, and covariate variables for causal inference analysis.

causal_agent.components.query_interpreter.infer_treatment_variable_type(treatment_variable, column_categories, dataset_analysis)[source]

Determine the treatment variable type from its column category and unique value count.

Parameters:

treatment_variable – Name of the treatment variable.
column_categories – Mapping of column names to their categories.
dataset_analysis – Exploratory analysis results.

Returns:

Type of the treatment variable (e.g., “binary”, “continuous”, etc.).

Return type:

str
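
As a sketch of how a category label and a unique-value count could be combined, consider the example below. The category names, the num_unique_values layout, the threshold of 10, and every return label other than “binary” and “continuous” are assumptions, not the function's actual rules.

```python
from typing import Any, Dict


def infer_treatment_type_sketch(
    treatment_variable: str,
    column_categories: Dict[str, str],
    dataset_analysis: Dict[str, Any],
) -> str:
    """Classify the treatment column from its category and unique-value count."""
    category = column_categories.get(treatment_variable, "")
    # Hypothetical layout: unique-value counts keyed by column name.
    n_unique = dataset_analysis.get("num_unique_values", {}).get(treatment_variable)

    if n_unique == 2 or category == "binary":
        return "binary"
    if category.startswith("continuous"):
        return "continuous"
    if n_unique is not None and n_unique <= 10:  # arbitrary illustrative cutoff
        return "discrete_multi_value"
    return "other"
```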

causal_agent.components.query_interpreter.determine_treatment_reference_level(is_rct, llm, treatment_variable, query_text, dataset_description, file_path, columns)[source]

Determine the treatment reference level.

causal_agent.components.query_interpreter.identify_interaction_term(llm, treatment_variable, covariates, column_categories, query_text, dataset_description)[source]

Identify the interaction term based on the query and the dataset information.

causal_agent.components.query_interpreter.interpret_query(query_info, dataset_analysis, dataset_description=None)[source]

Interpret query using hybrid heuristic/LLM approach to identify variables.

Parameters:

query_info (Dict[str, Any]) – Information extracted from the user’s query (text, hints).
dataset_analysis (Dict[str, Any]) – Information about the dataset structure (columns, types, etc.).
dataset_description (str | None) – Optional textual description of the dataset.
llm – Optional language model instance.

Returns:

Dict containing identified variables (treatment, outcome, covariates, etc., and is_rct).

Return type:

causal_agent.components.query_interpreter.compute_smd(df, treat, covars_list)[source]

Compute the standardized mean differences (SMD) for the treatment variable.

Parameters:

df (pd.DataFrame) – The dataset.
treat (str) – Name of the binary treatment column (0/1).
covars_list (List[str]) – List of covariate names to consider for SMD calculation.

Returns:

The standardized mean difference (SMD).

Return type:

Dict[str, float]
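
For reference, a common definition of the SMD for a covariate is the difference between the treated and control means divided by the square root of the average of the two group variances. The sketch below applies that definition with the standard library only. It takes a plain dict of column lists instead of a pd.DataFrame to stay dependency-free, and whether the real function uses this exact variant (for example signed versus absolute values) is not stated here.

```python
from math import sqrt
from statistics import mean, variance
from typing import Dict, List


def compute_smd_sketch(
    data: Dict[str, List[float]], treat: str, covars_list: List[str]
) -> Dict[str, float]:
    """Return the signed SMD of each covariate between treated (1) and control (0)."""
    treated_mask = [t == 1 for t in data[treat]]
    smd: Dict[str, float] = {}
    for covar in covars_list:
        treated = [x for x, is_t in zip(data[covar], treated_mask) if is_t]
        control = [x for x, is_t in zip(data[covar], treated_mask) if not is_t]
        # Pooled SD: square root of the mean of the two sample variances.
        pooled_sd = sqrt((variance(treated) + variance(control)) / 2)
        diff = mean(treated) - mean(control)
        # Guard against constant covariates, where the pooled SD is zero.
        smd[covar] = diff / pooled_sd if pooled_sd > 0 else 0.0
    return smd


toy = {
    "treated": [1, 1, 1, 0, 0, 0],
    "age": [30, 32, 34, 40, 42, 44],
    "educ": [12, 12, 12, 12, 12, 12],
}
print(compute_smd_sketch(toy, "treated", ["age", "educ"]))  # {'age': -5.0, 'educ': 0.0}
```

In the toy data both age groups have a sample variance of 4, so the pooled SD is 2 and the mean gap of -10 gives an SMD of -5.0. Each group needs at least two rows, because the sample variance is undefined for a single observation.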

causal_agent.components.state_manager module

State management utilities for the causal_agent workflow.

This module provides utility functions to create standardized state updates for passing between tools in the causal_agent workflow.

causal_agent.components.state_manager.create_workflow_state_update(current_step, step_completed_flag, next_tool, next_step_reason, error=None)[source]

Create a standardized workflow state update dictionary.

Parameters:

current_step (str) – Current step in the workflow (e.g., “input_processing”)
step_completed_flag (bool) – Flag indicating which step was completed (e.g., “query_parsed”)
next_tool (str) – Name of the next tool to call
next_step_reason (str) – Reason message for the next step
error (str | None) – Optional error message if the step failed

Returns:

Dictionary containing the workflow_state sub-dictionary

Return type: