NaiveAdjustmentRegressor#

class pgmpy.prediction.NaiveAdjustmentRegressor.NaiveAdjustmentRegressor(causal_graph, estimator: BaseEstimator | None = None)[source]#

Bases: _BaseCausalPrediction

Naive adjustment regressor using causal graph roles for feature selection.

This estimator concatenates exposure, adjustment, and pretreatment variables as features to predict the outcome variable using standard ML algorithms. It’s “naive” because it uses a simple prediction model with the adjustment set and doesn’t employ sophisticated causal inference methods like double ML, inverse propensity weighting, or other advanced causal estimation techniques.

Parameters:
causal_graph : DAG, PDAG, ADMG, MAG, or PAG

Causal graph with defined variable roles. Must have exactly one exposure and one outcome variable. The adjustment role is optional (it may be missing, empty, or contain variables).

estimator : sklearn estimator, optional (default=LinearRegression())

Base estimator for prediction.

Attributes:
`estimator_` : sklearn estimator

The fitted base estimator.

`feature_names_in_` : ndarray of shape (n_features,)

Names of features seen during fit.

`n_features_in_` : int

Number of features seen during fit.

`exposure_var_` : str

Name of exposure variable extracted from causal graph.

`adjustment_vars_` : list

List of adjustment variable names extracted from causal graph.

`pretreatment_vars_` : list

List of pretreatment variable names extracted from causal graph.

`outcome_var_` : str

Name of outcome variable extracted from causal graph.

`feature_columns_fit_` : list

List of feature column names used (exposure + adjustment + pretreatment).

`explanation_` : str

Formatted description of the fitted model.

Examples

Basic usage with a simple causal DAG:

>>> import numpy as np
>>> import pandas as pd
>>> from pgmpy.base import DAG
>>> from pgmpy.prediction import NaiveAdjustmentRegressor
>>> from sklearn.linear_model import LinearRegression
>>>
>>> # Create a simple causal DAG: Z -> X, Z -> Y, X -> Y
>>> # where Z is a confounder, X is exposure, Y is outcome
>>> dag = DAG(
...     ebunch=[("Z", "X"), ("Z", "Y"), ("X", "Y")],
...     roles={"exposures": "X", "outcomes": "Y", "adjustment": ["Z"]},
... )
>>>
>>> # Generate some synthetic data
>>> np.random.seed(42)
>>> n = 100
>>> Z = np.random.normal(0, 1, n)
>>> X = 0.5 * Z + np.random.normal(0, 0.5, n)
>>> Y = 2.0 * X + 1.5 * Z + np.random.normal(0, 0.3, n)
>>>
>>> data = pd.DataFrame({"X": X, "Y": Y, "Z": Z})
>>>
>>> # Fit the regressor
>>> regressor = NaiveAdjustmentRegressor(causal_graph=dag)
>>> _ = regressor.fit(data[["X", "Z"]], data["Y"])
>>>
>>> # Make predictions
>>> predictions = regressor.predict(data[["X", "Z"]])
>>> print(f"Predictions shape: {predictions.shape}")
Predictions shape: (100,)

Using a custom estimator:

>>> from sklearn.ensemble import RandomForestRegressor
>>>
>>> # Use Random Forest as the base estimator
>>> rf_regressor = NaiveAdjustmentRegressor(
...     causal_graph=dag,
...     estimator=RandomForestRegressor(n_estimators=10, random_state=42),
... )
>>> _ = rf_regressor.fit(data[["X", "Z"]], data["Y"])

Example with pretreatment variables:

>>> # Create DAG with pretreatment variable P -> Y
>>> dag_with_pretreatment = DAG(
...     ebunch=[("P", "Y"), ("Z", "X"), ("Z", "Y"), ("X", "Y")],
...     roles={
...         "exposures": "X",
...         "outcomes": "Y",
...         "adjustment": ["Z"],
...         "pretreatment": ["P"],
...     },
... )
>>>
>>> # Generate data with proper relationships using simulate
>>> lgbn_with_P = DAG.from_dagitty(
...     "dag { P -> Y [beta=0.8] Z -> X [beta=0.5] X -> Y [beta=2.0] Z -> Y [beta=1.5] }"
... )
>>> data_with_P = lgbn_with_P.simulate(100, seed=42)
>>>
>>> regressor_with_P = NaiveAdjustmentRegressor(causal_graph=dag_with_pretreatment)
>>> _ = regressor_with_P.fit(data_with_P[["X", "Z", "P"]], data_with_P["Y"])
fit(X, y, sample_weight: ndarray | None = None)[source]#

Fit the Naive Adjustment Regressor.

Parameters:
X : array-like or DataFrame of shape (n_samples, n_features)

Training data. Column names must exactly match variable names in the causal graph:

  • If DataFrame: column names must match the DAG variable names exactly.

  • If numpy array: it will be converted to a DataFrame with columns [0, 1, 2, …], so the DAG should use integer variable names.

y : array-like of shape (n_samples,)

Target values (outcome variable).

sample_weight : array-like of shape (n_samples,), optional

Sample weights for training.

Returns:
self : object

Returns self for method chaining.
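The role-based feature selection that fit performs can be sketched with plain pandas and scikit-learn. This is a conceptual sketch, not the pgmpy implementation: the role variables below stand in for what the regressor extracts from the causal graph, and the data is synthetic.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Roles as they would be read off the causal graph; these plain
# variables are illustrative stand-ins, not the pgmpy API.
exposure_var = "X"
adjustment_vars = ["Z"]
pretreatment_vars = []

# Synthetic confounded data: Z -> X, Z -> Y, X -> Y.
rng = np.random.default_rng(0)
n = 200
Z = rng.normal(size=n)
X = 0.5 * Z + rng.normal(scale=0.5, size=n)
Y = 2.0 * X + 1.5 * Z + rng.normal(scale=0.3, size=n)
data = pd.DataFrame({"X": X, "Z": Z, "Y": Y})

# fit() concatenates the role-selected columns and fits the base estimator.
feature_columns = [exposure_var] + adjustment_vars + pretreatment_vars
est = LinearRegression().fit(data[feature_columns], data["Y"])
# With the confounder adjusted for, the coefficients recover
# roughly 2.0 (for X) and 1.5 (for Z).
```

Because the adjustment variable Z is included as a feature, the coefficient on the exposure is close to its true causal value; dropping Z from `feature_columns` would bias it upward.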

get_feature_names_out(input_features=None)[source]#

Get output feature names for transformation.

predict(X)[source]#

Make predictions using the fitted regressor.

Parameters:
X : array-like or DataFrame of shape (n_samples, n_features)

Input data. Column names must exactly match variable names in the causal graph:

  • If DataFrame: column names must match the DAG variable names exactly.

  • If numpy array: it will be converted to a DataFrame with columns [0, 1, 2, …], so the DAG should use integer variable names.

Returns:
predictions : ndarray of shape (n_samples,)

Predicted values.
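Because predictions come back as one value per row, a fitted model can be used to contrast predictions under different exposure settings. A sketch, with a plain scikit-learn estimator standing in for the fitted regressor and illustrative variable names:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic confounded data, mirroring the class-level example.
rng = np.random.default_rng(1)
n = 300
Z = rng.normal(size=n)
X = 0.5 * Z + rng.normal(scale=0.5, size=n)
Y = 2.0 * X + 1.5 * Z + rng.normal(scale=0.3, size=n)
features = pd.DataFrame({"X": X, "Z": Z})

model = LinearRegression().fit(features, Y)

# Predictions have shape (n_samples,), so predicting under two
# exposure settings yields a per-row contrast.
do_x1 = features.assign(X=1.0)
do_x0 = features.assign(X=0.0)
effect = model.predict(do_x1) - model.predict(do_x0)
# For a linear base estimator the contrast is constant, roughly 2.0 per row.
```

With a nonlinear base estimator (e.g. a random forest), the same contrast would vary by row rather than being constant.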

set_fit_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → NaiveAdjustmentRegressor#

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for sample_weight parameter in fit.

Returns:
self : object

The updated object.

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → NaiveAdjustmentRegressor#

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for sample_weight parameter in score.

Returns:
self : object

The updated object.