NaiveAdjustmentRegressor#
- class pgmpy.prediction.NaiveAdjustmentRegressor.NaiveAdjustmentRegressor(causal_graph, estimator: BaseEstimator | None = None)[source]#
Bases: `_BaseCausalPrediction`

Naive adjustment regressor using causal graph roles for feature selection.
This estimator concatenates exposure, adjustment, and pretreatment variables as features to predict the outcome variable using standard ML algorithms. It’s “naive” because it uses a simple prediction model with the adjustment set and doesn’t employ sophisticated causal inference methods like double ML, inverse propensity weighting, or other advanced causal estimation techniques.
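In other words, the "naive" strategy is just an ordinary regression on the exposure plus the adjustment set. The sketch below illustrates the idea with plain scikit-learn on synthetic data (it does not use pgmpy's API; variable names and coefficients are illustrative assumptions):

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic data: Z confounds the X -> Y relationship.
rng = np.random.default_rng(0)
Z = rng.normal(size=500)
X = 0.5 * Z + rng.normal(scale=0.5, size=500)
Y = 2.0 * X + 1.5 * Z + rng.normal(scale=0.3, size=500)
data = pd.DataFrame({"X": X, "Y": Y, "Z": Z})

# "Naive adjustment": concatenate the exposure (X) and the adjustment
# variable (Z) as features and fit a standard regression on them.
model = LinearRegression().fit(data[["X", "Z"]], data["Y"])

# With Z included as a feature, the coefficient on X is close to the
# true effect (2.0); regressing Y on X alone would absorb Z's
# confounding into the slope.
print(dict(zip(["X", "Z"], model.coef_.round(2))))
```

This is exactly the feature-concatenation step the class automates from the graph's roles; no weighting or orthogonalization is applied.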
- Parameters:
- causal_graph : DAG, PDAG, ADMG, MAG, or PAG
Causal graph with defined variable roles. Must have exactly one exposure and one outcome variable. The adjustment role is optional (it may be missing, empty, or contain variables).
- estimator : sklearn estimator, optional (default=LinearRegression())
Base estimator for prediction.
- Attributes:
- `estimator_` : sklearn estimator
The fitted base estimator.
- `feature_names_in_` : ndarray of shape (n_features,)
Names of features seen during fit.
- `n_features_in_` : int
Number of features seen during fit.
- `exposure_var_` : str
Name of the exposure variable extracted from the causal graph.
- `adjustment_vars_` : list
List of adjustment variable names extracted from the causal graph.
- `pretreatment_vars_` : list
List of pretreatment variable names extracted from the causal graph.
- `outcome_var_` : str
Name of the outcome variable extracted from the causal graph.
- `feature_columns_fit_` : list
List of feature column names used (exposure + adjustment + pretreatment).
- `explanation_` : str
Formatted description of the fitted model.
Examples
Basic usage with a simple causal DAG:
>>> import numpy as np
>>> import pandas as pd
>>> from pgmpy.base import DAG
>>> from pgmpy.prediction import NaiveAdjustmentRegressor
>>> from sklearn.linear_model import LinearRegression
>>>
>>> # Create a simple causal DAG: Z -> X, Z -> Y, X -> Y
>>> # where Z is a confounder, X is exposure, Y is outcome
>>> dag = DAG(
...     ebunch=[("Z", "X"), ("Z", "Y"), ("X", "Y")],
...     roles={"exposures": "X", "outcomes": "Y", "adjustment": ["Z"]},
... )
>>>
>>> # Generate some synthetic data
>>> np.random.seed(42)
>>> n = 100
>>> Z = np.random.normal(0, 1, n)
>>> X = 0.5 * Z + np.random.normal(0, 0.5, n)
>>> Y = 2.0 * X + 1.5 * Z + np.random.normal(0, 0.3, n)
>>>
>>> data = pd.DataFrame({"X": X, "Y": Y, "Z": Z})
>>>
>>> # Fit the regressor
>>> regressor = NaiveAdjustmentRegressor(causal_graph=dag)
>>> _ = regressor.fit(data[["X", "Z"]], data["Y"])
>>>
>>> # Make predictions
>>> predictions = regressor.predict(data[["X", "Z"]])
>>> print(f"Predictions shape: {predictions.shape}")
Predictions shape: (100,)
Using a custom estimator:
>>> from sklearn.ensemble import RandomForestRegressor
>>>
>>> # Use Random Forest as the base estimator
>>> rf_regressor = NaiveAdjustmentRegressor(
...     causal_graph=dag,
...     estimator=RandomForestRegressor(n_estimators=10, random_state=42),
... )
>>> _ = rf_regressor.fit(data[["X", "Z"]], data["Y"])
Example with pretreatment variables:
>>> # Create DAG with pretreatment variable P -> Y
>>> dag_with_pretreatment = DAG(
...     ebunch=[("P", "Y"), ("Z", "X"), ("Z", "Y"), ("X", "Y")],
...     roles={
...         "exposures": "X",
...         "outcomes": "Y",
...         "adjustment": ["Z"],
...         "pretreatment": ["P"],
...     },
... )
>>>
>>> # Generate data with proper relationships using simulate
>>> lgbn_with_P = DAG.from_dagitty(
...     "dag { P -> Y [beta=0.8] Z -> X [beta=0.5] X -> Y [beta=2.0] Z -> Y [beta=1.5] }"
... )
>>> data_with_P = lgbn_with_P.simulate(100, seed=42)
>>>
>>> regressor_with_P = NaiveAdjustmentRegressor(causal_graph=dag_with_pretreatment)
>>> _ = regressor_with_P.fit(data_with_P[["X", "Z", "P"]], data_with_P["Y"])
- fit(X, y, sample_weight: ndarray | None = None)[source]#
Fit the Naive Adjustment Regressor.
- Parameters:
- X : array-like or DataFrame of shape (n_samples, n_features)
Training data. Column names must exactly match variable names in the causal graph.
  - If DataFrame: column names must match the DAG variable names exactly.
  - If numpy array: it will be converted to a DataFrame with columns [0, 1, 2, …], so the DAG should use integer variable names.
- y : array-like of shape (n_samples,)
Target values (outcome variable).
- sample_weight : array-like of shape (n_samples,), optional
Sample weights for training.
- Returns:
- self : object
Returns self for method chaining.
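When starting from a raw numpy array, it is usually easier to name the columns after the graph's variables than to use integer variable names in the DAG. A minimal sketch of that wrapping step, using plain pandas (the column names "X" and "Z" are illustrative; the wrapping itself does not depend on pgmpy):

```python
import numpy as np
import pandas as pd

# A raw feature matrix whose columns correspond to the graph's
# exposure ("X") and adjustment ("Z") variables, in that order.
arr = np.random.default_rng(42).normal(size=(100, 2))

# Wrap it in a DataFrame whose column names match the causal graph,
# instead of relying on the implicit integer columns [0, 1, ...].
X_df = pd.DataFrame(arr, columns=["X", "Z"])
print(list(X_df.columns))  # ['X', 'Z']
```

The resulting DataFrame can then be passed to fit and predict with column names that line up with the graph's roles.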
- predict(X)[source]#
Make predictions using the fitted regressor.
- Parameters:
- X : array-like or DataFrame of shape (n_samples, n_features)
Input data. Column names must exactly match variable names in the causal graph.
  - If DataFrame: column names must match the DAG variable names exactly.
  - If numpy array: it will be converted to a DataFrame with columns [0, 1, 2, …], so the DAG should use integer variable names.
- Returns:
- predictions : ndarray of shape (n_samples,)
Predicted values.
- set_fit_request(*, sample_weight: bool | None | str = '$UNCHANGED$') NaiveAdjustmentRegressor#
Configure whether metadata should be requested to be passed to the fit method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to fit.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
- sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for the sample_weight parameter in fit.
- Returns:
- self : object
The updated object.
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') NaiveAdjustmentRegressor#
Configure whether metadata should be requested to be passed to the score method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to score.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
- sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for the sample_weight parameter in score.
- Returns:
- self : object
The updated object.