NaiveIVRegressor
- class pgmpy.prediction.NaiveIVRegressor.NaiveIVRegressor(causal_graph, stage1_estimator: Any | None = None, stage2_estimator: Any | None = None)
Bases: _BaseCausalPrediction
Implements a naive instrumental variable (IV) regressor (single exposure, multiple instruments).
This estimator implements a simple two-stage least squares (2SLS) style procedure for the case of a single exposure and a single outcome with one or more instrumental variables. The first stage fits exposure ~ instrument(s) using stage1_estimator. The second stage fits outcome ~ predicted_exposure (+ pretreatment covariates, if any) using stage2_estimator.
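The two stages described above can be sketched with plain NumPy least squares. This is a minimal illustration under stated assumptions, not the pgmpy implementation: the data-generating process, the `ols` helper, and all variable names are hypothetical and chosen only to show why regressing on the predicted exposure helps when an unobserved confounder is present.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Hypothetical data-generating process (not from pgmpy): u is an unobserved
# confounder of x and y; the two columns of z are valid instruments for x.
z = rng.normal(size=(n, 2))
u = rng.normal(size=n)
x = 0.8 * z[:, 0] + 0.8 * z[:, 1] + u + rng.normal(size=n)
y = 0.3 * x + u + rng.normal(size=n)


def ols(features, target):
    """Least-squares coefficients with the intercept in position 0."""
    design = np.column_stack([np.ones(len(target)), features])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef


# Stage 1: exposure ~ instruments, then compute the predicted exposure.
stage1_coef = ols(z, x)
x_hat = stage1_coef[0] + z @ stage1_coef[1:]

# Stage 2: outcome ~ predicted exposure.
beta_iv = ols(x_hat, y)[1]

# Direct regression of outcome on exposure, for comparison only.
beta_ols = ols(x, y)[1]

print(f"IV estimate:  {beta_iv:.3f}")
print(f"OLS estimate: {beta_ols:.3f}")
```

In this simulated setup the true effect of the exposure is 0.3. The two-stage estimate should land close to that value, while the direct regression is pulled upward by the confounder.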
- Parameters:
- causal_graph : DAG, PDAG, ADMG, MAG, or PAG
Causal graph with defined variable roles.
- stage1_estimator : sklearn regressor, optional (default = LinearRegression())
Estimator for the stage 1 regression of the exposure on the instrument(s).
- stage2_estimator : sklearn regressor, optional (default = LinearRegression())
Estimator for the stage 2 regression of the outcome on the predicted exposure and pretreatment covariates (if any).
- Attributes:
- exposure_var_ : str
Name of the exposure variable (single).
- outcome_var_ : str
Name of the outcome variable (single).
- instrument_vars_ : list of str
Names of instrument variables extracted from the causal graph.
- pretreatment_vars_ : list of str
Names of pretreatment covariates extracted from the causal graph.
- feature_columns_fit_ : list of str
Names of features used during fit.
- feature_columns_predict_ : list of str
Names of features used during predict.
- stage1_est_ : estimator
Fitted first-stage estimator.
- stage2_est_ : estimator
Fitted second-stage estimator.
- coef_ : array-like
Coefficients from the fitted stage2_estimator (if available).
References
[1] “Instrumental Variables Estimation.” Wikipedia: https://en.wikipedia.org/wiki/Instrumental_variables_estimation
Examples
>>> # Example 1: Basic usage with LinearRegression estimators
>>> import pandas as pd
>>> from pgmpy.base import DAG
>>> from sklearn.linear_model import LinearRegression
>>> from pgmpy.prediction import NaiveIVRegressor
>>>
>>> # Simulate data from a linear Gaussian Bayesian network
>>> lgbn = DAG.from_dagitty(
...     "dag { Z1 -> X [beta=0.2] Z2 -> X [beta=0.2] X -> Y [beta=0.3] }"
... )
>>> data = lgbn.simulate(1000, seed=42)  # returns a pandas DataFrame
>>> df = data.loc[:, ["X", "Z1", "Z2"]]
>>> df = (df - df.mean(axis=0)) / df.std(axis=0)
>>> y = data["Y"]
>>> G = DAG(
...     lgbn.edges(),
...     roles={"exposures": "X", "instrument": ("Z1", "Z2"), "outcomes": "Y"},
... )
>>>
>>> model = NaiveIVRegressor(
...     causal_graph=G,
...     stage1_estimator=LinearRegression(),
...     stage2_estimator=LinearRegression(),
... )
>>> # Fit the model and make predictions
>>> _ = model.fit(df, y)
>>> preds = model.predict(df)
>>> preds.shape[0]
1000
>>> # Example 2: Usage with multiple instruments and pretreatment
>>> import pandas as pd
>>> from pgmpy.base import DAG
>>> from sklearn.linear_model import LinearRegression
>>> from pgmpy.prediction import NaiveIVRegressor
>>>
>>> # Simulate data from a linear Gaussian Bayesian network
>>> lgbn = DAG.from_dagitty(
...     "dag { U1 -> X [beta=0.3] U2 -> X [beta=0.2] U3 -> X [beta=0.1] "
...     "U4 -> X [beta=0.2] X -> Y [beta=0.6] P -> Y [beta=0.2] }"
... )
>>> data = lgbn.simulate(300, seed=42)
>>> df = data.loc[:, ["X", "U1", "U2", "U3", "P"]]
>>>
>>> dag = DAG(
...     ebunch=[
...         ("U1", "X"),
...         ("U2", "X"),
...         ("U3", "X"),
...         ("U4", "X"),
...         ("X", "Y"),
...         ("P", "Y"),
...     ],
...     roles={
...         "exposures": "X",
...         "instrument": ("U1", "U2", "U3"),
...         "outcomes": "Y",
...         "pretreatment": ["P"],
...     },
... )
>>> model = NaiveIVRegressor(
...     causal_graph=dag,
... )
>>>
>>> # Fit the model and make predictions
>>> _ = model.fit(df, data["Y"])
>>> preds = model.predict(df)
>>> preds.shape[0]
300
>>> # Example 3: Usage with custom estimators and numpy array inputs
>>> import pandas as pd
>>> import numpy as np
>>> from pgmpy.base import DAG
>>> from sklearn.linear_model import LinearRegression
>>> from sklearn.ensemble import RandomForestRegressor
>>> from pgmpy.prediction import NaiveIVRegressor
>>>
>>> dag = DAG(
...     ebunch=[(1, 0), (0, 2)],
...     roles={"exposures": [0], "outcomes": [2], "instrument": [1]},
... )
>>> model = NaiveIVRegressor(
...     causal_graph=dag,
...     stage1_estimator=RandomForestRegressor(),
...     stage2_estimator=LinearRegression(),
... )
>>>
>>> # Simulate some random data
>>> n_samples = 50
>>> X_array = np.random.normal(0, 1, (n_samples, 2))
>>> y_array = np.random.normal(0, 1, n_samples)
>>>
>>> # Fit the model and make predictions
>>> _ = model.fit(X_array, y_array)
>>> preds = model.predict(X_array)
>>> preds.shape[0]
50
- fit(X, y, sample_weight: Any | None = None)
Fit the two-stage regression defined by the causal graph. The stage 1 estimator is fitted first to predict the exposure variable from the instrument(s). The stage 2 estimator is then fitted to predict the outcome variable from the predicted exposure and the pretreatment variables (if any).
- Parameters:
- X : pandas.DataFrame or numpy.ndarray
Feature data containing exposure, instrument, and pretreatment variables.
- y : pandas.Series, pandas.DataFrame, or numpy.ndarray
Outcome variable.
- sample_weight : array-like, optional
Sample weights for fitting the estimators.
- Returns:
- self : object
Fitted estimator.
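As a rough illustration of the fit procedure with pretreatment covariates and sample weights, the sketch below implements both stages with weighted least squares in NumPy. It is not the pgmpy code: the helpers `weighted_lstsq` and `two_stage_fit` are hypothetical, and applying the same weights to both stages is an assumption made for the example. The data is deliberately noise-free so the mechanics are easy to check.

```python
import numpy as np


def weighted_lstsq(features, target, sample_weight=None):
    """Weighted least squares with an intercept; returns (intercept, coefs)."""
    target = np.asarray(target, dtype=float)
    features = np.asarray(features, dtype=float).reshape(len(target), -1)
    design = np.column_stack([np.ones(len(target)), features])
    if sample_weight is None:
        sample_weight = np.ones(len(target))
    # Scaling rows by sqrt(w) turns weighted least squares into ordinary
    # least squares on the rescaled problem.
    root_w = np.sqrt(np.asarray(sample_weight, dtype=float))
    coef, *_ = np.linalg.lstsq(design * root_w[:, None], target * root_w, rcond=None)
    return coef[0], coef[1:]


def two_stage_fit(exposure, instruments, pretreatment, outcome, sample_weight=None):
    """Hypothetical helper: returns the stage 2 (intercept, coefs)."""
    instruments = np.asarray(instruments, dtype=float).reshape(len(exposure), -1)
    # Stage 1: exposure ~ instruments (assumption: weights are used here too).
    b0, b = weighted_lstsq(instruments, exposure, sample_weight)
    exposure_hat = b0 + instruments @ b
    # Stage 2: outcome ~ predicted exposure + pretreatment covariates.
    stage2_features = np.column_stack([exposure_hat, pretreatment])
    return weighted_lstsq(stage2_features, outcome, sample_weight)


# Noise-free toy data, used only to check the mechanics.
rng = np.random.default_rng(1)
n = 200
z = rng.normal(size=(n, 2))               # two instruments
p = rng.normal(size=n)                    # one pretreatment covariate
x = 1.0 + 0.5 * z[:, 0] - 0.25 * z[:, 1]  # exposure
y = 2.0 + 0.6 * x + 0.2 * p               # outcome
w = rng.uniform(0.5, 1.5, size=n)         # arbitrary positive weights

intercept, coefs = two_stage_fit(x, z, p, y, sample_weight=w)
print(intercept, coefs)
```

Because the toy data has no noise, the stage 2 coefficients should match the generating values (0.6 for the predicted exposure, 0.2 for the covariate) regardless of the weights. With real data the weights change the estimates.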
- set_fit_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → NaiveIVRegressor
Configure whether metadata should be requested to be passed to the fit method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to fit.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
- sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for the sample_weight parameter in fit.
- Returns:
- self : object
The updated object.
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → NaiveIVRegressor
Configure whether metadata should be requested to be passed to the score method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to score.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
- sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for the sample_weight parameter in score.
- Returns:
- self : object
The updated object.