Linear Gaussian Bayesian Network¶
- class pgmpy.models.LinearGaussianBayesianNetwork.LinearGaussianBayesianNetwork(ebunch: List[Tuple[Hashable, Hashable]] | None = None, latents: Set[Hashable] = {}, lavaan_str: str | None = None, dagitty_str: str | None = None)[source]¶
A Linear Gaussian Bayesian Network is a Bayesian Network whose variables are all continuous, and whose CPDs are linear Gaussians.
An important result is that Linear Gaussian Bayesian Networks are an alternative representation for the class of multivariate Gaussian distributions.
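For instance, the three-node chain used in the method examples below is equivalent to a 3-dimensional multivariate Gaussian whose parameters can be recovered with to_joint_gaussian() (a minimal sketch; see that method below for the exact mean and covariance):
>>> from pgmpy.models import LinearGaussianBayesianNetwork
>>> from pgmpy.factors.continuous import LinearGaussianCPD
>>> model = LinearGaussianBayesianNetwork([("x1", "x2"), ("x2", "x3")])
>>> model.add_cpds(
...     LinearGaussianCPD("x1", [1], 4),
...     LinearGaussianCPD("x2", [-5, 0.5], 4, ["x1"]),
...     LinearGaussianCPD("x3", [4, -1], 3, ["x2"]),
... )
>>> mean, cov = model.to_joint_gaussian()  # parameters of the equivalent multivariate Gaussian
>>> mean.shape, cov.shape
((3,), (3, 3))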
- add_cpds(*cpds: LinearGaussianCPD) None[source]¶
Add Linear Gaussian CPDs (Conditional Probability Distributions) to the Bayesian Network.
- Parameters:
cpds (instances of LinearGaussianCPD) – LinearGaussianCPDs which will be associated with the model.
Examples
>>> from pgmpy.models import LinearGaussianBayesianNetwork
>>> from pgmpy.factors.continuous import LinearGaussianCPD
>>> model = LinearGaussianBayesianNetwork([("x1", "x2"), ("x2", "x3")])
>>> cpd1 = LinearGaussianCPD("x1", [1], 4)
>>> cpd2 = LinearGaussianCPD("x2", [-5, 0.5], 4, ["x1"])
>>> cpd3 = LinearGaussianCPD("x3", [4, -1], 3, ["x2"])
>>> model.add_cpds(cpd1, cpd2, cpd3)
>>> for cpd in model.cpds:
...     print(cpd)
...
P(x1) = N(1; 4)
P(x2 | x1) = N(-5 + 0.5*x1; 4)
P(x3 | x2) = N(4 + -1*x2; 3)
- check_model() bool[source]¶
Checks the model for structural/parameter consistency.
Currently checks:
- Each CPD’s listed parents match the graph’s parents.
- Returns:
True if all checks pass; raises ValueError otherwise.
- Return type:
bool
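Examples
A minimal sketch, reusing the CPD setup from add_cpds above; the True output assumes all checks pass as described.
>>> from pgmpy.models import LinearGaussianBayesianNetwork
>>> from pgmpy.factors.continuous import LinearGaussianCPD
>>> model = LinearGaussianBayesianNetwork([("x1", "x2")])
>>> model.add_cpds(
...     LinearGaussianCPD("x1", [1], 4),
...     LinearGaussianCPD("x2", [-5, 0.5], 4, ["x1"]),
... )
>>> model.check_model()  # raises ValueError if a CPD's parents do not match the graph
True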
- copy()[source]¶
Returns a copy of the model.
- Returns:
Model’s copy – Copy of the model on which the method was called.
- Return type:
pgmpy.models.LinearGaussianBayesianNetwork
Examples
>>> from pgmpy.models import LinearGaussianBayesianNetwork
>>> from pgmpy.factors.continuous import LinearGaussianCPD
>>> model = LinearGaussianBayesianNetwork([("A", "B"), ("B", "C")])
>>> cpd_a = LinearGaussianCPD(variable="A", beta=[1], std=4)
>>> cpd_b = LinearGaussianCPD(variable="B", beta=[-5, 0.5], std=4, evidence=["A"])
>>> cpd_c = LinearGaussianCPD(variable="C", beta=[4, -1], std=3, evidence=["B"])
>>> model.add_cpds(cpd_a, cpd_b, cpd_c)
>>> copy_model = model.copy()
>>> copy_model.nodes()
NodeView(('A', 'B', 'C'))
>>> copy_model.edges()
OutEdgeView([('A', 'B'), ('B', 'C')])
>>> len(copy_model.get_cpds())
3
- fit(data: DataFrame, estimator: str = 'mle', std_estimator: str = 'unbiased') LinearGaussianBayesianNetwork[source]¶
Estimates (fits) the Linear Gaussian CPDs from data.
- Parameters:
data (pd.DataFrame) – A pandas DataFrame with the data to which to fit the model. All variables must be continuous valued.
estimator (str (default: 'mle')) – The estimator to use for estimating the parameters. Currently, MLE via OLS is the only supported method.
std_estimator (str (default: 'unbiased')) – Whether to use the maximum likelihood estimate ('mle') or the unbiased estimate ('unbiased') for the standard deviation. If 'mle', ddof=0 is used while calculating the standard deviation; if 'unbiased', ddof = 1 + number of parents.
- Returns:
self – The fitted model. The estimated LinearGaussianCPDs are added to the model and can be accessed using model.cpds.
Examples
>>> import numpy as np
>>> import pandas as pd
>>> from pgmpy.models import LinearGaussianBayesianNetwork
>>> df = pd.DataFrame(
...     np.random.normal(0, 1, (100, 3)), columns=["x1", "x2", "x3"]
... )
>>> model = LinearGaussianBayesianNetwork([("x1", "x2"), ("x2", "x3")])
>>> model.fit(df)
>>> model.cpds
[<LinearGaussianCPD: P(x1) = N(-0.114; 0.911) at 0x7eb77d30cec0,
 <LinearGaussianCPD: P(x2 | x1) = N(0.07*x1 + -0.075; 1.172) at 0x7eb77171fb60,
 ...]
- get_cpds(node: Hashable | None = None) LinearGaussianCPD | List[LinearGaussianCPD][source]¶
Returns the CPD of the specified node. If node is not specified, returns all CPDs that have been added so far to the graph.
- Parameters:
node (any hashable python object (optional)) – The node whose CPD we want. If node is not specified, returns all the CPDs added to the model.
- Returns:
A CPD or list of Linear Gaussian CPDs.
- Return type:
list[LinearGaussianCPD] or LinearGaussianCPD
Examples
>>> from pgmpy.models import LinearGaussianBayesianNetwork
>>> from pgmpy.factors.continuous import LinearGaussianCPD
>>> model = LinearGaussianBayesianNetwork([("x1", "x2"), ("x2", "x3")])
>>> cpd1 = LinearGaussianCPD("x1", [1], 4)
>>> cpd2 = LinearGaussianCPD("x2", [-5, 0.5], 4, ["x1"])
>>> cpd3 = LinearGaussianCPD("x3", [4, -1], 3, ["x2"])
>>> model.add_cpds(cpd1, cpd2, cpd3)
>>> model.get_cpds()
[P(x1) = N(1; 4), P(x2 | x1) = N(-5 + 0.5*x1; 4), P(x3 | x2) = N(4 + -1*x2; 3)]
- static get_random(n_nodes: int = 5, edge_prob: float = 0.5, node_names: List | None = None, latents: bool = False, loc: float = 0, scale: float = 1, seed: int | None = None) LinearGaussianBayesianNetwork[source]¶
Returns a randomly generated Linear Gaussian Bayesian Network on n_nodes variables with an edge probability of edge_prob between variables.
- Parameters:
n_nodes (int (default: 5)) – The number of nodes in the randomly generated DAG.
edge_prob (float (default: 0.5)) – The probability of an edge between any two nodes in the topologically sorted DAG.
node_names (list (default: None)) – A list of variables names to use in the random graph. If None, the node names are integer values starting from 0.
latents (bool (default: False)) – If True, includes latent variables in the generated network.
loc (float) – The mean of the normal distribution from which the coefficients are sampled.
scale (float) – The standard deviation of the normal distribution from which the coefficients are sampled.
seed (int) – The seed for the random number generator.
- Returns:
The randomly generated model.
- Return type:
pgmpy.models.LinearGaussianBayesianNetwork
Examples
>>> from pgmpy.models import LinearGaussianBayesianNetwork
>>> model = LinearGaussianBayesianNetwork.get_random(n_nodes=5)
>>> model.nodes()
NodeView((0, 3, 1, 2, 4))
>>> model.edges()
OutEdgeView([(0, 3), (3, 4), (1, 3), (2, 4)])
>>> model.cpds
[<LinearGaussianCPD: P(0) = N(1.764; 1.613) at 0x2732f41aae0,
 <LinearGaussianCPD: P(3 | 0, 1) = N(-0.721*0 + -0.079*1 + 0.943; 0.12) at 0x2732f16db20,
 <LinearGaussianCPD: P(1) = N(-0.534; 0.208) at 0x2732f320b30,
 <LinearGaussianCPD: P(2) = N(-0.023; 0.166) at 0x2732d8d5f40,
 <LinearGaussianCPD: P(4 | 2, 3) = N(-0.24*2 + -0.907*3 + 0.625; 0.48) at 0x2737fecdaf0]
- get_random_cpds(loc: float = 0, scale: float = 1, inplace: bool = False, seed: int | None = None) None | List[LinearGaussianCPD][source]¶
Generates random Linear Gaussian CPDs for the model. The coefficients are sampled from a normal distribution with mean loc and standard deviation scale.
- Parameters:
loc (float) – Mean of the normal from which coefficients are sampled.
scale (float) – Std dev of the normal from which coefficients are sampled.
inplace (bool (default: False)) – If True, adds the generated LinearGaussianCPDs to the model; otherwise returns them.
seed (int (optional)) – Seed for the random number generator.
Examples
>>> from pgmpy.models import LinearGaussianBayesianNetwork
>>> model = LinearGaussianBayesianNetwork([("x1", "x2"), ("x2", "x3")])
>>> model.get_random_cpds(loc=0, scale=1, seed=42)
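With inplace=True the generated CPDs are attached to the model instead of being returned (a sketch; the sampled coefficients depend on the seed):
>>> model.get_random_cpds(loc=0, scale=1, seed=42, inplace=True)
>>> len(model.get_cpds())  # one CPD per node of the x1 -> x2 -> x3 chain
3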
- is_imap(JPD: Any) None[source]¶
For now, the is_imap method has not been implemented for LinearGaussianBayesianNetwork.
- log_likelihood(data: DataFrame) float[source]¶
Computes the log-likelihood of the given dataset under the current Linear Gaussian Bayesian Network.
- Parameters:
data (pandas.DataFrame) – Observations for all variables (columns must match model variables).
- Returns:
Total log-likelihood of the data under the model.
- Return type:
float
Examples
>>> import numpy as np
>>> import pandas as pd
>>> from pgmpy.models import LinearGaussianBayesianNetwork
>>> from pgmpy.factors.continuous import LinearGaussianCPD
>>> model = LinearGaussianBayesianNetwork([("x1", "x2"), ("x2", "x3")])
>>> cpd1 = LinearGaussianCPD("x1", [1], 4)
>>> cpd2 = LinearGaussianCPD("x2", [-5, 0.5], 4, ["x1"])
>>> cpd3 = LinearGaussianCPD("x3", [4, -1], 3, ["x2"])
>>> model.add_cpds(cpd1, cpd2, cpd3)
>>> df = pd.DataFrame(
...     np.random.normal(0, 1, size=(100, 3)), columns=["x1", "x2", "x3"]
... )
>>> model.log_likelihood(df)
-1128.66
- predict(data: DataFrame, distribution: str = 'joint') Tuple[List[str], ndarray, ndarray][source]¶
Predicts the conditional distribution of the missing variables (i.e. missing columns) in the given dataset and returns its mean and covariance.
- Parameters:
data (pandas.DataFrame) – A DataFrame in which a subset of the model variables is observed; the missing columns are the ones to predict.
- Returns:
variables (list) – The list of missing variables on which the returned conditional distribution is defined; the order matches mu and cov.
mu (np.array) – The mean array of the conditional joint distribution over the missing variables, corresponding to each row of data.
cov (np.array) – The covariance of the conditional joint distribution over the missing variables.
Examples
>>> from pgmpy.utils import get_example_model
>>> model = get_example_model("ecoli70")
>>> df = model.simulate(n_samples=5)
>>> # Drop a column that we want to predict (avoid inplace=True so the return value is kept).
>>> df = df.drop(columns=["folK"])
>>> variables, mu, cov = model.predict(df)
- remove_cpds(*cpds: LinearGaussianCPD) None[source]¶
Removes the CPDs provided in the arguments.
- Parameters:
*cpds (LinearGaussianCPD) – LinearGaussianCPD objects (or their variable names) to remove.
Examples
>>> from pgmpy.models import LinearGaussianBayesianNetwork
>>> from pgmpy.factors.continuous import LinearGaussianCPD
>>> model = LinearGaussianBayesianNetwork([("x1", "x2"), ("x2", "x3")])
>>> cpd1 = LinearGaussianCPD("x1", [1], 4)
>>> cpd2 = LinearGaussianCPD("x2", [-5, 0.5], 4, ["x1"])
>>> cpd3 = LinearGaussianCPD("x3", [4, -1], 3, ["x2"])
>>> model.add_cpds(cpd1, cpd2, cpd3)
>>> for cpd in model.get_cpds():
...     print(cpd)
...
P(x1) = N(1; 4)
P(x2 | x1) = N(-5 + 0.5*x1; 4)
P(x3 | x2) = N(4 + -1*x2; 3)
>>> model.remove_cpds(cpd2, cpd3)
>>> for cpd in model.get_cpds():
...     print(cpd)
...
P(x1) = N(1; 4)
- simulate(n_samples: int = 1000, do: Dict[str, float] | None = None, evidence: Dict[str, float] | None = None, virtual_intervention: List[LinearGaussianCPD] | None = None, include_latents: bool = False, seed: int | None = None) DataFrame[source]¶
Simulates data from the model.
- Parameters:
n_samples (int) – The number of samples to draw from the model.
do (dict (default: None)) – The interventions to apply to the model. dict should be of the form {variable_name: value}
evidence (dict (default: None)) – Observed evidence to apply to the model. dict should be of the form {variable_name: value}
virtual_intervention (list) – Also known as soft intervention. virtual_intervention should be a list of pgmpy.factors.continuous.LinearGaussianCPD objects specifying the virtual/soft intervention distributions.
include_latents (boolean) – Whether to include the latent variable values in the generated samples.
seed (int (default: None)) – Seed for the random number generator.
- Returns:
pandas.DataFrame – A pandas DataFrame with the generated samples.
Examples
>>> from pgmpy.models import LinearGaussianBayesianNetwork
>>> from pgmpy.factors.continuous import LinearGaussianCPD
>>> model = LinearGaussianBayesianNetwork([("x1", "x2"), ("x2", "x3")])
>>> cpd1 = LinearGaussianCPD("x1", [1], 4)
>>> cpd2 = LinearGaussianCPD("x2", [-5, 0.5], 4, ["x1"])
>>> cpd3 = LinearGaussianCPD("x3", [4, -1], 3, ["x2"])
>>> model.add_cpds(cpd1, cpd2, cpd3)
Simple forward sampling
>>> model.simulate(n_samples=3, seed=42)
Sampling with intervention (do)
>>> model.simulate(n_samples=3, seed=42, do={"x2": 0.0})
Sampling with evidence
>>> model.simulate(n_samples=3, seed=42, evidence={"x1": 2.0})
Sampling with both intervention and evidence
>>> model.simulate(n_samples=3, seed=42, do={"x2": 1.0}, evidence={"x1": 0.0})
- to_joint_gaussian() Tuple[ndarray, ndarray][source]¶
Linear Gaussian Bayesian Networks can be represented using a joint Gaussian distribution over all the variables. This method gives the mean and covariance of this equivalent joint Gaussian distribution.
- Returns:
mean, cov – The mean vector and the covariance matrix of the joint Gaussian distribution.
- Return type:
np.ndarray, np.ndarray
Examples
>>> from pgmpy.models import LinearGaussianBayesianNetwork
>>> from pgmpy.factors.continuous import LinearGaussianCPD
>>> model = LinearGaussianBayesianNetwork([("x1", "x2"), ("x2", "x3")])
>>> cpd1 = LinearGaussianCPD("x1", [1], 4)
>>> cpd2 = LinearGaussianCPD("x2", [-5, 0.5], 4, ["x1"])
>>> cpd3 = LinearGaussianCPD("x3", [4, -1], 3, ["x2"])
>>> model.add_cpds(cpd1, cpd2, cpd3)
>>> mean, cov = model.to_joint_gaussian()
>>> mean
array([ 1. , -4.5,  8.5])
>>> cov
array([[ 16.,   8.,  -8.],
       [  8.,  20., -20.],
       [ -8., -20.,  29.]])