Bayesian Network
- class pgmpy.models.BayesianNetwork.BayesianNetwork(ebunch=None, latents={})
Initializes a Bayesian Network. A model stores nodes and edges with conditional probability distributions (CPDs) and other attributes.
Models hold directed edges. Self loops and multiple (parallel) edges are not allowed.
Nodes can be any hashable python object.
Edges are represented as links between nodes.
- Parameters:
ebunch (input graph) – Data to initialize graph. If ebunch=None (default) an empty graph is created. The ebunch can be an edge list, or any NetworkX graph object.
latents (list, array-like) – List of variables which are latent (i.e. unobserved) in the model.
Examples
Create an empty Bayesian Network with no nodes and no edges.
>>> from pgmpy.models import BayesianNetwork
>>> G = BayesianNetwork()
G can be grown in several ways.
Nodes:
Add one node at a time:
>>> G.add_node('a')
Add the nodes from any container (a list, set or tuple or the nodes from another graph).
>>> G.add_nodes_from(['a', 'b'])
Edges:
G can also be grown by adding edges.
Add one edge,
>>> G.add_edge('a', 'b')
a list of edges,
>>> G.add_edges_from([('a', 'b'), ('b', 'c')])
If some edges connect nodes not yet in the model, the nodes are added automatically. There are no errors when adding nodes or edges that already exist.
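The growth rules above (endpoints added automatically, re-adding nodes or edges is harmless, no self loops or parallel edges) can be illustrated with a small pure-Python sketch. `TinyDAG` is a hypothetical class, not pgmpy's implementation; rejecting cycles is an assumption based on Bayesian networks being directed acyclic graphs.

```python
# Hypothetical minimal sketch of the graph-growth rules; not pgmpy code.
class TinyDAG:
    """Directed graph that auto-adds nodes and rejects self loops and cycles."""

    def __init__(self):
        self.nodes = set()
        self.edges = set()  # a set cannot hold parallel (duplicate) edges

    def add_node(self, node):
        self.nodes.add(node)  # adding an existing node is a no-op

    def add_nodes_from(self, nodes):
        for node in nodes:
            self.add_node(node)

    def add_edge(self, u, v):
        if u == v:
            raise ValueError("Self loops are not allowed")
        # Assumption: a cycle would appear if u is already reachable from v.
        if self._reachable(v, u):
            raise ValueError(f"Adding {u}->{v} would create a cycle")
        self.nodes.update((u, v))  # endpoints are added automatically
        self.edges.add((u, v))     # re-adding an existing edge is a no-op

    def add_edges_from(self, ebunch):
        for u, v in ebunch:
            self.add_edge(u, v)

    def _reachable(self, start, target):
        # Depth-first search along directed edges.
        stack, seen = [start], set()
        while stack:
            node = stack.pop()
            if node == target:
                return True
            if node not in seen:
                seen.add(node)
                stack.extend(v for (u, v) in self.edges if u == node)
        return False

    def __contains__(self, node):
        return node in self.nodes

    def __len__(self):
        return len(self.nodes)


G = TinyDAG()
G.add_node('a')
G.add_edges_from([('a', 'b'), ('b', 'c')])
G.add_edge('a', 'b')  # duplicate edge: silently ignored
print('a' in G, len(G))  # True 3
```

Storing edges in a set is what makes duplicate edges disappear, mirroring the "no errors when adding edges that already exist" behaviour described above.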
Shortcuts:
Many common graph features allow Python syntax to speed reporting.
>>> 'a' in G  # check if node in graph
True
>>> len(G)  # number of nodes in graph
3
- add_cpds(*cpds)
Add CPD (Conditional Probability Distribution) to the Bayesian Model.
- Parameters:
cpds (list, set, tuple (array-like)) – List of CPDs which will be associated with the model
Examples
>>> from pgmpy.models import BayesianNetwork
>>> from pgmpy.factors.discrete.CPD import TabularCPD
>>> student = BayesianNetwork([('diff', 'grades'), ('aptitude', 'grades')])
>>> grades_cpd = TabularCPD('grades', 3, [[0.1,0.1,0.1,0.1,0.1,0.1],
...                                       [0.1,0.1,0.1,0.1,0.1,0.1],
...                                       [0.8,0.8,0.8,0.8,0.8,0.8]],
...                         evidence=['diff', 'aptitude'], evidence_card=[2, 3],
...                         state_names={'grades': ['gradeA', 'gradeB', 'gradeC'],
...                                      'diff': ['easy', 'hard'],
...                                      'aptitude': ['low', 'medium', 'high']})
>>> student.add_cpds(grades_cpd)
diff     | easy                 | hard
aptitude | low    medium  high  | low    medium  high
gradeA   | 0.1    0.1     0.1   | 0.1    0.1     0.1
gradeB   | 0.1    0.1     0.1   | 0.1    0.1     0.1
gradeC   | 0.8    0.8     0.8   | 0.8    0.8     0.8
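In the grades CPD, there is one column for each combination of evidence states, with `diff` varying slowest and `aptitude` fastest. The plain-Python sketch below (not pgmpy code; `column_index` is a hypothetical helper) reproduces that column layout, so you can find which column of the `values` rows holds a given conditional distribution.

```python
from itertools import product

# Evidence order and state names copied from the grades_cpd example.
evidence = ['diff', 'aptitude']
state_names = {
    'diff': ['easy', 'hard'],
    'aptitude': ['low', 'medium', 'high'],
}

# One column per combination of evidence states; product() varies the
# last variable ('aptitude') fastest.
columns = list(product(*(state_names[var] for var in evidence)))

# Rows are the states of 'grades' (gradeA, gradeB, gradeC).
values = [
    [0.1, 0.1, 0.1, 0.1, 0.1, 0.1],
    [0.1, 0.1, 0.1, 0.1, 0.1, 0.1],
    [0.8, 0.8, 0.8, 0.8, 0.8, 0.8],
]


def column_index(assignment):
    """Column holding P(grades | assignment) for a {variable: state} dict."""
    return columns.index(tuple(assignment[var] for var in evidence))


col = column_index({'diff': 'hard', 'aptitude': 'low'})
print(col, columns[col])             # 3 ('hard', 'low')
print([row[col] for row in values])  # [0.1, 0.1, 0.8]
```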
- add_edge(u, v, **kwargs)
Add an edge between u and v.
The nodes u and v will be automatically added if they are not already in the graph.
- Parameters:
u (nodes) – Nodes can be any hashable python object.
v (nodes) – Nodes can be any hashable python object.
Examples
>>> from pgmpy.models import BayesianNetwork
>>> G = BayesianNetwork()
>>> G.add_nodes_from(['grade', 'intel'])
>>> G.add_edge('grade', 'intel')
- check_model()
Check the model for various errors. This method checks for the following errors.
Checks if the sum of the probabilities for each state is equal to 1 (tol=0.01).
Checks if the CPDs associated with nodes are consistent with their parents.
- Returns:
check – True if all the checks pass; otherwise an error is raised.
- Return type:
boolean
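The two checks listed above can be sketched in plain Python over a hypothetical representation in which each CPD is stored as a list of columns, one distribution over the node's states per parent configuration. `check_cpds` is a hypothetical helper, not pgmpy's implementation. The numbers come from `cpd_a` and `cpd_b` in the copy() example.

```python
# Hypothetical representation (not pgmpy's): each CPD is stored as
# (evidence, columns), where every column is a distribution over the
# node's states for one combination of parent states.
parents_of = {'A': [], 'B': ['A']}
cpds = {
    'A': ([], [[0.2, 0.8]]),
    'B': (['A'], [[0.3, 0.7], [0.7, 0.3]]),
}


def check_cpds(parents_of, cpds, tol=0.01):
    """Raise ValueError on an inconsistent CPD; return True otherwise."""
    for node, (evidence, columns) in cpds.items():
        # The CPD's evidence variables must be exactly the node's parents.
        if set(evidence) != set(parents_of.get(node, [])):
            raise ValueError(f"CPD of {node} does not match its parents")
        for column in columns:
            # Each conditional distribution must sum to 1 within tol.
            if abs(sum(column) - 1.0) > tol:
                raise ValueError(f"CPD of {node} does not sum to 1")
    return True


print(check_cpds(parents_of, cpds))  # True
```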
- copy()
Returns a copy of the model.
- Returns:
Model’s copy – Copy of the model on which the method was called.
- Return type:
pgmpy.models.BayesianNetwork
Examples
>>> from pgmpy.models import BayesianNetwork
>>> from pgmpy.factors.discrete import TabularCPD
>>> model = BayesianNetwork([('A', 'B'), ('B', 'C')])
>>> cpd_a = TabularCPD('A', 2, [[0.2], [0.8]])
>>> cpd_b = TabularCPD('B', 2, [[0.3, 0.7], [0.7, 0.3]],
...                    evidence=['A'],
...                    evidence_card=[2])
>>> cpd_c = TabularCPD('C', 2, [[0.1, 0.9], [0.9, 0.1]],
...                    evidence=['B'],
...                    evidence_card=[2])
>>> model.add_cpds(cpd_a, cpd_b, cpd_c)
>>> copy_model = model.copy()
>>> copy_model.nodes()
NodeView(('A', 'B', 'C'))
>>> copy_model.edges()
OutEdgeView([('A', 'B'), ('B', 'C')])
>>> len(copy_model.get_cpds())
3
- do(nodes, inplace=False)
Applies the do operation. The do operation removes all incoming edges to variables in nodes and marginalizes their CPDs to only contain the variable itself.
- Parameters:
nodes (list, array-like) – The names of the nodes to apply the do-operator for.
inplace (boolean (default: False)) – If inplace=True, makes the changes to the current object, otherwise returns a new instance.
- Returns:
Modified network – If inplace=True, modifies the object itself else returns an instance of BayesianNetwork modified by the do operation.
- Return type:
pgmpy.models.BayesianNetwork or None
Examples
>>> from pgmpy.utils import get_example_model
>>> asia = get_example_model('asia')
>>> asia.edges()
OutEdgeView([('asia', 'tub'), ('tub', 'either'), ('smoke', 'lung'), ('smoke', 'bronc'), ('lung', 'either'), ('bronc', 'dysp'), ('either', 'xray'), ('either', 'dysp')])
>>> do_bronc = asia.do(['bronc'])
>>> do_bronc.edges()
OutEdgeView([('asia', 'tub'), ('tub', 'either'), ('smoke', 'lung'), ('lung', 'either'), ('bronc', 'dysp'), ('either', 'xray'), ('either', 'dysp')])
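The graph part of the do operation only drops the edges that point into the intervened nodes. The plain-Python sketch below reproduces the edge list of the asia example. `do_edges` is a hypothetical helper, and it does not cover the CPD marginalization step.

```python
def do_edges(edges, nodes):
    """Drop every edge whose head is one of the intervened nodes."""
    targets = set(nodes)
    return [(u, v) for (u, v) in edges if v not in targets]


asia_edges = [('asia', 'tub'), ('tub', 'either'), ('smoke', 'lung'),
              ('smoke', 'bronc'), ('lung', 'either'), ('bronc', 'dysp'),
              ('either', 'xray'), ('either', 'dysp')]

# Only ('smoke', 'bronc') points into 'bronc', so only it is removed;
# the outgoing edge ('bronc', 'dysp') is kept.
print(do_edges(asia_edges, ['bronc']))
```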
- fit_update(data, n_prev_samples=None, n_jobs=1)
Method to update the parameters of the BayesianNetwork with more data. Internally, it uses BayesianEstimator with a Dirichlet prior, and uses the current CPDs (along with n_prev_samples) to compute the pseudo_counts.
- Parameters:
data (pandas.DataFrame) – The new dataset to use for updating the model.
n_prev_samples (int) – The number of samples/datapoints on which the model was trained before. This parameter determines how much weight the new data should be given. If None, n_prev_samples is set to the number of rows in data.
n_jobs (int (default: 1)) – Number of threads/processes to use for estimation. Using n_jobs > 1 for small models or datasets might be slower.
- Returns:
Updated model – Modifies the network inplace.
- Return type:
None
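For a single CPD column, you can think of the update as a Dirichlet prior whose pseudo-counts are the current probabilities multiplied by `n_prev_samples`. The posterior-mean estimate is then (pseudo_count + new_count) / (n_prev_samples + n_new). The pure-Python sketch below assumes this is how the current CPD and `n_prev_samples` are combined. It illustrates the technique and is not pgmpy's `BayesianEstimator` code. `dirichlet_update` is a hypothetical helper.

```python
from collections import Counter


def dirichlet_update(prior_probs, new_data, n_prev_samples=None):
    """Posterior mean of one categorical distribution under a Dirichlet prior.

    prior_probs: {state: probability} taken from the current CPD column.
    new_data: observed states of the variable (for that parent configuration).
    n_prev_samples: weight given to the current CPD, as pseudo-counts.
    """
    if n_prev_samples is None:
        # Mirrors the documented default: weight equal to the new data size.
        n_prev_samples = len(new_data)
    counts = Counter(new_data)
    # Pseudo-counts: the current probabilities scaled by n_prev_samples.
    pseudo_counts = {s: p * n_prev_samples for s, p in prior_probs.items()}
    total = n_prev_samples + len(new_data)
    return {s: (pseudo_counts[s] + counts[s]) / total for s in prior_probs}


old = {'yes': 0.2, 'no': 0.8}
new_data = ['yes'] * 30 + ['no'] * 70
updated = dirichlet_update(old, new_data, n_prev_samples=100)
print(updated)  # approximately {'yes': 0.25, 'no': 0.75}
```

A larger `n_prev_samples` keeps the estimate closer to the old CPD. For example, with 1000 previous samples, P(yes) only moves from 0.2 to about 0.21.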
Examples
>>> from pgmpy.utils import get_example_model
>>> from pgmpy.sampling import BayesianModelSampling
>>> model = get_example_model('alarm')
>>> # Generate some new data.
>>> data = BayesianModelSampling(model).forward_sample(int(1e3))
>>> model.fit_update(data)
- get_cardinality(node=None)
Returns the cardinality of the node. Throws an error if the CPD for the queried node hasn’t been added to the network.
- Parameters:
node (Any hashable python object (optional).) – The node whose cardinality we want. If node is not specified, returns a dictionary with the variables as keys and their respective cardinalities as values.
- Returns:
variable cardinalities – If node is specified returns the cardinality of the node else returns a dictionary with the cardinality of each variable in the network
- Return type:
int or dict
Examples
>>> from pgmpy.models import BayesianNetwork
>>> from pgmpy.factors.discrete import TabularCPD
>>> student = BayesianNetwork([('diff', 'grade'), ('intel', 'grade')])
>>> cpd_diff = TabularCPD('diff', 2, [[0.6], [0.4]])
>>> cpd_intel = TabularCPD('intel', 2, [[0.7], [0.3]])
>>> cpd_grade = TabularCPD('grade', 2, [[0.1, 0.9, 0.2, 0.7],
...                                     [0.9, 0.1, 0.8, 0.3]],
...                        ['intel', 'diff'], [2, 2])
>>> student.add_cpds(cpd_diff, cpd_intel, cpd_grade)
>>> student.get_cardinality()
defaultdict(<class 'int'>, {'diff': 2, 'intel': 2, 'grade': 2})
>>> student.get_cardinality('intel')
2
- get_cpds(node=None)
Returns the CPD of the node. If node is not specified, returns all the CPDs that have been added to the model so far.
- Parameters:
node (any hashable python object (optional)) – The node whose CPD we want. If node not specified returns all the CPDs added to the model.
- Returns:
cpd – If ‘node’ is specified, returns the ‘TabularCPD’ object corresponding to the node. If ‘node’ is not specified, returns a list of all ‘TabularCPD’ objects added to the model.
- Return type:
TabularCPD object or list of TabularCPD objects
- Raises:
ValueError – If the specified node is not present in the model.
Examples
>>> from pgmpy.utils import get_example_model
>>> model = get_example_model('asia')
>>> cpds = model.get_cpds()
>>> cpds
[<TabularCPD representing P(asia:2) at 0x7dbbd9bdbb80>,
 <TabularCPD representing P(bronc:2 | smoke:2) at 0x7dbbd9bda3e0>,
 <TabularCPD representing P(dysp:2 | bronc:2, either:2) at 0x7dbbd9bd8550>,
 <TabularCPD representing P(either:2 | lung:2, tub:2) at 0x7dbbd9bda800>,
 <TabularCPD representing P(lung:2 | smoke:2) at 0x7dbbd9bd89d0>,
 <TabularCPD representing P(smoke:2) at 0x7dbbd9bd8f70>,
 <TabularCPD representing P(tub:2 | asia:2) at 0x7dbbd9bda860>,
 <TabularCPD representing P(xray:2 | either:2) at 0x7dbbd9bd9a80>]
>>> cpd = model.get_cpds('bronc')
>>> cpd
<TabularCPD representing P(bronc:2 | smoke:2) at 0x7dbbd9bda3e0>
- get_markov_blanket(node)
Returns the Markov blanket of a random variable. In the case of Bayesian Networks, the Markov blanket is the set of the node’s parents, its children, and its children’s other parents.
- Returns:
Markov Blanket – List of nodes contained in Markov Blanket of node
- Return type:
list
- Parameters:
node (string, int or any hashable python object.) – The node whose markov blanket would be returned.
Examples
>>> from pgmpy.models import BayesianNetwork
>>> from pgmpy.factors.discrete import TabularCPD
>>> G = BayesianNetwork([('x', 'y'), ('z', 'y'), ('y', 'w'), ('y', 'v'), ('u', 'w'),
...                      ('s', 'v'), ('w', 't'), ('w', 'm'), ('v', 'n'), ('v', 'q')])
>>> G.get_markov_blanket('y')
['s', 'u', 'w', 'v', 'z', 'x']
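You can compute the definition above (parents, children, and the children's other parents) directly from an edge list. The plain-Python sketch below uses the graph from the example. `markov_blanket` is a hypothetical helper that returns a set, so it says nothing about the list order pgmpy returns.

```python
def markov_blanket(edges, node):
    """Parents, children, and the children's other parents of node."""
    parents = {u for (u, v) in edges if v == node}
    children = {v for (u, v) in edges if u == node}
    co_parents = {u for (u, v) in edges if v in children and u != node}
    return parents | children | co_parents


edges = [('x', 'y'), ('z', 'y'), ('y', 'w'), ('y', 'v'), ('u', 'w'),
         ('s', 'v'), ('w', 't'), ('w', 'm'), ('v', 'n'), ('v', 'q')]
print(sorted(markov_blanket(edges, 'y')))  # ['s', 'u', 'v', 'w', 'x', 'z']
```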
- static get_random(n_nodes=5, edge_prob=0.5, node_names=None, n_states=None, latents=False, seed=None)
Returns a randomly generated Bayesian Network on n_nodes variables with edge probability edge_prob between variables.
- Parameters:
n_nodes (int) – The number of nodes in the randomly generated DAG.
edge_prob (float) – The probability of edge between any two nodes in the topologically sorted DAG.
node_names (list (default: None)) – A list of variables names to use in the random graph. If None, the node names are integer values starting from 0.
n_states (int or dict (default: None)) – The number of states of each variable in the form {variable: no_of_states}. If a single value is provided, all nodes will have the same number of states. When None randomly generates the number of states.
latents (bool (default: False)) – If True, also creates latent variables.
seed (int (default: None)) – The seed value for random number generators.
- Returns:
Random DAG – The randomly generated DAG.
- Return type:
pgmpy.models.BayesianNetwork
Examples
>>> from pgmpy.models import BayesianNetwork
>>> model = BayesianNetwork.get_random(n_nodes=5)
>>> model.nodes()
NodeView((0, 1, 3, 4, 2))
>>> model.edges()
OutEdgeView([(0, 1), (0, 3), (1, 3), (1, 4), (3, 4), (2, 3)])
>>> model.cpds
[<TabularCPD representing P(0:0) at 0x7f97e16eabe0>,
 <TabularCPD representing P(1:1 | 0:0) at 0x7f97e16ea670>,
 <TabularCPD representing P(3:3 | 0:0, 1:1, 2:2) at 0x7f97e16820d0>,
 <TabularCPD representing P(4:4 | 1:1, 3:3) at 0x7f97e16eae80>,
 <TabularCPD representing P(2:2) at 0x7f97e1682c40>]
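The edge_prob parameter is described in terms of a topologically sorted DAG. One standard way to sample such a graph is to fix a random node order and add each forward edge (earlier to later) independently with probability edge_prob. This guarantees acyclicity. The sketch below is an illustration of that technique only. `random_dag_edges` is a hypothetical helper, and it is not claimed to match pgmpy's generator or its output for a given seed.

```python
import random


def random_dag_edges(n_nodes=5, edge_prob=0.5, seed=None):
    """Sample a random DAG over nodes 0..n_nodes-1.

    Nodes are placed in a random topological order and each forward pair
    (earlier -> later) gets an edge with probability edge_prob, so no
    cycle can form.
    """
    rng = random.Random(seed)
    order = list(range(n_nodes))
    rng.shuffle(order)
    edges = []
    for i in range(n_nodes):
        for j in range(i + 1, n_nodes):
            if rng.random() < edge_prob:
                edges.append((order[i], order[j]))
    return order, edges


order, edges = random_dag_edges(n_nodes=5, edge_prob=0.5, seed=42)
print(order, edges)
```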
- get_random_cpds(n_states=None, inplace=False, seed=None)
Given a model, generates and adds random TabularCPD for each node resulting in a fully parameterized network.
- Parameters:
n_states (int or dict (default: None)) – The number of states of each variable in the model. If None, randomly generates the number of states.
inplace (bool (default: False)) – If inplace=True, adds the generated TabularCPDs to model itself, else creates a copy of the model.
seed (int (default: None)) – The seed value for random number generators.
- get_state_probability(states)
Given a fully specified Bayesian Network, returns the probability of the given set of states.
- Parameters:
states (dict) – dict of the form {variable: state}
- Returns:
The probability value
- Return type:
float
Examples
>>> from pgmpy.utils import get_example_model
>>> model = get_example_model('asia')
>>> model.get_state_probability({'either': 'no', 'tub': 'no', 'xray': 'yes', 'bronc': 'no'})
0.02605122
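The asia example specifies only four of the eight variables, so a natural reading is that the unspecified variables are summed out. The sketch below assumes that reading. It multiplies CPD entries by the chain rule and sums over every full assignment that agrees with the query. It uses a hypothetical two-node chain A -> B with the `cpd_a` and `cpd_b` numbers from the copy() example. `state_probability` is a hypothetical helper, not pgmpy's implementation.

```python
from itertools import product

# Hypothetical chain A -> B with states 0 and 1 (numbers from cpd_a, cpd_b).
p_a = {0: 0.2, 1: 0.8}
# P(B=b | A=a), keyed by (b, a).
p_b_given_a = {(0, 0): 0.3, (1, 0): 0.7,
               (0, 1): 0.7, (1, 1): 0.3}


def joint(a, b):
    # Chain rule for a Bayesian network: P(A, B) = P(A) * P(B | A).
    return p_a[a] * p_b_given_a[(b, a)]


def state_probability(query):
    """Sum the joint over all full assignments that agree with query."""
    total = 0.0
    for a, b in product((0, 1), (0, 1)):
        assignment = {'A': a, 'B': b}
        if all(assignment[var] == state for var, state in query.items()):
            total += joint(a, b)
    return total


print(state_probability({'B': 0}))  # about 0.62 (0.2 * 0.3 + 0.8 * 0.7)
```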
- is_imap(JPD)
Checks whether the Bayesian Network is an I-map of the given JointProbabilityDistribution.
- Parameters:
JPD (An instance of JointProbabilityDistribution Class, for which you want to check the Imap)
- Returns:
is IMAP – True if the Bayesian Network is an I-map for the given Joint Probability Distribution, False otherwise
- Return type:
True or False
Examples
>>> from pgmpy.models import BayesianNetwork
>>> from pgmpy.factors.discrete import TabularCPD
>>> from pgmpy.factors.discrete import JointProbabilityDistribution
>>> G = BayesianNetwork([('diff', 'grade'), ('intel', 'grade')])
>>> diff_cpd = TabularCPD('diff', 2, [[0.2], [0.8]])
>>> intel_cpd = TabularCPD('intel', 3, [[0.5], [0.3], [0.2]])
>>> grade_cpd = TabularCPD('grade', 3,
...                        [[0.1,0.1,0.1,0.1,0.1,0.1],
...                         [0.1,0.1,0.1,0.1,0.1,0.1],
...                         [0.8,0.8,0.8,0.8,0.8,0.8]],
...                        evidence=['diff', 'intel'],
...                        evidence_card=[2, 3])
>>> G.add_cpds(diff_cpd, intel_cpd, grade_cpd)
>>> val = [0.01, 0.01, 0.08, 0.006, 0.006, 0.048, 0.004, 0.004, 0.032,
...        0.04, 0.04, 0.32, 0.024, 0.024, 0.192, 0.016, 0.016, 0.128]
>>> JPD = JointProbabilityDistribution(['diff', 'intel', 'grade'], [2, 3, 3], val)
>>> G.is_imap(JPD)
True
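The example returns True because the JPD is exactly the product of the network's CPDs. A distribution that factorizes over a graph in this way has the graph as an I-map. The sketch below checks that factorization for the example numbers, assuming the values in `val` are ordered over (diff, intel, grade) with grade varying fastest. It is a sufficient-condition check on this one example, not pgmpy's `is_imap` algorithm.

```python
from itertools import product

# CPDs from the is_imap example; grade's CPD is identical in every column.
p_diff = [0.2, 0.8]
p_intel = [0.5, 0.3, 0.2]
p_grade = [0.1, 0.1, 0.8]

# JPD values from the example, assumed ordered over (diff, intel, grade)
# with grade varying fastest.
val = [0.01, 0.01, 0.08, 0.006, 0.006, 0.048, 0.004, 0.004, 0.032,
       0.04, 0.04, 0.32, 0.024, 0.024, 0.192, 0.016, 0.016, 0.128]

# Product of the CPDs, enumerated in the same (diff, intel, grade) order.
factorized = [p_diff[d] * p_intel[i] * p_grade[g]
              for d, i, g in product(range(2), range(3), range(3))]

# If the JPD equals the product of the network's CPDs, it factorizes over
# the graph, which is sufficient for the graph to be an I-map of it.
factorizes = all(abs(x - y) < 1e-9 for x, y in zip(val, factorized))
print(factorizes)  # True
```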
- static load(filename, filetype='bif', **kwargs)
Read the model from a file.
- Parameters:
filename (str) – The path, including the filename, of the file to read.
filetype (str (default: bif)) – The format of the model file. Can be one of the following: bif, uai, xmlbif.
kwargs (kwargs) – Any additional arguments for the reader class or get_model method. Please refer to the file format class for details.
Examples
>>> from pgmpy.utils import get_example_model
>>> from pgmpy.models import BayesianNetwork
>>> alarm = get_example_model('alarm')
>>> alarm.save('alarm.bif', filetype='bif')
>>> alarm_model = BayesianNetwork.load('alarm.bif', filetype='bif')
- predict(data, algo=None, stochastic=False, n_jobs=-1, seed=None, **kwargs)
Predicts states of all the missing variables.
- Parameters:
data (pandas DataFrame object) – A DataFrame object with column names same as the variables in the model.
algo (a subclass of pgmpy.inference.Inference or pgmpy.inference.ApproxInference) – An algorithm class from pgmpy Inference algorithms. Default is Variable Elimination.
stochastic (boolean) –
If True, does prediction by sampling from the distribution of the predicted variable(s). If False, returns the states with the highest probability value (i.e. MAP) for the predicted variable(s).
n_jobs (int (default: -1)) – The number of CPU cores to use. If -1, uses all available cores.
seed (int (default: None)) – When stochastic=True, the seed value to use for random number generators.
**kwargs –
Optional keyword arguments specific to the selected algorithm.
Variable Elimination:
elimination_order: str or list (default='greedy') Order in which to eliminate the variables in the algorithm. If a list is provided, it should contain all variables in the model except the ones in variables. str options are: greedy, WeightedMinFill, MinNeighbors, MinWeight, MinFill. Please refer to https://pgmpy.org/exact_infer/ve.html#module-pgmpy.inference.EliminationOrder for details.
joint: boolean (should only be used with stochastic=True i.e. when not calculating MAP) If True, returns a Joint Distribution over variables. If False, returns a dict of distributions over each of the variables.
Belief Propagation:
joint: boolean (should only be used with stochastic=True i.e. when not calculating MAP) If True, returns a Joint Distribution over variables. If False, returns a dict of distributions over each of the variables.
Approx Inference:
n_samples: int The number of samples to generate for computing the distributions. A higher n_samples gives more accurate results at the cost of more computation time.
samples: pd.DataFrame (default: None) If provided, uses these samples to compute the distribution instead of generating samples. samples must conform with the evidence and virtual_evidence.
state_names: dict (default: None) A dict of state names for each variable in variables in the form {variable_name: list of states}. If None, inferred from the data, but it is possible that the final distribution misses some states.
seed: int (default: None) Sets the seed for the random generators.
joint: boolean (should only be used with stochastic=True i.e. when not calculating MAP) If True, returns a Joint Distribution over variables. If False, returns a dict of distributions over each of the variables.
- Returns:
Inference results – If stochastic is True, returns state(s) by sampling from the distribution of predicted variables. If stochastic is False, returns state(s) with the highest probability value.
- Return type:
Pandas DataFrame
Examples
>>> import numpy as np
>>> import pandas as pd
>>> from pgmpy.models import BayesianNetwork
>>> from pgmpy.inference import ApproxInference
>>> values = pd.DataFrame(np.random.randint(low=0, high=2, size=(1000, 5)),
...                       columns=['A', 'B', 'C', 'D', 'E'])
>>> train_data = values[:800]
>>> predict_data = values[800:]
>>> model = BayesianNetwork([('A', 'B'), ('C', 'B'), ('C', 'D'), ('B', 'E')])
>>> model.fit(train_data)
>>> predict_data = predict_data.copy()
>>> predict_data.drop('E', axis=1, inplace=True)
>>> approx_inf_parameters = {'n_samples': int(1e3), 'seed': 42}
>>> y_pred = model.predict(predict_data, algo=ApproxInference, **approx_inf_parameters)
>>> y_pred['E']
     E
800  1
801  0
802  1
803  1
804  1
...  ...
995  1
996  1
997  1
998  1
999  0
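The stochastic flag switches between two ways of turning a predicted distribution into a state: take the most probable state (MAP), or draw a state at random in proportion to its probability. The plain-Python sketch below shows both for one variable. The posterior values are hypothetical, and `predict_state` is a hypothetical helper, not pgmpy's code.

```python
import random

# Hypothetical posterior of one missing variable for one row of data.
posterior = {0: 0.3, 1: 0.7}


def predict_state(posterior, stochastic=False, rng=None):
    """Return the MAP state, or a state sampled from the posterior."""
    if not stochastic:
        # MAP: the state with the highest probability.
        return max(posterior, key=posterior.get)
    rng = rng or random.Random()
    states = list(posterior)
    weights = [posterior[s] for s in states]
    return rng.choices(states, weights=weights, k=1)[0]


print(predict_state(posterior))  # 1
rng = random.Random(0)
draws = [predict_state(posterior, stochastic=True, rng=rng) for _ in range(1000)]
print(sum(draws) / len(draws))  # close to 0.7
```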
- predict_probability(data)
Predicts probabilities of all states of the missing variables.
- Parameters:
data (pandas DataFrame object) – A DataFrame object with column names same as the variables in the model.
Examples
>>> import numpy as np
>>> import pandas as pd
>>> from pgmpy.models import BayesianNetwork
>>> values = pd.DataFrame(np.random.randint(low=0, high=2, size=(100, 5)),
...                       columns=['A', 'B', 'C', 'D', 'E'])
>>> train_data = values[:80]
>>> predict_data = values[80:]
>>> model = BayesianNetwork([('A', 'B'), ('C', 'B'), ('C', 'D'), ('B', 'E')])
>>> model.fit(values)
>>> predict_data = predict_data.copy()
>>> predict_data.drop('B', axis=1, inplace=True)
>>> y_prob = model.predict_probability(predict_data)
>>> y_prob
         B_0       B_1
80  0.439178  0.560822
81  0.581970  0.418030
82  0.488275  0.511725
83  0.581970  0.418030
84  0.510794  0.489206
85  0.439178  0.560822
86  0.439178  0.560822
87  0.417124  0.582876
88  0.407978  0.592022
89  0.429905  0.570095
90  0.581970  0.418030
91  0.407978  0.592022
92  0.429905  0.570095
93  0.429905  0.570095
94  0.439178  0.560822
95  0.407978  0.592022
96  0.559904  0.440096
97  0.417124  0.582876
98  0.488275  0.511725
99  0.407978  0.592022
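Each row of y_prob is the posterior of B given the other observed columns. In the example graph (A -> B <- C, C -> D, B -> E), the only factors of the joint that involve B are P(B | A, C) and P(E | B). The posterior is therefore their normalized product. The sketch below works through one row with hypothetical CPD entries, which are not the fitted values from the example.

```python
# Hypothetical CPD entries for one row where A, C, D and E=1 are observed.
p_b_given_ac = {0: 0.4, 1: 0.6}  # P(B | A=a, C=c) for the observed a, c
p_e1_given_b = {0: 0.3, 1: 0.8}  # P(E=1 | B)

# Only the factors that mention B matter: P(B | A, C) * P(E | B).
# P(A), P(C) and P(D | C) are constant in B and cancel on normalization.
unnormalized = {b: p_b_given_ac[b] * p_e1_given_b[b] for b in (0, 1)}
z = sum(unnormalized.values())
y_prob_row = {f"B_{b}": p / z for b, p in unnormalized.items()}
print(y_prob_row)  # approximately {'B_0': 0.2, 'B_1': 0.8}
```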
- remove_cpds(*cpds)
Removes the cpds that are provided in the argument.
- Parameters:
*cpds (TabularCPD object) – A CPD object on any subset of the variables of the model which is to be removed from the model.
Examples
>>> from pgmpy.models import BayesianNetwork
>>> from pgmpy.factors.discrete import TabularCPD
>>> student = BayesianNetwork([('diff', 'grade'), ('intel', 'grade')])
>>> cpd = TabularCPD('grade', 2, [[0.1, 0.9, 0.2, 0.7],
...                               [0.9, 0.1, 0.8, 0.3]],
...                  ['intel', 'diff'], [2, 2])
>>> student.add_cpds(cpd)
>>> student.remove_cpds(cpd)
- remove_node(node)
Remove node from the model.
Removing a node also removes all the associated edges, removes the CPD of the node and marginalizes the CPDs of its children.
- Parameters:
node (node) – Node which is to be removed from the model.
- Return type:
None
Examples
>>> import pandas as pd
>>> import numpy as np
>>> from pgmpy.models import BayesianNetwork
>>> model = BayesianNetwork([('A', 'B'), ('B', 'C'),
...                          ('A', 'D'), ('D', 'C')])
>>> values = pd.DataFrame(np.random.randint(low=0, high=2, size=(1000, 4)),
...                       columns=['A', 'B', 'C', 'D'])
>>> model.fit(values)
>>> model.get_cpds()
[<TabularCPD representing P(A:2) at 0x7f28248e2438>,
 <TabularCPD representing P(B:2 | A:2) at 0x7f28248e23c8>,
 <TabularCPD representing P(C:2 | B:2, D:2) at 0x7f28248e2748>,
 <TabularCPD representing P(D:2 | A:2) at 0x7f28248e26a0>]
>>> model.remove_node('A')
>>> model.get_cpds()
[<TabularCPD representing P(B:2) at 0x7f28248e23c8>,
 <TabularCPD representing P(C:2 | B:2, D:2) at 0x7f28248e2748>,
 <TabularCPD representing P(D:2) at 0x7f28248e26a0>]
- remove_nodes_from(nodes)
Remove multiple nodes from the model.
Removing a node also removes all the associated edges, removes the CPD of the node and marginalizes the CPDs of its children.
- Parameters:
nodes (list, set (iterable)) – Nodes which are to be removed from the model.
- Return type:
None
Examples
>>> import pandas as pd
>>> import numpy as np
>>> from pgmpy.models import BayesianNetwork
>>> model = BayesianNetwork([('A', 'B'), ('B', 'C'),
...                          ('A', 'D'), ('D', 'C')])
>>> values = pd.DataFrame(np.random.randint(low=0, high=2, size=(1000, 4)),
...                       columns=['A', 'B', 'C', 'D'])
>>> model.fit(values)
>>> model.get_cpds()
[<TabularCPD representing P(A:2) at 0x7f28248e2438>,
 <TabularCPD representing P(B:2 | A:2) at 0x7f28248e23c8>,
 <TabularCPD representing P(C:2 | B:2, D:2) at 0x7f28248e2748>,
 <TabularCPD representing P(D:2 | A:2) at 0x7f28248e26a0>]
>>> model.remove_nodes_from(['A', 'B'])
>>> model.get_cpds()
[<TabularCPD representing P(C:2 | D:2) at 0x7f28248e2a58>,
 <TabularCPD representing P(D:2) at 0x7f28248e26d8>]
- save(filename, filetype='bif')
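Writes the model to a file. Please avoid using any special characters or spaces in variable or state names.
- Parameters:
filename (str) – The path, including the filename, of the file to write.
filetype (str (default: bif)) – The format of the model file. Can be one of the following: bif, uai, xmlbif.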
Examples
>>> from pgmpy.utils import get_example_model
>>> alarm = get_example_model('alarm')
>>> alarm.save('alarm.bif', filetype='bif')
- simulate(n_samples=10, do=None, evidence=None, virtual_evidence=None, virtual_intervention=None, include_latents=False, partial_samples=None, seed=None, show_progress=True, missing_prob=None, return_full=False)
Simulates data from the given model. Internally uses methods from pgmpy.sampling.BayesianModelSampling to generate the data.
- Parameters:
n_samples (int) – The number of data samples to simulate from the model.
do (dict) – The interventions to apply to the model. dict should be of the form {variable_name: state}
evidence (dict) – Observed evidence to apply to the model. dict should be of the form {variable_name: state}
virtual_evidence (list) – Probabilistically apply evidence to the model. virtual_evidence should be a list of pgmpy.factors.discrete.TabularCPD objects specifying the virtual probabilities.
virtual_intervention (list) – Also known as soft intervention. virtual_intervention should be a list of pgmpy.factors.discrete.TabularCPD objects specifying the virtual/soft intervention probabilities.
include_latents (boolean) – Whether to include the latent variable values in the generated samples.
partial_samples (pandas.DataFrame) – A pandas dataframe specifying samples on some of the variables in the model. If specified, the sampling procedure uses these sample values, instead of generating them. partial_samples.shape[0] must be equal to n_samples.
seed (int (default: None)) – If a value is provided, sets the seed for numpy.random.
show_progress (bool) – If True, shows a progress bar when generating samples.
missing_prob (TabularCPD, list (default: None)) – The probability of a missing value for the variable of the TabularCPD. To introduce missing values in more than one variable, provide a list of TabularCPDs. The variable name of each TabularCPD should be the name of the corresponding node in the BayesianNetwork followed by *. The state names of each TabularCPD should be the same as the state names of the corresponding node in the BayesianNetwork.
return_full (bool (default: False)) – If True, return both full samples and samples with missing values (if performed).
- Returns:
A dataframe with the simulated data
- Return type:
pd.DataFrame
Examples
>>> from pgmpy.utils import get_example_model
Simulation without any evidence or intervention:
>>> model = get_example_model('alarm')
>>> model.simulate(n_samples=10)
Simulation with the hard evidence: MINVOLSET = HIGH:
>>> model.simulate(n_samples=10, evidence={"MINVOLSET": "HIGH"})
Simulation with hard intervention: CVP = LOW:
>>> model.simulate(n_samples=10, do={"CVP": "LOW"})
Simulation with virtual/soft evidence: p(MINVOLSET=LOW) = 0.8, p(MINVOLSET=HIGH) = 0.2, p(MINVOLSET=NORMAL) = 0:
>>> from pgmpy.factors.discrete import TabularCPD
>>> virt_evidence = [TabularCPD("MINVOLSET", 3, [[0.8], [0.0], [0.2]],
...                             state_names={"MINVOLSET": ["LOW", "NORMAL", "HIGH"]})]
>>> model.simulate(n_samples=10, virtual_evidence=virt_evidence)
Simulation with virtual/soft intervention: p(CVP=LOW) = 0.2, p(CVP=NORMAL)=0.5, p(CVP=HIGH)=0.3:
>>> from pgmpy.factors.discrete import TabularCPD
>>> virt_intervention = [TabularCPD("CVP", 3, [[0.2], [0.5], [0.3]],
...                                 state_names={"CVP": ["LOW", "NORMAL", "HIGH"]})]
>>> model.simulate(n_samples=10, virtual_intervention=virt_intervention)
Simulation with missing values:
>>> from pgmpy.factors.discrete.CPD import TabularCPD
>>> cpd = TabularCPD("HISTORY*", 2, [[0.5], [0.5]])
>>> model.simulate(n_samples=10, missing_prob=cpd)
>>> cpd = TabularCPD("HISTORY*", 2, [[0.5, 0.5], [0.5, 0.5]], ["HISTORY"], [2],
...                  state_names={"HISTORY*": [0, 1], "HISTORY": ['TRUE', 'FALSE']})
>>> model.simulate(n_samples=10, missing_prob=cpd)
>>> cpd = TabularCPD("HISTORY*", 2,
...                  [[0.2, 0.1, 0.6, 0.4, 0.7, 0.2],
...                   [0.8, 0.9, 0.4, 0.6, 0.3, 0.8]],
...                  ["HYPOVOLEMIA", "LVEDVOLUME"], [2, 3],
...                  state_names={"HISTORY*": [0, 1],
...                               "HYPOVOLEMIA": ['TRUE', 'FALSE'],
...                               'LVEDVOLUME': ['LOW', 'NORMAL', 'HIGH']})
>>> model.simulate(n_samples=10, missing_prob=cpd)
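simulate is documented as delegating to pgmpy.sampling.BayesianModelSampling. The core idea of forward (ancestral) sampling, with a hard intervention handled as in the do parameter, can be sketched in plain Python. The example uses a hypothetical binary chain A -> B -> C. `forward_sample`, `parents`, and `p_one` are hypothetical names and this is not pgmpy's code. Evidence, virtual evidence, and missing values are not covered.

```python
import random

# Hypothetical chain A -> B -> C with binary states 0/1.
# p_one[node][parent_states] is P(node = 1 | parents = parent_states).
parents = {'A': [], 'B': ['A'], 'C': ['B']}
p_one = {
    'A': {(): 0.8},
    'B': {(0,): 0.7, (1,): 0.3},
    'C': {(0,): 0.9, (1,): 0.1},
}
order = ['A', 'B', 'C']  # a topological order: parents come first


def forward_sample(n_samples, do=None, seed=None):
    """Ancestral sampling; `do` fixes nodes and ignores their parents."""
    rng = random.Random(seed)
    do = do or {}
    samples = []
    for _ in range(n_samples):
        row = {}
        for node in order:
            if node in do:
                # Hard intervention: the node's CPD is replaced by a constant.
                row[node] = do[node]
                continue
            key = tuple(row[p] for p in parents[node])
            row[node] = 1 if rng.random() < p_one[node][key] else 0
        samples.append(row)
    return samples


rows = forward_sample(5, do={'B': 0}, seed=0)
print(rows)
```

Sampling in topological order guarantees that every parent value is already in `row` when a child is drawn.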
- property states
Returns a dictionary mapping each node to its list of possible states.
- Returns:
state_dict – Dictionary of nodes to possible states
- Return type:
dict
- to_junction_tree()
Creates a junction tree (or clique tree) for a given Bayesian Network.
To convert a Bayesian Network into a clique tree, it is first converted into a Markov model.
For a given Markov model H, a junction tree G is a graph where:
1. each node in G corresponds to a maximal clique in H;
2. each sepset in G separates the variables strictly on one side of the edge from the variables on the other side.
Examples
>>> from pgmpy.models import BayesianNetwork
>>> from pgmpy.factors.discrete import TabularCPD
>>> G = BayesianNetwork([('diff', 'grade'), ('intel', 'grade'),
...                      ('intel', 'SAT'), ('grade', 'letter')])
>>> diff_cpd = TabularCPD('diff', 2, [[0.2], [0.8]])
>>> intel_cpd = TabularCPD('intel', 3, [[0.5], [0.3], [0.2]])
>>> grade_cpd = TabularCPD('grade', 3,
...                        [[0.1,0.1,0.1,0.1,0.1,0.1],
...                         [0.1,0.1,0.1,0.1,0.1,0.1],
...                         [0.8,0.8,0.8,0.8,0.8,0.8]],
...                        evidence=['diff', 'intel'],
...                        evidence_card=[2, 3])
>>> sat_cpd = TabularCPD('SAT', 2,
...                      [[0.1, 0.2, 0.7],
...                       [0.9, 0.8, 0.3]],
...                      evidence=['intel'], evidence_card=[3])
>>> letter_cpd = TabularCPD('letter', 2,
...                         [[0.1, 0.4, 0.8],
...                          [0.9, 0.6, 0.2]],
...                         evidence=['grade'], evidence_card=[3])
>>> G.add_cpds(diff_cpd, intel_cpd, grade_cpd, sat_cpd, letter_cpd)
>>> jt = G.to_junction_tree()
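For the student network, the clique-tree construction can be sketched in plain Python. The sketch finds the maximal cliques of the moral graph, then links them with a maximum-weight spanning tree whose edge weights are sepset sizes (Kruskal's algorithm). It assumes the moral graph is already chordal, which holds here, so the triangulation step needed in general is omitted. The moral graph is hard-coded and every name is hypothetical. This is an illustration of the standard technique, not pgmpy's implementation.

```python
from itertools import combinations

# Moral graph of the example network, hard-coded here; it is assumed to be
# chordal already, so no triangulation step is shown.
nodes = {'diff', 'grade', 'intel', 'SAT', 'letter'}
moral_edges = {frozenset(e) for e in [('diff', 'grade'), ('diff', 'intel'),
                                      ('grade', 'intel'), ('intel', 'SAT'),
                                      ('grade', 'letter')]}


def is_clique(vertices):
    return all(frozenset(pair) in moral_edges
               for pair in combinations(vertices, 2))


# Brute-force maximal cliques, largest first (only viable for tiny graphs).
cliques = []
for size in range(len(nodes), 0, -1):
    for vertices in combinations(sorted(nodes), size):
        if is_clique(vertices) and not any(set(vertices) <= c for c in cliques):
            cliques.append(set(vertices))

# Maximum-weight spanning tree over cliques (Kruskal), weight = sepset size.
candidates = sorted(
    ((len(a & b), i, j)
     for (i, a), (j, b) in combinations(enumerate(cliques), 2)),
    reverse=True,
)
parent = list(range(len(cliques)))  # union-find forest


def find(x):
    while parent[x] != x:
        x = parent[x]
    return x


tree = []
for weight, i, j in candidates:
    root_i, root_j = find(i), find(j)
    if root_i != root_j:
        parent[root_i] = root_j
        tree.append((i, j, cliques[i] & cliques[j]))

print([sorted(c) for c in cliques])
# [['diff', 'grade', 'intel'], ['SAT', 'intel'], ['grade', 'letter']]
print([(i, j, sorted(s)) for i, j, s in tree])
# [(0, 2, ['grade']), (0, 1, ['intel'])]
```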
- to_markov_model()
Converts Bayesian Network to Markov Model. The Markov Model created would be the moral graph of the Bayesian Network.
Examples
>>> from pgmpy.models import BayesianNetwork
>>> G = BayesianNetwork([('diff', 'grade'), ('intel', 'grade'),
...                      ('intel', 'SAT'), ('grade', 'letter')])
>>> mm = G.to_markov_model()
>>> mm.nodes()
NodeView(('diff', 'grade', 'intel', 'letter', 'SAT'))
>>> mm.edges()
EdgeView([('diff', 'grade'), ('diff', 'intel'), ('grade', 'letter'), ('grade', 'intel'), ('intel', 'SAT')])
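Moralization connects ("marries") every pair of parents that share a child and then drops edge directions. The plain-Python sketch below reproduces the edge set from the example: the extra ('diff', 'intel') edge appears because both are parents of grade. `moralize` is a hypothetical helper that returns undirected edges as frozensets. It is not pgmpy's implementation.

```python
from itertools import combinations


def moralize(edges):
    """Moral graph: marry co-parents, then drop edge directions."""
    undirected = {frozenset(e) for e in edges}
    for child in {v for _, v in edges}:
        parents = sorted(u for (u, v) in edges if v == child)
        for a, b in combinations(parents, 2):
            undirected.add(frozenset((a, b)))
    return undirected


edges = [('diff', 'grade'), ('intel', 'grade'), ('intel', 'SAT'), ('grade', 'letter')]
moral = moralize(edges)
print(sorted(sorted(e) for e in moral))
# [['SAT', 'intel'], ['diff', 'grade'], ['diff', 'intel'], ['grade', 'intel'], ['grade', 'letter']]
```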