Models

Bayesian Model

class pgmpy.models.BayesianModel.BayesianModel(ebunch=None)[source]

Base class for Bayesian models.

A model stores nodes and edges with conditional probability distributions (CPDs) and other attributes.

Models hold directed edges. Neither self loops nor multiple (parallel) edges are allowed.

Nodes can be any hashable python object.

Edges are represented as links between nodes.

Parameters

ebunch (input graph) – Data to initialize the graph. If ebunch=None (default), an empty graph is created. The data can be an edge list or any NetworkX graph object.

Examples

Create an empty bayesian model with no nodes and no edges.

>>> from pgmpy.models import BayesianModel
>>> G = BayesianModel()

G can be grown in several ways.

Nodes:

Add one node at a time:

>>> G.add_node('a')

Add the nodes from any container (a list, set or tuple or the nodes from another graph).

>>> G.add_nodes_from(['a', 'b'])

Edges:

G can also be grown by adding edges.

Add one edge,

>>> G.add_edge('a', 'b')

a list of edges,

>>> G.add_edges_from([('a', 'b'), ('b', 'c')])

If some edges connect nodes not yet in the model, the nodes are added automatically. There are no errors when adding nodes or edges that already exist.

Shortcuts:

Many common graph features allow Python syntax to speed up reporting.

>>> 'a' in G     # check if node in graph
True
>>> len(G)  # number of nodes in graph
3
add_cpds(*cpds)[source]

Add CPD (Conditional Probability Distribution) to the Bayesian Model.

Parameters

cpds (list, set, tuple (array-like)) – List of CPDs which will be associated with the model

Examples

>>> from pgmpy.models import BayesianModel
>>> from pgmpy.factors.discrete.CPD import TabularCPD
>>> student = BayesianModel([('diff', 'grades'), ('intel', 'grades')])
>>> grades_cpd = TabularCPD('grades', 3, [[0.1,0.1,0.1,0.1,0.1,0.1],
...                                       [0.1,0.1,0.1,0.1,0.1,0.1],
...                                       [0.8,0.8,0.8,0.8,0.8,0.8]],
...                         evidence=['diff', 'intel'], evidence_card=[2, 3])
>>> student.add_cpds(grades_cpd)

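+---------+-----------------------------+-----------------------------+
| diff:   | easy                        | hard                        |
+---------+---------+---------+---------+---------+---------+---------+
| intel:  | dumb    | avg     | smart   | dumb    | avg     | smart   |
+---------+---------+---------+---------+---------+---------+---------+
| gradeA  | 0.1     | 0.1     | 0.1     | 0.1     | 0.1     | 0.1     |
+---------+---------+---------+---------+---------+---------+---------+
| gradeB  | 0.1     | 0.1     | 0.1     | 0.1     | 0.1     | 0.1     |
+---------+---------+---------+---------+---------+---------+---------+
| gradeC  | 0.8     | 0.8     | 0.8     | 0.8     | 0.8     | 0.8     |
+---------+---------+---------+---------+---------+---------+---------+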

add_edge(u, v, **kwargs)[source]

Add an edge between u and v.

The nodes u and v will be automatically added if they are not already in the graph.

Parameters

u, v (nodes) – Nodes can be any hashable Python object.

Examples

>>> from pgmpy.models import BayesianModel
>>> G = BayesianModel()
>>> G.add_nodes_from(['grade', 'intel'])
>>> G.add_edge('grade', 'intel')
check_model()[source]

Check the model for various errors. This method checks for the following errors.

  • Checks if the sum of the probabilities for each state is equal to 1 (tol=0.01).

  • Checks if the CPDs associated with nodes are consistent with their parents.

Returns

check – True if all the checks are passed

Return type

boolean
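
The first check above can be illustrated with a minimal plain-Python sketch. It does not use pgmpy and is not the library's implementation; the helper name columns_sum_to_one and the sample tables are assumptions made for illustration. Each column of a CPD table corresponds to one configuration of the parents, so each column should sum to 1 within the tolerance.

```python
def columns_sum_to_one(cpd_values, tol=0.01):
    """Return True if every column of a CPD table sums to 1 within tol.

    cpd_values is a list of rows; each column is one parent configuration.
    (Hypothetical helper, not part of pgmpy.)
    """
    n_cols = len(cpd_values[0])
    for col in range(n_cols):
        total = sum(row[col] for row in cpd_values)
        if abs(total - 1.0) > tol:
            return False
    return True


# Table from the add_cpds example: 3 states, 6 parent configurations.
grade_values = [[0.1] * 6, [0.1] * 6, [0.8] * 6]
# A deliberately broken table: the second column sums to 1.4.
bad_values = [[0.1, 0.5], [0.1, 0.1], [0.8, 0.8]]

print(columns_sum_to_one(grade_values))  # True
print(columns_sum_to_one(bad_values))    # False
```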

copy()[source]

Returns a copy of the model.

Returns

Copy of the model on which the method was called.

Return type

BayesianModel

Examples

>>> from pgmpy.models import BayesianModel
>>> from pgmpy.factors.discrete import TabularCPD
>>> model = BayesianModel([('A', 'B'), ('B', 'C')])
>>> cpd_a = TabularCPD('A', 2, [[0.2], [0.8]])
>>> cpd_b = TabularCPD('B', 2, [[0.3, 0.7], [0.7, 0.3]],
...                    evidence=['A'],
...                    evidence_card=[2])
>>> cpd_c = TabularCPD('C', 2, [[0.1, 0.9], [0.9, 0.1]],
...                    evidence=['B'],
...                    evidence_card=[2])
>>> model.add_cpds(cpd_a, cpd_b, cpd_c)
>>> copy_model = model.copy()
>>> copy_model.nodes()
['C', 'A', 'B']
>>> copy_model.edges()
[('A', 'B'), ('B', 'C')]
>>> copy_model.get_cpds()
[<TabularCPD representing P(A:2) at 0x7f2824930a58>,
 <TabularCPD representing P(B:2 | A:2) at 0x7f2824930a90>,
 <TabularCPD representing P(C:2 | B:2) at 0x7f2824944240>]
fit(data, estimator=None, state_names=[], complete_samples_only=True, **kwargs)[source]

Estimates the CPD for each variable based on a given data set.

Parameters
  • data (pandas DataFrame object) – DataFrame object with column names identical to the variable names of the network. If some values in the data are missing, the data cells should be set to numpy.NaN. Note that pandas converts each column containing numpy.NaN values to dtype float.

  • estimator (Estimator class) – One of:

    • MaximumLikelihoodEstimator (default)

    • BayesianEstimator: in this case, pass ‘prior_type’ and either ‘pseudo_counts’ or ‘equivalent_sample_size’ as additional keyword arguments. See BayesianEstimator.get_parameters() for usage.

  • state_names (dict (optional)) – A dict indicating, for each variable, the discrete set of states that the variable can take. If unspecified, the observed values in the data set are taken to be the only possible states.

  • complete_samples_only (bool (default True)) – Specifies how to deal with missing data, if present. If set to True, all rows that contain np.NaN somewhere are ignored. If False, then for each variable, every row where neither the variable nor its parents are np.NaN is used.

Examples

>>> import pandas as pd
>>> from pgmpy.models import BayesianModel
>>> from pgmpy.estimators import MaximumLikelihoodEstimator
>>> data = pd.DataFrame(data={'A': [0, 0, 1], 'B': [0, 1, 0], 'C': [1, 1, 0]})
>>> model = BayesianModel([('A', 'C'), ('B', 'C')])
>>> model.fit(data)
>>> model.get_cpds()
[<TabularCPD representing P(A:2) at 0x7fb98a7d50f0>,
<TabularCPD representing P(B:2) at 0x7fb98a7d5588>,
<TabularCPD representing P(C:2 | A:2, B:2) at 0x7fb98a7b1f98>]
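
For intuition, maximum likelihood estimation of discrete CPDs amounts to taking relative frequencies in the data. The following plain-Python sketch reproduces that idea on the same three-row data set. It does not call pgmpy, and the helper names mle_marginal and mle_conditional are assumptions made for illustration.

```python
from collections import Counter

# Same data as in the example above, stored column-wise.
data = {'A': [0, 0, 1], 'B': [0, 1, 0], 'C': [1, 1, 0]}


def mle_marginal(column):
    """Relative frequency of each observed state in one column."""
    counts = Counter(column)
    return {state: counts[state] / len(column) for state in sorted(counts)}


def mle_conditional(data, child, parents):
    """Relative frequency of child states for each observed parent configuration."""
    parent_rows = list(zip(*(data[p] for p in parents)))
    joint = Counter(zip(parent_rows, data[child]))
    parent_counts = Counter(parent_rows)
    return {(pa, state): n / parent_counts[pa]
            for (pa, state), n in joint.items()}


p_a = mle_marginal(data['A'])                 # estimate of P(A)
p_c = mle_conditional(data, 'C', ['A', 'B'])  # estimate of P(C | A, B)
print(p_a)
print(p_c)
```

Parent configurations that never occur in the data (here A=1, B=1) get no entry in this sketch; how a real estimator fills such columns is outside its scope.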
get_cardinality(node=None)[source]

Returns the cardinality of the node. Throws an error if the CPD for the queried node hasn’t been added to the network.

Parameters

node (any hashable Python object (optional)) – The node whose cardinality we want. If node is not specified, returns a dictionary with the variables as keys and their respective cardinalities as values.

Returns

If node is specified, the cardinality of the node. If node is not specified, a dictionary with the variables as keys and their respective cardinalities as values.

Return type

int or dict

Examples

>>> from pgmpy.models import BayesianModel
>>> from pgmpy.factors.discrete import TabularCPD
>>> student = BayesianModel([('diff', 'grade'), ('intel', 'grade')])
>>> cpd_diff = TabularCPD('diff', 2, [[0.6], [0.4]])
>>> cpd_intel = TabularCPD('intel', 2, [[0.7], [0.3]])
>>> cpd_grade = TabularCPD('grade', 2, [[0.1, 0.9, 0.2, 0.7],
...                                     [0.9, 0.1, 0.8, 0.3]],
...                                 ['intel', 'diff'], [2, 2])
>>> student.add_cpds(cpd_diff, cpd_intel, cpd_grade)
>>> student.get_cardinality()
defaultdict(<class 'int'>, {'diff': 2, 'grade': 2, 'intel': 2})
>>> student.get_cardinality('intel')
2
get_cpds(node=None)[source]

Returns the CPD of the node. If node is not specified, returns all the CPDs that have been added to the graph so far.

Parameters

node (any hashable Python object (optional)) – The node whose CPD we want. If node is not specified, returns all the CPDs added to the model.

Returns

A list of TabularCPDs.

Examples

>>> from pgmpy.models import BayesianModel
>>> from pgmpy.factors.discrete import TabularCPD
>>> student = BayesianModel([('diff', 'grade'), ('intel', 'grade')])
>>> cpd = TabularCPD('grade', 2, [[0.1, 0.9, 0.2, 0.7],
...                               [0.9, 0.1, 0.8, 0.3]],
...                  ['intel', 'diff'], [2, 2])
>>> student.add_cpds(cpd)
>>> student.get_cpds()
get_factorized_product(latex=False)[source]
get_markov_blanket(node)[source]

Returns a Markov blanket for a random variable. In the case of Bayesian networks, the Markov blanket is the set of the node’s parents, its children, and its children’s other parents.

Returns

List of nodes contained in the Markov blanket.

Return type

list

Parameters

node (string, int or any hashable Python object) – The node whose Markov blanket is to be returned.

Examples

>>> from pgmpy.models import BayesianModel
>>> from pgmpy.factors.discrete import TabularCPD
>>> G = BayesianModel([('x', 'y'), ('z', 'y'), ('y', 'w'), ('y', 'v'), ('u', 'w'),
...                    ('s', 'v'), ('w', 't'), ('w', 'm'), ('v', 'n'), ('v', 'q')])
>>> G.get_markov_blanket('y')
['s', 'w', 'x', 'u', 'z', 'v']
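
The definition above (parents, children, and the children's other parents) can be traced by hand with a small plain-Python sketch over the same edge list. It is independent of pgmpy; the function name markov_blanket is an assumption made for illustration.

```python
def markov_blanket(edges, node):
    """Markov blanket of `node` in a directed graph given as (parent, child) pairs."""
    parents = {u for u, v in edges if v == node}
    children = {v for u, v in edges if u == node}
    # Every parent of a child; this includes `node` itself, removed below.
    co_parents = {u for u, v in edges if v in children}
    return (parents | children | co_parents) - {node}


edges = [('x', 'y'), ('z', 'y'), ('y', 'w'), ('y', 'v'), ('u', 'w'),
         ('s', 'v'), ('w', 't'), ('w', 'm'), ('v', 'n'), ('v', 'q')]

blanket = markov_blanket(edges, 'y')
print(sorted(blanket))  # ['s', 'u', 'v', 'w', 'x', 'z']
```

The result is the same set of nodes as in the example output above; only the ordering differs, because the blanket is a set.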
is_imap(JPD)[source]

Checks whether the Bayesian model is an I-map of the given JointProbabilityDistribution.

Parameters

JPD (JointProbabilityDistribution) – An instance of the JointProbabilityDistribution class for which you want to check the I-map.

Returns

True if the Bayesian model is an I-map of the given joint probability distribution, False otherwise.

Return type

boolean

Examples

>>> from pgmpy.models import BayesianModel
>>> from pgmpy.factors.discrete import TabularCPD
>>> from pgmpy.factors.discrete import JointProbabilityDistribution
>>> G = BayesianModel([('diff', 'grade'), ('intel', 'grade')])
>>> diff_cpd = TabularCPD('diff', 2, [[0.2], [0.8]])
>>> intel_cpd = TabularCPD('intel', 3, [[0.5], [0.3], [0.2]])
>>> grade_cpd = TabularCPD('grade', 3,
...                        [[0.1,0.1,0.1,0.1,0.1,0.1],
...                         [0.1,0.1,0.1,0.1,0.1,0.1],
...                         [0.8,0.8,0.8,0.8,0.8,0.8]],
...                        evidence=['diff', 'intel'],
...                        evidence_card=[2, 3])
>>> G.add_cpds(diff_cpd, intel_cpd, grade_cpd)
>>> val = [0.01, 0.01, 0.08, 0.006, 0.006, 0.048, 0.004, 0.004, 0.032,
...        0.04, 0.04, 0.32, 0.024, 0.024, 0.192, 0.016, 0.016, 0.128]
>>> JPD = JointProbabilityDistribution(['diff', 'intel', 'grade'], [2, 3, 3], val)
>>> G.is_imap(JPD)
True
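
To see why the example returns True, note that a distribution which factorizes as the product of the model's CPDs has the model as an I-map. The sketch below recomputes that product in plain Python and compares it with val. It is only an illustration under the assumption that the joint values are ordered with the last variable ('grade') varying fastest; it is not a description of how pgmpy performs the check.

```python
import itertools
import math

p_diff = [0.2, 0.8]
p_intel = [0.5, 0.3, 0.2]
# In the example, P(grade | diff, intel) is the same column for every
# parent configuration, so a single list is enough here.
p_grade = [0.1, 0.1, 0.8]

# Joint over (diff, intel, grade); 'grade' varies fastest.
joint = [p_diff[d] * p_intel[i] * p_grade[g]
         for d, i, g in itertools.product(range(2), range(3), range(3))]

val = [0.01, 0.01, 0.08, 0.006, 0.006, 0.048, 0.004, 0.004, 0.032,
       0.04, 0.04, 0.32, 0.024, 0.024, 0.192, 0.016, 0.016, 0.128]

matches = all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(joint, val))
print(matches)  # True
```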
predict(data, n_jobs=-1)[source]

Predicts states of all the missing variables.

Parameters

data (pandas DataFrame object) – A DataFrame object with column names same as the variables in the model.

Examples

>>> import numpy as np
>>> import pandas as pd
>>> from pgmpy.models import BayesianModel
>>> values = pd.DataFrame(np.random.randint(low=0, high=2, size=(1000, 5)),
...                       columns=['A', 'B', 'C', 'D', 'E'])
>>> train_data = values[:800]
>>> predict_data = values[800:]
>>> model = BayesianModel([('A', 'B'), ('C', 'B'), ('C', 'D'), ('B', 'E')])
>>> model.fit(train_data)
>>> predict_data = predict_data.copy()
>>> predict_data.drop('E', axis=1, inplace=True)
>>> y_pred = model.predict(predict_data)
>>> y_pred
    E
800 0
801 1
802 1
803 1
804 0
... ...
993 0
994 0
995 1
996 1
997 0
998 0
999 0
predict_probability(data)[source]

Predicts probabilities of all states of the missing variables.

Parameters

data (pandas DataFrame object) – A DataFrame object with column names same as the variables in the model.

Examples

>>> import numpy as np
>>> import pandas as pd
>>> from pgmpy.models import BayesianModel
>>> values = pd.DataFrame(np.random.randint(low=0, high=2, size=(100, 5)),
...                       columns=['A', 'B', 'C', 'D', 'E'])
>>> train_data = values[:80]
>>> predict_data = values[80:]
>>> model = BayesianModel([('A', 'B'), ('C', 'B'), ('C', 'D'), ('B', 'E')])
>>> model.fit(train_data)
>>> predict_data = predict_data.copy()
>>> predict_data.drop('B', axis=1, inplace=True)
>>> y_prob = model.predict_probability(predict_data)
>>> y_prob
    B_0         B_1
80  0.439178    0.560822
81  0.581970    0.418030
82  0.488275    0.511725
83  0.581970    0.418030
84  0.510794    0.489206
85  0.439178    0.560822
86  0.439178    0.560822
87  0.417124    0.582876
88  0.407978    0.592022
89  0.429905    0.570095
90  0.581970    0.418030
91  0.407978    0.592022
92  0.429905    0.570095
93  0.429905    0.570095
94  0.439178    0.560822
95  0.407978    0.592022
96  0.559904    0.440096
97  0.417124    0.582876
98  0.488275    0.511725
99  0.407978    0.592022
remove_cpds(*cpds)[source]

Removes the cpds that are provided in the argument.

Parameters

*cpds (TabularCPD object) – The CPD objects that are to be removed from the model.

Examples

>>> from pgmpy.models import BayesianModel
>>> from pgmpy.factors.discrete import TabularCPD
>>> student = BayesianModel([('diff', 'grade'), ('intel', 'grade')])
>>> cpd = TabularCPD('grade', 2, [[0.1, 0.9, 0.2, 0.7],
...                               [0.9, 0.1, 0.8, 0.3]],
...                  ['intel', 'diff'], [2, 2])
>>> student.add_cpds(cpd)
>>> student.remove_cpds(cpd)
remove_node(node)[source]

Remove node from the model.

Removing a node also removes all the associated edges, removes the CPD of the node and marginalizes the CPDs of its children.

Parameters

node (node) – Node which is to be removed from the model.

Returns

Return type

None

Examples

>>> import pandas as pd
>>> import numpy as np
>>> from pgmpy.models import BayesianModel
>>> model = BayesianModel([('A', 'B'), ('B', 'C'),
...                        ('A', 'D'), ('D', 'C')])
>>> values = pd.DataFrame(np.random.randint(low=0, high=2, size=(1000, 4)),
...                       columns=['A', 'B', 'C', 'D'])
>>> model.fit(values)
>>> model.get_cpds()
[<TabularCPD representing P(A:2) at 0x7f28248e2438>,
 <TabularCPD representing P(B:2 | A:2) at 0x7f28248e23c8>,
 <TabularCPD representing P(C:2 | B:2, D:2) at 0x7f28248e2748>,
 <TabularCPD representing P(D:2 | A:2) at 0x7f28248e26a0>]
>>> model.remove_node('A')
>>> model.get_cpds()
[<TabularCPD representing P(B:2) at 0x7f28248e23c8>,
 <TabularCPD representing P(C:2 | B:2, D:2) at 0x7f28248e2748>,
 <TabularCPD representing P(D:2) at 0x7f28248e26a0>]
remove_nodes_from(nodes)[source]

Remove multiple nodes from the model.

Removing a node also removes all the associated edges, removes the CPD of the node and marginalizes the CPDs of its children.

Parameters

nodes (list, set (iterable)) – Nodes which are to be removed from the model.

Returns

Return type

None

Examples

>>> import pandas as pd
>>> import numpy as np
>>> from pgmpy.models import BayesianModel
>>> model = BayesianModel([('A', 'B'), ('B', 'C'),
...                        ('A', 'D'), ('D', 'C')])
>>> values = pd.DataFrame(np.random.randint(low=0, high=2, size=(1000, 4)),
...                       columns=['A', 'B', 'C', 'D'])
>>> model.fit(values)
>>> model.get_cpds()
[<TabularCPD representing P(A:2) at 0x7f28248e2438>,
 <TabularCPD representing P(B:2 | A:2) at 0x7f28248e23c8>,
 <TabularCPD representing P(C:2 | B:2, D:2) at 0x7f28248e2748>,
 <TabularCPD representing P(D:2 | A:2) at 0x7f28248e26a0>]
>>> model.remove_nodes_from(['A', 'B'])
>>> model.get_cpds()
[<TabularCPD representing P(C:2 | D:2) at 0x7f28248e2a58>,
 <TabularCPD representing P(D:2) at 0x7f28248e26d8>]
to_junction_tree()[source]

Creates a junction tree (or clique tree) for a given bayesian model.

For converting a Bayesian Model into a Clique tree, first it is converted into a Markov one.

For a given Markov model (H), a junction tree (G) is a graph in which:

  1. each node in G corresponds to a maximal clique in H;

  2. each sepset in G separates the variables strictly on one side of the edge from those on the other side.

Examples

>>> from pgmpy.models import BayesianModel
>>> from pgmpy.factors.discrete import TabularCPD
>>> G = BayesianModel([('diff', 'grade'), ('intel', 'grade'),
...                    ('intel', 'SAT'), ('grade', 'letter')])
>>> diff_cpd = TabularCPD('diff', 2, [[0.2], [0.8]])
>>> intel_cpd = TabularCPD('intel', 3, [[0.5], [0.3], [0.2]])
>>> grade_cpd = TabularCPD('grade', 3,
...                        [[0.1,0.1,0.1,0.1,0.1,0.1],
...                         [0.1,0.1,0.1,0.1,0.1,0.1],
...                         [0.8,0.8,0.8,0.8,0.8,0.8]],
...                        evidence=['diff', 'intel'],
...                        evidence_card=[2, 3])
>>> sat_cpd = TabularCPD('SAT', 2,
...                      [[0.1, 0.2, 0.7],
...                       [0.9, 0.8, 0.3]],
...                      evidence=['intel'], evidence_card=[3])
>>> letter_cpd = TabularCPD('letter', 2,
...                         [[0.1, 0.4, 0.8],
...                          [0.9, 0.6, 0.2]],
...                         evidence=['grade'], evidence_card=[3])
>>> G.add_cpds(diff_cpd, intel_cpd, grade_cpd, sat_cpd, letter_cpd)
>>> jt = G.to_junction_tree()
to_markov_model()[source]

Converts the Bayesian model to a Markov model. The Markov model created is the moral graph of the Bayesian model.

Examples

>>> from pgmpy.models import BayesianModel
>>> G = BayesianModel([('diff', 'grade'), ('intel', 'grade'),
...                    ('intel', 'SAT'), ('grade', 'letter')])
>>> mm = G.to_markov_model()
>>> mm.nodes()
['diff', 'grade', 'intel', 'SAT', 'letter']
>>> mm.edges()
[('diff', 'intel'), ('diff', 'grade'), ('intel', 'grade'),
('intel', 'SAT'), ('grade', 'letter')]
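
Moralization itself is simple to sketch: drop edge directions and connect every pair of parents that share a child. The plain-Python version below runs on the same edge list without pgmpy; the function name moralize is an assumption made for illustration.

```python
from itertools import combinations


def moralize(directed_edges):
    """Return the undirected edges of the moral graph as a set of frozensets."""
    # Step 1: forget edge directions.
    undirected = {frozenset(edge) for edge in directed_edges}
    # Step 2: "marry" all parents of each child.
    parents = {}
    for parent, child in directed_edges:
        parents.setdefault(child, set()).add(parent)
    for parent_set in parents.values():
        for a, b in combinations(sorted(parent_set), 2):
            undirected.add(frozenset((a, b)))
    return undirected


edges = [('diff', 'grade'), ('intel', 'grade'),
         ('intel', 'SAT'), ('grade', 'letter')]
moral = moralize(edges)
print(sorted(tuple(sorted(edge)) for edge in moral))
```

The only edge added by step 2 is the one between 'diff' and 'intel', the two parents of 'grade', which matches the extra ('diff', 'intel') edge in the output above.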

Markov Model

class pgmpy.models.MarkovModel.MarkovModel(ebunch=None)[source]

Base class for markov model.

A MarkovModel stores nodes and edges with potentials.

MarkovModel holds undirected edges.

Parameters

ebunch (input graph) – Data to initialize the graph. If ebunch=None (default), an empty graph is created. The data can be an edge list or any NetworkX graph object.

Examples

Create an empty Markov Model with no nodes and no edges.

>>> from pgmpy.models import MarkovModel
>>> G = MarkovModel()

G can be grown in several ways.

Nodes:

Add one node at a time:

>>> G.add_node('a')

Add the nodes from any container (a list, set or tuple or the nodes from another graph).

>>> G.add_nodes_from(['a', 'b'])

Edges:

G can also be grown by adding edges.

Add one edge,

>>> G.add_edge('a', 'b')

a list of edges,

>>> G.add_edges_from([('a', 'b'), ('b', 'c')])

If some edges connect nodes not yet in the model, the nodes are added automatically. There are no errors when adding nodes or edges that already exist.

Shortcuts:

Many common graph features allow Python syntax to speed up reporting.

>>> 'a' in G     # check if node in graph
True
>>> len(G)  # number of nodes in graph
3
add_edge(u, v, **kwargs)[source]

Add an edge between u and v.

The nodes u and v will be automatically added if they are not already in the graph.

Parameters

u,v (nodes) – Nodes can be any hashable Python object.

Examples

>>> from pgmpy.models import MarkovModel
>>> G = MarkovModel()
>>> G.add_nodes_from(['Alice', 'Bob', 'Charles'])
>>> G.add_edge('Alice', 'Bob')
add_factors(*factors)[source]

Associate a factor with the graph. See the factors class for the order of potential values.

Parameters

*factor (pgmpy.factors.factors object) – A factor object on any subset of the variables of the model which is to be associated with the model.

Returns

Return type

None

Examples

>>> import numpy as np
>>> from pgmpy.models import MarkovModel
>>> from pgmpy.factors.discrete import DiscreteFactor
>>> student = MarkovModel([('Alice', 'Bob'), ('Bob', 'Charles'),
...                        ('Charles', 'Debbie'), ('Debbie', 'Alice')])
>>> factor = DiscreteFactor(['Alice', 'Bob'], cardinality=[3, 2],
...                 values=np.random.rand(6))
>>> student.add_factors(factor)
check_model()[source]

Check the model for various errors. This method checks for the following errors -

  • Checks if the cardinalities of all the variables are consistent across all the factors.

  • Factors are defined for all the random variables.

Returns

check – True if all the checks are passed

Return type

boolean

copy()[source]

Returns a copy of this Markov Model.

Returns

Copy of this Markov model.

Return type

MarkovModel

Examples

>>> import numpy as np
>>> from pgmpy.factors.discrete import DiscreteFactor
>>> from pgmpy.models import MarkovModel
>>> G = MarkovModel()
>>> G.add_nodes_from([('a', 'b'), ('b', 'c')])
>>> G.add_edge(('a', 'b'), ('b', 'c'))
>>> G_copy = G.copy()
>>> G_copy.edges()
[(('a', 'b'), ('b', 'c'))]
>>> G_copy.nodes()
[('a', 'b'), ('b', 'c')]
>>> factor = DiscreteFactor([('a', 'b')], cardinality=[3],
...                 values=np.random.rand(3))
>>> G.add_factors(factor)
>>> G.get_factors()
[<DiscreteFactor representing phi(('a', 'b'):3) at 0x...>]
>>> G_copy.get_factors()
[]
get_cardinality(node=None)[source]

Returns the cardinality of the node. If node is not specified returns a dictionary with the given variable as keys and their respective cardinality as values.

Parameters

node (any hashable python object (optional)) – The node whose cardinality we want. If node is not specified returns a dictionary with the given variable as keys and their respective cardinality as values.

Examples

>>> import numpy as np
>>> from pgmpy.models import MarkovModel
>>> from pgmpy.factors.discrete import DiscreteFactor
>>> student = MarkovModel([('Alice', 'Bob'), ('Bob', 'Charles')])
>>> factor = DiscreteFactor(['Alice', 'Bob'], cardinality=[2, 2],
...                 values=np.random.rand(4))
>>> student.add_factors(factor)
>>> student.get_cardinality(node='Alice')
2
>>> student.get_cardinality()
defaultdict(<class 'int'>, {'Bob': 2, 'Alice': 2})
get_factors(node=None)[source]

Returns all the factors containing the node. If node is not specified returns all the factors that have been added till now to the graph.

Parameters

node (any hashable Python object (optional)) – The node whose factors we want. If node is not specified, returns all the factors added to the model.

Examples

>>> import numpy as np
>>> from pgmpy.models import MarkovModel
>>> from pgmpy.factors.discrete import DiscreteFactor
>>> student = MarkovModel([('Alice', 'Bob'), ('Bob', 'Charles')])
>>> factor1 = DiscreteFactor(['Alice', 'Bob'], cardinality=[2, 2],
...                 values=np.random.rand(4))
>>> factor2 = DiscreteFactor(['Bob', 'Charles'], cardinality=[2, 3],
...                                     values=np.ones(6))
>>> student.add_factors(factor1,factor2)
>>> student.get_factors()
[<DiscreteFactor representing phi(Alice:2, Bob:2) at 0x7f8a0e9bf630>,
 <DiscreteFactor representing phi(Bob:2, Charles:3) at 0x7f8a0e9bf5f8>]
>>> student.get_factors('Alice')
[<DiscreteFactor representing phi(Alice:2, Bob:2) at 0x7f8a0e9bf630>]
get_local_independencies(latex=False)[source]

Returns all the local independencies present in the markov model.

Local independencies are independence assertions of the form

X \perp W - \{X\} - MB(X) \mid MB(X)

where MB(X) is the Markov blanket of the random variables in X.

Parameters

latex (boolean) – If latex=True, a LaTeX string of the independence assertion is created.

Examples

>>> from pgmpy.models import MarkovModel
>>> mm = MarkovModel()
>>> mm.add_nodes_from(['x1', 'x2', 'x3', 'x4', 'x5', 'x6', 'x7'])
>>> mm.add_edges_from([('x1', 'x3'), ('x1', 'x4'), ('x2', 'x4'),
...                    ('x2', 'x5'), ('x3', 'x6'), ('x4', 'x6'),
...                    ('x4', 'x7'), ('x5', 'x7')])
>>> mm.get_local_independencies()
get_partition_function()[source]

Returns the partition function for a given undirected graph.

A partition function is defined as

\sum_{X}(\prod_{i=1}^{m} \phi_i)

where m is the number of factors present in the graph and X are all the random variables present.

Examples

>>> import numpy as np
>>> from pgmpy.models import MarkovModel
>>> from pgmpy.factors.discrete import DiscreteFactor
>>> G = MarkovModel()
>>> G.add_nodes_from(['x1', 'x2', 'x3', 'x4', 'x5', 'x6', 'x7'])
>>> G.add_edges_from([('x1', 'x3'), ('x1', 'x4'), ('x2', 'x4'),
...                   ('x2', 'x5'), ('x3', 'x6'), ('x4', 'x6'),
...                   ('x4', 'x7'), ('x5', 'x7')])
>>> phi = [DiscreteFactor(edge, [2, 2], np.random.rand(4)) for edge in G.edges()]
>>> G.add_factors(*phi)
>>> G.get_partition_function()
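
The formula above can be checked by brute force on a tiny model: enumerate every joint assignment, multiply the factor values, and add everything up. The following plain-Python sketch does this for two hand-picked factors. The names partition_function, phi_ab and phi_bc are assumptions made for illustration, and the enumeration is only practical for very small models.

```python
from itertools import product


def partition_function(cardinalities, factors):
    """Sum over all assignments of the product of all factor values.

    cardinalities: dict mapping variable name -> number of states
    factors: list of (scope, table) pairs, where table maps a tuple of
             states (in scope order) to a non-negative value
    """
    variables = sorted(cardinalities)
    total = 0.0
    for states in product(*(range(cardinalities[v]) for v in variables)):
        assignment = dict(zip(variables, states))
        weight = 1.0
        for scope, table in factors:
            weight *= table[tuple(assignment[v] for v in scope)]
        total += weight
    return total


card = {'a': 2, 'b': 2, 'c': 2}
phi_ab = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 3.0, (1, 1): 4.0}
phi_bc = {(0, 0): 1.0, (0, 1): 1.0, (1, 0): 2.0, (1, 1): 0.5}

Z = partition_function(card, [(('a', 'b'), phi_ab), (('b', 'c'), phi_bc)])
print(Z)  # 23.0
```

The value can be confirmed by hand: for b=0 the two factors contribute (1 + 3) * (1 + 1) = 8, and for b=1 they contribute (2 + 4) * (2 + 0.5) = 15.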
markov_blanket(node)[source]

Returns a markov blanket for a random variable.

In a Markov model, the Markov blanket of a node is the set of its neighboring nodes.

Examples

>>> from pgmpy.models import MarkovModel
>>> mm = MarkovModel()
>>> mm.add_nodes_from(['x1', 'x2', 'x3', 'x4', 'x5', 'x6', 'x7'])
>>> mm.add_edges_from([('x1', 'x3'), ('x1', 'x4'), ('x2', 'x4'),
...                    ('x2', 'x5'), ('x3', 'x6'), ('x4', 'x6'),
...                    ('x4', 'x7'), ('x5', 'x7')])
>>> mm.markov_blanket('x1')
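
Since the example above shows no output, here is a plain-Python sketch of the same lookup on the same edge list. It does not call pgmpy; the function name neighbours is an assumption made for illustration.

```python
def neighbours(edges, node):
    """Nodes that share an undirected edge with `node`."""
    result = set()
    for u, v in edges:
        if u == node:
            result.add(v)
        elif v == node:
            result.add(u)
    return result


edges = [('x1', 'x3'), ('x1', 'x4'), ('x2', 'x4'),
         ('x2', 'x5'), ('x3', 'x6'), ('x4', 'x6'),
         ('x4', 'x7'), ('x5', 'x7')]

print(sorted(neighbours(edges, 'x1')))  # ['x3', 'x4']
```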
remove_factors(*factors)[source]

Removes the given factors from the added factors.

Examples

>>> import numpy as np
>>> from pgmpy.models import MarkovModel
>>> from pgmpy.factors.discrete import DiscreteFactor
>>> student = MarkovModel([('Alice', 'Bob'), ('Bob', 'Charles')])
>>> factor = DiscreteFactor(['Alice', 'Bob'], cardinality=[2, 2],
...                 values=np.random.rand(4))
>>> student.add_factors(factor)
>>> student.remove_factors(factor)
to_bayesian_model()[source]

Creates a Bayesian Model which is a minimum I-Map for this markov model.

The ordering of parents may differ between calls, because it depends on the ordering of variables in the junction tree, which is not constant.

Examples

>>> import numpy as np
>>> from pgmpy.models import MarkovModel
>>> from pgmpy.factors.discrete import DiscreteFactor
>>> mm = MarkovModel()
>>> mm.add_nodes_from(['x1', 'x2', 'x3', 'x4', 'x5', 'x6', 'x7'])
>>> mm.add_edges_from([('x1', 'x3'), ('x1', 'x4'), ('x2', 'x4'),
...                    ('x2', 'x5'), ('x3', 'x6'), ('x4', 'x6'),
...                    ('x4', 'x7'), ('x5', 'x7')])
>>> phi = [DiscreteFactor(edge, [2, 2], np.random.rand(4)) for edge in mm.edges()]
>>> mm.add_factors(*phi)
>>> bm = mm.to_bayesian_model()
to_factor_graph()[source]

Converts the markov model into factor graph.

A factor graph contains two types of nodes. One type corresponds to random variables whereas the second type corresponds to factors over these variables. The graph only contains edges between variables and factor nodes. Each factor node is associated with one factor whose scope is the set of variables that are its neighbors.

Examples

>>> import numpy as np
>>> from pgmpy.models import MarkovModel
>>> from pgmpy.factors.discrete import DiscreteFactor
>>> student = MarkovModel([('Alice', 'Bob'), ('Bob', 'Charles')])
>>> factor1 = DiscreteFactor(['Alice', 'Bob'], [3, 2], np.random.rand(6))
>>> factor2 = DiscreteFactor(['Bob', 'Charles'], [2, 2], np.random.rand(4))
>>> student.add_factors(factor1, factor2)
>>> factor_graph = student.to_factor_graph()
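
The structure described above can be sketched without pgmpy: every factor becomes its own node and is connected to exactly the variables in its scope. In the sketch below the factor node labels ('phi_Alice_Bob' and so on) and the function name factor_graph_edges are assumptions made for illustration; they are not the labels pgmpy produces.

```python
def factor_graph_edges(factor_scopes):
    """Build (variable, factor_node) edges from a mapping factor_node -> scope."""
    edges = []
    for factor_node, scope in factor_scopes.items():
        for variable in scope:
            edges.append((variable, factor_node))
    return edges


scopes = {
    'phi_Alice_Bob': ('Alice', 'Bob'),
    'phi_Bob_Charles': ('Bob', 'Charles'),
}
fg_edges = factor_graph_edges(scopes)
for edge in fg_edges:
    print(edge)
```

No edge joins two variables or two factor nodes, so the resulting graph is bipartite by construction.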
to_junction_tree()[source]

Creates a junction tree (or clique tree) for a given markov model.

For a given Markov model (H), a junction tree (G) is a graph in which:

  1. each node in G corresponds to a maximal clique in H;

  2. each sepset in G separates the variables strictly on one side of the edge from those on the other side.

Examples

>>> import numpy as np
>>> from pgmpy.models import MarkovModel
>>> from pgmpy.factors.discrete import DiscreteFactor
>>> mm = MarkovModel()
>>> mm.add_nodes_from(['x1', 'x2', 'x3', 'x4', 'x5', 'x6', 'x7'])
>>> mm.add_edges_from([('x1', 'x3'), ('x1', 'x4'), ('x2', 'x4'),
...                    ('x2', 'x5'), ('x3', 'x6'), ('x4', 'x6'),
...                    ('x4', 'x7'), ('x5', 'x7')])
>>> phi = [DiscreteFactor(edge, [2, 2], np.random.rand(4)) for edge in mm.edges()]
>>> mm.add_factors(*phi)
>>> junction_tree = mm.to_junction_tree()
triangulate(heuristic='H6', order=None, inplace=False)[source]

Triangulate the graph.

If the order of deletion is given, the heuristic algorithm is not used.

Parameters
  • heuristic (H1 | H2 | H3 | H4 | H5 | H6) –

    The heuristic algorithm to use to decide the deletion order of the variables to compute the triangulated graph. Let X be the set of variables and X(i) denotes the i-th variable.

    • S(i) - The size of the clique created by deleting the variable.

    • E(i) - Cardinality of variable X(i).

    • M(i) - Maximum size of cliques given by X(i) and its adjacent nodes.

    • C(i) - Sum of size of cliques given by X(i) and its adjacent nodes.

    The heuristic algorithm decides the deletion order in this way:

    • H1 - Delete the variable with minimal S(i).

    • H2 - Delete the variable with minimal S(i)/E(i).

    • H3 - Delete the variable with minimal S(i) - M(i).

    • H4 - Delete the variable with minimal S(i) - C(i).

    • H5 - Delete the variable with minimal S(i)/M(i).

    • H6 - Delete the variable with minimal S(i)/C(i).

  • order (list, tuple (array-like)) – The order of deletion of the variables used to compute the triangulated graph. If order is given, the heuristic algorithm is not used.

  • inplace (True | False) – If inplace is True, the edges are added to the object on which the method is called; otherwise a new object is returned.

References

http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.56.3607

Examples

>>> import numpy as np
>>> from pgmpy.models import MarkovModel
>>> from pgmpy.factors.discrete import DiscreteFactor
>>> G = MarkovModel()
>>> G.add_nodes_from(['x1', 'x2', 'x3', 'x4', 'x5', 'x6', 'x7'])
>>> G.add_edges_from([('x1', 'x3'), ('x1', 'x4'), ('x2', 'x4'),
...                   ('x2', 'x5'), ('x3', 'x6'), ('x4', 'x6'),
...                   ('x4', 'x7'), ('x5', 'x7')])
>>> phi = [DiscreteFactor(edge, [2, 2], np.random.rand(4)) for edge in G.edges()]
>>> G.add_factors(*phi)
>>> G_chordal = G.triangulate()
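
When an explicit deletion order is supplied, triangulation reduces to node elimination: delete the nodes in that order and, before each deletion, connect all remaining neighbours of the node to one another. The plain-Python sketch below shows only this fixed-order case on a four-node cycle. It does not implement the heuristics H1 to H6, it is not pgmpy's code, and the name fill_in_edges is an assumption made for illustration.

```python
from itertools import combinations


def fill_in_edges(edges, order):
    """Edges that must be added to make the graph chordal for a deletion order."""
    remaining = {}
    for u, v in edges:
        remaining.setdefault(u, set()).add(v)
        remaining.setdefault(v, set()).add(u)

    fill_in = set()
    for node in order:
        neighbours = remaining.pop(node)
        # Connect every pair of not-yet-deleted neighbours of this node.
        for a, b in combinations(sorted(neighbours), 2):
            if b not in remaining[a]:
                remaining[a].add(b)
                remaining[b].add(a)
                fill_in.add((a, b))
        # Remove the deleted node from its neighbours' adjacency sets.
        for neighbour in neighbours:
            remaining[neighbour].discard(node)
    return fill_in


# A 4-cycle a-b-c-d-a has no chord, so one fill-in edge is needed.
cycle = [('a', 'b'), ('b', 'c'), ('c', 'd'), ('d', 'a')]
added = fill_in_edges(cycle, order=['a', 'b', 'c', 'd'])
print(added)  # {('b', 'd')}
```

Different deletion orders can add different, and differently sized, sets of fill-in edges, which is why the choice of order or heuristic matters.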

Factor Graph

class pgmpy.models.FactorGraph.FactorGraph(ebunch=None)[source]

Class for representing factor graph.

A factor graph is a bipartite graph representing the factorization of a function. It allows efficient computation of marginal distributions through the sum-product algorithm.

A factor graph contains two types of nodes. One type corresponds to random variables whereas the second type corresponds to factors over these variables. The graph only contains edges between variables and factor nodes. Each factor node is associated with one factor whose scope is the set of variables that are its neighbors.

Parameters

ebunch (input graph) – Data to initialize the graph. If ebunch=None (default), an empty graph is created. The data is an edge list.

Examples

Create an empty FactorGraph with no nodes and no edges

>>> import numpy as np
>>> from pgmpy.models import FactorGraph
>>> from pgmpy.factors.discrete import DiscreteFactor
>>> G = FactorGraph()

G can be grown by adding variable nodes as well as factor nodes

Nodes:

Add a node at a time or a list of nodes.

>>> G.add_node('a')
>>> G.add_nodes_from(['a', 'b'])
>>> phi1 = DiscreteFactor(['a', 'b'], [2, 2], np.random.rand(4))
>>> G.add_factors(phi1)
>>> G.add_nodes_from([phi1])

Edges:

G can also be grown by adding edges.

>>> G.add_edge('a', phi1)

or a list of edges

>>> G.add_edges_from([('a', phi1), ('b', phi1)])
add_edge(u, v, **kwargs)[source]

Add an edge between variable_node and factor_node.

Parameters

u, v (nodes) – Nodes can be any hashable Python object.

Examples

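>>> import numpy as np
>>> from pgmpy.models import FactorGraph
>>> from pgmpy.factors.discrete import DiscreteFactor
>>> G = FactorGraph()
>>> G.add_nodes_from(['a', 'b', 'c'])
>>> phi1 = DiscreteFactor(['a', 'b'], [2, 2], np.random.rand(4))
>>> phi2 = DiscreteFactor(['b', 'c'], [2, 2], np.random.rand(4))
>>> G.add_nodes_from([phi1, phi2])
>>> G.add_edge('a', phi1)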
add_factors(*factors)[source]

Associate a factor to the graph. See factors class for the order of potential values.

Parameters

*factor (pgmpy.factors.DiscreteFactor object) – A factor object on any subset of the variables of the model which is to be associated with the model.

Examples

>>> import numpy as np
>>> from pgmpy.models import FactorGraph
>>> from pgmpy.factors.discrete import DiscreteFactor
>>> G = FactorGraph()
>>> G.add_nodes_from(['a', 'b', 'c'])
>>> phi1 = DiscreteFactor(['a', 'b'], [2, 2], np.random.rand(4))
>>> phi2 = DiscreteFactor(['b', 'c'], [2, 2], np.random.rand(4))
>>> G.add_factors(phi1, phi2)
>>> G.add_edges_from([('a', phi1), ('b', phi1),
...                   ('b', phi2), ('c', phi2)])
check_model()[source]

Check the model for various errors. This method checks for the following errors. At the same time, it also updates the cardinalities of all the random variables.

  • Check whether bipartite property of factor graph is still maintained or not.

  • Check whether factors are associated for all the random variables or not.

  • Check if factors are defined for each factor node or not.

  • Check if cardinality information for all the variables is available or not.

  • Check if cardinality of random variable remains same across all the factors.
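
The first check in the list above, the bipartite property, can be sketched in plain Python: every edge must join exactly one variable node and exactly one factor node. The sketch uses strings as stand-ins for factor objects, and the function name is_bipartite_factor_graph is an assumption made for illustration; it is not pgmpy's implementation.

```python
def is_bipartite_factor_graph(variable_nodes, factor_nodes, edges):
    """True if every edge connects one variable node to one factor node."""
    variable_nodes = set(variable_nodes)
    factor_nodes = set(factor_nodes)
    for u, v in edges:
        variable_ends = (u in variable_nodes) + (v in variable_nodes)
        factor_ends = (u in factor_nodes) + (v in factor_nodes)
        if variable_ends != 1 or factor_ends != 1:
            return False
    return True


variables = ['a', 'b', 'c']
factors = ['phi1', 'phi2']  # string stand-ins for factor objects
good_edges = [('a', 'phi1'), ('b', 'phi1'), ('b', 'phi2'), ('c', 'phi2')]
bad_edges = good_edges + [('a', 'b')]  # variable-to-variable edge

print(is_bipartite_factor_graph(variables, factors, good_edges))  # True
print(is_bipartite_factor_graph(variables, factors, bad_edges))   # False
```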

copy()[source]

Returns a copy of the model.

Returns

Copy of FactorGraph

Return type

FactorGraph

Examples

>>> import numpy as np
>>> from pgmpy.models import FactorGraph
>>> from pgmpy.factors.discrete import DiscreteFactor
>>> G = FactorGraph()
>>> G.add_nodes_from(['a', 'b', 'c'])
>>> phi1 = DiscreteFactor(['a', 'b'], [2, 2], np.random.rand(4))
>>> phi2 = DiscreteFactor(['b', 'c'], [2, 2], np.random.rand(4))
>>> G.add_factors(phi1, phi2)
>>> G.add_nodes_from([phi1, phi2])
>>> G.add_edges_from([('a', phi1), ('b', phi1),
...                   ('b', phi2), ('c', phi2)])
>>> G_copy = G.copy()
>>> G_copy.nodes()
[<DiscreteFactor representing phi(b:2, c:2) at 0xb4badd4c>, 'b', 'c',
  'a', <DiscreteFactor representing phi(a:2, b:2) at 0xb4badf2c>]
get_cardinality(node=None)[source]

Returns the cardinality of the node

Parameters

node (any hashable python object (optional)) – The node whose cardinality we want. If node is not specified returns a dictionary with the given variable as keys and their respective cardinality as values.

Returns

If node is specified, the cardinality of that node; otherwise a dictionary with the variables as keys and their respective cardinalities as values.

Return type

int or dict

Examples

>>> import numpy as np
>>> from pgmpy.models import FactorGraph
>>> from pgmpy.factors.discrete import DiscreteFactor
>>> G = FactorGraph()
>>> G.add_nodes_from(['a', 'b', 'c'])
>>> phi1 = DiscreteFactor(['a', 'b'], [2, 2], np.random.rand(4))
>>> phi2 = DiscreteFactor(['b', 'c'], [2, 2], np.random.rand(4))
>>> G.add_nodes_from([phi1, phi2])
>>> G.add_edges_from([('a', phi1), ('b', phi1),
...                   ('b', phi2), ('c', phi2)])
>>> G.add_factors(phi1, phi2)
>>> G.get_cardinality()
defaultdict(<class 'int'>, {'c': 2, 'b': 2, 'a': 2})
>>> G.get_cardinality('a')
2
get_factor_nodes()[source]

Returns the factor nodes present in the graph.

Before calling this method make sure that all the factors are added properly.

Examples

>>> import numpy as np
>>> from pgmpy.models import FactorGraph
>>> from pgmpy.factors.discrete import DiscreteFactor
>>> G = FactorGraph()
>>> G.add_nodes_from(['a', 'b', 'c'])
>>> phi1 = DiscreteFactor(['a', 'b'], [2, 2], np.random.rand(4))
>>> phi2 = DiscreteFactor(['b', 'c'], [2, 2], np.random.rand(4))
>>> G.add_nodes_from([phi1, phi2])
>>> G.add_factors(phi1, phi2)
>>> G.add_edges_from([('a', phi1), ('b', phi1),
...                   ('b', phi2), ('c', phi2)])
>>> G.get_factor_nodes()
[<DiscreteFactor representing phi(b:2, c:2) at 0x4b8c7f0>,
 <DiscreteFactor representing phi(a:2, b:2) at 0x4b8c5b0>]
get_factors(node=None)[source]

Returns the factors that have been added to the graph so far.

If node is not None, returns the factor corresponding to the given node.

Examples

>>> import numpy as np
>>> from pgmpy.models import FactorGraph
>>> from pgmpy.factors.discrete import DiscreteFactor
>>> G = FactorGraph()
>>> G.add_nodes_from(['a', 'b', 'c'])
>>> phi1 = DiscreteFactor(['a', 'b'], [2, 2], np.random.rand(4))
>>> phi2 = DiscreteFactor(['b', 'c'], [2, 2], np.random.rand(4))
>>> G.add_factors(phi1, phi2)
>>> G.add_nodes_from([phi1, phi2])
>>> G.add_edges_from([('a', phi1), ('b', phi1),
...                   ('b', phi2), ('c', phi2)])
>>> G.get_factors()
>>> G.get_factors(node=phi1)
get_partition_function()[source]

Returns the partition function for a given undirected graph.

A partition function is defined as

\sum_{X}(\prod_{i=1}^{m} \phi_i)

where m is the number of factors present in the graph and X are all the random variables present.
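
The formula above can be evaluated by brute force for small models. The following is a minimal sketch in plain Python, assuming factors are given as `(scope, table)` pairs; the function name `partition_function` and this data layout are illustrative assumptions and do not mirror pgmpy's internals.

```python
from itertools import product


def partition_function(cardinality, factors):
    """Sum, over every joint assignment, the product of all factor values.

    cardinality: dict mapping variable name -> number of states.
    factors: list of (scope, table) pairs, where table maps a tuple of
             states (ordered like scope) to a non-negative value.
    """
    variables = sorted(cardinality)
    total = 0.0
    for states in product(*(range(cardinality[v]) for v in variables)):
        assignment = dict(zip(variables, states))
        weight = 1.0
        for scope, table in factors:
            weight *= table[tuple(assignment[v] for v in scope)]
        total += weight
    return total


# phi1(a, b) is 1 everywhere and phi2(b, c) is 2 everywhere.
phi1 = (("a", "b"), {(i, j): 1.0 for i in range(2) for j in range(2)})
phi2 = (("b", "c"), {(j, k): 2.0 for j in range(2) for k in range(2)})
Z = partition_function({"a": 2, "b": 2, "c": 2}, [phi1, phi2])
print(Z)  # 8 joint assignments, each with weight 1 * 2, so Z is 16.0
```

The cost grows exponentially with the number of variables, so this is only practical as a check on tiny graphs.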

Examples

>>> import numpy as np
>>> from pgmpy.models import FactorGraph
>>> from pgmpy.factors.discrete import DiscreteFactor
>>> G = FactorGraph()
>>> G.add_nodes_from(['a', 'b', 'c'])
>>> phi1 = DiscreteFactor(['a', 'b'], [2, 2], np.random.rand(4))
>>> phi2 = DiscreteFactor(['b', 'c'], [2, 2], np.random.rand(4))
>>> G.add_factors(phi1, phi2)
>>> G.add_nodes_from([phi1, phi2])
>>> G.add_edges_from([('a', phi1), ('b', phi1),
...                   ('b', phi2), ('c', phi2)])
>>> G.get_factors()
>>> G.get_partition_function()
get_variable_nodes()[source]

Returns variable nodes present in the graph.

Before calling this method make sure that all the factors are added properly.

Examples

>>> import numpy as np
>>> from pgmpy.models import FactorGraph
>>> from pgmpy.factors.discrete import DiscreteFactor
>>> G = FactorGraph()
>>> G.add_nodes_from(['a', 'b', 'c'])
>>> phi1 = DiscreteFactor(['a', 'b'], [2, 2], np.random.rand(4))
>>> phi2 = DiscreteFactor(['b', 'c'], [2, 2], np.random.rand(4))
>>> G.add_nodes_from([phi1, phi2])
>>> G.add_factors(phi1, phi2)
>>> G.add_edges_from([('a', phi1), ('b', phi1),
...                   ('b', phi2), ('c', phi2)])
>>> G.get_variable_nodes()
['a', 'b', 'c']
remove_factors(*factors)[source]

Removes the given factors from the added factors.

Examples

>>> import numpy as np
>>> from pgmpy.models import FactorGraph
>>> from pgmpy.factors.discrete import DiscreteFactor
>>> G = FactorGraph()
>>> G.add_nodes_from(['a', 'b', 'c'])
>>> phi1 = DiscreteFactor(['a', 'b'], [2, 2], np.random.rand(4))
>>> G.add_factors(phi1)
>>> G.remove_factors(phi1)
to_junction_tree()[source]

Create a junction tree (or clique tree) for a given factor graph.

For a given factor graph (H), a junction tree (G) is a graph in which:

  • each node in G corresponds to a maximal clique in H, and

  • each sepset in G separates the variables strictly on one side of the edge from those on the other.

Examples

>>> import numpy as np
>>> from pgmpy.models import FactorGraph
>>> from pgmpy.factors.discrete import DiscreteFactor
>>> G = FactorGraph()
>>> G.add_nodes_from(['a', 'b', 'c'])
>>> phi1 = DiscreteFactor(['a', 'b'], [2, 2], np.random.rand(4))
>>> phi2 = DiscreteFactor(['b', 'c'], [2, 2], np.random.rand(4))
>>> G.add_factors(phi1, phi2)
>>> G.add_nodes_from([phi1, phi2])
>>> G.add_edges_from([('a', phi1), ('b', phi1),
...                   ('b', phi2), ('c', phi2)])
>>> jt = G.to_junction_tree()
to_markov_model()[source]

Converts the factor graph into a Markov model.

A Markov model contains random variables as nodes, and an edge between two nodes implies an interaction between them.
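
One way to picture the conversion: any two variables that appear in the scope of the same factor interact, so they become neighbours in the Markov model. The sketch below assumes that reading; the helper name `markov_edges_from_factor_scopes` is hypothetical and works on plain tuples, not pgmpy objects.

```python
from itertools import combinations


def markov_edges_from_factor_scopes(scopes):
    """Return the undirected edges implied by a list of factor scopes."""
    edges = set()
    for scope in scopes:
        # Every pair of variables sharing a factor gets an edge.
        for u, v in combinations(sorted(scope), 2):
            edges.add((u, v))
    return edges


edges = markov_edges_from_factor_scopes([("a", "b"), ("b", "c")])
print(edges)  # contains ('a', 'b') and ('b', 'c'); 'a' and 'c' share no factor
```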

Examples

>>> import numpy as np
>>> from pgmpy.models import FactorGraph
>>> from pgmpy.factors.discrete import DiscreteFactor
>>> G = FactorGraph()
>>> G.add_nodes_from(['a', 'b', 'c'])
>>> phi1 = DiscreteFactor(['a', 'b'], [2, 2], np.random.rand(4))
>>> phi2 = DiscreteFactor(['b', 'c'], [2, 2], np.random.rand(4))
>>> G.add_factors(phi1, phi2)
>>> G.add_nodes_from([phi1, phi2])
>>> G.add_edges_from([('a', phi1), ('b', phi1),
...                   ('b', phi2), ('c', phi2)])
>>> mm = G.to_markov_model()

Cluster Graph

class pgmpy.models.ClusterGraph.ClusterGraph(ebunch=None)[source]

Base class for representing Cluster Graph.

A cluster graph is an undirected graph in which each node is associated with a subset of variables. The graph contains undirected edges that connect clusters whose scopes have a non-empty intersection.

Formally, a cluster graph \mathcal{U} for a set of factors \Phi over \mathcal{X} is an undirected graph, each of whose nodes i is associated with a subset C_i \subseteq \mathcal{X}. A cluster graph must be family-preserving: each factor \phi \in \Phi must be associated with a cluster C_i, denoted \alpha(\phi), such that Scope[\phi] \subseteq C_i. Each edge between a pair of clusters C_i and C_j is associated with a sepset S_{i,j} \subseteq C_i \cap C_j.
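
The two conditions in this definition can be illustrated with sets. This is a sketch only: `sepset` returns the largest admissible sepset (the full intersection C_i \cap C_j), and `is_family_preserving` is a hypothetical helper that takes clusters and factor scopes as plain tuples.

```python
def sepset(cluster_i, cluster_j):
    """Largest possible sepset of two clusters: their intersection."""
    return set(cluster_i) & set(cluster_j)


def is_family_preserving(clusters, factor_scopes):
    """True if every factor scope is contained in at least one cluster."""
    return all(
        any(set(scope) <= set(cluster) for cluster in clusters)
        for scope in factor_scopes
    )


clusters = [("a", "b", "c"), ("a", "b"), ("a", "c")]
shared = sepset(("a", "b", "c"), ("a", "b"))                    # {'a', 'b'}
ok = is_family_preserving(clusters, [("a", "b"), ("a", "c")])    # True
missing = is_family_preserving(clusters, [("a", "d")])           # False: no cluster holds 'd'
```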

Parameters

data (input graph) – Data to initialize graph. If data=None (default) an empty graph is created. The data is an edge list

Examples

Create an empty ClusterGraph with no nodes and no edges

>>> from pgmpy.models import ClusterGraph
>>> G = ClusterGraph()

G can be grown by adding clique nodes.

Nodes:

Add a tuple (or list or set) of nodes as single clique node.

>>> G.add_node(('a', 'b', 'c'))
>>> G.add_nodes_from([('a', 'b'), ('a', 'b', 'c')])

Edges:

G can also be grown by adding edges.

>>> G.add_edge(('a', 'b', 'c'), ('a', 'b'))

or a list of edges

>>> G.add_edges_from([(('a', 'b', 'c'), ('a', 'b')),
...                   (('a', 'b', 'c'), ('a', 'c'))])
add_edge(u, v, **kwargs)[source]

Add an edge between two clique nodes.

Parameters

u, v (nodes) – Nodes can be any list or set or tuple of nodes forming a clique.

Examples

>>> from pgmpy.models import ClusterGraph
>>> G = ClusterGraph()
>>> G.add_nodes_from([('a', 'b', 'c'), ('a', 'b'), ('a', 'c')])
>>> G.add_edges_from([(('a', 'b', 'c'), ('a', 'b')),
...                   (('a', 'b', 'c'), ('a', 'c'))])
add_factors(*factors)[source]

Associate a factor to the graph. See factors class for the order of potential values

Parameters

*factor (pgmpy.factors.factors object) – A factor object on any subset of the variables of the model which is to be associated with the model.

Returns

Return type

None

Examples

>>> import numpy as np
>>> from pgmpy.models import ClusterGraph
>>> from pgmpy.factors.discrete import DiscreteFactor
>>> student = ClusterGraph()
>>> student.add_node(('Alice', 'Bob'))
>>> factor = DiscreteFactor(['Alice', 'Bob'], cardinality=[3, 2],
...                 values=np.random.rand(6))
>>> student.add_factors(factor)
add_node(node, **kwargs)[source]

Add a single node to the cluster graph.

Parameters

node (node) – A node should be a collection of nodes forming a clique. It can be a list, set or tuple of nodes

Examples

>>> from pgmpy.models import ClusterGraph
>>> G = ClusterGraph()
>>> G.add_node(('a', 'b', 'c'))
add_nodes_from(nodes, **kwargs)[source]

Add multiple nodes to the cluster graph.

Parameters

nodes (iterable container) – A container of nodes (list, dict, set, etc.).

Examples

>>> from pgmpy.models import ClusterGraph
>>> G = ClusterGraph()
>>> G.add_nodes_from([('a', 'b'), ('a', 'b', 'c')])
check_model()[source]

Check the model for various errors. This method checks for the following errors.

  • Checks if factors are defined for all the cliques or not.

  • The running intersection property is not checked explicitly here, as that is done in the add_edges method.

  • Checks if cardinality information for all the variables is available or not. If not, it raises an error.

  • Check if cardinality of random variable remains same across all the factors.

Returns

check – True if all the checks are passed

Return type

boolean

copy()[source]

Returns a copy of ClusterGraph.

Returns

Copy of ClusterGraph

Return type

ClusterGraph

Examples

>>> import numpy as np
>>> from pgmpy.models import ClusterGraph
>>> from pgmpy.factors.discrete import DiscreteFactor
>>> G = ClusterGraph()
>>> G.add_nodes_from([('a', 'b'), ('b', 'c')])
>>> G.add_edge(('a', 'b'), ('b', 'c'))
>>> phi1 = DiscreteFactor(['a', 'b'], [2, 2], np.random.rand(4))
>>> phi2 = DiscreteFactor(['b', 'c'], [2, 2], np.random.rand(4))
>>> G.add_factors(phi1, phi2)
>>> graph_copy = G.copy()
>>> graph_copy.factors
[<DiscreteFactor representing phi(a:2, b:2) at 0xb71b19cc>,
 <DiscreteFactor representing phi(b:2, c:2) at 0xb4eaf3ac>]
>>> graph_copy.edges()
[(('a', 'b'), ('b', 'c'))]
>>> graph_copy.nodes()
[('a', 'b'), ('b', 'c')]
get_cardinality(node=None)[source]

Returns the cardinality of the node

Parameters

node (any hashable python object (optional)) – The node whose cardinality we want. If node is not specified returns a dictionary with the given variable as keys and their respective cardinality as values.

Returns

If node is specified, the cardinality of that node; otherwise a dictionary with the variables as keys and their respective cardinalities as values.

Return type

int or dict

Examples

>>> import numpy as np
>>> from pgmpy.models import ClusterGraph
>>> from pgmpy.factors.discrete import DiscreteFactor
>>> student = ClusterGraph()
>>> factor = DiscreteFactor(['Alice', 'Bob'], cardinality=[2, 2],
...                 values=np.random.rand(4))
>>> student.add_node(('Alice', 'Bob'))
>>> student.add_factors(factor)
>>> student.get_cardinality()
defaultdict(<class 'int'>, {'Bob': 2, 'Alice': 2})
>>> student.get_cardinality(node='Alice')
2
get_factors(node=None)[source]

Returns the factors that have been added to the graph so far.

If node is not None, returns the factor corresponding to the given node.

Examples

>>> import numpy as np
>>> from pgmpy.models import ClusterGraph
>>> from pgmpy.factors.discrete import DiscreteFactor
>>> G = ClusterGraph()
>>> G.add_nodes_from([('a', 'b', 'c'), ('a', 'b'), ('a', 'c')])
>>> G.add_edges_from([(('a', 'b', 'c'), ('a', 'b')),
...                   (('a', 'b', 'c'), ('a', 'c'))])
>>> phi1 = DiscreteFactor(['a', 'b', 'c'], [2, 2, 2], np.random.rand(8))
>>> phi2 = DiscreteFactor(['a', 'b'], [2, 2], np.random.rand(4))
>>> phi3 = DiscreteFactor(['a', 'c'], [2, 2], np.random.rand(4))
>>> G.add_factors(phi1, phi2, phi3)
>>> G.get_factors()
>>> G.get_factors(node=('a', 'b', 'c'))
get_partition_function()[source]

Returns the partition function for a given undirected graph.

A partition function is defined as

\sum_{X}(\prod_{i=1}^{m} \phi_i)

where m is the number of factors present in the graph and X are all the random variables present.

Examples

>>> import numpy as np
>>> from pgmpy.models import ClusterGraph
>>> from pgmpy.factors.discrete import DiscreteFactor
>>> G = ClusterGraph()
>>> G.add_nodes_from([('a', 'b', 'c'), ('a', 'b'), ('a', 'c')])
>>> G.add_edges_from([(('a', 'b', 'c'), ('a', 'b')),
...                   (('a', 'b', 'c'), ('a', 'c'))])
>>> phi1 = DiscreteFactor(['a', 'b', 'c'], [2, 2, 2], np.random.rand(8))
>>> phi2 = DiscreteFactor(['a', 'b'], [2, 2], np.random.rand(4))
>>> phi3 = DiscreteFactor(['a', 'c'], [2, 2], np.random.rand(4))
>>> G.add_factors(phi1, phi2, phi3)
>>> G.get_partition_function()
remove_factors(*factors)[source]

Removes the given factors from the added factors.

Examples

>>> import numpy as np
>>> from pgmpy.models import ClusterGraph
>>> from pgmpy.factors.discrete import DiscreteFactor
>>> student = ClusterGraph()
>>> student.add_node(('Alice', 'Bob'))
>>> factor = DiscreteFactor(['Alice', 'Bob'], cardinality=[2, 2],
...                 values=np.random.rand(4))
>>> student.add_factors(factor)
>>> student.remove_factors(factor)

Junction Tree

class pgmpy.models.JunctionTree.JunctionTree(ebunch=None)[source]

Class for representing Junction Tree.

A junction tree is an undirected graph where each node represents a clique (list, tuple or set of nodes) and each edge represents the sepset between two cliques. Each sepset in G separates the variables strictly on one side of the edge from those on the other.

Parameters

data (input graph) – Data to initialize graph. If data=None (default) an empty graph is created. The data is an edge list.

Examples

Create an empty JunctionTree with no nodes and no edges

>>> from pgmpy.models import JunctionTree
>>> G = JunctionTree()

G can be grown by adding clique nodes.

Nodes:

Add a tuple (or list or set) of nodes as single clique node.

>>> G.add_node(('a', 'b', 'c'))
>>> G.add_nodes_from([('a', 'b'), ('a', 'b', 'c')])

Edges:

G can also be grown by adding edges.

>>> G.add_edge(('a', 'b', 'c'), ('a', 'b'))

or a list of edges

>>> G.add_edges_from([(('a', 'b', 'c'), ('a', 'b')),
...                   (('a', 'b', 'c'), ('a', 'c'))])
add_edge(u, v, **kwargs)[source]

Add an edge between two clique nodes.

Parameters

u, v (nodes) – Nodes can be any list or set or tuple of nodes forming a clique.

Examples

>>> from pgmpy.models import JunctionTree
>>> G = JunctionTree()
>>> G.add_nodes_from([('a', 'b', 'c'), ('a', 'b'), ('a', 'c')])
>>> G.add_edges_from([(('a', 'b', 'c'), ('a', 'b')),
...                   (('a', 'b', 'c'), ('a', 'c'))])
check_model()[source]

Check the model for various errors. This method performs the following checks and, at the same time, updates the cardinalities of all the random variables.

  • Checks if clique potentials are defined for all the cliques or not.

  • The running intersection property is not checked explicitly here, as that is done in the add_edges method.
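
For reference, the running intersection property says that, for every variable, the cliques containing it must form a connected part of the tree. The sketch below states that property directly in plain Python; it is a generic illustration under that definition, not the check pgmpy performs, and the function name is hypothetical.

```python
from collections import deque


def satisfies_running_intersection(cliques, edges):
    """Check that, for every variable, the cliques containing it are connected."""
    neighbors = {clique: set() for clique in cliques}
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    variables = {x for clique in cliques for x in clique}
    for var in variables:
        holders = {clique for clique in cliques if var in clique}
        start = next(iter(holders))
        seen, queue = {start}, deque([start])
        # Breadth-first search restricted to cliques that contain var.
        while queue:
            current = queue.popleft()
            for nxt in neighbors[current] & holders:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        if seen != holders:
            return False
    return True


cliques = [("a", "b", "c"), ("a", "b"), ("a", "c")]
edges = [(("a", "b", "c"), ("a", "b")), (("a", "b", "c"), ("a", "c"))]
good = satisfies_running_intersection(cliques, edges)

# ('a', 'b') -- ('c', 'd') -- ('a', 'c'): 'a' is at both ends but not in the middle.
broken = satisfies_running_intersection(
    [("a", "b"), ("c", "d"), ("a", "c")],
    [(("a", "b"), ("c", "d")), (("c", "d"), ("a", "c"))],
)
```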

Returns

check – True if all the checks are passed

Return type

boolean

copy()[source]

Returns a copy of JunctionTree.

Returns

Copy of JunctionTree

Return type

JunctionTree

Examples

>>> import numpy as np
>>> from pgmpy.factors.discrete import DiscreteFactor
>>> from pgmpy.models import JunctionTree
>>> G = JunctionTree()
>>> G.add_edges_from([(('a', 'b', 'c'), ('a', 'b')), (('a', 'b', 'c'), ('a', 'c'))])
>>> phi1 = DiscreteFactor(['a', 'b'], [1, 2], np.random.rand(2))
>>> phi2 = DiscreteFactor(['a', 'c'], [1, 2], np.random.rand(2))
>>> G.add_factors(phi1,phi2)
>>> modelCopy = G.copy()
>>> modelCopy.edges()
[(('a', 'b'), ('a', 'b', 'c')), (('a', 'c'), ('a', 'b', 'c'))]
>>> G.factors
[<DiscreteFactor representing phi(a:1, b:2) at 0xb720ee4c>,
 <DiscreteFactor representing phi(a:1, c:2) at 0xb4e1e06c>]
>>> modelCopy.factors
[<DiscreteFactor representing phi(a:1, b:2) at 0xb4bd11ec>,
 <DiscreteFactor representing phi(a:1, c:2) at 0xb4bd138c>]

Markov Chain

class pgmpy.models.MarkovChain.MarkovChain(variables=None, card=None, start_state=None)[source]

Class to represent a Markov Chain with multiple kernels for factored state space, along with methods to simulate a run.

Examples

Create an empty Markov Chain:

>>> from pgmpy.models import MarkovChain as MC
>>> model = MC()

And then add variables to it

>>> model.add_variables_from(['intel', 'diff'], [2, 3])

Or directly create a Markov Chain from a list of variables and their cardinalities

>>> model = MC(['intel', 'diff'], [2, 3])

Add transition models

>>> intel_tm = {0: {0: 0.25, 1: 0.75}, 1: {0: 0.5, 1: 0.5}}
>>> model.add_transition_model('intel', intel_tm)
>>> diff_tm = {0: {0: 0.1, 1: 0.5, 2: 0.4}, 1: {0: 0.2, 1: 0.2, 2: 0.6 }, 2: {0: 0.7, 1: 0.15, 2: 0.15}}
>>> model.add_transition_model('diff', diff_tm)

Set a start state

>>> from pgmpy.factors.discrete import State
>>> model.set_start_state([State('intel', 0), State('diff', 2)])

Sample from it

>>> model.sample(size=5)
   intel  diff
0      0     2
1      1     0
2      0     1
3      1     0
4      0     2
add_transition_model(variable, transition_model)[source]

Adds a transition model for a particular variable.

Parameters
  • variable (any hashable python object) – must be an existing variable of the model.

  • transition_model (dict or 2d array) – dict representing valid transition probabilities defined for every possible state of the variable. An array represents a square matrix where every row sums to 1; array[i,j] indicates the transition probability from State i to State j.
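
The two accepted forms describe the same information. The sketch below shows, in plain Python, how a square matrix maps onto the nested-dict form and what "valid" means for such a model. Both helper names are hypothetical and the validation rules are an assumption based on the parameter description above.

```python
import math


def matrix_to_transition_dict(matrix):
    """Convert matrix[i][j] = P(next = j | current = i) into nested dicts."""
    return {i: {j: p for j, p in enumerate(row)} for i, row in enumerate(matrix)}


def is_valid_transition_model(transition_model, card):
    """Every state needs a row of non-negative probabilities summing to 1."""
    if set(transition_model) != set(range(card)):
        return False
    for row in transition_model.values():
        if any(p < 0 or state not in range(card) for state, p in row.items()):
            return False
        if not math.isclose(sum(row.values()), 1.0):
            return False
    return True


grade_matrix = [[0.1, 0.5, 0.4], [0.2, 0.2, 0.6], [0.7, 0.15, 0.15]]
grade_tm = matrix_to_transition_dict(grade_matrix)
valid = is_valid_transition_model(grade_tm, card=3)
invalid = is_valid_transition_model({0: {0: 0.5, 1: 0.4}, 1: {0: 1.0}}, card=2)
```

Here `valid` is True, while `invalid` is False because the row for state 0 sums to 0.9.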

Examples

>>> import numpy as np
>>> from pgmpy.models import MarkovChain as MC
>>> model = MC()
>>> model.add_variable('grade', 3)
>>> grade_tm = {0: {0: 0.1, 1: 0.5, 2: 0.4}, 1: {0: 0.2, 1: 0.2, 2: 0.6 }, 2: {0: 0.7, 1: 0.15, 2: 0.15}}
>>> grade_tm_matrix = np.array([[0.1, 0.5, 0.4], [0.2, 0.2, 0.6], [0.7, 0.15, 0.15]])
>>> model.add_transition_model('grade', grade_tm)
>>> model.add_transition_model('grade', grade_tm_matrix)
add_variable(variable, card=0)[source]

Add a variable to the model.

Parameters
  • variable (any hashable python object) – The variable to be added.

  • card (int) – Representing the cardinality of the variable to be added.

Examples

>>> from pgmpy.models import MarkovChain as MC
>>> model = MC()
>>> model.add_variable('x', 4)
add_variables_from(variables, cards)[source]

Add several variables to the model at once.

Parameters
  • variables (array-like iterable object) – List of variables to be added.

  • cards (array-like iterable object) – List of cardinalities of the variables to be added.

Examples

>>> from pgmpy.models import MarkovChain as MC
>>> model = MC()
>>> model.add_variables_from(['x', 'y'], [3, 4])
copy()[source]

Returns a copy of Markov Chain Model.

Returns

Copy of MarkovChain.

Return type

MarkovChain

Examples

>>> from pgmpy.models import MarkovChain
>>> from pgmpy.factors.discrete import State
>>> model = MarkovChain()
>>> model.add_variables_from(['intel', 'diff'], [3, 2])
>>> intel_tm = {0: {0: 0.2, 1: 0.4, 2:0.4}, 1: {0: 0, 1: 0.5, 2: 0.5}, 2: {0: 0.3, 1: 0.3, 2: 0.4}}
>>> model.add_transition_model('intel', intel_tm)
>>> diff_tm = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.25, 1:0.75}}
>>> model.add_transition_model('diff', diff_tm)
>>> model.set_start_state([State('intel', 0), State('diff', 1)])
>>> model_copy = model.copy()
>>> model_copy.transition_models
{'intel': {0: {0: 0.2, 1: 0.4, 2: 0.4}, 1: {0: 0, 1: 0.5, 2: 0.5}, 2: {0: 0.3, 1: 0.3, 2: 0.4}},
 'diff': {0: {0: 0.5, 1: 0.5}, 1: {0: 0.25, 1: 0.75}}}
generate_sample(start_state=None, size=1)[source]

Generator version of self.sample

Returns

Return type

List of State namedtuples, representing the assignment to all variables of the model.

Examples

>>> from pgmpy.models.MarkovChain import MarkovChain
>>> from pgmpy.factors.discrete import State
>>> model = MarkovChain()
>>> model.add_variables_from(['intel', 'diff'], [3, 2])
>>> intel_tm = {0: {0: 0.2, 1: 0.4, 2:0.4}, 1: {0: 0, 1: 0.5, 2: 0.5}, 2: {0: 0.3, 1: 0.3, 2: 0.4}}
>>> model.add_transition_model('intel', intel_tm)
>>> diff_tm = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.25, 1:0.75}}
>>> model.add_transition_model('diff', diff_tm)
>>> gen = model.generate_sample([State('intel', 0), State('diff', 0)], 2)
>>> [sample for sample in gen]
[[State(var='intel', state=2), State(var='diff', state=1)],
 [State(var='intel', state=2), State(var='diff', state=0)]]
is_stationarity(tolerance=0.2, sample=None)[source]

Checks whether the given Markov chain is stationary and whether the steady state probability values for the states are consistent.

Parameters
  • tolerance (float) – represents the difference between the actual steady state value and the computed value

  • sample ([State(i,j)]) – represents the list of states which the Markov chain has sampled

Returns

True if the Markov chain converges to the steady state distribution within the tolerance; False otherwise.

Return type

Boolean
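
As background, a steady state distribution \pi of a single transition matrix P satisfies \pi = \pi P. One common way to approximate it is power iteration, sketched below in plain Python. This is an illustration of the concept only; whether pgmpy computes the steady state this way is not stated here, and the function name is hypothetical.

```python
def stationary_distribution(matrix, iterations=200):
    """Approximate pi with pi = pi * P by repeatedly applying P."""
    n = len(matrix)
    dist = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(iterations):
        dist = [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]
    return dist


# Transition matrix of 'diff' from the example: row i gives P(next | current = i).
diff_matrix = [[0.5, 0.5], [0.25, 0.75]]
pi = stationary_distribution(diff_matrix)
# Solving pi = pi * P by hand gives pi = [1/3, 2/3].
```

A sampled chain can then be compared against `pi` by checking that the empirical state frequencies lie within the chosen tolerance.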

Examples

>>> from pgmpy.models.MarkovChain import MarkovChain
>>> from pgmpy.factors.discrete import State
>>> model = MarkovChain()
>>> model.add_variables_from(['intel', 'diff'], [3, 2])
>>> intel_tm = {0: {0: 0.2, 1: 0.4, 2:0.4}, 1: {0: 0, 1: 0.5, 2: 0.5}, 2: {0: 0.3, 1: 0.3, 2: 0.4}}
>>> model.add_transition_model('intel', intel_tm)
>>> diff_tm = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.25, 1:0.75}}
>>> model.add_transition_model('diff', diff_tm)
>>> model.is_stationarity()
True
prob_from_sample(state, sample=None, window_size=None)[source]

Given an instantiation (partial or complete) of the variables of the model, compute the probability of observing it over multiple windows in a given sample.

If ‘sample’ is not passed as an argument, generate the statistic by sampling from the Markov Chain, starting with a random initial state.
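
The idea of a per-window probability can be sketched as follows, assuming non-overlapping windows and samples stored as a list of dicts. Both are assumptions made for illustration; the helper name `windowed_probability` is hypothetical and the library's windowing may differ.

```python
def windowed_probability(sample, condition, window_size):
    """Fraction of rows matching `condition` in each non-overlapping window."""
    probabilities = []
    for start in range(0, len(sample) - window_size + 1, window_size):
        window = sample[start:start + window_size]
        hits = sum(
            all(row[var] == state for var, state in condition.items())
            for row in window
        )
        probabilities.append(hits / window_size)
    return probabilities


sample = [{"diff": d} for d in [0, 1, 0, 0, 1, 1, 1, 0]]
probs = windowed_probability(sample, {"diff": 0}, window_size=4)
print(probs)  # [0.75, 0.25]: three matches in the first window, one in the second
```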

Examples

>>> from pgmpy.models.MarkovChain import MarkovChain as MC
>>> from pgmpy.factors.discrete import State
>>> model = MC(['intel', 'diff'], [3, 2])
>>> intel_tm = {0: {0: 0.2, 1: 0.4, 2:0.4}, 1: {0: 0, 1: 0.5, 2: 0.5}, 2: {2: 0.5, 1:0.5}}
>>> model.add_transition_model('intel', intel_tm)
>>> diff_tm = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.25, 1:0.75}}
>>> model.add_transition_model('diff', diff_tm)
>>> model.prob_from_sample([State('diff', 0)])
array([ 0.27,  0.4 ,  0.18,  0.23, ..., 0.29])
random_state()[source]

Generates a random state of the Markov Chain.

Returns

Return type

List of namedtuples, representing a random assignment to all variables of the model.

Examples

>>> from pgmpy.models import MarkovChain as MC
>>> model = MC(['intel', 'diff'], [2, 3])
>>> model.random_state()
[State('diff', 2), State('intel', 1)]
sample(start_state=None, size=1)[source]

Sample from the Markov Chain.

Parameters
  • start_state (dict or array-like iterable) – Representing the starting states of the variables. If None is passed, a random start_state is chosen.

  • size (int) – Number of samples to be generated.

Returns

Return type

pandas.DataFrame
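
Conceptually, each variable moves according to its own transition model, one step per sample. The sketch below reproduces that idea with the standard library only. It assumes the first returned row is the start state (as in the example output) and returns a list of dicts rather than a DataFrame; `sample_chain` and its `seed` parameter are hypothetical.

```python
import random


def sample_chain(transition_models, start_state, size, seed=None):
    """Draw `size` joint states, evolving each variable independently."""
    rng = random.Random(seed)
    state = dict(start_state)
    samples = [dict(state)]  # the start state is the first sample
    for _ in range(size - 1):
        for var, model in transition_models.items():
            next_states = list(model[state[var]])
            weights = [model[state[var]][s] for s in next_states]
            state[var] = rng.choices(next_states, weights=weights)[0]
        samples.append(dict(state))
    return samples


transition_models = {
    "intel": {0: {0: 0.25, 1: 0.75}, 1: {0: 0.5, 1: 0.5}},
    "diff": {0: {0: 0.1, 1: 0.5, 2: 0.4},
             1: {0: 0.2, 1: 0.2, 2: 0.6},
             2: {0: 0.7, 1: 0.15, 2: 0.15}},
}
samples = sample_chain(transition_models, {"intel": 0, "diff": 2}, size=5, seed=0)
```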

Examples

>>> from pgmpy.models import MarkovChain as MC
>>> from pgmpy.factors.discrete import State
>>> model = MC(['intel', 'diff'], [2, 3])
>>> model.set_start_state([State('intel', 0), State('diff', 2)])
>>> intel_tm = {0: {0: 0.25, 1: 0.75}, 1: {0: 0.5, 1: 0.5}}
>>> model.add_transition_model('intel', intel_tm)
>>> diff_tm = {0: {0: 0.1, 1: 0.5, 2: 0.4}, 1: {0: 0.2, 1: 0.2, 2: 0.6 }, 2: {0: 0.7, 1: 0.15, 2: 0.15}}
>>> model.add_transition_model('diff', diff_tm)
>>> model.sample(size=5)
   intel  diff
0      0     2
1      1     0
2      0     1
3      1     0
4      0     2
set_start_state(start_state)[source]

Set the start state of the Markov Chain. If the start_state is given as an array-like iterable, its contents are reordered in the internal representation.

Parameters

start_state (dict or array-like iterable object) – Dict (or list) of tuples representing the starting states of the variables.

Examples

>>> from pgmpy.models import MarkovChain as MC
>>> from pgmpy.factors.discrete import State
>>> model = MC(['a', 'b'], [2, 2])
>>> model.set_start_state([State('a', 0), State('b', 1)])

NoisyOr Model

class pgmpy.models.NoisyOrModel.NoisyOrModel(variables, cardinality, inhibitor_probability)[source]

Base class for Noisy-Or models.

This is an implementation of generalized Noisy-Or models: it is not limited to Boolean variables, and any arbitrary function can be used in place of the Boolean OR function.

Reference: http://xenon.stanford.edu/~srinivas/research/6-UAI93-Srinivas-Generalization-of-Noisy-Or.pdf
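
For orientation, the classic Boolean special case that this class generalizes can be written in a few lines. In that special case the effect stays off only if every active cause is independently inhibited. The function below is a sketch of that textbook formula, not of the generalized model implemented here, and its name is hypothetical.

```python
def noisy_or_probability(active_causes, inhibitor_probability):
    """P(effect = 1 | causes) for the classic Boolean noisy-OR gate.

    inhibitor_probability maps each cause to the probability that its
    influence is blocked when the cause is active.
    """
    all_inhibited = 1.0
    for cause, q in inhibitor_probability.items():
        if cause in active_causes:
            all_inhibited *= q
    return 1.0 - all_inhibited


p = noisy_or_probability({"x1", "x2"}, {"x1": 0.4, "x2": 0.2, "x3": 0.1})
# 1 - 0.4 * 0.2, approximately 0.92; x3 is inactive and does not contribute.
```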

add_variables(variables, cardinality, inhibitor_probability)[source]

Adds variables to the NoisyOrModel.

Parameters
  • variables (list, tuple, dict (array like)) – array containing names of the variables that are to be added.

  • cardinality (list, tuple, dict (array like)) – array containing integers representing the cardinality of the variables.

  • inhibitor_probability (list, tuple, dict (array_like)) – array containing the inhibitor probabilities corresponding to each variable.

Examples

>>> from pgmpy.models import NoisyOrModel
>>> model = NoisyOrModel(['x1', 'x2', 'x3'], [2, 3, 2], [[0.6, 0.4],
...                                                      [0.2, 0.4, 0.7],
...                                                      [0.1, 0.4]])
>>> model.add_variables(['x4'], [3], [0.1, 0.4, 0.2])
del_variables(variables)[source]

Deletes variables from the NoisyOrModel.

Parameters

variables (list, tuple, dict (array like)) – list of variables to be deleted.

Examples

>>> from pgmpy.models import NoisyOrModel
>>> model = NoisyOrModel(['x1', 'x2', 'x3'], [2, 3, 2], [[0.6, 0.4],
...                                                      [0.2, 0.4, 0.7],
...                                                      [0.1, 0.4]])
>>> model.del_variables(['x1'])

Naive Bayes

class pgmpy.models.NaiveBayes.NaiveBayes(ebunch=None)[source]

Class to represent Naive Bayes. Subclass of Bayesian Model. Model holds directed edges from one parent node to multiple children nodes only.

Parameters

data (input graph) – Data to initialize graph. If data=None (default) an empty graph is created. The data can be an edge list, or any NetworkX graph object.

Examples

Create an empty Naive Bayes Model with no nodes and no edges.

>>> from pgmpy.models import NaiveBayes
>>> G = NaiveBayes()

G can be grown in several ways.

Nodes:

Add one node at a time:

>>> G.add_node('a')

Add the nodes from any container (a list, set or tuple or the nodes from another graph).

>>> G.add_nodes_from(['a', 'b', 'c'])

Edges:

G can also be grown by adding edges.

Add one edge,

>>> G.add_edge('a', 'b')

a list of edges,

>>> G.add_edges_from([('a', 'b'), ('a', 'c')])

If some edges connect nodes not yet in the model, the nodes are added automatically. There are no errors when adding nodes or edges that already exist.

Shortcuts:

Many common graph features allow python syntax to speed reporting.

>>> 'a' in G     # check if node in graph
True
>>> len(G)  # number of nodes in graph
3
active_trail_nodes(start, observed=None)[source]

Returns all the nodes reachable from start via an active trail.

Parameters
  • start (Graph node) –

  • observed (List of nodes (optional)) – If given the active trail would be computed assuming these nodes to be observed.

Examples

>>> from pgmpy.models import NaiveBayes
>>> model = NaiveBayes()
>>> model.add_nodes_from(['a', 'b', 'c', 'd'])
>>> model.add_edges_from([('a', 'b'), ('a', 'c'), ('a', 'd')])
>>> model.active_trail_nodes('a')
{'a', 'b', 'c', 'd'}
>>> model.active_trail_nodes('a', ['b', 'c'])
{'a', 'd'}
>>> model.active_trail_nodes('b', ['a'])
{'b'}
add_edge(u, v, *kwargs)[source]

Add an edge between u and v.

The nodes u and v will be automatically added if they are not already in the graph

Parameters

u,v (nodes) – Nodes can be any hashable python object.

Examples

>>> from pgmpy.models import NaiveBayes
>>> G = NaiveBayes()
>>> G.add_nodes_from(['a', 'b', 'c'])
>>> G.add_edge('a', 'b')
>>> G.add_edge('a', 'c')
>>> G.edges()
[('a', 'c'), ('a', 'b')]
fit(data, parent_node=None, estimator=None)[source]

Computes the CPD for each node from the given data in the form of a pandas DataFrame. If a variable from the data is not present in the model, it adds that node to the model.

Parameters
  • data (pandas DataFrame object) – A DataFrame object with column names same as the variable names of network

  • parent_node (any hashable python object (optional)) – Parent node of the model, if not specified it looks for a previously specified parent node.

  • estimator (Estimator class) – Any pgmpy estimator. If nothing is specified, the default MaximumLikelihoodEstimator would be used.
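
With the default maximum likelihood approach, fitting a Naive Bayes structure amounts to counting: a prior over the parent and, for each child, the frequency of each value given the parent's value. The sketch below shows that counting in plain Python on a list of dict rows. It is an illustration only; `fit_naive_bayes` is a hypothetical helper and does not return pgmpy CPD objects.

```python
from collections import Counter


def fit_naive_bayes(rows, parent):
    """Estimate P(parent) and P(child = value | parent) by relative frequency."""
    parent_counts = Counter(row[parent] for row in rows)
    prior = {state: n / len(rows) for state, n in parent_counts.items()}
    joint = Counter()
    for row in rows:
        for var, value in row.items():
            if var != parent:
                joint[(var, value, row[parent])] += 1
    # Keys are (child, child_value, parent_value).
    conditional = {key: n / parent_counts[key[2]] for key, n in joint.items()}
    return prior, conditional


rows = [
    {"A": 0, "B": 0}, {"A": 0, "B": 1},
    {"A": 1, "B": 1}, {"A": 1, "B": 1},
]
prior, conditional = fit_naive_bayes(rows, parent="A")
# prior is {0: 0.5, 1: 0.5}; P(B=1 | A=0) is 0.5 and P(B=1 | A=1) is 1.0
```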

Examples

>>> import numpy as np
>>> import pandas as pd
>>> from pgmpy.models import NaiveBayes
>>> model = NaiveBayes()
>>> values = pd.DataFrame(np.random.randint(low=0, high=2, size=(1000, 5)),
...                       columns=['A', 'B', 'C', 'D', 'E'])
>>> model.fit(values, 'A')
>>> model.get_cpds()
[<TabularCPD representing P(D:2 | A:2) at 0x4b72870>,
 <TabularCPD representing P(E:2 | A:2) at 0x4bb2150>,
 <TabularCPD representing P(A:2) at 0x4bb23d0>,
 <TabularCPD representing P(B:2 | A:2) at 0x4bb24b0>,
 <TabularCPD representing P(C:2 | A:2) at 0x4bb2750>]
>>> model.edges()
[('A', 'D'), ('A', 'E'), ('A', 'B'), ('A', 'C')]
local_independencies(variables)[source]

Returns an instance of Independencies containing the local independencies of each of the variables.

Parameters

variables (str or array like) – variables whose local independencies are to be found.

Examples

>>> from pgmpy.models import NaiveBayes
>>> model = NaiveBayes()
>>> model.add_edges_from([('a', 'b'), ('a', 'c'), ('a', 'd')])
>>> ind = model.local_independencies('b')
>>> ind
(b _|_ d, c | a)

Dynamic Bayesian Network

class pgmpy.models.DynamicBayesianNetwork.DynamicBayesianNetwork(ebunch=None)[source]

Bases: pgmpy.base.DAG.DAG

active_trail_nodes(variables, observed=None)

Returns a dictionary with the given variables as keys and all the nodes reachable from that respective variable as values.

Parameters
  • variables (str or array like) – variables whose active trails are to be found.

  • observed (List of nodes (optional)) – If given the active trails would be computed assuming these nodes to be observed.

Examples

>>> from pgmpy.base import DAG
>>> student = DAG()
>>> student.add_nodes_from(['diff', 'intel', 'grades'])
>>> student.add_edges_from([('diff', 'grades'), ('intel', 'grades')])
>>> student.active_trail_nodes('diff')
{'diff': {'diff', 'grades'}}
>>> student.active_trail_nodes(['diff', 'intel'], observed='grades')
{'diff': {'diff', 'intel'}, 'intel': {'diff', 'intel'}}

References

Details of the algorithm can be found in ‘Probabilistic Graphical Models: Principles and Techniques’ by Koller and Friedman, page 75, Algorithm 3.1.

add_cpds(*cpds)[source]

This method adds the cpds to the dynamic bayesian network. Note that the variables and the evidence in each CPD must be given in the form (node_name, time_slice). Here, node_name is the node that is inserted, while time_slice is an integer denoting the index of the time slice that the node belongs to.

Parameters

cpds (list, set, tuple (array-like)) – List of CPDs which are to be associated with the model. Each CPD should be an instance of TabularCPD.

Examples

>>> from pgmpy.models import DynamicBayesianNetwork as DBN
>>> from pgmpy.factors.discrete import TabularCPD
>>> dbn = DBN()
>>> dbn.add_edges_from([(('D', 0),('G', 0)),(('I', 0),('G', 0)),(('D', 0),('D', 1)),(('I', 0),('I', 1))])
>>> grade_cpd = TabularCPD(('G', 0), 3, [[0.3, 0.05, 0.9, 0.5],
...                                      [0.4, 0.25, 0.8, 0.03],
...                                      [0.3, 0.7, 0.02, 0.2]],
...                        evidence=[('I', 0),('D', 0)],
...                        evidence_card=[2, 2])
>>> d_i_cpd = TabularCPD(('D',1), 2, [[0.6, 0.3],
...                                   [0.4, 0.7]],
...                      evidence=[('D',0)],
...                      evidence_card=[2])
>>> diff_cpd = TabularCPD(('D', 0), 2, [[0.6], [0.4]])
>>> intel_cpd = TabularCPD(('I', 0), 2, [[0.7], [0.3]])
>>> i_i_cpd = TabularCPD(('I', 1), 2, [[0.5, 0.4],
...                                    [0.5, 0.6]],
...                      evidence=[('I', 0)],
...                      evidence_card=[2])
>>> dbn.add_cpds(grade_cpd, d_i_cpd, diff_cpd, intel_cpd, i_i_cpd)
>>> dbn.get_cpds()
[<TabularCPD representing P(('G', 0):3 | ('I', 0):2, ('D', 0):2) at 0x7ff7f27b0cf8>,
 <TabularCPD representing P(('D', 1):2 | ('D', 0):2) at 0x7ff810b9c2e8>,
 <TabularCPD representing P(('D', 0):2) at 0x7ff7f27e6f98>,
 <TabularCPD representing P(('I', 0):2) at 0x7ff7f27e6ba8>,
 <TabularCPD representing P(('I', 1):2 | ('I', 0):2) at 0x7ff7f27e6668>]
add_edge(start, end, **kwargs)[source]

Add an edge between two nodes.

The nodes will be automatically added if they are not present in the network.

Parameters
  • start (tuple) – The node the edge starts from, given as (node_name, time_slice). Here, node_name can be any hashable python object, while time_slice is an integer that denotes the time slice the node belongs to.

  • end (tuple) – The node the edge points to, given in the same (node_name, time_slice) form as start.

Examples

>>> from pgmpy.models import DynamicBayesianNetwork as DBN
>>> model = DBN()
>>> model.add_nodes_from(['D', 'I'])
>>> model.add_edge(('D',0), ('I',0))
>>> sorted(model.edges())
[(('D', 0), ('I', 0)), (('D', 1), ('I', 1))]
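The example output shows two edges after a single add_edge call: the intra-slice edge appears in both time slices. A minimal sketch of that bookkeeping is given below. The helper name is hypothetical and the mirroring rule is an assumption read off the example output, not a statement about pgmpy's internals.

```python
def edges_after_add_edge(start, end):
    """Edges a two-slice network holds after adding start -> end.

    Assumption (inferred from the example output): an intra-slice edge is
    mirrored into the other time slice, an inter-slice edge is stored once.
    """
    (u, t_u), (v, t_v) = start, end
    if t_u == t_v:
        other = 1 - t_u  # the other slice of the two-slice template
        return sorted([((u, t_u), (v, t_v)), ((u, other), (v, other))])
    return [(start, end)]


print(edges_after_add_edge(('D', 0), ('I', 0)))
# [(('D', 0), ('I', 0)), (('D', 1), ('I', 1))]
print(edges_after_add_edge(('D', 0), ('D', 1)))
# [(('D', 0), ('D', 1))]
```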
add_edges_from(ebunch, **kwargs)[source]

Add all the edges in ebunch.

If nodes referred in the ebunch are not already present, they will be automatically added. Node names can be any hashable python object.

Parameters

ebunch (list, array-like) – List of edges to add. Each edge must be of the form of ((start, time_slice), (end, time_slice)).

Examples

>>> from pgmpy.models import DynamicBayesianNetwork as DBN
>>> dbn = DBN()
>>> dbn.add_edges_from([(('D', 0), ('G', 0)), (('I', 0), ('G', 0))])
>>> dbn.nodes()
['G', 'I', 'D']
>>> dbn.edges()
[(('D', 1), ('G', 1)),
 (('I', 1), ('G', 1)),
 (('D', 0), ('G', 0)),
 (('I', 0), ('G', 0))]
add_node(node, **attr)[source]

Adds a single node to the Network

Parameters

node (node) – A node can be any hashable Python object.

Examples

>>> from pgmpy.models import DynamicBayesianNetwork as DBN
>>> dbn = DBN()
>>> dbn.add_node('A')
>>> dbn.nodes()
['A']
add_nodes_from(nodes, **attr)[source]

Add multiple nodes to the Network.

Parameters

nodes (iterable container) – A container of nodes (list, dict, set, etc.).

Examples

>>> from pgmpy.models import DynamicBayesianNetwork as DBN
>>> dbn = DBN()
>>> dbn.add_nodes_from(['A', 'B', 'C'])
add_weighted_edges_from(ebunch_to_add, weight='weight', **attr)

Add weighted edges in ebunch_to_add with specified weight attr

Parameters
  • ebunch_to_add (container of edges) – Each edge given in the list or container will be added to the graph. The edges must be given as 3-tuples (u, v, w) where w is a number.

  • weight (string, optional (default= 'weight')) – The attribute name for the edge weights to be added.

  • attr (keyword arguments, optional (default= no attributes)) – Edge attributes to add/update for all edges.

See also

add_edge()

add a single edge

add_edges_from()

add multiple edges

Notes

Adding the same edge twice for Graph/DiGraph simply updates the edge data. For MultiGraph/MultiDiGraph, duplicate edges are stored.

Examples

>>> import networkx as nx
>>> G = nx.Graph()   # or DiGraph, MultiGraph, MultiDiGraph, etc
>>> G.add_weighted_edges_from([(0, 1, 3.0), (1, 2, 7.5)])
property adj

Graph adjacency object holding the neighbors of each node.

This object is a read-only dict-like structure with node keys and neighbor-dict values. The neighbor-dict is keyed by neighbor to the edge-data-dict. So G.adj[3][2]['color'] = 'blue' sets the color of the edge (3, 2) to “blue”.

Iterating over G.adj behaves like a dict. Useful idioms include for nbr, datadict in G.adj[n].items():.

The neighbor information is also provided by subscripting the graph. So for nbr, foovalue in G[node].data('foo', default=1): works.

For directed graphs, G.adj holds outgoing (successor) info.

adjacency()

Returns an iterator over (node, adjacency dict) tuples for all nodes.

For directed graphs, only outgoing neighbors/adjacencies are included.

Returns

adj_iter – An iterator over (node, adjacency dictionary) for all nodes in the graph.

Return type

iterator

Examples

>>> import networkx as nx
>>> G = nx.path_graph(4)  # or DiGraph, MultiGraph, MultiDiGraph, etc
>>> [(n, nbrdict) for n, nbrdict in G.adjacency()]
[(0, {1: {}}), (1, {0: {}, 2: {}}), (2, {1: {}, 3: {}}), (3, {2: {}})]
adjlist_inner_dict_factory

alias of builtins.dict

adjlist_outer_dict_factory

alias of builtins.dict

check_model()[source]

Check the model for various errors. This method checks for the following errors.

  • Checks if the sum of the probabilities in each associated CPD for each state is equal to 1 (tol=0.01).

  • Checks if the CPDs associated with nodes are consistent with their parents.

Returns

True if everything seems to be in order. Otherwise raises an error according to the problem.

Return type

boolean
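The first check can be illustrated without pgmpy. The sketch below assumes a CPD table stored as a list of rows (one row per state of the variable, one column per parent configuration) and verifies that every column sums to 1 within the stated tolerance of 0.01. The function name is hypothetical and the code is not pgmpy's implementation.

```python
def columns_sum_to_one(values, tol=0.01):
    """Return True if every column of the table sums to 1 within tol."""
    n_cols = len(values[0])
    for col in range(n_cols):
        total = sum(row[col] for row in values)
        if abs(total - 1.0) > tol:
            return False
    return True


valid = [[0.3, 0.05, 0.9, 0.5],
         [0.4, 0.25, 0.08, 0.3],
         [0.3, 0.7, 0.02, 0.2]]
# Third column sums to 1.72 and fourth to 0.73, so this table is rejected.
invalid = [[0.3, 0.05, 0.9, 0.5],
           [0.4, 0.25, 0.8, 0.03],
           [0.3, 0.7, 0.02, 0.2]]

print(columns_sum_to_one(valid))    # True
print(columns_sum_to_one(invalid))  # False
```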

clear()

Remove all nodes and edges from the graph.

This also removes the name, and all graph, node, and edge attributes.

Examples

>>> import networkx as nx
>>> G = nx.path_graph(4)  # or DiGraph, MultiGraph, MultiDiGraph, etc
>>> G.clear()
>>> list(G.nodes)
[]
>>> list(G.edges)
[]
copy()[source]

Returns a copy of the dynamic bayesian network.

Returns

copy of the dynamic bayesian network

Return type

DynamicBayesianNetwork

Examples

>>> from pgmpy.models import DynamicBayesianNetwork as DBN
>>> from pgmpy.factors.discrete import TabularCPD
>>> dbn = DBN()
>>> dbn.add_edges_from([(('D',0),('G',0)),(('I',0),('G',0)),(('D',0),('D',1)),(('I',0),('I',1))])
>>> grade_cpd = TabularCPD(('G',0), 3, [[0.3,0.05,0.9,0.5],
...                                     [0.4,0.25,0.08,0.3],
...                                     [0.3,0.7,0.02,0.2]], [('I', 0),('D', 0)],[2,2])
>>> dbn.add_cpds(grade_cpd)
>>> dbn_copy = dbn.copy()
>>> dbn_copy.nodes()
['G', 'I', 'D']
>>> dbn_copy.edges()
[(('I', 1), ('G', 1)),
 (('I', 0), ('I', 1)),
 (('I', 0), ('G', 0)),
 (('D', 1), ('G', 1)),
 (('D', 0), ('G', 0)),
 (('D', 0), ('D', 1))]
>>> dbn_copy.get_cpds()
[<TabularCPD representing P(('G', 0):3 | ('I', 0):2, ('D', 0):2) at 0x7f13961a3320>]
property degree

A DegreeView for the Graph as G.degree or G.degree().

The node degree is the number of edges adjacent to the node. The weighted node degree is the sum of the edge weights for edges incident to that node.

This object provides an iterator for (node, degree) as well as lookup for the degree for a single node.

Parameters
  • nbunch (single node, container, or all nodes (default= all nodes)) – The view will only report edges incident to these nodes.

  • weight (string or None, optional (default=None)) – The name of an edge attribute that holds the numerical value used as a weight. If None, then each edge has weight 1. The degree is the sum of the edge weights adjacent to the node.

Returns

  • If a single node is requested

  • deg (int) – Degree of the node

  • OR if multiple nodes are requested

  • nd_iter (iterator) – The iterator returns two-tuples of (node, degree).

See also

in_degree, out_degree

Examples

>>> import networkx as nx
>>> G = nx.DiGraph()   # or MultiDiGraph
>>> nx.add_path(G, [0, 1, 2, 3])
>>> G.degree(0) # node 0 with degree 1
1
>>> list(G.degree([0, 1, 2]))
[(0, 1), (1, 2), (2, 2)]
do(node)

Applies the do operator to the graph and returns a new DAG with the transformed graph.

The do-operator, do(X = x) has the effect of removing all edges from the parents of X and setting X to the given value x.

Parameters

node (string) – The name of the node to apply the do-operator to.

Returns

A new instance of DAG modified by the do-operator

Return type

DAG

Examples

Initialize a DAG

>>> from pgmpy.base import DAG
>>> graph = DAG()
>>> graph.add_edges_from([('X', 'A'), ('A', 'Y'), ('A', 'B')])

Applying the do-operator will return a new DAG with the desired structure.

>>> graph_do_A = graph.do('A')

Which we can verify is missing the edges we would expect.

>>> sorted(graph_do_A.edges())
[('A', 'B'), ('A', 'Y')]

References

Causality: Models, Reasoning, and Inference, Judea Pearl (2000). p.70.
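The structural effect of the do-operator is small enough to sketch on a plain edge list: every edge pointing into the intervened node is dropped, and all other edges are kept. The function below is an illustration under that reading of the description, with a hypothetical name; it is not pgmpy's code.

```python
def do_on_edges(edges, node):
    """Return a new edge list with every edge pointing into node removed."""
    return [(u, v) for (u, v) in edges if v != node]


edges = [('X', 'A'), ('A', 'Y'), ('A', 'B')]
print(sorted(do_on_edges(edges, 'A')))
# [('A', 'B'), ('A', 'Y')]
```

The original list is left untouched, mirroring the fact that do returns a new graph rather than modifying the existing one.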

edge_attr_dict_factory

alias of builtins.dict

edge_subgraph(edges)

Returns the subgraph induced by the specified edges.

The induced subgraph contains each edge in edges and each node incident to any one of those edges.

Parameters

edges (iterable) – An iterable of edges in this graph.

Returns

G – An edge-induced subgraph of this graph with the same edge attributes.

Return type

Graph

Notes

The graph, edge, and node attributes in the returned subgraph view are references to the corresponding attributes in the original graph. The view is read-only.

To create a full graph version of the subgraph with its own copy of the edge or node attributes, use:

>>> G.edge_subgraph(edges).copy()  

Examples

>>> import networkx as nx
>>> G = nx.path_graph(5)
>>> H = G.edge_subgraph([(0, 1), (3, 4)])
>>> list(H.nodes)
[0, 1, 3, 4]
>>> list(H.edges)
[(0, 1), (3, 4)]
property edges

An OutEdgeView of the DiGraph as G.edges or G.edges().

edges(self, nbunch=None, data=False, default=None)

The OutEdgeView provides set-like operations on the edge-tuples as well as edge attribute lookup. When called, it also provides an EdgeDataView object which allows control of access to edge attributes (but does not provide set-like operations). Hence, G.edges[u, v]['color'] provides the value of the color attribute for edge (u, v) while for (u, v, c) in G.edges.data('color', default='red'): iterates through all the edges yielding the color attribute with default 'red' if no color attribute exists.

Parameters
  • nbunch (single node, container, or all nodes (default= all nodes)) – The view will only report edges incident to these nodes.

  • data (string or bool, optional (default=False)) – The edge attribute returned in 3-tuple (u, v, ddict[data]). If True, return edge attribute dict in 3-tuple (u, v, ddict). If False, return 2-tuple (u, v).

  • default (value, optional (default=None)) – Value used for edges that don’t have the requested attribute. Only relevant if data is not True or False.

Returns

edges – A view of edge attributes, usually it iterates over (u, v) or (u, v, d) tuples of edges, but can also be used for attribute lookup as edges[u, v]['foo'].

Return type

OutEdgeView

See also

in_edges, out_edges

Notes

Nodes in nbunch that are not in the graph will be (quietly) ignored. For directed graphs this returns the out-edges.

Examples

>>> import networkx as nx
>>> G = nx.DiGraph()   # or MultiDiGraph, etc
>>> nx.add_path(G, [0, 1, 2])
>>> G.add_edge(2, 3, weight=5)
>>> [e for e in G.edges]
[(0, 1), (1, 2), (2, 3)]
>>> G.edges.data()  # default data is {} (empty dict)
OutEdgeDataView([(0, 1, {}), (1, 2, {}), (2, 3, {'weight': 5})])
>>> G.edges.data('weight', default=1)
OutEdgeDataView([(0, 1, 1), (1, 2, 1), (2, 3, 5)])
>>> G.edges([0, 2])  # only edges incident to these nodes
OutEdgeDataView([(0, 1), (2, 3)])
>>> G.edges(0)  # only edges incident to a single node (use G.adj[0]?)
OutEdgeDataView([(0, 1)])
get_children(node)

Returns a list of children of node. Throws an error if the node is not present in the graph.

Parameters

node (string, int or any hashable python object.) – The node whose children would be returned.

Examples

>>> from pgmpy.base import DAG
>>> g = DAG(ebunch=[('A', 'B'), ('C', 'B'), ('B', 'D'),
...                 ('B', 'E'), ('B', 'F'), ('E', 'G')])
>>> g.get_children(node='B')
['D', 'E', 'F']
get_cpds(node=None, time_slice=0)[source]

Returns the CPDs that have been associated with the network.

Parameters
  • node (tuple (node_name, time_slice)) – The node should be in the following form (node_name, time_slice). Here, node_name is the node that is inserted while the time_slice is an integer value, which denotes the index of the time_slice that the node belongs to.

  • time_slice (int) – The time_slice should be a non-negative integer (greater than or equal to zero).

Examples

>>> from pgmpy.models import DynamicBayesianNetwork as DBN
>>> from pgmpy.factors.discrete import TabularCPD
>>> dbn = DBN()
>>> dbn.add_edges_from([(('D',0),('G',0)),(('I',0),('G',0)),(('D',0),('D',1)),(('I',0),('I',1))])
>>> grade_cpd = TabularCPD(('G',0), 3, [[0.3,0.05,0.9,0.5],
...                                     [0.4,0.25,0.08,0.3],
...                                     [0.3,0.7,0.02,0.2]], [('I', 0),('D', 0)],[2,2])
>>> dbn.add_cpds(grade_cpd)
>>> dbn.get_cpds()
get_edge_data(u, v, default=None)

Returns the attribute dictionary associated with edge (u, v).

This is identical to G[u][v] except the default is returned instead of an exception if the edge doesn’t exist.

Parameters
  • u, v (nodes) –

  • default (any Python object (default=None)) – Value to return if the edge (u, v) is not found.

Returns

edge_dict – The edge attribute dictionary.

Return type

dictionary

Examples

>>> import networkx as nx
>>> G = nx.path_graph(4)  # or DiGraph, MultiGraph, MultiDiGraph, etc
>>> G[0][1]
{}

Warning: Assigning to G[u][v] is not permitted. But it is safe to assign attributes G[u][v]['foo']

>>> G[0][1]['weight'] = 7
>>> G[0][1]['weight']
7
>>> G[1][0]['weight']
7
>>> G = nx.path_graph(4)  # or DiGraph, MultiGraph, MultiDiGraph, etc
>>> G.get_edge_data(0, 1)  # default edge data is {}
{}
>>> e = (0, 1)
>>> G.get_edge_data(*e)  # tuple form
{}
>>> G.get_edge_data('a', 'b', default=0)  # edge not in graph, return 0
0
get_immoralities()

Finds all the immoralities in the model. A v-structure X -> Z <- Y is an immorality if there is no direct edge between X and Y.

Returns

A set of all the immoralities in the model

Return type

set

Examples

>>> from pgmpy.base import DAG
>>> student = DAG()
>>> student.add_edges_from([('diff', 'grade'), ('intel', 'grade'),
...                         ('intel', 'SAT'), ('grade', 'letter')])
>>> student.get_immoralities()
{('diff','intel')}
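The definition translates directly into a search over pairs of parents that share a child but are not themselves adjacent. The following plain-Python sketch works on an edge list; the function name is hypothetical and the code is an illustration of the definition, not pgmpy's implementation.

```python
from itertools import combinations


def find_immoralities(edges):
    """Return pairs of non-adjacent parents that share a child."""
    parents = {}
    adjacent = set()
    for u, v in edges:
        parents.setdefault(v, set()).add(u)
        adjacent.add(frozenset((u, v)))  # direction is ignored here
    found = set()
    for pars in parents.values():
        for x, y in combinations(sorted(pars), 2):
            if frozenset((x, y)) not in adjacent:
                found.add((x, y))
    return found


edges = [('diff', 'grade'), ('intel', 'grade'),
         ('intel', 'SAT'), ('grade', 'letter')]
print(find_immoralities(edges))
# {('diff', 'intel')}
```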
get_independencies(latex=False)

Computes independencies in the DAG, by checking d-separation.

Parameters

latex (boolean) – If latex=True, a LaTeX string of the independence assertions is created.

Examples

>>> from pgmpy.base import DAG
>>> chain = DAG([('X', 'Y'), ('Y', 'Z')])
>>> chain.get_independencies()
(X _|_ Z | Y)
(Z _|_ X | Y)
get_inter_edges()[source]

Returns the inter-slice edges present in the 2-TBN.

Examples

>>> from pgmpy.models import DynamicBayesianNetwork as DBN
>>> dbn = DBN()
>>> dbn.add_edges_from([(('D', 0), ('G', 0)), (('I', 0), ('G', 0)),
...                     (('G', 0), ('L', 0)), (('D', 0), ('D', 1)),
...                     (('I', 0), ('I', 1)), (('G', 0), ('G', 1)),
...                     (('G', 0), ('L', 1)), (('L', 0), ('L', 1))])
>>> dbn.get_inter_edges()
[(('D', 0), ('D', 1)),
 (('G', 0), ('G', 1)),
 (('G', 0), ('L', 1)),
 (('I', 0), ('I', 1)),
 (('L', 0), ('L', 1))]
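Whether an edge is inter-slice or intra-slice depends only on the time-slice index of its two endpoints. The sketch below classifies a list of ((name, slice), (name, slice)) edges accordingly. It is a stand-alone illustration with a hypothetical function name, and it only looks at the edges as given (it does not add the mirrored slice-1 copies that the network keeps internally).

```python
def split_edges(edges):
    """Split DBN edges into (intra-slice, inter-slice) sorted lists."""
    intra = sorted(e for e in edges if e[0][1] == e[1][1])
    inter = sorted(e for e in edges if e[0][1] != e[1][1])
    return intra, inter


edges = [(('D', 0), ('G', 0)), (('I', 0), ('G', 0)),
         (('G', 0), ('L', 0)), (('D', 0), ('D', 1)),
         (('I', 0), ('I', 1)), (('G', 0), ('G', 1)),
         (('G', 0), ('L', 1)), (('L', 0), ('L', 1))]
intra, inter = split_edges(edges)
print(inter)
# [(('D', 0), ('D', 1)), (('G', 0), ('G', 1)), (('G', 0), ('L', 1)),
#  (('I', 0), ('I', 1)), (('L', 0), ('L', 1))]
print(intra)
# [(('D', 0), ('G', 0)), (('G', 0), ('L', 0)), (('I', 0), ('G', 0))]
```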
get_interface_nodes(time_slice=0)[source]

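Returns the nodes in the given timeslice that take part in an inter-slice edge. With the default time_slice=0, these are the nodes in the first timeslice whose children are present in the next timeslice.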

Parameters

time_slice (int) – The timeslice should be a non-negative integer (greater than or equal to zero).

Examples

>>> from pgmpy.models import DynamicBayesianNetwork as DBN
>>> dbn = DBN()
>>> dbn.add_nodes_from(['D', 'G', 'I', 'S', 'L'])
>>> dbn.add_edges_from([(('D',0),('G',0)),(('I',0),('G',0)),(('G',0),('L',0)),(('D',0),('D',1))])
>>> dbn.get_interface_nodes()
[('D', 0)]
get_intra_edges(time_slice=0)[source]

Returns the intra slice edges present in the 2-TBN.

Parameters

time_slice (int) – The time slice for which to get intra edges. Must be a non-negative integer.

Examples

>>> from pgmpy.models import DynamicBayesianNetwork as DBN
>>> dbn = DBN()
>>> dbn.add_nodes_from(['D', 'G', 'I', 'S', 'L'])
>>> dbn.add_edges_from([(('D', 0), ('G', 0)), (('I', 0), ('G', 0)),
...                     (('G', 0), ('L', 0)), (('D', 0), ('D', 1)),
...                     (('I', 0), ('I', 1)), (('G', 0), ('G', 1)),
...                     (('G', 0), ('L', 1)), (('L', 0), ('L', 1))])
>>> dbn.get_intra_edges()
[(('D', 0), ('G', 0)), (('G', 0), ('L', 0)), (('I', 0), ('G', 0))]
get_leaves()

Returns a list of leaves of the graph.

Examples

>>> from pgmpy.base import DAG
>>> graph = DAG([('A', 'B'), ('B', 'C'), ('B', 'D')])
>>> graph.get_leaves()
['C', 'D']
get_markov_blanket(node)

Returns a markov blanket for a random variable. In the case of Bayesian Networks, the markov blanket is the set of node’s parents, its children and its children’s other parents.

Parameters

node (string, int or any hashable python object.) – The node whose markov blanket would be returned.

Returns

List of nodes contained in Markov Blanket

Return type

list

Examples

>>> from pgmpy.base import DAG
>>> from pgmpy.factors.discrete import TabularCPD
>>> G = DAG([('x', 'y'), ('z', 'y'), ('y', 'w'), ('y', 'v'), ('u', 'w'),
...          ('s', 'v'), ('w', 't'), ('w', 'm'), ('v', 'n'), ('v', 'q')])
>>> G.get_markov_blanket('y')
['s', 'w', 'x', 'u', 'z', 'v']
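The three ingredients of the blanket (parents, children, and the children's other parents) can be collected with three set comprehensions over an edge list. This is a stand-alone sketch of the definition with a hypothetical function name, not the library's implementation.

```python
def markov_blanket(edges, node):
    """Parents, children and co-parents of node, as a set."""
    parents = {u for u, v in edges if v == node}
    children = {v for u, v in edges if u == node}
    # Every parent of a child; this includes node itself, removed below.
    co_parents = {u for u, v in edges if v in children}
    return (parents | children | co_parents) - {node}


edges = [('x', 'y'), ('z', 'y'), ('y', 'w'), ('y', 'v'), ('u', 'w'),
         ('s', 'v'), ('w', 't'), ('w', 'm'), ('v', 'n'), ('v', 'q')]
print(sorted(markov_blanket(edges, 'y')))
# ['s', 'u', 'v', 'w', 'x', 'z']
```

The sorted output contains the same six nodes as the example above; only the ordering differs.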
get_parents(node)

Returns a list of parents of node.

Throws an error if the node is not present in the graph.

Parameters

node (string, int or any hashable python object.) – The node whose parents would be returned.

Examples

>>> from pgmpy.base import DAG
>>> G = DAG(ebunch=[('diff', 'grade'), ('intel', 'grade')])
>>> G.get_parents(node='grade')
['diff', 'intel']
get_roots()

Returns a list of roots of the graph.

Examples

>>> from pgmpy.base import DAG
>>> graph = DAG([('A', 'B'), ('B', 'C'), ('B', 'D'), ('E', 'B')])
>>> graph.get_roots()
['A', 'E']
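Roots and leaves are the nodes with no incoming and no outgoing edges, respectively. A minimal sketch on an edge list, using a hypothetical helper name and not tied to pgmpy's internals:

```python
def roots_and_leaves(edges):
    """Return (roots, leaves) of the directed graph given by edges."""
    nodes = {n for edge in edges for n in edge}
    has_parent = {v for _, v in edges}
    has_child = {u for u, _ in edges}
    return sorted(nodes - has_parent), sorted(nodes - has_child)


edges = [('A', 'B'), ('B', 'C'), ('B', 'D'), ('E', 'B')]
roots, leaves = roots_and_leaves(edges)
print(roots)   # ['A', 'E']
print(leaves)  # ['C', 'D']
```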
get_slice_nodes(time_slice=0)[source]

Returns the nodes present in a particular timeslice

Parameters

time_slice (int) – The timeslice should be a non-negative integer (greater than or equal to zero).

Examples

>>> from pgmpy.models import DynamicBayesianNetwork as DBN
>>> dbn = DBN()
>>> dbn.add_nodes_from(['D', 'G', 'I', 'S', 'L'])
>>> dbn.add_edges_from([(('D', 0),('G', 0)),(('I', 0),('G', 0)),(('G', 0),('L', 0)),(('D', 0),('D', 1))])
>>> dbn.get_slice_nodes()
graph_attr_dict_factory

alias of builtins.dict

has_edge(u, v)

Returns True if the edge (u, v) is in the graph.

This is the same as v in G[u] without KeyError exceptions.

Parameters

u, v (nodes) – Nodes can be, for example, strings or numbers. Nodes must be hashable (and not None) Python objects.

Returns

edge_ind – True if edge is in the graph, False otherwise.

Return type

bool

Examples

>>> import networkx as nx
>>> G = nx.path_graph(4)  # or DiGraph, MultiGraph, MultiDiGraph, etc
>>> G.has_edge(0, 1)  # using two nodes
True
>>> e = (0, 1)
>>> G.has_edge(*e)  #  e is a 2-tuple (u, v)
True
>>> e = (0, 1, {'weight':7})
>>> G.has_edge(*e[:2])  # e is a 3-tuple (u, v, data_dictionary)
True

The following expressions are equivalent:

>>> G.has_edge(0, 1)
True
>>> 1 in G[0]  # though this gives KeyError if 0 not in G
True
has_node(n)

Returns True if the graph contains the node n.

Identical to n in G

Parameters

n (node) –

Examples

>>> import networkx as nx
>>> G = nx.path_graph(3)  # or DiGraph, MultiGraph, MultiDiGraph, etc
>>> G.has_node(0)
True

It is more readable and simpler to use

>>> 0 in G
True
has_predecessor(u, v)

Returns True if node u has predecessor v.

This is true if graph has the edge u<-v.

has_successor(u, v)

Returns True if node u has successor v.

This is true if graph has the edge u->v.

property in_degree

An InDegreeView for (node, in_degree) or in_degree for single node.

The node in_degree is the number of edges pointing to the node. The weighted node degree is the sum of the edge weights for edges incident to that node.

This object provides an iteration over (node, in_degree) as well as lookup for the degree for a single node.

Parameters
  • nbunch (single node, container, or all nodes (default= all nodes)) – The view will only report edges incident to these nodes.

  • weight (string or None, optional (default=None)) – The name of an edge attribute that holds the numerical value used as a weight. If None, then each edge has weight 1. The degree is the sum of the edge weights adjacent to the node.

Returns

  • If a single node is requested

  • deg (int) – In-degree of the node

  • OR if multiple nodes are requested

  • nd_iter (iterator) – The iterator returns two-tuples of (node, in-degree).

See also

degree, out_degree

Examples

>>> import networkx as nx
>>> G = nx.DiGraph()
>>> nx.add_path(G, [0, 1, 2, 3])
>>> G.in_degree(0) # node 0 with degree 0
0
>>> list(G.in_degree([0, 1, 2]))
[(0, 0), (1, 1), (2, 1)]
in_degree_iter(nbunch=None, weight=None)
property in_edges

An InEdgeView of the Graph as G.in_edges or G.in_edges().

in_edges(self, nbunch=None, data=False, default=None):

Parameters
  • nbunch (single node, container, or all nodes (default= all nodes)) – The view will only report edges incident to these nodes.

  • data (string or bool, optional (default=False)) – The edge attribute returned in 3-tuple (u, v, ddict[data]). If True, return edge attribute dict in 3-tuple (u, v, ddict). If False, return 2-tuple (u, v).

  • default (value, optional (default=None)) – Value used for edges that don’t have the requested attribute. Only relevant if data is not True or False.

Returns

in_edges – A view of edge attributes, usually it iterates over (u, v) or (u, v, d) tuples of edges, but can also be used for attribute lookup as edges[u, v]['foo'].

Return type

InEdgeView

See also

edges

initialize_initial_state()[source]

This method will automatically re-adjust the cpds and the edges added to the bayesian network. If an edge is added as an intra time slice edge in the 0th timeslice, this method will automatically add it in the 1st timeslice. It will also add the cpds. However, before calling this method, one needs to add the cpds as well as the edges of the whole skeleton of the bayesian network, including the 0th and the 1st timeslice.

Examples

>>> from pgmpy.models import DynamicBayesianNetwork as DBN
>>> from pgmpy.factors.discrete import TabularCPD
>>> student = DBN()
>>> student.add_nodes_from(['D', 'G', 'I', 'S', 'L'])
>>> student.add_edges_from([(('D', 0),('G', 0)),(('I', 0),('G', 0)),(('D', 0),('D', 1)),(('I', 0),('I', 1))])
>>> grade_cpd = TabularCPD(('G', 0), 3, [[0.3, 0.05, 0.9, 0.5],
...                                      [0.4, 0.25, 0.08, 0.3],
...                                      [0.3, 0.7, 0.02, 0.2]],
...                        evidence=[('I', 0),('D', 0)],
...                        evidence_card=[2, 2])
>>> d_i_cpd = TabularCPD(('D', 1), 2, [[0.6, 0.3],
...                                    [0.4, 0.7]],
...                      evidence=[('D', 0)],
...                      evidence_card=[2])
>>> diff_cpd = TabularCPD(('D', 0), 2, [[0.6], [0.4]])
>>> intel_cpd = TabularCPD(('I',0), 2, [[0.7], [0.3]])
>>> i_i_cpd = TabularCPD(('I', 1), 2, [[0.5, 0.4],
...                                    [0.5, 0.6]],
...                      evidence=[('I', 0)],
...                      evidence_card=[2])
>>> student.add_cpds(grade_cpd, d_i_cpd, diff_cpd, intel_cpd, i_i_cpd)
>>> student.initialize_initial_state()
is_active_trail(start, end, observed=None)

Returns True if there is any active trail between start and end node.

Parameters
  • start (Graph Node) –

  • end (Graph Node) –

  • observed (List of nodes (optional)) – If given the active trail would be computed assuming these nodes to be observed.

  • additional_observed – If given the active trail would be computed assuming these nodes to be observed along with the nodes marked as observed in the model.

Examples

>>> from pgmpy.base import DAG
>>> student = DAG()
>>> student.add_nodes_from(['diff', 'intel', 'grades', 'letter', 'sat'])
>>> student.add_edges_from([('diff', 'grades'), ('intel', 'grades'), ('grades', 'letter'),
...                         ('intel', 'sat')])
>>> student.is_active_trail('diff', 'intel')
False
>>> student.is_active_trail('grades', 'sat')
True
is_directed()

Returns True if graph is directed, False otherwise.

is_iequivalent(model)

Checks whether the given model is I-equivalent.

Two graphs G1 and G2 are said to be I-equivalent if they have the same skeleton and the same set of immoralities.

Note: For the skeleton comparison, graphs with different node names can still match, but for the immoralities comparison the node names must be the same.

Parameters

model (DAG) – The DAG object for which you want to check I-equivalence.

Returns

True if both are I-equivalent, False otherwise

Return type

boolean

Examples

>>> from pgmpy.base import DAG
>>> G = DAG()
>>> G.add_edges_from([('V', 'W'), ('W', 'X'),
...                   ('X', 'Y'), ('Z', 'Y')])
>>> G1 = DAG()
>>> G1.add_edges_from([('W', 'V'), ('X', 'W'),
...                    ('X', 'Y'), ('Z', 'Y')])
>>> G.is_iequivalent(G1)
True
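The two conditions in the definition (same skeleton, same immoralities) can be checked on plain edge lists. The sketch below is an illustration, not pgmpy's implementation: the function names are hypothetical, and each immorality is stored together with its common child, which is a slightly stricter comparison than comparing parent pairs alone.

```python
from itertools import combinations


def skeleton(edges):
    """Undirected version of the edge list."""
    return {frozenset(edge) for edge in edges}


def v_structures(edges):
    """Set of (parent pair, child) where the two parents are not adjacent."""
    skel = skeleton(edges)
    parents = {}
    for u, v in edges:
        parents.setdefault(v, set()).add(u)
    return {
        (frozenset((x, y)), child)
        for child, pars in parents.items()
        for x, y in combinations(sorted(pars), 2)
        if frozenset((x, y)) not in skel
    }


def is_iequivalent(edges_a, edges_b):
    return (skeleton(edges_a) == skeleton(edges_b)
            and v_structures(edges_a) == v_structures(edges_b))


g = [('V', 'W'), ('W', 'X'), ('X', 'Y'), ('Z', 'Y')]
g1 = [('W', 'V'), ('X', 'W'), ('X', 'Y'), ('Z', 'Y')]
print(is_iequivalent(g, g1))  # True

# Same skeleton, but only the second graph has the immorality X -> Y <- Z.
chain = [('X', 'Y'), ('Y', 'Z')]
collider = [('X', 'Y'), ('Z', 'Y')]
print(is_iequivalent(chain, collider))  # False
```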
is_multigraph()

Returns True if graph is a multigraph, False otherwise.

local_independencies(variables)

Returns an instance of Independencies containing the local independencies of each of the variables.

Parameters

variables (str or array like) – variables whose local independencies are to be found.

Examples

>>> from pgmpy.base import DAG
>>> student = DAG()
>>> student.add_edges_from([('diff', 'grade'), ('intel', 'grade'),
...                         ('grade', 'letter'), ('intel', 'SAT')])
>>> ind = student.local_independencies('grade')
>>> ind
(grade _|_ SAT | diff, intel)
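The assertion in the example follows the local Markov property: a node is independent of its non-descendants given its parents. The sketch below computes both sets from an edge list. The function name is hypothetical and the code illustrates the property itself rather than pgmpy's implementation.

```python
def local_independency(edges, node):
    """Return (independent_of, given) for node under the local Markov property."""
    nodes = {n for edge in edges for n in edge}
    parents = {u for u, v in edges if v == node}
    # Collect all descendants of node with a depth-first walk.
    descendants, stack = set(), [node]
    while stack:
        current = stack.pop()
        for u, v in edges:
            if u == current and v not in descendants:
                descendants.add(v)
                stack.append(v)
    independent_of = nodes - descendants - parents - {node}
    return independent_of, parents


edges = [('diff', 'grade'), ('intel', 'grade'),
         ('grade', 'letter'), ('intel', 'SAT')]
independent_of, given = local_independency(edges, 'grade')
print(sorted(independent_of), sorted(given))
# ['SAT'] ['diff', 'intel']
```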
moralize()[source]

Removes all the immoralities in the Network and creates a moral graph (UndirectedGraph).

A v-structure X->Z<-Y is an immorality if there is no directed edge between X and Y.

Examples

>>> from pgmpy.models import DynamicBayesianNetwork as DBN
>>> dbn = DBN([(('D',0), ('G',0)), (('I',0), ('G',0))])
>>> moral_graph = dbn.moralize()
>>> moral_graph.edges()
[(('G', 0), ('I', 0)),
(('G', 0), ('D', 0)),
(('D', 1), ('I', 1)),
(('D', 1), ('G', 1)),
(('I', 0), ('D', 0)),
(('G', 1), ('I', 1))]
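Moralization can be sketched in two steps: connect ("marry") every pair of parents that share a child, then drop the edge directions. The block below does this on an edge list that already contains both time slices of the example. It is an illustration with a hypothetical function name, not pgmpy's implementation.

```python
from itertools import combinations


def moralize_edges(edges):
    """Return the undirected edges of the moral graph as frozensets."""
    undirected = {frozenset(edge) for edge in edges}
    parents = {}
    for u, v in edges:
        parents.setdefault(v, set()).add(u)
    for pars in parents.values():
        # Marry every pair of parents of the same child.
        for x, y in combinations(sorted(pars), 2):
            undirected.add(frozenset((x, y)))
    return undirected


edges = [(('D', 0), ('G', 0)), (('I', 0), ('G', 0)),
         (('D', 1), ('G', 1)), (('I', 1), ('G', 1))]
moral = moralize_edges(edges)
print(len(moral))  # 6
print(frozenset({('D', 0), ('I', 0)}) in moral)  # True
```

The four directed edges become four undirected ones, and one new edge per slice joins D and I, giving the six edges listed in the example.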
property name

String identifier of the graph.

This graph attribute appears in the attribute dict G.graph keyed by the string “name”, as well as an attribute (technically a property) G.name. This is entirely user controlled.

nbunch_iter(nbunch=None)

Returns an iterator over nodes contained in nbunch that are also in the graph.

The nodes in nbunch are checked for membership in the graph and if not are silently ignored.

Parameters

nbunch (single node, container, or all nodes (default= all nodes)) – The view will only report edges incident to these nodes.

Returns

niter – An iterator over nodes in nbunch that are also in the graph. If nbunch is None, iterate over all nodes in the graph.

Return type

iterator

Raises

NetworkXError – If nbunch is not a node or a sequence of nodes. If a node in nbunch is not hashable.

See also

Graph.__iter__()

Notes

When nbunch is an iterator, the returned iterator yields values directly from nbunch, becoming exhausted when nbunch is exhausted.

To test whether nbunch is a single node, one can use “if nbunch in self:”, even after processing with this routine.

If nbunch is not a node or a (possibly empty) sequence/iterator or None, a NetworkXError is raised. Also, if any object in nbunch is not hashable, a NetworkXError is raised.

neighbors(n)

Returns an iterator over successor nodes of n.

A successor of n is a node m such that there exists a directed edge from n to m.

Parameters

n (node) – A node in the graph

Raises

NetworkXError – If n is not in the graph.

See also

predecessors()

Notes

neighbors() and successors() are the same.

node_attr_dict_factory

alias of builtins.dict

node_dict_factory

alias of builtins.dict

property nodes

A NodeView of the Graph as G.nodes or G.nodes().

Can be used as G.nodes for data lookup and for set-like operations. Can also be used as G.nodes(data='color', default=None) to return a NodeDataView which reports specific node data but no set operations. It presents a dict-like interface as well with G.nodes.items() iterating over (node, nodedata) 2-tuples and G.nodes[3]['foo'] providing the value of the foo attribute for node 3. In addition, a view G.nodes.data('foo') provides a dict-like interface to the foo attribute of each node. G.nodes.data('foo', default=1) provides a default for nodes that do not have attribute foo.

Parameters
  • data (string or bool, optional (default=False)) – The node attribute returned in 2-tuple (n, ddict[data]). If True, return entire node attribute dict as (n, ddict). If False, return just the nodes n.

  • default (value, optional (default=None)) – Value used for nodes that don’t have the requested attribute. Only relevant if data is not True or False.

Returns

Allows set-like operations over the nodes as well as node attribute dict lookup and calling to get a NodeDataView. A NodeDataView iterates over (n, data) and has no set operations. A NodeView iterates over n and includes set operations.

When called, if data is False, an iterator over nodes. Otherwise an iterator of 2-tuples (node, attribute value) where the attribute is specified in data. If data is True then the attribute becomes the entire data dictionary.

Return type

NodeView

Notes

If your node data is not needed, it is simpler and equivalent to use the expression for n in G, or list(G).

Examples

There are two simple ways of getting a list of all nodes in the graph:

>>> import networkx as nx
>>> G = nx.path_graph(3)
>>> list(G.nodes)
[0, 1, 2]
>>> list(G)
[0, 1, 2]

To get the node data along with the nodes:

>>> G.add_node(1, time='5pm')
>>> G.nodes[0]['foo'] = 'bar'
>>> list(G.nodes(data=True))
[(0, {'foo': 'bar'}), (1, {'time': '5pm'}), (2, {})]
>>> list(G.nodes.data())
[(0, {'foo': 'bar'}), (1, {'time': '5pm'}), (2, {})]
>>> list(G.nodes(data='foo'))
[(0, 'bar'), (1, None), (2, None)]
>>> list(G.nodes.data('foo'))
[(0, 'bar'), (1, None), (2, None)]
>>> list(G.nodes(data='time'))
[(0, None), (1, '5pm'), (2, None)]
>>> list(G.nodes.data('time'))
[(0, None), (1, '5pm'), (2, None)]
>>> list(G.nodes(data='time', default='Not Available'))
[(0, 'Not Available'), (1, '5pm'), (2, 'Not Available')]
>>> list(G.nodes.data('time', default='Not Available'))
[(0, 'Not Available'), (1, '5pm'), (2, 'Not Available')]

If some of your nodes have an attribute and the rest are assumed to have a default attribute value you can create a dictionary from node/attribute pairs using the default keyword argument to guarantee the value is never None:

>>> G = nx.Graph()
>>> G.add_node(0)
>>> G.add_node(1, weight=2)
>>> G.add_node(2, weight=3)
>>> dict(G.nodes(data='weight', default=1))
{0: 1, 1: 2, 2: 3}
number_of_edges(u=None, v=None)

Returns the number of edges between two nodes.

Parameters

u, v (nodes, optional) – If u and v are specified, return the number of edges between u and v. Otherwise return the total number of all edges.

Returns

nedges – The number of edges in the graph. If nodes u and v are specified return the number of edges between those nodes. If the graph is directed, this only returns the number of edges from u to v.

Return type

int

See also

size()

Examples

For undirected graphs, this method counts the total number of edges in the graph:

>>> import networkx as nx
>>> G = nx.path_graph(4)
>>> G.number_of_edges()
3

If you specify two nodes, this counts the total number of edges joining the two nodes:

>>> G.number_of_edges(0, 1)
1

For directed graphs, this method can count the total number of directed edges from u to v:

>>> G = nx.DiGraph()
>>> G.add_edge(0, 1)
>>> G.add_edge(1, 0)
>>> G.number_of_edges(0, 1)
1
number_of_nodes()

Returns the number of nodes in the graph.

Returns

nnodes – The number of nodes in the graph.

Return type

int

See also

order(), __len__()

Examples

>>> import networkx as nx
>>> G = nx.path_graph(3)  # or DiGraph, MultiGraph, MultiDiGraph, etc
>>> G.number_of_nodes()
3
order()

Returns the number of nodes in the graph.

Returns

nnodes – The number of nodes in the graph.

Return type

int

See also

number_of_nodes(), __len__()

Examples

>>> G = nx.path_graph(3)  # or DiGraph, MultiGraph, MultiDiGraph, etc
>>> G.order()
3
property out_degree

An OutDegreeView for (node, out_degree)

The node out_degree is the number of edges pointing out of the node. The weighted node degree is the sum of the edge weights for edges incident to that node.

This object provides an iterator over (node, out_degree) as well as lookup for the degree for a single node.

Parameters
  • nbunch (single node, container, or all nodes (default= all nodes)) – The view will only report edges incident to these nodes.

  • weight (string or None, optional (default=None)) – The name of an edge attribute that holds the numerical value used as a weight. If None, then each edge has weight 1. The degree is the sum of the edge weights adjacent to the node.

Returns

  • If a single node is requested

  • deg (int) – Out-degree of the node

  • OR if multiple nodes are requested

  • nd_iter (iterator) – The iterator returns two-tuples of (node, out-degree).

See also

degree, in_degree

Examples

>>> G = nx.DiGraph()
>>> nx.add_path(G, [0, 1, 2, 3])
>>> G.out_degree(0) # node 0 with degree 1
1
>>> list(G.out_degree([0, 1, 2]))
[(0, 1), (1, 1), (2, 1)]
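The weight parameter described above can be illustrated with a small sketch. The graph and its weight values are made up for illustration; only outgoing edges contribute to the weighted out-degree.

```python
import networkx as nx

G = nx.DiGraph()
G.add_edge(0, 1, weight=2.5)
G.add_edge(0, 2, weight=0.5)
G.add_edge(2, 0, weight=4)   # incoming to node 0, so ignored by its out-degree

# Unweighted: number of edges leaving node 0.
plain = G.out_degree(0)

# Weighted: sum of the 'weight' attribute over edges leaving node 0.
weighted = G.out_degree(0, weight="weight")

# Without a node argument the view iterates over (node, out_degree) pairs.
by_node = dict(G.out_degree(weight="weight"))
```

Here plain counts the two outgoing edges of node 0, while weighted adds up their weights.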
out_degree_iter(nbunch=None, weight=None)
property out_edges

An OutEdgeView of the DiGraph as G.edges or G.edges().

edges(self, nbunch=None, data=False, default=None)

The OutEdgeView provides set-like operations on the edge-tuples as well as edge attribute lookup. When called, it also provides an EdgeDataView object which allows control of access to edge attributes (but does not provide set-like operations). Hence, G.edges[u, v]['color'] provides the value of the color attribute for edge (u, v) while for (u, v, c) in G.edges.data('color', default='red'): iterates through all the edges yielding the color attribute with default 'red' if no color attribute exists.

Parameters
  • nbunch (single node, container, or all nodes (default= all nodes)) – The view will only report edges incident to these nodes.

  • data (string or bool, optional (default=False)) – The edge attribute returned in 3-tuple (u, v, ddict[data]). If True, return edge attribute dict in 3-tuple (u, v, ddict). If False, return 2-tuple (u, v).

  • default (value, optional (default=None)) – Value used for edges that don’t have the requested attribute. Only relevant if data is not True or False.

Returns

edges – A view of edge attributes, usually it iterates over (u, v) or (u, v, d) tuples of edges, but can also be used for attribute lookup as edges[u, v]['foo'].

Return type

OutEdgeView

See also

in_edges, out_edges

Notes

Nodes in nbunch that are not in the graph will be (quietly) ignored. For directed graphs this returns the out-edges.

Examples

>>> G = nx.DiGraph()   # or MultiDiGraph, etc
>>> nx.add_path(G, [0, 1, 2])
>>> G.add_edge(2, 3, weight=5)
>>> [e for e in G.edges]
[(0, 1), (1, 2), (2, 3)]
>>> G.edges.data()  # default data is {} (empty dict)
OutEdgeDataView([(0, 1, {}), (1, 2, {}), (2, 3, {'weight': 5})])
>>> G.edges.data('weight', default=1)
OutEdgeDataView([(0, 1, 1), (1, 2, 1), (2, 3, 5)])
>>> G.edges([0, 2])  # only edges incident to these nodes
OutEdgeDataView([(0, 1), (2, 3)])
>>> G.edges(0)  # only edges incident to a single node (use G.adj[0]?)
OutEdgeDataView([(0, 1)])
property pred

Graph adjacency object holding the predecessors of each node.

This object is a read-only dict-like structure with node keys and neighbor-dict values. The neighbor-dict is keyed by neighbor to the edge-data-dict. So G.pred[2][3]['color'] = 'blue' sets the color of the edge (3, 2) to "blue".

Iterating over G.pred behaves like a dict. Useful idioms include for nbr, datadict in G.pred[n].items():. A data-view not provided by dicts also exists: for nbr, foovalue in G.pred[node].data('foo'): A default can be set via a default argument to the data method.
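As a minimal sketch of the predecessor lookup described above (the graph and attribute values are made up for illustration), G.pred can be read like a nested dict, and the innermost edge-data dict is shared with the rest of the graph:

```python
import networkx as nx

G = nx.DiGraph()
G.add_edge(3, 2, color="red")
G.add_edge(1, 2)

# G.pred[n] maps each predecessor of n to the data dict of the edge (pred, n).
preds_of_2 = sorted(G.pred[2])

# The edge-data dict is shared, so a change made through pred is visible
# through G.edges as well.
G.pred[2][3]["color"] = "blue"
edge_color = G.edges[3, 2]["color"]

# The .items() idiom yields (neighbor, datadict) pairs.
pairs = {nbr: dict(data) for nbr, data in G.pred[2].items()}
```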

predecessors(n)

Returns an iterator over predecessor nodes of n.

A predecessor of n is a node m such that there exists a directed edge from m to n.

Parameters

n (node) – A node in the graph

Raises

NetworkXError – If n is not in the graph.

See also

successors()

remove_cpds(*cpds)[source]

Removes the cpds that are provided in the argument.

Parameters

*cpds (list, set, tuple (array-like)) – List of CPDs which are to be removed from the model. Each CPD should be an instance of TabularCPD.

Examples

>>> from pgmpy.models import DynamicBayesianNetwork as DBN
>>> from pgmpy.factors.discrete import TabularCPD
>>> dbn = DBN()
>>> dbn.add_edges_from([(('D',0),('G',0)),(('I',0),('G',0)),(('D',0),('D',1)),(('I',0),('I',1))])
>>> grade_cpd =  TabularCPD(('G',0), 3, [[0.3,0.05,0.9,0.5],
...                                      [0.4,0.25,0.08,0.3],
...                                      [0.3,0.7,0.02,0.2]], [('I', 0),('D', 0)],[2,2])
>>> dbn.add_cpds(grade_cpd)
>>> dbn.get_cpds()
[<TabularCPD representing P(('G', 0):3 | ('I', 0):2, ('D', 0):2) at 0x3348ab0>]
>>> dbn.remove_cpds(grade_cpd)
>>> dbn.get_cpds()
[]
remove_edge(u, v)

Remove the edge between u and v.

Parameters

u, v (nodes) – Remove the edge between nodes u and v.

Raises

NetworkXError – If there is not an edge between u and v.

See also

remove_edges_from()

remove a collection of edges

Examples

>>> G = nx.Graph()   # or DiGraph, etc
>>> nx.add_path(G, [0, 1, 2, 3])
>>> G.remove_edge(0, 1)
>>> e = (1, 2)
>>> G.remove_edge(*e) # unpacks e from an edge tuple
>>> e = (2, 3, {'weight':7}) # an edge with attribute data
>>> G.remove_edge(*e[:2]) # select first part of edge tuple
remove_edges_from(ebunch)

Remove all edges specified in ebunch.

Parameters

ebunch (list or container of edge tuples) –

Each edge given in the list or container will be removed from the graph. The edges can be:

  • 2-tuples (u, v) edge between u and v.

  • 3-tuples (u, v, k) where k is ignored.

See also

remove_edge()

remove a single edge

Notes

Will fail silently if an edge in ebunch is not in the graph.

Examples

>>> G = nx.path_graph(4)  # or DiGraph, MultiGraph, MultiDiGraph, etc
>>> ebunch = [(1, 2), (2, 3)]
>>> G.remove_edges_from(ebunch)
remove_node(n)

Remove node n.

Removes the node n and all adjacent edges. Attempting to remove a non-existent node will raise an exception.

Parameters

n (node) – A node in the graph

Raises

NetworkXError – If n is not in the graph.

Examples

>>> G = nx.path_graph(3)  # or DiGraph, MultiGraph, MultiDiGraph, etc
>>> list(G.edges)
[(0, 1), (1, 2)]
>>> G.remove_node(1)
>>> list(G.edges)
[]
remove_nodes_from(nodes)

Remove multiple nodes.

Parameters

nodes (iterable container) – A container of nodes (list, dict, set, etc.). If a node in the container is not in the graph it is silently ignored.

See also

remove_node()

Examples

>>> G = nx.path_graph(3)  # or DiGraph, MultiGraph, MultiDiGraph, etc
>>> e = list(G.nodes)
>>> e
[0, 1, 2]
>>> G.remove_nodes_from(e)
>>> list(G.nodes)
[]
reverse(copy=True)

Returns the reverse of the graph.

The reverse is a graph with the same nodes and edges but with the directions of the edges reversed.

Parameters

copy (bool, optional (default=True)) – If True, return a new DiGraph holding the reversed edges. If False, the reverse graph is created using a view of the original graph.
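The difference between the two values of copy can be sketched as follows. The example graph is made up for illustration, and edge lists are sorted only to make the comparison independent of iteration order.

```python
import networkx as nx

G = nx.DiGraph([(0, 1), (1, 2)])

# copy=True (the default): an independent DiGraph with every edge flipped.
R = G.reverse()
reversed_edges = sorted(R.edges)
R.add_edge(5, 6)              # changes R only, G is untouched

# copy=False: a read-only view that keeps tracking the original graph.
V = G.reverse(copy=False)
G.add_edge(2, 3)
view_edges = sorted(V.edges)  # now also contains the reversed new edge
```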

size(weight=None)

Returns the number of edges or total of all edge weights.

Parameters

weight (string or None, optional (default=None)) – The edge attribute that holds the numerical value used as a weight. If None, then each edge has weight 1.

Returns

size – The number of edges or (if weight keyword is provided) the total weight sum.

If weight is None, returns an int. Otherwise a float (or more general numeric if the weights are more general).

Return type

numeric

Examples

>>> G = nx.path_graph(4)  # or DiGraph, MultiGraph, MultiDiGraph, etc
>>> G.size()
3
>>> G = nx.Graph()   # or DiGraph, MultiGraph, MultiDiGraph, etc
>>> G.add_edge('a', 'b', weight=2)
>>> G.add_edge('b', 'c', weight=4)
>>> G.size()
2
>>> G.size(weight='weight')
6.0
subgraph(nodes)

Returns a SubGraph view of the subgraph induced on nodes.

The induced subgraph of the graph contains the nodes in nodes and the edges between those nodes.

Parameters

nodes (list, iterable) – A container of nodes which will be iterated through once.

Returns

G – A subgraph view of the graph. The graph structure cannot be changed but node/edge attributes can and are shared with the original graph.

Return type

SubGraph View

Notes

The graph, edge and node attributes are shared with the original graph. Changes to the graph structure are ruled out by the view, but changes to attributes are reflected in the original graph.

To create a subgraph with its own copy of the edge/node attributes use: G.subgraph(nodes).copy()

For an inplace reduction of a graph to a subgraph you can remove nodes: G.remove_nodes_from([n for n in G if n not in set(nodes)])

Subgraph views are sometimes NOT what you want. In most cases where you want to do more than simply look at the induced edges, it makes more sense to just create the subgraph as its own graph with code like:

# Create a subgraph SG based on a (possibly multigraph) G
# (largest_wcc stands for the container of nodes to keep)
SG = G.__class__()
SG.add_nodes_from((n, G.nodes[n]) for n in largest_wcc)
if SG.is_multigraph():
    SG.add_edges_from((n, nbr, key, d)
        for n, nbrs in G.adj.items() if n in largest_wcc
        for nbr, keydict in nbrs.items() if nbr in largest_wcc
        for key, d in keydict.items())
else:
    SG.add_edges_from((n, nbr, d)
        for n, nbrs in G.adj.items() if n in largest_wcc
        for nbr, d in nbrs.items() if nbr in largest_wcc)
SG.graph.update(G.graph)
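The notes above distinguish a subgraph view from an independent copy. A minimal sketch of that difference, using a made-up path graph and a hypothetical 'color' attribute:

```python
import networkx as nx

G = nx.path_graph(4)

# A view shares attribute dicts with G, so attribute changes propagate.
view = G.subgraph([0, 1, 2])
view.nodes[0]["color"] = "red"
shared = G.nodes[0].get("color")

# The structure of a view is frozen: adding an edge raises NetworkXError.
try:
    view.add_edge(0, 2)
    structure_changed = True
except nx.NetworkXError:
    structure_changed = False

# .copy() gives a graph of its own whose structure can be changed freely.
independent = G.subgraph([0, 1, 2]).copy()
independent.add_edge(0, 2)
```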

Examples

>>> G = nx.path_graph(4)  # or DiGraph, MultiGraph, MultiDiGraph, etc
>>> H = G.subgraph([0, 1, 2])
>>> list(H.edges)
[(0, 1), (1, 2)]
property succ

Graph adjacency object holding the successors of each node.

This object is a read-only dict-like structure with node keys and neighbor-dict values. The neighbor-dict is keyed by neighbor to the edge-data-dict. So G.succ[3][2]['color'] = 'blue' sets the color of the edge (3, 2) to "blue".

Iterating over G.succ behaves like a dict. Useful idioms include for nbr, datadict in G.succ[n].items():. A data-view not provided by dicts also exists: for nbr, foovalue in G.succ[node].data('foo'): and a default can be set via a default argument to the data method.

The neighbor information is also provided by subscripting the graph. So for nbr, foovalue in G[node].data('foo', default=1): works.

For directed graphs, G.adj is identical to G.succ.

successors(n)

Returns an iterator over successor nodes of n.

A successor of n is a node m such that there exists a directed edge from n to m.

Parameters

n (node) – A node in the graph

Raises

NetworkXError – If n is not in the graph.

See also

predecessors()

Notes

neighbors() and successors() are the same.

to_directed(as_view=False)

Returns a directed representation of the graph.

Returns

G – A directed graph with the same name, same nodes, and with each edge (u, v, data) replaced by two directed edges (u, v, data) and (v, u, data).

Return type

DiGraph

Notes

This returns a “deepcopy” of the edge, node, and graph attributes which attempts to completely copy all of the data and references.

This is in contrast to the similar D=DiGraph(G) which returns a shallow copy of the data.

See the Python copy module for more information on shallow and deep copies, https://docs.python.org/2/library/copy.html.

Warning: If you have subclassed Graph to use dict-like objects in the data structure, those changes do not transfer to the DiGraph created by this method.

Examples

>>> G = nx.Graph()  # or MultiGraph, etc
>>> G.add_edge(0, 1)
>>> H = G.to_directed()
>>> list(H.edges)
[(0, 1), (1, 0)]

If already directed, return a (deep) copy

>>> G = nx.DiGraph()  # or MultiDiGraph, etc
>>> G.add_edge(0, 1)
>>> H = G.to_directed()
>>> list(H.edges)
[(0, 1)]
to_directed_class()

Returns the class to use for empty directed copies.

If you subclass the base classes, use this to designate what directed class to use for to_directed() copies.

to_undirected(reciprocal=False, as_view=False)

Returns an undirected representation of the digraph.

Parameters
  • reciprocal (bool (optional)) – If True only keep edges that appear in both directions in the original digraph.

  • as_view (bool (optional, default=False)) – If True return an undirected view of the original directed graph.

Returns

G – An undirected graph with the same name and nodes and with edge (u, v, data) if either (u, v, data) or (v, u, data) is in the digraph. If both edges exist in digraph and their edge data is different, only one edge is created with an arbitrary choice of which edge data to use. You must check and correct for this manually if desired.

Return type

Graph

See also

Graph(), copy(), add_edge(), add_edges_from()

Notes

If edges in both directions (u, v) and (v, u) exist in the graph, attributes for the new undirected edge will be a combination of the attributes of the directed edges. The edge data is updated in the (arbitrary) order that the edges are encountered. For more customized control of the edge attributes use add_edge().

This returns a “deepcopy” of the edge, node, and graph attributes which attempts to completely copy all of the data and references.

This is in contrast to the similar G=DiGraph(D) which returns a shallow copy of the data.

See the Python copy module for more information on shallow and deep copies, https://docs.python.org/2/library/copy.html.

Warning: If you have subclassed DiGraph to use dict-like objects in the data structure, those changes do not transfer to the Graph created by this method.

Examples

>>> G = nx.path_graph(2)   # or MultiGraph, etc
>>> H = G.to_directed()
>>> list(H.edges)
[(0, 1), (1, 0)]
>>> G2 = H.to_undirected()
>>> list(G2.edges)
[(0, 1)]
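The effect of the reciprocal parameter can be sketched on a small made-up digraph in which only one pair of nodes is connected in both directions. Edge tuples are normalised with sorted() so the comparison does not depend on the order in which endpoints are reported.

```python
import networkx as nx

# 0 <-> 1 is reciprocated, 1 -> 2 is one-way.
D = nx.DiGraph([(0, 1), (1, 0), (1, 2)])

def undirected_edges(graph):
    """Return the edges of an undirected graph in a canonical sorted form."""
    return sorted(tuple(sorted(edge)) for edge in graph.edges)

# Default: an undirected edge is kept if either direction exists.
either = undirected_edges(D.to_undirected())

# reciprocal=True: an undirected edge is kept only if both directions exist.
both = undirected_edges(D.to_undirected(reciprocal=True))
```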
to_undirected_class()

Returns the class to use for empty undirected copies.

If you subclass the base classes, use this to designate what undirected class to use for to_undirected() copies.

update(edges=None, nodes=None)

Update the graph using nodes/edges/graphs as input.

Like dict.update, this method takes a graph as input, adding the graph’s nodes and edges to this graph. It can also take two inputs: edges and nodes. Finally it can take either edges or nodes. To specify only nodes the keyword nodes must be used.

The collections of edges and nodes are treated similarly to the add_edges_from/add_nodes_from methods. When iterated, they should yield 2-tuples (u, v) or 3-tuples (u, v, datadict).

Parameters
  • edges (Graph object, collection of edges, or None) – The first parameter can be a graph or some edges. If it has attributes nodes and edges, then it is taken to be a Graph-like object and those attributes are used as collections of nodes and edges to be added to the graph. If the first parameter does not have those attributes, it is treated as a collection of edges and added to the graph. If the first argument is None, no edges are added.

  • nodes (collection of nodes, or None) – The second parameter is treated as a collection of nodes to be added to the graph unless it is None. If edges is None and nodes is None an exception is raised. If the first parameter is a Graph, then nodes is ignored.

Examples

>>> G = nx.path_graph(5)
>>> G.update(nx.complete_graph(range(4,10)))
>>> from itertools import combinations
>>> edges = ((u, v, {'power': u * v})
...          for u, v in combinations(range(10, 20), 2)
...          if u * v < 225)
>>> nodes = [1000]  # for singleton, use a container
>>> G.update(edges, nodes)

Notes

If you want to update the graph using an adjacency structure, it is straightforward to obtain the edges/nodes from the adjacency. The following examples cover common cases; your adjacency may be slightly different and require tweaks of these examples.

>>> # dict-of-set/list/tuple
>>> adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2}}
>>> e = [(u, v) for u, nbrs in adj.items() for v in  nbrs]
>>> G.update(edges=e, nodes=adj)
>>> DG = nx.DiGraph()
>>> # dict-of-dict-of-attribute
>>> adj = {1: {2: 1.3, 3: 0.7}, 2: {1: 1.4}, 3: {1: 0.7}}
>>> e = [(u, v, {'weight': d}) for u, nbrs in adj.items()
...      for v, d in nbrs.items()]
>>> DG.update(edges=e, nodes=adj)
>>> # dict-of-dict-of-dict
>>> adj = {1: {2: {'weight': 1.3}, 3: {'color': 0.7, 'weight':1.2}}}
>>> e = [(u, v, d) for u, nbrs in adj.items()
...      for v, d in nbrs.items()]
>>> DG.update(edges=e, nodes=adj)
>>> # predecessor adjacency (dict-of-set)
>>> pred = {1: {2, 3}, 2: {3}, 3: {3}}
>>> e = [(v, u) for u, nbrs in pred.items() for v in nbrs]
>>> # MultiGraph dict-of-dict-of-dict-of-attribute
>>> MDG = nx.MultiDiGraph()
>>> adj = {1: {2: {0: {'weight': 1.3}, 1: {'weight': 1.2}}},
...        3: {2: {0: {'weight': 0.7}}}}
>>> e = [(u, v, ekey, d) for u, nbrs in adj.items()
...      for v, keydict in nbrs.items()
...      for ekey, d in keydict.items()]
>>> MDG.update(edges=e)

See also

add_edges_from()

add multiple edges to a graph

add_nodes_from()

add multiple nodes to a graph

Structural Equation Models

class pgmpy.models.SEM.SEM(syntax, **kwargs)[source]

Class for representing Structural Equation Models. This class is a wrapper over SEMGraph and SEMAlg to provide a consistent API over the different representations.

model

A graphical representation of the model.

Type

SEMGraph instance

fit()[source]
classmethod from_RAM(variables, B, zeta, observed=None, wedge_y=None, fixed_values=None)[source]

Initializes a SEM instance using Reticular Action Model (RAM) notation. The model is defined as:

\mathbf{\eta} = \mathbf{B \eta} + \mathbf{\epsilon} \\
\mathbf{y} = \wedge_y \mathbf{\eta} \\
\zeta = COV(\mathbf{\epsilon})

where \mathbf{\eta} is the set of variables (both latent and observed), \mathbf{\epsilon} are the error terms, \mathbf{y} is the set of observed variables, \wedge_y is a boolean array of the shape (no of observed variables, no of total variables).
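As a worked consequence of the definition above, the structural equation can be solved for \mathbf{\eta}, which gives the covariances implied by the model. This is a sketch under the assumption that \mathbf{I} - \mathbf{B} is invertible; the identity matrix \mathbf{I} is introduced here and is not part of the original notation.

```latex
\begin{aligned}
(\mathbf{I} - \mathbf{B})\,\mathbf{\eta} &= \mathbf{\epsilon}
  \;\Rightarrow\;
  \mathbf{\eta} = (\mathbf{I} - \mathbf{B})^{-1}\,\mathbf{\epsilon} \\
COV(\mathbf{\eta}) &= (\mathbf{I} - \mathbf{B})^{-1}\,\zeta\,
  \left((\mathbf{I} - \mathbf{B})^{-1}\right)^{\top} \\
COV(\mathbf{y}) &= \wedge_y\,(\mathbf{I} - \mathbf{B})^{-1}\,\zeta\,
  \left((\mathbf{I} - \mathbf{B})^{-1}\right)^{\top}\,\wedge_y^{\top}
\end{aligned}
```

The last line is the covariance of the observed variables, obtained by selecting the observed rows and columns of COV(\mathbf{\eta}) with \wedge_y.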

Parameters
  • variables (list, array-like) – List of variables (both latent and observed) in the model.

  • B (2-D boolean array (shape: len(variables) x len(variables))) – The non-zero parameters in the B matrix. Refer to the model definition above for details.

  • zeta (2-D boolean array (shape: len(variables) x len(variables))) – The non-zero parameters in the \zeta (error covariance) matrix. Refer to the model definition above for details.

  • observed (list, array-like (optional: Either observed or wedge_y needs to be specified)) – List of observed variables in the model.

  • wedge_y (2-D array (shape: no. observed x total vars) (optional: Either observed or wedge_y)) – The \wedge_y matrix. Refer to the model definition above for details.

  • fixed_values (dict (optional)) – A dict with the keys B and zeta. If specified, the given parameter values are fixed and are not changed during estimation.

Returns

An instance of the object with initialized values.

Return type

pgmpy.models.SEM

Examples

>>> from pgmpy.models import SEM
>>> # TODO: Finish this example (SEM.from_RAM call)
classmethod from_graph(ebunch, latents=[], err_corr=[], err_var={})[source]

Initializes a SEM instance using graphical structure.

Parameters
  • ebunch (list/array-like) –

    List of edges in form of tuples. Each tuple can be of two possible shapes:

    1. (u, v): This would add an edge from u to v without setting any parameter for the edge.

    2. (u, v, parameter): This would add an edge from u to v and set the edge's parameter to parameter.

  • latents (list/array-like) – List of nodes which are latent. All other variables are considered observed.

  • err_corr (list/array-like) –

    List of tuples representing edges between error terms. It can be of the following forms:

    1. (u, v): Adds a correlation between the error terms of u and v. Doesn't set any variance or covariance values.

    2. (u, v, covar): Adds a correlation between the error terms of u and v and sets the parameter to covar.

  • err_var (dict) – Dict of the form {var: variance}.

Examples

Defining a model (Union sentiment model [1]) without setting any parameters.

>>> from pgmpy.models import SEM
>>> sem = SEM.from_graph(ebunch=[('deferenc', 'unionsen'), ('laboract', 'unionsen'),
...                              ('yrsmill', 'unionsen'), ('age', 'deferenc'),
...                              ('age', 'laboract'), ('deferenc', 'laboract')],
...                      latents=[],
...                      err_corr=[('yrsmill', 'age')],
...                      err_var={})

Defining a model (Education [2]) with all the parameters set. To leave a parameter unset, np.NaN can be passed explicitly.

>>> sem_edu = SEM.from_graph(ebunch=[('intelligence', 'academic', 0.8), ('intelligence', 'scale_1', 0.7),
...                                  ('intelligence', 'scale_2', 0.64), ('intelligence', 'scale_3', 0.73),
...                                  ('intelligence', 'scale_4', 0.82), ('academic', 'SAT_score', 0.98),
...                                  ('academic', 'High_school_gpa', 0.75), ('academic', 'ACT_score', 0.87)],
...                          latents=['intelligence', 'academic'],
...                          err_corr=[],
...                          err_var={})

References

[1] McDonald, A, J., & Clelland, D. A. (1984). Textile Workers and Union Sentiment. Social Forces, 63(2), 502–521

[2] https://en.wikipedia.org/wiki/Structural_equation_modeling#/media/File:Example_Structural_equation_model.svg

classmethod from_lavaan(string=None, filename=None)[source]

Initializes a SEM instance using lavaan syntax.

Parameters
  • string (str (default: None)) – A lavaan-style multiline set of regression equations representing the model. Refer to http://lavaan.ugent.be/tutorial/syntax1.html for details.

  • filename (str (default: None)) – The filename of the file containing the model in lavaan syntax.

Examples

classmethod from_lisrel(var_names, params, fixed_masks=None)[source]

Initializes a SEM instance using LISREL notation. The LISREL notation is defined as:

\mathbf{\eta} = \mathbf{B \eta} + \mathbf{\Gamma \xi} + \mathbf{\zeta} \\
\mathbf{y} = \mathbf{\wedge_y \eta} + \mathbf{\epsilon} \\
\mathbf{x} = \mathbf{\wedge_x \xi} + \mathbf{\delta}

where \mathbf{\eta} is the set of endogenous variables, \mathbf{\xi} is the set of exogenous variables, and \mathbf{y} and \mathbf{x} are the sets of measurement variables for \mathbf{\eta} and \mathbf{\xi} respectively. \mathbf{\zeta}, \mathbf{\epsilon}, and \mathbf{\delta} are the error terms for \mathbf{\eta}, \mathbf{y}, and \mathbf{x} respectively.

Parameters
  • str_model (str (default: None)) –

    A lavaan-style multiline set of regression equations representing the model. Refer to http://lavaan.ugent.be/tutorial/syntax1.html for details.

    If None, var_names and params must be specified.

  • var_names (dict (default: None)) – A dict with the keys: eta, xi, y, and x. Each key should have a list of variable names as its value.

  • params (dict (default: None)) –

    A dict of LISREL representation non-zero parameters. Must contain the following keys: B, gamma, wedge_y, wedge_x, phi, theta_e, theta_del, and psi.

    If None str_model must be specified.

  • fixed_params (dict (default: None)) –

    A dict of fixed values for parameters. The shape of the parameters should be same as params.

    If None all the parameters are learnable.

Returns

An instance of the object with initialized values.

Return type

pgmpy.models.SEM

Examples

>>> from pgmpy.models import SEMAlg
# TODO: Finish this example
class pgmpy.models.SEM.SEMAlg(eta=None, B=None, zeta=None, wedge_y=None, fixed_values=None)[source]

Base class for the algebraic representation of Structural Equation Models (SEMs). The model is represented using the Reticular Action Model (RAM).

generate_samples(n_samples=100)[source]

Generates random samples from the model.

Parameters

n_samples (int) – The number of samples to generate.

Returns

The generated samples.

Return type

pd.DataFrame

set_params(B, zeta)[source]

Sets the fixed parameters of the model.

Parameters
  • B (2D array) – The B matrix.

  • zeta (2D array) – The covariance matrix.

to_SEMGraph()[source]

Creates a graph structure from the LISREL representation.

Returns

A path model of the model.

Return type

pgmpy.models.SEMGraph

Examples

>>> from pgmpy.models import SEMAlg
>>> model = SEMAlg()
# TODO: Finish this example
class pgmpy.models.SEM.SEMGraph(ebunch=[], latents=[], err_corr=[], err_var={})[source]

Base class for the graphical representation of Structural Equation Models (SEMs).

All variables are by default assumed to have an associated error latent variable, so the error terms don't need to be specified.

latents

List of all the latent variables in the model except the error terms.

Type

list

observed

List of all the observed variables in the model.

Type

list

graph

The graphical structure of the latent and observed variables except the error terms. The parameters are stored in the weight attribute of each edge.

Type

nx.DiGraph

err_graph

An undirected graph representing the relations between the error terms of the model. The node of the graph has the same name as the variable but represents the error terms. The variance is stored in the weight attribute of the node and the covariance is stored in the weight attribute of the edge.

Type

nx.Graph

full_graph_struct

Represents the full graph structure. The names of error terms start with "." and a new node, whose name starts with "..", is added for each correlation.

Type

nx.DiGraph

active_trail_nodes(variables, observed=[], avoid_nodes=[], struct='full')[source]

Finds all the observed variables which are d-connected to variables in the graph struct, given that the variables in observed are observed.

Parameters
  • variables (str or array like) – Observed variables whose d-connected variables are to be found.

  • observed (list/array-like) – If given the active trails would be computed assuming these nodes to be observed.

  • avoid_nodes (list/array-like) – If specified, the algorithm doesn't account for paths that have influence flowing through the avoid node.

  • struct (str or nx.DiGraph instance) – If "full", considers correlation between error terms for computing d-connection. If "non_error", doesn't consider error correlations for computing d-connection. If instance of nx.DiGraph, finds d-connected variables on the given graph.

Examples

>>> from pgmpy.models import SEMGraph
>>> model = SEMGraph(ebunch=[('yrsmill', 'unionsen'), ('age', 'laboract'),
...                          ('age', 'deferenc'), ('deferenc', 'laboract'),
...                          ('deferenc', 'unionsen'), ('laboract', 'unionsen')],
...                  latents=[],
...                  err_corr=[('yrsmill', 'age')])
>>> model.active_trail_nodes('age')
Returns

dict – Returns a dict with variables as the key and a list of d-connected variables as the value.

Return type

{str: list}

References

Details of the algorithm can be found in 'Probabilistic Graphical Models: Principles and Techniques' - Koller and Friedman, Page 75, Algorithm 3.1
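The idea behind the referenced algorithm can be sketched in plain Python. This is a generic d-connection search on a DAG given as an edge list, not pgmpy's implementation: the function name d_connected and its signature are hypothetical, and SEM-specific details such as error-term correlations and avoid_nodes are left out.

```python
from collections import defaultdict


def d_connected(edges, source, observed=()):
    """Return the nodes d-connected to `source` given `observed` (source included)."""
    parents, children = defaultdict(set), defaultdict(set)
    for u, v in edges:
        parents[v].add(u)
        children[u].add(v)
    observed = set(observed)

    # Phase 1: collect the observed nodes together with all their ancestors.
    ancestors, stack = set(), list(observed)
    while stack:
        node = stack.pop()
        if node not in ancestors:
            ancestors.add(node)
            stack.extend(parents[node])

    # Phase 2: walk over (node, direction) pairs. "up" means the node was
    # reached from one of its children, "down" from one of its parents.
    reachable, visited = set(), set()
    stack = [(source, "up")]
    while stack:
        node, direction = stack.pop()
        if (node, direction) in visited:
            continue
        visited.add((node, direction))
        if node not in observed:
            reachable.add(node)
        if direction == "up" and node not in observed:
            # An unobserved node passes influence to parents and children.
            stack.extend((p, "up") for p in parents[node])
            stack.extend((c, "down") for c in children[node])
        elif direction == "down":
            if node not in observed:
                # Chain: influence keeps flowing down to the children.
                stack.extend((c, "down") for c in children[node])
            if node in ancestors:
                # Collider that is observed (or has an observed descendant).
                stack.extend((p, "up") for p in parents[node])
    return reachable


# A collider A -> C <- B with a descendant C -> D.
edges = [("A", "C"), ("B", "C"), ("C", "D")]
unconditional = d_connected(edges, "A")
given_collider = d_connected(edges, "A", observed=["C"])
given_descendant = d_connected(edges, "A", observed=["D"])
```

Without evidence the collider C blocks the trail from A to B; observing C or its descendant D opens it.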

get_conditional_ivs(X, Y, scaling_indicators={})[source]

Returns the conditional IVs for the relation X -> Y

Parameters
  • X (node) – The observed variable’s name

  • Y (node) – The observed variable's name

  • scaling_indicators (dict (optional)) – A dict representing which observed variable to use as scaling indicator for the latent variables. If not provided, automatically finds scaling indicators by randomly selecting one of the measurement variables of each latent variable.

Returns

Set of 2-tuples in which tuple[0] is an IV for X -> Y given tuple[1].

Return type

set

References

[1] Van Der Zander, B., Textor, J., & Liskiewicz, M. (2015, June). Efficiently finding conditional instruments for causal inference. In Twenty-Fourth International Joint Conference on Artificial Intelligence.

Examples

>>> from pgmpy.models import SEMGraph
>>> model = SEMGraph(ebunch=[('I', 'X'), ('X', 'Y'), ('W', 'I')],
...                  latents=[],
...                  err_corr=[('W', 'Y')])
>>> model.get_conditional_ivs('X', 'Y')
[('I', {'W'})]
get_ivs(X, Y, scaling_indicators={})[source]

Returns the Instrumental variables(IVs) for the relation X -> Y

Parameters
  • X (node) – The variable name (observed or latent)

  • Y (node) – The variable name (observed or latent)

  • scaling_indicators (dict (optional)) – A dict representing which observed variable to use as scaling indicator for the latent variables. If not given the method automatically selects one of the measurement variables at random as the scaling indicator.

Returns

set – The set of Instrumental Variables for X -> Y.

Return type

{str}

Examples

>>> from pgmpy.models import SEMGraph
>>> model = SEMGraph(ebunch=[('I', 'X'), ('X', 'Y')],
...                  latents=[],
...                  err_corr=[('X', 'Y')])
>>> model.get_ivs('X', 'Y')
{'I'}
get_scaling_indicators()[source]

Returns a scaling indicator for each of the latent variables in the model. The scaling indicator is chosen randomly among the observed measurement variables of the latent variable.

Examples

>>> from pgmpy.models import SEMGraph
>>> model = SEMGraph(ebunch=[('xi1', 'eta1'), ('xi1', 'x1'), ('xi1', 'x2'),
...                          ('eta1', 'y1'), ('eta1', 'y2')],
...                  latents=['xi1', 'eta1'])
>>> model.get_scaling_indicators()
{'xi1': 'x1', 'eta1': 'y1'}
Returns

dict – Returns a dict with latent variables as the key and their value being the scaling indicator.

Return type

dict
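The random selection described above can be sketched without pgmpy. The helper pick_scaling_indicators is hypothetical and only assumes that a measurement variable is an observed (non-latent) child of a latent variable; pgmpy's own selection logic may differ.

```python
import random


def pick_scaling_indicators(edges, latents, rng=random):
    """Pick one observed child of each latent variable at random."""
    latents = set(latents)
    indicators = {}
    for latent in sorted(latents):
        # Candidate indicators: children of the latent that are observed.
        measured = sorted(v for u, v in edges if u == latent and v not in latents)
        if measured:
            indicators[latent] = rng.choice(measured)
    return indicators


edges = [("xi1", "eta1"), ("xi1", "x1"), ("xi1", "x2"),
         ("eta1", "y1"), ("eta1", "y2")]
# A seeded generator keeps the sketch reproducible.
result = pick_scaling_indicators(edges, ["xi1", "eta1"], rng=random.Random(0))
```

Because the choice is random, the result for xi1 can be either x1 or x2, and the result for eta1 either y1 or y2.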

moralize(graph='full')[source]

TODO: This needs to go to a parent class. Removes all the immoralities in the DirectedGraph and creates a moral graph (UndirectedGraph).

A v-structure X->Z<-Y is an immorality if there is no directed edge between X and Y.
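Moralization as described above can be sketched in plain Python: every pair of parents that share a child is connected, and edge directions are dropped. The function name and the edge-list input are assumptions made for illustration, not pgmpy's API.

```python
from itertools import combinations


def moralize(directed_edges):
    """Return the undirected edges (as frozensets) of the moral graph."""
    parents = {}
    undirected = set()
    for u, v in directed_edges:
        parents.setdefault(v, set()).add(u)
        # Keep every original edge, but without its direction.
        undirected.add(frozenset((u, v)))
    for node_parents in parents.values():
        # "Marry" each pair of parents that share a child.
        for a, b in combinations(node_parents, 2):
            undirected.add(frozenset((a, b)))
    return undirected


# The v-structure X -> Z <- Y is an immorality, so X - Y is added.
moral = moralize([("X", "Z"), ("Y", "Z")])
```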

Parameters

graph

Examples

to_lisrel()[source]

Converts the model from a graphical representation to an equivalent algebraic representation. This converts the model into a Reticular Action Model (RAM) representation, which is implemented by the pgmpy.models.SEMAlg class.

Returns

Instance of SEMAlg representing the model.

Return type

SEMAlg

Examples

>>> from pgmpy.models import SEM
>>> sem = SEM.from_graph(ebunch=[('deferenc', 'unionsen'), ('laboract', 'unionsen'),
...                              ('yrsmill', 'unionsen'), ('age', 'deferenc'),
...                              ('age', 'laboract'), ('deferenc', 'laboract')],
...                      latents=[],
...                      err_corr=[('yrsmill', 'age')],
...                      err_var={})
>>> sem.to_lisrel()
# TODO: Complete this.

See also

to_standard_lisrel()

Converts to the standard lisrel format and returns the parameters.

to_standard_lisrel()[source]

Transforms the model to the standard LISREL representation of latent and measurement equations. The standard LISREL representation is given as:

\mathbf{\eta} = \mathbf{B \eta} + \mathbf{\Gamma \xi} + \mathbf{\zeta} \\
\mathbf{y} = \mathbf{\wedge_y \eta} + \mathbf{\epsilon} \\
\mathbf{x} = \mathbf{\wedge_x \xi} + \mathbf{\delta} \\
\mathbf{\Theta_e} = COV(\mathbf{\epsilon}) \\
\mathbf{\Theta_\delta} = COV(\mathbf{\delta}) \\
\mathbf{\Psi} = COV(\mathbf{\eta}) \\
\mathbf{\Phi} = COV(\mathbf{\xi})

Since the standard LISREL representation has restrictions on the types of model, this method adds extra latent variables with fixed loadings of 1 to make the model consistent with the restrictions.

Returns

  • var_names (dict (keys: eta, xi, y, x)) – Returns the variable names in \mathbf{\eta}, \mathbf{\xi}, \mathbf{y}, \mathbf{x}.

  • params (dict (keys: B, gamma, wedge_y, wedge_x, theta_e, theta_del, phi, psi)) – Returns a boolean matrix for each of the parameters. A 1 in the matrix represents that there is an edge in the model, 0 represents there is no edge.

  • fixed_values (dict (keys: B, gamma, wedge_y, wedge_x, theta_e, theta_del, phi, psi)) – Returns a matrix for each of the parameters. A value in the matrix represents the set value for the parameter in the model else it is 0.

See also

to_lisrel()

Converts the model to pgmpy.models.SEMAlg instance.

Examples

TODO: Finish this.