Models

Bayesian Model

class pgmpy.models.BayesianModel.BayesianModel(ebunch=None)[source]

Base class for Bayesian models.

A model stores nodes and edges with conditional probability distributions (CPDs) and other attributes.

Models hold directed edges. Self loops are not allowed, nor are multiple (parallel) edges.

Nodes should be strings.

Edges are represented as links between nodes.

data
: input graph
Data to initialize graph. If data=None (default) an empty graph is created. The data can be an edge list, or any NetworkX graph object.

Create an empty bayesian model with no nodes and no edges.

>>> from pgmpy.models import BayesianModel
>>> G = BayesianModel()

G can be grown in several ways.

Nodes:

Add one node at a time:

>>> G.add_node('a')

Add the nodes from any container (a list, set or tuple or the nodes from another graph).

>>> G.add_nodes_from(['a', 'b'])

Edges:

G can also be grown by adding edges.

Add one edge,

>>> G.add_edge('a', 'b')

a list of edges,

>>> G.add_edges_from([('a', 'b'), ('b', 'c')])

If some edges connect nodes not yet in the model, the nodes are added automatically. There are no errors when adding nodes or edges that already exist.

Shortcuts:

Many common graph features allow python syntax to speed reporting.

>>> 'a' in G     # check if node in graph
True
>>> len(G)  # number of nodes in graph
3
active_trail_nodes(start, observed=None)[source]

Returns all the nodes reachable from start via an active trail.

start: Graph node

observed
: List of nodes (optional)
If given the active trail would be computed assuming these nodes to be observed.
>>> from pgmpy.models import BayesianModel
>>> student = BayesianModel()
>>> student.add_nodes_from(['diff', 'intel', 'grades'])
>>> student.add_edges_from([('diff', 'grades'), ('intel', 'grades')])
>>> student.active_trail_nodes('diff')
{'diff', 'grades'}
>>> student.active_trail_nodes('diff', observed='grades')
{'diff', 'intel'}

Details of the algorithm can be found in 'Probabilistic Graphical Models: Principles and Techniques' by Koller and Friedman, page 75, Algorithm 3.1.

add_cpds(*cpds)[source]

Add CPD (Conditional Probability Distribution) to the Bayesian Model.

cpds
: list, set, tuple (array-like)
List of CPDs which will be associated with the model
>>> from pgmpy.models import BayesianModel
>>> from pgmpy.factors.discrete.CPD import TabularCPD
>>> student = BayesianModel([('diff', 'grades'), ('intel', 'grades')])
>>> grades_cpd = TabularCPD('grades', 3, [[0.1,0.1,0.1,0.1,0.1,0.1],
...                                       [0.1,0.1,0.1,0.1,0.1,0.1],
...                                       [0.8,0.8,0.8,0.8,0.8,0.8]],
...                         evidence=['diff', 'intel'], evidence_card=[2, 3])
>>> student.add_cpds(grades_cpd)
diff:   easy   easy   easy   hard   hard   hard
intel:  dumb   avg    smart  dumb   avg    smart
gradeA  0.1    0.1    0.1    0.1    0.1    0.1
gradeB  0.1    0.1    0.1    0.1    0.1    0.1
gradeC  0.8    0.8    0.8    0.8    0.8    0.8
add_edge(u, v, **kwargs)[source]

Add an edge between u and v.

The nodes u and v will be automatically added if they are not already in the graph

u,v
: nodes
Nodes can be any hashable python object.
>>> from pgmpy.models import BayesianModel
>>> G = BayesianModel()
>>> G.add_nodes_from(['grade', 'intel'])
>>> G.add_edge('grade', 'intel')
check_model()[source]

Check the model for various errors. This method checks for the following errors.

  • Checks if the sum of the probabilities for each state is equal to 1 (tol=0.01).
  • Checks if the CPDs associated with nodes are consistent with their parents.
check: boolean
True if all the checks are passed
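A minimal sketch of a passing check, reusing the diff/intel/grade CPDs from the is_imap example further below:

>>> from pgmpy.models import BayesianModel
>>> from pgmpy.factors.discrete import TabularCPD
>>> G = BayesianModel([('diff', 'grade'), ('intel', 'grade')])
>>> diff_cpd = TabularCPD('diff', 2, [[0.2], [0.8]])
>>> intel_cpd = TabularCPD('intel', 3, [[0.5], [0.3], [0.2]])
>>> grade_cpd = TabularCPD('grade', 3,
...                        [[0.1, 0.1, 0.1, 0.1, 0.1, 0.1],
...                         [0.1, 0.1, 0.1, 0.1, 0.1, 0.1],
...                         [0.8, 0.8, 0.8, 0.8, 0.8, 0.8]],
...                        evidence=['diff', 'intel'], evidence_card=[2, 3])
>>> G.add_cpds(diff_cpd, intel_cpd, grade_cpd)
>>> G.check_model()  # every column sums to 1 and evidence matches the parents
True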
fit(data, estimator_type=None, state_names=[], complete_samples_only=True, **kwargs)[source]

Estimates the CPD for each variable based on a given data set.

data: pandas DataFrame object
DataFrame object with column names identical to the variable names of the network. (If some values in the data are missing, the data cells should be set to numpy.NaN. Note that pandas converts each column containing numpy.NaN to dtype float.)
estimator_type: Estimator class
One of:

  • MaximumLikelihoodEstimator (default)
  • BayesianEstimator: in this case, pass 'prior_type' and either 'pseudo_counts' or 'equivalent_sample_size' as additional keyword arguments. See BayesianEstimator.get_parameters() for usage.
state_names: dict (optional)
A dict indicating, for each variable, the discrete set of states that the variable can take. If unspecified, the observed values in the data set are taken to be the only possible states.
complete_samples_only: bool (default True)
Specifies how to deal with missing data, if present. If set to True, all rows that contain np.NaN somewhere are ignored. If False then, for each variable, every row where neither the variable nor its parents are np.NaN is used.
>>> import pandas as pd
>>> from pgmpy.models import BayesianModel
>>> from pgmpy.estimators import MaximumLikelihoodEstimator
>>> data = pd.DataFrame(data={'A': [0, 0, 1], 'B': [0, 1, 0], 'C': [1, 1, 0]})
>>> model = BayesianModel([('A', 'C'), ('B', 'C')])
>>> model.fit(data)
>>> model.get_cpds()
[<TabularCPD representing P(A:2) at 0x7fb98a7d50f0>,
<TabularCPD representing P(B:2) at 0x7fb98a7d5588>,
<TabularCPD representing P(C:2 | A:2, B:2) at 0x7fb98a7b1f98>]
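The BayesianEstimator route mentioned above can be sketched the same way; this assumes the data from the previous example, and prior_type='BDeu' with equivalent_sample_size is one of the combinations described in BayesianEstimator.get_parameters():

>>> from pgmpy.estimators import BayesianEstimator
>>> model = BayesianModel([('A', 'C'), ('B', 'C')])
>>> model.fit(data, estimator_type=BayesianEstimator,
...           prior_type='BDeu', equivalent_sample_size=5)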
get_cardinality(node)[source]

Returns the cardinality of the node. Throws an error if the CPD for the queried node hasn’t been added to the network.

node: Any hashable python object.

int: The cardinality of the node.
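A short sketch, mirroring the get_cpds example below; the cardinality comes from the CPD attached to the node:

>>> from pgmpy.models import BayesianModel
>>> from pgmpy.factors.discrete import TabularCPD
>>> student = BayesianModel([('diff', 'grade'), ('intel', 'grade')])
>>> cpd = TabularCPD('grade', 2, [[0.1, 0.9, 0.2, 0.7],
...                               [0.9, 0.1, 0.8, 0.3]],
...                  ['intel', 'diff'], [2, 2])
>>> student.add_cpds(cpd)
>>> student.get_cardinality('grade')
2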

get_cpds(node=None)[source]

Returns the CPD of the node. If node is not specified, returns all the CPDs added to the graph so far.

node: any hashable python object (optional)
The node whose CPD we want. If node not specified returns all the CPDs added to the model.

A list of TabularCPDs.

>>> from pgmpy.models import BayesianModel
>>> from pgmpy.factors.discrete import TabularCPD
>>> student = BayesianModel([('diff', 'grade'), ('intel', 'grade')])
>>> cpd = TabularCPD('grade', 2, [[0.1, 0.9, 0.2, 0.7],
...                               [0.9, 0.1, 0.8, 0.3]],
...                  ['intel', 'diff'], [2, 2])
>>> student.add_cpds(cpd)
>>> student.get_cpds()
get_immoralities()[source]

Finds all the immoralities in the model. A v-structure X -> Z <- Y is an immorality if there is no direct edge between X and Y.

set: A set of all the immoralities in the model

>>> from pgmpy.models import BayesianModel
>>> student = BayesianModel()
>>> student.add_edges_from([('diff', 'grade'), ('intel', 'grade'),
...                         ('intel', 'SAT'), ('grade', 'letter')])
>>> student.get_immoralities()
{('diff','intel')}
get_independencies(latex=False)[source]

Computes independencies in the Bayesian Network by checking d-separation.

latex: boolean
If latex=True then latex string of the independence assertion would be created.
>>> from pgmpy.models import BayesianModel
>>> chain = BayesianModel([('X', 'Y'), ('Y', 'Z')])
>>> chain.get_independencies()
(X _|_ Z | Y)
(Z _|_ X | Y)
is_active_trail(start, end, observed=None)[source]

Returns True if there is any active trail between start and end node

start : Graph Node

end : Graph Node

observed
: List of nodes (optional)
If given the active trail would be computed assuming these nodes to be observed.
additional_observed
: List of nodes (optional)
If given the active trail would be computed assuming these nodes to be observed along with the nodes marked as observed in the model.
>>> from pgmpy.models import BayesianModel
>>> student = BayesianModel()
>>> student.add_nodes_from(['diff', 'intel', 'grades', 'letter', 'sat'])
>>> student.add_edges_from([('diff', 'grades'), ('intel', 'grades'), ('grades', 'letter'),
...                         ('intel', 'sat')])
>>> student.is_active_trail('diff', 'intel')
False
>>> student.is_active_trail('grades', 'sat')
True
is_iequivalent(model)[source]

Checks whether the given model is I-equivalent to the current model.

Two graphs G1 and G2 are said to be I-equivalent if they have the same skeleton and the same set of immoralities.

Note: The skeleton comparison is insensitive to node names, but the immorality comparison requires the node names to match.

model : A Bayesian model object, for which you want to check I-equivalence

boolean : True if both are I-equivalent, False otherwise

>>> from pgmpy.models import BayesianModel
>>> G = BayesianModel()
>>> G.add_edges_from([('V', 'W'), ('W', 'X'),
...                   ('X', 'Y'), ('Z', 'Y')])
>>> G1 = BayesianModel()
>>> G1.add_edges_from([('W', 'V'), ('X', 'W'),
...                    ('X', 'Y'), ('Z', 'Y')])
>>> G.is_iequivalent(G1)
True
is_imap(JPD)[source]

Checks whether the Bayesian model is an I-map of the given JointProbabilityDistribution.

JPD
: An instance of the JointProbabilityDistribution class, for which you want to
check the I-map.
boolean
: True if the Bayesian model is an I-map for the given joint probability distribution,
False otherwise.
>>> from pgmpy.models import BayesianModel
>>> from pgmpy.factors.discrete import TabularCPD
>>> from pgmpy.factors.discrete import JointProbabilityDistribution
>>> G = BayesianModel([('diff', 'grade'), ('intel', 'grade')])
>>> diff_cpd = TabularCPD('diff', 2, [[0.2], [0.8]])
>>> intel_cpd = TabularCPD('intel', 3, [[0.5], [0.3], [0.2]])
>>> grade_cpd = TabularCPD('grade', 3,
...                        [[0.1,0.1,0.1,0.1,0.1,0.1],
...                         [0.1,0.1,0.1,0.1,0.1,0.1],
...                         [0.8,0.8,0.8,0.8,0.8,0.8]],
...                        evidence=['diff', 'intel'],
...                        evidence_card=[2, 3])
>>> G.add_cpds(diff_cpd, intel_cpd, grade_cpd)
>>> val = [0.01, 0.01, 0.08, 0.006, 0.006, 0.048, 0.004, 0.004, 0.032,
...        0.04, 0.04, 0.32, 0.024, 0.024, 0.192, 0.016, 0.016, 0.128]
>>> JPD = JointProbabilityDistribution(['diff', 'intel', 'grade'], [2, 3, 3], val)
>>> G.is_imap(JPD)
True
local_independencies(variables)[source]

Returns an Independencies object containing the local independencies of each of the variables.

variables: str or array like
variables whose local independencies are to be found.
>>> from pgmpy.models import BayesianModel
>>> student = BayesianModel()
>>> student.add_edges_from([('diff', 'grade'), ('intel', 'grade'),
...                         ('grade', 'letter'), ('intel', 'SAT')])
>>> ind = student.local_independencies('grade')
>>> ind.event1
{'grade'}
>>> ind.event2
{'SAT'}
>>> ind.event3
{'diff', 'intel'}
predict(data)[source]

Predicts states of all the missing variables.

data
: pandas DataFrame object
A DataFrame object with column names same as the variables in the model.
>>> import numpy as np
>>> import pandas as pd
>>> from pgmpy.models import BayesianModel
>>> values = pd.DataFrame(np.random.randint(low=0, high=2, size=(1000, 5)),
...                       columns=['A', 'B', 'C', 'D', 'E'])
>>> train_data = values[:800]
>>> predict_data = values[800:]
>>> model = BayesianModel([('A', 'B'), ('C', 'B'), ('C', 'D'), ('B', 'E')])
>>> model.fit(train_data)
>>> predict_data = predict_data.copy()
>>> predict_data.drop('E', axis=1, inplace=True)
>>> y_pred = model.predict(predict_data)
>>> y_pred
array([0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1,
       1, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1,
       1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1,
       1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1,
       1, 1, 1, 0, 0, 0, 1, 0])
remove_cpds(*cpds)[source]

Removes the cpds that are provided in the argument.

*cpds: TabularCPD object
A CPD object on any subset of the variables of the model which is to be associated with the model.
>>> from pgmpy.models import BayesianModel
>>> from pgmpy.factors.discrete import TabularCPD
>>> student = BayesianModel([('diff', 'grade'), ('intel', 'grade')])
>>> cpd = TabularCPD('grade', 2, [[0.1, 0.9, 0.2, 0.7],
...                               [0.9, 0.1, 0.8, 0.3]],
...                  ['intel', 'diff'], [2, 2])
>>> student.add_cpds(cpd)
>>> student.remove_cpds(cpd)
to_junction_tree()[source]

Creates a junction tree (or clique tree) for a given bayesian model.

To convert a Bayesian model into a clique tree, it is first converted into a Markov model.

For a given Markov model (H), a junction tree (G) is a graph such that:

  1. each node in G corresponds to a maximal clique in H, and
  2. each sepset in G separates the variables strictly on one side of the edge from the other.

>>> from pgmpy.models import BayesianModel
>>> from pgmpy.factors.discrete import TabularCPD
>>> G = BayesianModel([('diff', 'grade'), ('intel', 'grade'),
...                    ('intel', 'SAT'), ('grade', 'letter')])
>>> diff_cpd = TabularCPD('diff', 2, [[0.2], [0.8]])
>>> intel_cpd = TabularCPD('intel', 3, [[0.5], [0.3], [0.2]])
>>> grade_cpd = TabularCPD('grade', 3,
...                        [[0.1,0.1,0.1,0.1,0.1,0.1],
...                         [0.1,0.1,0.1,0.1,0.1,0.1],
...                         [0.8,0.8,0.8,0.8,0.8,0.8]],
...                        evidence=['diff', 'intel'],
...                        evidence_card=[2, 3])
>>> sat_cpd = TabularCPD('SAT', 2,
...                      [[0.1, 0.2, 0.7],
...                       [0.9, 0.8, 0.3]],
...                      evidence=['intel'], evidence_card=[3])
>>> letter_cpd = TabularCPD('letter', 2,
...                         [[0.1, 0.4, 0.8],
...                          [0.9, 0.6, 0.2]],
...                         evidence=['grade'], evidence_card=[3])
>>> G.add_cpds(diff_cpd, intel_cpd, grade_cpd, sat_cpd, letter_cpd)
>>> jt = G.to_junction_tree()
to_markov_model()[source]

Converts the Bayesian model into a Markov model. The Markov model created is the moral graph of the Bayesian model.

>>> from pgmpy.models import BayesianModel
>>> G = BayesianModel([('diff', 'grade'), ('intel', 'grade'),
...                    ('intel', 'SAT'), ('grade', 'letter')])
>>> mm = G.to_markov_model()
>>> mm.nodes()
['diff', 'grade', 'intel', 'SAT', 'letter']
>>> mm.edges()
[('diff', 'intel'), ('diff', 'grade'), ('intel', 'grade'),
('intel', 'SAT'), ('grade', 'letter')]

Markov Model

class pgmpy.models.MarkovModel.MarkovModel(ebunch=None)[source]

Base class for Markov models.

A MarkovModel stores nodes and edges with potentials.

MarkovModel holds undirected edges.

data
: input graph
Data to initialize graph. If data=None (default) an empty graph is created. The data can be an edge list, or any NetworkX graph object.

Create an empty Markov Model with no nodes and no edges.

>>> from pgmpy.models import MarkovModel
>>> G = MarkovModel()

G can be grown in several ways.

Nodes:

Add one node at a time:

>>> G.add_node('a')

Add the nodes from any container (a list, set or tuple or the nodes from another graph).

>>> G.add_nodes_from(['a', 'b'])

Edges:

G can also be grown by adding edges.

Add one edge,

>>> G.add_edge('a', 'b')

a list of edges,

>>> G.add_edges_from([('a', 'b'), ('b', 'c')])

If some edges connect nodes not yet in the model, the nodes are added automatically. There are no errors when adding nodes or edges that already exist.

Shortcuts:

Many common graph features allow python syntax to speed reporting.

>>> 'a' in G     # check if node in graph
True
>>> len(G)  # number of nodes in graph
3

add_node('node1')
add_nodes_from(['node1', 'node2', ...])
add_edge('node1', 'node2')
add_edges_from([('node1', 'node2'), ('node3', 'node4')])

add_edge(u, v, **kwargs)[source]

Add an edge between u and v.

The nodes u and v will be automatically added if they are not already in the graph

u,v
: nodes
Nodes can be any hashable Python object.
>>> from pgmpy.models import MarkovModel
>>> G = MarkovModel()
>>> G.add_nodes_from(['Alice', 'Bob', 'Charles'])
>>> G.add_edge('Alice', 'Bob')
add_factors(*factors)[source]

Associate a factor with the graph. See the factors class for the order of potential values.

*factor: pgmpy.factors.factors object
A factor object on any subset of the variables of the model which is to be associated with the model.

None

>>> import numpy as np
>>> from pgmpy.models import MarkovModel
>>> from pgmpy.factors.discrete import DiscreteFactor
>>> student = MarkovModel([('Alice', 'Bob'), ('Bob', 'Charles'),
...                        ('Charles', 'Debbie'), ('Debbie', 'Alice')])
>>> factor = DiscreteFactor(['Alice', 'Bob'], cardinality=[3, 2],
...                 values=np.random.rand(6))
>>> student.add_factors(factor)
check_model()[source]

Check the model for various errors. This method checks for the following errors:

  • Checks if the cardinalities of all the variables are consistent across all the factors.
  • Checks if factors are defined for all the random variables.
check: boolean
True if all the checks are passed
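A minimal sketch of a passing check, assuming factors that cover every variable with consistent cardinalities:

>>> import numpy as np
>>> from pgmpy.models import MarkovModel
>>> from pgmpy.factors.discrete import DiscreteFactor
>>> student = MarkovModel([('Alice', 'Bob'), ('Bob', 'Charles')])
>>> factor1 = DiscreteFactor(['Alice', 'Bob'], [2, 2], np.random.rand(4))
>>> factor2 = DiscreteFactor(['Bob', 'Charles'], [2, 2], np.random.rand(4))
>>> student.add_factors(factor1, factor2)
>>> student.check_model()
True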
copy()[source]

Returns a copy of this Markov Model.

MarkovModel: Copy of this Markov model.

>>> import numpy as np
>>> from pgmpy.factors.discrete import DiscreteFactor
>>> from pgmpy.models import MarkovModel
>>> G = MarkovModel()
>>> G.add_nodes_from([('a', 'b'), ('b', 'c')])
>>> G.add_edge(('a', 'b'), ('b', 'c'))
>>> G_copy = G.copy()
>>> G_copy.edges()
[(('a', 'b'), ('b', 'c'))]
>>> G_copy.nodes()
[('a', 'b'), ('b', 'c')]
>>> factor = DiscreteFactor([('a', 'b')], cardinality=[3],
...                 values=np.random.rand(3))
>>> G.add_factors(factor)
>>> G.get_factors()
[<DiscreteFactor representing phi(('a', 'b'):3) at 0x...>]
>>> G_copy.get_factors()
[]
get_cardinality(check_cardinality=False)[source]

Returns a dictionary with variables as keys and their respective cardinality as values.

check_cardinality: boolean, optional
If check_cardinality=True, it checks whether cardinality information is available for all the variables. If not, it raises an error.
>>> import numpy as np
>>> from pgmpy.models import MarkovModel
>>> from pgmpy.factors.discrete import DiscreteFactor
>>> student = MarkovModel([('Alice', 'Bob'), ('Bob', 'Charles')])
>>> factor = DiscreteFactor(['Alice', 'Bob'], cardinality=[2, 2],
...                 values=np.random.rand(4))
>>> student.add_factors(factor)
>>> student.get_cardinality()
defaultdict(<class 'int'>, {'Bob': 2, 'Alice': 2})
get_factors()[source]

Returns the factors that have been added till now to the graph

>>> import numpy as np
>>> from pgmpy.models import MarkovModel
>>> from pgmpy.factors.discrete import DiscreteFactor
>>> student = MarkovModel([('Alice', 'Bob'), ('Bob', 'Charles')])
>>> factor = DiscreteFactor(['Alice', 'Bob'], cardinality=[2, 2],
...                 values=np.random.rand(4))
>>> student.add_factors(factor)
>>> student.get_factors()
get_local_independencies(latex=False)[source]

Returns all the local independencies present in the Markov model.

Local independencies are independence assertions of the form

X \perp W - \{X\} - MB(X) \mid MB(X)

where W is the set of all the random variables in the model and MB(X) is the Markov blanket of X.

latex: boolean
If latex=True then a latex string of the independence assertion would be created.
>>> from pgmpy.models import MarkovModel
>>> mm = MarkovModel()
>>> mm.add_nodes_from(['x1', 'x2', 'x3', 'x4', 'x5', 'x6', 'x7'])
>>> mm.add_edges_from([('x1', 'x3'), ('x1', 'x4'), ('x2', 'x4'),
...                    ('x2', 'x5'), ('x3', 'x6'), ('x4', 'x6'),
...                    ('x4', 'x7'), ('x5', 'x7')])
>>> mm.get_local_independencies()
get_partition_function()[source]

Returns the partition function for a given undirected graph.

A partition function is defined as

\sum_{X}(\prod_{i=1}^{m} \phi_i)

where m is the number of factors present in the graph and X are all the random variables present.

>>> import numpy as np
>>> from pgmpy.models import MarkovModel
>>> from pgmpy.factors.discrete import DiscreteFactor
>>> G = MarkovModel()
>>> G.add_nodes_from(['x1', 'x2', 'x3', 'x4', 'x5', 'x6', 'x7'])
>>> G.add_edges_from([('x1', 'x3'), ('x1', 'x4'), ('x2', 'x4'),
...                   ('x2', 'x5'), ('x3', 'x6'), ('x4', 'x6'),
...                   ('x4', 'x7'), ('x5', 'x7')])
>>> phi = [DiscreteFactor(edge, [2, 2], np.random.rand(4)) for edge in G.edges()]
>>> G.add_factors(*phi)
>>> G.get_partition_function()
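Since the partition function is just the factor product summed over all joint assignments, a single-factor sketch makes the value easy to verify by hand:

>>> from pgmpy.models import MarkovModel
>>> from pgmpy.factors.discrete import DiscreteFactor
>>> G = MarkovModel([('a', 'b')])
>>> phi = DiscreteFactor(['a', 'b'], [2, 2], [1.0, 2.0, 3.0, 4.0])
>>> G.add_factors(phi)
>>> G.get_partition_function()  # with a single factor, Z = 1 + 2 + 3 + 4
10.0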
markov_blanket(node)[source]

Returns a markov blanket for a random variable.

In a Markov model, the Markov blanket of a node is its set of neighboring nodes.

>>> from pgmpy.models import MarkovModel
>>> mm = MarkovModel()
>>> mm.add_nodes_from(['x1', 'x2', 'x3', 'x4', 'x5', 'x6', 'x7'])
>>> mm.add_edges_from([('x1', 'x3'), ('x1', 'x4'), ('x2', 'x4'),
...                    ('x2', 'x5'), ('x3', 'x6'), ('x4', 'x6'),
...                    ('x4', 'x7'), ('x5', 'x7')])
>>> mm.markov_blanket('x1')
remove_factors(*factors)[source]

Removes the given factors from the added factors.

>>> import numpy as np
>>> from pgmpy.models import MarkovModel
>>> from pgmpy.factors.discrete import DiscreteFactor
>>> student = MarkovModel([('Alice', 'Bob'), ('Bob', 'Charles')])
>>> factor = DiscreteFactor(['Alice', 'Bob'], cardinality=[2, 2],
...                 values=np.random.rand(4))
>>> student.add_factors(factor)
>>> student.remove_factors(factor)
to_bayesian_model()[source]

Creates a Bayesian model which is a minimum I-map for this Markov model.

The ordering of parents may not remain constant; it depends on the ordering of the variables in the junction tree, which is itself not constant.

>>> import numpy as np
>>> from pgmpy.models import MarkovModel
>>> from pgmpy.factors.discrete import DiscreteFactor
>>> mm = MarkovModel()
>>> mm.add_nodes_from(['x1', 'x2', 'x3', 'x4', 'x5', 'x6', 'x7'])
>>> mm.add_edges_from([('x1', 'x3'), ('x1', 'x4'), ('x2', 'x4'),
...                    ('x2', 'x5'), ('x3', 'x6'), ('x4', 'x6'),
...                    ('x4', 'x7'), ('x5', 'x7')])
>>> phi = [DiscreteFactor(edge, [2, 2], np.random.rand(4)) for edge in mm.edges()]
>>> mm.add_factors(*phi)
>>> bm = mm.to_bayesian_model()
to_factor_graph()[source]

Converts the Markov model into a factor graph.

A factor graph contains two types of nodes. One type corresponds to random variables whereas the second type corresponds to factors over these variables. The graph only contains edges between variables and factor nodes. Each factor node is associated with one factor whose scope is the set of variables that are its neighbors.

>>> import numpy as np
>>> from pgmpy.models import MarkovModel
>>> from pgmpy.factors.discrete import DiscreteFactor
>>> student = MarkovModel([('Alice', 'Bob'), ('Bob', 'Charles')])
>>> factor1 = DiscreteFactor(['Alice', 'Bob'], [3, 2], np.random.rand(6))
>>> factor2 = DiscreteFactor(['Bob', 'Charles'], [2, 2], np.random.rand(4))
>>> student.add_factors(factor1, factor2)
>>> factor_graph = student.to_factor_graph()
to_junction_tree()[source]

Creates a junction tree (or clique tree) for a given Markov model.

For a given Markov model (H), a junction tree (G) is a graph such that:

  1. each node in G corresponds to a maximal clique in H, and
  2. each sepset in G separates the variables strictly on one side of the edge from the other.

>>> import numpy as np
>>> from pgmpy.models import MarkovModel
>>> from pgmpy.factors.discrete import DiscreteFactor
>>> mm = MarkovModel()
>>> mm.add_nodes_from(['x1', 'x2', 'x3', 'x4', 'x5', 'x6', 'x7'])
>>> mm.add_edges_from([('x1', 'x3'), ('x1', 'x4'), ('x2', 'x4'),
...                    ('x2', 'x5'), ('x3', 'x6'), ('x4', 'x6'),
...                    ('x4', 'x7'), ('x5', 'x7')])
>>> phi = [DiscreteFactor(edge, [2, 2], np.random.rand(4)) for edge in mm.edges()]
>>> mm.add_factors(*phi)
>>> junction_tree = mm.to_junction_tree()
triangulate(heuristic='H6', order=None, inplace=False)[source]

Triangulate the graph.

If an order of deletion is given, the heuristic algorithm will not be used.

heuristic: H1 | H2 | H3 | H4 | H5 | H6

The heuristic algorithm to use to decide the deletion order of the variables when computing the triangulated graph. Let X be the set of variables and let X(i) denote the i-th variable.

  • S(i) - The size of the clique created by deleting the variable.
  • E(i) - Cardinality of variable X(i).
  • M(i) - Maximum size of cliques given by X(i) and its adjacent nodes.
  • C(i) - Sum of size of cliques given by X(i) and its adjacent nodes.

The heuristic algorithm decides the deletion order in this way:

  • H1 - Delete the variable with minimal S(i).
  • H2 - Delete the variable with minimal S(i)/E(i).
  • H3 - Delete the variable with minimal S(i) - M(i).
  • H4 - Delete the variable with minimal S(i) - C(i).
  • H5 - Delete the variable with minimal S(i)/M(i).
  • H6 - Delete the variable with minimal S(i)/C(i).
order: list, tuple (array-like)
The order of deletion of the variables to compute the triangulated graph. If order is given, the heuristic algorithm will not be used.
inplace: True | False
If inplace is True, the edges are added to the object from which the method was called; otherwise a new object is returned.

http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.56.3607

>>> import numpy as np
>>> from pgmpy.models import MarkovModel
>>> from pgmpy.factors.discrete import DiscreteFactor
>>> G = MarkovModel()
>>> G.add_nodes_from(['x1', 'x2', 'x3', 'x4', 'x5', 'x6', 'x7'])
>>> G.add_edges_from([('x1', 'x3'), ('x1', 'x4'), ('x2', 'x4'),
...                   ('x2', 'x5'), ('x3', 'x6'), ('x4', 'x6'),
...                   ('x4', 'x7'), ('x5', 'x7')])
>>> phi = [DiscreteFactor(edge, [2, 2], np.random.rand(4)) for edge in G.edges()]
>>> G.add_factors(*phi)
>>> G_chordal = G.triangulate()
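As noted above, passing an explicit elimination order bypasses the heuristic entirely; a sketch reusing the graph G from the previous example:

>>> G_chordal = G.triangulate(order=['x1', 'x2', 'x3', 'x4', 'x5', 'x6', 'x7'])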

Dynamic Bayesian Network

Factor Graph

class pgmpy.models.FactorGraph.FactorGraph(ebunch=None)[source]

Class for representing a factor graph.

A factor graph is a bipartite graph representing the factorization of a function. It allows efficient computation of marginal distributions through the sum-product algorithm.

A factor graph contains two types of nodes. One type corresponds to random variables whereas the second type corresponds to factors over these variables. The graph only contains edges between variables and factor nodes. Each factor node is associated with one factor whose scope is the set of variables that are its neighbors.

data: input graph
Data to initialize graph. If data=None (default) an empty graph is created. The data is an edge list.

Create an empty FactorGraph with no nodes and no edges

>>> import numpy as np
>>> from pgmpy.models import FactorGraph
>>> from pgmpy.factors.discrete import DiscreteFactor
>>> G = FactorGraph()

G can be grown by adding variable nodes as well as factor nodes.

Nodes:

Add a node at a time or a list of nodes.

>>> G.add_node('a')
>>> G.add_nodes_from(['a', 'b'])
>>> phi1 = DiscreteFactor(['a', 'b'], [2, 2], np.random.rand(4))
>>> G.add_factors(phi1)
>>> G.add_nodes_from([phi1])

Edges:

G can also be grown by adding edges.

>>> G.add_edge('a', phi1)

or a list of edges

>>> G.add_edges_from([('a', phi1), ('b', phi1)])
add_edge(u, v, **kwargs)[source]

Add an edge between a variable node and a factor node.

u, v: nodes
Nodes can be any hashable Python object.
>>> import numpy as np
>>> from pgmpy.models import FactorGraph
>>> from pgmpy.factors.discrete import DiscreteFactor
>>> G = FactorGraph()
>>> G.add_nodes_from(['a', 'b', 'c'])
>>> phi1 = DiscreteFactor(['a', 'b'], [2, 2], np.random.rand(4))
>>> G.add_nodes_from([phi1])
>>> G.add_edge('a', phi1)
add_factors(*factors)[source]

Associate a factor with the graph. See the factors class for the order of potential values.

*factor: pgmpy.factors.DiscreteFactor object
A factor object on any subset of the variables of the model which is to be associated with the model.
>>> import numpy as np
>>> from pgmpy.models import FactorGraph
>>> from pgmpy.factors.discrete import DiscreteFactor
>>> G = FactorGraph()
>>> G.add_nodes_from(['a', 'b', 'c'])
>>> phi1 = DiscreteFactor(['a', 'b'], [2, 2], np.random.rand(4))
>>> phi2 = DiscreteFactor(['b', 'c'], [2, 2], np.random.rand(4))
>>> G.add_factors(phi1, phi2)
>>> G.add_edges_from([('a', phi1), ('b', phi1),
...                   ('b', phi2), ('c', phi2)])
check_model()[source]

Check the model for various errors. This method checks for the following errors. At the same time, it also updates the cardinalities of all the random variables.

  • Check whether the bipartite property of the factor graph is maintained.
  • Check whether factors are associated with all the random variables.
  • Check whether a factor is defined for each factor node.
  • Check whether the cardinality of each random variable remains the same across all the factors.
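A minimal sketch of a passing check, with the factor added both as a factor and as a factor node:

>>> import numpy as np
>>> from pgmpy.models import FactorGraph
>>> from pgmpy.factors.discrete import DiscreteFactor
>>> G = FactorGraph()
>>> G.add_nodes_from(['a', 'b'])
>>> phi1 = DiscreteFactor(['a', 'b'], [2, 2], np.random.rand(4))
>>> G.add_factors(phi1)
>>> G.add_nodes_from([phi1])
>>> G.add_edges_from([('a', phi1), ('b', phi1)])
>>> G.check_model()
True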

get_cardinality(check_cardinality=False)[source]

Returns a dictionary with variables as keys and their respective cardinality as values.

check_cardinality: boolean, optional
If check_cardinality=True, it checks whether cardinality information is available for all the variables. If not, it raises an error.
>>> import numpy as np
>>> from pgmpy.models import FactorGraph
>>> from pgmpy.factors.discrete import DiscreteFactor
>>> G = FactorGraph()
>>> G.add_nodes_from(['a', 'b', 'c'])
>>> phi1 = DiscreteFactor(['a', 'b'], [2, 2], np.random.rand(4))
>>> phi2 = DiscreteFactor(['b', 'c'], [2, 2], np.random.rand(4))
>>> G.add_nodes_from([phi1, phi2])
>>> G.add_edges_from([('a', phi1), ('b', phi1),
...                   ('b', phi2), ('c', phi2)])
>>> G.add_factors(phi1, phi2)
>>> G.get_cardinality()
defaultdict(<class 'int'>, {'c': 2, 'b': 2, 'a': 2})
get_factor_nodes()[source]

Returns the factor nodes present in the graph.

Before calling this method make sure that all the factors are added properly.

>>> import numpy as np
>>> from pgmpy.models import FactorGraph
>>> from pgmpy.factors.discrete import DiscreteFactor
>>> G = FactorGraph()
>>> G.add_nodes_from(['a', 'b', 'c'])
>>> phi1 = DiscreteFactor(['a', 'b'], [2, 2], np.random.rand(4))
>>> phi2 = DiscreteFactor(['b', 'c'], [2, 2], np.random.rand(4))
>>> G.add_nodes_from([phi1, phi2])
>>> G.add_factors(phi1, phi2)
>>> G.add_edges_from([('a', phi1), ('b', phi1),
...                   ('b', phi2), ('c', phi2)])
>>> G.get_factor_nodes()
[<DiscreteFactor representing phi(b:2, c:2) at 0x4b8c7f0>,
 <DiscreteFactor representing phi(a:2, b:2) at 0x4b8c5b0>]
get_factors(node=None)[source]

Returns the factors that have been added till now to the graph.

If node is not None, it would return the factor corresponding to the given node.

>>> import numpy as np
>>> from pgmpy.models import FactorGraph
>>> from pgmpy.factors.discrete import DiscreteFactor
>>> G = FactorGraph()
>>> G.add_nodes_from(['a', 'b', 'c'])
>>> phi1 = DiscreteFactor(['a', 'b'], [2, 2], np.random.rand(4))
>>> phi2 = DiscreteFactor(['b', 'c'], [2, 2], np.random.rand(4))
>>> G.add_factors(phi1, phi2)
>>> G.add_nodes_from([phi1, phi2])
>>> G.add_edges_from([('a', phi1), ('b', phi1),
...                   ('b', phi2), ('c', phi2)])
>>> G.get_factors()
>>> G.get_factors(node=phi1)
get_partition_function()[source]

Returns the partition function for a given undirected graph.

A partition function is defined as

\sum_{X}(\prod_{i=1}^{m} \phi_i)

where m is the number of factors present in the graph and X are all the random variables present.

>>> import numpy as np
>>> from pgmpy.models import FactorGraph
>>> from pgmpy.factors.discrete import DiscreteFactor
>>> G = FactorGraph()
>>> G.add_nodes_from(['a', 'b', 'c'])
>>> phi1 = DiscreteFactor(['a', 'b'], [2, 2], np.random.rand(4))
>>> phi2 = DiscreteFactor(['b', 'c'], [2, 2], np.random.rand(4))
>>> G.add_factors(phi1, phi2)
>>> G.add_nodes_from([phi1, phi2])
>>> G.add_edges_from([('a', phi1), ('b', phi1),
...                   ('b', phi2), ('c', phi2)])
>>> G.get_factors()
>>> G.get_partition_function()
get_variable_nodes()[source]

Returns variable nodes present in the graph.

Before calling this method make sure that all the factors are added properly.

>>> import numpy as np
>>> from pgmpy.models import FactorGraph
>>> from pgmpy.factors.discrete import DiscreteFactor
>>> G = FactorGraph()
>>> G.add_nodes_from(['a', 'b', 'c'])
>>> phi1 = DiscreteFactor(['a', 'b'], [2, 2], np.random.rand(4))
>>> phi2 = DiscreteFactor(['b', 'c'], [2, 2], np.random.rand(4))
>>> G.add_nodes_from([phi1, phi2])
>>> G.add_factors(phi1, phi2)
>>> G.add_edges_from([('a', phi1), ('b', phi1),
...                   ('b', phi2), ('c', phi2)])
>>> G.get_variable_nodes()
['a', 'b', 'c']
remove_factors(*factors)[source]

Removes the given factors from the added factors.

>>> import numpy as np
>>> from pgmpy.models import FactorGraph
>>> from pgmpy.factors.discrete import DiscreteFactor
>>> G = FactorGraph()
>>> G.add_nodes_from(['a', 'b', 'c'])
>>> phi1 = DiscreteFactor(['a', 'b'], [2, 2], np.random.rand(4))
>>> G.add_factors(phi1)
>>> G.remove_factors(phi1)
to_junction_tree()[source]

Create a junction tree (or clique tree) for a given factor graph.

For a given factor graph (H), a junction tree (G) is a graph such that:

  1. each node in G corresponds to a maximal clique in H, and
  2. each sepset in G separates the variables strictly on one side of the edge from the other.

>>> import numpy as np
>>> from pgmpy.models import FactorGraph
>>> from pgmpy.factors.discrete import DiscreteFactor
>>> G = FactorGraph()
>>> G.add_nodes_from(['a', 'b', 'c'])
>>> phi1 = DiscreteFactor(['a', 'b'], [2, 2], np.random.rand(4))
>>> phi2 = DiscreteFactor(['b', 'c'], [2, 2], np.random.rand(4))
>>> G.add_factors(phi1, phi2)
>>> G.add_nodes_from([phi1, phi2])
>>> G.add_edges_from([('a', phi1), ('b', phi1),
...                   ('b', phi2), ('c', phi2)])
>>> junction_tree = G.to_junction_tree()
to_markov_model()[source]

Converts the factor graph into a Markov model.

A Markov model contains nodes as random variables, and an edge between two nodes implies an interaction between them.

>>> import numpy as np
>>> from pgmpy.models import FactorGraph
>>> from pgmpy.factors.discrete import DiscreteFactor
>>> G = FactorGraph()
>>> G.add_nodes_from(['a', 'b', 'c'])
>>> phi1 = DiscreteFactor(['a', 'b'], [2, 2], np.random.rand(4))
>>> phi2 = DiscreteFactor(['b', 'c'], [2, 2], np.random.rand(4))
>>> G.add_factors(phi1, phi2)
>>> G.add_nodes_from([phi1, phi2])
>>> G.add_edges_from([('a', phi1), ('b', phi1),
...                   ('b', phi2), ('c', phi2)])
>>> mm = G.to_markov_model()

Cluster Graph

class pgmpy.models.ClusterGraph.ClusterGraph(ebunch=None)[source]

Base class for representing a Cluster Graph.

A cluster graph is an undirected graph in which each node is associated with a subset of variables. The graph contains undirected edges that connect clusters whose scopes have a non-empty intersection.

Formally, a cluster graph \mathcal{U} for a set of factors \Phi over \mathcal{X} is an undirected graph, each of whose nodes i is associated with a subset C_i \subseteq \mathcal{X}. A cluster graph must be family-preserving: each factor \phi \in \Phi must be associated with a cluster C_i, denoted \alpha(\phi), such that Scope[\phi] \subseteq C_i. Each edge between a pair of clusters C_i and C_j is associated with a sepset S_{i,j} \subseteq C_i \cap C_j.

data: input graph
Data to initialize graph. If data=None (default) an empty graph is created. The data is an edge list

Create an empty ClusterGraph with no nodes and no edges

>>> from pgmpy.models import ClusterGraph
>>> G = ClusterGraph()

G can be grown by adding clique nodes.

Nodes:

Add a tuple (or list or set) of nodes as single clique node.

>>> G.add_node(('a', 'b', 'c'))
>>> G.add_nodes_from([('a', 'b'), ('a', 'b', 'c')])

Edges:

G can also be grown by adding edges.

>>> G.add_edge(('a', 'b', 'c'), ('a', 'b'))

or a list of edges

>>> G.add_edges_from([(('a', 'b', 'c'), ('a', 'b')),
...                   (('a', 'b', 'c'), ('a', 'c'))])
add_edge(u, v, **kwargs)[source]

Add an edge between two clique nodes.

u, v: nodes
Nodes can be any list or set or tuple of nodes forming a clique.
>>> from pgmpy.models import ClusterGraph
>>> G = ClusterGraph()
>>> G.add_nodes_from([('a', 'b', 'c'), ('a', 'b'), ('a', 'c')])
>>> G.add_edges_from([(('a', 'b', 'c'), ('a', 'b')),
...                   (('a', 'b', 'c'), ('a', 'c'))])
add_factors(*factors)[source]

Associate a factor with the graph. See the factors class for the order of potential values.

*factor: pgmpy.factors.factors object
A factor object on any subset of the variables of the model which is to be associated with the model.

None

>>> import numpy as np
>>> from pgmpy.models import ClusterGraph
>>> from pgmpy.factors.discrete import DiscreteFactor
>>> student = ClusterGraph()
>>> student.add_node(('Alice', 'Bob'))
>>> factor = DiscreteFactor(['Alice', 'Bob'], cardinality=[3, 2],
...                 values=np.random.rand(6))
>>> student.add_factors(factor)
add_node(node, **kwargs)[source]

Add a single node to the cluster graph.

node: node
A node should be a collection of nodes forming a clique. It can be a list, set or tuple of nodes
>>> from pgmpy.models import ClusterGraph
>>> G = ClusterGraph()
>>> G.add_node(('a', 'b', 'c'))
add_nodes_from(nodes, **kwargs)[source]

Add multiple nodes to the cluster graph.

nodes: iterable container
A container of nodes (list, dict, set, etc.).
>>> from pgmpy.models import ClusterGraph
>>> G = ClusterGraph()
>>> G.add_nodes_from([('a', 'b'), ('a', 'b', 'c')])
check_model()[source]

Check the model for various errors. This method checks for the following errors.

  • Checks if factors are defined for all the cliques.
  • The check for the running intersection property is not done explicitly here, as it is done in the add_edges method.
  • Checks if the cardinality of each random variable remains the same across all the factors.

check: boolean
True if all the checks are passed
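A minimal sketch of a passing check, with a factor defined for the single clique node:

>>> import numpy as np
>>> from pgmpy.models import ClusterGraph
>>> from pgmpy.factors.discrete import DiscreteFactor
>>> student = ClusterGraph()
>>> student.add_node(('Alice', 'Bob'))
>>> factor = DiscreteFactor(['Alice', 'Bob'], cardinality=[2, 2],
...                 values=np.random.rand(4))
>>> student.add_factors(factor)
>>> student.check_model()
True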
copy()[source]

Returns a copy of ClusterGraph.

ClusterGraph: copy of ClusterGraph

>>> import numpy as np
>>> from pgmpy.models import ClusterGraph
>>> from pgmpy.factors.discrete import DiscreteFactor
>>> G = ClusterGraph()
>>> G.add_nodes_from([('a', 'b'), ('b', 'c')])
>>> G.add_edge(('a', 'b'), ('b', 'c'))
>>> phi1 = DiscreteFactor(['a', 'b'], [2, 2], np.random.rand(4))
>>> phi2 = DiscreteFactor(['b', 'c'], [2, 2], np.random.rand(4))
>>> G.add_factors(phi1, phi2)
>>> graph_copy = G.copy()
>>> graph_copy.factors
[<DiscreteFactor representing phi(a:2, b:2) at 0xb71b19cc>, <DiscreteFactor representing phi(b:2, c:2) at 0xb4eaf3ac>]
>>> graph_copy.edges()
[(('a', 'b'), ('b', 'c'))]
>>> graph_copy.nodes()
[('a', 'b'), ('b', 'c')]
get_cardinality(check_cardinality=False)[source]

Returns a dictionary with variables as keys and their respective cardinality as values.

check_cardinality: boolean, optional
If check_cardinality=True, it checks whether cardinality information is available for all the variables. If not, it raises an error.
>>> import numpy as np
>>> from pgmpy.models import ClusterGraph
>>> from pgmpy.factors.discrete import DiscreteFactor
>>> student = ClusterGraph()
>>> factor = DiscreteFactor(['Alice', 'Bob'], cardinality=[2, 2],
...                 values=np.random.rand(4))
>>> student.add_node(('Alice', 'Bob'))
>>> student.add_factors(factor)
>>> student.get_cardinality()
defaultdict(<class 'int'>, {'Bob': 2, 'Alice': 2})
get_factors(node=None)[source]

Return the factors that have been added till now to the graph.

If node is not None, it would return the factor corresponding to the given node.

>>> import numpy as np
>>> from pgmpy.models import ClusterGraph
>>> from pgmpy.factors.discrete import DiscreteFactor
>>> G = ClusterGraph()
>>> G.add_nodes_from([('a', 'b', 'c'), ('a', 'b'), ('a', 'c')])
>>> G.add_edges_from([(('a', 'b', 'c'), ('a', 'b')),
...                   (('a', 'b', 'c'), ('a', 'c'))])
>>> phi1 = DiscreteFactor(['a', 'b', 'c'], [2, 2, 2], np.random.rand(8))
>>> phi2 = DiscreteFactor(['a', 'b'], [2, 2], np.random.rand(4))
>>> phi3 = DiscreteFactor(['a', 'c'], [2, 2], np.random.rand(4))
>>> G.add_factors(phi1, phi2, phi3)
>>> G.get_factors()
>>> G.get_factors(node=('a', 'b', 'c'))
get_partition_function()[source]

Returns the partition function for a given undirected graph.

A partition function is defined as

\sum_{X}(\prod_{i=1}^{m} \phi_i)

where m is the number of factors present in the graph and X are all the random variables present.

>>> import numpy as np
>>> from pgmpy.models import ClusterGraph
>>> from pgmpy.factors.discrete import DiscreteFactor
>>> G = ClusterGraph()
>>> G.add_nodes_from([('a', 'b', 'c'), ('a', 'b'), ('a', 'c')])
>>> G.add_edges_from([(('a', 'b', 'c'), ('a', 'b')),
...                   (('a', 'b', 'c'), ('a', 'c'))])
>>> phi1 = DiscreteFactor(['a', 'b', 'c'], [2, 2, 2], np.random.rand(8))
>>> phi2 = DiscreteFactor(['a', 'b'], [2, 2], np.random.rand(4))
>>> phi3 = DiscreteFactor(['a', 'c'], [2, 2], np.random.rand(4))
>>> G.add_factors(phi1, phi2, phi3)
>>> G.get_partition_function()
remove_factors(*factors)[source]

Removes the given factors from the added factors.

>>> import numpy as np
>>> from pgmpy.models import ClusterGraph
>>> from pgmpy.factors.discrete import DiscreteFactor
>>> student = ClusterGraph()
>>> student.add_node(('Alice', 'Bob'))
>>> factor = DiscreteFactor(['Alice', 'Bob'], cardinality=[2, 2],
...                 values=np.random.rand(4))
>>> student.add_factors(factor)
>>> student.remove_factors(factor)

Junction Tree

class pgmpy.models.JunctionTree.JunctionTree(ebunch=None)[source]

Class for representing a Junction Tree.

A junction tree is an undirected graph in which each node represents a clique (a list, tuple or set of nodes) and edges represent sepsets between two cliques. Each sepset in G separates the variables strictly on one side of the edge from the other.

data: input graph
Data to initialize graph. If data=None (default) an empty graph is created. The data is an edge list.

Create an empty JunctionTree with no nodes and no edges

>>> from pgmpy.models import JunctionTree
>>> G = JunctionTree()

G can be grown by adding clique nodes.

Nodes:

Add a tuple (or list or set) of nodes as single clique node.

>>> G.add_node(('a', 'b', 'c'))
>>> G.add_nodes_from([('a', 'b'), ('a', 'b', 'c')])

Edges:

G can also be grown by adding edges.

>>> G.add_edge(('a', 'b', 'c'), ('a', 'b'))

or a list of edges

>>> G.add_edges_from([(('a', 'b', 'c'), ('a', 'b')),
...                   (('a', 'b', 'c'), ('a', 'c'))])
add_edge(u, v, **kwargs)[source]

Add an edge between two clique nodes.

u, v: nodes
Nodes can be any list or set or tuple of nodes forming a clique.
>>> from pgmpy.models import JunctionTree
>>> G = JunctionTree()
>>> G.add_nodes_from([('a', 'b', 'c'), ('a', 'b'), ('a', 'c')])
>>> G.add_edges_from([(('a', 'b', 'c'), ('a', 'b')),
...                   (('a', 'b', 'c'), ('a', 'c'))])
check_model()[source]

Check the model for various errors. This method checks for the following errors. At the same time, it also updates the cardinalities of all the random variables.

  • Checks if clique potentials are defined for all the cliques.
  • The check for the running intersection property is not done explicitly here, as it is done in the add_edges method.

check: boolean
True if all the checks are passed
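A minimal sketch of a passing check, assuming a potential for each of the three cliques (mirroring the copy() example below):

>>> import numpy as np
>>> from pgmpy.factors.discrete import DiscreteFactor
>>> from pgmpy.models import JunctionTree
>>> G = JunctionTree()
>>> G.add_edges_from([(('a', 'b', 'c'), ('a', 'b')), (('a', 'b', 'c'), ('a', 'c'))])
>>> phi1 = DiscreteFactor(['a', 'b', 'c'], [2, 2, 2], np.random.rand(8))
>>> phi2 = DiscreteFactor(['a', 'b'], [2, 2], np.random.rand(4))
>>> phi3 = DiscreteFactor(['a', 'c'], [2, 2], np.random.rand(4))
>>> G.add_factors(phi1, phi2, phi3)
>>> G.check_model()
True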
copy()[source]

Returns a copy of JunctionTree.

JunctionTree : copy of JunctionTree

>>> import numpy as np
>>> from pgmpy.factors.discrete import DiscreteFactor
>>> from pgmpy.models import JunctionTree
>>> G = JunctionTree()
>>> G.add_edges_from([(('a', 'b', 'c'), ('a', 'b')), (('a', 'b', 'c'), ('a', 'c'))])
>>> phi1 = DiscreteFactor(['a', 'b'], [1, 2], np.random.rand(2))
>>> phi2 = DiscreteFactor(['a', 'c'], [1, 2], np.random.rand(2))
>>> G.add_factors(phi1,phi2)
>>> modelCopy = G.copy()
>>> modelCopy.edges()
[(('a', 'b'), ('a', 'b', 'c')), (('a', 'c'), ('a', 'b', 'c'))]
>>> G.factors
[<DiscreteFactor representing phi(a:1, b:2) at 0xb720ee4c>, <DiscreteFactor representing phi(a:1, c:2) at 0xb4e1e06c>]
>>> modelCopy.factors
[<DiscreteFactor representing phi(a:1, b:2) at 0xb4bd11ec>, <DiscreteFactor representing phi(a:1, c:2) at 0xb4bd138c>]

Markov Chain

class pgmpy.models.MarkovChain.MarkovChain(variables=None, card=None, start_state=None)[source]

Class to represent a Markov Chain with multiple kernels for factored state space, along with methods to simulate a run.

set_start_state(state)
add_variable(variable, cardinality)
add_variables_from(vars_list, cards_list)
add_transition_model(variable, transition_dict)
sample(start_state, size)

Create an empty Markov Chain:

>>> from pgmpy.models import MarkovChain as MC
>>> model = MC()

And then add variables to it:

>>> model.add_variables_from(['intel', 'diff'], [2, 3])

Or directly create a Markov Chain from a list of variables and their cardinalities:

>>> model = MC(['intel', 'diff'], [2, 3])

Add transition models:

>>> intel_tm = {0: {0: 0.25, 1: 0.75}, 1: {0: 0.5, 1: 0.5}}
>>> model.add_transition_model('intel', intel_tm)
>>> diff_tm = {0: {0: 0.1, 1: 0.5, 2: 0.4}, 1: {0: 0.2, 1: 0.2, 2: 0.6}, 2: {0: 0.7, 1: 0.15, 2: 0.15}}
>>> model.add_transition_model('diff', diff_tm)

Set a start state:

>>> from pgmpy.factors.discrete import State
>>> model.set_start_state([State('intel', 0), State('diff', 2)])

Sample from it:

>>> model.sample(size=5)
   intel  diff
0      0     2
1      1     0
2      0     1
3      1     0
4      0     2

add_transition_model(variable, transition_model)[source]

Adds a transition model for a particular variable.

variable: any hashable python object
Must be an existing variable of the model.
transition_model: dict
A dict representing valid transition probabilities defined for every possible state of the variable.
>>> from pgmpy.models import MarkovChain as MC
>>> model = MC()
>>> model.add_variable('grade', 3)
>>> grade_tm = {0: {0: 0.1, 1: 0.5, 2: 0.4}, 1: {0: 0.2, 1: 0.2, 2: 0.6 }, 2: {0: 0.7, 1: 0.15, 2: 0.15}}
>>> model.add_transition_model('grade', grade_tm)
add_variable(variable, card=0)[source]

Add a variable to the model.

variable: any hashable python object

card: int
Representing the cardinality of the variable to be added.
>>> from pgmpy.models import MarkovChain as MC
>>> model = MC()
>>> model.add_variable('x', 4)
add_variables_from(variables, cards)[source]

Add several variables to the model at once.

variables: array-like iterable object
List of variables to be added.
cards: array-like iterable object
List of cardinalities of the variables to be added.
>>> from pgmpy.models import MarkovChain as MC
>>> model = MC()
>>> model.add_variables_from(['x', 'y'], [3, 4])
copy()[source]

Returns a copy of Markov Chain Model.

MarkovChain : Copy of MarkovChain.

>>> from pgmpy.models import MarkovChain
>>> from pgmpy.factors.discrete import State
>>> model = MarkovChain()
>>> model.add_variables_from(['intel', 'diff'], [3, 2])
>>> intel_tm = {0: {0: 0.2, 1: 0.4, 2:0.4}, 1: {0: 0, 1: 0.5, 2: 0.5}, 2: {0: 0.3, 1: 0.3, 2: 0.4}}
>>> model.add_transition_model('intel', intel_tm)
>>> diff_tm = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.25, 1:0.75}}
>>> model.add_transition_model('diff', diff_tm)
>>> model.set_start_state([State('intel', 0), State('diff', 2)])
>>> model_copy = model.copy()
>>> model_copy.transition_models
{'intel': {0: {0: 0.2, 1: 0.4, 2: 0.4}, 1: {0: 0, 1: 0.5, 2: 0.5}, 2: {0: 0.3, 1: 0.3, 2: 0.4}},
 'diff': {0: {0: 0.5, 1: 0.5}, 1: {0: 0.25, 1: 0.75}}}
generate_sample(start_state=None, size=1)[source]

Generator version of self.sample

List of State namedtuples, representing the assignment to all variables of the model.

>>> from pgmpy.models.MarkovChain import MarkovChain
>>> from pgmpy.factors.discrete import State
>>> model = MarkovChain()
>>> model.add_variables_from(['intel', 'diff'], [3, 2])
>>> intel_tm = {0: {0: 0.2, 1: 0.4, 2:0.4}, 1: {0: 0, 1: 0.5, 2: 0.5}, 2: {0: 0.3, 1: 0.3, 2: 0.4}}
>>> model.add_transition_model('intel', intel_tm)
>>> diff_tm = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.25, 1:0.75}}
>>> model.add_transition_model('diff', diff_tm)
>>> gen = model.generate_sample([State('intel', 0), State('diff', 0)], 2)
>>> [sample for sample in gen]
[[State(var='intel', state=2), State(var='diff', state=1)],
 [State(var='intel', state=2), State(var='diff', state=0)]]
prob_from_sample(state, sample=None, window_size=None)[source]

Given an instantiation (partial or complete) of the variables of the model, compute the probability of observing it over multiple windows in a given sample.

If ‘sample’ is not passed as an argument, generate the statistic by sampling from the Markov Chain, starting with a random initial state.

>>> from pgmpy.models.MarkovChain import MarkovChain as MC
>>> from pgmpy.factors.discrete import State
>>> model = MC(['intel', 'diff'], [3, 2])
>>> intel_tm = {0: {0: 0.2, 1: 0.4, 2:0.4}, 1: {0: 0, 1: 0.5, 2: 0.5}, 2: {2: 1}}
>>> model.add_transition_model('intel', intel_tm)
>>> diff_tm = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.25, 1:0.75}}
>>> model.add_transition_model('diff', diff_tm)
>>> model.prob_from_sample([State('diff', 0)])
array([ 0.27,  0.4 ,  0.18,  0.23, ..., 0.29])
random_state()[source]

Generates a random state of the Markov Chain.

List of namedtuples, representing a random assignment to all variables of the model.

>>> from pgmpy.models import MarkovChain as MC
>>> model = MC(['intel', 'diff'], [2, 3])
>>> model.random_state()
[State('diff', 2), State('intel', 1)]
sample(start_state=None, size=1)[source]

Sample from the Markov Chain.

start_state: dict or array-like iterable
Representing the starting states of the variables. If None is passed, a random start_state is chosen.
size: int
Number of samples to be generated.

pandas.DataFrame

>>> from pgmpy.models import MarkovChain as MC
>>> from pgmpy.factors.discrete import State
>>> model = MC(['intel', 'diff'], [2, 3])
>>> model.set_start_state([State('intel', 0), State('diff', 2)])
>>> intel_tm = {0: {0: 0.25, 1: 0.75}, 1: {0: 0.5, 1: 0.5}}
>>> model.add_transition_model('intel', intel_tm)
>>> diff_tm = {0: {0: 0.1, 1: 0.5, 2: 0.4}, 1: {0: 0.2, 1: 0.2, 2: 0.6 }, 2: {0: 0.7, 1: 0.15, 2: 0.15}}
>>> model.add_transition_model('diff', diff_tm)
>>> model.sample(size=5)
   intel  diff
0      0     2
1      1     0
2      0     1
3      1     0
4      0     2
set_start_state(start_state)[source]

Set the start state of the Markov Chain. If the start_state is given as an array-like iterable, its contents are reordered in the internal representation.

start_state: dict or array-like iterable object
Dict (or list) of tuples representing the starting states of the variables.
>>> from pgmpy.models import MarkovChain as MC
>>> from pgmpy.factors.discrete import State
>>> model = MC(['a', 'b'], [2, 2])
>>> model.set_start_state([State('a', 0), State('b', 1)])

NoisyOr Model

class pgmpy.models.NoisyOrModel.NoisyOrModel(variables, cardinality, inhibitor_probability)[source]

Base class for Noisy-Or models.

This is an implementation of generalized Noisy-Or models: it is not limited to Boolean variables, and any arbitrary function can be used in place of the Boolean OR function.

Reference: http://xenon.stanford.edu/~srinivas/research/6-UAI93-Srinivas-Generalization-of-Noisy-Or.pdf

add_variables(variables, cardinality, inhibitor_probability)[source]

Adds variables to the NoisyOrModel.

variables: list, tuple, dict (array like)
array containing names of the variables that are to be added.
cardinality: list, tuple, dict (array like)
array containing integers representing the cardinality of the variables.
inhibitor_probability: list, tuple, dict (array_like)
array containing the inhibitor probabilities corresponding to each variable.
>>> from pgmpy.models import NoisyOrModel
>>> model = NoisyOrModel(['x1', 'x2', 'x3'], [2, 3, 2], [[0.6, 0.4],
...                                                      [0.2, 0.4, 0.7],
...                                                      [0.1, 0.4]])
>>> model.add_variables(['x4'], [3], [0.1, 0.4, 0.2])
del_variables(variables)[source]

Deletes variables from the NoisyOrModel.

variables: list, tuple, dict (array like)
list of variables to be deleted.
>>> from pgmpy.models import NoisyOrModel
>>> model = NoisyOrModel(['x1', 'x2', 'x3'], [2, 3, 2], [[0.6, 0.4],
...                                                      [0.2, 0.4, 0.7],
...                                                      [0.1, 0.4]])
>>> model.del_variables(['x1'])

Naive Bayes

class pgmpy.models.NaiveBayes.NaiveBayes(ebunch=None)[source]

Class to represent a Naive Bayes model. Subclass of BayesianModel. The model holds directed edges from one parent node to multiple children nodes only.

data
: input graph
Data to initialize graph. If data=None (default) an empty graph is created. The data can be an edge list, or any NetworkX graph object.

Create an empty Naive Bayes Model with no nodes and no edges.

>>> from pgmpy.models import NaiveBayes
>>> G = NaiveBayes()

G can be grown in several ways.

Nodes:

Add one node at a time:

>>> G.add_node('a')

Add the nodes from any container (a list, set or tuple or the nodes from another graph).

>>> G.add_nodes_from(['a', 'b', 'c'])

Edges:

G can also be grown by adding edges.

Add one edge,

>>> G.add_edge('a', 'b')

a list of edges,

>>> G.add_edges_from([('a', 'b'), ('a', 'c')])

If some edges connect nodes not yet in the model, the nodes are added automatically. There are no errors when adding nodes or edges that already exist.

Shortcuts:

Many common graph features allow python syntax to speed reporting.

>>> 'a' in G     # check if node in graph
True
>>> len(G)  # number of nodes in graph
3
active_trail_nodes(start, observed=None)[source]

Returns all the nodes reachable from start via an active trail.

start: Graph node

observed
: List of nodes (optional)
If given the active trail would be computed assuming these nodes to be observed.
>>> from pgmpy.models import NaiveBayes
>>> model = NaiveBayes()
>>> model.add_nodes_from(['a', 'b', 'c', 'd'])
>>> model.add_edges_from([('a', 'b'), ('a', 'c'), ('a', 'd')])
>>> model.active_trail_nodes('a')
{'a', 'b', 'c', 'd'}
>>> model.active_trail_nodes('a', ['b', 'c'])
{'a', 'd'}
>>> model.active_trail_nodes('b', ['a'])
{'b'}
add_edge(u, v, **kwargs)[source]

Add an edge between u and v.

The nodes u and v will be automatically added if they are not already in the graph

u,v
: nodes
Nodes can be any hashable python object.
>>> from pgmpy.models import NaiveBayes
>>> G = NaiveBayes()
>>> G.add_nodes_from(['a', 'b', 'c'])
>>> G.add_edge('a', 'b')
>>> G.add_edge('a', 'c')
>>> G.edges()
[('a', 'c'), ('a', 'b')]
fit(data, parent_node=None, estimator_type=None)[source]

Computes the CPD for each node from the given data, in the form of a pandas DataFrame. If a variable from the data is not present in the model, it adds that node to the model.

data
: pandas DataFrame object
A DataFrame object with column names identical to the variable names of the network.
parent_node: any hashable python object (optional)
Parent node of the model; if not specified, it looks for a previously specified parent node.
estimator: Estimator class
Any pgmpy estimator. If nothing is specified, the default MaximumLikelihoodEstimator would be used.
>>> import numpy as np
>>> import pandas as pd
>>> from pgmpy.models import NaiveBayes
>>> model = NaiveBayes()
>>> values = pd.DataFrame(np.random.randint(low=0, high=2, size=(1000, 5)),
...                       columns=['A', 'B', 'C', 'D', 'E'])
>>> model.fit(values, 'A')
>>> model.get_cpds()
[<TabularCPD representing P(D:2 | A:2) at 0x4b72870>,
 <TabularCPD representing P(E:2 | A:2) at 0x4bb2150>,
 <TabularCPD representing P(A:2) at 0x4bb23d0>,
 <TabularCPD representing P(B:2 | A:2) at 0x4bb24b0>,
 <TabularCPD representing P(C:2 | A:2) at 0x4bb2750>]
>>> model.edges()
[('A', 'D'), ('A', 'E'), ('A', 'B'), ('A', 'C')]
local_independencies(variables)[source]

Returns a list of Independencies objects containing the local independencies of each of the variables. If no local independencies exist for a variable, it gives None for that variable.

variables: str or array like
variables whose local independencies are to be found.
>>> from pgmpy.models import NaiveBayes
>>> model = NaiveBayes()
>>> model.add_edges_from([('a', 'b'), ('a', 'c'), ('a', 'd')])
>>> ind = model.local_independencies('b')
>>> ind
[(b _|_ d, c | a)]