Base Model Structures

Directed Acyclic Graph (DAG)

class pgmpy.base.DAG(ebunch=None, latents={})[source]

Base class for all Directed Graphical Models.

Each node in the graph can represent either a random variable, Factor, or a cluster of random variables. Edges in the graph represent the dependencies between these.

Parameters:

data (input graph) – Data to initialize graph. If data=None (default) an empty graph is created. The data can be an edge list or any Networkx graph object.

Examples

Create an empty DAG with no nodes and no edges

>>> from pgmpy.base import DAG
>>> G = DAG()

G can be grown in several ways:

Nodes:

Add one node at a time:

>>> G.add_node(node='a')

Add the nodes from any container (a list, set or tuple or the nodes from another graph).

>>> G.add_nodes_from(nodes=['a', 'b'])

Edges:

G can also be grown by adding edges.

Add one edge,

>>> G.add_edge(u='a', v='b')

a list of edges,

>>> G.add_edges_from(ebunch=[('a', 'b'), ('b', 'c')])

If some edges connect nodes not yet in the model, the nodes are added automatically. There are no errors when adding nodes or edges that already exist.

Shortcuts:

Many common graph features allow python syntax for speed reporting.

>>> 'a' in G     # check if node in graph
True
>>> len(G)  # number of nodes in graph
3
active_trail_nodes(variables, observed=None, include_latents=False)[source]

Returns a dictionary with the given variables as keys and all the nodes reachable from that respective variable as values.

Parameters:
  • variables (str or array like) – variables whose active trails are to be found.

  • observed (List of nodes (optional)) – If given the active trails would be computed assuming these nodes to be observed.

  • include_latents (boolean (default: False)) – Whether to include the latent variables in the returned active trail nodes.

Examples

>>> from pgmpy.base import DAG
>>> student = DAG()
>>> student.add_nodes_from(['diff', 'intel', 'grades'])
>>> student.add_edges_from([('diff', 'grades'), ('intel', 'grades')])
>>> student.active_trail_nodes('diff')
{'diff': {'diff', 'grades'}}
>>> student.active_trail_nodes(['diff', 'intel'], observed='grades')
{'diff': {'diff', 'intel'}, 'intel': {'diff', 'intel'}}

References

Details of the algorithm can be found in ‘Probabilistic Graphical Model Principles and Techniques’ - Koller and Friedman Page 75 Algorithm 3.1

add_edge(u, v, weight=None)[source]

Add an edge between u and v.

The nodes u and v will be automatically added if they are not already in the graph.

Parameters:
  • u (nodes) – Nodes can be any hashable Python object.

  • v (nodes) – Nodes can be any hashable Python object.

  • weight (int, float (default=None)) – The weight of the edge

Examples

>>> from pgmpy.base import DAG
>>> G = DAG()
>>> G.add_nodes_from(nodes=['Alice', 'Bob', 'Charles'])
>>> G.add_edge(u='Alice', v='Bob')
>>> G.nodes()
NodeView(('Alice', 'Bob', 'Charles'))
>>> G.edges()
OutEdgeView([('Alice', 'Bob')])

When the node is not already present in the graph: >>> G.add_edge(u=’Alice’, v=’Ankur’) >>> G.nodes() NodeView((‘Alice’, ‘Ankur’, ‘Bob’, ‘Charles’)) >>> G.edges() OutEdgeView([(‘Alice’, ‘Bob’), (‘Alice’, ‘Ankur’)])

Adding edges with weight: >>> G.add_edge(‘Ankur’, ‘Maria’, weight=0.1) >>> G.edge[‘Ankur’][‘Maria’] {‘weight’: 0.1}

add_edges_from(ebunch, weights=None)[source]

Add all the edges in ebunch.

If nodes referred in the ebunch are not already present, they will be automatically added. Node names can be any hashable python object.

**The behavior of adding weights is different than networkx.

Parameters:
  • ebunch (container of edges) – Each edge given in the container will be added to the graph. The edges must be given as 2-tuples (u, v).

  • weights (list, tuple (default=None)) – A container of weights (int, float). The weight value at index i is associated with the edge at index i.

Examples

>>> from pgmpy.base import DAG
>>> G = DAG()
>>> G.add_nodes_from(nodes=['Alice', 'Bob', 'Charles'])
>>> G.add_edges_from(ebunch=[('Alice', 'Bob'), ('Bob', 'Charles')])
>>> G.nodes()
NodeView(('Alice', 'Bob', 'Charles'))
>>> G.edges()
OutEdgeView([('Alice', 'Bob'), ('Bob', 'Charles')])

When the node is not already in the model: >>> G.add_edges_from(ebunch=[(‘Alice’, ‘Ankur’)]) >>> G.nodes() NodeView((‘Alice’, ‘Bob’, ‘Charles’, ‘Ankur’)) >>> G.edges() OutEdgeView([(‘Alice’, ‘Bob’), (‘Bob’, ‘Charles’), (‘Alice’, ‘Ankur’)])

Adding edges with weights: >>> G.add_edges_from([(‘Ankur’, ‘Maria’), (‘Maria’, ‘Mason’)], … weights=[0.3, 0.5]) >>> G.edge[‘Ankur’][‘Maria’] {‘weight’: 0.3} >>> G.edge[‘Maria’][‘Mason’] {‘weight’: 0.5}

add_node(node, weight=None, latent=False)[source]

Adds a single node to the Graph.

Parameters:
  • node (str, int, or any hashable python object.) – The node to add to the graph.

  • weight (int, float) – The weight of the node.

  • latent (boolean (default: False)) – Specifies whether the variable is latent or not.

Examples

>>> from pgmpy.base import DAG
>>> G = DAG()
>>> G.add_node(node='A')
>>> sorted(G.nodes())
['A']

Adding a node with some weight. >>> G.add_node(node=’B’, weight=0.3)

The weight of these nodes can be accessed as: >>> G.nodes[‘B’] {‘weight’: 0.3} >>> G.nodes[‘A’] {‘weight’: None}

add_nodes_from(nodes, weights=None, latent=False)[source]

Add multiple nodes to the Graph.

**The behviour of adding weights is different than in networkx.

Parameters:
  • nodes (iterable container) – A container of nodes (list, dict, set, or any hashable python object).

  • weights (list, tuple (default=None)) – A container of weights (int, float). The weight value at index i is associated with the variable at index i.

  • latent (list, tuple (default=False)) – A container of boolean. The value at index i tells whether the node at index i is latent or not.

Examples

>>> from pgmpy.base import DAG
>>> G = DAG()
>>> G.add_nodes_from(nodes=['A', 'B', 'C'])
>>> G.nodes()
NodeView(('A', 'B', 'C'))

Adding nodes with weights: >>> G.add_nodes_from(nodes=[‘D’, ‘E’], weights=[0.3, 0.6]) >>> G.nodes[‘D’] {‘weight’: 0.3} >>> G.nodes[‘E’] {‘weight’: 0.6} >>> G.nodes[‘A’] {‘weight’: None}

do(nodes, inplace=False)[source]

Applies the do operator to the graph and returns a new DAG with the transformed graph.

The do-operator, do(X = x) has the effect of removing all edges from the parents of X and setting X to the given value x.

Parameters:
  • nodes (list, array-like) – The names of the nodes to apply the do-operator for.

  • inplace (boolean (default: False)) – If inplace=True, makes the changes to the current object, otherwise returns a new instance.

Returns:

Modified DAG – A new instance of DAG modified by the do-operator

Return type:

pgmpy.base.DAG

Examples

Initialize a DAG >>> graph = DAG() >>> graph.add_edges_from([(‘X’, ‘A’), … (‘A’, ‘Y’), … (‘A’, ‘B’)]) >>> # Applying the do-operator will return a new DAG with the desired structure. >>> graph_do_A = graph.do(‘A’) >>> # Which we can verify is missing the edges we would expect. >>> graph_do_A.edges OutEdgeView([(‘A’, ‘B’), (‘A’, ‘Y’)])

References

Causality: Models, Reasoning, and Inference, Judea Pearl (2000). p.70.

fit(data, estimator=None, state_names=[], n_jobs=1, **kwargs)[source]

Estimates the CPD for each variable based on a given data set.

Parameters:
  • data (pandas DataFrame object) – DataFrame object with column names identical to the variable names of the network. (If some values in the data are missing the data cells should be set to numpy.nan. Note that pandas converts each column containing numpy.nan`s to dtype `float.)

  • estimator (Estimator class) – One of: - MaximumLikelihoodEstimator (default) - BayesianEstimator: In this case, pass ‘prior_type’ and either ‘pseudo_counts’ or ‘equivalent_sample_size’ as additional keyword arguments. See BayesianEstimator.get_parameters() for usage. - ExpectationMaximization

  • state_names (dict (optional)) – A dict indicating, for each variable, the discrete set of states that the variable can take. If unspecified, the observed values in the data set are taken to be the only possible states.

  • n_jobs (int (default: 1)) – Number of threads/processes to use for estimation. Using n_jobs > 1 for small models or datasets might be slower.

Returns:

Fitted Model – Returns a BayesianNetwork object with learned CPDs. The DAG structure is preserved, and parameters (CPDs) are added. This allows the DAG to represent both the structure and the parameters of a Bayesian Network.

Return type:

BayesianNetwork

Examples

>>> import pandas as pd
>>> from pgmpy.models import BayesianNetwork
>>> from pgmpy.base import DAG
>>> data = pd.DataFrame(data={'A': [0, 0, 1], 'B': [0, 1, 0], 'C': [1, 1, 0]})
>>> model = DAG([('A', 'C'), ('B', 'C')])
>>> fitted_model = model.fit(data)
>>> fitted_model.get_cpds()
[<TabularCPD representing P(A:2) at 0x17945372c30>,
<TabularCPD representing P(B:2) at 0x17945a19760>,
<TabularCPD representing P(C:2 | A:2, B:2) at 0x17944f42690>]
get_ancestral_graph(nodes)[source]

Returns the ancestral graph of the given nodes. The ancestral graph only contains the nodes which are ancestors of atleast one of the variables in node.

Parameters:

node (iterable) – List of nodes whose ancestral graph needs to be computed.

Returns:

Ancestral Graph

Return type:

pgmpy.base.DAG

Examples

>>> from pgmpy.base import DAG
>>> dag = DAG([('A', 'C'), ('B', 'C'), ('D', 'A'), ('D', 'B')])
>>> anc_dag = dag.get_ancestral_graph(nodes=['A', 'B'])
>>> anc_dag.edges()
OutEdgeView([('D', 'A'), ('D', 'B')])
get_children(node)[source]

Returns a list of children of node. Throws an error if the node is not present in the graph.

Parameters:

node (string, int or any hashable python object.) – The node whose children would be returned.

Examples

>>> from pgmpy.base import DAG
>>> g = DAG(ebunch=[('A', 'B'), ('C', 'B'), ('B', 'D'),
                              ('B', 'E'), ('B', 'F'), ('E', 'G')])
>>> g.get_children(node='B')
['D', 'E', 'F']
get_immoralities()[source]

Finds all the immoralities in the model A v-structure X -> Z <- Y is an immorality if there is no direct edge between X and Y .

Returns:

Immoralities – A set of all the immoralities in the model

Return type:

set

Examples

>>> from pgmpy.base import DAG
>>> student = DAG()
>>> student.add_edges_from([('diff', 'grade'), ('intel', 'grade'),
...                         ('intel', 'SAT'), ('grade', 'letter')])
>>> student.get_immoralities()
{('diff', 'intel')}
get_independencies(latex=False, include_latents=False)[source]

Computes independencies in the DAG, by checking minimal d-seperation.

Parameters:
  • latex (boolean) – If latex=True then latex string of the independence assertion would be created.

  • include_latents (boolean) – If True, includes latent variables in the independencies. Otherwise, only generates independencies on observed variables.

Examples

>>> from pgmpy.base import DAG
>>> chain = DAG([('X', 'Y'), ('Y', 'Z')])
>>> chain.get_independencies()
(X ⟂ Z | Y)
get_leaves()[source]

Returns a list of leaves of the graph.

Examples

>>> from pgmpy.base import DAG
>>> graph = DAG([('A', 'B'), ('B', 'C'), ('B', 'D')])
>>> graph.get_leaves()
['C', 'D']
get_markov_blanket(node)[source]

Returns a markov blanket for a random variable. In the case of Bayesian Networks, the markov blanket is the set of node’s parents, its children and its children’s other parents.

Returns:

Markov Blanket – List of nodes in the markov blanket of node.

Return type:

list

Parameters:

node (string, int or any hashable python object.) – The node whose markov blanket would be returned.

Examples

>>> from pgmpy.base import DAG
>>> from pgmpy.factors.discrete import TabularCPD
>>> G = DAG([('x', 'y'), ('z', 'y'), ('y', 'w'), ('y', 'v'), ('u', 'w'),
                       ('s', 'v'), ('w', 't'), ('w', 'm'), ('v', 'n'), ('v', 'q')])
>>> G.get_markov_blanket('y')
['s', 'w', 'x', 'u', 'z', 'v']
get_parents(node)[source]

Returns a list of parents of node.

Throws an error if the node is not present in the graph.

Parameters:

node (string, int or any hashable python object.) – The node whose parents would be returned.

Examples

>>> from pgmpy.base import DAG
>>> G = DAG(ebunch=[('diff', 'grade'), ('intel', 'grade')])
>>> G.get_parents(node='grade')
['diff', 'intel']
static get_random(n_nodes=5, edge_prob=0.5, node_names=None, latents=False, seed=None)[source]

Returns a randomly generated DAG with n_nodes number of nodes with edge probability being edge_prob.

Parameters:
  • n_nodes (int) – The number of nodes in the randomly generated DAG.

  • edge_prob (float) – The probability of edge between any two nodes in the topologically sorted DAG.

  • node_names (list (default: None)) – A list of variables names to use in the random graph. If None, the node names are integer values starting from 0.

  • latents (bool (default: False)) – If True, includes latent variables in the generated DAG.

  • seed (int (default: None)) – The seed for the random number generator.

Returns:

Random DAG – The randomly generated DAG.

Return type:

pgmpy.base.DAG

Examples

>>> from pgmpy.base import DAG
>>> random_dag = DAG.get_random(n_nodes=10, edge_prob=0.3)
>>> random_dag.nodes()
NodeView((0, 1, 2, 3, 4, 5, 6, 7, 8, 9))
>>> random_dag.edges()
OutEdgeView([(0, 6), (1, 6), (1, 7), (7, 9), (2, 5), (2, 7), (2, 8), (5, 9), (3, 7)])
get_roots()[source]

Returns a list of roots of the graph.

Examples

>>> from pgmpy.base import DAG
>>> graph = DAG([('A', 'B'), ('B', 'C'), ('B', 'D'), ('E', 'B')])
>>> graph.get_roots()
['A', 'E']
is_dconnected(start, end, observed=None, include_latents=False)[source]

Returns True if there is an active trail (i.e. d-connection) between start and end node given that observed is observed.

Parameters:
  • start (int, str, any hashable python object.) – The nodes in the DAG between which to check the d-connection/active trail.

  • end (int, str, any hashable python object.) – The nodes in the DAG between which to check the d-connection/active trail.

  • observed (list, array-like (optional)) – If given the active trail would be computed assuming these nodes to be observed.

  • include_latents (boolean (default: False)) – If true, latent variables are return as part of the active trail.

Examples

>>> from pgmpy.base import DAG
>>> student = DAG()
>>> student.add_nodes_from(['diff', 'intel', 'grades', 'letter', 'sat'])
>>> student.add_edges_from([('diff', 'grades'), ('intel', 'grades'), ('grades', 'letter'),
...                         ('intel', 'sat')])
>>> student.is_dconnected('diff', 'intel')
False
>>> student.is_dconnected('grades', 'sat')
True
is_iequivalent(model)[source]

Checks whether the given model is I-equivalent

Two graphs G1 and G2 are said to be I-equivalent if they have same skeleton and have same set of immoralities.

Parameters:

model (A DAG object, for which you want to check I-equivalence)

Returns:

I-equivalence – True if both are I-equivalent, False otherwise

Return type:

boolean

Examples

>>> from pgmpy.base import DAG
>>> G = DAG()
>>> G.add_edges_from([('V', 'W'), ('W', 'X'),
...                   ('X', 'Y'), ('Z', 'Y')])
>>> G1 = DAG()
>>> G1.add_edges_from([('W', 'V'), ('X', 'W'),
...                    ('X', 'Y'), ('Z', 'Y')])
>>> G.is_iequivalent(G1)
True
local_independencies(variables)[source]

Returns an instance of Independencies containing the local independencies of each of the variables.

Parameters:

variables (str or array like) – variables whose local independencies are to be found.

Examples

>>> from pgmpy.base import DAG
>>> student = DAG()
>>> student.add_edges_from([('diff', 'grade'), ('intel', 'grade'),
>>>                         ('grade', 'letter'), ('intel', 'SAT')])
>>> ind = student.local_independencies('grade')
>>> ind
(grade ⟂ SAT | diff, intel)
minimal_dseparator(start, end, include_latents=False)[source]

Finds the minimal d-separating set for start and end.

Parameters:
  • start (node) – The first node.

  • end (node) – The second node.

  • include_latents (boolean (default: False)) – If true, latent variables are consider for minimal d-seperator.

Examples

>>> dag = DAG([('A', 'B'), ('B', 'C')])
>>> dag.minimal_dseparator(start='A', end='C')
{'B'}

References

[1] Algorithm 4, Page 10: Tian, Jin, Azaria Paz, and Judea Pearl. Finding minimal d-separators. Computer Science Department, University of California, 1998.

moralize()[source]

Removes all the immoralities in the DAG and creates a moral graph (UndirectedGraph).

A v-structure X->Z<-Y is an immorality if there is no directed edge between X and Y.

Examples

>>> from pgmpy.base import DAG
>>> G = DAG(ebunch=[('diff', 'grade'), ('intel', 'grade')])
>>> moral_graph = G.moralize()
>>> moral_graph.edges()
EdgeView([('intel', 'grade'), ('intel', 'diff'), ('grade', 'diff')])
to_daft(node_pos='circular', latex=True, pgm_params={}, edge_params={}, node_params={})[source]

Returns a daft (https://docs.daft-pgm.org/en/latest/) object which can be rendered for publication quality plots. The returned object’s render method can be called to see the plots.

Parameters:
  • node_pos (str or dict (default: circular)) –

    If str: Must be one of the following: circular, kamada_kawai, planar, random, shell, sprint,

    spectral, spiral. Please refer: https://networkx.org/documentation/stable//reference/drawing.html#module-networkx.drawing.layout for details on these layouts.

    If dict should be of the form {node: (x coordinate, y coordinate)} describing the x and y coordinate of each node.

    If no argument is provided uses circular layout.

  • latex (boolean) – Whether to use latex for rendering the node names.

  • pgm_params (dict (optional)) – Any additional parameters that need to be passed to daft.PGM initializer. Should be of the form: {param_name: param_value}

  • edge_params (dict (optional)) – Any additional edge parameters that need to be passed to daft.add_edge method. Should be of the form: {(u1, v1): {param_name: param_value}, (u2, v2): {…} }

  • node_params (dict (optional)) – Any additional node parameters that need to be passed to daft.add_node method. Should be of the form: {node1: {param_name: param_value}, node2: {…} }

Returns:

Daft object – Daft object for plotting the DAG.

Return type:

daft.PGM object

Examples

>>> from pgmpy.base import DAG
>>> dag = DAG([('a', 'b'), ('b', 'c'), ('d', 'c')])
>>> dag.to_daft(node_pos={'a': (0, 0), 'b': (1, 0), 'c': (2, 0), 'd': (1, 1)})
<daft.PGM at 0x7fc756e936d0>
>>> dag.to_daft(node_pos="circular")
<daft.PGM at 0x7f9bb48c5eb0>
>>> dag.to_daft(node_pos="circular", pgm_params={'observed_style': 'inner'})
<daft.PGM at 0x7f9bb48b0bb0>
>>> dag.to_daft(node_pos="circular",
...             edge_params={('a', 'b'): {'label': 2}},
...             node_params={'a': {'shape': 'rectangle'}})
<daft.PGM at 0x7f9bb48b0bb0>
to_graphviz()[source]

Retuns a pygraphviz object for the DAG. pygraphviz is useful for visualizing the network structure.

Examples

>>> from pgmpy.utils import get_example_model
>>> model = get_example_model('alarm')
>>> model.to_graphviz()
<AGraph <Swig Object of type 'Agraph_t *' at 0x7fdea4cde040>>
>>> model.draw('model.png', prog='neato')

Partial Directed Acyclic Graph (PDAG)

class pgmpy.base.PDAG(directed_ebunch=[], undirected_ebunch=[], latents=[])[source]

Class for representing PDAGs (also known as CPDAG). PDAGs are the equivalence classes of DAGs and contain both directed and undirected edges.

Note: In this class, undirected edges are represented using two edges in both direction i.e. an undirected edge between X - Y is represented using X -> Y and X <- Y.

copy()[source]

Returns a copy of the object instance.

Returns:

Copy of PDAG – Returns a copy of self.

Return type:

pgmpy.dag.PDAG

to_dag(required_edges=[])[source]

Returns one possible DAG which is represented using the PDAG.

Parameters:

required_edges (list, array-like of 2-tuples) – The list of edges that should be included in the DAG.

Return type:

Returns an instance of DAG.

Examples

to_graphviz()[source]

Retuns a pygraphviz object for the DAG. pygraphviz is useful for visualizing the network structure.

Examples

>>> from pgmpy.utils import get_example_model
>>> model = get_example_model('alarm')
>>> model.to_graphviz()
<AGraph <Swig Object of type 'Agraph_t *' at 0x7fdea4cde040>>