Base Structure Classes

Directed Acyclic Graph (DAG)

class pgmpy.base.DAG.DAG(ebunch=None, latents={})[source]

Base class for all Directed Graphical Models.

Each node in the graph can represent either a random variable, Factor, or a cluster of random variables. Edges in the graph represent the dependencies between these.

Parameters:

data (input graph) – Data to initialize graph. If data=None (default) an empty graph is created. The data can be an edge list or any Networkx graph object.

Examples

Create an empty DAG with no nodes and no edges

>>> from pgmpy.base import DAG
>>> G = DAG()

G can be grown in several ways:

Nodes:

Add one node at a time:

>>> G.add_node(node='a')

Add the nodes from any container (a list, set or tuple or the nodes from another graph).

>>> G.add_nodes_from(nodes=['a', 'b'])

Edges:

G can also be grown by adding edges.

Add one edge,

>>> G.add_edge(u='a', v='b')

a list of edges,

>>> G.add_edges_from(ebunch=[('a', 'b'), ('b', 'c')])

If some edges connect nodes not yet in the model, the nodes are added automatically. There are no errors when adding nodes or edges that already exist.

Shortcuts:

Many common graph features allow python syntax for speed reporting.

>>> 'a' in G     # check if node in graph
True
>>> len(G)  # number of nodes in graph
3
active_trail_nodes(variables, observed=None, include_latents=False)[source]

Returns a dictionary with the given variables as keys and all the nodes reachable from that respective variable as values.

Parameters:
  • variables (str or array like) – variables whose active trails are to be found.

  • observed (List of nodes (optional)) – If given the active trails would be computed assuming these nodes to be observed.

  • include_latents (boolean (default: False)) – Whether to include the latent variables in the returned active trail nodes.

Examples

>>> from pgmpy.base import DAG
>>> student = DAG()
>>> student.add_nodes_from(['diff', 'intel', 'grades'])
>>> student.add_edges_from([('diff', 'grades'), ('intel', 'grades')])
>>> student.active_trail_nodes('diff')
{'diff': {'diff', 'grades'}}
>>> student.active_trail_nodes(['diff', 'intel'], observed='grades')
{'diff': {'diff', 'intel'}, 'intel': {'diff', 'intel'}}

References

Details of the algorithm can be found in ‘Probabilistic Graphical Model Principles and Techniques’ - Koller and Friedman Page 75 Algorithm 3.1

add_edge(u, v, weight=None)[source]

Add an edge between u and v.

The nodes u and v will be automatically added if they are not already in the graph.

Parameters:
  • u (nodes) – Nodes can be any hashable Python object.

  • v (nodes) – Nodes can be any hashable Python object.

  • weight (int, float (default=None)) – The weight of the edge

Examples

>>> from pgmpy.base import DAG
>>> G = DAG()
>>> G.add_nodes_from(nodes=['Alice', 'Bob', 'Charles'])
>>> G.add_edge(u='Alice', v='Bob')
>>> G.nodes()
NodeView(('Alice', 'Bob', 'Charles'))
>>> G.edges()
OutEdgeView([('Alice', 'Bob')])

When the node is not already present in the graph: >>> G.add_edge(u=’Alice’, v=’Ankur’) >>> G.nodes() NodeView((‘Alice’, ‘Ankur’, ‘Bob’, ‘Charles’)) >>> G.edges() OutEdgeView([(‘Alice’, ‘Bob’), (‘Alice’, ‘Ankur’)])

Adding edges with weight: >>> G.add_edge(‘Ankur’, ‘Maria’, weight=0.1) >>> G.edge[‘Ankur’][‘Maria’] {‘weight’: 0.1}

add_edges_from(ebunch, weights=None)[source]

Add all the edges in ebunch.

If nodes referred in the ebunch are not already present, they will be automatically added. Node names can be any hashable python object.

**The behavior of adding weights is different than networkx.

Parameters:
  • ebunch (container of edges) – Each edge given in the container will be added to the graph. The edges must be given as 2-tuples (u, v).

  • weights (list, tuple (default=None)) – A container of weights (int, float). The weight value at index i is associated with the edge at index i.

Examples

>>> from pgmpy.base import DAG
>>> G = DAG()
>>> G.add_nodes_from(nodes=['Alice', 'Bob', 'Charles'])
>>> G.add_edges_from(ebunch=[('Alice', 'Bob'), ('Bob', 'Charles')])
>>> G.nodes()
NodeView(('Alice', 'Bob', 'Charles'))
>>> G.edges()
OutEdgeView([('Alice', 'Bob'), ('Bob', 'Charles')])

When the node is not already in the model: >>> G.add_edges_from(ebunch=[(‘Alice’, ‘Ankur’)]) >>> G.nodes() NodeView((‘Alice’, ‘Bob’, ‘Charles’, ‘Ankur’)) >>> G.edges() OutEdgeView([(‘Alice’, ‘Bob’), (‘Bob’, ‘Charles’), (‘Alice’, ‘Ankur’)])

Adding edges with weights: >>> G.add_edges_from([(‘Ankur’, ‘Maria’), (‘Maria’, ‘Mason’)], … weights=[0.3, 0.5]) >>> G.edge[‘Ankur’][‘Maria’] {‘weight’: 0.3} >>> G.edge[‘Maria’][‘Mason’] {‘weight’: 0.5}

add_node(node, weight=None, latent=False)[source]

Adds a single node to the Graph.

Parameters:
  • node (str, int, or any hashable python object.) – The node to add to the graph.

  • weight (int, float) – The weight of the node.

  • latent (boolean (default: False)) – Specifies whether the variable is latent or not.

Examples

>>> from pgmpy.base import DAG
>>> G = DAG()
>>> G.add_node(node='A')
>>> sorted(G.nodes())
['A']

Adding a node with some weight. >>> G.add_node(node=’B’, weight=0.3)

The weight of these nodes can be accessed as: >>> G.nodes[‘B’] {‘weight’: 0.3} >>> G.nodes[‘A’] {‘weight’: None}

add_nodes_from(nodes, weights=None, latent=False)[source]

Add multiple nodes to the Graph.

**The behviour of adding weights is different than in networkx.

Parameters:
  • nodes (iterable container) – A container of nodes (list, dict, set, or any hashable python object).

  • weights (list, tuple (default=None)) – A container of weights (int, float). The weight value at index i is associated with the variable at index i.

  • latent (list, tuple (default=False)) – A container of boolean. The value at index i tells whether the node at index i is latent or not.

Examples

>>> from pgmpy.base import DAG
>>> G = DAG()
>>> G.add_nodes_from(nodes=['A', 'B', 'C'])
>>> G.nodes()
NodeView(('A', 'B', 'C'))

Adding nodes with weights: >>> G.add_nodes_from(nodes=[‘D’, ‘E’], weights=[0.3, 0.6]) >>> G.nodes[‘D’] {‘weight’: 0.3} >>> G.nodes[‘E’] {‘weight’: 0.6} >>> G.nodes[‘A’] {‘weight’: None}

do(nodes, inplace=False)[source]

Applies the do operator to the graph and returns a new DAG with the transformed graph.

The do-operator, do(X = x) has the effect of removing all edges from the parents of X and setting X to the given value x.

Parameters:
  • nodes (list, array-like) – The names of the nodes to apply the do-operator for.

  • inplace (boolean (default: False)) – If inplace=True, makes the changes to the current object, otherwise returns a new instance.

Returns:

Modified DAG – A new instance of DAG modified by the do-operator

Return type:

pgmpy.base.DAG

Examples

Initialize a DAG >>> graph = DAG() >>> graph.add_edges_from([(‘X’, ‘A’), … (‘A’, ‘Y’), … (‘A’, ‘B’)]) >>> # Applying the do-operator will return a new DAG with the desired structure. >>> graph_do_A = graph.do(‘A’) >>> # Which we can verify is missing the edges we would expect. >>> graph_do_A.edges OutEdgeView([(‘A’, ‘B’), (‘A’, ‘Y’)])

References

Causality: Models, Reasoning, and Inference, Judea Pearl (2000). p.70.

get_ancestral_graph(nodes)[source]

Returns the ancestral graph of the given nodes. The ancestral graph only contains the nodes which are ancestors of atleast one of the variables in node.

Parameters:

node (iterable) – List of nodes whose ancestral graph needs to be computed.

Returns:

Ancestral Graph

Return type:

pgmpy.base.DAG

Examples

>>> from pgmpy.base import DAG
>>> dag = DAG([('A', 'C'), ('B', 'C'), ('D', 'A'), ('D', 'B')])
>>> anc_dag = dag.get_ancestral_graph(nodes=['A', 'B'])
>>> anc_dag.edges()
OutEdgeView([('D', 'A'), ('D', 'B')])
get_children(node)[source]

Returns a list of children of node. Throws an error if the node is not present in the graph.

Parameters:

node (string, int or any hashable python object.) – The node whose children would be returned.

Examples

>>> from pgmpy.base import DAG
>>> g = DAG(ebunch=[('A', 'B'), ('C', 'B'), ('B', 'D'),
                              ('B', 'E'), ('B', 'F'), ('E', 'G')])
>>> g.get_children(node='B')
['D', 'E', 'F']
get_immoralities()[source]

Finds all the immoralities in the model A v-structure X -> Z <- Y is an immorality if there is no direct edge between X and Y .

Returns:

Immoralities – A set of all the immoralities in the model

Return type:

set

Examples

>>> from pgmpy.base import DAG
>>> student = DAG()
>>> student.add_edges_from([('diff', 'grade'), ('intel', 'grade'),
...                         ('intel', 'SAT'), ('grade', 'letter')])
>>> student.get_immoralities()
{('diff', 'intel')}
get_independencies(latex=False, include_latents=False)[source]

Computes independencies in the DAG, by checking d-seperation.

Parameters:
  • latex (boolean) – If latex=True then latex string of the independence assertion would be created.

  • include_latents (boolean) – If True, includes latent variables in the independencies. Otherwise, only generates independencies on observed variables.

Examples

>>> from pgmpy.base import DAG
>>> chain = DAG([('X', 'Y'), ('Y', 'Z')])
>>> chain.get_independencies()
(X ⟂ Z | Y)
(Z ⟂ X | Y)
get_leaves()[source]

Returns a list of leaves of the graph.

Examples

>>> from pgmpy.base import DAG
>>> graph = DAG([('A', 'B'), ('B', 'C'), ('B', 'D')])
>>> graph.get_leaves()
['C', 'D']
get_markov_blanket(node)[source]

Returns a markov blanket for a random variable. In the case of Bayesian Networks, the markov blanket is the set of node’s parents, its children and its children’s other parents.

Returns:

Markov Blanket – List of nodes in the markov blanket of node.

Return type:

list

Parameters:

node (string, int or any hashable python object.) – The node whose markov blanket would be returned.

Examples

>>> from pgmpy.base import DAG
>>> from pgmpy.factors.discrete import TabularCPD
>>> G = DAG([('x', 'y'), ('z', 'y'), ('y', 'w'), ('y', 'v'), ('u', 'w'),
                       ('s', 'v'), ('w', 't'), ('w', 'm'), ('v', 'n'), ('v', 'q')])
>>> G.get_markov_blanket('y')
['s', 'w', 'x', 'u', 'z', 'v']
get_parents(node)[source]

Returns a list of parents of node.

Throws an error if the node is not present in the graph.

Parameters:

node (string, int or any hashable python object.) – The node whose parents would be returned.

Examples

>>> from pgmpy.base import DAG
>>> G = DAG(ebunch=[('diff', 'grade'), ('intel', 'grade')])
>>> G.get_parents(node='grade')
['diff', 'intel']
static get_random(n_nodes=5, edge_prob=0.5, node_names=None, latents=False)[source]

Returns a randomly generated DAG with n_nodes number of nodes with edge probability being edge_prob.

Parameters:
  • n_nodes (int) – The number of nodes in the randomly generated DAG.

  • edge_prob (float) – The probability of edge between any two nodes in the topologically sorted DAG.

  • node_names (list (default: None)) – A list of variables names to use in the random graph. If None, the node names are integer values starting from 0.

  • latents (bool (default: False)) – If True, includes latent variables in the generated DAG.

Returns:

Random DAG – The randomly generated DAG.

Return type:

pgmpy.base.DAG

Examples

>>> from pgmpy.base import DAG
>>> random_dag = DAG.get_random(n_nodes=10, edge_prob=0.3)
>>> random_dag.nodes()
NodeView((0, 1, 2, 3, 4, 5, 6, 7, 8, 9))
>>> random_dag.edges()
OutEdgeView([(0, 6), (1, 6), (1, 7), (7, 9), (2, 5), (2, 7), (2, 8), (5, 9), (3, 7)])
get_roots()[source]

Returns a list of roots of the graph.

Examples

>>> from pgmpy.base import DAG
>>> graph = DAG([('A', 'B'), ('B', 'C'), ('B', 'D'), ('E', 'B')])
>>> graph.get_roots()
['A', 'E']
in_degree_iter(nbunch=None, weight=None)[source]
is_dconnected(start, end, observed=None)[source]

Returns True if there is an active trail (i.e. d-connection) between start and end node given that observed is observed.

Parameters:
  • start (int, str, any hashable python object.) – The nodes in the DAG between which to check the d-connection/active trail.

  • end (int, str, any hashable python object.) – The nodes in the DAG between which to check the d-connection/active trail.

  • observed (list, array-like (optional)) – If given the active trail would be computed assuming these nodes to be observed.

Examples

>>> from pgmpy.base import DAG
>>> student = DAG()
>>> student.add_nodes_from(['diff', 'intel', 'grades', 'letter', 'sat'])
>>> student.add_edges_from([('diff', 'grades'), ('intel', 'grades'), ('grades', 'letter'),
...                         ('intel', 'sat')])
>>> student.is_dconnected('diff', 'intel')
False
>>> student.is_dconnected('grades', 'sat')
True
is_iequivalent(model)[source]

Checks whether the given model is I-equivalent

Two graphs G1 and G2 are said to be I-equivalent if they have same skeleton and have same set of immoralities.

Parameters:

model (A DAG object, for which you want to check I-equivalence)

Returns:

I-equivalence – True if both are I-equivalent, False otherwise

Return type:

boolean

Examples

>>> from pgmpy.base import DAG
>>> G = DAG()
>>> G.add_edges_from([('V', 'W'), ('W', 'X'),
...                   ('X', 'Y'), ('Z', 'Y')])
>>> G1 = DAG()
>>> G1.add_edges_from([('W', 'V'), ('X', 'W'),
...                    ('X', 'Y'), ('Z', 'Y')])
>>> G.is_iequivalent(G1)
True
local_independencies(variables)[source]

Returns an instance of Independencies containing the local independencies of each of the variables.

Parameters:

variables (str or array like) – variables whose local independencies are to be found.

Examples

>>> from pgmpy.base import DAG
>>> student = DAG()
>>> student.add_edges_from([('diff', 'grade'), ('intel', 'grade'),
>>>                         ('grade', 'letter'), ('intel', 'SAT')])
>>> ind = student.local_independencies('grade')
>>> ind
(grade ⟂ SAT | diff, intel)
minimal_dseparator(start, end)[source]

Finds the minimal d-separating set for start and end.

Parameters:
  • start (node) – The first node.

  • end (node) – The second node.

Examples

>>> dag = DAG([('A', 'B'), ('B', 'C')])
>>> dag.minimal_dseparator(start='A', end='C')
{'B'}

References

[1] Algorithm 4, Page 10: Tian, Jin, Azaria Paz, and Judea Pearl. Finding minimal d-separators. Computer Science Department, University of California, 1998.

moralize()[source]

Removes all the immoralities in the DAG and creates a moral graph (UndirectedGraph).

A v-structure X->Z<-Y is an immorality if there is no directed edge between X and Y.

Examples

>>> from pgmpy.base import DAG
>>> G = DAG(ebunch=[('diff', 'grade'), ('intel', 'grade')])
>>> moral_graph = G.moralize()
>>> moral_graph.edges()
EdgeView([('intel', 'grade'), ('intel', 'diff'), ('grade', 'diff')])
out_degree_iter(nbunch=None, weight=None)[source]
to_daft(node_pos='circular', latex=True, pgm_params={}, edge_params={}, node_params={})[source]

Returns a daft (https://docs.daft-pgm.org/en/latest/) object which can be rendered for publication quality plots. The returned object’s render method can be called to see the plots.

Parameters:
  • node_pos (str or dict (default: circular)) –

    If str: Must be one of the following: circular, kamada_kawai, planar, random, shell, sprint,

    spectral, spiral. Please refer: https://networkx.org/documentation/stable//reference/drawing.html#module-networkx.drawing.layout for details on these layouts.

    If dict should be of the form {node: (x coordinate, y coordinate)} describing the x and y coordinate of each node.

    If no argument is provided uses circular layout.

  • latex (boolean) – Whether to use latex for rendering the node names.

  • pgm_params (dict (optional)) – Any additional parameters that need to be passed to daft.PGM initializer. Should be of the form: {param_name: param_value}

  • edge_params (dict (optional)) – Any additional edge parameters that need to be passed to daft.add_edge method. Should be of the form: {(u1, v1): {param_name: param_value}, (u2, v2): {…} }

  • node_params (dict (optional)) – Any additional node parameters that need to be passed to daft.add_node method. Should be of the form: {node1: {param_name: param_value}, node2: {…} }

Returns:

Daft object – Daft object for plotting the DAG.

Return type:

daft.PGM object

Examples

>>> from pgmpy.base import DAG
>>> dag = DAG([('a', 'b'), ('b', 'c'), ('d', 'c')])
>>> dag.to_daft(node_pos={'a': (0, 0), 'b': (1, 0), 'c': (2, 0), 'd': (1, 1)})
<daft.PGM at 0x7fc756e936d0>
>>> dag.to_daft(node_pos="circular")
<daft.PGM at 0x7f9bb48c5eb0>
>>> dag.to_daft(node_pos="circular", pgm_params={'observed_style': 'inner'})
<daft.PGM at 0x7f9bb48b0bb0>
>>> dag.to_daft(node_pos="circular",
...             edge_params={('a', 'b'): {'label': 2}},
...             node_params={'a': {'shape': 'rectangle'}})
<daft.PGM at 0x7f9bb48b0bb0>
to_graphviz()[source]

Retuns a pygraphviz object for the DAG. pygraphviz is useful for visualizing the network structure.

Examples

>>> from pgmpy.utils import get_example_model
>>> model = get_example_model('alarm')
>>> model.to_graphviz()
<AGraph <Swig Object of type 'Agraph_t *' at 0x7fdea4cde040>>
>>> model.draw('model.png', prog='neato')
class pgmpy.base.DAG.PDAG(directed_ebunch=[], undirected_ebunch=[], latents=[])[source]

Class for representing PDAGs (also known as CPDAG). PDAGs are the equivalence classes of DAGs and contain both directed and undirected edges.

Note: In this class, undirected edges are represented using two edges in both direction i.e. an undirected edge between X - Y is represented using X -> Y and X <- Y.

copy()[source]

Returns a copy of the object instance.

Returns:

Copy of PDAG – Returns a copy of self.

Return type:

pgmpy.dag.PDAG

to_dag(required_edges=[])[source]

Returns one possible DAG which is represented using the PDAG.

Parameters:

required_edges (list, array-like of 2-tuples) – The list of edges that should be included in the DAG.

Return type:

Returns an instance of DAG.

Examples

to_graphviz()[source]

Retuns a pygraphviz object for the DAG. pygraphviz is useful for visualizing the network structure.

Examples

>>> from pgmpy.utils import get_example_model
>>> model = get_example_model('alarm')
>>> model.to_graphviz()
<AGraph <Swig Object of type 'Agraph_t *' at 0x7fdea4cde040>>

Partially Directed Acyclic Graph (PDAG or CPDAG)

Class for representing PDAGs (also known as CPDAG). PDAGs are the equivalence classes of DAGs and contain both directed and undirected edges.

Note: In this class, undirected edges are represented using two edges in both direction i.e. an undirected edge between X - Y is represented using X -> Y and X <- Y.