Directed Acyclic Graph (DAG)¶
- class pgmpy.base.DAG(ebunch=None, latents={})[source]¶
Base class for all Directed Graphical Models.
Each node in the graph can represent either a random variable, Factor, or a cluster of random variables. Edges in the graph represent the dependencies between these.
- Parameters:
data (input graph) – Data to initialize graph. If data=None (default) an empty graph is created. The data can be an edge list or any Networkx graph object.
Examples
Create an empty DAG with no nodes and no edges
>>> from pgmpy.base import DAG >>> G = DAG()
G can be grown in several ways:
Nodes:
Add one node at a time:
>>> G.add_node(node='a')
Add the nodes from any container (a list, set or tuple or the nodes from another graph).
>>> G.add_nodes_from(nodes=['a', 'b'])
Edges:
G can also be grown by adding edges.
Add one edge,
>>> G.add_edge(u='a', v='b')
a list of edges,
>>> G.add_edges_from(ebunch=[('a', 'b'), ('b', 'c')])
If some edges connect nodes not yet in the model, the nodes are added automatically. There are no errors when adding nodes or edges that already exist.
Shortcuts:
Many common graph features allow python syntax for speed reporting.
>>> 'a' in G # check if node in graph True >>> len(G) # number of nodes in graph 3
- active_trail_nodes(variables, observed=None, include_latents=False)[source]¶
Returns a dictionary with the given variables as keys and all the nodes reachable from that respective variable as values.
- Parameters:
variables (str or array like) – variables whose active trails are to be found.
observed (List of nodes (optional)) – If given the active trails would be computed assuming these nodes to be observed.
include_latents (boolean (default: False)) – Whether to include the latent variables in the returned active trail nodes.
Examples
>>> from pgmpy.base import DAG >>> student = DAG() >>> student.add_nodes_from(['diff', 'intel', 'grades']) >>> student.add_edges_from([('diff', 'grades'), ('intel', 'grades')]) >>> student.active_trail_nodes('diff') {'diff': {'diff', 'grades'}} >>> student.active_trail_nodes(['diff', 'intel'], observed='grades') {'diff': {'diff', 'intel'}, 'intel': {'diff', 'intel'}}
References
Details of the algorithm can be found in ‘Probabilistic Graphical Model Principles and Techniques’ - Koller and Friedman Page 75 Algorithm 3.1
- add_edge(u, v, weight=None)[source]¶
Add an edge between u and v.
The nodes u and v will be automatically added if they are not already in the graph.
- Parameters:
Examples
>>> from pgmpy.base import DAG >>> G = DAG() >>> G.add_nodes_from(nodes=['Alice', 'Bob', 'Charles']) >>> G.add_edge(u='Alice', v='Bob') >>> G.nodes() NodeView(('Alice', 'Bob', 'Charles')) >>> G.edges() OutEdgeView([('Alice', 'Bob')])
When the node is not already present in the graph: >>> G.add_edge(u=’Alice’, v=’Ankur’) >>> G.nodes() NodeView((‘Alice’, ‘Ankur’, ‘Bob’, ‘Charles’)) >>> G.edges() OutEdgeView([(‘Alice’, ‘Bob’), (‘Alice’, ‘Ankur’)])
Adding edges with weight: >>> G.add_edge(‘Ankur’, ‘Maria’, weight=0.1) >>> G.edge[‘Ankur’][‘Maria’] {‘weight’: 0.1}
- add_edges_from(ebunch, weights=None)[source]¶
Add all the edges in ebunch.
If nodes referred in the ebunch are not already present, they will be automatically added. Node names can be any hashable python object.
**The behavior of adding weights is different than networkx.
- Parameters:
Examples
>>> from pgmpy.base import DAG >>> G = DAG() >>> G.add_nodes_from(nodes=['Alice', 'Bob', 'Charles']) >>> G.add_edges_from(ebunch=[('Alice', 'Bob'), ('Bob', 'Charles')]) >>> G.nodes() NodeView(('Alice', 'Bob', 'Charles')) >>> G.edges() OutEdgeView([('Alice', 'Bob'), ('Bob', 'Charles')])
When the node is not already in the model: >>> G.add_edges_from(ebunch=[(‘Alice’, ‘Ankur’)]) >>> G.nodes() NodeView((‘Alice’, ‘Bob’, ‘Charles’, ‘Ankur’)) >>> G.edges() OutEdgeView([(‘Alice’, ‘Bob’), (‘Bob’, ‘Charles’), (‘Alice’, ‘Ankur’)])
Adding edges with weights: >>> G.add_edges_from([(‘Ankur’, ‘Maria’), (‘Maria’, ‘Mason’)], … weights=[0.3, 0.5]) >>> G.edge[‘Ankur’][‘Maria’] {‘weight’: 0.3} >>> G.edge[‘Maria’][‘Mason’] {‘weight’: 0.5}
- add_node(node, weight=None, latent=False)[source]¶
Adds a single node to the Graph.
- Parameters:
Examples
>>> from pgmpy.base import DAG >>> G = DAG() >>> G.add_node(node='A') >>> sorted(G.nodes()) ['A']
Adding a node with some weight. >>> G.add_node(node=’B’, weight=0.3)
The weight of these nodes can be accessed as: >>> G.nodes[‘B’] {‘weight’: 0.3} >>> G.nodes[‘A’] {‘weight’: None}
- add_nodes_from(nodes, weights=None, latent=False)[source]¶
Add multiple nodes to the Graph.
**The behviour of adding weights is different than in networkx.
- Parameters:
nodes (iterable container) – A container of nodes (list, dict, set, or any hashable python object).
weights (list, tuple (default=None)) – A container of weights (int, float). The weight value at index i is associated with the variable at index i.
latent (list, tuple (default=False)) – A container of boolean. The value at index i tells whether the node at index i is latent or not.
Examples
>>> from pgmpy.base import DAG >>> G = DAG() >>> G.add_nodes_from(nodes=['A', 'B', 'C']) >>> G.nodes() NodeView(('A', 'B', 'C'))
Adding nodes with weights: >>> G.add_nodes_from(nodes=[‘D’, ‘E’], weights=[0.3, 0.6]) >>> G.nodes[‘D’] {‘weight’: 0.3} >>> G.nodes[‘E’] {‘weight’: 0.6} >>> G.nodes[‘A’] {‘weight’: None}
- do(nodes, inplace=False)[source]¶
Applies the do operator to the graph and returns a new DAG with the transformed graph.
The do-operator, do(X = x) has the effect of removing all edges from the parents of X and setting X to the given value x.
- Parameters:
nodes (list, array-like) – The names of the nodes to apply the do-operator for.
inplace (boolean (default: False)) – If inplace=True, makes the changes to the current object, otherwise returns a new instance.
- Returns:
Modified DAG – A new instance of DAG modified by the do-operator
- Return type:
Examples
Initialize a DAG >>> graph = DAG() >>> graph.add_edges_from([(‘X’, ‘A’), … (‘A’, ‘Y’), … (‘A’, ‘B’)]) >>> # Applying the do-operator will return a new DAG with the desired structure. >>> graph_do_A = graph.do(‘A’) >>> # Which we can verify is missing the edges we would expect. >>> graph_do_A.edges OutEdgeView([(‘A’, ‘B’), (‘A’, ‘Y’)])
References
Causality: Models, Reasoning, and Inference, Judea Pearl (2000). p.70.
- get_ancestral_graph(nodes)[source]¶
Returns the ancestral graph of the given nodes. The ancestral graph only contains the nodes which are ancestors of atleast one of the variables in node.
- Parameters:
node (iterable) – List of nodes whose ancestral graph needs to be computed.
- Returns:
Ancestral Graph
- Return type:
Examples
>>> from pgmpy.base import DAG >>> dag = DAG([('A', 'C'), ('B', 'C'), ('D', 'A'), ('D', 'B')]) >>> anc_dag = dag.get_ancestral_graph(nodes=['A', 'B']) >>> anc_dag.edges() OutEdgeView([('D', 'A'), ('D', 'B')])
- get_children(node)[source]¶
Returns a list of children of node. Throws an error if the node is not present in the graph.
- Parameters:
node (string, int or any hashable python object.) – The node whose children would be returned.
Examples
>>> from pgmpy.base import DAG >>> g = DAG(ebunch=[('A', 'B'), ('C', 'B'), ('B', 'D'), ('B', 'E'), ('B', 'F'), ('E', 'G')]) >>> g.get_children(node='B') ['D', 'E', 'F']
- get_immoralities()[source]¶
Finds all the immoralities in the model A v-structure X -> Z <- Y is an immorality if there is no direct edge between X and Y .
- Returns:
Immoralities – A set of all the immoralities in the model
- Return type:
Examples
>>> from pgmpy.base import DAG >>> student = DAG() >>> student.add_edges_from([('diff', 'grade'), ('intel', 'grade'), ... ('intel', 'SAT'), ('grade', 'letter')]) >>> student.get_immoralities() {('diff', 'intel')}
- get_independencies(latex=False, include_latents=False)[source]¶
Computes independencies in the DAG, by checking d-seperation.
- Parameters:
latex (boolean) – If latex=True then latex string of the independence assertion would be created.
include_latents (boolean) – If True, includes latent variables in the independencies. Otherwise, only generates independencies on observed variables.
Examples
>>> from pgmpy.base import DAG >>> chain = DAG([('X', 'Y'), ('Y', 'Z')]) >>> chain.get_independencies() (X ⟂ Z | Y) (Z ⟂ X | Y)
- get_leaves()[source]¶
Returns a list of leaves of the graph.
Examples
>>> from pgmpy.base import DAG >>> graph = DAG([('A', 'B'), ('B', 'C'), ('B', 'D')]) >>> graph.get_leaves() ['C', 'D']
- get_markov_blanket(node)[source]¶
Returns a markov blanket for a random variable. In the case of Bayesian Networks, the markov blanket is the set of node’s parents, its children and its children’s other parents.
- Returns:
Markov Blanket – List of nodes in the markov blanket of node.
- Return type:
- Parameters:
node (string, int or any hashable python object.) – The node whose markov blanket would be returned.
Examples
>>> from pgmpy.base import DAG >>> from pgmpy.factors.discrete import TabularCPD >>> G = DAG([('x', 'y'), ('z', 'y'), ('y', 'w'), ('y', 'v'), ('u', 'w'), ('s', 'v'), ('w', 't'), ('w', 'm'), ('v', 'n'), ('v', 'q')]) >>> G.get_markov_blanket('y') ['s', 'w', 'x', 'u', 'z', 'v']
- get_parents(node)[source]¶
Returns a list of parents of node.
Throws an error if the node is not present in the graph.
- Parameters:
node (string, int or any hashable python object.) – The node whose parents would be returned.
Examples
>>> from pgmpy.base import DAG >>> G = DAG(ebunch=[('diff', 'grade'), ('intel', 'grade')]) >>> G.get_parents(node='grade') ['diff', 'intel']
- static get_random(n_nodes=5, edge_prob=0.5, latents=False)[source]¶
Returns a randomly generated DAG with n_nodes number of nodes with edge probability being edge_prob.
- Parameters:
- Returns:
Random DAG – The randomly generated DAG.
- Return type:
Examples
>>> from pgmpy.base import DAG >>> random_dag = DAG.get_random(n_nodes=10, edge_prob=0.3) >>> random_dag.nodes() NodeView((0, 1, 2, 3, 4, 5, 6, 7, 8, 9)) >>> random_dag.edges() OutEdgeView([(0, 6), (1, 6), (1, 7), (7, 9), (2, 5), (2, 7), (2, 8), (5, 9), (3, 7)])
- get_roots()[source]¶
Returns a list of roots of the graph.
Examples
>>> from pgmpy.base import DAG >>> graph = DAG([('A', 'B'), ('B', 'C'), ('B', 'D'), ('E', 'B')]) >>> graph.get_roots() ['A', 'E']
- is_dconnected(start, end, observed=None)[source]¶
Returns True if there is an active trail (i.e. d-connection) between start and end node given that observed is observed.
- Parameters:
start (int, str, any hashable python object.) – The nodes in the DAG between which to check the d-connection/active trail.
end (int, str, any hashable python object.) – The nodes in the DAG between which to check the d-connection/active trail.
observed (list, array-like (optional)) – If given the active trail would be computed assuming these nodes to be observed.
Examples
>>> from pgmpy.base import DAG >>> student = DAG() >>> student.add_nodes_from(['diff', 'intel', 'grades', 'letter', 'sat']) >>> student.add_edges_from([('diff', 'grades'), ('intel', 'grades'), ('grades', 'letter'), ... ('intel', 'sat')]) >>> student.is_dconnected('diff', 'intel') False >>> student.is_dconnected('grades', 'sat') True
- is_iequivalent(model)[source]¶
Checks whether the given model is I-equivalent
Two graphs G1 and G2 are said to be I-equivalent if they have same skeleton and have same set of immoralities.
- Parameters:
model (A DAG object, for which you want to check I-equivalence) –
- Returns:
I-equivalence – True if both are I-equivalent, False otherwise
- Return type:
boolean
Examples
>>> from pgmpy.base import DAG >>> G = DAG() >>> G.add_edges_from([('V', 'W'), ('W', 'X'), ... ('X', 'Y'), ('Z', 'Y')]) >>> G1 = DAG() >>> G1.add_edges_from([('W', 'V'), ('X', 'W'), ... ('X', 'Y'), ('Z', 'Y')]) >>> G.is_iequivalent(G1) True
- local_independencies(variables)[source]¶
Returns an instance of Independencies containing the local independencies of each of the variables.
- Parameters:
variables (str or array like) – variables whose local independencies are to be found.
Examples
>>> from pgmpy.base import DAG >>> student = DAG() >>> student.add_edges_from([('diff', 'grade'), ('intel', 'grade'), >>> ('grade', 'letter'), ('intel', 'SAT')]) >>> ind = student.local_independencies('grade') >>> ind (grade ⟂ SAT | diff, intel)
- minimal_dseparator(start, end)[source]¶
Finds the minimal d-separating set for start and end.
- Parameters:
start (node) – The first node.
end (node) – The second node.
Examples
>>> dag = DAG([('A', 'B'), ('B', 'C')]) >>> dag.minimal_dseparator(start='A', end='C') {'B'}
References
[1] Algorithm 4, Page 10: Tian, Jin, Azaria Paz, and Judea Pearl. Finding minimal d-separators. Computer Science Department, University of California, 1998.
- moralize()[source]¶
Removes all the immoralities in the DAG and creates a moral graph (UndirectedGraph).
A v-structure X->Z<-Y is an immorality if there is no directed edge between X and Y.
Examples
>>> from pgmpy.base import DAG >>> G = DAG(ebunch=[('diff', 'grade'), ('intel', 'grade')]) >>> moral_graph = G.moralize() >>> moral_graph.edges() EdgeView([('intel', 'grade'), ('intel', 'diff'), ('grade', 'diff')])
- to_daft(node_pos='circular', latex=True, pgm_params={}, edge_params={}, node_params={})[source]¶
Returns a daft (https://docs.daft-pgm.org/en/latest/) object which can be rendered for publication quality plots. The returned object’s render method can be called to see the plots.
- Parameters:
node_pos (str or dict (default: circular)) –
- If str: Must be one of the following: circular, kamada_kawai, planar, random, shell, sprint,
spectral, spiral. Please refer: https://networkx.org/documentation/stable//reference/drawing.html#module-networkx.drawing.layout for details on these layouts.
If dict should be of the form {node: (x coordinate, y coordinate)} describing the x and y coordinate of each node.
If no argument is provided uses circular layout.
latex (boolean) – Whether to use latex for rendering the node names.
pgm_params (dict (optional)) – Any additional parameters that need to be passed to daft.PGM initializer. Should be of the form: {param_name: param_value}
edge_params (dict (optional)) – Any additional edge parameters that need to be passed to daft.add_edge method. Should be of the form: {(u1, v1): {param_name: param_value}, (u2, v2): {…} }
node_params (dict (optional)) – Any additional node parameters that need to be passed to daft.add_node method. Should be of the form: {node1: {param_name: param_value}, node2: {…} }
- Returns:
Daft object – Daft object for plotting the DAG.
- Return type:
daft.PGM object
Examples
>>> from pgmpy.base import DAG >>> dag = DAG([('a', 'b'), ('b', 'c'), ('d', 'c')]) >>> dag.to_daft(node_pos={'a': (0, 0), 'b': (1, 0), 'c': (2, 0), 'd': (1, 1)}) <daft.PGM at 0x7fc756e936d0> >>> dag.to_daft(node_pos="circular") <daft.PGM at 0x7f9bb48c5eb0> >>> dag.to_daft(node_pos="circular", pgm_params={'observed_style': 'inner'}) <daft.PGM at 0x7f9bb48b0bb0> >>> dag.to_daft(node_pos="circular", ... edge_params={('a', 'b'): {'label': 2}}, ... node_params={'a': {'shape': 'rectangle'}}) <daft.PGM at 0x7f9bb48b0bb0>
Partial Directed Acyclic Graph (PDAG)¶
- class pgmpy.base.PDAG(directed_ebunch=[], undirected_ebunch=[], latents=[])[source]¶
Class for representing PDAGs (also known as CPDAG). PDAGs are the equivance classes of DAGs and contain both directed and undirected edges.
Note: In this class, undirected edges are represented using two edges in both direction i.e. an undirected edge between X - Y is represented using X -> Y and X <- Y.