Directed Acyclic Graph (DAG)
- class pgmpy.base.DAG(ebunch: Iterable[tuple[Hashable, Hashable]] | None = None, latents: set[Hashable] = {}, exposures: set[Hashable] = {}, outcomes: set[Hashable] = {}, roles=None)[source]¶
Directed graphical model: a graph with vertex roles.
Each node in the graph can represent a random variable, a factor, or a cluster of random variables. Edges in the graph represent the dependencies between these. Abstract roles, such as exposure, outcome, or adjustment set, can be assigned to nodes in the graph. These roles are used, or created, by algorithms that operate on the graph, such as causal inference, causal discovery, and causal prediction.
- Parameters:
ebunch (input graph, optional) – Data to initialize graph. If None (default) an empty graph is created. The data can be any format that is supported by the to_networkx_graph() function, currently including edge list, dict of dicts, dict of lists, NetworkX graph, 2D NumPy array, SciPy sparse matrix, or PyGraphviz graph.
latents (set of nodes, default=set()) – A set of latent variables in the graph. These are not observed variables but are used to represent unobserved confounding or other latent structures.
exposures (set, default=set()) – Set of exposure variables in the graph. These are the variables that represent the treatment or intervention being studied in a causal analysis. Default is an empty set.
outcomes (set, default=set()) – Set of outcome variables in the graph. These are the variables that represent the response or dependent variables being studied in a causal analysis. Default is an empty set.
roles (dict, optional (default: None)) – A dictionary mapping roles to node names. The keys are roles, and the values are node names (strings or iterables of str). If provided, this will automatically assign roles to the nodes in the graph. Passing a dictionary via roles is equivalent to calling with_role(role, variables) for each key-value pair in the dictionary.
Examples
Create an empty DAG with no nodes and no edges
>>> from pgmpy.base import DAG
>>> G = DAG()
Edges and vertices can be passed to the constructor as an edge list.
>>> G = DAG(ebunch=[("a", "b"), ("b", "c")])
G can also be grown incrementally, in several ways:
Nodes:
Add one node at a time:
>>> G.add_node(node="a")
Add the nodes from any container (a list, set or tuple or the nodes from another graph).
>>> G.add_nodes_from(nodes=["a", "b"])
Edges:
G can also be grown by adding edges.
Add one edge,
>>> G.add_edge(u="a", v="b")
a list of edges,
>>> G.add_edges_from(ebunch=[("a", "b"), ("b", "c")])
If some edges connect nodes not yet in the model, the nodes are added automatically. There are no errors when adding nodes or edges that already exist.
Shortcuts:
Many common graph features allow Python syntax to speed up reporting.
>>> "a" in G  # check if node in graph
True
>>> len(G)  # number of nodes in graph
3
Roles can be assigned to nodes in the graph at construction or using methods.
At construction:
>>> G = DAG(
...     ebunch=[("U", "X"), ("X", "M"), ("M", "Y"), ("U", "Y")],
...     roles={"exposure": "X", "outcome": "Y"},
... )
Roles can also be assigned after creation using the with_role method.
>>> G = G.with_role("adjustment", {"U", "M"})
Vertices of a specific role can be retrieved using the get_role method.
>>> G.get_role("exposure")
['X']
>>> G.get_role("adjustment")
['U', 'M']
- Latents:
Latent variables can be managed using the latents parameter at initialization or by assigning the “latents” role to nodes. The latents parameter is a convenient shortcut for roles={‘latents’: …}.
Create a graph with initial latent variables ‘U’ and ‘V’, and exposure ‘X’:
>>> from pgmpy.base import DAG
>>> G = DAG(
...     ebunch=[("U", "X"), ("X", "M"), ("M", "Y"), ("U", "Y"), ("V", "M")],
...     latents={"U", "V"},
...     exposures={"X"},
... )
>>> sorted(G.latents)
['U', 'V']
>>> G.exposures
{'X'}
Add a new latent variable ‘Z’ using the role system:
>>> G.add_node("Z")
>>> G.with_role(role="latents", variables="Z", inplace=True)
>>> sorted(G.latents)
['U', 'V', 'Z']
You can also check for latents using the get_role method:
>>> sorted(G.get_role(role="latents"))
['U', 'V', 'Z']
Remove a latent variable from the role:
>>> G.without_role(role="latents", variables="V", inplace=True)
>>> sorted(G.latents)
['U', 'Z']
- active_trail_nodes(variables: list[Hashable] | Hashable, observed: Hashable | list[Hashable] | tuple[Hashable, Hashable] | None = None, include_latents=False) dict[Hashable, set[Hashable]][source]¶
Returns a dictionary with the given variables as keys and all the nodes reachable from that respective variable as values.
- Parameters:
variables (str or array like) – variables whose active trails are to be found.
observed (List of nodes (optional)) – If given the active trails would be computed assuming these nodes to be observed.
include_latents (boolean (default: False)) – Whether to include the latent variables in the returned active trail nodes.
Examples
>>> from pgmpy.base import DAG
>>> student = DAG()
>>> student.add_nodes_from(["diff", "intel", "grades"])
>>> student.add_edges_from([("diff", "grades"), ("intel", "grades")])
>>> student.active_trail_nodes("diff")
{'diff': {'diff', 'grades'}}
>>> student.active_trail_nodes(["diff", "intel"], observed="grades")
{'diff': {'diff', 'intel'}, 'intel': {'diff', 'intel'}}
References
Details of the algorithm can be found in ‘Probabilistic Graphical Models: Principles and Techniques’ - Koller and Friedman, Page 75, Algorithm 3.1.
- add_edge(u: Hashable, v: Hashable, weight: int | float | None = None)[source]¶
Add an edge between u and v.
The nodes u and v will be automatically added if they are not already in the graph.
- Parameters:
u (nodes) – Nodes can be any hashable Python object.
v (nodes) – Nodes can be any hashable Python object.
weight (int, float (default=None)) – The weight of the edge
Examples
>>> from pgmpy.base import DAG >>> G = DAG() >>> G.add_nodes_from(nodes=["Alice", "Bob", "Charles"]) >>> G.add_edge(u="Alice", v="Bob") >>> G.nodes() NodeView(('Alice', 'Bob', 'Charles')) >>> G.edges() OutEdgeView([('Alice', 'Bob')])
When the node is not already present in the graph:
>>> G.add_edge(u="Alice", v="Ankur") >>> G.nodes() NodeView(('Alice', 'Ankur', 'Bob', 'Charles')) >>> G.edges() OutEdgeView([('Alice', 'Bob'), ('Alice', 'Ankur')])
Adding edges with weight:
>>> G.add_edge("Ankur", "Maria", weight=0.1)
>>> G.edges["Ankur", "Maria"]
{'weight': 0.1}
- add_edges_from(ebunch: Iterable[tuple[Hashable, Hashable]], weights: list[float] | tuple[float] | None = None)[source]¶
Add all the edges in ebunch.
If nodes referred to in the ebunch are not already present, they will be automatically added. Node names can be any hashable Python object.
Note: the behavior of adding weights is different from networkx.
- Parameters:
ebunch (container of edges) – Each edge given in the container will be added to the graph. The edges must be given as 2-tuples (u, v).
weights (list, tuple (default=None)) – A container of weights (int, float). The weight value at index i is associated with the edge at index i.
Examples
>>> from pgmpy.base import DAG >>> G = DAG() >>> G.add_nodes_from(nodes=["Alice", "Bob", "Charles"]) >>> G.add_edges_from(ebunch=[("Alice", "Bob"), ("Bob", "Charles")]) >>> G.nodes() NodeView(('Alice', 'Bob', 'Charles')) >>> G.edges() OutEdgeView([('Alice', 'Bob'), ('Bob', 'Charles')])
When the node is not already in the model:
>>> G.add_edges_from(ebunch=[("Alice", "Ankur")]) >>> G.nodes() NodeView(('Alice', 'Bob', 'Charles', 'Ankur')) >>> G.edges() OutEdgeView([('Alice', 'Bob'), ('Bob', 'Charles'), ('Alice', 'Ankur')])
Adding edges with weights:
>>> G.add_edges_from(
...     [("Ankur", "Maria"), ("Maria", "Mason")], weights=[0.3, 0.5]
... )
>>> G.edges["Ankur", "Maria"]
{'weight': 0.3}
>>> G.edges["Maria", "Mason"]
{'weight': 0.5}
or
>>> G.add_edges_from([("Ankur", "Maria", 0.3), ("Maria", "Mason", 0.5)])
- do(nodes: Hashable | Iterable[Hashable] | tuple[Hashable, Hashable], inplace=False)[source]¶
Applies the do operator to the graph and returns a new DAG with the transformed graph.
The do-operator, do(X = x), has the effect of removing all edges from the parents of X into X and setting X to the given value x.
- Parameters:
nodes (list, array-like) – The names of the nodes to apply the do-operator for.
inplace (boolean (default: False)) – If inplace=True, makes the changes to the current object, otherwise returns a new instance.
- Returns:
Modified DAG – A new instance of DAG modified by the do-operator
- Return type:
DAG
Examples
Initialize a DAG
>>> from pgmpy.base import DAG
>>> graph = DAG()
>>> graph.add_edges_from([("X", "A"), ("A", "Y"), ("A", "B")])
>>> # Applying the do-operator will return a new DAG with the desired structure.
>>> graph_do_A = graph.do("A")
>>> # Which we can verify is missing the edges we would expect.
>>> graph_do_A.edges
OutEdgeView([('A', 'B'), ('A', 'Y')])
References
Causality: Models, Reasoning, and Inference, Judea Pearl (2000). p.70.
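The graph surgery performed by the do-operator can be illustrated without pgmpy. The following is a minimal sketch on a plain edge list, not pgmpy's implementation; the helper name do_operator and the list-of-tuples representation are assumptions made for this example.

```python
def do_operator(edges, nodes):
    """Return the edges that remain after intervening on `nodes`.

    Every edge pointing into an intervened node is dropped. Outgoing
    edges of the intervened nodes, and all unrelated edges, are kept.
    """
    intervened = set(nodes)
    return [(u, v) for (u, v) in edges if v not in intervened]


edges = [("X", "A"), ("A", "Y"), ("A", "B")]
print(do_operator(edges, ["A"]))  # [('A', 'Y'), ('A', 'B')]
```

Intervening on A removes only the edge X -> A, which matches the edges shown in the example above (up to ordering).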
- edge_strength(data, edges=None)[source]¶
Computes the strength of each edge in edges. The strength is bounded between 0 and 1, with 1 signifying strong effect.
The edge strength is defined as the effect size measure of a Conditional Independence test using the parents as the conditional set. The strength quantifies the effect of edge[0] on edge[1] after controlling for any other influence paths. We use a residualization-based CI test[1] to compute the strengths.
Interpretation:
- The strength is the Pillai’s Trace effect size of partial correlation.
- Measures the strength of the linear relationship between the residuals.
- Works for any mixture of categorical and continuous variables.
- The value is bounded between 0 and 1:
  - Strength close to 1 → strong dependence.
  - Strength close to 0 → conditional independence.
- Parameters:
data (pandas.DataFrame) – Dataset to compute edge strengths on.
edges (tuple, list, or None (default: None)) –
None: Compute for all DAG edges.
Tuple (X, Y): Compute for edge X → Y.
List of tuples: Compute for selected edges.
- Returns:
Dictionary mapping edges to their strength values.
- Return type:
dict
Examples
>>> from pgmpy.base import DAG
>>> from pgmpy.factors.continuous import LinearGaussianCPD
>>> from pgmpy.models import LinearGaussianBayesianNetwork as LGBN
>>> # Create a linear Gaussian Bayesian network
>>> linear_model = LGBN([("X", "Y"), ("Z", "Y")])
>>> # Create CPDs with specific beta values
>>> x_cpd = LinearGaussianCPD(variable="X", beta=[0], std=1)
>>> y_cpd = LinearGaussianCPD(
...     variable="Y", beta=[0, 0.4, 0.6], std=1, evidence=["X", "Z"]
... )
>>> z_cpd = LinearGaussianCPD(variable="Z", beta=[0], std=1)
>>> # Add CPDs to the model
>>> linear_model.add_cpds(x_cpd, y_cpd, z_cpd)
>>> # Simulate data from the model
>>> data = linear_model.simulate(n_samples=int(1e4))
>>> # Create DAG and compute edge strengths
>>> dag = DAG([("X", "Y"), ("Z", "Y")])
>>> strengths = dag.edge_strength(data)
>>> strengths
{('X', 'Y'): np.float64(0.14587166611282304), ('Z', 'Y'): np.float64(0.25683780900125613)}
References
[1] Ankan, Ankur, and Johannes Textor. “A simple unified approach to testing high-dimensional conditional independences for categorical and ordinal data.” Proceedings of the AAAI Conference on Artificial Intelligence.
- classmethod from_dagitty(string=None, filename=None) DAG[source]¶
Initializes a DAG instance using DAGitty syntax.
Creates a DAG from the dagitty string. If parameter beta is specified in the DAGitty string, the method returns a LinearGaussianBayesianNetwork instead of a plain DAG.
- Parameters:
string (str (default: None)) – A DAGitty-style multiline set of regression equations representing the model. Refer to https://www.dagitty.net/manual-3.x.pdf#page=3.58 and https://github.com/jtextor/dagitty/blob/7a657776dc8f5e5ba4e323edb028e2c2aaf29327/gui/js/dagitty.js#L3417
filename (str (default: None)) – The filename of the file containing the model in DAGitty syntax.
Examples
>>> from pgmpy.base import DAG
>>> dag = DAG.from_dagitty(
...     "dag{'carry matches' [latent] cancer [outcome] smoking -> 'carry matches' [beta=0.2] "
...     "smoking -> cancer [beta=0.5] 'carry matches' -> cancer }"
... )
Creating a Linear Gaussian Bayesian network from dagitty:
>>> from pgmpy.base import DAG
>>> from pgmpy.models import LinearGaussianBayesianNetwork as LGBN
>>> # Specifying beta creates a LinearGaussianBayesianNetwork instance
>>> dag = DAG.from_dagitty("dag{X -> Y [beta=0.3] Y -> Z [beta=0.1]}")
>>> data = dag.simulate(n_samples=int(1e4))
- classmethod from_lavaan(string: str | None = None, filename: str | PathLike | None = None) DAG[source]¶
Initializes a DAG instance using lavaan syntax.
- Parameters:
string (str (default: None)) – A lavaan-style multiline set of regression equations representing the model. Refer to http://lavaan.ugent.be/tutorial/syntax1.html for details.
filename (str (default: None)) – The filename of the file containing the model in lavaan syntax.
Examples
- get_ancestors(nodes: str | tuple[Hashable, Hashable] | Iterable[Hashable]) set[Hashable][source]¶
Returns a set containing all ancestors of the observed nodes, including the nodes themselves.
- Parameters:
nodes (string, list-type) – name of all the observed nodes
Examples
>>> from pgmpy.base import DAG >>> model = DAG([("D", "G"), ("I", "G"), ("G", "L"), ("I", "L")]) >>> model.get_ancestors("G") {'D', 'G', 'I'} >>> model.get_ancestors(["G", "I"]) {'D', 'G', 'I'}
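For readers who want to see how such an ancestor set can be computed, here is a minimal plain-Python sketch. It is not pgmpy's implementation; the function name ancestors and the edge-list input are assumptions made for illustration.

```python
def ancestors(edges, nodes):
    """Return the given nodes together with all of their ancestors."""
    if isinstance(nodes, str):
        nodes = [nodes]  # treat a single name as a one-element list

    # Map every node to the set of its direct parents.
    parents = {}
    for u, v in edges:
        parents.setdefault(v, set()).add(u)

    # Walk the edges backwards with an explicit stack.
    result = set()
    stack = list(nodes)
    while stack:
        node = stack.pop()
        if node not in result:
            result.add(node)
            stack.extend(parents.get(node, ()))
    return result


edges = [("D", "G"), ("I", "G"), ("G", "L"), ("I", "L")]
print(sorted(ancestors(edges, "G")))  # ['D', 'G', 'I']
```

The traversal visits each node at most once, so the cost is linear in the number of nodes and edges reachable backwards from the query.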
- get_ancestral_graph(nodes: Iterable[Hashable])[source]¶
Returns the ancestral graph of the given nodes. The ancestral graph only contains the nodes which are ancestors of at least one of the variables in nodes.
- Parameters:
nodes (iterable) – List of nodes whose ancestral graph needs to be computed.
- Returns:
Ancestral Graph
- Return type:
Examples
>>> from pgmpy.base import DAG >>> dag = DAG([("A", "C"), ("B", "C"), ("D", "A"), ("D", "B")]) >>> anc_dag = dag.get_ancestral_graph(nodes=["A", "B"]) >>> anc_dag.edges() OutEdgeView([('D', 'A'), ('D', 'B')])
- get_children(node: Hashable)[source]¶
Returns a list of children of node. Throws an error if the node is not present in the graph.
- Parameters:
node (string, int or any hashable python object.) – The node whose children would be returned.
Examples
>>> from pgmpy.base import DAG >>> g = DAG( ... ebunch=[ ... ("A", "B"), ... ("C", "B"), ... ("B", "D"), ... ("B", "E"), ... ("B", "F"), ... ("E", "G"), ... ] ... ) >>> g.get_children(node="B") ['D', 'E', 'F']
- get_immoralities() dict[Hashable, list[tuple[Hashable, Hashable]]][source]¶
Finds all the immoralities in the model. A v-structure X -> Z <- Y is an immorality if there is no direct edge between X and Y.
- Returns:
Immoralities – A set of all the immoralities in the model
- Return type:
set
Examples
>>> from pgmpy.base import DAG >>> student = DAG() >>> student.add_edges_from( ... [ ... ("diff", "grade"), ... ("intel", "grade"), ... ("intel", "SAT"), ... ("grade", "letter"), ... ] ... ) >>> student.get_immoralities() {('diff', 'intel')}
- get_independencies(latex=False, include_latents=False) Independencies | list[str][source]¶
Computes independencies in the DAG by checking minimal d-separation.
- Parameters:
latex (boolean) – If latex=True then latex string of the independence assertion would be created.
include_latents (boolean) – If True, includes latent variables in the independencies. Otherwise, only generates independencies on observed variables.
Examples
>>> from pgmpy.base import DAG >>> chain = DAG([("X", "Y"), ("Y", "Z")]) >>> chain.get_independencies() (X ⟂ Z | Y)
- get_leaves()[source]¶
Returns a list of leaves of the graph.
Examples
>>> from pgmpy.base import DAG >>> graph = DAG([("A", "B"), ("B", "C"), ("B", "D")]) >>> graph.get_leaves() ['C', 'D']
- get_markov_blanket(node: Hashable) list[Hashable][source]¶
Returns a markov blanket for a random variable. In the case of Bayesian Networks, the markov blanket is the set of node’s parents, its children and its children’s other parents.
- Returns:
Markov Blanket – List of nodes in the markov blanket of node.
- Return type:
list
- Parameters:
node (string, int or any hashable python object.) – The node whose markov blanket would be returned.
Examples
>>> from pgmpy.base import DAG >>> from pgmpy.factors.discrete import TabularCPD >>> G = DAG( ... [ ... ("x", "y"), ... ("z", "y"), ... ("y", "w"), ... ("y", "v"), ... ("u", "w"), ... ("s", "v"), ... ("w", "t"), ... ("w", "m"), ... ("v", "n"), ... ("v", "q"), ... ] ... ) >>> G.get_markov_blanket("y") ['s', 'w', 'x', 'u', 'z', 'v']
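The definition above (parents, children, and the children's other parents) translates directly into set operations. The sketch below is a plain-Python illustration under the assumption that the graph is given as an edge list; markov_blanket is a hypothetical helper, not the pgmpy method.

```python
def markov_blanket(edges, node):
    """Return the Markov blanket of `node` as a set."""
    parents = {u for u, v in edges if v == node}
    children = {v for u, v in edges if u == node}
    # Parents of the node's children (this includes `node` itself).
    co_parents = {u for u, v in edges if v in children}
    return (parents | children | co_parents) - {node}


edges = [
    ("x", "y"), ("z", "y"), ("y", "w"), ("y", "v"), ("u", "w"),
    ("s", "v"), ("w", "t"), ("w", "m"), ("v", "n"), ("v", "q"),
]
print(sorted(markov_blanket(edges, "y")))  # ['s', 'u', 'v', 'w', 'x', 'z']
```

The result contains the same six nodes as the example above; only the ordering differs, because this sketch returns a set.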
- get_parents(node: Hashable)[source]¶
Returns a list of parents of node.
Throws an error if the node is not present in the graph.
- Parameters:
node (string, int or any hashable python object.) – The node whose parents would be returned.
Examples
>>> from pgmpy.base import DAG >>> G = DAG(ebunch=[("diff", "grade"), ("intel", "grade")]) >>> G.get_parents(node="grade") ['diff', 'intel']
- static get_random(n_nodes=5, edge_prob=0.5, node_names: list[Hashable] | None = None, latents=False, seed: int | None = None) DAG[source]¶
Returns a randomly generated DAG with n_nodes number of nodes with edge probability being edge_prob.
- Parameters:
n_nodes (int) – The number of nodes in the randomly generated DAG.
edge_prob (float) – The probability of edge between any two nodes in the topologically sorted DAG.
node_names (list (default: None)) – A list of variables names to use in the random graph. If None, the node names are integer values starting from 0.
latents (bool (default: False)) – If True, includes latent variables in the generated DAG.
seed (int (default: None)) – The seed for the random number generator.
- Returns:
Random DAG – The randomly generated DAG.
- Return type:
DAG
Examples
>>> from pgmpy.base import DAG >>> random_dag = DAG.get_random(n_nodes=10, edge_prob=0.3) >>> random_dag.nodes() NodeView((0, 1, 2, 3, 4, 5, 6, 7, 8, 9)) >>> random_dag.edges() OutEdgeView([(0, 6), (1, 6), (1, 7), (7, 9), (2, 5), (2, 7), (2, 8), (5, 9), (3, 7)])
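One common way to generate a random DAG with a given edge probability is to fix a node ordering and only allow edges that point forward in that ordering, which rules out cycles by construction. The sketch below illustrates that idea with the standard library; whether pgmpy uses exactly this scheme is an assumption, and random_dag is a hypothetical helper.

```python
import random


def random_dag(n_nodes=5, edge_prob=0.5, seed=None):
    """Return (order, edges) for a random DAG on nodes 0..n_nodes-1.

    `order` is a topological order of the nodes. Each forward pair in
    that order becomes an edge independently with probability `edge_prob`.
    """
    rng = random.Random(seed)
    order = list(range(n_nodes))
    rng.shuffle(order)
    edges = [
        (order[i], order[j])
        for i in range(n_nodes)
        for j in range(i + 1, n_nodes)
        if rng.random() < edge_prob
    ]
    return order, edges


order, edges = random_dag(n_nodes=10, edge_prob=0.3, seed=42)
position = {node: i for i, node in enumerate(order)}
# Every edge points forward in `order`, so the graph has no cycles.
print(all(position[u] < position[v] for u, v in edges))  # True
```

Passing a seed makes the result reproducible, which is usually what you want in tests.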
- get_roots()[source]¶
Returns a list of roots of the graph.
Examples
>>> from pgmpy.base import DAG >>> graph = DAG([("A", "B"), ("B", "C"), ("B", "D"), ("E", "B")]) >>> graph.get_roots() ['A', 'E']
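Roots are the nodes without parents and leaves are the nodes without children. A minimal sketch of both computations on a plain edge list follows; roots_and_leaves is a hypothetical helper written for this illustration, and the sorted output assumes the node names are mutually comparable.

```python
def roots_and_leaves(edges, nodes=()):
    """Return (roots, leaves) as sorted lists.

    `nodes` can list extra nodes that appear in no edge; such isolated
    nodes count as both a root and a leaf.
    """
    all_nodes = set(nodes) | {n for edge in edges for n in edge}
    has_parent = {v for _, v in edges}
    has_child = {u for u, _ in edges}
    return sorted(all_nodes - has_parent), sorted(all_nodes - has_child)


edges = [("A", "B"), ("B", "C"), ("B", "D"), ("E", "B")]
print(roots_and_leaves(edges))  # (['A', 'E'], ['C', 'D'])
```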
- is_dconnected(start: Hashable, end: Hashable, observed: Sequence[Hashable] | None = None, include_latents=False)[source]¶
Returns True if there is an active trail (i.e. d-connection) between start and end node given that observed is observed.
- Parameters:
start (int, str, any hashable python object.) – The nodes in the DAG between which to check the d-connection/active trail.
end (int, str, any hashable python object.) – The nodes in the DAG between which to check the d-connection/active trail.
observed (list, array-like (optional)) – If given the active trail would be computed assuming these nodes to be observed.
include_latents (boolean (default: False)) – If True, latent variables are returned as part of the active trail.
Examples
>>> from pgmpy.base import DAG >>> student = DAG() >>> student.add_nodes_from(["diff", "intel", "grades", "letter", "sat"]) >>> student.add_edges_from( ... [ ... ("diff", "grades"), ... ("intel", "grades"), ... ("grades", "letter"), ... ("intel", "sat"), ... ] ... ) >>> student.is_dconnected("diff", "intel") False >>> student.is_dconnected("grades", "sat") True
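To make the notion of d-connection concrete, the following plain-Python sketch tests it with the moralized ancestral graph criterion: two nodes are d-separated given an observed set exactly when the observed nodes separate them in the moral graph of the smallest ancestral set containing all three. This is a different technique from the reachability algorithm referenced for active_trail_nodes, and it is not pgmpy's implementation; the function name and edge-list input are assumptions.

```python
from itertools import combinations


def is_dconnected(edges, start, end, observed=()):
    """Return True if `start` and `end` are d-connected given `observed`."""
    observed = set(observed)
    parents = {}
    for u, v in edges:
        parents.setdefault(v, set()).add(u)

    # Step 1: collect start, end, observed and all of their ancestors.
    relevant, stack = set(), [start, end, *observed]
    while stack:
        node = stack.pop()
        if node not in relevant:
            relevant.add(node)
            stack.extend(parents.get(node, ()))

    # Step 2: moralize the induced subgraph (marry co-parents, drop directions).
    neighbours = {node: set() for node in relevant}
    for child in relevant:
        pars = parents.get(child, set())
        for p in pars:
            neighbours[p].add(child)
            neighbours[child].add(p)
        for x, y in combinations(pars, 2):
            neighbours[x].add(y)
            neighbours[y].add(x)

    # Step 3: search for an undirected path that avoids observed nodes.
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        if node == end:
            return True
        if node in seen or node in observed:
            continue
        seen.add(node)
        stack.extend(neighbours[node] - observed)
    return False


student = [("diff", "grades"), ("intel", "grades"),
           ("grades", "letter"), ("intel", "sat")]
print(is_dconnected(student, "diff", "intel"))                       # False
print(is_dconnected(student, "grades", "sat"))                       # True
print(is_dconnected(student, "diff", "intel", observed=["grades"]))  # True
```

The last call shows the collider effect: diff and intel are independent marginally but become dependent once their common child grades is observed.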
- is_iequivalent(model: DAG)[source]¶
Checks whether the given model is I-equivalent to this DAG.
Two graphs G1 and G2 are said to be I-equivalent if they have the same skeleton and the same set of immoralities.
- Parameters:
model (A DAG object, for which you want to check I-equivalence)
- Returns:
I-equivalence – True if both are I-equivalent, False otherwise
- Return type:
boolean
Examples
>>> from pgmpy.base import DAG >>> G = DAG() >>> G.add_edges_from([("V", "W"), ("W", "X"), ("X", "Y"), ("Z", "Y")]) >>> G1 = DAG() >>> G1.add_edges_from([("W", "V"), ("X", "W"), ("X", "Y"), ("Z", "Y")]) >>> G.is_iequivalent(G1) True
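The skeleton-plus-immoralities criterion can be checked directly on edge lists. The sketch below is an illustration in plain Python, not pgmpy's implementation; the helper names are hypothetical, and it assumes both graphs are defined over the same node set with no isolated nodes.

```python
from itertools import combinations


def skeleton(edges):
    """Undirected version of the graph: a set of unordered node pairs."""
    return {frozenset(edge) for edge in edges}


def immoralities(edges):
    """All (unordered parent pair, child) with non-adjacent parents."""
    skel = skeleton(edges)
    parents = {}
    for u, v in edges:
        parents.setdefault(v, set()).add(u)
    return {
        (frozenset((x, y)), child)
        for child, pars in parents.items()
        for x, y in combinations(pars, 2)
        if frozenset((x, y)) not in skel
    }


def i_equivalent(edges_a, edges_b):
    """True if both DAGs share the same skeleton and immoralities."""
    return (skeleton(edges_a) == skeleton(edges_b)
            and immoralities(edges_a) == immoralities(edges_b))


G = [("V", "W"), ("W", "X"), ("X", "Y"), ("Z", "Y")]
G1 = [("W", "V"), ("X", "W"), ("X", "Y"), ("Z", "Y")]
print(i_equivalent(G, G1))  # True
```

Reversing W -> X alone in G would create a new immorality at W (parents V and X are not adjacent), so that variant is not I-equivalent to G even though the skeleton is unchanged.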
- local_independencies(variables: list[Hashable] | tuple[Hashable, ...] | str)[source]¶
Returns an instance of Independencies containing the local independencies of each of the variables.
- Parameters:
variables (str or array like) – variables whose local independencies are to be found.
Examples
>>> from pgmpy.base import DAG >>> student = DAG() >>> student.add_edges_from( ... [ ... ("diff", "grade"), ... ("intel", "grade"), ... ("grade", "letter"), ... ("intel", "SAT"), ... ] ... ) >>> ind = student.local_independencies("grade") >>> ind (grade ⟂ SAT | diff, intel)
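The local independencies of a DAG follow from the local Markov property: each variable is independent of its non-descendants given its parents. The following plain-Python sketch computes that statement for one node; it is illustrative only, and local_independency is a hypothetical helper rather than the pgmpy method.

```python
def local_independency(edges, node):
    """Return (independent_of, given) for the local Markov property.

    `independent_of` holds the non-descendants of `node` that are not
    its parents; `given` holds the parents of `node`.
    """
    nodes = {n for edge in edges for n in edge}
    children = {}
    for u, v in edges:
        children.setdefault(u, set()).add(v)

    # Collect all descendants of `node` by following edges forwards.
    descendants, stack = set(), [node]
    while stack:
        for child in children.get(stack.pop(), ()):
            if child not in descendants:
                descendants.add(child)
                stack.append(child)

    parents = {u for u, v in edges if v == node}
    independent_of = nodes - descendants - parents - {node}
    return independent_of, parents


student = [("diff", "grade"), ("intel", "grade"),
           ("grade", "letter"), ("intel", "SAT")]
print(local_independency(student, "grade"))
```

For "grade" this yields SAT as the only independent variable, conditioned on diff and intel, which agrees with the statement (grade ⟂ SAT | diff, intel) in the example above.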
- minimal_dseparator(start: Hashable, end: Hashable, include_latents=False) set[Hashable][source]¶
Finds the minimal d-separating set for start and end.
- Parameters:
start (node) – The first node.
end (node) – The second node.
include_latents (boolean (default: False)) – If True, latent variables are considered for the minimal d-separator.
Examples
>>> from pgmpy.base import DAG
>>> dag = DAG([("A", "B"), ("B", "C")])
>>> dag.minimal_dseparator(start="A", end="C")
{'B'}
References
- [1] Algorithm 4, Page 10: Tian, Jin, Azaria Paz, and Judea Pearl. Finding minimal d-separators. Computer Science Department, University of California, 1998.
- moralize()[source]¶
Removes all the immoralities in the DAG and creates a moral graph (UndirectedGraph).
A v-structure X->Z<-Y is an immorality if there is no directed edge between X and Y.
Examples
>>> from pgmpy.base import DAG >>> G = DAG(ebunch=[("diff", "grade"), ("intel", "grade")]) >>> moral_graph = G.moralize() >>> moral_graph.edges() EdgeView([('intel', 'grade'), ('intel', 'diff'), ('grade', 'diff')])
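Moralization has two steps: connect ("marry") every pair of parents that share a child, then drop edge directions. A minimal sketch on a plain edge list is shown below; it is not pgmpy's implementation, and representing undirected edges as frozensets is an assumption made for the example.

```python
from itertools import combinations


def moralize(edges):
    """Return the moral graph as a set of unordered node pairs."""
    parents = {}
    for u, v in edges:
        parents.setdefault(v, set()).add(u)

    # Start from the skeleton, then marry all co-parents.
    undirected = {frozenset(edge) for edge in edges}
    for pars in parents.values():
        for x, y in combinations(pars, 2):
            undirected.add(frozenset((x, y)))
    return undirected


moral = moralize([("diff", "grade"), ("intel", "grade")])
print(sorted(sorted(edge) for edge in moral))
# [['diff', 'grade'], ['diff', 'intel'], ['grade', 'intel']]
```

The extra edge between diff and intel is the one added by moralization, as in the example above.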
- to_daft(node_pos: str | dict[Hashable, tuple[int, int]] = 'circular', latex=True, pgm_params={}, edge_params={}, node_params={}, plot_edge_strength=False)[source]¶
Returns a daft (https://docs.daft-pgm.org/en/latest/) object which can be rendered for publication quality plots. The returned object’s render method can be called to see the plots.
- Parameters:
node_pos (str or dict (default: circular)) –
If str: must be one of the following: circular, kamada_kawai, planar, random, shell, spring, spectral, spiral. Please refer to https://networkx.org/documentation/stable//reference/drawing.html#module-networkx.drawing.layout for details on these layouts.
If dict: should be of the form {node: (x coordinate, y coordinate)} describing the x and y coordinates of each node.
If no argument is provided, the circular layout is used.
latex (boolean) – Whether to use latex for rendering the node names.
pgm_params (dict (optional)) – Any additional parameters that need to be passed to daft.PGM initializer. Should be of the form: {param_name: param_value}
edge_params (dict (optional)) – Any additional edge parameters that need to be passed to daft.add_edge method. Should be of the form: {(u1, v1): {param_name: param_value}, (u2, v2): {…} }
node_params (dict (optional)) – Any additional node parameters that need to be passed to daft.add_node method. Should be of the form: {node1: {param_name: param_value}, node2: {…} }
plot_edge_strength (bool (default: False)) – If True, displays edge strength values as labels on edges. Requires edge strengths to be computed first using the edge_strength() method.
- Returns:
Daft object – Daft object for plotting the DAG.
- Return type:
daft.PGM object
Examples
>>> from pgmpy.base import DAG >>> dag = DAG([("a", "b"), ("b", "c"), ("d", "c")]) >>> dag.to_daft(node_pos={"a": (0, 0), "b": (1, 0), "c": (2, 0), "d": (1, 1)}) <daft.PGM at 0x7fc756e936d0> >>> dag.to_daft(node_pos="circular") <daft.PGM at 0x7f9bb48c5eb0> >>> dag.to_daft(node_pos="circular", pgm_params={"observed_style": "inner"}) <daft.PGM at 0x7f9bb48b0bb0> >>> dag.to_daft( ... node_pos="circular", ... edge_params={("a", "b"): {"label": 2}}, ... node_params={"a": {"shape": "rectangle"}}, ... ) <daft.PGM at 0x7f9bb48b0bb0>
- to_dagitty() str[source]¶
Convert the DAG to dagitty syntax representation.
The dagitty syntax represents directed acyclic graphs using the dag { statements } format with -> for directed edges. Isolated nodes (nodes with no edges) are included as standalone nodes.
- Returns:
String representation of the DAG in dagitty syntax format.
- Return type:
str
Examples
>>> from pgmpy.base import DAG >>> dag = DAG([("X", "Y"), ("Z", "Y")]) >>> print(dag.to_dagitty()) dag { X -> Y Z -> Y }
>>> dag2 = DAG([("A", "B"), ("B", "C")]) >>> print(dag2.to_dagitty()) dag { A -> B B -> C }
>>> # DAG with isolated node >>> dag3 = DAG() >>> dag3.add_nodes_from(["A", "B"]) >>> dag3.add_edge("A", "B") >>> dag3.add_node("C") # Isolated node >>> print(dag3.to_dagitty()) dag { A -> B C }
Notes
Node names are converted to string representations using str().
If node names contain spaces or special characters, they will be used as-is.
Users should ensure node names are valid in R/dagitty context if needed.
References
dagitty syntax: https://cran.r-project.org/web/packages/dagitty/dagitty.pdf
- to_graphviz(plot_edge_strength=False)[source]¶
Returns a pygraphviz object for the DAG. pygraphviz is useful for visualizing the network structure.
- Parameters:
plot_edge_strength (bool (default: False)) – If True, displays edge strength values as labels on edges. Requires edge strengths to be computed first using the edge_strength() method.
- Returns:
AGraph object – pygraphviz object for plotting the DAG.
- Return type:
pygraphviz.AGraph
Examples
>>> from pgmpy.utils import get_example_model
>>> model = get_example_model("alarm")
>>> graph = model.to_graphviz()
>>> graph
<AGraph <Swig Object of type 'Agraph_t *' at 0x7fdea4cde040>>
>>> graph.draw("model.png", prog="neato")
- to_lavaan() str[source]¶
Convert the DAG to lavaan syntax representation.
The lavaan syntax represents structural equations where each line shows a dependent variable regressed on its parents using the ~ operator. Nodes with no parents do not get an equation of their own, so isolated nodes (nodes with no edges) are not included in the output.
- Returns:
String representation of the DAG in lavaan syntax format. Each line represents a regression equation where the dependent variable is regressed on its parents.
- Return type:
str
Examples
>>> from pgmpy.base import DAG
>>> dag = DAG([("X", "Y"), ("Z", "Y")])
>>> print(dag.to_lavaan())
Y ~ X + Z
>>> dag2 = DAG([("A", "B"), ("B", "C")])
>>> print(dag2.to_lavaan())
B ~ A
C ~ B
>>> # Empty DAG returns empty string
>>> empty_dag = DAG()
>>> empty_dag.to_lavaan()
''
Notes
Node names are converted to string representations using str().
If node names contain spaces or special characters, they will be used as-is.
Users should ensure node names are valid in R/lavaan context if needed.
References
lavaan syntax: http://lavaan.ugent.be/tutorial/syntax1.html
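The mapping from a DAG to lavaan regression lines can be sketched in a few lines of plain Python. This is an illustration of the format described above, not pgmpy's implementation; the function to_lavaan_sketch is hypothetical, and sorting the children and parents alphabetically is an assumption made to get deterministic output.

```python
def to_lavaan_sketch(edges):
    """Render an edge list as lavaan regression equations.

    Each node with at least one parent gets one line of the form
    "child ~ parent1 + parent2". Nodes without parents get no line.
    """
    parents = {}
    for u, v in edges:
        parents.setdefault(str(v), []).append(str(u))
    return "\n".join(
        f"{child} ~ {' + '.join(sorted(pars))}"
        for child, pars in sorted(parents.items())
    )


print(to_lavaan_sketch([("X", "Y"), ("Z", "Y")]))
# Y ~ X + Z
print(to_lavaan_sketch([("A", "B"), ("B", "C")]))
# B ~ A
# C ~ B
```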
- to_pdag()[source]¶
Returns the CPDAG (Completed Partially Directed Acyclic Graph) of the DAG, representing the equivalence class that the given DAG belongs to.
- Returns:
CPDAG – An instance of pgmpy.base.PDAG representing the CPDAG of the given DAG.
- Return type:
pgmpy.base.PDAG
Examples
>>> from pgmpy.base import DAG >>> dag = DAG([("A", "B"), ("B", "C"), ("C", "D")]) >>> pdag = dag.to_pdag() >>> pdag.directed_edges {('A', 'B'), ('B', 'C'), ('C', 'D')}
References
- [1] Chickering, David Maxwell. “Learning equivalence classes of Bayesian-network structures.” Journal of Machine Learning Research 2.Feb (2002): 445-498. Figure 4 and 5.
- validate(data, metrics: tuple[str | Callable] | None = None, significance_level=0.05, **kwargs)[source]¶
Returns a table of the compiled results of the tests run on the DAG using the data provided. The tests are available in pgmpy.metrics.metrics. This method includes support for:
Correlation Score
Log Likelihood Score
AIC Score
BIC Score
Fisher-C p-value
RMSEA based on the Fisher-C statistic
Measure of failing vs. total CIs based on DAG and data fit
- Parameters:
data (pandas.DataFrame) – Dataset to be used to run the scoring methods/tests
metrics (tuple (Callable or strings)) –
A list of the metrics that are to be run on the model and data. A comma separated set of either functions defined in pgmpy.metrics.metrics or strings referencing those metrics can be passed.
Following are the supported strings and respective function that can be passed as elements of the tuple:
"correlation": correlation_score,
"log-likelihood": log_likelihood_score,
"aic": structure_score,
"bic": structure_score,
"implied-cis": implied_cis,
"fisher-c": fisher_c
For instance, ("correlation", log_likelihood_score) is a tuple that can be passed in metrics. This is an example of the (string, Callable) type.
If no value is passed, all available metrics in pgmpy.metrics.metrics will be run.
significance_level (float (default: 0.05)) – A hyperparameter to conditional independence test based metrics. A p-value greater than significance_level indicates that the conditional independence holds.
**kwargs – Any additional hyperparameter that needs to be passed to the metrics. Please refer to the documentation of pgmpy.metrics.metrics for details on which arguments are supported.
- Returns:
results – A dataframe containing a summary of the tests run on the model using the data provided.
- Return type:
pandas.DataFrame
Examples
>>> from pgmpy.base import DAG
>>> from pgmpy.utils import get_example_model
>>> from pgmpy.metrics import fisher_c
>>> # Simulate data from the cancer model to test against.
>>> cancer = get_example_model("cancer")
>>> df_cancer = cancer.simulate(n_samples=1000)
>>> # Create a new DAG object, and run all the tests
>>> cancer_dag = DAG(cancer.edges())
>>> cancer_dag.validate(df_cancer)
                               RESULT
Correlation                      0.25
Log-likelihood           -2078.649707
AIC                      -2085.926617
BIC                      -2110.465393
Failing CIs / Total CIs         0 / 6
Fisher-C p-value             0.846715
>>> # Run selected tests
>>> cancer_dag.validate(df_cancer, metrics=("correlation", fisher_c))
                    RESULT
Correlation           0.25
Fisher-C p-value  0.846715