SEMGraph

class pgmpy.models.SEMGraph(ebunch=[], latents=[], err_corr=[], err_var={})

Bases: object

Base class for the graphical representation of Structural Equation Models (SEMs).

By default, every variable is assumed to have an associated latent error variable, so error terms do not need to be specified.

Parameters:

ebunch: list/array-like

List of edges given as tuples. Each tuple takes one of two forms:

(u, v): This would add an edge from u to v without setting any parameter
for the edge.
(u, v, parameter): This would add an edge from u to v and set the edge’s
parameter to parameter.

latents: list/array-like

List of nodes which are latent. All other variables are considered observed.

err_corr: list/array-like

List of tuples representing edges between error terms. It can be of the following forms:

(u, v): Adds a correlation between the error terms of u and v without setting any
variance or covariance values.
(u, v, covar): Adds a correlation between the error terms of u and v and sets the
parameter to covar.

err_var: dict (variable: variance)

Sets variance for the error terms in the model.
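
The two tuple forms accepted by ebunch can be handled uniformly by padding the shorter form. The following is a minimal sketch of that idea, not pgmpy's actual implementation: the helper name normalize_edges is hypothetical, and it assumes an unset parameter is represented by NaN, in line with the np.nan convention mentioned in the examples on this page.

```python
import math


def normalize_edges(ebunch):
    """Pad (u, v) tuples to (u, v, parameter), using NaN for an unset parameter.

    Hypothetical helper for illustration only; it is not part of pgmpy.
    """
    edges = []
    for edge in ebunch:
        if len(edge) == 2:
            u, v = edge
            edges.append((u, v, math.nan))
        elif len(edge) == 3:
            edges.append(tuple(edge))
        else:
            raise ValueError(f"Expected (u, v) or (u, v, parameter), got {edge!r}")
    return edges


edges = normalize_edges([("age", "deferenc"), ("intelligence", "academic", 0.8)])
```

The same padding idea would apply to the (u, v) and (u, v, covar) forms of err_corr.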

Attributes:

latents: list: List of all the latent variables in the model except the error terms.
observed: list: List of all the observed variables in the model.
graph: nx.DiGraph: The graphical structure of the latent and observed variables, excluding the error terms. The parameters are stored in the weight attribute of each edge.
err_graph: nx.Graph: An undirected graph representing the relations between the error terms of the model. Each node has the same name as its variable but represents that variable’s error term. The variance is stored in the weight attribute of the node, and the covariance is stored in the weight attribute of the edge.
full_graph_struct: nx.DiGraph: Represents the full graph structure. The names of error terms start with “.”, and a new node whose name starts with “..” is added for each correlation.
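
To make the naming scheme of full_graph_struct concrete, here is a rough sketch in plain Python that expands a list of edges and error correlations into a single directed edge list. It is an illustration under assumptions, not the library's code: the function name build_full_structure is hypothetical, and the exact name given to a correlation node (the two variable names sorted and joined after “..”) is a guess.

```python
def build_full_structure(edges, err_corr):
    """Return directed edges of a graph that makes error terms explicit.

    Assumptions (not confirmed by the documentation): the error node of a
    variable v is named "." + v, and the node for a correlation between the
    error terms of u and v is named ".." + the two sorted names joined.
    """
    variables = sorted({node for edge in edges for node in edge})
    full = list(edges)
    # Every variable gets its own error term pointing into it.
    full += [("." + v, v) for v in variables]
    # Each error correlation becomes a common parent of the two error nodes.
    for u, v in err_corr:
        corr_node = ".." + "".join(sorted([u, v]))
        full += [(corr_node, "." + u), (corr_node, "." + v)]
    return full


full_edges = build_full_structure(
    edges=[("age", "deferenc"), ("yrsmill", "unionsen")],
    err_corr=[("yrsmill", "age")],
)
```

Representing a correlation as a shared parent of two error nodes lets a single directed graph carry the error structure, which is useful for d-connection queries.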

References

[1] McDonald, A. J., & Clelland, D. A. (1984). Textile Workers and Union Sentiment. Social Forces, 63(2), 502–521.
[2] https://en.wikipedia.org/wiki/Structural_equation_modeling#/media/File:Example_Structural_equation_model.svg

Examples

Defining a model (Union sentiment model [1]) without setting any parameters:

>>> from pgmpy.models import SEMGraph
>>> sem = SEMGraph(
...     ebunch=[
...         ("deferenc", "unionsen"),
...         ("laboract", "unionsen"),
...         ("yrsmill", "unionsen"),
...         ("age", "deferenc"),
...         ("age", "laboract"),
...         ("deferenc", "laboract"),
...     ],
...     latents=[],
...     err_corr=[("yrsmill", "age")],
...     err_var={},
... )

Defining a model (Education [2]) with all the parameters set. To leave a parameter unset, np.nan can be passed explicitly.

>>> sem_edu = SEMGraph(
...     ebunch=[
...         ("intelligence", "academic", 0.8),
...         ("intelligence", "scale_1", 0.7),
...         ("intelligence", "scale_2", 0.64),
...         ("intelligence", "scale_3", 0.73),
...         ("intelligence", "scale_4", 0.82),
...         ("academic", "SAT_score", 0.98),
...         ("academic", "High_school_gpa", 0.75),
...         ("academic", "ACT_score", 0.87),
...     ],
...     latents=["intelligence", "academic"],
...     err_corr=[],
...     err_var={"intelligence": 1},
... )

active_trail_nodes(variables, observed=[], avoid_nodes=[], struct='full')

Finds all the observed variables that are d-connected to variables in the graph selected by struct, given that the nodes in observed are observed.

Parameters:

variables: str or array-like: Observed variables whose d-connected variables are to be found.
observed: list/array-like: If given, the active trails are computed assuming these nodes are observed.
avoid_nodes: list/array-like: If specified, the algorithm ignores paths in which influence flows through any of these nodes.
struct: str or nx.DiGraph instance: If “full”, considers correlations between error terms when computing d-connection. If “non_error”, doesn’t consider error correlations when computing d-connection. If an instance of nx.DiGraph, finds d-connected variables on the given graph.

Returns:

dict: {str: list}: Returns a dict with variables as the key and a list of d-connected variables as the value.

References

Details of the algorithm can be found in ‘Probabilistic Graphical Models: Principles and Techniques’ by Koller and Friedman, page 75, Algorithm 3.1.

Examples

>>> from pgmpy.models import SEMGraph
>>> model = SEMGraph(
...     ebunch=[
...         ("yrsmill", "unionsen"),
...         ("age", "laboract"),
...         ("age", "deferenc"),
...         ("deferenc", "laboract"),
...         ("deferenc", "unionsen"),
...         ("laboract", "unionsen"),
...     ],
...     latents=[],
...     err_corr=[("yrsmill", "age")],
... )
>>> model.active_trail_nodes("age")
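
For readers who want to see how d-connection is computed, the following is a plain-Python sketch of the Reachable procedure from the reference cited above. It is not pgmpy's implementation: the function name reachable is hypothetical, it works on a bare edge list, and it ignores error correlations and avoid_nodes entirely.

```python
def reachable(edges, source, observed=()):
    """Return the nodes d-connected to source given observed (source included).

    Sketch of Koller and Friedman's Algorithm 3.1 on a list of directed edges.
    """
    observed = set(observed)
    parents, children = {}, {}
    for u, v in edges:
        children.setdefault(u, set()).add(v)
        parents.setdefault(v, set()).add(u)

    # Phase 1: collect the observed nodes and all of their ancestors.
    ancestors, stack = set(), list(observed)
    while stack:
        node = stack.pop()
        if node not in ancestors:
            ancestors.add(node)
            stack.extend(parents.get(node, ()))

    # Phase 2: walk trails, tracking the direction of arrival at each node.
    to_visit, visited, result = [(source, "up")], set(), set()
    while to_visit:
        node, direction = to_visit.pop()
        if (node, direction) in visited:
            continue
        visited.add((node, direction))
        if node not in observed:
            result.add(node)
        if direction == "up" and node not in observed:
            # Arrived from a child: the trail may continue in either direction.
            to_visit += [(p, "up") for p in parents.get(node, ())]
            to_visit += [(c, "down") for c in children.get(node, ())]
        elif direction == "down":
            if node not in observed:
                # Causal chain: keep flowing to the children.
                to_visit += [(c, "down") for c in children.get(node, ())]
            if node in ancestors:
                # Collider that is observed or has an observed descendant.
                to_visit += [(p, "up") for p in parents.get(node, ())]
    return result


union_edges = [
    ("yrsmill", "unionsen"),
    ("age", "laboract"),
    ("age", "deferenc"),
    ("deferenc", "laboract"),
    ("deferenc", "unionsen"),
    ("laboract", "unionsen"),
]
no_evidence = reachable(union_edges, "age")
with_evidence = reachable(union_edges, "age", observed={"unionsen"})
```

With nothing observed, yrsmill is not reached from age because unionsen is an unobserved collider; observing unionsen opens that trail.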

get_scaling_indicators()

Returns a scaling indicator for each of the latent variables in the model. The scaling indicator is chosen randomly among the observed measurement variables of the latent variable.

Returns:

dict: Returns a dict with latent variables as the keys and their scaling indicators as the values.

Examples

>>> from pgmpy.models import SEMGraph
>>> model = SEMGraph(
...     ebunch=[
...         ("xi1", "eta1"),
...         ("xi1", "x1"),
...         ("xi1", "x2"),
...         ("eta1", "y1"),
...         ("eta1", "y2"),
...     ],
...     latents=["xi1", "eta1"],
... )
>>> model.get_scaling_indicators()
{'xi1': 'x1', 'eta1': 'y1'}
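
The selection rule described above (a random pick among each latent variable's observed measurement variables) can be sketched without pgmpy as follows. The function name pick_scaling_indicators and its seed argument are hypothetical, and the sketch assumes that a latent's measurement variables are its observed children in the edge list.

```python
import random


def pick_scaling_indicators(edges, latents, seed=None):
    """Choose one observed child per latent variable at random.

    Illustrative only; latents without observed children are skipped.
    """
    rng = random.Random(seed)
    latents = set(latents)
    indicators = {}
    for latent in sorted(latents):
        # Observed children of the latent are treated as its measurement variables.
        measured = sorted(v for u, v in edges if u == latent and v not in latents)
        if measured:
            indicators[latent] = rng.choice(measured)
    return indicators


indicators = pick_scaling_indicators(
    edges=[
        ("xi1", "eta1"),
        ("xi1", "x1"),
        ("xi1", "x2"),
        ("eta1", "y1"),
        ("eta1", "y2"),
    ],
    latents=["xi1", "eta1"],
    seed=0,
)
```

Note that eta1 is never chosen as an indicator for xi1, since it is itself latent.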

moralize(graph='full')

TODO: This needs to go to a parent class. Removes all the immoralities in the DirectedGraph and creates a moral graph (UndirectedGraph).

A v-structure X->Z<-Y is an immorality if there is no directed edge between X and Y.

Parameters:

graph:
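
As an illustration of what moralization does, here is a small sketch on a bare edge list: it connects every pair of parents that share a child and then drops edge directions. This is the textbook construction written in plain Python, not the method's actual code, and the standalone function below is hypothetical.

```python
from itertools import combinations


def moralize(edges):
    """Return the undirected edges of the moral graph of a directed graph."""
    parents = {}
    for u, v in edges:
        parents.setdefault(v, set()).add(u)
    # Drop directions: store each edge as an unordered pair.
    moral = {frozenset(edge) for edge in edges}
    # "Marry" every pair of parents that share a child.
    for shared in parents.values():
        moral |= {frozenset(pair) for pair in combinations(sorted(shared), 2)}
    return moral


moral_edges = moralize([("X", "Z"), ("Y", "Z")])
```

For the v-structure X->Z<-Y, the result contains the two original edges plus the added X-Y edge that removes the immorality.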

to_lisrel()

Converts the model from a graphical representation to an equivalent algebraic representation. The result is a Reticular Action Model (RAM) representation, which is implemented by the pgmpy.models.SEMAlg class.

Returns:

SEMAlg instance: Instance of SEMAlg representing the model.

See also

to_standard_lisrel: Converts to the standard LISREL format and returns the parameters.

Examples

>>> from pgmpy.models import SEM
>>> sem = SEM.from_graph(
...     ebunch=[
...         ("deferenc", "unionsen"),
...         ("laboract", "unionsen"),
...         ("yrsmill", "unionsen"),
...         ("age", "deferenc"),
...         ("age", "laboract"),
...         ("deferenc", "laboract"),
...     ],
...     latents=[],
...     err_corr=[("yrsmill", "age")],
...     err_var={},
... )
>>> sem.to_lisrel()
# TODO: Complete this.

to_standard_lisrel()

Transforms the model to the standard LISREL representation of latent and measurement equations. The standard LISREL representation is given as:

\[
\begin{aligned}
\mathbf{\eta} &= \mathbf{B \eta} + \mathbf{\Gamma \xi} + \mathbf{\zeta} \\
\mathbf{y} &= \mathbf{\wedge_y \eta} + \mathbf{\epsilon} \\
\mathbf{x} &= \mathbf{\wedge_x \xi} + \mathbf{\delta} \\
\mathbf{\Theta_e} &= COV(\mathbf{\epsilon}) \\
\mathbf{\Theta_\delta} &= COV(\mathbf{\delta}) \\
\mathbf{\Psi} &= COV(\mathbf{\eta}) \\
\mathbf{\Phi} &= COV(\mathbf{\xi})
\end{aligned}
\]

Since the standard LISREL representation has restrictions on the types of model, this method adds extra latent variables with fixed loadings of 1 to make the model consistent with the restrictions.

Returns:

var_names: dict (keys: eta, xi, y, x): Returns the variable names in \(\mathbf{\eta}\), \(\mathbf{\xi}\), \(\mathbf{y}\), \(\mathbf{x}\).
params: dict (keys: B, gamma, wedge_y, wedge_x, theta_e, theta_del, phi, psi): Returns a boolean matrix for each of the parameters. A 1 in the matrix indicates that there is an edge in the model; a 0 indicates that there is no edge.
fixed_values: dict (keys: B, gamma, wedge_y, wedge_x, theta_e, theta_del, phi, psi): Returns a matrix for each of the parameters. Each entry holds the value set for that parameter in the model, or 0 if no value is set.
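
To show how the params and fixed_values matrices relate to edges, the sketch below builds both for the B matrix only, using nested lists. It follows the description above literally and is not pgmpy's code: the function name is hypothetical, and the row/column orientation (row i, column j for the edge from eta[j] to eta[i]) is an assumption read off the equation \(\mathbf{\eta} = \mathbf{B \eta} + \dots\); the library's actual layout may differ.

```python
def latent_coefficient_matrices(eta, edges):
    """Build the boolean mask and fixed-value matrices for B as nested lists.

    Assumed orientation: row i, column j refers to the edge eta[j] -> eta[i],
    so that row i holds the coefficients in the equation for eta[i].
    """
    index = {name: i for i, name in enumerate(eta)}
    size = len(eta)
    mask = [[0] * size for _ in range(size)]
    fixed = [[0.0] * size for _ in range(size)]
    for u, v, *weight in edges:
        # Only edges between two eta variables belong in B.
        if u in index and v in index:
            mask[index[v]][index[u]] = 1
            if weight:
                fixed[index[v]][index[u]] = weight[0]
    return mask, fixed


mask, fixed = latent_coefficient_matrices(
    eta=["eta1", "eta2"],
    edges=[("eta1", "eta2", 0.5), ("eta1", "y1")],
)
```

The edge to y1 is ignored here because it would belong to the wedge_y matrix rather than B.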