CorrelationScore

class pgmpy.metrics.CorrelationScore(ci_test=None, score=<function f1_score>, significance_level=0.05, return_summary=False)

Bases: _BaseUnsupervisedMetric

Score to compute how well the model structure represents the correlations in the data. The model doesn’t need to be parameterized for this score.

A Bayesian Network or DAG has a d-connection property, which determines which variables are correlated according to the model. This score uses that d-connection/d-separation property to compare the model with the variable correlations in a given dataset. For every pair of variables in the dataset, a correlation test (specified by the ci_test argument) is run, and the two variables are treated as correlated if the test’s p-value < significance_level. The same pair is then checked for d-connection in the network structure. Finally, the metric specified by score is computed, using the correlation test results as the true values and the d-connections as the predicted values.
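The d-connection check described above can be sketched without pgmpy. This is a minimal illustration, not the library's implementation: it assumes the pairwise check conditions on no other variables, in which case two nodes are d-connected exactly when they share an ancestor (counting each node as its own ancestor). The toy DAG and the helper names `ancestors_of` and `d_connected_marginally` are hypothetical.

```python
from itertools import combinations


def ancestors_of(node, parents):
    """Return the node together with all of its ancestors.

    `parents` maps each child to a list of its parents (hypothetical format).
    """
    seen = {node}
    stack = [node]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def d_connected_marginally(x, y, parents):
    """With an empty conditioning set, an active path cannot contain a
    collider, so x and y are d-connected iff they share an ancestor."""
    return not ancestors_of(x, parents).isdisjoint(ancestors_of(y, parents))


# Hypothetical toy DAG: A -> B, A -> C, C -> D <- E
parents = {"B": ["A"], "C": ["A"], "D": ["C", "E"]}
nodes = ["A", "B", "C", "D", "E"]

# One d-connection verdict per unordered pair, mirroring the pairwise loop.
connected = {
    (u, v): d_connected_marginally(u, v, parents)
    for u, v in combinations(nodes, 2)
}

for pair, verdict in connected.items():
    print(pair, "d-connected" if verdict else "d-separated")
```

In this toy graph, E is d-separated from A, B and C because every path between them passes through the collider D.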

Absence of correlation/d-separation is considered as the positive class for computing the metrics.
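To make the choice of positive class concrete, here is a small sketch of an F1 computation in which True means "uncorrelated" (for the test) or "d-separated" (for the model). The input lists are made-up values, and `f1_for_independence` is a hypothetical helper written for illustration; the class itself takes a scoring function through the score argument.

```python
def f1_for_independence(test_uncorrelated, model_d_separated):
    """F1 where the positive class is absence of correlation / d-separation.

    test_uncorrelated: true values, True if the test found no correlation.
    model_d_separated: predicted values, True if the pair is d-separated.
    """
    pairs = list(zip(test_uncorrelated, model_d_separated))
    tp = sum(1 for truth, pred in pairs if truth and pred)
    fp = sum(1 for truth, pred in pairs if not truth and pred)
    fn = sum(1 for truth, pred in pairs if truth and not pred)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up verdicts for five variable pairs.
test_uncorrelated = [True, False, True, False, True]
model_d_separated = [True, False, False, False, True]

f1 = f1_for_independence(test_uncorrelated, model_d_separated)
print(round(f1, 3))  # 0.8: precision 1.0, recall 2/3
```

Here the model misses one independence that the data supports (the third pair), which lowers recall but not precision.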

Parameters:

ci_test: str or function: The statistical test to use for determining whether the variables in data are correlated or not. For discrete variables, the options are: 1) chi_square 2) g_sq 3) log_likelihood 4) freeman_tuckey 5) modified_log_likelihood 6) neyman 7) cressie_read. For continuous variables, only one test is available: 1) pearsonr. A function with the signature fun(X, Y, Z, data) can also be passed; it must return True if the variables are uncorrelated and False otherwise.
significance_level: float: A value between 0 and 1. If p_value < significance_level, the variables are considered correlated.
score: fun (default: f1-score): Any classification scoring metric from scikit-learn. https://scikit-learn.org/stable/modules/classes.html#classification-metrics
return_summary: boolean (default: False): If True, returns a dataframe with details for each of the conditions checked.
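As an illustration of the callable form of ci_test, the sketch below follows the documented fun(X, Y, Z, data) signature and returns True for uncorrelated pairs. It is a toy under stated assumptions: it thresholds the absolute Pearson correlation instead of computing a p-value, it ignores Z, and it accepts extra keyword arguments in case the caller passes any (an assumption, not something this page confirms). The names `pearson_r`, `weak_pearson_test` and `threshold` are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def weak_pearson_test(X, Y, Z, data, threshold=0.1, **kwargs):
    """Return True (uncorrelated) when |r| is below `threshold`.

    Z is ignored and extra keyword arguments are accepted but unused;
    both are simplifying assumptions for this sketch.
    """
    r = pearson_r(list(data[X]), list(data[Y]))
    return abs(r) < threshold


# Column-indexed toy data; a pandas DataFrame is indexed the same way.
toy = {
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [2.1, 3.9, 6.2, 8.1, 9.8],
    "c": [1.0, -1.0, 0.0, -1.0, 1.0],
}

print(weak_pearson_test("a", "b", [], toy))  # strongly correlated pair
print(weak_pearson_test("a", "c", [], toy))  # zero sample correlation
```

A real replacement for the built-in tests would compare a p-value against the significance level; the fixed threshold here only keeps the example free of dependencies.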

Returns:

The specified metric: float: The metric specified by the score argument. By default, returns the f1-score. If return_summary is True, a dataframe with the per-pair details is returned instead.

Examples

>>> from pgmpy.example_models import load_model
>>> from pgmpy.metrics import CorrelationScore
>>> alarm = load_model("bnlearn/alarm")
>>> data = alarm.simulate(int(1e4))
>>> scorer = CorrelationScore(
...     ci_test="chi_square", significance_level=0.05, return_summary=False
... )
>>> scorer(X=data, causal_graph=alarm)
0.911957950065703

>>> scorer = CorrelationScore(
...     ci_test="chi_square", significance_level=0.05, return_summary=True
... )
>>> scorer(X=data, causal_graph=alarm).head()
    var1            var2  stat_test  d_connected
0   HISTORY          CVP      False        False
1   HISTORY         PCWP      False        False
2   HISTORY  HYPOVOLEMIA       True         True
3   HISTORY   LVEDVOLUME      False        False
4   HISTORY    LVFAILURE      False        False