Maximum Likelihood Estimator

class pgmpy.estimators.MLE.MaximumLikelihoodEstimator(model, data, **kwargs)[source]

Class used to compute parameters for a model using Maximum Likelihood Estimation.

Parameters:
  • model (A pgmpy.models.BayesianNetwork or pgmpy.models.JunctionTree instance) – The model for which parameter estimation is to be done.

  • data (pandas DataFrame object) – DataFrame object with column names identical to the variable names of the network. (If some values in the data are missing, the data cells should be set to numpy.NaN. Note that pandas converts each column containing numpy.NaN to dtype float.)

  • state_names (dict (optional)) – A dict indicating, for each variable, the discrete set of states that the variable can take. If unspecified, the observed values in the data set are taken to be the only possible states.

Examples

>>> import numpy as np
>>> import pandas as pd
>>> from pgmpy.models import BayesianNetwork
>>> from pgmpy.estimators import MaximumLikelihoodEstimator
>>> data = pd.DataFrame(np.random.randint(low=0, high=2, size=(1000, 5)),
...                       columns=['A', 'B', 'C', 'D', 'E'])
>>> model = BayesianNetwork([('A', 'B'), ('C', 'B'), ('C', 'D'), ('B', 'E')])
>>> estimator = MaximumLikelihoodEstimator(model, data)
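
If some states of a variable never appear in the data, they can be declared through state_names so that they still show up in the estimated CPDs. A minimal sketch, assuming a hypothetical extra state 2 for variable A (it receives zero probability, and parent configurations that are never observed fall back to a uniform distribution, as in the estimate_cpd example below):

>>> state_names = {'A': [0, 1, 2], 'B': [0, 1], 'C': [0, 1], 'D': [0, 1], 'E': [0, 1]}
>>> estimator = MaximumLikelihoodEstimator(model, data, state_names=state_names)
>>> cpd_a = estimator.estimate_cpd('A')
>>> # cpd_a now has three rows, A(0), A(1) and the never-observed A(2) with probability 0.
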
estimate_cpd(node, weighted=False)[source]

Method to estimate the CPD for a given variable.

Parameters:
  • node (int, string, or any hashable Python object) – The name of the variable for which the CPD is to be estimated.

  • weighted (bool) – If weighted=True, the data must contain a _weight column specifying the weight of each datapoint (row). If False, assigns an equal weight to each datapoint.

Returns:

Estimated CPD – Estimated CPD for node.

Return type:

pgmpy.factors.discrete.TabularCPD

Examples

>>> import pandas as pd
>>> from pgmpy.models import BayesianNetwork
>>> from pgmpy.estimators import MaximumLikelihoodEstimator
>>> data = pd.DataFrame(data={'A': [0, 0, 1], 'B': [0, 1, 0], 'C': [1, 1, 0]})
>>> model = BayesianNetwork([('A', 'C'), ('B', 'C')])
>>> cpd_A = MaximumLikelihoodEstimator(model, data).estimate_cpd('A')
>>> print(cpd_A)
╒══════╤══════════╕
│ A(0) │ 0.666667 │
├──────┼──────────┤
│ A(1) │ 0.333333 │
╘══════╧══════════╛
>>> cpd_C = MaximumLikelihoodEstimator(model, data).estimate_cpd('C')
>>> print(cpd_C)
╒══════╤══════╤══════╤══════╤══════╕
│ A    │ A(0) │ A(0) │ A(1) │ A(1) │
├──────┼──────┼──────┼──────┼──────┤
│ B    │ B(0) │ B(1) │ B(0) │ B(1) │
├──────┼──────┼──────┼──────┼──────┤
│ C(0) │ 0.0  │ 0.0  │ 1.0  │ 0.5  │
├──────┼──────┼──────┼──────┼──────┤
│ C(1) │ 1.0  │ 1.0  │ 0.0  │ 0.5  │
╘══════╧══════╧══════╧══════╧══════╛
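
When the data carries a _weight column, weighted=True makes the estimator use weighted counts instead of raw counts. A minimal sketch, with made-up weights where the first row counts twice:

>>> data = pd.DataFrame(data={'A': [0, 0, 1], 'B': [0, 1, 0], 'C': [1, 1, 0],
...                           '_weight': [2, 1, 1]})
>>> model = BayesianNetwork([('A', 'C'), ('B', 'C')])
>>> cpd_A = MaximumLikelihoodEstimator(model, data).estimate_cpd('A', weighted=True)
>>> # With row weights [2, 1, 1] the weighted counts for A are 3 and 1,
>>> # so the estimate becomes P(A=0) = 0.75 and P(A=1) = 0.25.
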
estimate_potentials()[source]

Implements Iterative Proportional Fitting to estimate potentials specifically for a Decomposable Undirected Graphical Model. Decomposability is enforced by using a Junction Tree.

Returns:

Estimated potentials – Estimated potentials for the entire graphical model.

Return type:

pgmpy.factors.FactorDict

References

[1] Kevin P. Murphy, Machine Learning: A Probabilistic Perspective.

Algorithm 19.2 (Iterative Proportional Fitting algorithm for tabular MRFs) and Section 19.5.7.4 (IPF for decomposable graphical models).

[2] Eric P. Xing, Meng Song, Li Zhou, Probabilistic Graphical Models 10-708, Spring 2014.

https://www.cs.cmu.edu/~epxing/Class/10708-14/scribe_notes/scribe_note_lecture8.pdf

Examples

>>> import pandas as pd
>>> from pgmpy.models import JunctionTree
>>> from pgmpy.estimators import MaximumLikelihoodEstimator
>>> data = pd.DataFrame(data={'A': [0, 0, 1], 'B': [0, 1, 0], 'C': [1, 1, 0]})
>>> model = JunctionTree()
>>> model.add_edges_from([(("A", "C"), ("B", "C"))])
>>> potentials = MaximumLikelihoodEstimator(model, data).estimate_potentials()
>>> print(potentials[("A", "C")])
+------+------+------------+
| A    | C    |   phi(A,C) |
+======+======+============+
| A(0) | C(0) |     0.0000 |
+------+------+------------+
| A(0) | C(1) |     0.6667 |
+------+------+------------+
| A(1) | C(0) |     0.3333 |
+------+------+------------+
| A(1) | C(1) |     0.0000 |
+------+------+------------+
>>> print(potentials[("B", "C")])
+------+------+------------+
| B    | C    |   phi(B,C) |
+======+======+============+
| B(0) | C(0) |     1.0000 |
+------+------+------------+
| B(0) | C(1) |     0.5000 |
+------+------+------------+
| B(1) | C(0) |     0.0000 |
+------+------+------------+
| B(1) | C(1) |     0.5000 |
+------+------+------------+
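
As a sanity check (a sketch, assuming the returned potentials behave like pgmpy.factors.discrete.DiscreteFactor objects and support the factor product), multiplying the two clique potentials recovers the empirical joint distribution over A, B and C:

>>> joint = potentials[("A", "C")] * potentials[("B", "C")]
>>> # Each of the three observed rows (A, B, C) = (0, 0, 1), (0, 1, 1), (1, 0, 0)
>>> # gets probability 1/3; every other assignment gets probability 0.
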
get_parameters(n_jobs=1, weighted=False)[source]

Method to estimate the model parameters using Maximum Likelihood Estimation.

Parameters:
  • n_jobs (int (default: 1)) – Number of jobs to run in parallel. Default: 1. Using n_jobs > 1 for small models might be slower.

  • weighted (bool) – If weighted=True, the data must contain a _weight column specifying the weight of each datapoint (row). If False, assigns an equal weight to each datapoint.

Returns:

Estimated parameters – List of pgmpy.factors.discrete.TabularCPD objects, one for each variable of the model, or a pgmpy.factors.FactorDict representing the potential values of a JunctionTree.

Return type:

list or pgmpy.factors.FactorDict

Examples

>>> import numpy as np
>>> import pandas as pd
>>> from pgmpy.models import BayesianNetwork
>>> from pgmpy.estimators import MaximumLikelihoodEstimator
>>> values = pd.DataFrame(np.random.randint(low=0, high=2, size=(1000, 4)),
...                       columns=['A', 'B', 'C', 'D'])
>>> model = BayesianNetwork([('A', 'B'), ('C', 'B'), ('C', 'D')])
>>> estimator = MaximumLikelihoodEstimator(model, values)
>>> estimator.get_parameters()
[<TabularCPD representing P(C:2) at 0x7f7b534251d0>,
<TabularCPD representing P(B:2 | C:2, A:2) at 0x7f7b4dfd4da0>,
<TabularCPD representing P(A:2) at 0x7f7b4dfd4fd0>,
<TabularCPD representing P(D:2 | C:2) at 0x7f7b4df822b0>]
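
The returned CPDs can be attached to the model and validated directly; a short usage sketch:

>>> model.add_cpds(*estimator.get_parameters())
>>> model.check_model()
True

BayesianNetwork.fit performs the same estimation and assignment in a single call via its estimator argument.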