Expectation Maximization (EM)

class pgmpy.estimators.ExpectationMaximization(model, data, **kwargs)
get_parameters(latent_card=None, max_iter=100, atol=1e-08, n_jobs=-1, seed=None, show_progress=True)

Method to estimate all model parameters (CPDs) using Expectation Maximization.

Parameters:

  • latent_card (dict (default: None)) – A dictionary of the form {latent_var: cardinality} specifying the cardinality (number of states) of each latent variable. If None, assumes 2 states for each latent variable.

  • max_iter (int (default: 100)) – The maximum number of iterations the algorithm is allowed to run. If max_iter is reached, the most recent parameter estimates are returned.

  • atol (float (default: 1e-08)) – The absolute tolerance used to check convergence. If the parameters change by less than atol in an iteration, the algorithm exits.

  • n_jobs (int (default: -1)) – Number of jobs to run in parallel. The default, -1, uses all available processors.

  • seed (int (default: None)) – The random seed to use for generating the initial values.

  • show_progress (boolean (default: True)) – Whether to show a progress bar for iterations.
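
How max_iter, atol, and seed interact can be illustrated with a minimal, self-contained sketch. This is not pgmpy's implementation: it runs EM on a toy mixture of two biased coins, and the function name em_two_coins, its arguments, and the maximum-absolute-change convergence test are all assumptions made for illustration.

```python
import random


def em_two_coins(heads_per_trial, n_tosses, max_iter=100, atol=1e-8, seed=None):
    """Toy EM (hypothetical helper, not part of pgmpy).

    Each trial tosses one of two coins n_tosses times; which coin was
    used is the latent variable. Returns (theta, iterations, converged).
    """
    rng = random.Random(seed)
    # Random initial values; fixing the seed makes the run reproducible.
    theta = [rng.uniform(0.2, 0.8), rng.uniform(0.2, 0.8)]

    for iteration in range(1, max_iter + 1):
        # E-step: expected head and toss counts attributed to each coin.
        exp_heads = [0.0, 0.0]
        exp_tosses = [0.0, 0.0]
        for h in heads_per_trial:
            # The binomial coefficient cancels when normalizing, so it is omitted.
            lik = [t ** h * (1.0 - t) ** (n_tosses - h) for t in theta]
            total = lik[0] + lik[1]
            for k in range(2):
                resp = lik[k] / total  # P(coin k | this trial)
                exp_heads[k] += resp * h
                exp_tosses[k] += resp * n_tosses

        # M-step: re-estimate each coin's bias from the expected counts.
        new_theta = [exp_heads[k] / exp_tosses[k] for k in range(2)]

        # Assumed convergence test: largest absolute parameter change.
        change = max(abs(new_theta[k] - theta[k]) for k in range(2))
        theta = new_theta
        if change < atol:
            return theta, iteration, True

    # max_iter reached: return the last estimates without convergence.
    return theta, max_iter, False


heads_per_trial = [5, 9, 8, 4, 7]  # heads observed in 10 tosses per trial
theta, n_iter, converged = em_two_coins(heads_per_trial, n_tosses=10, seed=42)
print(theta, n_iter, converged)
```

In this sketch a larger atol or a smaller max_iter ends the loop earlier, at the cost of less precise estimates, and the same seed always yields the same result.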


Returns:

Estimated parameters (CPDs) – A list of estimated CPDs for the model.

Return type:

list

>>> import numpy as np
>>> import pandas as pd
>>> from pgmpy.models import BayesianNetwork
>>> from pgmpy.estimators import ExpectationMaximization as EM
>>> data = pd.DataFrame(np.random.randint(low=0, high=2, size=(1000, 3)),
...                       columns=['A', 'C', 'D'])
>>> model = BayesianNetwork([('A', 'B'), ('C', 'B'), ('C', 'D')], latents={'B'})
>>> estimator = EM(model, data)
>>> estimator.get_parameters(latent_card={'B': 3})
[<TabularCPD representing P(C:2) at 0x7f7b534251d0>,
<TabularCPD representing P(B:3 | C:2, A:2) at 0x7f7b4dfd4da0>,
<TabularCPD representing P(A:2) at 0x7f7b4dfd4fd0>,
<TabularCPD representing P(D:2 | C:2) at 0x7f7b4df822b0>]