Bayesian Model Sampling

class pgmpy.sampling.Sampling.BayesianModelSampling(model)

Class for sampling methods specific to Bayesian networks.

Parameters:

model (instance of BayesianNetwork) – The model from which samples are generated.

forward_sample(size=1, include_latents=False, seed=None, show_progress=True, partial_samples=None, n_jobs=-1)

Generates sample(s) from the joint distribution of the Bayesian network.

Parameters:
  • size (int) – The number of samples to generate.

  • include_latents (boolean) – Whether to include the latent variable values in the generated samples.

  • seed (int (default: None)) – If a value is provided, sets the seed for numpy.random.

  • show_progress (boolean) – Whether to show a progress bar while the samples are generated.

  • partial_samples (pandas.DataFrame) – A pandas dataframe specifying samples on some of the variables in the model. If specified, the sampling procedure uses these sample values, instead of generating them.

  • n_jobs (int (default: -1)) – The number of CPU cores to use. Default uses all cores.

Returns:

sampled – The generated samples

Return type:

pandas.DataFrame
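
The effect of partial_samples described above can be pictured with a small pure-Python sketch. It does not call pgmpy: the function name forward_sample_with_partial and the list-of-dicts input are hypothetical stand-ins for the pandas.DataFrame that the real method takes, and the CPD values are copied from the student network used in the examples on this page. It assumes one partial row per generated sample.

```python
import random

# Student network CPDs as plain Python data (values from the examples on this page).
# P_GRADE is keyed by (intel, diff); each value lists P(grade = 0, 1, 2).
P_DIFF = [0.6, 0.4]
P_INTEL = [0.7, 0.3]
P_GRADE = {
    (0, 0): [0.3, 0.4, 0.3],
    (0, 1): [0.05, 0.25, 0.7],
    (1, 0): [0.9, 0.08, 0.02],
    (1, 1): [0.5, 0.3, 0.2],
}


def draw(probs, rng):
    """Draw a state index from a discrete distribution."""
    return rng.choices(range(len(probs)), weights=probs)[0]


def forward_sample_with_partial(partial_rows, seed=None):
    """Hypothetical helper: one output sample per row of `partial_rows`.

    Variables present in a row keep the supplied value; the rest are drawn
    from their CPDs, parents before children.
    """
    rng = random.Random(seed)
    samples = []
    for fixed in partial_rows:
        row = {}
        row["diff"] = fixed["diff"] if "diff" in fixed else draw(P_DIFF, rng)
        row["intel"] = fixed["intel"] if "intel" in fixed else draw(P_INTEL, rng)
        if "grade" in fixed:
            row["grade"] = fixed["grade"]
        else:
            row["grade"] = draw(P_GRADE[(row["intel"], row["diff"])], rng)
        samples.append(row)
    return samples


# 'diff' is supplied for every sample; 'intel' and 'grade' are generated.
partial = [{"diff": 0}, {"diff": 1}, {"diff": 1}]
samples = forward_sample_with_partial(partial, seed=0)
```

The supplied values pass through unchanged, and the generated children are drawn conditional on them, which is why the helper visits parents before children.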

Examples

>>> from pgmpy.models import BayesianNetwork
>>> from pgmpy.factors.discrete import TabularCPD
>>> from pgmpy.sampling import BayesianModelSampling
>>> student = BayesianNetwork([('diff', 'grade'), ('intel', 'grade')])
>>> cpd_d = TabularCPD('diff', 2, [[0.6], [0.4]])
>>> cpd_i = TabularCPD('intel', 2, [[0.7], [0.3]])
>>> cpd_g = TabularCPD('grade', 3, [[0.3, 0.05, 0.9, 0.5],
...                                 [0.4, 0.25, 0.08, 0.3],
...                                 [0.3, 0.7, 0.02, 0.2]],
...                    ['intel', 'diff'], [2, 2])
>>> student.add_cpds(cpd_d, cpd_i, cpd_g)
>>> inference = BayesianModelSampling(student)
>>> inference.forward_sample(size=2)
   diff  intel  grade
0     0      0      1
1     1      0      2
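
As a rough illustration of what forward sampling does, the sketch below draws each variable in topological order, conditioning on the parent values already drawn. It is a pure-Python approximation written for this page, not pgmpy's implementation; the helper names and the dict-based CPD layout are assumptions, and the probabilities are copied from the student network above.

```python
import random

# Student network CPDs as plain Python data.
# P_GRADE is keyed by (intel, diff); each value lists P(grade = 0, 1, 2).
P_DIFF = [0.6, 0.4]
P_INTEL = [0.7, 0.3]
P_GRADE = {
    (0, 0): [0.3, 0.4, 0.3],
    (0, 1): [0.05, 0.25, 0.7],
    (1, 0): [0.9, 0.08, 0.02],
    (1, 1): [0.5, 0.3, 0.2],
}


def draw(probs, rng):
    """Draw a state index from a discrete distribution."""
    return rng.choices(range(len(probs)), weights=probs)[0]


def forward_sample(size, seed=None):
    """Illustrative forward (ancestral) sampler for the student network."""
    rng = random.Random(seed)
    samples = []
    for _ in range(size):
        # Roots first, then the child given its sampled parents.
        diff = draw(P_DIFF, rng)
        intel = draw(P_INTEL, rng)
        grade = draw(P_GRADE[(intel, diff)], rng)
        samples.append({"diff": diff, "intel": intel, "grade": grade})
    return samples


samples = forward_sample(size=5000, seed=0)
# The empirical frequency of diff = 0 should be close to P(diff = 0) = 0.6.
frac_diff0 = sum(s["diff"] == 0 for s in samples) / len(samples)
```

Passing the same seed reproduces the same samples in this sketch, which mirrors the purpose of the seed parameter documented above.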

likelihood_weighted_sample(evidence=[], size=1, include_latents=False, seed=None, show_progress=True, n_jobs=-1)

Generates weighted sample(s) from the joint distribution of the Bayesian network that comply with the given evidence. See Koller and Friedman, ‘Probabilistic Graphical Models: Principles and Techniques’, Algorithm 12.2, p. 493.

Parameters:
  • evidence (list of pgmpy.factors.discrete.State namedtuples) – The observed variable states. Leave as an empty list (the default) if there is no evidence.

  • size (int) – The number of samples to generate.

  • include_latents (boolean) – Whether to include the latent variable values in the generated samples.

  • seed (int (default: None)) – If a value is provided, sets the seed for numpy.random.

  • show_progress (boolean) – Whether to show a progress bar while the samples are generated.

  • n_jobs (int (default: -1)) – The number of CPU cores to use. Default uses all cores.

Returns:

sampled – The generated samples with corresponding weights

Return type:

pandas.DataFrame

Examples

>>> from pgmpy.factors.discrete import State
>>> from pgmpy.models import BayesianNetwork
>>> from pgmpy.factors.discrete import TabularCPD
>>> from pgmpy.sampling import BayesianModelSampling
>>> student = BayesianNetwork([('diff', 'grade'), ('intel', 'grade')])
>>> cpd_d = TabularCPD('diff', 2, [[0.6], [0.4]])
>>> cpd_i = TabularCPD('intel', 2, [[0.7], [0.3]])
>>> cpd_g = TabularCPD('grade', 3, [[0.3, 0.05, 0.9, 0.5],
...                                 [0.4, 0.25, 0.08, 0.3],
...                                 [0.3, 0.7, 0.02, 0.2]],
...                    ['intel', 'diff'], [2, 2])
>>> student.add_cpds(cpd_d, cpd_i, cpd_g)
>>> inference = BayesianModelSampling(student)
>>> evidence = [State('diff', 0)]
>>> inference.likelihood_weighted_sample(evidence=evidence, size=2)
   diff  intel  grade  _weight
0     0      0      1      0.6
1     0      0      2      0.6
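
The weights in the example can be understood from a small pure-Python sketch of likelihood weighting. Evidence variables are clamped to their observed state instead of being sampled, and each sample's weight is the product of the probabilities of those observed states given their parents. This is a sketch under stated assumptions, not pgmpy's code: evidence is passed as a plain dict rather than a list of State tuples, and the helper names are hypothetical.

```python
import random

# Student network CPDs as plain Python data.
# P_GRADE is keyed by (intel, diff); each value lists P(grade = 0, 1, 2).
P_DIFF = [0.6, 0.4]
P_INTEL = [0.7, 0.3]
P_GRADE = {
    (0, 0): [0.3, 0.4, 0.3],
    (0, 1): [0.05, 0.25, 0.7],
    (1, 0): [0.9, 0.08, 0.02],
    (1, 1): [0.5, 0.3, 0.2],
}
ORDER = ["diff", "intel", "grade"]  # a topological order: parents first


def cpd_column(var, sample):
    """Distribution of `var` given the parent values already set in `sample`."""
    if var == "diff":
        return P_DIFF
    if var == "intel":
        return P_INTEL
    return P_GRADE[(sample["intel"], sample["diff"])]


def likelihood_weighted_sample(evidence, size, seed=None):
    """Illustrative likelihood weighting; `evidence` maps variable -> state."""
    rng = random.Random(seed)
    samples = []
    for _ in range(size):
        sample, weight = {}, 1.0
        for var in ORDER:
            probs = cpd_column(var, sample)
            if var in evidence:
                # Clamp the variable and fold its likelihood into the weight.
                sample[var] = evidence[var]
                weight *= probs[evidence[var]]
            else:
                sample[var] = rng.choices(range(len(probs)), weights=probs)[0]
        sample["_weight"] = weight
        samples.append(sample)
    return samples


# Evidence on a root node: every weight equals P(diff = 0) = 0.6.
root_evidence = likelihood_weighted_sample({"diff": 0}, size=2, seed=0)

# Evidence on a leaf: weights differ, and a weighted average estimates a posterior.
leaf_evidence = likelihood_weighted_sample({"grade": 0}, size=5000, seed=0)
total = sum(s["_weight"] for s in leaf_evidence)
p_intel1 = sum(s["_weight"] for s in leaf_evidence if s["intel"] == 1) / total
```

For this network the exact value of P(intel = 1 | grade = 0) is 0.222 / 0.362 ≈ 0.613, so p_intel1 should land near it. Ignoring the weights and averaging the raw samples would instead return roughly the prior P(intel = 1) = 0.3.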

rejection_sample(evidence=[], size=1, include_latents=False, seed=None, show_progress=True, partial_samples=None)

Generates sample(s) from the joint distribution of the Bayesian network, conditioned on the given evidence.

Parameters:
  • evidence (list of pgmpy.factors.discrete.State namedtuples) – The observed variable states. Leave as an empty list (the default) if there is no evidence.

  • size (int) – The number of samples to generate.

  • include_latents (boolean) – Whether to include the latent variable values in the generated samples.

  • seed (int (default: None)) – If a value is provided, sets the seed for numpy.random.

  • show_progress (boolean) – Whether to show a progress bar while the samples are generated.

  • partial_samples (pandas.DataFrame) – A pandas dataframe specifying samples on some of the variables in the model. If specified, the sampling procedure uses these sample values, instead of generating them.

Returns:

sampled – The generated samples

Return type:

pandas.DataFrame

Examples

>>> from pgmpy.models import BayesianNetwork
>>> from pgmpy.factors.discrete import TabularCPD
>>> from pgmpy.factors.discrete import State
>>> from pgmpy.sampling import BayesianModelSampling
>>> student = BayesianNetwork([('diff', 'grade'), ('intel', 'grade')])
>>> cpd_d = TabularCPD('diff', 2, [[0.6], [0.4]])
>>> cpd_i = TabularCPD('intel', 2, [[0.7], [0.3]])
>>> cpd_g = TabularCPD('grade', 3, [[0.3, 0.05, 0.9, 0.5],
...                                 [0.4, 0.25, 0.08, 0.3],
...                                 [0.3, 0.7, 0.02, 0.2]],
...                    ['intel', 'diff'], [2, 2])
>>> student.add_cpds(cpd_d, cpd_i, cpd_g)
>>> inference = BayesianModelSampling(student)
>>> evidence = [State(var='diff', state=0)]
>>> inference.rejection_sample(evidence=evidence, size=2)
   intel  diff  grade
0      0     0      1
1      0     0      1
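
Conceptually, rejection sampling draws ordinary forward samples and keeps only those that agree with the evidence. The pure-Python sketch below illustrates that idea on the student network. It is an assumption-laden approximation rather than pgmpy's implementation (which may draw candidates in batches): evidence is a plain dict, and the helper names are hypothetical.

```python
import random

# Student network CPDs as plain Python data.
# P_GRADE is keyed by (intel, diff); each value lists P(grade = 0, 1, 2).
P_DIFF = [0.6, 0.4]
P_INTEL = [0.7, 0.3]
P_GRADE = {
    (0, 0): [0.3, 0.4, 0.3],
    (0, 1): [0.05, 0.25, 0.7],
    (1, 0): [0.9, 0.08, 0.02],
    (1, 1): [0.5, 0.3, 0.2],
}


def draw(probs, rng):
    """Draw a state index from a discrete distribution."""
    return rng.choices(range(len(probs)), weights=probs)[0]


def forward_draw(rng):
    """One forward sample: roots first, then grade given its parents."""
    diff = draw(P_DIFF, rng)
    intel = draw(P_INTEL, rng)
    grade = draw(P_GRADE[(intel, diff)], rng)
    return {"diff": diff, "intel": intel, "grade": grade}


def rejection_sample(evidence, size, seed=None):
    """Illustrative rejection sampler; `evidence` maps variable -> state.

    Loops until `size` samples are accepted, so evidence with zero
    probability would never terminate in this simplified version.
    """
    rng = random.Random(seed)
    accepted = []
    while len(accepted) < size:
        sample = forward_draw(rng)
        if all(sample[var] == state for var, state in evidence.items()):
            accepted.append(sample)
    return accepted


samples = rejection_sample({"grade": 0}, size=3000, seed=0)
# Accepted samples are unweighted draws from P(diff, intel | grade = 0).
p_intel1 = sum(s["intel"] == 1 for s in samples) / len(samples)
```

The cost of this approach grows as the evidence becomes less likely, because more forward samples are discarded for each one kept. Here P(grade = 0) = 0.362, so roughly one draw in three is accepted.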