Bayesian Model Sampling

class pgmpy.sampling.Sampling.BayesianModelSampling(model)

Class for sampling methods specific to Bayesian Models

Parameters:

model (instance of DiscreteBayesianNetwork) – Model from which samples will be drawn.

forward_sample(size=1, include_latents=False, seed=None, show_progress=True, partial_samples=None, n_jobs=-1)

Generates sample(s) from the joint distribution of the Bayesian Network.

Parameters:
  • size (int) – Number of samples to generate.

  • include_latents (boolean) – Whether to include the latent variable values in the generated samples.

  • seed (int (default: None)) – If a value is provided, sets the seed for numpy.random.

  • show_progress (boolean) – Whether to show a progress bar while samples are generated.

  • partial_samples (pandas.DataFrame) – A pandas DataFrame giving preset values for some of the variables in the model. If specified, the sampling procedure uses these values instead of generating them.

  • n_jobs (int (default: -1)) – The number of CPU cores to use. Default uses all cores.

Returns:

sampled – The generated samples

Return type:

pandas.DataFrame

Examples

>>> from pgmpy.models import DiscreteBayesianNetwork
>>> from pgmpy.factors.discrete import TabularCPD
>>> from pgmpy.sampling import BayesianModelSampling
>>> student = DiscreteBayesianNetwork([("diff", "grade"), ("intel", "grade")])
>>> cpd_d = TabularCPD("diff", 2, [[0.6], [0.4]])
>>> cpd_i = TabularCPD("intel", 2, [[0.7], [0.3]])
>>> cpd_g = TabularCPD(
...     "grade",
...     3,
...     [[0.3, 0.05, 0.9, 0.5], [0.4, 0.25, 0.08, 0.3], [0.3, 0.7, 0.02, 0.2]],
...     ["intel", "diff"],
...     [2, 2],
... )
>>> student.add_cpds(cpd_d, cpd_i, cpd_g)
>>> inference = BayesianModelSampling(student)
>>> inference.forward_sample(size=2)
   diff  intel  grade
0     0      0      1
1     1      0      2
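
To make the procedure behind forward_sample concrete, the following is a minimal sketch of forward (ancestral) sampling on the student network, written with the standard library only. It is not pgmpy's implementation. The names forward_sample_sketch, P_DIFF, P_INTEL and P_GRADE are hypothetical, and the mapping of CPD columns to (intel, diff) pairs assumes the usual pgmpy column order in which the last evidence variable varies fastest.

```python
import random

# Student network from the example above, written as plain Python data.
# Assumption: TabularCPD columns are ordered with the last evidence
# variable ("diff") varying fastest, so column i * 2 + d is (intel=i, diff=d).
P_DIFF = [0.6, 0.4]
P_INTEL = [0.7, 0.3]
P_GRADE = {
    (0, 0): [0.3, 0.4, 0.3],
    (0, 1): [0.05, 0.25, 0.7],
    (1, 0): [0.9, 0.08, 0.02],
    (1, 1): [0.5, 0.3, 0.2],
}


def forward_sample_sketch(size, seed=None):
    """Draw `size` joint samples by visiting nodes in topological order."""
    rng = random.Random(seed)
    samples = []
    for _ in range(size):
        # Root nodes are drawn from their marginal distributions.
        diff = rng.choices([0, 1], weights=P_DIFF)[0]
        intel = rng.choices([0, 1], weights=P_INTEL)[0]
        # The child is drawn from the CPD column selected by its parents.
        grade = rng.choices([0, 1, 2], weights=P_GRADE[(intel, diff)])[0]
        samples.append({"diff": diff, "intel": intel, "grade": grade})
    return samples


samples = forward_sample_sketch(size=1000, seed=0)
```

Because parents are always sampled before their children, each row is an exact draw from the joint distribution, and fixing the seed makes the run repeatable.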

likelihood_weighted_sample(evidence=[], size=1, include_latents=False, seed=None, show_progress=True, n_jobs=-1)

Generates weighted sample(s) from the joint distribution of the Bayesian Network that comply with the given evidence. See Algorithm 12.2 (p. 493) in ‘Probabilistic Graphical Models: Principles and Techniques’ by Koller and Friedman.

Parameters:
  • evidence (list of pgmpy.factors.discrete.State namedtuples) – Observed variable states that every sample must match. Leave empty (the default) if there is no evidence.

  • size (int) – Number of samples to generate.

  • include_latents (boolean) – Whether to include the latent variable values in the generated samples.

  • seed (int (default: None)) – If a value is provided, sets the seed for numpy.random.

  • show_progress (boolean) – Whether to show a progress bar while samples are generated.

  • n_jobs (int (default: -1)) – The number of CPU cores to use. Default uses all cores.

Returns:

sampled – The generated samples with corresponding weights

Return type:

pandas.DataFrame

Examples

>>> from pgmpy.factors.discrete import State
>>> from pgmpy.models import DiscreteBayesianNetwork
>>> from pgmpy.factors.discrete import TabularCPD
>>> from pgmpy.sampling import BayesianModelSampling
>>> student = DiscreteBayesianNetwork([("diff", "grade"), ("intel", "grade")])
>>> cpd_d = TabularCPD("diff", 2, [[0.6], [0.4]])
>>> cpd_i = TabularCPD("intel", 2, [[0.7], [0.3]])
>>> cpd_g = TabularCPD(
...     "grade",
...     3,
...     [[0.3, 0.05, 0.9, 0.5], [0.4, 0.25, 0.08, 0.3], [0.3, 0.7, 0.02, 0.2]],
...     ["intel", "diff"],
...     [2, 2],
... )
>>> student.add_cpds(cpd_d, cpd_i, cpd_g)
>>> inference = BayesianModelSampling(student)
>>> evidence = [State("diff", 0)]
>>> inference.likelihood_weighted_sample(evidence=evidence, size=2)
   diff  intel  grade  _weight
0     0      0      1      0.6
1     0      0      2      0.6
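
The weight of 0.6 in the example above is P(diff=0), the probability of the clamped evidence. The following is a minimal standard-library sketch of likelihood weighting on the student network that shows where that weight comes from. It is not pgmpy's implementation. The names likelihood_weighted_sketch and _draw_or_clamp are hypothetical, evidence is passed as a plain dict rather than a list of State tuples, and the CPD column order is assumed to have the last evidence variable varying fastest.

```python
import random

# Same student network as in the example above, as plain Python data.
P_DIFF = [0.6, 0.4]
P_INTEL = [0.7, 0.3]
P_GRADE = {
    (0, 0): [0.3, 0.4, 0.3],
    (0, 1): [0.05, 0.25, 0.7],
    (1, 0): [0.9, 0.08, 0.02],
    (1, 1): [0.5, 0.3, 0.2],
}


def _draw_or_clamp(name, probs, evidence, rng):
    """Return (value, weight factor) for a single node."""
    if name in evidence:
        # Evidence nodes are not sampled; they contribute P(observed | parents).
        value = evidence[name]
        return value, probs[value]
    # Non-evidence nodes are sampled as usual and leave the weight unchanged.
    value = rng.choices(range(len(probs)), weights=probs)[0]
    return value, 1.0


def likelihood_weighted_sketch(evidence, size, seed=None):
    """Draw `size` weighted samples consistent with `evidence` (a dict)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(size):
        weight = 1.0
        diff, factor = _draw_or_clamp("diff", P_DIFF, evidence, rng)
        weight *= factor
        intel, factor = _draw_or_clamp("intel", P_INTEL, evidence, rng)
        weight *= factor
        grade, factor = _draw_or_clamp("grade", P_GRADE[(intel, diff)], evidence, rng)
        weight *= factor
        rows.append({"diff": diff, "intel": intel, "grade": grade, "_weight": weight})
    return rows


rows = likelihood_weighted_sketch({"diff": 0}, size=1000, seed=0)

# A weighted average over the rows estimates a posterior, here P(grade=0 | diff=0).
total = sum(r["_weight"] for r in rows)
estimate = sum(r["_weight"] for r in rows if r["grade"] == 0) / total
```

When the evidence sits on a root node, as here, every row carries the same weight. With evidence on grade the weights would differ from row to row, which is why the _weight column has to be used when computing estimates.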

rejection_sample(evidence=[], size=1, include_latents=False, seed=None, show_progress=True, partial_samples=None)

Generates sample(s) from the joint distribution of the Bayesian Network that are consistent with the given evidence.

Parameters:
  • evidence (list of pgmpy.factors.discrete.State namedtuples) – Observed variable states that every sample must match. Leave empty (the default) if there is no evidence.

  • size (int) – Number of samples to generate.

  • include_latents (boolean) – Whether to include the latent variable values in the generated samples.

  • seed (int (default: None)) – If a value is provided, sets the seed for numpy.random.

  • show_progress (boolean) – Whether to show a progress bar while samples are generated.

  • partial_samples (pandas.DataFrame) – A pandas DataFrame giving preset values for some of the variables in the model. If specified, the sampling procedure uses these values instead of generating them.

Returns:

sampled – The generated samples

Return type:

pandas.DataFrame

Examples

>>> from pgmpy.models import DiscreteBayesianNetwork
>>> from pgmpy.factors.discrete import TabularCPD
>>> from pgmpy.factors.discrete import State
>>> from pgmpy.sampling import BayesianModelSampling
>>> student = DiscreteBayesianNetwork([("diff", "grade"), ("intel", "grade")])
>>> cpd_d = TabularCPD("diff", 2, [[0.6], [0.4]])
>>> cpd_i = TabularCPD("intel", 2, [[0.7], [0.3]])
>>> cpd_g = TabularCPD(
...     "grade",
...     3,
...     [[0.3, 0.05, 0.9, 0.5], [0.4, 0.25, 0.08, 0.3], [0.3, 0.7, 0.02, 0.2]],
...     ["intel", "diff"],
...     [2, 2],
... )
>>> student.add_cpds(cpd_d, cpd_i, cpd_g)
>>> inference = BayesianModelSampling(student)
>>> evidence = [State(var="diff", state=0)]
>>> inference.rejection_sample(evidence=evidence, size=2)
        intel       diff       grade
0         0          0          1
1         0          0          1
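
The idea behind rejection sampling can be sketched in a few lines: draw forward samples from the full joint distribution and keep only those that agree with the evidence. The block below is a standard-library illustration on the student network, not pgmpy's implementation. The names rejection_sample_sketch and draw_joint are hypothetical, evidence is passed as a plain dict, and the CPD column order is assumed to have the last evidence variable varying fastest.

```python
import random

# Same student network as in the example above, as plain Python data.
P_DIFF = [0.6, 0.4]
P_INTEL = [0.7, 0.3]
P_GRADE = {
    (0, 0): [0.3, 0.4, 0.3],
    (0, 1): [0.05, 0.25, 0.7],
    (1, 0): [0.9, 0.08, 0.02],
    (1, 1): [0.5, 0.3, 0.2],
}


def draw_joint(rng):
    """Draw one forward sample: parents first, then the child."""
    diff = rng.choices([0, 1], weights=P_DIFF)[0]
    intel = rng.choices([0, 1], weights=P_INTEL)[0]
    grade = rng.choices([0, 1, 2], weights=P_GRADE[(intel, diff)])[0]
    return {"intel": intel, "diff": diff, "grade": grade}


def rejection_sample_sketch(evidence, size, seed=None):
    """Collect `size` forward samples that match `evidence` (a dict).

    Note: this loop never terminates if the evidence has probability zero.
    """
    rng = random.Random(seed)
    accepted = []
    while len(accepted) < size:
        candidate = draw_joint(rng)
        # Keep the candidate only if every evidence variable matches.
        if all(candidate[var] == state for var, state in evidence.items()):
            accepted.append(candidate)
    return accepted


accepted = rejection_sample_sketch({"diff": 0}, size=2, seed=0)
```

The expected number of draws per accepted sample is 1 / P(evidence), so unlikely evidence makes this approach slow. Likelihood weighting avoids the discarded draws at the cost of returning weighted samples.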