Bayesian Model Sampling
- class pgmpy.sampling.Sampling.BayesianModelSampling(model)
Class for sampling methods specific to Bayesian models.
- Parameters:
model (instance of DiscreteBayesianNetwork) – The model from which samples will be generated.
- forward_sample(size=1, include_latents=False, seed=None, show_progress=True, partial_samples=None, n_jobs=-1)
Generates sample(s) from the joint distribution of the Bayesian Network.
- Parameters:
size (int) – Number of samples to generate.
include_latents (boolean) – Whether to include the latent variable values in the generated samples.
seed (int (default: None)) – If a value is provided, sets the seed for numpy.random.
show_progress (boolean) – Whether to show a progress bar of samples getting generated.
partial_samples (pandas.DataFrame) – A pandas DataFrame that fixes sample values for some of the variables in the model. If specified, the sampling procedure uses these values instead of generating them.
n_jobs (int (default: -1)) – The number of CPU cores to use. Default uses all cores.
- Returns:
sampled – The generated samples
- Return type:
pandas.DataFrame
Examples
>>> from pgmpy.models import DiscreteBayesianNetwork
>>> from pgmpy.factors.discrete import TabularCPD
>>> from pgmpy.sampling import BayesianModelSampling
>>> student = DiscreteBayesianNetwork([("diff", "grade"), ("intel", "grade")])
>>> cpd_d = TabularCPD("diff", 2, [[0.6], [0.4]])
>>> cpd_i = TabularCPD("intel", 2, [[0.7], [0.3]])
>>> cpd_g = TabularCPD(
...     "grade",
...     3,
...     [[0.3, 0.05, 0.9, 0.5], [0.4, 0.25, 0.08, 0.3], [0.3, 0.7, 0.02, 0.2]],
...     ["intel", "diff"],
...     [2, 2],
... )
>>> student.add_cpds(cpd_d, cpd_i, cpd_g)
>>> inference = BayesianModelSampling(student)
>>> inference.forward_sample(size=2)
   diff  intel  grade
0     0      0      1
1     1      0      2
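The idea behind forward sampling can be sketched in plain Python: visit the nodes in topological order and draw each one from its conditional distribution given the parent values already drawn. This is a minimal illustration, not pgmpy's implementation. The names `NETWORK` and `forward_sample_sketch` are hypothetical, and the tables restate the CPDs of the student network used in the example.

```python
import random

# Student network from the example, with nodes listed in topological order.
# Each entry: (name, parents, table mapping parent states to a distribution).
# NETWORK and forward_sample_sketch are illustrative names, not pgmpy API.
NETWORK = [
    ("diff", (), {(): [0.6, 0.4]}),
    ("intel", (), {(): [0.7, 0.3]}),
    ("grade", ("intel", "diff"), {
        (0, 0): [0.3, 0.4, 0.3],
        (0, 1): [0.05, 0.25, 0.7],
        (1, 0): [0.9, 0.08, 0.02],
        (1, 1): [0.5, 0.3, 0.2],
    }),
]


def forward_sample_sketch(size, seed=None):
    """Draw `size` joint samples by sampling each node after its parents."""
    rng = random.Random(seed)
    samples = []
    for _ in range(size):
        state = {}
        for name, parents, table in NETWORK:
            # Look up the distribution selected by the sampled parent states.
            probs = table[tuple(state[p] for p in parents)]
            state[name] = rng.choices(range(len(probs)), weights=probs, k=1)[0]
        samples.append(state)
    return samples


samples = forward_sample_sketch(size=5, seed=0)
for row in samples:
    print(row)
```

Passing the same seed reproduces the same samples, which mirrors the purpose of the seed parameter described above.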
- likelihood_weighted_sample(evidence=[], size=1, include_latents=False, seed=None, show_progress=True, n_jobs=-1)
Generates weighted sample(s) from the joint distribution of the Bayesian Network that comply with the given evidence. See ‘Probabilistic Graphical Models: Principles and Techniques’, Koller and Friedman, Algorithm 12.2, p. 493.
- Parameters:
evidence (list of pgmpy.factors.discrete.State namedtuples) – Observed variable states to condition on. The default empty list means no evidence.
size (int) – Number of samples to generate.
include_latents (boolean) – Whether to include the latent variable values in the generated samples.
seed (int (default: None)) – If a value is provided, sets the seed for numpy.random.
show_progress (boolean) – Whether to show a progress bar of samples getting generated.
n_jobs (int (default: -1)) – The number of CPU cores to use. Default uses all cores.
- Returns:
sampled – The generated samples with corresponding weights
- Return type:
pandas.DataFrame
Examples
>>> from pgmpy.factors.discrete import State
>>> from pgmpy.models import DiscreteBayesianNetwork
>>> from pgmpy.factors.discrete import TabularCPD
>>> from pgmpy.sampling import BayesianModelSampling
>>> student = DiscreteBayesianNetwork([("diff", "grade"), ("intel", "grade")])
>>> cpd_d = TabularCPD("diff", 2, [[0.6], [0.4]])
>>> cpd_i = TabularCPD("intel", 2, [[0.7], [0.3]])
>>> cpd_g = TabularCPD(
...     "grade",
...     3,
...     [[0.3, 0.05, 0.9, 0.5], [0.4, 0.25, 0.08, 0.3], [0.3, 0.7, 0.02, 0.2]],
...     ["intel", "diff"],
...     [2, 2],
... )
>>> student.add_cpds(cpd_d, cpd_i, cpd_g)
>>> inference = BayesianModelSampling(student)
>>> evidence = [State("diff", 0)]
>>> inference.likelihood_weighted_sample(evidence=evidence, size=2)
   diff  intel  grade  _weight
0     0      0      1      0.6
1     0      0      2      0.6
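Likelihood weighting can be sketched in plain Python as follows: evidence variables are clamped to their observed states rather than sampled, and each sample's weight is the product of the probabilities of those observed states given the sampled parents. This is a minimal illustration of the technique under stated assumptions, not pgmpy's implementation. `NETWORK` and `likelihood_weighted_sketch` are hypothetical names, and evidence is passed as a plain dict instead of State tuples.

```python
import random

# Student network from the example, with nodes listed in topological order.
# NETWORK and likelihood_weighted_sketch are illustrative names, not pgmpy API.
NETWORK = [
    ("diff", (), {(): [0.6, 0.4]}),
    ("intel", (), {(): [0.7, 0.3]}),
    ("grade", ("intel", "diff"), {
        (0, 0): [0.3, 0.4, 0.3],
        (0, 1): [0.05, 0.25, 0.7],
        (1, 0): [0.9, 0.08, 0.02],
        (1, 1): [0.5, 0.3, 0.2],
    }),
]


def likelihood_weighted_sketch(evidence, size, seed=None):
    """Return samples consistent with `evidence`, each with a `_weight`."""
    rng = random.Random(seed)
    rows = []
    for _ in range(size):
        state, weight = {}, 1.0
        for name, parents, table in NETWORK:
            probs = table[tuple(state[p] for p in parents)]
            if name in evidence:
                # Clamp the observed variable and account for its likelihood.
                state[name] = evidence[name]
                weight *= probs[evidence[name]]
            else:
                state[name] = rng.choices(range(len(probs)), weights=probs, k=1)[0]
        rows.append({**state, "_weight": weight})
    return rows


# Evidence on a root node: every weight equals P(diff=0).
rows = likelihood_weighted_sketch({"diff": 0}, size=3, seed=0)
for row in rows:
    print(row)

# Evidence on a leaf: the weights turn samples into a posterior estimate.
leaf_rows = likelihood_weighted_sketch({"grade": 0}, size=20000, seed=0)
total = sum(r["_weight"] for r in leaf_rows)
p_intel_1 = sum(r["_weight"] for r in leaf_rows if r["intel"] == 1) / total
print(round(p_intel_1, 3))  # estimate of P(intel=1 | grade=0)
```

With evidence diff=0, each weight is P(diff=0) = 0.6, which is the value shown in the `_weight` column of the example. With evidence on grade, the weights differ per sample, and weighted averages approximate posterior probabilities.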
- rejection_sample(evidence=[], size=1, include_latents=False, seed=None, show_progress=True, partial_samples=None)
Generates sample(s) from the joint distribution of the Bayesian Network, given the evidence.
- Parameters:
evidence (list of pgmpy.factors.discrete.State namedtuples) – Observed variable states to condition on. The default empty list means no evidence.
size (int) – Number of samples to generate.
include_latents (boolean) – Whether to include the latent variable values in the generated samples.
seed (int (default: None)) – If a value is provided, sets the seed for numpy.random.
show_progress (boolean) – Whether to show a progress bar of samples getting generated.
partial_samples (pandas.DataFrame) – A pandas DataFrame that fixes sample values for some of the variables in the model. If specified, the sampling procedure uses these values instead of generating them.
- Returns:
sampled – The generated samples
- Return type:
pandas.DataFrame
Examples
>>> from pgmpy.models import DiscreteBayesianNetwork
>>> from pgmpy.factors.discrete import TabularCPD
>>> from pgmpy.factors.discrete import State
>>> from pgmpy.sampling import BayesianModelSampling
>>> student = DiscreteBayesianNetwork([("diff", "grade"), ("intel", "grade")])
>>> cpd_d = TabularCPD("diff", 2, [[0.6], [0.4]])
>>> cpd_i = TabularCPD("intel", 2, [[0.7], [0.3]])
>>> cpd_g = TabularCPD(
...     "grade",
...     3,
...     [[0.3, 0.05, 0.9, 0.5], [0.4, 0.25, 0.08, 0.3], [0.3, 0.7, 0.02, 0.2]],
...     ["intel", "diff"],
...     [2, 2],
... )
>>> student.add_cpds(cpd_d, cpd_i, cpd_g)
>>> inference = BayesianModelSampling(student)
>>> evidence = [State(var="diff", state=0)]
>>> inference.rejection_sample(evidence=evidence, size=2)
   intel  diff  grade
0      0     0      1
1      0     0      1
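Rejection sampling can be sketched in plain Python as a loop that forward-samples the full network and keeps only the draws that agree with the evidence. This is a minimal illustration, not pgmpy's implementation. `NETWORK`, `rejection_sample_sketch`, and the `max_tries` guard are hypothetical, and evidence is passed as a plain dict instead of State tuples.

```python
import random

# Student network from the example, with nodes listed in topological order.
# NETWORK and rejection_sample_sketch are illustrative names, not pgmpy API.
NETWORK = [
    ("diff", (), {(): [0.6, 0.4]}),
    ("intel", (), {(): [0.7, 0.3]}),
    ("grade", ("intel", "diff"), {
        (0, 0): [0.3, 0.4, 0.3],
        (0, 1): [0.05, 0.25, 0.7],
        (1, 0): [0.9, 0.08, 0.02],
        (1, 1): [0.5, 0.3, 0.2],
    }),
]


def rejection_sample_sketch(evidence, size, seed=None, max_tries=100_000):
    """Collect `size` forward samples that match `evidence`.

    `max_tries` is a hypothetical guard so that impossible evidence
    cannot cause an endless loop; fewer than `size` rows may come back.
    """
    rng = random.Random(seed)
    accepted = []
    tries = 0
    while len(accepted) < size and tries < max_tries:
        tries += 1
        state = {}
        for name, parents, table in NETWORK:
            probs = table[tuple(state[p] for p in parents)]
            state[name] = rng.choices(range(len(probs)), weights=probs, k=1)[0]
        # Keep the draw only if every observed variable has its observed state.
        if all(state[var] == value for var, value in evidence.items()):
            accepted.append(state)
    return accepted


accepted = rejection_sample_sketch({"diff": 0}, size=10, seed=0)
for row in accepted:
    print(row)
```

The fraction of draws that survive is roughly the probability of the evidence, so unlikely evidence makes this approach slow. Likelihood weighting avoids discarding samples at the cost of carrying weights.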