Bayesian Model Sampling
- class pgmpy.sampling.Sampling.BayesianModelSampling(model)
Class for sampling methods specific to Bayesian Models
- Parameters:
model (instance of BayesianNetwork) – model on which inference queries will be computed
- forward_sample(size=1, include_latents=False, seed=None, show_progress=True, partial_samples=None, n_jobs=-1)
Generates sample(s) from the joint distribution of the Bayesian Network.
- Parameters:
size (int) – number of samples to generate
include_latents (boolean) – Whether to include the latent variable values in the generated samples.
seed (int (default: None)) – If a value is provided, sets the seed for numpy.random.
show_progress (boolean) – Whether to show a progress bar of samples getting generated.
partial_samples (pandas.DataFrame) – A pandas dataframe specifying samples on some of the variables in the model. If specified, the sampling procedure uses these sample values, instead of generating them.
n_jobs (int (default: -1)) – The number of CPU cores to use. Default uses all cores.
- Returns:
sampled – The generated samples
- Return type:
pandas.DataFrame
Examples
>>> from pgmpy.models import BayesianNetwork
>>> from pgmpy.factors.discrete import TabularCPD
>>> from pgmpy.sampling import BayesianModelSampling
>>> student = BayesianNetwork([('diff', 'grade'), ('intel', 'grade')])
>>> cpd_d = TabularCPD('diff', 2, [[0.6], [0.4]])
>>> cpd_i = TabularCPD('intel', 2, [[0.7], [0.3]])
>>> cpd_g = TabularCPD('grade', 3, [[0.3, 0.05, 0.9, 0.5], [0.4, 0.25,
...                0.08, 0.3], [0.3, 0.7, 0.02, 0.2]],
...                ['intel', 'diff'], [2, 2])
>>> student.add_cpds(cpd_d, cpd_i, cpd_g)
>>> inference = BayesianModelSampling(student)
>>> inference.forward_sample(size=2)
   diff  intel  grade
0     0      0      1
1     1      0      2
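To make the idea behind forward sampling concrete, here is a minimal standard-library sketch that does not use pgmpy at all. It assumes the student network from the example above, stored as plain lists and dicts; the names `P_DIFF`, `P_INTEL`, `P_GRADE`, `draw` and `forward_sample_sketch` are hypothetical and exist only for this illustration. Each variable is drawn after its parents, which is the essence of the technique, though not necessarily how pgmpy implements it.

```python
import random

# Student network from the example above, as plain Python tables.
# These names are illustrative only and are not part of pgmpy.
P_DIFF = [0.6, 0.4]
P_INTEL = [0.7, 0.3]
# P(grade | intel, diff), keyed by (intel, diff); each list sums to 1.
P_GRADE = {
    (0, 0): [0.3, 0.4, 0.3],
    (0, 1): [0.05, 0.25, 0.7],
    (1, 0): [0.9, 0.08, 0.02],
    (1, 1): [0.5, 0.3, 0.2],
}


def draw(rng, probs):
    """Draw one state index from a discrete distribution."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def forward_sample_sketch(size=1, seed=None):
    """Sample every variable in topological order: parents before children."""
    rng = random.Random(seed)
    samples = []
    for _ in range(size):
        diff = draw(rng, P_DIFF)
        intel = draw(rng, P_INTEL)
        grade = draw(rng, P_GRADE[(intel, diff)])
        samples.append({"diff": diff, "intel": intel, "grade": grade})
    return samples


samples = forward_sample_sketch(size=5000, seed=0)
# Empirical frequency of diff == 0; it should approach P(diff=0) = 0.6.
frac_diff0 = sum(s["diff"] == 0 for s in samples) / len(samples)
print(round(frac_diff0, 2))
```

Passing the same `seed` twice yields identical samples in this sketch, which mirrors the purpose of the `seed` parameter documented above.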
- likelihood_weighted_sample(evidence=[], size=1, include_latents=False, seed=None, show_progress=True, n_jobs=-1)
Generates weighted sample(s) from the joint distribution of the Bayesian Network that comply with the given evidence. See ‘Probabilistic Graphical Models: Principles and Techniques’, Koller and Friedman, Algorithm 12.2, p. 493.
- Parameters:
evidence (list of pgmpy.factors.discrete.State namedtuples) – Observed variable states to condition on; None if no evidence (the default is an empty list)
size (int) – number of samples to generate
include_latents (boolean) – Whether to include the latent variable values in the generated samples.
seed (int (default: None)) – If a value is provided, sets the seed for numpy.random.
show_progress (boolean) – Whether to show a progress bar of samples getting generated.
n_jobs (int (default: -1)) – The number of CPU cores to use. Default uses all cores.
- Returns:
sampled – The generated samples with corresponding weights
- Return type:
pandas.DataFrame
Examples
>>> from pgmpy.factors.discrete import State
>>> from pgmpy.models import BayesianNetwork
>>> from pgmpy.factors.discrete import TabularCPD
>>> from pgmpy.sampling import BayesianModelSampling
>>> student = BayesianNetwork([('diff', 'grade'), ('intel', 'grade')])
>>> cpd_d = TabularCPD('diff', 2, [[0.6], [0.4]])
>>> cpd_i = TabularCPD('intel', 2, [[0.7], [0.3]])
>>> cpd_g = TabularCPD('grade', 3, [[0.3, 0.05, 0.9, 0.5], [0.4, 0.25,
...                0.08, 0.3], [0.3, 0.7, 0.02, 0.2]],
...                ['intel', 'diff'], [2, 2])
>>> student.add_cpds(cpd_d, cpd_i, cpd_g)
>>> inference = BayesianModelSampling(student)
>>> evidence = [State('diff', 0)]
>>> inference.likelihood_weighted_sample(evidence=evidence, size=2)
   diff  intel  grade  _weight
0     0      0      1      0.6
1     0      0      2      0.6
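The `_weight` column in the example above is the probability of the evidence given the sampled values of its parents. The following standard-library sketch illustrates likelihood weighting on the same student network; it is an assumption-laden illustration rather than pgmpy's implementation. Evidence is passed as a plain dict instead of `State` tuples, and `distribution` and `likelihood_weighted_sketch` are hypothetical helper names.

```python
import random

# Student network from the example above (illustrative tables, not pgmpy objects).
P_DIFF = [0.6, 0.4]
P_INTEL = [0.7, 0.3]
P_GRADE = {  # P(grade | intel, diff), keyed by (intel, diff)
    (0, 0): [0.3, 0.4, 0.3],
    (0, 1): [0.05, 0.25, 0.7],
    (1, 0): [0.9, 0.08, 0.02],
    (1, 1): [0.5, 0.3, 0.2],
}
TOPOLOGICAL_ORDER = ("diff", "intel", "grade")


def distribution(var, state):
    """Return the distribution of `var` given the already assigned parents."""
    if var == "diff":
        return P_DIFF
    if var == "intel":
        return P_INTEL
    return P_GRADE[(state["intel"], state["diff"])]


def likelihood_weighted_sketch(evidence, size=1, seed=None):
    """Fix evidence variables, sample the rest, and weight by evidence likelihood."""
    rng = random.Random(seed)
    samples = []
    for _ in range(size):
        state, weight = {}, 1.0
        for var in TOPOLOGICAL_ORDER:
            probs = distribution(var, state)
            if var in evidence:
                # Evidence is not sampled; its probability scales the weight.
                state[var] = evidence[var]
                weight *= probs[evidence[var]]
            else:
                state[var] = rng.choices(range(len(probs)), weights=probs, k=1)[0]
        state["_weight"] = weight
        samples.append(state)
    return samples


# With evidence diff=0, every weight equals P(diff=0) = 0.6, as in the example above.
fixed = likelihood_weighted_sketch({"diff": 0}, size=2, seed=0)

# Weighted estimate of P(intel=1 | grade=0); the exact value is 0.222 / 0.362.
samples = likelihood_weighted_sketch({"grade": 0}, size=20000, seed=0)
total = sum(s["_weight"] for s in samples)
posterior_intel1 = sum(s["_weight"] for s in samples if s["intel"] == 1) / total
```

Because no sample is discarded, the weights must be used when computing any estimate from the result; an unweighted average over the rows would ignore the evidence on `grade`.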
- rejection_sample(evidence=[], size=1, include_latents=False, seed=None, show_progress=True, partial_samples=None)
Generates sample(s) from the joint distribution of the Bayesian Network, given the evidence.
- Parameters:
evidence (list of pgmpy.factors.discrete.State namedtuples) – Observed variable states to condition on; None if no evidence (the default is an empty list)
size (int) – number of samples to generate
include_latents (boolean) – Whether to include the latent variable values in the generated samples.
seed (int (default: None)) – If a value is provided, sets the seed for numpy.random.
show_progress (boolean) – Whether to show a progress bar of samples getting generated.
partial_samples (pandas.DataFrame) – A pandas dataframe specifying samples on some of the variables in the model. If specified, the sampling procedure uses these sample values, instead of generating them.
- Returns:
sampled – The generated samples
- Return type:
pandas.DataFrame
Examples
>>> from pgmpy.models import BayesianNetwork
>>> from pgmpy.factors.discrete import TabularCPD
>>> from pgmpy.factors.discrete import State
>>> from pgmpy.sampling import BayesianModelSampling
>>> student = BayesianNetwork([('diff', 'grade'), ('intel', 'grade')])
>>> cpd_d = TabularCPD('diff', 2, [[0.6], [0.4]])
>>> cpd_i = TabularCPD('intel', 2, [[0.7], [0.3]])
>>> cpd_g = TabularCPD('grade', 3, [[0.3, 0.05, 0.9, 0.5], [0.4, 0.25,
...                0.08, 0.3], [0.3, 0.7, 0.02, 0.2]],
...                ['intel', 'diff'], [2, 2])
>>> student.add_cpds(cpd_d, cpd_i, cpd_g)
>>> inference = BayesianModelSampling(student)
>>> evidence = [State(var='diff', state=0)]
>>> inference.rejection_sample(evidence=evidence, size=2)
   intel  diff  grade
0      0     0      1
1      0     0      1
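Rejection sampling can be pictured as forward sampling followed by a filter. The sketch below, again using only the standard library and the student network from the example above, keeps drawing joint samples and discards any that disagree with the evidence. It is a simplified illustration under stated assumptions, not pgmpy's implementation: evidence is a plain dict, and `rejection_sample_sketch` and its `max_draws` guard are hypothetical.

```python
import random

# Student network from the example above (illustrative tables, not pgmpy objects).
P_DIFF = [0.6, 0.4]
P_INTEL = [0.7, 0.3]
P_GRADE = {  # P(grade | intel, diff), keyed by (intel, diff)
    (0, 0): [0.3, 0.4, 0.3],
    (0, 1): [0.05, 0.25, 0.7],
    (1, 0): [0.9, 0.08, 0.02],
    (1, 1): [0.5, 0.3, 0.2],
}


def draw(rng, probs):
    """Draw one state index from a discrete distribution."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def rejection_sample_sketch(evidence, size=1, seed=None, max_draws=1_000_000):
    """Forward-sample repeatedly and keep only samples matching the evidence."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(max_draws):
        if len(accepted) == size:
            break
        diff = draw(rng, P_DIFF)
        intel = draw(rng, P_INTEL)
        grade = draw(rng, P_GRADE[(intel, diff)])
        sample = {"diff": diff, "intel": intel, "grade": grade}
        # Reject the sample if any evidence variable has a different value.
        if all(sample[var] == value for var, value in evidence.items()):
            accepted.append(sample)
    return accepted


samples = rejection_sample_sketch({"grade": 0}, size=5000, seed=0)
# Unweighted estimate of P(intel=1 | grade=0); the exact value is 0.222 / 0.362.
posterior_intel1 = sum(s["intel"] == 1 for s in samples) / len(samples)
```

The accepted samples need no weights, but the number of draws grows as the evidence becomes less likely, since most forward samples are then discarded.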