Sampling Methods

Gibbs Sampler

class pgmpy.sampling.Sampling.GibbsSampling(model=None)

Class for performing Gibbs sampling.

generate_sample(start_state=None, size=1)

Generator version of self.sample; yields one sample per iteration.
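Examples

A minimal sketch, assuming GibbsSampling can be built from a MarkovModel whose factors are already attached (samples are random, so no fixed output is shown):

>>> from pgmpy.models import MarkovModel
>>> from pgmpy.factors.discrete import DiscreteFactor
>>> from pgmpy.sampling import GibbsSampling
>>> model = MarkovModel([('A', 'B'), ('C', 'B')])
>>> factor_ab = DiscreteFactor(['A', 'B'], [2, 2], [1, 2, 3, 4])
>>> factor_cb = DiscreteFactor(['C', 'B'], [2, 2], [5, 6, 7, 8])
>>> model.add_factors(factor_ab, factor_cb)
>>> gibbs = GibbsSampling(model)
>>> gen = gibbs.generate_sample(size=2)  # lazy: one sample per iteration
>>> samples = [sample for sample in gen]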

sample(start_state=None, size=1, return_type='dataframe')

Sample from the Markov chain.

Returns:

sampled: A pandas.DataFrame or a numpy.recarray object, depending on the return_type argument

the generated samples
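Examples

A minimal sketch, assuming GibbsSampling can also be built directly from a BayesianModel with its CPDs attached (samples are random, so no fixed output is shown):

>>> from pgmpy.models.BayesianModel import BayesianModel
>>> from pgmpy.factors.discrete import TabularCPD
>>> from pgmpy.sampling import GibbsSampling
>>> student = BayesianModel([('diff', 'grade'), ('intel', 'grade')])
>>> cpd_d = TabularCPD('diff', 2, [[0.6], [0.4]])
>>> cpd_i = TabularCPD('intel', 2, [[0.7], [0.3]])
>>> cpd_g = TabularCPD('grade', 3, [[0.3, 0.05, 0.9, 0.5],
...                                 [0.4, 0.25, 0.08, 0.3],
...                                 [0.3, 0.7, 0.02, 0.2]],
...                    ['intel', 'diff'], [2, 2])
>>> student.add_cpds(cpd_d, cpd_i, cpd_g)
>>> gibbs = GibbsSampling(student)
>>> samples = gibbs.sample(size=3, return_type='dataframe')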

Bayesian Model Samplers

class pgmpy.sampling.Sampling.BayesianModelSampling(model)

Class for sampling methods specific to Bayesian models.

Parameters:

model: instance of BayesianModel

model on which inference queries will be computed

forward_sample(size=1, return_type='dataframe')

Generates sample(s) from the joint distribution of the Bayesian network.

Parameters:

size: int

size of sample to be generated

return_type: string (dataframe | recarray)

Return type for samples: either ‘dataframe’ or ‘recarray’. Defaults to ‘dataframe’.

Returns:

sampled: A pandas.DataFrame or a numpy.recarray object, depending on the return_type argument

the generated samples

Examples

>>> from pgmpy.models.BayesianModel import BayesianModel
>>> from pgmpy.factors.discrete import TabularCPD
>>> from pgmpy.sampling import BayesianModelSampling
>>> student = BayesianModel([('diff', 'grade'), ('intel', 'grade')])
>>> cpd_d = TabularCPD('diff', 2, [[0.6], [0.4]])
>>> cpd_i = TabularCPD('intel', 2, [[0.7], [0.3]])
>>> cpd_g = TabularCPD('grade', 3, [[0.3, 0.05, 0.9, 0.5], [0.4, 0.25,
...                0.08, 0.3], [0.3, 0.7, 0.02, 0.2]],
...                ['intel', 'diff'], [2, 2])
>>> student.add_cpds(cpd_d, cpd_i, cpd_g)
>>> inference = BayesianModelSampling(student)
>>> inference.forward_sample(size=2, return_type='recarray')
rec.array([(0, 0, 1), (1, 0, 2)], dtype=
          [('diff', '<i8'), ('intel', '<i8'), ('grade', '<i8')])
likelihood_weighted_sample(evidence=None, size=1, return_type='dataframe')

Generates weighted sample(s) from the joint distribution of the Bayesian network that comply with the given evidence. Based on ‘Probabilistic Graphical Models: Principles and Techniques’, Koller and Friedman, Algorithm 12.2, p. 493.

Parameters:

evidence: list of `pgmpy.factors.State` namedtuples

None if no evidence

size: int

size of sample to be generated

return_type: string (dataframe | recarray)

Return type for samples: either ‘dataframe’ or ‘recarray’. Defaults to ‘dataframe’.

Returns:

sampled: A pandas.DataFrame or a numpy.recarray object, depending on the return_type argument

the generated samples with corresponding weights

Examples

>>> from pgmpy.factors.discrete import State
>>> from pgmpy.models.BayesianModel import BayesianModel
>>> from pgmpy.factors.discrete import TabularCPD
>>> from pgmpy.sampling import BayesianModelSampling
>>> student = BayesianModel([('diff', 'grade'), ('intel', 'grade')])
>>> cpd_d = TabularCPD('diff', 2, [[0.6], [0.4]])
>>> cpd_i = TabularCPD('intel', 2, [[0.7], [0.3]])
>>> cpd_g = TabularCPD('grade', 3, [[0.3, 0.05, 0.9, 0.5], [0.4, 0.25,
...         0.08, 0.3], [0.3, 0.7, 0.02, 0.2]],
...         ['intel', 'diff'], [2, 2])
>>> student.add_cpds(cpd_d, cpd_i, cpd_g)
>>> inference = BayesianModelSampling(student)
>>> evidence = [State('diff', 0)]
>>> inference.likelihood_weighted_sample(evidence=evidence, size=2, return_type='recarray')
rec.array([(0, 0, 1, 0.6), (0, 0, 2, 0.6)], dtype=
          [('diff', '<i8'), ('intel', '<i8'), ('grade', '<i8'), ('_weight', '<f8')])
rejection_sample(evidence=None, size=1, return_type='dataframe')

Generates sample(s) from the joint distribution of the Bayesian network, given the evidence.

Parameters:

evidence: list of `pgmpy.factors.State` namedtuples

None if no evidence

size: int

size of sample to be generated

return_type: string (dataframe | recarray)

Return type for samples: either ‘dataframe’ or ‘recarray’. Defaults to ‘dataframe’.

Returns:

sampled: A pandas.DataFrame or a numpy.recarray object, depending on the return_type argument

the generated samples

Examples

>>> from pgmpy.models.BayesianModel import BayesianModel
>>> from pgmpy.factors.discrete import TabularCPD
>>> from pgmpy.factors.discrete import State
>>> from pgmpy.sampling import BayesianModelSampling
>>> student = BayesianModel([('diff', 'grade'), ('intel', 'grade')])
>>> cpd_d = TabularCPD('diff', 2, [[0.6], [0.4]])
>>> cpd_i = TabularCPD('intel', 2, [[0.7], [0.3]])
>>> cpd_g = TabularCPD('grade', 3, [[0.3, 0.05, 0.9, 0.5], [0.4, 0.25,
...                0.08, 0.3], [0.3, 0.7, 0.02, 0.2]],
...                ['intel', 'diff'], [2, 2])
>>> student.add_cpds(cpd_d, cpd_i, cpd_g)
>>> inference = BayesianModelSampling(student)
>>> evidence = [State(var='diff', state=0)]
>>> inference.rejection_sample(evidence=evidence, size=2, return_type='dataframe')
   intel  diff  grade
0      0     0      1
1      0     0      1

Hamiltonian Monte Carlo

A collection of methods for sampling from continuous models in pgmpy

class pgmpy.sampling.HMC.HamiltonianMC(model, grad_log_pdf, simulate_dynamics=<class 'pgmpy.sampling.base.LeapFrog'>)

Class for performing sampling using simple Hamiltonian Monte Carlo.

References

R. Neal. Handbook of Markov Chain Monte Carlo, chapter 5: MCMC Using Hamiltonian Dynamics. CRC Press, 2011.

generate_sample(initial_pos, num_samples, trajectory_length, stepsize=None)

Returns a generator object that yields one Hamiltonian Monte Carlo sample per iteration.

Parameters:

initial_pos: A 1d array like object

Vector representing the parameter position, i.e. the starting state of the Markov chain.

num_samples: int

Number of samples to be generated

trajectory_length: int or float

Target trajectory length, stepsize * L, where L is the number of steps taken per HMC iteration and stepsize is the step size used to discretize the dynamics; for example, trajectory_length=2 with stepsize=0.25 gives L = 8 steps per iteration.

stepsize: float, defaults to None

The stepsize for proposing new values of position and momentum in simulate_dynamics. If None, a suitable value is chosen automatically.

Returns:

generator: yields a 1d numpy.array object per sample

Examples

>>> from pgmpy.sampling import HamiltonianMC as HMC, GradLogPDFGaussian as GLPG
>>> from pgmpy.factors.continuous import GaussianDistribution as JGD
>>> import numpy as np
>>> mean = np.array([4, -1])
>>> covariance = np.array([[3, 0.4], [0.4, 3]])
>>> model = JGD(['x', 'y'], mean, covariance)
>>> sampler = HMC(model=model, grad_log_pdf=GLPG)
>>> gen_samples = sampler.generate_sample(np.array([-1, 1]), num_samples = 10000,
...                                       trajectory_length=2, stepsize=0.25)
>>> samples_array = np.array([sample for sample in gen_samples])
>>> samples_array
array([[ 0.1467264 ,  0.27143857],
       [ 4.0371448 ,  0.15871274],
       [ 3.24656208, -1.03742621],
       ...,
       [ 6.45975905,  1.97941306],
       [ 4.89007171,  0.15413156],
       [ 5.9528083 ,  1.92983158]])
>>> np.cov(samples_array.T)
array([[ 2.95692642,  0.4379419 ],
       [ 0.4379419 ,  3.00939434]])
>>> sampler.acceptance_rate
0.9969
sample(initial_pos, num_samples, trajectory_length, stepsize=None, return_type='dataframe')

Method to return samples using Hamiltonian Monte Carlo

Parameters:

initial_pos: A 1d array like object

Vector representing the parameter position, i.e. the starting state of the Markov chain.

num_samples: int

Number of samples to be generated

trajectory_length: int or float

Target trajectory length, stepsize * L, where L is the number of steps taken per HMC iteration and stepsize is the step size used to discretize the dynamics.

stepsize: float, defaults to None

The stepsize for proposing new values of position and momentum in simulate_dynamics. If None, a suitable value is chosen automatically.

return_type: string (dataframe | recarray)

Return type for samples: either ‘dataframe’ or ‘recarray’. Defaults to ‘dataframe’.

Returns:

sampled: A pandas.DataFrame or a numpy.recarray object, depending on the return_type argument

Examples

>>> from pgmpy.sampling import HamiltonianMC as HMC, GradLogPDFGaussian, ModifiedEuler
>>> from pgmpy.factors.continuous import GaussianDistribution as JGD
>>> import numpy as np
>>> mean = np.array([1, -1])
>>> covariance = np.array([[1, 0.2], [0.2, 1]])
>>> model = JGD(['x', 'y'], mean, covariance)
>>> sampler = HMC(model=model, grad_log_pdf=GradLogPDFGaussian, simulate_dynamics=ModifiedEuler)
>>> samples = sampler.sample(np.array([1, 1]), num_samples = 5,
...                          trajectory_length=6, stepsize=0.25, return_type='dataframe')
>>> samples
               x              y
0   1.000000e+00   1.000000e+00
1   1.592133e+00   1.152911e+00
2   1.608700e+00   1.315349e+00
3   1.608700e+00   1.315349e+00
4   6.843856e-01   6.237043e-01
>>> mean = np.array([4, 1, -1])
>>> covariance = np.array([[1, 0.7 , 0.8], [0.7, 1, 0.2], [0.8, 0.2, 1]])
>>> model = JGD(['x', 'y', 'z'], mean, covariance)
>>> sampler = HMC(model=model, grad_log_pdf=GradLogPDFGaussian)
>>> samples = sampler.sample(np.array([1, 1, 1]), num_samples=10000,
...                          trajectory_length=6, stepsize=0.25, return_type='dataframe')
>>> np.cov(samples.values.T)
array([[ 1.00795398,  0.71384233,  0.79802097],
       [ 0.71384233,  1.00633524,  0.21313767],
       [ 0.79802097,  0.21313767,  0.98519017]])
class pgmpy.sampling.HMC.HamiltonianMCDA(model, grad_log_pdf=None, simulate_dynamics=<class 'pgmpy.sampling.base.LeapFrog'>, delta=0.65)

Class for performing sampling from continuous models using Hamiltonian Monte Carlo with dual averaging for adaptation of the stepsize parameter.

References

Matthew D. Hoffman, Andrew Gelman, The No-U-Turn Sampler: Adaptively Setting Path Lengths in Hamiltonian Monte Carlo. Journal of Machine Learning Research 15 (2014) 1351-1381 Algorithm 5 : Hamiltonian Monte Carlo with dual averaging

generate_sample(initial_pos, num_adapt, num_samples, trajectory_length, stepsize=None)

Returns a generator object that yields one Hamiltonian Monte Carlo sample per iteration.

Parameters:

initial_pos: A 1d array like object

Vector representing the parameter position, i.e. the starting state of the Markov chain.

num_adapt: int

The number of iterations to run the adaptation of stepsize

num_samples: int

Number of samples to be generated

trajectory_length: int or float

Target trajectory length, stepsize * L, where L is the number of steps taken per HMC iteration to propose new values of position and momentum, and stepsize is the step size.

stepsize: float, defaults to None

The stepsize for proposing new values of position and momentum in simulate_dynamics. If None, a suitable value is chosen automatically.

Returns:

generator: yields a numpy.array object per sample

Examples

>>> from pgmpy.sampling import HamiltonianMCDA as HMCda, GradLogPDFGaussian as GLPG, LeapFrog
>>> from pgmpy.factors.continuous import GaussianDistribution as JGD
>>> import numpy as np
>>> mean = np.array([1, 1])
>>> covariance = np.array([[1, 0.7], [0.7, 3]])
>>> model = JGD(['x', 'y'], mean, covariance)
>>> sampler = HMCda(model=model, grad_log_pdf=GLPG, simulate_dynamics=LeapFrog)
>>> gen_samples = sampler.generate_sample(np.array([1, 1]), num_adapt=10000,
...                                       num_samples = 10000, trajectory_length=2, stepsize=None)
>>> samples_array = np.array([sample for sample in gen_samples])
>>> np.cov(samples_array.T)
array([[ 0.98432155,  0.69517394],
       [ 0.69517394,  2.95449533]])
sample(initial_pos, num_adapt, num_samples, trajectory_length, stepsize=None, return_type='dataframe')

Method to return samples using Hamiltonian Monte Carlo

Parameters:

initial_pos: A 1d array like object

Vector representing the parameter position, i.e. the starting state of the Markov chain.

num_adapt: int

The number of iterations to run the adaptation of stepsize

num_samples: int

Number of samples to be generated

trajectory_length: int or float

Target trajectory length, stepsize * L, where L is the number of steps taken per HMC iteration and stepsize is the step size used to discretize the dynamics.

stepsize: float, defaults to None

The stepsize for proposing new values of position and momentum in simulate_dynamics. If None, a suitable value is chosen automatically.

return_type: string (dataframe | recarray)

Return type for samples: either ‘dataframe’ or ‘recarray’. Defaults to ‘dataframe’.

Returns:

sampled: A pandas.DataFrame or a numpy.recarray object, depending on the return_type argument

Examples

>>> from pgmpy.sampling import HamiltonianMCDA as HMCda, GradLogPDFGaussian as GLPG, LeapFrog
>>> from pgmpy.factors.continuous import GaussianDistribution as JGD
>>> import numpy as np
>>> mean = np.array([1, 1])
>>> covariance = np.array([[1, 0.7], [0.7, 3]])
>>> model = JGD(['x', 'y'], mean, covariance)
>>> sampler = HMCda(model=model, grad_log_pdf=GLPG, simulate_dynamics=LeapFrog)
>>> samples = sampler.sample(np.array([1, 1]), num_adapt=10000, num_samples = 10000,
...                          trajectory_length=2, stepsize=None, return_type='recarray')
>>> samples_array = np.array([samples[var_name] for var_name in model.variables])
>>> np.cov(samples_array)
array([[ 0.98432155,  0.66517394],
       [ 0.66517394,  2.95449533]])

No U-Turn Sampler

class pgmpy.sampling.NUTS.NoUTurnSampler(model, grad_log_pdf, simulate_dynamics=<class 'pgmpy.sampling.base.LeapFrog'>)

Class for performing sampling from continuous models using the No-U-Turn Sampler (a variant of Hamiltonian Monte Carlo).

References

Matthew D. Hoffman, Andrew Gelman, The No-U-Turn Sampler: Adaptively Setting Path Lengths in Hamiltonian Monte Carlo. Journal of Machine Learning Research 15 (2014) 1351-1381 Algorithm 3 : Efficient No-U-Turn Sampler

generate_sample(initial_pos, num_samples, stepsize=None)

Returns a generator object that yields one sample per iteration.

Parameters:

initial_pos: A 1d array like object

Vector representing the parameter position, i.e. the starting state of the Markov chain.

num_samples: int

Number of samples to be generated

stepsize: float, defaults to None

The stepsize for proposing new values of position and momentum in simulate_dynamics. If None, a suitable value is chosen automatically.

Returns:

generator: yields a numpy.array object per sample

Examples

>>> from pgmpy.sampling import NoUTurnSampler as NUTS, GradLogPDFGaussian
>>> from pgmpy.factors.continuous import GaussianDistribution as JGD
>>> import numpy as np
>>> mean = np.array([11, -6])
>>> covariance = np.array([[0.7, 0.2], [0.2, 14]])
>>> model = JGD(['x', 'y'], mean, covariance)
>>> sampler = NUTS(model=model, grad_log_pdf=GradLogPDFGaussian)
>>> samples = sampler.generate_sample(initial_pos=np.array([1, 1]), num_samples=10, stepsize=0.4)
>>> samples = np.array([sample for sample in samples])
>>> samples
array([[ 10.26357538,   0.10062725],
       [ 12.70600336,   0.63392499],
       [ 10.95523217,  -0.62079273],
       [ 10.66263031,  -4.08135962],
       [ 10.59255762,  -8.48085076],
       [  9.99860242,  -9.47096032],
       [ 10.5733564 ,  -9.83504745],
       [ 11.51302059,  -9.49919523],
       [ 11.31892143,  -8.5873259 ],
       [ 11.29008667,  -0.43809674]])
sample(initial_pos, num_samples, stepsize=None, return_type='dataframe')

Method to return samples using the No-U-Turn Sampler.

Parameters:

initial_pos: A 1d array like object

Vector representing the parameter position, i.e. the starting state of the Markov chain.

num_samples: int

Number of samples to be generated

stepsize: float, defaults to None

The stepsize for proposing new values of position and momentum in simulate_dynamics. If None, a suitable value is chosen automatically.

return_type: string (dataframe | recarray)

Return type for samples: either ‘dataframe’ or ‘recarray’. Defaults to ‘dataframe’.

Returns:

sampled: A pandas.DataFrame or a numpy.recarray object, depending on the return_type argument

Examples

>>> from pgmpy.sampling import NoUTurnSampler as NUTS, GradLogPDFGaussian, LeapFrog
>>> from pgmpy.factors.continuous import GaussianDistribution as JGD
>>> import numpy as np
>>> mean = np.array([0, 0, 0])
>>> covariance = np.array([[6, 0.7, 0.2], [0.7, 3, 0.9], [0.2, 0.9, 1]])
>>> model = JGD(['x', 'y', 'z'], mean, covariance)
>>> sampler = NUTS(model=model, grad_log_pdf=GradLogPDFGaussian, simulate_dynamics=LeapFrog)
>>> samples = sampler.sample(initial_pos=np.array([1, 1, 1]), num_samples=10,
...                          stepsize=0.4, return_type='dataframe')
>>> samples
          x         y         z
0  1.000000  1.000000  1.000000
1  1.760756  0.271543 -0.613309
2  1.883387  0.990745 -0.611720
3  0.980812  0.340336 -0.916283
4  0.781338  0.647220 -0.948640
5  0.040308 -1.391406  0.412201
6  1.179549 -1.450552  1.105216
7  1.100320 -1.313926  1.207815
8  1.484520 -1.349247  0.768599
9  0.934942 -1.894589  0.471772
class pgmpy.sampling.NUTS.NoUTurnSamplerDA(model, grad_log_pdf, simulate_dynamics=<class 'pgmpy.sampling.base.LeapFrog'>, delta=0.65)

Class for performing sampling from continuous models using the No-U-Turn Sampler with dual averaging for adaptation of the stepsize parameter.

References

Matthew D. Hoffman, Andrew Gelman, The No-U-Turn Sampler: Adaptively Setting Path Lengths in Hamiltonian Monte Carlo. Journal of Machine Learning Research 15 (2014) 1351-1381 Algorithm 6 : No-U-Turn Sampler with Dual Averaging

generate_sample(initial_pos, num_adapt, num_samples, stepsize=None)

Returns a generator object that yields one sample per iteration.

Parameters:

initial_pos: A 1d array like object

Vector representing the parameter position, i.e. the starting state of the Markov chain.

num_adapt: int

The number of iterations to run the adaptation of stepsize

num_samples: int

Number of samples to be generated

stepsize: float, defaults to None

The stepsize for proposing new values of position and momentum in simulate_dynamics. If None, a suitable value is chosen automatically.

Returns:

generator: yields a numpy.array object per sample

Examples

>>> from pgmpy.sampling import NoUTurnSamplerDA as NUTSda, GradLogPDFGaussian, LeapFrog
>>> from pgmpy.factors.continuous import GaussianDistribution as JGD
>>> import numpy as np
>>> mean = np.array([1, -100])
>>> covariance = np.array([[12, 4.5], [4.5, 10]])
>>> model = JGD(['a', 'b'], mean, covariance)
>>> sampler = NUTSda(model=model, grad_log_pdf=GradLogPDFGaussian, simulate_dynamics=LeapFrog)
>>> samples = sampler.generate_sample(initial_pos=np.array([12, -4]), num_adapt=10,
...                                   num_samples=10, stepsize=0.1)
>>> samples
<generator object NoUTurnSamplerDA.generate_sample at 0x7f4fed46a4c0>
>>> samples_array = np.array([sample for sample in samples])
>>> samples_array
array([[ 11.89963386,  -4.06572636],
       [ 10.3453755 ,  -7.5700289 ],
       [-26.56899659, -15.3920684 ],
       [-29.97143077, -12.0801625 ],
       [-29.97143077, -12.0801625 ],
       [-33.07960829,  -8.90440347],
       [-55.28263496, -17.31718524],
       [-55.28263496, -17.31718524],
       [-56.63440044, -16.03309364],
       [-63.880094  , -19.19981944]])
sample(initial_pos, num_adapt, num_samples, stepsize=None, return_type='dataframe')

Returns samples using the No-U-Turn Sampler with dual averaging.

Parameters:

initial_pos: A 1d array like object

Vector representing the parameter position, i.e. the starting state of the Markov chain.

num_adapt: int

The number of iterations to run the adaptation of stepsize

num_samples: int

Number of samples to be generated

stepsize: float, defaults to None

The stepsize for proposing new values of position and momentum in simulate_dynamics. If None, a suitable value is chosen automatically.

return_type: string (dataframe | recarray)

Return type for samples: either ‘dataframe’ or ‘recarray’. Defaults to ‘dataframe’.

Returns:

sampled: A pandas.DataFrame or a numpy.recarray object, depending on the return_type argument

Examples

>>> from pgmpy.sampling import NoUTurnSamplerDA as NUTSda, GradLogPDFGaussian, LeapFrog
>>> from pgmpy.factors.continuous import GaussianDistribution as JGD
>>> import numpy as np
>>> mean = np.array([10, -13])
>>> covariance = np.array([[16, -3], [-3, 13]])
>>> model = JGD(['x', 'y'], mean, covariance)
>>> sampler = NUTSda(model=model, grad_log_pdf=GradLogPDFGaussian, simulate_dynamics=LeapFrog)
>>> samples = sampler.sample(initial_pos=np.array([12, -4]), num_adapt=10, num_samples=10,
...                          stepsize=0.1, return_type='dataframe')
>>> samples
           x          y
0  12.000000  -4.000000
1  11.864821  -3.696109
2  10.546986  -4.892169
3   8.526596 -21.555793
4   8.526596 -21.555793
5  11.343194  -6.353789
6  -1.583269 -12.802931
7  12.411957 -11.704859
8  13.253336 -20.169492
9  11.295901  -7.665058