BaseEstimator#

class pgmpy.estimators.BaseEstimator(data=None, state_names=None)[source]#

Bases: object

Base class for estimators in pgmpy; ParameterEstimator, StructureEstimator and StructureScore derive from this class.

Parameters:
data: pandas DataFrame object

DataFrame object where each column represents one variable. (If some values in the data are missing, the corresponding data cells should be set to numpy.nan. Note that pandas converts each column containing numpy.nan values to dtype float.)

state_names: dict (optional)

A dict indicating, for each variable, the discrete set of states (or values) that the variable can take. If unspecified, the observed values in the data set are taken to be the only possible states.
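As a rough illustration of the default described for state_names, the sketch below uses plain Python rather than pgmpy itself. The helper infer_state_names is hypothetical and not part of the pgmpy API, and sorting the observed values is an assumption of this sketch. It shows how the observed values become the state set when none is supplied, and how an explicit dict can declare a state that never occurs in the data.

```python
def infer_state_names(columns, state_names=None):
    """Hypothetical helper (not pgmpy API): choose the state set per variable."""
    state_names = state_names or {}
    result = {}
    for var, values in columns.items():
        if var in state_names:
            # Explicitly declared states win, even if some are never observed.
            result[var] = list(state_names[var])
        else:
            # Fall back to the unique observed values, skipping missing entries.
            # Sorting is an assumption made here for a stable order.
            result[var] = sorted({v for v in values if v is not None})
    return result


columns = {"A": ["a1", "a1", "a2"], "B": ["b1", "b2", "b1"]}

inferred = infer_state_names(columns)
explicit = infer_state_names(columns, {"B": ["b1", "b2", "b3"]})

print(inferred)  # states taken from the data only
print(explicit)  # "b3" is a possible state of B although it is never observed
```

With the explicit dict, any count table built over B would need a row or column for b3 holding zero counts, which is why declaring unobserved states matters.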

state_counts(variable, parents=[], weighted=False, reindex=True)[source]#

Return counts of how often each state of ‘variable’ occurred in the data. If a list of parents is provided, counting is done conditionally for each state configuration of the parents.

Parameters:
variable: string

Name of the variable for which the state count is to be done.

parents: list

Optional list of variable parents, if conditional counting is desired. The order of parents in the list is reflected in the returned DataFrame.

weighted: bool

If True, data must have a _weight column specifying the weight of the datapoint (row). If False, each datapoint has a weight of 1.

reindex: bool

If True, returns a data frame with all possible parent state combinations as the columns. If False, drops the state combinations that are not present in the data.

Returns:
state_counts: pandas.DataFrame

Table with state counts for ‘variable’

Examples

>>> import pandas as pd
>>> from pgmpy.estimators import BaseEstimator
>>> data = pd.DataFrame(
...     data={
...         "A": ["a1", "a1", "a2"],
...         "B": ["b1", "b2", "b1"],
...         "C": ["c1", "c1", "c2"],
...     }
... )
>>> estimator = BaseEstimator(data)
>>> estimator.state_counts(variable="A").values
array([[2],
       [1]])
>>> estimator.state_counts(variable="C", parents=["A", "B"]).values
array([[1., 1., 0., 0.],
       [0., 0., 1., 0.]])
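To make the counting semantics above more concrete, here is a minimal pure-Python sketch of conditional state counting. It is not pgmpy's implementation: count_states is a hypothetical function, rows are plain dicts instead of a DataFrame, and weights are passed as a list rather than read from a _weight column. It is meant only to mirror the described effect of parents, weighted and reindex.

```python
from collections import Counter
from itertools import product


def count_states(rows, variable, parents=(), weights=None, reindex=True):
    """Hypothetical sketch: count states of `variable` per parent configuration."""
    if weights is None:
        # Unweighted case: every datapoint (row) has a weight of 1.
        weights = [1] * len(rows)

    counts = Counter()
    for row, weight in zip(rows, weights):
        config = tuple(row[p] for p in parents)
        counts[(row[variable], config)] += weight

    states = sorted({row[variable] for row in rows})
    if reindex:
        # All possible parent state combinations, in the order of `parents`.
        parent_states = [sorted({row[p] for row in rows}) for p in parents]
        configs = list(product(*parent_states))
    else:
        # Only the combinations that actually occur in the data.
        configs = sorted({config for (_, config) in counts})

    # One table row per state of `variable`, one column per configuration.
    table = [[counts[(state, config)] for config in configs] for state in states]
    return states, configs, table


rows = [
    {"A": "a1", "B": "b1", "C": "c1"},
    {"A": "a1", "B": "b2", "C": "c1"},
    {"A": "a2", "B": "b1", "C": "c2"},
]

_, _, marginal = count_states(rows, "A")
_, full_configs, full = count_states(rows, "C", parents=["A", "B"])
_, seen_configs, seen = count_states(rows, "C", parents=["A", "B"], reindex=False)
_, _, weighted = count_states(rows, "A", weights=[0.5, 2, 1])

print(marginal)  # [[2], [1]]
print(full)      # [[1, 1, 0, 0], [0, 0, 1, 0]]
print(seen)      # [[1, 1, 0], [0, 0, 1]]
print(weighted)  # [[2.5], [1]]
```

The first two results match the values in the doctest above. The third shows the unobserved combination (a2, b2) being dropped when reindex is False, and the fourth shows per-row weights replacing the default weight of 1.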