TabularCPD#
- class pgmpy.factors.discrete.TabularCPD(variable: Hashable, variable_card: int, values: list | ArrayLike, evidence: list | tuple | None = None, evidence_card: list | tuple | None = None, state_names={})[source]#
Bases: DiscreteFactor
Defines the conditional probability distribution table (CPD table).
- Parameters:
- variable: int, string (any hashable python object)
The variable whose CPD is defined.
- variable_card: integer
Cardinality/no. of states of variable
- values: 2D array, 2D list or 2D tuple
Values for the CPD table. Please refer to the example for the exact format needed.
- evidence: array-like
List of evidence variables (if any) with respect to which the CPD is defined.
- evidence_card: array-like
Cardinality/no. of states of the variables in `evidence` (if any).
- state_names: dict (default: dict())
A dictionary of the form {variable: list of states} specifying the names of possible states for each variable (variable + evidence) in the TabularCPD. The order in which the states are specified should match the order in the values array. If state_names is not specified, integer state names starting from 0 are assigned automatically.
Examples
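For a distribution of P(grade | diff, intel):

+--------+------+--------+------+------+--------+------+
| diff   | easy | easy   | easy | hard | hard   | hard |
+--------+------+--------+------+------+--------+------+
| intel  | low  | medium | high | low  | medium | high |
+--------+------+--------+------+------+--------+------+
| gradeA | 0.1  | 0.1    | 0.1  | 0.1  | 0.1    | 0.1  |
+--------+------+--------+------+------+--------+------+
| gradeB | 0.1  | 0.1    | 0.1  | 0.1  | 0.1    | 0.1  |
+--------+------+--------+------+------+--------+------+
| gradeC | 0.8  | 0.8    | 0.8  | 0.8  | 0.8    | 0.8  |
+--------+------+--------+------+------+--------+------+

the values array should be
[[0.1, 0.1, 0.1, 0.1, 0.1, 0.1],
 [0.1, 0.1, 0.1, 0.1, 0.1, 0.1],
 [0.8, 0.8, 0.8, 0.8, 0.8, 0.8]]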
>>> from pgmpy.factors.discrete import TabularCPD
>>> cpd = TabularCPD(
...     variable="grade",
...     variable_card=3,
...     values=[
...         [0.1, 0.1, 0.1, 0.1, 0.1, 0.1],
...         [0.1, 0.1, 0.1, 0.1, 0.1, 0.1],
...         [0.8, 0.8, 0.8, 0.8, 0.8, 0.8],
...     ],
...     evidence=["diff", "intel"],
...     evidence_card=[2, 3],
...     state_names={
...         "diff": ["easy", "hard"],
...         "intel": ["low", "mid", "high"],
...         "grade": ["A", "B", "C"],
...     },
... )
>>> print(cpd)
+----------+------------+-----+------------+-------------+
| diff     | diff(easy) | ... | diff(hard) | diff(hard)  |
+----------+------------+-----+------------+-------------+
| intel    | intel(low) | ... | intel(mid) | intel(high) |
+----------+------------+-----+------------+-------------+
| grade(A) | 0.1        | ... | 0.1        | 0.1         |
+----------+------------+-----+------------+-------------+
| grade(B) | 0.1        | ... | 0.1        | 0.1         |
+----------+------------+-----+------------+-------------+
| grade(C) | 0.8        | ... | 0.8        | 0.8         |
+----------+------------+-----+------------+-------------+
>>> cpd.values
array([[[0.1, 0.1, 0.1],
        [0.1, 0.1, 0.1]],

       [[0.1, 0.1, 0.1],
        [0.1, 0.1, 0.1]],

       [[0.8, 0.8, 0.8],
        [0.8, 0.8, 0.8]]])
>>> cpd.variables
['grade', 'diff', 'intel']
>>> cpd.cardinality
array([3, 2, 3])
>>> cpd.variable
'grade'
>>> cpd.variable_card
3
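The column order of the values array can be worked out by enumerating the evidence state combinations with the last evidence variable varying fastest, which is the layout shown in the table above. The following is a minimal plain-Python sketch of that enumeration; the helper name `column_labels` is made up for illustration and is not part of pgmpy.

```python
from itertools import product


def column_labels(evidence, state_names):
    """Return the evidence state combination behind each column of `values`.

    Hypothetical helper (not a pgmpy function). It mirrors the table
    layout: the first evidence variable changes slowest, the last one
    changes fastest.
    """
    return list(product(*(state_names[var] for var in evidence)))


state_names = {
    "diff": ["easy", "hard"],
    "intel": ["low", "mid", "high"],
}

labels = column_labels(["diff", "intel"], state_names)
for index, combo in enumerate(labels):
    # Column `index` of every row in `values` holds P(grade | combo).
    print(index, combo)
```

Under this layout, column 3 of each row corresponds to diff = hard and intel = low.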
- copy()[source]#
Returns a copy of the TabularCPD object.
Examples
>>> from pgmpy.factors.discrete import TabularCPD
>>> cpd = TabularCPD(
...     variable="grade",
...     variable_card=2,
...     values=[[0.7, 0.6, 0.6, 0.2], [0.3, 0.4, 0.4, 0.8]],
...     evidence=["intel", "diff"],
...     evidence_card=[2, 2],
... )
>>> copy = cpd.copy()
>>> copy.variable
'grade'
>>> copy.variable_card
2
>>> copy.values
array([[[0.7, 0.6],
        [0.6, 0.2]],

       [[0.3, 0.4],
        [0.4, 0.8]]])
- static get_random(variable, evidence=None, cardinality=None, state_names={}, seed=None)[source]#
Generates a TabularCPD instance with random values on `variable` with parents/evidence `evidence`, using the cardinality/number of states given in `cardinality`.
- Parameters:
- variable: str, int or any hashable python object.
The variable on which to define the TabularCPD.
- evidence: list, array-like
A list of variable names which are the parents/evidence of variable.
- cardinality: dict (default: None)
A dict of the form {var_name: card} specifying the number of states/cardinality of each of the variables. If None, assigns each variable 2 states.
- state_names: dict (default: {})
A dict of the form {var_name: list of states} to specify the state names for the variables in the CPD. If state_names=None, integral state names starting from 0 are assigned.
- Returns:
- Random CPD: pgmpy.factors.discrete.TabularCPD
A TabularCPD object on `variable`, with `evidence` as its evidence and random values.
Examples
>>> from pgmpy.factors.discrete import TabularCPD
>>> TabularCPD.get_random(
...     variable="A", evidence=["B", "C"], cardinality={"A": 3, "B": 2, "C": 4}
... )
<TabularCPD representing P(A:3 | ...) at 0x...>
>>> TabularCPD.get_random(
...     variable="A",
...     evidence=["B", "C"],
...     cardinality={"A": 2, "B": 2, "C": 2},
...     state_names={"A": ["a1", "a2"], "B": ["b1", "b2"], "C": ["c1", "c2"]},
... )
<TabularCPD representing P(A:2 | B:2, C:2) at 0x...>
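To make the shape of such a table concrete, here is a plain-Python sketch that builds a random table whose columns each sum to 1, with a seed for reproducibility. It is only an illustration: the helper `random_table` is hypothetical, and the way pgmpy actually draws its random values is not described here and may differ.

```python
import random
from math import prod


def random_table(variable, evidence, cardinality, seed=None):
    """Build a random column-stochastic table as a list of rows.

    Hypothetical helper, not pgmpy's implementation. Rows are states of
    `variable`; columns are combinations of evidence states.
    """
    rng = random.Random(seed)
    n_rows = cardinality[variable]
    n_cols = prod(cardinality[e] for e in evidence)

    columns = []
    for _ in range(n_cols):
        # Draw strictly positive weights, then scale them to sum to 1.
        weights = [rng.random() + 1e-9 for _ in range(n_rows)]
        total = sum(weights)
        columns.append([w / total for w in weights])

    # Transpose so that each row corresponds to one state of `variable`.
    return [list(row) for row in zip(*columns)]


table = random_table("A", ["B", "C"], {"A": 3, "B": 2, "C": 4}, seed=42)
```

With these cardinalities the table has 3 rows and 2 * 4 = 8 columns, and the same seed yields the same table.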
- static get_uniform(variable, evidence=None, cardinality=None, state_names={}, seed=None)[source]#
Generates a TabularCPD instance with uniform values (i.e., every state of `variable` is equally likely, which gives 0.5 for a variable with 2 states) on `variable` with parents/evidence `evidence`, using the cardinality/number of states given in `cardinality`.
- Parameters:
- variable: str, int or any hashable python object.
The variable on which to define the TabularCPD.
- evidence: list, array-like
A list of variable names which are the parents/evidence of variable.
- cardinality: dict (default: None)
A dict of the form {var_name: card} specifying the number of states/cardinality of each of the variables. If None, assigns each variable 2 states.
- state_names: dict (default: {})
A dict of the form {var_name: list of states} to specify the state names for the variables in the CPD. If state_names=None, integral state names starting from 0 are assigned.
- Returns:
- Uniform CPD: pgmpy.factors.discrete.TabularCPD
A TabularCPD object on `variable`, with `evidence` as its evidence and all states of `variable` equally likely (0.5 each for a variable with 2 states).
Examples
>>> from pgmpy.factors.discrete import TabularCPD
>>> TabularCPD.get_uniform(
...     variable="A", evidence=["B", "C"], cardinality={"A": 3, "B": 2, "C": 4}
... )
<TabularCPD representing P(A:3 | ...) at 0x...>
>>> TabularCPD.get_uniform(
...     variable="A",
...     evidence=["B", "C"],
...     cardinality={"A": 2, "B": 2, "C": 2},
...     state_names={"A": ["a1", "a2"], "B": ["b1", "b2"], "C": ["c1", "c2"]},
... )
<TabularCPD representing P(A:2 | B:2, C:2) at 0x...>
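A plain-Python sketch of what a uniform table looks like, under the assumption that "uniform" means each state of `variable` receives probability 1/cardinality in every column. The helper `uniform_table` is hypothetical and is not pgmpy code.

```python
from math import prod


def uniform_table(variable, evidence, cardinality):
    """Build a uniform table as a list of rows.

    Hypothetical helper, not a pgmpy function. Assumes every state of
    `variable` is equally likely for each evidence combination.
    """
    n_rows = cardinality[variable]
    n_cols = prod(cardinality[e] for e in evidence)
    return [[1.0 / n_rows] * n_cols for _ in range(n_rows)]


table = uniform_table("A", ["B", "C"], {"A": 3, "B": 2, "C": 4})
binary = uniform_table("A", ["B"], {"A": 2, "B": 2})
```

For the three-state variable each entry is 1/3; only in the binary case does every entry equal 0.5.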
- get_values()[source]#
Returns the values of the CPD as a 2-D array. The order of the parents is the same as provided in evidence.
Examples
>>> from pgmpy.factors.discrete import TabularCPD
>>> cpd = TabularCPD(
...     variable="grade",
...     variable_card=3,
...     values=[[0.1, 0.1], [0.1, 0.1], [0.8, 0.8]],
...     evidence=["evi1"],
...     evidence_card=[2],
... )
>>> cpd.get_values()
array([[0.1, 0.1],
       [0.1, 0.1],
       [0.8, 0.8]])
- marginalize(variables, inplace=True)[source]#
Modifies the CPD table with marginalized values. Marginalization refers to summing out variables, so the marginalized variables no longer appear in the CPD.
- Parameters:
- variables: list, array-like
List of variables to be marginalized.
- inplace: boolean
If inplace=True, modifies the CPD itself; otherwise returns a new CPD.
Examples
>>> from pgmpy.factors.discrete import TabularCPD
>>> cpd_table = TabularCPD(
...     variable="grade",
...     variable_card=2,
...     values=[[0.7, 0.6, 0.6, 0.2], [0.3, 0.4, 0.4, 0.8]],
...     evidence=["intel", "diff"],
...     evidence_card=[2, 2],
... )
>>> cpd_table.marginalize(variables=["diff"])
>>> cpd_table.get_values()
array([[0.65, 0.4 ],
       [0.35, 0.6 ]])
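The numbers in the example above can be reproduced by hand: sum the columns that differ only in diff, then rescale each remaining column so it sums to 1. The sketch below does this in plain Python; it is an illustration consistent with the output shown, not pgmpy's code, and the variable names are chosen for this example only.

```python
# Columns follow the evidence order ["intel", "diff"], with diff varying
# fastest: (intel=0, diff=0), (intel=0, diff=1), (intel=1, diff=0), (intel=1, diff=1)
values = [
    [0.7, 0.6, 0.6, 0.2],
    [0.3, 0.4, 0.4, 0.8],
]
diff_card = 2

# Step 1: sum out diff by adding each consecutive group of diff_card columns.
summed = [
    [sum(row[i:i + diff_card]) for i in range(0, len(row), diff_card)]
    for row in values
]

# Step 2: rescale each remaining column (one per intel state) to sum to 1.
col_totals = [sum(col) for col in zip(*summed)]
marginal = [[v / t for v, t in zip(row, col_totals)] for row in summed]

# Round only for display; floating-point sums are not exact.
print([[round(v, 2) for v in row] for row in marginal])
```

Summing alone gives column totals of 2 here, which is why the second step is needed for the result to remain a valid conditional distribution over grade.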
- normalize(inplace=True)[source]#
Normalizes the CPD table. The method rescales each column of values so that it sums to 1 without changing the proportions between states.
- Parameters:
- inplace: boolean
If inplace=True, modifies the CPD itself; otherwise returns a new CPD.
Examples
>>> from pgmpy.factors.discrete import TabularCPD
>>> cpd_table = TabularCPD(
...     variable="grade",
...     variable_card=2,
...     values=[[0.7, 0.2, 0.6, 0.2], [0.4, 0.4, 0.4, 0.8]],
...     evidence=["intel", "diff"],
...     evidence_card=[2, 2],
... )
>>> cpd_table.normalize()
>>> cpd_table.get_values()
array([[0.63636364, 0.33333333, 0.6       , 0.2       ],
       [0.36363636, 0.66666667, 0.4       , 0.8       ]])
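Column normalization amounts to dividing every entry by the total of its column. A short plain-Python sketch using the same input as the example above (an illustration of the arithmetic, not pgmpy's implementation):

```python
values = [
    [0.7, 0.2, 0.6, 0.2],
    [0.4, 0.4, 0.4, 0.8],
]

# Total of each column, i.e. of each evidence combination.
col_totals = [sum(col) for col in zip(*values)]

# Divide every entry by its column total so each column sums to 1.
normalized = [[v / t for v, t in zip(row, col_totals)] for row in values]
```

The first two columns sum to 1.1 and 0.6 before normalization, so their entries change (0.7 / 1.1 is roughly 0.636), while the last two columns already sum to 1 and stay the same.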
- reduce(values, inplace=True, show_warnings=True)[source]#
Reduces the CPD table to the context of the given variable values. Reduce fixes the state of each given variable to the specified value. The reduced variables no longer appear in the CPD.
- Parameters:
- values: list, array-like
A list of tuples of the form (variable_name, variable_state).
- inplace: boolean
If inplace=True, modifies the factor itself; otherwise returns a new factor.
Examples
>>> from pgmpy.factors.discrete import TabularCPD
>>> cpd_table = TabularCPD(
...     variable="grade",
...     variable_card=2,
...     values=[[0.7, 0.6, 0.6, 0.2], [0.3, 0.4, 0.4, 0.8]],
...     evidence=["intel", "diff"],
...     evidence_card=[2, 2],
... )
>>> cpd_table.reduce(values=[("diff", 0)])
>>> cpd_table.get_values()
array([[0.7, 0.6],
       [0.3, 0.4]])
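Reducing on an evidence variable can be pictured as keeping only the columns in which that variable takes the fixed state. The plain-Python sketch below reproduces the example above this way; the names `columns`, `fixed`, and `keep` are illustrative and not pgmpy identifiers.

```python
from itertools import product

evidence = ["intel", "diff"]
evidence_card = [2, 2]
values = [
    [0.7, 0.6, 0.6, 0.2],
    [0.3, 0.4, 0.4, 0.8],
]

# Evidence state combination behind each column (last variable fastest).
columns = list(product(*(range(card) for card in evidence_card)))

# Fix diff to state 0 and keep only the matching columns.
fixed = {"diff": 0}
keep = [
    index
    for index, combo in enumerate(columns)
    if all(combo[evidence.index(var)] == state for var, state in fixed.items())
]
reduced = [[row[index] for index in keep] for row in values]
print(reduced)
```

Columns 0 and 2 are the ones with diff = 0, so the result keeps one column per state of intel.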
- reorder_parents(new_order: list, inplace: bool = True)[source]#
Returns a new CPD table reordered according to the provided parent/evidence order.
- Parameters:
- new_order: list
List giving the new ordering of the parent/evidence variables.
- inplace: boolean
If inplace=True, modifies the CPD itself; otherwise the new values are returned without affecting the old values.
Examples
Consider a CPD P(grade | diff, intel):
>>> cpd = TabularCPD(
...     variable="grade",
...     variable_card=3,
...     values=[
...         [0.1, 0.1, 0.0, 0.4, 0.2, 0.1],
...         [0.3, 0.2, 0.1, 0.4, 0.3, 0.2],
...         [0.6, 0.7, 0.9, 0.2, 0.5, 0.7],
...     ],
...     evidence=["diff", "intel"],
...     evidence_card=[2, 3],
... )
>>> print(cpd)
+----------+----------+----------+----------+----------+----------+----------+
| diff     | diff(0)  | diff(0)  | diff(0)  | diff(1)  | diff(1)  | diff(1)  |
+----------+----------+----------+----------+----------+----------+----------+
| intel    | intel(0) | intel(1) | intel(2) | intel(0) | intel(1) | intel(2) |
+----------+----------+----------+----------+----------+----------+----------+
| grade(0) | 0.1      | 0.1      | 0.0      | 0.4      | 0.2      | 0.1      |
+----------+----------+----------+----------+----------+----------+----------+
| grade(1) | 0.3      | 0.2      | 0.1      | 0.4      | 0.3      | 0.2      |
+----------+----------+----------+----------+----------+----------+----------+
| grade(2) | 0.6      | 0.7      | 0.9      | 0.2      | 0.5      | 0.7      |
+----------+----------+----------+----------+----------+----------+----------+
>>> cpd.values
array([[[0.1, 0.1, 0. ],
        [0.4, 0.2, 0.1]],

       [[0.3, 0.2, 0.1],
        [0.4, 0.3, 0.2]],

       [[0.6, 0.7, 0.9],
        [0.2, 0.5, 0.7]]])
>>> cpd.variables
['grade', 'diff', 'intel']
>>> cpd.cardinality
array([3, 2, 3])
>>> cpd.variable
'grade'
>>> cpd.variable_card
3
>>> cpd.reorder_parents(new_order=["intel", "diff"])
array([[0.1, 0.4, 0.1, 0.2, 0. , 0.1],
       [0.3, 0.4, 0.2, 0.3, 0.1, 0.2],
       [0.6, 0.2, 0.7, 0.5, 0.9, 0.7]])
>>> print(cpd)
+----------+----------+----------+----------+----------+----------+----------+
| intel    | intel(0) | intel(0) | intel(1) | intel(1) | intel(2) | intel(2) |
+----------+----------+----------+----------+----------+----------+----------+
| diff     | diff(0)  | diff(1)  | diff(0)  | diff(1)  | diff(0)  | diff(1)  |
+----------+----------+----------+----------+----------+----------+----------+
| grade(0) | 0.1      | 0.4      | 0.1      | 0.2      | 0.0      | 0.1      |
+----------+----------+----------+----------+----------+----------+----------+
| grade(1) | 0.3      | 0.4      | 0.2      | 0.3      | 0.1      | 0.2      |
+----------+----------+----------+----------+----------+----------+----------+
| grade(2) | 0.6      | 0.2      | 0.7      | 0.5      | 0.9      | 0.7      |
+----------+----------+----------+----------+----------+----------+----------+
>>> cpd.values
array([[[0.1, 0.4],
        [0.1, 0.2],
        [0. , 0.1]],

       [[0.3, 0.4],
        [0.2, 0.3],
        [0.1, 0.2]],

       [[0.6, 0.2],
        [0.7, 0.5],
        [0.9, 0.7]]])
>>> cpd.variables
['grade', 'intel', 'diff']
>>> cpd.cardinality
array([3, 3, 2])
>>> cpd.variable
'grade'
>>> cpd.variable_card
3
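Reordering the parents is a permutation of the columns of the 2-D table: each column keeps its values but moves to the position dictated by the new evidence order. The plain-Python sketch below derives that permutation for the example above; it illustrates the idea only, and the names used are not pgmpy identifiers.

```python
from itertools import product

old_order = ["diff", "intel"]
new_order = ["intel", "diff"]
card = {"diff": 2, "intel": 3}
values = [
    [0.1, 0.1, 0.0, 0.4, 0.2, 0.1],
    [0.3, 0.2, 0.1, 0.4, 0.3, 0.2],
    [0.6, 0.7, 0.9, 0.2, 0.5, 0.7],
]

# State combination behind each column under the old evidence order.
old_columns = list(product(*(range(card[var]) for var in old_order)))

# For each column of the new layout, find where it sat in the old layout.
permutation = []
for combo in product(*(range(card[var]) for var in new_order)):
    assignment = dict(zip(new_order, combo))
    old_combo = tuple(assignment[var] for var in old_order)
    permutation.append(old_columns.index(old_combo))

reordered = [[row[index] for index in permutation] for row in values]
```

Here the permutation is [0, 3, 1, 4, 2, 5], which turns the first row into [0.1, 0.4, 0.1, 0.2, 0.0, 0.1], matching the array returned in the example.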
- to_csv(filename: str | PathLike)[source]#
Exports the CPD to a CSV file.
Examples
>>> from pgmpy.example_models import load_model
>>> model = load_model("bnlearn/alarm")
>>> cpd = model.get_cpds(node="SAO2")
>>> cpd.to_csv(filename="sao2.csv")
- to_dataframe()[source]#
Exports the CPD as a pandas dataframe.
Examples
>>> from pgmpy.example_models import load_model
>>> model = load_model("bnlearn/insurance")
>>> cpd = model.get_cpds(node="ThisCarCost")
>>> df = cpd.to_dataframe()
>>> df.query(
...     "CarValue=='FiftyThou' and Theft == 'True'"
... )
ThisCarCost                  HundredThou  Million   TenThou  Thousand
ThisCarDam CarValue  Theft
Mild       FiftyThou True       0.950000      0.0  0.020000  0.030000
Moderate   FiftyThou True       0.998000      0.0  0.001000  0.001000
None       FiftyThou True       0.950000      0.0  0.010000  0.040000
Severe     FiftyThou True       0.999998      0.0  0.000001  0.000001
>>> # Probabilities sum to one for every combination of evidence variables
>>> df.sum(axis=1)
ThisCarDam  CarValue    Theft
Mild        FiftyThou   False    1.0
                        True     1.0
            FiveThou    False    1.0
                        True     1.0
            Million     False    1.0
                        True     1.0
            TenThou     False    1.0
                        True     1.0
            TwentyThou  False    1.0
                        True     1.0
Moderate    FiftyThou   False    1.0
                        True     1.0
            FiveThou    False    1.0
                        True     1.0
            Million     False    1.0
                        True     1.0
            TenThou     False    1.0
                        True     1.0
            TwentyThou  False    1.0
                        True     1.0
None        FiftyThou   False    1.0
                        True     1.0
            FiveThou    False    1.0
                        True     1.0
            Million     False    1.0
                        True     1.0
            TenThou     False    1.0
                        True     1.0
            TwentyThou  False    1.0
                        True     1.0
Severe      FiftyThou   False    1.0
                        True     1.0
            FiveThou    False    1.0
                        True     1.0
            Million     False    1.0
                        True     1.0
            TenThou     False    1.0
                        True     1.0
            TwentyThou  False    1.0
                        True     1.0
dtype: float64
- to_factor()[source]#
Returns an equivalent factor with the same variables, cardinality, and values as the CPD. Since a factor does not distinguish between conditional and non-conditional distributions, the evidence information is lost.
Examples
>>> from pgmpy.factors.discrete import TabularCPD
>>> cpd = TabularCPD(
...     variable="grade",
...     variable_card=3,
...     values=[[0.1, 0.1], [0.1, 0.1], [0.8, 0.8]],
...     evidence=["evi1"],
...     evidence_card=[2],
... )
>>> factor = cpd.to_factor()
>>> factor
<DiscreteFactor representing phi(grade:3, evi1:2) at 0x...>