NaiveBayes

class pgmpy.models.NaiveBayes(*args, backend=None, **kwargs)

Bases: DiscreteBayesianNetwork

Class to represent a Naive Bayes model. Naive Bayes is a special case of a Bayesian network in which the only edges run from the dependent variable to each of the feature variables.
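Because every edge goes from the dependent variable y to a feature x_i, the joint distribution factorizes as P(y) times the product of P(x_i | y). The posterior over y is therefore proportional to that product. Below is a minimal pure-Python sketch of that computation. It is not pgmpy's inference code: the CPD values are made up, and the names `prior`, `likelihoods` and `posterior` are hypothetical.

```python
from math import prod

# Hypothetical CPDs for a binary dependent variable y and two binary features.
# The numbers are illustrative only; they are not estimated by pgmpy.
prior = {0: 0.6, 1: 0.4}  # P(y)
likelihoods = {  # likelihoods[feature][y][value] = P(feature = value | y)
    "x1": {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}},
    "x2": {0: {0: 0.9, 1: 0.1}, 1: {0: 0.5, 1: 0.5}},
}


def posterior(evidence):
    """Return P(y | evidence) as P(y) * prod_i P(x_i | y), normalized over y."""
    unnormalized = {
        y: prior[y] * prod(likelihoods[f][y][v] for f, v in evidence.items())
        for y in prior
    }
    total = sum(unnormalized.values())
    return {y: p / total for y, p in unnormalized.items()}


print(posterior({"x1": 1, "x2": 0}))
```

With evidence x1 = 1 and x2 = 0, the unnormalized scores are 0.6 * 0.3 * 0.9 = 0.162 for y = 0 and 0.4 * 0.8 * 0.5 = 0.16 for y = 1. After normalization the posterior is roughly 0.503 versus 0.497.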

active_trail_nodes(start, observed=None)

Returns all the nodes reachable from start via an active trail.

Parameters:
start: Graph node
observed: list of nodes (optional)

If given, the active trail is computed assuming these nodes are observed.
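In a Naive Bayes graph, two features are connected only through the fork feature <- parent -> feature. That trail is active exactly when the parent is unobserved. The hypothetical helper below applies this rule to the star-shaped structure only. It assumes `start` itself is not in `observed` and, like the examples that follow, does not report observed nodes as reachable. It is a sketch, not pgmpy's general implementation.

```python
def nb_active_trail_nodes(parent, features, start, observed=None):
    """Nodes reachable from `start` via an active trail in a Naive Bayes DAG.

    Hypothetical helper for a star-shaped graph with edges parent -> feature.
    Assumes `start` is not itself observed.
    """
    observed = set(observed or [])
    if parent in observed:
        # Every fork feature <- parent -> feature is blocked, and the parent
        # itself is observed, so only `start` remains.
        return {start}
    # With the parent unobserved, every fork through it is active; observed
    # features are excluded from the reachable set.
    return ({parent} | set(features)) - observed


features = ["b", "c", "d"]
print(sorted(nb_active_trail_nodes("a", features, "a", ["b", "c"])))
```

The three calls in the examples below produce the same sets when passed to this helper with parent "a" and features b, c and d.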

Examples

>>> from pgmpy.models import NaiveBayes
>>> model = NaiveBayes()
>>> model.add_nodes_from(["a", "b", "c", "d"])
>>> model.add_edges_from([("a", "b"), ("a", "c"), ("a", "d")])
>>> sorted(model.active_trail_nodes("a"))
['a', 'b', 'c', 'd']
>>> sorted(model.active_trail_nodes("a", ["b", "c"]))
['a', 'd']
>>> model.active_trail_nodes("b", ["a"])
{'b'}
add_edge(u, v, *kwargs)

Add an edge between u and v.

Nodes u and v are added automatically if they are not already in the graph. u is the dependent variable (i.e., the variable to be predicted) and v is one of the features (i.e., predictors) in the model.
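The role assignment described above can be illustrated with a small stand-alone structure. `NaiveBayesSketch` is a hypothetical class, not pgmpy code. It auto-adds both endpoints, records u as the dependent variable and v as a feature, and rejects an edge whose source differs from an already-recorded dependent variable.

```python
class NaiveBayesSketch:
    """Hypothetical stand-in illustrating add_edge(u, v) semantics; not pgmpy code."""

    def __init__(self):
        self.nodes = set()
        self.edges = []
        self.parent = None  # the dependent variable
        self.features = set()  # the predictors

    def add_edge(self, u, v):
        # Only one dependent variable is allowed, so u must match any existing one.
        if self.parent is not None and u != self.parent:
            raise ValueError("a Naive Bayes model has exactly one dependent variable")
        # Endpoints that are not yet in the graph are added automatically.
        self.nodes.update((u, v))
        self.parent = u
        self.features.add(v)
        self.edges.append((u, v))


g = NaiveBayesSketch()
g.add_edge("a", "b")
g.add_edge("a", "c")
print(g.parent, sorted(g.features))
```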

Parameters:
u, v: nodes

Nodes can be any hashable python object.

Returns:
None

Examples

>>> from pgmpy.models import NaiveBayes
>>> G = NaiveBayes()
>>> G.add_nodes_from(["a", "b", "c"])
>>> G.add_edge("a", "b")
>>> G.add_edge("a", "c")
>>> G.edges()
OutEdgeView([('a', 'b'), ('a', 'c')])
add_edges_from(ebunch)

Adds edges to the model.

Each tuple (u, v) in ebunch adds a new edge to the model. Since a Naive Bayes model has only one dependent variable, u should be the same for every tuple in ebunch.
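One way to enforce the single-dependent-variable rule for a batch is to check every tuple before adding any edge, so an invalid ebunch leaves the model unchanged. The hypothetical helper `common_parent` sketches that check. Validating the whole batch up front is a design choice of this sketch and not necessarily what pgmpy does.

```python
def common_parent(ebunch, existing_parent=None):
    """Return the single dependent variable shared by every (u, v) in ebunch.

    Hypothetical validation helper: raises ValueError if the tuples disagree
    on u, or if u differs from a parent the model already has. Returns the
    existing parent (possibly None) for an empty ebunch.
    """
    sources = {u for u, _ in ebunch}
    if existing_parent is not None:
        sources.add(existing_parent)
    if len(sources) > 1:
        raise ValueError(
            f"expected one dependent variable, got {sorted(sources, key=repr)}"
        )
    return next(iter(sources), None)


print(common_parent([("a", "b"), ("a", "c")]))
```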

Parameters:
ebunch: list (array-like)

A list of tuples of the form (u, v) representing an edge from u to v.

Returns:
None

Examples

>>> from pgmpy.models import NaiveBayes
>>> G = NaiveBayes()
>>> G.add_nodes_from(["a", "b", "c"])
>>> G.add_edges_from([("a", "b"), ("a", "c")])
>>> G.edges()
OutEdgeView([('a', 'b'), ('a', 'c')])
fit(data, parent_node=None, estimator=None)

Computes the CPD for each node from data given as a pandas DataFrame. Any variable in the data that is not already in the model is added to it as a feature, with an edge from the parent node.

Parameters:
data: pandas DataFrame object

A DataFrame whose column names are the variable names of the network.

parent_node: any hashable python object (optional)

Parent node of the model. If not specified, the previously specified parent node is used.

estimator: Estimator class

Any pgmpy estimator. If nothing is specified, the default MaximumLikelihoodEstimator is used.
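For discrete data, maximum likelihood estimates are relative frequencies. P(parent) is the normalized count of each parent state. P(feature | parent) is the count table of feature against parent, normalized within each parent state.

The sketch below computes these tables with pandas for a Naive Bayes structure. It illustrates the principle only and is not the code path of pgmpy's MaximumLikelihoodEstimator. The function name `naive_bayes_mle` is hypothetical.

```python
import pandas as pd


def naive_bayes_mle(data, parent_node):
    """Relative-frequency (maximum likelihood) estimates for a Naive Bayes model.

    Returns P(parent) as a Series and, for every other column, P(column | parent)
    as a DataFrame indexed by the column's states, with one column per parent
    state; each column sums to 1.
    """
    if parent_node not in data.columns:
        raise ValueError(f"parent node {parent_node!r} is not a column of the data")
    prior = data[parent_node].value_counts(normalize=True).sort_index()
    conditionals = {
        child: pd.crosstab(data[child], data[parent_node], normalize="columns")
        for child in data.columns
        if child != parent_node
    }
    return prior, conditionals


df = pd.DataFrame({"A": [0, 0, 1, 1], "B": [0, 1, 1, 1]})
prior, conds = naive_bayes_mle(df, "A")
print(prior)
print(conds["B"])
```

In this toy data, P(A = 0) = 0.5. P(B | A = 0) is split evenly between B = 0 and B = 1, while P(B = 1 | A = 1) = 1.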

Examples

>>> import numpy as np
>>> import pandas as pd
>>> from pgmpy.models import NaiveBayes
>>> model = NaiveBayes()
>>> values = pd.DataFrame(
...     np.random.randint(low=0, high=2, size=(1000, 5)),
...     columns=["A", "B", "C", "D", "E"],
... )
>>> model.fit(values, "A")
>>> model.get_cpds()
[<TabularCPD representing P(A:2) at 0x...>,
 <TabularCPD representing P(B:2 | A:2) at 0x...>,
 <TabularCPD representing P(C:2 | A:2) at 0x...>,
 <TabularCPD representing P(D:2 | A:2) at 0x...>,
 <TabularCPD representing P(E:2 | A:2) at 0x...>]
>>> sorted(model.edges())
[('A', 'B'), ('A', 'C'), ('A', 'D'), ('A', 'E')]
local_independencies(variables)

Returns an instance of Independencies containing the local independencies of each of the variables.

Parameters:
variables: str or array-like

Variables whose local independencies are to be found.
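The local independency reported for a feature says it is independent of the other features given the dependent variable. The sketch below checks this numerically on a reduced graph a -> b, a -> c. It builds the joint distribution from made-up binary CPDs and confirms two things: P(b, c | a) = P(b | a) P(c | a) holds, while b and c remain dependent when a is not conditioned on. All names and values are hypothetical.

```python
from itertools import product

# Hypothetical binary CPDs for a -> b, a -> c; values are illustrative only.
p_a = {0: 0.3, 1: 0.7}
p_b_given_a = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}  # p_b_given_a[a][b]
p_c_given_a = {0: {0: 0.6, 1: 0.4}, 1: {0: 0.1, 1: 0.9}}  # p_c_given_a[a][c]

# Joint distribution from the Naive Bayes factorization P(a) P(b | a) P(c | a).
joint = {
    (a, b, c): p_a[a] * p_b_given_a[a][b] * p_c_given_a[a][c]
    for a, b, c in product((0, 1), repeat=3)
}


def marginal(keep):
    """Sum the joint over every variable whose position is not in `keep`."""
    out = {}
    for assignment, p in joint.items():
        key = tuple(assignment[i] for i in keep)
        out[key] = out.get(key, 0.0) + p
    return out


pa, pb, pc = marginal((0,)), marginal((1,)), marginal((2,))
p_ab, p_ac, p_bc = marginal((0, 1)), marginal((0, 2)), marginal((1, 2))

# (b independent of c | a): P(b, c | a) == P(b | a) * P(c | a) everywhere.
conditionally_independent = all(
    abs(
        joint[(a, b, c)] / pa[(a,)]
        - (p_ab[(a, b)] / pa[(a,)]) * (p_ac[(a, c)] / pa[(a,)])
    )
    < 1e-12
    for a, b, c in joint
)
# Without conditioning on a, the fork b <- a -> c makes b and c dependent.
marginally_independent = all(
    abs(p_bc[(b, c)] - pb[(b,)] * pc[(c,)]) < 1e-12 for b, c in p_bc
)
print(conditionally_independent, marginally_independent)
```

This prints `True False`. For example, P(b = 1, c = 1) = 0.516, whereas P(b = 1) P(c = 1) = 0.59 * 0.75 = 0.4425.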

Examples

>>> from pgmpy.models import NaiveBayes
>>> model = NaiveBayes()
>>> model.add_edges_from([("a", "b"), ("a", "c"), ("a", "d")])
>>> ind = model.local_independencies("b")
>>> assertion = ind.get_assertions()[0]
>>> sorted(assertion.event1), sorted(assertion.event2), sorted(assertion.event3)
(['b'], ['c', 'd'], ['a'])