Naive Bayes

class pgmpy.models.NaiveBayes.NaiveBayes(feature_vars=None, dependent_var=None)

Class to represent Naive Bayes. Naive Bayes is a special case of a Bayesian model in which the only edges run from the dependent variable to the feature variables.
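
As a brief aside, when every edge runs from the dependent variable to a feature (as in the add_edge examples on this page), the joint distribution factorizes in the usual Naive Bayes form. The symbols below are introduced here for illustration only: $Y$ stands for the dependent variable and $X_1, \dots, X_n$ for the feature variables.

```latex
\[
P(Y, X_1, \dots, X_n) = P(Y) \prod_{i=1}^{n} P(X_i \mid Y)
\]
```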

active_trail_nodes(start, observed=None)

Returns all the nodes reachable from start via an active trail.

Parameters:
  • start (Graph node)

  • observed (List of nodes (optional)) – If given, the active trail is computed assuming these nodes are observed.
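
The rule for a star-shaped model can be sketched in plain Python. This is a minimal illustration under stated assumptions, not pgmpy's implementation: the helper name active_trail_nodes_star and its arguments are hypothetical. It relies on the fact that every trail between two features passes through the dependent variable, so observing that variable blocks all such trails.

```python
def active_trail_nodes_star(dependent, features, start, observed=()):
    """Nodes reachable from `start` via an active trail in a star-shaped DAG.

    Hypothetical helper for illustration; edges are assumed to run from
    `dependent` to every node in `features`.
    """
    observed = set(observed)
    nodes = {dependent, *features}
    if dependent in observed:
        # The common parent is observed, so every feature-to-feature
        # trail is blocked and only `start` itself remains reachable.
        return {start}
    # Otherwise every unobserved node is reachable from `start`.
    return nodes - observed


features = ["b", "c", "d"]
print(active_trail_nodes_star("a", features, "a", ["b", "c"]))
print(active_trail_nodes_star("a", features, "b", ["a"]))
```

Observed nodes are left out of the result, which mirrors the convention used in the examples of this method.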

Examples

>>> from pgmpy.models import NaiveBayes
>>> model = NaiveBayes()
>>> model.add_nodes_from(['a', 'b', 'c', 'd'])
>>> model.add_edges_from([('a', 'b'), ('a', 'c'), ('a', 'd')])
>>> model.active_trail_nodes('a')
{'a', 'd', 'c', 'b'}
>>> model.active_trail_nodes('a', ['b', 'c'])
{'a', 'd'}
>>> model.active_trail_nodes('b', ['a'])
{'b'}

add_edge(u, v, *kwargs)

Add a directed edge from u to v.

The nodes u and v will be automatically added if they are not already in the graph. u will be the dependent variable (i.e. variable to be predicted) and v will be one of the features (i.e. predictors) in the model.

Parameters:
  • u (node) – The dependent variable. Can be any hashable Python object.

  • v (node) – A feature variable. Can be any hashable Python object.

Return type:

None

Examples

>>> from pgmpy.models import NaiveBayes
>>> G = NaiveBayes()
>>> G.add_nodes_from(['a', 'b', 'c'])
>>> G.add_edge('a', 'b')
>>> G.add_edge('a', 'c')
>>> G.edges()
OutEdgeView([('a', 'b'), ('a', 'c')])

add_edges_from(ebunch)

Adds edges to the model.

Each tuple of the form (u, v) in ebunch adds a new edge in the model. Since there can only be one dependent variable in a Naive Bayes model, u should be the same for each tuple in ebunch.
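
The single-parent constraint can be checked before the edges are handed to the model. The following is a sketch only: single_parent is a hypothetical helper, not a pgmpy function, and it assumes ebunch is an iterable of (u, v) tuples.

```python
def single_parent(ebunch):
    """Return the common parent u of all (u, v) edges, or raise ValueError.

    Hypothetical pre-check; not part of pgmpy.
    """
    parents = {u for u, _ in ebunch}
    if len(parents) != 1:
        # Zero parents (empty ebunch) or several distinct parents.
        raise ValueError(
            f"expected exactly one parent node, found {len(parents)}"
        )
    return parents.pop()


print(single_parent([("a", "b"), ("a", "c")]))  # a
```

Running such a check first keeps the failure close to the input that caused it.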

Parameters:

ebunch (list (array-like)) – A list of tuples of the form (u, v) representing an edge from u to v.

Return type:

None

Examples

>>> from pgmpy.models import NaiveBayes
>>> G = NaiveBayes()
>>> G.add_nodes_from(['a', 'b', 'c'])
>>> G.add_edges_from([('a', 'b'), ('a', 'c')])
>>> G.edges()
OutEdgeView([('a', 'b'), ('a', 'c')])

fit(data, parent_node=None, estimator=None)

Computes the CPD for each node from data given as a pandas DataFrame. If a variable in the data is not present in the model, it is added to the model as a node.

Parameters:
  • data (pandas DataFrame object) – A DataFrame whose column names match the variable names of the network.

  • parent_node (any hashable python object (optional)) – Parent node of the model. If not specified, a previously specified parent node is used.

  • estimator (Estimator class) – Any pgmpy estimator. If nothing is specified, the default MaximumLikelihoodEstimator is used.
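
To make the default behaviour more concrete, here is a rough sketch of maximum likelihood estimation by counting for a Naive Bayes structure. It uses only the standard library and is not how pgmpy implements fit: the function name fit_naive_bayes, the list-of-dicts input, and the dictionary output format are all assumptions made for this illustration.

```python
from collections import Counter, defaultdict


def fit_naive_bayes(rows, parent):
    """Estimate P(parent) and P(feature | parent) by relative frequencies.

    Hypothetical helper; `rows` is a list of dicts mapping variable -> state.
    """
    parent_counts = Counter(row[parent] for row in rows)
    prior = {state: count / len(rows) for state, count in parent_counts.items()}

    # Count (feature state, parent state) pairs for every feature.
    joint = defaultdict(Counter)
    for row in rows:
        for var, state in row.items():
            if var != parent:
                joint[var][(state, row[parent])] += 1

    # Normalize each joint count by the count of its parent state.
    cpds = {
        var: {key: count / parent_counts[key[1]] for key, count in counts.items()}
        for var, counts in joint.items()
    }
    return prior, cpds


rows = [
    {"A": 0, "B": 0},
    {"A": 0, "B": 1},
    {"A": 1, "B": 1},
    {"A": 1, "B": 1},
]
prior, cpds = fit_naive_bayes(rows, "A")
print(prior)      # {0: 0.5, 1: 0.5}
print(cpds["B"])  # {(0, 0): 0.5, (1, 0): 0.5, (1, 1): 1.0}
```

Each key of cpds["B"] is a (state of B, state of A) pair, so the value for (1, 1) is the estimate of P(B=1 | A=1).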

Examples

>>> import numpy as np
>>> import pandas as pd
>>> from pgmpy.models import NaiveBayes
>>> model = NaiveBayes()
>>> values = pd.DataFrame(np.random.randint(low=0, high=2, size=(1000, 5)),
...                       columns=['A', 'B', 'C', 'D', 'E'])
>>> model.fit(values, 'A')
>>> model.get_cpds()
[<TabularCPD representing P(D:2 | A:2) at 0x4b72870>,
 <TabularCPD representing P(E:2 | A:2) at 0x4bb2150>,
 <TabularCPD representing P(A:2) at 0x4bb23d0>,
 <TabularCPD representing P(B:2 | A:2) at 0x4bb24b0>,
 <TabularCPD representing P(C:2 | A:2) at 0x4bb2750>]
>>> model.edges()
OutEdgeView([('A', 'D'), ('A', 'E'), ('A', 'B'), ('A', 'C')])

local_independencies(variables)

Returns an instance of Independencies containing the local independencies of each of the variables.

Parameters:

variables (str or array like) – Variables whose local independencies are to be found.
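
For a star-shaped model the local independencies follow a simple pattern: each feature is independent of all other features given the dependent variable, and the dependent variable has no local independencies because every other node is its descendant. The sketch below illustrates this in plain Python. The helper local_independency_star is hypothetical and returns a formatted string rather than a pgmpy Independencies object.

```python
def local_independency_star(dependent, features, variable):
    """Describe the local independency of `variable` in a star-shaped DAG.

    Hypothetical helper; edges are assumed to run from `dependent`
    to every node in `features`.
    """
    others = sorted(set(features) - {variable})
    if variable == dependent or not others:
        # The parent has only descendants; a lone feature has no
        # non-descendants besides its parent.
        return None
    return f"({variable} ⟂ {', '.join(others)} | {dependent})"


print(local_independency_star("a", ["b", "c", "d"], "b"))  # (b ⟂ c, d | a)
```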

Examples

>>> from pgmpy.models import NaiveBayes
>>> model = NaiveBayes()
>>> model.add_edges_from([('a', 'b'), ('a', 'c'), ('a', 'd')])
>>> ind = model.local_independencies('b')
>>> ind
(b ⟂ d, c | a)