Naive Bayes
- class pgmpy.models.NaiveBayes.NaiveBayes(feature_vars=None, dependent_var=None)
Class to represent a Naive Bayes model. Naive Bayes is a special case of a Bayesian model in which the only edges run from the dependent variable to the feature variables.
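The structure described above implies that the joint distribution factorizes as P(C) multiplied by P(F_i | C) for each feature. The following is a minimal pure-Python sketch of that factorization, not pgmpy's implementation. The function name `naive_bayes_posterior`, the variable names, and all probabilities are hypothetical and chosen only for illustration.

```python
from math import prod


def naive_bayes_posterior(prior, likelihoods, evidence):
    """Return P(class | evidence) under the Naive Bayes factorization.

    prior:       {class_value: P(class_value)}
    likelihoods: {feature: {class_value: {feature_value: P(value | class)}}}
    evidence:    {feature: observed feature_value}
    """
    # Unnormalized score: P(c) times the product of P(f = value | c)
    # over every observed feature.
    scores = {
        c: p_c * prod(likelihoods[f][c][v] for f, v in evidence.items())
        for c, p_c in prior.items()
    }
    total = sum(scores.values())
    # Normalize so the scores sum to one.
    return {c: s / total for c, s in scores.items()}


# Hypothetical numbers, for illustration only.
prior = {"spam": 0.5, "ham": 0.5}
likelihoods = {
    "has_link": {
        "spam": {True: 0.8, False: 0.2},
        "ham": {True: 0.2, False: 0.8},
    },
}
posterior = naive_bayes_posterior(prior, likelihoods, {"has_link": True})
```

Because each feature depends only on the dependent variable, adding a feature adds one more factor to the product and leaves the others unchanged.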
- active_trail_nodes(start, observed=None)
Returns all the nodes reachable from start via an active trail.
- Parameters:
start (Graph node)
observed (List of nodes (optional)) – If given, the active trail is computed assuming these nodes are observed.
Examples
>>> from pgmpy.models import NaiveBayes
>>> model = NaiveBayes()
>>> model.add_nodes_from(['a', 'b', 'c', 'd'])
>>> model.add_edges_from([('a', 'b'), ('a', 'c'), ('a', 'd')])
>>> model.active_trail_nodes('a')
{'a', 'd', 'c', 'b'}
>>> model.active_trail_nodes('a', ['b', 'c'])
{'a', 'd'}
>>> model.active_trail_nodes('b', ['a'])
{'b'}
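The results in the example follow from the star shape of the graph: observing the dependent variable blocks every trail between features, while observing a feature removes only that feature. The sketch below reproduces this reasoning in plain Python. The helper `star_active_trail` is hypothetical and is not claimed to be how pgmpy computes the result.

```python
def star_active_trail(parent, features, start, observed=None):
    """Nodes reachable from `start` via an active trail in a star graph.

    Assumes every edge runs from `parent` to one of `features`.
    """
    observed = set(observed or [])
    nodes = {parent, *features}
    if parent in observed:
        # The observed common parent blocks every trail, so only the
        # start node itself remains reachable.
        return {start}
    # Otherwise every unobserved node is reachable through the parent.
    return nodes - observed


features = ["b", "c", "d"]
all_reachable = star_active_trail("a", features, "a")
some_observed = star_active_trail("a", features, "a", ["b", "c"])
parent_observed = star_active_trail("a", features, "b", ["a"])
```

These three calls mirror the three `active_trail_nodes` calls in the example above.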
- add_edge(u, v, *kwargs)
Add an edge between u and v.
The nodes u and v will be automatically added if they are not already in the graph. u will be the dependent variable (i.e. variable to be predicted) and v will be one of the features (i.e. predictors) in the model.
- Parameters:
u (nodes) – Nodes can be any hashable python object.
v (nodes) – Nodes can be any hashable python object.
- Return type:
None
Examples
>>> from pgmpy.models import NaiveBayes
>>> G = NaiveBayes()
>>> G.add_nodes_from(['a', 'b', 'c'])
>>> G.add_edge('a', 'b')
>>> G.add_edge('a', 'c')
>>> G.edges()
OutEdgeView([('a', 'b'), ('a', 'c')])
- add_edges_from(ebunch)
Adds edges to the model.
Each tuple of the form (u, v) in ebunch adds a new edge in the model. Since there can only be one dependent variable in a Naive Bayes model, u should be the same for each tuple in ebunch.
- Parameters:
ebunch (list (array-like)) – A list of tuples of the form (u, v) representing an edge from u to v.
- Return type:
None
Examples
>>> from pgmpy.models import NaiveBayes
>>> G = NaiveBayes()
>>> G.add_nodes_from(['a', 'b', 'c'])
>>> G.add_edges_from([('a', 'b'), ('a', 'c')])
>>> G.edges()
OutEdgeView([('a', 'b'), ('a', 'c')])
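The requirement that u be the same for every tuple in ebunch can be checked before any edges are added. The helper below, `common_parent`, is a hypothetical pre-check written for illustration. It is not part of pgmpy, and the library's own validation may behave differently.

```python
def common_parent(ebunch):
    """Return the single source node shared by all (u, v) edges.

    Raises ValueError if the edges name zero or several source nodes,
    since a Naive Bayes model has exactly one dependent variable.
    """
    parents = {u for u, _ in ebunch}
    if len(parents) != 1:
        raise ValueError(
            f"expected exactly one dependent variable, got {sorted(parents)}"
        )
    return parents.pop()


parent = common_parent([("a", "b"), ("a", "c")])
```

Running such a check first lets a caller reject an inconsistent edge list as a whole, before any of its edges reach the model.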
- fit(data, parent_node=None, estimator=None)
Computes the CPD for each node from the given data, supplied as a pandas DataFrame. If a variable in the data is not present in the model, it is added to the model as a node.
- Parameters:
data (pandas DataFrame object) – A DataFrame whose column names match the variable names of the network.
parent_node (any hashable python object (optional)) – Parent node of the model. If not specified, a previously specified parent node is used.
estimator (Estimator class) – Any pgmpy estimator. If nothing is specified, the default MaximumLikelihoodEstimator is used.
Examples
>>> import numpy as np
>>> import pandas as pd
>>> from pgmpy.models import NaiveBayes
>>> model = NaiveBayes()
>>> values = pd.DataFrame(np.random.randint(low=0, high=2, size=(1000, 5)),
...                       columns=['A', 'B', 'C', 'D', 'E'])
>>> model.fit(values, 'A')
>>> model.get_cpds()
[<TabularCPD representing P(D:2 | A:2) at 0x4b72870>,
 <TabularCPD representing P(E:2 | A:2) at 0x4bb2150>,
 <TabularCPD representing P(A:2) at 0x4bb23d0>,
 <TabularCPD representing P(B:2 | A:2) at 0x4bb24b0>,
 <TabularCPD representing P(C:2 | A:2) at 0x4bb2750>]
>>> model.edges()
[('A', 'D'), ('A', 'E'), ('A', 'B'), ('A', 'C')]
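For intuition about what a maximum likelihood fit produces for one feature, the sketch below estimates P(feature | parent) by counting co-occurrences and normalizing within each parent value. It is a simplified stand-in written in plain Python, not pgmpy's MaximumLikelihoodEstimator. The function `mle_conditional` and the toy rows are hypothetical.

```python
from collections import Counter, defaultdict


def mle_conditional(rows, feature, parent):
    """Estimate P(feature | parent) from a list of dict-like rows."""
    # Count how often each (parent value, feature value) pair occurs.
    joint = Counter((row[parent], row[feature]) for row in rows)
    # Count how often each parent value occurs.
    parent_totals = Counter(row[parent] for row in rows)
    cpd = defaultdict(dict)
    for (p_val, f_val), count in joint.items():
        # Relative frequency of the feature value within this parent value.
        cpd[p_val][f_val] = count / parent_totals[p_val]
    return dict(cpd)


# Toy data, for illustration only.
rows = [
    {"A": 0, "B": 0},
    {"A": 0, "B": 1},
    {"A": 0, "B": 1},
    {"A": 1, "B": 0},
]
cpd_b = mle_conditional(rows, "B", "A")
```

Feature values that never occur with a given parent value are simply absent from the result. A smoothing estimator would assign them non-zero probability instead.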
- local_independencies(variables)
Returns an instance of Independencies containing the local independencies of each of the variables.
- Parameters:
variables (str or array like) – Variables whose local independencies are to be found.
Examples
>>> from pgmpy.models import NaiveBayes
>>> model = NaiveBayes()
>>> model.add_edges_from([('a', 'b'), ('a', 'c'), ('a', 'd')])
>>> ind = model.local_independencies('b')
>>> ind
(b ⟂ d, c | a)
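The assertion in the example follows from the local Markov property: a node is independent of its non-descendants given its parents. In a star-shaped model, each feature is therefore independent of all other features given the dependent variable. The sketch below derives that statement as a plain tuple. The helper `star_local_independency` is hypothetical and does not return pgmpy's Independencies object.

```python
def star_local_independency(parent, features, variable):
    """Return (variable, independent_of, given) for a star-shaped model.

    Returns None when no local independence assertion applies, which is
    the case for the parent itself or for a model with a single feature.
    """
    if variable == parent:
        # Every other node is a descendant of the parent.
        return None
    others = sorted(set(features) - {variable})
    if not others:
        return None
    # The feature is independent of the remaining features given the parent.
    return (variable, others, parent)


assertion = star_local_independency("a", ["b", "c", "d"], "b")
```

Here `assertion` encodes the same statement as the example output: b is independent of c and d given a.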