Gaussian Naive Bayes

Gaussian Naive Bayes classifies data according to how well it aligns with the Gaussian distributions of several different classes.

The probability that some feature \(x_{i}\) in the feature vector \(x\) belongs to class \(c\), \(p(x_{i}|c)\), is given by

\[p(x_{i}|c)=\frac{1}{\sqrt{2\pi\sigma_{i,c}^{2}}}\exp \left(-\frac{(x_{i}-\mu_{i,c})^{2}}{2\sigma_{i,c}^{2}} \right)\]

where \(\mu_{i,c}\) and \(\sigma_{i,c}^{2}\) are the mean and variance of feature \(i\) over the training examples of class \(c\).
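To make the formula concrete, here is a minimal sketch in plain Python that evaluates this class-conditional Gaussian density for a single feature (the feature value, mean, and variance below are made-up illustration values, not part of the Shogun example):

```python
import math

def gaussian_likelihood(x_i, mu, var):
    """Evaluate the Gaussian density p(x_i | c) for one feature,
    given the class-conditional mean mu and variance var."""
    return math.exp(-(x_i - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Hypothetical values: feature value 1.0, class mean 0.0, class variance 1.0
p = gaussian_likelihood(1.0, 0.0, 1.0)  # exp(-1/2) / sqrt(2*pi)
```

The density is largest when \(x_{i}\) is close to the class mean, which is exactly what makes the product over features favour the best-matching class.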

For each vector, the Gaussian Naive Bayes classifier chooses the class \(c\) which the vector most likely belongs to, given by

\[\operatorname*{arg\,max}_c \; p(c)\prod_{i}p(x_{i}|c)\]

See Chapter 10 in [Bar12] for a detailed introduction.
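Putting the two formulas together, a toy Gaussian Naive Bayes classifier can be sketched in plain Python as follows. This is only an illustration of the math with hypothetical one-dimensional data; Shogun's CGaussianNaiveBayes, used in the example below, estimates the per-class means, variances, and priors from training data internally. Log-probabilities are used here for numerical stability, which leaves the argmax unchanged:

```python
import math
from collections import defaultdict

def fit(X, y):
    """Estimate class priors p(c) and per-feature Gaussian parameters
    (mean, variance) for each class from training data."""
    by_class = defaultdict(list)
    for xi, yi in zip(X, y):
        by_class[yi].append(xi)
    n = len(X)
    params = {}
    for c, rows in by_class.items():
        prior = len(rows) / n
        cols = list(zip(*rows))
        means = [sum(col) / len(rows) for col in cols]
        variances = [sum((v - m) ** 2 for v in col) / len(rows)
                     for col, m in zip(cols, means)]
        params[c] = (prior, means, variances)
    return params

def predict(params, x):
    """Choose argmax_c p(c) * prod_i p(x_i | c), computed in log space."""
    def log_score(c):
        prior, means, variances = params[c]
        s = math.log(prior)
        for x_i, m, v in zip(x, means, variances):
            s += -0.5 * math.log(2 * math.pi * v) - (x_i - m) ** 2 / (2 * v)
        return s
    return max(params, key=log_score)

# Hypothetical 1-D training data: class 0 clusters around -2, class 1 around +2
X = [[-2.2], [-1.9], [-2.1], [2.0], [1.8], [2.2]]
y = [0, 0, 0, 1, 1, 1]
params = fit(X, y)
label = predict(params, [1.5])  # a point near +2 is assigned to class 1
```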

Example

Imagine we have files with training and test data. We create CDenseFeatures (here 64-bit floats aka RealFeatures) and CMulticlassLabels as

features_train = RealFeatures(f_feats_train)
features_test = RealFeatures(f_feats_test)
labels_train = MulticlassLabels(f_labels_train)
features_train = RealFeatures(f_feats_train);
features_test = RealFeatures(f_feats_test);
labels_train = MulticlassLabels(f_labels_train);
RealFeatures features_train = new RealFeatures(f_feats_train);
RealFeatures features_test = new RealFeatures(f_feats_test);
MulticlassLabels labels_train = new MulticlassLabels(f_labels_train);
features_train = Modshogun::RealFeatures.new f_feats_train
features_test = Modshogun::RealFeatures.new f_feats_test
labels_train = Modshogun::MulticlassLabels.new f_labels_train
features_train <- RealFeatures(f_feats_train)
features_test <- RealFeatures(f_feats_test)
labels_train <- MulticlassLabels(f_labels_train)
features_train = modshogun.RealFeatures(f_feats_train)
features_test = modshogun.RealFeatures(f_feats_test)
labels_train = modshogun.MulticlassLabels(f_labels_train)
RealFeatures features_train = new RealFeatures(f_feats_train);
RealFeatures features_test = new RealFeatures(f_feats_test);
MulticlassLabels labels_train = new MulticlassLabels(f_labels_train);
auto features_train = some<CDenseFeatures<float64_t>>(f_feats_train);
auto features_test = some<CDenseFeatures<float64_t>>(f_feats_test);
auto labels_train = some<CMulticlassLabels>(f_labels_train);

We create an instance of the CGaussianNaiveBayes classifier, passing it the training data and labels.

gnb = GaussianNaiveBayes(features_train, labels_train)
gnb = GaussianNaiveBayes(features_train, labels_train);
GaussianNaiveBayes gnb = new GaussianNaiveBayes(features_train, labels_train);
gnb = Modshogun::GaussianNaiveBayes.new features_train, labels_train
gnb <- GaussianNaiveBayes(features_train, labels_train)
gnb = modshogun.GaussianNaiveBayes(features_train, labels_train)
GaussianNaiveBayes gnb = new GaussianNaiveBayes(features_train, labels_train);
auto gnb = some<CGaussianNaiveBayes>(features_train, labels_train);

Then we train the Gaussian Naive Bayes classifier and apply it to the test data, which here gives CMulticlassLabels

gnb.train()
labels_predict = gnb.apply_multiclass(features_test)
gnb.train();
labels_predict = gnb.apply_multiclass(features_test);
gnb.train();
MulticlassLabels labels_predict = gnb.apply_multiclass(features_test);
gnb.train
labels_predict = gnb.apply_multiclass features_test
gnb$train()
labels_predict <- gnb$apply_multiclass(features_test)
gnb:train()
labels_predict = gnb:apply_multiclass(features_test)
gnb.train();
MulticlassLabels labels_predict = gnb.apply_multiclass(features_test);
gnb->train();
auto labels_predict = gnb->apply_multiclass(features_test);