K Nearest neighbours

KNN classifies a test point according to the majority label among its \(k\) nearest neighbours in the training data, as measured by some underlying distance function \(d(x,x')\).

For \(k=1\), the label of a test point \(x^*\) is predicted to be that of its closest training point \(x_{j^*}\), i.e. \(y_{j^*}\), where

\[j^* = \operatorname{argmin}_j\, d(x^*, x_j).\]

See Chapter 14 in [Bar12] for a detailed introduction.
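
To make this rule concrete, here is a minimal from-scratch sketch of KNN prediction in plain Python. It is for illustration only and is not the Shogun API; train_X, train_y, x_star and k are hypothetical names.

from collections import Counter

def euclidean(a, b):
    # Euclidean distance between two points given as coordinate sequences
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) ** 0.5

def knn_predict(train_X, train_y, x_star, k=1):
    # sort training points by their distance to the query point x_star
    neighbours = sorted(zip(train_X, train_y), key=lambda p: euclidean(p[0], x_star))
    # majority vote over the labels of the k nearest neighbours;
    # for k=1 this is just the label of the closest training point
    votes = Counter(y for _, y in neighbours[:k])
    return votes.most_common(1)[0][0]

print(knn_predict([(0, 0), (1, 1), (5, 5)], [0, 0, 1], (4, 4), k=1))  # prints 1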

Example

Imagine we have files with training and test data. We create CDenseFeatures (here with 64-bit floats, a.k.a. RealFeatures) and CMulticlassLabels as follows:

Python:
features_train = RealFeatures(f_feats_train)
features_test = RealFeatures(f_feats_test)
labels_train = MulticlassLabels(f_labels_train)
labels_test = MulticlassLabels(f_labels_test)

Octave:
features_train = RealFeatures(f_feats_train);
features_test = RealFeatures(f_feats_test);
labels_train = MulticlassLabels(f_labels_train);
labels_test = MulticlassLabels(f_labels_test);

Java:
RealFeatures features_train = new RealFeatures(f_feats_train);
RealFeatures features_test = new RealFeatures(f_feats_test);
MulticlassLabels labels_train = new MulticlassLabels(f_labels_train);
MulticlassLabels labels_test = new MulticlassLabels(f_labels_test);

Ruby:
features_train = Modshogun::RealFeatures.new f_feats_train
features_test = Modshogun::RealFeatures.new f_feats_test
labels_train = Modshogun::MulticlassLabels.new f_labels_train
labels_test = Modshogun::MulticlassLabels.new f_labels_test

R:
features_train <- RealFeatures(f_feats_train)
features_test <- RealFeatures(f_feats_test)
labels_train <- MulticlassLabels(f_labels_train)
labels_test <- MulticlassLabels(f_labels_test)

Lua:
features_train = modshogun.RealFeatures(f_feats_train)
features_test = modshogun.RealFeatures(f_feats_test)
labels_train = modshogun.MulticlassLabels(f_labels_train)
labels_test = modshogun.MulticlassLabels(f_labels_test)

C#:
RealFeatures features_train = new RealFeatures(f_feats_train);
RealFeatures features_test = new RealFeatures(f_feats_test);
MulticlassLabels labels_train = new MulticlassLabels(f_labels_train);
MulticlassLabels labels_test = new MulticlassLabels(f_labels_test);

C++:
auto features_train = some<CDenseFeatures<float64_t>>(f_feats_train);
auto features_test = some<CDenseFeatures<float64_t>>(f_feats_test);
auto labels_train = some<CMulticlassLabels>(f_labels_train);
auto labels_test = some<CMulticlassLabels>(f_labels_test);
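
For completeness, the file handles f_feats_train, f_feats_test, f_labels_train and f_labels_test could, for instance, be created from CSV files. A Python sketch, assuming the bindings are importable as modshogun and using placeholder file names:

from modshogun import CSVFile

# placeholder file names; point these at the actual training/test data
f_feats_train = CSVFile("features_train.dat")
f_feats_test = CSVFile("features_test.dat")
f_labels_train = CSVFile("labels_train.dat")
f_labels_test = CSVFile("labels_test.dat")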

In order to run CKNN, we need to choose a distance, for example CEuclideanDistance or another sub-class of CDistance. The distance is initialized with the data we want to classify.

Python:
distance = EuclideanDistance(features_train, features_train)

Octave:
distance = EuclideanDistance(features_train, features_train);

Java:
EuclideanDistance distance = new EuclideanDistance(features_train, features_train);

Ruby:
distance = Modshogun::EuclideanDistance.new features_train, features_train

R:
distance <- EuclideanDistance(features_train, features_train)

Lua:
distance = modshogun.EuclideanDistance(features_train, features_train)

C#:
EuclideanDistance distance = new EuclideanDistance(features_train, features_train);

C++:
auto distance = some<CEuclideanDistance>(features_train, features_train);
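
For reference, CEuclideanDistance computes the standard Euclidean distance between two feature vectors,

\[d(x, x') = \sqrt{\sum_i (x_i - x'_i)^2}.\]

Initializing it with (features_train, features_train) means distances are computed among training points; the test points enter later, when the classifier is applied to them.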

Once we have chosen a distance, we create an instance of the CKNN classifier, passing it \(k\), the distance, and the training labels.

Python:
k = 3
knn = KNN(k, distance, labels_train)

Octave:
k = 3;
knn = KNN(k, distance, labels_train);

Java:
int k = 3;
KNN knn = new KNN(k, distance, labels_train);

Ruby:
k = 3
knn = Modshogun::KNN.new k, distance, labels_train

R:
k <- 3
knn <- KNN(k, distance, labels_train)

Lua:
k = 3
knn = modshogun.KNN(k, distance, labels_train)

C#:
int k = 3;
KNN knn = new KNN(k, distance, labels_train);

C++:
auto k = 3;
auto knn = some<CKNN>(k, distance, labels_train);

Then we train the KNN classifier and apply it to the test data, which here gives CMulticlassLabels.

Python:
knn.train()
labels_predict = knn.apply_multiclass(features_test)

Octave:
knn.train();
labels_predict = knn.apply_multiclass(features_test);

Java:
knn.train();
MulticlassLabels labels_predict = knn.apply_multiclass(features_test);

Ruby:
knn.train
labels_predict = knn.apply_multiclass features_test

R:
knn$train()
labels_predict <- knn$apply_multiclass(features_test)

Lua:
knn:train()
labels_predict = knn:apply_multiclass(features_test)

C#:
knn.train();
MulticlassLabels labels_predict = knn.apply_multiclass(features_test);

C++:
knn->train();
auto labels_predict = knn->apply_multiclass(features_test);

We can evaluate test performance via e.g. CMulticlassAccuracy.

Python:
eval = MulticlassAccuracy()
accuracy = eval.evaluate(labels_predict, labels_test)

Octave:
eval = MulticlassAccuracy();
accuracy = eval.evaluate(labels_predict, labels_test);

Java:
MulticlassAccuracy eval = new MulticlassAccuracy();
double accuracy = eval.evaluate(labels_predict, labels_test);

Ruby:
eval = Modshogun::MulticlassAccuracy.new
accuracy = eval.evaluate labels_predict, labels_test

R:
eval <- MulticlassAccuracy()
accuracy <- eval$evaluate(labels_predict, labels_test)

Lua:
eval = modshogun.MulticlassAccuracy()
accuracy = eval:evaluate(labels_predict, labels_test)

C#:
MulticlassAccuracy eval = new MulticlassAccuracy();
double accuracy = eval.evaluate(labels_predict, labels_test);

C++:
auto eval = some<CMulticlassAccuracy>();
auto accuracy = eval->evaluate(labels_predict, labels_test);
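
Here, accuracy is simply the fraction of test points whose predicted label matches the true one,

\[\text{accuracy} = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}[\hat{y}_i = y_i],\]

where \(n\) is the number of test points.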

References

Wikipedia: K-nearest_neighbors_algorithm

[Bar12] David Barber. Bayesian Reasoning and Machine Learning. Cambridge University Press, 2012.