Nyström Kernel Ridge Regression

The Nyström method is a technique for reducing the computational load of kernel methods by replacing the full kernel matrix with a low-rank approximation.

The approximation is achieved by projecting the problem onto a subset of \(m\) training points, resulting in a smaller linear system that is cheaper to solve. When applied to ridge regression, this yields the approximate solution

\[{\bf \alpha} = (\tau K_{m,m} + K_{m,n}K_{n,m})^+K_{m,n} {\bf y}\]

where \(K_{n,m}\) is the submatrix of the kernel matrix \(K\) containing all \(n\) rows and the \(m\) columns corresponding to the \(m\) chosen training examples, \(K_{m,n}\) is its transpose, and \(K_{m,m}\) is the submatrix with the \(m\) rows and columns corresponding to the chosen training examples. \((\cdot)^+\) denotes the Moore-Penrose pseudoinverse. This corresponds to the low-rank approximation \(K \approx K_{n,m}K_{m,m}^+K_{m,n}\). The computational complexity is \(O(m^3 + m^2n)\) versus \(O(n^3)\) for ordinary kernel ridge regression, and the memory requirement is \(O(m^2 + nm)\) versus \(O(n^2)\).

Several ways to subsample the columns of the kernel matrix have been proposed. The default method is to subsample columns uniformly at random.
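To make the algebra above concrete, here is a minimal NumPy sketch of the approximate solve (plain Python for illustration only, not the Shogun API; the synthetic data, the Gaussian kernel helper and all variable names are assumptions):

import numpy as np

def gaussian_kernel(A, B, width=1.0):
    # pairwise Gaussian (RBF) kernel between rows of A and rows of B
    sq_dist = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-sq_dist / width)

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))                     # n = 1000 training points
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=1000)  # synthetic targets

n, m, tau = X.shape[0], 100, 0.001
cols = rng.choice(n, size=m, replace=False)        # uniform column subsampling

K_nm = gaussian_kernel(X, X[cols])                 # K_{n,m}: n x m
K_mm = K_nm[cols]                                  # K_{m,m}: m x m

# alpha = (tau K_{m,m} + K_{m,n} K_{n,m})^+ K_{m,n} y
alpha = np.linalg.pinv(tau * K_mm + K_nm.T @ K_nm) @ (K_nm.T @ y)

# predictions only require kernel evaluations against the m chosen points
X_test = rng.normal(size=(10, 5))
y_pred = gaussian_kernel(X_test, X[cols]) @ alpha

Note that only \(K_{n,m}\) and \(K_{m,m}\) are ever formed, which is the source of the \(O(m^2 + nm)\) memory requirement quoted above.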

The Nyström method has been shown empirically to give accurate results while substantially reducing runtime and memory use.

See [WS01] or [RCR15] for more details.

Example

The API is very similar to that of CKernelRidgeRegression. We create CDenseFeatures (here 64-bit floats, aka RealFeatures) and CRegressionLabels as

features_train = RealFeatures(f_feats_train)
features_test = RealFeatures(f_feats_test)
labels_train = RegressionLabels(f_labels_train)
labels_test = RegressionLabels(f_labels_test)
features_train = RealFeatures(f_feats_train);
features_test = RealFeatures(f_feats_test);
labels_train = RegressionLabels(f_labels_train);
labels_test = RegressionLabels(f_labels_test);
RealFeatures features_train = new RealFeatures(f_feats_train);
RealFeatures features_test = new RealFeatures(f_feats_test);
RegressionLabels labels_train = new RegressionLabels(f_labels_train);
RegressionLabels labels_test = new RegressionLabels(f_labels_test);
features_train = Modshogun::RealFeatures.new f_feats_train
features_test = Modshogun::RealFeatures.new f_feats_test
labels_train = Modshogun::RegressionLabels.new f_labels_train
labels_test = Modshogun::RegressionLabels.new f_labels_test
features_train <- RealFeatures(f_feats_train)
features_test <- RealFeatures(f_feats_test)
labels_train <- RegressionLabels(f_labels_train)
labels_test <- RegressionLabels(f_labels_test)
features_train = modshogun.RealFeatures(f_feats_train)
features_test = modshogun.RealFeatures(f_feats_test)
labels_train = modshogun.RegressionLabels(f_labels_train)
labels_test = modshogun.RegressionLabels(f_labels_test)
RealFeatures features_train = new RealFeatures(f_feats_train);
RealFeatures features_test = new RealFeatures(f_feats_test);
RegressionLabels labels_train = new RegressionLabels(f_labels_train);
RegressionLabels labels_test = new RegressionLabels(f_labels_test);
auto features_train = some<CDenseFeatures<float64_t>>(f_feats_train);
auto features_test = some<CDenseFeatures<float64_t>>(f_feats_test);
auto labels_train = some<CRegressionLabels>(f_labels_train);
auto labels_test = some<CRegressionLabels>(f_labels_test);

Choose an appropriate CKernel and instantiate it. Here we use a CGaussianKernel.

width = 1.0
kernel = GaussianKernel(features_train, features_train, width)
width = 1.0;
kernel = GaussianKernel(features_train, features_train, width);
double width = 1.0;
GaussianKernel kernel = new GaussianKernel(features_train, features_train, width);
width = 1.0
kernel = Modshogun::GaussianKernel.new features_train, features_train, width
width <- 1.0
kernel <- GaussianKernel(features_train, features_train, width)
width = 1.0
kernel = modshogun.GaussianKernel(features_train, features_train, width)
double width = 1.0;
GaussianKernel kernel = new GaussianKernel(features_train, features_train, width);
auto width = 1.0;
auto kernel = some<CGaussianKernel>(features_train, features_train, width);

We create an instance of CKRRNystrom by passing it the ridge penalty \(\tau\), the number of data columns to subsample \(m\), the kernel and the labels.

tau = 0.001
m = 100
nystrom = KRRNystrom(tau, m, kernel, labels_train)
tau = 0.001;
m = 100;
nystrom = KRRNystrom(tau, m, kernel, labels_train);
double tau = 0.001;
int m = 100;
KRRNystrom nystrom = new KRRNystrom(tau, m, kernel, labels_train);
tau = 0.001
m = 100
nystrom = Modshogun::KRRNystrom.new tau, m, kernel, labels_train
tau <- 0.001
m <- 100
nystrom <- KRRNystrom(tau, m, kernel, labels_train)
tau = 0.001
m = 100
nystrom = modshogun.KRRNystrom(tau, m, kernel, labels_train)
double tau = 0.001;
int m = 100;
KRRNystrom nystrom = new KRRNystrom(tau, m, kernel, labels_train);
auto tau = 0.001;
auto m = 100;
auto nystrom = some<CKRRNystrom>(tau, m, kernel, labels_train);

Then we train the regression model and apply it to test data to get the predicted CRegressionLabels.

nystrom.train()
labels_predict = nystrom.apply_regression(features_test)
nystrom.train();
labels_predict = nystrom.apply_regression(features_test);
nystrom.train();
RegressionLabels labels_predict = nystrom.apply_regression(features_test);
nystrom.train 
labels_predict = nystrom.apply_regression features_test
nystrom$train()
labels_predict <- nystrom$apply_regression(features_test)
nystrom:train()
labels_predict = nystrom:apply_regression(features_test)
nystrom.train();
RegressionLabels labels_predict = nystrom.apply_regression(features_test);
nystrom->train();
auto labels_predict = nystrom->apply_regression(features_test);

After training, we can extract \(\alpha\).

alpha = nystrom.get_alphas()
alpha = nystrom.get_alphas();
DoubleMatrix alpha = nystrom.get_alphas();
alpha = nystrom.get_alphas 
alpha <- nystrom$get_alphas()
alpha = nystrom:get_alphas()
double[] alpha = nystrom.get_alphas();
auto alpha = nystrom->get_alphas();

Finally, we can evaluate the CMeanSquaredError.

eval = MeanSquaredError()
mse = eval.evaluate(labels_predict, labels_test)
eval = MeanSquaredError();
mse = eval.evaluate(labels_predict, labels_test);
MeanSquaredError eval = new MeanSquaredError();
double mse = eval.evaluate(labels_predict, labels_test);
eval = Modshogun::MeanSquaredError.new 
mse = eval.evaluate labels_predict, labels_test
eval <- MeanSquaredError()
mse <- eval$evaluate(labels_predict, labels_test)
eval = modshogun.MeanSquaredError()
mse = eval:evaluate(labels_predict, labels_test)
MeanSquaredError eval = new MeanSquaredError();
double mse = eval.evaluate(labels_predict, labels_test);
auto eval = some<CMeanSquaredError>();
auto mse = eval->evaluate(labels_predict, labels_test);

References

[RCR15] A. Rudi, R. Camoriano, and L. Rosasco. Less is more: Nyström computational regularization. In Advances in Neural Information Processing Systems, 1657–1665. 2015.
[WS01] C. K. I. Williams and M. W. Seeger. Using the Nyström method to speed up kernel machines. In Advances in Neural Information Processing Systems, 682–688. 2001.