SHOGUN 5.0.0

Shogun's ML functionality is currently split into feature representations, feature preprocessors, kernels, kernel normalizers, distances, classifiers, clustering algorithms, distributions, performance evaluation measures, regression methods and structured output learners. The following gives a brief overview of all the ML-related algorithms, classes and methods implemented within Shogun.

Feature Representations

Shogun supports a wide range of feature representations. Among them are the so-called simple features (cf. CSimpleFeatures), which are standard 2-D matrices; string features (cf. CStringFeatures), which, in contrast to other meanings of "string", are simply a list of vectors of arbitrary length; and sparse features (cf. CSparseFeatures), which efficiently represent sparse matrices.
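The three basic layouts can be illustrated with plain Python lists. This is a minimal sketch, not Shogun's API: the variable names, the one-example-per-column layout of the dense matrix and the `(feature_index, value)` pair format for sparse vectors are assumptions chosen for illustration only.

```python
# Illustrative layouts only; these are not Shogun's internal data structures.

# Dense ("simple") features: a 2-D matrix, here with one column per example,
# where every example has the same dimensionality.
dense = [
    [1.0, 0.0, 3.0],  # feature 0 of examples 0, 1, 2
    [0.0, 2.0, 0.0],  # feature 1 of examples 0, 1, 2
]
num_features = len(dense)
num_vectors = len(dense[0])

# String features: a list of vectors whose lengths may differ,
# e.g. DNA sequences of varying length.
strings = [list("ACGT"), list("GATTACA"), list("AC")]
lengths = [len(s) for s in strings]


def to_sparse(matrix):
    """Convert a column-per-example dense matrix into per-example sparse
    vectors that keep only the non-zero entries as (feature_index, value)."""
    n_feat, n_vec = len(matrix), len(matrix[0])
    return [
        [(i, matrix[i][j]) for i in range(n_feat) if matrix[i][j] != 0.0]
        for j in range(n_vec)
    ]


sparse = to_sparse(dense)
```

The sparse form pays off when most entries are zero: storage grows with the number of non-zeros rather than with `num_features * num_vectors`.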

Each of these feature objects supports any of the standard types, from bool to floating point:

Supported Types

Many other feature types are available. Some of them are based on the three basic feature types above, like CTOPFeatures (TOP kernel features from CHMM), CFKFeatures (Fisher kernel features from CHMM) and CRealFileFeatures (vectors fetched from a binary file). Note that all feature objects are derived from CFeatures. More complex

In addition, labels are represented in CLabels and the alphabet of a string in CAlphabet.
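The role of an alphabet for string features can be sketched as a mapping from symbols to small integer codes. This is a hypothetical plain-Python illustration, not CAlphabet's interface; the names `DNA`, `encode` and `bits_per_symbol` are invented for the example.

```python
# Hypothetical illustration of an alphabet; not Shogun's CAlphabet API.
DNA = "ACGT"


def encode(seq, alphabet=DNA):
    """Map each symbol to its index in the alphabet; reject unknown symbols."""
    index = {ch: i for i, ch in enumerate(alphabet)}
    try:
        return [index[ch] for ch in seq]
    except KeyError as err:
        raise ValueError(f"symbol {err.args[0]!r} not in alphabet") from None


# Number of bits needed to store one symbol (2 bits for 4 DNA letters).
bits_per_symbol = (len(DNA) - 1).bit_length()
codes = encode("GATTACA")
```

Knowing the alphabet size is what allows compact storage and validation of input strings.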


The aforementioned features can be preprocessed on the fly, e.g. to subtract the mean or to normalize vectors to unit norm. The following preprocessors are implemented:
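The two example transformations can be sketched as follows. This is a minimal illustration assuming one example per inner list; unlike Shogun's on-the-fly preprocessing, which is applied when vectors are accessed, this sketch applies the transformations eagerly and returns new lists. Function names are hypothetical.

```python
import math


def subtract_mean(vectors):
    """Return copies of the vectors with the per-feature mean removed."""
    n = len(vectors)
    dim = len(vectors[0])
    mean = [sum(v[d] for v in vectors) / n for d in range(dim)]
    return [[v[d] - mean[d] for d in range(dim)] for v in vectors]


def normalize_unit_norm(vectors):
    """Scale each vector to Euclidean norm 1; zero vectors are left unchanged."""
    out = []
    for v in vectors:
        norm = math.sqrt(sum(x * x for x in v))
        out.append([x / norm for x in v] if norm > 0 else list(v))
    return out


data = [[1.0, 2.0], [3.0, 4.0], [5.0, 0.0]]
centered = subtract_mean(data)     # per-feature means are 3.0 and 2.0
unit = normalize_unit_norm(data)   # e.g. [3, 4] becomes [0.6, 0.8]
```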


A multitude of classifiers is implemented in Shogun, among them several standard two-class, one-class and multi-class classifiers. Several of them are linear classifiers and SVMs. Among the fastest linear SVM classifiers are CSGD, CSVMOcas and CLibLinear, which are capable of dealing with millions of examples and features.
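Stochastic gradient descent scales to large data because each step touches a single example. The sketch below is a generic Pegasos-style SGD for the L2-regularized hinge loss, shown only to illustrate the idea; it is not Shogun's CSGD implementation, and the function names and parameter defaults are assumptions. As a simplification, the bias is handled by appending a constant feature of 1, so it is regularized along with the weights.

```python
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def train_linear_svm_sgd(X, y, lam=0.1, epochs=50, seed=0):
    """Minimize lam/2 * ||w||^2 + mean hinge loss with Pegasos-style SGD.

    X: list of feature vectors, y: labels in {-1, +1}. Returns w, whose
    last entry is the weight of an appended constant feature (the bias).
    """
    rng = random.Random(seed)
    Xa = [list(x) + [1.0] for x in X]  # append constant feature for the bias
    w = [0.0] * len(Xa[0])
    t = 0
    order = list(range(len(Xa)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = y[i] * dot(w, Xa[i])  # margin under the current w
            # Gradient step on the regularizer: shrink w by a factor (1 - 1/t).
            w = [(1.0 - eta * lam) * wj for wj in w]
            if margin < 1.0:
                # Hinge loss is active: move w towards y[i] * x[i].
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, Xa[i])]
    return w


def predict(w, x):
    return 1 if dot(w, list(x) + [1.0]) >= 0.0 else -1


X = [[2.0, 2.0], [3.0, 3.0], [2.0, 3.0], [-2.0, -2.0], [-3.0, -3.0], [-3.0, -2.0]]
y = [1, 1, 1, -1, -1, -1]
w = train_linear_svm_sgd(X, y)
```

The per-step cost is linear in the number of features, which is why such methods handle very large training sets.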

Linear Classifiers

Support Vector Machines

Vowpal Wabbit

Distance Machines


Vector Regression



Classical Distributions


Multiple Kernel Learning


Kernel Normalizers

Since several of the kernels pose numerical challenges to SVM optimizers, kernels can be "normalized", for example to have ones on the diagonal.
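One common way to obtain ones on the diagonal is to rescale by the diagonal entries, k'(x, z) = k(x, z) / sqrt(k(x, x) k(z, z)). The sketch below applies this to an explicit kernel matrix; it illustrates the technique only and is not the interface of Shogun's kernel normalizers. The polynomial kernel and all names are assumptions for the example, and the code assumes every k(x, x) is strictly positive.

```python
import math


def poly_kernel(x, z, degree=2, c=1.0):
    """Inhomogeneous polynomial kernel (x.z + c)^degree."""
    return (sum(a * b for a, b in zip(x, z)) + c) ** degree


def normalized_kernel_matrix(X, kernel):
    """Kernel matrix rescaled so every diagonal entry equals 1:
    K'[i][j] = K[i][j] / sqrt(K[i][i] * K[j][j]). Requires K[i][i] > 0."""
    K = [[kernel(a, b) for b in X] for a in X]
    diag = [K[i][i] for i in range(len(X))]
    return [
        [K[i][j] / math.sqrt(diag[i] * diag[j]) for j in range(len(X))]
        for i in range(len(X))
    ]


X = [[1.0, 2.0], [10.0, 0.0], [0.5, -1.0]]
K_raw = [[poly_kernel(a, b) for b in X] for a in X]  # diagonal: 36, 10201, 5.0625
K_norm = normalized_kernel_matrix(X, poly_kernel)
```

Without normalization the diagonal entries here differ by more than three orders of magnitude; afterwards all entries lie in [-1, 1] for a valid (positive semi-definite) kernel.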


Distance measures quantify the distance between objects. They can be used in distance machines (CDistanceMachine) such as CKNN. The following distances are implemented:
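A distance machine is parameterized by a distance, so the same classifier works with any measure. The sketch below shows a k-nearest-neighbour classifier with a pluggable Euclidean distance; it is a plain-Python illustration of the idea, not CKNN's implementation, and all names are hypothetical.

```python
import math
from collections import Counter


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_X, train_y, query, k=3, distance=euclidean):
    """Label the query by majority vote among its k nearest training examples.

    Ties between labels go to the label first encountered in distance order,
    i.e. the one held by the closest tied neighbour.
    """
    ranked = sorted(range(len(train_X)), key=lambda i: distance(train_X[i], query))
    votes = Counter(train_y[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


train_X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]]
train_y = ["a", "a", "a", "b", "b", "b"]
```

Swapping in another measure only requires passing a different `distance` function.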


Performance Measures

Performance measures assess the quality of a prediction and are implemented in CPerformanceMeasures. The following measures are implemented:
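Several common measures follow directly from the confusion counts of a two-class prediction. The sketch below computes accuracy, precision, recall and F1 for labels in {-1, +1}; it is a generic illustration, not the interface of CPerformanceMeasures, and the function name is hypothetical.

```python
def binary_measures(y_true, y_pred):
    """Accuracy, precision, recall and F1 for labels in {-1, +1} (+1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == -1 and p == -1)
    fp = sum(1 for t, p in pairs if t == -1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == -1)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


y_true = [1, 1, 1, -1, -1, -1, -1, 1]
y_pred = [1, 1, -1, -1, -1, 1, -1, 1]
m = binary_measures(y_true, y_pred)  # tp = 3, tn = 3, fp = 1, fn = 1
```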

SHOGUN Machine Learning Toolbox - Documentation