SHOGUN provides "static" interfaces to Matlab(tm), R, Python and Octave, as well as a stand-alone command-line executable. The idea behind the static interfaces is to provide a simple environment that is just enough for simple experiments. For example, they allow you to train and evaluate a classifier, but not to go beyond that. If you are looking for essentially unlimited extensibility (multiple methods, such as classifiers, potentially sharing data and interacting), you might want to look at the ModularInterfaces instead.
In this tutorial we demonstrate how to use shogun to create a simple Gaussian-kernel-based support vector machine classifier. But first things first: let's start up R, python, octave or matlab and load the shogun environment.
To start SHOGUN in python, start python and type
from sg import sg
For R issue (from within R)
For octave and matlab, just make sure sg is in the path (use addpath). For the cmdline interface, just start the shogun executable.
Now in all languages
in the cmdline interface will show the help screen. If not, consult Installation on how to install shogun.
The rest of this tutorial assumes that the cmdline shogun executable is used (with hints on how things work in the other interfaces). The basic syntax is
<command> <option1> <option2> ...
where options are separated by spaces. For example,
set_kernel GAUSSIAN REAL 10 1.2
will create a Gaussian kernel that operates on real-valued features, uses a kernel cache of size 10 MB and has a kernel width of 1.2. By analogy, the equivalent command for the other interfaces (python, r, ...) would look like
sg('set_kernel', 'GAUSSIAN', 'REAL', 10, 1.2)
Note that there is little difference from the cmdline syntax: basically, only strings are marked as such and arguments are comma-separated.
We now use samples from two random Gaussians as training data:
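The mapping between the two syntaxes can be sketched in a few lines of Python. The helper name cmdline_to_sg_args is hypothetical and not part of shogun; it only illustrates the rule described above (numbers stay numbers, everything else becomes a string), assuming that plain whitespace splitting is good enough for the command at hand.

```python
def cmdline_to_sg_args(line):
    """Split a cmdline-style command into the arguments of an sg(...) call.

    Hypothetical helper for illustration only; it is not part of shogun.
    """
    args = []
    for token in line.split():
        # Try integer first, then float; anything else stays a string.
        for convert in (int, float):
            try:
                args.append(convert(token))
                break
            except ValueError:
                continue
        else:
            args.append(token)
    return tuple(args)


args = cmdline_to_sg_args("set_kernel GAUSSIAN REAL 10 1.2")
print(args)
```

With the static python interface loaded, such a tuple could then be passed on as sg(*args). Commands that take a data file on the cmdline take a matrix in the other interfaces, so this sketch does not cover them.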
set_features TRAIN ../data/fm_train_real.dat
For other interfaces, use something like the following:
sg('set_features', 'TRAIN', [ randn(2, 100)-1, randn(2,100)+1 ])
For training a supervised method like an SVM we need a labeling of the training data, which we set via
set_labels TRAIN ../data/label_train_twoclass.dat
For other interfaces, e.g. matlab/octave, use something like the following:
sg('set_labels', 'TRAIN', [ -ones(1, 100), ones(1, 100) ])
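A supervised method needs exactly one label per example, i.e. one per feature column. The following is a minimal sketch of building such toy data in plain Python; the function name make_two_gaussians, the seed and the list-of-rows layout are assumptions made for this illustration and are not part of shogun.

```python
import random


def make_two_gaussians(n_per_class, seed=0):
    """Return (features, labels) for two 2-d Gaussian clouds.

    features has 2 rows (dimensions) and 2 * n_per_class columns (examples),
    mirroring the column-wise layout of the matlab/octave call in the text.
    """
    rng = random.Random(seed)
    # First cloud is centred at (-1, -1), second cloud at (+1, +1).
    neg = [[rng.gauss(-1.0, 1.0) for _ in range(n_per_class)] for _ in range(2)]
    pos = [[rng.gauss(1.0, 1.0) for _ in range(n_per_class)] for _ in range(2)]
    features = [neg[d] + pos[d] for d in range(2)]
    # One label per column: -1 for the first cloud, +1 for the second.
    labels = [-1.0] * n_per_class + [1.0] * n_per_class
    return features, labels


features, labels = make_two_gaussians(100)
print(len(features), len(features[0]), len(labels))
```

The point of the sketch is only that the number of labels equals the number of feature columns and that each label identifies the cloud its example was drawn from.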
Now we create an SVM and set the SVM C parameter to some hopefully sane value (in real applications it needs tuning, as do the kernel parameters, here the kernel width).
new_classifier LIBSVM
c 1
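To get a feeling for what tuning the kernel width means, here is a small sketch of a Gaussian kernel in plain Python. It assumes the convention k(x, y) = exp(-||x - y||^2 / width); this convention is an assumption of the sketch, so please check the kernel documentation for the exact form shogun uses.

```python
import math


def gaussian_kernel(x, y, width):
    """Gaussian kernel under the assumed convention exp(-||x - y||^2 / width)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / width)


x = [0.0, 0.0]
y = [1.0, 1.0]
# A small width makes the similarity drop quickly with distance,
# a large width makes distant points still look similar.
print(gaussian_kernel(x, y, 1.2))
print(gaussian_kernel(x, y, 12.0))
```

A width that is too small makes every example look dissimilar to all others, while a width that is too large makes all examples look alike, which is why the width is usually tuned together with C.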
We then train our SVM:
We can now apply our classifier to unseen test data by loading some test data and classifying the examples:
set_features TEST ../data/fm_test_real.dat
out.txt = classify
In case we want to save the classifier, so that we don't have to repeat potentially time-consuming training, we can save and load it like this:
save_classifier libsvm.model
load_classifier libsvm.model LIBSVM
Other interfaces (python, r, ...) could use the load/save functions, but typically one manually obtains and restores the model parameters, like
[b,alphas]=sg('get_classifier')
sg('set_classifier', b, alphas)
in this case.
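Once the parameters have been obtained this way, they can be stored with the host language's own tools. Below is a minimal sketch for python using pickle. The helpers save_model and load_model, the file name and the placeholder values for b and alphas are all made up for this illustration; in practice b and alphas would be whatever sg('get_classifier') returned.

```python
import os
import pickle
import tempfile


def save_model(path, b, alphas):
    """Store the model parameters in a single pickle file."""
    with open(path, "wb") as f:
        pickle.dump({"b": b, "alphas": alphas}, f)


def load_model(path):
    """Read the model parameters back as a (b, alphas) pair."""
    with open(path, "rb") as f:
        model = pickle.load(f)
    return model["b"], model["alphas"]


# Placeholder values standing in for the output of sg('get_classifier').
b = 0.25
alphas = [[0.5, 0], [-0.5, 3]]

path = os.path.join(tempfile.mkdtemp(), "libsvm_params.pkl")
save_model(path, b, alphas)
restored_b, restored_alphas = load_model(path)
print(restored_b, restored_alphas)
```

The restored values could then be handed back via the set_classifier call shown above. The same kernel and training features presumably have to be set again before classifying, since this sketch stores only b and alphas.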
Finally, the complete example looks like this
and can be found together with many other examples in examples/cmdline/classifier_libsvm.sg .
For users of other interfaces we show the translated example for completeness: