SHOGUN  v3.0.0
Static Interfaces

As mentioned before, SHOGUN provides interfaces to several programming languages and toolkits such as Matlab(tm), R, Python and Octave. The following sections give an overview of the static interface commands of SHOGUN. For the static interfaces we have tried to keep the command syntax consistent across all languages; where this was not possible, the subtle differences in syntax and semantics are documented for the respective toolkit. Instead of reading through all of this, we suggest having a look at the large number of examples available in the per-interface examples directories, for example examples/R or examples/python.

Overview of Static Interfaces & Testing the Installation

Static Interface Commands

Command Reference

# Overview of Static Interfaces & Testing the Installation

## Static Matlab and Octave Interface

Since Octave is nowadays up to par with Matlab, a single documentation suffices for both interfaces. It is based on Octave, but Matlab can be used synonymously.

To start SHOGUN in Octave, start octave and check that it is correctly installed by typing (let ">" be the Octave prompt)

  sg('help')


inside of Octave. This should show some help text.

## Static Python Interface

To start SHOGUN in Python, start python and check that it is correctly installed by typing (let ">" be the Python prompt)

  from sg import sg
  sg('help')


inside of Python. This should show some help text.

## Static R Interface

To fire up SHOGUN in R, make sure that SHOGUN is correctly installed in R. You can check this by typing (let ">" be the R prompt):

  > library()


inside of R. This command lists all R packages that have been installed on your system. You should see an entry like:

  sg                     The SHOGUN Machine Learning Toolbox


Once you have made sure that SHOGUN is installed correctly, you can start it via:

  > library(sg)


You will see some information about the SHOGUN core (compile options etc.). After this command, R and SHOGUN are ready to receive your commands.

In general, all SHOGUN commands are issued using the function sg(...). To invoke the SHOGUN command help, one types:

  > sg('help')


and then a help text appears giving a short description of all commands.

# Static Interface Commands

## Features

These functions transfer data from the interface to SHOGUN and back. Suppose you have a Matlab or R matrix "features" which contains your training data and you want to register this data; you simply type:

Transfer the features to SHOGUN:

• set_features
sg('set_features', 'TRAIN|TEST', features[, DNABINFILE|<ALPHABET>])
• add_features
sg('add_features', 'TRAIN|TEST', features[, DNABINFILE|<ALPHABET>])

Features can be char/byte/word/int/real valued matrices, real-valued sparse matrices, or strings (lists or cell arrays of strings). When dealing with strings, an alphabet name has to be specified (DNA, RAW, ...). Use 'TRAIN' to tell SHOGUN that this is the data you want to train your classifier on, and 'TEST' for the test data.

In contrast to set_features, add_features will create a combined feature object and append the features to it. This is useful when dealing with a set of different features (real valued and strings) and multiple kernels.
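To make the data layout concrete, here is a minimal sketch (using numpy, with made-up toy data) of preparing a feature matrix and labels as the static interfaces typically expect them, with one example per column:

```python
import numpy as np

# Hypothetical toy data: 3-dimensional features, 4 examples.
# The static interfaces typically lay feature matrices out with one
# example per column (dimensions x number of examples).
features = np.array([[1.0, 2.0, 3.0, 4.0],
                     [0.5, 0.5, 1.5, 1.5],
                     [1.0, 0.0, 1.0, 0.0]])
trainlab = np.array([-1.0, -1.0, 1.0, 1.0])  # one label per example

# With the python interface installed, this data would be registered via:
#   from sg import sg
#   sg('set_features', 'TRAIN', features)
#   sg('set_labels', 'TRAIN', trainlab)
assert features.shape[1] == trainlab.shape[0]
```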

In case a single string was set using set_features, it can be "multiplexed" by sliding a window over it using

• from_position_list
sg('from_position_list', 'TRAIN|TEST', winsize, shift[, skip])
or
• obtain_from_sliding_window
sg('obtain_from_sliding_window', winsize, skip)
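The effect of these commands can be pictured as follows. This is only an illustration of the windowing idea, not SHOGUN's implementation; the helper name and the example sequence are made up:

```python
# Slide a window of length `winsize` over a single string, moving by `step`,
# producing one fixed-length substring per position -- the conceptual effect
# of obtain_from_sliding_window.
def sliding_windows(seq, winsize, step=1):
    """Return all contiguous substrings of length `winsize`, stepping by `step`."""
    return [seq[i:i + winsize] for i in range(0, len(seq) - winsize + 1, step)]

windows = sliding_windows('ACGTACGT', winsize=4, step=2)
# windows == ['ACGT', 'GTAC', 'ACGT']
```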

Delete the features previously assigned in the current SHOGUN session:

• clean_features
sg('clean_features')

Obtain the features from SHOGUN:

• get_features
[features]=sg('get_features', 'TRAIN|TEST')

One proceeds similarly when assigning labels to the training data and obtaining labels from SHOGUN. The commands

• set_labels
sg('set_labels', 'TRAIN', trainlab)
• get_labels
[labels]=sg('get_labels', 'TRAIN|TEST')

tell SHOGUN that the labels of the assigned training data reside in trainlab, and return the current labels, respectively (note that currently all data is copied into SHOGUN, so modifications to trainlab remain local to the interface).

## Kernel & Distances

Kernel- and distance-specific commands, used to create, set and obtain the kernel/distance matrix.

Creating a kernel in SHOGUN:

• set_kernel
sg('set_kernel', 'KERNELNAME', 'FEATURETYPE', CACHESIZE, PARAMETERS)
• add_kernel
sg('add_kernel', WEIGHT, 'KERNELNAME', 'FEATURETYPE', CACHESIZE, PARAMETERS)

Here KERNELNAME is the name of the kernel one wishes to use, FEATURETYPE the type of features (e.g. REAL for standard real-valued feature vectors), CACHESIZE the size of the kernel cache in megabytes, and PARAMETERS additional kernel-specific parameters.

### Supported Kernels

The following kernels are implemented in SHOGUN:

• AUC
• Chi2
• Spectrum
• Const Kernel
• User defined CustomKernel
• Diagonal Kernel
• Kernel from Distance
• Fixed Degree StringKernel
• Gaussian $$k(x,x')=e^{-\frac{||x-x'||^2}{\sigma}}$$

To work with a Gaussian kernel on real values, one issues:

sg('set_kernel', 'GAUSSIAN', 'TYPE', CACHESIZE, SIGMA)

For example:

sg('set_kernel', 'GAUSSIAN', 'REAL', 40, 1)

creates a Gaussian kernel on real values with a cache size of 40MB and a sigma value of one. Available types for the Gaussian kernel: REAL, SPARSEREAL.
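The kernel values computed here follow the formula above. As a minimal sketch (using numpy, not SHOGUN's implementation), the Gaussian kernel matrix between two sets of examples stored one per column can be computed as:

```python
import numpy as np

def gaussian_kernel(X, Y, sigma):
    """Gaussian kernel k(x, x') = exp(-||x - x'||^2 / sigma).
    X and Y hold one example per column, matching the static interfaces."""
    # Squared Euclidean distances between all pairs of columns.
    d2 = (np.sum(X**2, axis=0)[:, None]
          + np.sum(Y**2, axis=0)[None, :]
          - 2.0 * X.T @ Y)
    return np.exp(-d2 / sigma)

X = np.array([[0.0, 1.0],
              [0.0, 0.0]])            # two 2-d examples (columns)
K = gaussian_kernel(X, X, sigma=1.0)
# K[0, 0] == 1.0 (zero distance to itself), K[0, 1] == exp(-1)
```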

• Gaussian Shift Kernel
• Histogram Kernel
• Linear $$k(x,x')=x\cdot x'$$

A linear kernel is created via:

sg('set_kernel', 'LINEAR', 'TYPE', CACHESIZE)

For example:

sg('add_kernel', 1.0, 'LINEAR', 'REAL', 50)

creates a linear kernel with a cache size of 50 for real data values, with weight 1.0.

Available types for the linear kernel: BYTE, WORD, CHAR, REAL, SPARSEREAL.

• Local Alignment StringKernel
• Locality Improved StringKernel
• Polynomial Kernel $$k(x,x')=(x\cdot x')^d$$

A polynomial kernel is created via:

sg('set_kernel', 'POLY', 'TYPE', CACHESIZE, DEGREE, INHOMOGENE, NORMALIZE)

For example:

sg('add_kernel', 0.1, 'POLY', 'REAL', 50, 3, 0)

adds a homogeneous polynomial kernel of degree 3 with weight 0.1 and a 50MB cache. Available types for the polynomial kernel: REAL, CHAR, SPARSEREAL.
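Conceptually, a combined kernel built with repeated add_kernel calls evaluates to the weighted sum of its subkernels. A small numpy sketch of that idea (an illustration with made-up data, not SHOGUN's implementation):

```python
import numpy as np

X = np.array([[0.0, 1.0, 2.0],
              [1.0, 0.0, 1.0]])       # one example per column

K_linear = X.T @ X                    # linear kernel k(x,x') = x . x'
K_poly = (X.T @ X) ** 3               # homogeneous polynomial kernel, degree 3

# Mirrors: sg('add_kernel', 1.0, 'LINEAR', ...) followed by
#          sg('add_kernel', 0.1, 'POLY', ...)
K_combined = 1.0 * K_linear + 0.1 * K_poly
```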

• Salzberg Kernel
• Sigmoid Kernel

To work with a sigmoid kernel on real values, one issues:

sg('set_kernel', 'SIGMOID', 'TYPE', CACHESIZE, GAMMA, COEFF)

For example:

sg('set_kernel', 'SIGMOID', 'REAL', 40, 0.1, 0.1)

creates a sigmoid kernel on real values with a cache size of 40MB, a gamma value of 0.1 and a coefficient of 0.1. Available types for the sigmoid kernel: REAL.

• Weighted Spectrum Kernel
• Weighted Degree Kernels
• Match Kernel
• Custom Kernel

Assign a user-defined custom kernel, for which either only the upper triangle may be given (DIAG), the FULL matrix (FULL), or the full matrix which is then internally stored as an upper triangle (FULL2DIAG).

• set_custom_kernel
sg('set_custom_kernel', kernelmatrix, 'DIAG|FULL|FULL2DIAG')
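The DIAG/FULL2DIAG options exploit the fact that a symmetric kernel matrix is fully determined by its upper triangle. A numpy sketch of packing and unpacking the triangle (an illustration of the storage idea, not SHOGUN's internal code):

```python
import numpy as np

# A symmetric (hypothetical) kernel matrix.
K_full = np.array([[1.0, 0.5, 0.2],
                   [0.5, 1.0, 0.7],
                   [0.2, 0.7, 1.0]])

iu = np.triu_indices(3)
upper = K_full[iu]                 # packed upper triangle (what DIAG stores)

# Rebuild the full matrix from the triangle, without double-counting the diagonal.
K_rebuilt = np.zeros((3, 3))
K_rebuilt[iu] = upper
K_rebuilt = K_rebuilt + K_rebuilt.T - np.diag(np.diag(K_rebuilt))
assert np.allclose(K_rebuilt, K_full)
```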

The get_kernel_matrix and get_distance_matrix commands return the kernel or distance matrix for the current problem.

• get_distance_matrix
[D]=sg('get_distance_matrix', 'TRAIN|TEST')
• get_kernel_matrix
[K]=sg('get_kernel_matrix', 'TRAIN|TEST')

K and D refer to matrix objects.

## SVM

• new_classifier Creates a new classifier (e.g. SVM instance).
• train_classifier Starts the training of the SVM on the assigned features and kernels.

The get_svm command returns some properties of an SVM, such as the Lagrange multipliers alpha, the bias b and the indices of the support vectors SV (zero-based).

• get_classifier
[bias, alphas]=sg('get_svm')
• set_classifier
sg('set_classifier', bias, alphas)

These commands return a list of arguments. set_classifier may later be used (after creating an SVM classifier) to set the alphas and bias again.

The result of the classification of the test sample is obtained via:

• classify
[result]=sg('classify')
• classify_example
[result]=sg('classify_example', feature_vector_index)
where result is a vector containing the classification result for each datapoint; classify_example only obtains the output for a single example (the index is zero-based as in Python; note that Octave, Matlab and R are 1-based).
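For a kernel SVM, the value that classify computes for each test example is the standard decision function f(x) = Σᵢ αᵢ k(xᵢ, x) + b over the support vectors. A numpy sketch with made-up alpha, bias and kernel values (the kind of quantities get_svm returns):

```python
import numpy as np

def svm_decision(alphas, bias, K_test):
    """SVM outputs f(x) = sum_i alpha_i * k(x_i, x) + b.
    K_test[i, j] is the kernel between support vector i and test example j."""
    return alphas @ K_test + bias

alphas = np.array([0.5, -0.5])        # hypothetical multipliers from get_svm
bias = 0.1
K_test = np.array([[1.0, 0.2],
                   [0.2, 1.0]])       # hypothetical kernel evaluations
out = svm_decision(alphas, bias, K_test)
# out == [0.5, -0.3]: one real-valued output per test example
```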

## HMM

• get_hmm
• set_hmm
• hmm_classify
• hmm_classify_example
• hmm_likelihood
• get_viterbi_path

## POIM

• compute_poim_wd
• get_SPEC_consensus
• get_SPEC_scoring
• get_WD_consensus
• get_WD_scoring

## Utility

Miscellaneous functions.

Returns the svn version number.

• get_version
sg('get_version')

Gives you a help text.

• help
sg('help')
• help
sg('help', 'CMD')

Sets a debugging log level - useful to trace errors.

• loglevel
sg('loglevel', 'LEVEL')
LEVEL can be one of ALL, DEBUG, WARN, ERROR:
• ALL: very verbose logging output (useful only for hunting memory leaks)
• DEBUG: verbose logging output (useful for debugging).
• WARN: less logging output (useful for error search).
• ERROR: only logging output on critical errors.

For example

  > sg('loglevel', 'ALL')


enables the most verbose logging output.

Let's get started: equipped with the above information on the basic SHOGUN commands, you are now able to create your own SHOGUN applications.

# Example

Let us discuss an example:

• sg('set_features', 'TRAIN', traindat)
registers the training examples, which reside in traindat.
• sg('set_labels', 'TRAIN', trainlab)
registers the training labels.
• sg('set_kernel', 'GAUSSIAN', 'REAL', 100, 1.0)
creates a new Gaussian kernel for reals with a cache size of 100MB and width 1.
• sg('new_classifier', 'SVMLIGHT')
creates a new SVM object inside the SHOGUN core.
• sg('c', 20.0)
sets the C value of the new SVM to 20.0.
• sg('train_classifier')
attaches the data to the kernel, does some initialization and then starts the training on the sample.
• sg('set_features', 'TEST', testdat)
registers the test sample.
• out=sg('classify')
attaches the data to the kernel and classifies, returning the classification result as a vector.

# Command Reference
