Shogun - A Large Scale Machine Learning Toolbox
This is the official homepage of the SHOGUN machine learning toolbox.
|
The toolbox focuses on large scale kernel methods, especially
Support Vector Machines (SVMs). It provides a generic SVM
object that interfaces to several different SVM implementations, among them the
state-of-the-art OCAS, Liblinear, LibSVM, SVMLight, SVMLin and GPDT.
Each of the SVMs can be combined with a variety of kernels. The toolbox not
only provides efficient implementations of the most common kernels, such as the
Linear, Polynomial, Gaussian and Sigmoid kernels, but also comes with a number
of recent string kernels, e.g. the Locality Improved, Fisher, TOP, Spectrum and
Weighted Degree Kernel (with shifts). For the latter, the efficient
LINADD optimizations are implemented. For linear SVMs, the COFFIN framework
computes feature spaces on demand, and even allows sparse, dense and other
data types to be mixed.
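Of the string kernels named above, the Spectrum kernel is the simplest to state: it is the inner product of the k-mer count vectors of two sequences. The following is a minimal pure-Python sketch of that idea, not SHOGUN's implementation; the function names `kmer_counts` and `spectrum_kernel` are chosen here for illustration only.

```python
from collections import Counter


def kmer_counts(sequence, k):
    """Count every contiguous substring of length k in the sequence."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def spectrum_kernel(x, y, k=3):
    """Inner product of the k-mer count vectors of the strings x and y."""
    counts_x = kmer_counts(x, k)
    counts_y = kmer_counts(y, k)
    # Only k-mers present in both strings contribute to the sum.
    return sum(n * counts_y[kmer] for kmer, n in counts_x.items())


similarity = spectrum_kernel("GATTACA", "ATTACCA", k=3)
```

Sequences shorter than k have no k-mers, so the kernel value is 0 in that case.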
Furthermore, SHOGUN offers the freedom of
working with custom pre-computed kernels. One of its key features is the
combined kernel, which is constructed as a weighted linear combination
of a number of sub-kernels, each of which may work on a different
domain. An optimal sub-kernel weighting can be learned using
Multiple Kernel Learning.
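The combined kernel described above can be sketched in a few lines of plain Python. This is an illustration of the weighted sum only, not SHOGUN's API: the names `gaussian_kernel`, `kmer_kernel` and `combined_kernel` are hypothetical, the Gaussian parametrization is an assumption, and the weights are fixed by hand rather than learned by Multiple Kernel Learning. Each example is a tuple with one part per sub-kernel, so the sub-kernels can work on different domains (here a numeric vector and a string).

```python
import math
from collections import Counter


def gaussian_kernel(x, y, width=1.0):
    """Gaussian kernel on numeric vectors: exp(-||x - y||^2 / width)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared_distance / width)


def kmer_kernel(s, t, k=3):
    """Simple string kernel: inner product of k-mer counts."""
    counts_s = Counter(s[i:i + k] for i in range(len(s) - k + 1))
    counts_t = Counter(t[i:i + k] for i in range(len(t) - k + 1))
    return sum(n * counts_t[u] for u, n in counts_s.items())


def combined_kernel(sub_kernels, weights):
    """Build k(x, y) = sum_i weights[i] * sub_kernels[i](x[i], y[i])."""
    if len(sub_kernels) != len(weights):
        raise ValueError("need exactly one weight per sub-kernel")

    def kernel(x_parts, y_parts):
        return sum(
            w * k(xp, yp)
            for w, k, xp, yp in zip(weights, sub_kernels, x_parts, y_parts)
        )

    return kernel


# One numeric part and one string part per example.
kernel = combined_kernel([gaussian_kernel, kmer_kernel], [0.5, 0.5])
x = ([1.0, 2.0], "GATTACA")
y = ([1.0, 2.0], "ATTACCA")
value = kernel(x, y)
```

With non-negative weights, the weighted sum of valid kernels is again a valid kernel, which is what makes the weights a natural target for learning.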
Currently, SVM one-class, two-class and multiclass classification and regression
problems can be handled. SHOGUN also implements a number of linear methods, such
as Linear Discriminant Analysis (LDA), the Linear Programming Machine (LPM) and
(Kernel) Perceptrons, and features algorithms to train Hidden Markov Models.
The input feature objects can be dense, sparse or strings, of type
int/short/double/char, and can be converted into different feature types.
Chains of preprocessors (e.g. subtracting the mean) can be attached to
each feature object, allowing for on-the-fly pre-processing.
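The idea of a preprocessor chain attached to a feature object can be sketched as follows. This is a conceptual illustration in plain Python, not SHOGUN's interface: the `Features` class and the `add_preprocessor` and `get` methods are hypothetical names, and the sketch applies the chain to the whole data set each time the features are read, leaving the stored raw data untouched.

```python
import math


def subtract_mean(vectors):
    """Center every feature dimension to zero mean."""
    n = len(vectors)
    means = [sum(column) / n for column in zip(*vectors)]
    return [[v - m for v, m in zip(row, means)] for row in vectors]


def normalize_length(vectors):
    """Scale every vector to unit Euclidean norm (zero vectors stay as they are)."""
    result = []
    for row in vectors:
        norm = math.sqrt(sum(v * v for v in row))
        result.append([v / norm for v in row] if norm > 0 else list(row))
    return result


class Features:
    """Hypothetical dense feature container with a preprocessor chain."""

    def __init__(self, vectors):
        self._raw = vectors
        self._chain = []

    def add_preprocessor(self, preprocessor):
        """Append a function that maps a list of vectors to a list of vectors."""
        self._chain.append(preprocessor)

    def get(self):
        """Apply the chain in order on access; the raw data is never modified."""
        data = self._raw
        for preprocessor in self._chain:
            data = preprocessor(data)
        return data


features = Features([[1.0, 2.0], [3.0, 6.0]])
features.add_preprocessor(subtract_mean)
features.add_preprocessor(normalize_length)
processed = features.get()
```

Because the chain runs on access, preprocessors can be added or reordered without copying or rewriting the underlying data.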
SHOGUN is implemented in C++, interfaces to Matlab(tm), R, Octave and Python, and is proudly released as Machine Learning Open Source Software.
|
We have successfully used this toolbox to tackle the following sequence
analysis problems: Protein Super Family classification,
Splice Site Prediction, Interpreting the SVM Classifier,
Splice Form Prediction, Alternative Splicing and Promoter
Prediction. Some of them come with no less than 10
million training examples, others with 7 billion test examples.
|
Except for SVMLight, which is (C) Thorsten Joachims and follows a different
licensing scheme (cf. LICENSE.SVMLight in the tar archive), SHOGUN is licensed
under the GPL version 3 or any later version (cf. LICENSE).
|
If you use SHOGUN in your research you are kindly asked to cite the following paper:
Soeren Sonnenburg, Gunnar Raetsch, Sebastian Henschel, Christian Widmer, Jonas Behr, Alexander Zien,
Fabio de Bona, Alexander Binder, Christian Gehl, and Vojtech Franc.
The SHOGUN Machine Learning Toolbox. Journal of Machine Learning Research, 11:1799-1802, June 2010.
|
SHOGUN Version 0.9.3 (libshogun 8.0, libshogunui 5.0) (updated 31.05.2010)
Older Versions
|
This release contains several enhancements, cleanups and bugfixes:
Features:
- Experimental lp-norm MCMKL
- New Kernels: SpectrumRBFKernelRBF, SpectrumMismatchRBFKernel, WeightedDegreeRBFKernel
- WDK kernel supports amino acids
- String Features now support append operations (and creation of
- python-dbg support
- Allow floats as input for custom kernel (and matrices > 4GB in size)
Bugfixes:
- Static linking fix.
- Fix sparse linear kernel's add_to_normal
Cleanup and API Changes:
- Remove init() function in Performance Measures
- Adjust .so suffix for python and use python distutils to figure out install paths
|
|
We use Doxygen for both user and developer documentation which may be read online here.
More than 600 documented examples for the interfaces python_modular, octave_modular, r_modular, static python, static matlab and octave, static r, static command line and C++ libshogun developer interface
can be found in the
online documentation.
In addition, examples are shipped in the examples/(un)documented/[interface]
directory in the source code (where interface is one of r, octave, matlab,
python, python_modular, r_modular, octave_modular, cmdline, libshogun).
|
|
English
|
|
Chinese
|
|
Note that the documentation for python_modular is the most complete, and that Python's help function shows the documentation when working interactively:
$ python
Python 2.4.4 (#2, Jan 3 2008, 13:36:28)
[GCC 4.2.3 20071123 (prerelease) (Debian 4.2.2-4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from shogun.Classifier import SVM
>>> help(SVM)
class SVM(CSVM)
| Method resolution order:
| SVM
| CSVM
| CKernelMachine
| Classifier
| SGObject
| __builtin__.object
|
| Methods defined here:
|
| __init__(self, kernel, alphas, support_vectors, b)
[...]
|
Below we provide some of the (by now outdated)
examples that were used to carry out experiments
for a number of publications. Note that more than 600 examples, including
updated versions of all of these, can also be found in the source code and in
the online documentation.
|
Click on the corresponding link to see classification and regression examples for Matlab(tm), R, Octave or Python:
|
Below one finds some Bioinformatics examples (for octave and matlab) as presented at BOSC 2006:
|
Multiple Kernel Learning examples (JMLR 2006 paper "Large Scale Multiple Kernel Learning"):
|
If you find bugs or have feature requests, file them using the SHOGUN-TRAC bug tracking system.
We coordinate development (milestones, roadmap) using trac. Also, if you would like to browse syntax-highlighted source from svn, just have a look.
For comments, problems, questions, bug reports etc., please use the mailing list (subscription required).
In case you need to directly get in touch with us, feel free to contact
|
Want to contribute? We maintain SHOGUN's source code via SVN.
The authors gratefully acknowledge the support of DFG grants MU 987/2-1, MU 987/6-1, RA-1894/1-1 and the PASCAL Network of Excellence.
|