SHOGUN  6.0.0
 All Classes Namespaces Files Functions Variables Typedefs Enumerations Enumerator Friends Macros Modules
SubsequenceStringKernel.h File Reference

Go to the source code of this file.


class  CSubsequenceStringKernel
 class SubsequenceStringKernel that implements String Subsequence Kernel (SSK) discussed by Lodhi et. al.[1]. A subsequence is any ordered sequence of \(n\) characters occurring in the text, though not necessarily contiguous. More formally, string \(u\) is a subsequence of string \(s\), iff there exists indices \(\mathbf{i}=(i_{1},\dots,i_{|u|})\), with \(1\le i_{1} \le \cdots \le i_{|u|} \le |s|\), such that \(u_{j}=s_{i_{j}}\) for \(j=1,\dots,|u|\), written as \(u=s[\mathbf{i}]\). The feature mapping \(\phi\) in this scenario is given by

\[ \phi_{u}(s)=\sum_{\mathbf{i}:u=s[\mathbf{i}]}\lambda^{l(\mathbf{i})} \]

for some \(lambda\le 1\), where \(l(\mathbf{i})\) is the length of the subsequence in \(s\), given by \(i_{|u|}-i_{1}+1\). The kernel here is an inner product in the feature space generated by all subsequences of length \(n\).

\[ K_{n}(s,t)=\sum_{u\in\Sigma^{n}}\langle \phi_{u}(s), \phi_{u}(t)\rangle = \sum_{u\in\Sigma^{n}}\sum_{\mathbf{i}:u=s[\mathbf{i}]} \sum_{\mathbf{j}:u=t[\mathbf{j}]}\lambda^{l(\mathbf{i})+l(\mathbf{j})} \]

Since the subsequences are weighted by the exponentially decaying factor \(\lambda\) of their full length in the text, more weight is given to those occurrences that are nearly contiguous. A direct computation is infeasible since the dimension of the feature space grows exponentially with \(n\). The paper describes an efficient computation approach using a dynamic programming technique. More...


SHOGUN Machine Learning Toolbox - Documentation