Implement estimators of large-scale sparse Gaussian densities

by Soumyajit De (email: heavensdevil6909@gmail.com, soumyajitde@cse.iitb.ac.in. Github: lambday)
Many thanks to my mentor Heiko Strathmann, and to Sergey Lisitsyn, Sören Sonnenburg and Viktor Gal.

This notebook illustrates large-scale sparse Gaussian density likelihood estimation. It first introduces the reader to the mathematical background and then shows how one can do the estimation with Shogun on a number of real-world data sets.

Theoretical introduction

Multivariate Gaussian distributions, i.e. some random vector $\mathbf{x}\in\mathbb{R}^n$ having probability density function $$p(\mathbf{x}|\boldsymbol\mu, \boldsymbol\Sigma)=(2\pi)^{-n/2}\text{det}(\boldsymbol\Sigma)^{-1/2} \exp\left(-\frac{1}{2}(\mathbf{x}-\boldsymbol\mu)^{T}\boldsymbol\Sigma^{-1}(\mathbf{x}-\boldsymbol\mu)\right)$$ where $\boldsymbol\mu$ is the mean vector and $\boldsymbol\Sigma$ is the covariance matrix, arise on numerous occasions involving large datasets. Computing the log-likelihood in these models requires the log-determinant of the covariance matrix $$\mathcal{L}(\mathbf{x}|\boldsymbol\mu,\boldsymbol\Sigma)=-\frac{n}{2}\log(2\pi)-\frac{1}{2}\log(\text{det}(\boldsymbol\Sigma))-\frac{1}{2}(\mathbf{x}-\boldsymbol\mu)^{T}\boldsymbol\Sigma^{-1}(\mathbf{x}-\boldsymbol\mu)$$ The covariance matrix and its inverse are symmetric positive definite (spd) and are often sparse, e.g. due to conditional independence properties of Gaussian Markov random fields (GMRFs). Therefore they can be stored efficiently even for large dimension $n$.
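
To make the expression concrete, here is a minimal sketch (plain NumPy/SciPy, independent of Shogun) that evaluates the log-likelihood formula directly on a tiny dense example and compares it against scipy.stats.multivariate_normal; the matrix and vectors are made up purely for illustration.

In []:
import numpy as np
from scipy.stats import multivariate_normal

np.random.seed(0)
n = 4
M = np.random.randn(n, n)
Sigma = M.dot(M.T) + n * np.eye(n)    # a small spd covariance matrix
mu = np.random.randn(n)
x = np.random.randn(n)

# log-likelihood evaluated directly from the formula above
diff = x - mu
logdet = np.linalg.slogdet(Sigma)[1]
loglik = -0.5 * n * np.log(2 * np.pi) - 0.5 * logdet \
         - 0.5 * diff.dot(np.linalg.solve(Sigma, diff))

print 'log-likelihood (formula):', loglik
print 'log-likelihood (scipy)  :', multivariate_normal.logpdf(x, mean=mu, cov=Sigma)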

The usual technique for computing the log-determinant term in the likelihood expression relies on Cholesky factorization of the matrix, i.e. $\boldsymbol\Sigma=\mathbf{LL}^{T}$ ($\mathbf{L}$ being the lower triangular Cholesky factor), and then uses the diagonal entries of the factor to compute $\log(\text{det}(\boldsymbol\Sigma))=2\sum_{i=1}^{n}\log(\mathbf{L}_{ii})$. However, for sparse matrices, as covariance matrices usually are, the Cholesky factors often suffer from the fill-in phenomenon - they turn out to be not so sparse themselves. For large dimensions this technique therefore becomes infeasible because of the massive memory required to store all these irrelevant off-diagonal coefficients of the factor. While reordering techniques have been developed to permute the rows and columns beforehand in order to reduce fill-in, e.g. approximate minimum degree (AMD) reordering, these depend largely on the sparsity pattern and are therefore not guaranteed to give better results.
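
The Cholesky identity can be checked with a few lines of NumPy; the following is only a small sketch on a made-up dense spd matrix, not part of the Shogun workflow.

In []:
import numpy as np

np.random.seed(0)
n = 5
M = np.random.randn(n, n)
Sigma = M.dot(M.T) + n * np.eye(n)   # a small, dense spd matrix

# log(det(Sigma)) = 2 * sum_i log(L_ii), with L the lower-triangular Cholesky factor
L = np.linalg.cholesky(Sigma)
logdet_chol = 2.0 * np.sum(np.log(np.diag(L)))

print 'via Cholesky factor:', logdet_chol
print 'via numpy slogdet  :', np.linalg.slogdet(Sigma)[1]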

Recent research shows that, using a number of techniques from complex analysis, numerical linear algebra and greedy graph coloring, we can nevertheless approximate the log-determinant up to arbitrary precision [Aune et al., 2012]. The main trick lies in the observation that we can write $\log(\text{det}(\boldsymbol\Sigma))$ as $\text{trace}(\log(\boldsymbol\Sigma))$, where $\log(\boldsymbol\Sigma)$ is the matrix-logarithm. Computing the log-determinant then requires extracting the trace of the matrix-logarithm as $$\text{trace}(\log(\boldsymbol\Sigma))=\sum_{j=1}^{n}\mathbf{e}^{T}_{j}\log(\boldsymbol\Sigma)\mathbf{e}_{j}$$ where each $\mathbf{e}_{j}$ is a unit basis vector having a 1 in its $j^{\text{th}}$ position and zeros elsewhere, and we assume that we can compute $\log(\boldsymbol\Sigma)\mathbf{e}_{j}$ (explained later). For large dimension $n$, this approach is still costly, so one needs to rely on sampling the trace. For example, using stochastic source vectors we can obtain a Monte Carlo estimator for the trace: $$\text{trace}(\log(\boldsymbol\Sigma))=\mathbb{E}_{\mathbf{v}}(\mathbf{v}^{T}\log(\boldsymbol\Sigma)\mathbf{v})\approx \frac{1}{k}\sum_{j=1}^{k}\mathbf{s}^{T}_{j}\log(\boldsymbol\Sigma)\mathbf{s}_{j}$$ where the source vectors $\mathbf{s}_{j}$ have zero mean and unit variance (e.g. $\mathbf{s}_{j}\sim\mathcal{N}(\mathbf{0}, \mathbf{I}), \forall j\in[1\cdots k]$). But since this is a Monte Carlo method, we need a large number of samples to get a sufficiently accurate approximation. However, using a method suggested by Aune et al., we can reduce the number of samples required drastically by using probing vectors obtained from a coloring of the adjacency graph represented by a power of the sparse matrix, $\boldsymbol\Sigma^{p}$, i.e. we can obtain $$\mathbb{E}_{\mathbf{v}}(\mathbf{v}^{T}\log(\boldsymbol\Sigma)\mathbf{v})\approx \sum_{j=1}^{m}\mathbf{w}^{T}_{j}\log(\boldsymbol\Sigma)\mathbf{w}_{j}$$ with $m\ll n$, where $m$ is the number of colors used in the graph coloring. For a particular color $j$, the probing vector $\mathbf{w}_{j}$ is obtained by filling the entries corresponding to the nodes of the graph colored with $j$ with $+1$ or $-1$ uniformly at random, keeping the rest of the entries as zeros. Since the matrix is sparse, the number of colors used is usually very small compared to the dimension $n$, which is what makes this approach attractive.
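
To illustrate the trace-sampling idea, the sketch below implements the plain Monte Carlo estimator with $\pm 1$ source vectors on a tiny matrix, using dense scipy.linalg.logm as the exact reference; this is only feasible because the example matrix is tiny, and it uses neither the probing vectors nor the Shogun classes described below.

In []:
import numpy as np
from scipy.linalg import logm

np.random.seed(0)
n, k = 6, 2000
M = np.random.randn(n, n)
Sigma = M.dot(M.T) + n * np.eye(n)     # a small spd matrix
log_Sigma = np.real(logm(Sigma))       # dense matrix-logarithm (tiny n only)

# source vectors with zero mean and unit variance (here: random +/-1 entries)
S = np.sign(np.random.randn(n, k))
samples = (S * log_Sigma.dot(S)).sum(axis=0)   # s_j^T log(Sigma) s_j for each j

print 'exact trace(log(Sigma)):', np.trace(log_Sigma)
print 'Monte Carlo estimate   :', samples.mean()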

There are two main issues with this technique. First, computing $\boldsymbol\Sigma^{p}$ is computationally costly, but experiments show that directly applying a d-distance coloring algorithm on the sparse matrix itself also results in a pretty good approximation. Second, computing the exact matrix-logarithm is often infeasible because it is not guaranteed to be sparse. Aune et al. suggested that we can rely on a rational approximation of the matrix-logarithm times a vector, using an approach described in Hale et al. [2008], i.e. writing $\log(\boldsymbol\Sigma)\mathbf{w}_{j}$ via Cauchy's integral formula as $$\log(\boldsymbol\Sigma)\mathbf{w}_{j}=\frac{1}{2\pi i}\oint_{\Gamma}\log(z)(z\mathbf{I}-\boldsymbol\Sigma)^{-1}\mathbf{w}_{j}dz\approx \frac{-8K(\lambda_{m}\lambda_{M})^{\frac{1}{4}}}{k\pi N} \boldsymbol\Sigma\Im\left(-\sum_{l=1}^{N}\alpha_{l}(\boldsymbol\Sigma-\sigma_{l}\mathbf{I})^{-1}\mathbf{w}_{j}\right)$$ where $K$, $k \in \mathbb{R}$ and $\alpha_{l}$, $\sigma_{l} \in \mathbb{C}$ come from Jacobi elliptic functions, $\lambda_{m}$ and $\lambda_{M}$ are the minimum and maximum eigenvalues of $\boldsymbol\Sigma$ (they have to be real and positive), $N$ is the number of contour points in the quadrature rule of the above integral, and $\Im(\mathbf{x})$ denotes the imaginary part of $\mathbf{x}\in\mathbb{C}^{n}$.

The problem then finally boils down to solving the shifted family of linear systems $(\boldsymbol\Sigma-\sigma_{l}\mathbf{I})\mathbf{x}_{j}=\mathbf{w}_{j}$. Since $\boldsymbol\Sigma$ is sparse, matrix-vector products are cheap, and these systems can therefore be solved with low memory requirements using Krylov subspace iterative solvers like conjugate gradient (CG). Since the shifted matrices have complex entries along their diagonal, the appropriate method is conjugate orthogonal conjugate gradient (COCG) [van der Vorst et al., 1990]. Alternatively, these systems can be solved at once using the CG-M solver [Jegerlehner, 1996], which solves $(\mathbf{A}+\sigma\mathbf{I})\mathbf{x}=\mathbf{b}$ for all values of $\sigma$ using only as many matrix-vector products in the CG iterations as are required to solve a single shifted system. This algorithm shows reliable convergence behavior for systems with reasonable condition numbers.
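
As a small reference for what these solvers compute, the sketch below solves a single complex-shifted sparse system with SciPy's direct sparse solver; the shift and the probing-like vector are made up for the example, and in the actual estimator the systems are solved iteratively with COCG/CG-M instead.

In []:
import numpy as np
from scipy.sparse import csc_matrix, identity
from scipy.sparse.linalg import spsolve

np.random.seed(0)
n = 50
m = csc_matrix(np.random.randn(n, n))
Sigma = (m.T * m + identity(n)).tocsc()          # a sparse spd matrix
sigma = 0.5 + 2.0j                               # one complex shift (made up for illustration)
w = np.sign(np.random.randn(n))                  # a +/-1 probing-like vector

shifted = (Sigma - sigma * identity(n)).tocsc()  # the complex-shifted system matrix
x = spsolve(shifted, w.astype(complex))
print 'residual norm:', np.linalg.norm(shifted.dot(x) - w)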

One interesting property of this approach is that once the graph coloring information and the shifts/weights are known, all the computational components - solving the linear systems and computing the final vector-vector products - can be performed independently of each other. Computation can therefore be sped up by running these parts in parallel. To exploit this, a computation framework has been developed for Shogun, and the whole log-determinant computation works on top of it.

An example of using this approach in Shogun

We demonstrate the usage of this technique by estimating the log-determinant of a real-valued spd sparse matrix of dimension $715,176\times 715,176$ with $4,817,870$ non-zero entries, apache2, obtained from the University of Florida Sparse Matrix Collection. Cholesky factorization with AMD for this sparse matrix gives rise to factors with $353,843,716$ non-zero entries (from source). We use the CG-M solver for the shifted systems together with SerialComputationEngine, which performs the computation jobs sequentially on a single core and therefore works even on a normal desktop machine. Since the original matrix is badly conditioned, we add a ridge along its diagonal to reduce the condition number so that the CG-M solver converges within reasonable time. Please note that for a high condition number, the iteration limit has to be set very high.

In []:
%matplotlib inline
# NumPy functions (array, mean, random, etc.) are used directly in the cells below,
# so we pull in the NumPy namespace here
from numpy import *
from scipy.sparse import eye
from scipy.io import mmread
from matplotlib import pyplot as plt

matFile='../../../data/logdet/apache2.mtx.gz'
M = mmread(matFile)
rows = M.shape[0]
cols = M.shape[1]
A = M + eye(rows, cols) * 10000.0
plt.title("A")
plt.spy(A, precision = 1e-2, marker = '.', markersize = 0.01)
plt.show()

First, to keep the notion of a Krylov subspace, we view the matrix as a linear operator that applies to a vector and results in a new vector. We use RealSparseMatrixOperator, which is suitable for this example. All the solvers work with LinearOperator type objects. For computing the eigenvalues, we use the LanczosEigenSolver class. Although the computation of the eigenvalues is done internally within the log-determinant estimator itself (see below), here we explicitly precompute them.

In []:
from modshogun import RealSparseMatrixOperator, LanczosEigenSolver

op = RealSparseMatrixOperator(A.tocsc())

# Lanczos iterative eigensolver to compute the min/max eigenvalues, which are required to compute the shifts
eigen_solver = LanczosEigenSolver(op)
# we set the iteration limit high to compute the eigenvalues more accurately, default iteration limit is 1000
eigen_solver.set_max_iteration_limit(2000)

# computing the eigenvalues
eigen_solver.compute()
print 'Minimum Eigenvalue:', eigen_solver.get_min_eigenvalue()
print 'Maximum Eigenvalue:', eigen_solver.get_max_eigenvalue()
Minimum Eigenvalue: 10000.0911301
Maximum Eigenvalue: 101873.055398

Next, we use the ProbingSampler class, which uses the external library ColPack. Again, the number of colors used is precomputed here for demonstration purposes, although it is also computed internally inside the log-determinant estimator.

In []:
# We can specify the power of the sparse-matrix that is to be used for coloring; by default a
# 2-distance greedy graph coloring algorithm is applied on the sparse-matrix itself. The matrix-power, if specified, is computed in O(lg p)
from modshogun import ProbingSampler

trace_sampler = ProbingSampler(op)
# apply the graph coloring algorithm and generate the number of colors, i.e. number of trace samples
trace_sampler.precompute()
print 'Number of colors used:', trace_sampler.get_num_samples()
Number of colors used: 13

This corresponds to averaging over 13 source vectors rather than one (but with a much lower variance than using 13 Gaussian source vectors). A comparison between the convergence behavior of the probing sampler and the Gaussian sampler is presented later.

Then we create a LogRationalApproximationCGM operator function, which internally uses the eigensolver to compute the eigenvalues, uses JacobiEllipticFunctions to compute the complex shifts, weights and the constant multiplier in the rational approximation expression, takes the probing vectors generated by the trace sampler and submits computation jobs to the engine, which then uses the CG-M solver (CGMShiftedFamilySolver) to solve the shifted systems. Calling precompute is not strictly necessary here either.

In []:
from modshogun import SerialComputationEngine, CGMShiftedFamilySolver, LogRationalApproximationCGM

engine = SerialComputationEngine()
cgm = CGMShiftedFamilySolver()
# setting the iteration limit (set this to a higher value for a higher condition number)
cgm.set_iteration_limit(100)

# accuracy determines the number of contour points in the rational approximation (i.e. number of shifts in the systems)
accuracy = 1E-15

# we create an operator-log-function using the sparse matrix operator, which uses CG-M to solve the shifted systems
op_func = LogRationalApproximationCGM(op, engine, eigen_solver, cgm, accuracy)
op_func.precompute()
print 'Number of shifts:', op_func.get_num_shifts()
Number of shifts: 21

Finally, we use the LogDetEstimator class to sample the log-determinant of the matrix.

In []:
from modshogun import LogDetEstimator

# number of log-det samples (use a higher number to get better estimates)
# (each sample already averages over all the probing vectors, so this corresponds to 5 times the number of colors; usually 1 probing estimate is enough)
num_samples = 5

log_det_estimator = LogDetEstimator(trace_sampler, op_func, engine)
estimates = log_det_estimator.sample(num_samples)

estimated_logdet = mean(estimates)
print 'Estimated log(det(A)):', estimated_logdet
Estimated log(det(A)): 7120352.91945

To verify the accuracy of the estimate, we compute the exact log-determinant of A via Cholesky factorization, using the Statistics.log_det method.

In []:
# the following method requires a massive amount of memory; for demonstration purposes
# the call is commented out and the value obtained from running it once is used instead

# from modshogun import Statistics
# actual_logdet = Statistics.log_det(A)

actual_logdet = 7120357.73878
print 'Actual log(det(A)):', actual_logdet

plt.hist(estimates)
plt.plot([actual_logdet, actual_logdet], [0,len(estimates)], linewidth=3)
plt.show()
Actual log(det(A)): 7120357.73878

Statistics

In this section we use a smaller sparse matrix, 'west0479', to demonstrate the benefits of using probing vectors over standard Gaussian vectors for sampling the trace of the matrix-logarithm. In the following we can easily observe the fill-in phenomenon described earlier. Again, a ridge has been added to reduce the runtime for demonstration purposes.

In []:
from scipy.sparse import csc_matrix, identity
from numpy.linalg import cholesky

m = mmread('../../../data/logdet/west0479.mtx')
# computing an spd matrix with an added ridge
B = csc_matrix(m.transpose() * m + identity(m.shape[0]) * 1000.0)

fig = plt.figure(figsize=(12, 4))
ax = fig.add_subplot(1,2,1)
ax.set_title('B')
ax.spy(B, precision = 1e-5, marker = '.', markersize = 2.0)

ax = fig.add_subplot(1,2,2)
ax.set_title('lower Cholesky factor')
dense_matrix = B.todense()
L = cholesky(dense_matrix)
ax.spy(csc_matrix(L), precision = 1e-5, marker = '.', markersize = 2.0)

plt.show()
In []:
op = RealSparseMatrixOperator(B)
eigen_solver = LanczosEigenSolver(op)

# computing log-det estimates using probing sampler
probing_sampler = ProbingSampler(op)
cgm.set_iteration_limit(500)

op_func = LogRationalApproximationCGM(op, engine, eigen_solver, cgm, 1E-5)
log_det_estimator = LogDetEstimator(probing_sampler, op_func, engine)
num_probing_estimates = 100
probing_estimates = log_det_estimator.sample(num_probing_estimates)

# computing log-det estimates using Gaussian sampler
from modshogun import NormalSampler, Statistics

num_colors = probing_sampler.get_num_samples()
normal_sampler = NormalSampler(op.get_dimension())
log_det_estimator = LogDetEstimator(normal_sampler, op_func, engine)
num_normal_estimates = num_probing_estimates * num_colors
normal_estimates = log_det_estimator.sample(num_normal_estimates)

# average in groups of n_effective_samples
effective_estimates_normal = zeros(num_probing_estimates)
for i in range(num_probing_estimates):
    idx = i * num_colors
    effective_estimates_normal[i] = mean(normal_estimates[idx:(idx + num_colors)])

actual_logdet = Statistics.log_det(B)
print 'Actual log(det(B)):', actual_logdet
print 'Estimated log(det(B)) using probing sampler:', mean(probing_estimates)
print 'Estimated log(det(B)) using Gaussian sampler:', mean(effective_estimates_normal)
print 'Variance using probing sampler:',var(probing_estimates)
print 'Variance using Gaussian sampler:',var(effective_estimates_normal)
Actual log(det(B)): 3714.76908074
Estimated log(det(B)) using probing sampler: 3714.7711237
Estimated log(det(B)) using Gaussian sampler: 3716.63636888
Variance using probing sampler: 0.00152887211403
Variance using Gaussian sampler: 1093.20202234

In []:
fig = plt.figure(figsize=(15, 4))
ax = fig.add_subplot(1,3,1)
ax.set_title('Probing sampler')
ax.plot(cumsum(probing_estimates)/(arange(len(probing_estimates))+1))
ax.plot([0,len(probing_estimates)], [actual_logdet, actual_logdet])
ax.legend(["Probing", "True"])

ax = fig.add_subplot(1,3,2)
ax.set_title('Gaussian sampler')
ax.plot(cumsum(effective_estimates_normal)/(arange(len(effective_estimates_normal))+1))
ax.plot([0,len(probing_estimates)], [actual_logdet, actual_logdet])
ax.legend(["Gaussian", "True"])

ax = fig.add_subplot(1,3,3)
ax.hist(probing_estimates)
ax.hist(effective_estimates_normal)
ax.plot([actual_logdet, actual_logdet], [0,len(probing_estimates)], linewidth=3)
plt.show()

A motivational example - likelihood of the Ozone dataset

In Girolami et al. (2013), an interesting scenario is discussed where the log-likelihood of a model involving a large spatial dataset is considered. The data, collected by a satellite, consist of $N=173,405$ ozone measurements around the globe. The data are modelled in a three-stage hierarchical way: $$\mathbf{y}|\mathbf{x},\kappa,\tau\sim\mathcal{N}(\mathbf{Ax},\tau^{-1}\mathbf{I})$$ $$\mathbf{x}|\kappa\sim\mathcal{N}(\mathbf{0}, \mathbf{Q}(\kappa)^{-1})$$ $$\kappa\sim\log_{2}\mathcal{N}(0, 100), \tau\sim\log_{2}\mathcal{N}(0, 100)$$ where the precision matrix, $\mathbf{Q}$, of a Matérn SPDE model, defined on a fixed triangulation of the globe, is sparse, and the parameter $\kappa$ controls the range at which correlations in the field are effectively zero (see Girolami et al. for details). The (marginal) log-likelihood under this model is $$2\mathcal{L}=2\log \pi(\mathbf{y}|\kappa,\tau)=C+\log(\text{det}(\mathbf{Q}(\kappa)))+N\log(\tau)-\log(\text{det}(\mathbf{Q}(\kappa)+\tau \mathbf{A}^{T}\mathbf{A}))-\tau\mathbf{y}^{T}\mathbf{y}+\tau^{2}\mathbf{y}^{T}\mathbf{A}(\mathbf{Q}(\kappa)+\tau\mathbf{A}^{T}\mathbf{A})^{-1}\mathbf{A}^{T}\mathbf{y}$$ This expression contains two terms involving the log-determinant of large sparse matrices. The rational approximation approach described in the previous section can readily be applied to estimate this log-likelihood. The following computation shows the usage of Shogun's log-determinant estimator for this purpose (the code has been adapted from an open source library, ozone-roulette, written by Heiko Strathmann, one of the authors of the original paper).

Please note that we again add a ridge along the diagonal for faster execution of this example. Since the original matrix is badly conditioned, one needs to set the iteration limits very high for both the eigensolver and the linear solver in the absence of preconditioning.

In []:
from scipy.io import loadmat

def get_Q_y_A(kappa):
    # read the ozone data and create the matrix Q
    ozone = loadmat('../../../data/logdet/ozone_data.mat')
    GiCG = ozone["GiCG"]
    G = ozone["G"]
    C0 = ozone["C0"]
    # note: kappa is fixed here for this demonstration, overriding the function argument
    kappa = 13.1
    Q = GiCG + 2 * (kappa ** 2) * G + (kappa ** 4) * C0
    # a ridge is also added here
    Q = Q + eye(Q.shape[0], Q.shape[1]) * 10000.0
    
    plt.spy(Q, precision = 1e-5, marker = '.', markersize = 1.0)
    plt.show()
    
    # read y and A
    y = ozone["y_ozone"]
    A = ozone["A"]
    return Q, y, A
    
def log_det(A):
    op = RealSparseMatrixOperator(A)
    engine = SerialComputationEngine()
    eigen_solver = LanczosEigenSolver(op)
    probing_sampler = ProbingSampler(op)
    cgm = CGMShiftedFamilySolver()
    cgm.set_iteration_limit(100)
    op_func = LogRationalApproximationCGM(op, engine, eigen_solver, cgm, 1E-5)
    log_det_estimator = LogDetEstimator(probing_sampler, op_func, engine)
    num_estimates = 1
    return mean(log_det_estimator.sample(num_estimates))

def log_likelihood(tau, kappa):
    Q, y, A = get_Q_y_A(kappa)
    n = len(y)
    AtA = A.T.dot(A)
    M = Q + tau * AtA

    # computing the log-determinants
    logdet1 = log_det(Q)
    logdet2 = log_det(M)
        
    first = 0.5 * logdet1 + 0.5 * n * log(tau) - 0.5 * logdet2
    
    # computing the rest of the likelihood
    second_a = -0.5 * tau * (y.T.dot(y))
    
    second_b = array(A.T.dot(y))
    from scipy.sparse.linalg import spsolve
    second_b = spsolve(M, second_b)
    second_b = A.dot(second_b)
    second_b = y.T.dot(second_b)
    second_b = 0.5 * (tau ** 2) * second_b
        
    log_det_part = first
    quadratic_part = second_a + second_b
    const_part = -0.5 * n * log(2 * pi)
      
    log_marginal_lik = const_part + log_det_part + quadratic_part

    return log_marginal_lik

L = log_likelihood(1.0, 15.0)
print 'Log-likelihood estimate:', L
Log-likelihood estimate: [[ -2.67019051e+08]]

Useful components

As part of the implementation of the log-determinant estimator, a number of classes have been developed which may come in useful on several other occasions as well.

1. Linear Operators

All the linear solvers and eigensolvers work with linear operators. Both real-valued and complex-valued operators are supported, for dense as well as sparse matrices.

In []:
from modshogun import RealSparseMatrixOperator, ComplexDenseMatrixOperator

dim = 5
random.seed(10)
# create a random valued sparse matrix linear operator
A = csc_matrix(random.randn(dim, dim))
op = RealSparseMatrixOperator(A)

# creating a random vector
random.seed(1)
b = array(random.randn(dim))
v = op.apply(b)
print 'A.apply(b)=',v

# create a dense matrix linear operator
B = array(random.randn(dim, dim)).astype(complex)
op = ComplexDenseMatrixOperator(B)
print 'Dimension:', op.get_dimension()
A.apply(b)= [ 0.18560129  0.07483214 -1.98168711 -0.0882416   3.12636877]
Dimension: 5

2. Linear Solvers

Conjugate gradient (CG) based iterative solvers, which construct the Krylov subspace by computing matrix-vector products in each iteration, are most useful for solving sparse linear systems. Here is an overview of the CG-based solvers that are currently available in Shogun.

Conjugate Gradient Solver

This solver solves the system $\mathbf{Qx}=\mathbf{y}$, where $\mathbf{Q}$ is a real-valued spd linear operator (e.g. a dense/sparse matrix operator) and $\mathbf{y}$ is a real vector.

In []:
from scipy.sparse import csc_matrix
from scipy.sparse import identity
from modshogun import ConjugateGradientSolver

# creating a random spd matrix
dim = 5
random.seed(10)
m = csc_matrix(random.randn(dim, dim))
a = m.transpose() * m + identity(dim)
Q = RealSparseMatrixOperator(a)

# creating a random vector
y = array(random.randn(dim))

# solve the system Qx=y
# the argument is set as True to gather convergence statistics (default is False)
cg = ConjugateGradientSolver(True)
cg.set_iteration_limit(20)
x = cg.solve(Q,y)

print 'x:',x

# verifying the result
print 'y:', y
print 'Qx:', Q.apply(x)

residuals = cg.get_residuals()
plt.plot(residuals)
plt.show()
x: [ 0.96628814  0.15439743  1.27764984  0.3086872   0.80865761]
y: [ 1.67262221  0.09914922  1.39799638 -0.27124799  0.61320418]
Qx: [ 1.67262221  0.09914922  1.39799638 -0.27124799  0.61320418]

Conjugate Orthogonal CG Solver

Solves systems $\mathbf{Qx}=\mathbf{z}$, where $\mathbf{Q}$ is symmetric but non-Hermitian (i.e. it has complex entries on its diagonal) and $\mathbf{z}$ is a real-valued vector.

In []:
from modshogun import ComplexSparseMatrixOperator
from modshogun import ConjugateOrthogonalCGSolver

# creating a random spd matrix
dim = 5
random.seed(10)
m = csc_matrix(random.randn(dim, dim))
a = m.transpose() * m + identity(dim)
a = a.astype(complex)

# adding complex entries along the diagonal
for i in range(0, dim):
    a[i,i] += complex(random.randn(), random.randn())
Q = ComplexSparseMatrixOperator(a)

z = array(random.randn(dim))
# solve for the system Qx=z
cocg = ConjugateOrthogonalCGSolver(True)
cocg.set_iteration_limit(20)
x = cocg.solve(Q, z)

print 'x:',x

# verifying the result
print 'z:',z
print 'Qx:',real(Q.apply(x))

residuals = cocg.get_residuals()
plt.plot(residuals)
plt.show()
x: [-0.01923743+0.03897861j  0.17352470+0.00042604j -0.03218531+0.05099476j
  0.24848241-0.00705049j -0.27111294+0.14079509j]
z: [ 0.19501328  0.40020999 -0.33763234  1.25647226 -0.7319695 ]
Qx: [ 0.19501328  0.40020999 -0.33763234  1.25647226 -0.7319695 ]

CG-M Shifted Family Solver

Solves systems with real-valued spd matrices and complex shifts. For use with the log-det estimator, there is also an option to specify a weight for each solution. The solve_shifted_weighted method returns $\sum_{l}\alpha_{l}\mathbf{x}_{l}$, where $\mathbf{x}_{l}=(\mathbf{A}+\sigma_{l}\mathbf{I})^{-1}\mathbf{y}$, with $\sigma_{l},\alpha_{l}\in\mathbb{C}$ and $\mathbf{y}\in\mathbb{R}^{n}$.

In []:
from modshogun import CGMShiftedFamilySolver

cgm = CGMShiftedFamilySolver()

# creating a random spd matrix
dim = 5
random.seed(10)
m = csc_matrix(random.randn(dim, dim))
a = m.transpose() * m + identity(dim)
Q = RealSparseMatrixOperator(a)

# creating a random vector
v = array(random.randn(dim))

# number of shifts (will be equal to the number of contour points)
num_shifts = 3

# generating some random shifts
shifts = []
for i in range(0, num_shifts):
    shifts.append(complex(random.randn(), random.randn()))
sigma = array(shifts)
print 'Shifts:', sigma

# generating some random weights
weights = []
for i in range(0, num_shifts):
    weights.append(complex(random.randn(), random.randn()))
alpha = array(weights)
print 'Weights:',alpha

# solve for the systems
cgm = CGMShiftedFamilySolver(True)
cgm.set_iteration_limit(20)
x = cgm.solve_shifted_weighted(Q, v, sigma, alpha)

print 'x:',x

residuals = cgm.get_residuals()
plt.plot(residuals)
plt.show()

# verifying the result with cocg
x_s = array([0+0j] * dim)
for i in range(0, num_shifts):
    a_s = a.astype(complex)
    for j in range(0, dim):
        # moving the complex shift inside the operator
        a_s[j,j] += sigma[i]
    Q_s = ComplexSparseMatrixOperator(a_s)
    # multiplying the result with weight
    x_s += alpha[i] * cocg.solve(Q_s, v)
print 'x\':', x_s
Shifts: [-0.26731719-0.54930901j  0.13270830-0.47614201j  1.30847308+0.19501328j]
Weights: [ 0.40020999-0.33763234j  1.25647226-0.7319695j   0.66023155-0.35087189j]
x: [ 2.01298610-0.52721044j  0.28177672+0.05017587j  2.61925277-0.5518316j
  0.61180933-0.0593927j   1.66181787-0.35116048j]

x': [ 2.01298610-0.52721044j  0.28177672+0.05017587j  2.61925277-0.5518316j
  0.61180933-0.0593927j   1.66181787-0.35116048j]

Apart from the iterative solvers, a few direct solvers have been added as well.

Direct Sparse Linear Solver

This uses a sparse Cholesky factorization to solve linear systems $\mathbf{Qx}=\mathbf{y}$, where $\mathbf{Q}$ is a real-valued spd linear operator (e.g. a dense/sparse matrix operator) and $\mathbf{y}$ is a real vector.

In []:
from modshogun import DirectSparseLinearSolver

# creating a random spd matrix
dim = 5
random.seed(10)
m = csc_matrix(random.randn(dim, dim))
a = m.transpose() * m + identity(dim)
Q = RealSparseMatrixOperator(a)

# creating a random vector
y = array(random.randn(dim))

# solve the system Qx=y
chol = DirectSparseLinearSolver()
x = chol.solve(Q,y)

print 'x:',x

# verifying the result
print 'y:', y
print 'Qx:', Q.apply(x)
x: [ 0.96628814  0.15439743  1.27764984  0.3086872   0.80865761]
y: [ 1.67262221  0.09914922  1.39799638 -0.27124799  0.61320418]
Qx: [ 1.67262221  0.09914922  1.39799638 -0.27124799  0.61320418]

Direct Linear Solver for Complex

This solves linear systems $\mathbf{Qx}=\mathbf{z}$, where $\mathbf{Q}$ is a complex-valued dense matrix linear operator and $\mathbf{z}$ is a real vector.

In []:
from modshogun import DirectLinearSolverComplex

# creating a random spd matrix
dim = 5
random.seed(10)
m = array(random.randn(dim, dim))
a = m.transpose() * m + identity(dim)
a = a.astype(complex)

# adding complex entries along the diagonal
for i in range(0, dim):
    a[i,i] += complex(random.randn(), random.randn())
Q = ComplexDenseMatrixOperator(a)

z = array(random.randn(dim))
# solve for the system Qx=z
solver = DirectLinearSolverComplex()
x = solver.solve(Q, z)

print 'x:',x

# verifying the result
print 'z:',z
print 'Qx:',real(Q.apply(x))
x: [-0.01374551-0.05604295j  0.18872937+0.03527069j -0.15467908-0.01599583j
  0.27729300-0.21607283j -0.19961759-0.20928082j]
z: [ 0.19501328  0.40020999 -0.33763234  1.25647226 -0.7319695 ]
Qx: [ 0.19501328  0.40020999 -0.33763234  1.25647226 -0.7319695 ]

References

  1. Erlend Aune, Daniel P. Simpson and Jo Eidsvik, Parameter Estimation in High Dimensional Gaussian Distributions, Springer Statistics and Computing, December 2012.
  2. Nicholas Hale, Nicholas J. Higham and Lloyd N. Trefethen, Computing $A^{\alpha}$, $\log(A)$ and Related Matrix Functions by Contour Integrals, MIMS EPrint 2007.103.
  3. H.A. van der Vorst, A Petrov-Galerkin Type Method for Solving $\mathbf{Ax}=\mathbf{b}$ Where $\mathbf{A}$ Is Symmetric Complex, IEEE Transactions on Magnetics, Vol. 26, No. 2, March 1990.
  4. B. Jegerlehner, Krylov Space Solvers for Shifted Linear Systems, hep-lat/9612014, 1996.
  5. Mark Girolami, Anne-Marie Lyne, Heiko Strathmann, Daniel Simpson and Yves Atchade, Playing Russian Roulette with Intractable Likelihoods, arXiv:1306.4032, June 2013.