Blog Post Series on Numerical Analysis 3: Iterative Methods for Large Linear Systems


This is the third blog post of the series on Numerical Analysis. In this post, we will discuss Iterative Methods for Large Linear Systems. The post will cover the theoretical foundations of iterative methods, including matrix splitting methods, Jacobi and Gauss-Seidel iterations, and the Conjugate Gradient method. We will also explore the convergence analysis of these methods and their practical implementation in Python. The post will conclude with a discussion of real-world applications and future directions in the field of iterative methods.

Introduction

Why do mathematicians care about iterative methods? The story begins with a fundamental limitation of direct methods. Consider solving a 3D Poisson equation on a $100\times 100\times 100$ grid. The resulting linear system has one million unknowns. Using Gaussian elimination would require approximately $10^{18}$ operations - even with a computer performing a trillion operations per second, this would take on the order of $10^6$ seconds, roughly two weeks. Moreover, storing the full matrix would need $8$ terabytes of memory.

This computational barrier isn’t merely an engineering inconvenience - it represents a fundamental mathematical challenge that has driven the development of iterative methods. The beauty of these methods lies in their ability to transform an impossible computation into a tractable one by exploiting the mathematical structure of the problem. Iterative methods are the cornerstone of modern scientific computing, enabling the solution of problems that would otherwise be infeasible.

The mathematical elegance emerges from three key insights:

  1. Many physical problems naturally suggest iteration (think of heat diffusion)
  2. The connection between iteration and polynomial approximation reveals deep theoretical structures
  3. The interplay between matrix properties and convergence behavior leads to rich mathematical theory

Analysis

Mathematical Framework

Matrix Splitting Methods

Consider splitting matrix $A$ into $A = M - N$, where $M$ is easily invertible. The linear system $Ax = b$ becomes: \(\begin{aligned} (M-N)x &= b \\ Mx &= Nx + b \\ x &= M^{-1}Nx + M^{-1}b \end{aligned}\)

This leads to the following iteration: \(x_{k+1} = M^{-1}Nx_{k} + M^{-1}b\)
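
As a minimal sketch of this general splitting iteration (assuming dense NumPy arrays and a splitting $A = M - N$ supplied by the caller; the function name and stopping rule below are illustrative, not part of the original post):

```python
import numpy as np

def splitting_iteration(M, N, b, x0, max_iter=1000, tol=1e-8):
    """Generic fixed-point iteration x_{k+1} = M^{-1}(N x_k + b)."""
    x = x0.copy()
    for _ in range(max_iter):
        x_new = np.linalg.solve(M, N @ x + b)   # solve with M rather than forming M^{-1}
        if np.linalg.norm(x_new - x) <= tol * np.linalg.norm(x_new):
            return x_new
        x = x_new
    return x
```

Any concrete method (Jacobi, Gauss-Seidel) is obtained by choosing a specific $M$ and $N$.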

Jacobi Method Derivation

For the Jacobi method, we split $A$ into:

  • $D$ is the diagonal of $A$
  • $L$ is the strictly lower triangular part of $A$
  • $U$ is the strictly upper triangular part of $A$

These satisfy $A = D - L - U$. Rewriting $Ax = b$ with this splitting gives: \(Dx = (L+U)x + b\\ x = D^{-1}(L+U)x + D^{-1}b\)

This leads to the Jacobi iteration: \(B_j = D^{-1}(L+U)\\ x_{k+1} = B_jx_{k} + D^{-1}b\)

The Python code for the Jacobi method is as follows:

```python
import numpy as np

def jacobi_iteration(A, b, x0, max_iter=1000, tol=1e-6):
    """Solve Ax = b with the Jacobi method."""
    x = x0.copy()
    D = np.diag(A)              # diagonal entries of A
    R = A - np.diagflat(D)      # off-diagonal part of A, i.e. -(L + U)

    for k in range(max_iter):
        x_new = (b - R @ x) / D                 # x_{k+1} = D^{-1}((L + U)x_k + b)
        error = np.linalg.norm(x_new - x) / np.linalg.norm(x_new)
        if error < tol:
            return x_new
        x = x_new
    return x
```
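
A short usage example (the matrix and right-hand side below are illustrative values, not from the original post):

```python
# A small strictly diagonally dominant system
A = np.array([[ 4.0, -1.0,  0.0],
              [-1.0,  4.0, -1.0],
              [ 0.0, -1.0,  4.0]])
b = np.array([3.0, 2.0, 3.0])

x = jacobi_iteration(A, b, x0=np.zeros(3))
print(x, np.linalg.norm(A @ x - b))   # residual should be close to zero
```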

Convergence Analysis

The method converges when the spectral radius of the iteration matrix is less than $1$: \(\rho(B_j) = \max_{\lambda \in \Lambda(B_j)}|\lambda| < 1\)

For strictly diagonally dominant matrices the Jacobi method converges, since the spectral radius is bounded by the row-sum norm of the iteration matrix: \(\rho(B_j) \leq \|B_j\|_\infty = \max_{i} \sum_{j \neq i} \frac{|a_{ij}|}{|a_{ii}|} < 1\)

The convergence rate is linear, with the error bound: \(\|e_{k+1}\| \leq \|B_j\| \|e_k\|\)

This theoretical framework explains why the Jacobi method converges linearly for diagonally dominant matrices, with each iteration multiplying the error norm by at most $\|B_j\|$.
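
To connect the theory with the code, one can form $B_j$ explicitly and check its spectral radius numerically; the helper below is a small sketch, continuing with the illustrative matrix A from the usage example above:

```python
def jacobi_spectral_radius(A):
    """Spectral radius of the Jacobi iteration matrix B_j = D^{-1}(L + U)."""
    D = np.diag(np.diag(A))
    B_j = np.linalg.solve(D, D - A)    # D^{-1}(D - A) = D^{-1}(L + U)
    return max(abs(np.linalg.eigvals(B_j)))

print(jacobi_spectral_radius(A))       # about 0.35 for the matrix above, so Jacobi converges
```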

Conjugate Gradient Method

For symmetric positive definite matrices, the Conjugate Gradient (CG) method is a powerful iterative method.

Derivation from Minimization Principle

For symmetric positive definite $A$, solving $Ax = b$ is equivalent to minimizing the quadratic functional: \(f(x) = \frac{1}{2}x^TAx - b^Tx\) where:

  • The gradient is $Ax - b$
  • The Hessian is $A$
  • The minimum is at the solution $x$
  • The minimization problem is equivalent to solving $Ax = b$

The gradient can be written in terms of the residual: \(\nabla f(x) = Ax - b = -r(x)\) where $r(x) = b - Ax$ is the residual. The CG method minimizes $f(x)$ by successively minimizing it along $A$-conjugate search directions built from these residuals.

Conjugate Directions

The CG method constructs $A$-conjugate (mutually $A$-orthogonal) search directions $\{p_k\}$ such that: \(p_i^TAp_j = 0 \quad \text{for} \quad i \neq j\) The iteration proceeds by minimizing $f$ along each search direction: \(x_{k+1} = x_k + \alpha_kp_k\) where $\alpha_k$ is the step size. The CG method is optimal in the sense that it minimizes the error in the $A$-norm over the current Krylov subspace.

One may note that $\alpha_k$ can be computed as: \(\alpha_k = \frac{r_k^Tr_k}{p_k^TAp_k}\)

A-Orthogonality and Residual Update

The residual is updated according to: \(r_{k+1} = r_k - \alpha_kAp_k\) New search directions are constructed from the residuals: \(p_{k+1} = r_{k+1} + \beta_kp_k\) where $\beta_k$ is the conjugacy parameter that enforces $p_{k+1}^TAp_k = 0$: \(\beta_k = \frac{r_{k+1}^Tr_{k+1}}{r_k^Tr_k}\)

Implementation with Detailed Steps

```python
def conjugate_gradient(A, b, x0, tol=1e-10, max_iter=1000):
    """
    Conjugate Gradient method with detailed computation

    Parameters:
    -----------
    A : array_like
        Symmetric positive definite matrix
    b : array_like
        Right-hand side vector
    x0 : array_like
        Initial guess

    Returns:
    --------
    x : array_like
        Solution vector
    history : dict
        Convergence history
    """
    x = x0.copy()
    r = b - A @ x
    p = r.copy()

    # Store convergence history
    history = {
        'residual_norms': [],
        'functional_values': []
    }

    if np.linalg.norm(r) < tol:
        return x, history

    for k in range(max_iter):
        Ap = A @ p
        rr = r @ r

        # Compute step size (exact line search along p)
        alpha = rr / (p @ Ap)

        # Update solution and residual
        x = x + alpha * p
        r_new = r - alpha * Ap

        # Track convergence: residual norm and the quadratic functional
        # f(x) = 1/2 x^T A x - b^T x, which satisfies
        # ||x - x*||_A^2 = 2 (f(x) - f(x*)) and decreases monotonically
        f_val = 0.5 * (x @ (A @ x)) - b @ x
        history['residual_norms'].append(np.linalg.norm(r_new))
        history['functional_values'].append(f_val)

        if np.linalg.norm(r_new) < tol:
            return x, history

        # Compute new search direction
        beta = (r_new @ r_new) / rr
        p = r_new + beta * p
        r = r_new

    return x, history
```
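
A quick usage sketch with an illustrative symmetric positive definite system (the matrix and right-hand side are chosen here purely for demonstration):

```python
import numpy as np

# Illustrative SPD system: 1D Poisson (tridiagonal) matrix
n = 50
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)

x, history = conjugate_gradient(A, b, x0=np.zeros(n))
print("iterations:", len(history['residual_norms']))
print("final residual:", history['residual_norms'][-1])
```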

Convergence Analysis

The error in the A-norm satisfies: \(\|e_k\|_A \leq 2 \left(\frac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1}\right)^k \|e_0\|_A\) where:

  • $\kappa = \frac{\lambda_{\max}}{\lambda_{\min}}$ is the condition number
  • The error decreases by a factor of $\left(\frac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1}\right)$ at each iteration
  • The CG method converges in at most $n$ iterations for an $n \times n$ matrix

This leads to the convergence estimate: \(k \geq \frac{1}{2} \sqrt{\kappa} \ln \left(\frac{2}{\epsilon}\right)\) iterations to achieve $\|e_k\|_A \leq \epsilon \|e_0\|_A$.
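
As a concrete illustration of this estimate: with $\kappa = 10^4$ and a target reduction $\epsilon = 10^{-6}$, it gives $k \geq \frac{1}{2}\cdot 100 \cdot \ln(2\times 10^{6}) \approx 725$ iterations, independent of the problem size $n$.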

GMRES Method

Fundamental Principle

The Generalized Minimal Residual (GMRES) method is a powerful iterative method for general linear systems. The method constructs an orthonormal basis of the Krylov subspace: \(\mathcal{K}_k(A, r_0) = \text{span}\{r_0, Ar_0, A^2r_0, \ldots, A^{k-1}r_0\}\) The GMRES method minimizes the residual over this subspace.

Arnoldi Iteration

The method uses the Arnoldi iteration to construct an orthonormal basis $\{v_1, \ldots, v_k\}$ of the Krylov subspace together with an upper Hessenberg matrix $H_k$ of orthogonalization coefficients:

  1. Orthogonalization: \(h_{i,j} = (Av_j, v_i) \quad \text{for} \quad i \leq j\\ \hat{v}_{j+1} = Av_j - \sum_{i=1}^{j} h_{i,j}v_i\)
  2. Normalization: \(h_{j+1,j} = \|\hat{v}_{j+1}\|_2\\ v_{j+1} = \hat{v}_{j+1} / h_{j+1,j}\) where $v_1 = r_0 / \|r_0\|_2$

Minimization Problem

At step $n$, GMRES computes: \(x_n = x_0 + V_ny_n\) where $V_n$ is the Arnoldi basis and $y_n$ minimizes the residual: \(y_n = \text{argmin}_{y \in \mathbb{R}^n} \|r_0 - AV_ny\|_2 = \text{argmin}_{y \in \mathbb{R}^n} \|\beta e_1 - H_ny\|_2\) where $\beta = \|r_0\|_2$ and $H_n$ is the $(n+1) \times n$ Hessenberg matrix from the Arnoldi process, so each step only requires a small least-squares solve.

Implementation

The GMRES method can be implemented in two pieces: an Arnoldi routine that builds the orthonormal Krylov basis, and a driver that solves the resulting small least-squares problem.

First, the Arnoldi basis:

```python
def arnoldi_iteration(A, v, m):
    n = len(v)
    H = np.zeros((m + 1, m))
    V = np.zeros((n, m + 1))
    V[:, 0] = v / np.linalg.norm(v)

    for j in range(m):
        # Orthogonalize A v_j against the previous basis vectors
        w = A @ V[:, j]
        for i in range(j + 1):
            H[i, j] = np.dot(V[:, i], w)
            w = w - H[i, j] * V[:, i]

        H[j + 1, j] = np.linalg.norm(w)
        if H[j + 1, j] < 1e-12:
            # Breakdown: the Krylov subspace is invariant under A
            return V[:, :j + 1], H[:j + 1, :j + 1]
        V[:, j + 1] = w / H[j + 1, j]

    return V, H
```

Then the GMRES driver, which repeats the Arnoldi step and solves the least-squares problem $\min_y \|\beta e_1 - H_k y\|_2$:

```python
def gmres(A, b, x0, max_iter, tol):
    r0 = b - A @ x0
    beta = np.linalg.norm(r0)

    H = np.zeros((max_iter + 1, max_iter))
    V = np.zeros((A.shape[0], max_iter + 1))
    V[:, 0] = r0 / beta

    for j in range(max_iter):
        # Arnoldi step
        w = A @ V[:, j]
        for i in range(j + 1):
            H[i, j] = w @ V[:, i]
            w = w - H[i, j] * V[:, i]

        H[j + 1, j] = np.linalg.norm(w)
        if H[j + 1, j] < tol:
            break                      # (near-)breakdown: basis is complete
        V[:, j + 1] = w / H[j + 1, j]

    # Solve the small (k+1) x k least-squares problem for y and form the iterate
    k = j + 1
    e1 = np.zeros(k + 1)
    e1[0] = beta
    y, *_ = np.linalg.lstsq(H[:k + 1, :k], e1, rcond=None)
    return x0 + V[:, :k] @ y
```
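
A minimal usage sketch with an illustrative nonsymmetric system (matrix, right-hand side, and tolerances below are chosen purely for demonstration):

```python
import numpy as np

# Illustrative nonsymmetric (upper bidiagonal) system
n = 100
A = np.diag(np.arange(1.0, n + 1)) + 0.5 * np.eye(n, k=1)
b = np.ones(n)

x = gmres(A, b, x0=np.zeros(n), max_iter=n, tol=1e-12)
print("residual norm:", np.linalg.norm(b - A @ x))
```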
    

Convergence Analysis

The residual at step $k$ satisfies: \(\|r_k\|_2 = \min_{p \in \mathcal{P}_k,\, p(0)=1} \|p(A)r_0\|_2\) where $\mathcal{P}_k$ is the set of polynomials of degree at most $k$, normalized so that $p(0) = 1$. The GMRES method minimizes the residual over the Krylov subspace, leading to convergence in at most $n$ iterations for an $n \times n$ matrix in exact arithmetic.

Theorem

Theorem (Convergence of Iterative Methods)

For the iteration $x_{k+1} = Tx_k + c$, convergence occurs if and only if $\rho(T) < 1$, where $\rho(T)$ is the spectral radius of $T$.

Proof Let $e_k = x_k - x^*$ be the error at iteration $k$, where the exact solution satisfies $x^* = Tx^* + c$. Then: \(e_{k+1} = x_{k+1} - x^* = (Tx_k + c) - (Tx^* + c) = T(x_k - x^*) = Te_k\\ e_{k} = T^ke_0\)

By the spectral radius property, $\rho(T) < 1$ holds if and only if $\|T^k\| \to 0$ as $k \to \infty$. Hence $e_k = T^ke_0 \to 0$ for every initial error $e_0$ exactly when $\rho(T) < 1$, which establishes convergence.
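
A small numerical illustration of this result (the iteration matrix $T$ and vector $c$ below are arbitrary illustrative choices):

```python
import numpy as np

# Iteration matrix with spectral radius rho(T) = 0.6 < 1
T = np.array([[0.5, 0.2],
              [0.1, 0.4]])
c = np.array([1.0, 1.0])
x_star = np.linalg.solve(np.eye(2) - T, c)   # fixed point x* = Tx* + c

x = np.zeros(2)
for k in range(20):
    x = T @ x + c

rho = max(abs(np.linalg.eigvals(T)))
print(np.linalg.norm(x - x_star), rho**20)   # error decays roughly like rho^k
```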

Theorem (CG Convergence Rate)

For the Conjugate Gradient method applied to $Ax = b$ with $A$ symmetric positive definite: \(\|x_k - x^*\|_A \leq 2 \left(\frac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1}\right)^k \|x_0 - x^*\|_A\) where $\kappa = \frac{\lambda_{\max}}{\lambda_{\min}}$ is the condition number of $A$.

Proof (Sketch)

  1. CG minimizes $\|x_k - x^*\|_A$ over the Krylov subspace $x_0 + \mathcal{K}_k(A, r_0)$.
  2. The error can be expressed as a min-max problem over polynomials of degree at most $k$ with $p(0) = 1$.
  3. Chebyshev polynomial properties on the interval $[\lambda_{\min}, \lambda_{\max}]$ yield the stated bound.

Theorem (GMRES Minimization Property)

At step $k$, the GMRES approximation $x_k$ satisfies: \(\|b-Ax_k\|_2 = \min_{x \in x_0 + \mathcal{K}_k} \|b - Ax\|_2\) where $\mathcal{K}_k$ is the Krylov subspace of $A$.

Proof (Sketch)

  1. The iterate $x_k = x_0 + V_ky_k$ is chosen so that $y_k$ minimizes $\|\beta e_1 - H_ky\|_2$.
  2. Show that $\|b - Ax_k\|_2 = \|\beta e_1 - H_ky_k\|_2$.
  3. Conclude that this residual is minimal over $x_0 + \mathcal{K}_k$.

Theorem (GMRES Convergence)

For non-singular $A$, GMRES converges in at most $n$ steps (in exact arithmetic), where $n$ is the matrix dimension.

Proof (Sketch)

  1. Show that $\mathcal{K}_n$ contains the solution
  2. Use minimization property
  3. Apply non-singularity of $A$ to show convergence

Theorem (GMRES Residual Bound)

For normal matrices $A$: \(\|r_k\|_2 \leq \min_{p \in \Pi_k} \max_{\lambda \in \sigma(A)} |p(\lambda)|\,\|r_0\|_2\) where $\Pi_k$ is the set of polynomials of degree at most $k$ with $p(0) = 1$, and $\sigma(A)$ is the spectrum of $A$.

Proof (Sketch) This uses the fact that for normal matrices: \(\|p(A)\|_2 = \max_{\lambda \in \sigma(A)} |p(\lambda)|\)

Discussion

Comparison of Methods

Method Selection Framework

| Method | Matrix Type | Convergence | Complexity (per iteration) | Memory | Implementation |
|--------|-------------|-------------|----------------------------|--------|----------------|
| Jacobi | Diagonally dominant | Linear | $O(n^2)$ | $O(n)$ | Simple |
| CG | Symmetric positive definite | $O(\sqrt{\kappa})$ iterations | $O(n^2)$ | $O(n)$ | Complex |
| GMRES | General | Problem dependent | $O(n^2)$ | $O(n)$ | Complex |

Convergence Behavior

  • The Jacobi method converges for diagonally dominant matrices, but the convergence is typically slow.
  • The CG method converges in at most $n$ iterations for symmetric positive definite matrices, and usually much faster when the condition number is moderate.
  • The GMRES method converges in at most $n$ iterations for general nonsingular matrices.
  • The CG method is optimal in the $A$-norm, GMRES minimizes the residual norm over the Krylov subspace, and the Jacobi method is a simple splitting iteration with no such optimality property.
  • In practice, CG is the method of choice for symmetric positive definite systems, GMRES for general systems, and Jacobi is attractive mainly for its simplicity on strongly diagonally dominant problems.

Preconditioning Strategies

Classical Preconditioners

  1. Incomplete $LU$ (ILU) factorization (see the sketch below)
  2. Sparse Approximate Inverse (SPAI)
  3. Algebraic Multigrid (AMG)
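
In practice these preconditioners are usually applied through a library. The following is a small sketch (assuming SciPy is available) that uses scipy.sparse.linalg.spilu as an incomplete-LU preconditioner for GMRES; the matrix, drop tolerance, and right-hand side are illustrative placeholders:

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

# Illustrative sparse system: 1D Poisson matrix
n = 1000
A = sp.diags([-1, 2, -1], [-1, 0, 1], shape=(n, n), format="csc")
b = np.ones(n)

# Incomplete LU factorization wrapped as a preconditioner M ~= A^{-1}
ilu = spla.spilu(A, drop_tol=1e-4)
M = spla.LinearOperator((n, n), matvec=ilu.solve)

x, info = spla.gmres(A, b, M=M)
print("info:", info, "residual norm:", np.linalg.norm(b - A @ x))
```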

Problem-Specific Preconditioners

For elliptic PDEs:

  • Multigrid preconditioners, sketched below as a V-cycle (the smooth, restrict, and prolongate routines are problem-specific grid-transfer operations that must be supplied separately):

```python
def multigrid_preconditioner(A, levels=3):
    """Geometric multigrid preconditioner (V-cycle sketch)"""
    def apply_V_cycle(r):
        # Pre-smooth and restrict the residual down to the coarsest level
        for l in range(levels):
            smooth(r)
            restrict(r)
        # Coarse grid solve, then interpolate corrections back up
        for l in range(levels - 1, -1, -1):
            prolongate(r)
            smooth(r)
        return r
    return apply_V_cycle
```

Real-World Applications

Computational Fluid Dynamics

Example system - a saddle-point system arising from the discretization of incompressible flow, with velocity unknowns $u$ and pressure unknowns $p$: \(\begin{bmatrix} A & B^T \\ B & 0 \end{bmatrix} \begin{bmatrix} u \\ p \end{bmatrix} = \begin{bmatrix} f \\ g \end{bmatrix}\) Such indefinite systems are typically solved with preconditioned Krylov methods such as GMRES.

Structural Analysis

Finite element models in structural analysis lead to large, sparse, symmetric positive definite stiffness systems $Ku = f$, for which the preconditioned Conjugate Gradient method is a natural choice.

Conclusion

Theoretical Insights

The study of iterative methods reveals a beautiful interplay between linear algebra, functional analysis, and computational mathematics. The progression from simple iterations to sophisticated Krylov subspace methods demonstrates how theoretical insights can lead to practical algorithmic improvements. Key theoretical achievements include:

  1. Understanding convergence through polynomial approximation
  2. Establishing optimality of methods like CG for specific matrix classes
  3. Development of robust error bounds and stopping criteria

Practical Impact

The practical significance of iterative methods cannot be overstated:

  1. Problem Size Scaling
    • Direct methods: $O(n^3)$ operations
    • Iterative methods: $O(n^2)$ per iteration for dense matrices, often only $O(n)$ per iteration for sparse matrices
    • Memory requirements: $O(n)$ vs $O(n^2)$
  2. Adaptability to Structure
    • Sparse matrix operations
    • Matrix-free implementations
    • Problem-specific preconditioners

Future Directions

The field continues to evolve along several promising directions:

  1. Algorithmic Improvements
    • Hybrid methods combining different approaches
    • Adaptive parameter selection
    • Auto-tuning frameworks
  2. Hardware Adaptation
    • GPU acceleration
    • Quantum computing integration
    • Mixed precision implementations
  3. Application Domains
    • High-performance computing
    • Machine learning
    • Data assimilation

Final Remarks

The development of iterative methods represents a triumph of mathematical analysis meeting computational practice. As we face increasingly complex computational challenges, the principles established in this field will continue to guide the development of new algorithms and applications. The future lies in adapting these fundamental ideas to emerging computational paradigms while maintaining their mathematical rigor and practical efficiency. This journey through iterative methods demonstrates that the field remains vibrant and essential for modern scientific computing, with many exciting developments yet to come.

References

Classical Texts

  • Kincaid, D. and Cheney, W. (2009). Numerical Analysis: Mathematics of Scientific Computing, Third Edition. American Mathematical Society, Pure and Applied Undergraduate Texts, Volume 2.
  • Saad, Y. (2003). Iterative Methods for Sparse Linear Systems, Second Edition. SIAM.
  • Golub, G. H., and Van Loan, C. F. (2013). Matrix Computations, Fourth Edition. Johns Hopkins University Press.

Advanced Topics

  • Trefethen, L. N., and Bau III, D. (1997). Numerical Linear Algebra. SIAM.
  • Hackbusch, W. (2016). Iterative Solution of Large Sparse Systems of Equations, Second Edition. Springer.
  • Greenbaum, A. (1997). Iterative Methods for Solving Linear Systems. SIAM.

Applications and Implementation

  • Davis, T. A. (2006). Direct Methods for Sparse Linear Systems. SIAM.
  • Kelley, C. T. (1995). Iterative Methods for Linear and Nonlinear Equations. SIAM.
  • Saad, Y., and van der Vorst, H. A. (2000). Iterative Solution of Linear Systems in the 20th Century. Journal of Computational and Applied Mathematics.
