Notes of Linear Algebra for Preparing Mathematical Modeling

Chapter 3: Matrix Algebra

Author: Kenneth, S.K. Cheng

3.1 Matrix Addition and Scalar Multiplication

Unless stated otherwise, a scalar is a complex number; the real numbers are a subset of the complex numbers. Let $\mathbf{A} = [a_{ij}]$ and $\mathbf{B} = [b_{ij}]$ be matrices of the same size $m \times n$ . Their sum is denoted by $\mathbf{A} + \mathbf{B}$ and is defined entrywise by

\mathbf{A} + \mathbf{B} = [a_{ij} + b_{ij}].

Example: Suppose $\mathbf{A} = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$ and $\mathbf{B} = \begin{bmatrix} 5 & 6 \\ 7 & 8 \end{bmatrix}$ . Then $\mathbf{A} + \mathbf{B} = \begin{bmatrix} 1+5 & 2+6 \\ 3+7 & 4+8 \end{bmatrix} = \begin{bmatrix} 6 & 8 \\ 10 & 12 \end{bmatrix}$ .

Properties of Matrix Addition: For any $m \times n$ matrices $\mathbf{A}$ , $\mathbf{B}$ , and $\mathbf{C}$ , the following properties hold:

$\mathbf{A} + \mathbf{B}$ is again an $m \times n$ matrix. (Closure Property)
$\mathbf{A} + \mathbf{B} = \mathbf{B} + \mathbf{A}$ . (Commutative Property)
$(\mathbf{A} + \mathbf{B}) + \mathbf{C} = \mathbf{A} + (\mathbf{B} + \mathbf{C})$ . (Associative Property)
There is a unique $m \times n$ matrix $\mathbf{O}$ , called the zero matrix, such that $\mathbf{A} + \mathbf{O} = \mathbf{A}$ for all $m \times n$ matrices $\mathbf{A}$ . Every entry of $\mathbf{O}$ is zero. (Additive Identity)
For each $m \times n$ matrix $\mathbf{A}$ there is a unique $m \times n$ matrix $-\mathbf{A}$ such that $\mathbf{A} + (-\mathbf{A}) = \mathbf{O}$ . This matrix $-\mathbf{A} = [-a_{ij}]$ is called the negative of $\mathbf{A}$ . (Additive Inverse)

The scalar multiplication of a matrix $\mathbf{A}$ by a scalar $c$ is denoted by $c\mathbf{A}$ and is defined by

c\mathbf{A} = [ca_{ij}].

Example: Suppose $\mathbf{A} = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$ and $c = 2$ . Then $c\mathbf{A} = \begin{bmatrix} 2 \times 1 & 2 \times 2 \\ 2 \times 3 & 2 \times 4 \end{bmatrix} = \begin{bmatrix} 2 & 4 \\ 6 & 8 \end{bmatrix}$ .

Properties of Scalar Multiplication: For any $m \times n$ matrices $\mathbf{A}$ and $\mathbf{B}$ and scalars $\alpha$ and $\beta$ , the following properties hold:

$\alpha \mathbf{A}$ is again an $m \times n$ matrix. (Closure Property)
$\alpha(\beta\mathbf{A}) = (\alpha\beta)\mathbf{A}$ . (Associative Property)
$\alpha(\mathbf{A} + \mathbf{B}) = \alpha\mathbf{A} + \alpha\mathbf{B}$ . (Distributive Property)
$(\alpha + \beta)\mathbf{A} = \alpha\mathbf{A} + \beta\mathbf{A}$ . (Distributive Property)
The scalar $1$ satisfies $1\mathbf{A} = \mathbf{A}$ for all $m \times n$ matrices $\mathbf{A}$ . (Multiplicative Identity)
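The properties of addition and scalar multiplication can be spot-checked numerically. The sketch below is only an illustration on one pair of matrices, not a proof; the scalars `alpha` and `beta` and the variable names are chosen here for the example.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
alpha, beta = 2, -3  # illustrative scalars

# Scalar multiplication acts entry by entry.
cA = alpha * A

# Each check compares the two sides of one property listed above.
commutative = np.array_equal(A + B, B + A)
associative = np.array_equal(alpha * (beta * A), (alpha * beta) * A)
distributive_matrix = np.array_equal(alpha * (A + B), alpha * A + alpha * B)
distributive_scalar = np.array_equal((alpha + beta) * A, alpha * A + beta * A)

print(cA)
print(commutative, associative, distributive_matrix, distributive_scalar)
```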

3.2 Matrix Multiplication

Let $\mathbf{A} = [a_{ij}]$ be an $m \times n$ matrix and $\mathbf{B} = [b_{ij}]$ an $n \times p$ matrix, so that the number of columns of $\mathbf{A}$ equals the number of rows of $\mathbf{B}$ . The product $\mathbf{A}\mathbf{B}$ is the $m \times p$ matrix whose $(i, j)$ entry is the $i$-th row of $\mathbf{A}$ times the $j$-th column of $\mathbf{B}$ :

[\mathbf{A}\mathbf{B}]_{ij} = \begin{bmatrix} a_{i1} & a_{i2} & \cdots & a_{in} \end{bmatrix} \begin{bmatrix} b_{1j} \\ b_{2j} \\ \vdots \\ b_{nj} \end{bmatrix} = \sum_{k=1}^{n} a_{ik}b_{kj}.

Example: Suppose there are two $2 \times 2$ matrices $\mathbf{A} = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$ and $\mathbf{B} = \begin{bmatrix} 5 & 6 \\ 7 & 8 \end{bmatrix}$ . Then $\mathbf{A}\mathbf{B} = \begin{bmatrix} 1 \times 5 + 2 \times 7 & 1 \times 6 + 2 \times 8 \\ 3 \times 5 + 4 \times 7 & 3 \times 6 + 4 \times 8 \end{bmatrix} = \begin{bmatrix} 19 & 22 \\ 43 & 50 \end{bmatrix}$ .

Remarks: Matrix multiplication is not commutative. That is, in general, $\mathbf{A}\mathbf{B} \neq \mathbf{B}\mathbf{A}$ .
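To make the entrywise definition and the remark on commutativity concrete, here is a minimal sketch that computes the product directly from the sum $\sum_k a_{ik}b_{kj}$ . The helper name `matmul_by_definition` is made up for this illustration; in practice NumPy's built-in product would be used instead.

```python
import numpy as np

def matmul_by_definition(A, B):
    """Compute AB entry by entry from c_ij = sum_k a_ik * b_kj."""
    m, n = A.shape
    n_b, p = B.shape
    if n != n_b:
        raise ValueError("columns of A must match rows of B")
    C = np.zeros((m, p), dtype=np.result_type(A, B))
    for i in range(m):
        for j in range(p):
            for k in range(n):
                C[i, j] += A[i, k] * B[k, j]
    return C

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

AB = matmul_by_definition(A, B)
BA = matmul_by_definition(B, A)

print(AB)  # matches the worked example above
print(BA)  # differs from AB, so AB != BA for this pair
```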

Matrix Multiplication and Addition by using Python

Hand computation is fine for small matrices, but it is good practice to check the results with Python. The following code performs matrix multiplication and addition using NumPy.

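import numpy as np

# The two 2 x 2 matrices from the examples above
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

AB = np.dot(A, B)  # matrix product (equivalently A @ B)
AplusB = A + B     # entrywise sum

print(AB)
print(AplusB)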

3.2 Transposition and Symmetric Matrices

Transposition is a matrix operation that is not derived from scalar multiplication or matrix addition. The transpose of a matrix $\mathbf{A} = [a_{ij}]$ is denoted by $\mathbf{A}^T$ and is obtained by interchanging rows and columns:

\mathbf{A}^T = [a_{ji}].

Example: Suppose $\mathbf{A} = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$ . Then $\mathbf{A}^T = \begin{bmatrix} 1 & 3 \\ 2 & 4 \end{bmatrix}$ .

A matrix may have complex entries. In that case we often take the complex conjugate of each entry in addition to transposing. The conjugate transpose of a matrix $\mathbf{A}$ is denoted by $\mathbf{A}^*$ and is defined by

\mathbf{A}^* = [\bar{a}_{ji}].

Example: Suppose $\mathbf{A} = \begin{bmatrix} 1-4i & 2+3i \\ 3+2i & 4-i \end{bmatrix}$ . Then $\mathbf{A}^* = \begin{bmatrix} 1+4i & 3-2i \\ 2-3i & 4+i \end{bmatrix}$ .

Properties of Transposition: For any $m \times n$ matrices $\mathbf{A}$ and $\mathbf{B}$ and any scalar $c$ , the following properties hold:

$(\mathbf{A}^T)^T = \mathbf{A}$ .
$(c\mathbf{A})^T = c\mathbf{A}^T$ .
$(\mathbf{A} + \mathbf{B})^T = \mathbf{A}^T + \mathbf{B}^T$ .

For the complex conjugate transpose, the following properties hold:

$(\mathbf{A}^*)^* = \mathbf{A}$ .
$(c\mathbf{A})^* = \bar{c}\mathbf{A}^*$ .
$(\mathbf{A} + \mathbf{B})^* = \mathbf{A}^* + \mathbf{B}^*$ .
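The conjugate transpose and its properties can be checked in NumPy. This is a sketch on sample data only: the matrix `B`, the scalar `c`, and the helper `conj_transpose` are assumptions introduced for the illustration, while `A` is the matrix from the example above.

```python
import numpy as np

def conj_transpose(M):
    """Conjugate every entry, then interchange rows and columns."""
    return M.conj().T

A = np.array([[1 - 4j, 2 + 3j], [3 + 2j, 4 - 1j]])
B = np.array([[2j, 1], [0, 1 - 1j]])  # illustrative second matrix
c = 2 + 1j                            # illustrative complex scalar

A_star = conj_transpose(A)
print(A_star)

# The three properties of the conjugate transpose listed above.
involution = np.array_equal(conj_transpose(A_star), A)
scalar_rule = np.allclose(conj_transpose(c * A), np.conj(c) * A_star)
sum_rule = np.allclose(conj_transpose(A + B), A_star + conj_transpose(B))

print(involution, scalar_rule, sum_rule)
```

Note that the scalar comes out conjugated, which is the one place where the rules differ from those of the plain transpose.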

A matrix whose transpose equals the matrix itself is called a symmetric matrix. That is, a matrix $\mathbf{A}$ is symmetric if $\mathbf{A} = \mathbf{A}^T$ .

Example: Suppose $\mathbf{A} = \begin{bmatrix} 1 & 2 \\ 2 & 3 \end{bmatrix}$ . Then $\mathbf{A}^T = \begin{bmatrix} 1 & 2 \\ 2 & 3 \end{bmatrix}$ , which equals $\mathbf{A}$ , so $\mathbf{A}$ is symmetric.

Definition: Let $\mathbf{A} = [a_{ij}]$ be a square matrix.

$\mathbf{A}$ is said to be a symmetric matrix if $\mathbf{A} = \mathbf{A}^T$ .
$\mathbf{A}$ is said to be a skew-symmetric matrix if $\mathbf{A} = -\mathbf{A}^T$ .
$\mathbf{A}$ is said to be a Hermitian matrix if $\mathbf{A} = \mathbf{A}^*$ . This is the complex analog of a symmetric matrix.
$\mathbf{A}$ is said to be a skew-Hermitian matrix if $\mathbf{A} = -\mathbf{A}^*$ . This is the complex analog of a skew-symmetric matrix.

Transposition and Symmetric Matrices by using Python

The following code shows how to perform transposition and check if a matrix is symmetric using Python.

import numpy as np

A = np.array([[1, 2], [2, 3]])
AT = np.transpose(A)

if np.array_equal(A, AT):
    print("The matrix is symmetric.")
else:
    print("The matrix is not symmetric.")
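The same idea extends to the other three classes in the definition. The following sketch uses a hypothetical helper, `classify`, that reports which of the four labels apply to a given matrix; the test matrices are chosen here for illustration.

```python
import numpy as np

def classify(M):
    """Return the labels from the definition that apply to the matrix M."""
    M = np.asarray(M)
    labels = []
    if np.array_equal(M, M.T):
        labels.append("symmetric")
    if np.array_equal(M, -M.T):
        labels.append("skew-symmetric")
    if np.array_equal(M, M.conj().T):
        labels.append("Hermitian")
    if np.array_equal(M, -M.conj().T):
        labels.append("skew-Hermitian")
    return labels

print(classify([[1, 2], [2, 3]]))            # real symmetric
print(classify([[0, 2], [-2, 0]]))           # real skew-symmetric
print(classify([[2, 1 - 1j], [1 + 1j, 3]]))  # complex Hermitian
print(classify([[1j, 2], [-2, 0]]))          # complex skew-Hermitian
```

A real symmetric matrix is also reported as Hermitian, because conjugation leaves real entries unchanged. For floating-point data, `np.allclose` would be a safer comparison than exact equality.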

3.3 Linearity

The concept of linearity is the underlying theme of our subject. In elementary mathematics the term “linear function” refers to straight lines, but in higher mathematics linearity means something much more general. Recall that a function $f$ is simply a rule for associating points in one set $\mathcal{D}$ , called the domain of $f$ , to points in another set $\mathcal{R}$ , the range of $f$ . A linear function is a particular type of function that is characterized by the following two properties.

Additivity: For any two points $x$ and $y$ in the domain of $f$ , the value of $f$ at the sum $x + y$ is the sum of the values of $f$ at $x$ and $y$ . In symbols, $f(x + y) = f(x) + f(y)$ .
Homogeneity: For any point $x$ in the domain of $f$ and any scalar $c$ , the value of $f$ at the product $cx$ is the product of the value of $f$ at $x$ and $c$ . In symbols, $f(cx) = cf(x)$ .

These two properties may be combined into a single property called linearity. A function $f$ is linear if it satisfies the following property:

f(cx + y) = cf(x) + f(y).

for all points $x$ and $y$ in the domain of $f$ and all scalars $c$ . The linearity of a function is a fundamental concept in mathematics. It is the key to understanding the behavior of many physical systems and is the basis for the development of the calculus of variations, which is a powerful tool for solving optimization problems.
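As an illustration, the combined property can be tested numerically. In the sketch below, `f` (multiplication by a fixed matrix) and `g` (a shift by one) are examples chosen for this note. A single numerical check can refute linearity but cannot prove it, so the result for `f` is only consistent with linearity.

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])

def f(x):
    """Multiplication by a fixed matrix, a linear function."""
    return A @ x

def g(x):
    """A shift by one in every component, which is not linear."""
    return x + 1.0

def satisfies_linearity(func, x, y, c):
    """Check f(cx + y) = c f(x) + f(y) for one choice of x, y, and c."""
    return np.allclose(func(c * x + y), c * func(x) + func(y))

x = np.array([1.0, -2.0])
y = np.array([0.5, 3.0])
c = 2.5

f_passes = satisfies_linearity(f, x, y, c)
g_passes = satisfies_linearity(g, x, y, c)
print(f_passes, g_passes)
```

The shift `g` fails because $g(cx + y) = cx + y + 1$ while $cg(x) + g(y) = cx + y + c + 1$ .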

Two more terms are introduced here.

The trace of an $n \times n$ matrix $\mathbf{A} = [a_{ij}]$ is the sum of its diagonal entries. It is denoted by $\text{tr}(\mathbf{A})$ and is defined by

\text{tr}(\mathbf{A}) = \sum_{i=1}^{n} a_{ii}.

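A linear combination of matrices $\mathbf{A}_{1}, \mathbf{A}_{2}, \ldots, \mathbf{A}_{n}$ of the same size with scalars $c_{1}, c_{2}, \ldots, c_{n}$ is denoted by $\sum_{i=1}^{n} c_{i}\mathbf{A}_{i}$ and is defined by

\sum_{i=1}^{n} c_{i}\mathbf{A}_{i} = c_{1}\mathbf{A}_{1} + c_{2}\mathbf{A}_{2} + \cdots + c_{n}\mathbf{A}_{n}.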
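Both notions are easy to compute, and together they give a further example of a linear function: the trace of a linear combination equals the same linear combination of the traces. The matrices and coefficients below are sample values chosen for this sketch.

```python
import numpy as np

# Sample matrices and coefficients (chosen for illustration)
matrices = [
    np.array([[1, 2], [3, 4]]),
    np.array([[0, 1], [1, 0]]),
    np.eye(2, dtype=int),
]
coeffs = [2, -1, 3]

# Linear combination c_1 A_1 + c_2 A_2 + c_3 A_3
combo = sum(c * M for c, M in zip(coeffs, matrices))

# Trace of the combination versus the combination of the traces
trace_of_combo = np.trace(combo)
combo_of_traces = sum(c * np.trace(M) for c, M in zip(coeffs, matrices))

print(combo)
print(trace_of_combo, combo_of_traces)
```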

3.4 Matrix Inversion

Recall that the inverse of a square matrix $\mathbf{A}$ is denoted by $\mathbf{A}^{-1}$ and is defined by

\mathbf{A}\mathbf{A}^{-1} = \mathbf{A}^{-1}\mathbf{A} = \mathbf{I},

where $\mathbf{I}$ is the identity matrix. The inverse of a matrix may not always exist. If it does exist, then the matrix is said to be invertible or nonsingular. If the inverse does not exist, then the matrix is said to be noninvertible or singular.

Existence of Inverse

For an $n \times n$ matrix $\mathbf{A}$ , the following statements are equivalent:

$\mathbf{A}$ is invertible, that is, $\mathbf{A}^{-1}$ exists.
$\operatorname{rank}(\mathbf{A}) = n$ .
$\mathbf{A}\mathbf{x} = \mathbf{0}$ implies $\mathbf{x} = \mathbf{0}$ .
$\mathbf{A}$ can be transformed into the identity matrix by a sequence of elementary row operations (Gauss-Jordan elimination).
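The rank criterion gives a simple numerical test for invertibility. The sketch below uses a hypothetical helper, `is_invertible`, and two sample matrices; for the singular one it also exhibits a nonzero solution of $\mathbf{A}\mathbf{x} = \mathbf{0}$ . Note that `np.linalg.matrix_rank` decides the rank using a floating-point tolerance, so nearly singular matrices need extra care.

```python
import numpy as np

def is_invertible(A):
    """Test invertibility of a square matrix via rank(A) = n."""
    A = np.asarray(A, dtype=float)
    rows, cols = A.shape
    return bool(rows == cols and np.linalg.matrix_rank(A) == rows)

A = np.array([[1.0, 2.0], [3.0, 4.0]])  # rank 2, invertible
S = np.array([[1.0, 2.0], [2.0, 4.0]])  # second row is twice the first

print(is_invertible(A))
print(is_invertible(S))

# For the singular matrix, a nonzero x solves S x = 0.
x_null = np.array([2.0, -1.0])
print(S @ x_null)
```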

Properties of Inverse

For any invertible $n \times n$ matrices $\mathbf{A}$ and $\mathbf{B}$ , the following properties hold:

$(\mathbf{A}^{-1})^{-1} = \mathbf{A}$ .
The product of two invertible matrices is invertible and $(\mathbf{A}\mathbf{B})^{-1} = \mathbf{B}^{-1}\mathbf{A}^{-1}$ .
Inverse of a transpose is the transpose of the inverse, i.e. $(\mathbf{A}^T)^{-1} = (\mathbf{A}^{-1})^T$ . For the complex conjugate transpose, $(\mathbf{A}^*)^{-1} = (\mathbf{A}^{-1})^*$ .
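These properties can be checked on sample matrices with `np.linalg.inv`. The matrices below are chosen for illustration, and the comparisons use `np.allclose` because computed inverses carry rounding error.

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[5.0, 6.0], [7.0, 8.0]])
C = np.array([[1 + 1j, 2], [0, 1 - 2j]])  # complex sample matrix

inv = np.linalg.inv

# (A^{-1})^{-1} = A
inverse_of_inverse_ok = np.allclose(inv(inv(A)), A)

# (AB)^{-1} = B^{-1} A^{-1}, with the order reversed
product_rule_ok = np.allclose(inv(A @ B), inv(B) @ inv(A))

# (A^T)^{-1} = (A^{-1})^T
transpose_rule_ok = np.allclose(inv(A.T), inv(A).T)

# (C^*)^{-1} = (C^{-1})^*
conj_transpose_rule_ok = np.allclose(inv(C.conj().T), inv(C).conj().T)

print(inverse_of_inverse_ok, product_rule_ok,
      transpose_rule_ok, conj_transpose_rule_ok)
```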

3.5 Inverses of Sums and Sensitivity

In the previous section we saw the reverse order law for the inverse of a product:

(\mathbf{A} \mathbf{B})^{-1} = \mathbf{B}^{-1} \mathbf{A}^{-1}.

The inverse of a sum is not as simple as the inverse of a product. The derivation is not trivial, so we skip it and state a commonly used result, the Sherman-Morrison formula, which gives the inverse of a matrix after a rank-one update.

The Sherman-Morrison formula states that for any invertible $n \times n$ matrix $\mathbf{A}$ and $n \times 1$ vectors $\mathbf{u}$ and $\mathbf{v}$ , if $\mathbf{A} + \mathbf{u}\mathbf{v}^T$ is invertible (equivalently, if $1 + \mathbf{v}^T\mathbf{A}^{-1}\mathbf{u} \neq 0$ ), then

(\mathbf{A} + \mathbf{u}\mathbf{v}^T)^{-1} = \mathbf{A}^{-1} - \frac{\mathbf{A}^{-1}\mathbf{u}\mathbf{v}^T\mathbf{A}^{-1}}{1 + \mathbf{v}^T\mathbf{A}^{-1}\mathbf{u}}.

It is important to note that the Sherman-Morrison formula is not a general formula for the inverse of a sum. It is a special formula that applies only when the sum is of a particular form.
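The formula can be verified numerically by comparing it with a direct inversion. In this sketch the function name `sherman_morrison` and the sample data `A`, `u`, and `v` are assumptions made for the illustration.

```python
import numpy as np

def sherman_morrison(A_inv, u, v):
    """Inverse of A + u v^T computed from A^{-1} by the Sherman-Morrison formula."""
    denom = 1.0 + v @ A_inv @ u
    if np.isclose(denom, 0.0):
        raise ValueError("A + u v^T is singular: 1 + v^T A^{-1} u = 0")
    # (A^{-1} u)(v^T A^{-1}) is an outer product of two vectors.
    return A_inv - np.outer(A_inv @ u, v @ A_inv) / denom

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
u = np.array([1.0, 0.0, 2.0])
v = np.array([0.0, 1.0, 1.0])

A_inv = np.linalg.inv(A)
updated_inv = sherman_morrison(A_inv, u, v)
direct_inv = np.linalg.inv(A + np.outer(u, v))

print(np.allclose(updated_inv, direct_inv))
```

The practical appeal is that once $\mathbf{A}^{-1}$ is known, the update needs only matrix-vector products rather than a fresh inversion.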

Recall that we have talked about ill-conditioned matrices in the previous chapter. We know that when we perturb the constant vector $\mathbf{b}$ in the linear system $\mathbf{A}\mathbf{x} = \mathbf{b}$ , the solution $\mathbf{x}$ will also be perturbed.

Therefore, we define the following:

Definition: A nonsingular matrix $\mathbf{A}$ is said to be ill-conditioned if a small perturbation in the matrix $\mathbf{A}$ results in a large change in the inverse of $\mathbf{A}$ . The degree of ill-conditioning of a matrix is measured by the condition number of the matrix. We denote the condition number of a matrix $\mathbf{A}$ by $\kappa(\mathbf{A})$ and it is defined by

\kappa(\mathbf{A}) = \|\mathbf{A}\| \|\mathbf{A}^{-1}\|.

where $\|\cdot\|$ is the matrix norm.

The matrix norm is a generalization of the vector norm. Several matrix norms exist; the one used here is the infinity norm, defined for a matrix $\mathbf{A} = [a_{ij}]$ by

\|\mathbf{A}\| = \max_{i} \sum_j |a_{ij}| = \text{maximum absolute row sum}.

The condition number measures how well-conditioned or ill-conditioned a matrix is. For the maximum absolute row sum norm above, $\kappa(\mathbf{A}) \geq 1$ , because $1 = \|\mathbf{I}\| = \|\mathbf{A}\mathbf{A}^{-1}\| \leq \|\mathbf{A}\| \|\mathbf{A}^{-1}\|$ . The larger the condition number, the more ill-conditioned the matrix is.
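A short sketch of these definitions, using the maximum absolute row sum as the norm. The helper names `inf_norm` and `cond_inf` and the two sample matrices are chosen for this illustration; NumPy's own `np.linalg.cond(A, np.inf)` is included as a cross-check.

```python
import numpy as np

def inf_norm(A):
    """Maximum absolute row sum of A."""
    return np.abs(A).sum(axis=1).max()

def cond_inf(A):
    """Condition number ||A|| ||A^{-1}|| in the maximum absolute row sum norm."""
    return inf_norm(A) * inf_norm(np.linalg.inv(A))

well = np.array([[2.0, 0.0], [0.0, 1.0]])
ill = np.array([[1.0, 1.0], [1.0, 1.0001]])  # rows are nearly identical

kappa_well = cond_inf(well)
kappa_ill = cond_inf(ill)

print(kappa_well, kappa_ill)
print(np.linalg.cond(well, np.inf), np.linalg.cond(ill, np.inf))
```

The second matrix has nearly dependent rows, and its condition number is several orders of magnitude larger than that of the first.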

3.6 LU Decomposition

We have now come full circle, and we are back to where the text began: solving a nonsingular system of linear equations using Gaussian elimination with back substitution. This time, however, the goal is to describe and understand the process in the context of matrices.

If $\mathbf{A}\mathbf{x} = \mathbf{b}$ is a system of linear equations, then we can write it as $\mathbf{A}\mathbf{x} = \mathbf{L}\mathbf{U}\mathbf{x} = \mathbf{b}$ , where $\mathbf{L}$ is a lower triangular matrix and $\mathbf{U}$ is an upper triangular matrix. The process of decomposing a matrix $\mathbf{A}$ into the product of a lower triangular matrix $\mathbf{L}$ and an upper triangular matrix $\mathbf{U}$ is called the LU decomposition.

The LU decomposition is a fundamental concept in numerical linear algebra. It is used to solve systems of linear equations, compute the inverse of a matrix, and calculate the determinant of a matrix. A closely related factorization is the Cholesky decomposition, which applies to symmetric positive definite matrices and is used to solve linear systems with such matrices.

Theorem: Let $\mathbf{A}$ be a nonsingular $n \times n$ matrix. Then $\mathbf{A}$ has an LU decomposition (without row interchanges) if and only if all leading principal minors of $\mathbf{A}$ are nonzero.

Algorithm for LU Decomposition:

Start with the matrix $\mathbf{A}$ .
Perform Gaussian elimination to obtain an upper triangular matrix $\mathbf{U}$ .
The lower triangular matrix $\mathbf{L}$ has ones on its diagonal, and its entry $l_{ij}$ below the diagonal is the multiplier used to eliminate the $(i, j)$ position, that is, the multiple of row $j$ that was subtracted from row $i$ .
The LU decomposition of $\mathbf{A}$ is given by $\mathbf{A} = \mathbf{L}\mathbf{U}$ .
The system of linear equations $\mathbf{A}\mathbf{x} = \mathbf{b}$ can be solved by solving the two systems of linear equations $\mathbf{L}\mathbf{y} = \mathbf{b}$ and $\mathbf{U}\mathbf{x} = \mathbf{y}$ .
The solution to the system of linear equations is given by $\mathbf{x} = \mathbf{U}^{-1}\mathbf{L}^{-1}\mathbf{b}$ .
The inverse of the matrix $\mathbf{A}$ is given by $\mathbf{A}^{-1} = \mathbf{U}^{-1}\mathbf{L}^{-1}$ .
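The steps above can be sketched in code. This is a minimal illustration without row interchanges, so it assumes every pivot is nonzero; the function names `lu_no_pivot` and `solve_lu` and the right-hand side `b` are made up for the example, and the matrix is the one used in this section's worked example. Production code would normally rely on a library routine with pivoting.

```python
import numpy as np

def lu_no_pivot(A):
    """Factor A = LU by Gaussian elimination without row interchanges."""
    U = np.array(A, dtype=float)
    n = U.shape[0]
    L = np.eye(n)
    for k in range(n - 1):
        if U[k, k] == 0.0:
            raise ValueError("zero pivot: row interchanges would be needed")
        for i in range(k + 1, n):
            L[i, k] = U[i, k] / U[k, k]     # multiplier, stored in L
            U[i, k:] -= L[i, k] * U[k, k:]  # eliminate entry (i, k)
    return L, U

def solve_lu(L, U, b):
    """Solve LUx = b: forward substitution for y, then back substitution for x."""
    n = len(b)
    y = np.zeros(n)
    for i in range(n):
        y[i] = b[i] - L[i, :i] @ y[:i]  # diagonal of L is one
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (y[i] - U[i, i + 1:] @ x[i + 1:]) / U[i, i]
    return x

A = np.array([[2, 2, 2], [4, 7, 7], [6, 18, 22]])
b = np.array([12.0, 24.0, 12.0])  # illustrative right-hand side

L, U = lu_no_pivot(A)
x = solve_lu(L, U, b)

print(L)
print(U)
print(x)
```

Solving the two triangular systems is preferred over forming $\mathbf{U}^{-1}$ and $\mathbf{L}^{-1}$ explicitly, since substitution is cheaper and the factors can be reused for other right-hand sides.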

Example: Consider the $3\times 3$ matrix

\mathbf{A} = \begin{bmatrix} 2 & 2 & 2 \\ 4 & 7 & 7 \\ 6 & 18 & 22 \end{bmatrix}.

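We want to find the LU decomposition of $\mathbf{A}$ . Applying Gaussian elimination to $\mathbf{A}$ (subtract $2$ times row 1 from row 2, subtract $3$ times row 1 from row 3, then subtract $4$ times row 2 from row 3), we obtain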

\mathbf{U} = \begin{bmatrix} 2 & 2 & 2 \\ 0 & 3 & 3 \\ 0 & 0 & 4 \end{bmatrix}.

For the lower triangular matrix $\mathbf{L}$ , we apply the following:

\begin{bmatrix} 1 & 0 & 0 \\ l_{21} & 1 & 0 \\ l_{31} & l_{32} & 1 \end{bmatrix} \begin{bmatrix} 2 & 2 & 2 \\ 0 & 3 & 3 \\ 0 & 0 & 4 \end{bmatrix} = \begin{bmatrix} 2 & 2 & 2 \\ 4 & 7 & 7 \\ 6 & 18 & 22 \end{bmatrix}.

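Solving this equation entry by entry gives $l_{21} = 2$ , $l_{31} = 3$ , and $l_{32} = 4$ , which are the multipliers of the Gaussian elimination. Hence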

\mathbf{L} = \begin{bmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 3 & 4 & 1 \end{bmatrix}.

Therefore, the LU decomposition of $\mathbf{A}$ is given by

\mathbf{A} = \begin{bmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 3 & 4 & 1 \end{bmatrix} \begin{bmatrix} 2 & 2 & 2 \\ 0 & 3 & 3 \\ 0 & 0 & 4 \end{bmatrix}.

LU Decomposition by using Python

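The following code shows how to perform LU decomposition using Python. Here we use the scipy library. Note that scipy.linalg.lu uses partial pivoting and returns three matrices with $\mathbf{A} = \mathbf{P}\mathbf{L}\mathbf{U}$ , where $\mathbf{P}$ is a permutation matrix, so the printed $\mathbf{L}$ and $\mathbf{U}$ may differ from the hand computation above.

import numpy as np
from scipy.linalg import lu

A = np.array([[2, 2, 2], [4, 7, 7], [6, 18, 22]])

# lu returns the permutation matrix P together with L and U, where A = P @ L @ U
P, L, U = lu(A)

print(P)
print(L)
print(U)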