Notes of Linear Algebra for Preparing Mathematical Modeling

Chapter 3: Matrix Algebra

Author: Kenneth, S.K. Cheng

Table of Contents

3.1 Matrix Addition and Scalar Multiplication

Unless stated otherwise, a scalar is a complex number where real numbers are a subset of the complex number. Matrices A=[aij]\mathbf{A} = [a_{ij}] and B=[bij]\mathbf{B} = [b_{ij}] are of the same size, i.e. m×nm \times n. The sum of A\mathbf{A} and B\mathbf{B} is denoted by A+B\mathbf{A} + \mathbf{B} and is defined by

A+B=[aij+bij].\mathbf{A} + \mathbf{B} = [a_{ij} + b_{ij}].

Example: Suppose A=[1234]\mathbf{A} = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} and B=[5678]\mathbf{B} = \begin{bmatrix} 5 & 6 \\ 7 & 8 \end{bmatrix}. Then A+B=[1+52+63+74+8]=[681012]\mathbf{A} + \mathbf{B} = \begin{bmatrix} 1+5 & 2+6 \\ 3+7 & 4+8 \end{bmatrix} = \begin{bmatrix} 6 & 8 \\ 10 & 12 \end{bmatrix}.

Properties of Matrix Addition: For any m×nm \times n matrices A\mathbf{A}, B\mathbf{B}, and C\mathbf{C}, the following properties hold:

  1. A+B\mathbf{A} + \mathbf{B} is again an m×nm \times n matrix. (Closure Property)
  2. A+B=B+A\mathbf{A} + \mathbf{B} = \mathbf{B} + \mathbf{A}. (Commutative Property)
  3. (A+B)+C=A+(B+C)(\mathbf{A} + \mathbf{B}) + \mathbf{C} = \mathbf{A} + (\mathbf{B} + \mathbf{C}). (Associative Property)
  4. There is a unique m×nm \times n matrix O\mathbf{O} such that A+O=A\mathbf{A} + \mathbf{O} = \mathbf{A} for all m×nm \times n matrices A\mathbf{A}. This matrix O\mathbf{O} is called the zero matrix and is denoted by O\mathbf{O}. (Addive Identity)
  5. There is a unique m×nm \times n matrix A-\mathbf{A} such that A+(A)=O\mathbf{A} + (-\mathbf{A}) = \mathbf{O}. This matrix A-\mathbf{A} is called the negative of A\mathbf{A}. (Additive Inverse)

The scalar multiplication of a matrix A\mathbf{A} by a scalar cc is denoted by cAc\mathbf{A} and is defined by

cA=[caij].c\mathbf{A} = [ca_{ij}].

Example: Suppose A=[1234]\mathbf{A} = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} and c=2c = 2. Then cA=[2×12×22×32×4]=[2468]c\mathbf{A} = \begin{bmatrix} 2 \times 1 & 2 \times 2 \\ 2 \times 3 & 2 \times 4 \end{bmatrix} = \begin{bmatrix} 2 & 4 \\ 6 & 8 \end{bmatrix}.

Properties of Scalar Multiplication: For any m×nm \times n matrices A\mathbf{A} and B\mathbf{B} and scalars α\alpha and β\beta, the following properties hold:

  1. αA\alpha \mathbf{A} is again an m×nm \times n matrix. (Closure Property)
  2. α(βA)=(αβ)A\alpha(\beta\mathbf{A}) = (\alpha\beta)\mathbf{A}. (Associative Property)
  3. α(A+B)=αA+αB\alpha(\mathbf{A} + \mathbf{B}) = \alpha\mathbf{A} + \alpha\mathbf{B}. (Distributive Property)
  4. (α+β)A=αA+βA(\alpha + \beta)\mathbf{A} = \alpha\mathbf{A} + \beta\mathbf{A}. (Distributive Property)
  5. There is 11 such that 1A=A1\mathbf{A} = \mathbf{A} for all m×nm \times n matrices A\mathbf{A}. This 11 is called the multiplicative identity.

3.2 Matrix Multiplication

The product of two matrices A\mathbf{A} and B\mathbf{B} is denoted by AB\mathbf{A}\mathbf{B} and is defined by

AB=[ai1ai2ainai1ai2ainam1am2amn][b1jb2jbnj]=[k=1naikbkj].\mathbf{A}\mathbf{B} = \begin{bmatrix} a_{i1} & a_{i2} & \cdots & a_{in} \\ a_{i1} & a_{i2} & \cdots & a_{in} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix} \begin{bmatrix} b_{1j} \\ b_{2j} \\ \vdots \\ b_{nj} \end{bmatrix} = \begin{bmatrix} \sum_{k=1}^{n} a_{ik}b_{kj} \end{bmatrix}.

Example: Suppose there are two 2×22 \times 2 matrices A=[1234]\mathbf{A} = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} and B=[5678]\mathbf{B} = \begin{bmatrix} 5 & 6 \\ 7 & 8 \end{bmatrix}. Then AB=[1×5+2×71×6+2×83×5+4×73×6+4×8]=[19224350]\mathbf{A}\mathbf{B} = \begin{bmatrix} 1 \times 5 + 2 \times 7 & 1 \times 6 + 2 \times 8 \\ 3 \times 5 + 4 \times 7 & 3 \times 6 + 4 \times 8 \end{bmatrix} = \begin{bmatrix} 19 & 22 \\ 43 & 50 \end{bmatrix}.

Remarks: Matrix multiplication is not commutative. That is, in general, ABBA\mathbf{A}\mathbf{B} \neq \mathbf{B}\mathbf{A}.

Matrix Multiplication and Addition by using Python

Though it is not always necessary to use Python to perform matrix multiplication and addition, it is a good practice to do so. The following code shows how to perform matrix multiplication and addition using Python.

import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
AB = np.dot(A, B)
AplusB = A + B

print(AB)
print(AplusB)

3.2 Transposition and Symmetric Matrices

A matrix operation that is not derived from scalar multiplication and matrix addition is called a matrix operation. The transpose of a matrix A\mathbf{A} is denoted by AT\mathbf{A}^T and is defined by

AT=[aji].\mathbf{A}^T = [a_{ji}].

Example: Suppose A=[1234]\mathbf{A} = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}. Then AT=[1324]\mathbf{A}^T = \begin{bmatrix} 1 & 3 \\ 2 & 4 \end{bmatrix}.

Sometimes a matrix may include complex numbers. In this case, then we may have to take the conjugate of the complex number. The conjugate transpose of a matrix A\mathbf{A} is denoted by A\mathbf{A}^* and is defined by

A=[aˉji].\mathbf{A}^* = [\bar{a}_{ji}].

Example: Suppose A=[14i2+3i3+2i41i]\mathbf{A} = \begin{bmatrix} 1-4i & 2+3i \\ 3+2i & 4-1i \end{bmatrix}. Then A=[1+4i32i23i4+1i]\mathbf{A}^* = \begin{bmatrix} 1+4i & 3-2i \\ 2-3i & 4+1i \end{bmatrix}.

Properties of Transposition: For any m×nm \times n matrix A\mathbf{A} and n×pn \times p matrix B\mathbf{B}, and scalar cc, the following properties hold:

For the complex conjugate transpose, the following properties hold:

Sometimes, a transposition of a matrix is the same as the original matrix. In this case, the matrix is called a symmetric matrix. That is, a matrix A\mathbf{A} is symmetric if A=AT\mathbf{A} = \mathbf{A}^T.

Example: Suppose A=[1223]\mathbf{A} = \begin{bmatrix} 1 & 2 \\ 2 & 3 \end{bmatrix}. Then AT=[1223]\mathbf{A}^T = \begin{bmatrix} 1 & 2 \\ 2 & 3 \end{bmatrix} which is still A\mathbf{A}.

Definition: Let A=[aij]\mathbf{A} = [a_{ij}] be a square matrix.

Transposition and Symmetric Matrices by using Python

The following code shows how to perform transposition and check if a matrix is symmetric using Python.

import numpy as np

A = np.array([[1, 2], [2, 3]])
AT = np.transpose(A)

if np.array_equal(A, AT):
    print("The matrix is symmetric.")
else:
    print("The matrix is not symmetric.")

3.3 Linearity

The concept of linearity is the underlying theme of our subject. In elementary mathematics the term “linear function” refers to straight lines, but in higher mathematics linearity means something much more general. Recall that a function ff is simply a rule for associating points in one set D\mathcal{D} - called the domain of ff — to points in another set R\mathcal{R} - the range of ff. A linear function is a particular type of function that is characterized by the following two properties.

These two properties may be combined into a single property called linearity. A function ff is linear if it satisfies the following property:

f(cx+y)=cf(x)+f(y).f(cx + y) = cf(x) + f(y).

for all points xx and yy in the domain of ff and all scalars cc. The linearity of a function is a fundamental concept in mathematics. It is the key to understanding the behavior of many physical systems and is the basis for the development of the calculus of variations, which is a powerful tool for solving optimization problems.

There are also two more terminologies I would like to introduce here.

The trace of a square matrix A\mathbf{A} is denoted by tr(A)\text{tr}(\mathbf{A}) and is defined by

tr(A)=i=1naii.\text{tr}(\mathbf{A}) = \sum_{i=1}^{n} a_{ii}.

The linear combination of matrices AiA_{i} is denoted by i=1nciAi\sum_{i=1}^{n} c_{i}A_{i} and is defined by

i=1nciAi=c1A1+c2A2++cnAn.\sum_{i=1}^{n} c_{i}A_{i} = c_{1}A_{1} + c_{2}A_{2} + \cdots + c_{n}A_{n}.

3.4 Matrix Inversion

Recall that the inverse of a square matrix A\mathbf{A} is denoted by A1\mathbf{A}^{-1} and is defined by

AA1=A1A=I,\mathbf{A}\mathbf{A}^{-1} = \mathbf{A}^{-1}\mathbf{A} = \mathbf{I},

where I\mathbf{I} is the identity matrix. The inverse of a matrix may not always exist. If it does exist, then the matrix is said to be invertible or nonsingular. If the inverse does not exist, then the matrix is said to be noninvertible or singular.

Existence of Inverse

For an n×nn \times n matrix A\mathbf{A}, the following statements are equivalent:

Properties of Inverse

For any invertible n×nn \times n matrices A\mathbf{A} and B\mathbf{B}, the following properties hold:

3.5 Inverses of Sums and Sensitivity

By previous section, we may see that by the reverse order for inverses of products, we have

(AB)1=B1A1.(\mathbf{A} \mathbf{B})^{-1} = \mathbf{B}^{-1} \mathbf{A}^{-1}.

But the inverse of a sum is not as simple as the inverse of a product. Since the derivation is not trivial, we will skip this part. We usually use the Sherman-Morrison formula to calculate the inverse of a sum.

The Sherman-Morrison formula states that for any invertible n×nn \times n matrix A\mathbf{A} and n×1n \times 1 vectors u\mathbf{u} and v\mathbf{v}, if A+uvT\mathbf{A} + \mathbf{u}\mathbf{v}^T is invertible, then

(A+uvT)1=A1A1uvTA11+vTA1u.(\mathbf{A} + \mathbf{u}\mathbf{v}^T)^{-1} = \mathbf{A}^{-1} - \frac{\mathbf{A}^{-1}\mathbf{u}\mathbf{v}^T\mathbf{A}^{-1}}{1 + \mathbf{v}^T\mathbf{A}^{-1}\mathbf{u}}.

It is important to note that the Sherman-Morrison formula is not a general formula for the inverse of a sum. It is a special formula that applies only when the sum is of a particular form.

Recall that we have talked about ill-conditioned matrices in the previous chapter. We know that when we perturb the constant vector b\mathbf{b} in the linear system Ax=b\mathbf{A}\mathbf{x} = \mathbf{b}, the solution x\mathbf{x} will also be perturbed.

Therefore, we define the following:

Definition: A nonsingular matrix A\mathbf{A} is said to be ill-conditioned if a small perturbation in the matrix A\mathbf{A} results in a large change in the inverse of A\mathbf{A}. The degree of ill-conditioning of a matrix is measured by the condition number of the matrix. We denote the condition number of a matrix A\mathbf{A} by κ(A)\kappa(\mathbf{A}) and it is defined by

κ(A)=AA1.\kappa(\mathbf{A}) = \|\mathbf{A}\| \|\mathbf{A}^{-1}\|.

where \|\cdot\| is the matrix norm.

The matrix norm is a generalization of the vector norm. For a matrix A\mathbf{A}, the matrix norm is defined by

A=maxijaij=maximum absolute row sum.\|\mathbf{A}\| = \max_{i} \sum_j |a_ij| = \text{maximum absolute row sum}.

The condition number of a matrix is a measure of how well-conditioned or ill-conditioned the matrix is. The condition number of a matrix is a nonnegative number. The larger the condition number, the more ill-conditioned the matrix is.

3.6 LU Decomposition

We have now come full circle, and we are back to where the text began—solving a nonsingular system of linear equations using Gaussian elimination with back substitution. This time, however, the goal is to describe and understand the process in the context of matrices.

If Ax=b\mathbf{A}\mathbf{x} = \mathbf{b} is a system of linear equations, then we can write it as Ax=LUx=b\mathbf{A}\mathbf{x} = \mathbf{L}\mathbf{U}\mathbf{x} = \mathbf{b}, where L\mathbf{L} is a lower triangular matrix and U\mathbf{U} is an upper triangular matrix. The process of decomposing a matrix A\mathbf{A} into the product of a lower triangular matrix L\mathbf{L} and an upper triangular matrix U\mathbf{U} is called the LU decomposition.

The LU decomposition is a fundamental concept in numerical linear algebra. It is used to solve systems of linear equations, compute the inverse of a matrix, and calculate the determinant of a matrix. The LU decomposition is also used in the Cholesky decomposition, which is used to solve systems of linear equations with symmetric positive definite matrices.

Theorem: Let A\mathbf{A} be an n×nn \times n matrix. Then A\mathbf{A} has an LU decomposition if and only if all leading principal minors of A\mathbf{A} are nonzero.

Algorithm for LU Decomposition:

  1. Start with the matrix A\mathbf{A}.
  2. Perform Gaussian elimination to obtain an upper triangular matrix U\mathbf{U}.
  3. The lower triangular matrix L\mathbf{L} is obtained by setting the elements below the diagonal of U\mathbf{U} to zero and setting the diagonal elements of L\mathbf{L} to one.
  4. The LU decomposition of A\mathbf{A} is given by A=LU\mathbf{A} = \mathbf{L}\mathbf{U}.
  5. The system of linear equations Ax=b\mathbf{A}\mathbf{x} = \mathbf{b} can be solved by solving the two systems of linear equations Ly=b\mathbf{L}\mathbf{y} = \mathbf{b} and Ux=y\mathbf{U}\mathbf{x} = \mathbf{y}.
  6. The solution to the system of linear equations is given by x=U1L1b\mathbf{x} = \mathbf{U}^{-1}\mathbf{L}^{-1}\mathbf{b}.
  7. The inverse of the matrix A\mathbf{A} is given by A1=U1L1\mathbf{A}^{-1} = \mathbf{U}^{-1}\mathbf{L}^{-1}.

Example: Suppose we are having a 3×33\times 3 matrix

A=[22247761822].\mathbf{A} = \begin{bmatrix} 2 & 2 & 2 \\ 4 & 7 & 7 \\ 6 & 18 & 22 \end{bmatrix}.

We try to find the LU decomposition of A\mathbf{A}. We apply the Gaussian elimination to A\mathbf{A} and obtain

U=[222033004].\mathbf{U} = \begin{bmatrix} 2 & 2 & 2 \\ 0 & 3 & 3 \\ 0 & 0 & 4 \end{bmatrix}.

For the lower triangular matrix L\mathbf{L}, we apply the following:

[100l2110l31l321][222033004]=[22247761822].\begin{bmatrix} 1 & 0 & 0 \\ l_{21} & 1 & 0 \\ l_{31} & l_{32} & 1 \end{bmatrix} \begin{bmatrix} 2 & 2 & 2 \\ 0 & 3 & 3 \\ 0 & 0 & 4 \end{bmatrix} = \begin{bmatrix} 2 & 2 & 2 \\ 4 & 7 & 7 \\ 6 & 18 & 22 \end{bmatrix}.

By solving the above equation, we obtain

L=[100210341].\mathbf{L} = \begin{bmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 3 & 4 & 1 \end{bmatrix}.

Therefore, the LU decomposition of A\mathbf{A} is given by

A=[100210341][222033004].\mathbf{A} = \begin{bmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 3 & 4 & 1 \end{bmatrix} \begin{bmatrix} 2 & 2 & 2 \\ 0 & 3 & 3 \\ 0 & 0 & 4 \end{bmatrix}.

LU Decomposition by using Python

The following code shows how to perform LU decomposition using Python. Here we use the scipy library to perform the LU decomposition.

import scipy as sp

A = sp.array([[2, 2, 2], [4, 7, 7], [6, 18, 22]])
L, U = sp.linalg.lu(A)

print(L)
print(U)