Linear Algebra

Linear algebra is the study of matrices and vectors.

Introduction

Notation

Vectors

A vector is a list of n numbers, usually written as a column vector.

The unit vector $e_i$ is a vector of all 0s, except entry $i$, which has value 1:

$e_i = (0, \dots, 0, 1, 0, \dots, 0)$

This is also called a one-hot vector.

Matrices

A matrix $A \in \mathbb{R}^{m \times n}$, with $m$ rows and $n$ columns, is a 2d array of numbers arranged as follows:

$A = \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{m1} & \cdots & a_{mn} \end{pmatrix}$

If $m = n$, the matrix is said to be square.

Tensors

A tensor (in machine learning terminology) is just a generalization of a 2d array to more than 2 dimensions.

The number of dimensions is known as the order or rank of the tensor. We can reshape a matrix into a vector by stacking its columns on top of each other, as shown in Figure 7.1. This is denoted by

$\mathrm{vec}(A) = [A_{:,1}; \dots; A_{:,n}] \in \mathbb{R}^{mn}$
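As a concrete sketch in NumPy: the column-stacking convention corresponds to Fortran (column-major) order, since NumPy's default reshape is row-major.

```python
import numpy as np

# A 2x3 matrix whose columns we stack into a single vector.
A = np.array([[1, 2, 3],
              [4, 5, 6]])

# Request Fortran (column-major) order to get the
# column-stacking convention vec(A).
v = A.reshape(-1, order="F")
print(v)  # [1 4 2 5 3 6]
```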

Vector spaces

Vector addition and scaling

We can view a vector as defining a point in n-dimensional Euclidean space. A vector space is a collection of such vectors, which can be added together, and scaled by scalars in order to create new points.

Linear independence, spans and basis sets

A set of vectors $\{x_1, \dots, x_n\}$ is said to be linearly independent if no vector can be represented as a linear combination of the remaining vectors. Conversely, a vector which can be represented as a linear combination of the remaining vectors is said to be linearly dependent.

The span of a set of vectors $\{x_1, \dots, x_n\}$ is the set of all vectors that can be expressed as a linear combination of $\{x_1, \dots, x_n\}$. That is,

$\mathrm{span}(\{x_1, \dots, x_n\}) \triangleq \left\{ v : v = \sum_{i=1}^{n} \alpha_i x_i, \ \alpha_i \in \mathbb{R} \right\}$

Linear maps & matrices

A linear map or linear transformation is any function $f : \mathcal{V} \to \mathcal{W}$ such that $f(v + w) = f(v) + f(w)$ and $f(a v) = a f(v)$ for all $v, w \in \mathcal{V}$ and $a \in \mathbb{R}$.

Range & Nullspace of a matrix

The range (sometimes also called the column space) of a matrix $A \in \mathbb{R}^{m \times n}$ is the span of the columns of $A$. In other words,

$\mathrm{range}(A) \triangleq \{ v \in \mathbb{R}^m : v = Ax, \ x \in \mathbb{R}^n \}$

The nullspace of a matrix $A \in \mathbb{R}^{m \times n}$ is the set of all vectors that get mapped to the zero vector when multiplied by $A$:

$\mathrm{nullspace}(A) \triangleq \{ x \in \mathbb{R}^n : Ax = 0 \}$

Linear projection

The projection of a vector $y \in \mathbb{R}^m$ onto the span of $\{x_1, \dots, x_n\}$ (here we assume $x_i \in \mathbb{R}^m$) is the vector $v \in \mathrm{span}(\{x_1, \dots, x_n\})$ such that $v$ is as close as possible to $y$, as measured by the Euclidean norm $\|v - y\|_2$. We denote the projection as $\mathrm{Proj}(y; \{x_1, \dots, x_n\})$ and can define it formally as

$\mathrm{Proj}(y; \{x_1, \dots, x_n\}) = \operatorname*{argmin}_{v \in \mathrm{span}(\{x_1, \dots, x_n\})} \|v - y\|_2$
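A small NumPy sketch of this idea: the basis vectors are stored as the columns of an illustrative matrix X, and the projection is computed via least squares (the minimizer of the Euclidean distance to the span).

```python
import numpy as np

# Columns of X span a 2d subspace of R^3; project y onto that span.
X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
y = np.array([1.0, 2.0, 0.0])

# Least squares finds w minimizing ||X w - y||_2, so X @ w is the projection.
w, *_ = np.linalg.lstsq(X, y, rcond=None)
proj = X @ w

# The residual y - proj is orthogonal to every column of X.
resid = y - proj
assert np.allclose(X.T @ resid, 0.0)
```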

Norms of a vector and matrix

In this section, we discuss ways of measuring the size of a vector and of a matrix.

Vector norms

A norm of a vector $\|x\|$ is, informally, a measure of the length of the vector. More formally, a norm is any function $f : \mathbb{R}^n \to \mathbb{R}$ that satisfies the following 4 properties:

  • For all $x \in \mathbb{R}^n$, $f(x) \ge 0$ (non-negativity)
  • $f(x) = 0$ if and only if $x = 0$ (definiteness)
  • For all $x \in \mathbb{R}^n$, $a \in \mathbb{R}$, $f(ax) = |a| f(x)$ (absolute value homogeneity)
  • For all $x, y \in \mathbb{R}^n$, $f(x + y) \le f(x) + f(y)$ (triangle inequality)

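These axioms are easy to verify numerically for the Euclidean ($\ell_2$) norm; a quick NumPy sanity check:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(5)
y = rng.standard_normal(5)
a = -3.0

l2 = np.linalg.norm  # Euclidean norm

assert l2(x) >= 0                              # non-negativity
assert l2(np.zeros(5)) == 0                    # definiteness
assert np.isclose(l2(a * x), abs(a) * l2(x))   # absolute value homogeneity
assert l2(x + y) <= l2(x) + l2(y)              # triangle inequality
```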
Matrix norms

Suppose we think of a matrix $A \in \mathbb{R}^{m \times n}$ as defining a linear function $f(x) = Ax$. We define the induced norm of $A$ as the maximum amount by which $f$ can lengthen any unit-norm input:

$\|A\|_p = \max_{x \ne 0} \frac{\|Ax\|_p}{\|x\|_p} = \max_{\|x\|_p = 1} \|Ax\|_p$

Properties of a matrix

Trace of a square matrix

The trace of a square matrix $A \in \mathbb{R}^{n \times n}$, denoted by $\mathrm{tr}(A)$, is the sum of the diagonal elements in the matrix:

$\mathrm{tr}(A) \triangleq \sum_{i=1}^{n} A_{ii}$

It has the following properties, where $c \in \mathbb{R}$ is a scalar, and $A, B \in \mathbb{R}^{n \times n}$ are square matrices:

$\mathrm{tr}(A) = \mathrm{tr}(A^\top), \quad \mathrm{tr}(A + B) = \mathrm{tr}(A) + \mathrm{tr}(B), \quad \mathrm{tr}(cA) = c \, \mathrm{tr}(A), \quad \mathrm{tr}(AB) = \mathrm{tr}(BA)$

We also have the following important cyclic permutation property: for $A$, $B$, $C$ such that $ABC$ is square,

$\mathrm{tr}(ABC) = \mathrm{tr}(BCA) = \mathrm{tr}(CAB)$
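The cyclic property is easy to check numerically; note that the three products need not even have the same shape, yet their traces coincide:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 3))
B = rng.standard_normal((3, 4))
C = rng.standard_normal((4, 2))

# ABC is 2x2, BCA is 3x3, CAB is 4x4 -- all three traces agree.
t1 = np.trace(A @ B @ C)
t2 = np.trace(B @ C @ A)
t3 = np.trace(C @ A @ B)
assert np.isclose(t1, t2) and np.isclose(t2, t3)
```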

Determinant of a square matrix

The determinant of a square matrix, denoted det(A) or |A|, is a measure of how much it changes a unit volume when viewed as a linear transformation.

The determinant operator satisfies these properties, where $A, B \in \mathbb{R}^{n \times n}$ and $c \in \mathbb{R}$:

$|A| = |A^\top|, \quad |cA| = c^n |A|, \quad |AB| = |A||B|, \quad |A| = 0 \text{ iff } A \text{ is singular}, \quad |A^{-1}| = 1/|A| \text{ if } A \text{ is non-singular}$

Rank of a matrix

The column rank of a matrix $A$ is the dimension of the space spanned by its columns, and the row rank is the dimension of the space spanned by its rows. For any matrix, the column rank equals the row rank, so this quantity is simply called the rank, denoted by $\mathrm{rank}(A)$. Properties:

  • For $A \in \mathbb{R}^{m \times n}$, $\mathrm{rank}(A) \le \min(m, n)$. If $\mathrm{rank}(A) = \min(m, n)$, then $A$ is said to be full rank, otherwise it is called rank deficient.
  • For $A \in \mathbb{R}^{m \times n}$, $\mathrm{rank}(A) = \mathrm{rank}(A^\top)$.
  • For $A \in \mathbb{R}^{m \times n}$, $B \in \mathbb{R}^{n \times p}$, $\mathrm{rank}(AB) \le \min(\mathrm{rank}(A), \mathrm{rank}(B))$.
  • For $A, B \in \mathbb{R}^{m \times n}$, $\mathrm{rank}(A + B) \le \mathrm{rank}(A) + \mathrm{rank}(B)$.

Condition numbers

The condition number of a matrix A is a measure of how numerically stable any computations involving A will be. It is defined as follows:

$\kappa(A) \triangleq \|A\| \cdot \|A^{-1}\|$

where $\|A\|$ is the norm of the matrix.
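For the $\ell_2$ norm, the condition number can be computed with np.linalg.cond; a nearly singular matrix illustrates the idea:

```python
import numpy as np

A_good = np.eye(2)                  # perfectly conditioned
A_bad = np.array([[1.0, 1.0],
                  [1.0, 1.0001]])   # nearly singular

kappa_good = np.linalg.cond(A_good)  # = 1, the best possible value
kappa_bad = np.linalg.cond(A_bad)    # very large: computations with
                                     # A_bad amplify small input errors
assert np.isclose(kappa_good, 1.0)
assert kappa_bad > 1e4
```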

Special types of matrices

Diagonal matrix

A diagonal matrix is a matrix where all non-diagonal elements are 0. This is typically denoted by $D = \mathrm{diag}(d_1, d_2, \dots, d_n)$, with

$D = \begin{pmatrix} d_1 & & \\ & \ddots & \\ & & d_n \end{pmatrix}$

The identity matrix, denoted by $I \in \mathbb{R}^{n \times n}$, is a square matrix with ones on the diagonal and zeros everywhere else, $I = \mathrm{diag}(1, 1, \dots, 1)$. It has the property that for all $A \in \mathbb{R}^{n \times n}$, $AI = A = IA$.

A block diagonal matrix is one which contains matrices on its main diagonal, and is 0 everywhere else.

A band-diagonal matrix only has non-zero entries along the diagonal, and on $k$ sides of the diagonal, where $k$ is the bandwidth.

Triangular matrices

An upper triangular matrix only has non-zero entries on and above the diagonal. A lower triangular matrix only has non-zero entries on and below the diagonal. Triangular matrices have the useful property that the diagonal entries of A are the eigenvalues of A, and hence the determinant is the product of the diagonal entries: $\det(A) = \prod_i A_{ii}$.

Positive definite matrices
Orthogonal matrices

Two vectors $x, y \in \mathbb{R}^n$ are orthogonal if $x^\top y = 0$. A vector $x$ is normalized if $\|x\|_2 = 1$. A set of vectors that is pairwise orthogonal and normalized is called orthonormal. A square matrix $U \in \mathbb{R}^{n \times n}$ is orthogonal if all its columns are orthonormal; this implies $U^\top U = I = U U^\top$.

Matrix multiplication

The product of two matrices $A \in \mathbb{R}^{m \times n}$ and $B \in \mathbb{R}^{n \times p}$ is the matrix

$C = AB \in \mathbb{R}^{m \times p}$

where

$C_{ij} = \sum_{k=1}^{n} A_{ik} B_{kj}$

Note that in order for the matrix product to exist, the number of columns in $A$ must equal the number of rows in $B$. Matrix multiplication generally takes $O(mnp)$ time.

Properties
  • Associative: $(AB)C = A(BC)$
  • Distributive: $A(B + C) = AB + AC$
  • Not commutative in general: $AB \ne BA$

Vector-vector products

Given two vectors $x, y \in \mathbb{R}^n$, the quantity $x^\top y$, called the inner product, dot product or scalar product of the vectors, is the real number given by

$x^\top y = \sum_{i=1}^{n} x_i y_i$

Note that it is always the case that $x^\top y = y^\top x$.

Given vectors $x \in \mathbb{R}^m$ and $y \in \mathbb{R}^n$, the quantity $x y^\top \in \mathbb{R}^{m \times n}$ is called the outer product of the vectors. It is a matrix whose entries are given by $(x y^\top)_{ij} = x_i y_j$.

Matrix-vector products

Given a matrix $A \in \mathbb{R}^{m \times n}$ and a vector $x \in \mathbb{R}^n$, their product is a vector $y = Ax \in \mathbb{R}^m$. If we write $A$ by rows, with $a_i^\top$ denoting the $i$th row, then we can express $y = Ax$ as follows:

$y = \begin{pmatrix} a_1^\top x \\ \vdots \\ a_m^\top x \end{pmatrix}$

In other words, the $i$th entry of $y$ is equal to the inner product of the $i$th row of $A$ and $x$: $y_i = a_i^\top x$. Alternatively, writing $A$ in column form, we have

$y = x_1 A_{:,1} + x_2 A_{:,2} + \cdots + x_n A_{:,n}$

In other words, is the linear combination of the columns of , where the coefficients of the linear combination are given by the entries of . We can view the columns of A as a set of basis vectors defining a linear subspace.
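Both views give the same result, as a quick NumPy check shows:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
x = np.array([10.0, -1.0])

# Two equivalent views of y = A x:
y_rows = np.array([A[i, :] @ x for i in range(A.shape[0])])  # inner products with rows
y_cols = x[0] * A[:, 0] + x[1] * A[:, 1]                     # combination of columns

assert np.allclose(y_rows, A @ x)
assert np.allclose(y_cols, A @ x)
```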

Matrix-matrix products

Matrix multiplication can be viewed as a set of vector-vector products: the $(i, j)$ entry of $C = AB$ is equal to the inner product of the $i$th row of $A$ and the $j$th column of $B$:

$C_{ij} = a_i^\top b_{:,j}$

Matrix inversion

The inverse of a square matrix $A \in \mathbb{R}^{n \times n}$ is denoted by $A^{-1}$, and is the unique matrix such that

$A^{-1} A = I = A A^{-1}$

$A^{-1}$ exists if and only if $\det(A) \ne 0$. If $\det(A) = 0$, $A$ is called a singular matrix. The following are properties of the inverse; all assume that $A, B \in \mathbb{R}^{n \times n}$ are non-singular:

$(A^{-1})^{-1} = A, \quad (AB)^{-1} = B^{-1} A^{-1}, \quad (A^{-1})^\top = (A^\top)^{-1}$

For the case of a $2 \times 2$ matrix, the expression for $A^{-1}$ is simple enough to give explicitly. We have

$A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}, \quad A^{-1} = \frac{1}{|A|} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix}, \quad |A| = ad - bc$

For a block diagonal matrix, the inverse is obtained by inverting each block separately, e.g.,

$\begin{pmatrix} A & 0 \\ 0 & B \end{pmatrix}^{-1} = \begin{pmatrix} A^{-1} & 0 \\ 0 & B^{-1} \end{pmatrix}$

Eigenvalue decomposition (EVD)

Basics

Given a square matrix $A \in \mathbb{R}^{n \times n}$, we say that $\lambda \in \mathbb{R}$ is an eigenvalue of $A$ and $u \in \mathbb{R}^n$ is the corresponding eigenvector if

$A u = \lambda u, \quad u \ne 0$

Intuitively, this definition means that multiplying $u$ by $A$ results in a new vector that points in the same direction as $u$, but is scaled by a factor $\lambda$.

Note that for any scalar $c \ne 0$, $cu$ is also an eigenvector, so we usually assume the eigenvector is normalized to have length 1. We can rewrite the equation above to state that $(\lambda, u)$ is an eigenvalue-eigenvector pair of $A$ if

$(\lambda I - A) u = 0, \quad u \ne 0$

Properties of eigenvalue and eigenvectors
  • The trace of a matrix $A$ is equal to the sum of its eigenvalues: $\mathrm{tr}(A) = \sum_{i=1}^{n} \lambda_i$.
  • The determinant of $A$ is equal to the product of its eigenvalues: $\det(A) = \prod_{i=1}^{n} \lambda_i$.
  • The rank of $A$ is equal to the number of non-zero eigenvalues of $A$.
  • If $A$ is non-singular then $1/\lambda_i$ is an eigenvalue of $A^{-1}$ with associated eigenvector $u_i$: $A^{-1} u_i = (1/\lambda_i) u_i$.
  • The eigenvalues of a diagonal or triangular matrix are just the diagonal entries.

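The first two properties can be checked numerically; for a general real matrix the eigenvalues may be complex (in conjugate pairs), but their sum and product are real:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))

evals = np.linalg.eigvals(A)  # possibly complex, in conjugate pairs

# trace = sum of eigenvalues; determinant = product of eigenvalues.
assert np.isclose(np.trace(A), evals.sum().real)
assert np.isclose(np.linalg.det(A), evals.prod().real)
```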
Diagonalization

We can write all the eigenvector equations simultaneously as

$A U = U \Lambda$

where the columns of $U \in \mathbb{R}^{n \times n}$ are the eigenvectors of $A$, and $\Lambda = \mathrm{diag}(\lambda_1, \dots, \lambda_n)$ is a diagonal matrix whose entries are the eigenvalues of $A$.

If the eigenvectors of $A$ are linearly independent, then the matrix $U$ will be invertible, so

$A = U \Lambda U^{-1}$

A matrix that can be written in this form is called diagonalizable.

Eigenvalues and eigenvectors of symmetric matrices

When $A$ is real and symmetric, it can be shown that all the eigenvalues are real, and the eigenvectors are orthonormal, i.e., $u_i^\top u_j = 0$ if $i \ne j$ and $u_i^\top u_i = 1$, where the $u_i$ are the eigenvectors. In matrix form this becomes $U^\top U = U U^\top = I$; hence $U$ is an orthogonal matrix.

Thus multiplying by any symmetric matrix $A = U \Lambda U^\top$ can be interpreted as multiplying by a rotation matrix $U^\top$, a scaling matrix $\Lambda$, followed by an inverse rotation $U$. This corresponds to rotating, scaling, and then rotating back.
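In NumPy, np.linalg.eigh is the routine specialized for symmetric (Hermitian) matrices; it returns real eigenvalues and an orthogonal $U$, so we can verify $A = U \Lambda U^\top$ directly:

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((3, 3))
A = (M + M.T) / 2  # construct a real symmetric matrix

# eigh returns real eigenvalues (ascending) and orthonormal eigenvectors.
lam, U = np.linalg.eigh(A)

assert np.allclose(U.T @ U, np.eye(3))         # U is orthogonal
assert np.allclose(U @ np.diag(lam) @ U.T, A)  # A = U Lambda U^T
```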

Geometry of quadratic forms

A quadratic form is a function that can be written as

$f(x) = x^\top A x$

where $x \in \mathbb{R}^n$ and $A$ is a positive definite, symmetric $n$-by-$n$ matrix. Let $A = U \Lambda U^\top$ be a diagonalization of $A$. Hence we can write

$f(x) = x^\top U \Lambda U^\top x = y^\top \Lambda y = \sum_{i=1}^{n} \lambda_i y_i^2$

where $y = U^\top x$ and $\lambda_i > 0$ (since $A$ is positive definite).

Power method

The power method is used to compute the eigenvector corresponding to the largest eigenvalue of a real, symmetric matrix.

Let $A$ be a matrix with orthonormal eigenvectors $u_i$ and eigenvalues $|\lambda_1| > |\lambda_2| \ge \cdots \ge |\lambda_m| \ge 0$, so $A = U \Lambda U^\top$. Let $v_0$ be an arbitrary vector in the range of $A$, so $A x = v_0$ for some $x$. Hence we can write $v_0$ as

$v_0 = U (\Lambda U^\top x) = a_1 u_1 + \cdots + a_m u_m$

for some constants $a_i$. We can now repeatedly multiply $v$ by $A$ and renormalize:

$v_t \propto A v_{t-1}$
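A minimal sketch of the power method in NumPy (the function name, iteration count, and test matrix are illustrative choices, not from the text):

```python
import numpy as np

def power_method(A, num_iters=100, seed=0):
    """Return (dominant eigenvalue, unit eigenvector) of a symmetric A."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(A.shape[0])
    for _ in range(num_iters):
        v = A @ v
        v = v / np.linalg.norm(v)  # renormalize to avoid overflow/underflow
    lam = v @ A @ v  # Rayleigh quotient estimate of the eigenvalue
    return lam, v

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])  # eigenvalues 3 and 1
lam, v = power_method(A)
print(lam)  # ≈ 3.0
```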

Deflation

We can compute subsequent eigenvectors and values as follows. Since the eigenvectors are orthonormal, and the eigenvalues are real, we can project out the $u_1$ component from the matrix as follows:

$A^{(2)} = (I - u_1 u_1^\top) A^{(1)} = A^{(1)} - \lambda_1 u_1 u_1^\top$

This is called matrix deflation.
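A numerical sketch of deflation on a small symmetric matrix (using np.linalg.eigh to obtain the dominant pair, purely for illustration):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])  # eigenvalues 3 and 1

lam, U = np.linalg.eigh(A)    # eigenvalues in ascending order
lam1, u1 = lam[-1], U[:, -1]  # dominant eigenpair (lam1 = 3)

# Project out the u1 component: A2 = A - lam1 * u1 u1^T.
A2 = A - lam1 * np.outer(u1, u1)

assert np.allclose(A2 @ u1, 0.0)                      # u1 is now in the nullspace
assert np.isclose(np.linalg.eigvalsh(A2).max(), 1.0)  # next eigenvalue exposed
```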

Singular value decomposition (SVD)

Any (real) $m \times n$ matrix $A$ can be decomposed as

$A = U S V^\top$

where $U$ is an $m \times m$ matrix whose columns are orthonormal (so $U^\top U = I_m$), $V$ is an $n \times n$ matrix whose rows and columns are orthonormal (so $V^\top V = V V^\top = I_n$), and $S$ is an $m \times n$ matrix containing the $r = \min(m, n)$ singular values $\sigma_i \ge 0$ on the main diagonal, with 0s filling the rest of the matrix. The columns of $U$ are the left singular vectors and the columns of $V$ are the right singular vectors. This is called the singular value decomposition or SVD of the matrix.
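A quick NumPy check of these shapes and properties (note np.linalg.svd returns $V^\top$, not $V$):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))

# full_matrices=True gives U: 4x4, Vt: 3x3, s: min(4, 3) singular values.
U, s, Vt = np.linalg.svd(A, full_matrices=True)

assert np.allclose(U.T @ U, np.eye(4))   # orthonormal columns of U
assert np.allclose(Vt @ Vt.T, np.eye(3)) # orthonormal rows/columns of V

# Rebuild the 4x3 "S" matrix and check A = U S V^T.
S = np.zeros((4, 3))
S[:3, :3] = np.diag(s)
assert np.allclose(U @ S @ Vt, A)
```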

Connection between SVD and EVD
Pseudo inverse

The Moore-Penrose pseudo-inverse of $A$, denoted by $A^\dagger$, is defined as the unique matrix that satisfies the following 4 properties:

$A A^\dagger A = A, \quad A^\dagger A A^\dagger = A^\dagger, \quad (A A^\dagger)^\top = A A^\dagger, \quad (A^\dagger A)^\top = A^\dagger A$
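In NumPy, the pseudo-inverse is computed (via the SVD) by np.linalg.pinv; the four defining conditions can be verified directly:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 2))   # a non-square matrix with no ordinary inverse
Ap = np.linalg.pinv(A)            # Moore-Penrose pseudo-inverse, shape 2x4

# The four defining (Penrose) conditions:
assert np.allclose(A @ Ap @ A, A)
assert np.allclose(Ap @ A @ Ap, Ap)
assert np.allclose((A @ Ap).T, A @ Ap)
assert np.allclose((Ap @ A).T, Ap @ A)
```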

SVD and the range and nullspace of a matrix

If $A = U S V^\top$, then $A x = \sum_{j=1}^{r} \sigma_j (v_j^\top x) u_j$, where $r$ is the rank of $A$. Thus any $A x$ can be written as a linear combination of the left singular vectors $u_1, \dots, u_r$, so the range of $A$ is given by

$\mathrm{range}(A) = \mathrm{span}(\{u_j : j \le r\})$

with dimension $r$. To find a basis for the nullspace, let us now define a second vector $y \in \mathbb{R}^n$ that is a linear combination solely of the right singular vectors for the zero singular values:

$y = \sum_{j=r+1}^{n} c_j v_j$

Since the $v_j$'s are orthonormal, we have

$A y = \sum_{j=1}^{r} \sigma_j (v_j^\top y) u_j = 0$

Hence the remaining right singular vectors form an orthonormal basis for the nullspace:

$\mathrm{nullspace}(A) = \mathrm{span}(\{v_j : j > r\})$

with dimension $n - r$. We see that

$\dim(\mathrm{range}(A)) + \dim(\mathrm{nullspace}(A)) = r + (n - r) = n$

This is often written as

$\mathrm{rank}(A) + \mathrm{nullity}(A) = n$

This is called the rank-nullity theorem.

Truncated SVD

Let $A = U S V^\top$ be the SVD of $A$, and let $\hat{A}_K = U_K S_K V_K^\top$, where we use the first $K$ columns of $U$, $S$ and $V$. If $K = r = \mathrm{rank}(A)$, there is no error introduced by this decomposition. But if $K < r$, we incur some error. This is called a truncated SVD.
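A NumPy sketch of the truncation; as a check, the induced 2-norm error of the best rank-$K$ approximation equals the first dropped singular value (the Eckart-Young theorem, a standard fact not stated in the text above):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 5))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

K = 2  # keep only the top-2 singular triples
A_hat = U[:, :K] @ np.diag(s[:K]) @ Vt[:K, :]

assert np.linalg.matrix_rank(A_hat) == K
# Spectral-norm error of the rank-K truncation = first dropped singular value.
assert np.isclose(np.linalg.norm(A - A_hat, 2), s[K])
```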

Other decompositions

LU factorization

We can factorize any square matrix $A$ into a product of a lower triangular matrix $L$ and an upper triangular matrix $U$:

$A = L U$

In general we may need to permute the entries in the matrix before creating this decomposition. To see this, suppose $A_{11} = 0$. Since $A_{11} = L_{11} U_{11}$, this means either $L_{11}$ or $U_{11}$ or both must be zero, but that would imply $L$ or $U$ are singular. To avoid this, the first step of the algorithm can simply reorder the rows so that the first element is nonzero. This is repeated for subsequent steps. We can denote this process by

$P A = L U$

where $P$ is a permutation matrix, i.e., a square binary matrix where $P_{ij} = 1$ if row $j$ gets permuted to row $i$. This is called partial pivoting.
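A sketch using SciPy (note that scipy.linalg.lu returns $P$ with the convention $A = P L U$, i.e., $P^\top A = L U$):

```python
import numpy as np
from scipy.linalg import lu

A = np.array([[0.0, 2.0],
              [1.0, 3.0]])  # A[0, 0] = 0, so pivoting is required

P, L, U = lu(A)  # SciPy's convention: A = P @ L @ U

assert np.allclose(P @ L @ U, A)
assert np.allclose(np.tril(L), L)  # L is lower triangular (unit diagonal)
assert np.allclose(np.triu(U), U)  # U is upper triangular
```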

QR decomposition

Suppose we have $A \in \mathbb{R}^{m \times n}$ representing a set of linearly independent basis vectors $a_1, \dots, a_n$, and we want to find a series of orthonormal vectors $q_1, q_2, \dots$ that span the successive subspaces of $\mathrm{span}(a_1)$, $\mathrm{span}(a_1, a_2)$, etc. In other words, we want to find vectors $q_j$ and coefficients $r_{ij}$ such that

$a_1 = r_{11} q_1, \quad a_2 = r_{12} q_1 + r_{22} q_2, \quad \dots, \quad a_n = r_{1n} q_1 + \cdots + r_{nn} q_n$

We can write this in matrix form as

$\begin{pmatrix} a_1 & \cdots & a_n \end{pmatrix} = \begin{pmatrix} q_1 & \cdots & q_n \end{pmatrix} \begin{pmatrix} r_{11} & \cdots & r_{1n} \\ & \ddots & \vdots \\ & & r_{nn} \end{pmatrix}$

So we see that $q_1$ spans the space of $a_1$, and $q_1$ and $q_2$ span the space of $\{a_1, a_2\}$, etc. In matrix notation, we have $A = Q R$, where $Q$ is an $m \times n$ matrix with orthonormal columns and $R$ is an $n \times n$ upper triangular matrix. This is called the reduced QR or economy QR factorization of $A$.
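In NumPy, np.linalg.qr with mode="reduced" computes exactly this economy factorization:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))

# mode="reduced" gives the economy factorization: Q is 5x3, R is 3x3.
Q, R = np.linalg.qr(A, mode="reduced")

assert np.allclose(Q.T @ Q, np.eye(3))  # Q has orthonormal columns
assert np.allclose(np.triu(R), R)       # R is upper triangular
assert np.allclose(Q @ R, A)
```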

Cholesky decomposition

Any symmetric positive definite matrix can be factorized as $A = R^\top R$, where $R$ is upper triangular with real, positive diagonal elements. (This can also be written as $A = L L^\top$, where $L$ is lower triangular.) This is called a Cholesky factorization or matrix square root. In NumPy, this is implemented by np.linalg.cholesky. The computational complexity of this operation is $O(V^3)$, where $V$ is the number of variables, but can be less for sparse matrices.
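A short check of the factorization (np.linalg.cholesky returns the lower-triangular factor $L$):

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((3, 3))
A = M @ M.T + 3 * np.eye(3)  # construct a symmetric positive definite matrix

L = np.linalg.cholesky(A)    # lower-triangular factor with A = L L^T

assert np.allclose(np.tril(L), L)  # L is lower triangular
assert np.all(np.diag(L) > 0)      # with positive diagonal entries
assert np.allclose(L @ L.T, A)
```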

Solving systems of linear equations

An important application of linear algebra is the study of systems of linear equations. For example, consider the following set of 3 equations:

$3x_1 + 2x_2 - x_3 = 1$
$2x_1 - 2x_2 + 4x_3 = -2$
$-x_1 + \tfrac{1}{2} x_2 - x_3 = 0$

We can represent this in matrix-vector form as follows:

$A x = b$

where

$A = \begin{pmatrix} 3 & 2 & -1 \\ 2 & -2 & 4 \\ -1 & \tfrac{1}{2} & -1 \end{pmatrix}, \quad b = \begin{pmatrix} 1 \\ -2 \\ 0 \end{pmatrix}$

The solution is $x = (1, -2, -2)$. In general, if we have $m$ equations and $n$ unknowns, then $A$ will be an $m \times n$ matrix, and $b$ will be an $m \times 1$ vector. If $m = n$ and $A$ is full rank, there is a single unique solution. If $m < n$, the system is underdetermined, so there is not a unique solution. If $m > n$, the system is overdetermined.
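A NumPy sketch for a small illustrative square system (np.linalg.solve is preferred over explicitly forming the inverse, which is slower and less numerically stable):

```python
import numpy as np

# An illustrative 3x3 system Ax = b with a unique solution.
A = np.array([[ 3.0,  2.0, -1.0],
              [ 2.0, -2.0,  4.0],
              [-1.0,  0.5, -1.0]])
b = np.array([1.0, -2.0, 0.0])

x = np.linalg.solve(A, b)  # solves Ax = b without computing inv(A)

assert np.allclose(A @ x, b)
assert np.allclose(x, [1.0, -2.0, -2.0])
```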

Matrix calculus

Derivatives

Consider a scalar-argument function $f : \mathbb{R} \to \mathbb{R}$. We define its derivative at a point $x$ to be the quantity

$f'(x) \triangleq \lim_{h \to 0} \frac{f(x + h) - f(x)}{h}$

assuming the limit exists. This measures how quickly the output changes when we move a small distance in input space away from $x$. The use of the symbol $f'$ to denote the derivative is called Lagrange notation. The second derivative, which measures how quickly the gradient is changing, is denoted by $f''$. The $n$th derivative is denoted $f^{(n)}$. We can also use Leibniz notation, in which we denote the function by $y = f(x)$ and its derivative by $\frac{dy}{dx}$ or $\frac{d}{dx} f(x)$. To denote the evaluation of the derivative at a point $a$, we write $\frac{dy}{dx}\big|_{x=a}$.

Gradients

We can extend the notion of derivatives to handle vector-argument functions, $f : \mathbb{R}^n \to \mathbb{R}$, by defining the partial derivative of $f$ with respect to $x_i$ to be

$\frac{\partial f}{\partial x_i} = \lim_{h \to 0} \frac{f(x + h e_i) - f(x)}{h}$

where $e_i$ is the $i$th unit vector. The gradient of a function at a point $x$ is the vector of its partial derivatives:

$g = \nabla f = \begin{pmatrix} \frac{\partial f}{\partial x_1} \\ \vdots \\ \frac{\partial f}{\partial x_n} \end{pmatrix}$
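The partial-derivative definition suggests a simple numerical check via central finite differences (the helper name grad_fd and the test function are illustrative):

```python
import numpy as np

def grad_fd(f, x, h=1e-6):
    """Approximate the gradient of f at x by central finite differences."""
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = 1.0
        g[i] = (f(x + h * e) - f(x - h * e)) / (2 * h)
    return g

# f(x) = x^T x has gradient 2x.
f = lambda x: x @ x
x = np.array([1.0, -2.0, 3.0])
assert np.allclose(grad_fd(f, x), 2 * x, atol=1e-4)
```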

To emphasize the point at which the gradient is evaluated, we can write

$g(x^*) \triangleq \nabla f(x) \big|_{x^*}$

We see that the operator $\nabla$ maps a function $f : \mathbb{R}^n \to \mathbb{R}$ to another function $g : \mathbb{R}^n \to \mathbb{R}^n$. Since $g$ is a vector-valued function, it is known as a vector field. By contrast, the derivative function $f'$ of a scalar-argument function is a scalar field.

Directional derivative

The directional derivative measures how much the function $f : \mathbb{R}^n \to \mathbb{R}$ changes along a direction $v$ in space. It is defined as follows:

$D_v f(x) = \lim_{h \to 0} \frac{f(x + h v) - f(x)}{h}$

Total derivative

Suppose that some of the arguments to the function depend on each other. Concretely, suppose the function has the form $f(t, x(t), y(t))$. We define the total derivative of $f$ with respect to $t$ as follows:

$\frac{df}{dt} = \frac{\partial f}{\partial t} + \frac{\partial f}{\partial x} \frac{dx}{dt} + \frac{\partial f}{\partial y} \frac{dy}{dt}$

Total differential

Jacobian

Consider a function that maps a vector to another vector, $f : \mathbb{R}^n \to \mathbb{R}^m$. The Jacobian matrix of this function is the $m \times n$ matrix of partial derivatives:

$J_f \triangleq \begin{pmatrix} \frac{\partial f_1}{\partial x_1} & \cdots & \frac{\partial f_1}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial f_m}{\partial x_1} & \cdots & \frac{\partial f_m}{\partial x_n} \end{pmatrix}$

Hessian

For a function $f : \mathbb{R}^n \to \mathbb{R}$ that is twice differentiable, we define the Hessian matrix as the (symmetric) $n \times n$ matrix of second partial derivatives:

$H = \nabla^2 f, \quad H_{ij} = \frac{\partial^2 f}{\partial x_i \partial x_j}$

Gradients of commonly used functions

Consider a differentiable function $f : \mathbb{R} \to \mathbb{R}$. Here are some useful identities from scalar calculus.

Chain rule of calculus

Functions that map vectors to scalars
Functions that map matrices to scalars

Consider a function $f : \mathbb{R}^{m \times n} \to \mathbb{R}$ which maps a matrix to a scalar.

Identities involving quadratic forms

Identities involving matrix trace

Identities involving matrix determinants