For my students in Math 304: this note is only complementary to your book; any material in the book is fair game for the exam. This note is not a replacement for the book.
Linear Systems & Matrices
Matrices and Vectors
A matrix is a rectangular array of numbers
arranged in
rows and
columns.
$$A = \begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{bmatrix}$$
If the matrix has $m$ rows and $n$ columns, then the size of the matrix is $m \times
n$.
The set of all matrices with size $m \times n$ will be denoted by $\R^{m\times n}$.
A column vector or vector is a matrix
with only one column.
$$b = \begin{bmatrix}
b_{1} \\
b_{2} \\
\vdots \\
b_{m}
\end{bmatrix}$$
The set of all vectors with size $m$ will be denoted by $\R^m$.
Addition and Scalar Multiplication
Addition of two matrices of the same size is defined by adding the corresponding entries: the $i,j$-th entry of the sum is given by
$$(A+B)_{ij} = A_{ij} + B_{ij}$$
Since vectors are matrices with only one column, the same definition applies to vectors.
Scalar Multiplication of a matrix by a scalar is defined by multiplying each entry by
the
scalar.
$$(cA)_{ij} = cA_{ij}$$
The same definition applies to vectors.
Matrix Multiplication
Matrix Multiplication of a matrix $A$ with size $m
\times n$ and a
matrix $B$ with size $n \times
p$
gives a matrix $C$ with size $m \times p$.
$$
\underbrace{A}_{{\color{green}m}\times \color{red}{n}} \underbrace{B}_{{\color{red}n} \times {\color{green}p}} =
\underbrace{C}_{ {\color{green}m} \times {\color{green}p}}
$$
And the $i,j$-th entry of $C$ is given by
$$C_{ij} = \sum_{k=1}^n A_{ik}B_{kj} = A_{i,:} \cdot B_{:,j}$$
Where $A_{i,:}$ is the $i$-th row of $A$ and $B_{:,j}$ is the $j$-th column of $B$.
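As a quick sanity check, here is the entry formula in NumPy (a minimal sketch; the matrices below are arbitrary examples, not from the text):

```python
import numpy as np

# Arbitrary example matrices: A is (m, n), B is (n, p).
A = np.array([[1., 2., 3.],
              [4., 5., 6.]])        # 2 x 3
B = np.array([[ 7.,  8.],
              [ 9., 10.],
              [11., 12.]])          # 3 x 2

C = A @ B                           # the 2 x 2 product

# Entry (i, j) is the dot product of row i of A with column j of B.
i, j = 0, 1
assert np.isclose(C[i, j], A[i, :] @ B[:, j])
print(C)
```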
A linear equation is an equation of the form
$$a_1x_1 + a_2x_2 + \cdots + a_nx_n = b$$
where $a_1, a_2, \cdots, a_n$ and $b$ are real numbers.
$x_1, x_2, \cdots, x_n$ are the variables of the equation, and $a_1, a_2, \cdots,
a_n$ are
the coefficients, and $b$ is the constant.
In terms of dot product, a linear equation can be written as
$$\vc{a} \cdot \vc{x} = b$$
where $\vc{a} = (a_1, a_2, \cdots, a_n)$ and $\vc{x} = (x_1, x_2, \cdots, x_n)$.
A solution to a linear equation in $n$ variables is a vector $\vc{x} = (x_1, x_2, \cdots, x_n)$ such that the equation is satisfied, that is, $\vc{a}\cdot \vc{x} =b$. The solution set of a linear equation in $n$ variables (with $\vc{a}\ne\vc{0}$) is an $(n-1)$-dimensional hyperplane.
The solution set to a linear equation in $2$ variables, $ax+by=c$, is a line in $\R^2$.
The solution set to a linear equation in $3$ variables, $ax+by+cz = d$, is a plane in $\R^3$.
Linear Systems
Given a linear system of $m$ equations, the solution (set) to the system is
geometrically
the intersection of the $m$ solution sets to the $m$ equations.
Algebraically, the solution set is usually obtained through a process of row operations applied to the augmented matrix of the system.
Row Operations are the following:
Interchange two rows.
Multiply a row by a nonzero scalar.
Add a multiple of one row to another row.
Two matrices are row equivalent if one can be obtained from the other by a sequence
of
row operations.
When solving a system of linear equations, there are three possible outcomes:
There is a unique solution.
There are infinitely many solutions.
The system is inconsistent.
In the notation of matrix multiplication, a system of linear equations is essentially a matrix equation.
$$\underbrace{\begin{array}{rcl}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n & = & b_1 \\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n & = & b_2 \\
& \vdots & \\
a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n & = & b_m
\end{array}}_{\text{$m$ equations in $n$ variables}}\Longrightarrow Ax=b$$
where $A$ is the $m\times n$ matrix of coefficients, $x \in \R^n$ is the vector of variables and $b\in \R^m$ is
the
vector of constants.
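In practice such a system is solved numerically; a minimal NumPy sketch (the system below is an arbitrary example):

```python
import numpy as np

# Arbitrary 3 x 3 system Ax = b, used only for illustration.
A = np.array([[1.,  1., -2.],
              [0.,  1.,  3.],
              [0., -1.,  1.]])
b = np.array([1., 2., 3.])

x = np.linalg.solve(A, b)    # solves Ax = b for a square, invertible A
assert np.allclose(A @ x, b)
print(x)
```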
In matrix notation, it is clear that a homogeneous system of linear equations, i.e. a
system of the form $Ax=\vc{0}$, always has a solution, the zero vector $\vc{0}$, since $A \vc{0} = \vc{0}$.
If $v$ is a solution to the homogeneous system $Ax=0$, and $w$ is a solution to the non-homogeneous system $Ax=b$,
then $v+w$ is another solution to $Ax=b$ since
$$A(v+w) = Av + Aw = 0+b=b$$
Given a matrix equation, it would be nice if solving it were as easy as
$$Ax=b \implies x = A^{-1}b$$
Matrix Inversion
A square matrix $A$ is invertible if there exists a matrix $M$ such that $MA=I$ (equivalently, $AM=I$). $M$ is called the inverse matrix of $A$, denoted by $A^{-1}$.
To find the inverse of a matrix, we can use the Gauss-Jordan Elimination method.
$$
[M|I] \quad \xrightarrow[\text{ Elimination}]{ \text{Gauss-Jordan}}\quad [I|M^{-1}]
$$
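Here is a bare-bones sketch of this procedure in NumPy, assuming the input matrix is invertible (it uses partial pivoting but has no other safeguards):

```python
import numpy as np

def gauss_jordan_inverse(A):
    """Invert A by row-reducing the augmented matrix [A | I]."""
    A = A.astype(float)
    n = A.shape[0]
    aug = np.hstack([A, np.eye(n)])                  # [A | I]
    for col in range(n):
        # Partial pivoting: bring up the row with the largest entry in this column.
        pivot = col + np.argmax(np.abs(aug[col:, col]))
        aug[[col, pivot]] = aug[[pivot, col]]
        aug[col] /= aug[col, col]                    # scale the pivot row so the pivot is 1
        for row in range(n):
            if row != col:
                aug[row] -= aug[row, col] * aug[col] # eliminate this column in every other row
    return aug[:, n:]                                # the right half is now A^{-1}

A = np.array([[2., 1.], [5., 3.]])                   # arbitrary invertible example
assert np.allclose(gauss_jordan_inverse(A) @ A, np.eye(2))
```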
An elementary matrix is a matrix that can be obtained from the identity matrix by a
single elementary row operation.
Let $A$ be a square matrix with size $n\times n$, and partition $A$ into four blocks
$$A = \begin{bmatrix}
A_{11} & A_{12} \\
A_{21} & A_{22}
\end{bmatrix}$$
where $A_{11}$ is a $k\times k$ block and $A_{22}$ is an $(n-k)\times(n-k)$ block.
The Schur complements of $A_{11}$ and $A_{22}$ in $A$ are the matrices
$$
A/A_{11}=A_{22}-A_{21}A_{11}^{-1}A_{12}
$$
$$
A/A_{22}=A_{11}-A_{12}A_{22}^{-1}A_{21}
$$
$$
A^{-1} = \begin{bmatrix}
A_{11} & A_{12} \\
A_{21} & A_{22}
\end{bmatrix}^{-1} = \begin{bmatrix}
(A/A_{22})^{-1} & -(A/A_{22})^{-1}A_{12}A_{22}^{-1} \\
-(A/A_{11})^{-1}A_{21}A_{11}^{-1} & (A/A_{11})^{-1}
\end{bmatrix}
$$
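The block formula can be checked numerically; a small sketch (the matrix is random and diagonally loaded so that all the relevant blocks and Schur complements are invertible):

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 5, 2
A = rng.standard_normal((n, n)) + 10 * np.eye(n)   # diagonally loaded, so all blocks invert

A11, A12 = A[:k, :k], A[:k, k:]
A21, A22 = A[k:, :k], A[k:, k:]

S11 = A22 - A21 @ np.linalg.inv(A11) @ A12          # A / A11
S22 = A11 - A12 @ np.linalg.inv(A22) @ A21          # A / A22

block_inv = np.block([
    [np.linalg.inv(S22), -np.linalg.inv(S22) @ A12 @ np.linalg.inv(A22)],
    [-np.linalg.inv(S11) @ A21 @ np.linalg.inv(A11), np.linalg.inv(S11)],
])
assert np.allclose(block_inv, np.linalg.inv(A))
```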
If $\Sigma=\begin{bmatrix}
\Sigma_{11} & \Sigma_{12} \\
\Sigma_{21} & \Sigma_{22}
\end{bmatrix}$ is the covariance matrix of a random vector $\vc{x}=\begin{bmatrix}
\vc{x}_1 \\
\vc{x}_2
\end{bmatrix}$ (say, jointly Gaussian), then the conditional covariance of $\vc{x}_1$ given $\vc{x}_2$ is
$$
\Sigma_{1|2} = \Sigma_{11}-\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21} = \Sigma / \Sigma_{22}
$$
Vector Spaces
Vector Spaces
Euclidean Spaces
Vector addition and scalar multiplication, as defined on $\R^n$, turn $\R^n$ into a vector space, in the sense that $(\R^n,+,\cdot)$ satisfies the following properties:
For all $\mathbf{u},\mathbf{v},\mathbf{w}\in \R^n$ and $k,l\in \R$:
Existence of identities
$$\mathbf{v}+\mathbf{0}=\mathbf{v} $$ $$1\mathbf{v}=\mathbf{v} $$
Existence of additive inverse$$\exists\, {-\mathbf{v}}\in \R^n:\quad
\mathbf{v}+(-\mathbf{v})=\mathbf{0}$$
Distributivity$$k(\mathbf{u}+\mathbf{v})=(k \mathbf{u})+(
k
\mathbf{v})$$ $$(k+l) \mathbf{v}=(k \mathbf{v})+( l \mathbf{v})$$
$\R^n$ is the blueprint for all vector spaces. The above properties will be taken as
the definition for general vector
spaces.
Vector Spaces
Now we abstract the idea of a Euclidean space to the more general concept of a vector space by taking the properties of Euclidean spaces as the definition of vector spaces.
A set $V$ together with an addition $$ \oplus: V\times V \to V $$ and a scalar
multiplication $$ \otimes:\R \times V
\to V $$ defined on $ V $ is a
vector space if, for all $\mathbf{u}, \mathbf{v},$ and $\mathbf{w}$ in $ V $ and all $k$ and $l$ in $\mathbb{R}$, the properties listed above for Euclidean space hold (with $+$ replaced by $\oplus$ and scalar multiplication by $\otimes$).
$(\mathbb{R},\oplus,\otimes)$ is a vector space, with
$$u\oplus v=u+v\qquad u,v\in \mathbb{R} $$$$ \lambda\otimes u=\lambda u\qquad \lambda\in \mathbb{R},u\in \mathbb{R} $$
Denote the set of all polynomials of degree at most $ n $ on $ \R $ by
\[ \ms{P}_{n}:=\{a_nx^{n}+a_{n-1}x^{n-1}+\cdots+a_1x+a_0:a_i\in \R \} \]
The set of all polynomials is denoted by $ \ms{P} $.
Both $ \ms{P}_{n} $ and $ \ms{P} $ are vector spaces, with operations
$$ f\oplus g:x\mapsto f(x)+g(x)\qquad f,g\in \ms{P} $$ $$ \lambda\otimes f:x\mapsto \lambda f(x) \qquad
\lambda\in \R,f\in \ms{P}
$$
Denote $\mathbb{R}^{S}:= \{f\mid f:S\to \R \}$, then
$(\mathbb{R}^{S},\oplus,\otimes)$ is a vector space, where $$ f\oplus g:x\mapsto f(x)+g(x)\qquad f,g\in
\mathbb{R}^{S} $$
$$ \lambda\otimes f:x\mapsto \lambda f(x)\qquad \lambda\in \mathbb{R},f\in \mathbb{R}^{S} $$
$(\R^{n\times m}, \oplus, \otimes)$
is a vector space, with
$$
\begin{aligned}
&(A \oplus B)_{i j}=A_{i j}+B_{i j} \quad A, B \in \R^{n\times m} \\
&(\lambda \otimes A)_{i j}=\lambda A_{i j} \quad \lambda \in \mathbb{R}, A \in \R^{n\times m}
\end{aligned}
$$
Direct Sums
Given vector spaces, it is possible to construct a new larger vector space.
Let $ \ms{ V }$ and $ \ms{ W }$ be vector spaces. The direct sum of $ \ms{ V
}$
and $
\ms{ W }$ is the vector space $$ \ms{ V } \bigoplus \ms{ W } = \{ (v,w) \mid v \in \ms{ V }, w \in \ms{ W
}
\}
$$
with
addition and scalar multiplication defined by $$ (v,w) \oplus (v',w') = (v \oplus_\ms{V} v', w \oplus_\ms{W}
w') $$ $$
\lambda \otimes (v,w) = (\lambda \otimes_\ms{V} v, \lambda \otimes_\ms{W} w) $$
Direct sums of vector spaces are vector spaces.
The Euclidean spaces are essentially direct sums of copies of $\R$
$$\R^n =\bigoplus_{i=1}^{n}\R $$
Spans & Subspaces
Subspaces
Let $ \ms{ V }$ be a vector space. Let $ \ms{ S} \subset \ms{ V} $ be a nonempty subset. If
$\mathscr{S}$ is
also a vector space
over $\R$ with the same operations, then $\mathscr{S}$ is said to be a subspace of
$\mathscr{V}
.$
Instead of checking all the properties of a vector space, it is sufficient to check that the subset is closed under the operations: a nonempty subset $\mathcal{S}$ of a vector space $\mathcal{V}$ is a subspace of $\mathcal{V}$ iff
$$u,v\in \mathcal{S},\ \lambda \in \R \implies u+v\in \mathcal{S} \text{ and } \lambda u \in \mathcal{S}$$
Any line passing through the origin is a subspace of $\R^2$.
Any line or plane passing through the origin is a subspace of $\R^3$. In fact, these are almost all the subspaces of $\R^3$, with just $2$ exceptions: the trivial subspace $\{\vc{0}\}$ and $\R^3$ itself.
$$ \ms{P}_{n}\subspace \ms{P} \subspace \R^{\R}$$
Denote the subset of continuous functions
$$
C(\R)=\{f\in \R^{\R} : f\text{ is continuous}\}$$
$$ C(\R) \subspace \R^{\R} $$
Denote the subset of differentiable functions
$$C^1(\R)=\{f\in \R^{\R} : f\text{ is differentiable}\}$$
$$C^1(\R)\subspace C(\R) \subspace \R^{\R} $$
The transpose $ A^{T} $ of an $n \times m$ matrix $A$ is the $m \times n$ matrix \[
(A^{T})_{ij}=A_{ji} \]
A square matrix, $H$, is symmetric if $H=H^T$. The subset of all symmetric matrices is a subspace of $\R^{n\times n}$.
Linear Combination and Span
A linear combination of vectors $\ms{B}= \{v_1,\cdots,v_m\}$ in $ \ms{V} $ is an object of the form \[ x=a_1 v_1+\cdots+a_m v_m,\qquad a_i\in \R \]
The set of all
linear
combinations of $ \ms{B} $ is called the span of $ \ms{B} $, denoted by $
\op{Span}(\ms{B}) $. If $ \ms{V}=\op{Span}(\ms{B}) $, we say $ \ms{B} $ spans $
\ms{ V}
$ or $\ms{B}$ is a spanning set for $\ms{V}$.
In $ \R^{2} $, let $$ e_1=\begin{pmatrix}
1\\0
\end{pmatrix},e_2=\begin{pmatrix}
0\\1
\end{pmatrix} $$ Every vector in $ \R^{2} $ is a linear combination of $ \{e_1,e_2\} $.
The span of any subset of vectors is a subspace.
Let $V$ be a vector space, and $S\subset V$, then
\[ \op{Span}\left(S\right) \underset{subspace}{\subset} V\]
\[ ex^3+3x^2+x+\pi \] is in the span of \[ S=\{x^5,x^4,x^3,x^2,x,1\} \]
Checking whether a vector is in the span, or whether a subset spans a space, boils down to solving a non-homogeneous linear system
$$ \begin{pmatrix}
a\\b\\c
\end{pmatrix} \stackrel{?}{\in} \op{Span}(\{\begin{pmatrix}
1\\0\\0
\end{pmatrix},\begin{pmatrix}
1\\1\\-1
\end{pmatrix},\begin{pmatrix}
-2\\3\\1
\end{pmatrix} \})\qquad \Longrightarrow\qquad \text{ ? $ \exists x,y,z\in \R $} : \begin{pmatrix}
a\\b\\c
\end{pmatrix}=x\begin{pmatrix}
1\\0\\0
\end{pmatrix}+y\begin{pmatrix}
1\\1\\-1
\end{pmatrix}+z\begin{pmatrix}
-2\\3\\1
\end{pmatrix}\qquad \Longrightarrow\qquad \begin{array}{rr}
x+y-2 z= & a \\
y+3 z= & b \\
-y+z= & c
\end{array}$$
When checking for spanning, it may often be easier to try simplifying the system by using
column operations first, since the span of the columns of a matrix doesn't change under column operations.
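As a sketch, the membership check above can also be carried out numerically by solving the corresponding linear system with NumPy (the target vector chosen below is an arbitrary example):

```python
import numpy as np

# Columns are the three spanning vectors from the example above.
V = np.array([[1.,  1., -2.],
              [0.,  1.,  3.],
              [0., -1.,  1.]])
target = np.array([2., 0., 1.])      # an arbitrary (a, b, c) to test

coeffs, *_ = np.linalg.lstsq(V, target, rcond=None)
print(np.allclose(V @ coeffs, target), coeffs)   # True here: these columns span R^3
```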
Linear Independence & Basis
Linear Independence
Let $ \ms{V} $ be a vector space.
$
\ms{B} \subset \ms{V} $ is linearly independent if no vector in $ \ms{B} $
is a
linear
combination of the
other
vectors in
$
\ms{B} $.
$ \ms{B}=\{v_1,\cdots,v_m \}\subset\ms{V} $ is linearly independent iff\[ a_1v_1+\cdots+a_mv_m=0\implies a_i=0\quad \forall i \]
Checking whether a subset is linearly independent boils down to solving a homogeneous linear system
\[
\{\begin{pmatrix}
1\\0\\0
\end{pmatrix},\begin{pmatrix}
1\\1\\-1
\end{pmatrix},\begin{pmatrix}
-2\\3\\1
\end{pmatrix} \}\quad\text{Linearly Independent ? } \qquad \Longrightarrow\qquad \text{? $ x=y=z=0 $ the unique
solution to } \begin{pmatrix}
0\\0\\0
\end{pmatrix}= x\begin{pmatrix}
1\\0\\0
\end{pmatrix}+y\begin{pmatrix}
1\\1\\-1
\end{pmatrix}+z\begin{pmatrix}
-2\\3\\1
\end{pmatrix} \text{?} \qquad \Longrightarrow\qquad \begin{array}{rr}
x+y-2 z= & 0 \\
y+3 z= & 0 \\
-y+z= & 0
\end{array}\]
If there is only one way to get the zero vector from a linear combination of the
vectors, then each vector contains some data that's not in the others.
Otherwise, suppose there exist $x,y,z$, not all zero, with (say) $z \ne 0$, such that
$$x\vc{v}+y\vc{u}+z\vc{w} = 0 \implies \vc{w} =\dfrac{1}{z} (-x\vc{v}-y\vc{u}) $$
$\vc{w}$ is redundant in the sense that it can be expressed as a linear combination of the other vectors.
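Equivalently, a finite set of column vectors is linearly independent exactly when the matrix built from them has full column rank; a small NumPy check of the example above:

```python
import numpy as np

# The three vectors from the example, placed as columns of a matrix.
V = np.array([[1.,  1., -2.],
              [0.,  1.,  3.],
              [0., -1.,  1.]])

# Linearly independent  <=>  the rank equals the number of columns.
print(np.linalg.matrix_rank(V) == V.shape[1])   # True: only x = y = z = 0 gives the zero vector
```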
Basis
Let $ \ms{V} $ be a vector space.
$
\ms{B} \subset \ms{V} $ is a basis of $V$, if $ \mathscr{ B} $ is linearly
independent
and spans $V$.
The number of elements in a basis
is
called the dimension of the space $ V
$.
Let $ \ms{ B}_1, \ms{ B}_2 $ be two different bases of a vector space $ \ms{V} $. We must have \[
|\ms{B}_1|=|
\ms{B}_2| \]
Every vector space has a basis.
Any linearly independent subset can be extended to a basis.
Any spanning set contains a basis.
For the space $\R^{n} $, consider \[ \ms{B}=\{e_1,e_2,\cdots,e_n \} \]
where $ e_k $ is the column vector with the $k$-th entry $ 1 $ and all other entries $ 0 $.
$ \ms{B} $ is called the standard basis of $ \R^{n} $.
\[ \{x^5,x^4,x^3,x^2,x,1 \} \] is the standard basis for $ \ms{P}_5 $. What is the dimension of $
\ms{P}_5 $?
$$\{
\begin{bmatrix}
1&0\\
0&0
\end{bmatrix}, \begin{bmatrix}
0&1\\
0&0
\end{bmatrix}
,
\begin{bmatrix}
0&0\\
1&0
\end{bmatrix}
,
\begin{bmatrix}
0&0\\
0&1
\end{bmatrix}\}
$$ is the standard basis for $\R^{2\times 2}$.
Consider
$$Z = \{ p(x)\in \mathcal{P}_3 : p(7) = 0 \}\subspace \mathcal{P}_3$$
The condition $p(7)=0 $ implies, by the Factor Theorem, that $$(x-7)$$ is a factor of any vector $p(x)$ in $Z$, i.e.
$$ p(x)=(x-7)q(x),\quad q(x)\in \mathcal{P}_2 \qquad \forall p(x)\in Z $$
A basis for $Z$ will be
$$\{x-7, x^2-7x,x^3-7x^2 \}$$
Let $ E $ be a subset of a vector space $\ms{V} $.
$$\op{Span}(E)\subspace \ms{ V} $$
If $ E $ is linearly independent, then $ E $ is a basis for $\op{Span}(E) $.
To check whether a set of vectors is a basis for $\R^n$ is essentially to check whether it is linearly independent and spans $\R^n$.
Internal Direct Sums
Given a vector space of dimension $n\ge 2$, it is possible to decompose it into small pieces.
Let $W$ be a vector space and let $U$ and $V$ be subspaces of $W$.
$W$ is said to be the (internal) direct sum of $U$ and $V$ if
$$\forall w\in W, \exists !\ ( u\in U,v\in V ): w=u+v$$
and we write $$W=U\bigoplus V$$
Let $\ms{E}\subset \R^{\R} $ be the subset of all even functions, and let $\ms{O}\subset \R^{\R} $ be the
set
of
all odd
functions. Then \[\R^{\R}=\ms{E}\bigoplus \ms{O} \]
Since for each $f\in \R^\R$
$$f(x) = \underbrace{\dfrac{f(x)+f(-x)}{2}}_{\ms{E}} + \underbrace{\dfrac{f(x)-f(-x)}{2}}_{\ms{O}}$$
Let $$ Q_i = \{ cx^i : c\in \R \}$$
then $$
\ms{P}_n = \bigoplus_{i=0}^{n} Q_i
$$
Let $\mc{A}= \{ v_1, \cdots , v_n \}$ be a basis for vector space $V$. Then
$$
V = \span\{v_1,\cdots,v_m \} \bigoplus \span\{v_{m+1},\cdots,v_n \}
$$
for any $m\in \{1,\cdots,n-1\}$.
Linear Maps
Definitions and Examples
Linear transformations
For any mathematical construction, it is important to study structure-preserving maps. For Euclidean
spaces, the structure-preserving map is the linear map.
Let $V$ and $U$ be vector spaces over $\R$. A map $$F: V \rightarrow U$$ is called a linear
map or
linear transformation if
For any vectors $v, w \in V$, $F(v+w)=F(v)+F(w)$
For any scalar $k \in \R$ and vector $v \in V$, $F(k v)=k F(v)$.
A linear map from a vector space $V $ to $ \R $ is called a linear functional.
Matrix-vector multiplication is linear: given a matrix $M$ of size
$n\times m$,
the map \[ x\mapsto Mx\qquad x\in \R^{m} \]is a linear map $ \R^{m}\to
\R^{n} $.
Projection is linear
\[ F: \mathbb{R}^{3} \rightarrow \mathbb{R}^{3} \]
\[ F(x, y, z)=(x, y, 0) \]
Translation is not Linear \[ G:\R^{2}\to \R^{2} \]\[ G(x,
y)=(x+1, y+2) \]
Differentiation is linear \[ \op{D}:\mathcal{P}\to \mathcal{ P} \]
\[ \op{D}f(x)=f'(x) \]
Definite Integrals are linear functionals. $$\mathbf{J}: \ms{P} \rightarrow
\mathbb{R}$$
\[
\mathbf{J}(f(t))=\int_{0}^{1} f(t) d t
\]
The trace of a square matrix $ M $ is defined as\[ \op{tr}(M)=\sum_{i=1}^{n}M_{ii} \]Considered as a map between vector spaces,\[ \op{tr}: \R^{n\times n} \to \R \] it is a linear functional.
The technical jargon for a structure-preserving map is homomorphism.
Isomorphisms
Let $F: V \rightarrow W$ be a linear map
$F$ is one-to-one if and only if
$$F(v)=F(u) \implies v=u$$
$F$ is onto if and only if
$$\forall w \in W, \exists v\in V : F(v)=w $$
$F$ is an isomorphism if it is both one-to-one and onto.
Consider the differential operator $$D: \mathcal{P}_n \to \mathcal{P}_{n-1}$$
\[ Df(x)=f'(x) \]
For any $g \in \mathcal{P}_{n-1}$,
let $G \in \mathcal{P}_n$ be an anti-derivative of $g$, then
$D(G)=g$.
This shows that the map is onto.
The map is not one-to-one,
since $$D(x+3) = D(x+4) = 1$$
but $x+3 \ne x+4$.
The embedding
$E: \R^{n} \rightarrow \R^{n+1}$
$$E\begin{pmatrix}
v_1 &
v_2 &
\cdots &
v_n
\end{pmatrix}= \begin{pmatrix}
v_1 &
v_2 &
\cdots &
v_n &
0
\end{pmatrix}$$
$E$ is not onto since nothing gets mapped to $\begin{pmatrix}
1 & 1 & \cdots & 1 & 1\end{pmatrix}$.
Consider the transpose operator $$T: \R^{n\times m} \to \R^{m\times n}$$
\[ T(M)=M^T \]
The map is onto, since for any $W \in \R^{m\times n}$, we have $W^T\in \R^{n\times m} $,
and $$T(W^T)=W$$
The map is also one-to-one: if $T(M) = T(W)$, then by linearity
$$(M-W)^T=T(M-W) =T(M)-T(W) = \vc{0} $$
But the only matrix whose transpose is the zero matrix is the zero matrix itself, thus $M=W$.
It follows that $T$ is an isomorphism.
Let $\mathcal{B}=\left\{\beta_1, \cdots, \beta_n\right\}$ be a basis of vector space $V$. Let $v \in V$, then
$v$
can be expressed uniquely as
$$v=c_1
\beta_1+\cdots+c_n \beta_n$$
Define a linear map $I_\mathcal{B}: V \rightarrow \R^{n}$ by
$$I_\mathcal{B}(v)=\left(\begin{array}{c}
c_1 \\
\vdots \\
c_n
\end{array}\right)$$
Then $I_\mathcal{B}$ is a linear map from $V$ to $\R^{n}$ that is both one-to-one and onto. Thus $I_\mathcal{B}$
is an
isomorphism
from
$V$ to $\R^{n}$.
$$
v_{\mathcal{B}}:=\left(\begin{array}{c}
c_1 \\
\vdots \\
c_n
\end{array}\right)
$$
is the column representation or coordinate vector of $v$ with respect
to the basis $\mathcal{B}$.
Matrix Representation
Matrix Representation
Map between Euclidean Spaces
For any linear map $L:\R^n\to\R^m$, there exists a matrix $M_L$ such that
\[ L(\mathbf{u})=M_L\mathbf{u} \]
Take the simple case of a linear map $L:\R^2\to\R^2$,
suppose $L$ is represented by the matrix $M = \begin{bmatrix} m_{11} & m_{12} \\ m_{21} & m_{22}
\end{bmatrix}$.
How do we find the matrix? I.e.
what are the entries $m_{11}$,
$m_{12}$,
$m_{21}$, $m_{22}$?
Consider applying $L$ to the basis vectors $\mathbf{e}_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$ and
$\mathbf{e}_2 = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$. Then,
\[ L(\mathbf{e}_1) = \begin{bmatrix} m_{11} & m_{12} \\ m_{21} & m_{22} \end{bmatrix} \begin{bmatrix} 1 \\ 0
\end{bmatrix} = \begin{bmatrix} m_{11} \\ m_{21} \end{bmatrix}
\qquad \text{and} \qquad L(\mathbf{e}_2) = \begin{bmatrix} m_{11} & m_{12} \\ m_{21} & m_{22} \end{bmatrix}
\begin{bmatrix} 0 \\ 1
\end{bmatrix} =
\begin{bmatrix} m_{12} \\ m_{22} \end{bmatrix}
\]
If for example $L$ is the linear map that swaps the coordinates,
$$ L(\begin{bmatrix}x_1\\x_2\end{bmatrix}) = \begin{bmatrix}x_2\\x_1\end{bmatrix} $$
then
\[ L(\mathbf{e}_1) = \begin{bmatrix} m_{11} \\ m_{21}
\end{bmatrix} = {\color{green}\begin{bmatrix} 0 \\ 1 \end{bmatrix}}
\qquad \text{and} \qquad L(\mathbf{e}_2) = \begin{bmatrix} m_{12} \\ m_{22} \end{bmatrix}=
{\color{red}\begin{bmatrix} 1 \\ 0 \end{bmatrix}}
\]
So the matrix is
\[ M = \begin{bmatrix} {\color{green}0} & {\color{red}1} \\ {\color{green}1} & {\color{red}0} \end{bmatrix} \]
This process can be generalized to any linear map $L:\R^n\to\R^m$, see the proof below.
Given any linear map $$L:\R^n\to\R^m$$ there exists a unique matrix $M$ such that $L(x)=Mx$ for all
$x\in\R^n$.
And $M$ has columns given by the image of the standard basis $L(e_i)$
$$
M=\begin{bmatrix} L(\mathbf{e}_1) & \cdots & L(\mathbf{e}_n) \end{bmatrix}
$$
Take any vector $v\in \R^n$,
$$
v=\begin{pmatrix}
v_1\\
\vdots\\
v_n
\end{pmatrix}= \sum_{i=1}^n v_i \mathbf{e}_i
$$
If we apply $L$ to $v$, by linearity we have
$$
L(v)=\sum_{i=1}^n v_i L(\mathbf{e}_i) = \begin{bmatrix}
L(\mathbf{e}_1) & \cdots & L(\mathbf{e}_n) \end{bmatrix} \begin{bmatrix}
v_1\\
\vdots\\
v_n
\end{bmatrix} = Mv
$$
where $M$ is the matrix whose $i$-th column is $L(\mathbf{e}_i)$.
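This construction is easy to carry out numerically: given any (black-box) linear function, assemble its matrix column by column from the images of the standard basis. A minimal sketch, using the coordinate-swap map from the example above:

```python
import numpy as np

def matrix_of(L, n):
    """Build the matrix of a linear map L on R^n, column by column."""
    return np.column_stack([L(e) for e in np.eye(n)])   # columns are L(e_1), ..., L(e_n)

swap = lambda x: np.array([x[1], x[0]])   # the coordinate-swapping map from the example
M = matrix_of(swap, 2)
print(M)                                  # [[0. 1.], [1. 0.]]
```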
Map between General Spaces
Suppose $V$ and $W$ are vector spaces with bases
$$
A=\left\{v_1, v_2, \ldots, v_n\right\} \quad \text { and } \quad B=\left\{w_1, w_2,
\ldots,
w_m\right\}
$$
Denote the column representations of vectors in $V, W$ with respect to $A, B$ by
$$
v_A \quad w_{B}
$$
If $$F: V \rightarrow W$$ is a linear mapping,
the matrix representation of $F$ with respect to the basis $A$ and
$B$ is given by
$$
F_{A, B}:=\left[\begin{array}{llll}
F\left(v_1\right)_{B} & F\left(v_2\right)_{B} & \cdots &
F\left(v_n\right)_{B}
\end{array}\right] \in \R^{m \times n}
$$
For any vector $v\in V$, we have
$$ F(v)_{B}=F_{A,B} v_{A} $$
$$F(v) = I_B^{-1}( F_{A,B} v_{A})$$
As a special case, if $V=W$ and $A,B$ are two bases of $V$, then $id_{A,B}$, the matrix of the identity map with respect to $A$ and $B$, is the change of basis matrix from $A$ to $B$.
Let $S$ be the canonical basis of $V$, $A,B$ be two other bases of $V$,
then the change of basis matrix can also be obtained (very inefficiently) from
$$
id_{A,B} = [(B_1)_S \cdots (B_n)_S]^{-1} [(A_1)_S \cdots (A_n)_S]
$$
Composition and Matrix Multiplication
Suppose we have two linear maps $$f: \R^n \to \R^m,\qquad g: \R^m \to \R^p$$ Represented by matrices
$$M_f,\qquad
M_g$$
Then the composition $$g \circ f : \R^n \to \R^p$$ is represented by the matrix $M_g
M_f$, since
$$ (g \circ f)(x) = g(f(x)) = M_g f(x) = M_g M_f x $$
to summarize,
$$ g \circ f : x\mapsto M_g M_f x $$
The same is true for general linear maps
Let $\mathcal{B}_U, \mathcal{B}_V, \mathcal{B}_W$ be bases of vector spaces $U, V, W$, respectively.
Let $F:$
$U
\rightarrow V$ and $G: V \rightarrow W$ be linear maps. Then
$$
(G \circ F)_{\mathcal{B}_U, \mathcal{B}_W}=G_{\mathcal{B}_V, \mathcal{B}_W}F_{\mathcal{B}_U,
\mathcal{B}_V}
$$
Image & Kernel
Let $F: V \rightarrow W$ be a linear map between vector spaces $V, W$
The image or range of $F$ is the set
$$
\im F:=\{F(v): v \in V\}
$$
The rank of $F$ is the dimension of $\im F$.
The kernel or null space of $F$ is the set
$$
\ker F=\{v \in V: F(v)=\mathbf{0}\}
$$
The nullity of $F$ is the dimension of $\ker F$.
If $ c_1,\cdots,c_m $ are the columns, and $ r_1,\cdots,r_n $ are the rows of some $ n\times m$ matrix $M$, then the span of the columns $ c_1,\cdots,c_m $ is called the column space of the matrix,
$$ \op{col}(M) = \op{Span}(\{c_1,\cdots,c_m \})$$
the column rank of $M$ is the dimension of $\op{col}(M)$.
The span of the rows $ r_1,\cdots,r_n $ is called the row space of the matrix,
$$ \op{row}(M)= \op{col}(M^T) = \op{Span}(\{r^T_1,\cdots,r^T_n \})$$
the row rank of $M$ is the dimension of $\op{row}(M)$.
The row rank and column rank of a matrix are equal.
The result follows from the following observations:
If $\{v_1,\cdots, v_n\}$ is linearly independent, then $\dim \span \{v_1,\cdots, v_n\}=n$
If $E$ is an invertible matrix, then $\dim \span \{E v_1,\cdots, E v_n\}=n$
Row operations are just multiplication on the left by an elementary matrix.
And elementary matrices are invertible.
It follows that
$$
\dim \operatorname{col} EM = \dim \operatorname{col} M
$$
Thus row operations don't change the column rank.
Certainly, row operations don't change the row rank either.
Convert the matrix to row echelon form via row operations. The number of pivots gives both the row
rank and the column rank, thus they are equal.
The null space of $M$ is the set of all vectors $ \mathbf{v} $ such that $
M\mathbf{v} =
0 $. It is denoted by $\op{null}(M)$.
The nullity of $M$ is the dimension of $\op{null}(M)$.
All of these are subspaces of the corresponding parent vector spaces.
Let $F: V \rightarrow W$ be a linear map between vector spaces $V, W$
$$\im F \subspace W$$
$$\ker F \subspace V$$
If $M$ is an $n\times m$ matrix, then
$$\op{col}(M) \subspace \R^n $$
$$ \op{null}(M) \subspace \R^m $$
$$\op{row}(M) = \op{col}(M^T) \subspace \R^m$$
If $$L: \R^{n} \rightarrow \R^{n}$$ is a linear map, and $M_L$ is the matrix representation of $L$, then
$$\im{L} = \op{col}(M_L)$$
$$\ker{L} = \op{null}(M_L)$$
A vector space isomorphism is essentially a linear map with trivial kernel and full image.
$F$ is one-to-one iff $\ker F=\{\mathbf{0}\}$.
$F$ is onto iff $\im F=W$.
For one-to-one :
Suppose $$F(v)=F(u) \implies v=u$$
if $\ker{F}\ne \{\vc{0}\}$, i.e. there exists $g \in \ker{F}$ with $g\ne \vc{0}$ and $F(g)=\vc{0}$,
then for any $u$
$$F(u+g)= F(u)+F(g)=F(u)$$
but $u+g\ne u$ since $g\ne \vc{0}$, a contradiction.
On the other hand, if $\ker{F}=\{0\}$ then
$$F(v-u)=F(v)-F(u)=0 \implies v-u\in \ker{F} \implies v-u=\vc{0}\implies v=u$$
The proof for onto is easy, I leave it as an exercise.
Let $V$ be a vector space and $F$ a linear map on $V$ to some vector space $U$.
$$
\operatorname{dim} V=\operatorname{dim}(\ker F)+\operatorname{dim}(\operatorname{Im} F)
$$
Let $F: V \rightarrow W$ be a linear map between vector spaces of the same finite dimension.
Then $F$ is one-to-one if and only if it is onto; in particular, if either holds, $F$ is an isomorphism.
Statistics of Linear Maps
Determinant
Definition and Properties
Let $S_n$ be the set of all order $n$ permutations.
The
determinant of a square
matrix is defined by \[
\det{A}=\sum_{\sigma \in S_{n}}(\operatorname{sgn} \sigma) a_{1 \sigma(1)} a_{2 \sigma(2)} \cdots a_{n
\sigma(n)}
\]
If $M$ is triangular, then $$\det{M} = \prod_{i=1}^n M_{i i}$$
Laplace Expansion
A more efficient way is to use the Laplace
Expansion.
Consider $A=\left[a_{i j}\right] \in \R^{n\times n}$. Let $M_{i j}$ denote the submatrix
of $A$ obtained by
deleting its $i$ th row and $j$ th column. The determinant
$$
\det M_{i j}
$$
is called the minor of the element $a_{i j}$, and the cofactor
of $a_{i j}$,
denoted by $C_{i j}$, is
defined
by:
$$
C_{i j}=(-1)^{i+j}\det M_{i j}
$$
Then the determinant of $A$ is given by
$$
\det A=\sum_{j=1}^{n} a_{i j} C_{i j}
$$
The above formula is independent of $i$; here the Laplace expansion is performed along the $i$-th row.
One can also perform the Laplace expansion along the $j$-th column, and the result is the same.
Usually the convention is to perform the expansion along the row or column with the most zeros.
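For illustration only, here is a direct recursive implementation of the Laplace expansion along the first row (it runs in factorial time, so it is only usable for tiny matrices):

```python
import numpy as np

def det_laplace(A):
    """Determinant via cofactor expansion along row 0 (exponential time; illustration only)."""
    n = A.shape[0]
    if n == 1:
        return A[0, 0]
    total = 0.0
    for j in range(n):
        minor = np.delete(np.delete(A, 0, axis=0), j, axis=1)   # remove row 0 and column j
        total += (-1) ** j * A[0, j] * det_laplace(minor)        # cofactor C_{0j} = (-1)^j det(minor)
    return total

A = np.array([[1., 2., 3.], [4., 5., 6.], [7., 8., 10.]])        # arbitrary example
assert np.isclose(det_laplace(A), np.linalg.det(A))
```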
Geometries of Determinant
A shear matrix is an elementary matrix of the form
$$
E_{C_i+kC_j\rightarrow C_i}
$$
Let $M$ be a matrix with columns $C_1,\ldots,C_n$.
The parallelepiped spanned by the columns of $M$ is
$$
P(M) = \{\lambda_1 C_1+\lambda_2 C_2+\cdots+\lambda_n C_n: \lambda_i\in [0,1]\}
$$
Let $E$ be a shear matrix.
Then
$$ \operatorname{vol} P(ME) = \operatorname{vol} P(M)$$
If $E = E_{C_i+kC_j\rightarrow C_i}$, then
$$
(ME)_{:,i} = C_i + k C_j
$$
$$
\begin{aligned}
P(ME) &= \{\lambda_1 C_1+\lambda_2 C_2+\cdots+ \lambda_i(C_i + k C_j) +\cdots+\lambda_n C_n: \lambda_i\in [0,1]\}\\
&= \{\lambda_1 C_1+\lambda_2 C_2+\cdots+ \lambda_i C_i +\cdots+ (\lambda_ik+\lambda_j) C_j +\cdots+\lambda_n C_n: \lambda_i\in [0,1]\}\\
\end{aligned}
$$
This is just a translation of all the points in $P(M)$ by $\lambda_ik C_j$ (the translation is not uniform, it depends on $\lambda_i$, hence the name shearing), and such a shear does not change the volume.
The determinant of a square matrix $M\in \R^{n\times n}$ satisfies
$$|\det M |= \operatorname{vol} P(M)$$
i.e. its absolute value is the volume of the parallelepiped spanned by the columns of $M$ (the sign of $\det M$ records orientation, so $\det M$ is the signed volume).
The result is easy to see when $M$ is diagonal.
The result is also "obvious" when the columns of $M$ are not linearly independent: in this case the determinant is zero, and the volume of the parallelepiped is zero, since the span of the columns of $M$ is contained in a "thin" hyperplane.
Now assume the columns of $M$ are linearly independent.
We can use a series of shears to transform $M$ into triangular form.
If $\det M = 0$, then the columns of $M$ are linearly dependent (in $\R^3$, coplanar).
Adjugate & Inverse
The adjugate (classical adjoint), $\adj{M}$, of a matrix $M$ is
$$
\adj{M} = C^T
$$
Where $C$ is the matrix of cofactors of $M$.
If $\det M \ne 0$, then
$$M^{-1} = \frac{1}{\det M}\adj{M}$$
$$ \det{M^{-1}}=(\det{M})^{-1}$$
If $\det M = 0$, then $M$ is singular (not invertible).
Eigenvalues
Definition and Computations
Let $ F:V\to V $ be a linear map on some vector space $ V $.
A scalar $\lambda$ is called an
eigenvalue of
$F$ if there exists a nonzero vector $v$ such that
\[
F( v)=\lambda v
\]
Any vector satisfying this relation is called an eigenvector of $F$ associated
with the
eigenvalue $\lambda
.$
The set of all eigenvalues of $F$ is called the spectrum of $F$, denoted by
$$
\lambda(F)
$$
The subspace of all eigenvectors of $F$ associated with $\lambda$
$$ E_F(\lambda) = \{v\in V: F(v)=\lambda v\}$$
is called the eigenspace associated with $\lambda$.
Abusing the notation a bit, we will also use $F$ as the matrix representation of $F$.
\[
F( v)=\lambda v \implies (F-\lambda I) v=0 \implies \det(F-\lambda I)=0
\]
The polynomial
$$\chi_F(\lambda)=\det(F-\lambda I)$$
is called the characteristic polynomial of $F$.
The eigenvalues of $F$ are given by the roots of $\chi_F(\lambda)$.
If $$A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}$$ then its characteristic polynomial is
$$\chi_A(\lambda) =
\det(A - \lambda I) = (\lambda - 1)(\lambda - 4) - 6 = \lambda^2 - 5\lambda - 2$$ The roots of this polynomial
are $$\lambda_1 = \frac{5 + \sqrt{33}}{2},\qquad \lambda_2 = \frac{5 - \sqrt{33}}{2}$$ which are the
eigenvalues
of $A$. Then any nonzero vector in $$\ker(A - \lambda_i I),\qquad i\in\{1,2\}$$ is an eigenvector for
$\lambda_i$.
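The same computation in NumPy (a sketch; `np.linalg.eig` returns the eigenvalues together with a matrix whose columns are corresponding eigenvectors):

```python
import numpy as np

A = np.array([[1., 2.], [3., 4.]])
eigvals, eigvecs = np.linalg.eig(A)

print(eigvals)                                            # approx (5 +/- sqrt(33)) / 2
print(np.sort(eigvals), np.sort(np.roots([1, -5, -2])))   # roots of t^2 - 5t - 2

# Each column of eigvecs satisfies A v = lambda v.
for lam, v in zip(eigvals, eigvecs.T):
    assert np.allclose(A @ v, lam * v)
```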
Geometry of eigenvalues and eigenvectors
Let $\chi_F(\lambda)$ be the characteristic polynomial of a linear map $F$.
The algebraic multiplicity of $\lambda$ is its multiplicity as a root of $\chi_F$.
The geometric multiplicity of $\lambda$ is the dimension of $E_F(\lambda)$.
An eigenvalue is defective if its algebraic multiplicity is greater than its geometric
multiplicity.
The matrix
$$M=\begin{bmatrix}
3 & 1 \\
0 & 3
\end{bmatrix}$$
has an eigenvalue $\lambda = 3$ with algebraic multiplicity $2$ and geometric multiplicity $1$.
And the matrix is invertible since $0$ is not an eigenvalue.
If $F:V\to V$ has no defective eigenvalues, then
$$V = \bigoplus_{\lambda\in \lambda(F)} E_F(\lambda)$$
Diagonalization
Two matrices $A$ and $B$ are said to be similar if there exists an invertible matrix $P$ such that
$$P^{-1}AP=B$$
A matrix $A$ is said to be diagonalizable if $A$ is similar to a diagonal matrix.
A square matrix $A$ is diagonalizable if and only if $A$ has no defective eigenvalues.
Equivalently, if and only if $A$ has $n$ linearly independent eigenvectors.
If $M\in \R^{n\times n}$ has $n$ linearly independent eigenvectors
$$
E = \{v_1,\cdots, v_n \}
$$
with corresponding eigenvalues
$$
\lambda_1,\cdots, \lambda_n
$$
then $E$ is a basis for $\R^n$.
Let $S$ be the standard basis of $\R^n$.
Let $P$ be the matrix whose columns are the eigenvectors of $M$.
Note that $P$ is just a change of basis matrix $$P= id_{E,S}$$
Let $D$ be the diagonal matrix with the eigenvalues of $M$ on the diagonal.
Note that $D$ is just the matrix representation of $M$ with respect to the basis $E$.
$$D = M_{E,E}$$
Then $$
D = M_{E,E} = P^{-1}MP \implies M = PDP^{-1}
$$
This is immediate from composing the change-of-basis maps: $M_{E,E} = id_{S,E}\, M_{S,S}\, id_{E,S} = P^{-1} M P$.
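A quick numerical sketch of $M = PDP^{-1}$, reusing the $2\times 2$ example from the previous section (any diagonalizable matrix would do):

```python
import numpy as np

M = np.array([[1., 2.], [3., 4.]])        # diagonalizable: two distinct eigenvalues
eigvals, P = np.linalg.eig(M)             # columns of P are eigenvectors
D = np.diag(eigvals)

assert np.allclose(M, P @ D @ np.linalg.inv(P))       # M = P D P^{-1}
assert np.allclose(np.linalg.inv(P) @ M @ P, D)       # D = P^{-1} M P
```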
Singular Values
Eigenvalues are only defined for square matrices. For rectangular
matrices, we can use the singular values.
Let $A$ be a matrix. The singular values of $A$ are the square roots of the eigenvalues of
$A^TA$.
Singular Value Decomposition
The Singular Value Decomposition of a matrix $A$ of size $m\times n$ is given by
$$
A = U\Sigma V^T
$$
where
$U$ is an $m\times m$ orthogonal matrix
$V$ is an $n\times n$ orthogonal matrix
$\Sigma$ is an $m\times n$ diagonal matrix with the singular values of $A$ on the diagonal.
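A minimal NumPy sketch (the matrix is an arbitrary $2\times 3$ example; `np.linalg.svd` returns $U$, the singular values, and $V^T$):

```python
import numpy as np

A = np.array([[1., 0., 2.],
              [0., 3., 0.]])                       # arbitrary 2 x 3 example

U, s, Vt = np.linalg.svd(A)                        # s holds the singular values
Sigma = np.zeros(A.shape)
Sigma[:len(s), :len(s)] = np.diag(s)               # embed the singular values in an m x n matrix

assert np.allclose(A, U @ Sigma @ Vt)
# Singular values are the square roots of the eigenvalues of A^T A.
assert np.allclose(sorted(s**2), sorted(np.linalg.eigvalsh(A.T @ A))[-len(s):])
```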
Suppose $X$ is a rectangular matrix of size $m\times n$.
We want to find a vector $w$ such that the projection of the $m$ data points $x_1,\cdots,x_m$ onto $w$ retains the most variance / information. This is equivalent to finding the vector $w$ such that
$$w= \arg\max_{w\in \S^{n-1} } \underbrace{\sum_{i=1}^{m} (x_i\cdot w)^2}_{\text{retained
variance}}=\arg\max_{w\in \S^{n-1} } w^T X^TX w$$
Let $X$ be a matrix of size $m\times n$, with covariance matrix $C = X^TX$.
The principal component decomposition (PCA) of $X$ is given by
$$
T = XW
$$
where $W$ is the matrix whose columns are the eigenvectors of $C$.
Inner Product Spaces
Inner Product Spaces
A real vector space $\mathscr{V}$ is said to be an inner-product space if
for each pair of elements $x$ and $y$ in $\mathscr{V}$, there is a number $\langle x, y\rangle$, called the
inner product of $x$ and $y$, such that
$$\langle x+y, z\rangle=\langle x, z\rangle+\langle y, z\rangle$$
$$\langle\alpha x, y\rangle=\alpha\langle x, y\rangle$$
$$\langle x, y\rangle=\langle y, x\rangle$$
for all $x, y, z \in \mathscr{V}$ and $\alpha \in \mathbb{R}$. Moreover,
$$\langle x, x\rangle \geq 0$$
$$\langle x, x\rangle=0 \iff x=0$$
The dot product\[\Inn{u}{v}:=u\cdot v= u^{T}v=u_1v_1+u_2v_2+\cdots+u_nv_n
\] turns $\R^n$
into
an
inner-product space.
$\ms{P}$ with the following inner product is an inner product space
\[
\langle f, g\rangle=\int_{0}^{1} f(t) g(t) d t
\]
$ \R^{m\times n} $ is an inner product space with the following inner product
\[ \Inn{A}{B}=\operatorname{tr}(B^{T}A) \]
Geometries of Inner Product
Length and Angle
We can extend the definition of norm and angle in $\R^n$ in the natural
way
The norm of an element $v$ of an inner-product space is defined
to be
$$
\|v\|=\sqrt{\langle v, v\rangle}
$$
The angle between two vectors $ {v} $, $ {w} $ in $ \ms{V} $ is defined as
\[\measuredangle( v, w)
= \arccos\frac{\Inn{v}{w}}{\Nor{ v}\,\Nor{ w}} \]Two vectors $ v,w\in \ms{V} $ are
perpendicular
iff \[ \Inn{v}{w}=0 \]$ v,w \in \ms{V}$ are parallel if \[ v=\lambda w\qquad
\text{ for some
}\lambda\in \R
\]
Cauchy-Schwarz$$
|\langle x, y\rangle| \leq\|x\|\|y\| \quad \text { for all } x, y \in \mathscr{V}
$$
Triangle Inequality$$
\|x+y\| \leq\|x\|+\|y\| \quad \text { for all } x, y \in \mathscr{V}
$$
Parallelogram Law$$
\|x+y\|^{2}+\|x-y\|^{2}=2\|x\|^{2}+2\|y\|^{2} \quad \text { for all } x, y \in \mathscr{V}
$$
Law of Cosine \[
\Nor{x-y}^{2}=\Nor{x}^{2}+\Nor{y}^{2}-2\Nor{x}\Nor{y}\cos(\measuredangle(x,y)) \]
Projections and Complements
The orthogonal complement of a subset $\mathscr{S}$ of $V$ is
the set $\mathscr{S}^{\perp}$, such that
$$x \in \mathscr{S}^{\perp} \iff \langle x, y\rangle=0,\quad \forall y \in
\mathscr{S}$$
Let $S$ be a subset of a vector space $V$. Then $S^{\perp}$ is a subspace of $V$.
Let $A\in \R^{n\times m}$,
$$
\ker A = (\im A^T)^{\perp}
$$
Let $S$ be a subspace of an inner product space $V$, and consider the maps
$$P_S:=x\mapsto \hat{x}, \quad I-P_S$$
where $\hat{x}$ is the point in $S$ that minimizes $\|x-\hat{x} \|$; then
$P_S: V\to S$ is surjective.
$I-P_S: V\to S^\perp$ is surjective.
$P_S$ is called the projection onto $S$.
If $u$ is a unit vector in $V$, then
$$P_{\span{u}}x = \underbrace{\inn{u}{x}}_{comp_u(x)} u = uu^T x$$
If $u$ is not a unit vector, then
$$P_{\span{u}}x = u(u^Tu)^{-1}u^T x = \dfrac{\inn{u}{x}}{\inn{u}{u}}u$$
for proof and explanation, please see my
calculus notes.
If $\mathbf{x}_{i} \in \mathbb{R}^{n}, i=1, \ldots, m$, and
$M=\operatorname{Span}\left\{\mathbf{x}_{1}, \ldots, \mathbf{x}_{m}\right\}$, then, provided $X^{\prime} X$ is invertible,
$$P_M \mathbf{x}=X\left(X^{\prime} X\right)^{-1} X^{\prime} \mathbf{x}$$ where $X$ is the matrix whose $i$-th column is $\mathbf{x}_{i}$.
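A small numerical sketch of this projection formula (the spanning vectors and the point being projected are arbitrary choices):

```python
import numpy as np

# Columns of X span the subspace M (two arbitrary vectors in R^3).
X = np.array([[1., 0.],
              [1., 1.],
              [0., 2.]])
x = np.array([3., 1., 4.])                        # arbitrary point to project

P = X @ np.linalg.inv(X.T @ X) @ X.T              # projection matrix X (X'X)^{-1} X'
x_hat = P @ x

# The residual x - x_hat is orthogonal to every spanning vector.
assert np.allclose(X.T @ (x - x_hat), 0)
assert np.allclose(P @ P, P)                      # projections are idempotent
```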
Orthonormal Decomposition
Orthonormal Sets
Consider a set $S=\left\{u_{1}, u_{2}, \ldots, u_{r}\right\}$ of nonzero vectors in an inner product space $V$.
$S$
is orthogonal if each pair of vectors in $S$ are orthogonal,$$\left\langle u_{i},
u_{j}\right\rangle=0 \qquad i \neq j$$ and $S$ is called orthonormal if $S$ is
orthogonal and
each
vector
in $S$ has unit length. That is,
\[\Inn{u_i}{u_j}= \begin{cases}
0\qquad&i\ne j\\
1&i=j
\end{cases} \]
Consider the following subset of $ C[-\pi,\pi] $\[S= \{1, \cos t, \cos 2 t, \cos 3 t, \ldots, \sin t, \sin
2
t, \sin 3 t, \ldots\} \]
$ S $ is orthogonal, but not orthonormal.
\[ E=\left\{e_{1}, e_{2}, e_{3}\right\}=\{\begin{pmatrix}
1\\0\\0
\end{pmatrix},\begin{pmatrix}
0\\1\\0
\end{pmatrix},\begin{pmatrix}
0\\0\\1
\end{pmatrix}\} \] is an orthonormal basis for $\R^3$.
The following subset of $L^2[-\pi,\pi]$ is orthogonal (and orthonormal with respect to the normalized inner product $\frac{1}{2\pi}\int_{-\pi}^{\pi} f\bar{g}$),$$\{e_n:=\exp(inx):n\in \mb{Z} \}$$
Suppose $U = \left\{u_{1}, u_{2}, \ldots, u_{r}\right\}$ is an orthogonal set of nonzero vectors. Then
$U$ is
linearly independent.
If in addition $U$ is orthonormal, then, with $M=\op{Span}(U)\subset \ms{V}$, $$
P_{M} x=\sum_{i=1}^{r}\left\langle x, u_{i}\right\rangle u_{i} \quad \text { for all } x \in
\mathscr{V} $$
Gram-Schmidt Orthogonalization
Given a basis $\{v_1,\cdots,v_n\}$ of $V$:
Let $$u_1=v_1$$
For $i=2,\cdots,n$, let $$u_i=v_i-\sum_{j=1}^{i-1}\frac{\Inn{v_i}{u_j}}{\Inn{u_j}{u_j}}u_j$$
$$\{u_1,\cdots,u_n\}$$ is an orthogonal basis of $V$.
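A minimal sketch of the procedure for column vectors in $\R^n$ with the dot product as inner product (it assumes the input vectors are linearly independent):

```python
import numpy as np

def gram_schmidt(vectors):
    """Orthogonalize a list of linearly independent vectors (dot-product inner product)."""
    ortho = []
    for v in vectors:
        u = v.astype(float)
        for q in ortho:
            u -= (v @ q) / (q @ q) * q      # subtract the projection of v onto q
        ortho.append(u)
    return ortho

basis = [np.array([1., 1., 0.]), np.array([1., 0., 1.]), np.array([0., 1., 1.])]
U = gram_schmidt(basis)
# All pairs are orthogonal.
assert all(np.isclose(U[i] @ U[j], 0) for i in range(3) for j in range(i + 1, 3))
```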
Spectral Theorem
A matrix $U\in \R^{n\times n}$ is called orthogonal if $$U^TU=I$$
A matrix $M$ is orthogonally diagonalizable if
$$\exists \text{ orthogonal } U\ : \qquad U^TMU=D$$
where $D$ is a diagonal matrix.
If $A\in \R^{n\times n}$ is orthogonal, then
$$\{A_{:,1},\cdots,A_{:,n}\}\qquad \text{is an orthonormal basis of }\R^n$$
$$A^{-1}=A^T$$
$$\|Ax\|=\|x\|\qquad \forall x\in \R^n$$
$$\Inn{Ax}{Ay}=\Inn{x}{y}\qquad \forall x,y\in \R^n$$
$A\in \R^{n\times n}$ is orthogonally diagonalizable if and only if $A$ is symmetric.
QR Decomposition
Let $A\in \R^{m\times n}$ have linearly independent columns. Then there exists a decomposition
$$A=QR$$
where $Q\in \R^{m\times m}$ is orthogonal and $R\in \R^{m\times n}$ is upper triangular.
Apply Gram-Schmidt to the columns of $A$ to get $Q$.
Then
$$A=QR$$
where $$R=Q^TA$$ is upper triangular.
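In practice one uses a library routine; a NumPy sketch (`np.linalg.qr` returns the reduced factorization, in which $Q$ has orthonormal columns):

```python
import numpy as np

A = np.array([[1., 1.],
              [1., 0.],
              [0., 1.]])                 # columns are linearly independent

Q, R = np.linalg.qr(A)                   # reduced QR: Q is 3 x 2 with orthonormal columns
assert np.allclose(A, Q @ R)
assert np.allclose(Q.T @ Q, np.eye(2))   # orthonormal columns
assert np.allclose(R, np.triu(R))        # R is upper triangular
```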
Functional Analysis
A linear algebra class typically studies finite dimensional vector spaces.
The study of infinite dimensional vector spaces is called functional analysis.
We have already encountered a few examples of infinite dimensional vector spaces.
$\ms{P}$, the space of all polynomials.
$\R^\R$, the space of all real valued functions on $\R$.
$C[-\pi,\pi]$, the space of all continuous functions on $[-\pi,\pi]$.
$L^2[-\pi,\pi]$, the space of all square integrable functions on $[-\pi,\pi]$.
Banach Spaces
Normed Spaces
A norm on a vector space $V$ is a function $\norm{\cdot}$ such that
$\norm{x}\ge 0$ for all $x\in V$.
$\norm{x}=0$ if and only if $x=0$.
$\norm{\alpha x}=\left|\alpha\right|\norm{x}$ for all $\alpha\in \R$ and $x\in V$.
$\norm{x+y}\le \norm{x}+\norm{y}$ for all $x,y\in V$.
$(V, \norm{\cdot})$ is called a normed space.
If $V$ is an inner product space, then $$\norm{x}:=\sqrt{\langle x,x\rangle}$$ is a norm.
Thus every inner product space is a normed space.
Not every normed space is an inner product space.
Consider $C[0,1]$ with the norm $$\norm{x}:=\sup_{t\in [0,1]}\left|x(t)\right|$$
and $$f(x)=x,\ g(x)=x^2 \in C[0,1]$$
The parallelogram law does not hold for these functions
$$
\begin{aligned}
\|f+g\|^2+\|f-g\|^2 &= 4 + 1/16
\\ &\ne 2+2 =2\|f\|^2+2\|g\|^2
\end{aligned}$$
Banach Spaces
A sequence $\left\{x_{n}\right\}\subset \ms{V}$ is a Cauchy sequence if
$$\forall \varepsilon > 0, \exists N\in \N^+ : \left\|x_{n}-x_{m}\right\|<\varepsilon,\qquad \forall m, n>N$$
A Banach space is a normed space such that every Cauchy sequence has a limit.
A Banach Space with norm induced by an inner product is called a Hilbert Space.
The set of all real valued random variables with finite second moment is a Hilbert space.
For details see notes.
Continuous Linear Operators
A linear operator $T:V\to W$ is continuous iff
$$\sup_{\norm{v}=1}\norm{T(v)}< \infty$$
$$\norm{T} = \sup_{\norm{v}=1}\norm{Tv}$$
defines a norm on $$
\ms{L}(V,W) = \left\{T: V\to W \mid T \text{ is linear and continuous} \right\}
$$
Linear Differential Operators
Green's Functions
Let $L$ be a linear differential operator on a space of functions $f$.
A Green's function for $L$ is a function $G(x,t)$ such that
$$LG(x,t)=\delta(x-t)$$
where $\delta(x-t)$ is the Dirac delta function.
If $G(x,t)$ is a Green's function for $L$,
then
$$u(x)=\int_{a}^{b}G(x,t)f(t)dt$$
is the solution to the inhomogeneous equation
$$Lu(x)=f(x)$$
$$\int_{a}^{b}LG(x,t)f(t)dt=\int_{a}^{b}\delta(x-t)f(t)dt=f(x)$$
$$L\underbrace{\int_{a}^{b}G(x,t)f(t)dt}_{u(x)}=f(x)$$
Appendix
Symmetric Groups
Permutations
A permutation is a bijection
\[ \sigma: [n] \to [n] \]
The set of all $n!$ permutations of $[n]=\{1,2,\cdots,n\}$ is denoted by $S_n$.
A cycle is a permutation that cyclically permutes some subset of $[n]$ (its support) and fixes the remaining elements; that is,
\[ \forall a, b \text{ in the support}, \exists i\in \N : \sigma^i(a)=b \]
We will be using the cycle notation $(x_1,\cdots,x_n)$ to denote the cycle $$\sigma(x_i)=x_{i+1}\quad i\ne
n,\qquad \sigma(x_n)=x_1$$
A transposition is a cycle of length 2.
The identity permutation
$$\sigma = [1,2,3,4,5,6,7]$$
or equivalently
$$\sigma(i)=i$$
The cyclic permutation
$$\sigma = [2,3,4,5,1]$$
or equivalently in cycle notation
$$\sigma=(1,2,3,4,5)$$
A permutation with 2 cycles
$$\sigma=(1,2,3,4)(5,6,7)$$
or equivalently in list notation
$$\sigma = [2,3,4,1,6,7,5]$$
A permutation with a fixed point and transpositions
$$\sigma=(1,3)(4,5)(6,7)(2)$$
or equivalently in list notation
$$\sigma = [3,2,1,5,4,7,6]$$
Define the composition (product) of two permutations $\sigma$ and $\tau$ as
\[
\sigma\tau(i)=\sigma(\tau(i))
\]
If $\sigma$ can be written as a product of $m$ transpositions, then the sign of the permutation $\sigma$ is defined as
\[
\operatorname{sgn}(\sigma)= (-1)^m
\]
(this is well defined: the parity of $m$ does not depend on the chosen decomposition).
Every permutation $\sigma$ can be written as a product of disjoint cycles.
Every cycle can be written as a product of transpositions.
Every permutation $\sigma$ can be written as a product of transpositions.
The permutation $$\sigma = [2,3,4,1,6,7,5] = (1,2,3,4)(5,6,7) = \color{green}(1,4)(1,3)(1,2)
\color{blue}(5,6)(6,7)$$
is an odd permutation.
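The sign can be computed from the list notation by counting cycles: a permutation of $n$ points with $c$ disjoint cycles (fixed points included) is a product of $n-c$ transpositions, so $\operatorname{sgn}(\sigma)=(-1)^{n-c}$. A small sketch (the helper function is a convenience for illustration, not from the text):

```python
def sign(perm):
    """Sign of a permutation in one-line notation, e.g. [2,3,4,1,6,7,5] (1-indexed)."""
    n = len(perm)
    seen = [False] * n
    cycles = 0
    for start in range(n):
        if not seen[start]:
            cycles += 1
            i = start
            while not seen[i]:
                seen[i] = True
                i = perm[i] - 1          # follow the cycle (convert to 0-indexing)
    return (-1) ** (n - cycles)          # each k-cycle is a product of k-1 transpositions

print(sign([2, 3, 4, 1, 6, 7, 5]))       # -1: the odd permutation from the example
```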
Symmetric Groups
A group is a set $G$ with an associative binary operation $\cdot :G\times G\to G$ such that
$\exists e \in G$ such that $$e\cdot x=x\cdot e=x\qquad \forall x\in G$$
$\forall x\in G$, $\exists x^{-1}\in G$ such that $$x\cdot x^{-1}=x^{-1}\cdot x=e$$
A group homomorphism is a function $f:G\to H$ such that
$$f(e)=e$$
$$f(x\cdot y)=f(x)\cdot f(y)$$
A group isomorphism is a bijective group homomorphism.
$S_n$ with the composition operation is a group, called the symmetric group.
Every finite group is isomorphic to a subgroup of some $S_n$ (Cayley's theorem).
Every polynomial $p(x)$ of degree $n\ge 1$ with complex coefficients and leading coefficient $c$ has $n$ complex roots, counted with multiplicity.
$$ \exists \alpha_1,\dots,\alpha_n\in\mathbb{C} :
p(x)=c\prod_{i=1}^n(x-\alpha_i)$$
Matrix Calculus
Limits
Gelfand's formula: Given any matrix norm $\|\cdot\|$,
$$
\rho(A):= \max\left(\{|\lambda_i| : \lambda_i \text{ an eigenvalue of }A \} \right)
=\lim _{k\to\infty}\left\|A^k\right\|^{1 / k}
$$
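A numerical sketch of Gelfand's formula with the Frobenius norm (the matrix is an arbitrary example; for moderately large $k$ the two quantities already agree to several digits):

```python
import numpy as np

A = np.array([[0.5, 0.9],
              [0.1, 0.3]])                               # arbitrary example matrix

rho = max(abs(np.linalg.eigvals(A)))                     # spectral radius
k = 200
gelfand = np.linalg.norm(np.linalg.matrix_power(A, k), 'fro') ** (1 / k)
print(rho, gelfand)                                      # the two values nearly coincide
```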