Subsection One-by-one matrices
Let us motivate what we want to achieve with matrices. Real-valued linear mappings of the real line, linear functions that eat numbers and spit out numbers, are just multiplications by a number. Consider a mapping defined by multiplying by a number. Let’s call this number \(\alpha\text{.}\) The mapping then takes \(x\) to \(\alpha x\text{.}\) What we can do is add such mappings: if we have another such mapping, multiplication by a number \(\beta\text{,}\) then
\begin{equation*}
\alpha x + \beta x = (\alpha + \beta) x\text{.}
\end{equation*}
We get a new mapping \(\alpha+\beta\) that multiplies \(x\) by, well, \(\alpha+\beta\text{.}\) If \(D\) is a mapping that doubles things, \(Dx = 2x\text{,}\) and \(T\) is a mapping that triples, \(Tx = 3x\text{,}\) then \(D+T\) is a mapping that multiplies by \(5\text{,}\) \((D+T)x = 5x\text{.}\)
Similarly we can compose such mappings, that is, we could apply one and then the other. We take \(x\text{,}\) we run it through the first mapping \(\alpha\) to get \(\alpha\) times \(x\text{,}\) then we run \(\alpha x\) through the second mapping \(\beta\text{.}\) In other words,
\begin{equation*}
\beta ( \alpha x ) = (\beta \alpha) x\text{.}
\end{equation*}
We just multiply those two numbers. Using our doubling and tripling mappings, if we double and then triple, that is \(T(Dx)\) then we obtain \(3(2x) = 6x\text{.}\) The composition \(TD\) is the mapping that multiplies by \(6\text{.}\) For larger matrices, composition also ends up being a kind of multiplication.
Subsection Matrix addition and scalar multiplication
The mappings that multiply numbers by numbers are just \(1 \times 1\) matrices. The number \(\alpha\) above could be written as a matrix \([\alpha]\text{.}\) So perhaps we would want to do the same things to all matrices that we did to those \(1 \times 1\) matrices at the start of this section above. First, let us add matrices. If we have a matrix \(A\) and a matrix \(B\) that are of the same size, say \(m \times n\text{,}\) then they are mappings from \({\mathbb{R}}^n\) to \({\mathbb{R}}^m\text{.}\) The mapping \(A+B\) should also be a mapping from \({\mathbb{R}}^n\) to \({\mathbb{R}}^m\text{,}\) and it should do the following to vectors:
\begin{equation*}
(A+B) \vec{x} = A\vec{x} + B \vec{x}\text{.}
\end{equation*}
It turns out you just add the matrices element-wise: If the \(ij^{\text{ th } }\) entry of \(A\) is \(a_{ij}\text{,}\) and the \(ij^{\text{ th } }\) entry of \(B\) is \(b_{ij}\text{,}\) then the \(ij^{\text{ th } }\) entry of \(A+B\) is \(a_{ij} + b_{ij}\text{.}\) If
\begin{equation*}
A = \begin{bmatrix} a_{11} \amp a_{12} \amp a_{13} \\ a_{21} \amp a_{22} \amp a_{23} \end{bmatrix} \qquad \text{ and } \qquad B = \begin{bmatrix} b_{11} \amp b_{12} \amp b_{13} \\ b_{21} \amp b_{22} \amp b_{23} \end{bmatrix}\text{,}
\end{equation*}
then
\begin{equation*}
A+B = \begin{bmatrix} a_{11} + b_{11} \amp a_{12} + b_{12} \amp a_{13} + b_{13} \\ a_{21} + b_{21} \amp a_{22} + b_{22} \amp a_{23} + b_{23} \end{bmatrix}\text{.}
\end{equation*}
Let us illustrate with a concrete example:
\begin{equation*}
\begin{bmatrix} 1 \amp 2 \\ 3 \amp 4 \\ 5 \amp 6 \end{bmatrix} + \begin{bmatrix} 7 \amp 8 \\ 9 \amp 10 \\ 11 \amp -1 \end{bmatrix} = \begin{bmatrix} 1+7 \amp 2+8 \\ 3+9 \amp 4+10 \\ 5+11 \amp 6-1 \end{bmatrix} = \begin{bmatrix} 8 \amp 10 \\ 12 \amp 14 \\ 16 \amp 5 \end{bmatrix}\text{.}
\end{equation*}
Let’s check that this does the right thing to a vector. Let’s use some of the vector algebra that we already know, and regroup things:
\begin{equation*}
\begin{split} \begin{bmatrix} 1 \amp 2 \\ 3 \amp 4 \\ 5 \amp 6 \end{bmatrix} \begin{bmatrix} 2 \\ -1 \end{bmatrix} + \begin{bmatrix} 7 \amp 8 \\ 9 \amp 10 \\ 11 \amp -1 \end{bmatrix} \begin{bmatrix} 2 \\ -1 \end{bmatrix} \amp = \left( 2 \begin{bmatrix} 1 \\ 3 \\ 5 \end{bmatrix} - \begin{bmatrix} 2 \\ 4 \\ 6 \end{bmatrix} \right) + \left( 2 \begin{bmatrix} 7 \\ 9 \\ 11 \end{bmatrix} - \begin{bmatrix} 8 \\ 10 \\ -1 \end{bmatrix} \right) \\ \amp = 2 \left( \begin{bmatrix} 1 \\ 3 \\ 5 \end{bmatrix} + \begin{bmatrix} 7 \\ 9 \\ 11 \end{bmatrix} \right) - \left( \begin{bmatrix} 2 \\ 4 \\ 6 \end{bmatrix} + \begin{bmatrix} 8 \\ 10 \\ -1 \end{bmatrix} \right) \\ \amp = 2 \begin{bmatrix} 1+7 \\ 3+9 \\ 5+11 \end{bmatrix} - \begin{bmatrix} 2+8 \\ 4+10 \\ 6-1 \end{bmatrix} = 2 \begin{bmatrix} 8 \\ 12 \\ 16 \end{bmatrix} - \begin{bmatrix} 10 \\ 14 \\ 5 \end{bmatrix} \\ \amp = \begin{bmatrix} 8 \amp 10 \\ 12 \amp 14 \\ 16 \amp 5 \end{bmatrix} \begin{bmatrix} 2 \\ -1 \end{bmatrix} \left( = \begin{bmatrix} 2(8)- 10 \\ 2(12) - 14 \\ 2(16) - 5 \end{bmatrix} = \begin{bmatrix} 6 \\ 10 \\ 27 \end{bmatrix} \right) . \end{split}
\end{equation*}
If we replaced the numbers by letters, that would constitute a proof! You’ll notice that we didn’t even have to compute the result to convince ourselves that the two expressions were equal.
If the sizes of the matrices do not match, then addition is not defined. If \(A\) is \(3 \times 2\) and \(B\) is \(2 \times 5\text{,}\) then we cannot add these matrices. We don’t know what that could possibly mean.
It is also useful to have a matrix that when added to any other matrix does nothing. This is the zero matrix, the matrix of all zeros:
\begin{equation*}
\begin{bmatrix} 1 \amp 2 \\ 3 \amp 4 \end{bmatrix} + \begin{bmatrix} 0 \amp 0 \\ 0 \amp 0 \end{bmatrix} = \begin{bmatrix} 1 \amp 2 \\ 3 \amp 4 \end{bmatrix}\text{.}
\end{equation*}
We often denote the zero matrix by \(0\) without specifying its size. We then just write \(A + 0\text{,}\) where we assume that \(0\) is the zero matrix of the same size as \(A\text{.}\)
There are really two things we can multiply matrices by. We can multiply matrices by scalars or we can multiply by other matrices. Let us first consider multiplication by scalars. For a matrix \(A\) and a scalar \(\alpha\) we want \(\alpha A\) to be the matrix that accomplishes
\begin{equation*}
(\alpha A) \vec{x} = \alpha (A \vec{x})\text{.}
\end{equation*}
That is just scaling the result by \(\alpha\text{.}\) If you think about it, scaling every entry in \(A\) by \(\alpha\) accomplishes just that: If
\begin{equation*}
A = \begin{bmatrix} a_{11} \amp a_{12} \amp a_{13} \\ a_{21} \amp a_{22} \amp a_{23} \end{bmatrix}, \qquad\text{ then } \qquad \alpha A = \begin{bmatrix} \alpha a_{11} \amp \alpha a_{12} \amp \alpha a_{13} \\ \alpha a_{21} \amp \alpha a_{22} \amp \alpha a_{23} \end{bmatrix}\text{.}
\end{equation*}
For example,
\begin{equation*}
2 \begin{bmatrix} 1 \amp 2 \amp 3 \\ 4 \amp 5 \amp 6 \end{bmatrix} = \begin{bmatrix} 2 \amp 4 \amp 6 \\ 8 \amp 10 \amp 12 \end{bmatrix}\text{.}
\end{equation*}
Let us list some properties of matrix addition and scalar multiplication. Denote by \(0\) the zero matrix, by \(\alpha\text{,}\) \(\beta\) scalars, and by \(A\text{,}\) \(B\text{,}\) \(C\) matrices. Then:
\begin{align*}
A + 0 \amp = A = 0 + A ,\\
A + B \amp = B + A ,\\
(A + B) + C \amp = A + (B + C) ,\\
\alpha(A+B) \amp = \alpha A+\alpha B,\\
(\alpha+\beta)A \amp = \alpha A + \beta A\text{.}
\end{align*}
These rules should look very familiar.
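For instance, let us verify the rule \(\alpha(A+B) = \alpha A + \alpha B\) on a small example; this is only a sanity check, not a proof:
\begin{equation*}
2 \left( \begin{bmatrix} 1 \amp 2 \\ 3 \amp 4 \end{bmatrix} + \begin{bmatrix} 5 \amp 6 \\ 7 \amp 8 \end{bmatrix} \right) = 2 \begin{bmatrix} 6 \amp 8 \\ 10 \amp 12 \end{bmatrix} = \begin{bmatrix} 12 \amp 16 \\ 20 \amp 24 \end{bmatrix} = \begin{bmatrix} 2 \amp 4 \\ 6 \amp 8 \end{bmatrix} + \begin{bmatrix} 10 \amp 12 \\ 14 \amp 16 \end{bmatrix} = 2 \begin{bmatrix} 1 \amp 2 \\ 3 \amp 4 \end{bmatrix} + 2 \begin{bmatrix} 5 \amp 6 \\ 7 \amp 8 \end{bmatrix}\text{.}
\end{equation*}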
Subsection Matrix multiplication
As we mentioned above, composition of linear mappings corresponds to a kind of multiplication of matrices. Suppose \(A\) is an \(m \times n\) matrix, that is, \(A\) takes \({\mathbb R}^n\) to \({\mathbb R}^m\text{,}\) and \(B\) is an \(n \times p\) matrix, that is, \(B\) takes \({\mathbb R}^p\) to \({\mathbb R}^n\text{.}\) The composition \(AB\) should work as follows:
\begin{equation*}
AB\vec{x} = A(B\vec{x})\text{.}
\end{equation*}
First, a vector \(\vec{x}\) in \({\mathbb R}^p\) gets taken to the vector \(B\vec{x}\) in \({\mathbb R}^n\text{.}\) Then the mapping \(A\) takes it to the vector \(A(B\vec{x})\) in \({\mathbb R}^m\text{.}\) In other words, the composition \(AB\) should be an \(m \times p\) matrix. In terms of sizes we should have
\begin{equation*}
\text{ `` } [ m \times n ] \, [ n \times p ] = [ m \times p ] . \text{ '' }
\end{equation*}
Notice how the middle size must match.
OK, now we know what sizes of matrices we can multiply and what size the product should be. Let us see how to actually compute matrix multiplication. We start with the so-called dot product (or inner product) of two vectors: usually a row vector multiplied by a column vector of the same size. The dot product multiplies each pair of corresponding entries from the two vectors and sums the products. The result is a single number. For example,
\begin{equation*}
\begin{bmatrix} a_1 \amp a_2 \amp a_3 \end{bmatrix} \cdot \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix} = a_1 b_1 + a_2 b_2 + a_3 b_3\text{.}
\end{equation*}
And similarly for larger (or smaller) vectors. A dot product is really a product of two matrices: a \(1 \times n\) matrix and an \(n \times 1\) matrix resulting in a \(1 \times 1\) matrix, that is, a number.
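To have a concrete instance of the formula above, here is one dot product computed with numbers:
\begin{equation*}
\begin{bmatrix} 1 \amp 2 \amp 3 \end{bmatrix} \cdot \begin{bmatrix} 4 \\ -1 \\ 2 \end{bmatrix} = 1 \cdot 4 + 2 \cdot (-1) + 3 \cdot 2 = 8\text{.}
\end{equation*}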
Armed with the dot product, we define the product of matrices. First, let us denote by \(\operatorname{row}_i(A)\) the \(i^{\text{ th } }\) row of \(A\) and by \(\operatorname{column}_j(A)\) the \(j^{\text{ th } }\) column of \(A\text{.}\) For an \(m \times n\) matrix \(A\) and an \(n \times p\) matrix \(B\text{,}\) we can compute the product \(AB\text{.}\) The matrix \(AB\) is an \(m \times p\) matrix whose \(ij^{\text{ th } }\) entry is the dot product
\begin{equation*}
\operatorname{row}_i(A) \cdot \operatorname{column}_j(B)\text{.}
\end{equation*}
For example, given a \(2 \times 3\) and a \(3 \times 2\) matrix we should end up with a \(2 \times 2\) matrix:
\begin{equation}
\begin{bmatrix} a_{11} \amp a_{12} \amp a_{13} \\ a_{21} \amp a_{22} \amp a_{23} \end{bmatrix} \begin{bmatrix} b_{11} \amp b_{12} \\ b_{21} \amp b_{22} \\ b_{31} \amp b_{32} \end{bmatrix} = \begin{bmatrix} a_{11} b_{11} + a_{12} b_{21} + a_{13} b_{31} \amp \amp a_{11} b_{12} + a_{12} b_{22} + a_{13} b_{32} \\ a_{21} b_{11} + a_{22} b_{21} + a_{23} b_{31} \amp \amp a_{21} b_{12} + a_{22} b_{22} + a_{23} b_{32} \end{bmatrix}\text{,}\tag{6.2.1}
\end{equation}
or with some numbers:
\begin{equation*}
\begin{bmatrix} 1 \amp 2 \amp 3 \\ 4 \amp 5 \amp 6 \end{bmatrix} \begin{bmatrix} -1 \amp 2 \\ -7 \amp 0 \\ 1 \amp -1 \end{bmatrix} = \begin{bmatrix} 1\cdot (-1) + 2\cdot (-7) + 3 \cdot 1 \amp \amp 1\cdot 2 + 2\cdot 0 + 3 \cdot (-1) \\ 4\cdot (-1) + 5\cdot (-7) + 6 \cdot 1 \amp \amp 4\cdot 2 + 5\cdot 0 + 6 \cdot (-1) \end{bmatrix} = \begin{bmatrix} -12 \amp -1 \\ -33 \amp 2 \end{bmatrix}\text{.}
\end{equation*}
A useful consequence of the definition is that the evaluation \(A \vec{x}\) for a matrix \(A\) and a (column) vector \(\vec{x}\) is also matrix multiplication. That is really why we think of vectors as column vectors, or \(n \times 1\) matrices. For example,
\begin{equation*}
\begin{bmatrix} 1 \amp 2 \\ 3 \amp 4 \end{bmatrix} \begin{bmatrix} 2 \\ -1 \end{bmatrix} = \begin{bmatrix} 1 \cdot 2 + 2 \cdot (-1) \\ 3 \cdot 2 + 4 \cdot (-1) \end{bmatrix} = \begin{bmatrix} 0 \\ 2 \end{bmatrix}\text{.}
\end{equation*}
If you look at the last section, that is precisely the last example we gave.
You should stare at the computation of multiplication of matrices \(AB\) and the previous definition of \(A\vec{y}\) as a mapping for a moment. What we are doing with matrix multiplication is applying the mapping \(A\) to the columns of \(B\text{.}\) This is usually written as follows. Suppose we write the \(n \times p\) matrix \(B = [ \vec{b}_1 ~ \vec{b}_2 ~ \cdots ~ \vec{b}_p ]\text{,}\) where \(\vec{b}_1, \vec{b}_2, \ldots, \vec{b}_p\) are the columns of \(B\text{.}\) Then for an \(m \times n\) matrix \(A\text{,}\)
\begin{equation*}
AB = A [ \vec{b}_1 ~ \vec{b}_2 ~ \cdots ~ \vec{b}_p ] = [ A\vec{b}_1 ~ A\vec{b}_2 ~ \cdots ~ A\vec{b}_p ]\text{.}
\end{equation*}
The columns of the \(m \times p\) matrix \(AB\) are the vectors \(A\vec{b}_1, A\vec{b}_2, \ldots, A\vec{b}_p\text{.}\) For example, in (6.2.1), the columns of
\begin{equation*}
\begin{bmatrix} a_{11} \amp a_{12} \amp a_{13} \\ a_{21} \amp a_{22} \amp a_{23} \end{bmatrix} \begin{bmatrix} b_{11} \amp b_{12} \\ b_{21} \amp b_{22} \\ b_{31} \amp b_{32} \end{bmatrix}
\end{equation*}
are
\begin{equation*}
\begin{bmatrix} a_{11} \amp a_{12} \amp a_{13} \\ a_{21} \amp a_{22} \amp a_{23} \end{bmatrix} \begin{bmatrix} b_{11} \\ b_{21} \\ b_{31} \end{bmatrix} \qquad \text{ and } \qquad \begin{bmatrix} a_{11} \amp a_{12} \amp a_{13} \\ a_{21} \amp a_{22} \amp a_{23} \end{bmatrix} \begin{bmatrix} b_{12} \\ b_{22} \\ b_{32} \end{bmatrix}\text{.}
\end{equation*}
This is a very useful way to understand what matrix multiplication is. It should also make it easier to remember how to perform matrix multiplication.
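As a quick illustration of this column-by-column point of view, take the numerical product computed earlier. Applying the first matrix to each column of the second gives
\begin{equation*}
\begin{bmatrix} 1 \amp 2 \amp 3 \\ 4 \amp 5 \amp 6 \end{bmatrix} \begin{bmatrix} -1 \\ -7 \\ 1 \end{bmatrix} = \begin{bmatrix} -12 \\ -33 \end{bmatrix} \qquad \text{ and } \qquad \begin{bmatrix} 1 \amp 2 \amp 3 \\ 4 \amp 5 \amp 6 \end{bmatrix} \begin{bmatrix} 2 \\ 0 \\ -1 \end{bmatrix} = \begin{bmatrix} -1 \\ 2 \end{bmatrix}\text{,}
\end{equation*}
and these are precisely the columns of the product \(\left[ \begin{matrix} -12 \amp -1 \\ -33 \amp 2 \end{matrix} \right]\) we found before.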
Subsection Some rules of matrix algebra
For multiplication we want an analogue of the number 1. That is, we desire a matrix that just leaves everything as it found it. This analogue is the so-called identity matrix. The identity matrix is a square matrix with 1s on the main diagonal and zeros everywhere else. It is usually denoted by \(I\text{.}\) For each size we have a different identity matrix, so sometimes we denote the size as a subscript. For example, \(I_3\) is the \(3 \times 3\) identity matrix
\begin{equation*}
I = I_3 = \begin{bmatrix} 1 \amp 0 \amp 0 \\ 0 \amp 1 \amp 0 \\ 0 \amp 0 \amp 1 \end{bmatrix}\text{.}
\end{equation*}
Let us see how the identity matrix works on a smaller example,
\begin{equation*}
\begin{bmatrix} a_{11} \amp a_{12} \\ a_{21} \amp a_{22} \end{bmatrix} \begin{bmatrix} 1 \amp 0 \\ 0 \amp 1 \end{bmatrix} = \begin{bmatrix} a_{11} \cdot 1 + a_{12} \cdot 0 \amp \amp a_{11} \cdot 0 + a_{12} \cdot 1 \\ a_{21} \cdot 1 + a_{22} \cdot 0 \amp \amp a_{21} \cdot 0 + a_{22} \cdot 1 \end{bmatrix} = \begin{bmatrix} a_{11} \amp a_{12} \\ a_{21} \amp a_{22} \end{bmatrix}\text{.}
\end{equation*}
Multiplication by the identity from the left looks similar, and also does not touch anything.
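To be safe, let us also write out multiplication by the identity from the left on the same small example:
\begin{equation*}
\begin{bmatrix} 1 \amp 0 \\ 0 \amp 1 \end{bmatrix} \begin{bmatrix} a_{11} \amp a_{12} \\ a_{21} \amp a_{22} \end{bmatrix} = \begin{bmatrix} 1 \cdot a_{11} + 0 \cdot a_{21} \amp \amp 1 \cdot a_{12} + 0 \cdot a_{22} \\ 0 \cdot a_{11} + 1 \cdot a_{21} \amp \amp 0 \cdot a_{12} + 1 \cdot a_{22} \end{bmatrix} = \begin{bmatrix} a_{11} \amp a_{12} \\ a_{21} \amp a_{22} \end{bmatrix}\text{.}
\end{equation*}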
We have the following rules for matrix multiplication. Suppose that \(A\text{,}\) \(B\text{,}\) \(C\) are matrices of the correct sizes so that the following make sense. Let \(\alpha\) denote a scalar (number). Then
\begin{align*}
A(BC) \amp = (AB)C \amp \amp \text{(associative law)} ,\\
A(B+C) \amp = AB + AC \amp \amp \text{(distributive law)} ,\\
(B+C)A \amp = BA + CA \amp \amp \text{(distributive law)} ,\\
\alpha(AB) \amp = (\alpha A)B = A(\alpha B) , \amp \amp\\
IA \amp = A = AI \amp \amp \text{(identity)} \text{.}
\end{align*}
Example 6.2.1.
Let us demonstrate a couple of these rules. For example, the associative law:
\begin{equation*}
\underbrace{ \begin{bmatrix} -3 \amp 3 \\ 2 \amp -2 \end{bmatrix} }_A \biggl( \underbrace{ \begin{bmatrix} 4 \amp 4 \\ 1 \amp -3 \end{bmatrix} }_B \underbrace{ \begin{bmatrix} -1 \amp 4 \\ 5 \amp 2 \end{bmatrix} }_C \biggr) = \underbrace{ \begin{bmatrix} -3 \amp 3 \\ 2 \amp -2 \end{bmatrix} }_A \underbrace{ \begin{bmatrix} 16 \amp 24 \\ -16 \amp -2 \end{bmatrix} }_{BC} = \underbrace{ \begin{bmatrix} -96 \amp -78 \\ 64 \amp 52 \end{bmatrix} }_{A(BC)}\text{,}
\end{equation*}
and
\begin{equation*}
\biggl( \underbrace{ \begin{bmatrix} -3 \amp 3 \\ 2 \amp -2 \end{bmatrix} }_A \underbrace{ \begin{bmatrix} 4 \amp 4 \\ 1 \amp -3 \end{bmatrix} }_B \biggr) \underbrace{ \begin{bmatrix} -1 \amp 4 \\ 5 \amp 2 \end{bmatrix} }_C = \underbrace{ \begin{bmatrix} -9 \amp -21 \\ 6 \amp 14 \end{bmatrix} }_{AB} \underbrace{ \begin{bmatrix} -1 \amp 4 \\ 5 \amp 2 \end{bmatrix} }_C = \underbrace{ \begin{bmatrix} -96 \amp -78 \\ 64 \amp 52 \end{bmatrix} }_{(AB)C}\text{.}
\end{equation*}
Or how about multiplication by scalars:
\begin{equation*}
10 \biggl( \underbrace{ \begin{bmatrix} -3 \amp 3 \\ 2 \amp -2 \end{bmatrix} }_A \underbrace{ \begin{bmatrix} 4 \amp 4 \\ 1 \amp -3 \end{bmatrix} }_B \biggr) = 10 \underbrace{ \begin{bmatrix} -9 \amp -21 \\ 6 \amp 14 \end{bmatrix} }_{A B} = \underbrace{ \begin{bmatrix} -90 \amp -210 \\ 60 \amp 140 \end{bmatrix} }_{10 (AB)}\text{,}
\end{equation*}
\begin{equation*}
\biggl( 10 \underbrace{ \begin{bmatrix} -3 \amp 3 \\ 2 \amp -2 \end{bmatrix} }_A \biggr) \underbrace{ \begin{bmatrix} 4 \amp 4 \\ 1 \amp -3 \end{bmatrix} }_B = \underbrace{ \begin{bmatrix} -30 \amp 30 \\ 20 \amp -20 \end{bmatrix} }_{10 A} \underbrace{ \begin{bmatrix} 4 \amp 4 \\ 1 \amp -3 \end{bmatrix} }_B = \underbrace{ \begin{bmatrix} -90 \amp -210 \\ 60 \amp 140 \end{bmatrix} }_{(10 A)B}\text{,}
\end{equation*}
and
\begin{equation*}
\underbrace{ \begin{bmatrix} -3 \amp 3 \\ 2 \amp -2 \end{bmatrix} }_A \biggl( 10 \underbrace{ \begin{bmatrix} 4 \amp 4 \\ 1 \amp -3 \end{bmatrix} }_B \biggr) = \underbrace{ \begin{bmatrix} -3 \amp 3 \\ 2 \amp -2 \end{bmatrix} }_{A} \underbrace{ \begin{bmatrix} 40 \amp 40 \\ 10 \amp -30 \end{bmatrix} }_{10B} = \underbrace{ \begin{bmatrix} -90 \amp -210 \\ 60 \amp 140 \end{bmatrix} }_{A(10B)}\text{.}
\end{equation*}
A multiplication rule you have used since primary school for numbers is quite conspicuously missing for matrices: matrix multiplication is not commutative. First, just because \(AB\) makes sense, \(BA\) may not even be defined. For example, if \(A\) is \(2 \times 3\) and \(B\) is \(3 \times 4\text{,}\) then we can multiply \(AB\) but not \(BA\text{.}\)
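In the size notation from before, the point is that
\begin{equation*}
\text{ `` } [ 2 \times 3 ] \, [ 3 \times 4 ] = [ 2 \times 4 ] , \text{ '' } \qquad \text{ while in } \qquad \text{ `` } [ 3 \times 4 ] \, [ 2 \times 3 ] \text{ '' } \qquad \text{ the middle sizes do not match.}
\end{equation*}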
Even if \(AB\) and \(BA\) are both defined, it does not mean that they are equal. For example, take \(A = \left[ \begin{matrix}1 \amp 1 \\ 1 \amp 1 \end{matrix} \right]\) and \(B = \left[ \begin{matrix}1 \amp 0 \\ 0 \amp 2 \end{matrix} \right]\text{:}\)
\begin{equation*}
AB = \begin{bmatrix} 1 \amp 1 \\ 1 \amp 1 \end{bmatrix} \begin{bmatrix} 1 \amp 0 \\ 0 \amp 2 \end{bmatrix} = \begin{bmatrix} 1 \amp 2 \\ 1 \amp 2 \end{bmatrix} \qquad \not= \qquad \begin{bmatrix} 1 \amp 1 \\ 2 \amp 2 \end{bmatrix} = \begin{bmatrix} 1 \amp 0 \\ 0 \amp 2 \end{bmatrix} \begin{bmatrix} 1 \amp 1 \\ 1 \amp 1 \end{bmatrix} = BA\text{.}
\end{equation*}
Subsection Inverse
A couple of other algebra rules you know for numbers do not quite work on matrices:
\(AB = AC\) does not necessarily imply \(B=C\text{,}\) even if \(A\) is not 0.
\(AB = 0\) does not necessarily mean that \(A=0\) or \(B=0\text{.}\)
For example:
\begin{equation*}
\begin{bmatrix} 0 \amp 1 \\ 0 \amp 0 \end{bmatrix} \begin{bmatrix} 0 \amp 1 \\ 0 \amp 0 \end{bmatrix} = \begin{bmatrix} 0 \amp 0 \\ 0 \amp 0 \end{bmatrix} = \begin{bmatrix} 0 \amp 1 \\ 0 \amp 0 \end{bmatrix} \begin{bmatrix} 0 \amp 2 \\ 0 \amp 0 \end{bmatrix}\text{.}
\end{equation*}
To make these rules hold, we do not just need one of the matrices to not be zero; we need to be able to “divide” by a matrix. This is where the matrix inverse comes in. Suppose that \(A\) and \(B\) are \(n \times n\) matrices such that
\begin{equation*}
AB = I = BA\text{.}
\end{equation*}
Then we call \(B\) the inverse of \(A\) and we denote \(B\) by \(A^{-1}\text{.}\) Perhaps not surprisingly, \({(A^{-1})}^{-1} = A\text{,}\) since if the inverse of \(A\) is \(B\text{,}\) then the inverse of \(B\) is \(A\text{.}\) If the inverse of \(A\) exists, then we say \(A\) is invertible. If \(A\) is not invertible, we say \(A\) is singular.
If \(A = [a]\) is a \(1 \times 1\) matrix, then \(A^{-1}\) is the matrix \(\left[ a^{-1} \right] = \left[ \frac{1}{a} \right]\text{.}\) That is where the notation comes from. The computation is not nearly as simple when \(A\) is larger.
The proper formulation of the cancellation rule is: If \(A\) is invertible, then \(AB = AC\) implies \(B=C\text{.}\) The computation is what you would do in regular algebra with numbers, but you have to be careful never to commute matrices:
\begin{align*}
AB \amp = AC ,\\
A^{-1}AB \amp = A^{-1}AC ,\\
IB \amp = IC ,\\
B \amp = C \text{.}
\end{align*}
And similarly for cancellation on the right: If \(A\) is invertible, then \(BA = CA\) implies \(B=C\text{.}\)
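The computation for cancellation on the right is the mirror image; this time we multiply by \(A^{-1}\) on the right:
\begin{align*}
BA \amp = CA ,\\
BAA^{-1} \amp = CAA^{-1} ,\\
BI \amp = CI ,\\
B \amp = C \text{.}
\end{align*}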
The rule says, among other things, that the inverse of a matrix is unique if it exists: if \(B\) and \(C\) are both inverses of \(A\text{,}\) then \(AB = I = AC\text{,}\) and the cancellation rule gives \(B=C\text{.}\)
We will see later how to compute the inverse of a matrix in general. For now, let us note that there is a simple formula for the inverse of a \(2 \times 2\) matrix:
\begin{equation*}
\begin{bmatrix} a \amp b \\ c \amp d \end{bmatrix}^{-1} = \frac{1}{ad-bc} \begin{bmatrix} d \amp -b \\ -c \amp a \end{bmatrix}\text{.}
\end{equation*}
For example:
\begin{equation*}
\begin{bmatrix} 1 \amp 1 \\ 2 \amp 4 \end{bmatrix}^{-1} = \frac{1}{1\cdot 4-1 \cdot 2} \begin{bmatrix} 4 \amp -1 \\ -2 \amp 1 \end{bmatrix} = \begin{bmatrix} 2 \amp \nicefrac{-1}{2} \\ -1 \amp \nicefrac{1}{2} \end{bmatrix}\text{.}
\end{equation*}
Let’s try it:
\begin{equation*}
\begin{bmatrix} 1 \amp 1 \\ 2 \amp 4 \end{bmatrix} \begin{bmatrix} 2 \amp \nicefrac{-1}{2} \\ -1 \amp \nicefrac{1}{2} \end{bmatrix} = \begin{bmatrix} 1 \amp 0 \\ 0 \amp 1 \end{bmatrix} \qquad \text{ and } \qquad \begin{bmatrix} 2 \amp \nicefrac{-1}{2} \\ -1 \amp \nicefrac{1}{2} \end{bmatrix} \begin{bmatrix} 1 \amp 1 \\ 2 \amp 4 \end{bmatrix} = \begin{bmatrix} 1 \amp 0 \\ 0 \amp 1 \end{bmatrix}\text{.}
\end{equation*}
Just as we cannot divide by every number, not every matrix is invertible. In the case of matrices, however, we may have singular matrices that are not zero. For example,
\begin{equation*}
\begin{bmatrix} 1 \amp 1 \\ 2 \amp 2 \end{bmatrix}
\end{equation*}
is a singular matrix. But didn’t we just give a formula for an inverse? Let us try it:
\begin{equation*}
\begin{bmatrix} 1 \amp 1 \\ 2 \amp 2 \end{bmatrix}^{-1} = \frac{1}{1\cdot 2-1 \cdot 2} \begin{bmatrix} 2 \amp -1 \\ -2 \amp 1 \end{bmatrix} = \text{ ? }
\end{equation*}
We get into a bit of trouble; we are trying to divide by zero.
So a \(2 \times 2\) matrix \(A = \left[ \begin{matrix} a \amp b \\ c \amp d \end{matrix} \right]\) is invertible whenever
\begin{equation*}
ad - bc \not= 0
\end{equation*}
and otherwise it is singular. The expression \(ad-bc\) is called the determinant and we will look at it more carefully in a later section. There is a similar expression for a square matrix of any size.
Subsection Diagonal matrices
A simple (and surprisingly useful) type of square matrix is the so-called diagonal matrix. It is a matrix whose entries are all zero except possibly those on the main diagonal, from the top left to the bottom right. For example, a \(4 \times 4\) diagonal matrix is of the form
\begin{equation*}
\begin{bmatrix} d_1 \amp 0 \amp 0 \amp 0 \\ 0 \amp d_2 \amp 0 \amp 0 \\ 0 \amp 0 \amp d_3 \amp 0 \\ 0 \amp 0 \amp 0 \amp d_4 \end{bmatrix}\text{.}
\end{equation*}
Such matrices have nice properties when we multiply by them. When a diagonal matrix multiplies a vector, it multiplies the \(k^{\text{ th } }\) entry of the vector by \(d_k\text{.}\) For example,
\begin{equation*}
\begin{bmatrix} 1 \amp 0 \amp 0 \\ 0 \amp 2 \amp 0 \\ 0 \amp 0 \amp 3 \end{bmatrix} \begin{bmatrix} 4 \\ 5 \\ 6 \end{bmatrix} = \begin{bmatrix} 1 \cdot 4 \\ 2 \cdot 5 \\ 3 \cdot 6 \end{bmatrix} = \begin{bmatrix} 4 \\ 10 \\ 18 \end{bmatrix}\text{.}
\end{equation*}
Similarly, when they multiply another matrix from the left, they multiply the \(k^{\text{ th } }\) row by \(d_k\text{.}\) For example,
\begin{equation*}
\begin{bmatrix} 2 \amp 0 \amp 0 \\ 0 \amp 3 \amp 0 \\ 0 \amp 0 \amp -1 \end{bmatrix} \begin{bmatrix} 1 \amp 1 \amp 1 \\ 1 \amp 1 \amp 1 \\ 1 \amp 1 \amp 1 \end{bmatrix} = \begin{bmatrix} 2 \amp 2 \amp 2 \\ 3 \amp 3 \amp 3 \\ -1 \amp -1 \amp -1 \end{bmatrix}\text{.}
\end{equation*}
On the other hand, multiplying on the right, they multiply the columns:
\begin{equation*}
\begin{bmatrix} 1 \amp 1 \amp 1 \\ 1 \amp 1 \amp 1 \\ 1 \amp 1 \amp 1 \end{bmatrix} \begin{bmatrix} 2 \amp 0 \amp 0 \\ 0 \amp 3 \amp 0 \\ 0 \amp 0 \amp -1 \end{bmatrix} = \begin{bmatrix} 2 \amp 3 \amp -1 \\ 2 \amp 3 \amp -1 \\ 2 \amp 3 \amp -1 \end{bmatrix}\text{.}
\end{equation*}
And it is really easy to multiply two diagonal matrices together:
\begin{equation*}
\begin{bmatrix} 1 \amp 0 \amp 0 \\ 0 \amp 2 \amp 0 \\ 0 \amp 0 \amp 3 \end{bmatrix} \begin{bmatrix} 2 \amp 0 \amp 0 \\ 0 \amp 3 \amp 0 \\ 0 \amp 0 \amp -1 \end{bmatrix} = \begin{bmatrix} 1 \cdot 2 \amp 0 \amp 0 \\ 0 \amp 2 \cdot 3 \amp 0 \\ 0 \amp 0 \amp 3 \cdot (-1) \end{bmatrix} = \begin{bmatrix} 2 \amp 0 \amp 0 \\ 0 \amp 6 \amp 0 \\ 0 \amp 0 \amp -3 \end{bmatrix}\text{.}
\end{equation*}
For this last reason, diagonal matrices are easy to invert (as long as none of the diagonal entries is zero); you simply invert each diagonal entry:
\begin{equation*}
\begin{bmatrix} d_1 \amp 0 \amp 0 \\ 0 \amp d_2 \amp 0 \\ 0 \amp 0 \amp d_3 \end{bmatrix}^{-1} = \begin{bmatrix} d_1^{-1} \amp 0 \amp 0 \\ 0 \amp d_2^{-1} \amp 0 \\ 0 \amp 0 \amp d_3^{-1} \end{bmatrix}\text{.}
\end{equation*}
Let us check an example:
\begin{equation*}
\underbrace{ \begin{bmatrix} 2 \amp 0 \amp 0 \\ 0 \amp 3 \amp 0 \\ 0 \amp 0 \amp 4 \end{bmatrix}^{-1} }_{A^{-1}} \underbrace{ \begin{bmatrix} 2 \amp 0 \amp 0 \\ 0 \amp 3 \amp 0 \\ 0 \amp 0 \amp 4 \end{bmatrix} }_{A} = \underbrace{ \begin{bmatrix} \frac{1}{2} \amp 0 \amp 0 \\ 0 \amp \frac{1}{3} \amp 0 \\ 0 \amp 0 \amp \frac{1}{4} \end{bmatrix} }_{A^{-1}} \underbrace{ \begin{bmatrix} 2 \amp 0 \amp 0 \\ 0 \amp 3 \amp 0 \\ 0 \amp 0 \amp 4 \end{bmatrix} }_{A} = \underbrace{ \begin{bmatrix} 1 \amp 0 \amp 0 \\ 0 \amp 1 \amp 0 \\ 0 \amp 0 \amp 1 \end{bmatrix} }_{I}\text{.}
\end{equation*}
It is no wonder that the way we solve many problems in linear algebra (and in differential equations) is to try to reduce the problem to the case of diagonal matrices.
Subsection Transpose
Vectors do not always have to be column vectors; that is just a convention. Swapping rows and columns is needed from time to time. The operation that swaps rows and columns is the so-called transpose. The transpose of \(A\) is denoted by \(A^T\text{.}\) For example,
\begin{equation*}
\begin{bmatrix} 1 \amp 2 \amp 3 \\ 4 \amp 5 \amp 6 \end{bmatrix}^T = \begin{bmatrix} 1 \amp 4 \\ 2 \amp 5 \\ 3 \amp 6 \end{bmatrix}\text{.}
\end{equation*}
So transpose takes an \(m \times n\) matrix to an \(n \times m\) matrix.
A key fact about the transpose is that if the product \(AB\) makes sense then \(B^TA^T\) also makes sense, at least from the point of view of sizes. In fact, we get precisely the transpose of \(AB\text{.}\) That is:
\begin{equation*}
{(AB)}^T = B^TA^T\text{.}
\end{equation*}
For example,
\begin{equation*}
{\left( \begin{bmatrix} 1 \amp 2 \amp 3 \\ 4 \amp 5 \amp 6 \end{bmatrix} \begin{bmatrix} 0 \amp 1 \\ 1 \amp 0 \\ 2 \amp -2 \end{bmatrix} \right)}^T = \begin{bmatrix} 0 \amp 1 \amp 2 \\ 1 \amp 0 \amp -2 \end{bmatrix} \begin{bmatrix} 1 \amp 4 \\ 2 \amp 5 \\ 3 \amp 6 \end{bmatrix}\text{.}
\end{equation*}
It is left to the reader to verify that computing the matrix product on the left and then transposing is the same as computing the matrix product on the right.
If we have a column vector \(\vec{x}\) to which we apply a matrix \(A\) and we then transpose the result, then the row vector \(\vec{x}^T\) multiplies \(A^T\) from the left:
\begin{equation*}
{(A\vec{x})}^T = \vec{x}^TA^T\text{.}
\end{equation*}
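As a quick check with the \(2 \times 2\) matrix and vector used earlier,
\begin{equation*}
{\left( \begin{bmatrix} 1 \amp 2 \\ 3 \amp 4 \end{bmatrix} \begin{bmatrix} 2 \\ -1 \end{bmatrix} \right)}^T = \begin{bmatrix} 0 \amp 2 \end{bmatrix} = \begin{bmatrix} 2 \amp -1 \end{bmatrix} \begin{bmatrix} 1 \amp 3 \\ 2 \amp 4 \end{bmatrix}\text{.}
\end{equation*}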
Another place where the transpose is useful is when we wish to compute the dot product of two column vectors:
\begin{equation*}
\vec{x} \cdot \vec{y} = \vec{y}^T \vec{x}\text{.}
\end{equation*}
That is the way that one often writes the dot product in software.
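For instance, with the same numbers as in the dot product example above,
\begin{equation*}
\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} \cdot \begin{bmatrix} 4 \\ -1 \\ 2 \end{bmatrix} = \begin{bmatrix} 4 \amp -1 \amp 2 \end{bmatrix} \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} = 4 \cdot 1 + (-1) \cdot 2 + 2 \cdot 3 = 8\text{.}
\end{equation*}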
We say a matrix \(A\) is symmetric if \(A = A^T\text{.}\) For example,
\begin{equation*}
\begin{bmatrix} 1 \amp 2 \amp 3 \\ 2 \amp 4 \amp 5 \\ 3 \amp 5 \amp 6 \end{bmatrix}
\end{equation*}
is a symmetric matrix. Notice that a symmetric matrix is always square, that is, \(n \times n\text{.}\) Symmetric matrices have many nice properties, and come up quite often in applications.