Every theorem in Lesson 3PM treated a matrix as a single rectangle of entries. That is often what we want, but it throws away a second piece of information that is usually just as useful: a matrix arising from a problem often carries internal structure of its own, such as a corner to quarantine, a column that deserves to be singled out, or a diagonal strip worth handling on its own. Drawing a few lines across the rectangle lets us name those pieces, reason about them one at a time, and recombine them through the addition and multiplication rules already in place.
Submatrices and Block Partitions
Definition 1 (Submatrix)
Let A∈Fm×n. If some (possibly empty) collection of complete rows and some (possibly empty) collection of complete columns of A are deleted, the rectangle that remains is called a submatrix of A. Equivalently, a submatrix is determined by an ordered index set I⊆{1,2,…,m} of rows to keep and J⊆{1,2,…,n} of columns to keep.
Example 1 (Submatrices of a Small Rectangle)
Take
\[
M=\begin{bmatrix}1&-2&3&0\\4&1&-1&2\\0&3&2&-5\end{bmatrix}\in\mathbb{R}^{3\times 4}.
\]
Keeping rows {1,2} with columns {1,2} returns the 2×2 submatrix [14−21]. Keeping only row 2 and column 3 returns the 1×1 submatrix [−1]. Keeping row 1 with every column returns the row matrix [1−230]. Keeping rows {1,3} with columns {1,4} returns [100−5]. Each is a submatrix of M obtained from a different choice of surviving rows and columns.
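The row-and-column selection of Definition 1 is mechanical enough to state as code. Below is a minimal Python sketch (the helper name `submatrix` is ours, not the lesson's) that reproduces the four selections of Example 1 with 1-based index sets.

```python
def submatrix(A, rows, cols):
    """Keep the listed 1-based rows and columns of A (a list of lists)."""
    return [[A[i - 1][j - 1] for j in cols] for i in rows]

# The matrix M of Example 1.
M = [[1, -2,  3,  0],
     [4,  1, -1,  2],
     [0,  3,  2, -5]]

assert submatrix(M, [1, 2], [1, 2]) == [[1, -2], [4, 1]]
assert submatrix(M, [2], [3]) == [[-1]]
assert submatrix(M, [1], [1, 2, 3, 4]) == [[1, -2, 3, 0]]
assert submatrix(M, [1, 3], [1, 4]) == [[1, 0], [0, -5]]
```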
A submatrix on its own is only a selection. The construction below is what makes the idea productive in computation: rather than singling out one piece, cut the whole rectangle along several row and column boundaries and keep all of the resulting tiles.
Definition 2 (Partition and Block-Matrix)
A partition of A∈Fm×n is the choice of dividing lines between specified rows and between specified columns, each running the full width or height of the array. The partition splits A into an ordered rectangular array of submatrices called blocks, and A written in terms of these blocks is called a block-matrix. We record a partition as
\[
A=[A_{\alpha\beta}]_{\alpha,\beta=1}^{r,s},
\]
where Aαβ is the block at block-position (α,β). Any block may itself be as small as a 1×1 scalar.
Example 2 (Two Partitions of the Same Matrix)
The matrix M of Example 1 admits many partitions. One natural cut places a horizontal line between rows 2 and 3 and a vertical line between columns 2 and 3:
\[
M=\left[\begin{array}{cc|cc}
1&-2&3&0\\
4&1&-1&2\\
\hline
0&3&2&-5
\end{array}\right]
=\begin{bmatrix}M_{11}&M_{12}\\M_{21}&M_{22}\end{bmatrix},
\]
with M11 the 2×2 block in the upper-left corner, M12 the 2×2 block to its right, and M21, M22 the 1×2 blocks along the bottom. Other cuts are equally legitimate. No layout is canonical; each simply exposes a different internal geometry of the same rectangle.
Every shape name from the opening classification of matrix types in Lesson 3PM transfers to the block-structured picture without alteration. A square block-matrix has equal numbers of row-blocks and column-blocks; a diagonal block-matrix has zero blocks off the main block-diagonal; an upper-triangular block-matrix has zero blocks below it, and so on. The blocks themselves are submatrices rather than scalars, but the label means what it did before.
Example 3 (Triangular and Diagonal Block-Matrices)
The partitioned matrices
\[
\begin{bmatrix}A_{11}&A_{12}\\0&A_{22}\end{bmatrix},\qquad
\begin{bmatrix}B_{11}&0\\0&B_{22}\end{bmatrix}
\]
are an upper-triangular and a diagonal block-matrix respectively, exactly as in the opening classification of matrix types in Lesson 3PM, except that the scalar zeros of that section are now zero blocks of the sizes forced by the partition.
A square matrix admits a finer distinction. When the dividing lines are placed so that every diagonal block ends up square, the partition is well-suited to iteration, since the diagonal blocks can then be treated as square matrices in their own right.
Definition 3 (Symmetric Partition)
A partition of a square matrix A∈Fn×n is symmetric when every diagonal block Aαα is itself square. Equivalently, the sequence of row-cut sizes matches the sequence of column-cut sizes.
Example 4 (A Symmetric Partition of a 3×3 Matrix)
Cutting both the rows and the columns at the boundary between index 2 and index 3 partitions
\[
N=\begin{bmatrix}3&2&-1\\0&4&1\\0&0&2\end{bmatrix}
\]
symmetrically: the top-left block is the 2×2 matrix [3024], the bottom-right block is the 1×1 scalar [2], and both diagonal blocks are square as the preceding definition requires.
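The cut-size bookkeeping of Definitions 2 and 3 can be checked mechanically on the matrix of Example 4. This is a minimal Python sketch; the helper names `blocks` and `is_symmetric_partition` are ours, not the lesson's.

```python
def blocks(A, row_sizes, col_sizes):
    """Split A into the rectangular array of blocks cut out by the given sizes."""
    out, i = [], 0
    for m in row_sizes:
        block_row, j = [], 0
        for n in col_sizes:
            block_row.append([row[j:j + n] for row in A[i:i + m]])
            j += n
        out.append(block_row)
        i += m
    return out

def is_symmetric_partition(row_sizes, col_sizes):
    """A partition of a square matrix is symmetric exactly when the cut sizes match."""
    return list(row_sizes) == list(col_sizes)

# The matrix N of Example 4, cut between index 2 and index 3 both ways.
N = [[3, 2, -1],
     [0, 4,  1],
     [0, 0,  2]]
B = blocks(N, [2, 1], [2, 1])

assert B[0][0] == [[3, 2], [0, 4]]   # square 2x2 diagonal block
assert B[1][1] == [[2]]              # square 1x1 diagonal block
assert is_symmetric_partition([2, 1], [2, 1])
```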
Problem 1
Exhibit three different submatrices of
\[
A=\begin{bmatrix}1&0&4&-2&5\\3&-1&0&1&0\\2&2&-1&0&3\\0&1&2&1&-1\end{bmatrix}
\]
of sizes 2×3, 3×2, and 1×4, each obtained from a different choice of surviving rows and columns.
Problem 2
Give two different symmetric partitions of
\[
A=\begin{bmatrix}3&0&0&1\\0&2&4&0\\0&4&2&0\\1&0&0&3\end{bmatrix}
\]
and, for each partition, identify the diagonal blocks. For which of your partitions does the block-matrix become block-diagonal?
Addition and Multiplication in Block Form
The partition has so far only been a way to see a matrix. The pay-off is that addition and multiplication of matrices, once the block sizes line up, can be carried out block by block using the same formulas as at the scalar level. The entrywise addition and row-into-column multiplication introduced in Lesson 3PM are upgraded without any new idea.
Theorem 1 (Block Addition)
Let A,B∈Fm×n be partitioned into the same block pattern,
\[
A=[A_{\alpha\beta}]_{\alpha,\beta=1}^{r,s},\qquad
B=[B_{\alpha\beta}]_{\alpha,\beta=1}^{r,s},
\]
with corresponding blocks Aαβ and Bαβ of the same size for every (α,β). Then
\[
A+B=[A_{\alpha\beta}+B_{\alpha\beta}]_{\alpha,\beta=1}^{r,s},
\]
and likewise for A−B.
Proof
Matrix addition in Lesson 3PM is entrywise, and the two partitions share a single grid of dividing lines, so the block of A+B at block position (α,β) is the array of entries aij+bij with (i,j) ranging over that block. Reading those entries off as a submatrix gives (A+B)αβ=Aαβ+Bαβ. The subtraction statement is identical with + replaced by −.
The hypothesis that the two partitions match is load-bearing: if B were cut differently, corresponding blocks would not even have the same size, and the block formula would no longer make sense, even though the underlying entrywise sum still does.
■
Example 5 (Adding Two Matrices Block by Block)
Let
Let
\[
A=\begin{bmatrix}1&2&-1\\0&3&4\\2&-2&0\end{bmatrix},\qquad
B=\begin{bmatrix}0&1&2\\-1&1&0\\3&1&-4\end{bmatrix},
\]
both in R3×3 and partitioned identically. Block by block, each block of the sum is Aαβ+Bαβ, and reassembling the blocks gives
\[
A+B=\begin{bmatrix}1&3&1\\-1&4&4\\5&-1&-4\end{bmatrix}.
\]
An entrywise sum produces the same result, confirming the block-addition theorem on this instance.
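The block-addition theorem can be checked mechanically on this example. The sketch below (all helper names are ours) cuts both matrices with the same 2|1 row and column pattern, adds block by block, reassembles, and compares with the entrywise sum; any identical pair of cut patterns would do.

```python
def madd(X, Y):
    """Entrywise sum of two equally sized matrices."""
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def split(X, row_sizes, col_sizes):
    """Cut X into the block array determined by the given cut sizes."""
    out, i = [], 0
    for m in row_sizes:
        block_row, j = [], 0
        for n in col_sizes:
            block_row.append([row[j:j + n] for row in X[i:i + m]])
            j += n
        out.append(block_row)
        i += m
    return out

def assemble(block_rows):
    """Glue an array of blocks back into one matrix."""
    rows = []
    for block_row in block_rows:
        for k in range(len(block_row[0])):
            rows.append(sum((blk[k] for blk in block_row), []))
    return rows

A = [[1, 2, -1], [0, 3, 4], [2, -2, 0]]
B = [[0, 1,  2], [-1, 1, 0], [3, 1, -4]]

SA, SB = split(A, [2, 1], [2, 1]), split(B, [2, 1], [2, 1])
blockwise = assemble([[madd(SA[a][b], SB[a][b]) for b in range(2)]
                      for a in range(2)])

assert blockwise == madd(A, B)   # block-by-block agrees with entrywise
```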
Block multiplication is slightly more delicate, because multiplication of ordinary matrices already required a conformability check. For block-matrices there are two such checks, one on the block grid itself and one inside the blocks.
Theorem 2 (Block Multiplication)
Let A∈Fm×l be partitioned into blocks Aαk of size mα×lk with 1≤α≤r and 1≤k≤t, and let B∈Fl×n be partitioned into blocks Bkβ of size lk×nβ with 1≤k≤t and 1≤β≤s. Thus the column-cut of A matches the row-cut of B. Then
\[
AB=\Bigl[\sum_{k=1}^{t}A_{\alpha k}B_{k\beta}\Bigr]_{\alpha,\beta=1}^{r,s}.
\]
Proof
Fix a block position (α,β) and a scalar position (i,j) inside that block. The value of (AB)ij is the row-into-column product from Lesson 3PM,
\[
(AB)_{ij}=\sum_{h=1}^{l}a_{ih}b_{hj}.
\]
Break the summation index h at the boundaries of the column-cut of A: letting Lk be the set of scalar indices lying in the kth column-block of A,
\[
(AB)_{ij}=\sum_{k=1}^{t}\sum_{h\in L_k}a_{ih}b_{hj}.
\]
Within the kth inner sum, aih runs over row i of the block Aαk and bhj runs over column j of the block Bkβ, because the column-cut of A at Lk was chosen to match the row-cut of B. That inner sum is therefore (AαkBkβ)ij, and reading off every (i,j) in the block returns (AB)αβ=∑kAαkBkβ.
Two conformability conditions are in play. The number of column-blocks of A must equal the number of row-blocks of B, so the index k ranges over the same set on both sides. Additionally, each paired product AαkBkβ must itself be defined, meaning the column-count of Aαk equals the row-count of Bkβ. If either condition fails the block formula cannot be written, although the underlying product AB may still exist and can be computed entrywise in the ordinary way.
Example 6 (A Product Computed Block by Block)
Let
\[
A=\begin{bmatrix}1&2&0\\3&0&-1\end{bmatrix}
=\begin{bmatrix}A_{11}&A_{12}\end{bmatrix},\qquad
B=\begin{bmatrix}2&1\\1&-1\\0&2\end{bmatrix}
=\begin{bmatrix}B_{11}\\B_{21}\end{bmatrix},
\]
with blocks
\[
A_{11}=\begin{bmatrix}1\\3\end{bmatrix},\quad
A_{12}=\begin{bmatrix}2&0\\0&-1\end{bmatrix},\quad
B_{11}=\begin{bmatrix}2&1\end{bmatrix},\quad
B_{21}=\begin{bmatrix}1&-1\\0&2\end{bmatrix}.
\]
The column-cut of A sits at position 1 and the row-cut of B sits at position 1, so the two partitions are compatible. The block-multiplication theorem gives
\[
AB=A_{11}B_{11}+A_{12}B_{21}
=\begin{bmatrix}2&1\\6&3\end{bmatrix}+\begin{bmatrix}2&-2\\0&-2\end{bmatrix}
=\begin{bmatrix}4&-1\\6&1\end{bmatrix}.
\]
A direct row-into-column computation of AB returns the same 2×2 matrix.
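The same instance can be verified in a few lines of Python; the `matmul` and `madd` helpers are ours, implementing the ordinary row-into-column product and entrywise sum.

```python
def matmul(X, Y):
    """Ordinary row-into-column matrix product."""
    return [[sum(X[i][h] * Y[h][j] for h in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def madd(X, Y):
    """Entrywise sum of two equally sized matrices."""
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

# The blocks of the example above, and the full matrices they tile.
A11 = [[1], [3]];   A12 = [[2, 0], [0, -1]]
B11 = [[2, 1]];     B21 = [[1, -1], [0, 2]]
A = [[1, 2, 0], [3, 0, -1]]
B = [[2, 1], [1, -1], [0, 2]]

block_formula = madd(matmul(A11, B11), matmul(A12, B21))
assert block_formula == matmul(A, B)   # A11*B11 + A12*B21 equals AB
```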
A partition that cuts A into its individual columns and B into its individual rows sets t=l, and the block formula collapses to
\[
AB=\sum_{k=1}^{l}a_k r_k,
\]
exactly the column-row decomposition proved earlier in Lesson 3PM. That is the extreme case of the block-multiplication theorem above; the general statement lets us pick any cut that is convenient for the problem, not only the finest one.
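This finest cut can also be checked directly. A short Python sketch (helper names are ours) writes a product AB as the sum of l column-times-row outer products and compares with the ordinary product; the matrices here are small illustrative choices, not from the text.

```python
def matmul(X, Y):
    """Ordinary row-into-column matrix product."""
    return [[sum(X[i][h] * Y[h][j] for h in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def outer(col, row):
    """Outer product of a column (length m) with a row (length n)."""
    return [[c * r for r in row] for c in col]

A = [[1, 2, 0], [3, 0, -1]]
B = [[2, 1], [1, -1], [0, 2]]

m, n, l = len(A), len(B[0]), len(B)
total = [[0] * n for _ in range(m)]
for k in range(l):
    a_k = [A[i][k] for i in range(m)]   # k-th column of A
    r_k = B[k]                          # k-th row of B
    ok = outer(a_k, r_k)
    total = [[total[i][j] + ok[i][j] for j in range(n)] for i in range(m)]

assert total == matmul(A, B)   # sum of outer products recovers AB
```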
Problem 3
Write out the blocks of A and B implied by these partitions, and decide whether they are compatible for block multiplication in the sense of the block-multiplication theorem above. If they are, compute AB by the block formula. If not, say which conformability condition fails and compute AB directly.
Problem 4
Let A∈Fn×n be partitioned as the block-diagonal matrix
\[
A=\begin{bmatrix}A_{11}&0\\0&A_{22}\end{bmatrix}
\]
with each Aii square. Using the block-multiplication theorem above, show that A is idempotent (respectively nilpotent, respectively involutory) in the sense of Lesson 3PM if and only if each diagonal block Aii is. Give a small example of two 2×2 diagonal blocks whose block assembly is a nilpotent 4×4 matrix of index 2.
Polynomials in a Matrix
Building on the sections on matrix addition and multiplication in Lesson 3PM, the next natural step is to combine integer powers of a fixed square matrix into a scalar-weighted sum, mirroring ordinary polynomial evaluation at a number.
Definition 4 (Polynomial in a Matrix)
Let A∈Fn×n and let
\[
p(\lambda)=a_0+a_1\lambda+a_2\lambda^{2}+\cdots+a_l\lambda^{l},\qquad a_l\neq 0,
\]
be a scalar polynomial of degree l over F. The polynomial in A associated to p is
\[
p(A)=a_0I_n+a_1A+a_2A^{2}+\cdots+a_lA^{l}\in F^{n\times n},
\]
with powers interpreted as in the definition of matrix powers from Lesson 3PM. The degree zero term a0 is promoted to a0In so that every summand is an n×n matrix.
Every power of A commutes with every other power of A by the commuting-powers theorem from Lesson 3PM, so the polynomials p(A), with p ranging over scalar polynomials over F and A fixed, all commute with one another. That one observation is what lets the scalar identities of polynomial arithmetic transfer to the matrix side without adjustment.
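That observation can be exercised on a small instance. The sketch below evaluates two sample polynomials at the same matrix by a Horner-style scheme and confirms that the results commute; the helper names `matmul` and `poly_at` and the particular matrix are ours, not the lesson's.

```python
def matmul(X, Y):
    """Ordinary row-into-column matrix product."""
    return [[sum(X[i][h] * Y[h][j] for h in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def poly_at(coeffs, A):
    """coeffs = [a0, a1, ..., al]; Horner: P <- P*A + c*I at each step."""
    n = len(A)
    P = [[0] * n for _ in range(n)]
    for c in reversed(coeffs):
        P = matmul(P, A)
        for i in range(n):
            P[i][i] += c
    return P

A = [[2, 1], [0, -1]]
p = [1, 0, 1]     # p(lambda) = lambda^2 + 1
q = [-2, 1]       # q(lambda) = lambda - 2

pA, qA = poly_at(p, A), poly_at(q, A)
assert matmul(pA, qA) == matmul(qA, pA)   # polynomials in A commute
```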
Theorem 3 (Polynomial Evaluation Respects Sum, Product, and Division)
Let A∈Fn×n and let p,q,d be scalar polynomials over F.
(1) If p(λ)+q(λ)=h(λ), then p(A)+q(A)=h(A).
(2) If p(λ)q(λ)=t(λ), then p(A)q(A)=t(A).
(3) If p(λ)=q(λ)d(λ)+r(λ) with r either zero or of degree strictly less than the degree of d, then p(A)=q(A)d(A)+r(A).
Proof
For (1), addition of scalar polynomials collects coefficients of each power of λ, and the same collection performed on the matrix powers Ak returns h(A), using the algebraic laws of matrix addition from Lesson 3PM.
For (2), expand using bilinearity of matrix multiplication from Lesson 3PM:
\[
p(A)q(A)=\Bigl(\sum_{i}a_iA^{i}\Bigr)\Bigl(\sum_{j}b_jA^{j}\Bigr)
=\sum_{i}\sum_{j}a_ib_j\,A^{i+j},
\]
where the final equality uses the exponent law for matrix powers from Lesson 3PM. The coefficient of A^k on the right is \(\sum_{i+j=k}a_ib_j\), which is exactly the coefficient of λ^k in p(λ)q(λ)=t(λ). Hence p(A)q(A)=t(A).
For (3), evaluate both sides of p=qd+r at A, applying (1) to the outer sum and (2) to the product qd.
■
If d(λ) divides p(λ) at the scalar level, then r=0 and (3) collapses to p(A)=q(A)d(A), so d(A) divides p(A) by the matrix factor q(A). Because p(A) and q(A) commute, left and right division give the same answer, and scalar-style factorisation arguments push through unchanged.
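Clause (3) can be illustrated concretely. Dividing p(λ)=λ³ by d(λ)=λ−1 gives quotient λ²+λ+1 and remainder 1, and the sketch below (helper names and the sample matrix are ours) confirms the same identity after evaluation at a matrix.

```python
def matmul(X, Y):
    """Ordinary row-into-column matrix product."""
    return [[sum(X[i][h] * Y[h][j] for h in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def poly_at(coeffs, A):
    """coeffs = [a0, a1, ..., al]; Horner: P <- P*A + c*I at each step."""
    n = len(A)
    P = [[0] * n for _ in range(n)]
    for c in reversed(coeffs):
        P = matmul(P, A)
        for i in range(n):
            P[i][i] += c
    return P

A = [[0, 1], [2, 1]]
pA = poly_at([0, 0, 0, 1], A)   # p(A) = A^3
qA = poly_at([1, 1, 1], A)      # q(A) = A^2 + A + I
dA = poly_at([-1, 1], A)        # d(A) = A - I
rA = poly_at([1], A)            # r(A) = I

qd = matmul(qA, dA)
rhs = [[qd[i][j] + rA[i][j] for j in range(2)] for i in range(2)]
assert pA == rhs                # p(A) = q(A)d(A) + r(A)
```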
What does not carry over from scalar polynomials is the count of zeros. A scalar polynomial of degree l has at most l zeros in F, but the matrix equation p(A)=0 can be satisfied by whole parametrised families of matrices. The reason is structural: matrix multiplication permits a product of two non-zero factors to vanish, a phenomenon already recorded in the matrix-multiplication section of Lesson 3PM.
Example 8 (A Quadratic with Infinitely Many Matrix Zeros)
Let p(λ)=(λ−1)(λ+2)=λ^2+λ−2. By the polynomial-evaluation theorem above,
\[
p(A)=A^{2}+A-2I=(A-I)(A+2I)
\]
for every A∈F2×2. The scalar equation p(λ)=0 has exactly the two roots 1 and −2, so the two scalar matrices A=I and A=−2I are obvious zeros of p. They are not the only ones. For every b∈F take
\[
A_b=\begin{bmatrix}1&b\\0&-2\end{bmatrix}.
\]
Then
\[
A_b-I=\begin{bmatrix}0&b\\0&-3\end{bmatrix},\qquad
A_b+2I=\begin{bmatrix}3&b\\0&0\end{bmatrix},
\]
and a row-into-column computation gives (Ab−I)(Ab+2I)=0. The whole one-parameter family {Ab:b∈F} therefore consists of zeros of the degree-two polynomial p. Among them, only A0=diag(1,−2) has I or −2I as a scalar counterpart, confirming that the scalar count of roots is genuinely lost once A is allowed to be a matrix.
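The whole family can also be swept numerically. A short sketch (the helper names are ours) checks p(A_b)=0 for several values of b:

```python
def matmul(X, Y):
    """Ordinary row-into-column matrix product."""
    return [[sum(X[i][h] * Y[h][j] for h in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def p_at(A):
    """p(A) = A^2 + A - 2I for a 2x2 matrix A, written out entrywise."""
    A2 = matmul(A, A)
    return [[A2[i][j] + A[i][j] - (2 if i == j else 0)
             for j in range(2)] for i in range(2)]

zero = [[0, 0], [0, 0]]
for b in [-3, 0, 1, 7]:             # a few members of the family
    assert p_at([[1, b], [0, -2]]) == zero
```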
Problem 5
Let p(λ)=λ^2−3λ+2, let q(λ)=λ+1, and take
\[
A=\begin{bmatrix}1&2\\0&3\end{bmatrix}.
\]
Compute p(A), q(A), the sum p(A)+q(A), and the product p(A)q(A). Compute (p+q)(λ) and (pq)(λ) at the scalar level, evaluate the resulting polynomials at A, and verify directly that
\[
p(A)+q(A)=(p+q)(A),\qquad p(A)q(A)=(pq)(A),
\]
in accordance with the polynomial-evaluation theorem above.
Problem 6
Exhibit a one-parameter family of 2×2 real matrices A with A2=I2 that contains neither I2 nor −I2. Using the polynomial-evaluation theorem above applied to p(λ)=λ2−1, explain why each matrix in your family is a zero of p, and connect the result to the involutory class from Lesson 3PM.
Problem 7
Let A∈Fn×n be idempotent, so A^2=A. Prove by induction on k that A^k=A for every k≥1, and deduce that for every scalar polynomial p(λ)=a_0+a_1λ+⋯+a_lλ^l,
\[
p(A)=p(0)I_n+\bigl(p(1)-p(0)\bigr)A.
\]
Hence every polynomial in an idempotent matrix is of the form
αIn+βA
for suitable scalars α,β∈F.
Exercises
Exercise 1 (A Plane of Equidistant Points)
Let P and Q be different points in R3 with position vectors p and q, and let E be the collection of all vectors v satisfying
∥p−v∥=∥q−v∥.
Show, by squaring both sides and using ∥w∥2=w⋅w, that E consists exactly of those v satisfying
\[
v\cdot(p-q)=\tfrac{1}{2}\bigl(\|p\|^{2}-\|q\|^{2}\bigr).
\]
For
\[
p=\begin{bmatrix}3\\4\\3\end{bmatrix},\qquad
q=\begin{bmatrix}-1\\5\\-2\end{bmatrix},
\]
give an explicit equation for the plane E.
Exercise 2 (Verifying Block Multiplication by Hand)
Identify the size of each block, confirm that the conformability conditions of the block-multiplication theorem above are satisfied, and verify by direct computation that
AB=A11B11+A12B21.
Exercise 3 (Block Addition in General)
Let A,B∈Fm×n be partitioned into the same block pattern, with corresponding blocks Aαβ and Bαβ of matching size for every (α,β). Prove that
\[
A+B=[A_{\alpha\beta}+B_{\alpha\beta}]_{\alpha,\beta=1}^{r,s}.
\]
Exercise 4 (Four Points in R3)
Using difference vectors, show that no three of these points are collinear.
Show that the four points all lie in the same plane in R3 by finding an equation for such a plane.
Exercise 5 (Block Multiplication in General)
Let A∈Fm×l carry blocks Aαk of size mα×lk for 1≤α≤r and 1≤k≤t, and let B∈Fl×n carry blocks Bkβ of size lk×nβ for 1≤k≤t and 1≤β≤s, so that the column-cut of A matches the row-cut of B. Prove that
\[
AB=\Bigl[\sum_{k=1}^{t}A_{\alpha k}B_{k\beta}\Bigr]_{\alpha,\beta=1}^{r,s}.
\]
Exercise 6 (Polynomials in an Involutory Matrix)
Let A∈Fn×n be involutory in the sense of Lesson 3PM, so that A^2=I_n, and assume 1/2∈F. Using the even and odd power formulas for an involutory matrix from Lesson 3PM, prove that for every scalar polynomial
\[
p(\lambda)=a_0+a_1\lambda+a_2\lambda^{2}+\cdots+a_l\lambda^{l},
\]
the matrix polynomial collapses to
\[
p(A)=\tfrac{1}{2}\bigl(p(1)+p(-1)\bigr)I_n+\tfrac{1}{2}\bigl(p(1)-p(-1)\bigr)A.
\]
Compare the closed form with the earlier idempotent problem above, and describe explicitly the family of matrices of the form
αIn+βA
in which p(A) lies.
Exercise 7 (Angles Between Planes)
Let P be the plane
2x−2y−z=3,
and let Q be the plane containing the three non-collinear points
(1,0,2),(3,0,−1),(3,1,2).
Find the cosine of the angle between the xy-plane and the plane P.
Find the cosine of the angle between the xy-plane and the plane Q.
Explain in terms of normal vectors why P and Q are not parallel, and find a parametric form for their line of intersection.
Exercise 8 (A Polynomial on a Block-Diagonal Matrix with Bidiagonal Blocks)
Let
\[
p(\lambda)=\lambda^{3}+\lambda^{2}+2\lambda+2
\]
and suppose
A∈C6×6
has the block-diagonal form
\[
A=\begin{bmatrix}
\lambda&0&0&0&0&0\\
0&\mu&1&0&0&0\\
0&0&\mu&0&0&0\\
0&0&0&\nu&1&0\\
0&0&0&0&\nu&1\\
0&0&0&0&0&\nu
\end{bmatrix},
\]
with one 1×1 block, one 2×2 upper-bidiagonal block at μ, and one 3×3 upper-bidiagonal block at ν. Compute the relevant powers of each block directly, and using block addition from the theorem above prove that
\[
p(A)=\begin{bmatrix}
p(\lambda)&0&0&0&0&0\\
0&p(\mu)&p'(\mu)&0&0&0\\
0&0&p(\mu)&0&0&0\\
0&0&0&p(\nu)&p'(\nu)&\tfrac{1}{2}p''(\nu)\\
0&0&0&0&p(\nu)&p'(\nu)\\
0&0&0&0&0&p(\nu)
\end{bmatrix},
\]
where p′ and p″ denote the first and second ordinary derivatives of the scalar polynomial p. If you are taking MA1B alongside this course, these formulas should already look familiar; if not, you may simply use them here as given.
Exercise 9 (Powers of a Block-Triangular Matrix)
Let
\[
A=\begin{bmatrix}A_{11}&A_{12}\\0&I\end{bmatrix}
\]
be symmetrically partitioned, with A11 square and I the identity block of matching order. Using the block-multiplication theorem above together with induction on n, prove that for every positive integer n,
\[
A^{n}=\begin{bmatrix}A_{11}^{n}&p_n(A_{11})A_{12}\\0&I\end{bmatrix},
\]
where
\[
p_n(\lambda)=\frac{\lambda^{n}-1}{\lambda-1}=1+\lambda+\lambda^{2}+\cdots+\lambda^{n-1}
\]
is the polynomial coming from the geometric-series identity, evaluated at A11 in the sense of the definition of a polynomial in a matrix above.