← Back to MA1A: Applied Linear Algebra


Beyond Three Dimensions: The Space $\mathbb{R}^n$

While we have extensively visualised vectors as directed segments with magnitude and direction in $\mathbb{R}^2$ and $\mathbb{R}^3$, the algebraic structure we have built is not confined to two or three dimensions. By defining a vector algebraically as an ordered list of numbers, we can seamlessly extend these concepts to spaces of any dimension.

Definition 1 (n-vector)

For a positive integer $n$, an n-vector is an ordered list of $n$ real numbers, written as a column. The collection of all possible n-vectors is denoted by $\mathbb{R}^n$.

Example 1

The column containing $(-3, 4.9, 1/2, 0, 1)$ written vertically is a 5-vector belonging to $\mathbb{R}^5$.

Remark (Points and Vectors in $\mathbb{R}^n$)

Some linear algebra texts make no distinction between a point of $\mathbb{R}^n$ and a vector of $\mathbb{R}^n$, since both are specified by the same ordered list of real numbers. In these notes we keep the earlier convention: points are written with capital letters and parentheses, whereas vectors are written in bold as columns. The underlying numerical data are the same; what changes is the geometric interpretation. Once an origin has been fixed, every point determines a position vector and every position vector determines a point.

Once $n > 3$, the classical physical interpretation of a vector as an arrow possessing magnitude and direction is no longer a meaningful way to conceptualise the space. However, vectors in higher dimensions are indispensable across both the physical sciences and data-driven fields. In complex physical systems, describing the complete state of a model often requires grouping numerous 3-vectors into a single, massive n-vector.

It is equally important to recognise a second point of view. An n-vector may simply be a structured list of numerical data. In that setting, the entries need not describe motion in space at all. They may record measurements, scores, counts, or coded information drawn from some underlying system.

Example 2 (Rainfall Record)

Suppose a weather station records the rainfall on each day of a non-leap year. These measurements may be arranged into the vector

\mathbf{R} = \begin{bmatrix} r_1 \\ r_2 \\ \vdots \\ r_{365} \end{bmatrix},

where $r_k$ is the rainfall measured on the $k$th day. The entries do not describe a point moving through space; they simply collect related numerical information into a single mathematical object.

Example 3 (Inventory Data)

Suppose a warehouse stores $m$ different products, and let $q_k$ denote the number of units currently held of the $k$th product. The entire stock list may be written as

\mathbf{q} = \begin{bmatrix} q_1 \\ q_2 \\ \vdots \\ q_m \end{bmatrix}.

Such a vector provides a compact way to encode the state of the warehouse at a given time.

Example 4 (Sensor Data)

Consider a large engineering system fitted with $n$ sensors. If $s_k$ denotes the reading of the $k$th sensor at a fixed moment, then the entire state of the sensor network may be represented by

\mathbf{s} = \begin{bmatrix} s_1 \\ s_2 \\ \vdots \\ s_n \end{bmatrix},

where $n$ may be very large. Again, the point is not that $\mathbf{s}$ is an arrow in ordinary space, but that vector notation organises a large collection of related measurements in a form that algebra can handle cleanly.

Calling such lists vectors is not merely a change of language. Once the data are placed into $\mathbb{R}^n$, the algebra developed throughout these notes becomes available. We may add vectors, scale them, compare them, and later use further tools to detect structure hidden inside the data.
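
To make "the algebra becomes available" concrete, here is a minimal sketch of entrywise addition and scalar multiplication for n-vectors stored as Python lists. The stock numbers are invented for illustration, in the spirit of the inventory example above.

```python
# A minimal sketch: n-vectors as Python lists, with entrywise
# addition and scalar multiplication. The stock data are made up.

def add(u, v):
    """Entrywise sum of two n-vectors of the same length."""
    assert len(u) == len(v), "vectors must live in the same R^n"
    return [ui + vi for ui, vi in zip(u, v)]

def scale(c, v):
    """Scalar multiple c*v, entry by entry."""
    return [c * vi for vi in v]

# Stock levels for 5 products at two warehouses (hypothetical data).
q1 = [12, 0, 7, 3, 25]
q2 = [8, 4, 1, 3, 10]

total = add(q1, q2)      # combined stock of both warehouses
doubled = scale(2, q1)   # stock if warehouse 1 were duplicated

print(total)    # [20, 4, 8, 6, 35]
print(doubled)  # [24, 0, 14, 6, 50]
```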

The extension is not limited to addition and scalar multiplication. The metric ideas developed earlier also admit a uniform algebraic formulation. In $\mathbb{R}^2$ and $\mathbb{R}^3$ these formulas were recovered from geometry. In $\mathbb{R}^n$ we now adopt them as definitions, since once $n > 3$ there is no longer a literal picture in ordinary space to appeal to.

Example 5 (Distance from a Starting Point in Space)

Suppose a signal lamp is mounted on a framework so that, relative to a fixed origin, it lies $4$ units in the positive $x$-direction, $3$ units in the positive $y$-direction, and $12$ units above the $xy$-plane. Its position vector is

\mathbf{p} = \begin{bmatrix} 4 \\ 3 \\ 12 \end{bmatrix}.

The question “how far is the lamp from the starting point?” is exactly the question of finding the length of $\mathbf{p}$. Using the three-dimensional distance formula,

\|\mathbf{p}\| = \sqrt{4^2 + 3^2 + 12^2} = \sqrt{16 + 9 + 144} = \sqrt{169} = 13.

This is the same pattern already seen in the plane and in space: the distance from the origin to a point is the magnitude of its position vector.

Definition 2 (Magnitude in $\mathbb{R}^n$)

Let

\mathbf{v} = \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix} \in \mathbb{R}^n.

The magnitude (or norm) of $\mathbf{v}$ is defined by

\|\mathbf{v}\| = \sqrt{v_1^2 + v_2^2 + \cdots + v_n^2}.

When $n = 2$ or $n = 3$, this reduces to the formulas already established in Lesson 1AM and earlier in this lesson. For larger $n$, the formula is no longer visual, but it remains the correct algebraic continuation of the same metric structure.
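
Definition 2 translates directly into a few lines of code. The sketch below recomputes the lamp vector $\mathbf{p} = (4, 3, 12)$ from Example 5, whose magnitude should come out to $13$.

```python
# Magnitude in R^n (Definition 2): square root of the sum of squares.
import math

def norm(v):
    """Magnitude of an n-vector given as a list of numbers."""
    return math.sqrt(sum(vi * vi for vi in v))

p = [4, 3, 12]
print(norm(p))  # 13.0
```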

Theorem 1 (Basic Magnitude Identities in $\mathbb{R}^n$)

Let $\mathbf{v} \in \mathbb{R}^n$ and let $c \in \mathbb{R}$. Then

  1. $\|-\mathbf{v}\| = \|\mathbf{v}\|$.
  2. $\|c\mathbf{v}\| = |c|\,\|\mathbf{v}\|$.

Proof

If $\mathbf{v} = \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix}$, then

\|-\mathbf{v}\| = \sqrt{(-v_1)^2 + (-v_2)^2 + \cdots + (-v_n)^2} = \sqrt{v_1^2 + v_2^2 + \cdots + v_n^2} = \|\mathbf{v}\|.

Also,

\|c\mathbf{v}\| = \sqrt{(cv_1)^2 + (cv_2)^2 + \cdots + (cv_n)^2} = \sqrt{c^2(v_1^2 + v_2^2 + \cdots + v_n^2)} = |c|\,\|\mathbf{v}\|.
Definition 3 (Distance in $\mathbb{R}^n$)

Let $\mathbf{x}, \mathbf{y} \in \mathbb{R}^n$. The distance between $\mathbf{x}$ and $\mathbf{y}$ is defined by

d(\mathbf{x}, \mathbf{y}) = \|\mathbf{x} - \mathbf{y}\|.

Thus the distance between two vectors is computed by taking the difference vector and then measuring its magnitude. This agrees with the ordinary distance formulas in $\mathbb{R}^2$ and $\mathbb{R}^3$, where $\mathbf{x} - \mathbf{y}$ is the displacement from the point represented by $\mathbf{y}$ to the point represented by $\mathbf{x}$. In higher dimensions we keep the same formula because it extends those lower-dimensional cases exactly and behaves in the way a distance should. In particular,

d(\mathbf{x}, \mathbf{y}) = d(\mathbf{y}, \mathbf{x}),

since $\mathbf{y} - \mathbf{x} = -(\mathbf{x} - \mathbf{y})$ and $\|-\mathbf{v}\| = \|\mathbf{v}\|$.

Note (Notation)

When primes are used on vectors or parameters, as in $\mathbf{v}'$, $\mathbf{v}''$, or $t'$, they simply distinguish one object from another. They do not denote derivatives here.

Example 6 (Distance Computation in $\mathbb{R}^4$)

Let

\mathbf{x} = \begin{bmatrix} 1 \\ 2 \\ -1 \\ 3 \end{bmatrix}, \qquad \mathbf{y} = \begin{bmatrix} 3 \\ -1 \\ 0 \\ -1 \end{bmatrix}.

Then

\mathbf{x} - \mathbf{y} = \begin{bmatrix} -2 \\ 3 \\ -1 \\ 4 \end{bmatrix},

so the distance between $\mathbf{x}$ and $\mathbf{y}$ is

d(\mathbf{x}, \mathbf{y}) = \|\mathbf{x} - \mathbf{y}\| = \sqrt{(-2)^2 + 3^2 + (-1)^2 + 4^2} = \sqrt{4 + 9 + 1 + 16} = \sqrt{30}.

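
Definition 3 is easy to check numerically. The sketch below recomputes the distance in Example 6 and confirms the symmetry $d(\mathbf{x}, \mathbf{y}) = d(\mathbf{y}, \mathbf{x})$.

```python
# Distance in R^n (Definition 3): d(x, y) = ||x - y||.
import math

def distance(x, y):
    """Distance between two n-vectors of equal length."""
    assert len(x) == len(y), "both vectors must lie in the same R^n"
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

x = [1, 2, -1, 3]
y = [3, -1, 0, -1]
print(distance(x, y))                    # sqrt(30), approximately 5.477
print(distance(x, y) == distance(y, x))  # True
```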
Problem 1

Let

\mathbf{a} = \begin{bmatrix} 2 \\ -1 \\ 0 \\ 3 \end{bmatrix}, \qquad \mathbf{b} = \begin{bmatrix} -1 \\ 1 \\ 2 \\ -1 \end{bmatrix}.

Compute $\mathbf{a} - \mathbf{b}$ and $d(\mathbf{a}, \mathbf{b})$. Then compute $\mathbf{b} - \mathbf{a}$ and explain why the distance is unchanged.

Dot Products and Angles in $\mathbb{R}^n$

Length and distance are not the end of the metric story. In the plane and in space, the Law of Cosines tied lengths to angles, and the dot product gave an algebraic way to package that relation. The same algebraic construction extends to $\mathbb{R}^n$.

Definition 4 (Dot Product in $\mathbb{R}^n$)

Let

\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}, \qquad \mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} \in \mathbb{R}^n.

Their dot product is the scalar

\mathbf{x} \cdot \mathbf{y} = x_1 y_1 + x_2 y_2 + \cdots + x_n y_n = \sum_{i=1}^n x_i y_i.

This is only defined when the two vectors belong to the same $\mathbb{R}^n$.

Remark (Summation Notation)

The symbol

\sum_{i=1}^n x_i y_i

means “add the expression $x_i y_i$ as the index $i$ runs from $1$ to $n$”. In other words,

\sum_{i=1}^n x_i y_i = x_1 y_1 + x_2 y_2 + \cdots + x_n y_n.

The letter $i$ is only a running index: writing $\sum_{k=1}^n x_k y_k$ would mean exactly the same thing.

As in the earlier lessons, the output of a dot product is a scalar, not a vector. The self dot product recovers the square of the magnitude:

\mathbf{x} \cdot \mathbf{x} = x_1^2 + x_2^2 + \cdots + x_n^2 = \|\mathbf{x}\|^2.

Theorem 2 (Algebraic Properties of the Dot Product in $\mathbb{R}^n$)

For all $\mathbf{x}, \mathbf{y}, \mathbf{z} \in \mathbb{R}^n$ and all scalars $r \in \mathbb{R}$:

  1. Symmetry: $\mathbf{x} \cdot \mathbf{y} = \mathbf{y} \cdot \mathbf{x}$.
  2. Bilinearity: $(r\mathbf{x} + \mathbf{y}) \cdot \mathbf{z} = r(\mathbf{x} \cdot \mathbf{z}) + \mathbf{y} \cdot \mathbf{z}$, and likewise in the second argument.
  3. Positive Definiteness: $\mathbf{x} \cdot \mathbf{x} \ge 0$, with equality if and only if $\mathbf{x} = \mathbf{0}$.

Proof

Each statement is verified component by component exactly as in $\mathbb{R}^2$ and $\mathbb{R}^3$; the only difference is that there are $n$ summands instead of two or three.

Example 7 (Using the Algebraic Rules)

Let

\mathbf{u} = \begin{bmatrix} 1 \\ 2 \\ -1 \end{bmatrix}, \qquad \mathbf{v} = \begin{bmatrix} 3 \\ 0 \\ 4 \end{bmatrix}, \qquad \mathbf{w} = \begin{bmatrix} -2 \\ 1 \\ 5 \end{bmatrix}.

Then

\mathbf{u} \cdot \mathbf{v} = 1 \cdot 3 + 2 \cdot 0 + (-1) \cdot 4 = -1,

and by symmetry $\mathbf{v} \cdot \mathbf{u} = -1$ as well. Also,

\mathbf{u} \cdot \mathbf{u} = 1^2 + 2^2 + (-1)^2 = 6 = \|\mathbf{u}\|^2.

Finally,

\mathbf{u} \cdot (2\mathbf{v} - \mathbf{w}) = 2(\mathbf{u} \cdot \mathbf{v}) - \mathbf{u} \cdot \mathbf{w} = 2(-1) - \bigl((-2) + 2 - 5\bigr) = 3.

Example 8 (Computing Dot Products)

For

\begin{bmatrix} 2 \\ 1 \\ 3 \end{bmatrix} \cdot \begin{bmatrix} 3 \\ -2 \\ 4 \end{bmatrix},

we obtain

2 \cdot 3 + 1 \cdot (-2) + 3 \cdot 4 = 16.

Likewise, in $\mathbb{R}^4$,

\begin{bmatrix} 3 \\ 1 \\ 0 \\ 1/2 \end{bmatrix} \cdot \begin{bmatrix} 1 \\ -2 \\ 3 \\ 6 \end{bmatrix} = 3 - 2 + 0 + 3 = 4.

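
Both dot products in this example can be verified in code; the sketch below implements Definition 4 directly.

```python
# Dot product in R^n (Definition 4), checked against Example 8.

def dot(x, y):
    """Sum of entrywise products; defined only for equal lengths."""
    assert len(x) == len(y), "dot product needs vectors from the same R^n"
    return sum(a * b for a, b in zip(x, y))

print(dot([2, 1, 3], [3, -2, 4]))          # 16
print(dot([3, 1, 0, 0.5], [1, -2, 3, 6]))  # 4.0
```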
Example 9 (Prices and Quantities)

Suppose an economy tracks $n$ goods. Let

\mathbf{p} = \begin{bmatrix} p_1 \\ p_2 \\ \vdots \\ p_n \end{bmatrix}

be the price vector and

\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}

the quantity vector for a particular consumer. Then

\mathbf{p} \cdot \mathbf{x} = p_1 x_1 + p_2 x_2 + \cdots + p_n x_n

is the total amount spent. This is a typical higher-dimensional use of the dot product: the vectors encode data, not directions in physical space.

Theorem 3 (Cauchy-Schwarz in $\mathbb{R}^n$)

For all vectors $\mathbf{x}, \mathbf{y} \in \mathbb{R}^n$,

-\|\mathbf{x}\|\,\|\mathbf{y}\| \le \mathbf{x} \cdot \mathbf{y} \le \|\mathbf{x}\|\,\|\mathbf{y}\|.

Equivalently,

|\mathbf{x} \cdot \mathbf{y}| \le \|\mathbf{x}\|\,\|\mathbf{y}\|.

Moreover, equality holds in one of these inequalities if and only if one of the two vectors is a scalar multiple of the other.

Proof

If $\mathbf{x} = \mathbf{0}$ or $\mathbf{y} = \mathbf{0}$, the statement is immediate, and equality certainly holds because $\mathbf{0}$ is a scalar multiple of every vector. Assume from now on that both vectors are non-zero.

For any real number $t$,

0 \le \|\mathbf{x} + t\mathbf{y}\|^2 = \|\mathbf{y}\|^2 t^2 + 2(\mathbf{x} \cdot \mathbf{y})t + \|\mathbf{x}\|^2.

Thus the quadratic polynomial

q(t) = \|\mathbf{y}\|^2 t^2 + 2(\mathbf{x} \cdot \mathbf{y})t + \|\mathbf{x}\|^2

is non-negative for every real $t$. Since its leading coefficient $\|\mathbf{y}\|^2$ is positive, its discriminant must be non-positive:

\bigl(2(\mathbf{x} \cdot \mathbf{y})\bigr)^2 - 4\|\mathbf{y}\|^2\|\mathbf{x}\|^2 \le 0.

Rearranging gives

(\mathbf{x} \cdot \mathbf{y})^2 \le \|\mathbf{x}\|^2\|\mathbf{y}\|^2,

and taking square roots yields

|\mathbf{x} \cdot \mathbf{y}| \le \|\mathbf{x}\|\,\|\mathbf{y}\|.

This is equivalent to the two-sided inequality.

For the equality case, equality in Cauchy-Schwarz means

(\mathbf{x} \cdot \mathbf{y})^2 = \|\mathbf{x}\|^2\|\mathbf{y}\|^2,

so the discriminant of $q(t)$ is $0$. Hence $q(t)$ has a real root, meaning $\|\mathbf{x} + t\mathbf{y}\|^2 = 0$ for some $t$, so $\mathbf{x} + t\mathbf{y} = \mathbf{0}$. Thus one vector is a scalar multiple of the other. Conversely, if $\mathbf{x} = c\mathbf{y}$ for some scalar $c$, then

\mathbf{x} \cdot \mathbf{y} = c\,\|\mathbf{y}\|^2, \qquad \|\mathbf{x}\|\,\|\mathbf{y}\| = |c|\,\|\mathbf{y}\|^2,

so equality holds.
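
The inequality can also be stress-tested numerically. The sketch below checks $|\mathbf{x} \cdot \mathbf{y}| \le \|\mathbf{x}\|\,\|\mathbf{y}\|$ on many randomly generated pairs in $\mathbb{R}^5$ and exhibits the equality case for a scalar multiple; the vectors themselves are arbitrary.

```python
# Numerical check of Cauchy-Schwarz: |x.y| <= ||x|| ||y||.
import math
import random

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def norm(v):
    return math.sqrt(dot(v, v))

random.seed(0)  # reproducible random trials
for _ in range(1000):
    x = [random.uniform(-10, 10) for _ in range(5)]
    y = [random.uniform(-10, 10) for _ in range(5)]
    assert abs(dot(x, y)) <= norm(x) * norm(y) + 1e-9  # small float slack

# Equality case: y is a scalar multiple of x, so |x.y| = ||x|| ||y||.
x = [1.0, -2.0, 3.0, 0.5, 4.0]
y = [-2 * a for a in x]
print(math.isclose(abs(dot(x, y)), norm(x) * norm(y)))  # True
```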

Definition 5 (Angle and Orthogonality in $\mathbb{R}^n$)

Let $\mathbf{x}, \mathbf{y} \in \mathbb{R}^n$ be non-zero. Their angle is the unique number $\theta \in [0, \pi]$ satisfying

\cos\theta = \frac{\mathbf{x} \cdot \mathbf{y}}{\|\mathbf{x}\|\,\|\mathbf{y}\|}.

Because of Cauchy-Schwarz, the right-hand side always lies in the interval $[-1, 1]$, so this definition makes sense.

Two vectors $\mathbf{x}, \mathbf{y} \in \mathbb{R}^n$ are orthogonal (or perpendicular) if

\mathbf{x} \cdot \mathbf{y} = 0.

Remark (Changing Orientation)

If one of the two vectors is multiplied by $-1$, the dot product changes sign while the magnitudes do not. The cosine therefore changes sign, so the angle is replaced by its supplementary angle $\pi - \theta$. This is the same acute-versus-obtuse ambiguity that appears when one speaks about the angle between two lines through the origin rather than between two directed vectors.

Example 10 (Angles with Scalar Multiples)

Let $\mathbf{x} \in \mathbb{R}^n$ be non-zero and let $c \neq 0$ be a scalar. The angle between $\mathbf{x}$ and $c\mathbf{x}$ behaves exactly as geometric intuition suggests.

Indeed,

\cos\theta = \frac{(c\mathbf{x}) \cdot \mathbf{x}}{\|c\mathbf{x}\|\,\|\mathbf{x}\|} = \frac{c(\mathbf{x} \cdot \mathbf{x})}{|c|\,\|\mathbf{x}\|^2} = \frac{c}{|c|}.

If $c > 0$, then $\cos\theta = 1$, so $\theta = 0$. If $c < 0$, then $\cos\theta = -1$, so $\theta = \pi$.

In fact, by the equality case of Cauchy-Schwarz, these are the only situations in which the angle between two non-zero vectors can be $0$ or $\pi$: that happens precisely when one is a scalar multiple of the other.

Example 11 (Angle Computation in $\mathbb{R}^4$)

Let

\mathbf{x} = \begin{bmatrix} 1 \\ 1 \\ 0 \\ 0 \end{bmatrix}, \qquad \mathbf{y} = \begin{bmatrix} 1 \\ 0 \\ 1 \\ 0 \end{bmatrix}.

Then

\mathbf{x} \cdot \mathbf{y} = 1, \qquad \|\mathbf{x}\| = \sqrt{2}, \qquad \|\mathbf{y}\| = \sqrt{2}.

Hence

\cos\theta = \frac{1}{(\sqrt{2})(\sqrt{2})} = \frac{1}{2},

so the angle between $\mathbf{x}$ and $\mathbf{y}$ is $\theta = \pi/3$.
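
The same computation can be done in code, using `math.acos` to invert the cosine from Definition 5.

```python
# Angle between vectors (Definition 5), recomputing Example 11.
import math

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def norm(v):
    return math.sqrt(dot(v, v))

def angle(x, y):
    """Angle in [0, pi] between two non-zero vectors."""
    return math.acos(dot(x, y) / (norm(x) * norm(y)))

x = [1, 1, 0, 0]
y = [1, 0, 1, 0]
print(math.isclose(angle(x, y), math.pi / 3))  # True
```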

Theorem 4 (Pythagoras in $\mathbb{R}^n$)

If $\mathbf{u}, \mathbf{v} \in \mathbb{R}^n$ are orthogonal, then

\|\mathbf{u} + \mathbf{v}\|^2 = \|\mathbf{u}\|^2 + \|\mathbf{v}\|^2.

Proof

Since $\mathbf{u} \perp \mathbf{v}$, we have $\mathbf{u} \cdot \mathbf{v} = 0$. Therefore

\|\mathbf{u} + \mathbf{v}\|^2 = (\mathbf{u} + \mathbf{v}) \cdot (\mathbf{u} + \mathbf{v}) = \mathbf{u} \cdot \mathbf{u} + \mathbf{u} \cdot \mathbf{v} + \mathbf{v} \cdot \mathbf{u} + \mathbf{v} \cdot \mathbf{v}.

By symmetry and orthogonality, the middle two terms are both $0$, so

\|\mathbf{u} + \mathbf{v}\|^2 = \|\mathbf{u}\|^2 + \|\mathbf{v}\|^2.

Problem 2

Let $\mathbf{v}, \mathbf{w} \in \mathbb{R}^n$ be non-zero.

(a) Show that if $\mathbf{v} + \mathbf{w}$ and $\mathbf{v} - \mathbf{w}$ are perpendicular, then $\|\mathbf{v}\| = \|\mathbf{w}\|$.

(b) Show the converse: if $\|\mathbf{v}\| = \|\mathbf{w}\|$, then $\mathbf{v} + \mathbf{w}$ and $\mathbf{v} - \mathbf{w}$ are perpendicular.

Example 12 (Cosine Similarity)

Suppose $\mathbf{v}, \mathbf{w} \in \mathbb{R}^n$ have the same length $\ell$, and let $\theta$ be the angle between them. Then

\|\mathbf{v} - \mathbf{w}\|^2 = (\mathbf{v} - \mathbf{w}) \cdot (\mathbf{v} - \mathbf{w}) = \|\mathbf{v}\|^2 - 2(\mathbf{v} \cdot \mathbf{w}) + \|\mathbf{w}\|^2.

Since $\mathbf{v} \cdot \mathbf{w} = \|\mathbf{v}\|\,\|\mathbf{w}\|\cos\theta = \ell^2\cos\theta$, this becomes

\|\mathbf{v} - \mathbf{w}\|^2 = 2\ell^2(1 - \cos\theta).

So if the angle is small, then $\cos\theta$ is close to $1$ and the vectors are close together. For unit vectors, the quantity $\cos\theta = \mathbf{v} \cdot \mathbf{w}$ itself becomes a direct measure of similarity. This is why cosine similarity appears so often in document retrieval, embeddings, and other large-scale data problems.
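
For concreteness, here is a minimal cosine-similarity sketch; the "document" vectors are hypothetical word counts, invented for illustration.

```python
# Cosine similarity: the cosine of the angle between two data vectors.
import math

def cosine_similarity(v, w):
    dot = sum(a * b for a, b in zip(v, w))
    nv = math.sqrt(sum(a * a for a in v))
    nw = math.sqrt(sum(b * b for b in w))
    return dot / (nv * nw)

doc_a = [3, 0, 1, 2]   # hypothetical word counts for one document
doc_b = [6, 0, 2, 4]   # same proportions as doc_a: same direction
doc_c = [0, 5, 0, 1]   # a very different profile

print(cosine_similarity(doc_a, doc_b))  # approximately 1.0
print(cosine_similarity(doc_a, doc_c))  # much smaller
```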

Example 13 (Correlation Coefficient)

Given data points $(x_1, y_1), \dots, (x_n, y_n)$, first recenter the data by subtracting the averages:

\widehat{x}_i = x_i - \overline{x}, \qquad \widehat{y}_i = y_i - \overline{y}.

Then form the two vectors

\mathbf{X} = \begin{bmatrix} \widehat{x}_1 \\ \widehat{x}_2 \\ \vdots \\ \widehat{x}_n \end{bmatrix}, \qquad \mathbf{Y} = \begin{bmatrix} \widehat{y}_1 \\ \widehat{y}_2 \\ \vdots \\ \widehat{y}_n \end{bmatrix}.

The correlation coefficient is

r = \frac{\mathbf{X} \cdot \mathbf{Y}}{\|\mathbf{X}\|\,\|\mathbf{Y}\|}.

Thus $r$ is the cosine of the angle between the centred data vectors. It always lies in the interval $[-1, 1]$. Values near $1$ indicate a strong positive linear relationship, values near $-1$ a strong negative linear relationship, and values near $0$ suggest little linear correlation.

Example 14 (A Computed Correlation Example)

Consider the five data points

(-3, 4),\ (-2, 1),\ (0, -1),\ (1, -1),\ (4, -3).

Their $x$-coordinates and $y$-coordinates already have average $0$, so no recentering is needed. The associated vectors are

\mathbf{X} = \begin{bmatrix} -3 \\ -2 \\ 0 \\ 1 \\ 4 \end{bmatrix}, \qquad \mathbf{Y} = \begin{bmatrix} 4 \\ 1 \\ -1 \\ -1 \\ -3 \end{bmatrix}.

We compute

\mathbf{X} \cdot \mathbf{Y} = -12 - 2 + 0 - 1 - 12 = -27,

and

\|\mathbf{X}\| = \sqrt{30}, \qquad \|\mathbf{Y}\| = \sqrt{28}.

Therefore the correlation coefficient is

r = \frac{\mathbf{X} \cdot \mathbf{Y}}{\|\mathbf{X}\|\,\|\mathbf{Y}\|} = \frac{-27}{\sqrt{30}\sqrt{28}} \approx -0.9316.

This is close to $-1$, so the data are strongly clustered around a line of negative slope.
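
The computation above can be reproduced with a short function that recentres the data and takes the cosine of the angle between the centred vectors, exactly as in Example 13.

```python
# Correlation coefficient (Example 13), applied to Example 14's data.
import math

def correlation(xs, ys):
    """Cosine of the angle between the centred data vectors."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    X = [x - xbar for x in xs]            # recentred x-data
    Y = [y - ybar for y in ys]            # recentred y-data
    dot = sum(a * b for a, b in zip(X, Y))
    return dot / (math.sqrt(sum(a * a for a in X)) *
                  math.sqrt(sum(b * b for b in Y)))

xs = [-3, -2, 0, 1, 4]
ys = [4, 1, -1, -1, -3]
print(round(correlation(xs, ys), 4))  # -0.9316
```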

Problem 3

Consider the four data points

(1, 1),\ (-1, -1),\ (k, -k),\ (-k, k),

where $k \neq 0$.

(a) Compute the correlation coefficient when $k = 1$.

(b) Compute the correlation coefficient $r(k)$ for general $k$.

(c) Explain algebraically why $-1 < r(k) < 1$ for every non-zero $k$, and then explain the same fact geometrically.

Problem 4

Let

\mathbf{u} = \begin{bmatrix} 2 \\ -1 \\ 3 \\ 0 \end{bmatrix}, \qquad \mathbf{v} = \begin{bmatrix} 1 \\ 4 \\ -2 \\ 2 \end{bmatrix}, \qquad \mathbf{w} = \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix}.

Compute $\mathbf{u} \cdot \mathbf{v}$. Use its sign to decide whether the angle between $\mathbf{u}$ and $\mathbf{v}$ is acute, right, or obtuse. Then explain why $\mathbf{u} \cdot \mathbf{w}$ is not defined.

Theorem 5 (Triangle Inequality in $\mathbb{R}^n$)

For any vectors $\mathbf{x}, \mathbf{y} \in \mathbb{R}^n$,

\|\mathbf{x} + \mathbf{y}\| \le \|\mathbf{x}\| + \|\mathbf{y}\|.

Equivalently, for any $\mathbf{u}, \mathbf{v}, \mathbf{w} \in \mathbb{R}^n$,

d(\mathbf{u}, \mathbf{w}) \le d(\mathbf{u}, \mathbf{v}) + d(\mathbf{v}, \mathbf{w}).

Proof

By the dot-product identity,

\|\mathbf{x} + \mathbf{y}\|^2 = (\mathbf{x} + \mathbf{y}) \cdot (\mathbf{x} + \mathbf{y}) = \|\mathbf{x}\|^2 + 2(\mathbf{x} \cdot \mathbf{y}) + \|\mathbf{y}\|^2.

Applying Cauchy-Schwarz gives

\mathbf{x} \cdot \mathbf{y} \le |\mathbf{x} \cdot \mathbf{y}| \le \|\mathbf{x}\|\,\|\mathbf{y}\|,

so

\|\mathbf{x} + \mathbf{y}\|^2 \le \|\mathbf{x}\|^2 + 2\|\mathbf{x}\|\,\|\mathbf{y}\| + \|\mathbf{y}\|^2 = (\|\mathbf{x}\| + \|\mathbf{y}\|)^2.

Taking square roots yields the first inequality.

For the distance form, observe that

\mathbf{w} - \mathbf{u} = (\mathbf{v} - \mathbf{u}) + (\mathbf{w} - \mathbf{v}),

and then apply the first inequality.

Problem 5

Use the triangle inequality to prove that for any $\mathbf{x}, \mathbf{y} \in \mathbb{R}^n$,

\bigl|\|\mathbf{x}\| - \|\mathbf{y}\|\bigr| \le \|\mathbf{x} - \mathbf{y}\|.

Unit Vectors in $\mathbb{R}^n$

Definition 6 (Zero Vector and Unit Vector in $\mathbb{R}^n$)

The zero vector in $\mathbb{R}^n$ is

\mathbf{0} = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}.

A vector $\mathbf{u} \in \mathbb{R}^n$ is called a unit vector if $\|\mathbf{u}\| = 1$.

As before, every magnitude is non-negative. Moreover, if $\|\mathbf{v}\| = 0$, then

v_1^2 + v_2^2 + \cdots + v_n^2 = 0,

so each component must vanish, and therefore $\mathbf{v} = \mathbf{0}$. Thus if $\mathbf{v} \neq \mathbf{0}$, division by $\|\mathbf{v}\|$ is legitimate and produces a new vector whose magnitude is $1$.

Theorem 6 (Normalising a Non-zero Vector)

Let $\mathbf{v} \in \mathbb{R}^n$ be non-zero. Then

\frac{1}{\|\mathbf{v}\|}\mathbf{v}

is a unit vector. Moreover, it is the unique unit vector obtained from $\mathbf{v}$ by multiplication by a positive scalar.

Proof

Since $\mathbf{v} \neq \mathbf{0}$, we have $\|\mathbf{v}\| > 0$. Using the homogeneity of magnitude,

\left\|\frac{1}{\|\mathbf{v}\|}\mathbf{v}\right\| = \frac{1}{\|\mathbf{v}\|}\,\|\mathbf{v}\| = 1.

So this vector is indeed a unit vector.

Now let $c > 0$. Then $c\mathbf{v}$ is a positive scalar multiple of $\mathbf{v}$, so geometrically it has the same direction as $\mathbf{v}$. If $c\mathbf{v}$ is required to be a unit vector, then

\|c\mathbf{v}\| = c\|\mathbf{v}\| = 1,

whence

c = \frac{1}{\|\mathbf{v}\|}.

So no other positive scalar multiple of $\mathbf{v}$ can have magnitude $1$.

The vector

\widehat{\mathbf{v}} = \frac{\mathbf{v}}{\|\mathbf{v}\|}

is often called the unit vector in the direction of $\mathbf{v}$. The unit vector in the opposite direction is

-\widehat{\mathbf{v}} = -\frac{\mathbf{v}}{\|\mathbf{v}\|}.

The standard coordinate vectors $\mathbf{e}_1, \dots, \mathbf{e}_n$, each having a single entry equal to $1$ and all remaining entries equal to $0$, are basic examples of unit vectors.

Example 15 (Normalising a Vector in $\mathbb{R}^3$)

Let

\mathbf{v} = \begin{bmatrix} 2 \\ -2 \\ 1 \end{bmatrix}.

Then

\|\mathbf{v}\| = \sqrt{2^2 + (-2)^2 + 1^2} = \sqrt{4 + 4 + 1} = 3.

Hence the corresponding unit vector is

\widehat{\mathbf{v}} = \frac{1}{3} \begin{bmatrix} 2 \\ -2 \\ 1 \end{bmatrix} = \begin{bmatrix} 2/3 \\ -2/3 \\ 1/3 \end{bmatrix}.

A direct check gives

\|\widehat{\mathbf{v}}\| = \sqrt{(2/3)^2 + (-2/3)^2 + (1/3)^2} = \sqrt{4/9 + 4/9 + 1/9} = 1.

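
Normalisation is one line of arithmetic per entry; the sketch below reproduces this example and checks that the result has magnitude $1$ (up to floating-point rounding).

```python
# Normalising a non-zero vector (Theorem 6), using Example 15's v.
import math

def normalize(v):
    """Unit vector in the direction of a non-zero vector v."""
    n = math.sqrt(sum(a * a for a in v))
    assert n > 0, "cannot normalise the zero vector"
    return [a / n for a in v]

v_hat = normalize([2, -2, 1])
print(v_hat)  # entries 2/3, -2/3, 1/3 as floats
print(math.isclose(math.sqrt(sum(a * a for a in v_hat)), 1.0))  # True
```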
Example 16 (A Compass Direction)

Suppose the positive $x$-axis points east and the positive $y$-axis points north. A vector pointing exactly north-west is

\begin{bmatrix} -1 \\ 1 \end{bmatrix}.

Its magnitude is

\sqrt{(-1)^2 + 1^2} = \sqrt{2}.

Therefore the unit vector pointing north-west is

\frac{1}{\sqrt{2}} \begin{bmatrix} -1 \\ 1 \end{bmatrix} = \begin{bmatrix} -1/\sqrt{2} \\ 1/\sqrt{2} \end{bmatrix}.

Problem 6

Let

\mathbf{u} = \begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix} \in \mathbb{R}^n.

Find $\|\mathbf{u}\|$ and the unit vector pointing in the same direction as $\mathbf{u}$.

Example 17 (K-means and Distance in $\mathbb{R}^n$)

Suppose data points $\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_m$ in $\mathbb{R}^n$ are to be split into $K$ clusters. The K-means method begins with centres $\mathbf{c}_1, \dots, \mathbf{c}_K$ and assigns each data vector to the centre with smallest distance:

\mathbf{x}_i \text{ is assigned to the cluster whose centre } \mathbf{c}_j \text{ minimises } d(\mathbf{x}_i, \mathbf{c}_j).

After that, each centre is replaced by the average of the vectors assigned to its cluster. Thus the method depends directly on two ideas already present in these notes: averaging vectors and measuring distances in $\mathbb{R}^n$.
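
One assignment-and-update step of the method can be sketched as follows. The data points and centres are made up for illustration, and ties are broken by taking the first nearest centre.

```python
# One K-means step: assign each point to its nearest centre, then
# replace each centre by the mean of its cluster. Data are made up.
import math

def dist(x, c):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, c)))

def kmeans_step(points, centres):
    # Assignment: each point goes to the nearest centre.
    clusters = [[] for _ in centres]
    for x in points:
        j = min(range(len(centres)), key=lambda k: dist(x, centres[k]))
        clusters[j].append(x)
    # Update: each centre becomes the entrywise average of its cluster.
    new_centres = []
    for j, cluster in enumerate(clusters):
        if cluster:
            m = len(cluster)
            new_centres.append([sum(p[i] for p in cluster) / m
                                for i in range(len(centres[j]))])
        else:  # keep an empty cluster's centre unchanged
            new_centres.append(centres[j])
    return new_centres, clusters

points = [[0, 0], [0, 1], [5, 5], [6, 5]]
centres = [[0, 0], [5, 5]]
centres, clusters = kmeans_step(points, centres)
print(clusters)  # [[[0, 0], [0, 1]], [[5, 5], [6, 5]]]
print(centres)   # [[0.0, 0.5], [5.5, 5.0]]
```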

Remark (Higher Dimensions)

Once $n > 3$, there is no need to force a literal spatial interpretation onto every vector. The point of $\mathbb{R}^n$ is practical, not mystical: many problems naturally involve many variables, and the geometric language of magnitude, distance, and direction gives a disciplined way to think about them.

Exercises

Exercise 1 (Arithmetic in Higher Dimension)

Let

\mathbf{u} = \begin{bmatrix} 5 \\ -2 \\ 1 \\ 6 \end{bmatrix}, \qquad \mathbf{v} = \begin{bmatrix} 1 \\ 3 \\ -1 \\ 0 \end{bmatrix}, \qquad \mathbf{w} = \begin{bmatrix} 2 \\ -4 \\ 2 \\ 1 \end{bmatrix}.

(a) Compute $\mathbf{u} - 2\mathbf{v} + 3\mathbf{w}$.

(b) Find the vector $\mathbf{x}$ such that

\mathbf{u} + \mathbf{x} = 2\mathbf{v} - \mathbf{w}.

Exercise 2 (Distance in $\mathbb{R}^4$)

Let

\mathbf{p} = \begin{bmatrix} 2 \\ -1 \\ 4 \\ 0 \end{bmatrix}, \qquad \mathbf{q} = \begin{bmatrix} -1 \\ 2 \\ 1 \\ -2 \end{bmatrix}.

Compute $d(\mathbf{p}, \mathbf{q})$.

Exercise 3 (Dot Products and Angles)

Let

\mathbf{u} = \begin{bmatrix} 3 \\ 1 \\ 0 \\ 1/2 \end{bmatrix}, \qquad \mathbf{v} = \begin{bmatrix} 1 \\ -2 \\ 3 \\ 6 \end{bmatrix}.

(a) Compute $\mathbf{u} \cdot \mathbf{v}$.

(b) Compute the angle between $\mathbf{u}$ and $\mathbf{v}$.

(c) Decide whether the angle is acute, right, or obtuse.

Exercise 4 (Normalising Vectors)

For each of the following vectors, compute its magnitude and the unit vector pointing in the same direction.

(a)

\begin{bmatrix} -8 \\ 15 \end{bmatrix}

(b)

\begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix} \in \mathbb{R}^n

Your answer in part (b) should be expressed in terms of $n$.

Exercise 5 (A Metric Identity)

Prove that for any $\mathbf{v}, \mathbf{w} \in \mathbb{R}^n$,

\|\mathbf{v} + \mathbf{w}\|^2 + \|\mathbf{v} - \mathbf{w}\|^2 = 2\|\mathbf{v}\|^2 + 2\|\mathbf{w}\|^2.

Exercise 6 (A First Clustering Computation)

Five customers rate two drinks on a scale from $1$ to $5$. The first entry records the rating for tea and the second the rating for coffee:

\mathbf{x}_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \quad \mathbf{x}_2 = \begin{bmatrix} 1 \\ 4 \end{bmatrix}, \quad \mathbf{x}_3 = \begin{bmatrix} 2 \\ 3 \end{bmatrix}, \quad \mathbf{x}_4 = \begin{bmatrix} 4 \\ 1 \end{bmatrix}, \quad \mathbf{x}_5 = \begin{bmatrix} 4 \\ 2 \end{bmatrix}.

Use $K = 2$ clusters with initial centres

\mathbf{c}_1 = \mathbf{x}_2, \qquad \mathbf{c}_2 = \mathbf{x}_4.

(a) Assign each data point to the nearer centre.

(b) Compute the updated centre of each cluster.

(c) Reassign the five data points using the updated centres, and determine whether the clustering has stabilised.

Exercise 7 (Three Unit Vectors with Prescribed Dot Products)

Let $\mathbf{a}, \mathbf{b}, \mathbf{c}$ be unit vectors in $\mathbb{R}^n$ such that

\mathbf{a} \cdot \mathbf{b} = 0, \qquad \mathbf{a} \cdot \mathbf{c} = \frac{1}{2}, \qquad \mathbf{b} \cdot \mathbf{c} = \frac{1}{5}.

Using the identity $\|\mathbf{v}\|^2 = \mathbf{v} \cdot \mathbf{v}$, compute:

(a) $\|\mathbf{a} + \mathbf{b}\|^2$

(b) $\|\mathbf{b} - \mathbf{c}\|^2$

(c) $\|\mathbf{a} + \mathbf{b} + \mathbf{c}\|^2$

Exercise 8 (Long Orthogonal Vectors)

Give explicit examples of:

(a) two orthogonal $1000$-vectors with no entries equal to $0$

(b) two orthogonal $999$-vectors with no entries equal to $0$

Exercise 9 (Three Sign Possibilities)

Using

\|\mathbf{v} + \mathbf{w}\|^2 = (\mathbf{v} + \mathbf{w}) \cdot (\mathbf{v} + \mathbf{w}) = \|\mathbf{v}\|^2 + 2(\mathbf{v} \cdot \mathbf{w}) + \|\mathbf{w}\|^2,

give non-zero vectors $\mathbf{v}, \mathbf{w} \in \mathbb{R}^3$ satisfying each of the following:

(a) $\|\mathbf{v} + \mathbf{w}\|^2 = \|\mathbf{v}\|^2 + \|\mathbf{w}\|^2$

(b) $\|\mathbf{v} + \mathbf{w}\|^2 > \|\mathbf{v}\|^2 + \|\mathbf{w}\|^2$

(c) $\|\mathbf{v} + \mathbf{w}\|^2 < \|\mathbf{v}\|^2 + \|\mathbf{w}\|^2$