← Back to MA1A: Applied Linear Algebra


Beyond Three Dimensions: The Space $\mathbb{R}^n$

While we have extensively visualised vectors as directed segments with magnitude and direction in $\mathbb{R}^2$ and $\mathbb{R}^3$, the algebraic structure we have built is not confined to two or three dimensions. By defining a vector algebraically as an ordered list of numbers, we can seamlessly extend these concepts to spaces of any dimension.

Definition 1 (n-vector)

For a positive integer $n$, an n-vector is an ordered list of $n$ real numbers, written as a column. The collection of all possible n-vectors is denoted by $\mathbb{R}^n$.

Example 1

The column containing $(-3, 4.9, 1/2, 0, 1)$ written vertically is a 5-vector belonging to $\mathbb{R}^5$.

Remark (Points and Vectors in $\mathbb{R}^n$)

Some linear algebra texts make no distinction between a point of $\mathbb{R}^n$ and a vector of $\mathbb{R}^n$, since both are specified by the same ordered list of real numbers. In these notes we keep the earlier convention: points are written with capital letters and parentheses, whereas vectors are written in bold as columns. The underlying numerical data are the same; what changes is the geometric interpretation. Once an origin has been fixed, every point determines a position vector and every position vector determines a point.

Once $n > 3$, the classical physical interpretation of a vector as an arrow possessing magnitude and direction is no longer a meaningful way to conceptualise the space. However, vectors in higher dimensions are indispensable across both the physical sciences and data-driven fields. In complex physical systems, describing the complete state of a model often requires grouping numerous 3-vectors into a single, massive n-vector.

It is equally important to recognise a second point of view. An n-vector may simply be a structured list of numerical data. In that setting, the entries need not describe motion in space at all. They may record measurements, scores, counts, or coded information drawn from some underlying system.

Example 2 (Rainfall Record)

Suppose a weather station records the rainfall on each day of a non-leap year. These measurements may be arranged into the vector

\mathbf{R} = \begin{bmatrix} r_1 \\ r_2 \\ \vdots \\ r_{365} \end{bmatrix},

where $r_k$ is the rainfall measured on the $k$th day. The entries do not describe a point moving through space; they simply collect related numerical information into a single mathematical object.

Example 3 (Inventory Data)

Suppose a warehouse stores $m$ different products, and let $q_k$ denote the number of units currently held of the $k$th product. The entire stock list may be written as

\mathbf{q} = \begin{bmatrix} q_1 \\ q_2 \\ \vdots \\ q_m \end{bmatrix}.

Such a vector provides a compact way to encode the state of the warehouse at a given time.

Example 4 (Sensor Data)

Consider a large engineering system fitted with $n$ sensors. If $s_k$ denotes the reading of the $k$th sensor at a fixed moment, then the entire state of the sensor network may be represented by

\mathbf{s} = \begin{bmatrix} s_1 \\ s_2 \\ \vdots \\ s_n \end{bmatrix},

where $n$ may be very large. Again, the point is not that $\mathbf{s}$ is an arrow in ordinary space, but that vector notation organises a large collection of related measurements in a form that algebra can handle cleanly.

Calling such lists vectors is not merely a change of language. Once the data are placed into $\mathbb{R}^n$, the algebra developed throughout these notes becomes available. We may add vectors, scale them, compare them, and later use further tools to detect structure hidden inside the data.
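
To make "the algebra becomes available" concrete, here is a minimal sketch of entrywise addition and scalar multiplication for n-vectors stored as Python lists. The stock numbers are invented for illustration, in the spirit of the inventory example above.

```python
# A minimal sketch: n-vectors as Python lists, with entrywise
# addition and scalar multiplication. The stock data are made up.

def add(u, v):
    """Entrywise sum of two n-vectors of the same length."""
    assert len(u) == len(v), "vectors must live in the same R^n"
    return [ui + vi for ui, vi in zip(u, v)]

def scale(c, v):
    """Scalar multiple c*v, entry by entry."""
    return [c * vi for vi in v]

# Stock levels for 5 products at two warehouses (hypothetical data).
q1 = [12, 0, 7, 3, 25]
q2 = [8, 4, 1, 3, 10]

total = add(q1, q2)      # combined stock of both warehouses
doubled = scale(2, q1)   # stock if warehouse 1 were duplicated

print(total)    # [20, 4, 8, 6, 35]
print(doubled)  # [24, 0, 14, 6, 50]
```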

The extension is not limited to addition and scalar multiplication. The metric ideas developed earlier also admit a uniform algebraic formulation. In $\mathbb{R}^2$ and $\mathbb{R}^3$ these formulas were recovered from geometry. In $\mathbb{R}^n$ we now adopt them as definitions, since once $n > 3$ there is no longer a literal picture in ordinary space to appeal to.

Example 5 (Distance from a Starting Point in Space)

Suppose a signal lamp is mounted on a framework so that, relative to a fixed origin, it lies $4$ units in the positive $x$-direction, $3$ units in the positive $y$-direction, and $12$ units above the $xy$-plane. Its position vector is

\mathbf{p} = \begin{bmatrix} 4 \\ 3 \\ 12 \end{bmatrix}.

The question “how far is the lamp from the starting point?” is exactly the question of finding the length of $\mathbf{p}$. Using the three-dimensional distance formula,

\|\mathbf{p}\| = \sqrt{4^2 + 3^2 + 12^2} = \sqrt{16 + 9 + 144} = \sqrt{169} = 13.

This is the same pattern already seen in the plane and in space: the distance from the origin to a point is the magnitude of its position vector.

Definition 2 (Magnitude in $\mathbb{R}^n$)

Let

\mathbf{v} = \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix} \in \mathbb{R}^n.

The magnitude (or norm) of $\mathbf{v}$ is defined by

\|\mathbf{v}\| = \sqrt{v_1^2 + v_2^2 + \cdots + v_n^2}.

When $n = 2$ or $n = 3$, this reduces to the formulas already established in Lesson 1AM and earlier in this lesson. For larger $n$, the formula is no longer visual, but it remains the correct algebraic continuation of the same metric structure.
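
Definition 2 translates directly into a few lines of code. The sketch below recomputes the lamp vector $\mathbf{p} = (4, 3, 12)$ from Example 5, whose magnitude should come out to $13$.

```python
# Magnitude in R^n (Definition 2): square root of the sum of squares.
import math

def norm(v):
    """Magnitude of an n-vector given as a list of numbers."""
    return math.sqrt(sum(vi * vi for vi in v))

p = [4, 3, 12]
print(norm(p))  # 13.0
```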

Theorem 1 (Basic Magnitude Identities in $\mathbb{R}^n$)

Let $\mathbf{v} \in \mathbb{R}^n$ and let $c \in \mathbb{R}$. Then

  1. $\|-\mathbf{v}\| = \|\mathbf{v}\|$.
  2. $\|c\mathbf{v}\| = |c|\,\|\mathbf{v}\|$.

Proof

If $\mathbf{v} = \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix}$, then

\|-\mathbf{v}\| = \sqrt{(-v_1)^2 + (-v_2)^2 + \cdots + (-v_n)^2} = \sqrt{v_1^2 + v_2^2 + \cdots + v_n^2} = \|\mathbf{v}\|.

Also,

\|c\mathbf{v}\| = \sqrt{(cv_1)^2 + (cv_2)^2 + \cdots + (cv_n)^2} = \sqrt{c^2(v_1^2 + v_2^2 + \cdots + v_n^2)} = |c|\,\|\mathbf{v}\|.
Definition 3 (Distance in $\mathbb{R}^n$)

Let $\mathbf{x}, \mathbf{y} \in \mathbb{R}^n$. The distance between $\mathbf{x}$ and $\mathbf{y}$ is defined by

d(\mathbf{x}, \mathbf{y}) = \|\mathbf{x} - \mathbf{y}\|.

Thus the distance between two vectors is computed by taking the difference vector and then measuring its magnitude. This agrees with the ordinary distance formulas in $\mathbb{R}^2$ and $\mathbb{R}^3$, where $\mathbf{x} - \mathbf{y}$ is the displacement from the point represented by $\mathbf{y}$ to the point represented by $\mathbf{x}$. In higher dimensions we keep the same formula because it extends those lower-dimensional cases exactly and behaves in the way a distance should. In particular,

d(\mathbf{x}, \mathbf{y}) = d(\mathbf{y}, \mathbf{x}),

since $\mathbf{y} - \mathbf{x} = -(\mathbf{x} - \mathbf{y})$ and $\|-\mathbf{v}\| = \|\mathbf{v}\|$.

Note (Notation)

When primes are used on vectors or parameters, as in $\mathbf{v}'$, $\mathbf{v}''$, or $t'$, they simply distinguish one object from another. They do not denote derivatives here.

Example 6 (Distance Computation in $\mathbb{R}^4$)

Let

\mathbf{x} = \begin{bmatrix} 1 \\ 2 \\ -1 \\ 3 \end{bmatrix}, \qquad \mathbf{y} = \begin{bmatrix} 3 \\ -1 \\ 0 \\ -1 \end{bmatrix}.

Then

\mathbf{x} - \mathbf{y} = \begin{bmatrix} -2 \\ 3 \\ -1 \\ 4 \end{bmatrix},

so the distance between $\mathbf{x}$ and $\mathbf{y}$ is

d(\mathbf{x}, \mathbf{y}) = \|\mathbf{x} - \mathbf{y}\| = \sqrt{(-2)^2 + 3^2 + (-1)^2 + 4^2} = \sqrt{4 + 9 + 1 + 16} = \sqrt{30}.

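
Definition 3 is easy to check numerically. The sketch below recomputes the distance in Example 6 and confirms the symmetry $d(\mathbf{x}, \mathbf{y}) = d(\mathbf{y}, \mathbf{x})$.

```python
# Distance in R^n (Definition 3): d(x, y) = ||x - y||.
import math

def distance(x, y):
    """Distance between two n-vectors of equal length."""
    assert len(x) == len(y), "both vectors must lie in the same R^n"
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

x = [1, 2, -1, 3]
y = [3, -1, 0, -1]
print(distance(x, y))                    # sqrt(30), approximately 5.477
print(distance(x, y) == distance(y, x))  # True
```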
Problem 1

Let

\mathbf{a} = \begin{bmatrix} 2 \\ -1 \\ 0 \\ 3 \end{bmatrix}, \qquad \mathbf{b} = \begin{bmatrix} -1 \\ 1 \\ 2 \\ -1 \end{bmatrix}.

Compute $\mathbf{a} - \mathbf{b}$ and $d(\mathbf{a}, \mathbf{b})$. Then compute $\mathbf{b} - \mathbf{a}$ and explain why the distance is unchanged.

Dot Products and Angles in $\mathbb{R}^n$

Length and distance are not the end of the metric story. In the plane and in space, the Law of Cosines tied lengths to angles, and the dot product gave an algebraic way to package that relation. The same algebraic construction extends to $\mathbb{R}^n$.

Definition 4 (Dot Product in $\mathbb{R}^n$)

Let

\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}, \qquad \mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} \in \mathbb{R}^n.

Their dot product is the scalar

\mathbf{x} \cdot \mathbf{y} = x_1 y_1 + x_2 y_2 + \cdots + x_n y_n = \sum_{i=1}^n x_i y_i.

This is only defined when the two vectors belong to the same $\mathbb{R}^n$.

Remark (Summation Notation)

The symbol

\sum_{i=1}^n x_i y_i

means “add the expression $x_i y_i$ as the index $i$ runs from $1$ to $n$”. In other words,

\sum_{i=1}^n x_i y_i = x_1 y_1 + x_2 y_2 + \cdots + x_n y_n.

The letter $i$ is only a running index: writing $\sum_{k=1}^n x_k y_k$ would mean exactly the same thing.

As in the earlier lessons, the output of a dot product is a scalar, not a vector. The self dot product recovers the square of the magnitude:

\mathbf{x} \cdot \mathbf{x} = x_1^2 + x_2^2 + \cdots + x_n^2 = \|\mathbf{x}\|^2.

Theorem 2 (Algebraic Properties of the Dot Product in $\mathbb{R}^n$)

For all $\mathbf{x}, \mathbf{y}, \mathbf{z} \in \mathbb{R}^n$ and all scalars $r \in \mathbb{R}$:

  1. Symmetry: $\mathbf{x} \cdot \mathbf{y} = \mathbf{y} \cdot \mathbf{x}$.
  2. Bilinearity: $(r\mathbf{x} + \mathbf{y}) \cdot \mathbf{z} = r(\mathbf{x} \cdot \mathbf{z}) + \mathbf{y} \cdot \mathbf{z}$, and likewise in the second argument.
  3. Positive Definiteness: $\mathbf{x} \cdot \mathbf{x} \ge 0$, with equality if and only if $\mathbf{x} = \mathbf{0}$.

Proof

Each statement is verified component by component exactly as in $\mathbb{R}^2$ and $\mathbb{R}^3$; the only difference is that there are $n$ summands instead of two or three.

Example 7 (Using the Algebraic Rules)

Let

\mathbf{u} = \begin{bmatrix} 1 \\ 2 \\ -1 \end{bmatrix}, \qquad \mathbf{v} = \begin{bmatrix} 3 \\ 0 \\ 4 \end{bmatrix}, \qquad \mathbf{w} = \begin{bmatrix} -2 \\ 1 \\ 5 \end{bmatrix}.

Then

\mathbf{u} \cdot \mathbf{v} = 1 \cdot 3 + 2 \cdot 0 + (-1) \cdot 4 = -1,

and by symmetry $\mathbf{v} \cdot \mathbf{u} = -1$ as well. Also,

\mathbf{u} \cdot \mathbf{u} = 1^2 + 2^2 + (-1)^2 = 6 = \|\mathbf{u}\|^2.

Finally,

\mathbf{u} \cdot (2\mathbf{v} - \mathbf{w}) = 2(\mathbf{u} \cdot \mathbf{v}) - \mathbf{u} \cdot \mathbf{w} = 2(-1) - \bigl((-2) + 2 - 5\bigr) = 3.

Example 8 (Computing Dot Products)

For

\begin{bmatrix} 2 \\ 1 \\ 3 \end{bmatrix} \cdot \begin{bmatrix} 3 \\ -2 \\ 4 \end{bmatrix},

we obtain

2 \cdot 3 + 1 \cdot (-2) + 3 \cdot 4 = 16.

Likewise, in $\mathbb{R}^4$,

\begin{bmatrix} 3 \\ 1 \\ 0 \\ 1/2 \end{bmatrix} \cdot \begin{bmatrix} 1 \\ -2 \\ 3 \\ 6 \end{bmatrix} = 3 - 2 + 0 + 3 = 4.

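
Both dot products in this example can be verified in code; the sketch below implements Definition 4 directly.

```python
# Dot product in R^n (Definition 4), checked against Example 8.

def dot(x, y):
    """Sum of entrywise products; defined only for equal lengths."""
    assert len(x) == len(y), "dot product needs vectors from the same R^n"
    return sum(a * b for a, b in zip(x, y))

print(dot([2, 1, 3], [3, -2, 4]))          # 16
print(dot([3, 1, 0, 0.5], [1, -2, 3, 6]))  # 4.0
```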
Example 9 (Prices and Quantities)

Suppose an economy tracks $n$ goods. Let

\mathbf{p} = \begin{bmatrix} p_1 \\ p_2 \\ \vdots \\ p_n \end{bmatrix}

be the price vector and

\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}

the quantity vector for a particular consumer. Then

\mathbf{p} \cdot \mathbf{x} = p_1 x_1 + p_2 x_2 + \cdots + p_n x_n

is the total amount spent. This is a typical higher-dimensional use of the dot product: the vectors encode data, not directions in physical space.

Theorem 3 (Cauchy-Schwarz in $\mathbb{R}^n$)

For all vectors $\mathbf{x}, \mathbf{y} \in \mathbb{R}^n$,

-\|\mathbf{x}\|\,\|\mathbf{y}\| \le \mathbf{x} \cdot \mathbf{y} \le \|\mathbf{x}\|\,\|\mathbf{y}\|.

Equivalently,

|\mathbf{x} \cdot \mathbf{y}| \le \|\mathbf{x}\|\,\|\mathbf{y}\|.

Moreover, equality holds in one of these inequalities if and only if one of the two vectors is a scalar multiple of the other.

Proof

If $\mathbf{x} = \mathbf{0}$ or $\mathbf{y} = \mathbf{0}$, the statement is immediate, and equality certainly holds because $\mathbf{0}$ is a scalar multiple of every vector. Assume from now on that both vectors are non-zero.

For any real number $t$,

0 \le \|\mathbf{x} + t\mathbf{y}\|^2 = \|\mathbf{y}\|^2 t^2 + 2(\mathbf{x} \cdot \mathbf{y})t + \|\mathbf{x}\|^2.

Thus the quadratic polynomial

q(t) = \|\mathbf{y}\|^2 t^2 + 2(\mathbf{x} \cdot \mathbf{y})t + \|\mathbf{x}\|^2

is non-negative for every real $t$. Since its leading coefficient $\|\mathbf{y}\|^2$ is positive, its discriminant must be non-positive:

\bigl(2(\mathbf{x} \cdot \mathbf{y})\bigr)^2 - 4\|\mathbf{y}\|^2\|\mathbf{x}\|^2 \le 0.

Rearranging gives

(\mathbf{x} \cdot \mathbf{y})^2 \le \|\mathbf{x}\|^2\|\mathbf{y}\|^2,

and taking square roots yields

|\mathbf{x} \cdot \mathbf{y}| \le \|\mathbf{x}\|\,\|\mathbf{y}\|.

This is equivalent to the two-sided inequality.

For the equality case, equality in Cauchy-Schwarz means

(\mathbf{x} \cdot \mathbf{y})^2 = \|\mathbf{x}\|^2\|\mathbf{y}\|^2,

so the discriminant of $q(t)$ is $0$. Hence $q(t)$ has a real root, meaning $\|\mathbf{x} + t\mathbf{y}\|^2 = 0$ for some $t$, so $\mathbf{x} + t\mathbf{y} = \mathbf{0}$. Thus one vector is a scalar multiple of the other. Conversely, if $\mathbf{x} = c\mathbf{y}$ for some scalar $c$, then

\mathbf{x} \cdot \mathbf{y} = c\,\|\mathbf{y}\|^2, \qquad \|\mathbf{x}\|\,\|\mathbf{y}\| = |c|\,\|\mathbf{y}\|^2,

so equality holds.
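
The inequality can also be stress-tested numerically. The sketch below checks $|\mathbf{x} \cdot \mathbf{y}| \le \|\mathbf{x}\|\,\|\mathbf{y}\|$ on many randomly generated pairs in $\mathbb{R}^5$ and exhibits the equality case for a scalar multiple; the vectors themselves are arbitrary.

```python
# Numerical check of Cauchy-Schwarz: |x.y| <= ||x|| ||y||.
import math
import random

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def norm(v):
    return math.sqrt(dot(v, v))

random.seed(0)  # reproducible random trials
for _ in range(1000):
    x = [random.uniform(-10, 10) for _ in range(5)]
    y = [random.uniform(-10, 10) for _ in range(5)]
    assert abs(dot(x, y)) <= norm(x) * norm(y) + 1e-9  # small float slack

# Equality case: y is a scalar multiple of x, so |x.y| = ||x|| ||y||.
x = [1.0, -2.0, 3.0, 0.5, 4.0]
y = [-2 * a for a in x]
print(math.isclose(abs(dot(x, y)), norm(x) * norm(y)))  # True
```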

Definition 5 (Angle and Orthogonality in $\mathbb{R}^n$)

Let $\mathbf{x}, \mathbf{y} \in \mathbb{R}^n$ be non-zero. Their angle is the unique number $\theta \in [0, \pi]$ satisfying

\cos\theta = \frac{\mathbf{x} \cdot \mathbf{y}}{\|\mathbf{x}\|\,\|\mathbf{y}\|}.

Because of Cauchy-Schwarz, the right-hand side always lies in the interval $[-1, 1]$, so this definition makes sense.

Two vectors $\mathbf{x}, \mathbf{y} \in \mathbb{R}^n$ are orthogonal (or perpendicular) if

\mathbf{x} \cdot \mathbf{y} = 0.

Remark (Changing Orientation)

If one of the two vectors is multiplied by $-1$, the dot product changes sign while the magnitudes do not. The cosine therefore changes sign, so the angle is replaced by its supplementary angle $\pi - \theta$. This is the same acute-versus-obtuse ambiguity that appears when one speaks about the angle between two lines through the origin rather than between two directed vectors.

Example 10 (Angles with Scalar Multiples)

Let $\mathbf{x} \in \mathbb{R}^n$ be non-zero and let $c \neq 0$ be a scalar. The angle between $\mathbf{x}$ and $c\mathbf{x}$ behaves exactly as geometric intuition suggests.

Indeed,

\cos\theta = \frac{(c\mathbf{x}) \cdot \mathbf{x}}{\|c\mathbf{x}\|\,\|\mathbf{x}\|} = \frac{c(\mathbf{x} \cdot \mathbf{x})}{|c|\,\|\mathbf{x}\|^2} = \frac{c}{|c|}.

If $c > 0$, then $\cos\theta = 1$, so $\theta = 0$. If $c < 0$, then $\cos\theta = -1$, so $\theta = \pi$.

In fact, by the equality case of Cauchy-Schwarz, these are the only situations in which the angle between two non-zero vectors can be $0$ or $\pi$: that happens precisely when one is a scalar multiple of the other.

Example 11 (Angle Computation in $\mathbb{R}^4$)

Let

\mathbf{x} = \begin{bmatrix} 1 \\ 1 \\ 0 \\ 0 \end{bmatrix}, \qquad \mathbf{y} = \begin{bmatrix} 1 \\ 0 \\ 1 \\ 0 \end{bmatrix}.

Then

\mathbf{x} \cdot \mathbf{y} = 1, \qquad \|\mathbf{x}\| = \sqrt{2}, \qquad \|\mathbf{y}\| = \sqrt{2}.

Hence

\cos\theta = \frac{1}{(\sqrt{2})(\sqrt{2})} = \frac{1}{2},

so the angle between $\mathbf{x}$ and $\mathbf{y}$ is $\theta = \pi/3$.
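
The same computation can be done in code, using `math.acos` to invert the cosine from Definition 5.

```python
# Angle between vectors (Definition 5), recomputing Example 11.
import math

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def norm(v):
    return math.sqrt(dot(v, v))

def angle(x, y):
    """Angle in [0, pi] between two non-zero vectors."""
    return math.acos(dot(x, y) / (norm(x) * norm(y)))

x = [1, 1, 0, 0]
y = [1, 0, 1, 0]
print(math.isclose(angle(x, y), math.pi / 3))  # True
```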

Theorem 4 (Pythagoras in $\mathbb{R}^n$)

If $\mathbf{u}, \mathbf{v} \in \mathbb{R}^n$ are orthogonal, then

\|\mathbf{u} + \mathbf{v}\|^2 = \|\mathbf{u}\|^2 + \|\mathbf{v}\|^2.

Proof

Since $\mathbf{u} \perp \mathbf{v}$, we have $\mathbf{u} \cdot \mathbf{v} = 0$. Therefore

\|\mathbf{u} + \mathbf{v}\|^2 = (\mathbf{u} + \mathbf{v}) \cdot (\mathbf{u} + \mathbf{v}) = \mathbf{u} \cdot \mathbf{u} + \mathbf{u} \cdot \mathbf{v} + \mathbf{v} \cdot \mathbf{u} + \mathbf{v} \cdot \mathbf{v}.

By symmetry and orthogonality, the middle two terms are both $0$, so

\|\mathbf{u} + \mathbf{v}\|^2 = \|\mathbf{u}\|^2 + \|\mathbf{v}\|^2.

Problem 2

Let $\mathbf{v}, \mathbf{w} \in \mathbb{R}^n$ be non-zero.

(a) Show that if $\mathbf{v} + \mathbf{w}$ and $\mathbf{v} - \mathbf{w}$ are perpendicular, then $\|\mathbf{v}\| = \|\mathbf{w}\|$.

(b) Show the converse: if $\|\mathbf{v}\| = \|\mathbf{w}\|$, then $\mathbf{v} + \mathbf{w}$ and $\mathbf{v} - \mathbf{w}$ are perpendicular.

Example 12 (Cosine Similarity)

Suppose $\mathbf{v}, \mathbf{w} \in \mathbb{R}^n$ have the same length $\ell$, and let $\theta$ be the angle between them. Then

\|\mathbf{v} - \mathbf{w}\|^2 = (\mathbf{v} - \mathbf{w}) \cdot (\mathbf{v} - \mathbf{w}) = \|\mathbf{v}\|^2 - 2(\mathbf{v} \cdot \mathbf{w}) + \|\mathbf{w}\|^2.

Since $\mathbf{v} \cdot \mathbf{w} = \|\mathbf{v}\|\,\|\mathbf{w}\|\cos\theta = \ell^2\cos\theta$, this becomes

\|\mathbf{v} - \mathbf{w}\|^2 = 2\ell^2(1 - \cos\theta).

So if the angle is small, then $\cos\theta$ is close to $1$ and the vectors are close together. For unit vectors, the quantity $\cos\theta = \mathbf{v} \cdot \mathbf{w}$ itself becomes a direct measure of similarity. This is why cosine similarity appears so often in document retrieval, embeddings, and other large-scale data problems.
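
For concreteness, here is a minimal cosine-similarity sketch; the "document" vectors are hypothetical word counts, invented for illustration.

```python
# Cosine similarity: the cosine of the angle between two data vectors.
import math

def cosine_similarity(v, w):
    dot = sum(a * b for a, b in zip(v, w))
    nv = math.sqrt(sum(a * a for a in v))
    nw = math.sqrt(sum(b * b for b in w))
    return dot / (nv * nw)

doc_a = [3, 0, 1, 2]   # hypothetical word counts for one document
doc_b = [6, 0, 2, 4]   # same proportions as doc_a: same direction
doc_c = [0, 5, 0, 1]   # a very different profile

print(cosine_similarity(doc_a, doc_b))  # approximately 1.0
print(cosine_similarity(doc_a, doc_c))  # much smaller
```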

Example 13 (Correlation Coefficient)

Given data points $(x_1, y_1), \dots, (x_n, y_n)$, first recenter the data by subtracting the averages:

\widehat{x}_i = x_i - \overline{x}, \qquad \widehat{y}_i = y_i - \overline{y}.

Then form the two vectors

\mathbf{X} = \begin{bmatrix} \widehat{x}_1 \\ \widehat{x}_2 \\ \vdots \\ \widehat{x}_n \end{bmatrix}, \qquad \mathbf{Y} = \begin{bmatrix} \widehat{y}_1 \\ \widehat{y}_2 \\ \vdots \\ \widehat{y}_n \end{bmatrix}.

The correlation coefficient is

r = \frac{\mathbf{X} \cdot \mathbf{Y}}{\|\mathbf{X}\|\,\|\mathbf{Y}\|}.

Thus $r$ is the cosine of the angle between the centred data vectors. It always lies in the interval $[-1, 1]$. Values near $1$ indicate a strong positive linear relationship, values near $-1$ a strong negative linear relationship, and values near $0$ suggest little linear correlation.

Example 14 (A Computed Correlation Example)

Consider the five data points

(-3, 4),\ (-2, 1),\ (0, -1),\ (1, -1),\ (4, -3).

Their $x$-coordinates and $y$-coordinates already have average $0$, so no recentering is needed. The associated vectors are

\mathbf{X} = \begin{bmatrix} -3 \\ -2 \\ 0 \\ 1 \\ 4 \end{bmatrix}, \qquad \mathbf{Y} = \begin{bmatrix} 4 \\ 1 \\ -1 \\ -1 \\ -3 \end{bmatrix}.

We compute

\mathbf{X} \cdot \mathbf{Y} = -12 - 2 + 0 - 1 - 12 = -27,

and

\|\mathbf{X}\| = \sqrt{30}, \qquad \|\mathbf{Y}\| = \sqrt{28}.

Therefore the correlation coefficient is

r = \frac{\mathbf{X} \cdot \mathbf{Y}}{\|\mathbf{X}\|\,\|\mathbf{Y}\|} = \frac{-27}{\sqrt{30}\sqrt{28}} \approx -0.9316.

This is close to $-1$, so the data are strongly clustered around a line of negative slope.
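
The computation above can be reproduced with a short function that recentres the data and takes the cosine of the angle between the centred vectors, exactly as in Example 13.

```python
# Correlation coefficient (Example 13), applied to Example 14's data.
import math

def correlation(xs, ys):
    """Cosine of the angle between the centred data vectors."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    X = [x - xbar for x in xs]            # recentred x-data
    Y = [y - ybar for y in ys]            # recentred y-data
    dot = sum(a * b for a, b in zip(X, Y))
    return dot / (math.sqrt(sum(a * a for a in X)) *
                  math.sqrt(sum(b * b for b in Y)))

xs = [-3, -2, 0, 1, 4]
ys = [4, 1, -1, -1, -3]
print(round(correlation(xs, ys), 4))  # -0.9316
```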

Problem 3

Consider the four data points

(1, 1),\ (-1, -1),\ (k, -k),\ (-k, k),

where $k \neq 0$.

(a) Compute the correlation coefficient when $k = 1$.

(b) Compute the correlation coefficient $r(k)$ for general $k$.

(c) Explain algebraically why $-1 < r(k) < 1$ for every non-zero $k$, and then explain the same fact geometrically.

Problem 4

Let

\mathbf{u} = \begin{bmatrix} 2 \\ -1 \\ 3 \\ 0 \end{bmatrix}, \qquad \mathbf{v} = \begin{bmatrix} 1 \\ 4 \\ -2 \\ 2 \end{bmatrix}, \qquad \mathbf{w} = \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix}.

Compute $\mathbf{u} \cdot \mathbf{v}$. Use its sign to decide whether the angle between $\mathbf{u}$ and $\mathbf{v}$ is acute, right, or obtuse. Then explain why $\mathbf{u} \cdot \mathbf{w}$ is not defined.

Theorem 5 (Triangle Inequality in $\mathbb{R}^n$)

For any vectors $\mathbf{x}, \mathbf{y} \in \mathbb{R}^n$,

\|\mathbf{x} + \mathbf{y}\| \le \|\mathbf{x}\| + \|\mathbf{y}\|.

Equivalently, for any $\mathbf{u}, \mathbf{v}, \mathbf{w} \in \mathbb{R}^n$,

d(\mathbf{u}, \mathbf{w}) \le d(\mathbf{u}, \mathbf{v}) + d(\mathbf{v}, \mathbf{w}).

Proof

By the dot-product identity,

\|\mathbf{x} + \mathbf{y}\|^2 = (\mathbf{x} + \mathbf{y}) \cdot (\mathbf{x} + \mathbf{y}) = \|\mathbf{x}\|^2 + 2(\mathbf{x} \cdot \mathbf{y}) + \|\mathbf{y}\|^2.

Applying Cauchy-Schwarz gives

\mathbf{x} \cdot \mathbf{y} \le |\mathbf{x} \cdot \mathbf{y}| \le \|\mathbf{x}\|\,\|\mathbf{y}\|,

so

\|\mathbf{x} + \mathbf{y}\|^2 \le \|\mathbf{x}\|^2 + 2\|\mathbf{x}\|\,\|\mathbf{y}\| + \|\mathbf{y}\|^2 = (\|\mathbf{x}\| + \|\mathbf{y}\|)^2.

Taking square roots yields the first inequality.

For the distance form, observe that

\mathbf{w} - \mathbf{u} = (\mathbf{v} - \mathbf{u}) + (\mathbf{w} - \mathbf{v}),

and then apply the first inequality.

Problem 5

Use the triangle inequality to prove that for any $\mathbf{x}, \mathbf{y} \in \mathbb{R}^n$,

\bigl|\|\mathbf{x}\| - \|\mathbf{y}\|\bigr| \le \|\mathbf{x} - \mathbf{y}\|.

Unit Vectors in $\mathbb{R}^n$

Definition 6 (Zero Vector and Unit Vector in $\mathbb{R}^n$)

The zero vector in $\mathbb{R}^n$ is

\mathbf{0} = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}.

A vector $\mathbf{u} \in \mathbb{R}^n$ is called a unit vector if $\|\mathbf{u}\| = 1$.

As before, every magnitude is non-negative. Moreover, if $\|\mathbf{v}\| = 0$, then

v_1^2 + v_2^2 + \cdots + v_n^2 = 0,

so each component must vanish, and therefore $\mathbf{v} = \mathbf{0}$. Thus if $\mathbf{v} \neq \mathbf{0}$, division by $\|\mathbf{v}\|$ is legitimate and produces a new vector whose magnitude is $1$.

Theorem 6 (Normalising a Non-zero Vector)

Let $\mathbf{v} \in \mathbb{R}^n$ be non-zero. Then

\frac{1}{\|\mathbf{v}\|}\mathbf{v}

is a unit vector. Moreover, it is the unique unit vector obtained from $\mathbf{v}$ by multiplication by a positive scalar.

Proof

Since $\mathbf{v} \neq \mathbf{0}$, we have $\|\mathbf{v}\| > 0$. Using the homogeneity of magnitude,

\left\|\frac{1}{\|\mathbf{v}\|}\mathbf{v}\right\| = \frac{1}{\|\mathbf{v}\|}\,\|\mathbf{v}\| = 1.

So this vector is indeed a unit vector.

Now let $c > 0$. Then $c\mathbf{v}$ is a positive scalar multiple of $\mathbf{v}$, so geometrically it has the same direction as $\mathbf{v}$. If $c\mathbf{v}$ is required to be a unit vector, then

\|c\mathbf{v}\| = c\|\mathbf{v}\| = 1,

whence

c = \frac{1}{\|\mathbf{v}\|}.

So no other positive scalar multiple of $\mathbf{v}$ can have magnitude $1$.

The vector

\widehat{\mathbf{v}} = \frac{\mathbf{v}}{\|\mathbf{v}\|}

is often called the unit vector in the direction of $\mathbf{v}$. The unit vector in the opposite direction is

-\widehat{\mathbf{v}} = -\frac{\mathbf{v}}{\|\mathbf{v}\|}.

The standard coordinate vectors $\mathbf{e}_1, \dots, \mathbf{e}_n$, each having a single entry equal to $1$ and all remaining entries equal to $0$, are basic examples of unit vectors.

Example 15 (Normalising a Vector in $\mathbb{R}^3$)

Let

\mathbf{v} = \begin{bmatrix} 2 \\ -2 \\ 1 \end{bmatrix}.

Then

\|\mathbf{v}\| = \sqrt{2^2 + (-2)^2 + 1^2} = \sqrt{4 + 4 + 1} = 3.

Hence the corresponding unit vector is

\widehat{\mathbf{v}} = \frac{1}{3} \begin{bmatrix} 2 \\ -2 \\ 1 \end{bmatrix} = \begin{bmatrix} 2/3 \\ -2/3 \\ 1/3 \end{bmatrix}.

A direct check gives

\|\widehat{\mathbf{v}}\| = \sqrt{(2/3)^2 + (-2/3)^2 + (1/3)^2} = \sqrt{4/9 + 4/9 + 1/9} = 1.

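
Normalisation is one line of arithmetic per entry; the sketch below reproduces this example and checks that the result has magnitude $1$ (up to floating-point rounding).

```python
# Normalising a non-zero vector (Theorem 6), using Example 15's v.
import math

def normalize(v):
    """Unit vector in the direction of a non-zero vector v."""
    n = math.sqrt(sum(a * a for a in v))
    assert n > 0, "cannot normalise the zero vector"
    return [a / n for a in v]

v_hat = normalize([2, -2, 1])
print(v_hat)  # entries 2/3, -2/3, 1/3 as floats
print(math.isclose(math.sqrt(sum(a * a for a in v_hat)), 1.0))  # True
```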
Example 16 (A Compass Direction)

Suppose the positive $x$-axis points east and the positive $y$-axis points north. A vector pointing exactly north-west is

\begin{bmatrix} -1 \\ 1 \end{bmatrix}.

Its magnitude is

\sqrt{(-1)^2 + 1^2} = \sqrt{2}.

Therefore the unit vector pointing north-west is

\frac{1}{\sqrt{2}} \begin{bmatrix} -1 \\ 1 \end{bmatrix} = \begin{bmatrix} -1/\sqrt{2} \\ 1/\sqrt{2} \end{bmatrix}.

Problem 6

Let

\mathbf{u} = \begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix} \in \mathbb{R}^n.

Find $\|\mathbf{u}\|$ and the unit vector pointing in the same direction as $\mathbf{u}$.

Example 17 (K-means and Distance in $\mathbb{R}^n$)

Suppose data points $\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_m$ in $\mathbb{R}^n$ are to be split into $K$ clusters. The K-means method begins with centres $\mathbf{c}_1, \dots, \mathbf{c}_K$ and assigns each data vector to the centre with smallest distance:

\mathbf{x}_i \text{ is assigned to the cluster whose centre } \mathbf{c}_j \text{ minimises } d(\mathbf{x}_i, \mathbf{c}_j).

After that, each centre is replaced by the average of the vectors assigned to its cluster. Thus the method depends directly on two ideas already present in these notes: averaging vectors and measuring distances in $\mathbb{R}^n$.
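
One assignment-and-update step of the method can be sketched as follows. The data points and centres are made up for illustration, and ties are broken by taking the first nearest centre.

```python
# One K-means step: assign each point to its nearest centre, then
# replace each centre by the mean of its cluster. Data are made up.
import math

def dist(x, c):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, c)))

def kmeans_step(points, centres):
    # Assignment: each point goes to the nearest centre.
    clusters = [[] for _ in centres]
    for x in points:
        j = min(range(len(centres)), key=lambda k: dist(x, centres[k]))
        clusters[j].append(x)
    # Update: each centre becomes the entrywise average of its cluster.
    new_centres = []
    for j, cluster in enumerate(clusters):
        if cluster:
            m = len(cluster)
            new_centres.append([sum(p[i] for p in cluster) / m
                                for i in range(len(centres[j]))])
        else:  # keep an empty cluster's centre unchanged
            new_centres.append(centres[j])
    return new_centres, clusters

points = [[0, 0], [0, 1], [5, 5], [6, 5]]
centres = [[0, 0], [5, 5]]
centres, clusters = kmeans_step(points, centres)
print(clusters)  # [[[0, 0], [0, 1]], [[5, 5], [6, 5]]]
print(centres)   # [[0.0, 0.5], [5.5, 5.0]]
```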

Remark (Higher Dimensions)

Once $n > 3$, there is no need to force a literal spatial interpretation onto every vector. The point of $\mathbb{R}^n$ is practical, not mystical: many problems naturally involve many variables, and the geometric language of magnitude, distance, and direction gives a disciplined way to think about them.

Exercises

Exercise 1 (Arithmetic in Higher Dimension)

Let

\mathbf{u} = \begin{bmatrix} 5 \\ -2 \\ 1 \\ 6 \end{bmatrix}, \qquad \mathbf{v} = \begin{bmatrix} 1 \\ 3 \\ -1 \\ 0 \end{bmatrix}, \qquad \mathbf{w} = \begin{bmatrix} 2 \\ -4 \\ 2 \\ 1 \end{bmatrix}.

(a) Compute $\mathbf{u} - 2\mathbf{v} + 3\mathbf{w}$.

(b) Find the vector $\mathbf{x}$ such that

\mathbf{u} + \mathbf{x} = 2\mathbf{v} - \mathbf{w}.

Exercise 2 (Distance in $\mathbb{R}^4$)

Let

\mathbf{p} = \begin{bmatrix} 2 \\ -1 \\ 4 \\ 0 \end{bmatrix}, \qquad \mathbf{q} = \begin{bmatrix} -1 \\ 2 \\ 1 \\ -2 \end{bmatrix}.

Compute $d(\mathbf{p}, \mathbf{q})$.

Exercise 3 (Dot Products and Angles)

Let

\mathbf{u} = \begin{bmatrix} 3 \\ 1 \\ 0 \\ 1/2 \end{bmatrix}, \qquad \mathbf{v} = \begin{bmatrix} 1 \\ -2 \\ 3 \\ 6 \end{bmatrix}.

(a) Compute $\mathbf{u} \cdot \mathbf{v}$.

(b) Compute the angle between $\mathbf{u}$ and $\mathbf{v}$.

(c) Decide whether the angle is acute, right, or obtuse.

Exercise 4 (Normalising Vectors)

For each of the following vectors, compute its magnitude and the unit vector pointing in the same direction.

(a)

\begin{bmatrix} -8 \\ 15 \end{bmatrix}

(b)

\begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix} \in \mathbb{R}^n

Your answer in part (b) should be expressed in terms of $n$.

Exercise 5 (A Metric Identity)

Prove that for any $\mathbf{v}, \mathbf{w} \in \mathbb{R}^n$,

\|\mathbf{v} + \mathbf{w}\|^2 + \|\mathbf{v} - \mathbf{w}\|^2 = 2\|\mathbf{v}\|^2 + 2\|\mathbf{w}\|^2.

Exercise 6 (A First Clustering Computation)

Five customers rate two drinks on a scale from $1$ to $5$. The first entry records the rating for tea and the second the rating for coffee:

\mathbf{x}_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \quad \mathbf{x}_2 = \begin{bmatrix} 1 \\ 4 \end{bmatrix}, \quad \mathbf{x}_3 = \begin{bmatrix} 2 \\ 3 \end{bmatrix}, \quad \mathbf{x}_4 = \begin{bmatrix} 4 \\ 1 \end{bmatrix}, \quad \mathbf{x}_5 = \begin{bmatrix} 4 \\ 2 \end{bmatrix}.

Use $K = 2$ clusters with initial centres

\mathbf{c}_1 = \mathbf{x}_2, \qquad \mathbf{c}_2 = \mathbf{x}_4.

(a) Assign each data point to the nearer centre.

(b) Compute the updated centre of each cluster.

(c) Reassign the five data points using the updated centres, and determine whether the clustering has stabilised.

Exercise 7 (Three Unit Vectors with Prescribed Dot Products)

Let $\mathbf{a}, \mathbf{b}, \mathbf{c}$ be unit vectors in $\mathbb{R}^n$ such that

\mathbf{a} \cdot \mathbf{b} = 0, \qquad \mathbf{a} \cdot \mathbf{c} = \frac{1}{2}, \qquad \mathbf{b} \cdot \mathbf{c} = \frac{1}{5}.

Using the identity $\|\mathbf{v}\|^2 = \mathbf{v} \cdot \mathbf{v}$, compute:

(a) $\|\mathbf{a} + \mathbf{b}\|^2$

(b) $\|\mathbf{b} - \mathbf{c}\|^2$

(c) $\|\mathbf{a} + \mathbf{b} + \mathbf{c}\|^2$

Exercise 8 (Long Orthogonal Vectors)

Give explicit examples of:

(a) two orthogonal $1000$-vectors with no entries equal to $0$

(b) two orthogonal $999$-vectors with no entries equal to $0$

Exercise 9 (Three Sign Possibilities)

Using

\|\mathbf{v} + \mathbf{w}\|^2 = (\mathbf{v} + \mathbf{w}) \cdot (\mathbf{v} + \mathbf{w}) = \|\mathbf{v}\|^2 + 2(\mathbf{v} \cdot \mathbf{w}) + \|\mathbf{w}\|^2,

give non-zero vectors $\mathbf{v}, \mathbf{w} \in \mathbb{R}^3$ satisfying each of the following:

(a) $\|\mathbf{v} + \mathbf{w}\|^2 = \|\mathbf{v}\|^2 + \|\mathbf{w}\|^2$

(b) $\|\mathbf{v} + \mathbf{w}\|^2 > \|\mathbf{v}\|^2 + \|\mathbf{w}\|^2$

(c) $\|\mathbf{v} + \mathbf{w}\|^2 < \|\mathbf{v}\|^2 + \|\mathbf{w}\|^2$