Theorem.
Let \mathcal{H} be a Hilbert space equipped with the inner product \langle \cdot, \cdot \rangle, and let w, v_1, \ldots, v_n \in \mathcal{H} with v_1, \ldots, v_n linearly independent. If
G(w_1, \ldots, w_n) = \det [\langle w_i, w_j \rangle]_{i,j=1}^{n}
denotes the Gram determinant, then the following hold:
-
The vector w_{\perp} defined by the formal determinant (expanded along its first row, whose entries are vectors)
w_{\perp} = \frac{1}{G(v_1,\ldots,v_n)}\begin{vmatrix}
w & v_1 & \cdots & v_n \\
\langle v_1, w \rangle & \langle v_1, v_1 \rangle & \cdots & \langle v_1, v_n \rangle \\
\vdots & \vdots & \ddots & \vdots \\
\langle v_n, w \rangle & \langle v_n, v_1 \rangle & \cdots & \langle v_n, v_n \rangle
\end{vmatrix} \tag{1}
is orthogonal to V = \operatorname{span}\{v_1, \ldots, v_n\}; in fact, w - w_{\perp} \in V, so
w = w_{\perp} + (w - w_{\perp})
is the orthogonal decomposition of w into its components in V^{\perp} and V.
-
The squared distance between w and V is given by
\|w_{\perp}\|^2 = \operatorname{dist}(w, V)^2 = \frac{G(w, v_1, \ldots, v_n)}{G(v_1, \ldots, v_n)}. \tag{2}
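As an illustration (a worked special case added here, not part of the original statement), take n = 1 with v_1 \neq 0. Expanding the 2 \times 2 determinant in (1) along its first row recovers the familiar one-step Gram-Schmidt formula, and (2) becomes an explicit distance formula:
\begin{align*}
w_{\perp} &= \frac{1}{\langle v_1, v_1 \rangle}\begin{vmatrix} w & v_1 \\ \langle v_1, w \rangle & \langle v_1, v_1 \rangle \end{vmatrix} = w - \frac{\langle v_1, w \rangle}{\langle v_1, v_1 \rangle} v_1, \\
\operatorname{dist}(w, \operatorname{span}\{v_1\})^2 &= \frac{G(w, v_1)}{G(v_1)} = \frac{\langle w, w \rangle \langle v_1, v_1 \rangle - \langle w, v_1 \rangle \langle v_1, w \rangle}{\langle v_1, v_1 \rangle}.
\end{align*}
In particular, since the left-hand side is nonnegative, this special case already contains the Cauchy-Schwarz inequality |\langle w, v_1 \rangle|^2 \leq \langle w, w \rangle \langle v_1, v_1 \rangle.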
Proof. Extend the notation by letting
G(\{v_i\}_{i=1}^{n}, \{w_i\}_{i=1}^{n}) = \det [\langle v_i, w_j \rangle]_{i,j=1}^{n},
and then define the linear functional \ell on \mathcal{H} by
\ell(u) = G(\{u, v_1, \ldots, v_n\}, \{w, v_1, \ldots, v_n\}).
By the Riesz representation theorem, there exists h \in \mathcal{H} such that \ell(u) = \langle h, u \rangle for all u \in \mathcal{H}. Since \ell(v_i) = 0 for all i = 1, \ldots, n (the determinant then has two identical rows), it follows that h is orthogonal to V. Moreover, expanding the determinant defining \ell(u) along its first row, we see that
\begin{align*}
\ell(u)
&= G(v_1, \ldots, v_n) \langle u, w \rangle \\
&\quad + \sum_{i=1}^{n} (-1)^i G(\{v_1, \ldots, v_n\}, \{w, v_1, \ldots, \widehat{v_i}, \ldots, v_n\}) \langle u, v_i \rangle,
\end{align*}
where \widehat{v_i} indicates that v_i is omitted, and so,
\begin{align*}
h
&= G(v_1, \ldots, v_n) w \\
&\quad + \sum_{i=1}^{n} (-1)^i G(\{v_1, \ldots, v_n\}, \{w, v_1, \ldots, \widehat{v_i}, \ldots, v_n\}) v_i,
\end{align*}
which is precisely the formal determinant in (1). This also shows that h is a linear combination of w, v_1, \ldots, v_n in which the coefficient of w is G(v_1, \ldots, v_n), hence
w_{\perp} = \frac{h}{G(v_1, \ldots, v_n)} \in w + V.
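This, together with w_{\perp} \perp V, proves the first item of the theorem. For the second item, since w - w_{\perp} \in V and w_{\perp} \perp V, the Pythagorean theorem gives \|w - v\|^2 = \|w_{\perp}\|^2 + \|(w - w_{\perp}) - v\|^2 for every v \in V, so \|w_{\perp}\| = \operatorname{dist}(w, V). Moreover, using \langle w_{\perp}, w - w_{\perp} \rangle = 0, \ell(u) = \langle h, u \rangle and \ell(w) = G(w, v_1, \ldots, v_n), we get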
\|w_{\perp}\|^2 = \langle w_{\perp}, w_{\perp} \rangle = \langle w_{\perp}, w \rangle = \frac{\ell(w)}{G(v_1, \ldots, v_n)} = \frac{G(w, v_1, \ldots, v_n)}{G(v_1, \ldots, v_n)}.
This completes the proof. \square
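The construction can also be checked numerically. The sketch below is a minimal illustration under stated assumptions, not part of the original argument: it works in the real space \mathbb{Q}^4 with the standard dot product as the inner product, uses exact rational arithmetic from Python's `fractions` module, and the helper names (`inner`, `det`, `gram`, `perp_component`) are hypothetical. It expands the formal determinant (1) along its first row, as in the proof, and then checks orthogonality, the membership w - w_{\perp} \in V (via a vanishing Gram determinant), and formula (2).

```python
from fractions import Fraction
from typing import List

Vector = List[Fraction]
Matrix = List[List[Fraction]]


def inner(x: Vector, y: Vector) -> Fraction:
    """Standard real dot product (the inner product assumed in this sketch)."""
    return sum((a * b for a, b in zip(x, y)), Fraction(0))


def det(m: Matrix) -> Fraction:
    """Determinant by Laplace expansion along the first row (fine for small n)."""
    if not m:
        return Fraction(1)  # determinant of the empty 0x0 matrix
    total = Fraction(0)
    for j in range(len(m)):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * m[0][j] * det(minor)
    return total


def gram(vs: List[Vector]) -> Fraction:
    """Gram determinant G(v_1, ..., v_n) = det[<v_i, v_j>]."""
    return det([[inner(a, b) for b in vs] for a in vs])


def perp_component(w: Vector, vs: List[Vector]) -> Vector:
    """w_perp from (1): expand the formal determinant along its first row."""
    first_row = [w] + vs  # vector entries w, v_1, ..., v_n
    # Row i of the scalar part: <v_i, w>, <v_i, v_1>, ..., <v_i, v_n>
    lower = [[inner(v, x) for x in first_row] for v in vs]
    h = [Fraction(0)] * len(w)
    for j, x in enumerate(first_row):
        # Signed minor obtained by deleting the first row and column j
        cofactor = (-1) ** j * det([row[:j] + row[j + 1:] for row in lower])
        h = [hk + cofactor * xk for hk, xk in zip(h, x)]
    g = gram(vs)  # nonzero because v_1, ..., v_n are linearly independent
    return [hk / g for hk in h]


def vec(*xs: int) -> Vector:
    return [Fraction(x) for x in xs]


w = vec(1, 2, 3, 4)
vs = [vec(1, 0, 1, 0), vec(0, 1, 1, 1)]
wp = perp_component(w, vs)

print([str(x) for x in wp])                         # ['2/5', '-4/5', '-2/5', '6/5']
print([str(inner(wp, v)) for v in vs])              # ['0', '0']  (w_perp is orthogonal to V)
print(gram([[a - b for a, b in zip(w, wp)]] + vs))  # 0  (w - w_perp lies in V)
print(inner(wp, wp), gram([w] + vs) / gram(vs))     # 12/5 12/5  (formula (2))
```

Exact rationals are used so that the orthogonality and equality checks hold exactly rather than up to floating-point error. Laplace expansion costs time exponential in n, so beyond small examples one would instead compute the projection by solving the normal equations.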