Lemma. Let f : \mathbb{Z}^+ \to \mathbb{R} satisfy the following two conditions:
- f(mn) = f(m) + f(n) for all m, n \in \mathbb{Z}^+,
- \lim_{n\to\infty} \left( f(n+1) - f(n) \right) = 0.

Then there exists a constant c such that f(n) = c \log n for all n \in \mathbb{Z}^+.
Proof. Define \delta_i = \max\{ |f(n+1) - f(n)| : 2^i \leq n < 2^{i+1} \}. By the second condition, \delta_i \to 0 as i \to \infty. Now fix n \geq 2, and for each k \geq 1 write the binary expansion

n^k = \sum_{j=0}^{L_k} a_j 2^j, \qquad a_j \in \{0, 1\} \quad \text{and} \quad a_{L_k} = 1.

By the first condition, f(2^l m) = l f(2) + f(m); in particular f(2^{L_k}) = L_k f(2). Telescoping over the partial sums of the expansion and using this identity,

\begin{align*}
\left| \frac{f(n^k)}{L_k} - f(2) \right|
&\leq \frac{1}{L_k} \sum_{l=0}^{L_k-1} \left| f\left( \sum_{j=l}^{L_k} a_j 2^j \right) - f\left( \sum_{j=l+1}^{L_k} a_j 2^j \right) \right| \\
&= \frac{1}{L_k} \sum_{l=0}^{L_k-1} \left| f\left( \sum_{j=l}^{L_k} a_j 2^{j-l} \right) - f\left( \sum_{j=l+1}^{L_k} a_j 2^{j-l} \right) \right| \\
&\leq \frac{1}{L_k} \sum_{l=0}^{L_k-1} \delta_{L_k - l} = \frac{1}{L_k} \sum_{i=1}^{L_k} \delta_{i},
\end{align*}

where the middle equality factors 2^l out of both arguments, and the last inequality holds because the two arguments differ by a_l \in \{0, 1\} and the smaller one lies in [2^{L_k - l}, 2^{L_k - l + 1}). Since L_k = \lfloor k \log_2 n \rfloor tends to \infty as k\to\infty, the right-hand side is a Cesàro average of the null sequence (\delta_i) and converges to 0 by the Stolz–Cesàro theorem. So

f(2) = \lim_{k\to\infty} \frac{f(n^k)}{L_k} = \lim_{k\to\infty} \frac{k f(n)}{\lfloor k \log_2 n \rfloor} = \frac{f(n)}{\log_2 n},

and f(n) = c \log n with c = f(2)/\log 2. Finally, when n = 1 we have f(1) = f(1^2) = 2f(1), hence f(1) = 0 = c \log 1, completing the proof. ////
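As a quick numerical illustration (not part of the proof), the following Python sketch checks the key estimate for the model function f = \log_2, for which \delta_i = \log_2(1 + 2^{-i}) in closed form; the function names and the choice n = 3 are mine, for the demo only.

```python
import math

# Model case: f = log2 satisfies f(mn) = f(m) + f(n) and f(n+1) - f(n) -> 0.
def f(n):
    return math.log2(n)

def delta(i):
    # delta_i = max |f(n+1) - f(n)| over 2^i <= n < 2^{i+1}.
    # For f = log2 the max is attained at n = 2^i, since log2(1 + 1/n) decreases in n.
    return math.log2(1 + 2.0 ** (-i))

n = 3
for k in (5, 10, 20, 40):
    L = math.floor(k * math.log2(n))                      # L_k = floor(k log2 n)
    lhs = abs(f(n ** k) / L - f(2))                       # |f(n^k)/L_k - f(2)|
    bound = sum(delta(i) for i in range(1, L + 1)) / L    # Cesàro average of delta_i
    print(f"k={k:2d}  L_k={L:2d}  lhs={lhs:.5f}  bound={bound:.5f}")
# Both columns tend to 0, as the proof's estimate predicts.
```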
This lemma is the key step in proving Faddeev's theorem. To state the theorem, let \Delta^n = \{ (p_1, \cdots, p_n) \in [0, \infty)^n : p_1 + \cdots + p_n = 1 \} and let \mathtt{Prob} = \bigcup_{n=1}^{\infty} \Delta^n denote the set of all probability vectors.
Theorem. (Faddeev [1]) Let H : \mathtt{Prob} \to [0, \infty) satisfy the following four properties:
- H is a symmetric function on each \Delta^n.
- H is continuous on \Delta^2.
- H(\frac{1}{2}, \frac{1}{2}) = 1.
- H(t p_1, (1-t)p_1, p_2, \cdots, p_n) = H(p_1, \cdots, p_n) + p_1 H(t, 1-t) for all (p_1, \cdots, p_n) \in \mathtt{Prob} and t \in [0, 1].

Then H(p_1, \cdots, p_n) = -\sum_{k=1}^{n} p_k \log_2 p_k is the Shannon entropy (with the convention 0 \log 0 = 0).
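Before turning to the proof, here is a quick numerical spot check, in Python, that the Shannon entropy does satisfy these axioms; the helper shannon and the test vectors are illustrative choices, not from the original.

```python
import math

def shannon(p):
    # Shannon entropy in bits, with the convention 0 * log 0 = 0.
    return -sum(x * math.log2(x) for x in p if x > 0)

p, t = [0.5, 0.3, 0.2], 0.25

# Axiom 1 (symmetry): permuting the coordinates leaves H unchanged.
assert abs(shannon(p) - shannon(p[::-1])) < 1e-12
# Axiom 3 (normalization): H(1/2, 1/2) = 1.
assert abs(shannon([0.5, 0.5]) - 1.0) < 1e-12
# Axiom 4 (grouping): H(t p1, (1-t) p1, p2, ..., pn) = H(p1, ..., pn) + p1 H(t, 1-t).
lhs = shannon([t * p[0], (1 - t) * p[0]] + p[1:])
rhs = shannon(p) + p[0] * shannon([t, 1 - t])
assert abs(lhs - rhs) < 1e-12
print("Shannon entropy passes the spot checks.")
```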
Proof. We extend H to the cone \{ \lambda p : p \in \mathtt{Prob}, \lambda > 0 \} by setting

H(\lambda p) = \lambda H(p), \qquad \forall p \in \mathtt{Prob}, \ \lambda > 0.

Then condition 4 can be rephrased as

H(p_0, p_1, p_2, \cdots, p_n) = H(p_0+p_1, p_2, \cdots, p_n) + H(p_0, p_1). \tag{1}

Applying \text{(1)} repeatedly, we find that

H(p_1, \cdots, p_n) = \sum_{k=2}^{n} H(p_1 + \cdots + p_{k-1}, p_k),

which shows that H is continuous on each \Delta^n as well. By the same argument,

\begin{align*}
&H(p_1, \cdots, p_n, q_1, \cdots, q_m) \\
&= H(p_1+\cdots+p_n, q_1, \cdots, q_m) + \sum_{k=2}^{n} H(p_1 + \cdots + p_{k-1}, p_k) \\
&= H(p_1+\cdots+p_n, q_1, \cdots, q_m) + H(p_1, \cdots, p_n), \tag{2}
\end{align*}

which generalizes the property \text{(1)}.

Now define f(n) = H(\frac{1}{n}, \cdots, \frac{1}{n}). Grouping the mn coordinates of (\frac{1}{mn}, \cdots, \frac{1}{mn}) into m blocks of length n and applying \text{(2)} repeatedly, we obtain f(mn) = f(m) + f(n). Note also that f(2) = H(\frac{1}{2}, \frac{1}{2}) = 1 by condition 3, hence f(2^n) = n.

Next we show that f(n+1) - f(n) \to 0 as n\to\infty. In view of the Stolz–Cesàro theorem together with f(2^n)/2^n = n/2^n \to 0, the only candidate for the limit is zero: if f(n+1) - f(n) \to L, then f(n)/n \to L, and the subsequence f(2^n)/2^n \to 0 forces L = 0. Also, by \text{(2)} we easily find that

(n+1)f(n+1) = H(n,1) + nf(n).

Since H \geq 0, the sequence nf(n) is increasing, and

0 \leq nf(n) \leq 2^{\lceil \log_2 n\rceil } f(2^{\lceil \log_2 n\rceil }) = 2^{\lceil \log_2 n\rceil } \lceil \log_2 n\rceil.

This shows that f(n) = \mathcal{O}(\log n), and hence

f(n+1) - f(n) = H\left(\frac{n}{n+1}, \frac{1}{n+1}\right) - \frac{1}{n+1}f(n)

converges as n\to\infty by the continuity of H on \Delta^2. So this limit must be zero, and the lemma gives f(n) = c \log n; since f(2) = 1, in fact f(n) = \log_2 n.

Now if (\frac{a_1}{N}, \cdots, \frac{a_n}{N}) \in \Delta^n, then by \text{(2)},

\begin{align*}
H\left( \frac{a_1}{N}, \cdots, \frac{a_n}{N} \right)
&= f(N) - \sum_{k=1}^{n} \frac{a_k}{N} f(a_k) \\
&= - \sum_{k=1}^{n} \frac{a_k}{N} \log_2 \frac{a_k}{N},
\end{align*}

and hence H equals the Shannon entropy on probability vectors with rational coordinates. The general case then follows by continuity, and the proof is complete.
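As a sanity check of this final step, the Python snippet below evaluates H on a rational probability vector via the proof's formula f(N) - \sum_k \frac{a_k}{N} f(a_k) with f = \log_2, and compares it against the Shannon entropy computed directly; the block sizes a_k are arbitrary choices for the demo.

```python
import math

def shannon(p):
    # Direct Shannon entropy in bits, with 0 * log 0 = 0.
    return -sum(x * math.log2(x) for x in p if x > 0)

a = [3, 5, 2, 6]   # arbitrary positive block sizes (demo values)
N = sum(a)         # so (a_1/N, ..., a_n/N) is a probability vector

# The proof's formula: H(a_1/N, ..., a_n/N) = f(N) - sum_k (a_k/N) f(a_k), f = log2.
via_formula = math.log2(N) - sum(ak / N * math.log2(ak) for ak in a)
direct = shannon([ak / N for ak in a])

print(via_formula, direct)   # the two values agree
assert abs(via_formula - direct) < 1e-12
```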
References.
- [1] A. Rényi, On measures of entropy and information, Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, Vol. 1, University of California Press, 1961, pp. 547–561.