Negative entropy, its properties, and applications in machine learning.
Background
Notation: Denote the standard simplex in $\mathbb{R}^n$ by $\Delta_n = \{x \in \mathbb{R}^n : \sum_{i=1}^n x_i = 1,\ x_i \ge 0\}$.
Strongly convex property
We consider the negative entropy defined as
$$\phi(x) = \sum_{i=1}^n x_i \log x_i, \qquad x \in \Delta_n.$$
Theorem 1. The negative entropy $\phi$ is strongly convex on $\Delta_n$ in the $\ell_1$-norm with convexity parameter 1, i.e., for any $x, y \in \Delta_n$, the following inequality holds:
$$\phi(y) \ge \phi(x) + \langle \nabla \phi(x), y - x \rangle + \frac{1}{2}\|y - x\|_1^2.$$
Proof. First note that $(\nabla \phi(x))_i = 1 + \log x_i$ and $\nabla^2 \phi(x) = \operatorname{diag}(1/x_1, \dots, 1/x_n)$, and thus
$$\langle \nabla^2 \phi(x) h, h \rangle = \sum_{i=1}^n \frac{h_i^2}{x_i}.$$
Now we shall find
$$\min_{x \in \Delta_n} \sum_{i=1}^n \frac{h_i^2}{x_i}.$$
Assume w.l.o.g. that $h_i \neq 0$ for all $i$; then the objective blows up on the boundary of $\Delta_n$, so the minimum is attained at an interior point and only the equality constraint $\sum_{i=1}^n x_i = 1$ is active.
According to Proposition 1, the optimal $x^*$ and the optimal dual multiplier $\lambda^*$ satisfy
$$-\frac{h_i^2}{(x_i^*)^2} = \lambda^* \quad \Longrightarrow \quad x_i^* = \frac{|h_i|}{\sqrt{\lambda^*}} = \frac{|h_i|}{\|h\|_1};$$
in the first step, we have absorbed the minus sign into $\lambda^*$ since it is a free variable; the second step follows from the facts that $x_i^* > 0$ and $\sum_{i=1}^n x_i^* = 1$. Substituting $x^*$ back into the objective gives
$$\min_{x \in \Delta_n} \sum_{i=1}^n \frac{h_i^2}{x_i} = \sum_{i=1}^n \frac{h_i^2 \|h\|_1}{|h_i|} = \|h\|_1^2, \quad \text{hence} \quad \langle \nabla^2 \phi(x) h, h \rangle \ge \|h\|_1^2 \ \ \forall x \in \Delta_n.$$
Taking $h = y - x$ along with the exact second-order Taylor expansion of $\phi$ around $x$ (whose remainder is evaluated at some $z$ on the segment between $x$ and $y$, so $z \in \Delta_n$), we get
$$\phi(y) = \phi(x) + \langle \nabla \phi(x), y - x \rangle + \frac{1}{2}\langle \nabla^2 \phi(z)(y - x), y - x \rangle \ge \phi(x) + \langle \nabla \phi(x), y - x \rangle + \frac{1}{2}\|y - x\|_1^2. \qquad \square$$
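As a quick numerical sanity check (not part of the proof), the inequality of Theorem 1 can be verified on random pairs of simplex points; below is a minimal NumPy sketch with illustrative helper names.

```python
import numpy as np

def neg_entropy(x):
    """Negative entropy phi(x) = sum_i x_i log x_i."""
    return np.sum(x * np.log(x))

def grad_neg_entropy(x):
    """Gradient: (nabla phi(x))_i = 1 + log x_i."""
    return 1.0 + np.log(x)

rng = np.random.default_rng(0)
for _ in range(1000):
    # Draw two random points in the (relative interior of the) simplex.
    x = rng.random(5) + 1e-6; x /= x.sum()
    y = rng.random(5) + 1e-6; y /= y.sum()
    lhs = neg_entropy(y)
    rhs = (neg_entropy(x) + grad_neg_entropy(x) @ (y - x)
           + 0.5 * np.linalg.norm(y - x, 1) ** 2)
    assert lhs >= rhs - 1e-12, (lhs, rhs)
print("strong convexity inequality holds on all samples")
```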
Proposition 1. Let $x^*$ be a local minimum of a differentiable function $f : \mathbb{R}^n \to \mathbb{R}$ subject to the linear equality constraints
$$Ax = b,$$
where $A$ is an $m \times n$ matrix with full row rank ($m \le n$) and $b \in \mathbb{R}^m$. Then there exists a vector of multipliers $\lambda^* \in \mathbb{R}^m$ such that
$$\nabla f(x^*) = A^\top \lambda^*.$$
Proof. Let $\{z_1, \dots, z_{n-m}\}$ be a basis of the null space of $A$ and $Z = [z_1, \dots, z_{n-m}] \in \mathbb{R}^{n \times (n-m)}$, where $Az_i = 0$, $i = 1, \dots, n-m$. We observe that $A(x^* + Zu) = Ax^* = b$ for any $u \in \mathbb{R}^{n-m}$, and let $g(u) = f(x^* + Zu)$. Since $x^*$ is a local minimum of $f$, $u = 0$ is a local minimum of $g$, and therefore
$$\nabla g(0) = Z^\top \nabla f(x^*) = 0,$$
i.e., $\nabla f(x^*)$ is orthogonal to the null space of $A$, hence it must stay in the row space of $A$: $\nabla f(x^*) = A^\top \lambda^*$ for some $\lambda^* \in \mathbb{R}^m$, which completes the proof. $\square$
Remark: the last step uses the orthogonal decomposition $\mathbb{R}^n = \operatorname{null}(A) \oplus \operatorname{range}(A^\top)$, i.e., the orthogonal complement of the null space of $A$ is exactly its row space.
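To make Proposition 1 concrete for the problem used in the proof of Theorem 1, i.e. $f(x) = \sum_i h_i^2 / x_i$ with $A = \mathbf{1}^\top$ and $b = 1$, the sketch below checks numerically that the closed-form minimizer makes $\nabla f(x^*)$ a constant vector (a multiple of $A^\top = \mathbf{1}$) and attains the value $\|h\|_1^2$; variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
h = rng.normal(size=5)  # nonzero entries almost surely

# Closed-form minimizer from the proof of Theorem 1: x*_i = |h_i| / ||h||_1.
x_star = np.abs(h) / np.linalg.norm(h, 1)

# f(x) = sum_i h_i^2 / x_i has gradient (nabla f(x))_i = -h_i^2 / x_i^2.
grad = -h**2 / x_star**2

# Proposition 1 with A = 1^T: nabla f(x*) = lambda* * 1, a constant vector.
assert np.allclose(grad, grad[0])
# The optimal value equals ||h||_1^2, as derived above.
assert np.isclose(np.sum(h**2 / x_star), np.linalg.norm(h, 1) ** 2)
print("lambda* =", grad[0], " ||h||_1^2 =", np.linalg.norm(h, 1) ** 2)
```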
Applications
Differentiable dynamic programming
The $\max$ operator can be formulated as
$$\max(x) = \max_{1 \le i \le n} x_i = \max_{q \in \Delta_n} \langle q, x \rangle.$$
However, it is not a smooth operator w.r.t. $x$, and we can smooth it with the negative entropy to yield
$$\max{}_{\Omega}(x) = \max_{q \in \Delta_n} \langle q, x \rangle - \gamma \sum_{i=1}^n q_i \log q_i = \gamma \log \sum_{i=1}^n e^{x_i / \gamma},$$
where the objective on the r.h.s. is a $\gamma$-strongly concave function w.r.t. $q$ (by Theorem 1); more details are available in the paper.
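A short NumPy sketch of this smoothed max (using the log-sum-exp closed form above, with a standard max-shift for numerical stability); its gradient is the softmax distribution $q_i^* \propto e^{x_i/\gamma}$, which approaches a one-hot argmax as $\gamma \to 0$. Function names are illustrative.

```python
import numpy as np

def smoothed_max(x, gamma):
    """Entropy-smoothed max: gamma * log(sum_i exp(x_i / gamma))."""
    z = x / gamma
    z_max = z.max()  # shift for numerical stability
    return gamma * (z_max + np.log(np.sum(np.exp(z - z_max))))

def smoothed_argmax(x, gamma):
    """Gradient of the smoothed max: the softmax of x / gamma."""
    z = x / gamma
    e = np.exp(z - z.max())
    return e / e.sum()

x = np.array([1.0, 2.0, 1.5])
for gamma in (1.0, 0.1, 0.01):
    print(gamma, smoothed_max(x, gamma), smoothed_argmax(x, gamma))
# As gamma -> 0, smoothed_max(x, gamma) -> max(x) and the gradient
# concentrates on the argmax coordinate.
```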
Smoothed Wasserstein distance
Recall the Wasserstein distance between two measures $\mu \in \Delta_m$ and $\nu \in \Delta_n$:
$$W(\mu, \nu) = \min_{P \in U(\mu, \nu)} \langle P, C \rangle, \qquad U(\mu, \nu) = \{P \in \mathbb{R}_+^{m \times n} : P\mathbf{1}_n = \mu,\ P^\top \mathbf{1}_m = \nu\},$$
where $C \in \mathbb{R}_+^{m \times n}$ is the cost matrix; the objective on the r.h.s. is a convex function w.r.t. $P$, and we can transform it into a $\gamma$-strongly convex one with the negative entropy:
$$W_\gamma(\mu, \nu) = \min_{P \in U(\mu, \nu)} \langle P, C \rangle + \gamma \sum_{i,j} P_{ij} \log P_{ij}.$$
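The entropy-regularized problem above is commonly solved with Sinkhorn iterations, which alternately rescale a Gibbs kernel to match the two marginals; the following is a minimal sketch, not taken from the text, with illustrative names.

```python
import numpy as np

def sinkhorn(C, mu, nu, gamma, n_iters=2000):
    """Solve min_{P in U(mu, nu)} <P, C> + gamma * sum_ij P_ij log P_ij
    by alternately matching the row and column marginals."""
    K = np.exp(-C / gamma)          # Gibbs kernel
    u = np.ones_like(mu)
    for _ in range(n_iters):
        v = nu / (K.T @ u)          # match column marginals
        u = mu / (K @ v)            # match row marginals
    P = u[:, None] * K * v[None, :] # entropic transport plan
    return P, np.sum(P * C)         # plan and its transport cost <P, C>

rng = np.random.default_rng(2)
mu = np.full(4, 1 / 4)
nu = np.full(6, 1 / 6)
C = rng.random((4, 6))
P, cost = sinkhorn(C, mu, nu, gamma=0.05)
print("marginal errors:",
      np.abs(P.sum(1) - mu).max(), np.abs(P.sum(0) - nu).max())
print("transport cost of the entropic plan:", cost)
```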