Legendre transformation, its geometric intuition, and applications in machine learning.

Legendre conjugate

For a strictly convex function $f:\Omega\subseteq\mathbb{R}^n\to\mathbb{R}$ with $\nabla^2 f \succ 0$, its Legendre conjugate is defined as

$$f^*(y)=\max_{x\in\Omega}\ \langle y,x\rangle - f(x),\qquad y\in\mathbb{R}^n.$$

Since $x\mapsto \langle y,x\rangle - f(x)$ is a strictly concave function of $x$ (a linear function plus a strictly concave function), there is a unique point $x^*$ that attains the maximum value

$$\langle y,x^*\rangle - f(x^*)\qquad\text{with}\qquad \nabla f(x^*)=y.$$

Hence, we can read $f^*(y)$ as answering the question: which point $x\in\Omega$ has a gradient $\nabla f(x)$ equal to the given $y$?
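As a sanity check, here is a minimal numerical sketch of this definition. The setup is our own illustrative choice, not from the source: $f(x)=x^2$ on $\Omega=[-10,10]$, whose conjugate is known in closed form, $f^*(y)=y^2/4$, attained at $x^*=y/2$ where $\nabla f(x^*)=y$.

```python
import numpy as np

# Illustrative choice (ours, not the source's): f(x) = x^2 on Omega = [-10, 10].
# Closed form for comparison: f*(y) = y^2 / 4, attained at x* = y / 2.
def f(x):
    return x ** 2

def grad_f(x):
    return 2 * x

xs = np.linspace(-10.0, 10.0, 200_001)  # dense grid over Omega

def conjugate(y):
    """Evaluate f*(y) = max_x <y, x> - f(x) by brute-force grid search."""
    values = y * xs - f(xs)
    i = np.argmax(values)
    return values[i], xs[i]

for y in (-3.0, 0.5, 2.0):
    f_star, x_star = conjugate(y)
    print(f"y={y:+.2f}  f*(y)={f_star:.4f}  closed form={y**2 / 4:.4f}  "
          f"grad f(x*)={grad_f(x_star):+.4f}")  # grad f(x*) should equal y
```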

Geometric explanation

Let us approach the Legendre conjugate from a geometric perspective to gain more intuition.

Recall that the epigraph of a convex function $f$ is

$$O=\{(x,z)\mid x\in\Omega,\ z\ge f(x)\}.$$

How can we describe its boundary $\partial O$?

Of course, given an $x\in\Omega$ we have $(x,\,z=f(x))\in\partial O$, which represents $\partial O$ in the $x$-coordinate.

Let us consider the tangent hyperplane passing through a fixed point $(x_0, f(x_0))\in\partial O$:

$$z=\langle\nabla f(x_0),\,x-x_0\rangle + f(x_0).$$

The key point here is that there is only one point of $\partial O$ that admits a tangent hyperplane with slope $\nabla f(x_0)$, since $\nabla f(x)$ is a strictly monotone increasing map of $x$ ($\nabla^2 f \succ 0$). Therefore, we may also represent the boundary $\partial O$ with the parameter $y=\nabla f(x)$, i.e., represent $\partial O$ in the $y$-coordinate. In summary, we can identify a point $(x,z)\in\partial O$ either by its $x$-coordinate or by its $y$-coordinate ($y=\nabla f(x)$). Namely, we have exhibited a dual coordinate system.
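As a concrete illustration (the choice $f(x)=x^2$ is ours), take $f(x)=x^2$ on $\Omega=\mathbb{R}$. A boundary point $(x, x^2)\in\partial O$ carries the slope $y=f'(x)=2x$, and inverting this map gives $x=y/2$, so the same point reads

$$(x,z)=\Bigl(\frac{y}{2},\,\frac{y^2}{4}\Bigr)$$

in the $y$-coordinate. Either coordinate pins the point down uniquely.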

How can we write the boundary $\partial O$ in terms of the $y$-coordinate? Note that any hyperplane with a given slope $\nabla f(x_0)$ passing through $\partial O$ has a unique intercept on the $z$-axis. These hyperplanes admit one common equation:

$$z=\langle\nabla f(x_0),\,p-x\rangle + f(x),$$

where $(x, f(x))$ is an intersection point between the hyperplane and $\partial O$ (there may be multiple intersection points, and any of them is valid). Among these parallel hyperplanes, $z=\langle\nabla f(x_0),\,p-x_0\rangle+f(x_0)$ is the unique one that attains the minimal $z$-intercept value

$$-\langle\nabla f(x_0),\,x_0\rangle + f(x_0).$$

Remark: This is because $f$ is a convex function and $f(x)\ge\langle\nabla f(x_0),\,x-x_0\rangle+f(x_0)$, which, after subtracting $\langle\nabla f(x_0),\,x\rangle$ from both sides, yields

$$-\langle\nabla f(x_0),\,x\rangle + f(x)\ \ge\ -\langle\nabla f(x_0),\,x_0\rangle + f(x_0).$$

We can also picture this. The hyperplane $z=\langle\nabla f(x_0),\,p-x_0\rangle+f(x_0)$ has only one intersection with $\partial O$, whereas the other hyperplanes $z=\langle\nabla f(x_0),\,p-x\rangle+f(x)$ with $x\neq x_0$ all have multiple intersections with $\partial O$; we can move any of the latter to obtain the former by decreasing its $z$-axis intercept.

The tangent hyperplane equation $z=\langle\nabla f(x_0),\,x-x_0\rangle+f(x_0)$ can also be written as

$$\left\langle \begin{bmatrix}\nabla f(x_0)\\ -1\end{bmatrix},\ \begin{bmatrix}x\\ z\end{bmatrix}-\begin{bmatrix}x_0\\ f(x_0)\end{bmatrix}\right\rangle = 0.$$

This form reveals that $[x, z]^\top\in\mathbb{R}^{n+1}$ is an arbitrary point in the hyperplane and $[\nabla f(x_0), -1]^\top\in\mathbb{R}^{n+1}$ is the normal vector perpendicular to the hyperplane at the point $[x_0, f(x_0)]^\top$.
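To see this explicitly, expand the inner product:

$$\left\langle \begin{bmatrix}\nabla f(x_0)\\ -1\end{bmatrix},\ \begin{bmatrix}x\\ z\end{bmatrix}-\begin{bmatrix}x_0\\ f(x_0)\end{bmatrix}\right\rangle = \langle\nabla f(x_0),\,x-x_0\rangle-\bigl(z-f(x_0)\bigr)=0,$$

which rearranges exactly to $z=\langle\nabla f(x_0),\,x-x_0\rangle+f(x_0)$.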

So we can represent $(x_0,\,z=f(x_0))\in\partial O$ with $y:=\nabla f(x_0)$ as

$$x_0=\operatorname*{argmin}_{x}\ -\langle y,x\rangle+f(x)=\operatorname*{argmax}_{x}\ \langle y,x\rangle-f(x).$$

In other words, finding the point $x\in\Omega$ whose gradient $\nabla f(x)$ equals the given $y$ is equivalent to solving

$$\max_{x\in\Omega}\ \langle y,x\rangle-f(x),\qquad y\in\mathbb{R}^n.$$

Remark: $\langle y,\,0-x\rangle$ describes the change in the $z$ value of the hyperplane $z=\langle y,\,p-x\rangle+f(x)$ as $p$ goes from $x$ to $0$. Given the slope $y$, the hyperplane passing through the point $(x, f(x))$ with $\nabla f(x)=y$ achieves the minimum intercept on the $z$-axis.

Interpreting the Legendre transformation in one sentence:

$$\max_{x\in\Omega}\ \langle y,x\rangle-f(x),\qquad y\in\mathbb{R}^n,$$

is, after flipping the sign, the same optimization as

$$\min_{x\in\Omega}\ -\langle y,x\rangle+f(x),\qquad y\in\mathbb{R}^n,$$

whose optimal value is the minimal $z$-intercept that a hyperplane

$$z=\langle y,\,p-x\rangle+f(x),\qquad p\in\mathbb{R}^n,$$

could attain; hence $f^*(y)$ is the negative of that minimal intercept.
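Returning to the illustrative example $f(x)=x^2$ (our choice): for a fixed slope $y$, the lines $z=y(p-x)+x^2$ have $z$-intercepts $-yx+x^2$, and

$$\min_{x}\ -yx+x^2=-\frac{y^2}{4}=-f^*(y),\qquad\text{attained at } x=\frac{y}{2},$$

so the conjugate value is read off, up to sign, from the supporting line's intercept.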

Properties

  • $f^*(y)=\max_{x\in\Omega}\ \langle y,x\rangle-f(x)$ is a convex function of $y$, since it is a pointwise maximum of functions that are linear in $y$.
  • $f^{**}=f$.
  • $\nabla f^*=(\nabla f)^{-1}$, or equivalently $(\nabla f^*)^{-1}=\nabla f$, if $\nabla f$ is invertible.
  • $f(x)+f^*(y)\ge\langle y,x\rangle$ (the Fenchel–Young inequality); see the numerical sketch below.
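A minimal numerical sketch of these properties, for the illustrative choice (ours) $f(x)=\tfrac12\lVert x\rVert^2$, which is self-conjugate, so every quantity has a closed form:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative choice (ours): f(x) = 0.5 * ||x||^2 is self-conjugate,
# i.e. f* = f (hence f** = f is immediate), with grad f(x) = x and
# grad f*(y) = y, so grad f* is indeed the inverse map of grad f.
def f(x):
    return 0.5 * np.dot(x, x)

f_star = f          # closed-form conjugate of this particular f

def grad_f(x):
    return x

def grad_f_star(y):
    return y

for _ in range(3):
    x = rng.standard_normal(4)
    y = rng.standard_normal(4)
    # Fenchel-Young: f(x) + f*(y) >= <y, x>, equality iff y = grad f(x).
    assert f(x) + f_star(y) >= np.dot(y, x)
    assert np.isclose(f(x) + f_star(grad_f(x)), np.dot(grad_f(x), x))
    # grad f* inverts grad f.
    assert np.allclose(grad_f_star(grad_f(x)), x)
print("all property checks passed")
```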

Applications

Smoothed max operator.
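The source names only the topic, so what follows is a sketch of one standard instance: conjugating with a negative-entropy regularizer over the probability simplex yields $\tau\,\mathrm{logsumexp}(x/\tau)$ as a smooth approximation of $\max(x)$, whose gradient is the softmax. A minimal NumPy sketch (the temperature $\tau$ and the test vector are illustrative choices of ours):

```python
import numpy as np

def smoothed_max(x, tau=0.1):
    """Smoothed max via log-sum-exp: tau * log(sum(exp(x / tau))).

    This arises as a Legendre-conjugate construction with a
    negative-entropy regularizer on the simplex; as tau -> 0 it
    approaches max(x) from above, and its gradient is softmax(x / tau).
    """
    z = x / tau
    z_max = z.max()                      # shift for numerical stability
    return tau * (z_max + np.log(np.exp(z - z_max).sum()))

x = np.array([1.0, 2.0, 3.0])
for tau in (1.0, 0.1, 0.01):
    print(f"tau={tau:5.2f}  smoothed_max={smoothed_max(x, tau):.6f}  "
          f"max={x.max():.6f}")
```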