

The Subtle Magic of Softmax: Unveiling the Gibbs Distribution

Disclaimer: I am not a physicist.

It’s funny how often we use softmax without realizing we’re invoking a physical principle from statistical mechanics. But here’s the kicker: when we softmax our logits, we’re actually computing a Gibbs distribution. Now, why should you care? Let me break it down:

  1. Maximum Entropy Principle: The Gibbs distribution isn’t arbitrary - it’s what you get when you maximize entropy subject to energy constraints.

  2. Temperature Control: Remember that temperature parameter $T$ in the Gibbs formula? It’s usually set to 1 and forgotten. But by exposing it, we get a knob to tune the “decisiveness” of our model. Low T? Sharp decisions. High T? More exploration. It’s like having a built-in simulated annealing mechanism (there’s a short sketch of this right after this list).


  3. Bridge to Physics: Recognizing the Gibbs distribution opens a two-way street between machine learning and statistical physics. We can borrow ideas from physics (like mean field theory) and potentially contribute back.
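To make the temperature knob concrete, here is a minimal NumPy sketch of temperature-scaled softmax. The function name, example logits, and temperature values are my own, chosen purely for illustration.

```python
import numpy as np

def softmax(logits, T=1.0):
    """Temperature-scaled softmax: a Gibbs distribution with E_i = -logit_i and kT = T."""
    z = np.asarray(logits, dtype=float) / T
    z = z - z.max()          # shift for numerical stability; cancels in the ratio
    p = np.exp(z)
    return p / p.sum()

logits = [2.0, 1.0, 0.1]
print(softmax(logits, T=0.1))   # low T: nearly one-hot, "sharp decisions"
print(softmax(logits, T=1.0))   # T = 1: the softmax we usually use and forget about
print(softmax(logits, T=10.0))  # high T: close to uniform, "more exploration"
```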

Let me convince you that the Gibbs distribution is indeed the one with the maximum entropy.

System States

Consider a system that can exist in a discrete set of states labeled by $i$. Each state $i$ has an energy $E_i$.

Probabilities

Let $P_i$ be the probability of the system being in state $i$.

Then, the entropy $S$ of the system is given by:

$$S = -k \sum_i P_i \ln P_i$$

where $k$ is Boltzmann’s constant.
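As a quick sanity check on this formula, here is a small sketch of mine (with $k = 1$, so the entropy comes out in nats): the uniform distribution over four states gives $\ln 4$, while a distribution concentrated on a single state gives zero.

```python
import numpy as np

def entropy(p, k=1.0):
    """S = -k * sum_i P_i ln P_i, with the convention 0 ln 0 = 0."""
    p = np.asarray(p, dtype=float)
    nz = p[p > 0]                      # drop zero-probability states
    return -k * np.sum(nz * np.log(nz))

print(entropy([0.25, 0.25, 0.25, 0.25]))  # uniform over 4 states: ln 4 ≈ 1.386
print(entropy([1.0, 0.0, 0.0, 0.0]))      # certain outcome: 0.0
```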

Constraints

The sum of all probabilities must equal 1.

$$\sum_i P_i = 1$$

The average energy $\langle E \rangle$ of the system is fixed.

$$\sum_i P_i E_i = \langle E \rangle$$

Let’s maximize the entropy $S$ with respect to the probabilities $\{P_i\}$ while satisfying the two constraints.

We are going to use Lagrange multipliers $\alpha$ and $\beta$ to incorporate the constraints into the optimization.

Lagrangian Function

$$L = -k \sum_i P_i \ln P_i - \alpha \left( \sum_i P_i - 1 \right) - \beta \left( \sum_i P_i E_i - \langle E \rangle \right)$$

We need to find the probabilities $P_i$ that maximize $L$. So we simply take the partial derivative of $L$ with respect to each $P_i$ and set it to zero:

$$\frac{\partial L}{\partial P_i} = 0$$

$$\frac{\partial L}{\partial P_i} = -k \left( \ln P_i + 1 \right) - \alpha - \beta E_i$$

$$-k \left( \ln P_i + 1 \right) - \alpha - \beta E_i = 0$$

Rewriting the equation,

$$-k \left( \ln P_i + 1 \right) = \alpha + \beta E_i$$

Divide both sides by $-k$,

$$\ln P_i + 1 = -\frac{\alpha}{k} - \frac{\beta}{k} E_i$$

$$\ln P_i = -1 - \frac{\alpha}{k} - \frac{\beta}{k} E_i$$

Exponentiate both sides to solve for $P_i$,

$$P_i = e^{-1 - \frac{\alpha}{k}} e^{-\frac{\beta}{k} E_i}$$

Define a constant $A$,

$$A = e^{-1 - \frac{\alpha}{k}}$$

So,

$$P_i = A e^{-\frac{\beta}{k} E_i}$$

Use the normalization constraint to solve for $A$:

$$\sum_i P_i = \sum_i A e^{-\frac{\beta}{k} E_i} = 1$$

Therefore,

$$A = \frac{1}{\sum_i e^{-\frac{\beta}{k} E_i}} = \frac{1}{Z}$$

where $Z$ is the partition function:

$$Z = \sum_i e^{-\frac{\beta}{k} E_i}$$

Substitute $A = \frac{1}{Z}$ back into the expression for $P_i$:

$$P_i = \frac{e^{-\frac{\beta}{k} E_i}}{Z}$$

This is the Gibbs (Boltzmann) distribution.
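Here is a small numerical check of that claim, a sketch of my own with made-up energies and $k = 1$: maximize the entropy directly under the two constraints and compare the result to the closed-form Gibbs distribution.

```python
import numpy as np
from scipy.optimize import minimize

E = np.array([0.0, 1.0, 2.0, 5.0])    # energies of four states (arbitrary values)
kT = 1.5                              # k*T in the same units as E

# Closed form: Gibbs distribution and the mean energy it implies
gibbs = np.exp(-E / kT)
gibbs /= gibbs.sum()
E_avg = gibbs @ E

# Numerical: maximize entropy subject to sum(P) = 1 and <E> = E_avg
neg_entropy = lambda p: np.sum(p * np.log(p))
constraints = [{"type": "eq", "fun": lambda p: p.sum() - 1.0},
               {"type": "eq", "fun": lambda p: p @ E - E_avg}]
res = minimize(neg_entropy, x0=np.full(4, 0.25),
               bounds=[(1e-9, 1.0)] * 4, constraints=constraints)

print(np.round(res.x, 4))   # numerically optimal probabilities...
print(np.round(gibbs, 4))   # ...match the Gibbs distribution
```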

Relationship Between $\beta$ and Temperature

In thermodynamics, the Lagrange multiplier $\beta$ associated with the energy constraint is related to the inverse temperature. Specifically:

$$\frac{\beta}{k} = \frac{1}{k T}$$

So,

$$\beta = \frac{1}{T}$$

Substitute back into the expression for $P_i$ and $Z$:

$$P_i = \frac{e^{-E_i / k T}}{Z}$$

And

$$Z = \sum_i e^{-E_i / k T}$$

By maximizing the entropy $S$ with respect to the probabilities $P_i$ under the constraints of normalization and fixed average energy, we derive the Gibbs distribution:

$$P_i = \frac{e^{-E_i / k T}}{\sum_j e^{-E_j / k T}}$$
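To read this back in machine-learning terms (my own identification, consistent with the intro): set $E_i = -z_i$ for logits $z_i$ and absorb Boltzmann’s constant into the temperature by taking $k = 1$. Then the Gibbs distribution is exactly temperature-scaled softmax,

$$P_i = \frac{e^{z_i / T}}{\sum_j e^{z_j / T}} = \mathrm{softmax}(z / T)_i,$$

and $T = 1$ recovers the standard softmax.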
