Upcoming

1- Likelihood Ratio Trick a.k.a. REINFORCE a.k.a. score function estimator

2- Gumbel-Softmax Estimator

3- Sampling is not differentiable, at least not always.

Why can't we back-propagate through sampling? (math)
Reparameterization trick in VAEs
Need a simple example
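
A possible candidate for that simple example, as a sketch only: estimate d/dmu E[z^2] for z ~ N(mu, sigma^2), whose exact value is 2*mu. The objective z^2, the function names, and the sample sizes are illustrative assumptions, not part of the planned post. The reparameterized estimator writes z = mu + sigma*eps so the gradient can flow through the sample; the score-function (REINFORCE) estimator is included for comparison.

```python
import numpy as np


def reparam_grad(mu, sigma, n, rng):
    """Reparameterized estimate of d/dmu E[z^2], z ~ N(mu, sigma^2)."""
    eps = rng.standard_normal(n)   # noise that does not depend on mu
    z = mu + sigma * eps           # sample is now a deterministic function of mu
    return np.mean(2.0 * z)        # d(z^2)/dmu = 2*z * dz/dmu = 2*z


def score_grad(mu, sigma, n, rng):
    """Score-function (REINFORCE) estimate of the same gradient."""
    z = rng.normal(mu, sigma, n)
    score = (z - mu) / sigma**2    # d/dmu log N(z; mu, sigma^2)
    return np.mean(z**2 * score)


rng = np.random.default_rng(0)
mu, sigma = 1.5, 1.0               # assumed values for illustration
g_rep = reparam_grad(mu, sigma, 200_000, rng)
g_sf = score_grad(mu, sigma, 200_000, rng)
print("exact:", 2 * mu, "reparam:", g_rep, "score-function:", g_sf)
```

Both estimators target the same gradient; in this setup the reparameterized one typically has noticeably lower variance, which is one motivation for the trick.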

4- Entropy, varentropy, cross-entropy

5- Entropix: Entropy based sampling in LLMs

6- Q-learning vs. Value Function Iteration, and Policy Gradient (PG) vs. Hotz-Miller CCP

7- Efficient serving of LLMs and implementations

8- Attention types and implementations

9- Traditional ML and implementations

10- Causal Inference and ML

Softmax to Gibbs