Softmax is a distribution over choices, it maps a vector into the probability simplex that is defined as , where the sum of all elements of the vector must equal 1. Softmax is used a lot in classification and I thought it would be interesting to visualize (when possible, on lower dimensions) the trajectories of individual samples in that simplex as predicted by the network while the network is being trained.

In the animations below you’ll see the trajectories of the sample individual sample (from the test set) over the simplex of 3 classes (dog, cat, horse) from CIFAR-10 and using a simple shallow CNN both with Adam and SGD. Each frame is generated after 10 optimization steps and the video is from 4 epochs with CIFAR-10 dataset with only the 3 aforementioned classes.

Trajectory of a CNN using Adam with LR of 0.001

Trajectory of a CNN using SGD with LR of 0.001 and momentum