A few months ago I made a post about Randomized Prior Functions for Deep Reinforcement Learning, where I showed how to implement the training procedure in PyTorch and how to extract the model uncertainty from them.

Using the same code showed earlier, these animations below show the training of an ensemble of 40 models with 2-layer MLP and 20 hidden units in different settings. These visualizations are really nice to understand what are the convergence differences when using or not bootstrap or randomized priors.

### Naive Ensemble

This is a training session without bootstrapping data or adding a randomized prior, it’s just a naive ensembling:

### Ensemble with Randomized Prior

This is the ensemble but with the addition of the randomized prior (MLP with the same architecture, with random weights and fixed):

$$Q_{\theta_k}(x) = f_{\theta_k}(x) + p_k(x)$$

The final model \(Q_{\theta_k}(x)\) will be the k model of the ensemble that will fit the function \(f_{\theta_k}(x)\) with an untrained prior \(p_k(x)\):

### Ensemble with Randomized Prior and Bootstrap

This is a ensemble with the randomized prior functions and data bootstrap:

### Ensemble with a fixed prior and bootstrapping

This is an ensemble with a fixed prior (Sin) and bootstrapping: