
Isn't double descent explained by the following?

The network contains many more parameters than data points.

Therefore there is an entire region of parameter space that achieves the lowest training error.

A random point somewhere in the middle of that region probably generalises better than a point at the edge.

Once SGD enters this region at its edge, generalisation can keep improving because the randomness will most likely cause it to random-walk towards the interior of the region.

To test this hypothesis, you could run gradient descent with line search instead of SGD; if the hypothesis is correct, you should not see this extra generalisation. If you then add a bit of randomness to the gradient descent, the extra generalisation should reappear. The hypothesis also predicts that the speed at which generalisation improves depends on the batch size.
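A rough sketch of what that experiment could look like, assuming a toy overparameterised net, a simple backtracking line search, and an arbitrary noise scale (all illustrative choices, not from the comment):

    # Sketch: full-batch gradient descent with backtracking line search,
    # optionally with Gaussian noise injected into the descent direction
    # to mimic SGD's randomness. Model and data sizes are placeholders.
    import torch

    torch.manual_seed(0)
    X = torch.randn(64, 10)   # 64 data points
    y = torch.randn(64, 1)

    def make_model():
        # many more parameters than data points
        return torch.nn.Sequential(
            torch.nn.Linear(10, 512), torch.nn.ReLU(), torch.nn.Linear(512, 1))

    def loss_fn(model):
        return torch.nn.functional.mse_loss(model(X), y)

    def flat_grad(model):
        return torch.cat([p.grad.flatten() for p in model.parameters()])

    def step(model, direction, lr):
        # apply lr * direction to the flattened parameter vector
        with torch.no_grad():
            offset = 0
            for p in model.parameters():
                n = p.numel()
                p.add_(lr * direction[offset:offset + n].view_as(p))
                offset += n

    def train(noise_scale=0.0, steps=2000):
        model = make_model()
        for _ in range(steps):
            model.zero_grad()
            loss = loss_fn(model)
            loss.backward()
            d = -flat_grad(model)
            if noise_scale > 0:
                d = d + noise_scale * torch.randn_like(d)
            # backtracking line search: halve lr until the loss decreases
            lr = 1.0
            while lr > 1e-8:
                step(model, d, lr)
                if loss_fn(model) < loss:
                    break
                step(model, d, -lr)   # undo the step and retry smaller
                lr *= 0.5
        return model

    # Under the hypothesis: train(0.0) stops improving validation loss
    # once train loss hits ~0, while train(0.01) keeps drifting inside
    # the zero-loss region and generalisation keeps improving.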



Might be a stupid question, but can we skip the random walk and just pick the middle?


Not a stupid question at all, but one problem is that the boundaries of a zero-train-loss region are not well characterized, and evaluating the validation loss even at a single point is computationally expensive. The centroid of one of these regions might not even lie inside it (e.g. a donut shape, but in higher dimensions; see the toy sketch below). Interesting discussion though -- probably worth a few papers if someone were to investigate further.
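A tiny illustration of that geometric caveat: for points sampled from an annulus ("donut"), the centroid lands in the hole, outside the set itself. The dimensions and radii here are arbitrary assumptions.

    # Centroid of a donut-shaped set is not in the set.
    import numpy as np

    rng = np.random.default_rng(0)
    theta = rng.uniform(0, 2 * np.pi, 10_000)
    r = rng.uniform(0.9, 1.1, 10_000)    # thin ring around radius 1
    points = np.stack([r * np.cos(theta), r * np.sin(theta)], axis=1)

    centroid = points.mean(axis=0)
    print(np.linalg.norm(centroid))      # ~0: centre of the hole,
                                         # nowhere near the ring itself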


This just sounds like Stochastic Weight Averaging, which works quite well: https://arxiv.org/abs/1803.05407
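For reference, PyTorch ships the paper's technique in torch.optim.swa_utils; a minimal sketch, with a placeholder model, data loader, and schedule lengths:

    # Stochastic Weight Averaging via PyTorch's built-in utilities.
    import torch
    from torch.optim.swa_utils import AveragedModel, SWALR, update_bn

    model = torch.nn.Linear(10, 1)                        # stand-in model
    loader = [(torch.randn(32, 10), torch.randn(32, 1))]  # stand-in data
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    swa_model = AveragedModel(model)      # keeps a running weight average
    swa_scheduler = SWALR(optimizer, swa_lr=0.05)

    for epoch in range(20):
        for xb, yb in loader:
            optimizer.zero_grad()
            torch.nn.functional.mse_loss(model(xb), yb).backward()
            optimizer.step()
        if epoch >= 15:                          # start averaging late
            swa_model.update_parameters(model)   # average the SGD iterates
            swa_scheduler.step()

    update_bn(loader, swa_model)   # recompute BatchNorm stats (no-op here,
                                   # since the stand-in model has no BN)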



