"Notably, all DNNs face the issue of overfitting as they learn, which is when performance on one data set increases but the network's performance fails to generalize (often measured by the divergence of performance on training vs testing data sets)."
"In this paper, we consider the implicit bias in a well-known and simple setting, namely learning linear predictors (x->x'w) for binary classification with respect to linearly-separable data"
Hoping that this result extends to deep neural networks is a huge leap of faith, to be honest.
Not really. For separable data there are provable guarantees, e.g. "Gradient Methods Never Overfit On Separable Data": https://arxiv.org/abs/2007.00028
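For anyone curious what the linear setting in the quoted paper looks like in practice, here is a minimal numpy sketch (my own toy construction, not code from either paper): gradient descent on the logistic loss over linearly separable data drives ||w|| to infinity, but the *direction* of w stabilizes toward a max-margin-like separator, and training accuracy saturates rather than overfitting.

```python
import numpy as np

# Toy illustration: gradient descent on mean logistic loss with
# linearly separable data. The iterates grow in norm but converge
# in direction; the classifier separates the data and stays there.
rng = np.random.default_rng(0)

# Separable 2D data: the label is the sign of the first coordinate,
# and points are pushed 1 unit away from the decision boundary.
X = rng.normal(size=(200, 2))
X[:, 0] += np.where(X[:, 0] > 0, 1.0, -1.0)
y = np.sign(X[:, 0])

w = np.zeros(2)
lr = 0.1
for _ in range(20000):
    margins = y * (X @ w)
    # Gradient of the mean logistic loss log(1 + exp(-y * x.w)).
    grad = -(X * (y / (1 + np.exp(margins)))[:, None]).mean(axis=0)
    w -= lr * grad

direction = w / np.linalg.norm(w)          # ~ the max-margin direction
train_acc = np.mean(np.sign(X @ w) == y)   # perfect separation, no blow-up
```

By construction the max-margin separator here is close to the first coordinate axis, so `direction[0]` ends up near 1; the point is that nothing in the loop ever "overfits" even though the optimization runs far past the point of zero training error.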