New Theory Cracks Open the Black Box of Deep Learning | Quanta Magazine

November 12, 2017

Highlights the importance of a compression phase for generalization

“Then learning switches to the compression phase. The network starts to shed information about the input data, keeping track of only the strongest features — those correlations that are most relevant to the output label. This happens because, in each iteration of stochastic gradient descent, more or less accidental correlations in the training data tell the network to do different things, dialing the strengths of its neural connections up and down in a random walk. This randomization is effectively the same as compressing the system’s representation of the input data.”
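The random-walk behavior described in the quote can be seen in even the simplest setting. A minimal sketch, not taken from the article: minibatch SGD on a one-dimensional least-squares problem, where each step's gradient is the full-batch gradient plus sampling noise. The weight drifts toward the optimum but keeps jittering around it rather than settling exactly — the stochastic "dialing up and down" that, per the theory, drives compression.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: y = 2*x + observation noise, so the optimal weight is ~2.
x = rng.normal(size=1000)
y = 2.0 * x + 0.1 * rng.normal(size=1000)

w = 0.0           # single weight to learn
lr = 0.05         # learning rate
trajectory = []

for step in range(2000):
    idx = rng.integers(0, len(x), size=8)            # random minibatch
    grad = np.mean((w * x[idx] - y[idx]) * x[idx])   # dL/dw on that batch
    w -= lr * grad                                   # noisy gradient step
    trajectory.append(w)

late = np.array(trajectory[1000:])
# late.mean() sits near 2.0, but late.std() stays strictly positive:
# minibatch noise keeps the weight performing a random walk near the minimum.
```

Because the gradient is estimated from a random minibatch, the update never becomes exactly zero at the optimum; the residual noise is what the quoted passage likens to a diffusion that washes out input details irrelevant to the label.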