 
Title: Hidden convexity in linear neural networks
Abstract:
Training a neural network means minimizing a loss function that is nonconvex in the network’s weights. Despite this nonconvexity, when the optimization converges to a local minimum, that minimum is often close to globally optimal. In optimization, such a transfer from local to global properties is usually a consequence of convexity, which neural networks seem to lack. Or is it merely hidden? There are two sources of nonconvexity in neural networks: 1) the nonlinear activation functions and 2) the multilinear product of the weight matrices.
Interestingly, recent research has shown that the second source alone, when paired with a mean squared error loss, does not create local minima that are not global. Although this result is promising, the intricacy of its proof limits its generalization to richer models, such as those with nonlinear activation functions or other loss structures. In this talk, we reveal the convexity hidden in the problem and show how it allows for a simpler and more insightful proof. By exposing this underlying structure, we aim to open the door to recognizing which types of models are more likely to train well and to extend this understanding to other machine learning architectures.
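As a small illustration of the second source of nonconvexity, consider a toy case chosen here for exposition (it is not taken from the talk): a two-layer scalar linear network fitted to the single data point x = y = 1 with a squared error loss. The symbols w_1, w_2 and p are notation assumed for this sketch.

```latex
% Toy example (assumed for illustration): scalar two-layer linear network, data x = y = 1.
\[
  f(w_1, w_2) = (1 - w_2 w_1)^2,
  \qquad
  \nabla^2 f(0, 0) = \begin{pmatrix} 0 & -2 \\ -2 & 0 \end{pmatrix}.
\]
% The Hessian at the origin has eigenvalues +2 and -2, so f is nonconvex in the weights.
\[
  g(p) = (1 - p)^2, \qquad g''(p) = 2 > 0, \qquad p = w_2 w_1 .
\]
% Written in the product variable p, the same loss is convex.
```

In this toy case the loss is nonconvex in (w_1, w_2) yet convex in the product p, and its only critical point that is not a global minimum is the saddle at the origin. Whether this is the form of hidden convexity the talk exploits is not stated in the abstract.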
The seminar will take place in Room S08 at the Faculty of Sciences.