Abstract: As the dimensionality of the data we encounter nowadays keeps growing, classical gradient descent methods need to be accelerated. In the presentation, we will start by reviewing gradient descent and stochastic gradient descent and discussing their theoretical guarantees. After that, optimization algorithms more commonly used in deep learning will be introduced and discussed: gradient descent with momentum, the adaptive gradient method (AdaGrad), root mean square propagation (RMSProp), and adaptive moment estimation (Adam). All of these algorithms will be illustrated with simple examples.
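As a small taste of the kind of illustration the abstract promises, the sketch below (not part of the presentation itself; the objective, hyperparameters, and function names are illustrative assumptions) contrasts a plain gradient descent step with Adam's bias-corrected moment updates on a toy ill-conditioned quadratic.

```python
# Minimal sketch (assumed example, not from the abstract): plain gradient
# descent versus an Adam-style update on f(x) = 0.5 * x^T A x.
import numpy as np

A = np.diag([1.0, 100.0])          # ill-conditioned quadratic objective
grad = lambda x: A @ x             # gradient of f

def gradient_descent(x, lr=1e-2, steps=500):
    # Plain gradient step: x <- x - lr * grad(x)
    for _ in range(steps):
        x = x - lr * grad(x)
    return x

def adam(x, lr=1e-1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    m = np.zeros_like(x)           # first-moment (mean) estimate
    v = np.zeros_like(x)           # second-moment (uncentered variance) estimate
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g ** 2
        m_hat = m / (1 - beta1 ** t)   # bias correction
        v_hat = v / (1 - beta2 ** t)
        x = x - lr * m_hat / (np.sqrt(v_hat) + eps)
    return x

x0 = np.array([1.0, 1.0])
print("GD  :", gradient_descent(x0))
print("Adam:", adam(x0))
```

On this toy problem the per-coordinate scaling in Adam compensates for the very different curvatures along the two axes, which is exactly the issue that motivates the adaptive methods listed above.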