DOI: 10.35940/ijitee.B8259.0210421

Bottou, Léon. "Large-scale machine learning with stochastic gradient descent." Proceedings of COMPSTAT'2010. Physica-Verlag HD, 2010. 177-186. DOI: 10.1007/978-3-7908-2604-3_16.
Buduma, Nikhil, and Nicholas Locascio. "Fundamentals of deep learning: designing next-generation machine intelligence algorithms." O'Reilly Media, Inc., 2017.
Lau, Suki. "Learning rate schedules and adaptive learning rate methods for deep learning." Towards Data Science, 2017.
Goodfellow, I., Y. Bengio, and A. Courville. "Deep learning." The Adaptive Computation and Machine Learning series, 2016.
"Some state of the art optimizers in neural networks," https://hackernoon.com/some-state-of-the-art-optimizers-in-neural-networks-a3c2ba5a5643, accessed 9/2020.
"An overview of gradient descent optimization algorithms," https://ruder.io/optimizing-gradient-descent/, accessed 9/2020.
Visa, Sofia, et al. "Confusion Matrix-based Feature Selection." MAICS 710 (2011): 120-127.
"Learning rate schedules and adaptive learning rate methods for deep learning," https://towardsdatascience.com/learning-rate-schedules-and-adaptive-learning-rate-methods-for-deep-learning-2c8f433990d1, accessed 9/2020.
Daniel, Christian, Jonathan Taylor, and Sebastian Nowozin. "Learning step size controllers for robust neural network training." Thirtieth AAAI Conference on Artificial Intelligence, 2016.
Smith, Samuel L., et al. "Don't decay the learning rate, increase the batch size." arXiv preprint arXiv:1711.00489, 2017.
Clevert, Djork-Arné, Thomas Unterthiner, and Sepp Hochreiter. "Fast and accurate deep network learning by exponential linear units (ELUs)." arXiv preprint arXiv:1511.07289, 2015.
Zeiler, Matthew D. "ADADELTA: an adaptive learning rate method." arXiv preprint arXiv:1212.5701, 2012.
Sutskever, Ilya, et al. "On the importance of initialization and momentum in deep learning." International Conference on Machine Learning, 2013.
Park, Jieun, Dokkyun Yi, and Sangmin Ji. "A novel learning rate schedule in optimization for neural networks and its convergence." Symmetry 12.4 (2020): 660. DOI: 10.3390/sym12040660.
Ruder, Sebastian. "An overview of gradient descent optimization algorithms." arXiv preprint arXiv:1609.04747, 2016.
Chin, Wei-Sheng, et al. "A learning-rate schedule for stochastic gradient methods to matrix factorization." Pacific-Asia Conference on Knowledge Discovery and Data Mining. Springer, Cham, 2015. DOI: 10.1007/978-3-319-18038-0_35.
Zulkifli, Hafidz. "Understanding learning rates and how it improves performance in deep learning." Software Testing Fundamentals, 2018.
Dauphin, Yann N., et al. "Identifying and attacking the saddle point problem in high-dimensional non-convex optimization." Advances in Neural Information Processing Systems, 2014.
Zhang, Sixin, Anna E. Choromanska, and Yann LeCun. "Deep learning with elastic averaging SGD." Advances in Neural Information Processing Systems, 2015.
Dozat, Timothy. "Incorporating Nesterov momentum into Adam." 2016.
Brownlee, J. "Using learning rate schedules for deep learning models in Python with Keras." 2016.