Determine the Undervalued US Major League Baseball Players with Machine Learning
Lu Xiong1, Kecheng Tian2, Yuwen Qian3, Wilson Musyoka4, Xingyu Chen5

1Lu Xiong, Assistant Professor, Department of Mathematical Sciences, Middle Tennessee State University, Murfreesboro, USA.
2Kecheng Tian, Department of Mathematical Sciences, Middle Tennessee State University, Murfreesboro, USA.
3Yuwen Qian, Department of Mathematical Sciences, Middle Tennessee State University, Murfreesboro, USA.
4Wilson Musyoka, Department of Mathematical Sciences, Middle Tennessee State University, Murfreesboro, USA.
5Xingyu Chen, Department of Mathematical Sciences, Middle Tennessee State University, Murfreesboro, USA.
Manuscript received on 27 December 2022 | Revised Manuscript received on 01 February 2023 | Manuscript Accepted on 15 February 2023 | Manuscript published on 28 February 2023 | PP: 17-24 | Volume-12 Issue-3, February 2023 | Retrieval Number: 100.1/ijitee.B94060112223 | DOI: 10.35940/ijitee.B9406.0212323

© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)

Abstract: Baseball is a sport of statistics. The industry has accumulated detailed offensive and defensive statistical data for over a century. Experience has shown that teams using data analysis gain a competitive advantage over teams that do not. In the last two decades, the development of machine learning and artificial intelligence has provided more advanced algorithms for analyzing baseball data. In this research, we run different ML models using scikit-learn and H2O on Colab, and the caret package in RStudio, to examine two datasets (a hitting dataset and a salary dataset) and identify undervalued players by predicting the number of runs scored in the next year. We compare machine learning regression algorithms and ensemble methods, explain the results in detail, and recommend the model with the best prediction accuracy.
Keywords: Sports Analytics, Machine Learning, Ensemble Methods, Deep Learning
Scope of the Article: Machine Learning
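
The workflow summarized in the abstract (predict next-year runs, compare models by prediction accuracy, then flag undervalued players) can be illustrated with a minimal sketch. It is not the authors' implementation. The data is synthetic, all field names are hypothetical, a one-feature least-squares line and a mean-only baseline stand in for the regression and ensemble models the paper compares, and "predicted runs per million dollars of salary" is an assumed definition of undervalued that the abstract does not state.

```python
import random
from math import sqrt

random.seed(0)  # fixed seed so the synthetic data is reproducible

# Synthetic stand-in for the merged hitting and salary datasets (assumption):
# (name, runs_this_year, runs_next_year, salary_in_millions)
players = []
for i in range(200):
    runs = random.randint(20, 110)
    next_runs = 0.8 * runs + 10 + random.gauss(0, 8)
    salary = max(0.5, 0.1 * runs + random.gauss(0, 3))
    players.append((f"player_{i}", runs, next_runs, salary))

train, test = players[:150], players[150:]


def fit_line(xs, ys):
    """Ordinary least squares for a single feature; returns (slope, intercept)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope, mean_y - slope * mean_x


def rmse(predicted, actual):
    """Root mean squared error between two equal-length sequences."""
    return sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


train_x = [p[1] for p in train]
train_y = [p[2] for p in train]
slope, intercept = fit_line(train_x, train_y)
baseline_prediction = sum(train_y) / len(train_y)  # always predict the training mean


def predict_runs(player):
    """Predicted next-year runs from this year's runs."""
    return slope * player[1] + intercept


# Compare the two models on held-out players.
test_actual = [p[2] for p in test]
rmse_linear = rmse([predict_runs(p) for p in test], test_actual)
rmse_baseline = rmse([baseline_prediction] * len(test), test_actual)

# Rank held-out players by predicted runs per million dollars of salary
# (assumed criterion) and keep the top five as "undervalued" candidates.
ranked = sorted(test, key=lambda p: predict_runs(p) / p[3], reverse=True)
undervalued = [p[0] for p in ranked[:5]]

print(f"RMSE linear:   {rmse_linear:.2f}")
print(f"RMSE baseline: {rmse_baseline:.2f}")
print("Undervalued candidates:", undervalued)
```

With real data, the two stand-in models would be replaced by the regressors and ensembles under comparison, and the same held-out error metric would decide which predictions feed the ranking step.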