Determine the Undervalued US Major League Baseball Players with Machine Learning
Lu Xiong1, Kecheng Tian2, Yuwen Qian3, Wilson Musyoka4, Xingyu Chen5
1 Lu Xiong, Assistant Professor, Department of Mathematical Sciences, Middle Tennessee State University, Murfreesboro, USA.
2 Kecheng Tian, Department of Mathematical Sciences, Middle Tennessee State University, Murfreesboro, USA.
3 Yuwen Qian, Department of Mathematical Sciences, Middle Tennessee State University, Murfreesboro, USA.
4 Wilson Musyoka, Department of Mathematical Sciences, Middle Tennessee State University, Murfreesboro, USA.
5 Xingyu Chen, Department of Mathematical Sciences, Middle Tennessee State University, Murfreesboro, USA.
Manuscript received on 27 December 2022 | Revised Manuscript received on 01 February 2023 | Manuscript Accepted on 15 February 2023 | Manuscript published on 28 February 2023 | PP: 17-24 | Volume-12 Issue-3, February 2023 | Retrieval Number: 100.1/ijitee.B94060112223 | DOI: 10.35940/ijitee.B9406.0212323
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: Baseball is a sport of statistics. The industry has accumulated detailed statistical data on both offence and defence for over a century. Experience has shown that data analysis gives teams that use it a competitive advantage over teams that do not. Over the last two decades, advances in machine learning and artificial intelligence have enabled more sophisticated algorithms for analysing baseball data. In this research, we run different ML models using scikit-learn and H2O on Google Colab, and the caret package in RStudio, to examine two datasets (a hitting dataset and a salary dataset) and identify undervalued players by predicting the number of runs each player will score in the following year. We compare machine learning regression algorithms and ensemble methods, explain the results in detail, and determine which model achieves the best prediction accuracy.
Keywords: Sports Analytics, Machine Learning, Ensemble Methods, Deep Learning
Scope of the Article: Machine Learning
