Identifying Optimal Innovative Machine Learning Models for Predicting U.S. Mortgage Loan Defaults: A Comparative Analysis
Chi Ton Cong Nguyen¹, Victoria N. Dean²

¹Chi Ton Cong Nguyen, James Madison High School, Vienna, VA, USA.

²Dr. Victoria N. Dean, AI/ML Principal Researcher, New Jersey, USA.

Manuscript received on 30 July 2025 | First Revised Manuscript received on 06 August 2025 | Second Revised Manuscript received on 18 August 2025 | Manuscript Accepted on 15 September 2025 | Manuscript published on 30 September 2025 | PP: 1-8 | Volume-14 Issue-10, September 2025 | Retrieval Number: 100.1/ijitee.I112714090825 | DOI: 10.35940/ijitee.I1127.14100925

© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)

Abstract: Mortgage loan defaults pose substantial risks to the financial industry, and predicting them accurately remains a challenge. Traditional credit scoring models often lack accuracy and computational efficiency because they are not data-driven and fail to capture the data patterns that drive borrower default behavior. This limitation is especially significant in volatile housing markets, where detecting defaults early can substantially reduce credit losses compared with identifying them at a later stage. Timely and accurate prediction of mortgage loan defaults therefore plays a vital role in formulating effective credit risk management strategies. This research addresses the challenge with a framework that applies Artificial Intelligence / Machine Learning (AI/ML) algorithms and compares model performance to identify the optimal model for predicting mortgage loan defaults. The framework systematically trained and evaluated each model on a dataset of over 100,000 loans from Freddie Mac, covering loan, borrower, and property characteristics, together with macroeconomic factors. Model performance was evaluated using accuracy, AUC, F1 score, and ROC curves. Extreme Gradient Boosting (XGBoost) was the top performer, offering superior performance and robustness to overfitting compared with the other ML models: Logistic Regression, Neural Network, Decision Tree, Gradient Boosting, and Random Forest. XGBoost achieved the best performance across all evaluation metrics, with 99% accuracy on the training data, 98% on the testing data, and more than 90% on all other metrics. Its predictive power is mainly attributed to its ensemble and regularisation techniques, which reduce both prediction error and overfitting.
These findings provide a benchmark for mortgage default modeling practice and establish XGBoost as an effective financial ML technique for accurately classifying “good” and “bad” loans. Given XGBoost’s strong performance in predicting mortgage loan defaults, the study offers an AI solution for credit risk assessment and smart lending decisions for banks, mortgage lenders, and other financial institutions in the FinTech industry. The results also suggest that XGBoost is a strong candidate classification algorithm for other fields.
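To make the comparison step concrete, the following is a minimal sketch of how several models could be scored on accuracy, F1, and AUC and then ranked. It is not the authors' code. The labels, the scores, the names `model_a` and `model_b`, the 0.5 decision threshold, and the choice to rank by AUC are all assumptions made for illustration, and the toy numbers are not results from the paper.

```python
from typing import Dict, Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of loans whose predicted label matches the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Harmonic mean of precision and recall for the default class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    return 2 * tp / (2 * tp + fp + fn)


def auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Probability that a random default outscores a random non-default.

    Ties count as half a win. This pairwise form is quadratic in the number
    of loans, which is fine for a toy example but slow on large datasets.
    """
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one example of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def evaluate(y_true: Sequence[int], y_score: Sequence[float],
             threshold: float = 0.5) -> Dict[str, float]:
    """Turn scores into labels at an assumed threshold and compute all metrics."""
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    return {
        "accuracy": accuracy(y_true, y_pred),
        "f1": f1_score(y_true, y_pred),
        "auc": auc(y_true, y_score),
    }


# Hypothetical held-out labels (1 = default) and predicted default
# probabilities from two hypothetical models; made up for illustration.
y_true = [0, 0, 0, 0, 1, 1, 1, 0]
scores = {
    "model_a": [0.1, 0.2, 0.3, 0.6, 0.7, 0.8, 0.4, 0.2],
    "model_b": [0.1, 0.1, 0.2, 0.3, 0.9, 0.8, 0.7, 0.2],
}

results = {name: evaluate(y_true, s) for name, s in scores.items()}
best = max(results, key=lambda name: results[name]["auc"])

for name, metrics in results.items():
    print(name, metrics)
print("best by AUC:", best)
```

In practice the scores would come from fitted models evaluated on held-out loans, and mortgage defaults are typically rare, so accuracy alone can look high even for a weak model; reporting F1 and AUC alongside it, as the abstract does, guards against that.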

Keywords: Artificial Intelligence/Machine Learning, Credit Scoring Models, FinTech Industry, Financial Innovation, Mortgage Loan Defaults.
Scope of the Article: Data Analytics