A Machine Learning Framework for Profitability Profiling and Dynamic Price Prediction for the New York City Taxi Trips
Shylaja S1, Kannika Nirai Vaani M2

1Shylaja S*, Business Analytics Specialization, Institute of Management, Christ (Deemed to be University), Bangalore, India.
2Kannika Nirai Vaani M, Assistant Professor, School of Business and Management, Christ (Deemed to be University), Bangalore, India.
Manuscript received on February 10, 2020. | Revised Manuscript received on February 20, 2020. | Manuscript published on March 10, 2020. | PP: 973-979 | Volume-9 Issue-5, March 2020. | Retrieval Number: E2669039520/2020©BEIESP | DOI: 10.35940/ijitee.E2669.039520
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)

Abstract: The New York City Taxi & Limousine Commission’s (NYC TLC) yellow cabs face increasing competition from app-based car services such as Ola, Uber, Didi, Lyft and Grab, which is rapidly eroding their revenue and market share. Research work: In response, the study proposes (i) profitability profiling of taxi trips to identify the key factors that can generate more revenue in future, (ii) visualization of trip departure and arrival counts at various locations by time of day, to help maintain equilibrium between demand and supply, and (iii) a dynamic price prediction model that balances margins and conversion rates. Methodology/Techniques used: The NYC TLC yellow taxi trip data is analysed following the cross-industry standard process for data mining (CRISP-DM) methodology. Firstly, the taxi trips are grouped into two profitability segments according to fare amount, trip duration and trip distance by applying K-means clustering. Secondly, spatiotemporal data analysis is carried out to assess the demand for taxi trips at various locations at various times of the day. Thirdly, multiple linear regression, decision tree and random forest models are adopted for dynamic price prediction. The findings of the study are as follows: highly profitable segments are characterized by airport pickup and drop-off trips; trip arrivals at airports outnumber departures from airports at any time of the day; and further analysis revealed that drivers making only a few airport trips can earn more revenue than drivers making a larger number of trips to local destinations. Compared with multiple linear regression and decision tree, the random forest regression model is considered the most reliable for dynamic price prediction, with an accuracy of 91%.
Application of research work: The practical implication of the study is the deployment of a dynamic pricing model that can increase the revenue of NYC TLC cabs while balancing margins and conversion rates.
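Illustrative sketch: the profitability segmentation step described in the abstract (K-means with two clusters on fare amount, trip duration and trip distance) could look roughly like the following. This is not the authors' code. The trips are synthetic, the value ranges for "local" and "airport" trips are invented for illustration, and the pure-Python K-means with farthest-first initialisation is an assumption chosen to keep the example free of third-party dependencies.

```python
import random
from statistics import mean, pstdev


def standardize(rows):
    """Rescale every column to zero mean and unit variance."""
    columns = list(zip(*rows))
    mus = [mean(col) for col in columns]
    sigmas = [pstdev(col) or 1.0 for col in columns]  # guard against constant columns
    return [[(v - m) / s for v, m, s in zip(row, mus, sigmas)] for row in rows]


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k=2, max_iter=100, seed=0):
    """Plain Lloyd's algorithm; returns (labels, centroids)."""
    rng = random.Random(seed)
    # Farthest-first initialisation (an assumption, not taken from the paper):
    # start from a random point, then repeatedly add the point farthest
    # from the centroids chosen so far.
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    labels = None
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:  # assignments stable, so we have converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [mean(col) for col in zip(*members)]
    return labels, centroids


# Synthetic trips as (fare in USD, duration in minutes, distance in miles).
# All ranges below are hypothetical and exist only to make the sketch runnable.
rng = random.Random(42)
local = [(rng.uniform(6, 18), rng.uniform(5, 20), rng.uniform(0.5, 4))
         for _ in range(40)]
airport = [(rng.uniform(52, 70), rng.uniform(40, 60), rng.uniform(15, 20))
           for _ in range(10)]
trips = local + airport

labels, _ = kmeans(standardize(trips), k=2)

# Label the segment with the higher mean fare as the high-profitability one.
segment_fare = {
    j: mean(t[0] for t, lab in zip(trips, labels) if lab == j) for j in range(2)
}
high_segment = max(segment_fare, key=segment_fare.get)
high_count = labels.count(high_segment)
print(f"high-profitability segment: {high_count} of {len(trips)} trips")
```

Standardising the three features first keeps the fare column, which has the largest raw range, from dominating the distance computation. On real TLC data the two segments would overlap far more than in this toy set, so the result would need to be checked against the trip attributes before interpreting one cluster as "airport trips".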
Keywords: Clustering, Profitability Profiling, Machine Learning, Dynamic Pricing, Predictive Modeling
Scope of the Article: Machine Learning