Abstract
In this research, we use Lending Club data from Kaggle to analyze FICO scores and funded loan amounts using multiple predictors. Lending Club is a US peer-to-peer lending company headquartered in San Francisco, California. We first cleaned the large dataset of 1,048,575 rows and 97 columns and then performed exploratory data analysis. We then used feature engineering and subset selection methods to build linear models that predict customers' FICO scores and the funded amounts of their loan requests. Overall, we found that FICO score is best modeled using backward selection, which yields an exponential function with the following predictors: grade, title, initial list status, total payment (rounded), recoveries, months since the oldest bank installment account was opened, number of mortgage accounts, and number of currently active bankcard accounts. We also found that funded amount is best modeled using forward selection or sequential replacement, which yields an exponential function with the following predictors: term, title, debt-to-income ratio, number of open trades in the past 6 months, balance-to-credit-limit ratio on all trades, ratio of total current balance to high credit/credit limit for all bankcard accounts, total bankcard high credit/credit limit, average FICO score, and home ownership.
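To illustrate how subset selection can produce an exponential model, here is a minimal sketch, not the authors' code. It assumes the exponential form comes from fitting a linear model to the logarithm of the response, so that predictions are exp(b0 + b1*x1 + ...). It runs greedy forward selection and backward elimination scored by BIC on synthetic data; the stopping criterion, the helper names (`design`, `bic`, `forward_select`, `backward_select`, `fit_exponential`, `predict_exponential`), and the data are all hypothetical. Sequential replacement is not shown.

```python
import numpy as np

def design(X, cols):
    """Design matrix: an intercept column plus the chosen predictor columns."""
    return np.column_stack([np.ones(X.shape[0])] + [X[:, j] for j in cols])

def bic(X, z, cols):
    """BIC of an OLS fit of z on the chosen columns (assumed selection criterion)."""
    A = design(X, cols)
    beta, *_ = np.linalg.lstsq(A, z, rcond=None)
    resid = z - A @ beta
    n, k = len(z), len(cols) + 1  # k counts the intercept
    return n * np.log(float(resid @ resid) / n) + k * np.log(n)

def forward_select(X, z):
    """Greedily add the predictor that lowers BIC most; stop when none helps."""
    selected, remaining = [], list(range(X.shape[1]))
    best = bic(X, z, selected)
    while remaining:
        score, j = min((bic(X, z, selected + [j]), j) for j in remaining)
        if score >= best:
            break
        best = score
        selected.append(j)
        remaining.remove(j)
    return sorted(selected)

def backward_select(X, z):
    """Start from all predictors; greedily drop the one whose removal lowers BIC most."""
    selected = list(range(X.shape[1]))
    best = bic(X, z, selected)
    while selected:
        score, j = min((bic(X, z, [c for c in selected if c != j]), j) for j in selected)
        if score >= best:
            break
        best = score
        selected.remove(j)
    return sorted(selected)

def fit_exponential(X, y, cols):
    """Fit log(y) = b0 + sum_j b_j * x_j by OLS; coefficients are on the log scale."""
    beta, *_ = np.linalg.lstsq(design(X, cols), np.log(y), rcond=None)
    return beta

def predict_exponential(X, cols, beta):
    """Back-transform to the original scale: y_hat = exp(b0 + sum_j b_j * x_j)."""
    return np.exp(design(X, cols) @ beta)

# Synthetic stand-in data: only predictors 0 and 2 actually affect the response.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = np.exp(1.0 + 0.5 * X[:, 0] - 0.3 * X[:, 2] + rng.normal(scale=0.1, size=500))

fwd = forward_select(X, np.log(y))
bwd = backward_select(X, np.log(y))
beta = fit_exponential(X, y, fwd)
print("forward:", fwd, "backward:", bwd, "log-scale coefficients:", beta.round(3))
```

Greedy stepwise search fits far fewer models than an exhaustive search over all 2^p subsets, which matters with dozens of candidate columns. Forward and backward search can still disagree, which is consistent with the abstract reporting different best methods for the two responses.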
Modeling FICO Score and Loan Amount