We use classification techniques such as CART, Logistic Regression, Random Forest, Perceptron, SVM (Support Vector Machine), KNN (K-Nearest Neighbors), and Naive Bayes to predict a bank loan from the given features.
FLOW OF THE ARTICLE
1. About the data
2. Loading the data and importing the necessary libraries
3. Preprocessing, data visualization, and finding the correlation between the features and the target label
4. Splitting the data into training and testing sets
5. Classification models
6. Analyzing different classification metrics: Accuracy, MSE, RMSE, Precision, Recall
7. Comparing the models and concluding with a final model
1. About the data
I took this data set from Kaggle (link above); it consists of 5,000 rows and 14 columns.
The columns of the data set are:
1. ID
2. Age
3. Experience
4. Income
5. ZIP Code
6. Family
7. CCAvg
8. Education
9. Mortgage
10. Personal Loan
11. Securities Account
12. CD Account
13. Online
14. Credit Card
Our target label is Personal Loan; we need to classify whether a customer takes a personal loan based on the remaining features.
2. Importing the Libraries and Loading the Data
Now let's read the CSV file:
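A minimal sketch of this step with pandas, assuming the Kaggle CSV has been downloaded locally (the filename shown in the comment is my assumption, not from the article):

```python
import pandas as pd

def load_data(path):
    """Read the bank-loan CSV into a pandas DataFrame."""
    return pd.read_csv(path)

# The filename is an assumption; use the name of your Kaggle download.
# df = load_data("Bank_Personal_Loan_Modelling.csv")
# df.shape should come out as (5000, 14), matching the description above.
```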
3. Preprocessing the Data Set
By removing the unnecessary columns, applying one-hot encoding, and normalizing the data, we get a data set free of noise; there are also no missing values in the data set.
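As a sketch of this step (exactly which columns are dropped is my assumption; ID and ZIP Code are identifier-like, and Education is a natural candidate for one-hot encoding), using a tiny stand-in frame so the snippet runs on its own:

```python
import pandas as pd

# Tiny stand-in frame with a few of the real column names
df = pd.DataFrame({
    "ID": [1, 2, 3],
    "ZIP Code": [90210, 94720, 92121],
    "Age": [25, 45, 35],
    "Income": [49, 100, 72],
    "Education": [1, 2, 3],
    "Personal Loan": [0, 1, 0],
})

# Drop identifier-like columns that carry no predictive signal
df = df.drop(columns=["ID", "ZIP Code"])

# One-hot encode the categorical Education level
df = pd.get_dummies(df, columns=["Education"], prefix="Edu", dtype=int)

# Min-max normalize every feature column (the 0/1 target is left out)
features = [c for c in df.columns if c != "Personal Loan"]
df[features] = (df[features] - df[features].min()) / (df[features].max() - df[features].min())
```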
4. Correlation Matrix for the Data Set (Data Visualization)
To see which features are important, i.e., contribute most to predicting the class, we draw a correlation matrix; a feature with a higher absolute correlation to the target contributes more to the prediction.
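A small runnable sketch with toy numbers (not the real data): `df.corr()` computes the matrix, and sorting the target's column by absolute value ranks the features. For the visual heatmap, one would typically pass `df.corr()` to seaborn's `heatmap`.

```python
import pandas as pd

# Toy numeric frame standing in for the preprocessed data
df = pd.DataFrame({
    "Income": [49, 100, 72, 120, 30],
    "CCAvg": [1.6, 2.5, 2.0, 3.9, 0.5],
    "Age": [25, 45, 35, 50, 29],
    "Personal Loan": [0, 1, 0, 1, 0],
})

# Correlation of each feature with the target, strongest (in absolute value) first
corr = df.corr()["Personal Loan"].drop("Personal Loan")
ranked = corr.reindex(corr.abs().sort_values(ascending=False).index)
print(ranked)
```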
5. Splitting the Data into Training and Testing Sets
We use the sklearn library to split the data into training and testing sets in an 80-20 ratio.
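With scikit-learn this is a single call; the snippet below uses a synthetic stand-in for the feature matrix so it runs on its own (11 features, assuming ID, ZIP Code, and the target are excluded from the 14 columns):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the feature matrix and Personal Loan labels
X, y = make_classification(n_samples=1000, n_features=11, random_state=42)

# 80-20 split, matching the ratio used in the article
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(X_train.shape, X_test.shape)
```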
6. Classification Models
Logistic Regression:
Logistic regression is used to describe data and to explain the relationship between one dependent binary variable and one or more nominal, ordinal, interval, or ratio-level independent variables.
Importing the libraries that are needed.
Calculate the evaluation measures.
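A self-contained sketch of the logistic-regression step together with the metrics listed above (synthetic data stands in for the real features):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, mean_squared_error,
                             precision_score, recall_score)
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=11, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)
pred = clf.predict(X_test)

accuracy = accuracy_score(y_test, pred)
precision = precision_score(y_test, pred)
recall = recall_score(y_test, pred)
mse = mean_squared_error(y_test, pred)   # on 0/1 labels, MSE equals the error rate
rmse = np.sqrt(mse)
print(f"acc={accuracy:.3f} prec={precision:.3f} rec={recall:.3f} rmse={rmse:.3f}")
```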
Naive Bayes
Naive Bayes is a simple technique for constructing classifiers: models that assign class labels to problem instances, represented as vectors of feature values, where the class labels are drawn from some finite set.
Now let’s import the necessary libraries:
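A minimal runnable sketch (GaussianNB is an assumption on my part; scikit-learn offers several Naive Bayes variants):

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in for the preprocessed bank-loan features
X, y = make_classification(n_samples=1000, n_features=11, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

nb = GaussianNB().fit(X_train, y_train)
acc = accuracy_score(y_test, nb.predict(X_test))
print(f"Naive Bayes accuracy: {acc:.3f}")
```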
Random Forest
A random forest is a classification algorithm consisting of many decision trees. It uses bagging and feature randomness when building each individual tree to try to create an uncorrelated forest of trees whose prediction by committee is more accurate than that of any individual tree.
Now let’s import the necessary libraries:
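A minimal runnable sketch with scikit-learn's RandomForestClassifier (synthetic data stands in for the real features):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=11, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# 100 trees, each trained on a bootstrap sample with random feature subsets
rf = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_train, y_train)
acc = accuracy_score(y_test, rf.predict(X_test))
print(f"Random Forest accuracy: {acc:.3f}")
```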
SVM:
A Support Vector Machine (SVM) is a discriminative classifier formally defined by a separating hyperplane. In other words, given labeled training data (supervised learning), the algorithm outputs an optimal hyperplane which categorizes new examples.
Now let’s import the necessary libraries:
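A minimal runnable sketch with scikit-learn's SVC (the kernel choice is my assumption; the default RBF kernel is shown, while a linear kernel gives the classic separating hyperplane):

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, n_features=11, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

svm = SVC().fit(X_train, y_train)
acc = accuracy_score(y_test, svm.predict(X_test))
print(f"SVM accuracy: {acc:.3f}")
```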
Perceptron:
A Perceptron is a single neural network unit: it computes a weighted sum of the input features and applies a threshold to produce a binary prediction.
Now let’s import the necessary libraries:
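A minimal runnable sketch with scikit-learn's Perceptron (synthetic data stands in for the real features):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import Perceptron
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=11, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

perc = Perceptron(random_state=42).fit(X_train, y_train)
acc = accuracy_score(y_test, perc.predict(X_test))
print(f"Perceptron accuracy: {acc:.3f}")
```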
CART:
CART (Classification and Regression Trees) builds a binary decision tree by repeatedly splitting the data on the feature and threshold that best separate the classes, typically measured by Gini impurity.
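scikit-learn's DecisionTreeClassifier implements an optimized version of CART, so a runnable sketch (on synthetic stand-in data) looks like:

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=11, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Gini impurity is the default splitting criterion, as in CART
cart = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
acc = accuracy_score(y_test, cart.predict(X_test))
print(f"CART accuracy: {acc:.3f}")
```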
KNN(K Nearest Neighbor):
The k-nearest neighbors (KNN) algorithm is a simple, easy-to-implement supervised machine learning algorithm that can be used to solve both classification and regression problems.
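A minimal runnable sketch with scikit-learn's KNeighborsClassifier (k = 5, the library default, is my assumption):

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=1000, n_features=11, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Classify each test point by majority vote among its 5 nearest training points
knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
acc = accuracy_score(y_test, knn.predict(X_test))
print(f"KNN accuracy: {acc:.3f}")
```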
Comparing the models:
With 80% training and 20% testing, the Random Forest model achieves an accuracy of 98.7%, the highest among all the models.
With 70% training and 30% testing, the Random Forest model achieves an accuracy of 98.1%, again the highest among all the models.
With 60% training and 40% testing, the Random Forest model achieves an accuracy of 98.05%, again the highest among all the models.
Conclusion:
Across all these splits, the accuracy is highest for Random Forest, peaking at 98.7%.
Thank you
Abhishek Veeravelli
Bennett University