Well, the end is in sight! We have just completed the final week and its final assignment. The assignment is a rather lengthy one, and it is the culmination of the last several weeks of work in R. It focuses on loan defaults: we were told that we work for a major financial institution as R data scientists, and we have been asked to identify which customers are likely to default on their loans.
The first step of this project was to stage the datasets, ensuring that the data types were correct. The second step was exploratory analysis: looking at the summary statistics and at the relationships between the predictor variables and the target variable. The third step was to prepare the data by deriving new variables, dealing with missing and extreme values, and transforming categorical variables. After partitioning the data, we moved on to building the models. I built both a logistic regression model and a random forest model. The random forest proved to be the better of the two, so it is the one whose performance I tested and that I used to generate the predictions.
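For readers who have not seen this kind of workflow, here is a minimal sketch of what those steps can look like in R. It is not the code from the assignment: the data are simulated, the column names (`income`, `loan_amt`, `term`, `default`, `loan_to_income`) are invented for illustration, and the specific choices (median imputation, capping at the 99th percentile, a 70/30 split, a 0.5 cutoff) are assumptions made only to keep the example short. The random forest step assumes the `randomForest` package and is skipped if it is not installed.

```r
# Simulated loan data; every column name and value here is hypothetical.
set.seed(42)
n <- 500
income   <- round(rlnorm(n, meanlog = 10.8, sdlog = 0.5))
loan_amt <- round(runif(n, 1000, 40000))
p_default <- plogis(-2 + 3 * loan_amt / income)  # made-up relationship
loans <- data.frame(
  income   = income,
  loan_amt = loan_amt,
  term     = sample(c("36", "60"), n, replace = TRUE),
  default  = rbinom(n, 1, p_default),
  stringsAsFactors = FALSE
)
loans$income[sample(n, 20)] <- NA  # inject some missing values

# 1. Stage: make sure each column has the right data type.
loans$term    <- factor(loans$term)
loans$default <- factor(loans$default, levels = c(0, 1), labels = c("no", "yes"))

# 2. Explore: summary statistics and a predictor-vs-target comparison.
summary(loans)
aggregate(loan_amt ~ default, data = loans, FUN = mean)

# 3. Prepare: impute missing values, cap extremes, derive a new variable.
loans$income[is.na(loans$income)] <- median(loans$income, na.rm = TRUE)
cap <- quantile(loans$loan_amt, 0.99, names = FALSE)
loans$loan_amt <- pmin(loans$loan_amt, cap)
loans$loan_to_income <- loans$loan_amt / loans$income

# 4. Partition: 70% training, 30% testing (an assumed split).
train_idx <- sample(n, size = round(0.7 * n))
train <- loans[train_idx, ]
test  <- loans[-train_idx, ]

# 5. Logistic regression, scored on the held-out test set.
logit_fit <- glm(default ~ loan_to_income + term, data = train, family = binomial)
test_prob <- predict(logit_fit, newdata = test, type = "response")
test_pred <- factor(ifelse(test_prob > 0.5, "yes", "no"), levels = levels(test$default))
logit_acc <- mean(test_pred == test$default)

# 6. Random forest, only if the randomForest package is available.
if (requireNamespace("randomForest", quietly = TRUE)) {
  rf_fit  <- randomForest::randomForest(
    default ~ income + loan_amt + loan_to_income + term, data = train
  )
  rf_pred <- predict(rf_fit, newdata = test)
  rf_acc  <- mean(rf_pred == test$default)
}
```

Comparing `logit_acc` with `rf_acc` on the same test set is one simple way to decide which model to carry forward; on real, imbalanced default data a measure other than plain accuracy would likely be a better basis for that comparison.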
This assignment was definitely challenging, but also very interesting and rewarding, much like the rest of the semester! I thoroughly enjoyed this class, and I am proud of my growth over the course of the semester, thanks to Professor Ames. I look forward to maintaining my skill set and expanding it further through different resources! I am grateful that I was able to be a part of this class and gain these valuable skills.