
Algorithm Gridding

Algorithm gridding is a useful technique: it runs several machine learning algorithms and parameter settings over one or many datasets. We do this to see which algorithms perform better on a particular dataset or type of dataset. Instead of wondering which algorithm would predict your data best, you get to compare all of them at once!

I have attached a grid that I assembled with a team of 3 people; I wrote the report and the code. In this grid, we analyzed 8 different datasets with 6 different algorithms. Three of the algorithms were run with 2 sets of parameters each, giving 9 algorithm configurations, and every configuration was run with 2 different cross-validation settings, making 18 total algorithm combinations.
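To make the structure of such a grid concrete, here is a minimal sketch using scikit-learn and pandas. It is not the code behind the report: the two built-in datasets, the four configurations, and the helper name `run_grid` are hypothetical stand-ins, and it assumes that "cross-validation settings" means the number of folds.

```python
import pandas as pd
from sklearn.datasets import load_iris, load_wine
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Hypothetical stand-ins: the original grid used 8 datasets, not these two.
datasets = {
    "iris": load_iris(return_X_y=True),
    "wine": load_wine(return_X_y=True),
}

# Each entry is one (algorithm, parameter set) configuration.
# These algorithms and parameters are illustrative assumptions.
configs = {
    "naive_bayes": GaussianNB(),
    "knn_k3": KNeighborsClassifier(n_neighbors=3),
    "knn_k7": KNeighborsClassifier(n_neighbors=7),
    "tree_depth3": DecisionTreeClassifier(max_depth=3, random_state=0),
}

# Assumption: the two cross-validation settings are fold counts.
cv_folds = [5, 10]


def run_grid(datasets, configs, cv_folds):
    """Score every configuration on every dataset for every fold count."""
    rows = []
    for data_name, (X, y) in datasets.items():
        for config_name, model in configs.items():
            for folds in cv_folds:
                # cross_val_score clones the model, so reusing it is safe.
                scores = cross_val_score(model, X, y, cv=folds, scoring="accuracy")
                rows.append(
                    {
                        "dataset": data_name,
                        "config": config_name,
                        "cv_folds": folds,
                        "mean_accuracy": scores.mean(),
                    }
                )
    return pd.DataFrame(rows)


results = run_grid(datasets, configs, cv_folds)

# Average accuracy and its spread across datasets, per configuration.
summary = results.groupby("config")["mean_accuracy"].agg(["mean", "std"])
print(summary)
```

The long-format `results` table has one row per dataset, configuration, and fold count, which makes it easy to group or pivot later when comparing algorithms side by side.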

Grid Report

The purpose of the report was to visualize how the different algorithms perform. We observed that the Naive Bayes algorithm varied the most in accuracy, that the number of rows in a dataset wasn’t a factor, and that the number of unique target classes strongly affects accuracy across all algorithms (as expected). For the most part, the algorithms performed about the same across all datasets. This is consistent with the no free lunch theorem, which states that, averaged across all possible datasets, all algorithms perform about the same.

The following Python code was used to create the report above. Notice that the code uses pandas and NumPy to format the data correctly and to handle import and export. scikit-learn (sklearn) was the main provider of the optimized algorithms.

Code for Grid (Python)