This section gives an overview of several feature selection techniques, described below:
1. Random Forest
Random forest is a supervised machine learning algorithm for classification and regression that combines many decision trees. Each tree is trained on a bootstrap sample of the data and considers a random subset of features at each split, which decorrelates the trees so that their combined prediction is usually more accurate than that of any individual tree. For classification, each decision tree gives a classification or a “vote”, and the forest chooses the class with the majority of the “votes”; for regression, the tree predictions are averaged. Random forest can be used for feature selection by ranking features by importance, for example with a variable importance plot based on the mean decrease in Gini impurity (1,2,3).
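The importance ranking described above can be sketched as follows. This is a minimal illustration, assuming scikit-learn is available; the synthetic data, the feature indices and the settings (`n_estimators=200`, `random_state=0`) are hypothetical choices for the example, not values from this document.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic data (assumption for illustration): only features 0 and 1 drive the label.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 6))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(X, y)

# Impurity-based (Gini) importances; they are normalized to sum to 1.
importances = forest.feature_importances_
ranking = np.argsort(importances)[::-1]  # most important feature first

for j in ranking:
    print(f"feature {j}: {importances[j]:.3f}")
```

Features at the top of the ranking are candidates to keep. Impurity-based importances can favor features with many distinct values, so it may be worth cross-checking the ranking with a permutation-based importance.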
2. Penalized regression
2.1 LASSO regression
LASSO (Least Absolute Shrinkage and Selection Operator) regression is similar to linear regression, but it adds a "shrinkage" penalty under which the coefficient estimates are shrunk towards zero. This reduces overfitting, and because some coefficients are set exactly to zero, the resulting models are simple and sparse, which tends to make them easier to apply to other datasets. This type of regression is used when the dataset shows high multicollinearity or when you want to automate variable elimination and feature selection. It performs L1 regularization (the penalty is the sum of the absolute values of the coefficients) (1,2).
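A minimal sketch of LASSO used as a feature selector, assuming scikit-learn. The data are synthetic and the penalty strength `alpha=0.2` is an arbitrary value picked for the illustration; in practice it would usually be tuned, for example by cross-validation.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Synthetic data (assumption for illustration): only features 0 and 1 affect y.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
true_coef = np.zeros(10)
true_coef[[0, 1]] = [3.0, -2.0]
y = X @ true_coef + rng.normal(scale=0.5, size=200)

# scikit-learn minimizes (1 / (2 * n)) * ||y - Xw||^2 + alpha * ||w||_1.
lasso = Lasso(alpha=0.2)
lasso.fit(X, y)

# Features whose coefficient was not shrunk all the way to zero.
selected = np.flatnonzero(lasso.coef_)
print("selected features:", selected)
print("coefficients:", np.round(lasso.coef_, 3))
```

The features in `selected` are the ones the model keeps. The columns here are already on a common scale; with real data the predictors would normally be standardized first, since the penalty depends on the scale of each coefficient.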
2.2 Ridge regression
Ridge regression is a regularized regression method that is used when the data suffer from multicollinearity or when the number of predictor variables exceeds the number of observations. It performs L2 regularization (the penalty is the sum of the squared coefficients). As opposed to LASSO regression, ridge shrinks the coefficients towards zero but does not set them exactly to zero, so it does not remove variables from the model (1,2).
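The behavior described above can be illustrated with a small sketch, assuming scikit-learn. The data are synthetic with more predictors than observations, and the grid of `alpha` values is an arbitrary choice for the example.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic data (assumption for illustration): 50 predictors, only 20 observations.
rng = np.random.default_rng(0)
n_samples, n_features = 20, 50
X = rng.normal(size=(n_samples, n_features))
y = X[:, 0] - X[:, 1] + rng.normal(scale=0.1, size=n_samples)

norms = []
for alpha in [0.1, 1.0, 10.0, 100.0]:
    ridge = Ridge(alpha=alpha).fit(X, y)
    # A stronger L2 penalty gives a smaller coefficient vector ...
    norms.append(float(np.linalg.norm(ridge.coef_)))
    # ... but the coefficients are not set exactly to zero.
    nonzero = int(np.count_nonzero(ridge.coef_))
    print(f"alpha={alpha}: norm={norms[-1]:.3f}, non-zero coefficients={nonzero}")
```

Because every coefficient stays non-zero, ridge on its own does not select features; it is mainly useful for stabilizing the fit.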
2.3 Elastic Net
Elastic net is a penalized linear regression model that combines the LASSO and ridge penalties to offset the shortcomings of each: it can set coefficients exactly to zero, as LASSO does, while tending to handle groups of correlated predictors more stably, as ridge does. Hence, it includes both L1 and L2 regularization (1,2).
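A minimal sketch of how the two penalties are mixed, assuming scikit-learn, where `l1_ratio` controls the balance between L1 and L2. The data and the values `alpha=0.2` and `l1_ratio=0.5` are hypothetical choices for the illustration.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso

# Synthetic data (assumption for illustration): only features 0 and 1 affect y.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=200)

# scikit-learn penalty: alpha * l1_ratio * ||w||_1
#                       + 0.5 * alpha * (1 - l1_ratio) * ||w||_2^2
enet = ElasticNet(alpha=0.2, l1_ratio=0.5).fit(X, y)

# With l1_ratio=1.0 the L2 term vanishes and the model reduces to LASSO.
pure_l1 = ElasticNet(alpha=0.2, l1_ratio=1.0).fit(X, y)
lasso = Lasso(alpha=0.2).fit(X, y)

print("elastic net coefficients:", np.round(enet.coef_, 3))
print("matches LASSO at l1_ratio=1.0:", np.allclose(pure_l1.coef_, lasso.coef_))
```

Lower values of `l1_ratio` move the model towards ridge-like behavior, and higher values move it towards LASSO-like sparsity.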
3. Stepwise analysis
Stepwise regression is a method of fitting regression models in which the predictor variables are chosen by an automatic procedure. In each step, a variable is considered for addition to or removal from the set of explanatory variables based on a prespecified criterion. Usually, this takes the form of a forward, backward, or combined sequence of F-tests or t-tests. At nova we mostly used this approach for feature selection, comparing linear or generalized linear models by AIC (1).
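The forward variant with AIC as the criterion can be sketched as follows. This is an illustrative NumPy implementation for ordinary linear models only, not the tooling used in the work described above; the helper names `gaussian_aic` and `forward_stepwise_aic` and the synthetic data are hypothetical.

```python
import numpy as np

def gaussian_aic(X, y, columns):
    """AIC (up to an additive constant) of an OLS fit with intercept on `columns`."""
    n = len(y)
    design = np.column_stack([np.ones(n)] + [X[:, j] for j in columns])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(np.sum((y - design @ beta) ** 2))
    k = design.shape[1] + 1  # regression coefficients plus the error variance
    return n * np.log(rss / n) + 2 * k

def forward_stepwise_aic(X, y):
    """Add one variable at a time, stopping when no addition lowers the AIC."""
    selected = []
    remaining = list(range(X.shape[1]))
    best_aic = gaussian_aic(X, y, selected)  # intercept-only model
    while remaining:
        scores = [(gaussian_aic(X, y, selected + [j]), j) for j in remaining]
        candidate_aic, candidate = min(scores)
        if candidate_aic >= best_aic:
            break  # no remaining variable improves the criterion
        selected.append(candidate)
        remaining.remove(candidate)
        best_aic = candidate_aic
    return selected, best_aic

# Synthetic data (assumption for illustration): y depends on features 0 and 2 only.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(scale=0.5, size=200)

selected, best_aic = forward_stepwise_aic(X, y)
print("selected features, in order of entry:", selected)
```

A backward or combined procedure follows the same pattern, with a removal step evaluated by the same criterion. Since AIC penalizes each extra parameter only mildly, an occasional noise variable may still enter the model.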