Experiments In Machine Learning To Cope With Overfitting and Underfitting

Digvijay Mali · Published in Analytics Vidhya · 8 min read · Jan 12, 2021

Let’s say your data is scaled properly using one of the following techniques:

  • Standardization, where values are centered around the mean with a unit standard deviation: the mean of the attribute becomes zero and the resulting distribution has a unit standard deviation.
  • Normalization, where values are shifted and rescaled so that they end up ranging between 0 and 1; this is also known as min-max scaling.
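
For illustration, here is a minimal scikit-learn sketch of both techniques (the array values are made up):

```python
# A minimal sketch of both scaling options using scikit-learn.
import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler

X = np.array([[1.0, 200.0], [2.0, 300.0], [3.0, 400.0]])

# Standardization: zero mean, unit standard deviation per feature.
X_std = StandardScaler().fit_transform(X)

# Normalization (min-max): each feature rescaled to the [0, 1] range.
X_minmax = MinMaxScaler().fit_transform(X)

print(X_std.mean(axis=0))                          # ~[0, 0]
print(X_minmax.min(axis=0), X_minmax.max(axis=0))  # [0, 0] [1, 1]
```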

A model is considered accurate when it performs on the training and test data with comparably high accuracy. Poor performance in a machine learning model is caused by either overfitting or underfitting the data.

High error while training indicates that the model is showing ‘underfitting’. The model has not adapted to the training data (it is too simple, i.e., too biased), so it cannot fit the data points well and hence produces a high residual variance even on the training data.

[Q] How to deal with this issue?

_________________________________________________________________

[1] What if? — Increase the number (rows) of training samples

NO. If the problem lies with the model itself, an increase in training data may not help; it may even increase the training error.

_________________________________________________________________

[2] What if? — Add more features (columns) to training data

YES. Adding more features may help if the model is currently giving weight to irrelevant features. If the features already present are not informative enough, either replace them or add more relevant ones, as sketched below.
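
As one hedged example, polynomial feature expansion is a common way to derive extra columns when a linear model underfits (the dataset here is synthetic):

```python
# A minimal sketch: derive new polynomial/interaction columns when the raw
# features are not informative enough for a linear model.
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

X, y = make_regression(n_samples=200, n_features=3, noise=10.0, random_state=0)

# degree=2 adds squared terms and pairwise interactions as extra columns.
model = make_pipeline(PolynomialFeatures(degree=2, include_bias=False),
                      LinearRegression())
model.fit(X, y)
print(model.score(X, y))  # training R^2 with the enriched feature set
```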

_________________________________________________________________

[3] What if? — Recleaning data

YES. Having clean data ultimately increases overall productivity and allows the highest-quality information for decision-making. Removing errors helps in many cases, for example when multiple sources contribute to a single dataset.

_________________________________________________________________

[4] What if? — Increase the power of the algorithm

YES. We can increase the power of the algorithm or model through kernelization, or we can replace the model with a more powerful one that fits the training data really well, as sketched below.
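
A minimal sketch of kernelization, assuming a scikit-learn setup: swapping a linear SVM for an RBF-kernel SVM increases capacity on data that is not linearly separable (the dataset is synthetic):

```python
# Replacing a linear SVM with an RBF-kernel SVM raises model capacity.
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=500, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

linear_svm = SVC(kernel="linear").fit(X_train, y_train)
rbf_svm = SVC(kernel="rbf", gamma="scale").fit(X_train, y_train)

print("linear:", linear_svm.score(X_train, y_train))
print("rbf:   ", rbf_svm.score(X_train, y_train))  # typically fits better
```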

_________________________________________________________________

[5] What if? — Analysis of outliers

YES. Outliers are data points whose values are so high or so low that they do not belong to the general distribution of the rest of the dataset, so it is always better to run outlier detection (see the sketch below) unless your model is robust to outliers.
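
A minimal sketch of simple IQR-based outlier detection on a single feature (the values and the conventional 1.5×IQR threshold are illustrative assumptions):

```python
# Flag points outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] as outliers.
import numpy as np

values = np.array([10.0, 12.0, 11.0, 13.0, 12.5, 95.0, 11.5])

q1, q3 = np.percentile(values, [25, 75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

outliers = values[(values < lower) | (values > upper)]
print(outliers)  # [95.] -- flagged for inspection or removal
```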

_________________________________________________________________

[6] What if? — Apply Boosting

YES. Boosting increases model complexity and hence helps decrease bias, as sketched below.
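
A minimal sketch of boosting, assuming scikit-learn (the dataset and the parameters are illustrative):

```python
# Boosting stacks many weak learners sequentially, which lowers bias.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

booster = GradientBoostingClassifier(n_estimators=200, learning_rate=0.1,
                                     max_depth=3, random_state=0)
booster.fit(X, y)
print(booster.score(X, y))  # training accuracy usually rises with more estimators
```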

_________________________________________________________________

[7] What if? — Apply Bagging

NO. Bagging decreases variance, which is the remedy when we observe high variance during testing; for underfitting it may not help.

_________________________________________________________________

[8] What if? — Apply log Transformation before model training

May be YES. If our data is highly skewed and we apply a log transformation (sketched below) to make it closer to normally distributed, it may decrease variance while training, but it may not be helpful overall: the results of standard statistical tests performed after a log transformation are NOT directly comparable with those on the non-transformed data. Many researchers therefore do not transform skewed data, but instead apply methods that are independent of the distribution, such as GEE (Generalized Estimating Equations), or use the Mahalanobis distance when computing distances on non-normalized or non-standardized data.
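
A minimal sketch of the transformation itself (the values are made up; log1p(x) = log(1 + x) also handles zeros gracefully):

```python
# log1p compresses a right-skewed feature toward a more symmetric shape.
import numpy as np

skewed = np.array([1.0, 2.0, 3.0, 5.0, 8.0, 120.0, 900.0])
transformed = np.log1p(skewed)

print(skewed.std(), transformed.std())  # spread shrinks after the transform
```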

_________________________________________________________________

[9] What if? — Apply SMOTE to generate more data samples

May be YES. As we discussed, simply adding more data may not be helpful, but SMOTE may help if the additional synthetic data points dilute the influence of outliers and leave the data effectively outlier-free; a sketch follows.
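
A minimal sketch using imbalanced-learn's SMOTE to synthesize minority-class samples (the imblearn package is assumed to be installed; the dataset is synthetic):

```python
# SMOTE interpolates new minority-class points between existing neighbors.
from collections import Counter

from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)
print(Counter(y))  # imbalanced, e.g. roughly {0: 900, 1: 100}

X_res, y_res = SMOTE(random_state=0).fit_resample(X, y)
print(Counter(y_res))  # balanced after resampling
```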

_________________________________________________________________

[10] What if? — Reduction in regularization parameter

YES. Introducing more regularization increases bias, which can cause underfitting; conversely, reducing the regularization parameter decreases bias and hence reduces underfitting, as sketched below.
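
A minimal sketch with Ridge regression, where alpha is the regularization parameter (the dataset is synthetic):

```python
# Lowering Ridge's alpha weakens regularization, letting the model fit
# the training data more closely.
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

X, y = make_regression(n_samples=200, n_features=10, noise=5.0, random_state=0)

for alpha in [100.0, 1.0, 0.01]:  # decreasing regularization strength
    model = Ridge(alpha=alpha).fit(X, y)
    print(alpha, model.score(X, y))  # training R^2 rises as alpha shrinks
```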

_________________________________________________________________

[11] What if? — Introduce Dropouts in Neural Network

NO. In the case of underfitting, skipping some neurons with probability P will not help, as it decreases model complexity and introduces more bias during training.

_________________________________________________________________

[12] What if? — Increase folds in cross-validation

NO. If your model is underfitting, applying more folds in cross-validation may not help: it merely increases the number of data points in each training split, but does nothing to increase the number of features or the power of the model.

_________________________________________________________________

[13] What if? — We train the model for a Longer Time

May be YES. Since underfitting means low model complexity, training for longer can help the model learn more complex patterns. This is especially true in deep learning.

_________________________________________________________________

Low error while training but high error while testing new data points indicates that the model is showing ‘overfitting’. The model is biased towards the training data; it has fit the training points too closely to generalize, and hence generates a high error on the test data.

[Q] How to deal with this issue?

_________________________________________________________________

[1] What if? — Increase the number of training samples

YES. It may be that our model was trained on limited data and is therefore overfitting; an increase in training samples will help. In most situations, more data is better. Overfitting is essentially learning spurious correlations that occur in your training data but not in the real world; increasing the size of your dataset should reduce these spurious correlations and improve the performance of your learner. Adding more examples adds diversity and decreases the generalization error, because the model becomes more general by virtue of being trained on more examples.

_________________________________________________________________

[2] What if? — Adding more features to training data

NO. If the model is overfitting, we have to decrease the number of features, not increase it. We can use feature selection methods to find out which features are actually relevant to the model, and reducing the number of features after removing multicollinearity will also help. Adding more input features (columns) may increase overfitting, because additional features may be irrelevant or redundant, giving the model more opportunity to become complicated just to fit the examples at hand.

_________________________________________________________________

[3] What if? — Recleaning data

YES. Re-cleaning the data is a good option; impure data can itself be a cause of overfitting. If overfitting occurs, we should clean the data.

_________________________________________________________________

[4] What if? — Increase the power of the algorithm

NO. It is possible that the model has become complex enough to handle the training data very well while failing on the test data. Hence we have to decrease model complexity instead of increasing it.

_________________________________________________________________

[5] What if? — Analysis of outliers

YES. This is a fairly subjective question, and the answer depends on the model used. If you are using linear regression, outliers will certainly affect your regression line; KNN is affected as well, while SVM handles outliers better. So it is always good practice to normalize your data proactively; in most cases this will save you from overfitting your model due to the presence of outliers.

_________________________________________________________________

[6] What if? — Decrease model complexity

YES. Reducing the capacity of the model reduces the likelihood of it overfitting the training dataset, up to the point where it no longer overfits. For example, the capacity of a neural network model (its complexity) is defined both by its structure, in terms of nodes and layers, and by its parameters, in terms of its weights. A sketch of the same idea follows.
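
The same capacity-reduction idea, sketched with a decision tree's max_depth as a simpler stand-in for trimming a network's layers and nodes (the dataset is synthetic):

```python
# Capping a tree's depth reduces capacity and shrinks the train/test gap.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for depth in [None, 3]:  # unbounded vs. constrained capacity
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    tree.fit(X_train, y_train)
    print(depth, tree.score(X_train, y_train), tree.score(X_test, y_test))
```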

_________________________________________________________________

[7] What if? — Boosting

NO. Boosting increases model complexity and hence will not help in reducing overfitting.

_________________________________________________________________

[8] What if? — Bagging

YES. Bagging decreases variance (at the cost of a little bias) and hence helps reduce overfitting, as sketched below.
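
A minimal sketch of bagging, assuming scikit-learn: averaging many trees trained on bootstrap samples lowers variance relative to a single deep tree (the dataset is synthetic):

```python
# Bagging averages predictions from trees fit on bootstrap resamples.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

single = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
bagged = BaggingClassifier(DecisionTreeClassifier(random_state=0),
                           n_estimators=100, random_state=0).fit(X_train, y_train)

print("single tree:", single.score(X_test, y_test))
print("bagged:     ", bagged.score(X_test, y_test))  # usually higher
```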

_________________________________________________________________

[9] What if? — Apply SMOTE to generate more data samples

May be YES. SMOTE may be helpful if we recover from outliers by adding synthetic data points, effectively making the data outlier-free (see the SMOTE sketch in the underfitting section above).

_________________________________________________________________

[10] What if? — Reduction in regularization parameter

NO. We have to increase the regularization parameter to add a little bias and reduce overfitting (the opposite of what helped with underfitting).

_________________________________________________________________

[11] What if? — Introduce Dropouts in Neural Network

YES. In the case of overfitting, skipping some neurons with probability P helps to decrease model complexity and introduces a little more bias in training, which can reduce overfitting; a sketch follows.
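
A minimal sketch, assuming TensorFlow/Keras: Dropout layers randomly zero activations during training, which regularizes the network (the layer sizes and the rate of 0.5 are illustrative):

```python
import tensorflow as tf

# Two hidden layers, each followed by Dropout; rate=0.5 drops each unit with
# probability 0.5 during training only (inference uses all units).
model = tf.keras.Sequential([
    tf.keras.Input(shape=(20,)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```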

_________________________________________________________________

[12] What if? — Increase folds in cross-validation

YES. If your model is overfitting, applying more folds in cross-validation can help, as each training split then contains more of the data and the validation estimate becomes more reliable.

_________________________________________________________________

[13] What if? — Reduce Multicollinearity

YES. One issue with high multicollinearity is that small changes in the input data can lead to large changes in the model, even flipping the signs of parameter estimates. A principal danger of such data redundancy is overfitting in regression analysis models, so reducing multicollinearity will help; a sketch follows.
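
One common way to detect multicollinearity is the variance inflation factor (VIF); here is a minimal sketch assuming statsmodels and pandas are available (the data is synthetic):

```python
# High VIF values flag columns that are nearly linear combinations of others.
import numpy as np
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(0)
a = rng.normal(size=200)
df = pd.DataFrame({
    "a": a,
    "b": a * 2 + rng.normal(scale=0.1, size=200),  # nearly collinear with "a"
    "c": rng.normal(size=200),
})

for i, col in enumerate(df.columns):
    print(col, variance_inflation_factor(df.values, i))  # VIF >> 10 suggests removal
```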

_________________________________________________________________

[14] What if? — Use Soft margin classifier

YES. Using a soft-margin SVM rather than a hard-margin one allows the classifier to make a certain number of mistakes while keeping the margin as wide as possible, so that the other points can still be classified correctly; a sketch follows.
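
In scikit-learn's SVC, the C parameter controls margin softness: a smaller C tolerates more misclassifications in exchange for a wider margin. A minimal sketch (the dataset is synthetic):

```python
# Large C approximates a hard margin; small C gives a soft margin.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=10, flip_y=0.1,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for C in [1000.0, 1.0, 0.01]:
    svm = SVC(kernel="linear", C=C).fit(X_train, y_train)
    print(C, svm.score(X_train, y_train), svm.score(X_test, y_test))
```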

_________________________________________________________________

[Q] How are ROC (Receiver Operating Characteristic) curves and AUC (Area Under the Curve) related to underfitting and overfitting?

The final model’s ROC curve (or AUC) on the training set alone does not tell you much unless you know something about the performance of the optimal classifier. By definition, the training set cannot be used to test for overfitting or underfitting, because it cannot measure the model’s generalization performance. Comparing the ROC curves of the training set and the validation set can help, however: a wide gap between the training and validation metrics is an indication of overfitting, while no gap at all suggests underfitting. Anything in between is subject to interpretation, but a good model should produce a small gap.

To measure the distance between training and validation, compute the area between the two ROC curves; bear in mind that the difference in AUC does not measure the same quantity.
During the learning process, tracking the ROC curves and the gap between them provides additional information, since you can watch how the gap evolves.

Say, test = validation set

[1] What if? — AUC(test) < AUC(train)

  • The model has learned the training data only and is therefore overfitting
  • High variance at test time
  • Add more training data (rows) rather than more training features
  • Increase the folds in cross-validation from n to n + k, so that more data goes into the training part

[2] What if? — AUC(test) > AUC(train)

  • Add more training features (columns) rather than more training data
  • Before adding features, check for information leakage, which can also make AUC(test) exceed AUC(train); a sketch of the train/validation comparison follows this list
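
A minimal sketch of the train/validation AUC comparison described above (the model choice and dataset are illustrative):

```python
# Compare AUC on the training and validation sets to gauge the gap.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

auc_train = roc_auc_score(y_train, model.predict_proba(X_train)[:, 1])
auc_val = roc_auc_score(y_val, model.predict_proba(X_val)[:, 1])
print(f"AUC(train)={auc_train:.3f}  AUC(val)={auc_val:.3f}  "
      f"gap={auc_train - auc_val:.3f}")
```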

Many models are wrong, but simple models are more wrong! That’s right: Occam’s is the most abused razor in science. Unless you regularize your model, you will overfit it. Regularization prevents model parameters from shifting too quickly, so they are less prone to fitting the peculiarities of the data and more likely to concentrate on the structures that persist in it. In linear models it is well understood why this works; in neural networks, say, much less so. But the intuition is the same everywhere: let the data speak for itself by making the model flexible enough to capture complex structures, and then regularizing it!
