“Feature Importance” is not enough. You also need to look at “Error Contribution” if you want to know which features are beneficial for your model.
The concept of “feature importance” is widely used in machine learning as the most basic type of model explainability. For example, it is used in Recursive Feature Elimination (RFE) to iteratively drop the least important feature of the model.
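To make the RFE loop concrete, here is a minimal sketch. It assumes a plain least-squares linear model and uses the absolute value of each coefficient as a crude importance score, which is only reasonable when the features are on comparable scales. The function name `rfe_linear` and the synthetic data are made up for illustration. In practice you would more likely use a library implementation such as scikit-learn's `RFE`, which relies on the estimator's own importances.

```python
import numpy as np

def rfe_linear(X, y, names, n_keep):
    """Toy RFE: refit, drop the feature with the smallest |coefficient|, repeat.

    Assumption: columns of X share a comparable scale, so coefficient
    magnitude can stand in for feature importance.
    """
    names = list(names)
    X = np.asarray(X, dtype=float)
    dropped = []
    while len(names) > n_keep:
        # Fit ordinary least squares on the features still in play.
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        # The least important feature is the one with the smallest |coef|.
        worst = int(np.argmin(np.abs(coef)))
        dropped.append(names.pop(worst))
        X = np.delete(X, worst, axis=1)
    return names, dropped

# Synthetic data (hypothetical): "a" and "b" matter, "c" barely, "d" not at all.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 3.0 * X[:, 0] + 1.0 * X[:, 1] + 0.1 * X[:, 2] + rng.normal(scale=0.1, size=200)

kept, dropped = rfe_linear(X, y, ["a", "b", "c", "d"], n_keep=2)
print("kept:", kept, "dropped (in order):", dropped)
```

Note that the loop only ever asks "which feature contributes least to the predictions?". It never asks whether a feature's contribution makes the predictions better or worse.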
However, there is a misconception about it.
The fact that a feature is important doesn’t imply that it is beneficial for the model!
Indeed, when we say that a feature is important, we simply mean that it contributes heavily to the predictions made by the model. But that contribution may itself be wrong.
Take a simple example: a data scientist accidentally leaves the Customer ID among their model’s features. The model treats Customer ID as a highly predictive feature. As a consequence, this feature will have a high feature importance even though it is actually making the model worse, because what the model learns from it cannot generalize to unseen data.
To make things clearer, we will need to make a distinction between two concepts:
- Prediction Contribution: what part of the predictions is due to the feature; this is equivalent to feature importance.
- Error Contribution: what part of the prediction errors is due to the presence of the feature in the model.
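One way to make these two definitions concrete is sketched below. It assumes that each prediction can be split into a baseline plus one additive contribution per feature (for instance, SHAP-style values) and that the error is measured as absolute error. Both are assumptions of this sketch. The feature names, the numbers, and the helper functions `prediction_contribution` and `error_contribution` are all made up for illustration.

```python
def prediction_contribution(contribs):
    """Mean absolute per-row contribution of each feature."""
    n = len(contribs)
    return {f: sum(abs(row[f]) for row in contribs) / n for f in contribs[0]}

def error_contribution(y_true, y_pred, contribs):
    """Mean change in absolute error due to each feature being in the model.

    Negative: the feature reduces the error on average.
    Positive: the feature increases the error on average.
    """
    n = len(contribs)
    result = {}
    for f in contribs[0]:
        delta = 0.0
        for y, p, row in zip(y_true, y_pred, contribs):
            error_with = abs(y - p)
            # Prediction we would get if this feature's contribution were removed.
            error_without = abs(y - (p - row[f]))
            delta += error_with - error_without
        result[f] = delta / n
    return result

# Hypothetical held-out rows: baseline plus additive per-feature contributions.
baseline = 25
contribs = [
    {"tenure": -12, "customer_id": 9},
    {"tenure": -6, "customer_id": -8},
    {"tenure": 4, "customer_id": 10},
    {"tenure": 13, "customer_id": -9},
]
y_true = [10, 20, 30, 40]
y_pred = [baseline + sum(row.values()) for row in contribs]

pred_contrib = prediction_contribution(contribs)
err_contrib = error_contribution(y_true, y_pred, contribs)
print(pred_contrib)  # {'tenure': 8.75, 'customer_id': 9.0}
print(err_contrib)   # {'tenure': -3.75, 'customer_id': 8.5}
```

In this toy data, `customer_id` has the larger Prediction Contribution, so it would rank as the most important feature. Its Error Contribution is positive, though, meaning that on these rows it makes the predictions worse, while `tenure` has a negative Error Contribution and is the feature that actually helps.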
In this article, we will see how to calculate these quantities and how to use them to get valuable insights about a predictive model (and to improve it).
Suppose we built a model to predict the income of people based on their job, age, and nationality. Now we use the model to make predictions on three people.
Thus, we have the ground truth, the model prediction, and the resulting error:
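As a small illustration of these three quantities, the sketch below uses placeholder numbers (they are not the values from the original example) and assumes the error is the absolute difference between prediction and ground truth.

```python
# Placeholder incomes in thousands of dollars; purely illustrative values.
people = ["person_1", "person_2", "person_3"]
y_true = [90, 40, 58]   # ground truth
y_pred = [75, 55, 60]   # model predictions

# Assumption: error is the absolute difference between truth and prediction.
errors = [abs(t - p) for t, p in zip(y_true, y_pred)]

print(f"{'person':<10}{'truth':>8}{'pred':>8}{'error':>8}")
for name, t, p, e in zip(people, y_true, y_pred, errors):
    print(f"{name:<10}{t:>8}{p:>8}{e:>8}")
```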
This post originally appeared on TechToday.