Team Ingenii | Feature Selection

Feature selection is the process of determining which input features actually contribute to the output (in our case, carbon emissions) by analysing the relationship between each input feature and the target with various algorithms. Feature selection matters because it reduces computation time and also helps manufacturers make informed decisions.

For our prediction and suggestion models we use the features obtained from Correlation.


Feature Selection Algorithms



Like Recursive Feature Elimination, SelectFromModel from Scikit-Learn relies on a machine learning model's estimates to select features.

The difference is that SelectFromModel selects features by comparing an importance attribute (usually coef_ or feature_importances_, but it can be any callable) against a threshold. By default, the threshold is the mean importance.
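
Below is a minimal sketch of how SelectFromModel can be applied. The CSV file name, the CO2-emissions column name, and the RandomForestRegressor estimator are illustrative assumptions rather than the exact setup we used.

```python
# Minimal SelectFromModel sketch. File name, target column and estimator
# are assumptions for illustration; adjust them to your dataset.
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import SelectFromModel

df = pd.read_csv("co2_emissions.csv")                          # hypothetical file
X = pd.get_dummies(df.drop(columns=["CO2 Emissions(g/km)"]))   # encode categoricals
y = df["CO2 Emissions(g/km)"]                                  # assumed target column

# Features whose importance (feature_importances_ for tree models) falls
# below the threshold are dropped; by default the threshold is the mean.
selector = SelectFromModel(RandomForestRegressor(n_estimators=100, random_state=0))
selector.fit(X, y)

print(list(X.columns[selector.get_support()]))  # names of the selected features
```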

Important Features
- Vehicle Class
- Engine Size(L)
- Cylinders
- Transmission Type
- Fuel Type
- Fuel Consumption City (L/100 km)
- Fuel Consumption Hwy (L/100 km)
- Fuel Consumption Comb (L/100 km)
- Fuel Consumption Comb (mpg)

Variance Threshold is an unsupervised approach because it does not use the output data.

A feature with higher variance has values that vary widely (high cardinality). On the other hand, lower variance means the values within the feature are quite similar, and zero variance means every sample has the same value.

We want varied features because we don't want our predictive model to be biased. Variance Threshold is a simple approach that eliminates features whose variance falls below the level we expect within each feature.
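
As a rough sketch (assuming a numeric feature matrix and an illustrative threshold of 0.1), VarianceThreshold from Scikit-Learn can be applied like this:

```python
# Minimal VarianceThreshold sketch. The file name and the 0.1 threshold
# are illustrative assumptions.
import pandas as pd
from sklearn.feature_selection import VarianceThreshold

X = pd.read_csv("co2_emissions.csv").select_dtypes("number")  # hypothetical file

# Unsupervised: only X is inspected; features whose variance falls
# below the threshold are removed.
selector = VarianceThreshold(threshold=0.1)
selector.fit(X)

print(list(X.columns[selector.get_support()]))  # features that were kept
```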

Important Features
- Vehicle Class
- Transmission Type
- Fuel Consumption City (L/100 km)
- Fuel Consumption Comb (L/100 km)
- Fuel Consumption Comb (mpg)

Correlation is a statistical term that, in common usage, refers to how close two variables are to having a linear relationship with each other.

Features with high correlation are more linearly dependent and hence have almost the same effect on the dependent variable. So, when two features have a high correlation, we can drop one of the two features.
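
Below is a minimal sketch of correlation-based filtering; the file name, target column name, and the 0.5 cut-off are assumptions for illustration.

```python
# Minimal correlation sketch. File name, target column and the 0.5
# cut-off are illustrative assumptions.
import pandas as pd

df = pd.read_csv("co2_emissions.csv")     # hypothetical file
target = "CO2 Emissions(g/km)"            # assumed target column

# Correlation of every numeric feature with the target.
corr = df.corr(numeric_only=True)[target].drop(target)
print(list(corr[corr.abs() > 0.5].index))  # features strongly related to the target

# If two features correlate strongly with each other (e.g. the different
# fuel-consumption columns), one of the pair can be dropped as redundant.
```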

Important Features
- Engine Size(L)
- Cylinders
- Fuel Consumption City (L/100 km)
- Fuel Consumption Hwy (L/100 km)
- Fuel Consumption Comb (L/100 km)
- Fuel Consumption Comb (mpg)