I found two ways to implement MRMR feature selection in Python. A copy of the paper that describes the method is here:
https://www.dropbox.com/s/tr7wjpc2ik5xpxs/doc.pdf?dl=0
This is my code for building the dataset:
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from IPython.core.interactiveshell import InteractiveShell

InteractiveShell.ast_node_interactivity = "all"

X, y = make_classification(n_samples=10000, n_features=6, n_informative=3,
                           n_classes=2, random_state=0, shuffle=False)

# Creating a DataFrame
df = pd.DataFrame({'Feature 1': X[:, 0], 'Feature 2': X[:, 1],
                   'Feature 3': X[:, 2], 'Feature 4': X[:, 3],
                   'Feature 5': X[:, 4], 'Feature 6': X[:, 5],
                   'Class': y})

y_train = df['Class']
X_train = df.drop('Class', axis=1)
Method 1: Applying MRMR using pymrmr
It supports both the MID and MIQ variants and is published by the author. The link is https://github.com/fbrundu/pymrmr
import pymrmr

pymrmr.mRMR(df, 'MIQ', 6)
['Feature 4', 'Feature 5', 'Feature 2', 'Feature 6', 'Feature 1', 'Feature 3']
Or, running the second variant:
pymrmr.mRMR(df, 'MID', 6)
['Feature 4', 'Feature 6', 'Feature 5', 'Feature 2', 'Feature 1', 'Feature 3']
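Note: from the pymrmr README, the underlying C++ mRMR tool appears to expect the class as the first column of the input DataFrame. If that reading is right, df would need to be reordered before the call, e.g.:

import pymrmr

# move 'Class' to the front, since pymrmr seems to treat the first
# column of the DataFrame as the target (my reading of its README)
df_ordered = df[['Class'] + [c for c in df.columns if c != 'Class']]
pymrmr.mRMR(df_ordered, 'MIQ', 6)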
Both of these calls, on the above dataset, yield the two outputs shown. Another author on GitHub claims that you can use his package to apply the MRMR method. However, when I use it on the same dataset I get a different result.
Method 2: Applying MRMR using MIFS
GitHub link: https://github.com/danielhomola/mifs
import mifs

for i in range(1, 11):
    feat_selector = mifs.MutualInformationFeatureSelector('MRMR', k=i)
    feat_selector.fit(X_train, y_train)

    # call transform() on X to filter it down to the selected features
    X_filtered = feat_selector.transform(X_train.values)

    # list the feature names in their selected order
    feature_name = X_train.columns[feat_selector.ranking_]
    print(feature_name)
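For reference, my understanding from the mifs documentation is that k is the k-nearest-neighbors parameter of the mutual-information estimator, not the number of features to keep; the subset size seems to be controlled by the separate n_features argument. A sketch of that usage (treat the parameter reading as my assumption, not gospel):

import mifs

# n_features sets the subset size; k (left at its default) only tunes
# the kNN mutual-information estimator -- per my reading of the mifs docs
feat_selector = mifs.MutualInformationFeatureSelector('MRMR', n_features=6)
feat_selector.fit(X_train.values, y_train.values)
print(X_train.columns[feat_selector.ranking_])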
And if you run the above loop over all values of i, at no point do the two methods yield the same feature-selection output.

What seems to be the problem here?
1 Answer
You'll probably need to contact the authors of the original paper and/or the owners of the GitHub repos for a definitive answer, but most likely the differences here come from the fact that you are comparing three different algorithms, despite the shared name.
Minimum Redundancy Maximum Relevance is actually a family of feature-selection algorithms whose common objective is to select features that are mutually far away from each other while still having "high" correlation with the classification variable.
You can measure that objective using mutual-information measures, but the specific procedure to follow (what to do with the computed scores, in what order, which post-processing steps to apply, and so on) differs from one author to another; even the paper itself gives you two different criteria, MIQ (Mutual Information Quotient) and MID (Mutual Information Difference).
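To make the two criteria concrete, here is a minimal greedy sketch built on scikit-learn's mutual-information estimators. This is an illustration of the general scheme only, not the exact algorithm of either package (both discretize and score differently), and the helper name greedy_mrmr is mine:

import numpy as np
from sklearn.feature_selection import mutual_info_classif, mutual_info_regression

def greedy_mrmr(X, y, n_selected, scheme='MID'):
    # relevance: MI between each feature and the class variable
    relevance = mutual_info_classif(X, y, random_state=0)
    selected = [int(np.argmax(relevance))]
    candidates = [i for i in range(X.shape[1]) if i != selected[0]]
    while len(selected) < n_selected:
        scores = []
        for i in candidates:
            # redundancy: mean MI between candidate i and the selected set
            redundancy = np.mean([
                mutual_info_regression(X[:, [j]], X[:, i], random_state=0)[0]
                for j in selected])
            if scheme == 'MID':   # difference criterion
                scores.append(relevance[i] - redundancy)
            else:                 # 'MIQ': quotient criterion
                scores.append(relevance[i] / (redundancy + 1e-12))
        best = candidates[int(np.argmax(scores))]
        selected.append(best)
        candidates.remove(best)
    return selected

Even on identical data, two such implementations can disagree as soon as the MI estimator, the discretization, or the tie-breaking differs.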
So my suggestion would be to choose the implementation you are most comfortable with (or, even better, the one that produces the best results in your pipeline after proper validation), and report which specific source you chose and why.
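For example, a quick way to run that validation is to cross-validate a simple classifier on each candidate subset and keep whichever scores best; the subsets below are just the top three features from your two pymrmr runs:

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# top-3 subsets taken from the two pymrmr rankings above
subsets = {
    'MIQ top 3': ['Feature 4', 'Feature 5', 'Feature 2'],
    'MID top 3': ['Feature 4', 'Feature 6', 'Feature 5'],
}
for name, feats in subsets.items():
    score = cross_val_score(LogisticRegression(max_iter=1000),
                            X_train[feats], y_train, cv=5).mean()
    print(name, round(score, 4))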