I am trying to run sklearn.metrics.classification_report with my data in a Pandas DataFrame. The DataFrame df_joined looks like this and has 100 rows:
            Label       Pred
Timestamp
2016-10-05  29.75  30.781430
2016-10-06  30.35  31.379146
2016-10-07  31.59  31.174824
2017-02-13  29.63  29.875497
2017-02-14  29.60  29.923161
2017-02-15  30.22  30.257284
2017-02-16  30.12  30.374257
2017-02-17  30.09  30.357196
2017-02-20  31.03  30.971070
2017-02-21  31.05  30.930189
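A minimal sketch that rebuilds the ten sample rows above as a DataFrame makes the relevant detail visible: both columns are float64. It is written for Python 3 with pandas and assumes the dates are the index; the other 90 rows are not shown in the question and are omitted.

```python
import pandas as pd

# Only the ten rows shown in the question; the real frame has 100 rows.
df_joined = pd.DataFrame(
    {
        "Label": [29.75, 30.35, 31.59, 29.63, 29.60,
                  30.22, 30.12, 30.09, 31.03, 31.05],
        "Pred": [30.781430, 31.379146, 31.174824, 29.875497, 29.923161,
                 30.257284, 30.374257, 30.357196, 30.971070, 30.930189],
    },
    index=pd.to_datetime([
        "2016-10-05", "2016-10-06", "2016-10-07", "2017-02-13", "2017-02-14",
        "2017-02-15", "2017-02-16", "2017-02-17", "2017-02-20", "2017-02-21",
    ]),
)
df_joined.index.name = "Timestamp"

# Both columns hold floating-point values, not discrete class labels.
print(df_joined.dtypes)
```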
I am now trying to print the classification report with:
print 'Classification Report:', '\n', sklearn.metrics.classification_report(df_joined[label],df_joined['Pred'] )
and I am getting the error:
File "\Python\WinPython-32bit-2.7.10.3\python-2.7.10\lib\site-packages\sklearn\utils\multiclass.py", line 106, in unique_labels
    raise ValueError("Unknown label type: %r" % ys)
TypeError: not all arguments converted during string formatting
I have also tried passing the underlying arrays instead,
sklearn.metrics.classification_report(df_joined[label].values, df_joined['Pred'].values)
but it produces the same error.
Does anyone have an idea where this is coming from?
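The TypeError, rather than the ValueError the library line was trying to raise, looks like a message-formatting problem. Assuming, as the traceback suggests, that `ys` in that sklearn version is the tuple of every label array passed to `unique_labels`, the expression `"%r" % ys` receives two values for a single placeholder. A minimal Python 3 sketch of that pitfall, using the hypothetical helpers `buggy_message` and `fixed_message`:

```python
def buggy_message(*ys):
    # ys is a tuple; with two arrays, "%" tries to fill one %r with two values.
    return "Unknown label type: %r" % ys


def fixed_message(*ys):
    # Wrapping ys in a 1-tuple formats the whole tuple into the single %r.
    return "Unknown label type: %r" % (ys,)


try:
    buggy_message([29.75, 30.35], [30.78, 31.38])
except TypeError as exc:
    print("TypeError:", exc)

print(fixed_message([29.75, 30.35], [30.78, 31.38]))
```

In other words, the underlying complaint is still "unknown label type"; the formatting bug only changes which exception reaches you.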
2 Answers
Answer 1
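I believe classification_report quantifies how well you have classified/predicted the label of a data point, not its actual value. A label has to be a discrete class: sklearn treats an array of non-integer floats such as 29.75 as a continuous target, which the classification metrics reject. All the examples in the sklearn documentation and the sklearn user guide use integers for their labels.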
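sklearn decides what kind of target it is looking at with `sklearn.utils.multiclass.type_of_target`. A small sketch with values taken from the question shows that non-integral floats are classified as `continuous`, while whole-number values (even when stored as floats) count as discrete classes:

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

label = np.array([29.75, 30.35, 31.59])      # non-integral floats
as_int = label.astype(int)                   # truncated to [29, 30, 31]
integral_floats = np.array([1.0, 2.0, 1.0])  # floats, but whole numbers

print(type_of_target(label))            # continuous
print(type_of_target(as_int))           # multiclass (three distinct classes)
print(type_of_target(integral_floats))  # binary (two distinct classes)
```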
The parameters also hint at this: the only alternative to passing a 1-d array is a label indicator array, a construct meant specifically for labels.
sklearn.metrics.classification_report(y_true, y_pred, labels=None, target_names=None, sample_weight=None, digits=2)

    y_true : 1d array-like, or label indicator array / sparse matrix
        Ground truth (correct) target values.
    y_pred : 1d array-like, or label indicator array / sparse matrix
        Estimated targets as returned by a classifier.
    ...
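For reference, a label indicator array is a 2-d 0/1 matrix with one row per sample and one column per label. A small sketch with made-up data and three hypothetical label names `a`, `b`, `c`, showing the only non-1-d input form that classification_report accepts:

```python
import numpy as np
from sklearn.metrics import classification_report

# Rows are samples, columns are labels; 1 means "this sample has this label".
y_true = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 1, 0]])
y_pred = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 0, 0]])

report = classification_report(y_true, y_pred, target_names=["a", "b", "c"])
print(report)
```

Every entry is still a class membership flag, never a measured value like 29.75.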
If your data had been integer labels, the exact DataFrame format you passed would have worked fine:
# Does not raise an error
classification_report(df_joined['Label'].astype(int), df_joined['Pred'].astype(int))
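A self-contained Python 3 version of that call, using the first five rows from the question. Note that `astype(int)` truncates toward zero (29.75 becomes 29, 30.78 becomes 30), so this only demonstrates that integer labels are accepted; whether truncated prices make meaningful classes is a separate question.

```python
import pandas as pd
from sklearn.metrics import classification_report

# First five rows of the question's data.
df_joined = pd.DataFrame({
    "Label": [29.75, 30.35, 31.59, 29.63, 29.60],
    "Pred": [30.781430, 31.379146, 31.174824, 29.875497, 29.923161],
})

# Truncate to integers so sklearn sees discrete (multiclass) labels.
y_true = df_joined["Label"].astype(int)
y_pred = df_joined["Pred"].astype(int)

report = classification_report(y_true, y_pred)
print(report)
```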
You can read more about sklearn's different model evaluation tools in Model evaluation: quantifying the quality of predictions, and pick one that is suitable to evaluate your classifier.
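Assuming Label and Pred are really continuous quantities (a regression problem rather than classification), the regression metrics described on that same page fit the data as it is, with no casting. A minimal sketch on the ten sample rows:

```python
import pandas as pd
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

df_joined = pd.DataFrame({
    "Label": [29.75, 30.35, 31.59, 29.63, 29.60,
              30.22, 30.12, 30.09, 31.03, 31.05],
    "Pred": [30.781430, 31.379146, 31.174824, 29.875497, 29.923161,
             30.257284, 30.374257, 30.357196, 30.971070, 30.930189],
})

y_true = df_joined["Label"]
y_pred = df_joined["Pred"]

# Regression metrics accept float targets directly.
mae = mean_absolute_error(y_true, y_pred)
mse = mean_squared_error(y_true, y_pred)
r2 = r2_score(y_true, y_pred)

print("MAE:", mae)
print("MSE:", mse)
print("R^2:", r2)
```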
Answer 2
What happens if you pass them in as lists? I.e.:
print 'Classification Report:', '\n', sklearn.metrics.classification_report(df_joined['Label'].tolist(),df_joined['Pred'].tolist() )
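`tolist()` changes the container but not the values, so the same floats still reach sklearn. A quick Python 3 check with three of the question's rows; the exact exception and message depend on the sklearn version (the question's older install surfaced it as the TypeError above):

```python
from sklearn.metrics import classification_report

labels = [29.75, 30.35, 31.59]
preds = [30.781430, 31.379146, 31.174824]

raised = False
try:
    classification_report(labels, preds)
except (ValueError, TypeError) as exc:
    # Plain Python lists of floats are still a continuous target.
    raised = True
    print(type(exc).__name__ + ":", exc)
```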