Monday, April 10, 2017

sklearn classification_report with input from pandas dataframe produces: “TypeError: not all arguments converted during string formatting”

I am trying to run sklearn.metrics.classification_report on data held in a Pandas dataframe. The dataframe df_joined has 100 rows and looks like this:

Timestamp    Label       Pred
2016-10-05   29.75  30.781430
2016-10-06   30.35  31.379146
2016-10-07   31.59  31.174824
2017-02-13   29.63  29.875497
2017-02-14   29.60  29.923161
2017-02-15   30.22  30.257284
2017-02-16   30.12  30.374257
2017-02-17   30.09  30.357196
2017-02-20   31.03  30.971070
2017-02-21   31.05  30.930189

I am now trying to print the classification_report with

print 'Classification Report:', '\n', sklearn.metrics.classification_report(df_joined[label],df_joined['Pred'] ) 

and I am getting the error:

File "\Python\WinPython-32bit-2.7.10.3\python-2.7.10\lib\site-packages\sklearn\utils\multiclass.py", line 106, in unique_labels raise ValueError("Unknown label type: %r" % ys)

TypeError: not all arguments converted during string formatting

I have also tried sklearn.metrics.classification_report(df_joined[label].values, df_joined['Pred'].values) instead, but it produces the same error.

Does anyone have an idea where this is coming from?

2 Answers

Answer 1

I believe classification_report quantifies how well you have classified/predicted the label of a data point, not its actual value. A label is a discrete class, so it can't be an arbitrary float such as 29.75; all the examples in the sklearn documentation and the sklearn user guide use integers for their labels.

The parameters also hint at this, as the alternative to passing a 1-d array is a specific array construct for labels only.

sklearn.metrics.classification_report(y_true, y_pred, labels=None, target_names=None, sample_weight=None, digits=2)

y_true : 1d array-like, or label indicator array / sparse matrix
    Ground truth (correct) target values.
y_pred : 1d array-like, or label indicator array / sparse matrix
    Estimated targets as returned by a classifier.
...
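To see how sklearn categorises a target before computing classification metrics, one option is the type_of_target helper from sklearn.utils.multiclass. This is only a sketch: the two short lists are stand-ins for the dataframe columns, not the asker's real data.

```python
from sklearn.utils.multiclass import type_of_target

# Stand-in values: floats like the question's columns, and integer labels.
float_values = [29.75, 30.35, 31.59]
int_values = [29, 30, 31]

# Non-integer floats are classed as a continuous (regression) target,
# while integers are classed as discrete class labels.
float_kind = type_of_target(float_values)
int_kind = type_of_target(int_values)

print("floats: %s, ints: %s" % (float_kind, int_kind))
```

A "continuous" result is a sign that the data describes a regression problem, which classification metrics are not designed for.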

If your data had integer labels, the exact dataframe format you passed would have worked fine:

# Does not raise an error
classification_report(df_joined['Label'].astype(int), df_joined['Pred'].astype(int))
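As for the TypeError itself, it looks like a side effect of how the intended ValueError message is built. The raise line in the traceback formats ys with a single %r, and if ys is a tuple holding both arrays, Python treats it as two format arguments for one placeholder. The sketch below reproduces that with plain lists; format_unknown_labels is a hypothetical helper written for illustration, not sklearn code.

```python
# Stand-ins for the two columns passed to classification_report.
y_true = [29.75, 30.35, 31.59]
y_pred = [30.781430, 31.379146, 31.174824]


def format_unknown_labels(*ys):
    # Hypothetical helper mirroring the raise line in the traceback.
    # ys is a tuple of all positional arguments, so "%" sees two
    # values for a single %r placeholder.
    return "Unknown label type: %r" % ys


try:
    format_unknown_labels(y_true, y_pred)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)

# Wrapping the tuple in a one-element tuple formats it as one value.
fixed_message = "Unknown label type: %r" % ((y_true, y_pred),)
print(fixed_message)
```

Under that assumption, the message sklearn meant to show was "Unknown label type", which matches the explanation that float values are not valid class labels.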

You can read more about sklearn's different model evaluation tools in Model evaluation: quantifying the quality of predictions, and pick one that is suitable to evaluate your classifier.
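For continuous values like the ones in the question, a regression metric is probably a better fit. The sketch below computes mean absolute error and root mean squared error by hand on the first three rows of the question's table; sklearn.metrics offers mean_absolute_error and mean_squared_error for the same purpose.

```python
import math

# First three rows of the dataframe shown in the question.
labels = [29.75, 30.35, 31.59]
preds = [30.781430, 31.379146, 31.174824]

# Per-row prediction error (prediction minus true value).
errors = [p - t for t, p in zip(labels, preds)]

# Mean absolute error: average size of the error.
mae = sum(abs(e) for e in errors) / len(errors)

# Root mean squared error: penalises large errors more heavily.
rmse = math.sqrt(sum(e * e for e in errors) / len(errors))

print("MAE: %.4f  RMSE: %.4f" % (mae, rmse))
```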

Answer 2

What happens if you pass them as plain lists?

I.e.

print 'Classification Report:', '\n', sklearn.metrics.classification_report(df_joined['Label'].tolist(),df_joined['Pred'].tolist() )
