Thursday, April 26, 2018

u2013 error with openpyxl - python 2.7

Leave a Comment

I am using python2.7. I can't use python 3. I wrote this to convert an excel spreadsheet to csv. It is throwing an error for "u2013'" which is an 'en dash' character. In perl - you can load a file in unicode with the open command, but I don't know how to do that in python.

#!/home/casper/python/core/2.7.14/exec/bin/python2.7 # -*- coding: utf-8 -*- import openpyxl import csv  wb = openpyxl.load_workbook('RiskLimitSnapshot.xlsx') sh = wb.get_active_sheet() with open('goodRiskLimitSnapshot.csv', 'wb') as f:      c = csv.writer(f)     for r in sh.rows:         c.writerow([cell.value for cell in r]) 

error :

Traceback (most recent call last):   File "/home/casper/pyExceltoCSV", line 16, in <module>     c.writerow([cell.value for cell in r]) UnicodeEncodeError: 'ascii' codec can't encode character u'\u2013' in position 74: ordinal not in range(128) 

I changed the script to use io.open:

wb = openpyxl.load_workbook('DailyETRiskLimitSnapshot.xlsx' ,   data_only=True) sh = wb.get_active_sheet()     with io.open('goodDailyETRiskLimitSnapshot.csv', 'w', encoding='utf8') as f:     c = csv.writer(f, dialect='excel')     for r in sh.rows:         c.writerow([cell.value for cell in r]) 

However it is throwing a different error:

Traceback (most recent call last):   File "./pyExceltoCVS.py", line 20, in <module>     c.writerow([cell.value for cell in r]) TypeError: write() argument 1 must be unicode, not str 

6 Answers

Answers 1

The correct way to open a file for encoded output is to use the io module:

import io  with io.open('goodRiskLimitSnapshot.csv', 'w', encoding='utf8') as f:      c = csv.writer(f)     for r in sh.rows:         c.writerow([cell.value for cell in r]) 

Answers 2

Have you tried using pandas:

wb = pd.read_excel('RiskLimitSnapshot.xlsx') #you can specify the sheet name using sheetname argument wb.to_csv('goodRiskLimitSnapshot.csv', encoding='utf-8') 

Alternatively you can use codecs and do:

codecs.open(filename, encoding='utf8') 

Answers 3

Python3 makes lot of things easier by using the default encoding as unicode. But in Python2 you get a default str and different unicode representation

Now consider the case of en-dash = , looks similar to a normal - but is not.

Let's fire up a python 2.7 console and see the difference

>>> val_str = '–' >>> val_str '\xe2\x80\x93' 

Above is how an en-dash is represented by a str. While for unicode

>>> val_unicode = u'–' >>> val_unicode u'\u2013' 

Now let's try and write these to a csv file using different combinations

# -*- coding: utf-8 -*-  import csv import io  val_str = '–' val_unicode = u'–'   def try_writing_csv(filename, data, mode='w', **kwargs):     try:         with io.open(filename, mode=mode, **kwargs) as f:             c = csv.writer(f, dialect='excel')             c.writerow([data])     except Exception, ex:         print("failed to write - " + filename)   try_writing_csv("ascii1.csv", val_str) try_writing_csv("ascii2.csv", val_str, encoding="utf8") try_writing_csv("ascii3.csv", val_str.decode('utf8'), encoding="utf8")  try_writing_csv("unicode1.csv", val_unicode) try_writing_csv("unicode2.csv", val_unicode, encoding="utf8") try_writing_csv("unicode3.csv", val_unicode.encode('utf8'), encoding="utf8") 

And now let's run the same

failed to write - ascii1.csv failed to write - ascii2.csv failed to write - ascii3.csv failed to write - unicode1.csv failed to write - unicode2.csv failed to write - unicode3.csv 

The results are overwhelming as all the method fails. So we need look at what is wrong with it. Let's make some more trials

try_writing_csv("ascii4.csv", val_str.decode('utf8'), mode="wb") try_writing_csv("ascii5.csv", val_str, mode="utf8") try_writing_csv("ascii6.csv", val_str.decode('utf8').encode('utf8'), mode="wb")  try_writing_csv("unicode4.csv", val_unicode, mode="wb") try_writing_csv("unicode5.csv", val_unicode.encode('utf8'), mode='wb') 

And now out run would give an output

failed to write - ascii1.csv failed to write - ascii2.csv failed to write - ascii3.csv failed to write - ascii4.csv failed to write - ascii5.csv failed to write - unicode1.csv failed to write - unicode2.csv failed to write - unicode3.csv failed to write - unicode4.csv 

So ascii6.csv and unicode.csv was actually successful. Let's check the files also

CSV file

and looks like we did get then right for both the files. So the final two statements which worked are below

try_writing_csv("ascii6.csv", val_str.decode('utf8').encode('utf8'), mode="wb") try_writing_csv("unicode5.csv", val_unicode.encode('utf8'), mode='wb') 

So the key learnings

  • Don't use encoding=utf8 while opening the file
  • Use binary mode for writing the file
  • If it is a str then decode as utf8 and then encode as utf8
  • If it is a unicode then encode as utf8

And then now comes the explanation time, which you can get from the below SO thread

2.7 CSV module wants unicode, but doesn't want unicode

If you are trying to write out unicode data, you'll have to encode that data before passing it to the csv.writer() object. The csv module examples section includes code to make encoding from Unicode before writing a little easier.

Answers 4

Python2 csv library does not support well unicode. Have you considerd using the libraries unicodecsv or backports.csv ?

Cheers !

Answers 5

That error tells you that you are trying to write python unicode object to file and default ASCII codec cant encode it. Encode called implicitly to convert python string/unicode objects to bytes. You should do it yourself with needed encoding - utf-8 in your case:

Change that line:

c.writerow([cell.value for cell in r]) 

to:

c.writerow([cell.value.encode('utf-8') for cell in r]) 

Without explicitly specifying needed encoding default is used and your variant of writerow call can be writed as: c.writerow([cell.value.encode('ascii') for cell in r]) and of course will raise UnicodeEncodeError because you have unicode string.

You can check your default encoding with this code:

import sys sys.getdefaultencoding() 

Answers 6

Do you try ?

#!/home/casper/python/core/2.7.14/exec/bin/python2.7 # -*- coding: utf-8 -*- import openpyxl import csv  wb = openpyxl.load_workbook('RiskLimitSnapshot.xlsx') sh = wb.get_active_sheet() with open('goodRiskLimitSnapshot.csv', 'wb') as f:      c = csv.writer(f)     for r in sh.rows:         c.writerow(unicode([cell.value for cell in r])) 
If You Enjoyed This, Take 5 Seconds To Share It

0 comments:

Post a Comment