I am using python2.7. I can't use python 3. I wrote this to convert an excel spreadsheet to csv. It is throwing an error for "u2013'" which is an 'en dash' character. In perl - you can load a file in unicode with the open command, but I don't know how to do that in python.
#!/home/casper/python/core/2.7.14/exec/bin/python2.7 # -*- coding: utf-8 -*- import openpyxl import csv wb = openpyxl.load_workbook('RiskLimitSnapshot.xlsx') sh = wb.get_active_sheet() with open('goodRiskLimitSnapshot.csv', 'wb') as f: c = csv.writer(f) for r in sh.rows: c.writerow([cell.value for cell in r])
error :
Traceback (most recent call last): File "/home/casper/pyExceltoCSV", line 16, in <module> c.writerow([cell.value for cell in r]) UnicodeEncodeError: 'ascii' codec can't encode character u'\u2013' in position 74: ordinal not in range(128)
I changed the script to use io.open:
wb = openpyxl.load_workbook('DailyETRiskLimitSnapshot.xlsx' , data_only=True) sh = wb.get_active_sheet() with io.open('goodDailyETRiskLimitSnapshot.csv', 'w', encoding='utf8') as f: c = csv.writer(f, dialect='excel') for r in sh.rows: c.writerow([cell.value for cell in r])
However it is throwing a different error:
Traceback (most recent call last): File "./pyExceltoCVS.py", line 20, in <module> c.writerow([cell.value for cell in r]) TypeError: write() argument 1 must be unicode, not str
6 Answers
Answers 1
The correct way to open a file for encoded output is to use the io
module:
import io with io.open('goodRiskLimitSnapshot.csv', 'w', encoding='utf8') as f: c = csv.writer(f) for r in sh.rows: c.writerow([cell.value for cell in r])
Answers 2
Have you tried using pandas
:
wb = pd.read_excel('RiskLimitSnapshot.xlsx') #you can specify the sheet name using sheetname argument wb.to_csv('goodRiskLimitSnapshot.csv', encoding='utf-8')
Alternatively you can use codecs
and do:
codecs.open(filename, encoding='utf8')
Answers 3
Python3 makes lot of things easier by using the default encoding as unicode. But in Python2 you get a default str
and different unicode
representation
Now consider the case of en-dash
= –
, looks similar to a normal -
but is not.
Let's fire up a python 2.7 console and see the difference
>>> val_str = '–' >>> val_str '\xe2\x80\x93'
Above is how an en-dash
is represented by a str
. While for unicode
>>> val_unicode = u'–' >>> val_unicode u'\u2013'
Now let's try and write these to a csv
file using different combinations
# -*- coding: utf-8 -*- import csv import io val_str = '–' val_unicode = u'–' def try_writing_csv(filename, data, mode='w', **kwargs): try: with io.open(filename, mode=mode, **kwargs) as f: c = csv.writer(f, dialect='excel') c.writerow([data]) except Exception, ex: print("failed to write - " + filename) try_writing_csv("ascii1.csv", val_str) try_writing_csv("ascii2.csv", val_str, encoding="utf8") try_writing_csv("ascii3.csv", val_str.decode('utf8'), encoding="utf8") try_writing_csv("unicode1.csv", val_unicode) try_writing_csv("unicode2.csv", val_unicode, encoding="utf8") try_writing_csv("unicode3.csv", val_unicode.encode('utf8'), encoding="utf8")
And now let's run the same
failed to write - ascii1.csv failed to write - ascii2.csv failed to write - ascii3.csv failed to write - unicode1.csv failed to write - unicode2.csv failed to write - unicode3.csv
The results are overwhelming as all the method fails. So we need look at what is wrong with it. Let's make some more trials
try_writing_csv("ascii4.csv", val_str.decode('utf8'), mode="wb") try_writing_csv("ascii5.csv", val_str, mode="utf8") try_writing_csv("ascii6.csv", val_str.decode('utf8').encode('utf8'), mode="wb") try_writing_csv("unicode4.csv", val_unicode, mode="wb") try_writing_csv("unicode5.csv", val_unicode.encode('utf8'), mode='wb')
And now out run would give an output
failed to write - ascii1.csv failed to write - ascii2.csv failed to write - ascii3.csv failed to write - ascii4.csv failed to write - ascii5.csv failed to write - unicode1.csv failed to write - unicode2.csv failed to write - unicode3.csv failed to write - unicode4.csv
So ascii6.csv
and unicode.csv
was actually successful. Let's check the files also
and looks like we did get then right for both the files. So the final two statements which worked are below
try_writing_csv("ascii6.csv", val_str.decode('utf8').encode('utf8'), mode="wb") try_writing_csv("unicode5.csv", val_unicode.encode('utf8'), mode='wb')
So the key learnings
- Don't use
encoding=utf8
while opening the file - Use binary mode for writing the file
- If it is a
str
thendecode
asutf8
and then encode asutf8
- If it is a
unicode
thenencode
asutf8
And then now comes the explanation time, which you can get from the below SO thread
2.7 CSV module wants unicode, but doesn't want unicode
If you are trying to write out unicode data, you'll have to encode that data before passing it to the
csv.writer()
object. Thecsv
module examples section includes code to make encoding from Unicode before writing a little easier.
Answers 4
Python2 csv library does not support well unicode. Have you considerd using the libraries unicodecsv or backports.csv ?
Cheers !
Answers 5
That error tells you that you are trying to write python unicode object to file and default ASCII codec cant encode it. Encode called implicitly to convert python string/unicode objects to bytes. You should do it yourself with needed encoding - utf-8
in your case:
Change that line:
c.writerow([cell.value for cell in r])
to:
c.writerow([cell.value.encode('utf-8') for cell in r])
Without explicitly specifying needed encoding default is used and your variant of writerow call can be writed as: c.writerow([cell.value.encode('ascii') for cell in r])
and of course will raise UnicodeEncodeError
because you have unicode string.
You can check your default encoding with this code:
import sys sys.getdefaultencoding()
Answers 6
Do you try ?
#!/home/casper/python/core/2.7.14/exec/bin/python2.7 # -*- coding: utf-8 -*- import openpyxl import csv wb = openpyxl.load_workbook('RiskLimitSnapshot.xlsx') sh = wb.get_active_sheet() with open('goodRiskLimitSnapshot.csv', 'wb') as f: c = csv.writer(f) for r in sh.rows: c.writerow(unicode([cell.value for cell in r]))
0 comments:
Post a Comment