I'm running into a problem with my JSON.
The first issue was SyntaxError: Non-ASCII character '\xe2' in file, so I added # -*- coding: utf-8 -*- at the top of my file.
Then loading the JSON with x = json.loads(x) failed with ValueError: Expecting , delimiter: line 3 column 52 (char 57). I referenced this stackoverflow solution and added an r in front of my JSON:
x = r"""[ { my validated json... } ]"""
But now I get TypeError: sequence item 3: expected string or Unicode, NoneType found. I think the r is throwing it off somehow?
JSON Resembles the following:
[
  {
    "brief": "Brief 1",
    "description": "Description 1",
    "photos": [
      "https://cdn.shopify.com/s/files/1/01/01/01/files/imgs-example.jpg?0101010101010",
      "https://cdn.shopify.com/s/files/1/01/01/01/files/imgs-example2.jpg?0101010101010",
      "https://cdn.shopify.com/s/files/1/01/01/01/files/imgs-example3.jpg?0101010101010"
    ],
    "price": "145",
    "tags": [ "tag1", "tag2", "tag3" ],
    "title": "Title 1"
  },
  {
    "brief": "Brief 2",
    "description": "Description 2",
    "photos": [
      "https://cdn.shopify.com/s/files/1/01/01/01/files/imgs-example4.jpg?0101010101010",
      "https://cdn.shopify.com/s/files/1/01/01/01/files/imgs-example5.jpg?0101010101010"
    ],
    "price": "150",
    "tags": [ "tag4", "tag5", "tag6", "tag7", "tag8" ],
    "title": "Title 2"
  },
  {
    "brief": "blah blah 5'0\" to 5'4\"",
    "buyerPickup": true,
    "condition": "Good",
    "coverShipping": false,
    "description": "blah blah 5'0\" to 5'4\". blah blah.Size L/20”\n 5’8-5’11\n29lbs\n3x7 speed\n\n \r\n\r\n",
    "photos": [
      "https://cdn.shopify.com/s/files/1/01/01/01/files/imgs-010101.jpeg?11111",
      "https://cdn.shopify.com/s/files/1/01/01/01/files/imgs-020202?111111"
    ],
    "price": "240",
    "tags": [ "tag2", "5'0\"-5'4\"" ],
    "title": "blah blah 17\" Frame",
    "front": "https://firebasestorage.googleapis.com/v0/b/example.appspot.com/o/Images%2F0007891113.jpg?alt=media&token=111-11-11-11-111"
  }
]

CURRENT CODE
# -*- coding: utf-8 -*-
import csv
import json

x = """[
  {
    "brief": "Brief 1",
    "description": "Description 1",
    "photos": [
      "https://cdn.shopify.com/s/files/1/01/01/01/files/imgs-example.jpg?0101010101010",
      "https://cdn.shopify.com/s/files/1/01/01/01/files/imgs-example2.jpg?0101010101010",
      "https://cdn.shopify.com/s/files/1/01/01/01/files/imgs-example3.jpg?0101010101010"
    ],
    "price": "145",
    "tags": [ "tag1", "tag2", "tag3" ],
    "title": "Title 1"
  },
  {
    "brief": "Brief 2",
    "description": "Description 2",
    "photos": [
      "https://cdn.shopify.com/s/files/1/01/01/01/files/imgs-example4.jpg?0101010101010",
      "https://cdn.shopify.com/s/files/1/01/01/01/files/imgs-example5.jpg?0101010101010"
    ],
    "price": "150",
    "tags": [ "tag4", "tag5", "tag6", "tag7", "tag8" ],
    "title": "Title 2"
  },
  {
    "brief": "blah blah 5'0\" to 5'4\"",
    "buyerPickup": true,
    "condition": "Good",
    "coverShipping": false,
    "description": "blah blah 5'0\" to 5'4\". blah blah.Size L/20”\n 5’8-5’11\n29lbs\n3x7 speed\n\n \r\n\r\n",
    "photos": [
      "https://cdn.shopify.com/s/files/1/01/01/01/files/imgs-010101.jpeg?11111",
      "https://cdn.shopify.com/s/files/1/01/01/01/files/imgs-020202?111111"
    ],
    "price": "240",
    "tags": [ "tag2", "5'0\"-5'4\"" ],
    "title": "blah blah 17\" Frame",
    "front": "https://firebasestorage.googleapis.com/v0/b/example.appspot.com/o/Images%2F0007891113.jpg?alt=media&token=111-11-11-11-111"
  }
]"""

x = json.loads(x)

f = csv.writer(open("example.csv", "wb+"))
f.writerow(["Handle", "Title", "Body (HTML)", "Vendor", "Type", "Tags", "Published", "Option1 Name", "Option1 Value", "Variant Inventory Qty", "Variant Inventory Policy", "Variant Fulfillment Service", "Variant Price", "Variant Requires Shipping", "Variant Taxable", "Image Src"])

for x in x:
    allTags = "\"" + ','.join(x["tags"]) + "\""
    images = x["photos"]
    f.writerow([x["title"], x["title"], x["description"], "Vendor Name", "Widget", allTags, "TRUE", "Title", "Default Title", "1", "deny", "manual", x["price"], "TRUE", "TRUE", images.pop(0) if images else None])
    while images:
        f.writerow([x["title"], None, None, None, None, None, None, None, None, None, None, None, None, None, None, images.pop(0)])

ERROR MESSAGE:
Full traceback that I see:
Traceback (most recent call last):
  File "runnit2.py", line 976, in <module>
    allTags = "\"" + ','.join(x["tags"]) + "\""
TypeError: sequence item 3: expected string or Unicode, NoneType found
UPDATE: I've identified that the data, specifically [x["title"], x["title"], x["description"], contains some characters that the code doesn't like: 'ascii' codec can't encode character u'\u201d' in position 9: ordinal not in range(128). I tried a quick fix with x["description"].encode('utf-8'), etc., but it pretty much eliminates everything in that cell. Is there a better way that doesn't delete everything after the offending character?
4 Answers
Answers 1
Use a raw string for the JSON, and open the output file in normal (non-binary) mode with its encoding set to utf-8. On Python 3.6 that is enough.
On Python 2.7, use codecs.open('example.csv', 'w', encoding='utf-8') instead of the regular open() when dealing with unicode content. Also, the csv module on Python 2.7 does not support unicode out of the box, so I suggest switching to unicodecsv or following the guidelines in this answer.
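For the Python 3 case, a minimal sketch of that advice might look like the following. The shortened sample (raw) and the helper write_products are made up for illustration and are not part of the question, and the in-memory buffer only stands in for a real file.

```python
import csv
import io
import json

# Shortened, hypothetical sample. The r prefix keeps JSON escapes such as
# \" and \u201d intact so that json.loads, not Python, interprets them.
raw = r'''[{"title": "blah blah 17\" Frame",
  "description": "Size L/20\u201d\n29lbs",
  "tags": ["tag2", "5'0\"-5'4\""]}]'''


def write_products(products, stream):
    """Write one CSV row per product to a text-mode stream."""
    writer = csv.writer(stream)
    writer.writerow(["Title", "Body (HTML)", "Tags"])
    for product in products:
        writer.writerow([
            product["title"],
            product["description"],
            ",".join(product["tags"]),
        ])


products = json.loads(raw)

# In-memory text stream for demonstration. For a real file, use
# open("example.csv", "w", newline="", encoding="utf-8") instead.
buffer = io.StringIO(newline="")
write_products(products, buffer)
csv_text = buffer.getvalue()
```

Because the stream is opened in text mode with an explicit encoding, the csv module receives ordinary str values and the curly quote needs no manual .encode() call.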
Answers 2
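Open the file for writing in "w" mode rather than "wb". If you must use "wb", use the following functions to convert the values first. You also need to add r in front of the JSON text so that its special symbols (the backslash escapes) are handled correctly.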
import csv
import json

x = r"""[
  {
    "brief": "Brief 1",
    "description": "Description 1",
    "photos": [
      "https://cdn.shopify.com/s/files/1/01/01/01/files/imgs-example.jpg?0101010101010",
      "https://cdn.shopify.com/s/files/1/01/01/01/files/imgs-example2.jpg?0101010101010",
      "https://cdn.shopify.com/s/files/1/01/01/01/files/imgs-example3.jpg?0101010101010"
    ],
    "price": "145",
    "tags": [ "tag1", "tag2", "tag3" ],
    "title": "Title 1"
  },
  {
    "brief": "Brief 2",
    "description": "Description 2",
    "photos": [
      "https://cdn.shopify.com/s/files/1/01/01/01/files/imgs-example4.jpg?0101010101010",
      "https://cdn.shopify.com/s/files/1/01/01/01/files/imgs-example5.jpg?0101010101010"
    ],
    "price": "150",
    "tags": [ "tag4", "tag5", "tag6", "tag7", "tag8" ],
    "title": "Title 2"
  },
  {
    "brief": "blah blah 5'0\" to 5'4\"",
    "buyerPickup": true,
    "condition": "Good",
    "coverShipping": false,
    "description": "blah blah 5'0\" to 5'4\". blah blah.Size L/20”\n 5’8-5’11\n29lbs\n3x7 speed\n\n \r\n\r\n",
    "photos": [
      "https://cdn.shopify.com/s/files/1/01/01/01/files/imgs-010101.jpeg?11111",
      "https://cdn.shopify.com/s/files/1/01/01/01/files/imgs-020202?111111"
    ],
    "price": "240",
    "tags": [ "tag2", "5'0\"-5'4\"" ],
    "title": "blah blah 17\" Frame",
    "front": "https://firebasestorage.googleapis.com/v0/b/example.appspot.com/o/Images%2F0007891113.jpg?alt=media&token=111-11-11-11-111"
  }
]"""

x = json.loads(x)


def to_str(bytes_or_str):
    # Return text: decode byte strings as UTF-8, pass anything else through.
    if isinstance(bytes_or_str, bytes):
        value = bytes_or_str.decode('utf-8')
    else:
        value = bytes_or_str
    return value


def to_bytes(bytes_or_str):
    # Return bytes: encode str values as UTF-8, pass anything else through.
    if isinstance(bytes_or_str, str):
        value = bytes_or_str.encode('utf-8')
    else:
        value = bytes_or_str
    return value


f = csv.writer(open("example.csv", "w+"))
writeList = ["Handle", "Title", "Body (HTML)", "Vendor", "Type", "Tags", "Published", "Option1 Name", "Option1 Value", "Variant Inventory Qty", "Variant Inventory Policy", "Variant Fulfillment Service", "Variant Price", "Variant Requires Shipping", "Variant Taxable", "Image Src"]
newList = []
for item in writeList:
    newList.append(to_bytes(item))
f.writerow(newList)

for x in x:
    # Plain "\"" here: r"\"" would put a literal backslash into the CSV cell.
    allTags = "\"" + ','.join(x["tags"]) + "\""
    images = x["photos"]
    f.writerow([x["title"], x["title"], x["description"], "Vendor Name", "Widget", allTags, "TRUE", "Title", "Default Title", "1", "deny", "manual", x["price"], "TRUE", "TRUE", images.pop(0) if images else None])
    while images:
        f.writerow([x["title"], None, None, None, None, None, None, None, None, None, None, None, None, None, None, images.pop(0)])

Answers 3
Going by the error and your posted sample data, I assume that in your real data the object at index 1 has a null at index 3 of its tags list, i.e. where tag7 is in the sample:
"tags": [ "tag4", "tag5", "tag6", "tag7", "tag8" ],
To get rid of the TypeError raised by nulls, you can check for them and replace them if they exist, as shown below.
x["tags"] = ["" if i is None else i for i in x["tags"]]
allTags = "\"" + ','.join(x["tags"]) + "\""
Here an empty string replaces each null.
Alternatively, you can remove all falsy elements by passing None as the function argument to filter().
allTags = "\"" + ','.join(filter(None, x["tags"])) + "\""
NOTE: Make the JSON a raw string (r"""[...]""") and fix the indentation issue in the for loop.
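As a rough, self-contained illustration of the second variant: the record below is hypothetical (the posted sample contains no null), and join_tags is a made-up helper name rather than anything from the question.

```python
import json

# Hypothetical record mirroring the suspected problem: a JSON null at index 3.
record = json.loads('{"tags": ["tag4", "tag5", "tag6", null, "tag8"]}')


def join_tags(tags):
    """Quote the comma-joined tags, skipping None and empty strings."""
    return '"' + ",".join(tag for tag in tags if tag) + '"'


try:
    ",".join(record["tags"])  # plain join fails on the None entry
    plain_join_failed = False
except TypeError:
    plain_join_failed = True

all_tags = join_tags(record["tags"])
```

json.loads turns a JSON null into Python's None, which is why str.join complains about "sequence item 3" for a list shaped like this one.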
Answers 4
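Possible duplicate of this question: "how to convert characters like \x22 into string".
After cleaning up the code, the error boils down to:
import json
x = ''' { "brief": "\"" }'''
x = json.loads(x)
In a normal (non-raw) string literal, Python turns \" into a bare " before json.loads ever sees it, so the JSON string ends early. Consider replacing \" with \u201d:
import json
x = '{"brief": "\u201d"}'
x = json.loads(x)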
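To see the difference between the two kinds of literal side by side, here is a small sketch. The sample text and the helper try_load are made up for illustration.

```python
import json

# Same text, written once as a normal literal and once as a raw literal.
plain = """{"brief": "17\" Frame"}"""   # Python turns \" into " -> broken JSON
raw = r"""{"brief": "17\" Frame"}"""    # backslash survives -> valid JSON


def try_load(text):
    """Return the parsed JSON, or None if the text is not valid JSON."""
    try:
        return json.loads(text)
    except ValueError:  # json.JSONDecodeError subclasses ValueError
        return None


plain_result = try_load(plain)
raw_result = try_load(raw)
```

Only the raw literal parses, which matches the "Expecting , delimiter" error in the question: the escape is consumed by Python before the JSON parser runs.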