Sunday, July 15, 2018

Valid JSON Load in Python file


Running into a problem with my JSON:

The first issue was SyntaxError: Non-ASCII character '\xe2' in file, so I added # -*- coding: utf-8 -*- at the top of my file.

Then loading the JSON with x = json.loads(x) failed with ValueError: Expecting , delimiter: line 3 column 52 (char 57). I referenced this stackoverflow solution and so added an r in front of my JSON:

x = r"""[   { my validated json... } ]""" 

But then I get TypeError: sequence item 3: expected string or Unicode, NoneType found. I think the r is throwing it off somehow?
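To see what the r prefix changes, here is a small sketch (written for Python 3, using a made-up one-key document rather than the real data). It parses the same JSON text once from a normal string literal and once from a raw one. In the normal literal, Python itself consumes the backslash, so the JSON parser receives an unescaped quote. The helper name try_load is hypothetical.

```python
import json

# Normal literal: Python turns \" into a bare " before json sees the text.
plain = """{"brief": "17\" Frame"}"""
# Raw literal: the backslash survives, so json sees a properly escaped quote.
raw = r"""{"brief": "17\" Frame"}"""


def try_load(text):
    """Return the parsed value, or the parser's error message as a string."""
    try:
        return json.loads(text)
    except ValueError as exc:  # json.JSONDecodeError is a ValueError subclass
        return "error: {}".format(exc)


plain_result = try_load(plain)
raw_result = try_load(raw)
print(plain_result)
print(raw_result)
```

If this reading is right, the r prefix fixes the delimiter error and has nothing to do with the later TypeError, which comes from the data.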

JSON Resembles the following:

[
  {
    "brief": "Brief 1",
    "description": "Description 1",
    "photos": [
      "https://cdn.shopify.com/s/files/1/01/01/01/files/imgs-example.jpg?0101010101010",
      "https://cdn.shopify.com/s/files/1/01/01/01/files/imgs-example2.jpg?0101010101010",
      "https://cdn.shopify.com/s/files/1/01/01/01/files/imgs-example3.jpg?0101010101010"
    ],
    "price": "145",
    "tags": [
      "tag1",
      "tag2",
      "tag3"
    ],
    "title": "Title 1"
  },
  {
    "brief": "Brief 2",
    "description": "Description 2",
    "photos": [
      "https://cdn.shopify.com/s/files/1/01/01/01/files/imgs-example4.jpg?0101010101010",
      "https://cdn.shopify.com/s/files/1/01/01/01/files/imgs-example5.jpg?0101010101010"
    ],
    "price": "150",
    "tags": [
      "tag4",
      "tag5",
      "tag6",
      "tag7",
      "tag8"
    ],
    "title": "Title 2"
  },
  {
    "brief": "blah blah 5'0\" to 5'4\"",
    "buyerPickup": true,
    "condition": "Good",
    "coverShipping": false,
    "description": "blah blah 5'0\" to 5'4\". blah blah.Size L/20”\n 5’8-5’11\n29lbs\n3x7 speed\n\n  \r\n\r\n",
    "photos": [
      "https://cdn.shopify.com/s/files/1/01/01/01/files/imgs-010101.jpeg?11111",
      "https://cdn.shopify.com/s/files/1/01/01/01/files/imgs-020202?111111"
    ],
    "price": "240",
    "tags": [
      "tag2",
      "5'0\"-5'4\""
    ],
    "title": "blah blah 17\" Frame",
    "front": "https://firebasestorage.googleapis.com/v0/b/example.appspot.com/o/Images%2F0007891113.jpg?alt=media&token=111-11-11-11-111"
  }
]

CURRENT CODE

# -*- coding: utf-8 -*-

import csv
import json

x = """[
      {
        "brief": "Brief 1",
        "description": "Description 1",
        "photos": [
          "https://cdn.shopify.com/s/files/1/01/01/01/files/imgs-example.jpg?0101010101010",
          "https://cdn.shopify.com/s/files/1/01/01/01/files/imgs-example2.jpg?0101010101010",
          "https://cdn.shopify.com/s/files/1/01/01/01/files/imgs-example3.jpg?0101010101010"
        ],
        "price": "145",
        "tags": [
          "tag1",
          "tag2",
          "tag3"
        ],
        "title": "Title 1"
      },
      {
        "brief": "Brief 2",
        "description": "Description 2",
        "photos": [
          "https://cdn.shopify.com/s/files/1/01/01/01/files/imgs-example4.jpg?0101010101010",
          "https://cdn.shopify.com/s/files/1/01/01/01/files/imgs-example5.jpg?0101010101010"
        ],
        "price": "150",
        "tags": [
          "tag4",
          "tag5",
          "tag6",
          "tag7",
          "tag8"
        ],
        "title": "Title 2"
      },{
        "brief": "blah blah 5'0\" to 5'4\"",
        "buyerPickup": true,
        "condition": "Good",
        "coverShipping": false,
        "description": "blah blah 5'0\" to 5'4\". blah blah.Size L/20”\n 5’8-5’11\n29lbs\n3x7 speed\n\n  \r\n\r\n",
        "photos": [
          "https://cdn.shopify.com/s/files/1/01/01/01/files/imgs-010101.jpeg?11111",
          "https://cdn.shopify.com/s/files/1/01/01/01/files/imgs-020202?111111"
        ],
        "price": "240",
        "tags": [
          "tag2",
          "5'0\"-5'4\""
        ],
        "title": "blah blah 17\" Frame",
        "front": "https://firebasestorage.googleapis.com/v0/b/example.appspot.com/o/Images%2F0007891113.jpg?alt=media&token=111-11-11-11-111"
      }
      ]"""

x = json.loads(x)

f = csv.writer(open("example.csv", "wb+"))

f.writerow(["Handle","Title","Body (HTML)", "Vendor","Type","Tags","Published","Option1 Name","Option1 Value","Variant Inventory Qty","Variant Inventory Policy","Variant Fulfillment Service","Variant Price","Variant Requires Shipping","Variant Taxable","Image Src"])

    for x in x:

        allTags = "\"" + ','.join(x["tags"]) + "\""
        images = x["photos"]
        f.writerow([x["title"],
                    x["title"],
                    x["description"],
                    "Vendor Name",
                    "Widget",
                    allTags,
                    "TRUE",
                    "Title",
                    "Default Title",
                    "1",
                    "deny",
                    "manual",
                    x["price"],
                    "TRUE",
                    "TRUE",
                    images.pop(0) if images else None])
        while images:
            f.writerow([x["title"],None,None,None,None,None,None,None,None,None,None,None,None,None,None,images.pop(0)])

ERROR MESSAGE: Full traceback that I see:

Traceback (most recent call last):
  File "runnit2.py", line 976, in <module>
    allTags = "\"" + ','.join(x["tags"]) + "\""
TypeError: sequence item 3: expected string or Unicode, NoneType found
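The traceback says that item 3 of some tags list is None, which is what a JSON null becomes after json.loads. A rough way to locate such entries before writing the CSV is sketched below (Python 3). The function name find_null_tags and the two-record sample are hypothetical; the sample puts a null at position 3 only to mirror the error message.

```python
def find_null_tags(records):
    """Return (record_index, tag_index) pairs for every tag that is None."""
    return [
        (r, t)
        for r, record in enumerate(records)
        for t, tag in enumerate(record.get("tags") or [])
        if tag is None
    ]


# Made-up data: the second record has a null at tag position 3.
sample = [
    {"title": "Title 1", "tags": ["tag1", "tag2"]},
    {"title": "Title 2", "tags": ["tag4", "tag5", "tag6", None, "tag8"]},
]

null_positions = find_null_tags(sample)
print(null_positions)
```

Running something like this over the real parsed list should point at the records that need cleaning.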

UPDATE: I've identified that the data, specifically x["title"], x["title"], x["description"], has some characters that the code doesn't like: 'ascii' codec can't encode character u'\u201d' in position 9: ordinal not in range(128). I've done a quick fix with x["description"].encode('utf-8'), etc., but it pretty much eliminates everything that's in that cell. Is there a better way that doesn't delete everything after the offending character?

4 Answers

Answers 1

Use a raw string for the JSON, and open the output file in normal text mode (not binary mode) with its encoding set to utf-8. On Python 3.6 that is enough.

On Python 2.7 you should use codecs.open('example.csv', 'w', encoding='utf-8') instead of the regular open() when dealing with unicode content. Also, the csv module on Python 2.7 does not support unicode out of the box, so I suggest switching to unicodecsv or following the guidelines in this answer.
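For the Python 3 route, a minimal sketch of what "text mode with utf-8" can look like follows. The two rows and the temporary file path are made up for illustration; newline="" is the setting the csv documentation recommends so that the writer controls line endings itself.

```python
import csv
import os
import tempfile

# Made-up rows containing a curly quote (U+201D), curly apostrophes (U+2019)
# and an embedded newline, similar to the description field in the question.
rows = [
    ["Title", "Body (HTML)"],
    ["blah blah 17\" Frame", "Size L/20\u201d\n5\u20198-5\u201911"],
]

path = os.path.join(tempfile.mkdtemp(), "example.csv")

# Text mode with an explicit encoding: no manual .encode() calls are needed.
with open(path, "w", encoding="utf-8", newline="") as handle:
    csv.writer(handle).writerows(rows)

# Read the file back to check that nothing after the curly quote was lost.
with open(path, "r", encoding="utf-8", newline="") as handle:
    round_trip = list(csv.reader(handle))

print(round_trip == rows)
```

With this approach the cell keeps its full content, because the encoding happens once at the file boundary instead of per value.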

Answers 2

Open the file for writing with w instead of wb. If you must use wb, use the following functions to convert between text and bytes. You also need to add r in front of the JSON text to handle the special symbols.

import csv
import json

x = r"""[
      {
        "brief": "Brief 1",
        "description": "Description 1",
        "photos": [
          "https://cdn.shopify.com/s/files/1/01/01/01/files/imgs-example.jpg?0101010101010",
          "https://cdn.shopify.com/s/files/1/01/01/01/files/imgs-example2.jpg?0101010101010",
          "https://cdn.shopify.com/s/files/1/01/01/01/files/imgs-example3.jpg?0101010101010"
        ],
        "price": "145",
        "tags": [
          "tag1",
          "tag2",
          "tag3"
        ],
        "title": "Title 1"
      },
      {
        "brief": "Brief 2",
        "description": "Description 2",
        "photos": [
          "https://cdn.shopify.com/s/files/1/01/01/01/files/imgs-example4.jpg?0101010101010",
          "https://cdn.shopify.com/s/files/1/01/01/01/files/imgs-example5.jpg?0101010101010"
        ],
        "price": "150",
        "tags": [
          "tag4",
          "tag5",
          "tag6",
          "tag7",
          "tag8"
        ],
        "title": "Title 2"
      },{
        "brief": "blah blah 5'0\" to 5'4\"",
        "buyerPickup": true,
        "condition": "Good",
        "coverShipping": false,
        "description": "blah blah 5'0\" to 5'4\". blah blah.Size L/20”\n 5’8-5’11\n29lbs\n3x7 speed\n\n  \r\n\r\n",
        "photos": [
          "https://cdn.shopify.com/s/files/1/01/01/01/files/imgs-010101.jpeg?11111",
          "https://cdn.shopify.com/s/files/1/01/01/01/files/imgs-020202?111111"
        ],
        "price": "240",
        "tags": [
          "tag2",
          "5'0\"-5'4\""
        ],
        "title": "blah blah 17\" Frame",
        "front": "https://firebasestorage.googleapis.com/v0/b/example.appspot.com/o/Images%2F0007891113.jpg?alt=media&token=111-11-11-11-111"
      }
      ]"""

x = json.loads(x)


def to_str(bytes_or_str):
    # Return text: decode bytes as UTF-8, pass str through unchanged.
    if isinstance(bytes_or_str, bytes):
        value = bytes_or_str.decode('utf-8')
    else:
        value = bytes_or_str
    return value


def to_bytes(bytes_or_str):
    # Return bytes: encode str as UTF-8, pass bytes through unchanged.
    if isinstance(bytes_or_str, str):
        value = bytes_or_str.encode('utf-8')
    else:
        value = bytes_or_str
    return value


f = csv.writer(open("example.csv", "w+"))
writeList = ["Handle", "Title", "Body (HTML)", "Vendor", "Type", "Tags", "Published", "Option1 Name", "Option1 Value",
             "Variant Inventory Qty", "Variant Inventory Policy", "Variant Fulfillment Service", "Variant Price",
             "Variant Requires Shipping", "Variant Taxable", "Image Src"]
newList = []
for item in writeList:
    # The file is opened in text mode ("w+"), so the header cells are kept as text.
    newList.append(to_str(item))

f.writerow(newList)

for x in x:

    # Plain quote characters here: a raw r"\"" would put a literal backslash in the cell.
    allTags = '"' + ','.join(x["tags"]) + '"'
    images = x["photos"]
    f.writerow([x["title"],
                x["title"],
                x["description"],
                "Vendor Name",
                "Widget",
                allTags,
                "TRUE",
                "Title",
                "Default Title",
                "1",
                "deny",
                "manual",
                x["price"],
                "TRUE",
                "TRUE",
                images.pop(0) if images else None])
    while images:
        f.writerow([x["title"], None, None, None, None, None, None, None,
                    None, None, None, None, None, None, None,
                    images.pop(0)])

Answers 3

From your posted sample data, I assume that in your real data the record at index 1 has a null at index 3 of its tags list, i.e. in the position where tag7 appears here:

"tags": [
  "tag4",
  "tag5",
  "tag6",
  "tag7",
  "tag8"
],

To get rid of the TypeError that is raised because of the nulls, you can simply check for them and replace them if they exist, as shown below.

x["tags"] = ["" if i is None else i for i in x["tags"]]
allTags = "\"" + ','.join(x["tags"]) + "\""

I have assigned an empty string to replace nulls.

Alternatively, you can remove all the falsy elements by passing None as the first argument to the filter() function.

allTags = "\"" + ','.join(filter(None, x["tags"])) + "\"" 
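The two options are not quite equivalent, which a short comparison makes visible. This is a sketch in Python 3 with a made-up tags list; the variable names replaced and dropped are only for illustration.

```python
# Made-up tags list with a None where a JSON null was loaded.
tags = ["tag4", "tag5", "tag6", None, "tag8"]

# Option 1: keep the position, substituting an empty string for None.
replaced = ",".join("" if tag is None else tag for tag in tags)

# Option 2: drop every falsy element (None, but also any empty string).
dropped = ",".join(filter(None, tags))

print(replaced)
print(dropped)
```

Replacing leaves an empty slot between two commas, while filtering closes the gap. Filtering also removes tags that are already empty strings, which may or may not be wanted.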

NOTE: Also make the JSON a raw string, r"""[...]""", and fix the indentation of the for loop.

Answers 4

Possible duplicate of this question: how to convert characters like \x22 into string.

After cleaning up the code, the error boils down to:

import json

x = '''
  {
    "brief": "\""
  }'''

x = json.loads(x)

Consider replacing \" with \u201d:

import json

x = '{"brief": "\u201d"}'

x = json.loads(x)
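One caveat worth checking before applying this: the two escapes do not decode to the same character, so the substitution changes the data. The sketch below (Python 3) uses raw literals so that the JSON parser, not Python, interprets both escapes; the variable names are hypothetical.

```python
import json

# \" inside JSON decodes to the straight double quote, U+0022.
straight = json.loads(r'{"brief": "\""}')["brief"]

# \u201d inside JSON decodes to the right curly double quote, U+201D.
curly = json.loads(r'{"brief": "\u201d"}')["brief"]

print(hex(ord(straight)), hex(ord(curly)))
```

If the straight quotes (for example in 5'0") must be preserved, keeping \" and making the literal raw is probably the safer choice.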