Wednesday, April 5, 2017

Dealing with disassociated records in a poorly designed database

Leave a Comment

Overview

I have inherited a website that allows users to order customised products. The customisations were saved in a way that disassociates them from their record. I would like to modify the db so these records can be associated.

Example

Users can get Product #1 "stock", or customise it, changing as many as 10 different properties. Let's say color, fabric, width, height etc.

Orders can, and regularly do, contain multiple products, each of which may be customised.

Additionally, users can save their orders, so they can re-order later.

When the database was designed, the details of the order was neatly organised into individual columns. Customer name, address, payment type etc. But the list of products and more notably their customisations were saved as a JSON string in a single column. For ease, let's call this column the "cart".

Basically, the order table has a column cart and the cart column contains a JSON-formatted list of products and customisations.

Unfortunately, the JSON object has reference ids to the product table, but lacks references to the customisation table. Instead it uses a bunch of strings meant for a human to read. Fortunately those strings exist in the customisation table, but they were written as the cart was created.

The problem we face is that the list of customisations can be changed by a CMS. So far, they haven't been changed. 🙏 But they will need to be soon and that's going to cause problems:

Problems

  1. If a customisation option is removed (say, a fabric option) and a customer re-orders from an old saved order, we need to be able to parse the cart, detect this and warn them of the change.

  2. Customisations are currently immutable. Once a product is added to the cart, it cannot be changed. Users need to delete and re-add to make a single change. Poor UX.

  3. If anyone changes the human-readable text on a customisation we're dead. ☠️

Questions

  • How would you design this if you were staring from scratch?

  • How might we go about converting the current implementation and legacy data to this new schema?

I don't know if stack is notable, but we're on Postgres and Django-Python.

2 Answers

Answers 1

I would implement this with the following tables:

Products {   productId                   // primary key   name   price }  Customization_Types {   customizationTypeId         // primary key   name                        // e.g. COLOR, FABRIC, LENGTH }  Customizations {   customizationId             // primary key   customizationTypeId         // foreign key   value                       // e.g. BEIGE, VELVET, 8 }  Product_Customizations {   productCustomizationId      // primary key   productId                   // foreign key   customizationId             // foreign key   priceModifier               // price markup for applying the customization   isValid                     // false if this record is invalid/obsolete }  Orders {   orderId                     // primary key   customerId                  // foreign key }  Product_Orders {   productOrderId              // primary key   orderId                     // foreign key   productId                   // foreign key   quantity }  Customization_Orders {   customizationOrderId        // primary key   productOrderId              // foreign key   productCustomizationId      // foreign key } 

The Products table contains the data for your base products - name, price, etc

The Customization_Types table contains the type names for your different customizations - COLOR, FABRIC, LENGTH, etc

The Customizations table contains a link to a customizationTypeId as well as a legal value - I'm assuming that users can't enter arbitrary numerical values (for e.g. LENGTH or WIDTH) i.e. they're given a drop-down box instead of a text box, however if they can enter arbitrary numerical data then you'll need MIN/MAX fields that are null for named constraints (so e.g. you could have Type:COLOR/Value:BEIGE/Min:NULL/Max:NULL or Type:LENGTH/Value:NULL/Min:4/Max:8)

The Product_Customizations table links a Customization to a Product, so for example if ProductX can come in BEIGE then you would create a Product_Customization record that links ProductX to BEIGE.

The Orders table just contains an orderId and anything else relevant to the order (e.g. a link to the customerId and shippingAddressId)

Product_Orders links a product to an order

Customization_Orders links a Product_Customization to a Product_Order


Let's say a customer orders ProductX in BEIGE and LENGTH=8, then you would create an Order record, a Product_Order record with a link to ProductX, and two Customization_Order records - one linked to COLOR=BEIGE and one linked to LENGTH=8.

This should make it easy to modify a product's customizations without having to reload the entire product - the user can modify color to COLOR=RED without touching the length customization (delete the old Customization_Order:COLOR=BEIGE record and create a new COLOR=RED record), or the user can remove the length customization without touching the color customization (delete the old Customization_Order:LENGTH=8 record).

When reloading an old order/product you can quickly verify that the same productCustomizationIds still apply to the product in question, else flag the user. Additionally, you can flag the user if the customization still applies but the customization's price modifier has changed.


As far as converting the legacy data, I'm not familiar with Python but I do have experience with reading JSON via Java and I'm assuming that Python offers similar if not better libraries for this. The trick is going to be matching the existing data to pre-loaded Product_Customization data - if the data fails to match then create a new Product_Customization row corresponding to it with isValid=FALSE (this is assuming that the customization in question is no longer offered), and when you get a chance manually iterate through the invalid Product_Customization rows to ensure that these really are unmatched customizations and not just parsing errors.

Answers 2

Little improvement to Zim-Zam's answer.

Even better approach is to store not plain values (BEIGE, VELVET, 8) as customization parameters, but kind of schema from which code can build up correct view of a customization.

It could be just JSON/XML formatted text. And the entity that is responsible for building view and applying logic should be able to work with JSON data of different versions.

For example, if properties of a customization have changed and something new has been added, in that case you only need to change code and adjusted JSON will be saved. No need to change existing data. Also there should be possibility to read old JSON versions with old properties and work with it.

Two possible ways of what to do if you read an old entity from DB:

  1. View builder will ignore all old properties of a customization, add new properties and set their values to default. I would go with that personally.
  2. Old view is presented to user, but when user clicks, for example, Ok button or Finish, additional logic will check that there are old properties and notifies user that they should be removed manually or just removes them automatically.

More flexible approach that requires only code changes without touching db and allows to show user old customization properties if it is necessary.

Update: Customizations could have two kind of properties: one that administrator can define, such as title or price, which are not frequently changed and common for all customizations and another one such as size and color which could be changed frequently, could have user defined values and are not common for all customizations.

The first kind should be stored in Customization table as separate columns. That will allow to changed such properties in administrative panel and have all previously stored data consistent.

The second kind of properties could be 1) frequently changed 2) not all customization types could have such properties. It is a bad idea to store them as separate columns because if there are huge amount of data, changing column type or adding new column could cause performance degradation and sometimes could not be possible due to incompatible types of properties.
Actually, if they are stored as separate columns, you are probably will have to change code to support new properties anyway.

My idea is that you still allow administrator to change type of such properties and add new one or remove old one through some interface. The key thing here is that you are storing JSON data like this

{     "properties": {             {                 "propertyName": "height",                 "propertyType": "int",                 "min" : 10,                 "max" : 25,             },             {                 "propertyName": "color",                 "propertyType": "color",             },             {                 "propertyName": "anotherCustomField",                 "propertyType": "text",             },         } } 

What's left to do is to implement view builders or renderers for all kinds of property type. And add a separate table with only values. You fetched a customization record from db, you found out which customization properties there are, checked which one are still valid and rendered only valid ones. If administrator changed type of customization's property or just removed one, you marked that customization's property as not valid in db and that's all the work. No code changes, no database schema changes.

If You Enjoyed This, Take 5 Seconds To Share It

0 comments:

Post a Comment