Sunday, June 18, 2017

Auto-save server architecture


I want to save a lengthy form's inputs on the server as the user fills it in. But I don't think making a db call on each auto-save action is the best approach.

What would be a good approach to solve this?

Another problem is that I have 3 app servers, so an in-memory cache wouldn't work.

I was thinking of keeping the data in Redis, updating it on every call, and finally writing it to the db. But since I have 3 servers, how do I make sure the calls stay in order?

Can anyone help with the architecture?

3 Answers

Answer 1

But I don't think making a db call on each auto-save action is the best approach.

That's the real question, so let's start there. Why would you think that? You want auto-save, right? Saving to the database is the only thing that actually preserves the user's work.

All the other options you listed (memcached/Redis, in-process caching) not only fail to durably save the user's work, they're ticking time bombs. Think of all the things that can fail there: Redis dies, the network gets partitioned, the whole data center is hit by lightning.

Why create all that complexity when you can just... save? You may find that it's not that slow (if that was your concern).
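
For illustration, here is a minimal sketch of the "just save" approach, assuming an Express endpoint backed by a mysql2 connection pool and a hypothetical form_drafts table keyed by user id (the table, route, and connection details are assumptions for the example, not something the answer specifies):

    import express from "express";
    import mysql from "mysql2/promise";

    const app = express();
    app.use(express.json());

    // Hypothetical table: form_drafts(user_id PRIMARY KEY, payload TEXT, updated_at TIMESTAMP)
    const pool = mysql.createPool({
      host: "localhost",
      user: "app",
      password: process.env.DB_PASSWORD,
      database: "app",
    });

    // Each auto-save request simply upserts the user's current draft.
    app.post("/autosave/:userId", async (req, res) => {
      await pool.execute(
        "INSERT INTO form_drafts (user_id, payload) VALUES (?, ?) " +
          "ON DUPLICATE KEY UPDATE payload = VALUES(payload)",
        [req.params.userId, JSON.stringify(req.body)]
      );
      res.sendStatus(204); // the draft is durable as soon as this returns
    });

    app.listen(3000);

If one upsert per keystroke turns out to be too chatty, debouncing on the client (saving at most once every few seconds) is usually enough before any caching layer is needed.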

Answer 2

This solution has been adopted industry-wide with success. Configure a 3-node Redis setup (one master, two slaves), which takes care of replicating the data. Writes happen only to the master node.

redis (master) - app server 1
redis (slave1) - app server 2
redis (slave2) - app server 3

Adding a slave is straightforward using the slaveof <masterip> <masterport> command. Replication is done over the wire (not via temporary disk storage). Link - https://redis.io/topics/replication
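
A minimal sketch of how the app servers might use this topology, assuming the ioredis client and placeholder hostnames (redis-master, redis-slave1 are invented for the example): every write goes to the master, while each app server can read from its local slave.

    import Redis from "ioredis";

    // All writes go to the single master; each app server reads from a nearby replica.
    const master = new Redis({ host: "redis-master", port: 6379 });
    const localReplica = new Redis({ host: "redis-slave1", port: 6379 });

    // On every auto-save call, overwrite the user's draft on the master.
    async function autoSave(userId: string, formData: object): Promise<void> {
      await master.set(`draft:${userId}`, JSON.stringify(formData));
    }

    // On page load or final submit, the draft can be read from the local replica.
    async function loadDraft(userId: string): Promise<object | null> {
      const raw = await localReplica.get(`draft:${userId}`);
      return raw ? JSON.parse(raw) : null;
    }

Because writes funnel through the one master, updates for a given user are applied in a single place, which also addresses the ordering concern from the question (replication lag only affects reads from the slaves).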

Answer 3

This is a very classic problem faced when scaling an architecture, but let's come to scaling later; essentially your initial app makes many calls down to the database level, so let's optimize that first.

Since you have not given any details about your IOPS, I'll describe below the approaches to solving it, in increasing order of load. All the approaches are cascading in nature, that is, the last approach actually builds on all the previous solutions:

The Direct Database Approach {db calls on each auto-save action}:

client -> application layer -> database (save data here directly)

Basically, on any data update we propagate it directly to the database level. The main bottleneck in this scheme is the database. Things to take care of in this approach:

  • For the most popular relational databases like MySQL:

    • An insert query takes less time than an update query. For a college project you could keep updating the row against the same primary key and it would work beautifully.
    • But in any mid-sized application where a single form edit generates 30-40 requests, the update queries would tie up your db resources.

    • So keep an append-only ("addendum") kind of schema: for example, maintain a secondary column like status that keeps track of how far the user has filled the form, but keep inserting a new row for each update. Always read back the row with the most recent status (a sketch of this follows after this list).

    • For further optimization, apply appropriate indexes (on the keys used for lookup) and foreign key constraints.

    When this step is no longer enough, the next thing to optimize is the database itself, specifically the type of data you're dealing with: you can choose a schema-less db like Mongo, DynamoDB etc. for non-transactional data.

  • Using a schema-less db can be very helpful for large amounts of non-transactional data, as such databases inherently allow the addendum approach to the same row of data.

    • For additional optimization, use secondary indexes to query data faster.
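
A minimal sketch of the addendum idea, assuming MySQL via the mysql2 client and a hypothetical form_draft_revisions table (table and column names are invented for illustration): every auto-save is a cheap INSERT of a new row, and reads always pick the latest revision.

    import mysql from "mysql2/promise";

    const pool = mysql.createPool({
      host: "localhost",
      user: "app",
      password: process.env.DB_PASSWORD,
      database: "app",
    });

    // Hypothetical table:
    //   form_draft_revisions(id AUTO_INCREMENT PRIMARY KEY, user_id, status, payload TEXT, created_at)
    // with a secondary index on (user_id, id) so the latest-revision lookup stays cheap.

    // Every auto-save is a plain INSERT, never an UPDATE.
    async function saveRevision(userId: string, status: number, payload: object): Promise<void> {
      await pool.execute(
        "INSERT INTO form_draft_revisions (user_id, status, payload) VALUES (?, ?, ?)",
        [userId, status, JSON.stringify(payload)]
      );
    }

    // Reads always return the most recently inserted revision for the user.
    async function loadLatest(userId: string): Promise<object | null> {
      const [rows] = await pool.execute(
        "SELECT payload FROM form_draft_revisions WHERE user_id = ? ORDER BY id DESC LIMIT 1",
        [userId]
      );
      const row = (rows as any[])[0];
      return row ? JSON.parse(row.payload) : null;
    }

Old revisions can be pruned by a periodic job once the form is finally submitted, so the table does not grow without bound.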

Involving The Application Layer Approach {Application layer caching}:

client -> application layer (save some data here then propagate later) -> database (Finally save here)

  • When the plain database is unable to serve and scale as required, the easiest way is to shift some load onto your API servers, as horizontal scaling of those is much easier and cheaper.
  • This is essentially the in-memory cache approach you mentioned, though do not go about designing your own: it takes years of understanding large-scale web infrastructure, and even then people are often unable to design an efficient cache at the application layer.
  • Use something like Express Session. I was using this for a web app with 10 EC2 Node.js instances, storing about 15 MB of user data per session. It scales beautifully and keeps each user's session data consistent across all servers: on each auto-save, write the data to the user's session; on form submit, write it to the database from the session. It can be scaled easily by adding more API servers, and the best setup is to back the session store with Redis {why re-invent the wheel?}. A sketch of this setup follows below.
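
A minimal sketch of that pattern, assuming express-session backed by a connect-redis store (the exact store wiring differs between connect-redis versions, so treat this as an outline rather than the answer's exact setup): every auto-save lands in the Redis-backed session, and only the final submit touches the database.

    import express from "express";
    import session from "express-session";
    import RedisStore from "connect-redis";
    import Redis from "ioredis";

    const app = express();
    app.use(express.json());

    // Session data lives in Redis, so all three app servers see the same draft.
    app.use(
      session({
        store: new RedisStore({ client: new Redis({ host: "redis-master" }) }),
        secret: "change-me", // assumption: the real secret comes from configuration
        resave: false,
        saveUninitialized: false,
      })
    );

    // Auto-save: stash the form state in the session (a Redis write, no db call).
    app.post("/autosave", (req, res) => {
      (req.session as any).draft = req.body;
      res.sendStatus(204);
    });

    // Final submit: persist the draft from the session into the database.
    app.post("/submit", async (req, res) => {
      const draft = (req.session as any).draft;
      // await saveToDatabase(draft); // hypothetical helper, along the lines of the earlier sketches
      delete (req.session as any).draft;
      res.sendStatus(200);
    });

    app.listen(3000);

Note that session data counts against Redis memory, so very large drafts (like the 15 MB sessions mentioned above) need the Redis instances sized accordingly.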

Re-inventing the wheel: custom application-layer caching + db caching + db optimization

client -> application layer (save some data here) -> { add your db cache } -> database (save data here finally)

  • This is where we come to designing our own thing, using Redis/DynamoDB/Mongo to act as a cache for your primary database. (Please note: if you are using a non-transactional db in the first place, go for the addendum approach; this approach is much better suited to scaling transactional databases by adding a wrapper of a non-transactional database.)

  • Express Session actually works the same way, by caching the data in Redis at the application layer; always try to reduce the number of calls to the db as much as possible.

  • Go for this approach only once you have fully functioning application-layer caching and an optimized db, as it needs experienced developers, is resource-intensive, and is usually employed for very large-scale applications. For example, I had a layer of Redis caching after the Express session for an application that averaged around 300k requests per second.

  • The approach here would be to save data in the user session, apply a lazy write-back to your cache, maintain a cache ledger, and then write to your main database. For the queuing part of such a large-scale system, I had written an entire separate microservice that worked in the background to transport data from session to Redis to MySQL. For your concern about how to maintain a queue, read more about priority queues and background workers. I used Kue, which is a priority job queue backed by Redis, built for Node.js (a sketch follows after the links below).

  • Maintaining parallel and sequential queues

  • PHP client for Kue
  • Background Services
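
A minimal sketch of that background write-back, assuming Kue with an invented "flush-draft" job type (the job payload, queue options, and persistDraft helper are illustrative, not from the answer): the web tier enqueues a job after updating the session/cache, and a separate worker drains the queue into MySQL.

    import kue from "kue";

    // Both the web tier and the background worker talk to the same Redis-backed queue.
    const queue = kue.createQueue({ redis: { host: "redis-master", port: 6379 } });

    // Web tier: after updating the session/cache, enqueue a write-back job instead of hitting MySQL.
    export function enqueueFlush(userId: string, draft: object): void {
      queue
        .create("flush-draft", { userId, draft }) // "flush-draft" is an invented job type
        .priority("normal")
        .attempts(3)                              // retry a few times if the db write fails
        .removeOnComplete(true)
        .save();
    }

    // Background worker: concurrency of 1 keeps writes flowing through this queue in order.
    queue.process("flush-draft", 1, async (job: any, done: (err?: Error) => void) => {
      try {
        // await persistDraft(job.data.userId, job.data.draft); // hypothetical MySQL write
        done();
      } catch (err) {
        done(err as Error);
      }
    });

Since the question specifically asks how to keep calls queued across 3 servers: a single Redis-backed queue with a worker pulling jobs one at a time gives you that ordering without the app servers having to coordinate with each other.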