Tuesday, May 3, 2016

Porting socket server from Node.js to C#

I've built multiple socket server apps in Node.js for a multi-user artificial intelligence app. We're looking at 1K to 10K active socket connections per box. However, even when idle and with 0 active connections, some of my servers consume 50-100 MB of memory when running on Unix. I'm sure that with a sensible platform like C# or C++, this should be close to 0 MB. So we are considering a port to a "better" platform. Now let me clarify my use case:

  • This is not a "web server". No files are served.
  • We do lots of CPU-intensive data processing, and certain portions have already been ported to C++ and pulled into Node via native modules.
  • We don't need much I/O (in most cases a few files are accessed, in some cases none; we don't use an RDBMS either).

We went with Node because it was Unix friendly (unlike .NET) and seemed easy to use. But with its current memory consumption we need to evaluate other options. Many have compared Node.js with ASP.NET, but I need to build a socket server in C# or C++.

I have significant experience with .NET and C++. There are libs like SuperSocket (used by Redgate and Telerik) that handle all of the low-level stuff in .NET. I will have to find a similar socket framework for C++.

So, putting this all together, what are the advantages of using .NET or C++ over Node.js? And considering that my servers are highly CPU-bound (not I/O-bound), would the benefits of using .NET/C++ be significant, or should I stick with Node.js? Any other comments regarding porting a Node.js app to C# or C++?

Bounty: I need advice and a recommended socket server library/implementation/example app in C# and/or C++. Must be open source. I need it to be high-performance, async and bug-free. Must support binary data transfer. Must run on Windows. Unix is a bonus.

2 Answers

Answer 1

We're looking at 1K to 10K active socket connections per box

The bottleneck here is not the programming language or the technology; it's the hardware and OS support. What limits the number of concurrent sockets is basically the machine you're running on. Still, in my experience, the deterministic object lifetime of C++ can help dramatically when managing a large number of concurrent OS resources.
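As a rough illustration of what deterministic object lifetime means for OS resources, here is a minimal sketch of a scope-bound owner (RAII). The names `ScopedHandle` and `count_releases` are hypothetical and not from any library; the integer handle stands in for a socket descriptor, and the closer callback stands in for whatever call actually releases it on your platform.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical RAII owner: holds an integer handle (for example a socket
// descriptor) and releases it exactly once, when the owner is destroyed.
class ScopedHandle {
public:
    using Closer = std::function<void(int)>;

    ScopedHandle(int handle, Closer closer)
        : handle_(handle), closer_(std::move(closer)) {}

    // Non-copyable: two owners would release the same handle twice.
    ScopedHandle(const ScopedHandle&) = delete;
    ScopedHandle& operator=(const ScopedHandle&) = delete;

    // Movable: ownership transfers and the source is left empty.
    ScopedHandle(ScopedHandle&& other) noexcept
        : handle_(other.handle_), closer_(std::move(other.closer_)) {
        other.handle_ = kInvalid;
    }

    ~ScopedHandle() {
        if (handle_ != kInvalid && closer_) closer_(handle_);
    }

    int get() const { return handle_; }

private:
    static constexpr int kInvalid = -1;
    int handle_;
    Closer closer_;
};

// Opens `n` pretend connections and reports how many were released by the
// time the enclosing scope ended. No garbage collector is involved.
int count_releases(int n) {
    int released = 0;
    {
        std::vector<ScopedHandle> connections;
        for (int i = 0; i < n; ++i) {
            connections.emplace_back(i, [&released](int) { ++released; });
        }
    }  // every destructor runs here, at a known point in the program
    return released;
}
```

Calling `count_releases(1000)` releases all 1000 handles at the closing brace of the inner scope, which is the property the answer is pointing at: resources are returned to the OS at a predictable point rather than whenever a collector gets around to it.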

This is not a "web server". No files are served.

I have done some Node.js in my professional work, and some C#, but mostly C++. Even with Node.js as a web server, the client and server code didn't have much in common besides the language itself. The web server dealt mostly with business logic, while the client dealt with fetching and presenting the data interactively. So I think the main advantage of Node.js as a web server is that it lets pure-JS developers write server-side code without using languages or technologies they are not familiar with.

We do lots of CPU intensive data processing and certain portions have already been ported to C++ and pulled into node via native modules.

Yep. Using a statically typed language can do wonders here: no redundant runtime parsing.

We don't need to access much I/O (in most cases a few files are accessed, in some cases none, we don't use an RDBMS either)

Well, I feel there's a myth in the air that Node.js somehow handles I/O better than other technologies. This is simply wrong. The main feature of Node.js is that I/O is asynchronous by default, but Node.js didn't invent anything new here. You have asynchronous I/O in Java (java.nio), C# (async/await) and C++ (either native facilities like epoll and I/O completion ports, or higher-level libraries like Boost.Asio, the C++ REST SDK, Proxygen, etc.).

We went with node because it was Unix friendly (unlike .NET)

.NET Core is a relatively new technology that lets .NET run on Unix-based systems (such as Linux).

I will have to find a similar socket framework for C++.

Boost.Asio, or write something yourself; it's really not that hard.

So putting this all together, what are the advantages of using .NET or C++ over Node.js?

Better CPU usage: because C++ and C# are statically typed languages, and C++ is also compiled ahead of time, there are huge opportunities for the compiler to optimize CPU-intensive jobs.

Lower memory footprint: usually because statically typed languages have smaller objects, without the overhead of keeping a lot of metadata behind the scenes. With C++, thanks to stack allocation and scoped object lifetime, the memory footprint is usually low. Again, it depends on the quality of the code in any language.
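To make the "smaller objects" point concrete, here is a small sketch showing only the C++ side. The `Sample` record and `payload_bytes` helper are hypothetical names chosen for illustration, and the sketch assumes a mainstream platform where two 32-bit fields pack into 8 bytes. It does not measure what an equivalent JavaScript object would cost; that depends on the engine.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical fixed-layout record: two 32-bit fields and nothing else.
// There is no per-object type tag, property map, or GC header.
struct Sample {
    std::uint32_t connection_id;
    std::uint32_t value;
};

// Assumption stated in the lead-in: no padding on mainstream platforms.
static_assert(sizeof(Sample) == 8, "two 32-bit fields, no hidden metadata");

// Bytes of element storage used by `count` records held contiguously
// in a std::vector (excluding the vector's own small bookkeeping).
std::size_t payload_bytes(std::size_t count) {
    std::vector<Sample> samples(count);
    return samples.size() * sizeof(Sample);
}
```

Under that assumption, 10,000 records occupy 80,000 bytes of contiguous storage, which is the kind of predictability the answer attributes to typed, fixed-layout objects.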

No callback hell: C# has tasks and async/await. C++ has futures/promises, and some compilers (e.g. VC++) support await as well. Asynchronous code simply becomes fun to write, as opposed to callbacks. Yes, I am aware of JS promises and the new async/await support, but they are relatively new compared to the .NET implementation.
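For the C++ side of this point, a minimal sketch using only the standard library's `std::async` and `std::future` might look like the following. The function name `parallel_sum` is hypothetical, and the workload (summing integers) is just a stand-in for a CPU-heavy job. On some Linux toolchains this needs to be built with `-pthread`.

```cpp
#include <future>
#include <numeric>
#include <vector>

// Sums a vector in two halves. The first half runs on another thread and
// its result comes back through a future, so the code reads top to bottom
// with no callback passed around.
long long parallel_sum(const std::vector<int>& data) {
    auto mid = data.begin() + data.size() / 2;

    // Start the first half asynchronously.
    std::future<long long> first_half =
        std::async(std::launch::async, [&data, mid] {
            return std::accumulate(data.begin(), mid, 0LL);
        });

    // Meanwhile, do the second half on the calling thread.
    long long second_half = std::accumulate(mid, data.end(), 0LL);

    // get() blocks until the asynchronous half is ready.
    return first_half.get() + second_half;
}
```

The result is consumed where it is needed via `get()`, which keeps the control flow linear. This is a blocking future rather than a true `await`, so it illustrates the style, not the compiler-level coroutine support the answer mentions.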

Compiler checks: since C# and C++ have to be compiled, a lot of silly bugs are caught at compile time. No "undefined is not a function" or "cannot read property of undefined".

Other than that, it's pretty much a matter of choice.

Answer 2

We do lots of CPU intensive data processing

Node.js may have been the wrong choice from the start, and it would probably never match the performance of a C++ server. However, it can come pretty close if you are doing things right. In addition, writing good C++ and completely rewriting a system are difficult and time-consuming. So I want to give you some reasons to stick with Node.js, or at least to completely exhaust all your options before you move.

my servers consume 50-100 MB

Are you using Node.js v0.12? With Node.js v4.2 LTS, an idle Node.js server should use around 20 MB of memory. (It would probably never be near 0 MB because of V8.) Have you checked for memory leaks?

1K to 10K active socket connections per box

This should be easily achievable. If you are using the most popular library, socket.io, here are some relevant benchmarks.

on a 3.3 GHz Xeon X5470 using one core, the max messages-sent-per-second rate is around 9,000–10,000 depending on the concurrency level.

From: http://drewww.github.io/socket.io-benchmarking/ (Since all these connections are kept alive concurrently, CPU usage matters more.)

If you are already using that and having issues, try replacing socket.io with SocketCluster, which is faster and more scalable. Replacing it should be easier than a complete rewrite. Here are some benchmarks:

8-core Amazon EC2 m3.2xlarge instance running Linux

at 42K, the CPU use of the busiest worker dropped to around 45%

http://socketcluster.io/#!/performance

Finally, to show that Node.js can nearly reach C++ performance, have a look at this:

servers use 12G memory

It supports 1,200,000 active websocket connections

https://github.com/smallnest/C1000K-Servers

My point is that your performance goals are average ones that you should be able to reach with Node.js with little effort. Try benchmarking (https://github.com/machinezone/tcpkali) and finding the issue rather than doing a complete rewrite.
