Wednesday, March 21, 2018

MongoDB map-reduce (via nodejs): How to include complex modules (with dependencies) in scopeObj?

I'm working on a complicated map-reduce process for a MongoDB database. I've split some of the more complex code off into modules, which I then make available to my map/reduce/finalize functions by including them in my scopeObj like so:

  const scopeObj = {
    userCalculations: require('../lib/userCalculations')
  }

  function myMapFn() {
    let userScore = userCalculations.overallScoreForUser(this)
    emit({
      'Key': this.userGroup
    }, {
      'UserCount': 1,
      'Score': userScore
    })
  }

  function myReduceFn(key, objArr) { /*...*/ }

  db.collection('userdocs').mapReduce(
    myMapFn,
    myReduceFn,
    {
      scope: scopeObj,
      query: {},
      out: {
        merge: 'userstats'
      }
    },
    function (err, stats) {
      return cb(err, stats);
    }
  )

...This all works fine. I had until recently thought it wasn't possible to include module code in a map-reduce scopeObj, but it turns out that was just because the modules I was trying to include all had dependencies on other modules. Completely standalone modules appear to work just fine.

Which brings me (finally) to my question. How can I -- or, for that matter, should I -- incorporate more complex modules, including things I've pulled from npm, into my map-reduce code? One thought I had was using Browserify or something similar to pull all my dependencies into a single file, then include it somehow... but I'm not sure what the right way to do that would be. And I'm also not sure of the extent to which I'm risking severely bloating my map-reduce code, which (for obvious reasons) has got to be efficient.

Does anyone have experience doing something like this? How did it work out, if at all? Am I going down a bad path here?

UPDATE: A clarification of the issue I'm trying to overcome: In the above code, require('../lib/userCalculations') is executed by Node -- it reads in the file ../lib/userCalculations.js and assigns that file's module.exports object to scopeObj.userCalculations. But let's say there's a call to require(...) inside one of the functions in userCalculations.js. That call isn't executed when Node loads the module; it only runs when the function is called. So, when I try to call userCalculations.overallScoreForUser() within the Map function, MongoDB attempts to execute the require call itself. And require isn't defined in MongoDB's JavaScript environment.
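For illustration, imagine userCalculations.js contains something like this (a hypothetical sketch; the scoreWeights helper is made up):

  // lib/userCalculations.js (hypothetical)
  module.exports = {
    overallScoreForUser(user) {
      // Node doesn't execute this line when the module is loaded; it only
      // runs when the function is called. Under mapReduce that call happens
      // on the MongoDB server, where require doesn't exist, so it fails
      // with something like: ReferenceError: require is not defined
      const weights = require('./scoreWeights')
      return weights.base * user.score
    }
  }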

Browserify, for example, deals with this by compiling all the code from all the required modules into a single JavaScript file with no require calls, so it can be run in the browser. But that doesn't exactly work here, because I need the resulting code to itself be a module that I can use the way I use userCalculations in the code sample. Maybe there's a weird way to run Browserify that I'm not aware of? Or some other tool that just "flattens" a whole hierarchy of modules into a single module?
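In case it helps to make the idea concrete, here's roughly what I was picturing (a sketch only, with hypothetical paths; I haven't verified that a bundle like this survives being serialized into the mapReduce scope):

  // build.js - flatten userCalculations and its dependencies using
  // Browserify's standalone (UMD) mode
  const browserify = require('browserify')
  const fs = require('fs')

  browserify('./lib/userCalculations.js', { standalone: 'userCalculations' })
    .bundle()
    .pipe(fs.createWriteStream('./lib/userCalculations.bundle.js'))

  // The bundle would then (in theory) be requireable as a single module
  // with no external require calls left inside it:
  //   const scopeObj = { userCalculations: require('./lib/userCalculations.bundle.js') }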

Hopefully that clarifies a bit.

1 Answer

Answer 1

As a generic response, the answer to your question, "How can I -- or, for that matter, should I -- incorporate more complex modules, including things I've pulled from npm, into my map-reduce code?", is no: you cannot safely include complex modules in Node code you plan to send to MongoDB for mapReduce jobs.

You mentioned the problem yourself: nested require statements. require is synchronous, but a require call inside a function body is not executed until that function is called, and at that point the MongoDB VM will throw, because require does not exist there.

Consider the following example of three files: data.json, dep.js and main.js.

  // data.json - just something we require "lazily"
  true

  // dep.js -- equivalent of your userCalculations
  module.exports = {
    isValueTrue() {
      // The problem: nested require
      return require('./data.json');
    }
  }

  // main.js - from here you send your mapReduce to MongoDB.
  // require dependency instantly
  const calc = require('./dep.js');
  // require is synchronous, the effect is the same if you do:
  //   const calc = (function () { return require('./dep.js') })();

  console.log('Calc is loaded.');

  // Let's mess with unwary devs
  require('fs').writeFileSync('./data.json', 'false');

  // Is calc.isValueTrue() true or false here?
  console.log(calc.isValueTrue());
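Run main.js and the answer is false:

  $ node main.js
  Calc is loaded.
  false

The nested require only executes on the first call to isValueTrue(), which happens after the file has been overwritten, so the value returned is not the one that was on disk when dep.js was loaded. Under mapReduce the situation is worse still: the MongoDB VM never even gets that far, because require is not defined there at all.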

As a general solution, this is not feasible. While the vast majority of modules will likely not have nested require statements, HTTP calls, internal service calls, global variables and the like, there are those that do. You cannot guarantee that this would work.

Now, as a local implementation, e.g. where you require exact, specific versions of npm modules that you have tested well with this technique and know will work, or that you published yourself, it is somewhat feasible.
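For instance, pinning exact versions in package.json (rather than ^ or ~ ranges) at least guarantees that the code you tested against is the code that ships; the module name here is hypothetical:

  {
    "dependencies": {
      "some-standalone-calc": "1.2.3"
    }
  }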

However, even in this case, if this is a team effort, there's bound to be a developer down the line who will not know where or how your dependency is used, who will use globals (not on purpose, but by omission, e.g. they wrongly calculate this), or who simply won't know the implications of what they are doing. A strong integration testing suite could guard against this, but the thing is, it's unpredictable. Personally, I think that when you can choose between unpredictable and predictable, you should almost always choose predictable.

Now, if a certain library has an explicitly stated purpose of being used in MongoDB mapReduce, this would work. You would have to guard well against omissions and problems, and have strong testing infrastructure, but I would make certain the purpose is explicit before feeling safe enough to do this. And of course, if what you're doing is so complex that you need several npm packages to do it, maybe you can put those functions directly on the MongoDB server instead, or do your map-reducing in something better suited to the purpose.
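For the "functions directly on the MongoDB server" route, one legacy mechanism (server-side JavaScript is generally discouraged, so treat this as a sketch) is the special system.js collection: functions stored there are callable by name from server-side contexts such as mapReduce. Assuming the Node driver, and a hypothetical function name and body:

  const { Code } = require('mongodb');

  async function storeServerFunction(db) {
    // Equivalent of the shell's db.system.js.save(...)
    await db.collection('system.js').replaceOne(
      { _id: 'overallScoreForUser' },
      { _id: 'overallScoreForUser',
        value: new Code('function (user) { return user.score; }') },
      { upsert: true }
    );
    // Inside map/reduce/finalize the function can then be called by name:
    //   overallScoreForUser(this)
  }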

To conclude: for a purposefully built library with an explicit mission statement that it is to be used with Node and MongoDB mapReduce, I would make sure my tests cover all mission-critical functionality, and then import such an npm package. Otherwise I would not use or recommend this approach.
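For what such a guard could look like, here is a minimal sketch (database, collection and module names are hypothetical) of an integration test that runs the real mapReduce against a throwaway collection, so a sneaky nested require fails in CI rather than in production:

  // test/mapreduce.integration.js - minimal sketch, hypothetical names,
  // assuming the 3.x Node driver and a local test mongod
  const assert = require('assert');
  const { MongoClient } = require('mongodb');
  const userCalculations = require('../lib/userCalculations');

  async function run() {
    const client = await MongoClient.connect('mongodb://localhost:27017');
    const coll = client.db('mapreduce_test').collection('userdocs');
    await coll.deleteMany({});
    await coll.insertMany([
      { userGroup: 'a', score: 1 },
      { userGroup: 'a', score: 2 }
    ]);

    // If userCalculations only resolves one of its pieces at call time,
    // this is where it blows up - in the test, not in production.
    const docs = await coll.mapReduce(
      function () { emit(this.userGroup, userCalculations.overallScoreForUser(this)); },
      function (key, values) { return Array.sum(values); },
      { scope: { userCalculations: userCalculations }, out: { inline: 1 } }
    );

    assert.ok(Array.isArray(docs) && docs.length > 0);
    await client.close();
  }

  run().catch(err => { console.error(err); process.exit(1); });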
