I have two numpy masked arrays:

>>> x
masked_array(data = [1 2 -- 4],
             mask = [False False  True False],
       fill_value = 999999)
>>> y
masked_array(data = [4 -- 0 4],
             mask = [False  True False False],
       fill_value = 999999)

If I try to divide x by y, the division operation is not actually performed when one of the operands is masked, so I don't get a divide-by-zero error.

>>> x/y
masked_array(data = [0.25 -- -- 1.0],
             mask = [False  True  True False],
       fill_value = 1e+20)

This even works if I define my own division function, div:

>>> def div(a,b):
    return a/b

>>> div(x, y)
masked_array(data = [0.25 -- -- 1.0],
             mask = [False  True  True False],
       fill_value = 1e+20)

However, if I wrap my function with vectorize, the function is called on masked values and I get an error:

>>> np.vectorize(div)(x, y)
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "/usr/lib64/python3.4/site-packages/numpy/lib/function_base.py", line 1811, in __call__
    return self._vectorize_call(func=func, args=vargs)
  File "/usr/lib64/python3.4/site-packages/numpy/lib/function_base.py", line 1880, in _vectorize_call
    outputs = ufunc(*inputs)
  File "<input>", line 2, in div
ZeroDivisionError: division by zero

Is there a way I can call a function with array arguments, and have the function only be executed when all of the arguments are unmasked?


The problem

Calling the function directly worked because, when you call div(x, y), div's arguments a and b are the MaskedArrays x and y themselves, so a/b evaluates x.__truediv__(y) (x.__div__(y) in Python 2).

Since x is a MaskedArray, that method knows how to divide by another MaskedArray and follows the numpy.ma masking rules: masked operands and zero divisors both produce masked results.
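
A quick way to see this dispatch is the minimal sketch below. The values hidden behind the masked entries (3 in x, 0 in y) are illustrative assumptions, since the question only shows them as --. Calling div on the masked arrays gives the same result as calling MaskedArray.__truediv__ explicitly:

```python
import numpy.ma as ma

# Illustrative data modelled on the question; the values behind the
# masked entries (3 in x, 0 in y) are assumptions.
x = ma.masked_array([1, 2, 3, 4], mask=[False, False, True, False])
y = ma.masked_array([4, 0, 0, 4], mask=[False, True, False, False])


def div(a, b):
    return a / b


direct = div(x, y)            # a / b dispatches to MaskedArray.__truediv__
explicit = x.__truediv__(y)   # the same call, spelled out

print(type(direct).__name__)  # MaskedArray
print(direct)
print(explicit)
```

Both results mask the second and third elements, so the zero divisors never cause an error.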

However, when you vectorize it, your div function never sees a MaskedArray, only scalars (a couple of ints in this case), and it is called for every element, masked or not. So when it evaluates a/b on the third pair of elements, it divides something by zero, and you get the error.
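
To see what div actually receives, here is a small sketch that uses a spy function to record every argument pair np.vectorize passes in. The name div_spy, the sample values behind the masked entries, and the nan fallback are illustrative assumptions:

```python
import numpy as np
import numpy.ma as ma

# Illustrative data; the values behind the masked entries are assumptions.
x = ma.masked_array([1, 2, 3, 4], mask=[False, False, True, False])
y = ma.masked_array([4, 5, 0, 4], mask=[False, True, False, False])

seen = []  # (a, b) pairs that np.vectorize hands to the Python function


def div_spy(a, b):
    seen.append((a, b))
    # Return nan instead of dividing by zero so the demo runs to the end.
    return a / b if b != 0 else float("nan")


# otypes=[float] skips the extra "probe" call vectorize otherwise makes
# to guess the output type, so div_spy runs once per element.
np.vectorize(div_spy, otypes=[float])(x, y)

for a, b in seen:
    print(type(a).__name__, a, type(b).__name__, b)
```

The recorded pairs should include the ones at masked positions, such as (3, 0), and none of the arguments is a MaskedArray.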

Much of numpy.ma appears to be a re-implementation of NumPy functions specifically for MaskedArrays. For example, there are both numpy.log and numpy.ma.log. Compare running both of them on a MaskedArray that contains a negative value and a zero. Both return a proper MaskedArray, but the plain NumPy version also emits warnings about division by zero and invalid values:

In [116]: x = masked_array(data = [-1, 2, 0, 4],
     ...:              mask = [False, False,  True, False],
     ...:        fill_value = 999999)

In [117]: numpy.log(x)
/usr/bin/ipython:1: RuntimeWarning: divide by zero encountered in log
  #!/usr/bin/python3
/usr/bin/ipython:1: RuntimeWarning: invalid value encountered in log
  #!/usr/bin/python3
Out[117]: 
masked_array(data = [-- 0.6931471805599453 -- 1.3862943611198906],
             mask = [ True False  True False],
       fill_value = 999999)

In [118]: numpy.ma.log(x)
Out[118]: 
masked_array(data = [-- 0.6931471805599453 -- 1.3862943611198906],
             mask = [ True False  True False],
       fill_value = 999999)
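
The difference can be checked programmatically with the sketch below. It turns every warning into an error and shows that numpy.ma.log still succeeds, masking the out-of-domain -1 instead of warning about it. The float version of the sample data is an assumption for the demo:

```python
import warnings

import numpy as np
import numpy.ma as ma

x = ma.masked_array([-1.0, 2.0, 0.0, 4.0], mask=[False, False, True, False])

with warnings.catch_warnings():
    # Escalate any warning to an exception: numpy.ma.log should not emit
    # one, because it masks out-of-domain inputs (here -1) instead.
    warnings.simplefilter("error")
    result = ma.log(x)

expected = np.log([2.0, 4.0])  # log of the unmasked, in-domain values
print(result.mask)
print(result.compressed(), expected)
```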

If you run numpy.log on a plain list, it returns nan and -inf for invalid inputs (with the same warnings) instead of raising an exception like the ZeroDivisionError you're getting.

In [138]: a = [1,-1,0]

In [139]: numpy.log(a)
/usr/bin/ipython:1: RuntimeWarning: divide by zero encountered in log
  #!/usr/bin/python3
/usr/bin/ipython:1: RuntimeWarning: invalid value encountered in log
  #!/usr/bin/python3
Out[139]: array([  0.,  nan, -inf])
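
The reason for the different behaviour is that NumPy reports floating-point errors through its own error-state settings, which default to warning. Division on plain Python numbers, by contrast, always raises. The minimal sketch below contrasts the two using np.errstate:

```python
import numpy as np

a = np.array([1.0, -1.0, 0.0])

# NumPy's default for divide/invalid errors is to warn and carry on;
# np.errstate can silence that for a block...
with np.errstate(divide="ignore", invalid="ignore"):
    quiet = np.log(a)  # values 0., nan, -inf, with no warnings

# ...or make it raise, which is closer to what plain Python does.
try:
    with np.errstate(all="raise"):
        np.log(a)
except FloatingPointError as exc:
    numpy_error = str(exc)

# Division on plain Python numbers always raises.
zero = 0
try:
    1 / zero
except ZeroDivisionError as exc:
    python_error = str(exc)

print(quiet, numpy_error, python_error, sep="\n")
```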

Simpler solution

With that in mind, I see two alternatives. First, for the simple case you listed, you could replace the masked values with a value that makes the operation a no-op: 1 in div's case. (Note that my data is slightly different from yours, because your y has a zero that isn't masked.)

x = masked_array(data = [1, 2, 0, 4],
             mask = [False, False,  True, False],
       fill_value = 999999)
y = masked_array(data = [4, 0, 0, 4],
             mask = [False,  True, True, False],
       fill_value = 999999)

In [153]: numpy.vectorize(div)(x,y.filled(1))
Out[153]: 
masked_array(data = [0.25 2.0 -- 1.0],
             mask = [False False  True False],
       fill_value = 999999)

The problem with that approach is that positions masked only in y (the second element here) come out unmasked in the result, which is probably not what you want.
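
One way to patch that, sketched here as an assumption rather than something from the original answer, is to fill both inputs, run the vectorized function on plain ndarrays, and then re-apply the union of both input masks with numpy.ma.mask_or:

```python
import numpy as np
import numpy.ma as ma


def div(a, b):
    return a / b


x = ma.masked_array([1, 2, 0, 4], mask=[False, False, True, False])
y = ma.masked_array([4, 0, 0, 4], mask=[False, True, True, False])

# Fill masked entries with 1: neutral as a divisor, and harmless in the
# numerator because those positions are masked again below.
raw = np.vectorize(div)(x.filled(1), y.filled(1))

# Re-apply the union of both input masks so that anything masked in
# either input stays masked in the result.
combined = ma.mask_or(ma.getmaskarray(x), ma.getmaskarray(y))
result = ma.masked_array(raw, mask=combined)
print(result)
```

This still depends on a neutral fill value existing, so it only suits simple functions like div.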

Better solution

Now, div was probably just an example, and you probably want more complex behavior for which there is no 'no-op' argument. In that case, you can do what NumPy does for log: avoid raising an exception and return a specific value instead, here numpy.ma.masked. div's implementation becomes this (it uses the standard warnings module, so import warnings first):

In [154]: def div(a,b):
     ...:     try:
     ...:         return a/b
     ...:     except Exception as e:
     ...:         warnings.warn (str(e))
     ...:         return numpy.ma.masked
     ...:     
     ...:     

In [155]: numpy.vectorize(div)(x,y)
/usr/bin/ipython:5: UserWarning: division by zero
  start_ipython()
/usr/lib/python3.6/site-packages/numpy/lib/function_base.py:2813: UserWarning: Warning: converting a masked element to nan.
  res = array(outputs, copy=False, subok=True, dtype=otypes[0])
Out[155]: 
masked_array(data = [0.25 -- -- 1.0],
             mask = [False  True  True False],
       fill_value = 999999)
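
If you would rather not rely on how np.vectorize treats masked arrays, a different technique (swapped in here, not the one used above) is to compute the combined mask first and call the function only at the fully valid positions. This answers the original question directly. The helper apply_where_valid is hypothetical and not part of NumPy, and its same-shape restriction is an assumption of this sketch:

```python
import numpy as np
import numpy.ma as ma


def apply_where_valid(func, *arrays, dtype=float):
    """Call func(*elements) only where every input is unmasked.

    Hypothetical helper, not part of NumPy. Assumes all inputs are 1-D or
    higher and share one shape; no broadcasting is attempted.
    """
    arrays = [ma.asarray(a) for a in arrays]
    shape = arrays[0].shape
    if any(a.shape != shape for a in arrays):
        raise ValueError("all inputs must have the same shape")

    # True wherever at least one input is masked.
    mask = np.zeros(shape, dtype=bool)
    for a in arrays:
        mask |= ma.getmaskarray(a)

    data = np.zeros(shape, dtype=dtype)
    # np.nonzero(~mask) lists the fully valid positions; func is never
    # called anywhere else, so masked placeholder values cannot trip it up.
    for idx in zip(*np.nonzero(~mask)):
        data[idx] = func(*(a.data[idx] for a in arrays))
    return ma.masked_array(data, mask=mask)


def div(a, b):
    # float() makes a zero divisor raise ZeroDivisionError, as in the question.
    return float(a) / float(b)


x = ma.masked_array([1, 2, 3, 4], mask=[False, False, True, False])
y = ma.masked_array([4, 0, 0, 4], mask=[False, True, False, False])
print(apply_where_valid(div, x, y))
```

The Python-level loop is slow for large arrays, but it guarantees the function never runs on a masked position.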

Coding Question

Sunday, August 6, 2017

Calling function on valid values of masked arrays
