Sunday, October 8, 2017

Define an algorithm that takes a number and a list and returns a scalar based on the number's distance from the average of the list


Suppose we have a list that appends an integer between 15 and 32 in each iteration (call the integer rand). I want to design an algorithm that assigns a reward around 1 (between 0.75 and 1.25) to each rand. The rule for assigning the reward goes like this:

First we calculate the average of the list. If rand is greater than the average, the reward should be less than 1; if rand is less than the average, the reward should be greater than 1. The greater the distance between the average and rand, the more the reward increases or decreases. For example:

rand = 15, avg = 23 then reward = 1.25

rand = 32, avg = 23 then reward = 0.75

rand = 23, avg = 23 then reward = 1 and so on.

I developed the code below for this algorithm:

import numpy as np

rollouts = np.array([])
i = 0

def modify_reward(lst, rand):
    reward = 1
    constant1 = 0.25
    constant2 = 1

    std = np.std(lst)
    global avg
    avg = np.mean(lst)
    sub = np.subtract(avg, rand)
    landa = sub / std if std != 0 else 0
    coefficient = -1 + (2 / (1 + np.exp(-constant2 * landa)))
    md_reward = reward + (reward * constant1 * coefficient)
    return md_reward

while i < 100:
    rand = np.random.randint(15, 33)
    rollouts = np.append(rollouts, rand)
    modified_reward = modify_reward(rollouts, rand)
    i += 1
    print([i, rand, avg, modified_reward])

# test the reward at the lower and upper bounds
rand1, rand2 = 15, 32
reward1, reward2 = modify_reward(rollouts, rand1), modify_reward(rollouts, rand2)

print(['reward for lower bound', rand1, avg, reward1])
print(['reward for upper bound', rand2, avg, reward2])

The algorithm works fairly well, but if you look at the examples below, you will notice its problem.

rand = 15, avg = 23.94 then reward = 1.17 # which has to be 1.25

rand = 32, avg = 23.94 then reward = 0.84 # which has to be 0.75

rand = 15, avg = 27.38 then reward = 1.15 # which has to be 1.25

rand = 32, avg = 27.38 then reward = 0.93 # which has to be 0.75

As you may have noticed, the algorithm doesn't consider the distance between avg and the bounds (15, 32). The closer avg moves to the lower or upper bound, the more unbalanced modified_reward becomes.

I need modified_reward to be assigned uniformly, no matter whether avg moves toward the upper or lower bound. Can anyone suggest a modification to this algorithm that accounts for the distance between avg and the bounds of the list?

4 Answers

Answer 1

I don't understand why you are calculating md_reward like this; please explain the logic and reasoning. But

landa = sub / std if std != 0 else 0
coefficient = -1 + (2 / (1 + np.exp(-constant2 * landa)))
md_reward = reward + (reward * constant1 * coefficient)

will not give what you are looking for. Consider the cases below:

for md_reward to be 0.75 --> coefficient should be -1
    --> landa == -infinity (a large negative value, i.e. rand much larger than 32)

for md_reward to be 1 --> coefficient should be 0
    --> landa == 0 (std == 0 or sub == 0)  # which is possible

for md_reward to be 1.25 --> coefficient should be 1
    --> landa == infinity (a large positive value, i.e. rand much smaller than 15)

If you want to normalize the reward from avg to max and from avg to min, check the links below:
https://stats.stackexchange.com/questions/70801/how-to-normalize-data-to-0-1-range
https://stats.stackexchange.com/questions/70553/what-does-normalization-mean-and-how-to-verify-that-a-sample-or-a-distribution

Now modify your function to something like the one below.

def modify_reward(lst, rand):
    reward = 1
    constant1 = 0.25
    min_value = 15
    max_value = 32
    avg = np.mean(lst)
    if rand >= avg:
        # normalize rand from avg to max
        md_reward = reward - constant1 * (rand - avg) / (max_value - avg)
    else:
        # normalize rand from min to avg
        md_reward = reward + constant1 * (1 - (rand - min_value) / (avg - min_value))
    return md_reward

I used the normalization below:

Normalized: (X−min(X))/(max(X)−min(X)) 

For the case rand >= avg, min(X) is avg and max(X) is max_value; for the case rand < avg, min(X) is min_value and max(X) is avg.

Hope this helps.
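As a sanity check (my own sketch, restating the function above so it runs standalone), the rewards at the bounds come out exactly as the question requires whenever 15 < avg < 32:

```python
import numpy as np

def modify_reward(lst, rand):
    reward = 1
    constant1 = 0.25
    min_value = 15
    max_value = 32
    avg = np.mean(lst)
    if rand >= avg:
        # normalize rand from avg to max
        md_reward = reward - constant1 * (rand - avg) / (max_value - avg)
    else:
        # normalize rand from min to avg
        md_reward = reward + constant1 * (1 - (rand - min_value) / (avg - min_value))
    return md_reward

lst = [15, 23, 31]  # avg = 23, as in the question's examples
print(modify_reward(lst, 15))  # 1.25
print(modify_reward(lst, 32))  # 0.75
print(modify_reward(lst, 23))  # 1.0
```

Note that both branches divide by the distance from avg to a bound, so this breaks down if avg ever equals min_value or max_value.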

Answer 2

Putting together these two requirements:

if rand is more than average, we expect the reward to be less than 1, and if rand is less than average, the reward gets higher than 1.

I need modified_reward to be uniformly assigned, no matter avg moves toward upper bound or lower bound.

is slightly tricky, depending on what you mean by 'uniformly'.

If you want 15 to always be rewarded with 1.25, and 32 to always be rewarded with 0.75, you can't have a single linear relationship while also respecting your first requirement.

If you are happy with two linear relationships, you can aim for a situation where modified_reward depends on rand like this:

[Figure: modified_reward plotted against rand as two linear segments meeting at avg, falling from 1.25 at the lower bound to 0.75 at the upper bound]

which I produced with this Wolfram Alpha query. As you can see, this is two linear relationships, with a 'knee' at avg. I expect you'll be able to derive the formulae for each part without too much trouble.
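One way to write those two segments (my sketch; the function name piecewise_reward and the parameters lo, hi, weight are my own, not from the answer) is to normalize the distance from avg separately on each side of the knee:

```python
def piecewise_reward(x, avg, lo=15, hi=32, weight=0.25):
    """Two linear segments with a knee at avg:
    lo -> 1.25, avg -> 1.0, hi -> 0.75.
    Assumes lo < avg < hi."""
    if x <= avg:
        # left segment: distance below avg, normalized by (avg - lo)
        return 1 + weight * (avg - x) / (avg - lo)
    # right segment: distance above avg, normalized by (hi - avg)
    return 1 - weight * (x - avg) / (hi - avg)

print(piecewise_reward(15, avg=27.38))  # 1.25
print(piecewise_reward(32, avg=27.38))  # 0.75
```

Because each side is normalized by its own width, the bounds always map to exactly 1.25 and 0.75 regardless of where avg sits between them.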

Answer 3

This code implements a linear distribution of weights proportional to the distance from the average toward your given limits.

import numpy as np

class Rewarder(object):
    lo = 15
    hi = 32
    weight = 0.25

    def __init__(self):
        self.lst = np.array([])

    def append(self, x):
        self.lst = np.append(self.lst, [x])

    def average(self):
        return np.mean(self.lst)

    def distribution(self, a, x, b):
        '''
        Return a number between 0 and 1 proportional to
        the distance of x from a towards b.
        Note: modify this fraction if you want a normal
        distribution or quadratic etc.
        '''
        return (x - a) / (b - a)

    def reward(self, x):
        avg = self.average()

        if x > avg:
            w = self.distribution(avg, x, self.hi)
        else:
            w = -self.distribution(avg, x, self.lo)

        return 1 - self.weight * w

rollouts = Rewarder()
rollouts.append(23)
print(rollouts.reward(15))
print(rollouts.reward(32))
print(rollouts.reward(23))

Producing:

1.25
0.75
1.0

The code in your question uses np.std, which I presume is an attempt at a normal distribution. Remember that a normal distribution never actually reaches zero.

If you tell me what shape you want for the distribution we can modify Rewarder.distribution to suit.

Answer 4

Try this

def modify_reward(lst, rand):
    reward = 1
    constant = 0.25  # think of this as the +/- amount from the initial reward

    global avg
    avg = np.mean(lst)
    sub = np.subtract(avg, rand)
    dreward = 0
    if sub > 0:
        dreward = sub / (avg - 15)  # put your lower bound instead of 15
    elif sub < 0:
        dreward = sub / (32 - avg)  # put your upper bound instead of 32
    md_reward = reward + (dreward * constant)
    return md_reward
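A quick check of this version against the question's problem cases, where the list average is skewed toward a bound (my own sketch, restating the function without the global so it runs standalone):

```python
import numpy as np

def modify_reward(lst, rand):
    reward = 1
    constant = 0.25
    avg = np.mean(lst)
    sub = np.subtract(avg, rand)
    dreward = 0
    if sub > 0:
        dreward = sub / (avg - 15)  # rand below avg: scale by distance to lower bound
    elif sub < 0:
        dreward = sub / (32 - avg)  # rand above avg: scale by distance to upper bound
    return reward + (dreward * constant)

lst = [22, 32, 32]  # avg ~ 28.67, well above the midpoint
print(modify_reward(lst, 15))  # 1.25
print(modify_reward(lst, 32))  # 0.75
```

Even with avg pushed close to the upper bound, the bounds still map to 1.25 and 0.75, which is the uniformity the question asks for.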

This is a linear solution inspired by @AakashM's answer. I don't know if this is exactly what you were looking for, but it fits your description.

