Sunday, January 29, 2017

Stuck implementing a method for mapping symbols to an interval: if-else branches not working properly, implementation does not match theory


I am trying out an encoding-decoding method that was asked about in this post: "Matlab : Help in implementing a mathematical equation for generating multi level quantization"

and in a related one: "Generate random number with given probability matlab"

There are two parts to this question: encoding and decoding. A symbolic sequence is encoded by inverse interval mapping using the map f_inv, which yields a real-valued number. Decoding then iterates the map f() starting from that number. The solution in the first linked post does not work: once the final interval is found, iterating the map f() with the proposed solution does not reproduce the exact symbolic array. So I tried directly implementing the equations for the forward iteration f() given in the paper, but the decoding still does not generate the same symbolic sequence.

Here is a brief explanation of the problem.

Let there be an array b = [1,3,2,6,1] containing N = 5 integer-valued elements, in which the unique integers 1, 2, 3 and 6 occur with probability 0.4, 0.2, 0.2 and 0.2 respectively. The array b can take any integer from the symbol set 1,2,3,4,5,6,7,8, so there are n = 8 elements in the symbol set. In essence, the probability vector for the data b is p = [0.4 (symbol 1), 0.2 (symbol 2), 0.2 (symbol 3), 0 (symbol 4, not occurring), 0 (symbol 5), 0.2 (symbol 6), 0 (symbol 7), 0 (symbol 8)]

The interval [0,1] is split into 8 regions. Let the interval boundaries for the data b be assumed known as Interval_b = [0, 0.4, 0.6, 0.8, 1];

In general, for n = 8 unique symbols, there are n = 8 intervals I_1, I_2, I_3, I_4, I_5, I_6, I_7, I_8, and each of these intervals is assigned a symbol from [1 2 3 4 5 6 7 8].
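As a minimal sketch of the setup described above, the probabilities and interval boundaries can be built over the full 8-symbol alphabet, so that symbols that never occur get a zero-width interval. The variable names follow the question; the loop itself is only an illustration, not the method from the linked posts.

```matlab
% Sketch: probabilities and interval boundaries over the full alphabet 1..n.
b = [1 3 2 6 1];                 % data from the question
n = 8;                           % size of the symbol set
p = zeros(1, n);
for s = 1:n
    p(s) = sum(b == s) / numel(b);   % relative frequency of symbol s (0 if absent)
end
Interval = cumsum([0 p]);        % n+1 boundaries; I_s = [Interval(s), Interval(s+1))
disp(p)
disp(Interval)
```

With this construction Interval has n + 1 = 9 entries, and the intervals for symbols 4, 5, 7 and 8 are empty because their left and right boundaries coincide.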

Let x = 0.2848, obtained from the reverse interval mapping of the symbol array b using the encoding solution in the link. There is a mapping rule that maps x to a symbol depending on the interval in which x lies, and applying it repeatedly should return the same symbols as in b. The rule is:

if x lies in I_1, assign symbol = 1 and y = x/p(1);
if x lies in I_2, assign symbol = 2 and y = (x-p(1))./p(2);
if x lies in I_3, assign symbol = 3 and y = (x-(p(1)+p(2)))./p(3);
if x lies in I_4, assign symbol = 4 and y = (x-(p(1)+p(2)+p(3)))./p(4);
if x lies in I_5, assign symbol = 5 and y = (x-(p(1)+p(2)+p(3)+p(4)))./p(5);
if x lies in I_6, assign symbol = 6 and y = (x-(p(1)+p(2)+p(3)+p(4)+p(5)))./p(6);
if x lies in I_7, assign symbol = 7 and y = (x-(p(1)+p(2)+p(3)+p(4)+p(5)+p(6)))./p(7);
if x lies in I_8, assign symbol = 8 and y = (x-(p(1)+p(2)+p(3)+p(4)+p(5)+p(6)+p(7)))./p(8)

where y is basically the next value of x. In this way, I will get an array of floating point numbers x and symbols = b. I need to map the elements in x to these symbols using the intervals and obtain the value y.

Theoretically, for x = 0.2848, the code goes to the first if branch since x is in interval I_1 (x lies in the range [0, 0.4) in Interval_b). This yields symbol = 1, which is the same as b(1) = 1. The value of y should then be y = 0.2848/0.4 = 0.7120. Based on this y value, I find the next y and its corresponding symbol: y = 0.7120 lies in the range [0.6, 0.8), so the corresponding symbol is 3. Continuing in this way I should get an array symbols, which must be the same as the original array b = [1,3,2,6,1], and an array y.
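The iteration described above can be sketched without an if-else chain by looking up the interval with find and skipping symbols whose probability is zero. This is only a sketch under stated assumptions: the starting value x = 0.28544 is assumed to be the midpoint of the encoded interval rather than the boundary value 0.2848, and the name edges is hypothetical.

```matlab
% Sketch: decode N symbols by iterating the forward map f().
p = [0.4 0.2 0.2 0 0 0.2 0 0];   % probabilities for symbols 1..8 (from the question)
edges = cumsum([0 p]);           % hypothetical name for the 9 interval boundaries
x = 0.28544;                     % assumed start: midpoint of the encoded interval
N = 5;
symbols = zeros(1, N);
y = zeros(1, N);
for k = 1:N
    % last symbol with nonzero probability whose left boundary is <= x
    s = find(p > 0 & edges(1:end-1) <= x, 1, 'last');
    symbols(k) = s;
    y(k) = (x - edges(s)) / p(s);    % same formula as the rule, using cumulative sums
    x = y(k);                        % y becomes the next x
end
disp(symbols)   % 1 3 2 6 1
disp(y)
```

Because p(s) > 0 is part of the lookup, the division can never produce Inf, and a zero-width interval is never selected even when several boundaries coincide.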

PROBLEMS: During decoding, I get an infinite value for y and incorrect results for the symbols. This happens because the code enters the if branch of a symbol whose probability of occurrence is zero (symbol 7, for example). How can I associate an interval with the probability of occurrence of each integer / symbol in array b so that the correct if-else branch is selected, and the branch for a symbol that does not occur (probability = 0) is never visited? Example:

[y1,symbol1] = ObtainSymbols(x(1),p_arr,Interval);
[y2,symbol2] = ObtainSymbols(y1,p_arr,Interval);
[y3,symbol3] = ObtainSymbols(y2,p_arr,Interval);
[y4,symbol4] = ObtainSymbols(y3,p_arr,Interval);
[y5,symbol5] = ObtainSymbols(y4,p_arr,Interval);
Symbols = [symbol1,symbol2,symbol3,symbol4,symbol5]
y = [y1,y2,y3,y4,y5]

Symbols =
     1     3     2     3     4

y =
    0.7120    0.5600    0.8000    1.0000       Inf

I am unable to work out how to properly apply the Interval array in the if-else conditions. Please help.
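One possible contributing factor, offered as an assumption rather than a confirmed diagnosis: the value x = 0.2848 is the lower endpoint of the encoded interval, so a small rounding error during the iteration can push a later value just below an interval boundary. The sketch below recomputes the encoded interval for b and takes its midpoint instead; the names lo, hi and x_mid are hypothetical.

```matlab
% Sketch: encode b by inverse interval mapping and inspect the final interval.
b = [1 3 2 6 1];
n = 8;
p = accumarray(b(:), 1, [n 1])' / numel(b);   % probabilities over symbols 1..8
edges = cumsum([0 p]);
lo = 0; hi = 1;                               % start from the whole interval [0,1]
for k = numel(b):-1:1                         % apply f_inv from the last symbol to the first
    s = b(k);
    lo = edges(s) + p(s) * lo;
    hi = edges(s) + p(s) * hi;
end
x_mid = (lo + hi) / 2;                        % interior point, away from both endpoints
fprintf('interval = [%.5f, %.5f], midpoint = %.5f\n', lo, hi, x_mid);
% prints: interval = [0.28480, 0.28608], midpoint = 0.28544
```

Starting the decoding from the midpoint leaves a margin on both sides at every step, which is consistent with the first y value of 0.7136 (= 0.28544/0.4) reported in the answer.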

UPDATE: Even with the revised answer in the first link, the same problem persists. Here is the full implementation, where the decoding still does not produce the exact symbol sequence and also throws an error. How can I use the intervals in the equation for the f() map that is implemented in the function ObtainSymbols()?

N = 5;
b = [1,3,2,6,1];

[uniqueSym,~,idxUnq]=unique(b);
p = hist(b , uniqueSym);
p = p/sum(p);
Interval = cumsum([0 p]);

f_inv = @(I, idxsymbol) Interval(idxsymbol) + p(idxsymbol) * I;
Reduced_interval = [0; 1];
for k = N:-1:1
    Reduced_interval = f_inv(Reduced_interval , idxUnq(k));
end

%interp1 is very useful function to find interval of a X.

%X = rand(1,N);
%compute center of intervals
X = mean(Reduced_interval);
ii= zeros(1,N);
x(1) = X;

%find indexes of intervals related to X
for k = 1:N
    ii(k)=interp1(Interval(1:end-1), 1:numel(uniqueSym), X,'previous','extrap');
    %octave ii(k)=interp1(sigma(1:end-1), 1:numel(uniqueSym), X,'left','extrap');
    ii(k) = min(floor(ii(k)), numel(uniqueSym));
    X = (X - Interval(ii(k))) ./ p(ii(k));
    fx(k) = X;
end

result = uniqueSym(ii);

p_1 = sum(b==1)/length(b);
p_2 = sum(b==2)/length(b);
p_3 = sum(b==3)/length(b);
p_4 = sum(b==4)/length(b);
p_5 = sum(b==5)/length(b);
p_6 = sum(b==6)/length(b);
p_7 = sum(b==7)/length(b);
p_8 = sum(b==8)/length(b);

p_arr = [p_1,p_2,p_3,p_4,p_5,p_6,p_7,p_8];

% recompute Interval for all symbols
Interval = cumsum([0, p_arr]);
[y1,symbol1] = ObtainSymbols(x(1),p_arr,Interval);
% remaining calls as in the example above
[y2,symbol2] = ObtainSymbols(y1,p_arr,Interval);
[y3,symbol3] = ObtainSymbols(y2,p_arr,Interval);
[y4,symbol4] = ObtainSymbols(y3,p_arr,Interval);
[y5,symbol5] = ObtainSymbols(y4,p_arr,Interval);
Symbols = [symbol1,symbol2,symbol3,symbol4,symbol5]
y = [y1,y2,y3,y4,y5]

function [y,sym] = ObtainSymbols(x,p,Interval)

if (double(x)>=Interval(1)) && (double(x)<Interval(2))  %interval I1
    y = x/p(1);
    sym = 1;

elseif (double(x)>=Interval(2)) && (double(x)<Interval(3)) %interval I2
    y = (x-p(1))./p(2);
    sym = 2;

elseif (double(x)>=Interval(3)) && (double(x)<Interval(4)) %interval I3
    y = (x-(p(1)+p(2)))./p(3);
    sym = 3;

elseif (double(x)>=Interval(4)) && (double(x)<Interval(5)) %interval I4
    y = (x-(p(1)+p(2)+p(3)))./p(4);
    sym = 4;

elseif (double(x)>=Interval(5)) && (double(x)<Interval(6)) %interval I5
    y = (x-(p(1)+p(2)+p(3)+p(4)))./p(5);
    sym = 5;

elseif (double(x)>=Interval(6)) && (double(x)<Interval(7)) %interval I6
    y = (x-(p(1)+p(2)+p(3)+p(4)+p(5)))./p(6);
    sym = 6;

elseif (double(x)>=Interval(7)) && (double(x)<Interval(8)) %interval I7
    y = (x-(p(1)+p(2)+p(3)+p(4)+p(5)+p(6)))./p(7);
    sym = 7;

else %interval I8
    y = (x-(p(1)+p(2)+p(3)+p(4)+p(5)+p(6)+p(7)))./p(8);
    sym = 8;
end

There is a problem in the way the Interval array is calculated and implemented in the original code because the use of the Interval array gives the following error:

Attempted to access Interval(6); index out of bounds because numel(Interval)=5.

Error in ObtainSymbols (line 16)
elseif (double(x)>=Interval(5)) && (double(x)<Interval(6)) %interval I5

Error in Untitled11 (line 69)
    [y5,symbol5] = ObtainSymbols(y4,p_arr,Interval);

The error means that the Interval array must contain a boundary for every symbol in the symbol set, not only for the symbols occurring in the data; otherwise the if-else branches index past the end of the array and are not visited properly.

UPDATE based on the answer by @askadv:

clear all
n = 8
N = 20;
b = randi([0 n-1],1,N); % this is a different array of symbols

[uniqueSym,~,idxUnq]=unique(b);
p = hist(b , uniqueSym);
p = p/sum(p);
Interval = cumsum([0 p]);

f_inv = @(I, idxsymbol) Interval(idxsymbol) + p(idxsymbol) * I;
Reduced_interval = [0; 1];
for k = N:-1:1
    Reduced_interval = f_inv(Reduced_interval , idxUnq(k));
end

%interp1 is very useful function to find interval of a X.

%X = rand(1,N);
%compute center of intervals
X = mean(Reduced_interval);
ii= zeros(1,N);
x(1) = X;

%find indexes of intervals related to X
for k = 1:N
    ii(k)=interp1(Interval(1:end-1), 1:numel(uniqueSym), X,'previous','extrap');
    %octave ii(k)=interp1(sigma(1:end-1), 1:numel(uniqueSym), X,'left','extrap');
    ii(k) = min(floor(ii(k)), numel(uniqueSym));
    X = (X - Interval(ii(k))) ./ p(ii(k));
end

p_1 = sum(b==1)/length(b);
p_2 = sum(b==2)/length(b);
p_3 = sum(b==3)/length(b);
p_4 = sum(b==4)/length(b);
p_5 = sum(b==5)/length(b);
p_6 = sum(b==6)/length(b);
p_7 = sum(b==7)/length(b);
p_8 = sum(b==8)/length(b);

p_arr = [p_1,p_2,p_3,p_4,p_5,p_6,p_7,p_8];

% recompute Interval for all symbols
Interval = cumsum([0, p_arr]);

[y1,symbol1] = ObtainSymbols(x(1),p_arr,Interval);

%looping
Symbols= zeros(1,N);
y = zeros(1,N);
y(1) = y1;
Symbols(1) = symbol1;
for k = 2:N
    [y(k),symbol(k)] = ObtainSymbols(y(k-1),p_arr,Interval);
    Symbols(k) = symbol(k);
end

Symbols does not contain the same values as the array b. The revised interval computed in the line Interval = cumsum([0, p_arr]); has two zeros at the beginning, which changes the flow of the if-else chain. So how should I modify the function ObtainSymbols?

`0  0   0.100000000000000   0.250000000000000   0.450000000000000   0.700000000000000   0.750000000000000   0.800000000000000   0.800000000000000` 
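One thing that stands out in the Interval printed above is that its last entry is 0.8 rather than 1. A possible explanation, stated here as an assumption: randi([0 n-1],1,N) draws symbols from 0..7, while p_1..p_8 count the symbols 1..8, so every 0 in b is left out of p_arr. The sketch below counts over the alphabet 0..n-1 by shifting each symbol to index s + 1 and runs a full round trip on a fixed example array; the array b and the names idx, edges, lo, hi and decoded are hypothetical.

```matlab
% Sketch: round trip for symbols drawn from 0..n-1.
n = 8;
b = [0 3 2 7 3 1 0 5];           % assumed fixed example instead of random data
N = numel(b);
idx = b + 1;                     % shift symbols 0..7 to MATLAB indices 1..8
p = accumarray(idx(:), 1, [n 1])' / N;
edges = cumsum([0 p]);           % last entry is 1 because every symbol is counted

% Encode: shrink [0,1] by applying the inverse map from the last symbol to the first.
lo = 0; hi = 1;
for k = N:-1:1
    lo = edges(idx(k)) + p(idx(k)) * lo;
    hi = edges(idx(k)) + p(idx(k)) * hi;
end
x = (lo + hi) / 2;               % midpoint keeps x away from interval boundaries

% Decode: iterate the forward map, never selecting a zero-probability symbol.
decoded = zeros(1, N);
for k = 1:N
    s = find(p > 0 & edges(1:end-1) <= x, 1, 'last');
    decoded(k) = s - 1;          % shift back to the 0..7 alphabet
    x = (x - edges(s)) / p(s);
end
disp(isequal(decoded, b))        % 1
```

A separate caveat for long sequences: the width of the encoded interval is the product of the symbol probabilities, so with N = 20 and eight roughly equiprobable symbols it is about 8^-20 = 2^-60, which is narrower than a double can resolve near x. In that regime the decoded sequence may diverge from b regardless of how the intervals are indexed.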

1 Answer


Looks like the argument Interval passed to function ObtainSymbols should contain entries for all elements, including the ones with probability 0. This can be done by adding the statement

Interval = cumsum([0, p_arr]); 

immediately before the calls to function ObtainSymbols.

The following is the output with this modification:

...
p_arr = [p_1,p_2,p_3,p_4,p_5,p_6,p_7,p_8]; % unchanged script above this

% recompute Interval for all symbols
Interval = cumsum([0, p_arr]); % [0   0.4   0.6   0.8   0.8   0.8   1.0   1.0   1.0]

% unchanged script below
[y1,symbol1] = ObtainSymbols(x(1),p_arr,Interval);
[y2,symbol2] = ObtainSymbols(y1,p_arr,Interval);
[y3,symbol3] = ObtainSymbols(y2,p_arr,Interval);
[y4,symbol4] = ObtainSymbols(y3,p_arr,Interval);
[y5,symbol5] = ObtainSymbols(y4,p_arr,Interval);
Symbols = [symbol1,symbol2,symbol3,symbol4,symbol5]
y = [y1,y2,y3,y4,y5]

% Symbols = [1     3     2     6     1]
% y = [0.7136    0.5680    0.8400    0.2000    0.5000]