Tuesday, December 19, 2017

Interview quеstion: data structure to set all values in O(1)

Leave a Comment

Looking through interview questions on the Internet I encountered the following one.

Describe a data structure for which getValue(int index), setValue(int index, int value), and setAllValues(int value) are all O(1).

Though array is good enough for the first and second operations to be performed in O(1), what can be proposed for the third (setAllValues)?

10 Answers

Answers 1

How about an array of tuples {timestamp, value}, with an additional {timestamp, value} called all. Since you only care about the relative time of insertions, you can use a monotonically increasing id for the values of timestamp:

type Entry {   int timestamp,    int value }  type structure {     int     id     Entry   all     Entry[] array } 

Initialise all fields to 0. Then the following should work for you:

  • setValue(index i, value v):

    array[i] = {id++, value} 
  • value getValue(index i)

    if(all.timestamp > array[i].timestamp) return all.value else return array[i].value 
  • setAll(value v)

    all = {id++, value} 

A problem with this approach is that eventually you'll run out of ids for timestamp, and might wrap around. If you chose a 64 bit value to store timestamps, then this gives you 18,446,744,073,709,551,616 insertions or setAlls before this happens. Depending on the expected use of the datastructure, an O(n) cleanup phase might be appropriate, or you could just throw an exception.

Another issue that might need to be considered is multi-threading. Three obvious problems:

  • if id++ isn't atomic and two threads obtained a new id at the same time then you could get two entries with the same id.
  • if the incrementation of id and the assignment of the created tuple aren't atomic together (they're probably not) and there were two simultaneous inserts, then you could get a race condition where the older id wins.
  • if the assignment of the tuple isn't atomic, and there's an insert() at the same time as a get() then you might end up in a position where you've got say {new_id, old_value} in the array, causing the wrong value to be returned.

If any of these are problems, the absolute easiest solution to this is to put "not thread safe" in the documentation (cheating). Alternatively, if you can't implement the methods atomically in your language of choice, you'd need to put some sort of synchronisation locks around them.

Answers 2

How about an array of pointers to a single common value? Set the value, and all references will point to the single changed value in O(1)..

Answers 3

We can have a variable V which stores an int and an array of containing Tuple as {Value, id}..

And a global int variable G (which will act like identity in SQL and whenever any set or setAll operation is done its value get increment by 1)

initial all Ids and V value will be default say null..

so V = null All Tuple = {null, null} set(index i, val v) -> G = G+1, arr[i].Val = v, arr[i].Id = G  get(index i) -> if V.Id > arr[i].Id return V.Val else return arr[i]    set-all(val v) -> G= G+1, V.Id= G, V.Val = v 

Answers 4

I got the same question in one of the technical interviews.
Here is my complete ready-to-use Java-implementation including the test cases.

The key idea is to keep the value of setAll() in special variable (e.g. joker) and then trace the change of this value in a proper way.

In order to keep the code clean, some access modifiers have been abolished.

Node class:

class Node {      int value;     Integer joker;      Node(int val, Integer jok) {         value = val;         joker = jok;     } } 

DS class:

class DS {      Node[] arr;      DS(int len) {         arr = new Node[len];     } } 

DataStructure class:

class DataStructure {      private boolean isJokerChanged;     private Integer joker;     private DS myDS;      DataStructure(int len) {         myDS = new DS(len);     }      Integer get(int i) {          Integer result = null;          if (myDS.arr.length < i) {             return null;         }          // setAll() has been just called         if (isJokerChanged) {             return joker;         }          if (myDS.arr[i] == null) {              // setAll() has been previously called             if (joker != null) {                 result = joker;             } else {                 result = null;              }          } else if (myDS.arr[i].joker == joker) {             // cell value has been overridden after setAll() call             result = myDS.arr[i].value;         } else {             result = joker;         }          return result;     }      void setAll(int val) {         isJokerChanged = true;         joker = val;     }      void set(int i, int val) {         isJokerChanged = false;         myDS.arr[i] = new Node(val, joker);     } } 

Main class:

class Main {      public static void main(String[] args) {          DataStructure ds = new DataStructure(100);          Integer res;          res = ds.get(3);          ds.set(3, 10);          res = ds.get(3);          ds.setAll(6);          res = ds.get(3);          res = ds.get(15);          ds.set(4, 7);          res = ds.get(4);          res = ds.get(3);          ds.setAll(45);          ds.set(8, 2);          res = ds.get(3);     } } 

Answers 5

I was just asked this question very recently in an Interview. I came up with a hash table implementation. It would solve the problem of running out of the timestamp values but the thread safety feature needs to be implemented (probably using lazy initialization techniques)

Let's say in our class we have a private variable _defaultValue to hold a default value and we also have a private hashtable or dictionary _hashtable. SetAllValues could just set _defaultValue equal to the value passed and _hashtable initialized/set to a new hashtable and discard any reference to the old hash table. SetValue should just add a new value to _hashtable or update the value if the key ( or index) is already present in the _hashtable. GetValue should check whether the key (or index) is present in _hashtable, then return it, else return the value stored in _defaultValue.

This is my first answer on StackOverflow. I am a little lazy in writing up the code. Will probably edit the answer soon.

The interviewer nodded yes to this solution but insisted to implement it without using a hashtable. I guess, he was asking me to give a similar approach as Timothy's answer. And I wasn't able to get it at that moment :(. Anyways, Cheers!

EDIT: Posting the code (in C#)

class MyDataStruct {     private int _defaultValue;     private Dictionary<int,int> _hashtable;      public MyDataStruct()     {         _defaultValue = 0; // initialize default with some value         _hashtable = new Dictionary<int, int>();     }      public void SetAllValues(int val)     {         _defaultValue = val;         _hashtable = new Dictionary<int, int>();     }      public void SetValue(int index, int val)     {         if (_hashtable.ContainsKey(index))         {             _hashtable.Add(index, val);         }         else         {             _hashtable[index] = val;         }     }      public int GetValue(int index)     {         if (_hashtable.ContainsKey(index))         {             return _hashtable[index];         }         else         {             return _defaultValue;         }     } } 

Answers 6

All existing answers use a timestamp that is incremented on every setVal operation. This is not necessary. In fact, it is necessary only to increment the timestamp on setAll. Another issue some raised was arithmetic overflow. This can be handled without breaking the constant time bounds by updating a single cell on each setAll and performing the time comparison carefully.

How it works

The basic concept is essentially similar to that of the other answers, but with a twist.

What they do: Store the value used for the last setAll operation separately, and keep track of the time that operation was performed. Each time they perform a setVal, they store the current time along with the given value in the array. Each time they perform a getVal, they compare the time in the given location to the time the last setAll occurred, and then choose either the value at the location or the setAll value depending on which is greater.

Why this can be a problem: Suppose the current time overflows and a setAll operation occurs soon afterward. It will appear as though most of the stored array values are newer than the setAll value, when they are actually older.

The solution: Stop imagining that we're keeping track of the total amount of time that has passed since data structure initialization. Imagine a giant clock with a "second hand" that ticks not 60 times around the circle but rather 2^n times around the circle. The position of the second hand represents the time of the most recent setAll operation. Each setVal operation stores this time along with a value. So if we perform a setAll when the "clock" is at 45, and then perform six setVal operations on different elements, the setAll time and the times at all six locations will be the same. We wish to maintain the following invariant:

The time at a given element location equals the setAll time if and only if that element was set with setVal more recently than the last setAll operation.

Clearly, the procedure described above automatically ensures that if the element was set recently, then its time will equal the setAll time. The challenge is to make the reverse implication hold as well.

To be continued ....

The code

I've written this in Haskell because that is the language I know best, but it is not exactly the most natural language for the job.

{-# LANGUAGE BangPatterns #-} module RepArr where  import Control.Monad.Primitive import Data.Primitive.MutVar import qualified Data.Vector.Mutable as V import Data.Vector.Mutable (MVector) import Control.Applicative import Prelude hiding (replicate) import Control.Monad.ST import Data.Word  -- The Int in the MutVar is the refresh pointer data RepArr s a = RepArr (MutVar s (Word, a, Int)) (MVector s (Word,a))  -- Create a fancy array of a given length, initially filled with a given value replicate :: (PrimMonad m, Applicative m) => Int -> a -> m (RepArr (PrimState m) a) replicate n a = RepArr <$> newMutVar (0,a,0) <*> V.replicate n (0,a)  getVal :: PrimMonad m => RepArr (PrimState m) a -> Int -> m a getVal (RepArr allv vec) n = do   (vectime, vecval) <- V.read vec n   (alltime, allval, _) <- readMutVar allv   if (fromIntegral (alltime - vectime) :: Int) > 0     then return allval     else return vecval  setVal :: PrimMonad m => RepArr (PrimState m) a -> Int -> a -> m () setVal (RepArr allv vec) i v = do   (!alltime, _, _) <- readMutVar allv   V.write vec i (alltime, v)  setAll :: PrimMonad m => RepArr (PrimState m) a -> a -> m () setAll r@(RepArr allv vec) v = do   (oldt, _, oldc) <- readMutVar allv   getVal r oldc >>= setVal r oldc   let !newc = case oldc+1 of         op1 | op1 == V.length vec -> 0             | otherwise -> op1   let !newt = oldt+1   writeMutVar allv (newt, v, newc) 

To avoid potential (rare) garbage collection pauses, it's actually necessary to unbox the Int and Word values, as well as using unboxed vectors instead of polymorphic ones, but I'm not really in the mood and this is a fairly mechanical task.

Here's a version in C (completely untested):

#include <malloc.h>  struct Pair {   unsigned long time;   void* val; };  struct RepArr {   unsigned long allT;   void* allV;   long count;   long length;   struct Pair vec[]; };  struct RepArr *replicate (long n, void* val) {   struct RepArr *q = malloc (sizeof (struct RepArr)+n*sizeof (struct Pair));   q->allT = 1;   q->allV = val;   q->count = 0;   q->length = n;    int i;   for (i=0; i<n; i++) {     struct Pair foo = {0,val};     q->vec[i] = foo;   }   return q; }   void* getVal (struct RepArr *q, long i) {   if ((long)(q->vec[i].time - q->allT) < 0)     return q->allV;   else     return q->vec[i].val; }  void setVal (struct RepArr *q, long i, void* val) {   q->vec[i].time = q->allT;   q->vec[i].val = val; }  void setAll (struct RepArr *q, void* val) {   setVal (q, q->count, getVal (q, q->count));   q->allV = val;   q->allT++;   q->count++;   if (q->count == q->length)     q->count = 0; } 

Answers 7

/* 

At the time I am writing this, all of the solutions on this page will double (or more) the amount of space required to store the array. The following solution reduces the amount of wasted space from Ω(n) to θ(n/w), where w is the number of bits in a computer "word". On my machine, that's 64.

This prose in this answer is in C comments, so you can copy-and-paste this answer verbatim and compile it with your C compiler.

*/  #include <stdbool.h> #include <stddef.h> #include <stdint.h> #include <stdlib.h>  /* 

The problem is to support reading and writing values in an array in O(1) time along with bulk writes in which all values in the array are written at once in O(1) time. This is possible using a technique invented by Aho, Hopcroft and Ullman, as far as I know. I will present a version due to Gonzalo Navarro, "Constant-Time Array Initialization in Little Space".

The idea is to keep three metadata arrays along with the data array. We also keep two integers: unset, which is the last value used in a bulk write operation, and size, an approximation for the number of values that have been set since the last bulk write. At all times, the number of distinct values written since the last bulk write is between size and w * size.

The three metadata arrays describe information about blocks of w values in the data array. They are:

  • nth: nth[i] is the ith unique block to be written to since the last bulk write

  • inverse_nth: inverse_nth[i] is the order in in which the ith block of the array was written, counting from 0 at the last bulk write.

  • bitset: The jth bit of bitset[i] is 1 when the array cell numbered 64*i + j has been written to since the last bulk write.

bitset[i] and inverse_nth[i] are allowed to be invalid if i is not a member of the set {nth[0], nth[1], ... , nth[size-1]}. In other words, inverse_nth[i] and bitset[i] are valid if and only if inverse_nth[i] < size and nth[inverse_nth[i]] == i.

Rather than store three separate arrays of the same length, I chose to store one array, is_set, with three fields.

*/  typedef struct {   int nth_;   int inverse_nth_;   uint64_t bitset_; } IsSetCell;  typedef struct {   int unset_;   int size_;   IsSetCell is_set_[]; } IsSetArray;  typedef struct {   IsSetArray * metadata_;   int payload_[]; } ResettableArray;  /* 

To construct an array, we need a default value to return when the reading a value that has never been written to.

*/  ResettableArray * ConstructResettableArray(int n, int unset) {   ResettableArray* result =       malloc(offsetof(ResettableArray, payload_) + n * sizeof(int));   if (!result) return NULL;   n = (n + 63) / 64;   result->metadata_ =       malloc(offsetof(IsSetArray, is_set_) + n * sizeof(IsSetCell));   if (!result->metadata_) {     free(result);     return NULL;   }   result->metadata_->size_ = 0;   result->metadata_->unset_ = unset;   return result; }  void DestructResettableArray(ResettableArray * a) {   if (a) free(a->metadata_);   free(a); }  /* 

The bulk of the algorithm is in writing and reading the metadata. After IsSet() and Set() are defined (below), reading and writing the arrays is straightforward.

*/  bool IsSet(const IsSetArray * a, int i); void Set(IsSetArray * a, int i);  int GetValue(const ResettableArray * a, int i) {   if (!IsSet(a->metadata_, i)) return a->metadata_->unset_;   return a->payload_[i]; }  void SetValue(ResettableArray * a, int i, int v) {   a->payload_[i] = v;   Set(a->metadata_, i); }  void SetAllValues(ResettableArray * a, int v) {   a->metadata_->unset_ = v; }  /* 

The complex part of reading and writing is the bidirectional relationship between inverse_nth and nth. If they point to each other at location i (is_set[is_set[i].inverse_nth].nth == i) then location i contains valid data that was written after the last bulk write, as long as is_set[i].inverse_nth < size.

*/  uint64_t OneBit(int i) {   return UINT64_C(1) << i; }  bool IsSet(const IsSetArray * a, int i) {   const int cell = i/64, offset = i & 63;   const int inverse_nth = a->is_set_[cell].inverse_nth_;   return inverse_nth < a->size_ && a->is_set_[inverse_nth].nth_ == cell &&          a->is_set_[cell].bitset_ & OneBit(offset); }  void Set(IsSetArray * a, int i) {   const int cell = i/64, offset = i & 63;   const int inverse_nth = a->is_set_[cell].inverse_nth_;   if (inverse_nth >= a->size_ || a->is_set_[inverse_nth].nth_ != cell) {     a->is_set_[cell].inverse_nth_ = a->size_;     a->is_set_[cell].bitset_ = 0;     a->is_set_[a->size_].nth_ = cell;     ++a->size_;   }   a->is_set_[cell].bitset_ |= OneBit(offset); } 

Answers 8

Python example

class d:     def __init__(self, l):         self.len = l         self.g_p = [i for i in range(self.len)]         self.g_v = [0 for i in range(self.len)]         self.g_s = self.len - 1         self.g_c = 0        def getVal(self, i):         if (i < 0 or i >= self.len):             return          if (self.g_p[i] <= self.g_s):             return self.g_v[self.g_p[i]]          return self.g_c      def setVal(self, i, v):         if (i < 0 or i >= self.len):             return          if (self.g_p[i] > self.g_s):             self.g_s += 1              self.g_p[self.g_s], self.g_p[i] = self.g_p[i], self.g_p[self.g_s]          self.g_v[self.g_p[i]] = v      def setAll(self, v):         self.g_c = v         self.g_s = -1 

Answers 9

Regarding Timothy Jone's answer:

A problem with this approach is that eventually you'll run out of ids for timestamp, and might wrap around. If you chose a 64 bit value to store timestamps, then this gives you 18,446,744,073,709,551,616 insertions or setAlls before this happens. Depending on the expected use of the datastructure, an O(n) cleanup phase might be appropriate, or you could just throw an exception.

This is exactly the worst case scenario which make this solution O(n) too, and not O(1). This stucture, although saves a lot of potential O(n) insert operation, is still in O(n) effeciency.

Answers 10

The correct solution in C#:

public sealed class SpecialDictionary<T, V> {     private Dictionary<T, Tuple<DateTime, V>> innerData;     private Tuple<DateTime, V> setAllValue;     private DateTime prevTime;      public SpecialDictionary()     {         innerData = new Dictionary<T, Tuple<DateTime, V>>();     }     public void Set(T key, V value) => innerData[key] = new Tuple<DateTime, V>(GetTime(), value);     public void SetAll(V value) => setAllValue = new Tuple<DateTime, V>(GetTime(), value);     public V Get(T key)     {         Tuple<DateTime, V> tmpValue = innerData[key];         if (setAllValue?.Item1 > tmpValue.Item1)         {             return setAllValue.Item2;         }         else         {             return tmpValue.Item2;         }     }     private DateTime GetTime()     {         if (prevTime == null)         {             prevTime = DateTime.Now;          }         else         {             if (prevTime == DateTime.Now)             {                 Thread.Sleep(1);             }             prevTime = DateTime.Now;         }         return prevTime;     } } 

And the test:

static void Main(string[] args) {     SpecialDictionary<string, int> dict = new SpecialDictionary<string, int>();     dict.Set("testA", 1);     dict.Set("testB", 2);     dict.Set("testC", 3);     Console.WriteLine(dict.Get("testC"));     dict.SetAll(4);     dict.Set("testE", 5);     Console.WriteLine(dict.Get("testC"));     Console.WriteLine(dict.Get("testE"));     Console.ReadKey(); } 
If You Enjoyed This, Take 5 Seconds To Share It

0 comments:

Post a Comment