Thursday, August 3, 2017

Punch multiple strings into a single (shortest possible) string that includes all the chars of each strings in forward direction

Leave a Comment

My purpose is to punch multiple strings into a single (shortest) string that will contain all the character of each string in a forward direction. The question is not specific to any language, but more into the algorithm part. (probably will implement it in a node server, so tagging nodejs/javascript).

So, to explain the problem:

Let's consider I have few strings

["jack", "apple", "maven", "hold", "solid", "mark", "moon", "poor", "spark", "live"] 

The Resultant string should be something like:

"sjmachppoalidveonrk" 

jack: sjmachppoalidveonrk

apple: sjmachppoalidveonrk

solid: sjmachppoalidveonrk

====================================>>>> all in the forward direction

These all are manual evaluation and the output may not 100% perfect in the example.

So, the point is all the letters of each string have to exist in the output in FORWARD DIRECTION (here the actual problem belongs), and possibly the server will send the final strings and numbers like 27594 will be generated and passed to extract the token, in the required end. If I have to punch it in a minimal possible string it would have much easier (That case only unique chars are enough). But in this case there are some points:

  1. Letters can be present multiple time, though I have to reuse any letter if possible, eg: for solid and hold o > l > d can be reused as forward direction but for apple (a > p) and spark (p > a) we have to repeat a as in one case it appears before p for apple, and after p for sparks so either we need to repeat a or p. we cannot do p > a > p as it will cover both the case because we need two p after a

  2. We directly have no option to place a single p and use the same index twice in a time of extract, we need multiple p with no option left as the input token contains that

  3. I am (not) sure, that there is multiple outputs possible for a set of strings. but the concern is it should be minimal in length, the combination doesn't matter if its cover all the tokens in a forward direction. all (or one ) outputs of minimal possible length need to trace.
  4. Adding this point as an EDIT to this post. After reading the comments and knowing that it's already an existing problem is known as shortest common supersequence problem we can define that the resultant string will be the shortest possible string from which we can re generate any input string by simply removing some (0 to N) chars, this is same as all inputs can be found in a forward direction in the resultant string.

I have tried, by starting with an arbitrary string, and then made an analysis of next string and splitting all the letters, and place them accordingly, but after some times, it seems that current string letters can be placed in a better way, If the last string's (or a previous string's) letters were placed according to the current string. But again that string was analysed and placed based on something (multiple) what was processed, and placing something in the favor of something that is not processed seems difficult because to that we need to process that. Or might me maintaining a tree of all processed/unprocessed tree will help, building the building the final string? Any better way than it, It seems a brute force!

Note: I know there are a lot of other transformation possible, please try not to suggest anything else to use, we are doing a bit research on it, and anything in the context of question will be highly appreciated. If anything is not clear or need any further explanation feel free to comment. Thanks a ton for reading the entire problem with patience.

5 Answers

Answers 1

I came up with a somewhat brute force method. This way finds the optimal way to combine 2 words then does it for each element in the array.

This strategy works by trying finding the best possible way to combine 2 words together. It is considered the best by having the fewest letters. Each word is fed into an ever growing "merged" word. Each time a new word is added the existing word is searched for a matching character which exists in the word to be merged. Once one is found both are split into 2 sets and attempted to be joined (using the rules at hand, no need 2 add if letter already exists ect..). The strategy generally yields good results.

The join_word method takes 2 words you wish to join, the first parameter is considered to be the word you wish to place the other into. It then searches for the best way to split into and word into 2 separate parts to merge together, it does this by looking for any shared common characters. This is where the splits_on_letter method comes in.

The splits_on_letter method takes a word and a letter which you wish to split on, then returns a 2d array of all the possible left and right sides of splitting on that character. For example splits_on_letter('boom', '0') would return [["b","oom"],["bo","om"],["boo","m"]], this is all the combinations of how we could use the letter o as a split point.


The sort() at the beginning is to attempt to place like elements together. The order in which you merge the elements generally effects the results length. One approach I tried was to sort them based upon how many common letters they used (with their peers), however the results were varying. However in all my tests I had maybe 5 or 6 different word sets to test with, its possible with a larger, more varying word arrays you might find different results.

Output is

spmjhooarckpplivden 


var words = ["jack", "apple", "maven", "hold", "solid", "mark", "moon", "poor", "spark", "live"];  var result = minify_words(words);  document.write(result);    function minify_words(words) {      // Theres a good sorting method somewhere which can place this in an optimal order for combining them,      // hoever after quite a few attempts i couldnt get better than just a regular sort... so just use that      words = words.sort();        /*          Joins 2 words together ensuring each word has all its letters in the result left to right      */      function join_word(into, word) {          var best = null;          // straight brute force each word down. Try to run a split on each letter and           for(var i=0;i<word.length;i++) {              var letter = word[i];              // split our 2 words into 2 segments on that pivot letter              var intoPartsArr = splits_on_letter(into, letter);              var wordPartsArr = splits_on_letter(word, letter);              for(var p1=0;p1<intoPartsArr.length;p1++) {                  for(var p2=0;p2<wordPartsArr.length;p2++) {                      var intoParts = intoPartsArr[p1], wordParts = wordPartsArr[p2];                      // merge left and right and push them together                      var result = add_letters(intoParts[0], wordParts[0]) + add_letters(intoParts[1], wordParts[1]);                      if(!best || result.length <= best.length) {                          best = result;                      }                  }              }          }            // its possible that there is no best, just tack the words together at that point          return best || (into + word);      }        /*          Splits a word at the index of the provided letter      */      function splits_on_letter(word, letter) {          var ix, result = [], offset = 0;;          while((ix = word.indexOf(letter, offset)) !== -1) {              result.push([word.substring(0, ix), word.substring(ix, word.length)]);              offset = ix+1;          }          result.push([word.substring(0, offset), word.substring(offset, word.length)]);          return result;      }          /*          Adds letters to the word given our set of rules. Adds them starting left to right, will only add if the letter isnt found      */      function add_letters(word, addl) {          var rIx = 0;          for (var i = 0; i < addl.length; i++) {              var foundIndex = word.indexOf(addl[i], rIx);              if (foundIndex == -1) {                  word = word.substring(0, rIx) + addl[i] + word.substring(rIx, word.length);                  rIx += addl[i].length;              } else {                  rIx = foundIndex + addl[i].length;              }          }          return word;      }        // For each of our words, merge them together      var joinedWords = words[0];      for (var i = 1; i < words.length; i++) {          joinedWords = join_word(joinedWords, words[i]);      }      return joinedWords;  }

Answers 2

A first try, not really optimized (183% shorter):

function getShort(arr){  var perfect="";  //iterate the array  arr.forEach(function(string){    //iterate over the characters in the array    string.split("").reduce(function(pos,char){          var n=perfect.indexOf(char,pos+1);//check if theres already a possible char      if(n<0){              //if its not existing, simply add it behind the current          perfect=perfect.substr(0,pos+1)+char+perfect.substr(pos+1);        return pos+1;      }      return n;//continue with that char     },-1);   })   return perfect; } 

In action


This can be improved trough simply running the upper code with some variants of the array (200% improvement):

var s=["jack",...]; var perfect=null; for(var i=0;i<s.length;i++){  //shift  s.push(s.shift());  var result=getShort(s);  if(!perfect || result.length<perfect.length) perfect=result; } 

In action

Thats quite close to the minimum number of characters ive estimated ( 244% minimization might be possible in the best case)

Ive also wrote a function to get the minimal number of chars and one to check if a certain word fails, you can find them here

Answers 3

I have used the idea of Dynamic programming to first generate the shortest possible string in forward direction as stated in OP. Then I have combined the result obtained in the previous step to send as a parameter along with the next String in the list. Below is the working code in java. Hope this would help to reach the most optimal solution, in case my solution is identified to be non optimal. Please feel free to report any countercases for the below code:

public String shortestPossibleString(String a, String b){     int[][] dp = new int[a.length()+1][b.length()+1];             //form the dynamic table consisting of              //length of shortest substring till that points      for(int i=0;i<=a.length();i++){         for(int j=0;j<=b.length();j++){             if(i == 0)                 dp[i][j] = j;             else if(j == 0)                 dp[i][j] = i;                             else if(a.charAt(i-1) == b.charAt(j-1))                 dp[i][j] = 1+dp[i-1][j-1];             else                 dp[i][j] = 1+Math.min(dp[i-1][j],dp[i][j-1]);          }     }             //Backtrack from here to find the shortest substring             char[] sQ = new char[dp[a.length()][b.length()]];             int s = dp[a.length()][b.length()]-1;             int i=a.length(), j=b.length();             while(i!=0 && j!=0){                 // If current character in a and b are same, then                 // current character is part of shortest supersequence                 if(a.charAt(i-1) == b.charAt(j-1)){                     sQ[s] = a.charAt(i-1);                     i--;                     j--;                     s--;                 }                 else {                     // If current character in a and b are different                     if(dp[i-1][j] > dp[i][j-1]){                         sQ[s] = b.charAt(j-1);                         j--;                         s--;                     }                     else{                         sQ[s] = a.charAt(i-1);                         i--;                         s--;                     }                 }                                     }             // If b reaches its end, put remaining characters             // of a in the result string             while(i!=0){                 sQ[s] = a.charAt(i-1);                 i--;                 s--;             }             // If a reaches its end, put remaining characters             // of b in the result string             while(j!=0){                 sQ[s] = b.charAt(j-1);                 j--;                 s--;             }     return String.valueOf(sQ); }     public void getCombinedString(String... values){         String sSQ = shortestPossibleString(values[0],values[1]);         for(int i=2;i<values.length;i++){             sSQ = shortestPossibleString(values[i],sSQ);         }         System.out.println(sSQ);     } 

Driver program:

e.getCombinedString("jack", "apple", "maven", "hold",              "solid", "mark", "moon", "poor", "spark", "live"); 

Output:

jmapphsolivecparkonidr

Worst case time complexity of the above solution would be O(product of length of all input strings) when all strings have all characters distinct and not even a single character matches between any pair of strings.

Answers 4

Here is an optimal solution based on dynamic programming in JavaScript, but it can only get through solid on my computer before it runs out of memory. It differs from @CodeHunter's solution in that it keeps the entire set of optimal solutions after each added string, not just one of them. You can see that the number of optimal solutions grows exponentially; even after solid there are already 518,640 optimal solutions.

const STRINGS = ["jack", "apple", "maven", "hold", "solid", "mark", "moon", "poor", "spark", "live"] function map(set, f) {     const result = new Set     for (const o of set) result.add(f(o))     return result } function addAll(set, other) {     for (const o of other) set.add(o)     return set } function shortest(set) { //set is assumed non-empty     let minLength     let minMatching     for (const s of set) {         if (!minLength || s.length < minLength) {             minLength = s.length             minMatching = new Set([s])         }         else if (s.length === minLength) minMatching.add(s)     }     return minMatching } class ZipCache {     constructor() {         this.cache = new Map     }     get(str1, str2) {         const cached1 = this.cache.get(str1)         if (!cached1) return undefined         return cached1.get(str2)     }     set(str1, str2, zipped) {         let cached1 = this.cache.get(str1)         if (!cached1) {             cached1 = new Map             this.cache.set(str1, cached1)         }         cached1.set(str2, zipped)     } } const zipCache = new ZipCache function zip(str1, str2) {     const cached = zipCache.get(str1, str2)     if (cached) return cached      if (!str1) { //str1 is empty, so only choice is str2         const result = new Set([str2])         zipCache.set(str1, str2, result)         return result     }     if (!str2) { //str2 is empty, so only choice is str1         const result = new Set([str1])         zipCache.set(str1, str2, result)         return result     }     //Both strings start with same letter     //so optimal solution must start with this letter     if (str1[0] === str2[0]) {         const zipped = zip(str1.substring(1), str2.substring(1))         const result = map(zipped, s => str1[0] + s)         zipCache.set(str1, str2, result)         return result     }      //Either do str1[0] + zip(str1[1:], str2)     //or        str2[0] + zip(str1, str2[1:])     const zip1 = zip(str1.substring(1), str2)     const zip2 = zip(str1, str2.substring(1))     const test1 = map(zip1, s => str1[0] + s)     const test2 = map(zip2, s => str2[0] + s)     const result = shortest(addAll(test1, test2))     zipCache.set(str1, str2, result)     return result } let cumulative = new Set(['']) for (const string of STRINGS) {     console.log(string)     const newCumulative = new Set     for (const test of cumulative) {         addAll(newCumulative, zip(test, string))     }     cumulative = shortest(newCumulative)     console.log(cumulative.size) } console.log(cumulative) //never reached 

Answers 5

var words = ["jack", "apple", "maven", "hold", "solid", "mark", "moon", "poor", "spark", "live"]; var result = ""; words.forEach(function(word) {     for (var i = 0; i < word.length; i++) {        if(!result.includes(word[i])){            result += word[i];        }     } }); 
If You Enjoyed This, Take 5 Seconds To Share It

0 comments:

Post a Comment