Sunday, August 6, 2017

Checking if two patterns match one another?

Leave a Comment

This Leetcode problem is about how to match a pattern string against a text string as efficiently as possible. The pattern string can consists of letters, dots, and stars, where a letter only matches itself, a dot matches any individual character, and a star matches any number of copies of the preceding character. For example, the pattern

ab*c. 

would match ace and abbbbcc. I know that it's possible to solve this original problem using dynamic programming.

My question is whether it's possible to see whether two patterns match one another. For example, the pattern

bdbaa.* 

can match

bdb.*daa 

Is there a nice algorithm for solving this pattern-on-pattern matching problem?

5 Answers

Answers 1

Here's one approach that works in polynomial time. It's slightly heavyweight and there may be a more efficient solution, though.

The first observation that I think helps here is to reframe the problem. Rather than asking whether these patterns match each other, let's ask this equivalent question:

Given patterns P1 and P2, is there a string w where P1 and P2 each match w?

In other words, rather than trying to get the two patterns to match one another, we'll search for a string that each pattern matches.

You may have noticed that the sorts of patterns you're allowed to work with are a subset of the regular expressions. This is helpful, since there's a pretty elaborate theory of what you can do with regular expressions and their properties. So rather than taking aim at your original problem, let's solve this even more general one:

Given two regular expressions R1 and R2, is there a string w that both R1 and R2 match?

The reason for solving this more general problem is that it enables us to use the theory that's been developed around regular expressions. For example, in formal language theory we can talk about the language of a regular expression, which is the set of all strings that the regex matches. We can denote this L(R). If there's a string that's matched by two regexes R1 and R2, then that string belongs to both L(R1) and L(R2), so our question is equivalent to

Given two regexes R1 and R2, is there a string w in L(R1) ∩ L(R2)?

So far all we've done is reframe the problem we want to solve. Now let's go solve it.

The key step here is that it's possible to convert any regular expression into an NFA (a nondeterministic finite automaton) so that every string matched by the regex is accepted by the NFA and vice-versa. Even better, the resulting NFA can be constructed in polynomial time. So let's begin by constructing NFAs for each input regex.

Now that we have those NFAs, we want to answer this question: is there a string that both NFAs accept? And fortunately, there's a quick way to answer this. There's a common construction on NFAs called the product construction that, given two NFAs N1 and N2, constructs a new NFA N' that accepts all the strings accepted by both N1 and N2 and no other strings. Again, this construction runs in polynomial time.

Once we have N', we're basically done! All we have to do is run a breadth-first or depth-first search through the states of N' to see if we find an accepting state. If so, great! That means there's a string accepted by N', which means that there's a string accepted by both N1 and N2, which means that there's a string matched by both R1 and R2, so the answer to the original question is "yes!" And conversely, if we can't reach an accepting state, then the answer is "no, it's not possible."

I'm certain that there's a way to do all of this implicitly by doing some sort of implicit BFS over the automaton N' without actually constructing it, and it should be possible to do this in something like time O(n2). If I have some more time, I'll revisit this answer and expand on how to do that.

Answers 2

You can solve this with backtracking too, not very efficiently (because the match of the same substrings may be recalculated many times, which could be improved by introducing a lookup table where all non-matching pairs of strings are saved and the calculation only happens when they cannot be found in the lookup table), but seems to work (js, the algorithm assumes the simple regex are valid, which means not beginning with * and no two adjacent * [try it yourself]):

function canBeEmpty(s) {     if (s.length % 2 == 1)         return false;     for (let i = 1; i < s.length; i += 2)         if (s[i] != "*")             return false;     return true; }  function match(a, b) {     if (a.length == 0 || b.length == 0)         return canBeEmpty(a) && canBeEmpty(b);     let x = 0, y = 0;     // process characters up to the next star     while ((x + 1 == a.length || a[x + 1] != "*") &&            (y + 1 == b.length || b[y + 1] != "*")) {         if (a[x] != b[y] && a[x] != "." && b[y] != ".")             return false;         x++; y++;         if (x == a.length || y == b.length)             return canBeEmpty(a.substr(x)) && canBeEmpty(b.substr(y));     }     if (x + 1 < a.length && y + 1 < b.length && a[x + 1] == "*" && b[y + 1] == "*")         // star coming in both strings         return match(a.substr(x + 2), b.substr(y)) || // try skip in a                match(a.substr(x), b.substr(y + 2)); // try skip in b     else if (x + 1 < a.length && a[x + 1] == "*") // star coming in a, but not in b         return match(a.substr(x + 2), b.substr(y)) || // try skip * in a                ((a[x] == "." || b[y] == "." || a[x] == b[y]) && // if chars matching                       match(a.substr(x), b.substr(y + 1))); // try skip char in b     else // star coming in b, but not in a         return match(a.substr(x), b.substr(y + 2)) || // try skip * in b                ((a[x] == "." || b[y] == "." || a[x] == b[y]) && // if chars matching                       match(a.substr(x + 1), b.substr(y))); // try skip char in a } 

For a little optimization you could normalize the strings first:

function normalize(s) {     while (/([^*])\*\1([^*]|$)/.test(s) || /([^*])\*\1\*/.test(s)) {         s = s.replace(/([^*])\*\1([^*]|$)/, "$1$1*$2"); // move stars right         s = s.replace(/([^*])\*\1\*/, "$1*"); // reduce     }     return s; } // example: normalize("aa*aa*aa*bb*b*cc*cd*dd") => "aaaa*bb*ccc*ddd*" 

There is a further reduction of the input possible: x*.* and .*x* can both be replaced by .*, so to get the maximal reduction you would have to try to move as many stars as possible next to .* (so moving some stars to the left can be better than moving all to the right).

Answers 3

I have worked on my idea of DP and came out with the below implementation of the above problem. Please feel free to edit the code in case someone finds any test cases failed. From my side, I tried few test cases and passed all of them, which I will be mentioning below as well.

Please note that I have extended the idea which is used to solve the regex pattern matching with a string using DP. To refer to that idea, please refer to the LeetCode link provided in the OP and look out for discussion part. They have given the explanation for regex matching and the string.

The idea is to create a dynamic memoization table, entries of which will follow the below rules:

  1. If pattern1[i] == pattern2[j], dp[i][j] = dp[i-1][j-1]
  2. If pattern1[i] == '.' or pattern2[j] == '.', then dp[i][j] = dp[i-1][j-1]
  3. The trick lies here: If pattern1[i] = '*', then if dp[i-2][j] exists, then dp[i][j] = dp[i-2][j] || dp[i][j-1] else dp[i][j] = dp[i][j-1].
  4. If pattern2[j] == '*', then if pattern1[i] == pattern2[j-1], then dp[i][j] = dp[i][j-2] || dp[i-1][j] else dp[i][j] = dp[i][j-2]

pattern1 goes row-wise and pattern2 goes column-wise. Also, please note that this code should also work for normal regex pattern matching with any given string. I have verified it by running it on LeetCode and it passed all the available test cases there!

Below is the complete working implementation of the above logic:

boolean matchRegex(String pattern1, String pattern2){     boolean dp[][] = new boolean[pattern1.length()+1][pattern2.length()+1];     dp[0][0] = true;             //fill up for the starting row     for(int j=1;j<=pattern2.length();j++){         if(pattern2.charAt(j-1) == '*')             dp[0][j] = dp[0][j-2];      }             //fill up for the starting column     for(int j=1;j<=pattern1.length();j++){         if(pattern1.charAt(j-1) == '*')             dp[j][0] = dp[j-2][0];      }              //fill for rest table     for(int i=1;i<=pattern1.length();i++){         for(int j=1;j<=pattern2.length();j++){                         //if second character of pattern1 is *, it will be equal to                          //value in top row of current cell                         if(pattern1.charAt(i-1) == '*'){                             dp[i][j] = dp[i-2][j] || dp[i][j-1];                         }                          else if(pattern1.charAt(i-1)!='*' && pattern2.charAt(j-1)!='*'                                      && (pattern1.charAt(i-1) == pattern2.charAt(j-1)                                      || pattern1.charAt(i-1)=='.' || pattern2.charAt(j-1)=='.'))                 dp[i][j] = dp[i-1][j-1];                         else if(pattern2.charAt(j-1) == '*'){                 boolean temp = false;                 if(pattern2.charAt(j-2) == pattern1.charAt(i-1)                                              || pattern1.charAt(i-1)=='.'                                              || pattern1.charAt(i-1)=='*'                                              || pattern2.charAt(j-2)=='.')                       temp = dp[i-1][j];                 dp[i][j] = dp[i][j-2] || temp;              }         }     }             //comment this portion if you don't want to see entire dp table             for(int i=0;i<=pattern1.length();i++){                 for(int j=0;j<=pattern2.length();j++)                     System.out.print(dp[i][j]+" ");                 System.out.println("");             }     return dp[pattern1.length()][pattern2.length()]; } 

Driver method:

System.out.println(e.matchRegex("bdbaa.*", "bdb.*daa"));  Input1: bdbaa.* and bdb.*daa Output1: true  Input2: .*acd and .*bce Output2: false  Input3: acd.* and .*bce Output3: true 

Time complexity: O(mn) where m and n are lengths of two regex patterns given. Same will be the space complexity.

Answers 4

IIUC, you are asking: "Can a regex pattern match another regex pattern?"

Yes, it can. Specifically, . matches "any character" which of course includes . and *. So if you have a string like this:

bdbaa.* 

How could you match it? Well, you could match it like this:

bdbaa.. 

Or like this:

b.* 

Or like:

.*ba*.* 

Answers 5

Just modify the string p with the anchors (in Ruby):

def is_match(s, p)     m=s[Regexp.new "^" << p << "$"]     return m == nil ? false : true end 

(According to LeetCode, that is the fastest Ruby solution yet at 79 ms...)

Then you can use the built-in regex engine to tell if it is a match.

If You Enjoyed This, Take 5 Seconds To Share It

0 comments:

Post a Comment