I need a regex to find <Field ...name="document"> or <FieldArray ...name="document"> and replace it with an empty string. The tags can be defined across multiple lines.
This is not HTML or XHTML; it's just a text string containing <Field> and <FieldArray>.
Example with Field:
<Field component={FormField} name="document" typeInput="selectAutocomplete" />
Example with FieldArray:
<FieldArray component={FormField} typeInput="selectAutocomplete" name="document" />
They are inside a list of components. Example:
<Field name="amount" component={FormField} label={t('form.amount')} /> <Field name="datereception" component={FormField} label={t('form.datereception')} /> <Field component={FormField} name="document" typeInput="selectAutocomplete" /> <Field name="datedeferred" component={FormField} label={t('form.datedeferred')} />
I've read some solutions, such as the one for finding src in "Extract image src from a string", but its structure is different from what I'm looking for.
3 Answers
Answers 1
It is not advisable to parse [X]HTML with regex. If you can use a DOM parser, I would advise using that instead of regex.
If there is no other way, you could use this approach to find and replace your data:
<Field(?:Array)?\b(?=[^\/>]+name="document")[^>]+\/>
Explanation
- Match <Field with an optional "Array", ending at a word boundary: <Field(?:Array)?\b
- A positive lookahead (?= ... ) which asserts that what follows contains neither / nor > before it reaches name="document": [^\/>]+name="document"
- Match any character except > one or more times: [^>]+
- Match the closing \/>
var str = `<Field name="amount" component={FormField} label={t('form.amount')} />
<Field name="datereception" component={FormField} label={t('form.datereception')} />
<Field component={FormField} name="document" typeInput="selectAutocomplete" />
<Field name="datedeferred" component={FormField} label={t('form.datedeferred')} />
<FieldArray component={FormField} typeInput="selectAutocomplete" name="document" /><FieldArray component={FormField} typeInput="selectAutocomplete" name="document" />`;
str = str.replace(/<Field(?:Array)?\b(?=[^\/>]+name="document")[^>]+\/>/g, "");
console.log(str);
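Since the question mentions tags that span several lines, here is a minimal sketch checking that the same pattern handles that case too. The sample markup and the names `pattern`, `source` and `result` are made up for illustration; only the regex comes from the answer.

```javascript
// Same pattern as in the answer above. Negated character classes such as
// [^>] also match newlines, so no extra flag is needed for multi-line tags.
const pattern = /<Field(?:Array)?\b(?=[^\/>]+name="document")[^>]+\/>/g;

// Hypothetical sample input with tags split across lines.
const source = `<Field
  name="amount"
  component={FormField}
/>
<FieldArray
  component={FormField}
  typeInput="selectAutocomplete"
  name="document"
/>
<FieldDistractor name="document" />`;

// Remove every Field/FieldArray tag whose name is "document".
const result = source.replace(pattern, "");
console.log(result);
```

The <FieldDistractor> tag is left alone because of the word boundary after the tag name. One limitation to keep in mind: the lookahead stops at the first / or >, so a tag in which an attribute before name contains one of those characters (a path in a string, for example) would not be matched.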
Answers 2
Here's an answer with actual XML parsing and no regular expressions:
var xml = document.createElement("xml");
xml.innerHTML = `
<Field name="amount" component={FormField} label={t('form.amount')} />
<FieldDistractor component={FormField} name="document" typeInput="selectAutocomplete" />
<Field name="datereception" component={FormField} label={t('form.datereception')} />
<Field component={FormField} name="document" typeInput="selectAutocomplete" />
<Field name="datedeferred" component={FormField} label={t('form.datedeferred')} />
<FieldArray component={FormField} typeInput="selectAutocomplete" name="document" /><FieldArray component={FormField} typeInput="selectAutocomplete" name="document" />
`;
var match = xml.querySelectorAll(
  `field:not([name="document"]), fieldarray:not([name="document"]), :not(field):not(fieldarray)`
);
var answer = "";
for (var m=0, ml=match.length; m<ml; m++) {
  // cloning the node removes children, working around the DOM bug
  answer += match[m].cloneNode().outerHTML + "\n";
}
console.log(answer);
In writing this answer, I found a bug in the DOM parser for both Firefox (Mozilla Core bug 1426224) and Chrome (Chromium bug 796305) that didn't allow creating empty elements via innerHTML. My original answer used regular expressions to pre- and post-process the code to make it work, but using regexes on XML is so unsavory that I later changed it to merely strip off children by using cloneNode() (with its implicit deep=false).
So we dump the XML into a dummy DOM element (which we don't need to place anywhere), then we run querySelectorAll() to match some CSS that specifies your requirements:
- field:not([name="document"]): "Field" elements lacking name="document" attributes, or
- fieldarray:not([name="document"]): "FieldArray" elements lacking that attribute, or
- :not(field):not(fieldarray): any other element
Answers 3
You can parse HTML tags with regex because parsing the tags themselves is nothing special; they are the first thing parsed, as an atomic operation.
But you can't use regex to go beyond the atomic tag.
For example, you can't find the balanced closing tag that matches an opening tag, as
this would put a tremendous strain on regex capability.
What a DOM parser does is use regex to parse the tags, then use internal
algorithms to create a tree and carry out processing instructions to interpret
and recreate an image.
And of course regex doesn't do that.
Sticking strictly to parsing tags, including invisible content (like script),
is not that easy either.
Content can hide or embed tags that you shouldn't find when you look
for them.
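To make the point about hidden tags concrete, here is a small sketch. The pattern and the input are assumptions chosen only for illustration: a simple tag regex also matches a tag that sits inside a JSX-style comment, because the regex has no notion of the surrounding context.

```javascript
// Simple tag pattern, used here only to illustrate the point above.
const tagPattern = /<Field\b[^>]*\/>/g;

// Hypothetical input: the second tag sits inside a JSX-style comment,
// so it is not really part of the component list.
const text = `<Field name="amount" />
{/* <Field name="document" /> */}`;

// Both tags are reported, including the commented-out one.
const found = text.match(tagPattern);
console.log(found);
```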
So, in essence, you have to parse the entire HTML file to find the real
tag you're looking for.
There is a general regex that can do this, which I will not include here.
But if you need it, let me know.
So, if you want to jump straight into the fire without parsing all the
tags of the entire file, this is the regex to use.
It is essentially a cut-down version of the one that parses all tags.
This flavor finds the tag and any attribute=value that you need,
and also finds them out of order.
It can also be used to find multiple out-of-order attr/vals within the same tag.
This is for your usage:
/<Field(?:Array)?(?=(?:[^>"']|"[^"]*"|'[^']*')*?\sname\s*=\s*(?:(['"])\s*document\s*\1))\s+(?:"[\S\s]*?"|'[\S\s]*?'|[^>]*?)+\/>/
Explained/Formatted
 < Field                       # Field or FieldArray tag
 (?: Array )?
 (?=                           # Assertion (a pseudo atomic group)
      (?: [^>"'] | " [^"]* " | ' [^']* ' )*?
      \s name \s* = \s*
      (?:
           ( ['"] )                      # (1), Quote
           \s* document \s*              # With name = "document"
           \1
      )
 )
 \s+
 (?: " [\S\s]*? " | ' [\S\s]*? ' | [^>]*? )+
 />
Running demo: https://regex101.com/r/ieEBj8/1
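As a rough usage sketch, the regex above can be passed to String.prototype.replace. The g flag is an addition (not in the answer) so that every matching tag is removed, and the input and the names `documentTag`, `input` and `output` are hypothetical. The second tag uses single quotes and spaces around =, which this pattern is written to tolerate.

```javascript
// Regex copied from the answer above, with the global flag added so that
// every matching tag is replaced rather than only the first one.
const documentTag = /<Field(?:Array)?(?=(?:[^>"']|"[^"]*"|'[^']*')*?\sname\s*=\s*(?:(['"])\s*document\s*\1))\s+(?:"[\S\s]*?"|'[\S\s]*?'|[^>]*?)+\/>/g;

// Hypothetical sample input, one tag per line.
const input = [
  `<Field name="amount" component={FormField} />`,
  `<Field component={FormField} name = 'document' />`,
  `<FieldArray typeInput="selectAutocomplete" name="document" />`,
  `<Field name="datedeferred" component={FormField} />`,
].join("\n");

// Replace the matching tags with an empty string, as the question asks.
const output = input.replace(documentTag, "");
console.log(output);
```

The line breaks around the removed tags stay in place, so the result contains empty lines where the tags used to be.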