Regex: Keep everything in <profession></profession> tags
source link: https://www.codesd.com/item/regex-keep-everything-in-profession-profession-keywords.html
I have a large XML file that looks like this:
<gender>M</gender>
<last-name>*</last-name>
<profession>2165dda2-dc59-41af-acb5-06d8914c4841</profession>
<first-name>*</first-name>
<mail-confirmation>1</mail-confirmation>
<fax-confirmation>1</fax-confirmation>
I only want to keep the <profession> tags. I found a way to search inside the tag, like this:
<profession[^>]*>([^<]*?)</profession>
but how do I search everything outside of it? I tried to just flip it, like:
</profession[^>]*>([^<]*?)<profession>
</profession>([^<]*?)<profession[^>]*>
but that doesn't work.
Strictly speaking, you can't parse XML with a regex.
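Where a real parser is available, it avoids the inversion problem entirely. Below is a minimal sketch using Python's standard `xml.etree.ElementTree`. It assumes the fragment has no XML declaration and can be wrapped in a single root element; the `<root>` wrapper and the `profession_values` helper are illustrative names, not part of the question.

```python
import xml.etree.ElementTree as ET


def profession_values(fragment):
    # The sample in the question has no single root element, so wrap it in a
    # hypothetical <root> to make it well-formed before parsing.
    root = ET.fromstring("<root>" + fragment + "</root>")
    # iter() walks the tree and yields every <profession> element at any depth.
    return [el.text for el in root.iter("profession")]


fragment = (
    "<gender>M</gender>"
    "<profession>2165dda2-dc59-41af-acb5-06d8914c4841</profession>"
    "<first-name>*</first-name>"
)
print(profession_values(fragment))
```
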
A quick and dirty solution with sed is to select only the lines containing profession, then replace "<profession>" and "</profession>" with "".
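The same quick-and-dirty idea can be sketched in Python: rather than inverting the pattern to match everything outside the tags, keep only the matches and discard the rest. This is a rough sketch, not a robust XML tool. It reuses the pattern from the question and assumes the elements contain no nested markup; the `keep_profession` helper and the `keep_tags` parameter are hypothetical names introduced for illustration.

```python
import re

# Pattern from the question: capture the text between the profession tags.
PROFESSION_RE = re.compile(r"<profession[^>]*>([^<]*?)</profession>")


def keep_profession(xml_text, keep_tags=True):
    """Return only the <profession> elements found in xml_text.

    keep_tags=True returns the whole elements; keep_tags=False returns just
    their text content, which is what stripping the tags would leave behind.
    """
    if keep_tags:
        return [m.group(0) for m in PROFESSION_RE.finditer(xml_text)]
    return [m.group(1) for m in PROFESSION_RE.finditer(xml_text)]


sample = """<gender>M</gender>
<last-name>*</last-name>
<profession>2165dda2-dc59-41af-acb5-06d8914c4841</profession>
<first-name>*</first-name>
<mail-confirmation>1</mail-confirmation>
<fax-confirmation>1</fax-confirmation>"""

print(keep_profession(sample))
print(keep_profession(sample, keep_tags=False))
```

Collecting the matches is usually simpler than writing a pattern for "everything else", because the unwanted text can span many lines and other tags.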