
Regex: Keep everything in <profession></profession> tags

 3 years ago
source link: https://www.codesd.com/item/regex-keep-everything-in-profession-profession-keywords.html


I have a large XML file that looks like this:

<gender>M</gender>
<last-name>*</last-name>
<profession>2165dda2-dc59-41af-acb5-06d8914c4841</profession>
<first-name>*</first-name>
<mail-confirmation>1</mail-confirmation>
<fax-confirmation>1</fax-confirmation>

I only want to keep the <profession> tags. I found a way to match what is inside the tag, like this:

<profession[^>]*>([^<]*?)</profession>

But how do I match everything outside of it? I tried simply flipping the pattern, like:

</profession[^>]*>([^<]*?)<profession>

</profession>([^<]*?)<profession[^>]*>

but that doesn't work.


Strictly speaking, you can't parse XML with a regex.

A quick and dirty solution is to grep for the lines containing "profession", then use sed to replace "<profession>" and "</profession>" with "".
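
The same idea can be sketched in Python rather than grep and sed (a swapped-in language, not what the answer itself used). Instead of matching everything outside the element and deleting it, this keeps only what the original pattern matches. The names PROFESSION_RE, SAMPLE, keep_profession_elements and profession_values are made up for this illustration, and the sketch assumes each element opens and closes without nested tags, as in the sample from the question.

```python
import re

# The pattern from the question: opening tag, text content, closing tag.
# Assumption: the element contains plain text only (no nested tags).
PROFESSION_RE = re.compile(r"<profession[^>]*>([^<]*?)</profession>")

# Sample input copied from the question.
SAMPLE = """\
<gender>M</gender>
<last-name>*</last-name>
<profession>2165dda2-dc59-41af-acb5-06d8914c4841</profession>
<first-name>*</first-name>
<mail-confirmation>1</mail-confirmation>
<fax-confirmation>1</fax-confirmation>
"""


def keep_profession_elements(text: str) -> list:
    """Return each whole <profession>...</profession> element, tags included."""
    return [m.group(0) for m in PROFESSION_RE.finditer(text)]


def profession_values(text: str) -> list:
    """Return only the text inside each element (tags stripped)."""
    return [m.group(1) for m in PROFESSION_RE.finditer(text)]


if __name__ == "__main__":
    print("\n".join(keep_profession_elements(SAMPLE)))
    print("\n".join(profession_values(SAMPLE)))
```

Collecting the matches avoids having to write an "everything except" pattern, which is what the flipped attempts in the question were trying to do. For anything beyond a one-off cleanup, a real XML parser is likely the safer choice.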
