Regex Black Magic... how to stop matching if char?

Jon · 30 March 2007 15:34

I'm trying to translate a strange derivative of xml into valid xml. Here
is an example line:

<SUBEVENTSTATUS 1:2><OPERATIONNAME></OPERATIONNAME>gofast<OPERATIONSTATUS>stopped</OPERATIONSTATUS><TARGETOBJECTNAME>name</TARGETOBJECTNAME><TARGETOBJECTVALUE>val</TARGETOBJECTVALUE></SUBEVENTSTATUS 1:1><SUBEVENTSTATUS 2:2><......and on

REXML pukes on the <SUBEVENTSTATUS 1:2> tag... which it should. There
should be some kind of attribute declaration instead. I want to
translate it to something like this: <SUBEVENTSTATUS no="1" of="2">

I'm trying to make a regex to detect the funny tags. Here is what I have
so far:

xml_fix=/<(\S+)\s+(\d+):(\d+)>/

This is great, but it will match this:

<Request><code_set_list 1:2>

instead of just this:

<code_set_list 1:2>

...because there is no guaranteed whitespace between tags. Basically, I
need to stop matching if a ">" is found. I've never had to deal with
anything quite like this in my regex experience. Any help, or thoughts on
a better way to do things, is much appreciated!
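
The over-match described above can be reproduced with a minimal sketch that uses only the pattern and the sample string from this post. The variable names `input` and `m` are illustrative, not part of the original code.

```ruby
# The original pattern from the post.
xml_fix = /<(\S+)\s+(\d+):(\d+)>/

# Two tags with no whitespace between them, as in the post.
input = "<Request><code_set_list 1:2>"

m = xml_fix.match(input)

# \S+ means "any non-whitespace", and that includes ">" and "<",
# so the first group runs straight through the preceding tag.
puts m[0]  # prints <Request><code_set_list 1:2>
puts m[1]  # prints Request><code_set_list
```

The match starts at the first "<" because nothing in the pattern forbids a ">" inside the tag name.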

--
Posted via http://www.ruby-forum.com/.

Robert_K1 · 30 March 2007 15:40

I can think of several solutions:

/<([^>\s]+)\s+(\d+):(\d+)>/

Or even a two-phase approach: first grab each tag with

/<[^>]+>/

and then test the matched tag with
/(\d+):(\d+)>\z/

HTH

robert
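
Both suggestions above can be tried against the sample string from the question. This is a rough sketch, not a definitive implementation; the variable names (`single`, `tags`, `funny`, `number`, `total`) are made up for illustration.

```ruby
input = "<Request><code_set_list 1:2>"

# Approach 1: the tag name may contain neither ">" nor whitespace,
# so the match can no longer run across a tag boundary.
single = /<([^>\s]+)\s+(\d+):(\d+)>/
m = single.match(input)
name, number, total = m[1], m[2], m[3]
puts name    # prints code_set_list
puts number  # prints 1
puts total   # prints 2

# Approach 2: first split the input into individual tags,
# then check each tag on its own for the "n:m>" ending.
tags  = input.scan(/<[^>]+>/)
funny = tags.select { |tag| tag =~ /(\d+):(\d+)>\z/ }
puts funny.inspect  # prints ["<code_set_list 1:2>"]
```

The two-phase version is longer, but each pattern stays simple and the second one only ever sees a single tag.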

F_Senault · 30 March 2007 15:45

I'd simply use /<[^>]+\s+(\d+):(\d+)>/ (untested, but you get my
drift)...

Fred

--

Microsoft sucks, sucks, sucks.

Which wouldn't be such a bad thing, if it were cuter, didn't use its
teeth at inopportune moments, didn't hog the bed, cooked well, and had
good taste in films. Sadly, that's not the case. (Dan Birchall, SDM)

Brian_Candler · 31 March 2007 08:33

Try (\w+) instead of (\S+)
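
A small sketch of that substitution, using the sample string from the question. The hyphenated tag name in the second check is a hypothetical input, not something from the thread; it is there only to show the trade-off of \w.

```ruby
# Same pattern as in the question, with (\S+) swapped for (\w+).
with_w = /<(\w+)\s+(\d+):(\d+)>/

m = with_w.match("<Request><code_set_list 1:2>")
puts m[0]  # prints <code_set_list 1:2>
puts m[1]  # prints code_set_list

# \w covers letters, digits and underscore only, so it stops at ">"
# but also at "-", ":" or ".". A hypothetical hyphenated name is missed:
puts with_w.match("<code-set-list 1:2>").inspect  # prints nil
```

If the tag names in the real data only ever use word characters, this is the shortest fix; otherwise a negated class such as [^>\s] is more permissive.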

On Sat, Mar 31, 2007 at 12:34:25AM +0900, Jon wrote:

This is great, but it will match this:

<Request><code_set_list 1:2>

instead of just this:

<code_set_list 1:2>

Jon · 30 March 2007 15:43

Robert Klemme wrote:

I can think of several solutions:

/<([^>\s]+)\s+(\d+):(\d+)>/

Or even a two phased approach

/<[^>]+>/

and then with the match
/(\d+):(\d+)>\z/

HTH

robert

Awesome, and thank you! But for my benefit, could you explain why that
works? I thought ^ meant start of line?

--
Posted via http://www.ruby-forum.com/.

Rob_Biedenharn1 · 30 March 2007 16:17

Within a character class (the square brackets), a leading ^ inverts the selection, so [^>] matches any character that's NOT a '>'.
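
The two meanings of ^ can be seen side by side in a short sketch. The sample string `text` is made up for illustration.

```ruby
text = "ab>cd\nef"

# Outside brackets, ^ is an anchor: in Ruby it matches at the start of each line.
line_starts = text.scan(/^\w+/)
puts line_starts.inspect  # prints ["ab", "ef"]

# As the first character inside brackets, ^ negates the class:
# [^>]+ is "one or more characters, none of which is >".
# Note that a negated class also matches newlines.
not_gt = text.scan(/[^>]+/)
puts not_gt.inspect  # prints ["ab", "cd\nef"]
```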

My solution is: .gsub(/<([^>]*?\b\s+)(\d+):(\d+)>/, '<\1no="\2" of="\3">')
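
Applied to a shortened version of the sample line from the question, that gsub call behaves as sketched below. The variable names `line` and `fixed` are illustrative, and the input is trimmed from the original example to keep the output readable.

```ruby
# Shortened from the example line in the question.
line = '<SUBEVENTSTATUS 1:2><OPERATIONSTATUS>stopped</OPERATIONSTATUS></SUBEVENTSTATUS 1:1>'

# \1 is the tag name plus its trailing whitespace, \2 and \3 are the numbers.
# Single quotes keep the backslashes intact for gsub to interpret.
fixed = line.gsub(/<([^>]*?\b\s+)(\d+):(\d+)>/, '<\1no="\2" of="\3">')

puts fixed
# prints <SUBEVENTSTATUS no="1" of="2"><OPERATIONSTATUS>stopped</OPERATIONSTATUS></SUBEVENTSTATUS no="1" of="1">
```

Ordinary tags are left alone because [^>] cannot cross a ">". Note that the closing tag in the sample also carries a "1:1" suffix and is rewritten the same way; XML does not allow attributes on end tags, so that case would likely need separate handling before the result is handed to a parser.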

-Rob

Rob Biedenharn http://agileconsultingllc.com
Rob@AgileConsultingLLC.com
