Can't find appropriate regexp

spamassassin blocked my previous post :-((((
copy below:

what i got:
lines (strings) of the following form

---- start of example-lines ----
foo "foobar";
bar "foo" "bar";
fob "foo \"bar\"";
---- end of example-lines ----

i want a regexp, which returns me the following (applied with String::scan)
from the given lines:

["foobar"]
["foo","bar"]
["foo \"bar\""]

i thought about something like: /"([^"]|\\")+"/

explanation of my idea:

  1. the match should start with a "
  2. then it should continue with one or more (anything which is not " or
    which is \")
  3. the match should end with a "
  4. the 'or \"' thing in 2. is needed because i don't want the match to end
    at a '\"' - it should only end at a single '"' without a '\' in front of it

well - my approach isn’t working. i wonder if i just got a detail wrong or
if i messed up the whole thing and talk nonsense all the way…

thx in advance for tips…

patrick

Take a look at String#scan, and be careful with the grouping.
Here is my suggestion (I am sure someone will have a better solution), but at least it works.

Example:

str = <<EOF
foo "foobar"
bar "foo" "bar";
fob "foo \"bar\"";
EOF

str.each_line{ |line|
p line.scan(/"(?:[^"]|\\")+"/).collect! {|e| e.sub!(/^"(.*)"$/, '\1')}
}

The first regexp avoids the capturing group by using (?: … ).
The sub! strips the opening and closing quotes.

Dave

···

Patrick Zesar jonnypichler@gmx.net wrote:

i thought about something like: /"([^"]|\\")+"/


Do you Yahoo!?
SBC Yahoo! DSL - Now only $29.95 per month!

Oops, sorry, the previous one does not handle the third case.

change:
p line.scan(/"(?:[^"]|\\")+"/).collect! {|e| e.sub!(/^"(.*)"$/, '\1')}

to:
p line.scan(/".+(?!\\)"/).collect! {|e| e.sub!(/^"(.*)"$/, '\1')}

Need an explanation of /".+(?!\\)"/ ?

Now it works for all 3 cases.

Dave

···


spamassassin blocked my previous post :-((((
copy below:

I can see your previous post, so I don't understand??

---- start of example-lines ----
foo "foobar";
bar "foo" "bar";
fob "foo \"bar\"";
---- end of example-lines ----
["foobar"]
["foo","bar"]
["foo \"bar\""]

You want to reject a quote if it is preceded by an odd number of backslashes,
and accept it if it is preceded by zero or an even number of backslashes?

The best I could come up with… it doesn't work :-(

inp = %Q{a "b" ."c." .."d.." ..."e..." ...."f...."}
s = inp.gsub(/\./, "\\")
result = s.scan(/(\\\\)*"(.*?)(\\\\)*"/)
p result
#=> [[nil, "b", nil], [nil, "c\\", nil], ["\\\\", "d", "\\\\"], ["\\\\", "e\\", "\\\\"], ["\\\\", "f", "\\\\"]]

Is there anyone who know how to do the reject thing ?

···

On Mon, 23 Jun 2003 22:31:03 +0200, Patrick Zesar wrote:


Simon Strandgaard

I got it wrong again. After more testing, I came up with this one:

str = <<'EOF'
foo "foobar"
bar "foo" "bar";
fob "foo \"bar\"";
EOF
str.each_line{ |line|
p line.scan(/"(?:[^\\"]|\\.)+"/).collect! {|e| e.sub!(/^"(.*)"$/, '\1')}
}

It seems to me that works.
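
A quick way to check that pattern against all three example lines is sketched below. It is only an illustration: the constant names and the use of `s[1..-2]` to drop the surrounding quotes (instead of `sub!`) are my own choices, not part of the posted code.

```ruby
# Sample input from the original question; single-quoted so the
# backslashes reach the regexp unchanged.
LINES = [
  'foo "foobar";',
  'bar "foo" "bar";',
  'fob "foo \"bar\"";'
]

# A quote, then one or more of: (a char that is neither backslash nor
# quote) or (a backslash followed by any char), then a closing quote.
QUOTED = /"(?:[^\\"]|\\.)+"/

RESULTS = LINES.map do |line|
  # scan returns whole matches because the group is non-capturing;
  # s[1..-2] strips the surrounding quotes.
  line.scan(QUOTED).map { |s| s[1..-2] }
end

RESULTS.each { |r| p r }
```

Each element of `RESULTS` corresponds to one input line, so the third entry should hold a single string that still contains the escaped quotes.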

···

Patrick Zesar jonnypichler@gmx.net wrote:
---- start of example-lines ----
foo "foobar";
bar "foo" "bar";
fob "foo \"bar\"";
---- end of example-lines ----



Patrick,

what i got:
lines (strings) of the following form

---- start of example-lines ----
foo "foobar";
bar "foo" "bar";
fob "foo \"bar\"";
---- end of example-lines ----

i want a regexp, which returns me the following
(applied with String::scan) from the given lines:

["foobar"]
["foo","bar"]
["foo \"bar\""]

i thought about something like: /"([^"]|\\")+"/

You were close, but there are a few problems with your regular
expression:

1) Your parentheses would affect what String#scan returns.  You could
change the opening parenthesis to "(?:" to prevent this.

2) This one is more subtle.  The problem is that the /[^"]/ is going to
match your '\' before the /\\"/ ever sees it.  You could rework your logic
from (anything which is not " or which is \") to ((anything that is not " or
\ ) or is \ followed by anything).  This works out to /([^"\\]|\\.)/.

3) Since you are using scan, everything before the first double-quote is
ignored.  This would include a backslash.  To get around this you would have
to reverse the strings, then scan for /"(?:[^"]|"\\)+"(?!\\)/ (i.e. a
double-quote followed by one or more (non double-quotes or a double-quote
followed by a backslash) followed by a double-quote not followed by a
backslash), then reverse the results.

4) As ahoward pointed out, you will not match the "bar" in
'foo \\"bar"' even though normal quoting rules could be interpreted to mean
that the first backslash cancels the meaning of the second backslash,
leaving the double-quote unescaped.  To get around this you would have to
add tests for an even number of backslashes.
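
Points 1) and 2) can be seen side by side in a small sketch. The two method names are made up for this illustration; the patterns are the original attempt (with a non-capturing group) and the reworked alternation from point 2).

```ruby
# Original idea: [^"] is tried first and consumes the backslash,
# so the match stops at the escaped quote.
def scan_first_try(line)
  line.scan(/"(?:[^"]|\\")+"/)
end

# Reworked alternation: a backslash is only ever consumed together
# with the character that follows it.
def scan_reworked(line)
  line.scan(/"(?:[^"\\]|\\.)+"/)
end

line = 'fob "foo \"bar\"";'
p scan_first_try(line)
p scan_reworked(line)
```

The first call should return a string cut off after the first backslash-quote pair, while the second should return the whole quoted string.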

Putting all of this together would give you:

line.reverse.scan(/"(?:[^"]|"\\(?:\\\\)*(?!\\))+"(?:\\\\)*(?!\\)/).map { |line| line.reverse }

That would work, but it's pretty ugly and not very readable.  However
there is a much simpler method of doing this if you don't care about
problem 4) and there exists a string that your strings are guaranteed not to
contain.  A good candidate would be "\0".  If your strings will never
contain a "\0" (ASCII 0), you could simply replace each '\"' with "\0",
scan on /"[^"]+"/, then change each "\0" back to '\"':

line.gsub(/\\"/,"\0").scan(/"[^"]+"/).map { |line| line.gsub(/\0/,'\\"') }

One last note, you are not going to match the empty string in 'foo ""'.
To match the empty string, change the '+' to '*'.
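
The placeholder approach might be wrapped up roughly like this. The method name `quoted_strings` is hypothetical, the sketch assumes the input never contains an ASCII 0, and it already uses '*' so that empty strings are matched.

```ruby
# Hypothetical helper; assumes the input never contains "\0".
def quoted_strings(line)
  line.gsub('\"', "\0")                    # hide every escaped quote
      .scan(/"[^"]*"/)                     # plain scan; '*' also allows ""
      .map { |s| s.gsub("\0") { '\"' } }   # put the escaped quotes back
end

p quoted_strings('bar "foo" "bar";')
p quoted_strings('fob "foo \"bar\"";')
p quoted_strings('foo ""')
```

The block form of `gsub` is used for the restore step so that the backslash in the replacement is taken literally rather than interpreted as a backreference escape.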

I hope this helps!

- Warren Brown

thank you all so much for replying

it works now AND - maybe even more important - i do understand WHY it
works…

i can’t recall the ‘?:’-thing mentioned in “programming ruby” - this could
be a worthy extension to the book. maybe i just didn’t see it - well…

the whole \-thing is annoying me anyway - maybe the solution of warren is
the best - converting all '\"' to another thing that would never occur under
normal circumstances and then re-substitute it after matching.

finally a word on why i asked this stuff - i wasn’t just bored:
we’re doing a PHP-project in school, it’s now rather big with about 100
PHP-files of which i guess at least 80 have echo-calls in it.
we noticed (much too late), that the standard-echo in PHP does not convert
some characters to their appropriate HTML-whatever-this-is-called (for
instance the euro-sign should be &euro;).
so we wrote a new function called secho, which accepts ONE string as
parameter - not a combination out of strings and varnames (i.e. $var) - and
there, the problem started: how to convert the echo-calls to correct
secho-calls???
i had the questionable honour to work out a solution for this - a good
chance to brush up my ruby-skills.
now i have a script iterating over all PHP-files in our project-directory
(and its subdirectories, hehe…) and changing each echo to a secho - even
the most "complicated" case:
echo "foo" $bar "\"foobar\"";
gets converted to
secho("foo".$bar."\"foobar\"");
(the ‘.’ is the PHP string-concat-operator)
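
A conversion like that could look roughly as follows. This is not patrick's actual script, which was not posted; `TOKEN` and `echo_to_secho` are hypothetical names, and the sketch assumes each echo call sits on one line and consists only of double-quoted strings and simple $variables.

```ruby
# A double-quoted string (with backslash escapes) or a simple $variable.
TOKEN = /"(?:[^\\"]|\\.)*"|\$\w+/

# Rewrites the first `echo ...;` on the line into `secho(...);`,
# joining the arguments with PHP's concatenation operator.
def echo_to_secho(line)
  line.sub(/\becho\s+(.*);/) do
    args = Regexp.last_match(1).scan(TOKEN)
    "secho(#{args.join('.')});"
  end
end

puts echo_to_secho('echo "foo" $bar "\"foobar\"";')
```

Anything between the tokens (here only whitespace) is simply dropped, which is why a real converter would need more care than this.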

thanks again to the people that helped me out with that,
greetings from austria,
patrick

I’m sure this is my misunderstanding, and/or possibly a known thing,
but when I put a ‘gets’ inside a timeout block, it blocks forever.

Is this a known windows issue? (XP, if that matters)

Here’s test code that does what I expect:

···

===============
require 'timeout.rb'

$stdout.sync = true

begin
  timeout(5) {
    1.upto(10) { |x|
      puts x
      sleep 1
    }
  }

rescue TimeoutError
  puts "too much time taken"
  exit 1
end

==================================

It spits out 1 to 5, then the too much time taken. Fine.

When I use THIS however…

====================================
require 'timeout.rb'

ans = ""
$stdout.sync = true

begin
  timeout(5) {
    print "enter name:"
    ans = gets.chomp
  }

rescue TimeoutError
  puts "too much time taken"
  exit 1
end

puts "name is #{ans}"

==================================

it sits at the prompt forever. (Interestingly, if i enter a name and
wait past the timeout period, I DO get the error printed, but I’m
forced to actually enter something. Even MORE interestingly, if I
enter NOTHING and wait, but just hit return (after the timeout), I
get no error.)

Pointers?

Thanks,

Mike



(?!pattern) rejects.

but as to the original problem:

the problem is that odd numbered escapes contain even numbered escapes and
vice versa, so you also need some sort of anchor ([^\\], ^, \A) etc. but that
too becomes problematic… this seems to work because it prevents odd numbered
escapes contained in even numbered escapes from being accepted - in other
words it forces odd numbered escapes to stand alone:

~/eg/ruby > cat quotematch.rb
#!/usr/bin/env ruby
tests = DATA.readlines

re = %r/
["]
(?:
  [^\\](?:[\\][\\])*[\\]["] | # prevent even escaped quotes
                              # from being accepted for containing
                              # an odd escaped quote

  (?:[\\][\\])*[\\]["]      | # accept odd escaped quotes

  (?:[\\])+[^"]?            | # accept any escape sequence which
                              # does not escape a quote

  (?:[^\\"]+)                 # accept anthing neither quote nor escaped
)*

["]
/iomx

tests.map{|t|f=nil;t.scan(re).map{|m| print"[#{m}] ";f=1;};puts if f}

__END__
foo "foobar";
bar "foo" "bar";
a "foo \"bar\"";
b "foo \\"bar\\"";
c "foo \\\"bar\\\"";
d "foo \\\\"bar\\\\"";
e "foo \t" "\t bar"
e "foo \\t" "\\t bar"

~/eg/ruby > ruby quotematch.rb
["foobar"]
["foo"] ["bar"]
["foo \"bar\""]
["foo \\"] [""]
["foo \\\"bar\\\""]
["foo \\\\"] [""]
["foo \t"] ["\t bar"]
["foo \\t"] ["\\t bar"]

-a

···

On Mon, 23 Jun 2003, Simon Strandgaard wrote:

Is there anyone who know how to do the reject thing ?

Ara Howard
NOAA Forecast Systems Laboratory
Information and Technology Services
Data Systems Group
R/FST 325 Broadway
Boulder, CO 80305-3328
Email: ara.t.howard@noaa.gov
Phone: 303-497-7238
Fax: 303-497-7259
~ > ruby -e 'p(%.\x2d\x29..intern)'
====================================

str = <<'EOF'
foo "foobar"
bar "foo" "bar";
fob "foo \"bar\"";
EOF
str.each_line{ |line|
p line.scan(/"(?:[^\\"]|\\.)+"/).collect! {|e| e.sub!(/^"(.*)"$/, '\1')}
}

It seems to me that works.

thanx - that’s perfect

i wasn’t aware of the ‘?:’-thing - i’m still not sure what it does exactly -
but it’s working ;-))

thanx again,
patrick

Is there anyone who know how to do the reject thing ?

(?!pattern) rejects.

Ok… I think I solved it… tell me if it works ?

cat q2.rb
def extract_quotes(str)
  tmp = str.scan(/[^\\](\\\\)*"(.*?[^\\](\\\\)*)"/)
  tmp.collect { |res| res[1] }
end

#s = %Q{a "b" ."c." .."d.." ..."e..." ...."f...."}.gsub(/\./, "\\")
#p extract_quotes(s)

p extract_quotes('foo "foobar";')
p extract_quotes('bar "foo" "bar";')
p extract_quotes('fob "foo \"bar\"";')

ruby q2.rb
["foobar"]
["foo", "bar"]
["foo \\\"bar\\\""]

···

On Mon, 23 Jun 2003 22:25:56 +0000, ahoward wrote:

On Mon, 23 Jun 2003, Simon Strandgaard wrote:


Simon Strandgaard

(?: … ) is “group without capture” – it’s the same grouping effect
as ( … ), but it doesn’t affect $1, $2, etcetera.

Ari
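
The effect on String#scan is easy to see in a short sketch (the constant names are only for illustration): with a capturing group, scan returns the captures instead of the whole match, and a repeated group only keeps its last repetition.

```ruby
CAPTURING     = /"([^"])+"/
NON_CAPTURING = /"(?:[^"])+"/

line = 'bar "foo" "bar";'

# One array of captures per match; the repeated group holds only
# the last character it matched.
p line.scan(CAPTURING)

# No captures, so scan returns the whole matches.
p line.scan(NON_CAPTURING)
```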

···

On Tue, 2003-06-24 at 10:10, Patrick Zesar wrote:


thanx - that’s perfect

i wasn’t aware of the ‘?:’-thing - i’m still not sure what it does exactly -
but it’s working ;-))

You are welcome.

Yesterday I thought I had seen somewhere that
this is a classic question; that's right, it is in the
"Mastering Regular Expressions" book.

After checking, the solution in the book is just like
what I came up with, but the book also
gives another one with better performance:

/"(?:[^\\"]+|\\.)+"/ # did you see the difference? :-)

BTW: if you also want to match an empty string like "" you can
change it to
/"(?:[^\\"]+|\\.)*"/

You can check the book for more information.
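
The difference between the '+' and '*' variants can be checked with a small sketch; the constant names `PLUS` and `STAR` are only for this illustration.

```ruby
PLUS = /"(?:[^\\"]+|\\.)+"/   # needs at least one character inside the quotes
STAR = /"(?:[^\\"]+|\\.)*"/   # also accepts ""

['foo "";', 'fob "foo \"bar\"";'].each do |line|
  # Print the line followed by what each variant finds in it.
  p [line, line.scan(PLUS), line.scan(STAR)]
end
```

Both variants should agree on the second line; only the first line, with its empty string, tells them apart.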

Dave

···


/usr/home/howardat > irb

irb(main):001:0> '"an escaped\" quote"' =~ /"((?:[^\\"]+|\\.)+)"/ && puts("(#{$1})")
(an escaped\" quote)
=> nil

irb(main):002:0> '"an escaped\\\" quote"' =~ /"((?:[^\\"]+|\\.)+)"/ && puts("(#{$1})")
(an escaped\\)
=> nil

this doesn’t seem to work?

-a

···

On Wed, 25 Jun 2003, D T wrote:

/"(?:[^\\"]+|\\.)+"/ # did you see the difference? :-)


It works if you try this:
test.rb # put it in a file and run it with ruby, not irb

str = <<'EOF'
"an escaped\" quote"
"an escaped\\\" quote"
EOF
str.each_line{ |line|
  p line
  p line.scan(/"(?:[^\\"]+|\\.)+"/).collect! {|e| e.sub!(/^"(.*)"$/, '\1')}
}

For your example, I think the problem is in irb:
try this in irb and you will see what I mean.
irb(main):001:0> '\\'.length
1 # see, it is 1 not 2; even a single-quoted string still substitutes \\
irb(main):002:0>

Dave


OK, I checked the pickaxe book to make sure I said the right thing.
Yes, be careful with single-quoted strings:
they still substitute \\ and \',
that is why your example did not work.

Dave
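
The single-quote rule can be checked with a few string lengths. The `SAMPLES` table below is only an illustration; the expected lengths follow from the rule that \\ and \' are the only escapes a single-quoted literal processes.

```ruby
# Each single-quoted literal is paired with the length it should have.
SAMPLES = {
  '\\'   => 1,  # \\ collapses to one backslash
  '\''   => 1,  # \' becomes a plain apostrophe
  '\"'   => 2,  # \" is left alone: backslash plus quote
  '\\\"' => 3   # \\ collapses, \" stays: backslash, backslash, quote
}

SAMPLES.each do |str, expected|
  puts "#{str.inspect} has length #{str.length} (expected #{expected})"
end
```

So the literal in the second irb test contains two real backslashes before the inner quote, which the regexp correctly reads as an escaped backslash followed by a closing quote.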

···


For your example, I think the problem is on irb:

ah - yes it is - not the first time that has happened. good catch!

it does seem to work then, and it's much simpler than mine!

-a
