Regexp Question: Checking for [joe][/joe] pairs

Andrew Johnson wrote:

William James wrote:
[snip]
>
> Rehearsal ------------------------------------------
> regexp   6.870000   0.000000   6.870000 (  7.391000)
> array    2.653000   0.000000   2.653000 (  2.874000)
> --------------------------------- total: 9.523000sec
>
>              user     system      total        real
> regexp   6.940000   0.000000   6.940000 (  7.441000)
> array    2.634000   0.000000   2.634000 (  2.854000)
>
> 10000
> 10000

The regex engine makes a difference in this case -- Ruby 1.8.5 with the
Oniguruma engine gives:

  Rehearsal ------------------------------------------
  regexp   2.740000   0.010000   2.750000 (  2.960385)
  array    3.120000   0.000000   3.120000 (  3.120808)
  --------------------------------- total: 5.870000sec

               user     system      total        real
  regexp   2.750000   0.000000   2.750000 (  2.743031)
  array    3.140000   0.000000   3.140000 (  3.137098)

  10000
  10000

cheers,
andrew

Oh, yeah? Try this on for size, Oni!

require 'benchmark'

$n = 10_000
$strings = [
  %Q{
  good [joe] Wasn't that what [i]he[/i] was seeking? [/joe]
     [joe] Can't you [b]see[/b] that? [/joe]
  Early one June morning in 1872 I murdered my father---an act
  which made a deep impression on me at the time.

  We know better the needs of ourselves than of others. To
  serve oneself is economy of administration.

  self-evident, adj. Evident to one's self and to nobody else.

  senate, n. A body of elderly gentlemen charged with high
  duties and misdemeanors.
  [joe]
  "Thou wretch! -- thou vixen! -- thou shrew!" said I to my
  wife on the morning after our wedding; "thou witch! -- thou
  hag! -- thou whippersnapper -- thou sink of iniquity! --
  thou fiery-faced quintessence of all that is abominable! --
  thou -- thou--" here standing upon tiptoe, seizing her by the
  throat, and placing my mouth close to her ear, I was [/joe]
  preparing to launch forth a new and more decided epithet of
  opprobrium, which should not fail, if ejaculated, to
  convince her of her insignificance, when to my extreme
  horror and astonishment I discovered that I had [i]lost my
  breath[/i].
  },
  "bad was Peck's boy [/joe] [joe] But he'll never know. [/joe]",
  "bad to the bone [joe] Or will he?! [/joe] mish mash mush
   Marching on Tom Tidler's ground fatigues me. [/joe]",
  "bad: too many [joe] [/joe] [joe] [/joe] [joe] [/joe] [joe] [/joe]",
  "bad: too few"
]

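# Count the strings that hold one to three well-formed [joe]...[/joe]
# pairs, using a single regexp with negative lookahead.
def regexp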
  $regexp_good = 0
  $n.times{ $strings.each { |s|
    $regexp_good += 1 if s =~
/\A(((?!\[\/?joe\]).)*(\[joe\]((?!\[\/?joe\]).)+\[\/joe\])){1,3}((?!\[\/?joe\]).)*\Z/m
  } }
end
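# Same count, but scan out the tags and check that there are 2, 4, or 6
# of them and that they strictly alternate [joe], [/joe].
def array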
  $array_good = 0
  $n.times{ $strings.each { |s|
    ary = s.scan( %r{\[/?joe\]} )
    if [2,4,6].include?(ary.size) and
      ary == ary.partition{|t| "[joe]"==t}.inject{|a,b| a.zip(b)}.
        flatten
      $array_good += 1
    end
  } }
end

Benchmark.bmbm do |x|
  x.report("regexp") { regexp }
  x.report("array") { array }
end

puts ; p $regexp_good, $array_good

Rehearsal ------------------------------------------
regexp  29.432000   3.354000  32.786000 ( 36.513000)
array    3.225000   0.000000   3.225000 (  3.485000)
-------------------------------- total: 36.011000sec

             user     system      total        real
regexp  29.382000   3.656000  33.038000 ( 36.793000)
array    3.195000   0.000000   3.195000 (  3.525000)

10000
10000
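
For anyone reading the array version later, the check it performs can be spelled out more plainly. This is only a sketch: joe_pairs_ok? and its max_pairs parameter are names made up here, not part of the benchmark, and it assumes a Ruby recent enough to have each_slice and Integer#odd? without a require (1.8.7 or later). It has not been timed against the two methods above.

```ruby
# Hypothetical helper, not from the benchmark: true when the string holds
# between 1 and max_pairs [joe]...[/joe] pairs and no stray or nested tags.
def joe_pairs_ok?(s, max_pairs = 3)
  # Pull out every opening and closing tag, in order of appearance.
  tags = s.scan(%r{\[/?joe\]})

  # Reject "too few", an odd number of tags, and "too many".
  return false if tags.empty? || tags.size.odd? || tags.size > 2 * max_pairs

  # Walk the tags two at a time; each pair must be an opener then a closer.
  tags.each_slice(2).all? do |opening, closing|
    opening == "[joe]" && closing == "[/joe]"
  end
end

puts joe_pairs_ok?("good [joe] Can't you see that? [/joe]")
puts joe_pairs_ok?("bad was Peck's boy [/joe] [joe] But he'll never know. [/joe]")
```

Like the array method, and unlike the regexp, this accepts an empty pair such as "[joe][/joe]", since it only looks at the order of the tags and not at what sits between them.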