False positives in editing data

Alex_Young · 21 November 2007 18:37

Clifford Heath wrote:

RichardOnRails wrote:

sName =~ /^([\d]+)?\.?([\d]+)?\.?([\d]+)?\.?/

sName

Did you draw attention to this because of the Hungarian notation? If
so, do you think I'm unwise to adopt the style once advocated by
Charles Simonyi, super-programmer and co-founder of a giant software
company?

Yes. Adamantly, and definitely yes.

Interesting article on why (and where it comes from, and why it might not have been such a bad idea) here:

Making Wrong Code Look Wrong – Joel on Software

···

--
Alex

Richard3 · 22 November 2007 01:10

RichardOnRails wrote:
<snip>
>> Those evals are TERRIFYING. Don't use them.

> How else can you loop through $1, $2 ... without repetitive code?

I don't think anyone has replied to this, so...

This is what you've got:

> sName =~ /^([\d]+)?\.?([\d]+)?\.?([\d]+)?\.?/

> # Save match variables in n[1] ... n[MLN+1]
> n = Array.new
> (1..MLN+1).each { |i| eval %{n[i] = $#{i}} }

You can get an equivalent for n with the String#match method:

irb(main):001:0> name = "2.2.02Topic 2.2.2"
=> "2.2.02Topic 2.2.2"
irb(main):002:0> m = name.match( /^([\d]+)?\.?([\d]+)?\.?([\d]+)?\.?/ )
=> #<MatchData:0x3a3a324>
irb(main):003:0> m[1]
=> "2"
irb(main):004:0> m[2]
=> "2"
irb(main):005:0> m[3]
=> "02"

You should be able to use that to get around your second eval, too.
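To spell that out, here is a minimal sketch of replacing the eval loop with MatchData#captures, which hands back the numbered groups as an array. The method name section_numbers is made up for illustration, and note that the array is zero-based, whereas the original n[1] ... n[MLN+1] was one-based.

```ruby
# Hypothetical helper (not from the original program): collect the
# numbered groups without eval, using MatchData#captures.
def section_numbers(name)
  m = name.match(/^([\d]+)?\.?([\d]+)?\.?([\d]+)?\.?/)
  # captures returns [$1, $2, $3]; groups that took no part in the match are nil
  m ? m.captures : []
end

p section_numbers("2.2.02Topic 2.2.2")   # => ["2", "2", "02"]
p section_numbers("Topic only")          # => [nil, nil, nil]
```

Since every group in the pattern is optional, the match itself always succeeds; the nil entries are how you tell that a level is missing.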

--
Alex

Hi Alex,

m = name.match( /^([\d]+)?\.?([\d]+)?\.?([\d]+)?\.?/ )

Thanks.

Best wishes,
Richard

···

On Nov 21, 1:53 pm, Alex Young <a...@blackkettle.org> wrote:

> On Nov 19, 2:23 am, Ryan Davis <ryand-r...@zenspider.com> wrote:

Richard3 · 22 November 2007 01:30

RichardOnRails wrote:
>>>> sName =~ /^([\d]+)?\.?([\d]+)?\.?([\d]+)?\.?/
>> sName

> Did you draw attention to this because of the Hungarian notation? If
> so, do you think I'm unwise to adopt the style once advocated by
> Charles Simonyi, super-programmer and co-founder of a giant software
> company?

Yes. Adamantly, and definitely yes.

It might have been a bad idea when he had it, even though to his
credit he was trying to make the best of a bad situation, where
MS had bought the worst C compiler on the planet because the good
ones weren't for sale - they could make more money *not* selling
to MS. Because of a spate of bugs and bad code churned out by the
MS software factory, many caused by type mismatches on function
parameters that weren't detected either at compile time or at
runtime, Hungarian notation *might* have been a good idea once.

It's definitely *not* a good idea with modern C, and even less of
a good idea with Ruby.

Clifford Heath.

Hi Clifford,

It's definitely *not* a good idea with modern C, and even less of
a good idea with Ruby.

I don't know anything about Microsoft's choice of compilers. But I
used several C compilers in the '80s, and in all cases found Hungarian
notation helpful. I don't think Microsoft's initial choice of
compilers is relevant to my and others' successful employment of that
convention.

Why would it be a bad idea with modern C compilers or with Ruby? You
offer no reason. All it does is add one or two letters before names!
That doesn't bother any human or compiler or interpreter. To my
ears, it sounds like you're telling me you like chocolate after
learning that I like vanilla.

I'm reminded of George Wallace's assessment of the difference between
Democrats and Republicans: "There's not a dime's worth of difference."

Best wishes,
Richard

···

On Nov 21, 12:39 pm, Clifford Heath <n...@spam.please.net> wrote:

Richard3 · 26 November 2007 20:25

Hi Rick,

I loved your blog. Thanks for posting it and informing me about it.

I think my usage of "Hungarian" is consistent with Simonyi's intent, at
least how I understand it. In any case, I find my usage helpful, as
I mentioned to Alex Young on this thread, though I may have to
sanitize future posts to avoid people who don't respond to what I
mean but instead waste time on how I express my question.

Best wishes,
Richard

···

On Nov 21, 8:44 pm, Rick DeNatale <rick.denat...@gmail.com> wrote:

On Nov 20, 2007 9:35 PM, RichardOnRails <RichardDummyMailbox58...@uscomputergurus.com> wrote:

> On Nov 19, 2:23 am, Ryan Davis <ryand-r...@zenspider.com> wrote:

> > >> sName =~ /^([\d]+)?\.?([\d]+)?\.?([\d]+)?\.?/

> > sName

> Did you draw attention to this because of the Hungarian notation? If
> so, do you think I'm unwise to adopt the style once advocated by
> Charles Simonyi, super-programmer and co-founder of a giant software
> company?

Actually this form of Hungarian notation, which was called System
Hungarian in Microsoft, is NOT what Simonyi originally suggested (and
what was used in the Application Division).

http://talklikeaduck.denhaven2.com/articles/2007/04/09/hungarian-ducks

--
Rick DeNatale

My blog on Ruby: http://talklikeaduck.denhaven2.com/

Raul_Parolari · 27 November 2007 01:36

RichardOnRails wrote:

I forgot to tell you that I finally understand your second example.

md = s.match( /^ (.*) [a-zA-Z] /x )
md[1] # => "2.1Topi"

Without the question mark, in principle, the ".*" initially consumes
all the characters, but then it sees the match fails, because there's
no match for the "[a-zA-Z]". So the ".*" sort of "backs off" and
satisfies itself with "2.1Topi", leaving the "c" to satisfy
"[a-zA-Z]".

Very good, Richard!
It is a question on when one is 'content'; if you need a metaphor to
remember it, think of a WallStreet banker (.*, .+) vs a Franciscan monk
(.*?, .+?)
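For anyone following along, a small sketch of the banker and the monk side by side, using the same string as the example quoted above. The constant names GREEDY and LAZY are just labels chosen for this illustration.

```ruby
s = "2.1Topic 2.1"

GREEDY = /^ (.*)  [a-zA-Z] /x   # takes everything, then gives back until a letter fits
LAZY   = /^ (.*?) [a-zA-Z] /x   # takes nothing, then grows until a letter fits

p s.match(GREEDY)[1]   # => "2.1Topi"  (the last letter, "c", is left for [a-zA-Z])
p s.match(LAZY)[1]     # => "2.1"      (the first letter, "T", is left for [a-zA-Z])
```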

The one I settled on is:

s="2.1Topic 2.1"
md = s.match( /^ ([\.\d]*) [^\.\d] /x )
#md[0]=2.1T
#md[1]=2.1

I see that you have solved the problem in your previous post (that I
could not reply to), when you wrote (removing all other code):

s = "2.002.1Topic 2.2.1"

s =~ /^ (\d+[.]?)+ [^\.\d] /x

I must confess: I was stunned myself that it did not work. Foolish of
us: in fact it was working, but you failed to collect the bounty! You
needed parentheses to include the '+'!

s =~ /^ ((\d+[.]?)+) [^\.\d] /x

p $1, $2 # => "2.002.1", "1"

However it is better to avoid collecting also the inner results as they
overwrite each other in $2 and then confuse us (that's the reason that
you saw the last digit captured above..); so let's use the '?:' trick,
to avoid writing in $2, where instead we will capture the 'non
digits/dots' that come after:

s =~ /^ ((?:\d+[.]?)*) ([^\.\d]+) /x

p $1, $2 # => "2.002.1", "Topic "
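Put together as a runnable sketch (the constant names NESTED and NON_CAPTURING are only labels for this illustration), the two variants behave like this:

```ruby
s = "2.002.1Topic 2.2.1"

# Inner group is capturing: each repetition overwrites group 2,
# so only the last repetition ("1") survives.
NESTED = /^ ((\d+[.]?)+) [^\.\d] /x
p s.match(NESTED).captures          # => ["2.002.1", "1"]

# Inner group is non-capturing (?: ... ), which frees group 2
# for the text that follows the numeric prefix.
NON_CAPTURING = /^ ((?:\d+[.]?)*) ([^\.\d]+) /x
p s.match(NON_CAPTURING).captures   # => ["2.002.1", "Topic "]
```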

Do you see it? I think you do. Now, to finish, let's examine how you
solved the problem in this post:

s="2.1Topic 2.1"
md = s.match( /^ ([.\d]*) [^\.\d] /x )

Ah, you resorted to 'pragmatism'.. you said: "the bloody '(\d+[.]?)+'
does not work, so I will change it". This was ok, but do you see the
difference between:

((?:\d+[.]?)*) # I changed + -> * to compare

([\d[.]]*)

aside from the fact that the second one is easier to read? (you may want
to stop reading and think about this, as this is your test to graduate
from "intermediate level regexp")

Ok: if they could speak, they would say respectively:
1) I want 0 or more sequences of (digits followed optionally by a dot)
2) I want 0 or more combinations of digits and dots as they come

Do you see?
both would match: "2.002.1" but the 2nd would also match "...1..37"!
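A quick way to check this claim is sketched here. One assumption on my side: I anchor both patterns with \A and \z so that the question is whether the whole string fits. Without anchors the first pattern would still "match" "...1..37", but only with an empty match at position 0. The character class is written as [.\d], as in Richard's version.

```ruby
SEQUENCES = /\A (?:\d+[.]?)* \z/x   # digits, each run optionally followed by one dot
ANY_MIX   = /\A [.\d]* \z/x         # any jumble of digits and dots

["2.002.1", "...1..37"].each do |text|
  fits_sequences = !!(text =~ SEQUENCES)
  fits_any_mix   = !!(text =~ ANY_MIX)
  puts "#{text.inspect}: sequences=#{fits_sequences} any_mix=#{fits_any_mix}"
end
# "2.002.1": sequences=true any_mix=true
# "...1..37": sequences=false any_mix=true
```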

The last question you had was: how do I pick up the digits once I
collected the "2.002.1"? Study scan in Pickaxe and then do:

str = "2.002.1"

str.scan(/ (\d+) /x) # => [["2"], ["002"], ["1"]]
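A short sketch of how scan's return shape depends on whether the pattern has groups; the helper name digit_runs is made up for this illustration.

```ruby
# With a group, scan returns one array per match (hence the nesting);
# without groups it returns the matched strings directly.
def digit_runs(str)
  str.scan(/\d+/)
end

str = "2.002.1"
p str.scan(/(\d+)/)                     # => [["2"], ["002"], ["1"]]
p digit_runs(str)                       # => ["2", "002", "1"]
p digit_runs(str).map { |d| d.to_i }    # => [2, 2, 1]
```

So you can either call flatten on the grouped result or drop the parentheses when the whole match is what you want.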

All right, let's call it a Regexp day,

Raul

···

--
Posted via http://www.ruby-forum.com/.

Jordan_Callicoat · 27 November 2007 02:15

Hi Richard,

Here's a cheat-sheet for ruby regular expression syntax:

http://www.geocities.jp/kosako3/oniguruma/doc/RE.txt

Regards,
Jordan

Alex_Young · 22 November 2007 01:42

RichardOnRails wrote:

RichardOnRails wrote:

sName =~ /^([\d]+)?\.?([\d]+)?\.?([\d]+)?\.?/

sName

Did you draw attention to this because of the Hungarian notation? If
so, do you think I'm unwise to adopt the style once advocated by
Charles Simonyi, super-programmer and co-founder of a giant software
company?

Yes. Adamantly, and definitely yes.

It might have been a bad idea when he had it, even though to his
credit he was trying to make the best of a bad situation, where
MS had bought the worst C compiler on the planet because the good
ones weren't for sale - they could make more money *not* selling
to MS. Because of a spate of bugs and bad code churned out by the
MS software factory, many caused by type mismatches on function
parameters that weren't detected either at compile time or at
runtime, Hungarian notation *might* have been a good idea once.

It's definitely *not* a good idea with modern C, and even less of
a good idea with Ruby.

Clifford Heath.

Hi Clifford,

It's definitely *not* a good idea with modern C, and even less of
a good idea with Ruby.

I don't know anything about Microsoft's choice of compilers. But I
used several C compilers in the '80s, and in all cases found Hungarian
notation helpful. I don't think Microsoft's initial choice of
compilers is relevant to my and others' successful employment of that
convention.

Why would it be a bad idea with modern C compilers or with Ruby? You
offer no reason. All it does is add one or two letters before names!
That doesn't bother any human or compiler or interpreter.

It depends on the type of Hungarian that you're using, and it's not clear from your code sample which it is. If you're using an abbreviation prefix to denote a semantic difference within a type, then that's (potentially) useful, in both Ruby and C:

us_username = read_unsafe_input()
s_username = sanitise(us_username)

with us_ meaning unsafe and s_ meaning safe, for example. Not something I'd use myself, but I can see the utility. If it's denoting a class, then it's not something I can see as useful, in either C or Ruby. In C, you're duplicating the compiler's type-checking, and in Ruby, duck-typing means that you shouldn't need to care; it becomes readability-damaging line noise.
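To make the two-line example runnable, here is a rough sketch. Both read_unsafe_input and sanitise are hypothetical stand-ins (the real ones would depend on your application), and the whitelist rule is only an assumption chosen to keep the example short.

```ruby
# Hypothetical stand-in: pretend this came from a form or a socket.
def read_unsafe_input
  "richard<script>"
end

# Hypothetical stand-in: keep only letters, digits and underscores.
def sanitise(text)
  text.gsub(/[^A-Za-z0-9_]/, "")
end

us_username = read_unsafe_input       # us_ = unsafe string
s_username  = sanitise(us_username)   # s_  = safe string
puts s_username                       # => richardscript
```

Both variables are plain Strings, so the prefix carries information that the class does not. That is the distinction between the two kinds of Hungarian.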

···

On Nov 21, 12:39 pm, Clifford Heath <n...@spam.please.net> wrote:

--
Alex

Todd_Benson · 22 November 2007 02:44

Harry is quoting from the movie Airplane (1980)

Todd

···

On Nov 20, 2007 7:24 PM, RichardOnRails <RichardDummyMailbox58407@uscomputergurus.com> wrote:

On Nov 20, 10:22 am, Harry Kakueki <list.p...@gmail.com> wrote:
> By the way, stop calling me Shirley.
> And stop calling me Ryan.
>
> Good luck,
>
> Harry
Hi Harry,

I apologize for the "Ryan" thing. Ryan was the first response on this
thread and got mixed up. I don't know about the "Shirley" thing.

Richard3 · 27 November 2007 03:15

Hi Richard,

Here's a cheat-sheet for ruby regular expression syntax:

http://www.geocities.jp/kosako3/oniguruma/doc/RE.txt

Regards,
Jordan

Hi Jordan,

http://www.geocities.jp/kosako3/oniguruma/doc/RE.txt

Thanks. I'm running ruby 1.8.2 (2004-12-25) [i386-mswin32]. How can
I tell if it uses Oniguruma RE ver.5.6.0?

"gem list oniguruma -b" gave me:
*** LOCAL GEMS ***
[nothing]
*** REMOTE GEMS ***
oniguruma (1.1.0, 1.0.1, 1.0.0, 0.9.1, 0.9.0)

Judging by this result, I'd say the "5.6.0" is the version of the
Cheat Sheet itself; and that I don't have Oniguruma installed.

I've got The Ruby Way, ver. 2, that covers Oniguruma. But I'm fairly
new to Ruby, so I wonder whether stepping up to Oniguruma is
prudent.

Any ideas?

Regards,
Richard

···

On Nov 26, 9:14 pm, MonkeeSage <MonkeeS...@gmail.com> wrote:

Richard3 · 28 November 2007 01:30

RichardOnRails wrote:

> I forgot to tell you that I finally understand your second example.

>> md = s.match( /^ (.*) [a-zA-Z] /x )
>> md[1] # => "2.1Topi"

> Without the question mark, in principle, the ".*" initially consumes
> all the characters, but then it sees the match fails, because there's
> no match for the "[a-zA-Z]". So the ".*" sort of "backs off" and
> satisfies itself with "2.1Topi", leaving the "c" to satisfy
> "[a-zA-Z]".

Very good, Richard!
It is a question on when one is 'content'; if you need a metaphor to
remember it, think of a WallStreet banker (.*, .+) vs a Franciscan monk
(.*?, .+?)

> The one I settled on is:

> s="2.1Topic 2.1"
> md = s.match( /^ ([\.\d]*) [^\.\d] /x )
> #md[0]=2.1T
> #md[1]=2.1

I see that you have solved the problem in your previous post (that I
could not reply to), when you wrote (removing all other code):

s = "2.002.1Topic 2.2.1"

s =~ /^ (\d+[.]?)+ [^\.\d] /x

I must confess: I was stunned myself that it did not work. Foolish of
us: in fact it was working, but you failed to collect the bounty! You
needed parentheses to include the '+'!

s =~ /^ ((\d+[.]?)+) [^\.\d] /x

p $1, $2 # => "2.002.1", "1"

However it is better to avoid collecting also the inner results as they
overwrite each other in $2 and then confuse us (that's the reason that
you saw the last digit captured above..); so let's use the '?:' trick,
to avoid writing in $2, where instead we will capture the 'non
digits/dots' that come after:

s =~ /^ ((?:\d+[.]?)*) ([^\.\d]+) /x

p $1, $2 # => "2.002.1", "Topic "

Do you see it? I think you do. Now, to finish, let's examine how you
solved the problem in this post:

> s="2.1Topic 2.1"
> md = s.match( /^ ([.\d]*) [^\.\d] /x )

Ah, you resorted to 'pragmatism'.. you said: "the bloody '(\d+[.]?)+'
does not work, so I will change it". This was ok, but do you see the
difference between:

((?:\d+[.]?)*) # I changed + -> * to compare

([\d[.]]*)

aside from the fact that the second one is easier to read? (you may want
to stop reading and think about this, as this is your test to graduate
from "intermediate level regexp")

Ok: if they could speak, they would say respectively:
1) I want 0 or more sequences of (digits followed optionally by a dot)
2) I want 0 or more combinations of digits and dots as they come

Do you see?
both would match: "2.002.1" but the 2nd would also match "...1..37"!

The last question you had was: how do I pick up the digits once I
collected the "2.002.1"? Study scan in Pickaxe and then do:

str = "2.002.1"

str.scan(/ (\d+) /x) # => [["2"], ["002"], ["1"]]

All right, let's call it a Regexp day,

Raul

--
Posted via http://www.ruby-forum.com/.

Hi Raul,

Thank you for your further support of my obstinacy. Your help has
guided me to the solution I wanted. Your original one is succinct,
perhaps even elegant in that it decomposes the problem into two sub-
problems which admit of essentially one-line solutions. While I truly
appreciate that approach, I wanted to find a "natural" solution,
which is the one included below. It has one caveat: it's aimed at
processing files of only a few megabytes. That said, I'd be pleased
to hear of any downsides you may foresee.

> I forgot to tell you that I finally understand your second example.
[snip]
Very good, Richard!
It is a question on when one is 'content'; if you need a metaphor ...

Thanks. I've got that stuff wired into my brain now.

I must confess: I was stunned myself that it did not work. Foolish of
us: in fact it was working, but you failed to collect the bounty! You
needed parentheses to include the '+'!

That approach is old news, now that I've conceived of my "natural"
approach

However it is better to avoid collecting also the inner results as they
overwrite each other in $2 and then confuse us

Understood! As you'll see, I avoided that pitfall below.

[snip]

Do you see it? I think you do.

Quite so.

[snip]
This was ok, but do you see the

difference between:

((?:\d+[.]?)*) # I changed + -> * to compare

([\d[.]]*)

[snip]

Do you see?

For sure!

All right, let's call it a Regexp day,

I'll drink to that!

With Thanks and Best Wishes, I remain
Yours truly,
Richard

# "Natural" Solution
input = <<DATA
05Topic 05
1.0Topic 1.0
2.002.1Topic 2.2.1
3.15.26.37Topic 3.15.26.37
DATA

MaxDepth = 5
sRE = "^"
(1..MaxDepth).each { |i|
sRE << ' (\d*)(?:\.?)'
}
sRE += ' ([^\.\d].*)'
re = Regexp.new(sRE, Regexp::EXTENDED)

input.each_line { |line|   # each_line also works on Ruby 1.9+, where String#each is gone
  puts '='*10
  puts line
  puts '='*10

  # puts re.to_s # Debug
  md = line.match( re )
  (0..MaxDepth+1).each { |i|
    puts "md[#{i}] = " + md[i] if md[i]
  }
  puts
}

···

On Nov 26, 8:36 pm, Raul Parolari <raulparol...@gmail.com> wrote:

Richard3 · 26 November 2007 19:55

RichardOnRails wrote:
>> RichardOnRails wrote:
>>>>>> sName =~ /^([\d]+)?\.?([\d]+)?\.?([\d]+)?\.?/
>>>> sName
>>> Did you draw attention to this because of the Hungarian notation? If
>>> so, do you think I'm unwise to adopt the style once advocated by
>>> Charles Simonyi, super-programmer and co-founder of a giant software
>>> company?
>> Yes. Adamantly, and definitely yes.

>> It might have been a bad idea when he had it, even though to his
>> credit he was trying to make the best of a bad situation, where
>> MS had bought the worst C compiler on the planet because the good
>> ones weren't for sale - they could make more money *not* selling
>> to MS. Because of a spate of bugs and bad code churned out by the
>> MS software factory, many caused by type mismatches on function
>> parameters that weren't detected either at compile time or at
>> runtime, Hungarian notation *might* have been a good idea once.

>> It's definitely *not* a good idea with modern C, and even less of
>> a good idea with Ruby.

>> Clifford Heath.

> Hi Clifford,

>> It's definitely *not* a good idea with modern C, and even less of
>> a good idea with Ruby.

> I don't know anything about Microsoft's choice of compilers. But I
> used several C compilers in the '80s, and in all cases found Hungarian
> notation helpful. I don't think Microsoft's initial choice of
> compilers is relevant to my and others' successful employment of that
> convention.

> Why would it be a bad idea with modern C compilers or with Ruby? You
> offer no reason. All it does is add one or two letters before names!
> That doesn't bother any human or compiler or interpreter.

It depends on the type of Hungarian that you're using, and it's not
clear from your code sample which it is. If you're using an
abbreviation prefix to denote a semantic difference within a type, then
that's (potentially) useful, in both Ruby and C:

us_username = read_unsafe_input()
s_username = sanitise(us_username)

with us_ meaning unsafe and s_ meaning safe, for example. Not something
I'd use myself, but I can see the utility. If it's denoting a class,
then it's not something I can see as useful, in either C or Ruby. In C,
you're duplicating the compiler's type-checking, and in Ruby,
duck-typing means that you shouldn't need to care; it becomes
readability-damaging line noise.

--
Alex

Hi Alex,

abbreviation prefix to denote a semantic difference within a
type, then that's (potentially) useful, in both Ruby and C:
us_username = read_unsafe_input()
s_username = sanitise(us_username)

I like that.

If it's denoting a class,
then it's not something I can see as useful

I like to know merely by inspection whether a referent denotes an
integer, a string, a hash or an array of such things. I'd like to
avoid "Syntax error" simply because I failed to include a to_s, to_i,
or whatever. I really can't see why a prefixed lower-case letter
or two before a camel-case object name can create so much discussion
irrelevant to the question at hand.

Maybe I'll have to "sanitize" all the code I post to exemplify a
coding issue.

Thank you for your response, notwithstanding my lack of total
agreement.

Best wishes,
Richard

···

On Nov 21, 8:42 pm, Alex Young <a...@blackkettle.org> wrote:

> On Nov 21, 12:39 pm, Clifford Heath <n...@spam.please.net> wrote:

Richard3 · 26 November 2007 20:35

Thanks, Todd. That was over my head. "Airplane" was not a movie I'd
run to see.

Regards,
Richard

···

On Nov 21, 9:44 pm, Todd Benson <caduce...@gmail.com> wrote:

On Nov 20, 2007 7:24 PM, RichardOnRails <RichardDummyMailbox58...@uscomputergurus.com> wrote:

> On Nov 20, 10:22 am, Harry Kakueki <list.p...@gmail.com> wrote:
> > By the way, stop calling me Shirley.
> > And stop calling me Ryan.

> > Good luck,

> > Harry
> Hi Harry,

> I apologize for the "Ryan" thing. Ryan was the first response on this
> thread and got mixed up. I don't know about the "Shirley" thing.

Harry is quoting from the movie Airplane (1980)

Todd

Jordan_Callicoat · 27 November 2007 03:35

On Nov 26, 9:13 pm, RichardOnRails

Thanks. I'm running ruby 1.8.2 (2004-12-25) [i386-mswin32]. How can
I tell if it uses Oniguruma RE ver.5.6.0?

You don't actually need oniguruma, it's the same syntax as class
Regexp in ruby 1.8 (well, a few things don't work, but 99% does).

Regards,
Jordan

Raul_Parolari · 28 November 2007 07:25

RichardOnRails wrote:

Thank you for your further support of my obstinacy. Your help has
guided me to the solution I wanted. Your original one is succinct,
perhaps even elegant in that it decomposes the problem into two sub-
problems which admit of essentially one-line solutions. While I truly
appreciate that approach, I wanted to find a "natural" solution,
which is the one included below.

Hi, Richard

you mention your 'obstinacy'... and indeed, you have found a way to
implement with Regexps your original design (you did not fool me! :-);
great ingenuity.

However, as much as I am stunned by your progress (your original program
at the top of this post and this one seem like Dante going from Inferno
to Paradiso), I must be frank.

I do not like building arrays to meet some 'maximum threshold', leaving
portions of them empty; it just does not make me 'happy' (in the
Matz/Ruby sense, do you understand?). Of course, it is just an array of
5 positions, but it does not matter; it is just ecologically wrong for
me.

I however understand your feeling towards my solution (and the contrast
you create with your 'natural one'); got it! but then I invite you to
explore scan with \G; look at this, that may be doing something more
'natural':

# \G 'anchors' start of next search to end of previous one
re_prefix = %r/\G (\d+ [.]?) /x

input.each_line { |line| p line.scan(re_prefix).flatten }

Output:
["05"]
["1", "0"]
["2", "002", "1"]
["3", "15", "26", "37"]

Small (1 line!), fast, precise: a beauty.
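To see what the \G buys you, here is a small comparison on one of the sample lines. It is only a sketch: it uses the form of the pattern with the group around the digits alone, and the constant names are made up for the illustration.

```ruby
LINE = "2.002.1Topic 2.2.1"

UNANCHORED = /(\d+) [.]?/x      # scan keeps hunting through the whole line
ANCHORED   = /\G (\d+) [.]?/x   # each match must start where the last one ended

p LINE.scan(UNANCHORED).flatten   # => ["2", "002", "1", "2", "2", "1"]
p LINE.scan(ANCHORED).flatten     # => ["2", "002", "1"]
```

Without \G the numbers after "Topic" are swept up as well; with it, the scan stops as soon as the unbroken run of prefix numbers ends.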

It is not the complete solution as '\G scan' in Ruby does not allow you
to change the regexp without interrupting the job (there are sad
workarounds for it), so the job is not complete. The library
StringScanner ('strscan') is of interest, as it solves this problem
nicely (and is in C, so is fast).
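A minimal sketch of what that could look like with StringScanner follows. The method name split_heading and the choice to return the remainder of the line as the title are assumptions for this illustration, not part of the original program.

```ruby
require "strscan"

# Hypothetical helper: peel off the leading numbers one at a time,
# then take whatever is left of the line as the title.
def split_heading(line)
  scanner = StringScanner.new(line)
  numbers = []
  # scan only matches at the current position, much like \G
  numbers << scanner[1] while scanner.scan(/(\d+)\.?/)
  [numbers, scanner.rest.chomp]
end

p split_heading("2.002.1Topic 2.2.1\n")   # => [["2", "002", "1"], "Topic 2.2.1"]
p split_heading("05Topic 05\n")           # => [["05"], "Topic 05"]
```

Because the scanner remembers its position, a different pattern can be applied at each step, which is the part that a single scan call cannot do.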

Perhaps, examine \G and/or StringScanner, and see if you can find a
solution that meets what you are looking for, without (as seen from me)
imperfections.

Congrats for your progress (in 1 week!)

Raul

  Some people, when confronted with a problem, think:
  "I know, I'll use regular expressions".
  Now they have two problems.

Jamie Zawinski

:-)

···

--
Posted via http://www.ruby-forum.com/.

Alex_Young · 26 November 2007 20:56

RichardOnRails wrote:
<snip>

Hi Alex,

abbreviation prefix to denote a semantic difference within a
type, then that's (potentially) useful, in both Ruby and C:
us_username = read_unsafe_input()
s_username = sanitise(us_username)

I like that.

If it's denoting a class,
then it's not something I can see as useful

I like to know merely by inspection whether a referent denotes an
integer, a string, a hash or an array of such things. I'd like to
avoid "Syntax error" simply because I failed to include a to_s, to_i,
or whatever.

You'll tend to find that types are much less relevant in Ruby than in C. The actual class of an object is much less important than the methods it responds to, and you won't get a syntax error unless the syntax is actually wrong; this won't be a problem with variables because of the lack of compile-time type checking. I've tried the method you're espousing myself, and it didn't actually help me at all. I found it just wasn't worth the effort. However, I'm more than willing to accept that it's a difference between your coding style and mine rather than any fundamental problem with the concept that made the difference.
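To illustrate the point about syntax errors, a small sketch (join_label is a made-up method): a missing to_s parses without complaint and only fails when the line actually runs, as a TypeError rather than a syntax error.

```ruby
# Hypothetical method: the parser accepts this whatever value turns out to be.
def join_label(label, value)
  label + value
end

begin
  join_label("count: ", 5)           # String + Integer
rescue TypeError => e
  puts "raised at run time: #{e.class}"   # => raised at run time: TypeError
end

puts join_label("count: ", 5.to_s)   # => count: 5
```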

In terms of posting code here, the most important thing is to make it readable. Most people won't know what your Hungarian prefixes mean, so they're just line noise to them.

···

--
Alex

Raul_Parolari · 26 November 2007 21:23

RichardOnRails wrote:

I like to know merely by inspection whether a referent denotes an
integer, a string, a hash or an array of such things. I'd like to
avoid "Syntax error" simply because I failed to include a to_s, to_i,
or whatever. I really can't see why a prefixed lower-case letter
or two before a camel-case object name can create so much discussion
irrelevant to the question at hand.

Richard,

I always found fascinating the issue of 'how we name things', and not
only for philosophical reasons; personally I think that the horrendous
amount of time spent to-day in what is called "Testing Dept" is due in
part to problems like that.

A bad and careless Naming methodology (when there is one!) leads
(especially in a project where people share code) to subtle errors,
flawed assumptions, and ultimately to errors (unfortunately, when there
is somebody put in charge of the 'naming standards', he is often very
politically correct, but not the brightest guy around, and the result is
even worse than 'no standards').

One person who wrote something intelligent about this subject is Damian
Conway in his book "Perl Best Practices" (ok, I will get the usual
parochial boos for naming that language, but ok, life continues),
specifically chapter 3 "Naming Conventions".

Even if the examples are in Perl syntax, the substance goes beyond. He
suggests something different than your approach; the name should
indicate not so much the class/type, but the MEANING of the data
structure; for example (of course I will not use Perl syntax, and avoid
the examples that make sense for Perl only):

# scalars
running_total games_count # (rather than 'total','count')

# booleans
is_valid has_end_tag loading_finished

# arrays
events handlers unknowns
# the iteration var
event handler unknown

# hashes
title_of count_for sales_from isbn_from

He even discusses the role of 'nouns' and 'adjectives' in names.. (a
delight to read!).

The emphasis is in the MEANING of the data, not on the 'class/type': do
you see? but the objective is similar to yours: grant somebody looking
at code of somebody else (translated: ourselves 6 months later!) at
least a hope to vaguely understand what is going on!
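As a rough Ruby rendering of that style: the names follow the patterns listed above, while the data and the tally method are invented purely for illustration.

```ruby
# Hypothetical data; only the naming style is the point here.
def tally(events)
  count_for = Hash.new(0)      # hash: reads naturally as count_for[event]
  events.each do |event|       # plural for the array, singular for the iteration var
    count_for[event] += 1
  end
  count_for
end

events = ["open", "read", "read", "close"]
count_for = tally(events)
has_matching_close = count_for["open"] == count_for["close"]      # boolean reads as a question
running_total = count_for.values.inject(0) { |sum, n| sum + n }   # scalar says what it totals

puts "read events: #{count_for["read"]}"   # => read events: 2
puts "balanced: #{has_matching_close}"     # => balanced: true
puts "total: #{running_total}"             # => total: 4
```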

You may want to glance at it, if you find that approach of interest.

Raul

Names are but noise and smoke,
obscuring heavenly light

Johann Wolfgang von Goethe, "Faust: Part I"

···

--
Posted via http://www.ruby-forum.com/.

Richard3 · 28 November 2007 01:00

On Nov 26, 9:13 pm, RichardOnRails

> Thanks. I'm running ruby 1.8.2 (2004-12-25) [i386-mswin32]. How can
> I tell if it uses Oniguruma RE ver.5.6.0?

You don't actually need oniguruma, it's the same syntax as class
Regexp in ruby 1.8 (well, a few things don't work, but 99% does).

Regards,
Jordan

Hi Jordan,

You don't actually need oniguruma, it's the same syntax as class
Regexp in ruby 1.8 (well, a few things don't work, but 99% does).

Great! Thank you very much for the Cheat Sheet.

Best wishes,
Richard

···

On Nov 26, 10:32 pm, MonkeeSage <MonkeeS...@gmail.com> wrote:

Raul_Parolari · 28 November 2007 07:40

Just to correct the position of a parenthesis above:

re_prefix = %r/\G (\d+ [.]?) /x

re_prefix = %r/\G (\d+) [.]? /x

Raul

···

--
Posted via http://www.ruby-forum.com/.

Richard3 · 29 November 2007 02:45

Hi Raul,

> re_prefix = %r/\G (\d+ [.]?) /x

I have learned something: I saw immediately that the mistyped version
was not what you intended because the dots would have been captured.
I applied your correction and things worked as advertised.
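For the record, a quick sketch of the difference the parenthesis makes (the constant names are just labels for this illustration):

```ruby
MISTYPED  = /\G (\d+ [.]?) /x   # the group swallows the trailing dot
CORRECTED = /\G (\d+) [.]? /x   # the dot is matched but stays outside the group

line = "2.002.1Topic 2.2.1"
p line.scan(MISTYPED).flatten    # => ["2.", "002.", "1"]
p line.scan(CORRECTED).flatten   # => ["2", "002", "1"]
```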

... you have found a way to implement with Regexps your
original design (you did not fool me! :-);

:BG

great ingenuity [snip] stunned by your progress

Thanks for the compliments, but they're not merited in this respect:

I wrote my first program circa 1955 (on paper only; no execution)
after receiving a letter from a former high-school classmate
announcing that he had encountered new-fangled machines at Princeton
called "computers". He furthermore recounted their instruction set.
I was hooked. After grinding out a degree from night-college and
earning an NSF graduate fellowship in math, I finally got a job
programming a real computer, which I continued until I retired a few
years ago.

I must be frank.

Absolutely. I've invited that, and am also impressed with your
gracious approach.

I do not like building arrays to meet some 'maximum threshold', leaving
portions of them empty; it just does not make me 'happy' (in the
Matz/Ruby sense, do you understand?). Of course, it is just an array of
5 positions, but it does not matter; it is just ecologically wrong for
me.

I acknowledge and share the aesthetic validity of your displeasure.

I however understand your feeling towards my solution (and the contrast
you create with your 'natural one'); got it! but then I invite you to
explore scan with \G; look at this, that may be doing something more
'natural':

# \G 'anchors' start of next search to end of previous one

re_prefix = %r/\G (\d+ [.]?) /x
input.each_line { |line| p line.scan(re_prefix).flatten }

Small (1 line!), fast, precise: a beauty.

I agree fully!

Of course I couldn't leave "well enough" alone, so here's my mod:

input.each_line { |line| line.scan(re_prefix).flatten.each { |e|
printf("%s ",e)}; puts }

Perhaps, examine \G and/or StringScanner, ...

I took a fast look at ruby-doc.org/core/ ... looks good. Thanks

see if you can find a solution that meets what you are looking for,
without (as seen from me) imperfections.

Your latest approach suits my requirement (and taste) perfectly. I'm
off now to continue work on the project I'm developing, which you
might find interesting. Since it's off-topic, send me an email if you
want details. (My email-address looks artificial in order to deter
spammers, but I do have a mail box for it which I only check
sporadically unless I anticipate legitimate email.)

Some people, when confronted with a problem, think:
"I know, I'll use regular expressions".
Now they have two problems.

:BG

Again, thank you and
Best Wishes,
Richard
