Method improvement request .--

Charles_D_Hixson · 19 September 2004 21:03

James Edward Gray II wrote:


On Sep 18, 2004, at 6:14 PM, Charles Hixson wrote:

James Edward Gray II wrote:

On Sep 18, 2004, at 5:04 PM, James Edward Gray II wrote:

What about something like:

def parse1(chunk)
    if chunk =~ / ^([^-A-Za-z0-9]*) # pre-match
               (.*) # middle

Sorry, that probably needs to be:

(.*?)

My bad.

James Edward Gray II

That would only work if the postmatch pattern were included in the same pattern as the prematch, thus:
if chunk =~ /^([^-A-Za-z0-9.]*)(.*?)([^-A-Za-z0-9.]*)$/

I did include all three in the same pattern, I just used the whitespace and comment modifier to pretty it up a bit.

/^([^-A-Za-z0-9.]*)(.*?)([^-A-Za-z0-9.]*)$/

Is the same as:

/
    ^([^-A-Za-z0-9.]*) # pre-match
    (.*?) # middle
    ([^-A-Za-z0-9.]*)$ # post-match
/x

Note the trailing /x.
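
A minimal sketch of that equivalence, for anyone who wants to try it. The method names `parse_compact` and `parse_extended` and the sample string are made up for illustration; only the two patterns come from the messages above.

```ruby
# Compact spelling: all three groups on one line.
def parse_compact(chunk)
  m = /^([^-A-Za-z0-9.]*)(.*?)([^-A-Za-z0-9.]*)$/.match(chunk)
  m && m.captures
end

# Extended spelling: the trailing x makes the regex engine ignore
# the whitespace and the # comments inside the pattern.
def parse_extended(chunk)
  m = /
    ^([^-A-Za-z0-9.]*)   # pre-match: leading punctuation
    (.*?)                # middle: as little as possible
    ([^-A-Za-z0-9.]*)$   # post-match: trailing punctuation
  /x.match(chunk)
  m && m.captures
end

sample = '("Alice!")'
p parse_compact(sample)    # pre, middle and post as an array
p parse_extended(sample)   # same three pieces
```

Both calls should split the sample into the opening punctuation, the word, and the closing punctuation.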

James Edward Gray II

Sorry. It took me a bit of digging to find the /.../x documentation even after you explicitly pointed it out to me. (This won't work for me, because my actual pre- and post- patterns also exclude spaces, but it can certainly clarify the example, if one understands it!)

Felipe_Malta_de_Oliv · 20 September 2004 03:55

Thanks, that cleared it up quite a bit!

Felipe


----- Original Message -----
From: "Markus" <markus@reality.com>
To: "ruby-talk ML" <ruby-talk@ruby-lang.org>
Sent: Monday, September 20, 2004 12:27 AM
Subject: Re: Nuby question about symbols

I'll give it a shot:

      * Symbols are an idea borrowed from lisp. They are immutable,
        atomic, named, globally unique values with an efficient internal
        storage.
              * immutable, like (say) nil or an integer, in that you
                can't change them, give them new values, update them,
                etc.
              * atomic in that you can't "take apart" a symbol or "peek
                inside it" like you can with a string
              * named, in that each symbol has a human-readable form
                (unlike, say, pointers)
              * globally unique in that if a symbol is referenced
                anywhere in the program it is the same object that the
                same reference would get you anywhere else. This is the
                same way integers work (7 is 7, no matter where it
                occurs in the program), but unlike how arrays and
                strings work (you can, for example, have the string
                "seven" in several places in your program, and they are
                NOT the same object).
              * efficient in that they are usually implemented as
                something like an integer or a pointer, and thus are
                quick to compare, small to store, etc.
      * Symbols are used wherever they are useful.
              * Symbols fill a role in ruby (and in lisp) something like
                enumerated types in pascal--in fact, if you imagined a
                single pre-existing enumerated type containing all
                possible values, that would work sort of like symbols.
              * Symbols can be used for arbitrary state or condition
                labels (e.g. :male/:female, :jan, :feb, :mar...,
                :on,:off,:standby,:out_of_service,... :reverse,:neutral,
                :first,:second,:third,:overdrive etc.)
              * Symbols can be used as "exceptional" values (e.g.
                :not_a_number, :to_be_determined, etc.) much as nil or
                -1 often are, but in a way that is much easier to read.
                They are much more efficient than strings, which are
                often also used in such contexts
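
A small sketch of a few of the points above, using only core Ruby. The helper names `describe` and `parse_number` are hypothetical and exist just for this example.

```ruby
# Globally unique: the same symbol is the same object everywhere.
a = :standby
b = :standby
puts a.equal?(b)        # same object

# Strings with equal contents can still be different objects.
s1 = String.new("seven")
s2 = String.new("seven")
puts s1 == s2           # equal contents
puts s1.equal?(s2)      # but not the same object

# Named: a symbol converts to and from its readable form.
puts :standby.to_s
puts "standby".to_sym.equal?(:standby)

# State labels (hypothetical helper).
def describe(state)
  case state
  when :on      then "running"
  when :off     then "stopped"
  when :standby then "waiting"
  else "unknown"
  end
end

# An "exceptional" value in place of nil or -1 (hypothetical helper).
def parse_number(text)
  Float(text)
rescue ArgumentError
  :not_a_number
end

puts describe(:standby)
p parse_number("3.5")
p parse_number("three")
```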

If that doesn't help, let me know and I'll try to dredge up some online
references--or you can always google.

-- MarkusQ

On Sun, 2004-09-19 at 18:50, Felipe Malta de Oliveira wrote:
> In spite of the recent comments stating that newbies should not be afraid to
> post silly questions, I now ask...
>
> Could anybody give me a little knowledge about symbols? Like what they are,
> why and where they're used and such...Or give me a pointer to somewhere I
> can find that information?
>
> Thanks a lot,
>
> Felipe
>

Markus · 20 September 2004 15:07

At a later stage it will start babbling using reasonable phrases as
chunks, and transitioning from phrase to phrase based on some kind of
statistical relationship. Still later...well, I don't yet know just how
far this can go. I'm hoping it will become interesting. I intend to
feed it a bunch of books from Gutenberg as background, but I'm starting
with Alice30.txt (Alice in Wonderland).

If it turns out you're using this to try to get past spam filters I
think a lot of us will be very disappointed.

-- Markus

James_Edward_Gray_II · 19 September 2004 21:15

You can match space characters in an /.../x regex. The easiest way is to use the whitespace character class escape \s.
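
A quick sketch of the difference, with made-up method names and sample text. A bare space in an /x pattern is dropped, while \s or a backslash-escaped space still matches.

```ruby
def bare_space?(text)
  # Under /x the bare space is ignored, so this pattern
  # really means "whiterabbit" with no gap.
  !!(text =~ /white rabbit/x)
end

def class_escape?(text)
  # \s matches any whitespace character (space, tab, newline...).
  !!(text =~ /white \s rabbit/x)
end

def escaped_space?(text)
  # A backslash-escaped space matches exactly one literal space.
  !!(text =~ /white\ rabbit/x)
end

p bare_space?("white rabbit")      # the space in the text is not matched
p class_escape?("white rabbit")
p escaped_space?("white rabbit")
```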

Hope that helps.

James Edward Gray II


On Sep 19, 2004, at 4:03 PM, Charles Hixson wrote:

Sorry. It took me a bit of digging to find the /.../x documentation even after you explicitly pointed it out to me. (This won't work for me, because my actual pre- and post- patterns also exclude spaces, but it can certainly clarify the example, if one understands it!)

Charles_D_Hixson · 20 September 2004 19:08

Markus wrote:

At a later stage it will start babbling using reasonable phrases as chunks, and transitioning from phrase to phrase based on some kind of statistical relationship. Still later...well, I don't yet know just how far this can go. I'm hoping it will become interesting. I intend to feed it a bunch of books from Gutenberg as background, but I'm starting with Alice30.txt (Alice in Wonderland).

If it turns out you're using this to try to get past spam filters I
think a lot of us will be very disappointed.

-- Markus

Well, since I plan to eventually release full sources...that may well happen if it's successful. Then again, it could probably also be used to sort spam from ham.

I basically think of this as a part of an AI project, and as such will have multiple uses. E.g., one test of ham is that most of what it contains consists of reasonable phrases. If it doesn't have reasonable phrases, it's probably something else. Which, unfortunately, includes programs. So you'd need a separate recognizer to decide that it was or wasn't a program. And possibly others.

But the spam/ham problem is an arms race. I suspect that a final answer is impossible this side of individually tailored filters. Bayes is already a start at this, but it's just a start. To be really effective the filter will need to dip into the semantic level. (So far I'm pretty much staying at the syntactic level, because it's more tractable...but semantics will need to be added.)

Charles_D_Hixson · 19 September 2004 21:40

James Edward Gray II wrote:


On Sep 19, 2004, at 4:03 PM, Charles Hixson wrote:

Sorry. It took me a bit of digging to find the /.../x documentation even after you explicitly pointed it out to me. (This won't work for me, because my actual pre- and post- patterns also exclude spaces, but it can certainly clarify the example, if one understands it!)

You can match space characters in an /.../x regex. The easiest way is to use the whitespace character class escape \s.

Hope that helps.

James Edward Gray II

Does that work inside character class definitions (delimited groups)?

Austin_Ziegler5 · 20 September 2004 19:24

I wouldn't mind seeing a more portable (Ruby?) implementation of a
Dolby noise or Markov chain spam analysis routine.

It seems that you're basically doing Markov chain analysis here.
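
For readers unfamiliar with the term, here is a rough sketch of a first-order, word-level Markov chain in Ruby. It is a generic illustration only, not Charles's program and not either of the spam filters; the names `build_transitions` and `babble` are invented for the example.

```ruby
# Build a transition table: word => list of words seen right after it.
def build_transitions(text)
  words = text.split
  table = Hash.new { |hash, key| hash[key] = [] }
  words.each_cons(2) { |current, following| table[current] << following }
  table
end

# Walk the table from a start word, picking a random follower each step.
# Stops early when the current word was never followed by anything.
def babble(table, start, length, rng = Random.new)
  output = [start]
  (length - 1).times do
    followers = table.fetch(output.last, [])
    break if followers.empty?
    output << followers[rng.rand(followers.length)]
  end
  output.join(" ")
end

table = build_transitions("the cat saw the dog and the cat ran")
puts babble(table, "the", 6)
```

Words that follow a given word more often appear more often in its follower list, so they are picked with proportionally higher probability.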

-austin


On Tue, 21 Sep 2004 04:08:44 +0900, Charles Hixson <charleshixsn@earthlink.net> wrote:

Markus wrote:
>>At a later stage it will start babbling using reasonable phrases as
>>chunks, and transitioning from phrase to phrase based on some kind of
>>statistical relationship. Still later...well, I don't yet know just how
>>far this can go. I'm hoping it will become interesting. I intend to
>>feed it a bunch of books from Gutenberg as background, but I'm starting
>>with Alice30.txt (Alice in Wonderland).
>If it turns out you're using this to try to get past spam filters I
>think a lot of us will be very disappointed.
Well, since I plan to eventually release full sources...that may well
happen if it's successful. Then again, it could probably also be used
to sort spam from ham.

I basically think of this as a part of an AI project, and as such will
have multiple uses.
E.g., one test of ham is that most of what it contains consists of
reasonable phrases. If it doesn't have reasonable phrases, it's
probably something else. Which, unfortunately, includes programs. So
you'd need a separate recognizer to decide that it was or wasn't a
program. And possibly others.

But the spam/ham problem is an arms race. I suspect that a final answer
is impossible this side of individually tailored filters. Bayes is
already a start at this, but it's just a start. To be really effective
the filter will need to dip into the semantic level. (So far I'm pretty
much staying at the syntactic level, because it's more tractable...but
semantics will need to be added.)

--
Austin Ziegler * halostatue@gmail.com
* Alternate: austin@halostatue.ca
: as of this email, I have [ 6 ] Gmail invitations

Charles_D_Hixson · 19 September 2004 21:43

Charles Hixson wrote:

James Edward Gray II wrote:

Sorry. It took me a bit of digging to find the /.../x documentation even after you explicitly pointed it out to me. (This won't work for me, because my actual pre- and post- patterns also exclude spaces, but it can certainly clarify the example, if one understands it!)

You can match space characters in an /.../x regex. The easiest way is to use the whitespace character class escape \s.
Hope that helps.
James Edward Gray II

Does that work inside character class definitions (delimited groups)?

Silly of me, of course not. \s *IS* a character class definition.

But this does mean that I won't be able to use /.../x in the code. Still, it's great for clarifying the examples, now that I understand it.


On Sep 19, 2004, at 4:03 PM, Charles Hixson wrote:

James_Edward_Gray_II · 19 September 2004 21:50

Contrary to what you expect (judging by your later message), it sure does.

[\saeiou]

Will match a whitespace or vowel character.
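
A tiny sketch to confirm it, with a hypothetical helper name and sample text:

```ruby
# Count the characters that are whitespace or a lowercase vowel.
def count_space_or_vowel(text)
  text.scan(/[\saeiou]/).length
end

p count_space_or_vowel("down the rabbit hole")   # 6 vowels + 3 spaces
p count_space_or_vowel("x\ty")                   # the tab counts as whitespace
```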

James Edward Gray II


On Sep 19, 2004, at 4:40 PM, Charles Hixson wrote:

Does that work inside character class definitions (delimited groups)?

Charles_D_Hixson · 20 September 2004 19:53

Austin Ziegler wrote:


On Tue, 21 Sep 2004 04:08:44 +0900, Charles Hixson <charleshixsn@earthlink.net> wrote:

Markus wrote:


A...d).


If it turns out you're using this to try to get past spam filters I
think a lot of us will be very disappointed.


Well, since I plan to eventually release full sources...that may well
happen if it's successful. Then again, it could probably also be used
to sort spam from ham.

I..gram. And possibly others.

But the spam/ham problem is an arms race. I suspect that a final answer
is impossible this side of individually tailored filters. Bayes is
already a start at this, but it's just a start. To be really effective
the filter will need to dip into the semantic level. (So far I'm pretty
much staying at the syntactic level, because it's more tractable...but
semantics will need to be added.)

I wouldn't mind seeing a more portable (Ruby?) implementation of a
Dolby noise or Markov chain spam analysis routine.

Two Spam Filters 10 Times As Accurate As Humans - Slashdot

It seems that you're basically doing Markov chain analysis here.

-austin

Well, certainly not formally. But then I haven't gotten well started. Still, there does seem to be a lot of overlap in the "state space". I'll have to remember that for when I get hung up on how to proceed.

Charles_D_Hixson · 19 September 2004 22:27

James Edward Gray II wrote:


On Sep 19, 2004, at 4:40 PM, Charles Hixson wrote:

Does that work inside character class definitions (delimited groups)?

Contrary to what you expect (judging by your later message), it sure does.
[\saeiou]
Will match a whitespace or vowel character.
James Edward Gray II

Whuuf! That *is* a surprise! Thanks. That may make some of my regexps much more readable.

David_A_Black3 · 20 September 2004 01:26

Hi --


On Mon, 20 Sep 2004, Charles Hixson wrote:

Charles Hixson wrote:

> James Edward Gray II wrote:
>
>> On Sep 19, 2004, at 4:03 PM, Charles Hixson wrote:
>>
>>> Sorry. It took me a bit of digging to find the /.../x documentation
>>> even after you explicitly pointed it out to me. (This won't work
>>> for me, because my actual pre- and post- patterns also exclude
>>> spaces, but it can certainly clarify the example, if one understands
>>> it!)
>>
>> You can match space characters in an /.../x regex. The easiest way
>> is to use the whitespace character class escape \s.
>> Hope that helps.
>> James Edward Gray II
>
> Does that work inside character class definitions (delimited groups)?

Silly of me, of course not. \s *IS* a character class definition.

But this does mean that I won't be able to use /.../x in the code.
Still, it's great for clarifying the examples, now that I understand it.

There's no regex without /x that cannot be expressed with /x
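
As one illustration, here is a sketch of the earlier three-part pattern with a space added to the excluded characters, written both ways. This is only a guess at what the real pre- and post- patterns look like, and the method names are invented. The /x version spells the space as \x20 (the hex escape for a single space) so that it does not depend on how bare whitespace is treated.

```ruby
# Plain form: a literal space inside each character class.
def parse_plain(chunk)
  m = /^([^-A-Za-z0-9. ]*)(.*?)([^-A-Za-z0-9. ]*)$/.match(chunk)
  m && m.captures
end

# Extended form: same pattern, space written as \x20.
def parse_x(chunk)
  m = /
    ^([^-A-Za-z0-9.\x20]*)   # pre-match: no letters, digits, dash, dot or space
    (.*?)                    # middle
    ([^-A-Za-z0-9.\x20]*)$   # post-match: same exclusions
  /x.match(chunk)
  m && m.captures
end

sample = '"Oh dear!"'
p parse_plain(sample)
p parse_x(sample)
```

Both calls should return the same three pieces, with the inner space kept in the middle part.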

David

--
David A. Black
dblack@wobblini.net
