That would only work if the postmatch pattern were included in the same pattern as the prematch, thus:
if chunk =~ /^([^-A-Za-z0-9.]*)(.*?)([^-A-Za-z0-9.]*)$/
I did include all three in the same pattern, I just used the whitespace and comment modifier to pretty it up a bit.
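For readers following along, the same three-group pattern can be spread out with the /x modifier. This is only a sketch, not the code from the earlier message: the constant CHUNK_PATTERN and the helper split_chunk are hypothetical names chosen for illustration.

```ruby
# Hypothetical names; the character classes are the ones quoted in the thread.
CHUNK_PATTERN = /
  ^ ([^-A-Za-z0-9.]*)   # prematch: leading characters outside the word set
    (.*?)               # the chunk itself, matched lazily
    ([^-A-Za-z0-9.]*)   # postmatch: trailing characters outside the word set
  $
/x

# Returns [prematch, chunk, postmatch], or nil if the pattern does not match.
def split_chunk(chunk)
  match = CHUNK_PATTERN.match(chunk)
  match && match.captures
end
```

Because the middle group is lazy and the last group is anchored to the end, trailing punctuation ends up in the third capture rather than in the chunk.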
Sorry. It took me a bit of digging to find the /.../x documentation even after you explicitly pointed it out to me. (This won't work for me, because my actual pre- and post- patterns also exclude spaces, but it can certainly clarify the example, if one understands it!)
----- Original Message -----
From: "Markus" <markus@reality.com>
To: "ruby-talk ML" <ruby-talk@ruby-lang.org>
Sent: Monday, September 20, 2004 12:27 AM
Subject: Re: Nuby question about symbols
I'll give it a shot:
* Symbols are an idea borrowed from lisp. They are immutable,
atomic, named, globally unique values with an efficient internal
storage.
* immutable, like (say) nil or an integer, in that you
can't change them, give them new values, update them,
etc.
* atomic in that you can't "take apart" a symbol or "peek
inside it" like you can with a string
* named, in that each symbol has a human-readable form
(unlike, say, pointers)
* globally unique in that if a symbol is referenced
anywhere in the program it is the same object that the
same reference would get you anywhere else. This is the
same way integers work (7 is 7, no matter where it
occurs in the program), but unlike how arrays and
strings work (you can, for example, have the string
"seven" in several places in your program, and they are
NOT the same object).
* efficient in that they are usually implemented as
something like an integer or a pointer, and thus are
quick to compare, small to store, etc.
* Symbols are used wherever they are useful.
* Symbols fill a role in ruby (and in lisp) something like
enumerated types in pascal--in fact, if you imagined a
single pre-existing enumerated type containing all
possible values, that would work sort of like symbols.
* Symbols can be used for arbitrary state or condition
labels (e.g. :male/:female, :jan, :feb, :mar...,
:on,:off,:standby,:out_of_service,... :reverse,:neutral,
:first,:second,:third,:overdrive etc.)
* Symbols can be used as "exceptional" values (e.g.
:not_a_number, :to_be_determined, etc.) much as nil or
-1 often are, but in a way that is much easier to read.
They are much more efficient than strings, which are
often also used in such contexts.
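A few of the points above can be tried directly in Ruby. The sketch below is illustrative only: the constant and method names (SAME_SYMBOL, SAME_STRING, GEARS, next_gear, safe_ratio) are made up for this example and do not come from any library.

```ruby
# Globally unique: the same symbol literal always refers to the same object.
SAME_SYMBOL = :seven.equal?(:seven)

# Two equal strings are normally distinct objects. String.new is used so the
# result does not depend on any frozen-string-literal setting.
SAME_STRING = String.new("seven").equal?(String.new("seven"))

# State labels, in the spirit of an enumerated type.
GEARS = [:reverse, :neutral, :first, :second, :third, :overdrive].freeze

# Returns the gear after the given one, staying in the top gear at the end.
def next_gear(gear)
  index = GEARS.index(gear)
  raise ArgumentError, "unknown gear: #{gear.inspect}" if index.nil?
  GEARS[[index + 1, GEARS.size - 1].min]
end

# An "exceptional" value that reads better than nil or -1.
def safe_ratio(numerator, denominator)
  return :not_a_number if denominator.zero?
  numerator.to_f / denominator
end
```

Here SAME_SYMBOL is true and SAME_STRING is false, which is the integer-versus-string contrast described in the list.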
If that doesn't help, let me know and I'll try to dredge up some online
references--or you can always google.
-- MarkusQ
On Sun, 2004-09-19 at 18:50, Felipe Malta de Oliveira wrote:
> In spite of the recent comments stating that newbies should not be afraid to
> post silly questions, I now ask...
>
> Could anybody give me a little knowledge about symbols? Like what they are,
> why and where they're used and such...Or give me a pointer to somewhere I
> can find that information?
>
> Thanks a lot,
>
> Felipe
>
At a later stage it will start babbling using reasonable phrases as
chunks, and transitioning from phrase to phrase based on some kind of
statistical relationship. Still later...well, I don't yet know just how
far this can go. I'm hoping it will become interesting. I intend to
feed it a bunch of books from Gutenberg as background, but I'm starting
with Alice30.txt (Alice in Wonderland).
If it turns out you're using this to try to get past spam filters I
think a lot of us will be very disappointed.
You can match space characters in an /.../x regex. The easiest way is to use the whitespace character class escape \s.
Hope that helps.
James Edward Gray II
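A small sketch of that suggestion, with a made-up pattern (GREETING is a hypothetical name, not from the thread): literal spaces in the pattern are ignored under /x, so whitespace in the subject text is matched explicitly with \s.

```ruby
# Hypothetical example pattern showing \s inside an extended regex.
GREETING = /
  hello    # first word
  \s+      # one or more whitespace characters in the subject text
  world    # second word
/x

# True when the text contains "hello", some whitespace, then "world".
def greeting?(text)
  GREETING.match?(text)
end
```

Note that \s also matches tabs and newlines, so it is slightly broader than a literal space.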
On Sep 19, 2004, at 4:03 PM, Charles Hixson wrote:
> Sorry. It took me a bit of digging to find the /.../x documentation even after you explicitly pointed it out to me. (This won't work for me, because my actual pre- and post- patterns also exclude spaces, but it can certainly clarify the example, if one understands it!)
>> At a later stage it will start babbling using reasonable phrases as chunks, and transitioning from phrase to phrase based on some kind of statistical relationship. Still later...well, I don't yet know just how far this can go. I'm hoping it will become interesting. I intend to feed it a bunch of books from Gutenberg as background, but I'm starting with Alice30.txt (Alice in Wonderland).
> If it turns out you're using this to try to get past spam filters I
> think a lot of us will be very disappointed.
> -- Markus
Well, since I plan to eventually release full sources...that may well happen if it's successful. Then again, it could probably also be used to sort spam from ham.
I basically think of this as a part of an AI project, and as such will have multiple uses. E.g., one test of ham is that most of what it contains consists of reasonable phrases. If it doesn't have reasonable phrases, it's probably something else. Which, unfortunately, includes programs. So you'd need a separate recognizer to decide that it was or wasn't a program. And possibly others.
But the spam/ham problem is an arms race. I suspect that a final answer is impossible this side of individually tailored filters. Bayes is already a start at this, but it's just a start. To be really effective the filter will need to dip into the semantic level. (So far I'm pretty much staying at the syntactic level, because it's more tractable...but semantics will need to be added.)
> On Sep 19, 2004, at 4:03 PM, Charles Hixson wrote:
>> Sorry. It took me a bit of digging to find the /.../x documentation even after you explicitly pointed it out to me. (This won't work for me, because my actual pre- and post- patterns also exclude spaces, but it can certainly clarify the example, if one understands it!)
> You can match space characters in an /.../x regex. The easiest way is to use the whitespace character class escape \s.
> Hope that helps.
> James Edward Gray II
Does that work inside character class definitions ([] delimited groups)?
I wouldn't mind seeing a more portable (Ruby?) implementation of a
Dolby noise or Markov chain spam analysis routine.
It seems that you're basically doing Markov chain analysis here.
-austin
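To make the Markov chain comparison concrete, here is a minimal first-order sketch. It is not Charles's program: it uses single words rather than phrases as chunks, and the method names build_transitions and babble are hypothetical.

```ruby
# Build a first-order transition table: each word maps to the list of words
# seen immediately after it. Repeats are kept, so frequency acts as a weight.
def build_transitions(text)
  words = text.split
  table = Hash.new { |hash, key| hash[key] = [] }
  words.each_cons(2) { |current, following| table[current] << following }
  table
end

# Walk the table from a starting word, sampling one successor per step.
# Stops early when the current word has no recorded successor.
def babble(table, start, length, rng = Random.new)
  output = [start]
  (length - 1).times do
    candidates = table.fetch(output.last, [])
    break if candidates.empty?
    output << candidates.sample(random: rng)
  end
  output.join(" ")
end
```

Feeding it a longer text (for example the contents of Alice30.txt read with File.read) gives a larger table; moving from words to phrases would mean changing only how the text is split into chunks.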
On Tue, 21 Sep 2004 04:08:44 +0900, Charles Hixson <charleshixsn@earthlink.net> wrote:
Markus wrote:
>>At a later stage it will start babbling using reasonable phrases as
>>chunks, and transitioning from phrase to phrase based on some kind of
>>statistical relationship. Still later...well, I don't yet know just how
>>far this can go. I'm hoping it will become interesting. I intend to
>>feed it a bunch of books from Gutenberg as background, but I'm starting
>>with Alice30.txt (Alice in Wonderland).
>If it turns out you're using this to try to get past spam filters I
>think a lot of us will be very disappointed.
Well, since I plan to eventually release full sources...that may well
happen if it's successful. Then again, it could probably also be used
to sort spam from ham.
I basically think of this as a part of an AI project, and as such will
have multiple uses.
E.g., one test of ham is that most of what it contains consists of
reasonable phrases. If it doesn't have reasonable phrases, it's
probably something else. Which, unfortunately, includes programs. So
you'd need a separate recognizer to decide that it was or wasn't a
program. And possibly others.
But the spam/ham problem is an arms race. I suspect that a final answer
is impossible this side of individually tailored filters. Bayes is
already a start at this, but it's just a start. To be really effective
the filter will need to dip into the semantic level. (So far I'm pretty
much staying at the syntactic level, because it's more tractable...but
semantics will need to be added.)
--
Austin Ziegler * halostatue@gmail.com
* Alternate: austin@halostatue.ca
: as of this email, I have [ 6 ] Gmail invitations
>>> Sorry. It took me a bit of digging to find the /.../x documentation even after you explicitly pointed it out to me. (This won't work for me, because my actual pre- and post- patterns also exclude spaces, but it can certainly clarify the example, if one understands it!)
>> You can match space characters in an /.../x regex. The easiest way is to use the whitespace character class escape \s.
>> Hope that helps.
>> James Edward Gray II
> Does that work inside character class definitions ([] delimited groups)?
Silly of me, of course not. \s *IS* a character class definition.
But this does mean that I won't be able to use /.../x in the code. Still, it's great for clarifying the examples, now that I understand it.
On Tue, 21 Sep 2004 04:08:44 +0900, Charles Hixson <charleshixsn@earthlink.net> wrote:
Markus wrote:
A...d).
If it turns out you're using this to try to get past spam filters I
think a lot of us will be very disappointed.
Well, since I plan to eventually release full sources...that may well
happen if it's successful. Then again, it could probably also be used
to sort spam from ham.
I..gram. And possibly others.
But the spam/ham problem is an arms race. I suspect that a final answer
is impossible this side of individually tailored filters. Bayes is
already a start at this, but it's just a start. To be really effective
the filter will need to dip into the semantic level. (So far I'm pretty
much staying at the syntactic level, because it's more tractable...but
semantics will need to be added.)
I wouldn't mind seeing a more portable (Ruby?) implementation of a
Dolby noise or Markov chain spam analysis routine.
It seems that you're basically doing Markov chain analysis here.
-austin
Well, certainly not formally. But then I haven't gotten well started. Still, there does seem to be a lot of overlap in the "state space". I'll have to remember that for when I get hung up on how to proceed.
> James Edward Gray II wrote:
>
>> On Sep 19, 2004, at 4:03 PM, Charles Hixson wrote:
>>
>>> Sorry. It took me a bit of digging to find the /.../x documentation
>>> even after you explicitly pointed it out to me. (This won't work
>>> for me, because my actual pre- and post- patterns also exclude
>>> spaces, but it can certainly clarify the example, if one understands
>>> it!)
>>
>> You can match space characters in an /.../x regex. The easiest way
>> is to use the whitespace character class escape \s.
>> Hope that helps.
>> James Edward Gray II
>
> Does that work inside character class definitions ([] delimited groups)?
> Silly of me, of course not. \s *IS* a character class definition.
> But this does mean that I won't be able to use /.../x in the code.
> Still, it's great for clarifying the examples, now that I understand it.
There's no regex without /x that cannot be expressed with /x.
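As a rough illustration, here is a guess at what a pattern that "also excludes spaces" might look like, written both ways. The exact classes are an assumption, since the real patterns were never posted, and PLAIN, EXTENDED and same_captures? are hypothetical names. In Ruby an escape such as \x20 (or \s) is accepted inside a bracketed class, which is what lets the /x form avoid a literal space.

```ruby
# Assumed pattern: the thread's character class with a space added to it.
PLAIN = /^([^-A-Za-z0-9. ]*)(.*?)([^-A-Za-z0-9. ]*)$/

# The same pattern in extended form; \x20 stands in for the literal space.
EXTENDED = /
  ^ ([^-A-Za-z0-9.\x20]*)   # leading junk, spaces excluded
    (.*?)                   # the chunk itself
    ([^-A-Za-z0-9.\x20]*)   # trailing junk, spaces excluded
  $
/x

# True when both patterns split the text into the same three captures.
def same_captures?(text)
  PLAIN.match(text).captures == EXTENDED.match(text).captures
end
```

Using \x20 rather than \s keeps the two forms strictly equivalent, because \s would also match tabs and newlines.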