Excessively verbose request for help with regex and arrays

text = "(20:29:55) awhilewhileaway: I also need to assemble the
cover/back, and figure out the innards of the aimlog
formatting/keyword searches"

what I want is essentially 3 fields: the text itself, the speaker
(stripped of the ":"), and the date information. As far as the
date information goes, it will be part of a short-stepped process
that only needs to reference the previous line, so all the data can
keep overwriting two variables, as in: time_current -
time_last = time_it_took ... I think. I'm new to programming and
left math in high school, so it's a weird (but very fun) place for my
mind to be. :slight_smile: still working it out.
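A minimal sketch of that two-variable idea, assuming each line starts with a `(HH:MM:SS)` stamp and that consecutive lines are less than a day apart. The log lines below are made up; `previous` plays the role of the last time and `current` the new one, so each gap is `current - previous` (later minus earlier). Midnight rollover is handled by adding a day whenever the clock seems to go backwards, and appending " UTC" before parsing is an assumption made here to keep daylight-saving changes out of the sums.

```ruby
require 'time'

# Made-up log lines; only the leading (HH:MM:SS) stamp matters here.
log = [
  "(23:54:45) personA: still up?",
  "(00:03:45) personB: yep",
  "(00:04:15) personA: ok, going to bed"
]

previous = nil  # time of the line before this one
gaps = []       # seconds between each line and the previous line

log.each do |line|
  # Parse the stamp as UTC so adding 24 hours is always exactly one day.
  current = Time.parse(line[/\d\d:\d\d:\d\d/] + " UTC")
  if previous
    # The clock went "backwards", so we must have passed midnight.
    current += 24 * 60 * 60 while current < previous
    gaps << (current - previous)
  end
  previous = current
end

p gaps  # => [540.0, 30.0]
```

Only the previous time is ever kept, so the same loop works line by line on a log of any length.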

so I would like two of the fields of this array to be hashes:
array[0] being a hash with a numerical value, array[1] being a
hash with a personA or personB value, and array[2] being a
string. that works, I think...?

wow! I had no idea I knew this much when I started the e-mail. :slight_smile: any
hints/solutions for me to play around with? it's the parentheses in
the regex that have me stuck, mostly, as well as how to deal
with clock arithmetic when it rolls over at midnight. I foresee that
being confusing.

Well, you haven't explained what you really want to do with your data yet, so that all sounds quite a bit complicated. Why not start out with just a simple split on space:

time, speaker, content = text.split ' ', 3

Then you can parse the time:

require 'time'
time = Time.parse time

Cut off the ':' on the speaker:

speaker = speaker.sub(/:$/, '')

and you'll be left with:

p time, speaker, content
Wed Aug 15 20:29:55 -0700 2007
"awhilewhileaway"
"I also need to assemble the cover/back, and figure out the innards of the aimlog formatting/keyword searches"

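Those three steps can be bundled into one helper, which becomes handy once you loop over a whole log. A minimal sketch; `parse_line` is a made-up name, and it assumes every line looks like the sample (stamp, then a speaker ending in ':', then the message).

```ruby
require 'time'

# Hypothetical helper: split one log line into [time, speaker, content]
# by following the three steps above.
def parse_line(line)
  time, speaker, content = line.split(' ', 3)  # "(20:29:55)", "name:", rest
  [Time.parse(time),       # Time.parse ignores the parentheses
   speaker.sub(/:$/, ''),  # drop the trailing ':'
   content]
end

time, speaker, content =
  parse_line("(20:29:55) awhilewhileaway: I also need to assemble the cover/back")
p speaker                         # => "awhilewhileaway"
p content                         # => "I also need to assemble the cover/back"
p [time.hour, time.min, time.sec] # => [20, 29, 55]
```

The date part of the result comes from the day the script runs, since the log line only carries a time.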
···

On Aug 15, 2007, at 21:08, Simon Schuster wrote:

--
Poor workers blame their tools. Good workers build better tools. The
best workers get their tools to do the work for them. -- Syndicate Wars

I'm not sure I fully understand what you want to do, perhaps you should post a more complete set of data. The part where you describe Arrays of Hashes is a bit confusing as well, can you describe your data structure using code rather than English? The first regexp part is easy enough though:

text = "(20:29:55) awhilewhileaway: I also need to assemble"
text.scan(/(\(.+?\)) (.+?): (.+)/){|time,name,data|
   p time
   p name
   p data
}

This doesn't handle multi-line messages, names containing ':', or other oddities, so be careful with real-world data.
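Since the parentheses were the main sticking point in the question, here is a variant of the regexp above that captures the time without them. It is a sketch that assumes Ruby 1.9 or later (for named groups), a two-digit `HH:MM:SS` stamp, and speaker names without ':'; `LINE_RE` is just an illustrative name.

```ruby
# \( and \) match the literal parentheses around the stamp; the (?<time>...)
# group inside them captures only the digits, so no parentheses come back.
# The /m flag lets . match newlines, so wrapped messages stay in one piece.
LINE_RE = /\A\((?<time>\d\d:\d\d:\d\d)\) (?<speaker>[^:]+): (?<text>.*)\z/m

m = LINE_RE.match("(20:29:55) awhilewhileaway: I also need to assemble")
p m[:time]     # => "20:29:55"
p m[:speaker]  # => "awhilewhileaway"
p m[:text]     # => "I also need to assemble"
```

`LINE_RE.match` returns `nil` for lines that don't fit the pattern (system messages, for instance), so check for that before reading the fields.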

If you have an Array of text lines, you can iterate through it (I use map below), scan each line, and store the results in another Array (no need for Hashes unless I misunderstand you):

irb(main):031:0> text_a = ["(20:29:55) awhilewhileaway: I also need to assemble","(20:39:55) away: I also need embl"]
=> ["(20:29:55) awhilewhileaway: I also need to assemble", "(20:39:55) away: I also need embl"]
irb(main):032:0> res = text_a.map{|l| l.scan(/(\(.+?\)) (.+?): (.+)/)[0]}
=> [["(20:29:55)", "awhilewhileaway", "I also need to assemble"], ["(20:39:55)", "away", "I also need embl"]]
irb(main):033:0> res[0]
=> ["(20:29:55)", "awhilewhileaway", "I also need to assemble"]
irb(main):034:0> res[0][0]
=> "(20:29:55)"
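If you do end up wanting hashes, one option is an Array holding one Hash per line, built the same way as `res` above. This is a sketch assuming Ruby 1.9+ hash syntax; the key names `:time`, `:speaker` and `:text` are made up.

```ruby
text_a = ["(20:29:55) awhilewhileaway: I also need to assemble",
          "(20:39:55) away: I also need embl"]

# One Hash per line, so fields are looked up by name instead of position.
# The capture group sits inside the escaped \( \), so the time comes back
# without its parentheses.
records = text_a.map do |l|
  time, speaker, text = l.scan(/\((.+?)\) (.+?): (.+)/)[0]
  { time: time, speaker: speaker, text: text }
end

p records[0][:speaker]          # => "awhilewhileaway"
p records.map { |r| r[:time] }  # => ["20:29:55", "20:39:55"]
```

Something like `records.group_by { |r| r[:speaker] }` would then split the log by person.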

Hope that helps.

Alex Gutteridge

Bioinformatics Center
Kyoto University


thanks, but since the time is only going to be used for arithmetic,
parsing it for additional information isn't helpful, and the roll-over
will be problematic.

(23:54:45) - (00:03:45) != 00:09:00

as for the use of the data: basically, at this stage, I'm working on
formatting aimlogs into "bookish" dialogue, with an eventual goal of
using lulu.com's API to generate books behind my back. :smiley: I'm maybe
thinking about making a gaim plugin if it works out, and I have many more
ideas for what else I could do, but not nearly the ruby skills I need
to realize them (yet!!) :stuck_out_tongue:


Does this help?

Parse the two times as Eric suggested. If the second (later) time is less than the first, add a day to it (60*60*24 seconds). Then subtract the first time from the second to get the difference in seconds.

irb(main):023:0> t1 = Time.parse('23:54:45')
=> Thu Aug 16 23:54:45 +0900 2007
irb(main):024:0> t2 = Time.parse('00:03:45')
=> Thu Aug 16 00:03:45 +0900 2007
irb(main):025:0> t2 += (60 * 60 * 24) if t2 < t1
=> Fri Aug 17 00:03:45 +0900 2007
irb(main):026:0> diff = t2 - t1
=> 540.0
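The same rollover can also be handled without `Time` at all, using plain clock arithmetic on seconds since midnight, which is closer to how the problem was first framed. A sketch assuming `HH:MM:SS` stamps and messages less than 24 hours apart (a gap of a whole day or more can't be detected from the times alone); `seconds_of_day`, `elapsed_seconds` and `clock_format` are made-up helper names.

```ruby
SECONDS_PER_DAY = 24 * 60 * 60

# "HH:MM:SS" (surrounding parentheses are ignored) -> seconds since midnight.
def seconds_of_day(stamp)
  h, m, s = stamp.delete('()').split(':').map(&:to_i)
  h * 3600 + m * 60 + s
end

# Seconds from +earlier+ to +later+, assuming they are less than a day apart.
# Ruby's % returns a non-negative result for a positive divisor, so a
# negative difference (the clock rolled over midnight) wraps around.
def elapsed_seconds(earlier, later)
  (seconds_of_day(later) - seconds_of_day(earlier)) % SECONDS_PER_DAY
end

# Seconds -> "HH:MM:SS", for showing the result back in clock form.
def clock_format(seconds)
  format('%02d:%02d:%02d', seconds / 3600, seconds / 60 % 60, seconds % 60)
end

diff = elapsed_seconds('23:54:45', '00:03:45')
p diff                # => 540
p clock_format(diff)  # => "00:09:00"
```

Taking the difference modulo one day is what makes (00:03:45) - (23:54:45) come out as 00:09:00 rather than a negative number.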

Alex Gutteridge

Bioinformatics Center
Kyoto University

···

On 16 Aug 2007, at 13:46, Simon Schuster wrote:

thanks, but since the time is only going to be used for arithmetic
parsing it for additional information isn't helpful, and the roll-over
will be problematic.

(23:54:45) - (00:03:45) != 00:09:00

yes, this helps a lot! I should have guessed that parsing the time
would enable arithmetic like Fri - Thurs; instead I assumed I'd have
to convert it all to integers.

I will have to think over more of exactly what I'm trying to do with
the arrays/hashes, after I read more about hashes, I think. thanks!

···

On 8/15/07, Alex Gutteridge <alexg@kuicr.kyoto-u.ac.jp> wrote:
