New guy... Introduction and a first question looking for some direction

Hi everyone. I'm new to these forums. I am a sysadmin in California and
I'm learning Ruby. I've been working on automated web application
testing using WATIR and I really like this language. It's the first one
I've actually continued learning after the "hello world" example.

It even made me want to write something on my own outside of work, and
here are the basics of the project... I just don't even know where to
start. I can't seem to find the right search terms, so I can't
find modules that would help me out.

-- sorry for the length of the post --

I want to merge 2 data txt files together.
- Each file has sections and subsections.
- Each section and subsection has data that may or may not be in both
files.
- The data that is in both files may be slightly different, and in this
case I need it to be "magically merged together" if it's within a certain
arbitrary range.
- The data that is in both files but outside of the range I mention
above needs to be considered "new" data for the resulting file...
- I'm not good with regular expressions but I can learn if that is part of
the solution.

I was going to post 2 samples of the files I want to merge but the post
would have been over 4 pages long! Do you guys think the "needs" I've
posted are enough to point me in the right direction?

···

--
Posted via http://www.ruby-forum.com/.

Oscar Gonzalez wrote:

Hi everyone. I'm new to these forums. I am a sysadmin in California and
I'm learning Ruby. I've been working on automated web application
testing using WATIR and I really like this language. It's the first one
I've actually continued learning after the "hello world" example.

That's great news! Welcome aboard.

It even made me want to write something on my own outside of work, and
here are the basics of the project... I just don't even know where to
start. I can't seem to find the right search terms, so I can't
find modules that would help me out.

-- sorry for the length of the post --

I want to merge 2 data txt files together.
- Each file has sections and subsections.
- Each section and subsection has data that may or may not be in both
files.
- The data that is in both files may be slightly different, and in this
case I need it to be "magically merged together" if it's within a certain
arbitrary range.
- The data that is in both files but outside of the range I mention
above needs to be considered "new" data for the resulting file...
- I'm not good with regular expressions but I can learn if that is part of
the solution.

I was going to post 2 samples of the files I want to merge but the
post would have been over 4 pages long! Do you guys think the "needs"
I've posted are enough to point me in the right direction?

It's difficult to help out with hints as we don't know much yet.
Probably just post enough of those files so we can see how
sections and subsections are recognized.

From what I know so far: you might want to have classes Section and
SubSection with the obvious meaning. I don't know whether there's some kind
of optimization possible with your data, but in the worst case you'll have
O(n*m) effort to compare all possible pairs of SubSections. Also the
algorithm to decide whether two are close or not might be tricky.
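
Just to illustrate what I mean (a rough sketch only; the attribute names
are invented, since we don't know your file format yet):

class SubSection
  attr_reader :name, :values          # values: a Hash of numeric fields

  def initialize(name, values)
    @name, @values = name, values
  end

  # "close enough" could mean: every field differs by at most delta
  # (this assumes both subsections carry the same set of fields)
  def close_to?(other, delta)
    values.all? { |key, v| (v - other.values[key]).abs <= delta }
  end
end

# Comparing the SubSections of two files is then the O(n*m) part:
#   subs_a.each { |a| subs_b.each { |b| merge(a, b) if a.close_to?(b, delta) } }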

Kind regards

    robert

hello,

1. what should happen if the files have different structure (different
set of sections/subsections)?

2. please define "data", "magically merged together", "range".

konstantin

akonsu wrote:

hello,

1. what should happen if the files have different structure (different
set of sections/subsections)?

2. please define "data", "magically merged together", "range".

Thanks for the responses, guys... Here's a little more info based on
them.

The sections and subsections and data are defined by {} and values...
for example:

DataGroup = {
     [1] = {
           [1] = {
                 ["dataidentifier"] = {
                        [1] = {
                              ["type"] = 1,
                              ["x"] = 45.5,
                              ["count"] = 1,
                              ["image"] = 4,
                              ["y"] = 18.8,
                          },
                        [1] = {
                              ["type"] = 1,
                              ["x"] = 21.5,
                              ["count"] = 5,
                              ["image"] = 4,
                              ["y"] = 31.8,
                          },
             },
           [2] = {
                 ["dataidentifier2"] = {
                        [1] = {
                              ["type"] = 1,
                              ["x"] = 74.5,
                              ["count"] = 1,
                              ["image"] = 3,
                              ["y"] = 11.8,
                          },
                        [1] = {
                              ["type"] = 1,
                              ["x"] = 27.5,
                              ["count"] = 5,
                              ["image"] = 3,
                              ["y"] = 36.8,
                          },
             },

···

-----------------

Disregard the last piece of my post where I said I wanted to merge data
within a range. After reviewing what I want to do, this is no longer the
case. I only want to merge data if the values are the same.

Take for example the "x" and "y" values above... For the first section,
where there is "dataidentifier", both of my files have that section... I
want that, if the "x" and "y" values are the same, the values under
"count" just get added together. If the "x" and "y" values are different,
then I just need the resulting file to show the section with the
dataidentifier data for each of the x and y pairs. Obviously these "x" and
"y" values are coordinates.

Maybe if I explain where I'm coming from it will make more sense. Say
I'm looking for widgets at x and y, and I need to record how many I find
and where.

Scenario 1.
I find 2 of the widgets at 15,18. That gets recorded into file 1.
I find 1 of the widgets at the same location, 15,18. This gets recorded
into file 2.

Scenario 2.
I find 2 widgets at location 21,30. This goes into File 1.
I find 2 contraptions at location 10,23. This goes into File 1.
I find 3 widgets at location 13,40. This goes into File 2.

In scenario 1, I want the resulting file to show that I found a total of
3 items at location 15,18.

In scenario 2, I want the resulting file to show that I found 2 widgets
at location 21,30, 3 widgets at location 13,40 and 2 contraptions at
location 10,23.

Is what I want to do better explained now? There is obviously the
complication (I think) of the coordinates having decimal points. I
can't avoid this. I also can't avoid writing the 2 files; this is why I
am working on this. I know it would be ideal to have just a single data
file where everything gets written to, but this is beyond my control for
this project...
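
Roughly, the rule I'm after looks something like this (just a sketch; it
assumes each file has already been parsed into an array of records per
dataidentifier, and the field names are placeholders):

# Merge two lists of records, adding "count" when the coordinates match
# exactly. Run this once per section/dataidentifier, so widgets and
# contraptions never get mixed together.
def merge_records(records_a, records_b)
  merged = {}
  (records_a + records_b).each do |rec|
    key = [rec["x"], rec["y"]]               # same coordinates => same item
    if merged[key]
      merged[key]["count"] += rec["count"]   # add the counts together
    else
      merged[key] = rec.dup                  # first time this location shows up
    end
  end
  merged.values
end

The decimal points shouldn't be a problem as long as both files write the
same coordinate the same way, since the merge only happens on exact matches.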

I really appreciate how fast you guys replied and I hope I'm helping you
help me. :)

--
Posted via http://www.ruby-forum.com/.

i would write a parser for these files. represent the contents as a set
of hashes/arrays. if you have control over the format of the files, you
might want to make them simpler so that you won't have to write a real
parser.
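
something like this, roughly (sketch only, based on the example you posted):

datagroup = {
  1 => {
    1 => {
      "dataidentifier" => [
        { "type" => 1, "x" => 45.5, "count" => 1, "image" => 4, "y" => 18.8 },
        { "type" => 1, "x" => 21.5, "count" => 5, "image" => 4, "y" => 31.8 },
      ],
    },
  },
}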

If you can count on indentation like you have above, the easy way might be to
run it through the OutlineParser object of Node.rb
(http://www.troubleshooters.com/projects/Node.rb/index.htm). Once the data is
in a Node tree instead of a file, you can use Walker objects and simple
callbacks to massage the data and then output it in any form you'd like,
including XML or SQL.

If you cannot count on the indentation, you could remove all indentation with
a simple sed script, then run a Ruby program to convert every opening brace
to a new level of indentation and every closing brace to a previous
level of indentation, and then feed that conversion through Node.rb's parser.
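
Something along these lines might do the brace-to-indent conversion (a
rough, untested sketch; it assumes braces only appear as structure, never
inside the data values):

#!/usr/bin/ruby
# Strip the existing indentation and re-indent by brace depth, so that
# OutlineParser sees a consistent outline.
level = 0
ARGF.each_line do |line|
  line = line.strip
  next if line.empty?
  level -= 1 if line =~ /^\}/      # closing brace: back out one level first
  puts "\t" * level + line
  level += 1 if line =~ /\{\s*$/   # opening brace: children go one level deeper
end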

SteveT

Steve Litt
http://www.troubleshooters.com
slitt@troubleshooters.com

···

On Thursday 08 December 2005 02:13 pm, Oscar Gonzalez wrote:

akonsu wrote:
> hello,
>
> 1. what should happen if the files have different structure (different
> set of sections/subsections)?
>
> 2. please define "data", "magically merged together", "range".

Thanks for the responses guys... Here's a little more info based on
them.

The sections and subsections and data are defined by {} and values...
for example:

DataGroup = {
     [1] = {
           [1] = {
                 ["dataidentifier"] = {
                        [1] = {
                              ["type"] = 1,
                              ["x"] = 45.5,
                              ["count"] = 1,
                              ["image"] = 4,
                              ["y"] = 18.8,
                          },
                        [1] = {
                              ["type"] = 1,
                              ["x"] = 21.5,
                              ["count"] = 5,
                              ["image"] = 4,
                              ["y"] = 31.8,
                          },
             },
           [2] = {
                 ["dataidentifier2"] = {
                        [1] = {
                              ["type"] = 1,
                              ["x"] = 74.5,
                              ["count"] = 1,
                              ["image"] = 3,
                              ["y"] = 11.8,
                          },
                        [1] = {
                              ["type"] = 1,
                              ["x"] = 27.5,
                              ["count"] = 5,
                              ["image"] = 3,
                              ["y"] = 36.8,
                          },
             },

akonsu wrote:

How accurate is this example? Just wondering if the mockup has
copy/paste errors. More below...

DataGroup = {
     [1] = {
           [1] = {
                 ["dataidentifier"] = {
                        [1] = {
                              ["type"] = 1,
                              ["x"] = 45.5,
                              ["count"] = 1,
                              ["image"] = 4,
                              ["y"] = 18.8,
                          },
                        [1] = {

Does the [1] really repeat here, or should this be [2] (or some other number)?

                              ["type"] = 1,
                              ["x"] = 21.5,
                              ["count"] = 5,
                              ["image"] = 4,
                              ["y"] = 31.8,
                          },

Should there be a '}' here to close off the 'dataidentifier'?

             },
           [2] = {
                 ["dataidentifier2"] = {
                        [1] = {
                              ["type"] = 1,
                              ["x"] = 74.5,
                              ["count"] = 1,
                              ["image"] = 3,
                              ["y"] = 11.8,
                          },
                        [1] = {
                              ["type"] = 1,
                              ["x"] = 27.5,
                              ["count"] = 5,
                              ["image"] = 3,
                              ["y"] = 36.8,
                          },
             },

I'm assuming any missing '}' here would be at the end of the file.

If my guesses are right, it wouldn't be too tough to convert this
quickly with something along the lines of:

require 'pp'

# Read the file, strip the "DataGroup = " prefix, and rewrite the Lua-style
# ["key"] = / [1] = entries as Ruby hash syntax ("key" => / "1" =>).
text = File.read('some.log')
text.gsub!(/DataGroup = /, '')
text.gsub!(/\["?(.*?)"?\] =/, '"\1" =>')

# What's left is (roughly) a Ruby hash literal, so eval it.
datagroup = eval(text)

pp datagroup
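
One caveat with this shortcut: eval runs whatever ends up in that string as
Ruby code, so it only makes sense if you trust the contents of the files.
For anything untrusted you'd want a real parser instead.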

···

On 12/8/05, Oscar Gonzalez <rakxzo@gmail.com> wrote:

--
Bill Guindon (aka aGorilla)

akonsu wrote:

i would write a parser for these files. represent the contents as a set
of hashes/arrays. if you have control over the format of the files, you
might want to make them simpler so that you won't have to write a real
parser.

Well I don't have control over the format of the files...

I'll try to look into the parser thing... The thing is, this is the first
time I've done any real coding, so I don't even know what to look for in the
Ruby libraries to help me with this... What modules are out there that
can help me, or what are some keywords I should use to search for this?
And are there any Ruby parsers out there that I can look at? This
seems like a complex project... am I taking on too big of a project for
a beginner?

···

--
Posted via http://www.ruby-forum.com/.

Well, that's a lot of info, so I have to digest it. I'll post back as
soon as I have a better grasp of your responses... However, I do not want
to use Lua for this because I want to learn Ruby... I don't think it's a
problem that the data is in Lua syntax; from what I can see, it doesn't
matter what format the data is in. It seems to be a matter of finding a
pattern and being able to merge the data from two files into one.

For the sake of having an accurate sample, I've posted the file on
my site, so maybe if you see the actual file I'm working with you'll get
a better idea of what I want.

http://www.muychingon.com/gatherer.txt

···

--
Posted via http://www.ruby-forum.com/.

Very nice piece of software indeed.

martin

···

Steve Litt <slitt@earthlink.net> wrote:

If you can count on indentation like you have above, the easy way might be to
run it through the OutlineParser object of Node.rb
(http://www.troubleshooters.com/projects/Node.rb/index.htm). Once the data is

Well I don't have control over the format of the files...

If it helps at all... I believe the syntax of the files I'm parsing is
Lua based.

···

--
Posted via http://www.ruby-forum.com/.

a parser is a big project for a beginner. parsing is the process of
translating a text stream into a memory representation of the contents
of the stream. to do that, you have to be able to split the stream
into chunks called tokens, and then check whether the combination of these
tokens is valid, that is, whether it corresponds to the so-called
grammar for your language. there is a theory behind all that. if
your file was simpler and, for example, had each line precisely
identifying a data item, like this:

/1/1/dataidentifier/1/type = 1

then you could just scan the file line by line and get all you need.
there are tools used to generate parsers. the original ones are called
lex and yacc. lex would split your stream into tokens, and yacc would
check whether the resulting sequence of tokens satisfies the grammar. i am
not sure if there are parser generators for ruby, although it is
comparatively easy to write them because they are based on a sound
theory.
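
for example, with the simplified one-line-per-item format above, something
like this would be enough (untested sketch; the file name is made up):

# read "path = value" lines into a flat hash, e.g.
#   data["/1/1/dataidentifier/1/type"] = "1"
data = {}
File.foreach("file1.txt") do |line|
  next unless line =~ %r{^(\S+)\s*=\s*(\S+)}
  data[$1] = $2
end

then merging two such hashes is just a matter of walking their keys.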

hope this helps.
konstantin

Ok, seems I was right about the format. Don't peek if you want to
solve it on your own ;)

http://www.mvgo.com/anarchy/lua.rb.txt

a fun little ruby quiz (I _think_ I got it right).

···

On 12/8/05, Oscar Gonzalez <rakxzo@gmail.com> wrote:

Well, that's a lot of info, so I have to digest it. I'll post back as
soon as I have a better grasp of your responses... However, I do not want
to use Lua for this because I want to learn Ruby... I don't think it's a
problem that the data is in Lua syntax; from what I can see, it doesn't
matter what format the data is in. It seems to be a matter of finding a
pattern and being able to merge the data from two files into one.

For the sake of having an accurate sample, I've posted the file on
my site, so maybe if you see the actual file I'm working with you'll get
a better idea of what I want.

http://www.muychingon.com/gatherer.txt

--
Bill Guindon (aka aGorilla)

Below my sig is a 45-line program using Node.rb that converts the file into
Node objects, each with a name and value. You can see how Walker objects and
callback routines work. In order to output your chosen format (which I didn't
completely understand), you'd probably need to create a couple more Walkers
and a couple more callback routines.

This program assumes consistent indentation. If that cannot be assumed, you
need to either do something else (maybe what Bill Guindon suggested), or
create a tiny brace-to-indent converter and then run the result through my
program.

HTH

SteveT

Steve Litt

slitt@troubleshooters.com

#!/usr/bin/ruby
require "Node.rb"

class Callbacks
  def cb_look_data(checker, level)
    print "\t" * level
    print "Name = ", checker.name
    print ", Value = " , checker.value unless checker.firstchild
    print "\n"
  end

  def cb_get_fields(checker, level)
    if checker.value =~ /\s*}/
      checker.deleteSelf()
    end
    checker.value.gsub!(/,\s*$/, "")
    checker.value.strip!
    checker.value =~ /\[([^\]]*)\]/
    checker.name = $1 if $1
    if level == 1
      checker.value =~ /(.*)\s*=/
      checker.name = $1 if $1
    end

    checker.value =~ /=\s*(.*)/
    checker.value = $1 if $1
    checker.value = "" if checker.value == "{"
  end
end

cb = Callbacks.new() # INSTANTIATE CALLBACKS OBJECT

#### PARSE THE FILE
parser = OutlineParser.new()
head = parser.parse("/home/slitt/gatherer.txt")

#### PARSE THE NODE TREE NODES INTO NAME AND VALUE FIELDS
walker = Walker.new(head, cb.method(:cb_get_fields), nil)
walker.walk()

#### PRINT THE NAME FIELDS FOR CONTAINERS,
#### AND NAME AND VALUE FIELDS FOR LEAF LEVELS
walker = Walker.new(head, cb.method(:cb_look_data), nil)
walker.walk()

···

On Thursday 08 December 2005 06:39 pm, Oscar Gonzalez wrote:

Well, that's a lot of info, so I have to digest it. I'll post back as
soon as I have a better grasp of your responses... However, I do not want
to use Lua for this because I want to learn Ruby... I don't think it's a
problem that the data is in Lua syntax; from what I can see, it doesn't
matter what format the data is in. It seems to be a matter of finding a
pattern and being able to merge the data from two files into one.

For the sake of having an accurate sample, I've posted the file on
my site, so maybe if you see the actual file I'm working with you'll get
a better idea of what I want.

http://www.muychingon.com/gatherer.txt

Thanks Martin,

It should be nice. I've written it in three different languages so far :)

I use VimOutliner (http://www.vimoutliner.org) to create tab-indented
outlines, and find that Node.[pm py rb] makes processing outlines trivial for
substantial jobs, and doable for arduous ones (like converting an outline
into a menu system).

Thanks for the compliment.

SteveT

Steve Litt

slitt@troubleshooters.com

···

On Friday 09 December 2005 09:47 am, Martin DeMello wrote:

Steve Litt <slitt@earthlink.net> wrote:
> If you can count on indentation like you have above, the easy way might
> be to run it through the OutlineParser object of Node.rb
> (http://www.troubleshooters.com/projects/Node.rb/index.htm). Once the
> data is

Very nice piece of software indeed.

martin

Perhaps this is a matter of the right tool for the job. Maybe you want to consider using Lua for this project if they really are Lua files.

···

On Dec 8, 2005, at 4:34 PM, Oscar Gonzalez wrote:

Well I don't have control over the format of the files...

If it helps at all... I believe the syntax of the files I'm parsing is
Lua based.

-- Posted via http://www.ruby-forum.com/.