Counting Tabs and splitting by that number

Nick_Bo · 28 September 2008 21:52

Basically, I have a document that I open and read line by line. I have
to split each line into two arrays and then into a hash, from which I
have to produce output like this:

application/activemessage has no extensions
application/andrew-inset has extensions ez
application/applefile has no extensions
application/atom has extensions atom
application/atomcat+xml has extensions atomcat
application/atomicmail has no extensions
application/atomserv+xml has extensions atomsrv
application/batch-SMTP has no extensions
application/beep+xml has no extensions
application/cals-1840 has no extensions

I have determined that if a line contains no tabs, then that entry has
no extension. So I put an if statement at the beginning to check whether
the line contains a tab; if not, it saves false at the current position
in the array for that pass of the each loop.

file.each_line do |line|
  next if line[0] == ?#
  next if line == "\n"
  string = line
  if string.include?("\t") == false
    mimeValue[i] = false
    mimeKey[i] = string.split
  else
    # THIS IS WHERE MY ISSUE IS NOW
    mimeKey[i], mimeValue[i] = string.split("\t\t\t")
  end
end

My problem now is that the number of tabs separating the two parts
varies: one line may have 3 tabs, another may have 5, and another just
1. So I am in a rut. How do I determine how many tabs are in the line
(the string variable) so that I can split the two parts into their
appropriate arrays? I was thinking I could do some kind of recursion
that tests for a tab and, if it finds one, adds 1 to a count, so that I
could then do something like

mimeKey[i], mimeValue[i] = string.split("\t" * tabCount)

I know there is a lot in my message, so here is a summary:

HOW TO COUNT \t IN A STRING THEN SPLIT BY THAT NUMBER OF \t
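Taken literally, the question can be answered with String#count. The following is a minimal sketch, assuming a made-up sample line (the names `line`, `tab_count` and `run_length` are only for illustration); it counts the tabs and then splits on exactly that many, as proposed above. Note that this only works when all the tabs in the line sit in one consecutive run.

```ruby
# Hypothetical sample line; "\t" must be inside double quotes to be a real tab.
line = "application/andrew-inset\t\t\tez\n"

tab_count  = line.count("\t")            # total number of tab characters
run_length = line[/\t+/].to_s.length     # length of the first run of tabs (0 if none)

# Split on a separator made of exactly tab_count tabs.
key, value = line.chomp.split("\t" * tab_count)

p tab_count    # => 3
p run_length   # => 3
p key          # => "application/andrew-inset"
p value        # => "ez"
```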

Siep_Korteling · 28 September 2008 22:45

Split on \t anyway and dump all empty results, like this:

str = "beep+xml\t\t\t atom"
res = str.split("\t").reject{|item| item.empty?}
p res

hth,

Siep

Brabuhr · 28 September 2008 22:53

Your tabs are consecutive and you don't actually care how many there are?
string.split(/\t+/)
?
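As a small sketch of that suggestion, assuming two made-up sample lines: splitting on the regex /\t+/ treats any run of tabs as a single separator, and a limit of 2 keeps everything after the first run in one piece. A line with no tab at all simply yields no second element.

```ruby
# Hypothetical sample lines, one with five tabs and one with none.
with_tabs    = "application/atomcat+xml\t\t\t\t\tatomcat\n"
without_tabs = "application/applefile\n"

# /\t+/ matches one or more consecutive tabs; the limit 2 caps the result at two parts.
key1, value1 = with_tabs.chomp.split(/\t+/, 2)
key2, value2 = without_tabs.chomp.split(/\t+/, 2)

p [key1, value1]   # => ["application/atomcat+xml", "atomcat"]
p [key2, value2]   # => ["application/applefile", nil]
```

Because `value2` comes back as nil, the separate "does the line contain a tab" check is not strictly needed with this approach.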

Nick_Bo · 28 September 2008 23:05

Incorrect. If I do it that way and there are 5 tabs between the two
parts I want to separate, then I get 4 blank strings, giving me a total
of 6 elements.
eg = "abcdefg \t\t\t\t\t hi"
eg.split("\t") --> ["abcdefg ", "", "", "", "", " hi"]
eg.split("/\t+/") just gives me ["abcdefg \t\t\t\t\t hi"] because it
doesn't match the pattern given to split at all, so the whole string
becomes the only element of the array.

Bill_Kelly · 28 September 2008 23:18

eg = "abcdefg \t\t\t\t\t hi"
eg.split("\t") --> ["abcdefg ", "", "", "", "", " hi"]
eg.split("/\t+/") just gives me ["abcdefg \t\t\t\t\t hi"] because it doesn't match the pattern given to split at all, so the whole string becomes the only element of the array.

Huh?

eg = "abcdefg \t\t\t\t\t hi"

=> "abcdefg \t\t\t\t\t hi"

eg.split(/\t+/)

=> ["abcdefg ", " hi"]
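The difference between the two results in this exchange most likely comes down to quoting. The sketch below, using the same example string, compares a quoted pattern with a regex literal; the variable names are only for illustration.

```ruby
eg = "abcdefg \t\t\t\t\t hi"

# In quotes, the pattern is the literal text: slash, tab, plus sign, slash.
# That text never occurs in eg, so nothing is split.
as_string = eg.split("/\t+/")

# Without quotes it is a regex meaning "one or more tabs".
as_regex = eg.split(/\t+/)

# A single-tab string pattern leaves an empty string between adjacent tabs.
single_tab = eg.split("\t")

p as_string    # => ["abcdefg \t\t\t\t\t hi"]
p as_regex     # => ["abcdefg ", " hi"]
p single_tab   # => ["abcdefg ", "", "", "", "", " hi"]

# The leftover spaces can be removed with strip.
p as_regex.map { |part| part.strip }   # => ["abcdefg", "hi"]
```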

Regards,

Bill

Nick_Bo · 28 September 2008 23:37

It wouldn't give me the two. I so wish it did, but I found a way around
it. This is my solution and it works perfectly:
eg = "abcdefg \t\t\t\t\t\t hi"
splitArray = eg.split("\t")
splitArray.delete("")

# then, inside the loop:
arrayKey[i] = splitArray[0]
arrayValue[i] = splitArray[1]

Thanks for everyone's help.
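One detail worth keeping in mind with the split-then-delete approach: Array#delete changes the array in place and returns the removed object, not the cleaned-up array, so its return value should not be assigned back to the array variable. A short sketch, with illustrative variable names:

```ruby
eg = "abcdefg \t\t\t\t\t\t hi"

parts    = eg.split("\t")
returned = parts.delete("")   # removes every "" from parts; returns the deleted item

p returned   # => ""
p parts      # => ["abcdefg ", " hi"]

# A non-mutating alternative that returns the filtered array directly.
cleaned = eg.split("\t").reject { |item| item.empty? }
p cleaned    # => ["abcdefg ", " hi"]
```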

Mark_Thomas · 29 September 2008 14:44

It wouldn't give me the two. I so wish it did, but I found a way around
it. This is my solution and it works perfectly:
eg = "abcdefg \t\t\t\t\t\t hi"
splitArray = eg.split("\t")
splitArray.delete("")

IMO, the regex solution is better

splitArray = eg.split(/\t+/)

I think you put it in quotes. Leave the quotes out.
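Putting the regex split together with the original goal, here is a rough sketch of the whole task. It is not the original poster's code: the sample data, `mime_types` and `report` are assumptions made up for illustration, and a real script would read the lines from the opened file instead.

```ruby
# Hypothetical sample data standing in for the file being read.
sample = [
  "# comment line",
  "",
  "application/activemessage",
  "application/andrew-inset\t\t\tez",
  "application/atomcat+xml\t\t\t\t\tatomcat",
  "application/beep+xml"
].join("\n") + "\n"

mime_types = {}
sample.each_line do |line|
  line = line.chomp
  next if line.empty? || line.start_with?("#")   # skip blank lines and comments

  # Any run of tabs is one separator; value is nil when the line has no tab.
  key, value = line.split(/\t+/, 2)
  mime_types[key.strip] = value ? value.strip : nil
end

report = mime_types.map do |type, extensions|
  if extensions
    "#{type} has extensions #{extensions}"
  else
    "#{type} has no extensions"
  end
end

puts report
```

Storing the pairs straight into a hash avoids keeping two parallel arrays and an index in step.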

-- Mark.
