Extraction of single subarrays from multidimensional array

Hi there,

     I am a new member of this group and a newbie to Ruby. After an
extensive but unsuccessful search on this topic, I am asking for your
help with a problem: picking out single subarrays from a
multidimensional array.
In short, I somehow stored the following m-array (the individual
strings are DNA codons):
ss = [["tcg", "agt", "tct", "agc", "tca", "tcc"], ["aaa", "aag"],
["ctg", "tta", "ctt", "cta", "ctc", "ttg"]]

What I would like to get are the separated arrays s[0], s[1] and s[2]
by iteration over array ss.
The method array.clone looks perfect for this aim:

irb(main):039:0> v0 = ss[0].clone
=> ["tcg", "agt", "tct", "agc", "tca", "tcc"]

but I did not find the right way to iterate this method over the
m-array and get indexed subarrays.

I tried iterations like:

v"#{n}"= ss.clone(n) do |n|
end
or

ss(n).each do |n|
v"#{n}" = ss.clone(n)
end

with no success.
Any help is greatly appreciated. Thanks.

-- Maurizio

You're working too hard. The simplest way usually works. Just

ss.each do |codon_cluster|
  #do something with codon_cluster, like;
  p codon_cluster
end

is enough.
Since all these strings are different objects (eating memory) you might
want to convert them into symbols (google them) as soon as possible.
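
A minimal sketch of the two suggestions above (a plain each, and Symbols instead of Strings), assuming the ss array from the original post. The name codon_symbols is only illustrative.

```ruby
ss = [["tcg", "agt", "tct", "agc", "tca", "tcc"],
      ["aaa", "aag"],
      ["ctg", "tta", "ctt", "cta", "ctc", "ttg"]]

# Iterate over the outer array; each element is one sub-array of codons.
ss.each do |codon_cluster|
  p codon_cluster
end

# Convert every codon String to a Symbol. Equal Symbols are the same object,
# so a codon that occurs many times is stored only once.
codon_symbols = ss.map { |cluster| cluster.map { |codon| codon.to_sym } }

p codon_symbols[1]            # => [:aaa, :aag]
p :aaa.equal?("aaa".to_sym)   # => true
```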

···

--
Posted via http://www.ruby-forum.com/.

> I am a new member of this group and a newbie to Ruby. After an
> extensive but unsuccessful search on this topic, I am asking for your
> help with a problem: picking out single subarrays from a
> multidimensional array.
> In short, I somehow stored the following m-array (the individual
> strings are DNA codons):
> ss = [["tcg", "agt", "tct", "agc", "tca", "tcc"], ["aaa", "aag"],
> ["ctg", "tta", "ctt", "cta", "ctc", "ttg"]]

Btw, I assume you do bioinformatics. If all your Arrays contain three letter sequences you should probably change entries to Symbols (there are only 4 ^ 3 = 64 of them).

I'd probably replace these Arrays by a custom class for handling sequences and work with that. Then you can optimize internal representation (e.g. use a Fixnum to code a three letter sequence) to save even more memory. I believe there are libraries for bioinformatics out there which probably do exactly that.
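
One way the integer encoding could look, as a rough sketch only: each base is treated as a base-4 digit, so a three-letter codon maps to a number from 0 to 63. The module name CodonCode and the base order "acgt" are assumptions made for this example (they are not from BioRuby or any other library), and input is assumed to be lowercase. Since Ruby 2.4, Fixnum is simply Integer.

```ruby
# Hypothetical helper: packs a three-letter codon into an Integer in 0..63.
module CodonCode
  BASES = "acgt".freeze   # assumed digit order: a=0, c=1, g=2, t=3

  # "tcg" -> 54
  def self.encode(codon)
    raise ArgumentError, "expected 3 bases, got #{codon.inspect}" unless codon.size == 3
    codon.each_char.inject(0) do |code, base|
      digit = BASES.index(base) or raise ArgumentError, "unknown base #{base.inspect}"
      code * 4 + digit
    end
  end

  # 54 -> "tcg"
  def self.decode(code)
    [code / 16, (code / 4) % 4, code % 4].map { |d| BASES[d] }.join
  end
end

p CodonCode.encode("tcg")   # => 54
p CodonCode.decode(54)      # => "tcg"
```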

> What I would like to get are the separated arrays s[0], s[1] and s[2]
> by iteration over array ss.
> The method array.clone looks perfect for this aim:
>
> irb(main):039:0> v0 = ss[0].clone
> => ["tcg", "agt", "tct", "agc", "tca", "tcc"]
>
> but I did not find the right way to iterate this method over the
> m-array and get indexed subarrays.

Do you actually need a copy or do you want to reference the original? If you need the original here's the simplest approach

a, b, c = *ss

For a copy you can do

a, b, c = *ss.map(&:clone)      # 1.8.7 and later (needs Symbol#to_proc)
a, b, c = *ss.map {|x| x.clone} # any version, including 1.8.6 and earlier

Note that then you still share String instances! So if you want to manipulate individual strings you need to take a different approach (e.g.)

a, b, c = *ss.map {|arr| arr.map {|s| s.dup}}
a, b, c = *Marshal.load(Marshal.dump(ss))
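
To make the difference concrete, here is a small sketch with a shortened two-row array (the data is made up for the example). The Strings are built with split so that they are ordinary mutable objects.

```ruby
ss = ["tcg agt", "aaa aag"].map { |line| line.split }   # [["tcg", "agt"], ["aaa", "aag"]]

shallow = ss.map { |arr| arr.clone }                 # new Arrays, same Strings
deep    = ss.map { |arr| arr.map { |s| s.dup } }     # new Arrays, new Strings

shallow[0] << "tct"        # changes only the copied Array
p ss[0].size               # => 2

shallow[0][0].upcase!      # mutates a String that ss still references
p ss[0][0]                 # => "TCG"

deep[1][0].upcase!         # mutates a private copy
p ss[1][0]                 # => "aaa"
```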

> I tried iterations like:
>
> v"#{n}"= ss.clone(n) do |n|
> end

Apart from the fact that it does not work, what is the point of creating variables with computed, indexed names when you can already do indexed access via the Array? That does not seem like a viable approach.

Kind regards

  robert

···

On 26.10.2010 01:01, Maurizio Cirilli wrote:

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

Thanks a lot Siep for your prompt help.

-- Maurizio

···

On Oct 26, 1:41 am, Siep Korteling <s.kortel...@gmail.com> wrote:

Thanks a lot, Robert, for your clear explanation and help.
In order to fully understand the code you provided, could you
please tell me what the role of the asterisk is in the
statement:

a, b, c = *ss

I did not find (or probably I just missed) this operator in the Ruby
docs I have.
Btw, bioinformatics libraries for the Ruby community are provided by
the BioRuby project.

-- Maurizio

···

On Oct 26, 8:29 am, Robert Klemme <shortcut...@googlemail.com> wrote:

This is simpler.

a, b, c = ss
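
A short sketch of how this plain parallel assignment behaves when the number of variables and the number of sub-arrays differ. The numeric data and the variable names are only for illustration.

```ruby
ss = [[1, 2], [3, 4], [5, 6]]

a, b, c = ss                 # same result as a, b, c = *ss
p c                          # => [5, 6]

first, second = ss           # fewer variables: the third sub-array is dropped
p second                     # => [3, 4]

w, x, y, z = ss              # more variables: the extra one is set to nil
p z                          # => nil
```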

···

On Oct 26, 1:29 am, Robert Klemme <shortcut...@googlemail.com> wrote:

> Do you actually need a copy or do you want to reference the original?
> If you need the original here's the simplest approach
>
> a, b, c = *ss

It's usually called the splat operator, and its function in the above
expression is to take the array elements one by one and use them in
the parallel assignment, so that the first element is assigned to a,
the second to b, the third to c, and any others are discarded.

It's also used to collect the rest of the values in an assignment
or in a method's parameter list:

irb(main):001:0> ss = [1,2,3,4,5]
=> [1, 2, 3, 4, 5]
irb(main):002:0> a,b,c = *ss
=> [1, 2, 3, 4, 5]
irb(main):003:0> a
=> 1
irb(main):004:0> b
=> 2
irb(main):005:0> c
=> 3
irb(main):006:0> a,b,*c = *ss
=> [1, 2, 3, 4, 5]
irb(main):007:0> a
=> 1
irb(main):008:0> b
=> 2
irb(main):009:0> c
=> [3, 4, 5]
irb(main):010:0> def test a,b,*c
irb(main):011:1> p [a,b,c]
irb(main):012:1> end
=> nil
irb(main):013:0> test 1,2,3,4,5,6
[1, 2, [3, 4, 5, 6]]

Jesus.

···

On Tue, Oct 26, 2010 at 4:05 PM, Maurizio Cirilli <mauricirl@gmail.com> wrote:

There's an explanation of *array in the online Programming Ruby, probably in
the sections on assignment and/or method calls. I thought about searching
for it there, but the link below looks as though it has a reasonable explanation.
Subject to correction by anyone more knowledgeable than me, the second
statement below (extracted from the linked page) also applies to assignment,
so you can do something like:
aa = [1, 2]
bb = [4, 5]
cc = [7, 8]
a, b, c, d, e, f, g = *aa, 3, bb, 6, *cc
which sets a to 1, b to 2, c to 3, d to [4, 5], e to 6, f to 7, g to 8.

As w_a_x_man pointed out, if the right hand side of an assignment statement
is an array, and there are two or more variables on the left hand side of
the assignment statement, then Ruby automatically expands the array for you,
so you can omit the "*" operator if you want to.

http://en.wikibooks.org/wiki/Ruby_Programming/Syntax/Method_Calls
...
Variable Length Argument List, Asterisk Operator

The last parameter of a method may be preceded by an asterisk(*), which is
sometimes called the 'splat' operator. This indicates that more parameters
may be passed to the function. Those parameters are collected up and an
array is created.
...
The asterisk operator may also precede an Array argument in a method call.
In this case the Array will be expanded and the values passed in as if they
were separated by commas.
...
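
Both uses described in the extract can be sketched as follows. The method names group_sizes and count_groups are hypothetical and exist only for this example.

```ruby
# Expanding side: a method that takes exactly three codon groups.
def group_sizes(first, second, third)
  [first.size, second.size, third.size]
end

# Collecting side: *groups gathers any number of arguments into one Array.
def count_groups(*groups)
  groups.size
end

ss = [%w[tcg agt tct agc tca tcc], %w[aaa aag], %w[ctg tta ctt cta ctc ttg]]

p group_sizes(*ss)    # same as group_sizes(ss[0], ss[1], ss[2]) => [6, 2, 6]
p count_groups(*ss)   # => 3
```

Note that group_sizes(*ss) raises an ArgumentError if ss does not hold exactly three sub-arrays, whereas count_groups accepts any number.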

···

On Tue, Oct 26, 2010 at 3:05 PM, Maurizio Cirilli <mauricirl@gmail.com>wrote:

Thank you all for the very instructive replies.
I have one last question: how can I make this splat-based assignment
general, i.e. flexible enough to cover cases in which the number of
subarrays (a, b, c in the above example) is unknown? I mean, the splat
operator does the iteration automatically but does not return any count
of the columns of the input m-array ss, so how do I know how many
variables to put on the left side of the assignment
a, b, c = *ss ?

Maybe such a question is trivial, but not for me: I spent several hours
thinking about it and I still have no clue how to do it (the hard life
of beginners!!) :)

Thanks again.

- Maurizio

> As w_a_x_man pointed out, if the right hand side of an assignment statement
> is an array, and there are two or more variables on the left hand side of
> the assignment statement, then Ruby automatically expands the array for you,
> so you can omit the "*" operator if you want to.

It even works with one variable to the left - but then you need a comma:

09:11:30 ~$ ruby19 -e 'a=%w{foo bar baz};b,=a;p b'
"foo"

While splat alone does not work in this case:

09:11:50 ~$ ruby19 -e 'a=%w{foo bar baz};b=*a;p b'
["foo", "bar", "baz"]

You need to add the comma here as well

09:12:24 ~$ ruby19 -e 'a=%w{foo bar baz};b,=*a;p b'
"foo"

Of course, you could also do

09:12:50 ~$ ruby19 -e 'a=%w{foo bar baz};b=a.first;p b'
"foo"
09:13:18 ~$ ruby19 -e 'a=%w{foo bar baz};b=a[0];p b'
"foo"

Or, if destruction is allowed:

09:13:23 ~$ ruby19 -e 'a=%w{foo bar baz};b=a.shift;p b'
"foo"

> Ruby Programming/Syntax/Method Calls - Wikibooks
> ...
> Variable Length Argument List, Asterisk Operator
>
> The last parameter of a method may be preceded by an asterisk(*), which is
> sometimes called the 'splat' operator. This indicates that more parameters
> may be passed to the function. Those parameters are collected up and an
> array is created.
> ...

Actually this is not correct any more for 1.9.*: here the splat
operator can occur at _any_ position and Ruby will do the pattern
matching for you:

09:13:29 ~$ ruby19 -e 'def f(a,*b,c) p a, b, c end;f(1,2,3,4,5)'
1
[2, 3, 4]
5
09:15:03 ~$ ruby19 -e 'def f(*a,b,c) p a, b, c end;f(1,2,3,4,5)'
[1, 2, 3]
4
5
09:15:42 ~$ ruby19 -e 'def f(a,b,*c) p a, b, c end;f(1,2,3,4,5)'
1
2
[3, 4, 5]
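
As far as I know, the same positional matching also applies to the left-hand side of an assignment in 1.9 and later. A small sketch, with a made-up four-row array:

```ruby
ss = [%w[tcg agt], %w[aaa aag], %w[ctg tta], %w[gat gac]]

# The splat may sit in the middle: it takes whatever is left over
# after the first and last sub-arrays have been assigned.
first, *middle, last = ss

p first    # => ["tcg", "agt"]
p middle   # => [["aaa", "aag"], ["ctg", "tta"]]
p last     # => ["gat", "gac"]
```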

Kind regards

robert

···

On Tue, Oct 26, 2010 at 5:21 PM, Colin Bartlett <colinb2r@googlemail.com> wrote:

On Tue, Oct 26, 2010 at 3:05 PM, Maurizio Cirilli <mauricirl@gmail.com>wrote:

It is not possible to do what you're proposing. If you just want to
iterate over the array contents, use the each method of the array object:

ss.each do |item|
  # Do something with the item here.
end

-Jeremy

···

On 10/26/2010 11:05 AM, Maurizio Cirilli wrote:

> It is not possible to do what you're proposing.

I'll go further and say: it is not even reasonable to do that. It's
the same as setting local variables with calculated names like v1, v2,
v3 etc. Anyone who wants to do that must be aware that, since these
variables are generated, the code that accesses them must be generated
as well. In this case, Array indexing is the more appropriate
mechanism.

> If you just want to
> iterate over the array contents, use the each method of the array object:

Exactly!
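
A minimal sketch of the indexing approach for a number of sub-arrays that is not known in advance. The array copies is an illustrative name; copies[0], copies[1], ... play the role that v0, v1, ... would have played.

```ruby
ss = [%w[tcg agt tct agc tca tcc], %w[aaa aag], %w[ctg tta ctt cta ctc ttg]]

# One independent copy per sub-array, however many there are.
copies = ss.map { |sub| sub.clone }

puts "#{copies.size} sub-arrays"   # the count comes from the Array itself
copies.each_with_index do |sub, i|
  puts "copies[#{i}] has #{sub.size} codons"
end
```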

Kind regards

robert

···

On Tue, Oct 26, 2010 at 6:11 PM, Jeremy Bopp <jeremy@bopp.net> wrote:

On 10/26/2010 11:05 AM, Maurizio Cirilli wrote:

OK, so it looks like there is no way in Ruby to extract
single subarrays from md-arrays with unknown dimensions.

Thanks all for help.

-- Maurizio

It almost certainly can. I think you just need to rephrase your question
so people can see what exactly you want to do. Here is an irb session
showing you one way of doing what I think you want to do:

ss = [[1,2,3],[4,5,6],[7,8,9]]

=> [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

main_scope = binding()

=> #<Binding:0x1011c2828>

ss.each_with_index{|x,i| eval("v#{i} = x.clone",main_scope) }

=> [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

v1

=> [4, 5, 6]

v0

=> [1, 2, 3]

v2

=> [7, 8, 9]

It's almost certainly not the 'right' way to do what you really want
though.
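
If names like v0, v1 are really wanted, a Hash keyed by the generated name is one possible alternative that avoids eval. This is a different technique from the session above, shown only as a sketch; the Hash name named is made up.

```ruby
ss = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# Store each copy under a computed key instead of a computed variable name.
named = {}
ss.each_with_index { |x, i| named["v#{i}"] = x.clone }

p named["v1"]   # => [4, 5, 6]
p named.keys    # => ["v0", "v1", "v2"]
```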

···

On Wed, 27 Oct 2010 19:15:13 +0900, Maurizio Cirilli <mauricirl@gmail.com> wrote:

--
Alex Gutteridge

I can only echo what Alex wrote: you *can* extract arrays from
arbitrarily nested arrays, but generating variable names is almost
certainly the wrong way to go about it. What exactly do you want to
do? Can you describe the input and what you want to do with it, with
more context than you have provided so far?

Cheers

robert

···

On Wed, Oct 27, 2010 at 12:15 PM, Maurizio Cirilli <mauricirl@gmail.com> wrote:

If you don't know the length of ss when you write your program,
then obviously you don't know how many variables to use in your
assignment statement. You don't know whether to say

a,b,c = ss

or

a,b,c,d = ss

or

a,b,c,d,e = ss

However, those variables are not needed at all. To extract the
first subarray, say ss[0] or ss.first. To extract the last
subarray, say ss[-1] or ss.last. To extract each in turn along
with its index:

ss.each_with_index{|x,i| p i, x}
0
["tcg", "agt", "tct", "agc", "tca", "tcc"]
1
["aaa", "aag"]
2
["ctg", "tta", "ctt", "cta", "ctc", "ttg"]

All of this will become obvious after you have some programming
experience.

···

On Oct 27, 5:14 am, Maurizio Cirilli <mauric...@gmail.com> wrote:

Sorry for my bad explanation. In my case I actually know how many
columns are in the md-array, i.e. how many sub-arrays are to be
extracted, because I read them as back-translated amino acids from a
gene database, but this number can change from case to case. So, to
make my code of general use, I have to take this "variable" into
consideration; otherwise I have to change this number by hand every
time I run the program. In the example I provided this number is 3,
but the number of subarrays can vary. That is the sense in which I
wrote "unknown" dimension.
Sorry again for the misunderstanding.

- Maurizio

Dear Robert,

   this is what I want to do:

(1) extract subarrays from an md-array (the number of subarrays
     can vary from case to case)
(2) use the separated subarrays to build all the possible nucleotide
     sequences by permuting the codons at each position (row), i.e.
     all the 6*2*6 sequences, 9 nucleotides long, in my example
(3) convert them to strings and put them in a single array (this is
     required for compatibility with the BioRuby classes and methods
     that deal with DNA sequences as strings)
(4) do further analysis on these sequences with BioRuby.

That's it.

While it is clear to me how to do points 2, 3 and 4, I really struggled
with how to accomplish point 1, which is the subject of this thread.

-- Maurizio
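
For what it's worth, steps (1) to (3) above can be sketched without any per-subarray variable names, for any number of sub-arrays, using Array#product (available in 1.8.7 and 1.9). The method name all_sequences is hypothetical, and this is only one possible approach, not something BioRuby prescribes.

```ruby
# Returns every sequence that takes one codon from each sub-array, in order.
def all_sequences(ss)
  return [] if ss.empty?
  first, *rest = ss                  # splat collects the remaining sub-arrays
  first.product(*rest).map { |codons| codons.join }
end

ss = [%w[tcg agt tct agc tca tcc], %w[aaa aag], %w[ctg tta ctt cta ctc ttg]]

sequences = all_sequences(ss)
p sequences.size     # => 72 (6 * 2 * 6)
p sequences.first    # => "tcgaaactg"
```

The result is a flat Array of Strings, which is the shape step (3) asks for.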

···

On Oct 27, 12:46 pm, Robert Klemme <shortcut...@googlemail.com> wrote:

Then why don't you just iterate the outermost Array and be done?

robert
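
A rough sketch of what iterating the outermost Array could look like here: each pass over a sub-array extends every sequence built so far by one codon, so the number of sub-arrays never has to be written into the code. The method name build_sequences is made up, and flat_map needs 1.9.2 or later.

```ruby
def build_sequences(ss)
  # Start with one empty prefix, then extend it position by position.
  ss.inject([""]) do |prefixes, codons|
    prefixes.flat_map { |prefix| codons.map { |codon| prefix + codon } }
  end
end

ss = [%w[tcg agt tct agc tca tcc], %w[aaa aag], %w[ctg tta ctt cta ctc ttg]]

p build_sequences(ss).size    # => 72
p build_sequences(ss).first   # => "tcgaaactg"
```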

···

On Wed, Oct 27, 2010 at 1:10 PM, Maurizio Cirilli <mauricirl@gmail.com> wrote:
