Regarding the proposal mentioned in the subject (see also
http://www.rubygarden.org/article.php?sid=258), I decided to grab the ruby
source and see how difficult it would be to implement, and whether anything
in the grammar itself precluded *array in the middle of a list.
The findings of fact (well, OK, my opinion) are these:
*array is only allowed at the end of arrays because it’s implemented as a hack.
For those unfamiliar with the inner workings, usually parse.y creates a
NODE_ARRAY for an array, which is a linked list of nodes with the first node
containing the length of the array. If a *arg value is found, a
NODE_ARGSCAT node is inserted via arg_concat()[1], which has as arguments
the original array and the arg.
[1] parse.y, look for tSTAR arg, e.g. in call_args or aref_args.
In eval.c, which walks the node tree (“AST”) and evaluates it, NODE_ARRAY
walks the list and builds a ruby array (via array.c:rb_ary_new2() and direct
manipulation of the content, inserting the result of evaluating each
element-node); NODE_ARGSCAT evals the array and the arg (which is usually
also an array), and calls rb_ary_concat (which may also call to_ary).
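Simple enough, but you can see why this precludes easily allowing *array
mid-list: a single NODE_ARGSCAT only tacks one expanded value onto the end
of a complete NODE_ARRAY, so elements following the *array have nowhere to go.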
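To make the shape of the problem concrete, here is a rough Ruby model of the two node kinds as described above. It is only a sketch: ArrayNode, ArgsCatNode, Lit, new_list and toy_eval are made-up names for illustration, not anything in parse.y or eval.c, and Kernel#Array stands in for the to_ary conversion.

```ruby
# Toy model of the node shapes described above; not the interpreter's real API.
# ArrayNode stands in for NODE_ARRAY: a linked list whose first cell carries the length.
ArrayNode   = Struct.new(:head, :alen, :next_node)
ArgsCatNode = Struct.new(:array, :rest)   # stands in for NODE_ARGSCAT
Lit         = Struct.new(:value)          # a leaf that evaluates to itself

# Build a linked list from element nodes, storing the length in the first cell.
def new_list(*elems)
  first = nil
  elems.reverse_each { |e| first = ArrayNode.new(e, nil, first) }
  first.alen = elems.length if first
  first
end

def toy_eval(node)
  case node
  when Lit
    node.value
  when ArrayNode
    ary = Array.new(node.alen)            # preallocate from the stored length
    i = 0
    while node
      ary[i] = toy_eval(node.head)
      i += 1
      node = node.next_node
    end
    ary
  when ArgsCatNode
    # Evaluate the fixed-length prefix, then concatenate the splatted tail.
    toy_eval(node.array).concat(Array(toy_eval(node.rest)))
  end
end

# [1, 2, *[3, 4]] is represented as ARGSCAT(ARRAY(1, 2), [3, 4]).
tree = ArgsCatNode.new(new_list(Lit.new(1), Lit.new(2)), Lit.new([3, 4]))
result = toy_eval(tree)
```

In this model the splat has exactly one slot, after the whole prefix, which is the limitation being discussed.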
Changing the parser to allow *array mid-list (in call args and [] arrays,
and probably other places[2]) is trivial, but deciding what sort of node
structure to build is more complicated. I assume slowing down NODE_ARRAY’s
processing is a definite no-no, since (from looking at ruth parse trees)
it’s used everywhere. However, if tSTAR arg_value instead added a new node,
NODE_XELEMENT (expand element) to the array, e.g.
    args ',' tSTAR arg_value { $$ = list_append($1, NEW_XELEMENT($4)); }
and the processing of NODE_ARRAY in eval.c worked something like:
    case NODE_ARRAY:
      {
        VALUE ary;
        long i;

        /* nd_alen is only a size hint once a NODE_XELEMENT may be present */
        ary = rb_ary_new2(node->nd_alen);
        for (i = 0; node; node = node->nd_next) {
          if (NODE_XELEMENT == nd_type(node->nd_head)) {
            /* slow path: from the first *element onward, grow the array */
            for (; node; node = node->nd_next) {
              if (NODE_XELEMENT == nd_type(node->nd_head))
                rb_ary_concat(ary, rb_eval(self, node->nd_head->nd_head));
              else
                rb_ary_push(ary, rb_eval(self, node->nd_head));
            }
            break;
          }
          /* fast path: same as the existing NODE_ARRAY processing */
          RARRAY(ary)->ptr[i++] = rb_eval(self, node->nd_head);
          RARRAY(ary)->len = i;
        }
        result = ary;
      }
      break;
(which doesn’t slow down the normal array processing except to add an
equality check which is probably only a couple of operations), then *array
can appear safely anywhere in a list.
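For anyone who would rather poke at the control flow than read C, the same two-phase loop can be modelled in Ruby. This is a sketch under assumptions: Cell, XElement, Lit, eval_leaf and eval_list are hypothetical stand-ins, and the alen argument plays the role of nd_alen.

```ruby
# Toy model only; Cell, XElement and Lit are hypothetical stand-ins, not interpreter types.
Cell     = Struct.new(:head, :next_cell)  # one link of a NODE_ARRAY-style list
XElement = Struct.new(:inner)             # stands in for the proposed NODE_XELEMENT
Lit      = Struct.new(:value)

def eval_leaf(node)
  node.value
end

# Mirrors the shape of the C loop: indexed stores until the first XElement,
# then push/concat for the remainder of the list.
def eval_list(cell, alen)
  ary = Array.new(alen)
  i = 0
  while cell
    if cell.head.is_a?(XElement)
      ary = ary.first(i)                  # drop the unused preallocated slots
      while cell
        if cell.head.is_a?(XElement)
          ary.concat(Array(eval_leaf(cell.head.inner)))
        else
          ary.push(eval_leaf(cell.head))
        end
        cell = cell.next_cell
      end
      break
    end
    ary[i] = eval_leaf(cell.head)
    i += 1
    cell = cell.next_cell
  end
  ary
end

# Models [1, *[2, 3], 4]: three cells, the middle one an XElement.
list = Cell.new(Lit.new(1),
                Cell.new(XElement.new(Lit.new([2, 3])),
                         Cell.new(Lit.new(4), nil)))
mixed = eval_list(list, 3)
```

Lists with no XElement never leave the indexed-store branch, which is the point of the fast path.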
However this still doesn’t work, because eval.c’s SETUP_ARGS macro (probably
among others) assumes that the array node’s alen field is accurate and
allocates a chunk of memory based on it. The solution is to add a
NODE_XARRAY (extended array) that behaves like NODE_ARRAY but adds
*-interpolation. The only change is that as soon as a NODE_XELEMENT is
added to an array, the array’s node type must become NODE_XARRAY instead.
If anyone was worrying about the overhead of the NODE_XELEMENT test in
regular array processing, it’s gone, since NODE_ARRAY is left untouched.
This also allows for ditching NODE_ARGSCAT, and maybe NODE_REST(ARGS|ARY)
too. (Furthermore, SETUP_ARGS will automatically call rb_eval on a
NODE_XARRAY, which can hand it back an array.)
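A rough Ruby model of the retagging idea, in case it helps. Everything here is assumed for illustration: ListNode, XElem, toy_list_append, toy_eval_xarray and toy_setup_args are invented names, the :type member plays the role of the node type, and XElem holds an already-evaluated array to keep the sketch short.

```ruby
# Hypothetical model: ListNode#type plays the role of nd_type, #alen of nd_alen.
ListNode = Struct.new(:type, :elems) do
  def alen
    elems.length
  end
end
XElem = Struct.new(:values)  # stand-in for NODE_XELEMENT; holds an already-evaluated array

# Appending an XElem retags the whole list, so consumers of plain :array never see one.
def toy_list_append(list, elem)
  list.elems << elem
  list.type = :xarray if elem.is_a?(XElem)
  list
end

# Evaluate an :xarray by expanding each XElem in place.
def toy_eval_xarray(list)
  list.elems.flat_map { |e| e.is_a?(XElem) ? e.values : [e] }
end

# SETUP_ARGS-like consumer: trusts alen only when the type guarantees it is accurate.
def toy_setup_args(list)
  if list.type == :array
    argv = Array.new(list.alen)
    list.elems.each_with_index { |e, i| argv[i] = e }
    argv
  else
    toy_eval_xarray(list)
  end
end

plain = toy_list_append(toy_list_append(ListNode.new(:array, []), 1), 2)
mixed = ListNode.new(:array, [])
[1, XElem.new([2, 3]), 4].each { |e| toy_list_append(mixed, e) }
```

Here mixed.alen is 3 while the evaluated list has four elements, which is exactly why a consumer that sizes a buffer from alen must not be handed the extended form under the plain tag.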
[2] rhs only of course, lhs is more complicated but it would be nice for
consistency if (as discussed on OPN #ruby-talk) e.g. a,*b,c = f()
assigned the first element returned to a, the last to c, and the rest
to b, in effect doing tmp = f(); a = tmp.shift; c = tmp.pop; b = tmp,
but that’s a separate (if related) problem.
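The lhs behaviour described in [2] can be tried out with a small helper. This is only a sketch of the intended semantics; split_ends is a hypothetical name, and it copies the array so the caller's value is left alone.

```ruby
# Sketch of the a, *b, c = f() semantics from footnote [2]; split_ends is a hypothetical helper.
def split_ends(values)
  tmp = values.dup      # work on a copy so the caller's array is untouched
  a = tmp.shift         # first element (nil if the array is empty)
  c = tmp.pop           # last remaining element (nil if none is left)
  [a, tmp, c]           # tmp now holds "the rest"
end

a, b, c = split_ends([1, 2, 3, 4, 5])
```

With a single-element input the helper leaves the middle empty and the last slot nil, which is one plausible answer to the question of what a short right-hand side should do.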
While I’m looking at NODE_ARRAY, is there any reason why NEW_ZARRAY couldn’t
have been (node.h) #define NEW_ZARRAY rb_node_newnode(NODE_ARRAY,0,0,0)?
Was it done for speed?
Apologies if this has already been discussed; I couldn’t find it in the
archives.
If this idea looks workable I’d be happy to discuss it further or submit a
patch at some point. (Hm… maybe NODE_ARGSCAT could be used multiple times
and multiple NODE_ARRAYs created instead, but that’s ugly.)
Hmm… something still seems rotten in the state of Denmark about how ‘*’ is
parsed, maybe greater reforms are needed.
--
Dave
Isa. 40:31