Ruby wish-list

I don't really see the reason why the GC would need or want a specific thread to itself - for a start, such a design makes the system slower on low end systems. There may also be cases where it is possible to choose 'optimal' times to run the GC within a single thread context.

One thing regarding the GC I am unsure about is the set of conditions under which the GC is actually run. One not uncommon problem is external libraries (the classic example is RMagick) that allocate memory without going through Ruby's allocation API; since Ruby doesn't see that memory pressure, it often fails to call the GC at all.

A call to GC.start under these conditions can prevent an OOME, as calling GC.start does in fact cause RMagick to free memory - but ruby doesn't know about this.

The simplest solution to this issue I can see is to ensure that the GC is run when an OOME occurs, or more particularly, all loaded extensions are told to free when an OOME occurs (this does not seem to happen under these conditions). Whilst I know this is not really the responsibility of Ruby, this simple addition could solve problems for quite a number of scripts, thus removing a FAQ.
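As a sketch of that workaround (the helper name and retry policy here are invented, not anything in Ruby's stdlib): a script can catch NoMemoryError, force a GC run so extensions like RMagick get a chance to free their C-side memory, and retry once:

```ruby
# Hypothetical sketch: retry an allocation once after forcing a GC run.
# with_gc_retry is an invented helper, not part of Ruby itself.
def with_gc_retry
  tries = 0
  begin
    yield
  rescue NoMemoryError
    raise if (tries += 1) > 1
    GC.start # gives loaded extensions a chance to free memory Ruby can't see
    retry
  end
end
```

Whether GC.start actually releases the extension's memory depends on the extension having registered free functions with Ruby, as RMagick does.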

More regular GC runs may actually be sensible, depending on the real performance issues that might arise with longer-running applications and fragmentation. A documented example of such a problem, and a solution, is here: Heap fragmentation in a long running Ruby process – Open Source Teddy Bears

Robert Klemme wrote:

···

On 15.10.2007 18:45, Robert Klemme wrote:

On 15.10.2007 17:24, Roger Pack wrote:

>ensure_uninterruptible # (or call it ensure_critical)

It's not as simple as you've expected. First we need to define how an
"uninterruptible" section would work.

I agree. One definition would be to set Thread.critical, then run the block, then unset it. I would use it :)

Bad idea in light of native threads, IMHO. Every construct that halts all threads should be avoided. If you need exclusive access to resources you need to properly synchronize on them.

I meant: in the light of the fact that native threads will come at a certain point in time. Your suggested feature would make it unnecessarily complex and slow to use native threads (there would have to be regular synchronization on some hidden global lock, which defies the whole purpose of using (native) threads).

    robert
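To make the "properly synchronize" alternative concrete, here is a minimal sketch using a per-resource Mutex rather than any global critical section (the counter is just an invented stand-in for a shared resource):

```ruby
require 'thread' # Mutex; this require is only needed on Ruby 1.8

counter = 0
lock = Mutex.new # protects just this one shared resource, nothing global

threads = (1..4).map do
  Thread.new do
    1000.times { lock.synchronize { counter += 1 } }
  end
end
threads.each(&:join)
counter # => 4000 with the lock held around each update
```

Unlike Thread.critical, contention here is limited to threads that actually touch this resource; everything else keeps running.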

I agree. Thinking out loud...with a true 'native' threaded model I
don't know if it would be a spectacular idea to be able to block all
threads. I've often wondered whether Ruby 1.9 will implement
Thread.critical at all. If it does attempt to, then maybe this
suggestion (though aimed mostly at 1.8.6) might still be useful (if you
don't mind the possible slowdown). If not, then yeah--probably not worth
the hassle :)

Other suggestions of how ensure_uninterruptible might work (like 'this
thread doesn't accept interruptions [thread_name.raise's] for a while')
seem like even worse ideas.

The benefit of having such a feature in the first place would be that
you could 'nest' timeouts and other code that executes
other_thread_name.raise without dangerous issues cropping up when
two raises occur very close to the same time--basically, you could
execute other_thread_name.raise on more complex code without the
drawbacks that might otherwise occur.

An example of this is if you nest two timeouts one within another, and
one happens to expire while the other is still processing its
ensure block. This can cause a 'random' exception to be
raised on the originating thread later. Basically, the current use
of other_thread_name.raise is dangerous; this would help with that.
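For what it's worth, the usual way to make nested timeouts distinguishable today is to give each Timeout.timeout call its own exception class, so an inner timer firing can't be mistaken for the outer one (the class names here are invented):

```ruby
require 'timeout'

class OuterTimeout < StandardError; end
class InnerTimeout < StandardError; end

result =
  begin
    Timeout.timeout(5, OuterTimeout) do        # generous outer timer
      Timeout.timeout(0.1, InnerTimeout) { sleep 1 } # inner timer fires first
    end
  rescue InnerTimeout
    :inner_expired
  rescue OuterTimeout
    :outer_expired
  end
result # => :inner_expired
```

This doesn't remove the underlying Thread#raise hazard in the ensure-block window, but it at least makes the two timers tell you which one fired.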

Just my $.02
Thoughts welcome.
-Roger

Robert Klemme wrote:

···

On 15.10.2007 18:45, Robert Klemme wrote:

all threads should be avoided. If you need exclusive access to
resources you need to proper synchronize on them.

I meant: in the light of the fact that native threads will come at a
certain point in time. Your suggested feature would make it
unnecessarily complex and slow to use native threads (there would have
to be regular synchronization on some hidden global lock, which defies
the whole purpose of using (native) threads).

  robert

--
Posted via http://www.ruby-forum.com/.

Joel VanderWerf wrote:

irb(main):001:0> 35+89
=> 124
irb(main):002:0> _
=> 124

awesome. Thanks.

···


You want to create a two-dimensional array, right?
It works when you add square brackets:

a = [1,2,3]
a.map{|n| [n,n] }

···

On Fri, Mar 28, 2008 at 12:21 AM, Roger Pack <rogerpack2005@gmail.com> wrote:

Next wish :)
I wish that you could make arrays more easily. As in

a = [1,2,3]
a.map{|n| n,n }

and it would work :)

You can do

irb(main):002:0> Array.new(3) {|n| [n,n]}
=> [[0, 0], [1, 1], [2, 2]]
irb(main):003:0> (1..3).map {|n| [n,n]}
=> [[1, 1], [2, 2], [3, 3]]

But you don't get rid of the brackets that way.

  robert

···

On 28.03.2008 00:21, Roger Pack wrote:

Next wish :)
I wish that you could make arrays more easily. As in

a = [1,2,3]
a.map{|n| n,n }

and it would work :)
-R

Joel VanderWerf wrote:

The check for garbage can't be done concurrently with ruby code:

   a = []
   $a = a
   a = nil

depending on how the gc process is scheduled, that empty array may
falsely appear to be garbage:

   # start scan * SNAPSHOT *
   a = []
   # scan globals
   $a = a
   a = nil
   # scan locals

Definitely a consideration.

Do you think this helps? The child process is basically operating off a
snapshot of memory, since it was a fork. It will thus only 'mark as
free' the ruby objects that are inaccessible at the time of the
snapshot, thus, in this example, not collecting "a" since it...hasn't
been created yet.

The memory is 'reclaimed' by the parent "all at once" so should avoid
concurrency weirdness that way, too. There is always "at most" one fork
doing a GC. Come to think of it, I'm not sure how this would work if
you had a process that actually called fork while running, but for
single [true] threaded scripts it should work.
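A toy illustration of the snapshot property (not a GC, just a demonstration that a forked child sees a frozen copy of the parent's heap and can scan it without pausing the parent; assumes a Unix fork):

```ruby
objs = Array.new(3) { Object.new } # keep some live objects in the parent

reader, writer = IO.pipe
pid = fork do
  reader.close
  # The child walks its copy-on-write snapshot of the parent's heap;
  # each_object returns the number of objects it iterated over.
  count = ObjectSpace.each_object(Object) { }
  writer.puts count
  writer.close
  exit!
end
writer.close
snapshot_count = reader.gets.to_i
reader.close
Process.wait(pid)
snapshot_count # positive: the child saw the parent's objects at fork time
```

A real collector would of course mark and report reachability rather than just counting, but the parent-side "reclaim all at once" step would hang off the same pipe.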

Thoughts?

-R

···


Perhaps a bit controversial, but I'd like to see a keyword for the “current block” as a way to refer to the Proc instance created inside that block. That would allow for recursive anonymous blocks like:

[[1,2,[3]],[[[5],6],7],8].map { |e|
   Array === e ? e.map(&current_block) : e.to_s
}

without resorting to ugly syntax like

[[1,2,[3]],[[[5],6],7],8].map &(deep_to_s = proc { |e|
   Array === e ? e.map(&deep_to_s) : e.to_s
})

···

--
instance_variable_set(%@\@%sample@%%@ew@.succ, Class.new(&proc{def self.net;$;,
$/='','/';%;.fqn-cmtkhng;end}));Kernel.send(:"define_method",:method_missing){|
n,$_|$_<<"?kd!jhl";n=split.map{|q|q.succ}*'';puts n.reverse.chomp.tr(*%w{" a})}
me@example.net

So you'd prefer a few tweaks:

I don't really see the reason why the GC would need or want a specific
thread to itself - for a start, such a design makes the system slower on
low end systems. There may also be cases where it is possible to choose
'optimal' times to run the GC within a single thread context.

So if it were someday created to run as a separate thread, you'd like to
still be able to have a call 'GC.start.join' or what not, to let it
finish during an 'optimal' time?

...A call to GC.start under these conditions can prevent an OOME, or more
particularly, that all loaded extensions
are told to free when an OOME occurs

And you'd prefer a small change to the GC such that it also starts on
OOME's, correct?

Wow I hope I never run into any memory issues like that!

Yeah those also sound reasonable :)
Wish lists have no bias :)
Take care.
-Roger

···


You want to create a two-dimensional array, right?
It works when you add square brackets:

a = [1,2,3]
a.map{|n| [n,n] }

Yeah I'm just a little lazy and dislike all the brackets :)

···


Roger Pack wrote:

Joel VanderWerf wrote:

The check for garbage can't be done concurrently with ruby code:

   a = []
   $a = a
   a = nil

depending on how the gc process is scheduled, that empty array may
falsely appear to be garbage:

   # start scan * SNAPSHOT *
   a = []
   # scan globals
   $a = a
   a = nil
   # scan locals

Definitely a consideration.

Do you think this helps? The child process is basically operating off a snapshot of memory, since it was a fork. It will thus only 'mark as free' the ruby objects that are inaccessible at the time of the snapshot, thus, in this example, not collecting "a" since it...hasn't been created yet.

The memory is 'reclaimed' by the parent "all at once" so should avoid concurrency weirdness that way, too. There is always "at most" one fork doing a GC. Come to think of it, I'm not sure how this would work if you had a process that actually called fork while running, but for single [true] threaded scripts it should work.

Thoughts?

-R

Oh, I wasn't understanding your idea, which does seem to make sense now. There's the overhead of essentially duplicating the object space of the original process in the fork (since mark will write to memory pages). Maybe it would be worth this cost (esp. on a second processor) for the benefit of not having to stop the main process?

Interesting!

···

--
       vjoel : Joel VanderWerf : path berkeley edu : 510 665 3407

It's not exactly the same, but couldn't you just define a
recursive map function?

class Array
  def map_r &block
    map{|e|Array===e ? e.map_r(&block) : block[e]}
  end
end
p [[1,2,[3]],[[[5],6],7],8].map_r{|e|e.to_s}

=> [["1", "2", ["3"]], [[["5"], "6"], "7"], "8"]

···

On 8/8/08, Mikael Høilund <mikael@hoilund.org> wrote:

Perhaps a bit controversial, but I'd like to see a keyword for the "current
block" as a way to refer to the Proc instance created inside that block.
That would allow for recursive anonymous blocks like:

[[1,2,[3]],[[[5],6],7],8].map { |e|
Array === e ? e.map(&current_block) : e.to_s
}

Mikael Høilund wrote:

Perhaps a bit controversial, but I'd like to see a keyword for the “current block” as a way to refer to the Proc instance created inside that block. That would allow for recursive anonymous blocks like:

[[1,2,[3]],[[[5],6],7],8].map { |e|
  Array === e ? e.map(&current_block) : e.to_s
}

without resorting to ugly syntax like

[[1,2,[3]],[[[5],6],7],8].map &(deep_to_s = proc { |e|
  Array === e ? e.map(&deep_to_s) : e.to_s
})

IIUC, this would also let us take this idiom:

pr = proc {|h,k| h[k] = Hash.new(&pr)}
h = Hash.new(&pr)
h[1][2][3] = 6
p h # ==> {1=>{2=>{3=>6}}}

and simplify to:

h = Hash.new {|h,k| h[k] = Hash.new(&current_block)}

···

--
       vjoel : Joel VanderWerf : path berkeley edu : 510 665 3407

Mikael Høilund wrote:

Perhaps a bit controversial, but I'd like to see a keyword for the
“current block” as a way to refer to the Proc instance created inside
that block. That would allow for recursive anonymous blocks like:

[[1,2,[3]],[[[5],6],7],8].map { |e|
   Array === e ? e.map(&current_block) : e.to_s
}

It appears this may already be possible [Y combinator in Ruby]:
http://53cr.com/blog/2008/10/recursive-lambdas-in-ruby/
Cheers.
-=R
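The trick from the linked post can be done today without any new keyword by passing the lambda to itself (a poor man's Y combinator; deep_to_s is just an invented name):

```ruby
# A recursive 'anonymous' block: the proc receives itself as its first
# argument, so no outer name is needed inside the body.
deep_to_s = lambda do |rec, e|
  Array === e ? e.map { |x| rec.call(rec, x) } : e.to_s
end

deep_to_s.call(deep_to_s, [[1,2,[3]],[[[5],6],7],8])
# => [["1", "2", ["3"]], [[["5"], "6"], "7"], "8"]
```

It works, though the explicit self-passing is arguably as ugly as the `&(deep_to_s = proc { ... })` form it replaces.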

···


Hey!

Not **quite** on topic of garbage collection... But how hard would it be to create a style of method call that doesn't use the . to represent Object.behavior?

In retrospect, it seems like definitely a core language feature that may or may not be impossible to get at... But I figured I'd ask :slight_smile:

Also, is there an easy/hard way to define new %{} style literals? Like for a Rope object, maybe %m{} or something.

Just a newbie's musings.

Thanks,
Ari
--------------------------------------------|
If you're not living on the edge,
then you're just wasting space.

GC wish list:
Isn't it possible to create your own reference checking style object?
Would this be possible, for example.

class Object

  # only used if you call new_with_timely_death_wrapper for your new
  # call--see below
  attr_reader :internal_object

  def new_with_timely_death_wrapper class_to_use, *args
    # since this object is 'only internal', deleting it later will be OK
    @internal_object = class_to_use.new(*args)
  end

  def assign_to_me this_wrapped_object
    dec
    @internal_object = this_wrapped_object.internal_object
    inc
  end

  def do_method name, *args
    @internal_object.send(name, *args) # cleaner than eval'ing a string :)
  end

  def dec
    @internal_object.count -= 1
    recycle_current_object if @internal_object.count == 0
  end

  def inc
    @internal_object.count += 1
  end

  def recycle_current_object
    # traverse internal members of @internal_object--force_recycle them,
    # unless they're wrappers, then just dec them.
    @internal_object.recycle
  end

  def go_out_of_scope
    dec
    # we are toast :) -- this is scary and might not be right
    self.force_recycle
  end

end

Then the example:

a = Array.new_with_timely_death_wrapper(0,0)
b = Array.new_with_timely_death_wrapper(0,0) # the only time you should
                                             # use assign is at the start
b.assign_to_me a  # recycles b's object, points b's internal_object at
                  # a's internal_object, sets count to 2
a.go_out_of_scope # count set to 1
b.go_out_of_scope # count set to 0 -- recycled.

?

···


cross-post from core:
After examining how the 1.8.6 gc works, I had a few thoughts:

Background:

It seems that on a 'cpu intensive' program (one that generates a lot of
discardable objects--quite common), there is a competition between two
aspects of the gc to call garbage_collect first. They are:

1) If you run out of available heap slots, Ruby calls garbage_collect,
and if "FREE_MIN" slots now exist (by default 4096) it returns and
leaves the heap the same size. It also resets the current 'malloc'ed
bytes' counter to 0, since it called garbage_collect.
2) If you reach GC_MALLOC_LIMIT malloc'ed bytes, then it calls
garbage_collect and resets the counter to 0.

Anyway, so what happens in today's implementation is that number 1 is
triggered often (I believe), preventing number 2 from ever even firing,
since it resets the current count of malloc'ed bytes. It's like
garbage_collect is trying to serve two masters, and ends up serving just
the one. I find this curious, as it basically prevents GC_MALLOC_LIMIT
from ever being reached, which is not what you would expect.
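A toy simulation of that interaction (all constants invented, much smaller than Ruby's real ones) shows trigger 1 starving trigger 2:

```ruby
gc_malloc_limit = 1000 # invented stand-in for GC_MALLOC_LIMIT
heap_slots      = 30   # slots recovered by each heap-full collection
malloc_increase = 0
free_slots      = 0
heap_trigger    = 0
malloc_trigger  = 0

1000.times do
  malloc_increase += 30 # steady C-side allocation per iteration
  free_slots -= 1       # each iteration also consumes a heap slot
  if free_slots <= 0    # trigger 1: out of heap slots
    heap_trigger += 1
    free_slots = heap_slots
    malloc_increase = 0 # the collection resets the byte counter...
  elsif malloc_increase > gc_malloc_limit # ...so trigger 2 never fires
    malloc_trigger += 1
    malloc_increase = 0
  end
end

[heap_trigger, malloc_trigger] # => [34, 0]
```

With these numbers the heap empties every 30 iterations, so malloc_increase tops out at 900 and the 1000-byte limit is never reached, exactly the starvation described above.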

Thoughts?

On another point, I have a question on this line of code, run at the end
of garbage collection:
if (malloc_increase > malloc_limit) {
    malloc_limit += (malloc_increase - malloc_limit) * (double)live /
(live + freed); // this line
    if (malloc_limit < GC_MALLOC_LIMIT) malloc_limit = GC_MALLOC_LIMIT;
}

I haven't checked this, but it seems to me that
(malloc_increase - malloc_limit) will always be a very small number (?)
which may not be what was expected. I could be wrong.

So my question is: "what should the GC do, and when?"
Any thoughts?
In my opinion, if it runs out of available heap slots, it should call
garbage_collect AND increase the heap size (so that next time it won't
run out, and will have enough to hopefully reach GC_MALLOC_LIMIT).

I think when it does reach GC_MALLOC_LIMIT malloc'ed bytes, it should
set

new_malloc_limit = GC_MALLOC_LIMIT *
(1 - percent_of_recent_allocated_memory_that_was_freed)

to allow the malloc_limit to change dynamically, maybe with a fixed max
size.
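As a worked example of that proposed formula (the base limit value is invented): if the collection freed 75% of recently allocated memory, the new limit would shrink to a quarter of the base value:

```ruby
gc_malloc_limit = 8_000_000 # invented stand-in for GC_MALLOC_LIMIT
freed_fraction  = 0.75      # 75% of recently allocated memory was freed

new_malloc_limit = gc_malloc_limit * (1 - freed_fraction)
new_malloc_limit.to_i # => 2000000
```

A fixed maximum, as suggested, would cap the other direction: when almost nothing was freed, the limit would otherwise grow toward the full base value.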

So my question is what should best happen?
Ruby rox.

···


> You want to create a two-dimensional array, right?
> It works when you add square brackets:
>
> a = [1,2,3]
> a.map{|n| [n,n] }
Yeah I'm just a little lazy and dislike all the brackets :)


No brackets, huh?

>> a = [1,2,3]
=> [1, 2, 3]
>> a.zip(a)
=> [[1, 1], [2, 2], [3, 3]]

···

On Thu, Mar 27, 2008 at 6:34 PM, Roger Pack <rogerpack2005@gmail.com> wrote:

Roger Pack wrote:

You want to create a two-dimensional array, right?
It works when you add square braces:

a = [1,2,3]
a.map{|n| [n,n] }

Yeah I'm just a little lazy and dislike all the brackets :)

I read somewhere that the best computer in the world was the one that Scotty uses on Star Trek, because no matter what he wanted to do he could program it in two or three button presses.

Until we get one of Scotty's computers, though, programming requires typing. Lots and lots of typing. Maybe a little less with Ruby than, say, with Java, but still...lots.

···

--
RMagick: http://rmagick.rubyforge.org/
RMagick 2: http://rmagick.rubyforge.org/rmagick2.html

Joel VanderWerf wrote:

   # start scan * SNAPSHOT *

Oh, I wasn't understanding your idea, which does seem to make sense now.
There's the overhead of essentially duplicating the object space of the
original process in the fork (since mark will write to memory pages).
Maybe it would be worth this cost (esp. on a second processor) for the
benefit of not having to stop the main process?

Now that you mention it, this might work even better in
conjunction with Hongli's copy-on-write friendly GC. Then you wouldn't
have to copy the entire ruby space per GC :)

Interesting!

I can't take much credit for the idea since it just came to me while
reading scriptures yesterday morning :P
Now you know what distracts me a lot.

I'll probably hack it up sometime, first version as a straight patch to
1.8.6

-R

···


Perhaps a bit controversial, but I'd like to see a keyword for the "current
block" as a way to refer to the Proc instance created inside that block.
That would allow for recursive anonymous blocks like:

[[1,2,[3]],[[[5],6],7],8].map { |e|
Array === e ? e.map(&current_block) : e.to_s
}

It's not exactly the same, but couldn't you just define a
recursive map function?

class Array
  def map_r &block
    map{|e|Array===e ? e.map_r(&block) : block[e]}
  end
end
p [[1,2,[3]],[[[5],6],7],8].map_r{|e|e.to_s}

=> [["1", "2", ["3"]], [[["5"], "6"], "7"], "8"]

That's not optimal if it's just for a single usage, especially not if it requires monkey-patching a built-in class. I could imagine this being used pretty much any time recursive behavior is necessary for an iterator. And besides, what's more descriptive?

array.map { |e|
   Array === e ? e.map(&current_block) : e.to_s
}

or

array.map_r { |e| e.to_s } # Please look somewhere in the source tree for the definition of Array#map_r

?

IIUC, this would also let us take this idiom:

pr = proc {|h,k| h[k] = Hash.new(&pr)}
h = Hash.new(&pr)
h[1][2][3] = 6
p h # ==> {1=>{2=>{3=>6}}}

It's dangerous to go alone. Until we have this sort of divine magic, take this:

>> h = Hash.new { |h, k| h[k] = Hash.new(&h.default_proc) }
>> h[:a][:b][:c] = :o
>> h
=> {:a=>{:b=>{:c=>:o}}}

···

On Aug 8, 2008, at 20:19, Adam Shelly wrote:

On 8/8/08, Mikael Høilund <mikael@hoilund.org> wrote:

On Aug 8, 2008, at 21:30, Joel VanderWerf wrote:

--
Mikael Høilund
http://hoilund.org/