Ruby wish-list

I don't really see the reason why the GC would need or want a specific thread to itself - for a start, such a design makes the system slower on low end systems. There may also be cases where it is possible to choose 'optimal' times to run the GC within a single thread context.

One thing regarding the GC I am unsure about is the set of conditions under which the GC is actually run. One not uncommon problem is external libraries (the classic example is RMagick) that allocate memory without going through Ruby's allocation API; since Ruby doesn't see that memory pressure, it often fails to call the GC at all.

A call to GC.start under these conditions can prevent an OOME, as calling GC.start does in fact cause RMagick to free memory - but ruby doesn't know about this.

The simplest solution to this issue I can see is to ensure that the GC is run when an OOME occurs, or more particularly, all loaded extensions are told to free when an OOME occurs (this does not seem to happen under these conditions). Whilst I know this is not really the responsibility of Ruby, this simple addition could solve problems for quite a number of scripts, thus removing a FAQ.
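As a sketch of that workaround (the helper name and retry policy here are invented, not anything in Ruby's stdlib): a script can catch NoMemoryError, force a GC run so extensions like RMagick get a chance to free their C-side memory, and retry once:

```ruby
# Hypothetical sketch: retry an allocation once after forcing a GC run.
# with_gc_retry is an invented helper, not part of Ruby itself.
def with_gc_retry
  tries = 0
  begin
    yield
  rescue NoMemoryError
    raise if (tries += 1) > 1
    GC.start # gives loaded extensions a chance to free memory Ruby can't see
    retry
  end
end
```

Whether GC.start actually releases the extension's memory depends on the extension having registered free functions with Ruby, as RMagick does.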

More regular GC runs may actually be sensible, depending on the real performance issues that might arise with longer-running applications and fragmentation. A documented example of such a problem, and a solution, is here: Heap fragmentation in a long running Ruby process – Open Source Teddy Bears

Robert Klemme wrote:

···

On 15.10.2007 18:45, Robert Klemme wrote:

On 15.10.2007 17:24, Roger Pack wrote:

>ensure_uninterruptible # (or call it ensure_critical)

It's not as simple as you've expected. First we need to define how an
"uninterruptible" section would work.

I agree. One definition would be to set Thread.critical, then run the block, then unset it. I would use it :)

Bad idea in light of native threads, IMHO. Every construct that halts all threads should be avoided. If you need exclusive access to resources you need to properly synchronize on them.

I meant: in the light of the fact that native threads will come at a certain point in time. Your suggested feature would make it unnecessarily complex and slow to use native threads (there would have to be regular synchronization on some hidden global lock, which defies the whole purpose of using (native) threads).

    robert
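To make the "properly synchronize" alternative concrete, here is a minimal sketch using a per-resource Mutex rather than any global critical section (the counter is just an invented stand-in for a shared resource):

```ruby
require 'thread' # Mutex; this require is only needed on Ruby 1.8

counter = 0
lock = Mutex.new # protects just this one shared resource, nothing global

threads = (1..4).map do
  Thread.new do
    1000.times { lock.synchronize { counter += 1 } }
  end
end
threads.each(&:join)
counter # => 4000 with the lock held around each update
```

Unlike Thread.critical, contention here is limited to threads that actually touch this resource; everything else keeps running.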

I agree. Thinking out loud...with a true 'native' threaded model I
don't know if it would be a spectacular idea to be able to block all
threads. I've often wondered whether Ruby 1.9 will implement
Thread.critical at all. If it does attempt to, then maybe this
suggestion (though aimed mostly at 1.8.6) might still be useful (if you
don't mind the possible slowdown). If not, then yeah--probably not worth
the hassle :)

Other suggestions of how ensure_uninterruptible might work (like 'this
thread doesn't accept interruptions [thread_name.raise's] for a while')
seem like even worse ideas.

The benefit of having such a feature in the first place would be that
you could 'nest' timeouts and other code that executes
other_thread_name.raise without dangerous issues cropping up when
two raises occur very close to the same time--basically, you could
execute other_thread_name.raise on more complex code without the
drawbacks that might otherwise occur.

An example of this is if you nest two timeouts one within another, and
one happens to expire while the other is still processing its
ensure block. This can cause a 'random' exception to be
raised on the originating thread later. Basically, the current use
of other_thread_name.raise is dangerous; this would help with that.
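For what it's worth, the usual way to make nested timeouts distinguishable today is to give each Timeout.timeout call its own exception class, so an inner timer firing can't be mistaken for the outer one (the class names here are invented):

```ruby
require 'timeout'

class OuterTimeout < StandardError; end
class InnerTimeout < StandardError; end

result =
  begin
    Timeout.timeout(5, OuterTimeout) do        # generous outer timer
      Timeout.timeout(0.1, InnerTimeout) { sleep 1 } # inner timer fires first
    end
  rescue InnerTimeout
    :inner_expired
  rescue OuterTimeout
    :outer_expired
  end
result # => :inner_expired
```

This doesn't remove the underlying Thread#raise hazard in the ensure-block window, but it at least makes the two timers tell you which one fired.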

Just my $.02
Thoughts welcome.
-Roger

Robert Klemme wrote:

···

On 15.10.2007 18:45, Robert Klemme wrote:

all threads should be avoided. If you need exclusive access to
resources you need to proper synchronize on them.

I meant: in the light of the fact that native threads will come at a
certain point in time. Your suggested feature would make it
unnecessarily complex and slow to use native threads (there would have
to be regular synchronization on some hidden global lock, which defies
the whole purpose of using (native) threads).

  robert

--
Posted via http://www.ruby-forum.com/.

Joel VanderWerf wrote:

irb(main):001:0> 35+89
=> 124
irb(main):002:0> _
=> 124

awesome. Thanks.

···


You want to create a two-dimensional array, right?
It works when you add square brackets:

a = [1,2,3]
a.map{|n| [n,n] }

···

On Fri, Mar 28, 2008 at 12:21 AM, Roger Pack <rogerpack2005@gmail.com> wrote:

Next wish :)
I wish that you could make arrays more easily. As in

a = [1,2,3]
a.map{|n| n,n }

and it would work :)

You can do

irb(main):002:0> Array.new(3) {|n| [n,n]}
=> [[0, 0], [1, 1], [2, 2]]
irb(main):003:0> (1..3).map {|n| [n,n]}
=> [[1, 1], [2, 2], [3, 3]]

But you don't get rid of the brackets that way.

  robert

···

On 28.03.2008 00:21, Roger Pack wrote:

Next wish :)
I wish that you could make arrays more easily. As in

a = [1,2,3]
a.map{|n| n,n }

and it would work :)
-R

Joel VanderWerf wrote:

The check for garbage can't be done concurrently with ruby code:

   a = []
   $a = a
   a = nil

depending on how the gc process is scheduled, that empty array may
falsely appear to be garbage:

   # start scan * SNAPSHOT *
   a = []
   # scan globals
   $a = a
   a = nil
   # scan locals

Definitely a consideration.

Do you think this helps? The child process is basically operating off a
snapshot of memory, since it was a fork. It will thus only 'mark as
free' the ruby objects that are inaccessible at the time of the
snapshot, thus, in this example, not collecting "a" since it...hasn't
been created yet.

The memory is 'reclaimed' by the parent "all at once" so should avoid
concurrency weirdness that way, too. There is always "at most" one fork
doing a GC. Come to think of it, I'm not sure how this would work if
you had a process that actually called fork while running, but for
single [true] threaded scripts it should work.
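A toy illustration of the snapshot property (not a GC, just a demonstration that a forked child sees a frozen copy of the parent's heap and can scan it without pausing the parent; assumes a Unix fork):

```ruby
objs = Array.new(3) { Object.new } # keep some live objects in the parent

reader, writer = IO.pipe
pid = fork do
  reader.close
  # The child walks its copy-on-write snapshot of the parent's heap;
  # each_object returns the number of objects it iterated over.
  count = ObjectSpace.each_object(Object) { }
  writer.puts count
  writer.close
  exit!
end
writer.close
snapshot_count = reader.gets.to_i
reader.close
Process.wait(pid)
snapshot_count # positive: the child saw the parent's objects at fork time
```

A real collector would of course mark and report reachability rather than just counting, but the parent-side "reclaim all at once" step would hang off the same pipe.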

Thoughts?

-R

···


Perhaps a bit controversial, but I'd like to see a keyword for the “current block” as a way to refer to the Proc instance created inside that block. That would allow for recursive anonymous blocks like:

[[1,2,[3]],[[[5],6],7],8].map { |e|
   Array === e ? e.map(&current_block) : e.to_s
}

without resorting to ugly syntax like

[[1,2,[3]],[[[5],6],7],8].map &(deep_to_s = proc { |e|
   Array === e ? e.map(&deep_to_s) : e.to_s
})

···

--
instance_variable_set(%@\@%sample@%%@ew@.succ, Class.new(&proc{def self.net;$;,
$/='','/';%;.fqn-cmtkhng;end}));Kernel.send(:"define_method",:method_missing){|
n,$_|$_<<"?kd!jhl";n=split.map{|q|q.succ}*'';puts n.reverse.chomp.tr(*%w{" a})}
me@example.net

So you'd prefer a few tweaks:

I don't really see the reason why the GC would need or want a specific
thread to itself - for a start, such a design makes the system slower on
low end systems. There may also be cases where it is possible to choose
'optimal' times to run the GC within a single thread context.

So if it were someday created to run as a separate thread, you'd like to
still be able to have a call 'GC.start.join' or what not, to let it
finish during an 'optimal' time?

...A call to GC.start under these conditions can prevent an OOME, or more
particularly, that all loaded extensions
are told to free when an OOME occurs

And you'd prefer a small change to the GC such that it also starts on
OOME's, correct?

Wow I hope I never run into any memory issues like that!

Yeah those also sound reasonable :)
Wish lists have no bias :)
Take care.
-Roger

···


You want to create a two-dimensional array, right?
It works when you add square brackets:

a = [1,2,3]
a.map{|n| [n,n] }

Yeah I'm just a little lazy and dislike all the brackets :)

···


Roger Pack wrote:

Joel VanderWerf wrote:

The check for garbage can't be done concurrently with ruby code:

   a = []
   $a = a
   a = nil

depending on how the gc process is scheduled, that empty array may
falsely appear to be garbage:

   # start scan * SNAPSHOT *
   a = []
   # scan globals
   $a = a
   a = nil
   # scan locals

Definitely a consideration.

Do you think this helps? The child process is basically operating off a snapshot of memory, since it was a fork. It will thus only 'mark as free' the ruby objects that are inaccessible at the time of the snapshot, thus, in this example, not collecting "a" since it...hasn't been created yet.

The memory is 'reclaimed' by the parent "all at once" so should avoid concurrency weirdness that way, too. There is always "at most" one fork doing a GC. Come to think of it, I'm not sure how this would work if you had a process that actually called fork while running, but for single [true] threaded scripts it should work.

Thoughts?

-R

Oh, I wasn't understanding your idea, which does seem to make sense now. There's the overhead of essentially duplicating the object space of the original process in the fork (since mark will write to memory pages). Maybe it would be worth this cost (esp. on a second processor) for the benefit of not having to stop the main process?

Interesting!

···

--
       vjoel : Joel VanderWerf : path berkeley edu : 510 665 3407

It's not exactly the same, but couldn't you just define a
recursive map function?

class Array
  def map_r &block
    map{|e|Array===e ? e.map_r(&block) : block[e]}
  end
end
p [[1,2,[3]],[[[5],6],7],8].map_r{|e|e.to_s}

=> [["1", "2", ["3"]], [[["5"], "6"], "7"], "8"]

···

On 8/8/08, Mikael Høilund <mikael@hoilund.org> wrote:

Perhaps a bit controversial, but I'd like to see a keyword for the "current
block" as a way to refer to the Proc instance created inside that block.
That would allow for recursive anonymous blocks like:

[[1,2,[3]],[[[5],6],7],8].map { |e|
Array === e ? e.map(&current_block) : e.to_s
}

Mikael Høilund wrote:

Perhaps a bit controversial, but I'd like to see a keyword for the “current block” as a way to refer to the Proc instance created inside that block. That would allow for recursive anonymous blocks like:

[[1,2,[3]],[[[5],6],7],8].map { |e|
  Array === e ? e.map(&current_block) : e.to_s
}

without resorting to ugly syntax like

[[1,2,[3]],[[[5],6],7],8].map &(deep_to_s = proc { |e|
  Array === e ? e.map(&deep_to_s) : e.to_s
})

IIUC, this would also let us take this idiom:

pr = proc {|h,k| h[k] = Hash.new(&pr)}
h = Hash.new(&pr)
h[1][2][3] = 6
p h # ==> {1=>{2=>{3=>6}}}

and simplify to:

h = Hash.new {|h,k| h[k] = Hash.new(&current_block)}

···

--
       vjoel : Joel VanderWerf : path berkeley edu : 510 665 3407

Mikael Høilund wrote:

Perhaps a bit controversial, but I'd like to see a keyword for the
“current block” as a way to refer to the Proc instance created inside
that block. That would allow for recursive anonymous blocks like:

[[1,2,[3]],[[[5],6],7],8].map { |e|
   Array === e ? e.map(&current_block) : e.to_s
}

It appears this may already be possible [Y combinator in Ruby]:
http://53cr.com/blog/2008/10/recursive-lambdas-in-ruby/
Cheers.
-=R
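The trick from the linked post can be done today without any new keyword by passing the lambda to itself (a poor man's Y combinator; deep_to_s is just an invented name):

```ruby
# A recursive 'anonymous' block: the proc receives itself as its first
# argument, so no outer name is needed inside the body.
deep_to_s = lambda do |rec, e|
  Array === e ? e.map { |x| rec.call(rec, x) } : e.to_s
end

deep_to_s.call(deep_to_s, [[1,2,[3]],[[[5],6],7],8])
# => [["1", "2", ["3"]], [[["5"], "6"], "7"], "8"]
```

It works, though the explicit self-passing is arguably as ugly as the `&(deep_to_s = proc { ... })` form it replaces.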

···


Hey!

Not **quite** on topic of garbage collection... But how hard would it be to create a style of method call that doesn't use the . to represent Object.behavior?

In retrospect, it seems like definitely a core language feature that may or may not be impossible to get at... But I figured I'd ask :slight_smile:

Also, is there an easy/hard way to define new %{} style literals? Like for a Rope object, maybe %m{} or something.

Just a newbie's musings.

Thanks,
Ari
--------------------------------------------|
If you're not living on the edge,
then you're just wasting space.

GC wish list:
Isn't it possible to create your own reference checking style object?
Would this be possible, for example.

class Object

  # only used if you call new_with_timely_death_wrapper for your new
  # call--see below
  attr_reader :internal_object

  def new_with_timely_death_wrapper class_to_use, *args
    # since this object is 'only internal', deleting it later will be OK
    @internal_object = class_to_use.new(*args)
  end

  def assign_to_me this_wrapped_object
    dec
    @internal_object = this_wrapped_object.internal_object
    inc
  end

  def do_method name, *args
    @internal_object.send(name, *args) # cleaner than eval'ing a string :)
  end

  def dec
    @internal_object.count -= 1
    recycle_current_object if @internal_object.count == 0
  end

  def inc
    @internal_object.count += 1
  end

  def recycle_current_object
    # traverse internal members of @internal_object--force_recycle them,
    # unless they're wrappers, then just dec them.
    @internal_object.recycle
  end

  def go_out_of_scope
    dec
    # we are toast :) -- this is scary and might not be right
    self.force_recycle
  end

end

Then the example:

a = Array.new_with_timely_death_wrapper(0,0)
b = Array.new_with_timely_death_wrapper(0,0) # the only time you should
                                             # use assign is at the start
b.assign_to_me a  # recycles b's object, points b's internal_object at
                  # a's internal_object, sets count to 2
a.go_out_of_scope # count set to 1
b.go_out_of_scope # count set to 0 -- recycled.

?

···


cross-post from core:
After examining how the 1.8.6 gc works, I had a few thoughts:

Background:

It seems that on a 'cpu intensive' program (one that generates a lot of
discardable objects--quite common), there is a competition between two
aspects of the gc to call garbage_collect first. They are:

1) If you run out of available heap slots, Ruby calls garbage_collect,
and if "FREE_MIN" slots now exist (by default 4096) it returns and
leaves the heap the same size. It also resets the current 'malloc'ed
bytes' counter to 0, since it called garbage_collect.
2) If you reach GC_MALLOC_LIMIT malloc'ed bytes, then it calls
garbage_collect and resets the counter to 0.

Anyway, so what happens in today's implementation is that number 1 is
triggered often (I believe), preventing number 2 from ever even firing,
since it resets the current count of malloc'ed bytes. It's like
garbage_collect is trying to serve two masters, and ends up serving just
the one. I find this curious, as it basically prevents GC_MALLOC_LIMIT
from ever being reached, which is not what you would expect.
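A toy simulation of that interaction (all constants invented, much smaller than Ruby's real ones) shows trigger 1 starving trigger 2:

```ruby
gc_malloc_limit = 1000 # invented stand-in for GC_MALLOC_LIMIT
heap_slots      = 30   # slots recovered by each heap-full collection
malloc_increase = 0
free_slots      = 0
heap_trigger    = 0
malloc_trigger  = 0

1000.times do
  malloc_increase += 30 # steady C-side allocation per iteration
  free_slots -= 1       # each iteration also consumes a heap slot
  if free_slots <= 0    # trigger 1: out of heap slots
    heap_trigger += 1
    free_slots = heap_slots
    malloc_increase = 0 # the collection resets the byte counter...
  elsif malloc_increase > gc_malloc_limit # ...so trigger 2 never fires
    malloc_trigger += 1
    malloc_increase = 0
  end
end

[heap_trigger, malloc_trigger] # => [34, 0]
```

With these numbers the heap empties every 30 iterations, so malloc_increase tops out at 900 and the 1000-byte limit is never reached, exactly the starvation described above.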

Thoughts?

On another point, I have a question on this line of code, run at the end
of garbage collection:
if (malloc_increase > malloc_limit) {
    malloc_limit += (malloc_increase - malloc_limit) * (double)live /
(live + freed); // this line
    if (malloc_limit < GC_MALLOC_LIMIT) malloc_limit = GC_MALLOC_LIMIT;
}

I haven't checked this, but it seems to me that
(malloc_increase - malloc_limit) will always be a very small number (?)
which may not be what was expected. I could be wrong.

So my question is: "what should the GC do, and when?"
Any thoughts?
In my opinion, if it runs out of available heap slots, it should call
garbage_collect AND increase the heap size (so that next time it won't
run out, and will have enough to hopefully reach GC_MALLOC_LIMIT).

I think when it does reach GC_MALLOC_LIMIT malloc'ed bytes, it should
set

new_malloc_limit = GC_MALLOC_LIMIT *
(1 - percent_of_recent_allocated_memory_that_was_freed)

to allow the malloc_limit to change dynamically, maybe with a fixed max
size.
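As a worked example of that proposed formula (the base limit value is invented): if the collection freed 75% of recently allocated memory, the new limit would shrink to a quarter of the base value:

```ruby
gc_malloc_limit = 8_000_000 # invented stand-in for GC_MALLOC_LIMIT
freed_fraction  = 0.75      # 75% of recently allocated memory was freed

new_malloc_limit = gc_malloc_limit * (1 - freed_fraction)
new_malloc_limit.to_i # => 2000000
```

A fixed maximum, as suggested, would cap the other direction: when almost nothing was freed, the limit would otherwise grow toward the full base value.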

So my question is what should best happen?
Ruby rox.

···


> You want to create a two-dimensional array, right?
> It works when you add square brackets:
>
> a = [1,2,3]
> a.map{|n| [n,n] }
Yeah I'm just a little lazy and dislike all the brackets :)


No brackets, huh?

>> a = [1,2,3]
=> [1, 2, 3]
>> a.zip(a)
=> [[1, 1], [2, 2], [3, 3]]

···

On Thu, Mar 27, 2008 at 6:34 PM, Roger Pack <rogerpack2005@gmail.com> wrote:

Roger Pack wrote:

You want to create a two-dimensional array, right?
It works when you add square braces:

a = [1,2,3]
a.map{|n| [n,n] }

Yeah I'm just a little lazy and dislike all the brackets :)

I read somewhere that the best computer in the world was the one that Scotty uses on Star Trek, because no matter what he wanted to do he could program it in two or three button presses.

Until we get one of Scotty's computers, though, programming requires typing. Lots and lots of typing. Maybe a little less with Ruby than, say, with Java, but still...lots.

···

--
RMagick: http://rmagick.rubyforge.org/
RMagick 2: http://rmagick.rubyforge.org/rmagick2.html

Joel VanderWerf wrote:

   # start scan * SNAPSHOT *

Oh, I wasn't understanding your idea, which does seem to make sense now.
There's the overhead of essentially duplicating the object space of the
original process in the fork (since mark will write to memory pages).
Maybe it would be worth this cost (esp. on a second processor) for the
benefit of not having to stop the main process?

Now that you mention it, this might work even better in
conjunction with Hongli's copy-on-write friendly GC. Then you wouldn't
have to copy the entire ruby space per GC :)

Interesting!

I can't take much credit for the idea since it just came to me while
reading scriptures yesterday morning :P
Now you know what distracts me a lot.

I'll probably hack it up sometime, first version as a straight patch to
1.8.6

-R

···


Perhaps a bit controversial, but I'd like to see a keyword for the "current
block" as a way to refer to the Proc instance created inside that block.
That would allow for recursive anonymous blocks like:

[[1,2,[3]],[[[5],6],7],8].map { |e|
Array === e ? e.map(&current_block) : e.to_s
}

It's not exactly the same, but couldn't you just define a
recursive map function?

class Array
  def map_r &block
    map{|e|Array===e ? e.map_r(&block) : block[e]}
  end
end
p [[1,2,[3]],[[[5],6],7],8].map_r{|e|e.to_s}

=> [["1", "2", ["3"]], [[["5"], "6"], "7"], "8"]

That's not optimal if it's just for a single usage, especially not if it requires monkey-patching a built-in class. I could imagine this being used pretty much any time recursive behavior is necessary for an iterator. And besides, what's more descriptive?

array.map { |e|
   Array === e ? e.map(&current_block) : e.to_s
}

or

array.map_r { |e| e.to_s } # Please look somewhere in the source tree for the definition of Array#map_r

?

IIUC, this would also let us take this idiom:

pr = proc {|h,k| h[k] = Hash.new(&pr)}
h = Hash.new(&pr)
h[1][2][3] = 6
p h # ==> {1=>{2=>{3=>6}}}

It's dangerous to go alone. Until we have this sort of divine magic, take this:

>> h = Hash.new { |h, k| h[k] = Hash.new(&h.default_proc) }
>> h[:a][:b][:c] = :o
>> h
=> {:a=>{:b=>{:c=>:o}}}

···

On Aug 8, 2008, at 20:19, Adam Shelly wrote:

On 8/8/08, Mikael Høilund <mikael@hoilund.org> wrote:

On Aug 8, 2008, at 21:30, Joel VanderWerf wrote:

--
Mikael Høilund
http://hoilund.org/