GC help

I am still running out of memory with my ruby application. For some reason,
there are objects that are not getting reclaimed long after their use is up.
I understand the mark and sweep GC algorithm. Is there a method in Ruby to
set an object's memory space to be collected?

···

--
Joey Marino

Joey Marino wrote:

I am still running out of memory with my ruby application. For some reason,
there are objects that are not getting reclaimed long after their use is up.
I understand the mark and sweep GC algorithm. Is there a method in Ruby to
set an object's memory space to be collected?

Ruby's GC will eventually collect unreferenced objects without any intervention on your part. It may not collect them on the first sweep, but it will collect them. You can force GC to run earlier than normal by calling GC.start, but that doesn't necessarily mean that every unreferenced object will be collected on that sweep.

In one of your earlier posts you said you were using a 3rd-party library? Have you determined that it's not responsible for the memory allocations?

Have you determined that no references exist to objects that you think should have been cleaned up?
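
One rough way to check whether strings really are piling up is to count live String objects before and after a forced collection. This is only a sketch: `live_strings` and `churn` are hypothetical helper names, and the exact numbers will vary with the Ruby version and its GC.

```ruby
# Count the String objects the interpreter currently considers live.
def live_strings
  ObjectSpace.each_object(String).count
end

# Allocate many short-lived strings; nothing references them afterwards.
def churn(n)
  n.times { |i| "temporary string #{i}" }
  nil
end

before = live_strings
churn(100_000)
GC.start                       # ask for a full collection right now
after = live_strings

puts "live strings before: #{before}, after GC: #{after}"
```

If the "after" figure keeps climbing from one iteration to the next, something is probably still holding references to the strings.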

···

--
RMagick: http://rmagick.rubyforge.org/
RMagick 2: http://rmagick.rubyforge.org/rmagick2.html

I found out it is not the 3rd party code. I am downloading images (~320,000
ct.) as strings, and saving each to a file. The file handles are getting
closed. Using the Dike gem, I think I determined the bulk of the leaked
memory is in these string variables. But what I don't understand is that the
variables containing the strings are local to a method of an object and
should be unreferenced when that method is done, right? I've invoked GC at
the end of the method and after each iteration that the method gets called.
I've even set the strings to nil after they've been saved to a file and
before GC is called. The app is using 2G of RAM and 4G of swap before it
runs out of memory and crashes about 1/3rd of the way through. I'm really
starting to doubt Ruby's ability to do memory-intensive work. Any ideas?

···

On Sun, Mar 30, 2008 at 3:53 PM, Tim Hunter <TimHunter@nc.rr.com> wrote:

--
Joey Marino

Joey Marino wrote:

I found out it is not the 3rd party code. I am downloading images (~320,000
ct.) as strings, and saving each to a file.

Why? If the images come from the Internet and go to disk, why do you need to read them into Ruby working storage? Can't you just shell out to "wget" or "curl"?

Joey Marino wrote:

I found out it is not the 3rd party code. I am downloading images (~320,000
ct.) as strings, and saving each to a file. The file handles are getting
closed. Using the Dike gem, I think I determined the bulk of the leaked
memory is in these string variables. But what I don't understand is that the
variables containing the strings are local to a method of an object and
should be unreferenced when that method is done, right? I've invoked GC at
the end of the method and after each iteration that the method gets called.
I've even set the strings to nil after they've been saved to a file and
before GC is called. The app is using 2G of RAM and 4G of swap before it
runs out of memory and crashes about 1/3rd of the way through. I'm really
starting to doubt Ruby's ability to do memory-intensive work. Any ideas?

Many people have used Ruby for long-running tasks that use a lot of memory. If Ruby was not collecting unused strings, don't you think somebody would have noticed it by now?

Actually, I think you've ruled out Ruby being the problem. What else is running? Could it be using up the memory?

···

--
RMagick: http://rmagick.rubyforge.org/
RMagick 2: http://rmagick.rubyforge.org/rmagick2.html

I found out it is not the 3rd party code. I am downloading images (~320,000
ct.) as strings, and saving each to a file. The file handles are getting
closed. Using the Dike gem, I think I determined the bulk of the leaked
memory is in these string variables. But what I don't understand is that the
variables containing the strings are local to a method of an object and
should be unreferenced when that method is done, right? I've invoked GC at
the end of the method and after each iteration that the method gets called.
I've even set the strings to nil after they've been saved to a file and
before GC is called. The app is using 2G of RAM and 4G of swap before it
runs out of memory and crashes about 1/3rd of the way through. I'm really
starting to doubt Ruby's ability to do memory-intensive work. Any ideas?

Do you have a small bit of code that reproduces the problem?

How are you downloading the images? Net::HTTP ? Or...?

If you can provide a small program that exhibits the memory
leak I'm sure others here would be happy to try it on their
systems as well.

Regards,

Bill

···

From: "Joey Marino" <joey.da3rd@gmail.com>

Usually, yes. However, there are ways a reference could be kept live
beyond the end of the method. Obviously, if you pass the string out of the
method to something else that keeps a reference to it, that would do it. But
also, if you pass a closure (e.g., a proc object) that refers to the method
call's binding (I think that's the right terminology) and it gets stored
somewhere else, that would keep the method's local variables "live" even after
the method exits.
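
A small sketch of the closure case described above. `CALLBACKS` and `download_one` are made-up names for illustration, and `Binding#local_variable_get` assumes Ruby 2.1 or later; nothing here comes from the original application.

```ruby
# Hypothetical registry that outlives any single method call.
CALLBACKS = []

def download_one
  pic = "x" * 1_000_000          # stands in for the downloaded image data
  # The lambda never mentions `pic`, but it still captures the method's binding.
  CALLBACKS << lambda { :done }
  nil
end

download_one

# The method has returned, yet its local is still reachable through the closure.
retained = CALLBACKS.last.binding.local_variable_get(:pic)
puts retained.size
```

As long as the lambda stays in `CALLBACKS`, the GC has to treat the string as referenced, no matter how often `GC.start` is called.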

It's hard to say if something like that might be happening without the code.

···

On Sun, Mar 30, 2008 at 5:07 PM, Joey Marino <joey.da3rd@gmail.com> wrote:

I found out it is not the 3rd party code. I am downloading images (~320,000
ct.) as strings, and saving each to a file. The file handlers are getting
closed. Using the Dike gem, I think I determined the bulk of the leaked
memory is in these string variables. But what I don't understand is that the
variables containing the strings are local to a method of an object and
should be unreferenced when that method is done, right?

Good question. Unfortunately, this is not a simple HTTP server. It's an
industry-standard client/server protocol called RETS (Real Estate
Transaction Standard). It requires a third-party library in order to
interface with the server. I wish it were that simple!!

···

On Sun, Mar 30, 2008 at 8:16 PM, M. Edward (Ed) Borasky <znmeb@cesmail.net> wrote:


--
Joey Marino

I shouldn't have criticized Ruby like that; it actually is my second
favorite language, right behind PHP. If I learned RoR, it might become my
favorite. It really does a good job, is easy to write, and has a lot of
features. Not to mention a great community.
I am just really frustrated about this stupid app that I can't get working
right. I am working on a workaround, since it is only going to do a major
download once and then update daily.
Thanks for all the help though; it feels good that there is a place to turn
to for help.

···

On Sun, Mar 30, 2008 at 10:31 PM, Bill Kelly <billk@cts.com> wrote:


--
Joey Marino

I have noticed :->

Michal

···

On 31/03/2008, Tim Hunter <TimHunter@nc.rr.com> wrote:


Ok, I was able to get it all into one class. The problem lies in this class:
class Picture

def initialize(db,rets,rets_class)
   @db = db
   @rets = rets
   @rets_class = rets_class
   @attempts = 0
end

def getPic(key)
   begin
     get_object_request = GetObjectRequest.new(@rets_class, "Photo")
     get_object_request.add_all_objects(key)
     get_object_response = @rets.session.get_object(get_object_request)
     content_type_suffixes = { "image/jpeg" => "jpg"}
     makePicDir(key)
     get_object_response.each_object do |object_descriptor|
       object_key = object_descriptor.object_key
       obj_id = object_descriptor.object_id
       content_type = object_descriptor.content_type
       description = object_descriptor.description
       #print "#{object_key} object \##{object_id}"
       #print ", description: #{description}" if !description.empty?
       #puts
       suffix = content_type_suffixes[content_type]
       pic = object_descriptor.data_as_string
       savePic(key,obj_id.to_s,suffix,description,pic)
     end
     get_object_response = nil
  rescue => e
    puts "Error retrieving pictures for #{key}: " + e
    if @attempts <= 5
      @attempts += 1
      puts "retrying"
      retry
    else
      puts "failed"
      @attempts = 0
    end
  end
  @attempts = 0
end

def getThumb(key)
   begin
     get_object_request = GetObjectRequest.new(@rets_class, "Thumbnail")
     get_object_request.add_all_objects(key)
     get_object_response = @rets.session.get_object(get_object_request)
     content_type_suffixes = { "image/jpeg" => "jpg"}
     get_object_response.each_object do |object_descriptor|
       object_key = object_descriptor.object_key
       obj_id = object_descriptor.object_id
       content_type = object_descriptor.content_type
       description = object_descriptor.description
       #print "#{object_key} object \##{object_id}"
       #print ", description: #{description}" if !description.empty?
       #puts
       suffix = content_type_suffixes[content_type]
       pic = object_descriptor.data_as_string
       savePic(key,obj_id.to_s,suffix,description,pic,true)
     end
     get_object_response = nil
  rescue => e
    puts "Error retrieving thumbs for #{key}: " + e
    if @attempts <= 5
      @attempts += 1
      puts "retrying"
      retry
    else
      puts "failed"
      @attempts = 0
    end
  end
  @attempts = 0
end

def makePicDir(key)
   FileUtils.mkpath("#{$pic_dir}#{key}/thumb")
end

def savePic(key,id,suffix,desc,pic,thumb_bool=false)
   if thumb_bool
     file_name = $pic_dir + key + "/thumb/" + id + "." + suffix
     location = "/" + key + "/thumb/" + id + "." + suffix
   else
     file_name = $pic_dir + key + "/" + id + "." + suffix
     location = "/" + key + "/" + id + "." + suffix
   end
   self.savePicFile(file_name,pic)
   size = File.size(file_name)
   if thumb_bool
     self.insertThumbDB(key,id,location)
   else
     self.insertPicDB(key,id,desc,size,location)
   end
end

def savePicFile(file_name,pic)
   f = File.open(file_name, "wb")
   f << pic
   f.close
end

def insertPicDB(key,id,desc,size,location)
   description = @db.database.escape_string(desc)
   if @db.DBinsert("PICS","pkey,id,description,size,location","#{key},#{id},'#{description}','#{size}','#{location}'")
   # puts "#{key} #{id} pic added"
     print ":"
   end
end

def insertThumbDB(key,id,location)
   if @db.DBupdate("PICS","thumb = '#{location}'"," pkey = #{key} and id = #{id}")
    # puts "#{key} #{id} thumb added"
      print "."
   end
end

def deletePic(key)
   self.deletePicDir(key)
   self.deletePicDB(key)
end

def deletePicDir(key)
   if File.exists?("#{$pic_dir}#{key}")
     FileUtils.remove_dir("#{$pic_dir}#{key}")
   end
end

def deletePicDB(key)
  if @db.DBdelete("PICS","pkey = #{key}")
    print "-"
# puts "#{key} pics deleted from db"
  end
end

end #end class

···

On Wed, Apr 2, 2008 at 6:02 AM, Michal Suchanek <hramrach@centrum.cz> wrote:

--
Joey Marino

Many people have used Ruby for long-running tasks that use a lot of memory.
If Ruby was not collecting unused strings, don't you think somebody would
have noticed it by now?

I have noticed :->

Michal

I have too, and it drives me crazy when my mongrel instances eat up
600MB of memory.
I'd be willing to offer a bounty of $150 to anyone able to clear this
up. It happens especially often when you run multiple threads, it
seems.
Probably a rails thing, but anyway.
Enough ranting.
Have a good one.
-R

···

--
Posted via http://www.ruby-forum.com/.

A few suggestions, some not really related to the problem:

Ok, I was able to get it all into one class. The problem lies in this class:
class Picture

  def initialize(db,rets,rets_class)
   @db = db
   @rets = rets
   @rets_class = rets_class
   @attempts = 0
  end

  def getPic(key)
   begin
     get_object_request = GetObjectRequest.new(@rets_class, "Photo")
     get_object_request.add_all_objects(key)
     get_object_response = @rets.session.get_object(get_object_request)
     content_type_suffixes = { "image/jpeg" => "jpg"}

Make content_type_suffixes a class constant, or an instance variable if you
need to append to it.
As written, the hash is constructed and thrown away on every method call.

     makePicDir(key)
     get_object_response.each_object do |object_descriptor|
       object_key = object_descriptor.object_key
       obj_id = object_descriptor.object_id
       content_type = object_descriptor.content_type
       description = object_descriptor.description
       #print "#{object_key} object \##{object_id}"

- #print ", description: #{description}" if !description.empty?
+ #print ", description: #{description}" unless description.empty? # a matter of taste/style

       #puts
       suffix = content_type_suffixes[content_type]
       pic = object_descriptor.data_as_string
       savePic(key,obj_id.to_s,suffix,description,pic)
     end
     get_object_response = nil
  rescue => e
    puts "Error retrieving pictures for #{key}: " + e
    if @attempts <= 5
      @attempts += 1
      puts "retrying"
      retry
    else
      puts "failed"
      @attempts = 0
    end
  end
  @attempts = 0
  end

It seems that you could refactor the common code of these two methods
into a new one.
The benefit would be shorter, more readable code and a better split of
responsibilities; the drawback is slightly slower execution.

  def getThumb(key)
   begin
     get_object_request = GetObjectRequest.new(@rets_class, "Thumbnail")
     get_object_request.add_all_objects(key)
     get_object_response = @rets.session.get_object(get_object_request)
     content_type_suffixes = { "image/jpeg" => "jpg"}
     get_object_response.each_object do |object_descriptor|
       object_key = object_descriptor.object_key
       obj_id = object_descriptor.object_id
       content_type = object_descriptor.content_type
       description = object_descriptor.description
       #print "#{object_key} object \##{object_id}"
       #print ", description: #{description}" if !description.empty?
       #puts
       suffix = content_type_suffixes[content_type]
       pic = object_descriptor.data_as_string
       savePic(key,obj_id.to_s,suffix,description,pic,true)
     end
     get_object_response = nil
  rescue => e
    puts "Error retrieving thumbs for #{key}: " + e
    if @attempts <= 5
      @attempts += 1
      puts "retrying"
      retry
    else
      puts "failed"
      @attempts = 0
    end
  end
  @attempts = 0
  end

  def makePicDir(key)
   FileUtils.mkpath("#{$pic_dir}#{key}/thumb")
  end

  def savePic(key,id,suffix,desc,pic,thumb_bool=false)
   if thumb_bool
     file_name = $pic_dir + key + "/thumb/" + id + "." + suffix
     location = "/" + key + "/thumb/" + id + "." + suffix
   else
     file_name = $pic_dir + key + "/" + id + "." + suffix
     location = "/" + key + "/" + id + "." + suffix
   end
   self.savePicFile(file_name,pic)

self is not necessary here; the same applies below

   size = File.size(file_name)
   if thumb_bool
     self.insertThumbDB(key,id,location)
   else
     self.insertPicDB(key,id,desc,size,location)
   end
  end

  def savePicFile(file_name,pic)

- f = File.open(file_name, "wb")
+ File.open(file_name, "wb") do |f|

   f << pic

- f.close
+ end # automatic close on exceptions

  end

  def insertPicDB(key,id,desc,size,location)
   description = @db.database.escape_string(desc)
   if @db.DBinsert("PICS","pkey,id,description,size,location","#{key},#{id},'#{description}','#{size}','#{location}'")
   # puts "#{key} #{id} pic added"
     print ":"
   end
  end

  def insertThumbDB(key,id,location)
   if @db.DBupdate("PICS","thumb = '#{location}'"," pkey = #{key} and id = #{id}")
    # puts "#{key} #{id} thumb added"
      print "."
   end
  end

  def deletePic(key)
   self.deletePicDir(key)
   self.deletePicDB(key)
  end

  def deletePicDir(key)

- if File.exists?("#{$pic_dir}#{key}")
- FileUtils.remove_dir("#{$pic_dir}#{key}")
- end
+ pic_dir = $pic_dir + key # save one allocation
+ FileUtils.remove_dir pic_dir if File.directory? pic_dir

  end

  def deletePicDB(key)
  if @db.DBdelete("PICS","pkey = #{key}")
    print "-"
# puts "#{key} pics deleted from db"
  end
  end

end #end class
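
Pulling the inline suggestions together, here is a rough sketch of how the class could look with a class constant, one shared fetch method, and block-form File.open. It is not the original code: `fetcher` is a hypothetical stand-in for the libRETS request/response calls (it is assumed to return `[id, content_type, data]` triples), the database calls are left out, and the method names are invented for the example.

```ruby
require "fileutils"
require "tmpdir"

class Picture
  # Built once for the class instead of on every call.
  CONTENT_TYPE_SUFFIXES = { "image/jpeg" => "jpg" }.freeze
  MAX_ATTEMPTS = 5

  # fetcher: hypothetical callable taking (type, key), replacing the libRETS code.
  def initialize(fetcher, pic_dir)
    @fetcher = fetcher
    @pic_dir = pic_dir
  end

  def get_pic(key)
    fetch_objects("Photo", key, false)
  end

  def get_thumb(key)
    fetch_objects("Thumbnail", key, true)
  end

  private

  # Shared body of the two original methods, with a bounded retry.
  # Returns the list of file names written (empty if every attempt failed).
  def fetch_objects(type, key, thumb)
    attempts = 0
    begin
      @fetcher.call(type, key).map do |id, content_type, data|
        suffix = CONTENT_TYPE_SUFFIXES[content_type]
        save_pic_file(path_for(key, id, suffix, thumb), data)
      end
    rescue => e
      attempts += 1
      retry if attempts <= MAX_ATTEMPTS
      puts "Error retrieving #{type} for #{key}: #{e.message}"
      []
    end
  end

  def path_for(key, id, suffix, thumb)
    dir = thumb ? File.join(@pic_dir, key, "thumb") : File.join(@pic_dir, key)
    FileUtils.mkdir_p(dir)
    File.join(dir, "#{id}.#{suffix}")
  end

  # The block form closes the file even if the write raises.
  def save_pic_file(file_name, pic)
    File.open(file_name, "wb") { |f| f << pic }
    file_name
  end
end

# Usage with a fake fetcher and a throwaway directory.
Dir.mktmpdir do |dir|
  fake = lambda { |_type, _key| [[1, "image/jpeg", "not really a jpeg"]] }
  pics = Picture.new(fake, dir)
  puts pics.get_pic("42").inspect
  puts pics.get_thumb("42").inspect
end
```

Keeping the retry counter in a local variable rather than in `@attempts` also means there is no state to reset by hand after each call.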

I'd suspect the database code is keeping some cache. This code seems fine.

I see you are using libRETS. Did you try to rule it out by replacing
calls to libRETS (especially to data_as_string) with some stubs
(for example, creating a long random string on the fly)?
If you are on unix, you can use IO.read("/dev/random", 100000). If you
are on Windows, choose another long enough file to read.
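
A portable way to try the stub idea, sketched with the standard securerandom library instead of reading a device file (reading /dev/random can block on some systems). `StubDescriptor` is a hypothetical class; only the method name data_as_string is taken from the posted code.

```ruby
require "securerandom"

# Hypothetical stand-in for the object descriptor that libRETS yields.
class StubDescriptor
  def initialize(size = 100_000)
    @size = size
  end

  # Same method name as in the posted code, but no libRETS involved.
  def data_as_string
    SecureRandom.random_bytes(@size)
  end
end

pic = StubDescriptor.new.data_as_string
puts pic.bytesize
```

If memory stays flat when the download loop is fed stub descriptors like this, the leak is more likely in the wrapper than in the Ruby code.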

The documentation says that GetData() abandons ownership of the object
it returns. It's possible that the SWIG-generated wrapper doesn't handle
this properly.

Jano

···

On Wed, Apr 2, 2008 at 6:52 PM, Joey Marino <joey.da3rd@gmail.com> wrote:

>> Many people have used Ruby for long-running tasks that use a lot of memory.
>> If Ruby was not collecting unused strings, don't you think somebody would
>> have noticed it by now?
>
> I have noticed :->
>
> Michal

I have too, and it drives me crazy when my mongrel instances eat up
600MB of memory.
I'd be willing to offer a bounty of $150 to anyone able to clear this
up. It happens especially often when you run multiple threads, it
seems.

Send me the 150 and I'll send you a copy of Ramaze ;)

···

On 4/5/08, Roger Pack <rogerpack2005@gmail.com> wrote:


What version of ruby are you running? I recently saw an issue using
1.8.6 with a low patch level. Upgrading to p114 might solve things for
you.
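
To answer the version question precisely, the interpreter can report its own version and patch level. A tiny sketch; `ruby_build_info` is just an illustrative helper name.

```ruby
# Build a one-line description of the running interpreter,
# e.g. to paste into a bug report.
def ruby_build_info
  "ruby #{RUBY_VERSION} patchlevel #{RUBY_PATCHLEVEL} (#{RUBY_PLATFORM})"
end

puts ruby_build_info
```

Running `ruby -v` from the shell prints similar information.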

···

On Fri, Apr 4, 2008 at 7:26 PM, Roger Pack <rogerpack2005@gmail.com> wrote:

