Tools for debugging a memory leak in a Ruby process

Hi all:
    I wrote a Ruby program that fetches web pages from a batch of URLs.
But after running for a while it uses a lot of memory, which leads to it
being killed by the system unexpectedly. I want to debug this problem. Are
there any good tools that let me see all the objects (including each
object's type) that have not been released by the GC while the program is
running?
Any suggestions are welcome.

Best regards

Hi
I don't know these tools yet,
but I have seen that some gems use an enormous amount of memory (which seems
to be "normal"), maybe because the whole website is being parsed.

I have seen many -g flags when compiling the sources, so gdb could help
(but that's the hard way).
Another option is ruby-debug, but I haven't used it yet.
What do you do in your program?
(Fetching alone shouldn't use much memory.)
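
To check whether a single step (such as parsing one page) is the heavy allocator, one option is to count allocations around it. This is only a minimal sketch using core GC.stat; allocations_during is a made-up helper name, and the block body is a stand-in for the real fetching or parsing:

```ruby
# Count how many objects a block allocates, using core GC statistics only.
# `allocations_during` is a hypothetical helper, not part of any library.
def allocations_during
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

# Stand-in for "parse one page": build a few thousand short strings.
count = allocations_during { 5_000.times.map { |i| "row #{i}" } }
puts "allocated roughly #{count} objects"
```

A high number here only shows that the step allocates a lot, not that anything leaks; garbage that gets collected later is still counted.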

Bye
Berg

···

On 14.09.2016 at 07:36, "timlen tse" <tinglenxan@gmail.com> wrote:


Unsubscribe: <mailto:ruby-talk-request@ruby-lang.org?subject=unsubscribe>
<http://lists.ruby-lang.org/cgi-bin/mailman/options/ruby-talk>

There are these:
https://rubygems.org/search?utf8=✓&query=memory+profiler

I can't vouch for anything there, BTW; it was just a search off the top of
my head.
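
Apart from gems, the objspace extension that ships with Ruby can dump every live object together with its type. A rough sketch, assuming a reasonably recent MRI; heap_type_counts is a made-up helper name, and the regex relies on the "type" field that the dump writes for each record:

```ruby
require "objspace"
require "tempfile"

# Dump all live objects as JSON lines to a temp file and tally them by type.
# `heap_type_counts` is a hypothetical helper name for this sketch.
def heap_type_counts
  GC.start # collect garbage first so mostly reachable objects are dumped
  counts = Hash.new(0)
  Tempfile.create("heap") do |file|
    ObjectSpace.dump_all(output: file)
    file.flush
    file.rewind
    file.each_line do |line|
      type = line[/"type":\s*"(\w+)"/, 1]
      counts[type] += 1 if type
    end
  end
  counts
end

counts = heap_type_counts
counts.sort_by { |_, n| -n }.first(5).each do |type, n|
  printf "%-10s %10d\n", type, n
end
```

Taking two such tallies some minutes apart and comparing them should show which types keep growing.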

Cheers

···

On 14 September 2016 at 15:35, timlen tse <tinglenxan@gmail.com> wrote:


--
  Matthew Kerwin
  http://matthew.kerwin.net.au/

I think it shouldn't, but it did.
In my program, I iterate through a table (named urls) using
ActiveRecord, get each row's URL (a field), fetch the web page using
HTTParty, parse it using Nokogiri, then extract the target tag content and
store it in the database.

On Wed, Sep 14, 2016 at 2:48 PM, A Berger <aberger7890@gmail.com> wrote:

···


Can you show us your code (you can use pastebin, http://pastebin.com/)?
Maybe there's a bottleneck in it that we could identify.

···

On Wed, Sep 14, 2016, at 09:35, timlen tse wrote:


Here's a very simplistic approach to counting objects which might or
might not help:

class MemDiff
  # Prints, per class, how the number of live objects changed since the
  # previous call. The first call only records a baseline.
  def dump
    counts = ObjectSpace.each_object.each_with_object(Hash.new(0)) { |o, h| h[o.class] += 1 }

    if @last
      counts.keys.sort_by { |c| c.name || c.inspect }.each do |c|
        diff = counts[c] - @last[c]
        printf "%-30s %20d\n", c, diff if diff != 0
      end
    end

    @last = counts
    self
  end
end

md = MemDiff.new

md.dump

10.times.map(&:to_s)

md.dump

This is of course not a real memory debugger, as it won't give you the
allocation site. But the type of object that is increasing might give
you an indication.
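
If the allocation site is needed as well, the objspace extension can record it. A small sketch, where the retained array is just a stand-in for whatever objects turn out to be piling up:

```ruby
require "objspace"

sites = Hash.new(0)

# Allocation file and line are recorded only for objects created while
# tracing is active, so the lookups happen inside the same block.
ObjectSpace.trace_object_allocations do
  retained = Array.new(3) { |i| "page #{i}" } # stand-in for suspicious objects
  retained.each do |obj|
    file = ObjectSpace.allocation_sourcefile(obj)
    line = ObjectSpace.allocation_sourceline(obj)
    sites["#{file}:#{line}"] += 1
  end
end

sites.each { |site, n| puts "#{n} objects allocated at #{site}" }
```

Tracing slows allocation down noticeably, so it is probably best enabled only while hunting the leak.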

Kind regards

robert

···

On Wed, Sep 14, 2016 at 9:35 AM, timlen tse <tinglenxan@gmail.com> wrote:

--
[guy, jim, charlie].each {|him| remember.him do |as, often| as.you_can
- without end}
http://blog.rubybestpractices.com/

Here is my code, a rake task

desc "Daily update"
task :daily_update do
  crawler = Search.new
  machine_count = 4
  machine_id = ENV["MACHINE_ID"] || 1
  loop do
    items = Item.select(:id, :product_id, :sold_history)
                .where("done=0 and id % #{machine_count}=#{machine_id}")
                .take(100)
    break if items.empty?
    items.each do |item|
      begin
        data = crawler.fetch_item_info(item.product_id)
        sold_history = JSON.parse(item.sold_history).push(data[:sold])
        sold_history = JSON.generate(sold_history)
        item.update(data.merge({sold_history: sold_history, :done => 1}))
      rescue => e
        # Items that raise keep done=0, so the next query selects them again.
        warn "Error in daily update #{e.message}"
        next
      end
    end
  end
end

and the following are my custom classes:

require "httparty"

class Search
  def initialize
    @user_agent = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
  end

  # Fetches the product page and returns the parsed fields as a hash.
  def fetch_item_info(product_id)
    webpage = get("http://list.qoo10.sg/g/#{product_id}")
    DetailParser.parse(webpage)
  end

  private

  def get(url)
    HTTParty.get(url, headers: {"User-Agent" => @user_agent}).body
  end
end

require "nokogiri"
require "uri"

class DetailParser
  # Extracts price, sold count, shipping origin and image URL from the page HTML.
  def self.parse(webpage)
    webpage = Nokogiri::HTML(webpage)
    price = webpage.xpath("//strong[@data-price!='']")[-1]
    if price.nil?
      price = webpage.at("//div[@id='ctl00_ctl00_MainContentHolder_MainContentHolderNoForm_retailPricePanel']/dl/dd")
    end
    price = price.nil? ? 0.00 : price.text[/(\d|\.){1,}/]
    price ||= 0.00
    sold = webpage.at("//span[@class='sold']/strong")
    sold = sold.nil? ? 0 : sold.text.to_i
    ship_from = webpage.at("//tr[@class='shpng']/td/text()") || ''
    img_url = webpage.at("//input[@id='basic_image']")
    # URI.decode was removed in Ruby 3.0; on newer Rubies a replacement is needed.
    img_url = img_url.nil? ? "" : URI.decode(img_url['value'])

    {sold: sold, pic: img_url, shipping_from: ship_from.to_s, price: price}
  end
end
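
For a batch loop like the one above, one way to see whether memory is really retained (and not just garbage waiting for the next GC) might be to log live heap slots after a forced GC at the end of each batch. This is only a sketch: live_slots_after_gc and cache are made-up names, and cache deliberately simulates a reference that keeps objects alive:

```ruby
# Report how many heap slots are still live after a full GC. Steady growth
# from batch to batch suggests objects are being retained somewhere.
# `live_slots_after_gc` is a hypothetical helper name.
def live_slots_after_gc
  GC.start
  GC.stat(:heap_live_slots)
end

cache = [] # simulates an accidental long-lived reference
baseline = live_slots_after_gc

3.times do |batch|
  cache.concat(Array.new(10_000) { |i| "item #{batch}-#{i}" })
  growth = live_slots_after_gc - baseline
  puts "batch #{batch}: #{growth} more live slots than at start"
end
```

If the number levels off after a few batches, the growth seen from outside the process is more likely heap fragmentation or native memory than Ruby objects.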

On Wed, Sep 14, 2016 at 5:07 PM, Jérémy SEBAN <jeremy@seban.eu> wrote:

···
