Hi All,
I am a newbie at both scraping and multi-threading. Recently I
had to implement the two in one of my applications. Please find my
code in the attachment. I am facing some peculiar issues and I don't
know the reason. The code works pretty well on my Ubuntu 7.04 machine
but throws some "sysread" and "read_status_line" errors when run on
Windows. What could the problem be? Any help is greatly appreciated.
regards,
Venkat Bagam
Attachments:
http://www.ruby-forum.com/attachment/1517/scrape.rb
--
Posted via http://www.ruby-forum.com/.
There are a number of problems, among them:
1. deriving from Monitor does not do anything for you in
this case, since you don't use any of its locking
facilities (deriving from Monitor like this probably
isn't a good approach anyway)
2. WWW::Mechanize instances are not safe to share
between threads; it's best to create a separate agent
per thread.
3. Using the Timeout class on complicated libraries
can often break them. If you really need operations
to time out, it is best to see if the library provides
direct support for timeouts on its operations.
-mental
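To illustrate points 2 and 3, here is a minimal sketch, not the code from the attachment. It uses Net::HTTP from the standard library instead of WWW::Mechanize so that it runs without extra gems; build_client and the host names are made up for the example. Each thread builds its own client, and the timeouts are set through the library's own open_timeout and read_timeout settings rather than by wrapping calls in Timeout.

```ruby
require 'net/http'

# Hypothetical helper: builds a fresh client that uses the library's
# own timeout settings instead of the Timeout class.
def build_client(host, port = 80)
  http = Net::HTTP.new(host, port) # no connection is opened yet
  http.open_timeout = 5            # seconds to wait for the connection
  http.read_timeout = 10           # seconds to wait for each read
  http
end

# One client per thread: nothing is shared, so nothing needs locking.
hosts = %w[example.com example.org] # placeholder hosts
clients = hosts.map { |host|
  Thread.new do
    client = build_client(host)
    # A real scraper would issue its requests with `client` here.
    client
  end
}.map(&:value) # Thread#value joins the thread and returns its result
```

Mechanize agents appear to offer comparable timeout settings, but check the documentation for the version you have installed.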
sysread uses a "low-level" read. In general, I wouldn't be confident
that anything marked as "low-level" can be mixed well with
multi-threading. On my system, sysread is slower than ordinary read,
so I fail to see the advantage versus the standard IO#read.
Daniel Brumbaugh Keeney
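As a small illustration of why low-level and ordinary reads do not mix well, here is a sketch that uses a temporary file. The helper name mixed_read_result and the file contents are invented for the example, and this is not claimed to be the cause of the Windows errors. After a buffered read (gets) has left data in Ruby's internal buffer, a following sysread on the same IO object is expected to be refused with an IOError.

```ruby
require 'tempfile'

# Hypothetical helper: does a buffered read followed by a low-level
# read on the same file and reports what happened.
def mixed_read_result
  Tempfile.create('sysread-demo') do |file|
    file.write("first line\nsecond line\n")
    file.flush
    file.rewind
    file.gets # buffered read; leaves "second line\n" in Ruby's buffer
    begin
      file.sysread(6) # low-level read that bypasses the buffer
    rescue IOError => e
      "IOError: #{e.message}"
    end
  end
end

puts mixed_read_result
```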
On Wed, Mar 5, 2008 at 3:10 AM, Venkat Bagam <bagam_venkat@hotmail.com> wrote:
> The code works pretty well on my ubuntu7.04 but throws
> some "sysread", "read_status_line" errors when run on Windows. What
> could be the obvious problem?
Mental Guy wrote:
> There are a number of problems, among them:
>
> 1. deriving from Monitor does not do anything for you in
> this case, since you don't use any of its locking
> facilities (deriving from Monitor like this probably
> isn't a good approach anyway)
> -mental
I used a synchronized block in the "download_file(html_link)"
method. Doesn't that provide any locking? Am I wrong again?
In the code you gave, shared resources like @@html_links and @agent
aren't protected at all.
-mental
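For what it's worth, here is a rough sketch of what protecting a shared list could look like. It is not the attached code: LinkQueue, next_link, done and the example URLs are all made up, and the actual download is left out. The Monitor is held as a member instead of being inherited from, and every access to the shared data goes through synchronize.

```ruby
require 'monitor'

# Hypothetical stand-in for a shared list such as @@html_links.
class LinkQueue
  def initialize(links)
    @links = links.dup
    @lock = Monitor.new # held as a member, not a superclass
  end

  # Removes and returns the next link, or nil when none are left.
  def next_link
    @lock.synchronize { @links.shift }
  end
end

links = LinkQueue.new((1..50).map { |i| "http://example.com/page#{i}" })
done = []
done_lock = Monitor.new # guards the shared `done` array

workers = 4.times.map do
  Thread.new do
    # A per-thread agent would be created here and used to fetch each link.
    while (link = links.next_link)
      done_lock.synchronize { done << link }
    end
  end
end
workers.each(&:join)

puts "processed #{done.size} links"
```

A synchronized block in one method only helps if every other piece of code that touches the same data takes the same lock.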
On Mon, 2008-03-10 at 15:21 +0900, Venkat Bagam wrote:
> I used a synchronized block in the "download_file(html_link)"
> method. Doesn't that provide any locking? Am I wrong again?