.each skipping elements

I have an array of links from a webpage. I need to clean up the links
so it only has city links in it. So I do a .each and test for regex.

page.links.each{

if link.text =~ /(Blog|About Us|Status|Help|TOS|Privacy|Are we missing an area?)/
     page.links.delete(link)
end
end

For some reason the deleting of a link/element causes it to skip the
next link/element

in the array the last 7 links are

About us
Blog
Status
Help
TOS
Privacy
Are we missing an area?

but after running the .each on the array I still end up with

blog
help
privacy

if I run it again

help

lol, so why can't I do this in just one run through with .each?
Or why would deleting an element cause it to skip the next one?

I rebuilt it with an ugly while loop with a counter ..same problem

···

--
Posted via http://www.ruby-forum.com/.

Hi --

I have an array of links from a webpage. I need to clean up the links
so it only has city links in it. So I do a .each and test for regex.

page.links.each{

if link.text =~ /(Blog|About Us|Status|Help|TOS|Privacy|Are we missing an area?)/
    page.links.delete(link)
end

That can't be the code you're actually running; it doesn't assign
anything to the link variable.

For some reason the deleting of a link/element causes it to skip the
next link/element

in the array the last 7 links are

About us
Blog
Status
Help
TOS
Privacy
Are we missing an area?

but after running the .each on the array I still end up with

blog
help
privacy

if I run it again

help

lol, so why can't I do this in just one run through with .each?
Or why would deleting an element cause it to skip the next one?

I rebuilt it with an ugly while loop with a counter ..same problem

You're doing a destructive operation on the array while you're iterating
over it, which is going to give odd results. Ruby's internal counter is
going to be pointing to the wrong array entry if one of them disappears.
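To make the skipping concrete, here is a minimal sketch that uses plain strings in place of the real link objects (an assumption for illustration; the variable names are made up). Array#each walks the array by index, so each deletion shifts the remaining elements one slot to the left and the element that lands in the current slot is never visited.

```ruby
# Plain strings stand in for the link objects (assumption for illustration).
links = ["About us", "Blog", "Status", "Help", "TOS", "Privacy",
         "Are we missing an area?"]
unwanted = /Blog|About us|Status|Help|TOS|Privacy|Are we missing an area\?/

# Deleting inside each: every deletion shifts the rest of the array left,
# so the element that slides into the current slot is skipped.
broken = links.dup
broken.each { |text| broken.delete(text) if text =~ unwanted }
p broken  # ["Blog", "Help", "Privacy"]

# delete_if does the removal itself, so every element is tested exactly once.
fixed = links.dup
fixed.delete_if { |text| text =~ unwanted }
p fixed   # []
```

The leftover entries from the first loop are the same three the original post reports after one pass.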

You're also doing too much work. Try this:

   page.links.delete_if {|link| link =~ /..../ }

David

···

On Sun, 12 Sep 2010, Cameron Vessey wrote:

--
David A. Black, Senior Developer, Cyrus Innovation Inc.

   The Ruby training with Black/Brown/McAnally
   Compleat Philadelphia, PA, October 1-2, 2010
   Rubyist http://www.compleatrubyist.com

Thanks for the reply and I think I get it now

if we delete element 50
element 51 gets slotted into 50
then the pointer moves to 51 never addressing the original 51..ok

page.links.delete_if{|link| link =~ /(Blog|About Us|Status|Help|TOS|Privacy|Are we missing an area?)/
   }

I tried it .. it runs... no errors... It loops through all the link
elements.. but never does anything.. nothing gets deleted

I see how it should work ...but it doesn't

thanks for the help though

···

Hi --

Thanks for the reply and I think I get it now

if we delete element 50
element 51 gets slotted into 50
then the pointer moves to 51 never addressing the original 51..ok

page.links.delete_if{|link| link =~ /(Blog|About Us|Status|Help|TOS|Privacy|Are we missing an area?)/

(Note that the ? in that regex is a special character and will not match
an actual question mark. It's a zero-or-one quantifier, operating on the
"a" character before it.)

  }
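A small sketch of the point about the question mark, assuming only the link text quoted in this thread. It compares the unescaped pattern with an escaped one and shows which part of the string each actually matches.

```ruby
text = "Are we missing an area?"

unescaped = /Are we missing an area?/   # ? makes the final "a" optional
escaped   = /Are we missing an area\?/  # \? matches a literal question mark

p text[unescaped]  # "Are we missing an area"   (stops before the "?")
p text[escaped]    # "Are we missing an area?"

# The unescaped form also accepts a string that lacks the final "a".
p "Are we missing an are" =~ unescaped  # 0
p "Are we missing an are" =~ escaped    # nil
```

Both patterns still match the full link text, so the unescaped version does not by itself stop delete_if from removing that link; it is just looser than intended.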

I tried it .. it runs... no errors... It loops through all the link
elements.. but never does anything.. nothing gets deleted

I see how it should work ...but it doesn't

Do you need to make it case insensitive? It definitely works:

$ cat del.rb
array = ["Keep1", "Blog", "Status", "Keep2", "TOS", "Help", "Keep3"]
array.delete_if {|word| word =~ /Blog|Status|TOS|Help/ }
p array

$ ruby del.rb
["Keep1", "Keep2", "Keep3"]

so something else must be going on.

David

···

On Sun, 12 Sep 2010, Cameron Vessey wrote:

--
David A. Black, Senior Developer, Cyrus Innovation Inc.

   The Ruby training with Black/Brown/McAnally
   Compleat Philadelphia, PA, October 1-2, 2010
   Rubyist http://www.compleatrubyist.com

Maybe page.links returns a new array every time you call it instead of
giving access to an internal structure. This would mean you modify a copy
and not the original data. You could verify this by doing

3.times do
  puts page.links.object_id
end

If you see different object ids, chances are that you are getting a copy
and are not modifying the original structure.
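The difference can be sketched with two hypothetical page classes. CopyingPage and SharingPage are invented for illustration and are not Mechanize's real classes; the only assumption is that one accessor returns a duplicate and the other returns the stored array.

```ruby
# Hypothetical stand-ins for a page object; neither is Mechanize's class.
class CopyingPage
  def initialize(links)
    @links = links
  end

  # Returns a fresh copy on every call.
  def links
    @links.dup
  end
end

class SharingPage
  def initialize(links)
    @links = links
  end

  # Returns the internal array itself.
  def links
    @links
  end
end

copying = CopyingPage.new(["Blog", "Boston", "Help"])
sharing = SharingPage.new(["Blog", "Boston", "Help"])

# Keep the returned arrays alive while comparing their ids.
copies = Array.new(3) { copying.links }
shared = Array.new(3) { sharing.links }
p copies.map(&:object_id).uniq.size  # 3 (a new array each time)
p shared.map(&:object_id).uniq.size  # 1 (always the same array)

copying.links.delete_if { |text| text =~ /Blog|Help/ }
sharing.links.delete_if { |text| text =~ /Blog|Help/ }
p copying.links  # ["Blog", "Boston", "Help"]  (only a copy was changed)
p sharing.links  # ["Boston"]
```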

Kind regards

robert

···

On Sun, Sep 12, 2010 at 1:28 AM, Cameron Vessey <cameron1inm@hotmail.com> wrote:

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

a = ["About us", "Blog", "Status", "Help", "TOS", "Privacy", "Are we missing an area?"]

a.delete_if{|link| link =~ /(Blog|About Us|Status|Help|TOS|Privacy|Are we missing an area\?)/ }

p a # prints ["About us"]

It works. Maybe you need to try {|link| link.text =~ /../}
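Why matching on the link's text matters can be sketched with a stand-in object. Link here is a hypothetical Struct, not Mechanize's link class; the assumption is only that the real link object, like this one, does not define its own =~. On Ruby versions of that era such an object fell back to Object#=~, which returned nil; Ruby 3.2 and later raise NoMethodError instead, which the sketch rescues so it runs on both.

```ruby
# Hypothetical stand-in for a link object with a text and an href.
Link = Struct.new(:text, :href)

links = [Link.new("Blog", "/blog"),
         Link.new("Boston", "/bos"),
         Link.new("Help", "/help")]
unwanted = /Blog|Help/

# Matching the object itself never succeeds: old Rubies return nil,
# Ruby 3.2+ raises NoMethodError (rescued here to show the same outcome).
object_match = begin
  links.first =~ unwanted
rescue NoMethodError
  nil
end
p object_match                  # nil
p links.first.text =~ unwanted  # 0

# Matching on the text removes the unwanted entries.
links.delete_if { |link| link.text =~ unwanted }
p links.map(&:text)             # ["Boston"]
```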

Guten Tag. Linux Ruby

facebook.com/gutenlinux

···

On Sun, Sep 12, 2010 at 7:51 AM, David A. Black <dblack@rubypal.com> wrote:

Yep yep!

needed to add the .text at the end...

Thanks you guys are great..

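require 'mechanize'

def city_update
   city_name = []
   agent = Mechanize.new
   page = agent.get('http://www.craigslist.org/about/sites')
   # drop the first 8 links on the page
   8.times {page.links.delete_at(0)}
   # remove the remaining non-city links by their link text
   page.links.delete_if{|link|
     link.text =~ /(Blog|About Us|Status|Help|TOS|Privacy|Are we missing an area\?)/
   }
   page.links.each{|link|
     city_name << link.text
   }
   # return the collected names; each on its own would return the link array
   city_name
end
puts city_update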

That's the whole method... basically you want to make sure you have a
current list of available Craigslist cities.. and you want to cut out
all the unneeded links.. thanks again

···
