Nokogiri

A_Berger · 10 August 2016 15:45

Hello,
is anyone using Nokogiri, or liking it?
Are there similar "easier" gems?

I don't understand Nokogiri.
nokogiri.org doesn't have much information (no syntax explanations), there
are generally no good docs, and ri seems to be missing completely!

Why can't I separate the surrounding tags from the content? <a ... > or <td ... /td>

Do you know a good (i.e. understandable), complete reference that explains
each function?

Thanks a lot
Berg

Leam_Hall · 10 August 2016 16:04

I am using it, but not well. There's a Google Group and an IRC channel, but neither seems very active.

I found that reading up on XPath helped me understand Nokogiri a bit.

Leam


Paris_John_Sinclair · 10 August 2016 19:34

Searching for "xpath" tutorials should give better results than searching for
"nokogiri".


abinoam · 10 August 2016 15:54

Hi Berger,

If you provide a more complete XML fragment and tell us what you want to
accomplish, perhaps we can help you out.

(I also didn't find Nokogiri easy at first sight.)

Best regards,
Abinoam Jr.


stomar · 10 August 2016 16:35

> anyone using Nokogiri, or liking it?

Yes, many -- it's "the" XML parsing library for Ruby.

> I don't understand Nokogiri.
> nokogiri.org doesn't have much information (no syntax explanations), there
> are generally no good docs, and ri seems to be missing completely!

Did you try the tutorials on www.nokogiri.org?
They cover most of the common tasks you should need.
(Some knowledge of xpath and/or CSS is required, though.)

Also, there are many blog posts and articles online.

Missing ri docs might be a problem of your installation,
but they are also online, e.g. under
http://www.rubydoc.info/github/sparklemotion/nokogiri/

Regards,
Marcus

--
GitHub: https://github.com/stomar/
PGP: 0x6B3A101A

A_Berger · 10 August 2016 17:57

Hi all, hi Abinoam,
I attached the (longer) file, so the mail is not cluttered for those who are
not interested in the data.

There are many different field "types" I would like to extract.
To me it looks simpler to do it with regexes, but that's not the intended way.

I hope you can help me; then I suppose I'll understand Nokogiri!

PS: The ri docs are there, but most entries are like "method(): extract items...".
Not much usable or helpful information.

Thx! Berg

b.htm.gz (1.69 KB)


timlen_tse · 11 August 2016 04:04

Nokogiri is an XML/HTML parser for Ruby.
I suggest you read the docs on XPath and CSS selectors, with which you can select
elements from HTML or XML and access the attributes and text of nodes.


abinoam · 10 August 2016 18:59

Hi Berger,

Have you tried Mechanize?
I think it is closer to what you're looking for.

http://docs.seattlerb.org/mechanize/EXAMPLES_rdoc.html

I use Capybara for testing.

By the way, to get the <title> you could do something like this:

require 'nokogiri'

doc = File.open("target_file.html") { |f| Nokogiri::HTML(f) }

nodeset = doc.xpath('//html/head/title')
# Returns a Nokogiri::XML::NodeSet (all nodes matching the expression)

node = nodeset.first
# The first matching node, or nil if nothing matched

node.text
# => 'The text title string'

Look at "Searching a XML/HTML document" in the Nokogiri tutorials:
http://www.nokogiri.org/tutorials/searching_a_xml_html_document.html

As others said, most of the difficulty in using Nokogiri comes from
understanding how XPath works.

Abinoam Jr.


stomar · 10 August 2016 18:32

> I attached the (longer) file, so the mail is not cluttered for those who are
> not interested in the data.

The attachment is there nevertheless... please use e.g. gists
in the future.

> There are many different field "types" I would like to extract.
> To me it looks simpler to do it with regexes, but that's not the intended way.

HTML can be rather complicated (content can include newlines,
tags can include classes, ids, ...), so I would strongly advise
against regexes.

> PS: The ri docs are there, but most entries are like "method(): extract items...".
> Not much usable or helpful information.

Did you try some of the many tutorials available online?

Regards,
Marcus


stomar · 10 August 2016 19:13

> Have you tried Mechanize?
> I think it is closer to what you're looking for.

That's more for interacting with a web page (like following links, ...)
and less for extracting information.

> Look at "Searching a XML/HTML document" in the Nokogiri tutorials

That's one of the tutorials I already pointed out to the OP,
thanks for corroborating -- maybe it helps.

Regards,
Marcus


klaus_schilling · 11 August 2016 05:44

Is it possible to implement an object-oriented substitute for XSLT
with Nokogiri, similar to the former Perl module XML-XPathScript?

Klaus Schilling

A_Berger · 11 August 2016 07:54

Hello!
Thanks for the tips. It's both:
first extract all possible information (e.g. XPath vars, vars in comments,
JS functions; is that possible?), then execute code and submit.

I didn't find information on how to extract these different "types" of fields.
Is there any syntax reference that shows the meaning of // / . > etc. in
Nokogiri?
That's the hard part, but I will look further.

Cheers
Berg

abinoam · 10 August 2016 21:05

Right. IMHO that's what he is _really_ trying to do, judging from what I
could gather from the attached file.
(I may be wrong, of course.)
He mentions executing JavaScript, submitting values, etc.


stomar · 10 August 2016 21:31

Sorry, I agree then, of course.
My remark was based on his posts, where he only mentioned
extracting values.


stomar · 11 August 2016 08:18

> Is there any syntax reference that shows the meaning of // / . > etc. in
> Nokogiri?

As has been pointed out several times already, that's part of the
XPath language; it is _not_ specific to Nokogiri.

A quick web search gave me several "Getting started with Nokogiri"
tutorials, e.g. one by Aaron Patterson, did you read one of these?
Or the tutorials on www.nokogiri.org?
They also give a basic introduction to XPath or at least point
you to further resources about it. You could also try a web search
for "xpath tutorial", that's what I would do, and it also has been
recommended to you already several times.

Frankly, by asking the same questions over and over again and
apparently not reading the answers you are really wasting our time.

Regards,
Marcus


Matthew_Kerwin · 11 August 2016 08:23

> Thanks for the tips. It's both:
> first extract all possible information (e.g. XPath vars, vars in comments,

Does not compute. What are you trying to say here about 'xpath vars'
and 'vars in comments'?

> JS functions; is that possible?), then execute code and submit.

It's easy to select all 'script' and/or 'script/@src' nodes from an
XML/HTML DOM using XPath. Interpreting their content as javascript and
executing it, though, would require a javascript engine. That's where
mechanize-js and friends come in.

> I didn't find information on how to extract these different "types" of fields.
> Is there any syntax reference that shows the meaning of // / . > etc. in
> Nokogiri?
> That's the hard part, but I will look further.

That is the syntax of an XPath query. It's like a generalisation of
CSS selectors. Google and Wikipedia are really good helpers here.

--
Matthew Kerwin
http://matthew.kerwin.net.au/

Robert_K1 · 11 August 2016 21:07

> Is it possible to implement an object-oriented substitute for XSLT
> with Nokogiri, similar to the former Perl module XML-XPathScript?

Well, Nokogiri is OO, can use XPath and CSS selectors, and can
manipulate a DOM, so I guess the answer is "yes".

robert

--
[guy, jim, charlie].each {|him| remember.him do |as, often| as.you_can
- without end}
http://blog.rubybestpractices.com/

A_Berger · 12 August 2016 07:02

Hi experts!

How did such a syntax ever get standardized?!

How are keywords (as opposed to shortcuts) distinguished from element names?
/ works, while a keyword, e.g. child@x, looks for an element named "child".

Can I use a regex as a selector?
E.g. for <tr>, searching for 't.*'?

Which XPath version does Nokogiri support?

Can I get "some" items, like
tr[3..10]?

Thx Berg

stomar · 12 August 2016 07:49

> How did such a syntax ever get standardized?!
>
> How are keywords (as opposed to shortcuts) distinguished from element names?
> / works, while a keyword, e.g. child@x, looks for an element named "child".

Please explain in more detail.
What do you mean by keywords, shortcuts, items?

> Can I use a regex as a selector?
> E.g. for <tr>, searching for 't.*'?

I guess not, but why would you need to?

> Which XPath version does Nokogiri support?

That information can be found in Nokogiri's README.

What functionality/version do you need?

> Can I get "some" items, like
> tr[3..10]?

Quote from
http://www.nokogiri.org/tutorials/searching_a_xml_html_document.html:

"The Node methods xpath and css actually return a NodeSet, which acts
very much like an array, and contains matching nodes from the document."

So you could first get the NodeSet, then access individual elements.

Regards,
Marcus


A_Berger · 13 August 2016 23:13

> Can I get "some" items, like
> tr[3..10]?

Hi,
I meant a 'range',
like " /array[3..10]/... ".
How can you do that in one go? (I haven't found that in any docs.)
Do I have to repeat the whole line for each index?
How can you apply a function to different types of nodes (path, element
or attribute)?
tr/... works, but how do I use the unabbreviated form?
tr child ... doesn't work...

Thanks Berg

