HTML/XML Parsing

Ruby_Tuesday2 · 24 February 2004 22:19

I’m wondering if anyone has ever come across an example of how to parse an HTML
table (with images) using either XSLT or Ruby scripts.

I’d like to be able to extract all the data and put it in a
database (MySQL, SQLite, etc.).

There’s a twist, though: some of the image cells have two or more JPEG images
instead of one. Since I’m not an expert database designer, how would you model
that?

Table fields:

xnum text(40) unique
image jpeg image (may have none, or 1+ images)
desc memo(256)
loc memo(256)

Thanks.

dhtapp · 24 February 2004 22:34

My first suggestion would be to

make a to-many relationship from each table row to its image records,
store any IMG attributes parsed from the HTML in the image records
themselves (with maybe an ordering attribute within the to-many set, in case
sliced images are bumped up against each other for positioning), and
decide whether to store the images themselves on a filesystem with
pathnames in the records, or to store the image data as BLOBs within the
records themselves.
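
As a rough illustration of the first two points, here is a minimal Ruby sketch using REXML. It assumes the table is well-formed XML (XHTML-style, with self-closed img tags) and that each row has exactly four cells in the order xnum, images, desc, loc. The sample markup and the names SAMPLE_TABLE, ImageRecord, Row, and parse_rows are made up for the example, since the real table was not posted; messy real-world HTML would need a more forgiving parser.

```ruby
require "rexml/document"

# Hypothetical sample markup; the real table layout is not shown in the thread.
SAMPLE_TABLE = <<~XML
  <table>
    <tr><td>A-100</td><td><img src="a1.jpg" width="40"/><img src="a2.jpg" width="40"/></td><td>Widget</td><td>Shelf 3</td></tr>
    <tr><td>B-200</td><td></td><td>Gadget</td><td>Bin 7</td></tr>
  </table>
XML

# One record per IMG tag; position preserves the order within the cell.
ImageRecord = Struct.new(:position, :src, :attributes)
# One record per table row; images is the to-many set (possibly empty).
Row = Struct.new(:xnum, :images, :desc, :loc)

def parse_rows(markup)
  doc = REXML::Document.new(markup)
  rows = []
  doc.elements.each("//tr") do |tr|
    cells = tr.get_elements("td")
    next unless cells.size == 4 # skip header or malformed rows

    images = cells[1].get_elements("img").each_with_index.map do |img, i|
      attrs = {}
      img.attributes.each { |name, value| attrs[name] = value }
      ImageRecord.new(i + 1, attrs["src"], attrs)
    end

    rows << Row.new(cells[0].text.to_s.strip, images,
                    cells[2].text.to_s.strip, cells[3].text.to_s.strip)
  end
  rows
end

parse_rows(SAMPLE_TABLE).each do |row|
  puts "#{row.xnum}: #{row.images.size} image(s)"
end
```

Each Row then carries zero or more ImageRecord entries, which maps directly onto a separate image table keyed back to the row.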

If you need to retrieve the images through a non-HTTP pipeline into another
process, then BLOBs may be the way to go. If I were simply going to generate
dynamic HTML, then I’d probably go ahead and put the images out on a
filesystem where both the database and the web server could get to 'em.
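
For the filesystem option, a sketch along these lines might do. The helper name store_image_on_disk and the directory layout (one folder per xnum, files numbered by position) are assumptions for illustration only, and it presumes the xnum value is safe to use as a directory name.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical helper: writes one image's bytes under root and returns the
# relative pathname that would be stored in the image record instead of a BLOB.
def store_image_on_disk(root, xnum, position, bytes, ext = "jpg")
  relative = File.join(xnum, format("%03d.%s", position, ext))
  absolute = File.join(root, relative)
  FileUtils.mkdir_p(File.dirname(absolute))
  File.binwrite(absolute, bytes) # binary write, no newline translation
  relative
end

# Usage with a throwaway directory and a few placeholder bytes.
Dir.mktmpdir do |root|
  path = store_image_on_disk(root, "A-100", 1, "\xFF\xD8\xFF".b)
  puts path
  puts File.size(File.join(root, path))
end
```

Storing the relative path rather than the absolute one keeps the records valid if the image root is later moved.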

dan

Mark_Hubbart · 24 February 2004 22:57

It’s been a while since I worked with databases, but perhaps something
like this:

table 1: “cells”

id int autoincrement primary_key
xnum text(40) unique
desc …
loc …

table 2: “images”

id int autoincrement primary_key
cell_id int index
image blob

That way, more than one image can be linked to the same cell_id. Then:

SELECT image, xnum FROM images, cells WHERE cell_id = cells.id;

…to select a list of records containing two fields: the image data and the cell
number (assuming that’s what the xnum is).

Alternatively, you could forgo the ids, and link images via xnums. But
I understand that using ids is the “right” way, whatever that means.
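
To make the two-table idea concrete without tying it to a particular database driver, here is a small in-memory Ruby sketch. The Cell and Image structs mirror the columns above, and images_with_xnum (a made-up name) does what the SELECT does: it pairs each image with the xnum of the cell it points to. The sample values are invented.

```ruby
# In-memory stand-ins for the "cells" and "images" tables.
Cell  = Struct.new(:id, :xnum, :desc, :loc)
Image = Struct.new(:id, :cell_id, :image)

# Equivalent of: SELECT image, xnum FROM images, cells WHERE cell_id = cells.id
def images_with_xnum(images, cells)
  xnum_by_id = cells.to_h { |cell| [cell.id, cell.xnum] }
  images.filter_map do |img|
    xnum = xnum_by_id[img.cell_id]
    [img.image, xnum] if xnum # images without a matching cell are dropped
  end
end

cells = [
  Cell.new(1, "A-100", "Widget", "Shelf 3"),
  Cell.new(2, "B-200", "Gadget", "Bin 7") # a cell with no images
]
images = [
  Image.new(1, 1, "<bytes of a1.jpg>"),
  Image.new(2, 1, "<bytes of a2.jpg>")
]

images_with_xnum(images, cells).each do |image, xnum|
  puts "#{xnum}: #{image}"
end
```

A cell with no images simply has no rows in the image list, so it does not show up in the joined result, just as with the SQL join.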
