Hello all, just wondering whether someone else has observed this.
I am using REXML and at times, it seems to have difficulties "seeing" a
closing tag. I am wrapping XML-escaped binary data in XML and as a
result there might be a lot of special characters between tags. I
assume some of those special characters are causing problems...
I added '\n' chars at the end of the binary stream, which seemed to
help but did not completely solve the problem.
Anybody else observed this or has suggestions on how to overcome this
problem?
Show us the problem. There are many kinds of character sequences that are
not allowed in XML data fields, and there are a number of ways to escape
the data fields, but they have to be applied in order to work. Arbitrary
data can't simply be dropped between XML delimiters, without certain
precautions being taken.
It sounds like you might have some experience in this area. Not to
hijack the OP, but could you possibly describe the process you would
go through if you had a completely random pile of binary barf that you
wanted to store as an XML attribute?
Would your process include using Base64?
Also, let's pretend that small size is desirable, but time spent
zipping is unacceptable.
Thanks in advance for any insight,
-Harold
On 10/21/06, Paul Lutus <nospam@nosite.zzz> wrote:
> Show us the problem. There are many kinds of character sequences that are
> not allowed in XML data fields, and there are a number of ways to escape
> the data fields, but they have to be applied in order to work. Arbitrary
> data can't simply be dropped between XML delimiters, without certain
> precautions being taken.
>
> --
> Paul Lutus
> http://www.arachnoid.com
http://www.topxml.com/xml/articles/binary/

"Base64 encoding, specified in RFC 2045 - MIME (Multipurpose Internet
Mail Extensions) uses a 64-character subset (A-Za-z0-9+/) to represent
binary data and = for padding. Base64 processes data as 24-bit groups,
mapping this data to four encoded characters. It is sometimes referred
to as 3-to-4 encoding. Each 6 bits of the 24-bit group is used as an
index into a mapping table (the base64 alphabet) to obtain a character
for the encoded data. According to the MIME specification the encoded
data has line lengths limited to 76 characters, but this line length
restriction does not apply when transmitting binary data as part of XML
document."
It's a common, practical approach.
--
James Britt
"Every object obscures another object."
- Luis Bunuel
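
For anyone who wants to try the Base64 route, here is a minimal sketch in Ruby. It assumes REXML is installed, and the <blob> element and its "encoding" attribute are made-up names for illustration, not something from this thread. Array#pack with "m0" gives Base64 without the 76-column line breaks mentioned in the quote above.

```ruby
require "rexml/document"

# Hypothetical payload: every possible byte value, including NUL and
# the characters that normally need escaping in XML.
payload = (0..255).to_a.pack("C*")

# "m0" encodes as Base64 with no line breaks inserted.
encoded = [payload].pack("m0")

# Build a tiny document holding the encoded data as element text.
# The element and attribute names are assumptions for this sketch.
doc = REXML::Document.new
blob = doc.add_element("blob", "encoding" => "base64")
blob.text = encoded
xml = doc.to_s

# Reading it back: parse, take the element text, decode.
parsed = REXML::Document.new(xml)
decoded = parsed.root.text.unpack1("m0")

puts decoded == payload
```

The same encoded string could just as well go into an attribute value, since the Base64 alphabet contains no characters that XML treats specially.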
Okay, you need to know I am famously lazy. In fact, I think Larry Wall was
describing me when he made his well-known remark about programmer laziness
and hubris. Being lazy, the first simple approach I would take is to
enclose the binary data like this:
<enclosing XML tag><![CDATA[(binary data here)]]></enclosing XML tag>
The next step would be to make sure that neither the starting nor the ending
CDATA delimiter appears in the enclosed binary data, otherwise this strategy
will fail.
The next step after that is to escape (and later unescape) the binary data
if needed to ensure the uniqueness of the delimiters.
You need to understand that, with a sufficiently large and varied binary
data set, every imaginable character string will appear in the data,
eventually including the delimiters.
This, in turn, means that escaping the data is eventually a requirement, and
escaping the data means it will be larger than if this step were not
needed.
You should realize that another, possibly better, approach for truly large
binary globs is to store them as files, and store links to the files in the
XML data set, rather than the raw data itself.
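
The delimiter problem described above can be sketched in Ruby as follows. One common workaround is to split any "]]>" in the payload across two CDATA sections. The helper name cdata_wrap is hypothetical, and the sketch assumes the payload is text made of legal XML characters: XML 1.0 forbids most control characters (NUL, for example) even inside CDATA, so this alone does not make truly arbitrary bytes safe.

```ruby
require "rexml/document"

# Hypothetical helper: wrap text in CDATA, splitting any "]]>" in the
# payload across two sections so it cannot end the section early.
def cdata_wrap(text)
  "<![CDATA[" + text.gsub("]]>", "]]]]><![CDATA[>") + "]]>"
end

# Sample text that contains the terminator plus other markup characters.
tricky = "if (a[b[0]]>1) { x = '<tag>' & y }"
xml = "<data>#{cdata_wrap(tricky)}</data>"

root = REXML::Document.new(xml).root

# The element now holds more than one CDATA node; join their values
# to recover the original string.
recovered = root.texts.map(&:value).join

puts recovered == tricky
```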
This is something that I personally use for a Ruby routine I have that
stores stock item images for the retail jewelry company I work for. I
extract JPG images and store them as Base-64 encoded elements in an XML
file. Then I port that into a SQL database. To extract the images I
just decode them. Works like a charm and perhaps saves some SQL
resources since I'm not storing the images as actual BLOB items...
It's funny, to me, how laziness has become a defense mechanism. I
think *I* personally kind of like it. (:
Storing the binary as a separate file is a great solution. In our
particular case we like to have the data in one big XML file for the
purposes of source control. I'm sure I don't need to expound on the
greatness of plain text on the Ruby list, but the source control
system we use doesn't play exceptionally well with binary files.
Thanks again,
-Harold
I was just calling REXML::Text.normalize... I guess that is not
sufficient.
I'll give Base64 encoding a try.
Thanks for your help.
Christian
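
A small sketch of why REXML::Text.normalize is probably not enough on its own for binary data: as far as I can tell it replaces markup characters with entities and leaves everything else alone, so control bytes that XML does not allow stay in the output. The sample string is made up for illustration.

```ruby
require "rexml/document"

# Hypothetical payload mixing markup characters with control bytes.
raw = "a<b&c\x00\x01"

escaped = REXML::Text.normalize(raw)

# Markup characters are replaced by entity references...
puts escaped.include?("&lt;")
puts escaped.include?("&amp;")

# ...but the control bytes are still there afterwards.
puts escaped.include?("\x00")
```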
> Storing the binary as a separate file is a great solution. In our
> particular case we like to have the data in one big XML file for the
> purposes of source control. I'm sure I don't need to expound on the
> greatness of plain text on the Ruby list,
Or anywhere else IMHO. It's the ultimate in reusability and portability.
> but the source control
> system we use doesn't play exceptionally well with binary files.
At its base, this problem is one of statistics. The longer a pure-binary
data block becomes, the more likely that there will be an appearance of the
character sequence required to terminate the block. And if the obvious
solution is applied, that of using some coding that cannot deviate from a
safe syntax (like hexadecimal ASCII characters), the block becomes at least
twice as large as the original, seriously cutting into storage and
time efficiency.
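
To put rough numbers on the size cost, here is a quick comparison in Ruby of hexadecimal and Base64 encoding for the same block. The 3000-byte random block is an arbitrary choice for illustration. Hex uses two characters per byte, while Base64 uses four characters per three bytes.

```ruby
# Hypothetical 3000-byte binary block, seeded so the run is repeatable.
data = Random.new(42).bytes(3000)

hex    = data.unpack1("H*")   # two characters per input byte
base64 = [data].pack("m0")    # four characters per three input bytes

puts "raw:    #{data.bytesize} bytes"
puts "hex:    #{hex.bytesize} bytes"
puts "base64: #{base64.bytesize} bytes"

# Both encodings are reversible.
puts [hex].pack("H*") == data
puts base64.unpack1("m0") == data
```

So Base64 costs about a third extra where hex costs double, which is one reason it tends to be preferred when size matters and compression is ruled out.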