Ruby method to strip out XML codes?

I am trying to process an XML file that includes various codes. The problem I am running into is that some of these codes are inserted into the middle of an encrypted string. If I display the file in a browser these codes do not show up, and copying and pasting the string works fine. The problem occurs when I try to extract the string in a program and these "extraneous" XML codes are included. This of course makes the decryption routine crash.
What I am looking for is a simple way to read through the file and remove all the XML codes, leaving just plain text. I could probably write a series of regular expressions to remove each code that I can find in my text, but I am afraid I might miss some and it will come back to haunt me later.

str.gsub(/<\/?[^>]+>/, '')

This will only be a problem if your XML file is legal and has a CDATA
section which has a literal < character (not &lt;), like:

   for ( var i=0, len=a.length; i<len; ++i )

In that case you likely want to use a proper XML parser (like REXML)
instead.
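
A minimal sketch of the parser route, assuming REXML is available (it ships with Ruby, as a bundled gem in recent versions). The sample document, its element names, and the helper `text_of` are made up for illustration; this is one way to collect the text, not the only one.

```ruby
require 'rexml/document'

# Made-up sample document; the element names are purely illustrative.
xml = <<~XML
  <record>
    <name>Example</name>
    <script><![CDATA[for ( var i=0, len=a.length; i<len; ++i )]]></script>
  </record>
XML

# Hypothetical helper: recursively join the decoded text of an element
# and its descendants. REXML::CData is a subclass of REXML::Text, so the
# content of CDATA sections is kept.
def text_of(element)
  element.children.map do |node|
    case node
    when REXML::Text    then node.value
    when REXML::Element then text_of(node)
    else ''              # comments, processing instructions, etc.
    end
  end.join
end

doc = REXML::Document.new(xml)
plain = text_of(doc.root)
```

On this sample the tag-stripping regex would swallow the whole CDATA section, because the first > after the opening <![CDATA[ is the one that closes it, whereas the parser keeps the loop text intact.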

Do you really want to remove the XML, or would it suffice to just:

  str.gsub! '&', '&amp;'
  str.gsub! '<', '&lt;'
  str.gsub! '>', '&gt;'
(and maybe even)
  str.gsub! '"', '&quot;'
  str.gsub! "'", '&apos;'

to make your string valid and escaped for use in an HTML context?
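
Note that the order above matters: & has to be replaced first, or the ampersands introduced by the later substitutions would be escaped a second time. A small sketch of the same idea in a single pass, using the hash form of String#gsub; the names `ESCAPES` and `escape_markup` are made up for this example.

```ruby
# Lookup table for the five characters escaped above.
ESCAPES = {
  '&' => '&amp;',
  '<' => '&lt;',
  '>' => '&gt;',
  '"' => '&quot;',
  "'" => '&apos;'
}.freeze

# Hypothetical helper: each matched character is replaced by looking it
# up in the hash, so nothing is escaped twice regardless of ordering.
# Returns a new string instead of modifying str in place.
def escape_markup(str)
  str.gsub(/[&<>"']/, ESCAPES)
end

escaped = escape_markup(%q{a < b && "c"})
```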

On Dec 5, 6:13 pm, "Michael W. Ryder" <_mwry...@worldnet.att.net> wrote:

I am trying to process an XML file that includes various codes. [...]

Phrogz wrote:

str.gsub(/<\/?[^>]+>/, '')

[...]

Do you really want to remove the XML, or would it suffice to just:
[...]
to make your string valid and escaped for use in an HTML context?

My problem is that the XML file includes &#xD;&#xA; in the middle of a couple of fields, especially the encrypted fields. If I just extract the encrypted field and try to decrypt it, the program crashes because the key is invalid. I have to remove the "bad" character strings before sending the field to my decryption program. I would prefer to do this removal before sending the file to my programs so that I don't have to deal with these codes.
I assume that the string I am seeing is XML's way of saying CR/LF, as 0D and 0A in hex are CR and LF, and the output in a browser shows the field being broken at that point. The problem is that these are only the ones I have noticed, and there may be others hiding in the data. The XML file is being parsed for conversion to our accounts.
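
A rough sketch of both steps described above, assuming the field has already been pulled out of the file as a raw string: first list every entity or character reference that actually occurs, then strip the CR/LF ones. The names `references_in`, `strip_line_breaks`, and `CRLF_REFERENCE`, and the sample value, are invented for illustration.

```ruby
# Numeric character references for CR (13 / 0xD) and LF (10 / 0xA),
# in hex or decimal, with optional leading zeros.
CRLF_REFERENCE = /&#(?:x0*[dDaA]|0*1[03]);/

# List each distinct entity or character reference found in the text,
# so that codes other than the ones already noticed show up as well.
def references_in(text)
  text.scan(/&(?:#x\h+|#\d+|\w+);/).uniq
end

# Remove CR/LF references, plus any literal CR/LF characters.
def strip_line_breaks(field)
  field.gsub(CRLF_REFERENCE, '').delete("\r\n")
end

raw     = 'QUJDREVG&#xD;&#xA;R0hJSktM'   # made-up field value
found   = references_in(raw)
cleaned = strip_line_breaks(raw)
```

Running `references_in` over the whole file first gives a list of what is really in the data before deciding what to remove. If the file is read through an XML parser instead, the references are normally decoded to real CR/LF characters already, in which case only the `delete("\r\n")` step should be needed.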
