I am trying to process an XML file that includes various codes. The problem I am running into is that some of these codes are inserted into the middle of an encrypted string. If I display the file in a browser, these codes do not show up, and copying and pasting the string works fine. The problem occurs when I try to strip out the string in a program and these "extraneous" XML codes are included. This of course makes the decryption routine crash.
What I am looking for is a simple way to read through the file and remove all the XML codes, leaving just plain text. I could probably write a series of regular expressions to remove each code that I can find in my text, but I am afraid I might miss some and it will come back to haunt me later.
str.gsub(/<\/?[^>]+>/, '')
This will only be a problem if your XML file is legal and has a CDATA
section which has a literal < character (not &lt;), like:
for ( var i=0, len=a.length; i<len; ++i )
In that case you likely want a proper XML parser (like REXML) and to
use it.
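To illustrate the CDATA caveat, here is a minimal sketch rather than a definitive implementation. The sample document, its element names (`account`, `script`, `secret`) and its values are invented for the example; only `REXML::Document.new`, `elements[]` and `text` are REXML's own.

```ruby
require 'rexml/document'

# Hypothetical sample document; element names and values are invented.
xml = <<~XML
  <account>
    <script><![CDATA[for ( var i=0, len=a.length; i<len; ++i )]]></script>
    <secret>QUJD&#xD;&#xA;REVG</secret>
  </account>
XML

# Regex approach: the CDATA section contains no ">" before its closing "]]>",
# so the whole section is matched as one "tag" and its text is deleted.
# Character references such as &#xD; are left in place.
stripped = xml.gsub(/<\/?[^>]+>/, '')

# Parser approach: REXML understands CDATA and decodes character references.
doc    = REXML::Document.new(xml)
script = doc.root.elements['script'].text
secret = doc.root.elements['secret'].text.gsub(/\s+/, '')
```

With the parser, `script` keeps the literal `<`, and the CR/LF reference inside `secret` is decoded to real whitespace that a plain `gsub(/\s+/, '')` can drop. That final whitespace removal assumes whitespace is never significant in the field.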
Do you really want to remove the XML, or would it suffice to just:
str.gsub! '&', '&amp;'
str.gsub! '<', '&lt;'
str.gsub! '>', '&gt;'
(and maybe even)
str.gsub! '"', '&quot;'
str.gsub! "'", '&apos;'
to make your string valid and escaped for use in an HTML context?
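The escaping idea can also be sketched as a single pass, so that the order of the replacements stops mattering. In the sequential form, `&` has to be escaped first; otherwise the `&` inside a freshly written `&lt;` would be escaped again. The helper name `escape_markup` and the `ESCAPES` table are hypothetical, and `&#39;` is used for the apostrophe because HTML 4 does not define `&apos;`.

```ruby
# Map each special character to its entity.
ESCAPES = {
  '&' => '&amp;',
  '<' => '&lt;',
  '>' => '&gt;',
  '"' => '&quot;',
  "'" => '&#39;'
}.freeze

# Hypothetical helper: one pass over the string, so an "&" produced by a
# replacement is never escaped a second time.
def escape_markup(str)
  str.gsub(/[&<>"']/, ESCAPES)
end

escaped = escape_markup(%q{if a < b && c > "d"})
```

`String#gsub` accepts a hash as its second argument and looks up each match in it, which is what makes the single pass possible.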
On Dec 5, 6:13 pm, "Michael W. Ryder" <_mwry...@worldnet.att.net> wrote:
> I am trying to process an XML file that includes various codes. The
> problem I am running into is that some of these codes are inserted into
> the middle of an encrypted string. If I display the file using a
> browser these codes do not show up and copying and pasting the string
> works fine. The problem occurs when I try to strip out the string in a
> program and these "extraneous" XML codes are included. This of course
> makes the decryption routine crash.
> What I am looking for is a simple way to read through the file and
> remove all the XML codes leaving just plain text. I could probably
> write a series of regular expressions to remove each code that I can
> find in my text but am afraid I might miss some and it will come back to
> haunt me at a later time.
Phrogz wrote:
> str.gsub(/<\/?[^>]+>/, '')
> This will only be a problem if your XML file is legal and has a CDATA
> section which has a literal < character (not &lt;), like:
> for ( var i=0, len=a.length; i<len; ++i )
> In that case you likely want a proper XML parser (like REXML) and to
> use it.
> Do you really want to remove the XML, or would it suffice to just:
> str.gsub! '&', '&amp;'
> str.gsub! '<', '&lt;'
> str.gsub! '>', '&gt;'
> (and maybe even)
> str.gsub! '"', '&quot;'
> str.gsub! "'", '&apos;'
> to make your string valid and escaped for use in an HTML context?
My problem is that the XML file includes &#xD;&#xA; in the middle of a couple of fields, especially in the encrypted fields. If I just strip out the encrypted field and try to decrypt it, the program crashes because the key is invalid. I have to remove the "bad" character strings before sending the field to my decryption program. I would prefer to do this removal before sending the file to my programs so that I don't have to deal with these codes.
I assume that the string I am seeing is XML's way of saying CR/LF, as D and A in hex are CR and LF, and the output in a browser shows the field being broken at that point. The problem is that these are only the ones that I have noticed, and there may be others hiding in the data. The XML file is being parsed for conversion to our accounts.
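One way to catch the references that have not been noticed yet is to decode every numeric character reference generically instead of listing them one by one. The following is only a sketch under a loud assumption: the encrypted value is text such as Base64 or hex, so whitespace and control characters inside it are never significant. The method names `decode_char_refs` and `clean_encrypted_field` and the sample value are hypothetical.

```ruby
# Decode numeric character references (&#xD; or &#13; style) into the
# characters they stand for. Named entities such as &amp; are not handled.
def decode_char_refs(text)
  text.gsub(/&#x([0-9a-fA-F]+);|&#(\d+);/) do
    code = $1 ? $1.hex : $2.to_i
    [code].pack('U')
  end
end

# ASSUMPTION: the encrypted value is Base64 or hex text, so whitespace and
# control characters can be discarded without changing its meaning.
def clean_encrypted_field(raw)
  decode_char_refs(raw).gsub(/[[:space:][:cntrl:]]/, '')
end

cleaned = clean_encrypted_field('QUJD&#xD;&#xA;REVG&#13;&#10;R0hJ')
```

If the encrypted field can legitimately contain whitespace, the final `gsub` would corrupt it, so that step should be checked against how the field is actually encoded.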