Regexp working with mixed line endings

Hey all,

I have an audio file (WAV) containing some XML metadata at the start or
end of the audio data.

My regexp works fine with Unix line endings.

However, some recorders write mixed line endings, and with those my regexp
doesn't work.

Is there a special option that works with all kinds of line endings?

My regexps:

rgxstart=Regexp.new("<BWFXML>")
rgxstop=Regexp.new("</BWFXML>")

The comparison I do:

rgxstart === l.chomp

with l coming from:

File.open(<the sound file>).each { |l| ...}
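For reference: Ruby's `IO#each` (like `String#each_line`) splits on `"\n"` by default, and `String#chomp` strips one trailing `"\r\n"`, `"\n"` or `"\r"`. A minimal sketch with made-up in-memory samples shows what that means for each line-ending style; with old-Mac `"\r"` endings the whole fragment arrives as a single "line":

```ruby
# Made-up XML fragments, one per line-ending style.
unix = "<BWFXML>\n<NOTE>a</NOTE>\n</BWFXML>\n"
dos  = "<BWFXML>\r\n<NOTE>a</NOTE>\r\n</BWFXML>\r\n"
mac  = "<BWFXML>\r<NOTE>a</NOTE>\r</BWFXML>\r"

# each_line, like IO#each on a File, splits on "\n" by default.
unix_lines = unix.each_line.to_a  # 3 lines
dos_lines  = dos.each_line.to_a   # 3 lines, each still ending in "\r\n"
mac_lines  = mac.each_line.to_a   # 1 "line" holding the whole fragment

# chomp strips a trailing "\r\n" too, so the DOS lines come out clean.
p dos_lines.map(&:chomp)
p mac_lines.length

# Regexp#=== matches anywhere in the string, not only at line boundaries.
rgxstart = Regexp.new("<BWFXML>")
p(rgxstart === mac_lines.first)   # true
```

So `rgxstart` would still match that single combined line; what would go wrong is any logic that expects the start and stop tags to arrive on separate lines.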

--
une bévue

Une bévue wrote:

/ ...

> Is there a special option that works with all kinds of line endings?

Sure. For mixed Windows and Unix/Linux line endings, just delete the
carriage returns:

data.gsub!(/\r/, "")
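A quick check of the carriage-return deletion on a made-up string; `String#delete("\r")` does the same job without a regexp:

```ruby
# Made-up text mixing Windows ("\r\n") and Unix ("\n") line endings.
data = "<BWFXML>\r\n<NOTE>a</NOTE>\n</BWFXML>\r\n"

# Remove every carriage return: "\r\n" becomes "\n", a plain "\n" is untouched.
stripped = data.gsub(/\r/, "")

# String#delete removes every occurrence of the given character: same result.
same = data.delete("\r")

p stripped
```

This is only safe when every line break contains a `"\n"`: a lone old-Mac `"\r"` would simply disappear and join two lines.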

> My regexps:
>
> rgxstart=Regexp.new("<BWFXML>")
> rgxstop=Regexp.new("</BWFXML>")
>
> The comparison I do:
>
> rgxstart === l.chomp
>
> with l coming from:
>
> File.open(<the sound file>).each { |l| ...}

Try this instead:

data = File.read(filename)

data.gsub!(/\r/,"")

array = []

data.split("\n").each do |line|
  # process lines here
  array << line
end

By using this approach, all your XML lines will be made uniform. At the end
of the processing, you will need to join the lines back into a single block
for storage:

data = array.join("\n")

File.open(filename, "w") { |f| f.write data }

--
Paul Lutus
http://www.arachnoid.com

OK, fine, thanks very much! It's a nice solution, in a way "normalizing"
Win* line endings :wink:

In fact, I've slightly modified what you wrote:
data.gsub!(/\r\n/,"\n")
data.gsub!(/\r/,"\n")

because I've discovered in the meantime that I could have
\r
\n
\r\n

line endings )))
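A minimal sketch of this two-step normalization on a made-up string mixing all three endings. The order of the two calls matters: converting `\r\n` first keeps the second call from turning each Windows line break into two `\n` characters, as the reversed order shows:

```ruby
# Made-up XML lines using "\r\n", "\r" and "\n" endings in turn.
data = "<BWFXML>\r\n<A>1</A>\r<B>2</B>\n</BWFXML>\r\n"

# Step 1: Windows "\r\n" -> "\n". Step 2: any remaining lone "\r" -> "\n".
normalized = data.gsub(/\r\n/, "\n").gsub(/\r/, "\n")

# Reversing the steps turns every "\r\n" into "\n\n" (an extra blank line).
wrong_order = data.gsub(/\r/, "\n").gsub(/\r\n/, "\n")

puts normalized.lines.length   # 4
puts wrong_order.lines.length  # 6
```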

Does \n\r exist? (Wikipedia says NO)

Also, since most of the audio input file is "binary" data, line endings
are meaningless there, I suppose.

Anyway, thanks a lot, I'm now "armed" to face any situation :wink:

Right now, with the first two example files, running my wav2xml and
opening the resulting XML file gave me syntax-colored results (in two
different text editors), so I think that proves the problem is cured!

Paul Lutus <nospam@nosite.zzz> wrote:
--
une bévue

Une bévue wrote:

> OK, fine, thanks very much! It's a nice solution, in a way "normalizing"
> Win* line endings :wink:
>
> In fact, I've slightly modified what you wrote:
> data.gsub!(/\r\n/,"\n")
> data.gsub!(/\r/,"\n")

What's the point? You have the following possibilities:

\r\n

\n\r

\n

All of these cases are handled by my posted method.

> because I've discovered in the meantime that I could have
> \r
> \n
> \r\n
>
> line endings )))

Okay, the first ("\r") might be old-style Macintosh line endings. Here is a
solution for all the possibilities:

data.gsub!(%r{(\r\n|\n\r|\r)},"\n")
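The one-pass regexp can be checked on a made-up string containing all four sequences. Because the alternation tries `\r\n` and `\n\r` before the lone `\r`, each two-character ending is replaced as a unit, and a plain `\n` is simply not matched:

```ruby
# Made-up lines ending in "\r\n", "\n\r", "\r" and "\n" respectively.
data = "one\r\ntwo\n\rthree\rfour\n"

# One pass: each two-character pair, or a lone "\r", becomes a single "\n".
# A plain "\n" matches none of the alternatives and is left as it is.
normalized = data.gsub(%r{(\r\n|\n\r|\r)}, "\n")

p normalized.lines.map(&:chomp)
```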

> Does \n\r exist? (Wikipedia says NO)

Doesn't matter. Someone might type it in manually. If it exists, the above
method will handle it.

> Also, since most of the audio input file is "binary" data, line endings
> are meaningless there, I suppose.

What? You are reading binary files? Then don't try to filter line endings.

If the file is text, you can filter line endings. Use the above method.

If the file is not text, do not filter anything.
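A small illustration of why blanket filtering is unsafe for binary data: audio samples are arbitrary bytes, so any byte that happens to be 0x0D (`"\r"`) would be deleted and every following byte shifted. The sample values below are made up:

```ruby
# Made-up 16-bit little-endian PCM samples; two of them contain a 0x0D byte.
samples = [0x0A0D, 0x0041, 0x0D00]
raw = samples.pack("s<*")           # 6 bytes of binary "audio"

# Applying the text-only filter to binary data deletes the 0x0D bytes.
filtered = raw.gsub(/\r/, "")

puts raw.bytesize                   # 6
puts filtered.bytesize              # 4
# The remaining bytes no longer decode to the original samples.
p filtered.unpack("s<*")
```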

Paul Lutus <nospam@nosite.zzz> wrote:

--
Paul Lutus
http://www.arachnoid.com

> Okay, the first ("\r") might be old-style Macintosh line endings. Here is a
> solution for all the possibilities:
>
> data.gsub!(%r{(\r\n|\n\r|\r)},"\n")
>
>> Does \n\r exist? (Wikipedia says NO)
>
> Doesn't matter. Someone might type it in manually. If it exists, the above
> method will handle it.

OK, thanks, I'll try that ASAP.

>> Also, since most of the audio input file is "binary" data, line endings
>> are meaningless there, I suppose.
>
> What? You are reading binary files? Then don't try to filter line endings.

BUT I DON'T have the choice: the audio files I get do have metadata
written in XML mixed with binary audio data. The line endings are
"correct" within the XML. I have to deal with the output of various
recorders.

I've uploaded to <http://thoraval.yvon.free.fr/Audio>

a *** truncated *** version of one of the files I'm extracting the XML
part from. The file is named "bidule-truncated.wav". Don't play it as an
audio file, because I've written:

[audio part truncated]

in the middle of the audio part to make it lighter (4 KB instead of MBs).

Anyway, thanks a lot for helping me with those line endings :wink:

> If the file is text, you can filter line endings. Use the above method.
>
> If the file is not text, do not filter anything.

Then it doesn't work...

Paul Lutus <nospam@nosite.zzz> wrote:
--
une bévue

Une bévue wrote:

> BUT I DON'T have the choice: the audio files I get do have metadata
> written in XML mixed with binary audio data. The line endings are
> "correct" within the XML. I have to deal with the output of various
> recorders.

If you read a file that is part text and part binary, DO NOT filter line
endings. Instead, write your parsing code to accommodate different line
endings on the fly. One way to do this is to read a specific block size
from the file (by detecting a delimiter that separates the text from the
binary parts), work on that block, then reattach the block to the file.
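A minimal sketch of the "accommodate line endings on the fly" idea, assuming (as in the original question) that the metadata sits between `<BWFXML>` and `</BWFXML>`. The data would be read with `File.binread` so no bytes are translated; the XML slice is then split on any of the endings discussed in this thread, leaving the file itself untouched. The helper name `xml_lines` is hypothetical, and a short string stands in for a real WAV file:

```ruby
# Hypothetical helper: return the lines of the <BWFXML> block in +bytes+,
# split on "\r\n", "\n\r", "\r" or "\n"; nil if no complete block is found.
def xml_lines(bytes)
  start = bytes.index("<BWFXML>")
  return nil unless start
  stop = bytes.index("</BWFXML>", start)
  return nil unless stop
  xml = bytes[start...(stop + "</BWFXML>".length)]
  # Two-character endings come first in the alternation so each counts once.
  xml.split(/\r\n|\n\r|\r|\n/)
end

# For a real file: xml_lines(File.binread("bidule-truncated.wav"))
fake_wav = "RIFF\x00\x01\r\x02<BWFXML>\r<A>1</A>\r\n</BWFXML>\x00\r".b
p xml_lines(fake_wav)
```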

>> If the file is text, you can filter line endings. Use the above method.
>>
>> If the file is not text, do not filter anything.
>
> Then it doesn't work...

Treat the text part differently than the binary part. Read the entire file,
split it up based on some kind of delimiters, edit the text part, recombine
the separated parts, save the file.
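The split / edit / recombine steps can be sketched with `String#partition` (a choice made for this sketch), again assuming `<BWFXML>`/`</BWFXML>` delimiters and using a hypothetical helper name. Only the text in between is normalized; the binary parts before and after are passed through byte-for-byte. One caveat: in a WAV (RIFF) file each chunk records its own length, so if normalization shortens the XML, those length fields would also have to be updated. This sketch does not do that, so writing the XML out to a separate file may be the safer route.

```ruby
# Hypothetical helper: normalize line endings only between <BWFXML> and
# </BWFXML>; everything before and after is passed through byte-for-byte.
def normalize_xml_block(bytes)
  head, open_tag, rest = bytes.partition("<BWFXML>")
  return bytes if open_tag.empty?            # no XML block found
  body, close_tag, tail = rest.partition("</BWFXML>")
  return bytes if close_tag.empty?           # unterminated block
  body = body.gsub(/\r\n|\n\r|\r/, "\n")     # edit the text part only
  head + open_tag + body + close_tag + tail  # recombine the parts
end

# Made-up stand-in for File.binread(filename): binary bytes around the XML.
data = "\x00\r\x01<BWFXML>\r\n<A>1</A>\r</BWFXML>\r\x02".b
fixed = normalize_xml_block(data)
p fixed

# Saving (hypothetical path; RIFF chunk sizes are NOT updated):
# File.binwrite("out.wav", fixed)
```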

BTW, how is the binary data mixed with the text data? Is this an XML file
that uses the CDATA blocking convention? That scheme is quite manageable.

Paul Lutus <nospam@nosite.zzz> wrote:

--
Paul Lutus
http://www.arachnoid.com