WARN: mswin32 build binary file read change of behavior

Hello,

Maybe it’s been mentioned before, but since I’ve spent some time looking for
an error in my code (things broke), here it goes anyway, in case somebody runs
into this issue (low chance, this being mostly a Linux crowd, but still…).
Maybe putting it in a FAQ isn’t a bad idea.

On cygwin builds:

File.open( 'somebinaryfile.bin' ) do | file |
  binaryContent = file.read()
end

worked fine, i.e. it read all that it should have.

In mswin32 builds one must explicitly specify “rb” flags.

File.open( 'somebinaryfile.bin', "rb" )

or the file will be truncated.
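
For anyone landing here later, a minimal sketch of the portable form. The helper name read_binary is made up for illustration, and File.binread assumes Ruby 1.9 or later. File.open returns the value of its block, so the contents can be handed straight back to the caller.

```ruby
# Hypothetical helper: read a whole file as raw bytes on any platform.
def read_binary(path)
  # "rb" = read-only, binary mode: no CRLF translation on Windows builds.
  File.open(path, "rb") { |file| file.read }
end

# File.binread is a one-call equivalent of the helper above:
# contents = File.binread("somebinaryfile.bin")
```

On Unix-like systems the "b" changes nothing about the bytes returned, so adding it costs nothing there.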

I suppose it would be good to have it fixed in mswin32 build, since a major
practical consequence is that libraries and programs that don’t take this
into consideration (like those done on Unix flavors and used on Win32) will
break. In my case it was an SMTP/MIME thingie ( .pack("m") failed to produce
proper content).
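
A rough sketch of the kind of code that breaks this way, not the actual library code (which isn't shown in this thread). The helper name base64_file is hypothetical. If the read is cut short, pack("m") simply encodes the shortened data without complaint, which is why the damage only shows up at the receiving end.

```ruby
# Hypothetical helper: Base64-encode a file for use as a MIME body part.
def base64_file(path)
  # Binary mode, so the bytes reach pack() exactly as stored on disk.
  data = File.open(path, "rb") { |file| file.read }
  # "m" = Base64, with a newline after every 60 encoded characters.
  [data].pack("m")
end

# Decoding for a round-trip check: encoded.unpack("m").first
```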

Regards,
M.

Milan Maksimovic wrote:

Maybe putting it in a FAQ isn’t a bad idea.

Programmers have to deal with the automatic [LF] <=> [CRLF] conversion
on DOS/Windows based systems regardless of the programming language they
use. Your experience is not specific to Ruby. Better start to deal with
it, and add the binary file indicator to any binary file opening
operation, if your code might be executed on a DOS/Windows system.
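
To make the conversion concrete, here is a pure-Ruby emulation of a text-mode read. It assumes the Microsoft C runtime's text-mode rules: CRLF pairs become LF, and a Ctrl-Z byte (0x1A) is taken as end of file, which would explain the truncation reported above. The function name is hypothetical and nothing here calls the real runtime.

```ruby
# Hypothetical helper: emulate what a DOS/Windows text-mode read
# hands back for a given sequence of raw bytes.
def simulate_text_mode_read(raw)
  data = raw.b                      # binary-encoded copy of the input
  eof = data.index("\x1A")          # first Ctrl-Z, if any
  data = data[0, eof] if eof        # everything after it is never seen
  data.gsub("\r\n", "\n")           # CRLF collapses to LF
end
```

A binary file that happens to contain 0x1A or the byte pair 0x0D 0x0A comes back shorter than it is on disk, with no error raised.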

On cygwin builds:

File.open( 'somebinaryfile.bin' ) do | file |
  binaryContent = file.read()
end

worked fine, i.e. it read all that it should have.

Cygwin has an installation option for how it should handle text files: same
as on unix (i.e. all files are effectively treated as binary), or same
as on a DOS system. You were lucky and chose the unix option.

In mswin32 builds one must explicitly specify “rb” flags.

File.open( 'somebinaryfile.bin', "rb" )

or the file will be truncated.

Of course. If you want binary, then by all means say so. It’s not ruby’s
fault that Billyboy had to introduce this trap for you.

I suppose it would be good to have it fixed in mswin32 build, since a major

No! Ruby is not the place to fix OS design flaws. We would get user
complaints like “I cannot open the textfile output from your ruby script
in my notepad editor” if ruby would attempt to “fix” it. (Try it!)

practical consequence is that libraries and programs that don’t take this
into consideration (like those done on Unix flavors and used on Win32) will
break.

Yes, this is a problem. But the fix for it has to be done in these
libraries. Send a bug report to the author of the library.

T

On Saturday, December 28, 2002, 10:51:23 PM, Milan wrote [snipped]:

Hello,

Maybe it’s been mentioned before, but since I’ve spent some time looking for
an error in my code (things broke), here it goes anyway, in case somebody runs
into this issue (low chance, this being mostly a Linux crowd, but still…).
Maybe putting it in a FAQ isn’t a bad idea.

I appreciate your forethought for the FAQ, but I’m not really sure
what sort of entry to put. Could you please suggest a question and an
answer, and we’ll discuss it, and if we get something that makes
sense, it can go in the FAQ. Keep it on the list: the more comments
on the proposed entry, the better.

In mswin32 builds one must explicitly specify “rb” flags.

File.open( 'somebinaryfile.bin', "rb" )

I personally use the “b” flag whenever appropriate no matter what
platform I’m on. Even though it makes no difference on Unix, the “b”
flag is still a comment to make it more clear what we expect to do
with the data.
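
A small illustration of the "b" flag doubling as documentation. This assumes Ruby 1.9 or later, where "b" also tags the data read as ASCII-8BIT, so the flag is visible in the result even on Unix. The helper name is made up for this sketch.

```ruby
# Hypothetical helper: read a file using the given mode string.
def read_with_mode(path, mode)
  File.open(path, mode) { |file| file.read }
end

# With mode "rb" the returned String carries Encoding::ASCII_8BIT,
# marking it as raw bytes rather than text:
# read_with_mode("somebinaryfile.bin", "rb").encoding
```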

Cheers,
Gavin

Programmers have to deal with the automatic [LF] <=> [CRLF] conversion
on DOS/Windows based systems regardless of the programming language they
use.

I’d say that reading a bunch of bytes without caring what they are is
basic and common functionality.

My daytime job mostly consists of doing Windows C++ and Java programming
(and attempts to introduce some Ruby along the way, where
possible/appropriate).

On Windows, using native Win32 API, I can (and do) open a file, read a
bunch of bytes (or all of them), and write that to some other file/buffer as
it is, with no fuss and without ever wondering if something happened with
those bytes in the meantime. Ok, I have to set up some silly flags that
don’t mean much most of the time, but none of them relate to binary/text
properties of the file.

In Java, I open a plain FileInputStream, and read all the bytes that I
need, manipulate them and write them to some other OutputStream without ever
thinking about LF/CRLF conversion.
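
For comparison, a sketch of the same byte-for-byte copy in Ruby, written so that it behaves identically on cygwin, mswin32 and Unix builds. The function name and the buffer size are arbitrary choices for illustration.

```ruby
CHUNK_SIZE = 64 * 1024  # arbitrary buffer size chosen for this sketch

# Hypothetical helper: copy a file as raw bytes, chunk by chunk.
def copy_binary(source_path, target_path)
  File.open(source_path, "rb") do |input|
    File.open(target_path, "wb") do |output|
      # read(n) returns nil at end of file, which ends the loop.
      while (chunk = input.read(CHUNK_SIZE))
        output.write(chunk)
      end
    end
  end
end
```

Both handles carry the "b" flag, since text-mode translation applies to writes as well as reads.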

It happened that cygwin Ruby builds that I was using (before the mswin32
builds started appearing) behaved in the same way.

Therefore, I do it every day, in both languages, without ever having to
deal with LF/CRLF conversion, as long as I don’t care much about the meaning
of that content, as, in this particular instance, I don’t.

Your experience is not specific to Ruby.

My experience is, in this instance, specific to Ruby.

Cygwin has an installation option for how it should handle text files: same
as on unix (i.e. all files are effectively treated as binary), or same
as on a DOS system. You were lucky and chose the unix option.

You can say that it was luck, but not as you describe it.
I never got to installing cygwin by itself, I just installed the PragProg
cygwin Ruby build and that is how it behaved out of the box.
And it was good. :)

But thanks for the tip, if I ever get to install cygwin I will keep an eye
on this.

Of course. If you want binary, then by all means say so. It’s not ruby’s
fault that Billyboy had to introduce this trap for you.

From your general tone I sense that something in my post has upset you, and
I’m sorry that something I wrote made you … upset.

It’s not about “ruby’s fault”, nor “faults” at all, nor Billyboy.

It’s just about wanting to be able to “trust” libraries and software
written by other people.
If using other people’s code involves having to grep for all possible
occurrences of “open” then this pretty much defeats the idea of reusability,
code sharing, and portability, especially if it’s some not so small LOC
number. At least for those of us that choose or are forced to run stuff both
on Linux and Windows.
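
The "grep for open" chore can be roughly automated. The sketch below is a line-based heuristic only: it knows nothing about Ruby syntax, will miss modes passed in variables, and will flag text files that are legitimately opened without "b". All names in it are hypothetical.

```ruby
# Heuristic pattern for open(...) / File.open(...) calls; captures the
# argument text up to the closing parenthesis or the end of the line.
OPEN_CALL = /\b(?:File\.)?open(?=[\s(])\s*\(?\s*([^)\n]*)/
# A quoted mode argument that contains the letter "b", e.g. "rb" or 'wb+'.
BINARY_MODE = /,\s*["'][^"']*b[^"']*["']/

# Hypothetical helper: return the 1-based line numbers of open calls
# that do not pass a literal mode string containing "b".
def opens_without_binary_flag(source)
  hits = []
  source.each_line.with_index(1) do |line, number|
    match = OPEN_CALL.match(line)
    next unless match
    hits << number unless match[1] =~ BINARY_MODE
  end
  hits
end
```

Each reported line still needs a human decision about whether the file is binary or text.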

I suppose it would be good to have it fixed in mswin32 build, since a major

No! Ruby is not the place to fix OS design flaws.

IMHO, it’s not so much about fixing OS design flaws, as it is about
providing proper abstractions for both high (such as string) and low (such
as “bunch of bytes”) level concepts, and hiding or exposing those
abstractions from/to the user, and making those abstractions behave the same
on as many OSes as possible. Things like POSIX and Cygwin are proof that
this is possible up to some (at least for me) acceptable level.

We would get user complaints like “I cannot open the textfile output from
your ruby script in my notepad editor” if ruby would attempt to “fix” it.
(Try it!)

Let me paraphrase you - software that we make isn’t the right place to fix
MS Notepad design flaws. ;)

I have a good idea of what those files look like, I get them every now and
then, and I don’t use notepad. UEdit recognizes them and offers to convert
them to DOS format.

practical consequence is that libraries and programs that don’t take this
into consideration (like those done on Unix flavors and used on Win32) will
break.

Yes, this is a problem. But the fix for it has to be done in these
libraries. Send a bug report to the author of the library.

Since I’ve been flamed for a simple warning post, I can only imagine what
kind of flames I would be exposed to if I wrote to some unknown hard core
Linux proponent: “Your software doesn’t work on my Windows box. Fix it.”

But I really don’t think of it as a library bug. Most of the people on this
list (people using Ruby in its native Linux environment) don’t care about
Ruby on Windows, and most probably shouldn’t.

Once again - it’s not about blame, and it’s not about “ruby’s fault” (as
you seem to have taken this).
It was just a warning “not to do as I have done”, along with a wish that it
is fixed “in one place only”, not “in all the libraries that were, are and
will be written”.

If it would make you happy, I can rephrase the warning as: “don’t be
stupid as I was to think that open(file).read() will behave the same on all
versions of Ruby”. There.

Regards,
M.

···

From: “Tobias Peters” t-peters@invalid.uni-oldenburg.de

From: “Gavin Sinclair”
I appreciate your forethought for the FAQ, but I’m not really sure
what sort of entry to put. Could you please suggest a question and an
answer, and we’ll discuss it, and if we get something that makes
sense, it can go in the FAQ. Keep it on the list: the more comments
on the proposed entry, the better.

Now that it turns out that there is common knowledge/agreement (“just use
‘rb’ wherever there’s a chance that the file is binary”) I’m not really so
sure that it is FAQ material. The F in FAQ stands for “frequently” after
all, and since I’m the only one who came up with this as “an issue”, I’m not
sure that “frequently” is justified.

Still, maybe something along the lines of the “Ruby On MS Windows” part of the
FAQ:

Q: What are the differences between cygwin and mswin32 Ruby builds that one
should be aware of?

A: popen things don’t work on mswin32 builds, etc. etc. etc. When opening an
existing file default behavior is “binary” since the cygwin layer is an
adapter for Unices that don’t differentiate between binary/text, etc. etc.
Then maybe a pointer or a short entry on CR/LF issue, and maybe even a note
(from Tobias Peters’ message nr. 60142) on choices when installing standalone
cygwin vs. PragProg Ruby cygwin bundle and its preset behavior.

or maybe:

Q: When doing a read() ( with a risk of starting a flame war:
having so many aliases in Ruby libraries, me thinks that a read_all() would
be good to have since it fully describes what happens ) on a
binary file, what is read gets truncated.

A: PragProg cygwin Ruby builds used to behave in-that-way, and mswin32
builds behave in-another-way, so if you’re switching from cygwin to mswin32
… and then the whole story, AND a recommendation to adopt the practice to
use ‘b’ whenever… etc.

(English is not my native language so I’m sure that somebody would phrase
all this better)

… if it is FAQ material after all, since I’m not sure that promoting my
own ignorance/bad practice to a FAQ is the best way to get “my own 5 minutes
of glory”. ;)

I personally use the “b” flag whenever appropriate no matter what
platform I’m on. Even though it makes no difference on Unix, the “b”
flag is still a comment to make it more clear what we expect to do
with the data.

Thank you (and everybody else) for the advice, it appears to be sound
practice, so I will adopt it.

Regards,
M.

···

“Milan Maksimovic” maksa@sezampro.yu schrieb im Newsbeitrag
news:002101c2afd7$4faa8580$abd8ecd8@tao…

I’d say that reading a bunch of bytes without caring what they are is
basic and common functionality.

of course!

My daytime job mostly consists of doing Windows C++ and Java programming
(and attempts to introduce some Ruby along the way, where
possible/appropriate).

On Windows, using native Win32 API, I can (and do) open a file, read a
bunch of bytes (or all of them), and write that to some other file/buffer as
it is, with no fuss and without ever wondering if something happened with
those bytes in the meantime. Ok, I have to set up some silly flags that
don’t mean much most of the time, but none of them relate to binary/text
properties of the file.

In Java, I open a plain FileInputStream, and read all the bytes that I
need, manipulate them and write them to some other OutputStream without ever
thinking about LF/CRLF conversion.

to me it seems the appropriate way to do it in ruby is to use the flag “b”
as others already indicated. as you write yourself, on other platforms /
languages you distinguish explicitly between binary and text access to
files (well done in java, apart from the PrintOutputStream) by either using
a different class or providing some flags. so the proper way to do it is to
use the distinguishing means that ruby provides.

all i’m trying to say is, there is a difference between text based file
access and binary file access although some languages and platforms do make
it unclear. thus, i regard opening a binary file in ruby without “b” a
mistake although this may work on some platforms.
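
Besides the "b" mode flag, Ruby offers IO#binmode for handles that are already open (a pipe, $stdin, a handle passed in by other code). A minimal sketch follows; the helper name is hypothetical.

```ruby
# Hypothetical helper: switch an already-open handle to binary mode
# and read everything from it.
def read_after_binmode(path)
  File.open(path) do |file|   # opened in the default (text) mode
    file.binmode              # switch to binary before the first read
    file.read
  end
end
```

Once binmode has been called, the stream cannot be switched back to text mode.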

kind regards

robert

(English is not my native language so I’m sure that somebody would phrase
all this better)

I would never have known from the quality of your writing.

… if it is FAQ material after all, since I’m not sure that promoting my
own ignorance/bad practice to a FAQ is the best way to get “my own 5 minutes
of glory”. ;)

Well, I think you’re right; it doesn’t fit neatly into the FAQ :)

The real issue at hand is that binary files should be treated as such,
no matter what you can reasonably expect from the platform.

Regards,
M.

Thanks for the effort,
Gavin

···

On Tuesday, December 31, 2002, 6:39:11 AM, Milan wrote: