Sysread and buffered I/O

I've been playing with telnet (and ssh) and I've been
pondering something.

Mixing sysread with buffered I/O is a Bad Thing (TM).

In fact, when I try to treat a telnet object as a socket
(which it is) and read from it using gets(), it crashes.
Not surprising.
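
A minimal sketch of the problem, assuming a present-day Ruby (the 2004 interpreter may have behaved differently): gets fills IO's internal read buffer, and a later sysread on the same object is rejected while that buffer still holds data. The pipe and the helper name mixing_demo are illustrative only and do not come from net/telnet.

```ruby
# Illustrative helper (hypothetical name): mix a buffered read with sysread.
def mixing_demo
  r, w = IO.pipe
  w.write("line1\nline2\n")   # both lines arrive in a single write
  w.close

  first = r.gets              # buffered read; "line2\n" stays in IO's buffer
  begin
    r.sysread(10)             # would bypass the buffer, so Ruby rejects the call
    outcome = :no_error
  rescue IOError => e
    outcome = e.class         # IOError on current Ruby ("sysread for buffered IO")
  end
  r.close
  [first, outcome]
end

p mixing_demo
```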

So I was wondering: Would it make sense to have something like
"class SysIO < IO" where SysIO would do its own internal buffering
(and do everything in terms of sysread/-write at the lowest level)?

It would be a little like wheel-reinventing, but it would provide
an IO object on which these operations could be mixed.
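
One way to picture the idea is the sketch below. It is a wrapper rather than a true "SysIO < IO" subclass, it keeps its own buffer, and it touches the underlying object only through sysread. The method name read_some and the chunk size are assumptions made for illustration; nothing like this class exists in Ruby.

```ruby
# Hypothetical sketch: a reader that buffers for itself and only ever calls sysread.
class SysIO
  def initialize(io, chunk_size = 4096)
    @io = io
    @chunk_size = chunk_size
    @buf = String.new          # private buffer, independent of IO's own buffer
  end

  # Return the next line including "\n", the leftover data at EOF, or nil.
  def gets
    until (idx = @buf.index("\n"))
      begin
        @buf << @io.sysread(@chunk_size)
      rescue EOFError
        return nil if @buf.empty?
        return @buf.slice!(0, @buf.length)
      end
    end
    @buf.slice!(0, idx + 1)
  end

  # Return up to maxlen bytes: buffered data first, otherwise one sysread.
  def read_some(maxlen)
    return @buf.slice!(0, maxlen) unless @buf.empty?
    @io.sysread(maxlen)
  end
end
```

Because both methods share one buffer, line-oriented and chunk-oriented reads can be interleaved on the same object without losing data.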

Does this make sense or not?

Hal

In article <40FE101D.90603@hypermetrics.com>,
  Hal Fulton <hal9000@hypermetrics.com> writes:

I've been playing with telnet (and ssh) and I've been
pondering something.

Mixing sysread with buffered I/O is a Bad Thing (TM).

In fact, when I try to treat a telnet object as a socket
(which it is) and read from it using gets(), it crashes.
Not surprising.

I proposed a stdio-friendly, sysread-like method: readpartial.
[ruby-dev:23247] [ruby-talk:96220]

However, it has not been incorporated into Ruby, just because the name is not
good enough. Currently I think readchunk is better, though.

If net/telnet used readchunk instead of sysread, gets and other stdio
methods could probably be used as usual without buffering problems.
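
As a sketch of that point, using IO#readpartial as it exists in present-day Ruby (the name was still under discussion in this thread): a buffered gets followed by readpartial on the same object works, because readpartial reads from IO's buffer first. The helper name gets_then_readpartial is illustrative only.

```ruby
# Illustrative helper (hypothetical name): buffered gets, then readpartial.
def gets_then_readpartial
  r, w = IO.pipe
  w.write("line1\nline2\n")   # both lines arrive in a single write
  w.close

  line = r.gets               # buffered read; leaves "line2\n" in the buffer
  rest = r.readpartial(4096)  # served from that same buffer, no error raised
  r.close
  [line, rest]
end

p gets_then_readpartial
```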

So I was wondering: Would it make sense to have something like
"class SysIO < IO" where SysIO would do its own internal buffering
(and do everything in terms of sysread/-write at the lowest level)?

It would be a little like wheel-reinventing, but it would provide
an IO object on which these operations could be mixed.

Does this make sense or not?

The new class SysIO is not required.
I think matz will accept a stdio-less implementation for IO.

--
Tanaka Akira

Tanaka Akira wrote:

I proposed a stdio-friendly, sysread-like method: readpartial.
[ruby-dev:23247] [ruby-talk:96220]

Very interesting, I did not notice that. Thank you.

However, it has not been incorporated into Ruby, just because the name is not
good enough. Currently I think readchunk is better, though.

I am not sure I like readchunk. I think I prefer an underscore
at least. Other ideas might be:
   read_partial # looks better with underscore?
   read_part
   read_any
   read_all (?)
   read_waiting (?)
   read_avail # meaning "available"
   read_bytes
   readbytes # looks ok without underscore?

The new class SysIO is not required.
I think matz will accept a stdio-less implementation for IO.

OK, I see. The code is already written? Are we just seeking
a name?

Hal

In article <40FE1A4C.9080403@hypermetrics.com>,
  Hal Fulton <hal9000@hypermetrics.com> writes:

I am not sure I like readchunk. I think I prefer an underscore
at least. Other ideas might be:
   read_partial # looks better with underscore?
   read_part
   read_any
   read_all (?)
   read_waiting (?)
   read_avail # meaning "available"
   read_bytes
   readbytes # looks ok without underscore?

An underscore is inconsistent with other IO read??? methods: readchar,
readline, readlines.

Also, readbytes is already used by lib/readbytes.rb.

OK, I see. The code is already written? Are we just seeking
a name?

Currently Ruby uses stdio because no one has written a stdio-less implementation.

--
Tanaka Akira

Tanaka Akira wrote:

I am not sure I like readchunk. I think I prefer an underscore
at least. Other ideas might be:
  read_partial # looks better with underscore?
  read_part
  read_any
  read_all (?)
  read_waiting (?)
  read_avail # meaning "available"
  read_bytes
  readbytes # looks ok without underscore?

An underscore is inconsistent with other IO read??? methods: readchar,
readline, readlines.

I see. Well, maybe:
    readchars # with an s
    readstr # could be 'string' or 'stream'
    readsys # reminds us of sysread

But I do not really care much, readchunk is ok if it works. :-)

Currently Ruby uses stdio because no one has written a stdio-less implementation.

I do not know much about low-level I/O. How hard would this
be to do?

Hal

In article <40FE1F86.6030005@hypermetrics.com>,
  Hal Fulton <hal9000@hypermetrics.com> writes:

I am not sure I like readchunk. I think I prefer an underscore
at least. Other ideas might be:
  read_partial # looks better with underscore?
  read_part
  read_any
  read_all (?)
  read_waiting (?)
  read_avail # meaning "available"
  read_bytes
  readbytes # looks ok without underscore?

An underscore is inconsistent with other IO read??? methods: readchar,
readline, readlines.

I see. Well, maybe:
    readchars # with an s
    readstr # could be 'string' or 'stream'
    readsys # reminds us of sysread

any: doesn't represent readpartial's behavior.
all: readpartial doesn't return all data from a stream.
waiting: readpartial may return non-waiting data if the buffer is empty.
avail: readpartial may return data which is not available when readpartial is called.
bytes: doesn't represent the difference from IO#read.
chars: readpartial doesn't deal with characters and encodings.
str: doesn't represent the difference from IO#read.
sys: readpartial is not a system call.

But I do not really care much, readchunk is ok if it works. :-)

A good name is necessary to incorporate a method into Ruby.

--
Tanaka Akira

Tanaka Akira wrote:

A good name is necessary to incorporate a method into Ruby.

Certainly, I agree fully.

How much work is it to implement this method? I may not be
knowledgeable enough to do it myself, but I would be willing
to assist someone if I can.

Hal

Here's the relevant bit from the relevant ruby-dev summary:

  TANAKA Akira suggested a new method IO#readpartial, which is
  a better sysread() implementation supporting stdio buffer and
  avoiding troubles due to non-blocking I/O.

Tanaka, can you please give a brief description of the method (assume
I know nothing about advanced I/O), and I'll suggest a name. Of
course, other people will too.

Gavin

On Wednesday, July 21, 2004, 6:22:19 PM, Tanaka wrote:

But I do not really care much, readchunk is ok if it works. :-)

A good name is necessary to incorporate a method into Ruby.

Tanaka Akira wrote:

Hal Fulton writes:

>>>I am not sure I like readchunk. I think I prefer an underscore
>>>at least. Other ideas might be:
>>> [snip]
>>> read_avail # meaning "available"
>>> [snip]
>>
>> An underscore is inconsistent with other IO read??? methods: readchar,
>> readline, readlines.

[snip]
avail: readpartial may return data which is not available when readpartial is called.
[snip]

> But I do not really care much, readchunk is ok if it works. :-)

A good name is necessary to incorporate a method into Ruby.
--
Tanaka Akira

I thought Hal's read_avail (perhaps readavail) was a fair description, but ...

"readpartial may return data which is not available when readpartial is called"

Then 'readpartial' is magic ?-) (Returns unavailable data)
I know ... I misunderstand :-(

/*
  * call-seq:
  * ios.readpartial(integer [, buffer]) => string, buffer, or nil
  *
  * Reads at most <integer> bytes from the I/O stream but
  * it blocks only if <ios> has no data immediately available.
  * If the optional <buffer> argument is present,
  * it must reference a String, which will receive the data.
  * It raises <EOFError> on end of file.
  *
  * STDIN.readpartial(4096) #=> "Data immediately available"
  */

read_immed ?
(readimmed is *very bad* without the underscore -- weird ! )

read_direct ? (probably not good)

read_instant, readinst, readsnap ? (at this instant in time)

I think the problem with the name 'readpartial' may be that if *all*
data is available, and 'readpartial' reads it all, then it has "failed"
because it has performed 'read', *not* 'readpartial'. :->

'readchunk' gives too much emphasis to the chunk, IMHO.
readchunk(256) looks too predictable.
If I wanted to read from <ios> in chunks, I would try that before
reading the docs.
Users of 'read_avail' or 'read_immed' would, one hopes, want to
refer to the docs before using. I think the docs of *other*
methods could say: "... if you need to reduce the risk of blocking,
'readpartial' may be more appropriate, here".

Errr, sorry to be of no help whatsoever.

daz

Two machine translations of [ruby-dev:23247] (*not* recommended reading ;-)

Excite (Japan): ===============================================

sysread thing which takes into consideration the buffer of stdio considered
since before since mind was suitable at last readpartial was mounted.
Demand of wanting to take in the data which has arrived although it does not
know how much data arrival is carried out from the pipe or the socket, if
there are such methods in the settled unit nonblock It can fill without using
sysread. Here, I do not want to use nonblock because nonblock is under a trouble.
Moreover, not wanting to use sysread is everything but IO. (the buffer of stdio
is treated) It is because it becomes impossible to use a method. nonblock is
avoidable, permitting using other methods of IO, if there is readpartial.
attaching dividing and coming out and saying [ to say ] like this

   [PATCH follows]

Babelfish: ===============================================

Finally, because the air faced, it tried mounting the sysread thing readpartial
which you thought from the time before, considers the buffer of stdio.
When there is such method, you do not understand about some data it has arrived
from the pipe and the socket, but it is, the data is to take in at the large
unit to be, with the request which is said without using nonblock and sysread,
it is possible to fill up. Therefore here, as for we would not like to using
nonblock, as for nonblock the origin of trouble is. In addition, because we
would not like to using sysread handles the buffer of other (stdio of IO)
becomes unable to use method is. If there is readpartial, while allowing the fact
that the other method of IO is used, it can avoid nonblock. With being the case
that it is said, the fact that you attach such how probably will be?

   [PATCH follows]

In article <40FE2A18.302@hypermetrics.com>,
  Hal Fulton <hal9000@hypermetrics.com> writes:

How much work is it to implement this method? I may not be
knowledgeable enough to do it myself, but I would be willing
to assist someone if I can.

See [ruby-dev:23247] and [ruby-dev:23248].

However now I think readpartial should raise EOFError on EOF.
The implementation in [ruby-dev:23247] returns nil on EOF.
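
A short sketch of what the EOFError convention looks like to a caller, using IO#readpartial as it exists in present-day Ruby. The helper name drain and the default chunk size are assumptions made for illustration.

```ruby
# Illustrative helper (hypothetical name): collect chunks until end of file.
def drain(io, maxlen = 4096)
  chunks = []
  begin
    loop { chunks << io.readpartial(maxlen) }
  rescue EOFError
    # End of stream: readpartial signals EOF by raising, not by returning nil.
  end
  chunks
end

r, w = IO.pipe
w.write("abc")
w.close
p drain(r, 2)
r.close
```

With a nil return, the loop would instead need an explicit check on every iteration; raising lets the read loop stay a one-liner.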

--
Tanaka Akira

Does anyone know of a good way to basically seek through a large query
record by record (like we used to do with the SEEK command in Clipper?) ...
I am converting the Ruby BBS to use postgres. Basically, I have real
message numbers and the message numbers displayed by the system (so the
messages can be 1-x) but the real message numbers (which are unique and
ascending) can be used to keep track of the last read pointer.

I could make an array of message numbers each time a user tries to pull a
message up, but this seems wasteful.

Thanks.

Mark

"But Schindler is bueno! Senior Burns is El Diablo!"

--------------------------------------------------------------
Website - http://www.retrobbs.org
Tradewars - telnet tradewars.retrobbs.org
BBS - http://bbs.retrobbs.org:8000
IRC - irc.retrobbs.org #main
WIKI - http://www.tpoh.org/cgi-bin/tpoh-wiki

In article <189-1205065002.20040721185353@soyabean.com.au>,
  Gavin Sinclair <gsinclair@soyabean.com.au> writes:

Tanaka, can you please give a brief description of the method (assume
I know nothing about advanced I/O), and I'll suggest a name. Of
course, other people will too.

[ruby-dev:23247] contains a patch including RDoc comment in English.

/*
  * call-seq:
  * ios.readpartial(integer [, buffer]) => string, buffer, or nil
  *
  * Reads at most <i>integer</i> bytes from the I/O stream but
  * it blocks only if <em>ios</em> has no data immediately available.
  * If the optional <i>buffer</i> argument is present,
  * it must reference a String, which will receive the data.
  * It raises <code>EOFError</code> on end of file.
  *
  * STDIN.readpartial(4096) #=> "Data immediately available"
  */

(The behavior on EOF is modified.)
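
A small sketch of the optional buffer argument described in the comment above, as it behaves in present-day Ruby: the data is written into the String passed in, and that same String object is returned. The helper name read_into_buffer is illustrative only.

```ruby
# Illustrative helper (hypothetical name): pass a receiving String to readpartial.
def read_into_buffer
  r, w = IO.pipe
  w.write("hello")
  w.close

  buffer = String.new                    # caller-supplied receiving String
  result = r.readpartial(4096, buffer)   # fills buffer and returns it
  r.close
  [buffer, result.equal?(buffer)]
end

p read_into_buffer
```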
--
Tanaka Akira

In article <gGSdnSk4oLg_DZ3cSa8jmA@karoo.co.uk>,
  "daz" <dooby@d10.karoo.co.uk> writes:

I thought Hal's read_avail (perhaps readavail) was a fair description, but ...

"readpartial may return data which is not available when readpartial is called"

Then 'readpartial' is magic ?-) (Returns unavailable data)
I know ... I misunderstand :-(

readpartial blocks in such a case.

% (sleep 1; echo abc; sleep 1; echo def) | ./ruby -e 't1 = Time.now; p STDIN.readpartial(4096); t2 = Time.now; p t2-t1'
"abc\n"
1.001787

In this case, readpartial blocks because there is no data at first.
After 1 second, "abc" is sent over the pipe and readpartial reads and
returns it.
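
The same experiment can be sketched in pure Ruby, with a writer thread standing in for the shell pipeline. The helper name timed_readpartial and the 0.2 second delays are assumptions chosen for illustration; the exact elapsed time will vary from run to run.

```ruby
# Illustrative helper (hypothetical name): time a readpartial that has to wait.
def timed_readpartial
  r, w = IO.pipe
  writer = Thread.new do
    sleep 0.2
    w.write("abc\n")          # first data appears only after the delay
    sleep 0.2
    w.write("def\n")          # arrives after readpartial has already returned
    w.close
  end

  t1 = Time.now
  data = r.readpartial(4096)  # blocks until the first write, then returns
  elapsed = Time.now - t1

  writer.join
  r.close
  [data, elapsed]
end

data, elapsed = timed_readpartial
p data
p elapsed
```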

So, the name read_avail is not accurate.

read_immed ?
(readimmed is *very bad* without the underscore -- weird ! )

read_direct ? (probably not good)

read_instant, readinst, readsnap ? (at this instant in time)

How about them, matz?

I think the problem with the name 'readpartial' may be that if *all*
data is available, and 'readpartial' reads it all, then it has "failed"
because it has performed 'read', *not* 'readpartial'. :->

Yes.

'readchunk' gives too much emphasis to the chunk, IMHO.
readchunk(256) looks too predictable.
If I wanted to read from <ios> in chunks, I would try that before
reading the docs.

Also, several formats such as PNG define "chunk" in their specs.

--
Tanaka Akira

are you asking about cursors?

-a

On Wed, 21 Jul 2004, Mark Firestone wrote:

Does anyone know of a good way to basically seek through a large query
record by record (like we used to do with the SEEK command in Clipper?) ...
I am converting the Ruby BBS to use postgres. Basically, I have real
message numbers and the message numbers displayed by the system (so the
messages can be 1-x) but the real message numbers (which are unique and
ascending) can be used to keep track of the last read pointer.

I could make an array of message numbers each time a user tries to pull a
message up, but this seems wasteful.

Thanks.

Mark

--

EMAIL :: Ara [dot] T [dot] Howard [at] noaa [dot] gov
PHONE :: 303.497.6469
A flower falls, even though we love it;
and a weed grows, even though we do not love it. --Dogen

===============================================================================

Here are some other possible names that seem to fit with readchar, readline,
and readlines and yet catch the idea of "readpartial."

readportion
readparcel
readbundle
readatmost

Personally, "readatmost" would fit closest in my mind to the above description.

On Wed, 21 Jul 2004 18:56:01 +0900, Tanaka Akira <akr@m17n.org> wrote:

In article <189-1205065002.20040721185353@soyabean.com.au>,
  Gavin Sinclair <gsinclair@soyabean.com.au> writes:

Tanaka, can you please give a brief description of the method (assume
I know nothing about advanced I/O), and I'll suggest a name. Of
course, other people will too.

[ruby-dev:23247] contains a patch including RDoc comment in English.

/*
  * call-seq:
  * ios.readpartial(integer [, buffer]) => string, buffer, or nil
  *
  * Reads at most <i>integer</i> bytes from the I/O stream but
  * it blocks only if <em>ios</em> has no data immediately available.
  * If the optional <i>buffer</i> argument is present,
  * it must reference a String, which will receive the data.
  * It raises <code>EOFError</code> on end of file.
  *
  * STDIN.readpartial(4096) #=> "Data immediately available"
  */

(The behavior on EOF is modified.)

why not

   IO#receive

as in

   STDIN.receive(4096) #=> "we are receiving data as it becomes available"

-a

On Wed, 21 Jul 2004, Tanaka Akira wrote:

In article <189-1205065002.20040721185353@soyabean.com.au>,
Gavin Sinclair <gsinclair@soyabean.com.au> writes:

Tanaka, can you please give a brief description of the method (assume
I know nothing about advanced I/O), and I'll suggest a name. Of
course, other people will too.

[ruby-dev:23247] contains a patch including RDoc comment in English.

/*
* call-seq:
* ios.readpartial(integer [, buffer]) => string, buffer, or nil
*
* Reads at most <i>integer</i> bytes from the I/O stream but
* it blocks only if <em>ios</em> has no data immediately available.
* If the optional <i>buffer</i> argument is present,
* it must reference a String, which will receive the data.
* It raises <code>EOFError</code> on end of file.
*
* STDIN.readpartial(4096) #=> "Data immediately available"
*/

(The behavior on EOF is modified.)

What about this?

  1) read(n):
       current method; no problem if less than n bytes read,
       and no blocking

  2) read(n, :noeof):
       as (1), but raises EOFError on end of file

  3) read(n, :noeof, :pblock):
       as (2), but *partially* blocks; i.e. blocks only if no data
       is immediately available

  4) read(n, :exact):
       as (1), raise some error if less than n bytes read
       (implies :noeof)

  5) read(n, :exact, :block):
       as (4), but block until n bytes are available

  6) read(n, :exact, :pblock):
       as (4), but *partially* block, as per (3)

All methods can accept a String parameter which acts as the receiving
buffer. The 'n' parameter must come first; the order of the rest
doesn't matter.
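
To make the :exact idea concrete, here is a sketch of a helper that returns exactly n bytes or raises. It is built on the existing IO#read(n); the symbol flags in the proposal are not part of Ruby, and the name read_exact and the choice of IOError for a short read are assumptions made for illustration.

```ruby
require "stringio"

# Hypothetical helper approximating the :exact variants: n bytes or an error.
def read_exact(io, n)
  data = io.read(n)           # IO#read(n) returns nil at EOF, else up to n bytes
  raise EOFError, "end of file reached" if data.nil?
  raise IOError, "short read: wanted #{n}, got #{data.bytesize}" if data.bytesize < n
  data
end

io = StringIO.new("abcdef")
p read_exact(io, 4)
```

A second read_exact(io, 4) on the same StringIO would raise IOError, since only two bytes remain.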

The naming of "pblock" could definitely be better...

Gavin

On Wednesday, July 21, 2004, 7:56:01 PM, Tanaka wrote:

In article <189-1205065002.20040721185353@soyabean.com.au>,
  Gavin Sinclair <gsinclair@soyabean.com.au> writes:

Tanaka, can you please give a brief description of the method (assume
I know nothing about advanced I/O), and I'll suggest a name. Of
course, other people will too.

[ruby-dev:23247] contains a patch including RDoc comment in English.

/*
  * call-seq:
  * ios.readpartial(integer [, buffer]) => string, buffer, or nil
  *
  * Reads at most <i>integer</i> bytes from the I/O stream but
  * it blocks only if <em>ios</em> has no data immediately available.
  * If the optional <i>buffer</i> argument is present,
  * it must reference a String, which will receive the data.
  * It raises <code>EOFError</code> on end of file.
  *
  * STDIN.readpartial(4096) #=> "Data immediately available"
  */

(The behavior on EOF is modified.)

Tanaka Akira <akr@m17n.org> writes:

/*
  * call-seq:
  * ios.readpartial(integer [, buffer]) => string, buffer, or nil
  *
  * Reads at most <i>integer</i> bytes from the I/O stream but
  * it blocks only if <em>ios</em> has no data immediately available.
  * If the optional <i>buffer</i> argument is present,
  * it must reference a String, which will receive the data.
  * It raises <code>EOFError</code> on end of file.
  *
  * STDIN.readpartial(4096) #=> "Data immediately available"
  */

How about: STDIN.readposixly(4096) or STDIN.posix_read(4096)

Why posix?

The behaviour described above is similar to the behaviour of POSIX's
read function. POSIX's read function does a short read[1] if there is
not enough data available to satisfy the request, but performs a full
read if there is enough data. It will also block until there is some
data to be read.
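
The short-read versus full-read distinction can be sketched with a pipe and IO#readpartial as it exists in present-day Ruby. The helper name short_read_demo is illustrative only.

```ruby
# Illustrative helper (hypothetical name): short read vs. full read on a pipe.
def short_read_demo
  r, w = IO.pipe

  w.write("hello")
  short = r.readpartial(100)  # only 5 bytes are available: a short read

  w.write("0123456789")
  full = r.readpartial(4)     # enough data: exactly the 4 bytes requested
  rest = r.readpartial(100)   # the remaining 6 bytes, again a short read

  w.close
  r.close
  [short, full, rest]
end

p short_read_demo
```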

I am actually leaning towards readposixly rather than posix_read. I read
'readposixly' as: do a read in a manner described in the POSIX
standard.

Meanwhile, 'posix_read' could be misinterpreted to be: read using a
posix function, which people on non-posix systems might interpret as
not being available on their systems.

Thanks,
YS.

Footnotes:
[1] reading less than requested. "short read" is a
well-established phrase. google for: ' "short read" posix '

read_next_avail?

H

Tanaka Akira <akr@m17n.org> wrote in message news:<873c3jm2lz.fsf@serein.a02.aist.go.jp>...

In article <gGSdnSk4oLg_DZ3cSa8jmA@karoo.co.uk>,
  "daz" <dooby@d10.karoo.co.uk> writes:

> I thought Hal's read_avail (perhaps readavail) was a fair description, but ...
>
> "readpartial may return data which is not available when readpartial is called"
>
> Then 'readpartial' is magic ?-) (Returns unavailable data)
> I know ... I misunderstand :-(

readpartial blocks in such a case.

% (sleep 1; echo abc; sleep 1; echo def) | ./ruby -e 't1 = Time.now; p STDIN.readpartial(4096); t2 = Time.now; p t2-t1'
"abc\n"
1.001787

In this case, readpartial blocks because there is no data at first.
After 1 second, "abc" is sent over the pipe and readpartial reads and
returns it.

So, the name read_avail is not accurate.

> read_immed ?
> (readimmed is *very bad* without the underscore -- weird ! )
>
> read_direct ? (probably not good)
>
> read_instant, readinst, readsnap ? (at this instant in time)

How about them, matz?

> I think the problem with the name 'readpartial' may be that if *all*
> data is available, and 'readpartial' reads it all, then it has "failed"
> because it has performed 'read', *not* 'readpartial'. :->

Yes.

> 'readchunk' gives too much emphasis to the chunk, IMHO.
> readchunk(256) looks too predictable.
> If I wanted to read from <ios> in chunks, I would try that before
> reading the docs.

Also, several formats such as PNG define "chunk" in their specs.

That is entirely possible (;

I'm not sure if that is the way to do it or not. My SQL is not the
greatest.

----- Original Message -----
From: "Ara.T.Howard" <ahoward@noaa.gov>
Newsgroups: comp.lang.ruby
To: "ruby-talk ML" <ruby-talk@ruby-lang.org>
Sent: Wednesday, July 21, 2004 1:32 PM
Subject: Re: ruby postgresql question

On Wed, 21 Jul 2004, Mark Firestone wrote:

> Does anyone know of a good way to basically seek through a large query
> record by record (like we used to do with the SEEK command in Clipper?) ...
> I am converting the Ruby BBS to use postgres. Basically, I have real
> message numbers and the message numbers displayed by the system (so the
> messages can be 1-x) but the real message numbers (which are unique and
> ascending) can be used to keep track of the last read pointer.
>
> I could make an array of message numbers each time a user tries to pull a
> message up, but this seems wasteful.
>
> Thanks.
>
> Mark

are you asking about cursors?

-a