Evading the limit of a pipe's standard input

Hi

I'm trying to write a tool that generates a really long SQL script and passes it to psql, but I've hit a problem: my script is too long to be sent to the stdin of psql. This is the first time I've run into the limit, so it had me scratching my head for a while. I've tried a load of tricks, even putting the lines in an array, e.g.:

   MAX_SUCCESSFUL_TIMES = 3047

   query = ["BEGIN WORK;"]
   (MAX_SUCCESSFUL_TIMES + 1).times do
     query << "INSERT INTO people (name) VALUES ('Fred');"
   end
   query << "COMMIT;"

   IO.popen("psql -U test test","r+") do |pipe|
     query.each { |statement| pipe.puts statement }
   end

but it still fails when the total length of the commands exceeds the limit (which, by experiment, I've found to be 128KB on Mac OS X, hence the specific number of times above).

What's the best solution to this? I would like to stick to inter-process communication and avoid temporary files or rewriting it to use DBD, if possible. Or are those my only options?

Thanks
Ashley

Make sure the command you spawn with IO.popen is actually reading
from its stdin in parallel. The limit a pipe has at the system
level is the amount of unread data it can hold. As data is read
out, room is made for more. However, if the reader is stuck for
some reason, the writer will block too.
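
For example, here is a small self-contained sketch (hypothetical, and it
needs a Ruby with IO#write_nonblock) that makes the limit visible: the
writer fills the kernel buffer because nobody ever reads the other end:

   r, w = IO.pipe
   written = 0
   begin
     # Write 4KB chunks while never reading from r; write_nonblock
     # raises once the kernel buffer is full, where a plain write
     # would simply block forever.
     loop { written += w.write_nonblock("x" * 4096) }
   rescue Errno::EAGAIN
     puts "pipe buffer filled after #{written} bytes"
   end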

If psql really accepts only a limited amount of data on its stdin,
that is more its problem than the pipe's. To check, try feeding the
big file to psql via redirection, like this:

psql -U test test < "your big file"

Hope I am not too far off from what you meant,
Gennady.

···


Rewriting to use DBI is not very difficult, so unless you have a reason
beyond not wanting DBI calls to clutter your code, or not wanting to
restructure it massively, the following should work:

class DBI::DatabaseHandle
  # Takes a whole SQL script in a string and executes it on the
  # database, one statement at a time. Note: this is a naive split
  # that breaks if a statement contains a literal semicolon.
  def batch(sql)
    # split(";") drops any empty trailing fragment after the final
    # semicolon, so each element is a complete statement.
    sql.split(";").each { |statement| self.do(statement) }
  end
end
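
Hypothetical usage, assuming the Pg driver for DBI and made-up connection
details (database "test", user "test"):

   require "dbi"

   dbh = DBI.connect("DBI:Pg:test", "test")
   dbh.batch(query.join("\n"))   # query is the array from your example
   dbh.disconnect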

In the sample code you gave above,

   query.each { |statement| dbh.do(statement) }

should also work just fine.

Is it possible with the pipe solution that you need to read back the
output data, so that psql doesn't back up its pipe buffer, as Gennady
Bystritsky has suggested?

···


--
Ken Bloom. PhD candidate. Linguistic Cognition Laboratory.
Department of Computer Science. Illinois Institute of Technology.
http://www.iit.edu/~kbloom1/

If psql really accepts only a limited amount of data on its stdin,
that is more its problem than the pipe's.

It's not: psql is commonly used to restore whole databases by feeding
it data on stdin.

Hi Gennady

I was not very clear in my e-mail; I didn't think psql was limited in the
size of script it can read. You were right, it was just the pipe buffer
filling up. I had actually tried reading from the pipe, but not in
parallel. This seems to work:

   MAX_SUCCESSFUL_TIMES = 3047

   query = ["BEGIN WORK;"]
   (MAX_SUCCESSFUL_TIMES + 1).times do
     query << "INSERT INTO people (name) VALUES ('Fred');"
   end
   query << "COMMIT;"

   IO.popen("psql -U test test","r+") do |pipe|
     Thread.new { loop { pipe.read } } # fixes it
     query.each { |statement| pipe.puts statement }
   end

I assume pipe.read blocks, so the thread doesn't turn the CPU into a heating
element.

Cheers
Ashley

···


--
"If you do it the stupid way, you will have to do it again"
  - Gregory Chudnovsky

Actually I lied... I need a "sleep 2" at the end of the IO.popen block for
this to work. Is there a less magical way to sync it? (I took a stab at
Process.wait but it doesn't work.)
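
For the record, one idea I haven't tried yet (just a sketch): close the
write end so psql sees EOF, then read whatever is left; the read should
only return once psql has exited and closed its stdout:

   IO.popen("psql -U test test", "r+") do |pipe|
     query.each { |statement| pipe.puts statement }
     pipe.close_write  # send EOF so psql finishes the script and exits
     pipe.read         # drain (and discard) psql's output until EOF
   end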

Ashley

···


Since you're throwing away the output, instead of using a thread, why not
try one of the following:
* popen with the mode "w":
   IO.popen("psql -U test test", "w")
* redirect the output from psql to /dev/null inside the popen call:
   IO.popen("psql -U test test > /dev/null", "r+")
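
Or, combining both (just a sketch, using the connection details from
your post):

   query = ["BEGIN WORK;"]
   10_000.times { query << "INSERT INTO people (name) VALUES ('Fred');" }
   query << "COMMIT;"

   # With mode "w" there is no read pipe to back up, and the shell
   # redirect keeps psql's output off the terminal. When the block
   # ends, popen closes the pipe and waits for psql to exit.
   IO.popen("psql -U test test > /dev/null", "w") do |pipe|
     query.each { |statement| pipe.puts statement }
   end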

--Ken

···


Thanks Ken, the last one was what I was looking for. I forgot to say
that IO.popen("psql -U test test", "w") was actually one of the first
things I tried, but it dumps the (several thousand lines of) psql output
into the shell, which renders the output of the script useless.

Ashley
