Output file from clipboard in unix

Hi all,

I'm new to Ruby and pretty new to programming in general. My limited
prior experience is with JavaScript, so I'm not accustomed to handling
file I/O or working with Unix/Terminal. I'm working on a script that
takes a large quantity of text copied from a web browser and does some
reformatting so the result lines up neatly in a CSV and eventually a
spreadsheet. Basically, I'm using a few regex/gsub statements to remove
unnecessary text from the beginning of the page and to add/remove tabs
from certain places. I've got two issues with this script; please see
the attached files.

1. Right now, I can run the script in Terminal and direct the output to
a new file and everything works (except for one of the regexes, see #2
below), provided that I first save the text in a file that matches the
file name used in the script. If possible, I'd prefer to skip saving a
file with the input text and allow the user to simply copy and paste it
directly into Terminal. In other words, you could run the script in
Terminal, be prompted to paste your text in, and the script would make all
the necessary changes to the text and output to a file. I tried adding
a prompt and a gets statement to hold the text in a variable, but this
just grabs a single line of text, terminates the Ruby script, and then
produces a Terminal error for each subsequent line. Is there any way I
can accomplish this alternate behavior?

2. The regex I created to remove all unneeded content isn't working. I
checked this in Rubular and it worked as I expected, but it does not
work when I run the script. All the content that is needed in the
output follows a unique set of table headers on the web page, so I
should be able to find everything leading up to and including those
headers and remove it. The regex looks like this (with random terms
added in place of the table headers):

/.*hat\.goat\sthis thing\sthat thing\sstuff\scheese/m

As it is, the headers do get replaced, but the content that precedes them
does not. I get the same result if I remove the wildcard at the
beginning of the regex. Any ideas?

Attachments:
http://www.ruby-forum.com/attachment/8432/test_forum.rb
http://www.ruby-forum.com/attachment/8433/input.txt


--
Posted via http://www.ruby-forum.com/.

For #1, instead of having the user run the script you're generating,
you can simply automate the shell with popen or eval (i.e. the shell's
eval, not Ruby's), or save the script and run it after it's generated
with either system or backticks, depending on what you need back as a
return value.

Unless the design calls for the end user to make a decision, console-based
users tend not to copy and paste text the way GUI users do. If you don't
want to automate the process and would rather give your end user a bit more
control, you could just dump the generated script to standard output
and let the end user use the shell's redirection to put it into a
file for execution.
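A minimal sketch of the three ways of running a command mentioned above, assuming a Unix-like system where `echo` and `tr` exist (the commands themselves are only illustrations): `system` runs a command and reports success, backticks hand back its standard output as a string, and `IO.popen` gives you a pipe you can both write to and read from.

```ruby
# system: the command's output goes straight to the terminal; the return
# value is true on a zero exit status, false otherwise (nil if it could not run).
ran_ok = system("echo", "from system")

# Backticks: the command's standard output comes back as a String.
captured = `echo from backticks`

# IO.popen: a two-way pipe ("r+") to the command. Closing the write end
# signals end of input, so `tr` can finish and we can read what it wrote.
piped = IO.popen(["tr", "a-z", "A-Z"], "r+") do |io|
  io.write("from popen\n")
  io.close_write
  io.read
end

p ran_ok    # true
p captured  # "from backticks\n"
p piped     # "FROM POPEN\n"
```

For the redirection route, the script only has to print its result and the user picks the file, e.g. `ruby test_forum.rb > output.csv`.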

~Stu

On #1 (pasting the text straight into Terminal instead of saving it to a file):

Let's look at this:

#read file and replace
fileObj = File.new("input.txt", "r")
while (line = fileObj.gets)
  substitute_line = line.gsub(clear_stuff, "")
  substitute_line1 = substitute_line.gsub(add_tabs, "\t\t\\1")
  substitute_line2 = substitute_line1.gsub(remove_tab, "\\1")
  print(substitute_line2)
end
fileObj.close

You can quite easily change that to use STDIN thusly:

#read file and replace
puts "Paste in your text below:"
while (line = gets)
  substitute_line = line.gsub(clear_stuff, "")
  substitute_line1 = substitute_line.gsub(add_tabs, "\t\t\\1")
  substitute_line2 = substitute_line1.gsub(remove_tab, "\\1")
  print(substitute_line2)
end
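This also explains the single-line behavior in the question: one `gets` call reads only the first line, the script exits, and the shell then receives the rest of the pasted text and tries to run each line as a command. Looping until `gets` returns nil reads everything; on a Unix terminal you signal the end of the pasted text by pressing Ctrl-D at the start of a line. Below is a sketch that wraps the same loop in a method taking any IO, so it can read pasted text, a file, or a string. The two regexes are hypothetical placeholders, since the real ones live in the attached script.

```ruby
# Hypothetical placeholders for the regexes defined in the attached script.
ADD_TABS   = /(\d+%)/   # assumption: put two tabs in front of a percentage
REMOVE_TAB = /\t(\n)/   # assumption: drop a tab that sits right before a newline

# Reads `input` line by line until end of file (Ctrl-D on a terminal)
# and writes each reformatted line to `output`.
def reformat(input, output)
  input.each_line do |line|
    output.print line.gsub(ADD_TABS, "\t\t\\1").gsub(REMOVE_TAB, "\\1")
  end
end

# At the bottom of the script you would call it like this:
#   $stderr.puts "Paste in your text, then press Ctrl-D on an empty line:"
#   reformat(ARGF, $stdout)
```

Writing the prompt to `$stderr` keeps it out of the output file when standard output is redirected, and passing `ARGF` (what `Kernel#gets` reads from) means the same script still accepts a file name on the command line.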

On #2 (the regex that leaves the preceding content in place):

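Since you are working on the file one line at a time, the regex never sees
anything preceding the current line (if I understand this correctly), so the
m modifier at the end, which lets . match newlines, really makes no
difference there. Rubular, by contrast, tests the regex against all of the
pasted text at once, which is why it worked there.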
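If the whole pasted text is read into one string first, the original /m approach behaves as it did in Rubular, because `.` can then run across newlines. A sketch, assuming the input is read in one go with `$stdin.read`; the header pattern is the placeholder from the question, and the non-greedy `\A.*?` (a choice made here, not the original `.*`) stops at the first occurrence of the headers.

```ruby
# Placeholder header pattern with the same shape as the one in the question.
HEADERS = /hat\.goat\sthis thing\sthat thing\sstuff\scheese/

# Deletes everything from the start of `text` up to and including the
# first occurrence of the headers. With /m, `.` also matches "\n", so the
# match can span all the lines before the headers.
def strip_preamble(text)
  text.sub(/\A.*?#{HEADERS}/m, "")
end

# Usage in the script:
#   print strip_preamble($stdin.read)
```

Applied line by line instead, the same regex only deletes the header text itself and leaves the earlier lines alone, which is exactly the symptom described in the question.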

If you want to throw away everything up to and including the line that
matches that Regexp, you should probably consider the flip-flop
instead of a gsub at that point:

#read file and replace
puts "Paste in your input:"
while (line = gets)
  # $. is a variable that contains the number of the last line read
  # from the input, in this case STDIN.
  next if $. == 1 .. line =~ clear_stuff
  # I've rewritten the two substitution lines as one chain.
  # No need for intermediate variables.
  print line.gsub(add_tabs, "\t\t\\1").gsub(remove_tab, "\\1")
end

The flip-flop operator is rather interesting, if a bit arcane. It returns
false while the first condition is false. When the first condition becomes
true, it returns true, and keeps returning true up to and including the
point where the second condition becomes true; after that it returns false
again (until the first condition is true once more). If you use three
dots (...), the second condition is not tested on the same iteration in
which the first one became true; the line that satisfies it is still
included.
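A small demonstration of the two forms, along the lines of the flip-flop example in Ruby's control-expressions documentation: with two dots the end condition is also tested on the iteration where the start condition fires, with three dots it is first tested on the next iteration. In both forms the value that satisfies the end condition is included.

```ruby
two_dots   = []
three_dots = []
window     = []

0.upto(10) do |value|
  # Start and end test the same thing. With "..", the end is checked on
  # the same iteration, so the flip-flop switches on and off at 2.
  two_dots << value if (value == 2)..(value == 2)

  # With "...", the end is first checked on the next iteration; value is
  # never 2 again, so the flip-flop stays on until the loop ends.
  three_dots << value if (value == 2)...(value == 2)

  # An ordinary window: on at 3, off after 5, with 5 itself included.
  window << value if (value == 3)..(value == 5)
end

p two_dots    # [2]
p three_dots  # [2, 3, 4, 5, 6, 7, 8, 9, 10]
p window      # [3, 4, 5]
```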

So what the above does is roll through the input, tossing lines from the
beginning up to and including the line that matches `clear_stuff`. As it
continues to roll through, the if condition is false for every later line,
so execution drops through to the print statement and performs the
substitutions.
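The same idea as a self-contained method that works on a string instead of STDIN, which makes it easy to try out. It uses the line index from `with_index` in place of `$.`; the method name and the header pattern are hypothetical stand-ins for the script's `clear_stuff`.

```ruby
# Hypothetical header pattern standing in for `clear_stuff`.
CLEAR_STUFF = /hat\.goat\sthis thing/

# Returns the text that follows the first line matching `header`.
# The flip-flop switches on at line 0 and stays on up to and including
# the header line; every line seen while it is on gets skipped.
def drop_through_header(text, header)
  kept = []
  text.each_line.with_index do |line, index|
    next if (index == 0)..(line =~ header)
    kept << line
  end
  kept.join
end
```

If no line ever matches the header pattern, the flip-flop never switches off and every line is dropped, so it may be worth checking for that case.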

Since I can already see that Gmail, in its stupidity, will wrap my
code, I'm attaching it here as well as at


Hope this helps...

filter.rb (325 Bytes)


On Sun, May 19, 2013 at 12:01 PM, JD JD <lists@ruby-forum.com> wrote: