Efficiency of string parsing

I have written a loop that parses a string and, at every 50th
character, checks whether it is a space. If it is not, the loop works
back until it finds one, then inserts a newline. I am turning masses
of text (copy) from a DB into images and wanted to automate it. Are
there better ways of achieving what I am trying to do?

        characterCount = 0
        positionCount = 0

        while characterCount != copy.length
          characterCount += 1
          positionCount += 1
          if positionCount == MAX_LINE_LENGTH
            # Walk back to the nearest space before the line limit.
            begin
              characterCount -= 1
            end until copy[characterCount].eql?(ASCII_SPACE)
            # Double quotes are needed here: '\n' in single quotes is a
            # literal backslash followed by n, not a newline.
            copy.insert(characterCount += 1, "\n")
            imageHeight += LINE_HEIGHT
            positionCount = 0
          end
        end

Cheers,
Kev

There are quite a lot of posts about word wrapping, which seems to be what you are trying to do. You should be able to find them via the archives (Google Groups, ruby-talk archive).

A simplistic approach would probably do something like this:

str.gsub(/(.{1,50})\s+/, "\\1\n")
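One thing to watch with the pattern as posted: every run has to be followed by whitespace, so a short string such as "hello world" comes back as "hello\nworld". Below is a minimal sketch of a variant that also accepts the end of the string, and that derives the image height from the number of lines. The names `wrap_text` and `image_height` and the value of `LINE_HEIGHT` are assumptions for illustration; only `MAX_LINE_LENGTH` and `LINE_HEIGHT` as names come from the original post.

```ruby
MAX_LINE_LENGTH = 50   # line width from the original post
LINE_HEIGHT = 12       # assumed pixel height per line (hypothetical value)

# Hypothetical helper: wrap text at whitespace so that lines stay within
# `width` characters where a break point exists.
def wrap_text(text, width = MAX_LINE_LENGTH)
  # Take the longest run of 1..width characters that is followed by
  # whitespace or by the end of the string, and end it with a newline.
  # chomp drops the newline added after the final run.
  text.gsub(/(.{1,#{width}})(?:\s+|\z)/, "\\1\n").chomp
end

copy = "The quick brown fox jumps over the lazy dog and keeps running through the forest"
wrapped = wrap_text(copy, 30)

# One LINE_HEIGHT per line of wrapped text.
image_height = wrapped.lines.count * LINE_HEIGHT

puts wrapped
puts image_height
```

A word longer than the width is left on a line of its own rather than being split.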

Kind regards

  robert

···

On 12.03.2007 16:23, Kev wrote:

Had to take a swipe 9^)

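class String
   # Returns a copy in which the last space inside each successive window
   # of wrap_col characters is replaced by a newline.
   def wrap(wrap_col)
      retStr = self.dup
      start = 0   # index where the current line begins
      while retStr[start,wrap_col].length >= wrap_col
         ws_pos = retStr[start,wrap_col].rindex(" ")
         break if ws_pos.nil?   # no space in this window: leave the rest as is
         retStr[ws_pos+start] = "\n"
         start += ws_pos+1
      end
      retStr
   end
end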

Cheers
Chris

And here's the start of a more sophisticated approach I just whipped up.

It uses split on a word boundary to split the string. It has some
option keywords which allow preserving all whitespace, or only
whitespace at the beginning of a line. If you don't preserve all
whitespace, it collapses whitespace within a line to a single space.
If you don't preserve whitespace at the beginning of a line, it
eliminates it; otherwise it keeps it as is. The default is to preserve
whitespace only at the beginning of a line.

It does have a few bugs, which I didn't bother addressing and leave as
an exercise to the reader.

1) It ignores existing newlines in the input string, which means that
the next line will be short.

2) It keeps whitespace at the end of a line, as opposed to putting the
newline after the last 'word'.

class String
  def wordwrap(linelength, kw_args={})
    keep_all = kw_args[:keep_all]
    keep_initial = keep_all || kw_args[:keep_initial]
    keep_initial = true if keep_initial.nil?
    current_len = 0
    # Split into alternating word and non-word chunks, then rebuild the string.
    split(/\b/).inject("") do | result, chunk |
      if current_len + chunk.length >= linelength
        # The chunk would reach the limit: start a new line first.
        result << "\n"
        current_len = 0
        chunk = "" if chunk.strip.empty? unless keep_initial
      else
        chunk = " " if chunk.strip.empty? unless keep_all
      end
      current_len += chunk.length
      result << chunk
    end
  end
end
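For anyone picking up the exercise, here is one possible sketch that sidesteps the two limitations listed above. It is a different technique from the `wordwrap` method, not a patch to it: the text is first split on existing newlines, then each paragraph is rebuilt word by word. It always collapses whitespace and drops leading whitespace, so the `keep_all` and `keep_initial` options are not covered. The name `wrap_words` is made up for this example.

```ruby
# Hypothetical helper: wrap each existing line of `text` separately so that
# newlines already in the input are respected and no line ends in whitespace.
def wrap_words(text, width)
  # The -1 limit keeps trailing empty lines instead of dropping them.
  text.split("\n", -1).map do |paragraph|
    lines = [+""]   # unary plus gives a mutable string
    paragraph.split.each do |word|
      if lines.last.empty?
        lines.last << word                 # first word on this line
      elsif lines.last.length + 1 + word.length <= width
        lines.last << " " << word          # word fits after a single space
      else
        lines << word.dup                  # start a new line with this word
      end
    end
    lines.join("\n")
  end.join("\n")
end

puts wrap_words("aaa bbb ccc\nddd", 7)
```

Because each output line is assembled from whole words joined by single spaces, the newline always lands directly after the last word.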

···

On 3/12/07, Robert Klemme <shortcutter@googlemail.com> wrote:

--
Rick DeNatale

My blog on Ruby
http://talklikeaduck.denhaven2.com/

Excellent solution for coding efficiency. (Though I always think regular expressions should be commented well, broken into parts, because the syntax is so terse, especially for those who don't use RegEx regularly. No pun, really.)
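As a sketch of what "commented and broken into parts" could look like, Ruby's extended mode (the `x` flag) ignores whitespace inside the pattern and allows `#` comments. The pattern is the same one posted earlier in the thread; the constant `WRAP_PATTERN` and the method `wrap50` are names made up for this example.

```ruby
# The posted pattern, spread over several lines with the x flag.
WRAP_PATTERN = /
  (.{1,50})   # capture the longest run of 1 to 50 characters on one line
  \s+         # that is followed by one or more whitespace characters
/x

# Hypothetical helper: replace each match with the captured run plus a newline.
def wrap50(str)
  str.gsub(WRAP_PATTERN, "\\1\n")
end

puts wrap50(("word " * 30).strip)
```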

But would a Ruby iterator be faster?

Clearly this is a tool to wrap text to 50 characters per line without breaking words. Curious to see more ideas/approaches on that.

···

On Mar 13, 2007, at 12:30 AM, Robert Klemme wrote:

I'm just curious: what is it about Ruby iterators (I assume you mean methods like 'each') that makes you expect them to be more efficient than the gsub?

Tom

···

On Mar 12, 2007, at 12:35 PM, John Joyce wrote:

Being new to Ruby, that's a great piece of code to get my head around.
Thanks all for the suggestions, thoughts and ideas :)

···

On 12 Mar, 16:25, "Rick DeNatale" <rick.denat...@gmail.com> wrote:

Iterators, callbacks using Ruby code blocks, whatever.
I never said I expect them to be faster.
I was asking.
I don't know how much text is being parsed. I do assume it is unstructured and not indexed in any manner.
I'm just wondering whether we need to know more about the why and what for in order to reach the best solution for the situation.
Like they say in Perl, there's more than one way to do it, right? Some ways are just interesting, some are fast, some are useful, etc.
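The speed question is probably best settled by measuring on text that resembles the real data. Here is a rough sketch using the standard library's Benchmark module. Both `wrap_gsub` and `wrap_each` are hypothetical helpers written for this comparison, the sample text is made up, and the timings will depend on the Ruby version and the input, so no result is claimed here.

```ruby
require "benchmark"

# Regex-based wrap: longest run of 1..width characters followed by
# whitespace or the end of the string.
def wrap_gsub(text, width)
  text.gsub(/(.{1,#{width}})(?:\s+|\z)/, "\\1\n").chomp
end

# Iterator-based wrap: add words to the current line while they fit.
def wrap_each(text, width)
  lines = [+""]
  text.split.each do |word|
    if lines.last.empty?
      lines.last << word
    elsif lines.last.length + 1 + word.length <= width
      lines.last << " " << word
    else
      lines << word.dup
    end
  end
  lines.join("\n")
end

# Made-up sample: single-spaced words, none longer than the line width.
sample = ("lorem ipsum dolor sit amet " * 2_000).strip

Benchmark.bm(10) do |bm|
  bm.report("gsub:") { 20.times { wrap_gsub(sample, 50) } }
  bm.report("each:") { 20.times { wrap_each(sample, 50) } }
end
```

The two helpers treat runs of whitespace and over-long words differently, so compare their output on your own text before comparing their timings.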

···

On Mar 13, 2007, at 2:02 AM, Tom Pollard wrote:
