Whitespace string only

Hi!

What's the best way to test whether a string consists only of whitespace and newlines?

The best I could come up with is:

class String

   def is_whitespace_only?
     strings_to_test = split("\n")
     whitespace = /^\s+$/
     is_whitespace_only = true
     strings_to_test.each{ |str|
       unless whitespace.match(str) or str.empty?
         is_whitespace_only = false
         break
       end
     }
     is_whitespace_only
   end

end

But somehow I think there should be a better way to do it. Any ideas?
Is it okay to add such methods to class String itself?

Any advice appreciated.

regards,
Henrik

def only_whitespace?
    each_byte { |b| return false if b != 32 }
    true
end

···

On Thu, 2004-09-23 at 00:54, Henrik Horneber wrote:

> What's the best way to test if a string only consists of whitespaces and
> newlines?

Unless you're being more specific:

  str.strip.length == 0

You could also match against something like

  /\A\s*\z/m

but I'm no Regexp expert by a long shot ;-)

T.

···

On Thursday 23 September 2004 03:54 am, Henrik Horneber wrote:

Hi!

What's the best way to test if a string only consists of whitespaces and
newlines?

--
( o _ カラチ
// trans.
/ \ transami@runbox.com

I don't give a damn for a man that can only spell a word one way.
-Mark Twain

"Henrik Horneber" <ryco@gmx.net> wrote in news message
news:4152820A.9080408@gmx.net...

> What's the best way to test if a string only consists of whitespaces and
> newlines?

rx = %r{\A\s*\z}
=> /\A\s*\z/

rx =~ ""
=> 0

rx =~ " "
=> 0

rx =~ " a"
=> nil

rx =~ " \n a"
=> nil

rx =~ " \n "
=> 0

rx =~ " \n \n"
=> 0

Regards

    robert

Henrik Horneber <ryco@gmx.net> writes:

Hi!

What's the best way to test if a string only consists of whitespaces
and newlines?

class String
  def is_whitespace_only?
    self !~ /[\s\n]/m
  end
end

In Message-Id: <4152820A.9080408@gmx.net>
Henrik Horneber <ryco@gmx.net> writes:

What's the best way to test if a string only consists of whitespaces
and newlines?

What about this?

  string !~ /\S/

Here "\S" is the complement of "\s". If your notion of whitespace
differs from "\s", you can use an appropriate negated character class
instead, say "[^ \n]", which matches any character except a space or a
line feed.

···

--
kjana@dm4lab.to September 23, 2004
Slow and steady wins the race.
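
For reference, here is a minimal sketch of the "!~ /\S/" suggestion above. The helper names blank_by_regex? and blank_custom? are made up for illustration and do not come from the thread. The second helper assumes that only the space and the line feed count as whitespace.

```ruby
# Hypothetical helpers (names are not from the thread).

# True when the string contains no non-whitespace character.
# \S is the complement of \s.
def blank_by_regex?(str)
  str !~ /\S/
end

# Variant with a custom notion of whitespace: only space and line feed.
def blank_custom?(str)
  str !~ /[^ \n]/
end

samples = ["", " ", " \n \n", "\t", " a", " \n a"]
samples.each do |s|
  puts "#{s.inspect}: regex=#{blank_by_regex?(s)} custom=#{blank_custom?(s)}"
end
```

Note that both helpers treat the empty string as whitespace-only, which matches the /\A\s*\z/ results shown earlier in the thread.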

I think regexp should be faster than each_byte. What about this?

class String

  def whitespace_only?
    # test each line of the receiver; no argument is needed inside String
    split(/\n/).each { |x|
      return false unless x =~ /^\s*$/
    }
    true
  end

end

MiG

Hi!

> if "#{s}".chomp.strip.length == 0

...

> rx = %r{\A\s*\z}

Obviously there is more than one way to do it... and all are better than mine. :)

Thanks everybody!

> self !~ /[\s\n]/m

1) \n is already in \s, and with a character class /m is useless
2) you are testing that no whitespace character exists in the string

Guy Decoux
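
To make point 2 concrete, here is a small sketch that compares the quoted pattern with the negated one. The constant and method names are hypothetical and only serve the illustration.

```ruby
# Hypothetical names, for illustration only.
INVERTED = /[\s\n]/   # matches any whitespace character
FIXED    = /\S/       # matches any non-whitespace character

# "str !~ INVERTED" is true when the string has NO whitespace at all,
# which is the opposite of the intended test.
def no_whitespace_at_all?(str)
  str !~ INVERTED
end

# "str !~ FIXED" is true when the string has no non-whitespace character.
def whitespace_only?(str)
  str !~ FIXED
end

["abc", " \n ", "a b", ""].each do |s|
  puts "#{s.inspect}: inverted=#{no_whitespace_at_all?(s)} fixed=#{whitespace_only?(s)}"
end
```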

i use this a lot:

   if s.strip.empty?
     # the string is whitespace only
   end

-a

···

On Thu, 23 Sep 2004, YANAGAWA Kazuhisa wrote:

> What about this?:
>
>   string !~ /\S/

--

EMAIL :: Ara [dot] T [dot] Howard [at] noaa [dot] gov
PHONE :: 303.497.6469
A flower falls, even though we love it;
and a weed grows, even though we do not love it. --Dogen

===============================================================================

I highly doubt a regex is faster than each_byte. each_byte has very
little code and is very fast (looping over the array in C and casting
the chars to Fixnums), whereas the regex version has to go through the
regex parser, come back out as an object, and be passed into split,
which in turn returns a potentially huge array that you then iterate
over with each. Then you do another regex comparison inside the block,
which I guarantee is much slower than comparing two Fixnums.

My initial version didn't handle \n, only spaces, so here's my updated
version that even handles tabs.

class String
    def only_ws?
        each_byte { |b| return false unless [9,10,32].include?(b) }
        true
    end
end

Evan Webb // evan@fallingsnow.net
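
One thing to keep in mind when comparing the two approaches: the byte list above (9, 10, 32) covers tab, line feed and space, while \s also accepts carriage return, form feed and vertical tab. A rough sketch of the difference follows. The helper names are made up, and the byte check is written as a plain method rather than reopening String.

```ruby
# Hypothetical helpers, for illustration only.
WS_BYTES = [9, 10, 32].freeze   # tab, line feed, space (the list from the post)

def only_ws_bytes?(str)
  str.each_byte { |b| return false unless WS_BYTES.include?(b) }
  true
end

def only_ws_regex?(str)
  str !~ /\S/
end

# The two agree on spaces, tabs and line feeds ...
[" \t\n", "", " x "].each do |s|
  puts "#{s.inspect}: bytes=#{only_ws_bytes?(s)} regex=#{only_ws_regex?(s)}"
end

# ... but differ on carriage return, form feed and vertical tab,
# which \s accepts and the byte list does not.
["\r\n", "\f", "\v"].each do |s|
  puts "#{s.inspect}: bytes=#{only_ws_bytes?(s)} regex=#{only_ws_regex?(s)}"
end
```

So strings with Windows line endings would be rejected by the byte version unless 13 is added to the list.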

···

On Thu, 2004-09-23 at 01:11, MiG wrote:

> I think regexp should be faster than each_byte. What about this?

ts <decoux@moulon.inra.fr> writes:

> > self !~ /[\s\n]/m
>
> 1) \n is already in \s, and with a character class /m is useless
> 2) you are testing that no whitespace character exists in the string

self !~ /[^\s]/

> if s.strip.empty?
>   # the string is whitespace only

svg% ruby -e 'a = " \000\000"; p "OK" if a.strip.empty?'
"OK"
svg%

svg% ruby -e 'a = " \000\000 "; p "OK" if a.strip.empty?'
svg%

Guy Decoux
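
For comparison, the regex test does not have this asymmetry: "\0" (NUL) is not part of \s, so it is rejected wherever it appears. A small sketch, with a hypothetical helper name. How strip itself treats NUL may depend on the Ruby version, so the strip column is only printed here, not relied upon.

```ruby
# Hypothetical helper name, for illustration only.
# NUL is not matched by \s, so /\S/ finds it and the test returns false.
def whitespace_only?(str)
  str !~ /\S/
end

[" \0\0", " \0\0 ", "\0", " "].each do |s|
  puts "#{s.inspect}: regex=#{whitespace_only?(s)} strip=#{s.strip.empty?}"
end
```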

> self !~ /[^\s]/

or

   self !~ /[\S]/ # one less character :-)

Guy Decoux

"Mikael Brockman" <mikael@phubuh.org> wrote in news message
news:87isa5i8rw.fsf@igloo.phubuh.org...

> ts <decoux@moulon.inra.fr> writes:
>
> > > self !~ /[\s\n]/m
> >
> > 1) \n is already in \s, and with a character class /m is useless
> > 2) you are testing that no whitespace character exists in the string
>
> self !~ /[^\s]/

self !~ /\S/

:-)

    robert

"ts" <decoux@moulon.inra.fr> wrote in news message
news:200409231451.i8NEphE08333@moulon.inra.fr...

> > if s.strip.empty?
> > # the string is whitespace only
>
> svg% ruby -e 'a = " \000\000"; p "OK" if a.strip.empty?'
> "OK"
> svg%
>
> svg% ruby -e 'a = " \000\000 "; p "OK" if a.strip.empty?'
> svg%

Also I'd say the disadvantage of "a.strip.empty?" is that it creates a
copy of the string (=> a new instance) which is generally slower than a
simple regexp check.

Kind regards

    robert

ahh! that's terrible - i didn't know String#strip did that! the docs say

   "Returns a copy of str with leading and trailing whitespace removed."

since when is NUL whitespace!? definitely against POLS.

thanks for the pointer.

-a

···

On Thu, 23 Sep 2004, ts wrote:

> svg% ruby -e 'a = " \000\000"; p "OK" if a.strip.empty?'
> "OK"


So which method is fastest?

Considering how common this is, one would think there would already be a
built-in String method for it (implemented in C).

T.

On Thu, 23 Sep 2004, Robert Klemme wrote:

> Also I'd say the disadvantage of "a.strip.empty?" is that it creates a copy
> of the string (=> a new instance) which is generally slower than a simple
> regexp check.

i assumed you were correct - but this is surprising:

   harp:~ > ruby b.rb

   -
   small string strip-empty:
     elapsed : 0.0081329345703125
   -
   small string re:
     elapsed : 0.005950927734375
   -
   small string re-precompiled:
     elapsed : 0.00719404220581055
   -
   big string strip-empty:
     elapsed : 0.263929843902588
   -
   big string re:
     elapsed : 5.26733493804932
   -
   big string re-precompiled:
     elapsed : 5.51002883911133

   harp:~ > cat b.rb
   $VERBOSE = nil
   STDOUT.sync = true

   def time label
     fork do
       GC.disable
       puts "-\n#{ label }:\n"
       a = Time::now.to_f
       yield
       b = Time::now.to_f
       puts " elapsed : #{ b - a }"
     end
     Process::wait
   end

   s = "42"
   bs = s * 8192
   rep = %r/^\s*$/o

   time('small string strip-empty') do
     8192.times{ s.strip.empty? }
   end
   time('small string re') do
     8192.times{ s =~ %r/^\s*$/ }
   end
   time('small string re-precompiled') do
     8192.times{ s =~ rep }
   end
   time('big string strip-empty') do
     8192.times{ bs.strip.empty? }
   end
   time('big string re') do
     8192.times{ bs =~ %r/^\s*$/ }
   end
   time('big string re-precompiled') do
     8192.times{ bs =~ rep }
   end

at least it surprised me!

regards.

-a
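
A note on the pattern used in b.rb: in Ruby, ^ and $ are line anchors, so /^\s*$/ can match a blank line in the middle of a string and is not the same test as the whole-string patterns suggested earlier in the thread (/\A\s*\z/ and !~ /\S/). The sketch below shows that difference and times all variants on the same big input. The timed helper, the constant names and the iteration count are assumptions made for this sketch, and no particular timings are claimed.

```ruby
# Hypothetical helper: wall-clock time of a block using a monotonic clock.
def timed(label)
  t0 = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - t0
  puts format("%-16s %.6f s", label, elapsed)
  elapsed
end

LINE_ANCHORED   = /^\s*$/     # pattern used in b.rb
STRING_ANCHORED = /\A\s*\z/   # pattern suggested earlier in the thread
NON_WHITESPACE  = /\S/        # used with !~

# ^ and $ match at line boundaries, so a blank line inside the string matches.
mixed = "abc\n\ndef"
p mixed =~ LINE_ANCHORED      # matches at the blank line
p mixed =~ STRING_ANCHORED    # no match
p mixed !~ NON_WHITESPACE     # false: the string has non-whitespace

big = "42" * 8192             # same big input as b.rb
n   = 200                     # arbitrary iteration count for this sketch

timed("strip.empty?")    { n.times { big.strip.empty? } }
timed("=~ /^\\s*$/")     { n.times { big =~ LINE_ANCHORED } }
timed("=~ /\\A\\s*\\z/") { n.times { big =~ STRING_ANCHORED } }
timed("!~ /\\S/")        { n.times { big !~ NON_WHITESPACE } }
```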

???

Since when _isn't_ NUL whitespace? Despite the fact that it is
sometimes used as a delimiter (which is true for all the other
whitespace characters as well), it has no meaning, no glyph, does not
show up when printed--it doesn't even move the cursor/printhead. How
much more "whitespace" can you get?

-- MarkusQ

···

On Thu, 2004-09-23 at 08:34, Ara.T.Howard@noaa.gov wrote:

> since when is NUL whitespace!? definitely against POLS.