Regular expressions and long text

Guillermo_Acilu · 20 June 2008 17:56

Hello guys,

I started with Ruby a month ago and I am doing some work with strings
and regular expressions. I am trying to take a long text and store the
individual sentences in an array. I can split a sentence into words and
store them in an array, but I cannot manage to do it with sentences.

I have used the following assignment to work with the words:

str = "Ruby is great"
words = []
words = str.scan(/\w+/)

The result is words[0]="Ruby", words[1]="is" and words[2]="great".

I would like to do the following:

str = "Ruby is great. We all know that."

and get words[0]="Ruby is great" and words[1]="We all know that".

Any ideas on how to do it with a regular expression instead of looping
through the string looking for the "."?

Thanks,

Guillermo

Sandro_Paganotti · 20 June 2008 19:01

I'm not sure whether you want to split the string on the full stop:
str.split(".")
or divide the string into words and split them into two groups:

str = "Ruby is great. We all know that."
([(v=str.split(" "))[0...k=((l=(v.size))/2)]]+[v[k..l]]).map{|e|e.join(" ")}
=> ["Ruby is great.", "We all know that."]

--
Go outside! The graphics are amazing!

Mike_Austin4 · 24 June 2008 10:18

Guillermo.Acilu@koiaka.com wrote:

> I would like to do the following:
> str = "Ruby is great. We all know that."
> and get words[0]="Ruby is great" and words[1]="We all know that"
>
> Any ideas on how to do it with a regular expression instead of looping through the string looking for the "."?

Hi,
maybe you should try this: words = str.split(/\.\s*/)

it works for me:

irb(main):008:0> str = "Ruby is great. We all know that."
=> "Ruby is great. We all know that."
irb(main):009:0> words = str.split(/\.\s*/)
=> ["Ruby is great", "We all know that"]
irb(main):010:0> words[0]
=> "Ruby is great"
irb(main):011:0> words[1]
=> "We all know that"

greetings

Raveendran_P · 25 June 2008 05:25

Hi,

I think you expect this output, so please try it:

str="Ruby is great. We all know that."
a= str.split('.').join(' ')
words=[]
words=a.scan(/\w+/)

=> words=["Ruby","is","great","We","all","know","that"]

Regards,
P.Raveendran

--
Posted via http://www.ruby-forum.com/.

Jun_Young_Kim · 11 December 2008 08:06

Hi,

I have a program that replaces a text's contents.

def replace(aPatten, aReplace)
  # I need some logic here to translate the string into a pattern

  contents = File.read("data")
  contents.gsub!(aPatten, aReplace)
  File.open("result", "w") do |file|
    file << contents
  end
end

Another class passes aPatten as "/[aeiou]/" and aReplace as "*". Both of them are Strings.

I know I get the expected result when I pass /[aeiou]/ instead of "/[aeiou]/".

Any ideas on how to convert a string to a pattern?

Bryan_JJ_Buckley1 · 21 June 2008 09:23

You can split on a regex for a full-stop followed by (optional) whitespace.

>> str.split(/\.\s?/)
=> ["Ruby is great", "We all know that"]

--
JJ

Zhukov_Pavel · 24 June 2008 10:51

Even simpler:

irb(main):001:0> "Ruby is great. We all know that.".split(".")
=> ["Ruby is great", " We all know that"]
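
Note the leading space in the second element above: splitting on the bare "." string leaves the whitespace that followed the full stop. As a small sketch (the variable names are only illustrative), the string and regex forms of split can be compared side by side:

```ruby
str = "Ruby is great. We all know that."

# Splitting on the literal "." keeps the space that followed the full stop.
by_string = str.split(".")

# Splitting on a regex lets the separator swallow the trailing whitespace.
by_regex = str.split(/\.\s*/)

# Alternatively, keep the string separator and strip each piece afterwards.
stripped = str.split(".").map { |sentence| sentence.strip }

p by_string
p by_regex
p stripped
```

Either of the last two forms gives the array the original question asked for.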

Caike · 4 July 2008 00:11

If you want to stick to a regex-based solution:

str = "one one one. two. three."

=> "one one one. two. three."

str.scan(/\w[\s|\w]*./)

=> ["one one one.", "two.", "three."]

And you can keep adding more sentences in the same pattern:

str = "one one one. two. three. four. five."

=> "one one one. two. three. four. five."

str.scan(/\w[\s|\w]*./)

=> ["one one one.", "two.", "three.", "four.", "five."]

It may not be the best solution to this problem, but it is always good
to keep your regexp skills up to date.
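
Two details of the pattern above are worth knowing: inside a character class the | is a literal pipe rather than an alternation, and the unescaped . at the end matches any character, not only a full stop. A slightly tighter variant, offered only as a sketch and assuming every sentence ends in a full stop, could look like this:

```ruby
str = "one one one. two. three."

# \w     start at the first word character (skips the space after a full stop)
# [^.]*  then take everything that is not a full stop
# \.     and finish on a literal full stop
sentences = str.scan(/\w[^.]*\./)

p sentences
```

For this input it returns the same three sentences as the original pattern.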

Robert_K1 · 11 December 2008 08:17

How about looking at the documentation?

http://www.ruby-doc.org/core/classes/Regexp.html

Btw, I tend to make it a requirement that the argument has the
appropriate type. Since #gsub works with both a String and a
Regexp as the pattern, I would not change your method's implementation but
the code invoking it.

Taking this one step further, I would choose a different abstraction:

def transform from_file, to_file
  repl = yield(File.read(from_file)) and
  File.open(to_file, "w") do |io|
    io.write(repl)
  end
end

Then you can do

transform "data", "result" do |content|
  content.gsub!(/[aeiou]/, "*")
  content
end
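
To try the block-based approach without touching real files, here is a sketch that runs it against a temporary directory. It is a simplified variant, not Robert's exact code: it uses the non-destructive gsub so the block can return the result directly (gsub! returns nil when nothing was replaced, which is why the block above returns content explicitly). The file names and the sample text are made up for the demonstration.

```ruby
require "tmpdir"

# Read from_file, hand the text to the block, and write whatever the block
# returns to to_file. Nothing is written if the block returns nil or false.
def transform(from_file, to_file)
  result = yield(File.read(from_file))
  File.open(to_file, "w") { |io| io.write(result) } if result
end

Dir.mktmpdir do |dir|
  source = File.join(dir, "data")    # hypothetical input file
  target = File.join(dir, "result")  # hypothetical output file
  File.open(source, "w") { |io| io.write("Ruby is great") }

  transform(source, target) { |content| content.gsub(/[aeiou]/, "*") }

  puts File.read(target)
end
```

The caller decides what the transformation is, so the same method works with a String pattern, a Regexp, or anything else the block does.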

Cheers

robert

--
remember.guy do |as, often| as.you_can - without end

Brian_Candler · 11 December 2008 08:38

> Any ideas on how to convert a string to a pattern?

irb(main):001:0> Regexp.new("[aeiou]")
=> /[aeiou]/

Hassan_Schroeder · 4 July 2008 02:23

Very late to this thread, but...

On Sat, Jun 21, 2008 at 2:23 AM, Bryan JJ Buckley <jjbuckley@gmail.com> wrote:

You can split on a regex for a full-stop followed by (optional) whitespace.

>> str.split(/\.\s?/)
=> ["Ruby is great", "We all know that"]

str="Dr. Feelgood will meet you at the corner of Foo St. and Bar Dr.
tonight at 8:00; bring $2.98 -- exact change -- to resolve the 5.5%
interest you owe."
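
The string above is a stress test for the split-on-full-stop answers: abbreviations, prices and percentages all contain a "." that does not end a sentence. The sketch below shows how the simple split fares, next to one possible heuristic. The heuristic is an assumption added for illustration, not something proposed in the thread, and it needs lookbehind support (Ruby 1.9 or later).

```ruby
str = "Dr. Feelgood will meet you at the corner of Foo St. and Bar Dr.\n" \
      "tonight at 8:00; bring $2.98 -- exact change -- to resolve the 5.5%\n" \
      "interest you owe."

# The simple split treats every full stop as the end of a sentence.
naive = str.split(/\.\s?/)

# Heuristic (an assumption, not a real sentence detector): only split where
# ., ! or ? is followed by whitespace and then a capital letter.
heuristic = str.split(/(?<=[.!?])\s+(?=[A-Z])/)

puts naive.size
p heuristic.first
```

The simple split cuts this single sentence into six pieces. The heuristic does better but still splits after the leading "Dr.", so real prose probably needs an abbreviation list or a dedicated sentence tokenizer rather than a single regex.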

--
Hassan Schroeder ------------------------ hassan.schroeder@gmail.com

Jun_Young_Kim · 11 December 2008 09:06

Thanks for your reply, Brian.

How about Regexp.new("/[aeiou]/") ?
=> /\/[aeiou]\//

Jun_Young_Kim · 12 December 2008 03:00

I mean I have a regular expression as a string.

puts aPattern
=> "/[aeiou]/"

When I convert it to a Regexp instance, the result is
=> /\/[aeiou]\//

At this point, the given pattern is not a regular expression anymore, it's just a string.

_Pena_Botp1 · 12 December 2008 03:13

# I mean I have a regular expression as a string.
# puts aPattern
# => "/[aeiou]/"
# When I convert it as a Regexp instance, the result is
# => /\/[aeiou]\//
# At this point, the given regular pattern is not regular expression
# anymore, it's just a string.

It is still a regex, just not the regex that you expected.

you can either remove the surrounding slashes

s="/[aeiou]/"
Regexp.new s[1..-2]
#=> /[aeiou]/

or you can just eval it straight away

eval(s)
#=> /[aeiou]/
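
A word of caution on the second option: eval runs whatever Ruby code the string contains, so it is only reasonable when the string comes from a trusted source. As a sketch of a middle ground, the helper below (string_to_regexp is a hypothetical name, not part of Ruby) strips the slashes and also honours the i, m and x flags, without calling eval:

```ruby
# Hypothetical helper: build a Regexp from a string written like a literal,
# e.g. "/[aeiou]/" or "/ruby/i", without calling eval.
def string_to_regexp(source)
  literal = source.match(%r{\A/(.*)/([imx]*)\z}m)

  # No surrounding slashes: treat the whole string as the pattern source.
  return Regexp.new(source) unless literal

  options = 0
  options |= Regexp::IGNORECASE if literal[2].include?("i")
  options |= Regexp::MULTILINE  if literal[2].include?("m")
  options |= Regexp::EXTENDED   if literal[2].include?("x")
  Regexp.new(literal[1], options)
end

pattern = string_to_regexp("/[aeiou]/")
p pattern
puts "Ruby is great".gsub(pattern, "*")
```

The resulting Regexp can be passed straight to gsub! in the replace method from the question.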

Jun_Young_Kim · 12 December 2008 04:56

Hi all,

As you know, there is a Ruby parsing library called "Treetop".

Part of the logic in my program tries to parse a regular expression as a single token.

Let me give an example:

translate /[aeiou]/ "*"

This means: translate every character matching /[aeiou]/ to *.

Any ideas on how to write a rule to parse it?
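
This is not a Treetop grammar, but as a rough plain-Ruby sketch of the token shapes such a rule would have to accept: the regex token is a "/" followed by any run of escaped characters or characters other than "/" and "\", closed by another "/", and the replacement is a double-quoted string built the same way. COMMAND and parse_translate are hypothetical names chosen for this example.

```ruby
# Not Treetop: a plain Regexp that recognises  translate /regex/ "string".
# Group 1 is the regex body, group 2 is the replacement text.
COMMAND = %r{\Atranslate\s+/((?:\\.|[^/\\])*)/\s+"((?:\\.|[^"\\])*)"\z}

# Returns [Regexp, String] for a well-formed command, or nil otherwise.
def parse_translate(line)
  match = COMMAND.match(line.strip)
  return nil unless match

  [Regexp.new(match[1]), match[2]]
end

pattern, replacement = parse_translate('translate /[aeiou]/ "*"')
puts "regular expression".gsub(pattern, replacement)
```

In a PEG grammar the same idea would presumably be written as a sequence of a slash, a repeated choice between an escaped character and a non-slash character, and a closing slash.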
