Regexp help needed

Brubix · 1 May 2008 08:41

I have some text documents containing a series of questions and answers.
I need to extract all the questions and answers to load them into a database.
Reading the text documents and writing to the database are no problem.
I need help with the regexps to parse the documents (please note that
some questions and answers span two or more lines).

This is an example of the documents I have to deal with:

*1) Is this the first question ?
a. Yes
b. No

*2) Is this question composed
of two lines ?
a. Yes, indeed
b. Maybe
c. I dont' know

Many thanks in advance,
Bruno

James_Edward_Gray_II · 1 May 2008 14:19


I would probably do it without leaning on regular expressions in this case:

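#!/usr/bin/env ruby

# Passing "" as the separator reads DATA in paragraph mode:
# each block of lines separated by a blank line is one question with its answers.
DATA.each("") do |qna|
  answers  = qna.lines          # String#lines replaces the Ruby 1.8-only String#to_a
  question = answers.shift
  # A question may wrap onto several lines; keep appending lines until it ends with "?".
  question << answers.shift until question.strip.end_with?("?") || answers.empty?

  puts "Question: #{question}"
  puts "Answers:"
  puts answers
end

__END__
*1) Is this the first question ?
a. Yes
b. No

*2) Is this question composed
of two lines ?
a. Yes, indeed
b. Maybe
c. I dont' know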
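The `DATA.each("")` call relies on Ruby's paragraph mode: an empty-string separator makes each record run up to the next blank line, so every question arrives together with its answers as one chunk. A minimal sketch of the same idea on an in-memory string (assuming a shortened sample text), using `String#each_line`, which accepts the same separator argument:

```ruby
text = "*1) First ?\na. Yes\nb. No\n\n*2) Second\nline ?\na. Maybe\n"

# An empty-string separator switches each_line into paragraph mode:
# each chunk ends at a run of blank lines.
chunks = text.each_line("").to_a

chunks.each_with_index do |chunk, i|
  puts "Chunk #{i + 1}:"
  puts chunk
end
```

With the sample layout, each chunk holds exactly one question plus its answers, which is why the loop body only has to split that chunk into lines.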

If you really want a regular expression, though, this seems to work:

#!/usr/bin/env ruby

# Group 1: "*N) " and everything up to the first "?", across line breaks.
# Group 2: the run of "x. answer" lines that follows it.
DATA.read.scan(/^(\*\d+\) [\s\S]+?\?) *\n((?:^[a-z]\. .+?\n)+)/) do |q, a|
  puts "Question: #{q}"
  puts "Answers:"
  puts a.lines
end

__END__
*1) Is this the first question ?
a. Yes
b. No

*2) Is this question composed
of two lines ?
a. Yes, indeed
b. Maybe
c. I dont' know
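The `[\s\S]+?` in the question part is what lets a question wrap onto a second line: in a Ruby regexp without the `m` flag, `.` does not match a newline, whereas `[\s\S]` (any whitespace or non-whitespace character) matches anything. A quick check of that difference on the two-line sample question:

```ruby
text = "*2) Is this question composed\nof two lines ?\n"

# Without the m flag, . stops at the newline, so the question cannot wrap.
dot_only  = text[/\*\d+\) .+?\?/]
# [\s\S] matches any character, newline included.
any_char  = text[/\*\d+\) [\s\S]+?\?/]
# The m flag makes . match newlines too, giving the same result.
multiline = text[/\*\d+\) .+?\?/m]

p dot_only
p any_char
p multiline
```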

Hope that helps.

James Edward Gray II


yermej · 1 May 2008 14:35

qa = "*1) Is this the first question ?
a. Yes
b. No

*2) Is this question composed
of two lines ?
a. Yes, indeed
b. Maybe
c. I dont' know"

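# Group 1: "*N)" up to the first "?"; group 2: everything before the next "*".
qa.scan(/(\*\d+\).*?\?.*?)([^*]*)/m).map do |q, ans|
  # Each answer starts on a new line with a lowercase letter and a period.
  [q, ans.scan(/\n([a-z]\..*)/).flatten]
end

=> [["*1) Is this the first question ?", ["a. Yes", "b. No"]],
    ["*2) Is this question composed\nof two lines ?", ["a. Yes, indeed", "b. Maybe", "c. I dont' know"]]]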

That makes some assumptions, e.g. that '*' appears only at the start of a
question and that each answer begins on a new line with a lowercase letter
and a period, but it should be a start.
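Since the goal is to load a database, it may help to turn each match into a flat record first. The following is a minimal sketch building on the pattern above; the `Question`/`Answer` structs, their field names and `parse_questions` are all hypothetical, it assumes the same layout rules ('*' only at the start of a question, answers as "x. text" lines), it uses a squiggly heredoc (Ruby 2.3+), and the actual database insert is left out:

```ruby
# Hypothetical record layout for loading into a database.
Question = Struct.new(:number, :question, :answers)
Answer   = Struct.new(:letter, :text)

def parse_questions(text)
  # Capture the number, the question text up to "?", and the rest up to the next "*".
  text.scan(/\*(\d+)\)\s*(.*?\?)([^*]*)/m).map do |number, question, rest|
    answers = rest.scan(/^([a-z])\. *(.*)$/).map do |letter, answer|
      Answer.new(letter, answer.strip)
    end
    # Collapse a wrapped question onto one line before storing it.
    Question.new(number.to_i, question.gsub(/\s+/, " ").strip, answers)
  end
end

qa = <<~TEXT
  *1) Is this the first question ?
  a. Yes
  b. No

  *2) Is this question composed
  of two lines ?
  a. Yes, indeed
  b. Maybe
  c. I dont' know
TEXT

records = parse_questions(qa)
records.each do |r|
  puts "#{r.number}: #{r.question}"
  r.answers.each { |ans| puts "  #{ans.letter}) #{ans.text}" }
end
```

Joining wrapped lines into one string is a design choice for storage; drop the `gsub` if the original line breaks should be kept.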


Brubix · 2 May 2008 07:21

All the proposed solutions work perfectly!

Thanks to both of you.
