Fastest way to parse millions of files

How can I read files faster? I have millions of files, and my goal is to read
all of them, extract a particular pattern, and then create a hash from
the extracted values. What I did was open each file and read it line by
line. It is taking too much time.

--
Posted via http://www.ruby-forum.com/.

Is this a one-off request or a general-purpose search over your million
files?

If it is a general-purpose request, then I would say feed them all into a
search engine (Ferret, Lucene, Sphinx, etc.) and then search from that.

Other people will be able to tell you the best gem / package to use.

Also, it might just be that some other tool (grep, for example) is
better suited to finding the data. What sort of pattern are you looking
for?

On 22 October 2013 10:18, Asmita Chauhan <lists@ruby-forum.com> wrote:

> How can I read files faster? I have millions of files, and my goal is to read
> all of them, extract a particular pattern, and then create a hash from
> the extracted values. What I did was open each file and read it line by
> line. It is taking too much time.

I am searching for all variable names which are set by reading files.

E.g.
ifdef(`SUM')

I don't understand exactly what you mean by a one-off request and
general-purpose searching. But I can't feed them into a search engine due
to privacy issues.


Thanks, all.
I got it. It is a one-off operation, and I did it with grep.
Thanks again. :)


It is possible to set up your own search engine on a desktop PC. It will
index all your data and store it locally on your PC. If the machine with a
million files on it is considered safe, then building your own personal
search engine on the same machine should not be a privacy issue.

Having said that, grep could still be the tool for you:

$ grep 'ifdef.*SUM' files/*

This will print every line in those files that has the string 'ifdef'
followed by 'SUM' on it. Grep is quite fast. If the shell complains that the
argument list is too long because of the number of files, a recursive search
(grep -r 'ifdef.*SUM' files/) avoids the glob.
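
For comparison, the same line filter can be sketched in plain Ruby. This is
only an illustration under assumptions: the helper name each_matching_line and
the directory name "files" are made up for the example, and files are read in
binary mode so that stray non-UTF-8 bytes do not raise. It will most likely be
slower than grep itself.

```ruby
# Yields [path, line_number, line] for every line under +root+ that matches
# +pattern+. Hypothetical helper, not part of any library.
def each_matching_line(root, pattern)
  return enum_for(:each_matching_line, root, pattern) unless block_given?

  Dir.glob(File.join(root, "**", "*")) do |path|
    next unless File.file?(path)

    # File.foreach streams one line at a time instead of loading the whole file.
    File.foreach(path, mode: "rb").with_index(1) do |line, number|
      yield path, number, line.chomp if pattern.match?(line)
    end
  end
end

# Prints matches in a grep-like "path:line_number:text" form.
each_matching_line("files", /ifdef.*SUM/) do |path, number, line|
  puts "#{path}:#{number}:#{line}"
end
```

Streaming with File.foreach keeps memory use flat no matter how large the
individual files are.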

What Peter meant is that if this is something you want to use
repeatedly (today searching for this pattern, tomorrow for
a different pattern, and so on), then you might be better off using a
search engine or a database. Also, you can use a search engine locally
without any privacy issues.

If it's just a one-off operation, you probably won't need that complexity.

Jesus.

On Tue, Oct 22, 2013 at 12:41 PM, Asmita Chauhan <lists@ruby-forum.com> wrote:

> I am searching for all variable names which are set by reading files.
>
> E.g.
> ifdef(`SUM')
>
> I don't understand exactly what you mean by a one-off request and
> general-purpose searching. But I can't feed them into a search engine due
> to privacy issues.

> I am searching for all variable names which are set by reading files.

If you just need the names then "egrep -o" is probably a more
efficient choice. You won't get a Ruby Hash from that though.
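
If the Hash is what you are after, a rough Ruby sketch along these lines could
collect the names. The method name collect_ifdef_names and the constant
IFDEF_NAME are hypothetical, and the regex assumes every occurrence is quoted
exactly like the ifdef(`SUM') example in this thread.

```ruby
# Captures NAME from ifdef(`NAME'). The backtick/apostrophe quoting is an
# assumption taken from the single example given in the thread.
IFDEF_NAME = /ifdef\(`([^']+)'/

# Returns a Hash mapping each name to the list of files that mention it.
def collect_ifdef_names(root)
  names = Hash.new { |hash, key| hash[key] = [] }

  Dir.glob(File.join(root, "**", "*")) do |path|
    next unless File.file?(path)

    # Binary mode avoids encoding errors on files that are not valid UTF-8.
    File.foreach(path, mode: "rb") do |line|
      line.scan(IFDEF_NAME) do |(name)|
        files = names[name]
        # Files are processed one at a time, so checking the last entry is
        # enough to avoid listing the same file twice.
        files << path unless files.last == path
      end
    end
  end

  names
end
```

Calling collect_ifdef_names("files") would then give something shaped like
{"SUM" => ["files/a.m4", ...]}, where the paths are placeholders.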

> E.g.
> ifdef(`SUM')
>
> I don't understand exactly what you mean by a one-off request and
> general-purpose searching.

The opposite of "one off" is a repeated task.

> But I can't feed them into a search engine due to privacy
> issues.

But you can use a text indexing engine like Lucene or others,
although the effort pays off only if you need to do this repeatedly.
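
To illustrate why indexing only pays off for repeated searches, here is a toy
inverted index in Ruby. It is not Lucene or any real engine, just a Hash from
each word to the files containing it, built once, saved with Marshal, and
reloaded for later lookups. All method names and the file layout are
assumptions made up for this sketch.

```ruby
# Builds a word => [paths] Hash for every regular file under +root+.
# This is the expensive step: it reads every file once.
def build_index(root)
  index = {}
  Dir.glob(File.join(root, "**", "*")) do |path|
    next unless File.file?(path)

    # uniq so each file is listed at most once per word.
    File.binread(path).scan(/\w+/).uniq.each do |word|
      (index[word] ||= []) << path
    end
  end
  index
end

# Persists the index so later runs can skip the expensive scan.
def save_index(index, file)
  File.binwrite(file, Marshal.dump(index))
end

def load_index(file)
  Marshal.load(File.binread(file))
end

# A lookup is a single Hash access, no matter how many files were indexed.
def lookup(index, word)
  index.fetch(word, [])
end
```

The scan over all files happens once in build_index; every later question
("which files mention SUM?", "which mention MAX?") is answered from the saved
Hash. The saved index should live outside the indexed directory so it is not
picked up on a rebuild. For a single search, the build cost makes this slower
than simply running grep.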

Cheers

robert


--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/