Ciao Stefano!
I guessed from your name that you’re Italian. My mother is from Sicily, so Italian is my second "mother tongue".
Hi Panagiotis,
There are several options for your problem; depending on the details, some may be better suited to your application than others. I don't know what the acronym 'PoC' stands for, so apologies if it implies something special that I've overlooked.
As Peter said, it stands for ‘Proof of Concept’.
Many tools exist for DNA sequence alignment. In Ruby, however, there are only a few general options, such as using the BioRuby framework to parse the output of alignment programs. I'm not sure this is the best option for you, but it may be worth looking into.
I’m creating a proof-of-concept Sinatra application to perform basic sequence alignment for my thesis (I’m about to get a degree in Pharmacy). I’m using the BioRuby library for translation (DNA to protein). The tool will only deal with bacteria, to avoid complications such as splicing and huge sequences (e.g. the human genome).
You mentioned that you want to "... search substrings in large strings..."; this sounds like you want to perform a fitting alignment [1]. If so, I'm not aware of many options, which is why I made a gem for this purpose [2]. It should be reasonably fast, since it uses the C++ SeqAn library [3] for the actual alignments. Please note that I made this gem for my own purposes and I don't believe anyone else has actually used it, so use it at your own risk.
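A minimal plain-Ruby sketch of fitting alignment may help make the definition concrete: every character of the query must be aligned, while unaligned characters at either end of the text cost nothing. This is a textbook dynamic-programming illustration, not the bioseqalign API (which delegates to SeqAn); the method name `fitting_alignment` and the scores (match +1, mismatch -1, gap -1) are assumptions chosen for the example. It takes O(m·n) time and memory, so it only suits short queries against modest texts.

```ruby
# Plain-Ruby sketch of fitting alignment (illustration only; not the
# bioseqalign API). Scores are assumed values for the example.
MATCH = 1
MISMATCH = -1
GAP = -1

# Aligns ALL of `query` against the best-scoring substring of `text`.
# Returns the score, the 0-based inclusive span in `text`, and the aligned rows.
def fitting_alignment(query, text)
  m = query.length
  n = text.length
  # score[i][j]: best score aligning query[0...i] to a substring of text ending at column j
  score = Array.new(m + 1) { Array.new(n + 1, 0) }
  back = Array.new(m + 1) { Array.new(n + 1) }

  # Row 0 stays 0: the alignment may start anywhere in `text` for free.
  # Column 0: query characters before the text starts can only be gapped.
  (1..m).each do |i|
    score[i][0] = i * GAP
    back[i][0] = :up
  end

  (1..m).each do |i|
    (1..n).each do |j|
      diag = score[i - 1][j - 1] + (query[i - 1] == text[j - 1] ? MATCH : MISMATCH)
      up = score[i - 1][j] + GAP   # query character aligned to a gap
      left = score[i][j - 1] + GAP # text character aligned to a gap
      best = [diag, up, left].max
      score[i][j] = best
      back[i][j] = best == diag ? :diag : (best == up ? :up : :left)
    end
  end

  # The alignment may also end anywhere in `text`: pick the best cell of the last row.
  end_j = (0..n).max_by { |j| score[m][j] }

  # Trace back until the whole query has been consumed.
  i = m
  j = end_j
  q_row = +""
  t_row = +""
  while i > 0
    case back[i][j]
    when :diag
      q_row.prepend(query[i - 1])
      t_row.prepend(text[j - 1])
      i -= 1
      j -= 1
    when :up
      q_row.prepend(query[i - 1])
      t_row.prepend("-")
      i -= 1
    else # :left
      q_row.prepend("-")
      t_row.prepend(text[j - 1])
      j -= 1
    end
  end

  { score: score[m][end_j], start: j, stop: end_j - 1, query: q_row, text: t_row }
end
```

For example, `fitting_alignment("GATTA", "CCGATCTACC")` should report a score of 4 with the rows `GAT-TA` / `GATCTA` at text positions 2..7. Since characters are compared directly, protein strings work too, although real protein alignment would normally use a substitution matrix such as BLOSUM62 instead of flat match/mismatch scores.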
Thanks for the pointers! I didn’t know the term ‘fitting alignment’, although I’ve read several papers on the history of sequencing (from the ’70s to today). That’s exactly what I need.
Thanks for the gem, I didn’t know about it! Will it work with protein sequences too? Say I translate a DNA sequence to proteins (finding ORFs on every strand with length > X): can I then compare the findings with my sample?
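For the ORF step, here is a rough sketch of a six-frame search in plain Ruby. It assumes an ORF runs from ATG to the first in-frame TAA, TAG or TGA, uses the standard genetic code, and ignores alternative bacterial start codons such as GTG; `find_orfs`, `translate` and `min_length` are hypothetical names, and BioRuby's own translation could stand in for `translate`.

```ruby
# Six-frame ORF finder sketch. Assumptions: an ORF starts at ATG and ends at
# the first in-frame stop codon; standard genetic code; input uses A/C/G/T.
STOP_CODONS = %w[TAA TAG TGA].freeze

# Standard genetic code, indexed by codon with bases ordered T, C, A, G.
CODON_BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

def reverse_complement(dna)
  dna.reverse.tr("ACGT", "TGCA")
end

# Translates codon by codon; "*" marks a stop, "X" an unreadable codon.
def translate(dna)
  dna.upcase.scan(/.../).map do |codon|
    next "X" unless codon.match?(/\A[TCAG]{3}\z/)
    index = codon.chars.reduce(0) { |acc, base| acc * 4 + CODON_BASES.index(base) }
    AMINO_ACIDS[index]
  end.join
end

# Returns ORFs of at least `min_length` nucleotides (stop codon included).
# :start/:stop are 0-based, inclusive, in the coordinates of the scanned strand
# (for "-" that is the reverse complement, not the input).
def find_orfs(dna, min_length)
  dna = dna.upcase
  orfs = []
  { "+" => dna, "-" => reverse_complement(dna) }.each do |strand, seq|
    (0..2).each do |frame|
      i = frame
      while i + 3 <= seq.length
        if seq[i, 3] == "ATG"
          j = i
          # Walk codon by codon until an in-frame stop codon (or the end).
          j += 3 while j + 3 <= seq.length && !STOP_CODONS.include?(seq[j, 3])
          break if j + 3 > seq.length # no stop codon: incomplete ORF, frame done
          orf = seq[i..(j + 2)]
          if orf.length >= min_length
            orfs << { strand: strand, frame: frame, start: i, stop: j + 2, seq: orf }
          end
          i = j + 3 # resume after the stop codon
        else
          i += 3
        end
      end
    end
  end
  orfs
end
```

For instance, `find_orfs("CCATGAAATTTTAGCC", 9)` should return one ORF on the "+" strand spanning positions 2..13, which `translate` turns into `MKF*`. Because scanning resumes after each stop codon, ATGs nested inside an ORF in the same frame are not reported as separate ORFs; that keeps only the longest candidate per stop codon.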
If you are in fact doing something more specific, such as mapping reads to a reference genome, I'd suggest using a proper read mapper like Bowtie [4]. You can then parse its output in Ruby. There may be some gems to help with this, but it's simple enough to write one yourself (that's what I did). I just found another gem [5] but have never tried it; it may also work for your purpose.
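If you go the read-mapper route, parsing the output is mostly a matter of splitting tab-separated columns. The sketch below assumes the mapper was asked to write SAM (Bowtie can produce it); the field names follow the 11 mandatory SAM columns, while `SamRecord`, `parse_sam_line`, `parse_sam` and `unmapped?` are hypothetical helpers, not an existing gem's API.

```ruby
# Minimal SAM parser sketch. Assumes the read mapper wrote plain-text SAM;
# only the 11 mandatory columns are named, optional tags are kept as strings.
SAM_FIELDS = %i[qname flag rname pos mapq cigar rnext pnext tlen seq qual].freeze
SamRecord = Struct.new(*SAM_FIELDS, :tags)

def parse_sam_line(line)
  cols = line.chomp.split("\t")
  rec = SamRecord.new(*cols.first(11), cols.drop(11))
  rec.flag = rec.flag.to_i
  rec.pos = rec.pos.to_i # 1-based leftmost position (0 when unmapped)
  rec.mapq = rec.mapq.to_i
  rec.pnext = rec.pnext.to_i
  rec.tlen = rec.tlen.to_i
  rec
end

# FLAG bit 0x4 means "segment unmapped".
def unmapped?(rec)
  (rec.flag & 0x4) != 0
end

# Works with anything that responds to #each_line (a File, a String, ...).
# Header lines start with "@"; they and blank lines are skipped.
def parse_sam(io)
  io.each_line
    .reject { |line| line.start_with?("@") || line.strip.empty? }
    .map { |line| parse_sam_line(line) }
end
```

Something like `File.open("reads.sam") { |f| parse_sam(f).reject { |r| unmapped?(r) } }` (the file name is hypothetical) would keep only the mapped reads. `parse_sam` builds an array of every record, so for very large files iterating with `each_line` and a block would use less memory.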
Hope this helps.
I don’t think I’ll need something as specific as Bowtie. Thanks for your help and the gem!
Cheers.
References
-------------------
[1] ROSALIND | Glossary | Fitting alignment
[2] bioseqalign | RubyGems.org
[3] SeqAn | http://www.seqan.de/
[4] Bowtie: An ultrafast, memory-efficient short read aligner
[5] bio-bwa | RubyGems.org
Hello,
I'm building a PoC DNA sequence alignment tool. I need to be able to search for substrings in large strings quickly. What's the best way to approach this? I'm thinking of simply using regular expressions. Is there any external library that deals with this? Besides knowing that the sample string contains the substring, I need to know its exact position (where it starts and where it ends).
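For exact matching, Ruby's `String#index` with a start offset already reports positions, so a regular expression isn't strictly necessary. A minimal sketch, assuming 0-based inclusive coordinates and that overlapping hits should be reported (the helper name `find_all_positions` is hypothetical):

```ruby
# Find every (possibly overlapping) exact occurrence of `pattern` in `text`.
# Returns [start, end] pairs with 0-based, inclusive indices.
def find_all_positions(text, pattern)
  raise ArgumentError, "pattern must not be empty" if pattern.empty?

  positions = []
  offset = 0
  # String#index(substring, offset) returns the index of the first match
  # at or after `offset`, or nil when there are no more matches.
  while (start = text.index(pattern, offset))
    positions << [start, start + pattern.length - 1]
    offset = start + 1 # advance one character so overlapping hits are found
  end
  positions
end

p find_all_positions("ATGCGATACGATGCGA", "GCGA")
```

`find_all_positions("AAAA", "AA")` gives `[[0, 1], [1, 2], [2, 3]]` because the search advances one character at a time. With a regular expression, `MatchData#begin(0)` and `MatchData#end(0)` give the start and (exclusive) end offsets of each match. Either approach rescans the whole text for every query; for many queries against a large genome, index-based tools such as read mappers are the usual choice.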
Thanks,
Panagiotis (atmosx) Atmatzidis
email: atma@convalesco.org
URL: http://www.convalesco.org
GnuPG ID: 0x1A7BFEC5
gpg --keyserver pgp.mit.edu --recv-keys 1A7BFEC5
"As you set out for Ithaca, hope the voyage is a long one, full of adventure, full of discovery [...]" - C. P. Cavafy
On 19 Oct 2014, at 08:03, Stefano Bonissone <stefano.rb@gmail.com> wrote:
On Thu, Oct 9, 2014 at 7:42 AM, Panagiotis Atmatzidis <atma@convalesco.org> wrote: