Levenshtein_distance and recreate the string

Ams_Lo · 30 April 2008 01:13

Hi -

A general computer science question-

Given the levenshtein distance between two strings and one of the
strings S1, is it possible to re-create the second string.

For example -

S1 = "RUBY"

S2 = "BRUY"

lev_distance = 3

Given 3 and S2, is it possible to recreate S1??

Many thanks,

···

--
Posted via http://www.ruby-forum.com/.

Phil · 30 April 2008 01:31

Ams Lo wrote:

Hi -

A general computer science question-

Given the levenshtein distance between two strings and one of the
strings S1, is it possible to re-create the second string.

For example -

S1 = "RUBY"

S2 = "BRUY"

lev_distance = 3

Given 3 and S2, is it possible to recreate S1??

I see no reason why it shouldn't.

Looking at the Wikiality for the algorithm[0], the algorithm works,
essentially, on a matrix for the strings. Juxtaposing the axes should
solve that, in a naive, uneducated way, anyway.

After all, the Levenshtein distance is the same for RUBY -> BRUBY and
BRUBY -> RUBY.

[0] Levenshtein distance - Wikipedia

- --
Phillip Gawlowski
Twitter: twitter.com/cynicalryan
Blog: http://justarubyist.blogspot.com

~ "That's the whole problem with science. You've got a bunch of
~ empiricists trying to describe things of unimaginable wonder."
~ --- Calvin

Rick_DeNatale1 · 30 April 2008 01:37

Hmmm, smells like a homework assignment.

Hint, do you suppose that strings other than "RUBY" might have a
levenshtein distance of 3 from "BRUY"?

···

On Tue, Apr 29, 2008 at 9:13 PM, Ams Lo <rgowka1@gmail.com> wrote:

Hi -

A general computer science question-

Given the levenshtein distance between two strings and one of the
strings S1, is it possible to re-create the second string.

For example -

S1 = "RUBY"

S2 = "BRUY"

lev_distance = 3

Given 3 and S2, is it possible to recreate S1??

--
Rick DeNatale

My blog on Ruby
http://talklikeaduck.denhaven2.com/

Mark_Woodward · 30 April 2008 02:00

would S1 be BOAT (BRUT, BRAT, BOAT), GRAN (BRAY, GRAY, GRAN), GOUT
(BOUY, BOUT, GOUT) or ....?
ie I think you need to know where you're going in order to know how to
get there.
What S1 are we recreating?

cheers,

···

On Apr 30, 11:13 am, Ams Lo <rgow...@gmail.com> wrote:

Hi -

A general computer science question-

Given the levenshtein distance between two strings and one of the
strings S1, is it possible to re-create the second string.

For example -

S1 = "RUBY"

S2 = "BRUY"

lev_distance = 3

Given 3 and S2, is it possible to recreate S1??

Many thanks,
--
Posted viahttp://www.ruby-forum.com/.

--
Mark

Eleanor_McHugh · 30 April 2008 11:01

Simple answer: no.

Each step that exists between S1 and S2 represents the choosing of one change out of a set of changes the same size as your token space (so for uppercase letters only that would be 26) and therefore to reverse that change you would generate that many valid words. Selecting which of those is the S1 word is impossible without additional constraints.

For the S1 -> S3 -> S1 case the actual search space (still assuming capital letters only) is therefore 26 ** 3, or 17576 equally plausible S1 candidates. Of course that assumes a naive search strategy, whereas it's quite possible the space could be restricted based upon known dictionary characteristics: certain tokens may be impossible at certain locations; genetic algorithms may grow 'good' solutions faster than exhaustive search; etc.

Ellie

Eleanor McHugh
Games With Brains
http://slides.games-with-brains.net

···

On 30 Apr 2008, at 02:13, Ams Lo wrote:

Hi -

A general computer science question-

Given the levenshtein distance between two strings and one of the
strings S1, is it possible to re-create the second string.

For example -

S1 = "RUBY"

S2 = "BRUY"

lev_distance = 3

Given 3 and S2, is it possible to recreate S1??

Many thanks,

----
raise ArgumentError unless @reality.responds_to? :reason

Vidar_Hokstad · 30 April 2008 12:25

Several people have given great answers as to why this isn't possible,
but let me give you another reason why, just
because I feel like it, and it's a useful thing to keep in mind.

Whenever you get a problem like this, a good way of approaching it if
you don't understand the algorithm is to think:

- If it _was_ possible, would that allow you to make impossibly
efficient compression algorithms?

In this case, the answer is that yes it would: You could pick a fixed
S2, for example an empty string, calculate the
Levensthein distance from S1 to the fixed S2. The distance would be
proportional to the length of S1. In fact, if
you choose S2 to be the empty string, then the distance would be equal
to the length of S1 (it'd take one deletion
per position in S1 to end up with an empty string)

In other words, you'd "compress" your test string "BRUY" down to the
number 4, but more importantly that "compression"
method would compress any string of any length, which is obviously
impossible (the reason this is impossible is that
if it was possible you could apply it to it's own output over and over
until you had compressed an arbitrary input
down to a bit).

In this case it's not a very hard problem to solve, but I find that
for a large number of questions about data
transformations, it's helpful to think about it in terms of
compression because the answers often become blatantly
obvious once you restate them that way.

Vidar

···

On Apr 30, 2:13 am, Ams Lo <rgow...@gmail.com> wrote:

S1 = "RUBY"

S2 = "BRUY"

lev_distance = 3

Given 3 and S2, is it possible to recreate S1??

David_A_Black1 · 30 April 2008 01:39

Hi --

···

On Wed, 30 Apr 2008, Phillip Gawlowski wrote:

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Ams Lo wrote:
> Hi -
>
> A general computer science question-
>
> Given the levenshtein distance between two strings and one of the
> strings S1, is it possible to re-create the second string.
>
> For example -
>
> S1 = "RUBY"
>
> S2 = "BRUY"
>
> lev_distance = 3
>
> Given 3 and S2, is it possible to recreate S1??

I see no reason why it shouldn't.

Looking at the Wikiality for the algorithm[0], the algorithm works,
essentially, on a matrix for the strings. Juxtaposing the axes should
solve that, in a naive, uneducated way, anyway.

After all, the Levenshtein distance is the same for RUBY -> BRUBY and
BRUBY -> RUBY.

But isn't it the same for RUBY -> RUBE and RUBA -> RUBE ? In which
case, if you had RUBE and the L. distance, you could not recreate
RUBY; you'd have a whole set of strings that were exactly that
distance from RUBE.

David

--
Rails training from David A. Black and Ruby Power and Light:
   INTRO TO RAILS June 9-12 Berlin
   ADVANCING WITH RAILS June 16-19 Berlin
   INTRO TO RAILS June 24-27 London (Skills Matter)
See http://www.rubypal.com for details and updates!

Rick_DeNatale1 · 30 April 2008 12:47

Another thing to consider is why it's called a distance rather than,
say, a difference.

Consider this analogy.

Given that the highway distance between Raleigh, NC, and Washington,
DC is 250 miles. Given Washington, DC, and 250 miles, can I uniquely
identify Raleigh, NC.

No since the highway distance between Washington, DC and New York, NY
(as well as many other places) is also 250 miles.

···

On Wed, Apr 30, 2008 at 8:25 AM, Vidar Hokstad <vidar.hokstad@gmail.com> wrote:

Several people have given great answers as to why this isn't possible,
but let me give you another reason why, just
because I feel like it, and it's a useful thing to keep in mind.

Whenever you get a problem like this, a good way of approaching it if
you don't understand the algorithm is to think:

- If it _was_ possible, would that allow you to make impossibly
efficient compression algorithms?

In this case, the answer is that yes it would: You could pick a fixed
S2, for example an empty string, calculate the
Levensthein distance from S1 to the fixed S2. The distance would be
proportional to the length of S1. In fact, if
you choose S2 to be the empty string, then the distance would be equal
to the length of S1 (it'd take one deletion
per position in S1 to end up with an empty string)

In other words, you'd "compress" your test string "BRUY" down to the
number 4, but more importantly that "compression"
method would compress any string of any length, which is obviously
impossible (the reason this is impossible is that
if it was possible you could apply it to it's own output over and over
until you had compressed an arbitrary input
down to a bit).

In this case it's not a very hard problem to solve, but I find that
for a large number of questions about data
transformations, it's helpful to think about it in terms of
compression because the answers often become blatantly
obvious once you restate them that way.

--
Rick DeNatale

My blog on Ruby
http://talklikeaduck.denhaven2.com/

Phil · 30 April 2008 01:56

David A. Black wrote:

But isn't it the same for RUBY -> RUBE and RUBA -> RUBE ? In which
case, if you had RUBE and the L. distance, you could not recreate
RUBY; you'd have a whole set of strings that were exactly that
distance from RUBE.

Good point. I don't know if the L. distance is unique for each
possibility or not.

Considering how the algorithm looks, the solutions probably aren't unique.

But then again, you can use weighted probabilities to achieve the
correct result, I guess.

N.B.: I'm a bit out of my depth here, having no comp.sci. background
(which I regret..).

- --
Phillip Gawlowski
Twitter: twitter.com/cynicalryan
Blog: http://justarubyist.blogspot.com

~ - You know you've been hacking too long when...
...you wake up and desperately try to start a compiler so you can use
the 15 minute waiting period to sleep some more.

David_A_Black1 · 30 April 2008 02:17

Hi --

···

On Wed, 30 Apr 2008, Phillip Gawlowski wrote:

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

David A. Black wrote:
>
> But isn't it the same for RUBY -> RUBE and RUBA -> RUBE ? In which
> case, if you had RUBE and the L. distance, you could not recreate
> RUBY; you'd have a whole set of strings that were exactly that
> distance from RUBE.

Good point. I don't know if the L. distance is unique for each
possibility or not.

Considering how the algorithm looks, the solutions probably aren't unique.

But then again, you can use weighted probabilities to achieve the
correct result, I guess.

N.B.: I'm a bit out of my depth here, having no comp.sci. background
(which I regret..).

Me too. I leapt on this thread partly to give myself an excuse to get
a more than anecdotal acquaintance with the algorithm. I can't claim
that it's *much* more than that, but every little bit helps

David

--
Rails training from David A. Black and Ruby Power and Light:
   INTRO TO RAILS June 9-12 Berlin
   ADVANCING WITH RAILS June 16-19 Berlin
   INTRO TO RAILS June 24-27 London (Skills Matter)
See http://www.rubypal.com for details and updates!

Ken_Bloom · 1 May 2008 14:10

It's not unique. Any single deletion has edit distance 1, any single
insertion has edit distance 1.

Thus, the strings that have edit distance 1 from "RUBY" include (but are
not limited to):

#delete a letter
RUB
RUY
RBY
UBY
#add the letter A
ARUBY
RAUBY
RUABY
RUBAY
RUBYA
#add the letter B
BRUBY
RBUBY
RUBBY
RUBYB
...

That's not to say there aren't reasons for recreating all of these. For
Mengmeng Du[1], recreating all of these and checking for their existance
in a Bloom filter was orders of magnitude faster than computing the edit
distance between a query and every entry in a 200,000 entry list of last
names. (Just a paper I happen to be reading this week.)

--Ken

[1] Du, Mengmeng (2005) "Approximate Name Matching" Master's Degree
Project, Royal Institute of Technology, Stockholm, Sweden

···

On Tue, 29 Apr 2008 20:56:30 -0500, Phillip Gawlowski wrote:

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

David A. Black wrote:
>
> But isn't it the same for RUBY -> RUBE and RUBA -> RUBE ? In which |
case, if you had RUBE and the L. distance, you could not recreate |
RUBY; you'd have a whole set of strings that were exactly that |
distance from RUBE.

Good point. I don't know if the L. distance is unique for each
possibility or not.

--
Ken (Chanoch) Bloom. PhD candidate. Linguistic Cognition Laboratory.
Department of Computer Science. Illinois Institute of Technology.
http://www.iit.edu/~kbloom1/

Axel_Etzold · 30 April 2008 08:34

Dear all,

maybe what the OP wants is not so much Levenstein distance but
the McIlroy-Hunt longest common subsequence (LCS) algorithm, an algorithm
which not only tells you how far apart two strings are (how
many changes you have to make to get from string A to string B),
but also where they have to be made.
There's a Ruby implementation by Austin Ziegler here:

http://raa.ruby-lang.org/project/diff-lcs/

If you want to know how many strings have Levenstein distance n from
a given string A, you'll probably have to create all possible
combinations of letters from the alphabet of lengths length(A)-n through length(A)+n, and
throw away those that don't have the correct Levenstein distance.
That's going to be an enormous amount of words in any case ...
Even if only words from a dictionary are allowed, in most
cases, you'll still have many words that are "close" to each other ;-(

Best regards,

Axel

···

--
Psst! Geheimtipp: Online Games kostenlos spielen bei den GMX Free Games!

Topic		Replies	Views
Levenshtein 0.1.0 ruby-talk	2	71	24 May 2008
Mini quiz (Was: Detecting similar strings) ruby-talk	3	112	2 August 2006
Damerau-Levenshtein_distance ruby-talk	2	64	18 May 2007
How to find difference between two strings ruby-talk	6	168	31 July 2010
Edit Distance at Wikipedia ruby-talk	7	73	17 October 2006

Levenshtein_distance and recreate the string

Related topics