Representing Undirected Edges (advice, please)

Dear Jacob,

now you have confused me almost totally.
Actually, I was first responding to the subject because I had sowhat vaguely
in mind that it might help to formulate the problem in as a
shortest-path problem (define a cost function and try to minimise it. For
example,
in the song problem, pay me $1000000 each time the subsequent song doesn't
start with the letter
the previous ended with. Otherwise, pay nothing. If you have to pay nothing
after going through the whole path, you have found a correct solution and I
am still poor.

For this sort of problem, there is a way to find the "single-source shortest
path solution"
called Dijkstra's algorithm, which is nicely described here:

_http://ciips.ee.uwa.edu.au/~morris/Year2/PLDS210/dijkstra.html_
(http://ciips.ee.uwa.edu.au/~morris/Year2/PLDS210/dijkstra.html)

I feel that the way this algorithm works should tell one also how to
implement paths
as data structures (there is (non Ruby-) code and an animation on that
website).

My confusion is about your use of multiple criteria/connections/... .
In the original version of Dijsktra's algorithm, there is only one connection
from each edge to each other.
I feel that that's also enough, somehow, because you are assigning some
overall value to a specific path in the end, and it's only because of that
that
you can put order into the set of all possible paths to choose the best.
The value you assign to a specific connection may well be a result of many
factors, but in the end, you have to come up with a numeric value for it.
Otherwise, they cannot be classified - going from New York to Philadelphia
by plane may be three times as expensive as going there by car, but only
two times as nice - so what do you do?

So I'd now tend to use a vector-of-factors incidence matrix and to evaluate
these to assign a value to measure 'how nice is it to go from A to B via
i-j?'

Best regards,

Axel

now you have confused me almost totally.

Sorry about that. I have a tendency to do that. :slight_smile: I'll think of
examples that in my mind are clear but introduce incidental
complexities that don't have anything to do with the problem at hand
and end up muddying the water.

Actually, I was first responding to the subject because I had sowhat vaguely
in mind that it might help to formulate the problem in as a shortest-path
problem (define a cost function and try to minimise it. For example, in the
song problem, pay me $1000000 each time the subsequent song doesn't
start with the letter the previous ended with. Otherwise, pay nothing. If you
have to pay nothing after going through the whole path, you have found a
correct solution and I am still poor.

For this sort of problem, there is a way to find the "single-source shortest
path solution" called Dijkstra's algorithm, which is nicely described here:

_http://ciips.ee.uwa.edu.au/~morris/Year2/PLDS210/dijkstra.html_
(http://ciips.ee.uwa.edu.au/~morris/Year2/PLDS210/dijkstra.html\)

Yes, I'm familiar with Dijkstra's algorithm and the connection to
incidence matrices. That's why in my first post I said "Though an
incidence [matrix] can work (and work quite well) for a simple
graph..." (mistype of 'graph' for 'matrix' in the original post
corrected). I'll reexplain the case for multiple edges per node pair
below and why Djikstra's algorithm (in its original form) is not
applicable.

My confusion is about your use of multiple criteria/connections/... .

Ok, and this is where I muddied the water. My apologies on that.
Ignore the multiple criteria, and let's deal with an example that only
has one weight per edge.

Let N be the set of letters/characters with which a song may begin or
end, and E a set of (directed) edges representing songs, where each
edge connects the node representing the first character of the song to
the node representing the last character of the song. So, for
instance, the edge "Fingertips"[1] would connect the nodes [F] -> [S].
Now we can represent a graph G = { N, E }. Let also each edge be
weighted by the length of the song.

The goal is to determine the quickest (not necessarily shortest in
number of songs, but shortest in running time) song chain to get from
a starting letter to another letter B. This is a simple matter of
finding the cheapest path in our graph between the nodes [starting
letter] => [ending letter]. Djikstra's algorithm could be useful here,
except (as you say):

In the original version of Dijsktra's algorithm, there is only one connection
from each edge to each other.

Elaboration: Djikstra's algorithm uses an incidence matrix in which
each entry in the matrix represents the weight of the edge between the
indices. Obviously then, the matrix can only represent one edge per
index pair, where the indices are nodes. So Djikstra's algorithm only
allows one edge per node pair.

But in our example multiple edges can connect the same pair of nodes.
For example, we connected [F] -> [S] by "Fingertips" above. But the
song "Fake Out In Buenos Aires"[2] also connects [F] -> [S]. So you
can't just expect one edge, and Djikstra's algorithm is no longer
available (in it's original form... there may be an adaptation for
this, but I've never seen one -- I would love to be wrong).

Ok, so we've seen an example where there are multiple edges per node
pair, and we've seen why Djikstra's algorithm wouldn't work in that
model. But Djikstra's algorithm is cool and /fast/! So why not try a
different representation of the model the *would* allow Djikstra's
algorithm? Instead of having songs as edges, lets try songs as nodes
(as you mentioned). What should we represent in edges? A logical
choice would be to have a directed edge from song A to song B iff song
B can follow song A according to our rules. So there'd be an edge from
"Fingertips" to "Sapphire Bullets of Pure Love"[3] (s to S), but no
edge from "Sapphire Bullets of Pure Love" to "I Palindrome I"[4] (e to
I). That builds a working graph, but there are problems:

(1) Where do we start? Where do we end?

Maybe we can use an extra node that represents the start state and add
edges to all songs starting with the desired start letter. Same for
the goal state -- we add a node and connect from all songs ending with
the desired letter to it. The problem is that we can't reuse the same
graph for multiple runs -- if we change out start/goal conditions, we
need to change the graph.

(2) How do we weight the edges?

For our example the weight appropriately belongs to the song, which is
now a node instead of an edge. We can weight all edges coming out of a
song node with the length of the song. This together with the solution
to (1) makes for a viable graph. But I contest that the shift of
weight from the appropriate entity (song) to multiple edges is a
duplication and obfuscation of the model.

(3) The revised model is much, much larger.

Assume a database of N songs beginning or ending with one of M
letters/characters. The actual distribution of connections between
songs is probably much more complex, but let's just assume each
character is equally likely. So N/M songs end with a given character
and N/M songs will start with the same given character. So we'll need
(N/M)^2 edges each for M characters, or a total of (N^2)/M edges. So
our graph has N nodes and (N^2)/M edges. Compare that to M nodes and N
edges (one edge per song). Add to that the fact that N is going to be
orders of magnitude greater than M and you'll notice the difference.

Simply, I just wanted to point out that there are situations in which
it makes sense to have a graph with multiple edges per node pair. In
those situations, the original version of Djikstra's algorithm using
incidence matrices is not applicable.

Jacob Fugal

[1] Fingertips - TMBW: The They Might Be Giants Knowledge Base
[2] Fake Out In Buenos Aires - TMBW: The They Might Be Giants Knowledge Base
[3] Sapphire Bullets Of Pure Love - TMBW: The They Might Be Giants Knowledge Base
[4] I Palindrome I - TMBW: The They Might Be Giants Knowledge Base

···

On 5/10/05, Nuralanur@aol.com <Nuralanur@aol.com> wrote:

As far as we are concerned, there's only one effective connection from
each edge to the other: the shortest song. You will never go back to
that node, you will never need to traverse a second song from that node
to another particular node. And in no instance will a costlier
connection ever be considered over a cheaper connection connecting the
two same nodes.

As you build the map, use connections, and replace them if the new song
is shorter than the old song.

The only instances in which we need to store more than the shortest song
between two points is if we want other things, and not the shortest
directed path.

i.e: country, jazz, or blues songs have priority over others.

     we want the longest possible playlist without repeates.

     We want a play list that can approach a given time as close as
       possible. (i.e: a playlist exactly 15 minutes)

- Greg

···

On Wed, May 11, 2005 at 02:57:48AM +0900, Jacob Fugal wrote:

The goal is to determine the quickest (not necessarily shortest in
number of songs, but shortest in running time) song chain to get from
a starting letter to another letter B. This is a simple matter of
finding the cheapest path in our graph between the nodes [starting
letter] => [ending letter]. Djikstra's algorithm could be useful here,
except (as you say):

> In the original version of Dijsktra's algorithm, there is only one connection
> from each edge to each other.

Greg Millam wrote:

As far as we are concerned, there's only one effective connection from
each edge to the other: the shortest song. You will never go back to
that node, you will never need to traverse a second song from that node
to another particular node. And in no instance will a costlier
connection ever be considered over a cheaper connection connecting the
two same nodes.

But the shortest path between two songs may not always be the path through
the shortest songs from each song. Example (with made up songs, but I'm
sure this could happen with "real songs"):

Greg's Shortest Path:

···

---------------------
Hello There, 3:45
Echo It All, 2:34
Live It Up, 3:52
Pour It On, 2:45
No No No, 5:01

True Shortest Path:
---------------------
Hello There, 3:45
Eat a Ton, 5:34
No No No, 5:01

So you really have to traverse all possibilities, keep track of the
current best option, and only short-circuit when the current path exceeds
the best path at that point.

The beauty of this is that the traversal technique never really needs to
change, just the method of determining the best path, and when a path
needs to be short-circuited. So you can easily find the shortest path, the
longest, and the one that most closely fits in a given time slot by just
changing your weighting algorithm (in fact, you could do this all at once,
keeping a "best" for each.)

Damn, now that I think of this, I should go back and actually finish that
Monkey Barrel quiz (I got as far as the XML parsing :slight_smile:

Ryan

That's one particular solution to the Barrel of Monkeys quiz, and one particular type of path that may be desirable.

I understand the benefits of optimizations like Djikstra's algorithm and adjacency matrices, but (to be clear) there are certainly pathfinding cases where short/early optimizations do not apply.

For example, consider the Barrel of Monkeys quiz 'extra credit' part to find a playlist as close to a specific duration as possible. Or consider a desire to find a path where the mean song length is as close to 5:00 as possible.

Both of these lead towards bin-packing type problems (and hence really, really slow solutions) but if that's what you need, that's what's you need.

I'm very much enjoying reading the discussion in this thread, and eventually plan to implement optimizations where possible in my library. But my initial goal is to create a library that lets you specify any conceivable criteria for the path you want, and provide hints as to how to find it.

···

On May 10, 2005, at 1:38 PM, Greg Millam wrote:

As far as we are concerned, there's only one effective connection from
each edge to the other: the shortest song. You will never go back to
that node, you will never need to traverse a second song from that node
to another particular node. And in no instance will a costlier
connection ever be considered over a cheaper connection connecting the
two same nodes.

But the shortest path between two songs may not always be the path through
the shortest songs from each song. Example (with made up songs, but I'm
sure this could happen with "real songs"):

Greg's Shortest Path:
---------------------
Hello There, 3:45
Echo It All, 2:34
Live It Up, 3:52
Pour It On, 2:45
No No No, 5:01

True Shortest Path:
---------------------
Hello There, 3:45
Eat a Ton, 5:34
No No No, 5:01

Actually, what Greg is saying is that if there's another song
connecting [E] -> [N] aside from "Eat a Ton", say maybe "Elephants are
Fun" (running time 2:46), then "Elephants are Fun" will always be
preferred over "Eat a Ton" when trying to get a short playlist. So
yes, Greg's point is correct.

So maybe you won't ever care about multiple edges per node pair unless
you care about multiple criteria. I'm loathe to reintroduce that due
to the confusion it caused last time. But like Greg said:

The only instances in which we need to store more than the shortest song
between two points is if we want other things, and not the shortest
directed path.

i.e: country, jazz, or blues songs have priority over others.

   we want the longest possible playlist without repeates.

   We want a play list that can approach a given time as close as
     possible. (i.e: a playlist exactly 15 minutes)

This is a case in which you might have more than one edge per node
pair. However, thinking about what Alex was saying earlier, wouldn't
it be more efficient to build separate graphs -- one for each
criteria? This will cause the total number of nodes in memory to
increase (d * M instead of M, where d is the number of criteria) and a
possible increase in the number of edges in memory (in the worst case,
where there is only one edge per node pair anyways, d * N instead of
N). But still, in the worst case this is only a constant multiple (by
d) in memory usage. Weighed against that is at worse the same number
of edges and at best many fewer edges to consider per graph. So
searching for best paths could become much more efficient at minimal
cost of memory.

So, I may need to concede the point that multiple edges per node pair
*aren't* ever necessary, though conceptually they can be handy.
Despite all that, I've learned a lot from this discussion and thank
you all for all the feedback!

Jacob Fugal

···

On 5/10/05, Ryan Leavengood <mrcode@netrox.net> wrote:

Gavin Kistner wrote:

I'm very much enjoying reading the discussion in this thread, and
eventually plan to implement optimizations where possible in my
library. But my initial goal is to create a library that lets you
specify any conceivable criteria for the path you want, and provide
hints as to how to find it.

If you can do that in an elegant way, that would be really awesome. It
would also make solving searching problems like we've seen in these Ruby
Quizzes really easy (which is good.) Because those kind of search problems
come up a lot in computer science, and _not_ having to roll your own
search algorithm every time would be nice.

Ryan

Ryan Leavengood wrote:

If you can do that in an elegant way, that would be really awesome. It
would also make solving searching problems like we've seen in these Ruby
Quizzes really easy (which is good.) Because those kind of search problems
come up a lot in computer science, and _not_ having to roll your own
search algorithm every time would be nice.

I've been thinking for some time that a cool group project would be to work through a classic text on algorithms (e.g., Sedgewick) and develop a Ruby library for everything in it.

I don't have time for anything new right now, but I still think it'd be cool.

Steve

Yup, that would be cool. That's similar to what I did with Tim Jones'
"AI Application Programming" book here:

http://ai-app-prog.rubyforge.org/

Although I kind of ran out of gas and never did finish up the last few
chapters... and I was just learning Ruby, so the code is hurting... but
anyhow!

Yours,

Tom

···

On Thu, 2005-05-12 at 01:32 +0900, Steven Jenkins wrote:

Ryan Leavengood wrote:
> If you can do that in an elegant way, that would be really awesome. It
> would also make solving searching problems like we've seen in these Ruby
> Quizzes really easy (which is good.) Because those kind of search problems
> come up a lot in computer science, and _not_ having to roll your own
> search algorithm every time would be nice.

I've been thinking for some time that a cool group project would be to
work through a classic text on algorithms (e.g., Sedgewick) and develop
a Ruby library for everything in it.

I don't have time for anything new right now, but I still think it'd be
cool.

Steven Jenkins wrote:

Ryan Leavengood wrote:

If you can do that in an elegant way, that would be really awesome.
It would also make solving searching problems like we've seen in
these Ruby Quizzes really easy (which is good.) Because those kind
of search problems come up a lot in computer science, and _not_
having to roll your own search algorithm every time would be nice.

I've been thinking for some time that a cool group project would be to
work through a classic text on algorithms (e.g., Sedgewick) and
develop a Ruby library for everything in it.

I don't have time for anything new right now, but I still think it'd
be cool.

Same here. I once started with graph algorithms but got stuck in the
middle due to resource problems...

Kind regards

    robert