Algebraic dynamic programming

Note: This was partially copied and Julia-translated from Mattéo’s Midiro presentation.

1. Background: semi rings

A semiring is an algebraic structure that incarnates the essence of “sum of products”. A couple of examples of semi rings would be the natural numbers, square matrices (in two different ways), real numbers, polynomials, and more.

abstract type Semiring end

A semi ring is a set with the following properties¹:

Has an addition operation. ( $+$ )
Has a multiplication operation. ( $\times$ )
Has additive identity. ( $0 + x = x$ )
Has a multiplicative identity. ( $1 * x = x$ )
Multiplication distributes over addition. ( $a * (b + c) = a * b + a * c$ )

In Julia, the functions that give use the zero and the one associated to a type are respectively zero and one. The addition and multiplication operations are + and * like you would expect.

2. Dynamic programming problems as graph traversal

Very often, dynamic programming problems can be framed as nearest neighbor problems in some appropriate abstract graph. Let’s look, as an example, at the editing distance.

What we mean by edit distance is the following. Given two strings $a$ and $b$ , their edit distance is the smallest weight of editing operations such that applying them on $a$ gives you $b$ . Available operations are the following:

Inserting a character.
Removing a character.
Keeping a character.

For instance, the sequence of operations “keep(c), keep(a), keep(r), delete(p), insert(v), insert(e)” edits the string “carp” into the string “carve”.

For this specific problem, the abstract graph would have, as nodes, pairs of strings. The edges of this graph would be those editing operations.

the state (“car”, “car) would be connected to the state (”carp“, ”car“) through the ”delete(p)" edge, for instance.

If we use as a starting node the state (“”, “”) (two empty strings), every path the finished at the state (“carp”, “carve”) corresponds to a specific way to modify the first string to turn it into the second.

Now that the problem is properly set up, let us move on to the way we will compare different paths through the graph: weights.

Then, we will choose some semi ring $S$ to represent the weights, then a weight function $\text{weight}_S(\cdot)$ that will give such a weight to each transition in the graph.

Finally, we will enumerate all the paths between the states of interest and compute the weight of these paths through summing (across all paths) the products (across transitions in a single path) of the weights of the transitions.²

$\text{Paths}(a,b) = \text{a set of sequences of transitions that start at $ a $ and end at $ b $}$

$\text{Weight}_S(a,b) = \sum_i \prod_{j=1}^{n_i} \text{weight}_S((\text{Paths}(a,b)_{i})_{j})$

Through this very general framework that looks a lot like simply computing the “weighted sum” of a subgraph, we can implement diverse algorithms.

3. Implementation of edit distance.

Let’s write a concrete implementation of this “algebraic dynamic programming” idea in julia. Again, we will be using the string editing distance as a toy problem. Here are the definitions of the string operations:

abstract type Op end
struct Ins <: Op
    c::Char
end
struct Del <: Op
    c::Char
end
struct Keep <: Op
    c::Char
end

In many cases, it is complicated (and expensive!) to enumerate all possible paths from source to destination, but in this specific case, the graph generated by the edit distance problem is easy to write down.

Here, we compute $\text{Weight}_S(a,b)$ recursively by using the weight of sub strings.

# Note: for this function to be efficient, you would ideally use
# memoisation. I don't here, because there is no builting way to do
# this.
function Weight(T::Type{<:Semiring}, s1, s2)
    # We will be computing a sum, so it makes sense to initialize this
    # with zero. Notice we pass the semiring type to the zero
    # function. This is not your garden variety zero, but the one from
    # your chosen semiring!
    o = zero(T)

    # From a modification that transforms s1[2:end] into s2, we can
    # create a transformation that transforms s1 into s2 by appending
    # Del(s1[1]) at the start of the transformations sequence.
    if length(s1) != 0
        o += weight(T, Del(s1[1])) * Weight(T, s1[2:end], s2)
    end

    # Same thing, but for insertion.
    if length(s2) != 0
        o += weight(T, Ins(s2[1])) * Weight(T, s1, s2[2:end])
    end

    # When the first character of both strings are the same, we can
    # keep it by adding Keep(c) at the front of the operations
    # sequnce.
    if length(s1) != 0 && length(s2) != 0 && s1[1] == s2[1]
        o += weight(T, Keep(s1[1])) * Weight(T, s1[2:end], s2[2:end])
    end

    # Finally, if both strings are empty, we denote triviality by
    # adding the multiplicative identity.
    if length(s1) == 0 && length(s2) == 0
        o += one(T)
    end
    o
end

Now that the function that computes the weight of our subgraph $\text{Weight}_S(a,b)$ is done, we can test it on different choices of $S$ .

3.1. Tropical semiring (minimum cost)

One very exotic semiring is the tropical semiring. At first glance, it looks a lot like usual real number, but it really is not.

The carrier set of the semiring is the set of real numbers plus infinity (which isn’t part of the reals).
The semiring addition $a \oplus b$ on two elements is the $\min(a,b)$ operation on two reals.
Multiplication of semiring elements is addition of reals.
The neutral element under addition is plus infinity.
The neutral element under multiplication is zero.
Verify that the other properties are respected.

struct MinCost <: Semiring
    cost
end

Base.zero(::Type{MinCost}) = MinCost(Inf)
Base.one(::Type{MinCost}) = MinCost(0)

Base.:*(a::MinCost, b::MinCost) = MinCost(a.cost + b.cost)
Base.:+(a::MinCost, b::MinCost) = MinCost(min(a.cost, b.cost))

Here, our weights are defined in the following manner: inserting or deleting a character costs 1 unit, whereas keeping an already existing character is free.

weight(::Type{MinCost}, ::Ins) = MinCost(1)
weight(::Type{MinCost}, ::Del) = MinCost(1)
weight(::Type{MinCost}, ::Keep) = MinCost(0)

It is easy to see that if we combine sequentially two operations (through multiplication), then their cost will be summed whereas if we take the sum (combine horizontally two operations), then we will keep the minimum of the two costs.

If we evaluate the graph weight function on “carp” and “carve”, we should get something that is at least as cheap as the three operations we used earlier.

Weight(MinCost, "carp", "carve")

MinCost(3.0)

It works! However, it is not very useful to only see the cost of the best solution, we also want the solution itself!

3.2. The free semi ring

We would really like to access to the optimal sequence implicitly computed by the algorithm of which the MinCost is the cost. Before we do that, however, we will need to be able to extract from the Weight procedure the entire set of possible paths. This is what the free semiring is for.

In algebra, the “free X generated by the alphabe $A$ ” is a very standard construction where we take the $A$ set, then add just enough elements and equalities such that $A'$ becomes an X. In the case of semi rings, this means we must make the elements multiply, add, contain a neutral element for those two operations et cetera. It turns out that once we have done that, what we end up with is a semi ring where:

Each element of $\text{Free}(A)$ is a set of sequences of $A$ ’s.
Addition is union of sets of sequences.
Multiplication is concatenation of every left sequence with every right sequence.
Additive identity is the empty set.
Multiplicative identity is the set that only contains the empty sequence.

Implemented in Julia, those operations look like this:

struct Free{A} <: Semiring
    choices::Set{Vector{A}}
end

Free(s::Set{Vector{A}}) where A = Free{A}(s)

Base.zero(::Type{Free{A}}) where A = Free{A}(Set([]))

Base.one(::Type{Free{A}}) where A = Free{A}(Set([[]]))

Base.:+(a::Free{A}, b::Free{A}) where A = Free{A}(a.choices ∪ b.choices)
Base.:*(a::Free{A}, b::Free{A}) where A = Free{A}(
    Set([vcat(parta, partb)
         for parta in a.choices
         for partb in b.choices]))

What is interesting here, is that we can generate a free semi ring from every single possible alphabet set. What this suggests with regards to the operation weight function is that we should just embed each operation in the free semi ring generated by the operations.

We do that by mapping every transition to the singleton set that contains a sequence containing only that thing.

weight(::Type{Free{Op}}, o::Op) = Free{Op}(Set([[o]]))

What happens next will might surprise you. Each time the Weight function combines the weights of two half paths into a single big path weight, it will actually be combining all left possible paths with all right possible paths. When two possible paths from $a$ to $b$ are added, the two sets of possible paths are unioned. At the end of the function’s execution, it returns us a set of all possible operation sequences that start at $a$ and end at $b$ .

Let’s see what those look like in the case of “carp” and “carve”:

# (Il y en a vraiment beaucoup)
for seq in Iterators.take(Weight(Free{Op}, "carp", "carve").choices, 10)
    println(seq)
end

Op[Ins('c'), Del('c'), Ins('a'), Del('a'), Keep('r'), Ins('v'), Del('p'), Ins('e')]
Op[Del('c'), Del('a'), Ins('c'), Ins('a'), Ins('r'), Del('r'), Ins('v'), Del('p'), Ins('e')]
Op[Ins('c'), Ins('a'), Del('c'), Ins('r'), Del('a'), Ins('v'), Ins('e'), Del('r'), Del('p')]
Op[Ins('c'), Ins('a'), Del('c'), Del('a'), Del('r'), Ins('r'), Ins('v'), Del('p'), Ins('e')]
Op[Ins('c'), Ins('a'), Ins('r'), Del('c'), Del('a'), Ins('v'), Ins('e'), Del('r'), Del('p')]
Op[Keep('c'), Ins('a'), Del('a'), Keep('r'), Del('p'), Ins('v'), Ins('e')]
Op[Del('c'), Ins('c'), Ins('a'), Ins('r'), Del('a'), Del('r'), Del('p'), Ins('v'), Ins('e')]
Op[Keep('c'), Ins('a'), Del('a'), Del('r'), Del('p'), Ins('r'), Ins('v'), Ins('e')]
Op[Keep('c'), Ins('a'), Ins('r'), Del('a'), Ins('v'), Del('r'), Del('p'), Ins('e')]
Op[Ins('c'), Del('c'), Del('a'), Ins('a'), Del('r'), Ins('r'), Ins('v'), Del('p'), Ins('e')]

Even though the two words are quite similar, there are a lot of ways to turn one into the other and enumerating all of them (especially the longer ones) is really not the point of this exercise. Where the free semi ring shines is when we use it in combination with other semi rings as we will be doing soon.

Before we go there, however, how would we go about counting the number of distinct paths, if we did not want to construct them in the process?

3.3. Count semi ring

struct Count <: Semiring
    n::Int
end

Base.zero(::Type{Count}) = Count(0)

Base.one(::Type{Count}) = Count(1)

Base.:+(a::Count, b::Count) = Count(a.n + b.n)
Base.:*(a::Count, b::Count) = Count(a.n * b.n)

weight(::Type{Count}, ::Op) = Count(1)

What the “counting semi ring”, really is, is the good old natural numbers. Let’s use it on the Weight function, for a laugh

println(Weight(Count,"carp", "carve"))

let’s compare to what the free semiring gives us

println(length(Weight(Free{Op}, "carp", "carve").choices))

3.4. Selector semi ring

If we have two semi rings, a very standard algebraic construction is to consider the product of those two semi rings. However, to make something useful, we will also be asking for the first semi ring to have the structure of an order.

We can construct a semi ring of pairs where the first component is kind of like a “cost” and the second, a “payload”. Here is how it goes:

Addition of $(c_1, p_1)$ and $(c_2, p_2)$ compares $c_1$ and $c_2$ . If they are equal, return $(c_1, p_1 + p_2)$ . If they are not, return the pair with the lowest cost.
Multiplication is component-wise multiplication.
Additive identity is the pair of additive identities.
Multiplicative identities is the pair of multiplicative identities.

Here is how the implementation looks:

struct Sel{A,B} <: Semiring
    a::A
    b::B
end

Base.zero(::Type{Sel{A,B}}) where {A,B} = Sel{A,B}(zero(A), zero(B))
Base.one(::Type{Sel{A,B}}) where {A,B} = Sel{A,B}(one(A), one(B))

Base.:*(l::Sel{A,B},r::Sel{A,B}) where {A,B} = Sel{A,B}(l.a * r.a, l.b * r.b)
Base.:+(l::Sel{A,B},r::Sel{A,B}) where {A,B} =
    if l.a == r.a
        Sel{A,B}(l.a, l.b + r.b)
    elseif l.a < r.a
        l
    elseif l.a > r.a
        r
    end

What is interesting is that because this semi ring essentially inherits its structure from two other semi rings, we can also inherit our weight functions from the two semi rings.

weight(::Type{Sel{A,B}}, x) where {A,B} = Sel(weight(A, x), weight(B, x))

It is this construction that lets us compute the best candidate while remembering which is the best candidate.

# Il faut au préalable définir une relation d'ordre pour la composante
# gauche.
Base.isless(a::MinCost, b::MinCost) = a.cost < b.cost

r = Weight(Sel{MinCost, Free{Op}}, "carp", "carve")
println("coût: $(r.a)")
for elem in r.b.choices
    println("séquence: $(elem)")
end

coût: MinCost(3)
séquence: Op[Keep('c'), Keep('a'), Keep('r'), Ins('v'), Del('p'), Ins('e')]
séquence: Op[Keep('c'), Keep('a'), Keep('r'), Del('p'), Ins('v'), Ins('e')]
séquence: Op[Keep('c'), Keep('a'), Keep('r'), Ins('v'), Ins('e'), Del('p')]

It looks like there are three optimal ways to edit “carp” into “carve”.

4. Generalized Floyd-Warshall algorithm

Since the beginning, we have framed in graph terms problems that aren’t intrinsically graph problems. What about graph-native stuff, like shortest path problems?

4.1. Matrix product, deconstructed

Let there be two matrices that we can multiply

$M : \mathbb{R}^{a \times b}$

$N : \mathbb{R}^{b \times c}$

The formula to evaluate their product is the following:

$(MN)_{ik} = \sum_{j=1}^{b} M_{ij}N_{jk}$

Do you notice anything interesting? What properties should the component of matrices of something be so that they can be multiplied?

4.2. Adjacency matrices, revisited

Let there be a directed graph with weights on the edges (you can ignore the weights, for now):

You might already know this, but it is very useful to study such a graph through its adjacency matrix. This matrix is constructed by writing the relationship between the $i$ node and the $j$ nod at a matrix’s $ij$ ’th component. Usually, that means putting a one where there is an edge and a zero otherwise.

A = [0  1  0  0  1
     0  0  1  1  0
     0  0  0  1  0
     0  0  0  0  0
     0  0  1  0  0]

On neat computation we can do using this matrix is counting the number $n$ -length paths from $a$ to $b$ by taking the $n$ ’th power of the adjacency matrix.

println(repr("text/plain", A^2))

5×5 Matrix{Int64}:
 0  0  2  1  0
 0  0  0  1  0
 0  0  0  0  0
 0  0  0  0  0
 0  0  0  1  0

We can see that there are two paths of length 2 from node 1 to node 3. But doesn’t this operation seem familiar? It should. The way we use this adjacency matrix is very similar to how we used the Count semi ring earlier.

4.3. Generalized adjacency matrix

The two previous remarks’ purpose were to bring us to the following question:

If we consider a graph where the weights live in some semi ring, and we consider its adjacency matrix where the components are those weighs, and we observe the successive powers of this matrix, what do they mean?

In fact, this gives a way to compute the same kind of things the Floyd-Warshall algorithm would let us compute. Before we understand why, let’s do some more experiments on this construction:

Let’s take a graph where the weight of each edge is the name of the edge. We will be using the free semi ring.

struct Edge
    name::String
end

The adjacency matrix of this graph will look like this:

S = Free{Edge}
O = zero(S)
i(n) = Free{Edge}(Set([[Edge(n)]]))

A = [O  i("1>2")  O         O         i("1>5")
     O  O         i("2>3")  i("2>4")  O
     O  O         O         i("3>4")  O
     O  O         O         O         O
     O  O         i("5>3")  O         O]

Notice here that O means “the zero in the appropriate semiring” and that i injects the name into it. The powers of this matrix will generate sets of paths of fixed length.

using LinearAlgebra

# Necessary for Julia's matrix product to properly use the operations.
Base.zero(::T) where T <: Semiring = zero(T)


for path in (A^2)[2, 4].choices
    println(path)
end

Edge[Edge("2>3"), Edge("3>4")]

If we also want to get paths that are length $\le n$ , we can add ones into the diagonal of the matrix like this:

let
    A′ = A + diagm(ones(Free{Edge}, 5))
    for path in (A′^3)[2,4].choices
        println(path)
    end
end

Edge[Edge("2>4")]
Edge[Edge("2>3"), Edge("3>4")]

A new path with length 2 that wasn’t visible before has appeared.

Again, enumerating the set of paths is not very useful. What if we reapply our earlier trick (using Sel) to do more useful stuff?

Again, we fill up the diagonal with ones to consider paths that aren’t exactly $n$ edges long.

S = Sel{MinCost, Free{Edge}}
O = zero(S)
I = one(S)
i(c, n) = S(MinCost(c), Free{Edge}(Set([[Edge(n)]])))

A = [I  i(1, "1>2")  O           O             i(1, "1>5")
     O  I            i(1, "2>3")  i(1, "2>4")  O
     O  O            I            i(1, "3>4")  O
     O  O            O            I            O
     O  O            i(1, "5>3")  O            I]

let
    result = (A^5)[1,4]
    println("cost: $(result.a)")
    for path in result.b.choices
        println("path: $(path)")
    end
end

cost: MinCost(2)
path: Edge[Edge("1>2"), Edge("2>4")]

We found the shortest path!

An astute reader might remark, however: “Hey, sure, you have found the shortest path, but it cost you $O(n^4)$ complexity!” Yes. Here, each matrix-matrix product costs $n^3$ multiplications and we do $n$ of those, so we get very very bad complexity.

We could organize the way we do exponentiation more efficiently (by leveraging $M^{2n + k} = (M^{n})^2M^k$ ) making the computation $O(n^3 \log n)$ , but that is still very bad.

Another thing we can do, is notice that in many cases, we don’t actually care about most of the optimal paths. Usually, we want a single optimal path from a source to a destination and that’s it.

We can leverage this weaker need by computing our shortest path as an iterated matrix vector product (costs $O(n^2)$ instead of $O(n^3$ ).³

$\begin{equation*} \begin{aligned} (M^n)_{ij} &= e_i^\top M^n e_j \\ &= ((\dots((e_i^\top M) \dots ) M) e_j \end{aligned} \end{equation*}$

V = [I O O O O]

let
    result = (V * A * A * A * A * A)[4]
    println("cost: $(result.a)")
    for path in result.b.choices
        println("path: $(path)")
    end
end

cost: MinCost(2)
path: Edge[Edge("1>2"), Edge("2>4")]

We finally achieved⁴ $O(n^3)$ , but if we compare it to the classical algorithm for that class of problem, the Floyd-Warshall algorithm, we barely achieve its complexity while doing $n$ times less work (FW compute all source and all destinations shortest paths). This is still bad. How could we translate the FW algorithm to our weird semiring matrix usecase?

4.4. Generalized Floyd-Warshall

The Floyd-Warshall algorithm works recursively. If we know all the shortest paths among all $1, \dots, k-1$ nodes, then we can quickly extend them to get shortest paths among all $1, \dots, k$ nodes.

Each iteration up, we:

Consider the matrix $M_{k-1}$ of optimal paths for all nodes $< k$ .
Consider as $A$ the adjacency matrix of the first $k$ nodes (it is the $k$ wide upper left square of the complete adjacency matrix).
We try to improve the $< k$ paths by asking, for each pair $i, j < k$ of nodes: “is my current opimal $(i,j)$ -path as good as going $i,k,j$ ?”
We compute the optimal paths from $k$ to everywhere (that is $<k$ ) by gluing together the paths from $k$ to everywhere ( $<k$ ) with the paths from everywhere ( $<k$ ) to everywhere ( $<k$ ).
We compute the optimal paths from everywhere (again $<k$ ) to $k$ by gluing together the paths from everywhere to everywhere with the paths from everywhere to $k$ .
The best path from $k$ to $k$ is the empty path, the multiplicative identity.

When we try to formulate these steps as matrix operations, we get that the new matrix of optimal paths is the sum of two matrices:

The matrix computed with the exterior product of the $k$ ’t column with the $k$ ’th row of the adjacency matrix. This is the $i,k,j$ part.
The block matrix which contains $M_{k-1}$ , $M_{k-1} A_{1:k-1,k}$ , $A_{k, 1:k-1} M_{k-1}$ and $1$ .

$\begin{equation*} \begin{aligned} M_{k} = A_{1:k,k} A_{k,1:k} + \left(\begin{matrix} M_{k-1} & M_{k-1}A_{1:k-1,k} \\ A_{k, 1:k-1} M_{k-1} & 1 \\ \end{matrix}\right) \end{aligned} \end{equation*}$

Our implementation is this:

function generalized_floyd_warshall(A::Matrix{S}) where S
    k,_ = size(A)
    if k == 0
        return A
    end

    Mₖ₋₁ = generalized_floyd_warshall(A[1:end-1,1:end-1])


    A[:,end:end] .* A[end:end,:] +
        # I love julia's block matrix notation.
    [Mₖ₋₁                         Mₖ₋₁ * A[1:end-1,end:end]
     A[end:end,1:end-1] * Mₖ₋₁    one(S)]
end


let
    result = generalized_floyd_warshall(A)[1,4]
    println("cost: $(result.a)")
    for path in result.b.choices
        println("path: $(path)")
    end
end

cost: MinCost(2)
path: Edge[Edge("1>2"), Edge("2>4")]

This algorithm runs $n$ iterations where each iteration contains 3 operations of complexity $O(n^2)$ . We have achieved Floyd-Warshall’s complexity bound! (because that is what we, albeit in an obfuscated way, have implemented).

5. Generalized dijkstra algorithm

The “shortest path calculating algorithm” we have implemented, however, is not the one familiar to most of you dear readers, however. This is because in the most general case of “shortest path through graph where edge have weights”, Dijkstra’s algorithm doesn’t actually work. In fact, it relies on an additional assumption we haven’t made until now: positive weights.

5.1. Monotone vs non-monotone semi rings

Dijkstra’s algorithm is able to beat Floyd-Warshall by leveraging the assumption of positive weights. What this lets Dijkstra do is early stopping: once the algorithm visits the target node, it can be sure no new paths are going to be better.

If we translate this “positive weights” in semi ring-speak, we get:

The existence of an order operation on the elements. This is easy, since Sel already uses this for its addition.
For all weights $w \in S$ of some edge, multiplying by $w$ is a monotone operation. This means that $\forall s,t \in S. s \le t \Rightarrow sw \le tw$

Let’s implement the order structure for Sel.

Base.isless(x::Sel{A,B}, y::Sel{A,B}) where {A,B} = x.a < y.a

With this property, Dijkstra’s algorithm can be used almost verbatim. We only need to tweak a few types to make it work.

Because julia doesn’t have a builtin priority heap, let me write a slow implementation of one first.

mutable struct MinHeap
    keys
    values
end

function MinHeap()
    MinHeap(
        [],
        [],
    )
end

function Base.pop!(h::MinHeap)
    k = pop!(h.keys)
    v = pop!(h.values)
    k, v
end

function insert!(h::MinHeap, k, v)
    push!(h.keys, k)
    push!(h.values, v)

    perms = sortperm(h.keys, rev=true)
    h.keys = h.keys[perms]
    h.values = h.values[perms]
end

Since dijkstra isn’t using the same graph representation as FW, we need to massage the graph into a different representation.

# Here a graph is represented by a function that gives you the
# outgoing edges to a given node.
function the_graph(from_node)
    if from_node == 1
        [(1, 2), (1,5)]
    elseif from_node == 2
        return [(2, 3), (2, 4)]
    elseif from_node == 3
        return [(3, 4)]
    elseif from_node == 4
        return []
    elseif from_node == 5
        return [(5,3)]
    else
        error("no such node:", from_node)
    end
end

# The weight of an edge is a tuple of a constant 1 and the edge name.
function cost(n)
    edges = [(1,2), (1,5), (2,3), (2,4), (3,4), (5,3)]
    if n ∈ edges
        a,b = n
        Sel(MinCost(1), Free(Set([[Edge("$(a)>$(b)")]])))
    else
        error("no such edge", n)
    end
end

target((a,b)) = b

Now we are ready for the final algorithm. Behold, Dijkstra!!

function generalized_dijkstra(S::Type{<:Semiring}, graph, cost, s₀, s₁)
    queue = MinHeap()
    state_costs = Dict()
    visited = Set()

    state_costs[s₀] = one(S)
    insert!(queue, state_costs[s₀], s₀)

    while true
        source_cost, source_state = pop!(queue)
        if source_state == s₁
            return state_costs[source_state]
        end

        if source_state ∈ visited
            continue
        else
            push!(visited, source_state)
        end

        edges = graph(source_state)
        for edge in edges

            target_state = target(edge)
            if target_state ∉ keys(state_costs)
                state_costs[target_state] = zero(S)
            end

            state_costs[target_state] += source_cost * cost(edge)
            insert!(queue, state_costs[target_state], target_state)
        end
    end
end

println(generalized_dijkstra(Sel{MinCost, Free{Edge}}, the_graph, cost, 1, 4))

Sel{MinCost, Free{Edge}}(MinCost(2), Free{Edge}(Set(Vector{Edge}[[Edge("1>2"), Edge("2>4")]])))

Footnotes:

They are other properties, but they aren’t as useful for intuition.

Note: This looks a lot like what the wikipedia page on weighted automata calls the weight of a word.

This optimization looks a lot like the idea of computing gradients instead of jacobians in automatic differentiation. We leverage a specific more efficient order of computation for complexity gains.

⁴

Be careful! in Julia, the multiplication operator is left associative ( $a*b*c = (a*b)*c$ ). If that weren’t the case, the same text would parse as repeated matrix multiplication, followed by vector matrix multiplication.