Edit Quotient

Remember Levenshtein Distance: the number of changes to turn one string into another? Here’s a naïve implementation in Clojure:

(defn levenshtein [s t]
  (let [m (count s)
        n (count t)
        d (make-array Integer/TYPE (inc m) (inc n))]
    (dotimes [i m] (aset-int d i 0 i))
    (dotimes [j n] (aset-int d 0 j j))
    (doseq [j (range 1 (inc n))]
      (doseq [i (range 1 (inc m))]
        (if (= (.charAt s (dec i)) (.charAt t (dec j)))
          (aset-int d i j (aget d (dec i) (dec j)))
          (aset-int d i j (min (inc (aget d (dec i) j))
                               (inc (aget d i (dec j)))
                               (inc (aget d (dec i) (dec j))))))))
    (aget d m n)))

(assert (= 0 (levenshtein "" "")))
(assert (= 3 (levenshtein "foo" "foobar")))
(assert (= 3 (levenstein "kitten" "sitting")))
(assert (= 3 (levenstein "Saturday" "Sunday")))

I ripped that straight off Wikipedia’s pseudocode. Not functional at all, and probably not particularly efficient either.

But I’ve never seen this: the number of changes to turn one string into another, scaled by the lengths of the strings. Call it the “edit quotient,” computed as the Levenshtein Distance divided by the mean of the lengths of the two strings. The edit quotient of two empty strings is zero.

(defn edit-quotient [s t]
  (let [sum (+ (count s) (count t))]
    (if (pos? sum)
      (/ (levenstein s t)
         (/ sum 2))

(assert (= 0 (edit-quotient "" "")))
(assert (= 0 (edit-quotient "foof" "")))
(assert (= 1 (edit-quotient "foo" "bar")))
(assert (= 6/7 (edit-quotient "foof" "bar")))
(assert (= 2/3 (edit-quotient "foo" "faa")))
(assert (= 2/7 (edit-quotient "foof" "foo")))
(assert (= 1/2 (edit-quotient "foobar" "faabir")))

This has some interesting properties. The edit quotient is zero if the two strings are completely identical, one if they are completely different. Values in between zero and one give some idea of how different the strings are.

3 Replies to “Edit Quotient”

Comments are closed.