Edit Distance - LeetCode

Coding Jun 11, 2020

Introduction

This problem is part of the May 2020 monthly challenge on the LeetCode platform. This article tries to bring around an intuition for solving the edit distance problem efficiently.

Problem Statement

Given two words word1 and word2, find the minimum number of operations required to convert word1 to word2.

You have the following 3 operations permitted on a word:

  1. Insert a character:
    Let's say we have two words,
    word1 = "HM" and word2 = "HMT"
    We insert the character "T" to the last position in the word1.
  2. Delete a character:
    Let's say we have two words,
    word1 = "HMT" and word2 = "HM"
    We delete the character "T" from the last position in the word1.
  3. Replace a character:
    Let's say we have two words,
    word1 = "HT" and word2 = "HM"
    We replace the character "T" from the last position in the word1.

Example 1:
Input: word1 = "horse", word2 = "ros"
Output: 3
Explanation:
horse -> rorse (replace 'h' with 'r')
rorse -> rose (remove 'r')
rose -> ros (remove 'e')

Example 2:
Input: word1 = "intention", word2 = "execution"
Output: 5
Explanation:
intention -> inention (remove 't')
inention -> enention (replace 'i' with 'e')
enention -> exention (replace 'n' with 'x')
exention -> exection (replace 'n' with 'c')
exection -> execution (insert 'u')

Approach

Let's have the below example and understand the intuition stepwise. We draw a matrix of words where each cell denotes a sub-problem.

Input:
word1 = "HIMIS", word2 = "HIDEM"

We see the below table with the sub-problems of two words, here we see that row header denotes the word "HIMIS" broken down and column header denotes the word "HIDEM".

"" H (H) I (HI) M (HIM) I (HIMI) S
""
H
(H) I
(HI) D
(HID) E
(HIDE) M

Let's see cell by cell to better understand the sub-problems given the three operations (Insert, Delete or Replace).

(0,0): ([""] -> [""]):
Refer the first cell how many steps are required to convert an empty string to empty string (Zero)

"" H (H) I (HI) M (HIM) I (HIMI) S
"" 0
H
(H) I
(HI) D
(HID) E
(HIDE) M

(0,1): (["H"] -> [""]):
See the second cell in the first row, how many steps are required to convert "H" to an empty string (Zero), we can perform the delete operation on "H"

"" H (H) I (HI) M (HIM) I (HIMI) S
"" 0 1
H
(H) I
(HI) D
(HID) E
(HIDE) M

Same way we populate the remaining entries in the first row.

DELETE Operation

We notice that in this scenario of deletion, we should go back to the previous column value.
i.e. for H and "", we should delete the word H and move one step back for the first word, which is "".
Hence the edit distance becomes:
0 + 1 = 1

We can say for deletion of the current character in the first word
Distance[i, j] = 1 + Distance[i, j - 1];

Also, refer the first column,

(1,0): ([""] -> ["H"]):
The First Column, how many steps are required to convert "" to "H", we can perform the insert operation.

"" H (H) I (HI) M (HIM) I (HIMI) S
"" 0 1 2 3 4 5
H 1
(H) I
(HI) D
(HID) E
(HIDE) M

(2,0): ([""] -> ["(H) I"]):
See the third cell in the first column, how many steps are required to convert "" to "HI", we can perform insert operation.

"" H (H) I (HI) M (HIM) I (HIMI) S
"" 0 1 2 3 4 5
H 1
(H) I 2
(HI) D
(HID) E
(HIDE) M

Same way we populate the remaining entries in the first column.

INSERT Operation

We notice that in this scenario of insertion, we should go back to the previous row value.
i.e. for "" and HI, we should insert the word I and move one step back for the second word, which is H.
Hence the edit distance becomes:
1 + 1 = 1

We can say for insertion of the current character in the first word
Distance[i, j] = 1 + Distance[i - 1, j];

"" H (H) I (HI) M (HIM) I (HIMI) S
"" 0 1 2 3 4 5
H 1
(H) I 2
(HI) D 3
(HID) E 4
(HIDE) M 5

Let's populate the data for the second row, cell by cell

(1,1): (["H"] -> ["H"]):
We notice that H and H are same, in this scenario of equality, we should go back to the previous results.
i.e. for H and H, we should move one step back in each word, which is "" and "". Hence the edit distance remains 0.

We can say for equal values
Distance[i, j] = Distance[i - 1, j - 1];

"" H (H) I (HI) M (HIM) I (HIMI) S
"" 0 1 2 3 4 5
H 1 0
(H) I 2
(HI) D 3
(HID) E 4
(HIDE) M 5

(1,2): (["(H) I"] -> ["H"]):
We continue to perform the deletion operation for the entire row.
Distance[i, j] = 1 + Distance[i, j - 1];

"" H (H) I (HI) M (HIM) I (HIMI) S
"" 0 1 2 3 4 5
H 1 0 1 2 3 4
(H) I 2
(HI) D 3
(HID) E 4
(HIDE) M 5

(2,1): (["H"] -> ["(H) I"]):
We perform the insertion operation for this cell.
Distance[i, j] = 1 + Distance[i - 1, j];

"" H (H) I (HI) M (HIM) I (HIMI) S
"" 0 1 2 3 4 5
H 1 0 1 2 3 4
(H) I 2 1
(HI) D 3
(HID) E 4
(HIDE) M 5

(2,2): (["(H) I"] -> ["(H) I"]):
We perform the equal operation for this cell.
Distance[i, j] = Distance[i - 1, j - 1];

"" H (H) I (HI) M (HIM) I (HIMI) S
"" 0 1 2 3 4 5
H 1 0 1 2 3 4
(H) I 2 1 0
(HI) D 3
(HID) E 4
(HIDE) M 5

We populate the entire row by performing delete or equality operations

"" H (H) I (HI) M (HIM) I (HIMI) S
"" 0 1 2 3 4 5
H 1 0 1 2 3 4
(H) I 2 1 0 1 2 3
(HI) D 3
(HID) E 4
(HIDE) M 5

(3,1): (["H"] -> ["(HI) D"]):
We perform the insertion operation for this cells.
Distance[i, j] = 1 + Distance[i - 1, j];

"" H (H) I (HI) M (HIM) I (HIMI) S
"" 0 1 2 3 4 5
H 1 0 1 2 3 4
(H) I 2 1 0 1 2 3
(HI) D 3 2
(HID) E 4
(HIDE) M 5

(3,2): (["(H) I"] -> ["(HI) D"]):
We perform the insertion operation for this cells.
Distance[i, j] = 1 + Distance[i - 1, j];

"" H (H) I (HI) M (HIM) I (HIMI) S
"" 0 1 2 3 4 5
H 1 0 1 2 3 4
(H) I 2 1 0 1 2 3
(HI) D 3 2 1
(HID) E 4
(HIDE) M 5

(3,3): (["(HI) M"] -> ["(HI) D"]):

REPLACE Operation

We notice that (HI) M and (HI) D, this scenario of replacement, we should go back to the previous results.
i.e. for (HI) M and (HI) D, we should move one step back in each word, which is HI and HI. Hence the edit distance will be 1.

i.e. for HI and HI, we should replace the word M with D and move one step back for the second word, which is HI.
Hence the edit distance becomes:
1 + 0 = 1

We perform the replace operation for this cells.
Distance[i, j] = 1 + Distance[i - 1, j - 1];

"" H (H) I (HI) M (HIM) I (HIMI) S
"" 0 1 2 3 4 5
H 1 0 1 2 3 4
(H) I 2 1 0 1 2 3
(HI) D 3 2 1 1
(HID) E 4
(HIDE) M 5

General Formula

We can deduce below formula to be used for any subsequent operations.

For equality scenarios:
Distance = Distance for Previous element

Distance[i, j] = Distance[i - 1, j - 1];

For non-equality scenarios:
Distance = 1 + Minimum (Deletion, Insertion, Replacement)

Distance[i, j] = 1 + Minimum (Distance[i, j - 1], Distance[i - 1, j], Distance[i - 1, j - 1]);

After, populating the remaining values based on the discussed logic:

"" H (H) I (HI) M (HIM) I (HIMI) S
"" 0 1 2 3 4 5
H 1 0 1 2 3 4
(H) I 2 1 0 1 2 3
(HI) D 3 2 1 1 2 3
(HID) E 4 3 2 2 2 3
(HIDE) M 5 4 3 3 3 3

Code

public class Solution {
    public int MinDistance(string word1, string word2) {
        if(string.IsNullOrEmpty(word1)) {
            return string.IsNullOrEmpty(word2) ? 0 : word2.Length;
        }
        if(string.IsNullOrEmpty(word2)) {
            return string.IsNullOrEmpty(word1) ? 0 : word1.Length;
        }
        var word1Length = word1.Length;
        var word2Length = word2.Length;
        var distance = new int[word2Length + 1, word1Length + 1];
        // Initialize column values
        for (var j = 0; j <= word1Length; j++) {
            var i = 0;
            distance[i, j] = j;
        }
        // Initialize row values
        for (var i = 0; i <= word2Length; i++) {
            var j = 0;
            distance[i, j] = i;
        }
        for (var i = 0; i < word2Length; i++) {
            var currentCh2 = word2[i];
            for (var j = 0; j < word1Length; j++) {
                var currentCh1 = word1[j];
                if (currentCh1 == currentCh2) {
                    distance[i + 1, j + 1] = distance[i, j];
                } else{
                    distance[i + 1, j + 1] = 1 + Math.Min(distance[i + 1, j], 
                                                      Math.Min(distance[i, j + 1], distance[i, j]));                    
                }
            }
        }
        return distance[word2Length, word1Length];
    }
}

Conclusion

The mentioned problem is an example of how drawing out the cases might help even to reach the solution. In this case, when we draw out the matrix, we see the patterns visible.

Complexity: O(mn)