# Edit Distance - LeetCode

# Introduction

This problem is part of the May 2020 monthly challenge on the LeetCode platform. This article tries to bring around an intuition for solving the edit distance problem efficiently.

## Problem Statement

Given two words *word1* and *word2*, find the minimum number of operations required to convert *word1* to *word2*.

You have the following 3 operations permitted on a word:

- Insert a character:

Let's say we have two words,*word1*= "HM" and*word2*= "HMT"

We insert the character "T" to the last position in the*word1*. - Delete a character:

Let's say we have two words,*word1*= "HMT" and*word2*= "HM"

We delete the character "T" from the last position in the*word1*. - Replace a character:

Let's say we have two words,*word1*= "HT" and*word2*= "HM"

We replace the character "T" from the last position in the*word1*.

**Example 1:****Input:** word1 = "horse", word2 = "ros" **Output:** 3 **Explanation:**

horse -> rorse (replace 'h' with 'r')

rorse -> rose (remove 'r')

rose -> ros (remove 'e')

**Example 2:****Input:** word1 = "intention", word2 = "execution"**Output:** 5**Explanation:**

intention -> inention (remove 't')

inention -> enention (replace 'i' with 'e')

enention -> exention (replace 'n' with 'x')

exention -> exection (replace 'n' with 'c')

exection -> execution (insert 'u')

# Approach

Let's have the below example and understand the intuition stepwise. We draw a matrix of words where each cell denotes a sub-problem.

**Input: ***word1 = *"HIMIS", *word2 = *"HIDEM"

We see the below table with the sub-problems of two words, here we see that row header denotes the word "HIMIS" broken down and column header denotes the word "HIDEM".

"" | H | (H) I | (HI) M | (HIM) I | (HIMI) S | |
---|---|---|---|---|---|---|

"" | ||||||

H | ||||||

(H) I | ||||||

(HI) D | ||||||

(HID) E | ||||||

(HIDE) M |

Let's see cell by cell to better understand the sub-problems given the three operations (Insert, Delete or Replace).

(0,0): ([""] -> [""]):

Refer the first cell how many steps are required to convert an empty string to empty string (Zero)

"" | H | (H) I | (HI) M | (HIM) I | (HIMI) S | |
---|---|---|---|---|---|---|

"" | 0 |
|||||

H | ||||||

(H) I | ||||||

(HI) D | ||||||

(HID) E | ||||||

(HIDE) M |

(0,1): (["H"] -> [""]):

See the second cell in the first row, how many steps are required to convert "H" to an empty string (Zero), we can perform the delete operation on "H"

"" | H | (H) I | (HI) M | (HIM) I | (HIMI) S | |
---|---|---|---|---|---|---|

"" | 0 |
1 |
||||

H | ||||||

(H) I | ||||||

(HI) D | ||||||

(HID) E | ||||||

(HIDE) M |

Same way we populate the remaining entries in the first row.

**DELETE Operation**

We notice that in this scenario of deletion, we should go back to the previous column value.

i.e. for H and "", we should delete the word H and move one step back for the first word, which is "".

Hence the edit distance becomes:

0 + 1 = 1

We can say for **deletion** of the current character in the first word

`Distance[i, j] = 1 + Distance[i, j - 1];`

Also, refer the first column,

(1,0): ([""] -> ["H"]):

The First Column, how many steps are required to convert "" to "H", we can perform the insert operation.

"" | H | (H) I | (HI) M | (HIM) I | (HIMI) S | |
---|---|---|---|---|---|---|

"" | 0 |
1 | 2 | 3 | 4 | 5 |

H | 1 |
|||||

(H) I | ||||||

(HI) D | ||||||

(HID) E | ||||||

(HIDE) M |

(2,0): ([""] -> ["(H) I"]):

See the third cell in the first column, how many steps are required to convert "" to "HI", we can perform insert operation.

"" | H | (H) I | (HI) M | (HIM) I | (HIMI) S | |
---|---|---|---|---|---|---|

"" | 0 | 1 | 2 | 3 | 4 | 5 |

H | 1 |
|||||

(H) I | 2 |
|||||

(HI) D | ||||||

(HID) E | ||||||

(HIDE) M |

Same way we populate the remaining entries in the first column.

**INSERT Operation**

We notice that in this scenario of insertion, we should go back to the previous row value.

i.e. for "" and HI, we should insert the word I and move one step back for the second word, which is H.

Hence the edit distance becomes:

1 + 1 = 1

We can say for **insertion** of the current character in the first word

`Distance[i, j] = 1 + Distance[i - 1, j];`

"" | H | (H) I | (HI) M | (HIM) I | (HIMI) S | |
---|---|---|---|---|---|---|

"" | 0 | 1 | 2 | 3 | 4 | 5 |

H | 1 | |||||

(H) I | 2 | |||||

(HI) D | 3 | |||||

(HID) E | 4 | |||||

(HIDE) M | 5 |

Let's populate the data for the second row, cell by cell

(1,1): (["H"] -> ["H"]):

We notice that H and H are same, in this scenario of equality, we should go back to the previous results.

i.e. for H and H, we should move one step back in each word, which is "" and "". Hence the edit distance remains 0.

We can say for **equal** values

`Distance[i, j] = Distance[i - 1, j - 1];`

"" | H | (H) I | (HI) M | (HIM) I | (HIMI) S | |
---|---|---|---|---|---|---|

"" | 0 |
1 | 2 | 3 | 4 | 5 |

H | 1 | 0 |
||||

(H) I | 2 | |||||

(HI) D | 3 | |||||

(HID) E | 4 | |||||

(HIDE) M | 5 |

(1,2): (["(H) I"] -> ["H"]):

We continue to perform the deletion operation for the entire row.

`Distance[i, j] = 1 + Distance[i, j - 1];`

"" | H | (H) I | (HI) M | (HIM) I | (HIMI) S | |
---|---|---|---|---|---|---|

"" | 0 | 1 | 2 | 3 | 4 | 5 |

H | 1 | 0 | 1 | 2 | 3 | 4 |

(H) I | 2 | |||||

(HI) D | 3 | |||||

(HID) E | 4 | |||||

(HIDE) M | 5 |

(2,1): (["H"] -> ["(H) I"]):

We perform the insertion operation for this cell.

`Distance[i, j] = 1 + Distance[i - 1, j];`

"" | H | (H) I | (HI) M | (HIM) I | (HIMI) S | |
---|---|---|---|---|---|---|

"" | 0 | 1 | 2 | 3 | 4 | 5 |

H | 1 | 0 |
1 | 2 | 3 | 4 |

(H) I | 2 | 1 |
||||

(HI) D | 3 | |||||

(HID) E | 4 | |||||

(HIDE) M | 5 |

(2,2): (["(H) I"] -> ["(H) I"]):

We perform the equal operation for this cell.

`Distance[i, j] = Distance[i - 1, j - 1];`

"" | H | (H) I | (HI) M | (HIM) I | (HIMI) S | |
---|---|---|---|---|---|---|

"" | 0 | 1 | 2 | 3 | 4 | 5 |

H | 1 | 0 |
1 | 2 | 3 | 4 |

(H) I | 2 | 1 | 0 |
|||

(HI) D | 3 | |||||

(HID) E | 4 | |||||

(HIDE) M | 5 |

We populate the entire row by performing delete or equality operations

"" | H | (H) I | (HI) M | (HIM) I | (HIMI) S | |
---|---|---|---|---|---|---|

"" | 0 | 1 | 2 | 3 | 4 | 5 |

H | 1 | 0 | 1 | 2 | 3 | 4 |

(H) I | 2 | 1 | 0 | 1 | 2 | 3 |

(HI) D | 3 | |||||

(HID) E | 4 | |||||

(HIDE) M | 5 |

(3,1): (["H"] -> ["(HI) D"]):

We perform the insertion operation for this cells.

`Distance[i, j] = 1 + Distance[i - 1, j];`

"" | H | (H) I | (HI) M | (HIM) I | (HIMI) S | |
---|---|---|---|---|---|---|

"" | 0 | 1 | 2 | 3 | 4 | 5 |

H | 1 | 0 | 1 | 2 | 3 | 4 |

(H) I | 2 | 1 |
0 | 1 | 2 | 3 |

(HI) D | 3 | 2 |
||||

(HID) E | 4 | |||||

(HIDE) M | 5 |

(3,2): (["(H) I"] -> ["(HI) D"]):

We perform the insertion operation for this cells.

`Distance[i, j] = 1 + Distance[i - 1, j];`

"" | H | (H) I | (HI) M | (HIM) I | (HIMI) S | |
---|---|---|---|---|---|---|

"" | 0 | 1 | 2 | 3 | 4 | 5 |

H | 1 | 0 | 1 | 2 | 3 | 4 |

(H) I | 2 | 1 | 0 |
1 | 2 | 3 |

(HI) D | 3 | 2 | 1 |
|||

(HID) E | 4 | |||||

(HIDE) M | 5 |

(3,3): (["(HI) M"] -> ["(HI) D"]):

**REPLACE Operation**

We notice that (HI) M and (HI) D, this scenario of replacement, we should go back to the previous results.

i.e. for (HI) M and (HI) D, we should move one step back in each word, which is HI and HI. Hence the edit distance will be 1.

i.e. for HI and HI, we should replace the word M with D and move one step back for the second word, which is HI.

Hence the edit distance becomes:

1 + 0 = 1

We perform the **replace** operation for this cells.

`Distance[i, j] = 1 + Distance[i - 1, j - 1];`

"" | H | (H) I | (HI) M | (HIM) I | (HIMI) S | |
---|---|---|---|---|---|---|

"" | 0 | 1 | 2 | 3 | 4 | 5 |

H | 1 | 0 | 1 | 2 | 3 | 4 |

(H) I | 2 | 1 | 0 |
1 | 2 | 3 |

(HI) D | 3 | 2 | 1 | 1 |
||

(HID) E | 4 | |||||

(HIDE) M | 5 |

### General Formula

We can deduce below formula to be used for any subsequent operations.

**For equality scenarios:**

Distance = Distance for Previous element

`Distance[i, j] = Distance[i - 1, j - 1];`

**For non-equality scenarios:**

Distance = 1 + Minimum (Deletion, Insertion, Replacement)

`Distance[i, j] = 1 + Minimum (Distance[i, j - 1], Distance[i - 1, j], Distance[i - 1, j - 1]);`

After, populating the remaining values based on the discussed logic:

"" | H | (H) I | (HI) M | (HIM) I | (HIMI) S | |
---|---|---|---|---|---|---|

"" | 0 | 1 | 2 | 3 | 4 | 5 |

H | 1 | 0 | 1 | 2 | 3 | 4 |

(H) I | 2 | 1 | 0 | 1 | 2 | 3 |

(HI) D | 3 | 2 | 1 | 1 | 2 | 3 |

(HID) E | 4 | 3 | 2 | 2 | 2 | 3 |

(HIDE) M | 5 | 4 | 3 | 3 | 3 | 3 |

# Code

```
public class Solution {
public int MinDistance(string word1, string word2) {
if(string.IsNullOrEmpty(word1)) {
return string.IsNullOrEmpty(word2) ? 0 : word2.Length;
}
if(string.IsNullOrEmpty(word2)) {
return string.IsNullOrEmpty(word1) ? 0 : word1.Length;
}
var word1Length = word1.Length;
var word2Length = word2.Length;
var distance = new int[word2Length + 1, word1Length + 1];
// Initialize column values
for (var j = 0; j <= word1Length; j++) {
var i = 0;
distance[i, j] = j;
}
// Initialize row values
for (var i = 0; i <= word2Length; i++) {
var j = 0;
distance[i, j] = i;
}
for (var i = 0; i < word2Length; i++) {
var currentCh2 = word2[i];
for (var j = 0; j < word1Length; j++) {
var currentCh1 = word1[j];
if (currentCh1 == currentCh2) {
distance[i + 1, j + 1] = distance[i, j];
} else{
distance[i + 1, j + 1] = 1 + Math.Min(distance[i + 1, j],
Math.Min(distance[i, j + 1], distance[i, j]));
}
}
}
return distance[word2Length, word1Length];
}
}
```

# Conclusion

The mentioned problem is an example of how drawing out the cases might help even to reach the solution. In this case, when we draw out the matrix, we see the patterns visible.

Complexity: O(mn)