LevenshteinDistance (OmegaT 5.7.0 API)

java.lang.Object
- org.omegat.core.matching.LevenshteinDistance

All Implemented Interfaces:

ISimilarityCalculator
```
public class LevenshteinDistance
extends java.lang.Object
implements ISimilarityCalculator
```
Class to compute Levenshtein Distance.
Levenshtein distance (LD) is a measure of the similarity between two strings, which we will refer to as the source string (s) and the target string (t). The distance is the number of deletions, insertions, or substitutions required to transform s into t.
For example,
- If s is "test" and t is "test", then LD(s,t) = 0, because no transformations are needed. The strings are already identical.
- If s is "test" and t is "tent", then LD(s,t) = 1, because one substitution (change "s" to "n") is sufficient to transform s into t.
The greater the Levenshtein distance, the more different the strings are.
Levenshtein distance is named after the Russian scientist Vladimir Levenshtein, who devised the algorithm in 1965. If you can't spell or pronounce Levenshtein, the metric is also sometimes called edit distance. alex73's comment: We can't make 'compute' mathod static, because in this case LevenshteinDistance will not be thread-safe(see 'd' and 'p' arrays). We can't create these arrays inside 'compute' method, because it's enough slow operation. We have to create LevenshteinDistance instance one for each thread where we will call it. It's best way for best performance.
See Also:

Levenshtein Distance, in Three Flavors

- Constructor Summary
  
  Constructors
  Constructor and Description
  
  LevenshteinDistance()
- Method Summary
  
  All Methods Instance Methods Concrete Methods
  Modifier and Type Method and Description
  
  int compute(Token[] s, Token[] t)
  Compute Levenshtein distance between two lists.
  - Methods inherited from class java.lang.Object
    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Constructors
Constructor and Description
`LevenshteinDistance()`

All Methods Instance Methods Concrete Methods
Modifier and Type	Method and Description
`int`	`compute(Token[] s, Token[] t)` Compute Levenshtein distance between two lists.

- Constructor Detail
  - LevenshteinDistance
```
public LevenshteinDistance()
```
- Method Detail
  - compute
```
public int compute(Token[] s,
                   Token[] t)
```
    Compute Levenshtein distance between two lists.
    The difference between this impl. and the canonical one is that, rather than creating and retaining a matrix of size s.length()+1 by t.length()+1, we maintain two single-dimensional arrays of length s.length()+1.
    The first, d, is the 'current working' distance array that maintains the newest distance cost counts as we iterate through the characters of String s. Each time we increment the index of String t we are comparing, d is copied to p, the second int[]. Doing so allows us to retain the previous cost counts as required by the algorithm (taking the minimum of the cost count to the left, up one, and diagonally up and to the left of the current cost count being calculated).
    (Note that the arrays aren't really copied anymore, just switched... this is clearly much better than cloning an array or doing a System.arraycopy() each time through the outer loop.)
    Effectively, the difference between the two implementations is this one does not cause an out of memory condition when calculating the LD over two very large strings.
    For perfomance reasons the maximal number of compared items is MAX_N.
    
    Specified by:
    
    compute in interface ISimilarityCalculator
    
    Parameters:
    
    s - source segment
    
    t - target segment
    
    Returns:
    
    similarity

Class LevenshteinDistance

Constructor Summary

Method Summary

Methods inherited from class java.lang.Object

Constructor Detail

LevenshteinDistance

Method Detail

compute