Package org.omegat.core.statistics
Class FindMatches
java.lang.Object
    org.omegat.core.statistics.FindMatches
public class FindMatches extends java.lang.Object
Finds matches according to the specified criteria. Because stemmers can be used to prepare tokens, similarity is calculated with a three-pass comparison:
1. Split the original segment into word-only tokens using the stemmer (with a stop-word list), then compare tokens.
2. Split the original segment into word-only tokens without the stemmer, then compare tokens.
3. Split the original segment into tokens that are not limited to words (including numbers and tags) without the stemmer, then compare tokens.
This class is not thread-safe. It must be used from a single thread only.
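The three passes described above can be illustrated with a minimal, self-contained sketch. It does not use OmegaT's tokenizers or its similarity calculation: the class `ThreePassSketch`, the stop-word list, the strip-trailing-"s" stemmer, the token pattern, and the overlap-based `similarity` score are all hypothetical stand-ins chosen only to show how the three token views of one segment differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Toy illustration of three tokenization passes; not OmegaT's implementation. */
class ThreePassSketch {
    // Hypothetical stop-word list, for illustration only.
    private static final Set<String> STOP_WORDS = Set.of("the", "a", "an", "of");
    // A token is a tag such as <b>, a run of letters, or a run of digits (assumed format).
    private static final Pattern TOKEN = Pattern.compile("<[^>]+>|\\p{L}+|\\p{N}+");

    /** Pass 3 view: every token, including numbers and tags. */
    static List<String> tokenizeAll(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    /** Pass 2 view: word-only tokens, no stemming. */
    static List<String> tokenizeNoStem(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : tokenizeAll(text)) {
            if (Character.isLetter(piece.codePointAt(0))) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    /** Pass 1 view: word-only tokens, stop words dropped, naive stemming. */
    static List<String> tokenizeStem(String text) {
        List<String> tokens = new ArrayList<>();
        for (String word : tokenizeNoStem(text)) {
            String lower = word.toLowerCase(Locale.ROOT);
            if (!STOP_WORDS.contains(lower)) {
                tokens.add(stem(lower));
            }
        }
        return tokens;
    }

    // Deliberately naive stemmer: strips one trailing "s" from longer words.
    private static String stem(String word) {
        return word.length() > 3 && word.endsWith("s")
                ? word.substring(0, word.length() - 1)
                : word;
    }

    /** Stand-in score: shared tokens as a percentage of the longer token list. */
    static int similarity(List<String> a, List<String> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 100;
        }
        List<String> remaining = new ArrayList<>(b);
        int shared = 0;
        for (String token : a) {
            if (remaining.remove(token)) {
                shared++;
            }
        }
        return 100 * shared / Math.max(a.size(), b.size());
    }

    /** Returns the scores for the stemmed, unstemmed, and all-token passes, in that order. */
    static int[] scores(String source, String candidate) {
        return new int[] {
            similarity(tokenizeStem(source), tokenizeStem(candidate)),
            similarity(tokenizeNoStem(source), tokenizeNoStem(candidate)),
            similarity(tokenizeAll(source), tokenizeAll(candidate))
        };
    }

    public static void main(String[] args) {
        int[] s = scores("Open the files", "Open 3 file");
        System.out.println(s[0] + " " + s[1] + " " + s[2]);
    }
}
```

In this toy setup the stemmed pass treats "files" and "file" as the same token and ignores "the", so it scores the pair higher than the two unstemmed passes do. That difference is the reason for comparing the same segments under more than one tokenization.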
-
Nested Class Summary
Modifier and Type  Class                         Description
static class       FindMatches.StoppedException  Thrown if the process has been stopped. All callers must catch it and skip the interrupted operation.
-
Constructor Summary
Constructor  Description
FindMatches(IProject project, int maxCount, boolean allowSeparateSegmentMatch, boolean searchExactlyTheSame)
-
Method Summary
Modifier and Type           Method  Description
java.util.List<NearString>  search(java.lang.String searchText, boolean requiresTranslation, boolean fillSimilarityData, IStopped stop)
Token[]                     tokenizeAll(java.lang.String str)
Token[]                     tokenizeNoStem(java.lang.String str)
Token[]                     tokenizeStem(java.lang.String str)
-
Constructor Detail
-
FindMatches
public FindMatches(IProject project, int maxCount, boolean allowSeparateSegmentMatch, boolean searchExactlyTheSame)
Parameters:
searchExactlyTheSame - allows searching for similarities with the same text as the source segment. This mode is used only for separate sentence matching in a paragraph project, i.e. where the source is just part of the current source.
-
Method Detail
-
search
public java.util.List<NearString> search(java.lang.String searchText, boolean requiresTranslation, boolean fillSimilarityData, IStopped stop) throws FindMatches.StoppedException
Throws:
FindMatches.StoppedException - if the process was stopped
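The catch-and-skip contract for a stopped search can be sketched without any OmegaT classes. Everything below is a hypothetical stand-in: `StopSignal` plays the role of `IStopped` (its real methods are not shown on this page), `SearchStoppedException` plays the role of `FindMatches.StoppedException`, and the substring test is only a placeholder for real matching.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a stop signal such as IStopped. */
interface StopSignal {
    boolean isStopped();
}

/** Hypothetical checked exception playing the role of StoppedException. */
class SearchStoppedException extends Exception {
}

class StoppableSearchSketch {
    /** Scans the candidates, checking the stop signal before each one. */
    static List<String> search(String text, List<String> candidates, StopSignal stop)
            throws SearchStoppedException {
        List<String> hits = new ArrayList<>();
        for (String candidate : candidates) {
            if (stop.isStopped()) {
                // Abandon the search instead of returning partial results.
                throw new SearchStoppedException();
            }
            if (candidate.contains(text)) { // placeholder for real matching
                hits.add(candidate);
            }
        }
        return hits;
    }

    /** Caller side: catch the exception and skip, returning no matches. */
    static List<String> searchOrSkip(String text, List<String> candidates, StopSignal stop) {
        try {
            return search(text, candidates, stop);
        } catch (SearchStoppedException e) {
            return List.of(); // stopped: nothing to display
        }
    }
}
```

Using a checked exception here means the compiler forces every caller to decide what happens on a stop, which fits the requirement that all callers catch the exception and skip.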
-
tokenizeStem
public Token[] tokenizeStem(java.lang.String str)
-
tokenizeNoStem
public Token[] tokenizeNoStem(java.lang.String str)
-
tokenizeAll
public Token[] tokenizeAll(java.lang.String str)
-