public class LuceneSmartChineseTokenizer extends BaseTokenizer
Nested classes/interfaces inherited from interface ITokenizer: ITokenizer.StemmingMode

Fields inherited from class BaseTokenizer: DEFAULT_TOKENS_COUNT, EMPTY_STRING_LIST, EMPTY_TOKENS_LIST, shouldDelegateTokenizeExactly, TOKENIZER_DEBUG_PROVIDER
| Constructor and Description |
| --- |
| LuceneSmartChineseTokenizer() |
| Modifier and Type | Method and Description |
| --- | --- |
| protected org.apache.lucene.analysis.TokenStream | getTokenStream(java.lang.String strOrig, boolean stemsAllowed, boolean stopWordsAllowed) |
| Token[] | tokenizeVerbatim(java.lang.String strOrig): Breaks a string into tokens. |
| java.lang.String[] | tokenizeVerbatimToStrings(java.lang.String strOrig): Breaks a string into strings. |
Methods inherited from class BaseTokenizer: getEffectiveLanguage, getProjectLanguage, getStandardTokenStream, getSupportedLanguages, printTest, test, tokenize, tokenizeByCodePoint, tokenizeByCodePointToStrings, tokenizeToStrings, tokenizeWords, tokenizeWordsToStrings
public Token[] tokenizeVerbatim(java.lang.String strOrig)

Description copied from class: BaseTokenizer
Breaks a string into tokens. This method is used to mark string differences in the UI and to tune similarity. Results are not cached.

Specified by: tokenizeVerbatim in interface ITokenizer
Overrides: tokenizeVerbatim in class BaseTokenizer
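
A minimal usage sketch. It assumes the class lives in the org.omegat.tokenizer package, that Token is org.omegat.util.Token, and that Token exposes getOffset() and getLength() accessors; none of those details are shown on this page.

```java
import org.omegat.tokenizer.LuceneSmartChineseTokenizer;
import org.omegat.util.Token;

public class VerbatimTokenDemo {
    public static void main(String[] args) {
        // Any Chinese sample string works here.
        String source = "我喜欢学习中文";

        LuceneSmartChineseTokenizer tokenizer = new LuceneSmartChineseTokenizer();

        // tokenizeVerbatim keeps every token, which is what the UI uses to
        // highlight string differences and tune similarity.
        Token[] tokens = tokenizer.tokenizeVerbatim(source);

        for (Token token : tokens) {
            // getOffset()/getLength() are assumed accessors that map each
            // token back onto the original string.
            System.out.println(source.substring(token.getOffset(),
                    token.getOffset() + token.getLength()));
        }
    }
}
```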
public java.lang.String[] tokenizeVerbatimToStrings(java.lang.String strOrig)

Description copied from interface: ITokenizer
Breaks a string into strings. This method is used to mark string differences in the UI and for debugging purposes. Results are not cached.

Specified by: tokenizeVerbatimToStrings in interface ITokenizer
Overrides: tokenizeVerbatimToStrings in class BaseTokenizer
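
For debugging, the string-returning variant avoids dealing with token offsets. A short sketch under the same package assumption as above:

```java
import java.util.Arrays;

import org.omegat.tokenizer.LuceneSmartChineseTokenizer;

public class VerbatimStringsDemo {
    public static void main(String[] args) {
        LuceneSmartChineseTokenizer tokenizer = new LuceneSmartChineseTokenizer();

        // Same segmentation as tokenizeVerbatim, but returned as plain strings,
        // which is convenient for logging and debugging.
        String[] pieces = tokenizer.tokenizeVerbatimToStrings("我喜欢学习中文");

        System.out.println(Arrays.toString(pieces));
    }
}
```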
protected org.apache.lucene.analysis.TokenStream getTokenStream(java.lang.String strOrig, boolean stemsAllowed, boolean stopWordsAllowed) throws java.io.IOException

Specified by: getTokenStream in class BaseTokenizer
Throws: java.io.IOException
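
The protected getTokenStream hook is what BaseTokenizer uses to drive the public tokenization methods. The class name suggests it is backed by Lucene's SmartChineseAnalyzer; the sketch below only illustrates how such a Lucene token stream is consumed. It is an assumption about the backing analyzer, not the actual implementation, and it assumes a Lucene version whose SmartChineseAnalyzer has a no-argument constructor.

```java
import java.io.IOException;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class SmartChineseStreamDemo {
    public static void main(String[] args) throws IOException {
        // Illustrative only: how stemsAllowed/stopWordsAllowed influence the
        // real stream is not documented on this page.
        try (SmartChineseAnalyzer analyzer = new SmartChineseAnalyzer();
             TokenStream stream = analyzer.tokenStream("", "我喜欢学习中文")) {
            CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);
            stream.reset();
            while (stream.incrementToken()) {
                // Print each segmented term produced by the analyzer.
                System.out.println(term.toString());
            }
            stream.end();
        }
    }
}
```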