public class LuceneSmartChineseTokenizer extends BaseTokenizer
Nested classes/interfaces inherited from interface ITokenizer: ITokenizer.StemmingMode

Fields inherited from class BaseTokenizer: DEFAULT_TOKENS_COUNT, EMPTY_STRING_LIST, EMPTY_TOKENS_LIST, shouldDelegateTokenizeExactly, TOKENIZER_DEBUG_PROVIDER
| Constructor and Description |
| --- |
| LuceneSmartChineseTokenizer() |
| Modifier and Type | Method and Description |
| --- | --- |
| protected org.apache.lucene.analysis.TokenStream | getTokenStream(java.lang.String strOrig, boolean stemsAllowed, boolean stopWordsAllowed) |
| Token[] | tokenizeVerbatim(java.lang.String strOrig): Breaks a string into tokens. |
| java.lang.String[] | tokenizeVerbatimToStrings(java.lang.String strOrig): Breaks a string into strings. |
Methods inherited from class BaseTokenizer: getEffectiveLanguage, getProjectLanguage, getStandardTokenStream, getSupportedLanguages, printTest, test, tokenize, tokenizeByCodePoint, tokenizeByCodePointToStrings, tokenizeToStrings, tokenizeWords, tokenizeWordsToStrings
public Token[] tokenizeVerbatim(java.lang.String strOrig)

Description copied from class: BaseTokenizer
Breaks a string into tokens. This method is used to mark string differences in the UI and to tune similarity. Results are not cached.

Specified by: tokenizeVerbatim in interface ITokenizer
Overrides: tokenizeVerbatim in class BaseTokenizer
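
A minimal usage sketch. It assumes the class lives in the org.omegat.tokenizer package, that Token is org.omegat.util.Token, and that Token exposes getOffset() and getLength() accessors; none of those details are shown on this page.

```java
import org.omegat.tokenizer.LuceneSmartChineseTokenizer;
import org.omegat.util.Token;

public class VerbatimTokenDemo {
    public static void main(String[] args) {
        // Any Chinese sample string works here.
        String source = "我喜欢学习中文";

        LuceneSmartChineseTokenizer tokenizer = new LuceneSmartChineseTokenizer();

        // tokenizeVerbatim keeps every token, which is what the UI uses to
        // highlight string differences and tune similarity.
        Token[] tokens = tokenizer.tokenizeVerbatim(source);

        for (Token token : tokens) {
            // getOffset()/getLength() are assumed accessors that map each
            // token back onto the original string.
            System.out.println(source.substring(token.getOffset(),
                    token.getOffset() + token.getLength()));
        }
    }
}
```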
public java.lang.String[] tokenizeVerbatimToStrings(java.lang.String strOrig)

Description copied from interface: ITokenizer
Breaks a string into strings. This method is used to mark string differences in the UI and for debugging purposes. Results are not cached.

Specified by: tokenizeVerbatimToStrings in interface ITokenizer
Overrides: tokenizeVerbatimToStrings in class BaseTokenizer
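
For debugging, the string-returning variant avoids dealing with token offsets. A short sketch under the same package assumption as above:

```java
import java.util.Arrays;

import org.omegat.tokenizer.LuceneSmartChineseTokenizer;

public class VerbatimStringsDemo {
    public static void main(String[] args) {
        LuceneSmartChineseTokenizer tokenizer = new LuceneSmartChineseTokenizer();

        // Same segmentation as tokenizeVerbatim, but returned as plain strings,
        // which is convenient for logging and debugging.
        String[] pieces = tokenizer.tokenizeVerbatimToStrings("我喜欢学习中文");

        System.out.println(Arrays.toString(pieces));
    }
}
```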
protected org.apache.lucene.analysis.TokenStream getTokenStream(java.lang.String strOrig, boolean stemsAllowed, boolean stopWordsAllowed) throws java.io.IOException

Specified by: getTokenStream in class BaseTokenizer
Throws: java.io.IOException
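
The protected getTokenStream hook is what BaseTokenizer uses to drive the public tokenization methods. The class name suggests it is backed by Lucene's SmartChineseAnalyzer; the sketch below only illustrates how such a Lucene token stream is consumed. It is an assumption about the backing analyzer, not the actual implementation, and it assumes a Lucene version whose SmartChineseAnalyzer has a no-argument constructor.

```java
import java.io.IOException;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class SmartChineseStreamDemo {
    public static void main(String[] args) throws IOException {
        // Illustrative only: how stemsAllowed/stopWordsAllowed influence the
        // real stream is not documented on this page.
        try (SmartChineseAnalyzer analyzer = new SmartChineseAnalyzer();
             TokenStream stream = analyzer.tokenStream("", "我喜欢学习中文")) {
            CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);
            stream.reset();
            while (stream.incrementToken()) {
                // Print each segmented term produced by the analyzer.
                System.out.println(term.toString());
            }
            stream.end();
        }
    }
}
```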