LuceneSmartChineseTokenizer (OmegaT 6.0.0 API)

java.lang.Object
- org.omegat.tokenizer.BaseTokenizer
- - org.omegat.tokenizer.LuceneSmartChineseTokenizer

All Implemented Interfaces: ITokenizer

public class LuceneSmartChineseTokenizer
extends BaseTokenizer

Nested Class Summary
- Nested classes/interfaces inherited from interface org.omegat.tokenizer.ITokenizer
  ITokenizer.StemmingMode

Field Summary
- Fields inherited from class org.omegat.tokenizer.BaseTokenizer
  TOKENIZER_DEBUG_PROVIDER

Constructor Summary

Constructor	Description
`LuceneSmartChineseTokenizer()`	

Method Summary

Modifier and Type	Method	Description
`Token[]`	`tokenizeVerbatim(java.lang.String strOrig)`	Breaks a string into tokens.
`java.lang.String[]`	`tokenizeVerbatimToStrings(java.lang.String strOrig)`	Breaks a string into strings.

Methods inherited from class org.omegat.tokenizer.BaseTokenizer
getSupportedLanguages, tokenizeWords, tokenizeWordsToStrings

Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Constructor Detail
  - LuceneSmartChineseTokenizer
```
public LuceneSmartChineseTokenizer()
```
- Method Detail
  - tokenizeVerbatim
```
public Token[] tokenizeVerbatim(java.lang.String strOrig)
```
    Description copied from class: BaseTokenizer
    
    Breaks a string into tokens. Numbers, tags, and other non-word tokens are included in the result. Stemming is NOT used.
    This method is used to mark string differences in the UI and to tune similarity.
    Results are not cached.
    
    Specified by: tokenizeVerbatim in interface ITokenizer
    Overrides: tokenizeVerbatim in class BaseTokenizer
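
The contract described for tokenizeVerbatim (numbers, tags, and other non-word pieces all become tokens, and nothing is stemmed) can be illustrated with a stand-alone sketch. It does not call OmegaT or Lucene: `VerbatimSketch`, `SimpleToken`, and the regular expression are hypothetical stand-ins, and splitting Han text into single characters is only a placeholder for real word segmentation. Keeping whitespace as tokens is likewise an assumption of the sketch, not something this page states.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical stand-in; not OmegaT's or Lucene's implementation. */
public class VerbatimSketch {

    /** Minimal token: the text plus its offset in the original string. */
    static final class SimpleToken {
        final String text;
        final int offset;

        SimpleToken(String text, int offset) {
            this.text = text;
            this.offset = offset;
        }
    }

    // Alternatives are tried left to right: tag, single Han character,
    // Latin word, digit run, whitespace run, then any other single character.
    private static final Pattern TOKEN = Pattern.compile(
            "<[^<>]+>|\\p{IsHan}|\\p{IsLatin}+|\\d+|\\s+|.", Pattern.DOTALL);

    /** Splits the input into tokens; every character ends up in some token. */
    static List<SimpleToken> tokenizeVerbatim(String input) {
        List<SimpleToken> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            tokens.add(new SimpleToken(m.group(), m.start()));
        }
        return tokens;
    }

    /** Same split, returned as plain strings without offsets. */
    static String[] tokenizeVerbatimToStrings(String input) {
        List<SimpleToken> tokens = tokenizeVerbatim(input);
        String[] result = new String[tokens.size()];
        for (int i = 0; i < result.length; i++) {
            result[i] = tokens.get(i).text;
        }
        return result;
    }

    public static void main(String[] args) {
        // "\u4e2d\u6587" is the two-character word for "Chinese".
        for (SimpleToken t : tokenizeVerbatim("OmegaT 6.0 <b>\u4e2d\u6587</b>")) {
            System.out.println(t.offset + ": [" + t.text + "]");
        }
    }
}
```

Because the final alternative in the pattern matches any single character, concatenating the tokens reproduces the input exactly, which is the property a difference-marking caller would rely on in this sketch.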
  - tokenizeVerbatimToStrings
```
public java.lang.String[] tokenizeVerbatimToStrings(java.lang.String strOrig)
```
    Description copied from interface: ITokenizer
    
    Breaks a string into strings. Numbers, tags, and other non-word tokens are included in the result. Stemming is NOT used.
    This method is used to mark string differences in the UI and for debugging purposes.
    Results are not cached.
    
    Specified by: tokenizeVerbatimToStrings in interface ITokenizer
    Overrides: tokenizeVerbatimToStrings in class BaseTokenizer
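
Since the strings returned by tokenizeVerbatimToStrings are used to mark differences, one way to picture that use is a token-level comparison of two arrays. The sketch below is a generic longest-common-subsequence diff, not OmegaT's own comparison code. `TokenDiffSketch`, `diff`, and the sample token arrays are hypothetical, and the arrays are hard-coded rather than produced by a real tokenizer.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of marking differences between two token arrays. */
public class TokenDiffSketch {

    /**
     * Returns one entry per token: "  tok" if unchanged,
     * "- tok" if only in oldTokens, "+ tok" if only in newTokens.
     */
    static List<String> diff(String[] oldTokens, String[] newTokens) {
        int n = oldTokens.length;
        int m = newTokens.length;

        // lcs[i][j] = length of the longest common subsequence of
        // oldTokens[i..] and newTokens[j..], filled from the end backwards.
        int[][] lcs = new int[n + 1][m + 1];
        for (int i = n - 1; i >= 0; i--) {
            for (int j = m - 1; j >= 0; j--) {
                lcs[i][j] = oldTokens[i].equals(newTokens[j])
                        ? lcs[i + 1][j + 1] + 1
                        : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
            }
        }

        // Walk both arrays, following the table to decide what to emit.
        List<String> out = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < n && j < m) {
            if (oldTokens[i].equals(newTokens[j])) {
                out.add("  " + oldTokens[i]);
                i++;
                j++;
            } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
                out.add("- " + oldTokens[i++]);
            } else {
                out.add("+ " + newTokens[j++]);
            }
        }
        while (i < n) {
            out.add("- " + oldTokens[i++]);
        }
        while (j < m) {
            out.add("+ " + newTokens[j++]);
        }
        return out;
    }

    public static void main(String[] args) {
        // Hard-coded sample tokens: "open", "file", and a number that changes.
        String[] before = {"\u6253\u5f00", "\u6587\u4ef6", "1"};
        String[] after = {"\u6253\u5f00", "\u6587\u4ef6", "2"};
        for (String line : diff(before, after)) {
            System.out.println(line);
        }
    }
}
```

Comparing whole tokens rather than characters keeps a changed number or tag visible as a single unit, which is why verbatim (unstemmed, unfiltered) tokens suit this kind of marking.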