public final class Segmenter
extends java.lang.Object
Modifier and Type | Method and Description |
---|---|
SRX | getSRX() |
java.lang.String | glue(Language sourceLang, Language targetLang, java.util.List<java.lang.String> sentences, java.util.List<java.lang.StringBuilder> spaces, java.util.List<Rule> brules)<br>Glues segments back into a paragraph. |
java.util.List<java.lang.String> | segment(Language lang, java.lang.String paragraph, java.util.List<java.lang.StringBuilder> spaces, java.util.List<Rule> brules)<br>Segments the paragraph into sentences according to the currently set up rules. |
void | segmentEntries(boolean needResegment, Language sourceLang, java.lang.String sourceEntry, Language targetLang, java.lang.String targetEntry, java.util.List<java.lang.String> sourceSegments, java.util.List<java.lang.String> targetSegments)<br>Segments source and target entries from TMX when the segment counts are equal. |
public Segmenter(SRX srx)
public SRX getSRX()
public java.util.List<java.lang.String> segment(Language lang, java.lang.String paragraph, java.util.List<java.lang.StringBuilder> spaces, java.util.List<Rule> brules)
Segments the paragraph into sentences according to the currently set up rules.
Bugfix for bug 83: sentences are returned without spaces at the beginning and at the end of a sentence.
An additional list with space information is filled in so that the translation can be glued back together with the same spaces between sentences as in the original paragraph.
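The idea of returning trimmed sentences while recording the surrounding whitespace can be sketched as follows. This is a hypothetical, simplified illustration and not the library's implementation: the class name `SpaceRecordingSketch` is invented, breaks are detected after `.`, `!` or `?` followed by whitespace instead of by SRX rules, and the layout of the `spaces` list (two entries per sentence: whitespace before, whitespace after) is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: NOT the real Segmenter. It ignores Language and SRX
// rules and only illustrates trimmed sentences plus recorded whitespace.
class SpaceRecordingSketch {

    /**
     * Splits the paragraph into trimmed sentences. If spaces is non-null,
     * two entries are appended per sentence: the whitespace before it and
     * the whitespace after it (assumed layout).
     */
    static List<String> segment(String paragraph, List<StringBuilder> spaces) {
        List<String> sentences = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < paragraph.length(); i++) {
            char c = paragraph.charAt(i);
            boolean atEnd = i == paragraph.length() - 1;
            // Break after sentence-ending punctuation followed by whitespace.
            boolean isBreak = (c == '.' || c == '!' || c == '?')
                    && !atEnd && Character.isWhitespace(paragraph.charAt(i + 1));
            if (isBreak || atEnd) {
                addTrimmed(paragraph.substring(start, i + 1), sentences, spaces);
                start = i + 1;
            }
        }
        return sentences;
    }

    private static void addTrimmed(String chunk, List<String> sentences,
                                   List<StringBuilder> spaces) {
        int b = 0;
        while (b < chunk.length() && Character.isWhitespace(chunk.charAt(b))) {
            b++;
        }
        int e = chunk.length();
        while (e > b && Character.isWhitespace(chunk.charAt(e - 1))) {
            e--;
        }
        if (b == e) {
            // Whitespace-only chunk: attach it to the previous sentence's
            // trailing whitespace, if one was recorded.
            if (spaces != null && !spaces.isEmpty()) {
                spaces.get(spaces.size() - 1).append(chunk);
            }
            return;
        }
        sentences.add(chunk.substring(b, e));
        if (spaces != null) {
            spaces.add(new StringBuilder(chunk.substring(0, b))); // before
            spaces.add(new StringBuilder(chunk.substring(e)));    // after
        }
    }
}
```

Passing `null` for `spaces` simply skips the bookkeeping, mirroring the "can be null" contract of the documented parameter.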
paragraph - the paragraph text
spaces - list to store information about spaces between sentences (can be null)
brules - list to store rules that account for breaks (can be null)
public java.lang.String glue(Language sourceLang, Language targetLang, java.util.List<java.lang.String> sentences, java.util.List<java.lang.StringBuilder> spaces, java.util.List<Rule> brules)
Glues segments back into a paragraph.
As segments are returned by segment(Language, String, List, List) without spaces before and after them, this method adds spaces if needed.
A special exception is Break SRX rules that break on space, i.e. rules whose before and after patterns consist of spaces (they get trimmed to an empty string). For such rules all the spaces are added.
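A minimal sketch of the gluing step, under stated assumptions: the class `GlueSketch` is hypothetical, the `spaces` list is assumed to hold two entries per sentence (whitespace before, whitespace after), and the language-specific handling and Break-rule exception described here are deliberately left out. It only shows how recorded whitespace can be re-inserted around translated sentences.

```java
import java.util.List;

// Hypothetical sketch: NOT the real Segmenter.glue. No Language or Rule
// handling; it only re-inserts the recorded whitespace.
class GlueSketch {

    /**
     * Rebuilds a paragraph from translated sentences, surrounding sentence i
     * with spaces.get(2 * i) and spaces.get(2 * i + 1) (assumed layout).
     */
    static String glue(List<String> sentences, List<StringBuilder> spaces) {
        if (spaces.size() != 2 * sentences.size()) {
            throw new IllegalArgumentException(
                    "expected two space entries per sentence");
        }
        StringBuilder paragraph = new StringBuilder();
        for (int i = 0; i < sentences.size(); i++) {
            paragraph.append(spaces.get(2 * i))       // whitespace before
                     .append(sentences.get(i))        // translated sentence
                     .append(spaces.get(2 * i + 1));  // whitespace after
        }
        return paragraph.toString();
    }
}
```

Because the whitespace comes from the original paragraph, the translated paragraph keeps the original spacing even when the sentences themselves change length.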
sentences - list of translated sentences
spaces - information about spaces in the original paragraph
brules - rules that account for breaks
public void segmentEntries(boolean needResegment, Language sourceLang, java.lang.String sourceEntry, Language targetLang, java.lang.String targetEntry, java.util.List<java.lang.String> sourceSegments, java.util.List<java.lang.String> targetSegments)
Segments source and target entries from TMX when the segment counts are equal.
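The "equal counts" condition for segmentEntries can be illustrated with a small sketch. Everything here is hypothetical: the class `SegmentEntriesSketch`, the `splitter` parameter standing in for the real segmentation, and in particular the fallback of adding the unsegmented entries when counts differ, which is an assumption and is not stated in this documentation.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch: NOT the real Segmenter.segmentEntries.
class SegmentEntriesSketch {

    static void segmentEntries(boolean needResegment,
                               String sourceEntry, String targetEntry,
                               Function<String, List<String>> splitter,
                               List<String> sourceSegments,
                               List<String> targetSegments) {
        if (needResegment && targetEntry != null) {
            List<String> src = splitter.apply(sourceEntry);
            List<String> tgt = splitter.apply(targetEntry);
            // Pair the segments only when both sides split into the same count.
            if (src.size() == tgt.size()) {
                sourceSegments.addAll(src);
                targetSegments.addAll(tgt);
                return;
            }
        }
        // Assumed fallback: keep the entries whole so the pairing stays aligned.
        sourceSegments.add(sourceEntry);
        targetSegments.add(targetEntry);
    }

    // Stand-in splitter: breaks on whitespace that follows '.', '!' or '?'.
    static List<String> splitOnSentenceEnd(String text) {
        return Arrays.asList(text.split("(?<=[.!?])\\s+"));
    }
}
```

Keeping both output lists the same length is the point of the check: each source segment can then be matched with the target segment at the same index.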