Package org.omegat.core.segmentation
Class Segmenter
- java.lang.Object
-
- org.omegat.core.segmentation.Segmenter
-
public final class Segmenter extends java.lang.Object
The class that segments paragraphs into sentences and glues translated sentences back together to form a paragraph.
-
-
Method Summary
Modifier and Type | Method | Description
SRX | getSRX() |
java.lang.String | glue(Language sourceLang, Language targetLang, java.util.List<java.lang.String> sentences, java.util.List<java.lang.StringBuilder> spaces, java.util.List<Rule> brules) | Glues segments back into a paragraph.
java.util.List<java.lang.String> | segment(Language lang, java.lang.String paragraph, java.util.List<java.lang.StringBuilder> spaces, java.util.List<Rule> brules) | Segments the paragraph into sentences according to the currently configured rules.
void | segmentEntries(boolean needResegment, Language sourceLang, java.lang.String sourceEntry, Language targetLang, java.lang.String targetEntry, java.util.List<java.lang.String> sourceSegments, java.util.List<java.lang.String> targetSegments) | Segments source and target entries from TMX when the segment counts are equal.
-
-
-
Constructor Detail
-
Segmenter
public Segmenter(SRX srx)
-
-
Method Detail
-
getSRX
public SRX getSRX()
-
segment
public java.util.List<java.lang.String> segment(Language lang, java.lang.String paragraph, java.util.List<java.lang.StringBuilder> spaces, java.util.List<Rule> brules)
Segments the paragraph into sentences according to the currently configured rules.
Bugfix for bug 83: sentences are returned without spaces at the beginning and at the end of a sentence.
An additional list with space information is returned so that the translation can be glued together with the same spaces between sentences as in the original paragraph.
- Parameters:
paragraph - the paragraph text
spaces - list to store information about spaces between sentences (can be null)
brules - list to store rules that account for breaks (can be null)
- Returns:
- list of sentences (String objects)
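The contract described above (trimmed sentences plus a side list of the removed whitespace) can be illustrated with a minimal, self-contained sketch. It does not use OmegaT's SRX rules: the class name `SegmentSketch`, the naive break rule (break after `.`, `!` or `?` when whitespace follows), and the layout of the `spaces` list (two entries per sentence, leading then trailing whitespace) are all assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for illustration; not the OmegaT implementation.
final class SegmentSketch {

    /**
     * Splits a paragraph into trimmed sentences. When spaces is non-null,
     * two entries are appended per sentence: the whitespace stripped from
     * its start and the whitespace stripped from its end (assumed layout).
     */
    static List<String> segment(String paragraph, List<StringBuilder> spaces) {
        List<String> sentences = new ArrayList<>();
        StringBuilder lastTrailing = null;
        // Naive break rule: after '.', '!' or '?' when whitespace follows.
        for (String raw : paragraph.split("(?<=[.!?])(?=\\s)")) {
            int start = 0;
            int end = raw.length();
            while (start < end && Character.isWhitespace(raw.charAt(start))) {
                start++;
            }
            while (end > start && Character.isWhitespace(raw.charAt(end - 1))) {
                end--;
            }
            if (start == end) {
                // Whitespace-only piece: keep it as trailing space of the previous sentence.
                if (lastTrailing != null) {
                    lastTrailing.append(raw);
                }
                continue;
            }
            sentences.add(raw.substring(start, end));
            StringBuilder leading = new StringBuilder(raw.substring(0, start));
            lastTrailing = new StringBuilder(raw.substring(end));
            if (spaces != null) {
                spaces.add(leading);
                spaces.add(lastTrailing);
            }
        }
        return sentences;
    }

    public static void main(String[] args) {
        List<StringBuilder> spaces = new ArrayList<>();
        List<String> sentences = segment("Hello.  How are you? Fine.", spaces);
        System.out.println(sentences);
        System.out.println(spaces);
    }
}
```

Because every stripped whitespace run is recorded, concatenating leading space, sentence, and trailing space for each entry reproduces the original paragraph, which is the property the space list exists to provide.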
-
glue
public java.lang.String glue(Language sourceLang, Language targetLang, java.util.List<java.lang.String> sentences, java.util.List<java.lang.StringBuilder> spaces, java.util.List<Rule> brules)
Glues segments back into a paragraph.
As segments are returned by segment(Language, String, List, List) without spaces before and after them, this method adds spaces if needed:
- For translation to non-space-delimited languages (Japanese, Chinese, Tibetan) it does not add any spaces. The exception is Break SRX rules that break on space, i.e. rules whose before and after patterns consist of spaces (they get trimmed to an empty string). For such rules all the spaces are added.
- For translation from non-space-delimited languages it adds one space.
- For all other language combinations it restores the spaces present before segmenting.
- Parameters:
sentences - list of translated sentences
spaces - information about spaces in the original paragraph
brules - rules that account for breaks
- Returns:
- glued translated paragraph
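The three spacing cases listed above can be sketched as follows. This is a simplified illustration, not the OmegaT implementation: languages are plain ISO 639-1 code strings (`ja`, `zh`, `bo`) instead of `Language` objects, the `spaces` list is assumed to hold two entries per sentence (leading then trailing whitespace), and the exception for Break SRX rules that break on space is deliberately left out. The class name `GlueSketch` is hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for illustration; not the OmegaT implementation.
final class GlueSketch {

    // Japanese, Chinese, Tibetan: the non-space-delimited languages named in the description.
    private static final Set<String> NO_SPACE_LANGS =
            new HashSet<>(Arrays.asList("ja", "zh", "bo"));

    /**
     * Joins translated sentences. spaces is assumed to hold, per sentence,
     * the original leading whitespace followed by the original trailing whitespace.
     */
    static String glue(String sourceLang, String targetLang,
                       List<String> sentences, List<String> spaces) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < sentences.size(); i++) {
            if (NO_SPACE_LANGS.contains(targetLang)) {
                // Case 1: target is not space-delimited, add nothing.
                out.append(sentences.get(i));
            } else if (NO_SPACE_LANGS.contains(sourceLang)) {
                // Case 2: source had no spaces to restore, add one between sentences.
                if (i > 0) {
                    out.append(' ');
                }
                out.append(sentences.get(i));
            } else {
                // Case 3: restore the whitespace recorded before segmenting.
                out.append(spaces.get(2 * i))
                   .append(sentences.get(i))
                   .append(spaces.get(2 * i + 1));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<String> sentences = Arrays.asList("S1.", "S2.");
        List<String> spaces = Arrays.asList("", "", "  ", "");
        System.out.println(glue("en", "ja", sentences, spaces));
        System.out.println(glue("ja", "en", sentences, spaces));
        System.out.println(glue("en", "fr", sentences, spaces));
    }
}
```

The target-language check comes first so that a pair of two non-space-delimited languages also yields no added spaces, following the order in which the cases are listed.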
-
segmentEntries
public void segmentEntries(boolean needResegment, Language sourceLang, java.lang.String sourceEntry, Language targetLang, java.lang.String targetEntry, java.util.List<java.lang.String> sourceSegments, java.util.List<java.lang.String> targetSegments)
Segments source and target entries from TMX when the segment counts are equal.
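One plausible reading of this method is sketched below. The description only states the equal-count case, so the fallback (keeping the whole source and target entries as a single pair when resegmentation is not requested or the counts differ) is an assumption of this sketch. The class `SegmentEntriesSketch`, the helper `naiveSegment`, and its punctuation-based break rule are hypothetical stand-ins, and the language parameters are omitted.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for illustration; not the OmegaT implementation.
final class SegmentEntriesSketch {

    // Naive splitter: breaks on whitespace that follows '.', '!' or '?'.
    static List<String> naiveSegment(String text) {
        List<String> out = new ArrayList<>();
        for (String s : text.trim().split("(?<=[.!?])\\s+")) {
            if (!s.isEmpty()) {
                out.add(s);
            }
        }
        return out;
    }

    static void segmentEntries(boolean needResegment,
                               String sourceEntry, String targetEntry,
                               List<String> sourceSegments, List<String> targetSegments) {
        if (needResegment) {
            List<String> src = naiveSegment(sourceEntry);
            List<String> tgt = naiveSegment(targetEntry);
            if (src.size() == tgt.size()) {
                // Counts match, so the segments can be paired one to one.
                sourceSegments.addAll(src);
                targetSegments.addAll(tgt);
                return;
            }
        }
        // Assumed fallback: keep the entries whole as a single pair.
        sourceSegments.add(sourceEntry);
        targetSegments.add(targetEntry);
    }

    public static void main(String[] args) {
        List<String> src = new ArrayList<>();
        List<String> tgt = new ArrayList<>();
        segmentEntries(true, "One. Two.", "Un. Deux.", src, tgt);
        System.out.println(src + " <-> " + tgt);
    }
}
```

Requiring equal counts keeps each source segment aligned with exactly one target segment, so no pairing has to be guessed.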
-
-