Class Segmenter


  • public final class Segmenter
    extends java.lang.Object
    Segments paragraphs into sentences and glues translated sentences back together to form a paragraph.
    • Constructor Summary

      Constructors 
      Constructor Description
      Segmenter​(SRX srx)  
    • Method Summary

      All Methods Instance Methods Concrete Methods 
      Modifier and Type Method Description
      SRX getSRX()  
      java.lang.String glue​(Language sourceLang, Language targetLang, java.util.List<java.lang.String> sentences, java.util.List<java.lang.StringBuilder> spaces, java.util.List<Rule> brules)
      Glues segments back into a paragraph.
      java.util.List<java.lang.String> segment​(Language lang, java.lang.String paragraph, java.util.List<java.lang.StringBuilder> spaces, java.util.List<Rule> brules)
      Segments the paragraph into sentences according to the currently configured rules.
      void segmentEntries​(boolean needResegment, Language sourceLang, java.lang.String sourceEntry, Language targetLang, java.lang.String targetEntry, java.util.List<java.lang.String> sourceSegments, java.util.List<java.lang.String> targetSegments)
      Segments source and target entries from TMX when their segment counts are equal.
      • Methods inherited from class java.lang.Object

        equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Constructor Detail

      • Segmenter

        public Segmenter​(SRX srx)
    • Method Detail

      • getSRX

        public SRX getSRX()
      • segment

        public java.util.List<java.lang.String> segment​(Language lang,
                                                        java.lang.String paragraph,
                                                        java.util.List<java.lang.StringBuilder> spaces,
                                                        java.util.List<Rule> brules)
        Segments the paragraph into sentences according to the currently configured rules.

        Bugfix for bug 83: sentences are returned without spaces at the beginning or at the end.

        An additional list with space information is filled in so that the translation can be glued together with the same spaces between sentences as in the original paragraph.

        Parameters:
        paragraph - the paragraph text
        spaces - list to store information about spaces between sentences (can be null)
        brules - list to store rules that account for breaks (can be null)
        Returns:
        list of sentences (String objects)
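
        The trimming and space bookkeeping described above can be sketched as follows. This is a simplified stand-in, not the actual implementation: SegmentSketch is a hypothetical class, a single regular expression replaces the SRX rules, the lang and brules parameters are omitted, and the layout of the spaces list (two entries per sentence, leading then trailing whitespace) is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class SegmentSketch {
    /**
     * Splits after '.', '!' or '?' when more text follows (a stand-in for SRX rules),
     * returns trimmed sentences and records the whitespace that was trimmed.
     */
    public static List<String> segment(String paragraph, List<StringBuilder> spaces) {
        List<String> sentences = new ArrayList<>();
        if (spaces != null) {
            spaces.clear();
        }
        // Zero-width split: every piece keeps its own leading and trailing whitespace.
        for (String piece : paragraph.split("(?<=[.!?])(?=\\s+\\S)")) {
            int b = 0;
            while (b < piece.length() && Character.isWhitespace(piece.charAt(b))) {
                b++;
            }
            int e = piece.length();
            while (e > b && Character.isWhitespace(piece.charAt(e - 1))) {
                e--;
            }
            sentences.add(piece.substring(b, e));
            if (spaces != null) {
                // Assumed layout: [leading, trailing] for each sentence, in order.
                spaces.add(new StringBuilder(piece.substring(0, b)));
                spaces.add(new StringBuilder(piece.substring(e)));
            }
        }
        return sentences;
    }
}
```

        With this layout, concatenating leading whitespace, sentence, and trailing whitespace for every sentence in order reproduces the original paragraph, which is what makes a later glue step possible.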
      • glue

        public java.lang.String glue​(Language sourceLang,
                                     Language targetLang,
                                     java.util.List<java.lang.String> sentences,
                                     java.util.List<java.lang.StringBuilder> spaces,
                                     java.util.List<Rule> brules)
        Glues segments back into a paragraph.

        As segments are returned by segment(Language, String, List, List) without spaces before and after them, this method adds spaces if needed:

        • For translation to non-space-delimited languages (Japanese, Chinese, Tibetan) it does not add any spaces.

          A special exception is made for Break SRX rules that break on space, i.e. rules whose before and after patterns consist of spaces (they get trimmed to an empty string). For such rules all the spaces are added.

        • For translation from non-space-delimited languages it adds one space.
        • For all other language combinations it restores the spaces present before segmenting.
        Parameters:
        sentences - list of translated sentences
        spaces - information about spaces in original paragraph
        brules - rules that account for breaks
        Returns:
        glued translated paragraph
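
        The three cases listed above can be sketched as follows. This is one reading of the description, not the actual implementation: GlueSketch is a hypothetical class, languages are plain codes instead of Language objects, the breaksOnSpace list is a hypothetical stand-in for inspecting each Rule, the spaces list is assumed to hold leading and trailing whitespace for each sentence in order, and checking the target language before the source language is an assumption.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class GlueSketch {
    // Codes treated as non-space-delimited in this sketch (Japanese, Chinese, Tibetan).
    static final Set<String> NO_SPACE = new HashSet<>(Arrays.asList("ja", "zh", "bo"));

    public static String glue(String sourceLang, String targetLang, List<String> sentences,
                              List<String> spaces, List<Boolean> breaksOnSpace) {
        if (sentences.isEmpty()) {
            return "";
        }
        StringBuilder out = new StringBuilder(sentences.get(0));
        for (int i = 1; i < sentences.size(); i++) {
            // Whitespace originally found between sentence i-1 and sentence i.
            String original = spaces.get(2 * i - 1) + spaces.get(2 * i);
            String separator;
            if (NO_SPACE.contains(targetLang)) {
                // No spaces, unless the break rule itself broke on space.
                separator = breaksOnSpace.get(i - 1) ? original : "";
            } else if (NO_SPACE.contains(sourceLang)) {
                // Source had no word spacing, so insert a single space.
                separator = " ";
            } else {
                // Restore what was there before segmenting.
                separator = original;
            }
            out.append(separator).append(sentences.get(i));
        }
        return out.toString();
    }
}
```

        For example, with sentences "A." and "B." originally separated by two spaces, this sketch yields "A.  B." for English to French and "A.B." for English to Japanese, unless the break rule broke on space.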
      • segmentEntries

        public void segmentEntries​(boolean needResegment,
                                   Language sourceLang,
                                   java.lang.String sourceEntry,
                                   Language targetLang,
                                   java.lang.String targetEntry,
                                   java.util.List<java.lang.String> sourceSegments,
                                   java.util.List<java.lang.String> targetSegments)
        Segments source and target entries from TMX when their segment counts are equal.
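
        A minimal sketch of the count check described above. The names are hypothetical and much is assumed: SegmentEntriesSketch and its split helper stand in for the real segmentation, the Language parameters are omitted, and the fallback (adding the whole entries unsplit when resegmenting is not requested, the target is missing, or the counts differ) is an assumption, since the description only covers the equal-count case.

```java
import java.util.Arrays;
import java.util.List;

public class SegmentEntriesSketch {
    // Toy splitter: breaks after '.', '!' or '?' followed by whitespace.
    static List<String> split(String text) {
        return Arrays.asList(text.trim().split("(?<=[.!?])\\s+"));
    }

    public static void segmentEntries(boolean needResegment, String sourceEntry, String targetEntry,
                                      List<String> sourceSegments, List<String> targetSegments) {
        if (needResegment && targetEntry != null) {
            List<String> src = split(sourceEntry);
            List<String> tgt = split(targetEntry);
            if (src.size() == tgt.size()) {
                // Counts match, so segments can be paired one to one.
                sourceSegments.addAll(src);
                targetSegments.addAll(tgt);
                return;
            }
        }
        // Assumed fallback: keep the entries whole so the pairing stays valid.
        sourceSegments.add(sourceEntry);
        targetSegments.add(targetEntry);
    }
}
```

        Keeping the two output lists the same length in every branch means index i in sourceSegments always corresponds to index i in targetSegments.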