Class DeNormalize


  • public final class DeNormalize
    extends java.lang.Object
    Denormalizes an English string in the ways listed below.
    • Capitalize the first character in the string
    • Detokenize
      • Delete whitespace in front of periods and commas
      • Join contractions
      • Capitalize name titles (Mr, Ms, Miss, Dr, etc.)
      • TODO: Handle surrounding characters ([{<"''">}])
      • TODO: Join multi-period abbreviations (e.g., M.Phil., i.e.)
      • TODO: Handle ambiguities like "st.", which can abbreviate either "Saint" or "street"
      • TODO: Capitalize both the title and the name of a person, e.g., Mr. Morton (this requires named entities to be demarcated).
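The implemented (non-TODO) steps above could look roughly like the following. This is a minimal, hypothetical sketch, not the DeNormalize implementation: the class name DeNormalizeSketch, the method process, and every regular expression are assumptions. It expects lowercase, single-space-separated tokens, and it assumes the tokenizer splits contractions as in "did n't" or "it 's". The title list leaves out "miss", because a plain whole-word match would also capitalize the verb "miss".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch of the implemented denormalization steps; not the real API.
 * Assumes lowercase tokens separated by single standard spaces.
 */
final class DeNormalizeSketch {

    // Assumed tokenizer output for contractions, e.g. "did n't", "it 's", "i 'm".
    private static final Pattern CONTRACTION =
            Pattern.compile(" (n't|'s|'re|'ve|'ll|'d|'m)\\b");

    // A single space directly before a period or comma.
    private static final Pattern SPACE_BEFORE_PUNCT = Pattern.compile(" ([.,])");

    // Assumed lowercase title list; "miss" is omitted because a whole-word
    // match would also capitalize the verb "miss".
    private static final Pattern TITLE = Pattern.compile("\\b(mr|mrs|ms|dr)\\b");

    private DeNormalizeSketch() {}

    /** Applies the steps in order: contractions, punctuation, titles, first character. */
    static String process(String line) {
        String s = CONTRACTION.matcher(line).replaceAll("$1");   // "did n't" -> "didn't"
        s = SPACE_BEFORE_PUNCT.matcher(s).replaceAll("$1");      // "done ," -> "done,"
        s = capitalizeTitles(s);                                 // "dr." -> "Dr."
        return capitalizeFirst(s);                               // "the" -> "The"
    }

    // Upper-cases the first letter of every matched title (needs Java 9+).
    static String capitalizeTitles(String s) {
        Matcher m = TITLE.matcher(s);
        return m.replaceAll(r -> Matcher.quoteReplacement(
                Character.toUpperCase(r.group().charAt(0)) + r.group().substring(1)));
    }

    // Upper-cases the first character of the whole string, if there is one.
    static String capitalizeFirst(String s) {
        if (s.isEmpty()) {
            return s;
        }
        return Character.toUpperCase(s.charAt(0)) + s.substring(1);
    }
}
```

With these assumptions, process("mr smith said it 's done , did n't he .") returns "Mr smith said it's done, didn't he.". The name "smith" stays lowercase, which is the open TODO about capitalizing both the title and the name.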
    N.B. These methods all assume that every translation result to be denormalized has the following format:
    • There is only one space between every pair of tokens
    • There is no whitespace before the first token
    • There is no whitespace after the final token
    • Standard spaces are the only type of whitespace
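Because every method relies on these four assumptions, a caller might check them before denormalizing. The following is a hypothetical helper, FormatCheck.hasExpectedFormat, which is not part of DeNormalize. It treats the empty string as valid (an assumption) and combines Character.isWhitespace with Character.isSpaceChar so that tabs, newlines, and non-breaking spaces are all rejected.

```java
/**
 * Hypothetical helper (not part of DeNormalize) that checks the four
 * input-format assumptions listed above.
 */
final class FormatCheck {

    private FormatCheck() {}

    static boolean hasExpectedFormat(String s) {
        if (s.isEmpty()) {
            return true; // assumption: no tokens means no rule is violated
        }
        // No whitespace before the first token or after the final token.
        if (s.charAt(0) == ' ' || s.charAt(s.length() - 1) == ' ') {
            return false;
        }
        // Exactly one space between every pair of tokens.
        if (s.contains("  ")) {
            return false;
        }
        // The standard space is the only whitespace character allowed.
        // isWhitespace catches tabs and newlines; isSpaceChar catches U+00A0.
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c != ' ' && (Character.isWhitespace(c) || Character.isSpaceChar(c))) {
                return false;
            }
        }
        return true;
    }
}
```

A leading tab is not caught by the first check but is rejected by the character loop, so all four rules are enforced.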